Commit Graph

726 Commits

Author SHA1 Message Date
358efe7ae0 Manager: perform a database vacuum after migrations
Just to make sure the DB is properly cleaned up after a big migration
has happened.
2024-03-06 11:59:17 +01:00
16114ee529 Worker: fix Go scheduling issue in sleep command test
Add a 1ms delay in the test loop, so that other goroutines can be scheduled
as well. This should fix #104288.
2024-03-04 14:18:08 +01:00
27cbb2ed0f Manager: increase timeout for database integrity check
With a fuller database, 2 seconds is apparently not always long enough,
so increase the timeout to 10 seconds.
2024-03-04 14:04:59 +01:00
a4e5eef83e Manager: fix database migration 0004
Fix the database migration that adds `NOT NULL` clauses. It used
`INSERT INTO temp_x SELECT * from x;`, and the `*` returns the fields in
the order they are defined on the table. Since this might be different from
the order that the `INSERT INTO temp_x` expects, strange problems can
happen where columns get swapped (or constraints can fail on columns that
they should not fail for, because they got fed data from a different
column).
2024-03-04 13:06:09 +01:00
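The migration fix in a4e5eef83e comes down to never relying on the column order of `SELECT *` when copying between tables. Below is a sketch that builds such a statement with an explicit column list. The `copyTableSQL` helper and the table and column names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// copyTableSQL (hypothetical) builds an INSERT ... SELECT that names every
// column on both sides. The copy then no longer depends on the order in which
// the columns happen to be defined in either table.
// Identifiers are interpolated as-is, so only use trusted, hard-coded names,
// as in a migration.
func copyTableSQL(src, dst string, columns []string) string {
	cols := strings.Join(columns, ", ")
	return fmt.Sprintf("INSERT INTO %s (%s) SELECT %s FROM %s;", dst, cols, cols, src)
}

func main() {
	// INSERT INTO temp_jobs (id, uuid, name) SELECT id, uuid, name FROM jobs;
	fmt.Println(copyTableSQL("jobs", "temp_jobs", []string{"id", "uuid", "name"}))
}
```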
7b72d0ca43 Refactor: move jobs-related queries to queries_jobs.sql
This makes it easier to later also create `query_workers.sql`,
`query_meta.sql` etc. so that the sqlc-generated code can follow the
same subdivision as the persistence service code itself.

No functional changes.
2024-03-03 23:27:55 +01:00
b102b73a1f Refactor: convert more job functions to sqlc
No functional changes.
2024-03-03 23:23:51 +01:00
1ac796d0d8 Refactor: Manager: remove unused query from queries.sql
No functional changes.
2024-03-03 22:42:37 +01:00
3fbb3cde34 Manager: SQLC rename Uuid to UUID
No functional changes.
2024-03-03 20:54:43 +01:00
c046094880 Manager: start replacing GORM with SQLC
GORM has certain downsides:

- Code-first approach, where queries have to be translated to the Go code
  required to execute them.
- GORM comes with its own SQLite implementation, which doesn't provide an
  on-connect callback. This means that new connections cannot correctly
  enable foreign key constraints, causing database consistency issues.

[SQLC](https://sqlc.dev/) solves these issues for us.

This commit doesn't fully replace GORM with SQLC, but introduces it for
a few queries. Once all queries have been converted, GORM can be removed
completely.
2024-03-03 20:15:39 +01:00
1e7c059d12 Manager: check the farm status quickly after startup
The database is polled every 30 seconds to determine the farm status; at
startup the first poll is done after 1 second to get a faster status.

Note that when jobs and workers change their status, the farm status is
always updated.
2024-03-02 22:09:53 +01:00
7eb5eb68a3 Manager: ensure foreign keys are enabled in periodic integrity check
There are still issues with foreign keys getting disabled, so enable them
in the periodic database consistency check.

A more permanent solution is likely to drop GORM and switch to something
else that gives us an on-connect-callback, which can then be used to
turn on foreign key constraints for every connection made.
2024-03-01 23:42:04 +01:00
c1a9b1e877 Manager: force a poll of the farm status when a job/worker changes state
This introduces the concept of 'event listener', which is now used by
the farm status service to respond to events on the event bus.

This makes it possible to increase the regular poll period from 5 to 30
seconds, i.e. to poll less often. Regular polling is now only necessary as a
backup, in case events are missed or things change without the event bus
logic noticing.
2024-03-01 22:36:38 +01:00
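With c1a9b1e877, events trigger an immediate poll, and the 30-second ticker only serves as a fallback. One way to sketch this in Go is a buffered channel of capacity one, so that a burst of events collapses into a single pending poll request. The `farmStatusService` type and its methods are hypothetical names, not the Manager's actual code.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// farmStatusService is a hypothetical stand-in for the farm status service.
type farmStatusService struct {
	pollNow chan struct{} // capacity 1: at most one pending poll request
}

func newFarmStatusService() *farmStatusService {
	return &farmStatusService{pollNow: make(chan struct{}, 1)}
}

// OnEvent is called for job/worker status-change events. It never blocks:
// if a poll is already pending, the new request is merged into it.
func (s *farmStatusService) OnEvent() {
	select {
	case s.pollNow <- struct{}{}:
	default:
	}
}

// Run polls whenever an event asks for it, and otherwise every period as a
// backup for missed events.
func (s *farmStatusService) Run(ctx context.Context, period time.Duration, poll func()) {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-s.pollNow:
			poll()
		case <-ticker.C:
			poll()
		}
	}
}

func main() {
	svc := newFarmStatusService()
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	// Three events before Run starts collapse into one pending poll.
	svc.OnEvent()
	svc.OnEvent()
	svc.OnEvent()

	polls := 0
	svc.Run(ctx, 30*time.Second, func() { polls++ })
	fmt.Println("polls:", polls)
}
```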
9bfb53a7f6 Manager: log error when an event doesn't have a SocketIO event type
SocketIO has 'rooms' and 'event types'. The 'event type' is set via
reflection of the OpenAPI type of the event payload. This has to be set
up in a mapping, though, and if that mapping is incomplete, an error will
now be logged.
2024-03-01 22:36:26 +01:00
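Below is a sketch of the lookup that 9bfb53a7f6 guards. The event type is found by looking up the Go type of the payload in a mapping, and a missing entry is reported instead of being silently skipped. The payload types and event type names are hypothetical stand-ins for the OpenAPI-generated types.

```go
package main

import (
	"fmt"
	"log"
	"reflect"
)

// Hypothetical payload types.
type JobUpdate struct{ ID string }
type WorkerUpdate struct{ ID string }
type FarmStatusReport struct{ Status string }

// eventTypes maps payload types to (hypothetical) SocketIO event type names.
// FarmStatusReport is deliberately missing to show the error path.
var eventTypes = map[reflect.Type]string{
	reflect.TypeOf(JobUpdate{}):    "job-update",
	reflect.TypeOf(WorkerUpdate{}): "worker-update",
}

// eventTypeFor returns the event type for the payload, or an error when the
// mapping is incomplete.
func eventTypeFor(payload any) (string, error) {
	t := reflect.TypeOf(payload)
	name, ok := eventTypes[t]
	if !ok {
		return "", fmt.Errorf("no SocketIO event type for payload of type %v", t)
	}
	return name, nil
}

func main() {
	for _, payload := range []any{JobUpdate{ID: "1"}, FarmStatusReport{Status: "active"}} {
		name, err := eventTypeFor(payload)
		if err != nil {
			log.Printf("error: %v", err)
			continue
		}
		fmt.Println("event type:", name)
	}
}
```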
ee7af29748 Manager: fix unit test for farm status events 2024-03-01 22:36:26 +01:00
54f7878045 Manager: add farm status events to the event bus
Send an event to the event bus whenever the farm status changes. The event
contains a farm status report (like `{status: "active"}`), and is sent to
the `/status` topic.

Note that at this moment the status is only polled every X seconds, and
thus may lag behind other events.
2024-03-01 08:41:35 +01:00
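The event from 54f7878045 carries a small JSON payload. Below is a sketch of encoding it. Only the `/status` topic and the `status` field come from the commit message; the `FarmStatusReport` struct, `farmStatusTopic` constant and `farmStatusEvent` function are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// farmStatusTopic is the topic named in the commit message.
const farmStatusTopic = "/status"

// FarmStatusReport is a hypothetical struct matching `{status: "active"}`.
type FarmStatusReport struct {
	Status string `json:"status"`
}

// farmStatusEvent returns the topic and JSON payload for a status change.
func farmStatusEvent(status string) (topic string, payload []byte, err error) {
	payload, err = json.Marshal(FarmStatusReport{Status: status})
	return farmStatusTopic, payload, err
}

func main() {
	topic, payload, err := farmStatusEvent("active")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %s\n", topic, payload) // /status {"status":"active"}
}
```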
61cc8ff04d Manager: implement API operation to get the farm status
Add a new API operation to get the overall farm status. This is based on
the jobs and workers, and their status.

The statuses are:

- `active`: Actively working on jobs.
- `idle`: Farm could be active, but has no work to do.
- `waiting`: Work has been queued, but all workers are asleep.
- `asleep`: Farm is idle, and all workers are asleep.
- `inoperative`: Cannot work: no workers, or all are offline/error.
- `starting`: Farm is starting up.
- `unknown`: Unexpected configuration of worker and job statuses.
2024-02-29 20:42:28 +01:00
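Below is one possible way to derive the statuses listed in 61cc8ff04d from aggregated counts. It is a simplified sketch. The `farmCounts` fields and the order of the checks are assumptions, not taken from the Manager's implementation. `starting` is left out because it depends on Manager state rather than on job and worker counts.

```go
package main

import "fmt"

// farmCounts is a hypothetical summary of job and worker statuses.
type farmCounts struct {
	WorkersUsable int // workers that are neither offline nor in error
	WorkersAsleep int // usable workers that are asleep
	JobsActive    int // jobs being worked on
	JobsQueued    int // jobs waiting to be worked on
}

// farmStatus maps the counts to one of the farm statuses (simplified sketch).
func farmStatus(c farmCounts) string {
	hasWork := c.JobsActive+c.JobsQueued > 0
	allAsleep := c.WorkersAsleep == c.WorkersUsable
	switch {
	case c.WorkersUsable == 0:
		return "inoperative" // no workers, or all offline/error
	case c.JobsActive > 0 && !allAsleep:
		return "active"
	case hasWork && allAsleep:
		return "waiting"
	case !hasWork && allAsleep:
		return "asleep"
	case !hasWork:
		return "idle"
	default:
		// E.g. queued jobs and awake workers, but nothing active yet.
		return "unknown"
	}
}

func main() {
	fmt.Println(farmStatus(farmCounts{WorkersUsable: 3, WorkersAsleep: 3, JobsQueued: 2})) // waiting
}
```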
d9ffe8a1b6 OAPI: regenerate code 2024-02-29 20:38:38 +01:00
81968610ed Worker: blender-render command, make the blendfile parameter optional
Only include the `blendfile` parameter if it is not empty. This makes it
possible to pass a Python script that loads/constructs the blend file,
instead of loading one directly.
2024-02-25 23:09:11 +01:00
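Below is a sketch of the argument building described in 81968610ed. The function name and the split into `argsBefore` and `args` are hypothetical. The point is that an empty `blendfile` is left out entirely rather than passed to Blender as an empty argument.

```go
package main

import "fmt"

// buildBlenderArgs (hypothetical) assembles Blender's CLI arguments. The
// blendfile is only included when it is non-empty, so that a Python script
// passed via args can load or construct the blend file itself.
func buildBlenderArgs(argsBefore []string, blendfile string, args []string) []string {
	cliArgs := make([]string, 0, len(argsBefore)+1+len(args))
	cliArgs = append(cliArgs, argsBefore...)
	if blendfile != "" {
		cliArgs = append(cliArgs, blendfile)
	}
	return append(cliArgs, args...)
}

func main() {
	fmt.Println(buildBlenderArgs([]string{"-b"}, "shot.blend", []string{"-f", "1"}))
	fmt.Println(buildBlenderArgs([]string{"-b"}, "", []string{"--python", "make_scene.py"}))
}
```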
72d5cfa07c Job compiler: test simplification of frame range for video task 2024-02-24 11:57:52 +01:00
Emmanuel Durand
1ffe0a10bd Job compiler: simplify output video name on rendering list of frames
When a more complex list of frames is to be rendered (like `1, 4, 5, 10,
15`), simplify the video filename to `{first}-{last}`.

Before: `somename-1, 4, 5, 10, 15.mp4`
Now:    `somename-1-15.mp4`
2024-02-24 11:57:46 +01:00
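The filename rule from 1ffe0a10bd can be sketched as taking the first and last frame of the list. The sketch makes three assumptions: the list is comma-separated, each item is either a single non-negative frame or a `start-end` range, and items are in ascending order. The `simplifyFrameRange` name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// simplifyFrameRange (hypothetical) turns a frame list like "1, 4, 5, 10, 15"
// into "1-15" for use in the output video filename.
func simplifyFrameRange(frames string) string {
	parts := strings.Split(frames, ",")
	first := strings.TrimSpace(parts[0])
	last := strings.TrimSpace(parts[len(parts)-1])
	// A range item contributes its start (as first item) or its end (as last item).
	if start, _, found := strings.Cut(first, "-"); found {
		first = strings.TrimSpace(start)
	}
	if _, end, found := strings.Cut(last, "-"); found {
		last = strings.TrimSpace(end)
	}
	if first == last {
		return first
	}
	return first + "-" + last
}

func main() {
	fmt.Println("somename-" + simplifyFrameRange("1, 4, 5, 10, 15") + ".mp4") // somename-1-15.mp4
}
```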
dfdb8e82a1 Worker: Refer to website instead of non-existent example file
The worker-written config files would all refer to
`flamenco-worker-example.yaml`, even though this file doesn't even
exist. Instead, the configuration file will refer to the appropriate
documentation on the website, and the credentials file will explain what
happens when you delete it.

The credentials are otherwise intentionally left undocumented, as their
contents are not to be manually edited. The only thing to do with that
file is delete it so that the Worker re-registers itself at startup.
2024-02-22 12:46:13 +01:00
7033028a0b Worker: use explicit type when writing config file
Instead of passing an arbitrary string ("Configuration" or "Credentials"),
use an explicit type for this. This will make it possible to have the
config-writing functions behave slightly differently depending on which
configuration type is being written.

No functional changes.
2024-02-22 12:46:13 +01:00
12bfa82854 Manager: add lifecycle events to the event bus
Send events on Manager startup & shutdown. To make this possible, events
sent to MQTT are now queued up until an MQTT server can be reached.
Otherwise the startup event would be sent before the MQTT connection was
established.
2024-02-21 22:20:56 +01:00
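Below is a sketch of the queuing described in 12bfa82854. Events published before the broker connection exists are held back, then flushed in order once the connection comes up. The `queuedPublisher` type, its methods and the injected `send` function are hypothetical. A real client would call `OnConnect` from its MQTT library's connection callback. This sketch also ignores reconnects and puts no bound on the queue size.

```go
package main

import (
	"fmt"
	"sync"
)

type message struct {
	Topic   string
	Payload []byte
}

// queuedPublisher (hypothetical) queues messages until OnConnect is called.
type queuedPublisher struct {
	mu        sync.Mutex
	connected bool
	queue     []message
	send      func(message) // delivers a message to the broker
}

// Publish sends the message right away when connected, and queues it otherwise.
func (p *queuedPublisher) Publish(topic string, payload []byte) {
	p.mu.Lock()
	defer p.mu.Unlock()
	msg := message{Topic: topic, Payload: payload}
	if !p.connected {
		p.queue = append(p.queue, msg)
		return
	}
	p.send(msg)
}

// OnConnect marks the publisher as connected and flushes queued messages in
// the order they were published.
func (p *queuedPublisher) OnConnect() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.connected = true
	for _, msg := range p.queue {
		p.send(msg)
	}
	p.queue = nil
}

func main() {
	p := &queuedPublisher{send: func(m message) { fmt.Println("sent", m.Topic, string(m.Payload)) }}
	p.Publish("/lifecycle", []byte(`{"type":"manager-startup"}`))
	fmt.Println("nothing sent yet")
	p.OnConnect()
}
```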
e7c4285ac6 Manager: Adjust code for renaming SocketIO... types to Event...
No functional changes, just adjusting to the OpenAPI renames.
2024-02-05 09:25:43 +01:00
3326f683da Manager: consistent MQTT server/broker naming
Consistently log about the MQTT "broker", not the "server". The former
is common MQTT terminology.
2024-02-04 18:28:12 +01:00
91d15df765 Manager Cleanup: consistent variable name for MQTTClient receiver pointer
No functional changes.
2024-02-04 17:03:38 +01:00
44bfe58891 Manager: friendlier log message when MQTT server connection cannot be made 2024-02-04 16:32:36 +01:00
f0c7acd903 Manager: fix web interface not showing last-rendered images on job view
Fix SocketIO subscriptions so that the client also subscribes to
job-specific last-rendered images whenever subscribing to job-specific
events. These are sent to another event topic, and thus need some extra
care. Before the introduction of the generic event bus, both message types
were sent to the same topic, but that's not supported by MQTT, and so things
had to change.
2024-02-04 16:12:16 +01:00
4f804958e5 Manager: add unittest for eventbus topics
The code was doing its work just fine, but I wanted to be sure.
2024-02-04 16:12:16 +01:00
dd98c7471d Manager: don't log event payload in event logging
Don't log event payload in MQTT/SocketIO debug logs. It's getting too
noisy.
2024-02-04 16:11:58 +01:00
740ede80fa Manager: log eventbus events with 'eventbus' prefix instead of 'socketio'
These messages are now no longer SocketIO-specific, so should use the
'eventbus' prefix.
2024-02-04 16:08:45 +01:00
b375acb1a1 Cleanup: add SPDX license identifiers 2024-02-03 23:42:51 +01:00
4fe8605744 Manager: Add MQTT client for sending events
Add an MQTT client to send events from the event bus to an MQTT broker.
2024-02-03 23:20:15 +01:00
76a24243f0 Manager: Introduce event bus system
Introduce an "event bus"-like system. It's more like a fan-out
broadcaster for certain events. Instead of directly sending events to
SocketIO, they are now sent to the broker, which in turn sends them to every
registered "forwarder". Currently there is only one forwarder, for
SocketIO.

This opens the door for a proper MQTT client that sends the same events
to an MQTT server.
2024-02-03 22:55:23 +01:00
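The fan-out from 76a24243f0 can be sketched with a small forwarder interface. The broker knows nothing about SocketIO or MQTT and simply hands every event to each registered forwarder. The `Forwarder`, `ForwarderFunc` and `Broker` names are hypothetical, not the Manager's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// Forwarder receives every event broadcast via the Broker.
type Forwarder interface {
	Broadcast(topic string, payload any)
}

// ForwarderFunc adapts a plain function to the Forwarder interface.
type ForwarderFunc func(topic string, payload any)

func (f ForwarderFunc) Broadcast(topic string, payload any) { f(topic, payload) }

// Broker fans out each event to all registered forwarders.
type Broker struct {
	mu         sync.Mutex
	forwarders []Forwarder
}

func (b *Broker) AddForwarder(f Forwarder) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.forwarders = append(b.forwarders, f)
}

func (b *Broker) Broadcast(topic string, payload any) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, f := range b.forwarders {
		f.Broadcast(topic, payload)
	}
}

func main() {
	var b Broker
	// Stand-ins for a SocketIO and an MQTT forwarder.
	for _, name := range []string{"socketio", "mqtt"} {
		name := name
		b.AddForwarder(ForwarderFunc(func(topic string, payload any) {
			fmt.Printf("%s: %s %v\n", name, topic, payload)
		}))
	}
	b.Broadcast("/status", map[string]string{"status": "active"})
}
```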
f464aea137 Manager & website: provide more helpful info when Worker auth fails
Provide more useful info when a Worker tries to communicate but fails
the authentication check. The message about this is now more friendly
and links to a new FAQ entry at
https://flamenco.blender.org/faq/#what-does-unknown-worker-is-trying-to-communicate-mean
2024-01-25 14:19:24 +01:00
9afd79d8c0 Manager: prevent logging an error when fetching unknown worker
Prevent logging an error in the persistence layer when an unknown worker
is requested.

This reduces the noise & confusion when the web interface is showing the
details of a worker, but the worker gets removed by someone else, or when a
Worker that the Manager doesn't know about tries to connect.

See #104282.
2024-01-25 12:38:13 +01:00
70faa4e225 Move URLs to the Flamenco website to constants in a dedicated package
Create a dedicated package `.../pkg/website` to contain constants for the
URLs of documentation, bug reporting, etc. That way it's easier to see
which parts of the website are being referred to from the Flamenco
binaries, and updates can happen in a central spot.

No functional changes.
2024-01-25 12:25:06 +01:00
b39f116b0e Manager: after deleting a job, perform a database consistency check
Deleting jobs from the database can still sometimes cause consistency
errors, as if foreign key constraints aren't enabled. This check is there
to try and get a grip on things.
2024-01-11 20:03:53 +01:00
aac2ec7bf6 Manager: when requesting job deletion, also log its low-level database ID
When an API request comes in to delete a job, not only log the job's UUID,
but also include its database ID. This can help in figuring out database
issues: once the job is deleted, its UUID can no longer be looked up, and
database relations use the ID rather than the UUID.
2024-01-11 17:17:56 +01:00
6777e89589 Manager: refuse to delete job when foreign keys are disabled
Just as a safety measure, before deleting a job, check that foreign key
constraints are enabled. These are optional in SQLite, and the deletion
function assumes that they are on.
2024-01-11 17:17:56 +01:00
3e46322d14 Manager: reduce log level when last-rendered image was accepted
Reduce the log level when a last-rendered image was accepted from a Worker.
2024-01-11 17:17:56 +01:00
17b664f152 Worker: log copy-pastable commandline invocation
Log any CLI command that's run in a way that can be easily copy-pasted
from the task log. This can help a lot in determining whether an issue
is caused by Flamenco or by the CLI program itself.
2023-12-25 15:07:18 +01:00
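Below is a minimal sketch of formatting a command for the task log, as in 17b664f152, assuming POSIX shell quoting. Any argument with a character outside a conservative safe set is wrapped in single quotes, and embedded single quotes are written as `'\''`. The helper names are hypothetical, and Windows `cmd.exe` quoting rules differ.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote (hypothetical) quotes a single argument for a POSIX shell.
func shellQuote(arg string) string {
	if arg == "" {
		return "''"
	}
	for _, r := range arg {
		safe := r >= 'a' && r <= 'z' || r >= 'A' && r <= 'Z' || r >= '0' && r <= '9' ||
			strings.ContainsRune("-_./=:,+@%", r)
		if !safe {
			return "'" + strings.ReplaceAll(arg, "'", `'\''`) + "'"
		}
	}
	return arg
}

// commandLine joins the executable and its arguments into one line that can
// be copy-pasted into a shell.
func commandLine(exe string, args ...string) string {
	quoted := make([]string, 0, len(args)+1)
	quoted = append(quoted, shellQuote(exe))
	for _, a := range args {
		quoted = append(quoted, shellQuote(a))
	}
	return strings.Join(quoted, " ")
}

func main() {
	fmt.Println(commandLine("blender", "-b", "/render/my shot.blend", "--python-expr", "print('hi')"))
}
```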
8ae0bc37dd Worker: reduce double logging
Remove double logging of 'command exited succesfully' message.
2023-12-25 14:55:35 +01:00
fe26a026e6 Refactor: rename command_exe.go to cmd_executor.go
Rename the file containing the command executor from `command_exe.go` to
`cmd_executor.go`, to distinguish it from the implementation of the
`exec` command (`command_exec.go`).

No functional changes.
2023-12-25 14:14:53 +01:00
246916475f Manager: Implement mass mark-for-deletion of jobs
Implement the API function to mass-mark jobs for deletion, based on
their 'updated_at' timestamp.

Note that the `last_updated_max` parameter is rounded up to whole
seconds. This may mark more jobs for deletion than you expect, if their
`updated_at` timestamps differ by less than a second.
2023-12-16 23:05:52 +01:00
b9e41065c1 OAPI: regenerate code 2023-12-16 23:03:53 +01:00
0ea3cf8c3f Worker: perform database migrations with Goose
Replace the GORM auto-migration with Goose. The latter uses hand-written
SQL files to migrate the database with understandable, explicit queries.
2023-12-14 10:13:42 +01:00
acc9499f2a Manager: drop the job_storage_infos database table
GORM Automigration created a separate `job_storage_infos` table (because
we used it wrong, to be fair), which is actually only used as an
embedded struct in the `jobs` table. This means this table itself can be
dropped.
2023-12-14 10:13:42 +01:00
a65f234bea Manager: replace GORM database migration with Goose
Replace GORM's auto-migration with Goose. The latter uses hand-written
SQL queries to apply database schema changes, which is safer and easier to
understand than what GORM is doing.
2023-12-14 10:13:40 +01:00
d260a308bd Worker: enable write-ahead logging on the database
Now the Worker and the Manager share the same database initialisation
code (enabling foreign key constraints + write-ahead logging).

The foreign key constraints were already enabled before, but now it's done
with (a copy of) the same code as the Manager.
2023-12-14 10:10:03 +01:00