Fix the database migration that adds `NOT NULL` clauses. It used
`INSERT INTO temp_x SELECT * from x;`, and the `*` returns the fields in
the order they are defined on the table. Since this might be different from
the order that the `INSERT INTO temp_x` expects, strange problems can
happen where columns get swapped (or constraints can fail on columns that
they should not fail for, because they got fed data from a different
column).
This makes it easier to later also create `query_workesr.sql`,
`query_meta.sql` etc. so that the sqlc-generated code can follow the
same subdivision as the persistence service code itself.
No functional changes.
GORM has certain downsides:
- Code-first approach, where queries have to be translated to the Go code
required to execute them.
- GORM comes with its own SQLite implementation, which doesn't provide an
on-connect callback. This means that new connections cannot correctly
enable foreign key constraints, causing database consistency issues.
[SQLC](https://sqlc.dev/) solves these issues for us.
This commit doesn't fully replace GORM with SQLC, but introduces it for
a few queries. Once all queries have been converted, GORM can be removed
completely.
The database is polled every 30 seconds to determine the farm status; at
startup the first poll is done after 1 second to get a faster status.
Note that when jobs and workers change their status, the farm status is
always updated.
There are still issues with foreign keys getting disabled, so enable them
in the periodic database consistency check.
A more permanent solution is likely to drop GORM and switch to something
else that gives us an on-connect-callback, which can then be used to
turn on foreign key constraints for every connection made.
This introduces the concept of 'event listener', which is now used by
the farm status service to respond to events on the event bus.
This makes it possible to reduce the regular poll period from 5 to 30
seconds. That's now only necessary as backup, just in case events are
missed or otherwise things change without the event bus logic noticing.
SocketIO has 'rooms' and 'event types'. The 'event type' is set via
reflection of the OpenAPI type of the event payload. This has to be set
up in a mapping, though, and if that mapping is incomplete, an error will
now be logged.
Send an event to the event bus whenever the farm status changes. The event
contains a farm status report (like `{status: "active"}`), and is sent to
the `/status` topic.
Note that at this moment the status is only polled every X seconds, and
thus may lag behind other events.
Add a new API operation to get the overall farm status. This is based on
the jobs and workers, and their status.
The statuses are:
- `active`: Actively working on jobs.
- `idle`: Farm could be active, but has no work to do.
- `waiting`: Work has been queued, but all workers are asleep.
- `asleep`: Farm is idle, and all workers are asleep.
- `inoperative`: Cannot work: no workers, or all are offline/error.
- `starting`: Farm is starting up.
- `unknown`: Unexpected configuration of worker and job statuses.
Only include the `blendfile` parameter if it is not empty. This makes it
possible to pass a Python script that loads/constructs the blend file,
instead of loading one directly.
When a more complex list of frames is to be rendered (like `1, 4, 5, 10,
15`), simplify the video filename to `{first}-{last}`.
Before: `somename-1, 4, 5, 10, 15.mp4`
Now: `somename-1-15.mp4`
The worker-written config files would all refer to
`flamenco-worker-example.yaml`, even though this file doesn't even
exist. Instead, the configuration file will refer to the appropriate
documentation on the website, and the credentials file will explain what
happens when you delete it.
The credentials are otherwise intentionally left undocumented, as their
contents are not to be manually edited. The only thing to do with that
file is delete it so that the Worker re-registers itself at startup.
Instead of passing an arbitrary string ("Configuration" or "Credentials"),
use an explicit type for this. This will make it possible to have the
config-writing functions behave slightly differently depending on which
configuration type is being written.
No functional changes.
Send events on Manager startup & shutdown. To make this possible, events
sent to MQTT are now queued up until an MQTT server can be reached.
Otherwise the startup event would be sent before the MQTT connection was
established.
Fix SocketIO subscriptions so that the client also subscribes to
job-specific last-rendered images whenever subscribing to job-specific
events. These are sent to another event topic, and thus need some extra
care. Before the introduction of the generic event bus, both message types
were sent to the same topic, but that's not supported by MQTT, and so things
had to change.
Introduce an "event bus"-like system. It's more like a fan-out
broadcaster for certain events. Instead of directly sending events to
SocketIO, they are now sent to the broker, which in turn sends it to any
registered "forwarder". Currently there is ony one forwarder, for
SocketIO.
This opens the door for a proper MQTT client that sends the same events
to an MQTT server.
Prevent logging an error in the persistence layer when an unknown worker
is requested.
This reduces the noise & confusion when the web interface is showing the
details of a worker, but the worker gets removed by someone else. Or when
the Manager doesn't know about a Worker and it's trying to connect.
See #104282.
Create a dedicated package `.../pkg/website` to contain constants for the
URLs of documentation, bug reporting, etc. That way it's easier to see
which parts of the website are being referred to from the Flamenco
binaries, and updates can happen in a central spot.
No functional changes.
Deleting jobs from the database can still sometimes cause consistency
errors, as if foreign key constraints aren't enabled. This check is there
to try and get a grip on things.
When an API request comes in to delete a job, not only log the job's UUID,
but also include its database ID. This can help in figuring out database
issues, as when the job is deleted, it's unknown what UUID it had. Database
relations use the ID, and not the UUID.
Just as a safety measure, before deleting a job, check that foreign key
constraints are enabled. These are optional in SQLite, and the deletion
function assumes that they are on.
Log any CLI command that's run in a way that can be easily copy-pasted
from the task log. This can help a lot in determining whether an issue
is caused by Flamenco or by the CLI program itself.
Rename the file containing the command executor from `command_exe.go` to
`cmd_executor.go`), to distinguish it from the implementation of the
`exec` command (`command_exec.go`).
No functional changes.
Implement the API function to mass-mark jobs for deletion, based on
their 'updated_at' timestamp.
Note that the `last_updated_max` parameter is rounded up to entire
seconds. This may mark more jobs for deletion than you expect, if their
`updated_at` timestamps differ by less than a second.
GORM Automigration created a separate `job_storage_infos` table (because
we used it wrong, to be fair), which is actually only used as an
embedded struct in the `jobs` table. This means this table itself can be
dropped.
Replace GORM's auto-migration with Goose. The latter uses hand-written
SQL queries to apply database schema changes, which is safer and easier to
understand than what GORM is doing.
Now the Worker and the Manager share the same database initialisation
code (enabling foreign key constraints + write-ahead logging).
The foreign key constraints were already enabled before, but now it's done
with (a copy of) the same code as the Manager.