Fix an issue where a timed-out task would cause a panic, as it wasn't
fetching its Job UUID.
I see this as working around a limitation of GORM, which should get
replaced with sqlc soon-ish anyway.
When expanding the `{shared}` variable in a path like
`{shared}\shot\file.blend`, all slashes will be normalised to the target
platform, so for example result in:
- Windows: `Y:\shared\flamenco\shot\file.blend`
- Linux : `/shared/flamenco/shot/file.blend`
Due to this normalisation, the same paths will be served to these
platforms when the path was `{shared}/shot/file.blend`.
Remove the introductionary comments from `query_jobs.sql` and
`query_workers.sql`. Sqlc got confused by this, and placed them in the
wrong (well, not-intended-by-me) place in the generated Go code.
No functional changes.
Convert 'last-rendered' to sqlc. The query is slightly suboptimal.
There's a bug in sqlc: the `ON CONFLICT DO UPDATE` clause is generated
incorrectly. See https://github.com/sqlc-dev/sqlc/issues/3334 for more
info.
Turn `workers.lazy_status_request` and `workers.can_restart` into a
`boolean`. They were `smallint` before.
Having these explicitly modeled as `boolean` will make sqlc generate the
right type for them.
No functional changes.
The context passed to the database layer will auto-close when the HTTP
client disconnects. This will cancel any running query, which is the
expected behaviour. Now this no longer results in an error being logged
in the database layer. Instead, a message is logged at debug level.
The API layer is also adjusted to silence logging of `context.Canceled`
for certain operations, most notably getting all jobs and getting all
tasks of a job. These calls occur when the webapp reconnects after a
restart of the Manager. That may trigger a refresh of the page, which
immediately aborts any pending API calls. This is normal and should not
cause errors to be logged.
Add a GORM hook that sets `task.JobUUID` and `.WorkerUUID`. These were
only set by the sqlc code; this change ensures that they are now always
set, so that the caller doesn't have to worry about which function is
already ported to sqlc and which one is still GORM.
Wrap the SQLite error "interrupted (9)". That error is (as far as I
could figure out) caused by the context being closed. Unfortunately
there is no wrapping of the underlying context error, so it's not
possible to determine whether it was due to a 'deadline exceeded' error
or another cancellation cause (like upstream HTTP connection closing).
Primarily this makes a rather unreliable unit test properly reliable.
The code under test could return either `context.DeadlineExceeded` or
the "interrupted (9)" error (GORM + SQLite doesn't reliably chose one or
the other), and now this is cleanly tested for.
Job deletions are placed in an in-memory queue in batches of 100 jobs.
Between batches the Manager's job deleter would idle for 1 minute. Now,
once the in-memory queue has been emptied, the job deleter will wait
only 100ms before checking the database again.
This 100ms might not be necessary either, but I think it's nice to give
the Manager a bit of a breather before diving into another batch of
deletions.
Speed up the deletion of multiple jobs by skipping the database integrity
check. It is now clear what was causing the integrity issues (disabled
foreign key constraints), and this is now checked for before deleting
anything. This reduces the deletion time from ~500ms per job to ~150ms
(on my computer, with my database, of course).
Replace the use of the `t *testing.T` parameter with just plain `panic()`
when test setup fails. This makes it easier to call the same functions
from other situations, like benchmark functions.
No functional changes to Flamenco itself.
Instead of closing the sqlite database connection, tell GORM to close the
connection. Only that properly closes the DB, so that testing with a file
on disk doesn't fail when trying to delete that file.
No functional changes to the Manager itself.
Tweak the logging a little bit so it's less noisy, properly warns when the
Shaman checkout dir cannot be removed, and optimise the database query
a bit (by just fetching the one field that's needed, instead of the entire
job).
Deletion still works the same.
The task state machine expects that `task.Job` is set correctly. Since
SQLC does not automatically fill this field (and rightfully so), I've added
a bit of Go code that fetches the job in a separate query.
A TODO is added as a reminder that it would be better for the task state
machine itself to fetch the job when needed.
This is a bit more work than other queries, as it also breaks apart the
fetching of the job and the worker into separate ones. In other words,
internally the persistence layer API changes.
Regenerate the Go mock implementation after the removal of the SaveTask
function from the mocked interface.
See 097d5abb7c13e6eff1facea12f89f24c144194c0
Add a few more unit tests for the persistence layer. The goal is to have
100% coverage of the happy flow, to aid in conversion from GORM to sqlc.
No functional changes.
Add a function `shellSplit(string)` to the global namespace of job
compiler scripts. It splits a string into an array of strings using
shell/CLI semantics.
For example: `shellSplit("--python-expr 'print(1 + 1)'")` will return
`["--python-expr", "print(1 + 1)"]`.
As a safety measure, refuse to delete Workers from the Manager's database
when foreign key constraints are disabled.
In the long term, the underlying problem should be solved. This is a stop-
gap measure to ensure database consistency.
Before deleting a Worker Tag, check that foreign key constraints are
active for the current database connection.
Sometimes GORM decides to create a new database connection by itself,
without telling us, and then foreign key constraints are not active on
it. This commit is a workaround to avoid database corruption.
Move some of the Worker Tags test code into a function of its own, to have
a clearer separation between 'the test' and 'what needs to happen to do
this part of the test'.
Also it'll make an upcoming change easier to implement.
No functional changes.
Back in the days when I wrote the code, I didn't know about the
`require` package yet. Using `require.NoError()` makes the test code
more straight-forward.
No functional changes, except that when tests fail, they now fail
without panicking.
Back in the days when I wrote the code, I didn't know about the
`require` package yet. Using `require.NoError()` makes the test code
more straight-forward.
No functional changes, except that when tests fail, they now fail
without panicking.
Set the default MQTT topic prefix to 'flamenco'. It can still be overridden
by the config in the YAML file, but it's nice to have a sensible default
when people don't configure this.
Fix the database migration that adds `NOT NULL` clauses. It used
`INSERT INTO temp_x SELECT * from x;`, and the `*` returns the fields in
the order they are defined on the table. Since this might be different from
the order that the `INSERT INTO temp_x` expects, strange problems can
happen where columns get swapped (or constraints can fail on columns that
they should not fail for, because they got fed data from a different
column).
This makes it easier to later also create `query_workesr.sql`,
`query_meta.sql` etc. so that the sqlc-generated code can follow the
same subdivision as the persistence service code itself.
No functional changes.