Commit Graph

783 Commits

Author SHA1 Message Date
59655ea770 Manager: fix error in sleep scheduler when shutting down
When the Manager was shutting down while the sleep scheduler was running, it
could cause a null pointer dereference. This is now doubly solved:

- `worker.Identifier()` is now nil-safe, as in, `worker` can be `nil` and
  it will still return a sensible string.
- failure to apply the sleep schedule due to the context closing is not
  logged as error any more.
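
A nil-safe method of this kind can be sketched in Go as below. The `Worker` struct and the placeholder string are hypothetical stand-ins for illustration, not the Manager's actual type or wording.

```go
package main

import "fmt"

// Worker is a hypothetical, trimmed-down stand-in for the Manager's worker type.
type Worker struct {
	Name string
	UUID string
}

// Identifier returns a human-readable description of the worker.
// It is safe to call on a nil *Worker, because it never dereferences
// the receiver before checking it.
func (w *Worker) Identifier() string {
	if w == nil {
		return "-nil worker-"
	}
	return fmt.Sprintf("%s (%s)", w.Name, w.UUID)
}

func main() {
	var missing *Worker
	fmt.Println(missing.Identifier())

	known := &Worker{Name: "render-01", UUID: "abc"}
	fmt.Println(known.Identifier())
}
```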
2022-09-27 12:27:18 +02:00
759a94e49b Blender finder: also handle exec.ErrNotFound as "expected"
Blender not being found can be reported via various errors (this should be
reworked in the 'blender finder API' at some point). `exec.ErrNotFound` is
returned when Blender cannot be found on `$PATH`, which is something that's
absolutely fine. This is now logged less dramatically.
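
As a rough sketch of the idea (not the actual Blender finder code), `exec.ErrNotFound` can be detected with `errors.Is()` and treated as an expected outcome. The helper names `findOnPath` and `isExpectedNotFound` are made up for this example.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// findOnPath looks up an executable on $PATH and returns the lookup error, if any.
func findOnPath(name string) error {
	_, err := exec.LookPath(name)
	return err
}

// isExpectedNotFound reports whether err merely means "not found on $PATH".
func isExpectedNotFound(err error) bool {
	return errors.Is(err, exec.ErrNotFound)
}

func main() {
	err := findOnPath("blender")
	switch {
	case err == nil:
		fmt.Println("Blender found on $PATH")
	case isExpectedNotFound(err):
		fmt.Println("Blender not found on $PATH, which is fine")
	default:
		fmt.Println("unexpected problem while looking for Blender:", err)
	}
}
```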
2022-09-22 12:39:40 +02:00
161a7f7cb3 Less dramatic logging when Blender cannot be found
Avoid the word "error" in logging when Blender cannot be found. Typically
these are warnings, and having the word "error" there makes people think
otherwise.
2022-09-22 12:37:46 +02:00
b3b46f89b2 Fix T100757: error stating OpenEXR format is unknown format
Fix T100757 by reducing the log level to "info" when Blender writes output
to a file format the Worker cannot handle. Such cases are expected, and
now no longer result in an error message.
2022-09-12 12:40:06 +02:00
1ffd56939a Manager: match Windows paths in two-way variables also with slashes
When doing two-way variable replacement, if the variable has a Windows
path (i.e. backslashes) also do a match for the value with forward slashes.

In other words, if a path `Y:/shared/...` comes in, and the variable value
is (correctly) `Y:\shared\...`, it will be seen as a match.
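
A minimal sketch of such a match, assuming a simple prefix comparison; the function name `matchesVariableValue` is hypothetical and the real Manager code may work differently:

```go
package main

import (
	"fmt"
	"strings"
)

// matchesVariableValue reports whether path starts with the variable value.
// If the value contains backslashes, the same value written with forward
// slashes is accepted as well.
func matchesVariableValue(path, value string) bool {
	if strings.HasPrefix(path, value) {
		return true
	}
	if !strings.Contains(value, `\`) {
		return false
	}
	slashed := strings.ReplaceAll(value, `\`, "/")
	return strings.HasPrefix(path, slashed)
}

func main() {
	fmt.Println(matchesVariableValue("Y:/shared/flamenco/scene.blend", `Y:\shared`))
	fmt.Println(matchesVariableValue("Z:/other/scene.blend", `Y:\shared`))
}
```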
2022-09-01 15:27:31 +02:00
8368feebac Fix unit test
The recent change in the error message caused a test to fail; this is now
fixed. No functional changes.
2022-09-01 15:17:04 +02:00
46792ee164 Clarify "job etag mismatch" situation
When a submitted job is refused because of a mismatched etag, there is
now a more explanatory error logged on the Manager. The website also has
an entry in the FAQ for this, as I expect more people to run into this
issue when they upgrade Flamenco.
2022-09-01 14:46:30 +02:00
780a9f9ef6 Refactor some tests to use require. instead of assert. + fail
The `require.XXX` functions are exactly the same as `assert.XXX`
functions + directly failing the test, so this refactor simplifies the
code quite a bit. Can be done in more areas than this.

No functional changes.
2022-08-31 17:28:19 +02:00
0afde53209 Simple Blender Render: no longer render to intermediate directory
Simple Blender Render now no longer renders to an intermediate directory.
This not only simplifies the script, but it also opens the door for
selective re-running of individual tasks.

In the old situation, where the intermediate directory was renamed to
the desired name in the last task, rerunning tasks would fail because the
directory they expect to exist no longer exists. This is now resolved.
2022-08-31 17:24:31 +02:00
f065cda830 Cleanup: remove some debug prints from Simple Blender Render script 2022-08-31 16:25:52 +02:00
2e1c0b83bf Simple Blender Render: refuse to render videos
The original idea behind this job type was that it would work equally
well for videos as for images, but that was never really well tested.
It's currently broken, so this commit removes video support altogether.
2022-08-31 16:25:23 +02:00
eb89984db8 Simple Blender Render: remove blender_cmd setting
Remove the `blender_cmd` setting, and just hard-code it to `{blender}`.
The Blender add-on was already passing this string, and it's very unlikely
that people are already writing custom add-ons to pass something different.
It provided flexibility that was untested, so it's better to simplify
things.
2022-08-31 16:24:34 +02:00
25dd7b214b Manager: remove superfluous "error compiling job: " prefix from message
The wrapped error already mentioned it was about job compilation.
2022-08-31 16:23:10 +02:00
6e401f882f Worker: fix typo 'FFmepg' -> 'FFmpeg'
Just a logging message fix, no functional changes.
2022-08-31 15:34:42 +02:00
9da75eef04 Worker: fix issue running FFmpeg
The `exeArg` command parameter was incorrectly always expected. It's now
optional, as it should be.
2022-08-31 15:00:46 +02:00
31cf0a4ecc Implement getSharedStorage operation & use it in the add-on
Implement the `getSharedStorage` operation in the Manager, and use it in
the add-on to get the shared storage location in a way that makes sense
for the platform of the user.

Manifest task: T100196
2022-08-31 11:44:37 +02:00
31769bcdf2 Manager: always set config.currentGOOS
This variable is used in tests to mock the current OS, but wasn't set
during normal operation of the Manager. This caused issues with the
two-way variable system.
2022-08-31 11:43:28 +02:00
0a1e1efc41 OAPI: regenerate code 2022-08-31 11:42:46 +02:00
2eae682b9a Manager: actually return the short version in the GetVersion operation 2022-08-31 08:58:59 +02:00
ab14c97b2e Manager: fix tests on Windows
Fix some tests that were failing because some parts of Flamenco now use
native path separators instead of always-forward ones.
2022-08-30 15:44:14 +02:00
e5a20425c4 Separate variables for Blender executable and its arguments.
Split "executable" from "its arguments" in blender & ffmpeg commands.

Use `{blenderArgs}` variable to hold the default Blender arguments,
instead of having both the executable and its arguments in `{blender}`.

The reason for this is to support backslashes in the Blender executable
path. These were interpreted as escape characters by the shell lexer.
The shell lexer based splitting is now only performed on the default
arguments, with the result that `C:\Program Files\Blender
Foundation\3.3\blender.exe` is now a valid value for `{blender}`.

This does mean that this is a backward-incompatible change: it requires
setting up Flamenco Manager again, and older jobs can no longer be
rerun.

It is recommended to remove `flamenco-manager.yaml`, restart Flamenco
Manager, and reconfigure via the setup assistant.
2022-08-30 14:58:16 +02:00
87684a0d92 Worker: change "running command" to "running Flamenco command" in log
There are Flamenco "commands" and CLI "commands", and it's nice to be
explicit about which is which. I'm sure this is needed in some other
areas as well.
2022-08-30 10:34:40 +02:00
afdbbcc1d8 Cleanup: explain a bit more in a comment 2022-08-30 10:34:05 +02:00
84cff6919a Worker: also log job UUID when running a task
Having both the job and task UUIDs in the log output helps when debugging.
2022-08-30 10:18:32 +02:00
c504e68d8e Manager: store the jobs implicit variable in platform-native notation
Don't change backslashes to forward slashes on Windows. Trying to use
forward slashes everywhere was a mistake, and this is one of the steps to
make it right.
2022-08-29 17:51:20 +02:00
20395e0e26 Manager: always start the variable lookup table with a fresh map
If the loaded config doesn't define the default variables, the latter
should not be found in the lookup table any more; this is now fixed.
2022-08-29 17:44:47 +02:00
4a201d47b4 Cleanup: add unit test for parsing backslashes in variable values
Backslashes can be included in two ways, as-is (which works fine) and
between double quotes (in which case they need escaping). This test checks
for both.
2022-08-29 17:28:40 +02:00
0c91fe93d0 Manager: only do pathsep localisation on two-way variables
By accident the Manager was performing slash localisation on all
command parameters, causing some math expressions for FFmpeg to fail.
2022-08-25 15:02:56 +02:00
6b4b205c1c Manager: allow backslashes in variables
Windows machines should be able to simply use backslashes.
2022-08-25 13:59:02 +02:00
22aa041ec1 Allow relative render output root paths
Add a new `abspath(path)` function to the add-on, for use in job type
settings. With this, the "simple blender render" job can support relative
paths for the "render output root" setting, and still have an absolute
final "render output path".
2022-08-25 13:14:48 +02:00
63c60a5b15 Two-way variable replacement: change path separators to target platform
Two-way variable replacement now also changes the path separators. Since
the two-way replacement is made for paths, it makes sense to also clean up
the path for the target platform.
2022-08-25 12:19:30 +02:00
1355ec5e1d Worker: Change how the worker shuts down
Instead of sending the current process an interrupt signal, use a dedicated
channel to signal the wish to shut down. The main function responds to that
channel closing by performing the shutdown.

This solves an issue where the Worker would not cleanly shut down on
Windows when `offline` state was requested by the Manager.
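
The pattern can be sketched as follows. The `shutdownRequester` type is hypothetical, and this simplified version assumes shutdown is requested at most once.

```go
package main

import "fmt"

// shutdownRequester is a hypothetical helper that signals the wish to shut
// down by closing a channel, instead of sending an OS interrupt signal.
type shutdownRequester struct {
	done chan struct{}
}

func newShutdownRequester() *shutdownRequester {
	return &shutdownRequester{done: make(chan struct{})}
}

// RequestShutdown closes the channel. This sketch assumes a single caller.
func (s *shutdownRequester) RequestShutdown() {
	close(s.done)
}

// Done returns a channel that is closed once shutdown has been requested.
func (s *shutdownRequester) Done() <-chan struct{} {
	return s.done
}

func main() {
	shutdown := newShutdownRequester()

	// Stand-in for the code path that handles a requested `offline` state.
	go shutdown.RequestShutdown()

	// The main function responds to the channel closing.
	<-shutdown.Done()
	fmt.Println("shutting down cleanly")
}
```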
2022-08-12 11:15:19 -07:00
2a345a3d2c API for deleting workers
Workers can now be soft-deleted. Tasks assigned to the worker will remain
associated with that Worker. Active tasks will be re-queued so other
workers can pick them up.
2022-08-11 16:59:53 -07:00
458c33573e OAPI: regenerate code 2022-08-11 16:58:05 -07:00
cbafe0ff34 Manager: when finding Blender, be less dramatic when it can't be found
It's fine when Blender is not available on `$PATH`, so only log that at
debug level.
2022-08-02 13:36:25 +02:00
cbc6bfaf02 Manager: also recognise exec.ErrNotFound as a "blender not found" error 2022-08-02 13:36:25 +02:00
11e5363d24 Manager: reject removal of empty list of blocklist entries
A request to remove an empty list of blocklist entries now results in a
400 Bad Request.
2022-08-01 18:55:33 +02:00
3b978ceda0 Cleanup: manager, name variable correctly
It was an old name from copy-pasted code, now it reflects the actual code.

No functional changes.
2022-08-01 18:55:08 +02:00
1469345f3a Manager: sort blocklist by worker name 2022-08-01 18:54:28 +02:00
f3aab8611c Manager: include worker name when returning blocklist 2022-08-01 18:03:17 +02:00
fef3de28e1 Fix unit test
Fix unit test broken in rF449c83b9.

No functional changes.
2022-08-01 16:02:08 +02:00
642ef36778 Blender finder: fix compatibility with Windows Home
For some reason, calling `AssocQueryStringW` on Windows Home returns error
code 122, "The data area passed to a system call is too small", even when
the data area is large enough. Furthermore, the API actually describes that
in such cases `S_FALSE` is supposed to be returned, with `*pcchOut` set to
the required size. Because of this apparent violation of the documentation,
and because it just works, Flamenco now ignores this particular error and
just returns the obtained string.
2022-08-01 16:00:49 +02:00
350f4f60cb Worker: convert database interface to GORM
Convert the database interface from the stdlib `database/sql` package to
the GORM object relational mapper.

GORM is also used by the Manager, and thus with this change both Worker
and Manager have a uniform way of accessing their databases.
2022-08-01 14:29:14 +02:00
449c83b94a Manager: broadcast worker update after assigning task
The Manager now broadcasts a worker update to SocketIO clients when a
worker gets a new task assigned. This ensures the "current task" shown in
the worker details view is up to date.
2022-08-01 14:29:08 +02:00
a6c935a634 Fix T99421: Introducing an etag for job types
The etag prevents job submissions with old settings, when the job
compiler script has been edited. The etag is the SHA1 hash of the
`JOB_TYPE` dictionary (as defined by the JavaScript file). The hash is
computed in a way that's independent of the exact formatting in the
JavaScript file. Also the actual JS code itself is irrelevant, just the
`JOB_TYPE` dictionary is used.
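
One way to get a formatting-independent hash, sketched under the assumption that the `JOB_TYPE` dictionary is available as JSON: parse it, re-encode it canonically, and hash the result. The function `etagFromJSON` is illustrative only and not the Manager's implementation.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// etagFromJSON parses a JSON object and returns the SHA1 hash (hex-encoded)
// of its canonical re-encoding.
func etagFromJSON(src string) (string, error) {
	var parsed map[string]interface{}
	if err := json.Unmarshal([]byte(src), &parsed); err != nil {
		return "", err
	}
	// json.Marshal sorts map keys and drops insignificant whitespace, so the
	// layout and key order of src do not influence the hash.
	canonical, err := json.Marshal(parsed)
	if err != nil {
		return "", err
	}
	sum := sha1.Sum(canonical)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	compact := `{"label": "Simple Blender Render", "settings": [1, 2]}`
	spread := "{\n  \"settings\": [1,2],\n  \"label\": \"Simple Blender Render\"\n}"

	a, _ := etagFromJSON(compact)
	b, _ := etagFromJSON(spread)
	fmt.Println(a == b)
}
```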
2022-07-29 21:13:37 +02:00
48ca73f550 Refactor, manager: rename compilerForJobType to compilerVMForJobType
The function returns a `*VM`, which contains a compiler, and allows you
to run a compiler, but is not a compiler itself.
2022-07-29 14:26:54 +02:00
370f935f65 Simple-blender-render job: use absolute path for render_output_path
Blender cannot be told to only allow absolute paths for an RNA property
(of type `string`, subtype `dir_path`), so as a workaround the final
`render_output_path` is now using `bpy.path.abspath()` to make the path
absolute.

This has the advantage that the render output path can be defined by artists
as a blendfile-relative path, and that it'll be resolved when submitting
the blend file.
2022-07-29 11:03:14 +02:00
be1ddaa4eb Manager test: reduce timeout to practical value
The timeout was increased to aid debugging, but shouldn't have been
committed.
2022-07-29 09:59:54 +02:00
8c8855554e Manager: remove --factory-startup from default Blender arguments
Remove `--factory-startup` from the default Blender arguments. This makes
it simpler to configure each Worker to use its own GPU, without having to
inject Python code into the arguments.

Users can always add this when they need it, but I think it's friendlier to
have Blender behave the same when they manually run it and when used by
Flamenco Worker.
2022-07-29 09:54:29 +02:00
377583c9e2 Cleanup: worker, move FFmpeg-finding at startup into its own file
Just a move of code from `main.go` to a dedicated file in the same package.

No functional changes
2022-07-29 09:47:30 +02:00
d4dfa2d071 Add release cycle to versioning of Flamenco
Include `RELEASE_CYCLE` in the Makefile. This is mentioned at startup of
Manager and Worker, and reflects in the software version they report.

If `RELEASE_CYCLE == "release"`, Manager and Worker report their version
as `ApplicationVersion`. If it's any other string, the Git hash will get
appended.
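
The rule reads roughly like the sketch below. The function name and the `-` separator between version and hash are assumptions made for this example; only the `release` comparison comes from the description above.

```go
package main

import "fmt"

// reportedVersion returns the version string as reported by Manager and Worker.
// The "-" separator is an assumption of this sketch.
func reportedVersion(applicationVersion, releaseCycle, gitHash string) string {
	if releaseCycle == "release" {
		return applicationVersion
	}
	return applicationVersion + "-" + gitHash
}

func main() {
	fmt.Println(reportedVersion("3.0", "release", "d4dfa2d0"))
	fmt.Println(reportedVersion("3.0", "beta", "d4dfa2d0"))
}
```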
2022-07-28 15:10:27 +02:00
8c86d4c1a9 Worker: Wait for subprocess even when it failed
The Worker now always waits for subprocesses. When faced with multiple
errors (like an I/O error while reading from stdout and an error status
returned by the process), it returns the most important one (in this case
the exit status of the process).

Subprocesses need to be waited for, even when they crashed, otherwise
they will linger around as "defunct" processes. This caused
out-of-memory errors, because several defunct Blenders were eating up
the memory.
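
A simplified sketch of "always wait, then pick the most important error", using only the standard `os/exec` package. The function names are hypothetical, and the Worker's real CLI runner does more (logging, line-by-line output handling).

```go
package main

import (
	"fmt"
	"io"
	"os/exec"
)

// mostImportantError prefers the exit status of the process over an I/O error.
func mostImportantError(waitErr, readErr error) error {
	if waitErr != nil {
		return waitErr
	}
	return readErr
}

// runAndAlwaysWait starts cmd, drains its stdout, and always calls Wait()
// once the process has started, so that it is never left behind as a
// defunct process.
func runAndAlwaysWait(cmd *exec.Cmd) error {
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return err
	}
	if err := cmd.Start(); err != nil {
		return err // The process never started, so there is nothing to wait for.
	}
	_, readErr := io.Copy(io.Discard, stdout)
	// Reap the child even if reading its output failed.
	waitErr := cmd.Wait()
	return mostImportantError(waitErr, readErr)
}

// runCommandLine is a small convenience wrapper around runAndAlwaysWait.
func runCommandLine(name string, args ...string) error {
	return runAndAlwaysWait(exec.Command(name, args...))
}

func main() {
	fmt.Println("result:", runCommandLine("blender", "--version"))
}
```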
2022-07-28 14:36:01 +02:00
c79fe55068 Worker: Refactor the running of subprocesses
Blender and FFmpeg were run in the same way, using copy-pasted code. This
is now abstracted away into the CLI runner, which in turn is moved into
its own subpackage.

No functional changes.
2022-07-28 14:34:33 +02:00
c42665322b Cleanup: add a comment
Just a comment that explains why an error is ignored.

No functional changes.
2022-07-28 14:28:02 +02:00
b26374d480 Manager: when worker goes to sleep, log in task log which worker
When a worker's tasks get requeued because it goes to sleep, the task log
will now mention the worker identification (name + UUID). This aids in
figuring out what happened to tasks.
2022-07-28 14:27:44 +02:00
4cb0a6fb14 Blender Finder: allow passing the directory instead of the executable
Blender Finder now understands that directory paths should be suffixed
with `blender` (Linux, macOS) or `blender.exe` (Windows).

Giving the Setup Assistant a path like `C:\Program files\Blender
Foundation\Blender 3.2` will now just work. This is considerably simpler
for many users, as copy-pasting a directory from a file explorer is
simpler than obtaining/typing the path to the executable.
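
The directory handling could look something like this sketch. `resolveBlenderPath` and `blenderExeName` are hypothetical names; the actual Blender Finder may perform additional checks.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
)

// blenderExeName returns the executable name for the given operating system.
func blenderExeName(goos string) string {
	if goos == "windows" {
		return "blender.exe"
	}
	return "blender"
}

// resolveBlenderPath appends the executable name when path is an existing
// directory. Any other path is returned unchanged.
func resolveBlenderPath(path, goos string) string {
	info, err := os.Stat(path)
	if err != nil || !info.IsDir() {
		return path
	}
	return filepath.Join(path, blenderExeName(goos))
}

func main() {
	// The temp directory is only used here because it is known to exist.
	fmt.Println(resolveBlenderPath(os.TempDir(), runtime.GOOS))
}
```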
2022-07-26 18:18:02 +02:00
1e3a2b5480 Blender Finder: better reporting on timeout errors
Instead of just `signal: killed`, report that it actually took too long.

2022-07-26 17:40:28 +02:00
fa79b81d5b Blender Finder: support multi-line output of blender --version
When compiled without OpenColorIO, Blender will first complain "Color
management: Error could not find role data role." before showing the
actual version number. This is now handled by looking for a "Blender "
prefix instead of just returning the first line of output.

As a side effect, when no such line can be found, we know
it's not Blender, and thus an error can be returned (instead of the
version of whatever binary was being run).
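
The prefix-based lookup can be sketched like this. The function name, the error value, and the sample version number are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errNotBlender = errors.New("output does not look like it came from Blender")

// parseBlenderVersion returns the first line of output that starts with
// "Blender ", or an error when there is no such line.
func parseBlenderVersion(output string) (string, error) {
	for _, line := range strings.Split(output, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "Blender ") {
			return line, nil
		}
	}
	return "", errNotBlender
}

func main() {
	output := "Color management: Error could not find role data role.\nBlender 3.2.1\n"
	version, err := parseBlenderVersion(output)
	fmt.Println(version, err)
}
```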
2022-07-26 17:25:50 +02:00
cb6a3a5a88 Manager: test error with errors.Is() instead of ==
It's just a better way to test errors.
2022-07-26 17:25:50 +02:00
859a2e6eda Manager: better logging when trying to find Blender 2022-07-26 17:25:50 +02:00
3f6dd9be8b Blender Finder: add timeout to blender --version invocation
Make sure that the command execution doesn't hang indefinitely.
2022-07-26 17:25:50 +02:00
f71bfdfafe Manager: fix unit test
Fix the unit test I broke in rF736ca103c3d7f37557ed541ca70117bc95bef932
2022-07-26 17:25:50 +02:00
736ca103c3 Manager: show current/last task in worker details
The Task details component already linked to the Worker it was assigned
to last, and now the Worker links back to the task.

There's only one task shown in the Worker details. If the Worker is
actively working on a task, that one's shown. Otherwise it's the
last-updated task that was assigned to the worker.
2022-07-26 10:36:02 +02:00
Francesco Siddi
9948fdab71 Rename First Time Wizard to Setup Assistant
This commit does not introduce functional changes, besides renaming
every mention of 'wizard' with 'setup assistant'. In order to run the
manager setup assistant use:

./flamenco-manager -setup-assistant

The change was introduced to favor more neutral and descriptive wording
for this functionality. Thanks to Sybren for helping to get this done!
2022-07-25 17:17:04 +02:00
Francesco Siddi
a2bd8a5615 OAPI: generate code 2022-07-25 17:16:53 +02:00
c1a728dc2f Version updates via Makefile
Flamenco now no longer uses the Git tags + hash for the application
version, but an explicit `VERSION` variable in the `Makefile`.

After changing the `VERSION` variable in the `Makefile`, run
`make update-version`.

Not every part of Flamenco looks at this variable, though. Most
importantly: the Blender add-on needs special handling, because that
doesn't just take a version string but a tuple of integers. Running
`make update-version` updates the add-on's `bl_info` dict with the new
version. If the version has any `-blabla` suffix (like `3.0-beta0`) it
will also set the `warning` field to explain that it's not a stable
release.
2022-07-25 16:08:07 +02:00
ab8ecc24cc Cleanup: Add missing license specifiers
Add license specifiers to Go files that were missing them:

```
// SPDX-License-Identifier: GPL-3.0-or-later
```

No functional changes.
2022-07-25 16:08:07 +02:00
0e6d61dd84 Remove the {ffmpeg} variable
Remove the `{ffmpeg}` variable from the default configuration, and its use
from the job compiler scripts. Now that the Worker can find its bundled
FFmpeg, it's no longer needed to configure its location on the Manager.
2022-07-22 16:37:14 +02:00
09946c0894 Worker: use bundled FFmpeg if available
The Worker now tries the following paths, relative to the `flamenco-worker`
executable, in the order listed, to find FFmpeg. If none of them exist,
`$PATH` is searched for FFmpeg.

- `tools/ffmpeg-$GOOS-$GOARCH`
- `tools/ffmpeg-$GOOS`
- `tools/ffmpeg`

On Windows these paths will have a `.exe` suffix appended. `$GOOS` is the
operating system, like "linux", "darwin", "windows", etc. `$GOARCH` is the
architecture, like "amd64", "386", etc.
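
The candidate list can be constructed roughly as follows. This is a sketch with hypothetical function names; it only builds the paths and does not check whether they exist.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// ffmpegNames returns the candidate file names, most specific first.
func ffmpegNames(goos, goarch string) []string {
	names := []string{
		"ffmpeg-" + goos + "-" + goarch,
		"ffmpeg-" + goos,
		"ffmpeg",
	}
	if goos == "windows" {
		for i := range names {
			names[i] += ".exe"
		}
	}
	return names
}

// candidatePaths returns the paths to try, relative to the directory that
// contains the flamenco-worker executable.
func candidatePaths(exeDir, goos, goarch string) []string {
	var paths []string
	for _, name := range ffmpegNames(goos, goarch) {
		paths = append(paths, filepath.Join(exeDir, "tools", name))
	}
	return paths
}

func main() {
	for _, p := range candidatePaths(".", runtime.GOOS, runtime.GOARCH) {
		fmt.Println(p)
	}
}
```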
2022-07-22 16:37:14 +02:00
a5940a24f0 Worker: load flamenco-worker.yaml from current directory
By accident I made the worker load `flamenco-worker.yaml` from the "local
files" directory (~/.local/share/flamenco on Linux) instead of the current
directory. This was incorrect, as that file is meant to contain
configuration that's shared between workers.
2022-07-22 16:37:14 +02:00
Pablo Vazquez
53598c3ee0 Manager: Rephrase wording on report for successfully writing to Shared Storage
* Replace "OK!" with "successfully"
  Remove exclamation mark since there is no need to call for attention.
  Use "successfully" as it is more descriptive in this case than OK,
  which can have other meanings.
2022-07-22 14:57:12 +02:00
Francesco Siddi
08f52993ad Setup Screen: Overall UI/UX tweaks
- Added initial description and illustration
- Swap "Check" button for fields with a debounced @input event
- Turn Blender's list into a radio selector
- Tweak wording when paths are not found
- Add microtip library for tooltips
- Make navigation steps clickable, according to the state
2022-07-22 14:57:11 +02:00
11a352968a Fix T99434: Two-way Variables
Two-way variable implementation in the job submission end-point. Where
Flamenco v2 did the variable replacement in the add-on, this has now
been moved to the Manager itself. The only thing the add-on needs to
pass is its platform, so that the right values can be recognised.

This also implements two-way replacement when tasks are handed out, such
that the `{jobs}` value gets replaced to a value suitable for the
Worker's platform as well.
2022-07-22 11:58:35 +02:00
585c886bd5 Fix Windows build errors 2022-07-21 20:59:10 +02:00
af0389efc6 Cleanup: correct function name in docstring 2022-07-21 16:29:23 +02:00
894058bc69 Cleanup: variable replacement, avoid hard-coded "workers" string
Use `config.VariableAudienceWorkers` instead.

No functional changes.
2022-07-21 16:29:05 +02:00
27602174ae Variable replacement: fix issue replacing vars in nested lists
An array-of-strings in Go can become an array-of-`interface{}` when
converted to JSON and back again. Such cases are now handled properly.
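
The effect is easy to reproduce with `encoding/json`, and a type switch can handle both forms. The `replaceInValue` function below is an illustrative sketch, not the Manager's variable replacement code, and the `/usr/bin/blender` value is just sample data.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// replaceInValue applies replace to a string, to every element of a
// []string, or recursively to every element of a []interface{}.
// Values of any other type are returned unchanged.
func replaceInValue(value interface{}, replace func(string) string) interface{} {
	switch v := value.(type) {
	case string:
		return replace(v)
	case []string:
		out := make([]string, len(v))
		for i, s := range v {
			out[i] = replace(s)
		}
		return out
	case []interface{}:
		out := make([]interface{}, len(v))
		for i, item := range v {
			out[i] = replaceInValue(item, replace)
		}
		return out
	default:
		return value
	}
}

func main() {
	original := map[string]interface{}{"args": []string{"{blender}", "-b"}}

	// After a JSON round trip the []string has become a []interface{}.
	encoded, _ := json.Marshal(original)
	var decoded map[string]interface{}
	_ = json.Unmarshal(encoded, &decoded)

	replace := func(s string) string {
		return strings.ReplaceAll(s, "{blender}", "/usr/bin/blender")
	}
	fmt.Printf("%T -> %v\n", decoded["args"], replaceInValue(decoded["args"], replace))
}
```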
2022-07-21 16:28:38 +02:00
48f081e03e Sleep Scheduler: don't overwrite error status from Worker
The Sleep Scheduler shouldn't push a Worker out of `error` status, as that
could hide problematic situations.
2022-07-21 12:49:32 +02:00
d553ca5ab9 Worker: pass input frame rate to FFmpeg when converting frames to video
FFmpeg needs the input frame rate as well, otherwise it'll default to 25
FPS, and mysteriously drop frames when rendering a 24 FPS shot.
2022-07-19 18:43:06 +02:00
de80a09223 Manager: include job UUID in "last-rendered image received" log entries
This makes it possible to collect all "last-rendered image received"
entries for a single job.
2022-07-19 18:40:22 +02:00
d929885b06 Manager: only log task status change if there is an actual change
Don't log "changes" from, say, `active` -> `active`.
2022-07-19 17:47:43 +02:00
ac3236786b Manager: add entry to task log whenever task changes status
Add a line to the task log whenever task changes status. This only applies
to directly-changed tasks, and not to mass-updates (like all tasks going
from 'completed' to 'queued' on a job requeue).
2022-07-19 17:23:13 +02:00
696b97c553 Re-queue tasks of worker after changing to non-'awake' state
When a Worker changes state from `awake` to something else, it cannot
run tasks any more. This now triggers a requeue of its active task
(should be one at most, if things are sane) so that another worker can pick
it up.
2022-07-19 15:38:36 +02:00
ecfeaec4b2 Worker: store files on Windows in Blender Foundation\Flamenco
On Windows, store files in `%LOCALAPPDATA%\Blender Foundation\Flamenco`.
Previously the `Blender Foundation` part of the path was missing.

Manifest Task: T99415
2022-07-19 12:13:34 +02:00
2f76df437b T99415: Worker: change default location for writing local files
Change the location where the Worker writes its local files so that it
follows the XDG specification (instead of writing to the current working
directory).

- Linux:   `$HOME/.local/share/flamenco`
- Windows: `C:\Users\UserName\AppData\Local\Flamenco`
- macOS:   `$HOME/Library/Application Support/Flamenco`

NOTE: The old files will not be loaded any more. This means that if
nothing is done and the new worker is run as-is, it will reregister as a
brand new worker. Move `flamenco-worker-credentials.yaml` and
`flamenco-worker.sqlite` to the new location to avoid this.
2022-07-19 12:08:41 +02:00
fa600d6fc9 Cleanup: rename mustHostname() to workerName()
The function determines the worker's name. The fact that it can use the
hostname for this isn't that relevant.
2022-07-19 12:03:08 +02:00
0a5f87bc5a Sleep Scheduler: perform first check at startup
Instead of waiting for a minute, run the first sleep scheduler iteration
at startup.
2022-07-18 19:30:38 +02:00
83467e4c60 Sleep schedule: store 'next check' timestamp in UTC
SQLite doesn't parse the timezone info, so timestamps should always be in
UTC.
2022-07-18 19:30:17 +02:00
3baac0a2d8 Manager: reduce log level when worker asks task but has wrong status
This can happen quite often and it's fine, so it's not worth a warning.
2022-07-18 19:26:49 +02:00
24f921b0c8 Manager: add more logging when worker cannot be marked as 'seen'
SQLite often errors out on this with only `interrupted (9)` as message.
This logging should at least tell us whether it's our own "background
context" timing out, or whether something else fishy is going on.
2022-07-18 19:04:15 +02:00
bfd6746f78 Manager: consult the sleep schedule on worker sign-on
If there is no status change queued for the Worker, the sleep schedule
should determine its initial status.
2022-07-18 18:25:24 +02:00
bc725ea7dc Manager: mark worker as 'seen' when calling the WorkerState operation
Fix workers timing out when they're `asleep`. When sleeping, the Worker
will call the `WorkerState` operation to see if they have to wake up, but
that didn't mark the workers as "seen". As a result, a sleeping worker
would always time out.
2022-07-18 17:56:56 +02:00
47e517a3a5 Worker: cleanly sign off after flushing buffer
When running the Worker with the `-flush` CLI argument, actually sign off
from the Manager before shutting down.
2022-07-18 16:36:45 +02:00
0697f71b62 Manager: run some operations in a background context
Run some API operations in a background context. This should prevent some
of the SQLite "interrupted" errors, as those can occur when the context
closes while a query is running.

The API operations that Workers use are now mostly running in a separate
background context, at least from the moment onward when they can run
independently of the Worker connection.
2022-07-18 16:26:06 +02:00
43e8f3f623 Manager: improve the "my own URLs" construction
Improve the "my own URLs" construction, such that:
- IPv6 link-local addresses are always skipped. They require a "zone index"
  string, typically the interface name, so something like
  `[fe80::cafe:f00d%eth0]`. This is not supported by web browsers, so the
  URLs would be of limited use. Furthermore, they require the interface
  name of the side initiating the connection, whereas this code is used to
  answer the question "how can this machine be reached as a server?"
- IPv4 addresses are sorted before IPv6 addresses. Even though I like IPv6
  a lot, IPv4 is still more familiar to people.
- Loopback addresses (::1, 127.0.0.1) are sorted last, so that the First-
  Time Wizard is most likely to use the bigger-scoped address.
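
The three rules above can be sketched with the standard `net` package. The function `orderAddresses` is hypothetical; it works on plain address strings for brevity and assumes they carry no zone index.

```go
package main

import (
	"fmt"
	"net"
	"sort"
)

func isIPv6(ip net.IP) bool { return ip.To4() == nil }

// orderAddresses drops IPv6 link-local addresses, sorts IPv4 before IPv6,
// and puts loopback addresses last.
func orderAddresses(hosts []string) []string {
	var ips []net.IP
	for _, host := range hosts {
		ip := net.ParseIP(host)
		if ip == nil {
			continue // Not a valid IP address.
		}
		if isIPv6(ip) && ip.IsLinkLocalUnicast() {
			continue // Would need a zone index, which browsers do not support.
		}
		ips = append(ips, ip)
	}

	// Lower rank sorts first: IPv4, IPv6, loopback IPv4, loopback IPv6.
	rank := func(ip net.IP) int {
		r := 0
		if ip.IsLoopback() {
			r += 2
		}
		if isIPv6(ip) {
			r++
		}
		return r
	}
	sort.SliceStable(ips, func(i, j int) bool { return rank(ips[i]) < rank(ips[j]) })

	out := make([]string, len(ips))
	for i, ip := range ips {
		out[i] = ip.String()
	}
	return out
}

func main() {
	fmt.Println(orderAddresses([]string{
		"::1", "fe80::cafe:f00d", "2001:db8::1", "127.0.0.1", "192.168.1.5",
	}))
}
```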
2022-07-18 15:36:43 +02:00
e91623557a Worker: log which URLs were tried when auto-discovery failed
When the Worker cannot find any Manager, log which URLs were tried.
2022-07-18 14:14:02 +02:00
ad57070a2d Manager: reduce log level of "loading configuration" message
Every time the web interface starts, it queries the config to see whether
it should be in first-time-wizard mode or not. This caused unnecessary
info-level logging.

In the future it would be better to load the config file just once,
instead.
2022-07-18 14:11:22 +02:00
658a3d7a85 Worker Timeout: subject all but offline/error workers to timeout checks
Workers that are in `starting`, `asleep`, or `testing` state should also
be subject to the timeout check, not just workers in `awake` state.
2022-07-18 11:30:39 +02:00
a6ca3f7bdc Sleep Scheduler: reduce check interval and log level
Reduce the check interval and the log level of "nothing to do" messages,
from "developer friendly" to "actually useful".
2022-07-17 17:31:51 +02:00
d7b164133a Sleep Scheduler implementation for the Manager
The Manager now has a sleep scheduler for Workers. The API and background
service work, but there is no web interface yet.

Manifest Task: T99397
2022-07-17 17:27:32 +02:00
627996525e Manager: implement operations for getting & setting worker sleep schedule
This is just the API, no web interface yet.

Manifest Task: T99397
2022-07-16 16:00:25 +02:00
0e92004f2a OAPI: regenerate code 2022-07-16 15:59:48 +02:00
726129446d T99730: Allow access to full task log
The web interface has a button that opens the task log in a new window.
This might need some restyling ;-)
2022-07-16 12:55:41 +02:00
686295090b Manager: implement endpoint for getting the full task log
Previously only the log tail was available, which is fine for many cases,
but for serious debugging the entire log is needed.

Manifest task: T99730
2022-07-16 11:13:31 +02:00
e2434b44f2 OAPI: regenerate code 2022-07-16 11:11:34 +02:00
ca586bf3fe Windows: Skip "inaccessible path" test
For some reason, on Windows, creating a directory with zero permissions
still allows creating a file in there. Just skip that part of the test.

The Explorer's properties panel of the directory also shows "Read Only
(only applies to files)", so at least that seems consistent.
2022-07-16 10:31:35 +02:00
859a261b05 Manager: on deletion of a worker, do not cascade to deletion of its tasks
Fix an issue where deleting a Worker would also delete the tasks it was
assigned to.
2022-07-15 17:00:25 +02:00
904b6c0d73 Stresser: stress the Manager by querying for tasks to execute 2022-07-15 15:08:00 +02:00
1fceae3604 Manager: more efficient database queries
Be more selective in what's saved to the database to speed some things up.
Most importantly, this avoids saving the entire job when a task status is
updated or a task is assigned.
2022-07-15 15:08:00 +02:00
1055aabee2 Manager: optimise db.SaveActivity() query
Use an explicit `Select()` GORM call to avoid saving related objects.
2022-07-15 15:08:00 +02:00
2e1a9c61b8 Manager: add SHA256 password hasher for worker auth
Add a SHA256 password hasher for worker authentication. It's not used at
the moment, but can be switched to for faster API queries. Note that
switching will cause authentication errors on already-existing workers,
which means they'll automatically re-register.

This is mostly useful for debugging & profiling purposes.
2022-07-15 15:08:00 +02:00
0e4ed1c54d Manager: move worker password hasher into a struct + interface
Move the Worker password hashing/comparison functions into a struct, and
use it via an interface. This will make it easier to switch to different
hashing algorithms.

Even with a low number of iterations, BCrypt is quite slow. That's good for
security, but not for Flamenco Worker authentication -- the password is
more of a "nice check to avoid accidentally reusing the same ID" than
a security measure.
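
A struct-behind-an-interface setup could look like the sketch below. The interface and type names are assumptions, and only a SHA256 variant is shown because it needs nothing beyond the standard library.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"errors"
	"fmt"
)

// passwordHasher is a hypothetical interface that lets the hashing
// algorithm be swapped out.
type passwordHasher interface {
	GenerateHashedPassword(password []byte) ([]byte, error)
	CompareHashAndPassword(hashedPassword, password []byte) error
}

var errMismatch = errors.New("hashed password does not match password")

// sha256Hasher is fast, but unsalted and not suitable for real security.
type sha256Hasher struct{}

func (sha256Hasher) GenerateHashedPassword(password []byte) ([]byte, error) {
	sum := sha256.Sum256(password)
	return []byte(hex.EncodeToString(sum[:])), nil
}

func (h sha256Hasher) CompareHashAndPassword(hashedPassword, password []byte) error {
	expected, err := h.GenerateHashedPassword(password)
	if err != nil {
		return err
	}
	if subtle.ConstantTimeCompare(expected, hashedPassword) != 1 {
		return errMismatch
	}
	return nil
}

func main() {
	var hasher passwordHasher = sha256Hasher{}
	hashed, _ := hasher.GenerateHashedPassword([]byte("ABCXYZ"))
	fmt.Println(hasher.CompareHashAndPassword(hashed, []byte("ABCXYZ")))
	fmt.Println(hasher.CompareHashAndPassword(hashed, []byte("wrong")))
}
```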
2022-07-15 15:08:00 +02:00
35fe0146d3 Add stress tester for task updates
Build with `make stresser`. Run with:

  ./stresser -worker UUID -secret ABCXYZ

The worker ID and secret can be obtained from
`flamenco-worker-credentials.yaml`. If left empty, the stresser will
register as a new worker, and log the credentials to be used on the next
invocation.
2022-07-15 15:08:00 +02:00
6e28271c93 Manager: prevent saving related job & worker when "touching" task 2022-07-15 15:08:00 +02:00
62ecd09f5f Don't return 500 Error when Blender cannot be found on $PATH
In the first-time wizard, if Blender cannot be found on $PATH but it can
be found via .blend file association, that should just be reported as a
normal situation, and not as a `500 Internal Server Error`.
2022-07-14 18:50:34 +02:00
c0f4657be4 Wrap error message when finding Blender via file association fails 2022-07-14 18:49:37 +02:00
72337c55cd Blender finder: fix Windows build error 2022-07-14 18:41:55 +02:00
86bccf3aa9 Blender finder: report only the first line of stdout 2022-07-14 18:41:50 +02:00
8b494dc448 Manager: Fix logic error detecting first-time run
If the config file is missing, `true` should be returned.
2022-07-14 18:24:47 +02:00
8719103462 Manager: set default storage path to "" to trigger the first-time wizard
Trigger the first-time wizard on first-time runs of Flamenco, by defaulting
the storage path to the empty string.

The wizard can always be triggered with the `-wizard` CLI argument. This is
just for detection of first-time / unconfigured runs.
2022-07-14 18:24:47 +02:00
b35af5de9f Manager: allow requesting shutdown multiple times
It's fine to request a shutdown multiple times. This fixes a hard crash
due to a panic.
2022-07-14 18:24:16 +02:00
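A minimal sketch of how repeated shutdown requests (b35af5de9f) can be made harmless. The commit does not state what caused the panic; closing an already-closed channel is a common cause in Go, and that is assumed here. `shutdownSignal` and its methods are hypothetical names, not Flamenco's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// shutdownSignal is a hypothetical helper that can be triggered any number
// of times without panicking.
type shutdownSignal struct {
	once sync.Once
	done chan struct{}
}

func newShutdownSignal() *shutdownSignal {
	return &shutdownSignal{done: make(chan struct{})}
}

// Request closes the channel on the first call; later calls are no-ops.
func (s *shutdownSignal) Request() {
	s.once.Do(func() { close(s.done) })
}

// Done returns a channel that is closed once a shutdown was requested.
func (s *shutdownSignal) Done() <-chan struct{} {
	return s.done
}

func main() {
	s := newShutdownSignal()
	s.Request()
	s.Request() // Closing the channel directly a second time would panic.
	<-s.Done()
	fmt.Println("shutdown requested")
}
```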
38b8220476 Restart Flamenco Manager when the first-time wizard is complete 2022-07-14 17:52:38 +02:00
10f56148d4 Allow saving configuration from the first-time wizard
This just updates the config and saves it to `flamenco-manager.yaml`.

Saving the configuration doesn't restart the Manager yet, that's for
another commit.
2022-07-14 17:27:17 +02:00
f9a3d3864a OAPI: regenerate code 2022-07-14 17:26:26 +02:00
7204bb833a Blender: run with enable-autoexec flag by default & shorten flags
Run with `-b -y`, instead of `--background --enable-autoexec`, to shorten
the default flags.
2022-07-14 15:52:57 +02:00
aec5ee49e0 First-Time Wizard: allow selecting Blender executables
The wizard now finds Blender in various ways, and lets the user select
which one to use.

Doesn't save anything yet, though.
2022-07-14 12:22:56 +02:00
20f13257f7 Move "blender finder" from Worker-specific to common location
Manager's first-time wizard will have to be able to find Blender as well.
2022-07-14 11:17:03 +02:00
aa9837b5f0 First incarnation of the first-time wizard
This adds a `-wizard` CLI option to the Manager, which opens a web browser
and shows the First-Time Wizard to aid in configuration of Flamenco.

This is work in progress. The wizard is just one page, and doesn't save
anything yet to the configuration.
2022-07-14 11:17:03 +02:00
e4a38f071c OAPI: regenerate code 2022-07-14 11:16:59 +02:00
6b5f9317cb Manager: clear job's blocklist when requeueing the job
Requeueing a job means that the issues that caused workers to get blocked
might be resolved, so it should be run with a clean slate.
2022-07-14 11:03:11 +02:00
3c290b1f6d Manager: ensure the {jobs} implicit variable uses forward slashes
Since the variable expansion is unaware of path semantics, using forward
slashes is the safest way to go about things in a platform-independent way.
2022-07-13 12:45:55 +02:00
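A small sketch of the slash normalisation described in 3c290b1f6d. `toForwardSlashes` is a hypothetical helper; the actual implementation may differ. It uses `strings.ReplaceAll` rather than `filepath.ToSlash`, because the latter only replaces the separator of the platform the code runs on, and so leaves backslashes alone on Linux and macOS.

```go
package main

import (
	"fmt"
	"strings"
)

// toForwardSlashes replaces every backslash with a forward slash, regardless
// of the platform this code runs on. Hypothetical helper for illustration.
func toForwardSlashes(path string) string {
	return strings.ReplaceAll(path, `\`, "/")
}

func main() {
	// The path is a made-up example value.
	fmt.Println(toForwardSlashes(`Y:\shared\flamenco\jobs`))
}
```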
ce250a611e Windows: fix error handling of syscall to AssocQueryStringW
syscall.SyscallN returns a `uintptr` type alias, and thus has to be
compared to `0`, not `nil`. Yeah, it's a bit weird.
2022-07-13 11:48:26 +02:00
0ff8ed7585 Manager: implement the getVariables OpenAPI operation 2022-07-08 11:36:00 +02:00
ae2cb281b4 OAPI: regenerate code 2022-07-08 11:35:57 +02:00
ac5bb5e378 Remove assumption {jobs} only exists when Shaman is enabled
Manager always creates an implicit variable `{jobs}`. This used to be
Shaman-dependent, but now it's always there (has been for a while). This
is now reflected in an add-on comment, and in an extra unit test.
2022-07-05 18:19:49 +02:00
d4429d593c Unify task log storage & manager-local storage
The task logs storage system is refactored to use the `local_storage`
package. Configuration options have also changed:

- `task_logs_path` is renamed to `local_manager_storage_path`, to
  emphasise that only the Manager deals with those files, with default
  value `./flamenco-manager-storage`.
- `storage_path` is renamed to `shared_storage_path`, to emphasise this
  is the storage shared between Manager and Workers, with default value
  `./flamenco-shared-storage`.

Task logs are still stored in
`${local_manager_storage_path}/job-{jobUUID[0:4]}/{jobUUID}/task-{taskUUID}.txt`

Manifest task: T99409
2022-07-05 17:58:58 +02:00
9f9a278634 Manager: remove old commented-out config sections
Various config sections were commented out, because they were brought in
from Flamenco 2 but weren't implemented yet. These have now been removed,
as the basic functionality is there, and new functionality will likely
be different from Flamenco 2 anyway.
2022-07-05 17:23:31 +02:00
2965856aa3 Worker: add test flag to enable Blender-dependent test
Add a `-withBlender` CLI argument for a unit test, to aid in debugging
T99438.

Run the test with `go test ./internal/worker/find_blender/ -args -withBlender`
to actually fail when the file association with `.blend` files cannot be
found.

Note that this doesn't rely on Blender being runnable, but it does rely
on _something_ being associated with .blend files.
2022-07-05 10:01:10 +02:00
60971722fc Windows: add missing imports
A recent refactor (rFfb89658530da25a77dc03fb329c394198bf6358f) performed
on Linux didn't properly update a Windows-only file.
2022-07-05 10:01:10 +02:00
2c932ebad5 Show Worker's "last seen" timestamp in web interface & API responses 2022-07-04 12:49:56 +02:00
7d64d1bca4 Move SwaggerUI to /api/v3/swagger-ui
Include the `v3` path component in the Swagger UI URL.
2022-07-04 12:21:18 +02:00
f2f8357df7 Bump thumbnail JPEG quality from 80 to 85
80 was a bit too low. 85 might still be too low, we'll have to see.
2022-07-01 17:44:26 +02:00
5fbdc388ad Job compiler: tweak settings visibility of simple-blender-render
In the `simple-blender-render` job type settings, hide the `chunk_size`
setting from the web frontend, and show the `blendfile` setting instead.

The actual blend file being rendered is important to know, whereas the
chunk size can be inferred from the task names anyway.
2022-07-01 13:36:44 +02:00
d25151184d Add a "Last Rendered" view
Add a "Last Rendered" view to the webapp.

The Manager now stores (in the database) which job was the last
recipient of a rendered image, and serves that to the appropriate
OpenAPI endpoint.

A new SocketIO subscription + accompanying room makes it possible for
the web interface to receive all rendered images (if they survive the
queue, which discards images when it gets too full).
2022-07-01 12:34:40 +02:00
801fa20f12 OAPI: regenerate code 2022-07-01 12:32:42 +02:00
2457a63518 Manager: Show "nothing rendered yet" image in job details
Show a "nothing rendered yet" image in the job details when there is no
last-rendered image yet.
2022-06-30 19:20:19 +02:00
0fc5ba0bc6 Manager: broadcast last-rendered image info via SocketIO
After processing an image in the "last-rendered" processor, a SocketIO
object is sent to clients to indicate the last-rendered image needs to
be (re)loaded.

This also moves the previously existing "done callback" from a single
function to a per-image callback, so that it can be called with the
right information in there, and only when that particular image is
actually done processing.

The notification message sent via SocketIO also contains the necessary
info to render the image, so that the web client doesn't have to call
the `fetchJobLastRenderedInfo` operation.
2022-06-30 18:36:24 +02:00
6efd67b05c Manager: implement FetchJobLastRenderedInfo() API operation
Allow querying for the URL & available versions of a job's last-rendered
image.
2022-06-28 17:08:00 +02:00
668e25fe95 OAPI: regenerate code 2022-06-28 17:07:08 +02:00
24344e9632 Cleanup: worker, simplify setting the manager URL
The return value of `FileConfigWrangler.SetManagerURL()` was never used,
so now the function doesn't return anything any more.
2022-06-28 11:42:47 +02:00
d6cfff4031 Worker: treat empty config file the same as a missing one
EOF while parsing the config file is now handled as an indication that
the default config should be used, rather than a fatal error.
2022-06-28 10:24:46 +02:00
fb89658530 Refactor: replace os.IsNotExist() with errors.Is(err, fs.ErrNotExist)
`os.IsNotExist()` is from before `errors.Is()` existed. The latter is the
recommended approach, as it also recognises wrapped errors.

No functional changes, except for recognising more cases of "does not
exist" errors as such.
2022-06-28 10:24:46 +02:00
64512c81ba Manager: implement OAPI operations to fetch blocklist & delete items 2022-06-27 11:32:35 +02:00
1353d1df0f OAPI: regenerate code 2022-06-27 11:32:12 +02:00
2d6c11e98b Worker: send produced output to Manager
Workers now send output produced by Blender (limited to PNG and JPEG
images, currently) to Manager. This is done by converting to JPEG first,
then sending the bytes via the Flamenco API to the Manager.
2022-06-27 11:30:37 +02:00
34f1cc076c Cleanup: Worker, simplify Listener.Run() function
No functional changes, except that now the "listener shutting down"
message will also be logged in case of a panic.
2022-06-27 11:30:37 +02:00
f244355328 Worker: parse stdout of Blender to recognise saved files
Prepare the Worker for submission of last-rendered images to Manager, by
parsing `stdout` of Blender to see which files were saved.

This needs more work, as now just an error "not implemented" is logged.
2022-06-27 11:30:37 +02:00
1f8c2df919 Worker: skip sometimes-hanging unit test
The test can hang occasionally, and needs some love & attention. For now
I've done some patching to make it slightly better, but still disabled it
and added a `FIXME` note to it.
2022-06-27 11:30:35 +02:00
e6af6a708c Manager: always close file when saving to JPEG
Always close the output file; previously this was not done when the
JPEG encoding would fail.
2022-06-26 13:24:37 +02:00
15ad890646 Unit test: properly close image file in test
On Windows it's not allowed to erase a file while it's opened, which caused
this error to surface. The file is now properly closed before the test
file is erased.
2022-06-26 13:23:48 +02:00
e687c95e5d Manager: add "last rendered image" processing pipeline
Add a handler for the OpenAPI `taskOutputProduced` operation, and an
image thumbnailing goroutine.

The queue of images to process + the function to handle queued images
is managed by `last_rendered.LastRenderedProcessor`. This queue currently
simply allows 3 requests; this should be improved such that it keeps
track of the job IDs as well, as with the current approach a spammy job
can starve the updates from a more calm job.
2022-06-24 16:51:11 +02:00
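A sketch of a bounded queue that discards new items when full, in the spirit of what e687c95e5d describes. It assumes a buffered channel with a non-blocking send; `imageQueue` and `TryEnqueue` are hypothetical names, and the real `LastRenderedProcessor` may work differently.

```go
package main

import "fmt"

// imageQueue is a hypothetical bounded queue of image paths.
type imageQueue struct {
	pending chan string
}

func newImageQueue(capacity int) *imageQueue {
	return &imageQueue{pending: make(chan string, capacity)}
}

// TryEnqueue adds the image to the queue, or returns false without blocking
// when the queue is full. The caller then simply drops the image.
func (q *imageQueue) TryEnqueue(imagePath string) bool {
	select {
	case q.pending <- imagePath:
		return true
	default:
		return false
	}
}

func main() {
	q := newImageQueue(3)
	for i := 1; i <= 4; i++ {
		name := fmt.Sprintf("frame-%03d.jpg", i)
		fmt.Printf("%s accepted: %v\n", name, q.TryEnqueue(name))
	}
}
```

As the commit message notes, a queue like this has no notion of job IDs, so a job that produces many images can crowd out a quieter one.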
167b2eaf45 OAPI: regenerate code 2022-06-24 16:39:50 +02:00
b53cd67eb4 Cleanup: rename assertResponseEmpty() → assertResponseNoContent()
The function tests the HTTP response is `204 No Content`, and now the
name reflects that better.

No functional changes.
2022-06-24 16:09:46 +02:00
27a6dde708 Manager: add local_storage package for managing storage locations
Add a `local_storage` package that finds a suitable place to put files.
Currently it just looks at the location of the currently running
executable; it can later do other things. It can be queried for a directory
to put job-specific files in.

It is intended to be used by the under-development "last rendered output"
processing system, to store an image file per job. Later we should also
refactor the task log handling system to use this.
2022-06-23 16:45:38 +02:00
b441f3f3de Manager: load job compiler scripts from disk as well
If there is a `scripts` directory next to the current executable, load
scripts from that directory as well.

It is still required to restart the Manager in order to pick up changes
to those scripts (including new/removed files), PLUS a refresh in the
add-on.
2022-06-21 17:59:20 +02:00
87f1959e26 Manager: use blocklist to actually block workers
Actually use the blocklist in the task scheduler to block workers from
doing blocked job types.
2022-06-21 17:59:20 +02:00
a0e8eebcb3 Manager: make access to job compilers script thread-safe
When on-disk job compiler scripts are supported, they will be reloaded
often, and it becomes more important to have the access to the map of
loaded job compilers thread-safe.
2022-06-20 18:09:33 +02:00
defa5b0431 Refactor: extract 'get the embedded filesystem' to a separate function
The global `scriptFS` variable was too easy to access, which caused an
issue where the mandatory `"scripts"` subdirectory was not passed.
Accessing via a getter function that hides this requirement prevents this.
2022-06-20 17:43:08 +02:00
201236cf46 Refactor: take some functions out of job_compilers.Service
Take some functions out of the `Service` struct, as they are more or less
standalone anyway. This will also make it easier later to make things
thread-safe, as that'll become important when files can get live-reloaded.
2022-06-20 17:26:17 +02:00
d5c527209f Cleanup: rename local var from compiler to service
The `Load()` function returns a `*Service`, and it was confusing that the
local variable is named `compiler` instead. Now it's called `service`.

No functional changes.
2022-06-20 17:21:19 +02:00
89fdc45b45 Manager: ignore small JS files
Empty (or almost-empty) JS files are ignored by the job compiler.
2022-06-20 17:14:06 +02:00
7a89c07fc9 Manager, refactor access to JS script files
Refactor the JS script file loading code so that it's tied to the `fs.FS`
interface for longer, and less to the specifics of our `embed.FS` instance.
This should make it possible to use other filesystems, like a real on-disk
one, to load scripts.
2022-06-20 17:06:46 +02:00
2d05e1c773 Fix unit test for recent scheduler change
Fix unit test for rF1586c37b.
2022-06-20 16:05:36 +02:00
380d55b4f0 Cleanup: rename job_compilers/path.go to js_path.go
Rename the file by adding `js_` suffix, to indicate it's for exposing a
"path" object to JavaScript.

No functional changes.
2022-06-20 15:57:03 +02:00
a7fbbf3313 Cleanup: rename job_compilers/process.go to js_process.go
Rename the file by adding `js_` suffix, to indicate it's for exposing a
"process" object to JavaScript.

No functional changes.
2022-06-20 15:56:09 +02:00
1586c37b32 Manager: mark task as active as soon as it is assigned to a worker
Move the task to 'active' status so that it won't be assigned to another
worker. This also enables the task timeout monitoring.
2022-06-20 13:00:49 +02:00
2a4c9b2c13 Worker: enable SQLite foreign keys
They're not used now, but enabling them is good default behaviour anyway.
2022-06-20 13:00:49 +02:00
de5d12362d Manager: add sleep_repeats parameter to echo-sleep-test job type
This makes it convenient to create an arbitrary number of tasks.
2022-06-20 11:44:41 +02:00
a2b667c043 Manager: log blocklist threshold 2022-06-17 17:15:23 +02:00
13bdb0ed73 Manager: remove outdated TODO 2022-06-17 17:15:13 +02:00
a368230afa Manager: fix race condition in logging of worker name/UUID
Instead of updating the logger in the context, just store a new logger
in a new sub-context.
2022-06-17 17:13:32 +02:00
64c8fa851d Show assigned worker in task details
Show the worker assigned to the task in the task details view, as a link
to the worker itself.
2022-06-17 16:36:55 +02:00
7327896db9 Worker: allow overriding worker name from environment
Allow overriding the worker name by setting the `FLAMENCO_WORKER_NAME`
environment variable. This makes it easy to do from Docker configs, and,
more importantly, from the scripts I use to run multiple workers on the
same machine while developing Flamenco.
2022-06-17 16:24:03 +02:00
cdb7789f08 Refactor: Manager, move test code
Move code that covers `worker_task_updates.go` into
`worker_task_updates_test.go`.

No functional changes.
2022-06-17 15:51:15 +02:00
046853932d Manager: re-queue previously failed tasks of worker when blocklisting
When a Worker is blocked from a job, re-queue its previously failed tasks
so that other workers can give them a try.
2022-06-17 15:49:16 +02:00
b95bed1f96 Refactor: rename RequeueTasksOfWorker to RequeueActiveTasksOfWorker
Soon there will be another function to requeue tasks of workers by other
criteria, so being clear in the name helps.

No functional changes.
2022-06-17 15:49:16 +02:00
fd31a85bcd Manager: add blocking of workers when they fail certain tasks too much
When a worker fails too many tasks, of the same task type, on the same job,
it'll get blocked from doing those.
2022-06-17 15:49:16 +02:00
56abc825a6 Refactor: Manager, refactor handling of task failures
Split the handling of soft and hard failures into separate functions.

No functional changes intended.
2022-06-17 15:01:52 +02:00
6feee74c54 Cleanup: Manager, move worker task update handling code into its own file
Move the code related to task updates from workers to
`worker_task_updates.go`. It's going to get more complex with the
blocklisting in there; this prepares for that.

No functional changes.
2022-06-17 11:46:07 +02:00
81f81d0e0a Show task failure list in the web frontend
Show the task failure list in the web frontend's `TaskDetails` component.
2022-06-17 11:37:56 +02:00
0b5140fc5f Manager: clear task failure list on requeueing of jobs & tasks
When a job or task gets requeued from the web interface, its task
failure lists (i.e. the list of workers that previously failed this
task) will be cleared.

This clearing doesn't happen in other situations, e.g. when a worker
signs off and its task gets requeued, the task's failure list will
remain as-is.
2022-06-17 11:37:28 +02:00
e9fca8d993 Cleanup: typo fix in comment 2022-06-17 11:03:43 +02:00
b991e5f446 Cleanup: Manager, clarify some function names of the task state machine
Rename functions `onTaskStatusX` to `updateJobOnTaskStatusX` to clarify
their responsibility is to update the job in reaction to a task status
change.

No functional changes.
2022-06-17 11:01:41 +02:00
8764f8f7c1 Manager: task scheduler, don't schedule tasks the worker failed before
When a worker asks for a task to perform, don't give it a task that it
failed before.
2022-06-16 16:02:28 +02:00
ec10128f85 Worker: Sleep command, return error when sleep time is negative
I need a way to reliably generate task errors, and having a more thorough
check on the sleep duration parameter seemed a nice way to create those.
2022-06-16 15:46:03 +02:00
d5d0893b05 Worker: use explicit types for command parameter errors
Introduce `ParameterMissingError` and `ParameterInvalidError` structs, to
be returned from command executors. These replace free-form `fmt.Errorf()`
style errors.
2022-06-16 15:45:09 +02:00
8af1b9d976 Worker: fix sync issue in TestUpstreamBufferManagerUnavailable unit test
Fix synchronisation/goroutine issue in the "upstream buffer" test,
where very occasionally the queue size was checked at the wrong time.
2022-06-16 15:43:20 +02:00
da1b42f9fa Worker: fix sqlite connection issue in unit tests
Fix sqlite issues in the "upstream buffer" test. The test used
`:memory:` to have an in-memory DB to separate from other tests. The
"flush at shutdown" code runs in a different goroutine, though, and
creates a new DB connection. The SQLite separation was too strong,
making that function not find any tables. This is now solved by having
an in-memory database that's shared between all connections made from
the same unit test.
2022-06-16 15:42:52 +02:00
7e28cfa69c Worker: add task failures to the task log as well
Task failures were only placed in the task's activity field, and are now
added to the log as well.
2022-06-16 12:22:05 +02:00
e1309ad8fc Worker: flush upstream buffer when shutting down
When shutting down, the worker now tries to flush any buffered task updates
before closing.
2022-06-16 12:21:17 +02:00
9ddf72fa37 Worker: sign off as last step of shutdown
Within the shutdown procedure, signing off is now the last thing the
worker does. This makes things more consistent from the Manager's point
of view (like receiving last-second log entries while the Worker is still
online).
2022-06-16 12:19:03 +02:00
5bc94101e8 Worker: Avoid sleep at shutdown
Make the sleep between fetching tasks interruptible, so that a shutdown
doesn't have to wait a few seconds.
2022-06-16 12:08:13 +02:00
9ab41984ac Adjust Go code for Nickname -> Name change
This fixes a bug where 'Worker undefined changed status' was logged in
the web interface, as that was (back then incorrectly) `workerupdate.name`.
Now that code is correct.
2022-06-16 11:03:18 +02:00
12f0a605a4 Manager: log configured worker timeout at startup 2022-06-16 10:51:17 +02:00
5f2712980e Manager: task scheduler, check for requested worker status change first
Before checking whether the Worker is allowed to do work (i.e. is in
`awake` state), check any queued-up status changes. Those should be
communicated, before saying "no work for you", so that the Worker can
actually respond to it.
2022-06-16 10:48:38 +02:00
ee53373878 Cleanup: compare worker state to constant instead of hard-coded state
Use the `requiredStatusToGetTask` constant to compare the worker status,
and not just for logging.

No functional changes, just better code.
2022-06-16 10:46:50 +02:00
40f711bf69 Fix two unit tests for the previous commit
I pushed too soon :'(
2022-06-16 10:42:04 +02:00
be0b10400f Manager: count workers as 'seen' even when there is no task
Fix a bug where a worker would only be counted as 'seen' by the task
scheduler if it actually got a task assigned.
2022-06-16 10:39:42 +02:00
7d7c2b1bd6 Cleanup: blacklist → blocklist
Change "blacklist" to "blocklist", because that makes people happier.

No functional changes.
2022-06-16 10:36:36 +02:00
6e12a2fb25 Manager: keep track of which worker failed which task
When a Worker indicates a task failed, mark it as `soft-failed` until
enough workers have tried & failed at the same task.

This is the first step in a blocklisting system, where tasks of an
often-failing worker will be requeued to be retried by others.

NOTE: currently the failure list of a task is NOT reset whenever it is
requeued! This will be implemented in a future commit, and is tracked in
`FEATURES.md`.
2022-06-13 18:41:38 +02:00
c5debdeb70 Manager: add 'task failure list' to record workers failing tasks
The persistence layer can now store which worker failed which task, as
preparation for a blocklisting system. Such a system should be able to
determine whether there are still any workers left to do the work.
2022-06-13 18:41:30 +02:00
e35911d106 Manager: add ability to delete jobs
This is needed for a future unit test, and exposed the fact that SQLite
didn't enforce foreign key constraints (and thus also didn't handle
on-delete-cascade attributes). This has been fixed in the previous commit.
2022-06-13 18:41:19 +02:00
e5d0e987e1 Manager: enforce DB foreign key checks at startup
SQLite disables foreign key checks by default, so Flamenco has to enable
them explicitly.
2022-06-13 18:41:19 +02:00
6ec493d944 Manager, more efficiently create tasks
When creating tasks the inter-task dependencies are saved as a 2nd pass, by
updating the tasks in the database. This now only saves those dependencies,
and no longer saves the entire task again.
2022-06-13 18:40:42 +02:00
02bc03ae2b Manager: replace gorm.Model with our own persistence.Model struct
`persistence.Model` contains the common database fields for most model
structs. It is a copy of `gorm.Model`, but without the `DeletedAt`
field (which triggers Gorm's soft deletion).

Soft deletion is not used by Flamenco. If it ever becomes necessary to
support soft-deletion, see https://gorm.io/docs/delete.html#Soft-Delete
2022-06-13 18:40:42 +02:00
ec5b3aac52 Manager: on getting task update from Worker, write log before status change
When receiving a `TaskUpdate` from a Worker, write to the task log, before
handling any task status change.

If both log and task status change are sent, the log will likely contain
the cause of the task state change. Any subsequent task logs, for example
generated by the Manager in response to the status change, should be
logged after that.
2022-06-13 18:40:42 +02:00
25d5b01b3c Cleanup: test errors with assert.NoError() instead of assert.Nil()
No functional changes, just nicer way to test.
2022-06-13 18:40:42 +02:00
6fc936d0a6 Revert accidental debug code
Revert change in rF01c45afc20854918d1f18e6859b4154499d500b6 that made
unit tests use an on-disk database.
2022-06-13 18:40:25 +02:00
b922722614 Manager: broadcast worker timeouts over SocketIO
This way the web interface will also show timed-out workers.
2022-06-13 13:05:20 +02:00
75ca0e652e Cleanup: timeout checker, improve readability of failed tests
No functional changes
2022-06-13 12:50:27 +02:00
1de1e3a9a5 Manager: add 'canary' test to all timeout checker tests
The canary test asserts that certain constants still have the expected
value. Lowering those constants is good for testing the timeout stuff with
the actual Flamenco Manager + Worker (without having to wait 5 minutes for
it to kick in), but it's too easy to accidentally run the unit tests and
get cryptic errors about everything failing horribly and miserably when
you leave those constants low.
2022-06-13 12:50:02 +02:00
5dac3c2dc0 Manager: mark workers as 'seen' when they send updates
Update the 'last seen at' timestamp of workers when they:
- sign on
- sign off
- get a task assigned
- send a task update
- check whether they can keep running their task

Note that this commit is necessary to not have the workers time out
immediately ;-)
2022-06-13 12:47:07 +02:00
986b647967 Manager: re-queue tasks of timed-out workers
Allow other workers to pick up the task(s) assigned to a timed-out worker.
2022-06-13 12:38:35 +02:00
7d5aae25b5 Manager: add timeout checks for workers 2022-06-13 12:33:22 +02:00
e8171fc597 Cleanup: Manager, reduce log level of task timeout checks 2022-06-13 12:33:16 +02:00
67562856d3 Manager: let Gorm create an index on Task.LastTouchedAt
It's used in timeout queries, and there could be tens or hundreds of
thousands of tasks in the database.
2022-06-13 12:33:05 +02:00
c3525c3b1a Manager: move task requeueing to TaskStateMachine
Requeueing the tasks of a specific worker is now done in the
`TaskStateMachine`, such that it can be called from other services as
well in future commits.

This also makes the `LogStorage` service a dependency of the
`TaskStateMachine`, as it needs to write "this task was requeued" kind
of messages to the task logs.
2022-06-13 12:33:01 +02:00
e06bc484f4 Cleanup: manager, move task state machine interfaces to their own file
No functional changes.
2022-06-13 12:32:18 +02:00
01c45afc20 Manager: explicitly store timestamps as UTC
SQLite doesn't handle timezones by default, when you just use something
like `date1 < date2`, for example. This makes GORM explicitly use UTC
timestamps for the `CreatedAt`, `UpdatedAt`, and `DeletedAt` fields.
Our own code should also use UTC when saving timestamps. That way all
datetimes in the database are in the same timezone, and can be compared
naively.
2022-06-13 12:10:11 +02:00
fe1627dd85 Cleanup: timeout checker, move task-specific code to tasks.go
Just a cleanup to prepare for the addition of worker timeouts.
2022-06-10 14:58:44 +02:00
13307c5a24 Manager: add canary test to timeout checker unit test
The `TestTaskTimeout()` unit test assumes specific durations for initial &
subsequent sleeps of the timeout checker. The test will fail quite
cryptically when that assumption doesn't hold, so just test for it at
the start of the unit test.
2022-06-10 14:53:23 +02:00
09902d201c Manager: fix task timeout check logging of assigned workers
The task's worker wasn't fetched from the database, always causing
"unknown worker" messages in the task log.
2022-06-10 14:52:03 +02:00
d90a8b987d Manager: Task Timeout Checker
Tasks that are in state `active` but haven't been 'touched' by a Worker
for 10 minutes or longer will transition to state `failed`.

In the future, it might be better to move the decision about which state
is suitable to the Task State Machine service, so that it can be smarter
and take the history of the task into account. Going to `soft-failed`
first might be a nice touch.
2022-06-10 14:32:02 +02:00
295891a17a Manager: ensure Gorm-generated timestamps are in UTC
SQLite should store all timestamps in UTC, as the database is woefully
unaware of timezones and will compare lexicographically.
2022-06-10 14:31:53 +02:00
24204084c1 Manager: move timestamping of log messages to task_logs package
In the future different services will write to the task log, and thus
it makes sense to move the responsibility of prepending the timestamps
to the log storage service.
2022-06-09 17:00:38 +02:00
819cad1d18 Manager: move broadcasting of task logs via SocketIO to task log service
To ensure all task logs also get broadcast via SocketIO, the responsibility
has moved from the `api_impl` to the `task_logs` package.
2022-06-09 16:49:48 +02:00
04dd479248 Manager: protect task log writing with mutex
A per-task mutex is used to protect the writing of task logs, so that
multiple goroutines can safely write to the same task log.
2022-06-09 14:44:54 +02:00
92d6693871 Show Task's "last touched" in the web interface 2022-06-09 11:59:43 +02:00
354fd29f9e Manager: Start timeout counting as soon as Worker gets task assigned
Set the task's "last touched" field in the database to "now" as soon as
the task is assigned to a worker.
2022-06-09 11:58:30 +02:00
87bce6be36 Manager: unify logging of task assignment and requeue-on-signoff
The requeue-task-on-worker-signoff operation also needs to log a timestamp.
The code for this, and the recently added code for timestamping the
"task assigned to worker" message, are now unified.
2022-06-09 11:30:46 +02:00
75903a2da3 Manager: prepend timestamp to "task assigned to worker" task log entries
Add a new `clock` service to the Flamenco struct, which allows us to mock
the passing of time, and thus test for timestamps in a stable fashion.
2022-06-09 11:24:02 +02:00
b186ea1828 Manager: write to task log when assigning it to a worker 2022-06-09 10:59:44 +02:00
b4d2fc4231 Manager: keep track of when a Worker last worked on a task
This will be used for keeping track of stuck tasks.
2022-06-03 16:33:50 +02:00
0be1ca30dd Cleanup: manager, move api_impl interfaces to interfaces.go
The number of interfaces declared by the `api_impl` package is getting
large, so they deserve their own file.

No functional changes.
2022-06-03 15:52:07 +02:00
8e7f1e2868 Manager: some extra unit tests for worker signoff behaviour 2022-06-02 16:37:29 +02:00
6cf82e5d43 Manager: cleanup, refactor Worker state change request persistence code
Move the setting & clearing of worker state change requests into separate
functions.

No functional changes.
2022-06-02 16:36:06 +02:00
132ce8f2ec Merge 'shutdown' and 'offline' states
Move the 'shutdown' state code to the 'offline' state, to match the
removal of the 'shutdown' state from the OpenAPI definition.
2022-06-02 16:35:07 +02:00
678308fb6d Manager: allow cancelling worker state change requests
A worker state change request can now be cancelled by requesting the worker
to go to its current state. In other words, a previously requested change
`A → B` can be cancelled by requesting the worker goes to state `A`.

Previously this would simply overwrite the last request, resulting in a
requested state change `A → A`. Having this non-lazy would even interrupt
the currently running task.
2022-06-02 12:43:16 +02:00
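The cancellation rule from 678308fb6d can be sketched as follows. The `Worker` struct and `RequestStatusChange` function are simplified, hypothetical stand-ins for the Manager's persistence code.

```go
package main

import "fmt"

// Worker is a simplified stand-in for the Manager's worker model.
type Worker struct {
	Status          string
	RequestedStatus string // empty means "no change requested"
}

// RequestStatusChange records a requested status change. Requesting the
// worker's current status cancels any pending request instead of storing a
// no-op change such as A -> A.
func RequestStatusChange(w *Worker, status string) {
	if status == w.Status {
		w.RequestedStatus = ""
		return
	}
	w.RequestedStatus = status
}

func main() {
	w := &Worker{Status: "awake"}

	RequestStatusChange(w, "offline")
	fmt.Printf("requested: %q\n", w.RequestedStatus)

	// Requesting the current status cancels the earlier request.
	RequestStatusChange(w, "awake")
	fmt.Printf("requested: %q\n", w.RequestedStatus)
}
```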
9ed6b6d931 Manager: adjust code for WorkerStatusChangeRequest extraction
See preceding OpenAPI change.
2022-06-02 12:17:54 +02:00
ae6831ce6e Manager: fix unit test
rFcfb17b178da2055ef12b2aa2ad8f7f778a952bc3 changed the semantics of
`SocketIOWorkerUpdate`, in the sense that any update that doesn't change
the worker status can omit `previous_status`. This commit adjusts the
unit test for this.
2022-06-02 12:13:25 +02:00