Previously, BigDecimal was listed as not needing a serializer. However,
when used with an adapter storing the job arguments as JSON, it would get
serialized as a simple String, resulting in deserialization also producing
a String (instead of a BigDecimal).
By using a serializer, we ensure the round trip is safe.
During upgrade deployments of applications with multiple replicas making use of
BigDecimal job arguments with a queue adapter serializing to JSON, there exists
a possible race condition, whereby a "new" replica enqueues a job with an
argument serialized using `BigDecimalSerializer`, and an "old" replica fails to
deserialize it (as it does not have `BigDecimalSerializer`).
Therefore, to ensure safe upgrades, serialization will not use
`BigDecimalSerializer` until `config.active_job.use_big_decimal_serializer` is
enabled, which can be done safely after successful deployment of Rails 7.1.
This option will be removed in Rails 7.2, when this serialization behavior will become the default.
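Once all replicas are running Rails 7.1, the option can be enabled, for example in `config/application.rb` (a minimal sketch):
```ruby
# Safe only after every replica runs Rails 7.1, so all nodes can
# deserialize BigDecimalSerializer-encoded arguments.
config.active_job.use_big_decimal_serializer = true
```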
This adds `:db_runtime` to `perform.active_job` notification payloads,
which is the total time taken by database queries while performing a
job. This value can be used to better understand how a job's time is
spent.
This is similar to the `:db_runtime` value added to
`process_action.action_controller` notification payloads by
`ActiveRecord::Railties::ControllerRuntime`.
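For example, a subscriber can report the database time per job (a sketch; the log message is illustrative):
```ruby
# Subscribe to perform.active_job and report time spent in database queries.
ActiveSupport::Notifications.subscribe("perform.active_job") do |event|
  job = event.payload[:job]
  Rails.logger.info("#{job.class.name} spent #{event.payload[:db_runtime]}ms in database queries")
end
```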
Closes #35354.
Co-authored-by: Cory Gwin <gwincr11@github.com>
The ModuleSerializer does not support serializing anonymous classes
because, at deserialization time, it wouldn't know which class to use
(since the class name is nil).
For this reason, ModuleSerializer now raises an explicit error when
trying to serialize an anonymous class, indicating that this behaviour
is not supported.
Previously, the failure surfaced only during deserialization, as a
cryptic `NoMethodError` (`undefined method 'constantize' for
nil:NilClass`) that did not make clear why deserialization had failed.
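A sketch of the change in behaviour (`SomeJob` is hypothetical, and the exact error class and message may differ):
```ruby
anonymous_class = Class.new # its name is nil
SomeJob.perform_later(anonymous_class)
# Before: enqueued fine, then deserialization failed with
#   "undefined method 'constantize' for nil:NilClass"
# After: raises an explicit error at serialization time, stating
#   that serializing an anonymous class is not supported
```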
Previously in `perform_enqueued_jobs`, `deserialize_arguments_if_needed`
was called before calling `perform_now`. When a record no longer exists
and is serialized using GlobalID, this led to raising
an `ActiveJob::DeserializationError` before reaching the `perform_now` call.
This behaviour made it difficult to test the job's `discard_on`/`retry_on` logic.
Now the `deserialize_arguments_if_needed` call is postponed to when `perform_now`
is called.
Example:
```ruby
class UpdateUserJob < ActiveJob::Base
  discard_on ActiveJob::DeserializationError

  def perform(user)
    # ...
  end
end

User.destroy_all

assert_nothing_raised do
  perform_enqueued_jobs only: UpdateUserJob
end

assert_no_enqueued_jobs
```
Before this change the test would fail; now it passes.
In some applications, certain classes of errors may be raised during the
execution of a job that the developer would want to retry forever.
These are most likely infrastructure problems that the developer knows
will eventually be resolved but may take a variable amount of time, or
cases where application business logic temporarily blocks the job from
executing, such as a resource needed by the job being locked for a
lengthy amount of time.
While an arbitrarily large number of attempts could previously be passed,
this is inexpressive: sometimes the developer just needs the job to keep
being retried until it eventually succeeds. Without this, developers
would need additional code to handle the case where the job eventually
exhausts its attempts limit and has to be re-enqueued manually.
As with many things, this should be used with caution, and only for
errors that the developer knows will definitely be resolved eventually,
allowing the job to continue.
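A sketch of how this might look, assuming the unlimited-attempts option on `retry_on` (`LockContentionError` is a hypothetical error class):
```ruby
class ProcessOrderJob < ApplicationJob
  # Keep retrying until the contended resource is eventually released.
  retry_on LockContentionError, wait: 5.seconds, attempts: :unlimited

  def perform(order)
    # ...
  end
end
```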
[Daniel Morton + Rafael Mendonça França]
There is presently no clean way of telling a caller of `perform_later`
the reason why a job failed to enqueue. When the job is enqueued
successfully, the job object itself is returned, but when the job can
not be enqueued, only `false` is returned. This does not allow callers
to distinguish between classes of failures.
One important class of failures is when the job backend experiences a
network partition when communicating with its underlying datastore. It
is entirely possible for that network partition to recover and as such,
code attempting to enqueue a job may wish to take action to reenqueue
that job after a brief delay. This is distinguished from the class of
failures where, due to a business rule defined in a callback in the
application, a job fails to enqueue and should not be retried.
This PR changes the following:
- Allows a block to be passed to the `perform_later` method. After the
`enqueue` method is executed, but before the result is returned, the
job will be yielded to the block. This allows the code invoking the
`perform_later` method to inspect the job object, even in failure
scenarios.
- Adds an exception `EnqueueError` which job adapters can raise if they
detect a problem specific to their underlying implementation or
infrastructure during the enqueue process.
- Adds two properties to the job base class: `successfully_enqueued` and
`enqueue_error`. `enqueue_error` will be populated by the `enqueue`
method if it rescues an `EnqueueError` raised by the job backend.
`successfully_enqueued` will be `true` if the job is not rejected by
callbacks and does not cause the job backend to raise an
`EnqueueError`, and will be `false` otherwise.
This will allow developers to do something like the following:
```ruby
MyJob.perform_later do |job|
  unless job.successfully_enqueued?
    if job.enqueue_error&.message == "Redis was unavailable"
      # invoke some code that will retry the job after a delay
    end
  end
end
```
Use inheritance to keep the behavior in the right modules.
The order of Instrumentation and Logging had to be flipped to keep the
current behavior.
Before this commit, only StandardError exceptions can be handled by
rescue_from handlers.
This changes the rescue clause to catch all Exception objects, allowing
rescue handlers to be defined for Exception classes not inheriting from
StandardError.
This means that handlers rescuing exception classes outside the
`StandardError` hierarchy may now catch exceptions that were not being
rescued before this change.
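For example, a handler can now be registered for an exception class outside the `StandardError` hierarchy (a minimal sketch; `NotImplementedError` inherits from `ScriptError`, not `StandardError`):
```ruby
class BackfillJob < ApplicationJob
  # Before this change this handler was never reached, because the
  # rescue clause only caught StandardError descendants.
  rescue_from(NotImplementedError) do |error|
    Rails.logger.error("Backfill not supported: #{error.message}")
  end
end
```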
Co-authored-by: Adrianna Chang <adrianna.chang@shopify.com>
This makes sure jobs don't run twice if `perform_enqueued_jobs` is
called twice without a block.
This also mimics the behavior of using `perform_enqueued_jobs` with a
block, where at the end of the block performed jobs are not in
`enqueued_jobs` but instead in `performed_jobs`.
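A sketch of the resulting test behaviour (`MyJob` is hypothetical):
```ruby
MyJob.perform_later

perform_enqueued_jobs # performs MyJob and moves it to performed_jobs
perform_enqueued_jobs # no-op: enqueued_jobs is now empty

assert_equal 1, performed_jobs.size
assert_equal 0, enqueued_jobs.size
```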
- ### Problem
If we use `perform_enqueued_jobs` without a block, a job that
uses a retry mechanism to re-enqueue itself would get performed
right away.
This behaviour makes sense when using `perform_enqueued_jobs` with
a block.
However, I expect `perform_enqueued_jobs` without a block to
perform jobs that are **already** in the queue, not the ones that
will get enqueued afterwards.
### Solution
Dup the array of jobs given to avoid future mutation.
- ### Problem
```ruby
class MyJob < ApplicationJob
  before_enqueue { throw(:abort) }
  after_enqueue do
    # enters here
  end
end
```
I find AJ's behaviour for `after_enqueue` and `after_perform` callbacks
weird, as they get run even when the callback chain is halted.
It's counterintuitive to run the `after_enqueue` callbacks even
though the job wasn't even enqueued.
### Solution
In Rails 6.2, I propose to make the new behaviour the default
and stop running after callbacks when the chain is halted.
Applications that want this behaviour now, or in 6.1, can opt in by
adding `config.active_job.skip_after_callbacks_if_terminated = true`
to their configuration file.
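A minimal sketch of opting in, e.g. in `config/application.rb`:
```ruby
# Stop running after_enqueue/after_perform callbacks when
# the callback chain is halted with throw(:abort).
config.active_job.skip_after_callbacks_if_terminated = true
```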
- ### Problem
ActiveJob will always log "Enqueued MyJob (Job ID) ..." even
if the job doesn't get enqueued through the adapter.
The same problem happens when performing a job: "Performed MyJob (Job ID) ..." will be logged even when the job wasn't performed at all.
This can happen either when the callback chain is terminated
(a `before_enqueue` callback throwing `:abort`) or when an exception is raised.
### Solution
Check whether the callback chain was aborted or an exception was raised, and log accordingly.
Prior to this change, `:exponentially_longer` had adverse consequences
during system-wide downstream failures. This change adds a random value to the
backoff calculation in order to prevent the thundering herd
problem, whereby all retried jobs would retry at the same time.
Specifically, this change adds a `jitter` option to `retry_on` so that users
can scope the randomness to a reasonable amount. The default is
15% of the exponential backoff calculation.
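A sketch of tuning the randomness, assuming the `jitter:` option on `retry_on` (`ExternalApiError` is a hypothetical error class; `0.15` mirrors the stated default):
```ruby
class FetchDataJob < ApplicationJob
  # Each retry waits the exponential backoff plus a random jitter
  # of up to 15% of the computed delay.
  retry_on ExternalApiError, wait: :exponentially_longer, jitter: 0.15
end
```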