Fixes race condition when multiple preprocessed variants are defined for a Previewable file is attached
This fixes race condition in Active Storage when multiple preprocessed variants are defined for a `Previewable` file is attached. When a variant is specified for a "previewable" file type (e.g. video or PDF) attachment, a `preview_image` attachment is first created and attached on the original blob and then any user-specified variants are derived from _that_ preview image. When those variants are named and have `preprocessed: true`, the jobs to create those variants are queued simultaneously. Example from my case: ```ruby has_one_attached :file, dependent: :purge_later do |attachable| attachable.variant :preview, resize_to_fill: [400, 400], preprocessed: true attachable.variant :still, format: "jpg", saver: {quality: 85}, preprocessed: true end ``` When a `Previewable` attachment is created (a video, in my case), `TransformJob.perform_later` is called for each named variant with `preprocessed: true`. Unless your queue adapter is synchronous (e.g. :inline or :test), this results in a race condition in which every such variant's worker will check `processed?`, see that no `preview_image` attachment exists yet on the `ActiveStorage::Blob`, and: 1. Redundantly download the file from storage 2. Create duplicative ActiveStorage::Attachment and `ActiveStorage::Blob` records for the `preview_image` attachment (all but one of which will be orphaned from the original blob's `has_one_attached :preview_image`) 3. Create variant blobs (and associated `ActiveStorage::VariantRecord`) that are similarly orphaned (by virtue of being a variant of an orphaned `preview_image` blob) As a result, if the video is ever purged, `PurgeJob` will only find the current `has_one_attached :preview_image` and whatever variant demanded it into existence, then leave the rest as orphaned records in the database and in storage. Pretty simple: wrap the first step of the job in `blob.with_lock {}`. By pessimistically locking on the blob, we can prevent processing the preview image multiple times by multiple `TransformJob` jobs running concurrently. Alternate approaches would all be more work: * Queuing a `PreviewImage` job instead of N `TransformJob` and have it, only after `preview_image` is attached, enqueue those `TransformJob` jobs * Batching up all the named variant transformations into a single meta-job Writing a test for this inside Rails would be difficult because it would require running the resulting TransformJob jobs concurrently. I [started a test](https://github.com/searls/rails/blob/fix-video-duplicate-preview-variants/activestorage/test/models/variant_with_record_test.rb#L348-L367) but failed to reproduce, in part because the test queue adapter will perform enqueued jobs inline instead of concurrently. In order to write a test that replicated the issue appropriately, we might first need a new option for `perform_enqueued_jobs(async: true) { … }` If you're interested, [this gist](https://gist.github.com/searls/5b8298abe88b3206f670ea3c6d574aab) includes a driver script and output before and after the patch showing it working. I only found this because I'm a total cheapskate and was literally counting records in my S3 bucket to ensure `PurgeJob` worked. Then I wasted the next two days trying to figure out why before landing on this. I strongly suspect that ActiveStorage users who host video and take advantage of `preprocessed: true` named variants will have a lot of orphaned stuff floating around their buckets. To see if you have any such "zombie" preview_images (and presumably, associated variants) floating around your application that would survive calls to `purge` on the owning attachment, you could write a query like this: ``` ActiveStorage::Attachment .joins("INNER JOIN active_storage_attachments as other_attachments ON active_storage_attachments.record_id = other_attachments.record_id AND active_storage_attachments.id != other_attachments.id") .where( :name => "preview_image", :record_type => "ActiveStorage::Blob", "other_attachments.name" => "preview_image", "other_attachments.record_type" => "ActiveStorage::Blob" ) .distinct ``` Clearing out one's production database and backend storage to get this all right-sized should be a fun exercise for the reader. Co-authored-by: Aaron Patterson <aaron.patterson@gmail.com>
This commit is contained in:
parent
7025c84eda
commit
131fd59ff6
16
activestorage/app/jobs/active_storage/preview_image_job.rb
Normal file
16
activestorage/app/jobs/active_storage/preview_image_job.rb
Normal file
@ -0,0 +1,16 @@
|
||||
# frozen_string_literal: true
|
||||
|
||||
class ActiveStorage::PreviewImageJob < ActiveStorage::BaseJob
|
||||
queue_as { ActiveStorage.queues[:preview_image] }
|
||||
|
||||
discard_on ActiveRecord::RecordNotFound, ActiveStorage::UnrepresentableError
|
||||
retry_on ActiveStorage::IntegrityError, attempts: 10, wait: :polynomially_longer
|
||||
|
||||
def perform(blob, variations)
|
||||
blob.preview({}).processed
|
||||
|
||||
variations.each do |transformations|
|
||||
blob.preprocessed(transformations)
|
||||
end
|
||||
end
|
||||
end
|
@ -132,8 +132,18 @@ def mirror_blob_later
|
||||
end
|
||||
|
||||
def transform_variants_later
|
||||
named_variants.each do |_name, named_variant|
|
||||
blob.preprocessed(named_variant.transformations) if named_variant.preprocessed?(record)
|
||||
preprocessed_variations = named_variants.filter_map { |_name, named_variant|
|
||||
if named_variant.preprocessed?(record)
|
||||
named_variant.transformations
|
||||
end
|
||||
}
|
||||
|
||||
if blob.preview_image_needed_before_processing_variants?
|
||||
blob.create_preview_image_later(preprocessed_variations)
|
||||
else
|
||||
preprocessed_variations.each do |transformations|
|
||||
blob.preprocessed(transformations)
|
||||
end
|
||||
end
|
||||
end
|
||||
|
||||
|
@ -98,6 +98,14 @@ def representable?
|
||||
variable? || previewable?
|
||||
end
|
||||
|
||||
def preview_image_needed_before_processing_variants?
|
||||
previewable? && !preview_image.attached?
|
||||
end
|
||||
|
||||
def create_preview_image_later(variations)
|
||||
ActiveStorage::PreviewImageJob.perform_later(self, variations) if representable?
|
||||
end
|
||||
|
||||
def preprocessed(transformations) # :nodoc:
|
||||
ActiveStorage::TransformJob.perform_later(self, transformations) if representable?
|
||||
end
|
||||
|
Loading…
Reference in New Issue
Block a user