131fd59ff6
This fixes race condition in Active Storage when multiple preprocessed variants are defined for a `Previewable` file is attached. When a variant is specified for a "previewable" file type (e.g. video or PDF) attachment, a `preview_image` attachment is first created and attached on the original blob and then any user-specified variants are derived from _that_ preview image. When those variants are named and have `preprocessed: true`, the jobs to create those variants are queued simultaneously. Example from my case: ```ruby has_one_attached :file, dependent: :purge_later do |attachable| attachable.variant :preview, resize_to_fill: [400, 400], preprocessed: true attachable.variant :still, format: "jpg", saver: {quality: 85}, preprocessed: true end ``` When a `Previewable` attachment is created (a video, in my case), `TransformJob.perform_later` is called for each named variant with `preprocessed: true`. Unless your queue adapter is synchronous (e.g. :inline or :test), this results in a race condition in which every such variant's worker will check `processed?`, see that no `preview_image` attachment exists yet on the `ActiveStorage::Blob`, and: 1. Redundantly download the file from storage 2. Create duplicative ActiveStorage::Attachment and `ActiveStorage::Blob` records for the `preview_image` attachment (all but one of which will be orphaned from the original blob's `has_one_attached :preview_image`) 3. Create variant blobs (and associated `ActiveStorage::VariantRecord`) that are similarly orphaned (by virtue of being a variant of an orphaned `preview_image` blob) As a result, if the video is ever purged, `PurgeJob` will only find the current `has_one_attached :preview_image` and whatever variant demanded it into existence, then leave the rest as orphaned records in the database and in storage. Pretty simple: wrap the first step of the job in `blob.with_lock {}`. By pessimistically locking on the blob, we can prevent processing the preview image multiple times by multiple `TransformJob` jobs running concurrently. Alternate approaches would all be more work: * Queuing a `PreviewImage` job instead of N `TransformJob` and have it, only after `preview_image` is attached, enqueue those `TransformJob` jobs * Batching up all the named variant transformations into a single meta-job Writing a test for this inside Rails would be difficult because it would require running the resulting TransformJob jobs concurrently. I [started a test](https://github.com/searls/rails/blob/fix-video-duplicate-preview-variants/activestorage/test/models/variant_with_record_test.rb#L348-L367) but failed to reproduce, in part because the test queue adapter will perform enqueued jobs inline instead of concurrently. In order to write a test that replicated the issue appropriately, we might first need a new option for `perform_enqueued_jobs(async: true) { … }` If you're interested, [this gist](https://gist.github.com/searls/5b8298abe88b3206f670ea3c6d574aab) includes a driver script and output before and after the patch showing it working. I only found this because I'm a total cheapskate and was literally counting records in my S3 bucket to ensure `PurgeJob` worked. Then I wasted the next two days trying to figure out why before landing on this. I strongly suspect that ActiveStorage users who host video and take advantage of `preprocessed: true` named variants will have a lot of orphaned stuff floating around their buckets. To see if you have any such "zombie" preview_images (and presumably, associated variants) floating around your application that would survive calls to `purge` on the owning attachment, you could write a query like this: ``` ActiveStorage::Attachment .joins("INNER JOIN active_storage_attachments as other_attachments ON active_storage_attachments.record_id = other_attachments.record_id AND active_storage_attachments.id != other_attachments.id") .where( :name => "preview_image", :record_type => "ActiveStorage::Blob", "other_attachments.name" => "preview_image", "other_attachments.record_type" => "ActiveStorage::Blob" ) .distinct ``` Clearing out one's production database and backend storage to get this all right-sized should be a fun exercise for the reader. Co-authored-by: Aaron Patterson <aaron.patterson@gmail.com> |
||
---|---|---|
.. | ||
analyze_job.rb | ||
base_job.rb | ||
mirror_job.rb | ||
preview_image_job.rb | ||
purge_job.rb | ||
transform_job.rb |