Fix 429 retry-after handling of the LFS batch API endpoint

While the 429 retry-after HTTP error handling for the storage endpoint works
just as expected, the same was not true for batch API endpoint. In case of a
429 error, the call to the batch API endpoint would be retried as well as the
`retry-after` HTTP header honoured correctly, but the LFS command would still
exit with a generic `LFS: Error` message. This was caused by the fact, that
while the retry-able error from the `Batch` method was correctly overwritten by
`nil` in case it of a retry, it would finally again be converted into a
retry-able error and hence be no longer `nil`. This would lead to a bubble-up of
the original 429 retry-able HTTP error to the final error check, altough the
exact operation was successfully retried. Furthermore, the overwrite with
`nil` during correct handling of the first object caused all subsequent objects
to fail and hence never being enqueued for retrial. This was solved by
removing the immediate overwrite by `nil` in case of a retry-able-later
error and doing the final error conversion based on the whole batch
result instead of a single object result.
This commit is contained in:
Fabio Huser 2021-08-08 20:22:06 +02:00
parent 32f571d81d
commit a3ecbcc7f6

@ -557,22 +557,30 @@ func (q *TransferQueue) enqueueAndCollectRetriesFor(batch batch) (batch, error)
var err error
bRes, err = Batch(q.manifest, q.direction, q.remote, q.ref, batch.ToTransfers())
if err != nil {
var hasNonScheduledErrors = false
// If there was an error making the batch API call, mark all of
// the objects for retry, and return them along with the error
// that was encountered. If any of the objects couldn't be
// retried, they will be marked as failed.
for _, t := range batch {
if q.canRetryObject(t.Oid, err) {
hasNonScheduledErrors = true
enqueueRetry(t, err, nil)
} else if readyTime, canRetry := q.canRetryObjectLater(t.Oid, err); canRetry {
err = nil
enqueueRetry(t, err, &readyTime)
} else {
hasNonScheduledErrors = true
q.wait.Done()
}
}
return next, errors.NewRetriableError(err)
// Only return error and mark operation as failure if at least one object
// was not enqueued for retrial at a later point.
if hasNonScheduledErrors {
return next, errors.NewRetriableError(err)
} else {
return next, nil
}
}
}