git-lfs/docs/proposals/transfer_adapters.md
Dimitris Apostolou 21b0402690
Fix typos
2022-01-05 08:49:08 +02:00

4.3 KiB

Transfer adapters for resumable upload / download

Concept

To allow the uploading and downloading of LFS content to be implemented in more ways than the current simple HTTP GET/PUT approach. Features that could be supported by opening this up to other protocols might include:

  • Resumable transfers
  • Block-level de-duplication
  • Delegation to 3rd party services like Dropbox / Google Drive / OneDrive
  • Non-HTTP services

API extensions

See the API documentation for specifics. All changes are optional extras so there are no breaking changes to the API.

The current HTTP GET/PUT system will remain the default. When a version of the git-lfs client supports alternative transfer mechanisms, it notifies the server in the API request using the accept-transfers field.

If the server also supports one of the mechanisms the client advertised, it may select one and alter the upload / download URLs to point at resources compatible with this transfer mechanism. It must also indicate the chosen transfer mechanism in the response using the transfer field.

The URLs provided in this case may not be HTTP, they may be custom protocols. It is up to each individual transfer mechanism to define how URLs are used.

Client extensions

Phase 1: refactoring & abstraction

  1. Introduce a new concept of 'transfer adapter'.
  2. Adapters can provide either upload or download support, or both. This is necessary because some mechanisms are unidirectional, e.g. HTTP Content-Range is download only, tus.io is upload only.
  3. Refactor our current HTTP GET/PUT mechanism to be the default implementation for both upload & download
  4. The LFS core will pass oids to transfer to this adapter in bulk, and receive events back from the adapter for transfer progress, and file completion.
  5. Each adapter is responsible for its own parallelism, but should respect the lfs.concurrenttransfers setting. For example the default (current) approach will parallelise on files (oids), but others may parallelise in other ways e.g. downloading multiple parts of the same file at once
  6. Each adapter should store its own temporary files. On file completion it must notify the core which in the case of a download is then responsible for moving a completed file into permanent storage.
  7. Update the core to have a registry of available transfer mechanisms which it passes to the API, and can recognise a chosen one in the response. Default to our refactored original.

Phase 2: basic resumable downloads

  1. Add a client transfer adapter for HTTP Range headers
  2. Add a range request reference implementation to our integration test server

Phase 3: basic resumable uploads

  1. Add a client transfer adapter for tus.io (upload only)
  2. Add a tus.io reference implementation to our integration test server

Phase 4: external transfer adapters

Ideally we should allow people to add other transfer implementations so that we don't have to implement everything, or bloat the git-lfs binary with every custom system possible.

Because Go is statically linked it's not possible to extend client functionality at runtime through loading libraries, so instead I propose allowing an external process to be invoked, and communicated with via a defined stream protocol. This protocol will be logically identical to the internal adapters; the core passing oids and receiving back progress and completion notifications; just that the implementation will be in an external process and the messages will be serialised over streams.

Only one process will be launched and will remain for the entire period of all transfers. Like internal adapters, the external process will be responsible for its own parallelism and temporary storage, so internally they can (should) do multiple transfers at once.

  1. Build a generic 'external' adapter which can invoke a named process and communicate with it using the standard stream protocol (probably just over stdout / stdin)
  2. Establish a configuration for external adapters; minimum is an identifier (client and server must agree on what that is) and a path to invoke
  3. Implement a small test process in Go which simply wraps the default HTTP mechanism in an external process, to prove the approach (not in release)