2015-08-23 08:32:18 +00:00

# Extending LFS

Teams who use Git LFS often have custom requirements for how the pointer files and blobs should be handled. Some examples of extensions that could be built:

* Compress large files on clean, uncompress them on smudge/fetch
* Encrypt files on clean, decrypt on smudge/fetch
* Scan files on clean to make sure they don't contain sensitive information

The basic extensibility model is that LFS extensions must be registered explicitly, and they will be invoked on clean and smudge to manipulate the contents of the files as needed. On clean, LFS itself ensures that the pointer file is updated with all the information needed to smudge correctly, and the extensions never modify the pointer file directly.

Note that LFS is currently transitioning away from using the Git smudge filter, in favor of smudging all files using "git-lfs fetch" post checkout. However, that detail should be transparent to extensions, since they are still invoked on a per-file basis.

## Registration

To register an LFS extension, add it to the Git config. Each extension needs to define:

* Its unique name. This will be used as part of the key in the pointer file.
* The command to run on clean
* The command to run on smudge/fetch
* The priority of the extension, which must be a unique, non-negative integer

The sequence "%f" in the clean and smudge commands will be replaced by the filename being processed.
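
One way to apply the `%f` substitution is sketched below. It assumes the command string is split with shell-style quoting first and `%f` is then replaced inside each argument, so a filename containing spaces or shell metacharacters stays a single argument and is never interpreted by a shell. The helper name `expand_command` is hypothetical, not part of LFS.

```python
import shlex


def expand_command(template: str, filename: str) -> list[str]:
    """Turn a clean/smudge command template into an argument vector.

    Hypothetical helper: the template is split first and %f is substituted
    afterwards, so the filename always ends up inside a single argument.
    """
    return [token.replace("%f", filename) for token in shlex.split(template)]
```

For example, `expand_command("foo clean %f", "assets/my file.psd")` gives `["foo", "clean", "assets/my file.psd"]`.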

Here's an example extension registration in the Git config:

```
[lfs "extension.foo"]
	clean = foo clean %f
	smudge = foo smudge %f
	priority = 0

[lfs "extension.bar"]
	clean = bar clean %f
	smudge = bar smudge %f
	priority = 1
```
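
As a sketch of how these registrations might be read back, the Python below parses lines in the `key value` form printed by `git config --get-regexp '^lfs\.extension\.'` and orders the extensions by priority. Using that command as the input source, and the `Extension` and `parse_extensions` names, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Extension:
    name: str
    clean: str
    smudge: str
    priority: int


def parse_extensions(config_lines: list[str]) -> list[Extension]:
    """Parse `git config --get-regexp` output into extensions sorted by priority.

    Each line looks like "lfs.extension.foo.clean foo clean %f". A missing
    clean, smudge or priority key raises KeyError; a non-integer priority
    raises ValueError.
    """
    prefix = "lfs.extension."
    fields: dict[str, dict[str, str]] = {}
    for line in config_lines:
        key, _, value = line.strip().partition(" ")
        if not key.startswith(prefix):
            continue  # unrelated config entry
        # "foo.clean" -> name "foo", attribute "clean"
        name, _, attr = key[len(prefix):].rpartition(".")
        fields.setdefault(name, {})[attr] = value
    extensions = [
        Extension(name, f["clean"], f["smudge"], int(f["priority"]))
        for name, f in fields.items()
    ]
    # Lowest priority runs first on clean.
    return sorted(extensions, key=lambda ext: ext.priority)
```

With the config above, `parse_extensions` would return `foo` (priority 0) followed by `bar` (priority 1), the order in which they run on clean.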

## Clean

When staging a file, Git invokes the LFS clean filter, as described earlier. If no extensions are installed, the LFS clean filter reads bytes from STDIN, calculates their SHA-256 hash, and writes the bytes to a temp file. It then moves the temp file into the appropriate place in .git/lfs/objects and writes a valid pointer file to STDOUT.
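
The no-extension clean path can be sketched as follows. The Python assumes the whole input fits in memory rather than being streamed, and it stores objects in the `<oid[0:2]>/<oid[2:4]>/<oid>` fan-out layout that current Git LFS uses under `.git/lfs/objects`; treat that layout, and the function name `clean_without_extensions`, as assumptions of the sketch.

```python
import hashlib
import os
import tempfile
from pathlib import Path

VERSION_LINE = "version https://git-lfs.github.com/spec/v1"


def clean_without_extensions(data: bytes, objects_dir: Path) -> str:
    """Store `data` under objects_dir and return the pointer file text."""
    oid = hashlib.sha256(data).hexdigest()
    # Write a temp file inside objects_dir first, so the final move is a
    # rename on the same filesystem and therefore atomic.
    objects_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=objects_dir)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    # Assumed layout: objects/<oid[0:2]>/<oid[2:4]>/<oid>
    target = objects_dir / oid[0:2] / oid[2:4] / oid
    target.parent.mkdir(parents=True, exist_ok=True)
    os.replace(tmp_path, target)
    return f"{VERSION_LINE}\noid sha256:{oid}\nsize {len(data)}\n"
```

In a real filter, `data` would come from `sys.stdin.buffer` and the returned text would be written to STDOUT.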

When an extension is installed, LFS will invoke the extension to do additional processing on the bytes before writing them into the temp file. If multiple extensions are installed, they are invoked in the order defined by their priority. LFS will also insert a key in the pointer file for each extension that was invoked, indicating both the order in which the extension was invoked and the oid of the file before that extension was invoked. All of that information is required to reliably smudge the file later. Each new line in the pointer file will be of the form:

`ext-{order}-{name} {hash-method}:{hash-of-input-to-extension}`

This naming ensures that all extensions are written in both alphabetical and priority order, and also shows the progression of changes to the oid as it is processed by the extensions. Alphabetical and invocation order only coincide while `{order}` is a single digit, since `ext-10` sorts before `ext-2`.
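
A small check of the naming scheme: the Python below formats `ext-{order}-{name}` lines and tests whether sorting them alphabetically keeps invocation order. The function names `ext_line` and `sorts_in_invocation_order` are hypothetical.

```python
def ext_line(order: int, name: str, input_oid: str, hash_method: str = "sha256") -> str:
    """Format one pointer line: ext-{order}-{name} {hash-method}:{hash}."""
    return f"ext-{order}-{name} {hash_method}:{input_oid}"


def sorts_in_invocation_order(names: list[str]) -> bool:
    """True if sorting the ext lines alphabetically keeps invocation order."""
    lines = [ext_line(order, name, "0" * 64) for order, name in enumerate(names)]
    return sorted(lines) == lines
```

With up to ten extensions the check passes regardless of their names, because the digit after `ext-` decides the comparison. With eleven it fails, because `ext-10` sorts before `ext-2`.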

Here's an example sequence, assuming extensions foo and bar are installed, as shown in the previous section.

* Git passes the original contents of the file to LFS clean over STDIN
* LFS reads those bytes and calculates the original SHA-256 hash as it does so
* LFS streams the bytes to STDIN of foo clean, which is expected to write those bytes, modified or not, to its STDOUT
* LFS reads the bytes from STDOUT of foo clean, calculates the SHA-256 hash, and writes them to STDIN of bar clean, which then writes those bytes, modified or not, to its STDOUT
* LFS reads the bytes from STDOUT of bar clean, calculates the SHA-256 hash, and writes the bytes to a temp file
* When finished, LFS atomically moves the temp file into .git/lfs/objects, as before
* LFS generates the pointer file, with some changes:
  * The oid and size keys are calculated from the final bytes written into the LFS storage
  * LFS also writes keys named ext-0-foo and ext-1-bar into the pointer, along with their respective input oids
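
The clean sequence above can be sketched in Python as a chain of functions. Each extension is modeled as a `bytes -> bytes` function standing in for the external clean command, the list is assumed to be already sorted by priority, and the whole file is buffered in memory instead of streamed; all three are simplifications. `run_clean_chain`, `build_pointer` and `sha256_hex` are hypothetical names.

```python
import hashlib
from typing import Callable

# Stand-in for an external clean command: bytes on STDIN -> bytes on STDOUT.
CleanFn = Callable[[bytes], bytes]

VERSION_LINE = "version https://git-lfs.github.com/spec/v1"


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def run_clean_chain(data: bytes, extensions: list[tuple[str, CleanFn]]) -> tuple[bytes, list[str]]:
    """Run extensions in priority order, recording each one's input oid."""
    ext_lines = []
    for order, (name, clean) in enumerate(extensions):
        # Hash what this extension receives, then let it transform the bytes.
        ext_lines.append(f"ext-{order}-{name} sha256:{sha256_hex(data)}")
        data = clean(data)
    return data, ext_lines


def build_pointer(data: bytes, extensions: list[tuple[str, CleanFn]]) -> tuple[bytes, str]:
    """Return (bytes to store, pointer text); oid and size describe the final bytes."""
    final, ext_lines = run_clean_chain(data, extensions)
    lines = [VERSION_LINE, *ext_lines, f"oid sha256:{sha256_hex(final)}", f"size {len(final)}"]
    return final, "\n".join(lines) + "\n"
```

If `foo` upper-cased and `bar` reversed the bytes, `b"abc"` would be stored as `b"CBA"`, and the pointer would record the hashes of `b"abc"` and `b"ABC"` as the inputs to `foo` and `bar`.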

Here's an example pointer file, for a file processed by extensions foo and bar:

```
version https://git-lfs.github.com/spec/v1
ext-0-foo sha256:{original hash}
ext-1-bar sha256:{hash after foo}
oid sha256:{hash after bar}
size 123
```

The pointer file ends with a newline (`\n`) after the last line.
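
Reading such a pointer back is the first thing smudge has to do. This Python sketch parses the example format into the final oid, the size, and the recorded extensions, sorted numerically by `{order}`. It assumes every hash uses `sha256` and that extension names match `\w+`; `Pointer` and `parse_pointer` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

VERSION_LINE = "version https://git-lfs.github.com/spec/v1"
EXT_KEY = re.compile(r"^ext-(\d+)-(\w+)$")


@dataclass
class Pointer:
    oid: str
    size: int
    # (order, name, input oid) per extension, sorted by invocation order.
    extensions: list[tuple[int, str, str]] = field(default_factory=list)


def parse_pointer(text: str) -> Pointer:
    if not text.startswith(VERSION_LINE + "\n") or not text.endswith("\n"):
        raise ValueError("not a pointer file")
    values: dict[str, str] = {}
    extensions: list[tuple[int, str, str]] = []
    for line in text[:-1].split("\n")[1:]:
        key, _, value = line.partition(" ")
        match = EXT_KEY.match(key)
        if match:
            # Assumes the hash method is sha256, e.g. "sha256:<hex>".
            extensions.append((int(match.group(1)), match.group(2), value.removeprefix("sha256:")))
        else:
            values[key] = value
    # Sort numerically, so order 10 lands after order 2 even though it
    # sorts before it as text.
    extensions.sort()
    return Pointer(values["oid"].removeprefix("sha256:"), int(values["size"]), extensions)
```

Smudge would then walk `extensions` from the last entry to the first.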

Note: as an optimization, if an extension just does a pass-through, its key can be omitted from the pointer file. This makes smudging the file a bit more efficient, since that extension can be skipped. LFS can detect a pass-through extension because its input and output oids will be the same.
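
The pass-through rule amounts to a filter over the steps recorded while cleaning. In this Python sketch each step is an `(order, name, input_oid, output_oid)` tuple. That shape, the `drop_passthrough` name, and the choice to keep the original `{order}` numbers (leaving gaps) are assumptions, since the text doesn't say whether the remaining extensions are renumbered.

```python
def drop_passthrough(steps: list[tuple[int, str, str, str]]) -> list[str]:
    """Return pointer lines only for extensions that changed the data.

    steps: (order, name, input_oid, output_oid) for each invoked extension,
    in invocation order. Equal oids mean the extension passed bytes through.
    """
    return [
        f"ext-{order}-{name} sha256:{input_oid}"
        for order, name, input_oid, output_oid in steps
        if input_oid != output_oid
    ]
```

A scanning extension like the third example use case would typically be a pass-through, so it would never appear in the pointer.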

This implies that extensions must have no side effects other than writing to their STDOUT. Otherwise LFS has no way to know which extensions modified a file.
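
To make the contract concrete, here is a sketch of a compression extension in Python, in the spirit of the first example use case. It only reads STDIN, writes STDOUT, and reports failures on STDERR with a non-zero exit code. The name `zext`, the use of zlib, and the usage text are hypothetical; as a standalone script it would end with `sys.exit(main(sys.argv))`.

```python
import sys
import zlib


def clean(data: bytes) -> bytes:
    # zlib output depends only on the input, level and zlib version, so
    # re-cleaning an unchanged file normally yields the same oid.
    return zlib.compress(data, 9)


def smudge(data: bytes) -> bytes:
    # Raises zlib.error if the input was not produced by clean().
    return zlib.decompress(data)


def main(argv: list[str]) -> int:
    """Entry point for "zext clean %f" / "zext smudge %f" (hypothetical).

    The filename argument is accepted but unused: the data arrives on STDIN,
    and the only side effect is writing the result to STDOUT.
    """
    if len(argv) < 2 or argv[1] not in ("clean", "smudge"):
        print("usage: zext clean|smudge <file>", file=sys.stderr)
        return 2
    data = sys.stdin.buffer.read()
    try:
        out = clean(data) if argv[1] == "clean" else smudge(data)
    except zlib.error as exc:
        print(f"zext {argv[1]}: {exc}", file=sys.stderr)
        return 1
    sys.stdout.buffer.write(out)
    sys.stdout.buffer.flush()
    return 0
```

It could then be registered with `clean = zext clean %f` and `smudge = zext smudge %f`.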

## Smudge

When a file is checked out, Git invokes the LFS smudge filter, as described earlier. If no extensions are installed, the LFS smudge filter inspects the first 100 bytes read from STDIN. If they are the start of a pointer file, it uses the oid to find the correct object in the LFS storage and writes those bytes to STDOUT so that Git can write them to the working directory.
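
Deciding whether the input is a pointer at all might look like the Python below. It assumes that checking the first 100 bytes for the `version https://git-lfs.github.com/spec/v1` line is enough; real detection may also validate the remaining keys. The bytes already consumed are returned so non-pointer content can be passed through unchanged. `peek_pointer` is a hypothetical name.

```python
from typing import BinaryIO

POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1\n"
PEEK_SIZE = 100  # the smudge filter inspects only the first 100 bytes


def peek_pointer(stream: BinaryIO) -> tuple[bool, bytes]:
    """Read up to PEEK_SIZE bytes and report whether they start a pointer file.

    The bytes already consumed are returned so the caller can either keep
    parsing the pointer or write them (and the rest of the stream) to STDOUT.
    """
    head = stream.read(PEEK_SIZE)
    return head.startswith(POINTER_PREFIX), head
```

A filter would call it with `sys.stdin.buffer`; on a non-pointer it writes `head` and then copies the rest of STDIN to STDOUT.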

If the pointer file indicates that extensions were invoked on that file, then those extensions must be installed in order to smudge it. If they are not installed, not found, or unusable for any reason, LFS will fail to smudge the file and output an error indicating which extension is missing.

Each of the extensions indicated in the pointer file must be invoked in reverse order to undo the changes they made to the contents of the file. After each extension is invoked, LFS will compare the SHA-256 hash of the bytes output by the extension with the oid stored in the pointer file as the original input to that same extension. Those hashes must match; otherwise the extension did not undo its changes correctly. In that case, LFS fails to smudge the file and outputs an error indicating which extension is failing.
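
The reverse-order verification can be sketched in Python. Installed extensions are modeled as `bytes -> bytes` smudge functions keyed by name, standing in for the external commands, and `recorded` holds the `(order, name, input oid)` entries from the pointer. `smudge_chain` and `ExtensionError` are hypothetical names.

```python
import hashlib
from typing import Callable

SmudgeFn = Callable[[bytes], bytes]


class ExtensionError(Exception):
    """Raised when an extension is missing or fails to undo its changes."""


def smudge_chain(blob: bytes, recorded: list[tuple[int, str, str]],
                 installed: dict[str, SmudgeFn]) -> bytes:
    # Every extension named in the pointer must be available before starting.
    missing = [name for _, name, _ in recorded if name not in installed]
    if missing:
        raise ExtensionError(f"missing extension(s): {', '.join(missing)}")
    data = blob
    # Undo the extensions in reverse invocation order (highest order first).
    for _order, name, expected_oid in sorted(recorded, reverse=True):
        data = installed[name](data)
        # The output must hash to the oid recorded as this extension's input.
        if hashlib.sha256(data).hexdigest() != expected_oid:
            raise ExtensionError(f"extension {name!r} did not undo its changes")
    return data
```

Checking for missing extensions up front means a misconfigured machine fails before any extension runs.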

Here's an example sequence, indicating how LFS will smudge the pointer file shown in the previous section:

* Git passes the bytes of the pointer file to LFS smudge over STDIN. Note that when using "git lfs fetch", LFS reads the files directly from disk rather than off STDIN. The rest of the steps are unaffected either way.
* LFS reads those bytes and inspects them to see if this is a pointer file. If it is not, the bytes are simply passed through to STDOUT.
* Since it is a pointer file, LFS reads the whole file off STDIN, parses it, and determines that extensions foo and bar both processed the file, in that order.
* LFS uses the value of the oid key to find the blob in the .git/lfs/objects folder, or downloads it from the server as needed
* LFS writes the contents of the blob to STDIN of bar smudge, which modifies them as needed and writes them to its STDOUT
* LFS reads the bytes from STDOUT of bar smudge, calculates the SHA-256 hash, and writes the bytes to STDIN of foo smudge, which modifies them as needed and writes them to its STDOUT
* LFS reads the bytes from STDOUT of foo smudge, calculates the SHA-256 hash, and writes the bytes to its own STDOUT
* At the end, LFS ensures that the hashes calculated on the outputs of foo and bar match their corresponding input hashes from the pointer file. If not, it writes a descriptive error message indicating which extension failed to undo its changes.
* Question: On error, should we overwrite the file in the working directory with the original pointer file? Can this be done reliably?

## Handling errors

If there are errors in the configuration of LFS extensions, such as invalid extension names or duplicate priorities, then any LFS commands that rely on them will abort with a descriptive error message.
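
A sketch of these configuration checks in Python. The rules for a valid name are not defined in this document, so the lowercase alphanumeric pattern below is purely an assumption, as are the `Extension` record and the `validate_extensions` name. It collects every problem so the user sees them all at once rather than one per run.

```python
import re
from dataclasses import dataclass

# Assumption: names limited to characters that are safe inside an
# "ext-{order}-{name}" pointer key; the real rule may differ.
NAME_RE = re.compile(r"^[a-z0-9]+$")


@dataclass
class Extension:
    name: str
    clean: str
    smudge: str
    priority: int


def validate_extensions(extensions: list[Extension]) -> list[str]:
    """Return descriptive error messages; an empty list means the config is usable."""
    errors = []
    seen_names: set[str] = set()
    seen_priorities: dict[int, str] = {}
    for ext in extensions:
        if not NAME_RE.match(ext.name):
            errors.append(f"invalid extension name {ext.name!r}")
        if ext.name in seen_names:
            errors.append(f"duplicate extension name {ext.name!r}")
        seen_names.add(ext.name)
        if ext.priority < 0:
            errors.append(f"extension {ext.name!r} has negative priority {ext.priority}")
        elif ext.priority in seen_priorities:
            errors.append(
                f"extensions {seen_priorities[ext.priority]!r} and {ext.name!r} "
                f"share priority {ext.priority}"
            )
        else:
            seen_priorities[ext.priority] = ext.name
        if not ext.clean or not ext.smudge:
            errors.append(f"extension {ext.name!r} needs both clean and smudge commands")
    return errors
```

A command would abort with these messages whenever the returned list is non-empty.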

If an extension is unable to perform its task, it can indicate this error by exiting with a non-zero exit code and writing a descriptive error message to its STDERR. The behavior on an error depends on whether we are cleaning or smudging.
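
Running an extension and turning a non-zero exit code into a descriptive error could look like this Python sketch. It buffers STDIN and STDOUT in memory and also treats a command that cannot be started (for example, because it is not found) as a failure; `run_extension` and `ExtensionFailed` are hypothetical names.

```python
import subprocess


class ExtensionFailed(Exception):
    """Raised when an extension cannot run or exits with a non-zero code."""


def run_extension(name: str, argv: list[str], data: bytes) -> bytes:
    """Run one extension command, feeding `data` on STDIN, and return its STDOUT."""
    try:
        result = subprocess.run(argv, input=data, capture_output=True, check=False)
    except OSError as exc:  # e.g. the command is not installed
        raise ExtensionFailed(f"extension {name!r} could not be run: {exc}") from exc
    if result.returncode != 0:
        # Surface the extension's own STDERR so the user can diagnose it.
        message = result.stderr.decode("utf-8", errors="replace").strip()
        raise ExtensionFailed(
            f"extension {name!r} exited with code {result.returncode}: {message}"
        )
    return result.stdout
```

The caller decides what a failure means: skip the file on clean, or leave the pointer in place on smudge.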

### Clean

If an extension fails to clean a file, it will return a non-zero exit code and write an error message to its STDERR. Because the file was not cleaned correctly, it can't be added to the index. LFS will ensure that no pointer file is added or updated for failed files. In addition, it will display the error messages for any files that could not be cleaned (and keep those errors in a log), so that the user can diagnose the failure and then rerun "git add" on those files.

### Smudge

If an extension fails to smudge a file, it will return a non-zero exit code and write an error message to its STDERR. Because the file was not smudged correctly, LFS cannot update that file in the working directory. LFS will ensure that the pointer file is written to both the index and the working directory. In addition, it will display the error messages for any files that could not be smudged (and keep those errors in a log), so that the user can diagnose the failure and then rerun "git-lfs fetch" to fix up any remaining pointer files.