promote extensions to real feature
This commit is contained in:
parent
c47cd718e3
commit
6480bffd8e
184
docs/extensions.md
Normal file
184
docs/extensions.md
Normal file
@ -0,0 +1,184 @@
|
||||
# Extending LFS
|
||||
|
||||
Teams who use Git LFS often have custom requirements for how the pointer files
|
||||
and blobs should be handled. Some examples of extensions that could be built:
|
||||
|
||||
* Compress large files on clean, uncompress them on smudge/fetch
|
||||
* Encrypt files on clean, decrypt on smudge/fetch
|
||||
* Scan files on clean to make sure they don't contain sensitive information
|
||||
|
||||
The basic extensibilty model is that LFS extensions must be registered
|
||||
explicitly, and they will be invoked on clean and smudge to manipulate the
|
||||
contents of the files as needed. On clean, LFS itself ensures that the pointer
|
||||
file is updated with all the information needed to be able to smudge correctly,
|
||||
and the extensions never modify the pointer file directly.
|
||||
|
||||
NOTE: This feature is considered experimental, and included so developers can
|
||||
work on extensions. Exact details of how extensions work are subject to change
|
||||
based on feedback. It is possible for buggy extensions to leave your repository
|
||||
in a bad state, so don't rely on them with a production git repository without
|
||||
extensive testing.
|
||||
|
||||
## Registration
|
||||
|
||||
To register an LFS extension, it must be added to the Git config. Each
|
||||
extension needs to define:
|
||||
|
||||
* Its unique name. This will be used as part of the key in the pointer file.
|
||||
* The command to run on clean (when files are added to git).
|
||||
* The command to run on smudge (when files are downloaded and checked out).
|
||||
* The priority of the extension, which must be a unique, non-negative integer.
|
||||
|
||||
The sequence `%f` in the clean and smudge commands will be replaced by the
|
||||
filename being processed.
|
||||
|
||||
Here's an example extension registration in the Git config:
|
||||
|
||||
```
|
||||
[lfs "extension.foo"]
|
||||
clean = foo clean %f
|
||||
smudge = foo smudge %f
|
||||
priority = 0
|
||||
[lfs "extension.bar"]
|
||||
clean = bar clean %f
|
||||
smudge = bar smudge %f
|
||||
priority = 1
|
||||
```
|
||||
|
||||
## Clean
|
||||
|
||||
When staging a file, Git invokes the LFS clean filter, as described earlier. If
|
||||
no extensions are installed, the LFS clean filter reads bytes from STDIN,
|
||||
calculates the SHA-256 signature, and writes the bytes to a temp file. It then
|
||||
moves the temp file into the appropriate place in .git/lfs/objects and writes a
|
||||
valid pointer file to STDOUT.
|
||||
|
||||
When an extension is installed, LFS will invoke the extension to do additional
|
||||
processing on the bytes before writing them into the temp file. If multiple
|
||||
extensions are installed, they are invoked in the order defined by their
|
||||
priority. LFS will also insert a key in the pointer file for each extension
|
||||
that was invoked, indicating both the order that the extension was invoked and
|
||||
the oid of the file before that extension was invoked. All of that information
|
||||
is required to be able to reliably smudge the file later. Each new line in the
|
||||
pointer file will be of the form:
|
||||
|
||||
`ext-{order}-{name} {hash-method}:{hash-of-input-to-extension}`
|
||||
|
||||
This naming ensures that all extensions are written in both alphabetical and
|
||||
priority order, and also shows the progression of changes to the oid as it is
|
||||
processed by the extensions.
|
||||
|
||||
Here's an example sequence, assuming extensions foo and bar are installed, as
|
||||
shown in the previous section.
|
||||
|
||||
* Git passes the original contents of the file to LFS clean over STDIN.
|
||||
* LFS reads those bytes and calculates the original SHA-256 signature.
|
||||
* LFS streams the bytes to STDIN of `foo clean`, which is expected to write
|
||||
those bytes, modified or not, to its STDOUT.
|
||||
* LFS reads the bytes from STDOUT of `foo clean`, calculates the SHA-256
|
||||
signature, and writes them to STDIN of `bar clean`, which then writes those
|
||||
bytes, modified or not, to its STDOUT.
|
||||
* LFS reads the bytes from STDOUT of `bar clean`, calculates the SHA-256
|
||||
signature, and writes the bytes to a temp file.
|
||||
* When finished, LFS atomically moves the temp file into `.git/lfs/objects`.
|
||||
* LFS generates the pointer file, with some changes:
|
||||
* The oid and size keys are calculated from the final bytes written to LFS
|
||||
local storage.
|
||||
* LFS also writes keys named `ext-0-foo` and `ext-1-bar` into the pointer, along
|
||||
with their respective input oids.
|
||||
|
||||
Here's an example pointer file, for a file processed by extensions foo and bar:
|
||||
|
||||
```
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
ext-0-foo sha256:{original hash}
|
||||
ext-1-bar sha256:{hash after foo}
|
||||
oid sha256:{hash after bar}
|
||||
size 123
|
||||
(ending \n)
|
||||
```
|
||||
|
||||
Note: as an optimization, if an extension just does a pass-through, its key can
|
||||
be omitted from the pointer file. This will make smudging the file a bit more
|
||||
efficient since that extension can be skipped. LFS can detect a pass-through
|
||||
extension because the input and output oids will be the same.
|
||||
|
||||
This implies that extensions must have no side effects other than writing to
|
||||
their STDOUT. Otherwise LFS has no way to know what extensions modified a file.
|
||||
|
||||
## Smudge
|
||||
|
||||
When a file is checked out, Git invokes the LFS smudge filter, as described
|
||||
earlier. If no extensions are installed, the LFS smudge filter inspects the
|
||||
first 100 bytes of the bytes off STDIN, and if it is a pointer file, uses the
|
||||
oid to find the correct object in the LFS storage, and writes those bytes to
|
||||
STDOUT so that Git can write them to the working directory.
|
||||
|
||||
If the pointer file indicates that extensions were invoked on that file, then
|
||||
those extensions must be installed in order to smudge. If they are not
|
||||
installed, not found, or unusable for any reason, LFS will fail to smudge the
|
||||
file, and outputs an error indicating which extension is missing.
|
||||
|
||||
Each of the extensions indicated in the pointer file must be invoked in reverse
|
||||
order to undo the changes they made to the contents of the file. After each
|
||||
extension is invoked, LFS will compare the SHA-256 signature of the bytes output
|
||||
by the extension with the oid stored in the pointer file as the original input
|
||||
to that same extension. Those signatures must match, otherwise the extension
|
||||
did not undo its changes correctly. In that case, LFS fails to smudge the file,
|
||||
and outputs an error indicating which extension is failing.
|
||||
|
||||
Here's an example sequence, indicating how LFS will smudge the pointer file
|
||||
shown in the previous section:
|
||||
|
||||
* Git passes the bytes of the pointer file to LFS smudge over STDIN. Note that
|
||||
when using `git lfs checkout`, LFS reads the files directly from disk rather
|
||||
than off STDIN. The rest of the steps are unaffected either way.
|
||||
* LFS reads those bytes and inspects them to see if this is a pointer file. If
|
||||
it was not, the bytes would just be passed through to STDOUT.
|
||||
* Since it is a pointer file, LFS reads the whole file off STDIN, parses it, and
|
||||
determines that extensions foo and bar both processed the file, in that order.
|
||||
* LFS uses the value of the oid key to find the blob in the `.git/lfs/objects`
|
||||
folder, or download from the server as needed.
|
||||
* LFS writes the contents of the blob to STDIN of `bar smudge`, which modifies
|
||||
them as needed and writes them to its STDOUT.
|
||||
* LFS reads the bytes from STDOUT of `bar smudge`, calculates the SHA-256
|
||||
signature, and writes the bytes to STDIN of `foo smudge`, which modifies them
|
||||
as needed and writes to them its STDOUT.
|
||||
* LFS reads the bytes from STDOUT of `foo smudge`, calculates the SHA-256
|
||||
signature, and writes the bytes to its own STDOUT.
|
||||
* At the end, ensure that the hashes calculated on the outputs of foo and bar
|
||||
match their corresponding input hashes from the pointer file. If not, write a
|
||||
descriptive error message indicating which extension failed to undo its
|
||||
changes.
|
||||
* Question: On error, should we overwrite the file in the working directory with
|
||||
the original pointer file? Can this be done reliably?
|
||||
|
||||
## Handling errors
|
||||
|
||||
If there are errors in the configuration of LFS extensions, such as invalid
|
||||
extension names, duplicate priorities, etc, then any LFS commands that rely on
|
||||
them will abort with a descriptive error message.
|
||||
|
||||
If an extension is unable to perform its task, it can indicate this error by
|
||||
returning a non-zero error code and writing a descriptive error message to its
|
||||
STDERR. The behavior on an error depends on whether we are cleaning or smudging.
|
||||
|
||||
### Clean
|
||||
|
||||
If an extension fails to clean a file, it will return a non-zero error code and
|
||||
write an error message to its STDERR. Because the file was not cleaned
|
||||
correctly, it can't be added to the index. LFS will ensure that no pointer file
|
||||
is added or updated for failed files. In addition, it will display the error
|
||||
messages for any files that could not be cleaned (and keep those errors in a
|
||||
log), so that the user can diagnose the failure, and then rerun "git add" on
|
||||
those files.
|
||||
|
||||
### Smudge
|
||||
|
||||
If an extension fails to smudge a file, it will return a non-zero error code and
|
||||
write an error message to its STDERR. Because the file was not smudged
|
||||
correctly, LFS cannot update that file in the working directory. LFS will
|
||||
ensure that the pointer file is written to both the index and working directory.
|
||||
In addition, it will display the error messages for any files that could not be
|
||||
smudged (and keep those errors in a log), so that the user can diagnose the
|
||||
failure and then rerun `git-lfs checkout` to fix up any remaining pointer files.
|
@ -1,179 +0,0 @@
|
||||
# Extending LFS
|
||||
|
||||
Teams who use Git LFS often have custom requirements for how the pointer files and
|
||||
blobs should be handled. Some examples of extensions that could be built:
|
||||
|
||||
* Compress large files on clean, uncompress them on smudge/fetch
|
||||
* Encrypt files on clean, decrypt on smudge/fetch
|
||||
* Scan files on clean to make sure they don't contain sensitive information
|
||||
|
||||
The basic extensibilty model is that LFS extensions must be registered explicitly, and
|
||||
they will be invoked on clean and smudge to manipulate the contents of the files as
|
||||
needed. On clean, LFS itself ensures that the pointer file is updated with all the
|
||||
information needed to be able to smudge correctly, and the extensions never modify the
|
||||
pointer file directly.
|
||||
|
||||
Note that LFS is currently transitioning away from using the Git smudge filter, in favor
|
||||
of smudging all files using "git-lfs fetch" post checkout. However, that detail should
|
||||
be transparent to extensions, since they are still invoked on a per-file basis.
|
||||
|
||||
## Registration
|
||||
|
||||
To register an LFS extension, it must be added to the Git config. Each extension needs
|
||||
to define:
|
||||
|
||||
* Its unique name. This will be used as part of the key in the pointer file.
|
||||
* The command to run on clean
|
||||
* The command to run on smudge/fetch
|
||||
* The priority of the extension, which must be a unique, non-negative integer
|
||||
|
||||
The sequence "%f" in the clean and smudge commands will be replaced by the filename being
|
||||
processed.
|
||||
|
||||
Here's an example extension registration in the Git config:
|
||||
|
||||
```
|
||||
[lfs "extension.foo"]
|
||||
clean = foo clean %f
|
||||
smudge = foo smudge %f
|
||||
priority = 0
|
||||
[lfs "extension.bar"]
|
||||
clean = bar clean %f
|
||||
smudge = bar smudge %f
|
||||
priority = 1
|
||||
```
|
||||
|
||||
## Clean
|
||||
|
||||
When staging a file, Git invokes the LFS clean filter, as described earlier. If no
|
||||
extensions are installed, the LFS clean filter reads bytes from STDIN, calculates the
|
||||
SHA-256 signature, and writes the bytes to a temp file. It then moves the temp file into
|
||||
the appropriate place in .git/lfs/objects and writes a valid pointer file to STDOUT.
|
||||
|
||||
When an extension is installed, LFS will invoke the extension to do additional processing
|
||||
on the bytes before writing them into the temp file. If multiple extensions are
|
||||
installed, they are invoked in the order defined by their priority. LFS will also insert
|
||||
a key in the pointer file for each extension that was invoked, indicating both the order
|
||||
that the extension was invoked and the oid of the file before that extension was invoked.
|
||||
All of that information is required to be able to reliably smudge the file later. Each
|
||||
new line in the pointer file will be of the form
|
||||
|
||||
`ext-{order}-{name} {hash-method}:{hash-of-input-to-extension} `
|
||||
|
||||
This naming ensures that all extensions are written in both alphabetical and priority
|
||||
order, and also shows the progression of changes to the oid as it is processed by the
|
||||
extensions.
|
||||
|
||||
Here's an example sequence, assuming extensions foo and bar are installed, as shown in
|
||||
the previous section.
|
||||
|
||||
* Git passes the original contents of the file to LFS clean over STDIN
|
||||
* LFS reads those bytes and calculates the original SHA-256 signature as it does so
|
||||
* LFS streams the bytes to STDIN of foo clean, which is expected to write
|
||||
those bytes, modified or not, to its STDOUT
|
||||
* LFS reads the bytes from STDOUT of foo clean, calculates the SHA-256
|
||||
signature, and writes them to STDIN of bar clean, which then writes those
|
||||
bytes, modified or not, to its STDOUT
|
||||
* LFS reads the bytes from STDOUT of bar clean, calculates the SHA-256
|
||||
signature, and writes the bytes to a temp flie
|
||||
* When finished, LFS atomically moves the temp file into .git/lfs/objects, as before
|
||||
* LFS generates the pointer file, with some changes:
|
||||
* The oid and size keys are calculated from the final bytes written into the LFS storage
|
||||
* LFS also writes keys named ext-0-foo and ext-1-bar into the pointer, along
|
||||
with their respective input oid's
|
||||
|
||||
Here's an example pointer file, for a file processed by extensions foo and bar:
|
||||
|
||||
```
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
ext-0-foo sha256:{original hash}
|
||||
ext-1-bar sha256:{hash after foo}
|
||||
oid sha256:{hash after bar}
|
||||
size 123
|
||||
(ending \n)
|
||||
```
|
||||
|
||||
Note: as an optimization, if an extension just does a pass-through, its key can be
|
||||
omitted from the pointer file. This will make smudging the file a bit more efficient
|
||||
since that extension can be skipped. LFS can detect a pass-through extension because the
|
||||
input and output oid's will be the same.
|
||||
|
||||
This implies that extensions must have no side effects other than writing to their STDOUT.
|
||||
Otherwise LFS has no way to know what extensions modified a file.
|
||||
|
||||
|
||||
## Smudge
|
||||
|
||||
When a file is checked out, Git invokes the LFS smudge filter, as described earlier. If
|
||||
no extensions are installed, the LFS smudge filter inspects the first 100 bytes of the
|
||||
bytes off STDIN, and if it is a pointer file, uses the oid to find the correct object in
|
||||
the LFS storage, and writes those bytes to STDOUT so that Git can write them to the
|
||||
working directory.
|
||||
|
||||
If the pointer file indicates that extensions were invoked on that file, then those
|
||||
extensions must be installed in order to smudge. If they are not installed, not found,
|
||||
or unusable for any reason, LFS will fail to smudge the file, and outputs an error
|
||||
indicating which extension is missing.
|
||||
|
||||
Each of the extensions indicated in the pointer file must be invoked in reverse order to
|
||||
undo the changes they made to the contents of the file. After each extension is invoked,
|
||||
LFS will compare the SHA-256 signature of the bytes output by the extension with the oid
|
||||
stored in the pointer file as the original input to that same extension. Those
|
||||
signatures must match, otherwise the extension did not undo its changes correctly. In
|
||||
that case, LFS fails to smudge the file, and outputs an error indicating which extension
|
||||
is failing.
|
||||
|
||||
Here's an example sequence, indicating how LFS will smudge the pointer file shown in the
|
||||
previous section:
|
||||
|
||||
* Git passes the bytes of the pointer file to LFS smudge over STDIN. Note that when
|
||||
using "git lfs fetch", LFS reads the files directly from disk rather than off STDIN. The
|
||||
rest of the steps are unaffected either way.
|
||||
* LFS reads those bytes and inspects them to see if this is a pointer file. If it was
|
||||
not, the bytes would just be passed through to STDOUT.
|
||||
* Since it is a pointer file, LFS reads the whole file off STDIN, parses it, and
|
||||
determines that extensions foo and bar both processed the file, in that order.
|
||||
* LFS uses the value of the oid key to find the blob in the .git/lfs/objects folder, or
|
||||
download from the server as needed
|
||||
* LFS writes the contents of the blob to STDIN of bar smudge, which
|
||||
modifies them as needed and writes them to its STDOUT
|
||||
* LFS reads the bytes from STDOUT of bar smudge, calculates the SHA-256
|
||||
signature, and writes the bytes to STDIN of foo smudge, which modifies them
|
||||
as needed and writes to them its STDOUT
|
||||
* LFS reads the bytes from STDOUT of foo smudge, calculates the SHA-256
|
||||
signature, and writes the bytes to its own STDOUT
|
||||
* At the end, ensure that the hashes calculated on the outputs of foo and bar match their
|
||||
corresponding input hashes from the pointer file. If not, write a descriptive error
|
||||
message indicating which extension failed to undo its changes.
|
||||
* Question: On error, should we overwrite the file in the working directory with the
|
||||
original pointer file? Can this be done reliably?
|
||||
|
||||
|
||||
## Handling errors
|
||||
|
||||
If there are errors in the configuration of LFS extensions, such as invalid extension names,
|
||||
duplicate priorities, etc, then any LFS commands that rely on them will abort with a
|
||||
descriptive error message.
|
||||
|
||||
If an extension is unable to perform its task, it can indicate this error by returning a
|
||||
non-zero error code and writing a descriptive error message to its STDERR. The behavior on
|
||||
an error depends on whether we are cleaning or smudging.
|
||||
|
||||
### Clean
|
||||
|
||||
If an extension fails to clean a file, it will return a non-zero error code and write an
|
||||
error message to its STDERR. Because the file was not cleaned correctly, it can't be added
|
||||
to the index. LFS will ensure that no pointer file is added/updated for failed files. In
|
||||
addition, it will display the error messages for any files that could not be cleaned (and
|
||||
keep those errors in a log), so that the user can diagnose the failure, and then rerun "git
|
||||
add" on those files.
|
||||
|
||||
|
||||
### Smudge
|
||||
|
||||
If an extension fails to smudge a file, it will return a non-zero error code and write an
|
||||
error message to its STDERR. Because the file was not smudged correctly, LFS cannot update
|
||||
that file in the working directory. LFS will ensure that the pointer file is written to
|
||||
both the index and working directory. In addition, it will display the error messages for
|
||||
any files that could not be smudged (and keep those errors in a log), so that the user can
|
||||
diagnose the failure and then rerun "git-lfs fetch" to fix up any remaining pointer files.
|
Loading…
Reference in New Issue
Block a user