Merge pull request #483 from github/proposals

Proposals
risk danger olson 2015-07-21 13:43:31 -06:00
commit a1efa582eb
2 changed files with 63 additions and 55 deletions

docs/proposals/README.md Normal file

@ -0,0 +1,8 @@
# Git LFS Proposals

This directory contains high-level proposals for future Git LFS features.
Inclusion here does not guarantee when or if a feature will make it into Git
LFS. It doesn't even guarantee that the specifics won't change.

Everyone is welcome to submit their own proposal as a markdown file in a
pull request for discussion.

@ -1,6 +1,6 @@
# Extending LFS

Teams who use Git LFS often have custom requirements for how the pointer files and
blobs should be handled. Some examples of extensions that could be built:

* Compress large files on clean, uncompress them on smudge/fetch
@ -8,18 +8,18 @@ blobs should be handled. Some examples of extensions that could be built:
* Scan files on clean to make sure they don't contain sensitive information

The basic extensibility model is that LFS extensions must be registered explicitly, and
they will be invoked on clean and smudge to manipulate the contents of the files as
needed. On clean, LFS itself ensures that the pointer file is updated with all the
information needed to be able to smudge correctly, and the extensions never modify the
pointer file directly.

Note that LFS is currently transitioning away from using the Git smudge filter, in favor
of smudging all files using "git-lfs fetch" post checkout. However, that detail should
be transparent to extensions, since they are still invoked on a per-file basis.

## Registration

To register an LFS extension, it must be added to the Git config. Each extension needs
to define:

* Its unique name. This will be used as part of the key in the pointer file.
@ -27,7 +27,7 @@ to define:
* The command to run on smudge/fetch
* The priority of the extension, which must be a unique, non-negative integer

The sequence "%f" in the clean and smudge commands will be replaced by the filename being
processed.
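As a rough illustration of the "%f" substitution, here is a minimal Python sketch. The command string `foo clean %f`, the helper name `expand_command`, and the choice to shell-quote the filename are all assumptions made for this example; the proposal does not specify how LFS quotes or splits the command.

```python
import shlex
from typing import List


def expand_command(template: str, filename: str) -> List[str]:
    """Replace each "%f" with the shell-quoted filename, then split into argv.

    Quoting the filename is an assumption of this sketch, made so that paths
    containing spaces survive the split as a single argument.
    """
    return shlex.split(template.replace("%f", shlex.quote(filename)))


# Hypothetical registered command; "foo" is an illustrative tool name only.
argv = expand_command("foo clean %f", "assets/big image.psd")
print(argv)
```

The resulting list could be handed to a process launcher without going through a shell, which sidesteps quoting problems in the filename.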

Here's an example extension registration in the Git config:
@ -45,45 +45,45 @@ Here's an example extension registration in the Git config:
## Clean

When staging a file, Git invokes the LFS clean filter, as described earlier. If no
extensions are installed, the LFS clean filter reads bytes from STDIN, calculates the
SHA-256 signature, and writes the bytes to a temp file. It then moves the temp file into
the appropriate place in .git/lfs/objects and writes a valid pointer file to STDOUT.

When an extension is installed, LFS will invoke the extension to do additional processing
on the bytes before writing them into the temp file. If multiple extensions are
installed, they are invoked in the order defined by their priority. LFS will also insert
a key in the pointer file for each extension that was invoked, indicating both the order
that the extension was invoked and the oid of the file before that extension was invoked.
All of that information is required to be able to reliably smudge the file later. Each
new line in the pointer file will be of the form

`ext-{priority}-{name} {hash-method}:{hash-of-input-to-extension}`

This naming ensures that all extensions are written in both alphabetical and priority
order, and also shows the progression of changes to the oid as it is processed by the
extensions.
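To make the key format concrete, the following Python sketch builds `ext-{priority}-{name}` lines and sorts them. The helper `ext_line` and the sample byte strings are hypothetical. The last lines also show a caveat worth keeping in mind: plain string sorting agrees with numeric priority only while all priorities have the same number of digits.

```python
import hashlib


def ext_line(priority: int, name: str, input_bytes: bytes) -> str:
    """Format one pointer-file line for an extension, hashing its input bytes."""
    oid = hashlib.sha256(input_bytes).hexdigest()
    return f"ext-{priority}-{name} sha256:{oid}"


# Built out of order on purpose; sorting restores priority order.
lines = [ext_line(2, "bar", b"bytes after foo"), ext_line(1, "foo", b"original bytes")]
lines.sort()
for line in lines:
    print(line)

# Caveat: with multi-digit priorities, alphabetical order is not numeric order.
mixed = sorted(["ext-2-bar", "ext-10-baz"])
print(mixed)
```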

Here's an example sequence, assuming extensions foo and bar are installed, as shown in
the previous section.

* Git passes the original contents of the file to LFS clean over STDIN
* LFS reads those bytes and calculates the original SHA-256 signature as it does so
* LFS streams the bytes to STDIN of lfs-ext.foo.clean, which is expected to write
those bytes, modified or not, to its STDOUT
* LFS reads the bytes from STDOUT of lfs-ext.foo.clean, calculates the SHA-256
signature, and writes them to STDIN of lfs-ext.bar.clean, which then writes those
bytes, modified or not, to its STDOUT
* LFS reads the bytes from STDOUT of lfs-ext.bar.clean, calculates the SHA-256
signature, and writes the bytes to a temp file
* When finished, LFS atomically moves the temp file into .git/lfs/objects, as before
* LFS generates the pointer file, with some changes:
  * The oid and size keys are calculated from the final bytes written into the LFS storage
  * LFS also writes keys named ext-1-foo and ext-2-bar into the pointer, along
  with their respective input oids
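The clean sequence above can be sketched in Python as follows. This is an illustration under stated assumptions, not the LFS implementation: the extensions are plain in-process functions (`zlib.compress` for foo, byte reversal for bar) standing in for the registered commands, and the data is held in memory rather than streamed between processes.

```python
import hashlib
import zlib
from typing import Callable, List, Tuple

# (priority, name, clean function); the functions stand in for external commands.
Extension = Tuple[int, str, Callable[[bytes], bytes]]


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def clean(data: bytes, extensions: List[Extension]) -> Tuple[bytes, str]:
    """Run clean extensions in priority order; return (stored blob, pointer text)."""
    lines = ["version https://git-lfs.github.com/spec/v1"]
    for priority, name, clean_fn in sorted(extensions, key=lambda e: e[0]):
        # Record the oid of the bytes entering this extension.
        lines.append(f"ext-{priority}-{name} sha256:{sha256_hex(data)}")
        data = clean_fn(data)
    # oid and size describe the final bytes that go into LFS storage.
    lines.append(f"oid sha256:{sha256_hex(data)}")
    lines.append(f"size {len(data)}")
    return data, "\n".join(lines) + "\n"


extensions = [(2, "bar", lambda b: b[::-1]), (1, "foo", zlib.compress)]
blob, pointer = clean(b"hello large file", extensions)
print(pointer)
```

Sorting by priority inside `clean` means the registration order does not matter, which mirrors the requirement that priorities be unique.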

Here's an example pointer file, for a file processed by extensions foo and bar:

```
version https://git-lfs.github.com/spec/v1
ext-1-foo sha256:{original hash}
@ -93,9 +93,9 @@ size 123
(ending \n)
```

Note: as an optimization, if an extension just does a pass-through, its key can be
omitted from the pointer file. This will make smudging the file a bit more efficient
since that extension can be skipped. LFS can detect a pass-through extension because the
input and output oids will be the same.

This implies that extensions must have no side effects other than writing to their STDOUT.
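The pass-through optimization can be illustrated with a small Python sketch. The function name `drop_pass_through` and the tuple layout are hypothetical; the only idea taken from the text is that an extension whose input oid equals its output oid can be left out of the pointer file.

```python
from typing import List, Tuple

# (priority, name, oid of the bytes that entered the extension)
Stage = Tuple[int, str, str]


def drop_pass_through(stages: List[Stage], final_oid: str) -> List[Stage]:
    """Keep only the extensions whose output oid differs from their input oid.

    The output oid of a stage is the input oid of the next stage, or the
    final oid for the last stage. Stages must be in priority order.
    """
    kept = []
    for i, (priority, name, input_oid) in enumerate(stages):
        output_oid = stages[i + 1][2] if i + 1 < len(stages) else final_oid
        if output_oid != input_oid:
            kept.append((priority, name, input_oid))
    return kept


# foo passed the bytes through unchanged ("aaa" in, "aaa" out); bar changed them.
kept = drop_pass_through([(1, "foo", "aaa"), (2, "bar", "aaa")], final_oid="bbb")
print(kept)
```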
@ -104,48 +104,48 @@ Otherwise LFS has no way to know what extensions modified a file.
## Smudge

When a file is checked out, Git invokes the LFS smudge filter, as described earlier. If
no extensions are installed, the LFS smudge filter inspects the first 100 bytes read
from STDIN and, if the input is a pointer file, uses the oid to find the correct object in
the LFS storage, and writes those bytes to STDOUT so that Git can write them to the
working directory.

If the pointer file indicates that extensions were invoked on that file, then those
extensions must be installed in order to smudge. If they are not installed, not found,
or unusable for any reason, LFS fails to smudge the file and outputs an error
indicating which extension is missing.

Each of the extensions indicated in the pointer file must be invoked in reverse order to
undo the changes they made to the contents of the file. After each extension is invoked,
LFS will compare the SHA-256 signature of the bytes output by the extension with the oid
stored in the pointer file as the original input to that same extension. Those
signatures must match, otherwise the extension did not undo its changes correctly. In
that case, LFS fails to smudge the file, and outputs an error indicating which extension
is failing.
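As a sketch of how the smudge order and the missing-extension error might be derived from a pointer file, consider the Python below. The regular expression, the function `smudge_order`, and the `registered` mapping (extension name to smudge command) are assumptions of this example, and the oids are shortened placeholders.

```python
import re
from typing import Dict, List, Tuple

# Matches lines of the form: ext-{priority}-{name} {hash-method}:{hash}
EXT_LINE = re.compile(r"^ext-(\d+)-(\S+) (\w+):(\S+)$")


def smudge_order(pointer_text: str, registered: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (name, expected output oid) pairs in the order smudge must run them."""
    found = []
    for line in pointer_text.splitlines():
        match = EXT_LINE.match(line)
        if match:
            priority, name, _method, oid = match.groups()
            if name not in registered:
                raise LookupError(f"extension {name!r} is required but not registered")
            found.append((int(priority), name, oid))
    # Highest priority ran last on clean, so it must run first on smudge.
    found.sort(reverse=True)
    return [(name, oid) for _priority, name, oid in found]


pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "ext-1-foo sha256:aaa\n"
    "ext-2-bar sha256:bbb\n"
    "oid sha256:ccc\n"
    "size 123\n"
)
order = smudge_order(pointer, {"foo": "foo smudge %f", "bar": "bar smudge %f"})
print(order)
```

Each returned oid is the hash that the extension's smudge output is expected to have, since the pointer stores the oid of the bytes that originally entered that extension.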

Here's an example sequence, indicating how LFS will smudge the pointer file shown in the
previous section:

* Git passes the bytes of the pointer file to LFS smudge over STDIN. Note that when
using "git lfs fetch", LFS reads the files directly from disk rather than off STDIN. The
rest of the steps are unaffected either way.
* LFS reads those bytes and inspects them to see if this is a pointer file. If it was
not, the bytes would just be passed through to STDOUT.
* Since it is a pointer file, LFS reads the whole file off STDIN, parses it, and
determines that extensions foo and bar both processed the file, in that order.
* LFS uses the value of the oid key to find the blob in the .git/lfs/objects folder, or
downloads it from the server as needed
* LFS writes the contents of the blob to STDIN of lfs-ext.bar.smudge, which
modifies them as needed and writes them to its STDOUT
* LFS reads the bytes from STDOUT of lfs-ext.bar.smudge, calculates the SHA-256
signature, and writes the bytes to STDIN of lfs-ext.foo.smudge, which modifies them
as needed and writes them to its STDOUT
* LFS reads the bytes from STDOUT of lfs-ext.foo.smudge, calculates the SHA-256
signature, and writes the bytes to its own STDOUT
* At the end, ensure that the hashes calculated on the outputs of foo and bar match their
corresponding input hashes from the pointer file. If not, write a descriptive error
message indicating which extension failed to undo its changes.
* Question: On error, should we overwrite the file in the working directory with the
original pointer file? Can this be done reliably?
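The smudge sequence above, including the hash check, might look like this in Python. As before, this is a sketch under assumptions: the extensions are in-process functions rather than external commands (`zlib.decompress` undoing a hypothetical foo, byte reversal undoing a hypothetical bar), and the check runs after each step rather than once at the end.

```python
import hashlib
import zlib
from typing import Callable, List, Tuple

# (name, oid expected after this step, smudge function), in reverse clean order.
Step = Tuple[str, str, Callable[[bytes], bytes]]


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def smudge(blob: bytes, steps: List[Step]) -> bytes:
    """Undo each extension in turn, verifying its output against the pointer oid."""
    data = blob
    for name, expected_oid, smudge_fn in steps:
        data = smudge_fn(data)
        actual_oid = sha256_hex(data)
        if actual_oid != expected_oid:
            raise ValueError(
                f"extension {name!r} failed to undo its changes: "
                f"expected {expected_oid}, got {actual_oid}"
            )
    return data


# Rebuild what clean would have produced: foo compresses, then bar reverses.
original = b"hello large file"
after_foo = zlib.compress(original)
blob = after_foo[::-1]

steps = [
    ("bar", sha256_hex(after_foo), lambda b: b[::-1]),
    ("foo", sha256_hex(original), zlib.decompress),
]
restored = smudge(blob, steps)
print(restored)
```

Raising on the first mismatch names the failing extension directly, which is the information the error message is meant to carry.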
@ -176,4 +176,4 @@ error message to its STDERR. Because the file was not smudged correctly, LFS ca
that file in the working directory. LFS will ensure that the pointer file is written to
both the index and working directory. In addition, it will display the error messages for
any files that could not be smudged (and keep those errors in a log), so that the user can
diagnose the failure and then rerun "git-lfs fetch" to fix up any remaining pointer files.