Several panic() calls in the basic and tus.io transfer
queue adapters have very similar messages, which we can
make more consistent with each other; this will ease
translation support when we make these messages translatable
in a subsequent commit.
We rephrase some error message strings which are used when
wrapping other errors so they are clearer and more consistent
with other messages. Note that these strings will be
prepended to the wrapper errors' messages.
In the lfs/diff_index_scanner.go file in particular we
rephrase the additional message to include a full Git
command ("git diff-index"), which is similar to how errors
are reported in the git package.
One error message output by the transfer queue currently
begins with a "tq:" prefix, which is unlike any other error
messages from that package, or from anywhere else, for that
matter.
We can therefore simplify our message string slightly by
simply removing this prefix from it.
Note that this message is not yet passed as a translation
string, but we will address that issue in a subsequent commit.
Each of the upload transfer queue adapters reports an error
with a similar message in the case of a 403 response, and all
of these messages begin with an "http:" prefix (even the
one in the SSH adapter). However, other status codes are not
reported the same way.
This prefix should either be moved out of the translatable
message text or removed entirely; we choose the latter option
here as the simplest. We also now interpolate the status code
rather than make it a fixed part of the translatable string,
and start the messages with a capital letter.
Several error messages output by the "git lfs track" command and
the transfer queue currently begin with a "Git LFS:" prefix, which
is unlike any other error messages from either that command or
that package, or anywhere else, for that matter.
We can therefore simplify our translation message strings slightly
by simply removing this prefix from them.
Remove the Verb method on the Direction object in favor of a method
formatting the operation in progress. This is easier to translate and
will prevent sentence fragments from appearing in strings.
Additionally, move a variable down into a function so that we can
translate it, since strings at the top level cannot be translated due to
the locale object not being initialized yet.
During a batch request, pass the hash algorithm we're using to the
remote server, and read the value, if any, that we get back. If it is
neither absent nor the string "sha256", fail, since that means that
the client and server don't agree on the proper hash algorithm.
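A minimal sketch of that check, using a hypothetical validateHashAlgo helper (the real logic lives in git-lfs's batch response handling):

```go
package main

import "fmt"

// validateHashAlgo checks the hash algorithm reported back by the server
// in a batch response. An absent value (empty string) is treated as the
// default, "sha256"; anything else means the client and server disagree,
// so we must fail. The helper name is illustrative, not git-lfs's API.
func validateHashAlgo(algo string) error {
	if algo == "" || algo == "sha256" {
		return nil
	}
	return fmt.Errorf("unsupported hash algorithm %q; expected sha256", algo)
}

func main() {
	fmt.Println(validateHashAlgo("") == nil)       // true
	fmt.Println(validateHashAlgo("sha256") == nil) // true
	fmt.Println(validateHashAlgo("sha512") == nil) // false
}
```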
Since we're about to do a v3.0.0 release, let's bump the version to v3.
Make this change automatically with the following command to avoid any
missed items:
```
git grep -l github.com/git-lfs/git-lfs/v2 | \
  xargs sed -i -e 's!github.com/git-lfs/git-lfs/v2!github.com/git-lfs/git-lfs/v3!g'
```
While the 429 retry-after HTTP error handling for the storage endpoint works
just as expected, the same was not true for the batch API endpoint. In case of a
429 error, the call to the batch API endpoint would be retried and the
`retry-after` HTTP header honoured correctly, but the LFS command would still
exit with a generic `LFS: Error` message. This was caused by the fact that
while the retry-able error from the `Batch` method was correctly overwritten by
`nil` in case of a retry, it would finally be converted into a retry-able error
again and hence no longer be `nil`. This would lead to a bubble-up of
the original 429 retry-able HTTP error to the final error check, although the
exact operation was successfully retried. Furthermore, the overwrite with
`nil` during correct handling of the first object caused all subsequent objects
to fail and hence never be enqueued for retry. This was solved by
removing the immediate overwrite with `nil` in case of a retry-able-later
error and doing the final error conversion based on the whole batch
result instead of a single object result.
When our go.mod file was introduced in commit
114e85c2002091eb415040923d872f8e4a4bc636 in PR #3208, the module
path chosen did not include a trailing /v2 component. However,
the Go modules specification now advises that module paths must
have a "major version suffix" which matches the release version.
We therefore add a /v2 suffix to our module path and all its
instances in import paths.
See also https://golang.org/ref/mod#major-version-suffixes for
details regarding the Go module system's major version suffix rule.
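For illustration, the change amounts to adding the suffix to the module
directive and to every import path, e.g. (the tq package shown here is one
example among many):

```
# go.mod, before and after
module github.com/git-lfs/git-lfs
module github.com/git-lfs/git-lfs/v2

# import paths, before and after
import "github.com/git-lfs/git-lfs/tq"
import "github.com/git-lfs/git-lfs/v2/tq"
```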
The ClearTempStorage() method of the transfer Adapter interface,
while implemented for most adapters, is never actually called,
and moreover is dangerous in some of its current implementations
if it were ever to be called.
Specifically in the basic upload and download adapters and the
SSH adapter, the ClearTempStorage() method relies on an internal
tempDir() method to return the path to the temporary directory
to be cleared, which ClearTempStorage() then removes entirely.
The tempDir() methods, however, can return os.TempDir() if they
are unable to create a local temporary directory within the
Git LFS storage directory, and also do not cache the path to
the directory they first returned. So were ClearTempStorage()
ever to be called in one of these adapters, its invocation of
tempDir() might just return the system temporary directory
(e.g., /tmp on Unix), which it would then attempt to remove.
The ClearTempStorage() method was introduced in commits
f124f0585f73279195cb20ecb4aa9e149512b48a and
940a91a8df18eaf16e9a0162999f26d59fa9f5af of PR #1265, but
has never been called by any user of the transfer
adapter interface.
We therefore just remove this method and its implementations
from all the current transfer adapters.
We remove one extra copy of a comment added in commit
9c46a38281283d919e841cf3dcae35b4f686e0e1 of PR #4446,
where the first copy was already in place from commit
594f8e386cce3441e06c9094ab5e251f0e07ca1f in the same PR.
We update the comment describing the Begin method of the
Adapter interface to match that for the corresponding method
of the SSH transfer adapter, which was added in commit
594f8e386cce3441e06c9094ab5e251f0e07ca1f of PR #4446.
The older comment for the interface description has been
out of sync with the actual method signature since at least
commit 303156e0e22ef7cdd863605fbb501174dc25e08a of PR #1774,
when the maxConcurrency integer argument was replaced with
the current "cfg" AdapterConfig one.
The lfsapi client is used to perform operations for SSH authentication
already, so we know we'll have one wherever SSH operations might be
done. Let's move the instantiation to the client so that we can reuse
it both in the transfer queue code and in the locking code as well.
When using the pure SSH-based protocol, we can get much higher speeds by
multiplexing multiple connections on the same SSH connection. If we're
using OpenSSH, let's enable the ControlMaster option unless
lfs.ssh.automultiplex is set to false, and multiplex these shell
operations over one connection.
We prefer XDG_RUNTIME_DIR because it's guaranteed to be private and we
can share many connections over one socket, but if that's not set, let's
default to creating a new temporary directory for the socket. On
Windows, where the native SSH client doesn't support ControlMaster,
we fall back automatically to using multiple connections, since
we specify ControlMaster=auto.
Note that the option exists because users may already be using SSH
multiplexing and we would want to provide a way for them to disable
this, in addition to the case where users have an old or broken OpenSSH
which cannot support this option.
We pass the connection object into each worker and adjust our transfer
code to pass it into each function we invoke. We also make sure to
properly terminate each connection at the end by reducing our connection
count to 0, which closes the extra (i.e., all) connections.
Co-authored-by: Chris Darroch <chrisd8088@github.com>
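A rough sketch of the OpenSSH options involved, using an illustrative helper (git-lfs's actual argument construction differs):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// controlMasterArgs returns extra OpenSSH arguments that enable
// connection multiplexing. ControlMaster=auto makes ssh reuse an
// existing master connection, or fall back to a plain connection when
// multiplexing is unsupported (as on Windows). "%C" is OpenSSH's
// per-destination hash token for the socket name. The function name
// and socket layout here are illustrative assumptions.
func controlMasterArgs(multiplex bool, socketDir string) []string {
	if !multiplex {
		return nil
	}
	return []string{
		"-oControlMaster=auto",
		"-oControlPath=" + filepath.Join(socketDir, "lfs-%C.sock"),
	}
}

func main() {
	fmt.Println(controlMasterArgs(true, "/run/user/1000"))
	fmt.Println(controlMasterArgs(false, ""))
}
```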
Since we currently support multiple connections for HTTPS, let's also
add multiple connections for pure SSH connections. For now, we spawn a
whole new connection, but in the future we'll support using OpenSSH's
ControlMaster flag.
Introduce several new functions to create SSH connections and adjust the
number of connections being used.
A pure SSH-based protocol has been a long time request from many users,
so let's implement one. Implement basic upload and download support,
plus batch requests, and support pkt-line tracing for text packets to
make debugging easier.
Note that locking is not yet supported; that will come in a future
patch.
We prefer the Endpoint function to the RemoteEndpoint function because
the former handles lfs.url and the latter does not.
Update a comment about the shared temporary directory; it is no longer
cleared automatically by the adapter, and instead the cleanup happens
later in other code. Therefore, it is safe to share the directory among
the transport adapters.
Co-authored-by: Chris Darroch <chrisd8088@github.com>
Right now, we always perform a batch operation using an instance of the
transfer queue client, which is HTTP based. However, in the future,
we'll want to add an SSH-based option, so let's turn the client into a
simple interface we can use to abstract this away.
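A simplified sketch of that abstraction (the names are illustrative, not the exact git-lfs types):

```go
package main

import "fmt"

// BatchClient abstracts the batch operation so that an SSH-based
// implementation can later sit alongside the HTTP-based one.
type BatchClient interface {
	Batch(remote string, oids []string) ([]string, error)
}

// httpBatchClient stands in for the existing HTTP implementation; a
// real client would POST to the LFS batch endpoint here.
type httpBatchClient struct{}

func (c *httpBatchClient) Batch(remote string, oids []string) ([]string, error) {
	return oids, nil
}

func main() {
	var client BatchClient = &httpBatchClient{}
	oids, err := client.Batch("origin", []string{"abc123"})
	fmt.Println(oids, err)
}
```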
Our NTLM support has been known to be broken in various situations for a
while, specifically on Windows. The core team is unable to troubleshoot
these problems, and nobody has stepped up to maintain the NTLM support.
In addition, NTLM uses cryptography and security techniques that are
known to be insecure, such as the algorithms DES, MD4, and MD5, as well
as simple, unsalted hashes of passwords.
Since we now support Kerberos, most users should be able to replace
their use of NTLM with Kerberos instead. Users have reported this
working on Windows, and it is known to work well on at least Debian.
Drop support for NTLM and remove it from the codebase.
Previously, retries happened as fast as possible unless the server
provided the Retry-After header. This is effective for certain types of
errors, but not when the LFS server is experiencing a temporary but not
instantaneous failure. Delaying between retries lets the server recover
and the LFS operation complete.
Delays start at a fixed 250ms for the first retry and double with each
successive retry up to a configurable maximum delay, 10s by default. The
maximum delay is configurable using lfs.transfer.maxretrydelay. Delays
can be disabled by setting the maximum delay to 0.
When we have an error creating the local storage directory, such as if
the permissions are wrong, we fail to produce a helpful error message,
since we fail to check the error we produce.
Let's take the error we get in such a case and pass it through to the
transfer queue, where we can handle it as any other error. This means
that if only one directory has a problem, we can transfer the rest of
the objects and fail for only the one problematic object.
In the future, we're going to need to access the access-related types
in the lfshttp package. To avoid an import loop, move Access and
AccessMode into the creds package. Add constructors and accessors since
the members are private.
If the batch operation fails due to an error instead of a bad HTTP
status code, we'll abort the batch operation and retry. This appears to
be a regression from 1412d6e4 ("Don't fail if we lack objects the server
has", 2019-04-30), which caused us to handle errors differently.
Since there are two error returns from enqueueAndCollectRetriesFor,
let's wrap the batch error case as a retriable error and not abort if we
find a retriable error later on. This lets us continue to abort if we
get a missing object, which should be fatal, but retry in the more
common network failure case.
Currently, if you have a remote with a file URL but have lfs.url set to
point to a non-file URL, your transfers will fail. This occurs because
we look up whether to default to the built-in transfer adapter based
on the remote URL, not the endpoint URL. We then attempt to pass
the HTTP URL to the standalone transfer adapter, which doesn't work.
Look up the endpoint URL as well and pass that to the code which
determines whether we should use the builtin transfer adapter. If the
actual endpoint is not a file URL, then we won't suggest the builtin
transfer adapter.
Testing showed that while the race condition analysis in #3880 was correct, the way it tries
to fix that does not work for the *first* git-lfs process that actually performs the file move.
Instead, this commit performs multiple attempts when working with files in LFS storage.
Similar logic is already implemented in "cmd/go/internal/robustio" and "cmd/go/internal/renameio" packages.
However, they are not public, so we cannot use them.
On Windows, there is no way to replace a file atomically. Instead, MoveFileExA:
1. Calls CreateFileA(access=Delete, shareMode=Delete)
2. Calls SetRenameInformationFile to rename file
3. Calls CloseFile
The problem is that if a parallel process attempts to open the destination file for reading between
steps 2 and 3, it will try to do so without shareMode=Delete and will hit a SHARING_VIOLATION error.
In practice, this race condition results in:
Smudge error: Error opening media file.: open .git\lfs\objects\<sha>: The process cannot access the file because it is being used by another process.
There are two solutions here:
1. Do not overwrite the file if it already exists
2. Retry reading from the file if we get a SHARING_VIOLATION
This commit implements option 1.
Fixes #2825 (for Windows; other OSes are race-free already). Also see #3813 and #3826.
1. git-lfs now only writes to unique temp files created with `ioutil.TempFile`,
which are opened with `O_CREATE|O_EXCL`
2. The partially-downloaded file is now atomically borrowed and returned via `os.Rename`
3. `.part <-> .tmp` and `.tmp -> final` renames are allowed to fail and are handled appropriately
This is a continuation of #3813. Fixes #2825.
There are several error codepaths where we borrow the .part file but remove it instead of
returning it. I believe that this is OK; in those erroneous cases it is better to restart
the download from scratch instead of attempting to use a possibly-corrupt .part file.
When several git-lfs processes use the same LFS storage simultaneously,
there's a race condition between checking whether an LFS file exists and renaming
a temporary file to its final name in LFS storage. When the rename fails, check
whether the target file exists. If it does, pretend that the download finished
successfully.
Fixes #2825
In 1412d6e4 ("Don't fail if we lack objects the server has",
2019-04-30), we changed the code to abort later if a missing object
occurs. In doing so, we had to consider the case where the transfer
queue aborts early for some reason and ensure that the sync.WaitGroup
does not unnecessarily block due to outstanding objects never getting
processed.
However, the approach we used, which was to explicitly add the number of
items we skipped processing, was error prone and didn't cover all cases.
Notably, a DNS failure could randomly cause a hang during a push. Solve
this by creating a class for a wait group which is abortable and simply
abort it if we encounter an error, preventing any deadlocks caused by
miscounting the number of items.
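The idea can be sketched as a small wrapper around sync.WaitGroup (a simplified stand-in, not git-lfs's actual type):

```go
package main

import (
	"fmt"
	"sync"
)

// abortableWaitGroup tracks how many items are outstanding so that an
// error path can release all waiters at once, instead of trying to
// count down the exact number of skipped items.
type abortableWaitGroup struct {
	mu    sync.Mutex
	count int
	wg    sync.WaitGroup
}

func (a *abortableWaitGroup) Add(n int) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.count += n
	a.wg.Add(n)
}

func (a *abortableWaitGroup) Done() {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.count > 0 {
		a.count--
		a.wg.Done()
	}
}

// Abort releases all outstanding items, e.g. after a fatal error such
// as a DNS failure, so Wait can never deadlock on miscounted items.
func (a *abortableWaitGroup) Abort() {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.wg.Add(-a.count)
	a.count = 0
}

func (a *abortableWaitGroup) Wait() {
	a.wg.Wait()
}

func main() {
	var awg abortableWaitGroup
	awg.Add(3)
	awg.Done()
	awg.Abort() // releases the two remaining items
	awg.Wait()  // returns immediately; no deadlock
	fmt.Println("no deadlock")
}
```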
When we are cloning or checking out a repo, the filter-process smudge
filter sends the transfer queue data by calling the Add method. This
method inserts items into the incoming channel, which is then processed
by the collectBatches function. This function accepts items, downloads
a batch worth, and splits the remainder into the next and pending
queues, and then, if there are any items in either of these queues, goes
around again. If there are no items, it exits the download queue.
Unfortunately, this algorithm has a problem. If the filter-process
portion is slow and does not provide us enough data, we can end up with
both the next and pending queues empty, but the incoming channel still
open, waiting for the filter-process end to send more data. In such a
case, we stop downloading early and the process deadlocks since there's
nobody to read from the channel.
The solution, however, is simple: since we set the closing flag whenever
the incoming channel is closed, we can check if that flag is set and
only quit if it is. We know that if it is not, there's more data, and
the other end of the channel is just being slow (perhaps because Git is
processing other filters). Do this to avoid the deadlock and ensure
that we terminate properly in all cases.
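The fixed exit condition can be sketched like this (a simplified stand-in for collectBatches, not the actual code):

```go
package main

import "fmt"

// collectAll only terminates once the incoming channel is closed (the
// "closing" flag) and all queued items have been processed. With a slow
// producer, checking queue emptiness alone could cause an early exit;
// checking the flag avoids that. Names here are illustrative.
func collectAll(incoming <-chan int, batchSize int) int {
	var next []int
	processed := 0
	closing := false
	for {
		// Fill up to one batch, blocking until data arrives or the
		// channel is closed; a slow producer just makes us wait here.
		for !closing && len(next) < batchSize {
			item, ok := <-incoming
			if !ok {
				closing = true
				break
			}
			next = append(next, item)
		}
		processed += len(next) // stand-in for downloading the batch
		next = next[:0]
		// Only quit once the channel is closed; empty queues alone are
		// not a reason to stop.
		if closing {
			return processed
		}
	}
}

func main() {
	ch := make(chan int)
	go func() {
		for i := 0; i < 10; i++ {
			ch <- i
		}
		close(ch)
	}()
	fmt.Println(collectAll(ch, 3)) // 10
}
```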
One commonly requested feature for Git LFS is support for local files.
Currently, we tell users that they must use a standalone transfer
agent, which is true, but nobody has provided one yet. Since writing a
simple transfer agent is not very difficult, let's provide one
ourselves.
Introduce a basic standalone transfer agent, git lfs standalone-file,
that handles uploads and downloads. Add a default configuration required
for it to work, while still allowing users to override this
configuration if they have a preferred implementation that is more
featureful. We provide this as a transfer agent instead of built-in
because it avoids the complexity of adding a different code path to the
main codebase, but also serves as a demonstration of how to write a
standalone transfer agent for others who might want to do so, much
like Git demonstrates remote helpers using its HTTP helper.
os.SEEK_XXX is now deprecated, and we can use io.SeekXxx instead.
https://golang.org/pkg/os/#pkg-constants
```go
// Seek whence values.
//
// Deprecated: Use io.SeekStart, io.SeekCurrent, and io.SeekEnd.
const (
	SEEK_SET int = 0 // seek relative to the origin of the file
	SEEK_CUR int = 1 // seek relative to the current offset
	SEEK_END int = 2 // seek relative to the end
)
```
https://golang.org/pkg/io/#pkg-constants
```go
// Seek whence values.
const (
	SeekStart   = 0 // seek relative to the origin of the file
	SeekCurrent = 1 // seek relative to the current offset
	SeekEnd     = 2 // seek relative to the end
)
```
A Git LFS client may not have the entire history of the objects for the
repository. However, in some situations, we traverse the entire history
of a branch when pushing it, meaning that we need to process every
LFS object in the history of that branch. If the objects for the entire
history are not present, we currently fail to push.
Instead, let's mark objects we don't have on disk as missing and only
fail when we would need to upload those objects. We'll know the server
has the objects if the batch response provides no actions to take for
them when we request an upload. Pass the missing flag down through the
code, and always set it to false for non-uploads.
If for some reason we fail to properly flag a missing object, we will
still fail later on when we cannot open the file, just in a messier and
more poorly controlled way. The technique used here will attempt to
abort the batch as soon as we notice a problem, which means that in the
common case (less than 100 objects) we won't have transferred any
objects, so the user can notice the failure as soon as possible.
Update the tests to look for a string which will occur in the error
message, since we no longer produce the system error message for ENOENT.
When we perform a verify request, we need to specify a proper Accept
header along with the Content-Type header. Do so.
Hoist the setting of the headers before the code which copies headers
from the remote side, so that the remote side can override it with a
suitable header if necessary. This is required for compatibility with
some existing servers.
For backward compatibility, and for some incompatible situations such
as using the SSH protocol, href rewriting is only enabled if
lfs.transfer.enablehrefrewrite is set to true explicitly.
Use insteadOf/pushInsteadOf aliases to rewrite the href used to upload
or download LFS objects. This is useful in situations where you need
to access the LFS server via a reverse proxy.
If we're attempting to resume a download but the file we have on disk is
the expected size or too large, we'll send an invalid Range header where
the lower bound is greater than the upper bound. If this situation
occurs, simply retry the download from the beginning, since we clearly
don't have the expected data (or we wouldn't be resuming) and it isn't
clear how else to recover.
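The guard can be sketched as follows (the helper is an illustrative assumption; the real code builds its Range header inside the download adapter):

```go
package main

import "fmt"

// resumeRange returns the Range header for resuming a download, or an
// empty string when the bytes we already have are >= the expected size,
// in which case the caller should restart from scratch rather than send
// an invalid range whose lower bound exceeds its upper bound.
func resumeRange(haveBytes, expectedSize int64) string {
	if haveBytes <= 0 || haveBytes >= expectedSize {
		return ""
	}
	return fmt.Sprintf("bytes=%d-", haveBytes)
}

func main() {
	fmt.Println(resumeRange(50, 100))  // bytes=50-
	fmt.Println(resumeRange(100, 100)) // "" -> restart from the beginning
}
```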
In 4775f9c1 ("tq: avoid nil pointer dereference on unexpected failure",
2019-02-19), we fixed an issue on uploads where we could dereference a
nil http.Response pointer when uploading. We should also check for this
when downloading. Since the check a few lines below is now redundant,
remove it.
When we get an error making an API request, we check to see if the
response is a 429, and if so, we attempt to back off and retry later.
However, we failed to check if we actually had a response; if the
response was missing (say, because we had a network issue), then we
would dereference a nil pointer.
Instead, let's explicitly check for a non-nil response before querying
it for a status code.
Co-authored-by: Taylor Blau <me@ttaylorr.com>