git-lfs/fs/cleanup.go

package fs

import (
	"os"
	"path/filepath"
	"strings"
	"sync"
	"time"

	"github.com/git-lfs/git-lfs/v3/tools"
	"github.com/rubyist/tracerx"
)
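// cleanupTmp prunes stale files from the repository's temporary directory,
// and is deliberately conservative about what it removes. File helpers stage
// downloads here, often as hard links whose names do not look like object IDs
// and whose timestamps may be old, and such files can still be in use by an
// in-progress fetch. A temporary file is therefore removed immediately only
// when it names an object that is already present in the object store;
// anything else is pruned only once it has gone untouched for more than an
// hour, and files in subdirectories are additionally left alone while the
// directory itself is less than an hour old.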
func (f *Filesystem) cleanupTmp() error {
tmpdir := f.TempDir()
if len(tmpdir) == 0 {
return nil
}
// No temporary directory? No problem.
if _, err := os.Stat(tmpdir); err != nil && os.IsNotExist(err) {
return nil
}
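	// Remember the os.FileInfo of each directory seen during the walk so
	// that a file's parent directory is stat'ed at most once, keeping the
	// number of extra stat calls to a minimum.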
traversedDirectories := &sync.Map{}
var walkErr error
tools.FastWalkDir(tmpdir, func(parentDir string, info os.FileInfo, err error) {
if err != nil {
walkErr = err
}
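		// Once a walk error has been recorded, skip any further entries.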
if walkErr != nil {
return
}
path := filepath.Join(parentDir, info.Name())
if info.IsDir() {
traversedDirectories.Store(path, info)
return
}
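		// Temporary object files are normally named "<oid>-<suffix>". If the
		// name carries a full 64-character OID and that object is already
		// present in the object store, the temporary copy is redundant and
		// can be removed right away.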
parts := strings.SplitN(info.Name(), "-", 2)
oid := parts[0]
if len(parts) == 2 && len(oid) == 64 {
fi, err := os.Stat(f.ObjectPathname(oid))
if err == nil && !fi.IsDir() {
tracerx.Printf("Removing existing tmp object file: %s", path)
os.RemoveAll(path)
return
}
}
// Don't prune items in a directory younger than an hour. These
// items could be hard links to files from other repositories,
// which would have an older timestamp but which are still in
		// use by some active process. Exempt the main temporary directory
		// from this check, since we frequently modify it and would never
		// prune anything in it otherwise.
if tmpdir != parentDir {
var dirInfo os.FileInfo
entry, ok := traversedDirectories.Load(parentDir)
if ok {
dirInfo = entry.(os.FileInfo)
} else {
dirInfo, err = os.Stat(parentDir)
if err != nil {
return
}
				// Key the cache by the parent directory so that later files
				// in the same directory reuse this stat result.
				traversedDirectories.Store(parentDir, dirInfo)
}
if time.Since(dirInfo.ModTime()) <= time.Hour {
return
}
}
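		// Everything else is pruned only once it has gone untouched for more
		// than an hour, which gives in-progress operations time to finish.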
if time.Since(info.ModTime()) > time.Hour {
tracerx.Printf("Removing old tmp object file: %s", path)
os.RemoveAll(path)
return
}
})
return walkErr
}
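For context, the commit that introduced this behaviour also changed the file
helper to stage its downloads in a fresh subdirectory of the temporary
directory, so that hard links to objects in another repository (which can
carry timestamps far older than an hour) are protected by the directory-age
check above. The sketch below illustrates that idea; stageObject, the package
name, and its parameters are hypothetical and not part of git-lfs.

package helper

import (
	"io"
	"os"
	"path/filepath"
)

// stageObject is a hypothetical sketch, not git-lfs code. It stages srcPath
// inside a freshly created subdirectory of tmpRoot (the repository's LFS
// temporary directory). Because the subdirectory has just been created, its
// modification time is recent, so a cleanup pass like cleanupTmp above will
// leave the staged file alone even if srcPath is hard-linked and carries an
// old timestamp.
func stageObject(tmpRoot, srcPath string) (string, error) {
	dir, err := os.MkdirTemp(tmpRoot, "helper-")
	if err != nil {
		return "", err
	}
	dst := filepath.Join(dir, filepath.Base(srcPath))

	// Prefer a cheap hard link; fall back to copying the contents if
	// linking fails (for example, across filesystems).
	if err := os.Link(srcPath, dst); err != nil {
		src, err := os.Open(srcPath)
		if err != nil {
			return "", err
		}
		defer src.Close()
		out, err := os.Create(dst)
		if err != nil {
			return "", err
		}
		defer out.Close()
		if _, err := io.Copy(out, src); err != nil {
			return "", err
		}
	}
	return dst, nil
}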