tools: always force a UTF-8 locale for cygpath

When we look up the repository path from Git, we pass it through cygpath
-w to canonicalize it into a Windows path, since Cygwin's Git will give
us a Unix-style path.  We perform path canonicalization not only on
Cygwin but also on MINGW, which includes Git Bash, since we want to
accept and canonicalize Unix-style paths there too.

Normally, this works great.  However, if invoked not from Git Bash, but
via the Git for Windows bash.exe command, no locale is set in the
environment, even though the locale binary reports UTF-8 locales.  As a
result, if non-ASCII characters exist in the path name, cygpath tries to
encode them in ISO-8859-1.

On a standard Unix, where paths are just bytes, defaulting to
ISO-8859-1 might be fine, because the bytes pass through unchanged
regardless of the encoding and no conversion needs to be performed.  On
macOS, where the file system and all locales use UTF-8, this is also not
a problem, because again, no conversion needs to be done.

However, on Windows, where paths are natively stored as UTF-16, this is
remarkably unhelpful, since the majority of Unicode code points cannot
be represented in ISO-8859-1.  Thus, paths containing non-ASCII
characters are routinely broken by cygpath when the locale is not set.
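The ISO-8859-1 limitation can be checked directly.  ISO-8859-1 maps
one-to-one onto the first 256 Unicode code points (U+0000 through
U+00FF), so a rune is representable exactly when its value is at most
0xFF.  The Go sketch below relies on that fact; the helpers
latin1Representable and unrepresentable are hypothetical illustrations,
not part of Git LFS.

```go
package main

import "fmt"

// latin1Representable reports whether r can be encoded in ISO-8859-1.
// Hypothetical helper for illustration; not part of Git LFS.
// ISO-8859-1 covers exactly the Unicode code points U+0000 through U+00FF.
func latin1Representable(r rune) bool {
	return r <= 0xFF
}

// unrepresentable returns the runes in s that ISO-8859-1 cannot encode,
// in the order they appear. Also a hypothetical illustration.
func unrepresentable(s string) []rune {
	var bad []rune
	for _, r := range s {
		if !latin1Representable(r) {
			bad = append(bad, r)
		}
	}
	return bad
}

func main() {
	// A name modeled on the one used by the new "env with unicode" test.
	name := "env-d’autre-nom-très-bizarr€"
	for _, r := range unrepresentable(name) {
		fmt.Printf("%q (U+%04X) cannot be encoded in ISO-8859-1\n", r, r)
	}
}
```

For this name, only the apostrophe (U+2019) and the Euro sign (U+20AC)
are flagged; è (U+00E8) fits in a single ISO-8859-1 byte.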

Since we know we always want UTF-8 from cygpath, let's just force that
in the environment we pass it.  We need to copy the environment because
the value we have is shared among all executed subcommands, and we don't
want to change other commands' locales: that would cause their error
messages to be printed in English instead of the user's language.
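The copy-then-override approach can be illustrated in isolation.  This
is a minimal sketch, not the Git LFS code itself: the helper withLocale
is hypothetical, and it operates on a plain []string in the KEY=value
form used by Cmd.Env in os/exec.  It builds a fresh slice so the
caller's shared environment is never mutated.

```go
package main

import (
	"fmt"
	"strings"
)

// withLocale returns a copy of env in which any existing LC_ALL entry is
// dropped and LC_ALL=locale is appended. The input slice is not modified.
// Hypothetical helper for illustration; not part of Git LFS.
func withLocale(env []string, locale string) []string {
	// Allocate a new backing array so appends cannot write into env's.
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if !strings.HasPrefix(kv, "LC_ALL=") {
			out = append(out, kv)
		}
	}
	return append(out, "LC_ALL="+locale)
}

func main() {
	shared := []string{"PATH=/usr/bin", "LC_ALL=fr_FR.UTF-8", "HOME=/home/user"}
	cygpathEnv := withLocale(shared, "C.UTF-8")

	fmt.Println(cygpathEnv) // the environment handed to cygpath
	fmt.Println(shared)     // unchanged, so other commands keep fr_FR.UTF-8
}
```

Filtering in place (for example, by reusing shared[:0]) or appending
directly to the shared slice could overwrite entries that other
subcommands later read, which would change their locale as well.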

Note that before this change, the new test would fail on Windows because
the local working directory could not be read and was therefore
reported as empty.
brian m. carlson 2020-09-03 21:32:17 +00:00
parent b20dc37426
commit 8f3d9d5ef6
GPG Key ID: 2D0C9BC12F82B3A1
2 changed files with 74 additions and 0 deletions

@@ -975,3 +975,68 @@ UploadTransfers=basic,lfs-standalone-file
  contains_same_elements "$expected" "$actual"
)
end_test

begin_test "env with unicode"
(
  set -e
  # This contains a Unicode apostrophe, an E with grave accent, and a Euro sign.
  # Only the middle one is representable in ISO-8859-1.
  reponame="env-d’autre-nom-très-bizarr€"
  unset_vars
  mkdir $reponame
  cd $reponame
  git init
  git remote add origin "$GITSERVER/env-origin-remote"
  git remote add other "$GITSERVER/env-other-remote"
  touch a.txt
  git add a.txt
  git commit -m "initial commit"
  # Set by the testsuite.
  unset LC_ALL
  endpoint="$GITSERVER/env-origin-remote.git/info/lfs (auth=none)"
  endpoint2="$GITSERVER/env-other-remote.git/info/lfs (auth=none)"
  localwd=$(canonical_path "$TRASHDIR/$reponame")
  localgit=$(canonical_path "$TRASHDIR/$reponame/.git")
  localgitstore=$(canonical_path "$TRASHDIR/$reponame/.git")
  lfsstorage=$(canonical_path "$TRASHDIR/$reponame/.git/lfs")
  localmedia=$(canonical_path "$TRASHDIR/$reponame/.git/lfs/objects")
  tempdir=$(canonical_path "$TRASHDIR/$reponame/.git/lfs/tmp")
  envVars=$(printf "%s" "$(env | grep "^GIT")")
  expected=$(printf '%s
%s
Endpoint=%s
Endpoint (other)=%s
LocalWorkingDir=%s
LocalGitDir=%s
LocalGitStorageDir=%s
LocalMediaDir=%s
LocalReferenceDirs=
TempDir=%s
ConcurrentTransfers=8
TusTransfers=false
BasicTransfersOnly=false
SkipDownloadErrors=false
FetchRecentAlways=false
FetchRecentRefsDays=7
FetchRecentCommitsDays=0
FetchRecentRefsIncludeRemotes=true
PruneOffsetDays=3
PruneVerifyRemoteAlways=false
PruneRemoteName=origin
LfsStorageDir=%s
AccessDownload=none
AccessUpload=none
DownloadTransfers=basic,lfs-standalone-file
UploadTransfers=basic,lfs-standalone-file
%s
%s
' "$(git lfs version)" "$(git version)" "$endpoint" "$endpoint2" "$localwd" "$localgit" "$localgitstore" "$localmedia" "$tempdir" "$lfsstorage" "$envVars" "$envInitConfig")
  actual=$(git lfs env | grep -v "^GIT_EXEC_PATH=")
  contains_same_elements "$expected" "$actual"
)
end_test

@@ -29,6 +29,15 @@ func Getwd() (dir string, err error) {
func translateCygwinPath(path string) (string, error) {
	cmd := subprocess.ExecCommand("cygpath", "-w", path)
	// cygpath uses ISO-8859-1 as the default encoding if the locale is not
	// set, resulting in breakage, since we want a UTF-8 path.
	env := make([]string, 0, len(cmd.Env)+1)
	for _, val := range cmd.Env {
		if !strings.HasPrefix(val, "LC_ALL=") {
			env = append(env, val)
		}
	}
	cmd.Env = append(env, "LC_ALL=C.UTF-8")
	buf := &bytes.Buffer{}
	cmd.Stderr = buf
	out, err := cmd.Output()