git-lfs/docs/proposals/ssh_adapter.md

# SSH protocol proposal

We'd like to implement a protocol for Git LFS that uses SSH protocol
exclusively, avoiding the need to use HTTPS altogether.  This will make
deployment and use easier in a variety of situations where access to certain
ports is limited.

This is merely a proposal, not a commitment to implement for either the client
or server side.  Implementers who prefer to use HTTP can continue to do so.

## What not to do

There are several possible approaches that could be adopted.  SSH provides a
native capability for the SFTP protocol, which can be used to transfer files.
However, in order to implement this on the server side, each access (upload or
download) must have an access control check instead of one at the beginning of
the operation.  This would be inefficient in some server-side implementations,
and nearly impossible to implement securely for implementations that use the
system OpenSSH for implementation.

## A more usable approach

Git already has some places we can look for inspiration.  Its SSH protocol is
based on the Git native protocol, which is based on the pkt-line scheme.
Recently, Git has learned about protocol version 2, which provides better
support for expressing and negotiating capabilities.

Ideally, we would allow multiple operations to occur on a single connection for
efficiency's sake, especially on high-latency connections, where the cost of SSH
connection setup may be high due to multiple round trips.  In addition, a
protocol which maps well onto HTTP may be beneficial for those server-side
implementations which would like to proxy connections to an HTTP-based backend.

## Preliminary design

This design assumes a reference to Git's pkt-line and protocol v2 documentation.
pkt-line headers for this document may contain values up to 65519 decimal.

To initiate a connection, Git LFS should run the following command:

    $ ssh [{user}@]{server} git-lfs-transfer {path} {operation}

If authentication fails, or some other connection error occurs, errors will be
read from standard error and displayed to the user.  The operation may be
`upload`, `download`, or `lock`.  Other operations may be implemented in the
future.

Once the connection is established, the server should send a capability
advertisement:

```
capability-advertisement = capability-list flush-pkt
capability-list = *capability
capability = PKT-LINE(key[=value] LF)

key = 1*(ALPHA | DIGIT | "-_")
value = 1*(ALPHA | DIGIT | "-_.,?\/{}[]()<>!@#$%^&*+=:;")
```

Unlike the Git protocol, but like IMAP, the protocol version is specified as a
capability.  This document defines protocol version 1, which is specified as
`version=1`.  If the server supports other protocol versions, it may enumerate
them here as well.

If the server supports locking, the `locking` capability should be advertised,
and the client may then use the `lock`, `unlock`, and `list-lock` commands.

No capabilities other than the base functionality specified here are enabled
without the client explicitly enabling them.  Note that the `value` production
here, unlike in Git, does not include the space character, since it is used as a
delimiter in parts of the protocol.

The client will then issue an appropriate version command:

```
version-request = PKT-LINE("version " number LF) flush-pkt
number = 1*DIGIT
```

The response from the server will look like the following:

```
version-response = status-command
                   delim-pkt
                   error-message
                   flush-pkt
status-command = PKT-LINE("status " http-status-code LF)
http-status-code = 3DIGIT
error-message = *PKT-LINE(data LF)
```

The `http-status-code` portion of the response is an HTTP status code, identical
to those used if the request is made over HTTP.

The response code should be 200 if the version is accepted or 400 if it is not.
Other values are possible if other errors occur.

### Requests to transfer objects

These commands may be used if the operation was `upload` or `download`.

The `batch` command is used to specify a JSON command identical those used at
the `info/lfs/object/batch` endpoint:

```
batch-request = batch-command
                *argument
                delim-pkt
                *oid-line
                flush-pkt
batch-command = PKT-LINE("batch" LF)
argument = PKT-LINE(key=[data] LF)
oid-line = PKT-LINE(oid size *(key=[value]) LF)
oid = 1*("a-f0-9")
size = 1*DIGIT
```

The `transfer` argument is equivalent to the corresponding value in the HTTP
JSON API.  The `refname` argument is equivalent to the `name` argument of the
`ref` object in the HTTP JSON API.

Unknown arguments should be ignored, as should unknown key-value pairs in the
`oid-line` production.

The response from the server will look like the following:

```
batch-response = status-command
                 *argument
                 delim-pkt
                 (*batch-oid-line | error-message)
                 flush-pkt
batch-oid-line = PKT-LINE(oid size action *(key=[value]) LF)
error-message = PKT-LINE("message " data LF)
```

If the status command is successful (that is, the status is not 200-series
response), the data provided matches the `*batch-oid-line` production;
otherwise, the data provided represents a user-visible error message.

The server response should contain one pkt-line per oid-size-action tuple.  That
is, the same oid and size may be repeated if there are multiple actions.  If the
server has no actions that are valid for an object, it should be listed once in
the response with the `noop` action.

The response for an oid may include a string, `id`, which is an opaque
identifier relevant only to the server to help it identify the object, and
another string, `token`, which is an opaque identifier relevant only to the
server to help it manage authentication.  These strings must meet the syntax for
the `value` production above; if arbitrary bytes are needed, Base64 encoding is
recommended.

The `expires-in` and `expires-at` key-value pairs have the same meaning as their
corresponding items from the HTTP JSON API.

These values, if specified, must be passed as arguments to the `get-object` and
`put-object` commands.

### Downloads

These commands may be used if the operation was `download`.

If the operation was `download`, the command `get-object` may be used to retrieve an object:

```
get-object-request = get-object-command
                     *arguments
                     flush-pkt
get-object-command = PKT-LINE("get-object " oid LF)

```

If the `id` or `token` responses were provided in the response to `batch`, they
must be specified as arguments here.  The server may choose to ignore the `oid`
field specified in favor of looking up the data using the `id` field.

The response looks like the following:

```
status-data-response = status-success-response | status-error-response
status-success-response = status-success-command
                          *argument
                          delim-pkt
                          binary-data
                          flush-pkt
status-success-command = PKT-LINE("status 200" LF)
binary-data = *PKT-LINE(data)
status-error-response = status-error-response
                        *argument
                        delim-pkt
                        error-message
                        flush-pkt
status-error-command = PKT-LINE("status " http-error-code LF)
http-error-code = ("4" | "5") 2DIGIT
```

The `size` argument is required on `status` responses to `get-object` commands.

### Uploads

These commands may be used if the operation was `upload`.

If the operation was `upload`, the commands `put-object` and `verify-object`
may be specified.  `put-object` is used to upload an object to the server:

```
put-object-request = put-object-command
                     *argument
                     delim-pkt
                     binary-data
                     flush-pkt
put-object-command = PKT-LINE("put-object " oid LF)
```

As above, the `size` command is required and `id` and `token` are required if
provided by the server.  The response matches the `status-data-response`
production.  The `binary-data` returned on success is not meaningful and should
be empty.

The `verify-object` command is used to verify an object:

```
verify-object-request = verify-object-command
                        *argument
                        flush-pkt
verify-object-command = PKT-LINE("verify-object " oid " " size LF)
```

The `size` production represents the size of the object represented by `oid` as
a decimal integer in bytes.  The `id` and `token` items from the batch request
must be passed as arguments here, if specified.

The response matches the following:

```
generic-status-response = generic-success-response | status-error-response
generic-success-response = generic-success-command
                           *argument
                           flush-pkt
generic-success-command = PKT-LINE("status 200" LF)
```

### Locks

These commands may be used if the operation was `lock`.

The `lock` command may be used to lock a file on a ref:
```
lock-request = lock-command
               *argument
               delim-pkt
               flush-pkt
lock-command = PKT-LINE("lock" LF)
```

The `path` and `refname` arguments correspond to the `path` component and the
`name` component of the `ref` object in the HTTP JSON API.

The response is as follows:


```
lock-response = lock-success-response | status-error-response
lock-success-response = lock-success-command
                        *argument
                        flush-pkt
lock-success-command = PKT-LINE("status 201" LF)
```

If the response is either successful or a 409 response, the arguments `id`,
`path`, `locked-at`, and `ownername` are provided.  In case of a successful
response, these attributes represent the created lock; if the response is a 409,
then the attributes represent the conflicting lock.

The `list-lock` command may be used to list and verify locks:

```
list-lock-request = list-lock-command
                    *argument
                    flush-pkt
list-lock-command = PKT-LINE("list-lock" LF)
```

The `path`, `id`, `cursor`, `limit`, and `refspec` correspond to the items in
the HTTP JSON API.

```
list-lock-response = list-lock-success-response | status-error-response
list-lock-success-response = list-lock-success-command
                             *argument
                             delim-pkt
                             *lock-spec
                             flush-pkt
list-lock-success-command = PKT-LINE("status 200" LF)
lock-spec = lock-decl
            path-id
            locked-at
            ownername-id
            owner-id
lock-decl = PKT-LINE("lock " lock-id LF)
lock-id = value
path-id = PKT-LINE("path " lock-id path LF)
path = data
locked-at = PKT-LINE("locked-at " lock-id timestamp LF)
ownername-id = PKT-LINE("ownername " lock-id ownername LF)
ownername = data
owner-id = PKT-LINE("owner " lock-id who LF)
who = ("ours" | "theirs")
```

The `lock-decl` production declares a new lock.  The `lock-id` production refers
to the ID provided by the server.  The same ID is repeated in each line to allow
for easier parsing.

The `next-cursor` argument indicates the next value of the `cursor` argument to
be passed to the `list-lock` command.  If there is no `next-cursor` argument,
this is the final response.

```
unlock-request = unlock-command
                 *argument
                 flush-pkt
unlock-command = PKT-LINE("unlock " lock-id LF)
```

The `force` and `refname` arguments have the same meaning as their corresponding
values in the HTTP JSON API.  The response matches the `generic-status-response`
production.
docs/proposals: propose an SSH protocol There has been a significant amount of interest in a purely SSH-based protocol for Git LFS. Add a proposal for an SSH-based protocol based on Git's pkt-line and protocol v2, while still mapping cleanly onto HTTP for ease of implementation on the server-side. Document the basic upload and download protocol, as well as locking. Coalesce the lock listing and lock verification endpoints in the JSON API into one that provides both features. Take care that we allow paths and refs to contain arbitrary byte sequences, including spaces, by always providing them in an unambiguous way at the end of a pkt-line. 2018-10-05 18:50:04 +00:00			`# SSH protocol proposal`

			`We'd like to implement a protocol for Git LFS that uses SSH protocol`
			`exclusively, avoiding the need to use HTTPS altogether. This will make`
			`deployment and use easier in a variety of situations where access to certain`
			`ports is limited.`

Fix typo 2019-08-21 22:47:05 +00:00			`This is merely a proposal, not a commitment to implement for either the client`
docs/proposals: propose an SSH protocol There has been a significant amount of interest in a purely SSH-based protocol for Git LFS. Add a proposal for an SSH-based protocol based on Git's pkt-line and protocol v2, while still mapping cleanly onto HTTP for ease of implementation on the server-side. Document the basic upload and download protocol, as well as locking. Coalesce the lock listing and lock verification endpoints in the JSON API into one that provides both features. Take care that we allow paths and refs to contain arbitrary byte sequences, including spaces, by always providing them in an unambiguous way at the end of a pkt-line. 2018-10-05 18:50:04 +00:00			`or server side. Implementers who prefer to use HTTP can continue to do so.`

			`## What not to do`

			`There are several possible approaches that could be adopted. SSH provides a`
			`native capability for the SFTP protocol, which can be used to transfer files.`
			`However, in order to implement this on the server side, each access (upload or`
			`download) must have an access control check instead of one at the beginning of`
			`the operation. This would be inefficient in some server-side implementations,`
			`and nearly impossible to implement securely for implementations that use the`
			`system OpenSSH for implementation.`

			`## A more usable approach`

			`Git already has some places we can look for inspiration. Its SSH protocol is`
			`based on the Git native protocol, which is based on the pkt-line scheme.`
			`Recently, Git has learned about protocol version 2, which provides better`
			`support for expressing and negotiating capabilities.`

			`Ideally, we would allow multiple operations to occur on a single connection for`
			`efficiency's sake, especially on high-latency connections, where the cost of SSH`
			`connection setup may be high due to multiple round trips. In addition, a`
			`protocol which maps well onto HTTP may be beneficial for those server-side`
			`implementations which would like to proxy connections to an HTTP-based backend.`

			`## Preliminary design`

			`This design assumes a reference to Git's pkt-line and protocol v2 documentation.`
			`pkt-line headers for this document may contain values up to 65519 decimal.`

			`To initiate a connection, Git LFS should run the following command:`

			`$ ssh [{user}@]{server} git-lfs-transfer {path} {operation}`

			`If authentication fails, or some other connection error occurs, errors will be`
			`read from standard error and displayed to the user. The operation may be`
			`upload`, `download`, or `lock`. Other operations may be implemented in the
			`future.`

			`Once the connection is established, the server should send a capability`
			`advertisement:`

			```
			`capability-advertisement = capability-list flush-pkt`
			`capability-list = *capability`
			`capability = PKT-LINE(key[=value] LF)`

			`key = 1*(ALPHA \| DIGIT \| "-_")`
			`value = 1(ALPHA \| DIGIT \| "-_.,?\/{}[]()<>!@#$%^&+=:;")`
			```

			`Unlike the Git protocol, but like IMAP, the protocol version is specified as a`
			`capability. This document defines protocol version 1, which is specified as`
			`version=1`. If the server supports other protocol versions, it may enumerate
			`them here as well.`

			If the server supports locking, the `locking` capability should be advertised,
			and the client may then use the `lock`, `unlock`, and `list-lock` commands.

			`No capabilities other than the base functionality specified here are enabled`
			without the client explicitly enabling them. Note that the `value` production
			`here, unlike in Git, does not include the space character, since it is used as a`
			`delimiter in parts of the protocol.`

			`The client will then issue an appropriate version command:`

			```
			`version-request = PKT-LINE("version " number LF) flush-pkt`
			`number = 1*DIGIT`
			```

			`The response from the server will look like the following:`

			```
			`version-response = status-command`
			`delim-pkt`
			`error-message`
			`flush-pkt`
			`status-command = PKT-LINE("status " http-status-code LF)`
			`http-status-code = 3DIGIT`
			`error-message = *PKT-LINE(data LF)`
			```

			The `http-status-code` portion of the response is an HTTP status code, identical
			`to those used if the request is made over HTTP.`

			`The response code should be 200 if the version is accepted or 400 if it is not.`
			`Other values are possible if other errors occur.`

			`### Requests to transfer objects`

			These commands may be used if the operation was `upload` or `download`.

			The `batch` command is used to specify a JSON command identical those used at
			the `info/lfs/object/batch` endpoint:

			```
			`batch-request = batch-command`
			`*argument`
			`delim-pkt`
			`*oid-line`
			`flush-pkt`
			`batch-command = PKT-LINE("batch" LF)`
			`argument = PKT-LINE(key=[data] LF)`
			`oid-line = PKT-LINE(oid size *(key=[value]) LF)`
			`oid = 1*("a-f0-9")`
			`size = 1*DIGIT`
			```

			The `transfer` argument is equivalent to the corresponding value in the HTTP
			JSON API. The `refname` argument is equivalent to the `name` argument of the
			`ref` object in the HTTP JSON API.

			`Unknown arguments should be ignored, as should unknown key-value pairs in the`
			`oid-line` production.

			`The response from the server will look like the following:`

			```
			`batch-response = status-command`
			`*argument`
			`delim-pkt`
			`(*batch-oid-line \| error-message)`
			`flush-pkt`
			`batch-oid-line = PKT-LINE(oid size action *(key=[value]) LF)`
			`error-message = PKT-LINE("message " data LF)`
			```

			`If the status command is successful (that is, the status is not 200-series`
			response), the data provided matches the `*batch-oid-line` production;
			`otherwise, the data provided represents a user-visible error message.`

			`The server response should contain one pkt-line per oid-size-action tuple. That`
			`is, the same oid and size may be repeated if there are multiple actions. If the`
			`server has no actions that are valid for an object, it should be listed once in`
			the response with the `noop` action.

			The response for an oid may include a string, `id`, which is an opaque
			`identifier relevant only to the server to help it identify the object, and`
			another string, `token`, which is an opaque identifier relevant only to the
			`server to help it manage authentication. These strings must meet the syntax for`
			the `value` production above; if arbitrary bytes are needed, Base64 encoding is
			`recommended.`

			The `expires-in` and `expires-at` key-value pairs have the same meaning as their
			`corresponding items from the HTTP JSON API.`

			These values, if specified, must be passed as arguments to the `get-object` and
			`put-object` commands.

			`### Downloads`

			These commands may be used if the operation was `download`.

			If the operation was `download`, the command `get-object` may be used to retrieve an object:

			```
			`get-object-request = get-object-command`
			`*arguments`
			`flush-pkt`
			`get-object-command = PKT-LINE("get-object " oid LF)`

			```

			If the `id` or `token` responses were provided in the response to `batch`, they
			must be specified as arguments here. The server may choose to ignore the `oid`
			field specified in favor of looking up the data using the `id` field.

			`The response looks like the following:`

			```
			`status-data-response = status-success-response \| status-error-response`
			`status-success-response = status-success-command`
			`*argument`
			`delim-pkt`
			`binary-data`
			`flush-pkt`
			`status-success-command = PKT-LINE("status 200" LF)`
			`binary-data = *PKT-LINE(data)`
			`status-error-response = status-error-response`
			`*argument`
			`delim-pkt`
			`error-message`
			`flush-pkt`
			`status-error-command = PKT-LINE("status " http-error-code LF)`
			`http-error-code = ("4" \| "5") 2DIGIT`
			```

			The `size` argument is required on `status` responses to `get-object` commands.

			`### Uploads`

			These commands may be used if the operation was `upload`.

			If the operation was `upload`, the commands `put-object` and `verify-object`
			may be specified. `put-object` is used to upload an object to the server:

			```
			`put-object-request = put-object-command`
			`*argument`
			`delim-pkt`
			`binary-data`
			`flush-pkt`
			`put-object-command = PKT-LINE("put-object " oid LF)`
			```

			As above, the `size` command is required and `id` and `token` are required if
			provided by the server. The response matches the `status-data-response`
			production. The `binary-data` returned on success is not meaningful and should
			`be empty.`

			The `verify-object` command is used to verify an object:

			```
			`verify-object-request = verify-object-command`
			`*argument`
			`flush-pkt`
			`verify-object-command = PKT-LINE("verify-object " oid " " size LF)`
			```

			The `size` production represents the size of the object represented by `oid` as
			a decimal integer in bytes. The `id` and `token` items from the batch request
			`must be passed as arguments here, if specified.`

			`The response matches the following:`

			```
			`generic-status-response = generic-success-response \| status-error-response`
			`generic-success-response = generic-success-command`
			`*argument`
			`flush-pkt`
			`generic-success-command = PKT-LINE("status 200" LF)`
			```

			`### Locks`

			These commands may be used if the operation was `lock`.

			The `lock` command may be used to lock a file on a ref:
			```
			`lock-request = lock-command`
			`*argument`
			`delim-pkt`
			`flush-pkt`
			`lock-command = PKT-LINE("lock" LF)`
			```

			The `path` and `refname` arguments correspond to the `path` component and the
			`name` component of the `ref` object in the HTTP JSON API.

			`The response is as follows:`


			```
			`lock-response = lock-success-response \| status-error-response`
			`lock-success-response = lock-success-command`
			`*argument`
			`flush-pkt`
			`lock-success-command = PKT-LINE("status 201" LF)`
			```

			If the response is either successful or a 409 response, the arguments `id`,
			`path`, `locked-at`, and `ownername` are provided. In case of a successful
			`response, these attributes represent the created lock; if the response is a 409,`
			`then the attributes represent the conflicting lock.`

			The `list-lock` command may be used to list and verify locks:

			```
			`list-lock-request = list-lock-command`
			`*argument`
			`flush-pkt`
			`list-lock-command = PKT-LINE("list-lock" LF)`
			```

			The `path`, `id`, `cursor`, `limit`, and `refspec` correspond to the items in
			`the HTTP JSON API.`

			```
			`list-lock-response = list-lock-success-response \| status-error-response`
			`list-lock-success-response = list-lock-success-command`
			`*argument`
			`delim-pkt`
			`*lock-spec`
			`flush-pkt`
			`list-lock-success-command = PKT-LINE("status 200" LF)`
			`lock-spec = lock-decl`
			`path-id`
			`locked-at`
			`ownername-id`
			`owner-id`
			`lock-decl = PKT-LINE("lock " lock-id LF)`
			`lock-id = value`
			`path-id = PKT-LINE("path " lock-id path LF)`
			`path = data`
			`locked-at = PKT-LINE("locked-at " lock-id timestamp LF)`
			`ownername-id = PKT-LINE("ownername " lock-id ownername LF)`
			`ownername = data`
			`owner-id = PKT-LINE("owner " lock-id who LF)`
			`who = ("ours" \| "theirs")`
			```

			The `lock-decl` production declares a new lock. The `lock-id` production refers
			`to the ID provided by the server. The same ID is repeated in each line to allow`
			`for easier parsing.`

			The `next-cursor` argument indicates the next value of the `cursor` argument to
			be passed to the `list-lock` command. If there is no `next-cursor` argument,
			`this is the final response.`

			```
			`unlock-request = unlock-command`
			`*argument`
			`flush-pkt`
			`unlock-command = PKT-LINE("unlock " lock-id LF)`
			```

			The `force` and `refname` arguments have the same meaning as their corresponding
			values in the HTTP JSON API. The response matches the `generic-status-response`
			`production.`