- [Description of the deployment](#describe-deployment):
Briefly describe the deployment where the issue was spotted:
the number of k8s nodes, and whether DHCP/STN/TAP is used.
- [Logs](#collecting-the-logs):
Attach corresponding logs, at least from the vswitch pods.
- [VPP config](#inspect-vpp-config):
Attach output of the show commands.
- [Basic Collection Example](#basic-example)
### Describe Deployment
Since contiv-vpp can be used with different configurations, it is helpful
to attach the config that was applied. Either attach `values.yaml` passed to the helm chart,
or attach the [corresponding part](https://github.com/contiv/vpp/blob/42b3bfbe8735508667b1e7f1928109a65dfd5261/k8s/contiv-vpp.yaml#L24-L38) from the deployment yaml file.
```
  contiv.yaml: |-
    TCPstackDisabled: true
    UseTAPInterfaces: true
    TAPInterfaceVersion: 2
    NatExternalTraffic: true
    MTUSize: 1500
    IPAMConfig:
      PodSubnetCIDR: 10.1.0.0/16
      PodNetworkPrefixLen: 24
      PodIfIPCIDR: 10.2.1.0/24
      VPPHostSubnetCIDR: 172.30.0.0/16
      VPPHostNetworkPrefixLen: 24
      NodeInterconnectCIDR: 192.168.16.0/24
      VxlanCIDR: 192.168.30.0/24
      NodeInterconnectDHCP: False
```
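If you are not sure which configuration was actually applied, you can also dump it from the running cluster. A minimal sketch, assuming the Contiv configuration is stored in a ConfigMap in the `kube-system` namespace (the exact ConfigMap name may differ between releases):
```
$ kubectl get configmaps -n kube-system | grep contiv
$ kubectl get configmap <contiv configmap name> -n kube-system -o yaml
```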
Information that might be helpful:
- Whether node IPs are statically assigned or DHCP is used
- Whether STN is enabled
- Version of TAP interfaces used
- Output of `kubectl get pods -o wide --all-namespaces`
### Collecting the Logs
The most essential thing that needs to be done when debugging and **reporting an issue**
in Contiv-VPP is **collecting the logs from the contiv-vpp vswitch containers**.
#### a) Collecting Vswitch Logs Using kubectl
In order to collect the logs from individual vswitches in the cluster, connect to the master node
and then find the POD names of the individual vswitch containers:
```
$ kubectl get pods --all-namespaces | grep vswitch
kube-system contiv-vswitch-lqxfp 2/2 Running 0 1h
kube-system contiv-vswitch-q6kwt 2/2 Running 0 1h
```
Then dump the logs of each vswitch pod, with *pod name* replaced by the actual POD name.
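A minimal sketch of the command, assuming the vswitch container inside the pod is named `contiv-vswitch` (container names can be verified with `kubectl describe pod`):
```
$ kubectl logs <pod name> -n kube-system -c contiv-vswitch > logs-master.txt
```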
#### b) Collecting Vswitch Logs Using Docker
If option a) does not work, you can still collect the same logs using plain docker
commands. To do so, connect to each individual node in the k8s cluster and find the container ID of the vswitch container:
```
$ docker ps | grep contivvpp/vswitch
b682b5837e52 contivvpp/vswitch "/usr/bin/supervisor…" 2 hours ago Up 2 hours k8s_contiv-vswitch_contiv-vswitch-q6kwt_kube-system_d09b6210-2903-11e8-b6c9-08002723b076_0
```
Now use the ID from the first column to dump the logs into the `logs-master.txt` file:
```
$ docker logs b682b5837e52 > logs-master.txt
```
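The same needs to be repeated on each of the other nodes in the cluster, saving each node's logs into a separate file, for example (the file names are only a suggestion):
```
$ docker ps | grep contivvpp/vswitch
$ docker logs <container ID> > logs-worker1.txt
```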
#### Reviewing the Vswitch Logs
In order to debug an issue, a good first step is to grep the logs for the `level=error` string, for example:
```
$ cat logs-master.txt | grep level=error
```
In case of a bug, VPP or the contiv-agent may also crash. To check whether some process has crashed, grep for the string `exit`, for example:
```
$ cat logs-master.txt | grep exit
2018-03-20 06:03:45,948 INFO exited: vpp (terminated by SIGABRT (core dumped); not expected)
2018-03-20 06:03:48,948 WARN received SIGTERM indicating exit request
```
#### Collecting the STN Daemon Logs
In STN (Steal The NIC) deployment scenarios, it is often necessary to collect and review the logs
from the STN daemon. This needs to be done on each node:
```
$ docker logs contiv-stn > logs-stn-master.txt
```
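If you are not sure whether the STN daemon is running on a particular node, you can first check for its container (assuming the default container name `contiv-stn` used above):
```
$ docker ps | grep contiv-stn
```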
#### Collecting Logs in Case of Crash Loop
If the vswitch is crashing in a loop (indicated by an increasing number in the `RESTARTS`
column of the `kubectl get pods --all-namespaces` output), then `kubectl logs` or `docker logs` would
only return the logs of the latest incarnation of the vswitch. Those may not reveal the root cause
of the very first crash, so in order to debug it, we need to disable the k8s health check probes so that the vswitch is not
restarted after the very first crash. This can be done by commenting out the `readinessProbe`
and `livenessProbe` in the contiv-vpp deployment YAML: