Symptom
-------
With NDR traffic blasting at VPP, bringing up a new VM with a vhost
connection to VPP causes packet drops. I am able to recreate this
problem easily using a simple setup like this.
TREX-------------- switch ---- VPP
|---------------| |-------|
Cause
-----
The packet drops are caused by vhost holding onto the worker barrier
lock for too long in vhost_user_socket_read(). There are quite a few
system calls inside the routine. At the end of the routine, it
unconditionally calls vhost_user_update_iface_state() for all message
types. vhost_user_update_iface_state() also unconditionally calls
vhost_user_rx_thread_placement() and vhost_user_tx_thread_placement().
vhost_user_rx_thread_placement() scraps all existing cpu/queue mappings
for the interface and creates brand new cpu/queue mappings for the
interface. This process is very disruptive and very expensive. In my
opinion, this area of code needs a makeover.
Fixes
-----
* vhost_user_socket_read() is rewritten so that it does not hold
onto the worker barrier lock for system calls, or at least minimizes the
need for doing so.
* Remove the call to vhost_user_update_iface_state() as a default route at
the end of vhost_user_socket_read(). Only a couple of message types
really need to call vhost_user_update_iface_state(). We make the call
only for those message types which need it.
* Remove vhost_user_rx_thread_placement() and
vhost_user_tx_thread_placement() from vhost_user_update_iface_state().
There is no need to repeatedly change the cpu/queue mappings.
* vhost_user_rx_thread_placement() is actually quite expensive. It should
be called only once per queue for the interface. There is no need to
scrap the existing cpu/queue mappings and create new cpu/queue mappings
when additional queues become active/enabled.
* Change to create the cpu/queue mappings for the first RX when the
interface is created. Don't remove the cpu/queue mapping when the
interface is disconnected. Remove the cpu/queue mapping only when the
interface is deleted.
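The "place once per queue, keep on disconnect, clear on delete" rule from
the list above can be sketched roughly as follows. This is only an
illustration under stated assumptions: demo_intf_t and every demo_*
function are hypothetical stand-ins, not the real vhost-user data
structures or routines, and the expensive mapping work is reduced to a
counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_QUEUES 8

/* Hypothetical per-interface state; NOT the real vhost-user struct. */
typedef struct {
  bool rx_placed[DEMO_MAX_QUEUES]; /* true once a cpu/queue mapping exists */
  int placement_calls;             /* counts the expensive operation */
} demo_intf_t;

/* Stand-in for the expensive cpu/queue mapping work. */
static void demo_place_rx_queue(demo_intf_t *intf, size_t qid) {
  intf->placement_calls++;
  intf->rx_placed[qid] = true;
}

/* Called whenever a queue is reported enabled; maps it at most once. */
static void demo_on_queue_enabled(demo_intf_t *intf, size_t qid) {
  assert(qid < DEMO_MAX_QUEUES);
  if (!intf->rx_placed[qid])
    demo_place_rx_queue(intf, qid);
}

/* Disconnect deliberately leaves the mappings untouched. */
static void demo_on_disconnect(demo_intf_t *intf) {
  (void) intf;
}

/* Only deleting the interface removes the mappings. */
static void demo_on_delete(demo_intf_t *intf) {
  for (size_t q = 0; q < DEMO_MAX_QUEUES; q++)
    intf->rx_placed[q] = false;
}
```

With this shape, a reconnect that re-enables an already mapped queue does
not pay for the placement again.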
The create vhost user interface CLI also has some very expensive system
calls if the command is entered with the optional keyword "server".
As a bonus, this patch makes the create vhost user interface binary-api and
CLI thread safe. Protection is applied only to the small amount of code
which is thread unsafe.
Change-Id: I4a19cbf7e9cc37ea01286169882e5603e6d7eb77
Signed-off-by: Steven Luong <sluong@cisco.com>
We register callbacks for VNET_HW_INTERFACE_LINK_UP_DOWN_FUNCTION and
VNET_SW_INTERFACE_ADMIN_UP_DOWN_FUNCTION to add and remove the slave
interface from the bond interface accordingly. For static bonding without
LACP, one would think that it is good enough to put the slave interface into
the active slave set as soon as it is configured. Wrong, sometimes the slave
interface is configured to be part of the bonding without ever bringing up the
hardware carrier or setting the admin state to up. In that case, we send
traffic to the "dead" slave interface.
The fix is to make sure both the carrier and admin state are up before we put
the slave into the active set for forwarding traffic.
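A minimal sketch of the check described above, assuming a simplified
model: demo_slave_t and the demo_* callbacks are hypothetical and do not
reflect the real bond plugin structures or the signatures of the
registered VNET callbacks. Both events funnel into one membership
refresh, so the slave is active only while carrier and admin state are
both up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slave state; NOT the real bond plugin struct. */
typedef struct {
  bool carrier_up;    /* hardware link state */
  bool admin_up;      /* administrative state */
  bool in_active_set; /* eligible for forwarding */
} demo_slave_t;

/* Membership requires both conditions, never just one of them. */
static void demo_refresh_membership(demo_slave_t *s) {
  assert(s != NULL);
  s->in_active_set = s->carrier_up && s->admin_up;
}

/* Stand-in for the link up/down callback. */
static void demo_on_link_change(demo_slave_t *s, bool up) {
  s->carrier_up = up;
  demo_refresh_membership(s);
}

/* Stand-in for the admin up/down callback. */
static void demo_on_admin_change(demo_slave_t *s, bool up) {
  s->admin_up = up;
  demo_refresh_membership(s);
}
```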
Change-Id: I93b1c36d5481ca76cc8b87e8ca1b375ca3bd453b
Signed-off-by: Steven <sluong@cisco.com>
(cherry picked from commit e43278f75fe3188551580c7d7991958805756e2f)
Jobs in stable/1810 failed to verify even after many rechecks. The following is from the failure log of https://gerrit.fd.io/r/#/c/16728/
13:01:56 2 Problems:
13:01:56 Problem: libboost_headers1_68_0-devel-1.68.0-lp150.243.1.x86_64 conflicts with namespace:otherproviders(libboost_headers-devel) provided by libboost_headers-devel-1.69.0-lp150.1.1.noarch
13:01:56 Problem: libboost_thread1_68_0-devel-1.68.0-lp150.243.1.x86_64 conflicts with namespace:otherproviders(libboost_thread-devel) provided by libboost_thread-devel-1.69.0-lp150.1.1.noarch
13:01:56
13:01:56 Problem: libboost_headers1_68_0-devel-1.68.0-lp150.243.1.x86_64 conflicts with namespace:otherproviders(libboost_headers-devel) provided by libboost_headers-devel-1.69.0-lp150.1.1.noarch
13:01:56 Solution 1: Following actions will be done:
13:01:56 deinstallation of libboost_headers1_68_0-devel-1.68.0-lp150.243.1.x86_64
13:01:56 deinstallation of libboost_chrono1_68_0-devel-1.68.0-lp150.243.1.x86_64
13:01:56 deinstallation of libboost_date_time1_68_0-devel-1.68.0-lp150.243.1.x86_64
13:01:56 Solution 2: do not install libboost_headers-devel-1.69.0-lp150.1.1.noarch
13:01:56
13:01:56 Choose from above solutions by number or skip, retry or cancel [1/2/s/r/c] (c): c
13:01:56 make: *** [Makefile:315: install-dep] Error 4
A test patch was created to include both 16631 and 16728 as found in https://gerrit.fd.io/r/#/c/16986/
The job was verified successfully. It proves to me that stable/1810 is missing 16631.
Change-Id: I4a053f41eef138fc0e6db7e2650860c0ac999552
Signed-off-by: Florin Coras <fcoras@cisco.com>
Signed-off-by: Paul Vinciguerra <pvinci@vinciconsulting.com>
(cherry picked from commit 223548d479c0bde67aa8d05a1f0f13e0afb0aab1)
Needed for arm machines in CI.
Change-Id: Ib16a8b63e145116c7cb22376243e9026d9545c8a
Signed-off-by: juraj.linkes <juraj.linkes@pantheon.tech>
(cherry picked from commit a409f2729ac2431aeee5a18889b4d2e5634c713f)
Should have been done this way years ago. My bad.
Change-Id: Ic7bf937fb6c4dc5c1b6ae64f2ecf8608b62e7039
Signed-off-by: Dave Barach <dave@barachs.net>
(cherry picked from commit b2204671dad112e3195771854b4ef00bb388d4e6)
The permission for the top-level vpp_papi dir under
/usr/lib/python2.7/site-packages is set to 644, which means that
non-root users cannot import vpp_papi. As a result, devstack setup
with VPP/networking-vpp fails since it is run as a non-root user.
Change-Id: Id85b468b2dcc92efb3a64c51ffb23ef6d596e4ad
Signed-off-by: Onong Tayeng <otayeng@cisco.com>
(cherry picked from commit 9b0ce0215b6e699851a3b54fb2a7003800ca53e4)
Fix the trivial use-before-check copy-paste error.
There was a more subtle issue with that patch that Coverity didn't notice:
namely, vec_validate(v, len-1) is a terrible idea if len happens to be == 0.
Fix that.
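To illustrate why len - 1 is dangerous when len == 0, here is a small
sketch. It assumes len is an unsigned 32-bit value (an assumption, the
actual type at the call site is not stated here), and the demo_* helpers
are hypothetical, not part of vppinfra. With an unsigned len, the
subtraction wraps around to the largest representable index instead of
going negative, so a guard is needed before computing the last index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What an unguarded "len - 1" produces: wraps around when len is 0. */
static uint32_t demo_unguarded_last_index(uint32_t len) {
  return len - 1;
}

/* Guarded variant: reports false and leaves *last_index untouched
 * when there is nothing to validate. */
static bool demo_guarded_last_index(uint32_t len, uint32_t *last_index) {
  assert(last_index != NULL);
  if (len == 0)
    return false;
  *last_index = len - 1;
  return true;
}
```

A caller would then invoke the validation only when the guarded helper
returns true.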
Change-Id: I0fab8b1750e9e9973eefb5d39f35e4c3a13fc66f
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
(cherry picked from commit e0152461cbc84d6d4df3f05dddabe992c1c59052)
In a couple of places, vec_add1()-style calls were made repeatedly in a loop
for smallish vectors where the number of additions was known in advance.
In a test with a large number of ACEs, these repeated allocations noticeably
contribute to heap fragmentation.
Minimize the number of allocations by preallocating the known size,
resetting the length accordingly, and then calling vec_add1().
Also unify the parsing of the memory-related startup config parameters.
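The preallocate-then-append pattern described above can be sketched with
a plain C stand-in. demo_vec_t and the demo_* functions are hypothetical
and are not the vppinfra vector macros. The stand-in grows by exactly one
element per append, which is a deliberately naive worst case; the real
vec_add1() growth policy may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array; NOT a vppinfra vector. */
typedef struct {
  int *data;
  size_t len;      /* elements in use */
  size_t cap;      /* elements allocated */
  unsigned allocs; /* number of (re)allocations performed */
} demo_vec_t;

/* Ensure room for at least cap elements; counts each reallocation. */
static void demo_reserve(demo_vec_t *v, size_t cap) {
  if (cap <= v->cap)
    return;
  int *p = realloc(v->data, cap * sizeof *p);
  assert(p != NULL);
  v->data = p;
  v->cap = cap;
  v->allocs++;
}

/* Append one element, growing by one slot when full (naive policy). */
static void demo_add1(demo_vec_t *v, int x) {
  if (v->len == v->cap)
    demo_reserve(v, v->cap + 1);
  v->data[v->len++] = x;
}

/* Append n elements one at a time, with no preallocation. */
static void demo_fill(demo_vec_t *v, int n) {
  for (int i = 0; i < n; i++)
    demo_add1(v, i);
}

/* Preallocate for n elements, reset the length, then append. */
static void demo_fill_preallocated(demo_vec_t *v, int n) {
  demo_reserve(v, (size_t) n);
  v->len = 0;
  demo_fill(v, n);
}
```

In this model, filling 16 elements costs 16 allocations without the
preallocation and a single one with it.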
Change-Id: If8fba344eb1dee8f865ffe7b396ca3b6bd9dc1d0
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
(cherry picked from commit 94f509615eb97cebc9192e7290c84cf166518039)
Change-Id: Idd4a5f8bab5d39e5f33f5c130601175af70a20d4
Signed-off-by: Filip Varga <filip.varga@pantheon.tech>
Signed-off-by: Dave Barach <dave@barachs.net>
There are different flavors of vmxnet3 device that we need to communicate
with: ESXi server, VMware Fusion, VMware Workstation, and VMware Player. Each
of them also has different versions. We really need the control plane logging
to debug when things don't work as expected.
Change-Id: Idab6896e3d8bf841f1cd877c13a21531fa110568
Signed-off-by: Steven <sluong@cisco.com>
The result vector from stat_segment_ls must be freed
by the caller. Add wrapper for non-C language bindings.
Change-Id: I7eee7f80ec98b41696d354add47b26978e12ef0f
Signed-off-by: Ole Troan <ot@cisco.com>
(cherry picked from commit 8254018c21bbdbbc11225ebc444b1d072606caf7)
- ensure session enqueue epoch does not wrap between two enqueues
- use 3 states for echo clients app, to distinguish between starting and
closing phases
- force tcp fin retransmit if out of buffers while sending a fin
Change-Id: I6f2cab46affd1148aba2a33fb6d58bcc54f32805
Signed-off-by: Florin Coras <fcoras@cisco.com>
A pointer to hash-ready ACL rules is only set once, which might cause a crash
if there are colliding entries from more than one ACL applied.
Solution: reload the pointer based on the element being processed.
Change-Id: I7a701c2c3b4236d67293159f2a33c4f967168953
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
show vmxnet3 desc may display 5000 lines of output since it has 5 tables and
each table may have 1000 entries. That much output is not very useful for
debugging a problem. We need filtering capability for this show command. We
need to be able to display the descriptor table per interface, per interface
per table, and per interface per table per slot. The last is the most useful.
Tested the following valid combinations:
show vmxnet3
show vmxnet3 desc
show vmxnet3 vmxnet3-0/13/0/0
show vmxnet3 vmxnet3-0/13/0/0 desc
show vmxnet3 vmxnet3-0/13/0/0 rx-comp
show vmxnet3 vmxnet3-0/13/0/0 rx-comp 1
show vmxnet3 vmxnet3-0/13/0/0 tx-comp
show vmxnet3 vmxnet3-0/13/0/0 tx-comp 1
show vmxnet3 vmxnet3-0/13/0/0 rx-desc-0
show vmxnet3 vmxnet3-0/13/0/0 rx-desc-0 1
show vmxnet3 vmxnet3-0/13/0/0 rx-desc-1
show vmxnet3 vmxnet3-0/13/0/0 rx-desc-1 1
show vmxnet3 vmxnet3-0/13/0/0 tx-desc
show vmxnet3 vmxnet3-0/13/0/0 tx-desc 1
Negative tests, where the command is rejected:
show vmxnet3 abc
show vmxnet3 desc abc
show vmxnet3 vmxnet3-0/13/0/0 abc
show vmxnet3 vmxnet3-0/13/0/0 desc abc
show vmxnet3 vmxnet3-0/13/0/0 rx-comp abc
show vmxnet3 vmxnet3-0/13/0/0 rx-comp 1 abc
Change-Id: I0ff233413496e58236f8fb4a94e493494c20c5cb
Signed-off-by: Steven <sluong@cisco.com>
When using vpp_api_test, there is an undefined symbol error for
format_vlib_pci_addr when vmxnet3_test_plugin.so is loaded.
The cause is that vlib is not included in vpp_api_test. Remove the reference
to vlib.so in vmxnet3_test.
Change-Id: I37c00dfe2f843d99ad6c4fc7af6ed10bac4c2df8
Signed-off-by: Steven <sluong@cisco.com>