A bug was reported where a jumbo packet would stay in vhost
queue forever or until a large enough number of other packets
arrived in the queue too.
This is due to a bug in vhost input node buffer allocation.
The fix is to make sure that vhost always allocates at least
enough buffers for one single big packet. '40' is used to
account for 65kB frames.
Change-Id: I1d293028854165083e30cd798fab9d4140230b78
Signed-off-by: Pierre Pfister <ppfister@cisco.com>
With heavy traffic, tx code path may crash due to memory corruption
Thread 5 "vpp_wk_2" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff3995c700 (LWP 2505)]
0x00007ffff73675e8 in vhost_user_if_input (vm=0x7fffb5f5bf9c,
vum=0x7ffff7882a40 <vhost_user_main>, vui=0x7fffb65570c4, qid=0,
node=0x7fffb6577dac, mode=VNET_HW_INTERFACE_RX_MODE_POLLING)
at /home/sluong/vpp-master/vpp/build-data/../src/vnet/devices/virtio/vhost-user.c:1610
1610 bi_current = (vum->cpus[thread_index].rx_buffers)
[vum->cpus[thread_index].rx_buffers_len];
(gdb) p vum->cpus[thread_index].rx_buffers_len
$2 = 793212607
(gdb)
Apparently, some code accidentally wrote the bad value in rx_buffers_len.
rx_buffers_len should never be greater than 1024 since that is how many buffers
we request each time.
After debugging many hours, I discovered that the memory corruption happens
in the tx code path right here on line 2176.
{
vhost_copy_t *cpy = &vum->cpus[thread_index].copy[copy_len];
copy_len++;
cpy->len = bytes_left;
cpy->len = (cpy->len > buffer_len) ? buffer_len : cpy->len;
cpy->dst = buffer_map_addr;
cpy->src = (uword) vlib_buffer_get_current (current_b0) +
current_b0->current_length - bytes_left;
(gdb) p cpy
$3 = (vhost_copy_t *) 0x7fffb554077c
(gdb) p copy_len
$4 = 1025
(gdb) p &vum->cpus[3].rx_buffers_len
$8 = (u32 *) 0x7fffb5540784
copy_len is picking up the index entry 1024 before it was incremented. copy array has only
1024 members (0 - 1023 are valid).
The assignment here in cpy surely causes memory corruption. It is only discovered later
when the memory location that it corrupted is used.
The condition for the crash is to transmit jumbo frames under heavy volume. Since ring
size is 1024, with one packet taking up one index for frame size (less 2048), it does
not cause overflow. With jumbo frames, it requires multiple indices for one packet,
it can cause the overflow under heavy traffic.
The fix is to do copy out when we have 1000 entries in the array to avoid
overflow.
Change-Id: Iefbc739b8e80470f1cf13123113f8331ffcd0eb2
Signed-off-by: Steven <sluong@cisco.com>
When making a call to vlib_packet_template_get_packet(), it
is possible to get back a NULL if the system runs out of buffer.
This can happen when there is buffer leaks. But don't crash
just because we run out of buffers, just punt.
Change-Id: Ie90ea41f3dda6e583d48959cbd18ff124158d7f8
Signed-off-by: Steven <sluong@cisco.com>
(cherry picked from commit 0ff5c563d5048991dbd02a3892dccde8305a7e30)
(cherry picked from commit 1808f3c00a7bcdea7f0c004ef0613db2156c2065)
https://gerrit.fd.io/r/#/c/8551/ decoupled the global variable,
namely tm->iovecs from TX and RX. However, to support multi-threads,
we have to eliminate the use of this global variable with per thread
variable. I notice that rx_buffers must also be per thread variable.
So, we introduce per thread struct to contain rx_buffers and iovecs.
Each thread will find the per thread struct with thread_index.
Change-Id: I61abf2fdace8d722525a382ac72f0d04a173b9ce
Signed-off-by: Steven <sluong@cisco.com>
(cherry picked from commit 4cd257667406d0500a81323ef91f5c7c8c902b25)
It was observed that under heavy traffic, VPP accidentally sent traffic
with the wrong source and destination to the tun/tap interface. Traffic
appears to be sent to the wrong direction. This problem is only
seen when worker thread is configured.
When worker thread is used, TX and RX may reside in different
core. Yet both TX and RX threads are sharing the same global variable,
namely iovecs without any mutex or memory barrier protection.
This creates a race condition when heavy traffic is blasted to VPP,
like 1000 pps.
We could create a mutex or memory barrier to ensure atomic memory access.
But why bother? It is a lot cheaper to just decouple the iovecs such
that TX and RX have their own iovecs.
Change-Id: I86a5a19bd8de54d54f32e1f0845bae6a81bbf686
Signed-off-by: Steven <sluong@cisco.com>
(cherry picked from commit 4ff586d1c6fc5c40e1548cd6f221a8a7f3ad033b)
(cherry picked from commit 3fd57e67532bd55701bef7365adc17da229a44dc)
Change L2 learning path so it update stale timestamp in MAC entry
only if aging is enabled on the BD for the MAC entry.
Change-Id: I849154fe7ad2c6c68d6a94a66ca9345f6a98bc07
Signed-off-by: John Lo <loj@cisco.com>
On host interface if a VLAN tagged packet is received, linux kernel removes
the VLAN header from packet byte stream and adds metadata in tpacket2_hdr.
This patch explicitely checks for the presense of VLAN metadata and adds it
in VPP packet.
Change-Id: I0ba35c1e98dbc008ce18d032f22f2717d610c1aa
Signed-off-by: Akshaya N <akshaya@rtbrick.com>
(cherry picked from commit 535f0bfe0274e86c5d2e00dfd66dd632c6ae20a9)
vpp_uses_dpdk is set to "no" in build-data/platforms/vpp.mk causing the
build to fail.
This patch addresses that issue.
Change-Id: Icc1aaa508e730c9b8715119e1259e4c82f974048
Signed-off-by: Marco Varlese <marco.varlese@suse.com>
(cherry picked from commit edfa2fddf84fe102e3c134c4df312638b3a00339)
1. Limit MAC entry update per l2-learn call to reduce update burst
when wall clock advance to the the next minute so all MAC time
stamps are behind current time.
2. Optimize l2-learn node fast path code sequence.
3. Invalidate cache_key when update MAC entry.
4. Change L2 learn hit counter to L2 learn hit-update counter.
5. Increase L2FIB table memory size to 512MB to fit 4M entries
6. Set MAC learn limit at 4M entries
Change-Id: I19572f7f8a4b42a01be025a609fb03af50af16b2
Signed-off-by: John Lo <loj@cisco.com>
The fix for VPP-935 missed the case that hash_acl_add() and hash_acl_delete() may be called
during the replacement of the existing applied ACL, as a result the "applied" logic needs
to be replicated for the hash acls separately, since it is a lower layer.
Change-Id: I7dcb2b120fcbdceb5e59acb5029f9eb77bd0f240
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
The handler was calling the routines with sw_if_index instead of hw_if_index,
fix that by an extra call to vnet_get_sw_interface, and check that the interface
type is VNET_SW_INTERFACE_TYPE_HARDWARE before proceeding.
Change-Id: I4a6f65f44e250ecdb2b72d2693c9d7db5a52b966
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
This fixes build on non-x86 platforms like arm64.
Change-Id: I7ff5df92f89e34c27889d82f35924dc28cde8c39
Signed-off-by: Damjan Marion <damarion@cisco.com>
(cherry picked from commit 5f22f4ddded8ac41487dab3069ff8d77c3916205)
libraries.
This patch addresses the misbehaviour.
Change-Id: I41f1ece3ca21c5a8f2c95533ed3d77a535233ea6
Signed-off-by: Marco Varlese <marco.varlese@suse.com>
Fedora 24 and 25 distro already includes nasm 2.12 but Centos does not as yet.
Change-Id: I060ea8b7b7892ac8444d850398ed1c9100631fbc
Signed-off-by: Thomas F Herbert <therbert@redhat.com>
Current optional DPDK PMDs are:
- AESNI MB PMD (SW crypto)
- AESNI GCM PMD (SW crypto)
- MLX4 PMD
- MLX5 PMD
This change will always build DPDK SW crypto PMDs and required SW crypto
libraries, while MLX PMDs are still optional and the user has to build
required libraries.
Now the configure script detects if any of the optional DPDK PMDs were
built and link against their required libraries/dependencies.
Change-Id: I1560bebd71035d6486483f22da90042ec2ce40a1
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
In multithread setup the main thread may send packets,
which may pass through the node with permit+reflect action.
This creates the connection in lists for thread0,
however in multithread there are no interupt handlers there.
Ensure we are not spending too much time spinning in a
tight cycle by suspending the main cleaner thread
until the current iteration of interrupts is processed.
Change-Id: Idb7346737757ee9a67b5d3e549bc9ad9aab22e89
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
When looking at resource utilisation, it is useful to understand
the interactions between the acl-plugin and the rest of VPP.
MACIP ACLs till now could only be dumped via API,
which is tricky when debugging. Add the CLIs to see
the MACIP ACLs and where they are applied.
Change-Id: I3211901589e3dcff751697831c1cd0e19dcab1da
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
Multiple portranges that land on the same hash key will always report the match
on the first portrange - even when the subsequent portranges have matched.
Test escape, so make a corresponding test case and fix the code so it passes.
Change-Id: Idbeb8a122252ead2468f5f9dbaf72cf0e8bb78f1
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
The logic in hash ACL bitmask update was using the vector
of ACLs applied to the interface to rebuild the hash lookup mask.
However, in transient cases (like doing group manipulation with
hash ACLs), that will not hold true. Thus, make
a local copy of for which ACL indices the hash_acl_apply
was called previously, and maintain that one local
to the hash_lookup.c file logic.
Change-Id: I30187d68febce8bba2ab6ffbb1eee13b5c96a44b
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
The commit fixing the VPP-910 and separating the memory operations
into separate heaps has missed setting the MHEAP_FLAG_THREAD_SAFE,
which quite obviously caused the issues in the multithread setup.
Fix that.
Also, add the debug CLIs
"set acl-plugin heap {main|hash} {validate|trace} {1|0}"
to toggle the memory instrumentation, in case we ever need it
in the future.
Change-Id: I8bd4f7978613f5ea75a030cfb90674dac34ae7bf
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
The packet that was creating the session was not tracked,
consequently the TCP flags seen within the session record
never got the value for the session to get treated as
being in the established state.
Test-escape, so add the TCP tests which test the
three phases of the TCP session life and make them all pass.
Change-Id: Ib048bc30c809a7f03be2de7e8361c2c281270348
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
- the echo_reply_node is now notifying the cli process on the main thread/vlib_main
- the timestamp for the icmp reply is now acquired in the echo_reply_node and not in the cli process to avoid an off by 10ms error (see 【vpp-dev】delay is error in ping with multi worker thread)
Change-Id: I21d37002b0376b4f2ccab08d8f04c2f2944b9b39
Signed-off-by: Mohammed Hawari <mhawari@cisco.com>
(cherry picked from commit 03a6213fb5022d37ea92f974a1814db1c70bcbdf)
It was uncaught by make test because the corresponding tests are not there yet - part of 17.10 deliverables
Change-Id: I55456f1874ce5665a06ee411c7abf37cd19ed814
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
The further prolonged testing from testbed that reported VPP-910
has uncovered a couple of deeper issues with optimization from
7384, and the usage of subscripts rather than vec_elt_at_index()
allowed to hide a couple of further errors in the code.
Also, the current acl-plugin behavior of using the global
heap for its dynamic data is problematic - it makes
the troubleshooting much harder by potentially spreading
the problem around.
Based on this experience, this commits makes a few changes to fix
the issues seen, also improving the serviceability of the acl-plugin
code for the future:
- Use separate mheaps for any ACL-related control plane
operations and separate for the hash lookup datastructures,
to compartmentalize any memory-related issues for the ACL plugin.
- Ensure vec_elt_at_index() usage throughout the hash_lookup.c file.
- Use vectors rather than raw memory for storing the "ordinary" ACL rules.
- Rework the optimization from 7384 to use a separate tail pointer
rather than overloading the "prev" field.
- Make get_session_ptr() more conservative and adjust is_valid_session_ptr
accordingly
Change-Id: Ifda85193f361de5ed3782a4acd39622bd33c5830
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
Fix several threading-related issues uncovered by the CSIT scale/performance test:
- make the per-interface add/del counters per-thread
- preallocate the per-worker session pools rather than
attempting to resize them within the datapath
- move the bihash initialization to the moment of ACL
being applied rather than later during the connection creation
- adjust the connection cleaning logic to not require
the signaling from workers to main thread
- make the connection lists check in the main thread robust against workers
updating the list heads at the same time
- add more information to "show acl-plugin sessions" to aid in debugging
Change-Id: If82ef715e4993614df11db5e9afa7fa6b522d9bc
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
Node function pointer was not set on all node runtimes causing crash if
new interface is different type.
Change-Id: I4661fe883befc6cd3fc6dfc14fd44f6fa5faf27c
Signed-off-by: Damjan Marion <damarion@cisco.com>
(cherry picked from commit c418e4ac7cf36bd64f3130c258d5f1897c245f2b)
The syntax for debug vhost-user is
debug vhost-user <on | off>
However, currently the code does not reject the invalid command such as below
debug vhost-user
debug vhost-user on blah
debug vhost-user off blah
The fix is to enforece the correct syntax and reject the command when invalid
option is entered.
Change-Id: I1a04ae8ddb6dd299aa6d15b043362964e685ddde
Signed-off-by: Steven <sluong@cisco.com>
change 7385 has added the code which has the first ACE's "prev" entry within the linked list of
shadowed ACEs pointing to the last ACE, in order to avoid the frequent linear list traversal.
That change was not complete and did not update this "prev" entry whenever the last ACE was deleted.
As a result the changes within the applied ACLs which caused the calls to hash_acl_unapply/hash_acl_apply
may result in hitting assert which does the sanity check. The solution is to add the missing update logic.
Change-Id: I9cbe9a7c68b92fa3a22a8efd11b679667d38f186
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
Cleanup mapping of interface output node for the l2-output node
when interface is configured to L2 or L3 modes. The mapping is
now always done in the main thread as part of API/CLI processing,
instead of initiate mapping in the forwarding path which can be
in the worker threads.
Change-Id: Ia789493e7d9f5c76d68edfaf34db43f3e3f53506
Signed-off-by: John Lo <loj@cisco.com>
- PCI devices not properly discovered
- vlib_pci_bus_master_enable () not working
Change-Id: I7433ab1b19b890b8900635b43037b9a2017a1921
Signed-off-by: Damjan Marion <damarion@cisco.com>
Looks like some compiler versions are producing wrong code when we are
copying 9-16 bytes so reverting back to the original code.
Change-Id: I74b5fa54a3b01f6288648f1cb0926030edd3b26f
Signed-off-by: Damjan Marion <damarion@cisco.com>
In multi-threaded model (e.g. 1 main and 1 worker threads),
after an ethernet interface is deleted (e.g. vhost-user interface),
'show runtime' command produces garbled output and sometimes
leads to vpp crash.
The reason is because vlib_node_rename() frees and reallocates node's
'n->name' vector, however the change is not propagated into copies
of the node on worker threads.
Change-Id: Ibf22422913b7f2df22f70f3b2fe8dafd34c1dd06
Signed-off-by: Igor Mikhailov (imichail) <imichail@cisco.com>