This patch aims to improve decap performance by reducing expensive
hash_get callings as less as possible using AVX512 on XEON.
e.g. vxlan, vxlan_gpe, geneve, gtpu.
For the existing code, if vtep4 of the current packet match the last
vtep4_key_t well, expensive hash computation can be avoided and the
code returns directly.
This patch improves tunnel decap multiple flows case greatly by
leveraging 512bit vector register on XEON accommodating 8 vtep4_keys.
It enhances the possiblity of avoiding unnecessary hash computing
once hash key of the current packet hits any one of 8 in the 512bit
cache.
The oldest element in vtep4_cache_t is updated in round-robin order.
vlib_get_buffers is also leveraged in the meanwhile.
Type: improvement
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Signed-off-by: Junfeng Wang <drenfong.wang@intel.com>
Change-Id: I313103202bd76f2dd638cd942554721b37ddad60
vpclmulqdq is introduced on intel icelake architecture and
allows computing 4 carry-less multiplications in paralled by using
512-bit SIMD registers
Type: feature
Change-Id: Idb09d6f51ba6f116bba11649b2d99f649356d449
Signed-off-by: Damjan Marion <damjan.marion@gmail.com>
This allows us to combine 2 XOR operations into signle instruction
which makes difference in crypto op:
- in x86, by using ternary logic instruction
- on ARM, by using EOR3 instruction (available with sha3 feature)
Type: refactor
Change-Id: Ibdf9001840399d2f838d491ca81b57cbd8430433
Signed-off-by: Damjan Marion <damjan.marion@gmail.com>
Remove functions which have native C equivalent (i.e. _is_equal can be
replaced with ==, _add with +)
Add SSE4.2, AVX-512 implementations of splat, load_unaligned, store_unaligned,
is_all_zero, is_equal, is_all_equal
Change-Id: Ie80b0e482e7a76248ad79399c2576468532354cd
Signed-off-by: Damjan Marion <damarion@cisco.com>