Commit Graph

14 Commits

Author SHA1 Message Date
Dave Barach
402606c96b vppinfra: minor refactor in lock.h
For whatever reason, "typedef struct { ... } *foo_t" gives cgo a
horrible case of indigestion. A minor refactor makes the pain go away.

Type: refactor

Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: I856b2abf9b16348d2f6145178e683e722914c756
2020-10-19 21:36:52 +00:00
Benoît Ganne
b6b484d01a virtio: fix txq locking
Initialize txq lock only if some txq are shared and check if another
worker is already operating on the txq before processing gro timeouts
in input node.

Type: fix

Change-Id: I89dab6c0e6eb6a7aa621fa1548b0a2c76e6c7581
Signed-off-by: Benoît Ganne <bganne@cisco.com>
2020-09-18 10:40:47 +00:00
jaszha03
18512b002d vppinfra: implement CLIB_PAUSE () for aarch64 platforms
Define CLIB_PAUSE () to generate the "yield" instruction. No significant
performance changes were observed for clib_spinlock_t and clib_rwlock_t.

Type: feature

Change-Id: I59eb996e61c7a16007517e57e6996567302c1657
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
2019-09-27 18:31:28 +00:00
jaszha03
30aaf97a90 vppinfra: refactor clib_rwlock_t to use single condition variable
Previous implementation of clib_rwlock_t used two spinlocks: one
writer lock, and one to guard the counter for the number of readers.
This implementation uses a single condition variable rw_cnt which
has the following properties:

if a writer has the rwlock, rw_cnt = -1
if the rwlock is free, rw_cnt = 0
otherwise, rw_cnt > 0 and rw_cnt = number of readers
rw_cnt will never be less than -1

Benchmarking:
The results below are the cycle counts from test_rwlock.c, configured so
that for 10000 iterations, 6 reader and 6 writer threads on separate cores
are spawned such that each writer thread increments a global counter
10000 times in each iteration. For Taishan, 4 reader and 4 writer
threads are spawned in each test.

x86 Xeon old rwlock: 12.473e8, 11.655e8, 13.201e8, 11.347e8, 13.182e8
x86 Xeon new rwlock: 5.881e8, 5.796e8, 6.536e8, 5.540e8, 5.890e8
Aarch64 ThX2* old rwlock: 9.263e7, 8.933e7, 9.074e7, 8.979e7, 9.378e7
Aarch64 ThX2* new rwlock: 7.221e7, 8.107e7, 7.515e7, 7.672e7, 7.386e7
A72 old rwlock: 3.268e6, 3.200e6, 3.086e6, 3.176e6, 3.170e6
A72 new rwlock: 1.261e6, 1.288e6, 1.251e6, 1.229e6, 1.234e6

*ThunderX2 used additional gcc options "-march=armv8.1-a+crc+crypto+lse"

Type: refactor

Change-Id: I7c347d3037b36205ab532cbcb52a374c846eb275
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
2019-08-01 18:01:33 +00:00
jaszha03
fb1ccc7c36 vppinfra: refactor clib_spinlock_t to use compare and swap
Tested performance of a CAS implementation (using __atomic_compare_exchange)
against a TAS implementation (using __atomic_exchange) using test_spinlock.c
and found some performance improvement.

Generated assembly for CAS and TAS implementations show that TAS always
executes with a load-store dependency, but CAS moves a branch condition
between the load and store so that only a load occurs when the lock is free.

Benchmarking:
The results below are the cycle counts from test_spinlock.c, configured so
that for 10000 iterations, 12 threads on separate cores are spawned, each of
which increments a global counter 10000 times in each iteration. For
A72, 8 threads are spawned in each test.

x86 Xeon TAS: 7.333e8, 7.605e8, 7.535e8, 7.485e8, 7.321e8
x86 Xeon CAS: 5.842e8, 5.433e8, 5.389e8, 5.983e8, 5.552e8
Aarch64 ThX2* TAS: 9.852e7, 10.209e7, 9.190e7, 9.600e7, 9.224e7
Aarch64 ThX2* CAS: 7.640e7, 7.486e7, 7.425e7, 7.269e7, 7.534e7
A72 TAS: 7.289e6, 6.963e6, 7.208e6, 6.976e6, 7.200e6
A72 CAS: 1.695e6, 1.608e6, 1.600e6, 1.634e6, 1.746e6

*ThunderX2 used additional gcc options "-march=armv8.1-a+crc+crypto+lse"

Type: refactor

Change-Id: Ic5cd97991804f6b012707fad1a5d1a6edb96cd3d
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
2019-08-01 11:54:17 +00:00
jaszha03
5cdde5c25a vppinfra: refactor test_and_set spinlocks to use clib_spinlock_t
Spinlock performance improved when implemented with compare_and_exchange
instead of test_and_set. All instances of test_and_set locks were refactored
to use clib_spinlock_t when possible. Some locks e.g. ssvm synchronize
between processes rather than threads, so they cannot directly use
clib_spinlock_t.

Type: refactor

Change-Id: Ia16b5d4cd49209b2b57b8df6c94615c28b11bb60
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
2019-07-31 13:53:55 +00:00
jaszha03
f25e7cfa5c vppinfra: refactor use of CLIB_MEMORY_BARRIER ()
All instances of test_and_set locks used the following sequence
to release the locks:

CLIB_MEMORY_BARRIER ();
p->lock = 0; // p is a generic struct with a TAS lock

Use clib_atomic_release to generate more efficient assembly code.

Type: refactor

Change-Id: Idca3a38b1cf43578108bdd1afe83b6ebc17a4c68
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
2019-07-30 16:22:03 +00:00
Dave Barach
b7b929931a c11 safe string handling support
Change-Id: Ied34720ca5a6e6e717eea4e86003e854031b6eab
Signed-off-by: Dave Barach <dave@barachs.net>
2018-10-23 13:06:46 +00:00
Sirshak Das
2f6d7bb93c vppinfra: add atomic macros for __sync builtins
This is first part of addition of atomic macros with only macros for
__sync builtins.

- Based on earlier patch by Damjan (https://gerrit.fd.io/r/#/c/10729/)
Additionally
- clib_atomic_release macro added and used in the absence
of any memory barrier.
- clib_atomic_bool_cmp_and_swap added

Change-Id: Ie4e48c1e184a652018d1d0d87c4be80ddd180a3b
Original-patch-by: Damjan Marion <damarion@cisco.com>
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
2018-10-19 07:10:47 +00:00
Florin Coras
a332c46a51 session: segment manager refactor
- use valloc as a 'central' segment baseva manager
- use per segment manager segment pools and use rwlocks to guard them
- add session test that exercises segment creation
- embed segment manager properties into application since they're shared
- fix rw locks

Change-Id: I761164c147275d9e8a926f1eda395e090d231f9a
Signed-off-by: Florin Coras <fcoras@cisco.com>
2018-02-05 21:45:28 +00:00
Florin Coras
1f9600e50f vppinfra: add readers-writer lock
Change-Id: I606fd89c410369cbd9ce9dcaaaa9dc58796e7c0e
Signed-off-by: Florin Coras <fcoras@cisco.com>
2018-01-25 23:54:22 +00:00
Dave Barach
fa77e8fb1c Fix minor issues in clib_spinlock_unlock()
Change-Id: I20ce799c9dd57332c06003b466ee7c36169bce98
Signed-off-by: Dave Barach <dave@barachs.net>
2017-10-15 22:14:15 +00:00
Damjan Marion
f55f9b851f completelly deprecate os_get_cpu_number, replace new occurences
Change-Id: I82c663bc0866c6c68ba354104b0bb059387f4b9d
Signed-off-by: Damjan Marion <damarion@cisco.com>
2017-05-10 22:01:15 +00:00
Damjan Marion
1927da29cc vppinfra: add spinlock inline functions
Change-Id: I86089e9bb604adfc260a111685001be1c897ce53
Signed-off-by: Damjan Marion <damarion@cisco.com>
2017-03-30 14:14:26 +00:00