* compiler -march= parameter is changed from native to corei7
so code is always genereted with instructions which are available
on the Nehalem microarchitecture (up to SSE4.2)
* compiler -mtune= parameter is added so code is optimized for
corei7-avx which equals to Sandy Bridge microarchitecture
* set of macros is added which allows run-time detection of available
cpu instructions (e.g. clib_cpu_supports_avx())
* set of macros is added which allows us to clone graph node funcitons
where cloned function is optmized for different microarchitecture
Those macros are using following attributes:
__attribute__((flatten))
__attribute__((target("arch=core-avx2)))
I.e. If applied to foo_node_fn() macro will generate cloned
functions foo_node_fn_avx2() and foo_node_fn_avx512() (future)
It will also generate function void * foo_node_fn_multiarch_select()
which detects available instruction set and returns pointer to the
best matching function clone.
Change-Id: I2dce0ac92a5ede95fcb56f47f3d1f3c4c040bac0
Signed-off-by: Damjan Marion <damarion@cisco.com>
A problem is easily reproducible by taking the test harness code from the commit,
and launching it in two terminals with some time overlap - the outputs will
be sent to the wrong session. This commit moves the output_function and argument
from a global structure into the process structure, thus the output_function
is not clobbered anymore and each session gets only its own output.
To ensure the callers can redirect the outputs to different destinations
(e.g. the API calls via shared memory, etc.) the existing logic
for vlib_cli_input() was retained.
To avoid the magic numbers usage in the logic that does the page-alignment
of the process stack, there are changes around the stack[] member
of vlib_process_t. Also added a compile-time assert to ensure that
the stack does indeed start on the page size multiple boundary.
Change-Id: I128680ac480735e5f214f81a884e414268e5d652
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>