Initial commit of Sphinx docs

Change-Id: I9fca8fb98502dffc2555f9de7f507b6f006e0e77
Signed-off-by: John DeNisco <jdenisco@cisco.com>
Committed by: Dave Barach

parent 1d65279ffe
commit 06dcd45ff8

273 docs/gettingstarted/developers/bihash.md Normal file
@@ -0,0 +1,273 @@
Bounded-index Extensible Hashing (bihash)
=========================================

Vpp uses bounded-index extensible hashing to solve a variety of
exact-match (key, value) lookup problems. Benefits of the current
implementation:

* Very high record count scaling, tested to 100,000,000 records.
* Lookup performance degrades gracefully as the number of records increases.
* No reader locking required.
* Template implementation: it's easy to support arbitrary (key, value) types.

Bounded-index extensible hashing has been widely used in databases for
decades.

Bihash uses a two-level data structure:

```
    +-----------------+
    | bucket-0        |
    |  log2_size      |
    |  backing store  |
    +-----------------+
    | bucket-1        |
    |  log2_size      |           +--------------------------------+
    |  backing store  | --------> | KVP_PER_PAGE * key-value-pairs |
    +-----------------+           |  page 0                        |
          ...                     +--------------------------------+
    +-----------------+           | KVP_PER_PAGE * key-value-pairs |
    | bucket-2**N-1   |           |  page 1                        |
    |  log2_size      |           +--------------------------------+
    |  backing store  | -------->              ...
    +-----------------+           +--------------------------------+
                                  | KVP_PER_PAGE * key-value-pairs |
                                  |  page 2**log2_size - 1         |
                                  +--------------------------------+
```

Discussion of the algorithm
---------------------------

This structure has a couple of major advantages. In practice, each
bucket entry fits into a 64-bit integer. Coincidentally, vpp's target
CPU architectures support 64-bit atomic operations. When modifying the
contents of a specific bucket, we do the following:

* Make a working copy of the bucket's backing storage
* Atomically swap a pointer to the working copy into the bucket array
* Change the original backing store data
* Atomically swap back to the original

So, no reader locking is required to search a bihash table.

At lookup time, the implementation computes a key hash code. We use
the least-significant N bits of the hash to select the bucket.

With the bucket in hand, we learn log2 (nBackingPages) for the
selected bucket. At this point, we use the next log2_size bits from
the hash code to select the specific backing page in which the
(key,value) pair will be found.

Net result: we search **one** backing page, not 2**log2_size
pages. This is a key property of the algorithm.

When sufficient collisions occur to fill the backing pages for a given
bucket, we double the bucket size, rehash, and deal the bucket
contents into a double-sized set of backing pages. In the future, we
may represent the size as a linear combination of two powers-of-two,
to increase space efficiency.

To solve the "jackpot case" where a set of records collide under
hashing in a bad way, the implementation will fall back to linear
search across 2**log2_size backing pages on a per-bucket basis.

To maintain *space* efficiency, we should configure the bucket array
so that backing pages are effectively utilized. Lookup performance
tends to change *very little* if the bucket array is too small or too
large.

Bihash depends on selecting an effective hash function. If one were to
use a truly broken hash function such as "return 1ULL;", bihash would
still work, but it would be equivalent to poorly-programmed linear
search.

We often use CPU intrinsic functions - think crc32 - to rapidly
compute a hash code which has decent statistics.

Bihash Cookbook
---------------

### Using current (key,value) template instance types

It's quite easy to use one of the template instance types. As of this
writing, .../src/vppinfra provides pre-built templates for 8, 16, 20,
24, 40, and 48 byte keys, u8 * vector keys, and 8 byte values.

See .../src/vppinfra/{bihash_<key-size>_8}.h

To define the data types, #include a specific template instance, most
often in a subsystem header file:

```c
#include <vppinfra/bihash_8_8.h>
```

If you're building a standalone application, you'll need to define the
various functions by #including the method implementation file in a C
source file:

```c
#include <vppinfra/bihash_template.c>
```

The core vpp engine currently uses most if not all of the known bihash
types, so you probably won't need to #include the method
implementation file.

Add an instance of the selected bihash data structure to e.g. a
"main_t" structure:

```c
typedef struct
{
  ...
  BVT (clib_bihash) hash_table;
  /* or */
  clib_bihash_8_8_t hash_table;
  ...
} my_main_t;
```

The BV macro concatenates its argument with the value of the
preprocessor symbol BIHASH_TYPE. The BVT macro concatenates its
argument with the value of BIHASH_TYPE and the fixed-string "_t". So
in the above example, BVT (clib_bihash) generates "clib_bihash_8_8_t".

If you're sure you won't decide to change the template / type name
later, it's perfectly OK to code "clib_bihash_8_8_t" and so forth.

In fact, if you #include multiple template instances in a single
source file, you **must** use fully-enumerated type names. The macros
stand no chance of working.

### Initializing a bihash table

Call the init function as shown. As a rough guide, pick a number of
buckets which is approximately
number_of_expected_records/BIHASH_KVP_PER_PAGE, using the
BIHASH_KVP_PER_PAGE value from the relevant template instance
header file. See the previous discussion.

The amount of memory selected should easily contain all of the
records, with a generous allowance for hash collisions. Bihash memory
is allocated separately from the main heap, and won't cost anything
except kernel PTEs until touched, so it's OK to be reasonably
generous.

For example:

```c
my_main_t *mm = &my_main;
clib_bihash_8_8_t *h;

h = &mm->hash_table;

clib_bihash_init_8_8 (h, "test", (u32) number_of_buckets,
                      (uword) memory_size);
```

### Add or delete a key/value pair

Use BV(clib_bihash_add_del), or the explicit type variant:

```c
clib_bihash_kv_8_8_t kv;
my_main_t *mm = &my_main;
clib_bihash_8_8_t *h;

h = &mm->hash_table;
kv.key = key_to_add_or_delete;
kv.value = value_to_add_or_delete;

clib_bihash_add_del_8_8 (h, &kv, is_add /* 1=add, 0=delete */);
```

In the delete case, kv.value is irrelevant. To change the value associated
with an existing (key,value) pair, simply re-add the [new] pair.

### Simple search

The simplest possible (key, value) search goes like so:

```c
clib_bihash_kv_8_8_t search_kv, return_kv;
my_main_t *mm = &my_main;
clib_bihash_8_8_t *h;

h = &mm->hash_table;
search_kv.key = key_to_search_for;

if (clib_bihash_search_8_8 (h, &search_kv, &return_kv) < 0)
  key_not_found ();
else
  key_found ();
```

Note that it's perfectly fine to collect the lookup result in the
search key-value pair itself:

```c
if (clib_bihash_search_8_8 (h, &search_kv, &search_kv) < 0)
  key_not_found ();
etc.
```

### Bihash vector processing

When processing a vector of packets which need a certain lookup
performed, it's worth the trouble to compute the key hash, and
prefetch the correct bucket ahead of time.

Here's a sketch of one way to write the required code:

Dual-loop:
* 6 packets ahead, prefetch 2x vlib_buffer_t's and 2x packet data
  required to form the record keys
* 4 packets ahead, form 2x record keys and call BV(clib_bihash_hash)
  or the explicit hash function to calculate the record hashes.
  Call 2x BV(clib_bihash_prefetch_bucket) to prefetch the buckets
* 2 packets ahead, call 2x BV(clib_bihash_prefetch_data) to prefetch
  2x (key,value) data pages.
* In the processing section, call 2x BV(clib_bihash_search_inline_with_hash)
  to perform the search

It's the programmer's choice whether to stash the hash code somewhere in
vnet_buffer(b) metadata, or to use local variables.

Single-loop:
* Use simple search as shown above.

### Walking a bihash table

A fairly common scenario when building "show" commands involves walking a
bihash table. It's simple enough:

```c
my_main_t *mm = &my_main;
clib_bihash_8_8_t *h;
void callback_fn (clib_bihash_kv_8_8_t *, void *);

h = &mm->hash_table;

BV (clib_bihash_foreach_key_value_pair) (h, callback_fn, (void *) arg);
```

To nobody's great surprise: clib_bihash_foreach_key_value_pair
iterates across the entire table, calling callback_fn with active
entries.

### Creating a new template instance

Creating a new template is easy. Use one of the existing templates as
a model, and make the obvious changes. The hash and key_compare
methods are performance-critical in multiple senses.

If the key compare method is slow, every lookup will be slow. If the
hash function is slow, same story. If the hash function has poor
statistical properties, space efficiency will suffer. In the limit, a
bad enough hash function will cause large portions of the table to
revert to linear search.

Use of the best available vector unit is well worth the trouble in the
hash and key_compare functions.
151 docs/gettingstarted/developers/building.rst Normal file
@@ -0,0 +1,151 @@
.. _building:

.. toctree::

Building VPP
============

To get started developing with VPP you need to get the sources and build the packages.

.. _setupproxies:

Set up Proxies
--------------

Depending on the environment, proxies may need to be set.
You may run these commands:

.. code-block:: console

    $ export http_proxy=http://<proxy-server-name>.com:<port-number>
    $ export https_proxy=https://<proxy-server-name>.com:<port-number>


Get the VPP Sources
-------------------

To get the VPP sources and get ready to build, execute the following:

.. code-block:: console

    $ git clone https://gerrit.fd.io/r/vpp
    $ cd vpp

Build VPP Dependencies
----------------------

Before building, make sure there are no FD.io VPP or DPDK packages installed, by entering the following
commands:

.. code-block:: console

    $ dpkg -l | grep vpp
    $ dpkg -l | grep dpdk

There should be no output, and no packages showing, after each of the above commands.

Run this to install the dependencies for FD.io VPP.
If it hangs during downloading at any point, you may need to set up :ref:`proxies for this to work <setupproxies>`.

.. code-block:: console

    $ make install-dep
    Hit:1 http://us.archive.ubuntu.com/ubuntu xenial InRelease
    Get:2 http://us.archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]
    Get:3 http://security.ubuntu.com/ubuntu xenial-security InRelease [107 kB]
    Get:4 http://us.archive.ubuntu.com/ubuntu xenial-backports InRelease [107 kB]
    Get:5 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [803 kB]
    Get:6 http://us.archive.ubuntu.com/ubuntu xenial-updates/main i386 Packages [732 kB]
    ...
    ...
    Update-alternatives: using /usr/lib/jvm/java-8-openjdk-amd64/bin/jmap to provide /usr/bin/jmap (jmap) in auto mode
    Setting up default-jdk-headless (2:1.8-56ubuntu2) ...
    Processing triggers for libc-bin (2.23-0ubuntu3) ...
    Processing triggers for systemd (229-4ubuntu6) ...
    Processing triggers for ureadahead (0.100.0-19) ...
    Processing triggers for ca-certificates (20160104ubuntu1) ...
    Updating certificates in /etc/ssl/certs...
    0 added, 0 removed; done.
    Running hooks in /etc/ca-certificates/update.d...

    done.
    done.

Build VPP (Debug Mode)
----------------------

This build version contains debug symbols, which are useful when modifying VPP. The command below
builds a debug version of VPP; the build artifacts land in build-root/build-vpp_debug-native.

.. code-block:: console

    $ make build
    make[1]: Entering directory '/home/vagrant/vpp-master/build-root'
    @@@@ Arch for platform 'vpp' is native @@@@
    @@@@ Finding source for dpdk @@@@
    @@@@ Makefile fragment found in /home/vagrant/vpp-master/build-data/packages/dpdk.mk @@@@
    @@@@ Source found in /home/vagrant/vpp-master/dpdk @@@@
    @@@@ Arch for platform 'vpp' is native @@@@
    @@@@ Finding source for vpp @@@@
    @@@@ Makefile fragment found in /home/vagrant/vpp-master/build-data/packages/vpp.mk @@@@
    @@@@ Source found in /home/vagrant/vpp-master/src @@@@
    ...
    ...
    make[5]: Leaving directory '/home/vagrant/vpp-master/build-root/build-vpp_debug-native/vpp/vpp-api/java'
    make[4]: Leaving directory '/home/vagrant/vpp-master/build-root/build-vpp_debug-native/vpp/vpp-api/java'
    make[3]: Leaving directory '/home/vagrant/vpp-master/build-root/build-vpp_debug-native/vpp'
    make[2]: Leaving directory '/home/vagrant/vpp-master/build-root/build-vpp_debug-native/vpp'
    @@@@ Installing vpp: nothing to do @@@@
    make[1]: Leaving directory '/home/vagrant/vpp-master/build-root'

Build VPP (Release Version)
---------------------------

To build the release version of FD.io VPP, use the command below.
This build is optimized and will not create debug symbols; the build
artifacts land in build-root/build-vpp-native.

.. code-block:: console

    $ make release


Building Necessary Packages
---------------------------

To build the Debian or RPM packages, run one of the following commands, depending on the system:

Building Debian Packages
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    $ make pkg-deb


Building RPM Packages
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    $ make pkg-rpm


The packages will be found in the build-root directory.

.. code-block:: console

    $ ls *.deb

If the packages built correctly, output similar to the following should appear:

.. code-block:: console

    vpp_18.07-rc0~456-gb361076_amd64.deb             vpp-dbg_18.07-rc0~456-gb361076_amd64.deb
    vpp-api-java_18.07-rc0~456-gb361076_amd64.deb    vpp-dev_18.07-rc0~456-gb361076_amd64.deb
    vpp-api-lua_18.07-rc0~456-gb361076_amd64.deb     vpp-lib_18.07-rc0~456-gb361076_amd64.deb
    vpp-api-python_18.07-rc0~456-gb361076_amd64.deb  vpp-plugins_18.07-rc0~456-gb361076_amd64.deb

Finally, the command below installs all of the built packages.

.. code-block:: console

    $ sudo bash
    # dpkg -i *.deb
224 docs/gettingstarted/developers/featurearcs.md Normal file
@@ -0,0 +1,224 @@
Feature Arcs
============

A significant number of vpp features are configurable on a per-interface
or per-system basis. Rather than ask feature coders to manually
construct the required graph arcs, we built a general mechanism to
manage these mechanics.

Specifically, feature arcs comprise ordered sets of graph nodes. Each
feature node in an arc is independently controlled. Feature arc nodes
are generally unaware of each other. Handing a packet to "the next
feature node" is quite inexpensive.

The feature arc implementation solves the problem of creating graph arcs
used for steering.

At the beginning of a feature arc, a bit of setup work is needed, but
only if at least one feature is enabled on the arc.

On a per-arc basis, individual feature definitions create a set of
ordering dependencies. Feature infrastructure performs a topological
sort of the ordering dependencies, to determine the actual feature
order. Missing dependencies **will** lead to runtime disorder. See
<https://gerrit.fd.io/r/#/c/12753> for an example.

If no partial order exists, vpp will refuse to run. Circular dependency
loops of the form "a then b, b then c, c then a" are impossible to
satisfy.

Adding a feature to an existing feature arc
-------------------------------------------

To nobody's great surprise, we set up feature arcs using the typical
"macro -> constructor function -> list of declarations" pattern:

```c
VNET_FEATURE_INIT (mactime, static) =
{
  .arc_name = "device-input",
  .node_name = "mactime",
  .runs_before = VNET_FEATURES ("ethernet-input"),
};
```

This creates a "mactime" feature on the "device-input" arc.

Once per frame, dig up the vnet\_feature\_config\_main\_t corresponding
to the "device-input" feature arc:

```c
vnet_main_t *vnm = vnet_get_main ();
vnet_interface_main_t *im = &vnm->interface_main;
u8 arc = im->output_feature_arc_index;
vnet_feature_config_main_t *fcm;

fcm = vnet_feature_get_config_main (arc);
```

Note that in this case, we've stored the required arc index - assigned
by the feature infrastructure - in the vnet\_interface\_main\_t. Where
to put the arc index is a programmer's decision when creating a feature
arc.

Per packet, set next0 to steer packets to the next node they should
visit:

```c
vnet_get_config_data (&fcm->config_main,
                      &b0->current_config_index /* value-result */,
                      &next0, 0 /* # bytes of config data */);
```

Configuration data is per-feature arc, and is often unused. Note that
it's normal to reset next0 to divert packets elsewhere; often, to drop
them for cause:

```c
next0 = MACTIME_NEXT_DROP;
b0->error = node->errors[DROP_CAUSE];
```

Creating a feature arc
----------------------

Once again, we create feature arcs using constructor macros:

```c
VNET_FEATURE_ARC_INIT (ip4_unicast, static) =
{
  .arc_name = "ip4-unicast",
  .start_nodes = VNET_FEATURES ("ip4-input", "ip4-input-no-checksum"),
  .arc_index_ptr = &ip4_main.lookup_main.ucast_feature_arc_index,
};
```

In this case, we configure two arc start nodes to handle the
"hardware-verified ip checksum or not" cases. During initialization,
the feature infrastructure stores the arc index as shown.

In the head-of-arc node, do the following to send packets along the
feature arc:

```c
ip_lookup_main_t *lm = &im->lookup_main;
arc = lm->ucast_feature_arc_index;
```

Once per packet, initialize packet metadata to walk the feature arc:

```c
vnet_feature_arc_start (arc, sw_if_index0, &next, b0);
```

Enabling / Disabling features
-----------------------------

Simply call vnet_feature_enable_disable to enable or disable a specific
feature:

```c
vnet_feature_enable_disable ("device-input",   /* arc name */
                             "mactime",        /* feature name */
                             sw_if_index,      /* Interface sw_if_index */
                             enable_disable,   /* 1 => enable */
                             0 /* (void *) feature_configuration */,
                             0 /* feature_configuration_nbytes */);
```

The feature_configuration opaque is seldom used.

If you wish to make a feature a _de facto_ system-level concept, pass
sw_if_index=0 at all times. Sw_if_index 0 is always valid, and
corresponds to the "local" interface.

Related "show" commands
-----------------------

To display the entire set of features, use "show features [verbose]". The
verbose form displays arc indices, and feature indices within the arcs:

```
$ vppctl show features verbose
Available feature paths
<snip>
[14] ip4-unicast:
  [ 0]: nat64-out2in-handoff
  [ 1]: nat64-out2in
  [ 2]: nat44-ed-hairpin-dst
  [ 3]: nat44-hairpin-dst
  [ 4]: ip4-dhcp-client-detect
  [ 5]: nat44-out2in-fast
  [ 6]: nat44-in2out-fast
  [ 7]: nat44-handoff-classify
  [ 8]: nat44-out2in-worker-handoff
  [ 9]: nat44-in2out-worker-handoff
  [10]: nat44-ed-classify
  [11]: nat44-ed-out2in
  [12]: nat44-ed-in2out
  [13]: nat44-det-classify
  [14]: nat44-det-out2in
  [15]: nat44-det-in2out
  [16]: nat44-classify
  [17]: nat44-out2in
  [18]: nat44-in2out
  [19]: ip4-qos-record
  [20]: ip4-vxlan-gpe-bypass
  [21]: ip4-reassembly-feature
  [22]: ip4-not-enabled
  [23]: ip4-source-and-port-range-check-rx
  [24]: ip4-flow-classify
  [25]: ip4-inacl
  [26]: ip4-source-check-via-rx
  [27]: ip4-source-check-via-any
  [28]: ip4-policer-classify
  [29]: ipsec-input-ip4
  [30]: vpath-input-ip4
  [31]: ip4-vxlan-bypass
  [32]: ip4-lookup
<snip>
```

Here, we learn that the ip4-unicast feature arc has index 14, and that
e.g. ip4-inacl has feature index 25 in the generated partial order.

To display the features currently active on a specific interface,
use "show interface <name> features":

```
$ vppctl show interface GigabitEthernet3/0/0 features
Feature paths configured on GigabitEthernet3/0/0...
<snip>
ip4-unicast:
  nat44-out2in
<snip>
```

Table of Feature Arcs
---------------------

Simply search for name-strings to track down the arc definition, location of
the arc index, etc.

```
| Arc Name         |
|------------------|
| device-input     |
| ethernet-output  |
| interface-output |
| ip4-drop         |
| ip4-local        |
| ip4-multicast    |
| ip4-output       |
| ip4-punt         |
| ip4-unicast      |
| ip6-drop         |
| ip6-local        |
| ip6-multicast    |
| ip6-output       |
| ip6-punt         |
| ip6-unicast      |
| mpls-input       |
| mpls-output      |
| nsh-output       |
```
18 docs/gettingstarted/developers/index.rst Normal file
@@ -0,0 +1,18 @@
.. _gstarteddevel:

##########
Developers
##########

.. toctree::
   :maxdepth: 2

   building
   softwarearchitecture
   infrastructure
   vlib
   plugins
   vnet
   featurearcs
   bihash
330 docs/gettingstarted/developers/infrastructure.md Normal file
File diff suppressed because it is too large
11 docs/gettingstarted/developers/plugins.md Normal file
@@ -0,0 +1,11 @@
Plugins
=======

vlib implements a straightforward plug-in DLL mechanism. VLIB client
applications specify a directory to search for plug-in .DLLs, and a name
filter to apply (if desired). VLIB needs to load plug-ins very early.

Once loaded, the plug-in DLL mechanism uses dlsym to find and verify a
vlib\_plugin\_registration data structure in the newly-loaded plug-in.
44 docs/gettingstarted/developers/softwarearchitecture.md Normal file
@@ -0,0 +1,44 @@
Software Architecture
=====================

The fd.io vpp implementation is a third-generation vector packet
processing implementation specifically related to US Patent 7,961,636,
as well as earlier work. Note that the Apache-2 license specifically
grants non-exclusive patent licenses; we mention this patent as a point
of historical interest.

For performance, the vpp dataplane consists of a directed graph of
forwarding nodes which process multiple packets per invocation. This
schema enables a variety of micro-processor optimizations: pipelining
and prefetching to cover dependent read latency, inherent I-cache phase
behavior, vector instructions. Aside from hardware input and hardware
output nodes, the entire forwarding graph is portable code.

Depending on the scenario at hand, we often spin up multiple worker
threads which process ingress-hashed packets from multiple queues using
identical forwarding graph replicas.

VPP Layers - Implementation Taxonomy
------------------------------------

![image]()

- VPP Infra - the VPP infrastructure layer, which contains the core
  library source code. This layer performs memory functions, works
  with vectors and rings, performs key lookups in hash tables, and
  works with timers for dispatching graph nodes.
- VLIB - the vector processing library. The vlib layer also handles
  various application management functions: buffer, memory and graph
  node management, maintaining and exporting counters, thread
  management, packet tracing. Vlib implements the debug CLI (command
  line interface).
- VNET - works with VPP\'s networking interface (layers 2, 3, and 4),
  performs session and traffic management, and works with devices and
  the data control plane.
- Plugins - contains an increasingly rich set of data-plane plugins,
  as noted in the above diagram.
- VPP - the container application linked against all of the above.

It's important to understand each of these layers in a certain amount of
detail. Much of the implementation is best dealt with at the API level
and otherwise left alone.
496 docs/gettingstarted/developers/vlib.md Normal file
File diff suppressed because it is too large
171 docs/gettingstarted/developers/vnet.md Normal file
@@ -0,0 +1,171 @@
VNET (VPP Network Stack)
========================

The files associated with the VPP network stack layer are located in the
./src/vnet folder. The Network Stack Layer is basically an
instantiation of the code in the other layers. This layer has a vnet
library that provides vectorized layer-2 and 3 networking graph nodes, a
packet generator, and a packet tracer.

In terms of building a packet processing application, vnet provides a
platform-independent subgraph to which one connects a couple of
device-driver nodes.

Typical RX connections include "ethernet-input" \[full software
classification, feeds ipv4-input, ipv6-input, arp-input etc.\] and
"ipv4-input-no-checksum" \[if hardware can classify, perform ipv4 header
checksum\].

The image below shows a list of features and layer areas that VNET works
with:

![image]()

Effective graph dispatch function coding
----------------------------------------

Over the years, multiple coding styles have emerged: a
single/dual/quad loop coding model (with variations) and a
fully-pipelined coding model.

Single/dual loops
-----------------

The single/dual/quad loop model variations conveniently solve problems
where the number of items to process is not known in advance: typical
hardware RX-ring processing. This coding style is also very effective
when a given node will not need to cover a complex set of dependent
reads.

Here is a quad/single loop which can leverage up-to-avx512 SIMD vector
units to convert buffer indices to buffer pointers:
```c
static uword
simulated_ethernet_interface_tx (vlib_main_t * vm,
                                 vlib_node_runtime_t *
                                 node, vlib_frame_t * frame)
{
  u32 n_left_from, *from;
  u32 next_index = 0;
  u32 n_bytes;
  u32 thread_index = vm->thread_index;
  vnet_main_t *vnm = vnet_get_main ();
  vnet_interface_main_t *im = &vnm->interface_main;
  vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
  u16 nexts[VLIB_FRAME_SIZE], *next;

  n_left_from = frame->n_vectors;
  from = vlib_frame_args (frame);

  /*
   * Convert up to VLIB_FRAME_SIZE indices in "from" to
   * buffer pointers in bufs[]
   */
  vlib_get_buffers (vm, from, bufs, n_left_from);
  b = bufs;
  next = nexts;

  /*
   * While we have at least 4 vector elements (pkts) to process..
   */
  while (n_left_from >= 4)
    {
      /* Prefetch next quad-loop iteration. */
      if (PREDICT_TRUE (n_left_from >= 8))
        {
          vlib_prefetch_buffer_header (b[4], STORE);
          vlib_prefetch_buffer_header (b[5], STORE);
          vlib_prefetch_buffer_header (b[6], STORE);
          vlib_prefetch_buffer_header (b[7], STORE);
        }

      /*
       * $$$ Process 4x packets right here...
       * set next[0..3] to send the packets where they need to go
       */

      do_something_to (b[0]);
      do_something_to (b[1]);
      do_something_to (b[2]);
      do_something_to (b[3]);

      /* Process the next 0..4 packets */
      b += 4;
      next += 4;
      n_left_from -= 4;
    }
  /*
   * Clean up 0...3 remaining packets at the end of the incoming frame
   */
  while (n_left_from > 0)
    {
      /*
       * $$$ Process one packet right here...
       * set next[0] to send the packet where it needs to go
       */
      do_something_to (b[0]);

      /* Process the next packet */
      b += 1;
      next += 1;
      n_left_from -= 1;
    }

  /*
   * Send the packets along their respective next-node graph arcs
   * Considerable locality of reference is expected, most if not all
   * packets in the inbound vector will traverse the same next-node
   * arc
   */
  vlib_buffer_enqueue_to_next (vm, node, from, nexts, frame->n_vectors);

  return frame->n_vectors;
}
```

Given a packet processing task to implement, it pays to scout around
looking for similar tasks, and think about using the same coding
pattern. It is not uncommon to recode a given graph node dispatch function
several times during performance optimization.

Packet tracer
-------------

Vlib includes a frame element \[packet\] trace facility, with a simple
vlib cli interface. The cli is straightforward: "trace add
input-node-name count".

To trace 100 packets on a typical x86\_64 system running the dpdk
plugin: "trace add dpdk-input 100". When using the packet generator:
"trace add pg-input 100"

Each graph node has the opportunity to capture its own trace data. It is
almost always a good idea to do so. The trace capture APIs are simple.

The packet capture APIs snapshot binary data, to minimize processing at
capture time. Each participating graph node initialization provides a
vppinfra format-style user function to pretty-print data when required
by the VLIB "show trace" command.

Set the VLIB node registration ".format\_trace" member to the name of
the per-graph node format function.

Here's a simple example:

```c
u8 * my_node_format_trace (u8 * s, va_list * args)
{
  vlib_main_t * vm = va_arg (*args, vlib_main_t *);
  vlib_node_t * node = va_arg (*args, vlib_node_t *);
  my_node_trace_t * t = va_arg (*args, my_node_trace_t *);

  s = format (s, "My trace data was: %d", t-><whatever>);

  return s;
}
```

The trace framework hands the per-node format function the data it
captured as the packet whizzed by. The format function pretty-prints the
data as desired.