Initial commit of Sphinx docs

Change-Id: I9fca8fb98502dffc2555f9de7f507b6f006e0e77
Signed-off-by: John DeNisco <jdenisco@cisco.com>
Committed by: Dave Barach

parent 1d65279ffe
commit 06dcd45ff8

273 docs/gettingstarted/developers/bihash.md Normal file
@@ -0,0 +1,273 @@
Bounded-index Extensible Hashing (bihash)
=========================================

Vpp uses bounded-index extensible hashing to solve a variety of
exact-match (key, value) lookup problems. Benefits of the current
implementation:

* Very high record count scaling, tested to 100,000,000 records.
* Lookup performance degrades gracefully as the number of records increases.
* No reader locking required.
* Template implementation: it's easy to support arbitrary (key, value) types.

Bounded-index extensible hashing has been widely used in databases for
decades.

Bihash uses a two-level data structure:

```
    +-----------------+
    | bucket-0        |
    |  log2_size      |
    |  backing store  |
    +-----------------+
    | bucket-1        |
    |  log2_size      |           +--------------------------------+
    |  backing store  | --------> | KVP_PER_PAGE * key-value-pairs |
    +-----------------+           |  page 0                        |
          ...                     +--------------------------------+
    +-----------------+           | KVP_PER_PAGE * key-value-pairs |
    | bucket-2**N-1   |           |  page 1                        |
    |  log2_size      |           +--------------------------------+
    |  backing store  | -------->              ...
    +-----------------+           +--------------------------------+
                                  | KVP_PER_PAGE * key-value-pairs |
                                  |  page 2**log2_size - 1         |
                                  +--------------------------------+
```

Discussion of the algorithm
---------------------------

This structure has a couple of major advantages. In practice, each
bucket entry fits into a 64-bit integer. Coincidentally, vpp's target
CPU architectures support 64-bit atomic operations. When modifying the
contents of a specific bucket, we do the following:

* Make a working copy of the bucket's backing storage
* Atomically swap a pointer to the working copy into the bucket array
* Change the original backing store data
* Atomically swap back to the original

So, no reader locking is required to search a bihash table.

At lookup time, the implementation computes a key hash code. We use
the least-significant N bits of the hash to select the bucket.

With the bucket in hand, we learn log2 (nBackingPages) for the
selected bucket. At this point, we use the next log2_size bits from
the hash code to select the specific backing page in which the
(key,value) pair will be found.

Net result: we search **one** backing page, not 2**log2_size
pages. This is a key property of the algorithm.

When sufficient collisions occur to fill the backing pages for a given
bucket, we double the bucket size, rehash, and deal the bucket
contents into a double-sized set of backing pages. In the future, we
may represent the size as a linear combination of two powers-of-two,
to increase space efficiency.

To solve the "jackpot case" where a set of records collide under
hashing in a bad way, the implementation will fall back to linear
search across 2**log2_size backing pages on a per-bucket basis.

To maintain *space* efficiency, we should configure the bucket array
so that backing pages are effectively utilized. Lookup performance
tends to change *very little* if the bucket array is too small or too
large.

Bihash depends on selecting an effective hash function. If one were to
use a truly broken hash function such as "return 1ULL;", bihash would
still work, but it would be equivalent to poorly-programmed linear
search.

We often use CPU intrinsic functions - think crc32 - to rapidly
compute a hash code which has decent statistics.

Bihash Cookbook
---------------

### Using current (key,value) template instance types

It's quite easy to use one of the template instance types. As of this
writing, .../src/vppinfra provides pre-built templates for 8, 16, 20,
24, 40, and 48 byte keys, u8 * vector keys, and 8 byte values.

See .../src/vppinfra/{bihash_<key-size>_8}.h

To define the data types, #include a specific template instance, most
often in a subsystem header file:

```c
#include <vppinfra/bihash_8_8.h>
```

If you're building a standalone application, you'll need to define the
various functions by #including the method implementation file in a C
source file:

```c
#include <vppinfra/bihash_template.c>
```

The core vpp engine currently uses most if not all of the known bihash
types, so you probably won't need to #include the method
implementation file.

Add an instance of the selected bihash data structure to e.g. a
"main_t" structure:

```c
typedef struct
{
  ...
  BVT (clib_bihash) hash_table;
  /* or */
  clib_bihash_8_8_t hash_table;
  ...
} my_main_t;
```

The BV macro concatenates its argument with the value of the
preprocessor symbol BIHASH_TYPE. The BVT macro concatenates its
argument with the value of BIHASH_TYPE and the fixed-string "_t". So
in the above example, BVT (clib_bihash) generates "clib_bihash_8_8_t".

If you're sure you won't decide to change the template / type name
later, it's perfectly OK to code "clib_bihash_8_8_t" and so forth.

In fact, if you #include multiple template instances in a single
source file, you **must** use fully-enumerated type names. The macros
stand no chance of working.

### Initializing a bihash table

Call the init function as shown. As a rough guide, pick a number of
buckets which is approximately
number_of_expected_records/BIHASH_KVP_PER_PAGE, using the
BIHASH_KVP_PER_PAGE value from the relevant template instance
header file. See the previous discussion.

The amount of memory selected should easily contain all of the
records, with a generous allowance for hash collisions. Bihash memory
is allocated separately from the main heap, and won't cost anything
except kernel PTEs until touched, so it's OK to be reasonably
generous.

For example:

```c
my_main_t *mm = &my_main;
clib_bihash_8_8_t *h;

h = &mm->hash_table;

clib_bihash_init_8_8 (h, "test", (u32) number_of_buckets,
                      (uword) memory_size);
```

### Add or delete a key/value pair

Use BV(clib_bihash_add_del), or the explicit type variant:

```c
clib_bihash_kv_8_8_t kv;
my_main_t *mm = &my_main;
clib_bihash_8_8_t *h;

h = &mm->hash_table;
kv.key = key_to_add_or_delete;
kv.value = value_to_add_or_delete;

clib_bihash_add_del_8_8 (h, &kv, is_add /* 1=add, 0=delete */);
```

In the delete case, kv.value is irrelevant. To change the value associated
with an existing (key,value) pair, simply re-add the [new] pair.

### Simple search

The simplest possible (key, value) search goes like so:

```c
clib_bihash_kv_8_8_t search_kv, return_kv;
my_main_t *mm = &my_main;
clib_bihash_8_8_t *h;

h = &mm->hash_table;
search_kv.key = key_to_search_for;

if (clib_bihash_search_8_8 (h, &search_kv, &return_kv) < 0)
  key_not_found ();
else
  key_found ();
```

Note that it's perfectly fine to collect the lookup result in the
search key-value pair itself:

```c
if (clib_bihash_search_8_8 (h, &search_kv, &search_kv) < 0)
  key_not_found ();
etc.
```

### Bihash vector processing

When processing a vector of packets which need a certain lookup
performed, it's worth the trouble to compute the key hash, and
prefetch the correct bucket ahead of time.

Here's a sketch of one way to write the required code:

Dual-loop:
* 6 packets ahead, prefetch 2x vlib_buffer_t's and 2x packet data
  required to form the record keys
* 4 packets ahead, form 2x record keys and call BV(clib_bihash_hash)
  or the explicit hash function to calculate the record hashes.
  Call 2x BV(clib_bihash_prefetch_bucket) to prefetch the buckets
* 2 packets ahead, call 2x BV(clib_bihash_prefetch_data) to prefetch
  2x (key,value) data pages.
* In the processing section, call 2x BV(clib_bihash_search_inline_with_hash)
  to perform the search

It's the programmer's choice whether to stash the hash code somewhere in
vnet_buffer(b) metadata, or to use local variables.

Single-loop:
* Use simple search as shown above.

### Walking a bihash table

A fairly common scenario when building "show" commands involves walking a
bihash table. It's simple enough:

```c
my_main_t *mm = &my_main;
clib_bihash_8_8_t *h;
void callback_fn (clib_bihash_kv_8_8_t *, void *);

h = &mm->hash_table;

BV (clib_bihash_foreach_key_value_pair) (h, callback_fn, (void *) arg);
```

To nobody's great surprise: clib_bihash_foreach_key_value_pair
iterates across the entire table, calling callback_fn with active
entries.

### Creating a new template instance

Creating a new template is easy. Use one of the existing templates as
a model, and make the obvious changes. The hash and key_compare
methods are performance-critical in multiple senses.

If the key compare method is slow, every lookup will be slow. If the
hash function is slow, same story. If the hash function has poor
statistical properties, space efficiency will suffer. In the limit, a
bad enough hash function will cause large portions of the table to
revert to linear search.

Use of the best available vector unit is well worth the trouble in the
hash and key_compare functions.
151 docs/gettingstarted/developers/building.rst Normal file
@@ -0,0 +1,151 @@
.. _building:

.. toctree::

Building VPP
============

To get started developing with VPP you need to get the sources and build the packages.

.. _setupproxies:

Set up Proxies
--------------

Depending on the environment, proxies may need to be set.
You may run these commands:

.. code-block:: console

    $ export http_proxy=http://<proxy-server-name>.com:<port-number>
    $ export https_proxy=https://<proxy-server-name>.com:<port-number>


Get the VPP Sources
-------------------

To get the VPP sources and get ready to build, execute the following:

.. code-block:: console

    $ git clone https://gerrit.fd.io/r/vpp
    $ cd vpp

Build VPP Dependencies
----------------------

Before building, make sure there are no FD.io VPP or DPDK packages installed, by entering the following
commands:

.. code-block:: console

    $ dpkg -l | grep vpp
    $ dpkg -l | grep dpdk

There should be no output, and no packages showing, after each of the above commands.

Run this to install the dependencies for FD.io VPP.
If it hangs during downloading at any point, you may need to set up :ref:`proxies for this to work <setupproxies>`.

.. code-block:: console

    $ make install-dep
    Hit:1 http://us.archive.ubuntu.com/ubuntu xenial InRelease
    Get:2 http://us.archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]
    Get:3 http://security.ubuntu.com/ubuntu xenial-security InRelease [107 kB]
    Get:4 http://us.archive.ubuntu.com/ubuntu xenial-backports InRelease [107 kB]
    Get:5 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [803 kB]
    Get:6 http://us.archive.ubuntu.com/ubuntu xenial-updates/main i386 Packages [732 kB]
    ...
    ...
    Update-alternatives: using /usr/lib/jvm/java-8-openjdk-amd64/bin/jmap to provide /usr/bin/jmap (jmap) in auto mode
    Setting up default-jdk-headless (2:1.8-56ubuntu2) ...
    Processing triggers for libc-bin (2.23-0ubuntu3) ...
    Processing triggers for systemd (229-4ubuntu6) ...
    Processing triggers for ureadahead (0.100.0-19) ...
    Processing triggers for ca-certificates (20160104ubuntu1) ...
    Updating certificates in /etc/ssl/certs...
    0 added, 0 removed; done.
    Running hooks in /etc/ca-certificates/update.d...

    done.
    done.

Build VPP (Debug Mode)
----------------------

This build version contains debug symbols, which are useful when modifying VPP. The command below
builds a debug version of VPP; the build artifacts land in build-root/build-vpp_debug-native.

.. code-block:: console

    $ make build
    make[1]: Entering directory '/home/vagrant/vpp-master/build-root'
    @@@@ Arch for platform 'vpp' is native @@@@
    @@@@ Finding source for dpdk @@@@
    @@@@ Makefile fragment found in /home/vagrant/vpp-master/build-data/packages/dpdk.mk @@@@
    @@@@ Source found in /home/vagrant/vpp-master/dpdk @@@@
    @@@@ Arch for platform 'vpp' is native @@@@
    @@@@ Finding source for vpp @@@@
    @@@@ Makefile fragment found in /home/vagrant/vpp-master/build-data/packages/vpp.mk @@@@
    @@@@ Source found in /home/vagrant/vpp-master/src @@@@
    ...
    ...
    make[5]: Leaving directory '/home/vagrant/vpp-master/build-root/build-vpp_debug-native/vpp/vpp-api/java'
    make[4]: Leaving directory '/home/vagrant/vpp-master/build-root/build-vpp_debug-native/vpp/vpp-api/java'
    make[3]: Leaving directory '/home/vagrant/vpp-master/build-root/build-vpp_debug-native/vpp'
    make[2]: Leaving directory '/home/vagrant/vpp-master/build-root/build-vpp_debug-native/vpp'
    @@@@ Installing vpp: nothing to do @@@@
    make[1]: Leaving directory '/home/vagrant/vpp-master/build-root'

Build VPP (Release Version)
---------------------------

To build the release version of FD.io VPP, use the command below.
This build is optimized and will not create debug symbols; the build
artifacts land in build-root/build-vpp-native.

.. code-block:: console

    $ make release


Building Necessary Packages
---------------------------

To build the Debian or RPM packages, run one of the following commands, depending on the system:

Building Debian Packages
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    $ make pkg-deb


Building RPM Packages
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    $ make pkg-rpm


The packages will be found in the build-root directory.

.. code-block:: console

    $ ls *.deb

If the packages built correctly, output similar to the following should appear:

.. code-block:: console

    vpp_18.07-rc0~456-gb361076_amd64.deb             vpp-dbg_18.07-rc0~456-gb361076_amd64.deb
    vpp-api-java_18.07-rc0~456-gb361076_amd64.deb    vpp-dev_18.07-rc0~456-gb361076_amd64.deb
    vpp-api-lua_18.07-rc0~456-gb361076_amd64.deb     vpp-lib_18.07-rc0~456-gb361076_amd64.deb
    vpp-api-python_18.07-rc0~456-gb361076_amd64.deb  vpp-plugins_18.07-rc0~456-gb361076_amd64.deb

Finally, the command below installs all of the built packages.

.. code-block:: console

    $ sudo bash
    # dpkg -i *.deb
224 docs/gettingstarted/developers/featurearcs.md Normal file
@@ -0,0 +1,224 @@
Feature Arcs
============

A significant number of vpp features are configurable on a per-interface
or per-system basis. Rather than ask feature coders to manually
construct the required graph arcs, we built a general mechanism to
manage these mechanics.

Specifically, feature arcs comprise ordered sets of graph nodes. Each
feature node in an arc is independently controlled. Feature arc nodes
are generally unaware of each other. Handing a packet to "the next
feature node" is quite inexpensive.

The feature arc implementation solves the problem of creating graph arcs
used for steering.

At the beginning of a feature arc, a bit of setup work is needed, but
only if at least one feature is enabled on the arc.

On a per-arc basis, individual feature definitions create a set of
ordering dependencies. Feature infrastructure performs a topological
sort of the ordering dependencies, to determine the actual feature
order. Missing dependencies **will** lead to runtime disorder. See
<https://gerrit.fd.io/r/#/c/12753> for an example.

If no partial order exists, vpp will refuse to run. Circular dependency
loops of the form "a then b, b then c, c then a" are impossible to
satisfy.

Adding a feature to an existing feature arc
-------------------------------------------

To nobody's great surprise, we set up feature arcs using the typical
"macro -> constructor function -> list of declarations" pattern:

```c
VNET_FEATURE_INIT (mactime, static) =
{
  .arc_name = "device-input",
  .node_name = "mactime",
  .runs_before = VNET_FEATURES ("ethernet-input"),
};
```

This creates a "mactime" feature on the "device-input" arc.

Once per frame, dig up the vnet\_feature\_config\_main\_t corresponding
to the "device-input" feature arc:

```c
vnet_main_t *vnm = vnet_get_main ();
vnet_interface_main_t *im = &vnm->interface_main;
u8 arc = im->output_feature_arc_index;
vnet_feature_config_main_t *fcm;

fcm = vnet_feature_get_config_main (arc);
```

Note that in this case, we've stored the required arc index - assigned
by the feature infrastructure - in the vnet\_interface\_main\_t. Where
to put the arc index is a programmer's decision when creating a feature
arc.

Per packet, set next0 to steer packets to the next node they should
visit:

```c
vnet_get_config_data (&fcm->config_main,
                      &b0->current_config_index /* value-result */,
                      &next0, 0 /* # bytes of config data */);
```

Configuration data is per-feature arc, and is often unused. Note that
it's normal to reset next0 to divert packets elsewhere; often, to drop
them for cause:

```c
next0 = MACTIME_NEXT_DROP;
b0->error = node->errors[DROP_CAUSE];
```

Creating a feature arc
----------------------

Once again, we create feature arcs using constructor macros:

```c
VNET_FEATURE_ARC_INIT (ip4_unicast, static) =
{
  .arc_name = "ip4-unicast",
  .start_nodes = VNET_FEATURES ("ip4-input", "ip4-input-no-checksum"),
  .arc_index_ptr = &ip4_main.lookup_main.ucast_feature_arc_index,
};
```

In this case, we configure two arc start nodes to handle the
"hardware-verified ip checksum or not" cases. During initialization,
the feature infrastructure stores the arc index as shown.

In the head-of-arc node, do the following to send packets along the
feature arc:

```c
ip_lookup_main_t *lm = &im->lookup_main;
arc = lm->ucast_feature_arc_index;
```

Once per packet, initialize packet metadata to walk the feature arc:

```c
vnet_feature_arc_start (arc, sw_if_index0, &next, b0);
```

Enabling / Disabling features
-----------------------------

Simply call vnet_feature_enable_disable to enable or disable a specific
feature:

```c
vnet_feature_enable_disable ("device-input",   /* arc name */
                             "mactime",        /* feature name */
                             sw_if_index,      /* Interface sw_if_index */
                             enable_disable,   /* 1 => enable */
                             0 /* (void *) feature_configuration */,
                             0 /* feature_configuration_nbytes */);
```

The feature_configuration opaque is seldom used.

If you wish to make a feature a _de facto_ system-level concept, pass
sw_if_index=0 at all times. Sw_if_index 0 is always valid, and
corresponds to the "local" interface.

Related "show" commands
-----------------------

To display the entire set of features, use "show features [verbose]". The
verbose form displays arc indices, and feature indices within the arcs:

```
$ vppctl show features verbose
Available feature paths
<snip>
[14] ip4-unicast:
  [ 0]: nat64-out2in-handoff
  [ 1]: nat64-out2in
  [ 2]: nat44-ed-hairpin-dst
  [ 3]: nat44-hairpin-dst
  [ 4]: ip4-dhcp-client-detect
  [ 5]: nat44-out2in-fast
  [ 6]: nat44-in2out-fast
  [ 7]: nat44-handoff-classify
  [ 8]: nat44-out2in-worker-handoff
  [ 9]: nat44-in2out-worker-handoff
  [10]: nat44-ed-classify
  [11]: nat44-ed-out2in
  [12]: nat44-ed-in2out
  [13]: nat44-det-classify
  [14]: nat44-det-out2in
  [15]: nat44-det-in2out
  [16]: nat44-classify
  [17]: nat44-out2in
  [18]: nat44-in2out
  [19]: ip4-qos-record
  [20]: ip4-vxlan-gpe-bypass
  [21]: ip4-reassembly-feature
  [22]: ip4-not-enabled
  [23]: ip4-source-and-port-range-check-rx
  [24]: ip4-flow-classify
  [25]: ip4-inacl
  [26]: ip4-source-check-via-rx
  [27]: ip4-source-check-via-any
  [28]: ip4-policer-classify
  [29]: ipsec-input-ip4
  [30]: vpath-input-ip4
  [31]: ip4-vxlan-bypass
  [32]: ip4-lookup
<snip>
```

Here, we learn that the ip4-unicast feature arc has index 14, and that
e.g. ip4-inacl has feature index 25 in the generated partial order.

To display the features currently active on a specific interface,
use "show interface <name> features":

```
$ vppctl show interface GigabitEthernet3/0/0 features
Feature paths configured on GigabitEthernet3/0/0...
<snip>
ip4-unicast:
  nat44-out2in
<snip>
```

Table of Feature Arcs
---------------------

Simply search for name-strings to track down the arc definition, location of
the arc index, etc.

```
| Arc Name         |
|------------------|
| device-input     |
| ethernet-output  |
| interface-output |
| ip4-drop         |
| ip4-local        |
| ip4-multicast    |
| ip4-output       |
| ip4-punt         |
| ip4-unicast      |
| ip6-drop         |
| ip6-local        |
| ip6-multicast    |
| ip6-output       |
| ip6-punt         |
| ip6-unicast      |
| mpls-input       |
| mpls-output      |
| nsh-output       |
```
18 docs/gettingstarted/developers/index.rst Normal file
@@ -0,0 +1,18 @@
.. _gstarteddevel:

##########
Developers
##########

.. toctree::
   :maxdepth: 2

   building
   softwarearchitecture
   infrastructure
   vlib
   plugins
   vnet
   featurearcs
   bihash
330 docs/gettingstarted/developers/infrastructure.md Normal file
File diff suppressed because it is too large
11 docs/gettingstarted/developers/plugins.md Normal file
@@ -0,0 +1,11 @@
Plugins
=======

vlib implements a straightforward plug-in DLL mechanism. VLIB client
applications specify a directory to search for plug-in .DLLs, and a name
filter to apply (if desired). VLIB needs to load plug-ins very early.

Once loaded, the plug-in DLL mechanism uses dlsym to find and verify a
vlib\_plugin\_registration data structure in the newly-loaded plug-in.
44 docs/gettingstarted/developers/softwarearchitecture.md Normal file
@@ -0,0 +1,44 @@
Software Architecture
=====================

The fd.io vpp implementation is a third-generation vector packet
processing implementation specifically related to US Patent 7,961,636,
as well as earlier work. Note that the Apache-2 license specifically
grants non-exclusive patent licenses; we mention this patent as a point
of historical interest.

For performance, the vpp dataplane consists of a directed graph of
forwarding nodes which process multiple packets per invocation. This
schema enables a variety of micro-processor optimizations: pipelining
and prefetching to cover dependent read latency, inherent I-cache phase
behavior, vector instructions. Aside from hardware input and hardware
output nodes, the entire forwarding graph is portable code.

Depending on the scenario at hand, we often spin up multiple worker
threads which process ingress-hashed packets from multiple queues using
identical forwarding graph replicas.

VPP Layers - Implementation Taxonomy
------------------------------------

![image]()

- VPP Infra - the VPP infrastructure layer, which contains the core
  library source code. This layer performs memory functions, works
  with vectors and rings, performs key lookups in hash tables, and
  works with timers for dispatching graph nodes.
- VLIB - the vector processing library. The vlib layer also handles
  various application management functions: buffer, memory and graph
  node management, maintaining and exporting counters, thread
  management, packet tracing. Vlib implements the debug CLI (command
  line interface).
- VNET - works with VPP\'s networking interface (layers 2, 3, and 4),
  performs session and traffic management, and works with devices and
  the data control plane.
- Plugins - contains an increasingly rich set of data-plane plugins,
  as noted in the above diagram.
- VPP - the container application linked against all of the above.

It's important to understand each of these layers in a certain amount of
detail. Much of the implementation is best dealt with at the API level
and otherwise left alone.
496 docs/gettingstarted/developers/vlib.md Normal file
File diff suppressed because it is too large
171 docs/gettingstarted/developers/vnet.md Normal file
@@ -0,0 +1,171 @@
VNET (VPP Network Stack)
========================

The files associated with the VPP network stack layer are located in the
./src/vnet folder. The Network Stack Layer is basically an
instantiation of the code in the other layers. This layer has a vnet
library that provides vectorized layer-2 and 3 networking graph nodes, a
packet generator, and a packet tracer.

In terms of building a packet processing application, vnet provides a
platform-independent subgraph to which one connects a couple of
device-driver nodes.

Typical RX connections include "ethernet-input" \[full software
classification, feeds ipv4-input, ipv6-input, arp-input etc.\] and
"ipv4-input-no-checksum" \[if hardware can classify, perform ipv4 header
checksum\].

The image below shows a list of features and layer areas that VNET works
with:

![image]()

Effective graph dispatch function coding
----------------------------------------

Over the years, multiple coding styles have emerged: a
single/dual/quad loop coding model (with variations) and a
fully-pipelined coding model.

Single/dual loops
-----------------

The single/dual/quad loop model variations conveniently solve problems
where the number of items to process is not known in advance: typical
hardware RX-ring processing. This coding style is also very effective
when a given node will not need to cover a complex set of dependent
reads.

Here is a quad/single loop which can leverage up-to-avx512 SIMD vector
units to convert buffer indices to buffer pointers:
```c
static uword
simulated_ethernet_interface_tx (vlib_main_t * vm,
                                 vlib_node_runtime_t *
                                 node, vlib_frame_t * frame)
{
  u32 n_left_from, *from;
  u32 next_index = 0;
  u32 n_bytes;
  u32 thread_index = vm->thread_index;
  vnet_main_t *vnm = vnet_get_main ();
  vnet_interface_main_t *im = &vnm->interface_main;
  vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
  u16 nexts[VLIB_FRAME_SIZE], *next;

  n_left_from = frame->n_vectors;
  from = vlib_frame_args (frame);

  /*
   * Convert up to VLIB_FRAME_SIZE indices in "from" to
   * buffer pointers in bufs[]
   */
  vlib_get_buffers (vm, from, bufs, n_left_from);
  b = bufs;
  next = nexts;

  /*
   * While we have at least 4 vector elements (pkts) to process..
   */
  while (n_left_from >= 4)
    {
      /* Prefetch next quad-loop iteration. */
      if (PREDICT_TRUE (n_left_from >= 8))
        {
          vlib_prefetch_buffer_header (b[4], STORE);
          vlib_prefetch_buffer_header (b[5], STORE);
          vlib_prefetch_buffer_header (b[6], STORE);
          vlib_prefetch_buffer_header (b[7], STORE);
        }

      /*
       * $$$ Process 4x packets right here...
       * set next[0..3] to send the packets where they need to go
       */

      do_something_to (b[0]);
      do_something_to (b[1]);
      do_something_to (b[2]);
      do_something_to (b[3]);

      /* Process the next 0..4 packets */
      b += 4;
      next += 4;
      n_left_from -= 4;
    }
  /*
   * Clean up 0...3 remaining packets at the end of the incoming frame
   */
  while (n_left_from > 0)
    {
      /*
       * $$$ Process one packet right here...
       * set next[0] to send the packet where it needs to go
       */
      do_something_to (b[0]);

      /* Process the next packet */
      b += 1;
      next += 1;
      n_left_from -= 1;
    }

  /*
   * Send the packets along their respective next-node graph arcs
   * Considerable locality of reference is expected, most if not all
   * packets in the inbound vector will traverse the same next-node
   * arc
   */
  vlib_buffer_enqueue_to_next (vm, node, from, nexts, frame->n_vectors);

  return frame->n_vectors;
}
```

Given a packet processing task to implement, it pays to scout around
looking for similar tasks, and think about using the same coding
pattern. It is not uncommon to recode a given graph node dispatch function
several times during performance optimization.

Packet tracer
-------------

Vlib includes a frame element \[packet\] trace facility, with a simple
vlib cli interface. The cli is straightforward: "trace add
input-node-name count".

To trace 100 packets on a typical x86\_64 system running the dpdk
plugin: "trace add dpdk-input 100". When using the packet generator:
"trace add pg-input 100"

Each graph node has the opportunity to capture its own trace data. It is
almost always a good idea to do so. The trace capture APIs are simple.

The packet capture APIs snapshot binary data, to minimize processing at
capture time. Each participating graph node initialization provides a
vppinfra format-style user function to pretty-print data when required
by the VLIB "show trace" command.

Set the VLIB node registration ".format\_trace" member to the name of
the per-graph node format function.

Here's a simple example:

```c
u8 * my_node_format_trace (u8 * s, va_list * args)
{
  vlib_main_t * vm = va_arg (*args, vlib_main_t *);
  vlib_node_t * node = va_arg (*args, vlib_node_t *);
  my_node_trace_t * t = va_arg (*args, my_node_trace_t *);

  s = format (s, "My trace data was: %d", t-><whatever>);

  return s;
}
```

The trace framework hands the per-node format function the data it
captured as the packet whizzed by. The format function pretty-prints the
data as desired.