DOC ONLY: add packet handoff doc

Change-Id: I2e8076bb4f697819780e61ff761defdc74bf4f09
Signed-off-by: Dave Barach <dave@barachs.net>
@@ -95,7 +95,7 @@ Graph dispatcher internals
--------------------------
This section may be safely skipped. It's not necessary to understand
graph dispatcher internals to create graph nodes.
Vector Data Structure
---------------------
@@ -161,10 +161,10 @@ Here is the code in .../src/vlib/main.c:vlib_main_or_worker_loop()
which processes frames:
```c
  /*
   * Input nodes may have added work to the pending vector.
   * Process pending vector until there is nothing left.
   * All pending vectors will be processed from input -> output.
   */
  for (i = 0; i < _vec_len (nm->pending_frames); i++)
    cpu_time_now = dispatch_pending_node (vm, i, cpu_time_now);
```
@@ -245,14 +245,14 @@ indicated next node.
After some scuffling around - two levels of macros - processing
reaches vlib\_get\_next\_frame_internal (...). Get-next-frame-internal
digs up the vlib\_next\_frame\_t corresponding to the desired graph
arc.
The next frame data structure amounts to a graph-arc-centric frame
cache. Once a node finishes adding elements to a frame, it will acquire
a vlib_pending_frame_t and end up on the graph dispatcher's
run-queue. But there's no guarantee that more vector elements won't be
added to the underlying frame from the same (source\_node,
next\_index) arc or from a different (source\_node, next\_index) arc.
Maintaining consistency of the arc-to-frame cache is necessary. The
first step in maintaining consistency is to make sure that only one
@@ -260,7 +260,7 @@ graph node at a time thinks it "owns" the target vlib\_frame\_t.
Back to the graph node dispatch function. In the usual case, a certain
number of packets will be added to the vlib\_frame\_t acquired by
calling vlib\_get\_next\_frame (...).
Before a dispatch function returns, it's required to call
vlib\_put\_next\_frame (...) for all of the graph arcs it actually
@@ -274,12 +274,12 @@ dispatch\_pending\_node actions
-------------------------------
The main graph dispatch loop calls dispatch pending node as shown
above.
Dispatch\_pending\_node recovers the pending frame, and the graph node
runtime / dispatch function. Further, it recovers the next\_frame
currently associated with the vlib\_frame\_t, and detaches the
vlib\_frame\_t from the next\_frame.
In .../src/vlib/main.c:dispatch\_pending\_node(...), note this stanza:
@@ -349,7 +349,7 @@ to use. Here is a typical example:
  vlib_main_t *vm = &vlib_global_main;
  uword event_type, * event_data = 0;

  while (1)
    {
      vlib_process_wait_for_event_or_clock (vm, 5.0 /* seconds */);
@@ -362,7 +362,7 @@ to use. Here is a typical example:
      case EVENT2:
        handle_event2s (event_data);
        break;

      case ~0: /* 5-second idle/periodic */
        handle_idle ();
@@ -471,7 +471,7 @@ Here is a complete example:
}
/* *INDENT-OFF* */
static VLIB_CLI_COMMAND (show_ip_tuple_command) =
{
  .path = "show ip tuple match",
  .short_help = "Show ip 5-tuple match-and-broadcast tables",
@@ -494,3 +494,109 @@ code elsewhere to unpack the data and finally print the answer. If a
certain cli command has the potential to hurt packet processing
performance by running for too long, do the work incrementally in a
process node. The client can wait.
Handing off buffers between threads
-----------------------------------
Vlib includes an easy-to-use mechanism for handing off buffers between
worker threads. A typical use-case: software ingress flow hashing. At
a high level, one creates a per-worker-thread queue which sends packets
to a specific graph node in the indicated worker thread. With the
queue in hand, enqueue packets to the worker thread of your choice.
### Initialize a handoff queue
Simple enough; call vlib_frame_queue_main_init:
```c
  main_ptr->frame_queue_index
    = vlib_frame_queue_main_init (dest_node.index, frame_queue_size);
```
Frame_queue_size means what it says: the number of frames which may be
queued. Since frames contain 1...256 packets, frame_queue_size should
be a reasonably small number (32...64). If the frame queue producer(s)
are faster than the frame queue consumer(s), congestion will
occur. We suggest letting the enqueue operation deal with queue
congestion, as shown in the enqueue example below.
Under the floorboards, vlib_frame_queue_main_init creates an input queue
for each worker thread.
Please do NOT create frame queues until it's clear that they will be
used. Although the main dispatch loop is reasonably smart about how
often it polls the (entire set of) frame queues, polling unused frame
queues is a waste of clock cycles.
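One way to honor this advice is to create the queue lazily, the first
time the feature is actually enabled. Here is a minimal sketch, reusing
the htest_main_t structure from the example below and assuming a
frame_queue_index member initialized to ~0 and a hypothetical
destination node named "htest-worker":

```c
/* Hypothetical enable handler: create the frame queue on first use.
 * htest_enable, the ~0 "not yet created" sentinel, and the
 * "htest-worker" node name are illustrative assumptions. */
static clib_error_t *
htest_enable (vlib_main_t * vm)
{
  htest_main_t *hmp = &htest_main;

  if (hmp->frame_queue_index == ~0)
    {
      vlib_node_t *node;

      /* Graph node which will receive the handed-off packets */
      node = vlib_get_node_by_name (vm, (u8 *) "htest-worker");
      hmp->frame_queue_index
        = vlib_frame_queue_main_init (node->index,
                                      64 /* frame_queue_size */);
    }
  return 0;
}
```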
### Hand off packets
The actual handoff mechanics are simple, and integrate nicely with
a typical graph-node dispatch function:
```c
always_inline uword
do_handoff_inline (vlib_main_t * vm,
                   vlib_node_runtime_t * node, vlib_frame_t * frame,
                   int is_ip4, int is_trace)
{
  u32 n_left_from, *from;
  vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
  u16 thread_indices[VLIB_FRAME_SIZE];
  u16 nexts[VLIB_FRAME_SIZE], *next;
  u32 n_enq;
  htest_main_t *hmp = &htest_main;
  int i;

  from = vlib_frame_vector_args (frame);
  n_left_from = frame->n_vectors;
  vlib_get_buffers (vm, from, bufs, n_left_from);

  next = nexts;
  b = bufs;

  /*
   * Typical frame traversal loop, details vary with
   * use case. Make sure to set thread_indices[i] with
   * the desired destination thread index. You may
   * or may not bother to set next[i].
   */

  for (i = 0; i < frame->n_vectors; i++)
    {
      <snip>
      /* Pick a thread to handle this packet */
      thread_indices[i] = f (packet_data_or_whatever);
      <snip>

      b += 1;
      next += 1;
      n_left_from -= 1;
    }

  /* Enqueue buffers to threads */
  n_enq =
    vlib_buffer_enqueue_to_thread (vm, hmp->frame_queue_index,
                                   from, thread_indices, frame->n_vectors,
                                   1 /* drop on congestion */);

  /* Typical counters */
  if (n_enq < frame->n_vectors)
    vlib_node_increment_counter (vm, node->node_index,
                                 XXX_ERROR_CONGESTION_DROP,
                                 frame->n_vectors - n_enq);
  vlib_node_increment_counter (vm, node->node_index,
                               XXX_ERROR_HANDED_OFF, n_enq);
  return frame->n_vectors;
}
```
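For the software ingress flow hashing use-case mentioned in the
introduction, the thread-selection function f (...) above might look
something like the following minimal sketch. It assumes IPv4 packets
and worker threads occupying thread indices 1..n_workers;
pick_worker_thread and its hash scheme are illustrative, not part of
the vlib API:

```c
#include <vlib/vlib.h>
#include <vnet/ip/ip4_packet.h>
#include <vppinfra/xxhash.h>

/* Hypothetical helper: hash an IPv4 packet's addresses to pick a
 * worker thread. Assumes thread 0 is the main thread and that
 * worker threads occupy thread indices 1..n_workers. */
static_always_inline u16
pick_worker_thread (vlib_buffer_t * b, u32 n_workers)
{
  ip4_header_t *ip = vlib_buffer_get_current (b);
  u64 key;

  /* Fold source and destination addresses into a single hash key */
  key = ((u64) ip->src_address.as_u32 << 32) | ip->dst_address.as_u32;

  return 1 + (u16) (clib_xxhash (key) % n_workers);
}
```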
Notes about calling vlib_buffer_enqueue_to_thread(...):
* If you pass "drop on congestion" non-zero, all packets in the
inbound frame will be consumed one way or the other. This is the
recommended setting.
* In the drop-on-congestion case, please don't try to "help" in the
enqueue node by freeing dropped packets, or by pushing them to
"error-drop." Either of those actions would be a severe error.
* It's perfectly OK to enqueue packets to the current thread.
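On the receiving side, no special dequeue code is needed: handed-off
packets arrive at the destination graph node as ordinary input frames,
so its dispatch function looks like any other internal node's. Here is
a minimal sketch, again using the hypothetical "htest-worker" name
introduced above:

```c
/* Hypothetical destination node: runs in the worker thread selected
 * by the enqueue side and processes handed-off buffers. */
static uword
htest_worker_node_fn (vlib_main_t * vm, vlib_node_runtime_t * node,
                      vlib_frame_t * frame)
{
  u32 n_left_from, *from;
  vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;

  from = vlib_frame_vector_args (frame);
  n_left_from = frame->n_vectors;
  vlib_get_buffers (vm, from, bufs, n_left_from);
  b = bufs;

  while (n_left_from > 0)
    {
      /* Process b[0] on this worker thread; eventually hand it to a
         next node or free it, as in any other dispatch function */
      b += 1;
      n_left_from -= 1;
    }
  return frame->n_vectors;
}

VLIB_REGISTER_NODE (htest_worker_node) =
{
  .function = htest_worker_node_fn,
  .name = "htest-worker",
  .vector_size = sizeof (u32),
  .type = VLIB_NODE_TYPE_INTERNAL,
};
```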