vpp/src/plugins/map/map_doc.rst
Nathan Skrzypczak f47122e07e docs: convert plugins doc md->rst
Type: improvement

Change-Id: I7e821cce1feae229e1be4baeed249b9cca658135
Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com>
2021-10-13 23:22:20 +00:00

MAP and Lw4o6
=============
This memo documents the VPP MAP and Lw4o6 implementations. Anything
that is not directly obvious belongs here.
MAP-E Virtual Reassembly
------------------------
The MAP-E implementation supports handling of IPv4 fragments as well as
IPv4-in-IPv6 inner and outer fragments. This is called virtual
reassembly because the fragments are not actually reassembled. Instead,
some meta-data are kept about the first fragment and reused for
subsequent fragments.
Fragment caching and handling is not always necessary. It is performed
when:

* An IPv4 fragment is received and the destination IPv4 address is
  shared.
* An IPv6 packet is received with an inner IPv4 fragment, the IPv4
  source address is shared, and security-check fragments is on.
* An IPv6 fragment is received.

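
The three conditions above can be summarised as a single predicate. The
following is a minimal sketch, not the VPP code: the function name and
all parameter names are hypothetical and only mirror the wording of the
list.

```python
def needs_virtual_reassembly(is_ip6, is_fragment, inner_ip4_is_fragment,
                             ip4_address_shared, security_check_fragments):
    """Return True when fragment caching and handling is required.

    Hypothetical helper: every name here is an assumption made for
    illustration and does not come from the VPP sources.
    """
    if not is_ip6:
        # IPv4 side: only fragments towards a shared IPv4 address need state.
        return is_fragment and ip4_address_shared
    if is_fragment:
        # Any received IPv6 (outer) fragment is handled.
        return True
    # Unfragmented IPv6 packet carrying an inner IPv4 fragment.
    return (inner_ip4_is_fragment and ip4_address_shared
            and security_check_fragments)
```
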
There are 3 dedicated nodes:

* ip4-map-reass
* ip6-map-ip4-reass
* ip6-map-ip6-reass

ip4-map sends all fragments to ip4-map-reass. ip6-map sends all
inner-fragments to ip6-map-ip4-reass. ip6-map sends all outer-fragments
to ip6-map-ip6-reass.
IPv4 (resp. IPv6) virtual reassembly makes use of a hash table in order
to store IPv4 (resp. IPv6) reassembly structures. The hash-key is based
on the IPv4-src:IPv4-dst:Frag-ID:Protocol tuple (resp.
IPv6-src:IPv6-dst:Frag-ID tuple, as the protocol is IPv4-in-IPv6).
Therefore, each packet reassembly makes use of exactly one reassembly
structure. When such a structure is allocated, it is timestamped with
the current time. Finally, those structures are capable of storing a
limited number of buffer indexes.
An IPv4 (resp. IPv6) reassembly structure can cache up to
MAP_IP4_REASS_MAX_FRAGMENTS_PER_REASSEMBLY (resp.
MAP_IP6_REASS_MAX_FRAGMENTS_PER_REASSEMBLY) buffers. Buffers are cached
until the first fragment is received.
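
The behaviour described above can be modelled in a few lines. This is a
sketch under stated assumptions rather than the VPP data structures: the
class names, the ``port`` field (standing in for whatever meta-data the
first fragment provides) and the value of
``MAX_FRAGMENTS_PER_REASSEMBLY`` are all hypothetical.

```python
import time

# Placeholder limit; the real value is defined in the VPP sources.
MAX_FRAGMENTS_PER_REASSEMBLY = 5


class VirtualReassembly:
    """State for one fragmented packet. Fragments are never merged."""

    def __init__(self, now):
        self.created = now   # timestamp checked against the lifetime
        self.port = None     # assumed meta-data learned from the first fragment
        self.cached = []     # buffer indexes waiting for that meta-data


class ReassemblyTable:
    def __init__(self, lifetime_s):
        self.lifetime_s = lifetime_s
        self.table = {}      # (src, dst, frag_id, proto) -> VirtualReassembly
        self.dropped = []    # buffer indexes dropped by this model

    def handle(self, key, buffer_index, is_first, port=None, now=None):
        """Return the (buffer_index, port) pairs that can be forwarded now."""
        now = time.monotonic() if now is None else now
        reass = self.table.get(key)
        if reass is not None and now - reass.created > self.lifetime_s:
            # Structure expired: its cached fragments are lost.
            self.dropped.extend(reass.cached)
            del self.table[key]
            reass = None
        if reass is None:
            reass = self.table[key] = VirtualReassembly(now)
        if is_first:
            # The first fragment releases everything cached so far.
            reass.port = port
            ready = [(b, port) for b in reass.cached] + [(buffer_index, port)]
            reass.cached = []
            return ready
        if reass.port is not None:
            # Meta-data already known: forward without caching.
            return [(buffer_index, reass.port)]
        if len(reass.cached) >= MAX_FRAGMENTS_PER_REASSEMBLY:
            self.dropped.append(buffer_index)
            return []
        reass.cached.append(buffer_index)
        return []
```

In this model a fragment that arrives after the first one is forwarded
immediately, which is why buffers only accumulate while the first
fragment is still missing.
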
Virtual Reassembly configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
IPv4 and IPv6 virtual reassembly support the following configuration:

::

   map params reassembly [ip4 | ip6] [lifetime ] [pool-size ] [buffers ]
   [ht-ratio ]

lifetime: The time in milliseconds during which a reassembly structure
is considered valid. A longer lifetime makes reassembly more reliable,
but also makes it more likely that the pool of reassembly structures is
exhausted. The IPv4 standard suggests a lifetime of 15 seconds, and IPv6
specifies a lifetime of 60 seconds. Those values are not realistic for
high-throughput cases.
buffers: The upper limit on the number of buffers that may be cached. It
can be used to protect against fragmentation attacks that aim to
exhaust the global buffer pool.
pool-size: The number of reassembly structures that can be allocated. As
each structure can store a small fixed number of fragments, it also sets
an upper-bound of pool-size \*
MAP_IPX_REASS_MAX_FRAGMENTS_PER_REASSEMBLY buffers that can be cached
in total.
ht-ratio: The number of buckets in the hash-table is pool-size \*
ht-ratio.
Whenever pool-size or ht-ratio is modified, the hash-table is destroyed
and created again, which means all current state is lost.
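
The relationships between these parameters can be written out as
follows. This is a small illustrative helper, not part of VPP: the
function name is hypothetical, the per-reassembly fragment limit is
passed in as an argument because its real value lives in the VPP
sources, and taking the minimum of the two caps is an inference from the
descriptions above.

```python
def reassembly_limits(pool_size, ht_ratio, max_fragments_per_reassembly,
                      buffers):
    """Derive hash-table size and the effective cap on cached buffers.

    Hypothetical helper; all argument values are chosen by the caller.
    """
    # Number of hash-table buckets, as stated for ht-ratio.
    buckets = int(pool_size * ht_ratio)
    # Upper bound implied by pool-size alone.
    bound_from_pool = pool_size * max_fragments_per_reassembly
    # Assumption: both caps apply, so the smaller one is the effective limit.
    effective_cap = min(buffers, bound_from_pool)
    return {
        "buckets": buckets,
        "bound_from_pool": bound_from_pool,
        "effective_cap": effective_cap,
    }
```
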
Additional considerations
^^^^^^^^^^^^^^^^^^^^^^^^^
Reassembly at high rate is expensive in terms of buffers. There is a
trade-off between the lifetime and the number of allocated buffers.
Reducing the lifetime helps, but at the cost of losing state for
fragments that arrive far apart.
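Let:

* R be the packet rate at which fragments are received.
* F be the number of fragments per packet.

Assuming the first fragment is always received last, and with lifetime
expressed in seconds when R is in packets per second, we should have:

.. math::

   \text{buffers} > \text{lifetime} \cdot \frac{R}{F} \cdot (F - 1)

   \text{pool-size} > \text{lifetime} \cdot \frac{R}{F}
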
This is a worst case. Receiving the first fragment earlier helps
reduce the number of required buffers. Also, an optimization is
implemented (MAP_IP6_REASS_COUNT_BYTES and MAP_IP4_REASS_COUNT_BYTES)
which counts the number of transmitted bytes and remembers the total
number of bytes that should be transmitted, based on the last fragment.
This helps reduce the required pool-size.
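
The idea behind the byte-counting optimization can be sketched as
follows. This is only an illustration of the principle, assuming that
fragments do not overlap and are not duplicated; the class and method
names are hypothetical and the actual VPP logic may differ.

```python
class ByteCounter:
    """Track forwarded bytes for one reassembly (illustrative only)."""

    def __init__(self):
        self.forwarded = 0     # payload bytes seen so far
        self.expected = None   # total payload size, known after the last fragment

    def add_fragment(self, offset_bytes, payload_len, more_fragments):
        """Account for one fragment; return True once the packet is complete."""
        self.forwarded += payload_len
        if not more_fragments:
            # The last fragment reveals the total payload length.
            self.expected = offset_bytes + payload_len
        return self.expected is not None and self.forwarded >= self.expected
```

Once ``add_fragment`` returns ``True``, the reassembly structure could be
released without waiting for the lifetime to expire, which is how such a
counter would reduce pressure on the pool.
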
But the formula shows that it is challenging to forward a significant
amount of fragmented packets at high rates. For instance, with a
lifetime of 1 second, a 5 Mpps packet rate would require buffering up to
2.5 million fragments.
If you want to do that, be prepared to configure a large number of
buffers.
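
The worst-case bounds can be evaluated directly. The helper below is a
sketch with a hypothetical name; it takes the lifetime in seconds (the
CLI takes milliseconds) and assumes, as the formula does, that the first
fragment always arrives last. With F = 2, which is an assumption made
here, it reproduces the 2.5 million figure quoted above.

```python
def required_resources(lifetime_s, rate_pps, frags_per_packet):
    """Return the worst-case (buffers, pool_size) lower bounds.

    rate_pps is R, the rate at which fragments are received, and
    frags_per_packet is F. Both results are strict lower bounds.
    """
    packets_per_s = rate_pps / frags_per_packet
    # Every fragment except the first one is cached until the first arrives.
    buffers = lifetime_s * packets_per_s * (frags_per_packet - 1)
    # One reassembly structure per in-flight packet.
    pool_size = lifetime_s * packets_per_s
    return buffers, pool_size


buffers, pool_size = required_resources(1.0, 5_000_000, 2)
print(buffers, pool_size)  # 2500000.0 2500000.0
```
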