
Finally time to start filtering!

Let's begin with a simple one, i.e. re-establishing what pfifo_fast did automatically based on the TOS/Priority field. Linux internally translates the header field into the priority field of struct sk_buff, which pfifo_fast uses for classification. tc-prio(8) contains a table listing the priority (and ultimately, the pfifo_fast queue index) each TOS value is translated into. Here is a shorter version:

TOS Values    Linux Priority (Number)    Queue Index
0x0 - 0x6     Best Effort (0)            1
0x8 - 0xe     Bulk (2)                   2
0x10 - 0x16   Interactive (6)            0
0x18 - 0x1e   Interactive Bulk (4)       1
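The shortened table can be read as a small lookup: only bits 1 to 4 of the TOS byte (mask 0x1e) matter, and that value decides both the Linux priority and the pfifo_fast queue index. The following bash sketch encodes just the four rows above; the function name tos_to_prio is hypothetical, and the full table in tc-prio(8) distinguishes a few more cases that are deliberately ignored here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<priority> <queue index>" for a TOS byte,
# following only the shortened table (see tc-prio(8) for the full one).
tos_to_prio() {
    local tos=$(( $1 & 0x1e ))   # only bits 1 to 4 are considered
    if   (( tos <= 0x06 )); then echo "0 1"   # Best Effort
    elif (( tos <= 0x0e )); then echo "2 2"   # Bulk
    elif (( tos <= 0x16 )); then echo "6 0"   # Interactive
    else                         echo "4 1"   # Interactive Bulk
    fi
}

tos_to_prio 0x10   # prints "6 0"
```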

Using the basic filter, it is possible to match packets based on that sk_buff field, which has the added benefit of being IP version agnostic. Since the HTB setup above defaults to class ID 1:30, the Bulk priority can be ignored. The basic filter allows combining matches, so only two filters are needed:

# tc filter add dev eth0 parent 1: basic \
        match 'meta(priority eq 6)' classid 1:10
# tc filter add dev eth0 parent 1: basic \
        match 'meta(priority eq 0)' \
        or 'meta(priority eq 4)' classid 1:20

A detailed description of the basic filter and the ematch syntax it uses can be found in tc-basic(8) and tc-ematch(8).

Obviously, this first example cries out for optimization. A simple one would be to just change the default class from 1:30 to 1:20, so filters are only needed for the Bulk and Interactive priorities:

# tc filter add dev eth0 parent 1: basic \
        match 'meta(priority eq 6)' classid 1:10
# tc filter add dev eth0 parent 1: basic \
        match 'meta(priority eq 2)' classid 1:30

Given that class IDs are arbitrary, choosing them wisely allows for a direct mapping. So first, recreate the qdisc and class configuration:

# tc qdisc replace dev eth0 root handle 1: htb default 10
# tc class add dev eth0 parent 1: classid 1:1 htb rate 95mbit
# alias tclass='tc class add dev eth0 parent 1:1'
# tclass classid 1:16 htb rate 1mbit ceil 20mbit prio 1
# tclass classid 1:10 htb rate 90mbit ceil 95mbit prio 2
# tclass classid 1:12 htb rate 1mbit ceil 95mbit prio 3
# tc qdisc add dev eth0 parent 1:16 fq_codel
# tc qdisc add dev eth0 parent 1:10 fq_codel
# tc qdisc add dev eth0 parent 1:12 fq_codel

This is basically identical to the setup above, but with changed leaf class IDs and the second priority class being the default. Using the flow filter with its map functionality, a single filter command is enough:

# tc filter add dev eth0 parent 1: handle 0x1337 flow \
        map key priority baseclass 1:10

The flow filter now uses the priority value to construct a destination class ID by adding it to the value of baseclass. Since the minor part of a class ID is hexadecimal, priority 6 yields 0x10 + 6 = 0x16, i.e. class ID 1:16. While this works for priority values of 0, 2 and 6, it results in the non-existent class ID 1:14 for Interactive Bulk traffic. In that case, the HTB default applies, so that traffic goes into class ID 1:10 just as intended. Please note that the flow filter requires a handle to be specified, although I didn't see where one would use it later. For more information about flow, see tc-flow(8).
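The class ID arithmetic can be reproduced in a few lines of bash. This is a sketch assuming the flow filter in map mode without divisor or additional operators, as configured above; the helper names are hypothetical, and the set of existing leaf classes (1:10, 1:12, 1:16) is hard-coded from the configuration above to mimic the HTB default fallback.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "flow map key priority baseclass 1:10":
# the minor number of baseclass plus the key value gives the class ID.
flow_map_classid() {
    local major=$1 base_minor=$2 key=$3
    # class ID parts are hexadecimal, so format the sum the same way
    printf '%x:%x\n' "$(( major ))" "$(( base_minor + key ))"
}

# Mimic HTB: a class ID without a matching class falls back to "default 10".
htb_effective_class() {
    local cid
    cid=$(flow_map_classid 1 0x10 "$1")
    case $cid in
        1:10|1:12|1:16) echo "$cid" ;;
        *)              echo "1:10" ;;
    esac
}

for prio in 0 2 4 6; do
    echo "priority $prio -> $(flow_map_classid 1 0x10 "$prio") -> $(htb_effective_class "$prio")"
done
```

For priority 4 the sketch computes 1:14 and then falls back to 1:10, which is exactly the Interactive Bulk case described above.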

While flow and basic filters are relatively easy to apply and understand, they are also quite limited to their intended purpose. A more flexible option is the u32 filter, which allows matching on arbitrary parts of the packet data, but only on that and not on any metadata the kernel associates with the packet (with the exception of the firewall mark value). So in order to continue this little exercise with u32, we have to base classification directly on the actual TOS value. An intuitive attempt might look like this:

# alias tcfilter='tc filter add dev eth0 parent 1: protocol ip'
# tcfilter u32 match ip dsfield 0x10 0x1e classid 1:16
# tcfilter u32 match ip dsfield 0x12 0x1e classid 1:16
# tcfilter u32 match ip dsfield 0x14 0x1e classid 1:16
# tcfilter u32 match ip dsfield 0x16 0x1e classid 1:16
# tcfilter u32 match ip dsfield 0x8 0x1e classid 1:12
# tcfilter u32 match ip dsfield 0xa 0x1e classid 1:12
# tcfilter u32 match ip dsfield 0xc 0x1e classid 1:12
# tcfilter u32 match ip dsfield 0xe 0x1e classid 1:12
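To make the cost of such a linear chain concrete, the following bash sketch walks the eight (value, class) pairs in the order listed and counts comparisons until one matches. It is a simplified model assuming the filters are tried strictly one after another in that order; the real evaluation order depends on the filter priorities tc assigns, and the function name classify_linear is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model of the eight u32 filters above: each entry is
# "value classid", compared against the TOS byte under mask 0x1e.
filters=(
    "0x10 1:16" "0x12 1:16" "0x14 1:16" "0x16 1:16"
    "0x8 1:12"  "0xa 1:12"  "0xc 1:12"  "0xe 1:12"
)

# Hypothetical helper: print "<comparisons> <classid>" for a TOS value,
# or "<comparisons> default" if no filter matches.
classify_linear() {
    local tos=$(( $1 )) checks=0 entry value classid
    for entry in "${filters[@]}"; do
        read -r value classid <<< "$entry"
        checks=$(( checks + 1 ))
        if (( (tos & 0x1e) == value )); then
            echo "$checks $classid"
            return
        fi
    done
    echo "$checks default"
}

classify_linear 0x10   # prints "1 1:16"
classify_linear 0x0    # prints "8 default"
```

Whatever the order, a packet that ends up in the default class always has to be compared against every filter.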

The obvious drawback here is the number of filters needed, and without the default class, eight more filters would be necessary. This also has performance implications: a packet destined for the default class, e.g. one with TOS value 0x0, is checked against all eight filters before its destination class is determined. While there's not much to be done about the number of filters, at least the performance problem can be eliminated by using u32's hash table support:

# tc filter add dev eth0 parent 1: prio 99 handle 1: u32 divisor 16

This creates a hash table with 16 buckets. The table size is not chosen at random: only bits 1 to 4 of the TOS field are of interest (the lowest bit is ignored, as are the three most significant precedence bits), so the range of values to consider is just [0;15], i.e. 16 different values. The next step is to populate the hash table:

# alias tcfilter='tc filter add dev eth0 parent 1: prio 99'
# tcfilter u32 match u8 0 0 ht 1:0: classid 1:10
# tcfilter u32 match u8 0 0 ht 1:1: classid 1:10
# tcfilter u32 match u8 0 0 ht 1:2: classid 1:10
# tcfilter u32 match u8 0 0 ht 1:3: classid 1:10
# tcfilter u32 match u8 0 0 ht 1:4: classid 1:12
# tcfilter u32 match u8 0 0 ht 1:5: classid 1:12
# tcfilter u32 match u8 0 0 ht 1:6: classid 1:12
# tcfilter u32 match u8 0 0 ht 1:7: classid 1:12
# tcfilter u32 match u8 0 0 ht 1:8: classid 1:16
# tcfilter u32 match u8 0 0 ht 1:9: classid 1:16
# tcfilter u32 match u8 0 0 ht 1:a: classid 1:16
# tcfilter u32 match u8 0 0 ht 1:b: classid 1:16
# tcfilter u32 match u8 0 0 ht 1:c: classid 1:10
# tcfilter u32 match u8 0 0 ht 1:d: classid 1:10
# tcfilter u32 match u8 0 0 ht 1:e: classid 1:10
# tcfilter u32 match u8 0 0 ht 1:f: classid 1:10
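The hash table entries can also be derived mechanically from the table at the top of this section: bucket b holds packets whose TOS value, with the lowest bit cleared, is 2·b. The following bash sketch prints (rather than executes) one filter command per bucket, assuming the class layout recreated above (1:10 best effort, 1:12 bulk, 1:16 interactive); the function name bucket_class is hypothetical, and the alias is spelled out because aliases are not expanded in scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: destination class for a bucket of hash table 1:,
# where bucket b corresponds to TOS value 2*b (lowest TOS bit ignored).
bucket_class() {
    local tos=$(( $1 * 2 ))
    if   (( tos <= 0x06 )); then echo 1:10   # Best Effort
    elif (( tos <= 0x0e )); then echo 1:12   # Bulk
    elif (( tos <= 0x16 )); then echo 1:16   # Interactive
    else                         echo 1:10   # Interactive Bulk
    fi
}

# Print one filter command per bucket; review before running them as root.
for (( b = 0; b < 16; b++ )); do
    printf 'tc filter add dev eth0 parent 1: prio 99 u32 match u8 0 0 ht 1:%x: classid %s\n' \
        "$b" "$(bucket_class "$b")"
done
```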

The parameter ht denotes the hash table and bucket the filter should be added to. Since the lowest TOS bit is ignored, a TOS value has to be divided by two to get the bucket it maps to; e.g. a TOS value of 0x10 maps to bucket 0x8. For the sake of completeness, all possible values are mapped, so a configurable default class is not required. Note that the match expression used here does not contribute to the classification, but u32 requires one, so anything that matches every packet will suffice. Finally, a filter which links to the defined hash table is needed:

# tc filter add dev eth0 parent 1: prio 1 protocol ip u32 \
        link 1: hashkey mask 0x001e0000 match u8 0 0

Here again, the actual match statement is not necessary, but syntactically required. All the magic lies within the hashkey parameter, which defines which part of the packet should be used directly as hash key. Here's a drawing of the first four bytes of the IPv4 header, with the area selected by hashkey mask highlighted:

[Figure: the first four bytes (0 to 3) of the IPv4 header, with the area selected by hashkey mask 0x001e0000 highlighted]

This may look confusing at first, but keep in mind that both bit and byte ordering in the drawing are LSB first, while the mask value is written MSB first, as we humans usually do. Reading the mask is therefore done like so, starting from the left:

Skip the first byte (which contains Version and IHL fields).
Skip the lowest bit of the second byte (0x1e is even).
Mark the four following bits (0x1e is 11110 in binary).
Skip the remaining three bits of the second byte as well as the remaining two bytes.

Before doing the lookup, the kernel right-shifts the masked value by the number of trailing zero bits in the mask (17 for 0x001e0000), which implicitly also performs the division by two the hash table depends on. With this setup, every packet has to pass exactly two filters to be classified. Note that this filter is limited to IPv4 packets: since the corresponding Traffic Class field is at a different offset in the IPv6 header, it would not work for IPv6. To use the same setup for IPv6 as well, a second entry-level filter is necessary:

# tc filter add dev eth0 parent 1: prio 2 protocol ipv6 u32 \
        link 1: hashkey mask 0x01e00000 match u8 0 0

For illustration purposes, here again is a drawing of the first four bytes of the IPv6 header, again with masked area highlighted:

[Figure: the first four bytes (0 to 3) of the IPv6 header, with the area selected by hashkey mask 0x01e00000 highlighted]

Reading the mask value is analogous to IPv4, with the added complexity that Traffic Class spans two bytes. Yet, for comparison there's a simple trick: in IPv6, the interesting field is shifted by four bits to the left, and the new mask's value is shifted by the same amount (0x001e0000 << 4 = 0x01e00000). For further information about u32 and what can be done with it, consult its man page tc-u32(8).
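The bucket selection of both entry-level filters can be modelled in bash. This sketch assumes the first 32-bit word of the header (in network byte order, read as a number) is masked with the hashkey mask, shifted right by the number of trailing zero bits of the mask, and finally limited to the 16 buckets; it mirrors the description above rather than the kernel source, and the helper names as well as the sample total length and flow label values are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper modelling the u32 hashkey lookup described above:
# mask the header word, shift right by the mask's trailing zero bits,
# then keep as many low bits as the table has buckets.
u32_bucket() {
    local word=$(( $1 )) mask=$(( $2 )) divisor=$(( ${3:-16} )) shift=0
    (( mask != 0 )) || { echo "mask must not be zero" >&2; return 1; }
    while (( ((mask >> shift) & 1) == 0 )); do
        shift=$(( shift + 1 ))
    done
    # divisor is assumed to be a power of two, so "& (divisor - 1)" is a modulo
    echo $(( ((word & mask) >> shift) & (divisor - 1) ))
}

# First 32-bit word of an IPv4 header: version 4, IHL 5, TOS, total length 84.
ipv4_word() { echo $(( (0x45 << 24) | ($1 << 16) | 84 )); }
# First 32-bit word of an IPv6 header: version 6, Traffic Class, flow label.
ipv6_word() { echo $(( (6 << 28) | ($1 << 20) | 0x12345 )); }

u32_bucket "$(ipv4_word 0x10)" 0x001e0000   # prints 8
u32_bucket "$(ipv6_word 0x10)" 0x01e00000   # prints 8
```

Both calls land in bucket 8 for a TOS/Traffic Class value of 0x10, which is why a single hash table can serve IPv4 and IPv6 alike.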

Of course, the kernel provides more filters than the basic, flow and u32 filters presented above. At the time of writing, the remaining ones are:

bpf: Filtering using Berkeley Packet Filter programs. The program's return code determines the packet's destination class ID.
cgroup: Filter packets based on control groups. This is only useful for packets originating from the local host, as control groups only exist in that scope.
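flower: Matches packets on a wide range of header fields, such as MAC and IP addresses, transport ports or VLAN tags (despite the similar name, it is not an extension of the flow filter).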
fw: Matches on firewall mark values previously assigned to the packet by netfilter (or by a filter action, see below for details). This allows delegating classification to netfilter, which is very convenient if suitable rules already exist there on the same system.
route: Filter packets based on the matching routing table entry. Much like the fw filter above, this allows making use of an already existing, extensive routing table setup.
rsvp, rsvp6: Implementation of the Resource Reservation Protocol in Linux, to react upon requests sent by an RSVP daemon.
tcindex: Match packets based on the tcindex value, which is usually set by the dsmark qdisc. This is part of an approach to support Differentiated Services in Linux, which is a topic of its own.
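As a rough illustration of the fw approach, the following commands let netfilter mark outgoing SSH traffic and have a fw filter steer marked packets into the interactive class 1:16 of the configuration above. This is an untested sketch: the mark value 6, the choice of SSH and the filter priority 5 (picked to avoid clashing with the prio 1 u32 filter) are arbitrary assumptions.

```shell
# Mark outgoing SSH traffic (TCP destination port 22) with firewall mark 6.
iptables -t mangle -A POSTROUTING -o eth0 -p tcp --dport 22 -j MARK --set-mark 6

# Send packets carrying firewall mark 6 (the fw filter's handle) to class 1:16.
tc filter add dev eth0 parent 1: protocol ip prio 5 handle 6 fw classid 1:16
```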
