Let's begin with a simple one: re-establishing what pfifo_fast did automatically based on the TOS/Priority field. Linux internally translates the header field into the priority field of struct sk_buff, which pfifo_fast uses for classification. tc-prio(8) contains a table listing the priority (and ultimately, the pfifo_fast queue index) each TOS value is translated into. Here is a shorter version:
TOS Values | Linux Priority (Number) | Queue Index |
0x0 - 0x6 | Best Effort (0) | 1 |
0x8 - 0xe | Bulk (2) | 2 |
0x10 - 0x16 | Interactive (6) | 0 |
0x18 - 0x1e | Interactive Bulk (4) | 1 |
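To get a feel for the table, here is a minimal sketch of the same lookup in shell. It only mirrors the shortened table above, not the full one in tc-prio(8), and tos_lookup is a made-up helper name, not part of any tool:

```shell
# Look up Linux priority and pfifo_fast queue index for a TOS value,
# following the shortened table above (not the full tc-prio(8) table).
# tos_lookup is a hypothetical helper name.
tos_lookup() {
    tos=$(( $1 & 0x1e ))              # keep only the bits the table covers
    case $(( tos >> 3 )) in           # each table row spans eight TOS values
        0) echo "prio=0 queue=1" ;;   # 0x0  - 0x6 : Best Effort
        1) echo "prio=2 queue=2" ;;   # 0x8  - 0xe : Bulk
        2) echo "prio=6 queue=0" ;;   # 0x10 - 0x16: Interactive
        3) echo "prio=4 queue=1" ;;   # 0x18 - 0x1e: Interactive Bulk
    esac
}

tos_lookup 0x10   # prints "prio=6 queue=0"
tos_lookup 0x0a   # prints "prio=2 queue=2"
```

The filters below match on the resulting priority number, not on the TOS value itself.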
# tc filter add dev eth0 parent 1: basic \
        match 'meta(priority eq 6)' classid 1:10
# tc filter add dev eth0 parent 1: basic \
        match 'meta(priority eq 0)' \
        or 'meta(priority eq 4)' classid 1:20

A detailed description of the basic filter and the ematch syntax it uses can be found in tc-basic(8) and tc-ematch(8).
Obviously, this first example cries out for optimization. A simple one would be to just change the default class from 1:30 to 1:20, so that filters are only needed for the Bulk and Interactive priorities:
# tc filter add dev eth0 parent 1: basic \
        match 'meta(priority eq 6)' classid 1:10
# tc filter add dev eth0 parent 1: basic \
        match 'meta(priority eq 2)' classid 1:30

Given that class IDs are arbitrary, choosing them wisely allows for a direct mapping. So first, recreate the qdisc and classes configuration:
# tc qdisc replace dev eth0 root handle 1: htb default 10
# tc class add dev eth0 parent 1: classid 1:1 htb rate 95mbit
# alias tclass='tc class add dev eth0 parent 1:1'
# tclass classid 1:16 htb rate 1mbit ceil 20mbit prio 1
# tclass classid 1:10 htb rate 90mbit ceil 95mbit prio 2
# tclass classid 1:12 htb rate 1mbit ceil 95mbit prio 3
# tc qdisc add dev eth0 parent 1:16 fq_codel
# tc qdisc add dev eth0 parent 1:10 fq_codel
# tc qdisc add dev eth0 parent 1:12 fq_codel

This is basically identical to the setup above, but with changed leaf class IDs and the second priority class being the default. Using the flow filter with its map functionality, a single filter command is enough:
# tc filter add dev eth0 parent 1: handle 0x1337 flow \
        map key priority baseclass 1:10

The flow filter now uses the priority value to construct a destination class ID by adding it to the value of baseclass. While this works for priority values of 0, 2 and 6, it will result in the non-existent class ID 1:14 for Interactive Bulk traffic. In that case, the HTB default applies, so that traffic goes into class ID 1:10 just as intended. Note that the flow filter requires a handle to be specified, although I didn't see where one would use it later. For more information about flow, see tc-flow(8).
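The addition described above is easy to check by hand, keeping in mind that the minor part of a class ID is hexadecimal. A small sketch, where flow_classid is a hypothetical helper that only models the arithmetic and does not talk to the kernel:

```shell
# Model "flow map key priority baseclass 1:10": the minor number of the
# resulting class ID is the baseclass minor (0x10) plus the priority.
# flow_classid is a hypothetical helper, not part of tc.
flow_classid() {
    printf '1:%x\n' $(( 0x10 + $1 ))
}

for prio in 0 2 4 6; do
    echo "priority $prio -> $(flow_classid "$prio")"
done
# priority 0 -> 1:10
# priority 2 -> 1:12
# priority 4 -> 1:14   (no such class, so the HTB default 1:10 applies)
# priority 6 -> 1:16
```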
While flow and basic filters are relatively easy to apply and understand, they are also quite limited to their intended purpose. A more flexible option is the u32 filter, which allows matching on arbitrary parts of the packet data - yet only on that, not on any metadata the kernel associates with the packet (with the exception of the firewall mark value). So in order to continue this little exercise with u32, we have to base classification directly upon the actual TOS value. An intuitive attempt might look like this:
# alias tcfilter='tc filter add dev eth0 parent 1:'
# tcfilter u32 match ip dsfield 0x10 0x1e classid 1:16
# tcfilter u32 match ip dsfield 0x12 0x1e classid 1:16
# tcfilter u32 match ip dsfield 0x14 0x1e classid 1:16
# tcfilter u32 match ip dsfield 0x16 0x1e classid 1:16
# tcfilter u32 match ip dsfield 0x8 0x1e classid 1:12
# tcfilter u32 match ip dsfield 0xa 0x1e classid 1:12
# tcfilter u32 match ip dsfield 0xc 0x1e classid 1:12
# tcfilter u32 match ip dsfield 0xe 0x1e classid 1:12

The obvious drawback here is the number of filters needed. And without the default class, eight more filters would be necessary. This also has performance implications: a packet with TOS value 0xe will be checked eight times in total in order to determine its destination class. While there's not much to be done about the number of filters, at least the performance problem can be eliminated by using u32's hash table support:
# tc filter add dev eth0 parent 1: prio 99 handle 1: u32 divisor 16

This creates a hash table with 16 buckets. The table size is arbitrary, but not random: since the lowest bit of the TOS field is not interesting, it can be ignored, which leaves values in the range [0;15], i.e. 16 different values. The next step is to populate the hash table:
# alias tcfilter='tc filter add dev eth0 parent 1: prio 99'
# tcfilter u32 match u8 0 0 ht 1:0: classid 1:10
# tcfilter u32 match u8 0 0 ht 1:1: classid 1:10
# tcfilter u32 match u8 0 0 ht 1:2: classid 1:10
# tcfilter u32 match u8 0 0 ht 1:3: classid 1:10
# tcfilter u32 match u8 0 0 ht 1:4: classid 1:12
# tcfilter u32 match u8 0 0 ht 1:5: classid 1:12
# tcfilter u32 match u8 0 0 ht 1:6: classid 1:12
# tcfilter u32 match u8 0 0 ht 1:7: classid 1:12
# tcfilter u32 match u8 0 0 ht 1:8: classid 1:16
# tcfilter u32 match u8 0 0 ht 1:9: classid 1:16
# tcfilter u32 match u8 0 0 ht 1:a: classid 1:16
# tcfilter u32 match u8 0 0 ht 1:b: classid 1:16
# tcfilter u32 match u8 0 0 ht 1:c: classid 1:10
# tcfilter u32 match u8 0 0 ht 1:d: classid 1:10
# tcfilter u32 match u8 0 0 ht 1:e: classid 1:10
# tcfilter u32 match u8 0 0 ht 1:f: classid 1:10

The parameter ht denotes the hash table and bucket the filter should be added to. Since the lowest TOS bit is ignored, the TOS value has to be divided by two in order to get the bucket it maps to. E.g. a TOS value of 0x10 will therefore map to bucket 0x8, and the Best Effort values 0x0 to 0x6 map to buckets 0 to 3, which point at class 1:10. For the sake of completeness, all possible values are mapped and therefore a configurable default class is not required. Note that the match expression used here serves no purpose for classification, but the syntax requires one. Therefore anything that matches any packet will suffice. Finally, a filter which links to the defined hash table is needed:
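Typing sixteen nearly identical commands is tedious, so they can also be generated. The following is a sketch only: it prints the commands instead of running them, and derives each bucket's class from the priority table and the class layout used in this section (Interactive in 1:16, Bulk in 1:12, everything else in 1:10). The helper names bucket_class and gen_bucket_filters are made up for this example:

```shell
# Map a bucket number (TOS value divided by two) to its class ID.
# bucket_class is a hypothetical helper.
bucket_class() {
    case $(( $1 >> 2 )) in      # four buckets per row of the priority table
        1) echo 1:12 ;;         # buckets 4-7: TOS 0x8  - 0xe,  Bulk
        2) echo 1:16 ;;         # buckets 8-b: TOS 0x10 - 0x16, Interactive
        *) echo 1:10 ;;         # Best Effort and Interactive Bulk
    esac
}

# Print (not execute) the 16 commands that populate hash table 1:.
gen_bucket_filters() {
    bucket=0
    while [ "$bucket" -lt 16 ]; do
        printf 'tc filter add dev eth0 parent 1: prio 99 u32 match u8 0 0 ht 1:%x: classid %s\n' \
            "$bucket" "$(bucket_class "$bucket")"
        bucket=$(( bucket + 1 ))
    done
}

gen_bucket_filters
```

After reviewing the output, it could be piped into a root shell to actually apply it.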
# tc filter add dev eth0 parent 1: prio 1 protocol ip u32 \
        link 1: hashkey mask 0x001e0000 match u8 0 0

Here again, the actual match statement is not necessary, but syntactically required. All the magic lies within the hashkey parameter, which defines which part of the packet should be used directly as hash key. Here's a drawing of the first four bytes of the IPv4 header, with the area selected by hashkey mask highlighted:
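The effect of the mask can also be checked with a bit of shell arithmetic. This is a sketch under the assumption that the masked bits are simply shifted down to form the bucket index, which matches the "divide by two" rule from above; ipv4_bucket is a hypothetical helper and the header words are made-up sample values:

```shell
# First 32-bit word of an IPv4 header (network byte order):
#   bits 31-28 version | 27-24 IHL | 23-16 TOS | 15-0 total length
# Mask 0x001e0000 covers bits 20-17, i.e. TOS bits 4-1.
# ipv4_bucket is a hypothetical helper; the shift by 17 models how the
# masked bits end up as a bucket index.
ipv4_bucket() {
    printf '%x\n' $(( ($1 & 0x001e0000) >> 17 ))
}

ipv4_bucket 0x45100054   # version 4, IHL 5, TOS 0x10, length 84: prints 8
ipv4_bucket 0x450e0054   # same header with TOS 0xe: prints 7
```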
# tc filter add dev eth0 parent 1: prio 2 protocol ipv6 u32 \
        link 1: hashkey mask 0x01e00000 match u8 0 0

For illustration purposes, here is a drawing of the first four bytes of the IPv6 header, again with the masked area highlighted:
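Both masks select the same four bits of the TOS or traffic class byte, just at a different position within the first header word. A generic sketch makes that visible by deriving the shift from the lowest set bit of the mask. As before, this only models the behaviour described in the text; hash_bucket is a hypothetical helper and the header words are sample values:

```shell
# Fold a masked 32-bit header word into a bucket index, deriving the
# shift from the lowest set bit of the mask.
# hash_bucket is a hypothetical helper: hash_bucket <word> <mask>
hash_bucket() {
    word=$(( $1 ))
    mask=$(( $2 ))
    [ "$mask" -ne 0 ] || return 1     # an empty mask selects nothing
    shift_by=0
    while [ $(( (mask >> shift_by) & 1 )) -eq 0 ]; do
        shift_by=$(( shift_by + 1 ))
    done
    printf '%x\n' $(( (word & mask) >> shift_by ))
}

# IPv6: bits 31-28 version | 27-20 traffic class | 19-0 flow label
hash_bucket 0x61000000 0x01e00000   # traffic class 0x10: prints 8
# IPv4, for comparison
hash_bucket 0x45100054 0x001e0000   # TOS 0x10: prints 8
```

Since both calls end up in bucket 8, the same hash table can serve IPv4 and IPv6 traffic.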
Of course, the kernel provides many more filters than just basic, flow and u32 which have been presented above. As of now, the remaining ones are: