tc qdisc … dev
| root) [ handle
] cbq avpkt
] [ ewma
] [ mpu
tc class … dev
] cbq allot
] [ rate
] [ minburst
] [ maxburst
] [ ewma
] [ cell
] [ bounded isolated ] [ split
] [ estimator
Class Based Queueing is a classful qdisc that implements a rich
linksharing hierarchy of classes. It contains shaping elements as
well as prioritizing capabilities. Shaping is performed using link
idle time calculations based on the timing of dequeue events and
underlying link bandwidth.
Shaping is done using link idle time calculations, and actions taken if
these calculations deviate from set limits.
When shaping a 10mbit/s connection to 1mbit/s, the link will
be idle 90% of the time. If it isn’t, it needs to be throttled so that it
IS idle 90% of the time.
From the kernel’s perspective, this is hard to measure, so CBQ instead
derives the idle time from the number of microseconds (in fact, jiffies)
that elapse between requests from the device driver for more data. Combined
with the knowledge of packet sizes, this is used to approximate how full or
empty the link is.
This is rather circumspect and doesn’t always arrive at proper
results. For example, what is the actual link speed of an interface
that is not really able to transmit the full 100mbit/s of data,
perhaps because of a badly implemented driver? A PCMCIA network card
will also never achieve 100mbit/s because of the way the bus is
designed – again, how do we calculate the idle time?
The physical link bandwidth may be ill defined in case of not-quite-real
network devices like PPP over Ethernet or PPTP over TCP/IP. The effective
bandwidth in that case is probably determined by the efficiency of pipes
to userspace – which not defined.
During operations, the effective idletime is measured using an
exponential weighted moving average (EWMA), which considers recent
packets to be exponentially more important than past ones. The Unix
loadaverage is calculated in the same way.
The calculated idle time is subtracted from the EWMA measured one,
the resulting number is called ‘avgidle’. A perfectly loaded link has
an avgidle of zero: packets arrive exactly at the calculated
An overloaded link has a negative avgidle and if it gets too negative,
CBQ throttles and is then ‘overlimit’.
Conversely, an idle link might amass a huge avgidle, which would then
allow infinite bandwidths after a few hours of silence. To prevent
this, avgidle is capped at
If overlimit, in theory, the CBQ could throttle itself for exactly the
amount of time that was calculated to pass between packets, and then
pass one packet, and throttle again. Due to timer resolution constraints,
this may not be feasible, see the
Within the one CBQ instance many classes may exist. Each of these classes
contains another qdisc, by default
When enqueueing a packet, CBQ starts at the root and uses various methods to
determine which class should receive the data. If a verdict is reached, this
process is repeated for the recipient class which might have further
means of classifying traffic to its children, if any.
CBQ has the following methods available to classify a packet to any child
Can be set from userspace by an application with the
skb->priority class encoding
only applies if the skb->priority holds a major:minor handle of an existing
class within this qdisc.
parameters. The defmap may contain instructions for each possible Linux packet
Each class also has a
Classification is a loop, which terminates when a leaf class is found. At any
point the loop may jump to the fallback algorithm.
The loop consists of the following steps:
choose it and terminate.
than the current class. If so, and the returned class is not a leaf node,
restart the loop at the found class. If it is a leaf node, terminate.
If we found an upward reference to a higher level, enter the fallback
class. If this is an upward reference, or no
class was defined,
enter the fallback algorithm. If a valid class was found, and it is not a
leaf node, restart the loop at this class. If it is a leaf, choose it and
neither the priority distilled from the classid, nor the
priority yielded a class, enter the fallback algorithm.
The fallback algorithm resides outside of the loop and is as follows.
of the class (which is related to the TOS field), choose this class and
priority. If found, choose it, and terminate.
The packet is enqueued to the class which was chosen when either algorithm
terminated. It is therefore possible for a packet to be enqueued *not* at a
leaf node, but in the middle of the hierarchy.
When dequeuing for sending to the network device, CBQ decides which of its
classes will be allowed to send. It does so with a Weighted Round Robin process
in which each class with packets gets a chance to send in turn. The WRR process
starts by asking the highest priority classes (lowest numerically –
highest semantically) for packets, and will continue to do so until they
have no more data to offer, in which case the process repeats for lower
CERTAINTY ENDS HERE, ANK PLEASE HELP
Each class is not allowed to send at length though – they can only dequeue a
configurable amount of data during each round.
If a class is about to go overlimit, and it is not
it will try to borrow avgidle from siblings that are not
This process is repeated from the bottom upwards. If a class is unable
to borrow enough avgidle to send a packet, it is throttled and not asked
for a packet for enough time for the avgidle to increase above zero.
I REALLY NEED HELP FIGURING THIS OUT. REST OF DOCUMENT IS PRETTY CERTAIN
The root qdisc of a CBQ class tree has the following parameters:
of an interface or within an existing class.
A CBQ qdisc does not shape out of its own accord. It only needs to know certain
parameters about the underlying link. Actual shaping is done in classes.
Classes have a host of parameters to configure their operation.
The time to wait is called the offtime. Higher values of minburst lead
to more accurate shaping in the long term, but to bigger bursts at
Minidle is specified in negative microseconds, so 10 means that
avgidle is capped at -10us.
The defmap specifies which priorities this class wants to receive,
specified as a bitmap. The Least Significant Bit corresponds to priority
parameter tells CBQ at which class the decision must be made, which should
be a (grand)parent of the class you are adding.
As an example, ‘tc class add … classid 10:1 cbq .. split 10:0 defmap c0′
configures class 10:0 to send packets with priorities 6 and 7 to 10:1.
The complimentary configuration would then
be: ‘tc class add … classid 10:2 cbq … split 10:0 defmap 3f’
Which would send all packets 0, 1, 2, 3, 4 and 5 to 10:1.
microseconds how much traffic has passed. This again is a EWMA, for which
the time constant can be specified, also in microseconds. The
corresponds to the sluggishness of the measurement or, conversely, to the
sensitivity of the average to short bursts. Higher values mean less