I'm not sure why, but while the list for-each-entry iterator was
conceived as

	list_for_each_entry(pos, head, member)

the hlist ones were greedy and wanted an extra parameter:

	hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter?  I'm not quite sure.  Not only
do they not really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
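A minimal before/after sketch (not from the patch itself, using a
made-up struct foo) shows the only difference: the extra hlist_node
cursor the old macro demanded.

	struct foo {
		int value;
		struct hlist_node link;
	};

	/* Old API: an extra struct hlist_node cursor had to be declared
	 * and passed, even though the loop body never needed it. */
	static void old_walk(struct hlist_head *head)
	{
		struct foo *f;
		struct hlist_node *node;

		hlist_for_each_entry(f, node, head, link)
			pr_info("%d\n", f->value);
	}

	/* New API: same shape as list_for_each_entry(pos, head, member). */
	static void new_walk(struct hlist_head *head)
	{
		struct foo *f;

		hlist_for_each_entry(f, head, link)
			pr_info("%d\n", f->value);
	}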
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small number of places were using the 'node' parameter
  directly; these were modified to use 'obj->member' instead (see the
  sketch after this list).
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
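The kind of manual conversion this implied looks roughly like the
following (made-up code, not an actual hunk from the patch):

	/* Before: the _safe walk carried both cursors, and some call
	 * sites used the 'node' cursor directly. */
	struct foo *f;
	struct hlist_node *pos, *tmp;

	hlist_for_each_entry_safe(f, pos, tmp, head, link)
		if (f->value < 0)
			hlist_del(pos);

	/* After: the cursor is gone, so such sites refer to the member
	 * inside the object ('obj->member') instead. */
	struct foo *f;
	struct hlist_node *tmp;

	hlist_for_each_entry_safe(f, tmp, head, link)
		if (f->value < 0)
			hlist_del(&f->link);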
The semantic patch, which is mostly the work of Peter Senna Tschudin, is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foundation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch turns QFQ into QFQ+, a variant of QFQ that provides the
following two benefits: 1) QFQ+ is faster than QFQ, 2) unlike QFQ,
QFQ+ also correctly schedules non-leaf classes in a
hierarchical setting. A detailed description of QFQ+, plus a
performance comparison with DRR and QFQ, can be found in [1].
[1] P. Valente, "Reducing the Execution Time of Fair-Queueing Schedulers"
http://algo.ing.unimo.it/people/paolo/agg-sched/agg-sched.pdf
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the max packet size for some class (configured through tc) is
violated by the actual size of the packets of that class, then QFQ
would not schedule classes correctly, and the data structures
implementing the bucket lists may get corrupted. This problem occurs
with TSO/GSO even if the max packet size is set to the MTU, and is,
e.g., the cause of the failure reported in [1]. Two patches have been
proposed to solve this problem in [2], one of them is a preliminary
version of this patch.
This patch addresses the above issues by: 1) setting QFQ parameters to
proper values for supporting TSO/GSO (in particular, setting the
maximum possible packet size to 64KB), 2) automatically increasing the
max packet size for a class, lmax, when a packet with a larger size
than the current value of lmax arrives.
The drawback of the first point is that the maximum weight for a class
is now limited to 4096, which is equal to 1/16 of the maximum weight
sum.
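The second point boils down to a check on the enqueue path.  A rough
sketch follows (not the literal hunk; qfq_update_class_lmax() is a
made-up name for whatever helper re-inserts the class into the group
matching the new max_pkt_len/weight ratio):

	/* qfq_enqueue(): grow lmax instead of corrupting the bucket
	 * lists when a TSO/GSO packet exceeds the configured maximum. */
	if (unlikely(qdisc_pkt_len(skb) > cl->lmax)) {
		pr_debug("qfq: increasing lmax of class %u to %u\n",
			 cl->common.classid, qdisc_pkt_len(skb));
		qfq_update_class_lmax(q, cl, qdisc_pkt_len(skb));
	}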
Finally, this patch also forcibly caps the timestamps of a class if
they are too high to be stored in the bucket list. This capping, taken
from QFQ+ [3], handles the infrequent case described in the comment to
the function slot_insert.
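The capping can be sketched as follows (identifiers as in sch_qfq.c,
but treat the snippet as illustrative rather than the exact hunk):

	/* qfq_slot_insert(): if roundedS is too far ahead of the group's
	 * start time to fit in the bucket list, pull the class timestamps
	 * back so the class lands in the last usable slot. */
	u64 slot = (roundedS - grp->S) >> grp->slot_shift;

	if (unlikely(slot > QFQ_MAX_SLOTS - 2)) {
		u64 deltaS = roundedS - grp->S -
			     ((u64)(QFQ_MAX_SLOTS - 2) << grp->slot_shift);

		cl->S -= deltaS;
		cl->F -= deltaS;
		slot = QFQ_MAX_SLOTS - 2;
	}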
[1] http://marc.info/?l=linux-netdev&m=134968777902077&w=2
[2] http://marc.info/?l=linux-netdev&m=135096573507936&w=2
[3] http://marc.info/?l=linux-netdev&m=134902691421670&w=2
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Tested-by: Cong Wang <amwang@redhat.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/team/team.c
drivers/net/usb/qmi_wwan.c
net/batman-adv/bat_iv_ogm.c
net/ipv4/fib_frontend.c
net/ipv4/route.c
net/l2tp/l2tp_netlink.c
The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.
qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.
With help from Antonio Quartulli.
Signed-off-by: David S. Miller <davem@davemloft.net>
GCC refuses to recognize that all error control flows do in fact
set err to something.
Add an explicit initialization to shut it up.
net/sched/sch_drr.c: In function ‘drr_enqueue’:
net/sched/sch_drr.c:359:11: warning: ‘err’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/sched/sch_qfq.c: In function ‘qfq_enqueue’:
net/sched/sch_qfq.c:885:11: warning: ‘err’ may be used uninitialized in this function [-Wmaybe-uninitialized]
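The fix is essentially an explicit initialization, along the lines of:

	int err = 0;	/* GCC cannot prove every error path sets it */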
Signed-off-by: David S. Miller <davem@davemloft.net>
If the old timestamps of a class, say cl, are stale when the class
becomes active, then QFQ may assign to cl a much higher start time
than the maximum value allowed. This may happen when QFQ assigns to
the start time of cl the finish time of a group whose classes are
characterized by a higher value of the ratio
max_class_pkt/weight_of_the_class with respect to that of
cl. Inserting a class with too high a start time into the bucket list
corrupts the data structure and may eventually lead to crashes.
This patch limits the maximum start time assigned to a class.
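A sketch of the cap, with identifiers as used in qfq_update_start() of
sch_qfq.c (illustrative, not the literal hunk):

	/* Highest start time the bucket list can still represent. */
	u64 limit = qfq_round_down(q->V, slot_shift) + (1ULL << slot_shift);

	/* When the stale class inherits the finish time of another
	 * group (next->F), never let its start time exceed 'limit'. */
	if (qfq_gt(roundedF, next->F))
		cl->S = qfq_gt(limit, next->F) ? next->F : limit;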
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
To speed up operations, QFQ internally divides classes into
groups. Which group a class belongs to depends on the ratio between
the maximum packet length and the weight of the class. Unfortunately
the function qfq_change_class lacks the steps for changing the group
of a class when the ratio max_pkt_len/weight of the class changes.
For example, when the last of the following three commands is
executed, the group of class 1:1 is not correctly changed:
tc qdisc add dev XXX root handle 1: qfq
tc class add dev XXX parent 1: qfq classid 1:1 weight 1
tc class change dev XXX parent 1: classid 1:1 qfq weight 4
Not changing the group of a class does not affect the long-term
bandwidth guaranteed to the class, as the latter is independent of the
maximum packet length, and correctly changes (only) if the weight of
the class changes. In contrast, if the group of the class is not
updated, the class is still guaranteed the short-term bandwidth and
packet delay related to its old group, instead of the guarantees that
it should receive according to its new weight and/or maximum packet
length. This may also break service guarantees for other classes.
This patch adds the missing operations.
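Roughly, the missing steps amount to recomputing the group index and
moving the class when the ratio changes.  A sketch, with helper names
taken from sch_qfq.c but the exact sequence only illustrative (inv_w,
lmax, q, cl and sch are the locals/parameters of qfq_change_class()):

	unsigned int i = qfq_calc_index(inv_w, lmax);

	if (&q->groups[i] != cl->grp) {
		sch_tree_lock(sch);
		if (cl->qdisc->q.qlen > 0)	/* pull out of old group */
			qfq_deactivate_class(q, cl);

		cl->inv_w = inv_w;
		cl->lmax  = lmax;
		cl->grp   = &q->groups[i];

		if (cl->qdisc->q.qlen > 0)	/* reinsert in new group */
			qfq_activate_class(q, cl, qdisc_peek_len(cl->qdisc));
		sch_tree_unlock(sch);
	}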
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.
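As a generic illustration of the problem (the macros in question are
not shown in this excerpt, and put_attr() is a made-up helper):

	/* A hidden goto inside a macro: the jump is invisible at the
	 * call site and every caller must define a matching label. */
	#define PUT_OR_BAIL(skb, attr, val)				\
		do {							\
			if (put_attr(skb, attr, val) < 0)		\
				goto put_failure;			\
		} while (0)

	/* Open-coding the check keeps the error path explicit: */
	if (put_attr(skb, attr, val) < 0)
		goto put_failure;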
Signed-off-by: David S. Miller <davem@davemloft.net>
We can underestimate q->wsum in case of "tc class replace ... qfq"
and/or a qdisc_create_dflt() error.
wsum is not really used in the fast path; it is only used at qfq
qdisc/class setup time, to catch user errors.
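The idea can be sketched as follows (placement is illustrative; names
follow sch_qfq.c):

	if (cl != NULL) {	/* "tc class replace": apply the delta */
		sch_tree_lock(sch);
		if (tb[TCA_QFQ_WEIGHT]) {
			q->wsum += weight - ONE_FP / cl->inv_w;
			cl->inv_w = ONE_FP / weight;
		}
		sch_tree_unlock(sch);
		return 0;
	}

	/* New class: add the weight to q->wsum only after every point
	 * that can fail (e.g. qdisc_create_dflt()) has been passed. */
	q->wsum += weight;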
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
grp->slot_shift is between 22 and 41, so using 32-bit wide variables is
probably a typo.
This could explain the QFQ hangs Dave reported to me after 2^23 packets
(23 = 64 - 41).
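In other words, anything already shifted by slot_shift needs 64 bits.
A sketch of the affected computation in qfq_update_start()
(illustrative):

	int slot_shift = cl->grp->slot_shift;

	/* slot_shift can be up to 41, so timestamps rounded to multiples
	 * of 2^slot_shift overflow a u32; keep them in 64-bit variables. */
	u64 roundedF = qfq_round_down(cl->F, slot_shift);
	u64 limit    = qfq_round_down(q->V, slot_shift) +
		       (1ULL << slot_shift);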
Reported-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is an implementation of the Quick Fair Queue scheduler developed
by Fabio Checconi. The same algorithm is already implemented in ipfw
in FreeBSD. Fabio had an earlier version developed on Linux; I just
cleaned it up. Thanks to Eric Dumazet for testing this under load.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>