kernel-ark/net
Eric Dumazet 2b85a34e91 net: No more expensive sock_hold()/sock_put() on each tx
One of the problems with sock memory accounting is that it uses
a pair of sock_hold()/sock_put() calls for each transmitted packet.

This slows down bidirectional flows because the receive path
also needs to take a refcount on the socket and might run on a different
cpu than the transmit path or the transmit completion path. So these
two atomic operations also trigger cache line bounces.

We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be among the top five functions in profiles.

We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.

As we also update sk_wmem_alloc, we can offset sk_wmem_alloc
by one unit at init time and keep that offset until sk_free() is called.
Once sk_free() is called, we use atomic_dec_and_test(sk_wmem_alloc)
to remove the initial offset and atomically check whether any packets
are still in flight.

skb_set_owner_w() no longer calls sock_hold().

sock_wfree() no longer calls sock_put(), but checks whether sk_wmem_alloc
reached 0 to perform the final freeing.

The drawback is that an skb->truesize error could lead to unfreeable sockets or,
even worse, to prematurely calling __sk_free() on a live socket.
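
The scheme described above can be modelled in userspace roughly as follows.
This is only a sketch using C11 atomics, not the kernel code: struct
model_sock, its freed flag and every model_*() function are hypothetical
stand-ins for struct sock, __sk_free(), skb_set_owner_w(), sock_wfree() and
sk_free(). The assert() marks the spot where a wrong truesize would corrupt
the count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model, not the kernel's struct sock. */
struct model_sock {
	atomic_int wmem_alloc;	/* bytes in flight, plus 1 while the socket is live */
	bool freed;		/* set once the final free has run */
};

/* Stand-in for __sk_free(): only records that the final free happened. */
static void model_final_free(struct model_sock *sk)
{
	sk->freed = true;
}

/* Start with an offset of one unit instead of taking a refcount per packet. */
static void model_sock_init(struct model_sock *sk)
{
	atomic_init(&sk->wmem_alloc, 1);
	sk->freed = false;
}

/* tx path: charge the packet to the socket; no separate refcount is taken. */
static void model_set_owner_w(struct model_sock *sk, int truesize)
{
	atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* tx completion: uncharge the packet; whoever brings the count to 0 frees. */
static void model_wfree(struct model_sock *sk, int truesize)
{
	int old = atomic_fetch_sub(&sk->wmem_alloc, truesize);

	/* A truesize larger than what was charged would break the accounting. */
	assert(old >= truesize);
	if (old == truesize)
		model_final_free(sk);
}

/* Owner drops the socket: remove the initial offset and test for 0 atomically. */
static void model_sk_free(struct model_sock *sk)
{
	if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
		model_final_free(sk);
}
```

In this model, calling model_sk_free() while packets are still charged only
removes the offset; the last model_wfree() then performs the final free. If
nothing is in flight, model_sk_free() frees immediately.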

Nice speedups on SMP. tbench, for example, goes from 2691 MB/s to 2711 MB/s
on my 8 cpu dev machine, even though tbench was not really hitting the sk_refcnt
contention point. 5% speedup on a UDP transmit workload (depends
on number of flows), lowering TX completion cpu usage.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-11 02:55:43 -07:00
9p net/9p: handle correctly interrupted 9P requests 2009-04-05 16:54:53 -05:00
802 net: remove COMPAT_NET_DEV_OPS 2009-05-25 01:53:53 -07:00
8021q 8021q: Vlan driver should use rcu_barrier() on unload instead of syncronize_net() 2009-06-10 01:11:22 -07:00
appletalk appletalk: Use frag list abstraction interfaces. 2009-06-09 00:17:44 -07:00
atm net: skb->dst accessors 2009-06-03 02:51:04 -07:00
ax25 ax25: proc uid file misses header 2009-04-20 02:14:59 -07:00
bluetooth isdn: rename capi_ctr_reseted() to capi_ctr_down() 2009-06-08 00:45:50 -07:00
bridge net: skb->dst accessors 2009-06-03 02:51:04 -07:00
can can: af_can.c use rcu_barrier() on module unload. 2009-06-10 01:11:24 -07:00
core net: No more expensive sock_hold()/sock_put() on each tx 2009-06-11 02:55:43 -07:00
dcb
dccp net: skb->dst accessors 2009-06-03 02:51:04 -07:00
decnet net: skb->dst accessors 2009-06-03 02:51:04 -07:00
dsa net: convert unicast addr list 2009-05-29 22:12:32 -07:00
econet econet: Use SKB queue and list helpers instead of doing it by-hand. 2009-05-28 16:46:29 -07:00
ethernet net: remove COMPAT_NET_DEV_OPS 2009-05-25 01:53:53 -07:00
ieee802154 ieee802154: Use '%Zu' printf format for size_t. 2009-06-11 02:10:19 -07:00
ipv4 net: No more expensive sock_hold()/sock_put() on each tx 2009-06-11 02:55:43 -07:00
ipv6 net: No more expensive sock_hold()/sock_put() on each tx 2009-06-11 02:55:43 -07:00
ipx ipx: use constant for strings and desciptor 2009-03-21 19:06:51 -07:00
irda irda: Use SKB queue and list helpers instead of doing it by-hand. 2009-05-28 23:26:33 -07:00
iucv af_iucv: Fix merge. 2009-04-23 06:37:16 -07:00
key af_key: remove some pointless conditionals before kfree_skb() 2009-02-26 23:07:32 -08:00
lapb
llc llc: Kill outdated and incorrect comment. 2009-05-28 23:31:56 -07:00
mac80211 mac80211: disable PS while probing AP 2009-06-10 13:28:41 -04:00
netfilter nfnetlink_queue: Use rcu_barrier() on module unload. 2009-06-10 01:11:23 -07:00
netlabel netlabel: Use genl_register_family_with_ops() 2009-05-21 16:50:24 -07:00
netlink genetlink: Introduce genl_register_family_with_ops() 2009-05-21 16:50:22 -07:00
netrom net/netrom: Fix socket locking 2009-04-22 00:49:51 -07:00
packet net: skb->dst accessors 2009-06-03 02:51:04 -07:00
phonet phonet: Use frag list abstraction interfaces. 2009-06-09 00:24:06 -07:00
rds Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-05-18 21:08:20 -07:00
rfkill rfkill: don't impose global states on resume (just restore the previous states) 2009-06-10 13:28:37 -04:00
rose Revert "rose: zero length frame filtering in af_rose.c" 2009-04-14 20:28:00 -07:00
rxrpc RxRPC: Error handling for rxrpc_alloc_connection() 2009-05-21 15:22:02 -07:00
sched pkt_sched: Use PSCHED_SHIFT in PSCHED time conversion 2009-06-09 05:25:29 -07:00
sctp sctp: protocol.c call rcu_barrier() on unload. 2009-06-10 01:11:25 -07:00
sunrpc sunrpc/auth_gss: Call rcu_barrier() on module unload. 2009-06-10 01:11:27 -07:00
tipc tipc: Use genl_register_family_with_ops() 2009-05-21 16:50:23 -07:00
unix New helper - current_umask() 2009-03-31 23:00:26 -04:00
wanrouter wanrouter: fix sparse warnings: context imbalance 2009-02-26 23:13:36 -08:00
wimax wimax: depend on rfkill properly 2009-06-04 10:58:15 -04:00
wireless cfg80211: fix rfkill locking problem 2009-06-10 13:28:41 -04:00
x25 af_rose/x25: Sanity check the maximum user frame size 2009-03-27 00:28:21 -07:00
xfrm xfrm: Use frag list abstraction interfaces. 2009-06-09 00:24:07 -07:00
compat.c
Kconfig net: add IEEE 802.15.4 socket family implementation 2009-06-09 05:25:32 -07:00
Makefile net: add IEEE 802.15.4 socket family implementation 2009-06-09 05:25:32 -07:00
nonet.c
socket.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2009-04-06 18:05:43 -07:00
sysctl_net.c net: sysctl_net - use net_eq to compare nets 2009-03-16 16:23:30 +01:00
TUNABLE