kernel-ark/net
Pablo Neira Ayuso 38938bfe34 netlink: add NETLINK_NO_ENOBUFS socket flag
This patch adds the NETLINK_NO_ENOBUFS socket flag. This flag can
be used by unicast and broadcast listeners to avoid receiving
ENOBUFS errors.

Generally speaking, ENOBUFS errors are useful to notify two things
to the listener:

a) You may increase the receiver buffer size via setsockopt().
b) You have lost messages, you may be out of sync.

In some cases, ignoring ENOBUFS errors can be useful. For example:

a) nfnetlink_queue: this subsystem does not have any sort of resync
method and you can decide to ignore ENOBUFS once you have set a
given buffer size.

b) ctnetlink: you can use this together with the socket flag
NETLINK_BROADCAST_SEND_ERROR to stop getting ENOBUFS errors as
you do not need to resync (packets whose event are not delivered
are drop to provide reliable logging and state-synchronization).

Moreover, the use of NETLINK_NO_ENOBUFS also reduces a "go up, go down"
effect in terms of performance which is due to the netlink congestion
control when the listener cannot back off. The effect is the following:

1) throughput rate goes up and netlink messages are inserted in the
receiver buffer.
2) Then, netlink buffer fills and overruns (set on nlk->state bit 0).
3) While the listener empties the receiver buffer, netlink keeps
dropping messages. Thus, throughput goes dramatically down.
4) Then, once the listener has emptied the buffer (nlk->state
bit 0 is set off), goto step 1.

This effect is easy to trigger with netlink broadcast under heavy
load, and it is more noticeable when using a big receiver buffer.
You can find some results in [1] that show this problem.

[1] http://1984.lsi.us.es/linux/netlink/

This patch also includes the use of sk_drop to account the number of
netlink messages drop due to overrun. This value is shown in
/proc/net/netlink.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-24 16:37:55 -07:00
..
9p 9p: fix sparse warning: cast adds address space 2009-02-26 23:13:32 -08:00
802 snap: use const for descriptor 2009-03-21 19:06:50 -07:00
8021q gro: Fix vlan/netpoll check again 2009-03-17 13:10:52 -07:00
appletalk net: convert usage of packet_type to read_mostly 2009-03-10 05:22:43 -07:00
atm atm: convert clip driver to net_device_ops 2009-03-21 19:19:12 -07:00
ax25 ax25: zero length frame filtering in AX25 2009-03-21 13:33:55 -07:00
bluetooth Bluetooth: Remove some pointless conditionals before kfree_skb() 2009-02-27 06:14:49 +01:00
bridge Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2009-03-24 13:24:36 -07:00
can can: remove some pointless conditionals before kfree_skb() 2009-02-26 23:07:35 -08:00
core net: remove useless prefetch() call 2009-03-21 13:42:55 -07:00
dcb DCB: fix kfree(skb) 2009-01-04 17:29:21 -08:00
dccp dccp: Do not let initial option overhead shrink the MPS 2009-03-02 03:07:23 -08:00
decnet net/*: use linux/kernel.h swap() 2009-03-21 13:36:17 -07:00
dsa dsa: add switch chip cascading support 2009-03-21 19:06:54 -07:00
econet net: convert usage of packet_type to read_mostly 2009-03-10 05:22:43 -07:00
ethernet
ipv4 arp_tables: ifname_compare() can assume 16bit alignment 2009-03-24 14:15:22 -07:00
ipv6 netfilter: trivial Kconfig spelling fixes 2009-03-24 13:35:27 -07:00
ipx ipx: use constant for strings and desciptor 2009-03-21 19:06:51 -07:00
irda irlan: convert to net_device_ops 2009-03-21 19:19:16 -07:00
iucv iucv: remove some pointless conditionals before kfree_skb() 2009-02-26 23:07:37 -08:00
key af_key: remove some pointless conditionals before kfree_skb() 2009-02-26 23:07:32 -08:00
lapb
llc net: convert usage of packet_type to read_mostly 2009-03-10 05:22:43 -07:00
mac80211 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-03-17 15:04:31 -07:00
netfilter Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2009-03-24 13:24:36 -07:00
netlabel
netlink netlink: add NETLINK_NO_ENOBUFS socket flag 2009-03-24 16:37:55 -07:00
netrom netrom: zero length frame filtering in NetRom 2009-03-21 13:34:20 -07:00
packet Network Drop Monitor: Adding kfree_skb_clean for non-drops and modifying end-of-line points for skbs 2009-03-13 12:09:28 -07:00
phonet net: convert usage of packet_type to read_mostly 2009-03-10 05:22:43 -07:00
rds rds: fix iband RDMA dependencies 2009-03-03 21:39:40 -08:00
rfkill net/rfkill/rfkill.c: fix unused rfkill_led_trigger() warning 2009-01-04 17:11:24 -08:00
rose rose: convert to network_device_ops 2009-01-21 14:02:04 -08:00
rxrpc RxRPC: Fix a potential NULL dereference 2009-02-06 21:50:52 -08:00
sched net/*: use linux/kernel.h swap() 2009-03-21 13:36:17 -07:00
sctp sctp: Clean up TEST_FRAME hacks. 2009-03-21 13:41:09 -07:00
sunrpc net/sunrpc/xprtsock.c: some common code found 2009-02-06 23:48:33 -08:00
tipc tipc: fix non-const printf format arguments 2009-03-18 19:11:29 -07:00
unix unix: remove some pointless conditionals before kfree_skb() 2009-02-26 23:07:34 -08:00
wanrouter wanrouter: fix sparse warnings: context imbalance 2009-02-26 23:13:36 -08:00
wimax Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2009-02-14 23:12:00 -08:00
wireless Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-03-23 13:35:04 -07:00
x25 x25: '< 0' and '>= 0' test on unsigned 2009-03-13 16:04:12 -07:00
xfrm xfrm: Fix xfrm_state_find() wrt. wildcard source address. 2009-03-13 14:22:40 -07:00
compat.c net: socket infrastructure for SO_TIMESTAMPING 2009-02-15 22:43:35 -08:00
Kconfig netdev: expose net_device_ops compat as config option 2009-03-21 22:55:36 -07:00
Makefile RDS: Kconfig and Makefile 2009-02-26 23:43:35 -08:00
nonet.c
socket.c net: socket infrastructure for SO_TIMESTAMPING 2009-02-15 22:43:35 -08:00
sysctl_net.c net: sysctl_net - use net_eq to compare nets 2009-03-16 16:23:30 +01:00
TUNABLE