Commit Graph

3643 Commits

Author SHA1 Message Date
Bruno Randolf
79b1c460a0 cfg80211: Add documentation for antenna ops
Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:36 -05:00
Felix Fietkau
dd5b4cc71c cfg80211/mac80211: improve ad-hoc multicast rate handling
- store the multicast rate as an index instead of the rate value
  (reduces cpu overhead in a hotpath)
- validate the rate values (must match a bitrate in at least one sband)

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:35 -05:00
John W. Linville
d7a066c923 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-11-24 16:19:24 -05:00
John W. Linville
ccb1435401 Revert "nl80211/mac80211: Report signal average"
This reverts commit 86107fd170.

This patch inadvertantly changed the userland ABI.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:18:36 -05:00
Eric Dumazet
bba14de987 scm: lower SCM_MAX_FD
Lower SCM_MAX_FD from 255 to 253 so that allocations for scm_fp_list are
halved. (commit f8d570a4 added two pointers in this structure)

scm_fp_dup() should not copy whole structure (and trigger kmemcheck
warnings), but only the used part. While we are at it, only allocate
needed size.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:16:43 -08:00
Eric Dumazet
456b61bca8 ipv6: mcast: RCU conversion
ipv6_sk_mc_lock rwlock becomes a spinlock.

readers (inet6_mc_check()) now takes rcu_read_lock() instead of read
lock. Writers dont need to disable BH anymore.

struct ipv6_mc_socklist objects are reclaimed after one RCU grace
period.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:16:42 -08:00
Luis R. Rodriguez
b2e253cf30 cfg80211: Fix regulatory bug with multiple cards and delays
When two cards are connected with the same regulatory domain
if CRDA had a delayed response then cfg80211's own set regulatory
domain would still be the world regulatory domain. There was a bug
on cfg80211's logic such that it assumed that once you pegged a
request as the last request it was already the currently set
regulatory domain. This would mean we would race setting a stale
regulatory domain to secondary cards which had the same regulatory
domain since the alpha2 would match.

We fix this by processing each regulatory request atomically,
and only move on to the next one once we get it fully processed.
In the case CRDA is not present we will simply world roam.

This issue is only present when you have a slow system and the
CRDA processing is delayed. Because of this it is not a known
regression.

Without this fix when a delay is present with CRDA the second card
would end up with an intersected regulatory domain and not allow it
to use the channels it really is designed for. When two cards with
two different regulatory domains were inserted you'd end up
rejecting the second card's regulatory domain request.
This fails with mac80211_hswim's regtest=2 (two requests, same alpha2)
and regtest=3 (two requests, different alpha2) module parameter
options.

This was reproduced and tested against mac80211_hwsim using this
CRDA delayer:

       #!/bin/bash
       echo $COUNTRY >> /tmp/log
       sleep 2
       /sbin/crda.orig

And these regulatory tests:

       modprobe mac80211_hwsim regtest=2
       modprobe mac80211_hwsim regtest=3

Reported-by: Mark Mentovai <mark@moxienet.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Tested-by: Mark Mentovai <mark@moxienet.com>
Tested-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-22 15:48:51 -05:00
Jan Engelhardt
20a95a2169 netns: let net_generic take pointer-to-const args
This commit is same in nature as v2.6.37-rc1-755-g3654654; the network
namespace itself is not modified when calling net_generic, so the
parameter can be const.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-21 10:05:10 -08:00
David S. Miller
24912420e9 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bonding/bond_main.c
	net/core/net-sysfs.c
	net/ipv6/addrconf.c
2010-11-19 13:13:47 -08:00
David S. Miller
07bfa524d4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-11-18 11:56:09 -08:00
Bruno Randolf
86107fd170 nl80211/mac80211: Report signal average
Extend nl80211 to report an exponential weighted moving average (EWMA) of the
signal value. Since the signal value usually fluctuates between different
packets, an average can be more useful than the value of the last packet.

This uses the recently added generic EWMA library function.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-18 14:22:20 -05:00
Tetsuo Handa
ef22b7b65f net: Fix duplicate volatile warning.
jiffies is defined as "volatile".

  extern unsigned long volatile __jiffy_data jiffies;

ACCESS_ONCE() uses "volatile".
As a result, some compilers warn duplicate `volatile' for ACCESS_ONCE(jiffies).

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 09:40:04 -08:00
Johannes Berg
4bce22b9b8 mac80211: defines for AC numbers
In many places we've just hardcoded the
AC numbers -- which is a relic from the
original mac80211 (d80211). Add constants
for them so we know what we're talking
about.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-17 16:19:31 -05:00
Changli Gao
5811662b15 net: use the macros defined for the members of flowi
Use the macros defined for the members of flowi to clean the code up.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 12:27:45 -08:00
Thomas Graf
f8ff182c71 rtnetlink: Link address family API
Each net_device contains address family specific data such as
per device settings and statistics. We already expose this data
via procfs/sysfs and partially netlink.

The netlink method requires the requester to send one RTM_GETLINK
request for each address family it wishes to receive data of
and then merge this data itself.

This patch implements a new API which combines all address family
specific link data in a new netlink attribute IFLA_AF_SPEC.
IFLA_AF_SPEC contains a sequence of nested attributes, one for each
address family which in turn defines the structure of its own
attribute. Example:

   [IFLA_AF_SPEC] = {
       [AF_INET] = {
           [IFLA_INET_CONF] = ...,
       },
       [AF_INET6] = {
           [IFLA_INET6_FLAGS] = ...,
           [IFLA_INET6_CONF] = ...,
       }
   }

The API also allows for address families to implement a function
which parses the IFLA_AF_SPEC attribute sent by userspace to
implement address family specific link options.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 11:28:24 -08:00
Felix Fietkau
8f0729b16a mac80211: add support for setting the ad-hoc multicast rate
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:39:08 -05:00
Felix Fietkau
885a46d0f7 cfg80211: add support for setting the ad-hoc multicast rate
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:39:08 -05:00
Juuso Oikarinen
a619a4c0e1 mac80211: Add function to get probe request template for current AP
Chipsets with hardware based connection monitoring need to autonomically
send directed probe-request frames to the AP (in the event of beacon loss,
for example.)

For the hardware to be able to do this, it requires a template for the frame
to transmit to the AP, filled in with the BSSID and SSID of the AP, but also
the supported rate IE's.

This patch adds a function to mac80211, which allows the hardware driver to
fetch this template after association, so it can be configured to the hardware.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:37:08 -05:00
Bruno Randolf
15d9675321 mac80211: Add antenna configuration
Allow antenna configuration by calling driver's function for it.

We disallow antenna configuration if the wiphy is already running, mainly to
make life easier for 802.11n drivers which need to recalculate HT capabilites.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:37:05 -05:00
Bruno Randolf
afe0cbf875 cfg80211: Add nl80211 antenna configuration
Allow setting of TX and RX antennas configuration via nl80211.

The antenna configuration is defined as a bitmap of allowed antennas to use.
This API can be used to mask out antennas which are not attached or should not
be used for other reasons like regulatory concerns or special setups.

Separate bitmaps are used for RX and TX to allow configuring different antennas
for receiving and transmitting. Each bitmap is 32 bit long, each bit
representing one antenna, starting with antenna 1 at the first bit. If an
antenna bit is set, this means the driver is allowed to use this antenna for RX
or TX respectively; if the bit is not set the hardware is not allowed to use
this antenna.

Using bitmaps has the benefit of allowing for a flexible configuration
interface which can support many different configurations and which can be used
for 802.11n as well as non-802.11n devices. Instead of relying on some hardware
specific assumptions, drivers can use this information to know which antennas
are actually attached to the system and derive their capabilities based on
that.

802.11n devices should enable or disable chains, based on which antennas are
present (If all antennas belonging to a particular chain are disabled, the
entire chain should be disabled). HT capabilities (like STBC, TX Beamforming,
Antenna selection) should be calculated based on the available chains after
applying the antenna masks. Should a 802.11n device have diversity antennas
attached to one of their chains, diversity can be enabled or disabled based on
the antenna information.

Non-802.11n drivers can use the antenna masks to select RX and TX antennas and
to enable or disable antenna diversity.

While covering chainmasks for 802.11n and the standard "legacy" modes "fixed
antenna 1", "fixed antenna 2" and "diversity" this API also allows more rare,
but useful configurations as follows:

1) Send on antenna 1, receive on antenna 2 (or vice versa). This can be used to
have a low gain antenna for TX in order to keep within the regulatory
constraints and a high gain antenna for RX in order to receive weaker signals
("speak softly, but listen harder"). This can be useful for building long-shot
outdoor links. Another usage of this setup is having a low-noise pre-amplifier
on antenna 1 and a power amplifier on the other antenna. This way transmit
noise is mostly kept out of the low noise receive channel.
(This would be bitmaps: tx 1 rx 2).

2) Another similar setup is: Use RX diversity on both antennas, but always send
on antenna 1. Again that would allow us to benefit from a higher gain RX
antenna, while staying within the legal limits.
(This would be: tx 0 rx 3).

3) And finally there can be special experimental setups in research and
development even with pre 802.11n hardware where more than 2 antennas are
available. It's good to keep the API simple, yet flexible.

Signed-off-by: Bruno Randolf <br1@einfach.org>

--
v7:	Made bitmasks 32 bit wide and rebased to latest wireless-testing.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:37:05 -05:00
Arik Nemtsov
f23a478075 mac80211: support hardware TX fragmentation offload
The lower driver is notified when the fragmentation threshold changes
and upon a reconfig of the interface.

If the driver supports hardware TX fragmentation, don't fragment
packets in the stack.

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:37:04 -05:00
Eric Dumazet
b178bb3dfc net: reorder struct sock fields
Right now, fields in struct sock are not optimally ordered, because each
path (RX softirq, TX completion, RX user,  TX user) has to touch fields
that are contained in many different cache lines.

The really critical thing is to shrink number of cache lines that are
used at RX softirq time : CPU handling softirqs for a device can receive
many frames per second for many sockets. If load is too big, we can drop
frames at NIC level. RPS or multiqueue cards can help, but better reduce
latency if possible.

This patch starts with UDP protocol, then additional patches will try to
reduce latencies of other ones as well.

At RX softirq time, fields of interest for UDP protocol are :
(not counting ones in inet struct for the lookup)

Read/Written:
sk_refcnt   (atomic increment/decrement)
sk_rmem_alloc & sk_backlog.len (to check if there is room in queues)
sk_receive_queue
sk_backlog (if socket locked by user program)
sk_rxhash
sk_forward_alloc
sk_drops

Read only:
sk_rcvbuf (sk_rcvqueues_full())
sk_filter
sk_wq
sk_policy[0]
sk_flags

Additional notes :

- sk_backlog has one hole on 64bit arches. We can fill it to save 8
bytes.
- sk_backlog is used only if RX sofirq handler finds the socket while
locked by user.
- sk_rxhash is written only once per flow.
- sk_drops is written only if queues are full

Final layout :

[1] One section grouping all read/write fields, but placing rxhash and
sk_backlog at the end of this section.

[2] One section grouping all read fields in RX handler
   (sk_filter, sk_rcv_buf, sk_wq)

[3] Section used by other paths

I'll post a patch on its own to put sk_refcnt at the end of struct
sock_common so that it shares same cache line than section [1]

New offsets on 64bit arch :

sizeof(struct sock)=0x268
offsetof(struct sock, sk_refcnt)  =0x10
offsetof(struct sock, sk_lock)    =0x48
offsetof(struct sock, sk_receive_queue)=0x68
offsetof(struct sock, sk_backlog)=0x80
offsetof(struct sock, sk_rmem_alloc)=0x80
offsetof(struct sock, sk_forward_alloc)=0x98
offsetof(struct sock, sk_rxhash)=0x9c
offsetof(struct sock, sk_rcvbuf)=0xa4
offsetof(struct sock, sk_drops) =0xa0
offsetof(struct sock, sk_filter)=0xa8
offsetof(struct sock, sk_wq)=0xb0
offsetof(struct sock, sk_policy)=0xd0
offsetof(struct sock, sk_flags) =0xe0

Instead of :

sizeof(struct sock)=0x270
offsetof(struct sock, sk_refcnt)  =0x10
offsetof(struct sock, sk_lock)    =0x50
offsetof(struct sock, sk_receive_queue)=0xc0
offsetof(struct sock, sk_backlog)=0x70
offsetof(struct sock, sk_rmem_alloc)=0xac
offsetof(struct sock, sk_forward_alloc)=0x10c
offsetof(struct sock, sk_rxhash)=0x128
offsetof(struct sock, sk_rcvbuf)=0x4c
offsetof(struct sock, sk_drops) =0x16c
offsetof(struct sock, sk_filter)=0x198
offsetof(struct sock, sk_wq)=0x88
offsetof(struct sock, sk_policy)=0x98
offsetof(struct sock, sk_flags) =0x130

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 11:17:43 -08:00
Eric Dumazet
c31504dc0d udp: use atomic_inc_not_zero_hint
UDP sockets refcount is usually 2, unless an incoming frame is going to
be queued in receive or backlog queue.

Using atomic_inc_not_zero_hint() permits to reduce latency, because
processor issues less memory transactions.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 11:17:43 -08:00
Jan Engelhardt
3654654f7a netlink: let nlmsg and nla functions take pointer-to-const args
The changed functions do not modify the NL messages and/or attributes
at all. They should use const (similar to strchr), so that callers
which have a const nlmsg/nlattr around can make use of them without
casting.

While at it, constify a data array.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 09:52:32 -08:00
David S. Miller
b5e4156743 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-11-16 09:17:12 -08:00
Jussi Kivilinna
309075cf08 cfg80211: fix WIPHY_FLAG_IBSS_RSN bit
WIPHY_FLAG_IBSS_RSN is BIT(7) as is WIPHY_FLAG_CONTROL_PORT_PROTOCOL. Change
to BIT(8).

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 15:00:42 -05:00
Arnd Hannemann
62370e2b93 b43legacy: Fix compile on ARM architecture
When b43legacy is compiled on the arm platform, the following errors are seen:

  CC [M]  drivers/net/wireless/b43legacy/xmit.o
In file included from include/net/dst.h:11,
from drivers/net/wireless/b43legacy/xmit.c:31:
include/net/dst_ops.h:28: error: expected ':', ',', ';', '}' or '__attribute__'
   before '____cacheline_aligned_in_smp'
include/net/dst_ops.h: In function 'dst_entries_get_fast':
include/net/dst_ops.h:33: error: 'struct dst_ops' has no member named
   'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_get_slow':
include/net/dst_ops.h:41: error: 'struct dst_ops' has no member named
   'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_add':
include/net/dst_ops.h:49: error: 'struct dst_ops' has no member named
   'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_init':
include/net/dst_ops.h:55: error: 'struct dst_ops' has no member named
   'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_destroy':
include/net/dst_ops.h:60: error: 'struct dst_ops' has no member named
   'pcpuc_entries'
make[4]: *** [drivers/net/wireless/b43legacy/xmit.o] Error 1
make[3]: *** [drivers/net/wireless/b43legacy] Error 2
make[2]: *** [drivers/net/wireless] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2

The cause is a missing include of <linux/cache.h>, which is present for
i386 and x86_64 architectures, but not for arm.

Signed-off-by: Arnd Hannemann <arnd@arndnet.de>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@kernel.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 15:00:42 -05:00
Joe Perches
d577f1ccdd include/net/caif/cfctrl.h: Remove unnecessary semicolons
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 11:07:16 -08:00
Timo Teräs
cc9ff19da9 xfrm: use gre key as flow upper protocol info
The GRE Key field is intended to be used for identifying an individual
traffic flow within a tunnel. It is useful to be able to have XFRM
policy selector matches to have different policies for different
GRE tunnels.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 10:44:04 -08:00
Luis R. Rodriguez
749b527b21 cfg80211: fix allowing country IEs for WIPHY_FLAG_STRICT_REGULATORY
We should be enabling country IE hints for WIPHY_FLAG_STRICT_REGULATORY
even if we haven't yet recieved regulatory domain hint for the driver
if it needed one. Without this Country IEs are not passed on to drivers
that have set WIPHY_FLAG_STRICT_REGULATORY, today this is just all
Atheros chipset drivers: ath5k, ath9k, ar9170, carl9170.

This was part of the original design, however it was completely
overlooked...

Cc: Easwar Krishnan <easwar.krishnan@atheros.com>
Cc: stable@kernel.org
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 13:24:09 -05:00
David S. Miller
c25ecd0a21 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-11-14 11:57:05 -08:00
Linus Torvalds
9457b24a09 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (66 commits)
  can-bcm: fix minor heap overflow
  gianfar: Do not call device_set_wakeup_enable() under a spinlock
  ipv6: Warn users if maximum number of routes is reached.
  docs: Add neigh/gc_thresh3 and route/max_size documentation.
  axnet_cs: fix resume problem for some Ax88790 chip
  ipv6: addrconf: don't remove address state on ifdown if the address is being kept
  tcp: Don't change unlocked socket state in tcp_v4_err().
  x25: Prevent crashing when parsing bad X.25 facilities
  cxgb4vf: add call to Firmware to reset VF State.
  cxgb4vf: Fail open if link_start() fails.
  cxgb4vf: flesh out PCI Device ID Table ...
  cxgb4vf: fix some errors in Gather List to skb conversion
  cxgb4vf: fix bug in Generic Receive Offload
  cxgb4vf: don't implement trivial (and incorrect) ndo_select_queue()
  ixgbe: Look inside vlan when determining offload protocol.
  bnx2x: Look inside vlan when determining checksum proto.
  vlan: Add function to retrieve EtherType from vlan packets.
  virtio-net: init link state correctly
  ucc_geth: Fix deadlock
  ucc_geth: Do not bring the whole IF down when TX failure.
  ...
2010-11-12 17:17:55 -08:00
Eric Dumazet
1d7138de87 igmp: RCU conversion of in_dev->mc_list
in_dev->mc_list is protected by one rwlock (in_dev->mc_list_lock).

This can easily be converted to a RCU protection.

Writers hold RTNL, so mc_list_lock is removed, not replaced by a
spinlock.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Cypher Wu <cypher.w@gmail.com>
Cc: Américo Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-12 13:18:57 -08:00
David S. Miller
c753796769 ipv4: Make rt->fl.iif tests lest obscure.
When we test rt->fl.iif against zero, we're seeing if it's
an output or an input route.

Make that explicit with some helper functions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 17:07:48 -08:00
Eric Dumazet
72cdd1d971 net: get rid of rtable->idev
It seems idev field in struct rtable has no special purpose, but adding
extra atomic ops.

We hold refcounts on the device itself (using percpu data, so pretty
cheap in current kernel).

infiniband case is solved using dst.dev instead of idev->dev

Removal of this field means routing without route cache is now using
shared data, percpu data, and only potential contention is a pair of
atomic ops on struct neighbour per forwarded packet.

About 5% speedup on routing test.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 10:29:40 -08:00
Eric Dumazet
46b13fc5c0 neigh: reorder struct neighbour
It is important to move nud_state outside of the often modified cache
line (because of refcnt), to reduce false sharing in neigh_event_send()

This is a followup of commit 0ed8ddf404 (neigh: Protect neigh->ha[]
with a seqlock)

This gives a 7% speedup on routing test with IP route cache disabled.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 10:29:40 -08:00
Eric Dumazet
8d987e5c75 net: avoid limits overflow
Robin Holt tried to boot a 16TB machine and found some limits were
reached : sysctl_tcp_mem[2], sysctl_udp_mem[2]

We can switch infrastructure to use long "instead" of "int", now
atomic_long_t primitives are available for free.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Robin Holt <holt@sgi.com>
Reviewed-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-10 12:12:00 -08:00
Eric Dumazet
fc766e4c49 decnet: RCU conversion and get rid of dev_base_lock
While tracking dev_base_lock users, I found decnet used it in
dnet_select_source(), but for a wrong purpose:

Writers only hold RTNL, not dev_base_lock, so readers must use RCU if
they cannot use RTNL.

Adds an rcu_head in struct dn_ifaddr and handle proper RCU management.

Adds __rcu annotation in dn_route as well.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-08 13:50:08 -08:00
David S. Miller
d0eaeec8e8 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-11-08 12:38:28 -08:00
Paul Mundt
43b81f85eb net dst: need linux/cache.h for ____cacheline_aligned_in_smp.
Presently the b43legacy build fails on an sh randconfig:

In file included from include/net/dst.h:12,
                 from drivers/net/wireless/b43legacy/xmit.c:32:
include/net/dst_ops.h:28: error: expected ':', ',', ';', '}' or '__attribute__' before '____cacheline_aligned_in_smp'
include/net/dst_ops.h: In function 'dst_entries_get_fast':
include/net/dst_ops.h:33: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_get_slow':
include/net/dst_ops.h:41: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_add':
include/net/dst_ops.h:49: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_init':
include/net/dst_ops.h:55: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_destroy':
include/net/dst_ops.h:60: error: 'struct dst_ops' has no member named 'pcpuc_entries'
make[5]: *** [drivers/net/wireless/b43legacy/xmit.o] Error 1
make[5]: *** Waiting for unfinished jobs....

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-07 19:58:05 -08:00
Linus Torvalds
4b4a2700f4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (41 commits)
  inet_diag: Make sure we actually run the same bytecode we audited.
  netlink: Make nlmsg_find_attr take a const nlmsghdr*.
  fib: fib_result_assign() should not change fib refcounts
  netfilter: ip6_tables: fix information leak to userspace
  cls_cgroup: Fix crash on module unload
  memory corruption in X.25 facilities parsing
  net dst: fix percpu_counter list corruption and poison overwritten
  rds: Remove kfreed tcp conn from list
  rds: Lost locking in loop connection freeing
  de2104x: fix panic on load
  atl1 : fix panic on load
  netxen: remove unused firmware exports
  caif: Remove noisy printout when disconnecting caif socket
  caif: SPI-driver bugfix - incorrect padding.
  caif: Bugfix for socket priority, bindtodev and dbg channel.
  smsc911x: Set Ethernet EEPROM size to supported device's size
  ipv4: netfilter: ip_tables: fix information leak to userland
  ipv4: netfilter: arp_tables: fix information leak to userland
  cxgb4vf: remove call to stop TX queues at load time.
  cxgb4: remove call to stop TX queues at load time.
  ...
2010-11-05 15:25:48 -07:00
Nelson Elhage
6b8c92ba07 netlink: Make nlmsg_find_attr take a const nlmsghdr*.
This will let us use it on a nlmsghdr stored inside a netlink_callback.

Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-04 12:26:34 -07:00
Sjur Brændeland
2c24a5d1b4 caif: SPI-driver bugfix - incorrect padding.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:03 -07:00
André Carvalho de Matos
f2527ec436 caif: Bugfix for socket priority, bindtodev and dbg channel.
Changes:
o Bugfix: SO_PRIORITY for SOL_SOCKET could not be handled
  in caif's setsockopt,  using the struct sock attribute priority instead.

o Bugfix: SO_BINDTODEVICE for SOL_SOCKET could not be handled
  in caif's setsockopt,  using the struct sock attribute ifindex instead.

o Wrong assert statement for RFM layer segmentation.

o CAIF Debug channels was not working over SPI, caif_payload_info
  containing padding info must be initialized.

o Check on pointer before dereferencing when unregister dev in caif_dev.c

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:03 -07:00
Linus Torvalds
1840897ab5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
  b43: Fix warning at drivers/mmc/core/core.c:237 in mmc_wait_for_cmd
  mac80211: fix failure to check kmalloc return value in key_key_read
  libertas: Fix sd8686 firmware reload
  ath9k: Fix incorrect access of rate flags in RC
  netfilter: xt_socket: Make tproto signed in socket_mt6_v1().
  stmmac: enable/disable rx/tx in the core with a single write.
  net: atarilance - flags should be unsigned long
  netxen: fix kdump
  pktgen: Limit how much data we copy onto the stack.
  net: Limit socket I/O iovec total length to INT_MAX.
  USB: gadget: fix ethernet gadget crash in gether_setup
  fib: Fix fib zone and its hash leak on namespace stop
  cxgb3: Fix panic in free_tx_desc()
  cxgb3: fix crash due to manipulating queues before registration
  8390: Don't oops on starting dev queue
  dccp ccid-2: Stop polling
  dccp: Refine the wait-for-ccid mechanism
  dccp: Extend CCID packet dequeueing interface
  dccp: Return-value convention of hc_tx_send_packet()
  igbvf: fix panic on load
  ...
2010-10-29 14:17:12 -07:00
Pavel Emelyanov
4aa2c466a7 fib: Fix fib zone and its hash leak on namespace stop
When we stop a namespace we flush the table and free one, but the
added fn_zone-s (and their hashes if grown) are leaked. Need to free.
Tries releases all its stuff in the flushing code.

Shame on us - this bug exists since the very first make-fib-per-net
patches in 2.6.27 :(

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 10:27:03 -07:00
Venkateswararao Jujjuri (JV)
b165d60145 9p: Add datasync to client side TFSYNC/RFSYNC for dotl
SYNOPSIS
    size[4] Tfsync tag[2] fid[4] datasync[4]

    size[4] Rfsync tag[2]

DESCRIPTION

    The Tfsync transaction transfers ("flushes") all modified in-core data of
    file identified by fid to the disk device (or other  permanent  storage
    device)  where that  file  resides.

    If datasync flag is specified data will be fleshed but does not flush
    modified metadata unless  that  metadata  is  needed  in order to allow a
    subsequent data retrieval to be correctly handled.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:49 -05:00
M. Mohan Kumar
329176cc2c 9p: Implement TREADLINK operation for 9p2000.L
Synopsis

	size[4] TReadlink tag[2] fid[4]
	size[4] RReadlink tag[2] target[s]

Description
	Readlink is used to return the contents of the symoblic link
        referred by fid. Contents of symboic link is returned as a
        response.

	target[s] - Contents of the symbolic link referred by fid.

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:48 -05:00
M. Mohan Kumar
1d769cd192 9p: Implement TGETLOCK
Synopsis

    size[4] TGetlock tag[2] fid[4] getlock[n]
    size[4] RGetlock tag[2] getlock[n]

Description

TGetlock is used to test for the existence of byte range posix locks on a file
identified by given fid. The reply contains getlock structure. If the lock could
be placed it returns F_UNLCK in type field of getlock structure.  Otherwise it
returns the details of the conflicting locks in the getlock structure

    getlock structure:
      type[1] - Type of lock: F_RDLCK, F_WRLCK
      start[8] - Starting offset for lock
      length[8] - Number of bytes to check for the lock
             If length is 0, check for lock in all bytes starting at the location
            'start' through to the end of file
      pid[4] - PID of the process that wants to take lock/owns the task
               in case of reply
      client[4] - Client id of the system that owns the process which
                  has the conflicting lock

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:47 -05:00
M. Mohan Kumar
a099027c77 9p: Implement TLOCK
Synopsis

    size[4] TLock tag[2] fid[4] flock[n]
    size[4] RLock tag[2] status[1]

Description

Tlock is used to acquire/release byte range posix locks on a file
identified by given fid. The reply contains status of the lock request

    flock structure:
        type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK
        flags[4] - Flags could be either of
          P9_LOCK_FLAGS_BLOCK - Blocked lock request, if there is a
            conflicting lock exists, wait for that lock to be released.
          P9_LOCK_FLAGS_RECLAIM - Reclaim lock request, used when client is
            trying to reclaim a lock after a server restrart (due to crash)
        start[8] - Starting offset for lock
        length[8] - Number of bytes to lock
          If length is 0, lock all bytes starting at the location 'start'
          through to the end of file
        pid[4] - PID of the process that wants to take lock
        client_id[4] - Unique client id

        status[1] - Status of the lock request, can be
          P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or
          P9_LOCK_GRACE(3)
          P9_LOCK_SUCCESS - Request was successful
          P9_LOCK_BLOCKED - A conflicting lock is held by another process
          P9_LOCK_ERROR - Error while processing the lock request
          P9_LOCK_GRACE - Server is in grace period, it can't accept new lock
            requests in this period (except locks with
            P9_LOCK_FLAGS_RECLAIM flag set)

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:47 -05:00
Venkateswararao Jujjuri (JV)
920e65dc69 [9p] Introduce client side TFSYNC/RFSYNC for dotl.
SYNOPSIS
    size[4] Tfsync tag[2] fid[4]

    size[4] Rfsync tag[2]

DESCRIPTION

The Tfsync transaction transfers ("flushes") all modified in-core data of
file identified by fid to the disk device (or other  permanent  storage
device)  where that  file  resides.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:47 -05:00
Arun R Bharadwaj
4f7ebe8072 net/9p: This patch implements TLERROR/RLERROR on the 9P client.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:45 -05:00
Amarnath Revanna
a10c02036f caif-u5500: Adding shared memory include
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 12:29:51 -07:00
Eric Dumazet
b914c4ea92 inetpeer: __rcu annotations
Adds __rcu annotations to inetpeer
	(struct inet_peer)->avl_left
	(struct inet_peer)->avl_right

This is a tedious cleanup, but removes one smp_wmb() from link_to_pool()
since we now use more self documenting rcu_assign_pointer().

Note the use of RCU_INIT_POINTER() instead of rcu_assign_pointer() in
all cases we dont need a memory barrier.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:33 -07:00
Eric Dumazet
7a2b03c517 fib_rules: __rcu annotates ctarget
Adds __rcu annotation to (struct fib_rule)->ctarget

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:32 -07:00
Eric Dumazet
b33eab0844 tunnels: add __rcu annotations
Add __rcu annotations to :
        (struct ip_tunnel)->prl
        (struct ip_tunnel_prl_entry)->next
        (struct xfrm_tunnel)->next
	struct xfrm_tunnel *tunnel4_handlers
	struct xfrm_tunnel *tunnel64_handlers

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:32 -07:00
Eric Dumazet
e0ad61ec86 net: add __rcu annotations to protocol
Add __rcu annotations to :
        struct net_protocol *inet_protos
        struct net_protocol *inet6_protos

And use appropriate casts to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:31 -07:00
Eric Dumazet
1c31720a74 ipv4: add __rcu annotations to routes.c
Add __rcu annotations to :
        (struct dst_entry)->rt_next
        (struct rt_hash_bucket)->chain

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:31 -07:00
Eric Dumazet
43a951e999 ipv4: add __rcu annotations to ip_ra_chain
Add __rcu annotations to :
        (struct ip_ra_chain)->next
	struct ip_ra_chain *ip_ra_chain;

And use appropriate rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 14:18:28 -07:00
Eric Dumazet
0d7da9ddd9 net: add __rcu annotation to sk_filter
Add __rcu annotation to :
        (struct sock)->sk_filter

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 14:18:28 -07:00
Eric Dumazet
1c87733d06 net_ns: add __rcu annotations
add __rcu annotation to (struct net)->gen, and use
rcu_dereference_protected() in net_assign_generic()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 14:18:27 -07:00
Eric Dumazet
6f0bcf1525 tunnels: add _rcu annotations
(struct ip6_tnl)->next is rcu protected :
(struct ip_tunnel)->next is rcu protected :
(struct xfrm6_tunnel)->next is rcu protected :

add __rcu annotation and proper rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 13:09:45 -07:00
Eric Dumazet
3cc77ec74e net/802: add __rcu annotations
(struct net_device)->garp_port is rcu protected :
(struct garp_port)->applicants is rcu protected :

add __rcu annotation and proper rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 13:09:44 -07:00
Linus Torvalds
5f05647dd8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits)
  bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
  vlan: Calling vlan_hwaccel_do_receive() is always valid.
  tproxy: use the interface primary IP address as a default value for --on-ip
  tproxy: added IPv6 support to the socket match
  cxgb3: function namespace cleanup
  tproxy: added IPv6 support to the TPROXY target
  tproxy: added IPv6 socket lookup function to nf_tproxy_core
  be2net: Changes to use only priority codes allowed by f/w
  tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
  tproxy: added tproxy sockopt interface in the IPV6 layer
  tproxy: added udp6_lib_lookup function
  tproxy: added const specifiers to udp lookup functions
  tproxy: split off ipv6 defragmentation to a separate module
  l2tp: small cleanup
  nf_nat: restrict ICMP translation for embedded header
  can: mcp251x: fix generation of error frames
  can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set
  can-raw: add msg_flags to distinguish local traffic
  9p: client code cleanup
  rds: make local functions/variables static
  ...

Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and
drivers/net/wireless/ath/ath9k/debug.c as per David
2010-10-23 11:47:02 -07:00
Linus Torvalds
888a6f77e0 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (52 commits)
  sched: fix RCU lockdep splat from task_group()
  rcu: using ACCESS_ONCE() to observe the jiffies_stall/rnp->qsmask value
  sched: suppress RCU lockdep splat in task_fork_fair
  net: suppress RCU lockdep false positive in sock_update_classid
  rcu: move check from rcu_dereference_bh to rcu_read_lock_bh_held
  rcu: Add advice to PROVE_RCU_REPEATEDLY kernel config parameter
  rcu: Add tracing data to support queueing models
  rcu: fix sparse errors in rcutorture.c
  rcu: only one evaluation of arg in rcu_dereference_check() unless sparse
  kernel: Remove undead ifdef CONFIG_DEBUG_LOCK_ALLOC
  rcu: fix _oddness handling of verbose stall warnings
  rcu: performance fixes to TINY_PREEMPT_RCU callback checking
  rcu: upgrade stallwarn.txt documentation for CPU-bound RT processes
  vhost: add __rcu annotations
  rcu: add comment stating that list_empty() applies to RCU-protected lists
  rcu: apply TINY_PREEMPT_RCU read-side speedup to TREE_PREEMPT_RCU
  rcu: combine duplicate code, courtesy of CONFIG_PREEMPT_RCU
  rcu: Upgrade srcu_read_lock() docbook about SRCU grace periods
  rcu: document ways of stalling updates in low-memory situations
  rcu: repair code-duplication FIXMEs
  ...
2010-10-21 12:54:12 -07:00
David S. Miller
9941fb6276 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2010-10-21 08:21:34 -07:00
Patrick McHardy
3b1a1ce6f4 Merge branch 'for-patrick' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6 2010-10-21 16:25:51 +02:00
Balazs Scheidler
3b9afb2991 tproxy: added IPv6 socket lookup function to nf_tproxy_core
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-21 16:12:14 +02:00
Balazs Scheidler
aa976fc011 tproxy: added udp6_lib_lookup function
Just like with IPv4, we need access to the UDP hash table to look up local
sockets, but instead of exporting the global udp_table, export a lookup
function.

Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-21 16:05:41 +02:00
Balazs Scheidler
e97c3e278e tproxy: split off ipv6 defragmentation to a separate module
Like with IPv4, TProxy needs IPv6 defragmentation but does not
require connection tracking. Since defragmentation was coupled
with conntrack, I split off the two, creating an nf_defrag_ipv6 module,
similar to the already existing nf_defrag_ipv4.

Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-21 16:03:43 +02:00
stephen hemminger
32a875adcd 9p: client code cleanup
Make p9_client_version static since only used in one file.
Remove p9_client_auth because it is defined but never used.
Compile tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 04:26:39 -07:00
Balazs Scheidler
093d282321 tproxy: fix hash locking issue when using port redirection in __inet_inherit_port()
When __inet_inherit_port() is called on a tproxy connection the wrong locks are
held for the inet_bind_bucket it is added to. __inet_inherit_port() made an
implicit assumption that the listener's port number (and thus its bind bucket).
Unfortunately, if you're using the TPROXY target to redirect skbs to a
transparent proxy that assumption is not true anymore and things break.

This patch adds code to __inet_inherit_port() so that it can handle this case
by looking up or creating a new bind bucket for the child socket and updates
callers of __inet_inherit_port() to gracefully handle __inet_inherit_port()
failing.

Reported by and original patch from Stephen Buck <stephen.buck@exinda.com>.
See http://marc.info/?t=128169268200001&r=1&w=2 for the original discussion.

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-21 13:06:43 +02:00
Balazs Scheidler
6006db84a9 tproxy: add lookup type checks for UDP in nf_tproxy_get_sock_v4()
Also, inline this function as the lookup_type is always a literal
and inlining removes branches performed at runtime.

Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-21 12:47:34 +02:00
Balazs Scheidler
106e4c26b1 tproxy: kick out TIME_WAIT sockets in case a new connection comes in with the same tuple
Without tproxy redirections an incoming SYN kicks out conflicting
TIME_WAIT sockets, in order to handle clients that reuse ports
within the TIME_WAIT period.

The same mechanism didn't work in case TProxy is involved in finding
the proper socket, as the time_wait processing code looked up the
listening socket assuming that the listener addr/port matches those
of the established connection.

This is not the case with TProxy as the listener addr/port is possibly
changed with the tproxy rule.

Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-21 12:45:14 +02:00
Changli Gao
3511c9132f net_sched: remove the unused parameter of qdisc_create_dflt()
The first parameter dev isn't in use in qdisc_create_dflt().

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 03:09:47 -07:00
stephen hemminger
1c4c40c42d xfrm: make xfrm_bundle_ok local
Only used in one place.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 03:09:46 -07:00
stephen hemminger
8d8a0b1cc2 rtnetlink: remove rtnl_kill_links
The function rtnl_kill_links is defined but never used.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 03:09:45 -07:00
stephen hemminger
6f747aca5e xfrm6: make xfrm6_tunnel_free_spi local
Function only defined and used in one file.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 03:09:45 -07:00
Julian Anastasov
0d79641a96 ipvs: provide address family for debugging
As skb->protocol is not valid in LOCAL_OUT add
parameter for address family in packet debugging functions.
Even if ports are not present in AH and ESP change them to
use ip_vs_tcpudp_debug_packet to show at least valid addresses
as before. This patch removes the last user of skb->protocol
in IPVS.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-10-21 11:04:43 +02:00
Julian Anastasov
fc60476761 ipvs: changes for local real server
This patch deals with local real servers:

- Add support for DNAT to local address (different real server port).
It needs ip_vs_out hook in LOCAL_OUT for both families because
skb->protocol is not set for locally generated packets and can not
be used to set 'af'.

- Skip packets in ip_vs_in marked with skb->ipvs_property because
ip_vs_out processing can be executed in LOCAL_OUT but we still
have the conn_out_get check in ip_vs_in.

- Ignore packets with inet->nodefrag from local stack

- Require skb_dst(skb) != NULL because we use it to get struct net

- Add support for changing the route to local IPv4 stack after DNAT
depending on the source address type. Local client sets output
route and the remote client sets input route. It looks like
IPv6 does not need such rerouting because the replies use
addresses from initial incoming header, not from skb route.

- All transmitters now have strict checks for the destination
address type: redirect from non-local address to local real
server requires NAT method, local address can not be used as
source address when talking to remote real server.

- Now LOCALNODE is not set explicitly as forwarding
method in real server to allow the connections to provide
correct forwarding method to the backup server. Not sure if
this breaks tools that expect to see 'Local' real server type.
If needed, this can be supported with new flag IP_VS_DEST_F_LOCAL.
Now it should be possible connections in backup that lost
their fwmark information during sync to be forwarded properly
to their daddr, even if it is local address in the backup server.
By this way backup could be used as real server for DR or TUN,
for NAT there are some restrictions because tuple collisions
in conntracks can create problems for the traffic.

- Call ip_vs_dst_reset when destination is updated in case
some real server IP type is changed between local and remote.

[ horms@verge.net.au: removed trailing whitespace ]
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-10-21 11:03:46 +02:00
Julian Anastasov
190ecd27cd ipvs: do not schedule conns from real servers
This patch is needed to avoid scheduling of
packets from local real server when we add ip_vs_in
in LOCAL_OUT hook to support local client.

 	Currently, when ip_vs_in can not find existing
connection it tries to create new one by calling ip_vs_schedule.

 	The default indication from ip_vs_schedule was if
connection was scheduled to real server. If real server is
not available we try to use the bypass forwarding method
or to send ICMP error. But in some cases we do not want to use
the bypass feature. So, add flag 'ignored' to indicate if
the scheduler ignores this packet.

 	Make sure we do not create new connections from replies.
We can hit this problem for persistent services and local real
server when ip_vs_in is added to LOCAL_OUT hook to handle
local clients.

 	Also, make sure ip_vs_schedule ignores SYN packets
for Active FTP DATA from local real server. The FTP DATA
connection should be created on SYN+ACK from client to assign
correct connection daddr.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-10-21 10:50:41 +02:00
Julian Anastasov
cf356d69db ipvs: switch to notrack mode
Change skb->ipvs_property semantic. This is preparation
to support ip_vs_out processing in LOCAL_OUT. ipvs_property=1
will be used to avoid expensive lookups for traffic sent by
transmitters. Now when conntrack support is not used we call
ip_vs_notrack method to avoid problems in OUTPUT and
POST_ROUTING hooks instead of exiting POST_ROUTING as before.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-10-21 10:50:20 +02:00
Julian Anastasov
8b27b10f58 ipvs: optimize checksums for apps
Avoid full checksum calculation for apps that can provide
info whether csum was broken after payload mangling. For now only
ip_vs_ftp mangles payload and it updates the csum, so the full
recalculation is avoided for all packets.

 	Add CHECKSUM_UNNECESSARY for snat_handler (TCP and UDP).
It is needed to support SNAT from local address for the case
when csum is fully recalculated.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-10-21 10:50:02 +02:00
David S. Miller
5eeaa2db16 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-10-20 01:59:48 -07:00
Hans Schillstrom
714f095f74 ipvs: IPv6 tunnel mode
IPv6 encapsulation uses a bad source address for the tunnel.
i.e. VIP will be used as local-addr and encap. dst addr.
Decapsulation will not accept this.

Example
LVS (eth1 2003::2:0:1/96, VIP 2003::2:0:100)
   (eth0 2003::1:0:1/96)
RS  (ethX 2003::1:0:5/96)

tcpdump
2003::2:0:100 > 2003::1:0:5: IP6 (hlim 63, next-header TCP (6) payload length: 40)  2003::3:0:10.50991 > 2003::2:0:100.http: Flags [S], cksum 0x7312 (correct), seq 3006460279, win 5760, options [mss 1440,sackOK,TS val 1904932 ecr 0,nop,wscale 3], length 0

In Linux IPv6 impl. you can't have a tunnel with an any cast address
receiving packets (I have not tried to interpret RFC 2473)
To have receive capabilities the tunnel must have:
 - Local address set as multicast addr or an unicast addr
 - Remote address set as an unicast addr.
 - Loop back addres or Link local address are not allowed.

This causes us to setup a tunnel in the Real Server with the
LVS as the remote address, here you can't use the VIP address since it's
used inside the tunnel.

Solution
Use outgoing interface IPv6 address (match against the destination).
i.e. use ip6_route_output() to look up the route cache and
then use ipv6_dev_get_saddr(...) to set the source address of the
encapsulated packet.

Additionally, cache the results in new destination
fields: dst_cookie and dst_saddr and properly check the
returned dst from ip6_route_output. We now add xfrm_lookup
call only for the tunneling method where the source address
is a local one.

Signed-off-by:Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-19 10:38:48 +02:00
Pablo Neira Ayuso
ebbf41df4a netfilter: ctnetlink: add expectation deletion events
This patch allows to listen to events that inform about
expectations destroyed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-19 10:19:06 +02:00
Eric Dumazet
8e602ce298 netns: reorder fields in struct net
In a network bench, I noticed an unfortunate false sharing between
'loopback_dev' and 'count' fields in "struct net".

'count' is written each time a socket is created or destroyed, while
loopback_dev might be often read in routing code.

Move loopback_dev in a read mostly section of "struct net"

Note: struct netns_xfrm is cache line aligned on SMP.
(It contains a "struct dst_ops")
Move it at the end to avoid holes, and reduce sizeof(struct net) by 128
bytes on ia32.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-17 13:49:14 -07:00
stephen hemminger
31e3c3f6f1 tipc: cleanup function namespace
Do some cleanups of TIPC based on make namespacecheck
  1. Don't export unused symbols
  2. Eliminate dead code
  3. Make functions and variables local
  4. Rename buf_acquire to tipc_buf_acquire since it is used in several files

Compile tested only.
This make break out of tree kernel modules that depend on TIPC routines.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-16 11:13:24 -07:00
John W. Linville
c64557d666 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2010-10-15 16:11:56 -04:00
John W. Linville
1a63c353c8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-next-2.6 into for-davem 2010-10-15 16:00:02 -04:00
Kumar Sanghvi
b3d6255388 Phonet: 'connect' socket implementation for Pipe controller
Based on suggestion by Rémi Denis-Courmont to implement 'connect'
for Pipe controller logic,  this patch implements 'connect' socket
call for the Pipe controller logic.
The patch does following:-
- Removes setsockopts for PNPIPE_CREATE and PNPIPE_DESTROY
- Adds setsockopt for setting the Pipe handle value
- Implements connect socket call
- Updates the Pipe controller logic

User-space should now follow below sequence with Pipe controller:-
-socket
-bind
-setsockopt for PNPIPE_PIPE_HANDLE
-connect
-setsockopt for PNPIPE_ENCAP_IP
-setsockopt for PNPIPE_ENABLE

GPRS/3G data has been tested working fine with this.

Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-13 14:40:34 -07:00
Johannes Berg
7be5086d4c mac80211: add probe request filter flag
Using the frame registration notification, we
can see when probe requests are requested and
notify the low-level driver via filtering. The
flag is also set in AP and IBSS modes.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-13 15:45:22 -04:00
Johannes Berg
271733cf84 cfg80211: notify drivers about frame registrations
Drivers may need to adjust their filters according
to frame registrations, so notify them about them.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-13 15:45:22 -04:00
Andrei Emeltchenko
534c92fde7 Bluetooth: clean up rfcomm code
Remove dead code and unused rfcomm thread events

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-10-12 12:44:53 -03:00
Mat Martineau
796c86eec8 Bluetooth: Add common code for stream-oriented recvmsg()
This commit adds a bt_sock_stream_recvmsg() function for use by any
Bluetooth code that uses SOCK_STREAM sockets.  This code is copied
from rfcomm_sock_recvmsg() with minimal modifications to remove
RFCOMM-specific functionality and improve readability.

L2CAP (with the SOCK_STREAM socket type) and RFCOMM have common needs
when it comes to reading data.  Proper stream read semantics require
that applications can read from a stream one byte at a time and not
lose any data.  The RFCOMM code already operated on and pulled data
from the underlying L2CAP socket, so very few changes were required to
make the code more generic for use with non-RFCOMM data over L2CAP.

Applications that need more awareness of L2CAP frame boundaries are
still free to use SOCK_SEQPACKET sockets, and may verify that they
connection did not fall back to basic mode by calling getsockopt().

Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-10-12 12:44:51 -03:00
David Vrabel
8f1e174223 Bluetooth: HCI devices are either BR/EDR or AMP radios
HCI transport drivers may not know what type of radio an AMP device has
so only say whether they're BR/EDR or AMP devices.

Signed-off-by: David Vrabel <david.vrabel@csr.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-10-12 12:44:51 -03:00
Eric Dumazet
e37ef961e5 neigh: reorder struct neighbour fields
Le mardi 12 octobre 2010 à 00:02 +0200, Eric Dumazet a écrit :
> Here is the followup patch.
>
> Thanks !
>

Oops, this was an old version, the up2date ones also took care of "used"
field.

I guess its time for a sleep, sorry again.

[PATCH net-next V2] neigh: reorder struct neighbour fields

(refcnt) and (ha_lock, ha, used, dev, output, ops, primary_key) should
be placed on a separate cache lines.

refcnt can be often written, while other fields are mostly read.

This gave me good result on stress test :

before:

real    0m45.570s
user    0m15.525s
sys     9m56.669s

After:

real    0m41.841s
user    0m15.261s
sys     8m45.949s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-11 16:09:14 -07:00
Eric Dumazet
fc66f95c68 net dst: use a percpu_counter to track entries
struct dst_ops tracks number of allocated dst in an atomic_t field,
subject to high cache line contention in stress workload.

Switch to a percpu_counter, to reduce number of time we need to dirty a
central location. Place it on a separate cache line to avoid dirtying
read only fields.

Stress test :

(Sending 160.000.000 UDP frames,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_TRIE, SLUB/NUMA)

Before:

real    0m51.179s
user    0m15.329s
sys     10m15.942s

After:

real	0m45.570s
user	0m15.525s
sys	9m56.669s

With a small reordering of struct neighbour fields, subject of a
following patch, (to separate refcnt from other read mostly fields)

real	0m41.841s
user	0m15.261s
sys	8m45.949s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-11 13:06:53 -07:00
Eric Dumazet
0ed8ddf404 neigh: Protect neigh->ha[] with a seqlock
Add a seqlock in struct neighbour to protect neigh->ha[], and avoid
dirtying neighbour in stress situation (many different flows / dsts)

Dirtying takes place because of read_lock(&n->lock) and n->used writes.

Switching to a seqlock, and writing n->used only on jiffies changes
permits less dirtying.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-11 12:54:04 -07:00
David S. Miller
d122179a3c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/core/ethtool.c
2010-10-11 12:30:34 -07:00