Commit Graph

11364 Commits

Author SHA1 Message Date
Trond Myklebust
69b6ba3712 SUNRPC: Ensure the server closes sockets in a timely fashion
We want to ensure that connected sockets close down the connection when we
set XPT_CLOSE, so that we don't keep it hanging while cleaning up all the
stuff that is keeping a reference to the socket.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:57 -05:00
Jeff Layton
c9233eb7b0 sunrpc: add sv_maxconn field to svc_serv (try #3)
svc_check_conn_limits() attempts to prevent denial of service attacks
by having the service close old connections once it reaches a
threshold. This threshold is based on the number of threads in the
service:

	(serv->sv_nrthreads + 3) * 20

Once we reach this, we drop the oldest connections and a printk pops
to warn the admin that they should increase the number of threads.

Increasing the number of threads isn't an option however for services
like lockd. We don't want to eliminate this check entirely for such
services but we need some way to increase this limit.

This patch adds a sv_maxconn field to the svc_serv struct. When it's
set to 0, we use the current method to calculate the max number of
connections. RPC services can then set this on an as-needed basis.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:47 -05:00
Frederik Schwarzer
025dfdafe7 trivial: fix then -> than typos in comments and documentation
- (better, more, bigger ...) then -> (...) than

Signed-off-by: Frederik Schwarzer <schwarzerf@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-01-06 11:28:06 +01:00
Linus Torvalds
15b0669072 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (44 commits)
  qlge: Fix sparse warnings for tx ring indexes.
  qlge: Fix sparse warning regarding rx buffer queues.
  qlge: Fix sparse endian warning in ql_hw_csum_setup().
  qlge: Fix sparse endian warning for inbound packet control block flags.
  qlge: Fix sparse warnings for byte swapping in qlge_ethool.c
  myri10ge: print MAC and serial number on probe failure
  pkt_sched: cls_u32: Fix locking in u32_change()
  iucv: fix cpu hotplug
  af_iucv: Free iucv path/socket in path_pending callback
  af_iucv: avoid left over IUCV connections from failing connects
  af_iucv: New error return codes for connect()
  net/ehea: bitops work on unsigned longs
  Revert "net: Fix for initial link state in 2.6.28"
  tcp: Kill extraneous SPLICE_F_NONBLOCK checks.
  tcp: don't mask EOF and socket errors on nonblocking splice receive
  dccp: Integrate the TFRC library with DCCP
  dccp: Clean up ccid.c after integration of CCID plugins
  dccp: Lockless integration of CCID congestion-control plugins
  qeth: get rid of extra argument after printk to dev_* conversion
  qeth: No large send using EDDP for HiperSockets.
  ...
2009-01-05 18:44:59 -08:00
Linus Torvalds
520c853466 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  inotify: fix type errors in interfaces
  fix breakage in reiserfs_new_inode()
  fix the treatment of jfs special inodes
  vfs: remove duplicate code in get_fs_type()
  add a vfs_fsync helper
  sys_execve and sys_uselib do not call into fsnotify
  zero i_uid/i_gid on inode allocation
  inode->i_op is never NULL
  ntfs: don't NULL i_op
  isofs check for NULL ->i_op in root directory is dead code
  affs: do not zero ->i_op
  kill suid bit only for regular files
  vfs: lseek(fd, 0, SEEK_CUR) race condition
2009-01-05 18:32:06 -08:00
Jarek Poplawski
6f57321422 pkt_sched: cls_u32: Fix locking in u32_change()
New nodes are inserted in u32_change() under rtnl_lock() with wmb(),
so without tcf_tree_lock() like in other classifiers (e.g. cls_fw).
This isn't enough without rmb() on the read side, but on the other
hand adding such barriers doesn't give any savings, so the lock is
added instead.

Reported-by: m0sia <m0sia@plotinka.ru>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-05 18:14:19 -08:00
Heiko Carstens
f1d3e4dca3 iucv: fix cpu hotplug
If the iucv module is compiled in/loaded but no user is registered cpu
hot remove doesn't work. Reason for that is that the iucv cpu hotplug
notifier on CPU_DOWN_PREPARE checks if the iucv_buffer_cpumask would
be empty after the corresponding bit would be cleared. However the bit
was never set since iucv wasn't enable. That causes all cpu hot unplug
operations to fail in this scenario.
To fix this use iucv_path_table as an indicator wether iucv is enabled
or not.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-05 18:09:02 -08:00
Hendrik Brueckner
65dbd7c277 af_iucv: Free iucv path/socket in path_pending callback
Free iucv path after iucv_path_sever() calls in iucv_callback_connreq()
(path_pending() iucv callback).
If iucv_path_accept() fails, free path and free/kill newly created socket.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-05 18:08:23 -08:00
Ursula Braun
18becbc547 af_iucv: avoid left over IUCV connections from failing connects
For certain types of AFIUCV socket connect failures IUCV connections
are left over. Add some cleanup-statements to avoid cluttered IUCV
connections.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-05 18:07:46 -08:00
Hendrik Brueckner
55cdea9ed9 af_iucv: New error return codes for connect()
If the iucv_path_connect() call fails then return an error code that
corresponds to the iucv_path_connect() failure condition; instead of
returning -ECONNREFUSED for any failure.

This helps to improve error handling for user space applications
(e.g.  inform the user that the z/VM guest is not authorized to
connect to other guest virtual machines).

The error return codes are based on those described in connect(2).

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-05 18:07:07 -08:00
David S. Miller
c276e098d3 Revert "net: Fix for initial link state in 2.6.28"
This reverts commit 22604c8668.

We can't fix this issue in this way, because we now can try
to take the dev_base_lock rwlock as a writer in software interrupt
context and that is not allowed without major surgery elsewhere.

This initial link state problem needs to be solved in some other
way.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-05 16:01:51 -08:00
Al Viro
56ff5efad9 zero i_uid/i_gid on inode allocation
... and don't bother in callers.  Don't bother with zeroing i_blocks,
while we are at it - it's already been zeroed.

i_mode is not worth the effort; it has no common default value.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05 11:54:28 -05:00
David S. Miller
7945cc6464 tcp: Kill extraneous SPLICE_F_NONBLOCK checks.
In splice TCP receive, the SPLICE_F_NONBLOCK flag is used
to compute the "timeo" value.  So checking it again inside
of the main receive loop to trigger -EAGAIN processing is
entirely unnecessary.

Noticed by Jarek P. and Lennert Buytenhek.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-05 00:59:00 -08:00
Lennert Buytenhek
4f7d54f59b tcp: don't mask EOF and socket errors on nonblocking splice receive
Currently, setting SPLICE_F_NONBLOCK on splice from a TCP socket
results in masking of EOF (RDHUP) and error conditions on the socket
by an -EAGAIN return.  Move the NONBLOCK check in tcp_splice_read()
to be after the EOF and error checks to fix this.

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-05 00:00:12 -08:00
Gerrit Renker
129fa44785 dccp: Integrate the TFRC library with DCCP
This patch integrates the TFRC library, which is a dependency of CCID-3 (and
CCID-4), with the new use of CCIDs in the DCCP module.		

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 21:45:33 -08:00
Gerrit Renker
e5fd56ca4e dccp: Clean up ccid.c after integration of CCID plugins
This patch cleans up after integrating the CCID modules and, in addition,

 * moves the if/else cases from ccid_delete() into ccid_hc_{tx,rx}_delete();
 * removes the 'gfp' argument to ccid_new() - since it is always gfp_any().

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 21:43:23 -08:00
Gerrit Renker
ddebc973c5 dccp: Lockless integration of CCID congestion-control plugins
Based on Arnaldo's earlier patch, this patch integrates the standardised
CCID congestion control plugins (CCID-2 and CCID-3) of DCCP with dccp.ko:

 * enables a faster connection path by eliminating the need to always go 
   through the CCID registration lock;

 * updates the implementation to use only a single array whose size equals
   the number of configured CCIDs instead of the maximum (256);

 * since the CCIDs are now fixed array elements, synchronization is no
   longer needed, simplifying use and implementation.

CCID-2 is suggested as minimum for a basic DCCP implementation (RFC 4340, 10);
CCID-3 is a standards-track CCID supported by RFC 4342 and RFC 5348.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 21:42:53 -08:00
Oliver Hartkopp
6e5c172cf7 can: update can-bcm for hrtimer hardirq callbacks
Since commit ca109491f6 ("hrtimer:
removing all ur callback modes") the hrtimer callbacks are processed
only in hardirq context.

This patch moves some functionality into tasklets to run in softirq
context.

Additionally some duplicated code was removed in bcm_rx_thr_flush()
and an avoidable memcpy was removed from bcm_rx_handler().

Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 17:31:18 -08:00
Roel Kluin
858eb711ba DCB: fix kfree(skb)
Use kfree_skb instead of kfree for struct sk_buff pointers.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 17:29:21 -08:00
Ilpo Järvinen
914d11647b ipv6: IPV6_PKTINFO relied userspace providing correct length
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 17:27:31 -08:00
Michael Marineau
22604c8668 net: Fix for initial link state in 2.6.28
From: Michael Marineau <mike@marineau.org>

Commit b47300168e "Do not fire linkwatch
events until the device is registered." was made as a workaround for
drivers that call netif_carrier_off before registering the device.
Unfortunately this causes these drivers to incorrectly report their
link status as IF_OPER_UNKNOWN which can falsely set the IFF_RUNNING
flag when the interface is first brought up. This issues was
previously pointed out[1] but was dismissed saying that IFF_RUNNING is
not related to the link status. From my digging IFF_RUNNING, as
reported to userspace, is based on the link state. It is set based on
__LINK_STATE_START and IF_OPER_UP or IF_OPER_UNKNOWN. See [2], [3],
and [4]. (Whether or not the kernel has IFF_RUNNING set in flags is
not reported to user space so it may well be independent of the link,
I don't know if and when it may get set.)

The end result depends slightly depending on the driver. The the two I
tested were e1000e and b44. With e1000e if the system is booted
without a network cable attached the interface will falsely report
RUNNING when it is brought up causing NetworkManager to attempt to
start it and eventually time out. With b44 when the system is booted
with a network cable attached and brought up with dhcpcd it will time
out the first time.

The attached patch that will still set the operstate variable
correctly to IF_OPER_UP/DOWN/etc when linkwatch_fire_event is called
but then return rather than skipping the linkwatch_fire_event call
entirely as the previous fix did. (sorry it isn't inline, I don't have
a patch friendly email client at the moment)

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 17:18:51 -08:00
Simon Holm Thøgersen
f32f8b72e0 net/rfkill/rfkill.c: fix unused rfkill_led_trigger() warning
commit 4dec9b807b ("rfkill: strip pointless
notifier chain") removed the only user of rfkill_led_trigger() that was not
guarded by #ifdef CONFIG_RFKILL_LEDS. Therefore, move rfkill_led_trigger()
completely inside #ifdef CONFIG_RFKILL_LEDS and avoid the compile time
warning:

net/rfkill/rfkill.c:59: warning: 'rfkill_led_trigger' defined but not used

Signed-off-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 17:11:24 -08:00
Herbert Xu
5d38a079ce gro: Add page frag support
This patch allows GRO to merge page frags (skb_shinfo(skb)->frags)
in one skb, rather than using the less efficient frag_list.

It also adds a new interface, napi_gro_frags to allow drivers
to inject page frags directly into the stack without allocating
an skb.  This is intended to be the GRO equivalent for LRO's
lro_receive_frags interface.

The existing GSO interface can already handle page frags with
or without an appended frag_list so nothing needs to be changed
there.

The merging itself is rather simple.  We store any new frag entries
after the last existing entry, without checking whether the first
new entry can be merged with the last existing entry.  Making this
check would actually be easy but since no existing driver can
produce contiguous frags anyway it would just be mental masturbation.

If the total number of entries would exceed the capacity of a
single skb, we simply resort to using frag_list as we do now.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:13:40 -08:00
Herbert Xu
b530256d2e gro: Use gso_size to store MSS
In order to allow GRO packets without frag_list at all, we need to
store the MSS in the packet itself.  The obvious place is gso_size.
The only thing to watch out for is if the packet ends up not being
GRO then we need to clear gso_size before pushing the packet into
the stack.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:13:19 -08:00
David S. Miller
14deae4156 ipv6: Fix sporadic sendmsg -EINVAL when sending to multicast groups.
Thanks to excellent diagnosis by Eduard Guzovsky.

The core problem is that on a network with lots of active
multicast traffic, the neighbour cache can fill up.  If
we try to allocate a new route and thus neighbour cache
entry, the bog-standard GC attempt the neighbour layer does
in ineffective because route entries hold a reference
to the existing neighbour entries and GC can only liberate
entries with no references.

IPV4 already has a way to handle this, by doing a route cache
GC in such situations (when neigh attach returns -ENOBUFS).

So simply mimick this on the ipv6 side.

Tested-by: Eduard Guzovsky <eguzovsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:04:39 -08:00
Al Viro
157cf649a7 sanitize audit_fd_pair()
* no allocations
* return void

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:41 -05:00
Al Viro
f3298dc4f2 sanitize audit_socketcall
* don't bother with allocations
* now that it can't fail, make it return void

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:39 -05:00
Alan Cox
ad36b88e2d tty: Fix an ircomm warning and note another bug
Roel Kluin noted that line is unsigned so one test is unneccessary. Also
add a warning for another flaw I noticed while making this change.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:43 -08:00
Kentaro Takeda
be6d3e56a6 introduce new LSM hooks where vfsmount is available.
Add new LSM hooks for path-based checks.  Call them on directory-modifying
operations at the points where we still know the vfsmount involved.

Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31 18:07:37 -05:00
Paul Moore
6c2e8ac095 netlabel: Update kernel configuration API
Update the NetLabel kernel API to expose the new features added in kernel
releases 2.6.25 and 2.6.28: the static/fallback label functionality and network
address based selectors.

Signed-off-by: Paul Moore <paul.moore@hp.com>
2008-12-31 12:54:11 -05:00
Linus Torvalds
f57fa1d6a6 Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (70 commits)
  fs/nfs/nfs4proc.c: make nfs4_map_errors() static
  rpc: add service field to new upcall
  rpc: add target field to new upcall
  nfsd: support callbacks with gss flavors
  rpc: allow gss callbacks to client
  rpc: pass target name down to rpc level on callbacks
  nfsd: pass client principal name in rsc downcall
  rpc: implement new upcall
  rpc: store pointer to pipe inode in gss upcall message
  rpc: use count of pipe openers to wait for first open
  rpc: track number of users of the gss upcall pipe
  rpc: call release_pipe only on last close
  rpc: add an rpc_pipe_open method
  rpc: minor gss_alloc_msg cleanup
  rpc: factor out warning code from gss_pipe_destroy_msg
  rpc: remove unnecessary assignment
  NFS: remove unused status from encode routines
  NFS: increment number of operations in each encode routine
  NFS: fix comment placement in nfs4xdr.c
  NFS: fix tabs in nfs4xdr.c
  ...
2008-12-30 17:45:45 -08:00
Trond Myklebust
08cc36cbd1 Merge branch 'devel' into next 2008-12-30 16:51:43 -05:00
Herbert Xu
eb4dea5853 net: Fix percpu counters deadlock
When we converted the protocol atomic counters such as the orphan
count and the total socket count deadlocks were introduced due to
the mismatch in BH status of the spots that used the percpu counter
operations.

Based on the diagnosis and patch by Peter Zijlstra, this patch
fixes these issues by disabling BH where we may be in process
context.

Reported-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 23:04:08 -08:00
Rusty Russell
0f23174aa8 cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits: net
In future all cpumask ops will only be valid (in general) for bit
numbers < nr_cpu_ids.  So use that instead of NR_CPUS in iterators
and other comparisons.

This is always safe: no cpu number can be >= nr_cpu_ids, and
nr_cpu_ids is initialized to NR_CPUS at boot.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 22:44:47 -08:00
Li Zefan
68ce9c0e34 cls_cgroup: clean up Kconfig
cls_cgroup can't be compiled as a module, since it's not supported by
cgroup.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 19:40:46 -08:00
Li Zefan
8e8ba85417 cls_cgroup: clean up for cgroup part
- It's better to use container_of() instead of casting cgroup_subsys_state *
  to cgroup_cls_state *.
- Add helper function task_cls_state().
- Rename net_cls_state() to cgrp_cls_state().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 19:40:45 -08:00
Li Zefan
2f068bf871 cls_cgroup: fix an oops when removing a cgroup
When removing a cgroup, an oops was triggered immediately. The cause
is wrong kfree() in cgrp_destroy().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 19:40:44 -08:00
Simon Horman
68888d1053 IPVS: Make "no destination available" message more consistent between schedulers
Acked-by: Graeme Fowler <graeme@graemef.net>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 18:37:36 -08:00
Eric W. Biederman
8eb7986396 netns: foreach_netdev_safe is insufficient in default_device_exit
During network namespace teardown we either move or delete
all of the network devices associated with a network namespace.
In the case of veth devices deleting one will also delete it's
pair device.  If both devices are in the same network namespace
then for_each_netdev_safe is insufficient as next may point
to the second veth device we have deleted.

To avoid problems I do what we do in __rtnl_kill_links and
restart the scan of the device list, after we have deleted
a device.

Currently dev_change_netnamespace does not appear to suffer from
this problem, but wireless devices are also paired and likely
should be moved between network namespaces together.  So I have
errored on the side of caution and restart the scan of the network
devices in that case as well.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 18:21:48 -08:00
Rusty Russell
91b208c7c1 net: make xfrm_statistics_seq_show use generic snmp_fold_field
No reason to roll our own here.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 18:20:06 -08:00
Linus Torvalds
0191b625ca Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
  net: Allow dependancies of FDDI & Tokenring to be modular.
  igb: Fix build warning when DCA is disabled.
  net: Fix warning fallout from recent NAPI interface changes.
  gro: Fix potential use after free
  sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
  sfc: When disabling the NIC, close the device rather than unregistering it
  sfc: SFT9001: Add cable diagnostics
  sfc: Add support for multiple PHY self-tests
  sfc: Merge top-level functions for self-tests
  sfc: Clean up PHY mode management in loopback self-test
  sfc: Fix unreliable link detection in some loopback modes
  sfc: Generate unique names for per-NIC workqueues
  802.3ad: use standard ethhdr instead of ad_header
  802.3ad: generalize out mac address initializer
  802.3ad: initialize ports LACPDU from const initializer
  802.3ad: remove typedef around ad_system
  802.3ad: turn ports is_individual into a bool
  802.3ad: turn ports is_enabled into a bool
  802.3ad: make ntt bool
  ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
  ...

Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
2008-12-28 12:49:40 -08:00
Linus Torvalds
1db2a5c11e Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: (85 commits)
  [S390] provide documentation for hvc_iucv kernel parameter.
  [S390] convert ctcm printks to dev_xxx and pr_xxx macros.
  [S390] convert zfcp printks to pr_xxx macros.
  [S390] convert vmlogrdr printks to pr_xxx macros.
  [S390] convert zfcp dumper printks to pr_xxx macros.
  [S390] convert cpu related printks to pr_xxx macros.
  [S390] convert qeth printks to dev_xxx and pr_xxx macros.
  [S390] convert sclp printks to pr_xxx macros.
  [S390] convert iucv printks to dev_xxx and pr_xxx macros.
  [S390] convert ap_bus printks to pr_xxx macros.
  [S390] convert dcssblk and extmem printks messages to pr_xxx macros.
  [S390] convert monwriter printks to pr_xxx macros.
  [S390] convert s390 debug feature printks to pr_xxx macros.
  [S390] convert monreader printks to pr_xxx macros.
  [S390] convert appldata printks to pr_xxx macros.
  [S390] convert setup printks to pr_xxx macros.
  [S390] convert hypfs printks to pr_xxx macros.
  [S390] convert time printks to pr_xxx macros.
  [S390] convert cpacf printks to pr_xxx macros.
  [S390] convert cio printks to pr_xxx macros.
  ...
2008-12-28 12:33:21 -08:00
Herbert Xu
0da2afd596 gro: Fix potential use after free
The initial skb may have been freed after napi_gro_complete in
napi_gro_receive if it was merged into an existing packet.  Thus
we cannot check same_flow (which indicates whether it was merged)
after calling napi_gro_complete.

This patch fixes this by saving the same_flow status before the
call to napi_gro_complete.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-26 14:57:42 -08:00
Peter P Waskiewicz Jr
d7b06636be net: Init NAPI dev_list on napi_del
The recent GRO patches introduced the NAPI removal of devices in
free_netdev.  For drivers that can change the number of queues during
driver operation, the NAPI infrastructure doesn't allow the freeing and
re-addition of NAPI entities without reloading the driver.

This change reinitializes the dev_list in each NAPI struct on delete,
instead of just deleting it (and assigning the list pointers to POISON).
Drivers that wish to remove/re-add NAPI will need to re-initialize the
netdev napi_list after removing all NAPI instances, before re-adding NAPI
devices again.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-26 01:35:35 -08:00
Herbert Xu
f2712fd0b4 ipsec: Remove useless ret variable
This patch removes a useless ret variable from the IPv4 ESP/UDP
decapsulation code.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-26 01:31:18 -08:00
Julia Lawall
88a44e51e9 net/appletalk: Remove redundant test
atif is tested for being NULL twice, with the same effect in each case.  I
have kept the second test, as it seems to fit well with the comment above it.

A simplified version of the semantic patch that makes this change is as
follows: (http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@r exists@
local idexpression x;
expression E;
position p1,p2;
@@

if (x@p1 == NULL || ...) { ... when forall
   return ...; }
... when != \(x=E\|x--\|x++\|--x\|++x\|x-=E\|x+=E\|x|=E\|x&=E\|&x\)
(
x@p2 == NULL
|
x@p2 != NULL
)

// another path to the test that is not through p1?
@s exists@
local idexpression r.x;
position r.p1,r.p2;
@@

... when != x@p1
(
x@p2 == NULL
|
x@p2 != NULL
)

@fix depends on !s@
position r.p1,r.p2;
expression x,E;
statement S1,S2;
@@

(
- if ((x@p2 != NULL) || ...)
  S1
|
- if ((x@p2 == NULL) && ...) S1
|
- BUG_ON(x@p2 == NULL);
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-25 18:04:51 -08:00
Herbert Xu
64ff3b938e tcp: Always set urgent pointer if it's beyond snd_nxt
Our TCP stack does not set the urgent flag if the urgent pointer
does not fit in 16 bits, i.e., if it is more than 64K from the
sequence number of a packet.

This behaviour is different from the BSDs, and clearly contradicts
the purpose of urgent mode, which is to send the notification
(though not necessarily the associated data) as soon as possible.
Our current behaviour may in fact delay the urgent notification
indefinitely if the receiver window does not open up.

Simply matching BSD however may break legacy applications which
incorrectly rely on the out-of-band delivery of urgent data, and
conversely the in-band delivery of non-urgent data.

Alexey Kuznetsov suggested a safe solution of following BSD only
if the urgent pointer itself has not yet been transmitted.  This
way we guarantee that when the remote end sees the packet with
non-urgent data marked as urgent due to wrap-around we would have
advanced the urgent pointer beyond, either to the actual urgent
data or to an as-yet untransmitted packet.

The only potential downside is that applications on the remote
end may see multiple SIGURG notifications.  However, this would
occur anyway with other TCP stacks.  More importantly, the outcome
of such a duplicate notification is likely to be harmless since
the signal itself does not carry any information other than the
fact that we're in urgent mode.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-25 17:12:58 -08:00
Wei Yongjun
8510b937ae sctp: Add validity check for SCTP_PARTIAL_DELIVERY_POINT socket option
The latest ietf socket extensions API draft said:

  8.1.21.  Set or Get the SCTP Partial Delivery Point

    Note also that the call will fail if the user attempts to set
    this value larger than the socket receive buffer size.

This patch add this validity check for SCTP_PARTIAL_DELIVERY_POINT
socket option.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-25 16:59:03 -08:00
Wei Yongjun
9fcb95a105 sctp: Avoid memory overflow while FWD-TSN chunk is received with bad stream ID
If FWD-TSN chunk is received with bad stream ID, the sctp will not do the
validity check, this may cause memory overflow when overwrite the TSN of
the stream ID.

The FORWARD-TSN chunk is like this:

FORWARD-TSN chunk
  Type                       = 192
  Flags                      = 0
  Length                     = 172
  NewTSN                     = 99
  Stream                     = 10000
  StreamSequence             = 0xFFFF

This patch fix this problem by discard the chunk if stream ID is not
less than MIS.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-25 16:58:11 -08:00
Wei Yongjun
aea3c5c05d sctp: Implement socket option SCTP_GET_ASSOC_NUMBER
Implement socket option SCTP_GET_ASSOC_NUMBER of the latest ietf socket
extensions API draft.

  8.2.5.  Get the Current Number of Associations (SCTP_GET_ASSOC_NUMBER)

   This option gets the current number of associations that are attached
   to a one-to-many style socket.  The option value is an uint32_t.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-25 16:57:24 -08:00
Wei Yongjun
ea686a2653 sctp: Fix a typo in socket.c
Just fix a typo in socket.c.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-25 16:56:45 -08:00
Wei Yongjun
e89c209581 sctp: Bring SCTP_MAXSEG socket option into ietf API extension compliance
Brings maxseg socket option set/get into line with the latest ietf socket
extensions API draft, while maintaining backwards compatibility.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-25 16:54:58 -08:00
Eric Dumazet
f7d1b9f5aa vlan: fix convertion to net_device_ops
commit 656299f706
(vlan: convert to net_device_ops) added a net_device_ops
with a NULL ndo_start_xmit field.

This gives a crash in dev_hard_start_xmit()

Fix it using two net_device_ops structures, one for hwaccel vlan,
one for non hwaccel vlan.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-25 16:45:19 -08:00
Alexey Dobriyan
7091e728c5 netns: igmp: make /proc/net/{igmp,mcfilter} per netns
This patch makes the followinf proc entries per-netns:
/proc/net/igmp
/proc/net/mcfilter

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-25 16:42:51 -08:00
Alexey Dobriyan
b4ee07df3d netns: igmp: allow IPPROTO_IGMP sockets in netns
Looks like everything is already ready.

Required for ebtables(8) for one thing.

Also, required for ipmr per-netns (coming soon). (Benjamin)

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-25 16:42:23 -08:00
Ursula Braun
8f7c502c26 [S390] convert iucv printks to dev_xxx and pr_xxx macros.
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-12-25 13:39:24 +01:00
Hendrik Brueckner
91d5d45ee0 [S390] iucv: Locking free version of iucv_message_(receive|send)
Provide a locking free version of iucv_message_receive and iucv_message_send
that do not call local_bh_enable in a spin_lock_(bh|irqsave)() context.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
2008-12-25 13:39:04 +01:00
James Morris
cbacc2c7f0 Merge branch 'next' into for-linus 2008-12-25 11:40:09 +11:00
David S. Miller
6332178d91 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/ppp_generic.c
2008-12-23 17:56:23 -08:00
Olga Kornievskaia
2efef7080f rpc: add service field to new upcall
This patch extends the new upcall with a "service" field that currently
can have 2 values: "*" or "nfs". These values specify matching rules for
principals in the keytab file. The "*" means that gssd is allowed to use
"root", "nfs", or "host" keytab entries while the other option requires
"nfs".

Restricting gssd to use the "nfs" principal is needed for when the
server performs a callback to the client.  The server in this case has
to authenticate itself as an "nfs" principal.

We also need "service" field to distiguish between two client-side cases
both currently using a uid of 0: the case of regular file access by the
root user, and the case of state-management calls (such as setclientid)
which should use a keytab for authentication.  (And the upcall should
fail if an appropriate principal can't be found.)

Signed-off: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:19:56 -05:00
Olga Kornievskaia
8b1c7bf5b6 rpc: add target field to new upcall
This patch extends the new upcall by adding a "target" field
communicating who we want to authenticate to (equivalently, the service
principal that we want to acquire a ticket for).

Signed-off: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:19:26 -05:00
Olga Kornievskaia
61054b14d5 nfsd: support callbacks with gss flavors
This patch adds server-side support for callbacks other than AUTH_SYS.

Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:19:00 -05:00
Olga Kornievskaia
945b34a772 rpc: allow gss callbacks to client
This patch adds client-side support to allow for callbacks other than
AUTH_SYS.

Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:18:34 -05:00
Olga Kornievskaia
608207e888 rpc: pass target name down to rpc level on callbacks
The rpc client needs to know the principal that the setclientid was done
as, so it can tell gssd who to authenticate to.

Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:17:40 -05:00
Olga Kornievskaia
68e76ad0ba nfsd: pass client principal name in rsc downcall
Two principals are involved in krb5 authentication: the target, who we
authenticate *to* (normally the name of the server, like
nfs/server.citi.umich.edu@CITI.UMICH.EDU), and the source, we we
authenticate *as* (normally a user, like bfields@UMICH.EDU)

In the case of NFSv4 callbacks, the target of the callback should be the
source of the client's setclientid call, and the source should be the
nfs server's own principal.

Therefore we allow svcgssd to pass down the name of the principal that
just authenticated, so that on setclientid we can store that principal
name with the new client, to be used later on callbacks.

Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:17:15 -05:00
\"J. Bruce Fields\
34769fc488 rpc: implement new upcall
Implement the new upcall.  We decide which version of the upcall gssd
will use (new or old), by creating both pipes (the new one named "gssd",
the old one named after the mechanism (e.g., "krb5")), and then waiting
to see which version gssd actually opens.

We don't permit pipes of the two different types to be opened at once.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:16:37 -05:00
\"J. Bruce Fields\
5b7ddd4a7b rpc: store pointer to pipe inode in gss upcall message
Keep a pointer to the inode that the message is queued on in the struct
gss_upcall_msg.  This will be convenient, especially after we have a
choice of two pipes that an upcall could be queued on.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:15:44 -05:00
\"J. Bruce Fields\
79a3f20b64 rpc: use count of pipe openers to wait for first open
Introduce a global variable pipe_version which will eventually be used
to keep track of which version of the upcall gssd is using.

For now, though, it only keeps track of whether any pipe is open or not;
it is negative if not, zero if one is opened.  We use this to wait for
the first gssd to open a pipe.

(Minor digression: note this waits only for the very first open of any
pipe, not for the first open of a pipe for a given auth; thus we still
need the RPC_PIPE_WAIT_FOR_OPEN behavior to wait for gssd to open new
pipes that pop up on subsequent mounts.)

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:10:52 -05:00
\"J. Bruce Fields\
cf81939d6f rpc: track number of users of the gss upcall pipe
Keep a count of the number of pipes open plus the number of messages on
a pipe.  This count isn't used yet.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:10:19 -05:00
\"J. Bruce Fields\
e712804ae4 rpc: call release_pipe only on last close
I can't see any reason we need to call this until either the kernel or
the last gssd closes the pipe.

Also, this allows to guarantee that open_pipe and release_pipe are
called strictly in pairs; open_pipe on gssd's first open, release_pipe
on gssd's last close (or on the close of the kernel side of the pipe, if
that comes first).

That will make it very easy for the gss code to keep track of which
pipes gssd is using.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:09:47 -05:00
\"J. Bruce Fields\
c381060869 rpc: add an rpc_pipe_open method
We want to transition to a new gssd upcall which is text-based and more
easily extensible.

To simplify upgrades, as well as testing and debugging, it will help if
we can upgrade gssd (to a version which understands the new upcall)
without having to choose at boot (or module-load) time whether we want
the new or the old upcall.

We will do this by providing two different pipes: one named, as
currently, after the mechanism (normally "krb5"), and supporting the
old upcall.  One named "gssd" and supporting the new upcall version.

We allow gssd to indicate which version it supports by its choice of
which pipe to open.

As we have no interest in supporting *simultaneous* use of both
versions, we'll forbid opening both pipes at the same time.

So, add a new pipe_open callback to the rpc_pipefs api, which the gss
code can use to track which pipes have been open, and to refuse opens of
incompatible pipes.

We only need this to be called on the first open of a given pipe.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:08:32 -05:00
\"J. Bruce Fields\
db75b3d6b5 rpc: minor gss_alloc_msg cleanup
I want to add a little more code here, so it'll be convenient to have
this flatter.

Also, I'll want to add another error condition, so it'll be more
convenient to return -ENOMEM than NULL in the error case.  The only
caller is already converting NULL to -ENOMEM anyway.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:07:13 -05:00
\"J. Bruce Fields\
b03568c322 rpc: factor out warning code from gss_pipe_destroy_msg
We'll want to call this from elsewhere soon.  And this is a bit nicer
anyway.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:06:55 -05:00
\"J. Bruce Fields\
99db356368 rpc: remove unnecessary assignment
We're just about to kfree() gss_auth, so there's no point to setting any
of its fields.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:06:33 -05:00
Jeff Layton
6dcd3926b2 sunrpc: fix code that makes auth_gss send destroy_cred message (try #2)
There's a bit of a chicken and egg problem when it comes to destroying
auth_gss credentials. When we destroy the last instance of a GSSAPI RPC
credential, we should send a NULL RPC call with a GSS procedure of
RPCSEC_GSS_DESTROY to hint to the server that it can destroy those
creds.

This isn't happening because we're setting clearing the uptodate bit on
the credentials and then setting the operations to the gss_nullops. When
we go to do the RPC call, we try to refresh the creds. That fails with
-EACCES and the call fails.

Fix this by not clearing the UPTODATE bit for the credentials and adding
a new crdestroy op for gss_nullops that just tears down the cred without
trying to destroy the context.

The only difference between this patch and the first one is the removal
of some minor formatting deltas.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:57 -05:00
Peter Staubach
64672d55d9 optimize attribute timeouts for "noac" and "actimeo=0"
Hi.

I've been looking at a bugzilla which describes a problem where
a customer was advised to use either the "noac" or "actimeo=0"
mount options to solve a consistency problem that they were
seeing in the file attributes.  It turned out that this solution
did not work reliably for them because sometimes, the local
attribute cache was believed to be valid and not timed out.
(With an attribute cache timeout of 0, the cache should always
appear to be timed out.)

In looking at this situation, it appears to me that the problem
is that the attribute cache timeout code has an off-by-one
error in it.  It is assuming that the cache is valid in the
region, [read_cache_jiffies, read_cache_jiffies + attrtimeo].  The
cache should be considered valid only in the region,
[read_cache_jiffies, read_cache_jiffies + attrtimeo).  With this
change, the options, "noac" and "actimeo=0", work as originally
expected.

This problem was previously addressed by special casing the
attrtimeo == 0 case.  However, since the problem is only an off-
by-one error, the cleaner solution is address the off-by-one
error and thus, not require the special case.

    Thanx...

        ps

Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:56 -05:00
Trond Myklebust
7bd8826915 SUNRPC: rpcsec_gss modules should not be used by out-of-tree code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:32 -05:00
Trond Myklebust
468039ee46 SUNRPC: Convert the xdr helpers and rpc_pipefs to EXPORT_SYMBOL_GPL
We've never considered the sunrpc code as part of any ABI to be used by
out-of-tree modules.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:31 -05:00
Trond Myklebust
88a9fe8cae SUNRPC: Remove the last remnant of the BKL...
Somehow, this escaped the previous purge. There should be no need to keep
any extra locks in the XDR callbacks.

The NFS client XDR code only writes into private objects, whereas all reads
of shared objects are confined to fields that do not change, such as
filehandles...

Ditto for lockd, the NFSv2/v3 client mount code, and rpcbind.

The nfsd XDR code may require the BKL, but since it does a synchronous RPC
call from a thread that already holds the lock, that issue is moot.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:31 -05:00
Jarek Poplawski
05a8c1cbfe pkt_sched: Remove smp_wmb() in qdisc_watchdog()
While implementing a TCQ_F_THROTTLED flag there was used an smp_wmb()
in qdisc_watchdog(), but since this flag is practically used only in
sch_netem(), and since it's not even clear what reordering is avoided
here (TCQ_F_THROTTLED vs. __QDISC_STATE_SCHED?) it seems the barrier
could be safely removed.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-22 19:44:13 -08:00
Jarek Poplawski
5f2f6da76c net: Fix oops in dev_ifsioc()
A command like this: "brctl addif br1 eth1" issued as a user gave me
an oops when bridge module wasn't loaded. It's caused by using a dev
pointer before checking for NULL.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-22 19:35:28 -08:00
Jarek Poplawski
7f3ff4f63f pkt_sched: Annotate uninitialized var in sfq_enqueue()
Some gcc versions warn that ret may be used uninitialized in
sfq_enqueue(). It's a false positive, so let's annotate this.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-21 20:14:48 -08:00
Don Skidmore
f4314e815e net: add DCNA attribute to the BCN interface for DCB
Adds the Backward Congestion Notification Address (BCNA) attribute to the
Backward Congestion Notification (BCN) interface for Data Center Bridging
(DCB), which was missing.  Receive the BCNA attribute in the ixgbe driver.
The BCNA attribute is for a switch to inform the endstation about the physical
port identification in order to support BCN on aggregated links.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Eric W Multanen <eric.w.multanen@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2008-12-21 20:10:29 -08:00
Don Skidmore
1486a61ebc net: fix DCB setstate to return success/failure
Data Center Bridging (DCB) had no way to know if setstate had failed in the
driver.  This patch enables dcb netlink code to handle the status for the DCB
setstate interface.  Likewise it allows the driver to return a failed status
if MSI-X isn't enabled.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Eric W Multanen <eric.w.multanen@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-21 20:09:50 -08:00
Kalle Valo
520eb82076 mac80211: implement dynamic power save
This patch implements dynamic power save for mac80211. Basically it
means enabling power save mode after an idle period. Implementing it
dynamically gives a good compromise of low power consumption and low
latency. Some hardware have support for this in firmware, but some
require the host to do it.

The dynamic power save is implemented by adding an timeout to
ieee80211_subif_start_xmit(). The timeout can be enabled from userspace
with Wireless Extensions. For example, the command below enables the
dynamic power save and sets the time timeout to 500 ms:

iwconfig wlan0 power timeout 500m

Power save now only works with devices which handle power save in firmware.
It's also disabled by default and the heuristics when and how to enable is
considered as a policy decision and will be left for the userspace to handle.
In case the firmware has support for this, drivers can disable this feature
with IEEE80211_HW_NO_STACK_DYNAMIC_PS.

Big thanks to Johannes Berg for the help with the design and code.

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-19 15:24:00 -05:00
Kalle Valo
ce7c9111a9 mac80211: track master queue status
This is a preparation for the dynamic power save support. In future there are
two paths to stop the master queues and we need to track this properly to
avoid starting queues incorrectly. Implement this by adding a status
array for each queue.

The original idea and design is from Johannes Berg, I just did
the implementation based on his notes. All the bugs are mine, of course.

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-19 15:23:59 -05:00
Kalle Valo
e0cb686ff8 mac80211: enable IEEE80211_CONF_PS only when associated
Also disable power save when disassociated. It makes no sense to have
power save enabled while disassociated.

iwlwifi seems to have this check in the driver, but it's better to do this
in mac80211 instead.

Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-19 15:23:57 -05:00
Larry Finger
5e3f308997 mac80211: Print unknown packet type in tasklet_handler
In stress testing p54usb, the WARN_ON() in ieee80211_tasklet_handler() was
triggered; however, there is no logging of the received value for packet
type. Adding that feature will improve the warning.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-19 15:23:48 -05:00
Rami Rosen
135541215c mac80211: fix a typo in ieee80211_send_assoc() method.
This patch fixes a typo in ieee80211_send_assoc(), net/mac80211/mlme.c.

The error is usage of a wrong member when building
the ie80211 management frame (it should be assoc_req, and not reassoc_req).

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-19 15:23:30 -05:00
Jouni Malinen
8d6f658e21 mac80211: Remove radiotap rate-present flag for HT
Since we do not currently report HT rates (MCS index) in radiotap
header for HT rates, we should not claim the rate is present. The rate
octet itself is used as padding in this case, so only the it_present
flag needs to be removed in case of HT rates.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-19 15:23:22 -05:00
Jouni Malinen
b8d476c8cb mac80211: Send Layer 2 Update frame on reassociation
When a STA roams back to the same AP before the previous STA entry has
expired, a new STA entry is not added in mac80211. However, a Layer 2
Update frame still needs to be transmitted to update layer 2 devices
about the new location for the STA. Without this, switches may
continue to forward frames to the previous (now incorrect) port when
STA roams between APs.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-19 15:23:08 -05:00
Jouni Malinen
0fb8ca45eb mac80211: Add HT rates into RX status reporting
This patch adds option for HT-enabled drivers to report HT rates
(HT20/HT40, short GI, MCS index) to mac80211. These rates are
currently not in the rate table, so the rate_idx is used to indicate
MCS index.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-19 15:23:04 -05:00
Sujith
094d05dc32 mac80211: Fix HT channel selection
HT management is done differently for AP and STA modes, unify
to just the ->config() callback since HT is fundamentally a
PHY property and cannot be per-BSS.

Rename enum nl80211_sec_chan_offset as nl80211_channel_type to denote
the channel type ( NO_HT, HT20, HT40+, HT40- ).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-19 15:22:54 -05:00
Henning Rogge
420e7fabd9 nl80211: Add signal strength and bandwith to nl80211station info
This patch adds signal strength and transmission bitrate
to the station_info of nl80211.

Signed-off-by: Henning Rogge <rogge@fgan.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-19 15:04:54 -05:00
Matt Mackall
6086ebca13 tcp: Stop scaring users with "treason uncloaked!"
The original message was unhelpful and extremely alarming to our poor
users, despite its charm. Make it less frightening.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-18 22:27:42 -08:00
David S. Miller
3de77cf23e Revert "xfrm: Accept ESP packets regardless of UDP encapsulation mode"
This reverts commit e061b165c7.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-18 22:27:37 -08:00
Wei Yongjun
1b08534e56 net: Fix module refcount leak in kernel_accept()
The kernel_accept() does not hold the module refcount of newsock->ops->owner,
so we need __module_get(newsock->ops->owner) code after call kernel_accept()
by hand.
In sunrpc, the module refcount is missing to hold. So this cause kernel panic.

Used following script to reproduct:

while [ 1 ];
do
    mount -t nfs4 192.168.0.19:/ /mnt
    touch /mnt/file
    umount /mnt
    lsmod | grep ipv6
done

This patch fixed the problem by add __module_get(newsock->ops->owner) to
kernel_accept(). So we do not need to used __module_get(newsock->ops->owner)
in every place when used kernel_accept().

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-18 19:35:10 -08:00
David S. Miller
49ad9599d4 Revert "net: release skb->dst in sock_queue_rcv_skb()"
This reverts commit 7035560287.

As pointed out by Mark McLoughlin IP_PKTINFO cmsg data is one
post-queueing user, so this optimization is not valid right
now.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-17 22:11:38 -08:00
Arnaldo Carvalho de Melo
a693722aec dccp_diag: LISTEN sockets don't have CCIDs
And thus when we try to use 'ss -danemi' on these sockets that have no
ccid blocks (data collected using systemtap after I fixed the problem):

dccp_diag_get_info sk=0xffff8801220a3100, dp->dccps_hc_rx_ccid=0x0000000000000000, dp->dccps_hc_tx_ccid=0x0000000000000000

We get an OOPS:

mica.ghostprotocols.net login: BUG: unable to handle kernel NULL pointer
dereferenc0
IP: [<ffffffffa0136082>] dccp_diag_get_info+0x82/0xc0 [dccp_diag]
PGD 12106f067 PUD 122488067 PMD 0
Oops: 0000 [#1] PREEMPT

Fix is trivial, and 'ss -d' is working again:

[root@mica ~]# ss -danemi
State   Recv-Q Send-Q   Local Address:Port   Peer Address:Port 
LISTEN  0      0                    *:5001              *:*
ino:7288 sk:220a3100ffff8801
	 mem:(r0,w0,f0,t0) cwnd:0 ssthresh:0
[root@mica ~]# 

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-17 16:08:01 -08:00
Rémi Denis-Courmont
893873f396 Phonet: get rid of deferred work on the transmission path
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-17 15:48:50 -08:00
Rémi Denis-Courmont
be677730a0 Phonet: use atomic for packet TX window
GPRS TX flow control won't need to lock the underlying socket anymore.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-17 15:48:31 -08:00
Rémi Denis-Courmont
57c81fffc8 Phonet: allocate separate ARP type for GPRS over a Phonet pipe
A separate xmit lock class supports GPRS over a Phonet pipe over a TUN
device (type ARPHRD_NONE).

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-17 15:47:48 -08:00
Rémi Denis-Courmont
2d91d78b68 Phonet: allocate a non-Ethernet ARP type
Also leave some room for more 802.11 types.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-17 15:47:29 -08:00
Yang Hongyang
9f690db7ff ipv6: fix the outgoing interface selection order in udpv6_sendmsg()
1.When no interface is specified in an IPV6_PKTINFO ancillary data
  item, the interface specified in an IPV6_PKTINFO sticky optionis 
  is used.

RFC3542:
6.7.  Summary of Outgoing Interface Selection

   This document and [RFC-3493] specify various methods that affect the
   selection of the packet's outgoing interface.  This subsection
   summarizes the ordering among those in order to ensure deterministic
   behavior.

   For a given outgoing packet on a given socket, the outgoing interface
   is determined in the following order:

   1. if an interface is specified in an IPV6_PKTINFO ancillary data
      item, the interface is used.

   2. otherwise, if an interface is specified in an IPV6_PKTINFO sticky
      option, the interface is used.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-16 02:08:29 -08:00
Yang Hongyang
f250dcdac1 ipv6: fix the return interface index when get it while no message is received
When get receiving interface index while no message is received,
the the value seted with setsockopt() should be returned.

RFC 3542:
   Issuing getsockopt() for the above options will return the sticky
   option value i.e., the value set with setsockopt().  If no sticky
   option value has been set getsockopt() will return the following
   values:

   -  For the IPV6_PKTINFO option, it will return an in6_pktinfo
      structure with ipi6_addr being in6addr_any and ipi6_ifindex being
      zero.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-16 02:07:45 -08:00
Yang Hongyang
b24a2516d1 ipv6: Add IPV6_PKTINFO sticky option support to setsockopt()
There are three reasons for me to add this support:
1.When no interface is specified in an IPV6_PKTINFO ancillary data
  item, the interface specified in an IPV6_PKTINFO sticky optionis 
  is used.

RFC3542:
6.7.  Summary of Outgoing Interface Selection

   This document and [RFC-3493] specify various methods that affect the
   selection of the packet's outgoing interface.  This subsection
   summarizes the ordering among those in order to ensure deterministic
   behavior.

   For a given outgoing packet on a given socket, the outgoing interface
   is determined in the following order:

   1. if an interface is specified in an IPV6_PKTINFO ancillary data
      item, the interface is used.

   2. otherwise, if an interface is specified in an IPV6_PKTINFO sticky
      option, the interface is used.

2.When no IPV6_PKTINFO ancillary data is received,getsockopt() should 
  return the sticky option value which set with setsockopt().

RFC 3542:
   Issuing getsockopt() for the above options will return the sticky
   option value i.e., the value set with setsockopt().  If no sticky
   option value has been set getsockopt() will return the following
   values:

3.Make the setsockopt implementation POSIX compliant.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-16 02:06:23 -08:00
Rémi Denis-Courmont
09a2c3c0d3 Phonet: improve GPRS variable names
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-16 01:18:31 -08:00
Ilpo Järvinen
b1879204dd ipmr: merge common code
Also removes redundant skb->len < x check which can't
be true once pskb_may_pull(skb, x) succeeded.

$ diff-funcs pim_rcv ipmr.c ipmr.c pim_rcv_v1
  --- ipmr.c:pim_rcv()
  +++ ipmr.c:pim_rcv_v1()
@@ -1,22 +1,27 @@
-static int pim_rcv(struct sk_buff * skb)
+int pim_rcv_v1(struct sk_buff * skb)
 {
-	struct pimreghdr *pim;
+	struct igmphdr *pim;
 	struct iphdr   *encap;
 	struct net_device  *reg_dev = NULL;

 	if (!pskb_may_pull(skb, sizeof(*pim) + sizeof(*encap)))
 		goto drop;

-	pim = (struct pimreghdr *)skb_transport_header(skb);
-	if (pim->type != ((PIM_VERSION<<4)|(PIM_REGISTER)) ||
-	    (pim->flags&PIM_NULL_REGISTER) ||
-	    (ip_compute_csum((void *)pim, sizeof(*pim)) != 0 &&
-	     csum_fold(skb_checksum(skb, 0, skb->len, 0))))
+	pim = igmp_hdr(skb);
+
+	if (!mroute_do_pim ||
+	    skb->len < sizeof(*pim) + sizeof(*encap) ||
+	    pim->group != PIM_V1_VERSION || pim->code != PIM_V1_REGISTER)
 		goto drop;

-	/* check if the inner packet is destined to mcast group */
 	encap = (struct iphdr *)(skb_transport_header(skb) +
-				 sizeof(struct pimreghdr));
+				 sizeof(struct igmphdr));
+	/*
+	   Check that:
+	   a. packet is really destinted to a multicast group
+	   b. packet is not a NULL-REGISTER
+	   c. packet is not truncated
+	 */
 	if (!ipv4_is_multicast(encap->daddr) ||
 	    encap->tot_len == 0 ||
 	    ntohs(encap->tot_len) + sizeof(*pim) > skb->len)
@@ -40,9 +45,9 @@
 	skb->ip_summed = 0;
 	skb->pkt_type = PACKET_HOST;
 	dst_release(skb->dst);
+	skb->dst = NULL;
 	reg_dev->stats.rx_bytes += skb->len;
 	reg_dev->stats.rx_packets++;
-	skb->dst = NULL;
 	nf_reset(skb);
 	netif_rx(skb);
 	dev_put(reg_dev);

$ codiff net/ipv4/ipmr.o.old net/ipv4/ipmr.o.new

net/ipv4/ipmr.c:
  pim_rcv_v1 | -283
  pim_rcv    | -284
 2 functions changed, 567 bytes removed

net/ipv4/ipmr.c:
  __pim_rcv | +307
 1 function changed, 307 bytes added

net/ipv4/ipmr.o.new:
 3 functions changed, 307 bytes added, 567 bytes removed, diff: -260

(Tested on x86_64).

It seems that pimlen arg could be left out as well and
eq-sizedness of structs trapped with BUILD_BUG_ON but
I don't think that's more than a cosmetic flaw since there
aren't that many args anyway.

Compile tested.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-16 01:15:11 -08:00
Herbert Xu
b240a0e564 ethtool: Add GGRO and SGRO ops
This patch adds the ethtool ops to enable and disable GRO.  It also
makes GRO depend on RX checksum offload much the same as how TSO
depends on SG support.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 23:44:31 -08:00
Herbert Xu
bf296b125b tcp: Add GRO support
This patch adds the TCP-specific portion of GRO.  The criterion for
merging is extremely strict (the TCP header must match exactly apart
from the checksum) so as to allow refragmentation.  Otherwise this
is pretty much identical to LRO, except that we support the merging
of ECN packets.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 23:43:36 -08:00
Herbert Xu
71d93b39e5 net: Add skb_gro_receive
This patch adds the helper skb_gro_receive to merge packets for
GRO.  The current method is to allocate a new header skb and then
chain the original packets to its frag_list.  This is done to
make it easier to integrate into the existing GSO framework.

In future as GSO is moved into the drivers, we can undo this and
simply chain the original packets together.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 23:42:33 -08:00
Herbert Xu
73cc19f155 ipv4: Add GRO infrastructure
This patch adds GRO support for IPv4.

The criteria for merging is more stringent than LRO, in particular,
we require all fields in the IP header to be identical except for
the length, ID and checksum.  In addition, the ID must form an
arithmetic sequence with a difference of one.

The ID requirement might seem overly strict, however, most hardware
TSO solutions already obey this rule.  Linux itself also obeys this
whether GSO is in use or not.

In future we could relax this rule by storing the IDs (or rather
making sure that we don't drop them when pulling the aggregate
skb's tail).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 23:41:09 -08:00
Herbert Xu
d565b0a1a9 net: Add Generic Receive Offload infrastructure
This patch adds the top-level GRO (Generic Receive Offload) infrastructure.
This is pretty similar to LRO except that this is protocol-independent.
Instead of holding packets in an lro_mgr structure, they're now held in
napi_struct.

For drivers that intend to use this, they can set the NETIF_F_GRO bit and
call napi_gro_receive instead of netif_receive_skb or just call netif_rx.
The latter will call napi_receive_skb automatically.  When napi_gro_receive
is used, the driver must either call napi_complete/napi_rx_complete, or
call napi_gro_flush in softirq context if the driver uses the primitives
__napi_complete/__napi_rx_complete.

Protocols will set the gro_receive and gro_complete function pointers in
order to participate in this scheme.

In addition to the packet, gro_receive will get a list of currently held
packets.  Each packet in the list has a same_flow field which is non-zero
if it is a potential match for the new packet.  For each packet that may
match, they also have a flush field which is non-zero if the held packet
must not be merged with the new packet.

Once gro_receive has determined that the new skb matches a held packet,
the held packet may be processed immediately if the new skb cannot be
merged with it.  In this case gro_receive should return the pointer to
the existing skb in gro_list.  Otherwise the new skb should be merged into
the existing packet and NULL should be returned, unless the new skb makes
it impossible for any further merges to be made (e.g., FIN packet) where
the merged skb should be returned.

Whenever the skb is merged into an existing entry, the gro_receive
function should set NAPI_GRO_CB(skb)->same_flow.  Note that if an skb
merely matches an existing entry but can't be merged with it, then
this shouldn't be set.

If gro_receive finds it pointless to hold the new skb for future merging,
it should set NAPI_GRO_CB(skb)->flush.

Held packets will be flushed by napi_gro_flush which is called by
napi_complete and napi_rx_complete.

Currently held packets are stored in a singly liked list just like LRO.
The list is limited to a maximum of 8 entries.  In future, this may be
expanded to use a hash table to allow more flows to be held for merging.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 23:38:52 -08:00
Herbert Xu
1a881f27c5 net: Add frag_list support to GSO
This patch allows GSO to handle frag_list in a limited way for the
purposes of allowing packets merged by GRO to be refragmented on
output.

Most hardware won't (and aren't expected to) support handling GRO
frag_list packets directly.  Therefore we will perform GSO in
software for those cases.

However, for drivers that can support it (such as virtual NICs) we
may not have to segment the packets at all.

Whether the added overhead of GRO/GSO is worthwhile for bridges
and routers when weighed against the benefit of potentially
increasing the MTU within the host is still an open question.
However, for the case of host nodes this is undoubtedly a win.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 23:27:47 -08:00
Herbert Xu
89319d3801 net: Add frag_list support to skb_segment
This patch adds limited support for handling frag_list packets in
skb_segment.  The intention is to support GRO (Generic Receive Offload)
packets which will be constructed by chaining normal packets using
frag_list.

As such we require all frag_list members terminate on exact MSS
boundaries.  This is checked using BUG_ON.

As there should only be one producer in the kernel of such packets,
namely GRO, this requirement should not be difficult to maintain.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 23:26:06 -08:00
David S. Miller
eb14f01959 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/e1000e/ich8lan.c
2008-12-15 20:03:50 -08:00
Linus Torvalds
7004405cb8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  Phonet: keep TX queue disabled when the device is off
  SCHED: netem: Correct documentation comment in code.
  netfilter: update rwlock initialization for nat_table
  netlabel: Compiler warning and NULL pointer dereference fix
  e1000e: fix double release of mutex
  IA64: HP_SIMETH needs to depend upon NET
  netpoll: fix race on poll_list resulting in garbage entry
  ipv6: silence log messages for locally generated multicast
  sungem: improve ethtool output with internal pcs and serdes
  tcp: tcp_vegas cong avoid fix 
  sungem: Make PCS PHY support partially work again.
2008-12-15 16:30:22 -08:00
Don Skidmore
8b124a8e14 net: fix dcbnl_setnumtcs operation check
dcbml_setnumtcs wasn't checking for the presence of the setnumtcs
function.  Instead, it was checking for setstate which was a bug.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Eric W Multanen <eric.w.multanen@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 01:06:23 -08:00
Rémi Denis-Courmont
4798a2b84e Phonet: keep TX queue disabled when the device is off
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 00:53:57 -08:00
Jesper Dangaard Brouer
eb9b851b98 SCHED: netem: Correct documentation comment in code.
The netem simulator is no longer limited by Linux timer resolution HZ.
Not since Patrick McHardy changed the QoS system to use hrtimer.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 00:39:17 -08:00
Steven Rostedt
be70ed189b netfilter: update rwlock initialization for nat_table
The commit e099a17357
(netfilter: netns nat: per-netns NAT table) renamed the
nat_table from __nat_table to nat_table without updating the
__RW_LOCK_UNLOCKED(__nat_table.lock).

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 00:19:14 -08:00
Ilpo Järvinen
b1721d2bb9 rpc/rdma: goto instead of copypaste
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-14 23:19:48 -08:00
Ilpo Järvinen
79f55f11a0 nf/dccp: merge errorpaths
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-14 23:19:02 -08:00
Ilpo Järvinen
e780f1c33d irda: merge exit paths
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-14 23:18:30 -08:00
Ilpo Järvinen
037322abe6 bt/rfcomm/tty: join error paths
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-14 23:18:00 -08:00
Ilpo Järvinen
0eae1b98cf ax25: join the return paths that free skb
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-14 23:17:26 -08:00
Ilpo Järvinen
ebad5c0984 can: merge error paths
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-14 23:16:58 -08:00
Ilpo Järvinen
d8eb93078c xfrm: join error paths
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-14 23:16:22 -08:00
Ilpo Järvinen
8da73b73ef ip6mr: use goto to common label instead of opencoding
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-14 23:15:49 -08:00
Ilpo Järvinen
448eb71f40 ipv6/mcast: join error paths using goto
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-14 23:15:21 -08:00
Ilpo Järvinen
5ce1bbb97b xfrm6_tunnel: join error paths using goto
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-14 23:13:48 -08:00
Ilpo Järvinen
857a6e0a4d icsk: join error paths using goto
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-14 23:13:08 -08:00
Rami Rosen
ab1f5c0bb8 mac80211: misc cleanups
This patch removes unneeded member (skbuff) from
ieee80211_ibss_add_sta() method in its declaration (in ieee80211_i.h)
and its callers (in rx.c and mlme.c)

This patch removes unneeded member from struct ieee80211_rx_data
in ieee80211_i.h.

(Originally posted as two patches. -- JWL)

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-12 14:45:27 -05:00
Johannes Berg
4dec9b807b rfkill: strip pointless notifier chain
No users, so no reason to have it.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-12 14:45:25 -05:00
Jouni Malinen
b7a530d82c mac80211: Disable requests for new scans in AP mode
AP mode operations are seriously affected if mac80211 runs through a
multi-second scan while the AP is trying to send Beacon frames on the
operation channel. While this could be implemented in a way that does
not cause too many problems, it is not very simple and will require
synchronization with Beacon frame scheduling in the drivers (scan one
channel at a time between Beacon frames). Furthermore, such scanning
takes quite a bit longer time and existing userspace applications
would be likely to timeout while waiting for the results.

For now, just refuse requests for new scans (SIOCSIWSCAN) when in AP
mode. In practice, this moves the rejection from iwl* drivers into
mac80211 to make it apply to every mac80211-based driver.

This issue shows up in associated stations getting disconnected when
something (e.g., Network Manager) requests a scan while the interface
is in AP mode. When doing this continuously (e.g., NM does it every 120
seconds), the network gets close to useless.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-12 14:02:12 -05:00
Christian Lamparter
89fad578a6 mac80211: integrate sta_notify_ps cmds into sta_notify
This patch replaces the newly introduced sta_notify_ps function,
which can be used to notify the driver about every power state
transition for all associated stations, by integrating its functionality
back into the original sta_notify callback.

Signed-off-by: Christian Lamparter <chunkeey@web.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-12 14:01:42 -05:00
Johannes Berg
b143923689 mac80211/cfg80211: check endianness in sparse runs
Make sure sparse checks endianness when run on mac80211/cfg80211.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-12 14:01:35 -05:00
Johannes Berg
f546638c3f mac80211: remove fragmentation offload functionality
There's no driver that actually does fragmentation on the
device, and the callback is buggy (when it returns an error,
mac80211's fragmentation status is changed so reading the
frag threshold from userspace reads the new value despite
the error). Let's just remove it, if we really find some
hardware supporting it we can add it back later.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-12 14:01:33 -05:00
Johannes Berg
8dffff216f mac80211: only create default STA interface if supported
Drivers will support this, obviously, but this forces them to
set it up properly.

(This includes the fix posted as "mac80211: fix ifmodes check" and
tested in wireless-testing by Hin-Tak and others. -- JWL)

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Tested-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-12 13:59:43 -05:00
Johannes Berg
306d6112f9 cfg80211: fix nl80211 frequency handling
Fix two small bugs with HT frequency setting:
 * HT is accepted even when the driver is incapable
 * HT40 is accepted when the driver cannot do 40 MHz
 (both on the selected band)

Also simplify the code a little.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-12 13:48:25 -05:00
Reinette Chatre
447107fb32 mac80211: remove WARN_ON() from ieee80211_hw_config
ieee80211_hw_config can return an error when the hardware
has rfkill enabled. A WARN_ON() is too harsh for this
failure as it is a valid scenario. Only comment this warning
as we would like to have it back when rfkill is integrated into
mac80211.

Also reintroduce propagation of error if ieee80211_hw_config fails
in ieee80211_config_beacon.

This patch partially reverts patch:
5f0387fc3337ca26f0745f945f550f0c3734960f
"mac80211: clean up ieee80211_hw_config errors"

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-12 13:48:20 -05:00
Paul Moore
ec8f2375d7 netlabel: Compiler warning and NULL pointer dereference fix
Fix the two compiler warnings show below.  Thanks to Geert Uytterhoeven for
finding and reporting the problem.

 net/netlabel/netlabel_unlabeled.c:567: warning: 'entry' may be used
   uninitialized in this function
 net/netlabel/netlabel_unlabeled.c:629: warning: 'entry' may be used
   uninitialized in this function

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-11 21:31:50 -08:00
Eric Leblond
293a4f2833 netfilter: xt_NFLOG is dependant of nfnetlink_log
The patch "don't call nf_log_packet in NFLOG module" make xt_NFLOG
dependant of nfnetlink_log. This patch forces the dependencies to fix
compilation in case only xt_NFLOG compilation was asked and modifies the
help message accordingly to the change.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-10 17:24:33 -08:00
Benjamin Thery
8229efdaef netns: ip6mr: enable namespace support in ipv6 multicast forwarding code
This last patch makes the appropriate changes to use and propagate the
network namespace where needed in IPv6 multicast forwarding code.

This consists mainly in replacing all the remaining init_net occurences
with current netns pointer retrieved from sockets, net devices or 
mfc6_caches depending on the routines' contexts.

Some routines receive a new 'struct net' parameter to propagate the current
netns:
* ip6mr_get_route
* ip6mr_cache_report
* ip6mr_cache_find
* ip6mr_cache_unresolved
* mif6_add/mif6_delete
* ip6mr_mfc_add/ip6mr_mfc_delete
* ip6mr_reg_vif

All the IPv6 multicast forwarding variables moved to struct netns_ipv6 by
the previous patches are now referenced in the correct namespace.

Changelog:
==========
* Take into account the net associated to mfc6_cache when matching entries in
  mfc_unres_queue list.
* Call mroute_clean_tables() in ip6mr_net_exit() to free memory allocated
  per-namespace.
* Call dev_net_set() in ip6mr_reg_vif() to initialize dev->nd_net 
  correctly.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-10 16:30:15 -08:00
Benjamin Thery
8b90fc7e5b netns: ip6mr: declare ip6mr /proc/net entries per-namespace
Declare IPv6 multicast forwarding /proc/net entries per-namespace:
/proc/net/ip6_mr_vif
/proc/net/ip6_mr_cache

Changelog
=========
V2:
* In routine ipmr_mfc_seq_idx(), only match entries belonging to current
  netns in mfc_unres_queue list.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-10 16:29:48 -08:00
Benjamin Thery
950d5704e5 netns: ip6mr: declare reg_vif_num per-namespace
Preliminary work to make IPv6 multicast forwarding netns-aware.

Declare variable 'reg_vif_num' per-namespace, moves into struct netns_ipv6.

At the moment, this variable is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-10 16:29:24 -08:00
Benjamin Thery
a21f3f997c netns: ip6mr: declare mroute_do_assert and mroute_do_pim per-namespace
Preliminary work to make IPv6 multicast forwarding netns-aware.

Declare IPv6 multicast forwarding variables 'mroute_do_assert' and
'mroute_do_pim' per-namespace in struct netns_ipv6.

At the moment, these variables are only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-10 16:28:44 -08:00
Benjamin Thery
4045e57c19 netns: ip6mr: declare counter cache_resolve_queue_len per-namespace
Preliminary work to make IPv6 multicast forwarding netns-aware.

Declare variable cache_resolve_queue_len per-namespace: moves it into
struct netns_ipv6.

This variable counts the number of unresolved cache entries queued in the
list mfc_unres_queue. This list is kept global to all netns as the number
of entries per namespace is limited to 10 (hardcoded in routine 
ip6mr_cache_unresolved).
Entries belonging to different namespaces in mfc_unres_queue will be
identified by matching the mfc_net member introduced previously in 
struct mfc6_cache.

Keeping this list global to all netns, also allows us to keep a single
timer (ipmr_expire_timer) to handle their expiration.
In some places cache_resolve_queue_len value was tested for arming 
or deleting the timer. These tests were equivalent to testing 
mfc_unres_queue value instead and are replaced in this patch.

At the moment, cache_resolve_queue_len is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-10 16:27:21 -08:00
Benjamin Thery
4a6258a0e3 netns: ip6mr: dynamically allocate mfc6_cache_array
Preliminary work to make IPv6 multicast forwarding netns-aware.

Dynamically allocates IPv6 multicast forwarding cache, mfc6_cache_array,
and moves it to struct netns_ipv6. 

At the moment, mfc6_cache_array is only referenced in init_net.

Replace 'ARRAY_SIZE(mfc6_cache_array)' with mfc6_cache_array size: MFC6_LINES.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-10 16:24:07 -08:00
Benjamin Thery
58701ad411 netns: ip6mr: store netns in struct mfc6_cache
This patch stores into struct mfc6_cache the network namespace each
mfc6_cache belongs to. The new member is mfc6_net.

mfc6_net is assigned at cache allocation and doesn't change during
the rest of the cache entry life.

This will help to retrieve the current netns around the IPv6 multicast
forwarding code.

At the moment, all mfc6_cache are allocated in init_net.

Changelog:
==========
* Use write_pnet()/read_pnet() to set and get mfc6_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-10 16:22:34 -08:00