Commit Graph

74 Commits

Author SHA1 Message Date
Masanari Iida
e793c0f70e net: treewide: Fix typo found in DocBook/networking.xml
This patch fix spelling typo found in DocBook/networking.xml.
It is because the neworking.xml is generated from comments
in the source, I have to fix typo in comments within the source.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05 17:35:28 -07:00
Tom Herbert
46fb51eb96 net: Fix save software checksum complete
Geert reported issues regarding checksum complete and UDP.
The logic introduced in commit 7e3cead517
("net: Save software checksum complete") is not correct.

This patch:
1) Restores code in __skb_checksum_complete_header except for setting
   CHECKSUM_UNNECESSARY. This function may be calculating checksum on
   something less than skb->len.
2) Adds saving checksum to __skb_checksum_complete. The full packet
   checksum 0..skb->len is calculated without adding in pseudo header.
   This value is saved in skb->csum and then the pseudo header is added
   to that to derive the checksum for validation.
3) In both __skb_checksum_complete_header and __skb_checksum_complete,
   set skb->csum_valid to whether checksum of zero was computed. This
   allows skb_csum_unnecessary to return true without changing to
   CHECKSUM_UNNECESSARY which was done previously.
4) Copy new csum related bits in __copy_skb_header.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15 01:00:49 -07:00
Tom Herbert
7e3cead517 net: Save software checksum complete
In skb_checksum complete, if we need to compute the checksum for the
packet (via skb_checksum) save the result as CHECKSUM_COMPLETE.
Subsequent checksum verification can use this.

Also, added csum_complete_sw flag to distinguish between software and
hardware generated checksum complete, we should always be able to trust
the software computation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:46:13 -07:00
Zhi Yong Wu
c4e819d16c net, datagram: fix the incorrect comment in zerocopy_sg_from_iovec()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 00:19:04 -04:00
Jason Wang
3d9953a2ef net: use skb_copy_datagram_from_iovec() in zerocopy_sg_from_iovec()
Use skb_copy_datagram_from_iovec() to avoid code duplication and make it easy to
be read. Also we can do the skipping inside the zero-copy loop.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:52:38 -07:00
Jason Wang
0433547aa7 net: use release_pages() in zerocopy_sg_from_iovec()
To reduce the duplicated codes.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:52:38 -07:00
Jason Wang
234a426708 net: remove the useless comment in zerocopy_sg_from_iovec()
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:52:37 -07:00
Jason Wang
0a57ec62df net: use skb_fill_page_desc() in zerocopy_sg_from_iovec()
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:52:35 -07:00
Jason Wang
c3bdeb5c7c net: move zerocopy_sg_from_iovec() to net/core/datagram.c
To let it be reused and reduce code duplication. Also document this function.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:52:34 -07:00
Eliezer Tamir
076bb0c82a net: rename include/net/ll_poll.h to include/net/busy_poll.h
Rename the file and correct all the places where it is included.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-10 17:08:27 -07:00
Eliezer Tamir
cbf55001b2 net: rename low latency sockets functions to busy poll
Rename functions in include/net/ll_poll.h to busy wait.
Clarify documentation about expected power use increase.
Rename POLL_LL to POLL_BUSY_LOOP.
Add need_resched() testing to poll/select busy loops.

Note, that in select and poll can_busy_poll is dynamic and is
updated continuously to reflect the existence of supported
sockets with valid queue information.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-08 19:25:45 -07:00
Eliezer Tamir
a5b50476f7 udp: add low latency socket poll support
Add upport for busy-polling on UDP sockets.
In __udp[46]_lib_rcv add a call to sk_mark_ll() to copy the napi_id
from the skb into the sk.
This is done at the earliest possible moment, right after we identify
which socket this skb is for.
In __skb_recv_datagram When there is no data and the user
tries to read we busy poll.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 21:22:36 -07:00
David S. Miller
58717686cf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
	drivers/net/ethernet/emulex/benet/be.h
	include/net/tcp.h
	net/mac802154/mac802154.h

Most conflicts were minor overlapping stuff.

The be2net driver brought in some fixes that added __vlan_put_tag
calls, which in net-next take an additional argument.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-30 03:55:20 -04:00
Benjamin Poirier
39cc86130b unix/dgram: fix peeking with an offset larger than data in queue
Currently, peeking on a unix datagram socket with an offset larger than len of
the data in the sk receive queue returns immediately with bogus data. That's
because *off is not reset between each skb_queue_walk().

This patch fixes this so that the behavior is the same as peeking with no
offset on an empty queue: the caller blocks.

Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-30 00:43:54 -04:00
Benjamin Poirier
add05ad4e9 unix/dgram: peek beyond 0-sized skbs
"77c1090 net: fix infinite loop in __skb_recv_datagram()" (v3.8) introduced a
regression:
After that commit, recv can no longer peek beyond a 0-sized skb in the queue.
__skb_recv_datagram() instead stops at the first skb with len == 0 and results
in the system call failing with -EFAULT via skb_copy_datagram_iovec().

When peeking at an offset with 0-sized skb(s), each one of those is received
only once, in sequence. The offset starts moving forward again after receiving
datagrams with len > 0.

Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-30 00:43:54 -04:00
Jacob Keller
8facd5fb73 net: fix smatch warnings inside datagram_poll
Commit 7d4c04fc17 ("net: add option to enable
error queue packets waking select") has an issue due to operator precedence
causing the bit-wise OR to bind to the sock_flags call instead of the result of
the terniary conditional. This fixes the *_poll functions to work properly. The
old code results in "mask |= POLLPRI" instead of what was intended, which is to
only include POLLPRI when the socket option is enabled.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02 16:59:16 -04:00
Keller, Jacob E
7d4c04fc17 net: add option to enable error queue packets waking select
Currently, when a socket receives something on the error queue it only wakes up
the socket on select if it is in the "read" list, that is the socket has
something to read. It is useful also to wake the socket if it is in the error
list, which would enable software to wait on error queue packets without waking
up for regular data on the socket. The main use case is for receiving
timestamped transmit packets which return the timestamp to the socket via the
error queue. This enables an application to select on the socket for the error
queue only instead of for the regular traffic.

-v2-
* Added the SO_SELECT_ERR_QUEUE socket option to every architechture specific file
* Modified every socket poll function that checks error queue

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jeffrey Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Matthew Vick <matthew.vick@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-31 19:44:20 -04:00
Eric Dumazet
77c1090f94 net: fix infinite loop in __skb_recv_datagram()
Tommi was fuzzing with trinity and reported the following problem :

commit 3f518bf745 (datagram: Add offset argument to __skb_recv_datagram)
missed that a raw socket receive queue can contain skbs with no payload.

We can loop in __skb_recv_datagram() with MSG_PEEK mode, because
wait_for_packet() is not prepared to skip these skbs.

[   83.541011] INFO: rcu_sched detected stalls on CPUs/tasks: {}
(detected by 0, t=26002 jiffies, g=27673, c=27672, q=75)
[   83.541011] INFO: Stall ended before state dump start
[  108.067010] BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child31:2847]
...
[  108.067010] Call Trace:
[  108.067010]  [<ffffffff818cc103>] __skb_recv_datagram+0x1a3/0x3b0
[  108.067010]  [<ffffffff818cc33d>] skb_recv_datagram+0x2d/0x30
[  108.067010]  [<ffffffff819ed43d>] rawv6_recvmsg+0xad/0x240
[  108.067010]  [<ffffffff818c4b04>] sock_common_recvmsg+0x34/0x50
[  108.067010]  [<ffffffff818bc8ec>] sock_recvmsg+0xbc/0xf0
[  108.067010]  [<ffffffff818bf31e>] sys_recvfrom+0xde/0x150
[  108.067010]  [<ffffffff81ca4329>] system_call_fastpath+0x16/0x1b

Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-12 16:07:19 -05:00
Eric Dumazet
22911fc581 net: skb_free_datagram_locked() doesnt drop all packets
dropwatch wrongly diagnose all received UDP packets as drops.

This patch removes trace_kfree_skb() done in skb_free_datagram_locked().

Locations calling skb_free_datagram_locked() should do it on their own.

As a result, drops are accounted on the right function.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-27 15:40:57 -07:00
Eric Dumazet
95c9617472 net: cleanup unsigned to unsigned int
Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:44:40 -04:00
David Howells
9ffc93f203 Remove all #inclusions of asm/system.h
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it.  Performed with the following command:

perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-28 18:30:03 +01:00
Pavel Emelyanov
3f518bf745 datagram: Add offset argument to __skb_recv_datagram
This one is only considered for MSG_PEEK flag and the value pointed by
it specifies where to start peeking bytes from. If the offset happens to
point into the middle of the returned skb, the offset within this skb is
put back to this very argument.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21 14:58:57 -05:00
Pavel Emelyanov
4934b0329f datagram: Factor out sk queue referencing
This makes lines shorter and simplifies further patching.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21 14:58:54 -05:00
Eric Dumazet
9e903e0852 net: add skb frag size accessors
To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.

Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 03:10:46 -04:00
Ian Campbell
ea2ab69379 net: convert core to skb paged frag APIs
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 17:52:11 -07:00
David Shwatrz
8917a3c0b7 Fix a typo in datagram.c and sctp/socket.c.
Hi,
This patch fixes a typo in net/core/datagram.c and in net/sctp/socket.c

Regards,
David Shwartz

Signed-off-by: David Shwartz <dshwatrz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-06 13:10:11 -08:00
Linus Torvalds
5f05647dd8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits)
  bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
  vlan: Calling vlan_hwaccel_do_receive() is always valid.
  tproxy: use the interface primary IP address as a default value for --on-ip
  tproxy: added IPv6 support to the socket match
  cxgb3: function namespace cleanup
  tproxy: added IPv6 support to the TPROXY target
  tproxy: added IPv6 socket lookup function to nf_tproxy_core
  be2net: Changes to use only priority codes allowed by f/w
  tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
  tproxy: added tproxy sockopt interface in the IPV6 layer
  tproxy: added udp6_lib_lookup function
  tproxy: added const specifiers to udp lookup functions
  tproxy: split off ipv6 defragmentation to a separate module
  l2tp: small cleanup
  nf_nat: restrict ICMP translation for embedded header
  can: mcp251x: fix generation of error frames
  can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set
  can-raw: add msg_flags to distinguish local traffic
  9p: client code cleanup
  rds: make local functions/variables static
  ...

Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and
drivers/net/wireless/ath/ath9k/debug.c as per David
2010-10-23 11:47:02 -07:00
Koki Sanagi
07dc22e729 skb: Add tracepoints to freeing skb
This patch adds tracepoint to consume_skb and add trace_kfree_skb
before __kfree_skb in skb_free_datagram_locked and net_tx_action.
Combinating with tracepoint on dev_hard_start_xmit, we can check
how long it takes to free transmitted packets. And using it, we can
calculate how many packets driver had at that time. It is useful when
a drop of transmitted packet is a problem.

            sshd-6828  [000] 112689.258154: consume_skb: skbaddr=f2d99bb8

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <4C724364.50903@jp.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-09-07 17:51:53 +02:00
Eric Dumazet
db40980fcd net: poll() optimizations
No need to test twice sk->sk_shutdown & RCV_SHUTDOWN

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-06 18:48:45 -07:00
Eric Dumazet
9e34a5b516 net/core: EXPORT_SYMBOL cleanups
CodingStyle cleanups

EXPORT_SYMBOL should immediately follow the symbol declaration.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-12 12:57:55 -07:00
Eric Dumazet
8a74ad60a5 net: fix lock_sock_bh/unlock_sock_bh
This new sock lock primitive was introduced to speedup some user context
socket manipulation. But it is unsafe to protect two threads, one using
regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh

This patch changes lock_sock_bh to be careful against 'owned' state.
If owned is found to be set, we must take the slow path.
lock_sock_bh() now returns a boolean to say if the slow path was taken,
and this boolean is used at unlock_sock_bh time to call the appropriate
unlock function.

After this change, BH are either disabled or enabled during the
lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
so we rename these functions to lock_sock_fast()/unlock_sock_fast().

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-27 00:30:53 -07:00
Eric Dumazet
93bb64eac1 net: skb_free_datagram_locked() fix
Commit 4b0b72f7dd ( net: speedup udp receive path )
introduced a bug in skb_free_datagram_locked().

We should not skb_orphan() skb if we dont have the guarantee we are the
last skb user, this might happen with MSG_PEEK concurrent users.

To keep socket locked for the smallest period of time, we split
consume_skb() logic, inlined in skb_free_datagram_locked()

Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-03 23:18:14 -07:00
Eric Dumazet
4b0b72f7dd net: speedup udp receive path
Since commit 95766fff ([UDP]: Add memory accounting.), 
each received packet needs one extra sock_lock()/sock_release() pair.

This added latency because of possible backlog handling. Then later,
ticket spinlocks added yet another latency source in case of DDOS.

This patch introduces lock_sock_bh() and unlock_sock_bh()
synchronization primitives, avoiding one atomic operation and backlog
processing.

skb_free_datagram_locked() uses them instead of full blown
lock_sock()/release_sock(). skb is orphaned inside locked section for
proper socket memory reclaim, and finally freed outside of it.

UDP receive path now take the socket spinlock only once.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 14:35:48 -07:00
Eric Dumazet
aa39514516 net: sk_sleep() helper
Define a new function to return the waitqueue of a "struct sock".

static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
	return sk->sk_sleep;
}

Change all read occurrences of sk_sleep by a call to this function.

Needed for a future RCU conversion. sk_sleep wont be a field directly
available.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20 16:37:13 -07:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
David S. Miller
230f9bb701 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/usb/cdc_ether.c

All CDC ethernet devices of type USB_CLASS_COMM need to use
'&mbm_info'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06 00:55:55 -08:00
Eric Dumazet
9d410c7960 net: fix sk_forward_alloc corruption
On UDP sockets, we must call skb_free_datagram() with socket locked,
or risk sk_forward_alloc corruption. This requirement is not respected
in SUNRPC.

Add a convenient helper, skb_free_datagram_locked() and use it in SUNRPC

Reported-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30 12:25:12 -07:00
Eric Dumazet
8edf19c2fe net: sk_drops consolidation part 2
- skb_kill_datagram() can increment sk->sk_drops itself, not callers.

- UDP on IPV4 & IPV6 dropped frames (because of bad checksum or policy checks) increment sk_drops

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-18 18:52:54 -07:00
Neil Horman
e9b3cc1b37 net: skb ftracer - add tracepoint to skb_copy_datagram_iovec (v3)
skb allocation / cosumption tracer - Add consumption tracepoint

This patch adds a tracepoint to skb_copy_datagram_iovec, which is called each
time a userspace process copies a frame from a socket receive queue to a user
space buffer.  It allows us to hook in and examine each sk_buff that the system
receives on a per-socket bases, and can be use to compile a list of which skb's
were received by which processes.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/trace/events/skb.h |   20 ++++++++++++++++++++
 net/core/datagram.c        |    3 +++
 2 files changed, 23 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-13 16:26:14 -07:00
Jiri Olsa
a57de0b433 net: adding memory barrier to the poll and receive callbacks
Adding memory barrier after the poll_wait function, paired with
receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper
to wrap the memory barrier.

Without the memory barrier, following race can happen.
The race fires, when following code paths meet, and the tp->rcv_nxt
and __add_wait_queue updates stay in CPU caches.

CPU1                         CPU2

sys_select                   receive packet
  ...                        ...
  __add_wait_queue           update tp->rcv_nxt
  ...                        ...
  tp->rcv_nxt check          sock_def_readable
  ...                        {
  schedule                      ...
                                if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                        wake_up_interruptible(sk->sk_sleep)
                                ...
                             }

If there was no cache the code would work ok, since the wait_queue and
rcv_nxt are opposit to each other.

Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
passed the tp->rcv_nxt check and sleeps, or will get the new value for
tp->rcv_nxt and will return with new data mask.
In both cases the process (CPU1) is being added to the wait queue, so the
waitqueue_active (CPU2) call cannot miss and will wake up CPU1.

The bad case is when the __add_wait_queue changes done by CPU1 stay in its
cache, and so does the tp->rcv_nxt update on CPU2 side.  The CPU1 will then
endup calling schedule and sleep forever if there are no more data on the
socket.

Calls to poll_wait in following modules were ommited:
	net/bluetooth/af_bluetooth.c
	net/irda/af_irda.c
	net/irda/irnet/irnet_ppp.c
	net/mac80211/rc80211_pid_debugfs.c
	net/phonet/socket.c
	net/rds/af_rds.c
	net/rfkill/core.c
	net/sunrpc/cache.c
	net/sunrpc/rpc_pipe.c
	net/tipc/socket.c

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-09 17:06:57 -07:00
David S. Miller
5b1a002ade datagram: Use frag list abstraction interfaces.
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09 00:18:15 -07:00
Sridhar Samudrala
d2d27bfd11 net: Fix skb_copy_datagram_from_iovec() to pass the right offset
I am working on enabling UFO between KVM guests using virtio-net and i have
some patches that i got working with 2.6.30-rc8. When i wanted to try them
with net-next-2.6, i noticed that virtio-net is not working with that tree.

After some debugging, it turned out to be several bugs in the recent patches
to fix aio with tun driver, specifically the following 2 commits.

http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=commitdiff;h=0a1ec07a67bd8b0033dace237249654d015efa21
http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=commitdiff;h=6f26c9a7555e5bcca3560919db9b852015077dae

Fix the call to memcpy_from_iovecend() in skb_copy_datagram_from_iovec
to pass the right iovec offset.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-08 00:25:38 -07:00
John Dykstra
61de71c67c Network Drop Monitor: Fix skb_kill_datagram
Commit ead2ceb0ec ("Network Drop Monitor:
Adding kfree_skb_clean for non-drops and modifying end-of-line points
for skbs") established new conventions for identifying dropped packets.

Align skb_kill_datagram() with these conventions so that packets that
get dropped just before the copy to userspace are properly tracked.

Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-08 14:57:01 -07:00
David S. Miller
aba7453037 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	Documentation/isdn/00-INDEX
	drivers/net/wireless/iwlwifi/iwl-scan.c
	drivers/net/wireless/rndis_wlan.c
	net/mac80211/main.c
2009-04-29 20:30:35 -07:00
Eric Dumazet
bf368e4e70 net: Avoid extra wakeups of threads blocked in wait_for_packet()
In 2.6.25 we added UDP mem accounting.

This unfortunatly added a penalty when a frame is transmitted, since
we have at TX completion time to call sock_wfree() to perform necessary
memory accounting. This calls sock_def_write_space() and utimately
scheduler if any thread is waiting on the socket.
Thread(s) waiting for an incoming frame was scheduled, then had to sleep
again as event was meaningless.

(All threads waiting on a socket are using same sk_sleep anchor)

This adds lot of extra wakeups and increases latencies, as noted
by Christoph Lameter, and slows down softirq handler.

Reference : http://marc.info/?l=linux-netdev&m=124060437012283&w=2 

Fortunatly, Davide Libenzi recently added concept of keyed wakeups
into kernel, and particularly for sockets (see commit
37e5540b3c 
epoll keyed wakeups: make sockets use keyed wakeups)

Davide goal was to optimize epoll, but this new wakeup infrastructure
can help non epoll users as well, if they care to setup an appropriate
handler.

This patch introduces new DEFINE_WAIT_FUNC() helper and uses it
in wait_for_packet(), so that only relevant event can wakeup a thread
blocked in this function.

Trace of function calls from bnx2 TX completion bnx2_poll_work() is :
__kfree_skb()
 skb_release_head_state()
  sock_wfree()
   sock_def_write_space()
    __wake_up_sync_key()
     __wake_up_common()
      receiver_wake_function() : Stops here since thread is waiting for an INPUT


Reported-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-28 02:24:21 -07:00
Michael S. Tsirkin
6f26c9a755 tun: fix tun_chr_aio_write so that aio works
aio_write gets const struct iovec * but tun_chr_aio_write casts this to struct
iovec * and modifies the iovec. As a result, attempts to use io_submit
to send packets to a tun device fail with weird errors such as EINVAL.

Since tun is the only user of skb_copy_datagram_from_iovec, we can
fix this simply by changing the later so that it does not
touch the iovec passed to it.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-21 05:42:46 -07:00
Michael S. Tsirkin
0a1ec07a67 net: skb_copy_datagram_const_iovec()
There's an skb_copy_datagram_iovec() to copy out of a paged skb,
but it modifies the iovec, and does not support starting
at an offset in the destination. We want both in tun.c, so let's
add the function.

It's a carbon copy of skb_copy_datagram_iovec() with enough changes to
be annoying.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-21 05:42:44 -07:00
Neil Horman
ead2ceb0ec Network Drop Monitor: Adding kfree_skb_clean for non-drops and modifying end-of-line points for skbs
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/linux/skbuff.h |    4 +++-
 net/core/datagram.c    |    2 +-
 net/core/skbuff.c      |   22 ++++++++++++++++++++++
 net/ipv4/arp.c         |    2 +-
 net/ipv4/udp.c         |    2 +-
 net/packet/af_packet.c |    2 +-
 6 files changed, 29 insertions(+), 5 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 12:09:28 -07:00
Eric Dumazet
270acefafe net: sk_free_datagram() should use sk_mem_reclaim_partial()
I noticed a contention on udp_memory_allocated on regular UDP applications.

While tcp_memory_allocated is seldom used, it appears each incoming UDP frame
is currently touching udp_memory_allocated when queued, and when received by
application.

One possible solution is to use sk_mem_reclaim_partial() instead of
sk_mem_reclaim(), so that we keep a small reserve (less than one page)
of memory for each UDP socket.

We did something very similar on TCP side in commit
9993e7d313
([TCP]: Do not purge sk_forward_alloc entirely in tcp_delack_timer())

A more complex solution would need to convert prot->memory_allocated to
use a percpu_counter with batches of 64 or 128 pages.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05 01:38:06 -08:00
Alan Cox
113aa838ec net: Rationalise email address: Network Specific Parts
Clean up the various different email addresses of mine listed in the code
to a single current and valid address. As Dave says his network merges
for 2.6.28 are now done this seems a good point to send them in where
they won't risk disrupting real changes.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-13 19:01:08 -07:00