Commit Graph

1335 Commits

Author SHA1 Message Date
Herbert Xu
cb18978cbf gro: Open-code final pskb_may_pull
As we know the only packets which need the final pskb_may_pull
are completely non-linear, and have all the required bits in
frag0, we can perform a straight memcpy instead of going through
pskb_may_pull and doing skb_copy_bits.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:26:02 -07:00
Herbert Xu
a5b1cf288d gro: Avoid unnecessary comparison after skb_gro_header
For the overwhelming majority of cases, skb_gro_header's return
value cannot be NULL.  Yet we must check it because of its current
form.  This patch splits it up into multiple functions in order
to avoid this.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:26:01 -07:00
Herbert Xu
7489594cb2 gro: Optimise length comparison in skb_gro_header
By caching frag0_len, we can avoid checking both frag0 and the
length separately in skb_gro_header.  This helps as skb_gro_header
is called four times per packet which amounts to a few million
times at 10Gb/s.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:26:01 -07:00
Herbert Xu
78d3fd0b7d gro: Only use skb_gro_header for completely non-linear packets
Currently skb_gro_header is used for packets which put the hardware
header in skb->data with the rest in frags.  Since the drivers that
need this optimisation all provide completely non-linear packets,
we can gain extra optimisations by only performing the frag0
optimisation for completely non-linear packets.

In particular, we can simply test frag0 (instead of skb_headlen)
to see whether the optimisation is in force.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:25:57 -07:00
Herbert Xu
67147ba99a gro: Localise offset/headlen in skb_gro_offset
This patch stores the offset/headlen in local variables as they're
used repeatedly in skb_gro_offset.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:25:55 -07:00
Herbert Xu
78a478d0ef gro: Inline skb_gro_header and cache frag0 virtual address
The function skb_gro_header is called four times per packet which
quickly adds up at 10Gb/s.  This patch inlines it to allow better
optimisations.

Some architectures perform multiplication for page_address, which
is done by each skb_gro_header invocation.  This patch caches that
value in skb->cb to avoid the unnecessary multiplications.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:25:55 -07:00
Herbert Xu
42da6994ca gro: Open-code frags copy in skb_gro_receive
gcc does a poor job at generating code for the memcpy of the frags
array in skb_gro_receive, which is the primary purpose of that
function when merging frags.  In particular, it can't utilise the
alignment information of the source and destination.  This patch
open-codes the copy so we process words instead of bytes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:25:54 -07:00
David S. Miller
2b0cc7f78b net: Remove bogus reference to BUS_ID_SIZE in sysfs code.
BUS_ID_SIZE is really no more, and device names are dynamically
allocated and thus can be any necessary size.

So remove the BUG check here making sure BUS_ID_SIZE is at least
as large as IFNAMSIZ.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-26 21:05:19 -07:00
Eric Dumazet
08baf56108 net: txq_trans_update() helper
We would like to get rid of netdev->trans_start = jiffies; that about all net
drivers have to use in their start_xmit() function, and use txq->trans_start
instead.

This can be done generically in core network, as suggested by David.

Some devices, (particularly loopback) dont need trans_start update, because
they dont have transmit watchdog. We could add a new device flag, or rely
on fact that txq->tran_start can be updated is txq->xmit_lock_owner is
different than -1. Use a helper function to hide our choice.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25 22:58:01 -07:00
Jarek Poplawski
a1dcb6628b pkt_sched: gen_estimator: Fix signed integers right-shifts.
Right-shifts of signed integers are implementation-defined so unportable.

With feedback from: Eric Dumazet <dada1@cosmosbay.com>

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25 22:47:01 -07:00
Alexander Beregalov
e3804cbebb net: remove COMPAT_NET_DEV_OPS
All drivers are already converted to new net_device_ops API
and nobody uses old API anymore.

Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25 01:53:53 -07:00
David S. Miller
c649c0e31d Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/ath/ath5k/phy.c
	drivers/net/wireless/iwlwifi/iwl-agn.c
	drivers/net/wireless/iwlwifi/iwl3945-base.c
2009-05-25 01:42:21 -07:00
Herbert Xu
9bcb97cace skbuff: Copy csum instead of csum_start/csum_offset
Hi:

skbuff: Copy csum instead of csum_start/csum_offset

It's easier to copy the u32 csum instead of its two u16
constituents.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25 00:40:43 -07:00
Herbert Xu
82c49a352e skbuff: Move new code into __copy_skb_header
Hi:

skbuff: Move new __skb_clone code into __copy_skb_header

It seems that people just keep on adding stuff to __skb_clone
instead __copy_skb_header.  This is wrong as it means your brand-new
attributes won't always get copied as you intended.

This patch moves them to the right place, and adds a comment to
prevent this from happening again.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25 00:40:43 -07:00
David S. Miller
7d18f11489 net: Fix arg to trace_napi_poll() in netpoll.
Reproted by Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 23:30:09 -07:00
Neil Horman
4ea7e38696 dropmon: add ability to detect when hardware dropsrxpackets
Patch to add the ability to detect drops in hardware interfaces via dropwatch.
Adds a tracepoint to net_rx_action to signal everytime a napi instance is
polled.  The dropmon code then periodically checks to see if the rx_frames
counter has changed, and if so, adds a drop notification to the netlink
protocol, using the reserved all-0's vector to indicate the drop location was in
hardware, rather than somewhere in the code.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/linux/net_dropmon.h |    8 ++
 include/trace/napi.h        |   11 +++
 net/core/dev.c              |    5 +
 net/core/drop_monitor.c     |  124 ++++++++++++++++++++++++++++++++++++++++++--
 net/core/net-traces.c       |    4 +
 net/core/netpoll.c          |    2
 6 files changed, 149 insertions(+), 5 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 16:50:21 -07:00
Stephen Hemminger
ca0f31125c netns: simplify net_ns_init
The net_ns_init code can be simplified. No need to save error code
if it is only going to panic if it is set 4 lines later.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 15:10:31 -07:00
Stephen Hemminger
1f7a2bb4ef netns: remove leftover debugging message
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 15:10:05 -07:00
Florian Westphal
5b5f792a6a pktgen: do not access flows[] beyond its length
typo -- pkt_dev->nflows is for stats only, the number of concurrent
flows is stored in cflows.

Reported-By: Vladimir Ivashchenko <hazard@francoudi.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 15:07:12 -07:00
Rami Rosen
04af8cf6f3 net: Remove unused parameter from fill method in fib_rules_ops.
The netlink message header (struct nlmsghdr) is an unused parameter in
fill method of fib_rules_ops struct.  This patch removes this
parameter from this method and fixes the places where this method is
called.

(include/net/fib_rules.h)

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-20 17:26:23 -07:00
Eric Dumazet
93f154b594 net: release dst entry in dev_hard_start_xmit()
One point of contention in high network loads is the dst_release() performed
when a transmited skb is freed. This is because NIC tx completion calls
dev_kree_skb() long after original call to dev_queue_xmit(skb).

CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
quite visible if one CPU is 100% handling softirqs for a network device,
since dst_clone() is done by other cpus, involving cache line ping pongs.

It seems right place to release dst is in dev_hard_start_xmit(), for most
devices but ones that are virtual, and some exceptions.

David Miller suggested to define a new device flag, set in alloc_netdev_mq()
(so that most devices set it at init time), and carefuly unset in devices
which dont want a NULL skb->dst in their ndo_start_xmit().

List of devices that must clear this flag is :

- loopback device, because it calls netif_rx() and quoting Patrick :
    "ip_route_input() doesn't accept loopback addresses, so loopback packets
     already need to have a dst_entry attached."
- appletalk/ipddp.c : needs skb->dst in its xmit function

- And all devices that call again dev_queue_xmit() from their xmit function
(as some classifiers need skb->dst) : bonding, vlan, macvlan, eql, ifb, hdlc_fr

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 22:19:19 -07:00
Eric W. Biederman
336ca57c3b net-sysfs: Use rtnl_trylock in sysfs methods.
The earlier patch to fix the deadlock between a network device going
away and writing to sysfs attributes was incomplete.
- It did not set signal_pending so we would leak ERSTARTSYS to user space.
- It used ERESTARTSYS which only restarts if sigaction configures it to.
- It did not cover store and show for ifalias.

So fix all of these up and use the new helper restart_syscall so we get
the details correct on what it takes.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 22:15:57 -07:00
Thomas Chenault
995b337952 net: fix skb_seq_read returning wrong offset/length for page frag data
When called with a consumed value that is less than skb_headlen(skb)
bytes into a page frag, skb_seq_read() incorrectly returns an
offset/length relative to skb->data. Ensure that data which should come
from a page frag does.

Signed-off-by: Thomas Chenault <thomas_chenault@dell.com>
Tested-by: Shyam Iyer <shyam_iyer@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 21:43:27 -07:00
David S. Miller
bb803cfbec Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/scsi/fcoe/fcoe.c
2009-05-18 21:08:20 -07:00
Eric Dumazet
511e11e396 pkt_sched: gen_estimator: use 64 bit intermediate counters for bps
gen_estimator can overflow bps (bytes per second) with Gb links, while
it was designed with a u32 API, with a theorical limit of 34360Mbit
(2^32 bytes)

Using 64 bit intermediate avbps/brate counters can allow us to reach
this theorical limit.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 19:26:37 -07:00
Eric Dumazet
7004bf252c net: add tx_packets/tx_bytes/tx_dropped counters in struct netdev_queue
offsetof(struct net_device, features)=0x44
offsetof(struct net_device, stats.tx_packets)=0x54
offsetof(struct net_device, stats.tx_bytes)=0x5c
offsetof(struct net_device, stats.tx_dropped)=0x6c

Network drivers that touch dev->stats.tx_packets/stats.tx_bytes in their
tx path can slow down SMP operations, since they dirty a cache line
that should stay shared (dev->features is needed in rx and tx paths)

We could move away stats field in net_device but it wont help that much.
(Two cache lines dirtied in tx path, we can do one only)

Better solution is to add tx_packets/tx_bytes/tx_dropped in struct
netdev_queue because this structure is already touched in tx path and
counters updates will then be free (no increase in size)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 15:15:06 -07:00
John Dykstra
9dc20c5f78 tcp: tcp_prequeue() can use keyed wakeups
When TCP frees up write buffer space, avoid waking up tasks that have
done a poll() or select() on the same socket specifying read-side
events.

This is an extension of a read-side patch by Eric Dumazet.

Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 20:44:43 -07:00
Pavel Emelyanov
5e392739d6 netpoll: don't dereference NULL dev from np
It looks like the dev in netpoll_poll can be NULL - at lease it's
checked at the function beginning. Thus the dev->netde_ops dereference
looks dangerous.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 20:37:55 -07:00
Rami Rosen
8b3521eeb7 ipv4: remove an unused parameter from configure method of fib_rules_ops.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 11:59:45 -07:00
Jiri Pirko
ab9c73ccb5 net: check retval of dev_addr_init()
Add missed checking of dev_addr_init return value in alloc_netdev_mq.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

 net/core/dev.c |   15 ++++++++++++---
 1 files changed, 12 insertions(+), 3 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-09 13:15:48 -07:00
John Dykstra
61de71c67c Network Drop Monitor: Fix skb_kill_datagram
Commit ead2ceb0ec ("Network Drop Monitor:
Adding kfree_skb_clean for non-drops and modifying end-of-line points
for skbs") established new conventions for identifying dropped packets.

Align skb_kill_datagram() with these conventions so that packets that
get dropped just before the copy to userspace are properly tracked.

Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-08 14:57:01 -07:00
David S. Miller
22f6dacdfc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	include/net/tcp.h
2009-05-08 02:48:30 -07:00
Lennert Buytenhek
b805007545 net: update skb_recycle_check() for hardware timestamping changes
Commit ac45f602ee ("net: infrastructure
for hardware time stamping") added two skb initialization actions to
__alloc_skb(), which need to be added to skb_recycle_check() as well.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-06 16:49:18 -07:00
Jiri Pirko
f001fde5ea net: introduce a list of device addresses dev_addr_list (v6)
v5 -> v6 (current):
-removed so far unused static functions
-corrected dev_addr_del_multiple to call del instead of add

v4 -> v5:
-added device address type (suggested by davem)
-removed refcounting (better to have simplier code then safe potentially few
 bytes)

v3 -> v4:
-changed kzalloc to kmalloc in __hw_addr_add_ii()
-ASSERT_RTNL() avoided in dev_addr_flush() and dev_addr_init()

v2 -> v3:
-removed unnecessary rcu read locking
-moved dev_addr_flush() calling to ensure no null dereference of dev_addr

v1 -> v2:
-added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush
-removed unnecessary rcu_read locking in dev_addr_init
-use compare_ether_addr_64bits instead of compare_ether_addr
-use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr
-use call_rcu instead of rcu_synchronize
-moved is_etherdev_addr into __KERNEL__ ifdef

This patch introduces a new list in struct net_device and brings a set of
functions to handle the work with device address list. The list is a replacement
for the original dev_addr field and because in some situations there is need to
carry several device addresses with the net device. To be backward compatible,
dev_addr is made to point to the first member of the list so original drivers
sees no difference.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-05 12:26:24 -07:00
Alexey Dobriyan
088eb2d905 netns 2/2: extract net_create()
net_create() will be used by C/R to create fresh netns on restart.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-04 11:12:14 -07:00
Alexey Dobriyan
4a84822c60 netns 1/2: don't get/put old netns on CLONE_NEWNET
copy_net_ns() doesn't copy anything, it creates fresh netns, so
get/put of old netns isn't needed.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-04 11:11:38 -07:00
David S. Miller
513de11bba net: Avoid modulus in skb_tx_hash() for forwarding case.
Based almost entirely upon a patch by Eric Dumazet.

The common case is to have num-tx-queues <= num_rx_queues
and even if num_tx_queues is larger it will not be significantly
larger.

Therefore, a subtraction loop is always going to be faster than
modulus.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-03 14:43:10 -07:00
David S. Miller
d252a5e7b7 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-05-03 14:07:43 -07:00
Eric Dumazet
ec581f6a42 net: Fix skb_tx_hash() for forwarding workloads.
When skb_rx_queue_recorded() is true, we dont want to use jash distribution
as the device driver exactly told us which queue was selected at RX time.
jhash makes a statistical shuffle, but this wont work with 8 static inputs.

Later improvements would be to compute reciprocal value of real_num_tx_queues
to avoid a divide here. But this computation should be done once,
when real_num_tx_queues is set. This needs a separate patch, and a new
field in struct net_device.

Reported-by: Andrew Dickinson <andrew@whydna.net>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-01 09:05:06 -07:00
Jarek Poplawski
7a67e56fd3 net: Fix oops when splicing skbs from a frag_list.
Lennert Buytenhek wrote:
> Since 4fb6699481 ("net: Optimize memory
> usage when splicing from sockets.") I'm seeing this oops (e.g. in
> 2.6.30-rc3) when splicing from a TCP socket to /dev/null on a driver
> (mv643xx_eth) that uses LRO in the skb mode (lro_receive_skb) rather
> than the frag mode:

My patch incorrectly assumed skb->sk was always valid, but for
"frag_listed" skbs we can only use skb->sk of their parent.

Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
Debugged-by: Lennert Buytenhek <buytenh@wantstofly.org>
Tested-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-30 05:41:19 -07:00
David S. Miller
aba7453037 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	Documentation/isdn/00-INDEX
	drivers/net/wireless/iwlwifi/iwl-scan.c
	drivers/net/wireless/rndis_wlan.c
	net/mac80211/main.c
2009-04-29 20:30:35 -07:00
Eric Dumazet
bf368e4e70 net: Avoid extra wakeups of threads blocked in wait_for_packet()
In 2.6.25 we added UDP mem accounting.

This unfortunatly added a penalty when a frame is transmitted, since
we have at TX completion time to call sock_wfree() to perform necessary
memory accounting. This calls sock_def_write_space() and utimately
scheduler if any thread is waiting on the socket.
Thread(s) waiting for an incoming frame was scheduled, then had to sleep
again as event was meaningless.

(All threads waiting on a socket are using same sk_sleep anchor)

This adds lot of extra wakeups and increases latencies, as noted
by Christoph Lameter, and slows down softirq handler.

Reference : http://marc.info/?l=linux-netdev&m=124060437012283&w=2 

Fortunatly, Davide Libenzi recently added concept of keyed wakeups
into kernel, and particularly for sockets (see commit
37e5540b3c 
epoll keyed wakeups: make sockets use keyed wakeups)

Davide goal was to optimize epoll, but this new wakeup infrastructure
can help non epoll users as well, if they care to setup an appropriate
handler.

This patch introduces new DEFINE_WAIT_FUNC() helper and uses it
in wait_for_packet(), so that only relevant event can wakeup a thread
blocked in this function.

Trace of function calls from bnx2 TX completion bnx2_poll_work() is :
__kfree_skb()
 skb_release_head_state()
  sock_wfree()
   sock_def_write_space()
    __wake_up_sync_key()
     __wake_up_common()
      receiver_wake_function() : Stops here since thread is waiting for an INPUT


Reported-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-28 02:24:21 -07:00
Herbert Xu
edbd9e3030 gro: Fix handling of headers that extend over the tail
The skb_gro_* code fails to handle the case where a header starts
in the linear area but ends in the frags area.  Since the goal
of skb_gro_* is to optimise the case of completely non-linear
packets, we can simply bail out if we have anything in the linear
area.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27 05:44:29 -07:00
Neil Horman
683703a26e drop_monitor: Update netlink protocol to include netlink attribute header in alert message
When I initially implemented this protocol, I disregarded the use of netlink
attribute headers, thinking for my purposes they weren't needed.  I've come to
find out that, as I'm starting to work with sending down messages with
associated data (like config messages), the kernel code spits out warnings about
trailing data in a netlink skb that doesn't have an associated header on it.  As
such, I'm going to start including attribute headers in my netlink transaction,
and so for completeness, I should likely include them on messages bound from the
kernel to user space.  This patch adds that header to the kernel, and bumps the
protocol version accordingly

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27 03:17:31 -07:00
Michael S. Tsirkin
6f26c9a755 tun: fix tun_chr_aio_write so that aio works
aio_write gets const struct iovec * but tun_chr_aio_write casts this to struct
iovec * and modifies the iovec. As a result, attempts to use io_submit
to send packets to a tun device fail with weird errors such as EINVAL.

Since tun is the only user of skb_copy_datagram_from_iovec, we can
fix this simply by changing the later so that it does not
touch the iovec passed to it.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-21 05:42:46 -07:00
Michael S. Tsirkin
0a1ec07a67 net: skb_copy_datagram_const_iovec()
There's an skb_copy_datagram_iovec() to copy out of a paged skb,
but it modifies the iovec, and does not support starting
at an offset in the destination. We want both in tun.c, so let's
add the function.

It's a carbon copy of skb_copy_datagram_iovec() with enough changes to
be annoying.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-21 05:42:44 -07:00
David S. Miller
e5e9743bb7 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/core/dev.c
2009-04-21 01:32:26 -07:00
Ben Hutchings
5db8765a86 net: Fix GRO for multiple page fragments
This loop over fragments in napi_fraginfo_skb() was "interesting".

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-20 02:20:30 -07:00
Marcin Slusarz
eb39c57ff7 net: fix "compatibility" typos
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-20 02:15:01 -07:00
Jarek Poplawski
8caf153974 net: sch_netem: Fix an inconsistency in ingress netem timestamps.
Alex Sidorenko reported:

"while experimenting with 'netem' we have found some strange behaviour. It
seemed that ingress delay as measured by 'ping' command shows up on some
hosts but not on others.

After some investigation I have found that the problem is that skbuff->tstamp
field value depends on whether there are any packet sniffers enabled. That
is:

- if any ptype_all handler is registered, the tstamp field is as expected
- if there are no ptype_all handlers, the tstamp field does not show the delay"

This patch prevents unnecessary update of tstamp in dev_queue_xmit_nit()
on ingress path (with act_mirred) adding a check, so minimal overhead on
the fast path, but only when sniffers etc. are active.

Since netem at ingress seems to logically emulate a network before a host,
tstamp is zeroed to trigger the update and pretend delays are from the
outside.

Reported-by: Alex Sidorenko <alexandre.sidorenko@hp.com>
Tested-by: Alex Sidorenko <alexandre.sidorenko@hp.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-20 02:14:59 -07:00