tcp_adv_win_scale default value is 2, meaning we expect a good citizen
skb to have skb->len / skb->truesize ratio of 75% (3/4)
In 2.6 kernels we (mis)accounted for typical MSS=1460 frame :
1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392.
So these skbs were considered as not bloated.
With recent truesize fixes, a typical MSS=1460 frame truesize is now the
more precise :
2048 + 256 = 2304. But 2304 * 3/4 = 1728.
So these skb are not good citizen anymore, because 1460 < 1728
(GRO can escape this problem because it build skbs with a too low
truesize.)
This also means tcp advertises a too optimistic window for a given
allocated rcvspace : When receiving frames, sk_rmem_alloc can hit
sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often,
especially when application is slow to drain its receive queue or in
case of losses (netperf is fast, scp is slow). This is a major latency
source.
We should adjust the len/truesize ratio to 50% instead of 75%
This patch :
1) changes tcp_adv_win_scale default to 1 instead of 2
2) increase tcp_rmem[2] limit from 4MB to 6MB to take into account
better truesize tracking and to allow autotuning tcp receive window to
reach same value than before. Note that same amount of kernel memory is
consumed compared to 2.6 kernels.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit dc1f8bf68b ('netdev: change
transmit to limited range type') changed the required return type and
9a1654ba0b ('net: Optimize
hard_start_xmit() return checking') changed the valid numerical
return values.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 08baf56108 ('net:
txq_trans_update() helper') made it unnecessary for most drivers to
set net_device::trans_start (or netdev_queue::trans_start).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commits d314774cf2 ('netdev: network
device operations infrastructure') and
008298231a ('netdev: add more functions
to netdevice ops') moved and renamed net device operation pointers.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commits e308a5d806 ('netdev: Add
netdev->addr_list_lock protection.') and
e8a0464cc9 ('netdev: Allocate multiple
queues for TX.') introduced more fine-grained locks.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit bea3348eef ('[NET]: Make NAPI
polling independent of struct net_device objects.') removed the
automatic disabling of NAPI polling by dev_close(), and drivers
must now do this themselves.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The explanation of ip_local_port_range in
Documentation/networking/ip-sysctl.txt contains several factual
errors:
- The default value of ip_local_port_range does not depend on the
amount of memory available in the system.
- tcp_tw_recycle is not enabled by default.
- 1024-4999 is not the default value.
- Etc.
Clean up the mess.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Install commands should not be used to specify soft dependencies among
modules. When loading modules it's much better to have a softdep that
modprobe knows what's being done than having to fork/exec another
instance of modprobe to load the other module.
By using a softdep user has also an option to remove the dependencies
when removing the module (and if its refcount dropped to 0)
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Usage of /etc/modprobe.conf file was deprecated by module-init-tools and
is no longer parsed by new kmod tool. References to this file are
replaced in Documentation, comments and Kconfig according to the
context.
There are also some references to the old /etc/modules.conf from 2.4
kernels that are being removed.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull security subsystem updates for 3.4 from James Morris:
"The main addition here is the new Yama security module from Kees Cook,
which was discussed at the Linux Security Summit last year. Its
purpose is to collect miscellaneous DAC security enhancements in one
place. This also marks a departure in policy for LSM modules, which
were previously limited to being standalone access control systems.
Chromium OS is using Yama, and I believe there are plans for Ubuntu,
at least.
This patchset also includes maintenance updates for AppArmor, TOMOYO
and others."
Fix trivial conflict in <net/sock.h> due to the jumo_label->static_key
rename.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits)
AppArmor: Fix location of const qualifier on generated string tables
TOMOYO: Return error if fails to delete a domain
AppArmor: add const qualifiers to string arrays
AppArmor: Add ability to load extended policy
TOMOYO: Return appropriate value to poll().
AppArmor: Move path failure information into aa_get_name and rename
AppArmor: Update dfa matching routines.
AppArmor: Minor cleanup of d_namespace_path to consolidate error handling
AppArmor: Retrieve the dentry_path for error reporting when path lookup fails
AppArmor: Add const qualifiers to generated string tables
AppArmor: Fix oops in policy unpack auditing
AppArmor: Fix error returned when a path lookup is disconnected
KEYS: testing wrong bit for KEY_FLAG_REVOKED
TOMOYO: Fix mount flags checking order.
security: fix ima kconfig warning
AppArmor: Fix the error case for chroot relative path name lookup
AppArmor: fix mapping of META_READ to audit and quiet flags
AppArmor: Fix underflow in xindex calculation
AppArmor: Fix dropping of allowed operations that are force audited
AppArmor: Add mising end of structure test to caps unpacking
...
Pull trivial tree from Jiri Kosina:
"It's indeed trivial -- mostly documentation updates and a bunch of
typo fixes from Masanari.
There are also several linux/version.h include removals from Jesper."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (101 commits)
kcore: fix spelling in read_kcore() comment
constify struct pci_dev * in obvious cases
Revert "char: Fix typo in viotape.c"
init: fix wording error in mm_init comment
usb: gadget: Kconfig: fix typo for 'different'
Revert "power, max8998: Include linux/module.h just once in drivers/power/max8998_charger.c"
writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header
writeback: fix typo in the writeback_control comment
Documentation: Fix multiple typo in Documentation
tpm_tis: fix tis_lock with respect to RCU
Revert "media: Fix typo in mixer_drv.c and hdmi_drv.c"
Doc: Update numastat.txt
qla4xxx: Add missing spaces to error messages
compiler.h: Fix typo
security: struct security_operations kerneldoc fix
Documentation: broken URL in libata.tmpl
Documentation: broken URL in filesystems.tmpl
mtd: simplify return logic in do_map_probe()
mm: fix comment typo of truncate_inode_pages_range
power: bq27x00: Fix typos in comment
...
I've been working on some documentation, so let's
add this diagram to the kernel tree where at least
it has a chance of being maintained :-)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since all that include/linux/if_ppp.h does is #include <linux/ppp-ioctl.h>,
this replaces the occurrences of #include <linux/if_ppp.h> with
#include <linux/ppp-ioctl.h>.
It also corrects an error in Documentation/networking/l2tp.txt, where
it referenced include/linux/if_ppp.h as the source of some definitions
that are actually now defined in include/linux/if_pppol2tp.h.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This moves the definitions of the ioctls, constants and structures
relating to the ppp_generic interface to userspace out from if_ppp.h
to a new file, ppp-ioctl.h. The new file has my copyright since I
designed and implemented the ppp_generic interface in the late 1990s.
None of the contents of this file comes from the original if_ppp.h
published by Carnegie Mellon University.
Of the remainder of if_ppp.h, only the PPP_MTU definition was being
used, and this replaces the uses of it with PPP_MRU (which is identical).
Therefore, this replaces the entire file with the single line
#include <linux/ppp-ioctl.h>
which clearly doesn't contain any CMU code. Thus I have removed the
CMU copyright notice with its problematic advertising clause, and in
fact since it's only one trivial line I have not added any other
copyright notice.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This flag requests that network devices pass all
received frames up the stack, even ones with errors
such as invalid FCS (frame check sum). This will
allow sniffers to see bad packets and perhaps
give the user some idea how to fix the problem.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When set on hardware that supports the feature,
this causes the Ethernet FCS to be appended
to the end of the skb.
Useful for sniffing packets.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sritej Velaga <sritej.velaga@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The reorganization of the driver layout in drivers/net
left behind some stale paths in comments and in Kconfig
help text. Bring them up to date. No actual change to
any code takes place here.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel contains some special internal keyrings, for instance the DNS
resolver keyring :
2a93faf1 I----- 1 perm 1f030000 0 0 keyring .dns_resolver: empty
It would occasionally be useful to allow the contents of such keyrings to be
flushed by root (cache invalidation).
Allow a flag to be set on a keyring to mark that someone possessing the
sysadmin capability can clear the keyring, even without normal write access to
the keyring.
Set this flag on the special keyrings created by the DNS resolver, the NFS
identity mapper and the CIFS identity mapper.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
v2, based on Jay's review.
I kept the 'link must be up' part, because this is enforced in the code.
Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just fixed typo of sample code in packet_mmap.txt
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit c5ed63d66f24(tcp: fix three tcp sysctls tuning),
sysctl_max_syn_backlog is determined by tcp_hashinfo->ehash_mask,
and the minimal value is 128, and it will increase in proportion to the
memory of machine.
The original description for tcp_max_syn_backlog and sysctl_max_syn_backlog
are out of date.
Changelog:
V2: update description for sysctl_max_syn_backlog
Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Reviewed-by: Shan Wei <shanwei88@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Open vSwitch is a multilayer Ethernet switch targeted at virtualized
environments. In addition to supporting a variety of features
expected in a traditional hardware switch, it enables fine-grained
programmatic extension and flow-based control of the network.
This control is useful in a wide variety of applications but is
particularly important in multi-server virtualization deployments,
which are often characterized by highly dynamic endpoints and the need
to maintain logical abstractions for multiple tenants.
The Open vSwitch datapath provides an in-kernel fast path for packet
forwarding. It is complemented by a userspace daemon, ovs-vswitchd,
which is able to accept configuration from a variety of sources and
translate it into packet processing rules.
See http://openvswitch.org for more information and userspace
utilities.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Rick Jones reported that TCP_CONGESTION sockopt performed on a listener
was ignored for its children sockets : right after accept() the
congestion control for new socket is the system default one.
This seems an oversight of the initial design (quoted from Stephen)
Based on prior investigation and patch from Rick.
Reported-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Yuchung Cheng <ycheng@google.com>
Tested-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Documentation/networking/ifenslave.c: In function ‘if_getconfig’:
Documentation/networking/ifenslave.c:508:14: warning: variable ‘mtu’ set but not used [-Wunused-but-set-variable]
Documentation/networking/ifenslave.c:508:6: warning: variable ‘metric’ set but not used [-Wunused-but-set-variable]
The purpose of this function is to simply print out the values
it probes, so...
Signed-off-by: David S. Miller <davem@davemloft.net>
Le mercredi 09 novembre 2011 à 16:21 -0500, David Miller a écrit :
> From: David Miller <davem@davemloft.net>
> Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
>
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Wed, 09 Nov 2011 12:14:09 +0100
> >
> >> unres_qlen is the number of frames we are able to queue per unresolved
> >> neighbour. Its default value (3) was never changed and is responsible
> >> for strange drops, especially if IP fragments are used, or multiple
> >> sessions start in parallel. Even a single tcp flow can hit this limit.
> > ...
> >
> > Ok, I've applied this, let's see what happens :-)
>
> Early answer, build fails.
>
> Please test build this patch with DECNET enabled and resubmit. The
> decnet neigh layer still refers to the removed ->queue_len member.
>
> Thanks.
Ouch, this was fixed on one machine yesterday, but not the other one I
used this morning, sorry.
[PATCH V5 net-next] neigh: new unresolved queue limits
unres_qlen is the number of frames we are able to queue per unresolved
neighbour. Its default value (3) was never changed and is responsible
for strange drops, especially if IP fragments are used, or multiple
sessions start in parallel. Even a single tcp flow can hit this limit.
$ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds chapter to documentation which describes how to use
6lowpan technology.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces new network device called team. It supposes to be
very fast, simple, userspace-driven alternative to existing bonding
driver.
Userspace library called libteam with couple of demo apps is available
here:
https://github.com/jpirko/libteam
Note it's still in its dipers atm.
team<->libteam use generic netlink for communication. That and rtnl
suppose to be the only way to configure team device, no sysfs etc.
Python binding of libteam was recently introduced.
Daemon providing arpmon/miimon active-backup functionality will be
introduced shortly. All what's necessary is already implemented in
kernel team driver.
v7->v8:
- check ndo_ndo_vlan_rx_[add/kill]_vid functions before calling
them.
- use dev_kfree_skb_any() instead of dev_kfree_skb()
v6->v7:
- transmit and receive functions are not checked in hot paths.
That also resolves memory leak on transmit when no port is
present
v5->v6:
- changed couple of _rcu calls to non _rcu ones in non-readers
v4->v5:
- team_change_mtu() uses team->lock while travesing though port
list
- mac address changes are moved completely to jurisdiction of
userspace daemon. This way the daemon can do FOM1, FOM2 and
possibly other weird things with mac addresses.
Only round-robin mode sets up all ports to bond's address then
enslaved.
- Extended Kconfig text
v3->v4:
- remove redundant synchronize_rcu from __team_change_mode()
- revert "set and clear of mode_ops happens per pointer, not per
byte"
- extend comment of function __team_change_mode()
v2->v3:
- team_change_mtu() uses rcu version of list traversal to unwind
- set and clear of mode_ops happens per pointer, not per byte
- port hashlist changed to be embedded into team structure
- error branch in team_port_enter() does cleanup now
- fixed rtln->rtnl
v1->v2:
- modes are made as modules. Makes team more modular and
extendable.
- several commenters' nitpicks found on v1 were fixed
- several other bugs were fixed.
- note I ignored Eric's comment about roundrobin port selector
as Eric's way may be easily implemented as another mode (mode
"random") in future.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Small fix in Documentation, since min_pmtu is 512 + 20 + 20 = 552
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Also reword the test to make it read more easily (to me)
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add missing documentation for conntrack, snat_reroute and sync_version.
Also fix up a typo, IPVS_DEBUG should be IP_VS_DEBUG.
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1745 commits)
dp83640: free packet queues on remove
dp83640: use proper function to free transmit time stamping packets
ipv6: Do not use routes from locally generated RAs
|PATCH net-next] tg3: add tx_dropped counter
be2net: don't create multiple RX/TX rings in multi channel mode
be2net: don't create multiple TXQs in BE2
be2net: refactor VF setup/teardown code into be_vf_setup/clear()
be2net: add vlan/rx-mode/flow-control config to be_setup()
net_sched: cls_flow: use skb_header_pointer()
ipv4: avoid useless call of the function check_peer_pmtu
TCP: remove TCP_DEBUG
net: Fix driver name for mdio-gpio.c
ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT
rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces
ipv4: fix ipsec forward performance regression
jme: fix irq storm after suspend/resume
route: fix ICMP redirect validation
net: hold sock reference while processing tx timestamps
tcp: md5: add more const attributes
Add ethtool -g support to virtio_net
...
Fix up conflicts in:
- drivers/net/Kconfig:
The split-up generated a trivial conflict with removal of a
stale reference to Documentation/networking/net-modules.txt.
Remove it from the new location instead.
- fs/sysfs/dir.c:
Fairly nasty conflicts with the sysfs rb-tree usage, conflicting
with Eric Biederman's changes for tagged directories.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits)
MAINTAINERS: linux-m32r is moderated for non-subscribers
linux@lists.openrisc.net is moderated for non-subscribers
Drop default from "DM365 codec select" choice
parisc: Kconfig: cleanup Kernel page size default
Kconfig: remove redundant CONFIG_ prefix on two symbols
cris: remove arch/cris/arch-v32/lib/nand_init.S
microblaze: add missing CONFIG_ prefixes
h8300: drop puzzling Kconfig dependencies
MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers
tty: drop superfluous dependency in Kconfig
ARM: mxc: fix Kconfig typo 'i.MX51'
Fix file references in Kconfig files
aic7xxx: fix Kconfig references to READMEs
Fix file references in drivers/ide/
thinkpad_acpi: Fix printk typo 'bluestooth'
bcmring: drop commented out line in Kconfig
btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888'
doc: raw1394: Trivial typo fix
CIFS: Don't free volume_info->UNC until we are entirely done with it.
treewide: Correct spelling of successfully in comments
...
Add documentation about NOACK tx flag usage.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The second hunk fixes rps_sock_flow_table but has to re-wrap the paragraph.
Signed-off-by: Benjamin Poirier <benjamin.poirier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>