Commit Graph

28529 Commits

Author SHA1 Message Date
Nicolas Dichtel
c2ff682a6f sit: fix an oops when IFLA_IPTUN_PROTO is not set
The use of this attribute has been added in 32b8a8e59c (sit: add IPv4 over
IPv4 support). It is optional, by default proto is IPPROTO_IPV6.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 21:18:17 -07:00
Gao feng
dc25c676f5 neigh: disallow un-init_net to change thresh of neigh
thresh and interval are global resources,
only init net can change them.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 21:13:24 -07:00
Gao feng
170d6f9954 neigh: only allow init_net to change the default neigh_parms
Though we don't export the /proc/sys/net/ipv[4,6]/neigh/default/
directory to the un-init_net, but we can still use cmd such as
"ip ntable change name arp_cache locktime 129" to change the locktime
of default neigh_parms.

This patch disallows the un-init_net to find out the neigh_table.parms.
So the un-init_net will failed to influence the init_net.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 21:13:24 -07:00
Gao feng
cf89d6b280 neigh: no need to call lookup_neigh_parms in neigh_parms_alloc
neigh_table.parms always exist and is initialized,kmemdup
can use it to create new neigh_parms, actually lookup_neigh_parms
here will return neigh_table.parms too.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 21:13:24 -07:00
Pravin B Shelar
aa310701e7 openvswitch: Add gre tunnel support.
Add gre vport implementation.  Most of gre protocol processing
is pushed to gre module. It make use of gre demultiplexer
therefore it can co-exist with linux device based gre tunnels.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 18:07:42 -07:00
Pravin B Shelar
a3e82996a8 openvswitch: Optimize flow key match for non tunnel flows.
Following patch adds start offset for sw_flow-key, so that we can
skip tunneling information in key for non-tunnel flows.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 18:07:41 -07:00
Pravin B Shelar
ffe3f43217 openvswitch: Expand action buffer size.
MAX_ACTIONS_BUFSIZE limits action list size, set tunnel action
needs extra space on action list, for now increase max actions list limit.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 18:07:41 -07:00
Pravin B Shelar
7d5437c709 openvswitch: Add tunneling interface.
Add ovs tunnel interface for set tunnel action for userspace.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 18:07:41 -07:00
Pravin B Shelar
74f84a5726 openvswitch: Copy individual actions.
Rather than validating actions and then copying all actiaons
in one block, following patch does same operation in single pass.
This validate and copy action one by one. This is required for
ovs tunneling patch.

This patch does not change any functionality.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 18:07:41 -07:00
Pravin B Shelar
3d7b46cd20 ip_tunnel: push generic protocol handling to ip_tunnel module.
Process skb tunnel header before sending packet to protocol handler.
this allows code sharing between gre and ovs gre modules.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 18:07:41 -07:00
Pravin B Shelar
0e6fbc5b6c ip_tunnels: extend iptunnel_xmit()
Refactor various ip tunnels xmit functions and extend iptunnel_xmit()
so that there is more code sharing.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 18:07:41 -07:00
Pravin B Shelar
45f2e9976c gre: export gre_handle_offloads() function.
This is required for OVS GRE offloading.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 18:07:41 -07:00
Pravin B Shelar
752f36da68 gre: export gre_build_header() function.
This is required for ovs gre module.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 18:07:40 -07:00
Pravin B Shelar
bda7bb4634 gre: Allow multiple protocol listener for gre protocol.
Currently there is only one user is allowed to register for gre
protocol.  Following patch adds de-multiplexer.  So that multiple
modules can listen on gre protocol e.g. kernel gre devices and ovs.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 18:07:40 -07:00
Pravin B Shelar
20fd4d1f04 gre: Simplify gre protocol registration locking.
Use cmpxchg() for atomic protocol registration which saves
code and data space.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 18:07:40 -07:00
David S. Miller
d98cae64e4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/wireless/ath/ath9k/Kconfig
	drivers/net/xen-netback/netback.c
	net/batman-adv/bat_iv_ogm.c
	net/wireless/nl80211.c

The ath9k Kconfig conflict was a change of a Kconfig option name right
next to the deletion of another option.

The xen-netback conflict was overlapping changes involving the
handling of the notify list in xen_netbk_rx_action().

Batman conflict resolution provided by Antonio Quartulli, basically
keep everything in both conflict hunks.

The nl80211 conflict is a little more involved.  In 'net' we added a
dynamic memory allocation to nl80211_dump_wiphy() to fix a race that
Linus reported.  Meanwhile in 'net-next' the handlers were converted
to use pre and post doit handlers which use a flag to determine
whether to hold the RTNL mutex around the operation.

However, the dump handlers to not use this logic.  Instead they have
to explicitly do the locking.  There were apparent bugs in the
conversion of nl80211_dump_wiphy() in that we were not dropping the
RTNL mutex in all the return paths, and it seems we very much should
be doing so.  So I fixed that whilst handling the overlapping changes.

To simplify the initial returns, I take the RTNL mutex after we try
to allocate 'tb'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 16:49:39 -07:00
John W. Linville
4067c666f2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-06-19 15:24:36 -04:00
John W. Linville
2f2a8846d5 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2013-06-19 14:50:44 -04:00
Johannes Berg
3a5a423bb9 nl80211: fix attrbuf access race by allocating a separate one
Since my commit 3713b4e364 ("nl80211: allow splitting wiphy
information in dumps"), nl80211_dump_wiphy() uses the global
nl80211_fam.attrbuf for parsing the incoming data. This wouldn't
be a problem if it only did so on the first dump iteration which
is locked against other commands in generic netlink, but due to
space constraints in cb->args (the needed state doesn't fit) I
decided to always parse the original message. That's racy though
since nl80211_fam.attrbuf could be used by some other parsing in
generic netlink concurrently.

For now, fix this by allocating a separate parse buffer (it's a
bit too big for the stack, currently 1448 bytes on 64-bit). For
-next, I'll change the code to parse into the global buffer in
the first round only and then allocate a smaller buffer to keep
the data in cb->args.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-06-19 18:31:20 +02:00
Matthias Schiffer
33be081a81 ipv6: ndisc: fix ndisc_send_redirect writing to the wrong skb
Since some refactoring in 5f5a011, ndisc_send_redirect called
ndisc_fill_redirect_hdr_option on the wrong skb, leading to data corruption or
in the worst case a panic when the skb_put failed.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 22:59:12 -07:00
Linus Lüssing
32de868cbc bridge: fix switched interval for MLD Query types
General Queries (the one with the Multicast Address field
set to zero / '::') are supposed to have a Maximum Response Delay
of [Query Response Interval], while for Multicast-Address-Specific
Queries it is [Last Listener Query Interval] - not the other way
round. (see RFC2710, section 7.3+7.8)

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 17:11:22 -07:00
Fernando Luis Vazquez Cao
788dfcacac vlan: restore ethtool ABI to control VLAN hardware acceleration
As part of the push to add 802.1ad server provider tagging support to the
kernel the VLAN features flags were renamed. Unfortunately the kernel name
for the VLAN hardware acceleration features that the kernel shows user space
was included in the rename, which broke ethtool (txvlan and rxvlan options
do not work). This patch restores the original names, i.e. the original ABI.
If we wanted to make clear to users that we are refering to CTAGs we can
always change ethtool's short_name and long_name for these features (for
example something along the lines of txvlan -> txvlan-ctag, tx-vlan-offload ->
tx-vlan-ctag-offload).

Cc: Patrick McHardy <kaber@trash.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 17:09:35 -07:00
Daniel Borkmann
dda9192851 net: sctp: remove SCTP_STATIC macro
SCTP_STATIC is just another define for the static keyword. It's use
is inconsistent in the SCTP code anyway and it was introduced in the
initial implementation of SCTP in 2.5. We have a regression suite in
lksctp-tools, but this is for user space only, so noone makes use of
this macro anymore. The kernel test suite for 2.5 is incompatible with
the current SCTP code anyway.

So simply Remove it, to be more consistent with the rest of the kernel
code.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 17:08:05 -07:00
Daniel Borkmann
939cfa75a0 net: sctp: get rid of t_new macro for kzalloc
t_new rather obfuscates things where everyone else is using actual
function names instead of that macro, so replace it with kzalloc,
which is the function t_new wraps.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 17:08:04 -07:00
David S. Miller
c5b248dd05 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into wireless
John W. Linville says:

====================
This will probably be the last batch of wireless fixes intended
for 3.10.  Many of these are one- or two-liners, and a couple of
others are mostly relocating existing code to avoid races or to
limit the code to effecting specific hardware, etc.

The mac80211 fixes have a couple of exceptions to the above.
Regarding those, Johannes says:

"Following davem's complaint about my patch, here's a new pull request
w/o the patch he was complaining about, but instead with the const
fix rolled into the fix.

I have a fix for radar detection, one for rate control and a workaround
for broken HT APs which is a regression fix because we didn't rely
on them to be correct before."

Johannes also sends some iwlwifi fixes:

"I picked up Nikolay's patch for the chain noise calibration bug
that seems to have been there forever, a fix from Emmanuel for
setting TX flags on BAR frames and a fix of my own to avoid printing
request_module() errors if the kernel isn't even modular. We also
have our own version of Stanislaw's fix for rate control."

Along with those...

Anderson Lizardo fixes a Bluetooth memory corruption bug when an MTU
value is set to too small of a value.

Arend van Spriel sends a revised brcmsmac bug that fixes a regression
caused by a bad return value in an earlier patch.  He also sends a
brcmfmac fix to avoid an oops when loading the driver at boot.

Daniel Drake fixes a race condition in btmrvl that causes hangs on
suspend for OLPC hardware.

Johan Hedberg adds a check to avoid sending a
HCI_Delete_Stored_Link_Key command to devices that don't support them,
avoiding some scary looking log spam.

Stanislaw Gruszka gives us a fix for iwlegacy to be able to use rates
higher than 1Mb/s on older wireless networks.  He also sends an rt2x00
fix to reinstate older tx power handling behavior for some devices
that didn't work well with the current code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 16:15:51 -07:00
David S. Miller
e00c7f1fad Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter fixes. They are targeted to the
TCP option targets, that have receive some scrinity in the last week. The
changes are:

* Fix TCPOPTSTRIP, it stopped working in the forward chain as tcp_hdr
  uses skb->transport_header, and we cannot use that in the forwarding
  case, from myself.

* Fix default IPv6 MSS in TCPMSS in case of absence of TCP MSS options,
  from Phil Oester.

* Fix missing fragmentation handling again in TCPMSS, from Phil Oester.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 16:13:45 -07:00
Ying Xue
2537af9dca tipc: remove dev_base_lock use from enable_bearer
Convert enable_bearer() to RCU locking with dev_get_by_name().

Based on a similar changeset in commit 840a185d ["aoe: remove
dev_base_lock use from aoecmd_cfg_pkts()"] -- quoting that:

  "dev_base_lock is the legacy way to lock the device list,
   and is planned to disappear. (writers hold RTNL, readers
   hold RCU lock)"

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:53:01 -07:00
Ying Xue
126c052464 tipc: fix wrong return value for link_send_sections_long routine
When skb buffer cannot be allocated in link_send_sections_long(),
-ENOMEM error code instead of -EFAULT should be returned to its
caller.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:53:01 -07:00
Ying Xue
7410f967ba tipc: make tipc_link_send_sections_fast exit earlier
Once message build request function returns invalid code, the
process of sending message cannot continue. So in case of message
build failure, tipc_link_send_sections_fast() should return
immediately.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:53:01 -07:00
Ying Xue
796c75d0d3 tipc: enhance priority of link protocol packet
pfifo_fast is set as default traffic class queueing discipline. This
queue has three so called "bands". Within each band, FIFO rules apply.
However, as long as there are packets waiting in band 0, band 1 won't
be processed.

Now all kind of TIPC type packet priorities are never set, that is,
their priorities are 0, so they are mapped to band 1 of pfifo_fast
qdisc. But, especially during link congestion, if link protocol packet
can be sent out as earlier as possible than other type of packets so
that protocol packet can arrive at peer endpoint in time, the peer
will timely reset its link timeout timer to keep the link alive.
So enhancing the priority of link protocol packets can meet the
specific demand to avoid unnecessary link reset due to a transient
link congestion.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:53:01 -07:00
Paul Gortmaker
ae8509c420 tipc: cosmetic realignment of function arguments
No runtime code changes here.  Just a realign of the function
arguments to start where the 1st one was, and fit as many args
as can be put in an 80 char line.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:53:01 -07:00
Ying Xue
c0fee8aca7 tipc: save sock structure pointer instead of void pointer to tipc_port
Directly save sock structure pointer instead of void pointer to avoid
unnecessary cast conversions.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:53:01 -07:00
Ying Xue
28e5297281 tipc: convert config_lock from spinlock to mutex
As the configuration server is now running under process context,
it's unnecessary for us to have a spinlock serializing the TIPC
configuration process. Instead, we replace it with a mutex lock,
which gives us more freedom. For instance, we can now call
pre-emptable functions within the protected area.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:53:01 -07:00
Ying Xue
3c5db8e4ec tipc: rename tipc_createport_raw to tipc_createport
After the removal of the native API, there is now only one way to
to create a TIPC port instance -- the function tipc_createport_raw().
We make it more readable by renaming it to tipc_createport().

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:53:01 -07:00
Ying Xue
f1733d7580 tipc: remove user_port instance from tipc_port structure
After the native API has been completely removed, the 'user_port'
field in struct tipc_port becomes unused, and can be removed.
As a consequence, the "usrmem" argument in tipc_msg_build() is no
longer needed, and so we remove that one too.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:53:00 -07:00
Ying Xue
198d73b82b tipc: delete code orphaned by new server infrastructure
Having completed the conversion of the topology server and
configuration server to use the new server infrastructure,
the following functions become unused, and can be deleted:

   - tipc_createport()
   - port_wakeup_sh()
   - port_dispatcher()
   - port_dispatcher_sigh()
   - tipc_send_buf_fast()
   - tipc_send_buf2port

Additionally, the following variables become orphaned,
and can be deleted:

   - tipc_msg_err_event
   - tipc_named_msg_err_event
   - tipc_conn_shutdown_event
   - tipc_msg_event
   - tipc_named_msg_event
   - tipc_conn_msg_event
   - tipc_continue_event
   - msg_queue_head
   - msg_queue_tail
   - queue_lock

Deletion is done here in a separate commit in order to allow
the actual conversion changes to be more easily viewed.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:53:00 -07:00
Ying Xue
7d0ab17b74 tipc: convert configuration server to use new server facility
As the new socket-based TIPC server infrastructure has been
introduced, we can now convert the configuration server to use
it.  Then we can take future steps to simplify the configuration
server locking policy.

Some minor reordering of initialization is done, due to the
dependency on having tipc_socket_init completed.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:53:00 -07:00
Ying Xue
13a2e89873 tipc: convert topology server to use new server facility
As the new TIPC server infrastructure has been introduced, we can
now convert the TIPC topology server to it.  We get two benefits
from doing this:

1) It simplifies the topology server locking policy.  In the
original locking policy, we placed one spin lock pointer in the
tipc_subscriber structure to reuse the lock of the subscriber's
server port, controlling access to members of tipc_subscriber
instance.  That is, we only used one lock to ensure both
tipc_port and tipc_subscriber members were safely accessed.

Now we introduce another spin lock for tipc_subscriber structure
only protecting themselves, to get a finer granularity locking
policy.  Moreover, the change will allow us to make the topology
server code more readable and maintainable.

2) It fixes a bug where sent subscription events may be lost when
the topology port is congested.  Using the new service, the
topology server now queues sent events into an outgoing buffer,
and then wakes up a sender process which has been blocked in
workqueue context.  The process will keep picking events from the
buffer and send them to their respective subscribers, using the
kernel socket interface, until the buffer is empty. Even if the
socket is congested during transmission there is no risk that
events may be dropped, since the sender process may block when
needed.

Some minor reordering of initialization is done, since we now
have a scenario where the topology server must be started after
socket initialization has taken place, as the former depends
on the latter.  And overall, we see a simplification of the
TIPC subscriber code in making this changeover.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:53:00 -07:00
Ying Xue
c5fa7b3cf3 tipc: introduce new TIPC server infrastructure
TIPC has two internal servers, one providing a subscription
service for topology events, and another providing the
configuration interface. These servers have previously been running
in BH context, accessing the TIPC-port (aka native) API directly.
Apart from these servers, even the TIPC socket implementation is
partially built on this API.

As this API may simultaneously be called via different paths and in
different contexts, a complex and costly lock policiy is required
in order to protect TIPC internal resources.

To eliminate the need for this complex lock policiy, we introduce
a new, generic service API that uses kernel sockets for message
passing instead of the native API. Once the toplogy and configuration
servers are converted to use this new service, all code pertaining
to the native API can be removed. This entails a significant
reduction in code amount and complexity, and opens up for a complete
rework of the locking policy in TIPC.

The new service also solves another problem:

As the current topology server works in BH context, it cannot easily
be blocked when sending of events fails due to congestion. In such
cases events may have to be silently dropped, something that is
unacceptable. Therefore, the new service keeps a dedicated outbound
queue receiving messages from BH context. Once messages are
inserted into this queue, we will immediately schedule a work from a
special workqueue. This way, messages/events from the topology server
are in reality sent in process context, and the server can block
if necessary.

Analogously, there is a new workqueue for receiving messages. Once a
notification about an arriving message is received in BH context, we
schedule a work from the receive workqueue to do the job of
receiving the message in process context.

As both sending and receive messages are now finished in processes,
subscribed events cannot be dropped any more.

As of this commit, this new server infrastructure is built, but
not actually yet called by the existing TIPC code, but since the
conversion changes required in order to use it are significant,
the addition is kept here as a separate commit.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:53:00 -07:00
Erik Hugne
5d21cb70db tipc: allow implicit connect for stream sockets
TIPC's implied connect feature, aka piggyback connect, allows
applications to save one syscall and all SYN/SYN-ACK signalling
overhead when setting up a connection.  Until now, this has only
been supported for SEQPACKET sockets.  Here, we make it possible
to use this feature even with stream sockets.

At the connecting side, the connection is completed when the
first data message arrives from the accepting peer.  This means
that we must allow the connecting user to call blocking recv()
before the socket has reached state SS_CONNECTED.  So we must must
relax the state machine check at recv_stream(), and allow the
recv() call even if socket is in state SS_CONNECTING.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:53:00 -07:00
Ying Xue
cc79dd1ba9 tipc: change socket buffer overflow control to respect sk_rcvbuf
As per feedback from the netdev community, we change the buffer
overflow protection algorithm in receiving sockets so that it
always respects the nominal upper limit set in sk_rcvbuf.

Instead of scaling up from a small sk_rcvbuf value, which leads to
violation of the configured sk_rcvbuf limit, we now calculate the
weighted per-message limit by scaling down from a much bigger value,
still in the same field, according to the importance priority of the
received message.

To allow for administrative tunability of the socket receive buffer
size, we create a tipc_rmem sysctl variable to allow the user to
configure an even bigger value via sysctl command.  It is a size of
three (min/default/max) to be consistent with things like tcp_rmem.

By default, the value initialized in tipc_rmem[1] is equal to the
receive socket size needed by a TIPC_CRITICAL_IMPORTANCE message.
This value is also set as the default value of sk_rcvbuf.

Originally-by: Jon Maloy <jon.maloy@ericsson.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
[Ying: added sysctl variation to Jon's original patch]
Signed-off-by: Ying Xue <ying.xue@windriver.com>
[PG: don't compile sysctl.c if not config'd; add Documentation]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:53:00 -07:00
Eliezer Tamir
dafcc4380d net: add socket option for low latency polling
adds a socket option for low latency polling.
This allows overriding the global sysctl value with a per-socket one.
Unexport sysctl_net_ll_poll since for now it's not needed in modules.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:48:14 -07:00
Eliezer Tamir
89bf1b5a68 net: remove NET_LL_RX_POLL config menue
Remove NET_LL_RX_POLL from the config menu.
Change default to y.
Busy polling still needs to be enabled at run time.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:48:14 -07:00
Eliezer Tamir
9a3c71aa80 net: convert low latency sockets to sched_clock()
Use sched_clock() instead of get_cycles().
We can use sched_clock() because we don't care much about accuracy.
Remove the dependency on X86_TSC

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:48:14 -07:00
Eliezer Tamir
eb6db62282 net: change sysctl_net_ll_poll into an unsigned int
There is no reason for sysctl_net_ll_poll to be an unsigned long.
Change it into an unsigned int.
Fix the proc handler.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 15:48:13 -07:00
John W. Linville
722a300bf8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-06-17 13:44:01 -04:00
Daniel Borkmann
2e0c9e7911 net: sctp: sctp_association_init: put refs in reverse order
In case we need to bail out for whatever reason during assoc
init, we call sctp_endpoint_put() and then sock_put(), however,
we've hold both refs in reverse, non-symmetric order, so first
sctp_endpoint_hold() and then sock_hold().

Reverse this, so that in an error case we have sock_put() and then
sctp_endpoint_put(). Actually shouldn't matter too much, since both
cleanup paths do the right thing, but that way, it is more consistent
with the rest of the code.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14 15:38:36 -07:00
Daniel Borkmann
c164b83814 net: sctp: minor: remove variable in sctp_init_sock
It's only used at this one time, so we could remove it as well.
This is valid and also makes it more explicit/obvious that in case
of error the sp->ep is NULL here, i.e. for the sctp_destroy_sock()
check that was recently added.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14 15:38:36 -07:00
Daniel Borkmann
405426f6ca net: sctp: sctp_sf_do_prm_asoc: do SCTP_CMD_INIT_CHOOSE_TRANSPORT first
While this currently cannot trigger any NULL pointer dereference in
sctp_seq_dump_local_addrs(), better change the order of commands to
prevent a future bug to happen. Although we first add SCTP_CMD_NEW_ASOC
and then set the SCTP_CMD_INIT_CHOOSE_TRANSPORT, it is okay for now,
since this primitive is only called by sctp_connect() or sctp_sendmsg()
with sctp_assoc_add_peer() set first. However, lets do this precaution
and first set the transport and then add it to the association hashlist
to prevent in future something to possibly triggering this.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14 15:38:36 -07:00
Daniel Borkmann
f9e42b8535 net: sctp: sideeffect: throw BUG if primary_path is NULL
This clearly states a BUG somewhere in the SCTP code as e.g. fixed once
in f28156335 ("sctp: Use correct sideffect command in duplicate cookie
handling"). If this ever happens, throw a trace in the sideeffect engine
where assocs clearly must have a primary_path assigned.

When in sctp_seq_dump_local_addrs() also throw a WARN and bail out since
we do not need to panic for printing this one asterisk. Also, it will
avoid the not so obvious case when primary != NULL test passes and at a
later point in time triggering a NULL ptr dereference caused by primary.
While at it, also fix up the white space.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14 15:38:36 -07:00