kernel-ark/net
Daniel Borkmann 492135557d tcp: add rfc3168, section 6.1.1.1. fallback
This work as a follow-up of commit f7b3bec6f5 ("net: allow setting ecn
via routing table") and adds RFC3168 section 6.1.1.1. fallback for outgoing
ECN connections. In other words, this work adds a retry with a non-ECN
setup SYN packet, as suggested from the RFC on the first timeout:

  [...] A host that receives no reply to an ECN-setup SYN within the
  normal SYN retransmission timeout interval MAY resend the SYN and
  any subsequent SYN retransmissions with CWR and ECE cleared. [...]

Schematic client-side view when assuming the server is in tcp_ecn=2 mode,
that is, Linux default since 2009 via commit 255cac91c3 ("tcp: extend
ECN sysctl to allow server-side only ECN"):

 1) Normal ECN-capable path:

    SYN ECE CWR ----->
                <----- SYN ACK ECE
            ACK ----->

 2) Path with broken middlebox, when client has fallback:

    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)
            SYN ----->
                <----- SYN ACK
            ACK ----->

In case we would not have the fallback implemented, the middlebox drop
point would basically end up as:

    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)
    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)
    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)

In any case, it's rather a smaller percentage of sites where there would
occur such additional setup latency: it was found in end of 2014 that ~56%
of IPv4 and 65% of IPv6 servers of Alexa 1 million list would negotiate
ECN (aka tcp_ecn=2 default), 0.42% of these webservers will fail to connect
when trying to negotiate with ECN (tcp_ecn=1) due to timeouts, which the
fallback would mitigate with a slight latency trade-off. Recent related
paper on this topic:

  Brian Trammell, Mirja Kühlewind, Damiano Boppart, Iain Learmonth,
  Gorry Fairhurst, and Richard Scheffenegger:
    "Enabling Internet-Wide Deployment of Explicit Congestion Notification."
    Proc. PAM 2015, New York.
  http://ecn.ethz.ch/ecn-pam15.pdf

Thus, when net.ipv4.tcp_ecn=1 is being set, the patch will perform RFC3168,
section 6.1.1.1. fallback on timeout. For users explicitly not wanting this
which can be in DC use case, we add a net.ipv4.tcp_ecn_fallback knob that
allows for disabling the fallback.

tp->ecn_flags are not being cleared in tcp_ecn_clear_syn() on output, but
rather we let tcp_ecn_rcv_synack() take that over on input path in case a
SYN ACK ECE was delayed. Thus a spurious SYN retransmission will not prevent
ECN being negotiated eventually in that case.

Reference: https://www.ietf.org/proceedings/92/slides/slides-92-iccrg-1.pdf
Reference: https://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
Signed-off-by: Brian Trammell <trammell@tik.ee.ethz.ch>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Dave That <dave.taht@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-19 16:53:37 -04:00
..
6lowpan
9p 9p: patches for 4.1 merge window 2015-04-18 17:45:30 -04:00
802
8021q vlan: implement ndo_get_iflink 2015-04-02 14:05:00 -04:00
appletalk net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
atm net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
ax25 net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
batman-adv dev: introduce dev_get_iflink() 2015-04-02 14:04:59 -04:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-05-13 14:31:43 -04:00
bridge bridge_netfilter: No ICMP packet on IPv4 fragmentation error 2015-05-19 00:15:39 -04:00
caif net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
can net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
ceph net: Add a struct net parameter to sock_create_kern 2015-05-11 10:50:17 -04:00
core netns: make nsid_lock per net 2015-05-17 23:41:11 -04:00
dcb net/dcb: Add IEEE QCN attribute 2015-03-06 21:50:02 -05:00
dccp inet: fix possible panic in reqsk_queue_unlink() 2015-04-24 11:39:15 -04:00
decnet net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
dns_resolver
dsa switchdev: don't use anonymous union on switchdev attr/obj structs 2015-05-13 14:20:59 -04:00
ethernet flow_dissect: use programable dissector in skb_flow_dissect and friends 2015-05-13 15:19:47 -04:00
hsr
ieee802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-05-13 14:31:43 -04:00
ipv4 tcp: add rfc3168, section 6.1.1.1. fallback 2015-05-19 16:53:37 -04:00
ipv6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-05-13 14:31:43 -04:00
ipx net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
irda net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
iucv net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
key net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
l2tp net: Modify sk_alloc to not reference count the netns of kernel sockets. 2015-05-11 10:50:18 -04:00
lapb
llc net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
mac80211 This just has a few fixes: 2015-05-19 16:43:17 -04:00
mac802154 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2015-05-09 15:51:00 -04:00
mpls mpls: Change reserved label names to be consistent with netbsd 2015-05-09 22:29:50 -04:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2015-05-18 14:47:36 -04:00
netlabel netlink: implement nla_put_in_addr and nla_put_in6_addr 2015-03-31 13:58:35 -04:00
netlink netlink: Use random autobind rover 2015-05-17 23:43:31 -04:00
netrom net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
nfc net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
openvswitch geneve: Rename support library as geneve_core 2015-05-13 15:59:13 -04:00
packet net-packet: fix null pointer exception in rollover mode 2015-05-17 22:41:38 -04:00
phonet net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-05-13 14:31:43 -04:00
rfkill
rose net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
rxrpc net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
sched cls_flower: Fix compile error 2015-05-14 13:34:35 -04:00
sctp net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
sunrpc svcrpc: fix potential GSSX_ACCEPT_SEC_CONTEXT decoding failures 2015-05-04 12:02:40 -04:00
switchdev switchdev: add support for fdb add/del/dump via switchdev_port_obj ops. 2015-05-17 22:49:09 -04:00
tipc tipc: use sock_create_kern interface to create kernel socket 2015-05-14 13:39:33 -04:00
unix net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
vmw_vsock net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
wimax
wireless cfg80211: change GO_CONCURRENT to IR_CONCURRENT for STA 2015-05-06 15:50:02 +02:00
x25 net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
xfrm net: make skb_dst_pop routine static 2015-05-12 23:19:49 -04:00
compat.c net: switch importing msghdr from userland to {compat_,}import_iovec() 2015-04-09 00:02:26 -04:00
Kconfig net: add CONFIG_NET_INGRESS to enable ingress filtering 2015-05-14 01:10:05 -04:00
Makefile mpls: Refactor how the mpls module is built 2015-03-04 00:26:06 -05:00
socket.c net: Add a struct net parameter to sock_create_kern 2015-05-11 10:50:17 -04:00
sysctl_net.c