kernel-ark/include
Alexei Starovoitov 969bf05eb3 bpf: direct packet access
Extended BPF carried over two instructions from classic to access
packet data: LD_ABS and LD_IND. They're highly optimized in JITs,
but due to their design they have to do length check for every access.
When BPF is processing 20M packets per second single LD_ABS after JIT
is consuming 3% cpu. Hence the need to optimize it further by amortizing
the cost of 'off < skb_headlen' over multiple packet accesses.
One option is to introduce two new eBPF instructions LD_ABS_DW and LD_IND_DW
with similar usage as skb_header_pointer().
The kernel part for interpreter and x64 JIT was implemented in [1], but such
new insns behave like old ld_abs and abort the program with 'return 0' if
access is beyond linear data. Such hidden control flow is hard to workaround
plus changing JITs and rolling out new llvm is incovenient.

Therefore allow cls_bpf/act_bpf program access skb->data directly:
int bpf_prog(struct __sk_buff *skb)
{
  struct iphdr *ip;

  if (skb->data + sizeof(struct iphdr) + ETH_HLEN > skb->data_end)
      /* packet too small */
      return 0;

  ip = skb->data + ETH_HLEN;

  /* access IP header fields with direct loads */
  if (ip->version != 4 || ip->saddr == 0x7f000001)
      return 1;
  [...]
}

This solution avoids introduction of new instructions. llvm stays
the same and all JITs stay the same, but verifier has to work extra hard
to prove safety of the above program.

For XDP the direct store instructions can be allowed as well.

The skb->data is NET_IP_ALIGNED, so for common cases the verifier can check
the alignment. The complex packet parsers where packet pointer is adjusted
incrementally cannot be tracked for alignment, so allow byte access in such cases
and misaligned access on architectures that define efficient_unaligned_access

[1] https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/?h=ld_abs_dw

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:53 -04:00
..
acpi
asm-generic asm-generic/futex: Re-enable preemption in futex_atomic_cmpxchg_inatomic() 2016-04-21 11:06:09 +02:00
clocksource
crypto
drm drm: Loongson-3 doesn't fully support wc memory 2016-04-22 10:24:11 +10:00
dt-bindings
keys
kvm
linux net/mlx4: Avoid wrong virtual mappings 2016-05-05 23:23:05 -04:00
math-emu
media [media] media: vb2: Fix regression on poll() for RW mode 2016-04-25 10:21:23 -03:00
memory
misc
net Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 2016-05-04 17:13:34 -04:00
pcmcia
ras
rdma IB/security: Restrict use of the write() interface 2016-04-28 12:03:16 -04:00
rxrpc
scsi
soc
sound ALSA: hda - Update BCLK also at hotplug for i915 HSW/BDW 2016-04-26 10:11:11 +02:00
target
trace perf, bpf: minimize the size of perf_trace_() tracepoint handler 2016-04-21 13:48:20 -04:00
uapi bpf: direct packet access 2016-05-06 16:01:53 -04:00
video
xen
Kbuild