This patch adds support for building PMU driver as module. It exports
the functions perf_pmu_{register,unregister}() and adds reference tracking
for the PMU driver module.
When the PMU driver is built as a module, each active event of the PMU
holds a reference to the driver module.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1395133004-23205-1-git-send-email-zheng.z.yan@intel.com
Cc: eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
CPUs which should support the RAPL counters according to
Family/Model/Stepping may still issue #GP when attempting to access
the RAPL MSRs. This may happen when Linux is running under KVM and
we are passing-through host F/M/S data, for example. Use rdmsrl_safe
to first access the RAPL_POWER_UNIT MSR; if this fails, do not
attempt to use this PMU.
Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com
Cc: zheng.z.yan@intel.com
Cc: eranian@google.com
Cc: ak@linux.intel.com
Cc: linux-kernel@vger.kernel.org
[ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull uprobes fixes and cleanups from Oleg Nesterov:
"Any probed jmp/call can kill the application, see the changelog in 11/15."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change branch_setup_xol_ops() to simply use opc1 = OPCODE2(insn) - 0x10
if OPCODE1() == 0x0f; this matches the "short" jmp which checks the same
condition.
Thanks to lib/insn.c, it does the rest correctly. branch->ilen/offs are
correct no matter if this jmp is "near" or "short".
Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Teach branch_emulate_op() to emulate the conditional "short" jmp's which
check regs->flags.
Note: this doesn't support jcxz/jcexz, loope/loopz, and loopne/loopnz.
They all are rel8 and thus they can't trigger the problem, but perhaps
we will add the support in future just for completeness.
Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
See the previous "Emulate unconditional relative jmp's" which explains
why we can not execute "jmp" out-of-line, the same applies to "call".
Emulating of rip-relative call is trivial, we only need to additionally
push the ret-address. If this fails, we execute this instruction out of
line and this should trigger the trap, the probed application should die
or the same insn will be restarted if a signal handler expands the stack.
We do not even need ->post_xol() for this case.
But there is a corner (and almost theoretical) case: another thread can
expand the stack right before we execute this insn out of line. In this
case it hit the same problem we are trying to solve. So we simply turn
the probed insn into "call 1f; 1:" and add ->post_xol() which restores
->sp and restarts.
Many thanks to Jonathan who finally found the standalone reproducer,
otherwise I would never resolve the "random SIGSEGV's under systemtap"
bug-report. Now that the problem is clear we can write the simplified
test-case:
void probe_func(void), callee(void);
int failed = 1;
asm (
".text\n"
".align 4096\n"
".globl probe_func\n"
"probe_func:\n"
"call callee\n"
"ret"
);
/*
* This assumes that:
*
* - &probe_func = 0x401000 + a_bit, aligned = 0x402000
*
* - xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000
* as xol_add_vma() asks; the 1st slot = 0x7fffffffe080
*
* so we can target the non-canonical address from xol_vma using
* the simple math below, 100 * 4096 is just the random offset
*/
asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1 + 100 * 4096\n");
void callee(void)
{
failed = 0;
}
int main(void)
{
probe_func();
return failed;
}
It SIGSEGV's if you probe "probe_func" (although this is not very reliable,
randomize_va_space/etc can change the placement of xol area).
Note: as Denys Vlasenko pointed out, amd and intel treat "callw" (0x66 0xe8)
differently. This patch relies on lib/insn.c and thus implements the intel's
behaviour: 0x66 is simply ignored. Fortunately nothing sane should ever use
this insn, so we postpone the fix until we decide what should we do; emulate
or not, support or not, etc.
Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Finally we can kill the ugly (and very limited) code in __skip_sstep().
Just change branch_setup_xol_ops() to treat "nop" as jmp to the next insn.
Thanks to lib/insn.c, it is clever enough. OPCODE1() == 0x90 includes
"(rep;)+ nop;" at least, and (afaics) much more.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Currently we always execute all insns out-of-line, including relative
jmp's and call's. This assumes that even if regs->ip points to nowhere
after the single-step, default_post_xol_op(UPROBE_FIX_IP) logic will
update it correctly.
However, this doesn't work if this regs->ip == xol_vaddr + insn_offset
is not canonical. In this case CPU generates #GP and general_protection()
kills the task which tries to execute this insn out-of-line.
Now that we have uprobe_xol_ops we can teach uprobes to emulate these
insns and solve the problem. This patch adds branch_xol_ops which has
a single branch_emulate_op() hook, so far it can only handle rel8/32
relative jmp's.
TODO: move ->fixup into the union along with rip_rela_target_address.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jonathan Lebon <jlebon@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
1. Add the trivial sizeof_long() helper and change other callers of
is_ia32_task() to use it.
TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
not necessarily mean 32bit. Fortunately syscall-like insns can't be
probed so it actually works, but it would be better to rename and
use is_ia32_frame().
2. As Jim pointed out "ncopied" in arch_uretprobe_hijack_return_addr()
and adjust_ret_addr() should be named "nleft". And in fact only the
last copy_to_user() in arch_uretprobe_hijack_return_addr() actually
needs to inspect the non-zero error code.
TODO: adjust_ret_addr() should die. We can always calculate the value
we need to write into *regs->sp, just UPROBE_FIX_CALL should record
insn->length.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
SIGILL after the failed arch_uprobe_post_xol() should only be used as
a last resort, we should try to restart the probed insn if possible.
Currently only adjust_ret_addr() can fail, and this can only happen if
another thread unmapped our stack after we executed "call" out-of-line.
Most probably the application if buggy, but even in this case it can
have a handler for SIGSEGV/etc. And in theory it can be even correct
and do something non-trivial with its memory.
Of course we can't restart unconditionally, so arch_uprobe_post_xol()
does this only if ->post_xol() returns -ERESTART even if currently this
is the only possible error.
default_post_xol_op(UPROBE_FIX_CALL) can always restart, but as Jim
pointed out it should not forget to pop off the return address pushed
by this insn executed out-of-line.
Note: this is not "perfect", we do not want the extra handler_chain()
after restart, but I think this is the best solution we can realistically
do without too much uglifications.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Currently the error from arch_uprobe_post_xol() is silently ignored.
This doesn't look good and this can lead to the hard-to-debug problems.
1. Change handle_singlestep() to loudly complain and send SIGILL.
Note: this only affects x86, ppc/arm can't fail.
2. Change arch_uprobe_post_xol() to call arch_uprobe_abort_xol() and
avoid TF games if it is going to return an error.
This can help to to analyze the problem, if nothing else we should
not report ->ip = xol_slot in the core-file.
Note: this means that handle_riprel_post_xol() can be called twice,
but this is fine because it is idempotent.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
arch_uprobe_analyze_insn() calls handle_riprel_insn() at the start,
but only "0xff" and "default" cases need the UPROBE_FIX_RIP_ logic.
Move the callsite into "default" case and change the "0xff" case to
fall-through.
We are going to add the various hooks to handle the rip-relative
jmp/call instructions (and more), we need this change to enforce the
fact that the new code can not conflict with is_riprel_insn() logic
which, after this change, can only be used by default_xol_ops.
Note: arch_uprobe_abort_xol() still calls handle_riprel_post_xol()
directly. This is fine unless another _xol_ops we may add later will
need to reuse "UPROBE_FIX_RIP_AX|UPROBE_FIX_RIP_CX" bits in ->fixup.
In this case we can add uprobe_xol_ops->abort() hook, which (perhaps)
we will need anyway in the long term.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Introduce arch_uprobe->ops pointing to the "struct uprobe_xol_ops",
move the current UPROBE_FIX_{RIP*,IP,CALL} code into the default
set of methods and change arch_uprobe_pre/post_xol() accordingly.
This way we can add the new uprobe_xol_ops's to handle the insns
which need the special processing (rip-relative jmp/call at least).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
No functional changes. Preparation to simplify the review of the next
change. Just reorder the code in arch_uprobe_pre/post_xol() functions
so that UPROBE_FIX_{RIP_*,IP,CALL} logic goes to the end.
Also change arch_uprobe_pre_xol() to use utask instead of autask, to
make the code more symmetrical with arch_uprobe_post_xol().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cosmetic. Move pre_xol_rip_insn() and handle_riprel_post_xol() up to
the closely related handle_riprel_insn(). This way it is simpler to
read and understand this code, and this lessens the number of ifdef's.
While at it, update the comment in handle_riprel_post_xol() as Jim
suggested.
TODO: rename them somehow to make the naming consistent.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Kill the "mm->context.ia32_compat" check in handle_riprel_insn(), if
it is true insn_rip_relative() must return false. validate_insn_bits()
passed "ia32_compat" as !x86_64 to insn_init(), and insn_rip_relative()
checks insn->x86_64.
Also, remove the no longer needed "struct mm_struct *mm" argument and
the unnecessary "return" at the end.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
No functional changes, preparation.
Shift the code from prepare_fixups() to arch_uprobe_analyze_insn()
with the following modifications:
- Do not call insn_get_opcode() again, it was already called
by validate_insn_bits().
- Move "case 0xea" up. This way "case 0xff" can fall through
to default case.
- change "case 0xff" to use the nested "switch (MODRM_REG)",
this way the code looks a bit simpler.
- Make the comments look consistent.
While at it, kill the initialization of rip_rela_target_address and
->fixups, we can rely on kzalloc(). We will add the new members into
arch_uprobe, it would be better to assume that everything is zero by
default.
TODO: cleanup/fix the mess in validate_insn_bits() paths:
- validate_insn_64bits() and validate_insn_32bits() should be
unified.
- "ifdef" is not used consistently; if good_insns_64 depends
on CONFIG_X86_64, then probably good_insns_32 should depend
on CONFIG_X86_32/EMULATION
- the usage of mm->context.ia32_compat looks wrong if the task
is TIF_X32.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
UPROBE_COPY_INSN, UPROBE_SKIP_SSTEP, and uprobe->flags must die. This
patch kills UPROBE_SKIP_SSTEP. I never understood why it was added;
not only it doesn't help, it harms.
It can only help to avoid arch_uprobe_skip_sstep() if it was already
called before and failed. But this is ugly, if we want to know whether
we can emulate this instruction or not we should do this analysis in
arch_uprobe_analyze_insn(), not when we hit this probe for the first
time.
And in fact this logic is simply wrong. arch_uprobe_skip_sstep() can
fail or not depending on the task/register state, if this insn can be
emulated but, say, put_user() fails we need to xol it this time, but
this doesn't mean we shouldn't try to emulate it when this or another
thread hits this bp next time.
And this is the actual reason for this change. We need to emulate the
"call" insn, but push(return-address) can obviously fail.
Per-arch notes:
x86: __skip_sstep() can only emulate "rep;nop". With this
change it will be called every time and most probably
for no reason.
This will be fixed by the next changes. We need to
change this suboptimal code anyway.
arm: Should not be affected. It has its own "bool simulate"
flag checked in arch_uprobe_skip_sstep().
ppc: Looks like, it can emulate almost everything. Does it
actually need to record the fact that emulate_step()
failed? Hopefully not. But if yes, it can add the ppc-
specific flag into arch_uprobe.
TODO: rename arch_uprobe_skip_sstep() to arch_uprobe_emulate_insn(),
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: David A. Long <dave.long@linaro.org>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Current kprobes in-kernel page fault handler doesn't
expect that its single-stepping can be interrupted by
an NMI handler which may cause a page fault(e.g. perf
with callback tracing).
In that case, the page-fault handled by kprobes and it
misunderstands the page-fault has been caused by the
single-stepping code and tries to recover IP address
to probed address.
But the truth is the page-fault has been caused by the
NMI handler, and do_page_fault failes to handle real
page fault because the IP address is modified and
causes Kernel BUGs like below.
----
[ 2264.726905] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 2264.727190] IP: [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40
To handle this correctly, I fixed the kprobes fault
handler to ensure the faulted ip address is its own
single-step buffer instead of checking current kprobe
state.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Sandeepa Prabhu <sandeepa.prabhu@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: fche@redhat.com
Cc: systemtap@sourceware.org
Link: http://lkml.kernel.org/r/20140417081644.26341.52351.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
User visible:
. Add --percentage option to control absolute/relative percentage output (Namhyung Kim)
Developer stuff:
. Add --list-cmds to 'kmem', 'mem', 'lock' and 'sched', for use by completion scripts (Ramkumar Ramachandra)
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTTsj7AAoJEPZqUSBWB3s98OEP/AzBJviyJSRqf/DNxMZpnuyT
E1Pd9R8IUT50sWGklEYZcjeCG8JPF46sHcuf5v/tXCwni4u0muJ2imcOJzBKf4it
XVpXA+E7bNt/oAr+8RVcZ0mhusX2DqafCZOVRTnuUv5y5JG7cWd9fC8qW2Hu6oYe
SOfjNRpwi9tV6aCkJUMbJT6SaW+rMGSxHLxv77KK/uP50Ekd/H06TyVspsJRfFeB
rckEIm8p3svQWX7EAHs09p7BxAarTbrK0TWzgfL3YEnCee/y0y9xxN32Pdc8fGj5
bKx6uRe5HWDcZZZvDIIDQeZLECwBu4xJvQPeRE9AeDs8NgXyvnNvljED+RSXcQ/F
zxnxYmviZSz/15B8fQ3TWIv+fd1rsc7hnbnmY1U+vPWdRgsc0spTFKZiKtDkLIBq
jZI+vwhgznDCjajBdKTAVQj6lErupP9ML6HIRcQPWu1YHoDbPiM+ZZ0xskIcsOuE
AfMAu5reYXW4bPGIA80AEEKLRjTjxcYlNadyQOHYZO/fE+SPs7iW+c/sm/+boN+r
dWX8NeK6Nfz45wunBIxvXaCM/MnBvTuH657fZQlKuf43radnrduZRjf/Q+yDUEfQ
VovatgZlK7PRJiIfBUfYfG+ov1cr0F9qJKBYyC8fIrNtWNsvfIwFd3ubkOQBW/qG
Q0PYdJPNL0furiI0VWO0
=cjaT
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf into perf/core
Pull perf/core improvements and fixes from Jiri Olsa:
User visible changes:
* Add --percentage option to control absolute/relative percentage output (Namhyung Kim)
Plumbing changes:
* Add --list-cmds to 'kmem', 'mem', 'lock' and 'sched', for use by completion scripts (Ramkumar Ramachandra)
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Ingo Molnar:
"Various fixes:
- reboot regression fix
- build message spam fix
- GPU quirk fix
- 'make kvmconfig' fix
plus the wire-up of the renameat2() system call on i386"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Remove the PCI reboot method from the default chain
x86/build: Supress "Nothing to be done for ..." messages
x86/gpu: Fix sign extension issue in Intel graphics stolen memory quirks
x86/platform: Fix "make O=dir kvmconfig"
i386: Wire up the renameat2() syscall
Pull perf fixes from Ingo Molnar:
"Tooling fixes, plus a simple hardware-enablement patch for the Intel
RAPL PMU (energy use measurement) on Haswell CPUs, which I hope is
still fine at this stage"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Instead of redirecting flex output, use -o
perf tools: Fix double free in perf test 21 (code-reading.c)
perf stat: Initialize statistics correctly
perf bench: Set more defaults in the 'numa' suite
perf bench: Fix segfault at the end of an 'all' execution
perf bench: Update manpage to mention numa and futex
perf probe: Use dwarf_getcfi_elf() instead of dwarf_getcfi()
perf probe: Fix to handle errors in line_range searching
perf probe: Fix --line option behavior
perf tools: Pick up libdw without explicit LIBDW_DIR
MAINTAINERS: Change e-mail to kernel.org one
perf callchains: Disable unwind libraries when libelf isn't found
tools lib traceevent: Do not call warning() directly
tools lib traceevent: Print event name when show warning if possible
perf top: Fix documentation of invalid -s option
perf/x86: Enable DRAM RAPL support on Intel Haswell
- Fix a couple of barnsjukdomar on the Rockchip driver.
- Remove an idiotic debug print I happened to leave behind
in the Nomadik driver.
- Fixup the Qualcomm MSM interrupt handling code for the
TLMM v2.
- Three patches renaming the Broadcom Capri driver to
BCM28155. This has been falling between the chairs for
some time due to some cross-tree synchronization
misunderstandings, now I'm fed up with this and just
rename it in this -rc1 phase.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTTPPzAAoJEEEQszewGV1zSBgP/iMS9y9LAouPmzkXseCeT4aM
eb2pEi1wtOSWCeGNMIx4X1GRkbHS+T+5Wk6dmh46RUn/8b1HY4gkLoJLnrQGML1l
zS9tdDyURGmGYuVAr0ghLq0LTruvcViCQageRzO8yllTkJb8Tf6rfKE2+y9BsGRH
CdBIE9/XYP2Z2Wwd0fLAPMFpa9wPz8eNeF7XGyQ20+DSuRzNMmDq6AUhmlCfIMnL
TxlJIT1vpAAt/e3wcRvmr2n/Nrlz28ajP/VmzSm2dSqTajy8ofWqgFwQLiItJJ3q
VuJto3eKy+xGT4IVO+ozXCZd0kDgaAeiz7PNWHpbFBec0y4QFmVFxtPpw/Zff7RH
136Vh0KahX47TaJ1GGvB5622OLjsQzwH2TY24Sn9WRzNT8VS0pv2F3RQk8tVrWrd
fquQksuMEaCay+MHcBhI1mJlQcgTFJNsenmY8KOIjFykeBz5x6bOtBkbOCjzuogr
yVgaOW/zV7b0zFQ6Vv9eZFf7WYhBXE7w1Im5D2EMR9mJpNBWbj9A8GQpCjc0+VXB
jN2hWmj5qQtQW6z67VEn3l8Mqzpazsu61zbcB3F4Ma0m247vakIkk5I8GMmLW3/c
UMr6RG2IHIaECNdS1Ir1UkBnDCFb7CQq3j0Bh2UynyU9jKGtRZU/P7SdD7LqHsi+
Q56kdNSbYdRPpjPWmZUA
=i5qY
-----END PGP SIGNATURE-----
Merge tag 'pinctrl-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pincontrol fixes from Linus Walleij:
"A first set of pin control fixes for the v3.15 series:
- Fix a couple of barnsjukdomar on the Rockchip driver.
- Remove an idiotic debug print I happened to leave behind in the
Nomadik driver.
- Fixup the Qualcomm MSM interrupt handling code for the TLMM v2.
- Three patches renaming the Broadcom Capri driver to BCM28155. This
has been falling between the chairs for some time due to some
cross-tree synchronization misunderstandings, now I'm fed up with
this and just rename it in this -rc1 phase"
* tag 'pinctrl-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: fix typo in bindings documentation
Update bcm_defconfig with new pinctrl CONFIG
pinctrl: Rename Broadcom Capri pinctrl driver
pinctrl: msm: Correct interrupt code for TLMM v2
pinctrl: nomadik: delete stray debug print
pinctrl: rockchip: handle first half of rk3188-bank0 correctly
pinctrl: rockchip: add return value to rockchip_set_mux
pinctrl: rockchip: fix offset of mux registers for rk3188
Pull s390 patches from Martin Schwidefsky:
"An update to the oops output with additional information about the
crash. The renameat2 system call is enabled. Two patches in regard
to the PTR_ERR_OR_ZERO cleanup. And a bunch of bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/sclp_cmd: replace PTR_RET with PTR_ERR_OR_ZERO
s390/sclp: replace PTR_RET with PTR_ERR_OR_ZERO
s390/sclp_vt220: Fix kernel panic due to early terminal input
s390/compat: fix typo
s390/uaccess: fix possible register corruption in strnlen_user_srst()
s390: add 31 bit warning message
s390: wire up sys_renameat2
s390: show_registers() should not map user space addresses to kernel symbols
s390/mm: print control registers and page table walk on crash
s390/smp: fix smp_stop_cpu() for !CONFIG_SMP
s390: fix control register update
April 2014 Itanium processor specification update:
http://www.intel.com/content/www/us/en/processors/itanium/itanium-specification-update.html
describes this erratum:
=========================================================================
237. Under a complex set of conditions, store to load forwarding for a
sub 8-byte load may complete incorrectly
Problem: A load instruction may complete incorrectly when a code sequence
using 4-byte or smaller load and store operations to the same address
is executed in combination with specific timing of all the following
concurrent conditions: store to load forwarding, alignment checking
enabled, a mis-predicted branch, and complex cache utilization activity.
Implication: The affected sub 8-byte instruction may complete
incorrectly resulting in unpredictable system behavior. There is an
extremely low probability of exposure due to the significant number of
complex microarchitectural concurrent conditions required to encounter
the erratum.
Workaround: Set PSR.ac = 0 to completely avoid the erratum. Disabling
Hyper-Threading will significantly reduce exposure to the conditions
that contribute to encountering the erratum.
Status: See the Summary Table of Changes for the affected steppings.
=========================================================================
[Table of changes essentially lists all models from McKinley to Tukwila]
The PSR.ac bit controls whether the processor will always generate
an unaligned reference trap (0x5a00) for a misaligned data access
(when PSR.ac=1) or if it will let the access succeed when running
on a cpu that implements logic to handle some unaligned accesses.
Way back in 2008 in commit b704882e70
[IA64] Rationalize kernel mode alignment checking
we made the decision to always enable strict checking. We were
already doing so in trap/interrupt context because the common
preamble code set this bit - but the rest of supervisor code
(and by inheritance user code) ran with PSR.ac=0.
We now reverse that decision and set PSR.ac=0 everywhere in the
kernel (also inherited by user processes). This will avoid the
erratum using the method described in the Itanium specification
update. Net effect for users is that the processor will handle
unaligned access when it can (typically with a tiny performance
bubble in the pipeline ... but much less invasive than taking a
trap and having the OS perform the access).
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add hist.percentage option for setting default value of the
symbol_conf.filter_relative. It affects the output of various perf
commands (like perf report, top and diff) only if filter(s) applied.
An user can write .perfconfig file like below to show absolute
percentage of filtered entries by default:
$ cat ~/.perfconfig
[hist]
percentage = absolute
And it can be changed through command line:
$ perf report --percentage relative
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1397145720-8063-6-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
The --percentage option is for controlling overhead percentage
displayed. It can only receive either of "relative" or "absolute" and
affects -c delta output only.
For more information, please see previous commit same thing done to
"perf report".
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1397145720-8063-5-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
The --percentage option is for controlling overhead percentage
displayed. It can only receive either of "relative" or "absolute".
Move the parser callback function into a common location since it's
used by multiple commands now.
For more information, please see previous commit same thing done to
"perf report".
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1397145720-8063-4-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
The --percentage option is for controlling overhead percentage
displayed. It can only receive either of "relative" or "absolute".
"relative" means it's relative to filtered entries only so that the
sum of shown entries will be always 100%. "absolute" means it retains
the original value before and after the filter is applied.
$ perf report -s comm
# Overhead Command
# ........ ............
#
74.19% cc1
7.61% gcc
6.11% as
4.35% sh
4.14% make
1.13% fixdep
...
$ perf report -s comm -c cc1,gcc --percentage absolute
# Overhead Command
# ........ ............
#
74.19% cc1
7.61% gcc
$ perf report -s comm -c cc1,gcc --percentage relative
# Overhead Command
# ........ ............
#
90.69% cc1
9.31% gcc
Note that it has zero effect if no filter was applied.
Suggested-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1397145720-8063-3-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
When filtering by thread, dso or symbol on TUI it also update total
period so that the output shows different result than no filter - the
percentage changed to relative to filtered entries only. Sometimes
this is not desired since users might expect same results with filter.
So new filtered_* fields to hists->stats to count them separately.
They'll be controlled/used by user later.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1397145720-8063-2-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Steve reported a reboot hang and bisected it back to this commit:
a4f1987e4c x86, reboot: Add EFI and CF9 reboot methods into the default list
He heroically tested all reboot methods and found the following:
reboot=t # triple fault ok
reboot=k # keyboard ctrl FAIL
reboot=b # BIOS ok
reboot=a # ACPI FAIL
reboot=e # EFI FAIL [system has no EFI]
reboot=p # PCI 0xcf9 FAIL
And I think it's pretty obvious that we should only try PCI 0xcf9 as a
last resort - if at all.
The other observation is that (on this box) we should never try
the PCI reboot method, but close with either the 'triple fault'
or the 'BIOS' (terminal!) reboot methods.
Thirdly, CF9_COND is a total misnomer - it should be something like
CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...
So this patch fixes the worst problems:
- it orders the actual reboot logic to follow the reboot ordering
pattern - it was in a pretty random order before for no good
reason.
- it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
BOOT_CF9_SAFE flags to make the code more obvious.
- it tries the BIOS reboot method before the PCI reboot method.
(Since 'BIOS' is a terminal reboot method resulting in a hang
if it does not work, this is essentially equivalent to removing
the PCI reboot method from the default reboot chain.)
- just for the miraculous possibility of terminal (resulting
in hang) reboot methods of triple fault or BIOS returning
without having done their job, there's an ordering between
them as well.
Reported-and-bisected-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Aubrey <aubrey.li@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Link: http://lkml.kernel.org/r/20140404064120.GB11877@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull networking fixes from David Miller:
1) Fix BPF filter validation of netlink attribute accesses, from
Mathias Kruase.
2) Netfilter conntrack generation seqcount not initialized properly,
from Andrey Vagin.
3) Fix comparison mask computation on big-endian in nft_cmp_fast(),
from Patrick McHardy.
4) Properly limit MTU over ipv6, from Eric Dumazet.
5) Fix seccomp system call argument population on 32-bit, from Daniel
Borkmann.
6) skb_network_protocol() should not use hard-coded ETH_HLEN, instead
skb->mac_len needs to be used. From Vlad Yasevich.
7) We have several cases of using socket based communications to
implement a tunnel. For example, some tunnels are encapsulations
over UDP so we use an internal kernel UDP socket to do the
transmits.
These tunnels should behave just like other software devices and
pass the packets on down to the next layer.
Most importantly we want the top-level socket (eg TCP) that created
the traffic to be charged for the SKB memory.
However, once you get into the IP output path, we have code that
assumed that whatever was attached to skb->sk is an IP socket.
To keep the top-level socket being charged for the SKB memory,
whilst satisfying the needs of the IP output path, we now pass in an
explicit 'sk' argument.
From Eric Dumazet.
8) ping_init_sock() leaks group info, from Xiaoming Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
cxgb4: use the correct max size for firmware flash
qlcnic: Fix MSI-X initialization code
ip6_gre: don't allow to remove the fb_tunnel_dev
ipv4: add a sock pointer to dst->output() path.
ipv4: add a sock pointer to ip_queue_xmit()
driver/net: cosa driver uses udelay incorrectly
at86rf230: fix __at86rf230_read_subreg function
at86rf230: remove check if AVDD settled
net: cadence: Add architecture dependencies
net: Start with correct mac_len in skb_network_protocol
Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer"
cxgb4: Save the correct mac addr for hw-loopback connections in the L2T
net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W
seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF
qlcnic: Do not disable SR-IOV when VFs are assigned to VMs
qlcnic: Fix QLogic application/driver interface for virtual NIC configuration
qlcnic: Fix PVID configuration on eSwitch port.
qlcnic: Fix max ring count calculation
qlcnic: Fix to send INIT_NIC_FUNC as first mailbox.
qlcnic: Fix panic due to uninitialzed delayed_work struct in use.
...
The wrong max fw size was being used and causing false
"too big" errors running ethtool -f.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Function qlcnic_setup_tss_rss_intr() might enter endless
loop in case pci_enable_msix() contiguously returns a
positive number of MSI-Xs that could have been allocated.
Besides, the function contains 'err = -EIO;' assignment
that never could be reached. This update fixes the
aforementioned issues.
Cc: Shahed Shaikh <shahed.shaikh@qlogic.com>
Cc: Dept-HSGLinuxNICDev@qlogic.com
Cc: netdev@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's possible to remove the FB tunnel with the command 'ip link del ip6gre0' but
this is unsafe, the module always supposes that this device exists. For example,
ip6gre_tunnel_lookup() may use it unconditionally.
Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we
let ip6gre_destroy_tunnels() do the job).
Introduced by commit c12b395a46 ("gre: Support GRE over IPv6").
CC: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the dst->output() path for ipv4, the code assumes the skb it has to
transmit is attached to an inet socket, specifically via
ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
provider of the packet is an AF_PACKET socket.
The dst->output() method gets an additional 'struct sock *sk'
parameter. This needs a cascade of changes so that this parameter can
be propagated from vxlan to final consumer.
Fixes: 8f646c922d ("vxlan: keep original skb ownership")
Reported-by: lucien xin <lucien.xin@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>