Impact: fix rare (but currently harmless) miscompile with certain configs and gcc versions
Hugh Dickins noticed that strncpy_from_user() was miscompiled
in some circumstances with gcc 4.3.
Thanks to Hugh's excellent analysis it was easy to track down.
Hugh writes:
> Try building an x86_64 defconfig 2.6.29-rc1 kernel tree,
> except not quite defconfig, switch CONFIG_PREEMPT_NONE=y
> and CONFIG_PREEMPT_VOLUNTARY off (because it expands a
> might_fault() there, which hides the issue): using a
> gcc 4.3.2 (I've checked both openSUSE 11.1 and Fedora 10).
>
> It generates the following:
>
> 0000000000000000 <__strncpy_from_user>:
> 0: 48 89 d1 mov %rdx,%rcx
> 3: 48 85 c9 test %rcx,%rcx
> 6: 74 0e je 16 <__strncpy_from_user+0x16>
> 8: ac lods %ds:(%rsi),%al
> 9: aa stos %al,%es:(%rdi)
> a: 84 c0 test %al,%al
> c: 74 05 je 13 <__strncpy_from_user+0x13>
> e: 48 ff c9 dec %rcx
> 11: 75 f5 jne 8 <__strncpy_from_user+0x8>
> 13: 48 29 c9 sub %rcx,%rcx
> 16: 48 89 c8 mov %rcx,%rax
> 19: c3 retq
>
> Observe that "sub %rcx,%rcx; mov %rcx,%rax", whereas gcc 4.2.1
> (and many other configs) say "sub %rcx,%rdx; mov %rdx,%rax".
> Isn't it returning 0 when it ought to be returning strlen?
The asm constraints for the strncpy_from_user() result were missing an
early clobber, which tells gcc that the last output arguments
are written before all input arguments are read.
Also add more early clobbers in the rest of the file and fix 32-bit
usercopy.c in the same way.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
[ since this API is rarely used and no in-kernel user relies on a 'len'
return value (they only rely on negative return values) this miscompile
was never noticed in the field. But it's worth fixing it nevertheless. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
copy_to/from_user and all its variants (except the atomic ones) can take a
page fault and perform non-trivial work like taking mmap_sem and entering
the filesyste/pagecache.
Unfortunately, this often escapes lockdep because a common pattern is to
use it to read in some arguments just set up from userspace, or write data
back to a hot buffer. In those cases, it will be unlikely for page reclaim
to get a window in to cause copy_*_user to fault.
With the new might_lock primitives, add some annotations to x86. I don't
know if I caught all possible faulting points (it's a bit of a maze, and I
didn't really look at 32-bit). But this is a starting point.
Boots and runs OK so far.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: performance optimization
I did some rebenchmarking with modern compilers and dropping
-funroll-loops makes the function consistently go faster by a few
percent. So drop that flag.
Thanks to Richard Guenther for a hint.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Remove an unnecessary level of abstraction in the msr-on-cpu library.
Although this duplicates some code, the duplicated code is less than
the additional code, and this way should be faster.
Additionally, change the order of the functions to make the regular
structure of this file more obvious.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Propagate error (-ENXIO) from smp_call_function_single(). These
errors can happen when a CPU is unplugged while the MSR driver is
open.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
movsl_mask is currently defined in arch/x86/kernel/cpu/intel.c, which
contains code specific to Intel CPUs. However, movsl_mask is used in
the non-CPU specific code in arch/x86/lib/usercopy_32.c, which breaks
the compilation when support for Intel CPUs is compiled out.
This patch solves this problem by moving movsl_mask's definition close
to its users in arch/x86/lib/usercopy_32.c.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: michael@free-electrons.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
New ALIGN_DESTINATION macro has sad typo: r8d register was used instead
of ecx in fixup section. This can be considered as a regression.
Register ecx was also wrongly loaded with value in r8d in
copy_user_nocache routine.
Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gas 2.15 complains about 32-bit registers being used in lea.
AS arch/x86/lib/copy_user_64.o
/local/scratch-2/jeremy/hg/xen/paravirt/linux/arch/x86/lib/copy_user_64.S: Assembler messages:
/local/scratch-2/jeremy/hg/xen/paravirt/linux/arch/x86/lib/copy_user_64.S:188: Error: `(%edx,%ecx,8)' is not a valid 64 bit base/index expression
/local/scratch-2/jeremy/hg/xen/paravirt/linux/arch/x86/lib/copy_user_64.S:257: Error: `(%edx,%ecx,8)' is not a valid 64 bit base/index expression
AS arch/x86/lib/copy_user_nocache_64.o
/local/scratch-2/jeremy/hg/xen/paravirt/linux/arch/x86/lib/copy_user_nocache_64.S: Assembler messages:
/local/scratch-2/jeremy/hg/xen/paravirt/linux/arch/x86/lib/copy_user_nocache_64.S:107: Error: `(%edx,%ecx,8)' is not a valid 64 bit base/index expression
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
putuser_32.S and putuser_64.S are merged into putuser.S.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In putuser_32.S and putuser_64.S, replace things like .quad, .long,
and explicit references to [r|e]ax for the apropriate macros
in asm/asm.h.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove them where unambiguous.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In putuser_64.S, do it the i386 way, and replace the code
in beginning and end of functions with macros, since it's
always the same thing. Save lines.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Instead of operating over a register we need to put back
into normal state afterwards (the memory position), just
sub from rbx, which is trashed anyway. We can save a few instructions.
Also, this is the i386 way.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is consistent with i386 usage.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Instead of clobbering r8, clobber rbx, which is the i386 way.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Clobber it in the inline asm macros, and let the compiler do this for us.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
getuser_32.S and getuser_64.S are merged into getuser.S.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Switch .long and .quad with _ASM_PTR in getuser*.S.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There are situations in which the architecture wants to use the
register that represents its word-size, whatever it is. For those,
introduce __ASM_REG in asm.h, along with the first users _ASM_AX
and _ASM_DX. They have users waiting for it, namely the getuser
functions.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The instructions access registers, so the size is unambiguous.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is for consistency with i386.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Instead of doing a sub after the addition, use the
offset directly at the memory operand of the mov instructions.
This is the way i386 do.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since the instructions refer to registers, they'll be able
to figure it out.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There's really no reason to clobber r8 or pass the address in rcx.
We can safely use only two registers (which we already have to touch anyway)
to do the job.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
delay_32.c, delay_64.c are now equal, and are integrated into delay.c.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For x86_64, we can't just use %0, as it would
generate a mul against rdx, which is not really what we
want (note the ">> 32" in x86_64 version).
Using a u64 variable with a shift in i386 generates bad code,
so the solution is to explicitly use %%edx in inline assembly
for both.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This way we achieve the same code for both arches.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is for consistency with i386. We call use_tsc_delay()
at tsc initialization for x86_64, so we'll be always using it.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the "l" from inline asm at arch/x86/lib/delay_32.c.
It is not needed.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Most users by far do not care about the exact return value (they only
really care about whether the copy succeeded in its entirety or not),
but a few special core routines actually care deeply about exactly how
many bytes were copied from user space.
And the unrolled versions of the x86-64 user copy routines would
sometimes report that it had copied more bytes than it actually had.
Very few uses actually have partial copies to begin with, but to make
this bug even harder to trigger, most x86 CPU's use the "rep string"
instructions for normal user copies, and that version didn't have this
issue.
To make it even harder to hit, the one user of this that really cared
about the return value (and used the uncached version of the copy that
doesn't use the "rep string" instructions) was the generic write
routine, which pre-populated its source, once more hiding the problem by
avoiding the exception case that triggers the bug.
In other words, very special thanks to Bron Gondwana who not only
triggered this, but created a test-program to show it, and bisected the
behavior down to commit 08291429cf ("mm:
fix pagecache write deadlocks") which changed the access pattern just
enough that you can now trigger it with 'writev()' with multiple
iovec's.
That commit itself was not the cause of the bug, it just allowed all the
stars to align just right that you could trigger the problem.
[ Side note: this is just the minimal fix to make the copy routines
(with __copy_from_user_inatomic_nocache as the particular version that
was involved in showing this) have the right return values.
We really should improve on the exceptional case further - to make the
copy do a byte-accurate copy up to the exact page limit that causes it
to fail. As it is, the callers have to do extra work to handle the
limit case gracefully. ]
Reported-by: Bron Gondwana <brong@fastmail.fm>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(which didn't have this problem), and since
most users that do the carethis was very hard to trigger, but
when trying to understand how Bogomips are implemented I have found a
bug in arch/i386/lib/delay.c file, delay_loop function.
The function fails for loops > 2^31+1. It because SF is set when dec
returns numbers > 2^31.
The fix is to use jnz instruction instead of jns (and add one decl
instruction to the end to have exactly the same number of loops as in
original version).
Martin Mares observed:
> It is a long time since I have hacked that file, but you should definitely
> make sure that the function is never called with a zero argument. In such
> case, the original version made just a single pass, but your version
> makes 2^32 of them.
fixed that.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The RT team has been searching for a nasty latency. This latency shows
up out of the blue and has been seen to be as big as 5ms!
Using ftrace I found the cause of the latency.
pcscd-2995 3dNh1 52360300us : irq_exit (smp_apic_timer_interrupt)
pcscd-2995 3dN.2 52360301us : idle_cpu (irq_exit)
pcscd-2995 3dN.2 52360301us : rcu_irq_exit (irq_exit)
pcscd-2995 3dN.1 52360771us : smp_apic_timer_interrupt (apic_timer_interrupt
)
pcscd-2995 3dN.1 52360771us : exit_idle (smp_apic_timer_interrupt)
Here's an example of a 400 us latency. pcscd took a timer interrupt and
returned with "need resched" enabled, but did not reschedule until after
the next interrupt came in at 52360771us 400us later!
At first I thought we somehow missed a preemption check in entry.S. But
I also noticed that this always seemed to happen during a __delay call.
pcscd-2995 3dN.2 52360836us : rcu_irq_exit (irq_exit)
pcscd-2995 3.N.. 52361265us : preempt_schedule (__delay)
Looking at the x86 delay, I found my problem.
In git commit 35d5d08a08, Andrew Morton
placed preempt_disable around the entire delay due to TSC's not working
nicely on SMP. Unfortunately for those that care about latencies this
is devastating! Especially when we have callers to mdelay(8).
Here I enable preemption during the loop and account for anytime the task
migrates to a new CPU. The delay asked for may be extended a bit by
the migration, but delay only guarantees that it will delay for that minimum
time. Delaying longer should not be an issue.
[
Thanks to Thomas Gleixner for spotting that cpu wasn't updated,
and to place the rep_nop between preempt_enabled/disable.
]
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: akpm@osdl.org
Cc: Clark Williams <clark.williams@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
Cc: Gregory Haskins <ghaskins@novell.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi-suse@firstfloor.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix this symbol export problem:
Building modules, stage 2.
MODPOST 193 modules
ERROR: "csum_partial" [fs/reiserfs/reiserfs.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
This is due to a known weakness of symbol exports: if a symbol's
only in-core user is an EXPORT_SYMBOL from a lib-y section, the
symbol is not linked in.
The solution is to move the export to x8664_ksyms_64.c - but the real
solution would be to fix kbuild.
Signed-off-by: Ingo Molnar <mingo@elte.hu>