kernel-ark/kernel
Linus Torvalds 04e2f1741d Add memory barrier semantics to wake_up() & co
Oleg Nesterov and others have pointed out that on some architectures,
the traditional sequence of

	set_current_state(TASK_INTERRUPTIBLE);
	if (CONDITION)
		return;
	schedule();

is racy wrt another CPU doing

	CONDITION = 1;
	wake_up_process(p);

because while set_current_state() has a memory barrier separating
setting of the TASK_INTERRUPTIBLE state from reading of the CONDITION
variable, there is no such memory barrier on the wakeup side.

Now, wake_up_process() does actually take a spinlock before it reads and
sets the task state on the waking side, and on x86 (and many other
architectures) that spinlock is in fact equivalent to a memory barrier,
but that is not generally guaranteed.  The write that sets CONDITION
could move into the critical region protected by the runqueue spinlock.

However, adding a smp_wmb() to before the spinlock should now order the
writing of CONDITION wrt the lock itself, which in turn is ordered wrt
the accesses within the spinlock (which includes the reading of the old
state).

This should thus close the race (which probably has never been seen in
practice, but since smp_wmb() is a no-op on x86, it's not like this will
make anything worse either on the most common architecture where the
spinlock already gave the required protection).

Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-23 18:05:03 -08:00
..
irq genirq: do not leave interupts enabled on free_irq 2008-02-19 10:43:58 +01:00
power PM: Introduce PM_EVENT_HIBERNATE callback state 2008-02-23 10:40:04 -08:00
time timer_list: print relative expiry time signed 2008-02-17 17:29:38 +01:00
.gitignore Update kernel/.gitignore with new auto-generated files 2008-02-09 23:27:01 -08:00
acct.c
audit_tree.c Introduce path_put() 2008-02-14 21:13:33 -08:00
audit.c d_path: Make d_path() use a struct path 2008-02-14 21:17:09 -08:00
audit.h
auditfilter.c Introduce path_put() 2008-02-14 21:13:33 -08:00
auditsc.c Audit: use == not = in if statements 2008-02-18 18:46:28 -08:00
backtracetest.c
capability.c
cgroup_debug.c
cgroup.c cgroup: remove dead code in cgroup_get_rootdir() 2008-02-23 17:13:25 -08:00
compat.c hrtimer: don't modify restart_block->fn in restart functions 2008-02-10 10:48:03 +01:00
configs.c
cpu.c cpu: fix section mismatch warnings for enable_nonboot_cpus 2008-02-08 09:22:41 -08:00
cpuset.c
delayacct.c
dma.c
exec_domain.c
exit.c Use struct path in fs_struct 2008-02-14 21:13:33 -08:00
extable.c
fork.c Use struct path in fs_struct 2008-02-14 21:13:33 -08:00
futex_compat.c futex: runtime enable pi and robust functionality 2008-02-23 17:12:15 -08:00
futex.c futex: runtime enable pi and robust functionality 2008-02-23 17:12:15 -08:00
hrtimer.c hrtimer: catch expired CLOCK_REALTIME timers early 2008-02-14 22:08:30 +01:00
itimer.c
kallsyms.c
Kconfig.hz
Kconfig.preempt
kexec.c
kfifo.c
kmod.c Dont touch fs_struct in usermodehelper 2008-02-14 21:13:32 -08:00
kprobes.c
ksysfs.c
kthread.c
latencytop.c
lockdep_internals.h
lockdep_proc.c
lockdep.c
Makefile avoid overflows in kernel/time.c 2008-02-08 09:22:39 -08:00
marker.c markers: fix sparse warnings in markers.c 2008-02-23 17:12:14 -08:00
module.c modules: do not try to add sysfs attributes if !CONFIG_SYSFS 2008-02-21 15:27:08 -08:00
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
ns_cgroup.c
nsproxy.c
panic.c
params.c Add new string functions strict_strto* and convert kernel params to use them 2008-02-08 09:22:41 -08:00
pid_namespace.c
pid.c
pm_qos_params.c
posix-cpu-timers.c Use find_task_by_vpid in posix timers 2008-02-08 09:22:41 -08:00
posix-timers.c hrtimer: check relative timeouts for overflow 2008-02-14 22:08:30 +01:00
printk.c printk_ratelimit() functions should use CONFIG_PRINTK 2008-02-08 09:22:39 -08:00
profile.c
ptrace.c
rcuclassic.c
rcupdate.c rcupdate: fix comment 2008-02-13 16:21:18 -08:00
rcupreempt_trace.c
rcupreempt.c
rcutorture.c
relay.c
res_counter.c
resource.c
rtmutex_common.h Don't operate with pid_t in rtmutex tester 2008-02-08 09:22:41 -08:00
rtmutex-debug.c Don't operate with pid_t in rtmutex tester 2008-02-08 09:22:41 -08:00
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c hrtimer: more hrtimer_init_sleeper() fallout. 2008-02-13 15:45:36 +01:00
rtmutex.h
rwsem.c
sched_debug.c
sched_fair.c
sched_idletask.c
sched_rt.c sched: rt-group: make rt groups scheduling configurable 2008-02-13 15:45:40 +01:00
sched_stats.h
sched.c Add memory barrier semantics to wake_up() & co 2008-02-23 18:05:03 -08:00
seccomp.c
signal.c remove final fastcall users 2008-02-13 16:21:18 -08:00
softirq.c
softlockup.c
spinlock.c
srcu.c
stacktrace.c
stop_machine.c
sys_ni.c
sys.c
sysctl_check.c
sysctl.c hugetlb: fix overcommit locking 2008-02-13 16:21:18 -08:00
taskstats.c
test_kprobes.c
time.c avoid overflows in kernel/time.c 2008-02-08 09:22:39 -08:00
timeconst.pl timeconst.pl: correct reversal of USEC_TO_HZ and HZ_TO_USEC 2008-02-12 14:29:26 -08:00
timer.c
tsacct.c
uid16.c
user_namespace.c
user.c sched: rt-group: make rt groups scheduling configurable 2008-02-13 15:45:40 +01:00
utsname_sysctl.c
utsname.c
wait.c
workqueue.c