kernel-ark/kernel
Paul Jackson 3077a260e9 [PATCH] cpuset release ABBA deadlock fix
Fix possible cpuset_sem ABBA deadlock if 'notify_on_release' set.

For a particular usage pattern, creating and destroying cpusets fairly
frequently using notify_on_release, on a very large system, this deadlock
can be seen every few days.  If you are not using the cpuset
notify_on_release feature, you will never see this deadlock.

The existing code, on task exit (or cpuset deletion) did:

  get cpuset_sem
  if cpuset marked notify_on_release and is ready to release:
    compute cpuset path relative to /dev/cpuset mount point
    call_usermodehelper() forks /sbin/cpuset_release_agent with path
  drop cpuset_sem

Unfortunately, the fork in call_usermodehelper can allocate memory, and
allocating memory can require cpuset_sem, if the mems_generation values
changed in the interim.  This results in an ABBA deadlock, trying to obtain
cpuset_sem when it is already held by the current task.

To fix this, I put the cpuset path (which must be computed while holding
cpuset_sem) in a temporary buffer, to be used in the call_usermodehelper
call of /sbin/cpuset_release_agent only _after_ dropping cpuset_sem.

So the new logic is:

  get cpuset_sem
  if cpuset marked notify_on_release and is ready to release:
    compute cpuset path relative to /dev/cpuset mount point
    stash path in kmalloc'd buffer
  drop cpuset_sem
  call_usermodehelper() forks /sbin/cpuset_release_agent with path
  free path

The sharp eyed reader might notice that this patch does not contain any
calls to kmalloc.  The existing code in the check_for_release() routine was
already kmalloc'ing a buffer to hold the cpuset path.  In the old code, it
just held the buffer for a few lines, over the cpuset_release_agent() call
that in turn invoked call_usermodehelper().  In the new code, with the
application of this patch, it returns that buffer via the new char
**ppathbuf parameter, for later use and freeing in cpuset_release_agent(),
which is called after cpuset_sem is dropped.  Whereas the old code has just
one call to cpuset_release_agent(), right in the check_for_release()
routine, the new code has three calls to cpuset_release_agent(), from the
various places that a cpuset can be released.

This patch has been build and booted on SN2, and passed a stress test that
previously hit the deadlock within a few seconds.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-09 12:08:22 -07:00
..
irq
power
acct.c
audit.c
auditsc.c
capability.c
compat.c
configs.c
cpu.c
cpuset.c [PATCH] cpuset release ABBA deadlock fix 2005-08-09 12:08:22 -07:00
crash_dump.c
dma.c
exec_domain.c
exit.c [PATCH] revert "timer exit cleanup" 2005-08-04 16:57:49 -07:00
extable.c
fork.c
futex.c
intermodule.c
itimer.c
kallsyms.c
Kconfig.hz
Kconfig.preempt
kexec.c
kfifo.c
kmod.c
kprobes.c
ksysfs.c
kthread.c
Makefile
module.c [PATCH] Module per-cpu alignment cannot always be met 2005-08-01 21:38:01 -07:00
panic.c
params.c
pid.c
posix-cpu-timers.c
posix-timers.c [PATCH] revert "timer exit cleanup" 2005-08-04 16:57:49 -07:00
printk.c
profile.c
ptrace.c
rcupdate.c
resource.c
sched.c
seccomp.c
signal.c
softirq.c [PATCH] revert bogus softirq changes 2005-07-30 10:49:59 -07:00
spinlock.c
stop_machine.c
sys_ni.c [PATCH] remove sys_set_zone_reclaim() 2005-08-01 10:03:56 -07:00
sys.c [PATCH] Remove suspend() calls from shutdown path 2005-08-04 08:20:47 -07:00
sysctl.c
time.c
timer.c
uid16.c
user.c
wait.c
workqueue.c