kernel-ark/ipc
Manfred Spraul 5864a2fd30 ipc/sem.c: fix complex_count vs. simple op race
Commit 6d07b68ce1 ("ipc/sem.c: optimize sem_lock()") introduced a
race:

sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
 - a non-simple operations is ongoing (sma->sem_perm.lock held)
 - a complex operation is sleeping (sma->complex_count != 0)

As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions.  See below for more details
(or kernel bugzilla 105651).

The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.

The patch also updates stale documentation regarding the locking.

With regards to stable kernels:
The patch is required for all kernels that include the
commit 6d07b68ce1 ("ipc/sem.c: optimize sem_lock()") (3.10?)

The alternative is to revert the patch that introduced the race.

The patch is safe for backporting, i.e. it makes no assumptions
about memory barriers in spin_unlock_wait().

Background:
Here is the race of the current implementation:

Thread A: (simple op)
- does the first "sma->complex_count == 0" test

Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
  find Thread A, because Thread A does not own sem->lock yet.
- the thread does the operation, increases complex_count,
  drops sem_lock, sleeps

Thread A:
- spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
- sleeps before the complex_count test

Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count

Thread A:
- does the complex_count test

Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.

Fixes: 6d07b68ce1 ("ipc/sem.c: optimize sem_lock()")
Link: http://lkml.kernel.org/r/1469123695-5661-1-git-send-email-manfred@colorfullife.com
Reported-by: <felixh@informatik.uni-bremen.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <1vier1@web.de>
Cc: <stable@vger.kernel.org>	[3.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
..
compat_mq.c ipc, kernel: use Linux headers 2014-06-06 16:08:14 -07:00
compat.c ipc: resolve shadow warnings 2014-10-14 02:18:23 +02:00
ipc_sysctl.c ipc/msg: increase MSGMNI, remove scaling 2014-12-13 12:42:52 -08:00
Makefile ipc/msg: increase MSGMNI, remove scaling 2014-12-13 12:42:52 -08:00
mq_sysctl.c ipc: convert use of typedef ctl_table to struct ctl_table 2014-06-06 16:08:16 -07:00
mqueue.c fs: Replace CURRENT_TIME with current_time() for inode timestamps 2016-09-27 21:06:21 -04:00
msg.c sysv, ipc: fix security-layer leaking 2016-08-02 17:31:41 -04:00
msgutil.c ipc: delete "nr_ipc_ns" 2016-08-02 19:35:44 -04:00
namespace.c Merge branch 'nsfs-ioctls' into HEAD 2016-09-22 20:00:36 -05:00
sem.c ipc/sem.c: fix complex_count vs. simple op race 2016-10-11 15:06:33 -07:00
shm.c shmem: make shmem_inode_info::lock irq-safe 2016-07-26 16:19:19 -07:00
syscall.c get rid of union semop in sys_semctl(2) arguments 2013-03-05 15:14:16 -05:00
util.c tree wide: use kvfree() than conditional kfree()/vfree() 2016-01-22 17:02:18 -08:00
util.h tree wide: use kvfree() than conditional kfree()/vfree() 2016-01-22 17:02:18 -08:00