kernel-ark/include
Peter Zijlstra ff77e46853 sched/rt: Fix PI handling vs. sched_setscheduler()
Andrea Parri reported:

> I found that the following scenario (with CONFIG_RT_GROUP_SCHED=y) is not
> handled correctly:
>
>     T1 (prio = 20)
>        lock(rtmutex);
>
>     T2 (prio = 20)
>        blocks on rtmutex  (rt_nr_boosted = 0 on T1's rq)
>
>     T1 (prio = 20)
>        sys_set_scheduler(prio = 0)
>           [new_effective_prio == oldprio]
>           T1 prio = 20    (rt_nr_boosted = 0 on T1's rq)
>
> The last step is incorrect as T1 is now boosted (c.f., rt_se_boosted());
> in particular, if we continue with
>
>    T1 (prio = 20)
>       unlock(rtmutex)
>          wakeup(T2)
>          adjust_prio(T1)
>             [prio != rt_mutex_getprio(T1)]
>	    dequeue(T1)
>	       rt_nr_boosted = (unsigned long)(-1)
>	       ...
>             T1 prio = 0
>
> then we end up leaving rt_nr_boosted in an "inconsistent" state.
>
> The simple program attached could reproduce the previous scenario; note
> that, as a consequence of the presence of this state, the "assertion"
>
>     WARN_ON(!rt_nr_running && rt_nr_boosted)
>
> from dec_rt_group() may trigger.

So normally we dequeue/enqueue tasks in sched_setscheduler(), which
would ensure the accounting stays correct. However in the early PI path
we fail to do so.

So this was introduced at around v3.14, by:

  c365c292d0 ("sched: Consider pi boosting in setscheduler()")

which fixed another problem exactly because that dequeue/enqueue, joy.

Fix this by teaching rt about DEQUEUE_SAVE/ENQUEUE_RESTORE and have it
preserve runqueue location with that option. This requires decoupling
the on_rt_rq() state from being on the list.

In order to allow for explicit movement during the SAVE/RESTORE,
introduce {DE,EN}QUEUE_MOVE. We still must use SAVE/RESTORE in these
cases to preserve other invariants.

Respecting the SAVE/RESTORE flags also has the (nice) side-effect that
things like sys_nice()/sys_sched_setaffinity() also do not reorder
FIFO tasks (whereas they used to before this patch).

Reported-by: Andrea Parri <parri.andrea@gmail.com>
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:53:05 +01:00
..
acpi ACPI / CPPC: remove redundant mbox_send_message() declaration 2016-02-03 01:09:52 +01:00
asm-generic powerpc fixes for 4.5 #2 2016-02-20 09:22:11 -08:00
clocksource
crypto
drm drm/atomic: Allow for holes in connector state, v2. 2016-02-19 13:24:03 +10:00
dt-bindings clk: tegra: Add the APB2APE audio clock on Tegra210 2016-02-02 15:49:29 +01:00
keys
kvm
linux sched/rt: Fix PI handling vs. sched_setscheduler() 2016-02-29 09:53:05 +01:00
math-emu
media [media] vb2: fix nasty vb2_thread regression 2016-02-04 09:13:46 -02:00
memory
misc
net tcp/dccp: fix another race at listener dismantle 2016-02-18 11:35:51 -05:00
pcmcia
ras
rdma
rxrpc
scsi
soc
sound ALSA: hda - Loop interrupt handling until really cleared 2016-02-26 08:50:31 +01:00
target target/transport: add flag to indicate CPU Affinity is observed 2016-02-10 23:08:55 -08:00
trace
uapi nfit: update address range scrub commands to the acpi 6.1 format 2016-02-23 17:17:20 -08:00
video
xen
Kbuild