kernel-ark/kernel
Steven Rostedt c3a990dc9f sched/rt: Kick RT bandwidth timer immediately on start up
I've been debugging why deadline tasks can cause the RT scheduler to
throttle, even when the deadline tasks are only taking up 50% of the
CPU and RT tasks are not even using 1% of the CPU. Here's what I found.

In order to keep a CPU from being hogged by RT tasks, the deadline
scheduler adds its run time (delta_exec) to the rt_time of the RT
bandwidth. That way, if the two use more than 95% of the CPU within one
second (default settings), the RT tasks are throttled to allow non RT
tasks to run.

Although the deadline tasks add their run time to the RT bandwidth, it
lets the RT tasks do the accounting. This is where the problem lies. If
a deadline task runs for a bit, and no RT tasks are running, then it
will continually add to the RT rt_time that is used to calculate how
much CPU the RT tasks use. But no RT period is in play, and this
accumulation of the runtime never gets reset.

When an RT task finally gets to run, and the watchdog goes off, it can
see that the RT task has used more than it should of, because the
deadline task added all this runtime to its rt_time. Then the RT task
that just woke up gets throttled for no good reason.

I also noticed that when an RT task is queued, it starts the timer to
account for overload and such. But that timer goes off one period
later, which may be too late and the extra rt_time will trigger a
throttle.

This is a quick work around to the problem. When a new RT task is
queued, the bandwidth timer is set to go off immediately. Then the
timer can clear out the extra time added to the rt_time while there was
no RT task running. This stops my tests from triggering the throttle,
and it will still throttle if an RT task runs too much, even while a
deadline task is running.

A better solution may be to subtract the bandwidth that the deadline
task uses from the rt_runtime, and add it back when its finished. Then
there wont be a need for runtime tracking of the time used by deadline
tasks.

I may play with that solution tomorrow.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <juri.lelli@gmail.com>
Cc: <williams@redhat.com>
Cc: Clark Williams
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160216183746.349ec98b@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:53:07 +01:00
..
bpf bpf: fix branch offset adjustment on backjumps after patching ctx expansion 2016-02-10 16:56:47 -05:00
configs kconfig: add xenconfig defconfig helper 2015-06-16 11:04:29 +01:00
debug module: use a structure to encapsulate layout. 2015-12-04 22:46:25 +01:00
events perf: Robustify task_function_call() 2016-02-25 08:44:29 +01:00
gcov gcov: use within_module() helper. 2015-12-04 22:46:25 +01:00
irq Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-01-31 14:48:58 -08:00
livepatch livepatch: Cleanup module page permission changes 2015-12-04 22:51:07 +01:00
locking Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-02-14 12:02:05 -08:00
power PM: APM_EMULATION does not depend on PM 2016-01-27 23:20:14 +01:00
printk kernel: printk: specify alignment for struct printk_log 2016-01-20 17:09:18 -08:00
rcu rcu: Use simple wait queues where possible in rcutree 2016-02-25 11:27:16 +01:00
sched sched/rt: Kick RT bandwidth timer immediately on start up 2016-02-29 09:53:07 +01:00
time Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-01-31 15:49:06 -08:00
trace Another small bug reported to me by Chunyu Hu. 2016-02-25 20:12:09 -08:00
.gitignore certs: add .gitignore to stop git nagging about x509_certificate_list 2015-10-21 15:18:35 +01:00
acct.c
async.c async: export current_is_async() 2015-11-19 17:51:48 +01:00
audit_fsnotify.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
audit_tree.c audit: audit_tree_match can be boolean 2015-11-04 08:23:51 -05:00
audit_watch.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
audit.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-01-17 19:13:15 -08:00
audit.h security: Make inode argument of inode_getsecid non-const 2015-12-24 11:09:39 -05:00
auditfilter.c audit: fix comment block whitespace 2015-11-04 08:23:51 -05:00
auditsc.c security: Make inode argument of inode_getsecid non-const 2015-12-24 11:09:39 -05:00
backtracetest.c
bounds.c
capability.c
cgroup_freezer.c cgroup: kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] and friends 2015-12-03 10:24:08 -05:00
cgroup_pids.c cgroup_pids: fix a typo. 2015-12-14 14:54:37 -05:00
cgroup.c cgroup: make sure a parent css isn't freed before its children 2016-01-22 10:42:58 -05:00
compat.c compat: cleanup coding in compat_get_bitmap() and compat_put_bitmap() 2015-06-04 23:57:18 +02:00
configs.c
context_tracking.c context_tracking: Switch to new static_branch API 2015-11-24 09:56:43 +01:00
cpu_pm.c kernel/cpu_pm: fix cpu_cluster_pm_exit comment 2015-09-03 02:42:20 +02:00
cpu.c kernel/cpu.c: make set_cpu_* static inlines 2016-01-20 17:09:18 -08:00
cpuset.c cpuset: make mm migration asynchronous 2016-01-22 10:22:46 -05:00
crash_dump.c
cred.c kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
delayacct.c kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
dma.c
elfcore.c
exec_domain.c
exit.c exit: remove unneeded declaration of exit_mm() 2016-01-20 17:09:18 -08:00
extable.c kernel/extable.c: remove duplicated include 2015-09-10 13:29:01 -07:00
fork.c mm: rework virtual memory accounting 2016-01-14 16:00:49 -08:00
freezer.c
futex_compat.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
futex.c rtmutex: Make wait_lock irq safe 2016-01-26 11:08:35 +01:00
groups.c
hung_task.c
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
jump_label.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
kallsyms.c
kcmp.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS 2015-05-12 09:46:00 +02:00
Kconfig.preempt
kexec_core.c kernel/kexec_core.c: use list_for_each_entry_safe in kimage_free_page_list 2016-01-20 17:09:18 -08:00
kexec_file.c kexec: move some memembers and definitions within the scope of CONFIG_KEXEC_FILE 2016-01-20 17:09:18 -08:00
kexec_internal.h kexec: move some memembers and definitions within the scope of CONFIG_KEXEC_FILE 2016-01-20 17:09:18 -08:00
kexec.c kexec: set KEXEC_TYPE_CRASH before sanity_check_segment_list() 2016-01-20 17:09:18 -08:00
kmod.c kmod: don't run async usermode helper as a child of kworker thread 2015-10-23 17:55:10 +09:00
kprobes.c perf/x86/hw_breakpoints: Disallow kernel breakpoints unless kprobe-safe 2015-08-04 10:16:54 +02:00
ksysfs.c rcu: Remove TINY_RCU bloat from pointless boot parameters 2015-12-07 16:59:37 -08:00
kthread.c kernel/kthread.c:kthread_create_on_node(): clarify documentation 2015-09-04 16:54:41 -07:00
latencytop.c sched/debug: Make schedstats a runtime tunable that is disabled by default 2016-02-09 11:54:23 +01:00
Makefile sys_membarrier(): system-wide memory barrier (generic, x86) 2015-09-11 15:21:34 -07:00
membarrier.c sys_membarrier(): system-wide memory barrier (generic, x86) 2015-09-11 15:21:34 -07:00
memremap.c Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm 2016-02-25 18:54:53 -08:00
module_signing.c KEYS: Merge the type-specific data with the payload data 2015-10-21 15:18:36 +01:00
module-internal.h
module.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching 2016-02-18 16:34:15 -08:00
notifier.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 08:40:25 -07:00
nsproxy.c
padata.c
panic.c printk: do cond_resched() between lines while outputting to consoles 2016-01-16 11:17:25 -08:00
params.c Nothing exciting, minor tweaks and cleanups. 2015-11-09 15:53:39 -08:00
pid_namespace.c
pid.c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-01-31 15:44:04 -08:00
profile.c sched/debug: Make schedstats a runtime tunable that is disabled by default 2016-02-09 11:54:23 +01:00
ptrace.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
range.c
reboot.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
relay.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
resource.c kernel/resource.c: fix muxed resource handling in __request_region() 2016-02-20 08:57:52 -08:00
seccomp.c seccomp: always propagate NO_NEW_PRIVS on tsync 2016-01-27 07:38:25 -08:00
signal.c signals: avoid random wakeups in sigsuspend() 2016-02-05 18:10:40 -08:00
smp.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
smpboot.c stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() 2015-10-20 10:23:55 +02:00
smpboot.h
softirq.c
stacktrace.c
stop_machine.c kernel/stop_machine.c: remove CONFIG_SMP dependencies 2016-01-16 11:17:24 -08:00
sys_ni.c vfs: add copy_file_range syscall and vfs helper 2015-12-01 14:00:53 -05:00
sys.c prctl: take mmap sem for writing to protect against others 2016-01-20 17:09:18 -08:00
sysctl_binary.c
sysctl.c sched/debug: Make schedstats a runtime tunable that is disabled by default 2016-02-09 11:54:23 +01:00
task_work.c task_work: remove fifo ordering guarantee 2015-09-05 13:46:58 -07:00
taskstats.c
test_kprobes.c
torture.c torture: Consolidate cond_resched_rcu_qs() into stutter_wait() 2015-10-06 11:25:01 -07:00
tracepoint.c tracepoint: Give priority to probes of tracepoints 2015-10-25 21:33:54 -04:00
tsacct.c
uid16.c
up.c
user_namespace.c kernel/*: switch to memdup_user_nul() 2016-01-04 10:27:55 -05:00
user-return-notifier.c
user.c
utsname_sysctl.c
utsname.c
watchdog.c Merge branch 'for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2016-01-11 18:53:13 -08:00
workqueue_internal.h
workqueue.c workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup 2016-02-10 12:13:05 -05:00