Merge tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Thomas Gleixner:

 - Revert the printk format based wchan() symbol resolution as it can
   leak the raw value in case the symbol is not resolvable.

 - Make wchan() more robust and work with all kinds of unwinders by
   enforcing that the task stays blocked while unwinding is in progress.

 - Prevent sched_fork() from accessing an invalid sched_task_group

 - Improve asymmetric packing logic

 - Extend scheduler statistics to RT and DL scheduling classes and add
   statistics for bandwidth burst to the SCHED_FAIR class.

 - Properly account SCHED_IDLE entities

 - Prevent a potential deadlock when initial priority is assigned to a
   newly created kthread. A recent change to plug a race between cpuset
   and __sched_setscheduler() introduced a new lock dependency which is
   now triggered. Break the lock dependency chain by moving the priority
   assignment to the thread function.

 - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.

 - Improve idle balancing in general and especially for NOHZ enabled
   systems.

 - Provide proper interfaces for live patching so it does not have to
   fiddle with scheduler internals.

 - Add cluster aware scheduling support.

 - A small set of tweaks for RT (irqwork, wait_task_inactive(), various
   scheduler options and delaying mmdrop)

 - The usual small tweaks and improvements all over the place

* tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
  sched/fair: Cleanup newidle_balance
  sched/fair: Remove sysctl_sched_migration_cost condition
  sched/fair: Wait before decaying max_newidle_lb_cost
  sched/fair: Skip update_blocked_averages if we are defering load balance
  sched/fair: Account update_blocked_averages in newidle_balance cost
  x86: Fix __get_wchan() for !STACKTRACE
  sched,x86: Fix L2 cache mask
  sched/core: Remove rq_relock()
  sched: Improve wake_up_all_idle_cpus() take #2
  irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT
  irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT
  irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.
  sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ
  sched: Add cluster scheduler level for x86
  sched: Add cluster scheduler level in core and related Kconfig for ARM64
  topology: Represent clusters of CPUs within a die
  sched: Disable -Wunused-but-set-variable
  sched: Add wrapper for get_wchan() to keep task blocked
  x86: Fix get_wchan() to support the ORC unwinder
  proc: Use task_is_running() for wchan in /proc/$pid/stat
  ...
commit 9a7e0a90a4
@@ -42,6 +42,12 @@ Description: the CPU core ID of cpuX. Typically it is the hardware platform's
 		architecture and platform dependent.
 Values: integer
 
+What: /sys/devices/system/cpu/cpuX/topology/cluster_id
+Description: the cluster ID of cpuX. Typically it is the hardware platform's
+		identifier (rather than the kernel's). The actual value is
+		architecture and platform dependent.
+Values: integer
+
 What: /sys/devices/system/cpu/cpuX/topology/book_id
 Description: the book ID of cpuX. Typically it is the hardware platform's
 		identifier (rather than the kernel's). The actual value is
@@ -85,6 +91,15 @@ Description: human-readable list of CPUs within the same die.
 		The format is like 0-3, 8-11, 14,17.
 Values: decimal list.
 
+What: /sys/devices/system/cpu/cpuX/topology/cluster_cpus
+Description: internal kernel map of CPUs within the same cluster.
+Values: hexadecimal bitmask.
+
+What: /sys/devices/system/cpu/cpuX/topology/cluster_cpus_list
+Description: human-readable list of CPUs within the same cluster.
+		The format is like 0-3, 8-11, 14,17.
+Values: decimal list.
+
 What: /sys/devices/system/cpu/cpuX/topology/book_siblings
 Description: internal kernel map of cpuX's hardware threads within the same
 		book_id. it's only used on s390.
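The `cluster_cpus_list` format documented above (ranges and single CPUs separated by commas) is straightforward to consume from userspace. The following is a minimal sketch, assuming the file contents have already been read into a string; `cpulist_count()` is a hypothetical helper name used for illustration, not a kernel or libc API.

```c
#include <stdlib.h>

/*
 * Count the CPUs named in a sysfs cpulist string such as "0-3,8-11,14,17".
 * Hypothetical helper for illustration; returns -1 on malformed input.
 */
int cpulist_count(const char *list)
{
	int count = 0;

	while (*list) {
		char *end;
		long first, last;

		/* Tolerate separators, spaces and the trailing newline. */
		if (*list == ',' || *list == ' ' || *list == '\n') {
			list++;
			continue;
		}

		first = strtol(list, &end, 10);
		if (end == list || first < 0)
			return -1;
		last = first;

		/* A '-' directly after the number introduces a range. */
		if (*end == '-') {
			const char *start = end + 1;

			last = strtol(start, &end, 10);
			if (end == start || last < first)
				return -1;
		}

		count += (int)(last - first + 1);
		list = end;
	}

	return count;
}
```

For the documented example `0-3, 8-11, 14,17` the ranges contribute 4 + 4 CPUs and the two single entries one each.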
@@ -1016,6 +1016,8 @@ All time durations are in microseconds.
 	- nr_periods
 	- nr_throttled
 	- throttled_usec
+	- nr_bursts
+	- burst_usec
 
   cpu.weight
 	A read-write single value file which exists on non-root
@@ -1047,6 +1049,12 @@ All time durations are in microseconds.
 	$PERIOD duration. "max" for $MAX indicates no limit. If only
 	one number is written, $MAX is updated.
 
+  cpu.max.burst
+	A read-write single value file which exists on non-root
+	cgroups. The default is "0".
+
+	The burst in the range [0, $MAX].
+
   cpu.pressure
 	A read-write nested-keyed file.
 
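The new `nr_bursts` and `burst_usec` entries are plain `key value` lines like the existing ones. A minimal sketch of looking up one key, assuming the file contents are already in memory; `flat_keyed_get()` is a hypothetical helper name, not an existing API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one "key value" line in flat-keyed text such as the contents of
 * cpu.stat. Returns 0 and stores the value on success, -1 otherwise.
 * Hypothetical helper for illustration only.
 */
int flat_keyed_get(const char *text, const char *key, unsigned long long *value)
{
	size_t keylen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Match the whole key, followed by the separating space. */
		if (strncmp(line, key, keylen) == 0 && line[keylen] == ' ') {
			if (sscanf(line + keylen + 1, "%llu", value) == 1)
				return 0;
			return -1;
		}

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}

	return -1;
}
```

Requiring a space after the key keeps a lookup for `nr_burst` from matching the `nr_bursts` line.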
@@ -19,11 +19,13 @@ these macros in include/asm-XXX/topology.h::
 
 	#define topology_physical_package_id(cpu)
 	#define topology_die_id(cpu)
+	#define topology_cluster_id(cpu)
 	#define topology_core_id(cpu)
 	#define topology_book_id(cpu)
 	#define topology_drawer_id(cpu)
 	#define topology_sibling_cpumask(cpu)
 	#define topology_core_cpumask(cpu)
+	#define topology_cluster_cpumask(cpu)
 	#define topology_die_cpumask(cpu)
 	#define topology_book_cpumask(cpu)
 	#define topology_drawer_cpumask(cpu)
@@ -39,10 +41,12 @@ not defined by include/asm-XXX/topology.h:
 
 1) topology_physical_package_id: -1
 2) topology_die_id: -1
-3) topology_core_id: 0
-4) topology_sibling_cpumask: just the given CPU
-5) topology_core_cpumask: just the given CPU
-6) topology_die_cpumask: just the given CPU
+3) topology_cluster_id: -1
+4) topology_core_id: 0
+5) topology_sibling_cpumask: just the given CPU
+6) topology_core_cpumask: just the given CPU
+7) topology_cluster_cpumask: just the given CPU
+8) topology_die_cpumask: just the given CPU
 
 For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
 default definitions for topology_book_id() and topology_book_cpumask().
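The default list above describes a fallback pattern: the architecture header gets the first chance to define each macro, and a generic definition fills in otherwise. A standalone sketch of that pattern follows. The 64-bit mask is a userspace stand-in (an assumption of this sketch) for the kernel's cpumask type, which is not reproduced here.

```c
/*
 * Fallback pattern: an architecture header may define these macros first;
 * otherwise the generic defaults from the list above apply.
 */
#ifndef topology_cluster_id
#define topology_cluster_id(cpu)	((void)(cpu), -1)
#endif

#ifndef topology_core_id
#define topology_core_id(cpu)		((void)(cpu), 0)
#endif

/*
 * "Just the given CPU": modelled here as a single bit in a 64-bit word.
 * This stand-in is only valid for CPU numbers 0..63.
 */
#ifndef topology_cluster_cpumask
#define topology_cluster_cpumask(cpu)	(1ULL << (cpu))
#endif
```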
@@ -22,9 +22,52 @@ cfs_quota units at each period boundary. As threads consume this bandwidth it
 is transferred to cpu-local "silos" on a demand basis. The amount transferred
 within each of these updates is tunable and described as the "slice".
 
+Burst feature
+-------------
+This feature borrows time now against our future underrun, at the cost of
+increased interference against the other system users. All nicely bounded.
+
+Traditional (UP-EDF) bandwidth control is something like:
+
+  (U = \Sum u_i) <= 1
+
+This guarantees both that every deadline is met and that the system is
+stable. After all, if U were > 1, then for every second of walltime,
+we'd have to run more than a second of program time, and obviously miss
+our deadline, but the next deadline will be further out still, there is
+never time to catch up, unbounded fail.
+
+The burst feature observes that a workload doesn't always execute the full
+quota; this enables one to describe u_i as a statistical distribution.
+
+For example, have u_i = {x,e}_i, where x is the p(95) and x+e p(100)
+(the traditional WCET). This effectively allows u to be smaller,
+increasing the efficiency (we can pack more tasks in the system), but at
+the cost of missing deadlines when all the odds line up. However, it
+does maintain stability, since every overrun must be paired with an
+underrun as long as our x is above the average.
+
+That is, suppose we have 2 tasks, both specify a p(95) value, then we
+have a p(95)*p(95) = 90.25% chance both tasks are within their quota and
+everything is good. At the same time we have a p(5)p(5) = 0.25% chance
+both tasks will exceed their quota at the same time (guaranteed deadline
+fail). Somewhere in between there's a threshold where one exceeds and
+the other doesn't underrun enough to compensate; this depends on the
+specific CDFs.
+
+At the same time, we can say that the worst case deadline miss, will be
+\Sum e_i; that is, there is a bounded tardiness (under the assumption
+that x+e is indeed WCET).
+
+The interference when using burst is valued by the possibilities for
+missing the deadline and the average WCET. Test results showed that when
+there are many cgroups or the CPU is underutilized, the interference is
+limited. More details are shown in:
+https://lore.kernel.org/lkml/5371BD36-55AE-4F71-B9D7-B86DC32E3D2B@linux.alibaba.com/
+
 Management
 ----------
-Quota and period are managed within the cpu subsystem via cgroupfs.
+Quota, period and burst are managed within the cpu subsystem via cgroupfs.
 
 .. note::
    The cgroupfs files described in this section are only applicable
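The two-task figures in the burst text (90.25% and 0.25%) follow from multiplying independent per-task probabilities. A small sketch of that arithmetic, assuming the tasks are statistically independent as the text implies; both function names are hypothetical.

```c
/*
 * Probability that all n independent tasks stay within their quota when each
 * one does so with probability p (p = 0.95 for a p(95) quota).
 */
double prob_all_within(double p, int n)
{
	double result = 1.0;

	for (int i = 0; i < n; i++)
		result *= p;

	return result;
}

/* Probability that all n tasks exceed their quota in the same period. */
double prob_all_exceed(double p, int n)
{
	return prob_all_within(1.0 - p, n);
}
```

With p = 0.95 and n = 2 this gives roughly 0.9025 and 0.0025. The remaining cases, where only one task overruns, are the ones whose outcome depends on the specific CDFs.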
@@ -32,29 +75,37 @@ Quota and period are managed within the cpu subsystem via cgroupfs.
    :ref:`Documentation/admin-guide/cgroup-v2.rst <cgroup-v2-cpu>`.
 
-- cpu.cfs_quota_us: the total available run-time within a period (in
-  microseconds)
+- cpu.cfs_quota_us: run-time replenished within a period (in microseconds)
 - cpu.cfs_period_us: the length of a period (in microseconds)
 - cpu.stat: exports throttling statistics [explained further below]
+- cpu.cfs_burst_us: the maximum accumulated run-time (in microseconds)
 
 The default values are::
 
 	cpu.cfs_period_us=100ms
-	cpu.cfs_quota=-1
+	cpu.cfs_quota_us=-1
+	cpu.cfs_burst_us=0
 
 A value of -1 for cpu.cfs_quota_us indicates that the group does not have any
 bandwidth restriction in place, such a group is described as an unconstrained
 bandwidth group. This represents the traditional work-conserving behavior for
 CFS.
 
-Writing any (valid) positive value(s) will enact the specified bandwidth limit.
-The minimum quota allowed for the quota or period is 1ms. There is also an
-upper bound on the period length of 1s. Additional restrictions exist when
-bandwidth limits are used in a hierarchical fashion, these are explained in
-more detail below.
+Writing any (valid) positive value(s) no smaller than cpu.cfs_burst_us will
+enact the specified bandwidth limit. The minimum quota allowed for the quota or
+period is 1ms. There is also an upper bound on the period length of 1s.
+Additional restrictions exist when bandwidth limits are used in a hierarchical
+fashion, these are explained in more detail below.
 
 Writing any negative value to cpu.cfs_quota_us will remove the bandwidth limit
 and return the group to an unconstrained state once more.
 
+A value of 0 for cpu.cfs_burst_us indicates that the group can not accumulate
+any unused bandwidth. It makes the traditional bandwidth control behavior for
+CFS unchanged. Writing any (valid) positive value(s) no larger than
+cpu.cfs_quota_us into cpu.cfs_burst_us will enact the cap on unused bandwidth
+accumulation.
+
 Any updates to a group's bandwidth specification will result in it becoming
 unthrottled if it is in a constrained state.
 
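The constraints stated in this hunk (1ms minimum for quota and period, 1s maximum period, burst no larger than quota) can be collected into one check. This is a sketch of the documented rules only, not the kernel's validation code; the function name and constants are hypothetical. Treating a negative (unconstrained) quota as imposing no bound on burst is an assumption of this sketch, and hierarchical restrictions are not modelled.

```c
#include <stdbool.h>

#define MIN_US		1000LL		/* 1ms lower bound for quota and period */
#define MAX_PERIOD_US	1000000LL	/* 1s upper bound for the period */

/*
 * Check a (quota, period, burst) triple, all in microseconds, against the
 * documented limits. Hypothetical helper for illustration only.
 */
bool cfs_bandwidth_valid(long long quota_us, long long period_us,
			 long long burst_us)
{
	if (period_us < MIN_US || period_us > MAX_PERIOD_US)
		return false;
	if (burst_us < 0)
		return false;

	/* Negative quota: unconstrained group (assumed: nothing to compare). */
	if (quota_us < 0)
		return true;

	if (quota_us < MIN_US)
		return false;

	/* Burst may not exceed the quota it accumulates from. */
	return burst_us <= quota_us;
}
```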
@@ -74,7 +125,7 @@ for more fine-grained consumption.
 
 Statistics
 ----------
-A group's bandwidth statistics are exported via 3 fields in cpu.stat.
+A group's bandwidth statistics are exported via 5 fields in cpu.stat.
 
 cpu.stat:
 
@@ -82,6 +133,9 @@ cpu.stat:
 - nr_throttled: Number of times the group has been throttled/limited.
 - throttled_time: The total time duration (in nanoseconds) for which entities
   of the group have been throttled.
+- nr_bursts: Number of periods in which a burst occurs.
+- burst_time: Cumulative wall-time (in nanoseconds) that any CPU has used
+  above quota in respective periods
 
 This interface is read-only.
 
@@ -179,3 +233,15 @@ Examples
 
    By using a small period here we are ensuring a consistent latency
    response at the expense of burst capacity.
+
+4. Limit a group to 40% of 1 CPU, and allow it to accumulate up to 20% of
+   1 CPU additionally, in case accumulation has been done.
+
+   With 50ms period, 20ms quota will be equivalent to 40% of 1 CPU.
+   And 10ms burst will be equivalent to 20% of 1 CPU.
+
+	# echo 20000 > cpu.cfs_quota_us /* quota = 20ms */
+	# echo 50000 > cpu.cfs_period_us /* period = 50ms */
+	# echo 10000 > cpu.cfs_burst_us /* burst = 10ms */
+
+   Larger buffer setting (no larger than quota) allows greater burst capacity.
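The percentages in example 4 are the ratio of run-time to period. A one-function sketch of that arithmetic; `cpu_share_percent()` is a hypothetical name.

```c
/*
 * Share of one CPU, in percent, that runtime_us of run-time per period_us
 * corresponds to. Both arguments are in microseconds.
 */
double cpu_share_percent(long long runtime_us, long long period_us)
{
	return 100.0 * (double)runtime_us / (double)period_us;
}
```

For the values written in example 4, a 20000us quota and a 10000us burst over a 50000us period correspond to 40% and 20% of one CPU respectively.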
@@ -42,7 +42,7 @@ extern void start_thread(struct pt_regs *, unsigned long, unsigned long);
 struct task_struct;
 extern void release_thread(struct task_struct *);
 
-unsigned long get_wchan(struct task_struct *p);
+unsigned long __get_wchan(struct task_struct *p);
 
 #define KSTK_EIP(tsk) (task_pt_regs(tsk)->pc)
 
@@ -376,12 +376,11 @@ thread_saved_pc(struct task_struct *t)
 }
 
 unsigned long
-get_wchan(struct task_struct *p)
+__get_wchan(struct task_struct *p)
 {
 	unsigned long schedule_frame;
 	unsigned long pc;
-	if (!p || p == current || task_is_running(p))
-		return 0;
+
 	/*
 	 * This one depends on the frame size of schedule(). Do a
 	 * "disass schedule" in gdb to find the frame size. Also, the
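The architecture hunks rename `get_wchan()` to `__get_wchan()` and drop the `!p || p == current || task_is_running(p)` check, which suggests the check now lives in a common caller (the shortlog lists "sched: Add wrapper for get_wchan() to keep task blocked"). That wrapper is not part of this excerpt. The following is a simplified userspace model of the pattern only: every type and function name in it is hypothetical, and the `pinned` flag merely stands in for whatever locking the real wrapper uses.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's task_struct. */
struct task {
	int running;		/* non-zero while the task is runnable */
	int pinned;		/* models "kept blocked" during the unwind */
	unsigned long wchan;	/* what the arch unwinder would find */
};

struct task *current_task;	/* stand-in for "current" */

/* Arch-specific part: relies on the caller having validated the task. */
unsigned long arch_get_wchan(struct task *p)
{
	return p->wchan;
}

/* Generic wrapper: check once, keep the task blocked while unwinding. */
unsigned long generic_get_wchan(struct task *p)
{
	unsigned long ip = 0;

	if (!p || p == current_task)
		return 0;

	p->pinned = 1;		/* a real implementation would take a lock */
	if (!p->running)
		ip = arch_get_wchan(p);
	p->pinned = 0;

	return ip;
}
```

Centralising the check this way is what lets each per-architecture function shrink to the unwinding logic alone, as the hunks that follow show.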
@@ -70,7 +70,7 @@ struct task_struct;
 extern void start_thread(struct pt_regs * regs, unsigned long pc,
 			 unsigned long usp);
 
-extern unsigned int get_wchan(struct task_struct *p);
+extern unsigned int __get_wchan(struct task_struct *p);
 
 #endif /* !__ASSEMBLY__ */
 
@@ -15,7 +15,7 @@
  * = specifics of data structs where trace is saved(CONFIG_STACKTRACE etc)
  *
  * vineetg: March 2009
- * -Implemented correct versions of thread_saved_pc() and get_wchan()
+ * -Implemented correct versions of thread_saved_pc() and __get_wchan()
  *
  * rajeshwarr: 2008
  * -Initial implementation
@@ -248,7 +248,7 @@ void show_stack(struct task_struct *tsk, unsigned long *sp, const char *loglvl)
  * Of course just returning schedule( ) would be pointless so unwind until
  * the function is not in schedular code
  */
-unsigned int get_wchan(struct task_struct *tsk)
+unsigned int __get_wchan(struct task_struct *tsk)
 {
 	return arc_unwind_core(tsk, NULL, __get_first_nonsched, NULL);
 }
@@ -84,7 +84,7 @@ struct task_struct;
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-unsigned long get_wchan(struct task_struct *p);
+unsigned long __get_wchan(struct task_struct *p);
 
 #define task_pt_regs(p) \
 	((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
@@ -276,13 +276,11 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	return 0;
 }
 
-unsigned long get_wchan(struct task_struct *p)
+unsigned long __get_wchan(struct task_struct *p)
 {
 	struct stackframe frame;
 	unsigned long stack_page;
 	int count = 0;
-	if (!p || p == current || task_is_running(p))
-		return 0;
 
 	frame.fp = thread_saved_fp(p);
 	frame.sp = thread_saved_sp(p);
@@ -988,6 +988,15 @@ config SCHED_MC
 	  making when dealing with multi-core CPU chips at a cost of slightly
 	  increased overhead in some places. If unsure say N here.
 
+config SCHED_CLUSTER
+	bool "Cluster scheduler support"
+	help
+	  Cluster scheduler support improves the CPU scheduler's decision
+	  making when dealing with machines that have clusters of CPUs.
+	  Cluster usually means a couple of CPUs which are placed closely
+	  by sharing mid-level caches, last-level cache tags or internal
+	  busses.
+
 config SCHED_SMT
 	bool "SMT scheduler support"
 	help
@@ -257,7 +257,7 @@ struct task_struct;
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-unsigned long get_wchan(struct task_struct *p);
+unsigned long __get_wchan(struct task_struct *p);
 
 void update_sctlr_el1(u64 sctlr);
 
@@ -528,13 +528,11 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
 	return last;
 }
 
-unsigned long get_wchan(struct task_struct *p)
+unsigned long __get_wchan(struct task_struct *p)
 {
 	struct stackframe frame;
 	unsigned long stack_page, ret = 0;
 	int count = 0;
-	if (!p || p == current || task_is_running(p))
-		return 0;
 
 	stack_page = (unsigned long)try_get_task_stack(p);
 	if (!stack_page)
@@ -103,6 +103,8 @@ int __init parse_acpi_topology(void)
 			cpu_topology[cpu].thread_id = -1;
 			cpu_topology[cpu].core_id = topology_id;
 		}
+		topology_id = find_acpi_cpu_topology_cluster(cpu);
+		cpu_topology[cpu].cluster_id = topology_id;
 		topology_id = find_acpi_cpu_topology_package(cpu);
 		cpu_topology[cpu].package_id = topology_id;
 
@@ -81,7 +81,7 @@ static inline void release_thread(struct task_struct *dead_task)
 
 extern int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
 
-unsigned long get_wchan(struct task_struct *p);
+unsigned long __get_wchan(struct task_struct *p);
 
 #define KSTK_EIP(tsk) (task_pt_regs(tsk)->pc)
 #define KSTK_ESP(tsk) (task_pt_regs(tsk)->usp)
@@ -111,12 +111,11 @@ static bool save_wchan(unsigned long pc, void *arg)
 	return false;
 }
 
-unsigned long get_wchan(struct task_struct *task)
+unsigned long __get_wchan(struct task_struct *task)
 {
 	unsigned long pc = 0;
 
-	if (likely(task && task != current && !task_is_running(task)))
-		walk_stackframe(task, NULL, save_wchan, &pc);
+	walk_stackframe(task, NULL, save_wchan, &pc);
 	return pc;
 }
 
@@ -105,7 +105,7 @@ static inline void release_thread(struct task_struct *dead_task)
 {
 }
 
-unsigned long get_wchan(struct task_struct *p);
+unsigned long __get_wchan(struct task_struct *p);
 
 #define KSTK_EIP(tsk) \
 	({ \
@@ -128,15 +128,12 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
 	return 0;
 }
 
-unsigned long get_wchan(struct task_struct *p)
+unsigned long __get_wchan(struct task_struct *p)
 {
 	unsigned long fp, pc;
 	unsigned long stack_page;
 	int count = 0;
 
-	if (!p || p == current || task_is_running(p))
-		return 0;
-
 	stack_page = (unsigned long)p;
 	fp = ((struct pt_regs *)p->thread.ksp)->er6;
 	do {
@@ -64,7 +64,7 @@ struct thread_struct {
 extern void release_thread(struct task_struct *dead_task);
 
 /* Get wait channel for task P. */
-extern unsigned long get_wchan(struct task_struct *p);
+extern unsigned long __get_wchan(struct task_struct *p);
 
 /* The following stuff is pretty HEXAGON specific. */
 
@@ -130,13 +130,11 @@ void flush_thread(void)
  * is an identification of the point at which the scheduler
  * was invoked by a blocked thread.
  */
-unsigned long get_wchan(struct task_struct *p)
+unsigned long __get_wchan(struct task_struct *p)
 {
 	unsigned long fp, pc;
 	unsigned long stack_page;
 	int count = 0;
-	if (!p || p == current || task_is_running(p))
-		return 0;
 
 	stack_page = (unsigned long)task_stack_page(p);
 	fp = ((struct hexagon_switch_stack *)p->thread.switch_sp)->fp;
@@ -330,7 +330,7 @@ struct task_struct;
 #define release_thread(dead_task)
 
 /* Get wait channel for task P. */
-extern unsigned long get_wchan (struct task_struct *p);
+extern unsigned long __get_wchan (struct task_struct *p);
 
 /* Return instruction pointer of blocked task TSK. */
 #define KSTK_EIP(tsk) \
@@ -523,15 +523,12 @@ exit_thread (struct task_struct *tsk)
 }
 
 unsigned long
-get_wchan (struct task_struct *p)
+__get_wchan (struct task_struct *p)
 {
 	struct unw_frame_info info;
 	unsigned long ip;
 	int count = 0;
 
-	if (!p || p == current || task_is_running(p))
-		return 0;
-
 	/*
 	 * Note: p may not be a blocked task (it could be current or
 	 * another process running on some other CPU. Rather than
@@ -150,7 +150,7 @@ static inline void release_thread(struct task_struct *dead_task)
 {
 }
 
-unsigned long get_wchan(struct task_struct *p);
+unsigned long __get_wchan(struct task_struct *p);
 
 #define KSTK_EIP(tsk) \
 	({ \
@@ -263,13 +263,11 @@ int dump_fpu (struct pt_regs *regs, struct user_m68kfp_struct *fpu)
 }
 EXPORT_SYMBOL(dump_fpu);
 
-unsigned long get_wchan(struct task_struct *p)
+unsigned long __get_wchan(struct task_struct *p)
 {
 	unsigned long fp, pc;
 	unsigned long stack_page;
 	int count = 0;
-	if (!p || p == current || task_is_running(p))
-		return 0;
 
 	stack_page = (unsigned long)task_stack_page(p);
 	fp = ((struct switch_stack *)p->thread.ksp)->a6;
@@ -68,7 +68,7 @@ static inline void release_thread(struct task_struct *dead_task)
 {
 }
 
-unsigned long get_wchan(struct task_struct *p);
+unsigned long __get_wchan(struct task_struct *p);
 
 /* The size allocated for kernel stacks. This _must_ be a power of two! */
 # define KERNEL_STACK_SIZE 0x2000
@@ -112,7 +112,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 	return 0;
 }
 
-unsigned long get_wchan(struct task_struct *p)
+unsigned long __get_wchan(struct task_struct *p)
 {
 /* TBD (used by procfs) */
 	return 0;
@@ -369,7 +369,7 @@ static inline void flush_thread(void)
 {
 }
 
-unsigned long get_wchan(struct task_struct *p);
+unsigned long __get_wchan(struct task_struct *p);
 
 #define __KSTK_TOS(tsk) ((unsigned long)task_stack_page(tsk) + \
 			 THREAD_SIZE - 32 - sizeof(struct pt_regs))
@@ -511,7 +511,7 @@ static int __init frame_info_init(void)
 
 	/*
 	 * Without schedule() frame info, result given by
-	 * thread_saved_pc() and get_wchan() are not reliable.
+	 * thread_saved_pc() and __get_wchan() are not reliable.
 	 */
 	if (schedule_mfi.pc_offset < 0)
 		printk("Can't analyze schedule() prologue at %p\n", schedule);
@@ -652,9 +652,9 @@ unsigned long unwind_stack(struct task_struct *task, unsigned long *sp,
 #endif
 
 /*
- * get_wchan - a maintenance nightmare^W^Wpain in the ass ...
+ * __get_wchan - a maintenance nightmare^W^Wpain in the ass ...
  */
-unsigned long get_wchan(struct task_struct *task)
+unsigned long __get_wchan(struct task_struct *task)
 {
 	unsigned long pc = 0;
 #ifdef CONFIG_KALLSYMS
@@ -662,8 +662,6 @@ unsigned long get_wchan(struct task_struct *task)
 	unsigned long ra = 0;
 #endif
 
-	if (!task || task == current || task_is_running(task))
-		goto out;
 	if (!task_stack_page(task))
 		goto out;
 
@@ -83,7 +83,7 @@ extern struct task_struct *last_task_used_math;
 /* Prepare to copy thread state - unlazy all lazy status */
 #define prepare_to_copy(tsk) do { } while (0)
 
-unsigned long get_wchan(struct task_struct *p);
+unsigned long __get_wchan(struct task_struct *p);
 
 #define cpu_relax() barrier()
 
@@ -233,15 +233,12 @@ int dump_fpu(struct pt_regs *regs, elf_fpregset_t * fpu)
 
 EXPORT_SYMBOL(dump_fpu);
 
-unsigned long get_wchan(struct task_struct *p)
+unsigned long __get_wchan(struct task_struct *p)
 {
 	unsigned long fp, lr;
 	unsigned long stack_start, stack_end;
 	int count = 0;
 
-	if (!p || p == current || task_is_running(p))
-		return 0;
-
 	if (IS_ENABLED(CONFIG_FRAME_POINTER)) {
 		stack_start = (unsigned long)end_of_stack(p);
 		stack_end = (unsigned long)task_stack_page(p) + THREAD_SIZE;
@@ -258,5 +255,3 @@ unsigned long get_wchan(struct task_struct *p)
 	}
 	return 0;
 }
-
-EXPORT_SYMBOL(get_wchan);
@@ -69,7 +69,7 @@ static inline void release_thread(struct task_struct *dead_task)
 {
 }
 
-extern unsigned long get_wchan(struct task_struct *p);
+extern unsigned long __get_wchan(struct task_struct *p);
 
 #define task_pt_regs(p) \
 	((struct pt_regs *)(THREAD_SIZE + task_stack_page(p)) - 1)
@@ -217,15 +217,12 @@ void dump(struct pt_regs *fp)
 	pr_emerg("\n\n");
 }
 
-unsigned long get_wchan(struct task_struct *p)
+unsigned long __get_wchan(struct task_struct *p)
 {
 	unsigned long fp, pc;
 	unsigned long stack_page;
 	int count = 0;
 
-	if (!p || p == current || task_is_running(p))
-		return 0;
-
 	stack_page = (unsigned long)p;
 	fp = ((struct switch_stack *)p->thread.ksp)->fp; /* ;dgt2 */
 	do {
@@ -73,7 +73,7 @@ struct thread_struct {
 
 void start_thread(struct pt_regs *regs, unsigned long nip, unsigned long sp);
 void release_thread(struct task_struct *);
-unsigned long get_wchan(struct task_struct *p);
+unsigned long __get_wchan(struct task_struct *p);
 
 #define cpu_relax() barrier()
 
@@ -263,7 +263,7 @@ void dump_elf_thread(elf_greg_t *dest, struct pt_regs* regs)
 	dest[35] = 0;
 }
 
-unsigned long get_wchan(struct task_struct *p)
+unsigned long __get_wchan(struct task_struct *p)
 {
 	/* TODO */
 
@@ -273,7 +273,7 @@ struct mm_struct;
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-extern unsigned long get_wchan(struct task_struct *p);
+extern unsigned long __get_wchan(struct task_struct *p);
 
 #define KSTK_EIP(tsk) ((tsk)->thread.regs.iaoq[0])
 #define KSTK_ESP(tsk) ((tsk)->thread.regs.gr[30])
@@ -240,15 +240,12 @@ copy_thread(unsigned long clone_flags, unsigned long usp,
 }
 
 unsigned long
-get_wchan(struct task_struct *p)
+__get_wchan(struct task_struct *p)
 {
 	struct unwind_frame_info info;
 	unsigned long ip;
 	int count = 0;
 
-	if (!p || p == current || task_is_running(p))
-		return 0;
-
 	/*
 	 * These bracket the sleeping functions..
 	 */
@@ -300,7 +300,7 @@ struct thread_struct {
 
 #define task_pt_regs(tsk) ((tsk)->thread.regs)
 
-unsigned long get_wchan(struct task_struct *p);
+unsigned long __get_wchan(struct task_struct *p);
 
 #define KSTK_EIP(tsk) ((tsk)->thread.regs? (tsk)->thread.regs->nip: 0)
 #define KSTK_ESP(tsk) ((tsk)->thread.regs? (tsk)->thread.regs->gpr[1]: 0)
@@ -2111,14 +2111,11 @@ int validate_sp(unsigned long sp, struct task_struct *p,
 
 EXPORT_SYMBOL(validate_sp);
 
-static unsigned long __get_wchan(struct task_struct *p)
+static unsigned long ___get_wchan(struct task_struct *p)
 {
 	unsigned long ip, sp;
 	int count = 0;
 
-	if (!p || p == current || task_is_running(p))
-		return 0;
-
 	sp = p->thread.ksp;
 	if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD))
 		return 0;
@@ -2137,14 +2134,14 @@ static unsigned long __get_wchan(struct task_struct *p)
 	return 0;
 }
 
-unsigned long get_wchan(struct task_struct *p)
+unsigned long __get_wchan(struct task_struct *p)
 {
 	unsigned long ret;
 
 	if (!try_get_task_stack(p))
 		return 0;
 
-	ret = __get_wchan(p);
+	ret = ___get_wchan(p);
 
 	put_task_stack(p);
 
@@ -66,7 +66,7 @@ static inline void release_thread(struct task_struct *dead_task)
 {
 }
 
-extern unsigned long get_wchan(struct task_struct *p);
+extern unsigned long __get_wchan(struct task_struct *p);
 
 
 static inline void wait_for_interrupt(void)
@@ -128,16 +128,14 @@ static bool save_wchan(void *arg, unsigned long pc)
 	return true;
 }
 
-unsigned long get_wchan(struct task_struct *task)
+unsigned long __get_wchan(struct task_struct *task)
 {
 	unsigned long pc = 0;
 
-	if (likely(task && task != current && !task_is_running(task))) {
-		if (!try_get_task_stack(task))
-			return 0;
-		walk_stackframe(task, NULL, save_wchan, &pc);
-		put_task_stack(task);
-	}
+	if (!try_get_task_stack(task))
+		return 0;
+	walk_stackframe(task, NULL, save_wchan, &pc);
+	put_task_stack(task);
 	return pc;
 }
 
@@ -192,7 +192,7 @@ static inline void release_thread(struct task_struct *tsk) { }
 void guarded_storage_release(struct task_struct *tsk);
 void gs_load_bc_cb(struct pt_regs *regs);
 
-unsigned long get_wchan(struct task_struct *p);
+unsigned long __get_wchan(struct task_struct *p);
 #define task_pt_regs(tsk) ((struct pt_regs *) \
 	(task_stack_page(tsk) + THREAD_SIZE) - 1)
 #define KSTK_EIP(tsk) (task_pt_regs(tsk)->psw.addr)
|
@@ -181,12 +181,12 @@ void execve_tail(void)
 	asm volatile("sfpc %0" : : "d" (0));
 }
 
-unsigned long get_wchan(struct task_struct *p)
+unsigned long __get_wchan(struct task_struct *p)
 {
 	struct unwind_state state;
 	unsigned long ip = 0;
 
-	if (!p || p == current || task_is_running(p) || !task_stack_page(p))
+	if (!task_stack_page(p))
 		return 0;
 
 	if (!try_get_task_stack(p))
|
@@ -180,7 +180,7 @@ static inline void show_code(struct pt_regs *regs)
 }
 #endif
 
-extern unsigned long get_wchan(struct task_struct *p);
+extern unsigned long __get_wchan(struct task_struct *p);
 
 #define KSTK_EIP(tsk) (task_pt_regs(tsk)->pc)
 #define KSTK_ESP(tsk) (task_pt_regs(tsk)->regs[15])
|
@@ -182,13 +182,10 @@ __switch_to(struct task_struct *prev, struct task_struct *next)
 	return prev;
 }
 
-unsigned long get_wchan(struct task_struct *p)
+unsigned long __get_wchan(struct task_struct *p)
 {
 	unsigned long pc;
 
-	if (!p || p == current || task_is_running(p))
-		return 0;
-
 	/*
 	 * The same comment as on the Alpha applies here, too ...
 	 */
|
@@ -89,7 +89,7 @@ static inline void start_thread(struct pt_regs * regs, unsigned long pc,
 /* Free all resources held by a thread. */
 #define release_thread(tsk) do { } while(0)
 
-unsigned long get_wchan(struct task_struct *);
+unsigned long __get_wchan(struct task_struct *);
 
 #define task_pt_regs(tsk) ((tsk)->thread.kregs)
 #define KSTK_EIP(tsk) ((tsk)->thread.kregs->pc)
|
@@ -183,7 +183,7 @@ do { \
 /* Free all resources held by a thread. */
 #define release_thread(tsk) do { } while (0)
 
-unsigned long get_wchan(struct task_struct *task);
+unsigned long __get_wchan(struct task_struct *task);
 
 #define task_pt_regs(tsk) (task_thread_info(tsk)->kregs)
 #define KSTK_EIP(tsk) (task_pt_regs(tsk)->tpc)
|
@@ -365,7 +365,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	return 0;
 }
 
-unsigned long get_wchan(struct task_struct *task)
+unsigned long __get_wchan(struct task_struct *task)
 {
 	unsigned long pc, fp, bias = 0;
 	unsigned long task_base = (unsigned long) task;
@@ -373,9 +373,6 @@ unsigned long get_wchan(struct task_struct *task)
 	struct reg_window32 *rw;
 	int count = 0;
 
-	if (!task || task == current || task_is_running(task))
-		goto out;
-
 	fp = task_thread_info(task)->ksp + bias;
 	do {
 		/* Bogus frame pointer? */
|
@@ -663,7 +663,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	return 0;
 }
 
-unsigned long get_wchan(struct task_struct *task)
+unsigned long __get_wchan(struct task_struct *task)
 {
 	unsigned long pc, fp, bias = 0;
 	struct thread_info *tp;
@@ -671,9 +671,6 @@ unsigned long get_wchan(struct task_struct *task)
 	unsigned long ret = 0;
 	int count = 0;
 
-	if (!task || task == current || task_is_running(task))
-		goto out;
-
 	tp = task_thread_info(task);
 	bias = STACK_BIAS;
 	fp = task_thread_info(task)->ksp + bias;
|
@@ -106,6 +106,6 @@ extern struct cpuinfo_um boot_cpu_data;
 #define cache_line_size() (boot_cpu_data.cache_alignment)
 
 #define KSTK_REG(tsk, reg) get_thread_reg(reg, &tsk->thread.switch_buf)
-extern unsigned long get_wchan(struct task_struct *p);
+extern unsigned long __get_wchan(struct task_struct *p);
 
 #endif
|
@@ -364,14 +364,11 @@ unsigned long arch_align_stack(unsigned long sp)
 }
 #endif
 
-unsigned long get_wchan(struct task_struct *p)
+unsigned long __get_wchan(struct task_struct *p)
 {
 	unsigned long stack_page, sp, ip;
 	bool seen_sched = 0;
 
-	if ((p == NULL) || (p == current) || task_is_running(p))
-		return 0;
-
 	stack_page = (unsigned long) task_stack_page(p);
 	/* Bail if the process has no kernel stack for some reason */
 	if (stack_page == 0)
|
@@ -1001,6 +1001,17 @@ config NR_CPUS
 	  This is purely to save memory: each supported CPU adds about 8KB
 	  to the kernel image.
 
+config SCHED_CLUSTER
+	bool "Cluster scheduler support"
+	depends on SMP
+	default y
+	help
+	  Cluster scheduler support improves the CPU scheduler's decision
+	  making when dealing with machines that have clusters of CPUs.
+	  Cluster usually means a couple of CPUs which are placed closely
+	  by sharing mid-level caches, last-level cache tags or internal
+	  busses.
+
 config SCHED_SMT
 	def_bool y if SMP
 
|
@@ -589,7 +589,7 @@ static inline void load_sp0(unsigned long sp0)
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-unsigned long get_wchan(struct task_struct *p);
+unsigned long __get_wchan(struct task_struct *p);
 
 /*
  * Generic CPUID function
|
@@ -16,7 +16,9 @@ DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map);
 /* cpus sharing the last level cache: */
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
+DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_l2c_shared_map);
 DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id);
+DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_l2c_id);
 DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);
 
 static inline struct cpumask *cpu_llc_shared_mask(int cpu)
@@ -24,6 +26,11 @@ static inline struct cpumask *cpu_llc_shared_mask(int cpu)
 	return per_cpu(cpu_llc_shared_map, cpu);
 }
 
+static inline struct cpumask *cpu_l2c_shared_mask(int cpu)
+{
+	return per_cpu(cpu_l2c_shared_map, cpu);
+}
+
 DECLARE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid);
 DECLARE_EARLY_PER_CPU_READ_MOSTLY(u32, x86_cpu_to_acpiid);
 DECLARE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_bios_cpu_apicid);
|
@@ -103,6 +103,7 @@ static inline void setup_node_to_cpumask_map(void) { }
 #include <asm-generic/topology.h>
 
 extern const struct cpumask *cpu_coregroup_mask(int cpu);
+extern const struct cpumask *cpu_clustergroup_mask(int cpu);
 
 #define topology_logical_package_id(cpu) (cpu_data(cpu).logical_proc_id)
 #define topology_physical_package_id(cpu) (cpu_data(cpu).phys_proc_id)
@@ -113,7 +114,9 @@ extern const struct cpumask *cpu_coregroup_mask(int cpu);
 extern unsigned int __max_die_per_package;
 
 #ifdef CONFIG_SMP
+#define topology_cluster_id(cpu) (per_cpu(cpu_l2c_id, cpu))
 #define topology_die_cpumask(cpu) (per_cpu(cpu_die_map, cpu))
+#define topology_cluster_cpumask(cpu) (cpu_clustergroup_mask(cpu))
 #define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu))
 #define topology_sibling_cpumask(cpu) (per_cpu(cpu_sibling_map, cpu))
 
|
@@ -846,6 +846,7 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 		l2 = new_l2;
 #ifdef CONFIG_SMP
 		per_cpu(cpu_llc_id, cpu) = l2_id;
+		per_cpu(cpu_l2c_id, cpu) = l2_id;
 #endif
 	}
 
|
@@ -85,6 +85,9 @@ u16 get_llc_id(unsigned int cpu)
 }
 EXPORT_SYMBOL_GPL(get_llc_id);
 
+/* L2 cache ID of each logical CPU */
+DEFINE_PER_CPU_READ_MOSTLY(u16, cpu_l2c_id) = BAD_APICID;
+
 /* correctly size the local cpu masks */
 void __init setup_cpu_local_masks(void)
 {
|
@@ -198,7 +198,7 @@ void sched_set_itmt_core_prio(int prio, int core_cpu)
 		 * of the priority chain and only used when
 		 * all other high priority cpus are out of capacity.
 		 */
-		smt_prio = prio * smp_num_siblings / i;
+		smt_prio = prio * smp_num_siblings / (i * i);
 		per_cpu(sched_core_priority, cpu) = smt_prio;
 		i++;
 	}
|
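A small user-space sketch of the divisor change in the ITMT hunk, to show how the ranking of SMT siblings shifts. The helper names and the sample inputs are hypothetical, and the sketch assumes `i` counts siblings starting at 1, which the hunk itself does not show.

```c
/* Hypothetical helpers; they mirror only the two expressions in the hunk. */

/* Old formula: priority falls off linearly with the sibling index i. */
static int smt_prio_old(int prio, int nr_siblings, int i)
{
	return prio * nr_siblings / i;
}

/* New formula: priority falls off with the square of the sibling index i. */
static int smt_prio_new(int prio, int nr_siblings, int i)
{
	return prio * nr_siblings / (i * i);
}
```

With `prio = 40` and two siblings, both formulas give the first sibling 80. The second sibling gets 40 under the old formula and 20 under the new one, so secondary siblings sink further down the priority order.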
@@ -43,6 +43,7 @@
 #include <asm/io_bitmap.h>
 #include <asm/proto.h>
 #include <asm/frame.h>
+#include <asm/unwind.h>
 
 #include "process.h"
 
@@ -942,60 +943,22 @@ unsigned long arch_randomize_brk(struct mm_struct *mm)
  * because the task might wake up and we might look at a stack
  * changing under us.
  */
-unsigned long get_wchan(struct task_struct *p)
+unsigned long __get_wchan(struct task_struct *p)
 {
-	unsigned long start, bottom, top, sp, fp, ip, ret = 0;
-	int count = 0;
+	struct unwind_state state;
+	unsigned long addr = 0;
 
-	if (p == current || task_is_running(p))
-		return 0;
+	for (unwind_start(&state, p, NULL, NULL); !unwind_done(&state);
+	     unwind_next_frame(&state)) {
+		addr = unwind_get_return_address(&state);
+		if (!addr)
+			break;
+		if (in_sched_functions(addr))
+			continue;
+		break;
+	}
 
-	if (!try_get_task_stack(p))
-		return 0;
-
-	start = (unsigned long)task_stack_page(p);
-	if (!start)
-		goto out;
-
-	/*
-	 * Layout of the stack page:
-	 *
-	 * ----------- topmax = start + THREAD_SIZE - sizeof(unsigned long)
-	 * PADDING
-	 * ----------- top = topmax - TOP_OF_KERNEL_STACK_PADDING
-	 * stack
-	 * ----------- bottom = start
-	 *
-	 * The tasks stack pointer points at the location where the
-	 * framepointer is stored. The data on the stack is:
-	 * ... IP FP ... IP FP
-	 *
-	 * We need to read FP and IP, so we need to adjust the upper
-	 * bound by another unsigned long.
-	 */
-	top = start + THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING;
-	top -= 2 * sizeof(unsigned long);
-	bottom = start;
-
-	sp = READ_ONCE(p->thread.sp);
-	if (sp < bottom || sp > top)
-		goto out;
-
-	fp = READ_ONCE_NOCHECK(((struct inactive_task_frame *)sp)->bp);
-	do {
-		if (fp < bottom || fp > top)
-			goto out;
-		ip = READ_ONCE_NOCHECK(*(unsigned long *)(fp + sizeof(unsigned long)));
-		if (!in_sched_functions(ip)) {
-			ret = ip;
-			goto out;
-		}
-		fp = READ_ONCE_NOCHECK(*(unsigned long *)fp);
-	} while (count++ < 16 && !task_is_running(p));
-
-out:
-	put_task_stack(p);
-	return ret;
+	return addr;
 }
 
 long do_arch_prctl_common(struct task_struct *task, int option,
|
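The new x86 `__get_wchan()` walks frames with the unwinder and stops at the first return address outside the scheduler. A minimal user-space sketch of that loop shape follows. The frame array, `struct text_range`, and both function names are hypothetical stand-ins for the unwinder state and `in_sched_functions()`; nothing here is kernel API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for in_sched_functions(): scheduler text is [lo, hi). */
struct text_range {
	unsigned long lo;
	unsigned long hi;
};

static bool in_text_range(const struct text_range *r, unsigned long addr)
{
	return addr >= r->lo && addr < r->hi;
}

/*
 * Walk return addresses from the innermost frame outwards and stop at the
 * first one outside the scheduler range. Like the loop in the hunk, this
 * returns 0 when a zero address is hit, and the last address seen when
 * every frame lies inside the scheduler range.
 */
static unsigned long first_non_sched(const unsigned long *frames, size_t n,
				     const struct text_range *sched)
{
	unsigned long addr = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		addr = frames[i];
		if (!addr)
			break;
		if (in_text_range(sched, addr))
			continue;
		break;
	}
	return addr;
}
```

The sketch leaves out the part the pull request describes as the robustness fix: the caller has to keep the task blocked while the walk runs, which a plain array cannot model.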
@@ -101,6 +101,8 @@ EXPORT_PER_CPU_SYMBOL(cpu_die_map);
 
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
 
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_l2c_shared_map);
+
 /* Per CPU bogomips and other parameters */
 DEFINE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info);
 EXPORT_PER_CPU_SYMBOL(cpu_info);
@@ -464,6 +466,21 @@ static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	return false;
 }
 
+static bool match_l2c(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+{
+	int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
+
+	/* If the arch didn't set up l2c_id, fall back to SMT */
+	if (per_cpu(cpu_l2c_id, cpu1) == BAD_APICID)
+		return match_smt(c, o);
+
+	/* Do not match if L2 cache id does not match: */
+	if (per_cpu(cpu_l2c_id, cpu1) != per_cpu(cpu_l2c_id, cpu2))
+		return false;
+
+	return topology_sane(c, o, "l2c");
+}
+
 /*
  * Unlike the other levels, we do not enforce keeping a
  * multicore group inside a NUMA node. If this happens, we will
@@ -523,7 +540,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 }
 
 
-#if defined(CONFIG_SCHED_SMT) || defined(CONFIG_SCHED_MC)
+#if defined(CONFIG_SCHED_SMT) || defined(CONFIG_SCHED_CLUSTER) || defined(CONFIG_SCHED_MC)
 static inline int x86_sched_itmt_flags(void)
 {
 	return sysctl_sched_itmt_enabled ? SD_ASYM_PACKING : 0;
@@ -541,12 +558,21 @@ static int x86_smt_flags(void)
 	return cpu_smt_flags() | x86_sched_itmt_flags();
 }
 #endif
+#ifdef CONFIG_SCHED_CLUSTER
+static int x86_cluster_flags(void)
+{
+	return cpu_cluster_flags() | x86_sched_itmt_flags();
+}
+#endif
 #endif
 
 static struct sched_domain_topology_level x86_numa_in_package_topology[] = {
 #ifdef CONFIG_SCHED_SMT
 	{ cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
 #endif
+#ifdef CONFIG_SCHED_CLUSTER
+	{ cpu_clustergroup_mask, x86_cluster_flags, SD_INIT_NAME(CLS) },
+#endif
 #ifdef CONFIG_SCHED_MC
 	{ cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
 #endif
@@ -557,6 +583,9 @@ static struct sched_domain_topology_level x86_topology[] = {
 #ifdef CONFIG_SCHED_SMT
 	{ cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
 #endif
+#ifdef CONFIG_SCHED_CLUSTER
+	{ cpu_clustergroup_mask, x86_cluster_flags, SD_INIT_NAME(CLS) },
+#endif
 #ifdef CONFIG_SCHED_MC
 	{ cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
 #endif
@@ -584,6 +613,7 @@ void set_cpu_sibling_map(int cpu)
 	if (!has_mp) {
 		cpumask_set_cpu(cpu, topology_sibling_cpumask(cpu));
 		cpumask_set_cpu(cpu, cpu_llc_shared_mask(cpu));
+		cpumask_set_cpu(cpu, cpu_l2c_shared_mask(cpu));
 		cpumask_set_cpu(cpu, topology_core_cpumask(cpu));
 		cpumask_set_cpu(cpu, topology_die_cpumask(cpu));
 		c->booted_cores = 1;
@@ -602,6 +632,9 @@ void set_cpu_sibling_map(int cpu)
 		if ((i == cpu) || (has_mp && match_llc(c, o)))
 			link_mask(cpu_llc_shared_mask, cpu, i);
 
+		if ((i == cpu) || (has_mp && match_l2c(c, o)))
+			link_mask(cpu_l2c_shared_mask, cpu, i);
+
 		if ((i == cpu) || (has_mp && match_die(c, o)))
 			link_mask(topology_die_cpumask, cpu, i);
 	}
@@ -652,6 +685,11 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
 	return cpu_llc_shared_mask(cpu);
 }
 
+const struct cpumask *cpu_clustergroup_mask(int cpu)
+{
+	return cpu_l2c_shared_mask(cpu);
+}
+
 static void impress_friends(void)
 {
 	int cpu;
@@ -1335,6 +1373,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 		zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
 		zalloc_cpumask_var(&per_cpu(cpu_die_map, i), GFP_KERNEL);
 		zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
+		zalloc_cpumask_var(&per_cpu(cpu_l2c_shared_map, i), GFP_KERNEL);
 	}
 
 	/*
@@ -1564,7 +1603,10 @@ static void remove_siblinginfo(int cpu)
 
 	for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
 		cpumask_clear_cpu(cpu, cpu_llc_shared_mask(sibling));
+	for_each_cpu(sibling, cpu_l2c_shared_mask(cpu))
+		cpumask_clear_cpu(cpu, cpu_l2c_shared_mask(sibling));
 	cpumask_clear(cpu_llc_shared_mask(cpu));
+	cpumask_clear(cpu_l2c_shared_mask(cpu));
 	cpumask_clear(topology_sibling_cpumask(cpu));
 	cpumask_clear(topology_core_cpumask(cpu));
 	cpumask_clear(topology_die_cpumask(cpu));
|
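The decision order in `match_l2c()` (fall back when the L2 id was never set, otherwise compare ids) can be sketched outside the kernel as below. `BAD_ID`, `struct cpu_desc`, `same_core()` and `same_cluster()` are hypothetical names; `same_core()` only stands in for `match_smt()`, and the final `topology_sane()` check is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define BAD_ID 0xFFFFu	/* assumed sentinel, standing in for BAD_APICID */

struct cpu_desc {
	uint16_t l2c_id;	/* BAD_ID when the arch did not set it up */
	uint16_t core_id;
};

/* Hypothetical stand-in for match_smt(): same core means SMT siblings. */
static bool same_core(const struct cpu_desc *a, const struct cpu_desc *b)
{
	return a->core_id == b->core_id;
}

static bool same_cluster(const struct cpu_desc *a, const struct cpu_desc *b)
{
	/* No L2 id available: fall back to the SMT-level comparison. */
	if (a->l2c_id == BAD_ID)
		return same_core(a, b);

	/* Otherwise two CPUs share a cluster only if their L2 ids match. */
	return a->l2c_id == b->l2c_id;
}
```

The fallback means that on hardware without an L2 id the cluster level collapses onto the SMT level instead of producing empty masks.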
@@ -215,7 +215,7 @@ struct mm_struct;
 /* Free all resources held by a thread. */
 #define release_thread(thread) do { } while(0)
 
-extern unsigned long get_wchan(struct task_struct *p);
+extern unsigned long __get_wchan(struct task_struct *p);
 
 #define KSTK_EIP(tsk) (task_pt_regs(tsk)->pc)
 #define KSTK_ESP(tsk) (task_pt_regs(tsk)->areg[1])
|
@@ -298,15 +298,12 @@ int copy_thread(unsigned long clone_flags, unsigned long usp_thread_fn,
  * These bracket the sleeping functions..
  */
 
-unsigned long get_wchan(struct task_struct *p)
+unsigned long __get_wchan(struct task_struct *p)
 {
 	unsigned long sp, pc;
 	unsigned long stack_page = (unsigned long) task_stack_page(p);
 	int count = 0;
 
-	if (!p || p == current || task_is_running(p))
-		return 0;
-
 	sp = p->thread.sp;
 	pc = MAKE_PC_FROM_RA(p->thread.ra, p->thread.sp);
 
|
@@ -746,6 +746,73 @@ int find_acpi_cpu_topology_package(unsigned int cpu)
 					  ACPI_PPTT_PHYSICAL_PACKAGE);
 }
 
+/**
+ * find_acpi_cpu_topology_cluster() - Determine a unique CPU cluster value
+ * @cpu: Kernel logical CPU number
+ *
+ * Determine a topology unique cluster ID for the given CPU/thread.
+ * This ID can then be used to group peers, which will have matching ids.
+ *
+ * The cluster, if present is the level of topology above CPUs. In a
+ * multi-thread CPU, it will be the level above the CPU, not the thread.
+ * It may not exist in single CPU systems. In simple multi-CPU systems,
+ * it may be equal to the package topology level.
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, the CPU cannot be found
+ * or there is no toplogy level above the CPU..
+ * Otherwise returns a value which represents the package for this CPU.
+ */
+
+int find_acpi_cpu_topology_cluster(unsigned int cpu)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+	struct acpi_pptt_processor *cpu_node, *cluster_node;
+	u32 acpi_cpu_id;
+	int retval;
+	int is_thread;
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		acpi_pptt_warn_missing();
+		return -ENOENT;
+	}
+
+	acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+	if (cpu_node == NULL || !cpu_node->parent) {
+		retval = -ENOENT;
+		goto put_table;
+	}
+
+	is_thread = cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD;
+	cluster_node = fetch_pptt_node(table, cpu_node->parent);
+	if (cluster_node == NULL) {
+		retval = -ENOENT;
+		goto put_table;
+	}
+	if (is_thread) {
+		if (!cluster_node->parent) {
+			retval = -ENOENT;
+			goto put_table;
+		}
+		cluster_node = fetch_pptt_node(table, cluster_node->parent);
+		if (cluster_node == NULL) {
+			retval = -ENOENT;
+			goto put_table;
+		}
+	}
+	if (cluster_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID)
+		retval = cluster_node->acpi_processor_id;
+	else
+		retval = ACPI_PTR_DIFF(cluster_node, table);
+
+put_table:
+	acpi_put_table(table);
+
+	return retval;
+}
+
 /**
  * find_acpi_cpu_topology_hetero_id() - Get a core architecture tag
  * @cpu: Kernel logical CPU number
|
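`find_acpi_cpu_topology_cluster()` climbs one level from a core, or two levels from a hardware thread, and then picks an id for the node it lands on. The sketch below replays that walk on a plain array instead of a PPTT table. `struct topo_node`, `cluster_of()`, `NO_PARENT` and `ERR_NOENT` are hypothetical, and the node's array index stands in for the table offset the kernel uses when the processor id is not valid.

```c
#include <stdbool.h>

#define NO_PARENT (-1)
#define ERR_NOENT (-2)	/* hypothetical error value standing in for -ENOENT */

struct topo_node {
	int parent;	/* index of the parent node, or NO_PARENT */
	bool is_thread;	/* node describes a hardware thread */
	bool id_valid;	/* node carries a usable id */
	int id;
};

/* Assumes every parent index stored in nodes[] is in range. */
static int cluster_of(const struct topo_node *nodes, int n, int cpu)
{
	int c;

	if (cpu < 0 || cpu >= n || nodes[cpu].parent == NO_PARENT)
		return ERR_NOENT;

	/* One level up from the CPU node. */
	c = nodes[cpu].parent;

	/* A thread's parent is the core, so climb once more. */
	if (nodes[cpu].is_thread) {
		if (nodes[c].parent == NO_PARENT)
			return ERR_NOENT;
		c = nodes[c].parent;
	}

	/* Prefer the explicit id; otherwise fall back to the node's position. */
	return nodes[c].id_valid ? nodes[c].id : c;
}
```

For a four-node chain package (0), cluster (1, id 42), core (2), thread (3), both the core and the thread resolve to 42, which is what lets peers in one cluster be grouped by matching ids.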
@@ -600,6 +600,11 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
 	return core_mask;
 }
 
+const struct cpumask *cpu_clustergroup_mask(int cpu)
+{
+	return &cpu_topology[cpu].cluster_sibling;
+}
+
 void update_siblings_masks(unsigned int cpuid)
 {
 	struct cpu_topology *cpu_topo, *cpuid_topo = &cpu_topology[cpuid];
@@ -617,6 +622,12 @@ void update_siblings_masks(unsigned int cpuid)
 		if (cpuid_topo->package_id != cpu_topo->package_id)
 			continue;
 
+		if (cpuid_topo->cluster_id == cpu_topo->cluster_id &&
+		    cpuid_topo->cluster_id != -1) {
+			cpumask_set_cpu(cpu, &cpuid_topo->cluster_sibling);
+			cpumask_set_cpu(cpuid, &cpu_topo->cluster_sibling);
+		}
+
 		cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
 		cpumask_set_cpu(cpu, &cpuid_topo->core_sibling);
 
@@ -635,6 +646,9 @@ static void clear_cpu_topology(int cpu)
 	cpumask_clear(&cpu_topo->llc_sibling);
 	cpumask_set_cpu(cpu, &cpu_topo->llc_sibling);
 
+	cpumask_clear(&cpu_topo->cluster_sibling);
+	cpumask_set_cpu(cpu, &cpu_topo->cluster_sibling);
+
 	cpumask_clear(&cpu_topo->core_sibling);
 	cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
 	cpumask_clear(&cpu_topo->thread_sibling);
@@ -650,6 +664,7 @@ void __init reset_cpu_topology(void)
 
 		cpu_topo->thread_id = -1;
 		cpu_topo->core_id = -1;
+		cpu_topo->cluster_id = -1;
 		cpu_topo->package_id = -1;
 		cpu_topo->llc_id = -1;
 
|
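The cluster part of `update_siblings_masks()` links two CPUs only when they share a package and a known cluster id. A rough model with 64-bit masks in place of `cpumask_t` is sketched here. `struct cpu_topo`, `topo_reset_masks()` and `topo_link_cluster()` are hypothetical names, and the sketch handles at most 64 CPUs.

```c
#include <stdint.h>

struct cpu_topo {
	int package_id;
	int cluster_id;			/* -1 means "unknown", as in the hunk */
	uint64_t cluster_sibling;	/* one bit per CPU, at most 64 CPUs */
};

/* Start with each CPU as its own only cluster sibling. */
static void topo_reset_masks(struct cpu_topo *t, int n)
{
	for (int cpu = 0; cpu < n; cpu++)
		t[cpu].cluster_sibling = UINT64_C(1) << cpu;
}

/* Link cpuid with every CPU in the same package and the same known cluster. */
static void topo_link_cluster(struct cpu_topo *t, int n, int cpuid)
{
	for (int cpu = 0; cpu < n; cpu++) {
		if (t[cpuid].package_id != t[cpu].package_id)
			continue;

		if (t[cpuid].cluster_id == t[cpu].cluster_id &&
		    t[cpuid].cluster_id != -1) {
			t[cpuid].cluster_sibling |= UINT64_C(1) << cpu;
			t[cpu].cluster_sibling |= UINT64_C(1) << cpuid;
		}
	}
}
```

A CPU whose cluster id is still -1 keeps a mask containing only itself, which matches the state `clear_cpu_topology()` sets up in the hunk.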
@@ -48,6 +48,9 @@ static DEVICE_ATTR_RO(physical_package_id);
 define_id_show_func(die_id);
 static DEVICE_ATTR_RO(die_id);
 
+define_id_show_func(cluster_id);
+static DEVICE_ATTR_RO(cluster_id);
+
 define_id_show_func(core_id);
 static DEVICE_ATTR_RO(core_id);
 
@@ -63,6 +66,10 @@ define_siblings_read_func(core_siblings, core_cpumask);
 static BIN_ATTR_RO(core_siblings, 0);
 static BIN_ATTR_RO(core_siblings_list, 0);
 
+define_siblings_read_func(cluster_cpus, cluster_cpumask);
+static BIN_ATTR_RO(cluster_cpus, 0);
+static BIN_ATTR_RO(cluster_cpus_list, 0);
+
 define_siblings_read_func(die_cpus, die_cpumask);
 static BIN_ATTR_RO(die_cpus, 0);
 static BIN_ATTR_RO(die_cpus_list, 0);
@@ -94,6 +101,8 @@ static struct bin_attribute *bin_attrs[] = {
 	&bin_attr_thread_siblings_list,
 	&bin_attr_core_siblings,
 	&bin_attr_core_siblings_list,
+	&bin_attr_cluster_cpus,
+	&bin_attr_cluster_cpus_list,
 	&bin_attr_die_cpus,
 	&bin_attr_die_cpus_list,
 	&bin_attr_package_cpus,
@@ -112,6 +121,7 @@ static struct bin_attribute *bin_attrs[] = {
 static struct attribute *default_attrs[] = {
 	&dev_attr_physical_package_id.attr,
 	&dev_attr_die_id.attr,
+	&dev_attr_cluster_id.attr,
 	&dev_attr_core_id.attr,
 #ifdef CONFIG_SCHED_BOOK
 	&dev_attr_book_id.attr,
|
@@ -541,7 +541,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	}
 
 	if (permitted && (!whole || num_threads < 2))
-		wchan = get_wchan(task);
+		wchan = !task_is_running(task);
 	if (!whole) {
 		min_flt = task->min_flt;
 		maj_flt = task->maj_flt;
@@ -606,10 +606,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	 *
 	 * This works with older implementations of procps as well.
 	 */
-	if (wchan)
-		seq_puts(m, " 1");
-	else
-		seq_puts(m, " 0");
+	seq_put_decimal_ull(m, " ", wchan);
 
 	seq_put_decimal_ull(m, " ", 0);
 	seq_put_decimal_ull(m, " ", 0);
|
@@ -67,6 +67,7 @@
 #include <linux/mm.h>
 #include <linux/swap.h>
 #include <linux/rcupdate.h>
+#include <linux/kallsyms.h>
 #include <linux/stacktrace.h>
 #include <linux/resource.h>
 #include <linux/module.h>
@@ -386,17 +387,19 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
 			  struct pid *pid, struct task_struct *task)
 {
 	unsigned long wchan;
+	char symname[KSYM_NAME_LEN];
 
-	if (ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
-		wchan = get_wchan(task);
-	else
-		wchan = 0;
+	if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
+		goto print0;
 
-	if (wchan)
-		seq_printf(m, "%ps", (void *) wchan);
-	else
-		seq_putc(m, '0');
+	wchan = get_wchan(task);
+	if (wchan && !lookup_symbol_name(wchan, symname)) {
+		seq_puts(m, symname);
+		return 0;
+	}
 
+print0:
+	seq_putc(m, '0');
 	return 0;
 }
 #endif /* CONFIG_KALLSYMS */
|
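The reworked `proc_pid_wchan()` prints a symbol name only when the lookup succeeds and prints `0` in every other case, so a raw address never reaches the output. A user-space sketch of that rule follows. `struct ksym`, `ksym_lookup()` and `wchan_text()` are hypothetical, and the exact-match table is a simplification: a real symbol table also resolves addresses that fall inside a function.

```c
#include <stddef.h>

struct ksym {
	unsigned long addr;
	const char *name;
};

/* Hypothetical exact-match lookup standing in for lookup_symbol_name(). */
static const char *ksym_lookup(const struct ksym *tab, size_t n,
			       unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].addr == addr)
			return tab[i].name;
	return NULL;
}

/* Return the text to print: a symbol name, or "0" when nothing may be shown. */
static const char *wchan_text(const struct ksym *tab, size_t n,
			      int may_access, unsigned long wchan)
{
	const char *name;

	if (!may_access)
		return "0";

	if (wchan) {
		name = ksym_lookup(tab, n, wchan);
		if (name)
			return name;
	}

	/* Unresolvable or zero address: never fall back to the raw value. */
	return "0";
}
```

The single fall-through to `"0"` is the point of the change described in the pull message: the reverted `%ps`-based version could print the raw value when the symbol was not resolvable.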
@@ -24,7 +24,7 @@
 
 #ifdef arch_idle_time
 
-static u64 get_idle_time(struct kernel_cpustat *kcs, int cpu)
+u64 get_idle_time(struct kernel_cpustat *kcs, int cpu)
 {
 	u64 idle;
 
@@ -46,7 +46,7 @@ static u64 get_iowait_time(struct kernel_cpustat *kcs, int cpu)
 
 #else
 
-static u64 get_idle_time(struct kernel_cpustat *kcs, int cpu)
+u64 get_idle_time(struct kernel_cpustat *kcs, int cpu)
 {
 	u64 idle, idle_usecs = -1ULL;
 
|
@@ -12,18 +12,22 @@ static int uptime_proc_show(struct seq_file *m, void *v)
 {
 	struct timespec64 uptime;
 	struct timespec64 idle;
-	u64 nsec;
+	u64 idle_nsec;
 	u32 rem;
 	int i;
 
-	nsec = 0;
-	for_each_possible_cpu(i)
-		nsec += (__force u64) kcpustat_cpu(i).cpustat[CPUTIME_IDLE];
+	idle_nsec = 0;
+	for_each_possible_cpu(i) {
+		struct kernel_cpustat kcs;
+
+		kcpustat_cpu_fetch(&kcs, i);
+		idle_nsec += get_idle_time(&kcs, i);
+	}
 
 	ktime_get_boottime_ts64(&uptime);
 	timens_add_boottime(&uptime);
 
-	idle.tv_sec = div_u64_rem(nsec, NSEC_PER_SEC, &rem);
+	idle.tv_sec = div_u64_rem(idle_nsec, NSEC_PER_SEC, &rem);
 	idle.tv_nsec = rem;
 	seq_printf(m, "%lu.%02lu %lu.%02lu\n",
 			(unsigned long) uptime.tv_sec,
|
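The uptime hunk sums per-CPU idle time in nanoseconds and then splits the total into seconds and a nanosecond remainder. The arithmetic can be sketched in user space as below. `sum_idle_nsec()`, `split_nsec()` and `struct sec_nsec` are hypothetical names, and the input array stands in for whatever `get_idle_time()` returns per CPU.

```c
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC UINT64_C(1000000000)

struct sec_nsec {
	uint64_t sec;
	uint32_t nsec;
};

/* Add up the idle time of all CPUs, in nanoseconds. */
static uint64_t sum_idle_nsec(const uint64_t *per_cpu_idle, size_t ncpus)
{
	uint64_t total = 0;

	for (size_t i = 0; i < ncpus; i++)
		total += per_cpu_idle[i];
	return total;
}

/* Quotient and remainder in one step, the job div_u64_rem() does in the hunk. */
static struct sec_nsec split_nsec(uint64_t total_nsec)
{
	struct sec_nsec r;

	r.sec = total_nsec / NSEC_PER_SEC;
	r.nsec = (uint32_t)(total_nsec % NSEC_PER_SEC);
	return r;
}
```

Two CPUs with 1.5 s and 1.0 s of idle time sum to 2500000000 ns, which splits into 2 s and 500000000 ns. The remainder is always below one second, so it fits in 32 bits.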
@@ -1353,6 +1353,7 @@ static inline int lpit_read_residency_count_address(u64 *address)
 #ifdef CONFIG_ACPI_PPTT
 int acpi_pptt_cpu_is_thread(unsigned int cpu);
 int find_acpi_cpu_topology(unsigned int cpu, int level);
+int find_acpi_cpu_topology_cluster(unsigned int cpu);
 int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
 int find_acpi_cpu_cache_topology(unsigned int cpu, int level);
@@ -1365,6 +1366,10 @@ static inline int find_acpi_cpu_topology(unsigned int cpu, int level)
 {
 	return -EINVAL;
 }
+static inline int find_acpi_cpu_topology_cluster(unsigned int cpu)
+{
+	return -EINVAL;
+}
 static inline int find_acpi_cpu_topology_package(unsigned int cpu)
 {
 	return -EINVAL;
|
@@ -62,10 +62,12 @@ void topology_set_thermal_pressure(const struct cpumask *cpus,
 struct cpu_topology {
 	int thread_id;
 	int core_id;
+	int cluster_id;
 	int package_id;
 	int llc_id;
 	cpumask_t thread_sibling;
 	cpumask_t core_sibling;
+	cpumask_t cluster_sibling;
 	cpumask_t llc_sibling;
 };
 
@@ -73,13 +75,16 @@ struct cpu_topology {
 extern struct cpu_topology cpu_topology[NR_CPUS];
 
 #define topology_physical_package_id(cpu) (cpu_topology[cpu].package_id)
+#define topology_cluster_id(cpu) (cpu_topology[cpu].cluster_id)
 #define topology_core_id(cpu) (cpu_topology[cpu].core_id)
 #define topology_core_cpumask(cpu) (&cpu_topology[cpu].core_sibling)
 #define topology_sibling_cpumask(cpu) (&cpu_topology[cpu].thread_sibling)
+#define topology_cluster_cpumask(cpu) (&cpu_topology[cpu].cluster_sibling)
 #define topology_llc_cpumask(cpu) (&cpu_topology[cpu].llc_sibling)
 void init_cpu_topology(void);
 void store_cpu_topology(unsigned int cpuid);
 const struct cpumask *cpu_coregroup_mask(int cpu);
+const struct cpumask *cpu_clustergroup_mask(int cpu);
 void update_siblings_masks(unsigned int cpu);
 void remove_cpu_topology(unsigned int cpuid);
 void reset_cpu_topology(void);
|
@@ -3,6 +3,7 @@
 #define _LINUX_IRQ_WORK_H
 
 #include <linux/smp_types.h>
+#include <linux/rcuwait.h>
 
 /*
  * An entry can be in one of four states:
@@ -16,11 +17,13 @@
 struct irq_work {
 	struct __call_single_node node;
 	void (*func)(struct irq_work *);
+	struct rcuwait irqwait;
 };
 
 #define __IRQ_WORK_INIT(_func, _flags) (struct irq_work){ \
 	.node = { .u_flags = (_flags), }, \
 	.func = (_func), \
+	.irqwait = __RCUWAIT_INITIALIZER(irqwait), \
 }
 
 #define IRQ_WORK_INIT(_func) __IRQ_WORK_INIT(_func, 0)
@@ -46,6 +49,11 @@ static inline bool irq_work_is_busy(struct irq_work *work)
 	return atomic_read(&work->node.a_flags) & IRQ_WORK_BUSY;
 }
 
+static inline bool irq_work_is_hard(struct irq_work *work)
+{
+	return atomic_read(&work->node.a_flags) & IRQ_WORK_HARD_IRQ;
+}
+
 bool irq_work_queue(struct irq_work *work);
 bool irq_work_queue_on(struct irq_work *work, int cpu);
 
|
@@ -102,6 +102,7 @@ extern void account_system_index_time(struct task_struct *, u64,
 				      enum cpu_usage_stat);
 extern void account_steal_time(u64);
 extern void account_idle_time(u64);
+extern u64 get_idle_time(struct kernel_cpustat *kcs, int cpu);
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 static inline void account_process_tick(struct task_struct *tsk, int user)
|
@@ -12,6 +12,7 @@
 #include <linux/completion.h>
 #include <linux/cpumask.h>
 #include <linux/uprobes.h>
+#include <linux/rcupdate.h>
 #include <linux/page-flags-layout.h>
 #include <linux/workqueue.h>
 #include <linux/seqlock.h>
@@ -649,6 +650,9 @@ struct mm_struct {
 		bool tlb_flush_batched;
 #endif
 		struct uprobes_state uprobes_state;
+#ifdef CONFIG_PREEMPT_RT
+		struct rcu_head delayed_drop;
+#endif
 #ifdef CONFIG_HUGETLB_PAGE
 		atomic_long_t hugetlb_usage;
 #endif
|
@@ -503,6 +503,8 @@ struct sched_statistics {
 
 	u64 block_start;
 	u64 block_max;
+	s64 sum_block_runtime;
+
 	u64 exec_max;
 	u64 slice_max;
 
@@ -522,7 +524,7 @@ struct sched_statistics {
 	u64 nr_wakeups_passive;
 	u64 nr_wakeups_idle;
 #endif
-};
+} ____cacheline_aligned;
 
 struct sched_entity {
 	/* For load-balancing: */
@@ -538,8 +540,6 @@ struct sched_entity {
 
 	u64 nr_migrations;
 
-	struct sched_statistics statistics;
-
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	int depth;
 	struct sched_entity *parent;
@@ -775,10 +775,10 @@ struct task_struct {
 	int normal_prio;
 	unsigned int rt_priority;
 
-	const struct sched_class *sched_class;
 	struct sched_entity se;
 	struct sched_rt_entity rt;
 	struct sched_dl_entity dl;
+	const struct sched_class *sched_class;
 
 #ifdef CONFIG_SCHED_CORE
 	struct rb_node core_node;
@@ -803,6 +803,8 @@ struct task_struct {
 	struct uclamp_se uclamp[UCLAMP_CNT];
 #endif
 
+	struct sched_statistics stats;
+
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 	/* List of struct preempt_notifier: */
 	struct hlist_head preempt_notifiers;
@@ -2154,6 +2156,7 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
 #endif /* CONFIG_SMP */
 
 extern bool sched_task_on_rq(struct task_struct *p);
+extern unsigned long get_wchan(struct task_struct *p);
 
 /*
  * In order to reduce various lock holder preemption latencies provide an
|
@@ -11,7 +11,11 @@ enum cpu_idle_type {
 	CPU_MAX_IDLE_TYPES
 };
 
+#ifdef CONFIG_SMP
 extern void wake_up_if_idle(int cpu);
+#else
+static inline void wake_up_if_idle(int cpu) { }
+#endif
 
 /*
  * Idle thread specific functions to determine the need_resched
|
@ -49,6 +49,35 @@ static inline void mmdrop(struct mm_struct *mm)
|
||||
__mmdrop(mm);
|
||||
}
|
||||
|
||||
#ifdef CONFIG_PREEMPT_RT
/*
 * RCU callback for delayed mm drop. Not strictly RCU, but call_rcu() is
 * by far the least expensive way to do that.
 */
static inline void __mmdrop_delayed(struct rcu_head *rhp)
{
	struct mm_struct *mm = container_of(rhp, struct mm_struct, delayed_drop);

	__mmdrop(mm);
}

/*
 * Invoked from finish_task_switch(). Delegates the heavy lifting on RT
 * kernels via RCU.
 */
static inline void mmdrop_sched(struct mm_struct *mm)
{
	/* Provides a full memory barrier. See mmdrop() */
	if (atomic_dec_and_test(&mm->mm_count))
		call_rcu(&mm->delayed_drop, __mmdrop_delayed);
}
#else
static inline void mmdrop_sched(struct mm_struct *mm)
{
	mmdrop(mm);
}
#endif
|
||||
|
||||
/**
|
||||
* mmget() - Pin the address space associated with a &struct mm_struct.
|
||||
* @mm: The address space to pin.
|
||||
|
@ -54,7 +54,8 @@ extern asmlinkage void schedule_tail(struct task_struct *prev);
|
||||
extern void init_idle(struct task_struct *idle, int cpu);
|
||||
|
||||
extern int sched_fork(unsigned long clone_flags, struct task_struct *p);
|
||||
extern void sched_post_fork(struct task_struct *p);
|
||||
extern void sched_post_fork(struct task_struct *p,
|
||||
struct kernel_clone_args *kargs);
|
||||
extern void sched_dead(struct task_struct *p);
|
||||
|
||||
void __noreturn do_task_dead(void);
|
||||
|
@ -42,6 +42,13 @@ static inline int cpu_smt_flags(void)
|
||||
}
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_SCHED_CLUSTER
|
||||
static inline int cpu_cluster_flags(void)
|
||||
{
|
||||
return SD_SHARE_PKG_RESOURCES;
|
||||
}
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_SCHED_MC
|
||||
static inline int cpu_core_flags(void)
|
||||
{
|
||||
@ -98,7 +105,7 @@ struct sched_domain {
|
||||
|
||||
/* idle_balance() stats */
|
||||
u64 max_newidle_lb_cost;
|
||||
unsigned long next_decay_max_lb_cost;
|
||||
unsigned long last_decay_max_lb_cost;
|
||||
|
||||
u64 avg_scan_cost; /* select_idle_sibling */
|
||||
|
||||
|
@ -186,6 +186,9 @@ static inline int cpu_to_mem(int cpu)
|
||||
#ifndef topology_die_id
|
||||
#define topology_die_id(cpu) ((void)(cpu), -1)
|
||||
#endif
|
||||
#ifndef topology_cluster_id
|
||||
#define topology_cluster_id(cpu) ((void)(cpu), -1)
|
||||
#endif
|
||||
#ifndef topology_core_id
|
||||
#define topology_core_id(cpu) ((void)(cpu), 0)
|
||||
#endif
|
||||
@ -195,6 +198,9 @@ static inline int cpu_to_mem(int cpu)
|
||||
#ifndef topology_core_cpumask
|
||||
#define topology_core_cpumask(cpu) cpumask_of(cpu)
|
||||
#endif
|
||||
#ifndef topology_cluster_cpumask
|
||||
#define topology_cluster_cpumask(cpu) cpumask_of(cpu)
|
||||
#endif
|
||||
#ifndef topology_die_cpumask
|
||||
#define topology_die_cpumask(cpu) cpumask_of(cpu)
|
||||
#endif
|
||||
@ -206,6 +212,13 @@ static inline const struct cpumask *cpu_smt_mask(int cpu)
|
||||
}
|
||||
#endif
|
||||
|
||||
#if defined(CONFIG_SCHED_CLUSTER) && !defined(cpu_cluster_mask)
|
||||
static inline const struct cpumask *cpu_cluster_mask(int cpu)
|
||||
{
|
||||
return topology_cluster_cpumask(cpu);
|
||||
}
|
||||
#endif
|
||||
|
||||
static inline const struct cpumask *cpu_cpu_mask(int cpu)
|
||||
{
|
||||
return cpumask_of_node(cpu_to_node(cpu));
|
||||
|
@ -1160,6 +1160,7 @@ int autoremove_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, i
|
||||
(wait)->flags = 0; \
|
||||
} while (0)
|
||||
|
||||
bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg);
|
||||
typedef int (*task_call_f)(struct task_struct *p, void *arg);
|
||||
extern int task_call_func(struct task_struct *p, task_call_f func, void *arg);
|
||||
|
||||
#endif /* _LINUX_WAIT_H */
|
||||
|
@ -2,10 +2,11 @@
|
||||
|
||||
choice
|
||||
prompt "Preemption Model"
|
||||
default PREEMPT_NONE
|
||||
default PREEMPT_NONE_BEHAVIOUR
|
||||
|
||||
config PREEMPT_NONE
|
||||
config PREEMPT_NONE_BEHAVIOUR
|
||||
bool "No Forced Preemption (Server)"
|
||||
select PREEMPT_NONE if !PREEMPT_DYNAMIC
|
||||
help
|
||||
This is the traditional Linux preemption model, geared towards
|
||||
throughput. It will still provide good latencies most of the
|
||||
@ -17,9 +18,10 @@ config PREEMPT_NONE
|
||||
raw processing power of the kernel, irrespective of scheduling
|
||||
latencies.
|
||||
|
||||
config PREEMPT_VOLUNTARY
|
||||
config PREEMPT_VOLUNTARY_BEHAVIOUR
|
||||
bool "Voluntary Kernel Preemption (Desktop)"
|
||||
depends on !ARCH_NO_PREEMPT
|
||||
select PREEMPT_VOLUNTARY if !PREEMPT_DYNAMIC
|
||||
help
|
||||
This option reduces the latency of the kernel by adding more
|
||||
"explicit preemption points" to the kernel code. These new
|
||||
@ -35,12 +37,10 @@ config PREEMPT_VOLUNTARY
|
||||
|
||||
Select this if you are building a kernel for a desktop system.
|
||||
|
||||
config PREEMPT
|
||||
config PREEMPT_BEHAVIOUR
|
||||
bool "Preemptible Kernel (Low-Latency Desktop)"
|
||||
depends on !ARCH_NO_PREEMPT
|
||||
select PREEMPTION
|
||||
select UNINLINE_SPIN_UNLOCK if !ARCH_INLINE_SPIN_UNLOCK
|
||||
select PREEMPT_DYNAMIC if HAVE_PREEMPT_DYNAMIC
|
||||
select PREEMPT
|
||||
help
|
||||
This option reduces the latency of the kernel by making
|
||||
all kernel code (that is not executing in a critical section)
|
||||
@ -58,7 +58,7 @@ config PREEMPT
|
||||
|
||||
config PREEMPT_RT
|
||||
bool "Fully Preemptible Kernel (Real-Time)"
|
||||
depends on EXPERT && ARCH_SUPPORTS_RT
|
||||
depends on EXPERT && ARCH_SUPPORTS_RT && !PREEMPT_DYNAMIC
|
||||
select PREEMPTION
|
||||
help
|
||||
This option turns the kernel into a real-time kernel by replacing
|
||||
@ -75,6 +75,17 @@ config PREEMPT_RT
|
||||
|
||||
endchoice
|
||||
|
||||
config PREEMPT_NONE
|
||||
bool
|
||||
|
||||
config PREEMPT_VOLUNTARY
|
||||
bool
|
||||
|
||||
config PREEMPT
|
||||
bool
|
||||
select PREEMPTION
|
||||
select UNINLINE_SPIN_UNLOCK if !ARCH_INLINE_SPIN_UNLOCK
|
||||
|
||||
config PREEMPT_COUNT
|
||||
bool
|
||||
|
||||
@ -83,7 +94,10 @@ config PREEMPTION
|
||||
select PREEMPT_COUNT
|
||||
|
||||
config PREEMPT_DYNAMIC
|
||||
bool
|
||||
bool "Preemption behaviour defined on boot"
|
||||
depends on HAVE_PREEMPT_DYNAMIC
|
||||
select PREEMPT
|
||||
default y
|
||||
help
|
||||
This option allows the preemption model to be defined via a kernel
command line parameter, thereby overriding the default preemption
|
||||
|
@ -63,6 +63,7 @@
|
||||
#include <linux/rcuwait.h>
|
||||
#include <linux/compat.h>
|
||||
#include <linux/io_uring.h>
|
||||
#include <linux/kprobes.h>
|
||||
|
||||
#include <linux/uaccess.h>
|
||||
#include <asm/unistd.h>
|
||||
@ -167,6 +168,7 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
|
||||
{
|
||||
struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);
|
||||
|
||||
kprobe_flush_task(tsk);
|
||||
perf_event_delayed_put(tsk);
|
||||
trace_sched_process_free(tsk);
|
||||
put_task_struct(tsk);
|
||||
|
@ -2404,7 +2404,7 @@ static __latent_entropy struct task_struct *copy_process(
|
||||
write_unlock_irq(&tasklist_lock);
|
||||
|
||||
proc_fork_connector(p);
|
||||
sched_post_fork(p);
|
||||
sched_post_fork(p, args);
|
||||
cgroup_post_fork(p, args);
|
||||
perf_event_fork(p);
|
||||
|
||||
|
@ -18,11 +18,36 @@
|
||||
#include <linux/cpu.h>
|
||||
#include <linux/notifier.h>
|
||||
#include <linux/smp.h>
|
||||
#include <linux/smpboot.h>
|
||||
#include <asm/processor.h>
|
||||
#include <linux/kasan.h>
|
||||
|
||||
static DEFINE_PER_CPU(struct llist_head, raised_list);
|
||||
static DEFINE_PER_CPU(struct llist_head, lazy_list);
|
||||
static DEFINE_PER_CPU(struct task_struct *, irq_workd);
|
||||
|
||||
static void wake_irq_workd(void)
|
||||
{
|
||||
struct task_struct *tsk = __this_cpu_read(irq_workd);
|
||||
|
||||
if (!llist_empty(this_cpu_ptr(&lazy_list)) && tsk)
|
||||
wake_up_process(tsk);
|
||||
}
|
||||
|
||||
#ifdef CONFIG_SMP
|
||||
static void irq_work_wake(struct irq_work *entry)
|
||||
{
|
||||
wake_irq_workd();
|
||||
}
|
||||
|
||||
static DEFINE_PER_CPU(struct irq_work, irq_work_wakeup) =
|
||||
IRQ_WORK_INIT_HARD(irq_work_wake);
|
||||
#endif
|
||||
|
||||
static int irq_workd_should_run(unsigned int cpu)
|
||||
{
|
||||
return !llist_empty(this_cpu_ptr(&lazy_list));
|
||||
}
|
||||
|
||||
/*
|
||||
* Claim the entry so that no one else will poke at it.
|
||||
@ -52,15 +77,29 @@ void __weak arch_irq_work_raise(void)
|
||||
/* Enqueue on current CPU, work must already be claimed and preempt disabled */
|
||||
static void __irq_work_queue_local(struct irq_work *work)
|
||||
{
|
||||
struct llist_head *list;
|
||||
bool rt_lazy_work = false;
|
||||
bool lazy_work = false;
|
||||
int work_flags;
|
||||
|
||||
work_flags = atomic_read(&work->node.a_flags);
|
||||
if (work_flags & IRQ_WORK_LAZY)
|
||||
lazy_work = true;
|
||||
else if (IS_ENABLED(CONFIG_PREEMPT_RT) &&
|
||||
!(work_flags & IRQ_WORK_HARD_IRQ))
|
||||
rt_lazy_work = true;
|
||||
|
||||
if (lazy_work || rt_lazy_work)
|
||||
list = this_cpu_ptr(&lazy_list);
|
||||
else
|
||||
list = this_cpu_ptr(&raised_list);
|
||||
|
||||
if (!llist_add(&work->node.llist, list))
|
||||
return;
|
||||
|
||||
/* If the work is "lazy", handle it from next tick if any */
|
||||
if (atomic_read(&work->node.a_flags) & IRQ_WORK_LAZY) {
|
||||
if (llist_add(&work->node.llist, this_cpu_ptr(&lazy_list)) &&
|
||||
tick_nohz_tick_stopped())
|
||||
arch_irq_work_raise();
|
||||
} else {
|
||||
if (llist_add(&work->node.llist, this_cpu_ptr(&raised_list)))
|
||||
arch_irq_work_raise();
|
||||
}
|
||||
if (!lazy_work || tick_nohz_tick_stopped())
|
||||
arch_irq_work_raise();
|
||||
}
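The queueing decision in __irq_work_queue_local() above can be sketched as a pure function. This is a userspace illustration only, not kernel code: the flag values, `struct decision` and `pick_list()` are hypothetical names introduced here, and only the branching is meant to mirror the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the kernel's real values live in its headers. */
#define WORK_LAZY      0x1u
#define WORK_HARD_IRQ  0x2u

enum target_list { RAISED_LIST, LAZY_LIST };

struct decision {
	enum target_list list;	/* which per-CPU list receives the item */
	bool raise;		/* whether the arch interrupt gets raised */
};

/*
 * Mirrors the branching in the hunk: lazy items, and on PREEMPT_RT all
 * items not marked HARD, go to the lazy list. The interrupt is raised
 * only if the list was empty before the add, and then only when the
 * item is not lazy or the tick is stopped.
 */
static struct decision pick_list(unsigned int flags, bool preempt_rt,
				 bool list_was_empty, bool tick_stopped)
{
	bool lazy_work, rt_lazy_work;
	struct decision d;

	/* This sketch only understands the two flags defined above. */
	assert((flags & ~(WORK_LAZY | WORK_HARD_IRQ)) == 0);

	lazy_work = flags & WORK_LAZY;
	rt_lazy_work = !lazy_work && preempt_rt && !(flags & WORK_HARD_IRQ);

	d.list = (lazy_work || rt_lazy_work) ? LAZY_LIST : RAISED_LIST;
	d.raise = list_was_empty && (!lazy_work || tick_stopped);
	return d;
}
```

Under these assumptions, a non-HARD item on a PREEMPT_RT build lands on the lazy list but still raises the interrupt, which is what lets irq_work_run() wake the per-CPU thread.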
|
||||
|
||||
/* Enqueue the irq work @work on the current CPU */
|
||||
@ -104,17 +143,34 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
|
||||
if (cpu != smp_processor_id()) {
|
||||
/* Arch remote IPI send/receive backend aren't NMI safe */
|
||||
WARN_ON_ONCE(in_nmi());
|
||||
|
||||
/*
|
||||
* On PREEMPT_RT the items which are not marked as
|
||||
* IRQ_WORK_HARD_IRQ are added to the lazy list and a HARD work
|
||||
* item is used on the remote CPU to wake the thread.
|
||||
*/
|
||||
if (IS_ENABLED(CONFIG_PREEMPT_RT) &&
|
||||
!(atomic_read(&work->node.a_flags) & IRQ_WORK_HARD_IRQ)) {
|
||||
|
||||
if (!llist_add(&work->node.llist, &per_cpu(lazy_list, cpu)))
|
||||
goto out;
|
||||
|
||||
work = &per_cpu(irq_work_wakeup, cpu);
|
||||
if (!irq_work_claim(work))
|
||||
goto out;
|
||||
}
|
||||
|
||||
__smp_call_single_queue(cpu, &work->node.llist);
|
||||
} else {
|
||||
__irq_work_queue_local(work);
|
||||
}
|
||||
out:
|
||||
preempt_enable();
|
||||
|
||||
return true;
|
||||
#endif /* CONFIG_SMP */
|
||||
}
|
||||
|
||||
|
||||
bool irq_work_needs_cpu(void)
|
||||
{
|
||||
struct llist_head *raised, *lazy;
|
||||
@ -160,6 +216,10 @@ void irq_work_single(void *arg)
|
||||
* else claimed it meanwhile.
|
||||
*/
|
||||
(void)atomic_cmpxchg(&work->node.a_flags, flags, flags & ~IRQ_WORK_BUSY);
|
||||
|
||||
if ((IS_ENABLED(CONFIG_PREEMPT_RT) && !irq_work_is_hard(work)) ||
|
||||
!arch_irq_work_has_interrupt())
|
||||
rcuwait_wake_up(&work->irqwait);
|
||||
}
|
||||
|
||||
static void irq_work_run_list(struct llist_head *list)
|
||||
@ -167,7 +227,12 @@ static void irq_work_run_list(struct llist_head *list)
|
||||
struct irq_work *work, *tmp;
|
||||
struct llist_node *llnode;
|
||||
|
||||
BUG_ON(!irqs_disabled());
|
||||
/*
|
||||
* On PREEMPT_RT IRQ-work which is not marked as HARD will be processed
|
||||
* in a per-CPU thread in preemptible context. Only the items which are
|
||||
* marked as IRQ_WORK_HARD_IRQ will be processed in hardirq context.
|
||||
*/
|
||||
BUG_ON(!irqs_disabled() && !IS_ENABLED(CONFIG_PREEMPT_RT));
|
||||
|
||||
if (llist_empty(list))
|
||||
return;
|
||||
@ -184,7 +249,10 @@ static void irq_work_run_list(struct llist_head *list)
|
||||
void irq_work_run(void)
|
||||
{
|
||||
irq_work_run_list(this_cpu_ptr(&raised_list));
|
||||
irq_work_run_list(this_cpu_ptr(&lazy_list));
|
||||
if (!IS_ENABLED(CONFIG_PREEMPT_RT))
|
||||
irq_work_run_list(this_cpu_ptr(&lazy_list));
|
||||
else
|
||||
wake_irq_workd();
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(irq_work_run);
|
||||
|
||||
@ -194,7 +262,11 @@ void irq_work_tick(void)
|
||||
|
||||
if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
|
||||
irq_work_run_list(raised);
|
||||
irq_work_run_list(this_cpu_ptr(&lazy_list));
|
||||
|
||||
if (!IS_ENABLED(CONFIG_PREEMPT_RT))
|
||||
irq_work_run_list(this_cpu_ptr(&lazy_list));
|
||||
else
|
||||
wake_irq_workd();
|
||||
}
|
||||
|
||||
/*
|
||||
@ -204,8 +276,42 @@ void irq_work_tick(void)
|
||||
void irq_work_sync(struct irq_work *work)
|
||||
{
|
||||
lockdep_assert_irqs_enabled();
|
||||
might_sleep();
|
||||
|
||||
if ((IS_ENABLED(CONFIG_PREEMPT_RT) && !irq_work_is_hard(work)) ||
|
||||
!arch_irq_work_has_interrupt()) {
|
||||
rcuwait_wait_event(&work->irqwait, !irq_work_is_busy(work),
|
||||
TASK_UNINTERRUPTIBLE);
|
||||
return;
|
||||
}
|
||||
|
||||
while (irq_work_is_busy(work))
|
||||
cpu_relax();
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(irq_work_sync);
|
||||
|
||||
static void run_irq_workd(unsigned int cpu)
{
	irq_work_run_list(this_cpu_ptr(&lazy_list));
}

static void irq_workd_setup(unsigned int cpu)
{
	sched_set_fifo_low(current);
}

static struct smp_hotplug_thread irqwork_threads = {
	.store = &irq_workd,
	.setup = irq_workd_setup,
	.thread_should_run = irq_workd_should_run,
	.thread_fn = run_irq_workd,
	.thread_comm = "irq_work/%u",
};

static __init int irq_work_init_threads(void)
{
	if (IS_ENABLED(CONFIG_PREEMPT_RT))
		BUG_ON(smpboot_register_percpu_thread(&irqwork_threads));
	return 0;
}
early_initcall(irq_work_init_threads);
|
||||
|
@ -1250,10 +1250,10 @@ void kprobe_busy_end(void)
|
||||
}
|
||||
|
||||
/*
|
||||
* This function is called from finish_task_switch when task tk becomes dead,
|
||||
* so that we can recycle any function-return probe instances associated
|
||||
* with this task. These left over instances represent probed functions
|
||||
* that have been called but will never return.
|
||||
* This function is called from delayed_put_task_struct() when a task is
|
||||
* dead and cleaned up to recycle any function-return probe instances
|
||||
* associated with this task. These left over instances represent probed
|
||||
* functions that have been called but will never return.
|
||||
*/
|
||||
void kprobe_flush_task(struct task_struct *tk)
|
||||
{
|
||||
|
@ -270,6 +270,7 @@ EXPORT_SYMBOL_GPL(kthread_parkme);
|
||||
|
||||
static int kthread(void *_create)
|
||||
{
|
||||
static const struct sched_param param = { .sched_priority = 0 };
|
||||
/* Copy data: it's on kthread's stack */
|
||||
struct kthread_create_info *create = _create;
|
||||
int (*threadfn)(void *data) = create->threadfn;
|
||||
@ -300,6 +301,13 @@ static int kthread(void *_create)
|
||||
init_completion(&self->parked);
|
||||
current->vfork_done = &self->exited;
|
||||
|
||||
/*
|
||||
* The new thread inherited kthreadd's priority and CPU mask. Reset
|
||||
* back to default in case they have been changed.
|
||||
*/
|
||||
sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
|
||||
set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_FLAG_KTHREAD));
|
||||
|
||||
/* OK, tell user we're spawned, wait for stop or wakeup */
|
||||
__set_current_state(TASK_UNINTERRUPTIBLE);
|
||||
create->result = current;
|
||||
@ -397,7 +405,6 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
|
||||
}
|
||||
task = create->result;
|
||||
if (!IS_ERR(task)) {
|
||||
static const struct sched_param param = { .sched_priority = 0 };
|
||||
char name[TASK_COMM_LEN];
|
||||
|
||||
/*
|
||||
@ -406,13 +413,6 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
|
||||
*/
|
||||
vsnprintf(name, sizeof(name), namefmt, args);
|
||||
set_task_comm(task, name);
|
||||
/*
|
||||
* root may have changed our (kthreadd's) priority or CPU mask.
|
||||
* The kernel thread should not inherit these properties.
|
||||
*/
|
||||
sched_setscheduler_nocheck(task, SCHED_NORMAL, &param);
|
||||
set_cpus_allowed_ptr(task,
|
||||
housekeeping_cpumask(HK_FLAG_KTHREAD));
|
||||
}
|
||||
kfree(create);
|
||||
return task;
|
||||
|
@ -13,7 +13,6 @@
|
||||
#include "core.h"
|
||||
#include "patch.h"
|
||||
#include "transition.h"
|
||||
#include "../sched/sched.h"
|
||||
|
||||
#define MAX_STACK_ENTRIES 100
|
||||
#define STACK_ERR_BUF_SIZE 128
|
||||
@ -240,7 +239,7 @@ static int klp_check_stack_func(struct klp_func *func, unsigned long *entries,
|
||||
* Determine whether it's safe to transition the task to the target patch state
|
||||
* by looking for any to-be-patched or to-be-unpatched functions on its stack.
|
||||
*/
|
||||
static int klp_check_stack(struct task_struct *task, char *err_buf)
|
||||
static int klp_check_stack(struct task_struct *task, const char **oldname)
|
||||
{
|
||||
static unsigned long entries[MAX_STACK_ENTRIES];
|
||||
struct klp_object *obj;
|
||||
@ -248,12 +247,8 @@ static int klp_check_stack(struct task_struct *task, char *err_buf)
|
||||
int ret, nr_entries;
|
||||
|
||||
ret = stack_trace_save_tsk_reliable(task, entries, ARRAY_SIZE(entries));
|
||||
if (ret < 0) {
|
||||
snprintf(err_buf, STACK_ERR_BUF_SIZE,
|
||||
"%s: %s:%d has an unreliable stack\n",
|
||||
__func__, task->comm, task->pid);
|
||||
return ret;
|
||||
}
|
||||
if (ret < 0)
|
||||
return -EINVAL;
|
||||
nr_entries = ret;
|
||||
|
||||
klp_for_each_object(klp_transition_patch, obj) {
|
||||
@ -262,11 +257,8 @@ static int klp_check_stack(struct task_struct *task, char *err_buf)
|
||||
klp_for_each_func(obj, func) {
|
||||
ret = klp_check_stack_func(func, entries, nr_entries);
|
||||
if (ret) {
|
||||
snprintf(err_buf, STACK_ERR_BUF_SIZE,
|
||||
"%s: %s:%d is sleeping on function %s\n",
|
||||
__func__, task->comm, task->pid,
|
||||
func->old_name);
|
||||
return ret;
|
||||
*oldname = func->old_name;
|
||||
return -EADDRINUSE;
|
||||
}
|
||||
}
|
||||
}
|
||||
@ -274,6 +266,22 @@ static int klp_check_stack(struct task_struct *task, char *err_buf)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int klp_check_and_switch_task(struct task_struct *task, void *arg)
{
	int ret;

	if (task_curr(task) && task != current)
		return -EBUSY;

	ret = klp_check_stack(task, arg);
	if (ret)
		return ret;

	clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
	task->patch_state = klp_target_state;
	return 0;
}
|
||||
|
||||
/*
|
||||
* Try to safely switch a task to the target patch state. If it's currently
|
||||
* running, or it's sleeping on a to-be-patched or to-be-unpatched function, or
|
||||
@ -281,13 +289,8 @@ static int klp_check_stack(struct task_struct *task, char *err_buf)
|
||||
*/
|
||||
static bool klp_try_switch_task(struct task_struct *task)
|
||||
{
|
||||
static char err_buf[STACK_ERR_BUF_SIZE];
|
||||
struct rq *rq;
|
||||
struct rq_flags flags;
|
||||
const char *old_name;
|
||||
int ret;
|
||||
bool success = false;
|
||||
|
||||
err_buf[0] = '\0';
|
||||
|
||||
/* check if this task has already switched over */
|
||||
if (task->patch_state == klp_target_state)
|
||||
@ -305,36 +308,31 @@ static bool klp_try_switch_task(struct task_struct *task)
|
||||
* functions. If all goes well, switch the task to the target patch
|
||||
* state.
|
||||
*/
|
||||
rq = task_rq_lock(task, &flags);
|
||||
ret = task_call_func(task, klp_check_and_switch_task, &old_name);
|
||||
switch (ret) {
|
||||
case 0: /* success */
|
||||
break;
|
||||
|
||||
if (task_running(rq, task) && task != current) {
|
||||
snprintf(err_buf, STACK_ERR_BUF_SIZE,
|
||||
"%s: %s:%d is running\n", __func__, task->comm,
|
||||
task->pid);
|
||||
goto done;
|
||||
case -EBUSY: /* klp_check_and_switch_task() */
|
||||
pr_debug("%s: %s:%d is running\n",
|
||||
__func__, task->comm, task->pid);
|
||||
break;
|
||||
case -EINVAL: /* klp_check_and_switch_task() */
|
||||
pr_debug("%s: %s:%d has an unreliable stack\n",
|
||||
__func__, task->comm, task->pid);
|
||||
break;
|
||||
case -EADDRINUSE: /* klp_check_and_switch_task() */
|
||||
pr_debug("%s: %s:%d is sleeping on function %s\n",
|
||||
__func__, task->comm, task->pid, old_name);
|
||||
break;
|
||||
|
||||
default:
|
||||
pr_debug("%s: Unknown error code (%d) when trying to switch %s:%d\n",
|
||||
__func__, ret, task->comm, task->pid);
|
||||
break;
|
||||
}
|
||||
|
||||
ret = klp_check_stack(task, err_buf);
|
||||
if (ret)
|
||||
goto done;
|
||||
|
||||
success = true;
|
||||
|
||||
clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
|
||||
task->patch_state = klp_target_state;
|
||||
|
||||
done:
|
||||
task_rq_unlock(rq, task, &flags);
|
||||
|
||||
/*
|
||||
* Due to console deadlock issues, pr_debug() can't be used while
|
||||
* holding the task rq lock. Instead we have to use a temporary buffer
|
||||
* and print the debug message after releasing the lock.
|
||||
*/
|
||||
if (err_buf[0] != '\0')
|
||||
pr_debug("%s", err_buf);
|
||||
|
||||
return success;
|
||||
return !ret;
|
||||
}
|
||||
|
||||
/*
|
||||
@ -415,8 +413,11 @@ void klp_try_complete_transition(void)
|
||||
for_each_possible_cpu(cpu) {
|
||||
task = idle_task(cpu);
|
||||
if (cpu_online(cpu)) {
|
||||
if (!klp_try_switch_task(task))
|
||||
if (!klp_try_switch_task(task)) {
|
||||
complete = false;
|
||||
/* Make idle task go through the main loop. */
|
||||
wake_up_if_idle(cpu);
|
||||
}
|
||||
} else if (task->patch_state != klp_target_state) {
|
||||
/* offline idle tasks can be switched immediately */
|
||||
clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
|
||||
|
@ -928,7 +928,7 @@ reset_ipi:
|
||||
}
|
||||
|
||||
/* Callback function for scheduler to check locked-down task. */
|
||||
static bool trc_inspect_reader(struct task_struct *t, void *arg)
|
||||
static int trc_inspect_reader(struct task_struct *t, void *arg)
|
||||
{
|
||||
int cpu = task_cpu(t);
|
||||
bool in_qs = false;
|
||||
@ -939,7 +939,7 @@ static bool trc_inspect_reader(struct task_struct *t, void *arg)
|
||||
|
||||
// If no chance of heavyweight readers, do it the hard way.
|
||||
if (!ofl && !IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB))
|
||||
return false;
|
||||
return -EINVAL;
|
||||
|
||||
// If heavyweight readers are enabled on the remote task,
|
||||
// we can inspect its state despite it currently running.
|
||||
@ -947,7 +947,7 @@ static bool trc_inspect_reader(struct task_struct *t, void *arg)
|
||||
n_heavy_reader_attempts++;
|
||||
if (!ofl && // Check for "running" idle tasks on offline CPUs.
|
||||
!rcu_dynticks_zero_in_eqs(cpu, &t->trc_reader_nesting))
|
||||
return false; // No quiescent state, do it the hard way.
|
||||
return -EINVAL; // No quiescent state, do it the hard way.
|
||||
n_heavy_reader_updates++;
|
||||
if (ofl)
|
||||
n_heavy_reader_ofl_updates++;
|
||||
@ -962,7 +962,7 @@ static bool trc_inspect_reader(struct task_struct *t, void *arg)
|
||||
t->trc_reader_checked = true;
|
||||
|
||||
if (in_qs)
|
||||
return true; // Already in quiescent state, done!!!
|
||||
return 0; // Already in quiescent state, done!!!
|
||||
|
||||
// The task is in a read-side critical section, so set up its
|
||||
// state so that it will awaken the grace-period kthread upon exit
|
||||
@ -970,7 +970,7 @@ static bool trc_inspect_reader(struct task_struct *t, void *arg)
|
||||
atomic_inc(&trc_n_readers_need_end); // One more to wait on.
|
||||
WARN_ON_ONCE(READ_ONCE(t->trc_reader_special.b.need_qs));
|
||||
WRITE_ONCE(t->trc_reader_special.b.need_qs, true);
|
||||
return true;
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* Attempt to extract the state for the specified task. */
|
||||
@ -992,7 +992,7 @@ static void trc_wait_for_one_reader(struct task_struct *t,
|
||||
|
||||
// Attempt to nail down the task for inspection.
|
||||
get_task_struct(t);
|
||||
if (try_invoke_on_locked_down_task(t, trc_inspect_reader, NULL)) {
|
||||
if (!task_call_func(t, trc_inspect_reader, NULL)) {
|
||||
put_task_struct(t);
|
||||
return;
|
||||
}
|
||||
|
@ -240,16 +240,16 @@ struct rcu_stall_chk_rdr {
|
||||
* Report out the state of a not-running task that is stalling the
|
||||
* current RCU grace period.
|
||||
*/
|
||||
static bool check_slow_task(struct task_struct *t, void *arg)
|
||||
static int check_slow_task(struct task_struct *t, void *arg)
|
||||
{
|
||||
struct rcu_stall_chk_rdr *rscrp = arg;
|
||||
|
||||
if (task_curr(t))
|
||||
return false; // It is running, so decline to inspect it.
|
||||
return -EBUSY; // It is running, so decline to inspect it.
|
||||
rscrp->nesting = t->rcu_read_lock_nesting;
|
||||
rscrp->rs = t->rcu_read_unlock_special;
|
||||
rscrp->on_blkd_list = !list_empty(&t->rcu_node_entry);
|
||||
return true;
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
@ -283,7 +283,7 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
|
||||
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
||||
while (i) {
|
||||
t = ts[--i];
|
||||
if (!try_invoke_on_locked_down_task(t, check_slow_task, &rscr))
|
||||
if (task_call_func(t, check_slow_task, &rscr))
|
||||
pr_cont(" P%d", t->pid);
|
||||
else
|
||||
pr_cont(" P%d/%d:%c%c%c%c",
|
||||
|
@ -3,6 +3,10 @@ ifdef CONFIG_FUNCTION_TRACER
|
||||
CFLAGS_REMOVE_clock.o = $(CC_FLAGS_FTRACE)
|
||||
endif
|
||||
|
||||
# The compilers are complaining about unused variables inside an if(0) scope
|
||||
# block. This is daft, shut them up.
|
||||
ccflags-y += $(call cc-disable-warning, unused-but-set-variable)
|
||||
|
||||
# These files are disabled because they produce non-interesting flaky coverage
|
||||
# that is not a function of syscall inputs. E.g. involuntary context switches.
|
||||
KCOV_INSTRUMENT := n
|
||||
|
@ -74,7 +74,11 @@ __read_mostly int sysctl_resched_latency_warn_once = 1;
|
||||
* Number of tasks to iterate in a single balance run.
|
||||
* Limited because this is done with IRQs disabled.
|
||||
*/
|
||||
#ifdef CONFIG_PREEMPT_RT
|
||||
const_debug unsigned int sysctl_sched_nr_migrate = 8;
|
||||
#else
|
||||
const_debug unsigned int sysctl_sched_nr_migrate = 32;
|
||||
#endif
|
||||
|
||||
/*
|
||||
* period over which we measure -rt task CPU usage in us.
|
||||
@ -1962,6 +1966,25 @@ bool sched_task_on_rq(struct task_struct *p)
|
||||
return task_on_rq_queued(p);
|
||||
}
|
||||
|
||||
unsigned long get_wchan(struct task_struct *p)
{
	unsigned long ip = 0;
	unsigned int state;

	if (!p || p == current)
		return 0;

	/* Only get wchan if task is blocked and we can keep it that way. */
	raw_spin_lock_irq(&p->pi_lock);
	state = READ_ONCE(p->__state);
	smp_rmb(); /* see try_to_wake_up() */
	if (state != TASK_RUNNING && state != TASK_WAKING && !p->on_rq)
		ip = __get_wchan(p);
	raw_spin_unlock_irq(&p->pi_lock);

	return ip;
}
|
||||
|
||||
static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
|
||||
{
|
||||
if (!(flags & ENQUEUE_NOCLOCK))
|
||||
@ -3251,7 +3274,7 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
|
||||
ktime_t to = NSEC_PER_SEC / HZ;
|
||||
|
||||
set_current_state(TASK_UNINTERRUPTIBLE);
|
||||
schedule_hrtimeout(&to, HRTIMER_MODE_REL);
|
||||
schedule_hrtimeout(&to, HRTIMER_MODE_REL_HARD);
|
||||
continue;
|
||||
}
|
||||
|
||||
@ -3489,11 +3512,11 @@ ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
|
||||
#ifdef CONFIG_SMP
|
||||
if (cpu == rq->cpu) {
|
||||
__schedstat_inc(rq->ttwu_local);
|
||||
__schedstat_inc(p->se.statistics.nr_wakeups_local);
|
||||
__schedstat_inc(p->stats.nr_wakeups_local);
|
||||
} else {
|
||||
struct sched_domain *sd;
|
||||
|
||||
__schedstat_inc(p->se.statistics.nr_wakeups_remote);
|
||||
__schedstat_inc(p->stats.nr_wakeups_remote);
|
||||
rcu_read_lock();
|
||||
for_each_domain(rq->cpu, sd) {
|
||||
if (cpumask_test_cpu(cpu, sched_domain_span(sd))) {
|
||||
@ -3505,14 +3528,14 @@ ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
|
||||
}
|
||||
|
||||
if (wake_flags & WF_MIGRATED)
|
||||
__schedstat_inc(p->se.statistics.nr_wakeups_migrate);
|
||||
__schedstat_inc(p->stats.nr_wakeups_migrate);
|
||||
#endif /* CONFIG_SMP */
|
||||
|
||||
__schedstat_inc(rq->ttwu_count);
|
||||
__schedstat_inc(p->se.statistics.nr_wakeups);
|
||||
__schedstat_inc(p->stats.nr_wakeups);
|
||||
|
||||
if (wake_flags & WF_SYNC)
|
||||
__schedstat_inc(p->se.statistics.nr_wakeups_sync);
|
||||
__schedstat_inc(p->stats.nr_wakeups_sync);
|
||||
}
|
||||
|
||||
/*
|
||||
@ -3691,15 +3714,11 @@ void wake_up_if_idle(int cpu)
|
||||
if (!is_idle_task(rcu_dereference(rq->curr)))
|
||||
goto out;
|
||||
|
||||
if (set_nr_if_polling(rq->idle)) {
|
||||
trace_sched_wake_idle_without_ipi(cpu);
|
||||
} else {
|
||||
rq_lock_irqsave(rq, &rf);
|
||||
if (is_idle_task(rq->curr))
|
||||
smp_send_reschedule(cpu);
|
||||
/* Else CPU is not idle, do nothing here: */
|
||||
rq_unlock_irqrestore(rq, &rf);
|
||||
}
|
||||
rq_lock_irqsave(rq, &rf);
|
||||
if (is_idle_task(rq->curr))
|
||||
resched_curr(rq);
|
||||
/* Else CPU is not idle, do nothing here: */
|
||||
rq_unlock_irqrestore(rq, &rf);
|
||||
|
||||
out:
|
||||
rcu_read_unlock();
|
||||
@ -4106,46 +4125,61 @@ out:
|
||||
}
|
||||
|
||||
/**
|
||||
* try_invoke_on_locked_down_task - Invoke a function on task in fixed state
|
||||
* task_call_func - Invoke a function on task in fixed state
|
||||
* @p: Process for which the function is to be invoked, can be @current.
|
||||
* @func: Function to invoke.
|
||||
* @arg: Argument to function.
|
||||
*
|
||||
* If the specified task can be quickly locked into a definite state
|
||||
* (either sleeping or on a given runqueue), arrange to keep it in that
|
||||
* state while invoking @func(@arg). This function can use ->on_rq and
|
||||
* task_curr() to work out what the state is, if required. Given that
|
||||
* @func can be invoked with a runqueue lock held, it had better be quite
|
||||
* lightweight.
|
||||
* Fix the task in its current state by avoiding wakeups and/or rq operations
|
||||
* and call @func(@arg) on it. This function can use ->on_rq and task_curr()
|
||||
* to work out what the state is, if required. Given that @func can be invoked
|
||||
* with a runqueue lock held, it had better be quite lightweight.
|
||||
*
|
||||
* Returns:
|
||||
* @false if the task slipped out from under the locks.
|
||||
* @true if the task was locked onto a runqueue or is sleeping.
|
||||
* However, @func can override this by returning @false.
|
||||
* Whatever @func returns
|
||||
*/
|
||||
bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg)
|
||||
int task_call_func(struct task_struct *p, task_call_f func, void *arg)
|
||||
{
|
||||
struct rq *rq = NULL;
|
||||
unsigned int state;
|
||||
struct rq_flags rf;
|
||||
bool ret = false;
|
||||
struct rq *rq;
|
||||
int ret;
|
||||
|
||||
raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
|
||||
if (p->on_rq) {
|
||||
|
||||
state = READ_ONCE(p->__state);
|
||||
|
||||
/*
|
||||
* Ensure we load p->on_rq after p->__state, otherwise it would be
|
||||
* possible to, falsely, observe p->on_rq == 0.
|
||||
*
|
||||
* See try_to_wake_up() for a longer comment.
|
||||
*/
|
||||
smp_rmb();
|
||||
|
||||
/*
|
||||
* Since p->pi_lock blocks try_to_wake_up(), we don't need rq->lock when
|
||||
* the task is blocked. Make sure to check @state since ttwu() can drop
|
||||
* locks at the end, see ttwu_queue_wakelist().
|
||||
*/
|
||||
if (state == TASK_RUNNING || state == TASK_WAKING || p->on_rq)
|
||||
rq = __task_rq_lock(p, &rf);
|
||||
if (task_rq(p) == rq)
|
||||
ret = func(p, arg);
|
||||
|
||||
/*
|
||||
* At this point the task is pinned; either:
|
||||
* - blocked and we're holding off wakeups (pi->lock)
|
||||
* - woken, and we're holding off enqueue (rq->lock)
|
||||
* - queued, and we're holding off schedule (rq->lock)
|
||||
* - running, and we're holding off de-schedule (rq->lock)
|
||||
*
|
||||
* The called function (@func) can use: task_curr(), p->on_rq and
|
||||
* p->__state to differentiate between these states.
|
||||
*/
|
||||
ret = func(p, arg);
|
||||
|
||||
if (rq)
|
||||
rq_unlock(rq, &rf);
|
||||
} else {
|
||||
switch (READ_ONCE(p->__state)) {
|
||||
case TASK_RUNNING:
|
||||
case TASK_WAKING:
|
||||
break;
|
||||
default:
|
||||
smp_rmb(); // See smp_rmb() comment in try_to_wake_up().
|
||||
if (!p->on_rq)
|
||||
ret = func(p, arg);
|
||||
}
|
||||
}
|
||||
|
||||
raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
|
||||
return ret;
|
||||
}
|
||||
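The locking pattern of the reworked task_call_func() can be illustrated with a small single-threaded user-space model. This is only a sketch under stated assumptions: every `model_*` name is hypothetical, the "locks" are plain flags rather than spinlocks, and the memory-ordering step (smp_rmb()) is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct model_lock { int held; };

static void model_lock_acquire(struct model_lock *l) { assert(!l->held); l->held = 1; }
static void model_lock_release(struct model_lock *l) { assert(l->held); l->held = 0; }

enum model_state { MODEL_RUNNING, MODEL_WAKING, MODEL_SLEEPING };

struct model_rq {
	struct model_lock lock;		/* plays the role of rq->lock */
};

struct model_task {
	struct model_lock pi_lock;	/* plays the role of p->pi_lock */
	enum model_state state;		/* plays the role of p->__state */
	int on_rq;			/* plays the role of p->on_rq */
	struct model_rq *rq;		/* runqueue the task belongs to */
};

typedef int (*model_call_f)(struct model_task *t, void *arg);

/*
 * Pin the task, call func exactly once and hand back its return value.
 * A blocked task is pinned by pi_lock alone; a running, waking or queued
 * task additionally needs the runqueue lock.
 */
static int model_task_call_func(struct model_task *p, model_call_f func, void *arg)
{
	struct model_rq *rq = NULL;
	int ret;

	model_lock_acquire(&p->pi_lock);

	if (p->state == MODEL_RUNNING || p->state == MODEL_WAKING || p->on_rq) {
		rq = p->rq;
		model_lock_acquire(&rq->lock);
	}

	ret = func(p, arg);

	if (rq)
		model_lock_release(&rq->lock);

	model_lock_release(&p->pi_lock);
	return ret;
}

/* Example callback: report whether the runqueue lock is held during the call. */
static int model_rq_locked(struct model_task *t, void *arg)
{
	(void)arg;
	return t->rq->lock.held;
}
```

In this model a sleeping task that is not queued is inspected under `pi_lock` only, while a running task is inspected with both locks held; unlike the old interface, the callback is always invoked and its result is returned unchanged.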
@@ -4196,7 +4230,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 
 #ifdef CONFIG_SCHEDSTATS
 	/* Even if schedstat is disabled, there should not be garbage */
-	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
+	memset(&p->stats, 0, sizeof(p->stats));
 #endif
 
 	RB_CLEAR_NODE(&p->dl.rb_node);
@@ -4328,8 +4362,6 @@ int sysctl_schedstats(struct ctl_table *table, int write, void *buffer,
  */
 int sched_fork(unsigned long clone_flags, struct task_struct *p)
 {
-	unsigned long flags;
-
 	__sched_fork(clone_flags, p);
 	/*
 	 * We mark the process as NEW here. This guarantees that
@@ -4375,24 +4407,6 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 
 	init_entity_runnable_average(&p->se);
 
-	/*
-	 * The child is not yet in the pid-hash so no cgroup attach races,
-	 * and the cgroup is pinned to this child due to cgroup_fork()
-	 * is ran before sched_fork().
-	 *
-	 * Silence PROVE_RCU.
-	 */
-	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	rseq_migrate(p);
-	/*
-	 * We're setting the CPU for the first time, we don't migrate,
-	 * so use __set_task_cpu().
-	 */
-	__set_task_cpu(p, smp_processor_id());
-	if (p->sched_class->task_fork)
-		p->sched_class->task_fork(p);
-	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
-
 #ifdef CONFIG_SCHED_INFO
 	if (likely(sched_info_on()))
 		memset(&p->sched_info, 0, sizeof(p->sched_info));
@@ -4408,8 +4422,29 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 	return 0;
 }
 
-void sched_post_fork(struct task_struct *p)
+void sched_post_fork(struct task_struct *p, struct kernel_clone_args *kargs)
 {
+	unsigned long flags;
+#ifdef CONFIG_CGROUP_SCHED
+	struct task_group *tg;
+#endif
+
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
+#ifdef CONFIG_CGROUP_SCHED
+	tg = container_of(kargs->cset->subsys[cpu_cgrp_id],
+			  struct task_group, css);
+	p->sched_task_group = autogroup_task_group(p, tg);
+#endif
+	rseq_migrate(p);
+	/*
+	 * We're setting the CPU for the first time, we don't migrate,
+	 * so use __set_task_cpu().
+	 */
+	__set_task_cpu(p, smp_processor_id());
+	if (p->sched_class->task_fork)
+		p->sched_class->task_fork(p);
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+
 	uclamp_post_fork(p);
 }
 
@@ -4836,18 +4871,12 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	 */
 	if (mm) {
 		membarrier_mm_sync_core_before_usermode(mm);
-		mmdrop(mm);
+		mmdrop_sched(mm);
 	}
 	if (unlikely(prev_state == TASK_DEAD)) {
 		if (prev->sched_class->task_dead)
 			prev->sched_class->task_dead(prev);
 
-		/*
-		 * Remove function-return probe instances associated with this
-		 * task and put them back on the free list.
-		 */
-		kprobe_flush_task(prev);
-
 		/* Task is done with its stack. */
 		put_task_stack(prev);
 
@@ -5580,8 +5609,7 @@ restart:
 			return p;
 	}
 
-	/* The idle class should always have a runnable task: */
-	BUG();
+	BUG(); /* The idle class should always have a runnable task. */
 }
 
 #ifdef CONFIG_SCHED_CORE
@@ -5603,54 +5631,18 @@ static inline bool cookie_match(struct task_struct *a, struct task_struct *b)
 	return a->core_cookie == b->core_cookie;
 }
 
-// XXX fairness/fwd progress conditions
-/*
- * Returns
- * - NULL if there is no runnable task for this class.
- * - the highest priority task for this runqueue if it matches
- * rq->core->core_cookie or its priority is greater than max.
- * - Else returns idle_task.
- */
-static struct task_struct *
-pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *max, bool in_fi)
+static inline struct task_struct *pick_task(struct rq *rq)
 {
-	struct task_struct *class_pick, *cookie_pick;
-	unsigned long cookie = rq->core->core_cookie;
+	const struct sched_class *class;
+	struct task_struct *p;
 
-	class_pick = class->pick_task(rq);
-	if (!class_pick)
-		return NULL;
-
-	if (!cookie) {
-		/*
-		 * If class_pick is tagged, return it only if it has
-		 * higher priority than max.
-		 */
-		if (max && class_pick->core_cookie &&
-		    prio_less(class_pick, max, in_fi))
-			return idle_sched_class.pick_task(rq);
-
-		return class_pick;
+	for_each_class(class) {
+		p = class->pick_task(rq);
+		if (p)
+			return p;
 	}
 
-	/*
-	 * If class_pick is idle or matches cookie, return early.
-	 */
-	if (cookie_equals(class_pick, cookie))
-		return class_pick;
-
-	cookie_pick = sched_core_find(rq, cookie);
-
-	/*
-	 * If class > max && class > cookie, it is the highest priority task on
-	 * the core (so far) and it must be selected, otherwise we must go with
-	 * the cookie pick in order to satisfy the constraint.
-	 */
-	if (prio_less(cookie_pick, class_pick, in_fi) &&
-	    (!max || prio_less(max, class_pick, in_fi)))
-		return class_pick;
-
-	return cookie_pick;
+	BUG(); /* The idle class should always have a runnable task. */
 }
 
 extern void task_vruntime_update(struct rq *rq, struct task_struct *p, bool in_fi);
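The simplified pick_task() above walks the scheduling classes in priority order and returns the first task any class offers. A minimal user-space sketch of that loop follows; the `model_*` names and the three-class table are illustrative assumptions, not the kernel's class list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a runqueue with three scheduling classes. */
struct model_task { const char *name; };

struct model_rq {
	struct model_task *rt;		/* NULL when no RT task is runnable */
	struct model_task *fair;	/* NULL when no fair task is runnable */
	struct model_task idle;		/* the idle task is always runnable */
};

typedef struct model_task *(*model_pick_f)(struct model_rq *rq);

static struct model_task *model_pick_rt(struct model_rq *rq)   { return rq->rt; }
static struct model_task *model_pick_fair(struct model_rq *rq) { return rq->fair; }
static struct model_task *model_pick_idle(struct model_rq *rq) { return &rq->idle; }

/* Classes listed from highest to lowest priority. */
static const model_pick_f model_classes[] = {
	model_pick_rt, model_pick_fair, model_pick_idle,
};

/* Return the first task offered by any class, highest class first. */
static struct model_task *model_pick_task(struct model_rq *rq)
{
	size_t i;

	for (i = 0; i < sizeof(model_classes) / sizeof(model_classes[0]); i++) {
		struct model_task *p = model_classes[i](rq);

		if (p)
			return p;
	}

	/* Unreachable in this model: the idle class always returns a task. */
	assert(0 && "the idle class should always have a runnable task");
	return NULL;
}
```

Because the last class can never return NULL, the loop is guaranteed to terminate with a task, which is the invariant the trailing BUG() in the kernel code documents.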
@@ -5658,11 +5650,12 @@ extern void task_vruntime_update(struct rq *rq, struct task_struct *p, bool in_f
 static struct task_struct *
 pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
-	struct task_struct *next, *max = NULL;
-	const struct sched_class *class;
+	struct task_struct *next, *p, *max = NULL;
 	const struct cpumask *smt_mask;
 	bool fi_before = false;
-	int i, j, cpu, occ = 0;
+	unsigned long cookie;
+	int i, cpu, occ = 0;
+	struct rq *rq_i;
 	bool need_sync;
 
 	if (!sched_core_enabled(rq))
@@ -5735,12 +5728,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	 * and there are no cookied tasks running on siblings.
 	 */
 	if (!need_sync) {
-		for_each_class(class) {
-			next = class->pick_task(rq);
-			if (next)
-				break;
-		}
-
+		next = pick_task(rq);
 		if (!next->core_cookie) {
 			rq->core_pick = NULL;
 			/*
@@ -5753,76 +5741,51 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 		}
 	}
 
-	for_each_cpu(i, smt_mask) {
-		struct rq *rq_i = cpu_rq(i);
-
-		rq_i->core_pick = NULL;
+	/*
+	 * For each thread: do the regular task pick and find the max prio task
+	 * amongst them.
+	 *
+	 * Tie-break prio towards the current CPU
+	 */
+	for_each_cpu_wrap(i, smt_mask, cpu) {
+		rq_i = cpu_rq(i);
 
 		if (i != cpu)
 			update_rq_clock(rq_i);
+
+		p = rq_i->core_pick = pick_task(rq_i);
+		if (!max || prio_less(max, p, fi_before))
+			max = p;
 	}
 
+	cookie = rq->core->core_cookie = max->core_cookie;
+
 	/*
-	 * Try and select tasks for each sibling in descending sched_class
-	 * order.
+	 * For each thread: try and find a runnable task that matches @max or
+	 * force idle.
 	 */
-	for_each_class(class) {
-again:
-		for_each_cpu_wrap(i, smt_mask, cpu) {
-			struct rq *rq_i = cpu_rq(i);
-			struct task_struct *p;
+	for_each_cpu(i, smt_mask) {
+		rq_i = cpu_rq(i);
+		p = rq_i->core_pick;
 
-			if (rq_i->core_pick)
-				continue;
-
-			/*
-			 * If this sibling doesn't yet have a suitable task to
-			 * run; ask for the most eligible task, given the
-			 * highest priority task already selected for this
-			 * core.
-			 */
-			p = pick_task(rq_i, class, max, fi_before);
+		if (!cookie_equals(p, cookie)) {
+			p = NULL;
+			if (cookie)
+				p = sched_core_find(rq_i, cookie);
 			if (!p)
-				continue;
+				p = idle_sched_class.pick_task(rq_i);
+		}
 
-			if (!is_task_rq_idle(p))
-				occ++;
+		rq_i->core_pick = p;
 
-			rq_i->core_pick = p;
-			if (rq_i->idle == p && rq_i->nr_running) {
+		if (p == rq_i->idle) {
+			if (rq_i->nr_running) {
 				rq->core->core_forceidle = true;
 				if (!fi_before)
 					rq->core->core_forceidle_seq++;
 			}
-
-			/*
-			 * If this new candidate is of higher priority than the
-			 * previous; and they're incompatible; we need to wipe
-			 * the slate and start over. pick_task makes sure that
-			 * p's priority is more than max if it doesn't match
-			 * max's cookie.
-			 *
-			 * NOTE: this is a linear max-filter and is thus bounded
-			 * in execution time.
-			 */
-			if (!max || !cookie_match(max, p)) {
-				struct task_struct *old_max = max;
-
-				rq->core->core_cookie = p->core_cookie;
-				max = p;
-
-				if (old_max) {
-					rq->core->core_forceidle = false;
-					for_each_cpu(j, smt_mask) {
-						if (j == i)
-							continue;
-
-						cpu_rq(j)->core_pick = NULL;
-					}
-					occ = 1;
-					goto again;
-				}
-			}
+		} else {
+			occ++;
 		}
 	}
 
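The two loops above replace the old restart-on-mismatch scheme with two linear passes: first find the highest-priority pick across all SMT siblings, then make every sibling either run a task with the same cookie or go idle. The sketch below models that idea in user space. It is a simplification under stated assumptions: all `model_*` names are hypothetical, priority is a plain integer instead of prio_less(), and each runqueue is a flat array.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_IDLE (-1)	/* pick value meaning "run the idle task" */

struct model_task {
	int prio;		/* larger value means more important */
	unsigned long cookie;	/* 0 means untagged */
};

struct model_rq {
	const struct model_task *tasks;
	int nr;
};

/* Index of the highest-priority task, optionally restricted to one cookie. */
static int model_best(const struct model_rq *rq, int match, unsigned long cookie)
{
	int i, best = MODEL_IDLE;

	for (i = 0; i < rq->nr; i++) {
		if (match && rq->tasks[i].cookie != cookie)
			continue;
		if (best == MODEL_IDLE || rq->tasks[i].prio > rq->tasks[best].prio)
			best = i;
	}
	return best;
}

/*
 * Fill pick[] with one task index (or MODEL_IDLE) per sibling. Returns the
 * number of siblings forced idle although they have runnable tasks.
 */
static int model_core_pick(const struct model_rq *rqs, int nr_rqs, int *pick)
{
	const struct model_task *max = NULL;
	unsigned long cookie;
	int i, forced = 0;

	assert(nr_rqs > 0);

	/* Pass 1: regular pick per sibling, remember the overall maximum. */
	for (i = 0; i < nr_rqs; i++) {
		pick[i] = model_best(&rqs[i], 0, 0);
		if (pick[i] == MODEL_IDLE)
			continue;
		if (!max || rqs[i].tasks[pick[i]].prio > max->prio)
			max = &rqs[i].tasks[pick[i]];
	}
	cookie = max ? max->cookie : 0;

	/* Pass 2: keep matching picks, else find a match or force idle. */
	for (i = 0; i < nr_rqs; i++) {
		if (pick[i] != MODEL_IDLE && rqs[i].tasks[pick[i]].cookie != cookie)
			pick[i] = cookie ? model_best(&rqs[i], 1, cookie) : MODEL_IDLE;
		if (pick[i] == MODEL_IDLE && rqs[i].nr > 0)
			forced++;
	}
	return forced;
}
```

Each pass visits every sibling once, so the selection is bounded without the `goto again` retry that the removed code needed.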
@@ -5842,7 +5805,7 @@ again:
 	 * non-matching user state.
 	 */
 	for_each_cpu(i, smt_mask) {
-		struct rq *rq_i = cpu_rq(i);
+		rq_i = cpu_rq(i);
 
 		/*
 		 * An online sibling might have gone offline before a task
@@ -6319,20 +6282,14 @@ static inline void sched_submit_work(struct task_struct *tsk)
 
 	task_flags = tsk->flags;
 	/*
-	 * If a worker went to sleep, notify and ask workqueue whether
-	 * it wants to wake up a task to maintain concurrency.
-	 * As this function is called inside the schedule() context,
-	 * we disable preemption to avoid it calling schedule() again
-	 * in the possible wakeup of a kworker and because wq_worker_sleeping()
-	 * requires it.
+	 * If a worker goes to sleep, notify and ask workqueue whether it
+	 * wants to wake up a task to maintain concurrency.
 	 */
 	if (task_flags & (PF_WQ_WORKER | PF_IO_WORKER)) {
-		preempt_disable();
 		if (task_flags & PF_WQ_WORKER)
 			wq_worker_sleeping(tsk);
 		else
 			io_wq_worker_sleeping(tsk);
-		preempt_enable_no_resched();
 	}
 
 	if (tsk_is_pi_blocked(tsk))
@@ -6586,12 +6543,13 @@ EXPORT_STATIC_CALL_TRAMP(preempt_schedule_notrace);
  */
 
 enum {
-	preempt_dynamic_none = 0,
+	preempt_dynamic_undefined = -1,
+	preempt_dynamic_none,
 	preempt_dynamic_voluntary,
 	preempt_dynamic_full,
 };
 
-int preempt_dynamic_mode = preempt_dynamic_full;
+int preempt_dynamic_mode = preempt_dynamic_undefined;
 
 int sched_dynamic_mode(const char *str)
 {
@@ -6664,7 +6622,27 @@ static int __init setup_preempt_mode(char *str)
 }
 __setup("preempt=", setup_preempt_mode);
 
-#endif /* CONFIG_PREEMPT_DYNAMIC */
+static void __init preempt_dynamic_init(void)
+{
+	if (preempt_dynamic_mode == preempt_dynamic_undefined) {
+		if (IS_ENABLED(CONFIG_PREEMPT_NONE_BEHAVIOUR)) {
+			sched_dynamic_update(preempt_dynamic_none);
+		} else if (IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY_BEHAVIOUR)) {
+			sched_dynamic_update(preempt_dynamic_voluntary);
+		} else {
+			/* Default static call setting, nothing to do */
+			WARN_ON_ONCE(!IS_ENABLED(CONFIG_PREEMPT_BEHAVIOUR));
+			preempt_dynamic_mode = preempt_dynamic_full;
+			pr_info("Dynamic Preempt: full\n");
+		}
+	}
+}
+
+#else /* !CONFIG_PREEMPT_DYNAMIC */
+
+static inline void preempt_dynamic_init(void) { }
+
+#endif /* #ifdef CONFIG_PREEMPT_DYNAMIC */
 
 /*
  * This is the entry point to schedule() from kernel preemption
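The new `preempt_dynamic_undefined` value acts as a sentinel: the mode starts out unset, the "preempt=" boot parameter may overwrite it, and preempt_dynamic_init() falls back to the build-time default only if it is still unset. A minimal sketch of that resolution step, with hypothetical `MODEL_*` / `model_*` names standing in for the kernel's enum and config checks:

```c
#include <assert.h>

/* Hypothetical mirror of the enum above; UNDEFINED means "nobody chose yet". */
enum model_preempt {
	MODEL_PREEMPT_UNDEFINED = -1,
	MODEL_PREEMPT_NONE,
	MODEL_PREEMPT_VOLUNTARY,
	MODEL_PREEMPT_FULL,
};

/*
 * boot_mode:     what the command line selected, or UNDEFINED if it was silent.
 * build_default: what the build configuration implies; must be a real mode.
 */
static enum model_preempt model_resolve(enum model_preempt boot_mode,
					enum model_preempt build_default)
{
	assert(build_default != MODEL_PREEMPT_UNDEFINED);

	if (boot_mode == MODEL_PREEMPT_UNDEFINED)
		return build_default;

	return boot_mode;
}
```

With the previous initial value of `preempt_dynamic_full` there was no way to tell "the user asked for full" apart from "nothing was requested", which is why a distinct sentinel is needed for the fallback to work.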
@@ -9466,6 +9444,8 @@ void __init sched_init(void)
 
 	init_uclamp();
 
+	preempt_dynamic_init();
+
 	scheduler_running = 1;
 }
 
@@ -9640,9 +9620,9 @@ void normalize_rt_tasks(void)
 			continue;
 
 		p->se.exec_start = 0;
-		schedstat_set(p->se.statistics.wait_start, 0);
-		schedstat_set(p->se.statistics.sleep_start, 0);
-		schedstat_set(p->se.statistics.block_start, 0);
+		schedstat_set(p->stats.wait_start, 0);
+		schedstat_set(p->stats.sleep_start, 0);
+		schedstat_set(p->stats.block_start, 0);
 
 		if (!dl_task(p) && !rt_task(p)) {
 			/*
@@ -10484,15 +10464,21 @@ static int cpu_cfs_stat_show(struct seq_file *sf, void *v)
 	seq_printf(sf, "throttled_time %llu\n", cfs_b->throttled_time);
 
 	if (schedstat_enabled() && tg != &root_task_group) {
+		struct sched_statistics *stats;
 		u64 ws = 0;
 		int i;
 
-		for_each_possible_cpu(i)
-			ws += schedstat_val(tg->se[i]->statistics.wait_sum);
+		for_each_possible_cpu(i) {
+			stats = __schedstats_from_se(tg->se[i]);
+			ws += schedstat_val(stats->wait_sum);
+		}
 
 		seq_printf(sf, "wait_sum %llu\n", ws);
 	}
 
+	seq_printf(sf, "nr_bursts %d\n", cfs_b->nr_burst);
+	seq_printf(sf, "burst_time %llu\n", cfs_b->burst_time);
+
 	return 0;
 }
 #endif /* CONFIG_CFS_BANDWIDTH */
@@ -10608,16 +10594,20 @@ static int cpu_extra_stat_show(struct seq_file *sf,
 	{
 		struct task_group *tg = css_tg(css);
 		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
-		u64 throttled_usec;
+		u64 throttled_usec, burst_usec;
 
 		throttled_usec = cfs_b->throttled_time;
 		do_div(throttled_usec, NSEC_PER_USEC);
+		burst_usec = cfs_b->burst_time;
+		do_div(burst_usec, NSEC_PER_USEC);
 
 		seq_printf(sf, "nr_periods %d\n"
 			   "nr_throttled %d\n"
-			   "throttled_usec %llu\n",
+			   "throttled_usec %llu\n"
+			   "nr_bursts %d\n"
+			   "burst_usec %llu\n",
 			   cfs_b->nr_periods, cfs_b->nr_throttled,
-			   throttled_usec);
+			   throttled_usec, cfs_b->nr_burst, burst_usec);
 	}
 #endif
 	return 0;
@@ -11,7 +11,7 @@ struct sched_core_cookie {
 	refcount_t refcnt;
 };
 
-unsigned long sched_core_alloc_cookie(void)
+static unsigned long sched_core_alloc_cookie(void)
 {
 	struct sched_core_cookie *ck = kmalloc(sizeof(*ck), GFP_KERNEL);
 	if (!ck)
@@ -23,7 +23,7 @@ unsigned long sched_core_alloc_cookie(void)
 	return (unsigned long)ck;
 }
 
-void sched_core_put_cookie(unsigned long cookie)
+static void sched_core_put_cookie(unsigned long cookie)
 {
 	struct sched_core_cookie *ptr = (void *)cookie;
 
@@ -33,7 +33,7 @@ void sched_core_put_cookie(unsigned long cookie)
 	}
 }
 
-unsigned long sched_core_get_cookie(unsigned long cookie)
+static unsigned long sched_core_get_cookie(unsigned long cookie)
 {
 	struct sched_core_cookie *ptr = (void *)cookie;
 
@@ -53,7 +53,8 @@ unsigned long sched_core_get_cookie(unsigned long cookie)
  *
  * Returns: the old cookie
  */
-unsigned long sched_core_update_cookie(struct task_struct *p, unsigned long cookie)
+static unsigned long sched_core_update_cookie(struct task_struct *p,
+					      unsigned long cookie)
 {
 	unsigned long old_cookie;
 	struct rq_flags rf;
@@ -1265,8 +1265,10 @@ static void update_curr_dl(struct rq *rq)
 		return;
 	}
 
-	schedstat_set(curr->se.statistics.exec_max,
-		      max(curr->se.statistics.exec_max, delta_exec));
+	schedstat_set(curr->stats.exec_max,
+		      max(curr->stats.exec_max, delta_exec));
+
+	trace_sched_stat_runtime(curr, delta_exec, 0);
 
 	curr->se.sum_exec_runtime += delta_exec;
 	account_group_exec_runtime(curr, delta_exec);
@@ -1472,6 +1474,82 @@ static inline bool __dl_less(struct rb_node *a, const struct rb_node *b)
 	return dl_time_before(__node_2_dle(a)->deadline, __node_2_dle(b)->deadline);
 }
 
+static inline struct sched_statistics *
+__schedstats_from_dl_se(struct sched_dl_entity *dl_se)
+{
+	return &dl_task_of(dl_se)->stats;
+}
+
+static inline void
+update_stats_wait_start_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se)
+{
+	struct sched_statistics *stats;
+
+	if (!schedstat_enabled())
+		return;
+
+	stats = __schedstats_from_dl_se(dl_se);
+	__update_stats_wait_start(rq_of_dl_rq(dl_rq), dl_task_of(dl_se), stats);
+}
+
+static inline void
+update_stats_wait_end_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se)
+{
+	struct sched_statistics *stats;
+
+	if (!schedstat_enabled())
+		return;
+
+	stats = __schedstats_from_dl_se(dl_se);
+	__update_stats_wait_end(rq_of_dl_rq(dl_rq), dl_task_of(dl_se), stats);
+}
+
+static inline void
+update_stats_enqueue_sleeper_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se)
+{
+	struct sched_statistics *stats;
+
+	if (!schedstat_enabled())
+		return;
+
+	stats = __schedstats_from_dl_se(dl_se);
+	__update_stats_enqueue_sleeper(rq_of_dl_rq(dl_rq), dl_task_of(dl_se), stats);
+}
+
+static inline void
+update_stats_enqueue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
+			int flags)
+{
+	if (!schedstat_enabled())
+		return;
+
+	if (flags & ENQUEUE_WAKEUP)
+		update_stats_enqueue_sleeper_dl(dl_rq, dl_se);
+}
+
+static inline void
+update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
+			int flags)
+{
+	struct task_struct *p = dl_task_of(dl_se);
+
+	if (!schedstat_enabled())
+		return;
+
+	if ((flags & DEQUEUE_SLEEP)) {
+		unsigned int state;
+
+		state = READ_ONCE(p->__state);
+		if (state & TASK_INTERRUPTIBLE)
+			__schedstat_set(p->stats.sleep_start,
+					rq_clock(rq_of_dl_rq(dl_rq)));
+
+		if (state & TASK_UNINTERRUPTIBLE)
+			__schedstat_set(p->stats.block_start,
+					rq_clock(rq_of_dl_rq(dl_rq)));
+	}
+}
+
 static void __enqueue_dl_entity(struct sched_dl_entity *dl_se)
 {
 	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
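The dequeue helper added above records a timestamp in a different statistics field depending on which sleep bit is set in the task state. A small sketch of that bookkeeping follows; the `MODEL_*` bit values and the `model_stats` structure are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state bits, standing in for TASK_INTERRUPTIBLE and friends. */
#define MODEL_TASK_INTERRUPTIBLE	0x1u
#define MODEL_TASK_UNINTERRUPTIBLE	0x2u

/* Hypothetical subset of the per-task statistics. */
struct model_stats {
	uint64_t sleep_start;	/* set when an interruptible sleep begins */
	uint64_t block_start;	/* set when an uninterruptible sleep begins */
};

/* Record when a task that is being dequeued for sleeping went to sleep. */
static void model_dequeue_sleep(struct model_stats *st, unsigned int state,
				uint64_t now)
{
	assert(st);

	if (state & MODEL_TASK_INTERRUPTIBLE)
		st->sleep_start = now;

	if (state & MODEL_TASK_UNINTERRUPTIBLE)
		st->block_start = now;
}
```

Keeping the two start times separate lets a later wakeup attribute the elapsed time to either sleep time or block time, which is what the extended statistics for the DL class report.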
@@ -1502,6 +1580,8 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags)
 {
 	BUG_ON(on_dl_rq(dl_se));
 
+	update_stats_enqueue_dl(dl_rq_of_se(dl_se), dl_se, flags);
+
 	/*
 	 * If this is a wakeup or a new instance, the scheduling
 	 * parameters of the task might need updating. Otherwise,
@@ -1598,6 +1678,9 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 		return;
 	}
 
+	check_schedstat_required();
+	update_stats_wait_start_dl(dl_rq_of_se(&p->dl), &p->dl);
+
 	enqueue_dl_entity(&p->dl, flags);
 
 	if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
@@ -1606,6 +1689,7 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 
 static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 {
+	update_stats_dequeue_dl(&rq->dl, &p->dl, flags);
 	dequeue_dl_entity(&p->dl);
 	dequeue_pushable_dl_task(rq, p);
 }
@@ -1825,7 +1909,12 @@ static void start_hrtick_dl(struct rq *rq, struct task_struct *p)
 
 static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool first)
 {
+	struct sched_dl_entity *dl_se = &p->dl;
+	struct dl_rq *dl_rq = &rq->dl;
+
 	p->se.exec_start = rq_clock_task(rq);
+	if (on_dl_rq(&p->dl))
+		update_stats_wait_end_dl(dl_rq, dl_se);
 
 	/* You can't push away the running task */
 	dequeue_pushable_dl_task(rq, p);
@@ -1882,6 +1971,12 @@ static struct task_struct *pick_next_task_dl(struct rq *rq)
 
 static void put_prev_task_dl(struct rq *rq, struct task_struct *p)
 {
+	struct sched_dl_entity *dl_se = &p->dl;
+	struct dl_rq *dl_rq = &rq->dl;
+
+	if (on_dl_rq(&p->dl))
+		update_stats_wait_start_dl(dl_rq, dl_se);
+
 	update_curr_dl(rq);
 
 	update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 1);
@@ -311,6 +311,7 @@ static __init int sched_init_debug(void)
 
 	debugfs_create_u32("latency_ns", 0644, debugfs_sched, &sysctl_sched_latency);
 	debugfs_create_u32("min_granularity_ns", 0644, debugfs_sched, &sysctl_sched_min_granularity);
+	debugfs_create_u32("idle_min_granularity_ns", 0644, debugfs_sched, &sysctl_sched_idle_min_granularity);
 	debugfs_create_u32("wakeup_granularity_ns", 0644, debugfs_sched, &sysctl_sched_wakeup_granularity);
 
 	debugfs_create_u32("latency_warn_ms", 0644, debugfs_sched, &sysctl_resched_latency_warn_ms);
@@ -448,9 +449,11 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group
 	struct sched_entity *se = tg->se[cpu];
 
 #define P(F) SEQ_printf(m, " .%-30s: %lld\n", #F, (long long)F)
-#define P_SCHEDSTAT(F) SEQ_printf(m, " .%-30s: %lld\n", #F, (long long)schedstat_val(F))
+#define P_SCHEDSTAT(F) SEQ_printf(m, " .%-30s: %lld\n", \
+		#F, (long long)schedstat_val(stats->F))
 #define PN(F) SEQ_printf(m, " .%-30s: %lld.%06ld\n", #F, SPLIT_NS((long long)F))
-#define PN_SCHEDSTAT(F) SEQ_printf(m, " .%-30s: %lld.%06ld\n", #F, SPLIT_NS((long long)schedstat_val(F)))
+#define PN_SCHEDSTAT(F) SEQ_printf(m, " .%-30s: %lld.%06ld\n", \
+		#F, SPLIT_NS((long long)schedstat_val(stats->F)))
 
 	if (!se)
 		return;
@@ -460,16 +463,19 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group
 	PN(se->sum_exec_runtime);
 
 	if (schedstat_enabled()) {
-		PN_SCHEDSTAT(se->statistics.wait_start);
-		PN_SCHEDSTAT(se->statistics.sleep_start);
-		PN_SCHEDSTAT(se->statistics.block_start);
-		PN_SCHEDSTAT(se->statistics.sleep_max);
-		PN_SCHEDSTAT(se->statistics.block_max);
-		PN_SCHEDSTAT(se->statistics.exec_max);
-		PN_SCHEDSTAT(se->statistics.slice_max);
-		PN_SCHEDSTAT(se->statistics.wait_max);
-		PN_SCHEDSTAT(se->statistics.wait_sum);
-		P_SCHEDSTAT(se->statistics.wait_count);
+		struct sched_statistics *stats;
+		stats = __schedstats_from_se(se);
+
+		PN_SCHEDSTAT(wait_start);
+		PN_SCHEDSTAT(sleep_start);
+		PN_SCHEDSTAT(block_start);
+		PN_SCHEDSTAT(sleep_max);
+		PN_SCHEDSTAT(block_max);
+		PN_SCHEDSTAT(exec_max);
+		PN_SCHEDSTAT(slice_max);
+		PN_SCHEDSTAT(wait_max);
+		PN_SCHEDSTAT(wait_sum);
+		P_SCHEDSTAT(wait_count);
 	}
 
 	P(se->load.weight);
@@ -535,10 +541,11 @@ print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
 		(long long)(p->nvcsw + p->nivcsw),
 		p->prio);
 
-	SEQ_printf(m, "%9Ld.%06ld %9Ld.%06ld %9Ld.%06ld",
-		SPLIT_NS(schedstat_val_or_zero(p->se.statistics.wait_sum)),
+	SEQ_printf(m, "%9lld.%06ld %9lld.%06ld %9lld.%06ld %9lld.%06ld",
+		SPLIT_NS(schedstat_val_or_zero(p->stats.wait_sum)),
 		SPLIT_NS(p->se.sum_exec_runtime),
-		SPLIT_NS(schedstat_val_or_zero(p->se.statistics.sum_sleep_runtime)));
+		SPLIT_NS(schedstat_val_or_zero(p->stats.sum_sleep_runtime)),
+		SPLIT_NS(schedstat_val_or_zero(p->stats.sum_block_runtime)));
 
 #ifdef CONFIG_NUMA_BALANCING
 	SEQ_printf(m, " %d %d", task_node(p), task_numa_group_id(p));
@@ -614,6 +621,8 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 			cfs_rq->nr_spread_over);
 	SEQ_printf(m, " .%-30s: %d\n", "nr_running", cfs_rq->nr_running);
 	SEQ_printf(m, " .%-30s: %d\n", "h_nr_running", cfs_rq->h_nr_running);
+	SEQ_printf(m, " .%-30s: %d\n", "idle_nr_running",
+			cfs_rq->idle_nr_running);
 	SEQ_printf(m, " .%-30s: %d\n", "idle_h_nr_running",
 			cfs_rq->idle_h_nr_running);
 	SEQ_printf(m, " .%-30s: %ld\n", "load", cfs_rq->load.weight);
@@ -810,6 +819,7 @@ static void sched_debug_header(struct seq_file *m)
 	SEQ_printf(m, " .%-40s: %Ld.%06ld\n", #x, SPLIT_NS(x))
 	PN(sysctl_sched_latency);
 	PN(sysctl_sched_min_granularity);
+	PN(sysctl_sched_idle_min_granularity);
 	PN(sysctl_sched_wakeup_granularity);
 	P(sysctl_sched_child_runs_first);
 	P(sysctl_sched_features);
@@ -954,8 +964,8 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
 		"---------------------------------------------------------"
 		"----------\n");
 
-#define P_SCHEDSTAT(F) __PS(#F, schedstat_val(p->F))
-#define PN_SCHEDSTAT(F) __PSN(#F, schedstat_val(p->F))
+#define P_SCHEDSTAT(F) __PS(#F, schedstat_val(p->stats.F))
+#define PN_SCHEDSTAT(F) __PSN(#F, schedstat_val(p->stats.F))
 
 	PN(se.exec_start);
 	PN(se.vruntime);
@@ -968,33 +978,34 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
 	if (schedstat_enabled()) {
 		u64 avg_atom, avg_per_cpu;
 
-		PN_SCHEDSTAT(se.statistics.sum_sleep_runtime);
-		PN_SCHEDSTAT(se.statistics.wait_start);
-		PN_SCHEDSTAT(se.statistics.sleep_start);
-		PN_SCHEDSTAT(se.statistics.block_start);
-		PN_SCHEDSTAT(se.statistics.sleep_max);
-		PN_SCHEDSTAT(se.statistics.block_max);
-		PN_SCHEDSTAT(se.statistics.exec_max);
-		PN_SCHEDSTAT(se.statistics.slice_max);
-		PN_SCHEDSTAT(se.statistics.wait_max);
-		PN_SCHEDSTAT(se.statistics.wait_sum);
-		P_SCHEDSTAT(se.statistics.wait_count);
-		PN_SCHEDSTAT(se.statistics.iowait_sum);
-		P_SCHEDSTAT(se.statistics.iowait_count);
-		P_SCHEDSTAT(se.statistics.nr_migrations_cold);
-		P_SCHEDSTAT(se.statistics.nr_failed_migrations_affine);
-		P_SCHEDSTAT(se.statistics.nr_failed_migrations_running);
-		P_SCHEDSTAT(se.statistics.nr_failed_migrations_hot);
-		P_SCHEDSTAT(se.statistics.nr_forced_migrations);
-		P_SCHEDSTAT(se.statistics.nr_wakeups);
-		P_SCHEDSTAT(se.statistics.nr_wakeups_sync);
-		P_SCHEDSTAT(se.statistics.nr_wakeups_migrate);
-		P_SCHEDSTAT(se.statistics.nr_wakeups_local);
-		P_SCHEDSTAT(se.statistics.nr_wakeups_remote);
-		P_SCHEDSTAT(se.statistics.nr_wakeups_affine);
-		P_SCHEDSTAT(se.statistics.nr_wakeups_affine_attempts);
-		P_SCHEDSTAT(se.statistics.nr_wakeups_passive);
-		P_SCHEDSTAT(se.statistics.nr_wakeups_idle);
+		PN_SCHEDSTAT(sum_sleep_runtime);
+		PN_SCHEDSTAT(sum_block_runtime);
+		PN_SCHEDSTAT(wait_start);
+		PN_SCHEDSTAT(sleep_start);
+		PN_SCHEDSTAT(block_start);
+		PN_SCHEDSTAT(sleep_max);
+		PN_SCHEDSTAT(block_max);
+		PN_SCHEDSTAT(exec_max);
+		PN_SCHEDSTAT(slice_max);
+		PN_SCHEDSTAT(wait_max);
+		PN_SCHEDSTAT(wait_sum);
+		P_SCHEDSTAT(wait_count);
+		PN_SCHEDSTAT(iowait_sum);
+		P_SCHEDSTAT(iowait_count);
+		P_SCHEDSTAT(nr_migrations_cold);
+		P_SCHEDSTAT(nr_failed_migrations_affine);
+		P_SCHEDSTAT(nr_failed_migrations_running);
+		P_SCHEDSTAT(nr_failed_migrations_hot);
+		P_SCHEDSTAT(nr_forced_migrations);
+		P_SCHEDSTAT(nr_wakeups);
+		P_SCHEDSTAT(nr_wakeups_sync);
+		P_SCHEDSTAT(nr_wakeups_migrate);
+		P_SCHEDSTAT(nr_wakeups_local);
+		P_SCHEDSTAT(nr_wakeups_remote);
+		P_SCHEDSTAT(nr_wakeups_affine);
+		P_SCHEDSTAT(nr_wakeups_affine_attempts);
+		P_SCHEDSTAT(nr_wakeups_passive);
+		P_SCHEDSTAT(nr_wakeups_idle);
 
 		avg_atom = p->se.sum_exec_runtime;
 		if (nr_switches)
@@ -1060,7 +1071,7 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
 void proc_sched_set_task(struct task_struct *p)
 {
 #ifdef CONFIG_SCHEDSTATS
-	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
+	memset(&p->stats, 0, sizeof(p->stats));
 #endif
 }
 
@@ -59,6 +59,14 @@ unsigned int sysctl_sched_tunable_scaling = SCHED_TUNABLESCALING_LOG;
 unsigned int sysctl_sched_min_granularity = 750000ULL;
 static unsigned int normalized_sysctl_sched_min_granularity = 750000ULL;
 
+/*
+ * Minimal preemption granularity for CPU-bound SCHED_IDLE tasks.
+ * Applies only when SCHED_IDLE tasks compete with normal tasks.
+ *
+ * (default: 0.75 msec)
+ */
+unsigned int sysctl_sched_idle_min_granularity = 750000ULL;
+
 /*
  * This value is kept at sysctl_sched_latency/sysctl_sched_min_granularity
  */
@@ -665,6 +673,8 @@ static u64 __sched_period(unsigned long nr_running)
 		return sysctl_sched_latency;
 }
 
+static bool sched_idle_cfs_rq(struct cfs_rq *cfs_rq);
+
 /*
  * We calculate the wall-time slice from the period by taking a part
  * proportional to the weight.
@@ -674,6 +684,8 @@ static u64 __sched_period(unsigned long nr_running)
 static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	unsigned int nr_running = cfs_rq->nr_running;
+	struct sched_entity *init_se = se;
+	unsigned int min_gran;
 	u64 slice;
 
 	if (sched_feat(ALT_PERIOD))
@ -684,12 +696,13 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
|
||||
for_each_sched_entity(se) {
|
||||
struct load_weight *load;
|
||||
struct load_weight lw;
|
||||
struct cfs_rq *qcfs_rq;
|
||||
|
||||
cfs_rq = cfs_rq_of(se);
|
||||
load = &cfs_rq->load;
|
||||
qcfs_rq = cfs_rq_of(se);
|
||||
load = &qcfs_rq->load;
|
||||
|
||||
if (unlikely(!se->on_rq)) {
|
||||
lw = cfs_rq->load;
|
||||
lw = qcfs_rq->load;
|
||||
|
||||
update_load_add(&lw, se->load.weight);
|
||||
load = &lw;
|
||||
@ -697,8 +710,14 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
|
||||
slice = __calc_delta(slice, se->load.weight, load);
|
||||
}
|
||||
|
||||
if (sched_feat(BASE_SLICE))
|
||||
slice = max(slice, (u64)sysctl_sched_min_granularity);
|
||||
if (sched_feat(BASE_SLICE)) {
|
||||
if (se_is_idle(init_se) && !sched_idle_cfs_rq(cfs_rq))
|
||||
min_gran = sysctl_sched_idle_min_granularity;
|
||||
else
|
||||
min_gran = sysctl_sched_min_granularity;
|
||||
|
||||
slice = max_t(u64, slice, min_gran);
|
||||
}
|
||||
|
||||
return slice;
|
||||
}
|
||||
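The sched_slice() change above picks a different lower bound for the slice depending on whether a SCHED_IDLE entity is competing with normal entities. A minimal userspace sketch of that selection follows. All `toy_` names and both granularity constants are hypothetical stand-ins chosen for illustration; they are not the kernel's definitions or defaults for any particular machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values in nanoseconds, for illustration only. */
#define TOY_MIN_GRAN_NS      3000000ULL /* stand-in for sysctl_sched_min_granularity */
#define TOY_IDLE_MIN_GRAN_NS  750000ULL /* stand-in for sysctl_sched_idle_min_granularity */

/* Hypothetical, reduced view of a CFS runqueue. */
struct toy_cfs_rq {
	unsigned int nr_running;      /* all entities queued here */
	unsigned int idle_nr_running; /* of those, the SCHED_IDLE ones */
};

/* True only when the queue is non-empty and holds nothing but idle entities. */
static bool toy_sched_idle_cfs_rq(const struct toy_cfs_rq *rq)
{
	return rq->nr_running && rq->nr_running == rq->idle_nr_running;
}

/*
 * Clamp a computed slice from below. An idle entity that competes with
 * normal entities gets the smaller idle granularity; every other case
 * keeps the regular minimum granularity.
 */
static uint64_t toy_clamp_slice(uint64_t slice, bool se_idle,
				const struct toy_cfs_rq *rq)
{
	uint64_t min_gran;

	if (se_idle && !toy_sched_idle_cfs_rq(rq))
		min_gran = TOY_IDLE_MIN_GRAN_NS;
	else
		min_gran = TOY_MIN_GRAN_NS;

	return slice > min_gran ? slice : min_gran;
}
```

With these assumed constants, an idle entity on a mixed queue is clamped to 750000 ns, while the same entity on an all-idle queue is clamped to the regular 3000000 ns, so idle tasks only give up slice length when normal tasks are waiting.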
@@ -837,8 +856,13 @@ static void update_curr(struct cfs_rq *cfs_rq)
|
||||
|
||||
curr->exec_start = now;
|
||||
|
||||
schedstat_set(curr->statistics.exec_max,
|
||||
max(delta_exec, curr->statistics.exec_max));
|
||||
if (schedstat_enabled()) {
|
||||
struct sched_statistics *stats;
|
||||
|
||||
stats = __schedstats_from_se(curr);
|
||||
__schedstat_set(stats->exec_max,
|
||||
max(delta_exec, stats->exec_max));
|
||||
}
|
||||
|
||||
curr->sum_exec_runtime += delta_exec;
|
||||
schedstat_add(cfs_rq->exec_clock, delta_exec);
|
||||
@@ -863,137 +887,70 @@ static void update_curr_fair(struct rq *rq)
|
||||
}
|
||||
|
||||
static inline void
|
||||
update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
|
||||
update_stats_wait_start_fair(struct cfs_rq *cfs_rq, struct sched_entity *se)
|
||||
{
|
||||
u64 wait_start, prev_wait_start;
|
||||
struct sched_statistics *stats;
|
||||
struct task_struct *p = NULL;
|
||||
|
||||
if (!schedstat_enabled())
|
||||
return;
|
||||
|
||||
wait_start = rq_clock(rq_of(cfs_rq));
|
||||
prev_wait_start = schedstat_val(se->statistics.wait_start);
|
||||
stats = __schedstats_from_se(se);
|
||||
|
||||
if (entity_is_task(se) && task_on_rq_migrating(task_of(se)) &&
|
||||
likely(wait_start > prev_wait_start))
|
||||
wait_start -= prev_wait_start;
|
||||
if (entity_is_task(se))
|
||||
p = task_of(se);
|
||||
|
||||
__schedstat_set(se->statistics.wait_start, wait_start);
|
||||
__update_stats_wait_start(rq_of(cfs_rq), p, stats);
|
||||
}
|
||||
|
||||
static inline void
|
||||
update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
|
||||
update_stats_wait_end_fair(struct cfs_rq *cfs_rq, struct sched_entity *se)
|
||||
{
|
||||
struct task_struct *p;
|
||||
u64 delta;
|
||||
struct sched_statistics *stats;
|
||||
struct task_struct *p = NULL;
|
||||
|
||||
if (!schedstat_enabled())
|
||||
return;
|
||||
|
||||
stats = __schedstats_from_se(se);
|
||||
|
||||
/*
|
||||
* When the sched_schedstat changes from 0 to 1, some sched se
|
||||
* maybe already in the runqueue, the se->statistics.wait_start
|
||||
* will be 0.So it will let the delta wrong. We need to avoid this
|
||||
* scenario.
|
||||
*/
|
||||
if (unlikely(!schedstat_val(se->statistics.wait_start)))
|
||||
if (unlikely(!schedstat_val(stats->wait_start)))
|
||||
return;
|
||||
|
||||
delta = rq_clock(rq_of(cfs_rq)) - schedstat_val(se->statistics.wait_start);
|
||||
|
||||
if (entity_is_task(se)) {
|
||||
if (entity_is_task(se))
|
||||
p = task_of(se);
|
||||
if (task_on_rq_migrating(p)) {
|
||||
/*
|
||||
* Preserve migrating task's wait time so wait_start
|
||||
* time stamp can be adjusted to accumulate wait time
|
||||
* prior to migration.
|
||||
*/
|
||||
__schedstat_set(se->statistics.wait_start, delta);
|
||||
return;
|
||||
}
|
||||
trace_sched_stat_wait(p, delta);
|
||||
}
|
||||
|
||||
__schedstat_set(se->statistics.wait_max,
|
||||
max(schedstat_val(se->statistics.wait_max), delta));
|
||||
__schedstat_inc(se->statistics.wait_count);
|
||||
__schedstat_add(se->statistics.wait_sum, delta);
|
||||
__schedstat_set(se->statistics.wait_start, 0);
|
||||
__update_stats_wait_end(rq_of(cfs_rq), p, stats);
|
||||
}
|
||||
|
||||
static inline void
|
||||
update_stats_enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
|
||||
update_stats_enqueue_sleeper_fair(struct cfs_rq *cfs_rq, struct sched_entity *se)
|
||||
{
|
||||
struct sched_statistics *stats;
|
||||
struct task_struct *tsk = NULL;
|
||||
u64 sleep_start, block_start;
|
||||
|
||||
if (!schedstat_enabled())
|
||||
return;
|
||||
|
||||
sleep_start = schedstat_val(se->statistics.sleep_start);
|
||||
block_start = schedstat_val(se->statistics.block_start);
|
||||
stats = __schedstats_from_se(se);
|
||||
|
||||
if (entity_is_task(se))
|
||||
tsk = task_of(se);
|
||||
|
||||
if (sleep_start) {
|
||||
u64 delta = rq_clock(rq_of(cfs_rq)) - sleep_start;
|
||||
|
||||
if ((s64)delta < 0)
|
||||
delta = 0;
|
||||
|
||||
if (unlikely(delta > schedstat_val(se->statistics.sleep_max)))
|
||||
__schedstat_set(se->statistics.sleep_max, delta);
|
||||
|
||||
__schedstat_set(se->statistics.sleep_start, 0);
|
||||
__schedstat_add(se->statistics.sum_sleep_runtime, delta);
|
||||
|
||||
if (tsk) {
|
||||
account_scheduler_latency(tsk, delta >> 10, 1);
|
||||
trace_sched_stat_sleep(tsk, delta);
|
||||
}
|
||||
}
|
||||
if (block_start) {
|
||||
u64 delta = rq_clock(rq_of(cfs_rq)) - block_start;
|
||||
|
||||
if ((s64)delta < 0)
|
||||
delta = 0;
|
||||
|
||||
if (unlikely(delta > schedstat_val(se->statistics.block_max)))
|
||||
__schedstat_set(se->statistics.block_max, delta);
|
||||
|
||||
__schedstat_set(se->statistics.block_start, 0);
|
||||
__schedstat_add(se->statistics.sum_sleep_runtime, delta);
|
||||
|
||||
if (tsk) {
|
||||
if (tsk->in_iowait) {
|
||||
__schedstat_add(se->statistics.iowait_sum, delta);
|
||||
__schedstat_inc(se->statistics.iowait_count);
|
||||
trace_sched_stat_iowait(tsk, delta);
|
||||
}
|
||||
|
||||
trace_sched_stat_blocked(tsk, delta);
|
||||
|
||||
/*
|
||||
* Blocking time is in units of nanosecs, so shift by
|
||||
* 20 to get a milliseconds-range estimation of the
|
||||
* amount of time that the task spent sleeping:
|
||||
*/
|
||||
if (unlikely(prof_on == SLEEP_PROFILING)) {
|
||||
profile_hits(SLEEP_PROFILING,
|
||||
(void *)get_wchan(tsk),
|
||||
delta >> 20);
|
||||
}
|
||||
account_scheduler_latency(tsk, delta >> 10, 0);
|
||||
}
|
||||
}
|
||||
__update_stats_enqueue_sleeper(rq_of(cfs_rq), tsk, stats);
|
||||
}
|
||||
|
||||
/*
|
||||
* Task is being enqueued - update stats:
|
||||
*/
|
||||
static inline void
|
||||
update_stats_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
|
||||
update_stats_enqueue_fair(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
|
||||
{
|
||||
if (!schedstat_enabled())
|
||||
return;
|
||||
@@ -1003,14 +960,14 @@ update_stats_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
|
||||
* a dequeue/enqueue event is a NOP)
|
||||
*/
|
||||
if (se != cfs_rq->curr)
|
||||
update_stats_wait_start(cfs_rq, se);
|
||||
update_stats_wait_start_fair(cfs_rq, se);
|
||||
|
||||
if (flags & ENQUEUE_WAKEUP)
|
||||
update_stats_enqueue_sleeper(cfs_rq, se);
|
||||
update_stats_enqueue_sleeper_fair(cfs_rq, se);
|
||||
}
|
||||
|
||||
static inline void
|
||||
update_stats_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
|
||||
update_stats_dequeue_fair(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
|
||||
{
|
||||
|
||||
if (!schedstat_enabled())
|
||||
@@ -1021,7 +978,7 @@ update_stats_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
|
||||
* waiting task:
|
||||
*/
|
||||
if (se != cfs_rq->curr)
|
||||
update_stats_wait_end(cfs_rq, se);
|
||||
update_stats_wait_end_fair(cfs_rq, se);
|
||||
|
||||
if ((flags & DEQUEUE_SLEEP) && entity_is_task(se)) {
|
||||
struct task_struct *tsk = task_of(se);
|
||||
@@ -1030,10 +987,10 @@ update_stats_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
|
||||
/* XXX racy against TTWU */
|
||||
state = READ_ONCE(tsk->__state);
|
||||
if (state & TASK_INTERRUPTIBLE)
|
||||
__schedstat_set(se->statistics.sleep_start,
|
||||
__schedstat_set(tsk->stats.sleep_start,
|
||||
rq_clock(rq_of(cfs_rq)));
|
||||
if (state & TASK_UNINTERRUPTIBLE)
|
||||
__schedstat_set(se->statistics.block_start,
|
||||
__schedstat_set(tsk->stats.block_start,
|
||||
rq_clock(rq_of(cfs_rq)));
|
||||
}
|
||||
}
|
||||
@@ -1081,11 +1038,12 @@ struct numa_group {
 	unsigned long total_faults;
 	unsigned long max_faults_cpu;
 	/*
+	 * faults[] array is split into two regions: faults_mem and faults_cpu.
+	 *
 	 * Faults_cpu is used to decide whether memory should move
 	 * towards the CPU. As a consequence, these stats are weighted
 	 * more by CPU use than by memory faults.
 	 */
-	unsigned long *faults_cpu;
 	unsigned long faults[];
 };

@@ -1259,8 +1217,8 @@ static inline unsigned long group_faults(struct task_struct *p, int nid)

 static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)
 {
-	return group->faults_cpu[task_faults_idx(NUMA_MEM, nid, 0)] +
-		group->faults_cpu[task_faults_idx(NUMA_MEM, nid, 1)];
+	return group->faults[task_faults_idx(NUMA_CPU, nid, 0)] +
+		group->faults[task_faults_idx(NUMA_CPU, nid, 1)];
 }

 static inline unsigned long group_faults_priv(struct numa_group *ng)
@@ -2116,7 +2074,7 @@ static void numa_migrate_preferred(struct task_struct *p)
 }

 /*
- * Find out how many nodes on the workload is actively running on. Do this by
+ * Find out how many nodes the workload is actively running on. Do this by
  * tracking the nodes from which NUMA hinting faults are triggered. This can
  * be different from the set of nodes where the workload's memory is currently
  * located.
@@ -2170,7 +2128,7 @@ static void update_task_scan_period(struct task_struct *p,

 	/*
 	 * If there were no record hinting faults then either the task is
-	 * completely idle or all activity is areas that are not of interest
+	 * completely idle or all activity is in areas that are not of interest
 	 * to automatic numa balancing. Related to that, if there were failed
 	 * migration then it implies we are migrating too quickly or the local
 	 * node is overloaded. In either case, scan slower
@@ -2427,7 +2385,7 @@ static void task_numa_placement(struct task_struct *p)
 				 * is at the beginning of the numa_faults array.
 				 */
 				ng->faults[mem_idx] += diff;
-				ng->faults_cpu[mem_idx] += f_diff;
+				ng->faults[cpu_idx] += f_diff;
 				ng->total_faults += diff;
 				group_faults += ng->faults[mem_idx];
 			}
@@ -2481,7 +2439,8 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,

 	if (unlikely(!deref_curr_numa_group(p))) {
 		unsigned int size = sizeof(struct numa_group) +
-				    4*nr_node_ids*sizeof(unsigned long);
+				    NR_NUMA_HINT_FAULT_STATS *
+				    nr_node_ids * sizeof(unsigned long);

 		grp = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
 		if (!grp)
@@ -2492,9 +2451,6 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
 		grp->max_faults_cpu = 0;
 		spin_lock_init(&grp->lock);
 		grp->gid = p->pid;
-		/* Second half of the array tracks nids where faults happen */
-		grp->faults_cpu = grp->faults + NR_NUMA_HINT_FAULT_TYPES *
-						nr_node_ids;

 		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
 			grp->faults[i] = p->numa_faults[i];
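The NUMA hunks above drop the separate faults_cpu pointer and index one faults[] array by region instead. The sketch below shows why the two forms address the same slot, assuming an index layout of the form types * (region * nr_nodes + nid) + priv. The body of task_faults_idx() is not part of this diff, so that layout, the `toy_` names, and the constants are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical mirror of the two regions used in the diff. */
enum toy_faults_stats { TOY_NUMA_MEM = 0, TOY_NUMA_CPU = 1 };

#define TOY_FAULT_TYPES 2                      /* shared and private */
#define TOY_FAULT_STATS (TOY_FAULT_TYPES * 2)  /* memory region plus CPU region */

/* Assumed layout: each region holds TOY_FAULT_TYPES slots per node. */
static size_t toy_faults_idx(enum toy_faults_stats s, int nid, int priv,
			     int nr_nodes)
{
	return (size_t)TOY_FAULT_TYPES *
	       ((size_t)s * (size_t)nr_nodes + (size_t)nid) + (size_t)priv;
}

/* Number of counters needed to hold both regions. */
static size_t toy_faults_len(int nr_nodes)
{
	return (size_t)TOY_FAULT_STATS * (size_t)nr_nodes;
}

/* Old form: second-half base pointer offset plus a memory-region index. */
static size_t toy_old_cpu_slot(int nid, int priv, int nr_nodes)
{
	size_t faults_cpu_base = (size_t)TOY_FAULT_TYPES * (size_t)nr_nodes;

	return faults_cpu_base + toy_faults_idx(TOY_NUMA_MEM, nid, priv, nr_nodes);
}
```

Under this assumed layout, `toy_old_cpu_slot(nid, priv, n)` and `toy_faults_idx(TOY_NUMA_CPU, nid, priv, n)` are the same slot for every node, which is what lets the pointer go away without changing what is counted.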
@@ -2995,6 +2951,8 @@ account_entity_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	}
 #endif
 	cfs_rq->nr_running++;
+	if (se_is_idle(se))
+		cfs_rq->idle_nr_running++;
 }

 static void
@@ -3008,6 +2966,8 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	}
 #endif
 	cfs_rq->nr_running--;
+	if (se_is_idle(se))
+		cfs_rq->idle_nr_running--;
 }

 /*
@@ -4207,7 +4167,12 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
|
||||
|
||||
/* sleeps up to a single latency don't count. */
|
||||
if (!initial) {
|
||||
unsigned long thresh = sysctl_sched_latency;
|
||||
unsigned long thresh;
|
||||
|
||||
if (se_is_idle(se))
|
||||
thresh = sysctl_sched_min_granularity;
|
||||
else
|
||||
thresh = sysctl_sched_latency;
|
||||
|
||||
/*
|
||||
* Halve their sleep time's effect, to allow
|
||||
@@ -4225,26 +4190,6 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
|
||||
|
||||
static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
|
||||
|
||||
static inline void check_schedstat_required(void)
|
||||
{
|
||||
#ifdef CONFIG_SCHEDSTATS
|
||||
if (schedstat_enabled())
|
||||
return;
|
||||
|
||||
/* Force schedstat enabled if a dependent tracepoint is active */
|
||||
if (trace_sched_stat_wait_enabled() ||
|
||||
trace_sched_stat_sleep_enabled() ||
|
||||
trace_sched_stat_iowait_enabled() ||
|
||||
trace_sched_stat_blocked_enabled() ||
|
||||
trace_sched_stat_runtime_enabled()) {
|
||||
printk_deferred_once("Scheduler tracepoints stat_sleep, stat_iowait, "
|
||||
"stat_blocked and stat_runtime require the "
|
||||
"kernel parameter schedstats=enable or "
|
||||
"kernel.sched_schedstats=1\n");
|
||||
}
|
||||
#endif
|
||||
}
|
||||
|
||||
static inline bool cfs_bandwidth_used(void);
|
||||
|
||||
/*
|
||||
@@ -4318,7 +4263,7 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
|
||||
place_entity(cfs_rq, se, 0);
|
||||
|
||||
check_schedstat_required();
|
||||
update_stats_enqueue(cfs_rq, se, flags);
|
||||
update_stats_enqueue_fair(cfs_rq, se, flags);
|
||||
check_spread(cfs_rq, se);
|
||||
if (!curr)
|
||||
__enqueue_entity(cfs_rq, se);
|
||||
@@ -4402,7 +4347,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
|
||||
update_load_avg(cfs_rq, se, UPDATE_TG);
|
||||
se_update_runnable(se);
|
||||
|
||||
update_stats_dequeue(cfs_rq, se, flags);
|
||||
update_stats_dequeue_fair(cfs_rq, se, flags);
|
||||
|
||||
clear_buddies(cfs_rq, se);
|
||||
|
||||
@@ -4487,7 +4432,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
|
||||
* a CPU. So account for the time it spent waiting on the
|
||||
* runqueue.
|
||||
*/
|
||||
update_stats_wait_end(cfs_rq, se);
|
||||
update_stats_wait_end_fair(cfs_rq, se);
|
||||
__dequeue_entity(cfs_rq, se);
|
||||
update_load_avg(cfs_rq, se, UPDATE_TG);
|
||||
}
|
||||
@@ -4502,9 +4447,12 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
|
||||
*/
|
||||
if (schedstat_enabled() &&
|
||||
rq_of(cfs_rq)->cfs.load.weight >= 2*se->load.weight) {
|
||||
schedstat_set(se->statistics.slice_max,
|
||||
max((u64)schedstat_val(se->statistics.slice_max),
|
||||
se->sum_exec_runtime - se->prev_sum_exec_runtime));
|
||||
struct sched_statistics *stats;
|
||||
|
||||
stats = __schedstats_from_se(se);
|
||||
__schedstat_set(stats->slice_max,
|
||||
max((u64)stats->slice_max,
|
||||
se->sum_exec_runtime - se->prev_sum_exec_runtime));
|
||||
}
|
||||
|
||||
se->prev_sum_exec_runtime = se->sum_exec_runtime;
|
||||
@@ -4586,7 +4534,7 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
|
||||
check_spread(cfs_rq, prev);
|
||||
|
||||
if (prev->on_rq) {
|
||||
update_stats_wait_start(cfs_rq, prev);
|
||||
update_stats_wait_start_fair(cfs_rq, prev);
|
||||
/* Put 'current' back into the tree. */
|
||||
__enqueue_entity(cfs_rq, prev);
|
||||
/* in !on_rq case, update occurred at dequeue */
|
||||
@@ -4687,11 +4635,20 @@ static inline u64 sched_cfs_bandwidth_slice(void)
  */
 void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
 {
+	s64 runtime;
+
 	if (unlikely(cfs_b->quota == RUNTIME_INF))
 		return;

 	cfs_b->runtime += cfs_b->quota;
+	runtime = cfs_b->runtime_snap - cfs_b->runtime;
+	if (runtime > 0) {
+		cfs_b->burst_time += runtime;
+		cfs_b->nr_burst++;
+	}
+
 	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
+	cfs_b->runtime_snap = cfs_b->runtime;
 }

 static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
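The refill hunk above adds burst accounting: a period counts as a burst when, even after adding one quota, the remaining runtime is still below the snapshot taken at the previous refill. A small userspace model of that arithmetic is sketched here. The `toy_` struct and function are hypothetical, signed 64-bit fields are used throughout for simplicity, and the RUNTIME_INF early return is left out.

```c
#include <stdint.h>

/* Hypothetical, reduced model of the bandwidth pool fields used in the hunk. */
struct toy_cfs_bandwidth {
	int64_t quota;        /* runtime granted per period */
	int64_t burst;        /* extra runtime that may be accumulated */
	int64_t runtime;      /* runtime currently available */
	int64_t runtime_snap; /* runtime recorded at the end of the last refill */
	int64_t burst_time;   /* statistic: total runtime consumed beyond quota */
	int     nr_burst;     /* statistic: number of periods that used burst */
};

/* Refill at a period boundary, following the order of operations in the diff. */
static void toy_refill(struct toy_cfs_bandwidth *b)
{
	int64_t over;

	b->runtime += b->quota;

	/* snap - runtime > 0 means more than one quota was consumed. */
	over = b->runtime_snap - b->runtime;
	if (over > 0) {
		b->burst_time += over;
		b->nr_burst++;
	}

	/* Cap the pool at quota + burst, then remember it for next time. */
	if (b->runtime > b->quota + b->burst)
		b->runtime = b->quota + b->burst;
	b->runtime_snap = b->runtime;
}
```

For example, with quota 100, burst 50 and a full pool of 150, consuming 130 in one period leaves 20; the refill brings the pool to 120, records 30 units of burst time and one burst period. A following period that consumes only 80 records nothing further.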
@@ -5577,6 +5534,17 @@ static int sched_idle_rq(struct rq *rq)
 			rq->nr_running);
 }

+/*
+ * Returns true if cfs_rq only has SCHED_IDLE entities enqueued. Note the use
+ * of idle_nr_running, which does not consider idle descendants of normal
+ * entities.
+ */
+static bool sched_idle_cfs_rq(struct cfs_rq *cfs_rq)
+{
+	return cfs_rq->nr_running &&
+		cfs_rq->nr_running == cfs_rq->idle_nr_running;
+}
+
 #ifdef CONFIG_SMP
 static int sched_idle_cpu(int cpu)
 {
@@ -5787,6 +5755,7 @@ static struct {
|
||||
cpumask_var_t idle_cpus_mask;
|
||||
atomic_t nr_cpus;
|
||||
int has_blocked; /* Idle CPUS has blocked load */
|
||||
int needs_update; /* Newly idle CPUs need their next_balance collated */
|
||||
unsigned long next_balance; /* in jiffy units */
|
||||
unsigned long next_blocked; /* Next update of blocked load in jiffies */
|
||||
} nohz ____cacheline_aligned;
|
||||
@@ -5997,12 +5966,12 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
|
||||
if (sched_feat(WA_WEIGHT) && target == nr_cpumask_bits)
|
||||
target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
|
||||
|
||||
schedstat_inc(p->se.statistics.nr_wakeups_affine_attempts);
|
||||
schedstat_inc(p->stats.nr_wakeups_affine_attempts);
|
||||
if (target == nr_cpumask_bits)
|
||||
return prev_cpu;
|
||||
|
||||
schedstat_inc(sd->ttwu_move_affine);
|
||||
schedstat_inc(p->se.statistics.nr_wakeups_affine);
|
||||
schedstat_inc(p->stats.nr_wakeups_affine);
|
||||
return target;
|
||||
}
|
||||
|
||||
@@ -6443,11 +6412,6 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
|
||||
(available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
|
||||
cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr) &&
|
||||
asym_fits_capacity(task_util, recent_used_cpu)) {
|
||||
/*
|
||||
* Replace recent_used_cpu with prev as it is a potential
|
||||
* candidate for the next wake:
|
||||
*/
|
||||
p->recent_used_cpu = prev;
|
||||
return recent_used_cpu;
|
||||
}
|
||||
|
||||
@@ -7806,7 +7770,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
|
||||
if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
|
||||
int cpu;
|
||||
|
||||
schedstat_inc(p->se.statistics.nr_failed_migrations_affine);
|
||||
schedstat_inc(p->stats.nr_failed_migrations_affine);
|
||||
|
||||
env->flags |= LBF_SOME_PINNED;
|
||||
|
||||
@@ -7840,7 +7804,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
|
||||
env->flags &= ~LBF_ALL_PINNED;
|
||||
|
||||
if (task_running(env->src_rq, p)) {
|
||||
schedstat_inc(p->se.statistics.nr_failed_migrations_running);
|
||||
schedstat_inc(p->stats.nr_failed_migrations_running);
|
||||
return 0;
|
||||
}
|
||||
|
||||
@@ -7862,12 +7826,12 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
|
||||
env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
|
||||
if (tsk_cache_hot == 1) {
|
||||
schedstat_inc(env->sd->lb_hot_gained[env->idle]);
|
||||
schedstat_inc(p->se.statistics.nr_forced_migrations);
|
||||
schedstat_inc(p->stats.nr_forced_migrations);
|
||||
}
|
||||
return 1;
|
||||
}
|
||||
|
||||
schedstat_inc(p->se.statistics.nr_failed_migrations_hot);
|
||||
schedstat_inc(p->stats.nr_failed_migrations_hot);
|
||||
return 0;
|
||||
}
|
||||
|
||||
@@ -8601,6 +8565,99 @@ group_type group_classify(unsigned int imbalance_pct,
 	return group_has_spare;
 }

+/**
+ * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
+ * @dst_cpu:	Destination CPU of the load balancing
+ * @sds:	Load-balancing data with statistics of the local group
+ * @sgs:	Load-balancing statistics of the candidate busiest group
+ * @sg:		The candidate busiest group
+ *
+ * Check the state of the SMT siblings of both @sds::local and @sg and decide
+ * if @dst_cpu can pull tasks.
+ *
+ * If @dst_cpu does not have SMT siblings, it can pull tasks if two or more of
+ * the SMT siblings of @sg are busy. If only one CPU in @sg is busy, pull tasks
+ * only if @dst_cpu has higher priority.
+ *
+ * If both @dst_cpu and @sg have SMT siblings, and @sg has exactly one more
+ * busy CPU than @sds::local, let @dst_cpu pull tasks if it has higher priority.
+ * Bigger imbalances in the number of busy CPUs will be dealt with in
+ * update_sd_pick_busiest().
+ *
+ * If @sg does not have SMT siblings, only pull tasks if all of the SMT siblings
+ * of @dst_cpu are idle and @sg has lower priority.
+ */
+static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
+				    struct sg_lb_stats *sgs,
+				    struct sched_group *sg)
+{
+#ifdef CONFIG_SCHED_SMT
+	bool local_is_smt, sg_is_smt;
+	int sg_busy_cpus;
+
+	local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
+	sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
+
+	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
+
+	if (!local_is_smt) {
+		/*
+		 * If we are here, @dst_cpu is idle and does not have SMT
+		 * siblings. Pull tasks if candidate group has two or more
+		 * busy CPUs.
+		 */
+		if (sg_busy_cpus >= 2) /* implies sg_is_smt */
+			return true;
+
+		/*
+		 * @dst_cpu does not have SMT siblings. @sg may have SMT
+		 * siblings and only one is busy. In such case, @dst_cpu
+		 * can help if it has higher priority and is idle (i.e.,
+		 * it has no running tasks).
+		 */
+		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+	}
+
+	/* @dst_cpu has SMT siblings. */
+
+	if (sg_is_smt) {
+		int local_busy_cpus = sds->local->group_weight -
+				      sds->local_stat.idle_cpus;
+		int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
+
+		if (busy_cpus_delta == 1)
+			return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+
+		return false;
+	}
+
+	/*
+	 * @sg does not have SMT siblings. Ensure that @sds::local does not end
+	 * up with more than one busy SMT sibling and only pull tasks if there
+	 * are not busy CPUs (i.e., no CPU has running tasks).
+	 */
+	if (!sds->local_stat.sum_nr_running)
+		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+
+	return false;
+#else
+	/* Always return false so that callers deal with non-SMT cases. */
+	return false;
+#endif
+}
+
+static inline bool
+sched_asym(struct lb_env *env, struct sd_lb_stats *sds, struct sg_lb_stats *sgs,
+	   struct sched_group *group)
+{
+	/* Only do SMT checks if either local or candidate have SMT siblings */
+	if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
+	    (group->flags & SD_SHARE_CPUCAPACITY))
+		return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
+
+	return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
+}
+
 /**
  * update_sg_lb_stats - Update sched_group's statistics for load balancing.
  * @env: The load balancing environment.
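The three cases described in the asym_smt_can_pull_tasks() comment above can be exercised in isolation. The sketch below restates the decision with plain integers. The `toy_` types are hypothetical, and `toy_prefer()` stands in for sched_asym_prefer() under the assumption that a larger priority value means a more preferred CPU.

```c
#include <stdbool.h>

/* Hypothetical, flattened view of one scheduling group. */
struct toy_group {
	bool is_smt;     /* group is a set of SMT siblings */
	int  weight;     /* number of CPUs in the group */
	int  idle_cpus;  /* how many of them are idle */
	int  nr_running; /* tasks running in the group */
	int  prio;       /* priority of the group's preferred CPU (assumed: higher wins) */
};

/* Stand-in for sched_asym_prefer(): does the destination outrank the group? */
static bool toy_prefer(int dst_prio, const struct toy_group *sg)
{
	return dst_prio > sg->prio;
}

static bool toy_asym_smt_can_pull(int dst_prio, const struct toy_group *local,
				  const struct toy_group *sg)
{
	int sg_busy = sg->weight - sg->idle_cpus;

	if (!local->is_smt) {
		/* Two or more busy siblings in the candidate: always help. */
		if (sg_busy >= 2)
			return true;
		/* A single busy CPU: help only from a higher-priority CPU. */
		return toy_prefer(dst_prio, sg);
	}

	if (sg->is_smt) {
		int local_busy = local->weight - local->idle_cpus;

		/* Only an imbalance of exactly one busy CPU is handled here. */
		if (sg_busy - local_busy == 1)
			return toy_prefer(dst_prio, sg);
		return false;
	}

	/* Candidate is non-SMT: pull only into a fully idle SMT core. */
	if (!local->nr_running)
		return toy_prefer(dst_prio, sg);
	return false;
}
```

One consequence visible in the sketch: a non-SMT destination pulls from a core with two busy siblings regardless of priority, whereas every other path is gated on the priority comparison.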
@@ -8609,6 +8666,7 @@ group_type group_classify(unsigned int imbalance_pct,
|
||||
* @sg_status: Holds flag indicating the status of the sched_group
|
||||
*/
|
||||
static inline void update_sg_lb_stats(struct lb_env *env,
|
||||
struct sd_lb_stats *sds,
|
||||
struct sched_group *group,
|
||||
struct sg_lb_stats *sgs,
|
||||
int *sg_status)
|
||||
@@ -8617,7 +8675,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
|
||||
|
||||
memset(sgs, 0, sizeof(*sgs));
|
||||
|
||||
local_group = cpumask_test_cpu(env->dst_cpu, sched_group_span(group));
|
||||
local_group = group == sds->local;
|
||||
|
||||
for_each_cpu_and(i, sched_group_span(group), env->cpus) {
|
||||
struct rq *rq = cpu_rq(i);
|
||||
@@ -8660,18 +8718,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
|
||||
}
|
||||
}
|
||||
|
||||
/* Check if dst CPU is idle and preferred to this group */
|
||||
if (env->sd->flags & SD_ASYM_PACKING &&
|
||||
env->idle != CPU_NOT_IDLE &&
|
||||
sgs->sum_h_nr_running &&
|
||||
sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
|
||||
sgs->group_asym_packing = 1;
|
||||
}
|
||||
|
||||
sgs->group_capacity = group->sgc->capacity;
|
||||
|
||||
sgs->group_weight = group->group_weight;
|
||||
|
||||
/* Check if dst CPU is idle and preferred to this group */
|
||||
if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
|
||||
env->idle != CPU_NOT_IDLE && sgs->sum_h_nr_running &&
|
||||
sched_asym(env, sds, sgs, group)) {
|
||||
sgs->group_asym_packing = 1;
|
||||
}
|
||||
|
||||
sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs);
|
||||
|
||||
/* Computing avg_load makes sense only when group is overloaded */
|
||||
@@ -9180,7 +9237,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
|
||||
update_group_capacity(env->sd, env->dst_cpu);
|
||||
}
|
||||
|
||||
update_sg_lb_stats(env, sg, sgs, &sg_status);
|
||||
update_sg_lb_stats(env, sds, sg, sgs, &sg_status);
|
||||
|
||||
if (local_group)
|
||||
goto next_group;
|
||||
@@ -9603,6 +9660,12 @@ static struct rq *find_busiest_queue(struct lb_env *env,
|
||||
nr_running == 1)
|
||||
continue;
|
||||
|
||||
/* Make sure we only pull tasks from a CPU of lower priority */
|
||||
if ((env->sd->flags & SD_ASYM_PACKING) &&
|
||||
sched_asym_prefer(i, env->dst_cpu) &&
|
||||
nr_running == 1)
|
||||
continue;
|
||||
|
||||
switch (env->migration_type) {
|
||||
case migrate_load:
|
||||
/*
|
||||
@@ -10176,6 +10239,30 @@ void update_max_interval(void)
 	max_load_balance_interval = HZ*num_online_cpus()/10;
 }

+static inline bool update_newidle_cost(struct sched_domain *sd, u64 cost)
+{
+	if (cost > sd->max_newidle_lb_cost) {
+		/*
+		 * Track max cost of a domain to make sure to not delay the
+		 * next wakeup on the CPU.
+		 */
+		sd->max_newidle_lb_cost = cost;
+		sd->last_decay_max_lb_cost = jiffies;
+	} else if (time_after(jiffies, sd->last_decay_max_lb_cost + HZ)) {
+		/*
+		 * Decay the newidle max times by ~1% per second to ensure that
+		 * it is not outdated and the current max cost is actually
+		 * shorter.
+		 */
+		sd->max_newidle_lb_cost = (sd->max_newidle_lb_cost * 253) / 256;
+		sd->last_decay_max_lb_cost = jiffies;
+
+		return true;
+	}
+
+	return false;
+}
+
 /*
  * It checks each scheduling domain to see if it is due to be balanced,
  * and initiates a balancing operation if so.
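update_newidle_cost() above combines two behaviours: it raises the tracked maximum immediately when a larger cost is observed, and otherwise multiplies it by 253/256 (a reduction of about 1.2%) at most once per second. The following userspace sketch models that with an explicit tick argument. The `toy_` names and the tick rate are assumptions, and a plain comparison replaces time_after(), so counter wraparound is not handled.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_HZ 250 /* assumed ticks per second, for illustration */

/* Hypothetical subset of the per-domain state used by the helper. */
struct toy_domain {
	uint64_t max_cost;   /* largest newidle balance cost seen recently */
	uint64_t last_decay; /* tick of the last raise or decay */
};

/*
 * Returns true only when a decay step was applied, mirroring how the
 * diff uses the return value as need_decay in rebalance_domains().
 */
static bool toy_update_newidle_cost(struct toy_domain *sd, uint64_t cost,
				    uint64_t now)
{
	if (cost > sd->max_cost) {
		/* New maximum: record it and restart the decay timer. */
		sd->max_cost = cost;
		sd->last_decay = now;
	} else if (now > sd->last_decay + TOY_HZ) {
		/* More than a second since the last update: shrink by 3/256. */
		sd->max_cost = sd->max_cost * 253 / 256;
		sd->last_decay = now;
		return true;
	}

	return false;
}
```

Calling the helper with a cost of 0, as the rebalance_domains() hunk does, can never raise the maximum, so that call site only ever performs the periodic decay. Starting from a maximum of 25600, one decay step yields 25300.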
@@ -10199,14 +10286,9 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 	for_each_domain(cpu, sd) {
 		/*
 		 * Decay the newidle max times here because this is a regular
-		 * visit to all the domains. Decay ~1% per second.
+		 * visit to all the domains.
 		 */
-		if (time_after(jiffies, sd->next_decay_max_lb_cost)) {
-			sd->max_newidle_lb_cost =
-				(sd->max_newidle_lb_cost * 253) / 256;
-			sd->next_decay_max_lb_cost = jiffies + HZ;
-			need_decay = 1;
-		}
+		need_decay = update_newidle_cost(sd, 0);
 		max_cost += sd->max_newidle_lb_cost;

 		/*
@@ -10375,7 +10457,7 @@ static void nohz_balancer_kick(struct rq *rq)
 		goto out;

 	if (rq->nr_running >= 2) {
-		flags = NOHZ_KICK_MASK;
+		flags = NOHZ_STATS_KICK | NOHZ_BALANCE_KICK;
 		goto out;
 	}

@@ -10389,7 +10471,7 @@ static void nohz_balancer_kick(struct rq *rq)
 		 * on.
 		 */
 		if (rq->cfs.h_nr_running >= 1 && check_cpu_capacity(rq, sd)) {
-			flags = NOHZ_KICK_MASK;
+			flags = NOHZ_STATS_KICK | NOHZ_BALANCE_KICK;
 			goto unlock;
 		}
 	}
@@ -10403,7 +10485,7 @@ static void nohz_balancer_kick(struct rq *rq)
 		 */
 		for_each_cpu_and(i, sched_domain_span(sd), nohz.idle_cpus_mask) {
 			if (sched_asym_prefer(i, cpu)) {
-				flags = NOHZ_KICK_MASK;
+				flags = NOHZ_STATS_KICK | NOHZ_BALANCE_KICK;
 				goto unlock;
 			}
 		}
@@ -10416,7 +10498,7 @@ static void nohz_balancer_kick(struct rq *rq)
 		 * to run the misfit task on.
 		 */
 		if (check_misfit_status(rq, sd)) {
-			flags = NOHZ_KICK_MASK;
+			flags = NOHZ_STATS_KICK | NOHZ_BALANCE_KICK;
 			goto unlock;
 		}

@@ -10443,13 +10525,16 @@ static void nohz_balancer_kick(struct rq *rq)
 		 */
 		nr_busy = atomic_read(&sds->nr_busy_cpus);
 		if (nr_busy > 1) {
-			flags = NOHZ_KICK_MASK;
+			flags = NOHZ_STATS_KICK | NOHZ_BALANCE_KICK;
 			goto unlock;
 		}
 	}
 unlock:
 	rcu_read_unlock();
 out:
+	if (READ_ONCE(nohz.needs_update))
+		flags |= NOHZ_NEXT_KICK;
+
 	if (flags)
 		kick_ilb(flags);
 }
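The nohz_balancer_kick() hunks above stop using one combined mask and instead build the kick flags from independent bits, so that a pending needs_update can request a kick by itself. A minimal sketch of that composition, and of how a consumer would clear only the state matching the bits it received (as the _nohz_idle_balance() changes in this diff do), is given below. The `toy_` names and bit positions are hypothetical and are not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical bit assignments, chosen only so the flags are distinct. */
#define TOY_BALANCE_KICK (1u << 0) /* run a load balance on idle CPUs */
#define TOY_STATS_KICK   (1u << 1) /* refresh blocked load statistics */
#define TOY_NEXT_KICK    (1u << 2) /* recompute the next balance time */

/* Reduced model of the shared nohz state touched by the diff. */
struct toy_nohz {
	int has_blocked;
	int needs_update;
};

/* Compose kick flags the way the reworked exit path of the kick does. */
static unsigned int toy_kick_flags(bool want_balance, const struct toy_nohz *nohz)
{
	unsigned int flags = 0;

	if (want_balance)
		flags = TOY_STATS_KICK | TOY_BALANCE_KICK;

	/* A pending next_balance update requests a kick on its own. */
	if (nohz->needs_update)
		flags |= TOY_NEXT_KICK;

	return flags;
}

/* Consumer side: each flag clears only the state it is responsible for. */
static void toy_idle_balance_begin(struct toy_nohz *nohz, unsigned int flags)
{
	if (flags & TOY_STATS_KICK)
		nohz->has_blocked = 0;
	if (flags & TOY_NEXT_KICK)
		nohz->needs_update = 0;
}
```

In this model a kick carrying only the next-kick bit leaves has_blocked untouched, which is the separation the per-flag checks in the later hunks appear to be after.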
@@ -10546,12 +10631,13 @@ void nohz_balance_enter_idle(int cpu)
|
||||
/*
|
||||
* Ensures that if nohz_idle_balance() fails to observe our
|
||||
* @idle_cpus_mask store, it must observe the @has_blocked
|
||||
* store.
|
||||
* and @needs_update stores.
|
||||
*/
|
||||
smp_mb__after_atomic();
|
||||
|
||||
set_cpu_sd_state_idle(cpu);
|
||||
|
||||
WRITE_ONCE(nohz.needs_update, 1);
|
||||
out:
|
||||
/*
|
||||
* Each time a cpu enter idle, we assume that it has blocked load and
|
||||
@@ -10600,12 +10686,17 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
|
||||
/*
|
||||
* We assume there will be no idle load after this update and clear
|
||||
* the has_blocked flag. If a cpu enters idle in the mean time, it will
|
||||
* set the has_blocked flag and trig another update of idle load.
|
||||
* set the has_blocked flag and trigger another update of idle load.
|
||||
* Because a cpu that becomes idle, is added to idle_cpus_mask before
|
||||
* setting the flag, we are sure to not clear the state and not
|
||||
* check the load of an idle cpu.
|
||||
*
|
||||
* Same applies to idle_cpus_mask vs needs_update.
|
||||
*/
|
||||
WRITE_ONCE(nohz.has_blocked, 0);
|
||||
if (flags & NOHZ_STATS_KICK)
|
||||
WRITE_ONCE(nohz.has_blocked, 0);
|
||||
if (flags & NOHZ_NEXT_KICK)
|
||||
WRITE_ONCE(nohz.needs_update, 0);
|
||||
|
||||
/*
|
||||
* Ensures that if we miss the CPU, we must see the has_blocked
|
||||
@@ -10627,13 +10718,17 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
|
||||
* balancing owner will pick it up.
|
||||
*/
|
||||
if (need_resched()) {
|
||||
has_blocked_load = true;
|
||||
if (flags & NOHZ_STATS_KICK)
|
||||
has_blocked_load = true;
|
||||
if (flags & NOHZ_NEXT_KICK)
|
||||
WRITE_ONCE(nohz.needs_update, 1);
|
||||
goto abort;
|
||||
}
|
||||
|
||||
rq = cpu_rq(balance_cpu);
|
||||
|
||||
has_blocked_load |= update_nohz_stats(rq);
|
||||
if (flags & NOHZ_STATS_KICK)
|
||||
has_blocked_load |= update_nohz_stats(rq);
|
||||
|
||||
/*
|
||||
* If time for next balance is due,
|
||||
@@ -10664,8 +10759,9 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
|
||||
if (likely(update_next_balance))
|
||||
nohz.next_balance = next_balance;
|
||||
|
||||
WRITE_ONCE(nohz.next_blocked,
|
||||
now + msecs_to_jiffies(LOAD_AVG_PERIOD));
|
||||
if (flags & NOHZ_STATS_KICK)
|
||||
WRITE_ONCE(nohz.next_blocked,
|
||||
now + msecs_to_jiffies(LOAD_AVG_PERIOD));
|
||||
|
||||
abort:
|
||||
/* There is still blocked load, enable periodic update */
|
||||
@@ -10763,9 +10859,9 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
|
||||
{
|
||||
unsigned long next_balance = jiffies + HZ;
|
||||
int this_cpu = this_rq->cpu;
|
||||
u64 t0, t1, curr_cost = 0;
|
||||
struct sched_domain *sd;
|
||||
int pulled_task = 0;
|
||||
u64 curr_cost = 0;
|
||||
|
||||
update_misfit_status(NULL, this_rq);
|
||||
|
||||
@@ -10796,47 +10892,49 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
|
||||
*/
|
||||
rq_unpin_lock(this_rq, rf);
|
||||
|
||||
if (this_rq->avg_idle < sysctl_sched_migration_cost ||
|
||||
!READ_ONCE(this_rq->rd->overload)) {
|
||||
rcu_read_lock();
|
||||
sd = rcu_dereference_check_sched_domain(this_rq->sd);
|
||||
|
||||
if (!READ_ONCE(this_rq->rd->overload) ||
|
||||
(sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {
|
||||
|
||||
rcu_read_lock();
|
||||
sd = rcu_dereference_check_sched_domain(this_rq->sd);
|
||||
if (sd)
|
||||
update_next_balance(sd, &next_balance);
|
||||
rcu_read_unlock();
|
||||
|
||||
goto out;
|
||||
}
|
||||
rcu_read_unlock();
|
||||
|
||||
raw_spin_rq_unlock(this_rq);
|
||||
|
||||
t0 = sched_clock_cpu(this_cpu);
|
||||
update_blocked_averages(this_cpu);
|
||||
|
||||
rcu_read_lock();
|
||||
for_each_domain(this_cpu, sd) {
|
||||
int continue_balancing = 1;
|
||||
u64 t0, domain_cost;
|
||||
u64 domain_cost;
|
||||
|
||||
if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
|
||||
update_next_balance(sd, &next_balance);
|
||||
update_next_balance(sd, &next_balance);
|
||||
|
||||
if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost)
|
||||
break;
|
||||
}
|
||||
|
||||
if (sd->flags & SD_BALANCE_NEWIDLE) {
|
||||
t0 = sched_clock_cpu(this_cpu);
|
||||
|
||||
pulled_task = load_balance(this_cpu, this_rq,
|
||||
sd, CPU_NEWLY_IDLE,
|
||||
&continue_balancing);
|
||||
|
||||
domain_cost = sched_clock_cpu(this_cpu) - t0;
|
||||
if (domain_cost > sd->max_newidle_lb_cost)
|
||||
sd->max_newidle_lb_cost = domain_cost;
|
||||
t1 = sched_clock_cpu(this_cpu);
|
||||
domain_cost = t1 - t0;
|
||||
update_newidle_cost(sd, domain_cost);
|
||||
|
||||
curr_cost += domain_cost;
|
||||
t0 = t1;
|
||||
}
|
||||
|
||||
update_next_balance(sd, &next_balance);
|
||||
|
||||
/*
|
||||
* Stop searching for tasks to pull if there are
|
||||
* now runnable tasks on this rq.
|
||||
@@ -11394,7 +11492,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 		if (!cfs_rq)
 			goto err;

-		se = kzalloc_node(sizeof(struct sched_entity),
+		se = kzalloc_node(sizeof(struct sched_entity_stats),
 				  GFP_KERNEL, cpu_to_node(i));
 		if (!se)
 			goto err_free_rq;
@@ -11560,7 +11658,7 @@ int sched_group_set_idle(struct task_group *tg, long idle)
 	for_each_possible_cpu(i) {
 		struct rq *rq = cpu_rq(i);
 		struct sched_entity *se = tg->se[i];
-		struct cfs_rq *grp_cfs_rq = tg->cfs_rq[i];
+		struct cfs_rq *parent_cfs_rq, *grp_cfs_rq = tg->cfs_rq[i];
 		bool was_idle = cfs_rq_is_idle(grp_cfs_rq);
 		long idle_task_delta;
 		struct rq_flags rf;
@@ -11571,6 +11669,14 @@ int sched_group_set_idle(struct task_group *tg, long idle)
 		if (WARN_ON_ONCE(was_idle == cfs_rq_is_idle(grp_cfs_rq)))
 			goto next_cpu;

+		if (se->on_rq) {
+			parent_cfs_rq = cfs_rq_of(se);
+			if (cfs_rq_is_idle(grp_cfs_rq))
+				parent_cfs_rq->idle_nr_running++;
+			else
+				parent_cfs_rq->idle_nr_running--;
+		}
+
 		idle_task_delta = grp_cfs_rq->h_nr_running -
 				  grp_cfs_rq->idle_h_nr_running;
 		if (!cfs_rq_is_idle(grp_cfs_rq))
@@ -46,11 +46,16 @@ SCHED_FEAT(DOUBLE_TICK, false)
  */
 SCHED_FEAT(NONTASK_CAPACITY, true)

+#ifdef CONFIG_PREEMPT_RT
+SCHED_FEAT(TTWU_QUEUE, false)
+#else
+
 /*
  * Queue remote wakeups on the target CPU and process them
  * using the scheduler IPI. Reduces rq->lock contention/bounces.
  */
 SCHED_FEAT(TTWU_QUEUE, true)
+#endif

 /*
  * When doing wakeups, attempt to limit superfluous scans of the LLC domain.
@@ -1009,8 +1009,10 @@ static void update_curr_rt(struct rq *rq)
 	if (unlikely((s64)delta_exec <= 0))
 		return;

-	schedstat_set(curr->se.statistics.exec_max,
-		      max(curr->se.statistics.exec_max, delta_exec));
+	schedstat_set(curr->stats.exec_max,
+		      max(curr->stats.exec_max, delta_exec));
+
+	trace_sched_stat_runtime(curr, delta_exec, 0);

 	curr->se.sum_exec_runtime += delta_exec;
 	account_group_exec_runtime(curr, delta_exec);
@@ -1271,6 +1273,112 @@ static void __delist_rt_entity(struct sched_rt_entity *rt_se, struct rt_prio_arr
 	rt_se->on_list = 0;
 }

+static inline struct sched_statistics *
+__schedstats_from_rt_se(struct sched_rt_entity *rt_se)
+{
+#ifdef CONFIG_RT_GROUP_SCHED
+	/* schedstats is not supported for rt group. */
+	if (!rt_entity_is_task(rt_se))
+		return NULL;
+#endif
+
+	return &rt_task_of(rt_se)->stats;
+}
+
+static inline void
+update_stats_wait_start_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se)
+{
+	struct sched_statistics *stats;
+	struct task_struct *p = NULL;
+
+	if (!schedstat_enabled())
+		return;
+
+	if (rt_entity_is_task(rt_se))
+		p = rt_task_of(rt_se);
+
+	stats = __schedstats_from_rt_se(rt_se);
+	if (!stats)
+		return;
+
+	__update_stats_wait_start(rq_of_rt_rq(rt_rq), p, stats);
+}
+
+static inline void
+update_stats_enqueue_sleeper_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se)
+{
+	struct sched_statistics *stats;
+	struct task_struct *p = NULL;
+
+	if (!schedstat_enabled())
+		return;
+
+	if (rt_entity_is_task(rt_se))
+		p = rt_task_of(rt_se);
+
+	stats = __schedstats_from_rt_se(rt_se);
+	if (!stats)
+		return;
+
+	__update_stats_enqueue_sleeper(rq_of_rt_rq(rt_rq), p, stats);
+}
+
+static inline void
+update_stats_enqueue_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se,
+			int flags)
+{
+	if (!schedstat_enabled())
+		return;
+
+	if (flags & ENQUEUE_WAKEUP)
+		update_stats_enqueue_sleeper_rt(rt_rq, rt_se);
+}
+
+static inline void
+update_stats_wait_end_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se)
+{
+	struct sched_statistics *stats;
+	struct task_struct *p = NULL;
+
+	if (!schedstat_enabled())
+		return;
+
+	if (rt_entity_is_task(rt_se))
+		p = rt_task_of(rt_se);
+
+	stats = __schedstats_from_rt_se(rt_se);
+	if (!stats)
+		return;
+
+	__update_stats_wait_end(rq_of_rt_rq(rt_rq), p, stats);
+}
+
+static inline void
+update_stats_dequeue_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se,
+			int flags)
+{
+	struct task_struct *p = NULL;
+
+	if (!schedstat_enabled())
+		return;
+
+	if (rt_entity_is_task(rt_se))
+		p = rt_task_of(rt_se);
+
+	if ((flags & DEQUEUE_SLEEP) && p) {
+		unsigned int state;
+
+		state = READ_ONCE(p->__state);
+		if (state & TASK_INTERRUPTIBLE)
+			__schedstat_set(p->stats.sleep_start,
+					rq_clock(rq_of_rt_rq(rt_rq)));
+
+		if (state & TASK_UNINTERRUPTIBLE)
+			__schedstat_set(p->stats.block_start,
+					rq_clock(rq_of_rt_rq(rt_rq)));
+	}
+}
+
 static void __enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 {
 	struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
@@ -1344,6 +1452,8 @@ static void enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 {
 	struct rq *rq = rq_of_rt_se(rt_se);

+	update_stats_enqueue_rt(rt_rq_of_se(rt_se), rt_se, flags);
+
 	dequeue_rt_stack(rt_se, flags);
 	for_each_sched_rt_entity(rt_se)
 		__enqueue_rt_entity(rt_se, flags);
@@ -1354,6 +1464,8 @@ static void dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 {
 	struct rq *rq = rq_of_rt_se(rt_se);

+	update_stats_dequeue_rt(rt_rq_of_se(rt_se), rt_se, flags);
+
 	dequeue_rt_stack(rt_se, flags);

 	for_each_sched_rt_entity(rt_se) {
@@ -1376,6 +1488,9 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
 	if (flags & ENQUEUE_WAKEUP)
 		rt_se->timeout = 0;

+	check_schedstat_required();
+	update_stats_wait_start_rt(rt_rq_of_se(rt_se), rt_se);
+
 	enqueue_rt_entity(rt_se, flags);

 	if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
@@ -1576,7 +1691,12 @@ static void check_preempt_curr_rt(struct rq *rq, struct task_struct *p, int flag

 static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool first)
 {
+	struct sched_rt_entity *rt_se = &p->rt;
+	struct rt_rq *rt_rq = &rq->rt;
+
 	p->se.exec_start = rq_clock_task(rq);
+	if (on_rt_rq(&p->rt))
+		update_stats_wait_end_rt(rt_rq, rt_se);

 	/* The running task is never eligible for pushing */
 	dequeue_pushable_task(rq, p);
@@ -1650,6 +1770,12 @@ static struct task_struct *pick_next_task_rt(struct rq *rq)

 static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
 {
+	struct sched_rt_entity *rt_se = &p->rt;
+	struct rt_rq *rt_rq = &rq->rt;
+
+	if (on_rt_rq(&p->rt))
+		update_stats_wait_start_rt(rt_rq, rt_se);
+
 	update_curr_rt(rq);

 	update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 1);
@@ -368,6 +368,7 @@ struct cfs_bandwidth {
 	u64			quota;
 	u64			runtime;
 	u64			burst;
+	u64			runtime_snap;
 	s64			hierarchical_quota;

 	u8			idle;
@@ -380,7 +381,9 @@ struct cfs_bandwidth {
 	/* Statistics: */
 	int			nr_periods;
 	int			nr_throttled;
+	int			nr_burst;
 	u64			throttled_time;
+	u64			burst_time;
 #endif
 };

@@ -529,6 +532,7 @@ struct cfs_rq {
 	struct load_weight	load;
 	unsigned int		nr_running;
 	unsigned int		h_nr_running;      /* SCHED_{NORMAL,BATCH,IDLE} */
+	unsigned int		idle_nr_running;   /* SCHED_IDLE */
 	unsigned int		idle_h_nr_running; /* SCHED_IDLE */

 	u64			exec_clock;
@@ -1253,11 +1257,6 @@ extern void sched_core_dequeue(struct rq *rq, struct task_struct *p);
 extern void sched_core_get(void);
 extern void sched_core_put(void);

-extern unsigned long sched_core_alloc_cookie(void);
-extern void sched_core_put_cookie(unsigned long cookie);
-extern unsigned long sched_core_get_cookie(unsigned long cookie);
-extern unsigned long sched_core_update_cookie(struct task_struct *p, unsigned long cookie);
-
 #else /* !CONFIG_SCHED_CORE */

 static inline bool sched_core_enabled(struct rq *rq)
@@ -1421,11 +1420,6 @@ static inline struct cfs_rq *group_cfs_rq(struct sched_entity *grp)

 extern void update_rq_clock(struct rq *rq);

-static inline u64 __rq_clock_broken(struct rq *rq)
-{
-	return READ_ONCE(rq->clock);
-}
-
 /*
  * rq::clock_update_flags bits
  *
@@ -1620,14 +1614,6 @@ rq_lock(struct rq *rq, struct rq_flags *rf)
 	rq_pin_lock(rq, rf);
 }

-static inline void
-rq_relock(struct rq *rq, struct rq_flags *rf)
-	__acquires(rq->lock)
-{
-	raw_spin_rq_lock(rq);
-	rq_repin_lock(rq, rf);
-}
-
 static inline void
 rq_unlock_irqrestore(struct rq *rq, struct rq_flags *rf)
 	__releases(rq->lock)
@@ -1808,6 +1794,7 @@ struct sched_group {
 	unsigned int		group_weight;
 	struct sched_group_capacity *sgc;
 	int			asym_prefer_cpu;	/* CPU of highest priority in group */
+	int			flags;

 	/*
 	 * The CPUs this group covers.
@@ -2401,6 +2388,7 @@ extern const_debug unsigned int sysctl_sched_migration_cost;
 #ifdef CONFIG_SCHED_DEBUG
 extern unsigned int sysctl_sched_latency;
 extern unsigned int sysctl_sched_min_granularity;
+extern unsigned int sysctl_sched_idle_min_granularity;
 extern unsigned int sysctl_sched_wakeup_granularity;
 extern int sysctl_resched_latency_warn_ms;
 extern int sysctl_resched_latency_warn_once;
@@ -2708,12 +2696,18 @@ extern void cfs_bandwidth_usage_dec(void);
 #define NOHZ_BALANCE_KICK_BIT	0
 #define NOHZ_STATS_KICK_BIT	1
 #define NOHZ_NEWILB_KICK_BIT	2
+#define NOHZ_NEXT_KICK_BIT	3

+/* Run rebalance_domains() */
 #define NOHZ_BALANCE_KICK	BIT(NOHZ_BALANCE_KICK_BIT)
+/* Update blocked load */
 #define NOHZ_STATS_KICK		BIT(NOHZ_STATS_KICK_BIT)
+/* Update blocked load when entering idle */
 #define NOHZ_NEWILB_KICK	BIT(NOHZ_NEWILB_KICK_BIT)
+/* Update nohz.next_balance */
+#define NOHZ_NEXT_KICK		BIT(NOHZ_NEXT_KICK_BIT)

-#define NOHZ_KICK_MASK	(NOHZ_BALANCE_KICK | NOHZ_STATS_KICK)
+#define NOHZ_KICK_MASK	(NOHZ_BALANCE_KICK | NOHZ_STATS_KICK | NOHZ_NEXT_KICK)

 #define nohz_flags(cpu)	(&cpu_rq(cpu)->nohz_flags)

@@ -4,6 +4,110 @@
  */
 #include "sched.h"

+void __update_stats_wait_start(struct rq *rq, struct task_struct *p,
+			       struct sched_statistics *stats)
+{
+	u64 wait_start, prev_wait_start;
+
+	wait_start = rq_clock(rq);
+	prev_wait_start = schedstat_val(stats->wait_start);
+
+	if (p && likely(wait_start > prev_wait_start))
+		wait_start -= prev_wait_start;
+
+	__schedstat_set(stats->wait_start, wait_start);
+}
+
+void __update_stats_wait_end(struct rq *rq, struct task_struct *p,
+			     struct sched_statistics *stats)
+{
+	u64 delta = rq_clock(rq) - schedstat_val(stats->wait_start);
+
+	if (p) {
+		if (task_on_rq_migrating(p)) {
+			/*
+			 * Preserve migrating task's wait time so wait_start
+			 * time stamp can be adjusted to accumulate wait time
+			 * prior to migration.
+			 */
+			__schedstat_set(stats->wait_start, delta);
+
+			return;
+		}
+
+		trace_sched_stat_wait(p, delta);
+	}
+
+	__schedstat_set(stats->wait_max,
+			max(schedstat_val(stats->wait_max), delta));
+	__schedstat_inc(stats->wait_count);
+	__schedstat_add(stats->wait_sum, delta);
+	__schedstat_set(stats->wait_start, 0);
+}
+
+void __update_stats_enqueue_sleeper(struct rq *rq, struct task_struct *p,
+				    struct sched_statistics *stats)
+{
+	u64 sleep_start, block_start;
+
+	sleep_start = schedstat_val(stats->sleep_start);
+	block_start = schedstat_val(stats->block_start);
+
+	if (sleep_start) {
+		u64 delta = rq_clock(rq) - sleep_start;
+
+		if ((s64)delta < 0)
+			delta = 0;
+
+		if (unlikely(delta > schedstat_val(stats->sleep_max)))
+			__schedstat_set(stats->sleep_max, delta);
+
+		__schedstat_set(stats->sleep_start, 0);
+		__schedstat_add(stats->sum_sleep_runtime, delta);
+
+		if (p) {
+			account_scheduler_latency(p, delta >> 10, 1);
+			trace_sched_stat_sleep(p, delta);
+		}
+	}
+
+	if (block_start) {
+		u64 delta = rq_clock(rq) - block_start;
+
+		if ((s64)delta < 0)
+			delta = 0;
+
+		if (unlikely(delta > schedstat_val(stats->block_max)))
+			__schedstat_set(stats->block_max, delta);
+
+		__schedstat_set(stats->block_start, 0);
+		__schedstat_add(stats->sum_sleep_runtime, delta);
+		__schedstat_add(stats->sum_block_runtime, delta);
+
+		if (p) {
+			if (p->in_iowait) {
+				__schedstat_add(stats->iowait_sum, delta);
+				__schedstat_inc(stats->iowait_count);
+				trace_sched_stat_iowait(p, delta);
+			}
+
+			trace_sched_stat_blocked(p, delta);
+
+			/*
+			 * Blocking time is in units of nanosecs, so shift by
+			 * 20 to get a milliseconds-range estimation of the
+			 * amount of time that the task spent sleeping:
+			 */
+			if (unlikely(prof_on == SLEEP_PROFILING)) {
+				profile_hits(SLEEP_PROFILING,
+					     (void *)get_wchan(p),
+					     delta >> 20);
+			}
+			account_scheduler_latency(p, delta >> 10, 0);
+		}
+	}
+}
+
 /*
  * Current schedstat API version.
  *
@@ -2,6 +2,8 @@

 #ifdef CONFIG_SCHEDSTATS

+extern struct static_key_false sched_schedstats;
+
 /*
  * Expects runqueue lock to be held for atomicity of update
  */
@@ -40,7 +42,31 @@ rq_sched_info_dequeue(struct rq *rq, unsigned long long delta)
 #define schedstat_val(var)		(var)
 #define schedstat_val_or_zero(var)	((schedstat_enabled()) ? (var) : 0)

+void __update_stats_wait_start(struct rq *rq, struct task_struct *p,
+			       struct sched_statistics *stats);
+
+void __update_stats_wait_end(struct rq *rq, struct task_struct *p,
+			     struct sched_statistics *stats);
+void __update_stats_enqueue_sleeper(struct rq *rq, struct task_struct *p,
+				    struct sched_statistics *stats);
+
+static inline void
+check_schedstat_required(void)
+{
+	if (schedstat_enabled())
+		return;
+
+	/* Force schedstat enabled if a dependent tracepoint is active */
+	if (trace_sched_stat_wait_enabled()    ||
+	    trace_sched_stat_sleep_enabled()   ||
+	    trace_sched_stat_iowait_enabled()  ||
+	    trace_sched_stat_blocked_enabled() ||
+	    trace_sched_stat_runtime_enabled())
+		printk_deferred_once("Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and stat_runtime require the kernel parameter schedstats=enable or kernel.sched_schedstats=1\n");
+}
+
 #else /* !CONFIG_SCHEDSTATS: */
+
 static inline void rq_sched_info_arrive (struct rq *rq, unsigned long long delta) { }
 static inline void rq_sched_info_dequeue(struct rq *rq, unsigned long long delta) { }
 static inline void rq_sched_info_depart (struct rq *rq, unsigned long long delta) { }
@@ -53,8 +79,31 @@ static inline void rq_sched_info_depart  (struct rq *rq, unsigned long long delt
 # define schedstat_set(var, val)	do { } while (0)
 # define schedstat_val(var)		0
 # define schedstat_val_or_zero(var)	0
+
+# define __update_stats_wait_start(rq, p, stats)       do { } while (0)
+# define __update_stats_wait_end(rq, p, stats)         do { } while (0)
+# define __update_stats_enqueue_sleeper(rq, p, stats)  do { } while (0)
+# define check_schedstat_required()                    do { } while (0)
+
 #endif /* CONFIG_SCHEDSTATS */

+#ifdef CONFIG_FAIR_GROUP_SCHED
+struct sched_entity_stats {
+	struct sched_entity     se;
+	struct sched_statistics stats;
+} __no_randomize_layout;
+#endif
+
+static inline struct sched_statistics *
+__schedstats_from_se(struct sched_entity *se)
+{
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	if (!entity_is_task(se))
+		return &container_of(se, struct sched_entity_stats, se)->stats;
+#endif
+	return &task_of(se)->stats;
+}
+
 #ifdef CONFIG_PSI
 /*
  * PSI tracks state that persists across sleeps, such as iowaits and
Some files were not shown because too many files have changed in this diff.