Commit Graph

18572 Commits

Author SHA1 Message Date
Kirill Tkhai
e5673f2805 sched/fair: Remove double_lock_balance() from active_load_balance_cpu_stop()
Avoid double_rq_lock() and use the TASK_ON_RQ_MIGRATING state for
active_load_balance_cpu_stop(). The advantage is (obviously) not
holding two 'rq->lock's at the same time and thereby increasing
parallelism.

Further note that if there was no task to migrate we will not
have acquired the second rq->lock at all.

The important point to note is that because we acquire dst->lock
immediately after releasing src->lock the potential wait time of
task_rq_lock() callers on TASK_ON_RQ_MIGRATING is not longer
than it would have been in the double rq lock scenario.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1408528081.23412.92.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20 14:53:03 +02:00
Kirill Tkhai
a1e0182979 sched: Remove double_rq_lock() from __migrate_task()
Avoid double_rq_lock() and use TASK_ON_RQ_MIGRATING for
__migrate_task(). The advantage is (obviously) not holding two
rq->lock's at the same time and thereby increasing parallelism.

The important point to note is that because we acquire dst->lock
immediately after releasing src->lock the potential wait time of
task_rq_lock() callers on TASK_ON_RQ_MIGRATING is not longer
than it would have been in the double rq lock scenario.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1408528070.23412.89.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20 14:53:02 +02:00
Kirill Tkhai
cca26e8009 sched: Teach scheduler to understand TASK_ON_RQ_MIGRATING state
This is a new p->on_rq state which will be used to indicate that a task
is in a process of migrating between two RQs. It allows to get
rid of double_rq_lock(), which we used to use to change a rq of
a queued task before.

Let's consider an example. To move a task between src_rq and
dst_rq we will do the following:

	raw_spin_lock(&src_rq->lock);
	/* p is a task which is queued on src_rq */
	p = ...;

	dequeue_task(src_rq, p, 0);
	p->on_rq = TASK_ON_RQ_MIGRATING;
	set_task_cpu(p, dst_cpu);
	raw_spin_unlock(&src_rq->lock);

    	/*
    	 * Both RQs are unlocked here.
    	 * Task p is dequeued from src_rq
    	 * but its on_rq value is not zero.
    	 */

	raw_spin_lock(&dst_rq->lock);
	p->on_rq = TASK_ON_RQ_QUEUED;
	enqueue_task(dst_rq, p, 0);
	raw_spin_unlock(&dst_rq->lock);

While p->on_rq is TASK_ON_RQ_MIGRATING, task is considered as
"migrating", and other parallel scheduler actions with it are
not available to parallel callers. The parallel caller is
spining till migration is completed.

The unavailable actions are changing of cpu affinity, changing
of priority etc, in other words all the functionality which used
to require task_rq(p)->lock before (and related to the task).

To implement TASK_ON_RQ_MIGRATING support we primarily are using
the following fact. Most of scheduler users (from which we are
protecting a migrating task) use task_rq_lock() and
__task_rq_lock() to get the lock of task_rq(p). These primitives
know that task's cpu may change, and they are spining while the
lock of the right RQ is not held. We add one more condition into
them, so they will be also spinning until the migration is
finished.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1408528062.23412.88.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20 14:53:00 +02:00
Kirill Tkhai
da0c1e65b5 sched: Add wrapper for checking task_struct::on_rq
Implement task_on_rq_queued() and use it everywhere instead of
on_rq check. No functional changes.

The only exception is we do not use the wrapper in
check_for_tasks(), because it requires to export
task_on_rq_queued() in global header files. Next patch in series
would return it back, so we do not twist it from here to there.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1408528052.23412.87.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20 14:52:59 +02:00
Kirill Tkhai
f36c019c79 sched/fair: Fix reschedule which is generated on throttled cfs_rq
(sched_entity::on_rq == 1) does not guarantee the task is pickable;
changes on throttled cfs_rq must not lead to reschedule.

Check for task_struct::on_rq instead.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1407312361.8424.35.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20 09:47:20 +02:00
Pranith Kumar
8b06c55bdb sched: Match declaration with definition
Match the declaration of runqueues with the definition.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1407950893-32731-1-git-send-email-bobby.prani@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20 09:47:19 +02:00
Oleg Nesterov
5aface53d1 sched: Change autogroup_move_group() to use for_each_thread()
Change autogroup_move_group() to use for_each_thread() instead of
buggy while_each_thread().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Frank Mayhar <fmayhar@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sanjay Rao <srao@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140813192003.GA19334@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20 09:47:18 +02:00
Oleg Nesterov
1e4dda08b4 sched: Change thread_group_cputime() to use for_each_thread()
Change thread_group_cputime() to use for_each_thread() instead of
buggy while_each_thread(). This also makes the pid_alive() check
unnecessary.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Frank Mayhar <fmayhar@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sanjay Rao <srao@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140813192000.GA19327@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20 09:47:18 +02:00
Oleg Nesterov
d38e83c715 sched: s/do_each_thread/for_each_process_thread/ in debug.c
Change kernel/sched/debug.c to use for_each_process_thread().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Frank Mayhar <fmayhar@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sanjay Rao <srao@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140813191956.GA19324@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20 09:47:17 +02:00
Oleg Nesterov
5d07f4202c sched: s/do_each_thread/for_each_process_thread/ in core.c
Change kernel/sched/core.c to use for_each_process_thread().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Frank Mayhar <fmayhar@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sanjay Rao <srao@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140813191953.GA19315@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20 09:47:17 +02:00
Rik van Riel
83d7f24247 sched/numa: Fix numa capacity computation
Commit c61037e9 fixes the phenomenon of 'fantom' cores due to
N*frac(smt_power) >= 1 by limiting the capacity to the actual
number of cores in the load balancing code.

This patch applies the same correction to the NUMA balancing
code.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: vincent.guittot@linaro.org
Cc: Morten.Rasmussen@arm.com
Cc: nicolas.pitre@linaro.org
Cc: efault@gmx.de
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1407173008-9334-3-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-12 12:48:23 +02:00
Rik van Riel
b932c03c34 sched/numa: Fix off-by-one in capacity check
Commit a43455a1d5 ensures that
task_numa_migrate will call task_numa_compare on the preferred
node all the time, even when the preferred node has no free capacity.

This could lead to a performance regression if nr_running == capacity
on both the source and the destination node. This can be avoided by
also checking for nr_running == capacity on the source node, which is
one stricter than checking .has_free_capacity.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: vincent.guittot@linaro.org
Cc: Morten.Rasmussen@arm.com
Cc: nicolas.pitre@linaro.org
Cc: efault@gmx.de
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1407173008-9334-2-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-12 12:48:22 +02:00
Zhihui Zhang
aaecac4ad4 sched: Rename a misleading variable in build_overlap_sched_groups()
The child variable in build_overlap_sched_groups() actually refers to the
peer or sibling domain of the given CPU. Rename it to sibling to be consistent
with the naming in build_group_mask().

Signed-off-by: Zhihui Zhang <zzhsuny@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1406942283-18249-1-git-send-email-zzhsuny@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-12 12:48:21 +02:00
Peter Zijlstra
9a5d9ba6a3 sched/fair: Allow calculate_imbalance() to move idle cpus
Allow calculate_imbalance() to 'create' idle cpus in the busiest group
if there are idle cpus in the local group.

Suggested-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140729152705.GX12054@laptop.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-12 12:48:20 +02:00
Rik van Riel
caeb178c60 sched/fair: Make update_sd_pick_busiest() return 'true' on a busier sd
Currently update_sd_pick_busiest only identifies the busiest sd
that is either overloaded, or has a group imbalance. When no
sd is imbalanced or overloaded, the load balancer fails to find
the busiest domain.

This breaks load balancing between domains that are not overloaded,
in the !SD_ASYM_PACKING case. This patch makes update_sd_pick_busiest
return true when the busiest sd yet is encountered.

Groups are ranked in the order overloaded > imbalanced > other,
with higher ranked groups getting priority even when their load
is lower. This is necessary due to the possibility of unequal
capacities and cpumasks between domains within a sched group.

Behaviour for SD_ASYM_PACKING does not seem to match the comment,
but I have no hardware to test that so I have left the behaviour
of that code unchanged.

Enum for group classification suggested by Peter Zijlstra.

Signed-off-by: Rik van Riel <riel@redhat.com>
[peterz: replaced sg_lb_stats::group_imb with the new enum group_type
         in an attempt to avoid endless recalculation]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Michael Neuling <mikey@neuling.org>
Cc: ktkhai@parallels.com
Cc: tim.c.chen@linux.intel.com
Cc: nicolas.pitre@linaro.org
Cc: jhladky@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140729152743.GI3935@laptop
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-12 12:48:19 +02:00
Peter Zijlstra
743cb1ff19 sched/fair: Make calculate_imbalance() independent
Rik noticed that calculate_imbalance() relies on
update_sd_pick_busiest() to guarantee that busiest->sum_nr_running >
busiest->group_capacity_factor.

Break this implicit assumption (with the intent of not providing it
anymore) by having calculat_imbalance() verify it and not rely on
others.

Reported-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20140729152631.GW12054@laptop.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-12 12:48:18 +02:00
Linus Torvalds
98959948a7 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - Move the nohz kick code out of the scheduler tick to a dedicated IPI,
   from Frederic Weisbecker.

  This necessiated quite some background infrastructure rework,
  including:

   * Clean up some irq-work internals
   * Implement remote irq-work
   * Implement nohz kick on top of remote irq-work
   * Move full dynticks timer enqueue notification to new kick
   * Move multi-task notification to new kick
   * Remove unecessary barriers on multi-task notification

 - Remove proliferation of wait_on_bit() action functions and allow
   wait_on_bit_action() functions to support a timeout.  (Neil Brown)

 - Another round of sched/numa improvements, cleanups and fixes.  (Rik
   van Riel)

 - Implement fast idling of CPUs when the system is partially loaded,
   for better scalability.  (Tim Chen)

 - Restructure and fix the CPU hotplug handling code that may leave
   cfs_rq and rt_rq's throttled when tasks are migrated away from a dead
   cpu.  (Kirill Tkhai)

 - Robustify the sched topology setup code.  (Peterz Zijlstra)

 - Improve sched_feat() handling wrt.  static_keys (Jason Baron)

 - Misc fixes.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  sched/fair: Fix 'make xmldocs' warning caused by missing description
  sched: Use macro for magic number of -1 for setparam
  sched: Robustify topology setup
  sched: Fix sched_setparam() policy == -1 logic
  sched: Allow wait_on_bit_action() functions to support a timeout
  sched: Remove proliferation of wait_on_bit() action functions
  sched/numa: Revert "Use effective_load() to balance NUMA loads"
  sched: Fix static_key race with sched_feat()
  sched: Remove extra static_key*() function indirection
  sched/rt: Fix replenish_dl_entity() comments to match the current upstream code
  sched: Transform resched_task() into resched_curr()
  sched/deadline: Kill task_struct->pi_top_task
  sched: Rework check_for_tasks()
  sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime()
  sched/fair: Disable runtime_enabled on dying rq
  sched/numa: Change scan period code to match intent
  sched/numa: Rework best node setting in task_numa_migrate()
  sched/numa: Examine a task move when examining a task swap
  sched/numa: Simplify task_numa_compare()
  sched/numa: Use effective_load() to balance NUMA loads
  ...
2014-08-04 16:23:30 -07:00
Linus Torvalds
ef35ad26f8 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "Kernel side changes:

   - Consolidate the PMU interrupt-disabled code amongst architectures
     (Vince Weaver)

   - misc fixes

  Tooling changes (new features, user visible changes):

   - Add support for pagefault tracing in 'trace', please see multiple
     examples in the changeset messages (Stanislav Fomichev).

   - Add pagefault statistics in 'trace' (Stanislav Fomichev)

   - Add header for columns in 'top' and 'report' TUI browsers (Jiri
     Olsa)

   - Add pagefault statistics in 'trace' (Stanislav Fomichev)

   - Add IO mode into timechart command (Stanislav Fomichev)

   - Fallback to syscalls:* when raw_syscalls:* is not available in the
     perl and python perf scripts.  (Daniel Bristot de Oliveira)

   - Add --repeat global option to 'perf bench' to be used in benchmarks
     such as the existing 'futex' one, that was modified to use it
     instead of a local option.  (Davidlohr Bueso)

   - Fix fd -> pathname resolution in 'trace', be it using /proc or a
     vfs_getname probe point.  (Arnaldo Carvalho de Melo)

   - Add suggestion of how to set perf_event_paranoid sysctl, to help
     non-root users trying tools like 'trace' to get a working
     environment.  (Arnaldo Carvalho de Melo)

   - Updates from trace-cmd for traceevent plugin_kvm plus args cleanup
     (Steven Rostedt, Jan Kiszka)

   - Support S/390 in 'perf kvm stat' (Alexander Yarygin)

  Tooling infrastructure changes:

   - Allow reserving a row for header purposes in the hists browser
     (Arnaldo Carvalho de Melo)

   - Various fixes and prep work related to supporting Intel PT (Adrian
     Hunter)

   - Introduce multiple debug variables control (Jiri Olsa)

   - Add callchain and additional sample information for python scripts
     (Joseph Schuchart)

   - More prep work to support Intel PT: (Adrian Hunter)
     - Polishing 'script' BTS output
     - 'inject' can specify --kallsym
     - VDSO is per machine, not a global var
     - Expose data addr lookup functions previously private to 'script'
     - Large mmap fixes in events processing

   - Include standard stringify macros in power pc code (Sukadev
     Bhattiprolu)

  Tooling cleanups:

   - Convert open coded equivalents to asprintf() (Andy Shevchenko)

   - Remove needless reassignments in 'trace' (Arnaldo Carvalho de Melo)

   - Cache the is_exit syscall test in 'trace) (Arnaldo Carvalho de
     Melo)

   - No need to reimplement err() in 'perf bench sched-messaging', drop
     barf().  (Davidlohr Bueso).

   - Remove ev_name argument from perf_evsel__hists_browse, can be
     obtained from the other parameters.  (Jiri Olsa)

  Tooling fixes:

   - Fix memory leak in the 'sched-messaging' perf bench test.
     (Davidlohr Bueso)

   - The -o and -n 'perf bench mem' options are mutually exclusive, emit
     error when both are specified.  (Davidlohr Bueso)

   - Fix scrollbar refresh row index in the ui browser, problem exposed
     now that headers will be added and will be allowed to be switched
     on/off.  (Jiri Olsa)

   - Handle the num array type in python properly (Sebastian Andrzej
     Siewior)

   - Fix wrong condition for allocation failure (Jiri Olsa)

   - Adjust callchain based on DWARF debug info on powerpc (Sukadev
     Bhattiprolu)

   - Fix a risk for doing free on uninitialized pointer in traceevent
     lib (Rickard Strandqvist)

   - Update attr test with PERF_FLAG_FD_CLOEXEC flag (Jiri Olsa)

   - Enable close-on-exec flag on perf file descriptor (Yann Droneaud)

   - Fix build on gcc 4.4.7 (Arnaldo Carvalho de Melo)

   - Event ordering fixes (Jiri Olsa)"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (123 commits)
  Revert "perf tools: Fix jump label always changing during tracing"
  perf tools: Fix perf usage string leftover
  perf: Check permission only for parent tracepoint event
  perf record: Store PERF_RECORD_FINISHED_ROUND only for nonempty rounds
  perf record: Always force PERF_RECORD_FINISHED_ROUND event
  perf inject: Add --kallsyms parameter
  perf tools: Expose 'addr' functions so they can be reused
  perf session: Fix accounting of ordered samples queue
  perf powerpc: Include util/util.h and remove stringify macros
  perf tools: Fix build on gcc 4.4.7
  perf tools: Add thread parameter to vdso__dso_findnew()
  perf tools: Add dso__type()
  perf tools: Separate the VDSO map name from the VDSO dso name
  perf tools: Add vdso__new()
  perf machine: Fix the lifetime of the VDSO temporary file
  perf tools: Group VDSO global variables into a structure
  perf session: Add ability to skip 4GiB or more
  perf session: Add ability to 'skip' a non-piped event stream
  perf tools: Pass machine to vdso__dso_findnew()
  perf tools: Add dso__data_size()
  ...
2014-08-04 16:09:53 -07:00
Linus Torvalds
8efb90cf1e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle are:

   - big rtmutex and futex cleanup and robustification from Thomas
     Gleixner
   - mutex optimizations and refinements from Jason Low
   - arch_mutex_cpu_relax() removal and related cleanups
   - smaller lockdep tweaks"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  arch, locking: Ciao arch_mutex_cpu_relax()
  locking/lockdep: Only ask for /proc/lock_stat output when available
  locking/mutexes: Optimize mutex trylock slowpath
  locking/mutexes: Try to acquire mutex only if it is unlocked
  locking/mutexes: Delete the MUTEX_SHOW_NO_WAITER macro
  locking/mutexes: Correct documentation on mutex optimistic spinning
  rtmutex: Make the rtmutex tester depend on BROKEN
  futex: Simplify futex_lock_pi_atomic() and make it more robust
  futex: Split out the first waiter attachment from lookup_pi_state()
  futex: Split out the waiter check from lookup_pi_state()
  futex: Use futex_top_waiter() in lookup_pi_state()
  futex: Make unlock_pi more robust
  rtmutex: Avoid pointless requeueing in the deadlock detection chain walk
  rtmutex: Cleanup deadlock detector debug logic
  rtmutex: Confine deadlock logic to futex
  rtmutex: Simplify remove_waiter()
  rtmutex: Document pi chain walk
  rtmutex: Clarify the boost/deboost part
  rtmutex: No need to keep task ref for lock owner check
  rtmutex: Simplify and document try_to_take_rtmutex()
  ...
2014-08-04 16:09:06 -07:00
Linus Torvalds
5bda4f638f Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU changes from Ingo Molar:
 "The main changes:

   - torture-test updates
   - callback-offloading changes
   - maintainership changes
   - update RCU documentation
   - miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  rcu: Allow for NULL tick_nohz_full_mask when nohz_full= missing
  rcu: Fix a sparse warning in rcu_report_unblock_qs_rnp()
  rcu: Fix a sparse warning in rcu_initiate_boost()
  rcu: Fix __rcu_reclaim() to use true/false for bool
  rcu: Remove CONFIG_PROVE_RCU_DELAY
  rcu: Use __this_cpu_read() instead of per_cpu_ptr()
  rcu: Don't use NMIs to dump other CPUs' stacks
  rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  rcu: Simplify priority boosting by putting rt_mutex in rcu_node
  rcu: Check both root and current rcu_node when setting up future grace period
  rcu: Allow post-unlock reference for rt_mutex
  rcu: Loosen __call_rcu()'s rcu_head alignment constraint
  rcu: Eliminate read-modify-write ACCESS_ONCE() calls
  rcu: Remove redundant ACCESS_ONCE() from tick_do_timer_cpu
  rcu: Make rcu node arrays static const char * const
  signal: Explain local_irq_save() call
  rcu: Handle obsolete references to TINY_PREEMPT_RCU
  rcu: Document deadlock-avoidance information for rcu_read_unlock()
  scripts: Teach get_maintainer.pl about the new "R:" tag
  rcu: Update rcu torture maintainership filename patterns
  ...
2014-08-04 15:55:08 -07:00
Linus Torvalds
c9b88e9581 Oleg Nesterov did several clean ups with the tracing filter code.
As he found some small bugs that went into 3.16, and these changes were
 based on that, I had to apply his changes to a separate branch than
 my main development branch.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJT3533AAoJEKQekfcNnQGuaNkIALM9uf6gvIXgvyAlhK1qfjJL
 0pxAlgppukA7hZT9gcubGuStPxKnwtVSvd/GwGI/mc7Ls57vOvdA61TEHN48l2MW
 SUev1Jeu1Nx3Wh4B2ivRq+dJadrwU9NwG507yvuXn2L8Gqzwal133dg3AHnsTYgz
 yu7oe8baxMniFFYlAmSELFp2cA3kli/WrRuQB7weudbT12jZlpOMqz7OUN9o7dfo
 E4XfwkkDUuo9EiKD1QwVMk2cIkCaHFI8lJDYJbrM3L5XJKfnKzZNRMWfwDG2L7hS
 DGjMl6T0SjU+ZIoTidHdbyF/Q+I0o9kKTnUrErGu88+2Z1gfEKN5O8zovNMbPUM=
 =dT/x
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing filter cleanups from Steven Rostedt:
 "Oleg Nesterov did several clean ups with the tracing filter code.  As
  he found some small bugs that went into 3.16, and these changes were
  based on that, I had to apply his changes to a separate branch than my
  main development branch.

  This was based on work that was already pulled into 3.16, and is a
  separate pull request to keep from having local merges in my pull
  request"

* tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Kill "filter_string" arg of replace_preds()
  tracing: Change apply_subsystem_event_filter() paths to check file->system == dir
  tracing: Kill ftrace_event_call->files
  tracing/uprobes: Kill the dead TRACE_EVENT_FL_USE_CALL_FILTER logic
  tracing: Kill call_filter_disable()
  tracing: Kill destroy_call_preds()
  tracing: Kill destroy_preds() and destroy_file_preds()
2014-08-04 12:02:48 -07:00
Linus Torvalds
b8c0aa46b3 This pull request has a lot of work done. The main thing is the changes
to the ftrace function callback infrastructure. It's introducing a
 way to allow different functions to call directly different trampolines
 instead of all calling the same "mcount" one.
 
 The only user of this for now is the function graph tracer, which always
 had a different trampoline, but the function tracer trampoline was called
 and did basically nothing, and then the function graph tracer trampoline
 was called. The difference now, is that the function graph tracer
 trampoline can be called directly if a function is only being traced by
 the function graph trampoline. If function tracing is also happening on
 the same function, the old way is still done.
 
 The accounting for this takes up more memory when function graph tracing
 is activated, as it needs to keep track of which functions it uses.
 I have a new way that wont take as much memory, but it's not ready yet
 for this merge window, and will have to wait for the next one.
 
 Another big change was the removal of the ftrace_start/stop() calls that
 were used by the suspend/resume code that stopped function tracing when
 entering into suspend and resume paths. The stop of ftrace was done
 because there was some function that would crash the system if one called
 smp_processor_id()! The stop/start was a big hammer to solve the issue
 at the time, which was when ftrace was first introduced into Linux.
 Now ftrace has better infrastructure to debug such issues, and I found
 the problem function and labeled it with "notrace" and function tracing
 can now safely be activated all the way down into the guts of suspend
 and resume.
 
 Other changes include clean ups of uprobe code.
 Clean up of the trace_seq() code.
 And other various small fixes and clean ups to ftrace and tracing.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJT35zXAAoJEKQekfcNnQGuOz0H/38zqM0nLFhrgvz3EPk2UOjn
 xqpX8qyb2V7TJZL+IqeXU2a5cQZl5ba0D4WtBGpxbTae3CJYiuQ87iKUNFoH0om5
 FDpn80igb368k8V3qRdRsziKVCCf0XBd/NkHJXc0ZkfXGyzB2Ga4bBxALxp2gj9y
 bnO+vKo6+tWYKG4hyQb4P3LRXUrK8/LWEsPr39cH2QH1Rdj69Lx9CgrCdUVJmwcb
 Bj8hEiLXL/RYCFNn79A3wNTUvW0rG/AOIf4SLqXtasSRZ0ToaU0ZyDnrNv+0Ol47
 rX8tSk+LfXchL9hpIvjCf1vlAYq3pO02favteR/jip3lx/dTjEDE4RJ9qtJzZ4Q=
 =fwQY
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "This pull request has a lot of work done.  The main thing is the
  changes to the ftrace function callback infrastructure.  It's
  introducing a way to allow different functions to call directly
  different trampolines instead of all calling the same "mcount" one.

  The only user of this for now is the function graph tracer, which
  always had a different trampoline, but the function tracer trampoline
  was called and did basically nothing, and then the function graph
  tracer trampoline was called.  The difference now, is that the
  function graph tracer trampoline can be called directly if a function
  is only being traced by the function graph trampoline.  If function
  tracing is also happening on the same function, the old way is still
  done.

  The accounting for this takes up more memory when function graph
  tracing is activated, as it needs to keep track of which functions it
  uses.  I have a new way that wont take as much memory, but it's not
  ready yet for this merge window, and will have to wait for the next
  one.

  Another big change was the removal of the ftrace_start/stop() calls
  that were used by the suspend/resume code that stopped function
  tracing when entering into suspend and resume paths.  The stop of
  ftrace was done because there was some function that would crash the
  system if one called smp_processor_id()! The stop/start was a big
  hammer to solve the issue at the time, which was when ftrace was first
  introduced into Linux.  Now ftrace has better infrastructure to debug
  such issues, and I found the problem function and labeled it with
  "notrace" and function tracing can now safely be activated all the way
  down into the guts of suspend and resume

  Other changes include clean ups of uprobe code, clean up of the
  trace_seq() code, and other various small fixes and clean ups to
  ftrace and tracing"

* tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits)
  ftrace: Add warning if tramp hash does not match nr_trampolines
  ftrace: Fix trampoline hash update check on rec->flags
  ring-buffer: Use rb_page_size() instead of open coded head_page size
  ftrace: Rename ftrace_ops field from trampolines to nr_trampolines
  tracing: Convert local function_graph functions to static
  ftrace: Do not copy old hash when resetting
  tracing: let user specify tracing_thresh after selecting function_graph
  ring-buffer: Always run per-cpu ring buffer resize with schedule_work_on()
  tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST
  s390/ftrace: remove check of obsolete variable function_trace_stop
  arm64, ftrace: Remove check of obsolete variable function_trace_stop
  Blackfin: ftrace: Remove check of obsolete variable function_trace_stop
  metag: ftrace: Remove check of obsolete variable function_trace_stop
  microblaze: ftrace: Remove check of obsolete variable function_trace_stop
  MIPS: ftrace: Remove check of obsolete variable function_trace_stop
  parisc: ftrace: Remove check of obsolete variable function_trace_stop
  sh: ftrace: Remove check of obsolete variable function_trace_stop
  sparc64,ftrace: Remove check of obsolete variable function_trace_stop
  tile: ftrace: Remove check of obsolete variable function_trace_stop
  ftrace: x86: Remove check of obsolete variable function_trace_stop
  ...
2014-08-04 11:50:00 -07:00
Linus Torvalds
47dfe4037e Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
 "Mostly changes to get the v2 interface ready.  The core features are
  mostly ready now and I think it's reasonable to expect to drop the
  devel mask in one or two devel cycles at least for a subset of
  controllers.

   - cgroup added a controller dependency mechanism so that block cgroup
     can depend on memory cgroup.  This will be used to finally support
     IO provisioning on the writeback traffic, which is currently being
     implemented.

   - The v2 interface now uses a separate table so that the interface
     files for the new interface are explicitly declared in one place.
     Each controller will explicitly review and add the files for the
     new interface.

   - cpuset is getting ready for the hierarchical behavior which is in
     the similar style with other controllers so that an ancestor's
     configuration change doesn't change the descendants' configurations
     irreversibly and processes aren't silently migrated when a CPU or
     node goes down.

  All the changes are to the new interface and no behavior changed for
  the multiple hierarchies"

* 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (29 commits)
  cpuset: fix the WARN_ON() in update_nodemasks_hier()
  cgroup: initialize cgrp_dfl_root_inhibit_ss_mask from !->dfl_files test
  cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core
  cgroup: distinguish the default and legacy hierarchies when handling cftypes
  cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes()
  cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes
  cgroup: split cgroup_base_files[] into cgroup_{dfl|legacy}_base_files[]
  cpuset: export effective masks to userspace
  cpuset: allow writing offlined masks to cpuset.cpus/mems
  cpuset: enable onlined cpu/node in effective masks
  cpuset: refactor cpuset_hotplug_update_tasks()
  cpuset: make cs->{cpus, mems}_allowed as user-configured masks
  cpuset: apply cs->effective_{cpus,mems}
  cpuset: initialize top_cpuset's configured masks at mount
  cpuset: use effective cpumask to build sched domains
  cpuset: inherit ancestor's masks if effective_{cpus, mems} becomes empty
  cpuset: update cs->effective_{cpus, mems} when config changes
  cpuset: update cpuset->effective_{cpus,mems} at hotplug
  cpuset: add cs->effective_cpus and cs->effective_mems
  cgroup: clean up sane_behavior handling
  ...
2014-08-04 10:11:28 -07:00
Linus Torvalds
f2a84170ed Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu updates from Tejun Heo:

 - Major reorganization of percpu header files which I think makes
   things a lot more readable and logical than before.

 - percpu-refcount is updated so that it requires explicit destruction
   and can be reinitialized if necessary.  This was pulled into the
   block tree to replace the custom percpu refcnting implemented in
   blk-mq.

 - In the process, percpu and percpu-refcount got cleaned up a bit

* 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (21 commits)
  percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero()
  percpu-refcount: require percpu_ref to be exited explicitly
  percpu-refcount: use unsigned long for pcpu_count pointer
  percpu-refcount: add helpers for ->percpu_count accesses
  percpu-refcount: one bit is enough for REF_STATUS
  percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc()
  workqueue: stronger test in process_one_work()
  workqueue: clear POOL_DISASSOCIATED in rebind_workers()
  percpu: Use ALIGN macro instead of hand coding alignment calculation
  percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations
  percpu: preffity percpu header files
  percpu: use raw_cpu_*() to define __this_cpu_*()
  percpu: reorder macros in percpu header files
  percpu: move {raw|this}_cpu_*() definitions to include/linux/percpu-defs.h
  percpu: move generic {raw|this}_cpu_*_N() definitions to include/asm-generic/percpu.h
  percpu: only allow sized arch overrides for {raw|this}_cpu_*() ops
  percpu: reorganize include/linux/percpu-defs.h
  percpu: move accessors from include/linux/percpu.h to percpu-defs.h
  percpu: include/asm-generic/percpu.h should contain only arch-overridable parts
  percpu: introduce arch_raw_cpu_ptr()
  ...
2014-08-04 10:09:27 -07:00
Linus Torvalds
c4c3f5fba0 Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Lai has been doing a lot of cleanups of workqueue and kthread_work.
  No significant behavior change.  Just a lot of cleanups all over the
  place.  Some are a bit invasive but overall nothing too dangerous"

* 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  kthread_work: remove the unused wait_queue_head
  kthread_work: wake up worker only when the worker is idle
  workqueue: use nr_node_ids instead of wq_numa_tbl_len
  workqueue: remove the misnamed out_unlock label in get_unbound_pool()
  workqueue: remove the stale comment in pwq_unbound_release_workfn()
  workqueue: move rescuer pool detachment to the end
  workqueue: unfold start_worker() into create_worker()
  workqueue: remove @wakeup from worker_set_flags()
  workqueue: remove an unneeded UNBOUND test before waking up the next worker
  workqueue: wake regular worker if need_more_worker() when rescuer leave the pool
  workqueue: alloc struct worker on its local node
  workqueue: reuse the already calculated pwq in try_to_grab_pending()
  workqueue: stronger test in process_one_work()
  workqueue: clear POOL_DISASSOCIATED in rebind_workers()
  workqueue: sanity check pool->cpu in wq_worker_sleeping()
  workqueue: clear leftover flags when detached
  workqueue: remove useless WARN_ON_ONCE()
  workqueue: use schedule_timeout_interruptible() instead of open code
  workqueue: remove the empty check in too_many_workers()
  workqueue: use "pool->cpu < 0" to stand for an unbound pool
2014-08-04 10:04:44 -07:00
Linus Torvalds
3e7a716a92 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 - CTR(AES) optimisation on x86_64 using "by8" AVX.
 - arm64 support to ccp
 - Intel QAT crypto driver
 - Qualcomm crypto engine driver
 - x86-64 assembly optimisation for 3DES
 - CTR(3DES) speed test
 - move FIPS panic from module.c so that it only triggers on crypto
   modules
 - SP800-90A Deterministic Random Bit Generator (drbg).
 - more test vectors for ghash.
 - tweak self tests to catch partial block bugs.
 - misc fixes.

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (94 commits)
  crypto: drbg - fix failure of generating multiple of 2**16 bytes
  crypto: ccp - Do not sign extend input data to CCP
  crypto: testmgr - add missing spaces to drbg error strings
  crypto: atmel-tdes - Switch to managed version of kzalloc
  crypto: atmel-sha - Switch to managed version of kzalloc
  crypto: testmgr - use chunks smaller than algo block size in chunk tests
  crypto: qat - Fixed SKU1 dev issue
  crypto: qat - Use hweight for bit counting
  crypto: qat - Updated print outputs
  crypto: qat - change ae_num to ae_id
  crypto: qat - change slice->regions to slice->region
  crypto: qat - use min_t macro
  crypto: qat - remove unnecessary parentheses
  crypto: qat - remove unneeded header
  crypto: qat - checkpatch blank lines
  crypto: qat - remove unnecessary return codes
  crypto: Resolve shadow warnings
  crypto: ccp - Remove "select OF" from Kconfig
  crypto: caam - fix DECO RSR polling
  crypto: qce - Let 'DEV_QCE' depend on both HAS_DMA and HAS_IOMEM
  ...
2014-08-04 09:52:51 -07:00
Linus Torvalds
8d71844b51 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "Two fixes in the timer area:
   - a long-standing lock inversion due to a printk
   - suspend-related hrtimer corruption in sched_clock"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks
  sched_clock: Avoid corrupting hrtimer tree during suspend
2014-08-03 09:58:20 -07:00
Jan Kara
504d58745c timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks
clockevents_increase_min_delta() calls printk() from under
hrtimer_bases.lock. That causes lock inversion on scheduler locks because
printk() can call into the scheduler. Lockdep puts it as:

======================================================
[ INFO: possible circular locking dependency detected ]
3.15.0-rc8-06195-g939f04b #2 Not tainted
-------------------------------------------------------
trinity-main/74 is trying to acquire lock:
 (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c

but task is already holding lock:
 (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (hrtimer_bases.lock){-.-...}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197
       [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85
       [<81080792>] task_clock_event_start+0x3a/0x3f
       [<810807a4>] task_clock_event_add+0xd/0x14
       [<8108259a>] event_sched_in+0xb6/0x17a
       [<810826a2>] group_sched_in+0x44/0x122
       [<81082885>] ctx_sched_in.isra.67+0x105/0x11f
       [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b
       [<81082bf6>] __perf_install_in_context+0x8b/0xa3
       [<8107eb8e>] remote_function+0x12/0x2a
       [<8105f5af>] smp_call_function_single+0x2d/0x53
       [<8107e17d>] task_function_call+0x30/0x36
       [<8107fb82>] perf_install_in_context+0x87/0xbb
       [<810852c9>] SYSC_perf_event_open+0x5c6/0x701
       [<810856f9>] SyS_perf_event_open+0x17/0x19
       [<8142f8ee>] syscall_call+0x7/0xb

-> #4 (&ctx->lock){......}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f04c>] _raw_spin_lock+0x21/0x30
       [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<8142cae0>] schedule+0xf/0x11
       [<8142f9a6>] work_resched+0x5/0x30

-> #3 (&rq->lock){-.-.-.}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f04c>] _raw_spin_lock+0x21/0x30
       [<81040873>] __task_rq_lock+0x33/0x3a
       [<8104184c>] wake_up_new_task+0x25/0xc2
       [<8102474b>] do_fork+0x15c/0x2a0
       [<810248a9>] kernel_thread+0x1a/0x1f
       [<814232a2>] rest_init+0x1a/0x10e
       [<817af949>] start_kernel+0x303/0x308
       [<817af2ab>] i386_start_kernel+0x79/0x7d

-> #2 (&p->pi_lock){-.-...}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<810413dd>] try_to_wake_up+0x1d/0xd6
       [<810414cd>] default_wake_function+0xb/0xd
       [<810461f3>] __wake_up_common+0x39/0x59
       [<81046346>] __wake_up+0x29/0x3b
       [<811b8733>] tty_wakeup+0x49/0x51
       [<811c3568>] uart_write_wakeup+0x17/0x19
       [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
       [<811c5f28>] serial8250_handle_irq+0x54/0x6a
       [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
       [<811c56d8>] serial8250_interrupt+0x38/0x9e
       [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
       [<81051296>] handle_irq_event+0x2c/0x43
       [<81052cee>] handle_level_irq+0x57/0x80
       [<81002a72>] handle_irq+0x46/0x5c
       [<810027df>] do_IRQ+0x32/0x89
       [<8143036e>] common_interrupt+0x2e/0x33
       [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
       [<811c25a4>] uart_start+0x2d/0x32
       [<811c2c04>] uart_write+0xc7/0xd6
       [<811bc6f6>] n_tty_write+0xb8/0x35e
       [<811b9beb>] tty_write+0x163/0x1e4
       [<811b9cd9>] redirected_tty_write+0x6d/0x75
       [<810b6ed6>] vfs_write+0x75/0xb0
       [<810b7265>] SyS_write+0x44/0x77
       [<8142f8ee>] syscall_call+0x7/0xb

-> #1 (&tty->write_wait){-.....}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<81046332>] __wake_up+0x15/0x3b
       [<811b8733>] tty_wakeup+0x49/0x51
       [<811c3568>] uart_write_wakeup+0x17/0x19
       [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
       [<811c5f28>] serial8250_handle_irq+0x54/0x6a
       [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
       [<811c56d8>] serial8250_interrupt+0x38/0x9e
       [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
       [<81051296>] handle_irq_event+0x2c/0x43
       [<81052cee>] handle_level_irq+0x57/0x80
       [<81002a72>] handle_irq+0x46/0x5c
       [<810027df>] do_IRQ+0x32/0x89
       [<8143036e>] common_interrupt+0x2e/0x33
       [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
       [<811c25a4>] uart_start+0x2d/0x32
       [<811c2c04>] uart_write+0xc7/0xd6
       [<811bc6f6>] n_tty_write+0xb8/0x35e
       [<811b9beb>] tty_write+0x163/0x1e4
       [<811b9cd9>] redirected_tty_write+0x6d/0x75
       [<810b6ed6>] vfs_write+0x75/0xb0
       [<810b7265>] SyS_write+0x44/0x77
       [<8142f8ee>] syscall_call+0x7/0xb

-> #0 (&port_lock_key){-.....}:
       [<8104a62d>] __lock_acquire+0x9ea/0xc6d
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<811c60be>] serial8250_console_write+0x8c/0x10c
       [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
       [<8104f5d5>] console_unlock+0x1d7/0x398
       [<8104fb70>] vprintk_emit+0x3da/0x3e4
       [<81425f76>] printk+0x17/0x19
       [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
       [<8105c548>] clockevents_program_event+0xe7/0xf3
       [<8105cc1c>] tick_program_event+0x1e/0x23
       [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
       [<8103c49e>] __remove_hrtimer+0x5b/0x79
       [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
       [<8103cb4b>] hrtimer_cancel+0xd/0x18
       [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
       [<81080705>] task_clock_event_stop+0x20/0x64
       [<81080756>] task_clock_event_del+0xd/0xf
       [<81081350>] event_sched_out+0xab/0x11e
       [<810813e0>] group_sched_out+0x1d/0x66
       [<81081682>] ctx_sched_out+0xaf/0xbf
       [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<8142cae0>] schedule+0xf/0x11
       [<8142f9a6>] work_resched+0x5/0x30

other info that might help us debug this:

Chain exists of:
  &port_lock_key --> &ctx->lock --> hrtimer_bases.lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(hrtimer_bases.lock);
                               lock(&ctx->lock);
                               lock(hrtimer_bases.lock);
  lock(&port_lock_key);

 *** DEADLOCK ***

4 locks held by trinity-main/74:
 #0:  (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb
 #1:  (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
 #2:  (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
 #3:  (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4

stack backtrace:
CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2
 00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570
 8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0
 8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003
Call Trace:
 [<81426f69>] dump_stack+0x16/0x18
 [<81425a99>] print_circular_bug+0x18f/0x19c
 [<8104a62d>] __lock_acquire+0x9ea/0xc6d
 [<8104a942>] lock_acquire+0x92/0x101
 [<811c60be>] ? serial8250_console_write+0x8c/0x10c
 [<811c6032>] ? wait_for_xmitr+0x76/0x76
 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
 [<811c60be>] ? serial8250_console_write+0x8c/0x10c
 [<811c60be>] serial8250_console_write+0x8c/0x10c
 [<8104af87>] ? lock_release+0x191/0x223
 [<811c6032>] ? wait_for_xmitr+0x76/0x76
 [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
 [<8104f5d5>] console_unlock+0x1d7/0x398
 [<8104fb70>] vprintk_emit+0x3da/0x3e4
 [<81425f76>] printk+0x17/0x19
 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
 [<8105cc1c>] tick_program_event+0x1e/0x23
 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
 [<8103c49e>] __remove_hrtimer+0x5b/0x79
 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
 [<8103cb4b>] hrtimer_cancel+0xd/0x18
 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
 [<81080705>] task_clock_event_stop+0x20/0x64
 [<81080756>] task_clock_event_del+0xd/0xf
 [<81081350>] event_sched_out+0xab/0x11e
 [<810813e0>] group_sched_out+0x1d/0x66
 [<81081682>] ctx_sched_out+0xaf/0xbf
 [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
 [<8104416d>] ? __dequeue_entity+0x23/0x27
 [<81044505>] ? pick_next_task_fair+0xb1/0x120
 [<8142cacc>] __schedule+0x4c6/0x4cb
 [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108
 [<810475b0>] ? trace_hardirqs_off+0xb/0xd
 [<81056346>] ? rcu_irq_exit+0x64/0x77

Fix the problem by using printk_deferred() which does not call into the
scheduler.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-08-01 12:54:41 +02:00
David Rientjes
3a1122d26c kexec: fix build error when hugetlbfs is disabled
free_huge_page() is undefined without CONFIG_HUGETLBFS and there's no
need to filter PageHuge() page is such a configuration either, so avoid
exporting the symbol to fix a build error:

   In file included from kernel/kexec.c:14:0:
   kernel/kexec.c: In function 'crash_save_vmcoreinfo_init':
   kernel/kexec.c:1623:20: error: 'free_huge_page' undeclared (first use in this function)
     VMCOREINFO_SYMBOL(free_huge_page);
                       ^

Introduced by commit 8f1d26d0e5 ("kexec: export free_huge_page to
VMCOREINFO")

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Acked-by: Olof Johansson <olof@lixom.net>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30 20:09:37 -07:00
Josh Triplett
e0198b290d Josh has moved
My IBM email addresses haven't worked for years; also map some
old-but-functional forwarding addresses to my canonical address.

Update my GPG key fingerprint; I moved to 4096R a long time ago.

Update description.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30 17:16:13 -07:00
Atsushi Kumagai
8f1d26d0e5 kexec: export free_huge_page to VMCOREINFO
PG_head_mask was added into VMCOREINFO to filter huge pages in b3acc56bfe
("kexec: save PG_head_mask in VMCOREINFO"), but makedumpfile still need
another symbol to filter *hugetlbfs* pages.

If a user hope to filter user pages, makedumpfile tries to exclude them by
checking the condition whether the page is anonymous, but hugetlbfs pages
aren't anonymous while they also be user pages.

We know it's possible to detect them in the same way as PageHuge(),
so we need the start address of free_huge_page():

    int PageHuge(struct page *page)
    {
            if (!PageCompound(page))
                    return 0;

            page = compound_head(page);
            return get_compound_page_dtor(page) == free_huge_page;
    }

For that reason, this patch changes free_huge_page() into public
to export it to VMCOREINFO.

Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30 17:16:13 -07:00
Li Zefan
a13812683f cpuset: fix the WARN_ON() in update_nodemasks_hier()
The WARN_ON() is used to check if we break the legal hierarchy, on
which the effective mems should be equal to configured mems.

Reported-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Tested-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
2014-07-30 11:26:58 -04:00
Lai Jiangshan
ed1403ec2b kthread_work: wake up worker only when the worker is idle
If the worker is already executing a work item when another is queued,
we can safely skip wakeup without worrying about stalling queue thus
avoiding waking up the busy worker spuriously.  Spurious wakeups
should be fine but still isn't nice and avoiding it is trivial here.

tj: Updated description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-07-28 14:07:52 -04:00
Masanari Iida
cd3bd4e628 sched/fair: Fix 'make xmldocs' warning caused by missing description
This patch fix following warning caused by missing description
"overload" in kernel/sched/fair.c

Warning(.//kernel/sched/fair.c:5906): No description found for
parameter 'overload'

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1406518686-7274-1-git-send-email-standby24x7@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28 10:04:14 +02:00
Steven Rostedt
c13db6b131 sched: Use macro for magic number of -1 for setparam
Instead of passing around a magic number -1 for the sched_setparam()
policy, use a more descriptive macro name like SETPARAM_POLICY.

[ based on top of Daniel's sched_setparam() fix ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Daniel Bristot de Oliveira<bristot@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140723112826.6ed6cbce@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28 10:04:14 +02:00
Peter Zijlstra
6ae72dff37 sched: Robustify topology setup
We hard assume that higher topology levels are supersets of lower
levels.

Detect, warn and try to fixup when we encounter this violated.

Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Bruno Wolff III <bruno@wolff.to>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140722094740.GJ12054@laptop.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28 10:04:13 +02:00
Ingo Molnar
ca5bc6cd5d Merge branch 'sched/urgent' into sched/core, to merge fixes before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28 10:03:00 +02:00
Jiri Olsa
f4be073db8 perf: Check permission only for parent tracepoint event
There's no need to check cloned event's permission once the
parent was already checked.

Also the code is checking 'current' process permissions, which
is not owner process for cloned events, thus could end up with
wrong permission check result.

Reported-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Tested-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1405079782-8139-1-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28 10:01:38 +02:00
Ingo Molnar
5030c69755 Linux 3.16-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJT1VYNAAoJEHm+PkMAQRiGQJwIAKSYp1Uqz5O/e5r0V1TlZKT4
 1B4Njopl57PwSrJQWcGEuH2yHyM896vfPO4L6BJIOfyWzh8kwpQqclDt6uhXoF/v
 OsO1zb/7/j+n/pDZsePqP9AyIgErsHEBgUbhecDqzjN++ITPcZjQ6TIMPglZaumN
 jFAdAZuAaEwqAk8jqN2wlm689Fh9MuUEarHXbXLCqu5RgLrWhFGhp/cTWY62aqnZ
 XfEeQ9KtpRZmlR/IYjerbb1eRH7ZdJsZ88WngLX9dj/JdNxHWBkWQBXGAusXk5Fk
 y6LsIV3TjyBdrRKJ1Ifyg/2EIXHNBs8HxTFGXpjtp2HPuMLDxZOWOWikb9URtNg=
 =Fjf4
 -----END PGP SIGNATURE-----

Merge tag 'v3.16-rc7' into perf/core, to merge in the latest fixes before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28 10:00:33 +02:00
Daniel Bristot de Oliveira
d8d28c8f00 sched: Fix sched_setparam() policy == -1 logic
The scheduler uses policy == -1 to preserve the current policy state to
implement sched_setparam(). But, as (int) -1 is equals to 0xffffffff,
it's matching the if (policy & SCHED_RESET_ON_FORK) on
_sched_setscheduler(). This match changes the policy value to an
invalid value, breaking the sched_setparam() syscall.

This patch checks policy == -1 before check the SCHED_RESET_ON_FORK flag.

The following program shows the bug:

int main(void)
{
	struct sched_param param = {
		.sched_priority = 5,
	};

	sched_setscheduler(0, SCHED_FIFO, &param);
	param.sched_priority = 1;
	sched_setparam(0, &param);
	param.sched_priority = 0;
	sched_getparam(0, &param);
	if (param.sched_priority != 1)
		printf("failed priority setting (found %d instead of 1)\n",
			param.sched_priority);
	else
		printf("priority setting fine\n");
}

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org> # 3.14+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Fixes: 7479f3c9cf "sched: Move SCHED_RESET_ON_FORK into attr::sched_flags"
Link: http://lkml.kernel.org/r/9ebe0566a08dbbb3999759d3f20d6004bb2dbcfa.1406079891.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28 10:00:08 +02:00
Linus Torvalds
9dae0a3fc4 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A bunch of fixes for perf and kprobes:
   - revert a commit that caused a perf group regression
   - silence dmesg spam
   - fix kprobe probing errors on ia64 and ppc64
   - filter kprobe faults from userspace
   - lockdep fix for perf exit path
   - prevent perf #GP in KVM guest
   - correct perf event and filters"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kprobes: Fix "Failed to find blacklist" probing errors on ia64 and ppc64
  kprobes/x86: Don't try to resolve kprobe faults from userspace
  perf/x86/intel: Avoid spamming kernel log for BTS buffer failure
  perf/x86/intel: Protect LBR and extra_regs against KVM lying
  perf: Fix lockdep warning on process exit
  perf/x86/intel/uncore: Fix SNB-EP/IVT Cbox filter mappings
  perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge
  perf: Revert ("perf: Always destroy groups on exit")
2014-07-27 09:57:16 -07:00
Steven Rostedt (Red Hat)
dc6f03f26f ftrace: Add warning if tramp hash does not match nr_trampolines
After adding all the records to the tramp_hash, add a check that makes
sure that the number of records added matches the number of records
expected to match and do a WARN_ON and disable ftrace if they do
not match.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-24 11:26:11 -04:00
Steven Rostedt (Red Hat)
2a0343baa4 ftrace: Fix trampoline hash update check on rec->flags
In the loop of ftrace_save_ops_tramp_hash(), it adds all the recs
to the ops hash if the rec has only one callback attached and the
ops is connected to the rec. It gives a nasty warning and shuts down
ftrace if the rec doesn't have a trampoline set for it. But this
can happen with the following scenario:

  # cd /sys/kernel/debug/tracing
  # echo schedule do_IRQ > set_ftrace_filter
  # mkdir instances/foo
  # echo schedule > instances/foo/set_ftrace_filter
  # echo function_graph > current_function
  # echo function > instances/foo/current_function
  # echo nop > instances/foo/current_function

The above would then trigger the following warning and disable
ftrace:

 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 3145 at kernel/trace/ftrace.c:2212 ftrace_run_update_code+0xe4/0x15b()
 Modules linked in: ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ip [...]
 CPU: 1 PID: 3145 Comm: bash Not tainted 3.16.0-rc3-test+ #136
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
  0000000000000000 ffffffff81808a88 ffffffff81502130 0000000000000000
  ffffffff81040ca1 ffff880077c08000 ffffffff810bd286 0000000000000001
  ffffffff81a56830 ffff88007a041be0 ffff88007a872d60 00000000000001be
 Call Trace:
  [<ffffffff81502130>] ? dump_stack+0x4a/0x75
  [<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97
  [<ffffffff810bd286>] ? ftrace_run_update_code+0xe4/0x15b
  [<ffffffff810bd286>] ? ftrace_run_update_code+0xe4/0x15b
  [<ffffffff810bda1a>] ? ftrace_shutdown+0x11c/0x16b
  [<ffffffff810bda87>] ? unregister_ftrace_function+0x1e/0x38
  [<ffffffff810cc7e1>] ? function_trace_reset+0x1a/0x28
  [<ffffffff810c924f>] ? tracing_set_tracer+0xc1/0x276
  [<ffffffff810c9477>] ? tracing_set_trace_write+0x73/0x91
  [<ffffffff81132383>] ? __sb_start_write+0x9a/0xcc
  [<ffffffff8120478f>] ? security_file_permission+0x1b/0x31
  [<ffffffff81130e49>] ? vfs_write+0xac/0x11c
  [<ffffffff8113115d>] ? SyS_write+0x60/0x8e
  [<ffffffff81508112>] ? system_call_fastpath+0x16/0x1b
 ---[ end trace 938c4415cbc7dc96 ]---
 ------------[ cut here ]------------

Link: http://lkml.kernel.org/r/20140723120805.GB21376@redhat.com

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-24 10:06:41 -04:00
Stephen Boyd
f723aa1817 sched_clock: Avoid corrupting hrtimer tree during suspend
During suspend we call sched_clock_poll() to update the epoch and
accumulated time and reprogram the sched_clock_timer to fire
before the next wrap-around time. Unfortunately,
sched_clock_poll() doesn't restart the timer, instead it relies
on the hrtimer layer to do that and during suspend we aren't
calling that function from the hrtimer layer. Instead, we're
reprogramming the expires time while the hrtimer is enqueued,
which can cause the hrtimer tree to be corrupted. Furthermore, we
restart the timer during suspend but we update the epoch during
resume which seems counter-intuitive.

Let's fix this by saving the accumulated state and canceling the
timer during suspend. On resume we can update the epoch and
restart the timer similar to what we would do if we were starting
the clock for the first time.

Fixes: a08ca5d108 "sched_clock: Use an hrtimer instead of timer"
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/1406174630-23458-1-git-send-email-john.stultz@linaro.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-07-24 12:02:49 +02:00
Steven Rostedt (Red Hat)
10e83fd01c ring-buffer: Use rb_page_size() instead of open coded head_page size
There's a helper function to get a ring buffer page size (the number
of bytes of data recorded on the page), called rb_page_size().
Use that instead of open coding it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-23 19:45:12 -04:00
Steven Rostedt (Red Hat)
0162d621dd ftrace: Rename ftrace_ops field from trampolines to nr_trampolines
Having two fields within the same struct that is off by one character
can be confusing and error prone. Rename the counter "trampolines"
to "nr_trampolines" to explicitly show it is a counter and not to
be confused by the "trampoline" field.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-23 15:03:00 -04:00
Lai Jiangshan
ddcb57e2ed workqueue: use nr_node_ids instead of wq_numa_tbl_len
They are the same and nr_node_ids is provided by the memory subsystem.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-07-22 12:10:39 -04:00
Lai Jiangshan
3fb1823c09 workqueue: remove the misnamed out_unlock label in get_unbound_pool()
After the locking was moved up to the caller of the get_unbound_pool(),
out_unlock label doesn't need to do any unlock operation and the name
became bad, so we just remove this label, and the only usage-site
"goto out_unlock" is subsituted to "return pool".

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-07-22 12:10:39 -04:00
Lai Jiangshan
29b1cb416a workqueue: remove the stale comment in pwq_unbound_release_workfn()
In 75ccf5950f ("workqueue: prepare flush_workqueue() for dynamic
creation and destrucion of unbound pool_workqueues"), a comment
about the synchronization for the pwq in pwq_unbound_release_workfn()
was added. The comment claimed the flush_mutex wasn't strictly
necessary, it was correct in that time, due to the pwq was protected
by workqueue_lock.

But it is incorrect now since the wq->flush_mutex was renamed to
wq->mutex and workqueue_lock was removed, the wq->mutex is strictly
needed. But the comment was miss-updated when the synchronization
was changed.

This patch removes the incorrect comments and doesn't add any new
comment to explain why wq->mutex is needed here, which is definitely
obvious and wq->pwqs_node has "WQ" notation in its definition which is
better comment.

The old commit mentioned above also introduced a comment in link_pwq()
about the synchronization. This comment is also removed in this patch
since the whole link_pwq() is proteced by wq->mutex.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-07-22 12:10:39 -04:00
Lai Jiangshan
13b1d625ef workqueue: move rescuer pool detachment to the end
In 51697d3939 ("workqueue: use generic attach/detach routine for
rescuers"), The rescuer detaches itself from the pool before put_pwq()
so that the put_unbound_pool() will not destroy the rescuer-attached
pool.

It is unnecessary.  worker_detach_from_pool() can be used as the last
statement to access to the pool just like the regular workers,
put_unbound_pool() will wait for it to detach and then free the pool.

So we move the worker_detach_from_pool() down, make it coincide with
the regular workers.

tj: Minor description update.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-07-22 12:10:39 -04:00