Commit Graph

6317 Commits

Author SHA1 Message Date
Linus Torvalds
a8416961d3 Merge branch 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits)
  x86: disable __do_IRQ support
  sparseirq, powerpc/cell: fix unused variable warning in interrupt.c
  genirq: deprecate obsolete typedefs and defines
  genirq: deprecate __do_IRQ
  genirq: add doc to struct irqaction
  genirq: use kzalloc instead of explicit zero initialization
  genirq: make irqreturn_t an enum
  genirq: remove redundant if condition
  genirq: remove unused hw_irq_controller typedef
  irq: export remove_irq() and setup_irq() symbols
  irq: match remove_irq() args with setup_irq()
  irq: add remove_irq() for freeing of setup_irq() irqs
  genirq: assert that irq handlers are indeed running in hardirq context
  irq: name 'p' variables a bit better
  irq: further clean up the free_irq() code flow
  irq: refactor and clean up the free_irq() code flow
  irq: clean up manage.c
  irq: use GFP_KERNEL for action allocation in request_irq()
  kernel/irq: fix sparse warning: make symbol static
  irq: optimize init_kstat_irqs/init_copy_kstat_irqs
  ...
2009-03-26 16:06:50 -07:00
Linus Torvalds
6671de344c Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
  posix timers: fix RLIMIT_CPU && fork()
  time: ntp: fix bug in ntp_update_offset() & do_adjtimex(), fix
  time: ntp: clean up second_overflow()
  time: ntp: simplify ntp_tick_adj calculations
  time: ntp: make 64-bit constants more robust
  time: ntp: refactor do_adjtimex() some more
  time: ntp: refactor do_adjtimex()
  time: ntp: fix bug in ntp_update_offset() & do_adjtimex()
  time: ntp: micro-optimize ntp_update_offset()
  time: ntp: simplify ntp_update_offset_fll()
  time: ntp: refactor and clean up ntp_update_offset()
  time: ntp: refactor up ntp_update_frequency()
  time: ntp: clean up ntp_update_frequency()
  time: ntp: simplify the MAX_TICKADJ_SCALED definition
  time: ntp: simplify the second_overflow() code flow
  time: ntp: clean up kernel/time/ntp.c
  x86: hpet: stop HPET_COUNTER when programming periodic mode
  x86: hpet: provide separate functions to stop and start the counter
  x86: hpet: print HPET registers during setup (if hpet=verbose is used)
  time: apply NTP frequency/tick changes immediately
  ...
2009-03-26 16:05:42 -07:00
Linus Torvalds
831576fe40 Merge branch 'sched-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (46 commits)
  sched: Add comments to find_busiest_group() function
  sched: Refactor the power savings balance code
  sched: Optimize the !power_savings_balance during fbg()
  sched: Create a helper function to calculate imbalance
  sched: Create helper to calculate small_imbalance in fbg()
  sched: Create a helper function to calculate sched_domain stats for fbg()
  sched: Define structure to store the sched_domain statistics for fbg()
  sched: Create a helper function to calculate sched_group stats for fbg()
  sched: Define structure to store the sched_group statistics for fbg()
  sched: Fix indentations in find_busiest_group() using gotos
  sched: Simple helper functions for find_busiest_group()
  sched: remove unused fields from struct rq
  sched: jiffies not printed per CPU
  sched: small optimisation of can_migrate_task()
  sched: fix typos in documentation
  sched: add avg_overlap decay
  x86, sched_clock(): mark variables read-mostly
  sched: optimize ttwu vs group scheduling
  sched: TIF_NEED_RESCHED -> need_reshed() cleanup
  sched: don't rebalance if attached on NULL domain
  ...
2009-03-26 16:05:01 -07:00
David S. Miller
08abe18af1 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/
Conflicts:
	drivers/net/wimax/i2400m/usb-notif.c
2009-03-26 15:23:24 -07:00
Linus Torvalds
0c93ea4064 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (61 commits)
  Dynamic debug: fix pr_fmt() build error
  Dynamic debug: allow simple quoting of words
  dynamic debug: update docs
  dynamic debug: combine dprintk and dynamic printk
  sysfs: fix some bin_vm_ops errors
  kobject: don't block for each kobject_uevent
  sysfs: only allow one scheduled removal callback per kobj
  Driver core: Fix device_move() vs. dpm list ordering, v2
  Driver core: some cleanup on drivers/base/sys.c
  Driver core: implement uevent suppress in kobject
  vcs: hook sysfs devices into object lifetime instead of "binding"
  driver core: fix passing platform_data
  driver core: move platform_data into platform_device
  sysfs: don't block indefinitely for unmapped files.
  driver core: move knode_bus into private structure
  driver core: move knode_driver into private structure
  driver core: move klist_children into private structure
  driver core: create a private portion of struct device
  driver core: remove polling for driver_probe_done(v5)
  sysfs: reference sysfs_dirent from sysfs inodes
  ...

Fixed conflicts in drivers/sh/maple/maple.c manually
2009-03-26 11:17:04 -07:00
Ingo Molnar
7c526e1fef Merge branches 'timers/new-apis', 'timers/ntp' and 'timers/urgent' into timers/core 2009-03-26 15:45:52 +01:00
Ingo Molnar
a5ebc0b1a7 Merge commit 'v2.6.29' into timers/core 2009-03-26 15:45:22 +01:00
Gautham R Shenoy
b7bb4c9bb0 sched: Add comments to find_busiest_group() function
Impact: cleanup

Add /** style comments around find_busiest_group(). Also add a few
explanatory comments.

This concludes the find_busiest_group() cleanup. The function is
now down to 72 lines from the original 313 lines.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091427.13992.18933.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25 13:28:30 +01:00
Gautham R Shenoy
c071df1852 sched: Refactor the power savings balance code
Impact: cleanup

Create seperate helper functions to initialize the
power-savings-balance related variables, to update them and
to check if we have a scope for performing power-savings balance.

Add no-op inline functions for the !(CONFIG_SCHED_MC || CONFIG_SCHED_SMT)
case.

This will eliminate all the #ifdef jungle in find_busiest_group() and the
other helper functions.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091422.13992.73616.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25 10:30:48 +01:00
Gautham R Shenoy
a021dc0337 sched: Optimize the !power_savings_balance during fbg()
Impact: cleanup, micro-optimization

We don't need to perform power_savings balance if either the
cpu is NOT_IDLE or if the sched_domain doesn't contain the
SD_POWERSAVINGS_BALANCE flag set.

Currently, we check for these conditions multiple number of
times, even though these variables don't change over the scope
of find_busiest_group().

Check once, and store the value in the already exiting
"power_savings_balance" variable.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091417.13992.2657.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25 10:30:48 +01:00
Gautham R Shenoy
dbc523a3b8 sched: Create a helper function to calculate imbalance
Move all the imbalance calculation out of find_busiest_group()
through this helper function.

With this change, the structure of find_busiest_group() will be
as follows:

- update_sched_domain_statistics.

- check if imbalance exits.

- update imbalance and return busiest.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091411.13992.43293.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25 10:30:47 +01:00
Gautham R Shenoy
2e6f44aeda sched: Create helper to calculate small_imbalance in fbg()
Impact: cleanup

We have two places in find_busiest_group() where we need to calculate
the minor imbalance before returning the busiest group. Encapsulate
this functionality into a seperate helper function.

Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
LKML-Reference: <20090325091406.13992.54316.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25 10:30:47 +01:00
Gautham R Shenoy
37abe198b1 sched: Create a helper function to calculate sched_domain stats for fbg()
Impact: cleanup

Create a helper function named update_sd_lb_stats() to update the
various sched_domain related statistics in find_busiest_group().

With this we would have moved all the statistics computation out of
find_busiest_group().

Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
LKML-Reference: <20090325091401.13992.88737.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25 10:30:46 +01:00
Gautham R Shenoy
222d656dea sched: Define structure to store the sched_domain statistics for fbg()
Impact: cleanup

Currently we use a lot of local variables in find_busiest_group()
to capture the various statistics related to the sched_domain.
Group them together into a single data structure.

This will help us to offload the job of updating the sched_domain
statistics to a helper function.

Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
LKML-Reference: <20090325091356.13992.25970.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25 10:30:46 +01:00
Gautham R Shenoy
1f8c553d0f sched: Create a helper function to calculate sched_group stats for fbg()
Impact: cleanup

Create a helper function named update_sg_lb_stats() which
can be invoked to calculate the individual group's statistics
in find_busiest_group().

This reduces the lenght of find_busiest_group() considerably.

Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Aked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
LKML-Reference: <20090325091351.13992.43461.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25 10:30:45 +01:00
Gautham R Shenoy
381be78fdc sched: Define structure to store the sched_group statistics for fbg()
Impact: cleanup

Currently a whole bunch of variables are used to store the
various statistics pertaining to the groups we iterate over
in find_busiest_group().

Group them together in a single data structure and add
appropriate comments.

This will be useful later on when we create helper functions
to calculate the sched_group statistics.

Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
LKML-Reference: <20090325091345.13992.20099.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25 10:30:45 +01:00
Gautham R Shenoy
6dfdb06290 sched: Fix indentations in find_busiest_group() using gotos
Impact: cleanup

Some indentations in find_busiest_group() can minimized by using
early exits with the help of gotos. This improves readability in
a couple of cases.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091340.13992.45062.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25 10:30:44 +01:00
Gautham R Shenoy
67bb6c036d sched: Simple helper functions for find_busiest_group()
Impact: cleanup

Currently the load idx calculation code is in find_busiest_group().
Move that to a static inline helper function.

Similary, to find the first cpu of a sched_group we use
cpumask_first(sched_group_cpus(group))

Use a helper to that. It improves readability in some cases.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091335.13992.55424.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25 10:30:44 +01:00
Ingo Molnar
b6d9842258 Merge branch 'sched/cleanups'; commit 'v2.6.29' into sched/core 2009-03-25 10:26:51 +01:00
Jason Baron
e9d376f0fa dynamic debug: combine dprintk and dynamic printk
This patch combines Greg Bank's dprintk() work with the existing dynamic
printk patchset, we are now calling it 'dynamic debug'.

The new feature of this patchset is a richer /debugfs control file interface,
(an example output from my system is at the bottom), which allows fined grained
control over the the debug output. The output can be controlled by function,
file, module, format string, and line number.

for example, enabled all debug messages in module 'nf_conntrack':

echo -n 'module nf_conntrack +p' > /mnt/debugfs/dynamic_debug/control

to disable them:

echo -n 'module nf_conntrack -p' > /mnt/debugfs/dynamic_debug/control

A further explanation can be found in the documentation patch.

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-24 16:38:26 -07:00
Luis Henriques
67aa0f767a sched: remove unused fields from struct rq
Impact: cleanup, new schedstat ABI

Since they are used on in statistics and are always set to zero, the
following fields from struct rq have been removed: yld_exp_empty,
yld_act_empty and yld_both_empty.

Both Sched Debug and SCHEDSTAT_VERSION versions has also been
incremented since ABIs have been changed.

The schedtop tool has been updated to properly handle new version of
schedstat:

   http://rt.wiki.kernel.org/index.php/Schedtop_utility

Signed-off-by: Luis Henriques <henrix@sapo.pt>
Acked-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090324221002.GA10061@hades.domain.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-24 23:16:51 +01:00
Benjamin Herrenschmidt
9e41d9597e Merge commit 'origin/master' into next 2009-03-24 13:38:30 +11:00
James Morris
703a3cd728 Merge branch 'master' into next 2009-03-24 10:52:46 +11:00
Oleg Nesterov
37bebc70d7 posix timers: fix RLIMIT_CPU && fork()
See http://bugzilla.kernel.org/show_bug.cgi?id=12911

copy_signal() copies signal->rlim, but RLIMIT_CPU is "lost". Because
posix_cpu_timers_init_group() sets cputime_expires.prof_exp = 0 and thus
fastpath_timer_check() returns false unless we have other cpu timers.

This is the minimal fix for 2.6.29 (tested) and 2.6.28. The patch is not
optimal, we need further cleanups here. With this patch update_rlimit_cpu()
is not really needed, but I don't think it should be removed.

The proper fix (I think) is:

	- set_process_cpu_timer() should just start the cputimer->running
	  logic (it does), no need to change cputime_expires.xxx_exp

	- posix_cpu_timers_init_group() should set ->running when needed

	- fastpath_timer_check() can check ->running instead of
	  task_cputime_zero(signal->cputime_expires)

Reported-by: Peter Lojkin <ia6432@inbox.ru>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: <stable@kernel.org> [for 2.6.29.x]
LKML-Reference: <20090323193411.GA17514@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-23 20:43:35 +01:00
Miklos Szeredi
53da1d9456 fix ptrace slowness
This patch fixes bug #12208:

  Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=12208
  Subject         : uml is very slow on 2.6.28 host

This turned out to be not a scheduler regression, but an already
existing problem in ptrace being triggered by subtle scheduler
changes.

The problem is this:

 - task A is ptracing task B
 - task B stops on a trace event
 - task A is woken up and preempts task B
 - task A calls ptrace on task B, which does ptrace_check_attach()
 - this calls wait_task_inactive(), which sees that task B is still on the runq
 - task A goes to sleep for a jiffy
 - ...

Since UML does lots of the above sequences, those jiffies quickly add
up to make it slow as hell.

This patch solves this by not rescheduling in read_unlock() after
ptrace_stop() has woken up the tracer.

Thanks to Oleg Nesterov and Ingo Molnar for the feedback.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-23 09:22:31 -07:00
Ingo Molnar
efd247fa34 Merge branches 'sched/debug' and 'linus' into sched/core 2009-03-23 16:53:20 +01:00
Ingo Molnar
b3e3b302cf Merge branches 'irq/sparseirq' and 'linus' into irq/core 2009-03-23 10:07:49 +01:00
Ingo Molnar
4a44bac1f9 symbols, stacktrace: look up init symbols after module symbols
Impact: fix incomplete stacktraces

I noticed such weird stacktrace entries in lockdep dumps:

[    0.285956] {HARDIRQ-ON-W} state was registered at:
[    0.285956]   [<ffffffff802bce90>] mark_irqflags+0xbe/0x125
[    0.285956]   [<ffffffff802bf2fd>] __lock_acquire+0x674/0x82d
[    0.285956]   [<ffffffff802bf5b2>] lock_acquire+0xfc/0x128
[    0.285956]   [<ffffffff8135b636>] rt_spin_lock+0xc8/0xd0
[    0.285956]   [<ffffffffffffffff>] 0xffffffffffffffff

The stacktrace entry is cut off after rt_spin_lock.

After much debugging i found out that stacktrace entries that
belong to init symbols dont get printed out, due to commit:

  a2da405: module: Don't report discarded init pages as kernel text.

The reason is this check added to core_kernel_text():

-       if (addr >= (unsigned long)_sinittext &&
+       if (system_state == SYSTEM_BOOTING &&
+           addr >= (unsigned long)_sinittext &&
            addr <= (unsigned long)_einittext)
                return 1;

This will discard inittext symbols even though their symbol table
is still present and even though stacktraces done while the system
was booting up might still be relevant.

To not reintroduce the (not well-specified) bug addressed in that
commit, first do a module symbols lookup, then a final init-symbols
lookup.

This will work fine on architectures that have separate address
spaces for modules (such as x86) - and should not crash any other
architectures either.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <new-discussion>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-19 13:38:35 +01:00
Luis Henriques
af66df5ecf sched: jiffies not printed per CPU
The jiffies value was being printed for each CPU, which does not seem to make
sense.  Moved jiffies to system section.

Signed-off-by: Luis Henriques <henrix@sapo.pt>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090318000425.GA2228@hades.domain.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18 09:57:26 +01:00
Masami Hiramatsu
6e2b75740b module: fix refptr allocation and release order
Impact: fix ref-after-free crash on failed module load

Fix refptr bug: Change refptr allocation and release order not to access a module
data structure pointed by 'mod' after freeing mod->module_core.
This bug will cause kernel panic(e.g. failed to find undefined symbols).

This bug was reported on systemtap bugzilla.
http://sources.redhat.com/bugzilla/show_bug.cgi?id=9927

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-18 09:31:21 +10:30
Luis Henriques
708dc51253 sched: small optimisation of can_migrate_task()
There were 3 invocations of task_hot() in can_migrate_task().

Replace these 3 invocations by only one invocation, cached in
a local variable.

Signed-off-by: Luis Henriques <henrix@sapo.pt>
LKML-Reference: <20090316195902.GA6197@hades.domain.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-17 12:04:41 +01:00
Luis Henriques
80dd99b368 sched: fix typos in documentation
Fixed typos in function documentation.

Signed-off-by: Luis Henriques <henrix@sapo.pt>
LKML-Reference: <20090316195809.GA6073@hades.domain.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-17 12:04:40 +01:00
Ingo Molnar
edb35028e4 Merge branches 'irq/genirq' and 'linus' into irq/core 2009-03-16 09:20:13 +01:00
Thomas Gleixner
0e57aa11ab genirq: deprecate __do_IRQ
Two years migration time is enough. Remove the compability cruft.

Add the deprecated warning in kernel/irq/handle.c because marking
__do_IRQ itself is way too noisy.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-03-13 16:34:02 +01:00
Thomas Gleixner
4553573277 genirq: use kzalloc instead of explicit zero initialization
Impact: simplification

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2009-03-13 14:32:29 +01:00
Thomas Gleixner
c8e2aeef0b genirq: remove redundant if condition
Impact: cleanup

The code is only compiled if CONFIG_GENERIC_HARDIRQS=y so another
check for this define in the code is redundant. Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-03-13 14:32:28 +01:00
Ingo Molnar
c95dbf27e2 panic: clean up kernel/panic.c
Impact: cleanup, no code changed

Clean up kernel/panic.c some more and make it more consistent.

LKML-Reference: <49B91A7E.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 11:25:53 +01:00
Ingo Molnar
d1dedb52ac panic, smp: provide smp_send_stop() wrapper on UP too
Impact: cleanup, no code changed

Remove an ugly #ifdef CONFIG_SMP from panic(), by providing
an smp_send_stop() wrapper on UP too.

LKML-Reference: <49B91A7E.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 11:24:31 +01:00
Ingo Molnar
ffd71da4e3 panic: decrease oops_in_progress only after having done the panic
Impact: eliminate secondary warnings during panic()

We can panic() in a number of difficult, atomic contexts, hence
we use bust_spinlocks(1) in panic() to increase oops_in_progress,
which prevents various debug checks we have in place.

But in practice this protection only covers the first few printk's
done by panic() - it does not cover the later attempt to stop all
other CPUs and kexec(). If a secondary warning triggers in one of
those facilities that can make the panic message scroll off.

So do bust_spinlocks(0) only much later in panic(). (which code
is only reached if panic policy is relaxed that it can return
after a warning message)

Reported-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B91A7E.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 11:06:47 +01:00
Ingo Molnar
cd80a8142e Merge branch 'x86/core' into core/ipi 2009-03-13 11:05:58 +01:00
Ingo Molnar
641cd4cfcd generic-ipi: eliminate WARN_ON()s during oops/panic
Do not output smp-call related warnings in the oops/panic codepath.

Reported-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <49B91A7E.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 10:47:34 +01:00
Ingo Molnar
88f502fedb futex: remove the pointer math from double_unlock_hb, fix
Impact: fix double unlock crash

Thomas Gleixner noticed that the simplified double_unlock_hb()
became ... too unsophisticated: in the hb1 == hb2 case it will
do a double unlock.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Darren Hart <dvhltc@us.ibm.com>
LKML-Reference: <20090312221118.11146.68610.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 10:32:07 +01:00
Ingo Molnar
f6411fe7e0 Merge branches 'sched/clock', 'sched/urgent' and 'linus' into sched/core 2009-03-13 04:50:44 +01:00
Ingo Molnar
25d500067d Merge branch 'linus' into core/ipi 2009-03-13 02:14:25 +01:00
Ingo Molnar
d820ac4c2f locking: rename trace_softirq_[enter|exit] => lockdep_softirq_[enter|exit]
Impact: cleanup

The naming clashes with upcoming softirq tracepoints, so rename the
APIs to lockdep_*().

Requested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 01:32:36 +01:00
Ingo Molnar
3c1f67d60e Merge branch 'linus' into core/locking 2009-03-13 01:29:17 +01:00
Darren Hart
f061d35150 futex: remove the pointer math from double_unlock_hb
Impact: simplify code

I mistakenly included the pointer value ordering in the
double_unlock_hb() in my previous patch. It's only necessary
in the double_lock_hb() function. This patch removes it.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <20090312221118.11146.68610.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 01:15:46 +01:00
Magnus Damm
eb53b4e8fe irq: export remove_irq() and setup_irq() symbols
Export the setup_irq() and remove_irq() symbols.

I'd like to export these functions since I have timer
code that needs to use setup_irq() early on (too early
for request_irq()), and the same code can also be
compiled as a module.

Signed-off-by: Magnus Damm <damm@igel.co.jp>
LKML-Reference: <20090312120559.2926.82371.sendpatchset@rx1.opensource.se>
[ changed to _GPL as these are special APIs deep inside the irq layer. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-12 13:16:33 +01:00
Magnus Damm
cbf94f0682 irq: match remove_irq() args with setup_irq()
Modify remove_irq() to match setup_irq().

Signed-off-by: Magnus Damm <damm@igel.co.jp>
LKML-Reference: <20090312120551.2926.43942.sendpatchset@rx1.opensource.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-12 13:16:33 +01:00
Magnus Damm
f21cfb258d irq: add remove_irq() for freeing of setup_irq() irqs
Impact: add new API

This patch adds a remove_irq() function for releasing
interrupts requested with setup_irq().

Without this patch we have no way of releasing such
interrupts since free_irq() today tries to kfree()
the irqaction passed with setup_irq().

Signed-off-by: Magnus Damm <damm@igel.co.jp>
LKML-Reference: <20090312120542.2926.56609.sendpatchset@rx1.opensource.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-12 13:16:32 +01:00
Ingo Molnar
f8cb22cbb8 Merge branch 'linus' into irq/genirq 2009-03-12 13:16:18 +01:00
Darren Hart
e4dc5b7a36 futex: clean up fault logic
Impact: cleanup

Older versions of the futex code held the mmap_sem which had to
be dropped in order to call get_user(), so a two-pronged fault
handling mechanism was employed to handle faults of the atomic
operations.  The mmap_sem is no longer held, so get_user()
should be adequate.  This patch greatly simplifies the logic and
improves legibility.

Build and boot tested on a 4 way Intel x86_64 workstation.
Passes basic pthread_mutex and PI tests out of
ltp/testcases/realtime.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <20090312075612.9856.48612.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-12 11:20:57 +01:00
Darren Hart
e8f6386c01 futex: unlock before returning -EFAULT
Impact: rt-mutex failure case fix

futex_lock_pi can potentially return -EFAULT with the rt_mutex
held.  This seems like the wrong thing to do as userspace should
assume -EFAULT means the lock was not taken.  Even if it could
figure this out, we'd be leaving the pi_state->owner in an
inconsistent state.  This patch unlocks the rt_mutex prior to
returning -EFAULT to userspace.

Build and boot tested on a 4 way Intel x86_64 workstation.
Passes basic pthread_mutex and PI tests out of
ltp/testcases/realtime.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <20090312075606.9856.88729.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-12 11:20:57 +01:00
Darren Hart
16f4993f4e futex: use current->time_slack_ns for rt tasks too
RT tasks should set their timer slack to 0 on their own.  This
patch removes the 'if (rt_task()) slack = 0;' block in
futex_wait.

Build and boot tested on a 4 way Intel x86_64 workstation.
Passes basic pthread_mutex and PI tests out of
ltp/testcases/realtime.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <20090312075559.9856.28822.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-12 11:20:57 +01:00
Darren Hart
5eb3dc62fc futex: add double_unlock_hb()
Impact: cleanup

The futex code uses double_lock_hb() which locks the hb->lock's
in pointer value order.  There is no parallel unlock routine,
and the code unlocks them in name order, ignoring pointer value.

This patch adds double_unlock_hb() to refactor the duplicated
code segments.

Build and boot tested on a 4 way Intel x86_64 workstation.
Passes basic pthread_mutex and PI tests out of
ltp/testcases/realtime.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <20090312075552.9856.48021.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-12 11:20:56 +01:00
Darren Hart
de87fcc124 futex: additional (get|put)_futex_key() fixes
Impact: fix races

futex_requeue and futex_lock_pi still had some bad
(get|put)_futex_key() usage. This patch adds the missing
put_futex_keys() and corrects a goto in futex_lock_pi() to avoid
a double get.

Build and boot tested on a 4 way Intel x86_64 workstation.
Passes basic pthread_mutex and PI tests out of
ltp/testcases/realtime.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <20090312075545.9856.75152.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-12 11:20:56 +01:00
Darren Hart
b2d0994b13 futex: update futex commentary
Impact: cleanup

The futex_hash_bucket can be a bit confusing when first looking
at the code as it is a shared queue (and futex_q isn't a queue
at all, but rather an element on the queue).

The mmap_sem is no longer held outside of the
futex_handle_fault() routine, yet numerous comments refer to it.
The fshared argument is no an integer.  I left some of these
comments along as they are simply removed in future patches.

Some of the commentary refering to futexes by virtual page
mappings was not very clear, and completely accurate (as for
shared futexes both the page and the offset are used to
determine the key).  For the purposes of the function
description, just referring to "the futex" seems sufficient.

With hashed futexes we now access the page after the hash-bucket
is locked, and not only after it is enqueued.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <20090312075537.9856.29954.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-12 11:20:55 +01:00
Mike Galbraith
df1c99d416 sched: add avg_overlap decay
Impact: more precise avg_overlap metric - better load-balancing

avg_overlap is used to measure the runtime overlap of the waker and
wakee.

However, when a process changes behaviour, eg a pipe becomes
un-congested and we don't need to go to sleep after a wakeup
for a while, the avg_overlap value grows stale.

When running we use the avg runtime between preemption as a
measure for avg_overlap since the amount of runtime can be
correlated to cache footprint.

The longer we run, the less likely we'll be wanting to be
migrated to another CPU.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1236709131.25234.576.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-11 11:31:50 +01:00
Ingo Molnar
78b020d035 Merge branches 'x86/cleanups', 'x86/kexec', 'x86/mce2' and 'linus' into x86/core 2009-03-11 10:49:15 +01:00
Benjamin Herrenschmidt
e14eee56c2 Merge commit 'origin/master' into next 2009-03-11 17:10:07 +11:00
Dhaval Giani
be50b8342d kernel/user.c: fix a memory leak when freeing up non-init usernamespaces users
We were returning early in the sysfs directory cleanup function if the
user belonged to a non init usernamespace.  Due to this a lot of the
cleanup was not done and we were left with a leak.  Fix the leak.

Reported-by: Serge Hallyn <serue@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Tested-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-10 15:55:11 -07:00
Peter Zijlstra
57310a98a3 sched: optimize ttwu vs group scheduling
Impact: micro-optimization

We can avoid the sched domain walk on try_to_wake_up() when we know
there are no groups.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1236603381.8389.455.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-10 16:43:36 +01:00
Ingo Molnar
8c54436ae9 Merge branches 'sched/cleanups' and 'linus' into sched/core 2009-03-10 16:34:43 +01:00
Américo Wang
bbb76b552a ptrace: remove a useless goto
Impact: cleanup

Obviously, this goto is useless. Remove it.

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Roland McGrath <roland@redhat.com>
LKML-Reference: <20090310093447.GC3179@hack>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-10 13:39:29 +01:00
Ingo Molnar
467c88fee5 Merge branches 'x86/apic', 'x86/asm', 'x86/fixmap', 'x86/memtest', 'x86/mm', 'x86/urgent', 'linus' and 'core/percpu' into x86/core 2009-03-10 09:26:38 +01:00
Oleg Nesterov
2d5516cbb9 copy_process: fix CLONE_PARENT && parent_exec_id interaction
CLONE_PARENT can fool the ->self_exec_id/parent_exec_id logic. If we
re-use the old parent, we must also re-use ->parent_exec_id to make
sure exit_notify() sees the right ->xxx_exec_id's when the CLONE_PARENT'ed
task exits.

Also, move down the "p->parent_exec_id = p->self_exec_id" thing, to place
two different cases together.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-09 13:23:25 -07:00
Heiko Carstens
6d5b5acca9 Fix fixpoint divide exception in acct_update_integrals
Frans Pop reported the crash below when running an s390 kernel under Hercules:

  Kernel BUG at 000738b4  verbose debug info unavailable!
  fixpoint divide exception: 0009  #1! SMP
  Modules linked in: nfs lockd nfs_acl sunrpc ctcm fsm tape_34xx
     cu3088 tape ccwgroup tape_class ext3 jbd mbcache dm_mirror dm_log dm_snapshot
     dm_mod dasd_eckd_mod dasd_mod
  CPU: 0 Not tainted 2.6.27.19 #13
  Process awk (pid: 2069, task: 0f9ed9b8, ksp: 0f4f7d18)
  Krnl PSW : 070c1000 800738b4 (acct_update_integrals+0x4c/0x118)
             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0
  Krnl GPRS: 00000000 000007d0 7fffffff fffff830
             00000000 ffffffff 00000002 0f9ed9b8
             00000000 00008ca0 00000000 0f9ed9b8
             0f9edda4 8007386e 0f4f7ec8 0f4f7e98
  Krnl Code: 800738aa: a71807d0         lhi     %r1,2000
             800738ae: 8c200001         srdl    %r2,1
             800738b2: 1d21             dr      %r2,%r1
            >800738b4: 5810d10e         l       %r1,270(%r13)
             800738b8: 1823             lr      %r2,%r3
             800738ba: 4130f060         la      %r3,96(%r15)
             800738be: 0de1             basr    %r14,%r1
             800738c0: 5800f060         l       %r0,96(%r15)
  Call Trace:
  ( <000000000004fdea>! blocking_notifier_call_chain+0x1e/0x2c)
    <0000000000038502>! do_exit+0x106/0x7c0
    <0000000000038c36>! do_group_exit+0x7a/0xb4
    <0000000000038c8e>! SyS_exit_group+0x1e/0x30
    <0000000000021c28>! sysc_do_restart+0x12/0x16
    <0000000077e7e924>! 0x77e7e924

Reason for this is that cpu time accounting usually only happens from
interrupt context, but acct_update_integrals gets also called from
process context with interrupts enabled.

So in acct_update_integrals we may end up with the following scenario:

Between reading tsk->stime/tsk->utime and tsk->acct_timexpd an interrupt
happens which updates accouting values.  This causes acct_timexpd to be
greater than the former stime + utime.  The subsequent calculation of

	dtime = cputime_sub(time, tsk->acct_timexpd);

will be negative and the division performed by

	cputime_to_jiffies(dtime)

will generate an exception since the result won't fit into a 32 bit
register.

In order to fix this just always disable interrupts while accessing any
of the accounting values.

Reported by: Frans Pop <elendil@planet.nl>
Tested by: Frans Pop <elendil@planet.nl>
Cc: stable@kernel.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-09 08:13:35 -07:00
Lai Jiangshan
5ed0cec0ac sched: TIF_NEED_RESCHED -> need_reshed() cleanup
Impact: cleanup

Use test_tsk_need_resched(), set_tsk_need_resched(), need_resched()
instead of using TIF_NEED_RESCHED.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <49B10BA4.9070209@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-06 12:48:55 +01:00
Ingo Molnar
7fc07d8410 Merge branch 'sched/core' into sched/cleanups 2009-03-06 11:47:52 +01:00
Tejun Heo
edcb463997 percpu, module: implement reserved allocation and use it for module percpu variables
Impact: add reserved allocation functionality and use it for module
	percpu variables

This patch implements reserved allocation from the first chunk.  When
setting up the first chunk, arch can ask to set aside certain number
of bytes right after the core static area which is available only
through a separate reserved allocator.  This will be used primarily
for module static percpu variables on architectures with limited
relocation range to ensure that the module perpcu symbols are inside
the relocatable range.

If reserved area is requested, the first chunk becomes reserved and
isn't available for regular allocation.  If the first chunk also
includes piggy-back dynamic allocation area, a separate chunk mapping
the same region is created to serve dynamic allocation.  The first one
is called static first chunk and the second dynamic first chunk.
Although they share the page map, their different area map
initializations guarantee they serve disjoint areas according to their
purposes.

If arch doesn't setup reserved area, reserved allocation is handled
like any other allocation.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-06 14:33:59 +09:00
Ingo Molnar
a1413c89ae Merge branch 'x86/urgent' into x86/core
Conflicts:
	arch/x86/include/asm/fixmap_64.h
Semantic merge:
	arch/x86/include/asm/fixmap.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 21:48:50 +01:00
Frederic Weisbecker
8a0be9ef82 sched: don't rebalance if attached on NULL domain
Impact: fix function graph trace hang / drop pointless softirq on UP

While debugging a function graph trace hang on an old PII, I saw
that it consumed most of its time on the timer interrupt. And
the domain rebalancing softirq was the most concerned.

The timer interrupt calls trigger_load_balance() which will
decide if it is worth to schedule a rebalancing softirq.

In case of builtin UP kernel, no problem arises because there is
no domain question.

In case of builtin SMP kernel running on an SMP box, still no
problem, the softirq will be raised each time we reach the
next_balance time.

In case of builtin SMP kernel running on a UP box (most distros
provide default SMP kernels, whatever the box you have), then
the CPU is attached to the NULL sched domain. So a kind of
unexpected behaviour happen:

trigger_load_balance() -> raises the rebalancing softirq later
on softirq: run_rebalance_domains() -> rebalance_domains() where
the for_each_domain(cpu, sd) is not taken because of the NULL
domain we are attached at. Which means rq->next_balance is never
updated. So on the next timer tick, we will enter
trigger_load_balance() which will always reschedule() the
rebalacing softirq:

if (time_after_eq(jiffies, rq->next_balance))
	raise_softirq(SCHED_SOFTIRQ);

So for each tick, we process this pointless softirq.

This patch fixes it by checking if we are attached to the null
domain before raising the softirq, another possible fix would be
to set the maximal possible JIFFIES value to rq->next_balance if
we are attached to the NULL domain.

v2: build fix on UP

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <49af242d.1c07d00a.32d5.ffffc019@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:04:44 +01:00
Ingo Molnar
49d2d266ad Merge commit 'v2.6.29-rc7' into sched/core 2009-03-05 11:59:10 +01:00
David Rientjes
03d78913f0 lockdep: remove duplicate CONFIG_DEBUG_LOCKDEP definitions
Impact: cleanup

The atomic debug modifiers are already defined in
kernel/lockdep_internals.h.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <alpine.DEB.2.00.0903050222160.30401@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 11:46:00 +01:00
Ingo Molnar
a140feab42 Merge commit 'v2.6.29-rc7' into core/locking 2009-03-05 11:45:22 +01:00
David S. Miller
508827ff0a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/tokenring/tmspci.c
	drivers/net/ucc_geth_mii.c
2009-03-05 02:06:47 -08:00
Eric Dumazet
64ca5ab913 rcu: increment quiescent state counter in ksoftirqd()
If a machine is flooded by network frames, a cpu can loop
100% of its time inside ksoftirqd() without calling schedule().
This can delay RCU grace period to insane values.

Adding rcu_qsctr_inc() call in ksoftirqd() solves this problem.

Paul: "This regression was a result of the recent change from
"schedule()" to "cond_resched()", which got rid of that quiescent
state in the common case where a reschedule is not needed".

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 22:08:45 +01:00
Peter Zijlstra
26575e28df lockdep: remove extra "irq" string
Impact: clarify lockdep printk text

print_irq_inversion_bug() gets handed state strings of the form

  "HARDIRQ", "SOFTIRQ", "RECLAIM_FS"

and appends "-irq-{un,}safe" to them, which is either redudant for *IRQ or
confusing in the RECLAIM_FS case.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1236175192.5330.7585.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 18:38:34 +01:00
Peter Zijlstra
1c21f14ec4 lockdep: fix incorrect state name
In the recent mark_lock_irq() rework a bug snuck in that would report the
state of write locks causing irq inversion under a read lock as a read
lock.

Fix this by masking the read bit of the state when validating write
dependencies.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1236172646.5330.7450.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 18:38:33 +01:00
Ingo Molnar
91d75e209b Merge branch 'x86/core' into core/percpu 2009-03-04 02:29:19 +01:00
Ingo Molnar
8b0e5860cb Merge branches 'x86/apic', 'x86/cpu', 'x86/fixmap', 'x86/mm', 'x86/sched', 'x86/setup-lzma', 'x86/signal' and 'x86/urgent' into x86/core 2009-03-04 02:22:31 +01:00
Linus Torvalds
219f170a85 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: don't allow setuid to succeed if the user does not have rt bandwidth
  sched_rt: don't start timer when rt bandwidth disabled
2009-03-03 14:33:20 -08:00
Linus Torvalds
b24746c7be Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: Teach RCU that idle task is not quiscent state at boot
2009-03-03 14:32:04 -08:00
Roland McGrath
5b1017404a x86-64: seccomp: fix 32/64 syscall hole
On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
ljmp, and then use the "syscall" instruction to make a 64-bit system
call.  A 64-bit process make a 32-bit system call with int $0x80.

In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
the wrong system call number table.  The fix is simple: test TS_COMPAT
instead of TIF_IA32.  Here is an example exploit:

	/* test case for seccomp circumvention on x86-64

	   There are two failure modes: compile with -m64 or compile with -m32.

	   The -m64 case is the worst one, because it does "chmod 777 ." (could
	   be any chmod call).  The -m32 case demonstrates it was able to do
	   stat(), which can glean information but not harm anything directly.

	   A buggy kernel will let the test do something, print, and exit 1; a
	   fixed kernel will make it exit with SIGKILL before it does anything.
	*/

	#define _GNU_SOURCE
	#include <assert.h>
	#include <inttypes.h>
	#include <stdio.h>
	#include <linux/prctl.h>
	#include <sys/stat.h>
	#include <unistd.h>
	#include <asm/unistd.h>

	int
	main (int argc, char **argv)
	{
	  char buf[100];
	  static const char dot[] = ".";
	  long ret;
	  unsigned st[24];

	  if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
	    perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");

	#ifdef __x86_64__
	  assert ((uintptr_t) dot < (1UL << 32));
	  asm ("int $0x80 # %0 <- %1(%2 %3)"
	       : "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
	  ret = snprintf (buf, sizeof buf,
			  "result %ld (check mode on .!)\n", ret);
	#elif defined __i386__
	  asm (".code32\n"
	       "pushl %%cs\n"
	       "pushl $2f\n"
	       "ljmpl $0x33, $1f\n"
	       ".code64\n"
	       "1: syscall # %0 <- %1(%2 %3)\n"
	       "lretl\n"
	       ".code32\n"
	       "2:"
	       : "=a" (ret) : "0" (4), "D" (dot), "S" (&st));
	  if (ret == 0)
	    ret = snprintf (buf, sizeof buf,
			    "stat . -> st_uid=%u\n", st[7]);
	  else
	    ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
	#else
	# error "not this one"
	#endif

	  write (1, buf, ret);

	  syscall (__NR_exit, 1);
	  return 2;
	}

Signed-off-by: Roland McGrath <roland@redhat.com>
[ I don't know if anybody actually uses seccomp, but it's enabled in
  at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-02 15:41:30 -08:00
Peter Zijlstra
044d408409 genirq: assert that irq handlers are indeed running in hardirq context
Make sure the genirq layer handlers are indeed running handlers
in hardirq context. That is the genirq expectation and doing
anything else is broken.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1236006812.5330.632.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 00:05:45 +01:00
Ingo Molnar
c02368a9d0 Merge branch 'linus' into irq/genirq 2009-03-02 22:08:56 +01:00
Wang Chen
b67802ea80 sched: kill unused parameter of pick_next_task()
Impact: micro-optimization

Parameter "prev" is not used really.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 12:02:53 +01:00
Ingo Molnar
5512b3ece0 Merge branches 'sched/clock', 'sched/urgent' and 'linus' into sched/core 2009-03-02 12:02:36 +01:00
David S. Miller
aa4abc9bcc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-tx.c
	net/8021q/vlan_core.c
	net/core/dev.c
2009-03-01 21:35:16 -08:00
Ingo Molnar
55f2b78995 Merge branch 'x86/urgent' into x86/pat 2009-03-01 12:47:58 +01:00
David Howells
5170836679 Fix recursive lock in free_uid()/free_user_ns()
free_uid() and free_user_ns() are corecursive when CONFIG_USER_SCHED=n,
but free_user_ns() is called from free_uid() by way of uid_hash_remove(),
which requires uidhash_lock to be held.  free_user_ns() then calls
free_uid() to complete the destruction.

Fix this by deferring the destruction of the user_namespace.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-27 16:26:21 -08:00
Dhaval Giani
54e9912428 sched: don't allow setuid to succeed if the user does not have rt bandwidth
Impact: fix hung task with certain (non-default) rt-limit settings

Corey Hickey reported that on using setuid to change the uid of a
rt process, the process would be unkillable and not be running.
This is because there was no rt runtime for that user group. Add
in a check to see if a user can attach an rt task to its task group.
On failure, return EINVAL, which is also returned in
CONFIG_CGROUP_SCHED.

Reported-by: Corey Hickey <bugfood-ml@fatooh.org>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-27 11:11:53 +01:00
Serge E. Hallyn
1d1e97562e keys: distinguish per-uid keys in different namespaces
per-uid keys were looked by uid only.  Use the user namespace
to distinguish the same uid in different namespaces.

This does not address key_permission.  So a task can for instance
try to join a keyring owned by the same uid in another namespace.
That will be handled by a separate patch.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-02-27 12:35:06 +11:00
Peter Zijlstra
8325d9c09d sched_clock: cleanups
- remove superfluous checks in __update_sched_clock()
- skip sched_clock_tick() for sched_clock_stable
- reinstate the simple !HAVE_UNSTABLE_SCHED_CLOCK code to please the bloatwatch

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 21:56:07 +01:00
Ingo Molnar
b342501cd3 sched: allow architectures to specify sched_clock_stable
Allow CONFIG_HAVE_UNSTABLE_SCHED_CLOCK architectures to still specify
that their sched_clock() implementation is reliable.

This will be used by x86 to switch on a faster sched_clock_cpu()
implementation on certain CPU types.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 21:20:22 +01:00
John Stultz
a2a5ac8650 time: ntp: fix bug in ntp_update_offset() & do_adjtimex(), fix
The time_status conditional was accidentally placed right after we clear
the checked time_status bits, which causes us to take the conditional
every time through. This fixes it by moving the conditional to before we
clear the time_status bits.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Clark Williams <williams@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 19:39:47 +01:00
Hiroshi Shimamoto
cac64d00c2 sched_rt: don't start timer when rt bandwidth disabled
Impact: fix incorrect condition check

No need to start rt bandwidth timer when rt bandwidth is disabled.
If this timer starts, it may stop at sched_rt_period_timer() on the first time.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 14:18:55 +01:00
Li Zefan
c40c6f85a7 cpuacct: add a branch prediction
cpuacct_charge() is in fast-path, and checking of !cpuacct_susys.active
always returns false after cpuacct has been initialized at system boot.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 13:22:38 +01:00
Ingo Molnar
4434e51564 Merge branches 'sched/cleanups', 'sched/urgent' and 'linus' into sched/core 2009-02-26 13:22:13 +01:00
Paul E. McKenney
a682604838 rcu: Teach RCU that idle task is not quiscent state at boot
This patch fixes a bug located by Vegard Nossum with the aid of
kmemcheck, updated based on review comments from Nick Piggin,
Ingo Molnar, and Andrew Morton.  And cleans up the variable-name
and function-name language.  ;-)

The boot CPU runs in the context of its idle thread during boot-up.
During this time, idle_cpu(0) will always return nonzero, which will
fool Classic and Hierarchical RCU into deciding that a large chunk of
the boot-up sequence is a big long quiescent state.  This in turn causes
RCU to prematurely end grace periods during this time.

This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks()
function to ignore the idle task as a quiescent state until the
system has started up the scheduler in rest_init(), introducing a
new non-API function rcu_idle_now_means_idle() to inform RCU of this
transition.  RCU maintains an internal rcu_idle_cpu_truthful variable
to track this state, which is then used by rcu_check_callback() to
determine if it should believe idle_cpu().

Because this patch has the effect of disallowing RCU grace periods
during long stretches of the boot-up sequence, this patch also introduces
Josh Triplett's UP-only optimization that makes synchronize_rcu() be a
no-op if num_online_cpus() returns 1.  This allows boot-time code that
calls synchronize_rcu() to proceed normally.  Note, however, that RCU
callbacks registered by call_rcu() will likely queue up until later in
the boot sequence.  Although rcuclassic and rcutree can also use this
same optimization after boot completes, rcupreempt must restrict its
use of this optimization to the portion of the boot sequence before the
scheduler starts up, given that an rcupreempt RCU read-side critical
section may be preeempted.

In addition, this patch takes Nick Piggin's suggestion to make the
system_state global variable be __read_mostly.

Changes since v4:

o	Changes the name of the introduced function and variable to
	be less emotional.  ;-)

Changes since v3:

o	WARN_ON(nr_context_switches() > 0) to verify that RCU
	switches out of boot-time mode before the first context
	switch, as suggested by Nick Piggin.

Changes since v2:

o	Created rcu_blocking_is_gp() internal-to-RCU API that
	determines whether a call to synchronize_rcu() is itself
	a grace period.

o	The definition of rcu_blocking_is_gp() for rcuclassic and
	rcutree checks to see if but a single CPU is online.

o	The definition of rcu_blocking_is_gp() for rcupreempt
	checks to see both if but a single CPU is online and if
	the system is still in early boot.

	This allows rcupreempt to again work correctly if running
	on a single CPU after booting is complete.

o	Added check to rcupreempt's synchronize_sched() for there
	being but one online CPU.

Tested all three variants both SMP and !SMP, booted fine, passed a short
rcutorture test on both x86 and Power.

Located-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 04:08:14 +01:00
Ingo Molnar
39854fe8c1 time: ntp: clean up second_overflow()
Impact: cleanup, no functionality changed

The 'time_adj' local variable is named in a very confusing
way because it almost shadows the 'time_adjust' global
variable - which is used in this same function.

Rename it to 'delta' - to make them stand apart more clearly.

kernel/time/ntp.o:

   text	   data	    bss	    dec	    hex	filename
   2545	    114	    144	   2803	    af3	ntp.o.before
   2545	    114	    144	   2803	    af3	ntp.o.after

md5:
   1bf0b3be564512279ba7cee299d1d2be  ntp.o.before.asm
   1bf0b3be564512279ba7cee299d1d2be  ntp.o.after.asm

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 18:38:17 +01:00
Ingo Molnar
069569e025 time: ntp: simplify ntp_tick_adj calculations
Impact: micro-optimization

Convert the (internal) ntp_tick_adj value we store from unscaled
units to scaled units. This is a constant that we never modify,
so scaling it up once during bootup is enough - we dont have to
do it for every adjustment step.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 18:38:16 +01:00
Ingo Molnar
2b9d1496e7 time: ntp: make 64-bit constants more robust
Impact: cleanup, no functionality changed

 - make PPM_SCALE an explicit s64 constant, to
   remove (s64) casts from usage sites.

kernel/time/ntp.o:

   text	   data	    bss	    dec	    hex	filename
   2536	    114	    136	   2786	    ae2	ntp.o.before
   2536	    114	    136	   2786	    ae2	ntp.o.after

md5:
   40a7728d1188aa18e83e21a81fa7b150  ntp.o.before.asm
   40a7728d1188aa18e83e21a81fa7b150  ntp.o.after.asm

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 18:38:15 +01:00
Ingo Molnar
e96291653b time: ntp: refactor do_adjtimex() some more
Impact: cleanup, no functionality changed

Further simplify do_adjtimex():

 - introduce the ntp_start_leap_timer() helper function
 - eliminate the goto adj_done complication

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 18:38:15 +01:00
Ingo Molnar
80f2257116 time: ntp: refactor do_adjtimex()
Impact: cleanup, no functionality changed

do_adjtimex() is currently a monster function with a maze of
branches. Refactor the txc->modes setting aspects of it into
two new helper functions:

	process_adj_status()
	process_adjtimex_modes()

kernel/time/ntp.o:

   text	   data	    bss	    dec	    hex	filename
   2512	    114	    136	   2762	    aca	ntp.o.before
   2512	    114	    136	   2762	    aca	ntp.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 18:38:14 +01:00
Ingo Molnar
10dd31a7a1 time: ntp: fix bug in ntp_update_offset() & do_adjtimex()
Impact: change (fix) the way the NTP PLL seconds offset is initialized/tracked

Fix a bug and do a micro-optimization:

When PLL is enabled we do not reset time_reftime. If the PLL
was off for a long time (for example after bootup), this is
arguably the wrong thing to do.

We already had a hack for the common boot-time case in
ntp_update_offset(), in form of:

	if (unlikely(time_status & STA_FREQHOLD || time_reftime == 0))
 		secs = 0;

But the update delta should be reset later on too - not just when
the PLL is enabled for the first time after bootup.

So do it on !STA_PLL -> STA_PLL transitions.

This changes behavior, as previously if ntpd was disabled for
a long time and we restarted it, we'd run from that last update,
with a very large delta.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 18:38:13 +01:00
Ingo Molnar
c7986acba2 time: ntp: micro-optimize ntp_update_offset()
Impact: cleanup, no functionality changed

The time_reftime update in ntp_update_offset() to xtime.tv_sec
is a convoluted way of saying that we want to freeze the frequency
and want the 'secs' delta to be 0. Also make this branch unlikely.

This shaves off 8 bytes from the code size:

   text	   data	    bss	    dec	    hex	filename
   2504	    114	    136	   2754	    ac2	ntp.o.before
   2496	    114	    136	   2746	    aba	ntp.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 18:38:12 +01:00
Ingo Molnar
478b7aab16 time: ntp: simplify ntp_update_offset_fll()
Impact: cleanup, no functionality changed

Change ntp_update_offset_fll() to delta logic instead of
absolute value logic. This eliminates 'freq_adj' from the
function.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 18:38:12 +01:00
Ingo Molnar
f939890b66 time: ntp: refactor and clean up ntp_update_offset()
Impact: cleanup, no functionality changed

- introduce the ntp_update_offset_fll() helper
- clean up the flow and variable naming

kernel/time/ntp.o:

   text	   data	    bss	    dec	    hex	filename
   2504	    114	    136	   2754	    ac2	ntp.o.before
   2504	    114	    136	   2754	    ac2	ntp.o.after

md5:
   01f7b8e1a5472a3056f9e4ae84d46315  ntp.o.before.asm
   01f7b8e1a5472a3056f9e4ae84d46315  ntp.o.after.asm

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 18:38:11 +01:00
Ingo Molnar
bc26c31d44 time: ntp: refactor up ntp_update_frequency()
Impact: cleanup, no functionality changed

Change ntp_update_frequency() from a hard to follow code
flow that uses global variables as temporaries, to a clean
input+output flow.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 18:38:10 +01:00
Ingo Molnar
9ce616aaef time: ntp: clean up ntp_update_frequency()
Impact: cleanup, no functionality changed

Prepare a refactoring of ntp_update_frequency().

kernel/time/ntp.o:

   text	   data	    bss	    dec	    hex	filename
   2504	    114	    136	   2754	    ac2	ntp.o.before
   2504	    114	    136	   2754	    ac2	ntp.o.after

md5:
   41f3009debc9b397d7394dd77d912f0a  ntp.o.before.asm
   41f3009debc9b397d7394dd77d912f0a  ntp.o.after.asm

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 18:38:09 +01:00
Ingo Molnar
bbd1267690 time: ntp: simplify the MAX_TICKADJ_SCALED definition
Impact: cleanup, no functionality changed

There's an ugly u64 typecase in the MAX_TICKADJ_SCALED definition,
this can be eliminated by making the MAX_TICKADJ constant's type
64-bit (signed).

kernel/time/ntp.o:

   text	   data	    bss	    dec	    hex	filename
   2504	    114	    136	   2754	    ac2	ntp.o.before
   2504	    114	    136	   2754	    ac2	ntp.o.after

md5:
   41f3009debc9b397d7394dd77d912f0a  ntp.o.before.asm
   41f3009debc9b397d7394dd77d912f0a  ntp.o.after.asm

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 18:38:08 +01:00
Ingo Molnar
3c972c2444 time: ntp: simplify the second_overflow() code flow
Impact: cleanup, no functionality changed

Instead of a hierarchy of conditions, transform them to clean
gradual conditions and return's.

This makes the flow easier to read and makes the purpose of
the function easier to understand.

kernel/time/ntp.o:

   text	   data	    bss	    dec	    hex	filename
   2552	    170	    168	   2890	    b4a	ntp.o.before
   2552	    170	    168	   2890	    b4a	ntp.o.after

md5:
   eae1275df0b7d6290c13f6f6f8f05c8c  ntp.o.before.asm
   eae1275df0b7d6290c13f6f6f8f05c8c  ntp.o.after.asm

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 18:38:07 +01:00
Ingo Molnar
53bbfa9e94 time: ntp: clean up kernel/time/ntp.c
Impact: cleanup, no functionality changed

Make this file a bit more readable by applying a consistent coding style.

No code changed:

kernel/time/ntp.o:

   text	   data	    bss	    dec	    hex	filename
   2552	    170	    168	   2890	    b4a	ntp.o.before
   2552	    170	    168	   2890	    b4a	ntp.o.after

md5:
   eae1275df0b7d6290c13f6f6f8f05c8c  ntp.o.before.asm
   eae1275df0b7d6290c13f6f6f8f05c8c  ntp.o.after.asm

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 18:38:06 +01:00
Ingo Molnar
0b13fda1e0 generic-ipi: cleanups
Andrew pointed out that there's some small amount of
style rot in kernel/smp.c.

Clean it up.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 16:52:50 +01:00
Peter Zijlstra
6e2756376c generic-ipi: remove CSD_FLAG_WAIT
Oleg noticed that we don't strictly need CSD_FLAG_WAIT, rework
the code so that we can use CSD_FLAG_LOCK for both purposes.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 14:13:44 +01:00
Peter Zijlstra
8969a5ede0 generic-ipi: remove kmalloc()
Remove the use of kmalloc() from the smp_call_function_*()
calls.

Steven's generic-ipi patch (d7240b98: generic-ipi: use per cpu
data for single cpu ipi calls) started the discussion on the use
of kmalloc() in this code and fixed the
smp_call_function_single(.wait=0) fallback case.

In this patch we complete this by also providing means for the
_many() call, which fully removes the need for kmalloc() in this
code.

The problem with the _many() call is that other cpus might still
be observing our entry when we're done with it. It solved this
by dynamically allocating data elements and RCU-freeing it.

We solve it by using a single per-cpu entry which provides
static storage and solves one half of the problem (avoiding
referencing freed data).

The other half, ensuring the queue iteration it still possible,
is done by placing re-used entries at the head of the list. This
means that if someone was still iterating that entry when it got
moved, he will now re-visit the entries on the list he had
already seen, but avoids skipping over entries like would have
happened had we placed the new entry at the end.

Furthermore, visiting entries twice is not a problem, since we
remove our cpu from the entry's cpumask once its called.

Many thanks to Oleg for his suggestions and him poking holes in
my earlier attempts.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 14:13:43 +01:00
Nick Piggin
15d0d3b337 generic IPI: simplify barriers and locking
Simplify the barriers in generic remote function call interrupt
code.

Firstly, just unconditionally take the lock and check the list
in the generic_call_function_single_interrupt IPI handler. As
we've just taken an IPI here, the chances are fairly high that
there will be work on the list for us, so do the locking
unconditionally. This removes the tricky lockless list_empty
check and dubious barriers. The change looks bigger than it is
because it is just removing an outer loop.

Secondly, clarify architecture specific IPI locking rules.
Generic code has no tools to impose any sane ordering on IPIs if
they go outside normal cache coherency, ergo the arch code must
make them appear to obey cache coherency as a "memory operation"
to initiate an IPI, and a "memory operation" to receive one.
This way at least they can be reasoned about in generic code,
and smp_mb used to provide ordering.

The combination of these two changes means that explict barriers
can be taken out of queue handling for the single case -- shared
data is explicitly locked, and ipi ordering must conform to
that, so no barriers needed. An extra barrier is needed in the
many handler, so as to ensure we load the list element after the
IPI is received.

Does any architecture actually *need* these barriers? For the
initiator I could see it, but for the handler I would be
surprised. So the other thing we could do for simplicity is just
to require that, rather than just matching with cache coherency,
we just require a full barrier before generating an IPI, and
after receiving an IPI. In which case, the smp_mb()s can go
away. But just for now, we'll be on the safe side and use the
barriers (they're in the slow case anyway).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-arch@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:27:08 +01:00
Ingo Molnar
0edcf8d692 Merge branch 'tj-percpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/percpu
Conflicts:
	arch/x86/include/asm/pgtable.h
2009-02-24 21:52:45 +01:00
David S. Miller
e70049b9e7 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2009-02-24 03:50:29 -08:00
Benjamin Herrenschmidt
35f88e6b06 Merge commit 'ftrace/function-graph' into next 2009-02-23 10:47:23 +11:00
Ingo Molnar
fc6fc7f1b1 Merge branch 'linus' into x86/apic
Conflicts:
	arch/x86/mach-default/setup.c

Semantic conflict resolution:
	arch/x86/kernel/setup.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 20:05:19 +01:00
Rafael J. Wysocki
770824bdc4 PM: Split up sysdev_[suspend|resume] from device_power_[down|up]
Move the sysdev_suspend/resume from the callee to the callers, with
no real change in semantics, so that we can rework the disabling of
interrupts during suspend/hibernation.

This is based on an earlier patch from Linus.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-22 10:33:44 -08:00
Linus Torvalds
adfafefd10 Merge branch 'hibernate'
* hibernate:
  PM: Fix suspend_console and resume_console to use only one semaphore
  PM: Wait for console in resume
  PM: Fix pm_notifiers during user mode hibernation
  swsusp: clean up shrink_all_zones()
  swsusp: dont fiddle with swappiness
  PM: fix build for CONFIG_PM unset
  PM/hibernate: fix "swap breaks after hibernation failures"
  PM/resume: wait for device probing to finish
  Consolidate driver_probe_done() loops into one place
2009-02-21 14:17:26 -08:00
Arve Hjønnevåg
403f307576 PM: Fix suspend_console and resume_console to use only one semaphore
This fixes a race where a thread acquires the console while the
console is suspended, and the console is resumed before this
thread releases it. In this case, the secondary console
semaphore would be left locked, and the primary semaphore would
be released twice. This in turn would cause the console switch
on suspend or resume to hang forever.

Note that suspend_console does not actually lock the console
for clients that use acquire_console_sem, it only locks it for
clients that use try_acquire_console_sem. If we change
suspend_console to fully lock the console, then the kernel
may deadlock on suspend. One client of try_acquire_console_sem
is acquire_console_semaphore_for_printk, which uses it to
prevent printk from using the console while it is suspended.

Signed-off-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-21 14:17:18 -08:00
Arve Hjønnevåg
b090f9fa53 PM: Wait for console in resume
Avoids later waking up to a blinking cursor if the device woke up and
returned to sleep before the console switch happened.

Signed-off-by: Brian Swetland <swetland@google.com>
Signed-off-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-21 14:17:17 -08:00
Andrey Borzenkov
ebae2604f2 PM: Fix pm_notifiers during user mode hibernation
Snapshot device is opened with O_RDONLY during suspend and O_WRONLY durig
resume.  Make sure we also call notifiers with correct parameter telling
them what we are really doing.

Signed-off-by: Andrey Borzenkov <arvidjaar@mail.ru>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <gregkh@suse.de>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-21 14:17:17 -08:00
Rafael J. Wysocki
09664fda48 PM: fix build for CONFIG_PM unset
Compilation of kprobes.c with CONFIG_PM unset is broken due to some broken
config dependncies.  Fix that.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <gregkh@suse.de>
Reported-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-21 14:17:17 -08:00
Arjan van de Ven
eed3ee0829 PM/resume: wait for device probing to finish
the resume code does not currently wait for device probing to finish.
Even without async function calls this is dicey and not correct,
but with async function calls during the boot sequence this is going
to get hit more...

This patch adds the synchronization using the newly introduced helper.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Greg KH <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-21 14:17:17 -08:00
Ingo Molnar
3b6f7b9beb Merge branch 'x86/urgent' into x86/core 2009-02-20 17:40:43 +01:00
Tejun Heo
fbf59bc9d7 percpu: implement new dynamic percpu allocator
Impact: new scalable dynamic percpu allocator which allows dynamic
        percpu areas to be accessed the same way as static ones

Implement scalable dynamic percpu allocator which can be used for both
static and dynamic percpu areas.  This will allow static and dynamic
areas to share faster direct access methods.  This feature is optional
and enabled only when CONFIG_HAVE_DYNAMIC_PER_CPU_AREA is defined by
arch.  Please read comment on top of mm/percpu.c for details.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
2009-02-20 16:29:08 +09:00
Rusty Russell
b36128c830 alloc_percpu: change percpu_ptr to per_cpu_ptr
Impact: cleanup

There are two allocated per-cpu accessor macros with almost identical
spelling.  The original and far more popular is per_cpu_ptr (44
files), so change over the other 4 files.

tj: kill percpu_ptr() and update UP too

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: mingo@redhat.com
Cc: lenb@kernel.org
Cc: cpufreq@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-20 16:29:08 +09:00
Tejun Heo
6b588c18f8 module: reorder module pcpu related functions
Impact: cleanup

Move percpu_modinit() upwards.  This is to ease further changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-20 16:29:07 +09:00
Linus Torvalds
f54b2fe4ae Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: limit the number of loops the ring buffer self test can make
  tracing: have function trace select kallsyms
  tracing: disable tracing while testing ring buffer
  tracing/function-graph-tracer: trace the idle tasks
2009-02-19 09:14:22 -08:00
john stultz
fdcedf7b75 time: apply NTP frequency/tick changes immediately
Since the GENERIC_TIME changes landed, the adjtimex behavior changed
for struct timex.tick and .freq changed. When the tick or freq value
is set, we adjust the tick_length_base in ntp_update_frequency().
However, this new value doesn't get applied to tick_length until the
next second (via second_overflow).

This means some applications that do quick time tweaking do not see the
requested change made as quickly as expected.

I've run a few tests with this change, and ntpd still functions fine.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 10:10:08 +01:00
Steven Rostedt
4b3e3d2284 tracing: limit the number of loops the ring buffer self test can make
Impact: prevent deadlock if ring buffer gets corrupted

This patch adds a paranoid check to make sure the ring buffer consumer
does not go into an infinite loop. Since the ring buffer has been set
to read only, the consumer should not loop for more than the ring buffer
size. A check is added to make sure the consumer does not loop more than
the ring buffer size.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-18 22:50:01 -05:00
Steven Rostedt
4d7a077c0c tracing: have function trace select kallsyms
Impact: fix output of function tracer to be useful

The function tracer is pretty useless if KALLSYMS is not configured.
Unless you are good at reading hex values, the function tracer should
select the KALLSYMS configuration.

Also, the dynamic function tracer will fail its self test if KALLSYMS
is not selected.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-18 22:06:18 -05:00
Steven Rostedt
0c5119c1e6 tracing: disable tracing while testing ring buffer
Impact: fix to prevent hard lockup on self tests

If one of the tracers are broken and is constantly filling the ring
buffer while the test of the ring buffer is running, it will hang
the box. The reason is that the test is a consumer that will not
stop till the ring buffer is empty. But if the tracer is broken and
is constantly producing input to the buffer, this test will never
end. The result is a lockup of the box.

This happened when KALLSYMS was not defined and the dynamic ftrace
test constantly filled the ring buffer, because the filter failed
and all functions were being traced. Something was being called
that constantly filled the buffer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-18 22:04:01 -05:00
Linus Torvalds
ba95fd47d1 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list
  block: fix booting from partitioned md array
  block: revert part of 18ce3751cc
  cciss: PCI power management reset for kexec
  paride/pg.c: xs(): &&/|| confusion
  fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free
  block: fix bad definition of BIO_RW_SYNC
  bsg: Fix sense buffer bug in SG_IO
2009-02-18 18:33:04 -08:00
Rafael J. Wysocki
42f5e039c3 pm: fix build for CONFIG_PM unset
Compilation of kprobes.c with CONFIG_PM unset is broken due to some broken
config dependncies.  Fix that.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18 15:37:54 -08:00
Li Zefan
67e055d144 cgroups: fix possible use after free
In cgroup_kill_sb(), root is freed before sb is detached from the list, so
another sget() may find this sb and call cgroup_test_super(), which will
access the root that has been freed.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18 15:37:54 -08:00
Steven Rostedt
712406a6bf tracing/function-graph-tracer: make arch generic push pop functions
There is nothing really arch specific of the push and pop functions
used by the function graph tracer. This patch moves them to generic
code.

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-18 13:43:04 -05:00
Ingo Molnar
74019224ac timers: add mod_timer_pending()
Impact: new timer API

Based on an idea from Martin Josefsson with the help of
Patrick McHardy and Stephen Hemminger:

introduce the mod_timer_pending() API which is a mod_timer()
offspring that is an invariant on already removed timers.

(regular mod_timer() re-activates non-pending timers.)

This is useful for the networking code in that it can
allow unserialized mod_timer_pending() timer-forwarding
calls, but a single del_timer*() will stop the timer
from being reactivated again.

Also while at it:

- optimize the regular mod_timer() path some more, the
  timer-stat and a debug check was needlessly duplicated
  in __mod_timer().

- make the exports come straight after the function, as
  most other exports in timer.c already did.

- eliminate __mod_timer() as an external API, change the
  users to mod_timer().

The regular mod_timer() code path is not impacted
significantly, due to inlining optimizations and due to
the simplifications.

Based-on-patch-from: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-18 19:26:33 +01:00
Jens Axboe
93dbb39350 block: fix bad definition of BIO_RW_SYNC
We can't OR shift values, so get rid of BIO_RW_SYNC and use BIO_RW_SYNCIO
and BIO_RW_UNPLUG explicitly. This brings back the behaviour from before
213d9417fe.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-18 10:32:00 +01:00
Linus Torvalds
e78ac4b9de Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: cpu hotplug fix
2009-02-17 14:30:06 -08:00
Linus Torvalds
29a0c5ce5d Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  timers: more consistently use clock vs timer
2009-02-17 14:29:42 -08:00
Linus Torvalds
f8effd1a4a Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  doc: mmiotrace.txt, buffer size control change
  trace: mmiotrace to the tracer menu in Kconfig
  mmiotrace: count events lost due to not recording
2009-02-17 14:29:15 -08:00
Linus Torvalds
8ce9a75a30 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  iommu: fix Intel IOMMU write-buffer flushing
  futex: fix reference leak

Trivial conflicts fixed manually in drivers/pci/intel-iommu.c
2009-02-17 14:26:35 -08:00
Ingo Molnar
f17c75453b irq: name 'p' variables a bit better
'p' stands for pointer - make it clear in setup_irq() and free_irq()
what kind of pointer it is.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-17 20:44:47 +01:00
Ingo Molnar
8316e38100 irq: further clean up the free_irq() code flow
Linus noticed that the 'pp' variable can be eliminated
altogether, and the loop can be cleaned up further.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-17 20:28:29 +01:00