Commit Graph

9398 Commits

Author SHA1 Message Date
KAMEZAWA Hiroyuki
58ae83db2a per-zone and reclaim enhancements for memory controller: calculate mapper_ratio per cgroup
Define function for calculating mapped_ratio in memory cgroup.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Paul Menage <menage@google.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:21 -08:00
KAMEZAWA Hiroyuki
4fca88c87b memory cgroup enhancements: add- pre_destroy() handler
Add a handler "pre_destroy" to cgroup_subsys.  It is called before
cgroup_rmdir() checks all subsys's refcnt.

I think this is useful for subsys which have some extra refs even if there
are no tasks in cgroup.  By adding pre_destroy(), the kernel keeps the rule
"destroy() against subsystem is called only when refcnt=0." and allows css
ref to be used by other objects than tasks.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Paul Menage <menage@google.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:20 -08:00
KAMEZAWA Hiroyuki
ae41be3742 bugfix for memory cgroup controller: migration under memory controller fix
While using memory control cgroup, page-migration under it works as following.
==
 1. uncharge all refs at try to unmap.
 2. charge regs again remove_migration_ptes()
==
This is simple but has following problems.
==
 The page is uncharged and charged back again if *mapped*.
    - This means that cgroup before migration can be different from one after
      migration
    - If page is not mapped but charged as page cache, charge is just ignored
      (because not mapped, it will not be uncharged before migration)
      This is memory leak.
==
This patch tries to keep memory cgroup at page migration by increasing
one refcnt during it. 3 functions are added.

 mem_cgroup_prepare_migration() --- increase refcnt of page->page_cgroup
 mem_cgroup_end_migration()     --- decrease refcnt of page->page_cgroup
 mem_cgroup_page_migration() --- copy page->page_cgroup from old page to
                                 new page.

During migration
  - old page is under PG_locked.
  - new page is under PG_locked, too.
  - both old page and new page is not on LRU.

These 3 facts guarantee that page_cgroup() migration has no race.

Tested and worked well in x86_64/fake-NUMA box.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:19 -08:00
David Rientjes
4c4a221489 memcontrol: move oom task exclusion to tasklist scan
Creates a helper function to return non-zero if a task is a member of a
memory controller:

	int task_in_mem_cgroup(const struct task_struct *task,
			       const struct mem_cgroup *mem);

When the OOM killer is constrained by the memory controller, the exclusion
of tasks that are not a member of that controller was previously misplaced
and appeared in the badness scoring function.  It should be excluded
during the tasklist scan in select_bad_process() instead.

[akpm@linux-foundation.org: build fix]
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:19 -08:00
David Rientjes
3062fc67da memcontrol: move mm_cgroup to header file
Inline functions must preceed their use, so mm_cgroup() should be defined
in linux/memcontrol.h.

include/linux/memcontrol.h:48: warning: 'mm_cgroup' declared inline after
	being called
include/linux/memcontrol.h:48: warning: previous declaration of
	'mm_cgroup' was here

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:19 -08:00
Balbir Singh
e1a1cd590e Memory controller: make charging gfp mask aware
Nick Piggin pointed out that swap cache and page cache addition routines
could be called from non GFP_KERNEL contexts.  This patch makes the
charging routine aware of the gfp context.  Charging might fail if the
cgroup is over it's limit, in which case a suitable error is returned.

This patch was tested on a Powerpc box.  I am still looking at being able
to test the path, through which allocations happen in non GFP_KERNEL
contexts.

[kamezawa.hiroyu@jp.fujitsu.com: problem with ZONE_MOVABLE]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:19 -08:00
Balbir Singh
bed7161a51 Memory controller: make page_referenced() cgroup aware
Make page_referenced() cgroup aware.  Without this patch, page_referenced()
can cause a page to be skipped while reclaiming pages.  This patch ensures
that other cgroups do not hold pages in a particular cgroup hostage.  It
is required to ensure that shared pages are freed from a cgroup when they
are not actively referenced from the cgroup that brought them in

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:19 -08:00
Balbir Singh
8697d33194 Memory controller: add switch to control what type of pages to limit
Choose if we want cached pages to be accounted or not.  By default both are
accounted for.  A new set of tunables are added.

echo -n 1 > mem_control_type

switches the accounting to account for only mapped pages

echo -n 3 > mem_control_type

switches the behaviour back

[bunk@kernel.org: mm/memcontrol.c: clenups]
[akpm@linux-foundation.org: fix sparc32 build]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:19 -08:00
Pavel Emelianov
c7ba5c9e81 Memory controller: OOM handling
Out of memory handling for cgroups over their limit. A task from the
cgroup over limit is chosen using the existing OOM logic and killed.

TODO:
1. As discussed in the OLS BOF session, consider implementing a user
space policy for OOM handling.

[akpm@linux-foundation.org: fix build due to oom-killer changes]
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:19 -08:00
Balbir Singh
0eea103017 Memory controller improve user interface
Change the interface to use bytes instead of pages.  Page sizes can vary
across platforms and configurations.  A new strategy routine has been added
to the resource counters infrastructure to format the data as desired.

Suggested by David Rientjes, Andrew Morton and Herbert Poetzl

Tested on a UML setup with the config for memory control enabled.

[kamezawa.hiroyu@jp.fujitsu.com: possible race fix in res_counter]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:18 -08:00
Balbir Singh
66e1707bc3 Memory controller: add per cgroup LRU and reclaim
Add the page_cgroup to the per cgroup LRU.  The reclaim algorithm has
been modified to make the isolate_lru_pages() as a pluggable component.  The
scan_control data structure now accepts the cgroup on behalf of which
reclaims are carried out.  try_to_free_pages() has been extended to become
cgroup aware.

[akpm@linux-foundation.org: fix warning]
[Lee.Schermerhorn@hp.com: initialize all scan_control's isolate_pages member]
[bunk@kernel.org: make do_try_to_free_pages() static]
[hugh@veritas.com: memcgroup: fix try_to_free order]
[kamezawa.hiroyu@jp.fujitsu.com: this unlock_page_cgroup() is unnecessary]
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:18 -08:00
Balbir Singh
8a9f3ccd24 Memory controller: memory accounting
Add the accounting hooks.  The accounting is carried out for RSS and Page
Cache (unmapped) pages.  There is now a common limit and accounting for both.
The RSS accounting is accounted at page_add_*_rmap() and page_remove_rmap()
time.  Page cache is accounted at add_to_page_cache(),
__delete_from_page_cache().  Swap cache is also accounted for.

Each page's page_cgroup is protected with the last bit of the
page_cgroup pointer, this makes handling of race conditions involving
simultaneous mappings of a page easier.  A reference count is kept in the
page_cgroup to deal with cases where a page might be unmapped from the RSS
of all tasks, but still lives in the page cache.

Credits go to Vaidyanathan Srinivasan for helping with reference counting work
of the page cgroup.  Almost all of the page cache accounting code has help
from Vaidyanathan Srinivasan.

[hugh@veritas.com: fix swapoff breakage]
[akpm@linux-foundation.org: fix locking]
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: <Valdis.Kletnieks@vt.edu>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:18 -08:00
Pavel Emelianov
78fb74669e Memory controller: accounting setup
Basic setup routines, the mm_struct has a pointer to the cgroup that
it belongs to and the the page has a page_cgroup associated with it.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:18 -08:00
Balbir Singh
8cdea7c054 Memory controller: cgroups setup
Setup the memory cgroup and add basic hooks and controls to integrate
and work with the cgroup.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:18 -08:00
Pavel Emelianov
e552b66170 Memory controller: resource counters
With fixes from David Rientjes <rientjes@google.com>

Introduce generic structures and routines for resource accounting.

Each resource accounting cgroup is supposed to aggregate it,
cgroup_subsystem_state and its resource-specific members within.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:18 -08:00
Alan Cox
3dddbfc301 tty: Kill TTY_FLIPBUF_SIZE
This legacy define from the old buffer code is now only used in a single
power pc driver than doesn't compile anyway.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:16 -08:00
Erez Zadok
deb21db778 VFS: swap do_ioctl and vfs_ioctl names
Rename old vfs_ioctl to do_ioctl, because the comment above it clearly
indicates that it is an internal function not to be exported to modules;
therefore it should have a more traditional do_XXX name.  The new do_ioctl
is exported in fs.h but not to modules.

Rename the old do_ioctl to vfs_ioctl because the names vfs_XXX should
preferably be reserved to callable VFS functions which modules may call, as
many other vfs_XXX functions already do.  Export the new vfs_ioctl to GPL
modules so others can use it (including Unionfs and eCryptfs).  Add DocBook
for new vfs_ioctl.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:16 -08:00
Philipp Zabel
4aa323bd83 DS1WM: decouple host IRQ and INTR active state settings
The DS1WM driver incorrectly infers the IAS bit (1-wire interrupt active
high) from IRQ settings.  There are devices that have IAS=0 but still need
the IRQ to trigger on a rising edge.  With this patch, machines with DS1WM
that need IAS=1 have to set .active_high=1 in the ds1wm_platform_data.

Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Acked-by: Matt Reimer <mreimer@vpop.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:06 -08:00
Richard Purdie
388bbb09b9 [MTD] Add mtd panic_write function pointer
MTDs are well suited for logging critical data and the mtdoops driver
allows kernel panics/oops to be written to flash in a blackbox flight
recorder fashion allowing better debugging and analysis of crashes.

Any kernel oops in user context can be easily handled since the kernel
continues as normal and any queued mtd writes are scheduled. Any kernel
oops in interrupt context results in a panic and the delayed writes will
not be scheduled however. The existing mtd->write function cannot be
called in interrupt context so these messages can never be written to
flash.

This patch adds a panic_write function pointer that drivers can
optionally implement which can be called in interrupt context. It is
only intended to be called when its known the kernel is about to panic
and we need to write to succeed. Since the kernel is not going to be
running for much longer, this function can break locks and delay to
ensure the write succeeds (but not sleep).

Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2008-02-07 10:30:48 +00:00
Márton Németh
4c79141d28 leds: Add support for hardware accelerated LED flashing
Extends the leds subsystem with a blink_set() callback function which can
be optionally implemented by a LED driver. If implemented, the driver can use
the hardware acceleration for blinking a LED.

Signed-off-by: Márton Németh <nm127@freemail.hu>
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
2008-02-07 09:49:38 +00:00
Roland McGrath
24f1a84961 [POWERPC] Add SPE registers to core dumps
This makes the SPE register data appear in ELF core dumps, using the
new n_type value NT_PPC_SPE (0x101).  This new note type is not used
by any consumers of core files yet, but support can be added.  I don't
even have any hardware with SPE capabilities, so I've never seen such
a note.  But this demonstrates how simple it is to export register
information in core dumps when the user_regset style is used for the
low-level code.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-02-07 20:40:23 +11:00
Len Brown
9b71315421 Revert "cpuidle: build fix for non-x86"
This reverts commit f757397097.
which ironically broke the ia64 build
2008-02-07 04:16:34 -05:00
Len Brown
81e242d0ef Merge branches 'release' and 'dsdt-override' into release 2008-02-07 04:01:53 -05:00
Len Brown
a733a5da97 Merge branches 'release' and 'fluff' into release
Conflicts:

	drivers/acpi/scan.c
	include/linux/acpi.h

Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-07 03:38:22 -05:00
Jean Delvare
ee1ce6fcb3 ACPI: cleanup acpi.h
Two cleanups to <linux/acpi.h>:
* Stop defining acpi_mp_config, it isn't used anywhere.
* Discard nested "#ifdef CONFIG_ACPI", they are useless and
  error-prone.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-07 03:32:27 -05:00
Len Brown
299cfe3808 Merge branches 'release' and 'hwmon-conflicts' into release 2008-02-07 03:31:17 -05:00
Len Brown
060195500e Merge branches 'release' and 'wmi-2.6.25' into release 2008-02-07 03:19:43 -05:00
Len Brown
26b6f22366 Merge branches 'release' and 'menlo' into release
Conflicts:

	drivers/acpi/video.c

Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-07 03:18:04 -05:00
Len Brown
e5e54bc86a Merge branches 'release' and 'stats' into release 2008-02-07 03:13:36 -05:00
Len Brown
5531d28504 Merge branches 'release' and 'dmi' into release 2008-02-07 03:11:31 -05:00
Len Brown
acf63867ae Merge branches 'release', 'cpuidle-2.6.25' and 'idle' into release 2008-02-07 03:11:05 -05:00
Len Brown
c64768a7d6 Merge branches 'release', 'bugzilla-6217', 'bugzilla-6629', 'bugzilla-6933', 'bugzilla-7186', 'bugzilla-8269', 'bugzilla-8570', 'bugzilla-9139', 'bugzilla-9277', 'bugzilla-9341', 'bugzilla-9444', 'bugzilla-9614', 'bugzilla-9643' and 'bugzilla-9644' into release 2008-02-07 03:09:43 -05:00
Len Brown
dd07a8db72 Merge branches 'release', 'asus', 'sony-laptop' and 'thinkpad' into release 2008-02-07 03:07:35 -05:00
Len Brown
877c357e75 Merge branches 'release', 'acpi_pm_device_sleep_state' and 'battery' into release 2008-02-07 03:07:03 -05:00
venkatesh.pallipadi@intel.com
9a0b841586 cpuidle: Add a poll_idle method
Add a default poll idle state with 0 latency. Provides an option to users
to use poll_idle by using 0 as the latency requirement.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-07 02:20:15 -05:00
Thomas Renninger
443dea72d5 ACPI: Export acpi_check_resource_conflict
Export acpi_check_resource_conflict(), sometimes drivers already have
a struct resource at hand so no need to use the wrappers to build a new
one.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-07 01:00:23 -05:00
Thomas Renninger
df92e69599 ACPI: track opregion names to avoid driver resource conflicts.
Small ACPICA extension to be able to store the name of operation regions in osl.c later

In ACPI, AML can define accesses to IO ports and System Memory by Operation
Regions.  Those are not registered as done by PNPACPI using resource templates
(and _CRS/_SRS methods).

The IO ports and System Memory regions may get accessed by arbitrary AML code.
 When native drivers are accessing the same resources bad things can happen
(e.g.  a critical shutdown temperature of 3000 C every 2 months or so).

It is not really possible to register the operation regions via
request_resource, as they often overlap with pnp or other resources (e.g.
statically setup IO resources below 0x100).

This approach stores all Operation Region declarations (IO and System Memory
only) at ACPI table parse time.  It offers a similar functionality like
request_region and let drivers which are known to possibly use the same IO
ports and Memory which are also often used by ACPI (hwmon and i2c) check for
ACPI interference.

A boot parameter acpi_enforce_resources=strict/lax/no is provided, which
is default set to lax:
  - strict: let conflicting drivers fail to load with an error message
  - lax:    let conflicting driver work normal with a warning message
  - no:     no functional change at all
Depending on the feedback and the kind of interferences we see, this
should be set to strict at later time.

Goal of this patch set is:
  - Identify ACPI interferences in bug reports (very hard to reproduce
    and to identify)
  - Find BIOSes for that an ACPI driver should exist for specific HW
    instead of a native one.
  - stability in general

Provide acpi_check_{mem_}region.

Drivers can additionally check against possible ACPI interference by also
invoking this shortly before they call request_region.
If -EBUSY is returned, the driver must not load.
Use acpi_enforce_resources=strict/lax/no options to:
  - strict: let conflicting drivers fail to load with an error message
  - lax:    let conflicting driver work normal with a warning message
  - no:     no functional change at all

Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-07 00:59:18 -05:00
Venki Pallipadi
9e76988e93 [CPUFREQ] Eliminate cpufreq_userspace scaling_setspeed deadlock
Eliminate cpufreq_userspace scaling_setspeed deadlock.

Luming Yu recently uncovered yet another cpufreq related deadlock.
One thread that continuously switches the governors and the other thread that
repeatedly cats the contents of cpufreq directory causes both these threads to
go into a deadlock.

Detailed examination of the deadlock showed the exact flow before the deadlock
as:

Thread 1			Thread 2
________			________
				cats files under /sys/devices/.../cpufreq/
Set governor to userspace
  Adds a new sysfs entry for
  scaling_setspeed
				cats files under /sys/devices/.../cpufreq/

Set governor to performance
  Holds cpufreq_rw_sem in write
  mode
  Sends a STOP notify to
  userspace governor
				cat /sys/devices/.../cpufreq/scaling_setspeed
				  Gets a handle on the above sysfs entry with
				  sysfs_get_active
				  Blocks while trying to get cpufreq_rw_sem
				  in read mode
  Remove a sysfs entry for
  scaling_setspeed
    Blocks on sysfs_deactivate
    while waiting for earlier
    get_active (on other thread)
    to drain

At this point both threads go into deadlock and any other thread that tries to
do anything with sysfs cpufreq will also block.

There seems to be no easy way to avoid this deadlock as long as
cpufreq_userspace adds/removes the sysfs entry under same kobject as cpufreq.
Below patch moves scaling_setspeed to cpufreq.c, keeping it always and calling
back the governor on read/write. This is the cleanest fix I could think of,
even though adding two callbacks in governor structure just for this seems
unnecessary.

Note that the change makes scaling_setspeed under /sys/.../cpufreq permanent
and returns <unsupported> when governor is not userspace.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-02-06 22:57:58 -05:00
Len Brown
5229e87d59 ACPI: create /sys/firmware/acpi/interrupts
See Documentation/ABI/testing/sysfs-firmware-acpi

Based-on-original-patch-by: Luming Yu <luming.yu@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-06 22:27:06 -05:00
Éric Piel
6ed31e92e9 ACPI: Taint kernel on ACPI table override (format corrected)
When an ACPI table is overridden (for now this can happen only for DSDT)
display a big warning and taint the kernel with flag A.

Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-06 22:07:51 -05:00
Linus Torvalds
3e6bdf473f Merge git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
  x86: fix deadlock, make pgd_lock irq-safe
  virtio: fix trivial build bug
  x86: fix mttr trimming
  x86: delay CPA self-test and repeat it
  x86: fix 64-bit sections
  generic: add __FINITDATA
  x86: remove suprious ifdefs from pageattr.c
  x86: mark the .rodata section also NX
  x86: fix iret exception recovery on 64-bit
  cpuidle: dubious one-bit signed bitfield in cpuidle.h
  x86: fix sparse warnings in powernow-k8.c
  x86: fix sparse error in traps_32.c
  x86: trivial sparse/checkpatch in quirks.c
  x86 ptrace: disallow null cs/ss
  MAINTAINERS: RDC R-321x SoC maintainer
  brk randomization: introduce CONFIG_COMPAT_BRK
  brk: check the lower bound properly
  x86: remove X2 workaround
  x86: make spurious fault handler aware of large mappings
  x86: make traps on entry code be debuggable in user space, 64-bit
2008-02-06 13:54:09 -08:00
Ingo Molnar
9f9975a55d generic: add __FINITDATA
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-06 22:39:45 +01:00
Harvey Harrison
b5556a67f0 cpuidle: dubious one-bit signed bitfield in cpuidle.h
fix these sparse warnings:

  CHECK   arch/x86/kernel/acpi/cstate.c
include/linux/cpuidle.h:82:17: error: dubious one-bit signed bitfield
  CHECK   arch/x86/kernel/acpi/processor.c
include/linux/cpuidle.h:82:17: error: dubious one-bit signed bitfield
  CHECK   arch/x86/kernel/cpu/cpufreq/powernow-k7.c
include/linux/cpuidle.h:82:17: error: dubious one-bit signed bitfield
  CHECK   arch/x86/kernel/cpu/cpufreq/powernow-k8.c
include/linux/cpuidle.h:82:17: error: dubious one-bit signed bitfield
  CHECK   arch/x86/kernel/cpu/cpufreq/longhaul.c
include/linux/cpuidle.h:82:17: error: dubious one-bit signed bitfield
  CHECK   arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
include/linux/cpuidle.h:82:17: error: dubious one-bit signed bitfield

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-06 22:39:44 +01:00
Linus Torvalds
3d4d4582e5 Merge branch 'async-tx-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop into fix
* 'async-tx-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop:
  async_tx: allow architecture specific async_tx_find_channel implementations
  async_tx: replace 'int_en' with operation preparation flags
  async_tx: kill tx_set_src and tx_set_dest methods
  async_tx: kill ASYNC_TX_ASSUME_COHERENT
  iop-adma: use LIST_HEAD instead of LIST_HEAD_INIT
  async_tx: use LIST_HEAD instead of LIST_HEAD_INIT
  async_tx: fix compile breakage, mark do_async_xor __always_inline
2008-02-06 11:16:11 -08:00
Linus Torvalds
2dd550b90b Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  ata_piix.c:piix_init_one() must be __devinit
  sata_via.c: Remove missleading comment.
  libata-core: unblacklist HITACHI drives
  sata_nv: fix ATAPI issues with memory over 4GB (v7)
  ata: drivers/ata/sata_mv.c needs dmapool.h
  libata: kill now unused n_iter and fix sata_fsl
  ahci: fix CAP.NP and PI handling
  sata_mv: Support SoC controllers
  Rename: linux/pata_platform.h to linux/ata_platform.h
2008-02-06 10:47:46 -08:00
Linus Torvalds
8755e56825 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (35 commits)
  virtio net: fix oops on interface-up
  Fix PHY Lib support for gianfar and ucc_geth
  forcedeth: preserve registers
  forcedeth: phy status fix
  forcedeth: restart tx/rx
  ipvs: Make wrr "no available servers" error message rate-limited
  [PPPOL2TP]: Label unused warning when CONFIG_PROC_FS is not set.
  [NET_SCHED]: cls_flow: support classification based on VLAN tag
  [VLAN]: Constify skb argument to vlan_get_tag()
  [NET_SCHED]: cls_flow: fix key mask validity check
  [NET_SCHED]: em_meta: fix compile warning
  b43: Fix DMA for 30/32-bit DMA engines
  b43: fix build with CONFIG_SSB_PCIHOST=n
  mac80211: Is not EXPERIMENTAL anymore
  iwl3945-base.c: fix off-by-one errors
  b43legacy: fix DMA slot resource leakage
  b43legacy: drop packets we are not able to encrypt
  b43legacy: fix suspend/resume
  b43legacy: fix PIO crash
  Generic HDLC - use random_ether_addr()
  ...
2008-02-06 10:47:18 -08:00
Olaf Hering
d8fd66aaea jbd.h: hide kernel only code
Move a few kernel-only things into __KERNEL__.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:21 -08:00
Adrian Bunk
533083836f make jbd/journal.c:__journal_abort_hard() static
__journal_abort_hard() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:20 -08:00
Daniel Walker
b3bd86e2fd isapnp driver semaphore to mutex
Changed the isapnp semaphore to a mutex.

[akpm@linux-foundation.org: no externs-in-c]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:20 -08:00
NeilBrown
73c34431c7 md: change ITERATE_RDEV_GENERIC to rdev_for_each_list, and remove ITERATE_RDEV_PENDING.
Finish ITERATE_ to for_each conversion.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:19 -08:00
NeilBrown
d089c6af10 md: change ITERATE_RDEV to rdev_for_each
As this is more in line with common practice in the kernel.  Also swap the
args around to be more like list_for_each.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:19 -08:00
NeilBrown
c5d79adba7 md: allow devices to be shared between md arrays
Currently, a given device is "claimed" by a particular array so that it cannot
be used by other arrays.

This is not ideal for DDF and other metadata schemes which have their own
partitioning concept.

So for externally managed metadata, just claim the device for md in general,
require that "offset" and "size" are set properly for each device, and make
sure that if a device is included in different arrays then the active sections
do not overlap.

This involves adding another flag to the rdev which makes it awkward to set
"->flags = 0" to clear certain flags.  So now clear flags explicitly by name
when we want to clear things.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:18 -08:00
NeilBrown
c620727779 md: allow a maximum extent to be set for resyncing
This allows userspace to control resync/reshape progress and synchronise it
with other activities, such as shared access in a SAN, or backing up critical
sections during a tricky reshape.

Writing a number of sectors (which must be a multiple of the chunk size if
such is meaningful) causes a resync to pause when it gets to that point.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:18 -08:00
NeilBrown
e691063a61 md: support 'external' metadata for md arrays
- Add a state flag 'external' to indicate that the metadata is managed
  externally (by user-space) so important changes need to be
  left of user-space to handle.
  Alternates are non-persistant ('none') where there is no stable metadata -
  after the  array is stopped there is no record of it's status - and
  internal which can be version 0.90 or version 1.x
  These are selected by writing to the 'metadata' attribute.

- move the updating of superblocks (sync_sbs) to after we have checked if
  there are any superblocks or not.

- New array state 'write_pending'.  This means that the metadata records
  the array as 'clean', but a write has been requested, so the metadata has
  to be updated to record a 'dirty' array before the write can continue.
  This change is reported to md by writing 'active' to the array_state
  attribute.

- tidy up marking of sb_dirty:
   - don't set sb_dirty when resync finishes as md_check_recovery
     calls md_update_sb when the sync thread finishes anyway.
   - Don't set sb_dirty in multipath_run as the array might not be dirty.
   - don't mark superblock dirty when switching to 'clean' if there
     is no internal superblock (if external, userspace can choose to
     update the superblock whenever it chooses to).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:18 -08:00
NeilBrown
b47490c9bc md: Update md bitmap during resync.
Currently an md array with a write-intent bitmap does not updated that bitmap
to reflect successful partial resync.  Rather the entire bitmap is updated
when the resync completes.

This is because there is no guarentee that resync requests will complete in
order, and tracking each request individually is unnecessarily burdensome.

However there is value in regularly updating the bitmap, so add code to
periodically pause while all pending sync requests complete, then update the
bitmap.  Doing this only every few seconds (the same as the bitmap update
time) does not notciably affect resync performance.

[snitzer@gmail.com: export bitmap_cond_end_sync]
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: "Mike Snitzer" <snitzer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:18 -08:00
Magnus Damm
dfcffa467b sm501fb: control panel pin usage with platform data flags
This patch makes it possible to control panel pins usage with flags passed
from the platform data.  Without this patch the sm501fb driver always controls
the VBIASEN and FPEN pins.  The polarity and use of these pins are very
platform specific, so this patch introduces the flags
SM501FB_FLAG_PANEL_USE_VBIASEN and SM501FB_FLAG_PANEL_USE_FPEN which enable
the use of these pins.

This patch is needed to support the a Sharp LQ104V1DG21 lcd panel on SuperH
platforms such as R2D-1 and R2D-PLUS boards.  Letting the sm501fb driver
control the FPEN and VBIASEN pins like today just results in lcd panel
flicker.

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:16 -08:00
Guennadi Liakhovetski
f3dc3630f6 gpio: rename pca953x symbols
This second part of an extension to support more pca953x chips renames the C
and Kconfig symbols.  All affected files were updated by sed, except for a
couple of obvious exceptions.  It also updates the Kconfig helptext.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:15 -08:00
Guennadi Liakhovetski
d1c057e317 gpio: rename pca9539 driver
First part of an extension to let the pca9539 driver support more chips,
starting with pca9534, pca9535, pca9536, pca9537, and pca9538.

This renames the files and modifies the Makefile.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:15 -08:00
Ville Syrjala
ad8dc96e3b w1-gpio: add GPIO w1 bus master driver
Add a GPIO 1-wire bus master driver.  The driver used the GPIO API to
control the wire and the GPIO pin can be specified using platform data
similar to i2c-gpio.  The driver was tested with AT91SAM9260 + DS2401.

Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:15 -08:00
Abhishek Sagar
f47cd9b553 kprobes: kretprobe user entry-handler
Provide support to add an optional user defined callback to be run at
function entry of a kretprobe'd function.  Also modify the kprobe smoke
tests to include an entry-handler during the kretprobe sanity test.

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:11 -08:00
David Fries
6ffc787a44 system timer: fix crash in <100Hz system timer
The kernel has a divide by zero crash when trying to run the system timer
less than 100Hz.  The problem is x/(HZ/USER_HZ) and related.  Now
x*(USER_HZ/HZ) will be used if HZ<USER_HZ.

I'm running the Linux kernel under qemu and went to run a slower system
timer to take less CPU (and battery) on the host.  I found that the kernel
paniced under emulation because of a divide by zero in three places.  Here
is the patch.  The base git was updated today 01-05-2008.  I went for a
20Hz system time by adding config HZ_20 etc to kernel/Kconfig.hz.  With
this patch I verified the system timer by looking at /proc/interrupts.

[akpm@linux-foundation.org: partially clean up the macro maze]
Signed-off-by: David Fries <david@fries.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:10 -08:00
Eric Dumazet
1bf47346d7 kernel/sys.c: get rid of expensive divides in groups_sort()
groups_sort() can be quite long if user loads a large gid table.

This is because GROUP_AT(group_info, some_integer) uses an integer divide.
So having to do XXX thousand divides during one syscall can lead to very
high latencies.  (NGROUPS_MAX=65536)

In the past (25 Mar 2006), an analog problem was found in groups_search()
(commit d74beb9f33 ) and at that time I
changed some variables to unsigned int.

I believe that a more generic fix is to make sure NGROUPS_PER_BLOCK is
unsigned.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:09 -08:00
Luís P Mendes
dc999159bb parport: add support for the Quatech SPPXP-100 Parallel port PCI ExpressCard
Added pci device id for the Quatech SPPXP-100 ExpressCard - 0x278 - to
include/linux/pci_id.h

Modified drivers/parport/parport_pc.c to support the Quatech SPPXP-100 Parallel port PCI ExpressCard

[akpm@linux-foundation.org: build fix]
Signed-off-by: Luís P Mendes <luis.p.mendes@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:08 -08:00
Roland McGrath
1a669c2f16 Add arch_ptrace_stop
This adds support to allow asm/ptrace.h to define two new macros,
arch_ptrace_stop_needed and arch_ptrace_stop.  These control special
machine-specific actions to be done before a ptrace stop.  The new code
compiles away to nothing when the new macros are not defined.  This is the
case on all machines to begin with.

On ia64, these macros will be defined to solve the long-standing issue of
ptrace vs register backing store.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:07 -08:00
Daniel Walker
4749380ed6 drivers/isdn/i4l/isdn_tty.c: remove write_sem
I couldn't find any users, so removing it..

Signed-off-by: Daniel Walker <dwalker@mvista.com>
Cc: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:07 -08:00
Daniel Walker
eb31005eaf drivers/char/tty_io.c: remove pty_sem
I couldn't find any users, so removing it..

Signed-off-by: Daniel Walker <dwalker@mvista.com>
Acked-by: Alan Cox <alan@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:07 -08:00
Paul E. McKenney
d99c4f6b13 Remove rcu_assign_pointer() penalty for NULL pointers
The rcu_assign_pointer() primitive currently unconditionally executes a
memory barrier, even when a NULL pointer is being assigned.  This has lead
some to avoid using rcu_assign_pointer() for NULL pointers, which loses the
self-documenting advantages of rcu_assign_pointer() This patch uses
__builtin_const_p() to omit needless memory barriers for NULL-pointer
assignments at compile time with no runtime penalty, as discussed in the
following thread:

	http://www.mail-archive.com/netdev@vger.kernel.org/msg54852.html

Tested on x86_64 and ppc64, also compiled the four cases (NULL/non-NULL
and const/non-const) with gcc version 4.1.2, and hand-checked the
assembly output.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:06 -08:00
Eric Dumazet
9cfe015aa4 get rid of NR_OPEN and introduce a sysctl_nr_open
NR_OPEN (historically set to 1024*1024) actually forbids processes to open
more than 1024*1024 handles.

Unfortunatly some production servers hit the not so 'ridiculously high
value' of 1024*1024 file descriptors per process.

Changing NR_OPEN is not considered safe because of vmalloc space potential
exhaust.

This patch introduces a new sysctl (/proc/sys/fs/nr_open) wich defaults to
1024*1024, so that admins can decide to change this limit if their workload
needs it.

[akpm@linux-foundation.org: export it for sparc64]
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:06 -08:00
Jan Kara
ece95912db inotify: send IN_ATTRIB events when link count changes
Currently, no notification event has been sent when inode's link count
changed.  This is inconvenient for the application in some cases:

Suppose you have the following directory structure

    foo/test
    bar/

and you watch test.  If someone does "mv foo/test bar/", you get event
IN_MOVE_SELF and you know something has happened with the file "test".
However if someone does "ln foo/test bar/test" and "rm foo/test" you get no
inotify event for the file "test" (only directories "foo" and "bar" receive
events).

Furthermore it could be argued that link count belongs to file's metadata and
thus IN_ATTRIB should be sent when it changes.

The following patch implements sending of IN_ATTRIB inotify events when link
count of the inode changes, i.e., when a hardlink to the inode is created or
when it is removed.  This event is sent in addition to all the events sent so
far.  In particular, when a last link to a file is removed, IN_ATTRIB event is
sent in addition to IN_DELETE_SELF event.

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Morten Welinder <mwelinder@gmail.com>
Cc: Robert Love <rlove@google.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:05 -08:00
Akinobu Mita
797074e44d fs: use list_for_each_entry_reverse and kill sb_entry
Use list_for_each_entry_reverse for super_blocks list and remove
unused sb_entry macro.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:05 -08:00
Herbert Xu
f10db6277d Avoid divide in IS_ALIGN
I was happy to discover the brand new IS_ALIGN macro and quickly used it in
my code.  To my dismay I found that the generated code used division to
perform the test.

This patch fixes it by changing the % test to an &.  This avoids the
division.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:04 -08:00
Eric Dumazet
b324215190 PERCPU : __percpu_alloc_mask() can dynamically size percpu_data storage
Instead of allocating a fix sized array of NR_CPUS pointers for percpu_data,
we can use nr_cpu_ids, which is generally < NR_CPUS.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:04 -08:00
Joern Engel
e7ca2d41a0 Document I_SYNC and I_DATASYNC
After some archeology (see http://logfs.org/logfs/inode_state_bits) I
finally figured out what the three I_DIRTY bits do.  Maybe others would
prefer less effort to reach this insight.

Signed-off-by: Joern Engel <joern@logfs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:03 -08:00
Jiri Olsa
0321155926 fs: remove dead config CONFIG_HAS_COMPAT_EPOLL_EVENT symbol
Remove dead config CONFIG_HAS_COMPAT_EPOLL_EVENT symbol.

Signed-off-by: Jiri Olsa <olsajiri@gmail.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:03 -08:00
Robert P. J. Day
de9330d13e log2.h: Define order_base_2() macro for convenience.
Given a number of places in the tree that need to calculate this value
explicitly, might as well just create a macro for it.

(akpm: must be implemented as a macro for callee typeof() usage)

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:03 -08:00
Adrian Bunk
26464378c4 proper prototype for vty_init()
Add a proper prototype for vty_init() in include/linux/vt_kern.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:03 -08:00
Adrian Bunk
011e3fcd1e proper prototype for get_filesystem_list()
Ad a proper prototype for migration_init() in include/linux/fs.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:02 -08:00
Adrian Bunk
a1c9eea9e5 proper prototype for signals_init()
Add a proper prototype for signals_init() in include/linux/signal.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:02 -08:00
Andrew Morton
941e492bdb read_current_timer() cleanups
- All implementations can be __devinit

- The function prototypes were in asm/timex.h but they all must be the same,
  so create a single declaration in linux/timex.h.

- uninline the sparc64 version to match the other architectures

- Don't bother #defining ARCH_HAS_READ_CURRENT_TIMER to a particular value.

[ezk@cs.sunysb.edu: fix build]
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:02 -08:00
Adrian Bunk
83bad1d764 scheduled OSS driver removal
This patch contains the scheduled removal of OSS drivers whose config
options have been removed in 2.6.23.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:02 -08:00
Adrian Bunk
f74596d079 proper show_interrupts() prototype
Add a proper prototype for show_interrupts() in include/linux/interrupt.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:02 -08:00
David Woodhouse
96c5865559 Allow auto-destruction of loop devices
This allows a flag to be set on loop devices so that when they are
closed for the last time, they'll self-destruct.

In general, so that we can automatically allocate loop devices (as with
losetup -f) and have them disappear when we're done with them.

In particular, right now, so that we can stop relying on the hackish
special-case in umount(8) which kills off loop devices which were set up by
'mount -oloop'.  That means we can stop putting crap in /etc/mtab which
doesn't belong there, which means it can be a symlink to /proc/mounts, which
means yet another writable file on the root filesystem is eliminated and the
'stateless' folks get happier...  and OLPC trac #356 can be closed.

The mount(8) side of that is at
http://marc.info/?l=util-linux-ng&m=119362955431694&w=2

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: Bernardo Innocenti <bernie@codewiz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:01 -08:00
Matthias Kaehlcke
0a5dcb5177 Parallel port: convert port_mutex to the mutex API
Parallel port: Convert port_mutex to the mutex API

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:01 -08:00
Matthew Wilcox
4e701482d1 hash: add explicit u32 and u64 versions of hash
The 32-bit version is more efficient (and apparently gives better hash
results than the 64-bit version), so users who are only hashing a 32-bit
quantity can now opt to use the 32-bit version explicitly, rather than
promoting to a long.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:00 -08:00
Dan Williams
47437b2c9a async_tx: allow architecture specific async_tx_find_channel implementations
The source and destination addresses are included to allow channel
selection based on address alignment.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
2008-02-06 10:12:18 -07:00
Dan Williams
d4c56f97ff async_tx: replace 'int_en' with operation preparation flags
Pass a full set of flags to drivers' per-operation 'prep' routines.
Currently the only flag passed is DMA_PREP_INTERRUPT.  The expectation is
that arch-specific async_tx_find_channel() implementations can exploit this
capability to find the best channel for an operation.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Reviewed-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
2008-02-06 10:12:18 -07:00
Dan Williams
0036731c88 async_tx: kill tx_set_src and tx_set_dest methods
The tx_set_src and tx_set_dest methods were originally implemented to allow
an array of addresses to be passed down from async_xor to the dmaengine
driver while minimizing stack overhead.  Removing these methods allows
drivers to have all transaction parameters available at 'prep' time, saves
two function pointers in struct dma_async_tx_descriptor, and reduces the
number of indirect branches..

A consequence of moving this data to the 'prep' routine is that
multi-source routines like async_xor need temporary storage to convert an
array of linear addresses into an array of dma addresses.  In order to keep
the same stack footprint of the previous implementation the input array is
reused as storage for the dma addresses.  This requires that
sizeof(dma_addr_t) be less than or equal to sizeof(void *).  As a
consequence CONFIG_DMADEVICES now depends on !CONFIG_HIGHMEM64G.  It also
requires that drivers be able to make descriptor resources available when
the 'prep' routine is polled.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
2008-02-06 10:12:17 -07:00
Dan Williams
d909b34759 async_tx: kill ASYNC_TX_ASSUME_COHERENT
Remove the unused ASYNC_TX_ASSUME_COHERENT flag.  Async_tx is
meant to hide the difference between asynchronous hardware and synchronous
software operations, this flag requires clients to understand cache
coherency consequences of the async path.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
2008-02-06 10:12:17 -07:00
James Bottomley
37198e3051 libata: kill now unused n_iter and fix sata_fsl
qc->n_iter was used for libata's own sg walking before sg chaining
replaced it.  During conversion, the field and its usage in sata_fsl
were left behind.  Kill the filed and update sata_fsl.

tj: This was part of James's libata-use-block-layer-padding patch.
    Separated out by me.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Li Yang <leoli@freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-02-06 06:59:32 -05:00
Saeed Bishara
f351b2d638 sata_mv: Support SoC controllers
Marvell's Orion SoC includes SATA controllers based on Marvell's
PCI-to-SATA 88SX controllers. This patch extends the libATA sata_mv
driver to support those controllers.

[edited to use linux/ata_platform.h -jg]

Signed-off-by: Saeed Bishara <saeed@marvell.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-02-06 06:54:17 -05:00
Jeff Garzik
0a87e3e92b Rename: linux/pata_platform.h to linux/ata_platform.h
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-02-06 06:54:17 -05:00
David S. Miller
655d2ce073 Merge branch 'upstream-davem' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2008-02-06 03:52:44 -08:00
Michael Ellerman
f4eb010706 [POWERPC] Add of_get_next_parent()
Iterating through a device node's parents is simple enough, but dealing
with the refcounts properly is a little ugly, and replicating that logic
is asking for someone to get it wrong or forget it all together, eg:

while (dn != NULL) {
	/* loop body */
	tmp = of_get_parent(dn);
	of_node_put(dn);
	dn = tmp;
}

So add of_get_next_parent(), inspired by of_get_next_child().  The
contract is that it returns the parent and drops the reference on the
current node, this makes the loop look like:

while (dn != NULL) {
	/* loop body */
	dn = of_get_next_parent(dn);
}

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-02-06 16:29:59 +11:00
David S. Miller
a29961b33b Merge branch 'fixes' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6 2008-02-05 19:58:05 -08:00
maximilian attems
7c2670bbb5 ACPI: battery: add sysfs serial number
egrep serial /proc/acpi/battery/BAT0/info
serial number:           32090

serial number can tell you from the imminent danger
of beeing set on fire.

Signed-off-by: maximilian attems <max@stro.at>
Acked-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-05 21:15:50 -05:00
Bartlomiej Zolnierkiewicz
64a57fe439 ide: add ide_read_error() inline helper
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-02-06 02:57:51 +01:00
Bartlomiej Zolnierkiewicz
c47137a99c ide: add ide_read_[alt]status() inline helpers
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-02-06 02:57:51 +01:00
Bartlomiej Zolnierkiewicz
29dd59755a ide: remove ide_setup_ports()
ide-cris.c:
* Add cris_setup_ports() helper and use it instead of ide_setup_ports()
  (fixes random value being set in ->io_ports[IDE_IRQ_OFFSET]).

buddha.c:
* Add buddha_setup_ports() helper and use it instead of ide_setup_ports().

falconide.c:
* Add falconide_setup_ports() helper and use it instead of ide_setup_ports(),
  also fix return value of falconide_init() while at it.

gayle.c:
* Add gayle_setup_ports() helper and use it instead of ide_setup_ports().

macide.c:
* Add macide_setup_ports() helper and use it instead of ide_setup_ports()
  (fixes incorrect value being set in ->io_ports[IDE_IRQ_OFFSET]).

q40ide.c:
* Fix q40_ide_setup_ports() comments.

ide.c:
* Remove no longer needed ide_setup_ports().

Cc: Mikael Starvik <starvik@axis.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-02-06 02:57:50 +01:00
Bartlomiej Zolnierkiewicz
afdd360c95 ide: remove write-only ->sata_misc[] from ide_hwif_t
* Remove write-only ->sata_misc[] from ide_hwif_t.

* Remove no longer used SATA_{MISC,PHY,IEN}_OFFSET defines.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-02-06 02:57:50 +01:00
Anton Salnikov
7c7e92a926 Palmchip BK3710 IDE driver
This is Palmchip BK3710 IDE controller support.

The IDE controller logic supports PIO, MultiWord-DMA and Ultra-DMA modes.
Supports interface to Compact Flash (CF) configured in True-IDE mode.

Bart:
- remove dead code
- fix ide_hwif_setup_dma() build problem

Signed-off-by: Anton Salnikov <asalnikov@ru.mvista.com>
Reviewed-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Reviewed-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-02-06 02:57:48 +01:00