Commit Graph

22698 Commits

Author SHA1 Message Date
Alan Cox
76b25a5509 char: switch gs, cyclades and esp to return int for put_char
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:45 -07:00
Alan Cox
5d0fdf1e01 tty_io: fix remaining pid struct locking
This fixes the last couple of pid struct locking failures I know about.

[oleg@tv-sign.ru: clean up do_task_stat()]
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:40 -07:00
Alan Cox
47f86834bb redo locking of tty->pgrp
Historically tty->pgrp and friends were pid_t and the code "knew" they were
safe.  The change to pid structs opened up a few races and the removal of the
BKL in places made them quite hittable.  We put tty->pgrp under the ctrl_lock
for the tty.

Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:40 -07:00
Alan Cox
04f378b198 tty: BKL pushdown
- Push the BKL down into the line disciplines
- Switch the tty layer to unlocked_ioctl
- Introduce a new ctrl_lock spin lock for the control bits
- Eliminate much of the lock_kernel use in n_tty
- Prepare to (but don't yet) call the drivers with the lock dropped
  on the paths that historically held the lock

BKL now primarily protects open/close/ldisc change in the tty layer

[jirislaby@gmail.com: a couple of fixes]
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:40 -07:00
Oleg Nesterov
53b6f9fbd3 ptrace: introduce ptrace_reparented() helper
Add another trivial helper for the sake of grep.  It also auto-documents the
fact that ->parent != real_parent implies ->ptrace.

No functional changes.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:38 -07:00
Roland McGrath
5a8da0ea82 signals: x86 TS_RESTORE_SIGMASK
Replace TIF_RESTORE_SIGMASK with TS_RESTORE_SIGMASK and define our own
set_restore_sigmask() function.  This saves the costly SMP-safe set_bit
operation, which we do not need for the sigmask flag since TIF_SIGPENDING
always has to be set too.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
Roland McGrath
f3de272b82 signals: use HAVE_SET_RESTORE_SIGMASK
Change all the #ifdef TIF_RESTORE_SIGMASK conditionals in non-arch code to
#ifdef HAVE_SET_RESTORE_SIGMASK.  If arch code defines it first, the generic
set_restore_sigmask() using TIF_RESTORE_SIGMASK is not defined.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
akpm@linux-foundation.org
49eaeb4bc4 signals: ia64 renumber TIF_RESTORE_SIGMASK
TIF_RESTORE_SIGMASK no longer needs to be in the _TIF_WORK_* masks.
Those low bits are scarce.  Renumber TIF_RESTORE_SIGMASK to free one up.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
Roland McGrath
02a029b325 signals: s390: renumber TIF_RESTORE_SIGMASK
TIF_RESTORE_SIGMASK no longer needs to be in the _TIF_WORK_* masks.  Those low
bits are scarce, and are all used up now.  Renumber TIF_RESTORE_SIGMASK to
free one up.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
Roland McGrath
7648d961fc signals: set_restore_sigmask TIF_SIGPENDING
Set TIF_SIGPENDING in set_restore_sigmask.  This lets arch code take
TIF_RESTORE_SIGMASK out of the set of bits that will be noticed on return to
user mode.  On some machines those bits are scarce, and we can free this
unneeded one up for other uses.

It is probably the case that TIF_SIGPENDING is always set anyway everywhere
set_restore_sigmask() is used.  But this is some cheap paranoia in case there
is an arcane case where it might not be.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
Roland McGrath
4e4c22c711 signals: add set_restore_sigmask
This adds the set_restore_sigmask() inline in <linux/thread_info.h> and
replaces every set_thread_flag(TIF_RESTORE_SIGMASK) with a call to it.  No
change, but abstracts the details of the flag protocol from all the calls.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
Oleg Nesterov
fae5fa44f1 signals: fix /sbin/init protection from unwanted signals
The global init has a lot of long standing problems with the unhandled fatal
signals.

	- The "is_global_init(current)" check in get_signal_to_deliver()
	  protects only the main thread. Sub-thread can dequee the fatal
	  signal and shutdown the whole thread group except the main thread.
	  If it dequeues SIGSTOP /sbin/init will be stopped, this is not
	  right too. Note that we can't use is_global_init(->group_leader),
	  this breaks exec and this can't solve other problems we have.

	- Even if afterwards ignored, the fatal signals sets SIGNAL_GROUP_EXIT
	  on delivery. This breaks exec, has other bad implications, and this
	  is just wrong.

Introduce the new SIGNAL_UNKILLABLE flag to fix these problems.  It also helps
to solve some other problems addressed by the subsequent patches.

Currently we use this flag for the global init only, but it could also be used
by kthreads and (perhaps) by the sub-namespace inits.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
Oleg Nesterov
ac5c215383 signals: join send_sigqueue() with send_group_sigqueue()
We export send_sigqueue() and send_group_sigqueue() for the only user,
posix_timer_event().  This is a bit silly, because both are just trivial
helpers on top of do_send_sigqueue() and because the we pass the unused
.si_signo parameter.

Kill them both, rename do_send_sigqueue() to send_sigqueue(), and export it.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:36 -07:00
Oleg Nesterov
6ca25b5513 kill_pid_info: don't take now unneeded tasklist_lock
Previously handle_stop_signal(SIGCONT) could drop ->siglock.  That is why
kill_pid_info(SIGCONT) takes tasklist_lock to make sure the target task can't
go away after unlock.  Not needed now.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:34 -07:00
Oleg Nesterov
e442055193 signals: re-assign CLD_CONTINUED notification from the sender to reciever
Based on discussion with Jiri and Roland.

In short: currently handle_stop_signal(SIGCONT, p) sends the notification to
p->parent, with this patch p itself notifies its parent when it becomes
running.

handle_stop_signal(SIGCONT) has to drop ->siglock temporary in order to notify
the parent with do_notify_parent_cldstop().  This leads to multiple problems:

	- as Jiri Kosina pointed out, the stopped task can resume without
	  actually seeing SIGCONT which may have a handler.

	- we race with another sig_kernel_stop() signal which may come in
	  that window.

	- we race with sig_fatal() signals which may set SIGNAL_GROUP_EXIT
	  in that window.

	- we can't avoid taking tasklist_lock() while sending SIGCONT.

With this patch handle_stop_signal() just sets the new SIGNAL_CLD_CONTINUED
flag in p->signal->flags and returns.  The notification is sent by the first
task which returns from finish_stop() (there should be at least one) or any
other signalled thread from get_signal_to_deliver().

This is a user-visible change.  Say, currently kill(SIGCONT, stopped_child)
can't return without seeing SIGCHLD, with this patch SIGCHLD can be delayed
unpredictably.  Another difference is that if the child is ptraced by another
process, CLD_CONTINUED may be delivered to ->real_parent after ptrace_detach()
while currently it always goes to the tracer which doesn't actually need this
notification.  Hopefully not a problem.

The patch asks for the futher obvious cleanups, I'll send them separately.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:34 -07:00
Dan Williams
6bfe0b4990 md: support blocking writes to an array on device failure
Allows a userspace metadata handler to take action upon detecting a device
failure.

Based on an original patch by Neil Brown.

Changes:
-added blocked_wait waitqueue to rdev
-don't qualify Blocked with Faulty always let userspace block writes
-added md_wait_for_blocked_rdev to wait for the block device to be clear, if
 userspace misses the notification another one is sent every 5 seconds
-set MD_RECOVERY_NEEDED after clearing "blocked"
-kill DoBlock flag, just test mddev->external

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:33 -07:00
Eric Miao
3c42a44910 pxafb: preliminary smart panel interface support
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Signed-off-by: Eric Miao <eric.miao@marvell.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:32 -07:00
eric miao
84f43c308b pxafb: introduce register independent LCD connection type for pxafb
Reasons:

  1. straight forward: the name "LCD_COLOR_DSTN_16BPP" is much better
     than "LCCR0_Pas | LCCR0_Color | LCCR0_Dual"

  2. by defining LCD connection types as constants, it allows only
     valid possibilities

  3. by removing the dependency of register bits definitions, those
     can be later moved into the body of pxafb.c, instead of having
     a regs-lcd.h around

Currently, only lubbock, mainstone, zylonite and littleton have been
modified to support these types (see coming patches after this).
Other platforms are encouraged to change their way describing the
LCD controller connections.

Signed-off-by: eric miao <eric.miao@marvell.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:31 -07:00
eric miao
ce4fb7b892 pxafb: convert fb driver to use ioremap() and __raw_{readl, writel}
This is part of the effort moving peripheral registers outside of pxa-regs.h,
and using ioremap() make it possible the same IP can be re-used on different
processors with different registers space

As a result, the fixed mapping in pxa_map_io() is removed.

The regs-lcd.h can actually moved to where closer to pxafb.c but some of its
bit definitions are directly used by various platform code, though this is not
a good style.

Signed-off-by: eric miao <eric.miao@marvell.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:31 -07:00
Bryan Wu
2f3517418d Blackfin serial driver: this driver enable SPORTs on Blackfin emulate UART
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:30 -07:00
Martin Schwidefsky
941af343e2 [S390] use generic sys_ptrace
After the PT_IEEE_IP hack has been removed s390 can now use
the common code sys_ptrace function.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-04-30 13:38:48 +02:00
Heiko Carstens
17f3458085 [S390] Convert to SPARSEMEM & SPARSEMEM_VMEMMAP
Convert s390 to SPARSEMEM and SPARSEMEM_VMEMMAP. We do a select
of SPARSEMEM_VMEMMAP since it is configurable. This is because
SPARSEMEM without SPARSEMEM_VMEMMAP gives us a hell of broken
include dependencies that I don't want to fix.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-04-30 13:38:48 +02:00
Gerald Schaefer
53492b1de4 [S390] System z large page support.
This adds hugetlbfs support on System z, using both hardware large page
support if available and software large page emulation on older hardware.
Shared (large) page tables are implemented in software emulation mode,
by using page->index of the first tail page from a compound large page
to store page table information.

Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-04-30 13:38:47 +02:00
Heiko Carstens
2e5061e40a [S390] Convert machine feature detection code to C.
From: Heiko Carstens <heiko.carstens@de.ibm.com>
From: Carsten Otte <cotte@de.ibm.com>

This lets us use defines for the magic bits in machine flags instead
of using plain numbers all over the place.
In addition on newer machines features/facilities are indicated by the
result of the stfl instruction. So we use these bits instead of trying
to execute new instructions and check wether we get an exception or
not.
Also the mvpg instruction is always available when in zArch mode,
whereas the idte instruction is only available in zArch mode. This
results in some minor optimizations.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-04-30 13:38:47 +02:00
Heiko Carstens
484875b11f [S390] Move stfl to system.h and delete duplicated version.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-04-30 13:38:46 +02:00
Heiko Carstens
d00aa4e7d0 [S390] Add topology_core_siblings to topology.h
This exposes the core siblings to user space via sysfs.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2008-04-30 13:38:45 +02:00
Heiko Carstens
1e489518da [S390] Automatically detect added cpus.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-04-30 13:38:44 +02:00
Heiko Carstens
f291e17227 [S390] Add missing ifndef/define to include/asm-s390/sysinfo.h.
In order to protect against compile breakage in case the header file
gets included twice.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-04-30 13:38:43 +02:00
Heiko Carstens
4e83be7b24 [S390] Move show_regs to traps.c.
This is where it should be and we can get rid of some externs
and a static inline function.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-04-30 13:38:43 +02:00
Christoph Hellwig
3dcf54515a ext4: move headers out of include/linux
Move ext4 headers out of include/linux.  This is just the trivial move,
there's some more thing that could be done later. 

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 18:13:32 -04:00
Philip Craig
443a70d50b netfilter: nf_conntrack: padding breaks conntrack hash on ARM
commit 0794935e "[NETFILTER]: nf_conntrack: optimize hash_conntrack()"
results in ARM platforms hashing uninitialised padding.  This padding
doesn't exist on other architectures.

Fix this by replacing NF_CT_TUPLE_U_BLANK() with memset() to ensure
everything is initialised.  There were only 4 bytes that
NF_CT_TUPLE_U_BLANK() wasn't clearing anyway (or 12 bytes on ARM).

Signed-off-by: Philip Craig <philipc@snapgear.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-29 03:35:10 -07:00
Timo Teras
0010e46577 ipv4: Update MTU to all related cache entries in ip_rt_frag_needed()
Add struct net_device parameter to ip_rt_frag_needed() and update MTU to
cache entries where ifindex is specified. This is similar to what is
already done in ip_rt_redirect().

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-29 03:32:25 -07:00
David L Stevens
42908c69f6 net: Add compat support for getsockopt (MCAST_MSFILTER)
This patch adds support for getsockopt for MCAST_MSFILTER for
both IPv4 and IPv6. It depends on the previous setsockopt patch,
and uses the same method.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-29 03:23:22 -07:00
Julian Anastasov
2ad17defd5 ipvs: fix oops in backup for fwmark conn templates
Fixes bug http://bugzilla.kernel.org/show_bug.cgi?id=10556
where conn templates with protocol=IPPROTO_IP can oops backup box.

        Result from ip_vs_proto_get() should be checked because
protocol value can be invalid or unsupported in backup. But
for valid message we should not fail for templates which use
IPPROTO_IP. Also, add checks to validate message limits and
connection state. Show state NONE for templates using IPPROTO_IP.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-29 03:21:23 -07:00
Zhang Wei
61b269179d [RAPIDIO] Add serial RapidIO controller support, which includes MPC8548, MPC8641
Signed-off-by: Zhang Wei <wei.zhang@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 19:40:29 +10:00
Zhang Wei
e042323607 [RAPIDIO] Auto-probe the RapidIO system size
The RapidIO system size will auto probe in RIO setup.  The route table
and rionet_active in rionet.c are changed to be allocated dynamically
according to the size of the system.

Signed-off-by: Zhang Wei <wei.zhang@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 19:40:28 +10:00
Zhang Wei
ad1e9380b1 [RAPIDIO] Add RapidIO multi mport support
The original RapidIO driver suppose there is only one mpc85xx RIO controller
in system.  So, some data structures are defined as mpc85xx_rio global, such
as 'regs_win', 'dbell_ring', 'msg_tx_ring'.  Now, I changed them to mport's
private members.  And you can define multi RIO OF-nodes in dts file for multi
RapidIO controller in one processor, such as PCI/PCI-Ex host controllers in
Freescale's silicon.  And the mport operation function declaration should be
changed to know which RapidIO controller is target.

Signed-off-by: Zhang Wei <wei.zhang@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 19:40:28 +10:00
Zhang Wei
5a7b60ed88 [RAPIDIO] Move include/asm-ppc/rio.h to asm-powerpc
Signed-off-by: Zhang Wei <wei.zhang@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 19:40:27 +10:00
David S. Miller
e2fdd7fd99 sparc: Add kgdb support.
Current limitations:

1) On SMP single stepping has some fundamental issues,
   shared with other sw single-step architectures such
   as mips and arm.

2) On 32-bit sparc we don't support SMP kgdb yet.  That
   requires some reworking of the IPI mechanisms and
   infrastructure on that platform.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-29 02:38:50 -07:00
David S. Miller
0a9e9b110c sparc32: Kill smp_message_pass() and related code.
Completely unused, and it just makes the SMP message
passing code on 32-bit sparc look more complex than
it is.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-29 01:14:10 -07:00
Bjorn Helgaas
dfd2e1b4e6 PNPBIOS: remove include/linux/pnpbios.h
The contents of include/linux/pnpbios.h are used only inside the PNPBIOS
backend, so this file doesn't need to be visible outside PNP.

This patch moves the contents into an existing PNPBIOS-specific file,
drivers/pnp/pnpbios/pnpbios.h.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:30 -04:00
Bjorn Helgaas
261b20da4b ISAPNP: remove unused pnp_dev->regs field
The "regs" field in struct pnp_dev is set but never read, so remove it.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:30 -04:00
Bjorn Helgaas
62cfb298b9 PNP: make interfaces private to the PNP core
The interfaces for registering protocols, devices, cards,
and resource options should only be used inside the PNP core.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:30 -04:00
Bjorn Helgaas
02d83b5da3 PNP: make pnp_resource_table private to PNP core
There are no remaining references to the PNP_MAX_* constants or
the pnp_resource_table structure outside of the PNP core.  Make
them private to the PNP core.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:27 -04:00
Bjorn Helgaas
13575e81bb PNP: convert resource accessors to use pnp_get_resource(), not pnp_resource_table
This removes more direct references to pnp_resource_table.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:23 -04:00
Bjorn Helgaas
b90eca0a61 PNP: add pnp_get_resource() interface
This adds a pnp_get_resource() that works the same way as
platform_get_resource().  This will enable us to consolidate
many pnp_resource_table references in one place, which will
make it easier to make the table dynamic.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:23 -04:00
Bjorn Helgaas
2cd1393098 PNP: remove unused interfaces using pnp_resource_table
Rene Herman <rene.herman@gmail.com> recently removed the only in-tree
driver uses of:

    pnp_init_resource_table()
    pnp_manual_config_dev()
    pnp_resource_change()

in this change:

    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=109c53f840e551d6e99ecfd8b0131a968332c89f

These are no longer used in the PNP core either, so we can just remove
them completely.

It's possible that there are out-of-tree drivers that use these
interfaces.  They should be changed to either (1) use PNP quirks
to work around broken hardware or firmware, or (2) use the sysfs
interfaces to control resource usage from userspace.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:22 -04:00
Bjorn Helgaas
f449000209 PNP: add pnp_init_resources(struct pnp_dev *) interface
Add pnp_init_resources(struct pnp_dev *) to replace
pnp_init_resource_table(), which takes a pointer to the
pnp_resource_table itself.  Passing only the pnp_dev * reduces
the possibility for error in the caller and removes the
pnp_resource_table implementation detail from the interface.

Even though pnp_init_resource_table() is exported, I did not
export pnp_init_resources() because it is used only by the PNP
core.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:22 -04:00
Bjorn Helgaas
59284cb409 PNP: remove pnp_resource_table from internal get/set interfaces
When we call protocol->get() and protocol->set() methods, we currently
supply pointers to both the pnp_dev and the pnp_resource_table even
though the pnp_resource_table should always be the one associated with
the pnp_dev.

This removes the pnp_resource_table arguments to make it clear that
these methods only operate on the specified pnp_dev.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:21 -04:00
Bjorn Helgaas
c1caf06ccf PNP: add debug output to option registration
Add debug output to resource option registration functions (enabled
by CONFIG_PNP_DEBUG).  This uses dev_printk, so I had to add pnp_dev
arguments at the same time.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:20 -04:00
Bjorn Helgaas
048825deea PNP: make pnp_add_card_id() internal to PNP core
pnp_add_card_id() doesn't need to be exposed outside the PNP core, so
move the declaration to an internal header file.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:17 -04:00
Bjorn Helgaas
1692b27bf3 PNP: make pnp_add_id() internal to PNP core
pnp_add_id() doesn't need to be exposed outside the PNP core, so
move the declaration to an internal header file.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:15 -04:00
Bjorn Helgaas
ca0e8b6fd2 ISAPNP: move config register addresses out of isapnp.h
These are used only in drivers/pnp/isapnp/core.c, so no need to
expose them to the world.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:15 -04:00
Zhang Rui
e68b16abd9 thermal: add hwmon sysfs I/F
Add hwmon sys I/F for generic thermal driver.

Note: we have one hwmon class device for EACH TYPE of the thermal zone device.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 02:48:01 -04:00
Zhang, Rui
9ec732ff80 thermal: add new get_crit_temp callback
Add a new callback so that the generic thermal can get
the critical trip point info of a thermal zone,
which is needed for building the tempX_crit hwmon sysfs attribute.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 02:45:49 -04:00
Zhang Rui
63c4ec905d thermal: add the support for building the generic thermal as a module
Build the generic thermal driver as module "thermal_sys".

Make ACPI thermal, video, processor and fan SELECT the generic
thermal driver, as these drivers rely on it to build the sysfs I/F.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 02:44:00 -04:00
Badari Pulavarty
9d88a2eb6e [POWERPC] Provide walk_memory_resource() for powerpc
Provide walk_memory_resource() for 64-bit powerpc.  PowerPC maintains
logical memory region mapping in the lmb.memory structure.  Walk
through these structures and do the callbacks for the contiguous
chunks.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 15:57:53 +10:00
Badari Pulavarty
98d5c21c81 [POWERPC] Update lmb data structures for hotplug memory add/remove
The powerpc kernel maintains information about logical memory blocks
in the lmb.memory structure, which is initialized and updated at boot
time, but not when memory is added or removed while the kernel is
running.

This adds a hotplug memory notifier which updates lmb.memory when
memory is added or removed.  This information is useful for eHEA
driver to find out the memory layout and holes.

NOTE: No special locking is needed for lmb_add() and lmb_remove().
Calls to these are serialized by caller. (pSeries_reconfig_chain).

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 15:57:53 +10:00
Kumar Gala
85218827cc [POWERPC] Add IRQSTACKS support on ppc32
This makes it possible to use separate stacks for hard and soft IRQs
on 32-bit powerpc as well as on 64-bit.  The code for 32-bit is just
the 32-bit analog of the 64-bit code.

* Added allocation and initialization of the irq stacks.  We limit the
  stacks to be in lowmem for ppc32.
* Implemented ppc32 versions of call_do_softirq() and call_handle_irq()
  to switch the stack pointers
* Reworked how we do stack overflow detection.  We now keep around the
  limit of the stack in the thread_struct and compare against the limit
  to see if we've overflowed.  We can now use this on ppc64 if desired.

[ paulus@samba.org: Fixed bug on 6xx where we need to reload r9 with the
  thread_info pointer. ]

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 15:57:34 +10:00
Paul Mackerras
dd18434ff0 [POWERPC] Use __always_inline for xchg* and cmpxchg*
This changes the definitions of the xchg and cmpxchg families of
functions in include/asm-powerpc/system.h to be marked __always_inline
rather than __inline__.  The reason for doing this is that we rely on
the compiler inlining them in order to eliminate the references to
__xchg_called_with_bad_pointer and __cmpxchg_called_with_bad_pointer,
which are deliberately left undefined.  Thus this change will enable
us to make the inline keyword be just a hint rather than a directive.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 15:57:34 +10:00
Ursula Braun
a74b08c7fc qeth: read number of ports from card
Read out number of ports from the hardware.

Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-04-29 01:56:34 -04:00
Ursula Braun
022b660ae5 ccwgroup: Unify parsing for group attribute.
Instead of having each driver for ccwgroup slave device parsing the
input itself and calling ccwgroup_create(), introduce a new function
ccwgroup_create_from_string() and handle parsing inside the ccwgroup
core.

Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-04-29 01:56:29 -04:00
Linus Torvalds
8ab68ab420 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (35 commits)
  siimage: coding style cleanup (take 2)
  ide-cd: clean up cdrom_analyze_sense_data()
  ide-cd: fix test unsigned var < 0
  ide: add TSSTcorp CDDVDW SH-S202H to ivb_list[]
  piix: add Asus Eee 701 controller to short cable list
  ARM: always select HAVE_IDE
  remove the broken ETRAX_IDE driver
  ide: remove ->dma_prdtable field from ide_hwif_t
  ide: remove ->dma_vendor{1,3} fields from ide_hwif_t
  scc_pata: add ->dma_host_set and ->dma_start methods
  ide: skip "VLB sync" if host uses MMIO
  ide: add ide_pad_transfer() helper
  ide: remove ->INW and ->OUTW methods
  ide: use IDE I/O helpers directly in ide_tf_{load,read}()
  ns87415: add ->tf_read method
  scc_pata: add ->tf_{load,read} methods
  ide-h8300: add ->tf_{load,read} methods
  ide-cris: add ->tf_{load,read} methods
  ide: add ->tf_load and ->tf_read methods
  ide: move ide_tf_{load,read} to ide-iops.c
  ...
2008-04-28 17:30:26 -07:00
Bartlomiej Zolnierkiewicz
55224bc86a ide: remove ->dma_prdtable field from ide_hwif_t
* Use 'hwif->dma_base + {4,8}' instead of hwif->dma_prdtable in
  {ide,scc}_dma_setup().

* Remove no longer needed ->dma_prdtable field from ide_hwif_t.

While at it:

* Use ATA_DMA_TABLE_OFS define.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-28 23:44:42 +02:00
Bartlomiej Zolnierkiewicz
41051a141d ide: remove ->dma_vendor{1,3} fields from ide_hwif_t
* Use 'hwif->dma_base + {1,3}' instead of hwif->dma_vendor{1,3} in
  pdc202xx_new host driver.

* Remove no longer needed ->dma_vendor{1,3} fields from ide_hwif_t.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-28 23:44:42 +02:00
Bartlomiej Zolnierkiewicz
9f87abe892 ide: add ide_pad_transfer() helper
* Add ide_pad_transfer() helper (which uses ->{in,out}put_data methods
  internally so the transfer is also padded to drive+host requirements)
  and use it instead of ide_atapi_{write_zeros,discard_data}().

* Remove no longer needed ide_atapi_{write_zeros,discard_data}().

Cc: Borislav Petkov <petkovbb@gmail.com>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-28 23:44:41 +02:00
Bartlomiej Zolnierkiewicz
7c0daf2681 ide: remove ->INW and ->OUTW methods
* Remove no longer used ->INW and ->OUTW methods.

While at it:

* scc_pata.c: scc_ide_{out,in}w() is called only in scc_tf_{load,read}()
  so inline it there.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-28 23:44:41 +02:00
Bartlomiej Zolnierkiewicz
94cd5b62ff ide: add ->tf_load and ->tf_read methods
* Add ->tf_load and ->tf_read methods to ide_hwif_t and set the default
  methods in default_hwif_transport().

* Use ->tf_{load,read} instead o calling ide_tf_{load,read}() directly.

* Make ide_tf_{load,read}() static.

There should be no functional changes caused by this patch.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-28 23:44:40 +02:00
Bartlomiej Zolnierkiewicz
089c5c7e00 ide: factor out debugging code from ide_tf_load()
Factor out debugging code from ide_tf_load() to ide_tf_dump() helper
and update ide_tf_load() users accordingly.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-28 23:44:39 +02:00
Bartlomiej Zolnierkiewicz
1fc142589e ide: add ide_execute_pkt_cmd() helper
Add ide_execute_pkt_cmd() helper for executing PACKET command,
then convert ATAPI device drivers to use it.

As a nice side-effect this fixes ide-{floppy,tape,scsi} w.r.t.
ide_lock taking (ide-cd was OK).

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-28 23:44:39 +02:00
Bartlomiej Zolnierkiewicz
16bb69c14a ide: remove ->INS{W,L} and ->OUTS{W,L} methods
* Use ins{w,l}()/outs{w,l}() and __ide_mm_ins{w,l}()/__ide_mm_outs{w,l}()
  directly in ata_{in,out}put_data() (by using IDE_HFLAG_MMIO host flag to
  decide which I/O ops are required).

* Remove no longer needed ->INS{W,L} and ->OUTS{W,L} methods (ide-h8300,
  au1xxx-ide and scc_pata implement their own ->{in,out}put_data methods).

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-28 23:44:37 +02:00
Bartlomiej Zolnierkiewicz
c5dd43ec65 ide: add IDE_HFLAG_MMIO host flag (take 2)
* Add IDE_HFLAG_MMIO host flag and set it for hosts which use
  default_hwif_mmiops().

v2:
* Fix kernel panic in pmac host driver (',' should be '|').

  Thanks to Kamalesh for reporting it + testing the fix
  and to Andrew for hinting me about the source of the issue.

Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-28 23:44:37 +02:00
Bartlomiej Zolnierkiewicz
9567b349f7 ide: merge ->atapi_*put_bytes and ->ata_*put_data methods
* Merge ->atapi_{in,out}put_bytes and ->ata_{in,out}put_data methods
  into new ->{in,out}put_data methods which take number of bytes to
  transfer as an argument and always do padding.

While at it:

* Use 'hwif' or 'drive->hwif' instead of 'HWIF(drive)'.

There should be no functional changes caused by this patch (all users
of ->ata_{in,out}put_data methods were using multiply-of-4 word counts).

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-28 23:44:36 +02:00
Bartlomiej Zolnierkiewicz
92d3ab27e8 falconide/q40ide: add ->atapi_*put_bytes and ->ata_*put_data methods (take 2)
* Add ->atapi_{in,out}put_bytes and ->ata_{in,out}put_data methods to
  falconide and q40ide host drivers (->ata_* methods are implemented on
  top of ->atapi_* methods so they also do byte-swapping now).

* Cleanup atapi_{in,out}put_bytes().

v2:
* Add 'struct request *rq' argument to ->ata_{in,out}put_data methods
  and don't byte-swap disk fs requests (we shouldn't un-swap fs requests
  because fs itself is stored byte-swapped on the disk) - this is how
  things were done before the patch (ideally device mapper should be
  used instead but it would break existing setups and would have some
  performance impact).

Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michael Schmitz <schmitz@debian.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Richard Zidlicky <rz@linux-m68k.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-28 23:44:36 +02:00
Linus Torvalds
e97e386b12 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  slub: pack objects denser
  slub: Calculate min_objects based on number of processors.
  slub: Drop DEFAULT_MAX_ORDER / DEFAULT_MIN_OBJECTS
  slub: Simplify any_slab_object checks
  slub: Make the order configurable for each slab cache
  slub: Drop fallback to page allocator method
  slub: Fallback to minimal order during slab page allocation
  slub: Update statistics handling for variable order slabs
  slub: Add kmem_cache_order_objects struct
  slub: for_each_object must be passed the number of objects in a slab
  slub: Store max number of objects in the page struct.
  slub: Dump list of objects not freed on kmem_cache_close()
  slub: free_list() cleanup
  slub: improve kmem_cache_destroy() error message
  slob: fix bug - when slob allocates "struct kmem_cache", it does not force alignment.
2008-04-28 14:08:56 -07:00
Linus Torvalds
e31a94ed37 Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: (45 commits)
  [MIPS] Pb1200/DBAu1200: move platform code to its proper place
  [MIPS] Fix handling of trap and breakpoint instructions
  [MIPS] Pb1200: do register SMC 91C111
  [MIPS] DBAu1200: fix bad SMC 91C111 resource size
  [NET] Kconfig: Rename MIKROTIK_RB500 -> MIKROTIK_RB532
  [MIPS] IP27: Fix build bug due to missing include
  [MIPS] Fix some sparse warnings on traps.c and irq-msc01.c
  [MIPS] cevt-gt641xx: Kill unnecessary include
  [MIPS] DS1287: Add clockevent driver
  [MIPS] add DECstation I/O ASIC clocksource
  [MIPS] rbtx4938: minor cleanup
  [MIPS] Alchemy: kill unused PCI_IRQ_TABLE_LOOKUP macro
  [MIPS] rbtx4938: misc cleanups
  [MIPS] jmr3927: use generic txx9 gpio
  [MIPS] rbhma4500: use generic txx9 gpio
  [MIPS] generic txx9 gpio support
  [MIPS] make fallback gpio.h gpiolib-friendly
  [MIPS] unexport null_perf_irq() and make it static
  [MIPS] unexport rtc_mips_set_time()
  [MIPS] unexport copy_from_user_page()
  ...
2008-04-28 10:51:43 -07:00
Linus Torvalds
cfd299dffe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
  SELinux: Fix a RCU free problem with the netport cache
  SELinux: Made netnode cache adds faster
  SELinux: include/security.h whitespace, syntax, and other cleanups
  SELinux: policydb.h whitespace, syntax, and other cleanups
  SELinux: mls_types.h whitespace, syntax, and other cleanups
  SELinux: mls.h whitespace, syntax, and other cleanups
  SELinux: hashtab.h whitespace, syntax, and other cleanups
  SELinux: context.h whitespace, syntax, and other cleanups
  SELinux: ss/conditional.h whitespace, syntax, and other cleanups
  SELinux: selinux/include/security.h whitespace, syntax, and other cleanups
  SELinux: objsec.h whitespace, syntax, and other cleanups
  SELinux: netlabel.h whitespace, syntax, and other cleanups
  SELinux: avc_ss.h whitespace, syntax, and other cleanups

Fixed up conflict in include/linux/security.h manually
2008-04-28 10:08:49 -07:00
Al Viro
01d7b36988 usbhid endianness annotations and fixes
usb_control_msg() converts arguments to little-endian itself,
doing that in caller means breakage on big-endian boxen.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 10:03:31 -07:00
Al Viro
b750568053 fix ia64 local_irq_save() et.al.
psr is not a good name for local variable in macro body when it
has a good chance of being the argument of said macro (actually
is at least in one place)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 10:03:31 -07:00
Linus Torvalds
e945e849e1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc: video drivers: add facility level
  sparc: tcx.c make tcx_init and tcx_exit static
  sparc: ffb.c make ffb_init and ffb_exit static
  sparc: cg14.c make cg14_init and cg15_exit static
  sparc: bw2.c fix bw2_exit
  sparc64: Fix accidental syscall restart on child return from clone/fork/vfork.
  sparc64: Clean up handling of pt_regs trap type encoding.
  sparc: Remove old style signal frame support.
  sparc64: Kill bogus RT_ALIGNEDSZ macro from signal.c
  sparc: sunzilog.c remove unused argument
  sparc: fix drivers/video/tcx.c warning
  sparc64: Kill unused local ISA bus layer.
  input: Rewrite sparcspkr device probing.
  sparc64: Do not ignore 'pmu' device ranges.
  sparc64: Kill ISA_FLOPPY_WORKS code.
  sparc64: Kill CONFIG_SPARC32_COMPAT
  sparc64: Cleanups and corrections for arch/sparc64/Kconfig
  sparc64: Fix wedged irq regression.
2008-04-28 09:45:57 -07:00
Linus Torvalds
77a50df2b1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  iwlwifi: Allow building iwl3945 without iwl4965.
  wireless: Fix compile error with wifi & leds
  tcp: Fix slab corruption with ipv6 and tcp6fuzz
  ipv4/ipv6 compat: Fix SSM applications on 64bit kernels.
  [IPSEC]: Use digest_null directly for auth
  sunrpc: fix missing kernel-doc
  can: Fix copy_from_user() results interpretation
  Revert "ipv6: Fix typo in net/ipv6/Kconfig"
  tipc: endianness annotations
  ipv6: result of csum_fold() is already 16bit, no need to cast
  [XFRM] AUDIT: Fix flowlabel text format ambibuity.
2008-04-28 09:44:11 -07:00
Sergei Shtylyov
fcbd3b4b92 [MIPS] Pb1200/DBAu1200: move platform code to its proper place
Since both the IDE interface and SMC 91C111 Ethernet chip are on-board
devices, not SOC devices, move the platform device registration form the
common to the board specific code.

While at it, remove semicolon (which didn't break compilation only by
chance) from the AU1XXX_ATA_DDMA_REQ macro and do some renaming:

- change 'au1200_ide0_' variable name prefix to the mere 'ide_';

- change 'smc91x_' variable name prefix to 'smc91c111_' since that's the
  name of the chip used on the boards;

- drop 'AU1XXX_' prefix from the names of macros describing IDE and Ethernet
  on-board devices;

- change 'SMC91111_' to 'SMC91C111_', change 'IRQ' to 'INT' in the names of
  the macros describing the Ethernet chip for consistency with the IDE
  macros;

- change 'ATA_' to 'IDE_' and 'OFFSET' to 'SHIFT' (since this value is
  indeed a shift count) in the names of the macros describing the IDE
  interface.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:33 +01:00
Adrian Bunk
a4a8f70d2d [MIPS] IP27: Fix build bug due to missing include
asm-mips/mach-ip27/topology.h must #include <asm-generic/topology.h>
This fixes the following compile error:

...
  CC      kernel/sched.o
/home/bunk/linux/kernel-2.6/git/linux-2.6/kernel/sched.c: In function 'find_next_best_node':
/home/bunk/linux/kernel-2.6/git/linux-2.6/kernel/sched.c:7015: error: implicit declaration of function 'node_to_cpumask_ptr'
/home/bunk/linux/kernel-2.6/git/linux-2.6/kernel/sched.c:7015: error: '__tmp__' undeclared (first use in this function)
/home/bunk/linux/kernel-2.6/git/linux-2.6/kernel/sched.c:7015: error: (Each undeclared identifier is reported only once
/home/bunk/linux/kernel-2.6/git/linux-2.6/kernel/sched.c:7015: error: for each function it appears in.)
/home/bunk/linux/kernel-2.6/git/linux-2.6/kernel/sched.c: In function 'sched_domain_node_span':
/home/bunk/linux/kernel-2.6/git/linux-2.6/kernel/sched.c:7047: error: 'nodemask' undeclared (first use in this function)
/home/bunk/linux/kernel-2.6/git/linux-2.6/kernel/sched.c:7048: warning: ISO C90 forbids mixed declarations and code
/home/bunk/linux/kernel-2.6/git/linux-2.6/kernel/sched.c:7059: error: implicit declaration of function 'node_to_cpumask_ptr_next'
/home/bunk/linux/kernel-2.6/git/linux-2.6/kernel/sched.c: In function '__build_sched_domains':
/home/bunk/linux/kernel-2.6/git/linux-2.6/kernel/sched.c:7605: error: 'pnodemask' undeclared (first use in this function)
make[2]: *** [kernel/sched.o] Error 1

<--  snip  -->

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:32 +01:00
Atsushi Nemoto
411ba7fcba [MIPS] Fix some sparse warnings on traps.c and irq-msc01.c
* Declare board_bind_eic_interrupt, board_watchpoint_handler in traps.h
* Make msc_bind_eic_interrupt static and fix its argument types.
* Make msc_levelirq_type, msc_edgeirq_type static.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:32 +01:00
Yoichi Yuasa
6457d9fc3b [MIPS] DS1287: Add clockevent driver
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:32 +01:00
Yoichi Yuasa
4247417d84 [MIPS] add DECstation I/O ASIC clocksource
Add DECstation I/O ASIC clocksource

Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:32 +01:00
Sergei Shtylyov
6ed436932d [MIPS] Alchemy: kill unused PCI_IRQ_TABLE_LOOKUP macro
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:31 +01:00
Atsushi Nemoto
66140c8e9f [MIPS] rbtx4938: misc cleanups
* Do not use non-standard I/O accessors, such as reg_rd08, etc.
* Kill unnecessary wbflush()
* Kill tx4938_mips.h
* Kill unnecessary includes

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:31 +01:00
Atsushi Nemoto
1bd0962e3d [MIPS] jmr3927: use generic txx9 gpio
Use generic txx9 gpio (and gpiolib) for JMR3927 board.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:31 +01:00
Atsushi Nemoto
4cad154b30 [MIPS] rbhma4500: use generic txx9 gpio
Use generic txx9 gpio (and gpiolib) for RBHMA4500 board.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:31 +01:00
Atsushi Nemoto
a9aec7fe74 [MIPS] generic txx9 gpio support
This is a board-independent TXx9 gpio API implementation using gpiolib.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:31 +01:00
Atsushi Nemoto
8aa62adafa [MIPS] make fallback gpio.h gpiolib-friendly
If gpiolib was selected, asm-generic/gpio.h provides some prototypes
for gpio API and implementation helpers.  With this patch, platform
code can implement its GPIO API using gpiolib without custom gpio.h
file.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:31 +01:00
Sergei Shtylyov
0167509574 [MIPS] Alchemy: don't unmask timer IRQ early
Defer the unmasking of the count/compare interrupt (IRQ5) till the
clockevent driver initialization:

- only enable the cascaded IRQs 0 thru 4 in arch_init_irq(); kill the
  ALLINTS macro -- this change is blessed by AMD as I saw it in their own
  patch; :-)

- do not force IRQ5 enabled in plat_time_init() if PM is enabled and there's
  no 32 KHz crystal.

Update the copyrights (taking into account my prior changes), also removing
Pete Popov's old email...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:26 +01:00
Daniel Laird
a92b05880d [MIPS] Move arch/mips/philips to arch/mips/nxp
Signed-off-by: daniel.j.laird <daniel.j.laird@nxp.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:26 +01:00
Ralf Baechle
39b8d52542 [MIPS] Add support for MIPS CMP platform.
Signed-off-by: Chris Dearman <chris@mips.com>
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:26 +01:00
Chris Dearman
308402445e [MIPS] Add CoreFPGA5 support; distinguish between SOCit/ROCit
Signed-off-by: Chris Dearman <chris@mips.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:26 +01:00
Chris Dearman
351336929c [MIPS] Allow setting of the cache attribute at run time.
Slightly tacky, but there is a precedent in the sparc archirecture code.

Signed-off-by: Chris Dearman <chris@mips.com>
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:25 +01:00
Chris Dearman
bec5052743 [MIPS] Tidy up cache attributes
Signed-off-by: Chris Dearman <chris@mips.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:25 +01:00
Chris Dearman
962f480e0f [MIPS] All MIPS32 processors support64-bit physical addresses.
Still, only the 4K may actually implement it.

Signed-off-by: Chris Dearman <chris@mips.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-04-28 17:14:25 +01:00
Andrew Morton
73f20e58b1 FAT_VALID_MEDIA(): remove pointless test
The on-disk media specification field in FAT is only 8-bits, so testing for
<=0xff is pointless, and can generate a "comparison is always true due to
limited range of data type" warning.

While we're there, convert FAT_VALID_MEDIA() into a C function - the present
implementation is buggy: it generates either one or two references to its
argument.

Cc: Frank Seidel <fseidel@suse.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:47 -07:00
OGAWA Hirofumi
606e423e43 fat: Update free_clusters even if it is untrusted
Currently, free_clusters is not updated until it is trusted, because
Windows doesn't update it correctly.

But if user is using FAT driver of Linux, it updates free_clusters
correctly.  Instead, this updates it even if it's untrusted, so if
free_clustes is correct, now keep correct value.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:47 -07:00
OGAWA Hirofumi
1ae43f826b fat: Add allow_utime option
Normally utime(2) checks current process is owner of the file, or it
has CAP_FOWNER capability.  But FAT filesystem doesn't have uid/gid as
on disk info, so normal check is too unflexible.

With this option you can relax it.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:47 -07:00
OGAWA Hirofumi
1278fdd34b fat: fat_notify_change() and check_mode() cleanup
- Rename fat_notify_change() to fat_setattr()
- check_mode() cleanup
- Change layout of code

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:47 -07:00
Jan Kara
d5dee5c395 reiserfs: unpack tails on quota files
Quota files cannot have tails because quota_write and quota_read functions do
not support them.  So far when quota files did have tail, we just refused to
turn quotas on it.  Sadly this check has been wrong and so there are now
plenty installations where quota files don't have NOTAIL flag set and so now
after fixing the check, they suddently fail to turn quotas on.  Since it's
easy to unpack the tail from kernel, do this from reiserfs_quota_on() which
solves the problem and is generally nicer to users anyway.

Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: <urhausen@urifabi.net>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:46 -07:00
Dan Williams
8b3e6cdc53 md: introduce get_priority_stripe() to improve raid456 write performance
Improve write performance by preventing the delayed_list from dumping all its
stripes onto the handle_list in one shot.  Delayed stripes are now further
delayed by being held on the 'hold_list'.  The 'hold_list' is bypassed when:

  * a STRIPE_IO_STARTED stripe is found at the head of 'handle_list'
  * 'handle_list' is empty and i/o is being done to satisfy full stripe-width
    write requests
  * 'bypass_count' is less than 'bypass_threshold'.  By default the threshold
    is 1, i.e. every other stripe handled is a preread stripe provided the
    top two conditions are false.

Benchmark data:
System: 2x Xeon 5150, 4x SATA, mem=1GB
Baseline: 2.6.24-rc7
Configuration: mdadm --create /dev/md0 /dev/sd[b-e] -n 4 -l 5 --assume-clean
Test1: dd if=/dev/zero of=/dev/md0 bs=1024k count=2048
  * patched:  +33% (stripe_cache_size = 256), +25% (stripe_cache_size = 512)

Test2: tiobench --size 2048 --numruns 5 --block 4096 --block 131072 (XFS)
  * patched: +13%
  * patched + preread_bypass_threshold = 0: +37%

Changes since v1:
* reduce bypass_threshold from (chunk_size / sectors_per_chunk) to (1) and
  make it configurable.  This defaults to fairness and modest performance
  gains out of the box.
Changes since v2:
* [neilb@suse.de]: kill STRIPE_PRIO_HI and preread_needed as they are not
  necessary, the important change was clearing STRIPE_DELAYED in
  add_stripe_bio and this has been moved out to make_request for the hang
  fix.
* [neilb@suse.de]: simplify get_priority_stripe
* [dan.j.williams@intel.com]: reset the bypass_count when ->hold_list is
  sampled empty (+11%)
* [dan.j.williams@intel.com]: decrement the bypass_count at the detection
  of stripes being naturally promoted off of hold_list +2%.  Note, resetting
  bypass_count instead of decrementing on these events yields +4% but that is
  probably too aggressive.
Changes since v3:
* cosmetic fixups

Tested-by: James W. Laferriere <babydr@baby-dragons.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:42 -07:00
Jaya Kumar
0e27aa3dab fbdev: platforming hecubafb and n411
This patch splits hecubafb into the platform independent hecubafb and the
platform dependent n411.

Signed-off-by: Jaya Kumar <jayakumar.lkml@gmail.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:41 -07:00
Jaya Kumar
03c33a4f00 fbdev: platforming metronomefb and am200epd
This patch splits metronomefb into the platform independent metronomefb and
the platform dependent am200epd.

Signed-off-by: Jaya Kumar <jayakumar.lkml@gmail.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:41 -07:00
Andres Salomon
fd96795630 gxfb/lxfb: detect framebuffer size using an MSR if VSA2 isn't available
If there's no VSA2 (ie, if we're using tinybios or OpenFirmware), use the
GLIU's P2D Range Offset Descriptor to determine how much memory we have
available for the framebuffer.

Originally based on a patch by Jordan Crouse.  Tested with OpenFirmware;
Pascal informs me that tinybios has a stub that fills in P2D_RO0.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:40 -07:00
Andres Salomon
61a517a063 gxfb/lxfb: use VSA definitions when fetching framebuffer size
..Rather than using magic constants.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:40 -07:00
Nicolas Ferre
fd0858017e atmel_lcdfb: wiring BGR to RGB color mode
Adds different wiring mode for the LCD screen.

The legacy atmel LCDC IP uses a non standard color mode, "BGR-555.1" instead
"RGB-565".  The major part of graphic stacks for embedded systems uses only
"RGB-565".  It is possible to swap LCD IOs instead of doing this bit swapping
by software (See application note AT91SAM9 LCD Controller
http://www.atmel.com/dyn/resources/prod_documents/doc6300.pdf)

This wire swapping is done on the at91sam9rl-ek board (board code
using this patch will come later).

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Cc: Hans-Christian Egtvedt <hcegtvedt@atmel.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Andrew Victor <avictor.za@gmail.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:39 -07:00
David Brownell
cf19a37e06 atmel_lcdfb: suspend/resume support
Teach atmel_lcdfb driver how to suspend/resume.

Note that the backlight control should probably do more of the same stuff:
turning off display power (more than just the backlight) and stopping the
clocks (and dma to drive the no-longer-seen display).  No point in wasting
power to generate images that can't be observed, after all...

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Hans-Christian Egtvedt <hcegtvedt@atmel.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Andrew Victor <avictor.za@gmail.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:38 -07:00
Andres Salomon
b6f448e99c PM/gxfb: add hook to PM console layer that allows disabling of suspend VT switch
Prior to suspend, we allocate and switch to a new VT; after suspend, we switch
back to the original VT.  This can be slow, and is completely unnecessary if
the framebuffer we're using can restore video properly.

This adds a hook that allows drivers to select whether or not to do this vt
switch, and changes the gxfb driver to call this hook.  It also adds a module
param to gxfb to allow controlling of the vt switch (defaulting to no switch).

(Note: I'm not convinced that console_sem is the best way to protect this, but
we should probably have some form of locking..)

[akpm@linux-foundation.org: build fix]
Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:36 -07:00
Andres Salomon
e9338364e6 x86: GEODE: add Virtual Systems Architecture detection
This is generic VSA2 detection.  It's used by OLPC to determine whether or not
the BIOS contains VSA2, but since other BIOSes are coming out that don't use
the VSA (ie, tinybios), it might end up being useful for others.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:35 -07:00
Andres Salomon
32bf87e369 x86: geode: MSR cleanup
This cleans up a few MSR-using drivers in the following manner:
  - Ensures MSRs are all defined in asm/geode.h, rather than in misc
    places
  - Makes the naming consistent; cs553[56] ones begin with MSR_,
    GX-specific ones start with MSR_GX_, and LX-specific ones start
    with MSR_LX_.  Also, make the names match the data sheet.
  - Use MSR names rather than numbers in source code
  - Document the fact that the LX's MSR_PADSEL has the wrong value
    in the data sheet.  That's, uh, good to note.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:35 -07:00
Anton Vorontsov
e4c690e061 fb: add support for foreign endianness
Add support for the framebuffers with non-native endianness.  This is done via
FBINFO_FOREIGN_ENDIAN flag that will be used by the drivers.  Depending on the
host endianness this flag will be overwritten by FBINFO_BE_MATH internal flag,
or cleared.

Tested to work on MPC8360E-RDK (BE) + Fujitsu MINT framebuffer (LE).

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: <Valdis.Kletnieks@vt.edu>
Cc: Clemens Koller <clemens.koller@anagramm.de>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:35 -07:00
Anton Vorontsov
169b6a7a6e gpiochip_reserve()
Add a new function gpiochip_reserve() to reserve ranges of gpios that platform
code has pre-allocated.  That is, this marks gpio numbers which will be
claimed by drivers that haven't yet been loaded, and thus are not available
for dynamic gpio number allocation.

[akpm@linux-foundation.org: remove unneeded __must_check]
[david-b@pacbell.net: don't export gpiochip_reserve (section fix)]
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:34 -07:00
Guennadi Liakhovetski
e6de1808f8 gpio: define gpio_is_valid()
Introduce a gpio_is_valid() predicate; use it in gpiolib.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de>
    [ use inline function; follow the gpio_* naming convention;
      work without gpiolib; all programming interfaces need docs ]
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:34 -07:00
Guennadi Liakhovetski
438d8908b3 gpiolib: better rmmod infrastructure
As long as one or more GPIOs on a gpio chip are used its driver should not be
unloaded.  The existing mechanism (gpiochip_remove failure) doesn't address
that, since rmmod can no longer be made to fail by having the cleanup code
report errors.  Module usecounts are the solution.

Assuming standard "initialize struct to zero" policies, this change won't
affect SOC platform drivers.  However, drivers for external chips (on I2C and
SPI busses) should be updated if they can be built as modules.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de>
[ gpio_ensure_requested() needs to update module usecounts too ]
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:34 -07:00
Ilpo Järvinen
73fcdc9e15 i2o: remove static inline forward declarations
Nothing in between of them and the later declaration with body
needs them.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:34 -07:00
Andrew Morton
50f8c370e7 quota: convert stub functions from macros into inlines
Fixes things like this:

fs/super.c: In function `deactivate_super':
fs/super.c:182: warning: statement with no effect
fs/super.c: In function `do_remount_sb':
fs/super.c:644: warning: statement with no effect

Cc: Jan Kara <jack@ucw.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:33 -07:00
Jan Kara
0ff5af8340 quota: quota core changes for quotaon on remount
Currently, we just turn quotas off on remount of filesystem to read-only
state.  The patch below adds necessary framework so that we can turn quotas
off on remount RO but we are able to automatically reenable them again when
filesystem is remounted to RW state.  All we need to do is to keep references
to inodes of quota files when remounting RO and using these references to
reenable quotas when remounting RW.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:33 -07:00
Jan Kara
03f6e92bdd quota: various style cleanups
Cleanups in quota code:
  Change __inline__ to inline.
  Change some macros to inline functions.
  Remove vfs_quota_off_mount() macro.
  DQUOT_OFF() should be (0) is CONFIG_QUOTA is disabled.
  Move declaration of mark_dquot_dirty and dirty_dquot from quota.h to dquot.c

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:33 -07:00
Andrew Perepechko
338bf9afda quota: do not allow setting of quota limits to too high values
We should check whether quota limits set via Q_SETQUOTA are not exceeding
limits which quota format is able to handle.

Signed-off-by: Andrew Perepechko <andrew.perepechko@sun.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:32 -07:00
Masami Hiramatsu
26b31c1908 kprobes: add (un)register_jprobes for batch registration
Introduce unregister_/register_jprobes() for jprobe batch registration.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:32 -07:00
Masami Hiramatsu
4a296e07c3 kprobes: add (un)register_kretprobes for batch registration
Introduce unregister_/register_kretprobes() for kretprobe batch registration.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:32 -07:00
Masami Hiramatsu
9861668f74 kprobes: add (un)register_kprobes for batch registration
Introduce unregister_/register_kprobes() for kprobe batch registration.  This
can reduce waiting time for synchronized_sched() when a lot of probes have to
be unregistered at once.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:32 -07:00
Masami Hiramatsu
9960257281 list.h: add list_is_singular()
Add list_is_singular() to check a list has just one entry.

list_is_singular() is useful to check whether a list_head which have been
temporarily allocated for listing objects can be released or not.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:32 -07:00
Srinivasa Ds
3d8d996e0c kprobes: prevent probing of preempt_schedule()
Prohibit users from probing preempt_schedule().  One way of prohibiting the
user from probing functions is by marking such functions with __kprobes.  But
this method doesn't work for those functions, which are already marked to
different section like preempt_schedule() (belongs to __sched section).  So we
use blacklist approach to refuse user from probing these functions.

In blacklist approach we populate the blacklisted function's starting address
and its size in kprobe_blacklist structure.  Then we verify the user specified
address against start and end of the blacklisted function.  So any attempt to
register probe on blacklisted functions will be rejected.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:32 -07:00
Karl Dahlke
0341a4d0fd VT notifier extension for accessibility
Some accessibility modules need to be able to catch the output on the
console before the VT interpretation, and possibly swallow it.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:32 -07:00
Magnus Damm
61711f8fd8 sm501: add uart support
This patch extends the sm501 mfd with 8250 uart support. We're currently
doing this in the board specific r2d-1 code already, but it would be nice to
do move things into the mfd since it's more chip specific than board specific.

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:32 -07:00
Thomas Petazzoni
7ae9392c0a x86: configurable DMI scanning code
Turn CONFIG_DMI into a selectable option if EMBEDDED is defined, in
order to be able to remove the DMI table scanning code if it's not
needed, and then reduce the kernel code size.

With CONFIG_DMI (i.e before) :

   text    data     bss     dec     hex filename
1076076  128656   98304 1303036  13e1fc vmlinux

Without CONFIG_DMI (i.e after) :

   text    data     bss     dec     hex filename
1068092  126308   98304 1292704  13b9a0 vmlinux

Result:

   text    data     bss     dec     hex filename
  -7984   -2348       0  -10332   -285c vmlinux

The new option appears in "Processor type and features", only when
CONFIG_EMBEDDED is defined.

This patch is part of the Linux Tiny project, and is based on previous work
done by Matt Mackall <mpm@selenic.com>.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Anvin" <hpa@zytor.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:30 -07:00
Yoichi Yuasa
fc3f341b5a serial: add VR41xx SIU setup for serial console
Add VR41xx SIU setup for serial console.

Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:30 -07:00
Joe Perches
0fab6de09c synclink drivers bool conversion
Remove more TRUE/FALSE defines and uses
Remove == TRUE tests
Convert BOOLEAN to bool
Convert int to bool where appropriate

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Paul Fulghum <paulkf@microgate.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:29 -07:00
Harvey Harrison
cdf8803768 ncpfs: add prototypes to ncp_fs.h
Removes some externs from C files, noticed from the sparse warnings:
fs/ncpfs/dir.c:90:26: warning: symbol 'ncp_root_dentry_operations' was not declared. Should it be static?
fs/ncpfs/symlink.c:107:5: warning: symbol 'ncp_symlink' was not declared. Should it be static?
fs/ncpfs/symlink.c:101:39: warning: symbol 'ncp_symlink_aops' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Petr Vandrovec <VANDROVE@vc.cvut.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:29 -07:00
KOSAKI Motohiro
16a26ef5ad cris: add constfy to pgd_offset()
add constfy to pgd_offset() for avoid following warnings.

  CC      mm/pagewalk.o
mm/pagewalk.c: In function 'walk_page_range':
mm/pagewalk.c:111: warning: passing argument 1 of 'pgd_offset' discards qualifiers from p\
ointer target type

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: "Vegard Nossum" <vegard.nossum@gmail.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:28 -07:00
Andrew Morton
ed6b9b97f4 alpha: teach the compiler that BUG doesn't return
Fix things like this:

security/selinux/netnode.c: In function 'sel_netnode_find':
security/selinux/netnode.c:126: warning: 'idx' may be used uninitialized in this function
security/selinux/netnode.c: In function 'sel_netnode_sid':
security/selinux/netnode.c:225: warning: 'ret' may be used uninitialized in this function
security/selinux/netnode.c:168: warning: 'idx' may be used uninitialized in this function

due to code correctly not expecting BUG() to return.

For some reason this reduces the object code size for that particular file.

Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:27 -07:00
Harvey Harrison
95d193a903 alpha: replace __inline with inline
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:27 -07:00
Andrew G. Morgan
3898b1b4eb capabilities: implement per-process securebits
Filesystem capability support makes it possible to do away with (set)uid-0
based privilege and use capabilities instead.  That is, with filesystem
support for capabilities but without this present patch, it is (conceptually)
possible to manage a system with capabilities alone and never need to obtain
privilege via (set)uid-0.

Of course, conceptually isn't quite the same as currently possible since few
user applications, certainly not enough to run a viable system, are currently
prepared to leverage capabilities to exercise privilege.  Further, many
applications exist that may never get upgraded in this way, and the kernel
will continue to want to support their setuid-0 base privilege needs.

Where pure-capability applications evolve and replace setuid-0 binaries, it is
desirable that there be a mechanisms by which they can contain their
privilege.  In addition to leveraging the per-process bounding and inheritable
sets, this should include suppressing the privilege of the uid-0 superuser
from the process' tree of children.

The feature added by this patch can be leveraged to suppress the privilege
associated with (set)uid-0.  This suppression requires CAP_SETPCAP to
initiate, and only immediately affects the 'current' process (it is inherited
through fork()/exec()).  This reimplementation differs significantly from the
historical support for securebits which was system-wide, unwieldy and which
has ultimately withered to a dead relic in the source of the modern kernel.

With this patch applied a process, that is capable(CAP_SETPCAP), can now drop
all legacy privilege (through uid=0) for itself and all subsequently
fork()'d/exec()'d children with:

  prctl(PR_SET_SECUREBITS, 0x2f);

This patch represents a no-op unless CONFIG_SECURITY_FILE_CAPABILITIES is
enabled at configure time.

[akpm@linux-foundation.org: fix uninitialised var warning]
[serue@us.ibm.com: capabilities: use cap_task_prctl when !CONFIG_SECURITY]
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Reviewed-by: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Paul Moore <paul.moore@hp.com>
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:26 -07:00
KAMEZAWA Hiroyuki
8cece85ec7 mm: fix broken gfp_zone with __GFP_THISNODE
This hack, "base = MAX_NR_ZONES", at __GFP_THISNODE was used for old
zonliests.

Now, new zonelist[] have a list for __GFP_THISNODE and this hack is incorrect.
Should be removed.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:26 -07:00
Yasunori Goto
e70260aabe memory hotplug: make alloc_bootmem_section()
alloc_bootmem_section() can allocate specified section's area.  This is used
for usemap to keep same section with pgdat by later patch.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:25 -07:00
Yasunori Goto
0475327876 memory hotplug: register section/node id to free
This patch set is to free pages which is allocated by bootmem for
memory-hotremove.  Some structures of memory management are allocated by
bootmem.  ex) memmap, etc.

To remove memory physically, some of them must be freed according to
circumstance.  This patch set makes basis to free those pages, and free
memmaps.

Basic my idea is using remain members of struct page to remember information
of users of bootmem (section number or node id).  When the section is
removing, kernel can confirm it.  By this information, some issues can be
solved.

  1) When the memmap of removing section is allocated on other
     section by bootmem, it should/can be free.
  2) When the memmap of removing section is allocated on the
     same section, it shouldn't be freed. Because the section has to be
     logical memory offlined already and all pages must be isolated against
     page allocater. If it is freed, page allocator may use it which will
     be removed physically soon.
  3) When removing section has other section's memmap,
     kernel will be able to show easily which section should be removed
     before it for user. (Not implemented yet)
  4) When the above case 2), the page isolation will be able to check and skip
     memmap's page when logical memory offline (offline_pages()).
     Current page isolation code fails in this case because this page is
     just reserved page and it can't distinguish this pages can be
     removed or not. But, it will be able to do by this patch.
     (Not implemented yet.)
  5) The node information like pgdat has similar issues. But, this
     will be able to be solved too by this.
     (Not implemented yet, but, remembering node id in the pages.)

Fortunately, current bootmem allocator just keeps PageReserved flags,
and doesn't use any other members of page struct. The users of
bootmem doesn't use them too.

This patch:

This is to register information which is node or section's id.  Kernel can
distinguish which node/section uses the pages allcated by bootmem.  This is
basis for hot-remove sections or nodes.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:25 -07:00
Gerald Schaefer
7f2e9525ba hugetlbfs: common code update for s390
Huge ptes have a special type on s390 and cannot be handled with the standard
pte functions in certain cases, e.g.  because of a different location of the
invalid bit.  This patch adds some new architecture- specific functions to
hugetlb common code, as a prerequisite for the s390 large page support.

This won't affect other architectures in functionality, but I need to add some
new dummy inline functions to the headers.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:25 -07:00
Gerald Schaefer
8fe627ec5b hugetlbfs: add missing TLB flush to hugetlb_cow()
A cow break on a hugetlbfs page with page_count > 1 will set a new pte with
set_huge_pte_at(), w/o any tlb flush operation.  The old pte will remain in
the tlb and subsequent write access to the page will result in a page fault
loop, for as long as it may take until the tlb is flushed from somewhere else.
 This patch introduces an architecture-specific huge_ptep_clear_flush()
function, which is called before the the set_huge_pte_at() in hugetlb_cow().

ATTENTION: This is just a nop on all architectures for now, the s390
implementation will come with our large page patch later.  Other architectures
should define their own huge_ptep_clear_flush() if needed.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:25 -07:00
Gerald Schaefer
6d779079bf hugetlbfs: architecture header cleanup
This patch moves all architecture functions for hugetlb to architecture header
files (include/asm-foo/hugetlb.h) and converts all macros to inline functions.
 It also removes (!) ARCH_HAS_HUGEPAGE_ONLY_RANGE,
ARCH_HAS_HUGETLB_FREE_PGD_RANGE, ARCH_HAS_PREPARE_HUGEPAGE_RANGE,
ARCH_HAS_SETCLEAR_HUGE_PTE and ARCH_HAS_HUGETLB_PREFAULT_HOOK.

Getting rid of the ARCH_HAS_xxx #ifdef and macro fugliness should increase
readability and maintainability, at the price of some code duplication.  An
asm-generic common part would have reduced the loc, but we would end up with
new ARCH_HAS_xxx defines eventually.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:25 -07:00
Lee Schermerhorn
71fe804b6d mempolicy: use struct mempolicy pointer in shmem_sb_info
This patch replaces the mempolicy mode, mode_flags, and nodemask in the
shmem_sb_info struct with a struct mempolicy pointer, initialized to NULL.
This removes dependency on the details of mempolicy from shmem.c and hugetlbfs
inode.c and simplifies the interfaces.

mpol_parse_str() in mempolicy.c is changed to return, via a pointer to a
pointer arg, a struct mempolicy pointer on success.  For MPOL_DEFAULT, the
returned pointer is NULL.  Further, mpol_parse_str() now takes a 'no_context'
argument that causes the input nodemask to be stored in the w.user_nodemask of
the created mempolicy for use when the mempolicy is installed in a tmpfs inode
shared policy tree.  At that time, any cpuset contextualization is applied to
the original input nodemask.  This preserves the previous behavior where the
input nodemask was stored in the superblock.  We can think of the returned
mempolicy as "context free".

Because mpol_parse_str() is now calling mpol_new(), we can remove from
mpol_to_str() the semantic checks that mpol_new() already performs.

Add 'no_context' parameter to mpol_to_str() to specify that it should format
the nodemask in w.user_nodemask for 'bind' and 'interleave' policies.

Change mpol_shared_policy_init() to take a pointer to a "context free" struct
mempolicy and to create a new, "contextualized" mempolicy using the mode,
mode_flags and user_nodemask from the input mempolicy.

  Note: we know that the mempolicy passed to mpol_to_str() or
  mpol_shared_policy_init() from a tmpfs superblock is "context free".  This
  is currently the only instance thereof.  However, if we found more uses for
  this concept, and introduced any ambiguity as to whether a mempolicy was
  context free or not, we could add another internal mode flag to identify
  context free mempolicies.  Then, we could remove the 'no_context' argument
  from mpol_to_str().

Added shmem_get_sbmpol() to return a reference counted superblock mempolicy,
if one exists, to pass to mpol_shared_policy_init().  We must add the
reference under the sb stat_lock to prevent races with replacement of the mpol
by remount.  This reference is removed in mpol_shared_policy_init().

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: another build fix]
[akpm@linux-foundation.org: yet another build fix]
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:25 -07:00
Lee Schermerhorn
095f1fc4eb mempolicy: rework shmem mpol parsing and display
mm/shmem.c currently contains functions to parse and display memory policy
strings for the tmpfs 'mpol' mount option.  Move this to mm/mempolicy.c with
the rest of the mempolicy support.  With subsequent patches, we'll be able to
remove knowledge of the details [mode, flags, policy, ...] completely from
shmem.c

1) replace shmem_parse_mpol() in mm/shmem.c with mpol_parse_str() in
   mm/mempolicy.c.  Rework to use the policy_types[] array [used by
   mpol_to_str()] to look up mode by name.

2) use mpol_to_str() to format policy for shmem_show_mpol().  mpol_to_str()
   expects a pointer to a struct mempolicy, so temporarily construct one.
   This will be replaced with a reference to a struct mempolicy in the tmpfs
   superblock in a subsequent patch.

   NOTE 1: I changed mpol_to_str() to use a colon ':' rather than an equal
   sign '=' as the nodemask delimiter to match mpol_parse_str() and the
   tmpfs/shmem mpol mount option formatting that now uses mpol_to_str().  This
   is a user visible change to numa_maps, but then the addition of the mode
   flags already changed the display.  It makes sense to me to have the mounts
   and numa_maps display the policy in the same format.  However, if anyone
   objects strongly, I can pass the desired nodemask delimeter as an arg to
   mpol_to_str().

   Note 2: Like show_numa_map(), I don't check the return code from
   mpol_to_str().  I do use a longer buffer than the one provided by
   show_numa_map(), which seems to have sufficed so far.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:24 -07:00
Lee Schermerhorn
fc36b8d3d8 mempolicy: use MPOL_F_LOCAL to Indicate Preferred Local Policy
Now that we're using "preferred local" policy for system default, we need to
make this as fast as possible.  Because of the variable size of the mempolicy
structure [based on size of nodemasks], the preferred_node may be in a
different cacheline from the mode.  This can result in accessing an extra
cacheline in the normal case of system default policy.  Suspect this is the
cause of an observed 2-3% slowdown in page fault testing relative to kernel
without this patch series.

To alleviate this, use an internal mode flag, MPOL_F_LOCAL in the mempolicy
flags member which is guaranteed [?] to be in the same cacheline as the mode
itself.

Verified that reworked mempolicy now performs slightly better on 25-rc8-mm1
for both anon and shmem segments with system default and vma [preferred local]
policy.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:24 -07:00
Lee Schermerhorn
52cd3b0740 mempolicy: rework mempolicy Reference Counting [yet again]
After further discussion with Christoph Lameter, it has become clear that my
earlier attempts to clean up the mempolicy reference counting were a bit of
overkill in some areas, resulting in superflous ref/unref in what are usually
fast paths.  In other areas, further inspection reveals that I botched the
unref for interleave policies.

A separate patch, suitable for upstream/stable trees, fixes up the known
errors in the previous attempt to fix reference counting.

This patch reworks the memory policy referencing counting and, one hopes,
simplifies the code.  Maybe I'll get it right this time.

See the update to the numa_memory_policy.txt document for a discussion of
memory policy reference counting that motivates this patch.

Summary:

Lookup of mempolicy, based on (vma, address) need only add a reference for
shared policy, and we need only unref the policy when finished for shared
policies.  So, this patch backs out all of the unneeded extra reference
counting added by my previous attempt.  It then unrefs only shared policies
when we're finished with them, using the mpol_cond_put() [conditional put]
helper function introduced by this patch.

Note that shmem_swapin() calls read_swap_cache_async() with a dummy vma
containing just the policy.  read_swap_cache_async() can call alloc_page_vma()
multiple times, so we can't let alloc_page_vma() unref the shared policy in
this case.  To avoid this, we make a copy of any non-null shared policy and
remove the MPOL_F_SHARED flag from the copy.  This copy occurs before reading
a page [or multiple pages] from swap, so the overhead should not be an issue
here.

I introduced a new static inline function "mpol_cond_copy()" to copy the
shared policy to an on-stack policy and remove the flags that would require a
conditional free.  The current implementation of mpol_cond_copy() assumes that
the struct mempolicy contains no pointers to dynamically allocated structures
that must be duplicated or reference counted during copy.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:24 -07:00
Lee Schermerhorn
a6020ed759 mempolicy: document {set|get}_policy() vm_ops APIs
Document mempolicy return value reference semantics assumed by the rest of the
mempolicy code for the set_ and get_policy vm_ops in <linux/mm.h>--where the
prototypes are defined--to inform any future mempolicy vm_op writers what the
rest of the subsystem expects of them.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:24 -07:00
Lee Schermerhorn
aab0b1029f mempolicy: mark shared policies for unref
As part of yet another rework of mempolicy reference counting, we want to be
able to identify shared policies efficiently, because they have an extra ref
taken on lookup that needs to be removed when we're finished using the policy.

  Note:  the extra ref is required because the policies are
  shared between tasks/processes and can be changed/freed
  by one task while another task is using them--e.g., for
  page allocation.

Building on David Rientjes mempolicy "mode flags" enhancement, this patch
indicates a "shared" policy by setting a new MPOL_F_SHARED flag in the flags
member of the struct mempolicy added by David.  MPOL_F_SHARED, and any future
"internal mode flags" are reserved from bit zero up, as they will never be
passed in the upper bits of the mode argument of a mempolicy API.

I set the MPOL_F_SHARED flag when the policy is installed in the shared policy
rb-tree.  Don't need/want to clear the flag when removing from the tree as the
mempolicy is freed [unref'd] internally to the sp_delete() function.  However,
a task could hold another reference on this mempolicy from a prior lookup.  We
need the MPOL_F_SHARED flag to stay put so that any tasks holding a ref will
unref, eventually freeing, the mempolicy.

A later patch in this series will introduce a function to conditionally unref
[mpol_free] a policy.  The MPOL_F_SHARED flag is one reason [currently the
only reason] to unref/free a policy via the conditional free.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:24 -07:00
Lee Schermerhorn
45c4745af3 mempolicy: rename struct mempolicy 'policy' member to 'mode'
The terms 'policy' and 'mode' are both used in various places to describe the
semantics of the value stored in the 'policy' member of struct mempolicy.
Furthermore, the term 'policy' is used to refer to that member, to the entire
struct mempolicy and to the more abstract concept of the tuple consisting of a
"mode" and an optional node or set of nodes.  Recently, we have added "mode
flags" that are passed in the upper bits of the 'mode' [or sometimes,
'policy'] member of the numa APIs.

I'd like to resolve this confusion, which perhaps only exists in my mind, by
renaming the 'policy' member to 'mode' throughout, and fixing up the
Documentation.  Man pages will be updated separately.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:24 -07:00
Lee Schermerhorn
846a16bf0f mempolicy: rename mpol_copy to mpol_dup
This patch renames mpol_copy() to mpol_dup() because, well, that's what it
does.  Like, e.g., strdup() for strings, mpol_dup() takes a pointer to an
existing mempolicy, allocates a new one and copies the contents.

In a later patch, I want to use the name mpol_copy() to copy the contents from
one mempolicy to another like, e.g., strcpy() does for strings.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:23 -07:00
Lee Schermerhorn
f0be3d32b0 mempolicy: rename mpol_free to mpol_put
This is a change that was requested some time ago by Mel Gorman.  Makes sense
to me, so here it is.

Note: I retain the name "mpol_free_shared_policy()" because it actually does
free the shared_policy, which is NOT a reference counted object.  However, ...

The mempolicy object[s] referenced by the shared_policy are reference counted,
so mpol_put() is used to release the reference held by the shared_policy.  The
mempolicy might not be freed at this time, because some task attached to the
shared object associated with the shared policy may be in the process of
allocating a page based on the mempolicy.  In that case, the task performing
the allocation will hold a reference on the mempolicy, obtained via
mpol_shared_policy_lookup().  The mempolicy will be freed when all tasks
holding such a reference have called mpol_put() for the mempolicy.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:23 -07:00
Adam Litke
3b11630063 Subject: [PATCH] hugetlb: vmstat events for huge page allocations
Allocating huge pages directly from the buddy allocator is not guaranteed to
succeed.  Success depends on several factors (such as the amount of physical
memory available and the level of fragmentation).  With the addition of
dynamic hugetlb pool resizing, allocations can occur much more frequently.
For these reasons it is desirable to keep track of huge page allocation
successes and failures.

Add two new vmstat entries to track huge page allocations that succeed and
fail.  The presence of the two entries is contingent upon CONFIG_HUGETLB_PAGE
being enabled.

[akpm@linux-foundation.org: reduced ifdeffery]
Signed-off-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Eric Munson <ebmunson@us.ibm.com>
Tested-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Andy Whitcroft <apw@shadowen.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:23 -07:00
Nick Piggin
a08cb629f5 s390: implement pte special bit
Convert XIP to support non-struct page backed memory, using VM_MIXEDMAP for
the user mappings.

This requires the get_xip_page API to be changed to an address based one.
Improve the API layering a little bit too, while we're here.

This is required in order to support XIP filesystems on memory that isn't
backed with struct page (but memory with struct page is still supported too).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:23 -07:00
Nick Piggin
70688e4dd1 xip: support non-struct page backed memory
Convert XIP to support non-struct page backed memory, using VM_MIXEDMAP for
the user mappings.

This requires the get_xip_page API to be changed to an address based one.
Improve the API layering a little bit too, while we're here.

This is required in order to support XIP filesystems on memory that isn't
backed with struct page (but memory with struct page is still supported too).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:23 -07:00
Jared Hulbert
30afcb4bd2 return pfn from direct_access, for XIP
Alter the block device ->direct_access() API to work with the new
get_xip_mem() API (that requires both kaddr and pfn are returned).

Some architectures will not do the right thing in their virt_to_page() for use
by XIP (to translate from the kernel virtual address returned by
direct_access(), to a user mappable pfn in XIP's page fault handler.

However, we can't switch it to just return the pfn and not the kaddr, because
we have no good way to get a kva from a pfn, and XIP requires the kva for its
read(2) and write(2) handlers.  So we have to return both.

Signed-off-by: Jared Hulbert <jaredeh@gmail.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-mm@kvack.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:23 -07:00
Nick Piggin
423bad6004 mm: add vm_insert_mixed
vm_insert_mixed will insert either a raw pfn or a refcounted struct page into
the page tables, depending on whether vm_normal_page() will return the page or
not.  With the introduction of the new pte bit, this is now a too tricky for
drivers to be doing themselves.

filemap_xip uses this in a subsequent patch.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:23 -07:00
Nick Piggin
7e675137a8 mm: introduce pte_special pte bit
s390 for one, cannot implement VM_MIXEDMAP with pfn_valid, due to their memory
model (which is more dynamic than most).  Instead, they had proposed to
implement it with an additional path through vm_normal_page(), using a bit in
the pte to determine whether or not the page should be refcounted:

vm_normal_page()
{
	...
        if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
                if (vma->vm_flags & VM_MIXEDMAP) {
#ifdef s390
			if (!mixedmap_refcount_pte(pte))
				return NULL;
#else
                        if (!pfn_valid(pfn))
                                return NULL;
#endif
                        goto out;
                }
	...
}

This is fine, however if we are allowed to use a bit in the pte to determine
refcountedness, we can use that to _completely_ replace all the vma based
schemes.  So instead of adding more cases to the already complex vma-based
scheme, we can have a clearly seperate and simple pte-based scheme (and get
slightly better code generation in the process):

vm_normal_page()
{
#ifdef s390
	if (!mixedmap_refcount_pte(pte))
		return NULL;
	return pte_page(pte);
#else
	...
#endif
}

And finally, we may rather make this concept usable by any architecture rather
than making it s390 only, so implement a new type of pte state for this.
Unfortunately the old vma based code must stay, because some architectures may
not be able to spare pte bits.  This makes vm_normal_page a little bit more
ugly than we would like, but the 2 cases are clearly seperate.

So introduce a pte_special pte state, and use it in mm/memory.c.  It is
currently a noop for all architectures, so this doesn't actually result in any
compiled code changes to mm/memory.o.

BTW:
I haven't put vm_normal_page() into arch code as-per an earlier suggestion.
The reason is that, regardless of where vm_normal_page is actually
implemented, the *abstraction* is still exactly the same. Also, while it
depends on whether the architecture has pte_special or not, that is the
only two possible cases, and it really isn't an arch specific function --
the role of the arch code should be to provide primitive functions and
accessors with which to build the core code; pte_special does that. We do
not want architectures to know or care about vm_normal_page itself, and
we definitely don't want them being able to invent something new there
out of sight of mm/ code. If we made vm_normal_page an arch function, then
we have to make vm_insert_mixed (next patch) an arch function too. So I
don't think moving it to arch code fundamentally improves any abstractions,
while it does practically make the code more difficult to follow, for both
mm and arch developers, and easier to misuse.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:23 -07:00
Jared Hulbert
b379d79019 mm: introduce VM_MIXEDMAP
This series introduces some important infrastructure work.  The overall result
is that:

1. We now support XIP backed filesystems using memory that have no
   struct page allocated to them. And patches 6 and 7 actually implement
   this for s390.

   This is pretty important in a number of cases. As far as I understand,
   in the case of virtualisation (eg. s390), each guest may mount a
   readonly copy of the same filesystem (eg. the distro). Currently,
   guests need to allocate struct pages for this image. So if you have
   100 guests, you already need to allocate more memory for the struct
   pages than the size of the image. I think. (Carsten?)

   For other (eg. embedded) systems, you may have a very large non-
   volatile filesystem. If you have to have struct pages for this, then
   your RAM consumption will go up proportionally to fs size. Even
   though it is just a small proportion, the RAM can be much more costly
   eg in terms of power, so every KB less that Linux uses makes it more
   attractive to a lot of these guys.

2. VM_MIXEDMAP allows us to support mappings where you actually do want
   to refcount _some_ pages in the mapping, but not others, and support
   COW on arbitrary (non-linear) mappings. Jared needs this for his NVRAM
   filesystem in progress. Future iterations of this filesystem will
   most likely want to migrate pages between pagecache and XIP backing,
   which is where the requirement for mixed (some refcounted, some not)
   comes from.

3. pte_special also has a peripheral usage that I need for my lockless
   get_user_pages patch. That was shown to speed up "oltp" on db2 by
   10% on a 2 socket system, which is kind of significant because they
   scrounge for months to try to find 0.1% improvement on these
   workloads. I'm hoping we might finally be faster than AIX on
   pSeries with this :). My reference to lockless get_user_pages is not
   meant to justify this patchset (which doesn't include lockless gup),
   but just to show that pte_special is not some s390 specific thing that
   should be hidden in arch code or xip code: I definitely want to use it
   on at least x86 and powerpc as well.

This patch:

Introduce a new type of mapping, VM_MIXEDMAP.  This is unlike VM_PFNMAP in
that it can support COW mappings of arbitrary ranges including ranges without
struct page *and* ranges with a struct page that we actually want to refcount
(PFNMAP can only support COW in those cases where the un-COW-ed translations
are mapped linearly in the virtual address, and can only support non
refcounted ranges).

VM_MIXEDMAP achieves this by refcounting all pfn_valid pages, and not
refcounting !pfn_valid pages (which is not an option for VM_PFNMAP, because it
needs to avoid refcounting pfn_valid pages eg.  for /dev/mem mappings).

Signed-off-by: Jared Hulbert <jaredeh@gmail.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:22 -07:00
Christoph Lameter
e20b8cca76 PAGEFLAGS_EXTENDED and separate page flags for Head and Tail
Having separate page flags for the head and the tail of a compound page allows
the compiler to use bitops instead of operations on a word to check for a tail
page.  That is f.e.  important for virt_to_head_page() which is used in
various critical code paths (kfree for example):

Code for PageTail(page)

Before:

 mov    (%rdi),%rdx		page->flags
 mov    %rdx,%rax		3 bytes
 and    $0x12000,%eax		5 bytes
 cmp    $0x12000,%rax		6 bytes
 je     897 <kfree+0xa7>

After:

 mov    (%rdi),%rax
 test   $0x40,%ah			(3 bytes)
 jne    887 <kfree+0x97>

So we go from 14 bytes to 3 bytes and from 3 instructions to one.  From the
use of 2 registers we go to none.

We can only use page flags for this if we have page flags available.  This
patch introduces CONFIG_PAGEFLAGS_EXTENDED that is set if pageflags are not
scarce due to SPARSEMEM using page flags for its sectionid on 32 bit NUMA
platforms.

Additional page flag definitions can be added to the CONFIG_PAGEFLAGS_EXTENDED
section in page-flags.h if the functionality depends on PAGEFLAGS_EXTENDED or
if more page flag overlapping tricks are used for the !PAGEFLAGS_EXTENDED
fallback (the upcoming virtual compound patch may hook in here and Rik's/Lee's
additional page flags to solve the reclaim issues could also be added there
[hint...  hint...  where are these patchsets?]).

Avoiding the overlaying of Pg_reclaim also clears the way for possible use of
compound pages for the pagecache or on the LRU.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:22 -07:00
Christoph Lameter
97965478a6 mm: Get rid of __ZONE_COUNT
It was used to compensate because MAX_NR_ZONES was not available to the
#ifdefs.  Export MAX_NR_ZONES via the new mechanism and get rid of
__ZONE_COUNT.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:22 -07:00
Christoph Lameter
ec7cade8c1 page flags: add PAGEFLAGS_FALSE for flags that are always false
Turns out that there are a number of times that a flag is simply always
returning 0.  Define a macro for that.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:22 -07:00
Christoph Lameter
602c4d112f page flags: handle PG_uncached like all other flags
Remove the special setup for PG_uncached and simply make it part of the enum.
The page flag will only be allocated when the kernel build includes the
uncached allocator.

Acked-by: Dean Nelson <dcn@sgi.com>
Cc: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:22 -07:00
Christoph Lameter
0a128b2b1a pageflags: eliminate PG_xxx aliases
Remove aliases of PG_xxx.  We can easily drop those now and alias by
specifying the PG_xxx flag in the macro that generates the functions.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:22 -07:00
Christoph Lameter
d60cd46bbd pageflags: use proper page flag functions in Xen
Xen uses bitops to manipulate page flags.  Make it use proper page flag
functions.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:22 -07:00
Christoph Lameter
6a1e7f777f pageflags: convert to the use of new macros
Replace explicit definitions of page flags through the use of macros.
Significantly reduces the size of the definitions and removes a lot of
opportunity for errors.  Additonal page flags can typically be generated with
a single line.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:22 -07:00
Christoph Lameter
f94a62e910 pageflags: introduce macros to generate page flag functions
Introduce a set of macros that generate functions to handle page flags.

A page flag function group typically starts with either

	SETPAGEFLAG(<part of function name>,<part of PG_ flagname>)

to create a set of page flag operations that are atomic. Or

	__SETPAGEFLAG(<part of function name>,<part of PG_ flagname)

to create a set of page flag operations that are not atomic.

Then additional operations can be added using the following macros

TESTSCFLAG		Create additional atomic test-and-set and
			test-and-clear functions

TESTSETFLAG		Create additional test and set function
TESTCLEARFLAG		Create additional test and clear function
SETPAGEFLAG		Create additional atomic set function
CLEARPAGEFLAG		Create additional atomic clear function
__TESTPAGEFLAG		Create additional non atomic set function
__SETPAGEFLAG		Create additional non atomic clear function

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:22 -07:00
Christoph Lameter
9223b4190f pageflags: get rid of FLAGS_RESERVED
NR_PAGEFLAGS specifies the number of page flags we are using.  From that we
can calculate the number of bits leftover that can be used for zone, node (and
maybe the sections id).  There is no need anymore for FLAGS_RESERVED if we use
NR_PAGEFLAGS.

Use the new methods to make NR_PAGEFLAGS available via the preprocessor.
NR_PAGEFLAGS is used to calculate field boundaries in the page flags fields.
These field widths have to be available to the preprocessor.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: David Miller <davem@davemloft.net>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:21 -07:00
Christoph Lameter
e268318149 pageflags: use an enum for the flags
Use an enum to ease the maintenance of page flags.  This is going to change
the numbering from 0 to 18.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:21 -07:00
Andrew Morton
726b801272 page_mapping(): add ifdef around reference to swapper_space
This fixes the superh build when the pageflags patches are applied.

But it shouldn't unless it's a gcc bug.

Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:21 -07:00
Christoph Lameter
308c05e35e sparsemem: vmemmap does not need section bits
A set of patches that attempts to improve page flag handling.  First of all a
method is introduced to generate the page flag functions using macros.  Then
the number of page flags used by sparsemem is reduced.  All page flag
operations will no longer be macros.  All flags will use inline function.

Then we add a way to export enum constants to the preprocessor which allows us
to get rid of __ZONE_COUNT and use the NR_PAGEFLAGS for the dynamic
calculation of actually available page flags for fields.

This patch:

Sparsemem vmemmap does not need any section bits.  This patch has the effect
of reducing the number of bits used in page->flags by at least 6.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:21 -07:00
Christoph Lameter
2301696932 vmallocinfo: add caller information
Add caller information so that /proc/vmallocinfo shows where the allocation
request for a slice of vmalloc memory originated.

Results in output like this:

0xffffc20000000000-0xffffc20000801000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages
0xffffc20000801000-0xffffc20000806000   20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc
0xffffc20000806000-0xffffc20000c07000 4198400 alloc_large_system_hash+0x127/0x246 pages=1024 vmalloc vpages
0xffffc20000c07000-0xffffc20000c0a000   12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc
0xffffc20000c0a000-0xffffc20000c0c000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c0c000-0xffffc20000c0f000   12288 acpi_os_map_memory+0x13/0x1c phys=cff64000 ioremap
0xffffc20000c10000-0xffffc20000c15000   20480 acpi_os_map_memory+0x13/0x1c phys=cff65000 ioremap
0xffffc20000c16000-0xffffc20000c18000    8192 acpi_os_map_memory+0x13/0x1c phys=cff69000 ioremap
0xffffc20000c18000-0xffffc20000c1a000    8192 acpi_os_map_memory+0x13/0x1c phys=fed1f000 ioremap
0xffffc20000c1a000-0xffffc20000c1c000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c1c000-0xffffc20000c1e000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c1e000-0xffffc20000c20000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c20000-0xffffc20000c22000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c22000-0xffffc20000c24000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c24000-0xffffc20000c26000    8192 acpi_os_map_memory+0x13/0x1c phys=e0081000 ioremap
0xffffc20000c26000-0xffffc20000c28000    8192 acpi_os_map_memory+0x13/0x1c phys=e0080000 ioremap
0xffffc20000c28000-0xffffc20000c2d000   20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc
0xffffc20000c2d000-0xffffc20000c31000   16384 tcp_init+0xd5/0x31c pages=3 vmalloc
0xffffc20000c31000-0xffffc20000c34000   12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc
0xffffc20000c34000-0xffffc20000c36000    8192 init_vdso_vars+0xde/0x1f1
0xffffc20000c36000-0xffffc20000c38000    8192 pci_iomap+0x8a/0xb4 phys=d8e00000 ioremap
0xffffc20000c38000-0xffffc20000c3a000    8192 usb_hcd_pci_probe+0x139/0x295 [usbcore] phys=d8e00000 ioremap
0xffffc20000c3a000-0xffffc20000c3e000   16384 sys_swapon+0x509/0xa15 pages=3 vmalloc
0xffffc20000c40000-0xffffc20000c61000  135168 e1000_probe+0x1c4/0xa32 phys=d8a20000 ioremap
0xffffc20000c61000-0xffffc20000c6a000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20000c6a000-0xffffc20000c73000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20000c73000-0xffffc20000c7c000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20000c7c000-0xffffc20000c7f000   12288 e1000e_setup_tx_resources+0x29/0xbe pages=2 vmalloc
0xffffc20000c80000-0xffffc20001481000 8392704 pci_mmcfg_arch_init+0x90/0x118 phys=e0000000 ioremap
0xffffc20001481000-0xffffc20001682000 2101248 alloc_large_system_hash+0x127/0x246 pages=512 vmalloc
0xffffc20001682000-0xffffc20001e83000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages
0xffffc20001e83000-0xffffc20002204000 3674112 alloc_large_system_hash+0x127/0x246 pages=896 vmalloc vpages
0xffffc20002204000-0xffffc2000220d000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc2000220d000-0xffffc20002216000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20002216000-0xffffc2000221f000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc2000221f000-0xffffc20002228000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20002228000-0xffffc20002231000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20002231000-0xffffc20002234000   12288 e1000e_setup_rx_resources+0x35/0x122 pages=2 vmalloc
0xffffc20002240000-0xffffc20002261000  135168 e1000_probe+0x1c4/0xa32 phys=d8a60000 ioremap
0xffffc20002261000-0xffffc2000270c000 4894720 sys_swapon+0x509/0xa15 pages=1194 vmalloc vpages
0xffffffffa0000000-0xffffffffa0022000  139264 module_alloc+0x4f/0x55 pages=33 vmalloc
0xffffffffa0022000-0xffffffffa0029000   28672 module_alloc+0x4f/0x55 pages=6 vmalloc
0xffffffffa002b000-0xffffffffa0034000   36864 module_alloc+0x4f/0x55 pages=8 vmalloc
0xffffffffa0034000-0xffffffffa003d000   36864 module_alloc+0x4f/0x55 pages=8 vmalloc
0xffffffffa003d000-0xffffffffa0049000   49152 module_alloc+0x4f/0x55 pages=11 vmalloc
0xffffffffa0049000-0xffffffffa0050000   28672 module_alloc+0x4f/0x55 pages=6 vmalloc

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:21 -07:00
Christoph Lameter
a10aa57987 vmalloc: show vmalloced areas via /proc/vmallocinfo
Implement a new proc file that allows the display of the currently allocated
vmalloc memory.

It allows to see the users of vmalloc.  That is important if vmalloc space is
scarce (i386 for example).

And it's going to be important for the compound page fallback to vmalloc.
Many of the current users can be switched to use compound pages with fallback.
 This means that the number of users of vmalloc is reduced and page tables no
longer necessary to access the memory.  /proc/vmallocinfo allows to review how
that reduction occurs.

If memory becomes fragmented and larger order allocations are no longer
possible then /proc/vmallocinfo allows to see which compound page allocations
fell back to virtual compound pages.  That is important for new users of
virtual compound pages.  Such as order 1 stack allocation etc that may
fallback to virtual compound pages in the future.

/proc/vmallocinfo permissions are made readable-only-by-root to avoid possible
information leakage.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: CONFIG_MMU=n build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:21 -07:00
Andrew Morton
b454456841 mm: make early_pfn_to_nid() a C function
Fix this (sparc64)

mm/sparse-vmemmap.c: In function `vmemmap_verify':
mm/sparse-vmemmap.c:64: warning: unused variable `pfn'

by switching to a C function which touches its arg.

(reason 3,555 why macros are bad)

Also, the `nid' arg was misnamed.

Reviewed-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:20 -07:00
Miklos Szeredi
ac6aadb24b mm: rotate_reclaimable_page() cleanup
Clean up messy conditional calling of test_clear_page_writeback() from both
rotate_reclaimable_page() and end_page_writeback().

The only user of rotate_reclaimable_page() is end_page_writeback() so this is
OK.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:20 -07:00
Andi Kleen
7edf85aa3c mm: save some bytes in mm_struct by filling holes on 64bit
Save some bytes in mm_struct by filling holes

Putting int values together for better packing on 64bit shrinks sizeof(struct
mm_struct) from 776 bytes to 764 bytes.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:20 -07:00
David Rientjes
3842b46de6 mempolicy: small header file cleanup
Removes forward definition of vm_area_struct in linux/mempolicy.h.  We already
get it from the linux/slab.h -> linux/gfp.h include.

Removes the unused mpol_set_vma_default() macro from linux/mempolicy.h.

Removes the extern definition of default_policy since it is only referenced,
as it should be, in mm/mempolicy.c.

Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:20 -07:00
David Rientjes
4c50bc0116 mempolicy: add MPOL_F_RELATIVE_NODES flag
Adds another optional mode flag, MPOL_F_RELATIVE_NODES, that specifies
nodemasks passed via set_mempolicy() or mbind() should be considered relative
to the current task's mems_allowed.

When the mempolicy is created, the passed nodemask is folded and mapped onto
the current task's mems_allowed.  For example, consider a task using
set_mempolicy() to pass MPOL_INTERLEAVE | MPOL_F_RELATIVE_NODES with a
nodemask of 1-3.  If current's mems_allowed is 4-7, the effected nodemask is
5-7 (the second, third, and fourth node of mems_allowed).

If the same task is attached to a cpuset, the mempolicy nodemask is rebound
each time the mems are changed.  Some possible rebinds and results are:

	mems			result
	1-3			1-3
	1-7			2-4
	1,5-6			1,5-6
	1,5-7			5-7

Likewise, the zonelist built for MPOL_BIND acts on the set of zones assigned
to the resultant nodemask from the relative remap.

In the MPOL_PREFERRED case, the preferred node is remapped from the currently
effected nodemask to the relative nodemask.

This mempolicy mode flag was conceived of by Paul Jackson <pj@sgi.com>.

Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:19 -07:00
Paul Jackson
7ea931c9fc mempolicy: add bitmap_onto() and bitmap_fold() operations
The following adds two more bitmap operators, bitmap_onto() and bitmap_fold(),
with the usual cpumask and nodemask wrappers.

The bitmap_onto() operator computes one bitmap relative to another.  If the
n-th bit in the origin mask is set, then the m-th bit of the destination mask
will be set, where m is the position of the n-th set bit in the relative mask.

The bitmap_fold() operator folds a bitmap into a second that has bit m set iff
the input bitmap has some bit n set, where m == n mod sz, for the specified sz
value.

There are two substantive changes between this patch and its
predecessor bitmap_relative:
 1) Renamed bitmap_relative() to be bitmap_onto().
 2) Added bitmap_fold().

The essential motivation for bitmap_onto() is to provide a mechanism for
converting a cpuset-relative CPU or Node mask to an absolute mask.  Cpuset
relative masks are written as if the current task were in a cpuset whose CPUs
or Nodes were just the consecutive ones numbered 0..N-1, for some N.  The
bitmap_onto() operator is provided in anticipation of adding support for the
first such cpuset relative mask, by the mbind() and set_mempolicy() system
calls, using a planned flag of MPOL_F_RELATIVE_NODES.  These bitmap operators
(and their nodemask wrappers, in particular) will be used in code that
converts the user specified cpuset relative memory policy to a specific system
node numbered policy, given the current mems_allowed of the tasks cpuset.

Such cpuset relative mempolicies will address two deficiencies
of the existing interface between cpusets and mempolicies:
 1) A task cannot at present reliably establish a cpuset
    relative mempolicy because there is an essential race
    condition, in that the tasks cpuset may be changed in
    between the time the task can query its cpuset placement,
    and the time the task can issue the applicable mbind or
    set_memplicy system call.
 2) A task cannot at present establish what cpuset relative
    mempolicy it would like to have, if it is in a smaller
    cpuset than it might have mempolicy preferences for,
    because the existing interface only allows specifying
    mempolicies for nodes currently allowed by the cpuset.

Cpuset relative mempolicies are useful for tasks that don't distinguish
particularly between one CPU or Node and another, but only between how many of
each are allowed, and the proper placement of threads and memory pages on the
various CPUs and Nodes available.

The motivation for the added bitmap_fold() can be seen in the following
example.

Let's say an application has specified some mempolicies that presume 16 memory
nodes, including say a mempolicy that specified MPOL_F_RELATIVE_NODES (cpuset
relative) nodes 12-15.  Then lets say that application is crammed into a
cpuset that only has 8 memory nodes, 0-7.  If one just uses bitmap_onto(),
this mempolicy, mapped to that cpuset, would ignore the requested relative
nodes above 7, leaving it empty of nodes.  That's not good; better to fold the
higher nodes down, so that some nodes are included in the resulting mapped
mempolicy.  In this case, the mempolicy nodes 12-15 are taken modulo 8 (the
weight of the mems_allowed of the confining cpuset), resulting in a mempolicy
specifying nodes 4-7.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: <kosaki.motohiro@jp.fujitsu.com>
Cc: <ray-lk@madrabbit.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:19 -07:00
David Rientjes
f5b087b52f mempolicy: add MPOL_F_STATIC_NODES flag
Add an optional mempolicy mode flag, MPOL_F_STATIC_NODES, that suppresses the
node remap when the policy is rebound.

Adds another member to struct mempolicy, nodemask_t user_nodemask, as part of
a union with cpuset_mems_allowed:

	struct mempolicy {
		...
		union {
			nodemask_t cpuset_mems_allowed;
			nodemask_t user_nodemask;
		} w;
	}

that stores the the nodemask that the user passed when he or she created the
mempolicy via set_mempolicy() or mbind().  When using MPOL_F_STATIC_NODES,
which is passed with any mempolicy mode, the user's passed nodemask
intersected with the VMA or task's allowed nodes is always used when
determining the preferred node, setting the MPOL_BIND zonelist, or creating
the interleave nodemask.  This happens whenever the policy is rebound,
including when a task's cpuset assignment changes or the cpuset's mems are
changed.

This creates an interesting side-effect in that it allows the mempolicy
"intent" to lie dormant and uneffected until it has access to the node(s) that
it desires.  For example, if you currently ask for an interleaved policy over
a set of nodes that you do not have access to, the mempolicy is not created
and the task continues to use the previous policy.  With this change, however,
it is possible to create the same mempolicy; it is only effected when access
to nodes in the nodemask is acquired.

It is also possible to mount tmpfs with the static nodemask behavior when
specifying a node or nodemask.  To do this, simply add "=static" immediately
following the mempolicy mode at mount time:

	mount -o remount mpol=interleave=static:1-3

Also removes mpol_check_policy() and folds its logic into mpol_new() since it
is now obsoleted.  The unused vma_mpol_equal() is also removed.

Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:19 -07:00
David Rientjes
028fec414d mempolicy: support optional mode flags
With the evolution of mempolicies, it is necessary to support mempolicy mode
flags that specify how the policy shall behave in certain circumstances.  The
most immediate need for mode flag support is to suppress remapping the
nodemask of a policy at the time of rebind.

Both the mempolicy mode and flags are passed by the user in the 'int policy'
formal of either the set_mempolicy() or mbind() syscall.  A new constant,
MPOL_MODE_FLAGS, represents the union of legal optional flags that may be
passed as part of this int.  Mempolicies that include illegal flags as part of
their policy are rejected as invalid.

An additional member to struct mempolicy is added to support the mode flags:

	struct mempolicy {
		...
		unsigned short policy;
		unsigned short flags;
	}

The splitting of the 'int' actual passed by the user is done in
sys_set_mempolicy() and sys_mbind() for their respective syscalls.  This is
done by intersecting the actual with MPOL_MODE_FLAGS, rejecting the syscall of
there are additional flags, and storing it in the new 'flags' member of struct
mempolicy.  The intersection of the actual with ~MPOL_MODE_FLAGS is stored in
the 'policy' member of the struct and all current users of pol->policy remain
unchanged.

The union of the policy mode and optional mode flags is passed back to the
user in get_mempolicy().

This combination of mode and flags within the same actual does not break
userspace code that relies on get_mempolicy(&policy, ...) and either

	switch (policy) {
	case MPOL_BIND:
		...
	case MPOL_INTERLEAVE:
		...
	};

statements or

	if (policy == MPOL_INTERLEAVE) {
		...
	}

statements.  Such applications would need to use optional mode flags when
calling set_mempolicy() or mbind() for these previously implemented statements
to stop working.  If an application does start using optional mode flags, it
will need to mask the optional flags off the policy in switch and conditional
statements that only test mode.

An additional member is also added to struct shmem_sb_info to store the
optional mode flags.

[hugh@veritas.com: shmem mpol: fix build warning]
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:19 -07:00
David Rientjes
a3b51e0142 mempolicy: convert MPOL constants to enum
The mempolicy mode constants, MPOL_DEFAULT, MPOL_PREFERRED, MPOL_BIND, and
MPOL_INTERLEAVE, are better declared as part of an enum since they are
sequentially numbered and cannot be combined.

The policy member of struct mempolicy is also converted from type short to
type unsigned short.  A negative policy does not have any legitimate meaning,
so it is possible to change its type in preparation for adding optional mode
flags later.

The equivalent member of struct shmem_sb_info is also changed from int to
unsigned short.

For compatibility, the policy formal to get_mempolicy() remains as a pointer
to an int:

	int get_mempolicy(int *policy, unsigned long *nmask,
			  unsigned long maxnode, unsigned long addr,
			  unsigned long flags);

although the only possible values is the range of type unsigned short.

Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:19 -07:00
Pekka Enberg
1b27d05b6e mm: move cache_line_size() to <linux/cache.h>
Not all architectures define cache_line_size() so as suggested by Andrew move
the private implementations in mm/slab.c and mm/slob.c to <linux/cache.h>.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:19 -07:00
Mel Gorman
19770b3260 mm: filter based on a nodemask as well as a gfp_mask
The MPOL_BIND policy creates a zonelist that is used for allocations
controlled by that mempolicy.  As the per-node zonelist is already being
filtered based on a zone id, this patch adds a version of __alloc_pages() that
takes a nodemask for further filtering.  This eliminates the need for
MPOL_BIND to create a custom zonelist.

A positive benefit of this is that allocations using MPOL_BIND now use the
local node's distance-ordered zonelist instead of a custom node-id-ordered
zonelist.  I.e., pages will be allocated from the closest allowed node with
available memory.

[Lee.Schermerhorn@hp.com: Mempolicy: update stale documentation and comments]
[Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask]
[Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask rework]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:19 -07:00
Mel Gorman
dd1a239f6f mm: have zonelist contains structs with both a zone pointer and zone_idx
Filtering zonelists requires very frequent use of zone_idx().  This is costly
as it involves a lookup of another structure and a substraction operation.  As
the zone_idx is often required, it should be quickly accessible.  The node idx
could also be stored here if it was found that accessing zone->node is
significant which may be the case on workloads where nodemasks are heavily
used.

This patch introduces a struct zoneref to store a zone pointer and a zone
index.  The zonelist then consists of an array of these struct zonerefs which
are looked up as necessary.  Helpers are given for accessing the zone index as
well as the node index.

[kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers]
[hugh@veritas.com: mm-have-zonelist: fix memcg ooms]
[hugh@veritas.com: just return do_try_to_free_pages]
[hugh@veritas.com: do_try_to_free_pages gfp_mask redundant]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Christoph Lameter <clameter@sgi.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:18 -07:00
Mel Gorman
54a6eb5c47 mm: use two zonelist that are filtered by GFP mask
Currently a node has two sets of zonelists, one for each zone type in the
system and a second set for GFP_THISNODE allocations.  Based on the zones
allowed by a gfp mask, one of these zonelists is selected.  All of these
zonelists consume memory and occupy cache lines.

This patch replaces the multiple zonelists per-node with two zonelists.  The
first contains all populated zones in the system, ordered by distance, for
fallback allocations when the target/preferred node has no free pages.  The
second contains all populated zones in the node suitable for GFP_THISNODE
allocations.

An iterator macro is introduced called for_each_zone_zonelist() that interates
through each zone allowed by the GFP flags in the selected zonelist.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:18 -07:00
Mel Gorman
18ea7e710d mm: remember what the preferred zone is for zone_statistics
On NUMA, zone_statistics() is used to record events like numa hit, miss and
foreign.  It assumes that the first zone in a zonelist is the preferred zone.
When multiple zonelists are replaced by one that is filtered, this is no
longer the case.

This patch records what the preferred zone is rather than assuming the first
zone in the zonelist is it.  This simplifies the reading of later patches in
this set.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:18 -07:00
Mel Gorman
0e88460da6 mm: introduce node_zonelist() for accessing the zonelist for a GFP mask
Introduce a node_zonelist() helper function.  It is used to lookup the
appropriate zonelist given a node and a GFP mask.  The patch on its own is a
cleanup but it helps clarify parts of the two-zonelist-per-node patchset.  If
necessary, it can be merged with the next patch in this set without problems.

Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:18 -07:00
Mel Gorman
dac1d27bc8 mm: use zonelists instead of zones when direct reclaiming pages
The following patches replace multiple zonelists per node with two zonelists
that are filtered based on the GFP flags.  The patches as a set fix a bug with
regard to the use of MPOL_BIND and ZONE_MOVABLE.  With this patchset, the
MPOL_BIND will apply to the two highest zones when the highest zone is
ZONE_MOVABLE.  This should be considered as an alternative fix for the
MPOL_BIND+ZONE_MOVABLE in 2.6.23 to the previously discussed hack that filters
only custom zonelists.

The first patch cleans up an inconsistency where direct reclaim uses
zonelist->zones where other places use zonelist.

The second patch introduces a helper function node_zonelist() for looking up
the appropriate zonelist for a GFP mask which simplifies patches later in the
set.

The third patch defines/remembers the "preferred zone" for numa statistics, as
it is no longer always the first zone in a zonelist.

The forth patch replaces multiple zonelists with two zonelists that are
filtered.  The two zonelists are due to the fact that the memoryless patchset
introduces a second set of zonelists for __GFP_THISNODE.

The fifth patch introduces helper macros for retrieving the zone and node
indices of entries in a zonelist.

The final patch introduces filtering of the zonelists based on a nodemask.
Two zonelists exist per node, one for normal allocations and one for
__GFP_THISNODE.

Performance results varied depending on the machine configuration.  In real
workloads the gain/loss will depend on how much the userspace portion of the
benchmark benefits from having more cache available due to reduced referencing
of zonelists.

These are the range of performance losses/gains when running against
2.6.24-rc4-mm1.  The set and these machines are a mix of i386, x86_64 and
ppc64 both NUMA and non-NUMA.
			     loss   to  gain
Total CPU time on Kernbench: -0.86% to  1.13%
Elapsed   time on Kernbench: -0.79% to  0.76%
page_test from aim9:         -4.37% to  0.79%
brk_test  from aim9:         -0.71% to  4.07%
fork_test from aim9:         -1.84% to  4.60%
exec_test from aim9:         -0.71% to  1.08%

This patch:

The allocator deals with zonelists which indicate the order in which zones
should be targeted for an allocation.  Similarly, direct reclaim of pages
iterates over an array of zones.  For consistency, this patch converts direct
reclaim to use a zonelist.  No functionality is changed by this patch.  This
simplifies zonelist iterators in the next patch.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:18 -07:00
Nick Piggin
3c18ddd160 mm: remove nopage
Nothing in the tree uses nopage any more.  Remove support for it in the
core mm code and documentation (and a few stray references to it in
comments).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:18 -07:00
Harvey Harrison
ddc81ed2c5 remove sparse warning for mmzone.h
include/linux/mmzone.h:640:22: warning: potentially expensive pointer subtraction

Calculate the offset into the node_zones array rather than the index
using casts to (char *) and comparing against the index * sizeof(struct zone).

On X86_32 this saves a sar, but code size increases by one byte per
is_highmem() use due to 32-bit cmps rather than 16 bit cmps.

Before:
 207:   2b 80 8c 07 00 00       sub    0x78c(%eax),%eax
 20d:   c1 f8 0b                sar    $0xb,%eax
 210:   83 f8 02                cmp    $0x2,%eax
 213:   74 16                   je     22b <kmap_atomic_prot+0x144>
 215:   83 f8 03                cmp    $0x3,%eax
 218:   0f 85 8f 00 00 00       jne    2ad <kmap_atomic_prot+0x1c6>
 21e:   83 3d 00 00 00 00 02    cmpl   $0x2,0x0
 225:   0f 85 82 00 00 00       jne    2ad <kmap_atomic_prot+0x1c6>
 22b:   64 a1 00 00 00 00       mov    %fs:0x0,%eax

After:
 207:   2b 80 8c 07 00 00       sub    0x78c(%eax),%eax
 20d:   3d 00 10 00 00          cmp    $0x1000,%eax
 212:   74 18                   je     22c <kmap_atomic_prot+0x145>
 214:   3d 00 18 00 00          cmp    $0x1800,%eax
 219:   0f 85 8f 00 00 00       jne    2ae <kmap_atomic_prot+0x1c7>
 21f:   83 3d 00 00 00 00 02    cmpl   $0x2,0x0
 226:   0f 85 82 00 00 00       jne    2ae <kmap_atomic_prot+0x1c7>
 22c:   64 a1 00 00 00 00       mov    %fs:0x0,%eax

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:17 -07:00
Christoph Lameter
488514d179 Remove set_migrateflags()
Migrate flags must be set on slab creation as agreed upon when the antifrag
logic was reviewed.  Otherwise some slabs of a slabcache will end up in the
unmovable and others in the reclaimable section depending on which flag was
active when a new slab page was allocated.

This likely slid in somehow when antifrag was merged. Remove it.

The buffer_heads are always allocated with __GFP_RECLAIMABLE because the
SLAB_RECLAIM_ACCOUNT option is set.  The set_migrateflags() never had any
effect there.

Radix tree allocations are not directly reclaimable but they are allocated
with __GFP_RECLAIMABLE set on each allocation.  We now set
SLAB_RECLAIM_ACCOUNT on radix tree slab creation making sure that radix
tree slabs are consistently placed in the reclaimable section.  Radix tree
slabs will also be accounted as such.

There is then no user left of set_migratepages. So remove it.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:17 -07:00
Badari Pulavarty
ea01ea937d hotplug memory remove: generic __remove_pages() support
Generic helper function to remove section mappings and sysfs entries for the
section of the memory we are removing.  offline_pages() correctly adjusted
zone and marked the pages reserved.

TODO: Yasunori Goto is working on patches to free up allocations from bootmem.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:17 -07:00
David S. Miller
ceb4e8e44b sparc64: Kill PIL_RESERVED, unused.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-28 04:03:06 -07:00
Eric Paris
7b41b1733c SELinux: include/security.h whitespace, syntax, and other cleanups
This patch changes include/security.h to fix whitespace and syntax issues.  Things that
are fixed may include (does not not have to include)

whitespace at end of lines
spaces followed by tabs
spaces used instead of tabs
spacing around parenthesis
location of { around structs and else clauses
location of * in pointer declarations
removal of initialization of static data to keep it in the right section
useless {} in if statemetns
useless checking for NULL before kfree
fixing of the indentation depth of switch statements
no assignments in if statements
include spaces around , in function calls
and any number of other things I forgot to mention

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-04-28 09:29:08 +10:00
David S. Miller
90888816ba sparc64: Clean up handling of pt_regs trap type encoding.
If we use this from more than one place, it's better to
have helpers instead of twiddling magic constants all
over.

Add pt_regs_trap_type(), pt_regs_clear_trap_type(), and
pt_regs_is_syscall().

Use them in do_signal().

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-27 14:52:51 -07:00
David L Stevens
dae5029548 ipv4/ipv6 compat: Fix SSM applications on 64bit kernels.
Add support on 64-bit kernels for seting 32-bit compatible MCAST*
socket options.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-27 14:26:53 -07:00
Linus Torvalds
064922a805 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (40 commits)
  [SCSI] jazz_esp, sgiwd93, sni_53c710, sun3x_esp: fix platform driver hotplug/coldplug
  [SCSI] aic7xxx: add const
  [SCSI] aic7xxx: add static
  [SCSI] aic7xxx: Update _shipped files
  [SCSI] aic7xxx: teach aicasm to not emit unused debug code/data
  [SCSI] qla2xxx: Update version number to 8.02.01-k2.
  [SCSI] qla2xxx: Correct regression in relogin code.
  [SCSI] qla2xxx: Correct misc. endian and byte-ordering issues.
  [SCSI] qla2xxx: make qla2x00_issue_iocb_timeout() static
  [SCSI] qla2xxx: qla_os.c, make 2 functions static
  [SCSI] qla2xxx: Re-register FDMI information after a LIP.
  [SCSI] qla2xxx: Correct SRB usage-after-completion/free issues.
  [SCSI] qla2xxx: Correct ISP84XX verify-chip response handling.
  [SCSI] qla2xxx: Wakeup DPC thread to process any deferred-work requests.
  [SCSI] qla2xxx: Collapse RISC-RAM retrieval code during a firmware-dump.
  [SCSI] m68k: new mac_esp scsi driver
  [SCSI] zfcp: Add some statistics provided by the FCP adapter to the sysfs
  [SCSI] zfcp: Print some messages only during ERP
  [SCSI] zfcp: Wait for free SBAL during exchange config
  [SCSI] scsi_transport_fc: fc_user_scan correction
  ...
2008-04-27 11:25:00 -07:00
Linus Torvalds
42cadc8600 Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (147 commits)
  KVM: kill file->f_count abuse in kvm
  KVM: MMU: kvm_pv_mmu_op should not take mmap_sem
  KVM: SVM: remove selective CR0 comment
  KVM: SVM: remove now obsolete FIXME comment
  KVM: SVM: disable CR8 intercept when tpr is not masking interrupts
  KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabled
  KVM: export kvm_lapic_set_tpr() to modules
  KVM: SVM: sync TPR value to V_TPR field in the VMCB
  KVM: ppc: PowerPC 440 KVM implementation
  KVM: Add MAINTAINERS entry for PowerPC KVM
  KVM: ppc: Add DCR access information to struct kvm_run
  ppc: Export tlb_44x_hwater for KVM
  KVM: Rename debugfs_dir to kvm_debugfs_dir
  KVM: x86 emulator: fix lea to really get the effective address
  KVM: x86 emulator: fix smsw and lmsw with a memory operand
  KVM: x86 emulator: initialize src.val and dst.val for register operands
  KVM: SVM: force a new asid when initializing the vmcb
  KVM: fix kvm_vcpu_kick vs __vcpu_run race
  KVM: add ioctls to save/store mpstate
  KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*
  ...
2008-04-27 10:13:52 -07:00