Commit Graph

5728 Commits

Author SHA1 Message Date
Rafael J. Wysocki
f577eb30af [PATCH] swsusp: low level interface
Introduce the low level interface that can be used for handling the
snapshot of the system memory by the in-kernel swap-writing/reading code of
swsusp and the userland interface code (to be introduced shortly).

Also change the way in which swsusp records the allocated swap pages and,
consequently, simplifies the in-kernel swap-writing/reading code (this is
necessary for the userland interface too).  To this end, it introduces two
helper functions in mm/swapfile.c, so that the swsusp code does not refer
directly to the swap internals.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:07 -08:00
Markus Gutschke
aeefc956d5 [PATCH] x86: Make _syscallX() macros compile in PIC mode
Gcc reserves %ebx when compiling position-independent-code on i386.  This
means, the _syscallX() macros in include/asm-i386/unistd.h will not
compile.  This patch is changes the existing macros to take special care to
preserve %ebx.

The bug can be tracked at http://bugzilla.kernel.org/show_bug.cgi?id=6204

Signed-off-by: Markus Gutschke <markus@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:07 -08:00
Chuck Ebbert
42c059e04d [PATCH] i386 spinlocks: disable interrupts only if we enabled them
_raw_spin_lock_flags() is entered with interrupts disabled.  If it cannot
obtain a spinlock, it checks the flags that were passed and re-enables
interrupts before spinning if that's how the flags are set.  When the
spinlock might be available, it disables interrupts (even if they are
already disabled) before trying to get the lock.  Change that so interrupts
are only disabled if they have been enabled.  This costs nine bytes of
duplicated spinloop code.

Fastpath before patch:
        jle <keep looping>      not-taken conditional jump
        cli                     disable interrupts
        jmp <try for lock>      unconditional jump

Fastpath after patch, if interrupts were not enabled:
        jg <try for lock>       taken conditional branch

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:06 -08:00
Jesper Juhl
52f4a91afd [PATCH] Fix the imlicit declaration of mtrr_centaur_report_mcr in arch/i386/kernel/cpu/centaur.c
arch/i386/kernel/cpu/centaur.c: In function `centaur_mcr_insert':
arch/i386/kernel/cpu/centaur.c:33: warning: implicit declaration of function `mtrr_centaur_report_mcr'

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:06 -08:00
Jan Beulich
db753bdfc2 [PATCH] i386: fix uses of user_mode() vs. user_mode_vm()
>commit 76381fee7e
>Author: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
>Date:   Thu Jun 23 00:08:46 2005 -0700
>
>    [PATCH] xen: x86_64: use more usermode macro
>
>    Make use of the user_mode macro where it's possible.  This is useful for Xen
>    because it will need only to redefine only the macro to a hypervisor call.

I am of the opinion that the above changeset is incomplete, i.e.  it missed
converting some previous uses of user_mode to user_mode_vm.  While most of
them could be considered just cosmetical, at least the one in die_nmi
doesn't appear to be.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Zachary Amsden <zach@vmware.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:05 -08:00
Jan Beulich
101f12af16 [PATCH] i386: actively synchronize vmalloc area when registering certain callbacks
Registering a callback handler through register_die_notifier() is obviously
primarily intended for use by modules.  However, the way these currently
get called it is basically impossible for them to actually be used by
modules, as there is, on non-PAE configurationes, a good chance (the larger
the module, the better) for the system to crash as a result.

This is because the callback gets invoked

(a) in the page fault path before the top level page table propagation
    gets carried out (hence a fault to propagate the top level page table
    entry/entries mapping to module's code/data would nest infinitly) and

(b) in the NMI path, where nested faults must absolutely not happen,
    since otherwise the IRET from the nested fault re-enables NMIs,
    potentially resulting in nested NMI occurences.

Besides the modular aspect, similar problems would even arise for in-
kernel consumers of the API if they touched ioremap()ed or vmalloc()ed
memory inside their handlers.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:05 -08:00
Stas Sergeev
99b7de3347 [PATCH] x86: early printk handling fixes
The history is that -mm kernels do not work for me for a few months
already.  The things started from crashing somewhere after starting init,
and for the last month - no boot at all, just "Uncompressing...  OK,
booting kernel", and silence.  Early console didn't work too.  With the
latest releases this degraded into an infinite stream of the "Unknown
interrupt or fault" messages.  So today my patience ran out and I started
to think how can I collect at least some info for the bug-report.  Attached
is the patch that allows to gather some valueable debug info on the problem
by making an early console more useable.  I can't properly test the patch,
as the kernel still doesn't boot, so I'll explain it in details in a hope
someone else can justify the intrusive changes.

arch_hooks.h: added prototypes for setup_early_printk() and early_printk().

setup.c: killed wrong setup_early_printk() prototype.  Moved
setup_early_printk() a bit earlier, as it was not "early enough" to cover
the bug I was fighting with.

early_printk.c: made it to start printing from the bottom of the screen,
otherwise the messages interfere with the ones of the boot-loader, so you
can't read them.

Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Cc: Andi Kleen <ak@muc.de>
Cc: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:05 -08:00
Chris Wright
7c63ee5cf7 [PATCH] i386: remove duplicate declaration of mp_bus_id_to_pci_bus
mp_bus_id_to_pci_bus is declared identically twice.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:04 -08:00
Natalie.Protasevich@unisys.com
e5428ede94 [PATCH] Compilation fix for ES7000 when no ACPI is specified in config (i386)
ES7000 platform code clean up for compilation errors and a warning.
Ifdef'd the ACPI related parts in the ES7000 platform code.  They were
causing compile errors in certain configuration (without ACPI defined).  I
think this approach would be best (as opposed to Kconfig changes) since it
only touches the subarch...

Signed-off-by: <Natalie.Protasevich@unisys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:04 -08:00
Eric W. Biederman
30e931d409 [PATCH] i386: Add a temporary to make put_user more type safe
In some code I am developing I had occasion to change the type of a
variable.  This made the value put_user was putting to user space wrong.
But the code continued to build cleanly without errors.

Introducing a temporary fixes this problem and at least with gcc-3.3.5 does
not cause gcc any problems with optimizing out the temporary.  gcc-4.x
using SSA internally ought to be even better at optimizing out temporaries,
so I don't expect a temporary to become a problem.  Especially because in
all correct cases the types on both sides of the assignment to the
temporary are the same.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:04 -08:00
Gerd Hoffmann
9a0b5817ad [PATCH] x86: SMP alternatives
Implement SMP alternatives, i.e.  switching at runtime between different
code versions for UP and SMP.  The code can patch both SMP->UP and UP->SMP.
The UP->SMP case is useful for CPU hotplug.

With CONFIG_CPU_HOTPLUG enabled the code switches to UP at boot time and
when the number of CPUs goes down to 1, and switches to SMP when the number
of CPUs goes up to 2.

Without CONFIG_CPU_HOTPLUG or on non-SMP-capable systems the code is
patched once at boot time (if needed) and the tables are released
afterwards.

The changes in detail:

  * The current alternatives bits are moved to a separate file,
    the SMP alternatives code is added there.

  * The patch adds some new elf sections to the kernel:
    .smp_altinstructions
	like .altinstructions, also contains a list
	of alt_instr structs.
    .smp_altinstr_replacement
	like .altinstr_replacement, but also has some space to
	save original instruction before replaving it.
    .smp_locks
	list of pointers to lock prefixes which can be nop'ed
	out on UP.
    The first two are used to replace more complex instruction
    sequences such as spinlocks and semaphores.  It would be possible
    to deal with the lock prefixes with that as well, but by handling
    them as special case the table sizes become much smaller.

 * The sections are page-aligned and padded up to page size, so they
   can be free if they are not needed.

 * Splitted the code to release init pages to a separate function and
   use it to release the elf sections if they are unused.

Signed-off-by: Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:04 -08:00
NeilBrown
9e71f9c848 [PATCH] DM: Fix bug: BIO_RW_BARRIER requests to md/raid1 hang.
Both R1BIO_Barrier and R1BIO_Returned are 4 !!!!

This means that barrier requests don't get returned (i.e.  b_endio called)
because it looks like they already have been.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:03 -08:00
Linus Torvalds
2e6e33bab6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (78 commits)
  [PATCH] powerpc: Add FSL SEC node to documentation
  [PATCH] macintosh: tidy-up driver_register() return values
  [PATCH] powerpc: tidy-up of_register_driver()/driver_register() return values
  [PATCH] powerpc: via-pmu warning fix
  [PATCH] macintosh: cleanup the use of i2c headers
  [PATCH] powerpc: dont allow old RTC to be selected
  [PATCH] powerpc: make powerbook_sleep_grackle static
  [PATCH] powerpc: Fix warning in add_memory
  [PATCH] powerpc: update mailing list addresses
  [PATCH] powerpc: Remove calculation of io hole
  [PATCH] powerpc: iseries: Add bootargs to /chosen
  [PATCH] powerpc: iseries: Add /system-id, /model and /compatible
  [PATCH] powerpc: Add strne2a() to convert a string from EBCDIC to ASCII
  [PATCH] powerpc: iseries: Make more stuff static in platforms/iseries/mf.c
  [PATCH] powerpc: iseries: Remove pointless iSeries_(restart|power_off|halt)
  [PATCH] powerpc: iseries: mf related cleanups
  [PATCH] powerpc: Replace platform_is_lpar() with a firmware feature
  [PATCH] powerpc: trivial: Cleanup whitespace in cputable.h
  [PATCH] powerpc: Remove unused iommu_off logic from pSeries_init_early()
  [PATCH] powerpc: Unconfuse htab_bolt_mapping() callers
  ...
2006-03-22 22:20:46 -08:00
Linus Torvalds
80576fd86b Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [TCP]: Do not use inet->id of global tcp_socket when sending RST.
  [NETFILTER]: Fix undefined references to get_h225_addr
  [NETFILTER]: futher {ip,ip6,arp}_tables unification
  [NETFILTER]: Fix xt_policy address matching
  [NETFILTER]: nf_conntrack: support for layer 3 protocol load on demand
  [NETFILTER]: x_tables: set the protocol family in x_tables targets/matches
  [NETFILTER]: conntrack: cleanup the conntrack ID initialization
  [NETFILTER]: nfnetlink_queue: fix nfnetlink message size
  [NETFILTER]: ctnetlink: Fix expectaction mask dumping
  [NETFILTER]: Fix Kconfig typos
  [NETFILTER]: Fix ip6tables breakage from {get,set}sockopt compat layer
2006-03-22 17:36:04 -08:00
Linus Torvalds
9d8f057acb Merge master.kernel.org:/home/rmk/linux-2.6-serial
* master.kernel.org:/home/rmk/linux-2.6-serial:
  [SERIAL] Merge avlab serial board entries in parport_serial
  [SERIAL] kernel console should send CRLF not LFCR
2006-03-22 17:33:12 -08:00
Linus Torvalds
591eb85ecd Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm: (45 commits)
  [ARM] 3389/1: typo and grammar fix
  [ARM] 3386/1: AT91RM9200 Clock update
  [ARM] 3384/1: AT91RM9200: Timer
  [ARM] 3382/1: ixp2000: unify defconfigs
  [ARM] 3381/1: ixp2000: fix slowport write timing control register fields
  [ARM] 3380/1: ixp2000: simplify ixdp2x00_master_npu() check
  [ARM] 3379/1: ixp2000: use generic 8250 debug macros
  [ARM] 3378/1: ixp2000: fix gpio interrupt handling
  [ARM] Quieten spurious IRQ detection
  [ARM] Use kcalloc to allocate counter_config array rather than kmalloc
  [ARM] Oprofile: dynamically allocate counter_config
  [ARM] Oprofile: Convert semaphore to mutex
  [ARM] 3376/2: S3C2410 - update defconfig
  [ARM] 3375/1: S3C2440 - fix osiris machine build
  [ARM] 3374/1: ep93xx: gpio interrupt support
  [ARM] 3361/1: S3C24XX - add USB bus clock source
  [ARM] 3360/1: S3C2440 - add set rate methods and camera clock
  [ARM] 3359/1: S3C24XX - add support for clk_set_rate
  [ARM] Convert kmalloc+memset to kzalloc
  [ARM] 3373/1: move uengine loader to arch/arm/common
  ...
2006-03-22 17:32:09 -08:00
Jeff Garzik
f01c184569 Merge branch 'master' 2006-03-22 19:13:54 -05:00
Dmitry Mishin
1e30a014e3 [NETFILTER]: futher {ip,ip6,arp}_tables unification
This patch moves {ip,ip6,arp}t_entry_{match,target} definitions to
x_tables.h. This move simplifies code and future compatibility fixes.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 13:56:56 -08:00
Pablo Neira Ayuso
b9f78f9fca [NETFILTER]: nf_conntrack: support for layer 3 protocol load on demand
x_tables matches and targets that require nf_conntrack_ipv[4|6] to work
don't have enough information to load on demand these modules. This
patch introduces the following changes to solve this issue:

o nf_ct_l3proto_try_module_get: try to load the layer 3 connection
tracker module and increases the refcount.
o nf_ct_l3proto_module put: drop the refcount of the module.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 13:56:08 -08:00
Pablo Neira Ayuso
a45049c51c [NETFILTER]: x_tables: set the protocol family in x_tables targets/matches
Set the family field in xt_[matches|targets] registered.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 13:55:40 -08:00
Lennert Buytenhek
a04f2d9d3a [ARM] 3381/1: ixp2000: fix slowport write timing control register fields
Patch from Lennert Buytenhek

The original version of the chip docs had the PW and SU fields in
the slowport write timing control register accidentally reversed.
This is mentioned in the errata (documentation change #4) and fixed
in newer docs.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-22 20:14:11 +00:00
Lennert Buytenhek
709eb502eb [ARM] 3380/1: ixp2000: simplify ixdp2x00_master_npu() check
Patch from Lennert Buytenhek

On the IXDP2x00s, the NPU that is PCI master is always the egress
(i.e. 'master') NPU.  At least on the IXDP2800, both NPUs have flash,
so the ixp2000_has_flash() check in ixdp2x00_master_npu() is useless.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-22 20:14:11 +00:00
Lennert Buytenhek
e99053e075 [ARM] 3379/1: ixp2000: use generic 8250 debug macros
Patch from Lennert Buytenhek

The xscale UART in the ixp2000 is basically just an 8250 UART (with
some extra bits and pieces), so we can use the generic 8250 debug
macros on the ixp2000.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-22 20:14:09 +00:00
Linus Torvalds
1c2e02750b Merge git://git.kernel.org/pub/scm/linux/kernel/git/perex/alsa
* git://git.kernel.org/pub/scm/linux/kernel/git/perex/alsa: (124 commits)
  [ALSA] version 1.0.11rc4
  [PATCH] Intruduce DMA_28BIT_MASK
  [ALSA] hda-codec - Add support for ASUS P4GPL-X
  [ALSA] hda-codec - Add support for HP nx9420 laptop
  [ALSA] Fix memory leaks in error path of control.c
  [ALSA] AMD Au1x00: AC'97 controller is memory mapped
  [ALSA] AMD Au1x00: fix DMA init/cleanup
  [ALSA] hda-codec - Fix generic auto-configurator
  [ALSA] hda-codec - Fix BIOS auto-configuration
  [ALSA] Fixes typos in Audiophile-USB.txt
  [ALSA] ice1712 - typo fixes for dxr_enable module option
  [ALSA] AMD Au1x00: make driver build after cleanup
  [ALSA] ice1712 - Fix wrong value types for enum items
  [ALSA] fix resource leak in usbmixer
  [ALSA] Fix gus_pcm dereference before NULL
  [ALSA] Fix seq_clientmgr dereferences before NULL check
  [ALSA] hda-codec - Fix for Samsung R65 and ASUS A6J
  [ALSA] hda-codec - Add support for VAIO FE550G and SZ110
  [ALSA] usb-audio: add Maya44 mixer control names
  [ALSA] usb-audio: add Casio PL-40R support
  ...
2006-03-22 10:59:20 -08:00
Linus Torvalds
8b4b6707ee Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  fixed path to moved file in include/linux/device.h
  Fix spelling in E1000_DISABLE_PACKET_SPLIT Kconfig description
  Documentation/dvb/get_dvb_firmware: fix firmware URL
  Documentation: Update to BUG-HUNTING
  Remove superfluous NOTIFY_COOKIE_LEN define
  add "tags" to .gitignore
  Fix "frist", "fisrt", typos
  fix rwlock usage example
  It's UTF-8
2006-03-22 10:58:05 -08:00
Linus Torvalds
d04ef3a795 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC64]: Add a secondary TSB for hugepage mappings.
  [SPARC]: Respect vm_page_prot in io_remap_page_range().
2006-03-22 10:56:57 -08:00
Linus Torvalds
36177ba655 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [TG3]: Bump driver version and reldate.
  [TG3]: Skip phy power down on some devices
  [TG3]: Fix SRAM access during tg3_init_one()
  [X25]: dte facilities 32 64 ioctl conversion
  [X25]: allow ITU-T DTE facilities for x25
  [X25]: fix kernel error message 64 bit kernel
  [X25]: ioctl conversion 32 bit user to 64 bit kernel
  [NET]: socket timestamp 32 bit handler for 64 bit kernel
  [NET]: allow 32 bit socket ioctl in 64 bit kernel
  [BLUETOOTH]: Return negative error constant
2006-03-22 10:56:23 -08:00
Linus Torvalds
2152f85366 Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (138 commits)
  [SCSI] libata: implement minimal transport template for ->eh_timed_out
  [SCSI] eliminate rphy allocation in favour of expander/end device allocation
  [SCSI] convert mptsas over to end_device/expander allocations
  [SCSI] allow displaying and setting of cache type via sysfs
  [SCSI] add scsi_mode_select to scsi_lib.c
  [SCSI] 3ware 9000 add big endian support
  [SCSI] qla2xxx: update MAINTAINERS
  [SCSI] scsi: move target_destroy call
  [SCSI] fusion - bump version
  [SCSI] fusion - expander hotplug suport in mptsas module
  [SCSI] fusion - exposing raid components in mptsas
  [SCSI] fusion - memory leak, and initializing fields
  [SCSI] fusion - exclosure misspelled
  [SCSI] fusion - cleanup mptsas event handling functions
  [SCSI] fusion - removing target_id/bus_id from the VirtDevice structure
  [SCSI] fusion - static fix's
  [SCSI] fusion - move some debug firmware event debug msgs to verbose level
  [SCSI] fusion - loginfo header update
  [SCSI] add scsi_reprobe_device
  [SCSI] megaraid_sas: fix extended timeout handling
  ...
2006-03-22 10:47:24 -08:00
Christoph Lameter
b20a35035f [PATCH] page migration reorg
Centralize the page migration functions in anticipation of additional
tinkering.  Creates a new file mm/migrate.c

1. Extract buffer_migrate_page() from fs/buffer.c

2. Extract central migration code from vmscan.c

3. Extract some components from mempolicy.c

4. Export pageout() and remove_from_swap() from vmscan.c

5. Make it possible to configure NUMA systems without page migration
   and non-NUMA systems with page migration.

I had to so some #ifdeffing in mempolicy.c that may need a cleanup.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:06 -08:00
David Gibson
42b88befd6 [PATCH] hugepage: is_aligned_hugepage_range() cleanup
Quite a long time back, prepare_hugepage_range() replaced
is_aligned_hugepage_range() as the callback from mm/mmap.c to arch code to
verify if an address range is suitable for a hugepage mapping.
is_aligned_hugepage_range() stuck around, but only to implement
prepare_hugepage_range() on archs which didn't implement their own.

Most archs (everything except ia64 and powerpc) used the same
implementation of is_aligned_hugepage_range().  On powerpc, which
implements its own prepare_hugepage_range(), the custom version was never
used.

In addition, "is_aligned_hugepage_range()" was a bad name, because it
suggests it returns true iff the given range is a good hugepage range,
whereas in fact it returns 0-or-error (so the sense is reversed).

This patch cleans up by abolishing is_aligned_hugepage_range().  Instead
prepare_hugepage_range() is defined directly.  Most archs use the default
version, which simply checks the given region is aligned to the size of a
hugepage.  ia64 and powerpc define custom versions.  The ia64 one simply
checks that the range is in the correct address space region in addition to
being suitably aligned.  The powerpc version (just as previously) checks
for suitable addresses, and if necessary performs low-level MMU frobbing to
set up new areas for use by hugepages.

No libhugetlbfs testsuite regressions on ppc64 (POWER5 LPAR).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:04 -08:00
David Gibson
3915bcf38f [PATCH] hugepage: Move hugetlb_free_pgd_range() prototype to hugetlb.h
The optional hugepage callback, hugetlb_free_pgd_range() is presently
implemented non-trivially only on ia64 (but I plan to add one for powerpc
shortly).  It has its own prototype for the function in asm-ia64/pgtable.h.
 However, since the function is called from generic code, it make sense for
its prototype to be in the generic hugetlb.h header file, as the protypes
other arch callbacks already are (prepare_hugepage_range(),
set_huge_pte_at(), etc.).  This patch makes it so.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:04 -08:00
David Gibson
9da61aef0f [PATCH] hugepage: Fix hugepage logic in free_pgtables()
free_pgtables() has special logic to call hugetlb_free_pgd_range() instead
of the normal free_pgd_range() on hugepage VMAs.  However, the test it uses
to do so is incorrect: it calls is_hugepage_only_range on a hugepage sized
range at the start of the vma.  is_hugepage_only_range() will return true
if the given range has any intersection with a hugepage address region, and
in this case the given region need not be hugepage aligned.  So, for
example, this test can return true if called on, say, a 4k VMA immediately
preceding a (nicely aligned) hugepage VMA.

At present we get away with this because the powerpc version of
hugetlb_free_pgd_range() is just a call to free_pgd_range().  On ia64 (the
only other arch with a non-trivial is_hugepage_only_range()) we get away
with it for a different reason; the hugepage area is not contiguous with
the rest of the user address space, and VMAs are not permitted in between,
so the test can't return a false positive there.

Nonetheless this should be fixed.  We do that in the patch below by
replacing the is_hugepage_only_range() test with an explicit test of the
VMA using is_vm_hugetlb_page().

This in turn changes behaviour for platforms where is_hugepage_only_range()
returns false always (everything except powerpc and ia64).  We address this
by ensuring that hugetlb_free_pgd_range() is defined to be identical to
free_pgd_range() (instead of a no-op) on everything except ia64.  Even so,
it will prevent some otherwise possible coalescing of calls down to
free_pgd_range().  Since this only happens for hugepage VMAs, removing this
small optimization seems unlikely to cause any trouble.

This patch causes no regressions on the libhugetlbfs testsuite - ppc64
POWER5 (8-way), ppc64 G5 (2-way) and i386 Pentium M (UP).

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:03 -08:00
David Gibson
27a85ef1b8 [PATCH] hugepage: Make {alloc,free}_huge_page() local
Originally, mm/hugetlb.c just handled the hugepage physical allocation path
and its {alloc,free}_huge_page() functions were used from the arch specific
hugepage code.  These days those functions are only used with mm/hugetlb.c
itself.  Therefore, this patch makes them static and removes their
prototypes from hugetlb.h.  This requires a small rearrangement of code in
mm/hugetlb.c to avoid a forward declaration.

This patch causes no regressions on the libhugetlbfs testsuite (ppc64,
POWER5).

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:03 -08:00
David Gibson
b45b5bd65f [PATCH] hugepage: Strict page reservation for hugepage inodes
These days, hugepages are demand-allocated at first fault time.  There's a
somewhat dubious (and racy) heuristic when making a new mmap() to check if
there are enough available hugepages to fully satisfy that mapping.

A particularly obvious case where the heuristic breaks down is where a
process maps its hugepages not as a single chunk, but as a bunch of
individually mmap()ed (or shmat()ed) blocks without touching and
instantiating the pages in between allocations.  In this case the size of
each block is compared against the total number of available hugepages.
It's thus easy for the process to become overcommitted, because each block
mapping will succeed, although the total number of hugepages required by
all blocks exceeds the number available.  In particular, this defeats such
a program which will detect a mapping failure and adjust its hugepage usage
downward accordingly.

The patch below addresses this problem, by strictly reserving a number of
physical hugepages for hugepage inodes which have been mapped, but not
instatiated.  MAP_SHARED mappings are thus "safe" - they will fail on
mmap(), not later with an OOM SIGKILL.  MAP_PRIVATE mappings can still
trigger an OOM.  (Actually SHARED mappings can technically still OOM, but
only if the sysadmin explicitly reduces the hugepage pool between mapping
and instantiation)

This patch appears to address the problem at hand - it allows DB2 to start
correctly, for instance, which previously suffered the failure described
above.

This patch causes no regressions on the libhugetblfs testsuite, and makes a
test (designed to catch this problem) pass which previously failed (ppc64,
POWER5).

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:03 -08:00
Zhang, Yanmin
8f860591ff [PATCH] Enable mprotect on huge pages
2.6.16-rc3 uses hugetlb on-demand paging, but it doesn_t support hugetlb
mprotect.

From: David Gibson <david@gibson.dropbear.id.au>

  Remove a test from the mprotect() path which checks that the mprotect()ed
  range on a hugepage VMA is hugepage aligned (yes, really, the sense of
  is_aligned_hugepage_range() is the opposite of what you'd guess :-/).

  In fact, we don't need this test.  If the given addresses match the
  beginning/end of a hugepage VMA they must already be suitably aligned.  If
  they don't, then mprotect_fixup() will attempt to split the VMA.  The very
  first test in split_vma() will check for a badly aligned address on a
  hugepage VMA and return -EINVAL if necessary.

From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>

  On i386 and x86-64, pte flag _PAGE_PSE collides with _PAGE_PROTNONE.  The
  identify of hugetlb pte is lost when changing page protection via mprotect.
  A page fault occurs later will trigger a bug check in huge_pte_alloc().

  The fix is to always make new pte a hugetlb pte and also to clean up
  legacy code where _PAGE_PRESENT is forced on in the pre-faulting day.

Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:03 -08:00
Nick Piggin
617d2214ee [PATCH] mm: optimise page_count
Optimise page_count compound page test and make it consistent with similar
functions.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:02 -08:00
Nick Piggin
7835e98b2e [PATCH] remove set_page_count() outside mm/
set_page_count usage outside mm/ is limited to setting the refcount to 1.
Remove set_page_count from outside mm/, and replace those users with
init_page_count() and set_page_refcounted().

This allows more debug checking, and tighter control on how code is allowed
to play around with page->_count.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:02 -08:00
Nick Piggin
84097518d1 [PATCH] mm: nommu use compound pages
Now that compound page handling is properly fixed in the VM, move nommu
over to using compound pages rather than rolling their own refcounting.

nommu vm page refcounting is broken anyway, but there is no need to have
divergent code in the core VM now, nor when it gets fixed.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: David Howells <dhowells@redhat.com>

(Needs testing, please).
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:01 -08:00
Nick Piggin
0f8053a509 [PATCH] mm: make __put_page internal
Remove __put_page from outside the core mm/.  It is dangerous because it does
not handle compound pages nicely, and misses 1->0 transitions.  If a user
later appears that really needs the extra speed we can reevaluate.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:01 -08:00
Andrew Morton
69e05944af [PATCH] vmscan: use unsigned longs
Turn basically everything in vmscan.c into `unsigned long'.  This is to avoid
the possibility that some piece of code in there might decide to operate upon
more than 4G (or even 2G) of pages in one hit.

This might be silly, but we'll need it one day.

Cc: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:00 -08:00
Andrew Morton
78eef01b0f [PATCH] on_each_cpu(): disable local interrupts
When on_each_cpu() runs the callback on other CPUs, it runs with local
interrupts disabled.  So we should run the function with local interrupts
disabled on this CPU, too.

And do the same for UP, so the callback is run in the same environment on both
UP and SMP.  (strictly it should do preempt_disable() too, but I think
local_irq_disable is sufficiently equivalent).

Also uninlines on_each_cpu().  softirq.c was the most appropriate file I could
find, but it doesn't seem to justify creating a new file.

Oh, and fix up that comment over (under?) x86's smp_call_function().  It
drives me nuts.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:59 -08:00
Christoph Lameter
ac2b898ca6 [PATCH] slab: Remove SLAB_NO_REAP option
SLAB_NO_REAP is documented as an option that will cause this slab not to be
reaped under memory pressure.  However, that is not what happens.  The only
thing that SLAB_NO_REAP controls at the moment is the reclaim of the unused
slab elements that were allocated in batch in cache_reap().  Cache_reap()
is run every few seconds independently of memory pressure.

Could we remove the whole thing?  Its only used by three slabs anyways and
I cannot find a reason for having this option.

There is an additional problem with SLAB_NO_REAP.  If set then the recovery
of objects from alien caches is switched off.  Objects not freed on the
same node where they were initially allocated will only be reused if a
certain amount of objects accumulates from one alien node (not very likely)
or if the cache is explicitly shrunk.  (Strangely __cache_shrink does not
check for SLAB_NO_REAP)

Getting rid of SLAB_NO_REAP fixes the problems with alien cache freeing.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:59 -08:00
Adrian Bunk
b50ec7d807 [PATCH] kcalloc(): INT_MAX -> ULONG_MAX
Since size_t has the same size as a long on all architectures, it's enough
for overflow checks to check against ULONG_MAX.

This change could allow a compiler better optimization (especially in the
n=1 case).

The practical effect seems to be positive, but quite small:

    text           data     bss      dec            hex filename
21762380        5859870 1848928 29471178        1c1b1ca vmlinux-old
21762211        5859870 1848928 29471009        1c1b121 vmlinux-patched

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:58 -08:00
Nick Piggin
9d41415221 [PATCH] mm: page_state comment more
Clarify that preemption needs to be guarded against with the
__xxx_page_state functions.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:58 -08:00
Nick Piggin
8dfcc9ba27 [PATCH] mm: split highorder pages
Have an explicit mm call to split higher order pages into individual pages.
 Should help to avoid bugs and be more explicit about the code's intention.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
8dc04efbfb [PATCH] mm: de-skew page refcounting
atomic_add_unless (atomic_inc_not_zero) no longer requires an offset refcount
to function correctly.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
7c8ee9a863 [PATCH] mm: simplify vmscan vs release refcounting
The VM has an interesting race where a page refcount can drop to zero, but it
is still on the LRU lists for a short time.  This was solved by testing a 0->1
refcount transition when picking up pages from the LRU, and dropping the
refcount in that case.

Instead, use atomic_add_unless to ensure we never pick up a 0 refcount page
from the LRU, thus a 0 refcount page will never have its refcount elevated
until it is allocated again.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
f205b2fe62 [PATCH] mm: slab less atomics
Atomic operation removal from slab

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
5e9dace8d3 [PATCH] mm: page_alloc less atomics
More atomic operation removal from page allocator

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
674539115c [PATCH] mm: less atomic ops
In the page release paths, we can be sure that nobody will mess with our
page->flags because the refcount has dropped to 0.  So no need for atomic
operations here.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00