kernel-ark

Author	SHA1	Message	Date
Linus Torvalds	28e0cf22c1	Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq	2006-01-31 15:09:20 -08:00
Linus Torvalds	9aef3b7c20	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6	2006-01-31 14:14:02 -08:00
Linus Torvalds	e0ae23550f	Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6	2006-01-31 13:12:41 -08:00
Linus Torvalds	d195ea4b14	Merge master.kernel.org:/home/rmk/linux-2.6-serial	2006-01-31 11:31:54 -08:00
Linus Torvalds	bb4bc81a23	Merge master.kernel.org:/home/rmk/linux-2.6-arm	2006-01-31 11:31:02 -08:00
Linus Torvalds	d5bee77513	Merge branch 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block	2006-01-31 11:22:40 -08:00
Linus Torvalds	0827f2b698	Merge branch 'upstream-fixes' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6	2006-01-31 10:29:35 -08:00
Linus Torvalds	7fcdf327be	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2006-01-31 10:21:13 -08:00
Zhu Yi	4a99ac3a9e	[PATCH] ieee80211: Fix A band min and max channel definitions Signed-off-by: Hong Liu <hong.liu@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2006-01-27 16:49:58 -05:00
Prarit Bhargava	61d67f2e07	[IA64-SGI] Add PROM feature set for device flush list Introduce PRF_DEVICE_FLUSH_LIST flag for older PROMs. Signed-off-by: Prarit Bhargava <prarit@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2006-01-26 13:50:40 -08:00
brking@us.ibm.com	bb1d1073a1	[SCSI] Prevent scsi_execute_async from guessing cdb length When the scsi_execute_async interface was added it ended up reducing the flexibility of userspace to send arbitrary scsi commands through sg using SG_IO. The SG_IO interface allows userspace to specify the CDB length. This is now ignored in scsi_execute_async and it is guessed using the COMMAND_SIZE macro, which is not always correct, particularly for vendor specific commands. This patch adds a cmd_len parameter to the scsi_execute_async interface to allow the caller to specify the length of the CDB. Signed-off-by: Brian King <brking@us.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2006-01-26 15:13:50 -05:00
George G. Davis	7efb83002b	[ARM] 3269/1: Add ARMv6 MT_NONSHARED_DEVICE mem_types[] index Patch from George G. Davis This Freescale Semiconductor, Inc. contributed patch adds mem_types[] support for ARMv6 non-shared device memory region attributes. This implementation provides support for only first level section mapped non-shared devices. Second level non-shared device mappings are not yet supported. Signed-off-by: George G. Davis <gdavis@mvista.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-01-26 15:21:28 +00:00
Lucas Correia Villa Real	0367a8d37a	[ARM] 3266/1: S3C2400 - adds macro S3C24XX Patch from Lucas Correia Villa Real This patch defines S3C2400 memory map and adds a S3C24XX macro for common resources between S3C2400, S3C2410 and S3C2440 cpus. Signed-off-by: Lucas Correia Villa Real <lucasvr@gobolinux.org> Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-01-26 15:20:50 +00:00
Tetsuo Takata	dfcd77d16b	[SCSI] Remove host template ordered_flush variable After the recent overhaul of the block layer the variable "ordered_flush" is no longer used. Signed-off-by: Tetsuo Takata <takatatt@intellilink.co.jp> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-01-25 11:12:40 +01:00
Dean Roe	fd8b206d16	[IA64-SGI] add sn_feature_sets bit SGI's prom has added a new feature which avoids an Altix-specific MCA that can occur with excessive use of ia64_pal_cache_flush. This patch adds the #define to the sn_feature_sets.h to reflect that bit is taken. Signed-off-by: Dean Roe <roe@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2006-01-24 14:49:43 -08:00
Jens Axboe	2cb2e147a6	[BLOCK] ll_rw_blk: make max_sectors and max_hw_sectors unsigned ints IDE lba48 can support full 64k request size, which overflows the max_hw_sectors variable. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-01-24 10:06:19 +01:00
David S. Miller	d3ed309a71	[SPARC64]: Implement __raw_read_trylock() generic__raw_read_trylock() just does a raw_read_lock() so that isn't very useful. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-23 21:03:56 -08:00
Russell King	0077d45e46	[SERIAL] Make uart_port flags a bitwise type Same reasoning as commit `747c8a5594` but this time we're making uart_port flags a bitwise type - not all of these flags correspond with the old ASYNC_ flags, so there is the possibility for bugs if the wrong ASYNC_* constants are used. Always use UPF_* constants for uart_port->flags. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-01-21 23:03:28 +00:00
Russell King	27ae7a7435	[SERIAL] Fix UPF_ flag usage with uart_info->flags The previous change found a bug in the serial SAK handling - because we were looking for UPF_SAK set in uart_info->flags, we would never raise a SAK condition. UPF_SAK is in uart_port->flags. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-01-21 22:54:06 +00:00
Russell King	747c8a5594	[SERIAL] Make uart_info flags a bitwise type The potential for confusing the flags is fairly high. Make uart_info's flags a bitwise type so sparse can check that the right flag definitions are used with the right structure. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-01-21 22:50:36 +00:00
Russell King	ba899dbc03	[SERIAL] Make port->ops constant No one should write to the port->ops structure, so make it constant. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-01-21 22:45:50 +00:00
Russell King	ca74080385	[SERIAL] Remove UPF_AUTOPROBE and UPF_BOOT_ONLYMCA The functionality UPF_BOOT_ONLYMCA provided has been replaced by the 8250_mca module, which only registers MCA ports if MCA is present. UPF_AUTOPROBE has no functional effect - in fact, it's never tested. Only ibmasm set the flag. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-01-21 20:06:14 +00:00
David S. Miller	6fbfc96884	[NETFILTER]: Unbreak x-tables on x86. x86 defines __alignof__(long long) as 8 yet it gives 4 for a struct containing a long long, ho hum... so my simplified form doesn't work everywhere. So use Harald Welte's original patch, which should work on all platforms. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-20 11:57:07 -08:00
Linus Torvalds	18a4144028	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2006-01-19 22:19:26 -08:00
Linus Torvalds	02829f7377	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6	2006-01-19 22:17:38 -08:00
Linus Torvalds	497992917e	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6	2006-01-19 22:16:58 -08:00
David S. Miller	4f2d7680cb	[NETFILTER] x_tables: Make XT_ALIGN align as strictly as necessary. Or else we break on ppc32 and other 32-bit platforms. Based upon a patch from Harald Welte. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-19 16:58:37 -08:00
David S. Miller	cf9e50a920	Merge master.kernel.org:/pub/scm/linux/kernel/git/sridhar/lksctp-2.6	2006-01-19 16:53:02 -08:00
David S. Miller	2d7d5f0511	[SPARC]: Add support for *at(), ppoll, and pselect syscalls. This also includes by necessity _TIF_RESTORE_SIGMASK support, which actually resulted in a lot of cleanups. The sparc signal handling code is quite a mess and I should clean it up some day. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-19 02:42:49 -08:00
David S. Miller	f7111ceb52	[SPARC]: sparc32 needs PROMDEV_{I,O}RSC defines too. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-18 21:57:37 -08:00
Alan Cox	da9bb1d27b	[PATCH] EDAC: core EDAC support code This is a subset of the bluesmoke project core code, stripped of the NMI work which isn't ready to merge and some of the "interesting" proc functionality that needs reworking or just has no place in kernel. It requires no core kernel changes except the added scrub functions already posted. The goal is to merge further functionality only after the core code is accepted and proven in the base kernel, and only at the point the upstream extras are really ready to merge. From: doug thompson <norsk5@xmission.com> This converts EDAC to sysfs and is the final chunk neccessary before EDAC has a stable user space API and can be considered for submission into the base kernel. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: doug thompson <norsk5@xmission.com> Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:31 -08:00
Alan Cox	715b49ef2d	[PATCH] EDAC: atomic scrub operations EDAC requires a way to scrub memory if an ECC error is found and the chipset does not do the work automatically. That means rewriting memory locations atomically with respect to all CPUs _and_ bus masters. That means we can't use atomic_add(foo, 0) as it gets optimised for non-SMP This adds a function to include/asm-foo/atomic.h for the platforms currently supported which implements a scrub of a mapped block. It also adjusts a few other files include order where atomic.h is included before types.h as this now causes an error as atomic_scrub uses u32. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:30 -08:00
David Woodhouse	3213e913b0	[PATCH] Add pselect/ppoll system calls on i386 Add the sys_pselect6() and sys_poll() calls to the i386 syscall table. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:30 -08:00
David Woodhouse	9f72949f67	[PATCH] Add pselect/ppoll system call implementation The following implementation of ppoll() and pselect() system calls depends on the architecture providing a TIF_RESTORE_SIGMASK flag in the thread_info. These system calls have to change the signal mask during their operation, and signal handlers must be invoked using the new, temporary signal mask. The old signal mask must be restored either upon successful exit from the system call, or upon returning from the invoked signal handler if the system call is interrupted. We can't simply restore the original signal mask and return to userspace, since the restored signal mask may actually block the signal which interrupted the system call. The TIF_RESTORE_SIGMASK flag deals with this by causing the syscall exit path to trap into do_signal() just as TIF_SIGPENDING does, and by causing do_signal() to use the saved signal mask instead of the current signal mask when setting up the stack frame for the signal handler -- or by causing do_signal() to simply restore the saved signal mask in the case where there is no handler to be invoked. The first patch implements the sys_pselect() and sys_ppoll() system calls, which are present only if TIF_RESTORE_SIGMASK is defined. That #ifdef should go away in time when all architectures have implemented it. The second patch implements TIF_RESTORE_SIGMASK for the PowerPC kernel (in the -mm tree), and the third patch then removes the arch-specific implementations of sys_rt_sigsuspend() and replaces them with generic versions using the same trick. The fourth and fifth patches, provided by David Howells, implement TIF_RESTORE_SIGMASK for FR-V and i386 respectively, and the sixth patch adds the syscalls to the i386 syscall table. This patch: Add the pselect() and ppoll() system calls, providing core routines usable by the original select() and poll() system calls and also the new calls (with their semantics w.r.t timeouts). Signed-off-by: David Woodhouse <dwmw2@infradead.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:30 -08:00
Jeff Dike	36a7878a22	[PATCH] uml: use generic sys_rt_sigsuspend Use the generic sys_rt_sigsuspend. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:30 -08:00
Jeff Dike	2fc10620e7	[PATCH] uml: add TIF_RESTORE_SIGMASK support Add support for TIF_RESTORE_SIGMASK. I copy the i386 handling of the flag. sys_sigsuspend is also changed to follow i386. Also a bit of cleanup - turn an if into a switch get rid of a couple more emacs formatting comments Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:30 -08:00
David Woodhouse	f27201da5c	[PATCH] TIF_RESTORE_SIGMASK support for arch/powerpc Implement the TIF_RESTORE_SIGMASK flag in the new arch/powerpc kernel, for both 32-bit and 64-bit system call paths. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:30 -08:00
David Howells	283828f3c1	[PATCH] Handle TIF_RESTORE_SIGMASK for i386 Handle TIF_RESTORE_SIGMASK as added by David Woodhouse's patch entitled: [PATCH] 2/3 Add TIF_RESTORE_SIGMASK support for arch/powerpc [PATCH] 3/3 Generic sys_rt_sigsuspend It does the following: (1) Declares TIF_RESTORE_SIGMASK for i386. (2) Invokes it over to do_signal() when TIF_RESTORE_SIGMASK is set. (3) Makes do_signal() support TIF_RESTORE_SIGMASK, using the signal mask saved in current->saved_sigmask. (4) Discards sys_rt_sigsuspend() from the arch, using the generic one instead. (5) Makes sys_sigsuspend() save the signal mask and set TIF_RESTORE_SIGMASK rather than attempting to fudge the return registers. (6) Makes sys_sigsuspend() return -ERESTARTNOHAND rather than looping intrinsically. (7) Makes setup_frame(), setup_rt_frame() and handle_signal() return 0 or -EFAULT rather than true/false to be consistent with the rest of the kernel. Due to the fact do_signal() is then only called from one place: (8) Makes do_signal() no longer have a return value is it was just being ignored; force_sig() takes care of this. (9) Discards the old sigmask argument to do_signal() as it's no longer necessary. (10) Makes do_signal() static. (11) Marks the second argument to do_notify_resume() as unused. The unused argument should remain in the middle as the arguments are passed in as registers, and the ordering is specific in entry.S Given the way do_signal() is now no longer called from sys_{,rt_}sigsuspend(), they no longer need access to the exception frame, and so can just take arguments normally. This patch depends on sys_rt_sigsuspend patch. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:29 -08:00
David Howells	a411aee96e	[PATCH] Handle TIF_RESTORE_SIGMASK for FRV Handle TIF_RESTORE_SIGMASK as added by David Woodhouse's patch entitled: [PATCH] 2/3 Add TIF_RESTORE_SIGMASK support for arch/powerpc [PATCH] 3/3 Generic sys_rt_sigsuspend It does the following: (1) Declares TIF_RESTORE_SIGMASK for FRV. (2) Invokes it over to do_signal() when TIF_RESTORE_SIGMASK is set. (3) Makes do_signal() support TIF_RESTORE_SIGMASK, using the signal mask saved in current->saved_sigmask. (4) Discards sys_rt_sigsuspend() from the arch, using the generic one instead. (5) Makes sys_sigsuspend() save the signal mask and set TIF_RESTORE_SIGMASK rather than attempting to fudge the return registers. (6) Makes sys_sigsuspend() return -ERESTARTNOHAND rather than looping intrinsically. (7) Makes setup_frame(), setup_rt_frame() and handle_signal() return 0 or -EFAULT rather than true/false to be consistent with the rest of the kernel. Due to the fact do_signal() is then only called from one place: (8) Make do_signal() no longer have a return value is it was just being ignored; force_sig() takes care of this. (9) Discards the old sigmask argument to do_signal() as it's no longer necessary. This patch depends on the FRV signalling patches as well as the sys_rt_sigsuspend patch. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:29 -08:00
David Woodhouse	150256d8aa	[PATCH] Generic sys_rt_sigsuspend() The TIF_RESTORE_SIGMASK flag allows us to have a generic implementation of sys_rt_sigsuspend() instead of duplicating it for each architecture. This provides such an implementation and makes arch/powerpc use it. It also tidies up the ppc32 sys_sigsuspend() to use TIF_RESTORE_SIGMASK. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:29 -08:00
Ulrich Drepper	a60fc5190a	[PATCH] vfs: *at functions: x86_64 Wire up the x86_64 syscalls. Signed-off-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:29 -08:00
Ulrich Drepper	4f08550723	[PATCH] vfs: *at functions: i386 Wire up the x86 syscalls Signed-off-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:29 -08:00
Ulrich Drepper	5590ff0d55	[PATCH] vfs: *at functions: core Here is a series of patches which introduce in total 13 new system calls which take a file descriptor/filename pair instead of a single file name. These functions, openat etc, have been discussed on numerous occasions. They are needed to implement race-free filesystem traversal, they are necessary to implement a virtual per-thread current working directory (think multi-threaded backup software), etc. We have in glibc today implementations of the interfaces which use the /proc/self/fd magic. But this code is rather expensive. Here are some results (similar to what Jim Meyering posted before). The test creates a deep directory hierarchy on a tmpfs filesystem. Then rm -fr is used to remove all directories. Without syscall support I get this: real 0m31.921s user 0m0.688s sys 0m31.234s With syscall support the results are much better: real 0m20.699s user 0m0.536s sys 0m20.149s The interfaces are for obvious reasons currently not much used. But they'll be used. coreutils (and Jeff's posixutils) are already using them. Furthermore, code like ftw/fts in libc (maybe even glob) will also start using them. I expect a patch to make follow soon. Every program which is walking the filesystem tree will benefit. Signed-off-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@ftp.linux.org.uk> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:29 -08:00
J. Bruce Fields	3a65588adc	[PATCH] nfsd4: rename lk_stateowner One of the things that's confusing about nfsd4_lock is that the lk_stateowner field could be set to either of two different lockowners: the open owner or the lock owner. Rename to lk_replay_owner and add a comment to make it clear that it's used for whichever stateowner has its sequence id bumped for replay detection. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:24 -08:00
J. Bruce Fields	1918e34138	[PATCH] svcrpc: save and restore the daddr field when request deferred The server code currently keeps track of the destination address on every request so that it can reply using the same address. However we forget to do that in the case of a deferred request. Remedy this oversight. >From folks at PolyServe. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:24 -08:00
YAMAMOTO Takashi	f193fbab2e	[PATCH] nfsd: check error status from nfsd_sync_dir Change nfsd_sync_dir to return an error if ->sync fails, and pass that error up through the stack. This involves a number of rearrangements of error paths, and care to distinguish between Linux -errno numbers and NFSERR numbers. In the 'create' routines, we continue with the 'setattr' even if a previous sync_dir failed. This patch is quite different from Takashi's in a few ways, but there is still a strong lineage. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:24 -08:00
Arnd Bergmann	5131cf154a	[PATCH] add missing syscall declarations All standard system calls should be declared in include/linux/syscalls.h. Add some of the new additions that were previously missed. Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:22 -08:00
Paolo 'Blaisorblade' Giarrusso	3b948068b8	[PATCH] uml: remove leftover from patch revertal I added this line to share this file with UML, but now it's no longer shared so remove this useless leftover. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:20 -08:00
Jeff Dike	ea1eae75eb	[PATCH] uml: add __raw_writel definition Add implementations of the write* and __raw_write* functions. __raw_writel is needed by lib/iocopy.c, which shouldn't be used in UML, but which is unconditionally linked in anyway. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:18 -08:00
Christoph Lameter	dc85da15d4	[PATCH] NUMA policies in the slab allocator V2 This patch fixes a regression in 2.6.14 against 2.6.13 that causes an imbalance in memory allocation during bootup. The slab allocator in 2.6.13 is not numa aware and simply calls alloc_pages(). This means that memory policies may control the behavior of alloc_pages(). During bootup the memory policy is set to MPOL_INTERLEAVE resulting in the spreading out of allocations during bootup over all available nodes. The slab allocator in 2.6.13 has only a single list of slab pages. As a result the per cpu slab cache and the spinlock controlled page lists may contain slab entries from off node memory. The slab allocator in 2.6.13 makes no effort to discern the locality of an entry on its lists. The NUMA aware slab allocator in 2.6.14 controls locality of the slab pages explicitly by calling alloc_pages_node(). The NUMA slab allocator manages slab entries by having lists of available slab pages for each node. The per cpu slab cache can only contain slab entries associated with the node local to the processor. This guarantees that the default allocation mode of the slab allocator always assigns local memory if available. Setting MPOL_INTERLEAVE as a default policy during bootup has no effect anymore. In 2.6.14 all node unspecific slab allocations are performed on the boot processor. This means that most of key data structures are allocated on one node. Most processors will have to refer to these structures making the boot node a potential bottleneck. This may reduce performance and cause unnecessary memory pressure on the boot node. This patch implements NUMA policies in the slab layer. There is the need of explicit application of NUMA memory policies by the slab allcator itself since the NUMA slab allocator does no longer let the page_allocator control locality. The check for policies is made directly at the beginning of __cache_alloc using current->mempolicy. The memory policy is already frequently checked by the page allocator (alloc_page_vma() and alloc_page_current()). So it is highly likely that the cacheline is present. For MPOL_INTERLEAVE kmalloc() will spread out each request to one node after another so that an equal distribution of allocations can be obtained during bootup. It is not possible to push the policy check to lower layers of the NUMA slab allocator since the per cpu caches are now only containing slab entries from the current node. If the policy says that the local node is not to be preferred or forbidden then there is no point in checking the slab cache or local list of slab pages. The allocation better be directed immediately to the lists containing slab entries for the allowed set of nodes. This way of applying policy also fixes another strange behavior in 2.6.13. alloc_pages() is controlled by the memory allocation policy of the current process. It could therefore be that one process is running with MPOL_INTERLEAVE and would f.e. obtain a new page following that policy since no slab entries are in the lists anymore. A page can typically be used for multiple slab entries but lets say that the current process is only using one. The other entries are then added to the slab lists. These are now non local entries in the slab lists despite of the possible availability of local pages that would provide faster access and increase the performance of the application. Another process without MPOL_INTERLEAVE may now run and expect a local slab entry from kmalloc(). However, there are still these free slab entries from the off node page obtained from the other process via MPOL_INTERLEAVE in the cache. The process will then get an off node slab entry although other slab entries may be available that are local to that process. This means that the policy if one process may contaminate the locality of the slab caches for other processes. This patch in effect insures that a per process policy is followed for the allocation of slab entries and that there cannot be a memory policy influence from one process to another. A process with default policy will always get a local slab entry if one is available. And the process using memory policies will get its memory arranged as requested. Off-node slab allocation will require the use of spinlocks and will make the use of per cpu caches not possible. A process using memory policies to redirect allocations offnode will have to cope with additional lock overhead in addition to the latency added by the need to access a remote slab entry. Changes V1->V2 - Remove #ifdef CONFIG_NUMA by moving forward declaration into prior #ifdef CONFIG_NUMA section. - Give the function determining the node number to use a saner name. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:18 -08:00

1 2 3 4 5 ...

4880 Commits