kernel-ark

Author	SHA1	Message	Date
Paul Mackerras	6bfd93c32a	powerpc: Fix incorrect might_sleep in __get_user/__put_user on kernel addresses We have a case where __get_user and __put_user can validly be used on kernel addresses in interrupt context - namely, the alignment exception handler, as our get/put_unaligned just do a single access and rely on the alignment exception handler to fix things up in the rare cases where the cpu can't handle it in hardware. Thus we can get alignment exceptions in the network stack at interrupt level. The alignment exception handler does a __get_user to read the instruction and blows up in might_sleep(). Since a __get_user on a kernel address won't actually ever sleep, this makes the might_sleep conditional on the address being less than PAGE_OFFSET. Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-05-03 23:06:46 +10:00
Jeremy Kerr	8261aa6009	[PATCH] powerpc: cell: Add numa id to struct spu Add an nid member to the spu structure, and store the numa id of the spu there on creation. Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-01 18:17:46 -07:00
Jeremy Kerr	953039c8df	[PATCH] powerpc: Allow devices to register with numa topology Change of_node_to_nid() to traverse the device tree, looking for a numa id. Cell uses this to assign ids to SPUs, which are children of the CPU node. Existing users of of_node_to_nid() are altered to use of_node_to_nid_single(), which doesn't do the traversal. Export an attach_sysdev_to_node() function, allowing system devices (eg. SPUs) to link themselves into the numa topology in sysfs. Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-01 18:17:46 -07:00
David Woodhouse	b07019f293	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2006-04-30 20:34:39 +01:00
Olof Johansson	bc97ce951c	[PATCH] powerpc: kill union tce_entry It's been long overdue to kill the union tce_entry in the pSeries/iSeries TCE code, especially since I asked the Summit guys to do it on the code they copied from us. Also, while I was at it, I cleaned up some whitespace. Built and booted on pSeries, built on iSeries. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-29 18:07:54 +10:00
Stephen Rothwell	c7f0e8cb56	[PATCH] powerpc: merge the rest of the vio code Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-29 18:02:02 +10:00
Stephen Rothwell	dd721ffd95	[PATCH] powerpc: use a common vio_match_device routine This requires the compatible properties having values that are empty strings instead of just being empty properties. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-29 18:02:01 +10:00
Stephen Rothwell	e10fa77368	[PATCH] powerpc: use the device tree for the iSeries vio bus probe As an added bonus, since every vio_dev now has a device_node associated with it, hotplug now works. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-29 18:02:00 +10:00
Paul Mackerras	29f147d746	Merge branch 'merge'	2006-04-29 16:15:57 +10:00
Anton Blanchard	03054d51a7	[PATCH] powerpc: Add cputable entry for POWER6 Add a cputable entry for the POWER6 processor. The SIHV and SIPR bits in the mmcra have moved in POWER6, so disable support for that until oprofile is fixed. Also tell firmware that we know about POWER6. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-29 10:56:58 +10:00
David Woodhouse	5614253686	Remove unneeded _syscallX macros from user view in asm-*/unistd.h These aren't needed by glibc or klibc, and they're broken in some cases anyway. The uClibc folks are apparently switching over to stop using them too (now that we agreed that they should be dropped, at least). Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2006-04-29 01:51:47 +01:00
David Woodhouse	d6754b401a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6	2006-04-29 01:42:26 +01:00
Andreas Schwab	2833c28aa0	[PATCH] powerpc: Wire up at syscalls Wire up at syscalls. This patch has been tested on ppc64 (using glibc's testsuite, both 32bit and 64bit), and compile-tested for ppc32 (I have currently no ppc32 system available, but I expect no problems). Signed-off-by: Andreas Schwab <schwab@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-28 21:04:59 +10:00
David Woodhouse	1269277a5e	[PATCH] powerpc: Use check_legacy_ioport() on ppc32 too. Some people report that we die on some Macs when we are expecting to catch machine checks after poking at some random I/O address. I'd seen it happen on my dual G4 with serial ports until we fixed those to use OF, but now other users are reporting it with i8042. This expands the use of check_legacy_ioport() to avoid that situation even on 32-bit kernels. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-28 21:04:55 +10:00
David Gibson	f10a04c034	[PATCH] powerpc: Fix pagetable bloat for hugepages At present, ARCH=powerpc kernels can waste considerable space in pagetables when making large hugepage mappings. Hugepage PTEs go in PMD pages, but each PMD page maps 256M and so contains only 16 hugepage PTEs (128 bytes of data), but takes up a 1024 byte allocation. With CONFIG_PPC_64K_PAGES enabled (64k base page size), the situation is worse. Now hugepage PTEs are at the PTE page level (also mapping 256M), so we store 16 hugepage PTEs in a 64k allocation. The PowerPC MMU already means that any 256M region is either all hugepage, or all normal pages. Thus, with some care, we can use a different allocation for the hugepage PTE tables and only allocate the 128 bytes necessary. Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-28 15:02:51 +10:00
David Woodhouse	62c4f0a2d5	Don't include linux/config.h from anywhere else in include/ Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2006-04-26 12:56:16 +01:00
Jens Axboe	912d35f867	[PATCH] Add support for the sys_vmsplice syscall sys_splice() moves data to/from pipes with a file input/output. sys_vmsplice() moves data to a pipe, with the input being a user address range instead. This uses an approach suggested by Linus, where we can hold partial ranges inside the pages[] map. Hopefully this will be useful for network receive support as well. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-04-26 10:59:21 +02:00
David Woodhouse	dd02ec3ac2	Remove user-visible references to PAGE_SIZE in include/asm-powerpc/elf.h Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2006-04-25 13:51:52 +01:00
Paul Mackerras	916a3d5729	Merge branch 'merge'	2006-04-23 10:55:56 +10:00
Paul Mackerras	d0e15bed84	powerpc: Fix define_machine so machine_is() works from modules machine_is() was always returning 0 when used in a module, because we weren't exporting the machine definitions. This was why sound wasn't working on powermacs when CONFIG_SND_POWERMAC=m. Original fix from Ben Herrenschmidt, further fixed by me. Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-23 10:42:04 +10:00
Linas Vepstas	ac325acd50	[PATCH] powerpc/pseries: clear PCI failure counter if no new failures The current PCI error recovery system keeps track of the number of PCI card resets, and refuses to bring a card back up if this number is too large. The goal of doing this was to avoid an infinite loop of resets if a card is obviously dead. However, if the failures are rare, but the machine has a high uptime, this mechanism might still be triggered; this is too harsh. This patch avoids this problem by decrementing the fail count after an hour. Thus, as long as a pci card BSOD's less than 6 times an hour, it will continue to be reset indefinitely. If its failure rate is greater than that, it will be taken off-line permanently. This patch is larger than it might otherwise be because it changes indentation by removing a pointless while-loop. The while loop is not needed, as the handler is invoked once for each event (by schedule_work()); the loop is leftover cruft from an earlier implementation. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-22 18:46:13 +10:00
Anton Blanchard	c256f4b959	[PATCH] powerpc: remove io_page_mask Cleanup patch which removes the io_page_mask. It fixes the reset on some e1000 devices which is needed for clean kexec reboots. The legacy devices which broke with this patch (parallel port and PC speaker) have now been fixed in Linus' tree. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-22 18:45:05 +10:00
Olof Johansson	7daa411b81	[PATCH] powerpc: IOMMU support for honoring dma_mask Some devices don't support full 32-bit DMA address space, which we currently assume. Add the required mask-passing to the IOMMU allocators. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-21 22:28:55 +10:00
Linus Torvalds	6fbe85f914	Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge * git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge: powerpc: Use correct sequence for putting CPU into nap mode [PATCH] spufs: fix context-switch decrementer code [PATCH] powerpc32: Set cpu explicitly in kernel compiles [PATCH] powerpc/pseries: bugfix: balance calls to pci_device_put [PATCH] powerpc: Fix machine detection in prom_init.c [PATCH] ppc32: Fix string comparing in platform_notify_map [PATCH] powerpc: Avoid __initcall warnings [PATCH] powerpc: Ensure runlatch is off in the idle loop powerpc: Fix CHRP booting - needs a define_machine call powerpc: iSeries has only 256 IRQs	2006-04-18 10:34:24 -07:00
Paul Mackerras	f39224a8c1	powerpc: Use correct sequence for putting CPU into nap mode We weren't using the recommended sequence for putting the CPU into nap mode. When I changed the idle loop, for some reason 7447A cpus started hanging when we put them into nap mode. Changing to the recommended sequence fixes that. The complexity here is that the recommended sequence is a loop that keeps putting the cpu back into nap mode. Clearly we need some way to break out of the loop when an interrupt (external interrupt, decrementer, performance monitor) occurs. Here we use a bit in the thread_info struct to indicate that we need this, and the exception entry code notices this and arranges for the exception to return to the value in the link register, thus breaking out of the loop. We use a new `local_flags' field in the thread_info which we can alter without needing to use an atomic update sequence. The PPC970 has the same recommended sequence, so we do the same thing there too. This also fixes a bug in the kernel stack overflow handling code on 32-bit, since it was causing a value that we needed in a register to get trashed. Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-18 21:49:11 +10:00
Jens Axboe	70524490ee	[PATCH] splice: add support for sys_tee() Basically an in-kernel implementation of tee, which uses splice and the pipe buffers as an intelligent way to pass data around by reference. Where the user space tee consumes the input and produces a stdout and file output, this syscall merely duplicates the data inside a pipe to another pipe. No data is copied, the output just grabs a reference to the input pipe data. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-04-11 15:51:17 +02:00
Yasunori Goto	c80d79d746	[PATCH] Configurable NODES_SHIFT Current implementations define NODES_SHIFT in include/asm-xxx/numnodes.h for each arch. Its definition is sometimes configurable. Indeed, ia64 defines 5 NODES_SHIFT values in the current git tree. But it looks a bit messy. SGI-SN2(ia64) system requires 1024 nodes, and the number of nodes already has been changeable by config. Suitable node's number may be changed in the future even if it is other architecture. So, I wrote configurable node's number. This patch set defines just default value for each arch which needs multi nodes except ia64. But, it is easy to change to configurable if necessary. On ia64 the number of nodes can be already configured in generic ia64 and SN2 config. But, NODES_SHIFT is defined for DIG64 and HP'S machine too. So, I changed it so that all platforms can be configured via CONFIG_NODES_SHIFT. It would be simpler. See also: http://marc.theaimsgroup.com/?l=linux-kernel&m=114358010523896&w=2 Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-04-11 06:18:39 -07:00
Stephen Rothwell	7d01c88085	powerpc: iSeries has only 256 IRQs The iSeries Hypervisor only allows us to specify IRQ numbers up to 255 (it has a u8 field to pass it in). This patch allows platforms to specify a maximum to the virtual IRQ numbers we will use and has iSeries set that to 255. If not set, the maximum is NR_IRQS - 1 (as before). Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>	2006-04-04 14:49:48 +10:00
Nathan Fontenot	794e085e56	[PATCH] powerpc/pseries: EEH Cleanup This patch removes unnecessary exports, marks functions as static when possible, and simplifies some list-related code. Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-01 22:37:11 +11:00
Heiko J Schick	b13a96cfb0	[PATCH] powerpc: Extends HCALL interface for InfiniBand usage This extends the HCALL interface for InfiniBand usage. I've made the patch against the linux-2.6 git tree and Segher's patch: [PATCH] Change H_StudlyCaps to H_SHOUTING_CAPS We moved this into the common powerpc code based on comments we got after posting the first eHCA InfiniBand device driver patch. Signed-off-by: Heiko j Schick <schickhj@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-01 22:37:00 +11:00
Segher Boessenkool	706c8c93ba	[PATCH] powerpc/pseries: Change H_StudlyCaps to H_SHOUTING_CAPS Also cleans up some nearby whitespace problems. Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-01 22:36:57 +11:00
Linus Torvalds	4b75679f60	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NET]: Allow skb headroom to be overridden [TCP]: Kill unused extern decl for tcp_v4_hash_connecting() [NET]: add SO_RCVBUF comment [NET]: Deinline some larger functions from netdevice.h [DCCP]: Use NULL for pointers, comfort sparse. [DECNET]: Fix refcount	2006-03-31 12:52:30 -08:00
Anton Blanchard	025be81e83	[NET]: Allow skb headroom to be overridden Previously we added NET_IP_ALIGN so an architecture can override the padding done to align headers. The next step is to allow the skb headroom to be overridden. We currently always reserve 16 bytes to grow into, meaning all DMAs start 16 bytes into a cacheline. On ppc64 we really want DMA writes to start on a cacheline boundary, so we increase that headroom to one cacheline. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-31 02:27:06 -08:00
Jens Axboe	5274f052e7	[PATCH] Introduce sys_splice() system call This adds support for the sys_splice system call. Using a pipe as a transport, it can connect to files or sockets (latter as output only). From the splice.c comments: "splice": joining two ropes together by interweaving their strands. This is the "extended pipe" functionality, where a pipe is used as an arbitrary in-memory buffer. Think of a pipe as a small kernel buffer that you can use to transfer data from one end to the other. The traditional unix read/write is extended with a "splice()" operation that transfers data buffers to or from a pipe buffer. Named by Larry McVoy, original implementation from Linus, extended by Jens to support splicing to files and fixing the initial implementation bugs. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-30 12:28:18 -08:00
Anton Blanchard	15e812ad84	[PATCH] powerpc: Remove oprofile spinlock backtrace code Remove oprofile spinlock backtrace code now we have proper calltrace support. Also make MMCRA sihv and sipr bits a variable since they may change in future cpus. Finally, MMCRA should be a 64bit quantity. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-29 13:44:16 +11:00
Brian Rogan	6c6bd754bf	[PATCH] powerpc: Add oprofile calltrace support Add oprofile calltrace support to powerpc. Disable spinlock backtracing now we can use calltrace info. (Updated to work on both 32bit and 64bit by me). Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-29 13:44:16 +11:00
KAMEZAWA Hiroyuki	0e5519548f	[PATCH] for_each_possible_cpu: powerpc for_each_cpu() actually iterates across all possible CPUs. We've had mistakes in the past where people were using for_each_cpu() where they should have been iterating across only online or present CPUs. This is inefficient and possibly buggy. We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the future. This patch replaces for_each_cpu with for_each_possible_cpu. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-29 13:44:15 +11:00
Paul Mackerras	bac30d1a78	Merge ../linux-2.6	2006-03-29 13:24:50 +11:00
Benjamin Herrenschmidt	e8222502ee	[PATCH] powerpc: Kill _machine and hard-coded platform numbers This removes statically assigned platform numbers and reworks the powerpc platform probe code to use a better mechanism. With this, board support files can simply declare a new machine type with a macro, and implement a probe() function that uses the flattened device-tree to detect if they apply for a given machine. We now have a machine_is() macro that replaces the comparisons of _machine with the various PLATFORM_* constants. This commit also changes various drivers to use the new macro instead of looking at _machine. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-28 23:15:54 +11:00
Andrew Morton	872345b715	[PATCH] git-powerpc: WARN was a dumb idea There are at least 14 different implementations of WARN() in the tree already. The build fails all over the place. Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-28 20:48:54 +11:00
Stephen Rothwell	b239cbe957	[PATCH] powerpc: make ISA floppies work again We used to assume that a DMA mapping request with a NULL dev was for ISA DMA. This assumption was broken at some point. Now we explicitly pass the detected ISA PCI device in the floppy setup. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-28 16:45:36 +11:00
Ryan S. Arnold	45d607ed92	[PATCH] powerpc: hvc_console updates These are some updates from both Ryan and Arnd for the hvc_console driver: The main point is to enable the inclusion of a console driver for rtas, which is currently needed for the cell platform. Also shuffle around some data-type declarations and move some functions out of include/asm-ppc64/hvconsole.h and into a new drivers/char/hvc_console.h file. Signed-off-by: "Ryan S. Arnold" <rsa@us.ibm.com> Signed-off-by: Arnd Bergmann <abergman@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-28 16:45:26 +11:00
Michael Ellerman	d0160bf0b3	[PATCH] powerpc: Rename and export ppc64_firmware_features We need to export ppc64_firmware_features for modules. Before we do that I think we should probably rename it to powerpc_firmware_features. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-28 16:45:20 +11:00
Anton Blanchard	2f25194dbe	[PATCH] powerpc: export validate_sp for oprofile calltrace Export validate_sp so we can use it in the oprofile calltrace code. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-28 16:19:52 +11:00
Anton Blanchard	72533db012	[PATCH] powerpc: Remove some ifdefs in oprofile_impl.h - No one uses op_counter_config.valid, so remove it - No need to ifdef around function prototypes. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-28 16:19:49 +11:00
Paul Mackerras	0a26b1364f	ppc: Remove CHRP, POWER3 and POWER4 support from arch/ppc 32-bit CHRP machines are now supported only in arch/powerpc, as are all 64-bit PowerPC processors. This means that we don't use Open Firmware on any platform in arch/ppc any more. This makes PReP support a single-platform option like every other platform support option in arch/ppc now, thus CONFIG_PPC_MULTIPLATFORM is gone from arch/ppc. CONFIG_PPC_PREP is the option that selects PReP support and is generally what has replaced CONFIG_PPC_MULTIPLATFORM within arch/ppc. _machine is all but dead now, being #defined to 0. Updated Makefiles, comments and Kconfig options generally to reflect these changes. Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-28 10:22:10 +11:00
Alan Stern	e041c68341	[PATCH] Notifier chain update: API changes The kernel's implementation of notifier chains is unsafe. There is no protection against entries being added to or removed from a chain while the chain is in use. The issues were discussed in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 We noticed that notifier chains in the kernel fall into two basic usage classes: "Blocking" chains are always called from a process context and the callout routines are allowed to sleep; "Atomic" chains can be called from an atomic context and the callout routines are not allowed to sleep. We decided to codify this distinction and make it part of the API. Therefore this set of patches introduces three new, parallel APIs: one for blocking notifiers, one for atomic notifiers, and one for "raw" notifiers (which is really just the old API under a new name). New kinds of data structures are used for the heads of the chains, and new routines are defined for registration, unregistration, and calling a chain. The three APIs are explained in include/linux/notifier.h and their implementation is in kernel/sys.c. With atomic and blocking chains, the implementation guarantees that the chain links will not be corrupted and that chain callers will not get messed up by entries being added or removed. For raw chains the implementation provides no guarantees at all; users of this API must provide their own protections. (The idea was that situations may come up where the assumptions of the atomic and blocking APIs are not appropriate, so it should be possible for users to handle these things in their own way.) There are some limitations, which should not be too hard to live with. For atomic/blocking chains, registration and unregistration must always be done in a process context since the chain is protected by a mutex/rwsem. Also, a callout routine for a non-raw chain must not try to register or unregister entries on its own chain. 
(This did happen in a couple of places and the code had to be changed to avoid it.) Since atomic chains may be called from within an NMI handler, they cannot use spinlocks for synchronization. Instead we use RCU. The overhead falls almost entirely in the unregister routine, which is okay since unregistration is much less frequent than calling a chain. Here is the list of chains that we adjusted and their classifications. None of them use the raw API, so for the moment it is only a placeholder. ATOMIC CHAINS ------------- arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain BLOCKING CHAINS --------------- arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain arch/x86_64/kernel/process.c idle_notifier drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c sleep_notifier_list drivers/macintosh/via-pmu68k.c sleep_notifier_list drivers/macintosh/windfarm_core.c wf_client_list drivers/usb/core/notify.c usb_notifier_list drivers/video/fbmem.c fb_notifier_list kernel/cpu.c cpu_chain kernel/module.c module_notify_list kernel/profile.c munmap_notifier kernel/profile.c task_exit_notifier kernel/sys.c reboot_notifier_list 
net/core/dev.c netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain It's possible that some of these classifications are wrong. If they are, please let us know or submit a patch to fix them. Note that any chain that gets called very frequently should be atomic, because the rwsem read-locking used for blocking chains is very likely to incur cache misses on SMP systems. (However, if the chain's callout routines may sleep then the chain cannot be atomic.) The patch set was written by Alan Stern and Chandra Seetharaman, incorporating material written by Keith Owens and suggestions from Paul McKenney and Andrew Morton. [jes@sgi.com: restructure the notifier chain initialization macros] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:50 -08:00
Ingo Molnar	8f17d3a504	[PATCH] lightweight robust futexes updates - fix: initialize the robust list(s) to NULL in copy_process. - doc update - cleanup: rename _inuser to _inatomic - __user cleanups and other small cleanups Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:49 -08:00
Ingo Molnar	e9056f13bf	[PATCH] lightweight robust futexes: arch defaults This patchset provides a new (written from scratch) implementation of robust futexes, called "lightweight robust futexes". We believe this new implementation is faster and simpler than the vma-based robust futex solutions presented before, and we'd like this patchset to be adopted in the upstream kernel. This is version 1 of the patchset. Background ---------- What are robust futexes? To answer that, we first need to understand what futexes are: normal futexes are special types of locks that in the noncontended case can be acquired/released from userspace without having to enter the kernel. A futex is in essence a user-space address, e.g. a 32-bit lock variable field. If userspace notices contention (the lock is already owned and someone else wants to grab it too) then the lock is marked with a value that says "there's a waiter pending", and the sys_futex(FUTEX_WAIT) syscall is used to wait for the other guy to release it. The kernel creates a 'futex queue' internally, so that it can later on match up the waiter with the waker - without them having to know about each other. When the owner thread releases the futex, it notices (via the variable value) that there were waiter(s) pending, and does the sys_futex(FUTEX_WAKE) syscall to wake them up. Once all waiters have taken and released the lock, the futex is again back to 'uncontended' state, and there's no in-kernel state associated with it. The kernel completely forgets that there ever was a futex at that address. This method makes futexes very lightweight and scalable. "Robustness" is about dealing with crashes while holding a lock: if a process exits prematurely while holding a pthread_mutex_t lock that is also shared with some other process (e.g. yum segfaults while holding a pthread_mutex_t, or yum is kill -9-ed), then waiters for that lock need to be notified that the last owner of the lock exited in some irregular way. 
To solve such types of problems, "robust mutex" userspace APIs were created: pthread_mutex_lock() returns an error value if the owner exits prematurely - and the new owner can decide whether the data protected by the lock can be recovered safely. There is a big conceptual problem with futex based mutexes though: it is the kernel that destroys the owner task (e.g. due to a SEGFAULT), but the kernel cannot help with the cleanup: if there is no 'futex queue' (and in most cases there is none, futexes being fast lightweight locks) then the kernel has no information to clean up after the held lock! Userspace has no chance to clean up after the lock either - userspace is the one that crashes, so it has no opportunity to clean up. Catch-22. In practice, when e.g. yum is kill -9-ed (or segfaults), a system reboot is needed to release that futex based lock. This is one of the leading bugreports against yum. To solve this problem, 'Robust Futex' patches were created and presented on lkml: the one written by Todd Kneisel and David Singleton is the most advanced at the moment. These patches all tried to extend the futex abstraction by registering futex-based locks in the kernel - and thus give the kernel a chance to clean up. E.g. in David Singleton's robust-futex-6.patch, there are 3 new syscall variants to sys_futex(): FUTEX_REGISTER, FUTEX_DEREGISTER and FUTEX_RECOVER. The kernel attaches such robust futexes to vmas (via vma->vm_file->f_mapping->robust_head), and at do_exit() time, all vmas are searched to see whether they have a robust_head set. Lots of work went into the vma-based robust-futex patch, and recently it has improved significantly, but unfortunately it still has two fundamental problems left: - they have quite complex locking and race scenarios. The vma-based patches had been pending for years, but they are still not completely reliable. - they have to scan _every_ vma at sys_exit() time, per thread! 
The second disadvantage is a real killer: pthread_exit() takes around 1 microsecond on Linux, but with thousands (or tens of thousands) of vmas every pthread_exit() takes a millisecond or more, also totally destroying the CPU's L1 and L2 caches! This is very much noticeable even for normal process sys_exit_group() calls: the kernel has to do the vma scanning unconditionally! (this is because the kernel has no knowledge about how many robust futexes there are to be cleaned up, because a robust futex might have been registered in another task, and the futex variable might have been simply mmap()-ed into this process's address space). This huge overhead forced the creation of CONFIG_FUTEX_ROBUST, but worse than that: the overhead makes robust futexes impractical for any type of generic Linux distribution. So it became clear to us, something had to be done. Last week, when Thomas Gleixner tried to fix up the vma-based robust futex patch in the -rt tree, he found a handful of new races and we were talking about it and were analyzing the situation. At that point a fundamentally different solution occurred to me. This patchset (written in the past couple of days) implements that new solution. Be warned though - the patchset does things we normally dont do in Linux, so some might find the approach disturbing. Parental advice recommended ;-) New approach to robust futexes ------------------------------ At the heart of this new approach there is a per-thread private list of robust locks that userspace is holding (maintained by glibc) - which userspace list is registered with the kernel via a new syscall [this registration happens at most once per thread lifetime]. At do_exit() time, the kernel checks this user-space list: are there any robust futex locks to be cleaned up? In the common case, at do_exit() time, there is no list registered, so the cost of robust futexes is just a simple current->robust_list != NULL comparison. 
If the thread has registered a list, then normally the list is empty. If the thread/process crashed or terminated in some incorrect way then the list might be non-empty: in this case the kernel carefully walks the list [not trusting it], and marks all locks that are owned by this thread with the FUTEX_OWNER_DIED bit, and wakes up one waiter (if any). The list is guaranteed to be private and per-thread, so it's lockless. There is one race possible though: since adding to and removing from the list is done after the futex is acquired by glibc, there is a window of a few instructions for the thread (or process) to die there, leaving the futex hung. To protect against this possibility, userspace (glibc) also maintains a simple per-thread 'list_op_pending' field, to allow the kernel to clean up if the thread dies after acquiring the lock, but just before it could have added itself to the list. Glibc sets this list_op_pending field before it tries to acquire the futex, and clears it after the list-add (or list-remove) has finished. That's all that is needed - all the rest of robust-futex cleanup is done in userspace [just like with the previous patches]. Ulrich Drepper has implemented the necessary glibc support for this new mechanism, which fully enables robust mutexes. (Ulrich plans to commit these changes to glibc-HEAD later today.) Key differences of this userspace-list based approach, compared to the vma based method: - it's much, much faster: at thread exit time, there's no need to loop over every vma (!), which the VM-based method has to do. Only a very simple 'is the list empty' op is done. - no VM changes are needed - 'struct address_space' is left alone. - no registration of individual locks is needed: robust mutexes don't need any extra per-lock syscalls. Robust mutexes thus become a very lightweight primitive - so they don't force the application designer to make a hard choice between performance and robustness - robust mutexes are just as fast.
- no per-lock kernel allocation happens. - no resource limits are needed. - no kernel-space recovery call (FUTEX_RECOVER) is needed. - the implementation and the locking is "obvious", and there are no interactions with the VM. Performance ----------- I have benchmarked the time needed for the kernel to process a list of 1 million (!) held locks, using the new method [on a 2GHz CPU]: - with FUTEX_WAIT set [contended mutex]: 130 msecs - without FUTEX_WAIT set [uncontended mutex]: 30 msecs I have also measured an approach where glibc does the lock notification [which it currently does for !pshared robust mutexes], and that took 256 msecs - clearly slower, due to the 1 million FUTEX_WAKE syscalls userspace had to do. (1 million held locks are unheard of - we expect at most a handful of locks to be held at a time. Nevertheless it's nice to know that this approach scales nicely.) Implementation details ---------------------- The patch adds two new syscalls: one to register the userspace list, and one to query the registered list pointer: asmlinkage long sys_set_robust_list(struct robust_list_head __user *head, size_t len); asmlinkage long sys_get_robust_list(int pid, struct robust_list_head __user **head_ptr, size_t __user *len_ptr); List registration is very fast: the pointer is simply stored in current->robust_list. [Note that in the future, if robust futexes become widespread, we could extend sys_clone() to register a robust-list head for new threads, without the need of another syscall.] So there is virtually zero overhead for tasks not using robust futexes, and even for robust futex users, there is only one extra syscall per thread lifetime, and the cleanup operation, if it happens, is fast and straightforward. The kernel doesn't have any internal distinction between robust and normal futexes. If a futex is found to be held at exit time, the kernel sets the following bit of the futex word: #define FUTEX_OWNER_DIED 0x40000000 and wakes up the next futex waiter (if any).
User-space does the rest of the cleanup. Otherwise, robust futexes are acquired by glibc by putting the TID into the futex field atomically. Waiters set the FUTEX_WAITERS bit: #define FUTEX_WAITERS 0x80000000 and the remaining bits are for the TID. Testing, architecture support ----------------------------- I've tested the new syscalls on x86 and x86_64, and have made sure the parsing of the userspace list is robust [ ;-) ] even if the list is deliberately corrupted. i386 and x86_64 syscalls are wired up at the moment, and Ulrich has tested the new glibc code (on x86_64 and i386), and it works for his robust-mutex testcases. All other architectures should build just fine too - but they won't have the new syscalls yet. Architectures need to implement the new futex_atomic_cmpxchg_inuser() inline function before wiring up the syscalls (that function returns -ENOSYS right now). This patch: Add placeholder futex_atomic_cmpxchg_inuser() implementations to every architecture that supports futexes. It returns -ENOSYS. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:49 -08:00
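The exit-time cleanup described in the commit message above can be modelled in plain user-space C. This is only a sketch under stated assumptions, not the kernel code: `toy_lock`, `toy_robust_head`, `toy_exit_robust_list()` and `TOY_WALK_LIMIT` are hypothetical names, the list is NULL-terminated for simplicity, the TID mask is inferred from "the remaining bits are for the TID", and a wakeup is only counted rather than issued.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit values quoted in the commit message. */
#define FUTEX_WAITERS    0x80000000u
#define FUTEX_OWNER_DIED 0x40000000u
/* Assumption: every remaining bit holds the owner's TID. */
#define TOY_TID_MASK     0x3fffffffu
/* Hypothetical bound so a corrupted (cyclic) list cannot loop forever. */
#define TOY_WALK_LIMIT   1024u

/* Hypothetical stand-in for one robust lock in the per-thread list. */
struct toy_lock {
    struct toy_lock *next;
    uint32_t futex;
};

/* Hypothetical stand-in for the per-thread list head that glibc registers. */
struct toy_robust_head {
    struct toy_lock *list;            /* locks currently held */
    struct toy_lock *list_op_pending; /* lock being acquired or released, or NULL */
};

/* Mark one lock if the dying thread owns it; return 1 if a waiter would be woken. */
static int toy_handle_death(struct toy_lock *lock, uint32_t tid)
{
    if (lock->futex & FUTEX_OWNER_DIED)
        return 0; /* already handled, e.g. listed and also pending */
    if ((lock->futex & TOY_TID_MASK) != tid)
        return 0; /* not owned by the dying thread */
    lock->futex |= FUTEX_OWNER_DIED;
    return (lock->futex & FUTEX_WAITERS) != 0;
}

/* Walk the list as the kernel would at exit; return the number of wakeups. */
static int toy_exit_robust_list(struct toy_robust_head *head, uint32_t tid)
{
    int wakeups = 0;
    unsigned int seen = 0;

    if (head == NULL)
        return 0; /* common case: no list registered */

    for (struct toy_lock *l = head->list; l != NULL && seen < TOY_WALK_LIMIT;
         l = l->next, seen++)
        wakeups += toy_handle_death(l, tid);

    /* Covers the window between acquiring the futex and adding it to the list. */
    if (head->list_op_pending != NULL)
        wakeups += toy_handle_death(head->list_op_pending, tid);

    return wakeups;
}
```

The bounded walk and the "already marked" check stand in for the "not trusting it" part of the description; the real kernel additionally has to guard every user-space memory access, which this model ignores.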
KAMEZAWA Hiroyuki	659e35051b	[PATCH] unify pfn_to_page: powerpc pfn_to_page PowerPC can use generic ones. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:44 -08:00
Paul Mackerras	9618edab82	powerpc: Fix event-scan code for 32-bit CHRP On CHRP machines we are supposed to call into firmware (RTAS) periodically, to give it a chance to check for errors and other events. Under ppc we had some special code in timer_interrupt to do this, but that didn't get transferred over to arch/powerpc. Instead, we use an array of timer_list structs, one per CPU, and use add_timer_on to make sure each one gets called on the appropriate CPU. With this we can remove the heartbeat_* elements of the ppc_md struct. Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-27 21:48:57 +11:00
Paul Mackerras	fbd7740fdf	powerpc: Simplify pSeries idle loop Since pSeries only wants to do something different in the idle loop when there is no work to do, we can simplify the code by implementing ppc_md.power_save functions instead of complete idle loops. There are two versions: one for shared-processor partitions and one for dedicated- processor partitions. With this we also do a cede_processor() call on dedicated processor partitions if the poll_pending() call indicates that the hypervisor has work it wants to do. Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-27 15:06:20 +11:00
Paul Mackerras	a0652fc9a2	powerpc: Unify the 32 and 64 bit idle loops This unifies the 32-bit (ARCH=ppc and ARCH=powerpc) and 64-bit idle loops. It brings over the concept of having a ppc_md.power_save function from 32-bit to ARCH=powerpc, which lets us get rid of native_idle(). With this we will also be able to simplify the idle handling for pSeries and cell. Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-27 15:03:03 +11:00
Anton Blanchard	4df20460a3	[PATCH] powerpc: Allow non zero boot cpuids We currently have a hack to flip the boot cpu and its secondary thread to logical cpuid 0 and 1. This means the logical - physical mapping will differ depending on which cpu is boot cpu. This is most apparent on kexec, where we might kexec on any cpu and therefore change the mapping from boot to boot. The patch below does a first pass early on to work out the logical cpuid of the boot thread. We then fix up some paca structures to match. I've also removed the boot_cpuid_phys variable for ppc64; to be consistent we use get_hard_smp_processor_id(boot_cpuid) everywhere. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-27 14:48:48 +11:00
Mark Nutter	6df10a82f8	[PATCH] spufs: enable SPE problem state MMIO access. This patch is layered on top of CONFIG_SPARSEMEM and is patterned after direct mapping of LS. This patch allows mmap() of the following regions: "mfc", which represents the area from [0x3000 - 0x3fff]; "cntl", which represents the area from [0x4000 - 0x4fff]; "signal1" which begins at offset 0x14000; "signal2" which begins at offset 0x1c000. The signal1 & signal2 files may be mmap()'d by regular user processes. The cntl and mfc files, on the other hand, may only be accessed if the owning process has CAP_SYS_RAWIO, because they have the potential to confuse the kernel with regard to parallel access to the same files with regular file operations: the kernel always holds a spinlock when accessing registers in these areas to serialize them, which cannot be guaranteed with user mmaps. Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-27 14:48:28 +11:00
Arnd Bergmann	a33a7d7309	[PATCH] spufs: implement mfc access for PPE-side DMA This patch adds a new file called 'mfc' to each spufs directory. The file accepts DMA commands that are a subset of what would be legal DMA commands for problem state register access. Upon reading the file, a bitmask is returned with the completed tag groups set. The file is meant to be used from an abstraction in libspe that is added by a different patch. From the kernel perspective, this means a process can now offload a memory copy from or into an SPE local store without having to run code on the SPE itself. The transfer will only be performed while the SPE is owned by one thread that is waiting in the spu_run system call and the data will be transferred into that thread's address space, independent of which thread started the transfer. Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-27 14:48:26 +11:00
Arnd Bergmann	2dd14934c9	[PATCH] spufs: allow SPU code to do syscalls An SPU does not have a way to implement system calls itself, but it can create intercepts to the kernel. This patch uses the method defined by the JSRE interface for C99 host library calls from an SPU to implement Linux system calls. It uses the reserved SPU stop code 0x2104 for this, using the structure layout and syscall numbers for ppc64-linux. I'm still undecided whether it is better to have a list of allowed syscalls or a list of forbidden syscalls, since we can't allow an SPU to call all syscalls that are defined for ppc64-linux. This patch implements the easier choice of them, with a blacklist that only prevents an SPU from calling anything that interacts with its own execution, e.g. fork, execve, clone, vfork, exit, spu_run and spu_create and everything that deals with signals. Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-27 14:48:24 +11:00
Arnd Bergmann	a7f31841a4	[PATCH] powerpc: declare arch syscalls in <asm/syscalls.h> powerpc currently declares some of its own system calls in <asm/unistd.h>, but not all of them. That place also contains remainders of the now almost unused kernel syscall hack. - Add a new <asm/syscalls.h> with clean declarations - Include that file from every source that implements one of these - Get rid of old declarations in <asm/unistd.h> This patch is required as a base for implementing system calls from an SPU, but also makes sense as a general cleanup. Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-27 14:48:22 +11:00
Michael Ellerman	dd4d7bfad6	[PATCH] powerpc: Change firmware_has_feature() to a macro So that we can use firmware_has_feature() in a BUG_ON() and have the compiler elide the code entirely if the feature can never be set, change firmware_has_feature to a macro. Unfortunate, but necessary at least until GCC bug #26724 is fixed. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-27 14:48:12 +11:00
Michael Ellerman	e3f94b85f9	[PATCH] powerpc: Make BUG_ON & WARN_ON play nice with compile-time optimisations Change BUG_ON and WARN_ON to give the compiler a chance to perform compile-time optimisations. Depending on the complexity of the condition, the compiler may not do this very well, so if it's important check the object code. Current GCCs (4.x) produce good code as long as the condition does not include a function call, including a static inline. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-27 14:48:10 +11:00
Stephen Rothwell	7c92943c7b	[PATCH] powerpc: work around sparse warnings in cputable.h Christoph noticed that sparse warned about all the enum tags in cputable.h that had values that required them to be type long. (enum tags are ints according to the standard.) This patch attempts to fix them in the least intrusive way possible by turning them all into #defines except for the 32 bit CPU_FTRS_POSSIBLE and CPU_FTRS_ALWAYS which are hard to construct that way. This works because these last two contain no bits above 2^31. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-27 14:48:06 +11:00
Akinobu Mita	e779b2f95f	[PATCH] bitops: powerpc: use generic bitops - remove __{,test_and_}{set,clear,change}_bit() and test_bit() - remove generic_fls64() - remove generic_hweight{64,32,16,8}() - remove sched_find_first_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:57:14 -08:00
Takashi Sato	a0f62ac636	[PATCH] 2TB files: add blkcnt_t Add blkcnt_t as the type of inode.i_blocks. This enables you to make the size of blkcnt_t either 4 bytes or 8 bytes on 32 bits architecture with CONFIG_LSF. - CONFIG_LSF Add new configuration parameter. - blkcnt_t On h8300, i386, mips, powerpc, s390 and sh that define sector_t, blkcnt_t is defined as u64 if CONFIG_LSF is enabled; otherwise it is defined as unsigned long. On other architectures, it is defined as unsigned long. - inode.i_blocks Change the type from sector_t to blkcnt_t. Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:57:00 -08:00
Linus Torvalds	3cbb90a9cb	powerpc: fix strncasecmp prototype It takes a size_t, not an int, as its third argument. Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-25 09:41:40 -08:00
Davide Libenzi	f348d70a32	[PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications Implement the half-closed devices notification, by adding a new POLLRDHUP (and its alias EPOLLRDHUP) bit to the existing poll/select sets. Since the existing POLLHUP handling, that does not report correctly half-closed devices, was feared to be changed, this implementation leaves the current POLLHUP reporting unchanged and simply adds a new bit that is set in the few places where it makes sense. The same thing was discussed and conceptually agreed quite some time ago: http://lkml.org/lkml/2003/7/12/116 Since this new event bit is added to the existing Linux poll infrastructure, even the existing poll/select system calls will be able to use it. As far as the existing POLLHUP handling, the patch leaves it as is. The pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing archs and sets the bit in the six relevant files. The other attached diff is the simple change required to sys/epoll.h to add the EPOLLRDHUP definition. There is "a stupid program" to test POLLRDHUP delivery here: http://www.xmailserver.org/pollrdhup-test.c It tests poll(2), but since the delivery is the same, epoll(2) will work equally. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-25 08:22:56 -08:00
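For readers unfamiliar with the new bit, here is a minimal user-space sketch of how POLLRDHUP might be observed with poll(2). It is not the test program linked in the commit message: `toy_revents_after_peer_shutdown()` is a hypothetical helper, a local AF_UNIX socket pair stands in for a real connection, and the fallback `#define` is an assumption for builds where the feature macro does not take effect.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc exposes POLLRDHUP only when this is defined */
#endif
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef POLLRDHUP
/* Assumption: the asm-generic value; a few architectures use another one. */
#define POLLRDHUP 0x2000
#endif

/*
 * Half-close one end of a socket pair and poll the other end.
 * Returns the revents mask reported by poll(), or -1 on any error.
 */
static int toy_revents_after_peer_shutdown(void)
{
    int sv[2];
    struct pollfd pfd;
    int rc;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* The peer stops writing but keeps its read side open. */
    if (shutdown(sv[1], SHUT_WR) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    pfd.fd = sv[0];
    pfd.events = POLLIN | POLLRDHUP; /* POLLRDHUP must be requested explicitly */
    pfd.revents = 0;

    rc = poll(&pfd, 1, 1000); /* wait at most one second */

    close(sv[0]);
    close(sv[1]);
    return rc == 1 ? pfd.revents : -1;
}
```

A caller would test the returned mask with `revents & POLLRDHUP` to detect that the peer has shut down its writing half.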
Andrew Morton	394e3902c5	[PATCH] more for_each_cpu() conversions When we stop allocating percpu memory for not-possible CPUs we must not touch the percpu data for not-possible CPUs at all. The correct way of doing this is to test cpu_possible() or to use for_each_cpu(). This patch is a kernel-wide sweep of all instances of NR_CPUS. I found very few instances of this bug, if any. But the patch converts lots of open-coded test to use the preferred helper macros. Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Acked-by: Kyle McMartin <kyle@parisc-linux.org> Cc: Anton Blanchard <anton@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Andi Kleen <ak@muc.de> Cc: Christian Zankel <chris@zankel.net> Cc: Philippe Elie <phil.el@wanadoo.fr> Cc: Nathan Scott <nathans@sgi.com> Cc: Jens Axboe <axboe@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-23 07:38:17 -08:00
Linus Torvalds	2e6e33bab6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (78 commits) [PATCH] powerpc: Add FSL SEC node to documentation [PATCH] macintosh: tidy-up driver_register() return values [PATCH] powerpc: tidy-up of_register_driver()/driver_register() return values [PATCH] powerpc: via-pmu warning fix [PATCH] macintosh: cleanup the use of i2c headers [PATCH] powerpc: dont allow old RTC to be selected [PATCH] powerpc: make powerbook_sleep_grackle static [PATCH] powerpc: Fix warning in add_memory [PATCH] powerpc: update mailing list addresses [PATCH] powerpc: Remove calculation of io hole [PATCH] powerpc: iseries: Add bootargs to /chosen [PATCH] powerpc: iseries: Add /system-id, /model and /compatible [PATCH] powerpc: Add strne2a() to convert a string from EBCDIC to ASCII [PATCH] powerpc: iseries: Make more stuff static in platforms/iseries/mf.c [PATCH] powerpc: iseries: Remove pointless iSeries_(restart\|power_off\|halt) [PATCH] powerpc: iseries: mf related cleanups [PATCH] powerpc: Replace platform_is_lpar() with a firmware feature [PATCH] powerpc: trivial: Cleanup whitespace in cputable.h [PATCH] powerpc: Remove unused iommu_off logic from pSeries_init_early() [PATCH] powerpc: Unconfuse htab_bolt_mapping() callers ...	2006-03-22 22:20:46 -08:00
David Gibson	9da61aef0f	[PATCH] hugepage: Fix hugepage logic in free_pgtables() free_pgtables() has special logic to call hugetlb_free_pgd_range() instead of the normal free_pgd_range() on hugepage VMAs. However, the test it uses to do so is incorrect: it calls is_hugepage_only_range on a hugepage sized range at the start of the vma. is_hugepage_only_range() will return true if the given range has any intersection with a hugepage address region, and in this case the given region need not be hugepage aligned. So, for example, this test can return true if called on, say, a 4k VMA immediately preceding a (nicely aligned) hugepage VMA. At present we get away with this because the powerpc version of hugetlb_free_pgd_range() is just a call to free_pgd_range(). On ia64 (the only other arch with a non-trivial is_hugepage_only_range()) we get away with it for a different reason; the hugepage area is not contiguous with the rest of the user address space, and VMAs are not permitted in between, so the test can't return a false positive there. Nonetheless this should be fixed. We do that in the patch below by replacing the is_hugepage_only_range() test with an explicit test of the VMA using is_vm_hugetlb_page(). This in turn changes behaviour for platforms where is_hugepage_only_range() returns false always (everything except powerpc and ia64). We address this by ensuring that hugetlb_free_pgd_range() is defined to be identical to free_pgd_range() (instead of a no-op) on everything except ia64. Even so, it will prevent some otherwise possible coalescing of calls down to free_pgd_range(). Since this only happens for hugepage VMAs, removing this small optimization seems unlikely to cause any trouble. This patch causes no regressions on the libhugetlbfs testsuite - ppc64 POWER5 (8-way), ppc64 G5 (2-way) and i386 Pentium M (UP). 
Signed-off-by: David Gibson <dwg@au1.ibm.com> Cc: William Lee Irwin III <wli@holomorphy.com> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-22 07:54:03 -08:00
Michael Ellerman	f8642ebee8	[PATCH] powerpc: Remove calculation of io hole In mm_init_ppc64() we calculate the location of the "IO hole", but then no one ever looks at the value. So don't bother. That's actually all mm_init_ppc64() does, so get rid of it too. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-22 15:04:30 +11:00
Michael Ellerman	584fc6d111	[PATCH] powerpc: Add strne2a() to convert a string from EBCDIC to ASCII Add strne2a() which converts a string from EBCDIC to ASCII. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-22 15:04:25 +11:00
Michael Ellerman	00611c5cfc	[PATCH] powerpc: iseries: Make more stuff static in platforms/iseries/mf.c Make mf_get_rtc(), mf_get_boot_rtc() and mf_set_rtc() static, cause they can be. We need to move mf_set_rtc() to avoid a forward declaration. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-22 15:04:23 +11:00
Michael Ellerman	a9ea2101aa	[PATCH] powerpc: iseries: Remove pointless iSeries_(restart\|power_off\|halt) These routines just call through to the mf routines, so point ppc_md straight at the mf routines. We need to pass the cmd through to mf_reboot to make it work, but that seems reasonable. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-22 15:04:22 +11:00
Michael Ellerman	260de22faa	[PATCH] powerpc: iseries: mf related cleanups Some cleanups in the iSeries code. - Make mf_display_progress() check mf_initialized rather than the caller. - Set mf_initialized in mf_init() rather than in setup.c - Then move mf_initialized into mf.c, the only place it's used. - Move the mf related logic from iSeries_progress() to mf_display_progress() - Use a #define to size the pending_event_prealloc array - Use that define in the initialisation loop rather than sizeof jiggery pokery - Remove stupid comment(s) - Mark stuff static and/or __init Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-22 15:04:20 +11:00
Michael Ellerman	57cfb814f6	[PATCH] powerpc: Replace platform_is_lpar() with a firmware feature It has been decreed that platform numbers are evil, so as a step in that direction, replace platform_is_lpar() with a FW_FEATURE_LPAR bit. Currently FW_FEATURE_LPAR really means i/pSeries LPAR, in the future we might have to clean that up if we need to be more specific about what LPAR actually means. But that's another patch ... Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-22 15:04:17 +11:00
Michael Ellerman	3d15910bfb	[PATCH] powerpc: trivial: Cleanup whitespace in cputable.h Remove redundant whitespace in include/asm-powerpc/cputable.h Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-22 15:04:15 +11:00
Christoph Hellwig	e33852228f	[PATCH] powerpc: add for_each_node_by_foo helpers Typical use for of_find_node_by_name and of_find_node_by_type is to iterate over all nodes of a given type/name. Add a helper macro to do that (in spirit of the list_for_each* macros). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-17 13:21:09 +11:00
David Gibson	0f6be7b77c	[PATCH] powerpc: Better pmd_bad() and pud_bad() checks At present, the powerpc pmd_bad() and pud_bad() macros return false unless the given pmd or pud is zero. This patch makes these tests more thorough, checking if the given pmd or pud looks like a plausible pte page or pmd page pointer respectively. This can result in helpful error messages when messing with the pagetable code. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-17 13:20:40 +11:00
Paul Mackerras	23dd640112	Merge ../linux-2.6	2006-03-17 12:01:19 +11:00
John Rose	92eb4602eb	[PATCH] powerpc: properly configure DDR/P5IOC children devs The dynamic add path for PCI Host Bridges can fail to configure children adapters under P5IOC controllers. It fails to properly fixup bus/device resources, and it fails to properly enable EEH. Both of these steps need to occur before any children devices are enabled in pci_bus_add_devices(). Signed-off-by: John Rose <johnrose@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-16 16:55:07 +11:00
Paul Mackerras	5164501794	Merge ../linux-2.6	2006-03-09 14:32:05 +11:00
Linus Torvalds	0d514f040a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge * git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge: powerpc: Fix various syscall/signal/swapcontext bugs [PATCH] powerpc: incorrect rmo_top handling in prom_init [PATCH] powerpc: Fix incorrect pud_ERROR() message [PATCH] powerpc: Expose SMT and L1 icache snoop userland features [PATCH] powerpc: Fix windfarm_pm112 not starting all control loops [PATCH] powerpc: Fix old g5 issues with windfarm powerpc32: Fix timebase synchronization on 32-bit powermacs powerpc: Turn off verbose debug output in powermac platform functions powerpc: Fix might-sleep warning in program check exception handler	2006-03-08 18:11:00 -08:00
Michael Matz	2ec5e3a867	[PATCH] fix kexec asm While testing kexec and kdump we hit problems where the new kernel would freeze or instantly reboot. The easiest way to trigger it was to kexec a kernel compiled for CONFIG_M586 on an athlon cpu. Compiling for CONFIG_MK7 instead would work fine. The patch fixes a few problems with the kexec inline asm. Signed-off-by: Chris Mason <mason@suse.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-08 14:15:04 -08:00
Mark Fasheh	1c6cc5fd32	[PATCH] powerpc: restore eeh_add_device_late() prototype stub We fixed this: arch/powerpc/platforms/pseries/eeh.c: In function `eeh_add_device_tree_late': arch/powerpc/platforms/pseries/eeh.c:901: warning: implicit declaration of function `eeh_add_device_late' arch/powerpc/platforms/pseries/eeh.c: At top level: arch/powerpc/platforms/pseries/eeh.c:918: error: conflicting types for 'eeh_add_device_late' arch/powerpc/platforms/pseries/eeh.c:901: error: previous implicit declaration of 'eeh_add_device_late' was here make[2]: *** [arch/powerpc/platforms/pseries/eeh.o] Error 1 But we forgot the !CONFIG_EEH stub. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-08 14:14:00 -08:00
Paul Mackerras	1bd79336a4	powerpc: Fix various syscall/signal/swapcontext bugs A careful reading of the recent changes to the system call entry/exit paths revealed several problems, plus some things that could be simplified and improved: * 32-bit wasn't testing the _TIF_NOERROR bit in the syscall fast exit path, so it was only doing anything with it once it saw some other bit being set. In other words, the noerror behaviour would apply to the next system call where we had to reschedule or deliver a signal, which is not necessarily the current system call. * 32-bit wasn't doing the call to ptrace_notify in the syscall exit path when the _TIF_SINGLESTEP bit was set. * _TIF_RESTOREALL was in both _TIF_USER_WORK_MASK and _TIF_PERSYSCALL_MASK, which is odd since _TIF_RESTOREALL is only set by system calls. I took it out of _TIF_USER_WORK_MASK. * On 64-bit, _TIF_RESTOREALL wasn't causing the non-volatile registers to be restored (unless perhaps a signal was delivered or the syscall was traced or single-stepped). Thus the non-volatile registers weren't restored on exit from a signal handler. We probably got away with it mostly because signal handlers written in C wouldn't alter the non-volatile registers. * On 32-bit I simplified the code and made it more like 64-bit by making the syscall exit path jump to ret_from_except to handle preemption and signal delivery. * 32-bit was calling do_signal unnecessarily when _TIF_RESTOREALL was set - but I think because of that 32-bit was actually restoring the non-volatile registers on exit from a signal handler. * I changed the order of enabling interrupts and saving the non-volatile registers before calling do_syscall_trace_leave; now we enable interrupts first. Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-08 13:24:22 +11:00
David Gibson	141aa59b53	[PATCH] powerpc: Fix incorrect pud_ERROR() message The powerpc pud_ERROR() function misleadingly prints a message indicating a pmd error. This patch fixes it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-03 22:00:52 +11:00
Benjamin Herrenschmidt	aa5cb02143	[PATCH] powerpc: Expose SMT and L1 icache snoop userland features This patch makes userland aware of the icache snoop capability of the POWER5 (and possibly others in the future) and of SMT capabilities. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-03 22:00:23 +11:00
Greg KH	e5cef95d58	[PATCH] fix build breakage in eeh.c in 2.6.16-rc5-git5 This patch should fix a problem with eeh_add_device_late() not being defined in the ppc64 build process, causing the build to break. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-01 13:53:02 -08:00
Paul Mackerras	6749c55073	Merge ../powerpc-merge	2006-02-28 16:35:24 +11:00
John Rose	827c1a6c1a	[PATCH] powerpc: fix dynamic PCI probe regression Some hotplug driver functions were migrated to the kernel for use by EEH in commit `2bf6a8fa21`. Previously, the PCI Hotplug module had been changed to use the new OFDT-based PCI probe when appropriate: `5fa80fcdca` When rpaphp_pci_config_slot() was moved from the rpaphp driver to the new kernel function pcibios_add_pci_devices(), the OFDT-based probe stuff was dropped. This patch restores it. Signed-off-by: John Rose <johnrose@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-02-28 16:25:54 +11:00
Nick Piggin	f055affb89	[PATCH] powerpc: native atomic_add_unless Do atomic_add_unless natively instead of using cmpxchg. Improved register allocation idea from Joel Schopp. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-02-24 14:06:02 +11:00
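The semantics of atomic_add_unless can be shown with a portable C11 sketch. Note that this is the generic compare-and-exchange formulation, i.e. the style the commit above moves away from, not the native powerpc sequence it introduces; `toy_atomic_add_unless()` is a hypothetical name and the boolean return convention is an assumption for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Add 'a' to *v unless *v currently equals 'u'.
 * Returns true if the addition was performed, false if *v was 'u'.
 */
static bool toy_atomic_add_unless(atomic_int *v, int a, int u)
{
    int old = atomic_load(v);

    while (old != u) {
        /* On failure, 'old' is refreshed with the current value of *v. */
        if (atomic_compare_exchange_weak(v, &old, old + a))
            return true;
    }
    return false;
}
```

A typical use is taking a reference only while a counter is still non-zero: `toy_atomic_add_unless(&refcount, 1, 0)`.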
Nick Piggin	4f629d7db3	[PATCH] powerpc: newline for ISYNC_ON_SMP Add a newline at the end of the ISYNC_ON_SMP string. Needed for a subsequent patch. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-02-24 14:06:00 +11:00
David Gibson	20f4eb3e50	[PATCH] powerpc: Fixup for STRICT_MM_TYPECHECKS Currently ARCH=powerpc will not compile when STRICT_MM_TYPECHECKS is turned on and CONFIG_64K_PAGES is turned off. This corrects the problem. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-02-24 14:05:58 +11:00
Paul Mackerras	c6622f63db	powerpc: Implement accurate task and CPU time accounting This implements accurate task and cpu time accounting for 64-bit powerpc kernels. Instead of accounting a whole jiffy of time to a task on a timer interrupt because that task happened to be running at the time, we now account time in units of timebase ticks according to the actual time spent by the task in user mode and kernel mode. We also count the time spent processing hardware and software interrupts accurately. This is conditional on CONFIG_VIRT_CPU_ACCOUNTING. If that is not set, we do tick-based approximate accounting as before. To get this accurate information, we read either the PURR (processor utilization of resources register) on POWER5 machines, or the timebase on other machines, on: each entry to the kernel from usermode; each exit to usermode; transitions between process context, hard irq context and soft irq context in kernel mode; and context switches. On POWER5 systems with shared-processor logical partitioning we also read both the PURR and the timebase at each timer interrupt and context switch in order to determine how much time has been taken by the hypervisor to run other partitions ("steal" time). Unfortunately, since we need values of the PURR on both threads at the same time to accurately calculate the steal time, and since we can only calculate steal time on a per-core basis, the apportioning of the steal time between idle time (time which we ceded to the hypervisor in the idle loop) and actual stolen time is somewhat approximate at the moment. This is all based quite heavily on what s390 does, and it uses the generic interfaces that were added by the s390 developers, i.e. account_system_time(), account_user_time(), etc. This patch doesn't add any new interfaces between the kernel and userspace, and doesn't change the units in which time is reported to userspace by things such as /proc/stat, /proc/<pid>/stat, getrusage(), times(), etc.
Internally the various task and cpu times are stored in timebase units, but they are converted to USER_HZ units (1/100th of a second) when reported to userspace. Some precision is therefore lost but there should not be any accumulating error, since the internal accumulation is at full precision. Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-02-24 14:05:56 +11:00
Paul Mackerras	a00428f5b1	Merge ../powerpc-merge	2006-02-24 14:05:47 +11:00
Anton Blanchard	cb2c9b2741	[PATCH] powerpc: Fix runlatch performance issues The runlatch SPR can take a lot of time to write. My original runlatch code would set it on every exception entry even though most of the time this was not required. It would also continually set it in the idle loop, which is an issue on an SMT capable processor. Now we cache the runlatch value in a threadinfo bit, and only check for it in decrementer and hardware interrupt exceptions as well as the idle loop. Boot on POWER3, POWER5 and iseries, and compile tested on pmac32. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-02-24 11:36:31 +11:00
Kumar Gala	1775dbbcd0	[PATCH] powerpc: Enable coherency for all pages on 83xx to fix PCI data corruption On the 83xx platform to ensure the PCI inbound memory is handled properly we have to turn on coherency for all pages in the MMU. Otherwise we see corruption if inbound "prefetching/streaming" is enabled on the PCI controller. Signed-off-by: Randy Vinson <rvinson@mvista.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-02-24 11:36:25 +11:00
Michael Ellerman	337a7128db	[PATCH] powerpc: Only calculate htab_size in one place for kexec For kexec we need to know the size of the MMU hash table. Currently we calculate the size once in the htab code, and then twice more in the kexec code, once using htab_hash_mask and once using ppc64_pft_size. On some machines the ppc64_pft_size calculation is broken because ppc64_pft_size is not set. So we need to fix the second calculation, but better still we should just calculate the size once and use it everywhere else. Tested on Power5 LPAR, Power4 non-LPAR and Power3. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-02-24 11:36:18 +11:00
David Gibson	200a4552af	[PATCH] powerpc: Fix accidentally-working typo in __pud_free_tlb One of the parameters to the __pud_free_tlb() macro for powerpc is incorrect (see patch). We get away with it by accident, because in the one place the macro is called, the second parameter is a variable named "pud". Signed-off-by: David Gibson <dwg@au1.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-02-17 13:59:27 -08:00
Michael S. Tsirkin	5f6164f309	[PATCH] add asm-generic/mman.h Make new MADV_REMOVE, MADV_DONTFORK, MADV_DOFORK consistent across all arches. The idea is to make it possible to use them portably even before distros include them in libc headers. Move common flags to asm-generic/mman.h Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Cc: Roland Dreier <rolandd@cisco.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-02-15 15:32:22 -08:00
Michael S. Tsirkin	f822566165	[PATCH] madvise MADV_DONTFORK/MADV_DOFORK Currently, copy-on-write may change the physical address of a page even if the user requested that the page is pinned in memory (either by mlock or by get_user_pages). This happens if the process forks meanwhile, and the parent writes to that page. As a result, the page is orphaned: in case of get_user_pages, the application will never see any data hardware DMA's into this page after the COW. In case of mlock'd memory, the parent is not getting the realtime/security benefits of mlock. In particular, this affects the Infiniband modules which do DMA from and into user pages all the time. This patch adds madvise options to control whether memory range is inherited across fork. Useful e.g. for when hardware is doing DMA from/into these pages. Could also be useful to an application wanting to speed up its forks by cutting large areas out of consideration. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Acked-by: Hugh Dickins <hugh@veritas.com> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-02-14 16:09:34 -08:00
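
As an illustration of the madvise options described in the last entry above, here is a minimal userspace sketch, assuming a Linux system whose libc headers expose MADV_DONTFORK and MADV_DOFORK. The helper names alloc_dma_buffer() and free_dma_buffer() are hypothetical and do not come from the patch; they only show how an application might exclude a buffer from inheritance across fork() so that copy-on-write cannot move it while hardware is doing DMA into it.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the patch): map `len` bytes of anonymous
 * memory and mark the range so a child created by fork() does not inherit
 * it. Returns the mapping, or NULL on failure.
 */
void *alloc_dma_buffer(size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* Ask the kernel not to make this range available to children. */
    if (madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}

/*
 * Hypothetical helper: restore the default fork behaviour for the range
 * with MADV_DOFORK, then unmap it. Returns 0 on success, -1 on failure.
 */
int free_dma_buffer(void *buf, size_t len)
{
    if (madvise(buf, len, MADV_DOFORK) != 0)
        return -1;
    return munmap(buf, len);
}
```

A caller would typically allocate a page-aligned length, for example alloc_dma_buffer(4096), hand the buffer to the device, and call free_dma_buffer() with the same length once the DMA is finished. A child forked in between does not have the range mapped at all, so it must not touch the buffer.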

494 Commits