kernel-ark

Author	SHA1	Message	Date
Christoph Hellwig	92198f7eaa	[PATCH] pass iocb to dio_iodone_t XFS will have to look at iocb->private to fix aio+dio. No other filesystem is using the blockdev_direct_IO* end_io callback. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:19 -07:00
David Howells	3e30148c3d	[PATCH] Keys: Make request-key create an authorisation key The attached patch makes the following changes: (1) There's a new special key type called ".request_key_auth". This is an authorisation key for when one process requests a key and another process is started to construct it. This type of key cannot be created by the user; nor can it be requested by kernel services. Authorisation keys hold two references: (a) Each refers to a key being constructed. When the key being constructed is instantiated the authorisation key is revoked, rendering it of no further use. (b) The "authorising process". This is either: (i) the process that called request_key(), or: (ii) if the process that called request_key() itself had an authorisation key in its session keyring, then the authorising process referred to by that authorisation key will also be referred to by the new authorisation key. This means that the process that initiated a chain of key requests will authorise the lot of them, and will, by default, wind up with the keys obtained from them in its keyrings. (2) request_key() creates an authorisation key which is then passed to /sbin/request-key in as part of a new session keyring. (3) When request_key() is searching for a key to hand back to the caller, if it comes across an authorisation key in the session keyring of the calling process, it will also search the keyrings of the process specified therein and it will use the specified process's credentials (fsuid, fsgid, groups) to do that rather than the calling process's credentials. This allows a process started by /sbin/request-key to find keys belonging to the authorising process. (4) A key can be read, even if the process executing KEYCTL_READ doesn't have direct read or search permission if that key is contained within the keyrings of a process specified by an authorisation key found within the calling process's session keyring, and is searchable using the credentials of the authorising process. This allows a process started by /sbin/request-key to read keys belonging to the authorising process. (5) The magic KEY_SPEC__KEYRING key IDs when passed to KEYCTL_INSTANTIATE or KEYCTL_NEGATE will specify a keyring of the authorising process, rather than the process doing the instantiation. (6) One of the process keyrings can be nominated as the default to which request_key() should attach new keys if not otherwise specified. This is done with KEYCTL_SET_REQKEY_KEYRING and one of the KEY_REQKEY_DEFL_ constants. The current setting can also be read using this call. (7) request_key() is partially interruptible. If it is waiting for another process to finish constructing a key, it can be interrupted. This permits a request-key cycle to be broken without recourse to rebooting. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-Off-By: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:19 -07:00
David Howells	7888e7ff4e	[PATCH] Keys: Pass session keyring to call_usermodehelper() The attached patch makes it possible to pass a session keyring through to the process spawned by call_usermodehelper(). This allows patch 3/3 to pass an authorisation key through to /sbin/request-key, thus permitting better access controls when doing just-in-time key creation. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:18 -07:00
David Howells	76d8aeabfe	[PATCH] keys: Discard key spinlock and use RCU for key payload The attached patch changes the key implementation in a number of ways: (1) It removes the spinlock from the key structure. (2) The key flags are now accessed using atomic bitops instead of write-locking the key spinlock and using C bitwise operators. The three instantiation flags are dealt with with the construction semaphore held during the request_key/instantiate/negate sequence, thus rendering the spinlock superfluous. The key flags are also now bit numbers not bit masks. (3) The key payload is now accessed using RCU. This permits the recursive keyring search algorithm to be simplified greatly since no locks need be taken other than the usual RCU preemption disablement. Searching now does not require any locks or semaphores to be held; merely that the starting keyring be pinned. (4) The keyring payload now includes an RCU head so that it can be disposed of by call_rcu(). This requires that the payload be copied on unlink to prevent introducing races in copy-down vs search-up. (5) The user key payload is now a structure with the data following it. It includes an RCU head like the keyring payload and for the same reason. It also contains a data length because the data length in the key may be changed on another CPU whilst an RCU protected read is in progress on the payload. This would then see the supposed RCU payload and the on-key data length getting out of sync. I'm tempted to drop the key's datalen entirely, except that it's used in conjunction with quota management and so is a little tricky to get rid of. (6) Update the keys documentation. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:18 -07:00
Thomas Graf	d675c989ed	[PKT_SCHED]: Packet classification based on textsearch (ematch) Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 21:00:58 -07:00
Thomas Graf	3fc7e8a6d8	[NET]: skb_find_text() - Find a text pattern in skb data Finds a pattern in the skb data according to the specified textsearch configuration. Use textsearch_next() to retrieve subsequent occurrences of the pattern. Returns the offset to the first occurrence or UINT_MAX if no match was found. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 21:00:17 -07:00
Thomas Graf	677e90eda3	[NET]: Zerocopy sequential reading of skb data Implements sequential reading for both linear and non-linear skb data at zerocopy cost. The data is returned in chunks of arbitary length, therefore random access is not possible. Usage: from := 0 to := 128 state := undef data := undef len := undef consumed := 0 skb_prepare_seq_read(skb, from, to, &state) while (len = skb_seq_read(consumed, &data, &state)) != 0 do /* do something with 'data' of length 'len' / if abort then / abort read if we don't wait for * skb_seq_read() to return 0 / skb_abort_seq_read(&state) return endif / not necessary to consume all of 'len' */ consumed += len done Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:59:51 -07:00
Thomas Graf	6408f79cce	[LIB]: Naive finite state machine based textsearch A finite state machine consists of n states (struct ts_fsm_token) representing the pattern as a finite automation. The data is read sequentially on a octet basis. Every state token specifies the number of recurrences and the type of value accepted which can be either a specific character or ctype based set of characters. The available type of recurrences include 1, (0\|1), [0 n], and [1 n]. The algorithm differs between strict/non-strict mode specyfing whether the pattern has to start at the first octect. Strict mode is enabled by default and can be disabled by inserting TS_FSM_HEAD_IGNORE as the first token in the chain. The runtime performance of the algorithm should be around O(n), however while in strict mode the average runtime can be better. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:59:16 -07:00
Thomas Graf	2de4ff7bd6	[LIB]: Textsearch infrastructure. The textsearch infrastructure provides text searching facitilies for both linear and non-linear data. Individual search algorithms are implemented in modules and chosen by the user. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:49:30 -07:00
Stephen Hemminger	5f8ef48d24	[TCP]: Allow choosing TCP congestion control via sockopt. Allow using setsockopt to set TCP congestion control to use on a per socket basis. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:37:36 -07:00
Stephen Hemminger	51b0bdedb8	[NET]: Separate two usages of netdev_max_backlog. Separate out the two uses of netdev_max_backlog. One controls the upper bound on packets processed per softirq, the new name for this is netdev_budget; the other controls the limit on packets queued via netif_rx. Increase the max_backlog default to account for faster processors. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:14:40 -07:00
Stephen Hemminger	31aa02c53c	[NET]: Eliminate netif_rx massive packet drops. Eliminate the throttling behaviour when the netif receive queue fills because it behaves badly when using high speed networks under load. The throttling cause multiple packet drops that cause TCP to go into slow start mode. The same effective patch has been part of BIC TCP and H-TCP as well as part of Web100. The existing code drops 100's of packets when the queue fills; this changes it to individual packet drop-tail. Signed-off-by: Stephen Hemmminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:12:48 -07:00
Stephen Hemminger	34008d8c63	[NET]: Remove obsolete netif_rx congestion sensing mechanism. Remove the congestion sensing mechanism from netif_rx, and always return either full or empty. Almost no driver checks the return value from netif_rx, and those that do only use it for debug messages. The original design of netif_rx was to do flow control based on the receive queue, but NAPI has supplanted this and no driver uses the feedback. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:10:00 -07:00
Stephen Hemminger	c1ebcdb8c4	[NET]: Remove obsolete fastroute stats. Remove last vestiages of fastroute code that is no longer used. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:08:59 -07:00
Stephen Hemminger	056ede6cfa	[TCP]: Report congestion control algorithm in tcp_diag. Enhancement to the tcp_diag interface used by the iproute2 ss command to report the tcp congestion control being used by a socket. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:21:28 -07:00
Stephen Hemminger	317a76f9a4	[TCP]: Add pluggable congestion control algorithm infrastructure. Allow TCP to have multiple pluggable congestion control algorithms. Algorithms are defined by a set of operations and can be built in or modules. The legacy "new RENO" algorithm is used as a starting point and fallback. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:19:55 -07:00
Adrian Bunk	4749f32da9	[PATCH] better USB_MON dependencies This makes the USB_MON less confusing. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 10:04:15 -07:00
Alexey Dobriyan	bfb07599da	[PATCH] Introduce tty_unregister_ldisc() It's a bit strange to see tty_register_ldisc call in modules' exit functions. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:35 -07:00
Benjamin LaHaise	c43dc2fd88	[PATCH] aio: make wait_queue ->task ->private In the upcoming aio_down patch, it is useful to store a private data pointer in the kiocb's wait_queue. Since we provide our own wake up function and do not require the task_struct pointer, it makes sense to convert the task pointer into a generic private pointer. Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:34 -07:00
Christoph Hellwig	9a59f452ab	[PATCH] remove <linux/xattr_acl.h> This file duplicates <linux/posix_acl_xattr.h>, using slightly different names. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:33 -07:00
Christoph Hellwig	f9fd27a253	[PATCH] acl endianess annotations Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:33 -07:00
Christoph Lameter	45778ca819	[PATCH] Remove f_error field from struct file The following patch removes the f_error field and all checks of f_error. Trond said: f_error was introduced for NFS, and made sense when we were guaranteed always to have a file pointer around when write errors occurred. Since then, we have (for various reasons) had to introduce the nfs_open_context in order to track the file read/write state, and it made sense to move our f_error tracking there too. Signed-off-by: Christoph Lameter <christoph@lameter.com> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:33 -07:00
Arnd Bergmann	bb93e3a52f	[PATCH] block: add unlocked_ioctl support for block devices This patch allows block device drivers to convert their ioctl functions to unlocked_ioctl() like character devices and other subsystems. All functions that were called with the BKL held before are still used that way, but I would not be surprised if it could be removed from the ioctl functions in drivers/block/ioctl.c themselves. As a side note, I found that compat_blkdev_ioctl() acquires the BKL as well, which looks like a bug. I have checked that every user of disk->fops->compat_ioctl() in the current git tree gets the BKL itself, so it could easily be removed from compat_blkdev_ioctl(). Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:32 -07:00
Peter Osterlund	46c271bedd	[PATCH] Improve CD/DVD packet driver write performance This patch improves write performance for the CD/DVD packet writing driver. The logic for switching between reading and writing has been changed so that streaming writes are no longer interrupted by read requests. Signed-off-by: Peter Osterlund <petero2@telia.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:30 -07:00
Yoav Zach	ef3daeda7b	[PATCH] Don't force O_LARGEFILE for 32 bit processes on ia64 In ia64 kernel, the O_LARGEFILE flag is forced when opening a file. This is problematic for execution of 32 bit processes, which are not largefile aware, either by SW emulation or by HW execution. For such processes, the problem is two-fold: 1) When trying to open a file that is larger than 4G the operation should fail, but it's not 2) Writing to offset larger than 4G should fail, but it's not The proposed patch takes advantage of the way 32 bit processes are identified in ia64 systems. Such processes have PER_LINUX32 for their personality. With the patch, the ia64 kernel will not enforce the O_LARGEFILE flag if the current process has PER_LINUX32 set. The behavior for all other architectures remains unchanged. Signed-off-by: Yoav Zach <yoav.zach@intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:28 -07:00
Alan Cox	d6e7114481	[PATCH] setuid core dump Add a new `suid_dumpable' sysctl: This value can be used to query and set the core dump mode for setuid or otherwise protected/tainted binaries. The modes are 0 - (default) - traditional behaviour. Any process which has changed privilege levels or is execute only will not be dumped 1 - (debug) - all processes dump core when possible. The core dump is owned by the current user and no security is applied. This is intended for system debugging situations only. Ptrace is unchecked. 2 - (suidsafe) - any binary which normally would not be dumped is dumped readable by root only. This allows the end user to remove such a dump but not access it directly. For security reasons core dumps in this mode will not overwrite one another or other files. This mode is appropriate when adminstrators are attempting to debug problems in a normal environment. (akpm: > > +EXPORT_SYMBOL(suid_dumpable); > > EXPORT_SYMBOL_GPL? No problem to me. > > if (current->euid == current->uid && current->egid == current->gid) > > current->mm->dumpable = 1; > > Should this be SUID_DUMP_USER? Actually the feedback I had from last time was that the SUID_ defines should go because its clearer to follow the numbers. They can go everywhere (and there are lots of places where dumpable is tested/used as a bool in untouched code) > Maybe this should be renamed to `dump_policy' or something. Doing that > would help us catch any code which isn't using the #defines, too. Fair comment. The patch was designed to be easy to maintain for Red Hat rather than for merging. Changing that field would create a gigantic diff because it is used all over the place. ) Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:26 -07:00
Prasanna S Panchamukhi	ea32c65cc2	[PATCH] kprobes: Temporary disarming of reentrant probe In situations where a kprobes handler calls a routine which has a probe on it, then kprobes_handler() disarms the new probe forever. This patch removes the above limitation by temporarily disarming the new probe. When the another probe hits while handling the old probe, the kprobes_handler() saves previous kprobes state and handles the new probe without calling the new kprobes registered handlers. kprobe_post_handler() restores back the previous kprobes state and the normal execution continues. However on x86_64 architecture, re-rentrancy is provided only through pre_handler(). If a routine having probe is referenced through post_handler(), then the probes on that routine are disarmed forever, since the exception stack is gets changed after the processor single steps the instruction of the new probe. This patch includes generic changes to support temporary disarming on reentrancy of probes. Signed-of-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:24 -07:00
Hien Nguyen	0aa55e4d7d	[PATCH] kprobes: moves lock-unlock to non-arch kprobe_flush_task This patch moves the lock/unlock of the arch specific kprobe_flush_task() to the non-arch specific kprobe_flusk_task(). Signed-off-by: Hien Nguyen <hien@us.ibm.com> Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:21 -07:00
Rusty Lynch	7e1048b11c	[PATCH] Move kprobe [dis]arming into arch specific code The architecture independent code of the current kprobes implementation is arming and disarming kprobes at registration time. The problem is that the code is assuming that arming and disarming is a just done by a simple write of some magic value to an address. This is problematic for ia64 where our instructions look more like structures, and we can not insert break points by just doing something like: p->addr = BREAKPOINT_INSTRUCTION; The following patch to 2.6.12-rc4-mm2 adds two new architecture dependent functions: void arch_arm_kprobe(struct kprobe p) void arch_disarm_kprobe(struct kprobe *p) and then adds the new functions for each of the architectures that already implement kprobes (spar64/ppc64/i386/x86_64). I thought arch_[dis]arm_kprobe was the most descriptive of what was really happening, but each of the architectures already had a disarm_kprobe() function that was really a "disarm and do some other clean-up items as needed when you stumble across a recursive kprobe." So... I took the liberty of changing the code that was calling disarm_kprobe() to call arch_disarm_kprobe(), and then do the cleanup in the block of code dealing with the recursive kprobe case. So far this patch as been tested on i386, x86_64, and ppc64, but still needs to be tested in sparc64. Signed-off-by: Rusty Lynch <rusty.lynch@intel.com> Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:21 -07:00
Hien Nguyen	b94cce926b	[PATCH] kprobes: function-return probes This patch adds function-return probes to kprobes for the i386 architecture. This enables you to establish a handler to be run when a function returns. 1. API Two new functions are added to kprobes: int register_kretprobe(struct kretprobe rp); void unregister_kretprobe(struct kretprobe rp); 2. Registration and unregistration 2.1 Register To register a function-return probe, the user populates the following fields in a kretprobe object and calls register_kretprobe() with the kretprobe address as an argument: kp.addr - the function's address handler - this function is run after the ret instruction executes, but before control returns to the return address in the caller. maxactive - The maximum number of instances of the probed function that can be active concurrently. For example, if the function is non- recursive and is called with a spinlock or mutex held, maxactive = 1 should be enough. If the function is non-recursive and can never relinquish the CPU (e.g., via a semaphore or preemption), NR_CPUS should be enough. maxactive is used to determine how many kretprobe_instance objects to allocate for this particular probed function. If maxactive <= 0, it is set to a default value (if CONFIG_PREEMPT maxactive=max(10, 2 * NR_CPUS) else maxactive=NR_CPUS) For example: struct kretprobe rp; rp.kp.addr = /* entrypoint address / rp.handler = /return probe handler / rp.maxactive = / e.g., 1 or NR_CPUS or 0, see the above explanation */ register_kretprobe(&rp); The following field may also be of interest: nmissed - Initialized to zero when the function-return probe is registered, and incremented every time the probed function is entered but there is no kretprobe_instance object available for establishing the function-return probe (i.e., because maxactive was set too low). 2.2 Unregister To unregiter a function-return probe, the user calls unregister_kretprobe() with the same kretprobe object as registered previously. If a probed function is running when the return probe is unregistered, the function will return as expected, but the handler won't be run. 3. Limitations 3.1 This patch supports only the i386 architecture, but patches for x86_64 and ppc64 are anticipated soon. 3.2 Return probes operates by replacing the return address in the stack (or in a known register, such as the lr register for ppc). This may cause __builtin_return_address(0), when invoked from the return-probed function, to return the address of the return-probes trampoline. 3.3 This implementation uses the "Multiprobes at an address" feature in 2.6.12-rc3-mm3. 3.4 Due to a limitation in multi-probes, you cannot currently establish a return probe and a jprobe on the same function. A patch to remove this limitation is being tested. This feature is required by SystemTap (http://sourceware.org/systemtap), and reflects ideas contributed by several SystemTap developers, including Will Cohen and Ananth Mavinakayanahalli. Signed-off-by: Hien Nguyen <hien@us.ibm.com> Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Frederik Deweerdt <frederik.deweerdt@laposte.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:21 -07:00
Christoph Hellwig	84de856ed3	[PATCH] quota: consolidate code surrounding vfs_quota_on_mount Move some code duplicated in both callers into vfs_quota_on_mount Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:20 -07:00
Neil Horman	ac20427ef6	[PATCH] add check to /proc/devices read routines Patch to add check to get_chrdev_list and get_blkdev_list to prevent reads of /proc/devices from spilling over the provided page if more than 4096 bytes of string data are generated from all the registered character and block devices in a system Signed-off-by: Neil Horman <nhorman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:19 -07:00
Nick Piggin	35a82d1a53	[PATCH] optimise loop driver a bit Looks like locking can be optimised quite a lot. Increase lock widths slightly so lo_lock is taken fewer times per request. Also it was quite trivial to cover lo_pending with that lock, and remove the atomic requirement. This also makes memory ordering explicitly correct, which is nice (not that I particularly saw any mem ordering bugs). Test was reading 4 250MB files in parallel on ext2-on-tmpfs filesystem (1K block size, 4K page size). System is 2 socket Xeon with HT (4 thread). intel:/home/npiggin# umount /dev/loop0 ; mount /dev/loop0 /mnt/loop ; /usr/bin/time ./mtloop.sh Before: 0.24user 5.51system 0:02.84elapsed 202%CPU (0avgtext+0avgdata 0maxresident)k 0.19user 5.52system 0:02.88elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k 0.19user 5.57system 0:02.89elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k 0.22user 5.51system 0:02.90elapsed 197%CPU (0avgtext+0avgdata 0maxresident)k 0.19user 5.44system 0:02.91elapsed 193%CPU (0avgtext+0avgdata 0maxresident)k After: 0.07user 2.34system 0:01.68elapsed 143%CPU (0avgtext+0avgdata 0maxresident)k 0.06user 2.37system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k 0.06user 2.39system 0:01.68elapsed 145%CPU (0avgtext+0avgdata 0maxresident)k 0.06user 2.36system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k 0.06user 2.42system 0:01.68elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:18 -07:00
Paulo Marques	543537bd92	[PATCH] create a kstrdup library function This patch creates a new kstrdup library function and changes the "local" implementations in several places to use this function. Most of the changes come from the sound and net subsystems. The sound part had already been acknowledged by Takashi Iwai and the net part by David S. Miller. I left UML alone for now because I would need more time to read the code carefully before making changes there. Signed-off-by: Paulo Marques <pmarques@grupopie.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:18 -07:00
Alexander Viro	991114c6fa	[PATCH] fix for prune_icache()/forced final iput() races Based on analysis and a patch from Russ Weight <rweight@us.ibm.com> There is a race condition that can occur if an inode is allocated and then released (using iput) during the ->fill_super functions. The race condition is between kswapd and mount. For most filesystems this can only happen in an error path when kswapd is running concurrently. For isofs, however, the error can occur in a more common code path (which is how the bug was found). The logic here is "we want final iput() to free inode now instead of letting it sit in cache if fs is going down or had not quite come up". The problem is with kswapd seeing such inodes in the middle of being killed and happily taking over. The clean solution would be to tell kswapd to leave those inodes alone and let our final iput deal with them. I.e. add a new flag (I_FORCED_FREEING), set it before write_inode_now() there and make prune_icache() leave those alone. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:17 -07:00
Oleg Nesterov	fd450b7318	[PATCH] timers: introduce try_to_del_timer_sync() This patch splits del_timer_sync() into 2 functions. The new one, try_to_del_timer_sync(), returns -1 when it hits executing timer. It can be used in interrupt context, or when the caller hold locks which can prevent completion of the timer's handler. NOTE. Currently it can't be used in interrupt context in UP case, because ->running_timer is used only with CONFIG_SMP. Should the need arise, it is possible to kill #ifdef CONFIG_SMP in set_running_timer(), it is cheap. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:16 -07:00
Oleg Nesterov	55c888d6d0	[PATCH] timers fixes/improvements This patch tries to solve following problems: 1. del_timer_sync() is racy. The timer can be fired again after del_timer_sync have checked all cpus and before it will recheck timer_pending(). 2. It has scalability problems. All cpus are scanned to determine if the timer is running on that cpu. With this patch del_timer_sync is O(1) and no slower than plain del_timer(pending_timer), unless it has to actually wait for completion of the currently running timer. The only restriction is that the recurring timer should not use add_timer_on(). 3. The timers are not serialized wrt to itself. If CPU_0 does mod_timer(jiffies+1) while the timer is currently running on CPU 1, it is quite possible that local interrupt on CPU_0 will start that timer before it finished on CPU_1. 4. The timers locking is suboptimal. __mod_timer() takes 3 locks at once and still requires wmb() in del_timer/run_timers. The new implementation takes 2 locks sequentially and does not need memory barriers. Currently ->base != NULL means that the timer is pending. In that case ->base.lock is used to lock the timer. __mod_timer also takes timer->lock because ->base can be == NULL. This patch uses timer->entry.next != NULL as indication that the timer is pending. So it does __list_del(), entry->next = NULL instead of list_del() when the timer is deleted. The ->base field is used for hashed locking only, it is initialized in init_timer() which sets ->base = per_cpu(tvec_bases). When the tvec_bases.lock is locked, it means that all timers which are tied to this base via timer->base are locked, and the base itself is locked too. So __run_timers/migrate_timers can safely modify all timers which could be found on ->tvX lists (pending timers). When the timer's base is locked, and the timer removed from ->entry list (which means that _run_timers/migrate_timers can't see this timer), it is possible to set timer->base = NULL and drop the lock: the timer remains locked. This patch adds lock_timer_base() helper, which waits for ->base != NULL, locks the ->base, and checks it is still the same. __mod_timer() schedules the timer on the local CPU and changes it's base. However, it does not lock both old and new bases at once. It locks the timer via lock_timer_base(), deletes the timer, sets ->base = NULL, and unlocks old base. Then __mod_timer() locks new_base, sets ->base = new_base, and adds this timer. This simplifies the code, because AB-BA deadlock is not possible. __mod_timer() also ensures that the timer's base is not changed while the timer's handler is running on the old base. __run_timers(), del_timer() do not change ->base anymore, they only clear pending flag. So del_timer_sync() can test timer->base->running_timer == timer to detect whether it is running or not. We don't need timer_list->lock anymore, this patch kills it. We also don't need barriers. del_timer() and __run_timers() used smp_wmb() before clearing timer's pending flag. It was needed because __mod_timer() did not lock old_base if the timer is not pending, so __mod_timer()->list_add() could race with del_timer()->list_del(). With this patch these functions are serialized through base->lock. One problem. TIMER_INITIALIZER can't use per_cpu(tvec_bases). So this patch adds global struct timer_base_s { spinlock_t lock; struct timer_list *running_timer; } __init_timer_base; which is used by TIMER_INITIALIZER. The corresponding fields in tvec_t_base_s struct are replaced by struct timer_base_s t_base. It is indeed ugly. But this can't have scalability problems. The global __init_timer_base.lock is used only when __mod_timer() is called for the first time AND the timer was compile time initialized. After that the timer migrates to the local CPU. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Renaud Lienhart <renaud.lienhart@free.fr> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:16 -07:00
Tejun Heo	f7d37d028d	[PATCH] blk: remove BLK_TAGS_{PER_LONG\|MASK} Replace BLK_TAGS_PER_LONG with BITS_PER_LONG and remove unused BLK_TAGS_MASK. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:15 -07:00
Tejun Heo	fa72b903f7	[PATCH] blk: remove blk_queue_tag->real_max_depth optimization blk_queue_tag->real_max_depth was used to optimize out unnecessary allocations/frees on tag resize. However, the whole thing was very broken - tag_map was never allocated to real_max_depth resulting in access beyond the end of the map, bits in [max_depth..real_max_depth] were set when initializing a map and copied when resizing resulting in pre-occupied tags. As the gain of the optimization is very small, well, almost nill, remove the whole thing. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:15 -07:00
Christoph Lameter	1946089a10	[PATCH] NUMA aware block device control structure allocation Patch to allocate the control structures for for ide devices on the node of the device itself (for NUMA systems). The patch depends on the Slab API change patch by Manfred and me (in mm) and the pcidev_to_node patch that I posted today. Does some realignment too. Signed-off-by: Justin M. Forbes <jmforbes@linuxtx.org> Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Pravin Shelar <pravin@calsoftinc.com> Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:09 -07:00
Andy Whitcroft	29751f6991	[PATCH] sparsemem hotplug base Make sparse's initalization be accessible at runtime. This allows sparse mappings to be created after boot in a hotplug situation. This patch is separated from the previous one just to give an indication how much of the sparse infrastructure is just for hotplug memory. The section_mem_map doesn't really store a pointer. It stores something that is convenient to do some math against to get a pointer. It isn't valid to just do section_mem_map, so I don't think it should be stored as a pointer. There are a couple of things I'd like to store about a section. First of all, the fact that it is !NULL does not mean that it is present. There could be such a combination where section_mem_map is* NULL, but the math gets you properly to a real mem_map. So, I don't think that check is safe. Since we're storing 32-bit-aligned structures, we have a few bits in the bottom of the pointer to play with. Use one bit to encode whether there's really a mem_map there, and the other one to tell whether there's a valid section there. We need to distinguish between the two because sometimes there's a gap between when a section is discovered to be present and when we can get the mem_map for it. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:05 -07:00
Andy Whitcroft	641c767389	[PATCH] sparsemem swiss cheese numa layouts The part of the sparsemem patch which modifies memmap_init_zone() has recently become a problem. It changes behavior so that there is a call to pfn_to_page() for each individual page inside of a node's range: node_start_pfn through node_end_pfn. It used to simply do this once, at the beginning of the node, but having sparsemem's non-contiguous mem_map[]s inside of a node made it necessary to change. Mike Kravetz recently wrote a patch which made the NUMA code accept some new kinds of layouts. The system's memory was laid out like this, with node 0's memory in two pieces: one before and one after node 1's memory: Node 0: +++++ +++++ Node 1: +++++ Previous behavior before Mike's patch was to assign nodes like this: Node 0: 00000 XXXXX Node 1: 11111 Where the 'X' areas were simply thrown away. The new behavior was to make the pg_data_t span node 0 across all of its areas, including areas that are really node 1's: Node 0: 000000000000000 Node 1: 11111 This wastes a little bit of mem_map space, but ends up being OK, and more fully utilizes the system's memory. memmap_init_zone() initializes all of the "struct page"s for node 0, even for the "hole", but those never get used, because there is no pfn_to_page() that resolves to those pages. However, only calling pfn_to_page() once, memmap_init_zone() always uses the pages that were allocated for node0->node_mem_map because: struct page *start = pfn_to_page(start_pfn); // effectively start = &node->node_mem_map[0] for (page = start; page < (start + size); page++) { init_page_here();... page++; } Slow, and wasteful, but generally harmless. But, modify that to call pfn_to_page() for each loop iteration (like sparsemem does): for (pfn = start_pfn; pfn < < (start_pfn + size); pfn++++) { page = pfn_to_page(pfn); } And you end up trying to initialize node 1's pages too early, along with bogus data from node 0. This patch checks for those weird layouts and declines to touch the pages, making the more frequent pfn_to_page() calls OK to do. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:05 -07:00
Andy Whitcroft	d41dee369b	[PATCH] sparsemem memory model Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of mem_map[] is needed by discontiguous memory machines (like in the old CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually become a complete replacement. A significant advantage over DISCONTIGMEM is that it's completely separated from CONFIG_NUMA. When producing this patch, it became apparent in that NUMA and DISCONTIG are often confused. Another advantage is that sparse doesn't require each NUMA node's ranges to be contiguous. It can handle overlapping ranges between nodes with no problems, where DISCONTIGMEM currently throws away that memory. Sparsemem uses an array to provide different pfn_to_page() translations for each SECTION_SIZE area of physical memory. This is what allows the mem_map[] to be chopped up. In order to do quick pfn_to_page() operations, the section number of the page is encoded in page->flags. Part of the sparsemem infrastructure enables sharing of these bits more dynamically (at compile-time) between the page_zone() and sparsemem operations. However, on 32-bit architectures, the number of bits is quite limited, and may require growing the size of the page->flags type in certain conditions. Several things might force this to occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of memory), an increase in the physical address space, or an increase in the number of used page->flags. One thing to note is that, once sparsemem is present, the NUMA node information no longer needs to be stored in the page->flags. It might provide speed increases on certain platforms and will be stored there if there is room. But, if out of room, an alternate (theoretically slower) mechanism is used. This patch introduces CONFIG_FLATMEM. It is used in almost all cases where there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM often have to compile out the same areas of code. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:04 -07:00
Andy Whitcroft	b159d43fbf	[PATCH] generify early_pfn_to_nid Provide a default implementation for early_pfn_to_nid returning node 0. Allow architectures to override this with their own implementation out of asm/mmzone.h. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:04 -07:00
Dave Hansen	93b7504e3e	[PATCH] Introduce new Kconfig option for NUMA or DISCONTIG There is some confusion that arose when working on SPARSEMEM patch between what is needed for DISCONTIG vs. NUMA. Multiple pg_data_t's are needed for DISCONTIGMEM or NUMA, independently. All of the current NUMA implementations require an implementation of DISCONTIG. Because of this, quite a lot of code which is really needed for NUMA is actually under DISCONTIG #ifdefs. For SPARSEMEM, we changed some of these #ifdefs to CONFIG_NUMA, but that broke the DISCONTIG=y and NUMA=n case. Introducing this new NEED_MULTIPLE_NODES config option allows code that is needed for both NUMA or DISCONTIG to be separated out from code that is specific to DISCONTIG. One great advantage of this approach is that it doesn't require every architecture to be converted over. All of the current implementations should "just work", only the ones implementing SPARSEMEM will have to be fixed up. The change to free_area_init() makes it work inside, or out of the new config option. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:03 -07:00
Dave Hansen	348f8b6c48	[PATCH] sparsemem base: reorganize page->flags bit operations Generify the value fields in the page_flags. The aim is to allow the location and size of these fields to be varied. Additionally we want to move away from fixed allocations per field whilst still enforcing the overall bit utilisation limits. We rely on the compiler to spot and optimise the accessor functions. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:01 -07:00
Dave Hansen	6f167ec721	[PATCH] sparsemem base: simple NUMA remap space allocator Introduce a simple allocator for the NUMA remap space. This space is very scarce, used for structures which are best allocated node local. This mechanism is also used on non-NUMA ia64 systems with a vmem_map to keep the pgdat->node_mem_map initialized in a consistent place for all architectures. Issues: o alloc_remap takes a node_id where we might expect a pgdat which was intended to allow us to allocate the pgdat's using this mechanism; which we do not yet do. Could have alloc_remap_node() and alloc_remap_nid() for this purpose. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:01 -07:00
Dave Hansen	408fde81c1	[PATCH] remove non-DISCONTIG use of pgdat->node_mem_map This patch effectively eliminates direct use of pgdat->node_mem_map outside of the DISCONTIG code. On a flat memory system, these fields aren't currently used, neither are they on a sparsemem system. There was also a node_mem_map(nid) macro on many architectures. Its use along with the use of ->node_mem_map itself was not consistent. It has been removed in favor of two new, more explicit, arch-independent macros: pgdat_page_nr(pgdat, pagenr) nid_page_nr(nid, pagenr) I called them "pgdat" and "nid" because we overload the term "node" to mean "NUMA node", "DISCONTIG node" or "pg_data_t" in very confusing ways. I believe the newer names are much clearer. These macros can be overridden in the sparsemem case with a theoretically slower operation using node_start_pfn and pfn_to_page(), instead. We could make this the only behavior if people want, but I don't want to change too much at once. One thing at a time. This patch removes more code than it adds. Compile tested on alpha, alpha discontig, arm, arm-discontig, i386, i386 generic, NUMAQ, Summit, ppc64, ppc64 discontig, and x86_64. Full list here: http://sr71.net/patches/2.6.12/2.6.12-rc1-mhp2/configs/ Boot tested on NUMAQ, x86 SMP and ppc64 power4/5 LPARs. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin J. Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:00 -07:00
Linus Torvalds	060de20e82	Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	2005-06-22 23:11:50 -07:00
Shaun Pereira	ebc3f64b86	[X25]: Fast select with no restriction on response This patch is a follow up to patch 1 regarding "Selective Sub Address matching with call user data". It allows use of the Fast-Select-Acceptance optional user facility for X.25. This patch just implements fast select with no restriction on response (NRR). What this means (according to ITU-T Recomendation 10/96 section 6.16) is that if in an incoming call packet, the relevant facility bits are set for fast-select-NRR, then the called DTE can issue a direct response to the incoming packet using a call-accepted packet that contains call-user-data. This patch allows such a response. The called DTE can also respond with a clear-request packet that contains call-user-data. However, this feature is currently not implemented by the patch. How is Fast Select Acceptance used? By default, the system does not allow fast select acceptance (as before). To enable a response to fast select acceptance, After a listen socket in created and bound as follows socket(AF_X25, SOCK_SEQPACKET, 0); bind(call_soc, (struct sockaddr *)&locl_addr, sizeof(locl_addr)); but before a listen system call is made, the following ioctl should be used. ioctl(call_soc,SIOCX25CALLACCPTAPPRV); Now the listen system call can be made listen(call_soc, 4); After this, an incoming-call packet will be accepted, but no call-accepted packet will be sent back until the following system call is made on the socket that accepts the call ioctl(vc_soc,SIOCX25SENDCALLACCPT); The network (or cisco xot router used for testing here) will allow the application server's call-user-data in the call-accepted packet, provided the call-request was made with Fast-select NRR. Signed-off-by: Shaun Pereira <spereira@tusc.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:16:17 -07:00
Shaun Pereira	cb65d506c3	[X25]: Selective sub-address matching with call user data. From: Shaun Pereira <spereira@tusc.com.au> This is the first (independent of the second) patch of two that I am working on with x25 on linux (tested with xot on a cisco router). Details are as follows. Current state of module: A server using the current implementation (2.6.11.7) of the x25 module will accept a call request/ incoming call packet at the listening x.25 address, from all callers to that address, as long as NO call user data is present in the packet header. If the server needs to choose to accept a particular call request/ incoming call packet arriving at its listening x25 address, then the kernel has to allow a match of call user data present in the call request packet with its own. This is required when multiple servers listen at the same x25 address and device interface. The kernel currently matches ALL call user data, if present. Current Changes: This patch is a follow up to the patch submitted previously by Andrew Hendry, and allows the user to selectively control the number of octets of call user data in the call request packet, that the kernel will match. By default no call user data is matched, even if call user data is present. To allow call user data matching, a cudmatchlength > 0 has to be passed into the kernel after which the passed number of octets will be matched. Otherwise the kernel behavior is exactly as the original implementation. This patch also ensures that as is normally the case, no call user data will be present in the Call accepted / call connected packet sent back to the caller Future Changes on next patch: There are cases however when call user data may be present in the call accepted packet. According to the X.25 recommendation (ITU-T 10/96) section 5.2.3.2 call user data may be present in the call accepted packet provided the fast select facility is used. My next patch will include this fast select utility and the ability to send up to 128 octets call user data in the call accepted packet provided the fast select facility is used. I am currently testing this, again with xot on linux and cisco. Signed-off-by: Shaun Pereira <spereira@tusc.com.au> (With a fix from Alexey Dobriyan <adobriyan@gmail.com>) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:15:01 -07:00
Jeff Moyer	fbeec2e155	[NETPOLL]: allow multiple netpoll_clients to register against one interface This patch provides support for registering multiple netpoll clients to the same network device. Only one of these clients may register an rx_hook, however. In practice, this restriction has not been problematic. It is worth mentioning, though, that the current design can be easily extended to allow for the registration of multiple rx_hooks. The basic idea of the patch is that the rx_np pointer in the netpoll_info structure points to the struct netpoll that has rx_hook filled in. Aside from this one case, there is no need for a pointer from the struct net_device to an individual struct netpoll. A lock is introduced to protect the setting and clearing of the np_rx pointer. The pointer will only be cleared upon netpoll client module removal, and the lock should be uncontested. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:05:59 -07:00
Jeff Moyer	115c1d6e61	[NETPOLL]: Introduce a netpoll_info struct This patch introduces a netpoll_info structure, which the struct net_device will now point to instead of pointing to a struct netpoll. The reason for this is two-fold: 1) fields such as the rx_flags, poll_owner, and poll_lock should be maintained per net_device, not per netpoll; and 2) this is a first step in providing support for multiple netpoll clients to register against the same net_device. The struct netpoll is now pointed to by the netpoll_info structure. As such, the previous behaviour of the code is preserved. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:05:31 -07:00
Jeff Moyer	6ca4f65e6b	[NETPOLL]: Set poll_owner to -1 before unlocking in netpoll_poll_unlock() This trivial patch moves the assignment of poll_owner to -1 inside of the lock. This fixes a potential SMP race in the code. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:04:55 -07:00
Linus Torvalds	9092131f7e	Merge rsync://client.linux-nfs.org/pub/linux/nfs-2.6	2005-06-22 14:32:15 -07:00
Trond Myklebust	8d0a8a9d0e	[PATCH] NFSv4: Clean up nfs4 lock state accounting Ensure that lock owner structures are not released prematurely. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:42 -04:00
Trond Myklebust	ecdbf769b2	[PATCH] NLM: fix a client-side race on blocking locks. If the lock blocks, the server may send us a GRANTED message that races with the reply to our LOCK request. Make sure that we catch the GRANTED by queueing up our request on the nlm_blocked list before we send off the first LOCK rpc call. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:42 -04:00
Trond Myklebust	3da28eb1c6	[PATCH] NFS: Replace nfs_page insertion sort with a radix sort Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:39 -04:00
Trond Myklebust	c6a556b88a	[PATCH] NFS: Make searching and waiting on busy writeback requests more efficient. Basically copies the VFS's method for tracking writebacks and applies it to the struct nfs_page. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:39 -04:00
Trond Myklebust	fe51beecc5	[PATCH] NFS: Ensure that fstat() always returns the correct mtime Even if the file is open for writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:37 -04:00
Trond Myklebust	7d52e86274	[PATCH] NFS: Cleanup of caching code, and slight optimization of writes. Unless we're doing O_APPEND writes, we really don't care about revalidating the file length. Just make sure that we catch any page cache invalidations. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:37 -04:00
Trond Myklebust	951a143b3f	[PATCH] NFS: Fix the file size revalidation Instead of looking at whether or not the file is open for writes before we accept to update the length using the server value, we should rather be looking at whether or not we are currently caching any writes. Failure to do so means in particular that we're not updating the file length correctly after obtaining a POSIX or BSD lock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:36 -04:00
Trond Myklebust	f0dd2136da	[PATCH] NFS: Clean up readdir changes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:34 -04:00
Olivier Galibert	00a9264227	[PATCH] NFS: Hide NFS server-generated readdir cookies from userland NFSv3 currently returns the unsigned 64-bit cookie directly to userspace. The following patch causes the kernel to generate loff_t offsets for the benefit of userland. The current server-generated READDIR cookie is cached in the nfs_open_context instead of in filp->f_pos, so we still end up work correctly under directory insertions/deletion. Signed-off-by: Olivier Galibert <galibert@pobox.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:33 -04:00
Andreas Gruenbacher	5c6a9f7d92	[PATCH] NFS: Cache the NFSv3 acls. Attach acls to inodes in the icache to avoid unnecessary GETACL RPC round-trips. As long as the client doesn't retrieve any acls itself, only the default acls of exiting directories and the default and access acls of new directories will end up in the cache, which preserves some memory compared to always caching the access and default acl of all files. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:25 -04:00
Andreas Gruenbacher	055ffbea05	[PATCH] NFS: Fix handling of the umask when an NFSv3 default acl is present. NFSv3 has no concept of a umask on the server side: The client applies the umask locally, and sends the effective permissions to the server. This behavior is wrong when files are created in a directory that has a default ACL. In this case, the umask is supposed to be ignored, and only the default ACL determines the file's effective permissions. Usually its the server's task to conditionally apply the umask. But since the server knows nothing about the umask, we have to do it on the client side. This patch tries to fetch the parent directory's default ACL before creating a new file, computes the appropriate create mode to send to the server, and finally sets the new file's access and default acl appropriately. Many thanks to Buck Huppmann <buchk@pobox.com> for sending the initial version of this patch, as well as for arguing why we need this change. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:24 -04:00
Andreas Gruenbacher	b7fa0554cf	[PATCH] NFS: Add support for NFSv3 ACLs This adds acl support fo nfs clients via the NFSACL protocol extension, by implementing the getxattr, listxattr, setxattr, and removexattr iops for the system.posix_acl_access and system.posix_acl_default attributes. This patch implements a dumb version that uses no caching (and thus adds some overhead). (Another patch in this patchset adds caching as well.) Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:24 -04:00
Andreas Gruenbacher	a257cdd0e2	[PATCH] NFSD: Add server support for NFSv3 ACLs. This adds functions for encoding and decoding POSIX ACLs for the NFSACL protocol extension, and the GETACL and SETACL RPCs. The implementation is compatible with NFSACL in Solaris. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:23 -04:00
Andreas Gruenbacher	9ba02638e4	[PATCH] RPC: Allow the sunrpc server to multiplex serveral programs on a single port The NFS and NFSACL programs run on the same RPC transport. This patch adds support for this by converting svc_program into a chained list of programs (server-side). Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:22 -04:00
Andreas Gruenbacher	bd8100e7ed	[PATCH] RPC: Encode and decode arbitrary XDR arrays Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:20 -04:00
Trond Myklebust	7e06b53d79	[PATCH] RPC: fix accounting bug in the case of a truncated RPC message Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:19 -04:00
Olaf Kirch	e053d1ab62	[PATCH] RPC: Lazy RPC receive buffer allocation Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:19 -04:00
Andreas Gruenbacher	007e251f2b	[PATCH] RPC: Allow multiple RPC client programs to share the same transport Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:18 -04:00
J. Bruce Fields	e50a1c2e1f	[PATCH] NFSv4: client-side caching NFSv4 ACLs Add nfs4_acl field to the nfs_inode, and use it to cache acls. Only cache acls of size up to a page. Also prepare for up to a page of acl data even when the user doesn't pass in a buffer, as when they want to get the acl length to decide what size buffer to allocate. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:15 -04:00
J. Bruce Fields	23ec6965c2	[PATCH] NFSv4: Client-side xdr for writing NFSv4 acls Client-side support for NFSv4 acls: xdr encoding and decoding routines for writing acls Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:13 -04:00
J. Bruce Fields	029d105e66	[PATCH] NFSv4: Client-side xdr for reading NFSv4 acls Client-side support for NFSv4 acls: xdr encoding and decoding routines for reading acls Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:12 -04:00
Trond Myklebust	ada70d9425	[PATCH] NFS: Add hooks to allow common NFS attribute code to clear cached acls Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:09 -04:00
J. Bruce Fields	92cfc62cb8	[PATCH] NFS: Allow NFS versions to support different sets of inode operations. ACL support will require supporting additional inode operations in v4 (getxattr, setxattr, listxattr). This patch allows different protocol versions to support different inode operations by adding a file_inode_ops to the nfs_rpc_ops (to match the existing dir_inode_ops). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:09 -04:00
Trond Myklebust	464a98bd70	[PATCH] NFS: cleanup: shrink struct nfs_open_context Remove the wait queue, and replace the functions that depended on it with wait_on_bit(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:08 -04:00
Trond Myklebust	96651ab341	[PATCH] RPC: Shrink struct rpc_task by switching to wait_on_bit() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:07 -04:00
Trond Myklebust	a656db9987	[PATCH] NFS: Remove unused NFS inode field readdir_timestamp. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:07 -04:00
Trond Myklebust	4ce79717ce	[PATCH] NFS: Header file cleanup... - Move NFSv4 state definitions into a private header file. - Clean up gunk in nfs_fs.h Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:06 -04:00
Trond Myklebust	5ee0ed7d3a	[PATCH] RPC: Make rpc_create_client() probe server for RPC program+version support Ensure that we don't create an RPC client without checking that the server does indeed support the RPC program + version that we are trying to set up. This enables us to immediately return an error to "mount" if it turns out that the server is only supporting NFSv2, when we requested NFSv3 or NFSv4. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:04 -04:00
Harald Welte	dd7f0b8092	[NETFILTER]: Fix "iptables -D" rule deletion with ipt_CLUSTERIP target. The patch just changes the order of structure members. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 12:38:33 -07:00
Linus Torvalds	fb7a0e3653	Merge kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6.git Do arch/ia64/defconfig by hand.	2005-06-22 12:22:12 -07:00
Randy Vinson	c124a78d8c	[PATCH] I2C: Add support for Maxim/Dallas DS1374 Real-Time Clock Chip (1/2) Add support for Maxim/Dallas DS1374 Real-Time Clock Chip This change adds support for the Maxim/Dallas DS1374 RTC chip. This chip is an I2C-based RTC that maintains a simple 32-bit binary seconds count with battery backup support. Signed-off-by: Randy Vinson <rvinson@mvista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2005-06-21 21:52:06 -07:00
Jean Delvare	10c08f8100	[PATCH] I2C: rename i2c-sysfs.h to hwmon-sysfs.h This patch renames the new linux/i2c-sysfs.h header file to linux/hwmon-sysfs.h. This names seems to be more appropriate since this file defines macros and structures not related to i2c but to hardware monitoring drivers. The patch also updates the five hardware monitoring driver which include that header file already. Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2005-06-21 21:52:05 -07:00
Sebastian Witt	3886246a25	[PATCH] I2C: i2c-vid.h: Support for VID to reg conversion Adds conversion from VID (mV) to register value. Used by the atxp1 I2C module. Removed uneeded switch case. Signed-off-by: Sebastian Witt <se.witt@gmx.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2005-06-21 21:51:49 -07:00
Jean Delvare	b3d5496ea5	[PATCH] I2C: Kill address ranges in non-sensors i2c chip drivers Some months ago, you killed the address ranges mechanism from all sensors i2c chip drivers (both the module parameters and the in-code address lists). I think it was a very good move, as the ranges can easily be replaced by individual addresses, and this allowed for significant cleanups in the i2c core (let alone the impressive size shrink for all these drivers). Unfortunately you did not do the same for non-sensors i2c chip drivers. These need the address ranges even less, so we could get rid of the ranges here as well for another significant i2c core cleanup. Here comes a patch which does just that. Since the process is exactly the same as what you did for the other drivers set already, I did not split this one in parts. A documentation update is included. The change saves 308 bytes in the i2c core, and an average 1382 bytes for chip drivers which use I2C_CLIENT_INSMOD, 126 bytes for those which do not. This change is required if we want to merge the sensors and non-sensors i2c code (and we want to do this). Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Index: gregkh-2.6/Documentation/i2c/writing-clients ===================================================================	2005-06-21 21:51:48 -07:00
NeilBrown	39730960d9	[PATCH] Two small fixes for md verion-1 superblocks. 1/ Must typecast int to (sector_t) before inverting or we might not invert enough bits. 2/ When "bitmap_offset" was added to mdp_superblock_1, we didn't increase the count of words-used (96 to 100). Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:47 -07:00
NeilBrown	7bfa19f274	[PATCH] md: allow md to update multiple superblocks in parallel. currently, md updates all superblocks (one on each device) in series. It waits for one write to complete before starting the next. This isn't a big problem as superblock updates don't happen that often. However it is neater to do it in parallel, and if the drives in the array have gone to "sleep" after a period of idleness, then waking them is parallel is faster (and someone else should be worrying about power drain). Futher, we will need parallel superblock updates for a future patch which keeps the intent-logging bitmap near the superblock. Also remove the silly code that retired superblock updates 100 times. This simply never made sense. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:47 -07:00
NeilBrown	a654b9d8f8	[PATCH] md: allow md intent bitmap to be stored near the superblock. This provides an alternate to storing the bitmap in a separate file. The bitmap can be stored at a given offset from the superblock. Obviously the creator of the array must make sure this doesn't intersect with data.... After is good for version-0.90 superblocks. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:47 -07:00
NeilBrown	3d310eb7b3	[PATCH] md: fix deadlock due to md thread processing delayed requests. Before completing a 'write' the md superblock might need to be updated. This is best done by the md_thread. The current code schedules this up and queues the write request for later handling by the md_thread. However some personalities (Raid5/raid6) will deadlock if the md_thread tries to submit requests to its own array. So this patch changes things so the processes submitting the request waits for the superblock to be written and then submits the request itself. This fixes a recently-created deadlock in raid5/raid6 Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:46 -07:00
NeilBrown	41158c7eb2	[PATCH] md: optimise reconstruction when re-adding a recently failed drive. When an array is degraded, bit in the intent-bitmap are never cleared. So if a recently failed drive is re-added, we only need to reconstruct the block that are still reflected in the bitmap. This patch adds support for this re-adding. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:46 -07:00
NeilBrown	191ea9b2c7	[PATCH] md: raid1 support for bitmap intent logging Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:46 -07:00
NeilBrown	77ad4bc706	[PATCH] md: enable the bitmap write-back daemon and wait for it. Currently we don't wait for updates to the bitmap to be flushed to disk properly. The infrastructure all there, but it isn't being used.... A separate kernel thread (bitmap_writeback_daemon) is needed to wait for each page as we cannot get callbacks when a page write completes. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:45 -07:00
NeilBrown	32a7627cf3	[PATCH] md: optimised resync using Bitmap based intent logging With this patch, the intent to write to some block in the array can be logged to a bitmap file. Each bit represents some number of sectors and is set before any update happens, and only cleared when all writes relating to all sectors are complete. After an unclean shutdown, information in this bitmap can be used to optimise resync - only sectors which could be out-of-sync need to be updated. Also if a drive is removed and then added back into an array, the recovery can make use of the bitmap to optimise reconstruction. This is not implemented in this patch. Currently the bitmap is stored in a file which must (obviously) be stored on a separate device. The patch only provided infrastructure. It does not update any personalities to bitmap intent logging. Md arrays can still be used with no bitmap file. This patch has minimal impact on such arrays. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:43 -07:00
NeilBrown	57afd89f98	[PATCH] md: improve the interface to sync_request 1/ change the return value (which is number-of-sectors synced) from 'int' to 'sector_t'. The number of sectors is usually easily small enough to fit in an int, but if resync needs to abort, it may want to return the total number of remaining sectors, which could be large. Also errors cannot be returned as negative numbers now, so use 0 instead 2/ Add a 'skipped' return parameter to allow the array to report that it skipped the sectors. This allows md to take this into account in the speed calculations. Currently there is no important skipping, but the bitmap-based-resync that is coming will use this. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:43 -07:00
NeilBrown	06d91a5fe0	[PATCH] md: improve locking on 'safemode' and move superblock writes When md marks the superblock dirty before a write, it calls generic_make_request (to write the superblock) from within generic_make_request (to write the first dirty block), which could cause problems later. With this patch, the superblock write is always done by the helper thread, and write request are delayed until that write completes. Also, the locking around marking the array dirty and writing the superblock is improved to avoid possible races. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:43 -07:00
James Simmons	f1ab5dac25	[PATCH] fbdev: stack reduction Shrink the stack when calling the drawing alignment functions. Signed-off-by: James Simmons <jsimmons@www.infradead.org> Cc: "Antonino A. Daplas" <adaplas@hotpop.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:41 -07:00

1 2 3 4 5 ...

386 Commits