kernel-ark

Author	SHA1	Message	Date
FUJITA Tomonori	ac6b91b803	block: changes for blk_rq_unmap_user new API This converts block/scsi_ioctl.c use blk_rq_unmap_user new API. blk_unmap_sghdr_rq is too simple and it might be better to remove it. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-16 08:52:44 +02:00
Jens Axboe	3d6392cfbd	bsg: support for full generic block layer SG v3 Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-16 08:52:44 +02:00
Matthias Kaehlcke	70cee26e02	Use list_for_each_entry() instead of list_for_each() in the block device elevator Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 13:43:32 +02:00
Jan Engelhardt	16ed002f22	Use menuconfigs instead of menus, so the whole menu can be disabled at once instead of going through all options. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 13:43:27 +02:00
Jens Axboe	15c31be4d5	cfq-iosched: fix async queue behaviour With the cfq_queue hash removal, we inadvertently got rid of the async queue sharing. This was not intentional, in fact CFQ purposely shares the async queue per priority level to get good merging for async writes. So put some logic in cfq_get_queue() to track the shared queues. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 13:43:25 +02:00
Tejun Heo	f4b09303d0	[BLOCK] drop unnecessary bvec rewinding from flush_dry_bio_endio Barrier bios are completed twice - once after the barrier write itself is done and again after the whole sequence is complete. flush_dry_bio_endio() is for the first completion. It doesn't really complete the bio. It rewinds bvec and resets bio so that it can be completed again when the whole barrier sequence is complete. The bvec rewinding code has the following problems. 1. The rewinding code is wrong because filesystems may pass bvec with non zero bv_offset. 2. The block layer doesn't guarantee anything about the state of bvec array on request completion. bv_offset and len are updated iff __end_that_request_first() completes the bvec partially. Because of #2, #1 doesn't really matter (nobody cares whether bvec is re-wound correctly or not) but then again by not doing unwinding at all, we'll always give back the same bvec to the caller as full bvec completion doesn't alter bvecs and the final completion is always full completion. Drop unnecessary rewinding code. This is spotted by Neil Brown. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:03:33 +02:00
Jens Axboe	32eef96411	blk_hw_contig_segment(): bad segment size checks Two bugs in there: - The virt oversize check should use the current bio hardware back size and the next bio front size, not the same bio. Spotted by Neil Brown. - The segment size check should add hw front sizes, not total bio sizes. Spotted by James Bottomley Acked-by: James Bottomley <James.Bottomley@SteelEye.com> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:03:32 +02:00
Tejun Heo	bc90ba093a	block: always requeue !fs requests at the front SCSI marks internal commands with REQ_PREEMPT and push it at the front of the request queue using blk_execute_rq(). When entering suspended or frozen state, SCSI devices are quiesced using scsi_device_quiesce(). In quiesced state, only REQ_PREEMPT requests are processed. This is how SCSI blocks other requests out while suspending and resuming. As all internal commands are pushed at the front of the queue, this usually works. Unfortunately, this interacts badly with ordered requeueing. To preserve request order on requeueing (due to busy device, active EH or other failures), requests are sorted according to ordered sequence on requeue if IO barrier is in progress. The following sequence deadlocks. 1. IO barrier sequence issues. 2. Suspend requested. Queue is quiesced with part or all of IO barrier sequence at the front. 3. During suspending or resuming, SCSI issues internal command which gets deferred and requeued for some reason. As the command is issued after the IO barrier in #1, ordered requeueing code puts the request after IO barrier sequence. 4. The device is ready to process requests again but still is in quiesced state and the first request of the queue isn't REQ_PREEMPT, so command processing is deadlocked - suspending/resuming waits for the issued request to complete while the request can't be processed till device is put back into running state by resuming. This can be fixed by always putting !fs requests at the front when requeueing. The following thread reports this deadlock. http://thread.gmane.org/gmane.linux.kernel/537473 Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: David Greaves <david@dgreaves.com> Acked-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-15 16:12:20 -07:00
Kristen Carlson Accardi	8ce7ad7b2d	genhd: send async notification on media change Send an uevent to user space to indicate that a media change event has occurred. Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-23 20:14:12 -07:00
Kristen Carlson Accardi	86ce18d7b7	genhd: expose AN to user space Allow user space to determine if a disk supports Asynchronous Notification of media changes. This is done by adding a new sysfs file "capability_flags", which is documented in (insert file name). This sysfs file will export all disk capabilities flags to user space. We also define a new flag to define the media change notification capability. Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-23 20:14:11 -07:00
Jens Axboe	f653c34dd3	ll_rw_blk: fix gcc 4.2 warning on current_io_context() current_io_context() is both static and exported with EXPORT_SYMBOL(). As there are no users outside of ll_rw_blk.c itself, just kill the export. Problem reported by Martin Michlmayr <tbm@cyrius.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-15 10:44:15 -07:00
Neil Brown	d89d87965d	When stacked block devices are in-use (e.g. md or dm), the recursive calls to generic_make_request can use up a lot of space, and we would rather they didn't. As generic_make_request is a void function, and as it is generally not expected that it will have any effect immediately, it is safe to delay any call to generic_make_request until there is sufficient stack space available. As ->bi_next is reserved for the driver to use, it can have no valid value when generic_make_request is called, and as __make_request implicitly assumes it will be NULL (ELEVATOR_BACK_MERGE fork of switch) we can be certain that all callers set it to NULL. We can therefore safely use bi_next to link pending requests together, providing we clear it before making the real call. So, we choose to allow each thread to only be active in one generic_make_request at a time. If a subsequent (recursive) call is made, the bio is linked into a per-thread list, and is handled when the active call completes. As the list of pending bios is per-thread, there are no locking issues to worry about. I say above that it is "safe to delay any call...". There are, however, some behaviours of a make_request_fn which would make it unsafe. These include any behaviour that assumes anything will have changed after a recursive call to generic_make_request. These could include: - waiting for that call to finish and call it's bi_end_io function. md use to sometimes do this (marking the superblock dirty before completing a write) but doesn't any more - inspecting the bio for fields that generic_make_request might change, such as bi_sector or bi_bdev. It is hard to see a good reason for this, and I don't think anyone actually does it. - inspecing the queue to see if, e.g. it is 'full' yet. Again, I think this is very unlikely to be useful, or to be done. Signed-off-by: Neil Brown <neilb@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: <dm-devel@redhat.com> Alasdair G Kergon <agk@redhat.com> said: I can see nothing wrong with this in principle. For device-mapper at the moment though it's essential that, while the bio mappings may now get delayed, they still get processed in exactly the same order as they were passed to generic_make_request(). My main concern is whether the timing changes implicit in this patch will make the rare data-corrupting races in the existing snapshot code more likely. (I'm working on a fix for these races, but the unfinished patch is already several hundred lines long.) It would be helpful if some people on this mailing list would test this patch in various scenarios and report back. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-05-11 13:28:37 +02:00
Linus Torvalds	9a9136e270	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits) sound: convert "sound" subdirectory to UTF-8 MAINTAINERS: Add cxacru website/mailing list include files: convert "include" subdirectory to UTF-8 general: convert "kernel" subdirectory to UTF-8 documentation: convert the Documentation directory to UTF-8 Convert the toplevel files CREDITS and MAINTAINERS to UTF-8. remove broken URLs from net drivers' output Magic number prefix consistency change to Documentation/magic-number.txt trivial: s/i_sem /i_mutex/ fix file specification in comments drivers/base/platform.c: fix small typo in doc misc doc and kconfig typos Remove obsolete fat_cvf help text Fix occurrences of "the the " Fix minor typoes in kernel/module.c Kconfig: Remove reference to external mqueue library Kconfig: A couple of grammatical fixes in arch/i386/Kconfig Correct comments in genrtc.c to refer to correct /proc file. Fix more "deprecated" spellos. Fix "deprecated" typoes. ... Fix trivial comment conflict in kernel/relay.c.	2007-05-09 12:54:17 -07:00
Rafael J. Wysocki	8bb7844286	Add suspend-related notifications for CPU hotplug Since nonboot CPUs are now disabled after tasks and devices have been frozen and the CPU hotplug infrastructure is used for this purpose, we need special CPU hotplug notifications that will help the CPU-hotplug-aware subsystems distinguish normal CPU hotplug events from CPU hotplug events related to a system-wide suspend or resume operation in progress. This patch introduces such notifications and causes them to be used during suspend and resume transitions. It also changes all of the CPU-hotplug-aware subsystems to take these notifications into consideration (for now they are handled in the same way as the corresponding "normal" ones). [oleg@tv-sign.ru: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:56 -07:00
Oleg Nesterov	28e53bddf8	unify flush_work/flush_work_keventd and rename it to cancel_work_sync flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq (this was possible from the very beginnig, I missed this). So we can unify flush_work_keventd and flush_work. Also, rename flush_work() to cancel_work_sync() and fix all callers. Perhaps this is not the best name, but "flush_work" is really bad. (akpm: this is why the earlier patches bypassed maintainers) Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jeff Garzik <jeff@garzik.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Auke Kok <auke-jan.h.kok@intel.com>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:53 -07:00
Andrew Morton	19a75d83ff	kblockd: use flush_work Switch the kblockd flushing from a global flush to a more specific flush_work(). (akpm: bypassed maintainers, sorry. There are other patches which depend on this) Cc: "Maciej W. Rozycki" <macro@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: Jens Axboe <axboe@suse.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:51 -07:00
Dave Gilbert	dd2a345f8f	Display all possible partitions when the root filesystem failed to mount Display all possible partitions when the root filesystem is not mounted. This helps to track spell'o's and missing drivers. Updated to work with newer kernels. Example output: VFS: Cannot open root device "foobar" or unknown-block(0,0) Please append a correct "root=" boot option; here are the available partitions: 0800 8388608 sda driver: sd 0801 192748 sda1 0802 8193150 sda2 0810 4194304 sdb driver: sd Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) [akpm@linux-foundation.org: cleanups, fix printk warnings] Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Cc: Dave Gilbert <linux@treblig.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:48 -07:00
Michael Opdenacker	59c51591a0	Fix occurrences of "the the " Signed-off-by: Michael Opdenacker <michael@free-electrons.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-05-09 08:57:56 +02:00
Linus Torvalds	02a93208ed	Merge branch 'for-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block * 'for-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block: [PATCH] ll_rw_blk: fix missing bounce in blk_rq_map_kern() [PATCH] splice: always call into page_cache_readahead() [PATCH] splice(): fix interaction with readahead	2007-05-08 11:34:52 -07:00
Nick Piggin	c6a632a2b6	as: fix antic_expire check Fix units mismatch (jiffies vs msecs) in as-iosched.c, spotted by Xiaoning Ding <dingxn@cse.ohio-state.edu>. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:03 -07:00
Mike Christie	821de3a27b	[PATCH] ll_rw_blk: fix missing bounce in blk_rq_map_kern() I think we might just need the blk_map_kern users now. For the async execute I added the bounce code already and the block SG_IO has it atleady. I think the blk_map_kern bounce code got dropped because we thought the correct gfp_t would be passed in. But I think all we need is the patch below and all the paths are take care of. The patch is not tested. Patch was made against scsi-misc. The last place that is sending non sg commands may just be md/dm-emc.c but that is is just waiting on alasdair to take some patches that fix that and a bunch of junk in there including adding bounce support. If the patch below is ok though and dm-emc finally gets converted then it will have sg and bonce buffer support. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-05-08 19:12:23 +02:00
Christoph Lameter	0a31bd5f2b	KMEM_CACHE(): simplify slab cache creation This patch provides a new macro KMEM_CACHE(<struct>, <flags>) to simplify slab creation. KMEM_CACHE creates a slab with the name of the struct, with the size of the struct and with the alignment of the struct. Additional slab flags may be specified if necessary. Example struct test_slab { int a,b,c; struct list_head; } __cacheline_aligned_in_smp; test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC) will create a new slab named "test_slab" of the size sizeof(struct test_slab) and aligned to the alignment of test slab. If it fails then we panic. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
Peter Zijlstra	f98393a64c	mm: remove destroy_dirty_buffers from invalidate_bdev() Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't been used in 6 years (so akpm says). find * -name \.[ch] \| xargs grep -l invalidate_bdev \| while read file; do quilt add $file; sed -ie 's/invalidate_bdev($[^,]$,[^)]*)/invalidate_bdev(\1)/g' $file; done Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
Linus Torvalds	4f7a307dc6	Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (87 commits) [SCSI] fusion: fix domain validation loops [SCSI] qla2xxx: fix regression on sparc64 [SCSI] modalias for scsi devices [SCSI] sg: cap reserved_size values at max_sectors [SCSI] BusLogic: stop using check_region [SCSI] tgt: fix rdma transfer bugs [SCSI] aacraid: fix aacraid not finding device [SCSI] aacraid: Correct SMC products in aacraid.txt [SCSI] scsi_error.c: Add EH Start Unit retry [SCSI] aacraid: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test. [SCSI] ipr: Driver version to 2.3.2 [SCSI] ipr: Faster sg list fetch [SCSI] ipr: Return better qc_issue errors [SCSI] ipr: Disrupt device error [SCSI] ipr: Improve async error logging level control [SCSI] ipr: PCI unblock config access fix [SCSI] ipr: Fix for oops following SATA request sense [SCSI] ipr: Log error for SAS dual path switch [SCSI] ipr: Enable logging of debug error data for all devices [SCSI] ipr: Add new PCI-E IDs to device table ...	2007-05-05 13:30:44 -07:00
Greg Kroah-Hartman	823bccfc40	remove "struct subsystem" as it is no longer needed We need to work on cleaning up the relationship between kobjects, ksets and ktypes. The removal of 'struct subsystem' is the first step of this, especially as it is not really needed at all. Thanks to Kay for fixing the bugs in this patch. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 18:57:59 -07:00
Jens Axboe	07e4470805	Merge branch 'cfq' into for-linus	2007-04-30 09:09:27 +02:00
Jens Axboe	2a12dcd71a	[PATCH] elevator: elv_list_lock does not need irq disabling It's never grabbed from irq context, so just make it plain spin_lock(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:08:17 +02:00
Jens Axboe	597bc485d6	cfq-iosched: speedup cic rb lookup We often lookup the same queue many times in succession, so cache the last looked up queue to avoid browsing the rbtree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:23 +02:00
Jens Axboe	4e521c27ee	ll_rw_blk: add io_context private pointer To be used by as/cfq as they see fit. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:23 +02:00
Vasily Tarasov	91fac317a3	cfq-iosched: get rid of cfqq hash cfq hash is no more necessary. We always can get cfqq from io context. cfq_get_io_context_noalloc() function is introduced, because we don't want to allocate cic on merging and checking may_queue. In order to identify sync queue we've used hash key = CFQ_KEY_ASYNC. Since hash is eliminated we need to use other criterion: sync flag for queue is added. In all places where we dig in rb_tree we're in current context, so no additional locking is required. Advantages of this patch: no additional memory for hash, no seeking in hash, code is cleaner. But it is necessary now to seek cic in per-ioc rbtree, but it is faster: - most processes work only with few devices - most systems have only few block devices - it is a rb-tree Signed-off-by: Vasily Tarasov <vtaras@openvz.org> Changes by me: - Merge into CFQ devel branch - Get rid of cfq_get_io_context_noalloc() - Fix various bugs with dereferencing cic->cfqq[] with offset other than 0 or 1. - Fix bug in cfqq setup, is_sync condition was reversed. - Fix bug where only bio_sync() is used, we need to check for a READ too Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:23 +02:00
Jens Axboe	cc19747977	cfq-iosched: tighten queue request overlap condition For tagged devices, allow overlap of requests if the idle window isn't enabled on the current active queue. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:23 +02:00
Jens Axboe	3ed9a2965c	cfq-iosched: improve sync vs async workloads Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:23 +02:00
Jens Axboe	1be92f2fc7	cfq-iosched: never allow an async queue idling We don't enable it by default, don't let it get enabled during runtime. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:22 +02:00
Jens Axboe	20e493a8d0	cfq-iosched: get rid of ->dispatch_slice We can track it fairly accurately locally, let the slice handling take care of the rest. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:22 +02:00
Jens Axboe	6084cdda0e	cfq-iosched: don't pass unused preemption variable around We don't use it anymore in the slice expiry handling. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:22 +02:00
Jens Axboe	edd75ffd92	cfq-iosched: get rid of ->cur_rr and ->cfq_list It's only used for preemption now that the IDLE and RT queues also use the rbtree. If we pass an 'add_front' variable to cfq_service_tree_add(), we can set ->rb_key to 0 to force insertion at the front of the tree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:22 +02:00
Jens Axboe	67e6b49e39	cfq-iosched: slice offset should take ioprio into account Use the max_slice-cur_slice as the multipler for the insertion offset. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:22 +02:00
Jens Axboe	498d3aa2b4	[PATCH] cfq-iosched: style cleanups and comments Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:22 +02:00
Jens Axboe	67060e3799	cfq-iosched: sort IDLE queues into the rbtree Same treatment as the RT conversion, just put the sorted idle branch at the end of the tree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:22 +02:00
Jens Axboe	0c534e0a46	cfq-iosched: sort RT queues into the rbtree Currently CFQ does a linked insert into the current list for RT queues. We can just factor the class into the rb insertion, and then we don't have to treat RT queues in a special way. It's faster, too. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:22 +02:00
Jens Axboe	cc09e2990f	[PATCH] cfq-iosched: speed up rbtree handling For cases where the rbtree is mainly used for sorting and min retrieval, a nice speedup of the rbtree code is to maintain a cache of the leftmost node in the tree. Also spotted in the CFS CPU scheduler code. Improved by Alan D. Brunelle <Alan.Brunelle@hp.com> by updating the leftmost hint in cfq_rb_first() if it isn't set, instead of only updating it on insert. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:21 +02:00
Jens Axboe	d9e7620e60	cfq-iosched: rework the whole round-robin list concept Drawing on some inspiration from the CFS CPU scheduler design, overhaul the pending cfq_queue concept list management. Currently CFQ uses a doubly linked list per priority level for sorting and service uses. Kill those lists and maintain an rbtree of cfq_queue's, sorted by when to service them. This unfortunately means that the ionice levels aren't as strong anymore, will work on improving those later. We only scale the slice time now, not the number of times we service. This means that latency is better (for all priority levels), but that the distinction between the highest and lower levels aren't as big. The diffstat speaks for itself. cfq-iosched.c \| 363 +++++++++++++++++--------------------------------- 1 file changed, 125 insertions(+), 238 deletions(-) Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:21 +02:00
Jens Axboe	1afba0451c	cfq-iosched: minor updates - Move the queue_new flag clear to when the queue is selected - Only select the non-first queue in cfq_get_best_queue(), if there's a substantial difference between the best and first. - Get rid of ->busy_rr - Only select a close cooperator, if the current queue is known to take a while to "think". Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:21 +02:00
Jens Axboe	6d048f5310	cfq-iosched: development update - Implement logic for detecting cooperating processes, so we choose the best available queue whenever possible. - Improve residual slice time accounting. - Remove dead code: we no longer see async requests coming in on sync queues. That part was removed a long time ago. That means that we can also remove the difference between cfq_cfqq_sync() and cfq_cfqq_class_sync(), they are now indentical. And we can kill the on_dispatch array, just make it a counter. - Allow a process to go into the current list, if it hasn't been serviced in this scheduler tick yet. Possible future improvements including caching the cfqq lookup in cfq_close_cooperator(), so we don't have to look it up twice. cfq_get_best_queue() should just use that last decision instead of doing it again. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:21 +02:00
Jens Axboe	1e3335de05	cfq-iosched: improve preemption for cooperating tasks When testing the syslet async io approach, I discovered that CFQ sometimes didn't perform as well as expected. cfq_should_preempt() needs to better check for cooperating tasks, so fix that by allowing preemption of an equal priority queue if the recently queued request is as good a candidate for IO as the one we are currently waiting for. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:21 +02:00
Jens Axboe	5044eed488	cfq-iosched: fix alias + front merge bug There's a really rare and obscure bug in CFQ, that causes a crash in cfq_dispatch_insert() due to rq == NULL. One example of the resulting oops is seen here: http://lkml.org/lkml/2007/4/15/41 Neil correctly diagnosed the situation for how this can happen: if two concurrent requests with the exact same sector number (due to direct IO or aliasing between MD and the raw device access), the alias handling will add the request to the sortlist, but next_rq remains NULL. Read the more complete analysis at: http://lkml.org/lkml/2007/4/25/57 This looks like it requires md to trigger, even though it should potentially be possible to due with O_DIRECT (at least if you edit the kernel and doctor some of the unplug calls). The fix is to move the ->next_rq update to when we add a request to the rbtree. Then we remove the possibility for a request to exist in the rbtree code, but not have ->next_rq correctly updated. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-25 08:41:48 -07:00
Jens Axboe	a993800655	cfq-iosched: fix sequential write regression We have a 10-15% performance regression for sequential writes on TCQ/NCQ enabled drives in 2.6.21-rcX after the CFQ update went in. It has been reported by Valerie Clement <valerie.clement@bull.net> and the Intel testing folks. The regression is because of CFQ's now more aggressive queue control, limiting the depth available to the device. This patches fixes that regression by allowing a greater depth when only one queue is busy. It has been tested to not impact sync-vs-async workloads too much - we still do a lot better than 2.6.20. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-20 22:56:29 -07:00
Alan Stern	44ec95425c	[SCSI] sg: cap reserved_size values at max_sectors This patch (as857) modifies the SG_GET_RESERVED_SIZE and SG_SET_RESERVED_SIZE ioctls in the sg driver, capping the values at the device's request_queue's max_sectors value. This will permit cdrecord to obtain a legal value for the maximum transfer length, fixing Bugzilla #7026. The patch also caps the initial reserved_size value. There's no reason to have a reserved buffer larger than max_sectors, since it would be impossible to use the extra space. The corresponding ioctls in the block layer are modified similarly, and the initial value for the reserved_size is set as large as possible. This will effectively make it default to max_sectors. Note that the actual value is meaningless anyway, since block devices don't have a reserved buffer. Finally, the BLKSECTGET ioctl is added to sg, so that there will be a uniform way for users to determine the actual max_sectors value for any raw SCSI transport. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Douglas Gilbert <dougg@torque.net> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2007-04-17 18:09:56 -04:00
Andrew Morton	2363cc0264	[PATCH] remove protection of LANANA-reserved majors Revert all this. It can cause device-mapper to receive a different major from earlier kernels and it turns out that the Amanda backup program (via GNU tar, apparently) checks major numbers on files when performing incremental backups. Which is a bit broken of Amanda (or tar), but this feature isn't important enough to justify the churn. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-04 21:12:47 -07:00
Thibaut VARENE	1ffb96c587	make elv_register() output atomic Booting 2.6.21-rc3-g45592145 I noticed the following on one of my machines in the bootlog: io scheduler noop registered<6>Time: jiffies clocksource has been installed. io scheduler deadline registered (default) Looking at block/elevator.c, it appears that elv_register() uses two consecutive printks in a non-atomic way, leading to the above glitch. The attached trivial patch fixes this issue, by using a single printk. Signed-off-by: Thibaut VARENE <varenet@parisc-linux.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-03-27 08:53:04 +02:00
Vasily Tarasov	f772b3d9ca	block: blk_max_pfn is somtimes wrong There is a small problem in handling page bounce. At the moment blk_max_pfn equals max_pfn, which is in fact not maximum possible _number_ of a page frame, but the _amount_ of page frames. For example for the 32bit x86 node with 4Gb RAM, max_pfn = 0x100000, but not 0xFFFF. request_queue structure has a member q->bounce_pfn and queue needs bounce pages for the pages _above_ this limit. This routine is handled by blk_queue_bounce(), where the following check is produced: if (q->bounce_pfn >= blk_max_pfn) return; Assume, that a driver has set q->bounce_pfn to 0xFFFF, but blk_max_pfn equals 0x10000. In such situation the check above fails and for each bio we always fall down for iterating over pages tied to the bio. I want to notice, that for quite a big range of device drivers (ide, md, ...) such problem doesn't happen because they use BLK_BOUNCE_ANY for bounce_pfn. BLK_BOUNCE_ANY is defined as blk_max_pfn << PAGE_SHIFT, and then the check above doesn't fail. But for other drivers, which obtain reuired value from drivers, it fails. For example sata_nv uses ATA_DMA_MASK or dev->dma_mask. I propose to use (max_pfn - 1) for blk_max_pfn. And the same for blk_max_low_pfn. The patch also cleanses some checks related with bounce_pfn. Signed-off-by: Vasily Tarasov <vtaras@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-03-27 08:52:47 +02:00
Peter Zijlstra	6d740cd5b1	[PATCH] lockdep: annotate BLKPG_DEL_PARTITION >============================================= >[ INFO: possible recursive locking detected ] >2.6.19-1.2909.fc7 #1 >--------------------------------------------- >anaconda/587 is trying to acquire lock: > (&bdev->bd_mutex){--..}, at: [<c05fb380>] mutex_lock+0x21/0x24 > >but task is already holding lock: > (&bdev->bd_mutex){--..}, at: [<c05fb380>] mutex_lock+0x21/0x24 > >other info that might help us debug this: >1 lock held by anaconda/587: > #0: (&bdev->bd_mutex){--..}, at: [<c05fb380>] mutex_lock+0x21/0x24 > >stack backtrace: > [<c0405812>] show_trace_log_lvl+0x1a/0x2f > [<c0405db2>] show_trace+0x12/0x14 > [<c0405e36>] dump_stack+0x16/0x18 > [<c043bd84>] __lock_acquire+0x116/0xa09 > [<c043c960>] lock_acquire+0x56/0x6f > [<c05fb1fa>] __mutex_lock_slowpath+0xe5/0x24a > [<c05fb380>] mutex_lock+0x21/0x24 > [<c04d82fb>] blkdev_ioctl+0x600/0x76d > [<c04946b1>] block_ioctl+0x1b/0x1f > [<c047ed5a>] do_ioctl+0x22/0x68 > [<c047eff2>] vfs_ioctl+0x252/0x265 > [<c047f04e>] sys_ioctl+0x49/0x63 > [<c0404070>] syscall_call+0x7/0xb Annotate BLKPG_DEL_PARTITION's bd_mutex locking and add a little comment clarifying the bd_mutex locking, because I confused myself and initially thought the lock order was wrong too. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-20 17:10:16 -08:00
Andrew Morton	b446b60e4e	[PATCH] rework reserved major handling Several people have reported failures in dynamic major device number handling due to the recent changes in there to avoid handing out the local/experimental majors. Rolf reports that this is due to a gcc-4.1.0 bug. The patch refactors that code a lot in an attempt to provoke the compiler into behaving. Cc: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-20 17:10:13 -08:00
Jesper Juhl	a8e14b950c	update I/O sched Kconfig help texts - CFQ is now default, not AS. Change I/O scheduler description to correctly show CFQ as being the default scheduler and not the anticipatory scheduler that previously was default. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-02-17 20:08:22 +01:00
Arjan van de Ven	2b8693c061	[PATCH] mark struct file_operations const 3 Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:45 -08:00
Andrew Morton	fdf892be32	[PATCH] register_blkdev(): don't hand out the LOCAL/EXPERIMENTAL majors As pointed out in http://bugzilla.kernel.org/show_bug.cgi?id=7922, dynamic blockdev major allocation can hand out majors which LANANA has defined as being for local/experimental use. Cc: Torben Mathiasen <device@lanana.org> Cc: Greg KH <greg@kroah.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Tomas Klas <tomas.klas@mepatek.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:27 -08:00
Jens Axboe	9ede209e83	cfq-iosched: improve continue or break logic in cfq_dispatch This improves performance considerably for sync requests when you have command queuing enabled. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	28f95cbc3e	cfq-iosched: remove the implicit queue kicking in slice expire We only really need it for a process going away, so move it to those locations. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	3c6bd2f879	cfq-iosched: check whether a queue timed out in accounting Makes it more fair for the residual slice count. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	cb8874119e	cfq-iosched: tweak the FIFO checking We currently check the FIFO once per slice. Optimize that a bit and only do it as the first thing for a new slice, so we don't end up doing a single request and then seek to the FIFO requests. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	1792669cc1	cfq-iosched: don't pass in queue for cfq_arm_slice_timer() It must always be the active queue, otherwise it's a bug. So just use the active_queue, don't pass it in explicitly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	c5b680f3b7	cfq-iosched: account for slice over/under time If a slice uses less than it is entitled to (or perhaps more), include that in the decision on how much time to give it the next time it gets serviced. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	44f7c16065	cfq-iosched: defer slice activation to first request being active This better matches what time the queue is actually spending doing IO. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	99f9628aba	[PATCH] cfq-iosched: use last service point as the fairness criteria Right now we use slice_start, which gives async queues an unfair advantage. Chance that to service_last, and base the resorter on that. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	b0b8d74941	cfq-iosched: document the cfqq flags Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:44 +01:00
Jens Axboe	98e41c7dfc	[PATCH] cfq-iosched: move on_rr check into cfq_resort_rr_list() Move the on_rr check into cfq_resort_rr_list(), every call site needs to check it anyway. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:44 +01:00
Jens Axboe	aaf1228ddf	cfq-iosched: remove cfq_io_context last_queue It hasn't been used for a while, kill it off and remove the old if 0 code chunk. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:44 +01:00
Jens Axboe	783660b2f6	elevator: don't sort reads between writes Don't allow elv_dispatch_sort() to mix reads and writes together, it's rarely a good idea. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:44 +01:00
Jens Axboe	cad9751642	elevator: abstract out the activate and deactivate functions Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:44 +01:00
Linus Torvalds	c827ba4cb4	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SPARC64]: Update defconfig. [SPARC64]: Add PCI MSI support on Niagara. [SPARC64] IRQ: Use irq_desc->chip_data instead of irq_desc->handler_data [SPARC64]: Add obppath sysfs attribute for SBUS and PCI devices. [PARTITION]: Add whole_disk attribute.	2007-02-11 11:37:45 -08:00
Mathieu Desnoyers	23c887522e	[PATCH] Relay: add CPU hotplug support Mathieu originally needed to add this for tracing Xen, but it's something that's needed for any application that can be tracing while cpus are added. unplug isn't supported by this patch. The thought was that at minumum a new buffer needs to be added when a cpu comes up, but it wasn't worth the effort to remove buffers on cpu down since they'd be freed soon anyway when the channel was closed. [zanussi@us.ibm.com: avoid lock_cpu_hotplug deadlock] Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:28 -08:00
Fabio Massimo Di Nitto	d18d7682c1	[PARTITION]: Add whole_disk attribute. Some partitioning systems create special partitions that span the entire disk. One example are Sun partitions, and this whole-disk partition exists to tell the firmware the extent of the entire device so it can load the boot block and do other things. Such partitions should not be treated as normal partitions, because all the other partitions overlap this whole-disk one. So we'd see multiple instances of the same UUID etc. which we do not want. udev and friends can thus search for this 'whole_disk' attribute and use it to decide to ignore the partition. Signed-off-by: Fabio Massimo Di Nitto <fabbione@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:50:00 -08:00
Neil Brown	387bb17374	[PATCH] md: fix various bugs with aligned reads in RAID5 It is possible for raid5 to be sent a bio that is too big for an underlying device. So if it is a READ that we pass stright down to a device, it will fail and confuse RAID5. So in 'chunk_aligned_read' we check that the bio fits within the parameters for the target device and if it doesn't fit, fall back on reading through the stripe cache and making lots of one-page requests. Note that this is the earliest time we can check against the device because earlier we don't have a lock on the device, so it could change underneath us. Also, the code for handling a retry through the cache when a read fails has not been tested and was badly broken. This patch fixes that code. Signed-off-by: Neil Brown <neilb@suse.de> Cc: "Kai" <epimetreus@fastmail.fm> Cc: <stable@suse.de> Cc: <org@suse.de> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-09 09:25:46 -08:00
Mike Christie	c0d4d573fe	[PATCH] Fix SG_IO timeout jiffy conversion Commit `85e04e371b` cleaned up the timeout conversion, but did it exactly the wrong way. We get msecs from user space, and should convert them into jiffies. Not the other way around. Here is a fix with the overflow check sg.c has added in. This fixes DVD burnign with Nero. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> [ "you'll be wanting a comma there" - Andrew ] Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-29 20:32:03 -08:00
Linas Vepstas	95543179f1	[PATCH] elevator: move clearing of unplug flag earlier A flag was recently added to the elevator code to avoid performing an unplug when reuests are being re-queued. The goal of this flag was to avoid a deep recursion that can occur when re-queueing requests after a SCSI device/host reset. See http://lkml.org/lkml/2006/5/17/254 However, that fix added the flag near the bottom of a case statement, where an earlier break (in an if statement) could transport one out of the case, without setting the flag. This patch sets the flag earlier in the case statement. I re-discovered the deep recursion recently during testing; I was told that it was a known problem, and the fix to it was in the kernel I was testing. Indeed it was ... but it didn't fix the bug. With the patch below, I no longer see the bug. Signed-off by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Jens Axboe <axboe@suse.de> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-23 11:01:17 -08:00
Jens Axboe	ec8acb6904	[PATCH] cfq-iosched: merging problem Two issues: - The final return 1 should be a return 0, otherwise comparing cfqq is a noop. - bio_sync() only checks the sync flag, while rq_is_sync() checks both for READ and sync. The latter is what we want. Expand the bio check to include reads, and relax the restriction to allow merging of async io into sync requests. In the future we want to clean up the SYNC logic, right now it means both sync request (such as READ and O_DIRECT WRITE) and unplug-on-issue. Leave that for later. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-02 09:46:16 -08:00
Jens Axboe	719d34027e	[PATCH] cfq-iosched: tighten allow merge criteria The logic in cfq_allow_merge() wasn't clear enough - basically allow merging for the same queues only. Do a fast check for 'rq and bio both sync/async' before doing the cfqq hash lookup. This is verified to work with the fixed elv_try_merge() from commit `bb4067e341`. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 14:13:08 -08:00
Randy Dunlap	af9997e426	[PATCH] fix kernel-doc warnings in 2.6.20-rc1 Fix kernel-doc warnings in 2.6.20-rc1. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:47 -08:00
Jens Axboe	bb4067e341	[PATCH] elevator: fixup typo in merge logic The recent io scheduler allow_merge commit left the block layer with no merging, oops. This patch fixes that up. That means the CFQ change needs to be verified again, it might not fix the original bug now. But that's a seperate thing, I'll double check that tomorrow. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-21 22:01:04 -08:00
Jens Axboe	da77526502	[PATCH] cfq-iosched: don't allow sync merges across queues Currently we allow any merge, even if the io originates from different processes. This can cause really bad starvation and unfairness, if those ios happen to be synchronous (reads or direct writes). So add a allow_merge hook to the io scheduler ops, so an io scheduler can help decide whether a bio/process combination may be merged with an existing request. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-20 11:04:12 +01:00
Jens Axboe	8e5cfc45e7	[PATCH] Fixup blk_rq_unmap_user() API The blk_rq_unmap_user() API is not very nice. It expects the caller to know that rq->bio has to be reset to the original bio, and it will silently do nothing if that is not done. Instead make it explicit that we need to pass in the first bio, by expecting a bio argument. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-19 11:12:46 +01:00
Jens Axboe	48785bb9fa	[PATCH] __blk_rq_unmap_user() fails to return error If the bio is user copied, the copy back could return -EFAULT. Make sure we return any error seen during unmapping. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-19 11:07:59 +01:00
Jens Axboe	9c9381f942	[PATCH] __blk_rq_map_user() doesn't need to grab the queue_lock It was for driver private back_merge_fn hooks, but they don't exist anymore. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-19 08:34:17 +01:00
Jens Axboe	1aa4f24fe9	[PATCH] Remove queue merging hooks We have full flexibility of merging parameters now, so we can remove the hooks that define back/front/request merge strategies. Nobody is using them anymore. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-19 08:33:11 +01:00
Jens Axboe	2985259b0e	[PATCH] ->nr_sectors and ->hard_nr_sectors are not used for BLOCK_PC requests It's a file system thing, for block requests the only size used in the io paths is ->data_len as it is in bytes, not sectors. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-19 08:27:31 +01:00
Jens Axboe	c65fb61b3c	[PATCH] Allow as-iosched to be unloaded We implemented the missing bits to allow this some time ago, and they are integrated in AS. So remove the __module_get() to allow the module to be unloaded. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-13 13:25:18 +01:00
Jens Axboe	7749a8d423	[PATCH] Propagate down request sync flag We need to do this, otherwise the io schedulers don't get access to the sync flag. Then they cannot tell the difference between a regular write and an O_DIRECT write, which can cause a performance loss. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-13 13:02:26 +01:00
FUJITA Tomonori	335302618f	[PATCH] remove unnecessary blk_queue_bounce in SG_IO When I converted the original patch, I left unnecessary blk_queue_bounce in SG_IO. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-12 10:26:55 +01:00
FUJITA Tomonori	77d172ce27	[PATCH] fix SG_IO bio leak This patch fixes bio leaks in SG_IO. rq->bio can be changed after io completion, so we need to reset rq->bio before calling blk_rq_unmap_user() http://marc.theaimsgroup.com/?l=linux-kernel&m=116570666807983&w=2 Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-12 10:22:23 +01:00
Boaz Harrosh	2b02a17920	[PATCH] remove blk_queue_activity_fn While working on bidi support at struct request level I have found that blk_queue_activity_fn is actually never used. The only user is in ide-probe.c with this code: /* enable led activity for disk drives only */ if (drive->media == ide_disk && hwif->led_act) blk_queue_activity_fn(q, hwif->led_act, drive); And led_act is never initialized anywhere. (Looking back at older kernels it was used in the PPC arch, but was removed around 2.6.18) Unless it is all for future use off course. (this patch is against linux-2.6-block.git as off 2006/12/4) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-12 10:22:23 +01:00
Andrew Morton	faccbd4b26	[PATCH] io-accounting: read accounting Wire up read accounting for block devices, within submit_bio(). Cc: Jay Lan <jlan@sgi.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Chris Sturtivant <csturtiv@sgi.com> Cc: Tony Ernst <tee@sgi.com> Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net> Cc: David Wright <daw@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:41 -08:00
Akinobu Mita	c17bb49517	[PATCH] fault-injection capability for disk IO This patch provides fault-injection capability for disk IO. Boot option: fail_make_request=<probability>,<interval>,<space>,<times> <interval> -- specifies the interval of failures. <probability> -- specifies how often it should fail in percent. <space> -- specifies the size of free space where disk IO can be issued safely in bytes. <times> -- specifies how many times failures may happen at most. Debugfs: /debug/fail_make_request/interval /debug/fail_make_request/probability /debug/fail_make_request/specifies /debug/fail_make_request/times Example: fail_make_request=10,100,0,-1 echo 1 > /sys/blocks/hda/hda1/make-it-fail generic_make_request() on /dev/hda1 fails once per 10 times. Cc: Jens Axboe <axboe@suse.de> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:29:02 -08:00
Josef Sipek	c5a20b6c26	[PATCH] struct path: convert block Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:28:44 -08:00
Peter Zijlstra	2e7b651df1	[PATCH] remove the old bd_mutex lockdep annotation Remove the old complex and crufty bd_mutex annotation. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Neil Brown <neilb@cse.unsw.edu.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:28:38 -08:00
Ingo Molnar	0231606785	[PATCH] hotplug CPU: clean up hotcpu_notifier() use There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn, prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus generating compiler warnings of unused symbols, hence forcing people to add #ifdefs. the compiler can skip truly unused functions just fine: text data bss dec hex filename 1624412 728710 3674856 6027978 5bfaca vmlinux.before 1624412 728710 3674856 6027978 5bfaca vmlinux.after [akpm@osdl.org: topology.c fix] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:39 -08:00
Christoph Lameter	e18b890bb0	[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name ".c" -o -name ".h"\|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:25 -08:00
Alan Stern	a120586873	[PATCH] Allow NULL pointers in percpu_free The patch (as824b) makes percpu_free() ignore NULL arguments, as one would expect for a deallocation routine. (Note that free_percpu is #defined as percpu_free in include/linux/percpu.h.) A few callers are updated to remove now-unneeded tests for NULL. A few other callers already seem to assume that passing a NULL pointer to percpu_free() is okay! The patch also removes an unnecessary NULL check in percpu_depopulate(). Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:22 -08:00
David Howells	4796b71fbb	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/pcmcia/ds.c Fix up merge failures with Linus's head and fix new compile failures. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-06 15:01:18 +00:00
Linus Torvalds	ec0bf39a47	Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (73 commits) [SCSI] aic79xx: Add ASC-29320LPE ids to driver [SCSI] stex: version update [SCSI] stex: change wait loop code [SCSI] stex: add new device type support [SCSI] stex: update device id info [SCSI] stex: adjust default queue length [SCSI] stex: add value check in hard reset routine [SCSI] stex: fix controller_info command handling [SCSI] stex: fix biosparam calculation [SCSI] megaraid: fix MMIO casts [SCSI] tgt: fix undefined flush_dcache_page() problem [SCSI] libsas: better error handling in sas_expander.c [SCSI] lpfc 8.1.11 : Change version number to 8.1.11 [SCSI] lpfc 8.1.11 : Misc Fixes [SCSI] lpfc 8.1.11 : Add soft_wwnn sysfs attribute, rename soft_wwn_enable [SCSI] lpfc 8.1.11 : Removed decoding of PCI Subsystem Id [SCSI] lpfc 8.1.11 : Add MSI (Message Signalled Interrupts) support [SCSI] lpfc 8.1.11 : Adjust LOG_FCP logging [SCSI] lpfc 8.1.11 : Fix Memory leaks [SCSI] lpfc 8.1.11 : Fix lpfc_multi_ring_support ...	2006-12-05 16:09:46 -08:00
David Howells	9db7372445	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/ata/libata-scsi.c include/linux/libata.h Futher merge of Linus's head and compilation fixups. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-05 17:01:28 +00:00
David Howells	4c1ac1b491	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/infiniband/core/iwcm.c drivers/net/chelsio/cxgb2.c drivers/net/wireless/bcm43xx/bcm43xx_main.c drivers/net/wireless/prism54/islpci_eth.c drivers/usb/core/hub.h drivers/usb/input/hid-core.c net/core/netpoll.c Fix up merge failures with Linus's head and fix new compilation failures. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-05 14:37:56 +00:00
Matthew Wilcox	e62438630c	[PATCH] Centralise definitions of sector_t and blkcnt_t CONFIG_LBD and CONFIG_LSF are spread into asm/types.h for no particularly good reason. Centralising the definition in linux/types.h means that arch maintainers don't need to bother adding it, as well as fixing the problem with x86-64 users being asked to make a decision that has absolutely no effect. The H8/300 porters seem particularly confused since I'm not aware of any microcontrollers that need to support 2TB filesystems. Signed-off-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-04 19:41:15 -08:00
Jens Axboe	a863055b10	[PATCH] blktrace: don't return blktrace_seq from trace_note() Only the process notifier needs it, and it can set it manually. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-04 09:30:58 +01:00
Jens Axboe	d3d9d2a5ea	[PATCH] blktrace: uninline trace_note() It's too large to inline. Additionally clean it up, by fast pathing the likely path. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-04 09:27:41 +01:00
Jens Axboe	bb37b94c68	[BLOCK] Cleanup unused variable passing - ->init_queue() does not need the elevator passed in - ->put_request() is a hot path and need not have the queue passed in - cfq_update_io_seektime() does not need cfqd passed in Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-01 10:42:33 +01:00
Mike Christie	0e75f9063f	[PATCH] block: support larger block pc requests This patch modifies blk_rq_map/unmap_user() and the cdrom and scsi_ioctl.c users so that it supports requests larger than bio by chaining them together. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-01 10:40:55 +01:00
Olaf Kirch	be1c63411a	[PATCH] blktrace: add timestamp message This adds a new timestamp message to blktrace, giving the timeofday when we starting tracing. This helps user space correlate block trace events with eg an application strace. This requires a (compatible) update to blkparse. The changed blkparse is still able to process traces generated by older kernels, and older versions of blkparse should silently ignore the new records (because they have a pid of 0). Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-01 10:39:12 +01:00
James Bottomley	0bd2af4683	Merge ../scsi-rc-fixes-2.6	2006-11-22 12:06:44 -06:00
David Howells	65f27f3844	WorkStruct: Pass the work_struct pointer instead of context data Pass the work_struct pointer to the work function rather than context data. The work function can use container_of() to work out the data. For the cases where the container of the work_struct may go away the moment the pending bit is cleared, it is made possible to defer the release of the structure by deferring the clearing of the pending bit. To make this work, an extra flag is introduced into the management side of the work_struct. This governs auto-release of the structure upon execution. Ordinarily, the work queue executor would release the work_struct for further scheduling or deallocation by clearing the pending bit prior to jumping to the work function. This means that, unless the driver makes some guarantee itself that the work_struct won't go away, the work function may not access anything else in the work_struct or its container lest they be deallocated.. This is a problem if the auxiliary data is taken away (as done by the last patch). However, if the pending bit is not cleared before jumping to the work function, then the work function may access the work_struct and its container with no problems. But then the work function must itself release the work_struct by calling work_release(). In most cases, automatic release is fine, so this is the default. Special initiators exist for the non-auto-release case (ending in _NAR). Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:55:48 +00:00
Tejun Heo	097b8457da	[PATCH] scsi: clear garbage after CDBs on SG_IO ATAPI devices transfer fixed number of bytes for CDBs (12 or 16). Some ATAPI devices choke when shorter CDB is used and the left bytes contain garbage. Block SG_IO cleared left bytes but SCSI SG_IO didn't. This patch makes SCSI SG_IO clear it and simplify CDB clearing in block SG_IO. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Mathieu Fluhr <mfluhr@nero.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Douglas Gilbert <dougg@torque.net> Acked-by: Jens Axboe <jens.axboe@oracle.com> Cc: <stable@kernel.org> Acked-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-16 11:43:38 -08:00
Hannes Reinecke	85e04e371b	[SCSI] block: convert jiffies to msecs in scsi_ioctl() Use the proper conversion function for convert jiffies to msecs in sg_io(). Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2006-11-15 17:57:33 -06:00
Jens Axboe	616e8a091a	[PATCH] Fix bad data direction in SG_IO Contrary to what the name misleads you to believe, SG_DXFER_TO_FROM_DEV is really just a normal read seen from the device side. This patch fixes http://lkml.org/lkml/2006/10/13/100 Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-13 09:47:00 -08:00
Andrew Morton	df66b8552b	[PATCH] tidy "md: check bio address after mapping through partitions" Neil's xterms are too wide. Cc: Neil Brown <neilb@cse.unsw.edu.au> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-03 12:27:55 -08:00
Jens Axboe	5fccbf61be	[PATCH] CFQ: request <-> request merging rr_list fixup In very rare circumstances would we be pruning a merged request and at the same time delete the implicated cfqq from the rr_list, and not readd it when the merged request got added. This could cause io stalls until that process issued io again. Fix it up by putting the rr_list add handling into cfq_add_rq_rb(), identical to how pruning is handled in cfq_del_rq_rb(). This fixes a hang reproducible with fsx-linux. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-31 08:12:45 -08:00
NeilBrown	5ddfe9691c	[PATCH] md: check bio address after mapping through partitions. Partitions are not limited to live within a device. So we should range check after partition mapping. Note that 'maxsector' was being used for two different things. I have split off the second usage into 'old_sector' so that maxsector can be still be used for it's primary usage later in the function. Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Neil Brown <neilb@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-31 08:07:01 -08:00
Jens Axboe	c1b707d253	[PATCH] CFQ: bad locking in changed_ioprio() When the ioprio code recently got juggled a bit, a bug was introduced. changed_ioprio() is no longer called with interrupts disabled, so using plain spin_lock() on the queue_lock is a bug. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-30 11:01:50 -08:00
Jens Axboe	0261d6886e	[PATCH] CFQ: use irq safe locking in cfq_cic_link() If cfq_set_request() is called for a new process AND a non-fs io request (so that __GFP_WAIT may not be set), cfq_cic_link() may use spin_lock_irq() and spin_unlock_irq() with interrupts already disabled. Fix is to always use irq safe locking in cfq_cic_link() Acked-By: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-30 10:21:58 -08:00
Andrew Morton	3fcfab16c5	[PATCH] separate bdi congestion functions from queue congestion functions Separate out the concept of "queue congestion" from "backing-dev congestion". Congestion is a backing-dev concept, not a queue concept. The blk_* congestion functions are retained, as wrappers around the core backing-dev congestion functions. This proper layering is needed so that NFS can cleanly use the congestion functions, and so that CONFIG_BLOCK=n actually links. Cc: "Thomas Maier" <balagi@justmail.de> Cc: "Jens Axboe" <jens.axboe@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: David Howells <dhowells@redhat.com> Cc: Peter Osterlund <petero2@telia.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-20 10:26:35 -07:00
Thomas Maier	79e2de4bc5	[PATCH] export clear_queue_congested and set_queue_congested Export the clear_queue_congested() and set_queue_congested() functions located in ll_rw_blk.c The functions are renamed to blk_clear_queue_congested() and blk_set_queue_congested(). (needed in the pktcdvd driver's bio write congestion control) Signed-off-by: Thomas Maier <balagi@justmail.de> Cc: Peter Osterlund <petero2@telia.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-20 10:26:35 -07:00
Vasily Tarasov	c584164224	[PATCH] block layer: elv_iosched_show should get elv_list_lock elv_iosched_show function iterates other elv_list, hence elv_list_lock should be got. Signed-off-by: Vasily Tarasov <vtaras@openvz.org> Signed-off-by: Vasily Tarasov <jens.axboe@oracle.com>	2006-10-12 15:08:51 +02:00
Vasily Tarasov	a22b169df1	[PATCH] block layer: elevator_find function cleanup We can easily produce search through the elevator list without introducing additional elevator_type variable. Signed-off-by: Vasily Tarasov <vtaras@openvz.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-10-12 15:08:51 +02:00
David C Somayajulu	f583f4924d	[PATCH] helper function for retrieving scsi_cmd given host based block layer tag This was necessitated by the need for a function to get back to a scsi_cmnd, when an hba the posts its (corresponding) completion interrupt with a block layer tag as its reference. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: David Somayajulu <david.somayajulu@qlogic.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-10-04 19:32:09 +02:00
Alasdair G Kergon	7006f6eca8	[PATCH] dm: export blkdev_driver_ioctl Export blkdev_driver_ioctl for device-mapper. If we get as far as the device-mapper ioctl handler, we know the ioctl is not a standard block layer BLK* one, so we don't need to check for them a second time and can call blkdev_driver_ioctl() directly. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:04:13 -07:00
Peter Zijlstra	6e9a4738c9	[PATCH] completions: lockdep annotate on stack completions All on stack DECLARE_COMPLETIONs should be replaced by: DECLARE_COMPLETION_ONSTACK Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:24 -07:00
Jens Axboe	51d7513a8a	[PATCH] Only enable CONFIG_BLOCK option for embedded It's too easy for people to shoot themselves in the foot, and it only makes sense for embedded folks anyway. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 21:14:05 +02:00
Jens Axboe	059af497c2	[PATCH] blk_queue_start_tag() shared map race fix If we share the tag map between two or more queues, then we cannot use __set_bit() to set the bit. In fact we need to make sure we atomically acquire this tag, so loop using test_and_set_bit() to protect from that. Noticed by Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:52:34 +02:00
Jens Axboe	0fe2347957	[PATCH] Update axboe@suse.de email address As people often look for the copyright in files to see who to mail, update the link to a neutral one. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:52:34 +02:00
David Howells	9361401eb7	[PATCH] BLOCK: Make it possible to disable the block layer [try #6 ] Make it possible to disable the block layer. Not all embedded devices require it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require the block layer to be present. This patch does the following: () Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev support. () Adds dependencies on CONFIG_BLOCK to any configuration item that controls an item that uses the block layer. This includes: () Block I/O tracing. () Disk partition code. () All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS. () The SCSI layer. As far as I can tell, even SCSI chardevs use the block layer to do scheduling. Some drivers that use SCSI facilities - such as USB storage - end up disabled indirectly from this. () Various block-based device drivers, such as IDE and the old CDROM drivers. () MTD blockdev handling and FTL. () JFFS - which uses set_bdev_super(), something it could avoid doing by taking a leaf out of JFFS2's book. () Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is, however, still used in places, and so is still available. () Also made contingent are the contents of linux/mpage.h, linux/genhd.h and parts of linux/fs.h. () Makes a number of files in fs/ contingent on CONFIG_BLOCK. () Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK. () set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK is not enabled. () fs/no-block.c is created to hold out-of-line stubs and things that are required when CONFIG_BLOCK is not set: () Default blockdev file operations (to give error ENODEV on opening). () Makes some /proc changes: () /proc/devices does not list any blockdevs. () /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK. () Makes some compat ioctl handling contingent on CONFIG_BLOCK. () If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if given command other than Q_SYNC or if a special device is specified. () In init/do_mounts.c, no reference is made to the blockdev routines if CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2. () The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return error ENOSYS by way of cond_syscall if so). () The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if CONFIG_BLOCK is not set, since they can't then happen. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:52:31 +02:00
Martin Peschke	4090959aee	[PATCH] blktrace: cleanup using on_each_cpu This patch kills a few lines of code in blktrace by making use of on_each_cpu(). Signed-off-by: Martin Peschke <mp3@de.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:31:19 +02:00
Oleg Nesterov	25034d7a83	[PATCH] exit_io_context: don't disable irqs We don't need to disable irqs to clear current->io_context, it is protected by ->alloc_lock. Even IF it was possible to submit I/O from IRQ on behalf of current this irq_disable() can't help: current_io_context() will re-instantiate ->io_context after irq_enable(). We don't need task_lock() or local_irq_disable() to clear ioc->task. This can't prevent other CPUs from playing with our io_context anyway. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:31:18 +02:00
Jens Axboe	7457e6e2d7	[PATCH] blktrace: support for logging metadata reads Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:43 +02:00
Jens Axboe	374f84ac39	[PATCH] cfq-iosched: use metadata read flag Give meta data reads preference over regular reads, as the process often needs to get that out of the way to do the io it was actually interested in. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:43 +02:00
Jens Axboe	5404bc7a87	[PATCH] Allow file systems to differentiate between data and meta reads We can use this information for making more intelligent priority decisions, and it will also be useful for blktrace. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:42 +02:00
Jens Axboe	da20a20f3b	[PATCH] ll_rw_blk: allow more flexibility for read_ahead_kb store It can make sense to set read-ahead larger than a single request. We should not be enforcing such policy on the user. Additionally, using the BLKRASET ioctl doesn't impose such a restriction. So additionally we now expose identical behaviour through the two. Issue also reported by Anton <cbou@mail.ru> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:41 +02:00
Jens Axboe	bf57225670	[PATCH] cfq-iosched: improve queue preemption Don't touch the current queues, just make sure that the wanted queue is selected next. Simplifies the logic. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:41 +02:00
Jens Axboe	dc72ef4ae3	[PATCH] Add blk_start_queueing() helper CFQ implements this on its own now, but it's really block layer knowledge. Tells a device queue to start dispatching requests to the driver, taking care to unplug if needed. Also fixes the issue where as/cfq will invoke a stopped queue, which we really don't want. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:40 +02:00
Jens Axboe	981a79730d	[PATCH] cfq-iosched: kill the empty_list No point in having a place holder list just for empty queues, so remove it. It's not used for anything other than to keep ->cfq_list busy. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:40 +02:00
Jens Axboe	53b03744e5	[PATCH] cfq-iosched: Kill O(N) runtime of cfq_resort_rr_list() Currently it scales with number of processes in that priority group, which is potentially not very nice as it's called quite often. Basically we always need to do tail inserts, except for the case of a new process. So just mark/detect a queue as such. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:39 +02:00
Jens Axboe	b5deef9012	[PATCH] Make sure all block/io scheduler setups are node aware Some were kmalloc_node(), some were still kmalloc(). Change them all to kmalloc_node(). Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:39 +02:00
Jens Axboe	1ea25ecb72	[PATCH] Audit block layer inlines Kill a few inlines that bring in too much code to more than one location Shrinks kernel text by about 300 bytes on 32-bit x86. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:38 +02:00
Jens Axboe	4050cf1674	[PATCH] cfq-iosched: use new io context counting mechanism It's ok if the read path is a lot more costly, as long as inc/dec is really cheap. The inc/dec will happen for each created/freed io context, while the reading only happens when a disk queue exits. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:37 +02:00
Jens Axboe	e4313dd423	[PATCH] as-iosched: use new io context counting mechanism It's ok if the read path is a lot more costly, as long as inc/dec is really cheap. The inc/dec will happen for each created/freed io context, while the reading only happens when a disk queue exits. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:37 +02:00
Jens Axboe	fc46379daf	[PATCH] cfq-iosched: kill cfq_exit_lock cfq_exit_lock is protecting two things now: - The per-ioc rbtree of cfq_io_contexts - The per-cfqd linked list of cfq_io_contexts The per-cfqd linked list can be protected by the queue lock, as it is (by definition) per cfqd as the queue lock is. The per-ioc rbtree is mainly used and updated by the process itself only. The only outside use is the io priority changing. If we move the priority changing to not browsing the rbtree, we can remove any locking from the rbtree updates and lookup completely. Let the sys_ioprio syscall just mark processes as having the iopriority changed and lazily update the private cfq io contexts the next time io is queued, and we can remove this locking as well. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:36 +02:00
Jens Axboe	89850f7ee9	[PATCH] cfq-iosched: cleanups, fixes, dead code removal A collection of little fixes and cleanups: - We don't use the 'queued' sysfs exported attribute, since the may_queue() logic was rewritten. So kill it. - Remove dead defines. - cfq_set_active_queue() can be rewritten cleaner with else if conditions. - Several places had cfq_exit_cfqq() like logic, abstract that out and use that. - Annotate the cfqq kmem_cache_alloc() so the allocator knows that this is a repeat allocation if it fails with __GFP_WAIT set. Allows the allocator to start freeing some memory, if needed. CFQ already loops for this condition, so might as well pass the hint down. - Remove cfqd->rq_starved logic. It's not needed anymore after we dropped the crq allocation in cfq_set_request(). - Remove uneeded parameter passing. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:35 +02:00
Jens Axboe	51da90fcb6	[PATCH] ll_rw_blk: cleanup __make_request() - Don't assign variables that are only used once. - Kill spin_lock() prefetching, it's opportunistic at best. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:34 +02:00
Jens Axboe	cb78b285c8	[PATCH] Drop useless bio passing in may_queue/set_request API It's not needed for anything, so kill the bio passing. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:23 +02:00
Jens Axboe	cdd6026217	[PATCH] Remove ->rq_status from struct request After Christophs SCSI change, the only usage left is RQ_ACTIVE and RQ_INACTIVE. The block layer sets RQ_INACTIVE right before freeing the request, so any check for RQ_INACTIVE in a driver is a bug and indicates use-after-free. So kill/clean the remaining users, straight forward. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:23 +02:00
Jens Axboe	49171e5c6f	[PATCH] Remove struct request_list from struct request It is always identical to &q->rq, and we only use it for detecting whether this request came out of our mempool or not. So replace it with an additional ->flags bit flag. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:22 +02:00
Jens Axboe	c00895ab2f	[PATCH] Remove ->waiting member from struct request As the comments indicates in blkdev.h, we can fold it into ->end_io_data usage as that is really what ->waiting is. Fixup the users of blk_end_sync_rq(). Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:29:12 +02:00
Jens Axboe	8a8e674cb1	[PATCH] as-iosched: kill arq Get rid of the as_rq request type. With the added elevator_private2, we have enough room in struct request to get rid of any arq allocation/free for each request. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de>	2006-09-30 20:27:02 +02:00
Jens Axboe	5e70537479	[PATCH] cfq-iosched: kill crq Get rid of the cfq_rq request type. With the added elevator_private2, we have enough room in struct request to get rid of any crq allocation/free for each request. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:27:02 +02:00
Jens Axboe	5380a101d3	[PATCH] cfq-iosched: remove the crq flag functions/variable There's just one flag currently (SYNC), and that one can be grabbed from the request. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:27:01 +02:00
Jens Axboe	8840faa1ee	[PATCH] deadline-iosched: remove elevator private drq request type A big win, we now save an allocation/free on each request! With the previous rb/hash abstractions, we can just reuse queuelist/donelist for the FIFO data and be done with it. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:27:00 +02:00
Jens Axboe	9e2585a8a2	[PATCH] as-iosched: remove arq->is_sync member We can track this in struct request. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de>	2006-09-30 20:27:00 +02:00
Jens Axboe	d4f2f4629e	[PATCH] as-iosched: reuse rq for fifo Saves some space in arq. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de>	2006-09-30 20:27:00 +02:00
Jens Axboe	95e8810b28	[PATCH] cfq-iosched: convert to using the FIFO elevator defines Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:26:59 +02:00
Jens Axboe	b8aca35af5	[PATCH] deadline-iosched: migrate to using the elevator rb functions This removes the rbtree handling from deadline. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:26:58 +02:00
Jens Axboe	21183b07ee	[PATCH] cfq-iosched: migrate to using the elevator rb functions This removes the rbtree handling from CFQ. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:26:58 +02:00
Jens Axboe	e37f346e34	[PATCH] as-iosched: migrate to using the elevator rb functions This removes the rbtree handling from AS. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de>	2006-09-30 20:26:57 +02:00
Jens Axboe	2e662b65f0	[PATCH] elevator: abstract out the rbtree sort handling The rbtree sort/lookup/reposition logic is mostly duplicated in cfq/deadline/as, so move it to the elevator core. The io schedulers still provide the actual rb root, as we don't want to impose any sort of specific handling on the schedulers. Introduce the helpers and rb_node in struct request to help migrate the IO schedulers. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:26:57 +02:00
Jens Axboe	10fd48f237	[PATCH] rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev The conditions got reserved. Also make rb_next() and rb_prev() check for the empty condition. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:26:56 +02:00
Jens Axboe	9817064b68	[PATCH] elevator: move the backmerging logic into the elevator core Right now, every IO scheduler implements its own backmerging (except for noop, which does no merging). That results in duplicated code for essentially the same operation, which is never a good thing. This patch moves the backmerging out of the io schedulers and into the elevator core. We save 1.6kb of text and as a bonus get backmerging for noop as well. Win-win! Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:26:56 +02:00
Jens Axboe	4aff5e2333	[PATCH] Split struct request ->flags into two parts Right now ->flags is a bit of a mess: some are request types, and others are just modifiers. Clean this up by splitting it into ->cmd_type and ->cmd_flags. This allows introduction of generic Linux block message types, useful for sending generic Linux commands to block devices. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:23:37 +02:00
Alexey Dobriyan	6c5c934153	[PATCH] ifdef blktrace debugging fields Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:09 -07:00
Randy Dunlap	87a5726110	[PATCH] block: handle subsystem_register() init errors Check and handle init errors. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:05 -07:00
Theodore Ts'o	8e18e2941c	[PATCH] inode_diet: Replace inode.u.generic_ip with inode.i_private The following patches reduce the size of the VFS inode structure by 28 bytes on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction in the inode size on a UP kernel that is configured in a production mode (i.e., with no spinlock or other debugging functions enabled; if you want to save memory taken up by in-core inodes, the first thing you should do is disable the debugging options; they are responsible for a huge amount of bloat in the VFS inode structure). This patch: The filesystem or device-specific pointer in the inode is inside a union, which is pretty pointless given that all 30+ users of this field have been using the void pointer. Get rid of the union and rename it to i_private, with a comment to explain who is allowed to use the void pointer. This is just a cleanup, but it allows us to reuse the union 'u' for something something where the union will actually be used. [judith@osdl.org: powerpc build fix] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Judith Lebzelter <judith@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:17 -07:00
James Bottomley	1aedf2ccc6	Merge mulgrave-w:git/linux-2.6 Conflicts: include/linux/blkdev.h Trivial merge to incorporate tag prototypes.	2006-09-23 21:03:52 -05:00
Trond Myklebust	275a082fe9	Add a real API for dealing with blk_congestion_wait() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:54 -04:00
James Bottomley	492dfb4896	[SCSI] block: add support for shared tag maps The current block queue implementation already contains most of the machinery for shared tag maps. The only remaining pieces are a way to allocate and destroy a tag map independently of the queues (so that the maps can be managed on the life cycle of the overseeing entity) Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2006-08-31 11:17:18 -04:00
Oleg Nesterov	2d8f613160	elv_unregister: fix possible crash on module unload An exiting task or process which didn't do I/O yet have no io context, elv_unregister() should check it is not NULL. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-08-22 12:52:23 -07:00
Oleg Nesterov	be33c3a67b	[PATCH] cfq_cic_link: fix usage of wrong cfq_io_context Obviously, cfq_cic_link() shouldn't free a just allocated cfq_io_context? The dead key is from __cic, so drop that. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-08-21 10:02:54 +02:00
Oleg Nesterov	9f83e45eb5	[PATCH] Fix current_io_context() vs set_task_ioprio() race I know nothing about io scheduler, but I suspect set_task_ioprio() is not safe. current_io_context() initializes "struct io_context", then sets ->io_context. set_task_ioprio() running on another cpu may see the changes out of order, so ->set_ioprio(ioc) may use io_context which was not initialized properly. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-08-21 08:34:15 +02:00
Jens Axboe	44eb123126	[PATCH] cfq-iosched: don't use a hard jiffies value, translate from msecs The CIC_SEEKY() test really wants to use the minimum of either: - 2 msecs (not jiffies) - or, the pending slice time So code it like that. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-07-25 15:05:21 +02:00
Milton Miller	ad01b1ca79	[PATCH] blktrace: fix read-ahead bit It should be toggling the same bit on and off, fix it up. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-07-25 15:04:13 +02:00
Arjan van de Ven	ddca60c590	[PATCH] lockdep: annotate the BLKPG_DEL_PARTITION ioctl The delete partition IOCTL takes the bd_mutex for both the disk and the partition; these have an obvious hierarchical relationship and this patch annotates this relationship for lockdep. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-14 21:53:53 -07:00
Jens Axboe	1959d21232	[PATCH] Only the first two bits in bio->bi_rw and rq->flags match Not three, as assumed. This causes the barrier bit to be needlessly set for some IO. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-07-06 10:18:05 +02:00
Nathan Scott	40359ccb83	[PATCH] blktrace: readahead support Provide the needed kernel support for distinguishing readahead from regular read requests when tracing block devices. Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-07-06 10:03:28 +02:00
Ingo Molnar	60be6b9a41	[PATCH] lockdep: annotate on-stack completions lockdep needs to have the waitqueue lock initialized for on-stack waitqueues implicitly initialized by DECLARE_COMPLETION(). Annotate on-stack completions accordingly. Has no effect on non-lockdep kernels. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-03 15:27:09 -07:00
Linus Torvalds	22a3e233ca	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: Remove obsolete #include <linux/config.h> remove obsolete swsusp_encrypt arch/arm26/Kconfig typos Documentation/IPMI typos Kconfig: Typos in net/sched/Kconfig v9fs: do not include linux/version.h Documentation/DocBook/mtdnand.tmpl: typo fixes typo fixes: specfic -> specific typo fixes in Documentation/networking/pktgen.txt typo fixes: occuring -> occurring typo fixes: infomation -> information typo fixes: disadvantadge -> disadvantage typo fixes: aquire -> acquire typo fixes: mecanism -> mechanism typo fixes: bandwith -> bandwidth fix a typo in the RTC_CLASS help text smb is no longer maintained Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S	2006-06-30 15:39:30 -07:00
Christoph Lameter	f8891e5e1f	[PATCH] Light weight event counters The remaining counters in page_state after the zoned VM counter patches have been applied are all just for show in /proc/vmstat. They have no essential function for the VM. We use a simple increment of per cpu variables. In order to avoid the most severe races we disable preempt. Preempt does not prevent the race between an increment and an interrupt handler incrementing the same statistics counter. However, that race is exceedingly rare, we may only loose one increment or so and there is no requirement (at least not in kernel) that the vm event counters have to be accurate. In the non preempt case this results in a simple increment for each counter. For many architectures this will be reduced by the compiler to a single instruction. This single instruction is atomic for i386 and x86_64. And therefore even the rare race condition in an interrupt is avoided for both architectures in most cases. The patchset also adds an off switch for embedded systems that allows a building of linux kernels without these counters. The implementation of these counters is through inline code that hopefully results in only a single instruction increment instruction being emitted (i386, x86_64) or in the increment being hidden though instruction concurrency (EPIC architectures such as ia64 can get that done). Benefits: - VM event counter operations usually reduce to a single inline instruction on i386 and x86_64. - No interrupt disable, only preempt disable for the preempt case. Preempt disable can also be avoided by moving the counter into a spinlock. - Handling is similar to zoned VM counters. - Simple and easily extendable. - Can be omitted to reduce memory use for embedded use. References: RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2 RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2 local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2 V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2 V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2 V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2 Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-30 11:25:36 -07:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Chandra Seetharaman	5a67e4c5b6	[PATCH] cpu hotplug: use hotplug version of cpu notifier in appropriate places Make use the of newly defined hotplug version of cpu_notifier functionality wherever appropriate. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-27 17:32:41 -07:00
Chandra Seetharaman	054cc8a2d8	[PATCH] cpu hotplug: revert initdata patch submitted for 2.6.17 This patch reverts notifier_block changes made in 2.6.17 Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-27 17:32:41 -07:00
Andreas Mohr	d6e05edc59	spelling fixes acquired (aquired) contiguous (contigious) successful (succesful, succesfull) surprise (suprise) whether (weather) some other misspellings Signed-off-by: Andreas Mohr <andi@lisas.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-26 18:35:02 +02:00
Andi Kleen	8269730b38	[BLOCK] Fix bounce limit address check Do a safer check for when to enable DMA. Currently we enable ISA DMA for cases that do not need it, resulting in OOM conditions when ZONE_DMA runs out of space. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:39 +02:00
Jens Axboe	dd67d05152	[PATCH] rbtree: support functions used by the io schedulers They all duplicate macros to check for empty root and/or node, and clearing a node. So put those in rbtree.h. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:39 +02:00
Jens Axboe	fd61af0384	[PATCH] cfq-iosched: rq update fixes - Remember to set ->last_sector so that the cfq_choose_req() logic works correctly. - Remove redundant call to cfq_choose_req() Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:39 +02:00
Jens Axboe	caaa5f9f0a	[PATCH] cfq-iosched: many performance fixes This is a collection of patches that greatly improve CFQ performance in some circumstances. - Change the idling logic to only kick in after a request is done and we are deciding what to do. Before the idling included the request service time, so it was hard to adjust. Now it's true think/idle time. - Take advantage of TCQ/NCQ/queueing for seeky sync workloads, but keep it in control for sync and sequential (or close to) workloads. - Expire queues immediately and move on to other busy queues, if we are not going to idle after the current one finishes. - Don't rearm idle timer if there are no busy queues. Just leave the system idle. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:39 +02:00
Jens Axboe	35e6077cb1	[PATCH] cfq-iosched: correctly set ioprio on both targets Patch originally from Vasily Tarasov <vtaras@sw.ru> If you set io-priority of process 1 using sys_ioprio_set system call by another process 2 (like ionice do), then cfq_init_prio_data() function sets priority of process 2 (current) on queue of process 1 and clears the flag, that designates change of ioprio. So the process 1 will work like with priority of process 2. I propose not to call cfq_init_prio_data() on io-priority change, but only mark queue as queue with changed prority. Every time when new request comes cfq-scheduler checks for this flag and atomaticaly changes priority of queue to new value. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:39 +02:00
Jens Axboe	b17fd9bceb	[PATCH] Make CFQ the default IO scheduler Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:39 +02:00
Jens Axboe	b31dc66a54	[PATCH] Kill PF_SYNCWRITE flag A process flag to indicate whether we are doing sync io is incredibly ugly. It also causes performance problems when one does a lot of async io and then proceeds to sync it. Part of the io will go out as async, and the other part as sync. This causes a disconnect between the previously submitted io and the synced io. For io schedulers such as CFQ, this will cause us lost merges and suboptimal behaviour in scheduling. Remove PF_SYNCWRITE completely from the fsync/msync paths, and let the O_DIRECT path just directly indicate that the writes are sync by using WRITE_SYNC instead. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:39 +02:00
Jens Axboe	271f18f102	[PATCH] cfq-iosched: Don't set the queue batching limits We cannot update them if the user changes nr_requests, so don't set it in the first place. The gains are pretty questionable as well. The batching loss has been shown to decrease throughput. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:38 +02:00
Dave Jones	acf4217555	[PATCH] remove dead code from elevator switching We already drop the refcount in elevator_exit(), and as we're setting 'e' to NULL, we'll never take that branch anyway. Finally, as 'e' is a local var that isn't referenced afterwards, setting it to NULL is pointless. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:38 +02:00
Paolo 'Blaisorblade' Giarrusso	a038e25364	[PATCH] blk_start_queue() must be called with irq disabled - add warning The queue lock can be taken from interrupts so it must always be taken with irq disabling primitives. Some primitives already verify this. blk_start_queue() is called under this lock, so interrupts must be disabled. Also document this requirement clearly in blk_init_queue(), where the queue spinlock is set. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:38 +02:00
Akinobu Mita	bae386f788	[PATCH] iosched: use hlist for request hashtable Use hlist instead of list_head for request hashtable in deadline-iosched and as-iosched. It also can remove the flag to know hashed or unhashed. Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Jens Axboe <axboe@suse.de> block/as-iosched.c \| 45 +++++++++++++++++++-------------------------- block/deadline-iosched.c \| 39 ++++++++++++++++----------------------- 2 files changed, 35 insertions(+), 49 deletions(-)	2006-06-23 17:10:38 +02:00
Oleg Nesterov	626ab0e69d	[PATCH] list: use list_replace_init() instead of list_splice_init() list_splice_init(list, head) does unneeded job if it is known that list_empty(head) == 1. We can use list_replace_init() instead. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-23 07:43:07 -07:00
Kay Sievers	b9d9c82b4d	[PATCH] Driver core: add generic "subsystem" link to all devices Like the SUBSYTEM= key we find in the environment of the uevent, this creates a generic "subsystem" link in sysfs for every device. Userspace usually doesn't care at all if its a "class" or a "bus" device. This provides an unified way to determine the subsytem of a device, regardless of the way the driver core has created it. Signed-off-by: Kay Sievers <kay.sievers@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-06-21 12:40:49 -07:00
Linus Torvalds	6b41fd1785	Fix up CFQ scheduler for recent rbtree node shrinkage The color is now in the low bits of the parent pointer, and initializing it to 0 happens as part of the whole memset above, so just remove the unnecessary RB_CLEAR_COLOR. Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-20 19:44:03 -07:00
Linus Torvalds	2edc322d42	Merge git://git.infradead.org/~dwmw2/rbtree-2.6 * git://git.infradead.org/~dwmw2/rbtree-2.6: [RBTREE] Switch rb_colour() et al to en_US spelling of 'color' for consistency Update UML kernel/physmem.c to use rb_parent() accessor macro [RBTREE] Update hrtimers to use rb_parent() accessor macro. [RBTREE] Add explicit alignment to sizeof(long) for struct rb_node. [RBTREE] Merge colour and parent fields of struct rb_node. [RBTREE] Remove dead code in rb_erase() [RBTREE] Update JFFS2 to use rb_parent() accessor macro. [RBTREE] Update eventpoll.c to use rb_parent() accessor macro. [RBTREE] Update key.c to use rb_parent() accessor macro. [RBTREE] Update ext3 to use rb_parent() accessor macro. [RBTREE] Change rbtree off-tree marking in I/O schedulers. [RBTREE] Add accessor macros for colour and parent fields of rb_node	2006-06-20 14:51:22 -07:00
Jens Axboe	553698f944	[PATCH] cfq-iosched: fix crash in do_div() We don't clear the seek stat values in cfq_alloc_io_context(), and if ->seek_mean is unlucky enough to be set to -36 by chance, the first invocation of cfq_update_io_seektime() will oops with a divide by zero in do_div(). Just memset the entire cic instead of filling invididual values independently. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-14 10:22:16 -07:00
Jens Axboe	bc1c116974	[PATCH] elevator switching race There's a race between shutting down one io scheduler and firing up the next, in which a new io could enter and cause the io scheduler to be invoked with bad or NULL data. To fix this, we need to maintain the queue lock for a bit longer. Unfortunately we cannot do that, since the elevator init requires to be run without the lock held. This isn't easily fixable, without also changing the mempool API. So split the initialization into two parts, and alloc-init operation and an attach operation. Then we can preallocate the io scheduler and related structures, and run the attach inside the lock after we detach the old one. This patch has survived 30 minutes of 1 second io scheduler switching with a very busy io load. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-08 15:14:23 -07:00
Jens Axboe	b52a834892	[PATCH] cfq-iosched: busy_rr fairness fix Now that we select busy_rr for possible service, insert entries at the back of that list instead of at the front. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-01 18:53:43 +02:00
Jens Axboe	ae818a38d4	[PATCH] cfq-iosched: fix bug in timer handling for the idle class There's a small window from when the timer is entered and we grab the queue lock, where cfq_set_active_queue() could be rearming the timer for us. Seen in the wild on a 12-way ppc box. Fix this by just using mod_timer(), which will do the right thing for us. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-01 10:13:43 +02:00
Jens Axboe	25776e3594	[PATCH] cfq-iosched: Detect hardware queueing If the hardware is doing real queueing, decide that it's worthless to idle the hardware. It does reasonable simultaneous io in that case anyways, and the idling hurts some work loads. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-01 10:12:26 +02:00
Jens Axboe	12e9fddd6e	[PATCH] cfq-iosched: Detect idle process issuing async request If we are anticipating a sync request from this process and we are waiting for that and see an async request come in, expire that slice and move on. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-01 10:09:56 +02:00
Jens Axboe	e0de0206a2	[PATCH] cfq-iosched: check busy queues before deciding we are idle For just one busy queue (like async write out), we often overlooked that we could queue more io and decided we were idle instead. This causes us quite a bit of performance loss. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-01 10:07:26 +02:00
Jens Axboe	3793c65c13	[PATCH] cfq-iosched: fixup locking and ->queue_list list management - Drop cic from the list when seen as dead. - Fixup the locking, just use a simple spinlock. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-30 20:31:05 -07:00
Jens Axboe	fd0ff8aa1d	[PATCH] blk: fix gendisk->in_flight accounting during barrier sequence While executing barrrier sequence, the bar_rq which carries actual write was accounted as normal IO on completion, while it wasn't on queueing. This caused gendisk->in_flight to be decremented by 1 after each barrier thus messed up statistics. This patch makes bar_rq not accounted as normal IO. As the containing barrier request as a whole is accounted, part of it shouldn't be. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-23 10:39:43 -07:00
Linus Torvalds	1a2acc9e92	Revert "[BLOCK] Fix oops on removal of SD/MMC card" This reverts commit `56cf6504fc`. Both Erik Mouw and Andrew Vasquez independently pinpointed this commit as causing problems, where the slab cache for a driver is never released (most obviously causing problems when immediately re-loading that driver, resulting in a "kmem_cache_create: duplicate cache <xyz>" message, but it can also cause other trouble). James Bottomley dug into it, and reports: "OK, here's the scoop. The problem patch adds a get of driverfs_dev in add_disk(), but doesn't put it again until disk_release() (which occurs on final put_disk() of the gendisk). However, in SCSI, the driverfs_dev is the sdev_gendev. That means there's a reference held on sdev_gendev until final disk put. Unfortunately, we use the driver model driver_remove to trigger del_gendisk (which removes the gendisk from visibility and decrements the refcount), so we've introduced an unbreakable deadlock in the reference counting with this. I suggest simply reversing this patch at the moment. If Russell and Jens can tell me what they're trying to do I'll see if there's another way to do it." so hereby the patch gets reverted, waiting for a better fix. Cc: Jens Axboe <axboe@suse.de> Cc: Russell King <rmk@arm.linux.org.uk> Cc: James Bottomley <James.Bottomley@SteelEye.com> Cc: Erik Mouw <erik@harddisk-recovery.com> Cc: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-12 12:08:46 -07:00
Jens Axboe	dac07ec121	[BLOCK] limit request_fn recursion Don't recurse back into the driver even if the unplug threshold is met, when the driver asks for a requeue. This is both silly from a logical point of view (requeues typically happen due to driver/hardware shortage), and also dangerous since we could hit an endless request_fn -> requeue -> unplug -> request_fn loop and crash on stack overrun. Also limit blk_run_queue() to one level of recursion, similar to how blk_start_queue() works. This patch fixed a real problem with SLES10 and lpfc, and it could hit any SCSI lld that returns non-zero from it's ->queuecommand() handler. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-11 12:38:59 -07:00
Russell King	56cf6504fc	[BLOCK] Fix oops on removal of SD/MMC card The block layer keeps a reference (driverfs_dev) to the struct device associated with the block device, and uses it internally for generating uevents in block_uevent. Block device uevents include umounting the partition, which can occur after the backing device has been removed. Unfortunately, this reference is not counted. This means that if the struct device is removed from the device tree, the block layers reference will become stale. Guard against this by holding a reference to the struct device in add_disk(), and only drop the reference when we're releasing the gendisk kobject - in other words when we can be sure that no further uevents will be generated for this block device. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Jens Axboe <axboe@suse.de>	2006-05-05 17:57:52 +01:00
Chandra Seetharaman	649bbaa484	[PATCH] Remove __devinitdata from notifier block definitions Few of the notifier_chain_register() callers use __devinitdata in the definition of notifier_block data structure. It is incorrect as the data structure should be available after the initializations (they do not unregister them during initializations). This was leading to an oops when notifier_chain_register() call is invoked for those callback chains after initialization. This patch fixes all such usages to _not_ have the notifier_block data structure in the init data section. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-04-26 08:27:50 -07:00
David Woodhouse	3db3a44530	[RBTREE] Change rbtree off-tree marking in I/O schedulers. They were abusing the rb_color field to mark nodes which weren't currently on the tree. Fix that to use the same method as eventpoll did -- setting the parent pointer to point back to itself. And use the appropriate accessor macros for setting and reading the parent. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2006-04-21 13:15:17 +01:00
Adrian Bunk	4f73247f0e	[PATCH] block/elevator.c: remove unused exports This patch removes the following unused EXPORT_SYMBOL's: - elv_requeue_request - elv_completed_request They are only used by the block core, hence they need not be exported. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-04-20 15:45:22 +02:00
Coywolf Qi Hunt	7daac49020	[patch] cleanup: use blk_queue_stopped This cleanup the source to use blk_queue_stopped. Signed-off-by: Coywolf Qi Hunt <qiyong@freeforge.net> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-04-20 13:04:36 +02:00
OGAWA Hirofumi	be3b075354	[PATCH] cfq: Further rbtree traversal and cfq_exit_queue() race fix In current code, we are re-reading cic->key after dead cic->key check. So, in theory, it may really re-read after cfq_exit_queue() seted NULL. To avoid race, we copy it to stack, then use it. With this change, I guess gcc will assign cic->key to a register or stack, and it wouldn't be re-readed. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-04-18 19:18:31 +02:00
OGAWA Hirofumi	dbecf3ab40	[PATCH 2/2] cfq: fix cic's rbtree traversal When queue dies, we set cic->key=NULL as dead mark. So, when we traverse a rbtree, we must check whether it's still valid key. if it was invalidated, drop it, then restart the traversal from top. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-04-18 09:45:18 +02:00
OGAWA Hirofumi	fba822722e	[PATCH 1/2] iosched: fix typo and barrier() On rmmod path, cfq/as waits to make sure all io-contexts was freed. However, it's using complete(), not wait_for_completion(). I think barrier() is not enough in here. To avoid the following case, this patch replaces barrier() with smb_wmb(). cpu0 visibility cpu1 [ioc_gnone=NULL,ioc_count=1] ioc_gnone = &all_gone NULL,ioc_count=1 atomic_read(&ioc_count) NULL,ioc_count=1 wait_for_completion() NULL,ioc_count=0 atomic_sub_and_test() NULL,ioc_count=0 if ( && ioc_gone) [ioc_gone==NULL, so doesn't call complete()] &all_gone,ioc_count=0 Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-04-18 09:44:06 +02:00
Christoph Hellwig	21b2f0c803	[SCSI] unify SCSI_IOCTL_SEND_COMMAND implementations We currently have two implementations of this obsolete ioctl, one in the block layer and one in the scsi code. Both of them have drawbacks. This patch kills the scsi layer version after updating the block version with the missing bits: - argument checking - use scatterlist I/O - set number of retries based on the submitted command This is the last user of non-S/G I/O except for the gdth driver, so getting this in ASAP and through the scsi tree would be nie to kill the non-S/G I/O path. Jens, what do you think about adding a check for non-S/G I/O in the midlayer? Thanks to Or Gerlitz for testing this patch. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2006-04-13 10:13:15 -05:00
Martin Waitz	a580290c3e	Documentation: fix minor kernel-doc warnings This patch updates the comments to match the actual code. Signed-off-by: Martin Waitz <tali@admingilde.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-04-02 13:59:55 +02:00
Trond Myklebust	88b9adb073	[PATCH] config: fix CONFIG_LFS option The help text says that if you select CONFIG_LBD, then it will automatically select CONFIG_LFS. That isn't currently the case, so update the text. - Get rid of the cruft in the help text mentioning CONFIG_LBD - Tell unsure users to select CONFIG_LFS. - Remove the `default n'. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-31 12:18:55 -08:00
OGAWA Hirofumi	9b41046cd0	[PATCH] Don't pass boot parameters to argv_init[] The boot cmdline is parsed in parse_early_param() and parse_args(,unknown_bootoption). And __setup() is used in obsolete_checksetup(). start_kernel() -> parse_args() -> unknown_bootoption() -> obsolete_checksetup() If __setup()'s callback (->setup_func()) returns 1 in obsolete_checksetup(), obsolete_checksetup() thinks a parameter was handled. If ->setup_func() returns 0, obsolete_checksetup() tries other ->setup_func(). If all ->setup_func() that matched a parameter returns 0, a parameter is seted to argv_init[]. Then, when runing /sbin/init or init=app, argv_init[] is passed to the app. If the app doesn't ignore those arguments, it will warning and exit. This patch fixes a wrong usage of it, however fixes obvious one only. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-31 12:18:53 -08:00
Joe Korty	68eef3b479	[PATCH] Simplify proc/devices and fix early termination regression Make baby-simple the code for /proc/devices. Based on the proven design for /proc/interrupts. This also fixes the early-termination regression 2.6.16 introduced, as demonstrated by: # dd if=/proc/devices bs=1 Character devices: 1 mem 27+0 records in 27+0 records out This should also work (but is untested) when /proc/devices >4096 bytes, which I believe is what the original 2.6.16 rewrite fixed. [akpm@osdl.org: cleanups, simplifications] Signed-off-by: Joe Korty <joe.korty@ccur.com> Cc: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-31 12:18:53 -08:00
Linus Torvalds	7baf398f12	Merge branch 'cfq-merge' of git://brick.kernel.dk/data/git/linux-2.6-block * 'cfq-merge' of git://brick.kernel.dk/data/git/linux-2.6-block: [BLOCK] cfq-iosched: seek and async performance fixes [PATCH] ll_rw_blk: fix 80-col offender in put_io_context() [PATCH] cfq-iosched: small cfq_choose_req() optimization [PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to tree	2006-03-28 09:25:44 -08:00
KAMEZAWA Hiroyuki	0a94502277	[PATCH] for_each_possible_cpu: fixes for generic part replaces for_each_cpu with for_each_possible_cpu(). Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-28 09:16:05 -08:00
Jens Axboe	206dc69b31	[BLOCK] cfq-iosched: seek and async performance fixes Detect whether a given process is seeky and if so disable (mostly) the idle window if it is. We still allow just a little idle time, just enough to allow that process to submit a new request. That is needed to maintain fairness across priority groups. In some cases, we could setup several async queues. This is not optimal from a performance POV, since we want all async io in one queue to perform good sorting on it. It also impacted sync queues, as async io got too much slice time. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-03-28 13:03:44 +02:00
Jens Axboe	7143dd4b01	[PATCH] ll_rw_blk: fix 80-col offender in put_io_context() This makes akpm more happy. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-03-28 09:00:28 +02:00
Andreas Mohr	e8a99053ea	[PATCH] cfq-iosched: small cfq_choose_req() optimization this is a small optimization to cfq_choose_req() in the CFQ I/O scheduler (this function is a semi-often invoked candidate in an oprofile log): by using a bit mask variable, we can use a simple switch() to check the various cases instead of having to query two variables for each check. Benefit: 251 vs. 285 bytes footprint of cfq_choose_req(). Also, common case 0 (no request wrapping) is now checked first in code. Signed-off-by: Andreas Mohr <andi@lisas.de> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-03-28 08:59:49 +02:00
Jens Axboe	e2d74ac066	[PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to tree On setups with many disks, we spend a considerable amount of time looking up the process-disk mapping on each queue of io. Testing with a NULL based block driver, this costs 40-50% reduction in throughput for 1000 disks. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-03-28 08:59:01 +02:00
Linus Torvalds	4fa639123d	Merge branch 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block * 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block: [PATCH] Don't make debugfs depend on DEBUG_KERNEL [PATCH] Fix blktrace compile with sysfs not defined [PATCH] unused label in drivers/block/cciss. [BLOCK] increase size of disk stat counters [PATCH] blk_execute_rq_nowait-speedup [PATCH] ide-cd: quiet down GPCMD_READ_CDVD_CAPACITY failure [BLOCK] ll_rw_blk: kmalloc -> kzalloc conversion [PATCH] kzalloc() conversion in drivers/block [PATCH] update max_sectors documentation	2006-03-27 08:46:49 -08:00
NeilBrown	89e5c8b5b8	[PATCH] md: Make sure QUEUE_FLAG_CLUSTER is set properly for md. This flag should be set for a virtual device iff it is set for all underlying devices. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:00 -08:00
Jens Axboe	09540e691d	[PATCH] Fix blktrace compile with sysfs not defined debugfs depends on sysfs, so make blktrace kconfig option depend on that. Reported by Adrian Bunk. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-03-27 09:29:03 +02:00
Ben Woodard	837c787877	[BLOCK] increase size of disk stat counters The kernel's representation of the disk statistics uses the type unsigned which is 32b on both 32b and 64b platforms. Unfortunately, most system tools that work with these numbers that are exported in /proc/diskstats including iostat read these numbers into unsigned longs. This works fine on 32b platforms and when the number of IO transactions are small on 64b platforms. However, when the numbers wrap on 64b platforms & you read the numbers into unsigned longs, and compare the numbers to previous readings, then you get an unsigned representation of a negative number. This looks like a very large 64b number & gives you bizarre readouts in iostat: ilc4: Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util ilc4: sda 5.50 0.00 143.96 0.00 307496983987862656.00 0.00 153748491993931328.00 0.00 2136028725038430.00 7.94 55.12 5.59 80.42 Though fixing iostat in user space is possible, and a quick survey indicates that several other similar tools also use unsigned longs when processing /proc/diskstats. Therefore, it seems like a better approach would be to extend the length of the disk_stats structure on 64b architectures to 64b. The following patch does that. It should not affect the operation on 32b platforms. Signed-off-by: Ben Woodard <woodard@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-03-27 09:29:02 +02:00
Andrew Morton	4c5d0bbde9	[PATCH] blk_execute_rq_nowait-speedup Both elv_add_request() and generic_unplug_device() grab the queue lock and disable interrupts, do that locally and use the __ variants. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-03-27 09:29:02 +02:00
Jens Axboe	f68110fc28	[BLOCK] ll_rw_blk: kmalloc -> kzalloc conversion Signed-off-by: Jens Axboe <axboe@suse.de>	2006-03-27 09:29:02 +02:00
Takashi Sato	a0f62ac636	[PATCH] 2TB files: add blkcnt_t Add blkcnt_t as the type of inode.i_blocks. This enables you to make the size of blkcnt_t either 4 bytes or 8 bytes on 32 bits architecture with CONFIG_LSF. - CONFIG_LSF Add new configuration parameter. - blkcnt_t On h8300, i386, mips, powerpc, s390 and sh that define sector_t, blkcnt_t is defined as u64 if CONFIG_LSF is enabled; otherwise it is defined as unsigned long. On other architectures, it is defined as unsigned long. - inode.i_blocks Change the type from sector_t to blkcnt_t. Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:57:00 -08:00
Matthew Dobson	93d2341c75	[PATCH] mempool: use mempool_create_slab_pool() Modify well over a dozen mempool users to call mempool_create_slab_pool() rather than calling mempool_create() with extra arguments, saving about 30 lines of code and increasing readability. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:57:00 -08:00
Eric Sesterhenn	ce52449742	BUG_ON() Conversion in block/elevator.c this changes if() BUG(); constructs to BUG_ON() which is cleaner, contains unlikely() and can better optimized away. Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-03-24 18:43:26 +01:00
Jens Axboe	2056a782f8	[PATCH] Block queue IO tracing support (blktrace) as of 2006-03-23 Signed-off-by: Jens Axboe <axboe@suse.de>	2006-03-23 20:00:26 +01:00
Arjan van de Ven	c039e3134a	[PATCH] sem2mutex: blockdev #2 Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-23 07:38:11 -08:00
Jes Sorensen	58383af629	[PATCH] kobj_map semaphore to mutex conversion Convert the kobj_map code to use a mutex instead of a semaphore. It converts the single two users as well, genhd.c and char_dev.c. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-03-20 13:42:58 -08:00
Al Viro	e572ec7e4e	[PATCH] fix rmmod problems with elevator attributes, clean them up	2006-03-18 22:27:18 -05:00
Al Viro	3d1ab40f4c	[PATCH] elevator_t lifetime rules and sysfs fixes	2006-03-18 18:35:43 -05:00
Al Viro	1cc9be68eb	[PATCH] noise removal: cfq-iosched.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-03-18 18:35:08 -05:00
Al Viro	a90d742e4c	[PATCH] don't bother with refcounting for cfq_data Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-03-18 18:35:05 -05:00
Al Viro	483f4afc42	[PATCH] fix sysfs interaction and lifetime rules handling for queues	2006-03-18 18:34:37 -05:00
Al Viro	6f325a1344	[PATCH] fix cfq_get_queue()/ioprio_set(2) races Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-03-18 18:34:17 -05:00
Al Viro	334e94de9b	[PATCH] deal with rmmod/put_io_context() races Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-03-18 18:34:15 -05:00
Al Viro	e17a9489b4	[PATCH] stop elv_unregister() from rogering other iosched's data, fix locking Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-03-18 18:34:12 -05:00
Al Viro	25975f863b	[PATCH] stop cfq from pinning queue down Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-03-18 18:34:09 -05:00

... 3 4 5 6 7 ...

514 Commits