Commit Graph

204 Commits

Author SHA1 Message Date
Jens Axboe
10fd48f237 [PATCH] rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev
The conditions got reserved. Also make rb_next() and rb_prev() check
for the empty condition.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:26:56 +02:00
Jens Axboe
9817064b68 [PATCH] elevator: move the backmerging logic into the elevator core
Right now, every IO scheduler implements its own backmerging (except for
noop, which does no merging). That results in duplicated code for
essentially the same operation, which is never a good thing. This patch
moves the backmerging out of the io schedulers and into the elevator
core. We save 1.6kb of text and as a bonus get backmerging for noop as
well. Win-win!

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:26:56 +02:00
Jens Axboe
4aff5e2333 [PATCH] Split struct request ->flags into two parts
Right now ->flags is a bit of a mess: some are request types, and
others are just modifiers. Clean this up by splitting it into
->cmd_type and ->cmd_flags. This allows introduction of generic
Linux block message types, useful for sending generic Linux commands
to block devices.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:23:37 +02:00
Alexey Dobriyan
6c5c934153 [PATCH] ifdef blktrace debugging fields
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:09 -07:00
Randy Dunlap
87a5726110 [PATCH] block: handle subsystem_register() init errors
Check and handle init errors.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:05 -07:00
Theodore Ts'o
8e18e2941c [PATCH] inode_diet: Replace inode.u.generic_ip with inode.i_private
The following patches reduce the size of the VFS inode structure by 28 bytes
on a UP x86.  (It would be more on an x86_64 system).  This is a 10% reduction
in the inode size on a UP kernel that is configured in a production mode
(i.e., with no spinlock or other debugging functions enabled; if you want to
save memory taken up by in-core inodes, the first thing you should do is
disable the debugging options; they are responsible for a huge amount of bloat
in the VFS inode structure).

This patch:

The filesystem or device-specific pointer in the inode is inside a union,
which is pretty pointless given that all 30+ users of this field have been
using the void pointer.  Get rid of the union and rename it to i_private, with
a comment to explain who is allowed to use the void pointer.  This is just a
cleanup, but it allows us to reuse the union 'u' for something something where
the union will actually be used.

[judith@osdl.org: powerpc build fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Judith Lebzelter <judith@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:17 -07:00
James Bottomley
1aedf2ccc6 Merge mulgrave-w:git/linux-2.6
Conflicts:

	include/linux/blkdev.h

Trivial merge to incorporate tag prototypes.
2006-09-23 21:03:52 -05:00
Trond Myklebust
275a082fe9 Add a real API for dealing with blk_congestion_wait()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22 23:24:54 -04:00
James Bottomley
492dfb4896 [SCSI] block: add support for shared tag maps
The current block queue implementation already contains most of the
machinery for shared tag maps.  The only remaining pieces are a way to
allocate and destroy a tag map independently of the queues (so that
the maps can be managed on the life cycle of the overseeing entity)

Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-08-31 11:17:18 -04:00
Oleg Nesterov
2d8f613160 elv_unregister: fix possible crash on module unload
An exiting task or process which didn't do I/O yet have no io context,
elv_unregister() should check it is not NULL.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-08-22 12:52:23 -07:00
Oleg Nesterov
be33c3a67b [PATCH] cfq_cic_link: fix usage of wrong cfq_io_context
Obviously, cfq_cic_link() shouldn't free a just allocated cfq_io_context?
The dead key is from __cic, so drop that.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-08-21 10:02:54 +02:00
Oleg Nesterov
9f83e45eb5 [PATCH] Fix current_io_context() vs set_task_ioprio() race
I know nothing about io scheduler, but I suspect set_task_ioprio() is not safe.

current_io_context() initializes "struct io_context", then sets ->io_context.
set_task_ioprio() running on another cpu may see the changes out of order, so
->set_ioprio(ioc) may use io_context which was not initialized properly.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-08-21 08:34:15 +02:00
Jens Axboe
44eb123126 [PATCH] cfq-iosched: don't use a hard jiffies value, translate from msecs
The CIC_SEEKY() test really wants to use the minimum of either:

- 2 msecs (not jiffies)

- or, the pending slice time

So code it like that.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-07-25 15:05:21 +02:00
Milton Miller
ad01b1ca79 [PATCH] blktrace: fix read-ahead bit
It should be toggling the same bit on and off, fix it up.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-07-25 15:04:13 +02:00
Arjan van de Ven
ddca60c590 [PATCH] lockdep: annotate the BLKPG_DEL_PARTITION ioctl
The delete partition IOCTL takes the bd_mutex for both the disk and the
partition; these have an obvious hierarchical relationship and this patch
annotates this relationship for lockdep.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:53 -07:00
Jens Axboe
1959d21232 [PATCH] Only the first two bits in bio->bi_rw and rq->flags match
Not three, as assumed. This causes the barrier bit to be needlessly set
for some IO.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-07-06 10:18:05 +02:00
Nathan Scott
40359ccb83 [PATCH] blktrace: readahead support
Provide the needed kernel support for distinguishing readahead
from regular read requests when tracing block devices.

Signed-off-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-07-06 10:03:28 +02:00
Ingo Molnar
60be6b9a41 [PATCH] lockdep: annotate on-stack completions
lockdep needs to have the waitqueue lock initialized for on-stack waitqueues
implicitly initialized by DECLARE_COMPLETION().  Annotate on-stack completions
accordingly.

Has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:09 -07:00
Linus Torvalds
22a3e233ca Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  Remove obsolete #include <linux/config.h>
  remove obsolete swsusp_encrypt
  arch/arm26/Kconfig typos
  Documentation/IPMI typos
  Kconfig: Typos in net/sched/Kconfig
  v9fs: do not include linux/version.h
  Documentation/DocBook/mtdnand.tmpl: typo fixes
  typo fixes: specfic -> specific
  typo fixes in Documentation/networking/pktgen.txt
  typo fixes: occuring -> occurring
  typo fixes: infomation -> information
  typo fixes: disadvantadge -> disadvantage
  typo fixes: aquire -> acquire
  typo fixes: mecanism -> mechanism
  typo fixes: bandwith -> bandwidth
  fix a typo in the RTC_CLASS help text
  smb is no longer maintained

Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S
2006-06-30 15:39:30 -07:00
Christoph Lameter
f8891e5e1f [PATCH] Light weight event counters
The remaining counters in page_state after the zoned VM counter patches
have been applied are all just for show in /proc/vmstat.  They have no
essential function for the VM.

We use a simple increment of per cpu variables.  In order to avoid the most
severe races we disable preempt.  Preempt does not prevent the race between
an increment and an interrupt handler incrementing the same statistics
counter.  However, that race is exceedingly rare, we may only loose one
increment or so and there is no requirement (at least not in kernel) that
the vm event counters have to be accurate.

In the non preempt case this results in a simple increment for each
counter.  For many architectures this will be reduced by the compiler to a
single instruction.  This single instruction is atomic for i386 and x86_64.
 And therefore even the rare race condition in an interrupt is avoided for
both architectures in most cases.

The patchset also adds an off switch for embedded systems that allows a
building of linux kernels without these counters.

The implementation of these counters is through inline code that hopefully
results in only a single instruction increment instruction being emitted
(i386, x86_64) or in the increment being hidden though instruction
concurrency (EPIC architectures such as ia64 can get that done).

Benefits:
- VM event counter operations usually reduce to a single inline instruction
  on i386 and x86_64.
- No interrupt disable, only preempt disable for the preempt case.
  Preempt disable can also be avoided by moving the counter into a spinlock.
- Handling is similar to zoned VM counters.
- Simple and easily extendable.
- Can be omitted to reduce memory use for embedded use.

References:

RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:36 -07:00
Jörn Engel
6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Chandra Seetharaman
5a67e4c5b6 [PATCH] cpu hotplug: use hotplug version of cpu notifier in appropriate places
Make use the of newly defined hotplug version of cpu_notifier functionality
wherever appropriate.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:41 -07:00
Chandra Seetharaman
054cc8a2d8 [PATCH] cpu hotplug: revert initdata patch submitted for 2.6.17
This patch reverts notifier_block changes made in 2.6.17

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:41 -07:00
Andreas Mohr
d6e05edc59 spelling fixes
acquired (aquired)
contiguous (contigious)
successful (succesful, succesfull)
surprise (suprise)
whether (weather)
some other misspellings

Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-26 18:35:02 +02:00
Andi Kleen
8269730b38 [BLOCK] Fix bounce limit address check
Do a safer check for when to enable DMA. Currently we enable ISA DMA
for cases that do not need it, resulting in OOM conditions when ZONE_DMA
runs out of space.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:39 +02:00
Jens Axboe
dd67d05152 [PATCH] rbtree: support functions used by the io schedulers
They all duplicate macros to check for empty root and/or node, and
clearing a node. So put those in rbtree.h.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:39 +02:00
Jens Axboe
fd61af0384 [PATCH] cfq-iosched: rq update fixes
- Remember to set ->last_sector so that the cfq_choose_req() logic
  works correctly.

- Remove redundant call to cfq_choose_req()

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:39 +02:00
Jens Axboe
caaa5f9f0a [PATCH] cfq-iosched: many performance fixes
This is a collection of patches that greatly improve CFQ performance
in some circumstances.

- Change the idling logic to only kick in after a request is done and we
  are deciding what to do. Before the idling included the request service
  time, so it was hard to adjust. Now it's true think/idle time.

- Take advantage of TCQ/NCQ/queueing for seeky sync workloads, but keep
  it in control for sync and sequential (or close to) workloads.

- Expire queues immediately and move on to other busy queues, if we are
  not going to idle after the current one finishes.

- Don't rearm idle timer if there are no busy queues. Just leave the
  system idle.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:39 +02:00
Jens Axboe
35e6077cb1 [PATCH] cfq-iosched: correctly set ioprio on both targets
Patch originally from Vasily Tarasov <vtaras@sw.ru>

If you set io-priority of process 1 using sys_ioprio_set system call by
another process 2 (like ionice do), then cfq_init_prio_data() function
sets priority of process 2 (current) on queue of process 1 and clears
the flag, that designates change of ioprio.  So the process  1 will work
like with priority of process 2.

I propose not to call cfq_init_prio_data() on io-priority change, but
only mark queue as queue with changed prority.  Every time when new
request comes cfq-scheduler checks for this flag and atomaticaly changes
priority of queue to new value.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:39 +02:00
Jens Axboe
b17fd9bceb [PATCH] Make CFQ the default IO scheduler
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:39 +02:00
Jens Axboe
b31dc66a54 [PATCH] Kill PF_SYNCWRITE flag
A process flag to indicate whether we are doing sync io is incredibly
ugly. It also causes performance problems when one does a lot of async
io and then proceeds to sync it. Part of the io will go out as async,
and the other part as sync. This causes a disconnect between the
previously submitted io and the synced io. For io schedulers such as CFQ,
this will cause us lost merges and suboptimal behaviour in scheduling.

Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
the O_DIRECT path just directly indicate that the writes are sync
by using WRITE_SYNC instead.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:39 +02:00
Jens Axboe
271f18f102 [PATCH] cfq-iosched: Don't set the queue batching limits
We cannot update them if the user changes nr_requests, so don't
set it in the first place. The gains are pretty questionable as
well. The batching loss has been shown to decrease throughput.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:38 +02:00
Dave Jones
acf4217555 [PATCH] remove dead code from elevator switching
We already drop the refcount in elevator_exit(), and as
we're setting 'e' to NULL, we'll never take that branch anyway.
Finally, as 'e' is a local var that isn't referenced afterwards,
setting it to NULL is pointless.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:38 +02:00
Paolo 'Blaisorblade' Giarrusso
a038e25364 [PATCH] blk_start_queue() must be called with irq disabled - add warning
The queue lock can be taken from interrupts so it must always be taken with
irq disabling primitives.  Some primitives already verify this.
blk_start_queue() is called under this lock, so interrupts must be
disabled.

Also document this requirement clearly in blk_init_queue(), where the queue
spinlock is set.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:38 +02:00
Akinobu Mita
bae386f788 [PATCH] iosched: use hlist for request hashtable
Use hlist instead of list_head for request hashtable in deadline-iosched
and as-iosched. It also can remove the flag to know hashed or unhashed.

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Jens Axboe <axboe@suse.de>

 block/as-iosched.c       |   45 +++++++++++++++++++--------------------------
 block/deadline-iosched.c |   39 ++++++++++++++++-----------------------
 2 files changed, 35 insertions(+), 49 deletions(-)
2006-06-23 17:10:38 +02:00
Oleg Nesterov
626ab0e69d [PATCH] list: use list_replace_init() instead of list_splice_init()
list_splice_init(list, head) does unneeded job if it is known that
list_empty(head) == 1.  We can use list_replace_init() instead.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:07 -07:00
Kay Sievers
b9d9c82b4d [PATCH] Driver core: add generic "subsystem" link to all devices
Like the SUBSYTEM= key we find in the environment of the uevent, this
creates a generic "subsystem" link in sysfs for every device. Userspace
usually doesn't care at all if its a "class" or a "bus" device. This
provides an unified way to determine the subsytem of a device, regardless
of the way the driver core has created it.

Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-21 12:40:49 -07:00
Linus Torvalds
6b41fd1785 Fix up CFQ scheduler for recent rbtree node shrinkage
The color is now in the low bits of the parent pointer, and initializing
it to 0 happens as part of the whole memset above, so just remove the
unnecessary RB_CLEAR_COLOR.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-20 19:44:03 -07:00
Linus Torvalds
2edc322d42 Merge git://git.infradead.org/~dwmw2/rbtree-2.6
* git://git.infradead.org/~dwmw2/rbtree-2.6:
  [RBTREE] Switch rb_colour() et al to en_US spelling of 'color' for consistency
  Update UML kernel/physmem.c to use rb_parent() accessor macro
  [RBTREE] Update hrtimers to use rb_parent() accessor macro.
  [RBTREE] Add explicit alignment to sizeof(long) for struct rb_node.
  [RBTREE] Merge colour and parent fields of struct rb_node.
  [RBTREE] Remove dead code in rb_erase()
  [RBTREE] Update JFFS2 to use rb_parent() accessor macro.
  [RBTREE] Update eventpoll.c to use rb_parent() accessor macro.
  [RBTREE] Update key.c to use rb_parent() accessor macro.
  [RBTREE] Update ext3 to use rb_parent() accessor macro.
  [RBTREE] Change rbtree off-tree marking in I/O schedulers.
  [RBTREE] Add accessor macros for colour and parent fields of rb_node
2006-06-20 14:51:22 -07:00
Jens Axboe
553698f944 [PATCH] cfq-iosched: fix crash in do_div()
We don't clear the seek stat values in cfq_alloc_io_context(), and if
->seek_mean is unlucky enough to be set to -36 by chance, the first
invocation of cfq_update_io_seektime() will oops with a divide by zero
in do_div().

Just memset the entire cic instead of filling invididual values
independently.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-14 10:22:16 -07:00
Jens Axboe
bc1c116974 [PATCH] elevator switching race
There's a race between shutting down one io scheduler and firing up the
next, in which a new io could enter and cause the io scheduler to be
invoked with bad or NULL data.

To fix this, we need to maintain the queue lock for a bit longer.
Unfortunately we cannot do that, since the elevator init requires to be
run without the lock held.  This isn't easily fixable, without also
changing the mempool API.  So split the initialization into two parts,
and alloc-init operation and an attach operation.  Then we can
preallocate the io scheduler and related structures, and run the attach
inside the lock after we detach the old one.

This patch has survived 30 minutes of 1 second io scheduler switching
with a very busy io load.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-08 15:14:23 -07:00
Jens Axboe
b52a834892 [PATCH] cfq-iosched: busy_rr fairness fix
Now that we select busy_rr for possible service, insert entries at the
back of that list instead of at the front.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-01 18:53:43 +02:00
Jens Axboe
ae818a38d4 [PATCH] cfq-iosched: fix bug in timer handling for the idle class
There's a small window from when the timer is entered and we grab
the queue lock, where cfq_set_active_queue() could be rearming the
timer for us. Seen in the wild on a 12-way ppc box. Fix this by
just using mod_timer(), which will do the right thing for us.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-01 10:13:43 +02:00
Jens Axboe
25776e3594 [PATCH] cfq-iosched: Detect hardware queueing
If the hardware is doing real queueing, decide that it's worthless to
idle the hardware. It does reasonable simultaneous io in that case
anyways, and the idling hurts some work loads.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-01 10:12:26 +02:00
Jens Axboe
12e9fddd6e [PATCH] cfq-iosched: Detect idle process issuing async request
If we are anticipating a sync request from this process and we are
waiting for that and see an async request come in, expire that slice
and move on.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-01 10:09:56 +02:00
Jens Axboe
e0de0206a2 [PATCH] cfq-iosched: check busy queues before deciding we are idle
For just one busy queue (like async write out), we often overlooked
that we could queue more io and decided we were idle instead. This causes
us quite a bit of performance loss.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-01 10:07:26 +02:00
Jens Axboe
3793c65c13 [PATCH] cfq-iosched: fixup locking and ->queue_list list management
- Drop cic from the list when seen as dead.
- Fixup the locking, just use a simple spinlock.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-30 20:31:05 -07:00
Jens Axboe
fd0ff8aa1d [PATCH] blk: fix gendisk->in_flight accounting during barrier sequence
While executing barrrier sequence, the bar_rq which carries actual
write was accounted as normal IO on completion, while it wasn't on
queueing.  This caused gendisk->in_flight to be decremented by 1 after
each barrier thus messed up statistics.

This patch makes bar_rq not accounted as normal IO.  As the containing
barrier request as a whole is accounted, part of it shouldn't be.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-23 10:39:43 -07:00
Linus Torvalds
1a2acc9e92 Revert "[BLOCK] Fix oops on removal of SD/MMC card"
This reverts commit 56cf6504fc.

Both Erik Mouw and Andrew Vasquez independently pinpointed this commit
as causing problems, where the slab cache for a driver is never released
(most obviously causing problems when immediately re-loading that
driver, resulting in a "kmem_cache_create: duplicate cache <xyz>"
message, but it can also cause other trouble).

James Bottomley dug into it, and reports:

  "OK, here's the scoop.  The problem patch adds a get of driverfs_dev in
   add_disk(), but doesn't put it again until disk_release() (which occurs
   on final put_disk() of the gendisk).

   However, in SCSI, the driverfs_dev is the sdev_gendev.  That means
   there's a reference held on sdev_gendev  until final disk put.
   Unfortunately, we use the driver model driver_remove to trigger
   del_gendisk (which removes the gendisk from visibility and decrements
   the refcount), so we've introduced an unbreakable deadlock in the
   reference counting with this.

   I suggest simply reversing this patch at the moment.  If Russell and
   Jens can tell me what they're trying to do I'll see if there's another
   way to do it."

so hereby the patch gets reverted, waiting for a better fix.

Cc: Jens Axboe <axboe@suse.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Erik Mouw <erik@harddisk-recovery.com>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-12 12:08:46 -07:00
Jens Axboe
dac07ec121 [BLOCK] limit request_fn recursion
Don't recurse back into the driver even if the unplug threshold is met,
when the driver asks for a requeue. This is both silly from a logical
point of view (requeues typically happen due to driver/hardware
shortage), and also dangerous since we could hit an endless request_fn
-> requeue -> unplug -> request_fn loop and crash on stack overrun.

Also limit blk_run_queue() to one level of recursion, similar to how
blk_start_queue() works.

This patch fixed a real problem with SLES10 and lpfc, and it could hit
any SCSI lld that returns non-zero from it's ->queuecommand() handler.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-11 12:38:59 -07:00
Russell King
56cf6504fc [BLOCK] Fix oops on removal of SD/MMC card
The block layer keeps a reference (driverfs_dev) to the struct
device associated with the block device, and uses it internally
for generating uevents in block_uevent.

Block device uevents include umounting the partition, which can
occur after the backing device has been removed.

Unfortunately, this reference is not counted.  This means that
if the struct device is removed from the device tree, the block
layers reference will become stale.

Guard against this by holding a reference to the struct device
in add_disk(), and only drop the reference when we're releasing
the gendisk kobject - in other words when we can be sure that no
further uevents will be generated for this block device.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Jens Axboe <axboe@suse.de>
2006-05-05 17:57:52 +01:00
Chandra Seetharaman
649bbaa484 [PATCH] Remove __devinitdata from notifier block definitions
Few of the notifier_chain_register() callers use __devinitdata in the
definition of notifier_block data structure.  It is incorrect as the
data structure should be available after the initializations (they do
not unregister them during initializations).

This was leading to an oops when notifier_chain_register() call is
invoked for those callback chains after initialization.

This patch fixes all such usages to _not_ have the notifier_block data
structure in the init data section.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-26 08:27:50 -07:00
David Woodhouse
3db3a44530 [RBTREE] Change rbtree off-tree marking in I/O schedulers.
They were abusing the rb_color field to mark nodes which weren't currently
on the tree. Fix that to use the same method as eventpoll did -- setting
the parent pointer to point back to itself. And use the appropriate
accessor macros for setting and reading the parent.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-21 13:15:17 +01:00
Adrian Bunk
4f73247f0e [PATCH] block/elevator.c: remove unused exports
This patch removes the following unused EXPORT_SYMBOL's:
- elv_requeue_request
- elv_completed_request

They are only used by the block core, hence they need not be exported.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-20 15:45:22 +02:00
Coywolf Qi Hunt
7daac49020 [patch] cleanup: use blk_queue_stopped
This cleanup the source to use blk_queue_stopped.

Signed-off-by: Coywolf Qi Hunt <qiyong@freeforge.net>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-20 13:04:36 +02:00
OGAWA Hirofumi
be3b075354 [PATCH] cfq: Further rbtree traversal and cfq_exit_queue() race fix
In current code, we are re-reading cic->key after dead cic->key check.
So, in theory, it may really re-read *after* cfq_exit_queue() seted NULL.

To avoid race, we copy it to stack, then use it. With this change, I
guess gcc will assign cic->key to a register or stack, and it wouldn't
be re-readed.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-18 19:18:31 +02:00
OGAWA Hirofumi
dbecf3ab40 [PATCH 2/2] cfq: fix cic's rbtree traversal
When queue dies, we set cic->key=NULL as dead mark. So, when we
traverse a rbtree, we must check whether it's still valid key. if it
was invalidated, drop it, then restart the traversal from top.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-18 09:45:18 +02:00
OGAWA Hirofumi
fba822722e [PATCH 1/2] iosched: fix typo and barrier()
On rmmod path, cfq/as waits to make sure all io-contexts was
freed. However, it's using complete(), not wait_for_completion().

I think barrier() is not enough in here. To avoid the following case,
this patch replaces barrier() with smb_wmb().

	cpu0			visibility			cpu1
	                [ioc_gnone=NULL,ioc_count=1]

ioc_gnone = &all_gone		NULL,ioc_count=1
atomic_read(&ioc_count)		NULL,ioc_count=1
wait_for_completion()		NULL,ioc_count=0	atomic_sub_and_test()
				NULL,ioc_count=0	if ( && ioc_gone)
						    [ioc_gone==NULL,
						    so doesn't call complete()]
			   &all_gone,ioc_count=0

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-18 09:44:06 +02:00
Christoph Hellwig
21b2f0c803 [SCSI] unify SCSI_IOCTL_SEND_COMMAND implementations
We currently have two implementations of this obsolete ioctl, one in
the block layer and one in the scsi code.  Both of them have drawbacks.

This patch kills the scsi layer version after updating the block version
with the missing bits:

 - argument checking
 - use scatterlist I/O
 - set number of retries based on the submitted command

This is the last user of non-S/G I/O except for the gdth driver, so
getting this in ASAP and through the scsi tree would be nie to kill
the non-S/G I/O path.  Jens, what do you think about adding a check
for non-S/G I/O in the midlayer?

Thanks to  Or Gerlitz for testing this patch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-04-13 10:13:15 -05:00
Martin Waitz
a580290c3e Documentation: fix minor kernel-doc warnings
This patch updates the comments to match the actual code.

Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-02 13:59:55 +02:00
Trond Myklebust
88b9adb073 [PATCH] config: fix CONFIG_LFS option
The help text says that if you select CONFIG_LBD, then it will automatically
select CONFIG_LFS.  That isn't currently the case, so update the text.

- Get rid of the cruft in the help text mentioning CONFIG_LBD

- Tell unsure users to select CONFIG_LFS.

- Remove the `default n'.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:18:55 -08:00
OGAWA Hirofumi
9b41046cd0 [PATCH] Don't pass boot parameters to argv_init[]
The boot cmdline is parsed in parse_early_param() and
parse_args(,unknown_bootoption).

And __setup() is used in obsolete_checksetup().

	start_kernel()
		-> parse_args()
			-> unknown_bootoption()
				-> obsolete_checksetup()

If __setup()'s callback (->setup_func()) returns 1 in
obsolete_checksetup(), obsolete_checksetup() thinks a parameter was
handled.

If ->setup_func() returns 0, obsolete_checksetup() tries other
->setup_func().  If all ->setup_func() that matched a parameter returns 0,
a parameter is seted to argv_init[].

Then, when runing /sbin/init or init=app, argv_init[] is passed to the app.
If the app doesn't ignore those arguments, it will warning and exit.

This patch fixes a wrong usage of it, however fixes obvious one only.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:18:53 -08:00
Joe Korty
68eef3b479 [PATCH] Simplify proc/devices and fix early termination regression
Make baby-simple the code for /proc/devices.  Based on the proven design
for /proc/interrupts.

This also fixes the early-termination regression 2.6.16 introduced, as
demonstrated by:

    # dd if=/proc/devices bs=1
    Character devices:
      1 mem
    27+0 records in
    27+0 records out

This should also work (but is untested) when /proc/devices >4096 bytes,
which I believe is what the original 2.6.16 rewrite fixed.

[akpm@osdl.org: cleanups, simplifications]
Signed-off-by: Joe Korty <joe.korty@ccur.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:18:53 -08:00
Linus Torvalds
7baf398f12 Merge branch 'cfq-merge' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'cfq-merge' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [BLOCK] cfq-iosched: seek and async performance fixes
  [PATCH] ll_rw_blk: fix 80-col offender in put_io_context()
  [PATCH] cfq-iosched: small cfq_choose_req() optimization
  [PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to tree
2006-03-28 09:25:44 -08:00
KAMEZAWA Hiroyuki
0a94502277 [PATCH] for_each_possible_cpu: fixes for generic part
replaces for_each_cpu with for_each_possible_cpu().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:05 -08:00
Jens Axboe
206dc69b31 [BLOCK] cfq-iosched: seek and async performance fixes
Detect whether a given process is seeky and if so disable (mostly) the
idle window if it is. We still allow just a little idle time, just enough
to allow that process to submit a new request. That is needed to maintain
fairness across priority groups.

In some cases, we could setup several async queues. This is not optimal
from a performance POV, since we want all async io in one queue to perform
good sorting on it. It also impacted sync queues, as async io got too much
slice time.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28 13:03:44 +02:00
Jens Axboe
7143dd4b01 [PATCH] ll_rw_blk: fix 80-col offender in put_io_context()
This makes akpm more happy.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28 09:00:28 +02:00
Andreas Mohr
e8a99053ea [PATCH] cfq-iosched: small cfq_choose_req() optimization
this is a small optimization to cfq_choose_req() in the CFQ I/O scheduler
(this function is a semi-often invoked candidate in an oprofile log):
by using a bit mask variable, we can use a simple switch() to check
the various cases instead of having to query two variables for each check.
Benefit: 251 vs. 285 bytes footprint of cfq_choose_req().
Also, common case 0 (no request wrapping) is now checked first in code.

Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28 08:59:49 +02:00
Jens Axboe
e2d74ac066 [PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to tree
On setups with many disks, we spend a considerable amount of time
looking up the process-disk mapping on each queue of io. Testing with
a NULL based block driver, this costs 40-50% reduction in throughput
for 1000 disks.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28 08:59:01 +02:00
Linus Torvalds
4fa639123d Merge branch 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [PATCH] Don't make debugfs depend on DEBUG_KERNEL
  [PATCH] Fix blktrace compile with sysfs not defined
  [PATCH] unused label in drivers/block/cciss.
  [BLOCK] increase size of disk stat counters
  [PATCH] blk_execute_rq_nowait-speedup
  [PATCH] ide-cd: quiet down GPCMD_READ_CDVD_CAPACITY failure
  [BLOCK] ll_rw_blk: kmalloc -> kzalloc conversion
  [PATCH] kzalloc() conversion in drivers/block
  [PATCH] update max_sectors documentation
2006-03-27 08:46:49 -08:00
NeilBrown
89e5c8b5b8 [PATCH] md: Make sure QUEUE_FLAG_CLUSTER is set properly for md.
This flag should be set for a virtual device iff it is set for all underlying
devices.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:00 -08:00
Jens Axboe
09540e691d [PATCH] Fix blktrace compile with sysfs not defined
debugfs depends on sysfs, so make blktrace kconfig option depend
on that.

Reported by Adrian Bunk.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-27 09:29:03 +02:00
Ben Woodard
837c787877 [BLOCK] increase size of disk stat counters
The kernel's representation of the disk statistics uses the type unsigned
which is 32b on both 32b and 64b platforms.  Unfortunately, most system
tools that work with these numbers that are exported in /proc/diskstats
including iostat read these numbers into unsigned longs.  This works fine
on 32b platforms and when the number of IO transactions are small on 64b
platforms.  However, when the numbers wrap on 64b platforms & you read the
numbers into unsigned longs, and compare the numbers to previous readings,
then you get an unsigned representation of a negative number.  This looks
like a very large 64b number & gives you bizarre readouts in iostat:

ilc4: Device:    rrqm/s wrqm/s r/s    w/s  rsec/s  wsec/s    rkB/s wkB/s avgrq-sz avgqu-sz   await  svctm  %util
ilc4: sda        5.50   0.00   143.96 0.00 307496983987862656.00 0.00 153748491993931328.00     0.00 2136028725038430.00     7.94   55.12    5.59  80.42

Though fixing iostat in user space is possible, and a quick survey
indicates that several other similar tools also use unsigned longs when
processing /proc/diskstats.  Therefore, it seems like a better approach
would be to extend the length of the disk_stats structure on 64b
architectures to 64b.  The following patch does that.  It should not affect
the operation on 32b platforms.

Signed-off-by: Ben Woodard <woodard@redhat.com>
Cc: Rick Lindsley <ricklind@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-27 09:29:02 +02:00
Andrew Morton
4c5d0bbde9 [PATCH] blk_execute_rq_nowait-speedup
Both elv_add_request() and generic_unplug_device() grab the queue lock
and disable interrupts, do that locally and use the __ variants.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-27 09:29:02 +02:00
Jens Axboe
f68110fc28 [BLOCK] ll_rw_blk: kmalloc -> kzalloc conversion
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-27 09:29:02 +02:00
Takashi Sato
a0f62ac636 [PATCH] 2TB files: add blkcnt_t
Add blkcnt_t as the type of inode.i_blocks.  This enables you to make the size
of blkcnt_t either 4 bytes or 8 bytes on 32 bits architecture with CONFIG_LSF.

- CONFIG_LSF
  Add new configuration parameter.
- blkcnt_t
  On h8300, i386, mips, powerpc, s390 and sh that define sector_t,
  blkcnt_t is defined as u64 if CONFIG_LSF is enabled; otherwise it is
  defined as unsigned long.
  On other architectures, it is defined as unsigned long.
- inode.i_blocks
  Change the type from sector_t to blkcnt_t.

Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:00 -08:00
Matthew Dobson
93d2341c75 [PATCH] mempool: use mempool_create_slab_pool()
Modify well over a dozen mempool users to call mempool_create_slab_pool()
rather than calling mempool_create() with extra arguments, saving about 30
lines of code and increasing readability.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:00 -08:00
Eric Sesterhenn
ce52449742 BUG_ON() Conversion in block/elevator.c
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-24 18:43:26 +01:00
Jens Axboe
2056a782f8 [PATCH] Block queue IO tracing support (blktrace) as of 2006-03-23
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-23 20:00:26 +01:00
Arjan van de Ven
c039e3134a [PATCH] sem2mutex: blockdev #2
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:11 -08:00
Jes Sorensen
58383af629 [PATCH] kobj_map semaphore to mutex conversion
Convert the kobj_map code to use a mutex instead of a semaphore.  It
converts the single two users as well, genhd.c and char_dev.c.

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-20 13:42:58 -08:00
Al Viro
e572ec7e4e [PATCH] fix rmmod problems with elevator attributes, clean them up 2006-03-18 22:27:18 -05:00
Al Viro
3d1ab40f4c [PATCH] elevator_t lifetime rules and sysfs fixes 2006-03-18 18:35:43 -05:00
Al Viro
1cc9be68eb [PATCH] noise removal: cfq-iosched.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:35:08 -05:00
Al Viro
a90d742e4c [PATCH] don't bother with refcounting for cfq_data
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:35:05 -05:00
Al Viro
483f4afc42 [PATCH] fix sysfs interaction and lifetime rules handling for queues 2006-03-18 18:34:37 -05:00
Al Viro
6f325a1344 [PATCH] fix cfq_get_queue()/ioprio_set(2) races
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:34:17 -05:00
Al Viro
334e94de9b [PATCH] deal with rmmod/put_io_context() races
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:34:15 -05:00
Al Viro
e17a9489b4 [PATCH] stop elv_unregister() from rogering other iosched's data, fix locking
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:34:12 -05:00
Al Viro
25975f863b [PATCH] stop cfq from pinning queue down
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:34:09 -05:00
Al Viro
d9ff418793 [PATCH] make cfq_exit_queue() prune the cfq_io_context for that queue
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:34:07 -05:00
Al Viro
a6a0763a60 [PATCH] fix the exclusion for ioprio_set()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:34:04 -05:00
Al Viro
12a0573215 [PATCH] keep sync and async cfq_queue separate
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:34:02 -05:00
Al Viro
478a82b0ed [PATCH] switch to use of ->key to get cfq_data by cfq_io_context
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:33:59 -05:00
Al Viro
7670876d2d [PATCH] stop leaking cfq_data in cfq_set_request()
We don't need to pin ->key down; ->cfqq->cfqd will do that for us.
Incidentally, that stops the leak we had - that reference was never
dropped.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:33:56 -05:00
Al Viro
b0a6916bcc [PATCH] fix cfq hash lookups
If somebody does a hash lookup for cfq_queue while ioprio of an async queue
is elevated, they shouldn't end up stuck with lowered ioprio when we go back.
Fix is to use ->org_ioprio{,class} in hash lookups.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:33:54 -05:00
Al Viro
c981ff9f89 [PATCH] fix locking in queue_requests_store()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:33:51 -05:00
Al Viro
8669aafdb5 [PATCH] fix double-free in blk_init_queue_node()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:33:49 -05:00
Andi Kleen
5ee1af9f51 [PATCH] block: disable block layer bouncing for most memory on 64bit systems
The low level PCI DMA mapping functions should handle it in most cases.

This should fix problems with depleting the DMA zone early. The old
code used precious GFP_DMA memory in many cases where it was not needed.

Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08 18:10:31 -08:00
Jens Axboe
7b14e3b52f [PATCH] cfq-iosched: slice expiry fixups
During testing of SLES10, we encountered a hang in the CFQ io scheduler.
Turns out the deferred slice expiry logic is buggy, so remove that for
now.  We could be left with an idle queue that would never wake up.  So
kill that logic, always expire immediately.  Also fix a potential timer
race condition.

Patch looks bigger than it is, because it moves a function.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-28 00:38:02 -08:00