This patch converts virtio_blk to use __blk_end_request() directly
so that end_{queued|dequeued}_request() can be removed.
Related 'uptodate' argument is converted to 'error'.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The current floppy_struct allows floppies to number sectors starting
from 0 or 1. This patch allows arbitrary first-sector numbers - for
example, 0xC1 for Amstrad CPC disks.
This extends the existing 1-bit field (FD_ZEROBASED, bit 2 of stretch)
to 8 bits (FD_SECTMASK, bits 2 to 9).
Currently 0x00 denotes a first sector number of 1, and 0x01 denotes a
first sector number of 0. We extend this by interpreting FD_SECTMASK
as the first sector number with the LSB flipped.
Signed-off-by: Keith Wansbrough <keith@lochan.org>
Cc: Alain Knaff <alain@linux.lu>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Karel Zak <kzak@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The kernel.h macro DIV_ROUND_UP performs the computation (((n) + (d) - 1) /
(d)) but is perhaps more readable.
An extract of the semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@haskernel@
@@
#include <linux/kernel.h>
@depends on haskernel@
expression n,d;
@@
(
- (n + d - 1) / d
+ DIV_ROUND_UP(n,d)
|
- (n + (d - 1)) / d
+ DIV_ROUND_UP(n,d)
)
@depends on haskernel@
expression n,d;
@@
- DIV_ROUND_UP((n),d)
+ DIV_ROUND_UP(n,d)
@depends on haskernel@
expression n,d;
@@
- DIV_ROUND_UP(n,(d))
+ DIV_ROUND_UP(n,d)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Fix cciss SCSI rescan code to better notice device changes.
If you hot-unplug a tape drive, then hot-plug a different
tape drive into the same slot in a storage enclosure,
the cciss driver wouldn't notice anything had changed, as
it was only looking at the LUN address and device type.
Now it looks at the inquiry page 0x83 device identifier,
and vendor and model strings as well.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Until recently, the maximum number of xvd block devices you could attach
to a Xen domU was 16. This limitation turned out to be problematic for
some users, so it was expanded to handle a much larger number of disks.
However, this requires a couple of changes in the way that blkfront
scans for disks. This functionality is already present in the Xen
linux-2.6.18-xen.hg tree; the attached patch adds this functionality to
the mainline xen-blkfront implementation. I successfully tested it on a
2.6.25 tree, and build tested it on 2.6.27-rc3.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Move stats related fields - stamp, in_flight, dkstats - from disk to
part0 and unify stat handling such that...
* part_stat_*() now updates part0 together if the specified partition
is not part0. ie. part_stat_*() are now essentially all_stat_*().
* {disk|all}_stat_*() are gone.
* part_round_stats() is updated similary. It handles part0 stats
automatically and disk_round_stats() is killed.
* part_{inc|dec}_in_fligh() is implemented which automatically updates
part0 stats for parts other than part0.
* disk_map_sector_rcu() is updated to return part0 if no part matches.
Combined with the above changes, this makes NULL special case
handling in callers unnecessary.
* Separate stats show code paths for disk are collapsed into part
stats show code paths.
* Rename disk_stat_lock/unlock() to part_stat_lock/unlock()
While at it, reposition stat handling macros a bit and add missing
parentheses around macro parameters.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Move disk->capacity to part0->nr_sects and convert all users who
directly accessed the field to use {get|set}_capacity(). This is done
early to allow the __dev field to be moved.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Implement {disk|part}_to_dev() and use them to access generic device
instead of directly dereferencing {disk|part}->dev. To make sure no
user is left behind, rename generic devices fields to __dev.
This is in preparation of unifying partition 0 handling with other
partitions.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
There are two variants of stat functions - ones prefixed with double
underbars which don't care about preemption and ones without which
disable preemption before manipulating per-cpu counters. It's unclear
whether the underbarred ones assume that preemtion is disabled on
entry as some callers don't do that.
This patch unifies diskstats access by implementing disk_stat_lock()
and disk_stat_unlock() which take care of both RCU (for partition
access) and preemption (for per-cpu counter access). diskstats access
should always be enclosed between the two functions. As such, there's
no need for the versions which disables preemption. They're removed
and double underbars ones are renamed to drop the underbars. As an
extra argument is added, there's no danger of using the old version
unconverted.
disk_stat_lock() uses get_cpu() and returns the cpu index and all
diskstat functions which access per-cpu counters now has @cpu
argument to help RT.
This change adds RCU or preemption operations at some places but also
collapses several preemption ops into one at others. Overall, the
performance difference should be negligible as all involved ops are
very lightweight per-cpu ones.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
disk->part[] is protected by its matching bdev's lock. However,
non-critical accesses like collecting stats and printing out sysfs and
proc information used to be performed without any locking. As
partitions can come and go dynamically, partitions can go away
underneath those non-critical accesses. As some of those accesses are
writes, this theoretically can lead to silent corruption.
This patch fixes the race by using RCU for the partition array and dev
reference counter to hold partitions.
* Rename disk->part[] to disk->__part[] to make sure no one outside
genhd layer proper accesses it directly.
* Use RCU for disk->__part[] dereferencing.
* Implement disk_{get|put}_part() which can be used to get and put
partitions from gendisk respectively.
* Iterators are implemented to help iterate through all partitions
safely.
* Functions which require RCU readlock are marked with _rcu suffix.
* Use disk_put_part() in __blkdev_put() instead of directly putting
the contained kobject.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Implement disk_devt() and part_devt() and use them to directly
access devt instead of computing it from ->major and ->first_minor.
Note that all references to ->major and ->first_minor outside of
block layer is used to determine devt of the disk (the part0) and as
->major and ->first_minor will continue to represent devt for the
disk, converting these users aren't strictly necessary. However,
convert them for consistency.
* Implement disk_max_parts() to avoid directly deferencing
genhd->minors.
* Update bdget_disk() such that it doesn't assume consecutive minor
space.
* Move devt computation from register_disk() to add_disk() and make it
the only one (all other usages use the initially determined value).
These changes clean up the code and will help disk->part dereference
fix and extended block device numbers.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch makes the following misc updates in preparation for
disk->part dereference fix and extended block devt support.
* implment part_to_disk()
* fix comment about gendisk->part indexing
* rename get_part() to disk_map_sector()
* don't use n which is always zero while printing disk information in
diskstats_show()
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
struct request has an ioprio member but it is never updated because
currently bios do not hold io context information. The implication of
this is that virtio_blk ends up passing useless information to the
backend driver.
That said, some IO schedulers such as CFQ do store io context
information in struct request, but use private members for that, which
means that that information cannot be directly accessed in a IO
scheduler-independent way.
This patch adds a function to obtain the ioprio of a request. We should
avoid accessing ioprio directly and use this function instead, so that
its users do not have to care about future changes in block layer
structures or what the currently active IO controller is.
This patch does not introduce any functional changes but paves the way
for future clean-ups and enhancements.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It was only used by ps3disk, and it should probably have been
REQ_TYPE_LINUX_BLOCK + REQ_LB_OP_FLUSH.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The variable statindex in send_request is never read, so remove it.
Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reported by Thomas Graf.
If we don't unlink the SKB from the queue when we send it
out in aoenet_xmit(), dev_hard_start_xmit() will see skb->next
as non-NULL and interpret this to mean the SKB is part of a
GSO segment list.
Add __skb_unlink() call to fix that.
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that arch/ppc is dead CONFIG_PPC_MERGE is always defined for all
powerpc platforms and we want to get rid of CONFIG_PPC_MERGE use
CONFIG_PPC instead.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This mirrors the of_device_id[] changes done in
fd098316ef ("sparc: Annotate
of_device_id arrays with const or __initdata.")
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 5b6155ee70, because
the block device ioctl's really aren't ready for it.
In particular, the "struct file *" and the "struct inode *" arguments do
not necessarily match, which means that the unlocked version of the
ioctl (that only gets a "struct file *") isn't actually able to handle
the cases it needs to handle.
This fixes bugzilla
http://bugzilla.kernel.org/show_bug.cgi?id=11401
Reported-and-bisected-by: Laurent Riffard <laurent.riffard@free.fr>
Acked-by: Peter Osterlund <petero2@telia.com>
Cc: Alan Cox <alan@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The name of brd block device is "ramdisk", it's not "brd".
(The block device is registered by register_blkdev(RAMDISK_MAJOR, "ramdisk")
So it should be unregistered by unregister_blkdev(RAMDISK_MAJOR, "ramdisk")
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We leak the memory allocated for the nbd_dev array at multiple places.
Fix them by either adding a kfree() or by rearranging code to return
before we allocate the memory.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Cc: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are four operating modes Xen code may find itself running in:
- native
- hvm domain
- pv dom0
- pv domU
Clean up predicates for testing for these states to make them more consistent.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch makes the needlessly global blkif_ioctl() static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Bug fix. If SCSI tape support is turned off we get an implicit declaration
of cciss_unregister_scsi error in cciss_remove_one.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch adds support for multi-lun devices in a SAS environment. It's
required for the support of media changers.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch changes way we notify the scsi layer that something has changed
on the SCSI tape side of the driver. The user can now just tell the driver
to rescan a particular controller rather than having to know the SCSI nexus
to echo into the SCSI mid-layer.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch fixes a problem where the logical volume count may go negative.
In some instances if several logical are configured on a controller and all
of them are deleted using the online utilities the volume count in /proc may
go negative with no way get it correct again.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch removes redundant code where ever logical volumes are added or
removed. It adds 3 new functions that are called instead of having the same
code spread throughout the driver. It also removes the cciss_getgeometry
function.
The patch is fairly complex but we haven't figured out how to make it any
simpler and still do everything that needs to be done. Some of the
complexity comes from having to special case booting from cciss. Otherwise
the gendisk doesn't get added in time and the switchroot will fail.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch makes the rebuild_lun_table smart enough to not rip a logical
volume out from under the OS. Without this fix if a customer is running
hpacucli to monitor their storage the driver will blindly remove and re-add
the disks whenever the utility calls the CCISS_REGNEWD ioctl. Unfortunately,
both hpacucli and ACUXE call the ioctl repeatedly. Customers have reported
IO coming to a standstill. Calling the ioctl is the problem, this patch is
the fix.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Return -EFAULT instead of -ENOMEM if copy_from_user() fails.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
curr_queue is a local variable in a for loop, and it's being initialized
at the start of each loop. So any assignment at the end of the loop is
pointless.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some module parameters with only one line have the '\n' at the end of the
description. This is not needed nor wanted as after the description the
type (i.e. int) is followed by a newline.
Some modules contain a multi-line description, these are not affected
by this patch.
Signed-off-by: Niels de Vos <niels.devos@wincor-nixdorf.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: Ed L. Cashin <ecashin@coraid.com>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Roland Dreier <rolandd@cisco.com>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ATA over Ethernet: The semaphore emsgs_sema is used for signalling an
event, convert it in a completion.
Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently virtio_blk assumes a 512 byte hard sector size. This can cause
trouble / performance issues if the backing has a different block size
(like a file on an ext3 file system formatted with 4k block size or a dasd).
Lets add a feature flag that tells the guest to use a different hard sector
size than 512 byte.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
device_create() is race-prone, so use the race-free
device_create_drvdata() instead as device_create() is going away.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
According to the tests in do_initcalls(), the proper error code in case no
device is found is -ENODEV, not -ENXIO or -EIO.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The code that needed this #include was removed one year ago.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: rmk@arm.linux.org.uk
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Many people will see this option the first time now that it is in
drivers/block/
Make it clear that virtually noone needs it.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: rmk@arm.linux.org.uk
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
This patch moves hd.c to drivers/block/
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: rmk@arm.linux.org.uk
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (80 commits)
ide-floppy: fix unfortunate function naming
ide-tape: unify idetape_create_read/write_cmd
ide: add ide_pc_intr() helper
ide-{floppy,scsi}: read Status Register before stopping DMA engine
ide-scsi: add more debugging to idescsi_pc_intr()
ide-scsi: use pc->callback
ide-floppy: add more debugging to idefloppy_pc_intr()
ide-tape: always log debug info in idetape_pc_intr() if debugging is enabled
ide-tape: add ide_tape_io_buffers() helper
ide-tape: factor out DSC handling from idetape_pc_intr()
ide-{floppy,tape}: move checking of ->failed_pc to ->callback
ide: add ide_issue_pc() helper
ide: add PC_FLAG_DRQ_INTERRUPT pc flag
ide-scsi: move idescsi_map_sg() call out from idescsi_issue_pc()
ide: add ide_transfer_pc() helper
ide-scsi: set drive->scsi flag for devices handled by the driver
ide-{cd,floppy,tape}: remove checking for drive->scsi
ide: add PC_FLAG_ZIP_DRIVE pc flag
ide-tape: factor out waiting for good ireason from idetape_transfer_pc()
ide-tape: set PC_FLAG_DMA_IN_PROGRESS flag in idetape_transfer_pc()
...
pd_special_command uses blk_put_request with struct request on the
stack. As a result, blk_put_request needs a hack to catch a NULL
request_queue. This converts pd_special_command to use
blk_execute_rq.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (37 commits)
splice: fix generic_file_splice_read() race with page invalidation
ramfs: enable splice write
drivers/block/pktcdvd.c: avoid useless memset
cdrom: revert commit 22a9189 (cdrom: use kmalloced buffers instead of buffers on stack)
scsi: sr avoids useless buffer allocation
block: blk_rq_map_kern uses the bounce buffers for stack buffers
block: add blk_queue_update_dma_pad
DAC960: push down BKL
pktcdvd: push BKL down into driver
paride: push ioctl down into driver
block: use get_unaligned_* helpers
block: extend queue_flag bitops
block: request_module(): use format string
Add bvec_merge_data to handle stacked devices and ->merge_bvec()
block: integrity flags can't use bit ops on unsigned short
cmdfilter: extend default read filter
sg: fix odd style (extra parenthesis) introduced by cmd filter patch
block: add bounce support to blk_rq_map_user_iov
cfq-iosched: get rid of enable_idle being unused warning
allow userspace to modify scsi command filter on per device basis
...
This patch changes the way we determine the maximum number of outstanding
commands for each controller.
Most Smart Array controllers can support up to 1024 commands, the notable
exceptions are the E200 and E200i.
The next generation of controllers which were just added support a mode of
operation called Zero Memory Raid (ZMR). In this mode they only support
64 outstanding commands. In Full Function Raid (FFR) mode they support
1024.
We have been setting the queue depth by arbitrarily assigning some value
for each controller. We needed a better way to set the queue depth to
avoid lots of annoying "fifo full" messages. So we made the driver a
little smarter. We now read the config table and subtract 4 from the
returned value. The -4 is to allow some room for ioctl calls which are
not tracked the same way as io commands are tracked.
Please consider this for inclusion.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix regression in cciss driver that if no logical drives are configured,
no device nodes at all get created.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid the 'memset(...,0, ...)' before calling 'init_cdrom_command' because
this function already does it.
Signed-off-by: Christophe Jaillet <jaillet.christophe@wanadoo.fr>
Acked-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Push the lock_kernel down into the driver and switch to unlocked_ioctl
[akpm@linux-foundation.org: build fix]
Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Leaves us with lock_kernel for two methods. Also remove a bogus printk
with no printk level and return -ENOTTY not -EINVAL for correctness.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(Jens: added smp_lock.h include to pt.c, otherwise it wont compile because
of missing {un}lock_kernel() definition)
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
When devices are stacked, one device's merge_bvec_fn may need to perform
the mapping and then call one or more functions for its underlying devices.
The following bio fields are used:
bio->bi_sector
bio->bi_bdev
bio->bi_size
bio->bi_rw using bio_data_dir()
This patch creates a new struct bvec_merge_data holding a copy of those
fields to avoid having to change them directly in the struct bio when
going down the stack only to have to change them back again on the way
back up. (And then when the bio gets mapped for real, the whole
exercise gets repeated, but that's a problem for another day...)
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Avoid allocations causing swap activity on the resume path by
preventing the allocations from doing IO and allowing them
to access the emergency pools.
These paths are used when a frontend device is trying to connect
to its backend driver over Xenbus. These reconnections are triggered
on demand by IO, so by definition there is already IO underway,
and further IO would naturally deadlock. On resume, this path
is triggered when the running system tries to continue using its
devices. If it cannot then the resume will fail; to try to avoid this
we let it dip into the emergency pools.
[ linux-2.6.18-xen changesets e8b49cfbdac, fdb998e79aba ]
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Return 0 instead of -EINVAL if the blkfront device is a cdrom,
i.e. had the VDISK_CDROM attribute. This allows udev's cdrom_id
to correctly detect the device as a cdrom device.
[ Add blkif_ioctl, and CDROMMULTISESSION ]
[ linux-2.6.18-xen changeset d2bd9af846b5 ]
Signed-off-by: Christian Limpach <Christian.Limpach@xensource.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Add support for the next generation of HP Smart Array SAS/SATA
controllers. Shipping date is late Fall 2008.
Bump the driver version to 3.6.20 to reflect the new hardware support from
patch 1 of this set.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alias brd to rd in the hope of helping legacy users. Suggested by Jan.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hello Rusty,
sometimes it is useful to share a disk (e.g. usr). To avoid file system
corruption, the disk should be mounted read-only in that case. This patch
adds a new feature flag, that allows the host to specify, if the disk should
be considered read-only.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fix a modprobe virtio_blk ; rmmod virtio_blk ; modprobe virtio_blk crash; this
was basically because we weren't doing "del_gendisk()" in the remove path.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (moved del_gendisk up)
In 2.6.25, ramdisk devices show up in /proc/partitions, which is a
behaviour change from the old rd.c. Add GENHD_FL_SUPPRESS_PARTITION_INFO,
which was present in rd.c.
All kernels prior to 2.6.25 weren't displaying ramdisks in
/proc/partitions. Since there are many userspace tools using information
from /proc/partitions some of them may now behave incorrectly (I didn't
tested any though). For example before 2.6.25 /proc/partitions was empty
if no block devices like hard disks and such were detected by kernel. Now
all 16 ramdisks are always visible there. Some software may rely on such
information (I mean, on empty /proc/partitions).
There was quite similar situation back in 2004, and ramdisks were excluded
back from displaying. Thats why I called this a regression (maybe a bit
unfortunate). See this patch for info:
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.3-rc2/2.6.3-rc2-mm1/broken-out/nbd-proc-partitions-fix.patch
I also think that someone somewhere (long time ago) excluded ramdisks from
/proc/partitions for good reasons. It is possible that now such new
"feature" is harmless, but I think there are more chances that someone
will say "hey, /proc/partitions has changed, now my software doesn't work"
then "hey where did my new 2.6.25 feature go". nbd devices are also
excluded, maybe for very same (unknown to me) reasons.
Signed-off-by: Marcin Krol <hawk@pld-linux.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I don't use my IBM email address normally and people can find me in
CREDITS.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
According to the tests in do_initcalls(), the proper error code in case no
device is found is -ENODEV, not -ENXIO or -EIO.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
get_part() is fairly expensive, as it O(N) loops over partitions
to find the right one. In lots of normal IO paths we end up looking
up the partition twice, to make matters even worse. Change the
stat add code to accept a passed in partition instead.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (32 commits)
USB GADGET/PERIPHERAL: g_file_storage Bulk-Only Transport compliance, clear-feature ignore
USB GADGET/PERIPHERAL: g_file_storage Bulk-Only Transport compliance
usb_serial: some coding style fixes
USB: Remove redundant dependencies on USB_ATM.
USB: UHCI: disable remote wakeup when it's not needed
USB: OHCI: work around bogus compiler warning
USB: add Cypress c67x00 OTG controller HCD driver
USB: add Cypress c67x00 OTG controller core driver
USB: add Cypress c67x00 low level interface code
USB: airprime: unlock mutex instead of trying to lock it again
USB: storage: Update mailling list address
USB: storage: UNUSUAL_DEVS() for PanDigital Picture frame.
USB: Add the USB 2.0 extension descriptor.
USB: add more FTDI device ids
USB: fix cannot work usb storage when using ohci-sm501
usb: gadget zero timer init fix
usb: gadget zero style fixups (mostly whitespace)
usb serial gadget: CDC ACM fixes
usb: pxa27x_udc driver
USB: INTOVA Pixtreme camera mass storage device
...
I hoped to continue to ignore this problem or use libusual, but these
days it's simpler to work around than to deal with it. Let's attempt to
use bad residue devices and hope that upper level integrity checks catch
any problems (e.g. please use sha1sum on your backups).
Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The wodim says:
"close track/session scsi sendcmd: cmd timeout after 5.000 (480) s"
This happened because we ignored the supplied timeout and used 5s.
It's not completely correct to apply a timeout meant for the complete
command to any single URB, but we don't have many URBs per command, so
this is simple and works.
Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Rather than faking up some geometry, allow the backend to push the disk
geometry via virtio pci config option. Keep the old geo code around for
compatibility.
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (modified to single struct)
A recent proposed feature addition to the virtio block driver revealed
some flaws in the API: in particular, we assume that feature
negotiation is complete once a driver's probe function returns.
There is nothing in the API to require this, however, and even I
didn't notice when it was violated.
So instead, we require the driver to specify what features it supports
in a table, we can then move the feature negotiation into the virtio
core. The intersection of device and driver features are presented in
a new 'features' bitmap in the struct virtio_device.
Note that this highlights the difference between Linux unsigned-long
bitmaps where each unsigned long is in native endian, and a
straight-forward little-endian array of bytes.
Drivers can still remove feature bits in their probe routine if they
really have to.
API changes:
- dev->config->feature() no longer gets and acks a feature.
- drivers should advertise their features in the 'feature_table' field
- use virtio_has_feature() for extra sanity when checking feature bits
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
A recent proposed feature addition to the virtio block driver revealed
some flaws in the API, in particular how easy it is to break big
endian machines.
The virtio config space was originally chosen to be little-endian,
because we thought the config might be part of the PCI config space
for virtio_pci. It's actually a separate mmio region, so that
argument holds little water; as only x86 is currently using the virtio
mechanism, we can change this (but must do so now, before the
impending s390 merge).
API changes:
- __virtio_config_val() just becomes a striaght vdev->config_get() call.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Do not unregister the major at device remove, since there might be
another device instances around.
(qemu) pci_del 0 11
(qemu) ACPI: PCI interrupt for device 0000:00:0b.0 disabled
(qemu) pci_del 0 10
(qemu) ------------[ cut here ]------------
WARNING: at block/genhd.c:126 unregister_blkdev+0x74/0x9e()
ACPI: PCI interrupt for device 0000:00:0a.0 disabled
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Ron Minnich points out that a struct containing a char is not always
sizeof(char); simplest to remove the structure to avoid confusion.
Cc: "ron minnich" <rminnich@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Simply replace proc_create and further data assigned with proc_create_data.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds partition management for Block RAM Device (BRD).
This patch is done to keep in sync BRD and loop device drivers.
This patch adds a parameter to the module, max_part, to specify
the maximum number of partitions per RAM device.
Example:
# modprobe brd max_part=63
# ls -l /dev/ram*
brw-rw---- 1 root disk 1, 0 2008-04-03 13:39 /dev/ram0
brw-rw---- 1 root disk 1, 64 2008-04-03 13:39 /dev/ram1
brw-rw---- 1 root disk 1, 640 2008-04-03 13:39 /dev/ram10
brw-rw---- 1 root disk 1, 704 2008-04-03 13:39 /dev/ram11
brw-rw---- 1 root disk 1, 768 2008-04-03 13:39 /dev/ram12
brw-rw---- 1 root disk 1, 832 2008-04-03 13:39 /dev/ram13
brw-rw---- 1 root disk 1, 896 2008-04-03 13:39 /dev/ram14
brw-rw---- 1 root disk 1, 960 2008-04-03 13:39 /dev/ram15
brw-rw---- 1 root disk 1, 128 2008-04-03 13:39 /dev/ram2
brw-rw---- 1 root disk 1, 192 2008-04-03 13:39 /dev/ram3
brw-rw---- 1 root disk 1, 256 2008-04-03 13:39 /dev/ram4
brw-rw---- 1 root disk 1, 320 2008-04-03 13:39 /dev/ram5
brw-rw---- 1 root disk 1, 384 2008-04-03 13:39 /dev/ram6
brw-rw---- 1 root disk 1, 448 2008-04-03 13:39 /dev/ram7
brw-rw---- 1 root disk 1, 512 2008-04-03 13:39 /dev/ram8
brw-rw---- 1 root disk 1, 576 2008-04-03 13:39 /dev/ram9
# fdisk /dev/ram0
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): o
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-2, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-2, default 2): 2
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
# ls -l /dev/ram0*
brw-rw---- 1 root disk 1, 0 2008-04-03 13:40 /dev/ram0
brw-rw---- 1 root disk 1, 1 2008-04-03 13:40 /dev/ram0p1
# mkfs /dev/ram0p1
mke2fs 1.40-WIP (14-Nov-2006)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
4016 inodes, 16032 blocks
801 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=16515072
2 block groups
8192 blocks per group, 8192 fragments per group
2008 inodes per group
Superblock backups stored on blocks:
8193
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 26 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
# mount /dev/ram0p1 /mnt
df /mnt
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/ram0p1 15521 138 14582 1% /mnt
# ls -l /mnt
total 12
drwx------ 2 root root 12288 2008-04-03 13:41 lost+found
# umount /mnt
# rmmod brd
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: Skip I/O merges when disabled
block: add large command support
block: replace sizeof(rq->cmd) with BLK_MAX_CDB
ide: use blk_rq_init() to initialize the request
block: use blk_rq_init() to initialize the request
block: rename and export rq_init()
block: no need to initialize rq->cmd with blk_get_request
block: no need to initialize rq->cmd in prepare_flush_fn hook
block/blk-barrier.c:blk_ordered_cur_seq() mustn't be inline
block/elevator.c:elv_rq_merge_ok() mustn't be inline
block: make queue flags non-atomic
block: add dma alignment and padding support to blk_rq_map_kern
unexport blk_max_pfn
ps3disk: Remove superfluous cast
block: make rq_init() do a full memset()
relay: fix splice problem
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Ed L. Cashin <ecashin@coraid.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some drivers have duplicated unlikely() macros. IS_ERR() already has
unlikely() in itself.
This patch cleans up such pointless code.
Signed-off-by: Hirofumi Nakagawa <hnakagawa@miraclelinux.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jeff Garzik <jeff@garzik.org>
Cc: Paul Clements <paul.clements@steeleye.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the no longer used aoedev_isbusy().
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Permit the use of partitions with network block devices (NBD).
A new parameter is introduced to define how many partition we want to be able
to manage per network block device. This parameter is "max_part".
For instance, to manage 63 partitions / loop device, we will do:
[on the server side]
# nbd-server 1234 /dev/sdb
[on the client side]
# modprobe nbd max_part=63
# ls -l /dev/nbd*
brw-rw---- 1 root disk 43, 0 2008-03-25 11:14 /dev/nbd0
brw-rw---- 1 root disk 43, 64 2008-03-25 11:11 /dev/nbd1
brw-rw---- 1 root disk 43, 640 2008-03-25 11:11 /dev/nbd10
brw-rw---- 1 root disk 43, 704 2008-03-25 11:11 /dev/nbd11
brw-rw---- 1 root disk 43, 768 2008-03-25 11:11 /dev/nbd12
brw-rw---- 1 root disk 43, 832 2008-03-25 11:11 /dev/nbd13
brw-rw---- 1 root disk 43, 896 2008-03-25 11:11 /dev/nbd14
brw-rw---- 1 root disk 43, 960 2008-03-25 11:11 /dev/nbd15
brw-rw---- 1 root disk 43, 128 2008-03-25 11:11 /dev/nbd2
brw-rw---- 1 root disk 43, 192 2008-03-25 11:11 /dev/nbd3
brw-rw---- 1 root disk 43, 256 2008-03-25 11:11 /dev/nbd4
brw-rw---- 1 root disk 43, 320 2008-03-25 11:11 /dev/nbd5
brw-rw---- 1 root disk 43, 384 2008-03-25 11:11 /dev/nbd6
brw-rw---- 1 root disk 43, 448 2008-03-25 11:11 /dev/nbd7
brw-rw---- 1 root disk 43, 512 2008-03-25 11:11 /dev/nbd8
brw-rw---- 1 root disk 43, 576 2008-03-25 11:11 /dev/nbd9
# nbd-client localhost 1234 /dev/nbd0
Negotiation: ..size = 80418240KB
bs=1024, sz=80418240
-------NOTE, RFC: partition table is not automatically read.
The driver sets bdev->bd_invalidated to 1 to force the read of the partition
table of the device, but this is done only on an open of the device.
So we have to do a "touch /dev/nbdX" or something like that.
It can't be done from the nbd-client or nbd driver because at this
level we can't ask to read the partition table and to serve the request
at the same time (-> deadlock)
If someone has a better idea, I'm open to any suggestion.
-------NOTE, RFC
# fdisk -l /dev/nbd0
Disk /dev/nbd0: 82.3 GB, 82348277760 bytes
255 heads, 63 sectors/track, 10011 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/nbd0p1 * 1 9965 80043831 83 Linux
/dev/nbd0p2 9966 10011 369495 5 Extended
/dev/nbd0p5 9966 10011 369463+ 82 Linux swap / Solaris
# ls -l /dev/nbd0*
brw-rw---- 1 root disk 43, 0 2008-03-25 11:16 /dev/nbd0
brw-rw---- 1 root disk 43, 1 2008-03-25 11:16 /dev/nbd0p1
brw-rw---- 1 root disk 43, 2 2008-03-25 11:16 /dev/nbd0p2
brw-rw---- 1 root disk 43, 5 2008-03-25 11:16 /dev/nbd0p5
# mount /dev/nbd0p1 /mnt
# ls /mnt
bin dev initrd lost+found opt sbin sys var
boot etc initrd.img media proc selinux tmp vmlinuz
cdrom home lib mnt root srv usr
# umount /mnt
# nbd-client -d /dev/nbd0
# ls -l /dev/nbd0*
brw-rw---- 1 root disk 43, 0 2008-03-25 11:16 /dev/nbd0
-------NOTE
On "nbd-client -d", we can do an iocl(BLKRRPART) to update partition table:
as the size of the device is 0, we don't have to serve the partition manager
request (-> no deadlock).
-------NOTE
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch allows Network Block Device to be mounted locally (nbd-client to
nbd-server over 127.0.0.1).
It creates a kthread to avoid the deadlock described in NBD tools
documentation. So, if nbd-client hangs waiting for pages, the kblockd thread
can continue its work and free pages.
I have tested the patch to verify that it avoids the hang that always occurs
when writing to a localhost nbd connection. I have also tested to verify that
no performance degradation results from the additional thread and queue.
Patch originally from Laurent Vivier.
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Neil Brown <neilb@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use creation by full path: "driver/foo".
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduced between 2.6.25-rc2 and -rc3
drivers/block/xen-blkfront.c:139:5: warning: symbol 'blkif_getgeo' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace init_module and cleanup_module with static functions and
module_init/module_exit.
Signed-off-by: Jon Schindler <jkschind@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Any path needs to call it to initialize the request.
This is a preparation for large command support, which needs to
initialize the request in a proper way (that is, just doing a memset()
will not work).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
blk_get_request initializes rq->cmd (rq_init does) so the users don't
need to do that.
The purpose of this patch is to remove sizeof(rq->cmd) and &rq->cmd,
as a preparation for large command support, which changes rq->cmd from
the static array to a pointer. sizeof(rq->cmd) will not make sense and
&rq->cmd won't work.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The block layer initializes rq->cmd (queue_flush calls rq_init) so
prepare_flush_fn hooks don't need to do that.
The purpose of this patch is to remove sizeof(rq->cmd), as a
preparation for large command support, which changes rq->cmd from the
static array to a pointer. sizeof(rq->cmd) will not make sense.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
We can save some atomic ops in the IO path, if we clearly define
the rules of how to modify the queue flags.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
As ps3disk is a ppc64-only driver, sector_t equals to unsigned long, and the
cast is not needed.
Reuse in another (possibly 32-bit) driver is protected by the safety net called
`compiler warning' (with the cast, it may silently truncate to 32-bit).
If sector_t ever changes, we will get a compiler warning as well (with the
cast, we won't).
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Alter the block device ->direct_access() API to work with the new
get_xip_mem() API (that requires both kaddr and pfn are returned).
Some architectures will not do the right thing in their virt_to_page() for use
by XIP (to translate from the kernel virtual address returned by
direct_access(), to a user mappable pfn in XIP's page fault handler.
However, we can't switch it to just return the pfn and not the kaddr, because
we have no good way to get a kva from a pfn, and XIP requires the kva for its
read(2) and write(2) handlers. So we have to return both.
Signed-off-by: Jared Hulbert <jaredeh@gmail.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-mm@kvack.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Before getting merged, xen-blkfront was xenblk and
xen-netfront was xennet.
Temporarily adding compatibility module aliases
eases upgrades from older versions by e.g. allowing
mkinitrd to find the new version of the module.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add module aliases to support autoprobing modules
for xen frontend devices.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When the xen block frontend driver is built as a module the module load
is only synchronous up to the point where the frontend and the backend
become connected rather than when the disk is added.
This means that there can be a race on boot between loading the module and
loading the dm-* modules and doing the scan for LVM physical volumes (all
in the initrd). In the failure case the disk is not present until after the
scan for physical volumes is complete.
Taken from:
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/11483a00c017
Signed-off-by: Christian Limpach <Christian.Limpach@xensource.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
info->dev is never initialized to anything, so bdget(info->dev) is
meaningless. Get rid of info->dev, and use bdget_disk on the gendisk.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Frontends are expected to write their protocol ABI to xenstore. Since
the protocol ABI defaults to the backend's native ABI, things work
fine without that as long as the frontend's native ABI is identical to
the backend's native ABI. This is not the case for xen-blkfront
running 32-on-64, because its ABI differs between 32 and 64 bit, and
thus needs this fix.
Based on http://xenbits.xensource.com/xen-unstable.hg?rev/c545932a18f3
and http://xenbits.xensource.com/xen-unstable.hg?rev/ffe52263b430 by
Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
While looking at the implementation of the Ram backed block device
driver, I stumbled across a write-only local variable, which makes
little sense, so I assume it should actually work like this:
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-2.6.26' of git://git.kernel.dk/linux-2.6-block:
block: fix blk_register_queue() return value
block: fix memory hotplug and bouncing in block layer
block: replace remaining __FUNCTION__ occurrences
Kconfig: clean up block/Kconfig help descriptions
cciss: fix warning oops on rmmod of driver
cciss: Fix race between disk-adding code and interrupt handler
block: move the padding adjustment to blk_rq_map_sg
block: add bio_copy_user_iov support to blk_rq_map_user_iov
block: convert bio_copy_user to bio_copy_user_iov
loop: manage partitions in disk image
cdrom: use kmalloced buffers instead of buffers on stack
cdrom: make unregister_cdrom() return void
cdrom: use list_head for cdrom_device_info list
cdrom: protect cdrom_device_info list by mutex
cdrom: cleanup hardcoded error-code
cdrom: remove ifdef CONFIG_SYSCTL
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (202 commits)
[POWERPC] Fix compile breakage for 64-bit UP configs
[POWERPC] Define copy_siginfo_from_user32
[POWERPC] Add compat handler for PTRACE_GETSIGINFO
[POWERPC] i2c: Fix build breakage introduced by OF helpers
[POWERPC] Optimize fls64() on 64-bit processors
[POWERPC] irqtrace support for 64-bit powerpc
[POWERPC] Stacktrace support for lockdep
[POWERPC] Move stackframe definitions to common header
[POWERPC] Fix device-tree locking vs. interrupts
[POWERPC] Make pci_bus_to_host()'s struct pci_bus * argument const
[POWERPC] Remove unused __max_memory variable
[POWERPC] Simplify xics direct/lpar irq_host setup
[POWERPC] Use pseries_setup_i8259_cascade() in pseries_mpic_init_IRQ()
[POWERPC] Turn xics_setup_8259_cascade() into a generic pseries_setup_i8259_cascade()
[POWERPC] Move xics_setup_8259_cascade() into platforms/pseries/setup.c
[POWERPC] Use asm-generic/bitops/find.h in bitops.h
[POWERPC] 83xx: mpc8315 - fix USB UTMI Host setup
[POWERPC] 85xx: Fix the size of qe muram for MPC8568E
[POWERPC] 86xx: mpc86xx_hpcn - Temporarily accept old dts node identifier.
[POWERPC] 86xx: mark functions static, other minor cleanups
...
* Fix oops on cciss rmmod due to calling pci_free_consistent with
irqs disabled.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Fix race condition between cciss_init_one(), cciss_update_drive_info(),
and cciss_check_queues().
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch allows to use loop device with partitionned disk image.
Original behavior of loop is not modified.
A new parameter is introduced to define how many partition we want to be
able to manage per loop device. This parameter is "max_part".
For instance, to manage 63 partitions / loop device, we will do:
# modprobe loop max_part=63
# ls -l /dev/loop?*
brw-rw---- 1 root disk 7, 0 2008-03-05 14:55 /dev/loop0
brw-rw---- 1 root disk 7, 64 2008-03-05 14:55 /dev/loop1
brw-rw---- 1 root disk 7, 128 2008-03-05 14:55 /dev/loop2
brw-rw---- 1 root disk 7, 192 2008-03-05 14:55 /dev/loop3
brw-rw---- 1 root disk 7, 256 2008-03-05 14:55 /dev/loop4
brw-rw---- 1 root disk 7, 320 2008-03-05 14:55 /dev/loop5
brw-rw---- 1 root disk 7, 384 2008-03-05 14:55 /dev/loop6
brw-rw---- 1 root disk 7, 448 2008-03-05 14:55 /dev/loop7
And to attach a raw partitionned disk image, the original losetup is used:
# losetup -f etch.img
# ls -l /dev/loop?*
brw-rw---- 1 root disk 7, 0 2008-03-05 14:55 /dev/loop0
brw-rw---- 1 root disk 7, 1 2008-03-05 14:57 /dev/loop0p1
brw-rw---- 1 root disk 7, 2 2008-03-05 14:57 /dev/loop0p2
brw-rw---- 1 root disk 7, 5 2008-03-05 14:57 /dev/loop0p5
brw-rw---- 1 root disk 7, 64 2008-03-05 14:55 /dev/loop1
brw-rw---- 1 root disk 7, 128 2008-03-05 14:55 /dev/loop2
brw-rw---- 1 root disk 7, 192 2008-03-05 14:55 /dev/loop3
brw-rw---- 1 root disk 7, 256 2008-03-05 14:55 /dev/loop4
brw-rw---- 1 root disk 7, 320 2008-03-05 14:55 /dev/loop5
brw-rw---- 1 root disk 7, 384 2008-03-05 14:55 /dev/loop6
brw-rw---- 1 root disk 7, 448 2008-03-05 14:55 /dev/loop7
# mount /dev/loop0p1 /mnt
# ls /mnt
bench cdrom home lib mnt root srv usr
bin dev initrd lost+found opt sbin sys var
boot etc initrd.img media proc selinux tmp vmlinuz
# umount /mnt
# losetup -d /dev/loop0
Of course, the same behavior can be done using kpartx on a loop device,
but modifying loop avoids to stack several layers of block device (loop +
device mapper), this is a very light modification (40% of modifications
are to manage the new parameter).
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
This patch adds the missing include directive <linux/scatterlist.h> to the
cciss.c source file. This was discovered by our release team when building
the kernel for the Alpha architecture.
Errors were found as references to functions 'sg_init_table' and 'sg_page' do
not exist without the include for Alpha.
Signed-off-by: Mike Pagano <mpagano@gentoo.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When __blk_end_request returns nonzero, it means that the request was
not completely processed and some BIOs are still attached. Since we
have dequeued it by that time, it means leaking requests and hanging
processes, which is why BUG() was in there. In ub this happens if
a packet request ends normally, but with residue (e.g. when scsi_id
issues INQUIRY).
The fix is to make sure that arguments passed to __blk_end_request
are correct: the full request length and not just transferred length.
The transferred length is indicated to applications by adjusting
rq->data_len with old, unchanged code outside of this patch.
Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Cc: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Cc: Greg KH <greg@kroah.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NBD does not protect the nbd_device's socket from becoming NULL during
receives.
This closes a race with the NBD_CLEAR_SOCK ioctl (nbd-client -d) setting
the nbd_device's socket to NULL right before NBD calls sock_xmit.
Signed-off-by: Mike Snitzer <snitzer@gmail.com>
Cc: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P.J. Day proposed to use the macro FIELD_SIZEOF in replace of code
that matches its definition.
The modification was made using the following semantic patch
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@haskernel@
@@
#include <linux/kernel.h>
@depends on haskernel@
type t;
identifier f;
@@
- (sizeof(((t*)0)->f))
+ FIELD_SIZEOF(t, f)
@depends on haskernel@
type t;
identifier f;
@@
- sizeof(((t*)0)->f)
+ FIELD_SIZEOF(t, f)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
Revert "unexport bio_{,un}map_user"
relay: fix subbuf_splice_actor() adding too many pages
The ps2esdi driver was marked as BROKEN more than two years ago due to being
Fix up so that the virtio_blk devices in sysfs link correctly to their
block device. This then allows them to be detected by hal, etc
Signed-off-by: Jeremy Katz <katzj@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
no longer working for some time.
A driver that had been marked as BROKEN for such a long time seems to be
unlikely to be revived in the forseeable future.
But if anyone wants to ever revive this driver, the code is still present in
the older kernel releases.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Alan Cox <alan@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Floppy rmmod locks up when no such hardware was initialized, since there is
nobody to wake the remove code up. Remove the completion, because release is
called during platform_unregister anyway.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The iSeries viodasd drivers does some very strange things with
scatterlists, one of these causing a BUG_ON to trigger when
scatterlist debugging is enabled due to initializing the
scatterlist with memset instead of sg_init_table().
This fixes it by using sg_init_table(). The rest of the stuff
it does to that poor list is still pretty awful but it will work.
I may look into fixing things in a nicer way some other time.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On my system, pkt_open() consumes 584 bytes because the compiler decides to
inline lots of functions that would not normally be part of long call chains.
The following patch fixes that problem on my system.
Signed-off-by: Peter Osterlund <petero2@telia.com>
Cc: Nix <nix@esperi.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch removes the #define READ_AHEAD 1024 from the driver and uses the
block layer defaults, instead. We have found that under certain workloads
the setting can cause a disk connected to the e200 controller to go offline.
If the disk hiccups the link may try to downshift but the controller is
never notified that the link successfully completed the renegotiation.
We've also found that performance using the block layer default of 32 pages
was on par with the 1024 setting. We tried setting it to zero at one time
based on info from our firmware guys but that killed performance. Turns out
we were talking about 2 different read ahead settings.
Please consider this for inclusion.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
volumes
This patch allows us to display information about all of the logical volumes
configured on a particular controller without stepping on memory even when
there are many volumes (128 or more) configured.
Please consider this for inclusion.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
NBD doesn't work well with CFQ (or AS) schedulers, so let's default to
something else.
The two problems I have experienced with nbd and cfq are:
1) nbd hangs with cfq on RHEL 5 (2.6.18) -- this may well have been
fixed
There's a similar debian bug that has been filed as well:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=447638
There have been posts to nbd-general mailing list about problems with
cfq and nbd also.
2) nbd performs about 10% better (the last time I tested) with deadline
vs. cfq (the overhead of cfq doesn't provide much advantage to nbd [not
being a real disk], and you end up going through the I/O scheduler on
the nbd server anyway, so it makes sense that deadline is better with
nbd)
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The below implements the getgeo hook for Xen block devices. Extracted
from the xen-unstable tree where it has been used for ages.
It is useful to have because it allows things like grub2 (used by the
Debian installer images) to work in a guest domain without having to
sprinkle Xen specific hacks around the place.
Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current pmac32_defconfig fails to build with the following error:
Building modules, stage 2.
ERROR: "check_media_bay" [drivers/block/swim3.ko] undefined!
WARNING: modpost: Found 23 section mismatch(es).
To see full details build your kernel with:
'make CONFIG_DEBUG_SECTION_MISMATCH=y'
make[2]: *** [__modpost] Error 1
This patch fixes that.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the arbitrary 128 device limit for NBD. nbds_max can now be set to
any number. In certain scenarios where devices are used sparsely we have
run into the 128 device limit.
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I guess aoedev_init() can go away now.
Cc: Greg KH <greg@kroah.com>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update the year in the copyright notices.
Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton pointed out that the "too many targets" message in patch 2 could
be printed for failing GFP_ATOMIC allocations. This patch makes the messages
more specific.
Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The aoedev aoeminor member doesn't need a long format.
Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
An AoE target provides an estimate of the number of outstanding commands that
the AoE initiator can send before getting a response. The aoe_maxout
parameter provides a way to set an even lower limit. It will not allow a user
to use more outstanding commands than the target permits. If a user discovers
a problem with a large setting, this parameter provides a way for us to work
with them to debug the problem. We expect to improve the dynamic window
sizing algorithm and drop this parameter. For the time being, it is a
debugging aid.
Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
An aoe driver user who had about 70 AoE targets found that he was hitting a
BUG in sysfs_create_file because the aoe driver was trying to tell the kernel
about an AoE device more than once. Each AoE device was reachable by several
local network interfaces, and multiple ATA device indentify responses were
returning from that single device.
This patch eliminates a race condition so that aoe always informs the block
layer of a new AoE device once in the presence of multiple incoming ATA device
identify responses.
Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
What this Patch Does
Even before this recent series of 12 patches to 2.6.22-rc4, the aoe
driver was reusing a small set of skbs that were allocated once and
were only used for outbound AoE commands.
The network layer cannot be allowed to put_page on the data that is
still associated with a bio we haven't returned to the block layer,
so the aoe driver (even before the patch under discussion) is still
the owner of skbs that have been handed to the network layer for
transmission. We need to keep track of these skbs so that we can
free them, but by tracking them, we can also easily re-use them.
The new patch was a response to the behavior of certain network
drivers. We cannot reuse an skb that the network driver still has
in its transmit ring. Network drivers can defer transmit ring
cleanup and then use the state in the skb to determine how many data
segments to clean up in its transmit ring. The tg3 driver is one
driver that behaves in this way.
When the network driver defers cleanup of its transmit ring, the aoe
driver can find itself in a situation where it would like to send an
AoE command, and the AoE target is ready for more work, but the
network driver still has all of the pre-allocated skbs. In that
case, the new patch just calls alloc_skb, as you'd expect.
We don't want to get carried away, though. We try not to do
excessive allocation in the write path, so we cap the number of skbs
we dynamically allocate.
Probably calling it a "dynamic pool" is misleading. We were already
trying to use a small fixed-size set of pre-allocated skbs before
this patch, and this patch just provides a little headroom (with a
ceiling, though) to accomodate network drivers that hang onto skbs,
by allocating when needed. The d->skbpool_hd list of allocated skbs
is necessary so that we can free them later.
We didn't notice the need for this headroom until AoE targets got
fast enough.
Alternatives
If the network layer never did a put_page on the pages in the bio's
we get from the block layer, then it would be possible for us to
hand skbs to the network layer and forget about them, allowing the
network layer to free skbs itself (and thereby calling our own
skb->destructor callback function if we needed that). In that case
we could get rid of the pre-allocated skbs and also the
d->skbpool_hd, instead just calling alloc_skb every time we wanted
to transmit a packet. The slab allocator would effectively maintain
the list of skbs.
Besides a loss of CPU cache locality, the main concern with that
approach the danger that it would increase the likelihood of
deadlock when VM is trying to free pages by writing dirty data from
the page cache through the aoe driver out to persistent storage on
an AoE device. Right now we have a situation where we have
pre-allocation that corresponds to how much we use, which seems
ideal.
Of course, there's still the separate issue of receiving the packets
that tell us that a write has successfully completed on the AoE
target. When memory is low and VM is using AoE to flush dirty data
to free up pages, it would be perfect if there were a way for us to
register a fast callback that could recognize write command
completion responses. But I don't think the current problems with
the receive side of the situation are a justification for
exacerbating the problem on the transmit side.
Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When an AoE device is detected, the kernel is informed, and a new block device
is created. If the device is unused, the block device corresponding to remote
device that is no longer available may be removed from the system by telling
the aoe driver to "flush" its list of devices.
Without this patch, software like GPFS and LVM may attempt to read from AoE
devices that were discovered earlier but are no longer present, blocking until
the I/O attempt times out.
Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adam Richter suggested eliminating this goto.
Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
By returning unsigned long long, mac_addr does not generate compiler warnings
on 64-bit architectures.
Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A remote AoE device is something can process ATA commands and is identified by
an AoE shelf number and an AoE slot number. Such a device might have more
than one network interface, and it might be reachable by more than one local
network interface. This patch tracks the available network paths available to
each AoE device, allowing them to be used more efficiently.
Andrew Morton asked about the call to msleep_interruptible in the revalidate
function. Yes, if a signal is pending, then msleep_interruptible will not
return 0. That means we will not loop but will call aoenet_xmit with a NULL
skb, which is a noop. If the system is too low on memory or the aoe driver is
too low on frames, then the user can hit control-C to interrupt the attempt to
do a revalidate. I have added a comment to the code summarizing that.
Andrew Morton asked whether the allocation performed inside addtgt could use a
more relaxed allocation like GFP_KERNEL, but addtgt is called when the aoedev
lock has been locked with spin_lock_irqsave. It would be nice to allocate the
memory under fewer restrictions, but targets are only added when the device is
being discovered, and if the target can't be added right now, we can try again
in a minute when then next AoE config query broadcast goes out.
Andrew Morton pointed out that the "too many targets" message could be printed
for failing GFP_ATOMIC allocations. The last patch in this series makes the
messages more specific.
Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Support direct_access XIP method with brd.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a rewrite of the ramdisk block device driver.
The old one is really difficult because it effectively implements a block
device which serves data out of its own buffer cache. It relies on the dirty
bit being set, to pin its backing store in cache, however there are non
trivial paths which can clear the dirty bit (eg. try_to_free_buffers()),
which had recently lead to data corruption. And in general it is completely
wrong for a block device driver to do this.
The new one is more like a regular block device driver. It has no idea about
vm/vfs stuff. It's backing store is similar to the buffer cache (a simple
radix-tree of pages), but it doesn't know anything about page cache (the pages
in the radix tree are not pagecache pages).
There is one slight downside -- direct block device access and filesystem
metadata access goes through an extra copy and gets stored in RAM twice.
However, this downside is only slight, because the real buffercache of the
device is now reclaimable (because we're not playing crazy games with it), so
under memory intensive situations, footprint should effectively be the same --
maybe even a slight advantage to the new driver because it can also reclaim
buffer heads.
The fact that it now goes through all the regular vm/fs paths makes it
much more useful for testing, too.
text data bss dec hex filename
2837 849 384 4070 fe6 drivers/block/rd.o
3528 371 12 3911 f47 drivers/block/brd.o
Text is larger, but data and bss are smaller, making total size smaller.
A few other nice things about it:
- Similar structure and layout to the new loop device handlinag.
- Dynamic ramdisk creation.
- Runtime flexible buffer head size (because it is no longer part of the
ramdisk code).
- Boot / load time flexible ramdisk size, which could easily be extended
to a per-ramdisk runtime changeable size (eg. with an ioctl).
- Can use highmem for the backing store.
[akpm@linux-foundation.org: fix build]
[byron.bbradley@gmail.com: make rd_size non-static]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Byron Bradley <byron.bbradley@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>