Mostly tidying up code in preparation for some bigger changes
next time.
A few bug fixes tagged for -stable.
Main functionality change is that some RAID10 arrays can now
grow to use extra space that may have been made available on the
individual devices.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQIVAwUAT2bLBjnsnt1WYoG5AQKN3xAAv1UlR5Kem5WN7Ex4lmR9xj3lr9dbURYT
TtvrUuCy3pYYWdTuijb+IBqkbODF0kPDHIhUiBx9fXUfMavkp/b9heXS/vJ3pcH4
1j99NUbOGL/AylD1TPRV9TQxGTKhEjK3n26bY0t/amLc92bWJaytMO1B9cz38LN+
qx6ufpIepz4DPXXtPYpnkBR4cZ6L4/ZXQvjf5BqG6WfKwc+0Nyncg8ipYEqhBWy7
R7ztF5yPo0yl96Wopa2KG91OroWflmyZo1DNYcbUbKtbNGGtYC92GFadOH+wNupM
FnmXv10ivfVGU5w4SpshAwOg+4OSUqmWNsBxUhpYbf8ChbN+lOl0VZdH6UBxo19D
3SqZWT/yz4I4HYd5rtr35MXFdOeBNM++CHQs4F68BLA0B6OcHfWsA9bvly2tnBVx
iEBFPd277qWztUr8m6yz7AFf/0dgyXuIhuB3d7IkVrG5yG3FX6hPi2T0FSA33qMx
Lwi5w6O4DREg5tG09xEYEnXgXe+PnB8HsKb1U/m76XMQ0UScvX6dLA6934Vg+DCv
xf+AYqob0Tc/Op5I7h2PbVXq7DciNXwlX1WvM0m+TEaV+3fl1FB0VsCcANAV6JVn
uRLmvtePQRt0hxAog2p7OsumVnxMhbuEo5h8rJMKWM7IbhueKNoz+gBwpcFLzBmY
ygWc4peLQpE=
=MGuM
-----END PGP SIGNATURE-----
Merge tag 'md-3.4' of git://neil.brown.name/md
Pull md updates for 3.4 from Neil Brown:
"Mostly tidying up code in preparation for some bigger changes next
time.
A few bug fixes tagged for -stable.
Main functionality change is that some RAID10 arrays can now grow to
use extra space that may have been made available on the individual
devices."
Fixed up trivial conflicts with the k[un]map_atomic() cleanups in
drivers/md/bitmap.c.
* tag 'md-3.4' of git://neil.brown.name/md: (22 commits)
md: Add judgement bb->unacked_exist in function md_ack_all_badblocks().
md: fix clearing of the 'changed' flags for the bad blocks list.
md/bitmap: discard CHUNK_BLOCK_SHIFT macro
md/bitmap: remove unnecessary indirection when allocating.
md/bitmap: remove some pointless locking.
md/bitmap: change a 'goto' to a normal 'if' construct.
md/bitmap: move printing of bitmap status to bitmap.c
md/bitmap: remove some unused noise from bitmap.h
md/raid10 - support resizing some RAID10 arrays.
md/raid1: handle merge_bvec_fn in member devices.
md/raid10: handle merge_bvec_fn in member devices.
md: add proper merge_bvec handling to RAID0 and Linear.
md: tidy up rdev_for_each usage.
md/raid1,raid10: avoid deadlock during resync/recovery.
md/bitmap: ensure to load bitmap when creating via sysfs.
md: don't set md arrays to readonly on shutdown.
md: allow re-add to failed arrays.
md/raid5: use atomic_dec_return() instead of atomic_dec() and atomic_read().
md: Use existed macros instead of numbers
md/raid5: removed unused 'added_devices' variable.
...
Pull kmap_atomic cleanup from Cong Wang.
It's been in -next for a long time, and it gets rid of the (no longer
used) second argument to k[un]map_atomic().
Fix up a few trivial conflicts in various drivers, and do an "evil
merge" to catch some new uses that have come in since Cong's tree.
* 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
drbd: remove the second argument of k[un]map_atomic()
zcache: remove the second argument of k[un]map_atomic()
gma500: remove the second argument of k[un]map_atomic()
dm: remove the second argument of k[un]map_atomic()
tomoyo: remove the second argument of k[un]map_atomic()
sunrpc: remove the second argument of k[un]map_atomic()
rds: remove the second argument of k[un]map_atomic()
net: remove the second argument of k[un]map_atomic()
mm: remove the second argument of k[un]map_atomic()
lib: remove the second argument of k[un]map_atomic()
power: remove the second argument of k[un]map_atomic()
kdb: remove the second argument of k[un]map_atomic()
udf: remove the second argument of k[un]map_atomic()
ubifs: remove the second argument of k[un]map_atomic()
squashfs: remove the second argument of k[un]map_atomic()
reiserfs: remove the second argument of k[un]map_atomic()
ocfs2: remove the second argument of k[un]map_atomic()
ntfs: remove the second argument of k[un]map_atomic()
...
Pull trivial tree from Jiri Kosina:
"It's indeed trivial -- mostly documentation updates and a bunch of
typo fixes from Masanari.
There are also several linux/version.h include removals from Jesper."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (101 commits)
kcore: fix spelling in read_kcore() comment
constify struct pci_dev * in obvious cases
Revert "char: Fix typo in viotape.c"
init: fix wording error in mm_init comment
usb: gadget: Kconfig: fix typo for 'different'
Revert "power, max8998: Include linux/module.h just once in drivers/power/max8998_charger.c"
writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header
writeback: fix typo in the writeback_control comment
Documentation: Fix multiple typo in Documentation
tpm_tis: fix tis_lock with respect to RCU
Revert "media: Fix typo in mixer_drv.c and hdmi_drv.c"
Doc: Update numastat.txt
qla4xxx: Add missing spaces to error messages
compiler.h: Fix typo
security: struct security_operations kerneldoc fix
Documentation: broken URL in libata.tmpl
Documentation: broken URL in filesystems.tmpl
mtd: simplify return logic in do_map_probe()
mm: fix comment typo of truncate_inode_pages_range
power: bq27x00: Fix typos in comment
...
If there are no unacked bad blocks, then there is no point searching
for them to acknowledge them.
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
In super_1_sync (the first hunk) we need to clear 'changed' before
checking read_seqretry(), otherwise we might race with other code
adding a bad block and so won't retry later.
In md_update_sb (the second hunk), in the case where there is no
metadata (neither persistent nor external), we treat any bad blocks as
an error. However we need to clear the 'changed' flag before calling
md_ack_all_badblocks, else it won't do anything.
This patch is suitable for -stable release 3.0 and later.
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
Be redefining ->chunkshift as the shift from sectors to chunks rather
than bytes to chunks, we can just use "bitmap->chunkshift" which is
shorter than the macro call, and less indirect.
Signed-off-by: NeilBrown <neilb@suse.de>
These funcitons don't add anything useful except possibly the trace
points, and I don't think they are worth the extra indirection.
So remove them.
Signed-off-by: NeilBrown <neilb@suse.de>
There is nothing gained by holding a lock while we check if a pointer
is NULL or not. If there could be a race, then it could become NULL
immediately after the unlock - but there is no race here.
So just remove the locking.
Signed-off-by: NeilBrown <neilb@suse.de>
The use of a goto makes the control flow more obscure here.
So make it a normal:
if (x) {
Y;
}
No functional change.
Signed-off-by: NeilBrown <neilb@suse.de>
The part of /proc/mdstat which describes the bitmap should really
be generated by code in bitmap.c. So move it there.
Signed-off-by: NeilBrown <neilb@suse.de>
'resizing' an array in this context means making use of extra
space that has become available in component devices, not adding new
devices.
It also includes shrinking the array to take up less space of
component devices.
This is not supported for array with a 'far' layout. However
for 'near' and 'offset' layout arrays, adding and removing space at
the end of the devices is easy to support, and this patch provides
that support.
Signed-off-by: NeilBrown <neilb@suse.de>
Currently we don't honour merge_bvec_fn in member devices so if there
is one, we force all requests to be single-page at most.
This is not ideal.
So create a raid1 merge_bvec_fn to check that function in children
as well.
This introduces a small problem. There is no locking around calls
the ->merge_bvec_fn and subsequent calls to ->make_request. So a
device added between these could end up getting a request which
violates its merge_bvec_fn.
Currently the best we can do is synchronize_sched(). This will work
providing no preemption happens. If there is is preemption, we just
have to hope that new devices are largely consistent with old devices.
Signed-off-by: NeilBrown <neilb@suse.de>
Currently we don't honour merge_bvec_fn in member devices so if there
is one, we force all requests to be single-page at most.
This is not ideal.
So enhance the raid10 merge_bvec_fn to check that function in children
as well.
This introduces a small problem. There is no locking around calls
the ->merge_bvec_fn and subsequent calls to ->make_request. So a
device added between these could end up getting a request which
violates its merge_bvec_fn.
Currently the best we can do is synchronize_sched(). This will work
providing no preemption happens. If there is preemption, we just
have to hope that new devices are largely consistent with old devices.
Signed-off-by: NeilBrown <neilb@suse.de>
These personalities currently set a max request size of one page
when any member device has a merge_bvec_fn because they don't
bother to call that function.
This causes extra works in splitting and combining requests.
So make the extra effort to call the merge_bvec_fn when it exists
so that we end up with larger requests out the bottom.
Signed-off-by: NeilBrown <neilb@suse.de>
md.h has an 'rdev_for_each()' macro for iterating the rdevs in an
mddev. However it uses the 'safe' version of list_for_each_entry,
and so requires the extra variable, but doesn't include 'safe' in the
name, which is useful documentation.
Consequently some places use this safe version without needing it, and
many use an explicity list_for_each entry.
So:
- rename rdev_for_each to rdev_for_each_safe
- create a new rdev_for_each which uses the plain
list_for_each_entry,
- use the 'safe' version only where needed, and convert all other
list_for_each_entry calls to use rdev_for_each.
Signed-off-by: NeilBrown <neilb@suse.de>
If RAID1 or RAID10 is used under LVM or some other stacking
block device, it is possible to enter a deadlock during
resync or recovery.
This can happen if the upper level block device creates
two requests to the RAID1 or RAID10. The first request gets
processed, blocks recovery and queue requests for underlying
requests in current->bio_list. A resync request then starts
which will wait for those requests and block new IO.
But then the second request to the RAID1/10 will be attempted
and it cannot progress until the resync request completes,
which cannot progress until the underlying device requests complete,
which are on a queue behind that second request.
So allow that second request to proceed even though there is
a resync request about to start.
This is suitable for any -stable kernel.
Cc: stable@vger.kernel.org
Reported-by: Ray Morris <support@bettercgi.com>
Tested-by: Ray Morris <support@bettercgi.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When commit 69e51b449d (md/bitmap: separate out loading a bitmap...)
created bitmap_load, it missed calling it after bitmap_create when a
bitmap is created through the sysfs interface.
So if a bitmap is added this way, we don't allocate memory properly
and can crash.
This is suitable for any -stable release since 2.6.35.
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
It seems that with recent kernel, writeback can still be happening
while shutdown is happening, and consequently data can be written
after the md reboot notifier switches all arrays to read-only.
This causes a BUG.
So don't switch them to read-only - just mark them clean and
set 'safemode' to '2' which mean that immediately after any
write the array will be switch back to 'clean'.
This could result in the shutdown happening when array is marked
dirty, thus forcing a resync on reboot. However if you reboot
without performing a "sync" first, you get to keep both halves.
This is suitable for any stable kernel (though there might be some
conflicts with obvious fixes in earlier kernels).
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
When an array is failed (some data inaccessible) then there is no
point attempting to add a spare as it could not possibly be recovered.
However that may be value in re-adding a recently removed device.
e.g. if there is a write-intent-bitmap and it is clear, then access
to the data could be restored by this action.
So don't reject a re-add to a failed array for RAID10 and RAID5 (the
only arrays types that check for a failed array).
Signed-off-by: NeilBrown <neilb@suse.de>
Recent commit 4ca40c2ce0 (md/raid10: Allow replacement device ...)
added an smp_mb in end_sync_write.
This was to close a possible race with raid10_remove_disk.
However there is no such race as it is never attempted to remove a
disk while resync (or recovery) is happening.
so the smp_mb is just noise.
Signed-off-by: NeilBrown <neilb@suse.de>
Fix dm-raid flush support.
Both md and dm have support for flush, but the dm-raid target
forgot to set the flag to indicate that flushes should be
passed on. (Important for data integrity e.g. with writeback cache
enabled.)
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
The 'rebuild' parameter is used to rebuild individual devices in an
array (e.g. resynchronize a RAID1 device or recalculate a parity device
in higher RAID). The MD_CHANGE_DEVS flag must be set when this
parameter is given in order to write out the superblocks and make the
change take immediate effect. The code that handles new devices in
super_load already sets MD_CHANGE_DEVS and 'FirstUse'. (The 'FirstUse'
flag was being set as a special case for rebuilds in
super_init_validation.)
Add a condition for rebuilds in super_load to take care of both flags
without the special case in 'super_init_validation'.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Correct the number of mapped sectors shown on a thin device's
status line by decrementing td->mapped_blocks in __remove() each time
a block is removed.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
If dm_sm_disk_create() fails the superblock must be unlocked.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
The __open_device() error paths in __create_thin() and __create_snap()
incorrectly call __close_device() even if td was not initialized by
__open_device(). Remove this.
Also document __open_device() return values, remove a redundant
td->changed = 1 in __create_thin(), and insert an additional
safeguard against creating an already-existing device.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
The following BUG is hit on the first read that is submitted to a dm
flakey test device while the device is "down" if the corrupt_bio_byte
feature wasn't requested when the device's table was loaded.
Example DM table that will hit this BUG:
0 2097152 flakey 8:0 2048 0 30
This bug was introduced by commit a3998799fb
(dm flakey: add corrupt_bio_byte feature) in v3.1-rc1.
BUG: unable to handle kernel paging request at ffff8801cfce3fff
IP: [<ffffffffa008c233>] corrupt_bio_data+0x6e/0xae [dm_flakey]
PGD 1606063 PUD 0
Oops: 0002 [#1] SMP
...
Call Trace:
<IRQ>
[<ffffffffa008c2b5>] flakey_end_io+0x42/0x48 [dm_flakey]
[<ffffffffa00dca98>] clone_endio+0x54/0xb6 [dm_mod]
[<ffffffff81130587>] bio_endio+0x2d/0x2f
[<ffffffff811c819a>] req_bio_endio+0x96/0x9f
[<ffffffff811c94b9>] blk_update_request+0x1dc/0x3a9
[<ffffffff812f5ee2>] ? rcu_read_unlock+0x21/0x23
[<ffffffff811c96a6>] blk_update_bidi_request+0x20/0x6e
[<ffffffff811c9713>] blk_end_bidi_request+0x1f/0x5d
[<ffffffff811c978d>] blk_end_request+0x10/0x12
[<ffffffff8128f450>] scsi_io_completion+0x1e5/0x4b1
[<ffffffff812882a9>] scsi_finish_command+0xec/0xf5
[<ffffffff8128f830>] scsi_softirq_done+0xff/0x108
[<ffffffff811ce284>] blk_done_softirq+0x84/0x98
[<ffffffff81048d19>] __do_softirq+0xe3/0x1d5
[<ffffffff8138f83f>] ? _raw_spin_lock+0x62/0x69
[<ffffffff810997cf>] ? handle_irq_event+0x4c/0x61
[<ffffffff8139833c>] call_softirq+0x1c/0x30
[<ffffffff81003b37>] do_softirq+0x4b/0xa3
[<ffffffff81048a39>] irq_exit+0x53/0xca
[<ffffffff81398acd>] do_IRQ+0x9d/0xb4
[<ffffffff81390333>] common_interrupt+0x73/0x73
...
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.1+
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch fixes a crash by recognising discards in dm_io.
Currently dm_mirror can send REQ_DISCARD bios if running over a
discard-enabled device and without support in dm_io the system
crashes badly.
BUG: unable to handle kernel paging request at 00800000
IP: __bio_add_page.part.17+0xf5/0x1e0
...
bio_add_page+0x56/0x70
dispatch_io+0x1cf/0x240 [dm_mod]
? km_get_page+0x50/0x50 [dm_mod]
? vm_next_page+0x20/0x20 [dm_mod]
? mirror_flush+0x130/0x130 [dm_mirror]
dm_io+0xdc/0x2b0 [dm_mod]
...
Introduced in 2.6.38-rc1 by commit 5fc2ffeabb
(dm raid1: support discard).
Signed-off-by: Milan Broz <mbroz@redhat.com>
Cc: stable@kernel.org
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
If 'argc' is zero we jump to the 'out:' label, but this leaks the
(unused) memory that 'dm_split_args()' allocated for 'argv' if the
string being split consisted entirely of whitespace. Jump to the
'out_argv:' label instead to free up that memory.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2 relate to the recently added drive replacement.
One causes read error in RAID10 to sometimes be retried indefinitely.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQIVAwUAT1VI1znsnt1WYoG5AQK47Q//d51y5QCpABFNUcgIM626zJXlBWFUSmzU
wFOGXh5emN6/TWguzkiZwrvcspDmXMzz1zmJtGWixYb2jBpn2MHEN4uNz3Vq68w+
IYk/dJg/CG4+lzX+6IjiHOb3+TASRx94QZHJASx68vypqniAyikshqcbUeZBMTB0
Fu+sKqsOGYmwQfe6/vtRPVXY7DYK2dFDBRMFpmOl+o4Y2XxmmWzMw4Dg1RIEdtFS
Jo9GwLHTnlw2xoc0XooufeT0Q2KOpqi9T8L6Nj0ORwpgsFqgtZ/kIOoGU6qOpSri
ofLTrobVKMpjFtmiYVOp9TaBlPnd/TNX3E4WPLGNsAwYuRUFjq8evmJKjG+pOdeB
3ArxRKRJCaI2jnVhH+NpT7i/tpkEg/8a/BoOAihX+hM/8QkmsWluaRBOGMhpuuuc
1baPVTusi/zijO9cM8RGIXaQj5UG4s3LUpCIOIYdDyxsfmAH5KN1F2EPrU4NMME2
96THSshIZLkgAg5ICwtva0qoHlBlEclAlVAzEomT7R9KwHojEB1xUiyMmaIdMFoy
JjGFAMp2E5+KBKZ1eYEHjthPWCb+nZ3eYHUh0DOnEt4kASCXnn45GJREQkpkNIR/
HhDTS8vI743unKnbCtYFMxiw/9OXZbMkdoZhobg7lxcpoQlWJ+5ziOtACl0h0Kv8
+ET+Kp3W8K4=
=93ms
-----END PGP SIGNATURE-----
Merge tag 'md-3.3-fixes' of git://neil.brown.name/md
Pull md fixes from Neil Brown:
"Three fixes for md in 3.3-rc: Two relate to the recently added drive
replacement. One fixes the problem where a read error in RAID10 would
sometimes be retried indefinitely."
* tag 'md-3.3-fixes' of git://neil.brown.name/md:
md/raid10: fix assembling of arrays with replacement devices.
md/raid10: fix handling of error on last working device in array.
md/raid1: fix buglet in md_raid1_contested.
commit 56a2559bb6 (md/raid10: recognise replacements ...)
changed 'run' to set ->replacement or ->rdev depending on the
'Replacement' status if the device, but it didn't remove the
old unconditional setting of 'rdev'. So it was largely ineffective.
So remove that now.
Signed-off-by: NeilBrown <neilb@suse.de>
If we get a read error on the last working device in a RAID10 which
contains the target block, then we don't fail the device (which is
good) but we don't abort retries, which is wrong.
We end up in an infinite loop retrying the read on the one device.
This patch fixes the problem in two places:
1/ in raid10_end_read_request we don't even ask for a retry if this
was the last usable device. This is efficient but a little racy
and will sometimes retry when it should not.
2/ in handle_read_error we are careful to exclude any device from
retry which we tried to mark as faulty (that might have failed if
it was the last device). This is race-free but less efficient.
Signed-off-by: NeilBrown <neilb@suse.de>
Since we added 'replacement' capability, RAID1 can have twice
as many devices as ->raid_disks indicates.
So md_raid1_congested needs to check that many possible devices,
not just ->raid_disks many.
Signed-off-by: NeilBrown <neilb@suse.de>
1/ two small fixes to ensure we handle an interrupted resync properly.
2/ avoid loading the bitmap multiple times in dm-raid
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQIVAwUATzMdiTnsnt1WYoG5AQKICw/9H3Xf/3crCCVRQ+yzSdZ1ZJH24Rps9O6W
8dLFN4/Ng/qxymWUMrgHAMq5MEEz2M3i7W+j23lFv6Oce06y8GJ4PpoYY5xlXCgO
SIU1BaO1JFHxQn89EQtP3iOn4AOiZvX0GUObR0P8KO1mMnLmN7cg8J1kBfmQiBKu
aXcUqqNvcywoix6ve4O/xgnZjd4IExxqG3W8U7CaIwExUDwaLY4NckxJcIJbIYy9
iapOGMUdcyr6xm819V/xE2DyAtfFCtvAk1hfW/dM4QQctran3MzQIRFn9RW+CwHU
ComEnv5ti/7g//JPXQArUPk4xgRHrMhqFcmmD8rozJ6FJDi8vw2e0BXaRLVqa0mK
1qSZkr0Ot3nwAdILzgSbNXQ0Y5OJgc9OLX5GGlVibTW2VTJYFgA7jAsnqq8PAJC5
sU5h2K3jrSy2unGy6BxleL5D/wvREE5OBnW35TEB5TYbxjp1FLgn+BWp8FfFUYWT
Eb2cIyAj6cBFJ3ma1K0RH0dmS9cbNjuG+CLiApJOnEEsXzrp/4KnqOwg4672ewW3
m1Ue2Qv+0avaK3sVyT+qzuemc6b0ps/dix0gMXw2pYqXQWHquW5NdUJcgD2DKFSn
BB734nUP6KlPg0IFh1eehRHyVRLIAot/uBlUJ3bMx9xeYCkKa+twX90u6EmjTopP
JjLxNsf6c2I=
=k0Xz
-----END PGP SIGNATURE-----
Merge tag 'md-3.3-fixes' of git://neil.brown.name/md
Some simple md-related fixes.
1/ two small fixes to ensure we handle an interrupted resync properly.
2/ avoid loading the bitmap multiple times in dm-raid
* tag 'md-3.3-fixes' of git://neil.brown.name/md:
md: two small fixes to handling interrupt resync.
Prevent DM RAID from loading bitmap twice.
1/ If a resync is aborted we should record how far we got
(recovery_cp) the last request that we know has completed
(->curr_resync_completed) rather than the last request that was
submitted (->curr_resync).
2/ When a resync aborts we still want to update the metadata with
any changes, so set MD_CHANGE_DEVS even if we 'skip'.
Signed-off-by: NeilBrown <neilb@suse.de>
As 'make versioncheck' points out, drivers/md/dm-bufio.c has no need to include
linux/version.h, so this patch removes the unneeded include.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The life cycle of a device-mapper target is:
1) create
2) resume
3) suspend
*) possibly repeat from 2
4) destroy
The dm-raid target is unconditionally calling MD's bitmap_load function upon
every resume. If steps 2 & 3 above are repeated, bitmap_load is called
multiple times. It is only written to be called once; otherwise, it allocates
new memory for the bitmap (without freeing the old) and incrementing the number
of pages it thinks it has without zeroing first. This ultimately leads to
access beyond allocated memory and lost memory.
Simply avoiding the bitmap_load call upon resume is not sufficient. If the
target was suspended while the initial recovery was only partially complete,
it needs to be restarted when the target is resumed. This is why
'md_wakeup_thread' is called before issuing the 'mddev_resume'.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
* 'for-3.3/core' of git://git.kernel.dk/linux-block: (37 commits)
Revert "block: recursive merge requests"
block: Stop using macro stubs for the bio data integrity calls
blockdev: convert some macros to static inlines
fs: remove unneeded plug in mpage_readpages()
block: Add BLKROTATIONAL ioctl
block: Introduce blk_set_stacking_limits function
block: remove WARN_ON_ONCE() in exit_io_context()
block: an exiting task should be allowed to create io_context
block: ioc_cgroup_changed() needs to be exported
block: recursive merge requests
block, cfq: fix empty queue crash caused by request merge
block, cfq: move icq creation and rq->elv.icq association to block core
block, cfq: restructure io_cq creation path for io_context interface cleanup
block, cfq: move io_cq exit/release to blk-ioc.c
block, cfq: move icq cache management to block core
block, cfq: move io_cq lookup to blk-ioc.c
block, cfq: move cfqd->icq_list to request_queue and add request->elv.icq
block, cfq: reorganize cfq_io_context into generic and cfq specific parts
block: remove elevator_queue->ops
block: reorder elevator switch sequence
...
Fix up conflicts in:
- block/blk-cgroup.c
Switch from can_attach_task to can_attach
- block/cfq-iosched.c
conflict with now removed cic index changes (we now use q->id instead)
A logical volume can map to just part of underlying physical volume.
In this case, it must be treated like a partition.
Based on a patch from Alasdair G Kergon.
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One is a recently introduced regression that affects an unusual
configuration with a guaranteed BUG_ON. Has been tagged for -stable.
The other is minor missing functionality.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQIVAwUATwy6Sznsnt1WYoG5AQL+5A//TbTgElZaJ7IMY4q658afuRNtuWfevqTs
4EoSUvarwyZN20JxUd4dFTzLQ3nu3XVmwZsDBbpRs7+Dt2m7Efp4qytqrTxHb6SR
4gOr1KFXZi2rQFNpIg8T5+eyb+2VkbHGYffOtwS9TZnJqZZ4upffJi1EpJSfB1Bo
ilkO8wcaNKVWzTgnQo+JVOLQQyNENs12Xc0aLVA0dZC0a37qWJTbr75r7nrtLT7A
Gy783AG8JglRsr7AOVceqBVOpRonhFDz7G2hQqHg140m6i/GzDJrPtadovCtq7nt
U6/Po7qbOj5eOSGrVPwS1gJQOT7deAL7Eeu7dOpbzl1Cwysbhg63piMNyDs4P/gM
bFsR+LTbmZiaYs5G1oDwN/WTYLeq6cxY0IftShWdGoQwZRF/woJ7VAQSWNvHY8mg
Z+EbEL3sY40+8eBk7/umT0WxQ9wYjooS/9ZowQ2ktRmt82Dwv0LXzWNTSlwhWKKt
QBtv1er/psEKFqb2zDtlea8gDlKahaVNaiOK6RuY5CM5iBa4/zEmWVXS/i07LC7Z
cW9swD4J3AEKSolWHWYQJBmCsKy+rUp5t0mQ5e/O4+nhCDbfe+Da0OArg6b/ygMu
14RdyjOENxSqKi3IkCnToch+eNzCIm3ETaS2E0nSv996G+ShqsLtROOI9x9DXiu3
nyLxAnIVp8I=
=969y
-----END PGP SIGNATURE-----
Merge tag 'md-3.3-fixes' of git://neil.brown.name/md
Two bugfixes for md.
One is a recently introduced regression that affects an unusual
configuration with a guaranteed BUG_ON. Has been tagged for -stable.
The other is minor missing functionality.
* tag 'md-3.3-fixes' of git://neil.brown.name/md:
md/raid1: perform bad-block tests for WriteMostly devices too.
md: notify the 'degraded' sysfs attribute on failure.
Stacking driver queue limits are typically bounded exclusively by the
capabilities of the low level devices, not by the stacking driver
itself.
This patch introduces blk_set_stacking_limits() which has more liberal
metrics than the default queue limits function. This allows us to
inherit topology parameters from bottom devices without manually
tweaking the default limits in each driver prior to calling the stacking
function.
Since there is now a clear distinction between stacking and low-level
devices, blk_set_default_limits() has been modified to carry the more
conservative values that we used to manually set in
blk_queue_make_request().
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We normally try to avoid reading from write-mostly devices, but when
we do we really have to check for bad blocks and be sure not to
try reading them.
With the current code, best_good_sectors might not get set and that
causes zero-length read requests to be send down which is very
confusing.
This bug was introduced in commit d2eb35acfd and so the patch
is suitable for 3.1.x and 3.2.x
Reported-and-tested-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Reported-and-tested-by: Art -kwaak- van Breemen <ard@telegraafnet.nl>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel.org
We currently only 'notify' changes to the 'degraded' attribute
when it decreases, not when it increases.
Notifying on failure is a little awkward as it happen in
interrupt context.
So instead, notify when we remove the failed device from the array,
which is very soon afterwards.
Reported-and-tested-by: Mikhail Balabin <mbalabin@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>