Backport btrfs fixes queued for stable (rhbz 1217191)

This commit is contained in:
Josh Boyer 2015-06-11 14:34:42 -04:00
parent 508ca30eb4
commit 92024771d3
8 changed files with 387 additions and 0 deletions

View File

@ -0,0 +1,126 @@
From: Filipe Manana <fdmanana@suse.com>
Date: Tue, 31 Mar 2015 14:56:46 +0100
Subject: [PATCH] Btrfs: fix range cloning when same inode used as source and
destination
While searching for extents to clone we might find one where we only use
a part of it coming from its tail. If our destination inode is the same
the source inode, we end up removing the tail part of the extent item and
insert after a new one that point to the same extent with an adjusted
key file offset and data offset. After this we search for the next extent
item in the fs/subvol tree with a key that has an offset incremented by
one. But this second search leaves us at the new extent item we inserted
previously, and since that extent item has a non-zero data offset, it
it can make us call btrfs_drop_extents with an empty range (start == end)
which causes the following warning:
[23978.537119] WARNING: CPU: 6 PID: 16251 at fs/btrfs/file.c:550 btrfs_drop_extent_cache+0x43/0x385 [btrfs]()
(...)
[23978.557266] Call Trace:
[23978.557978] [<ffffffff81425fd9>] dump_stack+0x4c/0x65
[23978.559191] [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[23978.560699] [<ffffffffa047f0ea>] ? btrfs_drop_extent_cache+0x43/0x385 [btrfs]
[23978.562389] [<ffffffff8104544d>] warn_slowpath_null+0x1a/0x1c
[23978.563613] [<ffffffffa047f0ea>] btrfs_drop_extent_cache+0x43/0x385 [btrfs]
[23978.565103] [<ffffffff810e3a18>] ? time_hardirqs_off+0x15/0x28
[23978.566294] [<ffffffff81079ff8>] ? trace_hardirqs_off+0xd/0xf
[23978.567438] [<ffffffffa047f73d>] __btrfs_drop_extents+0x6b/0x9e1 [btrfs]
[23978.568702] [<ffffffff8107c03f>] ? trace_hardirqs_on+0xd/0xf
[23978.569763] [<ffffffff811441c0>] ? ____cache_alloc+0x69/0x2eb
[23978.570817] [<ffffffff81142269>] ? virt_to_head_page+0x9/0x36
[23978.571872] [<ffffffff81143c15>] ? cache_alloc_debugcheck_after.isra.42+0x16c/0x1cb
[23978.573466] [<ffffffff811420d5>] ? kmemleak_alloc_recursive.constprop.52+0x16/0x18
[23978.574962] [<ffffffffa0480d07>] btrfs_drop_extents+0x66/0x7f [btrfs]
[23978.576179] [<ffffffffa049aa35>] btrfs_clone+0x516/0xaf5 [btrfs]
[23978.577311] [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
[23978.578520] [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
[23978.580282] [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
(...)
[23978.591887] ---[ end trace 988ec2a653d03ed3 ]---
Then we attempt to insert a new extent item with a key that already
exists, which makes btrfs_insert_empty_item return -EEXIST resulting in
abortion of the current transaction:
[23978.594355] WARNING: CPU: 6 PID: 16251 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
(...)
[23978.622589] Call Trace:
[23978.623181] [<ffffffff81425fd9>] dump_stack+0x4c/0x65
[23978.624359] [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[23978.625573] [<ffffffffa044ab6c>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
[23978.626971] [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
[23978.628003] [<ffffffff8108a6c8>] ? vprintk_default+0x1d/0x1f
[23978.629138] [<ffffffffa044ab6c>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
[23978.630528] [<ffffffffa049ad1b>] btrfs_clone+0x7fc/0xaf5 [btrfs]
[23978.631635] [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
[23978.632886] [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
[23978.634119] [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
(...)
[23978.647714] ---[ end trace 988ec2a653d03ed4 ]---
This is wrong because we should not process the extent item that we just
inserted previously, and instead process the extent item that follows it
in the tree
For example for the test case I wrote for fstests:
bs=$((64 * 1024))
mkfs.btrfs -f -l $bs -O ^no-holes /dev/sdc
mount /dev/sdc /mnt
xfs_io -f -c "pwrite -S 0xaa $(($bs * 2)) $(($bs * 2))" /mnt/foo
$CLONER_PROG -s $((3 * $bs)) -d $((267 * $bs)) -l 0 /mnt/foo /mnt/foo
$CLONER_PROG -s $((217 * $bs)) -d $((95 * $bs)) -l 0 /mnt/foo /mnt/foo
The second clone call fails with -EEXIST, because when we process the
first extent item (offset 262144), we drop part of it (counting from the
end) and then insert a new extent item with a key greater then the key we
found. The next time we search the tree we search for a key with offset
262144 + 1, which leaves us at the new extent item we have just inserted
but we think it refers to an extent that we need to clone.
Fix this by ensuring the next search key uses an offset corresponding to
the offset of the key we found previously plus the data length of the
corresponding extent item. This ensures we skip new extent items that we
inserted and works for the case of implicit holes too (NO_HOLES feature).
A test case for fstests follows soon.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
---
fs/btrfs/ioctl.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2b4c5423672d..d79c599240a7 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3206,6 +3206,8 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
key.offset = off;
while (1) {
+ u64 next_key_min_offset;
+
/*
* note the key will change type as we walk through the
* tree.
@@ -3286,7 +3288,7 @@ process_slot:
} else if (key.offset >= off + len) {
break;
}
-
+ next_key_min_offset = key.offset + datal;
size = btrfs_item_size_nr(leaf, slot);
read_extent_buffer(leaf, buf,
btrfs_item_ptr_offset(leaf, slot),
@@ -3501,7 +3503,7 @@ process_slot:
break;
}
btrfs_release_path(path);
- key.offset++;
+ key.offset = next_key_min_offset;
}
ret = 0;

View File

@ -0,0 +1,55 @@
From: Chris Mason <clm@fb.com>
Date: Thu, 11 Jun 2015 18:06:51 +0200
Subject: [PATCH] Btrfs: fix regression in raid level conversion
Commit 2f0810880f082fa8ba66ab2c33b02e4ff9770a5e changed
btrfs_set_block_group_ro to avoid trying to allocate new chunks with the
new raid profile during conversion. This fixed failures when there was
no space on the drive to allocate a new chunk, but the metadata
reserves were sufficient to continue the conversion.
But this ended up causing a regression when the drive had plenty of
space to allocate new chunks, mostly because reduce_alloc_profile isn't
using the new raid profile.
Fixing btrfs_reduce_alloc_profile is a bigger patch. For now, do a
partial revert of 2f0810880, and don't error out if we hit ENOSPC.
Signed-off-by: Chris Mason <clm@fb.com>
Tested-by: Dave Sterba <dsterba@suse.cz>
Reported-by: Holger Hoffstaette <holger.hoffstaette@googlemail.com>
[adapted for stable kernel branch, v4.0.5]
Signed-off-by: David Sterba <dsterba@suse.cz>
---
fs/btrfs/extent-tree.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8b33da6ec3dd..63be2a96ed6a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8535,6 +8535,24 @@ int btrfs_set_block_group_ro(struct btrfs_root *root,
trans = btrfs_join_transaction(root);
if (IS_ERR(trans))
return PTR_ERR(trans);
+ /*
+ * if we are changing raid levels, try to allocate a corresponding
+ * block group with the new raid level.
+ */
+ alloc_flags = update_block_group_flags(root, cache->flags);
+ if (alloc_flags != cache->flags) {
+ ret = do_chunk_alloc(trans, root, alloc_flags,
+ CHUNK_ALLOC_FORCE);
+ /*
+ * ENOSPC is allowed here, we may have enough space
+ * already allocated at the new raid level to
+ * carry on
+ */
+ if (ret == -ENOSPC)
+ ret = 0;
+ if (ret < 0)
+ goto out;
+ }
ret = set_block_group_ro(cache, 0);
if (!ret)

View File

@ -0,0 +1,26 @@
From: Chris Mason <clm@fb.com>
Date: Sat, 11 Apr 2015 05:09:06 -0700
Subject: [PATCH] Btrfs: fix uninit variable in clone ioctl
Commit 0d97a64e0 creates a new variable but doesn't always set it up.
This puts it back to the original method (key.offset + 1) for the cases
not covered by Filipe's new logic.
Signed-off-by: Chris Mason <clm@fb.com>
---
fs/btrfs/ioctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index d79c599240a7..64e8fb639f72 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3206,7 +3206,7 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
key.offset = off;
while (1) {
- u64 next_key_min_offset;
+ u64 next_key_min_offset = key.offset + 1;
/*
* note the key will change type as we walk through the

View File

@ -0,0 +1,29 @@
From: Filipe Manana <fdmanana@suse.com>
Date: Mon, 2 Mar 2015 20:53:52 +0000
Subject: [PATCH] Btrfs: send, add missing check for dead clone root
After we locked the root's root item, a concurrent snapshot deletion
call might have set the dead flag on it. So check if the dead flag
is set and abort if it is, just like we do for the parent root.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
---
fs/btrfs/send.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index d6033f540cc7..6ec28f13659e 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -5855,7 +5855,8 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
clone_sources_to_rollback = i + 1;
spin_lock(&clone_root->root_item_lock);
clone_root->send_in_progress++;
- if (!btrfs_root_readonly(clone_root)) {
+ if (!btrfs_root_readonly(clone_root) ||
+ btrfs_root_dead(clone_root)) {
spin_unlock(&clone_root->root_item_lock);
srcu_read_unlock(&fs_info->subvol_srcu, index);
ret = -EPERM;

View File

@ -0,0 +1,58 @@
From: Filipe Manana <fdmanana@suse.com>
Date: Mon, 2 Mar 2015 20:53:53 +0000
Subject: [PATCH] Btrfs: send, don't leave without decrementing clone root's
send_progress
If the clone root was not readonly or the dead flag was set on it, we were
leaving without decrementing the root's send_progress counter (and before
we just incremented it). If a concurrent snapshot deletion was in progress
and ended up being aborted, it would be impossible to later attempt to
delete again the snapshot, since the root's send_in_progress counter could
never go back to 0.
We were also setting clone_sources_to_rollback to i + 1 too early - if we
bailed out because the clone root we got is not readonly or flagged as dead
we ended up later derreferencing a null pointer because we didn't assign
the clone root to sctx->clone_roots[i].root:
for (i = 0; sctx && i < clone_sources_to_rollback; i++)
btrfs_root_dec_send_in_progress(
sctx->clone_roots[i].root);
So just don't increment the send_in_progress counter if the root is readonly
or flagged as dead.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
---
fs/btrfs/send.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 6ec28f13659e..571de5a08fe7 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -5852,9 +5852,7 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
ret = PTR_ERR(clone_root);
goto out;
}
- clone_sources_to_rollback = i + 1;
spin_lock(&clone_root->root_item_lock);
- clone_root->send_in_progress++;
if (!btrfs_root_readonly(clone_root) ||
btrfs_root_dead(clone_root)) {
spin_unlock(&clone_root->root_item_lock);
@@ -5862,10 +5860,12 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
ret = -EPERM;
goto out;
}
+ clone_root->send_in_progress++;
spin_unlock(&clone_root->root_item_lock);
srcu_read_unlock(&fs_info->subvol_srcu, index);
sctx->clone_roots[i].root = clone_root;
+ clone_sources_to_rollback = i + 1;
}
vfree(clone_sources_tmp);
clone_sources_tmp = NULL;

View File

@ -0,0 +1,38 @@
From: Jeff Mahoney <jeffm@suse.com>
Date: Fri, 20 Mar 2015 14:02:09 -0400
Subject: [PATCH] btrfs: cleanup orphans while looking up default subvolume
Orphans in the fs tree are cleaned up via open_ctree and subvolume
orphans are cleaned via btrfs_lookup_dentry -- except when a default
subvolume is in use. The name for the default subvolume uses a manual
lookup that doesn't trigger orphan cleanup and needs to trigger it
manually as well. This doesn't apply to the remount case since the
subvolumes are cleaned up by walking the root radix tree.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
---
fs/btrfs/super.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 05fef198ff94..e477ed67a49a 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -901,6 +901,15 @@ find_root:
if (IS_ERR(new_root))
return ERR_CAST(new_root);
+ if (!(sb->s_flags & MS_RDONLY)) {
+ int ret;
+ down_read(&fs_info->cleanup_work_sem);
+ ret = btrfs_orphan_cleanup(new_root);
+ up_read(&fs_info->cleanup_work_sem);
+ if (ret)
+ return ERR_PTR(ret);
+ }
+
dir_id = btrfs_root_dirid(&new_root->root_item);
setup_root:
location.objectid = dir_id;

View File

@ -0,0 +1,34 @@
From: Chengyu Song <csong84@gatech.edu>
Date: Tue, 24 Mar 2015 18:12:56 -0400
Subject: [PATCH] btrfs: incorrect handling for fiemap_fill_next_extent return
fiemap_fill_next_extent returns 0 on success, -errno on error, 1 if this was
the last extent that will fit in user array. If 1 is returned, the return
value may eventually returned to user space, which should not happen, according
to manpage of ioctl.
Signed-off-by: Chengyu Song <csong84@gatech.edu>
Reviewed-by: David Sterba <dsterba@suse.cz>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
---
fs/btrfs/extent_io.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d688cfe5d496..782f3bc4651d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4514,8 +4514,11 @@ int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
}
ret = fiemap_fill_next_extent(fieinfo, em_start, disko,
em_len, flags);
- if (ret)
+ if (ret) {
+ if (ret == 1)
+ ret = 0;
goto out_free;
+ }
}
out_free:
free_extent_map(em);

View File

@ -798,6 +798,15 @@ Patch26223: block-discard-bdi_unregister-in-favour-of-bdi_destro.patch
#rhbz 1223051
Patch26230: Input-synaptics-add-min-max-quirk-for-Lenovo-S540.patch
#rhbz 1217191
Patch26231: Btrfs-send-add-missing-check-for-dead-clone-root.patch
Patch26232: Btrfs-send-don-t-leave-without-decrementing-clone-ro.patch
Patch26233: btrfs-incorrect-handling-for-fiemap_fill_next_extent.patch
Patch26234: btrfs-cleanup-orphans-while-looking-up-default-subvo.patch
Patch26235: Btrfs-fix-range-cloning-when-same-inode-used-as-sour.patch
Patch26236: Btrfs-fix-uninit-variable-in-clone-ioctl.patch
Patch26237: Btrfs-fix-regression-in-raid-level-conversion.patch
# END OF PATCH DEFINITIONS
%endif
@ -1565,6 +1574,15 @@ ApplyPatch block-discard-bdi_unregister-in-favour-of-bdi_destro.patch
#rhbz 1223051
ApplyPatch Input-synaptics-add-min-max-quirk-for-Lenovo-S540.patch
#rhbz 1217191
ApplyPatch Btrfs-send-add-missing-check-for-dead-clone-root.patch
ApplyPatch Btrfs-send-don-t-leave-without-decrementing-clone-ro.patch
ApplyPatch btrfs-incorrect-handling-for-fiemap_fill_next_extent.patch
ApplyPatch btrfs-cleanup-orphans-while-looking-up-default-subvo.patch
ApplyPatch Btrfs-fix-range-cloning-when-same-inode-used-as-sour.patch
ApplyPatch Btrfs-fix-uninit-variable-in-clone-ioctl.patch
ApplyPatch Btrfs-fix-regression-in-raid-level-conversion.patch
# END OF PATCH APPLICATIONS
%endif
@ -2376,6 +2394,9 @@ fi
# ||----w |
# || ||
%changelog
* Thu Jun 11 2015 Josh Boyer <jwboyer@fedoraproject.org>
- Backport btrfs fixes queued for stable (rhbz 1217191)
* Tue Jun 09 2015 Josh Boyer <jwboyer@fedoraproject.org>
- Fix touchpad for Thinkpad S540 (rhbz 1223051)