Backport btrfs fixes queued for stable (rhbz 1217191)

2015-06-11 14:34:42 -04:00 · 92024771d3
parent 508ca30eb4
commit 92024771d3
8 changed files with 387 additions and 0 deletions
--- a/Btrfs-fix-range-cloning-when-same-inode-used-as-sour.patch
+++ b/Btrfs-fix-range-cloning-when-same-inode-used-as-sour.patch
@@ -0,0 +1,126 @@
+From: Filipe Manana <fdmanana@suse.com>
+Date: Tue, 31 Mar 2015 14:56:46 +0100
+Subject: [PATCH] Btrfs: fix range cloning when same inode used as source and
+ destination
+
+While searching for extents to clone we might find one where we only use
+a part of it coming from its tail. If our destination inode is the same as
+the source inode, we end up removing the tail part of the extent item and
+inserting after it a new one that points to the same extent with an adjusted
+key file offset and data offset. After this we search for the next extent
+item in the fs/subvol tree with a key that has an offset incremented by
+one. But this second search leaves us at the new extent item we inserted
+previously, and since that extent item has a non-zero data offset, it
+can make us call btrfs_drop_extents with an empty range (start == end)
+which causes the following warning:
+
+[23978.537119] WARNING: CPU: 6 PID: 16251 at fs/btrfs/file.c:550 btrfs_drop_extent_cache+0x43/0x385 [btrfs]()
+(...)
+[23978.557266] Call Trace:
+[23978.557978]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
+[23978.559191]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
+[23978.560699]  [<ffffffffa047f0ea>] ? btrfs_drop_extent_cache+0x43/0x385 [btrfs]
+[23978.562389]  [<ffffffff8104544d>] warn_slowpath_null+0x1a/0x1c
+[23978.563613]  [<ffffffffa047f0ea>] btrfs_drop_extent_cache+0x43/0x385 [btrfs]
+[23978.565103]  [<ffffffff810e3a18>] ? time_hardirqs_off+0x15/0x28
+[23978.566294]  [<ffffffff81079ff8>] ? trace_hardirqs_off+0xd/0xf
+[23978.567438]  [<ffffffffa047f73d>] __btrfs_drop_extents+0x6b/0x9e1 [btrfs]
+[23978.568702]  [<ffffffff8107c03f>] ? trace_hardirqs_on+0xd/0xf
+[23978.569763]  [<ffffffff811441c0>] ? ____cache_alloc+0x69/0x2eb
+[23978.570817]  [<ffffffff81142269>] ? virt_to_head_page+0x9/0x36
+[23978.571872]  [<ffffffff81143c15>] ? cache_alloc_debugcheck_after.isra.42+0x16c/0x1cb
+[23978.573466]  [<ffffffff811420d5>] ? kmemleak_alloc_recursive.constprop.52+0x16/0x18
+[23978.574962]  [<ffffffffa0480d07>] btrfs_drop_extents+0x66/0x7f [btrfs]
+[23978.576179]  [<ffffffffa049aa35>] btrfs_clone+0x516/0xaf5 [btrfs]
+[23978.577311]  [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
+[23978.578520]  [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
+[23978.580282]  [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
+(...)
+[23978.591887] ---[ end trace 988ec2a653d03ed3 ]---
+
+Then we attempt to insert a new extent item with a key that already
+exists, which makes btrfs_insert_empty_item return -EEXIST resulting in
+abortion of the current transaction:
+
+[23978.594355] WARNING: CPU: 6 PID: 16251 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
+(...)
+[23978.622589] Call Trace:
+[23978.623181]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
+[23978.624359]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
+[23978.625573]  [<ffffffffa044ab6c>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
+[23978.626971]  [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
+[23978.628003]  [<ffffffff8108a6c8>] ? vprintk_default+0x1d/0x1f
+[23978.629138]  [<ffffffffa044ab6c>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
+[23978.630528]  [<ffffffffa049ad1b>] btrfs_clone+0x7fc/0xaf5 [btrfs]
+[23978.631635]  [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
+[23978.632886]  [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
+[23978.634119]  [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
+(...)
+[23978.647714] ---[ end trace 988ec2a653d03ed4 ]---
+
+This is wrong because we should not process the extent item that we just
+inserted previously, and instead process the extent item that follows it
+in the tree.
+
+For example for the test case I wrote for fstests:
+
+   bs=$((64 * 1024))
+   mkfs.btrfs -f -l $bs -O ^no-holes /dev/sdc
+   mount /dev/sdc /mnt
+
+   xfs_io -f -c "pwrite -S 0xaa $(($bs * 2)) $(($bs * 2))" /mnt/foo
+
+   $CLONER_PROG -s $((3 * $bs)) -d $((267 * $bs)) -l 0 /mnt/foo /mnt/foo
+   $CLONER_PROG -s $((217 * $bs)) -d $((95 * $bs)) -l 0 /mnt/foo /mnt/foo
+
+The second clone call fails with -EEXIST, because when we process the
+first extent item (offset 262144), we drop part of it (counting from the
+end) and then insert a new extent item with a key greater than the key we
+found. The next time we search the tree we search for a key with offset
+262144 + 1, which leaves us at the new extent item we have just inserted
+but we think it refers to an extent that we need to clone.
+
+Fix this by ensuring the next search key uses an offset corresponding to
+the offset of the key we found previously plus the data length of the
+corresponding extent item. This ensures we skip new extent items that we
+inserted and works for the case of implicit holes too (NO_HOLES feature).
+
+A test case for fstests follows soon.
+
+Signed-off-by: Filipe Manana <fdmanana@suse.com>
+Signed-off-by: Chris Mason <clm@fb.com>
+---
+ fs/btrfs/ioctl.c | 6 ++++--
+ 1 file changed, 4 insertions(+), 2 deletions(-)
+
+diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
+index 2b4c5423672d..d79c599240a7 100644
+--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
+@@ -3206,6 +3206,8 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
+ 	key.offset = off;
+ 
+ 	while (1) {
++		u64 next_key_min_offset;
++
+ 		/*
+ 		 * note the key will change type as we walk through the
+ 		 * tree.
+@@ -3286,7 +3288,7 @@ process_slot:
+ 			} else if (key.offset >= off + len) {
+ 				break;
+ 			}
+-
++			next_key_min_offset = key.offset + datal;
+ 			size = btrfs_item_size_nr(leaf, slot);
+ 			read_extent_buffer(leaf, buf,
+ 					   btrfs_item_ptr_offset(leaf, slot),
+@@ -3501,7 +3503,7 @@ process_slot:
+ 				break;
+ 		}
+ 		btrfs_release_path(path);
+-		key.offset++;
++		key.offset = next_key_min_offset;
+ 	}
+ 	ret = 0;
+ 
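The search-key change in the patch above can be illustrated with a small userspace sketch. This is not kernel code: `struct extent_item`, `search_slot`, the two `next_offset_*` helpers and the offsets in `after_split` are hypothetical, simplified stand-ins chosen for illustration. They model only the key offset and data length of each item, assuming the item found at 262144 originally covered 131072 bytes and had its tail split off into a newly inserted item.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a file extent item: only the key
 * offset (file offset) and the data length are modeled. */
struct extent_item {
	uint64_t key_offset;
	uint64_t datal;
};

/* Return the index of the first item whose key offset is >= min_offset,
 * or n if there is none. Items must be sorted by key_offset. */
static inline size_t search_slot(const struct extent_item *items, size_t n,
				 uint64_t min_offset)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (items[i].key_offset >= min_offset)
			return i;
	return n;
}

/* Old behaviour: resume the search one byte past the key just processed. */
static inline uint64_t next_offset_old(uint64_t key_offset)
{
	return key_offset + 1;
}

/* Fixed behaviour: resume past the whole range the found item covered
 * before it was modified. */
static inline uint64_t next_offset_fixed(uint64_t key_offset, uint64_t datal)
{
	assert(datal > 0);
	return key_offset + datal;
}

/* Assumed tree state after processing the item found at 262144, whose
 * original data length was 131072 (illustrative values only). */
static const struct extent_item after_split[] = {
	{ 262144, 65536 },	/* original item with its tail removed */
	{ 327680, 65536 },	/* newly inserted item for the tail */
	{ 393216, 65536 },	/* the item that should be processed next */
};
```

Under these assumptions, `search_slot(after_split, 3, next_offset_old(262144))` lands on slot 1, the item that was just inserted, while `search_slot(after_split, 3, next_offset_fixed(262144, 131072))` lands on slot 2, the item that follows it.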
--- a/Btrfs-fix-regression-in-raid-level-conversion.patch
+++ b/Btrfs-fix-regression-in-raid-level-conversion.patch
@@ -0,0 +1,55 @@
+From: Chris Mason <clm@fb.com>
+Date: Thu, 11 Jun 2015 18:06:51 +0200
+Subject: [PATCH] Btrfs: fix regression in raid level conversion
+
+Commit 2f0810880f082fa8ba66ab2c33b02e4ff9770a5e changed
+btrfs_set_block_group_ro to avoid trying to allocate new chunks with the
+new raid profile during conversion.  This fixed failures when there was
+no space on the drive to allocate a new chunk, but the metadata
+reserves were sufficient to continue the conversion.
+
+But this ended up causing a regression when the drive had plenty of
+space to allocate new chunks, mostly because reduce_alloc_profile isn't
+using the new raid profile.
+
+Fixing btrfs_reduce_alloc_profile is a bigger patch.  For now, do a
+partial revert of 2f0810880, and don't error out if we hit ENOSPC.
+
+Signed-off-by: Chris Mason <clm@fb.com>
+Tested-by: Dave Sterba <dsterba@suse.cz>
+Reported-by: Holger Hoffstaette <holger.hoffstaette@googlemail.com>
+[adapted for stable kernel branch, v4.0.5]
+Signed-off-by: David Sterba <dsterba@suse.cz>
+---
+ fs/btrfs/extent-tree.c | 18 ++++++++++++++++++
+ 1 file changed, 18 insertions(+)
+
+diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
+index 8b33da6ec3dd..63be2a96ed6a 100644
+--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
+@@ -8535,6 +8535,24 @@ int btrfs_set_block_group_ro(struct btrfs_root *root,
+ 	trans = btrfs_join_transaction(root);
+ 	if (IS_ERR(trans))
+ 		return PTR_ERR(trans);
++	/*
++	 * if we are changing raid levels, try to allocate a corresponding
++	 * block group with the new raid level.
++	 */
++	alloc_flags = update_block_group_flags(root, cache->flags);
++	if (alloc_flags != cache->flags) {
++		ret = do_chunk_alloc(trans, root, alloc_flags,
++				     CHUNK_ALLOC_FORCE);
++		/*
++		 * ENOSPC is allowed here, we may have enough space
++		 * already allocated at the new raid level to
++		 * carry on
++		 */
++		if (ret == -ENOSPC)
++			ret = 0;
++		if (ret < 0)
++			goto out;
++	}
+ 
+ 	ret = set_block_group_ro(cache, 0);
+ 	if (!ret)
--- a/Btrfs-fix-uninit-variable-in-clone-ioctl.patch
+++ b/Btrfs-fix-uninit-variable-in-clone-ioctl.patch
@@ -0,0 +1,26 @@
+From: Chris Mason <clm@fb.com>
+Date: Sat, 11 Apr 2015 05:09:06 -0700
+Subject: [PATCH] Btrfs: fix uninit variable in clone ioctl
+
+Commit 0d97a64e0 creates a new variable but doesn't always set it up.
+This puts it back to the original method (key.offset + 1) for the cases
+not covered by Filipe's new logic.
+
+Signed-off-by: Chris Mason <clm@fb.com>
+---
+ fs/btrfs/ioctl.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
+index d79c599240a7..64e8fb639f72 100644
+--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
+@@ -3206,7 +3206,7 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
+ 	key.offset = off;
+ 
+ 	while (1) {
+-		u64 next_key_min_offset;
++		u64 next_key_min_offset = key.offset + 1;
+ 
+ 		/*
+ 		 * note the key will change type as we walk through the
--- a/Btrfs-send-add-missing-check-for-dead-clone-root.patch
+++ b/Btrfs-send-add-missing-check-for-dead-clone-root.patch
@@ -0,0 +1,29 @@
+From: Filipe Manana <fdmanana@suse.com>
+Date: Mon, 2 Mar 2015 20:53:52 +0000
+Subject: [PATCH] Btrfs: send, add missing check for dead clone root
+
+After we locked the root's root item, a concurrent snapshot deletion
+call might have set the dead flag on it. So check if the dead flag
+is set and abort if it is, just like we do for the parent root.
+
+Signed-off-by: Filipe Manana <fdmanana@suse.com>
+Reviewed-by: David Sterba <dsterba@suse.cz>
+Signed-off-by: Chris Mason <clm@fb.com>
+---
+ fs/btrfs/send.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
+index d6033f540cc7..6ec28f13659e 100644
+--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
+@@ -5855,7 +5855,8 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
+ 			clone_sources_to_rollback = i + 1;
+ 			spin_lock(&clone_root->root_item_lock);
+ 			clone_root->send_in_progress++;
+-			if (!btrfs_root_readonly(clone_root)) {
++			if (!btrfs_root_readonly(clone_root) ||
++			    btrfs_root_dead(clone_root)) {
+ 				spin_unlock(&clone_root->root_item_lock);
+ 				srcu_read_unlock(&fs_info->subvol_srcu, index);
+ 				ret = -EPERM;
--- a/Btrfs-send-don-t-leave-without-decrementing-clone-ro.patch
+++ b/Btrfs-send-don-t-leave-without-decrementing-clone-ro.patch
@@ -0,0 +1,58 @@
+From: Filipe Manana <fdmanana@suse.com>
+Date: Mon, 2 Mar 2015 20:53:53 +0000
+Subject: [PATCH] Btrfs: send, don't leave without decrementing clone root's
+ send_progress
+
+If the clone root was not readonly or the dead flag was set on it, we were
+leaving without decrementing the root's send_in_progress counter (which we
+had just incremented). If a concurrent snapshot deletion was in progress
+and ended up being aborted, it would be impossible to later attempt to
+delete the snapshot again, since the root's send_in_progress counter could
+never go back to 0.
+
+We were also setting clone_sources_to_rollback to i + 1 too early - if we
+bailed out because the clone root we got is not readonly or flagged as dead
+we ended up later dereferencing a null pointer because we didn't assign
+the clone root to sctx->clone_roots[i].root:
+
+		for (i = 0; sctx && i < clone_sources_to_rollback; i++)
+			btrfs_root_dec_send_in_progress(
+					sctx->clone_roots[i].root);
+
+So just don't increment the send_in_progress counter if the root is not
+readonly or is flagged as dead.
+
+Signed-off-by: Filipe Manana <fdmanana@suse.com>
+Reviewed-by: David Sterba <dsterba@suse.cz>
+Signed-off-by: Chris Mason <clm@fb.com>
+---
+ fs/btrfs/send.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
+index 6ec28f13659e..571de5a08fe7 100644
+--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
+@@ -5852,9 +5852,7 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
+ 				ret = PTR_ERR(clone_root);
+ 				goto out;
+ 			}
+-			clone_sources_to_rollback = i + 1;
+ 			spin_lock(&clone_root->root_item_lock);
+-			clone_root->send_in_progress++;
+ 			if (!btrfs_root_readonly(clone_root) ||
+ 			    btrfs_root_dead(clone_root)) {
+ 				spin_unlock(&clone_root->root_item_lock);
+@@ -5862,10 +5860,12 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
+ 				ret = -EPERM;
+ 				goto out;
+ 			}
++			clone_root->send_in_progress++;
+ 			spin_unlock(&clone_root->root_item_lock);
+ 			srcu_read_unlock(&fs_info->subvol_srcu, index);
+ 
+ 			sctx->clone_roots[i].root = clone_root;
++			clone_sources_to_rollback = i + 1;
+ 		}
+ 		vfree(clone_sources_tmp);
+ 		clone_sources_tmp = NULL;
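The ordering that the two send patches establish (check first, then increment, then record the root, then widen the rollback range) can be sketched in plain userspace C. Everything here is a hypothetical model: `struct toy_root`, `take_clone_roots` and `rollback_clone_roots` are illustrative names, and the locking done in the real code (`root_item_lock`, `subvol_srcu`) is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a root: just the two flags that are checked
 * and the counter that is taken. */
struct toy_root {
	int readonly;
	int dead;
	int send_in_progress;
};

/* Take a reference on each candidate root. The counter is incremented only
 * after the root passed the checks, and *to_rollback grows only after the
 * root was stored in taken[], so a later rollback never touches an
 * unassigned slot. */
static inline int take_clone_roots(struct toy_root **candidates, size_t n,
				   struct toy_root **taken,
				   size_t *to_rollback)
{
	size_t i;

	assert(candidates && taken && to_rollback);
	*to_rollback = 0;
	for (i = 0; i < n; i++) {
		struct toy_root *root = candidates[i];

		if (!root->readonly || root->dead)
			return -EPERM;	/* nothing taken for this root */
		root->send_in_progress++;
		taken[i] = root;
		*to_rollback = i + 1;
	}
	return 0;
}

/* Drop the references for exactly the roots that were taken. */
static inline void rollback_clone_roots(struct toy_root **taken,
					size_t to_rollback)
{
	size_t i;

	for (i = 0; i < to_rollback; i++)
		taken[i]->send_in_progress--;
}
```

With one valid root followed by a dead one, `take_clone_roots` fails with `-EPERM` and leaves `*to_rollback` at 1, so the rollback only decrements the first root and every counter returns to 0.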
--- a/btrfs-cleanup-orphans-while-looking-up-default-subvo.patch
+++ b/btrfs-cleanup-orphans-while-looking-up-default-subvo.patch
@@ -0,0 +1,38 @@
+From: Jeff Mahoney <jeffm@suse.com>
+Date: Fri, 20 Mar 2015 14:02:09 -0400
+Subject: [PATCH] btrfs: cleanup orphans while looking up default subvolume
+
+Orphans in the fs tree are cleaned up via open_ctree and subvolume
+orphans are cleaned via btrfs_lookup_dentry -- except when a default
+subvolume is in use.  The name for the default subvolume uses a manual
+lookup that doesn't trigger orphan cleanup and needs to trigger it
+manually as well. This doesn't apply to the remount case since the
+subvolumes are cleaned up by walking the root radix tree.
+
+Signed-off-by: Jeff Mahoney <jeffm@suse.com>
+Reviewed-by: David Sterba <dsterba@suse.cz>
+Signed-off-by: Chris Mason <clm@fb.com>
+---
+ fs/btrfs/super.c | 9 +++++++++
+ 1 file changed, 9 insertions(+)
+
+diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
+index 05fef198ff94..e477ed67a49a 100644
+--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
+@@ -901,6 +901,15 @@ find_root:
+ 	if (IS_ERR(new_root))
+ 		return ERR_CAST(new_root);
+ 
++	if (!(sb->s_flags & MS_RDONLY)) {
++		int ret;
++		down_read(&fs_info->cleanup_work_sem);
++		ret = btrfs_orphan_cleanup(new_root);
++		up_read(&fs_info->cleanup_work_sem);
++		if (ret)
++			return ERR_PTR(ret);
++	}
++
+ 	dir_id = btrfs_root_dirid(&new_root->root_item);
+ setup_root:
+ 	location.objectid = dir_id;
--- a/btrfs-incorrect-handling-for-fiemap_fill_next_extent.patch
+++ b/btrfs-incorrect-handling-for-fiemap_fill_next_extent.patch
@@ -0,0 +1,34 @@
+From: Chengyu Song <csong84@gatech.edu>
+Date: Tue, 24 Mar 2015 18:12:56 -0400
+Subject: [PATCH] btrfs: incorrect handling for fiemap_fill_next_extent return
+
+fiemap_fill_next_extent returns 0 on success, -errno on error, and 1 if this
+was the last extent that will fit in the user array. If 1 is returned, that
+value may eventually be returned to user space, which should not happen
+according to the ioctl manpage.
+
+Signed-off-by: Chengyu Song <csong84@gatech.edu>
+Reviewed-by: David Sterba <dsterba@suse.cz>
+Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
+Signed-off-by: Chris Mason <clm@fb.com>
+---
+ fs/btrfs/extent_io.c | 5 ++++-
+ 1 file changed, 4 insertions(+), 1 deletion(-)
+
+diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
+index d688cfe5d496..782f3bc4651d 100644
+--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
+@@ -4514,8 +4514,11 @@ int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
+ 		}
+ 		ret = fiemap_fill_next_extent(fieinfo, em_start, disko,
+ 					      em_len, flags);
+-		if (ret)
++		if (ret) {
++			if (ret == 1)
++				ret = 0;
+ 			goto out_free;
++		}
+ 	}
+ out_free:
+ 	free_extent_map(em);
--- a/kernel.spec
+++ b/kernel.spec
@@ -798,6 +798,15 @@ Patch26223: block-discard-bdi_unregister-in-favour-of-bdi_destro.patch
 #rhbz 1223051
 Patch26230: Input-synaptics-add-min-max-quirk-for-Lenovo-S540.patch

+#rhbz 1217191
+Patch26231: Btrfs-send-add-missing-check-for-dead-clone-root.patch
+Patch26232: Btrfs-send-don-t-leave-without-decrementing-clone-ro.patch
+Patch26233: btrfs-incorrect-handling-for-fiemap_fill_next_extent.patch
+Patch26234: btrfs-cleanup-orphans-while-looking-up-default-subvo.patch
+Patch26235: Btrfs-fix-range-cloning-when-same-inode-used-as-sour.patch
+Patch26236: Btrfs-fix-uninit-variable-in-clone-ioctl.patch
+Patch26237: Btrfs-fix-regression-in-raid-level-conversion.patch
+
 # END OF PATCH DEFINITIONS

 %endif
@@ -1565,6 +1574,15 @@ ApplyPatch block-discard-bdi_unregister-in-favour-of-bdi_destro.patch
 #rhbz 1223051
 ApplyPatch Input-synaptics-add-min-max-quirk-for-Lenovo-S540.patch

+#rhbz 1217191
+ApplyPatch Btrfs-send-add-missing-check-for-dead-clone-root.patch
+ApplyPatch Btrfs-send-don-t-leave-without-decrementing-clone-ro.patch
+ApplyPatch btrfs-incorrect-handling-for-fiemap_fill_next_extent.patch
+ApplyPatch btrfs-cleanup-orphans-while-looking-up-default-subvo.patch
+ApplyPatch Btrfs-fix-range-cloning-when-same-inode-used-as-sour.patch
+ApplyPatch Btrfs-fix-uninit-variable-in-clone-ioctl.patch
+ApplyPatch Btrfs-fix-regression-in-raid-level-conversion.patch
+
 # END OF PATCH APPLICATIONS

 %endif
@@ -2376,6 +2394,9 @@ fi
 #                 ||----w |
 #                 ||     ||
 %changelog
+* Thu Jun 11 2015 Josh Boyer <jwboyer@fedoraproject.org>
+- Backport btrfs fixes queued for stable (rhbz 1217191)
+
 * Tue Jun 09 2015 Josh Boyer <jwboyer@fedoraproject.org>
 - Fix touchpad for Thinkpad S540 (rhbz 1223051)