127 lines
5.5 KiB
Diff
127 lines
5.5 KiB
Diff
From: Filipe Manana <fdmanana@suse.com>
|
|
Date: Tue, 31 Mar 2015 14:56:46 +0100
|
|
Subject: [PATCH] Btrfs: fix range cloning when same inode used as source and
|
|
destination
|
|
|
|
While searching for extents to clone we might find one where we only use
|
|
a part of it coming from its tail. If our destination inode is the same
|
|
the source inode, we end up removing the tail part of the extent item and
|
|
insert after a new one that point to the same extent with an adjusted
|
|
key file offset and data offset. After this we search for the next extent
|
|
item in the fs/subvol tree with a key that has an offset incremented by
|
|
one. But this second search leaves us at the new extent item we inserted
|
|
previously, and since that extent item has a non-zero data offset, it
|
|
it can make us call btrfs_drop_extents with an empty range (start == end)
|
|
which causes the following warning:
|
|
|
|
[23978.537119] WARNING: CPU: 6 PID: 16251 at fs/btrfs/file.c:550 btrfs_drop_extent_cache+0x43/0x385 [btrfs]()
|
|
(...)
|
|
[23978.557266] Call Trace:
|
|
[23978.557978] [<ffffffff81425fd9>] dump_stack+0x4c/0x65
|
|
[23978.559191] [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
|
|
[23978.560699] [<ffffffffa047f0ea>] ? btrfs_drop_extent_cache+0x43/0x385 [btrfs]
|
|
[23978.562389] [<ffffffff8104544d>] warn_slowpath_null+0x1a/0x1c
|
|
[23978.563613] [<ffffffffa047f0ea>] btrfs_drop_extent_cache+0x43/0x385 [btrfs]
|
|
[23978.565103] [<ffffffff810e3a18>] ? time_hardirqs_off+0x15/0x28
|
|
[23978.566294] [<ffffffff81079ff8>] ? trace_hardirqs_off+0xd/0xf
|
|
[23978.567438] [<ffffffffa047f73d>] __btrfs_drop_extents+0x6b/0x9e1 [btrfs]
|
|
[23978.568702] [<ffffffff8107c03f>] ? trace_hardirqs_on+0xd/0xf
|
|
[23978.569763] [<ffffffff811441c0>] ? ____cache_alloc+0x69/0x2eb
|
|
[23978.570817] [<ffffffff81142269>] ? virt_to_head_page+0x9/0x36
|
|
[23978.571872] [<ffffffff81143c15>] ? cache_alloc_debugcheck_after.isra.42+0x16c/0x1cb
|
|
[23978.573466] [<ffffffff811420d5>] ? kmemleak_alloc_recursive.constprop.52+0x16/0x18
|
|
[23978.574962] [<ffffffffa0480d07>] btrfs_drop_extents+0x66/0x7f [btrfs]
|
|
[23978.576179] [<ffffffffa049aa35>] btrfs_clone+0x516/0xaf5 [btrfs]
|
|
[23978.577311] [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
|
|
[23978.578520] [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
|
|
[23978.580282] [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
|
|
(...)
|
|
[23978.591887] ---[ end trace 988ec2a653d03ed3 ]---
|
|
|
|
Then we attempt to insert a new extent item with a key that already
|
|
exists, which makes btrfs_insert_empty_item return -EEXIST resulting in
|
|
abortion of the current transaction:
|
|
|
|
[23978.594355] WARNING: CPU: 6 PID: 16251 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
|
|
(...)
|
|
[23978.622589] Call Trace:
|
|
[23978.623181] [<ffffffff81425fd9>] dump_stack+0x4c/0x65
|
|
[23978.624359] [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
|
|
[23978.625573] [<ffffffffa044ab6c>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
|
|
[23978.626971] [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
|
|
[23978.628003] [<ffffffff8108a6c8>] ? vprintk_default+0x1d/0x1f
|
|
[23978.629138] [<ffffffffa044ab6c>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
|
|
[23978.630528] [<ffffffffa049ad1b>] btrfs_clone+0x7fc/0xaf5 [btrfs]
|
|
[23978.631635] [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
|
|
[23978.632886] [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
|
|
[23978.634119] [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
|
|
(...)
|
|
[23978.647714] ---[ end trace 988ec2a653d03ed4 ]---
|
|
|
|
This is wrong because we should not process the extent item that we just
|
|
inserted previously, and instead process the extent item that follows it
|
|
in the tree
|
|
|
|
For example for the test case I wrote for fstests:
|
|
|
|
bs=$((64 * 1024))
|
|
mkfs.btrfs -f -l $bs -O ^no-holes /dev/sdc
|
|
mount /dev/sdc /mnt
|
|
|
|
xfs_io -f -c "pwrite -S 0xaa $(($bs * 2)) $(($bs * 2))" /mnt/foo
|
|
|
|
$CLONER_PROG -s $((3 * $bs)) -d $((267 * $bs)) -l 0 /mnt/foo /mnt/foo
|
|
$CLONER_PROG -s $((217 * $bs)) -d $((95 * $bs)) -l 0 /mnt/foo /mnt/foo
|
|
|
|
The second clone call fails with -EEXIST, because when we process the
|
|
first extent item (offset 262144), we drop part of it (counting from the
|
|
end) and then insert a new extent item with a key greater then the key we
|
|
found. The next time we search the tree we search for a key with offset
|
|
262144 + 1, which leaves us at the new extent item we have just inserted
|
|
but we think it refers to an extent that we need to clone.
|
|
|
|
Fix this by ensuring the next search key uses an offset corresponding to
|
|
the offset of the key we found previously plus the data length of the
|
|
corresponding extent item. This ensures we skip new extent items that we
|
|
inserted and works for the case of implicit holes too (NO_HOLES feature).
|
|
|
|
A test case for fstests follows soon.
|
|
|
|
Signed-off-by: Filipe Manana <fdmanana@suse.com>
|
|
Signed-off-by: Chris Mason <clm@fb.com>
|
|
---
|
|
fs/btrfs/ioctl.c | 6 ++++--
|
|
1 file changed, 4 insertions(+), 2 deletions(-)
|
|
|
|
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
|
|
index 2b4c5423672d..d79c599240a7 100644
|
|
--- a/fs/btrfs/ioctl.c
|
|
+++ b/fs/btrfs/ioctl.c
|
|
@@ -3206,6 +3206,8 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
|
|
key.offset = off;
|
|
|
|
while (1) {
|
|
+ u64 next_key_min_offset;
|
|
+
|
|
/*
|
|
* note the key will change type as we walk through the
|
|
* tree.
|
|
@@ -3286,7 +3288,7 @@ process_slot:
|
|
} else if (key.offset >= off + len) {
|
|
break;
|
|
}
|
|
-
|
|
+ next_key_min_offset = key.offset + datal;
|
|
size = btrfs_item_size_nr(leaf, slot);
|
|
read_extent_buffer(leaf, buf,
|
|
btrfs_item_ptr_offset(leaf, slot),
|
|
@@ -3501,7 +3503,7 @@ process_slot:
|
|
break;
|
|
}
|
|
btrfs_release_path(path);
|
|
- key.offset++;
|
|
+ key.offset = next_key_min_offset;
|
|
}
|
|
ret = 0;
|
|
|