| From 1d512cb77bdbda80f0dd0620a3b260d697fd581d Mon Sep 17 00:00:00 2001 |
| From: Filipe Manana <fdmanana@suse.com> |
| Date: Mon, 9 Nov 2015 00:33:58 +0000 |
| Subject: Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow |
| |
| From: Filipe Manana <fdmanana@suse.com> |
| |
| commit 1d512cb77bdbda80f0dd0620a3b260d697fd581d upstream. |
| |
| If we are using the NO_HOLES feature, we have a tiny time window when |
| running delalloc for a nodatacow inode where we can race with a concurrent |
| link or xattr add operation leading to a BUG_ON. |
| |
| This happens because at run_delalloc_nocow() we end up casting a leaf item |
| of type BTRFS_INODE_[REF|EXTREF]_KEY or of type BTRFS_XATTR_ITEM_KEY to a |
| file extent item (struct btrfs_file_extent_item) and then analyse its |
| extent type field, which won't match any of the expected extent types |
| (values BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]) and therefore trigger an |
| explicit BUG_ON(1). |
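
To make the misinterpretation concrete, here is a minimal user-space sketch. It is not kernel code. The two packed structs are simplified mirrors of the leading fields of struct btrfs_file_extent_item and struct btrfs_inode_ref; the helper names (extent_type_of, is_valid_extent_type, misread_inode_ref) and the link name used below are hypothetical, and __attribute__((packed)) assumes GCC or Clang. It shows that reading the extent type field out of an inode reference item simply returns whatever byte sits at that offset, here a byte of the link name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Extent type values expected by the nocow delalloc code. */
#define FILE_EXTENT_INLINE   0
#define FILE_EXTENT_REG      1
#define FILE_EXTENT_PREALLOC 2

/* Simplified mirror of the first fields of a file extent item. */
struct file_extent_hdr {
	uint64_t generation;
	uint64_t ram_bytes;
	uint8_t  compression;
	uint8_t  encryption;
	uint16_t other_encoding;
	uint8_t  type;
} __attribute__((packed));

/* Simplified mirror of an inode reference item; the name bytes follow. */
struct inode_ref_hdr {
	uint64_t index;
	uint16_t name_len;
} __attribute__((packed));

/* Read the byte where a file extent item would store its type. */
static uint8_t extent_type_of(const uint8_t *item)
{
	return item[offsetof(struct file_extent_hdr, type)];
}

static int is_valid_extent_type(uint8_t t)
{
	return t == FILE_EXTENT_INLINE || t == FILE_EXTENT_REG ||
	       t == FILE_EXTENT_PREALLOC;
}

/* Build an inode ref item in a buffer and read it as a file extent item. */
static uint8_t misread_inode_ref(const char *name)
{
	uint8_t buf[64] = {0};
	struct inode_ref_hdr ref = { .index = 2 };

	ref.name_len = (uint16_t)strlen(name);
	assert(sizeof(ref) + ref.name_len <= sizeof(buf));
	memcpy(buf, &ref, sizeof(ref));
	memcpy(buf + sizeof(ref), name, ref.name_len);
	return extent_type_of(buf);
}
```

With the hypothetical link name "another_hardlink", the byte read as the extent type is a character of the name, which is none of the three valid extent types. Which byte is actually read in the kernel depends on the contents of the item that happens to be at that slot.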
| |
| The following sequence diagram shows how the race happens when running |
| delalloc for the no-cow range [4K, 8K[ of inode 257, given the following |
| neighbouring leaves: |
| |
| Leaf X (has N items) Leaf Y |
| |
| [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ] [ (257 EXTENT_DATA 8192), ... ] |
| slot N - 2 slot N - 1 slot 0 |
| |
| (Note the implicit hole for inode 257 regarding the [0, 8K[ range) |
| |
| CPU 1 CPU 2 |
| |
| run_delalloc_nocow() |
| btrfs_lookup_file_extent() |
| --> searches for a key with value |
| (257 EXTENT_DATA 4096) in the |
| fs/subvol tree |
| --> returns us a path with |
| path->nodes[0] == leaf X and |
| path->slots[0] == N |
| |
| because path->slots[0] is >= |
| btrfs_header_nritems(leaf X), it |
| calls btrfs_next_leaf() |
| |
| btrfs_next_leaf() |
| --> releases the path |
| |
| hard link added to our inode, |
| with key (257 INODE_REF 500) |
| added to the end of leaf X, |
| so leaf X now has N + 1 keys |
| |
| --> searches for the key |
| (257 INODE_REF 256), because |
| it was the last key in leaf X |
| before it released the path, |
| with path->keep_locks set to 1 |
| |
| --> ends up at leaf X again and |
| it verifies that the key |
| (257 INODE_REF 256) is no longer |
| the last key in the leaf, so it |
| returns with path->nodes[0] == |
| leaf X and path->slots[0] == N, |
| pointing to the new item with |
| key (257 INODE_REF 500) |
| |
| the loop iteration of run_delalloc_nocow() |
| does not break out of the loop and continues |
| because the key referenced in the path |
| at path->nodes[0] and path->slots[0] is |
| for inode 257, its type is < BTRFS_EXTENT_DATA_KEY |
| and its offset (500) is less than our delalloc |
| range's end (8192) |
| |
| the item pointed to by the path, an inode reference item, |
| is (incorrectly) interpreted as a file extent item and |
| we get an invalid extent type, leading to the BUG_ON(1): |
| |
| if (extent_type == BTRFS_FILE_EXTENT_REG || |
| extent_type == BTRFS_FILE_EXTENT_PREALLOC) { |
| (...) |
| } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) { |
| (...) |
| } else { |
| BUG_ON(1); |
| } |
| |
| The same can happen if an xattr is added concurrently and ends up having |
| a key with an offset smaller than the delalloc range's end. |
| |
| So fix this by skipping keys with a type smaller than |
| BTRFS_EXTENT_DATA_KEY. |
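
The effect of the new key checks can be sketched in user space as follows. This is an illustrative model based only on the hunk in this patch, not the kernel implementation: struct key, enum action, check_key_old(), check_key_new() and check_race_scenario() are hypothetical names, the key type constants mirror the btrfs key type values, and the WARN_ON_ONCE() for an objectid smaller than the inode number is reduced to a plain skip.

```c
#include <assert.h>
#include <stdint.h>

/* Key type values, mirroring the btrfs definitions. */
#define INODE_REF_KEY    12
#define XATTR_ITEM_KEY   24
#define EXTENT_DATA_KEY 108

/* Simplified stand-in for a btrfs key in CPU byte order. */
struct key {
	uint64_t objectid;
	uint8_t  type;
	uint64_t offset;
};

enum action {
	ACT_BREAK,        /* stop iterating over leaf items */
	ACT_SKIP,         /* advance to the next slot */
	ACT_FALL_THROUGH, /* go on to the file extent handling */
};

/* Check as it was before the patch. */
static enum action check_key_old(const struct key *k, uint64_t ino,
				 uint64_t end)
{
	if (k->objectid > ino || k->type > EXTENT_DATA_KEY ||
	    k->offset > end)
		return ACT_BREAK;
	return ACT_FALL_THROUGH;
}

/* Check as it is after the patch (the warning is omitted here). */
static enum action check_key_new(const struct key *k, uint64_t ino,
				 uint64_t end)
{
	if (k->objectid > ino)
		return ACT_BREAK;
	if (k->objectid < ino || k->type < EXTENT_DATA_KEY)
		return ACT_SKIP;
	if (k->type > EXTENT_DATA_KEY || k->offset > end)
		return ACT_BREAK;
	return ACT_FALL_THROUGH;
}

/* Replays the keys from the race described in the changelog. */
static void check_race_scenario(void)
{
	const uint64_t ino = 257, end = 8192;
	const struct key new_ref = { 257, INODE_REF_KEY, 500 };
	/* Offset 100 is an arbitrary value below the range end. */
	const struct key xattr = { 257, XATTR_ITEM_KEY, 100 };
	const struct key extent = { 257, EXTENT_DATA_KEY, 4096 };

	/* Before: non-extent items reach the file extent handling. */
	assert(check_key_old(&new_ref, ino, end) == ACT_FALL_THROUGH);
	assert(check_key_old(&xattr, ino, end) == ACT_FALL_THROUGH);
	/* After: they are skipped; real extent items are unaffected. */
	assert(check_key_new(&new_ref, ino, end) == ACT_SKIP);
	assert(check_key_new(&xattr, ino, end) == ACT_SKIP);
	assert(check_key_new(&extent, ino, end) == ACT_FALL_THROUGH);
}
```

Calling check_race_scenario() exercises the key (257 INODE_REF 500) from the diagram: the old check lets it fall through to the file extent handling, while the new check skips it.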
| |
| Signed-off-by: Filipe Manana <fdmanana@suse.com> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| fs/btrfs/inode.c | 10 ++++++++-- |
| 1 file changed, 8 insertions(+), 2 deletions(-) |
| |
| --- a/fs/btrfs/inode.c |
| +++ b/fs/btrfs/inode.c |
| @@ -1286,8 +1286,14 @@ next_slot: |
| num_bytes = 0; |
| btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]); |
| |
| - if (found_key.objectid > ino || |
| - found_key.type > BTRFS_EXTENT_DATA_KEY || |
| + if (found_key.objectid > ino) |
| + break; |
| + if (WARN_ON_ONCE(found_key.objectid < ino) || |
| + found_key.type < BTRFS_EXTENT_DATA_KEY) { |
| + path->slots[0]++; |
| + goto next_slot; |
| + } |
| + if (found_key.type > BTRFS_EXTENT_DATA_KEY || |
| found_key.offset > end) |
| break; |
| |