| From b72c3aba09a53fc7c1824250d71180ca154517a7 Mon Sep 17 00:00:00 2001 |
| From: Qu Wenruo <wqu@suse.com> |
| Date: Tue, 21 Aug 2018 09:53:47 +0800 |
| Subject: btrfs: locking: Add extra check in btrfs_init_new_buffer() to avoid deadlock |
| |
| From: Qu Wenruo <wqu@suse.com> |
| |
| commit b72c3aba09a53fc7c1824250d71180ca154517a7 upstream. |
| |
| [BUG] |
| For a certain crafted image whose csum root leaf has a missing backref, |
| triggering a write with data csum can cause a deadlock, preceded by the |
| following kernel WARN_ON(): |
| |
| WARNING: CPU: 1 PID: 41 at fs/btrfs/locking.c:230 btrfs_tree_lock+0x3e2/0x400 |
| CPU: 1 PID: 41 Comm: kworker/u4:1 Not tainted 4.18.0-rc1+ #8 |
| Workqueue: btrfs-endio-write btrfs_endio_write_helper |
| RIP: 0010:btrfs_tree_lock+0x3e2/0x400 |
| Call Trace: |
| btrfs_alloc_tree_block+0x39f/0x770 |
| __btrfs_cow_block+0x285/0x9e0 |
| btrfs_cow_block+0x191/0x2e0 |
| btrfs_search_slot+0x492/0x1160 |
| btrfs_lookup_csum+0xec/0x280 |
| btrfs_csum_file_blocks+0x2be/0xa60 |
| add_pending_csums+0xaf/0xf0 |
| btrfs_finish_ordered_io+0x74b/0xc90 |
| finish_ordered_fn+0x15/0x20 |
| normal_work_helper+0xf6/0x500 |
| btrfs_endio_write_helper+0x12/0x20 |
| process_one_work+0x302/0x770 |
| worker_thread+0x81/0x6d0 |
| kthread+0x180/0x1d0 |
| ret_from_fork+0x35/0x40 |
| |
| [CAUSE] |
| The crafted image has no backref for the csum tree root leaf. When we |
| try to allocate a new tree block, since there is no |
| EXTENT/METADATA_ITEM for the csum tree root, btrfs considers it a free |
| slot and uses it. |
| |
| The extent tree of the image looks like: |
| |
| Normal image | This fuzzed image |
| ----------------------------------+-------------------------------- |
| BG 29360128 | BG 29360128 |
| One empty slot | One empty slot |
| 29364224: backref to UUID tree | 29364224: backref to UUID tree |
| Two empty slots | Two empty slots |
| 29376512: backref to CSUM tree | One empty slot (bad type) <<< |
| 29380608: backref to D_RELOC tree | 29380608: backref to D_RELOC tree |
| ... | ... |
| |
| Since bytenr 29376512 has no METADATA/EXTENT_ITEM, when btrfs tries to |
| allocate a tree block, it looks like a valid free slot to btrfs. |
| |
| And in btrfs_finish_ordered_io(), when we need to insert a csum, we try |
| to CoW the csum tree root. |
| |
| By accident, the empty slots at bytenr BG_OFFSET, BG_OFFSET + 8K and |
| BG_OFFSET + 12K are already used by tree block CoW for other trees, so |
| the next empty slot is BG_OFFSET + 16K, which should hold the backref |
| for the CSUM tree. |
| |
| But due to the bad type, btrfs can't recognize it and still considers |
| it an empty slot, and will try to use it for csum tree CoW. |
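
The slot arithmetic above can be replayed with a small userspace model. This is only an illustrative sketch, not btrfs code: `struct slot`, `alloc_slot()` and `fuzzed_image_csum_cow_target()` are hypothetical names, the 4K slot size is taken from the bytenr spacing in the table, and the real allocator is far more involved than a first-fit scan.

```c
#include <assert.h>
#include <stdbool.h>

#define BG_OFFSET 29360128ULL	/* block group start, from the table above */
#define NODESIZE  4096ULL	/* slot size implied by the bytenr spacing */
#define NR_SLOTS  6

/* Hypothetical model of one tree block slot in the block group. */
struct slot {
	bool has_extent_item;	/* what the extent tree records */
	bool really_in_use;	/* what is actually on disk */
};

/* First-fit: hand out the first slot the extent tree believes is free. */
static unsigned long long alloc_slot(struct slot *slots)
{
	for (int i = 0; i < NR_SLOTS; i++) {
		if (!slots[i].has_extent_item) {
			slots[i].has_extent_item = true;
			return BG_OFFSET + i * NODESIZE;
		}
	}
	return 0;	/* no free slot in this model */
}

/* Replay the fuzzed image; return the bytenr chosen for the csum tree CoW. */
static unsigned long long fuzzed_image_csum_cow_target(void)
{
	struct slot slots[NR_SLOTS] = {
		{ false, false },	/* BG_OFFSET:       empty */
		{ true,  true  },	/* BG_OFFSET + 4K:  UUID tree */
		{ false, false },	/* BG_OFFSET + 8K:  empty */
		{ false, false },	/* BG_OFFSET + 12K: empty */
		{ false, true  },	/* BG_OFFSET + 16K: csum root, backref missing */
		{ true,  true  },	/* BG_OFFSET + 20K: D_RELOC tree */
	};

	/* CoW of other trees consumes the three genuinely empty slots. */
	assert(alloc_slot(slots) == BG_OFFSET);
	assert(alloc_slot(slots) == BG_OFFSET + 8192);
	assert(alloc_slot(slots) == BG_OFFSET + 12288);

	/* The next "free" slot is the one the csum tree root still occupies. */
	return alloc_slot(slots);
}
```

In this model the fourth allocation lands on BG_OFFSET + 16K = 29376512, the block that still holds the csum tree root.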
| |
| Then in the following call trace, we will try to lock the new tree |
| block, which turns out to be the old csum tree root which is already |
| locked: |
| |
| btrfs_search_slot() called on csum tree root, which is at 29376512 |
| |- btrfs_cow_block() |
| |- btrfs_set_lock_block() |
| | |- Now locks tree block 29376512 (old csum tree root) |
| |- __btrfs_cow_block() |
| |- btrfs_alloc_tree_block() |
| |- btrfs_reserve_extent() |
| | Now it returns tree block 29376512, which the extent tree |
| | shows as an empty slot, but it's already held by the csum tree |
| |- btrfs_init_new_buffer() |
| |- btrfs_tree_lock() |
| | Triggers WARN_ON(eb->lock_owner == current->pid) |
| |- wait_event() |
| Waits for the lock owner to release the lock, but it's |
| locked by ourselves, so it will deadlock |
| |
| [FIX] |
| This patch checks lock_owner against current->pid in |
| btrfs_init_new_buffer(), so the above deadlock can be avoided. |
| |
| Since such a problem can only happen with a crafted image, we will still |
| trigger a kernel warning for the later aborted transaction, but with a |
| somewhat more meaningful warning message. |
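
The idea of the check can be shown in isolation with a userspace model. This is a hedged sketch only: `struct model_eb` and `model_init_new_buffer()` are hypothetical stand-ins for the extent buffer and btrfs_init_new_buffer(), and the model assumes an unowned buffer can be locked immediately, where the kernel would wait on other holders.

```c
#include <assert.h>
#include <errno.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value; fallback for platforms lacking it */
#endif

/* Hypothetical stand-in for the fields of an extent buffer used here. */
struct model_eb {
	unsigned long long start;	/* bytenr of the tree block */
	int lock_owner;			/* pid holding the lock, 0 if unlocked */
};

/*
 * Model of the added check: refuse a "new" buffer that the current task
 * has already locked instead of waiting on our own lock forever.
 * Returns 0 with the lock taken, or -EUCLEAN on detected corruption.
 */
static int model_init_new_buffer(struct model_eb *eb, int current_pid)
{
	assert(eb);

	if (eb->lock_owner == current_pid)
		return -EUCLEAN;	/* extent tree handed us our own block */

	/* Model assumption: nobody else holds the lock, so take it. */
	eb->lock_owner = current_pid;
	return 0;
}
```

With the values from the report (tree block 29376512 already locked by pid 41), the model returns -EUCLEAN rather than blocking, which mirrors how the patch turns the self-deadlock into an error the caller can handle.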
| |
| Link: https://bugzilla.kernel.org/show_bug.cgi?id=200405 |
| Reported-by: Xu Wen <wen.xu@gatech.edu> |
| CC: stable@vger.kernel.org # 4.4+ |
| Signed-off-by: Qu Wenruo <wqu@suse.com> |
| Reviewed-by: David Sterba <dsterba@suse.com> |
| Signed-off-by: David Sterba <dsterba@suse.com> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| fs/btrfs/extent-tree.c | 13 +++++++++++++ |
| 1 file changed, 13 insertions(+) |
| |
| --- a/fs/btrfs/extent-tree.c |
| +++ b/fs/btrfs/extent-tree.c |
| @@ -8263,6 +8263,19 @@ btrfs_init_new_buffer(struct btrfs_trans |
| if (IS_ERR(buf)) |
| return buf; |
| |
| + /* |
| + * Extra safety check in case the extent tree is corrupted and extent |
| + * allocator chooses to use a tree block which is already used and |
| + * locked. |
| + */ |
| + if (buf->lock_owner == current->pid) { |
| + btrfs_err_rl(root->fs_info, |
| +"tree block %llu owner %llu already locked by pid=%d, extent tree corruption detected", |
| + buf->start, btrfs_header_owner(buf), current->pid); |
| + free_extent_buffer(buf); |
| + return ERR_PTR(-EUCLEAN); |
| + } |
| + |
| btrfs_set_header_generation(buf, trans->transid); |
| btrfs_set_buffer_lockdep_class(root->root_key.objectid, buf, level); |
| btrfs_tree_lock(buf); |