| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-48875: btrfs: don't take dev_replace rwsem on task already holding it |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| btrfs: don't take dev_replace rwsem on task already holding it |
| |
| Running fstests btrfs/011 with MKFS_OPTIONS="-O rst", which forces use of |
| the RAID stripe-tree, produces the following splat from lockdep: |
| |
| BTRFS info (device sdd): dev_replace from /dev/sdd (devid 1) to /dev/sdb started |
| |
| ============================================ |
| WARNING: possible recursive locking detected |
| 6.11.0-rc3-btrfs-for-next #599 Not tainted |
| -------------------------------------------- |
| btrfs/2326 is trying to acquire lock: |
| ffff88810f215c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250 |
| |
| but task is already holding lock: |
| ffff88810f215c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250 |
| |
| other info that might help us debug this: |
| Possible unsafe locking scenario: |
| |
| CPU0 |
| ---- |
| lock(&fs_info->dev_replace.rwsem); |
| lock(&fs_info->dev_replace.rwsem); |
| |
| *** DEADLOCK *** |
| |
| May be due to missing lock nesting notation |
| |
| 1 lock held by btrfs/2326: |
| #0: ffff88810f215c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250 |
| |
| stack backtrace: |
| CPU: 1 UID: 0 PID: 2326 Comm: btrfs Not tainted 6.11.0-rc3-btrfs-for-next #599 |
| Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 |
| Call Trace: |
| <TASK> |
| dump_stack_lvl+0x5b/0x80 |
| __lock_acquire+0x2798/0x69d0 |
| ? __pfx___lock_acquire+0x10/0x10 |
| ? __pfx___lock_acquire+0x10/0x10 |
| lock_acquire+0x19d/0x4a0 |
| ? btrfs_map_block+0x39f/0x2250 |
| ? __pfx_lock_acquire+0x10/0x10 |
| ? find_held_lock+0x2d/0x110 |
| ? lock_is_held_type+0x8f/0x100 |
| down_read+0x8e/0x440 |
| ? btrfs_map_block+0x39f/0x2250 |
| ? __pfx_down_read+0x10/0x10 |
| ? do_raw_read_unlock+0x44/0x70 |
| ? _raw_read_unlock+0x23/0x40 |
| btrfs_map_block+0x39f/0x2250 |
| ? btrfs_dev_replace_by_ioctl+0xd69/0x1d00 |
| ? btrfs_bio_counter_inc_blocked+0xd9/0x2e0 |
| ? __kasan_slab_alloc+0x6e/0x70 |
| ? __pfx_btrfs_map_block+0x10/0x10 |
| ? __pfx_btrfs_bio_counter_inc_blocked+0x10/0x10 |
| ? kmem_cache_alloc_noprof+0x1f2/0x300 |
| ? mempool_alloc_noprof+0xed/0x2b0 |
| btrfs_submit_chunk+0x28d/0x17e0 |
| ? __pfx_btrfs_submit_chunk+0x10/0x10 |
| ? bvec_alloc+0xd7/0x1b0 |
| ? bio_add_folio+0x171/0x270 |
| ? __pfx_bio_add_folio+0x10/0x10 |
| ? __kasan_check_read+0x20/0x20 |
| btrfs_submit_bio+0x37/0x80 |
| read_extent_buffer_pages+0x3df/0x6c0 |
| btrfs_read_extent_buffer+0x13e/0x5f0 |
| read_tree_block+0x81/0xe0 |
| read_block_for_search+0x4bd/0x7a0 |
| ? __pfx_read_block_for_search+0x10/0x10 |
| btrfs_search_slot+0x78d/0x2720 |
| ? __pfx_btrfs_search_slot+0x10/0x10 |
| ? lock_is_held_type+0x8f/0x100 |
| ? kasan_save_track+0x14/0x30 |
| ? __kasan_slab_alloc+0x6e/0x70 |
| ? kmem_cache_alloc_noprof+0x1f2/0x300 |
| btrfs_get_raid_extent_offset+0x181/0x820 |
| ? __pfx_lock_acquire+0x10/0x10 |
| ? __pfx_btrfs_get_raid_extent_offset+0x10/0x10 |
| ? down_read+0x194/0x440 |
| ? __pfx_down_read+0x10/0x10 |
| ? do_raw_read_unlock+0x44/0x70 |
| ? _raw_read_unlock+0x23/0x40 |
| btrfs_map_block+0x5b5/0x2250 |
| ? __pfx_btrfs_map_block+0x10/0x10 |
| scrub_submit_initial_read+0x8fe/0x11b0 |
| ? __pfx_scrub_submit_initial_read+0x10/0x10 |
| submit_initial_group_read+0x161/0x3a0 |
| ? lock_release+0x20e/0x710 |
| ? __pfx_submit_initial_group_read+0x10/0x10 |
| ? __pfx_lock_release+0x10/0x10 |
| scrub_simple_mirror.isra.0+0x3eb/0x580 |
| scrub_stripe+0xe4d/0x1440 |
| ? lock_release+0x20e/0x710 |
| ? __pfx_scrub_stripe+0x10/0x10 |
| ? __pfx_lock_release+0x10/0x10 |
| ? do_raw_read_unlock+0x44/0x70 |
| ? _raw_read_unlock+0x23/0x40 |
| scrub_chunk+0x257/0x4a0 |
| scrub_enumerate_chunks+0x64c/0xf70 |
| ? __mutex_unlock_slowpath+0x147/0x5f0 |
| ? __pfx_scrub_enumerate_chunks+0x10/0x10 |
| ? bit_wait_timeout+0xb0/0x170 |
| ? __up_read+0x189/0x700 |
| ? scrub_workers_get+0x231/0x300 |
| ? up_write+0x490/0x4f0 |
| btrfs_scrub_dev+0x52e/0xcd0 |
| ? create_pending_snapshots+0x230/0x250 |
| ? __pfx_btrfs_scrub_dev+0x10/0x10 |
| btrfs_dev_replace_by_ioctl+0xd69/0x1d00 |
| ? lock_acquire+0x19d/0x4a0 |
| ? __pfx_btrfs_dev_replace_by_ioctl+0x10/0x10 |
| ? lock_release+0x20e/0x710 |
| ? btrfs_ioctl+0xa09/0x74f0 |
| ? __pfx_lock_release+0x10/0x10 |
| ? do_raw_spin_lock+0x11e/0x240 |
| ? __pfx_do_raw_spin_lock+0x10/0x10 |
| btrfs_ioctl+0xa14/0x74f0 |
| ? lock_acquire+0x19d/0x4a0 |
| ? find_held_lock+0x2d/0x110 |
| ? __pfx_btrfs_ioctl+0x10/0x10 |
| ? lock_release+0x20e/0x710 |
| ? do_sigaction+0x3f0/0x860 |
| ? __pfx_do_vfs_ioctl+0x10/0x10 |
| ? do_raw_spin_lock+0x11e/0x240 |
| ? lockdep_hardirqs_on_prepare+0x270/0x3e0 |
| ? _raw_spin_unlock_irq+0x28/0x50 |
| ? do_sigaction+0x3f0/0x860 |
| ? __pfx_do_sigaction+0x10/0x10 |
| ? __x64_sys_rt_sigaction+0x18e/0x1e0 |
| ? __pfx___x64_sys_rt_sigaction+0x10/0x10 |
| ? __x64_sys_close+0x7c/0xd0 |
| __x64_sys_ioctl+0x137/0x190 |
| do_syscall_64+0x71/0x140 |
| entry_SYSCALL_64_after_hwframe+0x76/0x7e |
| RIP: 0033:0x7f0bd1114f9b |
| Code: Unable to access opcode bytes at 0x7f0bd1114f71. |
| RSP: 002b:00007ffc8a8c3130 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 |
| RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f0bd1114f9b |
| RDX: 00007ffc8a8c35e0 RSI: 00000000ca289435 RDI: 0000000000000003 |
| RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000007 |
| R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffc8a8c6c85 |
| R13: 00000000398e72a0 R14: 0000000000004361 R15: 0000000000000004 |
| </TASK> |
| |
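| This happens because, on RAID stripe-tree filesystems, scrub recurses back |
| into btrfs_map_block() to perform the logical-to-device-physical mapping: |
| as the backtrace shows, btrfs_map_block() calls |
| btrfs_get_raid_extent_offset(), whose tree search reads a tree block and |
| submits a bio, which calls btrfs_map_block() again. |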
| |
| But the device replace task already holds dev_replace::rwsem for read, |
| taken by the outer btrfs_map_block() call, so the nested down_read() can |
| deadlock. |
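Read holds on a reader/writer semaphore are normally shared, so it can help to see why a second read acquisition by the same task can still hang. The sketch below is a hypothetical, single-threaded C model, not the kernel's rw_semaphore. It assumes, as a simplification, that the semaphore is writer-fair: a new reader queues behind a writer that is already waiting. If another task asks for the semaphore for writing between the outer and the nested btrfs_map_block() calls, the nested read waits for the writer while the writer waits for the outer read hold to be released. `struct model_rwsem` and the `model_*` functions are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model of a writer-fair reader/writer
 * semaphore. It is not the kernel's struct rw_semaphore; it tracks only
 * the number of read holds and whether a writer is queued.
 */
struct model_rwsem {
	int readers;         /* read holds currently granted */
	bool writer_waiting; /* a writer has queued and is waiting */
};

/* Grant a read hold only if no writer is queued (writer fairness). */
static bool model_try_down_read(struct model_rwsem *sem)
{
	if (sem->writer_waiting)
		return false; /* the caller would sleep behind the writer */
	sem->readers++;
	return true;
}

/* Release a read hold; releasing one that was never granted is a bug. */
static void model_up_read(struct model_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

/* A writer must wait (and is queued) while any read hold is outstanding. */
static bool model_try_down_write(struct model_rwsem *sem)
{
	if (sem->readers > 0) {
		sem->writer_waiting = true; /* queue and wait */
		return false;
	}
	return true;
}

/*
 * Replay the ordering: the replace task takes a read hold (outer
 * btrfs_map_block()), another task queues for write, then the replace
 * task asks for a second read hold (nested btrfs_map_block()).
 * Returns true if that nested request would block: the reader waits on
 * the writer while the writer waits on the reader.
 */
static bool model_recursive_read_blocks(void)
{
	struct model_rwsem sem = { 0, false };

	bool outer = model_try_down_read(&sem);   /* outer call */
	bool writer = model_try_down_write(&sem); /* writer queues */
	bool nested = model_try_down_read(&sem);  /* nested call */

	return outer && !writer && !nested;
}
```

Without the queued writer, the same task can take nested read holds in this model; the hang needs a writer to arrive in between, which matches lockdep reporting the recursion as possible rather than certain.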
| |
| So don't take dev_replace::rwsem if the current task is the one |
| performing the device replace. |
| |
| The Linux kernel CVE team has assigned CVE-2024-48875 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Fixed in 6.6.66 with commit a5bc4e030f50fdbb1fbc69acc1e0c5f57c79d044 |
| Fixed in 6.12.5 with commit a2e99dcd7aafa9d474f7d9b0740b8f93c4e156c2 |
| Fixed in 6.13 with commit 8cca35cb29f81eba3e96ec44dad8696c8a2f9138 |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-48875 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| fs/btrfs/dev-replace.c |
| fs/btrfs/fs.h |
| fs/btrfs/volumes.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this and many other bugfixes. Individual |
| changes are never tested alone; rather, they are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes that resolve this |
| issue can be found in these commits: |
| https://git.kernel.org/stable/c/a5bc4e030f50fdbb1fbc69acc1e0c5f57c79d044 |
| https://git.kernel.org/stable/c/a2e99dcd7aafa9d474f7d9b0740b8f93c4e156c2 |
| https://git.kernel.org/stable/c/8cca35cb29f81eba3e96ec44dad8696c8a2f9138 |