| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-57976: btrfs: do proper folio cleanup when cow_file_range() failed |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| btrfs: do proper folio cleanup when cow_file_range() failed |
| |
| [BUG] |
| When testing with the COW fixup path marked as BUG_ON() (this is related |
| to the new pin_user_pages*() change, which should not result in new |
| out-of-band dirty pages), I hit a crash triggered by that BUG_ON() when |
| the COW fixup path was reached. |
| |
| This BUG_ON() happens just after a failed btrfs_run_delalloc_range(): |
| |
| BTRFS error (device dm-2): failed to run delalloc range, root 348 ino 405 folio 65536 submit_bitmap 6-15 start 90112 len 106496: -28 |
| ------------[ cut here ]------------ |
| kernel BUG at fs/btrfs/extent_io.c:1444! |
| Internal error: Oops - BUG: 00000000f2000800 [#1] SMP |
| CPU: 0 UID: 0 PID: 434621 Comm: kworker/u24:8 Tainted: G OE 6.12.0-rc7-custom+ #86 |
| Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 |
| Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs] |
| pc : extent_writepage_io+0x2d4/0x308 [btrfs] |
| lr : extent_writepage_io+0x2d4/0x308 [btrfs] |
| Call trace: |
| extent_writepage_io+0x2d4/0x308 [btrfs] |
| extent_writepage+0x218/0x330 [btrfs] |
| extent_write_cache_pages+0x1d4/0x4b0 [btrfs] |
| btrfs_writepages+0x94/0x150 [btrfs] |
| do_writepages+0x74/0x190 |
| filemap_fdatawrite_wbc+0x88/0xc8 |
| start_delalloc_inodes+0x180/0x3b0 [btrfs] |
| btrfs_start_delalloc_roots+0x174/0x280 [btrfs] |
| shrink_delalloc+0x114/0x280 [btrfs] |
| flush_space+0x250/0x2f8 [btrfs] |
| btrfs_async_reclaim_data_space+0x180/0x228 [btrfs] |
| process_one_work+0x164/0x408 |
| worker_thread+0x25c/0x388 |
| kthread+0x100/0x118 |
| ret_from_fork+0x10/0x20 |
| Code: aa1403e1 9402f3ef aa1403e0 9402f36f (d4210000) |
| ---[ end trace 0000000000000000 ]--- |
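
The numbers in the error line above can be cross-checked with a little arithmetic. The sketch below assumes a 4K sector size, which the log line does not state; the helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: 4K sectors (not stated in the log line). */
#define SECTOR_SIZE 4096u

/* File offset of sector number `bit` inside the folio starting at `folio_start`. */
static uint64_t sector_bit_to_offset(uint64_t folio_start, unsigned int bit)
{
    return folio_start + (uint64_t)bit * SECTOR_SIZE;
}

/* Bytes covered by the inclusive bitmap range [first, last]. */
static uint64_t bitmap_range_bytes(unsigned int first, unsigned int last)
{
    assert(first <= last);
    return (uint64_t)(last - first + 1) * SECTOR_SIZE;
}
```

Under that assumption, sector_bit_to_offset(65536, 6) gives 90112, which matches "start 90112", and bits 6-15 cover 40960 bytes of the folio. The reported len of 106496 is 26 sectors, which suggests the failed range extends past this folio.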
| |
| [CAUSE] |
| That failure is mostly from cow_file_range(), where we can hit -ENOSPC. |
| |
| Although the -ENOSPC is already a bug related to our space reservation |
| code, let's just focus on the error handling. |
| |
| For example, we have the following dirty range [0, 64K) of an inode, |
| with 4K sector size and 4K page size: |
| |
| 0       16K        32K       48K     64K |
| |///////////////////////////////////////| |
| |#######################################| |
| |
| Where |///| means the pages are still dirty, and |###| means the extent io |
| tree has the EXTENT_DELALLOC flag set. |
| |
| - Enter extent_writepage() for page 0 |
| |
| - Enter btrfs_run_delalloc_range() for range [0, 64K) |
| |
| - Enter cow_file_range() for range [0, 64K) |
| |
| - Function btrfs_reserve_extent() only reserved one 16K extent, |
| so we created an extent map and an ordered extent for range [0, 16K) |
| |
| 0       16K        32K       48K     64K |
| |////////|//////////////////////////////| |
| |<- OE ->|##############################| |
| |
| And range [0, 16K) has its delalloc flag cleared. |
| But since we haven't submitted any bio yet, the 4 pages involved are |
| still dirty. |
| |
| - The next call to btrfs_reserve_extent() returns -ENOSPC |
| Now we have to run error cleanup, which will clear all |
| EXTENT_DELALLOC* flags and clear the dirty flags for the remaining |
| ranges: |
| |
| 0       16K        32K       48K     64K |
| |////////|                              | |
| |        |                              | |
| |
| Note that range [0, 16K) still has its pages dirty. |
| |
| - Some time later, writeback is triggered again for the range [0, 16K) |
| since the page range still has dirty flags. |
| |
| - btrfs_run_delalloc_range() will do nothing because there is no |
| EXTENT_DELALLOC flag. |
| |
| - extent_writepage_io() finds that page 0 has no ordered flag, |
| so it falls into the COW fixup path, triggering the BUG_ON(). |
| |
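| Unfortunately, this error handling bug dates back to the introduction of |
| btrfs. Thankfully, because the COW fixup path ends up absorbing such |
| pages (an abuse of that mechanism), at least it won't crash the kernel |
| when the fixup path is not marked as BUG_ON(). |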
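
The sequence above can be replayed with a toy state model. This is a minimal sketch, not kernel code: the struct, the enum and all function names are hypothetical, and the model assumes that after the failed cleanup the pages of [0, 16K) carry neither EXTENT_DELALLOC nor the ordered flag, as the walk-through describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy per-page state for illustration; not the kernel's data structures. */
struct page_state {
    bool dirty;    /* page dirty flag */
    bool delalloc; /* EXTENT_DELALLOC covers this page */
    bool ordered;  /* page carries the ordered flag */
};

enum wb_path { WB_NOTHING, WB_NORMAL, WB_COW_FIXUP };

/* Start state of the example: every page dirty and marked delalloc. */
static void init_dirty_delalloc(struct page_state *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = (struct page_state){ .dirty = true, .delalloc = true };
}

/*
 * State left behind by the old error handling, as described above:
 * pages [0, nr_ordered) lost EXTENT_DELALLOC but stay dirty, the
 * remaining pages have both delalloc and dirty cleared.
 */
static void old_failure_cleanup(struct page_state *p, size_t n, size_t nr_ordered)
{
    assert(nr_ordered <= n);
    for (size_t i = 0; i < n; i++) {
        p[i].delalloc = false;
        p[i].ordered = false;
        if (i >= nr_ordered)
            p[i].dirty = false;
    }
}

/* Which path a later writeback attempt would take for one page. */
static enum wb_path writeback_path(const struct page_state *pg)
{
    if (!pg->dirty)
        return WB_NOTHING;
    if (pg->delalloc || pg->ordered)
        return WB_NORMAL;
    return WB_COW_FIXUP; /* dirty, but no delalloc and no ordered flag */
}
```

With 16 pages of 4K and nr_ordered = 4 (the 16K extent), pages 0-3 end up on the WB_COW_FIXUP path on the next writeback, while pages 4-15 need no writeback at all.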
| |
| [FIX] |
| Instead of immediately unlocking the extent and folios, we keep the extent |
| and folios locked until we either error out or finish the whole delalloc |
| range. |
| |
| When the whole delalloc range finishes without error, we just unlock the |
| whole range with PAGE_SET_ORDERED (and PAGE_UNLOCK for !keep_locked |
| cases), with EXTENT_DELALLOC and EXTENT_LOCKED cleared. |
| And the involved folios will be properly submitted, with their dirty |
| flags cleared during submission. |
| |
| For the error path, it will be a little more complex: |
| |
| - The range with ordered extent allocated (range (1)) |
| We only clear the EXTENT_DELALLOC and EXTENT_LOCKED, as the remaining |
| flags are cleaned up by |
| btrfs_mark_ordered_io_finished()->btrfs_finish_one_ordered(). |
| |
| For folios we finish the IO (clear dirty, start writeback and |
| immediately finish the writeback) and unlock the folios. |
| |
| - The range with reserved extent but no ordered extent (range (2)) |
| - The range we never touched (range (3)) |
| For both range (2) and range (3) the behavior is not changed. |
| |
| Now even if cow_file_range() fails halfway with some successfully |
| reserved extents/ordered extents, we keep all folios clean, so |
| there will be no future writeback triggered on them. |
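
The effect of the new error path can be illustrated with a toy model of the three ranges. This is a rough sketch under stated assumptions, not the actual patch: the struct and function names are hypothetical, folio locking is left out, and ranges (2) and (3) are modelled only as "delalloc and dirty cleared", following the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy folio state for illustration; locking is not modelled. */
struct folio_state {
    bool dirty;
    bool writeback;
    bool delalloc;
};

/* Range (1): folios already covered by an ordered extent. */
static void finish_io_range1(struct folio_state *f)
{
    f->delalloc = false;
    f->dirty = false;     /* clear dirty */
    f->writeback = true;  /* start writeback ... */
    f->writeback = false; /* ... and immediately finish it */
}

/* Ranges (2) and (3): unchanged cleanup, clears delalloc and dirty. */
static void cleanup_range23(struct folio_state *f)
{
    f->delalloc = false;
    f->dirty = false;
}

/* Error path for a range whose first nr_ordered folios form range (1). */
static void fixed_failure_cleanup(struct folio_state *f, size_t n, size_t nr_ordered)
{
    assert(nr_ordered <= n);
    for (size_t i = 0; i < n; i++) {
        if (i < nr_ordered)
            finish_io_range1(&f[i]);
        else
            cleanup_range23(&f[i]);
    }
}

/* Number of folios that would still trigger writeback later. */
static size_t count_dirty(const struct folio_state *f, size_t n)
{
    size_t dirty = 0;

    for (size_t i = 0; i < n; i++)
        dirty += f[i].dirty ? 1 : 0;
    return dirty;
}
```

Starting from 16 dirty delalloc folios and calling fixed_failure_cleanup() with nr_ordered = 4, count_dirty() returns 0, so no later writeback pass finds a dirty folio without EXTENT_DELALLOC.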
| |
| The Linux kernel CVE team has assigned CVE-2024-57976 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Fixed in 6.13.2 with commit 692cf71173bb41395c855acbbbe197d3aedfa5d4 |
| Fixed in 6.14 with commit 06f364284794f149d2abc167c11d556cf20c954b |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-57976 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
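
As a rough aid, a kernel release string can be compared against the two fixed versions listed above. The sketch below is only a first-pass check with hypothetical helper names: it knows nothing about fixes backported to older stable or distribution kernels, so a "false" result means "not covered by the versions listed here", not "definitely vulnerable".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct kver {
    int major;
    int minor;
    int patch;
};

/* Parse "6.13.2" or "6.12.0-rc7-custom+"; a missing patch level counts as 0. */
static bool parse_kver(const char *release, struct kver *out)
{
    out->major = 0;
    out->minor = 0;
    out->patch = 0;
    return sscanf(release, "%d.%d.%d", &out->major, &out->minor, &out->patch) >= 2;
}

/* True for 6.13.2 and later 6.13.y releases, and for 6.14 and newer. */
static bool has_listed_fix(const struct kver *v)
{
    if (v->major != 6)
        return v->major > 6;
    if (v->minor != 13)
        return v->minor > 13;
    return v->patch >= 2;
}
```

For example, the release string from the trace above, "6.12.0-rc7-custom+", parses to 6.12.0 and is reported as not containing a listed fix, while "6.13.2" and "6.14" are reported as fixed.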
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| fs/btrfs/inode.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/692cf71173bb41395c855acbbbe197d3aedfa5d4 |
| https://git.kernel.org/stable/c/06f364284794f149d2abc167c11d556cf20c954b |