| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-26794: btrfs: fix race between ordered extent completion and fiemap |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| btrfs: fix race between ordered extent completion and fiemap |
| |
| For fiemap we recently stopped locking the target extent range for the |
| whole duration of the fiemap call, in order to avoid a deadlock in a |
| scenario where the fiemap buffer happens to be a memory mapped range of |
| the same file. This use case is very unlikely to be useful in practice but |
| it may be triggered by fuzz testing (syzbot, etc). |
| |
| However, by not locking the target extent range for the whole duration of |
| the fiemap call we can race with an ordered extent. This happens like |
| this: |
| |
| 1) The fiemap task finishes processing a file extent item that covers |
| the file range [512K, 1M[, and that file extent item is the last item |
| in the leaf currently being processed; |
| |
| 2) An ordered extent for the file range [768K, 2M[, in COW mode, |
| completes (btrfs_finish_one_ordered()) and the file extent item |
| covering the range [512K, 1M[ is trimmed to cover the range |
| [512K, 768K[ and then a new file extent item for the range [768K, 2M[ |
| is inserted in the inode's subvolume tree; |
| |
| 3) The fiemap task calls fiemap_next_leaf_item(), which then calls |
| btrfs_next_leaf() to find the next leaf / item. This finds that the |
| next key following the one we previously processed (its type is |
| BTRFS_EXTENT_DATA_KEY and its offset is 512K), is the key corresponding |
| to the new file extent item inserted by the ordered extent, which has |
| a type of BTRFS_EXTENT_DATA_KEY and an offset of 768K; |
| |
| 4) Later the fiemap code ends up at emit_fiemap_extent() and triggers |
| the warning: |
| |
|     if (cache->offset + cache->len > offset) { |
|             WARN_ON(1); |
|             return -EINVAL; |
|     } |
| |
| Since we get 1M > 768K, because the previously emitted entry for the |
| old extent covering the file range [512K, 1M[ ends at an offset that |
| is greater than the new extent's start offset (768K). This makes fiemap |
| fail with -EINVAL besides triggering the warning that produces a stack |
| trace like the following: |
| |
| [1621.677651] ------------[ cut here ]------------ |
| [1621.677656] WARNING: CPU: 1 PID: 204366 at fs/btrfs/extent_io.c:2492 emit_fiemap_extent+0x84/0x90 [btrfs] |
| [1621.677899] Modules linked in: btrfs blake2b_generic (...) |
| [1621.677951] CPU: 1 PID: 204366 Comm: pool Not tainted 6.8.0-rc5-btrfs-next-151+ #1 |
| [1621.677954] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 |
| [1621.677956] RIP: 0010:emit_fiemap_extent+0x84/0x90 [btrfs] |
| [1621.678033] Code: 2b 4c 89 63 (...) |
| [1621.678035] RSP: 0018:ffffab16089ffd20 EFLAGS: 00010206 |
| [1621.678037] RAX: 00000000004fa000 RBX: ffffab16089ffe08 RCX: 0000000000009000 |
| [1621.678039] RDX: 00000000004f9000 RSI: 00000000004f1000 RDI: ffffab16089ffe90 |
| [1621.678040] RBP: 00000000004f9000 R08: 0000000000001000 R09: 0000000000000000 |
| [1621.678041] R10: 0000000000000000 R11: 0000000000001000 R12: 0000000041d78000 |
| [1621.678043] R13: 0000000000001000 R14: 0000000000000000 R15: ffff9434f0b17850 |
| [1621.678044] FS: 00007fa6e20006c0(0000) GS:ffff943bdfa40000(0000) knlGS:0000000000000000 |
| [1621.678046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 |
| [1621.678048] CR2: 00007fa6b0801000 CR3: 000000012d404002 CR4: 0000000000370ef0 |
| [1621.678053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 |
| [1621.678055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 |
| [1621.678056] Call Trace: |
| [1621.678074] <TASK> |
| [1621.678076] ? __warn+0x80/0x130 |
| [1621.678082] ? emit_fiemap_extent+0x84/0x90 [btrfs] |
| [1621.678159] ? report_bug+0x1f4/0x200 |
| [1621.678164] ? handle_bug+0x42/0x70 |
| [1621.678167] ? exc_invalid_op+0x14/0x70 |
| [1621.678170] ? asm_exc_invalid_op+0x16/0x20 |
| [1621.678178] ? emit_fiemap_extent+0x84/0x90 [btrfs] |
| [1621.678253] extent_fiemap+0x766/0xa30 [btrfs] |
| [1621.678339] btrfs_fiemap+0x45/0x80 [btrfs] |
| [1621.678420] do_vfs_ioctl+0x1e4/0x870 |
| [1621.678431] __x64_sys_ioctl+0x6a/0xc0 |
| [1621.678434] do_syscall_64+0x52/0x120 |
| [1621.678445] entry_SYSCALL_64_after_hwframe+0x6e/0x76 |
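
The arithmetic behind step 4 can be checked in isolation. The following is a
minimal userspace sketch, not kernel code: struct cached_extent,
overlaps_next(), race_example() and the KiB() macro are hypothetical names
introduced here for illustration. Only the comparison itself is taken from
the check quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KiB(x) ((uint64_t)(x) * 1024)

/* Hypothetical stand-in for the cached fiemap entry (offset and length only). */
struct cached_extent {
	uint64_t offset;
	uint64_t len;
};

/*
 * Same comparison as the quoted check: true when the cached entry ends
 * past the start offset of the next extent.
 */
bool overlaps_next(const struct cached_extent *cache, uint64_t next_offset)
{
	return cache->offset + cache->len > next_offset;
}

void race_example(void)
{
	/* Step 1: fiemap has cached the extent [512K, 1M[. */
	struct cached_extent cache = { .offset = KiB(512), .len = KiB(512) };

	/* Step 3: the next key found has offset 768K, and 1M > 768K. */
	assert(overlaps_next(&cache, KiB(768)));
}
```

With the cached length still at 512K the comparison holds, which is the
condition that made the unfixed code warn and return -EINVAL.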
| |
| There's also another case where before calling btrfs_next_leaf() we are |
| processing a hole or a prealloc extent and we had several delalloc ranges |
| within that hole or prealloc extent. In that case if the ordered extents |
| complete before we find the next key, we may end up finding an extent item |
| with an offset smaller than (or equal to) the offset in cache->offset. |
| |
| So fix this by changing emit_fiemap_extent() to address these three |
| scenarios like this: |
| |
| 1) For the first case (the steps listed above), adjust the length of the |
| previously cached extent so that it does not overlap with the current |
| extent, emit the previous one and cache the current file extent item; |
| |
| 2) For the second case, where we had a hole or prealloc extent with |
| multiple delalloc ranges inside the hole or prealloc extent's range, |
| and the current file extent item has an offset that matches the offset |
| in the fiemap cache, just discard what we have in the fiemap cache and |
| assign the current file extent item to the cache, since it's more up |
| to date; |
| |
| 3) For the third case, where we had a hole or prealloc extent with |
| multiple delalloc ranges inside the hole or prealloc extent's range |
| and the offset of the file extent item we just found is smaller than |
| what we have in the cache, just skip the current file extent item |
| if its range ends at or before the cached extent's end, because we may |
| have emitted (to the fiemap user space buffer) delalloc ranges that |
| overlap with the current file extent item's range. If the file extent |
| item's range goes beyond the end offset of the cached extent, just |
| emit the cached extent and cache a subrange of the file extent item, |
| that goes from the end offset of the cached extent to the end offset |
| of the file extent item. |
| |
| Dealing with those cases in those ways makes everything consistent by |
| reflecting the current state of file extent items in the btree and |
| without emitting extents that have overlapping ranges (which would be |
| confusing and would violate expectations). |
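
The three cases above can be modelled in userspace as a small decision
function. This is a sketch under stated assumptions, not the code of the
fix: struct extent_range, enum resolve_action and resolve_overlap() are
hypothetical names, and the model tracks only logical offsets and lengths
(it ignores physical offsets, extent flags and the actual copy to the
fiemap user space buffer).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the fiemap cache; not the kernel's structure. */
struct extent_range {
	uint64_t offset;
	uint64_t len;
	bool cached;		/* false while the cache is empty */
};

enum resolve_action {
	NO_OVERLAP,		/* nothing to fix up */
	TRIM_AND_EMIT,		/* case 1: cached extent was shortened */
	REPLACE_CACHE,		/* case 2: same offset, new item wins */
	SKIP_NEW,		/* case 3: new item fully covered by cache */
	EMIT_AND_CACHE_TAIL,	/* case 3: keep only the part past the cache */
};

/*
 * Decide how to reconcile the cached extent with the file extent item
 * [*offset, *offset + *len[. May shorten cache->len (case 1) or narrow
 * *offset and *len to the uncovered tail (case 3).
 */
enum resolve_action resolve_overlap(struct extent_range *cache,
				    uint64_t *offset, uint64_t *len)
{
	uint64_t cache_end, range_end;

	assert(*len > 0);
	if (!cache->cached)
		return NO_OVERLAP;

	cache_end = cache->offset + cache->len;
	range_end = *offset + *len;
	if (cache_end <= *offset)
		return NO_OVERLAP;

	if (*offset > cache->offset) {
		/* Case 1: trim the cached extent to end where the new one starts. */
		cache->len = *offset - cache->offset;
		return TRIM_AND_EMIT;
	}
	if (*offset == cache->offset)
		return REPLACE_CACHE;	/* Case 2: the new item is more up to date. */

	/* Case 3: the new item starts before the cached extent. */
	if (range_end <= cache_end)
		return SKIP_NEW;
	*offset = cache_end;
	*len = range_end - cache_end;
	return EMIT_AND_CACHE_TAIL;
}
```

Applied to the race described earlier, a cached extent [512K, 1M[ followed
by a new item starting at 768K takes the first branch, and the cached
length becomes 256K so the two ranges no longer overlap.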
| |
| This issue could be triggered often with test case generic/561, and was |
| also hit and reported by Wang Yugui. |
| |
| The Linux kernel CVE team has assigned CVE-2024-26794 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 6.6.24 with commit ded566b4637f1b6b4c9ba74e7d0b8493e93f19cf and fixed in 6.6.21 with commit d43f8e58f10a44df8c08e7f7076f3288352cd168 |
| Issue introduced in 6.7.12 with commit 89bca7fe6382d61e88c67a0b0e7bce315986fb8b and fixed in 6.7.9 with commit 31d07a757c6d3430e03cc22799921569999b9a12 |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-26794 |
| will be updated if fixes are backported; please check that for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| fs/btrfs/extent_io.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/d43f8e58f10a44df8c08e7f7076f3288352cd168 |
| https://git.kernel.org/stable/c/31d07a757c6d3430e03cc22799921569999b9a12 |
| https://git.kernel.org/stable/c/a1a4a9ca77f143c00fce69c1239887ff8b813bec |