| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-27080: btrfs: fix race when detecting delalloc ranges during fiemap |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| btrfs: fix race when detecting delalloc ranges during fiemap |
| |
| For fiemap we recently stopped locking the target extent range for the |
| whole duration of the fiemap call, in order to avoid a deadlock in a |
| scenario where the fiemap buffer happens to be a memory mapped range of |
| the same file. This use case is very unlikely to be useful in practice but |
| it may be triggered by fuzz testing (syzbot, etc). |
| |
| This however introduced a race that makes us miss delalloc ranges for |
| file regions that are currently holes, so the caller of fiemap will not |
| be aware that there's data for some file regions. This can be quite |
| serious for some use cases - for example in coreutils versions before 9.0, |
| the cp program used fiemap to detect holes and data in the source file, |
| copying only regions with data (extents or delalloc) from the source file |
| to the destination file in order to preserve holes (see the documentation |
| for its --sparse command line option). This means that if cp was used |
| with a source file that had delalloc in a hole, the destination file could |
| end up without that data, which is effectively a data loss issue, if it |
| happened to hit the race described below. |
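To make the data loss concrete, here is a minimal sketch of a copy loop that trusts a list of reported data ranges, in the way the text describes cp doing before coreutils 9.0. It is not the coreutils code: the names `data_range` and `sparse_copy` are hypothetical, and in-memory buffers stand in for the source and destination files.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one range reported as data. */
struct data_range {
    size_t offset;
    size_t len;
};

/*
 * Copy only the ranges reported as data from src to dst and leave the
 * rest of dst as zeros (a hole). Both buffers must hold size bytes.
 */
void sparse_copy(const unsigned char *src, unsigned char *dst, size_t size,
                 const struct data_range *ranges, size_t nr_ranges)
{
    memset(dst, 0, size);
    for (size_t i = 0; i < nr_ranges; i++) {
        /* Reject ranges that fall outside the buffers. */
        assert(ranges[i].offset <= size);
        assert(ranges[i].len <= size - ranges[i].offset);
        memcpy(dst + ranges[i].offset, src + ranges[i].offset,
               ranges[i].len);
    }
}
```

If the range list omits a region that actually holds data, that region stays zeroed in the destination, which is the failure mode described above.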
| |
| The race happens like this: |
| |
| 1) Fiemap is called, without the FIEMAP_FLAG_SYNC flag, for a file that |
| has delalloc in the file range [64M, 65M[, which is currently a hole; |
| |
| 2) Fiemap locks the inode in shared mode, then starts iterating the |
| inode's subvolume tree searching for file extent items, without having |
| the whole fiemap target range locked in the inode's io tree - the |
| change introduced recently by commit b0ad381fa769 ("btrfs: fix |
| deadlock with fiemap and extent locking"). It only locks ranges in |
| the io tree when it finds a hole or prealloc extent since that |
| commit; |
| |
| 3) Note that fiemap clones each leaf before using it, and this is to |
| avoid deadlocks when locking a file range in the inode's io tree and |
| the fiemap buffer is memory mapped to some file, because writing |
| to the page with btrfs_page_mkwrite() will wait on any ordered extent |
| for the page's range and the ordered extent needs to lock the range |
| and may need to modify the same leaf, therefore leading to a deadlock |
| on the leaf; |
| |
| 4) While iterating the file extent items in the cloned leaf before |
| finding the hole in the range [64M, 65M[, the delalloc in that range |
| is flushed and its ordered extent completes - meaning the corresponding |
| file extent item is in the inode's subvolume tree, but not present in |
| the cloned leaf that fiemap is iterating over; |
| |
| 5) When fiemap finds the hole in the [64M, 65M[ range by seeing the gap in |
| the cloned leaf (or a file extent item with disk_bytenr == 0 in case |
| the NO_HOLES feature is not enabled), it will lock that file range in |
| the inode's io tree and then search for delalloc by checking for the |
| EXTENT_DELALLOC bit in the io tree for that range and ordered extents |
| (with btrfs_find_delalloc_in_range()). But it finds nothing since the |
| delalloc in that range was already flushed and the ordered extent |
| completed and is gone - as a result fiemap will not report that there's |
| delalloc or an extent for the range [64M, 65M[, so user space will be |
| misled into thinking that there's a hole in that range. |
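The interleaving in steps 3 to 5 can be modelled with a small deterministic sketch. This is a toy model under stated assumptions, not btrfs code: `range_state`, `flush_delalloc` and `toy_fiemap_reports_data` are hypothetical names, and two booleans stand in for the io tree bit and the file extent item of a single range.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state for one file range such as [64M, 65M[. */
struct range_state {
    bool delalloc;    /* dirty data not yet written (io tree bit) */
    bool extent_item; /* file extent item present in the live tree */
};

/* Flush delalloc: the ordered extent completes and inserts the item. */
void flush_delalloc(struct range_state *live)
{
    if (live->delalloc) {
        live->delalloc = false;
        live->extent_item = true;
    }
}

/*
 * Returns true if the toy fiemap reports data for the range.
 * flush_in_window models a flush that runs after the leaf was cloned
 * but before the range is locked and checked for delalloc.
 */
bool toy_fiemap_reports_data(struct range_state *live, bool flush_in_window)
{
    /* Step 3: work on a private copy of the leaf. */
    bool cloned_has_item = live->extent_item;

    /* Step 4: the window in which the range is not locked. */
    if (flush_in_window)
        flush_delalloc(live);

    if (cloned_has_item)
        return true;

    /* Step 5: gap in the cloned leaf, so lock and look for delalloc. */
    return live->delalloc;
}
```

With `flush_in_window` set, the function returns false even though the live state now has an extent item, which mirrors the missed range in the description.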
| |
| This could actually be sporadically triggered with test case generic/094 |
| from fstests, which reports a missing extent/delalloc range like this: |
| |
| generic/094 2s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/094.out.bad) |
| --- tests/generic/094.out 2020-06-10 19:29:03.830519425 +0100 |
| +++ /home/fdmanana/git/hub/xfstests/results//generic/094.out.bad 2024-02-28 11:00:00.381071525 +0000 |
| @@ -1,3 +1,9 @@ |
| QA output created by 094 |
| fiemap run with sync |
| fiemap run without sync |
| +ERROR: couldn't find extent at 7 |
| +map is 'HHDDHPPDPHPH' |
| +logical: [ 5.. 6] phys: 301517.. 301518 flags: 0x800 tot: 2 |
| +logical: [ 8.. 8] phys: 301520.. 301520 flags: 0x800 tot: 1 |
| ... |
| (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/generic/094.out /home/fdmanana/git/hub/xfstests/results//generic/094.out.bad' to see the entire diff) |
| |
| So in order to fix this, while still avoiding deadlocks in the case where |
| the fiemap buffer is memory mapped to the same file, change fiemap to work |
| like the following: |
| |
| 1) Always lock the whole range in the inode's io tree before starting to |
| iterate the inode's subvolume tree searching for file extent items, |
| just like we did before commit b0ad381fa769 ("btrfs: fix deadlock with |
| fiemap and extent locking"); |
| |
| 2) Now instead of writing to the fiemap buffer every time we have an extent |
| to report, write instead to a temporary buffer (1 page), and when that |
| buffer becomes full, stop iterating the file extent items, unlock the |
| range in the io tree, release the search path, submit all the entries |
| kept in that buffer to the fiemap buffer, and then resume the search |
| for file extent items after locking the remainder of the range in |
| the io tree again. |
| |
| A buffer the size of one page allows for 146 entries on a system |
| with 4K pages. This is a large enough value to give good performance |
| by avoiding too many restarts of the search for file extent items. |
| In other words this preserves the huge performance gains made in the |
| last two years to fiemap, while avoiding the deadlocks in case the |
| fiemap buffer is memory mapped to the same file (useless in practice, |
| but possible and exercised by fuzz testing and syzbot). |
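The batching scheme in step 2 of the fix can be sketched as follows. This is an illustrative model only, not the kernel implementation: `toy_entry`, `toy_cache`, `toy_stats`, `flush_cache` and `toy_fiemap` are hypothetical names, the capacity is a parameter rather than something derived from a page size, and locking is represented by a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry; the real kernel structure is not reproduced here. */
struct toy_entry {
    unsigned long long offset;
    unsigned long long len;
};

struct toy_cache {
    struct toy_entry *entries; /* temporary buffer */
    size_t capacity;           /* entries the temporary buffer can hold */
    size_t used;
};

struct toy_stats {
    size_t lock_cycles; /* times the (remaining) range was locked */
    size_t delivered;   /* entries copied to the caller's buffer */
};

/* Copy cached entries to the caller's buffer, then empty the cache. */
void flush_cache(struct toy_cache *c, struct toy_entry *out,
                 struct toy_stats *st)
{
    for (size_t i = 0; i < c->used; i++)
        out[st->delivered++] = c->entries[i];
    c->used = 0;
}

/* out must have room for nr entries. */
void toy_fiemap(const struct toy_entry *extents, size_t nr,
                struct toy_cache *c, struct toy_entry *out,
                struct toy_stats *st)
{
    size_t next = 0;

    assert(c->capacity > 0);
    st->lock_cycles = 0;
    st->delivered = 0;
    c->used = 0;

    while (next < nr) {
        st->lock_cycles++; /* lock the remainder of the range */
        while (next < nr && c->used < c->capacity)
            c->entries[c->used++] = extents[next++];
        /* The range is unlocked here, before touching the caller's buffer. */
        flush_cache(c, out, st);
    }
}
```

In this model, 300 extents with a capacity of 146 need three lock cycles (146, 146 and 8 entries), and the caller's buffer is only written while the range is unlocked.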
| |
| The Linux kernel CVE team has assigned CVE-2024-27080 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 6.6.24 with commit ded566b4637f1b6b4c9ba74e7d0b8493e93f19cf and fixed in 6.6.26 with commit 49d640d2946c35a17b051d54171a032dd95b0f50 |
| Issue introduced in 6.8 with commit b0ad381fa7690244802aed119b478b4bdafc31dd and fixed in 6.8.2 with commit ced63fffd63072c0ca55d5a451010d71bf08c0b3 |
| Issue introduced in 6.8 with commit b0ad381fa7690244802aed119b478b4bdafc31dd and fixed in 6.9 with commit 978b63f7464abcfd364a6c95f734282c50f3decf |
| Issue introduced in 6.7.12 with commit 89bca7fe6382d61e88c67a0b0e7bce315986fb8b |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-27080 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| fs/btrfs/extent_io.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/49d640d2946c35a17b051d54171a032dd95b0f50 |
| https://git.kernel.org/stable/c/ced63fffd63072c0ca55d5a451010d71bf08c0b3 |
| https://git.kernel.org/stable/c/978b63f7464abcfd364a6c95f734282c50f3decf |