From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: <linux-cve-announce@vger.kernel.org>
Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
Subject: CVE-2024-38306: btrfs: protect folio::private when attaching extent buffer folios

Description
===========

In the Linux kernel, the following vulnerability has been resolved:

btrfs: protect folio::private when attaching extent buffer folios

[BUG]
Since v6.8 there have been rare kernel crashes reported by various
people; the common factor is bad page state error messages like this:

  BUG: Bad page state in process kswapd0 pfn:d6e840
  page: refcount:0 mapcount:0 mapping:000000007512f4f2 index:0x2796c2c7c
  pfn:0xd6e840
  aops:btree_aops ino:1
  flags: 0x17ffffe0000008(uptodate|node=0|zone=2|lastcpupid=0x3fffff)
  page_type: 0xffffffff()
  raw: 0017ffffe0000008 dead000000000100 dead000000000122 ffff88826d0be4c0
  raw: 00000002796c2c7c 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: non-NULL mapping

[CAUSE]
Commit 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer() to
allocate-then-attach method") changes the sequence when allocating a new
extent buffer.

Previously we always called grab_extent_buffer() under
mapping->i_private_lock, to ensure the safety of modifications to
folio::private (which, for regular sectorsize, is a pointer to the
extent buffer).
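
The pre-refactor logic looked roughly like the sketch below. This is a
paraphrase of the pattern described above, not a quote of
fs/btrfs/extent_io.c; the helper name grab_or_attach_eb() and the exact
arguments are illustrative assumptions:

  /*
   * Paraphrased sketch of the old pattern: both the lookup of an
   * existing extent buffer and the attachment of the new one to
   * folio::private happen with mapping->i_private_lock held, so a
   * concurrent release cannot detach ::private in between.
   */
  static struct extent_buffer *grab_or_attach_eb(struct btrfs_fs_info *fs_info,
                                                 struct address_space *mapping,
                                                 struct folio *folio,
                                                 struct extent_buffer *eb)
  {
          struct extent_buffer *exists;

          spin_lock(&mapping->i_private_lock);
          exists = grab_extent_buffer(fs_info, folio_page(folio, 0));
          if (!exists) {
                  /* No live eb attached yet, attach ours while locked. */
                  folio_attach_private(folio, eb);
          }
          spin_unlock(&mapping->i_private_lock);
          return exists ?: eb;
  }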

After that refactor, part of that work is no longer done under
mapping->i_private_lock, which can lead to the following race:

Thread A is trying to allocate an extent buffer at bytenr X, backed by
four 4K pages, while thread B is trying to release the page at X + 4K
(the second page of the extent buffer at X).

              Thread A              |              Thread B
------------------------------------+--------------------------------------
                                    | btree_release_folio()
                                    | | This is for the page at X + 4K,
                                    | | Not page X.
                                    | |
alloc_extent_buffer()               | |- release_extent_buffer()
|- filemap_add_folio() for the      | |  |- atomic_dec_and_test(eb->refs)
|  page at bytenr X (the first      | |  |
|  page).                           | |  |
|  Which returned -EEXIST.          | |  |
|                                   | |  |
|- filemap_lock_folio()             | |  |
|  Returned the first page locked.  | |  |
|                                   | |  |
|- grab_extent_buffer()             | |  |
|  |- atomic_inc_not_zero()         | |  |
|  |  Returned false                | |  |
|  |- folio_detach_private()        | |  |- folio_detach_private() for X
|     |- folio_test_private()       | |     |- folio_test_private()
|        Returned true              | |     |  Returned true
|     |- folio_put()                | |     |- folio_put()

Now there are two puts on the same folio (the one at bytenr X), leading
to a refcount underflow of that folio and eventually triggering the bad
page state report on page->mapping shown above.

The condition is not that easy to hit:

- The release must be triggered for a middle page of an eb.
  If the release is on the first page of an eb, the page lock would
  kick in and prevent the race.

- folio_detach_private() has a very small race window.
  It's only between folio_test_private() and folio_clear_private()
  (see the sketch below).
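
For reference, folio_detach_private() is essentially the following
test-then-clear-then-put sequence (paraphrased from
include/linux/pagemap.h; check your tree for the exact code). Nothing
inside it serializes concurrent callers, which is why two threads that
both pass the folio_test_private() check end up doing two folio_put()
calls on the same folio:

  /* Paraphrase of folio_detach_private(), include/linux/pagemap.h. */
  static inline void *folio_detach_private(struct folio *folio)
  {
          void *data = folio_get_private(folio);

          /* The unprotected window is between this test ... */
          if (!folio_test_private(folio))
                  return NULL;
          /* ... and this clear. */
          folio_clear_private(folio);
          folio->private = NULL;
          /* Drops the reference taken by folio_attach_private(). */
          folio_put(folio);

          return data;
  }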

That's exactly the race that mapping->i_private_lock is used to prevent,
and commit 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer() to
allocate-then-attach method") screwed that up.

At that time, I thought the page lock would kick in, as
filemap_release_folio() also requires the page to be locked, but forgot
that filemap_release_folio() only locks one page, not all the pages of
an extent buffer.

[FIX]
Move all the code requiring i_private_lock into
attach_eb_folio_to_filemap(), so that everything is done with proper
lock protection.

Furthermore, to prevent future problems, add an extra
lockdep_assert_held() to ensure we're holding the proper lock.
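
A minimal sketch of the locking rule the fix establishes is shown
below. It is not the actual patch (see the commits listed under
"Affected and fixed versions"); the helper name attach_eb_folio_locked()
and its exact arguments are assumptions made for illustration:

  /*
   * Illustrative sketch only: the folio::private check, detach and
   * attach are all done while mapping->i_private_lock is held, and a
   * lockdep assertion documents that rule.  The real helper in the fix
   * is attach_eb_folio_to_filemap() in fs/btrfs/extent_io.c.
   */
  static int attach_eb_folio_locked(struct address_space *mapping,
                                    struct extent_buffer *eb,
                                    struct folio *folio,
                                    struct extent_buffer **existing)
  {
          lockdep_assert_held(&mapping->i_private_lock);

          if (folio_test_private(folio)) {
                  *existing = folio_get_private(folio);
                  /* A live eb is already attached, reuse it. */
                  if (atomic_inc_not_zero(&(*existing)->refs))
                          return 1;
                  /*
                   * The attached eb is being released.  Detaching here
                   * is safe because the release path also takes
                   * i_private_lock before touching folio::private.
                   */
                  folio_detach_private(folio);
          }
          folio_attach_private(folio, eb);
          return 0;
  }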

The reproducer below is able to hit the race (it takes a few minutes
with instrumented code inserting delays into alloc_extent_buffer()):

  #!/bin/sh
  # Continuously drop the page cache and compact memory, so that extent
  # buffer folios keep getting released while readers still want them.
  drop_caches () {
          while true; do
                  echo 3 > /proc/sys/vm/drop_caches
                  echo 1 > /proc/sys/vm/compact_memory
          done
  }

  # Walk the whole filesystem from 80 parallel tar readers, repeatedly
  # re-reading the metadata that drop_caches() keeps evicting.
  run_tar () {
          while true; do
                  for x in `seq 1 80` ; do
                          tar cf /dev/zero /mnt > /dev/null &
                  done
                  wait
          done
  }

  mkfs.btrfs -f -d single -m single /dev/vda
  mount -o noatime /dev/vda /mnt
  # Create 200,000 files, 1K each, to populate plenty of btree metadata.
  ./simoop -n 200000 -E -f 1k /mnt
  drop_caches &
  (run_tar)

The Linux kernel CVE team has assigned CVE-2024-38306 to this issue.


Affected and fixed versions
===========================

Issue introduced in 6.8 with commit 09e6cef19c9fc0e10547135476865b5272aa0406 and fixed in 6.9.5 with commit 952f048eb901881a7cc6f7c1368b53cd386ead7b
Issue introduced in 6.8 with commit 09e6cef19c9fc0e10547135476865b5272aa0406 and fixed in 6.10 with commit f3a5367c679d31473d3fbb391675055b4792c309

Please see https://www.kernel.org for a full list of kernel versions
currently supported by the kernel community.

Unaffected versions might change over time as fixes are backported to
older supported kernel versions. The official CVE entry at
https://cve.org/CVERecord/?id=CVE-2024-38306
will be updated if fixes are backported; please check that entry for
the most up-to-date information about this issue.


Affected files
==============

The file(s) affected by this issue are:
  fs/btrfs/extent_io.c


Mitigation
==========

The Linux kernel CVE team recommends that you update to the latest
stable kernel version for this and many other bugfixes. Individual
changes are never tested alone, but rather are part of a larger kernel
release. Cherry-picking individual commits is not recommended or
supported by the Linux kernel community at all. If, however, updating
to the latest release is impossible, the individual changes to resolve
this issue can be found at these commits:
  https://git.kernel.org/stable/c/952f048eb901881a7cc6f7c1368b53cd386ead7b
  https://git.kernel.org/stable/c/f3a5367c679d31473d3fbb391675055b4792c309