| From bippy-1.0.0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@kernel.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-26759: mm/swap: fix race when skipping swapcache |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| mm/swap: fix race when skipping swapcache |
| |
| When skipping swapcache for SWP_SYNCHRONOUS_IO, if two or more threads |
| swapin the same entry at the same time, they get different pages (A, B). |
| Before one thread (T0) finishes the swapin and installs page (A) to the |
| PTE, another thread (T1) could finish swapin of page (B), swap_free the |
| entry, then swap out the possibly modified page reusing the same entry. |
| It breaks the pte_same check in (T0) because PTE value is unchanged, |
| causing ABA problem. Thread (T0) will install a stalled page (A) into the |
| PTE and cause data corruption. |
| |
| One possible callstack is like this: |
| |
| CPU0 CPU1 |
| ---- ---- |
| do_swap_page() do_swap_page() with same entry |
| <direct swapin path> <direct swapin path> |
| <alloc page A> <alloc page B> |
| swap_read_folio() <- read to page A swap_read_folio() <- read to page B |
| <slow on later locks or interrupt> <finished swapin first> |
| ... set_pte_at() |
| swap_free() <- entry is free |
| <write to page B, now page A stalled> |
| <swap out page B to same swap entry> |
| pte_same() <- Check pass, PTE seems |
| unchanged, but page A |
| is stalled! |
| swap_free() <- page B content lost! |
| set_pte_at() <- staled page A installed! |
| |
| And besides, for ZRAM, swap_free() allows the swap device to discard the |
| entry content, so even if page (B) is not modified, if swap_read_folio() |
| on CPU0 happens later than swap_free() on CPU1, it may also cause data |
| loss. |
| |
| To fix this, reuse swapcache_prepare which will pin the swap entry using |
| the cache flag, and allow only one thread to swap it in, also prevent any |
| parallel code from putting the entry in the cache. Release the pin after |
| PT unlocked. |
| |
| Racers just loop and wait since it's a rare and very short event. A |
| schedule_timeout_uninterruptible(1) call is added to avoid repeated page |
| faults wasting too much CPU, causing livelock or adding too much noise to |
| perf statistics. A similar livelock issue was described in commit |
| 029c4628b2eb ("mm: swap: get rid of livelock in swapin readahead") |
| |
| Reproducer: |
| |
| This race issue can be triggered easily using a well constructed |
| reproducer and patched brd (with a delay in read path) [1]: |
| |
| With latest 6.8 mainline, race caused data loss can be observed easily: |
| $ gcc -g -lpthread test-thread-swap-race.c && ./a.out |
| Polulating 32MB of memory region... |
| Keep swapping out... |
| Starting round 0... |
| Spawning 65536 workers... |
| 32746 workers spawned, wait for done... |
| Round 0: Error on 0x5aa00, expected 32746, got 32743, 3 data loss! |
| Round 0: Error on 0x395200, expected 32746, got 32743, 3 data loss! |
| Round 0: Error on 0x3fd000, expected 32746, got 32737, 9 data loss! |
| Round 0 Failed, 15 data loss! |
| |
| This reproducer spawns multiple threads sharing the same memory region |
| using a small swap device. Every two threads updates mapped pages one by |
| one in opposite direction trying to create a race, with one dedicated |
| thread keep swapping out the data out using madvise. |
| |
| The reproducer created a reproduce rate of about once every 5 minutes, so |
| the race should be totally possible in production. |
| |
| After this patch, I ran the reproducer for over a few hundred rounds and |
| no data loss observed. |
| |
| Performance overhead is minimal, microbenchmark swapin 10G from 32G |
| zram: |
| |
| Before: 10934698 us |
| After: 11157121 us |
| Cached: 13155355 us (Dropping SWP_SYNCHRONOUS_IO flag) |
| |
| [kasong@tencent.com: v4] |
| |
| The Linux kernel CVE team has assigned CVE-2024-26759 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 4.15 with commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168 and fixed in 6.1.80 with commit 2dedda77d4493f3e92e414b272bfa60f1f51ed95 |
| Issue introduced in 4.15 with commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168 and fixed in 6.6.19 with commit 305152314df82b22cf9b181f3dc5fc411002079a |
| Issue introduced in 4.15 with commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168 and fixed in 6.7.7 with commit d183a4631acfc7af955c02a02e739cec15f5234d |
| Issue introduced in 4.15 with commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168 and fixed in 6.8 with commit 13ddaf26be324a7f951891ecd9ccd04466d27458 |
| |
| Please see https://www.kernel.org for a full list of currently supported |
| kernel versions by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-26759 |
| will be updated if fixes are backported, please check that for the most |
| up to date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| include/linux/swap.h |
| mm/memory.c |
| mm/swap.h |
| mm/swapfile.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/2dedda77d4493f3e92e414b272bfa60f1f51ed95 |
| https://git.kernel.org/stable/c/305152314df82b22cf9b181f3dc5fc411002079a |
| https://git.kernel.org/stable/c/d183a4631acfc7af955c02a02e739cec15f5234d |
| https://git.kernel.org/stable/c/13ddaf26be324a7f951891ecd9ccd04466d27458 |