| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-57884: mm: vmscan: account for free pages to prevent infinite Loop in throttle_direct_reclaim() |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| mm: vmscan: account for free pages to prevent infinite Loop in throttle_direct_reclaim() |
| |
| The task sometimes continues looping in throttle_direct_reclaim() because |
| allow_direct_reclaim(pgdat) keeps returning false. |
| |
| #0 [ffff80002cb6f8d0] __switch_to at ffff8000080095ac |
| #1 [ffff80002cb6f900] __schedule at ffff800008abbd1c |
| #2 [ffff80002cb6f990] schedule at ffff800008abc50c |
| #3 [ffff80002cb6f9b0] throttle_direct_reclaim at ffff800008273550 |
| #4 [ffff80002cb6fa20] try_to_free_pages at ffff800008277b68 |
| #5 [ffff80002cb6fae0] __alloc_pages_nodemask at ffff8000082c4660 |
| #6 [ffff80002cb6fc50] alloc_pages_vma at ffff8000082e4a98 |
| #7 [ffff80002cb6fca0] do_anonymous_page at ffff80000829f5a8 |
| #8 [ffff80002cb6fce0] __handle_mm_fault at ffff8000082a5974 |
| #9 [ffff80002cb6fd90] handle_mm_fault at ffff8000082a5bd4 |
| |
| At this point, the pgdat contains the following two zones: |
| |
| NODE: 4 ZONE: 0 ADDR: ffff00817fffe540 NAME: "DMA32" |
| SIZE: 20480 MIN/LOW/HIGH: 11/28/45 |
| VM_STAT: |
| NR_FREE_PAGES: 359 |
| NR_ZONE_INACTIVE_ANON: 18813 |
| NR_ZONE_ACTIVE_ANON: 0 |
| NR_ZONE_INACTIVE_FILE: 50 |
| NR_ZONE_ACTIVE_FILE: 0 |
| NR_ZONE_UNEVICTABLE: 0 |
| NR_ZONE_WRITE_PENDING: 0 |
| NR_MLOCK: 0 |
| NR_BOUNCE: 0 |
| NR_ZSPAGES: 0 |
| NR_FREE_CMA_PAGES: 0 |
| |
| NODE: 4 ZONE: 1 ADDR: ffff00817fffec00 NAME: "Normal" |
| SIZE: 8454144 PRESENT: 98304 MIN/LOW/HIGH: 68/166/264 |
| VM_STAT: |
| NR_FREE_PAGES: 146 |
| NR_ZONE_INACTIVE_ANON: 94668 |
| NR_ZONE_ACTIVE_ANON: 3 |
| NR_ZONE_INACTIVE_FILE: 735 |
| NR_ZONE_ACTIVE_FILE: 78 |
| NR_ZONE_UNEVICTABLE: 0 |
| NR_ZONE_WRITE_PENDING: 0 |
| NR_MLOCK: 0 |
| NR_BOUNCE: 0 |
| NR_ZSPAGES: 0 |
| NR_FREE_CMA_PAGES: 0 |
| |
| In allow_direct_reclaim(), while processing ZONE_DMA32, the sum of |
| inactive/active file-backed pages calculated in zone_reclaimable_pages() |
| based on the result of zone_page_state_snapshot() is zero. |
| |
| Additionally, since this system lacks swap, the calculation of inactive/ |
| active anonymous pages is skipped. |
| |
| crash> p nr_swap_pages |
| nr_swap_pages = $1937 = { |
| counter = 0 |
| } |
| |
| As a result, ZONE_DMA32 is deemed unreclaimable and skipped, moving on to |
| the processing of the next zone, ZONE_NORMAL, despite ZONE_DMA32 having |
| free pages significantly exceeding the high watermark. |
| |
| The problem is that the pgdat->kswapd_failures hasn't been incremented. |
| |
| crash> px ((struct pglist_data *) 0xffff00817fffe540)->kswapd_failures |
| $1935 = 0x0 |
| |
| This is because the node is deemed balanced. The node balancing logic in |
| balance_pgdat() evaluates all zones collectively. If one or more zones |
| (e.g., ZONE_DMA32) have enough free pages to meet their watermarks, the |
| entire node is deemed balanced. This causes balance_pgdat() to exit early |
| before incrementing the kswapd_failures, as it considers the overall |
| memory state acceptable, even though some zones (like ZONE_NORMAL) remain |
| under significant pressure. |
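
The throttling decision described above can be sketched as a small
userspace model. This is a simplified illustration, not the kernel
source: struct zone_model and allow_direct_reclaim_model() are
hypothetical names, and the watermark test (free pages must exceed half
of the summed MIN watermarks) and the retry limit of 16 are stated from
memory of mm/vmscan.c and mm/internal.h, so check them against your tree.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed to match the kernel's retry limit; verify against your tree. */
#define MAX_RECLAIM_RETRIES 16

/* Hypothetical, simplified stand-in for the per-zone state consulted. */
struct zone_model {
	unsigned long min_wmark;      /* MIN watermark, in pages */
	unsigned long nr_free;        /* NR_FREE_PAGES */
	unsigned long nr_reclaimable; /* what zone_reclaimable_pages() returned */
};

/*
 * Model of the decision: returns true when the allocating task may
 * proceed with direct reclaim, false when it stays throttled.
 */
bool allow_direct_reclaim_model(const struct zone_model *zones,
				size_t nr_zones, int kswapd_failures)
{
	unsigned long reserve = 0, free_pages = 0;
	size_t i;

	/* Fallback: kswapd has given up on this node, so do not throttle. */
	if (kswapd_failures >= MAX_RECLAIM_RETRIES)
		return true;

	for (i = 0; i < nr_zones; i++) {
		/* Zones reported as unreclaimable are skipped entirely. */
		if (!zones[i].nr_reclaimable)
			continue;
		reserve += zones[i].min_wmark;
		free_pages += zones[i].nr_free;
	}

	/* No reserves at all: nothing to protect, do not throttle. */
	if (!reserve)
		return true;

	return free_pages > reserve / 2;
}
```

In this model, a DMA32 zone with MIN 11 and 359 free pages (the values
reported above) but a reclaimable count of 0 contributes nothing. If the
Normal zone has MIN 68 and, as a purely hypothetical figure, 20 free
pages, the check is 20 > 34 and the task stays throttled for as long as
kswapd_failures remains below the limit.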
| |
| |
| The patch ensures that zone_reclaimable_pages() includes free pages |
| (NR_FREE_PAGES) in its calculation when no other reclaimable pages are |
| available (e.g., file-backed or anonymous pages). This change prevents |
| zones like ZONE_DMA32, which have sufficient free pages, from being |
| mistakenly deemed unreclaimable. By doing so, the patch ensures proper |
| node balancing, avoids masking pressure on other zones like ZONE_NORMAL, |
| and prevents infinite loops in throttle_direct_reclaim() caused by |
| allow_direct_reclaim(pgdat) repeatedly returning false. |
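
A minimal userspace sketch of the changed calculation follows. It is an
illustration under stated assumptions rather than the patch itself:
struct zone_counters, reclaimable_pages_model() and both boolean
parameters are hypothetical, and plain integers stand in for the values
the kernel reads through zone_page_state_snapshot().

```c
#include <stdbool.h>

/* Hypothetical snapshot of the per-zone counters used in the sum. */
struct zone_counters {
	unsigned long inactive_file; /* NR_ZONE_INACTIVE_FILE */
	unsigned long active_file;   /* NR_ZONE_ACTIVE_FILE */
	unsigned long inactive_anon; /* NR_ZONE_INACTIVE_ANON */
	unsigned long active_anon;   /* NR_ZONE_ACTIVE_ANON */
	unsigned long nr_free;       /* NR_FREE_PAGES */
};

/*
 * can_reclaim_anon:    false on a system without swap, so anonymous
 *                      pages are left out of the sum.
 * count_free_fallback: true models the patched behavior, false the
 *                      behavior before the patch.
 */
unsigned long reclaimable_pages_model(const struct zone_counters *z,
				      bool can_reclaim_anon,
				      bool count_free_fallback)
{
	unsigned long nr = z->inactive_file + z->active_file;

	if (can_reclaim_anon)
		nr += z->inactive_anon + z->active_anon;

	/* Patched behavior: fall back to free pages when nothing else counts. */
	if (count_free_fallback && nr == 0)
		nr = z->nr_free;

	return nr;
}
```

With the ZONE_DMA32 values from the report (file-backed sum of zero as
seen by the snapshot, 18813 inactive anonymous pages, 359 free pages) and
no swap, the model returns 0 without the fallback and 359 with it, so the
zone is no longer treated as unreclaimable.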
| |
| |
| The kernel hangs due to a task stuck in throttle_direct_reclaim(), caused |
| by a node being incorrectly deemed balanced despite pressure in certain |
| zones, such as ZONE_NORMAL. This issue arises from |
| zone_reclaimable_pages() returning 0 for zones without reclaimable file- |
| backed or anonymous pages, causing zones like ZONE_DMA32 with sufficient |
| free pages to be skipped. |
| |
| The lack of swap or reclaimable pages results in ZONE_DMA32 being ignored |
| during reclaim, masking pressure in other zones. Consequently, |
| pgdat->kswapd_failures remains 0 in balance_pgdat(), preventing fallback |
| mechanisms in allow_direct_reclaim() from being triggered, leading to an |
| infinite loop in throttle_direct_reclaim(). |
| |
| This patch modifies zone_reclaimable_pages() to account for free pages |
| (NR_FREE_PAGES) when no other reclaimable pages exist. This ensures zones |
| with sufficient free pages are not skipped, enabling proper balancing and |
| reclaim behavior. |
| |
| [akpm@linux-foundation.org: coding-style cleanups] |
| |
| The Linux kernel CVE team has assigned CVE-2024-57884 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 4.8 with commit 5a1c84b404a7176b8b36e2a0041b6f0adb3151a3 and fixed in 5.4.289 with commit 66cd37660ec34ec444fe42f2277330ae4a36bb19 |
| Issue introduced in 4.8 with commit 5a1c84b404a7176b8b36e2a0041b6f0adb3151a3 and fixed in 5.10.233 with commit d675fefbaec3815b3ae0af1bebd97f27df3a05c8 |
| Issue introduced in 4.8 with commit 5a1c84b404a7176b8b36e2a0041b6f0adb3151a3 and fixed in 5.15.176 with commit 63eac98d6f0898229f515cb62fe4e4db2430e99c |
| Issue introduced in 4.8 with commit 5a1c84b404a7176b8b36e2a0041b6f0adb3151a3 and fixed in 6.1.124 with commit bfb701192129803191c9cd6cdd1f82cd07f8de2c |
| Issue introduced in 4.8 with commit 5a1c84b404a7176b8b36e2a0041b6f0adb3151a3 and fixed in 6.6.70 with commit 1ff2302e8aeac7f2eedb551d7a89617283b5c6b2 |
| Issue introduced in 4.8 with commit 5a1c84b404a7176b8b36e2a0041b6f0adb3151a3 and fixed in 6.12.9 with commit 58d0d02dbc67438fc80223fdd7bbc49cf0733284 |
| Issue introduced in 4.8 with commit 5a1c84b404a7176b8b36e2a0041b6f0adb3151a3 and fixed in 6.13 with commit 6aaced5abd32e2a57cd94fd64f824514d0361da8 |
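
The list above can be turned into a quick version check. The sketch
below is a hypothetical helper, not an official tool: is_affected() and
struct fixed_release are made-up names, it only understands upstream
version numbers (major.minor.patch), and it cannot account for vendor
kernels that backport the fix under a different version string.

```c
#include <stdbool.h>
#include <stddef.h>

/* First fixed release in each stable series, copied from the list above. */
struct fixed_release {
	int major, minor, patch;
};

static const struct fixed_release fixed_releases[] = {
	{ 5, 4, 289 }, { 5, 10, 233 }, { 5, 15, 176 },
	{ 6, 1, 124 }, { 6, 6, 70 },   { 6, 12, 9 },
};

/* Returns true if an upstream kernel major.minor.patch lacks the fix. */
bool is_affected(int major, int minor, int patch)
{
	size_t i;

	/* The issue was introduced in 4.8; older kernels are not affected. */
	if (major < 4 || (major == 4 && minor < 8))
		return false;

	/* 6.13 and everything after it contains the mainline fix. */
	if (major > 6 || (major == 6 && minor >= 13))
		return false;

	for (i = 0; i < sizeof(fixed_releases) / sizeof(fixed_releases[0]); i++) {
		const struct fixed_release *f = &fixed_releases[i];

		if (f->major == major && f->minor == minor)
			return patch < f->patch;
	}

	/* Series between 4.8 and 6.13 with no listed fix. */
	return true;
}
```

For example, 5.4.288 is reported as affected and 5.4.289 as fixed, while
a series with no listed backport, such as 6.8, is treated as affected at
every patch level. The list reflects the fixes known when this
announcement was written and may grow as further backports land.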
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-57884 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| mm/vmscan.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this and many other bug fixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/66cd37660ec34ec444fe42f2277330ae4a36bb19 |
| https://git.kernel.org/stable/c/d675fefbaec3815b3ae0af1bebd97f27df3a05c8 |
| https://git.kernel.org/stable/c/63eac98d6f0898229f515cb62fe4e4db2430e99c |
| https://git.kernel.org/stable/c/bfb701192129803191c9cd6cdd1f82cd07f8de2c |
| https://git.kernel.org/stable/c/1ff2302e8aeac7f2eedb551d7a89617283b5c6b2 |
| https://git.kernel.org/stable/c/58d0d02dbc67438fc80223fdd7bbc49cf0733284 |
| https://git.kernel.org/stable/c/6aaced5abd32e2a57cd94fd64f824514d0361da8 |