| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-26987: mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled |
| |
| When I ran a hard offline test with hugetlb pages, the following deadlock occurred: |
| |
| ====================================================== |
| WARNING: possible circular locking dependency detected |
| 6.8.0-11409-gf6cef5f8c37f #1 Not tainted |
| ------------------------------------------------------ |
| bash/46904 is trying to acquire lock: |
| ffffffffabe68910 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_dec+0x16/0x60 |
| |
| but task is already holding lock: |
| ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40 |
| |
| which lock already depends on the new lock. |
| |
| the existing dependency chain (in reverse order) is: |
| |
| -> #1 (pcp_batch_high_lock){+.+.}-{3:3}: |
| __mutex_lock+0x6c/0x770 |
| page_alloc_cpu_online+0x3c/0x70 |
| cpuhp_invoke_callback+0x397/0x5f0 |
| __cpuhp_invoke_callback_range+0x71/0xe0 |
| _cpu_up+0xeb/0x210 |
| cpu_up+0x91/0xe0 |
| cpuhp_bringup_mask+0x49/0xb0 |
| bringup_nonboot_cpus+0xb7/0xe0 |
| smp_init+0x25/0xa0 |
| kernel_init_freeable+0x15f/0x3e0 |
| kernel_init+0x15/0x1b0 |
| ret_from_fork+0x2f/0x50 |
| ret_from_fork_asm+0x1a/0x30 |
| |
| -> #0 (cpu_hotplug_lock){++++}-{0:0}: |
| __lock_acquire+0x1298/0x1cd0 |
| lock_acquire+0xc0/0x2b0 |
| cpus_read_lock+0x2a/0xc0 |
| static_key_slow_dec+0x16/0x60 |
| __hugetlb_vmemmap_restore_folio+0x1b9/0x200 |
| dissolve_free_huge_page+0x211/0x260 |
| __page_handle_poison+0x45/0xc0 |
| memory_failure+0x65e/0xc70 |
| hard_offline_page_store+0x55/0xa0 |
| kernfs_fop_write_iter+0x12c/0x1d0 |
| vfs_write+0x387/0x550 |
| ksys_write+0x64/0xe0 |
| do_syscall_64+0xca/0x1e0 |
| entry_SYSCALL_64_after_hwframe+0x6d/0x75 |
| |
| other info that might help us debug this: |
| |
| Possible unsafe locking scenario: |
| |
| CPU0                          CPU1 |
| ----                          ---- |
| lock(pcp_batch_high_lock); |
|                               lock(cpu_hotplug_lock); |
|                               lock(pcp_batch_high_lock); |
| rlock(cpu_hotplug_lock); |
| |
| *** DEADLOCK *** |
| |
| 5 locks held by bash/46904: |
| #0: ffff98f6c3bb23f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0 |
| #1: ffff98f6c328e488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0 |
| #2: ffff98ef83b31890 (kn->active#113){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0 |
| #3: ffffffffabf9db48 (mf_mutex){+.+.}-{3:3}, at: memory_failure+0x44/0xc70 |
| #4: ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40 |
| |
| stack backtrace: |
| CPU: 10 PID: 46904 Comm: bash Kdump: loaded Not tainted 6.8.0-11409-gf6cef5f8c37f #1 |
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 |
| Call Trace: |
| <TASK> |
| dump_stack_lvl+0x68/0xa0 |
| check_noncircular+0x129/0x140 |
| __lock_acquire+0x1298/0x1cd0 |
| lock_acquire+0xc0/0x2b0 |
| cpus_read_lock+0x2a/0xc0 |
| static_key_slow_dec+0x16/0x60 |
| __hugetlb_vmemmap_restore_folio+0x1b9/0x200 |
| dissolve_free_huge_page+0x211/0x260 |
| __page_handle_poison+0x45/0xc0 |
| memory_failure+0x65e/0xc70 |
| hard_offline_page_store+0x55/0xa0 |
| kernfs_fop_write_iter+0x12c/0x1d0 |
| vfs_write+0x387/0x550 |
| ksys_write+0x64/0xe0 |
| do_syscall_64+0xca/0x1e0 |
| entry_SYSCALL_64_after_hwframe+0x6d/0x75 |
| RIP: 0033:0x7fc862314887 |
| Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 |
| RSP: 002b:00007fff19311268 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 |
| RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007fc862314887 |
| RDX: 000000000000000c RSI: 000056405645fe10 RDI: 0000000000000001 |
| RBP: 000056405645fe10 R08: 00007fc8623d1460 R09: 000000007fffffff |
| R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c |
| R13: 00007fc86241b780 R14: 00007fc862417600 R15: 00007fc862416a00 |
| |
| In short, the following call path breaks the lock dependency chain: |
| |
| memory_failure |
|  __page_handle_poison |
|   zone_pcp_disable -- lock(pcp_batch_high_lock) |
|   dissolve_free_huge_page |
|    __hugetlb_vmemmap_restore_folio |
|     static_key_slow_dec |
|      cpus_read_lock -- rlock(cpu_hotplug_lock) |
| |
| Fix this by calling drain_all_pages() instead of zone_pcp_disable(). |
| |
| This issue did not occur before commit a6b40850c442 ("mm: hugetlb: replace |
| hugetlb_free_vmemmap_enabled with a static_key"), which introduced |
| rlock(cpu_hotplug_lock) in the dissolve_free_huge_page() code path while |
| lock(pcp_batch_high_lock) is already held in __page_handle_poison(). |
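
The lock-order inversion described above can be illustrated with a small user-space model. This is only a sketch of the idea behind the lockdep report, not kernel code: the helper names (record_chain, find_cycle, build_graph) are hypothetical, and the "after the fix" chain assumes that the hard offline path no longer holds pcp_batch_high_lock while the huge page is dissolved.

```python
from collections import defaultdict


def record_chain(graph, chain):
    """Record 'held -> acquired next' edges for one acquisition order."""
    for held, acquired in zip(chain, chain[1:]):
        graph[held].add(acquired)


def build_graph(chains):
    """Build a lock-order graph from several acquisition orders."""
    graph = defaultdict(set)
    for chain in chains:
        record_chain(graph, chain)
    return graph


def find_cycle(graph):
    """Return one cycle in the lock-order graph as a list, or None."""
    visiting, done = [], set()

    def visit(node):
        if node in visiting:
            # Reached a lock that is already on the current path: a cycle.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in sorted(graph.get(node, ())):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = visit(start)
        if cycle:
            return cycle
    return None


# CPU bring-up: cpu_hotplug_lock is held while page_alloc_cpu_online()
# takes pcp_batch_high_lock (dependency #1 in the report).
CPU_ONLINE = ["cpu_hotplug_lock", "pcp_batch_high_lock"]
# Hard offline before the fix: zone_pcp_disable() holds pcp_batch_high_lock
# while static_key_slow_dec() takes cpu_hotplug_lock (dependency #0).
HARD_OFFLINE_BEFORE = ["pcp_batch_high_lock", "cpu_hotplug_lock"]
# Hard offline after the fix (modelled assumption): pcp_batch_high_lock is
# not held across the dissolve, so only cpu_hotplug_lock is on this path.
HARD_OFFLINE_AFTER = ["cpu_hotplug_lock"]

print(find_cycle(build_graph([CPU_ONLINE, HARD_OFFLINE_BEFORE])))
print(find_cycle(build_graph([CPU_ONLINE, HARD_OFFLINE_AFTER])))
```

With the pre-fix chain, the two acquisition orders form a cycle between cpu_hotplug_lock and pcp_batch_high_lock; with the modelled post-fix chain, no cycle remains.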
| |
| [linmiaohe@huawei.com: extend comment per Oscar] |
| [akpm@linux-foundation.org: reflow block comment] |
| |
| The Linux kernel CVE team has assigned CVE-2024-26987 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 5.18 with commit a6b40850c442bf996e729e1d441d3dbc37cea171 and fixed in 6.1.88 with commit 5ef7ba2799a3b5ed292b8f6407376e2c25ef002e |
| Issue introduced in 5.18 with commit a6b40850c442bf996e729e1d441d3dbc37cea171 and fixed in 6.6.29 with commit 882e1180c83f5b75bae03d0ccc31ccedfe5159de |
| Issue introduced in 5.18 with commit a6b40850c442bf996e729e1d441d3dbc37cea171 and fixed in 6.8.8 with commit 49955b24002dc16a0ae2e83a57a2a6c863a1845c |
| Issue introduced in 5.18 with commit a6b40850c442bf996e729e1d441d3dbc37cea171 and fixed in 6.9 with commit 1983184c22dd84a4d95a71e5c6775c2638557dc7 |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-26987 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
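
The version ranges listed in this section can be checked mechanically. The sketch below is a rough aid, not an official tool: the function names are hypothetical, it only understands upstream "X.Y" or "X.Y.Z" version strings, it ignores any "-suffix" (so release candidates are not handled), and it cannot know about fixes backported by distributions or whether hugetlb_optimize_vmemmap is enabled on a given system.

```python
# Data taken from the "Affected and fixed versions" list.
INTRODUCED = (5, 18)
# Stable branches that received a backport: branch -> first fixed patch level.
FIXED_IN_BRANCH = {(6, 1): 88, (6, 6): 29, (6, 8): 8}
FIXED_MAINLINE = (6, 9)


def parse_version(text):
    """Parse 'X.Y' or 'X.Y.Z' (dropping any '-suffix') into a 3-tuple."""
    core = text.split("-", 1)[0]
    parts = [int(p) for p in core.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def is_affected(text):
    """Return True if an upstream kernel version falls in an affected range."""
    major, minor, patch = parse_version(text)
    branch = (major, minor)
    if branch < INTRODUCED or branch >= FIXED_MAINLINE:
        return False
    if branch in FIXED_IN_BRANCH:
        return patch < FIXED_IN_BRANCH[branch]
    # Branches between 5.18 and 6.9 with no listed backport.
    return True


for version in ["5.15.150", "6.1.87", "6.1.88", "6.8.7", "6.9"]:
    print(version, is_affected(version))
```

Branches between 5.18 and 6.9 that have no fix listed here are treated as affected; that is a conservative assumption of this sketch, and the official CVE entry remains the authoritative source.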
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| mm/memory-failure.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/5ef7ba2799a3b5ed292b8f6407376e2c25ef002e |
| https://git.kernel.org/stable/c/882e1180c83f5b75bae03d0ccc31ccedfe5159de |
| https://git.kernel.org/stable/c/49955b24002dc16a0ae2e83a57a2a6c863a1845c |
| https://git.kernel.org/stable/c/1983184c22dd84a4d95a71e5c6775c2638557dc7 |