 From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001
 From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 To: <linux-cve-announce@vger.kernel.org>
 Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
 Subject: CVE-2024-26987: mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled

 Description
 ===========

 In the Linux kernel, the following vulnerability has been resolved:

 mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled

 When I ran a hard-offline test with hugetlb pages, the following deadlock
 occurred:

 ======================================================
 WARNING: possible circular locking dependency detected
 6.8.0-11409-gf6cef5f8c37f #1 Not tainted
 ------------------------------------------------------
 bash/46904 is trying to acquire lock:
 ffffffffabe68910 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_dec+0x16/0x60

 but task is already holding lock:
 ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (pcp_batch_high_lock){+.+.}-{3:3}:
        __mutex_lock+0x6c/0x770
        page_alloc_cpu_online+0x3c/0x70
        cpuhp_invoke_callback+0x397/0x5f0
        __cpuhp_invoke_callback_range+0x71/0xe0
        _cpu_up+0xeb/0x210
        cpu_up+0x91/0xe0
        cpuhp_bringup_mask+0x49/0xb0
        bringup_nonboot_cpus+0xb7/0xe0
        smp_init+0x25/0xa0
        kernel_init_freeable+0x15f/0x3e0
        kernel_init+0x15/0x1b0
        ret_from_fork+0x2f/0x50
        ret_from_fork_asm+0x1a/0x30

 -> #0 (cpu_hotplug_lock){++++}-{0:0}:
        __lock_acquire+0x1298/0x1cd0
        lock_acquire+0xc0/0x2b0
        cpus_read_lock+0x2a/0xc0
        static_key_slow_dec+0x16/0x60
        __hugetlb_vmemmap_restore_folio+0x1b9/0x200
        dissolve_free_huge_page+0x211/0x260
        __page_handle_poison+0x45/0xc0
        memory_failure+0x65e/0xc70
        hard_offline_page_store+0x55/0xa0
        kernfs_fop_write_iter+0x12c/0x1d0
        vfs_write+0x387/0x550
        ksys_write+0x64/0xe0
        do_syscall_64+0xca/0x1e0
        entry_SYSCALL_64_after_hwframe+0x6d/0x75

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(pcp_batch_high_lock);
                                lock(cpu_hotplug_lock);
                                lock(pcp_batch_high_lock);
   rlock(cpu_hotplug_lock);

  *** DEADLOCK ***

 5 locks held by bash/46904:
  #0: ffff98f6c3bb23f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0
  #1: ffff98f6c328e488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0
  #2: ffff98ef83b31890 (kn->active#113){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0
  #3: ffffffffabf9db48 (mf_mutex){+.+.}-{3:3}, at: memory_failure+0x44/0xc70
  #4: ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40

 stack backtrace:
 CPU: 10 PID: 46904 Comm: bash Kdump: loaded Not tainted 6.8.0-11409-gf6cef5f8c37f #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x68/0xa0
  check_noncircular+0x129/0x140
  __lock_acquire+0x1298/0x1cd0
  lock_acquire+0xc0/0x2b0
  cpus_read_lock+0x2a/0xc0
  static_key_slow_dec+0x16/0x60
  __hugetlb_vmemmap_restore_folio+0x1b9/0x200
  dissolve_free_huge_page+0x211/0x260
  __page_handle_poison+0x45/0xc0
  memory_failure+0x65e/0xc70
  hard_offline_page_store+0x55/0xa0
  kernfs_fop_write_iter+0x12c/0x1d0
  vfs_write+0x387/0x550
  ksys_write+0x64/0xe0
  do_syscall_64+0xca/0x1e0
  entry_SYSCALL_64_after_hwframe+0x6d/0x75
 RIP: 0033:0x7fc862314887
 Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
 RSP: 002b:00007fff19311268 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007fc862314887
 RDX: 000000000000000c RSI: 000056405645fe10 RDI: 0000000000000001
 RBP: 000056405645fe10 R08: 00007fc8623d1460 R09: 000000007fffffff
 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
 R13: 00007fc86241b780 R14: 00007fc862417600 R15: 00007fc862416a00

 In short, the following call path breaks the lock dependency chain:

  memory_failure
   __page_handle_poison
    zone_pcp_disable -- lock(pcp_batch_high_lock)
    dissolve_free_huge_page
     __hugetlb_vmemmap_restore_folio
      static_key_slow_dec
       cpus_read_lock -- rlock(cpu_hotplug_lock)
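
To make the report concrete, here is a small userspace model of the check lockdep performs before recording a new dependency (check_noncircular() in the backtrace above): the edge "acquire B while holding A" is rejected if A is already reachable from B. This is a sketch, not kernel code. The enum, the depends_on matrix and helpers such as record_dependency() and replay_splat() are hypothetical, and real lockdep also distinguishes read from write acquisitions, which this model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lock classes for this model only (not kernel identifiers). */
enum model_lock { MODEL_CPU_HOTPLUG_LOCK, MODEL_PCP_BATCH_HIGH_LOCK, MODEL_NR_LOCKS };

static const char *const model_lock_name[MODEL_NR_LOCKS] = {
	"cpu_hotplug_lock",
	"pcp_batch_high_lock",
};

/* depends_on[a][b]: some path acquired lock b while already holding lock a. */
static bool depends_on[MODEL_NR_LOCKS][MODEL_NR_LOCKS];

static void reset_lock_graph(void)
{
	memset(depends_on, 0, sizeof(depends_on));
}

/* Depth-first search over recorded edges: is 'to' reachable from 'from'? */
static bool lock_order_reachable(int from, int to, bool *visited)
{
	if (from == to)
		return true;
	visited[from] = true;
	for (int next = 0; next < MODEL_NR_LOCKS; next++) {
		if (depends_on[from][next] && !visited[next] &&
		    lock_order_reachable(next, to, visited))
			return true;
	}
	return false;
}

/*
 * Record "acquire 'next' while holding 'held'". If 'held' is already
 * reachable from 'next', the new edge would close a cycle: report it
 * and return false without recording the edge.
 */
static bool record_dependency(int held, int next)
{
	bool visited[MODEL_NR_LOCKS] = { false };

	assert(held >= 0 && held < MODEL_NR_LOCKS);
	assert(next >= 0 && next < MODEL_NR_LOCKS);

	if (lock_order_reachable(next, held, visited)) {
		printf("possible circular locking dependency: %s -> %s\n",
		       model_lock_name[held], model_lock_name[next]);
		return false;
	}
	depends_on[held][next] = true;
	return true;
}

/* Replay both paths from the splat; true if the second is flagged. */
bool replay_splat(void)
{
	reset_lock_graph();

	/* Dependency #1: CPU bring-up (_cpu_up()) holds cpu_hotplug_lock,
	 * then page_alloc_cpu_online() takes pcp_batch_high_lock. */
	record_dependency(MODEL_CPU_HOTPLUG_LOCK, MODEL_PCP_BATCH_HIGH_LOCK);

	/* Dependency #0: zone_pcp_disable() holds pcp_batch_high_lock, then
	 * static_key_slow_dec() calls cpus_read_lock(). */
	return !record_dependency(MODEL_PCP_BATCH_HIGH_LOCK, MODEL_CPU_HOTPLUG_LOCK);
}
```

Calling replay_splat() records the CPU bring-up edge first, then rejects the memory_failure() edge and prints "possible circular locking dependency: pcp_batch_high_lock -> cpu_hotplug_lock".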

 Fix this by calling drain_all_pages() instead of zone_pcp_disable(), so
 that dissolve_free_huge_page() no longer runs under pcp_batch_high_lock.
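
The change in ordering can be sketched with userspace stand-ins. Every model_* function below is a hypothetical stub that only tracks whether pcp_batch_high_lock is held at the moment cpu_hotplug_lock would be read-acquired. The post-fix ordering used here (dissolve first, then drain_all_pages(), then take the page off the buddy allocator) is an assumption to verify against the commits listed under Mitigation.

```c
#include <assert.h>
#include <stdbool.h>

/* Model state: is pcp_batch_high_lock held, and was cpu_hotplug_lock
 * read-acquired while it was held? */
static bool pcp_batch_high_held;
static bool inversion_seen;

/* Stand-in for zone_pcp_disable(): takes pcp_batch_high_lock. */
static void model_zone_pcp_disable(void)
{
	pcp_batch_high_held = true;
}

/* Stand-in for re-enabling PCP lists: drops pcp_batch_high_lock. */
static void model_zone_pcp_enable(void)
{
	assert(pcp_batch_high_held); /* enable without disable is a bug */
	pcp_batch_high_held = false;
}

/* With hugetlb vmemmap optimization enabled, dissolving a free hugetlb
 * page reaches static_key_slow_dec() -> cpus_read_lock(). */
static int model_dissolve_free_huge_page(void)
{
	if (pcp_batch_high_held)
		inversion_seen = true;
	return 0; /* pretend dissolving succeeded */
}

/* Assumed not to take pcp_batch_high_lock; real internals are ignored. */
static void model_drain_all_pages(void)
{
}

static void model_take_page_off_buddy(void)
{
}

/* Ordering before the fix: dissolve while PCP lists are disabled. */
bool poison_before_fix(void)
{
	pcp_batch_high_held = inversion_seen = false;
	model_zone_pcp_disable();
	if (model_dissolve_free_huge_page() == 0)
		model_take_page_off_buddy();
	model_zone_pcp_enable();
	return inversion_seen;
}

/* Ordering after the fix: dissolve first, then drain the PCP lists. */
bool poison_after_fix(void)
{
	pcp_batch_high_held = inversion_seen = false;
	if (model_dissolve_free_huge_page() == 0) {
		model_drain_all_pages();
		model_take_page_off_buddy();
	}
	return inversion_seen;
}
```

poison_before_fix() returns true (the inversion lockdep reports) and poison_after_fix() returns false. The likely cost of this design is that draining is a one-shot flush rather than a lasting disable, so the ordering alone does not stop a page from being refilled into a per-CPU list before it is taken off the buddy allocator.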

 This issue could not occur before commit a6b40850c442 ("mm: hugetlb:
 replace hugetlb_free_vmemmap_enabled with a static_key"), which
 introduced rlock(cpu_hotplug_lock) in the dissolve_free_huge_page() code
 path while __page_handle_poison() already holds pcp_batch_high_lock.

 [linmiaohe@huawei.com: extend comment per Oscar]
 [akpm@linux-foundation.org: reflow block comment]

 The Linux kernel CVE team has assigned CVE-2024-26987 to this issue.


 Affected and fixed versions
 ===========================

 	Issue introduced in 5.18 with commit a6b40850c442bf996e729e1d441d3dbc37cea171 and fixed in 6.1.88 with commit 5ef7ba2799a3b5ed292b8f6407376e2c25ef002e
 	Issue introduced in 5.18 with commit a6b40850c442bf996e729e1d441d3dbc37cea171 and fixed in 6.6.29 with commit 882e1180c83f5b75bae03d0ccc31ccedfe5159de
 	Issue introduced in 5.18 with commit a6b40850c442bf996e729e1d441d3dbc37cea171 and fixed in 6.8.8 with commit 49955b24002dc16a0ae2e83a57a2a6c863a1845c
 	Issue introduced in 5.18 with commit a6b40850c442bf996e729e1d441d3dbc37cea171 and fixed in 6.9 with commit 1983184c22dd84a4d95a71e5c6775c2638557dc7
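
For quick triage of upstream or stable version numbers, the table above can be encoded directly. This is a hypothetical helper that uses only the 5.18 introduction point and the four fixed versions listed; it ignores -rc releases and distribution kernels with backported fixes, so the CVE record remains the authority.

```c
#include <assert.h>

enum cve_2024_26987_status {
	STATUS_NOT_INTRODUCED, /* older than 5.18, predates a6b40850c442 */
	STATUS_AFFECTED,
	STATUS_FIXED,
};

/* Pack a version into one comparable integer; minor/patch must be < 1000. */
static long kver(int major, int minor, int patch)
{
	assert(minor >= 0 && minor < 1000 && patch >= 0 && patch < 1000);
	return (long)major * 1000000L + (long)minor * 1000L + patch;
}

/* Classify a kernel version using only the table in this announcement. */
enum cve_2024_26987_status classify_kernel_version(int major, int minor, int patch)
{
	long v = kver(major, minor, patch);

	if (v < kver(5, 18, 0))
		return STATUS_NOT_INTRODUCED;
	if (v >= kver(6, 9, 0))
		return STATUS_FIXED;
	if (major == 6 && minor == 1)
		return patch >= 88 ? STATUS_FIXED : STATUS_AFFECTED;
	if (major == 6 && minor == 6)
		return patch >= 29 ? STATUS_FIXED : STATUS_AFFECTED;
	if (major == 6 && minor == 8)
		return patch >= 8 ? STATUS_FIXED : STATUS_AFFECTED;
	/* 5.18, 5.19, 6.0, 6.2-6.5 and 6.7 have no fix listed in the table. */
	return STATUS_AFFECTED;
}
```

For example, 6.6.28 classifies as STATUS_AFFECTED and 6.6.29 as STATUS_FIXED.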

 Please see https://www.kernel.org for a full list of currently supported
 kernel versions by the kernel community.

 Unaffected versions might change over time as fixes are backported to
 older supported kernel versions.  The official CVE entry at
 	https://cve.org/CVERecord/?id=CVE-2024-26987
 will be updated if fixes are backported; please check it for the most
 up-to-date information about this issue.


 Affected files
 ==============

 The file(s) affected by this issue are:
 	mm/memory-failure.c


 Mitigation
 ==========

 The Linux kernel CVE team recommends that you update to the latest
 stable kernel version for this, and many other bugfixes.  Individual
 changes are never tested alone, but rather are part of a larger kernel
 release.  Cherry-picking individual commits is not recommended or
 supported by the Linux kernel community at all.  If, however, updating to
 the latest release is impossible, the individual changes to resolve this
 issue can be found at these commits:
 	https://git.kernel.org/stable/c/5ef7ba2799a3b5ed292b8f6407376e2c25ef002e
 	https://git.kernel.org/stable/c/882e1180c83f5b75bae03d0ccc31ccedfe5159de
 	https://git.kernel.org/stable/c/49955b24002dc16a0ae2e83a57a2a6c863a1845c
 	https://git.kernel.org/stable/c/1983184c22dd84a4d95a71e5c6775c2638557dc7