 From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001
 From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 To: <linux-cve-announce@vger.kernel.org>
 Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
 Subject: CVE-2024-36028: mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio()

 Description
 ===========

 In the Linux kernel, the following vulnerability has been resolved:

 mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio()

 When I ran memory failure tests recently, the warning below occurred:

 DEBUG_LOCKS_WARN_ON(1)
 WARNING: CPU: 8 PID: 1011 at kernel/locking/lockdep.c:232 __lock_acquire+0xccb/0x1ca0
 Modules linked in: mce_inject hwpoison_inject
 CPU: 8 PID: 1011 Comm: bash Kdump: loaded Not tainted 6.9.0-rc3-next-20240410-00012-gdb69f219f4be #3
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
 RIP: 0010:__lock_acquire+0xccb/0x1ca0
 RSP: 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082
 RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8
 RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa1ce5fc1c9c0
 RBP: ffffa1c6865d3280 R08: ffffffffb0f570a8 R09: 0000000000009ffb
 R10: 0000000000000286 R11: ffffffffb0f2ad50 R12: ffffa1c6865d3d10
 R13: ffffa1c6865d3c70 R14: 0000000000000000 R15: 0000000000000004
 FS:  00007ff9f32aa740(0000) GS:ffffa1ce5fc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007ff9f3134ba0 CR3: 00000008484e4000 CR4: 00000000000006f0
 Call Trace:
  <TASK>
  lock_acquire+0xbe/0x2d0
  _raw_spin_lock_irqsave+0x3a/0x60
  hugepage_subpool_put_pages.part.0+0xe/0xc0
  free_huge_folio+0x253/0x3f0
  dissolve_free_huge_page+0x147/0x210
  __page_handle_poison+0x9/0x70
  memory_failure+0x4e6/0x8c0
  hard_offline_page_store+0x55/0xa0
  kernfs_fop_write_iter+0x12c/0x1d0
  vfs_write+0x380/0x540
  ksys_write+0x64/0xe0
  do_syscall_64+0xbc/0x1d0
  entry_SYSCALL_64_after_hwframe+0x77/0x7f
 RIP: 0033:0x7ff9f3114887
 RSP: 002b:00007ffecbacb458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887
 RDX: 000000000000000c RSI: 0000564494164e10 RDI: 0000000000000001
 RBP: 0000564494164e10 R08: 00007ff9f31d1460 R09: 000000007fffffff
 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
 R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00
  </TASK>
 Kernel panic - not syncing: kernel: panic_on_warn set ...
 CPU: 8 PID: 1011 Comm: bash Kdump: loaded Not tainted 6.9.0-rc3-next-20240410-00012-gdb69f219f4be #3
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
 Call Trace:
  <TASK>
  panic+0x326/0x350
  check_panic_on_warn+0x4f/0x50
  __warn+0x98/0x190
  report_bug+0x18e/0x1a0
  handle_bug+0x3d/0x70
  exc_invalid_op+0x18/0x70
  asm_exc_invalid_op+0x1a/0x20
 RIP: 0010:__lock_acquire+0xccb/0x1ca0
 RSP: 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082
 RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8
 RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa1ce5fc1c9c0
 RBP: ffffa1c6865d3280 R08: ffffffffb0f570a8 R09: 0000000000009ffb
 R10: 0000000000000286 R11: ffffffffb0f2ad50 R12: ffffa1c6865d3d10
 R13: ffffa1c6865d3c70 R14: 0000000000000000 R15: 0000000000000004
  lock_acquire+0xbe/0x2d0
  _raw_spin_lock_irqsave+0x3a/0x60
  hugepage_subpool_put_pages.part.0+0xe/0xc0
  free_huge_folio+0x253/0x3f0
  dissolve_free_huge_page+0x147/0x210
  __page_handle_poison+0x9/0x70
  memory_failure+0x4e6/0x8c0
  hard_offline_page_store+0x55/0xa0
  kernfs_fop_write_iter+0x12c/0x1d0
  vfs_write+0x380/0x540
  ksys_write+0x64/0xe0
  do_syscall_64+0xbc/0x1d0
  entry_SYSCALL_64_after_hwframe+0x77/0x7f
 RIP: 0033:0x7ff9f3114887
 RSP: 002b:00007ffecbacb458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887
 RDX: 000000000000000c RSI: 0000564494164e10 RDI: 0000000000000001
 RBP: 0000564494164e10 R08: 00007ff9f31d1460 R09: 000000007fffffff
 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
 R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00
  </TASK>

 After git bisecting and digging into the code, I believe the root cause is
 that the _deferred_list field of the folio is unioned with the
 _hugetlb_subpool field.  In __update_and_free_hugetlb_folio(),
 folio->_deferred_list is initialized, which corrupts
 folio->_hugetlb_subpool when the folio is hugetlb.  Later,
 free_huge_folio() uses _hugetlb_subpool and the above warning is triggered.
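
 The aliasing can be illustrated outside the kernel with a minimal sketch.
 The struct and function names below (toy_folio, toy_subpool,
 toy_init_list_head, subpool_clobbered_by_list_init) are hypothetical
 stand-ins and do not reproduce the real struct folio layout; the sketch
 only shows that initializing a list head stored in a union overwrites a
 pointer sharing the same storage.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; not the kernel definitions. */
struct list_head { struct list_head *next, *prev; };
struct toy_subpool { int lock; };

struct toy_folio {
    union {
        struct toy_subpool *hugetlb_subpool; /* valid while the folio is hugetlb */
        struct list_head deferred_list;      /* valid for other large folios */
    } u;
};

/* Same effect as a list-head initializer: point both links at the head. */
static void toy_init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Returns 1 if initializing the list overwrote the subpool pointer. */
int subpool_clobbered_by_list_init(void)
{
    static struct toy_subpool sp;
    struct toy_folio f;

    f.u.hugetlb_subpool = &sp;
    assert(f.u.hugetlb_subpool == &sp);

    /* Writing the list head reuses the bytes that held the pointer. */
    toy_init_list_head(&f.u.deferred_list);

    return f.u.hugetlb_subpool != &sp;
}
```

 After the list head is initialized, the subpool member no longer points at
 the subpool; any later code that trusts it, such as a lock acquisition
 through that pointer, operates on unrelated memory.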

 The code assumes that the hugetlb flag has already been cleared by the time
 folio_put() is called in update_and_free_hugetlb_folio().  This assumption
 is broken by the race below:

 CPU1					CPU2
 dissolve_free_huge_page			update_and_free_pages_bulk
  update_and_free_hugetlb_folio		 hugetlb_vmemmap_restore_folios
 					  folio_clear_hugetlb_vmemmap_optimized
   clear_flag = folio_test_hugetlb_vmemmap_optimized
   if (clear_flag) <-- False, it's already cleared.
    __folio_clear_hugetlb(folio) <-- Hugetlb is not cleared.
   folio_put
    free_huge_folio <-- free_the_page is expected.
 					 list_for_each_entry()
 					  __folio_clear_hugetlb <-- Too late.

 Fix this issue by checking directly whether the folio is hugetlb, instead
 of checking clear_flag, to close the race window.
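
 The interleaving and the effect of the changed check can be replayed
 deterministically in a single thread.  This is a toy model under stated
 assumptions, not the actual patch: toy_folio, toy_folio_put and replay_race
 are hypothetical names, and the two booleans merely stand in for the
 hugetlb and vmemmap-optimized folio flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; the names are illustrative, not kernel definitions. */
struct toy_folio {
    bool hugetlb;           /* stands in for the hugetlb flag */
    bool vmemmap_optimized; /* stands in for the vmemmap-optimized flag */
};

enum free_path { FREE_NORMAL, FREE_AS_HUGETLB };

/* Stand-in for folio_put(): a folio still marked hugetlb takes the
 * hugetlb free path, which would read the overwritten subpool pointer. */
static enum free_path toy_folio_put(const struct toy_folio *f)
{
    assert(f != 0);
    return f->hugetlb ? FREE_AS_HUGETLB : FREE_NORMAL;
}

/* Replays the CPU1/CPU2 interleaving from the diagram in order.
 * check_hugetlb_directly == false models the racy test of clear_flag;
 * true models testing the hugetlb flag itself. */
enum free_path replay_race(bool check_hugetlb_directly)
{
    struct toy_folio f = { .hugetlb = true, .vmemmap_optimized = true };

    /* CPU2: the bulk path clears the vmemmap-optimized flag first. */
    f.vmemmap_optimized = false;

    /* CPU1: samples the flag only after CPU2 has cleared it. */
    bool clear_flag = f.vmemmap_optimized;

    if (check_hugetlb_directly ? f.hugetlb : clear_flag)
        f.hugetlb = false; /* models __folio_clear_hugetlb() */

    /* CPU1: folio_put() runs before CPU2 reaches its own clear. */
    return toy_folio_put(&f);
}
```

 In this model replay_race(false) ends on the hugetlb free path, matching
 the warning in the trace, while replay_race(true) ends on the normal free
 path because the decision no longer depends on a flag the other CPU may
 already have cleared.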

 The Linux kernel CVE team has assigned CVE-2024-36028 to this issue.


 Affected and fixed versions
 ===========================

 	Issue introduced in 6.1.47 with commit 1b4ce2952b4f33e198d5e993acff0611dff1e399 and fixed in 6.1.91 with commit 2effe407f7563add41750fd7e03da4ea44b98099
 	Issue introduced in 6.5 with commit 32c877191e022b55fe3a374f3d7e9fb5741c514d and fixed in 6.6.31 with commit 7e0a322877416e8c648819a8e441cf8c790b2cce
 	Issue introduced in 6.5 with commit 32c877191e022b55fe3a374f3d7e9fb5741c514d and fixed in 6.8.9 with commit 9c9b32d46afab2d911897914181c488954012300
 	Issue introduced in 6.5 with commit 32c877191e022b55fe3a374f3d7e9fb5741c514d and fixed in 6.9 with commit 52ccdde16b6540abe43b6f8d8e1e1ec90b0983af
 	Issue introduced in 6.4.11 with commit 9a1a43a0e7e96911eaa00ad20b20f2edefb31d8a

 Please see https://www.kernel.org for a full list of kernel versions
 currently supported by the kernel community.

 Unaffected versions might change over time as fixes are backported to
 older supported kernel versions.  The official CVE entry at
 	https://cve.org/CVERecord/?id=CVE-2024-36028
 will be updated if fixes are backported; please check it for the most
 up-to-date information about this issue.


 Affected files
 ==============

 The file(s) affected by this issue are:
 	mm/hugetlb.c


 Mitigation
 ==========

 The Linux kernel CVE team recommends that you update to the latest
 stable kernel version for this and many other bug fixes.  Individual
 changes are never tested alone, but rather are part of a larger kernel
 release.  Cherry-picking individual commits is not recommended or
 supported by the Linux kernel community at all.  If, however, updating to
 the latest release is impossible, the individual changes to resolve this
 issue can be found at these commits:
 	https://git.kernel.org/stable/c/2effe407f7563add41750fd7e03da4ea44b98099
 	https://git.kernel.org/stable/c/7e0a322877416e8c648819a8e441cf8c790b2cce
 	https://git.kernel.org/stable/c/9c9b32d46afab2d911897914181c488954012300
 	https://git.kernel.org/stable/c/52ccdde16b6540abe43b6f8d8e1e1ec90b0983af