
 From bippy-1.2.0 Mon Sep 17 00:00:00 2001
 From: Greg Kroah-Hartman <gregkh@kernel.org>
 To: <linux-cve-announce@vger.kernel.org>
 Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
 Subject: CVE-2025-39989: x86/mce: use is_copy_from_user() to determine copy-from-user context

 Description
 ===========

 In the Linux kernel, the following vulnerability has been resolved:

 x86/mce: use is_copy_from_user() to determine copy-from-user context

 Patch series "mm/hwpoison: Fix regressions in memory failure handling",
 v4.

 ## 1. What am I trying to do:

 This patchset resolves two critical regressions related to memory failure
 handling that have appeared in the upstream kernel since version 5.17, as
 compared to 5.10 LTS.

     - copyin case: poison found in a user page while the kernel is copying from user space
     - instr case: poison found during an instruction fetch in user space

 ## 2. What is the expected outcome and why

 - For copyin case:

 The kernel can recover from poison found while it is doing get_user() or
 copy_from_user() if those call sites get an error return and the kernel
 returns -EFAULT to the process instead of crashing.  More specifically, the
 MCE handler checks the fixup handler type to decide whether an in-kernel
 #MC can be recovered.  When EX_TYPE_UACCESS is found, the PC jumps to the
 recovery code specified in _ASM_EXTABLE_FAULT() and returns -EFAULT to
 user space.

 - For instr case:

 If poison is found during an instruction fetch in user space, full recovery
 is possible.  The user process takes a #PF, and Linux allocates a new page
 and fills it by reading from storage.


 ## 3. What actually happens and why

 - For copyin case: kernel panic since v5.17

 Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new
 extable fixup type, EX_TYPE_EFAULT_REG, and later patches updated the
 extable fixup type for copy-from-user operations, changing it from
 EX_TYPE_UACCESS to EX_TYPE_EFAULT_REG.  This breaks the previous
 EX_TYPE_UACCESS handling when poison is found in get_user() or
 copy_from_user().

 - For instr case: user process is killed by a SIGBUS signal due to #CMCI
   and #MCE race

 When an uncorrected memory error is consumed there is a race between the
 CMCI from the memory controller reporting an uncorrected error with a UCNA
 signature, and the core reporting an SRAR signature machine check when
 the data is about to be consumed.

 ### Background: why *UN*corrected errors are tied to *C*MCI on Intel platforms [1]

 Prior to Icelake, memory controllers reported patrol scrub events that
 detected a previously unseen uncorrected error in memory by signaling a
 broadcast machine check with an SRAO (Software Recoverable Action
 Optional) signature in the machine check bank.  This was overkill: the
 problem is not urgent, because no core is on the verge of consuming the
 bad data.  It was also found that multiple SRAO UCEs may cause nested MCE
 interrupts and eventually escalate to an IERR.

 Hence, Intel downgraded the machine check bank signature of patrol scrub
 from SRAO to UCNA (Uncorrected, No Action required) and changed the signal
 to #CMCI.  Just to add to the confusion, Linux does take an action (in
 uc_decode_notifier()) to try to offline the page despite the UC*NA*
 signature name.

 ### Background: why #CMCI and #MCE race when poison is consumed on
     Intel platforms [1]

 Having decided that CMCI/UCNA is the best action for patrol scrub errors,
 the memory controller uses it for reads too.  But the memory controller is
 executing asynchronously from the core, and can't tell the difference
 between a "real" read and a speculative read.  So it will do CMCI/UCNA if
 an error is found in any read.

 Thus:

 1) Core is clever and thinks address A is needed soon, issues a
    speculative read.

 2) Core finds it is going to use address A soon after sending the read
    request

 3) The CMCI from the memory controller is in a race with MCE from the
    core that will soon try to retire the load from address A.

 Quite often (because speculation has got better) the CMCI from the memory
 controller is delivered before the core is committed to the instruction
 reading address A, so the interrupt is taken, and Linux offlines the page
 (marking it as poison).


 ## Why user process is killed for instr case

 Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported
 "not recovered"") tried to suppress the noisy message "Memory error not
 recovered" and to skip duplicate SIGBUSs caused by the race.  But it also
 introduced a bug: kill_accessing_process() returns -EHWPOISON for the
 instr case and, as a result, kill_me_maybe() sends a SIGBUS to the user
 process.

 ## 4. The fix, in my opinion, should be:

 - For copyin case:

 The key point is whether the error context is in a read from user memory.
 We do not care about the ex-type if we know it's a MOV reading from
 userspace.

 is_copy_from_user() returns true when both of the following two checks are
 true:

     - the current instruction is copy
     - source address is user memory

 If copy_user is true, we set

 m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;

 Then do_machine_check() will try fixup_exception() first.

 - For instr case: let kill_accessing_process() return 0 to prevent a SIGBUS.

 - For patch 3:

 The return value of memory_failure() turned out to be quite important
 while discussing the instr case regression with Tony and Miaohe for
 patch 2, so add a comment about the return value.


 This patch (of 3):

 Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new
 extable fixup type, EX_TYPE_EFAULT_REG, and commit 4c132d1d844a
 ("x86/futex: Remove .fixup usage") updated the extable fixup type for
 copy-from-user operations, changing it from EX_TYPE_UACCESS to
 EX_TYPE_EFAULT_REG.  Consequently, the error context for copy-from-user
 operations no longer functions as an in-kernel recovery context, resulting
 in kernel panics with the message: "Machine check: Data load in
 unrecoverable area of kernel."

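The regression can be pictured with a small user-space model.  This is a
sketch only, not the code in arch/x86/kernel/cpu/mce/severity.c: the enum
values, the context names and the helper old_error_context() are
assumptions made for illustration.  It models a classifier that recognises
only EX_TYPE_UACCESS as a recoverable copy-in, so an access tagged
EX_TYPE_EFAULT_REG falls through to the unrecoverable case.

```c
#include <stdbool.h>

/* Illustrative subset of extable fixup types; the values are arbitrary here. */
enum ex_type { EX_TYPE_NONE, EX_TYPE_UACCESS, EX_TYPE_EFAULT_REG };

/* Hypothetical error contexts used only by this sketch. */
enum ctx { CTX_IN_KERNEL, CTX_IN_KERNEL_RECOV };

/*
 * Model of a classifier keyed on the fixup type: a read of user memory
 * is treated as recoverable only when it is tagged EX_TYPE_UACCESS.
 */
static enum ctx old_error_context(enum ex_type type, bool reads_user_memory)
{
	if (type == EX_TYPE_UACCESS && reads_user_memory)
		return CTX_IN_KERNEL_RECOV;

	/* Everything else, including EX_TYPE_EFAULT_REG, is unrecoverable. */
	return CTX_IN_KERNEL;
}
```

In this model, old_error_context(EX_TYPE_EFAULT_REG, true) yields the
unrecoverable context even though the access reads user memory, which
corresponds to the panic described above.
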
 To address this, it is crucial to identify whether an error context
 involves a read operation from user memory.  The function
 is_copy_from_user() can be used to determine whether:

     - the current operation is a copy
     - the read is from user memory

 When these conditions are met, is_copy_from_user() will return true,
 confirming that it is indeed a direct copy from user memory.  This check
 is essential for correctly handling the context of errors in these
 operations without relying on the extable fixup types that previously
 allowed for in-kernel recovery.

 So, use is_copy_from_user() to determine directly whether a context is a
 copy from user memory.
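
A minimal user-space sketch of that decision, under simplifying
assumptions.  The real is_copy_from_user() inspects the faulting
instruction; here its two checks are passed in as booleans.  The struct
insn_model, the helpers is_copy_from_user_model() and classify_copyin(),
and the flag bit positions are all assumptions for illustration, not the
kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag names follow the text; the bit positions are assumed for this sketch. */
#define MCE_IN_KERNEL_RECOV  (1ULL << 6)
#define MCE_IN_KERNEL_COPYIN (1ULL << 7)

/* Hypothetical stand-in for the decoded faulting instruction. */
struct insn_model {
	bool is_copy;      /* the current operation is a copy (e.g. a MOV) */
	bool src_is_user;  /* its source address lies in user memory */
};

/* Both checks must hold, mirroring the two conditions listed above. */
static bool is_copy_from_user_model(const struct insn_model *insn)
{
	return insn->is_copy && insn->src_is_user;
}

/* Set the recovery flags only for a direct copy from user memory. */
static uint64_t classify_copyin(const struct insn_model *insn, uint64_t kflags)
{
	if (is_copy_from_user_model(insn))
		kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
	return kflags;
}
```

The extable fixup type does not appear anywhere in this decision, so a
change of fixup type for copy-from-user operations would not alter the
result.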

 The Linux kernel CVE team has assigned CVE-2025-39989 to this issue.


 Affected and fixed versions
 ===========================

 	Issue introduced in 5.17 with commit 4c132d1d844a53fc4e4b5c34e36ef10d6124b783 and fixed in 6.6.89 with commit 5724654a084f701dc64b08d34a0e800f22f0e6e4
 	Issue introduced in 5.17 with commit 4c132d1d844a53fc4e4b5c34e36ef10d6124b783 and fixed in 6.12.23 with commit 3e3d8169c0950a0b3cd5105f6403a78350dcac80
 	Issue introduced in 5.17 with commit 4c132d1d844a53fc4e4b5c34e36ef10d6124b783 and fixed in 6.13.11 with commit 449413da90a337f343cc5a73070cbd68e92e8a54
 	Issue introduced in 5.17 with commit 4c132d1d844a53fc4e4b5c34e36ef10d6124b783 and fixed in 6.14.2 with commit 0b8388e97ba6a8c033f9a8b5565af41af07f9345
 	Issue introduced in 5.17 with commit 4c132d1d844a53fc4e4b5c34e36ef10d6124b783 and fixed in 6.15 with commit 1a15bb8303b6b104e78028b6c68f76a0d4562134
 	Issue introduced in 5.15.58 with commit 88eded8104d2ca0429703755dd250f8cbecc1447
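
The list above can be turned into a quick check.  This is a sketch that
encodes only the entries as listed here; the helper name
listed_as_affected() is hypothetical, and the result can go stale as fixes
are backported.  It assumes that a branch inside the 5.17 to 6.15 range
with no fix listed is still affected, and that 5.15.y is affected from
5.15.58 onward because no fix is listed for it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stable branches with a fix listed above: first fixed patch level. */
struct fixed_branch { int major, minor, patch; };

static const struct fixed_branch fixed[] = {
	{ 6, 6, 89 }, { 6, 12, 23 }, { 6, 13, 11 }, { 6, 14, 2 },
};

/* Hypothetical helper: true if major.minor.patch is listed as affected. */
static bool listed_as_affected(int major, int minor, int patch)
{
	/* 5.15.y: introduced in 5.15.58, no fix listed. */
	if (major == 5 && minor == 15)
		return patch >= 58;

	/* Other versions before 5.17 do not contain the issue. */
	if (major < 5 || (major == 5 && minor < 17))
		return false;

	/* 6.15 and later carry the fix. */
	if (major > 6 || (major == 6 && minor >= 15))
		return false;

	for (size_t i = 0; i < sizeof(fixed) / sizeof(fixed[0]); i++) {
		if (fixed[i].major == major && fixed[i].minor == minor)
			return patch < fixed[i].patch;
	}

	/* Inside the affected range with no fix listed for this branch. */
	return true;
}
```

For example, listed_as_affected(6, 6, 88) is true while
listed_as_affected(6, 6, 89) is false.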

 Please see https://www.kernel.org for a full list of currently supported
 kernel versions by the kernel community.

 Unaffected versions might change over time as fixes are backported to
 older supported kernel versions.  The official CVE entry at
 	https://cve.org/CVERecord/?id=CVE-2025-39989
 will be updated if fixes are backported, please check that for the most
 up to date information about this issue.


 Affected files
 ==============

 The file(s) affected by this issue are:
 	arch/x86/kernel/cpu/mce/severity.c


 Mitigation
 ==========

 The Linux kernel CVE team recommends that you update to the latest
 stable kernel version for this, and many other bugfixes.  Individual
 changes are never tested alone, but rather are part of a larger kernel
 release.  Cherry-picking individual commits is not recommended or
 supported by the Linux kernel community at all.  If however, updating to
 the latest release is impossible, the individual changes to resolve this
 issue can be found at these commits:
 	https://git.kernel.org/stable/c/5724654a084f701dc64b08d34a0e800f22f0e6e4
 	https://git.kernel.org/stable/c/3e3d8169c0950a0b3cd5105f6403a78350dcac80
 	https://git.kernel.org/stable/c/449413da90a337f343cc5a73070cbd68e92e8a54
 	https://git.kernel.org/stable/c/0b8388e97ba6a8c033f9a8b5565af41af07f9345
 	https://git.kernel.org/stable/c/1a15bb8303b6b104e78028b6c68f76a0d4562134