| From bippy-1.2.0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@kernel.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2025-39989: x86/mce: use is_copy_from_user() to determine copy-from-user context |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| x86/mce: use is_copy_from_user() to determine copy-from-user context |
| |
| Patch series "mm/hwpoison: Fix regressions in memory failure handling", |
| v4. |
| |
| ## 1. What am I trying to do: |
| |
| This patchset resolves two critical regressions related to memory failure |
| handling that have appeared in the upstream kernel since version 5.17, as |
| compared to 5.10 LTS. |
| |
| - copyin case: poison found in user page while kernel copying from user space |
| - instr case: poison found while instruction fetching in user space |
| |
| ## 2. What is the expected outcome and why |
| |
| - For copyin case: |
| |
| Kernel can recover from poison found where kernel is doing get_user() or |
| copy_from_user() if those places get an error return and the kernel returns |
| -EFAULT to the process instead of crashing. More specifically, the MCE handler |
| checks the fixup handler type to decide whether an in-kernel #MC can be |
| recovered. When EX_TYPE_UACCESS is found, the PC jumps to the recovery code |
| specified in _ASM_EXTABLE_FAULT() and returns -EFAULT to user space. |
| |
| - For instr case: |
| |
| If poison is found while fetching instructions in user space, full recovery |
| is possible. The user process takes a #PF, and Linux allocates a new page and |
| fills it by reading from storage. |
| |
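As an illustration of the copyin recovery path described above, here is a minimal user-space model in C. It is not kernel code: the names extable_entry, fixup_kind, cpu_regs, find_fixup() and try_recover() are hypothetical stand-ins, and the real exception table and its handlers are considerably more involved. The sketch only shows the idea that a faulting instruction with a uaccess-type fixup is redirected to recovery code, which is then responsible for producing -EFAULT.

```c
#include <stddef.h>

/* Hypothetical fixup kinds; the kernel defines many more EX_TYPE_* values. */
enum fixup_kind { FIXUP_OTHER, FIXUP_UACCESS };

/* Hypothetical exception table entry: faulting address -> recovery address. */
struct extable_entry {
	unsigned long insn;   /* address of the instruction that may fault */
	unsigned long fixup;  /* address of the recovery code */
	enum fixup_kind kind; /* how the fault should be handled */
};

/* Hypothetical register state; only the program counter is modeled. */
struct cpu_regs {
	unsigned long ip;
};

/* Linear search for the entry covering the faulting instruction. */
static const struct extable_entry *
find_fixup(const struct extable_entry *table, size_t n, unsigned long ip)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].insn == ip)
			return &table[i];
	return NULL;
}

/*
 * Returns 0 and redirects regs->ip to the recovery code when the faulting
 * instruction has a uaccess-type fixup; returns -1 (not recoverable in this
 * model) and leaves regs untouched otherwise.
 */
static int try_recover(const struct extable_entry *table, size_t n,
		       struct cpu_regs *regs)
{
	const struct extable_entry *e = find_fixup(table, n, regs->ip);

	if (!e || e->kind != FIXUP_UACCESS)
		return -1;

	regs->ip = e->fixup;
	return 0;
}
```

In this model, the caller of try_recover() decides what "not recoverable" means; in the situation described here, that outcome corresponds to the kernel panicking rather than returning -EFAULT.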
| |
| ## 3. What actually happens and why |
| |
| - For copyin case: kernel panic since v5.17 |
| |
| Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new |
| extable fixup type, EX_TYPE_EFAULT_REG, and later patches updated the |
| extable fixup type for copy-from-user operations, changing it from |
| EX_TYPE_UACCESS to EX_TYPE_EFAULT_REG. This breaks the previous EX_TYPE_UACCESS |
| handling when poison is found in get_user() or copy_from_user(). |
| |
| - For instr case: user process is killed by a SIGBUS signal due to a #CMCI |
| and #MCE race |
| |
| When an uncorrected memory error is consumed there is a race between the |
| CMCI from the memory controller reporting an uncorrected error with a UCNA |
| signature, and the core reporting an SRAR signature machine check when |
| the data is about to be consumed. |
| |
| ### Background: why *UN*corrected errors are tied to *C*MCI on Intel platforms [1] |
| |
| Prior to Icelake, memory controllers reported patrol scrub events that |
| detected a previously unseen uncorrected error in memory by signaling a |
| broadcast machine check with an SRAO (Software Recoverable Action |
| Optional) signature in the machine check bank. This was overkill because |
| the error is not urgent: no core is on the verge of consuming that bad |
| data. It was also found that multiple SRAO UCEs may cause nested MCE |
| interrupts and eventually become an IERR. |
| |
| Hence, Intel downgraded the machine check bank signature of patrol scrub |
| from SRAO to UCNA (Uncorrected, No Action required), and changed the signal to |
| #CMCI. Just to add to the confusion, Linux does take an action (in |
| uc_decode_notifier()) to try to offline the page despite the UC*NA* |
| signature name. |
| |
| ### Background: why #CMCI and #MCE race when poison is being consumed on Intel platforms [1] |
| |
| Having decided that CMCI/UCNA is the best action for patrol scrub errors, |
| the memory controller uses it for reads too. But the memory controller is |
| executing asynchronously from the core, and can't tell the difference |
| between a "real" read and a speculative read. So it will do CMCI/UCNA if |
| an error is found in any read. |
| |
| Thus: |
| |
| 1) Core is clever and thinks address A is needed soon, issues a |
| speculative read. |
| |
| 2) Core finds it is going to use address A soon after sending the read |
| request |
| |
| 3) The CMCI from the memory controller is in a race with MCE from the |
| core that will soon try to retire the load from address A. |
| |
| Quite often (because speculation has got better) the CMCI from the memory |
| controller is delivered before the core is committed to the instruction |
| reading address A, so the interrupt is taken, and Linux offlines the page |
| (marking it as poison). |
| |
| |
| ## Why user process is killed for instr case |
| |
| Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported |
| "not recovered"") tries to fix the noisy message "Memory error not recovered" |
| and skips duplicate SIGBUSs due to the race. But it also introduced a bug: |
| kill_accessing_process() returns -EHWPOISON for the instr case and, as a |
| result, kill_me_maybe() sends a SIGBUS to the user process. |
| |
| ## 4. The fix, in my opinion, should be: |
| |
| - For copyin case: |
| |
| The key point is whether the error context is in a read from user memory. |
| We do not care about the ex-type if we know it's a MOV reading from |
| userspace. |
| |
| is_copy_from_user() returns true when both of the following checks are |
| true: |
| |
| - the current instruction is a copy |
| - the source address is user memory |
| |
| If copy_user is true, we set |
| |
| m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV; |
| |
| Then do_machine_check() will try fixup_exception() first. |
| |
| - For instr case: let kill_accessing_process() return 0 to prevent a SIGBUS. |
| |
| - For patch 3: |
| |
| The return value of memory_failure() turned out to be quite important while |
| discussing the instr case regression with Tony and Miaohe for patch 2, so add |
| a comment about the return value. |
| |
| |
| This patch (of 3): |
| |
| Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new |
| extable fixup type, EX_TYPE_EFAULT_REG, and later patches updated the |
| extable fixup type for copy-from-user operations, changing it from |
| EX_TYPE_UACCESS to EX_TYPE_EFAULT_REG. Consequently, the error context |
| for copy-from-user operations no longer functions as an in-kernel |
| recovery context, resulting in kernel panics with the message: |
| "Machine check: Data load in unrecoverable area of kernel." |
| |
| To address this, it is crucial to identify whether an error context involves |
| a read operation from user memory. The function is_copy_from_user() can be |
| used to determine whether: |
| |
| - the current operation is a copy |
| - the source address is user memory |
| |
| When these conditions are met, is_copy_from_user() will return true, |
| confirming that it is indeed a direct copy from user memory. This check |
| is essential for correctly handling the context of errors in these |
| operations without relying on the extable fixup types that previously |
| allowed for in-kernel recovery. |
| |
| So, use is_copy_from_user() to determine whether a context is a direct copy |
| from user memory. |
| |
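The change in decision logic described above can be sketched as a small user-space model in C. This is an assumption-laden illustration, not the code in arch/x86/kernel/cpu/mce/severity.c: the struct mce_model type, the flag bit values, the EX_TYPE_OTHER placeholder, and the function names classify_by_fixup_type(), classify_by_copy_user() and is_copy_from_user_model() are all hypothetical. Only the names EX_TYPE_UACCESS, EX_TYPE_EFAULT_REG, MCE_IN_KERNEL_COPYIN and MCE_IN_KERNEL_RECOV come from the text.

```c
#include <stdbool.h>

/* Stand-ins for the extable fixup types named in the text. */
enum fixup_type { EX_TYPE_OTHER, EX_TYPE_UACCESS, EX_TYPE_EFAULT_REG };

/* Outcome of classifying an in-kernel machine check (hypothetical names). */
enum context { IN_KERNEL, IN_KERNEL_RECOV };

/* Illustrative bit values only; the kernel's actual values may differ. */
#define MCE_IN_KERNEL_RECOV  (1u << 0)
#define MCE_IN_KERNEL_COPYIN (1u << 1)

struct mce_model {
	unsigned int kflags;
};

/* Both conditions from the text must hold. */
static bool is_copy_from_user_model(bool insn_is_copy, bool src_is_user)
{
	return insn_is_copy && src_is_user;
}

/*
 * Model of the old behavior: recovery depends on the fixup type, so a
 * copy-from-user tagged EX_TYPE_EFAULT_REG is treated as unrecoverable.
 */
static enum context classify_by_fixup_type(struct mce_model *m,
					   enum fixup_type type, bool copy_user)
{
	if (type == EX_TYPE_UACCESS && copy_user) {
		m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
		return IN_KERNEL_RECOV;
	}
	return IN_KERNEL;
}

/*
 * Model of the fixed behavior: a copy from user memory is recoverable
 * regardless of which fixup type the instruction carries.
 */
static enum context classify_by_copy_user(struct mce_model *m,
					  enum fixup_type type, bool copy_user)
{
	(void)type; /* the fixup type no longer decides the copyin case */

	if (copy_user) {
		m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
		return IN_KERNEL_RECOV;
	}
	return IN_KERNEL;
}
```

With an EX_TYPE_EFAULT_REG fixup and copy_user set, the first function yields IN_KERNEL (the panic case in this model), while the second yields IN_KERNEL_RECOV with both flags set. The model deliberately omits the other recoverable fixup types that the real code also handles.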
| The Linux kernel CVE team has assigned CVE-2025-39989 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 5.17 with commit 4c132d1d844a53fc4e4b5c34e36ef10d6124b783 and fixed in 6.6.89 with commit 5724654a084f701dc64b08d34a0e800f22f0e6e4 |
| Issue introduced in 5.17 with commit 4c132d1d844a53fc4e4b5c34e36ef10d6124b783 and fixed in 6.12.23 with commit 3e3d8169c0950a0b3cd5105f6403a78350dcac80 |
| Issue introduced in 5.17 with commit 4c132d1d844a53fc4e4b5c34e36ef10d6124b783 and fixed in 6.13.11 with commit 449413da90a337f343cc5a73070cbd68e92e8a54 |
| Issue introduced in 5.17 with commit 4c132d1d844a53fc4e4b5c34e36ef10d6124b783 and fixed in 6.14.2 with commit 0b8388e97ba6a8c033f9a8b5565af41af07f9345 |
| Issue introduced in 5.17 with commit 4c132d1d844a53fc4e4b5c34e36ef10d6124b783 and fixed in 6.15 with commit 1a15bb8303b6b104e78028b6c68f76a0d4562134 |
| Issue introduced in 5.15.58 with commit 88eded8104d2ca0429703755dd250f8cbecc1447 |
| |
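For readers who want to compare a kernel version against the list above, the following C sketch encodes those version numbers. It is a rough aid under stated assumptions, not an authoritative check: struct kver and listed_as_affected() are hypothetical names, any stable series from 5.17 up to 6.14 without a listed fix is treated as still affected, and 5.15.y releases before 5.15.58 are treated as unaffected. Vendor kernels with their own backports are not covered.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical version triple, e.g. 6.6.89 -> {6, 6, 89}. */
struct kver {
	int major;
	int minor;
	int patch;
};

/* Returns true if the version falls in a range listed above as affected. */
static bool listed_as_affected(struct kver v)
{
	/* Stable series with a listed fix: first fixed patch level. */
	static const struct {
		int major;
		int minor;
		int fixed_patch;
	} fixes[] = {
		{ 6, 6, 89 },
		{ 6, 12, 23 },
		{ 6, 13, 11 },
		{ 6, 14, 2 },
	};

	/* 5.15.y: introduced in 5.15.58, no fix listed in this announcement. */
	if (v.major == 5 && v.minor == 15)
		return v.patch >= 58;

	/* Anything else before the 5.17 introduction is not listed. */
	if (v.major < 5 || (v.major == 5 && v.minor < 17))
		return false;

	/* Fixed in mainline 6.15 and later. */
	if (v.major > 6 || (v.major == 6 && v.minor >= 15))
		return false;

	for (size_t i = 0; i < sizeof(fixes) / sizeof(fixes[0]); i++)
		if (v.major == fixes[i].major && v.minor == fixes[i].minor)
			return v.patch < fixes[i].fixed_patch;

	/* Assumption: series between 5.17 and 6.14 with no listed fix. */
	return true;
}
```

For example, listed_as_affected((struct kver){ 6, 6, 88 }) evaluates to true and listed_as_affected((struct kver){ 6, 6, 89 }) to false. The official CVE record remains the source of truth if further backports land.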
| Please see https://www.kernel.org for a full list of currently supported |
| kernel versions by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2025-39989 |
| will be updated if fixes are backported, please check that for the most |
| up to date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| arch/x86/kernel/cpu/mce/severity.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/5724654a084f701dc64b08d34a0e800f22f0e6e4 |
| https://git.kernel.org/stable/c/3e3d8169c0950a0b3cd5105f6403a78350dcac80 |
| https://git.kernel.org/stable/c/449413da90a337f343cc5a73070cbd68e92e8a54 |
| https://git.kernel.org/stable/c/0b8388e97ba6a8c033f9a8b5565af41af07f9345 |
| https://git.kernel.org/stable/c/1a15bb8303b6b104e78028b6c68f76a0d4562134 |