| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2025-21880: drm/xe/userptr: fix EFAULT handling |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| drm/xe/userptr: fix EFAULT handling |
| |
| Currently we treat EFAULT from hmm_range_fault() as a non-fatal error |
| when called from xe_vm_userptr_pin() with the idea that we want to avoid |
| killing the entire vm and chucking an error, under the assumption that |
| the user just did an unmap or something, and has no intention of |
| actually touching that memory from the GPU. At this point we have |
| already zapped the PTEs so any access should generate a page fault, and |
| if the pin fails there also it will then become fatal. |
| |
| However, it looks like the userptr vma can still be on the rebind list |
| in preempt_rebind_work_func(). This can happen if we had to retry the |
| pin because of something happening in the caller before we did the |
| rebind step, and in the meantime the userptr needed to be re-validated |
| and this time hit the EFAULT. |
| |
| This explains an internal user report of hitting: |
| |
| [ 191.738349] WARNING: CPU: 1 PID: 157 at drivers/gpu/drm/xe/xe_res_cursor.h:158 xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe] |
| [ 191.738551] Workqueue: xe-ordered-wq preempt_rebind_work_func [xe] |
| [ 191.738616] RIP: 0010:xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe] |
| [ 191.738690] Call Trace: |
| [ 191.738692] <TASK> |
| [ 191.738694] ? show_regs+0x69/0x80 |
| [ 191.738698] ? __warn+0x93/0x1a0 |
| [ 191.738703] ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe] |
| [ 191.738759] ? report_bug+0x18f/0x1a0 |
| [ 191.738764] ? handle_bug+0x63/0xa0 |
| [ 191.738767] ? exc_invalid_op+0x19/0x70 |
| [ 191.738770] ? asm_exc_invalid_op+0x1b/0x20 |
| [ 191.738777] ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe] |
| [ 191.738834] ? ret_from_fork_asm+0x1a/0x30 |
| [ 191.738849] bind_op_prepare+0x105/0x7b0 [xe] |
| [ 191.738906] ? dma_resv_reserve_fences+0x301/0x380 |
| [ 191.738912] xe_pt_update_ops_prepare+0x28c/0x4b0 [xe] |
| [ 191.738966] ? kmemleak_alloc+0x4b/0x80 |
| [ 191.738973] ops_execute+0x188/0x9d0 [xe] |
| [ 191.739036] xe_vm_rebind+0x4ce/0x5a0 [xe] |
| [ 191.739098] ? trace_hardirqs_on+0x4d/0x60 |
| [ 191.739112] preempt_rebind_work_func+0x76f/0xd00 [xe] |
| |
| Followed by a NULL pointer dereference (NPD), when running some |
| workload, since the sg was never actually populated but the vma is |
| still marked for rebind when it should be skipped for this special |
| EFAULT case. This change is confirmed to fix the user report. |
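
The failure mode and the fix described above can be sketched as a small userspace model. This is not the kernel patch: the struct, its boolean fields and the `toy_*` functions are hypothetical stand-ins for the userptr vma, its repin/rebind list membership and the pin and rebind steps. The only behaviour taken from the description is that EFAULT stays non-fatal and that a vma hitting it must also leave the rebind list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a userptr vma; every name here is hypothetical. */
struct toy_uvma {
    bool pages_valid;    /* false models hmm_range_fault() returning -EFAULT */
    bool sg_populated;   /* true once the sg table has been filled */
    bool on_repin_list;  /* pages must be pinned again */
    bool on_rebind_list; /* queued for a GPU page-table rebind */
};

/* Models the pin step: fills the sg table or reports -EFAULT. */
static int toy_pin_pages(struct toy_uvma *uvma)
{
    if (!uvma->pages_valid) {
        uvma->sg_populated = false;
        return -EFAULT;
    }
    uvma->sg_populated = true;
    return 0;
}

/*
 * Models the repin loop. -EFAULT remains non-fatal, but the vma is also
 * dropped from the rebind list so the rebind step never sees it.
 */
static int toy_userptr_pin(struct toy_uvma *uvmas, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        struct toy_uvma *uvma = &uvmas[i];
        int err;

        if (!uvma->on_repin_list)
            continue;

        err = toy_pin_pages(uvma);
        if (err == -EFAULT) {
            uvma->on_repin_list = false;
            /* Possibly queued by an earlier, retried pass: remove it. */
            uvma->on_rebind_list = false;
            continue;
        }
        if (err)
            return err;

        uvma->on_repin_list = false;
        uvma->on_rebind_list = true;
    }
    return 0;
}

/* Models the rebind step: every queued vma must have a populated sg. */
static size_t toy_rebind(struct toy_uvma *uvmas, size_t count)
{
    size_t bound = 0;

    for (size_t i = 0; i < count; i++) {
        if (!uvmas[i].on_rebind_list)
            continue;
        assert(uvmas[i].sg_populated);
        uvmas[i].on_rebind_list = false;
        bound++;
    }
    return bound;
}
```

In this model, a vma that was queued for rebind on an earlier pass and then fails the pin with -EFAULT is removed from the queue, so `toy_rebind()` only ever touches entries whose sg is populated. Without the removal in the -EFAULT branch, the assertion in `toy_rebind()` would trip, which loosely mirrors the warning and NPD reported above.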
| |
| v2 (MattB): |
| - Move earlier. |
| v3 (MattB): |
| - Update the commit message to make it clear that this indeed fixes the |
| issue. |
| |
| (cherry picked from commit 6b93cb98910c826c2e2004942f8b060311e43618) |
| |
| The Linux kernel CVE team has assigned CVE-2025-21880 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 6.10 with commit 521db22a1d70dbc596a07544a738416025b1b63c and fixed in 6.12.18 with commit daad16d0a538fa938e344fd83927bbcfcd8a66ec |
| Issue introduced in 6.10 with commit 521db22a1d70dbc596a07544a738416025b1b63c and fixed in 6.13.6 with commit 51cc278f8ffacd5f9dc7d13191b81b912829db59 |
| Issue introduced in 6.10 with commit 521db22a1d70dbc596a07544a738416025b1b63c and fixed in 6.14 with commit a9f4fa3a7efa65615ff7db13023ac84516e99e21 |
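
As a rough illustration of how the ranges above could be checked for an upstream release number, here is a minimal sketch. The function name is hypothetical, and it assumes the 6.10.y and 6.11.y series stay affected because no fix is listed for them here. It knows nothing about distribution backports, so the CVE record remains the authoritative source.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true if upstream release major.minor.patch falls inside the
 * affected ranges listed in this advisory. Hypothetical helper.
 */
static bool cve_2025_21880_affected(int major, int minor, int patch)
{
    assert(major >= 0 && minor >= 0 && patch >= 0);

    if (major != 6)
        return false;           /* before 6.x: not introduced; after: fixed */
    if (minor < 10 || minor >= 14)
        return false;           /* introduced in 6.10, fixed in 6.14 */
    if (minor == 12)
        return patch < 18;      /* fixed in 6.12.18 */
    if (minor == 13)
        return patch < 6;       /* fixed in 6.13.6 */
    return true;                /* 6.10.y, 6.11.y: no fix listed (assumption) */
}
```

For example, `cve_2025_21880_affected(6, 12, 17)` is true while `cve_2025_21880_affected(6, 12, 18)` is false.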
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2025-21880 |
| will be updated if fixes are backported. Please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| drivers/gpu/drm/xe/xe_vm.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/daad16d0a538fa938e344fd83927bbcfcd8a66ec |
| https://git.kernel.org/stable/c/51cc278f8ffacd5f9dc7d13191b81b912829db59 |
| https://git.kernel.org/stable/c/a9f4fa3a7efa65615ff7db13023ac84516e99e21 |