cve/published/2025/CVE-2025-21880.mbox - pub/scm/linux/security/vulns - Git at Google

 From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001
 From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 To: <linux-cve-announce@vger.kernel.org>
 Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
 Subject: CVE-2025-21880: drm/xe/userptr: fix EFAULT handling

 Description
 ===========

 In the Linux kernel, the following vulnerability has been resolved:

 drm/xe/userptr: fix EFAULT handling

 Currently we treat EFAULT from hmm_range_fault() as a non-fatal error
 when called from xe_vm_userptr_pin() with the idea that we want to avoid
 killing the entire vm and chucking an error, under the assumption that
 the user just did an unmap or something, and has no intention of
 actually touching that memory from the GPU.  At this point we have
 already zapped the PTEs so any access should generate a page fault, and
 if the pin fails there also it will then become fatal.

 However, it looks like it's possible for the userptr vma to still be on
 the rebind list in preempt_rebind_work_func() if we had to retry the
 pin due to something happening in the caller before we did the rebind
 step, and in the meantime needed to re-validate the userptr, this time
 hitting the EFAULT.

 This explains an internal user report of hitting:

 [  191.738349] WARNING: CPU: 1 PID: 157 at drivers/gpu/drm/xe/xe_res_cursor.h:158 xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
 [  191.738551] Workqueue: xe-ordered-wq preempt_rebind_work_func [xe]
 [  191.738616] RIP: 0010:xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
 [  191.738690] Call Trace:
 [  191.738692]  <TASK>
 [  191.738694]  ? show_regs+0x69/0x80
 [  191.738698]  ? __warn+0x93/0x1a0
 [  191.738703]  ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
 [  191.738759]  ? report_bug+0x18f/0x1a0
 [  191.738764]  ? handle_bug+0x63/0xa0
 [  191.738767]  ? exc_invalid_op+0x19/0x70
 [  191.738770]  ? asm_exc_invalid_op+0x1b/0x20
 [  191.738777]  ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
 [  191.738834]  ? ret_from_fork_asm+0x1a/0x30
 [  191.738849]  bind_op_prepare+0x105/0x7b0 [xe]
 [  191.738906]  ? dma_resv_reserve_fences+0x301/0x380
 [  191.738912]  xe_pt_update_ops_prepare+0x28c/0x4b0 [xe]
 [  191.738966]  ? kmemleak_alloc+0x4b/0x80
 [  191.738973]  ops_execute+0x188/0x9d0 [xe]
 [  191.739036]  xe_vm_rebind+0x4ce/0x5a0 [xe]
 [  191.739098]  ? trace_hardirqs_on+0x4d/0x60
 [  191.739112]  preempt_rebind_work_func+0x76f/0xd00 [xe]

 Followed by a NULL pointer dereference (NPD) when running some workload,
 since the sg was never actually populated but the vma is still marked
 for rebind when it should be skipped for this special EFAULT case. This
 change is confirmed to fix the user report.
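
 The failure mode described above can be modelled in a few lines of
 userspace C. This is only a toy sketch, not the actual patch: struct
 toy_vma, toy_pin(), toy_userptr_pin(), toy_bad_rebinds() and the
 drop_on_efault flag are all hypothetical names invented for
 illustration, and the real logic lives in drivers/gpu/drm/xe/xe_vm.c.
 It assumes that the fix amounts to taking the vma off the rebind list
 when the pin hits EFAULT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a userptr vma; not the real struct xe_vma. */
struct toy_vma {
	bool pages_gone;     /* user unmapped the range: pin would see EFAULT */
	bool sg_populated;   /* models whether the sg table was filled in */
	bool on_rebind_list; /* models membership of the vm's rebind list */
};

/* Models the pin step: fails with -EFAULT when the backing range is gone. */
static int toy_pin(struct toy_vma *vma)
{
	if (vma->pages_gone) {
		vma->sg_populated = false;
		return -EFAULT;
	}
	vma->sg_populated = true;
	return 0;
}

/*
 * Models the pin loop. EFAULT stays non-fatal in both modes. With
 * drop_on_efault set (the assumed fixed behaviour), the vma is also
 * taken off the rebind list so a later rebind step skips it.
 */
static int toy_userptr_pin(struct toy_vma *vmas, size_t n, bool drop_on_efault)
{
	for (size_t i = 0; i < n; i++) {
		int err = toy_pin(&vmas[i]);

		if (err == -EFAULT) {
			if (drop_on_efault)
				vmas[i].on_rebind_list = false;
			continue; /* non-fatal: keep the vm alive */
		}
		if (err)
			return err;
		vmas[i].on_rebind_list = true;
	}
	return 0;
}

/* Counts vmas that would be rebound without a populated sg table. */
static size_t toy_bad_rebinds(const struct toy_vma *vmas, size_t n)
{
	size_t bad = 0;

	for (size_t i = 0; i < n; i++)
		if (vmas[i].on_rebind_list && !vmas[i].sg_populated)
			bad++;
	return bad;
}
```

 In this model, a vma that was queued for rebind by an earlier pass and
 then hits EFAULT on the retry is counted by toy_bad_rebinds() unless
 drop_on_efault is set, which mirrors the stale rebind entry described
 in the commit message.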

 v2 (MattB):
  - Move earlier.
 v3 (MattB):
  - Update the commit message to make it clear that this indeed fixes the
    issue.

 (cherry picked from commit 6b93cb98910c826c2e2004942f8b060311e43618)

 The Linux kernel CVE team has assigned CVE-2025-21880 to this issue.


 Affected and fixed versions
 ===========================

 	Issue introduced in 6.10 with commit 521db22a1d70dbc596a07544a738416025b1b63c and fixed in 6.12.18 with commit daad16d0a538fa938e344fd83927bbcfcd8a66ec
 	Issue introduced in 6.10 with commit 521db22a1d70dbc596a07544a738416025b1b63c and fixed in 6.13.6 with commit 51cc278f8ffacd5f9dc7d13191b81b912829db59
 	Issue introduced in 6.10 with commit 521db22a1d70dbc596a07544a738416025b1b63c and fixed in 6.14 with commit a9f4fa3a7efa65615ff7db13023ac84516e99e21
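
 The ranges above can be turned into a quick version check. The helper
 below is a hypothetical sketch, not an official tool: the function name
 is invented, it only understands plain mainline/stable version numbers,
 and it treats 6.10.y and 6.11.y as affected on the assumption that no
 fixed release is listed for those branches. Vendor kernels that
 backport fixes, and systems that do not use the xe driver, need a
 separate assessment.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true if kernel major.minor.patch falls
 * inside an affected range according to the list above.
 */
static bool cve_2025_21880_affected(int major, int minor, int patch)
{
	if (major != 6)
		return false;        /* older than 6.10, or newer than 6.14 */
	if (minor < 10)
		return false;        /* predates the introducing commit */
	if (minor >= 14)
		return false;        /* fixed in 6.14 */
	if (minor == 12)
		return patch < 18;   /* fixed in 6.12.18 */
	if (minor == 13)
		return patch < 6;    /* fixed in 6.13.6 */
	return true;                 /* 6.10.y, 6.11.y: no fix listed (assumption) */
}
```

 For example, cve_2025_21880_affected(6, 12, 17) reports an affected
 kernel, while the same call for 6.12.18 does not.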

 Please see https://www.kernel.org for a full list of kernel versions
 currently supported by the kernel community.

 Unaffected versions might change over time as fixes are backported to
 older supported kernel versions.  The official CVE entry at
 	https://cve.org/CVERecord/?id=CVE-2025-21880
 will be updated if fixes are backported; please check it for the most
 up-to-date information about this issue.


 Affected files
 ==============

 The file(s) affected by this issue are:
 	drivers/gpu/drm/xe/xe_vm.c


 Mitigation
 ==========

 The Linux kernel CVE team recommends that you update to the latest
 stable kernel version for this, and many other bugfixes.  Individual
 changes are never tested alone, but rather are part of a larger kernel
 release.  Cherry-picking individual commits is not recommended or
 supported by the Linux kernel community at all.  If, however, updating to
 the latest release is impossible, the individual changes to resolve this
 issue can be found at these commits:
 	https://git.kernel.org/stable/c/daad16d0a538fa938e344fd83927bbcfcd8a66ec
 	https://git.kernel.org/stable/c/51cc278f8ffacd5f9dc7d13191b81b912829db59
 	https://git.kernel.org/stable/c/a9f4fa3a7efa65615ff7db13023ac84516e99e21