cve/published/2024/CVE-2024-35795.mbox - pub/scm/linux/security/vulns - Git at Google

 From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001
 From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 To: <linux-cve-announce@vger.kernel.org>
 Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
 Subject: CVE-2024-35795: drm/amdgpu: fix deadlock while reading mqd from debugfs

 Description
 ===========

 In the Linux kernel, the following vulnerability has been resolved:

 drm/amdgpu: fix deadlock while reading mqd from debugfs

 An errant disk backup on my desktop got into debugfs and triggered the
 following deadlock scenario in the amdgpu debugfs files. The machine
 also hard-resets immediately after those lines are printed (although I
 wasn't able to reproduce that part when reading by hand):

 [ 1318.016074][ T1082] ======================================================
 [ 1318.016607][ T1082] WARNING: possible circular locking dependency detected
 [ 1318.017107][ T1082] 6.8.0-rc7-00015-ge0c8221b72c0 #17 Not tainted
 [ 1318.017598][ T1082] ------------------------------------------------------
 [ 1318.018096][ T1082] tar/1082 is trying to acquire lock:
 [ 1318.018585][ T1082] ffff98c44175d6a0 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x40/0x80
 [ 1318.019084][ T1082]
 [ 1318.019084][ T1082] but task is already holding lock:
 [ 1318.020052][ T1082] ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
 [ 1318.020607][ T1082]
 [ 1318.020607][ T1082] which lock already depends on the new lock.
 [ 1318.020607][ T1082]
 [ 1318.022081][ T1082]
 [ 1318.022081][ T1082] the existing dependency chain (in reverse order) is:
 [ 1318.023083][ T1082]
 [ 1318.023083][ T1082] -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}:
 [ 1318.024114][ T1082]        __ww_mutex_lock.constprop.0+0xe0/0x12f0
 [ 1318.024639][ T1082]        ww_mutex_lock+0x32/0x90
 [ 1318.025161][ T1082]        dma_resv_lockdep+0x18a/0x330
 [ 1318.025683][ T1082]        do_one_initcall+0x6a/0x350
 [ 1318.026210][ T1082]        kernel_init_freeable+0x1a3/0x310
 [ 1318.026728][ T1082]        kernel_init+0x15/0x1a0
 [ 1318.027242][ T1082]        ret_from_fork+0x2c/0x40
 [ 1318.027759][ T1082]        ret_from_fork_asm+0x11/0x20
 [ 1318.028281][ T1082]
 [ 1318.028281][ T1082] -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}:
 [ 1318.029297][ T1082]        dma_resv_lockdep+0x16c/0x330
 [ 1318.029790][ T1082]        do_one_initcall+0x6a/0x350
 [ 1318.030263][ T1082]        kernel_init_freeable+0x1a3/0x310
 [ 1318.030722][ T1082]        kernel_init+0x15/0x1a0
 [ 1318.031168][ T1082]        ret_from_fork+0x2c/0x40
 [ 1318.031598][ T1082]        ret_from_fork_asm+0x11/0x20
 [ 1318.032011][ T1082]
 [ 1318.032011][ T1082] -> #0 (&mm->mmap_lock){++++}-{3:3}:
 [ 1318.032778][ T1082]        __lock_acquire+0x14bf/0x2680
 [ 1318.033141][ T1082]        lock_acquire+0xcd/0x2c0
 [ 1318.033487][ T1082]        __might_fault+0x58/0x80
 [ 1318.033814][ T1082]        amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu]
 [ 1318.034181][ T1082]        full_proxy_read+0x55/0x80
 [ 1318.034487][ T1082]        vfs_read+0xa7/0x360
 [ 1318.034788][ T1082]        ksys_read+0x70/0xf0
 [ 1318.035085][ T1082]        do_syscall_64+0x94/0x180
 [ 1318.035375][ T1082]        entry_SYSCALL_64_after_hwframe+0x46/0x4e
 [ 1318.035664][ T1082]
 [ 1318.035664][ T1082] other info that might help us debug this:
 [ 1318.035664][ T1082]
 [ 1318.036487][ T1082] Chain exists of:
 [ 1318.036487][ T1082]   &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex
 [ 1318.036487][ T1082]
 [ 1318.037310][ T1082]  Possible unsafe locking scenario:
 [ 1318.037310][ T1082]
 [ 1318.037838][ T1082]        CPU0                    CPU1
 [ 1318.038101][ T1082]        ----                    ----
 [ 1318.038350][ T1082]   lock(reservation_ww_class_mutex);
 [ 1318.038590][ T1082]                                lock(reservation_ww_class_acquire);
 [ 1318.038839][ T1082]                                lock(reservation_ww_class_mutex);
 [ 1318.039083][ T1082]   rlock(&mm->mmap_lock);
 [ 1318.039328][ T1082]
 [ 1318.039328][ T1082]  *** DEADLOCK ***
 [ 1318.039328][ T1082]
 [ 1318.040029][ T1082] 1 lock held by tar/1082:
 [ 1318.040259][ T1082]  #0: ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
 [ 1318.040560][ T1082]
 [ 1318.040560][ T1082] stack backtrace:
 [ 1318.041053][ T1082] CPU: 22 PID: 1082 Comm: tar Not tainted 6.8.0-rc7-00015-ge0c8221b72c0 #17 3316c85d50e282c5643b075d1f01a4f6365e39c2
 [ 1318.041329][ T1082] Hardware name: Gigabyte Technology Co., Ltd. B650 AORUS PRO AX/B650 AORUS PRO AX, BIOS F20 12/14/2023
 [ 1318.041614][ T1082] Call Trace:
 [ 1318.041895][ T1082]  <TASK>
 [ 1318.042175][ T1082]  dump_stack_lvl+0x4a/0x80
 [ 1318.042460][ T1082]  check_noncircular+0x145/0x160
 [ 1318.042743][ T1082]  __lock_acquire+0x14bf/0x2680
 [ 1318.043022][ T1082]  lock_acquire+0xcd/0x2c0
 [ 1318.043301][ T1082]  ? __might_fault+0x40/0x80
 [ 1318.043580][ T1082]  ? __might_fault+0x40/0x80
 [ 1318.043856][ T1082]  __might_fault+0x58/0x80
 [ 1318.044131][ T1082]  ? __might_fault+0x40/0x80
 [ 1318.044408][ T1082]  amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu 8fe2afaa910cbd7654c8cab23563a94d6caebaab]
 [ 1318.044749][ T1082]  full_proxy_read+0x55/0x80
 [ 1318.045042][ T1082]  vfs_read+0xa7/0x360
 [ 1318.045333][ T1082]  ksys_read+0x70/0xf0
 [ 1318.045623][ T1082]  do_syscall_64+0x94/0x180
 [ 1318.045913][ T1082]  ? do_syscall_64+0xa0/0x180
 [ 1318.046201][ T1082]  ? lockdep_hardirqs_on+0x7d/0x100
 [ 1318.046487][ T1082]  ? do_syscall_64+0xa0/0x180
 [ 1318.046773][ T1082]  ? do_syscall_64+0xa0/0x180
 [ 1318.047057][ T1082]  ? do_syscall_64+0xa0/0x180
 [ 1318.047337][ T1082]  ? do_syscall_64+0xa0/0x180
 [ 1318.047611][ T1082]  entry_SYSCALL_64_after_hwframe+0x46/0x4e
 [ 1318.047887][ T1082] RIP: 0033:0x7f480b70a39d
 [ 1318.048162][ T1082] Code: 91 ba 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb b2 e8 18 a3 01 00 0f 1f 84 00 00 00 00 00 80 3d a9 3c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83
 [ 1318.048769][ T1082] RSP: 002b:00007ffde77f5c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
 [ 1318.049083][ T1082] RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f480b70a39d
 [ 1318.049392][ T1082] RDX: 0000000000000800 RSI: 000055c9f2120c00 RDI: 0000000000000008
 [ 1318.049703][ T1082] RBP: 0000000000000800 R08: 000055c9f2120a94 R09: 0000000000000007
 [ 1318.050011][ T1082] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c9f2120c00
 [ 1318.050324][ T1082] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000800
 [ 1318.050638][ T1082]  </TASK>

 amdgpu_debugfs_mqd_read() holds a reservation when it calls
 put_user(), which may fault and acquire the mmap_sem. This violates
 the established locking order.

 Bounce the mqd data through a kernel buffer to get put_user() out of
 the illegal section.

 The Linux kernel CVE team has assigned CVE-2024-35795 to this issue.


 Affected and fixed versions
 ===========================

 	Issue introduced in 6.5 with commit 445d85e3c1dfd8c45b24be6f1527f1e117256d0e and fixed in 6.6.24 with commit 197f6d6987c55860f6eea1c93e4f800c59078874
 	Issue introduced in 6.5 with commit 445d85e3c1dfd8c45b24be6f1527f1e117256d0e and fixed in 6.7.12 with commit 8b03556da6e576c62664b6cd01809e4a09d53b5b
 	Issue introduced in 6.5 with commit 445d85e3c1dfd8c45b24be6f1527f1e117256d0e and fixed in 6.8.3 with commit 4687e3c6ee877ee25e57b984eca00be53b9a8db5
 	Issue introduced in 6.5 with commit 445d85e3c1dfd8c45b24be6f1527f1e117256d0e and fixed in 6.9 with commit 8678b1060ae2b75feb60b87e5b75e17374e3c1c5

 Please see https://www.kernel.org for a full list of currently supported
 kernel versions by the kernel community.

 Unaffected versions might change over time as fixes are backported to
 older supported kernel versions.  The official CVE entry at
 	https://cve.org/CVERecord/?id=CVE-2024-35795
 will be updated if fixes are backported, please check that for the most
 up to date information about this issue.


 Affected files
 ==============

 The file(s) affected by this issue are:
 	drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c


 Mitigation
 ==========

 The Linux kernel CVE team recommends that you update to the latest
 stable kernel version for this, and many other bugfixes.  Individual
 changes are never tested alone, but rather are part of a larger kernel
 release.  Cherry-picking individual commits is not recommended or
 supported by the Linux kernel community at all.  If however, updating to
 the latest release is impossible, the individual changes to resolve this
 issue can be found at these commits:
 	https://git.kernel.org/stable/c/197f6d6987c55860f6eea1c93e4f800c59078874
 	https://git.kernel.org/stable/c/8b03556da6e576c62664b6cd01809e4a09d53b5b
 	https://git.kernel.org/stable/c/4687e3c6ee877ee25e57b984eca00be53b9a8db5
 	https://git.kernel.org/stable/c/8678b1060ae2b75feb60b87e5b75e17374e3c1c5
	From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001
	From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
	To: <linux-cve-announce@vger.kernel.org>
	Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
	Subject: CVE-2024-35795: drm/amdgpu: fix deadlock while reading mqd from debugfs

	Description
	===========

	In the Linux kernel, the following vulnerability has been resolved:

	drm/amdgpu: fix deadlock while reading mqd from debugfs

	An errant disk backup on my desktop got into debugfs and triggered the
	following deadlock scenario in the amdgpu debugfs files. The machine
	also hard-resets immediately after those lines are printed (although I
	wasn't able to reproduce that part when reading by hand):

	[ 1318.016074][ T1082] ======================================================
	[ 1318.016607][ T1082] WARNING: possible circular locking dependency detected
	[ 1318.017107][ T1082] 6.8.0-rc7-00015-ge0c8221b72c0 #17 Not tainted
	[ 1318.017598][ T1082] ------------------------------------------------------
	[ 1318.018096][ T1082] tar/1082 is trying to acquire lock:
	[ 1318.018585][ T1082] ffff98c44175d6a0 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x40/0x80
	[ 1318.019084][ T1082]
	[ 1318.019084][ T1082] but task is already holding lock:
	[ 1318.020052][ T1082] ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
	[ 1318.020607][ T1082]
	[ 1318.020607][ T1082] which lock already depends on the new lock.
	[ 1318.020607][ T1082]
	[ 1318.022081][ T1082]
	[ 1318.022081][ T1082] the existing dependency chain (in reverse order) is:
	[ 1318.023083][ T1082]
	[ 1318.023083][ T1082] -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}:
	[ 1318.024114][ T1082] __ww_mutex_lock.constprop.0+0xe0/0x12f0
	[ 1318.024639][ T1082] ww_mutex_lock+0x32/0x90
	[ 1318.025161][ T1082] dma_resv_lockdep+0x18a/0x330
	[ 1318.025683][ T1082] do_one_initcall+0x6a/0x350
	[ 1318.026210][ T1082] kernel_init_freeable+0x1a3/0x310
	[ 1318.026728][ T1082] kernel_init+0x15/0x1a0
	[ 1318.027242][ T1082] ret_from_fork+0x2c/0x40
	[ 1318.027759][ T1082] ret_from_fork_asm+0x11/0x20
	[ 1318.028281][ T1082]
	[ 1318.028281][ T1082] -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}:
	[ 1318.029297][ T1082] dma_resv_lockdep+0x16c/0x330
	[ 1318.029790][ T1082] do_one_initcall+0x6a/0x350
	[ 1318.030263][ T1082] kernel_init_freeable+0x1a3/0x310
	[ 1318.030722][ T1082] kernel_init+0x15/0x1a0
	[ 1318.031168][ T1082] ret_from_fork+0x2c/0x40
	[ 1318.031598][ T1082] ret_from_fork_asm+0x11/0x20
	[ 1318.032011][ T1082]
	[ 1318.032011][ T1082] -> #0 (&mm->mmap_lock){++++}-{3:3}:
	[ 1318.032778][ T1082] __lock_acquire+0x14bf/0x2680
	[ 1318.033141][ T1082] lock_acquire+0xcd/0x2c0
	[ 1318.033487][ T1082] __might_fault+0x58/0x80
	[ 1318.033814][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu]
	[ 1318.034181][ T1082] full_proxy_read+0x55/0x80
	[ 1318.034487][ T1082] vfs_read+0xa7/0x360
	[ 1318.034788][ T1082] ksys_read+0x70/0xf0
	[ 1318.035085][ T1082] do_syscall_64+0x94/0x180
	[ 1318.035375][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e
	[ 1318.035664][ T1082]
	[ 1318.035664][ T1082] other info that might help us debug this:
	[ 1318.035664][ T1082]
	[ 1318.036487][ T1082] Chain exists of:
	[ 1318.036487][ T1082] &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex
	[ 1318.036487][ T1082]
	[ 1318.037310][ T1082] Possible unsafe locking scenario:
	[ 1318.037310][ T1082]
	[ 1318.037838][ T1082] CPU0 CPU1
	[ 1318.038101][ T1082] ---- ----
	[ 1318.038350][ T1082] lock(reservation_ww_class_mutex);
	[ 1318.038590][ T1082] lock(reservation_ww_class_acquire);
	[ 1318.038839][ T1082] lock(reservation_ww_class_mutex);
	[ 1318.039083][ T1082] rlock(&mm->mmap_lock);
	[ 1318.039328][ T1082]
	[ 1318.039328][ T1082] * DEADLOCK *
	[ 1318.039328][ T1082]
	[ 1318.040029][ T1082] 1 lock held by tar/1082:
	[ 1318.040259][ T1082] #0: ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
	[ 1318.040560][ T1082]
	[ 1318.040560][ T1082] stack backtrace:
	[ 1318.041053][ T1082] CPU: 22 PID: 1082 Comm: tar Not tainted 6.8.0-rc7-00015-ge0c8221b72c0 #17 3316c85d50e282c5643b075d1f01a4f6365e39c2
	[ 1318.041329][ T1082] Hardware name: Gigabyte Technology Co., Ltd. B650 AORUS PRO AX/B650 AORUS PRO AX, BIOS F20 12/14/2023
	[ 1318.041614][ T1082] Call Trace:
	[ 1318.041895][ T1082] <TASK>
	[ 1318.042175][ T1082] dump_stack_lvl+0x4a/0x80
	[ 1318.042460][ T1082] check_noncircular+0x145/0x160
	[ 1318.042743][ T1082] __lock_acquire+0x14bf/0x2680
	[ 1318.043022][ T1082] lock_acquire+0xcd/0x2c0
	[ 1318.043301][ T1082] ? __might_fault+0x40/0x80
	[ 1318.043580][ T1082] ? __might_fault+0x40/0x80
	[ 1318.043856][ T1082] __might_fault+0x58/0x80
	[ 1318.044131][ T1082] ? __might_fault+0x40/0x80
	[ 1318.044408][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu 8fe2afaa910cbd7654c8cab23563a94d6caebaab]
	[ 1318.044749][ T1082] full_proxy_read+0x55/0x80
	[ 1318.045042][ T1082] vfs_read+0xa7/0x360
	[ 1318.045333][ T1082] ksys_read+0x70/0xf0
	[ 1318.045623][ T1082] do_syscall_64+0x94/0x180
	[ 1318.045913][ T1082] ? do_syscall_64+0xa0/0x180
	[ 1318.046201][ T1082] ? lockdep_hardirqs_on+0x7d/0x100
	[ 1318.046487][ T1082] ? do_syscall_64+0xa0/0x180
	[ 1318.046773][ T1082] ? do_syscall_64+0xa0/0x180
	[ 1318.047057][ T1082] ? do_syscall_64+0xa0/0x180
	[ 1318.047337][ T1082] ? do_syscall_64+0xa0/0x180
	[ 1318.047611][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e
	[ 1318.047887][ T1082] RIP: 0033:0x7f480b70a39d
	[ 1318.048162][ T1082] Code: 91 ba 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb b2 e8 18 a3 01 00 0f 1f 84 00 00 00 00 00 80 3d a9 3c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83
	[ 1318.048769][ T1082] RSP: 002b:00007ffde77f5c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
	[ 1318.049083][ T1082] RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f480b70a39d
	[ 1318.049392][ T1082] RDX: 0000000000000800 RSI: 000055c9f2120c00 RDI: 0000000000000008
	[ 1318.049703][ T1082] RBP: 0000000000000800 R08: 000055c9f2120a94 R09: 0000000000000007
	[ 1318.050011][ T1082] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c9f2120c00
	[ 1318.050324][ T1082] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000800
	[ 1318.050638][ T1082] </TASK>

	amdgpu_debugfs_mqd_read() holds a reservation when it calls
	put_user(), which may fault and acquire the mmap_sem. This violates
	the established locking order.

	Bounce the mqd data through a kernel buffer to get put_user() out of
	the illegal section.

	The Linux kernel CVE team has assigned CVE-2024-35795 to this issue.


	Affected and fixed versions
	===========================

	Issue introduced in 6.5 with commit 445d85e3c1dfd8c45b24be6f1527f1e117256d0e and fixed in 6.6.24 with commit 197f6d6987c55860f6eea1c93e4f800c59078874
	Issue introduced in 6.5 with commit 445d85e3c1dfd8c45b24be6f1527f1e117256d0e and fixed in 6.7.12 with commit 8b03556da6e576c62664b6cd01809e4a09d53b5b
	Issue introduced in 6.5 with commit 445d85e3c1dfd8c45b24be6f1527f1e117256d0e and fixed in 6.8.3 with commit 4687e3c6ee877ee25e57b984eca00be53b9a8db5
	Issue introduced in 6.5 with commit 445d85e3c1dfd8c45b24be6f1527f1e117256d0e and fixed in 6.9 with commit 8678b1060ae2b75feb60b87e5b75e17374e3c1c5

	Please see https://www.kernel.org for a full list of currently supported
	kernel versions by the kernel community.

	Unaffected versions might change over time as fixes are backported to
	older supported kernel versions. The official CVE entry at
	https://cve.org/CVERecord/?id=CVE-2024-35795
	will be updated if fixes are backported, please check that for the most
	up to date information about this issue.


	Affected files
	==============

	The file(s) affected by this issue are:
	drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c


	Mitigation
	==========

	The Linux kernel CVE team recommends that you update to the latest
	stable kernel version for this, and many other bugfixes. Individual
	changes are never tested alone, but rather are part of a larger kernel
	release. Cherry-picking individual commits is not recommended or
	supported by the Linux kernel community at all. If however, updating to
	the latest release is impossible, the individual changes to resolve this
	issue can be found at these commits:
	https://git.kernel.org/stable/c/197f6d6987c55860f6eea1c93e4f800c59078874
	https://git.kernel.org/stable/c/8b03556da6e576c62664b6cd01809e4a09d53b5b
	https://git.kernel.org/stable/c/4687e3c6ee877ee25e57b984eca00be53b9a8db5
	https://git.kernel.org/stable/c/8678b1060ae2b75feb60b87e5b75e17374e3c1c5