| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-56552: drm/xe/guc_submit: fix race around suspend_pending |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| drm/xe/guc_submit: fix race around suspend_pending |
| |
| Currently in some testcases we can trigger: |
| |
| xe 0000:03:00.0: [drm] Assertion `exec_queue_destroyed(q)` failed! |
| .... |
| WARNING: CPU: 18 PID: 2640 at drivers/gpu/drm/xe/xe_guc_submit.c:1826 xe_guc_sched_done_handler+0xa54/0xef0 [xe] |
| xe 0000:03:00.0: [drm] *ERROR* GT1: DEREGISTER_DONE: Unexpected engine state 0x00a1, guc_id=57 |
| |
| Looking at a snippet of corresponding ftrace for this GuC id we can see: |
| |
| 162.673311: xe_sched_msg_add: dev=0000:03:00.0, gt=1 guc_id=57, opcode=3 |
| 162.673317: xe_sched_msg_recv: dev=0000:03:00.0, gt=1 guc_id=57, opcode=3 |
| 162.673319: xe_exec_queue_scheduling_disable: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0x29, flags=0x0 |
| 162.674089: xe_exec_queue_kill: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0x29, flags=0x0 |
| 162.674108: xe_exec_queue_close: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa9, flags=0x0 |
| 162.674488: xe_exec_queue_scheduling_done: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa9, flags=0x0 |
| 162.678452: xe_exec_queue_deregister: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa1, flags=0x0 |
| |
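The guc_state values in the trace above are easier to follow once decoded into named bits. The helper below is a minimal sketch, not driver code: the function name decode_guc_state is hypothetical, and the bit layout is an assumption about the EXEC_QUEUE_STATE_* flags in drivers/gpu/drm/xe/xe_guc_submit.c around the affected releases, so check it against the tree you are running.

```c
#include <stdio.h>
#include <string.h>

/*
 * ASSUMPTION: bit positions mirror the EXEC_QUEUE_STATE_* flags in
 * xe_guc_submit.c for the affected kernels. Verify against your tree.
 * Index in this array == bit number in guc_state.
 */
static const char *const guc_state_bit_names[] = {
	"REGISTERED",      /* bit 0 */
	"ENABLED",         /* bit 1 */
	"PENDING_ENABLE",  /* bit 2 */
	"PENDING_DISABLE", /* bit 3 */
	"DESTROYED",       /* bit 4 */
	"SUSPENDED",       /* bit 5 */
	"RESET",           /* bit 6 */
	"KILLED",          /* bit 7 */
};

/*
 * Hypothetical helper: write the names of the set bits (lowest bit
 * first, separated by '|') into buf and return buf. Bits above 7 are
 * ignored. Output is truncated if buf is too small.
 */
static char *decode_guc_state(unsigned int state, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (unsigned int bit = 0; bit < 8; bit++) {
		int n;

		if (!(state & (1u << bit)))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", guc_state_bit_names[bit]);
		if (n < 0 || (size_t)n >= len - used)
			break; /* out of space, stop here */
		used += (size_t)n;
	}
	return buf;
}
```

Under that assumed layout, 0x29 decodes to REGISTERED|PENDING_DISABLE|SUSPENDED, 0xa9 adds KILLED, and 0xa1 is REGISTERED|SUSPENDED|KILLED. The DESTROYED bit is never set at the point of the deregister, which is consistent with the failed exec_queue_destroyed(q) assertion.
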
| It looks like we try to suspend the queue (opcode=3), setting |
| suspend_pending and triggering a disable_scheduling. The user then |
| closes the queue. However, the close also forcefully signals the |
| suspend fence after killing the queue, which clears suspend_pending. |
| When the G2H response for disable_scheduling later comes back, |
| suspend_pending is no longer set, so the response handler incorrectly |
| tries to also deregister the queue. This leads to warnings since the queue |
| has yet to even be marked for destruction. We also seem to trigger |
| errors later when trying to unregister the same queue a second time. |
| |
| To fix this, tweak the ordering when handling the response to ensure we |
| don't race with a disable_scheduling that didn't actually intend to |
| perform an unregister. The destruction path should now also correctly |
| wait for any pending_disable before marking the queue as destroyed. |
| |
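The ordering problem can be replayed in a small single-threaded model. This is a simplified sketch and not the actual patch: struct toy_queue and every toy_* function are hypothetical names, and the real handler in xe_guc_submit.c deals with more state (bans, timeouts, locking) than is shown here. The model only captures the rule described above, namely that a disable_scheduling response should lead to a deregister only when the queue was actually marked for destruction.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for the per-queue GuC state. */
struct toy_queue {
	bool pending_disable;  /* disable_scheduling sent, G2H not yet seen */
	bool suspend_pending;  /* a suspend is waiting on that G2H */
	bool destroyed;        /* queue has been marked for destruction */
	int deregister_calls;  /* how many times we tried to deregister */
};

/* Suspend request: mark the suspend and kick off disable_scheduling. */
static void toy_suspend(struct toy_queue *q)
{
	q->suspend_pending = true;
	q->pending_disable = true;
}

/*
 * User closes the queue while the G2H is still in flight. The kill
 * path force-signals the suspend fence, which clears suspend_pending.
 * Note that the queue is NOT yet marked destroyed at this point.
 */
static void toy_close(struct toy_queue *q)
{
	q->suspend_pending = false;
}

/* Racy handling: anything that is not a suspend is treated as teardown. */
static void toy_sched_done_racy(struct toy_queue *q)
{
	q->pending_disable = false;
	if (q->suspend_pending)
		q->suspend_pending = false; /* signal suspend fence */
	else
		q->deregister_calls++;      /* wrong if !q->destroyed */
}

/*
 * Reordered handling: pending_disable is cleared only after the
 * suspend case is resolved, and a deregister is issued only for a
 * queue that was really marked for destruction.
 */
static void toy_sched_done_ordered(struct toy_queue *q)
{
	if (q->suspend_pending) {
		q->suspend_pending = false; /* signal suspend fence */
		q->pending_disable = false;
	} else if (q->destroyed) {
		q->pending_disable = false;
		q->deregister_calls++;
	} else {
		q->pending_disable = false; /* nothing else to do */
	}
}
```

Replaying the sequence from the trace (toy_suspend, then toy_close, then the response handler) leaves deregister_calls at 1 with destroyed still false for the racy variant, and at 0 for the reordered one.
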
| (cherry picked from commit f161809b362f027b6d72bd998e47f8f0bad60a2e) |
| |
| The Linux kernel CVE team has assigned CVE-2024-56552 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 6.8 with commit dd08ebf6c3525a7ea2186e636df064ea47281987 and fixed in 6.12.4 with commit 5ddcb50b700221fa7d7be2adcb3d7d7afe8633dd |
| Issue introduced in 6.8 with commit dd08ebf6c3525a7ea2186e636df064ea47281987 and fixed in 6.13 with commit 87651f31ae4e6e6e7e6c7270b9b469405e747407 |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-56552 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| drivers/gpu/drm/xe/xe_guc_submit.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/5ddcb50b700221fa7d7be2adcb3d7d7afe8633dd |
| https://git.kernel.org/stable/c/87651f31ae4e6e6e7e6c7270b9b469405e747407 |