| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2023-52738: drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini |
| |
| Currently amdgpu calls drm_sched_fini() from the fence driver sw fini |
| routine - such function is expected to be called only after the |
| respective init function - drm_sched_init() - was executed successfully. |
| |
| We recently hit a driver probe failure on the Steam Deck, and |
| drm_sched_fini() was called even though its counterpart had never |
| been called, causing the following oops: |
| |
| amdgpu: probe of 0000:04:00.0 failed with error -110 |
| BUG: kernel NULL pointer dereference, address: 0000000000000090 |
| PGD 0 P4D 0 |
| Oops: 0002 [#1] PREEMPT SMP NOPTI |
| CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338 |
| Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022 |
| RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched] |
| [...] |
| Call Trace: |
| <TASK> |
| amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu] |
| amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu] |
| amdgpu_driver_release_kms+0x16/0x30 [amdgpu] |
| devm_drm_dev_init_release+0x49/0x70 |
| [...] |
| |
| To prevent that, check whether the drm_sched was properly initialized for a |
| given ring before calling its fini counterpart. |
| |
| Ideally we'd use sched.ready for that, since the field is set as the last |
| step of drm_sched_init(). But amdgpu seems to "override" the meaning of that |
| field: in the above oops, for example, it was a GFX ring causing the crash, and |
| the sched.ready field was set to true in the ring init routine, regardless of |
| the state of the DRM scheduler. Hence, we ended up using sched.ops as per |
| Christian's suggestion [0], and also removed the no_scheduler check [1]. |
| |
| [0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/ |
| [1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/ |
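| |
| The guard described above can be illustrated with a minimal userspace sketch. |
| It is not the kernel patch: every mock_* name below is a hypothetical stand-in |
| for the real drm_sched types and functions, and the assert() only models the |
| requirement that fini must follow a successful init. |

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct mock_sched_ops { int unused; };

struct mock_sched {
    const struct mock_sched_ops *ops; /* set only by mock_sched_init() */
    bool ready;                       /* may also be set elsewhere by a driver */
    int fini_calls;                   /* bookkeeping for this sketch only */
};

static const struct mock_sched_ops mock_ops = { 0 };

/* Models a successful scheduler init: ops is assigned here and nowhere else. */
static void mock_sched_init(struct mock_sched *s)
{
    s->ops = &mock_ops;
    s->ready = true;
}

/* Models a fini routine that is only valid after a successful init. */
static void mock_sched_fini(struct mock_sched *s)
{
    assert(s->ops != NULL);
    s->ops = NULL;
    s->ready = false;
    s->fini_calls++;
}

/*
 * Guarded teardown: key the check on ops rather than ready, because ready
 * can be true even when init never ran. Returns true if fini was called.
 */
static bool mock_ring_teardown(struct mock_sched *s)
{
    if (!s->ops)
        return false;
    mock_sched_fini(s);
    return true;
}
```

| In this sketch, a scheduler whose ready flag was set by other code but that |
| never went through mock_sched_init() is skipped by mock_ring_teardown() |
| instead of reaching the fini routine. |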
| |
| The Linux kernel CVE team has assigned CVE-2023-52738 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 5.15 with commit 067f44c8b4590c3f24d21a037578a478590f2175 and fixed in 5.15.94 with commit 2e557c8ca2c585bdef591b8503ba83b85f5d0afd |
| Issue introduced in 5.15 with commit 067f44c8b4590c3f24d21a037578a478590f2175 and fixed in 6.1.12 with commit 2bcbbef9cace772f5b7128b11401c515982de34b |
| Issue introduced in 5.15 with commit 067f44c8b4590c3f24d21a037578a478590f2175 and fixed in 6.2 with commit 5ad7bbf3dba5c4a684338df1f285080f2588b535 |
| Issue introduced in 5.14.10 with commit 8ba968ae672b3075794c8086aa164595b0175abe |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2023-52738 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/2e557c8ca2c585bdef591b8503ba83b85f5d0afd |
| https://git.kernel.org/stable/c/2bcbbef9cace772f5b7128b11401c515982de34b |
| https://git.kernel.org/stable/c/5ad7bbf3dba5c4a684338df1f285080f2588b535 |