| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-26939: drm/i915/vma: Fix UAF on destroy against retire race |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| drm/i915/vma: Fix UAF on destroy against retire race |
| |
| Object debugging tools were sporadically reporting illegal attempts to |
| free a still active i915 VMA object when parking a GT believed to be idle. |
| |
| [161.359441] ODEBUG: free active (active state 0) object: ffff88811643b958 object type: i915_active hint: __i915_vma_active+0x0/0x50 [i915] |
| [161.360082] WARNING: CPU: 5 PID: 276 at lib/debugobjects.c:514 debug_print_object+0x80/0xb0 |
| ... |
| [161.360304] CPU: 5 PID: 276 Comm: kworker/5:2 Not tainted 6.5.0-rc1-CI_DRM_13375-g003f860e5577+ #1 |
| [161.360314] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.2204210138 04/21/2022 |
| [161.360322] Workqueue: i915-unordered __intel_wakeref_put_work [i915] |
| [161.360592] RIP: 0010:debug_print_object+0x80/0xb0 |
| ... |
| [161.361347] debug_object_free+0xeb/0x110 |
| [161.361362] i915_active_fini+0x14/0x130 [i915] |
| [161.361866] release_references+0xfe/0x1f0 [i915] |
| [161.362543] i915_vma_parked+0x1db/0x380 [i915] |
| [161.363129] __gt_park+0x121/0x230 [i915] |
| [161.363515] ____intel_wakeref_put_last+0x1f/0x70 [i915] |
| |
| This was tracked down to another thread deactivating the VMA inside the |
| __active_retire() helper after the VMA's active counter had already been |
| decremented to 0, but before deactivation of the VMA's object was reported |
| to the object debugging tool. |
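The window can be illustrated with a deterministic, single-threaded user-space model in C. Everything in it is hypothetical and greatly simplified, and none of it is i915 or debugobjects code. That includes struct model_active, retire_dec(), retire_debug_deactivate(), park_would_warn() and run_interleaving(). The retire side is split into the two steps named above. The park side is assumed to treat the object as idle once the counter reads 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an i915_active-like object (not kernel code). */
struct model_active {
	int count;          /* models the VMA's active counter */
	bool debug_active;  /* models what the object debugging tool sees */
};

/* Step 1 of the modeled __active_retire(): the counter drops. */
static void retire_dec(struct model_active *ref)
{
	assert(ref->count > 0);
	ref->count--;
}

/* Step 2 of the modeled __active_retire(): deactivation is reported. */
static void retire_debug_deactivate(struct model_active *ref)
{
	ref->debug_active = false;
}

/* Returns true if freeing the object now would be reported as "free active". */
static bool park_would_warn(const struct model_active *ref)
{
	/* Assumption: the park path treats the object as idle at count 0. */
	if (ref->count != 0)
		return false;
	/* The debug tool complains if it still believes the object is active. */
	return ref->debug_active;
}

/* Runs the park path either inside or after the retire window. */
static bool run_interleaving(bool park_inside_window)
{
	struct model_active ref = { .count = 1, .debug_active = true };
	bool warned = false;

	retire_dec(&ref);                        /* counter reaches 0 */
	if (park_inside_window)
		warned = park_would_warn(&ref);  /* park runs inside the window */
	retire_debug_deactivate(&ref);           /* debug tool informed only now */
	if (!park_inside_window)
		warned = park_would_warn(&ref);  /* park runs after retire completes */
	return warned;
}
```

In this model, run_interleaving(true) returns true, meaning the modeled tool would report a free of an active object. run_interleaving(false) returns false.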
| |
| We could prevent that race by serializing i915_active_fini() with |
| __active_retire() via ref->tree_lock, but that wouldn't stop the VMA from |
| being used, e.g. by __i915_vma_retire() called at the end of |
| __active_retire(), after that VMA had already been freed by a concurrent |
| i915_vma_destroy() on return from i915_active_fini(). Hence, the issue |
| should be fixed at the VMA level, not in i915_active. |
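A similarly hypothetical model shows why serializing inside i915_active is not enough. In the model, the VMA-level callback is assumed to run after the lock protecting the i915_active bookkeeping has been dropped. struct model_vma, model_destroy(), model_vma_retire() and serialized_sequence_is_safe() are illustrative names only. A freed flag stands in for releasing memory, so the use-after-free is detected rather than performed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not i915 code. */
struct model_vma {
	bool freed;        /* stands in for the memory having been released */
	int retire_calls;  /* successful calls of the modeled retire callback */
};

/* Models i915_vma_destroy() running on return from i915_active_fini(). */
static void model_destroy(struct model_vma *vma)
{
	assert(!vma->freed);  /* catch a double free within the model */
	vma->freed = true;
}

/* Models __i915_vma_retire(); returns false if the VMA was already freed. */
static bool model_vma_retire(struct model_vma *vma)
{
	if (vma->freed)
		return false;  /* in real code this access would be a UAF */
	vma->retire_calls++;
	return true;
}

/*
 * One interleaving that serialization via the modeled tree_lock does not
 * prevent: fini waits for the locked part of retire, but the VMA callback
 * is assumed to run after that lock is dropped.
 */
static bool serialized_sequence_is_safe(void)
{
	struct model_vma vma = { .freed = false, .retire_calls = 0 };

	/* 1. retire bookkeeping completes under the modeled tree_lock */
	/* 2. fini, serialized with step 1, runs and returns */
	model_destroy(&vma);            /* 3. concurrent destroy frees the VMA */
	return model_vma_retire(&vma);  /* 4. callback at the end of retire */
}
```

serialized_sequence_is_safe() returns false in this model. Locking inside i915_active alone does not protect the VMA's lifetime.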
| |
| Since __i915_vma_parked() is called from __gt_park() on last put of the |
| GT's wakeref, the issue could be addressed by holding the GT wakeref long |
| enough for __active_retire() to complete before that wakeref is released |
| and the GT parked. |
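The idea of holding the GT wakeref long enough can be sketched with a small reference-counting model in which parking runs only on the last put. struct model_gt, gt_get(), gt_put() and parks_before_retire() are hypothetical and are not the intel_wakeref API. The sequence assumes that the wakeref keeping the GT awake belongs to the request's context and is released before VMA retirement completes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical GT wakeref model; not the intel_wakeref API. */
struct model_gt {
	int wakeref;            /* outstanding wakerefs */
	int parks;              /* times the modeled __gt_park() ran */
	bool retire_done;       /* modeled __active_retire() has completed */
	bool parked_too_early;  /* a park ran before retire completed */
};

static void gt_get(struct model_gt *gt)
{
	gt->wakeref++;
}

/* Parking runs only when the last wakeref is put. */
static void gt_put(struct model_gt *gt)
{
	assert(gt->wakeref > 0);
	if (--gt->wakeref == 0) {
		gt->parks++;
		if (!gt->retire_done)
			gt->parked_too_early = true;
	}
}

/*
 * One request: the context's wakeref is released first, then the VMA
 * retires. With vma_holds_ref, the VMA keeps its own wakeref until
 * retirement completes. Returns true if the GT parked too early.
 */
static bool parks_before_retire(bool vma_holds_ref)
{
	struct model_gt gt = { 0, 0, false, false };

	gt_get(&gt);                 /* wakeref held via the request's context */
	if (vma_holds_ref)
		gt_get(&gt);         /* extra wakeref taken on VMA activation */

	gt_put(&gt);                 /* context wakeref released */
	gt.retire_done = true;       /* modeled __active_retire() completes */
	if (vma_holds_ref)
		gt_put(&gt);         /* VMA wakeref released after retirement */

	assert(gt.parks == 1);
	return gt.parked_too_early;
}
```

parks_before_retire(false) returns true and parks_before_retire(true) returns false. With the extra reference, the single park happens only after retirement.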
| |
| I believe the issue was introduced by commit d93939730347 ("drm/i915: |
| Remove the vma refcount"), which moved a call to i915_active_fini() from |
| the dropped i915_vma_release(), called on last put of the removed VMA kref, |
| to the i915_vma_parked() processing path, called on last put of a GT wakeref. |
| However, its visibility to the object debugging tool was suppressed by a |
| bug in i915_active that was fixed two weeks later with commit e92eb246feb9 |
| ("drm/i915/active: Fix missing debug object activation"). |
| |
| A VMA associated with a request doesn't acquire a GT wakeref by itself. |
| Instead, it depends on a wakeref held directly by the request's active |
| intel_context for a GT associated with its VM, and indirectly on that |
| intel_context's engine wakeref if the engine belongs to the same GT as the |
| VMA's VM. Those wakerefs are released asynchronously to VMA deactivation. |
| |
| Fix the issue by getting a wakeref for the VMA's GT when activating it, |
| and putting that wakeref only after the VMA is deactivated. However, |
| exclude the global GTT from that processing path, otherwise the GPU never |
| goes idle. Since __i915_vma_retire() may be called from atomic contexts, |
| use the async variant of the wakeref put. Also, to avoid a circular locking |
| dependency, take care to acquire the wakeref before the VM mutex when both |
| are needed. |
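The fix can be sketched in the same hypothetical style. The VMA's activate hook takes a GT wakeref unless the VMA belongs to the global GTT, and its retire hook drops it through an asynchronous put. As a simplification, every async put is queued and applied later from a separate worker step. The lock ordering against the VM mutex is not modeled. struct fix_gt, struct fix_vma and all functions below are illustrative only. Per the changelog, the actual patch uses untracked variants of intel_gt_pm_get/put_async().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical GT with a wakeref count and a queue of deferred puts. */
struct fix_gt {
	int wakeref;             /* outstanding wakerefs */
	int pending_async_puts;  /* puts queued but not yet applied */
	int parks;               /* times the modeled GT parked */
};

struct fix_vma {
	struct fix_gt *gt;
	bool is_ggtt;            /* true for a global GTT VMA */
};

static void fix_gt_get(struct fix_gt *gt)
{
	gt->wakeref++;
}

/* Usable from the modeled atomic context: it only queues the put. */
static void fix_gt_put_async(struct fix_gt *gt)
{
	gt->pending_async_puts++;
}

/* Modeled worker: applies queued puts and parks on the last one. */
static void fix_gt_run_worker(struct fix_gt *gt)
{
	while (gt->pending_async_puts > 0) {
		gt->pending_async_puts--;
		assert(gt->wakeref > 0);
		if (--gt->wakeref == 0)
			gt->parks++;
	}
}

/* Models __i915_vma_active(): GGTT VMAs do not hold a GT wakeref. */
static void fix_vma_active(struct fix_vma *vma)
{
	if (!vma->is_ggtt)
		fix_gt_get(vma->gt);
}

/* Models __i915_vma_retire(): may run in atomic context, so put async. */
static void fix_vma_retire(struct fix_vma *vma)
{
	if (!vma->is_ggtt)
		fix_gt_put_async(vma->gt);
}
```

Consider one per-process GTT VMA and one GGTT VMA, both activated: only the former holds a wakeref. After both retire, the GT stays awake until the worker applies the queued put, and only then does it park.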
| |
| v7: Add inline comments with justifications for: |
| - using untracked variants of intel_gt_pm_get/put() (Nirmoy), |
| - using async variant of _put(), |
| - not getting the wakeref in case of a global GTT, |
| - always getting the first wakeref outside vm->mutex. |
| v6: Since __i915_vma_active/retire() callbacks are not serialized, storing |
| a wakeref tracking handle inside struct i915_vma is not safe, and |
| there is no other good place for that. Use untracked variants of |
| intel_gt_pm_get/put_async(). |
| v5: Replace "tile" with "GT" across commit description (Rodrigo), |
| - avoid mentioning multi-GT case in commit description (Rodrigo), |
| - explain why we need to take a temporary wakeref unconditionally inside |
| i915_vma_pin_ww() (Rodrigo). |
| v4: Refresh on top of commit 5e4e06e4087e ("drm/i915: Track gt pm |
| wakerefs") (Andi), |
| - for easier backporting, split out removal of former insufficient |
| workarounds and move them to separate patches (Nirmoy). |
| - clean up commit message and description a bit. |
| v3: Identify root cause more precisely, and a commit to blame, |
| - identify and drop former workarounds, |
| - update commit message and description. |
| v2: Get the wakeref before VM mutex to avoid circular locking dependency, |
| - drop questionable Fixes: tag. |
| |
| (cherry picked from commit f3c71b2ded5c4367144a810ef25f998fd1d6c381) |
| |
| The Linux kernel CVE team has assigned CVE-2024-26939 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 5.19 with commit d93939730347360db0afe6a4367451b6f84ab7b1 and fixed in 6.1.88 with commit 704edc9252f4988ae1ad7dafa23d0db8d90d7190 |
| Issue introduced in 5.19 with commit d93939730347360db0afe6a4367451b6f84ab7b1 and fixed in 6.6.29 with commit 5e3eb862df9f972ab677fb19e0d4b9b1be8db7b5 |
| Issue introduced in 5.19 with commit d93939730347360db0afe6a4367451b6f84ab7b1 and fixed in 6.8.3 with commit 59b2626dd8c8a2e13f18054b3530e0c00073d79f |
| Issue introduced in 5.19 with commit d93939730347360db0afe6a4367451b6f84ab7b1 and fixed in 6.9 with commit 0e45882ca829b26b915162e8e86dbb1095768e9e |
| |
| Please see https://www.kernel.org for a full list of currently supported |
| kernel versions by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-26939 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| drivers/gpu/drm/i915/i915_vma.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/704edc9252f4988ae1ad7dafa23d0db8d90d7190 |
| https://git.kernel.org/stable/c/5e3eb862df9f972ab677fb19e0d4b9b1be8db7b5 |
| https://git.kernel.org/stable/c/59b2626dd8c8a2e13f18054b3530e0c00073d79f |
| https://git.kernel.org/stable/c/0e45882ca829b26b915162e8e86dbb1095768e9e |