From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: <linux-cve-announce@vger.kernel.org>
Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
Subject: CVE-2024-53169: nvme-fabrics: fix kernel crash while shutting down controller

Description
===========

In the Linux kernel, the following vulnerability has been resolved:

nvme-fabrics: fix kernel crash while shutting down controller

The nvme keep-alive operation, which runs at a periodic interval, can
sneak in while a fabric controller is being shut down. This may lead to
a race between the fabric controller's admin queue destroy code path
(invoked during controller shutdown) and the hw/hctx queue dispatcher
called from the nvme keep-alive async request queuing operation. This
race can lead to the kernel crash shown below:

Call Trace:
autoremove_wake_function+0x0/0xbc (unreliable)
__blk_mq_sched_dispatch_requests+0x114/0x24c
blk_mq_sched_dispatch_requests+0x44/0x84
blk_mq_run_hw_queue+0x140/0x220
nvme_keep_alive_work+0xc8/0x19c [nvme_core]
process_one_work+0x200/0x4e0
worker_thread+0x340/0x504
kthread+0x138/0x140
start_kernel_thread+0x14/0x18

If an nvme keep-alive request sneaks in while the fabric controller is
shutting down, it is flushed off. The nvme_keep_alive_end_io function
is then invoked to handle the end of the keep-alive operation; it
decrements admin->q_usage_counter, and if this was the last/only
request in the admin queue, admin->q_usage_counter drops to zero. At
that point the blk-mq destroy queue operation (blk_mq_destroy_queue()),
which may be running simultaneously on another cpu (as part of the
controller shutdown code path), makes forward progress and deletes the
admin queue. From this point onward the admin queue resources must not
be accessed. However, the nvme keep-alive thread running the hw/hctx
queue dispatch operation has not yet finished its work, so it may still
access the admin queue resources after the admin queue has already been
deleted, and that causes the above crash.
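
The sequence above boils down to a reference-count use-after-free
interleaving. The following userspace program is a minimal,
hypothetical model of that interleaving (the struct and function names
are invented for illustration and none of this is kernel code): one
thread stands in for blk_mq_destroy_queue() waiting for the usage
counter to drain, the other for the keep-alive dispatch path that drops
the last reference and then keeps using the queue.

/*
 * Minimal userspace model of the race described above (hypothetical
 * sketch, not kernel code): shutdown_path plays the role of
 * blk_mq_destroy_queue() waiting for the usage counter to drain,
 * keep_alive_path plays the keep-alive dispatcher whose completion
 * drops the last reference but keeps touching the queue afterwards.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct queue {
    atomic_int usage;   /* models admin->q_usage_counter */
    int data;           /* stands in for the queue's resources */
};

static struct queue *q;

static void *shutdown_path(void *arg)   /* models blk_mq_destroy_queue() */
{
    (void)arg;
    while (atomic_load(&q->usage) > 0)
        ;                               /* wait for the counter to drain */
    free(q);                            /* the "admin queue" is gone now */
    return NULL;
}

static void *keep_alive_path(void *arg) /* models the keep-alive dispatch */
{
    struct queue *local = q;            /* dispatcher holds a raw pointer */

    (void)arg;
    atomic_fetch_sub(&local->usage, 1); /* end_io drops the last reference */
    usleep(1000);                       /* dispatch work is not done yet... */
    printf("%d\n", local->data);        /* ...use-after-free, may crash */
    return NULL;
}

int main(void)
{
    pthread_t a, b;

    q = calloc(1, sizeof(*q));
    atomic_store(&q->usage, 1);         /* one in-flight keep-alive request */

    pthread_create(&a, NULL, shutdown_path, NULL);
    pthread_create(&b, NULL, keep_alive_path, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}

Built with something like "cc -pthread -fsanitize=address race.c", the
final read in keep_alive_path is typically reported as a
heap-use-after-free, mirroring the crash in the trace above.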

The above kernel crash is a regression caused by the changes made in
commit a54a93d0e359 ("nvme: move stopping keep-alive into
nvme_uninit_ctrl()"). Ideally we should stop keep-alive before
destroying the admin queue and freeing the admin tagset so that it
cannot sneak in during the shutdown operation. However, that commit
removed the keep-alive stop operation from the beginning of the
controller shutdown code path and moved it under nvme_uninit_ctrl(),
which executes very late in the shutdown code path, after the admin
queue is destroyed and its tagset is removed. This change created the
possibility of keep-alive sneaking in and interfering with the shutdown
operation, causing the observed kernel crash.

To fix the observed crash, we move nvme_stop_keep_alive() from
nvme_uninit_ctrl() to nvme_remove_admin_tag_set(). This ensures that we
do not make forward progress and delete the admin queue until the
keep-alive operation has finished (if it is in flight) or been
cancelled, which contains the race condition explained above and avoids
the crash.

Moving nvme_stop_keep_alive() to nvme_remove_admin_tag_set(), rather
than adding it back to the beginning of the controller shutdown code
path in nvme_stop_ctrl() as was the case before commit a54a93d0e359
("nvme: move stopping keep-alive into nvme_uninit_ctrl()"), also saves
one callsite of nvme_stop_keep_alive().
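
Concretely, the fix amounts to calling nvme_stop_keep_alive() at the
top of nvme_remove_admin_tag_set(), before the admin queue is destroyed
and its tagset freed. A rough sketch of the resulting helper in
drivers/nvme/host/core.c follows; it is illustrative only, and the
exact body (for example the fabrics_q handling) may differ between
kernel versions:

/* Illustrative sketch of the fixed helper, not the literal upstream diff. */
void nvme_remove_admin_tag_set(struct nvme_ctrl *ctrl)
{
        /*
         * Stop (and flush) the periodic keep-alive work first, so it
         * can no longer queue requests on the admin queue we are about
         * to destroy, nor race with freeing the admin tagset.
         */
        nvme_stop_keep_alive(ctrl);

        blk_mq_destroy_queue(ctrl->admin_q);
        if (ctrl->ops->flags & NVME_F_FABRICS)
                blk_mq_destroy_queue(ctrl->fabrics_q);
        blk_mq_free_tag_set(ctrl->admin_tagset);
}

With the keep-alive work cancelled synchronously before
blk_mq_destroy_queue(), no keep-alive dispatch can still be running
when the admin queue's usage counter drains.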

The Linux kernel CVE team has assigned CVE-2024-53169 to this issue.


Affected and fixed versions
===========================

Issue introduced in 6.11 with commit a54a93d0e3599b05856971734e15418ac551a14c and fixed in 6.11.11 with commit 30794f4952decb2ec8efa42f704cac5304499a41
Issue introduced in 6.11 with commit a54a93d0e3599b05856971734e15418ac551a14c and fixed in 6.12.2 with commit 5416b76a8156c1b8491f78f8a728f422104bb919
Issue introduced in 6.11 with commit a54a93d0e3599b05856971734e15418ac551a14c and fixed in 6.13 with commit e9869c85c81168a1275f909d5972a3fc435304be
Issue introduced in 6.10.7 with commit 4101af98ab573554c4225e328d506fec2a74bc54

Please see https://www.kernel.org for a full list of currently supported
kernel versions by the kernel community.

Unaffected versions might change over time as fixes are backported to
older supported kernel versions. The official CVE entry at
https://cve.org/CVERecord/?id=CVE-2024-53169
will be updated if fixes are backported, please check that for the most
up to date information about this issue.


Affected files
==============

The file(s) affected by this issue are:
	drivers/nvme/host/core.c


Mitigation
==========

The Linux kernel CVE team recommends that you update to the latest
stable kernel version for this, and many other bugfixes. Individual
changes are never tested alone, but rather are part of a larger kernel
release. Cherry-picking individual commits is not recommended or
supported by the Linux kernel community at all. If however, updating to
the latest release is impossible, the individual changes to resolve this
issue can be found at these commits:
	https://git.kernel.org/stable/c/30794f4952decb2ec8efa42f704cac5304499a41
	https://git.kernel.org/stable/c/5416b76a8156c1b8491f78f8a728f422104bb919
	https://git.kernel.org/stable/c/e9869c85c81168a1275f909d5972a3fc435304be