| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-26976: KVM: Always flush async #PF workqueue when vCPU is being destroyed |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| KVM: Always flush async #PF workqueue when vCPU is being destroyed |
| |
| Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its |
| completion queue, e.g. when a VM and all its vCPUs are being destroyed. |
| KVM must ensure that none of its workqueue callbacks is running when the |
| last reference to the KVM _module_ is put. Gifting a reference to the |
| associated VM prevents the workqueue callback from dereferencing freed |
| vCPU/VM memory, but does not prevent the KVM module from being unloaded |
| before the callback completes. |
| |
| Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from |
| async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will |
| result in deadlock. async_pf_execute() can't return until kvm_put_kvm() |
| finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes: |
| |
| WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm] |
| Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass |
| CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G W 6.6.0-rc1-e7af8d17224a-x86/gmem-vm #119 |
| Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 |
| Workqueue: events async_pf_execute [kvm] |
| RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm] |
| Call Trace: |
| <TASK> |
| async_pf_execute+0x198/0x260 [kvm] |
| process_one_work+0x145/0x2d0 |
| worker_thread+0x27e/0x3a0 |
| kthread+0xba/0xe0 |
| ret_from_fork+0x2d/0x50 |
| ret_from_fork_asm+0x11/0x20 |
| </TASK> |
| ---[ end trace 0000000000000000 ]--- |
| INFO: task kworker/8:1:251 blocked for more than 120 seconds. |
| Tainted: G W 6.6.0-rc1-e7af8d17224a-x86/gmem-vm #119 |
| "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. |
| task:kworker/8:1 state:D stack:0 pid:251 ppid:2 flags:0x00004000 |
| Workqueue: events async_pf_execute [kvm] |
| Call Trace: |
| <TASK> |
| __schedule+0x33f/0xa40 |
| schedule+0x53/0xc0 |
| schedule_timeout+0x12a/0x140 |
| __wait_for_common+0x8d/0x1d0 |
| __flush_work.isra.0+0x19f/0x2c0 |
| kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm] |
| kvm_arch_destroy_vm+0x78/0x1b0 [kvm] |
| kvm_put_kvm+0x1c1/0x320 [kvm] |
| async_pf_execute+0x198/0x260 [kvm] |
| process_one_work+0x145/0x2d0 |
| worker_thread+0x27e/0x3a0 |
| kthread+0xba/0xe0 |
| ret_from_fork+0x2d/0x50 |
| ret_from_fork_asm+0x11/0x20 |
| </TASK> |
| |
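| The cycle shown in the two traces above can be reduced to a small |
| user-space model. This is only a sketch under stated assumptions: every |
| name in it (struct model_work, model_flush_work(), model_put_vm(), |
| scenario_completes()) is hypothetical and is not a kernel API, and the |
| model reports a would-be deadlock instead of actually hanging. |

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a kernel work item. */
struct model_work {
    bool running;                   /* true while the callback executes */
};

/* The work item whose callback is executing on this "worker", if any. */
static struct model_work *current_work;

/*
 * Model of a flush: returns true if the flush can complete, false if it
 * would wait forever because the caller is the callback being waited on.
 */
static bool model_flush_work(struct model_work *w)
{
    if (w->running && current_work == w)
        return false;               /* self-flush: callback waits on itself */
    return true;
}

/* Model of dropping a VM reference; the last put flushes the work item. */
static bool model_put_vm(int *refcount, struct model_work *w)
{
    if (--(*refcount) > 0)
        return true;                /* not the last reference, nothing to do */
    return model_flush_work(w);     /* last reference: VM teardown flushes */
}

/* Model of the old callback, which dropped a gifted VM reference at its end. */
static bool model_callback_with_gifted_ref(int *refcount, struct model_work *w)
{
    bool ok;

    current_work = w;
    w->running = true;
    ok = model_put_vm(refcount, w);
    w->running = false;
    current_work = NULL;
    return ok;
}

/* Returns true if the scenario completes, false if it would deadlock. */
bool scenario_completes(int refs_held_elsewhere)
{
    struct model_work w = { .running = false };
    int refcount = refs_held_elsewhere + 1;  /* +1 is the gifted reference */

    return model_callback_with_gifted_ref(&refcount, &w);
}
```

| In this model, scenario_completes(1) succeeds because the callback does |
| not drop the last reference, while scenario_completes(0) reports the |
| self-flush that the hung-task trace shows. |
| |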
| If kvm_clear_async_pf_completion_queue() actually flushes the workqueue, |
| then there's no need to gift async_pf_execute() a reference because all |
| invocations of async_pf_execute() will be forced to complete before the |
| vCPU and its VM are destroyed/freed. And that in turn fixes the module |
| unloading bug as __fput() won't do module_put() on the last vCPU reference |
| until the vCPU has been freed, e.g. if closing the vCPU file also puts the |
| last reference to the KVM module. |
| |
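| The ordering argument above (flush first, then free) can be illustrated |
| with a deterministic, single-threaded model. It is a sketch only: the |
| names (struct model_vcpu, model_destroy(), late_callbacks()) are |
| hypothetical, and "freed" merely stands in for freeing the vCPU and |
| dropping the module reference. |

```c
#include <stdbool.h>

/* Hypothetical stand-in for a vCPU with queued async #PF callbacks. */
struct model_vcpu {
    bool freed;             /* set once the vCPU has been torn down */
    int  pending;           /* queued callbacks that have not run yet */
    int  ran_after_free;    /* callbacks that ran after teardown (the bug) */
};

/* Run every queued callback, counting those that run after teardown. */
static void model_run_pending(struct model_vcpu *v)
{
    while (v->pending > 0) {
        v->pending--;
        if (v->freed)
            v->ran_after_free++;
    }
}

/* Tear down the vCPU, optionally waiting for all callbacks first. */
static void model_destroy(struct model_vcpu *v, bool flush_first)
{
    if (flush_first)
        model_run_pending(v);   /* the flush: callbacks finish before free */
    v->freed = true;
}

/* Returns how many callbacks would run after the vCPU was torn down. */
int late_callbacks(int queued, bool flush_first)
{
    struct model_vcpu v = { .freed = false, .pending = queued,
                            .ran_after_free = 0 };

    model_destroy(&v, flush_first);
    model_run_pending(&v);      /* the worker runs whatever is still queued */
    return v.ran_after_free;
}
```

| With flush_first set, no callback can run after teardown, which is why |
| the gifted reference becomes unnecessary once the flush is unconditional. |
| |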
| Note that kvm_check_async_pf_completion() may also take the work item off |
| the completion queue and so also needs to flush the work queue, as the |
| work will not be seen by kvm_clear_async_pf_completion_queue(). Waiting |
| on the workqueue could theoretically delay a vCPU due to waiting for the |
| work to complete, but that's a very, very small chance, and likely a very |
| small delay. kvm_arch_async_page_present_queued() unconditionally makes a |
| new request, i.e. will effectively delay entering the guest, so the |
| remaining work is really just: |
| |
| trace_kvm_async_pf_completed(addr, cr2_or_gpa); |
| |
| __kvm_vcpu_wake_up(vcpu); |
| |
| mmput(mm); |
| |
| and mmput() can't drop the last reference to the page tables if the vCPU is |
| still alive, i.e. the vCPU won't get stuck tearing down page tables. |
| |
| Add a helper to do the flushing, specifically to deal with "wakeup all" |
| work items, as they aren't actually work items, i.e. are never placed in a |
| workqueue. Trying to flush a bogus workqueue entry rightly makes |
| __flush_work() complain (kudos to whoever added that sanity check). |
| |
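| A helper of the kind described above might be shaped roughly as follows. |
| This is a user-space sketch, not the code from the fix: the struct, its |
| fields, the enum and both function names are assumptions made for |
| illustration, and "flushed" only stands in for waiting on the callback. |

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an async #PF work item. */
struct model_async_pf {
    bool wakeup_all;                        /* "wakeup all" item, never queued */
    void (*func)(struct model_async_pf *);  /* set only when actually queued */
    bool flushed;
};

enum flush_result {
    FLUSHED,        /* real work item: waited for its callback */
    SKIPPED,        /* "wakeup all" item: nothing to flush */
    SKIPPED_BOGUS   /* "wakeup all" item that looks queued: sanity check hit */
};

/* Flush real work items; never flush an item that was never queued. */
enum flush_result model_flush_async_pf(struct model_async_pf *w)
{
    if (w->wakeup_all)
        return w->func ? SKIPPED_BOGUS : SKIPPED;

    w->flushed = true;
    return FLUSHED;
}

/* Placeholder callback used to mark an item as queued. */
static void model_cb(struct model_async_pf *w)
{
    (void)w;
}

/* Convenience wrapper: build an item and report how it would be handled. */
enum flush_result classify(bool wakeup_all, bool queued)
{
    struct model_async_pf w = {
        .wakeup_all = wakeup_all,
        .func = queued ? model_cb : NULL,
        .flushed = false,
    };

    return model_flush_async_pf(&w);
}
```

| Keeping the check in one helper means both paths that take items off the |
| completion queue can share it. |
| |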
| Note, commit 5f6de5cbebee ("KVM: Prevent module exit until all VMs are |
| freed") *tried* to fix the module refcounting issue by having VMs grab a |
| reference to the module, but that only made the bug slightly harder to hit |
| as it gave async_pf_execute() a bit more time to complete before the KVM |
| module could be unloaded. |
| |
| The Linux kernel CVE team has assigned CVE-2024-26976 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 2.6.38 with commit af585b921e5d1e919947c4b1164b59507fe7cd7b and fixed in 4.19.312 with commit ab2c2f5d9576112ad22cfd3798071cb74693b1f5 |
| Issue introduced in 2.6.38 with commit af585b921e5d1e919947c4b1164b59507fe7cd7b and fixed in 5.4.274 with commit 82e25cc1c2e93c3023da98be282322fc08b61ffb |
| Issue introduced in 2.6.38 with commit af585b921e5d1e919947c4b1164b59507fe7cd7b and fixed in 5.10.215 with commit f8730d6335e5f43d09151fca1f0f41922209a264 |
| Issue introduced in 2.6.38 with commit af585b921e5d1e919947c4b1164b59507fe7cd7b and fixed in 5.15.154 with commit 83d3c5e309611ef593e2fcb78444fc8ceedf9bac |
| Issue introduced in 2.6.38 with commit af585b921e5d1e919947c4b1164b59507fe7cd7b and fixed in 6.1.84 with commit b54478d20375874aeee257744dedfd3e413432ff |
| Issue introduced in 2.6.38 with commit af585b921e5d1e919947c4b1164b59507fe7cd7b and fixed in 6.6.24 with commit a75afe480d4349c524d9c659b1a5a544dbc39a98 |
| Issue introduced in 2.6.38 with commit af585b921e5d1e919947c4b1164b59507fe7cd7b and fixed in 6.7.12 with commit 4f3a3bce428fb439c66a578adc447afce7b4a750 |
| Issue introduced in 2.6.38 with commit af585b921e5d1e919947c4b1164b59507fe7cd7b and fixed in 6.8.3 with commit caa9af2e27c275e089d702cfbaaece3b42bca31b |
| Issue introduced in 2.6.38 with commit af585b921e5d1e919947c4b1164b59507fe7cd7b and fixed in 6.9 with commit 3d75b8aa5c29058a512db29da7cbee8052724157 |
| |
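| The list above can be turned into a simple version check. The sketch |
| below is an assumption-laden illustration: the function and type names |
| are hypothetical, it only considers upstream version numbers, and it |
| cannot see fixes that a distribution has backported without changing |
| the version. Series with no fix listed above are reported as affected. |

```c
#include <stddef.h>

enum cve_status { NOT_AFFECTED, AFFECTED, FIXED };

/* First fixed release of each stable series, as listed in the announcement. */
struct stable_fix { int major, minor, patch; };

static const struct stable_fix fixes[] = {
    { 4, 19, 312 }, { 5, 4, 274 }, { 5, 10, 215 }, { 5, 15, 154 },
    { 6, 1, 84 },   { 6, 6, 24 },  { 6, 7, 12 },   { 6, 8, 3 },
};

/* Classify an upstream kernel version, e.g. 6.6.24 -> (6, 6, 24). */
enum cve_status cve_2024_26976_status(int major, int minor, int patch)
{
    size_t i;

    /* The issue was introduced in 2.6.38; anything older predates it. */
    if (major < 2 ||
        (major == 2 && (minor < 6 || (minor == 6 && patch < 38))))
        return NOT_AFFECTED;

    /* Mainline 6.9 and all later releases carry the fix. */
    if (major > 6 || (major == 6 && minor >= 9))
        return FIXED;

    for (i = 0; i < sizeof(fixes) / sizeof(fixes[0]); i++) {
        if (fixes[i].major == major && fixes[i].minor == minor)
            return patch >= fixes[i].patch ? FIXED : AFFECTED;
    }

    /* No fix is listed for this series in the announcement. */
    return AFFECTED;
}
```

| For example, 6.6.23 is classified as affected and 6.6.24 as fixed. |
| |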
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-26976 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| virt/kvm/async_pf.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/ab2c2f5d9576112ad22cfd3798071cb74693b1f5 |
| https://git.kernel.org/stable/c/82e25cc1c2e93c3023da98be282322fc08b61ffb |
| https://git.kernel.org/stable/c/f8730d6335e5f43d09151fca1f0f41922209a264 |
| https://git.kernel.org/stable/c/83d3c5e309611ef593e2fcb78444fc8ceedf9bac |
| https://git.kernel.org/stable/c/b54478d20375874aeee257744dedfd3e413432ff |
| https://git.kernel.org/stable/c/a75afe480d4349c524d9c659b1a5a544dbc39a98 |
| https://git.kernel.org/stable/c/4f3a3bce428fb439c66a578adc447afce7b4a750 |
| https://git.kernel.org/stable/c/caa9af2e27c275e089d702cfbaaece3b42bca31b |
| https://git.kernel.org/stable/c/3d75b8aa5c29058a512db29da7cbee8052724157 |