From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: <linux-cve-announce@vger.kernel.org>
Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
Subject: CVE-2024-47744: KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock

Description
===========

In the Linux kernel, the following vulnerability has been resolved:

KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock

Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock
on x86 due to a chain of locks and SRCU synchronizations. Translating the
below lockdep splat, CPU1 #6 will wait on CPU0 #1, CPU0 #8 will wait on
CPU2 #3, and CPU2 #7 will wait on CPU1 #4 (if there's a writer, due to the
fairness of r/w semaphores).

    CPU0                     CPU1                     CPU2
1   lock(&kvm->slots_lock);
2                                                    lock(&vcpu->mutex);
3                                                    lock(&kvm->srcu);
4                            lock(cpu_hotplug_lock);
5                            lock(kvm_lock);
6                            lock(&kvm->slots_lock);
7                                                    lock(cpu_hotplug_lock);
8   sync(&kvm->srcu);

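Illustratively, the circular dependency above can be detected mechanically, in the spirit of what lockdep does. The sketch below is our own simplification, not kernel code: it records "lock b was acquired while lock a was held" edges from the splat's dependency chain, then reports whether a new acquisition would close a cycle.

```c
/* Illustrative sketch only (not kernel code): a minimal lockdep-style
 * dependency-graph check over the four locks in the splat. The enum
 * names and adjacency matrix are our own simplification. */

enum { KVM_LOCK, CPU_HOTPLUG_LOCK, KVM_SRCU, SLOTS_LOCK, NLOCKS };

/* dep[a][b] = 1 means lock b has been acquired while holding lock a */
static int dep[NLOCKS][NLOCKS];

static int reaches(int from, int to, int *seen)
{
	int next;

	if (from == to)
		return 1;
	seen[from] = 1;
	for (next = 0; next < NLOCKS; next++)
		if (dep[from][next] && !seen[next] &&
		    reaches(next, to, seen))
			return 1;
	return 0;
}

/* Would acquiring @newlock while holding @held close a cycle? */
static int would_deadlock(int held, int newlock)
{
	int seen[NLOCKS] = { 0 };

	return reaches(newlock, held, seen);
}
```

Feeding it the existing chain (slots_lock -> srcu -> cpu_hotplug_lock -> kvm_lock) and then asking about the new acquisition in the splat (slots_lock taken while holding kvm_lock) reproduces lockdep's verdict.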
Note, there are likely more potential deadlocks in KVM x86, e.g. the same
pattern of taking cpu_hotplug_lock outside of kvm_lock likely exists with
__kvmclock_cpufreq_notifier():

  cpuhp_cpufreq_online()
  |
  -> cpufreq_online()
     |
     -> cpufreq_gov_performance_limits()
        |
        -> __cpufreq_driver_target()
           |
           -> __target_index()
              |
              -> cpufreq_freq_transition_begin()
                 |
                 -> cpufreq_notify_transition()
                    |
                    -> ... __kvmclock_cpufreq_notifier()

But, actually triggering such deadlocks is beyond rare due to the
combination of dependencies and timings involved. E.g. the cpufreq
notifier is only used on older CPUs without a constant TSC, mucking with
the NX hugepage mitigation while VMs are running is very uncommon, and
doing so while also onlining/offlining a CPU (necessary to generate
contention on cpu_hotplug_lock) would be even more unusual.

The most robust solution to the general cpu_hotplug_lock issue is likely
to switch vm_list to be an RCU-protected list, e.g. so that x86's cpufreq
notifier doesn't have to take kvm_lock. For now, settle for fixing the
most blatant deadlock, as switching to an RCU-protected list is a much
more involved change, but add a comment in locking.rst to call out that
care needs to be taken when holding kvm_lock and walking vm_list.

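The locking change can be sketched in simplified userspace form. The code below uses pthread mutexes rather than kernel mutexes and is not the actual kernel code; the names mirror the kernel's, but the bodies are reduced to the essential pattern: kvm_usage_count moves out from under kvm_lock to a dedicated lock, so the hardware enable/disable paths no longer participate in the kvm_lock -> cpu_hotplug_lock -> SRCU chain.

```c
/* Simplified userspace sketch of the fix's locking pattern (not the
 * actual kernel code). kvm_lock continues to guard vm_list, while a
 * new, dedicated kvm_usage_lock guards only kvm_usage_count. */
#include <pthread.h>

/* new: guards kvm_usage_count; kvm_lock (not shown) still guards vm_list */
static pthread_mutex_t kvm_usage_lock = PTHREAD_MUTEX_INITIALIZER;
static int kvm_usage_count;

static int hardware_enable_all(void)
{
	int count;

	/* Before the fix this path took kvm_lock, which is also held
	 * around cpu_hotplug_lock and SRCU-synchronizing paths. */
	pthread_mutex_lock(&kvm_usage_lock);
	count = ++kvm_usage_count;
	pthread_mutex_unlock(&kvm_usage_lock);
	return count;
}

static int hardware_disable_all(void)
{
	int count;

	pthread_mutex_lock(&kvm_usage_lock);
	count = --kvm_usage_count;
	pthread_mutex_unlock(&kvm_usage_lock);
	return count;
}
```

Because kvm_usage_lock is a leaf lock that nests inside nothing else, it cannot appear in any circular chain involving kvm_lock, cpu_hotplug_lock, or kvm->srcu.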
======================================================
WARNING: possible circular locking dependency detected
6.10.0-smp--c257535a0c9d-pip #330 Tainted: G S O
------------------------------------------------------
tee/35048 is trying to acquire lock:
ff6a80eced71e0a8 (&kvm->slots_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x179/0x1e0 [kvm]

but task is already holding lock:
ffffffffc07abb08 (kvm_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x14a/0x1e0 [kvm]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (kvm_lock){+.+.}-{3:3}:
       __mutex_lock+0x6a/0xb40
       mutex_lock_nested+0x1f/0x30
       kvm_dev_ioctl+0x4fb/0xe50 [kvm]
       __se_sys_ioctl+0x7b/0xd0
       __x64_sys_ioctl+0x21/0x30
       x64_sys_call+0x15d0/0x2e60
       do_syscall_64+0x83/0x160
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #2 (cpu_hotplug_lock){++++}-{0:0}:
       cpus_read_lock+0x2e/0xb0
       static_key_slow_inc+0x16/0x30
       kvm_lapic_set_base+0x6a/0x1c0 [kvm]
       kvm_set_apic_base+0x8f/0xe0 [kvm]
       kvm_set_msr_common+0x9ae/0xf80 [kvm]
       vmx_set_msr+0xa54/0xbe0 [kvm_intel]
       __kvm_set_msr+0xb6/0x1a0 [kvm]
       kvm_arch_vcpu_ioctl+0xeca/0x10c0 [kvm]
       kvm_vcpu_ioctl+0x485/0x5b0 [kvm]
       __se_sys_ioctl+0x7b/0xd0
       __x64_sys_ioctl+0x21/0x30
       x64_sys_call+0x15d0/0x2e60
       do_syscall_64+0x83/0x160
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #1 (&kvm->srcu){.+.+}-{0:0}:
       __synchronize_srcu+0x44/0x1a0
       synchronize_srcu_expedited+0x21/0x30
       kvm_swap_active_memslots+0x110/0x1c0 [kvm]
       kvm_set_memslot+0x360/0x620 [kvm]
       __kvm_set_memory_region+0x27b/0x300 [kvm]
       kvm_vm_ioctl_set_memory_region+0x43/0x60 [kvm]
       kvm_vm_ioctl+0x295/0x650 [kvm]
       __se_sys_ioctl+0x7b/0xd0
       __x64_sys_ioctl+0x21/0x30
       x64_sys_call+0x15d0/0x2e60
       do_syscall_64+0x83/0x160
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #0 (&kvm->slots_lock){+.+.}-{3:3}:
       __lock_acquire+0x15ef/0x2e30
       lock_acquire+0xe0/0x260
       __mutex_lock+0x6a/0xb40
       mutex_lock_nested+0x1f/0x30
       set_nx_huge_pages+0x179/0x1e0 [kvm]
       param_attr_store+0x93/0x100
       module_attr_store+0x22/0x40
       sysfs_kf_write+0x81/0xb0
       kernfs_fop_write_iter+0x133/0x1d0
       vfs_write+0x28d/0x380
       ksys_write+0x70/0xe0
       __x64_sys_write+0x1f/0x30
       x64_sys_call+0x281b/0x2e60
       do_syscall_64+0x83/0x160
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
The Linux kernel CVE team has assigned CVE-2024-47744 to this issue.


Affected and fixed versions
===========================

Issue introduced in 6.3 with commit 0bf50497f03b3d892c470c7d1a10a3e9c3c95821 and fixed in 6.6.54 with commit 4777225ec89f52bb9ca16a33cfb44c189f1b7b47
Issue introduced in 6.3 with commit 0bf50497f03b3d892c470c7d1a10a3e9c3c95821 and fixed in 6.10.13 with commit a2764afce521fd9fd7a5ff6ed52ac2095873128a
Issue introduced in 6.3 with commit 0bf50497f03b3d892c470c7d1a10a3e9c3c95821 and fixed in 6.11.2 with commit 760a196e6dcb29580e468b44b5400171dae184d8
Issue introduced in 6.3 with commit 0bf50497f03b3d892c470c7d1a10a3e9c3c95821 and fixed in 6.12 with commit 44d17459626052a2390457e550a12cb973506b2f

Please see https://www.kernel.org for a full list of currently supported
kernel versions by the kernel community.

Unaffected versions might change over time as fixes are backported to
older supported kernel versions. The official CVE entry at
https://cve.org/CVERecord/?id=CVE-2024-47744
will be updated if fixes are backported; please check that for the most
up-to-date information about this issue.


| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| Documentation/virt/kvm/locking.rst |
| virt/kvm/kvm_main.c |
| |
| |
Mitigation
==========

The Linux kernel CVE team recommends that you update to the latest
stable kernel version for this, and many other bugfixes. Individual
changes are never tested alone, but rather are part of a larger kernel
release. Cherry-picking individual commits is not recommended or
supported by the Linux kernel community at all. If, however, updating to
the latest release is impossible, the individual changes to resolve this
issue can be found at these commits:
	https://git.kernel.org/stable/c/4777225ec89f52bb9ca16a33cfb44c189f1b7b47
	https://git.kernel.org/stable/c/a2764afce521fd9fd7a5ff6ed52ac2095873128a
	https://git.kernel.org/stable/c/760a196e6dcb29580e468b44b5400171dae184d8
	https://git.kernel.org/stable/c/44d17459626052a2390457e550a12cb973506b2f
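As a hypothetical helper, not part of this advisory, the affected/fixed table above can be encoded as a simple version check. The `is_fixed()` function and its structure are our own; it is only meaningful for kernels in the affected range (6.3 onward).

```c
/* Hypothetical helper (not from the advisory): returns 1 if a kernel
 * release string is at or above the fixed release for its stable
 * series, per the table in this advisory. Only meaningful for kernels
 * in the affected range (6.3 onward). */
#include <stdio.h>

struct fixed_release { int maj, min, patch; };

/* Fixed releases from the advisory: 6.6.54, 6.10.13, 6.11.2, 6.12 */
static const struct fixed_release fixes[] = {
	{ 6, 6, 54 }, { 6, 10, 13 }, { 6, 11, 2 }, { 6, 12, 0 },
};

static int is_fixed(const char *release)
{
	int maj = 0, min = 0, patch = 0;
	unsigned int i;

	if (sscanf(release, "%d.%d.%d", &maj, &min, &patch) < 2)
		return 0;

	for (i = 0; i < sizeof(fixes) / sizeof(fixes[0]); i++)
		if (maj == fixes[i].maj && min == fixes[i].min)
			return patch >= fixes[i].patch;

	/* Series newer than 6.12 carry the mainline fix; series from
	 * 6.3 up to those listed do not. */
	return maj > 6 || (maj == 6 && min > 12);
}
```

Compare the output of `uname -r` (major.minor.patch prefix) against this table; as always, the supported path is updating to the latest stable release rather than reasoning about individual versions.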