| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-40906: net/mlx5: Always stop health timer during driver removal |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| net/mlx5: Always stop health timer during driver removal |
| |
| Currently, if teardown_hca fails to execute during driver removal, mlx5 |
| does not stop the health timer. Afterwards, mlx5 continue with driver |
| teardown. This may lead to a UAF bug, which results in page fault |
| Oops[1], since the health timer invokes after resources were freed. |
| |
| Hence, stop the health monitor even if teardown_hca fails. |
| |
| [1] |
| mlx5_core 0000:18:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) |
| mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) |
| mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) |
| mlx5_core 0000:18:00.0: E-Switch: cleanup |
| mlx5_core 0000:18:00.0: wait_func:1155:(pid 1967079): TEARDOWN_HCA(0x103) timeout. Will cause a leak of a command resource |
| mlx5_core 0000:18:00.0: mlx5_function_close:1288:(pid 1967079): tear_down_hca failed, skip cleanup |
| BUG: unable to handle page fault for address: ffffa26487064230 |
| PGD 100c00067 P4D 100c00067 PUD 100e5a067 PMD 105ed7067 PTE 0 |
| Oops: 0000 [#1] PREEMPT SMP PTI |
| CPU: 0 PID: 0 Comm: swapper/0 Tainted: G OE ------- --- 6.7.0-68.fc38.x86_64 #1 |
| Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020 |
| RIP: 0010:ioread32be+0x34/0x60 |
| RSP: 0018:ffffa26480003e58 EFLAGS: 00010292 |
| RAX: ffffa26487064200 RBX: ffff9042d08161a0 RCX: ffff904c108222c0 |
| RDX: 000000010bbf1b80 RSI: ffffffffc055ddb0 RDI: ffffa26487064230 |
| RBP: ffff9042d08161a0 R08: 0000000000000022 R09: ffff904c108222e8 |
| R10: 0000000000000004 R11: 0000000000000441 R12: ffffffffc055ddb0 |
| R13: ffffa26487064200 R14: ffffa26480003f00 R15: ffff904c108222c0 |
| FS: 0000000000000000(0000) GS:ffff904c10800000(0000) knlGS:0000000000000000 |
| CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 |
| CR2: ffffa26487064230 CR3: 00000002c4420006 CR4: 00000000007706f0 |
| DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 |
| DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 |
| PKRU: 55555554 |
| Call Trace: |
| <IRQ> |
| ? __die+0x23/0x70 |
| ? page_fault_oops+0x171/0x4e0 |
| ? exc_page_fault+0x175/0x180 |
| ? asm_exc_page_fault+0x26/0x30 |
| ? __pfx_poll_health+0x10/0x10 [mlx5_core] |
| ? __pfx_poll_health+0x10/0x10 [mlx5_core] |
| ? ioread32be+0x34/0x60 |
| mlx5_health_check_fatal_sensors+0x20/0x100 [mlx5_core] |
| ? __pfx_poll_health+0x10/0x10 [mlx5_core] |
| poll_health+0x42/0x230 [mlx5_core] |
| ? __next_timer_interrupt+0xbc/0x110 |
| ? __pfx_poll_health+0x10/0x10 [mlx5_core] |
| call_timer_fn+0x21/0x130 |
| ? __pfx_poll_health+0x10/0x10 [mlx5_core] |
| __run_timers+0x222/0x2c0 |
| run_timer_softirq+0x1d/0x40 |
| __do_softirq+0xc9/0x2c8 |
| __irq_exit_rcu+0xa6/0xc0 |
| sysvec_apic_timer_interrupt+0x72/0x90 |
| </IRQ> |
| <TASK> |
| asm_sysvec_apic_timer_interrupt+0x1a/0x20 |
| RIP: 0010:cpuidle_enter_state+0xcc/0x440 |
| ? cpuidle_enter_state+0xbd/0x440 |
| cpuidle_enter+0x2d/0x40 |
| do_idle+0x20d/0x270 |
| cpu_startup_entry+0x2a/0x30 |
| rest_init+0xd0/0xd0 |
| arch_call_rest_init+0xe/0x30 |
| start_kernel+0x709/0xa90 |
| x86_64_start_reservations+0x18/0x30 |
| x86_64_start_kernel+0x96/0xa0 |
| secondary_startup_64_no_verify+0x18f/0x19b |
| ---[ end trace 0000000000000000 ]--- |
| |
| The Linux kernel CVE team has assigned CVE-2024-40906 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 6.1 with commit 9b98d395b85dd042fe83fb696b1ac02e6c93a520 and fixed in 6.1.95 with commit e7d4485d47839f4d1284592ae242c4e65b2810a9 |
| Issue introduced in 6.1 with commit 9b98d395b85dd042fe83fb696b1ac02e6c93a520 and fixed in 6.6.35 with commit 6ccada6ffb42e0ac75e3db06d41baf5a7f483f8a |
| Issue introduced in 6.1 with commit 9b98d395b85dd042fe83fb696b1ac02e6c93a520 and fixed in 6.9.6 with commit e6777ae0bf6fd5bc626bb051c8c93e3c8198a3f8 |
| Issue introduced in 6.1 with commit 9b98d395b85dd042fe83fb696b1ac02e6c93a520 and fixed in 6.10 with commit c8b3f38d2dae0397944814d691a419c451f9906f |
| |
| Please see https://www.kernel.org for a full list of currently supported |
| kernel versions by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-40906 |
| will be updated if fixes are backported, please check that for the most |
| up to date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| drivers/net/ethernet/mellanox/mlx5/core/main.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/e7d4485d47839f4d1284592ae242c4e65b2810a9 |
| https://git.kernel.org/stable/c/6ccada6ffb42e0ac75e3db06d41baf5a7f483f8a |
| https://git.kernel.org/stable/c/e6777ae0bf6fd5bc626bb051c8c93e3c8198a3f8 |
| https://git.kernel.org/stable/c/c8b3f38d2dae0397944814d691a419c451f9906f |