| From bippy-7c5fe7eed585 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-46765: ice: protect XDP configuration with a mutex |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| ice: protect XDP configuration with a mutex |
| |
| The main threat to data consistency in ice_xdp() is a possible asynchronous |
| PF reset. It can be triggered by a user or by the TX timeout handler. |
| |
| XDP setup and PF reset code access the same resources in the following |
| sections: |
| * ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked |
| * ice_vsi_rebuild() for the PF VSI - not protected |
| * ice_vsi_open() - already rtnl-locked |
| |
| With unfortunate timing, such accesses can result in a crash such as the |
| one below: |
| |
| [ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14 |
| [ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18 |
| [Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms |
| [ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001 |
| [ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14 |
| [ +0.394718] ice 0000:b1:00.0: PTP reset successful |
| [ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098 |
| [ +0.000045] #PF: supervisor read access in kernel mode |
| [ +0.000023] #PF: error_code(0x0000) - not-present page |
| [ +0.000023] PGD 0 P4D 0 |
| [ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI |
| [ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1 |
| [ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021 |
| [ +0.000036] Workqueue: ice ice_service_task [ice] |
| [ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice] |
| [...] |
| [ +0.000013] Call Trace: |
| [ +0.000016] <TASK> |
| [ +0.000014] ? __die+0x1f/0x70 |
| [ +0.000029] ? page_fault_oops+0x171/0x4f0 |
| [ +0.000029] ? schedule+0x3b/0xd0 |
| [ +0.000027] ? exc_page_fault+0x7b/0x180 |
| [ +0.000022] ? asm_exc_page_fault+0x22/0x30 |
| [ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice] |
| [ +0.000194] ice_free_tx_ring+0xe/0x60 [ice] |
| [ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice] |
| [ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice] |
| [ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice] |
| [ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice] |
| [ +0.000145] ice_rebuild+0x18c/0x840 [ice] |
| [ +0.000145] ? delay_tsc+0x4a/0xc0 |
| [ +0.000022] ? delay_tsc+0x92/0xc0 |
| [ +0.000020] ice_do_reset+0x140/0x180 [ice] |
| [ +0.000886] ice_service_task+0x404/0x1030 [ice] |
| [ +0.000824] process_one_work+0x171/0x340 |
| [ +0.000685] worker_thread+0x277/0x3a0 |
| [ +0.000675] ? preempt_count_add+0x6a/0xa0 |
| [ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50 |
| [ +0.000679] ? __pfx_worker_thread+0x10/0x10 |
| [ +0.000653] kthread+0xf0/0x120 |
| [ +0.000635] ? __pfx_kthread+0x10/0x10 |
| [ +0.000616] ret_from_fork+0x2d/0x50 |
| [ +0.000612] ? __pfx_kthread+0x10/0x10 |
| [ +0.000604] ret_from_fork_asm+0x1b/0x30 |
| [ +0.000604] </TASK> |
| |
| The previous way of handling this through returning -EBUSY is not viable, |
| particularly when destroying an AF_XDP socket, because the kernel proceeds |
| with removal anyway. |
| |
| There is plenty of code between those calls and there is no need to create |
| a large critical section that covers all of them, just as there is no need |
| to protect ice_vsi_rebuild() with rtnl_lock(). |
| |
| Add the xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp(). |
| |
| Leaving unprotected sections in between would result in two states that |
| have to be considered: |
| 1. when the VSI is closed, but not yet rebuilt |
| 2. when the VSI is already rebuilt, but not yet open |
| |
| The latter case is already handled through the !netif_running() case; |
| only the flag checking needs a small adjustment. The former is not as |
| trivial: between ice_vsi_close() and ice_vsi_rebuild(), a lot of |
| hardware interaction happens, which can make adding or deleting rings exit |
| with an error. Luckily, a VSI rebuild is pending and can apply the new |
| configuration for us in a managed fashion. |
| |
| Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to |
| indicate that ice_xdp() can just hot-swap the program. |
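
The locking scheme described above can be sketched as a small userspace
model. This is only an illustration under stated assumptions, not the
driver's code: struct model_vsi, the model_* functions, MODEL_NUM_XDP_RINGS
and the use of a pthread mutex are all hypothetical stand-ins. Only the
names xdp_state_lock and ICE_VSI_REBUILD_PENDING come from the description;
where exactly the driver sets the flag is not stated there, so the model
simply sets it under the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NUM_XDP_RINGS 4 /* arbitrary ring count, for the model only */

/* Hypothetical userspace stand-in for the VSI; not the driver's struct. */
struct model_vsi {
	pthread_mutex_t xdp_state_lock; /* models the xdp_state_lock mutex */
	bool rebuild_pending;           /* models ICE_VSI_REBUILD_PENDING */
	const void *xdp_prog;           /* attached program, or NULL */
	int xdp_rings;                  /* number of XDP rings that exist */
};

enum model_xdp_path { MODEL_HOT_SWAP, MODEL_FULL_RECONFIG };

static void model_vsi_init(struct model_vsi *vsi)
{
	pthread_mutex_init(&vsi->xdp_state_lock, NULL);
	vsi->rebuild_pending = false;
	vsi->xdp_prog = NULL;
	vsi->xdp_rings = 0;
}

/*
 * Models the reset path after the VSI has been closed: a rebuild is owed.
 * Assumption: the model takes the lock here purely for simplicity.
 */
static void model_prepare_for_reset(struct model_vsi *vsi)
{
	pthread_mutex_lock(&vsi->xdp_state_lock);
	vsi->rebuild_pending = true;
	pthread_mutex_unlock(&vsi->xdp_state_lock);
}

/* Models the rebuild: rings are recreated to match the attached program. */
static void model_vsi_rebuild(struct model_vsi *vsi)
{
	pthread_mutex_lock(&vsi->xdp_state_lock);
	vsi->rebuild_pending = false;
	vsi->xdp_rings = vsi->xdp_prog ? MODEL_NUM_XDP_RINGS : 0;
	pthread_mutex_unlock(&vsi->xdp_state_lock);
}

/*
 * Models XDP setup. While a rebuild is pending, only the program pointer
 * is swapped and the rings are left for the rebuild to sort out.
 */
static enum model_xdp_path model_xdp_setup(struct model_vsi *vsi,
					   const void *prog)
{
	enum model_xdp_path path;

	assert(vsi != NULL);
	pthread_mutex_lock(&vsi->xdp_state_lock);
	vsi->xdp_prog = prog;
	if (vsi->rebuild_pending) {
		path = MODEL_HOT_SWAP;
	} else {
		vsi->xdp_rings = prog ? MODEL_NUM_XDP_RINGS : 0;
		path = MODEL_FULL_RECONFIG;
	}
	pthread_mutex_unlock(&vsi->xdp_state_lock);
	return path;
}
```

In this model, detaching a program between model_prepare_for_reset() and
model_vsi_rebuild() leaves the rings untouched; they are torn down only when
the rebuild runs, which mirrors the "managed fashion" mentioned above.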
| |
| Also, as the ice_vsi_rebuild() flow is touched in this patch, make it more |
| consistent by deconfiguring the VSI when coalesce allocation fails. |
| |
| The Linux kernel CVE team has assigned CVE-2024-46765 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 5.5 with commit efc2214b6047b6f5b4ca53151eba62521b9452d6 and fixed in 6.6.51 with commit 2f057db2fb29bc209c103050647562e60554d3d3 |
| Issue introduced in 5.5 with commit efc2214b6047b6f5b4ca53151eba62521b9452d6 and fixed in 6.10.10 with commit 391f7dae3d836891fc6cfbde38add2d0e10c6b7f |
| Issue introduced in 5.5 with commit efc2214b6047b6f5b4ca53151eba62521b9452d6 and fixed in 6.11 with commit 2504b8405768a57a71e660dbfd5abd59f679a03f |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-46765 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| drivers/net/ethernet/intel/ice/ice.h |
| drivers/net/ethernet/intel/ice/ice_lib.c |
| drivers/net/ethernet/intel/ice/ice_main.c |
| drivers/net/ethernet/intel/ice/ice_xsk.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/2f057db2fb29bc209c103050647562e60554d3d3 |
| https://git.kernel.org/stable/c/391f7dae3d836891fc6cfbde38add2d0e10c6b7f |
| https://git.kernel.org/stable/c/2504b8405768a57a71e660dbfd5abd59f679a03f |