cve/published/2024/CVE-2024-46765.mbox - pub/scm/linux/security/vulns - Git at Google

 From bippy-7c5fe7eed585 Mon Sep 17 00:00:00 2001
 From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 To: <linux-cve-announce@vger.kernel.org>
 Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
 Subject: CVE-2024-46765: ice: protect XDP configuration with a mutex

 Description
 ===========

 In the Linux kernel, the following vulnerability has been resolved:

 ice: protect XDP configuration with a mutex

 The main threat to data consistency in ice_xdp() is a possible asynchronous
 PF reset. It can be triggered by a user or by TX timeout handler.

 XDP setup and PF reset code access the same resources in the following
 sections:
 * ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked
 * ice_vsi_rebuild() for the PF VSI - not protected
 * ice_vsi_open() - already rtnl-locked

 With an unfortunate timing, such accesses can result in a crash such as the
 one below:

 [ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14
 [ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18
 [Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms
 [ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001
 [ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14
 [ +0.394718] ice 0000:b1:00.0: PTP reset successful
 [ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098
 [ +0.000045] #PF: supervisor read access in kernel mode
 [ +0.000023] #PF: error_code(0x0000) - not-present page
 [ +0.000023] PGD 0 P4D 0
 [ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI
 [ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1
 [ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
 [ +0.000036] Workqueue: ice ice_service_task [ice]
 [ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice]
 [...]
 [ +0.000013] Call Trace:
 [ +0.000016] <TASK>
 [ +0.000014] ? __die+0x1f/0x70
 [ +0.000029] ? page_fault_oops+0x171/0x4f0
 [ +0.000029] ? schedule+0x3b/0xd0
 [ +0.000027] ? exc_page_fault+0x7b/0x180
 [ +0.000022] ? asm_exc_page_fault+0x22/0x30
 [ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice]
 [ +0.000194] ice_free_tx_ring+0xe/0x60 [ice]
 [ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice]
 [ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice]
 [ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice]
 [ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
 [ +0.000145] ice_rebuild+0x18c/0x840 [ice]
 [ +0.000145] ? delay_tsc+0x4a/0xc0
 [ +0.000022] ? delay_tsc+0x92/0xc0
 [ +0.000020] ice_do_reset+0x140/0x180 [ice]
 [ +0.000886] ice_service_task+0x404/0x1030 [ice]
 [ +0.000824] process_one_work+0x171/0x340
 [ +0.000685] worker_thread+0x277/0x3a0
 [ +0.000675] ? preempt_count_add+0x6a/0xa0
 [ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50
 [ +0.000679] ? __pfx_worker_thread+0x10/0x10
 [ +0.000653] kthread+0xf0/0x120
 [ +0.000635] ? __pfx_kthread+0x10/0x10
 [ +0.000616] ret_from_fork+0x2d/0x50
 [ +0.000612] ? __pfx_kthread+0x10/0x10
 [ +0.000604] ret_from_fork_asm+0x1b/0x30
 [ +0.000604] </TASK>

 The previous way of handling this through returning -EBUSY is not viable,
 particularly when destroying AF_XDP socket, because the kernel proceeds
 with removal anyway.

 There is plenty of code between those calls and there is no need to create
 a large critical section that covers all of them, same as there is no need
 to protect ice_vsi_rebuild() with rtnl_lock().

 Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().

 Leaving unprotected sections in between would result in two states that
 have to be considered:
 1. when the VSI is closed, but not yet rebuild
 2. when VSI is already rebuild, but not yet open

 The latter case is actually already handled through !netif_running() case,
 we just need to adjust flag checking a little. The former one is not as
 trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of
 hardware interaction happens, this can make adding/deleting rings exit
 with an error. Luckily, VSI rebuild is pending and can apply new
 configuration for us in a managed fashion.

 Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to
 indicate that ice_xdp() can just hot-swap the program.

 Also, as ice_vsi_rebuild() flow is touched in this patch, make it more
 consistent by deconfiguring VSI when coalesce allocation fails.

 The Linux kernel CVE team has assigned CVE-2024-46765 to this issue.


 Affected and fixed versions
 ===========================

 	Issue introduced in 5.5 with commit efc2214b6047b6f5b4ca53151eba62521b9452d6 and fixed in 6.6.51 with commit 2f057db2fb29bc209c103050647562e60554d3d3
 	Issue introduced in 5.5 with commit efc2214b6047b6f5b4ca53151eba62521b9452d6 and fixed in 6.10.10 with commit 391f7dae3d836891fc6cfbde38add2d0e10c6b7f
 	Issue introduced in 5.5 with commit efc2214b6047b6f5b4ca53151eba62521b9452d6 and fixed in 6.11 with commit 2504b8405768a57a71e660dbfd5abd59f679a03f

 Please see https://www.kernel.org for a full list of currently supported
 kernel versions by the kernel community.

 Unaffected versions might change over time as fixes are backported to
 older supported kernel versions.  The official CVE entry at
 	https://cve.org/CVERecord/?id=CVE-2024-46765
 will be updated if fixes are backported, please check that for the most
 up to date information about this issue.


 Affected files
 ==============

 The file(s) affected by this issue are:
 	drivers/net/ethernet/intel/ice/ice.h
 	drivers/net/ethernet/intel/ice/ice_lib.c
 	drivers/net/ethernet/intel/ice/ice_main.c
 	drivers/net/ethernet/intel/ice/ice_xsk.c


 Mitigation
 ==========

 The Linux kernel CVE team recommends that you update to the latest
 stable kernel version for this, and many other bugfixes.  Individual
 changes are never tested alone, but rather are part of a larger kernel
 release.  Cherry-picking individual commits is not recommended or
 supported by the Linux kernel community at all.  If however, updating to
 the latest release is impossible, the individual changes to resolve this
 issue can be found at these commits:
 	https://git.kernel.org/stable/c/2f057db2fb29bc209c103050647562e60554d3d3
 	https://git.kernel.org/stable/c/391f7dae3d836891fc6cfbde38add2d0e10c6b7f
 	https://git.kernel.org/stable/c/2504b8405768a57a71e660dbfd5abd59f679a03f
	From bippy-7c5fe7eed585 Mon Sep 17 00:00:00 2001
	From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
	To: <linux-cve-announce@vger.kernel.org>
	Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
	Subject: CVE-2024-46765: ice: protect XDP configuration with a mutex

	Description
	===========

	In the Linux kernel, the following vulnerability has been resolved:

	ice: protect XDP configuration with a mutex

	The main threat to data consistency in ice_xdp() is a possible asynchronous
	PF reset. It can be triggered by a user or by TX timeout handler.

	XDP setup and PF reset code access the same resources in the following
	sections:
	* ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked
	* ice_vsi_rebuild() for the PF VSI - not protected
	* ice_vsi_open() - already rtnl-locked

	With an unfortunate timing, such accesses can result in a crash such as the
	one below:

	[ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14
	[ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18
	[Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms
	[ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001
	[ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14
	[ +0.394718] ice 0000:b1:00.0: PTP reset successful
	[ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098
	[ +0.000045] #PF: supervisor read access in kernel mode
	[ +0.000023] #PF: error_code(0x0000) - not-present page
	[ +0.000023] PGD 0 P4D 0
	[ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI
	[ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1
	[ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
	[ +0.000036] Workqueue: ice ice_service_task [ice]
	[ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice]
	[...]
	[ +0.000013] Call Trace:
	[ +0.000016] <TASK>
	[ +0.000014] ? __die+0x1f/0x70
	[ +0.000029] ? page_fault_oops+0x171/0x4f0
	[ +0.000029] ? schedule+0x3b/0xd0
	[ +0.000027] ? exc_page_fault+0x7b/0x180
	[ +0.000022] ? asm_exc_page_fault+0x22/0x30
	[ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice]
	[ +0.000194] ice_free_tx_ring+0xe/0x60 [ice]
	[ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice]
	[ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice]
	[ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice]
	[ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
	[ +0.000145] ice_rebuild+0x18c/0x840 [ice]
	[ +0.000145] ? delay_tsc+0x4a/0xc0
	[ +0.000022] ? delay_tsc+0x92/0xc0
	[ +0.000020] ice_do_reset+0x140/0x180 [ice]
	[ +0.000886] ice_service_task+0x404/0x1030 [ice]
	[ +0.000824] process_one_work+0x171/0x340
	[ +0.000685] worker_thread+0x277/0x3a0
	[ +0.000675] ? preempt_count_add+0x6a/0xa0
	[ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50
	[ +0.000679] ? __pfx_worker_thread+0x10/0x10
	[ +0.000653] kthread+0xf0/0x120
	[ +0.000635] ? __pfx_kthread+0x10/0x10
	[ +0.000616] ret_from_fork+0x2d/0x50
	[ +0.000612] ? __pfx_kthread+0x10/0x10
	[ +0.000604] ret_from_fork_asm+0x1b/0x30
	[ +0.000604] </TASK>

	The previous way of handling this through returning -EBUSY is not viable,
	particularly when destroying AF_XDP socket, because the kernel proceeds
	with removal anyway.

	There is plenty of code between those calls and there is no need to create
	a large critical section that covers all of them, same as there is no need
	to protect ice_vsi_rebuild() with rtnl_lock().

	Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().

	Leaving unprotected sections in between would result in two states that
	have to be considered:
	1. when the VSI is closed, but not yet rebuild
	2. when VSI is already rebuild, but not yet open

	The latter case is actually already handled through !netif_running() case,
	we just need to adjust flag checking a little. The former one is not as
	trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of
	hardware interaction happens, this can make adding/deleting rings exit
	with an error. Luckily, VSI rebuild is pending and can apply new
	configuration for us in a managed fashion.

	Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to
	indicate that ice_xdp() can just hot-swap the program.

	Also, as ice_vsi_rebuild() flow is touched in this patch, make it more
	consistent by deconfiguring VSI when coalesce allocation fails.

	The Linux kernel CVE team has assigned CVE-2024-46765 to this issue.


	Affected and fixed versions
	===========================

	Issue introduced in 5.5 with commit efc2214b6047b6f5b4ca53151eba62521b9452d6 and fixed in 6.6.51 with commit 2f057db2fb29bc209c103050647562e60554d3d3
	Issue introduced in 5.5 with commit efc2214b6047b6f5b4ca53151eba62521b9452d6 and fixed in 6.10.10 with commit 391f7dae3d836891fc6cfbde38add2d0e10c6b7f
	Issue introduced in 5.5 with commit efc2214b6047b6f5b4ca53151eba62521b9452d6 and fixed in 6.11 with commit 2504b8405768a57a71e660dbfd5abd59f679a03f

	Please see https://www.kernel.org for a full list of currently supported
	kernel versions by the kernel community.

	Unaffected versions might change over time as fixes are backported to
	older supported kernel versions. The official CVE entry at
	https://cve.org/CVERecord/?id=CVE-2024-46765
	will be updated if fixes are backported, please check that for the most
	up to date information about this issue.


	Affected files
	==============

	The file(s) affected by this issue are:
	drivers/net/ethernet/intel/ice/ice.h
	drivers/net/ethernet/intel/ice/ice_lib.c
	drivers/net/ethernet/intel/ice/ice_main.c
	drivers/net/ethernet/intel/ice/ice_xsk.c


	Mitigation
	==========

	The Linux kernel CVE team recommends that you update to the latest
	stable kernel version for this, and many other bugfixes. Individual
	changes are never tested alone, but rather are part of a larger kernel
	release. Cherry-picking individual commits is not recommended or
	supported by the Linux kernel community at all. If however, updating to
	the latest release is impossible, the individual changes to resolve this
	issue can be found at these commits:
	https://git.kernel.org/stable/c/2f057db2fb29bc209c103050647562e60554d3d3
	https://git.kernel.org/stable/c/391f7dae3d836891fc6cfbde38add2d0e10c6b7f
	https://git.kernel.org/stable/c/2504b8405768a57a71e660dbfd5abd59f679a03f