| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-49998: net: dsa: improve shutdown sequence |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| net: dsa: improve shutdown sequence |
| |
| Alexander Sverdlin presents two problems during shutdown with the |
| lan9303 driver. One is specific to lan9303 and the other just happens |
| to reproduce there. |
| |
| The first problem is that lan9303 is unique among DSA drivers in that it |
| calls dev_get_drvdata() at "arbitrary runtime" (not probe, not shutdown, |
| not remove): |
| |
| phy_state_machine() |
| -> ... |
| -> dsa_user_phy_read() |
| -> ds->ops->phy_read() |
| -> lan9303_phy_read() |
| -> chip->ops->phy_read() |
| -> lan9303_mdio_phy_read() |
| -> dev_get_drvdata() |
| |
| But we never stop the phy_state_machine(), so it may continue to run |
| after dsa_switch_shutdown(). Our common pattern in all DSA drivers is |
| to set drvdata to NULL to suppress the remove() method that may come |
| afterwards. But in this case it will result in a NULL pointer |
| dereference (NPD). |
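| |
| The hazard can be sketched as a minimal single-threaded userspace |
| model. This is not kernel code: every model_* name below is |
| hypothetical, and the model returns -1 at the point where the real |
| call chain would dereference the NULL drvdata. |

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's private data and its device. */
struct model_chip { int last_reg; };
struct model_device {
    void *drvdata;     /* what dev_get_drvdata() would return */
    bool phy_polling;  /* true while the PHY state machine is still scheduled */
};

static void *model_get_drvdata(const struct model_device *dev)
{
    return dev->drvdata;
}

/* Models the end of the call chain: fetch drvdata, then use it. */
static int model_mdio_phy_read(struct model_device *dev, int reg)
{
    struct model_chip *chip = model_get_drvdata(dev);

    if (chip == NULL)
        return -1; /* the real code has no such check and would oops here */
    chip->last_reg = reg;
    return 0;
}

/* Models one run of the periodic PHY state machine. */
static int model_phy_state_machine_tick(struct model_device *dev)
{
    assert(dev != NULL);
    if (!dev->phy_polling)
        return 0; /* nothing scheduled, nothing read */
    return model_mdio_phy_read(dev, 1);
}

/* Models the common shutdown pattern: clear drvdata, stop nothing. */
static void model_shutdown_clear_only(struct model_device *dev)
{
    dev->drvdata = NULL;
}
```

| With phy_polling left true, a tick after model_shutdown_clear_only() |
| returns -1, which marks the spot where the kernel would crash. |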
| |
| The second problem is that the way in which we set |
| dp->conduit->dsa_ptr = NULL; is concurrent with receive packet |
| processing. dsa_switch_rcv() checks once whether dev->dsa_ptr is NULL, |
| but afterwards, rather than continuing to use that non-NULL value, |
| dev->dsa_ptr is dereferenced again and again without NULL checks: |
| dsa_conduit_find_user() and many other places. In between dereferences, |
| there is no locking to ensure that what was valid once continues to be |
| valid. |
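| |
| To make the "check once, dereference again" shape concrete, here is a |
| small deterministic userspace sketch. All model_* names are |
| hypothetical, and the "between" hook merely simulates a concurrent |
| writer running between two reads of the field. Note that using a |
| snapshot, as in the second function, is only an illustration: the |
| fix described in this message quiesces the conduit instead. |

```c
#include <assert.h>
#include <stddef.h>

struct model_port { int id; };
struct model_netdev { struct model_port *dsa_ptr; };

/* Hook that simulates something running between two reads of dsa_ptr. */
typedef void (*model_hook)(struct model_netdev *dev);

/* Simulated concurrent shutdown: clears the pointer. */
static void model_teardown(struct model_netdev *dev)
{
    dev->dsa_ptr = NULL;
}

/* Checks the field once, then reads it again for the actual use. */
static int model_rcv_reread(struct model_netdev *dev, model_hook between)
{
    assert(dev != NULL);
    if (dev->dsa_ptr == NULL)
        return -1; /* packet dropped, this part is safe */
    if (between)
        between(dev);
    if (dev->dsa_ptr == NULL)
        return -2; /* the kernel would dereference NULL at this point */
    return dev->dsa_ptr->id;
}

/* Reads the field once and keeps using that value. */
static int model_rcv_snapshot(struct model_netdev *dev, model_hook between)
{
    struct model_port *cpu_dp;

    assert(dev != NULL);
    cpu_dp = dev->dsa_ptr;
    if (cpu_dp == NULL)
        return -1;
    if (between)
        between(dev);
    return cpu_dp->id; /* unaffected by the later clearing of the field */
}
```

| In the model the snapshot survives the teardown hook, but in the |
| kernel the pointed-to object's lifetime would still need protecting. |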
| |
| Both problems have the common aspect that closing the conduit interface |
| solves them. |
| |
| In the first case, dev_close(conduit) triggers the NETDEV_GOING_DOWN |
| event in dsa_user_netdevice_event() which closes user ports as well. |
| dsa_port_disable_rt() calls phylink_stop(), which synchronously stops |
| the phylink state machine, and ds->ops->phy_read() will thus no longer |
| call into the driver after this point. |
| |
| In the second case, dev_close(conduit) should do this, as per |
| Documentation/networking/driver.rst: |
| |
| | Quiescence |
| | ---------- |
| | |
| | After the ndo_stop routine has been called, the hardware must |
| | not receive or transmit any data. All in flight packets must |
| | be aborted. If necessary, poll or wait for completion of |
| | any reset commands. |
| |
| So it should be sufficient to ensure that later, when we zeroize |
| conduit->dsa_ptr, there will be no concurrent dsa_switch_rcv() call |
| on this conduit. |
| |
| The addition of the netif_device_detach() function is to ensure that |
| ioctls, rtnetlinks and ethtool requests on the user ports no longer |
| propagate down to the driver - we're no longer prepared to handle them. |
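| |
| The ordering argued for above can be modelled as a tiny userspace |
| state machine. It is a sketch under stated assumptions, not the |
| patch itself: the struct, its flags and all model_* functions are |
| hypothetical, and the order of steps is taken from this description. |

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state that matters during shutdown. */
struct model_switch {
    bool conduit_up;    /* conduit can still feed the receive path */
    bool user_up;       /* user port open, PHY state machine running */
    bool user_present;  /* user port still accepts ioctl/ethtool requests */
    void *dsa_ptr;      /* models conduit->dsa_ptr */
    void *drvdata;      /* models the switch driver's drvdata */
};

/* Models dev_close(conduit): user ports go down together with it. */
static void model_close_conduit(struct model_switch *sw)
{
    sw->user_up = false;
    sw->conduit_up = false;
}

/* Models netif_device_detach() on the user port. */
static void model_detach_user(struct model_switch *sw)
{
    sw->user_present = false;
}

/* Quiesce first, clear the pointers last. */
static void model_shutdown(struct model_switch *sw)
{
    assert(sw != NULL);
    model_close_conduit(sw);
    model_detach_user(sw);
    assert(!sw->conduit_up && !sw->user_up && !sw->user_present);
    sw->dsa_ptr = NULL;
    sw->drvdata = NULL;
}

/* True if some still-active path could reach a cleared pointer. */
static bool model_can_hit_null(const struct model_switch *sw)
{
    bool rcv_hazard = sw->conduit_up && sw->dsa_ptr == NULL;
    bool phy_hazard = sw->user_up && sw->drvdata == NULL;

    return rcv_hazard || phy_hazard;
}
```

| Clearing the pointers while conduit_up or user_up is still true makes |
| model_can_hit_null() return true; after model_shutdown() it does not. |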
| |
| The race condition actually did not exist when commit 0650bf52b31f |
| ("net: dsa: be compatible with masters which unregister on shutdown") |
| first introduced dsa_switch_shutdown(). It was created later, when we |
| stopped unregistering the user interfaces from a bad spot, and we just |
| replaced that sequence with a racy zeroization of conduit->dsa_ptr |
| (one which doesn't ensure that the interfaces aren't up). |
| |
| The Linux kernel CVE team has assigned CVE-2024-49998 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 5.15.155 with commit ff45899e732e57088985e3a497b1d9100571c0f5 and fixed in 5.15.176 with commit 87bd909a7014e32790e8c759d5b7694a95778ca5 |
| Issue introduced in 5.17 with commit ee534378f00561207656663d93907583958339ae and fixed in 6.10.14 with commit ab5d3420a1120950703dbdc33698b28a6ebc3d23 |
| Issue introduced in 5.17 with commit ee534378f00561207656663d93907583958339ae and fixed in 6.11.3 with commit b4a65d479213fe84ecb14e328271251eebe69492 |
| Issue introduced in 5.17 with commit ee534378f00561207656663d93907583958339ae and fixed in 6.12 with commit 6c24a03a61a245fe34d47582898331fa034b6ccd |
| Issue introduced in 5.16.10 with commit 89b60402d43cdab4387dbbf24afebda5cf092ae7 |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-49998 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| net/dsa/dsa.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/87bd909a7014e32790e8c759d5b7694a95778ca5 |
| https://git.kernel.org/stable/c/ab5d3420a1120950703dbdc33698b28a6ebc3d23 |
| https://git.kernel.org/stable/c/b4a65d479213fe84ecb14e328271251eebe69492 |
| https://git.kernel.org/stable/c/6c24a03a61a245fe34d47582898331fa034b6ccd |