| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-57974: udp: Deal with race between UDP socket address change and rehash |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| udp: Deal with race between UDP socket address change and rehash |
| |
| If a UDP socket changes its local address while it's receiving |
| datagrams, as a result of connect(), there is a period during which |
| a lookup operation might fail to find it, after the address is changed |
| but before the secondary hash (port and address) and the four-tuple |
| hash (local and remote ports and addresses) are updated. |
| |
| Secondary hash chains were introduced by commit 30fff9231fad ("udp: |
| bind() optimisation") and, as a result, a rehash operation became |
| needed to make a bound socket reachable again after a connect(). |
| |
| This operation was introduced by commit 719f835853a9 ("udp: add |
| rehash on connect()"), which, however, isn't a complete fix: the |
| socket will be found once the rehashing completes, but not while |
| it's pending. |
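
The window described above can be illustrated with a toy model. This is
only a sketch, not kernel code: the `Sock` and `Table` classes and their
methods are hypothetical names, the four-tuple hash is left out, and the
matching rule (a socket in a slot only matches if its current address
equals the slot's address) is a simplification of the kernel's scoring.

```python
ANY = "0.0.0.0"  # wildcard local address


class Sock:
    """Hypothetical stand-in for a UDP socket: local address and port only."""

    def __init__(self, addr, port):
        self.addr = addr
        self.port = port


class Table:
    """Toy secondary hash: slots keyed by (local address, local port)."""

    def __init__(self):
        self.secondary = {}

    def hash(self, sk):
        self.secondary.setdefault((sk.addr, sk.port), []).append(sk)

    def rehash(self, sk, old_addr):
        # Move the socket from its old slot to the one matching its new address.
        self.secondary[(old_addr, sk.port)].remove(sk)
        self.hash(sk)

    def _scan(self, slot_addr, port):
        # A socket only matches if its *current* address equals the slot address.
        for sk in self.secondary.get((slot_addr, port), []):
            if sk.addr == slot_addr:
                return sk
        return None

    def lookup(self, daddr, dport):
        # Try the exact-address slot first, then the wildcard slot.
        return self._scan(daddr, dport) or self._scan(ANY, dport)


table = Table()
server = Sock(ANY, 1337)
table.hash(server)

before = table.lookup("127.0.0.1", 1337)  # found through the wildcard slot
server.addr = "127.0.0.1"                 # connect() changed the local address
during = table.lookup("127.0.0.1", 1337)  # rehash still pending: no match
table.rehash(server, ANY)
after = table.lookup("127.0.0.1", 1337)   # found again in the new slot
```

In this model, `during` is `None` even though the socket exists the whole
time, which corresponds to the failed lookup between the address change
and the rehash.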
| |
| This is noticeable with a socat(1) server in UDP4-LISTEN mode, and a |
| client sending datagrams to it. After the server receives the first |
| datagram (cf. _xioopen_ipdgram_listen()), it issues a connect() to |
| the address of the sender, in order to set up a directed flow. |
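
The server-side pattern described here (receive the first datagram, then
connect() to its sender) can be sketched in userspace as follows. This is
an illustration of the pattern only, not socat's actual code; the helper
name `adopt_first_peer` is hypothetical, and the client sends its second
datagram only after connect() has returned, so the race itself is not
exercised.

```python
import socket


def adopt_first_peer(server, bufsize=65535):
    """Receive one datagram, then connect() the socket to its sender."""
    data, peer = server.recvfrom(bufsize)
    server.connect(peer)  # from now on, only datagrams from `peer` are accepted
    return data, peer


# Server bound to the wildcard address on an ephemeral port.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("0.0.0.0", 0))
server.settimeout(2.0)
port = server.getsockname()[1]
local_before = server.getsockname()

# Client on loopback sends the first datagram.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.connect(("127.0.0.1", port))
client_addr = client.getsockname()
client.send(b"first")

first, peer = adopt_first_peer(server)
local_after = server.getsockname()  # connect() also fixed the local address

# A later datagram from the same client is delivered on the connected socket.
client.send(b"second")
second = server.recv(65535)

server.close()
client.close()
```

Comparing `local_before` and `local_after` shows the local address change
that connect() causes on a socket bound to the wildcard address, which is
the change that requires the rehash.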
| |
| Now, if the client, running on a different CPU thread, happens to |
| send a (subsequent) datagram while the server's socket changes its |
| address, but is not rehashed yet, this will result in a failed |
| lookup and a port unreachable error delivered to the client, as |
| apparent from the following reproducer: |
| |
| LEN=$(($(cat /proc/sys/net/core/wmem_default) / 4)) |
| dd if=/dev/urandom bs=1 count=${LEN} of=tmp.in |
| |
| while :; do |
|     taskset -c 1 socat UDP4-LISTEN:1337,null-eof OPEN:tmp.out,create,trunc & |
|     sleep 0.1 || sleep 1 |
|     taskset -c 2 socat OPEN:tmp.in UDP4:localhost:1337,shut-null |
|     wait |
| done |
| |
| where the client will eventually get ECONNREFUSED on a write() |
| (typically the second or third one of a given iteration): |
| |
| 2024/11/13 21:28:23 socat[46901] E write(6, 0x556db2e3c000, 8192): Connection refused |
| |
| This issue was first observed as a rare failure in Podman's tests |
| checking UDP functionality while using pasta(1) to connect the |
| container's network namespace, which leads us to a reproducer with |
| the lookup error resulting in an ICMP packet on a tap device: |
| |
| LOCAL_ADDR="$(ip -j -4 addr show|jq -rM '.[] | .addr_info[0] | select(.scope == "global").local')" |
| |
| while :; do |
|     ./pasta --config-net -p pasta.pcap -u 1337 socat UDP4-LISTEN:1337,null-eof OPEN:tmp.out,create,trunc & |
|     sleep 0.2 || sleep 1 |
|     socat OPEN:tmp.in UDP4:${LOCAL_ADDR}:1337,shut-null |
|     wait |
|     cmp tmp.in tmp.out |
| done |
| |
| Once this fails: |
| |
| tmp.in tmp.out differ: char 8193, line 29 |
| |
| we can finally have a look at what's going on: |
| |
| $ tshark -r pasta.pcap |
| 1 0.000000 :: → ff02::16 ICMPv6 110 Multicast Listener Report Message v2 |
| 2 0.168690 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192 |
| 3 0.168767 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192 |
| 4 0.168806 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192 |
| 5 0.168827 c6:47:05:8d:dc:04 → Broadcast ARP 42 Who has 88.198.0.161? Tell 88.198.0.164 |
| 6 0.168851 9a:55:9a:55:9a:55 → c6:47:05:8d:dc:04 ARP 42 88.198.0.161 is at 9a:55:9a:55:9a:55 |
| 7 0.168875 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192 |
| 8 0.168896 88.198.0.164 → 88.198.0.161 ICMP 590 Destination unreachable (Port unreachable) |
| 9 0.168926 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192 |
| 10 0.168959 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192 |
| 11 0.168989 88.198.0.161 → 88.198.0.164 UDP 4138 60260 → 1337 Len=4096 |
| 12 0.169010 88.198.0.161 → 88.198.0.164 UDP 42 60260 → 1337 Len=0 |
| |
| On the third datagram received, the network namespace of the container |
| initiates an ARP lookup to deliver the ICMP message. |
| |
| In another variant of this reproducer, starting the server with: |
| |
| strace -f pasta --config-net -u 1337 socat UDP4-LISTEN:1337,null-eof OPEN:tmp.out,create,trunc 2>strace.log & |
| |
| and connecting to the socat server using a loopback address: |
| |
| socat OPEN:tmp.in UDP4:localhost:1337,shut-null |
| |
| we can more clearly observe a sendmmsg() call failing after the |
| first datagram is delivered: |
| |
| [pid 278012] connect(173, 0x7fff96c95fc0, 16) = 0 |
| [...] |
| [pid 278012] recvmmsg(173, 0x7fff96c96020, 1024, MSG_DONTWAIT, NULL) = -1 EAGAIN (Resource temporarily unavailable) |
| [pid 278012] sendmmsg(173, 0x561c5ad0a720, 1, MSG_NOSIGNAL) = 1 |
| [...] |
| [pid 278012] sendmmsg(173, 0x561c5ad0a720, 1, MSG_NOSIGNAL) = -1 ECONNREFUSED (Connection refused) |
| |
| and, somewhat confusingly, after a connect() on the same socket |
| succeeded. |
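
The reason ECONNREFUSED can show up on a connected UDP socket after
connect() has already succeeded is that the error is asynchronous: it is
raised by a later call, once an ICMP port unreachable message for an
earlier datagram has been received. A minimal sketch of that general
behaviour, assuming Linux and a loopback port with no listener (the
function name `probe_closed_udp_port` is hypothetical):

```python
import errno
import socket


def probe_closed_udp_port(host="127.0.0.1"):
    """Send to a UDP port with no listener and report the error seen later."""
    # Pick a port that is very likely closed: bind, note the port, close.
    tmp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tmp.bind((host, 0))
    port = tmp.getsockname()[1]
    tmp.close()

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(1.0)
    try:
        s.connect((host, port))  # succeeds: UDP connect() sends nothing
        s.send(b"probe")         # succeeds too: the error is not known yet
        try:
            s.recv(1)            # the pending ICMP error is reported here
        except ConnectionRefusedError as exc:
            return exc.errno
        except socket.timeout:
            return None          # no ICMP error was delivered in time
        return None
    finally:
        s.close()


result = probe_closed_udp_port()
```

On Linux, `result` is typically `errno.ECONNREFUSED`. In the strace log
above, the same mechanism applies, except that the port unreachable
message comes from the failed socket lookup rather than from a port
that is really closed.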
| |
| Until commit 4cdeeee9252a ("net: udp: prefer listeners bound to an |
| address"), the race between receive address change and lookup didn't |
| actually cause visible issues, because, once the lookup based on the |
| secondary hash chain failed, we would still attempt a lookup based on |
| the primary hash (destination port only), and find the socket with the |
| outdated secondary hash. |
| |
| That change, however, dropped port-only lookups altogether as a side |
| effect, making the race visible. |
| |
| To fix this, while avoiding the need to make address changes and |
| rehash atomic against lookups, reintroduce primary hash lookups as a |
| fallback, if lookups based on four-tuple and secondary hashes fail. |
| |
| To this end, introduce a simplified lookup implementation, which |
| doesn't take care of SO_REUSEPORT groups: if we have one, there are |
| multiple sockets that would match the four-tuple or secondary hash, |
| meaning that we can't run into this race at all. |
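
The fallback idea can be sketched with a toy model. This is an
illustration under simplifying assumptions, not the kernel
implementation: the `Sock` and `Table` classes are hypothetical, the
four-tuple hash and SO_REUSEPORT groups are omitted, and matching is
reduced to comparing the socket's current local address.

```python
ANY = "0.0.0.0"  # wildcard local address


class Sock:
    """Hypothetical stand-in for a UDP socket: local address and port only."""

    def __init__(self, addr, port):
        self.addr = addr
        self.port = port


class Table:
    """Toy model with a primary (port) and a secondary (address, port) hash."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}

    def hash(self, sk):
        self.primary.setdefault(sk.port, []).append(sk)
        self.secondary.setdefault((sk.addr, sk.port), []).append(sk)

    def _scan_secondary(self, slot_addr, port):
        for sk in self.secondary.get((slot_addr, port), []):
            if sk.addr == slot_addr:
                return sk
        return None

    def _scan_primary(self, daddr, port):
        # The port-only chain is unaffected by address changes, so the
        # socket is still here and can be matched on its current address.
        for sk in self.primary.get(port, []):
            if sk.addr == daddr:
                return sk
        return None

    def lookup(self, daddr, port, fallback=True):
        sk = self._scan_secondary(daddr, port) or self._scan_secondary(ANY, port)
        if sk is None and fallback:
            sk = self._scan_primary(daddr, port)
        return sk


table = Table()
server = Sock(ANY, 1337)
table.hash(server)
server.addr = "127.0.0.1"  # address changed, secondary rehash still pending

without_fallback = table.lookup("127.0.0.1", 1337, fallback=False)
with_fallback = table.lookup("127.0.0.1", 1337, fallback=True)
```

Here `without_fallback` is `None` while `with_fallback` is the server
socket: the port-only chain does not depend on the local address, so a
stale secondary slot no longer hides the socket.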
| |
| v2: |
| - instead of synchronising lookup operations against address change |
| plus rehash, reintroduce a simplified version of the original |
| primary hash lookup as fallback |
| |
| v1: |
| - fix build with CONFIG_IPV6=n: add ifdef around sk_v6_rcv_saddr |
| usage (Kuniyuki Iwashima) |
| - directly use sk_rcv_saddr for IPv4 receive addresses instead of |
| fetching inet_rcv_saddr (Kuniyuki Iwashima) |
| - move inet_update_saddr() to inet_hashtables.h and use that |
| to set IPv4/IPv6 addresses as suitable (Kuniyuki Iwashima) |
| - rebase onto net-next, update commit message accordingly |
| |
| The Linux kernel CVE team has assigned CVE-2024-57974 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 2.6.33 with commit 30fff9231fad757c061285e347b33c5149c2c2e4 and fixed in 6.12.13 with commit 4f8344fce91c5766d368edb0ad80142eacd805c7 |
| Issue introduced in 2.6.33 with commit 30fff9231fad757c061285e347b33c5149c2c2e4 and fixed in 6.13.2 with commit d65d3bf309b2649d27b24efd0d8784da2d81f2a6 |
| Issue introduced in 2.6.33 with commit 30fff9231fad757c061285e347b33c5149c2c2e4 and fixed in 6.14 with commit a502ea6fa94b1f7be72a24bcf9e3f5f6b7e6e90c |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-57974 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
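
As a rough aid for reading the version list above, the following sketch
classifies an upstream version string against it. The function names are
hypothetical, and the data only reflects the three entries listed in this
announcement: later backports, and distribution kernels that carry the
fix under a different version number, are not covered.

```python
INTRODUCED = (2, 6, 33)             # first release with the secondary hash
FIRST_FIXED_MAINLINE = (6, 14, 0)   # mainline release containing the fix
FIXED_IN_STABLE = {                 # stable series -> first fixed patch level
    (6, 12): 13,
    (6, 13): 2,
}


def parse(version):
    """Turn '6.12.13' or '6.14' into a three-element tuple of integers."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def classify(version):
    """Classify an upstream kernel version using only the list above."""
    v = parse(version)
    if v < INTRODUCED:
        return "not affected"
    if v >= FIRST_FIXED_MAINLINE:
        return "fixed"
    patch = FIXED_IN_STABLE.get(v[:2])
    if patch is not None and v[2] >= patch:
        return "fixed"
    return "affected"


for candidate in ("2.6.32", "6.6.70", "6.12.12", "6.12.13", "6.13.2", "6.14"):
    print(candidate, classify(candidate))
```

A result of "affected" only means that no fix is listed here for that
series; the CVE record remains the authoritative source.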
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| net/ipv4/udp.c |
| net/ipv6/udp.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this and many other bug fixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/4f8344fce91c5766d368edb0ad80142eacd805c7 |
| https://git.kernel.org/stable/c/d65d3bf309b2649d27b24efd0d8784da2d81f2a6 |
| https://git.kernel.org/stable/c/a502ea6fa94b1f7be72a24bcf9e3f5f6b7e6e90c |