| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2025-21710: tcp: correct handling of extreme memory squeeze |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| tcp: correct handling of extreme memory squeeze |
| |
| Testing with iperf3 using the "pasta" protocol splicer has revealed |
| a problem in the way TCP handles window advertising in extreme memory |
| squeeze situations. |
| |
| Under memory pressure, a socket endpoint may temporarily advertise |
| a zero-sized window, but this is not stored as part of the socket data. |
| The reasoning behind this is that it is considered a temporary setting |
| which shouldn't influence any further calculations. |
| |
| However, if we happen to stall at an unfortunate value of the current |
| window size, the algorithm selecting a new value will consistently fail |
| to advertise a non-zero window once we have freed up enough memory. |
| This means that this side's notion of the current window size is |
| different from the one last advertised to the peer, causing the latter |
| to not send any data to resolve the situation. |
| |
| The problem occurs on the iperf3 server side, and the socket in question |
| is a completely regular socket with the default settings for the |
| fedora40 kernel. We do not use SO_PEEK or SO_RCVBUF on the socket. |
| |
| The following excerpt of a logging session, with our own comments added, |
| shows in more detail what is happening: |
| |
| // tcp_v4_rcv(->) |
| // tcp_rcv_established(->) |
| [5201<->39222]: ==== Activating log @ net/ipv4/tcp_input.c/tcp_data_queue()/5257 ==== |
| [5201<->39222]: tcp_data_queue(->) |
| [5201<->39222]: DROPPING skb [265600160..265665640], reason: SKB_DROP_REASON_PROTO_MEM |
| [rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184] |
| [copied_seq 259909392->260034360 (124968), unread 5565800, qlen 85, ofoq 0] |
| [OFO queue: gap: 65480, len: 0] |
| [5201<->39222]: tcp_data_queue(<-) |
| [5201<->39222]: __tcp_transmit_skb(->) |
| [tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160] |
| [5201<->39222]: tcp_select_window(->) |
| [5201<->39222]: (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_NOMEM) ? --> TRUE |
| [tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160] |
| returning 0 |
| [5201<->39222]: tcp_select_window(<-) |
| [5201<->39222]: ADVERTISING WIN 0, ACK_SEQ: 265600160 |
| [5201<->39222]: [__tcp_transmit_skb(<-) |
| [5201<->39222]: tcp_rcv_established(<-) |
| [5201<->39222]: tcp_v4_rcv(<-) |
| |
| // Receive queue is at 85 buffers and we are out of memory. |
| // We drop the incoming buffer, although it is in sequence, and decide |
| // to send an advertisement with a window of zero. |
| // We don't update tp->rcv_wnd and tp->rcv_wup accordingly, which means |
| // we unconditionally shrink the window. |
| |
| [5201<->39222]: tcp_recvmsg_locked(->) |
| [5201<->39222]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160 |
| [5201<->39222]: [new_win = 0, win_now = 131184, 2 * win_now = 262368] |
| [5201<->39222]: [new_win >= (2 * win_now) ? --> time_to_ack = 0] |
| [5201<->39222]: NOT calling tcp_send_ack() |
| [tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160] |
| [5201<->39222]: __tcp_cleanup_rbuf(<-) |
| [rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184] |
| [copied_seq 260040464->260040464 (0), unread 5559696, qlen 85, ofoq 0] |
| returning 6104 bytes |
| [5201<->39222]: tcp_recvmsg_locked(<-) |
| |
| // After each read, the algorithm for calculating the new receive |
| // window in __tcp_cleanup_rbuf() finds it is too small to advertise |
| // or to update tp->rcv_wnd. |
| // Meanwhile, the peer thinks the window is zero, and will not send |
| // any more data to trigger an update from the interrupt mode side. |
| |
| [5201<->39222]: tcp_recvmsg_locked(->) |
| [5201<->39222]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160 |
| [5201<->39222]: [new_win = 262144, win_now = 131184, 2 * win_now = 262368] |
| [5201<->39222]: [new_win >= (2 * win_now) ? --> time_to_ack = 0] |
| [5201<->39222]: NOT calling tcp_send_ack() |
| [tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160] |
| [5201<->39222]: __tcp_cleanup_rbuf(<-) |
| [rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184] |
| [copied_seq 260099840->260171536 (71696), unread 5428624, qlen 83, ofoq 0] |
| returning 131072 bytes |
| [5201<->39222]: tcp_recvmsg_locked(<-) |
| |
| // The above pattern repeats again and again, since nothing changes |
| // between the reads. |
| |
| [...] |
| |
| [5201<->39222]: tcp_recvmsg_locked(->) |
| [5201<->39222]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160 |
| [5201<->39222]: [new_win = 262144, win_now = 131184, 2 * win_now = 262368] |
| [5201<->39222]: [new_win >= (2 * win_now) ? --> time_to_ack = 0] |
| [5201<->39222]: NOT calling tcp_send_ack() |
| [tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160] |
| [5201<->39222]: __tcp_cleanup_rbuf(<-) |
| [rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184] |
| [copied_seq 265600160->265600160 (0), unread 0, qlen 0, ofoq 0] |
| returning 54672 bytes |
| [5201<->39222]: tcp_recvmsg_locked(<-) |
| |
| // The receive queue is empty, but no new advertisement has been sent. |
| // The peer still thinks the receive window is zero, and sends nothing. |
| // We have ended up in a deadlock situation. |
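| |
| The arithmetic behind the stall can be replayed from the logged values. |
| The sketch below is a simplified model, not kernel code: the helper |
| names are hypothetical, sequence-number wraparound is ignored, and the |
| comparison is the one printed in the log (new_win >= 2 * win_now). |

```python
def receive_window_now(rcv_wup: int, rcv_wnd: int, rcv_nxt: int) -> int:
    """Window still open according to the stored socket state.

    Simplified: ignores 32-bit sequence-number wraparound.
    """
    return max(rcv_wup + rcv_wnd - rcv_nxt, 0)


def time_to_ack(new_win: int, win_now: int) -> bool:
    """Mirror of the logged test 'new_win >= (2 * win_now)'."""
    return new_win >= 2 * win_now


# Values copied from the log excerpt (tp->rcv_wup, tp->rcv_wnd, tp->rcv_nxt).
RCV_WUP, RCV_WND, RCV_NXT = 265469200, 262144, 265600160

win_now = receive_window_now(RCV_WUP, RCV_WND, RCV_NXT)

# The largest window offered in the log is 262144, which is just below
# 2 * win_now, so no window update is ever sent.
stalled = not time_to_ack(262144, win_now)
```

| With these inputs win_now is 131184 and 2 * win_now is 262368, so a |
| new window of 262144 never passes the test, matching the log. |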
| |
| Note that well-behaved endpoints will send win0 probes, so the problem |
| will not occur with such peers. |
| |
| Furthermore, we have observed that in these situations this side may |
| send out an updated 'th->ack_seq' which is not stored in tp->rcv_wup |
| as it should be. Backing ack_seq seems to be harmless, but is of |
| course still wrong from a protocol viewpoint. |
| |
| We fix this by updating the socket state correctly when a packet has |
| been dropped because of memory exhaustion and we have to advertise |
| a zero window. |
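| |
| A minimal model of why recording the zero window helps is sketched |
| below. It is an illustration under stated assumptions, not the actual |
| patch: the class and function names are hypothetical, and the exact |
| fields updated (rcv_wnd set to 0, rcv_wup set to rcv_nxt) are an |
| assumption about what "updating the socket state correctly" means. |

```python
from dataclasses import dataclass


@dataclass
class RxState:
    """Hypothetical stand-in for the receive-side fields seen in the log."""
    rcv_wup: int
    rcv_wnd: int
    rcv_nxt: int

    def win_now(self) -> int:
        # Simplified: ignores sequence-number wraparound.
        return max(self.rcv_wup + self.rcv_wnd - self.rcv_nxt, 0)


def advertise_zero_window(state: RxState, record_in_socket: bool) -> int:
    """Advertise a zero window after a drop caused by memory exhaustion."""
    if record_in_socket:
        # Assumed shape of the fix: remember what the peer was actually told.
        state.rcv_wnd = 0
        state.rcv_wup = state.rcv_nxt
    return 0


def should_send_update(state: RxState, new_win: int) -> bool:
    """Logged test, plus a non-zero check on new_win (an assumption)."""
    return new_win > 0 and new_win >= 2 * state.win_now()


# Same starting values as in the log excerpt.
before = RxState(265469200, 262144, 265600160)
advertise_zero_window(before, record_in_socket=False)

after = RxState(265469200, 262144, 265600160)
advertise_zero_window(after, record_in_socket=True)

# Once memory is freed and a 262144-byte window can be offered again:
update_without_fix = should_send_update(before, 262144)
update_with_fix = should_send_update(after, 262144)
```

| In this model the stale state keeps win_now at 131184 and suppresses |
| the update, while the recorded zero window makes any non-zero new |
| window large enough to be advertised. |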
| |
| Further testing shows that the connection recovers neatly from the |
| squeeze situation, and traffic can continue indefinitely. |
| |
| The Linux kernel CVE team has assigned CVE-2025-21710 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 6.6 with commit e2142825c120d4317abf7160a0fc34b3de532586 and fixed in 6.6.76 with commit b01e7ceb35dcb7ffad413da657b78c3340a09039 |
| Issue introduced in 6.6 with commit e2142825c120d4317abf7160a0fc34b3de532586 and fixed in 6.12.13 with commit 1dd823a46e25ffde1492c391934f69a9e5eb574f |
| Issue introduced in 6.6 with commit e2142825c120d4317abf7160a0fc34b3de532586 and fixed in 6.13.2 with commit b4055e2fe96f4ef101d8af0feb056d78d77514ff |
| Issue introduced in 6.6 with commit e2142825c120d4317abf7160a0fc34b3de532586 and fixed in 6.14 with commit 8c670bdfa58e48abad1d5b6ca1ee843ca91f7303 |
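| |
| The version ranges above can be turned into a quick check. This is a |
| rough sketch only: the function name is hypothetical, it accepts plain |
| "X.Y" or "X.Y.Z" upstream version strings, and it cannot account for |
| vendor kernels that carry the fix under a different version number. |

```python
# First fixed release per stable series, taken from the list above.
FIXED_IN = {(6, 6): 76, (6, 12): 13, (6, 13): 2}
INTRODUCED = (6, 6)      # series in which the issue was introduced
FIXED_MAINLINE = (6, 14)  # first mainline release containing the fix


def is_listed_as_affected(version: str) -> bool:
    """Return True if an upstream version falls inside the listed ranges."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))  # treat "6.14" as "6.14.0"
    major, minor, patch = parts[:3]
    series = (major, minor)

    if series < INTRODUCED or series >= FIXED_MAINLINE:
        return False

    first_fixed = FIXED_IN.get(series)
    if first_fixed is None:
        # Series between 6.6 and 6.14 with no fixed release listed here.
        return True
    return patch < first_fixed
```

| For example, is_listed_as_affected("6.6.75") is true and |
| is_listed_as_affected("6.6.76") is false. |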
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2025-21710 |
| will be updated if fixes are backported; please check that for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| net/ipv4/tcp_output.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/b01e7ceb35dcb7ffad413da657b78c3340a09039 |
| https://git.kernel.org/stable/c/1dd823a46e25ffde1492c391934f69a9e5eb574f |
| https://git.kernel.org/stable/c/b4055e2fe96f4ef101d8af0feb056d78d77514ff |
| https://git.kernel.org/stable/c/8c670bdfa58e48abad1d5b6ca1ee843ca91f7303 |