From foo@baz Thu Feb 27 20:11:26 PST 2014
From: John Ogness <john.ogness@linutronix.de>
Date: Sun, 9 Feb 2014 18:40:11 -0800
Subject: tcp: tsq: fix nonagle handling

From: John Ogness <john.ogness@linutronix.de>

[ Upstream commit bf06200e732de613a1277984bf34d1a21c2de03d ]

Commit 46d3ceabd8d9 ("tcp: TCP Small Queues") introduced a possible
regression for applications using TCP_NODELAY.

If a TCP session is throttled because of TSQ, we should consult
tp->nonagle when TX completion is done and allow ourselves to send an
additional segment, especially if this segment is not a full MSS.
Otherwise this segment is only sent after an RTO.

[edumazet] : Cooked the changelog, added another fix about testing
sk_wmem_alloc twice, because TX completion can happen right before
setting the TSQ_THROTTLED bit.

This problem is particularly visible with recent auto corking,
but might also be triggered with low tcp_limit_output_bytes
values or NIC drivers delaying TX completion by hundreds of usec,
and very low rtt.

Thomas Glanzmann, for example, reported an iSCSI regression, caused
by tcp auto corking making this bug quite visible.

Fixes: 46d3ceabd8d9 ("tcp: TCP Small Queues")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Thomas Glanzmann <thomas@glanzmann.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_output.c |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -696,7 +696,8 @@ static void tcp_tsq_handler(struct sock
 	if ((1 << sk->sk_state) &
 	    (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_CLOSING |
 	     TCPF_CLOSE_WAIT | TCPF_LAST_ACK))
-		tcp_write_xmit(sk, tcp_current_mss(sk), 0, 0, GFP_ATOMIC);
+		tcp_write_xmit(sk, tcp_current_mss(sk), tcp_sk(sk)->nonagle,
+			       0, GFP_ATOMIC);
 }
 /*
  * One tasklet per cpu tries to send more skbs.
@@ -1884,7 +1885,15 @@ static bool tcp_write_xmit(struct sock *

 		if (atomic_read(&sk->sk_wmem_alloc) > limit) {
 			set_bit(TSQ_THROTTLED, &tp->tsq_flags);
-			break;
+			/* It is possible TX completion already happened
+			 * before we set TSQ_THROTTLED, so we must
+			 * test again the condition.
+			 * We abuse smp_mb__after_clear_bit() because
+			 * there is no smp_mb__after_set_bit() yet
+			 */
+			smp_mb__after_clear_bit();
+			if (atomic_read(&sk->sk_wmem_alloc) > limit)
+				break;
 		}

 		limit = mss_now;