| From 00bc0ef5880dc7b82f9c320dead4afaad48e47be Mon Sep 17 00:00:00 2001 |
| From: Jakub Sitnicki <jkbs@redhat.com> |
| Date: Wed, 8 Jun 2016 15:13:34 +0200 |
| Subject: ipv6: Skip XFRM lookup if dst_entry in socket cache is valid |
| |
| From: Jakub Sitnicki <jkbs@redhat.com> |
| |
| commit 00bc0ef5880dc7b82f9c320dead4afaad48e47be upstream. |
| |
| At present we perform an xfrm_lookup() for each UDPv6 message we |
| send. The lookup involves querying the flow cache (flow_cache_lookup) |
| and, in case of a cache miss, creating an XFRM bundle. |
| |
| If we miss the flow cache, we can end up creating a new bundle and |
| deriving the path MTU (xfrm_init_pmtu) from an already transformed |
| dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down |
| to xfrm_lookup(). This can happen only if we're caching the dst_entry |
| in the socket, that is when we're using a connected UDP socket. |
| |
| To put it another way, the path MTU shrinks each time we miss the flow |
| cache, which later on leads to incorrectly fragmented payload. It can |
| be observed with ESPv6 in transport mode: |
| |
| 1) Set up a transformation and lower the MTU to trigger fragmentation |
| # ip xfrm policy add dir out src ::1 dst ::1 \ |
| tmpl src ::1 dst ::1 proto esp spi 1 |
| # ip xfrm state add src ::1 dst ::1 \ |
| proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b |
| # ip link set dev lo mtu 1500 |
| |
| 2) Monitor the packet flow and set up a UDP sink |
| # tcpdump -ni lo -ttt & |
| # socat udp6-listen:12345,fork /dev/null & |
| |
| 3) Send a datagram that needs fragmentation with a connected socket |
| # perl -e 'print "@" x 1470' | socat - udp6:[::1]:12345 |
| 2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error |
| 00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448 |
| 00:00:00.000014 IP6 ::1 > ::1: frag (1448|32) |
| 00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272 |
| (^ ICMPv6 Parameter Problem) |
| 00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136 |
| |
| 4) Compare it to a non-connected socket |
| # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345 |
| 00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448 |
| 00:00:00.000010 IP6 ::1 > ::1: frag (1448|64) |
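For readers who prefer to reproduce steps (3) and (4) without socat, the two send paths can be sketched in userspace C roughly as follows. This is only an illustration, not part of the patch: the helper names make_addr6() and send_fill() are hypothetical, and it assumes that socat's udp6: address connect()s the socket while udp6-sendto: does not.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill in a sockaddr_in6 for a literal address and port.
 * Returns 0 on success, -1 if the address does not parse. */
static int make_addr6(struct sockaddr_in6 *sa, const char *ip, unsigned short port)
{
	memset(sa, 0, sizeof(*sa));
	sa->sin6_family = AF_INET6;
	sa->sin6_port = htons(port);
	return inet_pton(AF_INET6, ip, &sa->sin6_addr) == 1 ? 0 : -1;
}

/* Hypothetical helper: send len bytes of '@' to sa.
 * connected != 0: connect() first, then send(), so the kernel can cache the
 *                 destination in the socket (the case from step 3).
 * connected == 0: plain sendto() on an unconnected socket (step 4).
 * Returns the number of bytes sent, or -1 on error. */
static ssize_t send_fill(const struct sockaddr_in6 *sa, size_t len, int connected)
{
	char buf[2048];
	ssize_t n;
	int fd;

	assert(sa != NULL);
	if (len > sizeof(buf))
		return -1;
	memset(buf, '@', len);

	fd = socket(AF_INET6, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	if (connected) {
		if (connect(fd, (const struct sockaddr *)sa, sizeof(*sa)) < 0) {
			close(fd);
			return -1;
		}
		n = send(fd, buf, len, 0);
	} else {
		n = sendto(fd, buf, len, 0,
			   (const struct sockaddr *)sa, sizeof(*sa));
	}
	close(fd);
	return n;
}

#ifdef DEMO_MAIN
int main(void)
{
	struct sockaddr_in6 sa;

	if (make_addr6(&sa, "::1", 12345) < 0)
		return 1;
	send_fill(&sa, 1470, 1);	/* connected, as in step 3 */
	send_fill(&sa, 1500, 0);	/* unconnected, as in step 4 */
	return 0;
}
#endif
```

Building with -DDEMO_MAIN gives a small sender to run while tcpdump from step (2) is capturing. The payload sizes are copied from the commands above.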
| |
| What happens in step (3) is: |
| |
| 1) when connecting the socket in __ip6_datagram_connect(), we |
| perform an XFRM lookup, miss the flow cache, create an XFRM |
| bundle, and cache the destination, |
| |
| 2) afterwards, when sending the datagram, we perform an XFRM lookup |
| again, miss the flow cache (due to a mismatch of flowi6_iif and |
| flowi6_oif, which is an issue of its own), and recreate an XFRM |
| bundle based on the cached (and already transformed) destination. |
| |
| To prevent the recreation of an XFRM bundle, avoid an XFRM lookup |
| altogether whenever we already have a destination entry cached in the |
| socket. This prevents the path MTU shrinkage and brings us on par with |
| UDPv4. |
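The effect of skipping the lookup can be illustrated with a toy userspace model. It is a sketch under loud assumptions, not kernel code: all toy_* names are made up, the 40-byte overhead is an arbitrary stand-in for what the kernel derives from the real transformation, every send is modelled as a flow cache miss, and the re-bundled entry is assumed to be stored back into the socket cache.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-transformation overhead in bytes. The real value depends
 * on the ESP algorithms in use and is computed by the kernel. */
#define TOY_XFRM_OVERHEAD 40u

/* Toy stand-in for a dst_entry: the MTU seen through it, and how many times
 * it has been wrapped in a bundle. */
struct toy_dst {
	unsigned int mtu;
	unsigned int depth;
};

/* Models a flow cache miss: a new bundle is built on top of the entry it is
 * handed and derives its MTU from that entry. */
static struct toy_dst toy_xfrm_bundle(struct toy_dst in)
{
	struct toy_dst out;

	assert(in.mtu > TOY_XFRM_OVERHEAD);
	out.mtu = in.mtu - TOY_XFRM_OVERHEAD;
	out.depth = in.depth + 1;
	return out;
}

/* Models the lookup done at connect time: plain route, then one bundle. */
static struct toy_dst toy_route_and_xfrm(unsigned int link_mtu)
{
	struct toy_dst plain = { link_mtu, 0 };

	return toy_xfrm_bundle(plain);
}

/* Connect once, then send `sends` datagrams; returns the cached entry.
 * fixed == false: every send re-bundles the cached (already transformed)
 *                 entry, as described for step (3).
 * fixed == true:  a valid cached entry is returned as-is, with no lookup. */
static struct toy_dst toy_run(bool fixed, unsigned int link_mtu, int sends)
{
	struct toy_dst cache = toy_route_and_xfrm(link_mtu);
	int i;

	for (i = 0; i < sends; i++) {
		if (!fixed)
			cache = toy_xfrm_bundle(cache);
	}
	return cache;
}

#ifdef DEMO_MAIN
#include <stdio.h>

int main(void)
{
	struct toy_dst before = toy_run(false, 1500, 3);
	struct toy_dst after = toy_run(true, 1500, 3);

	printf("without fix: mtu=%u depth=%u\n", before.mtu, before.depth);
	printf("with fix:    mtu=%u depth=%u\n", after.mtu, after.depth);
	return 0;
}
#endif
```

In this model the MTU stays constant across sends once the cached entry is reused, whereas without the fix it drops by the assumed overhead on every send.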
| |
| The fix also benefits connected PINGv6 sockets, another user of |
| ip6_sk_dst_lookup_flow(), which also suffer from messages being |
| transformed twice. |
| |
| Joint work with Hannes Frederic Sowa. |
| |
| Reported-by: Jan Tluka <jtluka@redhat.com> |
| Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> |
| Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> |
| Signed-off-by: David S. Miller <davem@davemloft.net> |
| Signed-off-by: Benedict Wong <benedictwong@google.com> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| net/ipv6/ip6_output.c | 11 +++-------- |
| 1 file changed, 3 insertions(+), 8 deletions(-) |
| |
| --- a/net/ipv6/ip6_output.c |
| +++ b/net/ipv6/ip6_output.c |
| @@ -1038,17 +1038,12 @@ struct dst_entry *ip6_sk_dst_lookup_flow |
| const struct in6_addr *final_dst) |
| { |
| struct dst_entry *dst = sk_dst_check(sk, inet6_sk(sk)->dst_cookie); |
| - int err; |
| |
| dst = ip6_sk_dst_check(sk, dst, fl6); |
| + if (!dst) |
| + dst = ip6_dst_lookup_flow(sk, fl6, final_dst); |
| |
| - err = ip6_dst_lookup_tail(sk, &dst, fl6); |
| - if (err) |
| - return ERR_PTR(err); |
| - if (final_dst) |
| - fl6->daddr = *final_dst; |
| - |
| - return xfrm_lookup_route(sock_net(sk), dst, flowi6_to_flowi(fl6), sk, 0); |
| + return dst; |
| } |
| EXPORT_SYMBOL_GPL(ip6_sk_dst_lookup_flow); |
| |