Merge branch 'udp_tunnel-speed-up-udp-tunnel-device-destruction-part-i'

Kuniyuki Iwashima says:

====================
udp_tunnel: Speed up UDP tunnel device destruction (Part I)

Most of the UDP tunnel devices call synchronize_rcu() twice
during destruction, for example, vxlan has

  1) synchronize_rcu() in udp_tunnel_sock_release()

  2) synchronize_net() in vxlan_sock_release()

The goal of this series is to remove the former, and another
followup series removes the latter.

synchronize_rcu() was added in udp_tunnel_sock_release() by
commit 3cf7203ca620 ("net/tunnel: wait until all sk_user_data
reader finish before releasing the sock").

This was intended to protect the fast path of a dying vxlan
from dereferencing vxlan_sock->sock->sk after sock_orphan()
has set sock->sk to NULL.

Most of the UDP tunnel devices store struct socket to its
private struct, but it is NOT needed in the fast paths;
struct sock is used there, but struct socket is only used
for tunnel setup / teardown.

This is probably because UDP tunnel functions accept struct
socket, but even such functions do not need it, except for
udp_tunnel_sock_release(), which can safely access sk->sk_socket.

The overview of the series:

  Patch 1 -  5 : Convert UDP tunnel helper to take struct sock
  Patch 6      : Small fix for 10-years-old bug
  Patch 7 - 14 : Store struct sock in tunnel devices
  Patch 15     : Remove synchronize_rcu() in udp_tunnel_sock_release()

With this change, a script creating/upping vxlan in 4000 netns
runs 10x faster.
====================

[See Link for benchmark results.]

Link: https://patch.msgid.link/20260502031401.3557229-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>