From foo@baz Fri Jan 22 01:21:57 PM CET 2021
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 13 Jan 2021 08:18:19 -0800
Subject: net: avoid 32 x truesize under-estimation for tiny skbs

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 3226b158e67cfaa677fd180152bfb28989cb2fac ]

Both virtio net and napi_get_frags() allocate skbs
with a very small skb->head.

While using page fragments instead of a kmalloc backed skb->head might give
a small performance improvement in some cases, there is a huge risk of
under-estimating memory usage.

For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations
per page (order-3 page on x86), or even 64 on PowerPC.

We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
but consuming far more memory for TCP buffers than instructed in tcp_mem[2].

Even if we force napi_alloc_skb() to only use order-0 pages, the issue
would still be there on arches with PAGE_SIZE >= 32768.

This patch makes sure that small skb heads are kmalloc backed, so that
other objects in the slab page can be reused instead of being held as long
as skbs are sitting in socket queues.

Note that we might in the future use the sk_buff napi cache,
instead of going through a more expensive __alloc_skb().

Another idea would be to use separate page sizes depending
on the allocated length (to never have more than 4 frags per page).

I would like to thank Greg Thelen for his precious help on this matter;
analysing crash dumps is always a time-consuming task.

Fixes: fd11a83dd363 ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/skbuff.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -459,13 +459,17 @@ EXPORT_SYMBOL(__netdev_alloc_skb);
 struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
				 gfp_t gfp_mask)
 {
-	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	struct napi_alloc_cache *nc;
	struct sk_buff *skb;
	void *data;

	len += NET_SKB_PAD + NET_IP_ALIGN;

-	if ((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) ||
+	/* If requested length is either too small or too big,
+	 * we use kmalloc() for skb->head allocation.
+	 */
+	if (len <= SKB_WITH_OVERHEAD(1024) ||
+	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
		if (!skb)
@@ -473,6 +477,7 @@ struct sk_buff *__napi_alloc_skb(struct
		goto skb_success;
	}

+	nc = this_cpu_ptr(&napi_alloc_cache);
	len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
	len = SKB_DATA_ALIGN(len);