bpf: Only reuse after one RCU GP in bpf memory allocator
Currently the objects freed by the bpf memory allocator may be reused
immediately by a new allocation. This introduces a use-after-bpf-ma-free
problem for non-preallocated hash maps and can make the lookup procedure
return an incorrect result. The immediate reuse also makes introducing
new use cases (e.g. qp-trie) more difficult.
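As an illustration of the problem (not part of this patch; the map
layout, program name and tracepoint are made up for the example), a
program may still hold the value pointer of a BPF_F_NO_PREALLOC hash map
element when a delete on another CPU frees the element and the freed
memory is immediately reused for a different key:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(map_flags, BPF_F_NO_PREALLOC);	/* non-preallocated elements */
	__uint(max_entries, 16384);
	__type(key, __u32);
	__type(value, __u64);
} htab SEC(".maps");

SEC("tp/syscalls/sys_enter_getpgid")
int read_htab(void *ctx)
{
	__u32 key = 1;
	__u64 *val;

	val = bpf_map_lookup_elem(&htab, &key);
	if (!val)
		return 0;
	/*
	 * If another CPU deletes "key" here, the freed element may be
	 * handed out again immediately for a different key, so *val no
	 * longer belongs to "key" and the lookup result is incorrect.
	 */
	bpf_printk("val=%llu", *val);
	return 0;
}

char _license[] SEC("license") = "GPL";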
So implement reuse-after-RCU-GP to solve these problems. With
reuse-after-RCU-GP, the freed objects are reused only after one RCU
grace period and may be returned back to the slab subsystem after an
additional RCU-tasks-trace grace period. So bpf programs which care
about the reuse problem can use bpf_rcu_read_{lock,unlock}() to access
these freed objects safely, and for programs which don't care, the
use-after-bpf-ma-free is safe because these objects have not yet been
freed by the bpf memory allocator.
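As a rough sketch of the first case (again with a made-up map layout and
tracepoint, and with the kfunc declarations written by hand), such a
program brackets the lookup and the dereference with the
bpf_rcu_read_lock()/bpf_rcu_read_unlock() kfuncs:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* kfuncs provided by the kernel, declared by hand here */
void bpf_rcu_read_lock(void) __ksym;
void bpf_rcu_read_unlock(void) __ksym;

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__uint(max_entries, 16384);
	__type(key, __u32);
	__type(value, __u64);
} htab SEC(".maps");

SEC("tp/syscalls/sys_enter_getpgid")
int read_htab_under_rcu(void *ctx)
{
	__u32 key = 1;
	__u64 *val;

	bpf_rcu_read_lock();
	val = bpf_map_lookup_elem(&htab, &key);
	/*
	 * Reuse of a freed element now waits for at least one RCU grace
	 * period, so the element behind *val cannot be handed out for a
	 * new key before bpf_rcu_read_unlock() below.
	 */
	if (val)
		bpf_printk("val=%llu", *val);
	bpf_rcu_read_unlock();
	return 0;
}

char _license[] SEC("license") = "GPL";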
To handle the use case which allocates and frees on different CPUs, a
per-bpf-mem-alloc list is introduced to keep these reusable objects. To
reduce the risk of OOM, part of these reusable objects will be freed and
returned back to the slab subsystem through an RCU-tasks-trace callback.
Until that callback actually frees them, these objects remain available
for reuse.
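The lifecycle can be sketched roughly as follows. This is not the actual
bpf_mem_alloc code: all names (struct reuse_pool, pool_free(),
pool_alloc(), pool_shrink_one(), ...) are hypothetical, and per-size
caches, locking and the fact that objects already handed to the
RCU-tasks-trace callback stay reusable until it runs are omitted for
brevity.

#include <linux/container_of.h>
#include <linux/llist.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

/* One pool per bpf_mem_alloc; the reuse list is shared by all CPUs. */
struct reuse_pool {
	struct llist_head reuse_ready;
};

struct reuse_obj {
	struct llist_node node;
	struct rcu_head rcu;
	struct reuse_pool *pool;
	char data[];
};

/* After one RCU GP the freed object becomes reusable instead of going
 * back to slab.
 */
static void reuse_after_rcu_gp(struct rcu_head *rcu)
{
	struct reuse_obj *obj = container_of(rcu, struct reuse_obj, rcu);

	llist_add(&obj->node, &obj->pool->reuse_ready);
}

/* Called instead of kfree() when an element is freed. */
static void pool_free(struct reuse_obj *obj)
{
	call_rcu(&obj->rcu, reuse_after_rcu_gp);
}

/* A new allocation tries the shared reuse list first, so an object freed
 * on one CPU can be reused on another. A real implementation must
 * serialize llist_del_first() callers; the sketch assumes one at a time.
 */
static struct reuse_obj *pool_alloc(struct reuse_pool *pool, size_t size)
{
	struct llist_node *node = llist_del_first(&pool->reuse_ready);
	struct reuse_obj *obj;

	if (node)
		return llist_entry(node, struct reuse_obj, node);
	obj = kmalloc(sizeof(*obj) + size, GFP_ATOMIC | __GFP_NOWARN);
	if (obj)
		obj->pool = pool;
	return obj;
}

/* After an additional RCU-tasks-trace GP the memory goes back to slab. */
static void free_after_tasks_trace_gp(struct rcu_head *rcu)
{
	kfree(container_of(rcu, struct reuse_obj, rcu));
}

/* Return part of the reusable objects to slab to reduce the risk of OOM. */
static void pool_shrink_one(struct reuse_pool *pool)
{
	struct llist_node *node = llist_del_first(&pool->reuse_ready);

	if (node)
		call_rcu_tasks_trace(&llist_entry(node, struct reuse_obj, node)->rcu,
				     free_after_tasks_trace_gp);
}

The key point of the per-bpf-mem-alloc list is the shared reuse_ready
list: it lets an object freed on one CPU satisfy an allocation on
another CPU without waiting for the RCU-tasks-trace grace period.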
As shown in the following benchmark results, the memory usage increases
a lot and the performance of the overwrite and batch_add_batch_del cases
is also degraded. The benchmark is conducted on a KVM VM with 8 CPUs and
16 GB of memory. The command line for htab-mem-benchmark is:
./bench htab-mem --use-case $name --max-entries 16384 \
        --full 50 -d 10 --producers=8 --prod-affinity=0-7
And the command line for map_perf_test benchmark is:
./map_perf_test 4 8 16384
htab-mem-benchmark (before):
| name | loop (k/s)| average memory (MiB)| peak memory (MiB)|
| -- | -- | -- | -- |
| no_op | 1160.66 | 0.99 | 1.00 |
| overwrite | 28.52 | 2.46 | 2.73 |
| batch_add_batch_del| 11.50 | 2.69 | 2.95 |
| add_del_on_diff_cpu| 3.75 | 15.85 | 24.24 |
map_perf_test (before):
2:hash_map_perf kmalloc 384527 events per sec
7:hash_map_perf kmalloc 359707 events per sec
6:hash_map_perf kmalloc 314229 events per sec
0:hash_map_perf kmalloc 306743 events per sec
3:hash_map_perf kmalloc 309987 events per sec
4:hash_map_perf kmalloc 309012 events per sec
1:hash_map_perf kmalloc 295757 events per sec
5:hash_map_perf kmalloc 292229 events per sec
htab-mem-benchmark (after):
| name | loop (k/s)| average memory (MiB)| peak memory (MiB)|
| -- | -- | -- | -- |
| no_op | 1159.18 | 0.99 | 0.99 |
| overwrite | 11.00 | 2288 | 4109 |
| batch_add_batch_del| 8.86 | 1558 | 2763 |
| add_del_on_diff_cpu| 4.74 | 11.39 | 14.77 |
map_perf_test (after):
0:hash_map_perf kmalloc 194677 events per sec
4:hash_map_perf kmalloc 194177 events per sec
1:hash_map_perf kmalloc 180662 events per sec
6:hash_map_perf kmalloc 181310 events per sec
5:hash_map_perf kmalloc 177213 events per sec
2:hash_map_perf kmalloc 173069 events per sec
3:hash_map_perf kmalloc 166792 events per sec
7:hash_map_perf kmalloc 165253 events per sec
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230606035310.4026145-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>