refs/tags/percpu-for-6.6 - pub/scm/linux/kernel/git/dennis/percpu.git

tag	1dac3ba40fe6b1e6900a6728ccd2123702a7b97e
tagger	Dennis Zhou <dennis@kernel.org>	Wed Aug 30 16:58:36 2023 -0700
object	14ef95be6f5558fb9e43aaf06ef9a1d6e0cae6c8

percpu: changes for v6.6

percpu
* A couple cleanups by Baoquan He and Bibo Mao. The only behavior change
  is to start printing messages if we're under the warn limit for failed
  atomic allocations.

percpu_counter
* Shakeel introduced percpu counters into mm_struct, which caused percpu
  allocations to be on the hot path [1]. Originally I spent some time
  trying to improve the percpu allocator, but instead preferred what
  Mateusz Guzik proposed: grouping at the allocation site with
  percpu_counter_init_many(). This allows a single percpu allocation to
  be shared by the counters. I like this approach because it creates a
  shared lifetime for the allocations. Additionally, I believe many inits
  have higher level synchronization requirements, like percpu_counter
  does against HOTPLUG_CPU. Therefore we can group these optimizations
  together.

[1] https://lore.kernel.org/linux-mm/20221024052841.3291983-1-shakeelb@google.com/
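
The grouping idea behind percpu_counter_init_many() can be illustrated with a small userspace sketch. This is not the kernel implementation: every name below (sketch_counter, sketch_init_many, SKETCH_NR_CPUS, and so on) is hypothetical, the "per-CPU" storage is a plain heap array, and there is no locking or hotplug handling. It only shows how several counters can carve their slots out of one backing allocation and therefore share one lifetime.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: a fixed, made-up CPU count stands in for real per-CPU data. */
#define SKETCH_NR_CPUS 4

/* Hypothetical counter: one slot per simulated CPU, living in a shared block. */
struct sketch_counter {
	int64_t *slots;
};

/* Counts backing allocations so the effect of grouping is observable. */
static int sketch_alloc_calls;

/*
 * Initialize nr counters from a single allocation. Counter i owns the
 * slice [i * SKETCH_NR_CPUS, (i + 1) * SKETCH_NR_CPUS) of the block.
 * The initial amount is parked in slot 0 (a simplification).
 */
static int sketch_init_many(struct sketch_counter *c, int64_t amount, size_t nr)
{
	int64_t *block;
	size_t i;

	if (nr == 0)
		return -1;

	block = calloc(nr * SKETCH_NR_CPUS, sizeof(*block));
	if (!block)
		return -1;
	sketch_alloc_calls++;

	for (i = 0; i < nr; i++) {
		c[i].slots = block + i * SKETCH_NR_CPUS;
		c[i].slots[0] = amount;
	}
	return 0;
}

/* Free the shared block once; all nr counters die together. */
static void sketch_destroy_many(struct sketch_counter *c, size_t nr)
{
	size_t i;

	if (nr == 0 || !c[0].slots)
		return;

	free(c[0].slots); /* c[0].slots is the start of the shared block */
	for (i = 0; i < nr; i++)
		c[i].slots = NULL;
}

/* Add delta to the slot belonging to the given simulated CPU. */
static void sketch_add(struct sketch_counter *c, unsigned int cpu, int64_t delta)
{
	assert(cpu < SKETCH_NR_CPUS);
	c->slots[cpu] += delta;
}

/* Sum all slots to obtain the counter's total value. */
static int64_t sketch_sum(const struct sketch_counter *c)
{
	int64_t sum = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sum += c->slots[cpu];
	return sum;
}
```

Initializing an array of three counters with sketch_init_many() performs one allocation rather than three, and sketch_destroy_many() releases all of them in one call. In this sketch that is the whole benefit; in the kernel the analogous saving is fewer trips through the percpu allocator on a hot path.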

commit	14ef95be6f5558fb9e43aaf06ef9a1d6e0cae6c8
author	Mateusz Guzik <mjguzik@gmail.com>	Wed Aug 23 07:06:09 2023 +0200
committer	Dennis Zhou <dennis@kernel.org>	Fri Aug 25 08:10:35 2023 -0700
tree	7abdf1224e08569f9a4dbd499192deeaba8729ef
parent	c439d5e8a0deb7310b5bb4e5f2fe47c40ff5297f

kernel/fork: group allocation/free of per-cpu counters for mm struct

A trivial execve scalability test which tries to be very friendly
(statically linked binaries, all separate) is predominantly bottlenecked
by back-to-back per-cpu counter allocations which serialize on global
locks.

Ease the pain by allocating and freeing them in one go.

Bench can be found here:
http://apollo.backplane.com/DFlyMisc/doexec.c

$ cc -static -O2 -o static-doexec doexec.c
$ ./static-doexec $(nproc)

Even at a very modest scale of 26 cores (ops/s):
before: 133543.63
after:  186061.81 (+39%)

While with the patch these allocations remain a significant problem,
the primary bottleneck shifts to page release handling.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20230823050609.2228718-3-mjguzik@gmail.com
[Dennis: reflowed 1 line]
Signed-off-by: Dennis Zhou <dennis@kernel.org>