module: add debug stats to help identify memory pressure
Loading modules with finit_module() can end up using vmalloc(), vmap()
and vmalloc() again, for a total of up to 3 separate allocations in the
worst case for a single module. We always kernel_read*() the module,
and that's a vmalloc(). Then, if the module is compressed, vmap() is used
for decompression, and the original read buffer is freed since we use the
now decompressed module buffer to stuff data into our copy of the module.
The last allocation is architecture-specific, but it is generally a series
of vmalloc() calls for the different ELF sections.
Evaluation with the new stress-ng [0] module support, using just 100 ops
[1], proves that you can easily end up using GiBs of memory even though
we are trying to be very careful not to load modules which are already
loaded. 100 ops seems to resemble the sort of pressure a system with
about 400 CPUs can create on modules. Although so many concurrent loads
per CPU is silly and the underlying issues are being fixed, we lack
proper tooling to easily diagnose what happened, when it happened, and
what the likely culprits are: userspace or kernel module autoloading.
Provide an initial set of stats in debugfs which lets us easily scrape
post-boot information about failed loads. This sort of information can
be used on production workloads to try to *avoid* redundant memory
pressure from finit_module().
Screen shot:
root@kmod ~ # cat /sys/kernel/debug/modules/stats
Modules loaded 67
Total module size 11464704
Total mod text size 4194304
Failed kread bytes 890064
Failed kmod bytes 890064
Invalid kread bytes 890064
Invalid decompress bytes 0
Invalid mod bytes 890064
Average mod size 171115
Average mod text size 62602
Failed modules:
kvm_intel
kvm
irqbypass
crct10dif_pclmul
ghash_clmulni_intel
sha512_ssse3
sha512_generic
aesni_intel
crypto_simd
cryptd
evdev
serio_raw
virtio_pci
nvme
nvme_core
virtio_pci_legacy_dev
t10_pi
crc64_rocksoft
virtio_pci_modern_dev
crc32_pclmul
virtio
crc32c_intel
virtio_ring
crc64
[0] https://github.com/ColinIanKing/stress-ng.git
[1] echo 0 > /proc/sys/vm/oom_dump_tasks
./stress-ng --module 100 --module-name xfs
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>