| From: Vlad Dumitrescu <vdumitrescu@nvidia.com> |
| Subject: percpu: fix race on alloc failed warning limit |
| Date: Fri, 22 Aug 2025 15:55:16 -0700 |
| |
| The 'allocation failed, ...' warning messages can cause unlimited log |
| spam, contrary to the implementation's intent. |
| |
| The warn_limit variable is accessed without synchronization. If more than |
| <warn_limit> threads enter the warning path at the same time, the variable |
| will get decremented past 0. Once it becomes negative, the non-zero check |
| will always return true, leading to unlimited log spam. |
| |
| Use an atomic operation to access warn_limit and change the condition to |
| test for non-negative (>= 0): atomic_dec_if_positive() will return -1 once |
| warn_limit becomes 0. Continue to print the disable message alongside the |
| last warning. |
| |
| While the change cited in Fixes is only adjacent, the warning limit |
| implementation was correct before it. Only non-atomic allocations were |
| considered for warnings, and those happened to hold pcpu_alloc_mutex while |
| accessing warn_limit. |
| |
| [vdumitrescu@nvidia.com: prevent warn_limit from going negative, per Christoph Lameter] |
| Link: https://lkml.kernel.org/r/ee87cc59-2717-4dbb-8052-1d2692c5aaaa@nvidia.com |
| Link: https://lkml.kernel.org/r/ab22061a-a62f-4429-945b-744e5cc4ba35@nvidia.com |
| Fixes: f7d77dfc91f7 ("mm/percpu.c: print error message too if atomic alloc failed") |
| Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> |
| Reviewed-by: Baoquan He <bhe@redhat.com> |
| Cc: Christoph Lameter (Ampere) <cl@gentwo.org> |
| Cc: Dennis Zhou <dennis@kernel.org> |
| Cc: Tejun Heo <tj@kernel.org> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| --- |
| |
| mm/percpu.c | 20 ++++++++++++-------- |
| 1 file changed, 12 insertions(+), 8 deletions(-) |
| |
| --- a/mm/percpu.c~percpu-fix-race-on-alloc-failed-warning-limit |
| +++ a/mm/percpu.c |
| @@ -1734,7 +1734,7 @@ void __percpu *pcpu_alloc_noprof(size_t |
|  	bool is_atomic; |
|  	bool do_warn; |
|  	struct obj_cgroup *objcg = NULL; |
| -	static int warn_limit = 10; |
| +	static atomic_t warn_limit = ATOMIC_INIT(10); |
|  	struct pcpu_chunk *chunk, *next; |
|  	const char *err; |
|  	int slot, off, cpu, ret; |
| @@ -1904,13 +1904,17 @@ fail_unlock: |
|  fail: |
|  	trace_percpu_alloc_percpu_fail(reserved, is_atomic, size, align); |
|  |
| -	if (do_warn && warn_limit) { |
| -		pr_warn("allocation failed, size=%zu align=%zu atomic=%d, %s\n", |
| -			size, align, is_atomic, err); |
| -		if (!is_atomic) |
| -			dump_stack(); |
| -		if (!--warn_limit) |
| -			pr_info("limit reached, disable warning\n"); |
| +	if (do_warn) { |
| +		int remaining = atomic_dec_if_positive(&warn_limit); |
| + |
| +		if (remaining >= 0) { |
| +			pr_warn("allocation failed, size=%zu align=%zu atomic=%d, %s\n", |
| +				size, align, is_atomic, err); |
| +			if (!is_atomic) |
| +				dump_stack(); |
| +			if (remaining == 0) |
| +				pr_info("limit reached, disable warning\n"); |
| +		} |
|  	} |
|  |
|  	if (is_atomic) { |
| _ |
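
For readers who want to check the counting logic outside the kernel, the
gating described in the commit message can be modeled in user space. This is
only a sketch under stated assumptions: it uses C11 <stdatomic.h> rather than
the kernel's atomic_t, and model_dec_if_positive(), model_should_warn() and
model_run() are hypothetical names introduced here, not kernel APIs. The
model is meant to mirror the semantics the patch relies on: the helper
returns the old value minus one and never stores a negative value.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * User-space stand-in (assumption, not kernel code) for
 * atomic_dec_if_positive(): decrement *v only if the result stays >= 0,
 * and return the old value minus one either way, so -1 means the
 * counter was already 0.
 */
static int model_dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);

	do {
		if (old <= 0)
			return old - 1;	/* exhausted: leave *v untouched */
		/* On CAS failure, 'old' is reloaded and the check repeats. */
	} while (!atomic_compare_exchange_weak(v, &old, old - 1));

	return old - 1;
}

/*
 * Hypothetical helper mirroring the patched condition: warn while the
 * returned value is >= 0, and flag the final warning when it is 0.
 */
static bool model_should_warn(atomic_int *limit, bool *last)
{
	int remaining = model_dec_if_positive(limit);

	*last = (remaining == 0);
	return remaining >= 0;
}

/*
 * Sequentially replay 'failures' allocation failures against a limit.
 * Returns the number of warnings that would be printed and stores the
 * number of "limit reached" messages in *disable_msgs.
 */
static int model_run(int limit_init, int failures, int *disable_msgs)
{
	atomic_int limit;
	int warnings = 0;

	atomic_init(&limit, limit_init);
	*disable_msgs = 0;

	for (int i = 0; i < failures; i++) {
		bool last;

		if (model_should_warn(&limit, &last)) {
			warnings++;
			if (last)
				(*disable_msgs)++;
		}
	}
	return warnings;
}
```

Calling model_run(10, 25, &n) replays 25 failures against a limit of 10. The
replay is sequential, so it only exercises the counting rule; the
compare-and-swap loop is what would keep concurrent callers from pushing the
counter below zero.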