| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2025-21825: bpf: Cancel the running bpf_timer through kworker for PREEMPT_RT |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| bpf: Cancel the running bpf_timer through kworker for PREEMPT_RT |
| |
| During the update procedure, when overwriting an element in a |
| pre-allocated htab, the freeing of old_element is protected by the |
| bucket lock. The bucket lock is necessary because old_element has |
| already been stashed in htab->extra_elems by the time |
| alloc_htab_elem() returns. If old_element were freed after the bucket |
| lock is released, the stashed element could be reused by a concurrent |
| update procedure, and the freeing of old_element would run concurrently |
| with that reuse. However, check_and_free_fields() may acquire a |
| spin-lock, which violates the lockdep rule because its caller already |
| holds a raw-spin-lock (the bucket lock). The following warning is |
| reported when such a race happens: |
| |
| BUG: scheduling while atomic: test_progs/676/0x00000003 |
| 3 locks held by test_progs/676: |
| #0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830 |
| #1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500 |
| #2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0 |
| Modules linked in: bpf_testmod(O) |
| Preemption disabled at: |
| [<ffffffff817837a3>] htab_map_update_elem+0x293/0x1500 |
| CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ #11 |
| Tainted: [W]=WARN, [O]=OOT_MODULE |
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)... |
| Call Trace: |
| <TASK> |
| dump_stack_lvl+0x57/0x70 |
| dump_stack+0x10/0x20 |
| __schedule_bug+0x120/0x170 |
| __schedule+0x300c/0x4800 |
| schedule_rtlock+0x37/0x60 |
| rtlock_slowlock_locked+0x6d9/0x54c0 |
| rt_spin_lock+0x168/0x230 |
| hrtimer_cancel_wait_running+0xe9/0x1b0 |
| hrtimer_cancel+0x24/0x30 |
| bpf_timer_delete_work+0x1d/0x40 |
| bpf_timer_cancel_and_free+0x5e/0x80 |
| bpf_obj_free_fields+0x262/0x4a0 |
| check_and_free_fields+0x1d0/0x280 |
| htab_map_update_elem+0x7fc/0x1500 |
| bpf_prog_9f90bc20768e0cb9_overwrite_cb+0x3f/0x43 |
| bpf_prog_ea601c4649694dbd_overwrite_timer+0x5d/0x7e |
| bpf_prog_test_run_syscall+0x322/0x830 |
| __sys_bpf+0x135d/0x3ca0 |
| __x64_sys_bpf+0x75/0xb0 |
| x64_sys_call+0x1b5/0xa10 |
| do_syscall_64+0x3b/0xc0 |
| entry_SYSCALL_64_after_hwframe+0x4b/0x53 |
| ... |
| </TASK> |
| |
| It seems feasible to break the reuse and refill of per-cpu extra_elems |
| into two independent parts: reuse the per-cpu extra_elems with the |
| bucket lock held, and refill the per-cpu extra_elems with old_element |
| after the bucket lock is released. However, that would make concurrent |
| overwrite procedures on the same CPU return an unexpected -E2BIG error |
| when the map is full. |
| |
| Therefore, the patch fixes the lock problem by breaking the cancelling |
| of bpf_timer into two steps for PREEMPT_RT: |
| 1) use hrtimer_try_to_cancel() and check its return value |
| 2) if the timer is running, use hrtimer_cancel() through a kworker to |
| cancel it again |
| Considering that the current implementation of hrtimer_cancel() tries |
| to acquire softirq_expiry_lock, which is already held while the timer |
| is running, the steps above are reasonable. However, the approach also |
| has a downside: when the timer is running, its cancellation is delayed |
| when the last map uref is released. The delay is fixable (e.g., by |
| breaking the cancelling of bpf_timer into two parts, one in the locked |
| scope and another in the unlocked scope) and can be revised later if |
| necessary. |
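| |
| The two-step cancel described above can be sketched in userspace C. |
| This is a toy model under stated assumptions, not the code from |
| kernel/bpf/helpers.c: struct toy_timer and every toy_* function are |
| hypothetical names, and the worker is an ordinary function call rather |
| than a real kworker. Only the return convention of |
| hrtimer_try_to_cancel() (1 cancelled, 0 not active, -1 callback |
| running) is taken from the kernel API. |
| |

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an hrtimer: it only tracks state flags. */
struct toy_timer {
    bool queued;           /* armed, callback not started yet */
    bool callback_running; /* callback is executing right now */
    bool deferred;         /* blocking cancel handed to the worker */
};

/*
 * Non-blocking cancel, modelled on the return values of
 * hrtimer_try_to_cancel(): 1 = cancelled, 0 = was not active,
 * -1 = callback is running and cannot be stopped right now.
 */
static int toy_try_to_cancel(struct toy_timer *t)
{
    if (t->callback_running)
        return -1;
    if (t->queued) {
        t->queued = false;
        return 1;
    }
    return 0;
}

/*
 * Step 1: runs in the context that holds the raw bucket lock, so it
 * must never block. If the callback is running, only record that a
 * blocking cancel is still owed.
 */
static void toy_cancel_atomic(struct toy_timer *t)
{
    if (toy_try_to_cancel(t) < 0)
        t->deferred = true;
}

/*
 * Step 2: runs later in a context where blocking is allowed. The
 * assignments stand in for a blocking cancel that waits for the
 * callback to finish.
 */
static void toy_worker(struct toy_timer *t)
{
    if (!t->deferred)
        return;
    t->callback_running = false;
    t->queued = false;
    t->deferred = false;
}

/* Exercises both paths and returns how many cancels were deferred. */
static int toy_demo(void)
{
    struct toy_timer idle = { .queued = true };
    struct toy_timer busy = { .callback_running = true };
    int deferred;

    toy_cancel_atomic(&idle); /* cancelled immediately */
    toy_cancel_atomic(&busy); /* callback running: deferred */
    deferred = idle.deferred + busy.deferred;

    toy_worker(&idle);        /* nothing to do */
    toy_worker(&busy);        /* completes the deferred cancel */
    assert(!idle.queued && !busy.callback_running);
    return deferred;
}
```

| |
| In this model the caller that holds the bucket lock never blocks. The |
| cost is the one noted above: a timer whose callback is running stays |
| alive until the worker gets to it. |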
| |
| It is a bit hard to decide on the right Fixes tag. One reason is that |
| the problem depends on PREEMPT_RT, which is enabled in v6.12. |
| Considering that softirq_expiry_lock has existed since v5.4 and |
| bpf_timer was introduced in v5.15, the bpf_timer commit is used in the |
| Fixes tag and an extra Depends-on tag is added to state the dependency |
| on PREEMPT_RT. |
| |
| Depends-on: v6.12+ with PREEMPT_RT enabled |
| |
| The Linux kernel CVE team has assigned CVE-2025-21825 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 5.15 with commit b00628b1c7d595ae5b544e059c27b1f5828314b4 and fixed in 6.12.13 with commit 33e47d9573075342a41783a55c8c67bc71246fc1 |
| Issue introduced in 5.15 with commit b00628b1c7d595ae5b544e059c27b1f5828314b4 and fixed in 6.13.2 with commit fbeda3d939ca10063aafa7a77cc0f409d82cda88 |
| Issue introduced in 5.15 with commit b00628b1c7d595ae5b544e059c27b1f5828314b4 and fixed in 6.14 with commit 58f038e6d209d2dd862fcf5de55407855856794d |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2025-21825 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| kernel/bpf/helpers.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating |
| to the latest release is impossible, the individual changes to resolve |
| this issue can be found at these commits: |
| https://git.kernel.org/stable/c/33e47d9573075342a41783a55c8c67bc71246fc1 |
| https://git.kernel.org/stable/c/fbeda3d939ca10063aafa7a77cc0f409d82cda88 |
| https://git.kernel.org/stable/c/58f038e6d209d2dd862fcf5de55407855856794d |