| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-56592: bpf: Call free_htab_elem() after htab_unlock_bucket() |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| bpf: Call free_htab_elem() after htab_unlock_bucket() |
| |
| For an htab of maps, removing a map from the htab may drop the last |
| reference to that map. In that case bpf_map_fd_put_ptr() will invoke |
| bpf_map_free_id() to free the id of the removed map element. However, |
| bpf_map_fd_put_ptr() is invoked while holding a bucket lock |
| (raw_spinlock_t), and bpf_map_free_id() attempts to acquire map_idr_lock |
| (spinlock_t), triggering the following lockdep warning: |
| |
| ============================= |
| [ BUG: Invalid wait context ] |
| 6.11.0-rc4+ #49 Not tainted |
| ----------------------------- |
| test_maps/4881 is trying to lock: |
| ffffffff84884578 (map_idr_lock){+...}-{3:3}, at: bpf_map_free_id.part.0+0x21/0x70 |
| other info that might help us debug this: |
| context-{5:5} |
| 2 locks held by test_maps/4881: |
| #0: ffffffff846caf60 (rcu_read_lock){....}-{1:3}, at: bpf_fd_htab_map_update_elem+0xf9/0x270 |
| #1: ffff888149ced148 (&htab->lockdep_key#2){....}-{2:2}, at: htab_map_update_elem+0x178/0xa80 |
| stack backtrace: |
| CPU: 0 UID: 0 PID: 4881 Comm: test_maps Not tainted 6.11.0-rc4+ #49 |
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ... |
| Call Trace: |
| <TASK> |
| dump_stack_lvl+0x6e/0xb0 |
| dump_stack+0x10/0x20 |
| __lock_acquire+0x73e/0x36c0 |
| lock_acquire+0x182/0x450 |
| _raw_spin_lock_irqsave+0x43/0x70 |
| bpf_map_free_id.part.0+0x21/0x70 |
| bpf_map_put+0xcf/0x110 |
| bpf_map_fd_put_ptr+0x9a/0xb0 |
| free_htab_elem+0x69/0xe0 |
| htab_map_update_elem+0x50f/0xa80 |
| bpf_fd_htab_map_update_elem+0x131/0x270 |
| htab_map_update_elem+0x50f/0xa80 |
| bpf_fd_htab_map_update_elem+0x131/0x270 |
| bpf_map_update_value+0x266/0x380 |
| __sys_bpf+0x21bb/0x36b0 |
| __x64_sys_bpf+0x45/0x60 |
| x64_sys_call+0x1b2a/0x20d0 |
| do_syscall_64+0x5d/0x100 |
| entry_SYSCALL_64_after_hwframe+0x76/0x7e |
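
The {outer:inner} pairs in the trace can be read with a small userspace model. This is only a simplified sketch of the wait-context rule, assuming the wait-type numbering lockdep prints when CONFIG_PROVE_RAW_LOCK_NESTING is enabled; the struct, the helper wait_context_ok() and the *_cls names are hypothetical and are not lockdep's own code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Wait types as they appear in the {outer:inner} pairs of the trace. */
enum wait_type {
	WAIT_INV = 0,    /* not checked */
	WAIT_FREE = 1,   /* wait free, e.g. RCU read side */
	WAIT_SPIN = 2,   /* raw_spinlock_t */
	WAIT_CONFIG = 3, /* spinlock_t, which may sleep on PREEMPT_RT */
	WAIT_SLEEP = 4,  /* sleeping locks such as mutexes */
	WAIT_MAX = 5,    /* plain task context, printed as context-{5:5} */
};

/* Hypothetical, simplified stand-in for a lock class. */
struct lock_class {
	const char *name;
	enum wait_type outer;
	enum wait_type inner;
};

/* Lock classes from the trace, with the values printed there. */
const struct lock_class rcu_read_lock_cls = { "rcu_read_lock", WAIT_FREE, WAIT_CONFIG };
const struct lock_class bucket_lock_cls = { "htab bucket lock", WAIT_SPIN, WAIT_SPIN };
const struct lock_class map_idr_lock_cls = { "map_idr_lock", WAIT_CONFIG, WAIT_CONFIG };

/*
 * Return true if acquiring 'next' is allowed while 'held' locks are held:
 * the most restrictive inner type among the held locks bounds the outer
 * type of any lock acquired next.
 */
bool wait_context_ok(const struct lock_class *held, size_t nr_held,
		     const struct lock_class *next)
{
	enum wait_type curr_inner = WAIT_MAX;
	size_t i;

	assert(next != NULL);
	for (i = 0; i < nr_held; i++) {
		if (held[i].inner != WAIT_INV && held[i].inner < curr_inner)
			curr_inner = held[i].inner;
	}
	return next->outer <= curr_inner;
}
```

In this model, taking map_idr_lock {3:3} with the bucket lock {2:2} held is rejected, while taking it with only rcu_read_lock {1:3} held is accepted, which matches the direction of the fix.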
| |
| One way to fix the lockdep warning is to use raw_spinlock_t for |
| map_idr_lock as well. However, bpf_map_alloc_id() invokes |
| idr_alloc_cyclic() after acquiring map_idr_lock, which would trigger a |
| similar lockdep warning because the slab's lock (s->cpu_slab->lock) is |
| still a spinlock. |
| |
| Instead of changing map_idr_lock's type, fix the issue by invoking |
| htab_put_fd_value() after htab_unlock_bucket(). However, only deferring |
| the invocation of htab_put_fd_value() is not enough, because the old map |
| pointers in htab of maps cannot be saved during batched deletion. |
| Therefore, also defer the invocation of free_htab_elem(), so that these |
| to-be-freed elements can be linked together, as is done for the lru map. |
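
The deferral for batched deletion can be sketched in userspace as follows. This is a minimal model and not the code in kernel/bpf/hashtab.c: struct elem, struct bucket, the boolean standing in for the raw spinlock, and the helpers free_elem() and delete_batch() are all assumptions made for illustration. Only the names batch_flink and node_to_free come from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified element: one chain link plus a to-free link. */
struct elem {
	struct elem *next;        /* link in the bucket chain */
	struct elem *batch_flink; /* reused as free-list link once unlinked */
};

struct bucket {
	struct elem *head;
	bool locked; /* stand-in for the raw_spinlock_t bucket lock */
};

int nr_freed;

void bucket_lock(struct bucket *b)
{
	assert(!b->locked);
	b->locked = true;
}

void bucket_unlock(struct bucket *b)
{
	assert(b->locked);
	b->locked = false;
}

/* Stand-in for free_htab_elem(): must not run under the bucket lock. */
void free_elem(struct bucket *b, struct elem *e)
{
	assert(!b->locked);
	e->next = NULL;
	e->batch_flink = NULL;
	nr_freed++;
}

/* Unlink every element under the lock, free them only after unlocking. */
int delete_batch(struct bucket *b)
{
	struct elem *node_to_free = NULL;
	struct elem *e, *n;
	int nr_deleted = 0;

	bucket_lock(b);
	for (e = b->head; e; e = n) {
		n = e->next;
		/* The element leaves the chain, so its link field is free to reuse. */
		e->batch_flink = node_to_free;
		node_to_free = e;
		nr_deleted++;
	}
	b->head = NULL;
	bucket_unlock(b);

	while (node_to_free) {
		e = node_to_free;
		node_to_free = e->batch_flink;
		free_elem(b, e);
	}
	return nr_deleted;
}
```

The assert in free_elem() encodes the invariant the fix establishes: anything that may end up taking a spinlock_t is called only once the bucket lock has been dropped.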
| |
| There are four callers for ->map_fd_put_ptr: |
| |
| (1) alloc_htab_elem() (through htab_put_fd_value()) |
| It invokes ->map_fd_put_ptr() under a raw_spinlock_t. The invocation of |
| htab_put_fd_value() cannot simply be moved after htab_unlock_bucket(), |
| because the old element has already been stashed in htab->extra_elems. |
| It may be reused immediately after htab_unlock_bucket(), and invoking |
| htab_put_fd_value() after htab_unlock_bucket() may then release the |
| newly-added element incorrectly. Therefore, save the map pointer of the |
| old element for htab of maps before unlocking the bucket and release |
| the map_ptr after unlock. Besides the map pointer in the old element, |
| do the same for the special fields in the old element as well. |
| |
| (2) free_htab_elem() (through htab_put_fd_value()) |
| Its callers include __htab_map_lookup_and_delete_elem(), |
| htab_map_delete_elem() and __htab_map_lookup_and_delete_batch(). |
| |
| For htab_map_delete_elem(), simply invoke free_htab_elem() after |
| htab_unlock_bucket(). For __htab_map_lookup_and_delete_batch(), just |
| like lru map, link the to-be-freed elements into the node_to_free list |
| and invoke free_htab_elem() for these elements after unlock. It is safe |
| to reuse batch_flink as the link for node_to_free, because these |
| elements have been removed from the hash llist. |
| |
| Because htab of maps doesn't support the lookup_and_delete operation, |
| __htab_map_lookup_and_delete_elem() doesn't have the problem, so it is |
| kept as is. |
| |
| (3) fd_htab_map_free() |
| It invokes ->map_fd_put_ptr without raw_spinlock_t. |
| |
| (4) bpf_fd_htab_map_update_elem() |
| It invokes ->map_fd_put_ptr without raw_spinlock_t. |
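
For caller (1), the save-then-release ordering can be sketched as below. This is a userspace illustration under stated assumptions, not the kernel implementation: struct map, struct elem, struct bucket, map_put() and update_elem() are hypothetical, the boolean stands in for the bucket lock, and reference counting is reduced to a plain integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inner map with a plain integer reference count. */
struct map {
	int refcnt;
};

/* Hypothetical element slot that may be reused right after unlock. */
struct elem {
	struct map *map_ptr;
};

struct bucket {
	struct elem slot;
	bool locked; /* stand-in for the raw_spinlock_t bucket lock */
};

/* Stand-in for ->map_fd_put_ptr(): must not run under the bucket lock. */
void map_put(struct bucket *b, struct map *m)
{
	assert(!b->locked);
	if (m)
		m->refcnt--;
}

/* Replace the map stored in the slot and drop the old one after unlock. */
void update_elem(struct bucket *b, struct map *new_map)
{
	struct map *old_map_ptr;

	b->locked = true;
	/*
	 * Save the old pointer while the lock is held. Reading the slot after
	 * unlock would yield the new map and release the wrong object.
	 */
	old_map_ptr = b->slot.map_ptr;
	b->slot.map_ptr = new_map;
	new_map->refcnt++;
	b->locked = false;

	map_put(b, old_map_ptr);
}
```

Passing the saved pointer, rather than the slot, to map_put() is the point of the sketch: once the lock is dropped the slot already describes the new entry.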
| |
| After moving free_htab_elem() outside the htab bucket lock scope, use |
| pcpu_freelist_push() instead of __pcpu_freelist_push() to disable |
| irqs before freeing elements, and protect the invocations of |
| bpf_mem_cache_free() with a migrate_{disable|enable} pair. |
| |
| The Linux kernel CVE team has assigned CVE-2024-56592 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Fixed in 6.6.66 with commit 10e8a2dec9ff1b81de8e892b0850924038adbc6d |
| Fixed in 6.12.5 with commit a50b4aa3007e63a590d501341f304676ebc74b3b |
| Fixed in 6.13 with commit b9e9ed90b10c82a4e9d4d70a2890f06bfcdd3b78 |
| |
| Please see https://www.kernel.org for a full list of currently supported |
| kernel versions by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-56592 |
| will be updated if fixes are backported; please check that for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| kernel/bpf/hashtab.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/10e8a2dec9ff1b81de8e892b0850924038adbc6d |
| https://git.kernel.org/stable/c/a50b4aa3007e63a590d501341f304676ebc74b3b |
| https://git.kernel.org/stable/c/b9e9ed90b10c82a4e9d4d70a2890f06bfcdd3b78 |