| From bippy-1.2.0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@kernel.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2025-37772: RDMA/cma: Fix workqueue crash in cma_netevent_work_handler |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| RDMA/cma: Fix workqueue crash in cma_netevent_work_handler |
| |
| struct rdma_cm_id has a member "struct work_struct net_work" |
| that is reused each time cma_netevent_work_handler() is enqueued |
| onto cma_wq. |
| |
| The crash below [1] can occur when cma_netevent_callback() is called |
| more than once in quick succession. Each call enqueues |
| cma_netevent_work_handler() for the same rdma_cm_id, and nothing |
| guarantees that the queued work item runs between two successive |
| calls. The second INIT_WORK() therefore overwrites the first work |
| item (for the same rdma_cm_id) while it is still queued, despite |
| id_table_lock being held during enqueue. |
| |
| Also drgn analysis [2] indicates the work item was likely overwritten. |
| |
| Fix this by moving the INIT_WORK() to __rdma_create_id(), |
| so that it doesn't race with any existing queue_work() or |
| its worker thread. |
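
The race described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not kernel code: the `Work` class, the `PENDING` flag value, the "data reset to 0" behaviour, and all function names are hypothetical stand-ins chosen for illustration. Only `WORK_STRUCT_PWQ = 0x4` is taken from the drgn output in this advisory. The model shows one plausible interleaving in which re-initializing a still-queued work item lets it be queued twice, so that the second run finds no pwq linkage.

```python
from collections import deque

WORK_STRUCT_PWQ = 0x4   # value reported by drgn in this advisory
PENDING = 0x1           # hypothetical "already queued" flag for this toy model


class Work:
    """Toy stand-in for struct work_struct; only a data word is modeled."""

    def __init__(self):
        self.data = 0


def init_work(work):
    # Toy INIT_WORK(): resets the data word, discarding any queue state.
    work.data = 0


def queue_work(queue, work):
    # Toy queue_work(): refuse to queue an item that is already pending.
    if work.data & PENDING:
        return False
    work.data = PENDING | WORK_STRUCT_PWQ
    queue.append(work)
    return True


def run_one(queue):
    # Toy process_one_work(): returns False where the real code would
    # end up with a NULL pwq.
    work = queue.popleft()
    has_pwq = bool(work.data & WORK_STRUCT_PWQ)
    work.data = 0  # toy "off queue" state after processing
    return has_pwq


def buggy_callback(queue, work):
    init_work(work)          # re-initializes even if the item is still queued
    queue_work(queue, work)


def fixed_callback(queue, work):
    queue_work(queue, work)  # initialization was done once at creation


def simulate(callback, calls=2):
    queue, work = deque(), Work()
    init_work(work)          # one-time initialization at object creation
    for _ in range(calls):
        callback(queue, work)  # back-to-back events, no worker run in between
    return [run_one(queue) for _ in range(len(queue))]
```

In this model, `simulate(buggy_callback)` returns `[True, False]` (the second run has lost its pwq linkage), while `simulate(fixed_callback)` returns `[True]` because the second enqueue is rejected as already pending.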
| |
| [1] Trimmed crash stack: |
| ============================================= |
| BUG: kernel NULL pointer dereference, address: 0000000000000008 |
| kworker/u256:6 ... 6.12.0-0... |
| Workqueue: cma_netevent_work_handler [rdma_cm] (rdma_cm) |
| RIP: 0010:process_one_work+0xba/0x31a |
| Call Trace: |
| worker_thread+0x266/0x3a0 |
| kthread+0xcf/0x100 |
| ret_from_fork+0x31/0x50 |
| ret_from_fork_asm+0x1a/0x30 |
| ============================================= |
| |
| [2] drgn crash analysis: |
| |
| >>> trace = prog.crashed_thread().stack_trace() |
| >>> trace |
| (0) crash_setup_regs (./arch/x86/include/asm/kexec.h:111:15) |
| (1) __crash_kexec (kernel/crash_core.c:122:4) |
| (2) panic (kernel/panic.c:399:3) |
| (3) oops_end (arch/x86/kernel/dumpstack.c:382:3) |
| ... |
| (8) process_one_work (kernel/workqueue.c:3168:2) |
| (9) process_scheduled_works (kernel/workqueue.c:3310:3) |
| (10) worker_thread (kernel/workqueue.c:3391:4) |
| (11) kthread (kernel/kthread.c:389:9) |
| |
| Line workqueue.c:3168 for this kernel version is in process_one_work(): |
| 3168 strscpy(worker->desc, pwq->wq->name, WORKER_DESC_LEN); |
| |
| >>> trace[8]["work"] |
| *(struct work_struct *)0xffff92577d0a21d8 = { |
| .data = (atomic_long_t){ |
| .counter = (s64)536870912, <=== Note |
| }, |
| .entry = (struct list_head){ |
| .next = (struct list_head *)0xffff924d075924c0, |
| .prev = (struct list_head *)0xffff924d075924c0, |
| }, |
| .func = (work_func_t)cma_netevent_work_handler+0x0 = 0xffffffffc2cec280, |
| } |
| |
| Suspicion is that pwq is NULL: |
| >>> trace[8]["pwq"] |
| (struct pool_workqueue *)<absent> |
| |
| In process_one_work(), pwq is assigned from: |
| struct pool_workqueue *pwq = get_work_pwq(work); |
| |
| and get_work_pwq() is: |
| static struct pool_workqueue *get_work_pwq(struct work_struct *work) |
| { |
| unsigned long data = atomic_long_read(&work->data); |
| |
| if (data & WORK_STRUCT_PWQ) |
| return work_struct_pwq(data); |
| else |
| return NULL; |
| } |
| |
| WORK_STRUCT_PWQ is 0x4: |
| >>> print(repr(prog['WORK_STRUCT_PWQ'])) |
| Object(prog, 'enum work_flags', value=4) |
| |
| But work->data is 536870912 which is 0x20000000. |
| So, get_work_pwq() returns NULL and we crash in process_one_work(): |
| 3168 strscpy(worker->desc, pwq->wq->name, WORKER_DESC_LEN); |
| ============================================= |
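
The arithmetic in the analysis above can be checked with a few lines of plain Python. This is a minimal sketch: the helper name `get_work_pwq_is_null` is hypothetical, and it mirrors only the flag test quoted from get_work_pwq(), using the two values shown in the drgn output (`WORK_STRUCT_PWQ` and `work->data`).

```python
WORK_STRUCT_PWQ = 0x4        # value printed by drgn in the analysis above
observed_data = 536870912    # work->data from the crash dump


def get_work_pwq_is_null(data, pwq_flag=WORK_STRUCT_PWQ):
    """Mirror the branch in get_work_pwq(): NULL unless the PWQ flag is set."""
    return (data & pwq_flag) == 0


print(hex(observed_data))                    # 0x20000000
print(get_work_pwq_is_null(observed_data))   # True
```

Because bit 0x4 is clear in 0x20000000, the flag test fails and the NULL branch is taken, which matches the conclusion above.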
| |
| The Linux kernel CVE team has assigned CVE-2025-37772 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 6.0 with commit 925d046e7e52c71c3531199ce137e141807ef740 and fixed in 6.1.135 with commit 51003b2c872c63d28bcf5fbcc52cf7b05615f7b7 |
| Issue introduced in 6.0 with commit 925d046e7e52c71c3531199ce137e141807ef740 and fixed in 6.6.88 with commit c2b169fc7a12665d8a675c1ff14bca1b9c63fb9a |
| Issue introduced in 6.0 with commit 925d046e7e52c71c3531199ce137e141807ef740 and fixed in 6.12.25 with commit d23fd7a539ac078df119707110686a5b226ee3bb |
| Issue introduced in 6.0 with commit 925d046e7e52c71c3531199ce137e141807ef740 and fixed in 6.14.4 with commit b172a4a0de254f1fcce7591833a9a63547c2f447 |
| Issue introduced in 6.0 with commit 925d046e7e52c71c3531199ce137e141807ef740 and fixed in 6.15 with commit 45f5dcdd049719fb999393b30679605f16ebce14 |
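
The version ranges listed above can be turned into a simple check. The following is a rough sketch, assuming a plain upstream version string of the form "X.Y" or "X.Y.Z"; the names `is_affected`, `parse`, and the constants are hypothetical. It does not handle distribution kernels, version suffixes, or vendor backports, and it treats any 6.x branch without a listed fix as affected.

```python
INTRODUCED = (6, 0)        # branch in which the issue was introduced
FIXED_MAINLINE = (6, 15)   # first mainline release containing the fix
# Stable branches with a listed fix: branch -> first fixed patch level
FIXED_IN_BRANCH = {(6, 1): 135, (6, 6): 88, (6, 12): 25, (6, 14): 4}


def parse(version):
    """Parse 'X.Y' or 'X.Y.Z' into a 3-tuple of ints (missing parts are 0)."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def is_affected(version):
    major, minor, patch = parse(version)
    branch = (major, minor)
    if branch < INTRODUCED or branch >= FIXED_MAINLINE:
        return False
    fixed_patch = FIXED_IN_BRANCH.get(branch)
    if fixed_patch is None:
        return True   # no fix is listed for this branch
    return patch < fixed_patch
```

For example, `is_affected("6.1.134")` is `True` and `is_affected("6.1.135")` is `False`. The official CVE entry remains the authoritative source, since the list of fixed versions can change as fixes are backported.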
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2025-37772 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| drivers/infiniband/core/cma.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/51003b2c872c63d28bcf5fbcc52cf7b05615f7b7 |
| https://git.kernel.org/stable/c/c2b169fc7a12665d8a675c1ff14bca1b9c63fb9a |
| https://git.kernel.org/stable/c/d23fd7a539ac078df119707110686a5b226ee3bb |
| https://git.kernel.org/stable/c/b172a4a0de254f1fcce7591833a9a63547c2f447 |
| https://git.kernel.org/stable/c/45f5dcdd049719fb999393b30679605f16ebce14 |