| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2025-21732: RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with error |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with error |
| |
| This patch addresses a race condition for an ODP MR that can result in a |
| CQE with an error on the UMR QP. |
| |
| During the __mlx5_ib_dereg_mr() flow, the following sequence of calls |
| occurs: |
| |
| mlx5_revoke_mr() |
| mlx5r_umr_revoke_mr() |
| mlx5r_umr_post_send_wait() |
| |
| At this point, the lkey is freed from the hardware's perspective. |
| |
| However, concurrently, mlx5_ib_invalidate_range() might be triggered by |
| another task attempting to invalidate a range for the same freed lkey. |
| |
| This task will: |
| - Acquire the umem_odp->umem_mutex lock. |
| - Call mlx5r_umr_update_xlt() on the UMR QP. |
| |
| Since the lkey has already been freed, this call can lead to a CQE error, |
| causing the UMR QP to enter an error state [1]. |
| |
| To resolve this race condition, the umem_odp->umem_mutex lock is now also |
| acquired as part of the mlx5_revoke_mr() scope. Upon successful revoke, |
| umem_odp->private, which points to that MR, is set to NULL, preventing any |
| further invalidation attempts on its lkey. |
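
The locking pattern described above can be illustrated with a minimal
userspace sketch. This is not the driver code: the toy_* types and functions
are hypothetical stand-ins, a pthread mutex plays the role of
umem_odp->umem_mutex, and the early return on a NULL back-pointer is an
assumption about how the invalidation path reacts once ->private is cleared.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver objects; not the mlx5_ib types. */
struct toy_mr {
	bool lkey_valid;		/* false once the "hardware" freed the lkey */
};

struct toy_umem_odp {
	pthread_mutex_t umem_mutex;	/* models umem_odp->umem_mutex */
	struct toy_mr *private;		/* back-pointer to the MR, NULL after revoke */
};

static void toy_odp_init(struct toy_umem_odp *odp, struct toy_mr *mr)
{
	pthread_mutex_init(&odp->umem_mutex, NULL);
	mr->lkey_valid = true;
	odp->private = mr;
}

/* Models posting a UMR work request; -1 stands for the CQE with error. */
static int toy_umr_post(const struct toy_mr *mr)
{
	return mr->lkey_valid ? 0 : -1;
}

/* Dereg side: revoke under the mutex, then clear the back-pointer. */
static int toy_revoke_mr(struct toy_umem_odp *odp)
{
	struct toy_mr *mr;
	int ret;

	pthread_mutex_lock(&odp->umem_mutex);
	mr = odp->private;
	assert(mr != NULL);		/* sketch only supports a single revoke */
	ret = toy_umr_post(mr);
	if (ret == 0) {
		mr->lkey_valid = false;	/* lkey is now freed in "hardware" */
		odp->private = NULL;	/* later invalidations see no MR */
	}
	pthread_mutex_unlock(&odp->umem_mutex);
	return ret;
}

/*
 * Invalidation side. Returns 0 if the translation update was posted,
 * 1 if it was skipped because the MR is already revoked, -1 on error.
 */
static int toy_invalidate_range(struct toy_umem_odp *odp)
{
	int ret;

	pthread_mutex_lock(&odp->umem_mutex);
	if (odp->private == NULL)
		ret = 1;		/* nothing to invalidate any more */
	else
		ret = toy_umr_post(odp->private);
	pthread_mutex_unlock(&odp->umem_mutex);
	return ret;
}
```

Because both paths take the same mutex and the revoke clears the pointer
before dropping it, an invalidation in this model either runs entirely before
the revoke or observes NULL and skips the post; it never posts against a
freed lkey.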
| |
| [1] From dmesg: |
| |
| infiniband rocep8s0f0: dump_cqe:277:(pid 0): WC error: 6, Message: memory bind operation error |
| cqe_dump: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |
| cqe_dump: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |
| cqe_dump: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |
| cqe_dump: 00000030: 00 00 00 00 08 00 78 06 25 00 11 b9 00 0e dd d2 |
| |
| WARNING: CPU: 15 PID: 1506 at drivers/infiniband/hw/mlx5/umr.c:394 mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib] |
| Modules linked in: ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core fuse mlx5_core |
| CPU: 15 UID: 0 PID: 1506 Comm: ibv_rc_pingpong Not tainted 6.12.0-rc7+ #1626 |
| Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 |
| RIP: 0010:mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib] |
| [..] |
| Call Trace: |
| <TASK> |
| mlx5r_umr_update_xlt+0x23c/0x3e0 [mlx5_ib] |
| mlx5_ib_invalidate_range+0x2e1/0x330 [mlx5_ib] |
| __mmu_notifier_invalidate_range_start+0x1e1/0x240 |
| zap_page_range_single+0xf1/0x1a0 |
| madvise_vma_behavior+0x677/0x6e0 |
| do_madvise+0x1a2/0x4b0 |
| __x64_sys_madvise+0x25/0x30 |
| do_syscall_64+0x6b/0x140 |
| entry_SYSCALL_64_after_hwframe+0x76/0x7e |
| |
| The Linux kernel CVE team has assigned CVE-2025-21732 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 5.13 with commit e6fb246ccafbdfc86e0750af021628132fdbceac and fixed in 6.12.14 with commit b13d32786acabf70a7b04ed24b7468fc3c82977c |
| Issue introduced in 5.13 with commit e6fb246ccafbdfc86e0750af021628132fdbceac and fixed in 6.13.3 with commit 5297f5ddffef47b94172ab0d3d62270002a3dcc1 |
| Issue introduced in 5.13 with commit e6fb246ccafbdfc86e0750af021628132fdbceac and fixed in 6.14 with commit abb604a1a9c87255c7a6f3b784410a9707baf467 |
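
As a rough aid for reading the three entries above, the following sketch
checks whether a mainline or stable version number falls in the affected
range. It encodes only the versions listed in this announcement; the struct
kver type and function names are hypothetical, and the result does not
account for later backports or for vendor kernels that carry the fix under a
different version number.

```c
#include <stdbool.h>

/* Hypothetical helper type: a kernel version as major.minor.patch. */
struct kver {
	int major, minor, patch;
};

/* Returns <0, 0, >0 like strcmp(), comparing major, then minor, then patch. */
static int kver_cmp(struct kver a, struct kver b)
{
	if (a.major != b.major)
		return a.major - b.major;
	if (a.minor != b.minor)
		return a.minor - b.minor;
	return a.patch - b.patch;
}

/*
 * True if v is affected according to the entries listed above:
 * introduced in 5.13, fixed in 6.12.14, 6.13.3 and 6.14.
 */
static bool cve_2025_21732_affected(struct kver v)
{
	if (kver_cmp(v, (struct kver){5, 13, 0}) < 0)
		return false;		/* predates the introducing commit */
	if (kver_cmp(v, (struct kver){6, 14, 0}) >= 0)
		return false;		/* fixed in mainline 6.14 */
	if (v.major == 6 && v.minor == 12)
		return v.patch < 14;	/* fixed in 6.12.14 */
	if (v.major == 6 && v.minor == 13)
		return v.patch < 3;	/* fixed in 6.13.3 */
	return true;			/* no fixed release listed for this branch */
}
```

For example, under these assumptions 6.12.13 is reported as affected and
6.12.14 is not.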
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2025-21732 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| drivers/infiniband/hw/mlx5/mr.c |
| drivers/infiniband/hw/mlx5/odp.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/b13d32786acabf70a7b04ed24b7468fc3c82977c |
| https://git.kernel.org/stable/c/5297f5ddffef47b94172ab0d3d62270002a3dcc1 |
| https://git.kernel.org/stable/c/abb604a1a9c87255c7a6f3b784410a9707baf467 |