 From bippy-1.2.0 Mon Sep 17 00:00:00 2001
 From: Greg Kroah-Hartman <gregkh@kernel.org>
 To: <linux-cve-announce@vger.kernel.org>
 Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
 Subject: CVE-2025-37772: RDMA/cma: Fix workqueue crash in cma_netevent_work_handler

 Description
 ===========

 In the Linux kernel, the following vulnerability has been resolved:

 RDMA/cma: Fix workqueue crash in cma_netevent_work_handler

 struct rdma_cm_id has member "struct work_struct net_work"
 that is reused for enqueuing cma_netevent_work_handler()s
 onto cma_wq.

 The crash below [1] can occur when cma_netevent_callback() is called
 more than once in quick succession for the same rdma_cm_id. Each call
 runs INIT_WORK() and then enqueues cma_netevent_work_handler(), and
 there is no guarantee that the previously queued work item has run
 between two successive calls. The 2nd INIT_WORK() therefore overwrites
 the 1st work item while it is still queued or about to run, despite
 id_table_lock being held during enqueue.

 The drgn analysis [2] also indicates that the work item was likely
 overwritten.

 Fix this by moving the INIT_WORK() to __rdma_create_id(),
 so that it doesn't race with any existing queue_work() or
 its worker thread.
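
 The overwrite can be illustrated with a toy model. This is only a
 sketch in Python, not kernel code: the Work class, the helper names and
 the WORK_STRUCT_PENDING value are assumptions made for illustration,
 and the interleaving shown is one plausible ordering, not a
 reconstruction of the actual crash.

```python
# Toy model only: mimics the idea, not the kernel's real data layout.
WORK_STRUCT_PENDING = 0x1  # assumed stand-in value for this sketch
WORK_STRUCT_PWQ = 0x4      # value reported by drgn in [2]


class Work:
    """Hypothetical stand-in for struct work_struct."""

    def __init__(self):
        self.data = 0
        self.pwq = None


def init_work(work):
    """Like INIT_WORK(): unconditionally reset the item's state."""
    work.data = 0
    work.pwq = None


def queue_work(queue, work, pwq):
    """Like queue_work(): do nothing if the item is already pending."""
    if work.data & WORK_STRUCT_PENDING:
        return False
    work.data = WORK_STRUCT_PENDING | WORK_STRUCT_PWQ
    work.pwq = pwq
    queue.append(work)
    return True


def get_work_pwq(work):
    """Return the pwq only if the PWQ bit is set, else None."""
    return work.pwq if work.data & WORK_STRUCT_PWQ else None


def buggy_sequence():
    """INIT_WORK on every event: the second reset hits a queued item."""
    queue, work = [], Work()
    init_work(work)
    queue_work(queue, work, "pwq0")   # first netevent
    init_work(work)                   # second netevent resets the queued item
    item = queue.pop(0)               # worker picks up the item
    return get_work_pwq(item)


def fixed_sequence():
    """INIT_WORK once at creation: a repeated enqueue is a no-op."""
    queue, work = [], Work()
    init_work(work)                   # done once, as in __rdma_create_id()
    queue_work(queue, work, "pwq0")   # first netevent
    queued_again = queue_work(queue, work, "pwq0")  # second netevent
    item = queue.pop(0)
    return get_work_pwq(item), queued_again


print(buggy_sequence())   # None
print(fixed_sequence())   # ('pwq0', False)
```

 In the buggy ordering the worker sees no pwq for an item it was handed,
 which corresponds to the NULL pwq in the analysis below. With a single
 initialization, the second enqueue finds the item pending and leaves it
 untouched.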

 [1] Trimmed crash stack:
 =============================================
 BUG: kernel NULL pointer dereference, address: 0000000000000008
 kworker/u256:6 ... 6.12.0-0...
 Workqueue:  cma_netevent_work_handler [rdma_cm] (rdma_cm)
 RIP: 0010:process_one_work+0xba/0x31a
 Call Trace:
  worker_thread+0x266/0x3a0
  kthread+0xcf/0x100
  ret_from_fork+0x31/0x50
  ret_from_fork_asm+0x1a/0x30
 =============================================

 [2] drgn crash analysis:

 >>> trace = prog.crashed_thread().stack_trace()
 >>> trace
 (0)  crash_setup_regs (./arch/x86/include/asm/kexec.h:111:15)
 (1)  __crash_kexec (kernel/crash_core.c:122:4)
 (2)  panic (kernel/panic.c:399:3)
 (3)  oops_end (arch/x86/kernel/dumpstack.c:382:3)
 ...
 (8)  process_one_work (kernel/workqueue.c:3168:2)
 (9)  process_scheduled_works (kernel/workqueue.c:3310:3)
 (10) worker_thread (kernel/workqueue.c:3391:4)
 (11) kthread (kernel/kthread.c:389:9)

 Line workqueue.c:3168 for this kernel version is in process_one_work():
 3168	strscpy(worker->desc, pwq->wq->name, WORKER_DESC_LEN);

 >>> trace[8]["work"]
 *(struct work_struct *)0xffff92577d0a21d8 = {
 	.data = (atomic_long_t){
 		.counter = (s64)536870912,    <=== Note
 	},
 	.entry = (struct list_head){
 		.next = (struct list_head *)0xffff924d075924c0,
 		.prev = (struct list_head *)0xffff924d075924c0,
 	},
 	.func = (work_func_t)cma_netevent_work_handler+0x0 = 0xffffffffc2cec280,
 }

 The suspicion is that pwq is NULL:
 >>> trace[8]["pwq"]
 (struct pool_workqueue *)<absent>

 In process_one_work(), pwq is assigned from:
 struct pool_workqueue *pwq = get_work_pwq(work);

 and get_work_pwq() is:
 static struct pool_workqueue *get_work_pwq(struct work_struct *work)
 {
  	unsigned long data = atomic_long_read(&work->data);

  	if (data & WORK_STRUCT_PWQ)
  		return work_struct_pwq(data);
  	else
  		return NULL;
 }

 WORK_STRUCT_PWQ is 0x4:
 >>> print(repr(prog['WORK_STRUCT_PWQ']))
 Object(prog, 'enum work_flags', value=4)

 But work->data is 536870912, which is 0x20000000, so the
 WORK_STRUCT_PWQ bit (0x4) is clear. get_work_pwq() therefore returns
 NULL and we crash in process_one_work():
 3168	strscpy(worker->desc, pwq->wq->name, WORKER_DESC_LEN);
 =============================================
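
 The bit test above can be checked by hand. A minimal sketch in plain
 Python (no drgn needed), using only the two values shown in the dump;
 the function name pwq_bit_is_set is an assumption for illustration:

```python
WORK_STRUCT_PWQ = 0x4   # value reported by drgn above
data = 536870912        # work->data.counter from the dump


def pwq_bit_is_set(value):
    """Mirror the `data & WORK_STRUCT_PWQ` test in get_work_pwq()."""
    return bool(value & WORK_STRUCT_PWQ)


print(hex(data))             # 0x20000000
print(data == 1 << 29)       # True
print(pwq_bit_is_set(data))  # False
```

 Only bit 29 is set in work->data, so the PWQ test fails and the NULL
 branch of get_work_pwq() is taken.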

 The Linux kernel CVE team has assigned CVE-2025-37772 to this issue.


 Affected and fixed versions
 ===========================

 	Issue introduced in 6.0 with commit 925d046e7e52c71c3531199ce137e141807ef740 and fixed in 6.1.135 with commit 51003b2c872c63d28bcf5fbcc52cf7b05615f7b7
 	Issue introduced in 6.0 with commit 925d046e7e52c71c3531199ce137e141807ef740 and fixed in 6.6.88 with commit c2b169fc7a12665d8a675c1ff14bca1b9c63fb9a
 	Issue introduced in 6.0 with commit 925d046e7e52c71c3531199ce137e141807ef740 and fixed in 6.12.25 with commit d23fd7a539ac078df119707110686a5b226ee3bb
 	Issue introduced in 6.0 with commit 925d046e7e52c71c3531199ce137e141807ef740 and fixed in 6.14.4 with commit b172a4a0de254f1fcce7591833a9a63547c2f447
 	Issue introduced in 6.0 with commit 925d046e7e52c71c3531199ce137e141807ef740 and fixed in 6.15 with commit 45f5dcdd049719fb999393b30679605f16ebce14
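
 The table above can be turned into a simple version check. This is a
 sketch only: the function names are assumptions, it looks at the
 upstream version number alone, and it cannot know about distribution
 kernels that carry backported fixes under a different version string.

```python
# Versions taken from the table above.
INTRODUCED = (6, 0)
FIXED_MAINLINE = (6, 15)
FIXED_IN_STABLE = {(6, 1): 135, (6, 6): 88, (6, 12): 25, (6, 14): 4}


def parse_version(version):
    """Parse 'X.Y.Z-suffix' into (X, Y, Z); a missing Z counts as 0.

    Limitation: suffixes such as '-rc1' are ignored.
    """
    parts = version.split("-")[0].split(".")
    nums = [int(p) for p in parts[:3]]
    while len(nums) < 3:
        nums.append(0)
    return tuple(nums)


def cve_2025_37772_status(version):
    """Classify an upstream kernel version against the table above."""
    major, minor, patch = parse_version(version)
    series = (major, minor)
    if series < INTRODUCED:
        return "not affected"
    if series >= FIXED_MAINLINE:
        return "fixed"
    if series in FIXED_IN_STABLE:
        if patch >= FIXED_IN_STABLE[series]:
            return "fixed"
        return "affected"
    # Series between 6.0 and 6.15 with no fix listed in the table.
    return "affected"


for v in ("5.15.180", "6.1.134", "6.1.135", "6.12.0-0", "6.13.5", "6.15"):
    print(v, cve_2025_37772_status(v))
```

 The 6.12.0 kernel from the crash report [1] is classified as affected,
 which matches the table.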

 Please see https://www.kernel.org for a full list of kernel versions
 currently supported by the kernel community.

 Unaffected versions might change over time as fixes are backported to
 older supported kernel versions.  The official CVE entry at
 	https://cve.org/CVERecord/?id=CVE-2025-37772
 will be updated if fixes are backported.  Please check it for the most
 up-to-date information about this issue.


 Affected files
 ==============

 The file(s) affected by this issue are:
 	drivers/infiniband/core/cma.c


 Mitigation
 ==========

 The Linux kernel CVE team recommends that you update to the latest
 stable kernel version for this and many other bugfixes.  Individual
 changes are never tested alone, but rather are part of a larger kernel
 release.  Cherry-picking individual commits is not recommended or
 supported by the Linux kernel community at all.  If, however, updating
 to the latest release is impossible, the individual changes to resolve
 this issue can be found at these commits:
 	https://git.kernel.org/stable/c/51003b2c872c63d28bcf5fbcc52cf7b05615f7b7
 	https://git.kernel.org/stable/c/c2b169fc7a12665d8a675c1ff14bca1b9c63fb9a
 	https://git.kernel.org/stable/c/d23fd7a539ac078df119707110686a5b226ee3bb
 	https://git.kernel.org/stable/c/b172a4a0de254f1fcce7591833a9a63547c2f447
 	https://git.kernel.org/stable/c/45f5dcdd049719fb999393b30679605f16ebce14