 From bippy-1.2.0 Mon Sep 17 00:00:00 2001
 From: Greg Kroah-Hartman <gregkh@kernel.org>
 To: <linux-cve-announce@vger.kernel.org>
 Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
 Subject: CVE-2025-37821: sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash

 Description
 ===========

 In the Linux kernel, the following vulnerability has been resolved:

 sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash

 There is a code path in dequeue_entities() that can set the slice of a
 sched_entity to U64_MAX, which sometimes results in a crash.

 The offending case is when dequeue_entities() is called to dequeue a
 delayed group entity, and then the entity's parent's dequeue is delayed.
 In that case:

 1. In the if (entity_is_task(se)) else block at the beginning of
    dequeue_entities(), slice is set to
    cfs_rq_min_slice(group_cfs_rq(se)). If the entity was delayed, then
    it has no queued tasks, so cfs_rq_min_slice() returns U64_MAX.
 2. The first for_each_sched_entity() loop dequeues the entity.
 3. If the entity was its parent's only child, then the next iteration
    tries to dequeue the parent.
 4. If the parent's dequeue needs to be delayed, then it breaks from the
    first for_each_sched_entity() loop _without updating slice_.
 5. The second for_each_sched_entity() loop sets the parent's ->slice to
    the saved slice, which is still U64_MAX.
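 The five steps above can be followed in a heavily simplified user-space model of the slice bookkeeping in dequeue_entities(). This is a sketch under stated assumptions, not kernel code: struct model_rq, struct model_se and every model_* function are hypothetical stand-ins for struct cfs_rq, struct sched_entity, cfs_rq_min_slice(), dequeue_entity() and dequeue_entities(). Load, throttling, h_nr_* accounting and the runqueue tree are all left out, and the slice values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cfs_rq: only what slice tracking needs. */
struct model_rq {
	unsigned int nr_queued;	/* entities currently queued here */
	uint64_t min_slice;	/* what cfs_rq_min_slice() reports when non-empty */
};

/* Hypothetical stand-in for struct sched_entity. */
struct model_se {
	struct model_se *parent;	/* NULL at the top of the hierarchy */
	struct model_rq *rq;		/* like cfs_rq_of(se) */
	struct model_rq *my_q;		/* like group_cfs_rq(se) */
	bool delay;			/* dequeue of this entity gets delayed */
	uint64_t slice;
};

/* Like cfs_rq_min_slice(): U64_MAX when nothing is queued. */
static uint64_t model_min_slice(const struct model_rq *rq)
{
	return rq->nr_queued ? rq->min_slice : UINT64_MAX;
}

/* Like dequeue_entity(): false means "delayed, entity stays queued". */
static bool model_dequeue_one(struct model_se *se)
{
	if (se->delay)
		return false;
	se->rq->nr_queued--;
	return true;
}

/* Slice bookkeeping of the pre-fix dequeue_entities(), group entity case. */
static void model_dequeue_entities_buggy(struct model_se *se)
{
	/* Step 1: a delayed group entity has no queued tasks -> U64_MAX. */
	uint64_t slice = model_min_slice(se->my_q);

	for (; se; se = se->parent) {		/* first loop */
		if (!model_dequeue_one(se))
			break;	/* step 4: parent delayed, slice untouched */
		if (se->rq->nr_queued) {	/* parent has other children */
			slice = model_min_slice(se->rq);
			se = se->parent;
			break;
		}
		/* Steps 2-3: se was the only child, go on to the parent. */
	}

	for (; se; se = se->parent) {		/* second loop */
		se->slice = slice;	/* step 5: delayed parent gets U64_MAX */
		slice = model_min_slice(se->rq);
	}
}

/* Delayed child is its parent's only child, and the parent gets delayed. */
static uint64_t model_scenario_buggy(void)
{
	struct model_rq top_q = { .nr_queued = 1, .min_slice = 3000000 };
	struct model_rq parent_q = { .nr_queued = 1, .min_slice = 3000000 };
	struct model_rq child_q = { .nr_queued = 0, .min_slice = 3000000 };
	struct model_se parent = { .parent = NULL, .rq = &top_q,
				   .my_q = &parent_q, .delay = true };
	struct model_se child = { .parent = &parent, .rq = &parent_q,
				  .my_q = &child_q, .delay = false };

	model_dequeue_entities_buggy(&child);
	return parent.slice;
}
```

 In this model, model_scenario_buggy() returns UINT64_MAX: the delayed parent inherits the sentinel value that the empty child group runqueue reported in step 1.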

 This throws off subsequent calculations with potentially catastrophic
 results. A manifestation we saw in production was:

 6. In update_entity_lag(), se->slice is used to calculate limit, which
    ends up as a huge negative number.
 7. limit is used in se->vlag = clamp(vlag, -limit, limit). Because limit
    is negative, vlag > limit, so se->vlag is set to the same huge
    negative number.
 8. In place_entity(), se->vlag is scaled, which overflows and results in
    another huge (positive or negative) number.
 9. The adjusted lag is subtracted from se->vruntime, which increases or
    decreases se->vruntime by a huge number.
 10. pick_eevdf() calls entity_eligible()/vruntime_eligible(), which
     incorrectly returns false: the vruntime is so far from the other
     vruntimes on the queue that the
     (vruntime - cfs_rq->min_vruntime) * load calculation overflows.
 11. Nothing appears to be eligible, so pick_eevdf() returns NULL.
 12. pick_next_entity() tries to dereference the return value of
     pick_eevdf() and crashes.
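 Steps 7 and 10 come down to plain 64-bit arithmetic, which can be checked in user space. The sketch below is illustrative only: model_clamp(), model_vlag() and model_eligible() are hypothetical helpers, model_clamp() is assumed to follow the evaluation order of the kernel's clamp() (compare against hi first), calc_delta_fair() and the scaling in step 8 are not modeled, and the input values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Modeled on the evaluation order of the kernel's clamp() (an
 * assumption): compare against hi first, then against lo.
 */
static int64_t model_clamp(int64_t val, int64_t lo, int64_t hi)
{
	return val >= hi ? hi : (val <= lo ? lo : val);
}

/* Step 7: se->vlag = clamp(vlag, -limit, limit). */
static int64_t model_vlag(int64_t vlag, int64_t limit)
{
	/* With a negative limit, lo > hi and almost any vlag maps to limit. */
	return model_clamp(vlag, -limit, limit);
}

/*
 * Step 10: eligible if avg >= (s64)(vruntime - min_vruntime) * load.
 * The product is computed modulo 2^64 to mimic the wrapping the kernel
 * gets in practice from -fno-strict-overflow; converting the result
 * back to int64_t assumes two's complement (as on GCC and Clang).
 */
static bool model_eligible(int64_t avg, uint64_t vruntime,
			   uint64_t min_vruntime, int64_t load)
{
	int64_t key = (int64_t)(vruntime - min_vruntime);
	int64_t prod = (int64_t)((uint64_t)key * (uint64_t)load);

	return avg >= prod;
}
```

 With a negative limit such as -(1 << 62), model_vlag() returns the limit itself for any ordinary vlag, matching step 7. For step 10, an entity 2^62 behind min_vruntime with load 3 should be eligible, but the wrapped product turns positive and model_eligible() returns false.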

 Dumping the cfs_rq states from the core dumps with drgn showed tell-tale
 huge vruntime ranges and bogus vlag values, and I also traced se->slice
 being set to U64_MAX on live systems (which was usually "benign" since
 the rest of the runqueue needed to be in a particular state to crash).

 Fix it in dequeue_entities() by always setting slice from the first
 non-empty cfs_rq.
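 A minimal sketch of the fix in simplified user-space terms, assuming it amounts to refreshing slice from the runqueue that still holds the entity whose dequeue was delayed (the first non-empty cfs_rq on the way up). The model_* names and structs are hypothetical stand-ins for the kernel's cfs_rq_min_slice(), dequeue_entity() and dequeue_entities(); this illustrates the idea rather than reproducing the upstream diff, which is in the commits listed under Mitigation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_rq {			/* hypothetical stand-in for struct cfs_rq */
	unsigned int nr_queued;
	uint64_t min_slice;		/* reported when non-empty */
};

struct model_se {			/* hypothetical stand-in for sched_entity */
	struct model_se *parent;
	struct model_rq *rq;		/* like cfs_rq_of(se) */
	bool delay;			/* dequeue of this entity gets delayed */
	uint64_t slice;
};

/* Like cfs_rq_min_slice(): U64_MAX when nothing is queued. */
static uint64_t model_min_slice(const struct model_rq *rq)
{
	return rq->nr_queued ? rq->min_slice : UINT64_MAX;
}

static void model_dequeue_entities_fixed(struct model_se *se)
{
	uint64_t slice = 0;	/* always overwritten before it is used */

	for (; se; se = se->parent) {		/* first loop */
		if (se->delay) {
			/* se stays queued, so se->rq is non-empty. */
			slice = model_min_slice(se->rq);
			break;
		}
		se->rq->nr_queued--;		/* dequeue se */
		if (se->rq->nr_queued) {	/* first non-empty runqueue */
			slice = model_min_slice(se->rq);
			se = se->parent;
			break;
		}
	}

	for (; se; se = se->parent) {		/* second loop */
		se->slice = slice;
		slice = model_min_slice(se->rq);
	}
}

/* Same hierarchy as in steps 1-5 of the description. */
static uint64_t model_scenario_fixed(void)
{
	struct model_rq top_q = { .nr_queued = 1, .min_slice = 3000000 };
	struct model_rq parent_q = { .nr_queued = 1, .min_slice = 3000000 };
	struct model_se parent = { .parent = NULL, .rq = &top_q, .delay = true };
	struct model_se child = { .parent = &parent, .rq = &parent_q,
				  .delay = false };

	model_dequeue_entities_fixed(&child);
	return parent.slice;
}
```

 With the same hierarchy as in steps 1-5, model_scenario_fixed() returns the example value 3000000 reported by the top-level runqueue instead of U64_MAX.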

 The Linux kernel CVE team has assigned CVE-2025-37821 to this issue.


 Affected and fixed versions
 ===========================

 	Issue introduced in 6.12 with commit aef6987d89544d63a47753cf3741cabff0b5574c and fixed in 6.12.29 with commit 86b37810fa1e40b93171da023070b99ccbb4ea04
 	Issue introduced in 6.12 with commit aef6987d89544d63a47753cf3741cabff0b5574c and fixed in 6.14.5 with commit 50a665496881262519f115f1bfe5822f30580eb0
 	Issue introduced in 6.12 with commit aef6987d89544d63a47753cf3741cabff0b5574c and fixed in 6.15 with commit bbce3de72be56e4b5f68924b7da9630cc89aa1a8
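 For a quick automated check against the list above, the ranges can be encoded as a small predicate. This is a hypothetical helper, not an official tool: it uses only the three fixed releases listed here, treats 6.13.y (for which no fix is listed) as affected, ignores release candidates, and cannot see vendor backports, so the CVE record remains the authoritative source.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: true if major.minor.patch falls in the affected
 * range according to the fixed versions listed in this announcement.
 */
static bool cve_2025_37821_listed_affected(int major, int minor, int patch)
{
	if (major != 6)
		return false;		/* before 6.x, or after the 6.15 fix */
	if (minor < 12)
		return false;		/* predates the introducing commit */
	if (minor >= 15)
		return false;		/* fixed in 6.15 */
	if (minor == 12)
		return patch < 29;	/* fixed in 6.12.29 */
	if (minor == 14)
		return patch < 5;	/* fixed in 6.14.5 */
	return true;			/* 6.13.y: no fix listed */
}
```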

 Please see https://www.kernel.org for a full list of kernel versions
 currently supported by the kernel community.

 Unaffected versions might change over time as fixes are backported to
 older supported kernel versions.  The official CVE entry at
 	https://cve.org/CVERecord/?id=CVE-2025-37821
 will be updated if fixes are backported; please check it for the most
 up-to-date information about this issue.


 Affected files
 ==============

 The file(s) affected by this issue are:
 	kernel/sched/fair.c


 Mitigation
 ==========

 The Linux kernel CVE team recommends that you update to the latest
 stable kernel version for this and many other bugfixes.  Individual
 changes are never tested alone, but rather are part of a larger kernel
 release.  Cherry-picking individual commits is not recommended or
 supported by the Linux kernel community at all.  If, however, updating to
 the latest release is impossible, the individual changes to resolve this
 issue can be found at these commits:
 	https://git.kernel.org/stable/c/86b37810fa1e40b93171da023070b99ccbb4ea04
 	https://git.kernel.org/stable/c/50a665496881262519f115f1bfe5822f30580eb0
 	https://git.kernel.org/stable/c/bbce3de72be56e4b5f68924b7da9630cc89aa1a8