| From bippy-1.2.0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@kernel.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2025-37821: sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash |
| |
| There is a code path in dequeue_entities() that can set the slice of a |
| sched_entity to U64_MAX, which sometimes results in a crash. |
| |
| The offending case is when dequeue_entities() is called to dequeue a |
| delayed group entity, and then the entity's parent's dequeue is delayed. |
| In that case: |
| |
| 1. In the if (entity_is_task(se)) else block at the beginning of |
| dequeue_entities(), slice is set to |
| cfs_rq_min_slice(group_cfs_rq(se)). If the entity was delayed, then |
| it has no queued tasks, so cfs_rq_min_slice() returns U64_MAX. |
| 2. The first for_each_sched_entity() loop dequeues the entity. |
| 3. If the entity was its parent's only child, then the next iteration |
| tries to dequeue the parent. |
| 4. If the parent's dequeue needs to be delayed, then it breaks from the |
| first for_each_sched_entity() loop _without updating slice_. |
| 5. The second for_each_sched_entity() loop sets the parent's ->slice to |
| the saved slice, which is still U64_MAX. |
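
The control flow in steps 1-5 can be sketched in a small user-space model. This is only an illustration under stated assumptions, not the kernel's code: struct toy_se, its fields, and both toy_ functions are hypothetical stand-ins, and the two loops copy only the slice handling described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct sched_entity; every field here is hypothetical. */
struct toy_se {
    struct toy_se *parent;
    uint64_t slice;
    bool is_task;
    bool only_child;          /* its runqueue is empty once it is dequeued */
    bool dequeue_delayed;     /* dequeueing this entity would be delayed */
    uint64_t group_min_slice; /* models cfs_rq_min_slice(group_cfs_rq(se)) */
    uint64_t rq_min_slice;    /* models cfs_rq_min_slice(cfs_rq_of(se)) */
};

/* Mirrors the slice handling described in steps 1-5, not the real function. */
static void toy_dequeue_entities(struct toy_se *se)
{
    assert(se != NULL);

    /* Step 1: a group entity with no queued tasks yields UINT64_MAX. */
    uint64_t slice = se->is_task ? se->slice : se->group_min_slice;

    /* First loop (steps 2-4). */
    for (; se; se = se->parent) {
        if (se->dequeue_delayed)
            break;                    /* step 4: slice is NOT refreshed */
        if (!se->only_child) {
            slice = se->rq_min_slice; /* the parent keeps other children */
            se = se->parent;
            break;
        }
    }

    /* Second loop (step 5): propagate the saved slice upward. */
    for (; se; se = se->parent) {
        se->slice = slice;
        slice = se->rq_min_slice;
    }
}

/* Builds the scenario from the text and returns the parent's final slice. */
static uint64_t toy_reproduce(void)
{
    struct toy_se parent = {
        .parent = NULL, .slice = 3000000, .is_task = false,
        .only_child = false, .dequeue_delayed = true,
        .group_min_slice = UINT64_MAX, .rq_min_slice = 3000000,
    };
    struct toy_se child = {
        .parent = &parent, .slice = 3000000, .is_task = false,
        .only_child = true, .dequeue_delayed = false,
        .group_min_slice = UINT64_MAX, .rq_min_slice = UINT64_MAX,
    };

    toy_dequeue_entities(&child);
    return parent.slice;
}
```

In this model toy_reproduce() returns UINT64_MAX: the parent's slice of 3000000 is overwritten with the stale value saved in step 1.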
| |
| This throws off subsequent calculations with potentially catastrophic |
| results. A manifestation we saw in production was: |
| |
| 6. In update_entity_lag(), se->slice is used to calculate limit, which |
| ends up as a huge negative number. |
| 7. limit is used in se->vlag = clamp(vlag, -limit, limit). Because limit |
| is negative, vlag > limit, so se->vlag is set to the same huge |
| negative number. |
| 8. In place_entity(), se->vlag is scaled, which overflows and results in |
| another huge (positive or negative) number. |
| 9. The adjusted lag is subtracted from se->vruntime, which increases or |
| decreases se->vruntime by a huge number. |
| 10. pick_eevdf() calls entity_eligible()/vruntime_eligible(), which |
| incorrectly returns false because the vruntime is so far from the |
| other vruntimes on the queue, causing the |
| (vruntime - cfs_rq->min_vruntime) * load calculation to overflow. |
| 11. Nothing appears to be eligible, so pick_eevdf() returns NULL. |
| 12. pick_next_entity() tries to dereference the return value of |
| pick_eevdf() and crashes. |
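
Steps 7 and 10 are plain integer arithmetic and can be checked in isolation. The sketch below assumes that clamp() behaves like min(max(val, lo), hi) and uses the GCC/Clang builtin __builtin_mul_overflow to detect the overflow without triggering it; the toy_ helpers and the sample values are hypothetical and do not model the kernel's weight scaling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed clamp semantics: min(max(val, lo), hi). */
static int64_t toy_clamp(int64_t val, int64_t lo, int64_t hi)
{
    int64_t v = val > lo ? val : lo;
    return v < hi ? v : hi;
}

/* Step 7: with a negative limit the bounds are inverted (lo > hi),
 * so any ordinary vlag collapses to the upper bound, i.e. limit. */
static int64_t toy_update_vlag(int64_t vlag, int64_t limit)
{
    assert(limit != INT64_MIN); /* -limit would itself overflow */
    return toy_clamp(vlag, -limit, limit);
}

/* Step 10: reports whether key * load overflows a signed 64-bit
 * integer, where key models (vruntime - min_vruntime). */
static bool toy_key_times_load_overflows(int64_t key, int64_t load)
{
    int64_t product;
    return __builtin_mul_overflow(key, load, &product);
}
```

For example, toy_update_vlag(1000, -(INT64_C(1) << 62)) returns the negative limit itself, while a sane limit such as 5000 leaves a vlag of 1000 unchanged.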
| |
| Dumping the cfs_rq states from the core dumps with drgn showed tell-tale |
| huge vruntime ranges and bogus vlag values, and I also traced se->slice |
| being set to U64_MAX on live systems (which was usually "benign" since |
| the rest of the runqueue needed to be in a particular state to crash). |
| |
| Fix it in dequeue_entities() by always setting slice from the first |
| non-empty cfs_rq. |
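
The idea behind the fix, taking the slice from the first non-empty cfs_rq, can be sketched as a helper over a list of per-level minimum slices. This is a hypothetical illustration rather than the actual patch: the array layout, the TOY_EMPTY marker, and the fallback parameter are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marker for "this runqueue has no queued entities" in the toy model. */
#define TOY_EMPTY UINT64_MAX

/*
 * min_slices[i] models cfs_rq_min_slice() for the i-th runqueue met while
 * walking up the hierarchy. Returns the first value that comes from a
 * non-empty runqueue, or fallback if every level is empty.
 */
static uint64_t toy_first_nonempty_slice(const uint64_t *min_slices,
                                         size_t n, uint64_t fallback)
{
    assert(min_slices != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (min_slices[i] != TOY_EMPTY)
            return min_slices[i];
    }
    return fallback;
}
```

With the walk { TOY_EMPTY, 3000000, 700000 } the helper returns 3000000, so the empty group's U64_MAX is never propagated to an ancestor.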
| |
| The Linux kernel CVE team has assigned CVE-2025-37821 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 6.12 with commit aef6987d89544d63a47753cf3741cabff0b5574c and fixed in 6.12.29 with commit 86b37810fa1e40b93171da023070b99ccbb4ea04 |
| Issue introduced in 6.12 with commit aef6987d89544d63a47753cf3741cabff0b5574c and fixed in 6.14.5 with commit 50a665496881262519f115f1bfe5822f30580eb0 |
| Issue introduced in 6.12 with commit aef6987d89544d63a47753cf3741cabff0b5574c and fixed in 6.15 with commit bbce3de72be56e4b5f68924b7da9630cc89aa1a8 |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2025-37821 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| kernel/sched/fair.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/86b37810fa1e40b93171da023070b99ccbb4ea04 |
| https://git.kernel.org/stable/c/50a665496881262519f115f1bfe5822f30580eb0 |
| https://git.kernel.org/stable/c/bbce3de72be56e4b5f68924b7da9630cc89aa1a8 |