| From bippy-7c5fe7eed585 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2025-21932: mm: abort vma_modify() on merge out of memory failure |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| mm: abort vma_modify() on merge out of memory failure |
| |
| The remainder of vma_modify() relies upon the vmg state remaining pristine |
| after a merge attempt. |
| |
| Usually this is the case. However, in the one edge case where a merge |
| attempt fails not because the specified range is unmergeable, but rather |
| because of an out of memory error arising when attempting to commit the |
| merge, this assumption no longer holds. |
| |
| This results in vmg->start, end being modified, and thus the subsequent |
| attempts to split the VMA will be done with invalid start/end values. |
| |
| Thankfully, it is likely practically impossible for us to hit this in |
| reality, as it would require a maple tree node pre-allocation failure that |
| would likely never happen due to it being 'too small to fail', i.e. the |
| kernel would simply keep retrying reclaim until it succeeded. |
| |
| However, this scenario remains theoretically possible, and what we are |
| doing here is wrong so we must correct it. |
| |
| The safest option is, when this scenario occurs, to simply give up the |
| operation. If we cannot allocate memory to merge, then we cannot allocate |
| memory to split either (perhaps more so!). |
| |
| Any scenario where this would be happening would be under very extreme |
| (likely fatal) memory pressure, so it's best we give up early. |
| |
| So there is no doubt it is appropriate to simply bail out in this |
| scenario. |
| |
| However, in general we must, if at all possible, never assume VMG state is |
| stable after a merge attempt, since merge operations update VMG fields. |
| As a result, also make this clear by storing start, end in local |
| variables. |
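
The bail-out and the local snapshot described above can be sketched in userspace C. This is a simplified model under stated assumptions, not the kernel patch: struct merge_state, try_merge(), modify_range() and the oom flag are hypothetical stand-ins for the vmg, vma_merge_existing_range() and vma_modify(), and the "merge" only mimics the range rewrite described in this report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct vma {
	unsigned long vm_start, vm_end;
};

struct merge_state {		/* loosely models the vmg */
	struct vma *vma;
	unsigned long start, end;
	bool oom;		/* assumed flag: set when the commit ran out of memory */
};

/*
 * Models the two failure modes only; this sketch never merges successfully.
 * "Unmergeable" leaves the state pristine, while the out-of-memory path
 * has already rewritten start/end before it fails.
 */
bool try_merge(struct merge_state *vmg, bool simulate_oom)
{
	if (!simulate_oom)
		return false;			/* unmergeable: state untouched */

	vmg->end = vmg->start;			/* range becomes [vm_start, start) */
	vmg->start = vmg->vma->vm_start;
	vmg->oom = true;
	return false;				/* commit failed, state left modified */
}

/* Returns 0 and the range to split around, or -ENOMEM on a failed commit. */
int modify_range(struct merge_state *vmg, bool simulate_oom,
		 unsigned long *split_start, unsigned long *split_end)
{
	/* Snapshot the requested range before any merge attempt. */
	unsigned long start = vmg->start;
	unsigned long end = vmg->end;

	assert(start < end);

	if (try_merge(vmg, simulate_oom))
		return 0;			/* merged: nothing to split */
	if (vmg->oom)
		return -ENOMEM;			/* give up rather than split */

	/* Split decisions use the snapshot, never the vmg fields. */
	*split_start = start;
	*split_end = end;
	return 0;
}
```

In this model, a request for [5, 10) inside a VMA spanning [0, 10) either reports the original range for splitting, or returns -ENOMEM with the clobbered vmg range [0, 5) never being used.
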
| |
| The issue was reported originally by syzkaller, and by Brad Spengler (via |
| an off-list discussion), and in both instances it manifested as a |
| triggering of the assert: |
| |
| VM_WARN_ON_VMG(start >= end, vmg); |
| |
| In vma_merge_existing_range(). |
| |
| It seems at least one scenario in which this is occurring is one in which |
| the merge being attempted is due to an madvise() across multiple VMAs |
| which looks like this: |
| |
|      start    end |
|      |<------>| |
| |----------|------| |
| |    vma   | next | |
| |----------|------| |
| |
| When madvise_walk_vmas() is invoked, we first find vma in the above |
| (determining prev to be equal to vma as we are offset into vma), and then |
| enter the loop. |
| |
| We determine the end of vma that forms part of the range we are |
| madvise()'ing by setting 'tmp' to this value: |
| |
| /* Here vma->vm_start <= start < (end|vma->vm_end) */ |
| tmp = vma->vm_end; |
| |
| We then invoke the madvise() operation via visit(), letting prev get |
| updated to point to vma as part of the operation: |
| |
| /* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */ |
| error = visit(vma, &prev, start, tmp, arg); |
| |
| Where the visit() function pointer in this instance is |
| madvise_vma_behavior(). |
| |
| As observed in syzkaller reports, it is ultimately madvise_update_vma() |
| that is invoked, calling vma_modify_flags_name() and vma_modify() in turn. |
| |
| Then, in vma_modify(), we attempt the merge: |
| |
| merged = vma_merge_existing_range(vmg); |
| if (merged) |
| 	return merged; |
| |
| We invoke this with vmg->start, end set to start, tmp as such: |
| |
|      start tmp |
|      |<--->| |
| |----------|------| |
| |    vma   | next | |
| |----------|------| |
| |
| We find ourselves in the merge right scenario, but the one in which we |
| cannot remove the middle (we are offset into vma). |
| |
| Here we have a special case where vmg->start, end get set to perhaps |
| unintuitive values - we intended to shrink the middle VMA and expand the |
| next. |
| |
| This means vmg->start, end are set to... vma->vm_start, start. |
| |
| Now the commit_merge() fails, and vmg->start, end are left like this. |
| This means we return to the rest of vma_modify() with vmg->start, end |
| (here denoted as start', end') set as: |
| |
| start' end' |
| |<-->| |
| |----------|------| |
| |    vma   | next | |
| |----------|------| |
| |
| So we now erroneously try to split accordingly. This is where the |
| unfortunate stuff begins. |
| |
| We start with: |
| |
| /* Split any preceding portion of the VMA. */ |
| if (vma->vm_start < vmg->start) { |
| 	... |
| } |
| |
| This doesn't trigger as we are no longer offset into vma at the start. |
| |
| But then we invoke: |
| |
| /* Split any trailing portion of the VMA. */ |
| if (vma->vm_end > vmg->end) { |
| 	... |
| } |
| |
| Which does get invoked. This leaves us with: |
| |
| start' end' |
| |<-->| |
| |----|-----|------| |
| | vma| new | next | |
| |----|-----|------| |
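
The effect of the stale range on the two split checks can be reproduced with a small model. This is only a sketch: struct range, struct split_plan and plan_splits() are hypothetical names, and the function merely evaluates the two conditions quoted above rather than performing any split.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical half-open address range [start, end). */
struct range {
	unsigned long start, end;
};

/* Which splits the two checks above would request for a given range. */
struct split_plan {
	bool split_front;	/* vma->vm_start < requested start */
	bool split_back;	/* vma->vm_end > requested end */
	unsigned long back_at;	/* address of the trailing split, if any */
};

struct split_plan plan_splits(struct range vma, struct range req)
{
	struct split_plan p = { false, false, 0 };

	assert(vma.start < vma.end);

	/* Split any preceding portion of the VMA. */
	if (vma.start < req.start)
		p.split_front = true;

	/* Split any trailing portion of the VMA. */
	if (vma.end > req.end) {
		p.split_back = true;
		p.back_at = req.end;
	}
	return p;
}
```

With a VMA spanning [0, 10), the intended range [5, 10) asks only for a leading split, whereas the stale range [0, 5) asks only for a trailing split at 5, which is the layout shown in the diagram above.
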
| |
| We then return ultimately to madvise_walk_vmas(). Here 'new' is unknown, |
| and putting back the values known in this function we are faced with: |
| |
|      start tmp   end |
|      |     |     | |
| |----|-----|------| |
| | vma| new | next | |
| |----|-----|------| |
|  prev |
| |
| Then: |
| |
| start = tmp; |
| |
| So: |
| |
|            start end |
|            |     | |
| |----|-----|------| |
| | vma| new | next | |
| |----|-----|------| |
|  prev |
| |
| The following code does not cause anything to happen: |
| |
| if (prev && start < prev->vm_end) |
| 	start = prev->vm_end; |
| if (start >= end) |
| 	break; |
| |
| And then we invoke: |
| |
| if (prev) |
| 	vma = find_vma(mm, prev->vm_end); |
| |
| Which is where a problem occurs - we don't know about 'new' so we |
| essentially look for the vma after prev, which is new, whereas we actually |
| intended to discover next! |
| |
| So we end up with: |
| |
|            start end |
|            |     | |
| |----|-----|------| |
| |prev| vma | next | |
| |----|-----|------| |
| |
| And we have successfully bypassed all of the checks madvise_walk_vmas() |
| has to ensure early exit should we end up moving out of range. |
| |
| We loop around, and hit: |
| |
| /* Here vma->vm_start <= start < (end|vma->vm_end) */ |
| tmp = vma->vm_end; |
| |
| Oh dear. Now we have: |
| |
|             tmp |
|            start end |
|            |     | |
| |----|-----|------| |
| |prev| vma | next | |
| |----|-----|------| |
| |
| We then invoke: |
| |
| /* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */ |
| error = visit(vma, &prev, start, tmp, arg); |
| |
| Where start == tmp. That is, a zero range. This is not good. |
| |
| We invoke visit() which is madvise_vma_behavior() which does not check the |
| range (for good reason, it assumes all checks have been done before it was |
| called), which in turn finally calls madvise_update_vma(). |
| |
| The madvise_update_vma() function calls vma_modify_flags_name() in turn, |
| which ultimately invokes vma_modify() with... start == end. |
| |
| vma_modify() calls vma_merge_existing_range() and finally we hit: |
| |
| VM_WARN_ON_VMG(start >= end, vmg); |
| |
| Which triggers, as start == end. |
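
The walk described above can be replayed in a userspace model to see where the zero-length range comes from. This is a sketch under assumptions, not the kernel code: find_vma_model() is a linear-scan stand-in for find_vma(), next_visit() and struct visit_range are hypothetical, and only the loop steps quoted in this report are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct vma {
	unsigned long vm_start, vm_end;
};

/* Stand-in for find_vma(): first VMA whose end lies above addr, or NULL. */
const struct vma *find_vma_model(const struct vma *v, size_t n,
				 unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr < v[i].vm_end)
			return &v[i];
	return NULL;
}

struct visit_range {
	bool valid;		/* false once the walk has finished */
	unsigned long start, tmp;
};

/*
 * Models the tail of one loop iteration and the head of the next one:
 * 'tmp' is the end of the range that was just visited, 'prev' is what
 * the previous visit left behind.
 */
struct visit_range next_visit(const struct vma *v, size_t n,
			      const struct vma *prev,
			      unsigned long tmp, unsigned long end)
{
	struct visit_range r = { false, 0, 0 };
	unsigned long start = tmp;
	const struct vma *vma;

	assert(tmp <= end);

	if (prev && start < prev->vm_end)
		start = prev->vm_end;
	if (start >= end)
		return r;

	/* The lookup is keyed on prev->vm_end, not on start. */
	vma = find_vma_model(v, n, prev ? prev->vm_end : start);
	if (!vma)
		return r;

	r.valid = true;
	r.start = start;
	r.tmp = vma->vm_end < end ? vma->vm_end : end;
	return r;
}
```

For the expected layout [0, 10), [10, 16) with prev being the first VMA, the next visit covers [10, 14) when end is 14. For the erroneously split layout [0, 5), [5, 10), [10, 16) with prev still being the first VMA, the lookup lands on 'new' and the next visit is [10, 10), the empty range that trips the assert.
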
| |
| While it might be useful to add some CONFIG_DEBUG_VM asserts in these |
| instances to catch this kind of error, since we have just eliminated any |
| possibility of that happening, we will add such asserts separately so as |
| to reduce churn and aid backporting. |
| |
| The Linux kernel CVE team has assigned CVE-2025-21932 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 6.12 with commit 2f1c6611b0a89afcb8641471af5f223c9caa01e0 and fixed in 6.12.19 with commit 79636d2981b066acd945117387a9533f56411f6f |
| Issue introduced in 6.12 with commit 2f1c6611b0a89afcb8641471af5f223c9caa01e0 and fixed in 6.13.7 with commit 53fd215f7886a1e8dea5a9ca1391dbb697fff601 |
| Issue introduced in 6.12 with commit 2f1c6611b0a89afcb8641471af5f223c9caa01e0 and fixed in 6.14 with commit 47b16d0462a460000b8f05dfb1292377ac48f3ca |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2025-21932 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| mm/vma.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/79636d2981b066acd945117387a9533f56411f6f |
| https://git.kernel.org/stable/c/53fd215f7886a1e8dea5a9ca1391dbb697fff601 |
| https://git.kernel.org/stable/c/47b16d0462a460000b8f05dfb1292377ac48f3ca |