| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2023-52934: mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups |
| |
| In commit 34488399fa08 ("mm/madvise: add file and shmem support to |
| MADV_COLLAPSE") we make the following change to find_pmd_or_thp_or_none(): |
| |
| - if (!pmd_present(pmde)) |
| - return SCAN_PMD_NULL; |
| + if (pmd_none(pmde)) |
| + return SCAN_PMD_NONE; |
| |
| This was for use by MADV_COLLAPSE file/shmem codepaths, where |
| MADV_COLLAPSE might identify a pte-mapped hugepage, only to have |
| khugepaged race in, free the pte table, and clear the pmd. Such codepaths |
| include: |
| |
| A) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER |
| already in the pagecache. |
| B) In retract_page_tables(), if we fail to grab mmap_lock for the target |
| mm/address. |
| |
| In these cases, collapse_pte_mapped_thp() really does expect a none (not |
| just !present) pmd, and we want to suitably identify that case separate |
| from the case where no pmd is found, or it's a bad-pmd (of course, many |
| things could happen once we drop mmap_lock, and the pmd could plausibly |
| undergo multiple transitions due to intervening fault, split, etc). |
| Regardless, the code is prepared to install a huge-pmd only when the existing |
| pmd entry is either a genuine pte-table-mapping-pmd, or the none-pmd. |
| |
| However, the commit introduces a logical hole; namely, that we've allowed |
| !none- && !huge- && !bad-pmds to be classified as genuine |
| pte-table-mapping-pmds. One example that could leak through is a swap |
| entry. The pmd values aren't checked again before use in |
| pte_offset_map_lock(), which is expecting nothing less than a genuine |
| pte-table-mapping-pmd. |
| |
| We want to put back the !pmd_present() check (below the pmd_none() check), |
| but need to be careful to deal with subtleties in pmd transitions and |
| treatments by various arch. |
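| |
| The effect of the check ordering can be sketched in a small userspace |
| model. This is only an illustration, not the kernel code: struct toy_pmd, |
| the TOY_SCAN_* names and both classify_*() helpers are hypothetical, the |
| arch predicates are reduced to precomputed booleans, and the placement of |
| the devmap check in the fixed ordering is an assumption based on the case |
| analysis in this message. |
| |

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel's scan results; names are illustrative. */
enum toy_scan {
    TOY_SCAN_SUCCEED,    /* treated as a genuine pte-table-mapping pmd */
    TOY_SCAN_PMD_NONE,   /* pmd is none */
    TOY_SCAN_PMD_NULL,   /* no usable pmd */
    TOY_SCAN_PMD_MAPPED, /* already a huge pmd */
};

/* Results of the arch predicates, all evaluated on one pmd snapshot. */
struct toy_pmd {
    bool none;
    bool present;
    bool trans_huge;
    bool devmap;
    bool bad;
};

/* Ordering described for commit 34488399fa08: no present check at all. */
static enum toy_scan classify_before_fix(struct toy_pmd p)
{
    if (p.none)
        return TOY_SCAN_PMD_NONE;
    if (p.trans_huge)
        return TOY_SCAN_PMD_MAPPED;
    if (p.bad)
        return TOY_SCAN_PMD_NULL;
    return TOY_SCAN_SUCCEED;
}

/* Assumed fixed ordering: !present is rejected right below the none check. */
static enum toy_scan classify_after_fix(struct toy_pmd p)
{
    if (p.none)
        return TOY_SCAN_PMD_NONE;
    if (!p.present)
        return TOY_SCAN_PMD_NULL;
    if (p.trans_huge)
        return TOY_SCAN_PMD_MAPPED;
    if (p.devmap)
        return TOY_SCAN_PMD_NULL;
    if (p.bad)
        return TOY_SCAN_PMD_NULL;
    return TOY_SCAN_SUCCEED;
}

/* A swap-like entry: !none, !present, !huge, !devmap, !bad. */
static const struct toy_pmd toy_swap_entry = {
    false, false, false, false, false
};
```

| |
| In this model, classify_before_fix(toy_swap_entry) yields |
| TOY_SCAN_SUCCEED, which is the logical hole, while |
| classify_after_fix(toy_swap_entry) yields TOY_SCAN_PMD_NULL. |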
| |
| The issue is that __split_huge_pmd_locked() temporarily clears the present |
| bit (or otherwise marks the entry as invalid), but pmd_present() and |
| pmd_trans_huge() still need to return true while the pmd is in this |
| transitory state. For example, x86's pmd_present() also checks the |
| _PAGE_PSE bit, riscv's version also checks the _PAGE_LEAF bit, and arm64 also |
| checks a PMD_PRESENT_INVALID bit. |
| |
| Covering all 4 cases for x86 (all checks done on the same pmd value): |
| |
| 1) pmd_present() && pmd_trans_huge() |
| All we actually know here is that the PSE bit is set. Either: |
| a) We aren't racing with __split_huge_page(), and PRESENT or PROTNONE |
| is set. |
| => huge-pmd |
| b) We are currently racing with __split_huge_page(). The danger here |
| is that we proceed as-if we have a huge-pmd, but really we are |
| looking at a pte-mapping-pmd. So, what is the risk of this |
| danger? |
| |
| The only relevant path is: |
| |
| madvise_collapse() -> collapse_pte_mapped_thp() |
| |
| Where we might just incorrectly report back "success", when really |
| the memory isn't pmd-backed. This is fine, since split could |
| happen immediately after (actually) successful madvise_collapse(). |
| So, it should be safe to just assume huge-pmd here. |
| |
| 2) pmd_present() && !pmd_trans_huge() |
| Either: |
| a) PSE not set and either PRESENT or PROTNONE is. |
| => pte-table-mapping pmd (or PROT_NONE) |
| b) devmap. This routine can be called immediately after |
| unlocking/locking mmap_lock -- or called with no locks held (see |
| khugepaged_scan_mm_slot()), so previous VMA checks have since been |
| invalidated. |
| |
| 3) !pmd_present() && pmd_trans_huge() |
| Not possible. |
| |
| 4) !pmd_present() && !pmd_trans_huge() |
| Neither PRESENT nor PROTNONE set |
| => not present |
| |
| I've checked all archs that implement pmd_trans_huge() (arm64, riscv, |
| powerpc, loongarch, x86, mips, s390) and this logic roughly translates |
| (though devmap treatment is unique to x86 and powerpc, and (3) doesn't |
| necessarily hold in general -- but that doesn't matter since |
| !pmd_present() always takes failure path). |
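| |
| The x86 case analysis can be checked by brute force over a three-bit toy |
| model. This is a sketch under stated assumptions: the TOY_* bit values are |
| hypothetical (not the real x86 encodings), devmap is ignored, and the two |
| predicates follow only what this message states, namely that pmd_present() |
| tests PRESENT, PROTNONE or PSE, and that pmd_trans_huge() tests PSE. |
| |

```c
#include <assert.h>
#include <stdbool.h>

/* Toy bit positions; hypothetical, not the real x86 values. */
#define TOY_PRESENT  0x1u
#define TOY_PROTNONE 0x2u
#define TOY_PSE      0x4u

/* Present if any of PRESENT, PROTNONE or PSE is set. */
static bool toy_pmd_present(unsigned v)
{
    return (v & (TOY_PRESENT | TOY_PROTNONE | TOY_PSE)) != 0;
}

/* Huge if PSE is set (devmap is left out of this model). */
static bool toy_pmd_trans_huge(unsigned v)
{
    return (v & TOY_PSE) != 0;
}

/* Map a toy pmd value to case 1..4 of the list above. */
static int toy_case(unsigned v)
{
    bool present = toy_pmd_present(v);
    bool huge = toy_pmd_trans_huge(v);

    if (present && huge)
        return 1;
    if (present && !huge)
        return 2;
    if (!present && huge)
        return 3;
    return 4;
}

/* Count how many of the 8 possible bit patterns land in a given case. */
static int toy_count_case(int which)
{
    int n = 0;

    for (unsigned v = 0; v < 8; v++)
        if (toy_case(v) == which)
            n++;
    return n;
}
```

| |
| In this model toy_count_case(3) is 0, matching "Not possible" for case (3), |
| and toy_case(TOY_PSE) is 1, which corresponds to the transitory split state |
| where only the PSE bit remains set. The all-zero pattern lands in case (4) |
| here; with the pmd_none() check placed above the !pmd_present() check, such |
| a value would be reported as none before reaching that case. |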
| |
| Also, add a comment above find_pmd_or_thp_or_none() to help future |
| travelers reason about the validity of the code; namely, the possible |
| mutations that might happen out from under us, depending on how mmap_lock |
| is held (if at all). |
| |
| The Linux kernel CVE team has assigned CVE-2023-52934 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 6.1 with commit 34488399fa08faaf664743fa54b271eb6f9e1321 and fixed in 6.1.11 with commit 96aaaf8666010a39430cecf8a65c7ce2908a030f |
| Issue introduced in 6.1 with commit 34488399fa08faaf664743fa54b271eb6f9e1321 and fixed in 6.2 with commit edb5d0cf5525357652aff6eacd9850b8ced07143 |
| |
| Please see https://www.kernel.org for a full list of currently supported |
| kernel versions by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2023-52934 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| mm/khugepaged.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/96aaaf8666010a39430cecf8a65c7ce2908a030f |
| https://git.kernel.org/stable/c/edb5d0cf5525357652aff6eacd9850b8ced07143 |