| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-46787: userfaultfd: fix checks for huge PMDs |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| userfaultfd: fix checks for huge PMDs |
| |
| Patch series "userfaultfd: fix races around pmd_trans_huge() check", v2. |
| |
| The pmd_trans_huge() code in mfill_atomic() is wrong in three different |
| ways depending on kernel version: |
| |
| 1. The pmd_trans_huge() check is racy and can lead to a BUG_ON() (if you hit |
|    the right two race windows) - I've tested this in a kernel build with |
|    some extra mdelay() calls. See the commit message for a description |
|    of the race scenario. |
|    On older kernels (before 6.5), I think the same bug can even |
|    theoretically lead to accessing transhuge page contents as a page table |
|    if you hit the right 5 narrow race windows (I haven't tested this case). |
| 2. As pointed out by Qi Zheng, pmd_trans_huge() is not sufficient for |
|    detecting PMDs that don't point to page tables. |
|    On older kernels (before 6.5), you'd just have to win a single fairly |
|    wide race to hit this. |
|    I've tested this on 6.1 stable by racing migration (with a mdelay() |
|    patched into try_to_migrate()) against UFFDIO_ZEROPAGE - on my x86 |
|    VM, that causes a kernel oops in ptlock_ptr(). |
| 3. On newer kernels (>=6.5), for shmem mappings, khugepaged is allowed |
|    to yank page tables out from under us (though I haven't tested that), |
|    so I think the BUG_ON() checks in mfill_atomic() are just wrong. |
| |
| I decided to write two separate fixes for these (one fix for bugs 1+2, one |
| fix for bug 3), so that the first fix can be backported to kernels |
| affected by bugs 1+2. |
| |
| |
| This patch (of 2): |
| |
| This fixes two issues. |
| |
| I discovered that the following race can occur: |
| |
| mfill_atomic                           other thread |
| ============                           ============ |
|                                        <zap PMD> |
| pmdp_get_lockless() [reads none pmd] |
| <bail if trans_huge> |
| <if none:> |
|                                        <pagefault creates transhuge zeropage> |
|   __pte_alloc [no-op] |
|                                        <zap PMD> |
| <bail if pmd_trans_huge(*dst_pmd)> |
| BUG_ON(pmd_none(*dst_pmd)) |
| |
| I have experimentally verified this in a kernel with extra mdelay() calls; |
| the BUG_ON(pmd_none(*dst_pmd)) triggers. |
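The interleaving above can be replayed deterministically in userspace. What follows is only a rough sketch, not kernel code: `enum pmd_state`, `struct race`, `model_pte_alloc()`, `old_checks()` and `fixed_checks()` are hypothetical names invented for this illustration. `fixed_checks()` assumes the fix amounts to re-reading the PMD once after the allocation step and bailing out on anything that is not a page table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are not kernel types. */
enum pmd_state { PMD_NONE, PMD_TABLE, PMD_TRANS_HUGE };
enum outcome { OUT_OK, OUT_BAIL, OUT_BUG_ON };

/* What the modelled "other thread" does in each race window. */
struct race {
    bool fault_in_huge_zeropage; /* window 1: after the lockless read */
    bool zap_pmd;                /* window 2: after the allocation step */
};

/* Models __pte_alloc(): installs a page table only if the PMD is still none. */
static void model_pte_alloc(enum pmd_state *pmd)
{
    if (*pmd == PMD_NONE)
        *pmd = PMD_TABLE;
}

/* Models the old sequence: early THP check, then two separate re-reads. */
static enum outcome old_checks(enum pmd_state *pmd, struct race r)
{
    assert(pmd);
    enum pmd_state snap = *pmd;            /* lockless read */
    if (snap == PMD_TRANS_HUGE)
        return OUT_BAIL;
    if (r.fault_in_huge_zeropage && *pmd == PMD_NONE)
        *pmd = PMD_TRANS_HUGE;             /* window 1 */
    if (snap == PMD_NONE)
        model_pte_alloc(pmd);              /* no-op if no longer none */
    if (r.zap_pmd)
        *pmd = PMD_NONE;                   /* window 2 */
    if (*pmd == PMD_TRANS_HUGE)
        return OUT_BAIL;
    if (*pmd == PMD_NONE)
        return OUT_BUG_ON;                 /* BUG_ON(pmd_none(*dst_pmd)) */
    return OUT_OK;
}

/* Assumed shape of the fix: one snapshot after allocation, one decision. */
static enum outcome fixed_checks(enum pmd_state *pmd, struct race r)
{
    assert(pmd);
    enum pmd_state snap = *pmd;            /* lockless read */
    if (r.fault_in_huge_zeropage && *pmd == PMD_NONE)
        *pmd = PMD_TRANS_HUGE;             /* window 1 */
    if (snap == PMD_NONE)
        model_pte_alloc(pmd);
    if (r.zap_pmd)
        *pmd = PMD_NONE;                   /* window 2 */
    snap = *pmd;                           /* single re-read */
    if (snap != PMD_TABLE)
        return OUT_BAIL;                   /* error return instead of BUG_ON */
    return OUT_OK;
}
```

With both race windows hit, starting from a none PMD, `old_checks()` ends in the modelled BUG_ON() while `fixed_checks()` bails out with an error. With no interference, both return `OUT_OK`.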
| |
| On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow |
| pte_offset_map[_lock]() to fail"), this can't lead to anything worse than |
| a BUG_ON(), since the page table access helpers are actually designed to |
| deal with page tables concurrently disappearing; but on older kernels |
| (<=6.4), I think we could probably theoretically race past the two |
| BUG_ON() checks and end up treating a hugepage as a page table. |
| |
| The second issue is that, as Qi Zheng pointed out, there are other types |
| of huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs |
| (in particular, migration PMDs). |
| |
| On <=6.4, this is worse than the first issue: If mfill_atomic() runs on a |
| PMD that contains a migration entry (which just requires winning a single, |
| fairly wide race), it will pass the PMD to pte_offset_map_lock(), which |
| assumes that the PMD points to a page table. |
| |
| Breakage follows: First, the kernel tries to take the PTE lock (which will |
| crash or maybe worse if there is no "struct page" for the address bits in |
| the migration entry PMD - I think at least on x86 there usually is no |
| corresponding "struct page" thanks to the PTE inversion mitigation, amd64 |
| looks different). |
| |
| If that didn't crash, the kernel would next try to write a PTE into what |
| it wrongly thinks is a page table. |
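To illustrate the second issue, here is a small userspace sketch of why a pmd_trans_huge()-only test lets devmap and migration PMDs through. The enum and the `model_*` helpers are hypothetical stand-ins, not the kernel's definitions. `strict_rejects()` assumes that a sufficient check rejects every PMD that is not present, is a transparent huge page, or is a devmap entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of what a PMD entry can hold; not kernel definitions. */
enum pmd_kind {
    KIND_NONE,
    KIND_PAGE_TABLE,
    KIND_TRANS_HUGE,
    KIND_DEVMAP,
    KIND_MIGRATION   /* a swap-style (non-present) PMD */
};

/* Present entries: page tables and the two present huge mappings. */
static bool model_pmd_present(enum pmd_kind k)
{
    assert(k <= KIND_MIGRATION);
    return k == KIND_PAGE_TABLE || k == KIND_TRANS_HUGE || k == KIND_DEVMAP;
}

static bool model_pmd_trans_huge(enum pmd_kind k)
{
    return k == KIND_TRANS_HUGE;
}

static bool model_pmd_devmap(enum pmd_kind k)
{
    return k == KIND_DEVMAP;
}

/* Old-style test: only transparent huge pages are rejected. */
static bool old_rejects(enum pmd_kind k)
{
    return model_pmd_trans_huge(k);
}

/* Assumed stricter test: reject everything that is not a page table. */
static bool strict_rejects(enum pmd_kind k)
{
    return !model_pmd_present(k) ||
           model_pmd_trans_huge(k) ||
           model_pmd_devmap(k);
}
```

In this model, `old_rejects(KIND_MIGRATION)` and `old_rejects(KIND_DEVMAP)` are both false, so such entries would go on to be treated as page tables. `strict_rejects()` accepts only `KIND_PAGE_TABLE`.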
| |
| As part of fixing these issues, get rid of the check for pmd_trans_huge() |
| before __pte_alloc() - that's redundant, we're going to have to check for |
| that after the __pte_alloc() anyway. |
| |
| Backport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels. |
| |
| The Linux kernel CVE team has assigned CVE-2024-46787 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 4.3 with commit c1a4de99fada21e2e9251e52cbb51eff5aadc757 and fixed in 6.6.51 with commit 3c6b4bcf37845c9359aed926324bed66bdd2448d |
| Issue introduced in 4.3 with commit c1a4de99fada21e2e9251e52cbb51eff5aadc757 and fixed in 6.10.10 with commit 98cc18b1b71e23fe81a5194ed432b20c2d81a01a |
| Issue introduced in 4.3 with commit c1a4de99fada21e2e9251e52cbb51eff5aadc757 and fixed in 6.11 with commit 71c186efc1b2cf1aeabfeff3b9bd5ac4c5ac14d8 |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-46787 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| mm/userfaultfd.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/3c6b4bcf37845c9359aed926324bed66bdd2448d |
| https://git.kernel.org/stable/c/98cc18b1b71e23fe81a5194ed432b20c2d81a01a |
| https://git.kernel.org/stable/c/71c186efc1b2cf1aeabfeff3b9bd5ac4c5ac14d8 |