 From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001
 From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 To: <linux-cve-announce@vger.kernel.org>
 Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
 Subject: CVE-2024-46787: userfaultfd: fix checks for huge PMDs

 Description
 ===========

 In the Linux kernel, the following vulnerability has been resolved:

 userfaultfd: fix checks for huge PMDs

 Patch series "userfaultfd: fix races around pmd_trans_huge() check", v2.

 The pmd_trans_huge() code in mfill_atomic() is wrong in three different
 ways depending on kernel version:

 1. The pmd_trans_huge() check is racy and can lead to a BUG_ON() (if you hit
    the right two race windows) - I've tested this in a kernel build with
    some extra mdelay() calls. See the commit message for a description
    of the race scenario.
    On older kernels (before 6.5), I think the same bug can even
    theoretically lead to accessing transhuge page contents as a page table
    if you hit the right 5 narrow race windows (I haven't tested this case).
 2. As pointed out by Qi Zheng, pmd_trans_huge() is not sufficient for
    detecting PMDs that don't point to page tables.
    On older kernels (before 6.5), you'd just have to win a single fairly
    wide race to hit this.
    I've tested this on 6.1 stable by racing migration (with a mdelay()
    patched into try_to_migrate()) against UFFDIO_ZEROPAGE - on my x86
    VM, that causes a kernel oops in ptlock_ptr().
 3. On newer kernels (>=6.5), for shmem mappings, khugepaged is allowed
    to yank page tables out from under us (though I haven't tested that),
    so I think the BUG_ON() checks in mfill_atomic() are just wrong.

 I decided to write two separate fixes for these (one fix for bugs 1+2, one
 fix for bug 3), so that the first fix can be backported to kernels
 affected by bugs 1+2.


 This patch (of 2):

 This fixes two issues.

 I discovered that the following race can occur:

   mfill_atomic                other thread
   ============                ============
                               <zap PMD>
   pmdp_get_lockless() [reads none pmd]
   <bail if trans_huge>
   <if none:>
                               <pagefault creates transhuge zeropage>
     __pte_alloc [no-op]
                               <zap PMD>
   <bail if pmd_trans_huge(*dst_pmd)>
   BUG_ON(pmd_none(*dst_pmd))

 I have experimentally verified this in a kernel with extra mdelay() calls;
 the BUG_ON(pmd_none(*dst_pmd)) triggers.
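To make the interleaving above concrete, here is a small userspace C model that replays it deterministically. Everything in it is hypothetical and simplified: `enum pmd_state`, `struct replay`, `race_window()` and `pte_alloc_model()` are illustrative stand-ins, not kernel APIs, and the pre-fix check sequence is paraphrased from the diagram rather than copied from mm/userfaultfd.c. Instead of crashing, the model returns RES_BUG at the point where BUG_ON(pmd_none(*dst_pmd)) would fire.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PMD states for this model only (not the kernel's pmd_t). */
enum pmd_state { PMD_NONE, PMD_TABLE, PMD_TRANS_HUGE };
enum outcome { RES_OK, RES_EEXIST, RES_EFAULT, RES_BUG };

/* Shared PMD plus a script of values the "other thread" stores into it. */
struct replay {
    enum pmd_state pmd;
    const enum pmd_state *racer;
    size_t n, next;
};

/* Lets the other thread perform its next store, if any remain. */
static void race_window(struct replay *r)
{
    if (r->next < r->n)
        r->pmd = r->racer[r->next++];
}

/* Model of __pte_alloc(): installs a table only if the PMD is still none. */
static void pte_alloc_model(struct replay *r)
{
    if (r->pmd == PMD_NONE)
        r->pmd = PMD_TABLE;
}

/* Model of the pre-fix checks: later checks re-read the shared PMD. */
static enum outcome old_mfill_checks(struct replay *r)
{
    race_window(r);                   /* <zap PMD> */
    enum pmd_state snap = r->pmd;     /* pmdp_get_lockless() */
    if (snap == PMD_TRANS_HUGE)       /* <bail if trans_huge> */
        return RES_EEXIST;
    if (snap == PMD_NONE) {
        race_window(r);               /* <pagefault creates transhuge zeropage> */
        pte_alloc_model(r);           /* no-op: PMD is no longer none */
    }
    race_window(r);                   /* <zap PMD> */
    if (r->pmd == PMD_TRANS_HUGE)     /* <bail if pmd_trans_huge(*dst_pmd)> */
        return RES_EFAULT;
    if (r->pmd == PMD_NONE)           /* BUG_ON(pmd_none(*dst_pmd)) */
        return RES_BUG;
    return RES_OK;
}
```

Replaying the three racing stores {none, transhuge, none} reaches RES_BUG, while the same code with no concurrent writer returns RES_OK. The weakness is that each check re-reads the shared PMD, so the value one check sees is not necessarily the value the next check sees.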

 On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow
 pte_offset_map[_lock]() to fail"), this can't lead to anything worse than
 a BUG_ON(), since the page table access helpers are actually designed to
 deal with page tables concurrently disappearing; but on older kernels
 (<=6.4), I think we could probably theoretically race past the two
 BUG_ON() checks and end up treating a hugepage as a page table.

 The second issue is that, as Qi Zheng pointed out, there are other types
 of huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs
 (in particular, migration PMDs).
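The gap can be illustrated with a toy classification of PMD contents. This model assumes five hypothetical kinds and predicates loosely named after, but not identical to, the kernel's pmd_none(), pmd_present(), pmd_trans_huge() and pmd_devmap(); real encodings are architecture-specific. `old_treats_as_table()` models a caller that only filters out none and trans-huge PMDs before handing the PMD to a page table helper, and `new_treats_as_table()` models a stricter check that also rejects non-present (swap/migration) and devmap PMDs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical PMD kinds for this model; real encodings are arch-specific. */
enum pmd_kind {
    PMD_KIND_NONE,      /* empty entry */
    PMD_KIND_TABLE,     /* points to a PTE page table */
    PMD_KIND_THP,       /* transparent huge page mapping */
    PMD_KIND_DEVMAP,    /* huge devmap mapping */
    PMD_KIND_MIGRATION, /* swap-format entry, e.g. a migration entry */
};

/* Model predicates, loosely named after the kernel helpers. */
static bool model_pmd_none(enum pmd_kind k) { return k == PMD_KIND_NONE; }
static bool model_pmd_trans_huge(enum pmd_kind k) { return k == PMD_KIND_THP; }
static bool model_pmd_devmap(enum pmd_kind k) { return k == PMD_KIND_DEVMAP; }
static bool model_pmd_present(enum pmd_kind k)
{
    return k != PMD_KIND_NONE && k != PMD_KIND_MIGRATION;
}

/* Pre-fix filtering: anything not none and not trans-huge is assumed to be a table. */
static bool old_treats_as_table(enum pmd_kind k)
{
    return !model_pmd_none(k) && !model_pmd_trans_huge(k);
}

/* Stricter filtering: reject non-present, trans-huge and devmap PMDs. */
static bool new_treats_as_table(enum pmd_kind k)
{
    return !(!model_pmd_present(k) || model_pmd_trans_huge(k) ||
             model_pmd_devmap(k));
}
```

In this model both functions agree on empty, table and THP entries, but only the stricter check refuses the devmap and migration entries that, per the text, pmd_trans_huge() cannot catch.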

 On <=6.4, this is worse than the first issue: If mfill_atomic() runs on a
 PMD that contains a migration entry (which just requires winning a single,
 fairly wide race), it will pass the PMD to pte_offset_map_lock(), which
 assumes that the PMD points to a page table.

 Breakage follows: First, the kernel tries to take the PTE lock, which will
 crash (or maybe do something worse) if there is no "struct page" for the
 address bits in the migration entry PMD. I think at least on X86 there
 usually is no corresponding "struct page" thanks to the PTE inversion
 mitigation; amd64 looks different.

 If that didn't crash, the kernel would next try to write a PTE into what
 it wrongly thinks is a page table.

 As part of fixing these issues, get rid of the check for pmd_trans_huge()
 before __pte_alloc(): it is redundant, since we have to check for that
 after __pte_alloc() anyway.
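The restructured flow can be sketched as a deterministic userspace replay: no pmd_trans_huge() test before allocation, then one lockless snapshot after allocation that decides whether the PMD is usable. This is a hypothetical model (`struct replay`, `race_window()`, `pte_alloc_model()` and the outcome codes are made up for illustration), not the patched kernel code. It treats every state other than "points to a page table" as a reason to bail out, and it does not model locking or allocation failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PMD states; MIGRATION stands in for any non-table entry
 * that pmd_trans_huge() would not report. */
enum pmd_state { PMD_NONE, PMD_TABLE, PMD_TRANS_HUGE, PMD_MIGRATION };
enum outcome { RES_OK, RES_EEXIST };

/* Shared PMD plus a script of values a concurrent thread stores into it. */
struct replay {
    enum pmd_state pmd;
    const enum pmd_state *racer;
    size_t n, next;
};

/* Lets the other thread perform its next store, if any remain. */
static void race_window(struct replay *r)
{
    if (r->next < r->n)
        r->pmd = r->racer[r->next++];
}

/* Model of __pte_alloc(): installs a table only if the PMD is still none. */
static void pte_alloc_model(struct replay *r)
{
    if (r->pmd == PMD_NONE)
        r->pmd = PMD_TABLE;
}

/* Model of the reworked checks: one snapshot after allocation decides. */
static enum outcome new_mfill_checks(struct replay *r)
{
    race_window(r);
    if (r->pmd == PMD_NONE) {       /* lockless read saw an empty PMD */
        race_window(r);
        pte_alloc_model(r);         /* may be a no-op if something raced in */
    }
    race_window(r);
    enum pmd_state snap = r->pmd;   /* single lockless snapshot */
    if (snap != PMD_TABLE)          /* none, THP, migration, ...: bail out */
        return RES_EEXIST;
    return RES_OK;
}
```

Replaying the racing stores {none, transhuge, none} now ends in RES_EEXIST instead of an assertion-style failure, because the decision is made on one snapshot rather than on several independent re-reads.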

 Backport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels.

 The Linux kernel CVE team has assigned CVE-2024-46787 to this issue.


 Affected and fixed versions
 ===========================

 	Issue introduced in 4.3 with commit c1a4de99fada21e2e9251e52cbb51eff5aadc757 and fixed in 6.6.51 with commit 3c6b4bcf37845c9359aed926324bed66bdd2448d
 	Issue introduced in 4.3 with commit c1a4de99fada21e2e9251e52cbb51eff5aadc757 and fixed in 6.10.10 with commit 98cc18b1b71e23fe81a5194ed432b20c2d81a01a
 	Issue introduced in 4.3 with commit c1a4de99fada21e2e9251e52cbb51eff5aadc757 and fixed in 6.11 with commit 71c186efc1b2cf1aeabfeff3b9bd5ac4c5ac14d8
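The three lines above can be turned into a quick check of whether a given kernel version contains one of the fixes listed here. This is a minimal sketch under stated assumptions: it only knows the three fixes in this announcement, treats every version from 4.3 onward without a listed fix as affected, and ignores -rc releases and distribution kernels that may carry their own backports. Later backports to other branches would make this table out of date.

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel version as major.minor.patch, e.g. 6.6.51. */
struct kver { int major, minor, patch; };

enum cve_status { KVER_NOT_AFFECTED, KVER_AFFECTED, KVER_FIXED };

/* Lexicographic comparison: negative, zero or positive like strcmp(). */
static int kver_cmp(struct kver a, struct kver b)
{
    if (a.major != b.major) return a.major < b.major ? -1 : 1;
    if (a.minor != b.minor) return a.minor < b.minor ? -1 : 1;
    if (a.patch != b.patch) return a.patch < b.patch ? -1 : 1;
    return 0;
}

/* True if v is at or above one of the fix points listed in this announcement. */
static bool has_listed_fix(struct kver v)
{
    if (kver_cmp(v, (struct kver){6, 11, 0}) >= 0)
        return true;                 /* mainline fix in 6.11 */
    if (v.major == 6 && v.minor == 10)
        return v.patch >= 10;        /* stable fix in 6.10.10 */
    if (v.major == 6 && v.minor == 6)
        return v.patch >= 51;        /* stable fix in 6.6.51 */
    return false;
}

/* Status according to this announcement only; later backports are unknown. */
static enum cve_status listed_status(struct kver v)
{
    if (kver_cmp(v, (struct kver){4, 3, 0}) < 0)
        return KVER_NOT_AFFECTED;    /* issue introduced in 4.3 */
    return has_listed_fix(v) ? KVER_FIXED : KVER_AFFECTED;
}
```

For example, 6.6.50 and 6.10.9 come out as affected, while 6.6.51, 6.10.10 and anything from 6.11 onward come out as fixed.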

 Please see https://www.kernel.org for a full list of kernel versions
 currently supported by the kernel community.

 Unaffected versions might change over time as fixes are backported to
 older supported kernel versions.  The official CVE entry at
 	https://cve.org/CVERecord/?id=CVE-2024-46787
 will be updated if fixes are backported; please check it for the most
 up-to-date information about this issue.


 Affected files
 ==============

 The file(s) affected by this issue are:
 	mm/userfaultfd.c


 Mitigation
 ==========

 The Linux kernel CVE team recommends that you update to the latest
 stable kernel version for this and many other bugfixes.  Individual
 changes are never tested alone, but rather as part of a larger kernel
 release.  Cherry-picking individual commits is not recommended or
 supported by the Linux kernel community at all.  If, however, updating to
 the latest release is impossible, the individual changes to resolve this
 issue can be found at these commits:
 	https://git.kernel.org/stable/c/3c6b4bcf37845c9359aed926324bed66bdd2448d
 	https://git.kernel.org/stable/c/98cc18b1b71e23fe81a5194ed432b20c2d81a01a
 	https://git.kernel.org/stable/c/71c186efc1b2cf1aeabfeff3b9bd5ac4c5ac14d8