| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-45024: mm/hugetlb: fix hugetlb vs. core-mm PT locking |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| mm/hugetlb: fix hugetlb vs. core-mm PT locking |
| |
| We recently made GUP's common page table walking code also walk hugetlb |
| VMAs without most hugetlb special-casing, in preparation for having less |
| hugetlb-specific page table walking code in the codebase. It turns out |
| that we missed one page table locking detail: page table locking for |
| hugetlb folios that are not mapped using a single PMD/PUD. |
| |
| Assume we have a hugetlb folio that spans multiple PTEs (e.g., 64 KiB |
| hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the |
| page tables, will perform a pte_offset_map_lock() to grab the PTE table |
| lock. |
| |
| However, hugetlb code that concurrently modifies these page tables would |
| actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the |
| locks would differ. Something similar can happen right now with hugetlb |
| folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS is enabled. |
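| |
| The lock mismatch described above can be illustrated with a minimal |
| userspace sketch in C. It is a model under stated assumptions, not kernel |
| code: every model_* type and function is hypothetical, and a "lock" here |
| is only an address that the two walkers either share or do not share. |

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct model_lock { int unused; };

struct model_mm {
    struct model_lock page_table_lock;  /* models mm->page_table_lock */
};

struct model_pte_table {
    struct model_lock ptl;              /* models a per-PTE-table split lock */
};

/* Lock a core-mm walker such as GUP would take for a PTE table. */
static struct model_lock *model_core_mm_lock(struct model_mm *mm,
                                             struct model_pte_table *table,
                                             bool split_pte_ptlocks)
{
    return split_pte_ptlocks ? &table->ptl : &mm->page_table_lock;
}

/* Lock hugetlb took before the fix for a folio spanning multiple PTEs. */
static struct model_lock *model_old_hugetlb_lock(struct model_mm *mm,
                                                 struct model_pte_table *table)
{
    (void)table;                        /* the table's own lock is ignored */
    return &mm->page_table_lock;
}

/* Returns true when both walkers would serialize on the same lock. */
bool model_same_lock(bool split_pte_ptlocks)
{
    struct model_mm mm = { { 0 } };
    struct model_pte_table table = { { 0 } };
    struct model_lock *a = model_core_mm_lock(&mm, &table, split_pte_ptlocks);
    struct model_lock *b = model_old_hugetlb_lock(&mm, &table);

    assert(a != NULL && b != NULL);
    return a == b;
}
```

| In this model, model_same_lock(false) is true and model_same_lock(true) |
| is false: once split PTE locks are in use, the two walkers hold |
| different locks and no longer exclude each other. |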
| |
| This issue can be reproduced [1], for example triggering: |
| |
| [ 3105.936100] ------------[ cut here ]------------ |
| [ 3105.939323] WARNING: CPU: 31 PID: 2732 at mm/gup.c:142 try_grab_folio+0x11c/0x188 |
| [ 3105.944634] Modules linked in: [...] |
| [ 3105.974841] CPU: 31 PID: 2732 Comm: reproducer Not tainted 6.10.0-64.eln141.aarch64 #1 |
| [ 3105.980406] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-4.fc40 05/24/2024 |
| [ 3105.986185] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) |
| [ 3105.991108] pc : try_grab_folio+0x11c/0x188 |
| [ 3105.994013] lr : follow_page_pte+0xd8/0x430 |
| [ 3105.996986] sp : ffff80008eafb8f0 |
| [ 3105.999346] x29: ffff80008eafb900 x28: ffffffe8d481f380 x27: 00f80001207cff43 |
| [ 3106.004414] x26: 0000000000000001 x25: 0000000000000000 x24: ffff80008eafba48 |
| [ 3106.009520] x23: 0000ffff9372f000 x22: ffff7a54459e2000 x21: ffff7a546c1aa978 |
| [ 3106.014529] x20: ffffffe8d481f3c0 x19: 0000000000610041 x18: 0000000000000001 |
| [ 3106.019506] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000000 |
| [ 3106.024494] x14: ffffb85477fdfe08 x13: 0000ffff9372ffff x12: 0000000000000000 |
| [ 3106.029469] x11: 1fffef4a88a96be1 x10: ffff7a54454b5f0c x9 : ffffb854771b12f0 |
| [ 3106.034324] x8 : 0008000000000000 x7 : ffff7a546c1aa980 x6 : 0008000000000080 |
| [ 3106.038902] x5 : 00000000001207cf x4 : 0000ffff9372f000 x3 : ffffffe8d481f000 |
| [ 3106.043420] x2 : 0000000000610041 x1 : 0000000000000001 x0 : 0000000000000000 |
| [ 3106.047957] Call trace: |
| [ 3106.049522] try_grab_folio+0x11c/0x188 |
| [ 3106.051996] follow_pmd_mask.constprop.0.isra.0+0x150/0x2e0 |
| [ 3106.055527] follow_page_mask+0x1a0/0x2b8 |
| [ 3106.058118] __get_user_pages+0xf0/0x348 |
| [ 3106.060647] faultin_page_range+0xb0/0x360 |
| [ 3106.063651] do_madvise+0x340/0x598 |
| |
| Let's make huge_pte_lockptr() effectively use the same PT locks as any |
| core-mm page table walker would. Add ptep_lockptr() to obtain the PTE |
| page table lock using a pte pointer. Unfortunately, we cannot convert |
| pte_lockptr() because virt_to_page() doesn't work with the kmap'ed page |
| tables that we can have with CONFIG_HIGHPTE. |
| |
| Handle CONFIG_PGTABLE_LEVELS correctly by checking in reverse order, such |
| that, e.g., CONFIG_PGTABLE_LEVELS==2 with |
| PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE works as expected. Document why |
| that works. |
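| |
| As a rough illustration of the ordering, the following C sketch picks a |
| page table level from a hugetlb mapping size. It assumes that "reverse |
| order" means testing the largest level first; the model_* names are |
| hypothetical, the sizes are passed in as plain numbers, and this is not |
| the code from the fix. |

```c
#include <assert.h>

/* Hypothetical model: which table's lock a given mapping size belongs to. */
enum model_level { MODEL_PTE_TABLE, MODEL_PMD_TABLE, MODEL_PUD_TABLE };

/*
 * Test the largest level first. When levels are folded and
 * pmd_size == pud_size, the first comparison already matches, so the
 * decision is made at the upper level and the PMD branch is never reached.
 */
enum model_level model_lock_level(unsigned long long size,
                                  unsigned long long pmd_size,
                                  unsigned long long pud_size)
{
    assert(pmd_size <= pud_size);

    if (size >= pud_size)
        return MODEL_PUD_TABLE;
    if (size >= pmd_size)
        return MODEL_PMD_TABLE;
    return MODEL_PTE_TABLE;
}
```

| With example sizes of 2 MiB for a PMD and 1 GiB for a PUD (assumed |
| values for illustration), a 64 KiB folio resolves to the PTE table and a |
| 32 MiB folio to the PMD table. When both sizes are equal, a matching |
| folio resolves to the upper level. |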
| |
| There is one ugly case: powerpc 8xx, where an 8 MiB hugetlb folio is |
| mapped using two PTE page tables. While hugetlb wants to take the PMD |
| table lock, core-mm would grab the PTE table lock of one of the two PTE |
| page tables. In such corner cases, we have to make sure that both locks |
| match, which is (fortunately!) currently guaranteed for 8xx as it does |
| not support SMP and consequently doesn't use split PT locks. |
| |
| [1] https://lore.kernel.org/all/1bbfcc7f-f222-45a5-ac44-c5a1381c596d@redhat.com/ |
| |
| The Linux kernel CVE team has assigned CVE-2024-45024 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 6.10 with commit 9cb28da54643ad464c47585cd5866c30b0218e67 and fixed in 6.10.7 with commit 7300dadba49e531af2d890ae4e34c9b115384a62 |
| Issue introduced in 6.10 with commit 9cb28da54643ad464c47585cd5866c30b0218e67 and fixed in 6.11 with commit 5f75cfbd6bb02295ddaed48adf667b6c828ce07b |
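| |
| The version ranges above can be expressed as a small check. This C sketch |
| is only a first approximation: model_is_affected() is a hypothetical |
| helper, it considers upstream version numbers only, and a vendor kernel |
| may carry the fix under an older version number. |

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true for upstream versions 6.10 through
 * 6.10.6, i.e. from the introducing release up to, but not including,
 * the 6.10.7 stable fix. 6.11 and later contain the fix.
 */
bool model_is_affected(int major, int minor, int patch)
{
    assert(major >= 0 && minor >= 0 && patch >= 0);

    if (major != 6 || minor != 10)
        return false;
    return patch < 7;
}
```

| For example, model_is_affected(6, 10, 0) is true, while |
| model_is_affected(6, 10, 7) and model_is_affected(6, 11, 0) are false. |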
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-45024 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| include/linux/hugetlb.h |
| include/linux/mm.h |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/7300dadba49e531af2d890ae4e34c9b115384a62 |
| https://git.kernel.org/stable/c/5f75cfbd6bb02295ddaed48adf667b6c828ce07b |