| From db42a5e323f258c0f1c2054c013bbb408e3f6fde Mon Sep 17 00:00:00 2001 |
| From: Joerg Roedel <jroedel@suse.de> |
| Date: Tue, 26 Nov 2019 11:09:42 +0100 |
| Subject: [PATCH] x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all() |
| |
| commit 9a62d20027da3164a22244d9f022c0c987261687 upstream. |
| |
| The job of vmalloc_sync_all() is to help the lazy freeing of vmalloc() |
| ranges: before such vmap ranges are reused we make sure that they are |
| unmapped from every task's page tables. |
| |
| This is really easy on pagetable setups where the kernel page tables |
| are shared between all tasks - this is the case on 32-bit kernels |
| with SHARED_KERNEL_PMD = 1. |
| |
| But on !SHARED_KERNEL_PMD 32-bit kernels this involves iterating |
| over the pgd_list and clearing all pmd entries in the pgds that |
| are cleared in the init_mm.pgd, which is the reference pagetable |
| that the vmalloc() code uses. |
| |
| In that context the current practice of vmalloc_sync_all() iterating |
| until FIXADDR_TOP is buggy: |
| |
| for (address = VMALLOC_START & PMD_MASK; |
| address >= TASK_SIZE_MAX && address < FIXADDR_TOP; |
| address += PMD_SIZE) { |
| struct page *page; |
| |
| Because iterating up to FIXADDR_TOP will involve a lot of non-vmalloc |
| address ranges: |
| |
| VMALLOC -> PKMAP -> LDT -> CPU_ENTRY_AREA -> FIX_ADDR |
| |
| This is mostly harmless for the FIX_ADDR and CPU_ENTRY_AREA ranges |
| that don't clear their pmds, but it's lethal for the LDT range, |
| which relies on having different mappings in different processes, |
| and 'synchronizing' them in the vmalloc sense corrupts those |
| pagetable entries (clearing them). |
| |
| This got particularly prominent with PTI, which turns SHARED_KERNEL_PMD |
| off and makes this the dominant mapping mode on 32-bit. |
| |
| To make the LDT work again, vmalloc_sync_all() must only iterate over |
| the volatile parts of the kernel address range that are identical |
| between all processes. |
| |
| So the correct check in vmalloc_sync_all() is "address < VMALLOC_END" |
| to make sure the VMALLOC areas are synchronized and the LDT |
| mapping is not falsely overwritten. |
| |
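The effect of the two loop bounds can be sketched in user space. This is only an illustration, not kernel code: the HYP_* addresses below are made-up values for a 32-bit PAE-style layout (2 MiB per PMD entry), chosen so that the LDT range sits between VMALLOC_END and FIXADDR_TOP as in the ordering described above. The helper names are hypothetical as well.

```c
#include <stdint.h>

/* 2 MiB per PMD entry, as with PAE paging. */
#define PMD_SIZE          ((uint32_t)1 << 21)
#define PMD_MASK          (~(PMD_SIZE - 1))

/* Hypothetical layout, for illustration only (not real kernel values). */
#define HYP_TASK_SIZE_MAX ((uint32_t)0xC0000000)
#define HYP_VMALLOC_START ((uint32_t)0xF0000000)
#define HYP_VMALLOC_END   ((uint32_t)0xFF000000)
#define HYP_LDT_BASE      ((uint32_t)0xFF800000)
#define HYP_FIXADDR_TOP   ((uint32_t)0xFFFFF000)

/*
 * Count the PMD-sized steps taken by a loop shaped like the one in
 * vmalloc_sync_all(). The "address >= floor" test also ends the loop
 * when the 32-bit address wraps around to 0.
 */
static unsigned int count_synced_pmds(uint32_t start, uint32_t floor,
                                      uint32_t end)
{
    unsigned int n = 0;
    uint32_t address;

    for (address = start & PMD_MASK;
         address >= floor && address < end;
         address += PMD_SIZE)
        n++;
    return n;
}

/* Return 1 if the same loop visits the PMD range containing 'target'. */
static int loop_visits(uint32_t start, uint32_t floor, uint32_t end,
                       uint32_t target)
{
    uint32_t address;

    for (address = start & PMD_MASK;
         address >= floor && address < end;
         address += PMD_SIZE)
        if ((target & PMD_MASK) == address)
            return 1;
    return 0;
}
```

With these assumed values, loop_visits() reports that the loop bounded by HYP_FIXADDR_TOP walks over the PMD covering HYP_LDT_BASE, while the loop bounded by HYP_VMALLOC_END stops before reaching it.
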
| The CPU_ENTRY_AREA and the FIXMAP area are no longer synced either, |
| but this is not really a problem since their PMDs get established |
| during bootup and never change. |
| |
| This change fixes the ldt_gdt selftest in my setup. |
| |
| [ mingo: Fixed up the changelog to explain the logic and modified the |
| copying to only happen up until VMALLOC_END. ] |
| |
| Reported-by: Borislav Petkov <bp@suse.de> |
| Tested-by: Borislav Petkov <bp@suse.de> |
| Signed-off-by: Joerg Roedel <jroedel@suse.de> |
| Cc: <stable@vger.kernel.org> |
| Cc: Andy Lutomirski <luto@kernel.org> |
| Cc: Borislav Petkov <bp@alien8.de> |
| Cc: Dave Hansen <dave.hansen@linux.intel.com> |
| Cc: Joerg Roedel <joro@8bytes.org> |
| Cc: Linus Torvalds <torvalds@linux-foundation.org> |
| Cc: Peter Zijlstra <peterz@infradead.org> |
| Cc: Thomas Gleixner <tglx@linutronix.de> |
| Cc: hpa@zytor.com |
| Fixes: 7757d607c6b3: ("x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32") |
| Link: https://lkml.kernel.org/r/20191126111119.GA110513@gmail.com |
| Signed-off-by: Ingo Molnar <mingo@kernel.org> |
| Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> |
| |
| diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c |
| index da9684e95595..3589db475280 100644 |
| --- a/arch/x86/mm/fault.c |
| +++ b/arch/x86/mm/fault.c |
| @@ -214,7 +214,7 @@ void vmalloc_sync_all(void) |
| return; |
| |
| for (address = VMALLOC_START & PMD_MASK; |
| - address >= TASK_SIZE_MAX && address < FIXADDR_TOP; |
| + address >= TASK_SIZE_MAX && address < VMALLOC_END; |
| address += PMD_SIZE) { |
| struct page *page; |
| |
| -- |
| 2.7.4 |
| |