| From 2aa85f246c181b1fa89f27e8e20c5636426be624 Mon Sep 17 00:00:00 2001 |
| From: Steve Wahl <steve.wahl@hpe.com> |
| Date: Tue, 24 Sep 2019 16:03:55 -0500 |
| Subject: x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area |
| |
| From: Steve Wahl <steve.wahl@hpe.com> |
| |
| commit 2aa85f246c181b1fa89f27e8e20c5636426be624 upstream. |
| |
| Our hardware (UV aka Superdome Flex) has address ranges marked |
| reserved by the BIOS. Access to these ranges is caught as an error, |
| causing the BIOS to halt the system. |
| |
| Initial page tables mapped a large range of physical addresses that |
| were not checked against the list of BIOS reserved addresses, and |
| sometimes included reserved addresses in part of the mapped range. |
| Including the reserved range in the map allowed processor speculative |
| accesses to the reserved range, triggering a BIOS halt. |
| |
| Used early in booting, the page table level2_kernel_pgt addresses 1 |
| GiB divided into 2 MiB pages, and it was set up to linearly map a full |
| 1 GiB of physical addresses that included the physical address range |
| of the kernel image, as chosen by KASLR. But this also included a |
| large range of unused addresses on either side of the kernel image. |
| And unlike the kernel image's physical address range, this extra |
| mapped space was not checked against the BIOS tables of usable RAM |
| addresses. So there were times when the addresses chosen by KASLR |
| would result in processor accessible mappings of BIOS reserved |
| physical addresses. |
| |
| The kernel code did not directly access any of this extra mapped |
| space, but having it mapped allowed the processor to issue speculative |
| accesses into reserved memory, causing system halts. |
| |
| This was encountered somewhat rarely on a normal system boot, and much |
| more often when starting the crash kernel if "crashkernel=512M,high" |
| was specified on the command line (this heavily restricts the physical |
| address of the crash kernel, in our case usually within 1 GiB of |
| reserved space). |
| |
| The solution is to invalidate the pages of this table outside the kernel |
| image's space before the page table is activated. It fixes this problem |
| on our hardware. |
| |
| [ bp: Touchups. ] |
| |
| Signed-off-by: Steve Wahl <steve.wahl@hpe.com> |
| Signed-off-by: Borislav Petkov <bp@suse.de> |
| Acked-by: Dave Hansen <dave.hansen@linux.intel.com> |
| Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
| Cc: Baoquan He <bhe@redhat.com> |
| Cc: Brijesh Singh <brijesh.singh@amd.com> |
| Cc: dimitri.sivanich@hpe.com |
| Cc: Feng Tang <feng.tang@intel.com> |
| Cc: "H. Peter Anvin" <hpa@zytor.com> |
| Cc: Ingo Molnar <mingo@redhat.com> |
| Cc: Jordan Borgner <mail@jordan-borgner.de> |
| Cc: Juergen Gross <jgross@suse.com> |
| Cc: mike.travis@hpe.com |
| Cc: russ.anderson@hpe.com |
| Cc: stable@vger.kernel.org |
| Cc: Thomas Gleixner <tglx@linutronix.de> |
| Cc: x86-ml <x86@kernel.org> |
| Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> |
| Link: https://lkml.kernel.org/r/9c011ee51b081534a7a15065b1681d200298b530.1569358539.git.steve.wahl@hpe.com |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| arch/x86/kernel/head64.c | 22 ++++++++++++++++++++-- |
| 1 file changed, 20 insertions(+), 2 deletions(-) |
| |
| --- a/arch/x86/kernel/head64.c |
| +++ b/arch/x86/kernel/head64.c |
| @@ -145,13 +145,31 @@ unsigned long __head __startup_64(unsign |
| * we might write invalid pmds, when the kernel is relocated |
| * cleanup_highmap() fixes this up along with the mappings |
| * beyond _end. |
| + * |
| + * Only the region occupied by the kernel image has so far |
| + * been checked against the table of usable memory regions |
| + * provided by the firmware, so invalidate pages outside that |
| + * region. A page table entry that maps to a reserved area of |
| + * memory would allow processor speculation into that area, |
| + * and on some hardware (particularly the UV platform) even |
| + * speculative access to some reserved areas is caught as an |
| + * error, causing the BIOS to halt the system. |
| */ |
| |
| pmd = fixup_pointer(level2_kernel_pgt, physaddr); |
| - for (i = 0; i < PTRS_PER_PMD; i++) { |
| + |
| + /* invalidate pages before the kernel image */ |
| + for (i = 0; i < pmd_index((unsigned long)_text); i++) |
| + pmd[i] &= ~_PAGE_PRESENT; |
| + |
| + /* fixup pages that are part of the kernel image */ |
| + for (; i <= pmd_index((unsigned long)_end); i++) |
| if (pmd[i] & _PAGE_PRESENT) |
| pmd[i] += load_delta; |
| - } |
| + |
| + /* invalidate pages after the kernel image */ |
| + for (; i < PTRS_PER_PMD; i++) |
| + pmd[i] &= ~_PAGE_PRESENT; |
| |
| /* |
| * Fixup phys_base - remove the memory encryption mask to obtain |