| From f5f3497cad8c8416a74b9aaceb127908755d020a Mon Sep 17 00:00:00 2001 |
| From: Paolo Bonzini <pbonzini@redhat.com> |
| Date: Wed, 14 Oct 2015 13:30:45 +0200 |
| Subject: x86/setup: Extend low identity map to cover whole kernel range |
| |
| From: Paolo Bonzini <pbonzini@redhat.com> |
| |
| commit f5f3497cad8c8416a74b9aaceb127908755d020a upstream. |
| |
| On 32-bit systems, the initial_page_table is reused by |
| efi_call_phys_prolog as an identity map to call |
| SetVirtualAddressMap. efi_call_phys_prolog takes care of |
| converting the current CPU's GDT to a physical address too. |
| |
| For PAE kernels the identity mapping is achieved by aliasing the |
| first PDPE for the kernel memory mapping into the first PDPE |
| of initial_page_table. This makes the EFI stub's trick "just work". |
| |
| However, for non-PAE kernels there is no guarantee that the identity |
| mapping in the initial_page_table extends as far as the GDT; in this |
| case, accesses to the GDT will cause a page fault (which quickly becomes |
| a triple fault). Fix this by copying the kernel mappings from |
| swapper_pg_dir to initial_page_table twice: once at PAGE_OFFSET and |
| once at the low identity mapping. |
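
As a rough illustration of the fix described above, the following userspace sketch models a non-PAE page directory as a plain array of 1024 32-bit entries. It is not kernel code: the `pgd_t` typedef, the 3G/1G `PAGE_OFFSET` value and the `sync_initial_page_table()` wrapper are assumptions made for the example, and `clone_pgd_range()` is modelled as a plain `memcpy`. Because the kernel's linear mapping places physical address `p` at `PAGE_OFFSET + p`, copying the kernel-range entries to index 0 yields an identity map of the same physical range.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t pgd_t;   /* simplified: one 32-bit page directory entry */

#define PTRS_PER_PGD        1024
#define PGDIR_SHIFT         22
#define PAGE_OFFSET         0xC0000000u  /* assumption: default 3G/1G split */
#define KERNEL_PGD_BOUNDARY (PAGE_OFFSET >> PGDIR_SHIFT)          /* 768 */
#define KERNEL_PGD_PTRS     (PTRS_PER_PGD - KERNEL_PGD_BOUNDARY)  /* 256 */

/* Model of clone_pgd_range(): copy `count` directory entries. */
static void clone_pgd_range(pgd_t *dst, const pgd_t *src, int count)
{
    assert(count >= 0 && count <= PTRS_PER_PGD);
    memcpy(dst, src, (size_t)count * sizeof(pgd_t));
}

/* Hypothetical wrapper mirroring the two calls made in setup_arch(). */
static void sync_initial_page_table(pgd_t *initial_page_table,
                                    const pgd_t *swapper_pg_dir)
{
    /* Existing call: kernel mappings at PAGE_OFFSET. */
    clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
                    swapper_pg_dir + KERNEL_PGD_BOUNDARY,
                    KERNEL_PGD_PTRS);

    /* Added call: the same entries again, starting at index 0,
     * so that low virtual addresses map to the same physical pages. */
    clone_pgd_range(initial_page_table,
                    swapper_pg_dir + KERNEL_PGD_BOUNDARY,
                    KERNEL_PGD_PTRS);
}
```

In this model, any entry present at index `KERNEL_PGD_BOUNDARY + i` of `swapper_pg_dir` ends up at both index `KERNEL_PGD_BOUNDARY + i` and index `i` of `initial_page_table`.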
| |
| For some reason, this is only reproducible with QEMU's dynamic translation |
| mode, and not for example with KVM. However, even under KVM one can clearly |
| see that the page table is bogus: |
| |
| $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize |
| $ gdb |
| (gdb) target remote localhost:1234 |
| (gdb) hb *0x02858f6f |
| Hardware assisted breakpoint 1 at 0x2858f6f |
| (gdb) c |
| Continuing. |
| |
| Breakpoint 1, 0x02858f6f in ?? () |
| (gdb) monitor info registers |
| ... |
| GDT= 0724e000 000000ff |
| IDT= fffbb000 000007ff |
| CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690 |
| ... |
| |
| The page directory is sane: |
| |
| (gdb) x/4wx 0x32b7000 |
| 0x32b7000: 0x03398063 0x03399063 0x0339a063 0x0339b063 |
| (gdb) x/4wx 0x3398000 |
| 0x3398000: 0x00000163 0x00001163 0x00002163 0x00003163 |
| (gdb) x/4wx 0x3399000 |
| 0x3399000: 0x00400003 0x00401003 0x00402003 0x00403003 |
| |
| but the page directory entry that covers the GDT at 0x0724e000 is empty: |
| |
| (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4 |
| 0x32b7070: 0x00000000 |
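
The address arithmetic in the gdb expression above can be spelled out as a small sketch. With non-PAE 32-bit paging, each page directory entry is 4 bytes wide and covers 4 MiB, so bits 31..22 of a linear address select the entry. The helper names below are hypothetical and exist only for this illustration; the present-bit check models bit 0 of the entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PDE_SHIFT   22    /* each PDE covers 4 MiB */
#define PDE_SIZE    4u    /* bytes per non-PAE PDE */
#define PDE_PRESENT 0x1u  /* bit 0: entry is present */

/* Index of the page directory entry covering linear address `addr`. */
static uint32_t pde_index(uint32_t addr)
{
    return addr >> PDE_SHIFT;
}

/* Physical address of that entry, given the value of CR3. */
static uint32_t pde_address(uint32_t cr3, uint32_t addr)
{
    /* The low 12 bits of CR3 carry flags, not address bits. */
    return (cr3 & 0xfffff000u) + pde_index(addr) * PDE_SIZE;
}

/* Whether a (copied-out) page directory maps `addr` at all. */
static int pde_is_present(const uint32_t *pgd, uint32_t addr)
{
    assert(pgd != NULL);
    return (pgd[pde_index(addr)] & PDE_PRESENT) != 0;
}
```

For the values in the register dump, `pde_index(0x0724e000)` is 28 (0x1c), and `pde_address(0x032b7000, 0x0724e000)` is 0x032b7070, which matches the address gdb printed.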
| |
| [ It appears that you can skate past this issue if you don't receive |
| any interrupts while the bogus GDT pointer is loaded, or if you avoid |
| reloading the segment registers in general. |
| |
| Andy Lutomirski provides some additional insight: |
| |
| "AFAICT it's entirely permissible for the GDTR and/or LDT |
| descriptor to point to unmapped memory. Any attempt to use them |
| (segment loads, interrupts, IRET, etc) will try to access that memory |
| as if the access came from CPL 0 and, if the access fails, will |
| generate a valid page fault with CR2 pointing into the GDT or |
| LDT." |
| |
| Up until commit 23a0d4e8fa6d ("efi: Disable interrupts around EFI |
| calls, not in the epilog/prolog calls") interrupts were disabled |
| around the prolog and epilog calls, and the functional GDT was |
| re-installed before interrupts were re-enabled. |
| |
| This explains why no one has hit this issue until now. ] |
| |
| Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
| Reported-by: Laszlo Ersek <lersek@redhat.com> |
| Cc: <stable@vger.kernel.org> |
| Cc: Borislav Petkov <bp@alien8.de> |
| Cc: "H. Peter Anvin" <hpa@zytor.com> |
| Cc: Thomas Gleixner <tglx@linutronix.de> |
| Cc: Ingo Molnar <mingo@kernel.org> |
| Cc: Andy Lutomirski <luto@amacapital.net> |
| Signed-off-by: Matt Fleming <matt.fleming@intel.com> |
| [ Updated changelog. ] |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| arch/x86/kernel/setup.c | 8 ++++++++ |
| 1 file changed, 8 insertions(+) |
| |
| --- a/arch/x86/kernel/setup.c |
| +++ b/arch/x86/kernel/setup.c |
| @@ -1194,6 +1194,14 @@ void __init setup_arch(char **cmdline_p) |
| clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY, |
| swapper_pg_dir + KERNEL_PGD_BOUNDARY, |
| KERNEL_PGD_PTRS); |
| + |
| + /* |
| + * sync back low identity map too. It is used for example |
| + * in the 32-bit EFI stub. |
| + */ |
| + clone_pgd_range(initial_page_table, |
| + swapper_pg_dir + KERNEL_PGD_BOUNDARY, |
| + KERNEL_PGD_PTRS); |
| #endif |
| |
| tboot_probe(); |