x86, kaiser: use PCID feature to make user and kernel switches faster

Short summary: Use x86 PCID feature to avoid flushing the TLB at all
interrupts and syscalls.  Speed them up.  Makes context switches
and TLB flushing slower.

Background:

KAISER keeps two copies of the page tables.  We switch between them
with the the CR3 register.  But, CR3 was really designed for context
switches and changing it also flushes the entire TLB (modulo global
pages).  This TLB flush increases the cost of interrupts and context
switches.  For syscall-heavy microbenchmarks it can cut the rate of
syscalls by 2/3.

But, now we have suppport for and Intel CPU feature called Process
Context IDentifiers (PCID) in the kernel thanks to Andy Lutomirski.
This feature is intended to allow you to switch between contexts
without flushing the TLB.

Implementation:

We can use PCIDs to avoid flushing the TLB at kernel entry/exit.
This is speeds up both interrupts and syscalls.

We do this by assigning the kernel and userspace different ASIDs.  On
entry from userspace, we move over to the kernel page tables *and*
ASID.  On exit, we restore the user page tables and ASID.  Fortunately,
the ASID is programmed via CR3, which we are already using to switch
between the page table copies.  So, we get one-stop shopping.

In current kernels, CR3 is used to switch between processes which also
provides all the TLB flushing that we need at a context switch.  But,
with KAISER, that CR3 move only flushes the current (kernel) ASID.  We
need an extra TLB flushing operation to flush the user ASID: invpcid.
This is probably ~100 cycles, but this is done with the assumption that
the time we lose in context switches is more than made up for in
interrupts and syscalls.

Support:

PCIDs are generally available on Sandybridge and newer CPUs.  However,
the accompanying INVPCID instruction did not become available until
Haswell (the ones with "v4", or called fourth-generation Core).  This
instruction allows non-current-PCID TLB entries to be flushed without
switching CR3 and global pages to be flushed without a double
MOV-to-CR4.

Without INVPCID, PCIDs are much harder to use.  TLB invalidation gets
much more onerous:

1. Every kernel TLB flush (even for a single page) requires an
   interrupts-off MOV-to-CR4 which is very expensive.  This is because
   there is no way to flush a kernel address that might be loaded
   in *EVERY* PCID.  Right now, there are "only" ~12 of these per-cpu,
   but that's too painful to use the MOV-to-CR3 to flush them.  That
   leaves only the MOV-to-CR4.
2. Every userspace flush (even for a single page requires one of the
   following:
   a. A pair of flushing (bit 63 clear) CR3 writes: one for
      the kernel ASID and another for userspace.
   b. A pair of non-flushing CR3 writes (bit 63 set) with the
      flush done for each.  For instance, what is currently a
      single instruction without KAISER:

		invpcid_flush_one(current_pcid, addr);

      becomes this with KAISER:

      		invpcid_flush_one(current_kern_pcid, addr);
		invpcid_flush_one(current_user_pcid, addr);

      and this without INVPCID:

      		__native_flush_tlb_single(addr);
		write_cr3(mm->pgd | current_user_pcid | NOFLUSH);
      		__native_flush_tlb_single(addr);
		write_cr3(mm->pgd | current_kern_pcid | NOFLUSH);

So, for now, we fully disable PCIDs with KAISER when INVPCID is
not available.  This is fixable, but it's an optimization that
we can do later.

Hugh Dickins also points out that PCIDs really have two distinct
use-cases in the context of KAISER.  The first way they can be used
is as "TLB preservation across context-swtich", which is what
Andy Lutomirksi's 4.14 PCID code does.  They can also be used as
a "KAISER syscall/interrupt accelerator".  If we just use them to
speed up syscall/interrupts (and ignore the context-switch TLB
preservation), then the deficiency of not having INVPCID
becomes much less onerous.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
Cc: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: x86@kernel.org
9 files changed