arm64: Support dynamic kernel stacks

Turning on dynamic kernel stacks can save a lot of kernel
RAM on ARM64.

Example on a minimal OpenWrt system:
$ cat /proc/vmstat | grep stack
nr_kernel_stack 320
nr_dynamic_stacks_faults 8

Each stack initially uses just 4KB, then each fault
extends one of the kernel stacks by one more 4KB page.

We see that in this case we consume

  328 * 4KB = 1.28 MB

of RAM for stacks.

Compare this with 16KB pre-allocated stacks: all 320
processes would use 16KB of physical memory each, i.e.

  320 * 16KB = 5 MB

So in this minimal system we save about 3.7 MB of runtime
memory.
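
The arithmetic above can be sketched as a few C helpers. This
is only an illustration of the calculation, not code from the
patch: the function and macro names are hypothetical, and it
assumes 4KB pages, 16KB pre-allocated stacks, and that each
counted fault adds exactly one page.

```c
#include <stdio.h>

#define PAGE_KB           4UL   /* assumed page size, as in the example */
#define PREALLOC_STACK_KB 16UL  /* assumed fixed stack size for comparison */

/* Dynamic stacks: one initial page per stack, plus one page per fault. */
static inline unsigned long dynamic_stack_kb(unsigned long stacks,
                                             unsigned long faults)
{
	return (stacks + faults) * PAGE_KB;
}

/* Pre-allocated stacks: every stack is fully backed from the start. */
static inline unsigned long prealloc_stack_kb(unsigned long stacks)
{
	return stacks * PREALLOC_STACK_KB;
}

/* Memory saved by dynamic stacks compared to pre-allocation, in KB. */
static inline unsigned long stack_savings_kb(unsigned long stacks,
                                             unsigned long faults)
{
	return prealloc_stack_kb(stacks) - dynamic_stack_kb(stacks, faults);
}
```

With the vmstat numbers above, dynamic_stack_kb(320, 8) gives
1312 KB, prealloc_stack_kb(320) gives 5120 KB, and the
difference is 3808 KB.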

The approach taken here is to special-case the sync
exceptions from the vector table. If we are handling a
sync exception (and dynamic stacks are enabled), we stash
(x16, x17) into (TPIDR_EL0, TPIDRRO_EL0) temporarily so
that we can execute some code without using any stack at
all.

We define a special sync stack that is only used
when handling sync exceptions, and we switch to this stack
immediately in the exception handler, without saving
a single value onto the task stack.

We then check if this sync exception was a data abort
on the ordinary process stack. If it was not, we copy our
current sync stack over to the task stack and continue
like nothing special happened.

If this was indeed a data abort on the task stack,
we call do_stack_abort(), which in turn calls
dynamic_stack_fault() to latch in a new physical
page to the stack, all while running on the temporary
sync stack. We then return from the exception, restoring
SP to what it was before the sync exception.
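
The decision described above can be modelled in plain C as
follows. This is a rough sketch and not the code in this
patch: the struct, enum and function names are hypothetical,
and it assumes the check is "exception class is a data abort
taken at the current EL, and the faulting address lies within
the task stack range". The EC field position and the 0x25
value follow the ESR_ELx encoding.

```c
#include <stdbool.h>
#include <stdint.h>

#define ESR_EC_SHIFT     26
#define ESR_EC_MASK      0x3fULL
#define ESR_EC_DABT_CUR  0x25ULL /* data abort without a change in EL */

/* Hypothetical description of the virtual range of a task stack. */
struct task_stack {
	uint64_t base;
	uint64_t size;
};

enum sync_path {
	SYNC_PATH_NORMAL,      /* copy sync stack to task stack, continue */
	SYNC_PATH_STACK_ABORT, /* stay on sync stack, extend the task stack */
};

/* True if addr falls inside [base, base + size). */
static inline bool addr_on_stack(const struct task_stack *stk, uint64_t addr)
{
	return addr >= stk->base && addr - stk->base < stk->size;
}

/* Pick the path for a sync exception from the syndrome and fault address. */
static inline enum sync_path classify_sync(uint64_t esr, uint64_t far,
                                           const struct task_stack *stk)
{
	uint64_t ec = (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;

	if (ec == ESR_EC_DABT_CUR && addr_on_stack(stk, far))
		return SYNC_PATH_STACK_ABORT;
	return SYNC_PATH_NORMAL;
}
```

In this model only the SYNC_PATH_STACK_ABORT case would end
up in do_stack_abort(); every other sync exception takes the
ordinary path on the task stack.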

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
7 files changed