arm64: kernel: implement fast refcount checking
Add support for fast refcount checking on arm64, following the approach
Kees proposed for x86, which is in turn based on the grsecurity/PaX
implementation.
The general approach is identical: the existing atomic_t helpers are
cloned for refcount_t, with the arithmetic instruction modified to set
the PSTATE flags, and one or two branch instructions added that jump to
an out-of-line handler if overflow, decrement to zero, or increment from
zero is detected.
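The checks described above can be modelled in portable C11 as a rough
userspace sketch. This is not the code from this patch: every model_*
name is hypothetical, the conditions are tested on the old value rather
than on PSTATE flags, and the handler pinning the counter to a
saturation value is an assumption made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical saturation value; what the real handler does is an assumption. */
#define MODEL_REFCOUNT_SATURATED (INT_MIN / 2)

typedef struct { atomic_int refs; } model_refcount_t;

static atomic_int model_error_count; /* handler invocations, for the demo only */

/* Stand-in for the out-of-line handler: pin the counter, record the event. */
static void model_refcount_error(model_refcount_t *r)
{
	atomic_store_explicit(&r->refs, MODEL_REFCOUNT_SATURATED,
			      memory_order_relaxed);
	atomic_fetch_add_explicit(&model_error_count, 1, memory_order_relaxed);
}

static void model_refcount_inc(model_refcount_t *r)
{
	int old = atomic_fetch_add_explicit(&r->refs, 1, memory_order_relaxed);

	/* old == 0: increment from zero; old < 0 or INT_MAX: negative result */
	if (old <= 0 || old == INT_MAX)
		model_refcount_error(r);
}

static void model_refcount_dec(model_refcount_t *r)
{
	int old = atomic_fetch_sub_explicit(&r->refs, 1, memory_order_release);

	/* old == 1: decrement to zero; old <= 0: negative result */
	if (old <= 1)
		model_refcount_error(r);
}

static bool model_refcount_dec_and_test(model_refcount_t *r)
{
	int old = atomic_fetch_sub_explicit(&r->refs, 1, memory_order_release);

	if (old <= 0) {
		model_refcount_error(r);
		return false;
	}
	if (old != 1)
		return false;
	atomic_thread_fence(memory_order_acquire);
	return true; /* last reference dropped */
}

/* Usage example; relies on assert() being enabled (no NDEBUG). */
static void model_demo(void)
{
	model_refcount_t r;

	atomic_init(&r.refs, 1);
	model_refcount_inc(&r);                    /* 1 -> 2, no error */
	assert(!model_refcount_dec_and_test(&r));  /* 2 -> 1 */
	assert(model_refcount_dec_and_test(&r));   /* 1 -> 0 */
	model_refcount_inc(&r);                    /* increment from zero */
	assert(atomic_load(&r.refs) == MODEL_REFCOUNT_SATURATED);

	atomic_store(&r.refs, 1);
	model_refcount_dec(&r);                    /* decrement to zero */
	assert(atomic_load(&r.refs) == MODEL_REFCOUNT_SATURATED);
}
```

In the model the common path is one atomic operation plus a compare and
a branch that is never taken in a correct program, which matches the
structure the paragraph above describes.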
One complication that we have to deal with on arm64 is the fact that
it has two atomics implementations: the original LL/SC implementation
using load/store exclusive loops, and the newer LSE one that does mostly
the same in a single instruction. So we need to clone some parts of
both for the refcount handlers, but we also need to deal with the way
LSE builds fall back to LL/SC at runtime if the hardware does not
support LSE.
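To illustrate the two flavours and the runtime fallback, here is a
minimal C11 sketch. It is only an analogy: a compare-exchange retry
loop stands in for a load/store exclusive loop, a single fetch-add
stands in for an LSE instruction, and a plain boolean stands in for
whatever mechanism the kernel uses to select between them. All model_*
names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical capability flag, assumed to be set once at startup. */
static bool model_cpu_has_lse;

/* Wrapping add without signed-overflow UB (the conversion back to int
 * is implementation-defined; it wraps on GCC and Clang). */
static int model_wrap_add(int a, int b)
{
	return (int)((unsigned int)a + (unsigned int)b);
}

/* LL/SC-style: load, compute, attempt the store, retry on interference. */
static int model_add_return_llsc(atomic_int *v, int i)
{
	int old = atomic_load_explicit(v, memory_order_relaxed);
	int val;

	do {
		val = model_wrap_add(old, i);
	} while (!atomic_compare_exchange_weak_explicit(v, &old, val,
							memory_order_relaxed,
							memory_order_relaxed));
	return val;
}

/* LSE-style: the whole read-modify-write is one atomic operation. */
static int model_add_return_lse(atomic_int *v, int i)
{
	int old = atomic_fetch_add_explicit(v, i, memory_order_relaxed);

	return model_wrap_add(old, i);
}

/* Runtime selection: use the LSE-style path only when it is available. */
static int model_add_return(atomic_int *v, int i)
{
	return model_cpu_has_lse ? model_add_return_lse(v, i)
				 : model_add_return_llsc(v, i);
}

/* Usage example: both paths must produce the same results. */
static void model_fallback_demo(void)
{
	atomic_int v;

	atomic_init(&v, 5);
	model_cpu_has_lse = false;
	assert(model_add_return(&v, 3) == 8);
	model_cpu_has_lse = true;
	assert(model_add_return(&v, -8) == 0);
}
```

Any refcount check has to behave identically on both paths, which is
why parts of both implementations need a refcount variant.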
As is the case with the x86 version, the performance delta is in the
noise (Cortex-A57 @ 2 GHz, using LL/SC not LSE), even though the arm64
implementation incorporates an add-from-zero check as well:
  perf stat -B -- echo ATOMIC_TIMING >/sys/kernel/debug/provoke-crash/DIRECT
  Performance counter stats for 'cat /dev/fd/63':
       65716.592696  task-clock (msec)   #  1.000 CPUs utilized
                  2  context-switches    #  0.000 K/sec
                  0  cpu-migrations      #  0.000 K/sec
                 46  page-faults         #  0.001 K/sec
       131341846242  cycles              #  1.999 GHz
        36712622640  instructions        #  0.28 insn per cycle
    <not supported>  branches
             792754  branch-misses
  65.736371584 seconds time elapsed
  perf stat -B -- echo REFCOUNT_TIMING >/sys/kernel/debug/provoke-crash/DIRECT
       65615.259736  task-clock (msec)   #  1.000 CPUs utilized
                  2  context-switches    #  0.000 K/sec
                  0  cpu-migrations      #  0.000 K/sec
                 45  page-faults         #  0.001 K/sec
       131138621533  cycles              #  1.999 GHz
        43155978260  instructions        #  0.33 insn per cycle
    <not supported>  branches
             779668  branch-misses
  65.616216112 seconds time elapsed
For comparison, the numbers below were captured using CONFIG_REFCOUNT_FULL,
which uses the validation routines implemented in C:
  perf stat -B -- echo REFCOUNT_TIMING >/sys/kernel/debug/provoke-crash/DIRECT
  Performance counter stats for 'cat /dev/fd/63':
      104566.154096  task-clock (msec)   #  1.000 CPUs utilized
                  2  context-switches    #  0.000 K/sec
                  0  cpu-migrations      #  0.000 K/sec
                 46  page-faults         #  0.000 K/sec
       208929924555  cycles              #  1.998 GHz
       131354624418  instructions        #  0.63 insn per cycle
    <not supported>  branches
            1604302  branch-misses
  104.586265040 seconds time elapsed
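For readers who want the relative deltas rather than raw counters, the
small helper below computes them from the cycle and instruction counts
quoted above. The struct and function names are hypothetical and the
program is only a convenience for reading the numbers, not part of the
patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct model_perf_sample {
	uint64_t cycles;
	uint64_t instructions;
};

/* Counter values copied from the three perf runs in this message. */
static const struct model_perf_sample model_atomic =
	{ 131341846242ULL, 36712622640ULL };
static const struct model_perf_sample model_refcount_fast =
	{ 131138621533ULL, 43155978260ULL };
static const struct model_perf_sample model_refcount_full =
	{ 208929924555ULL, 131354624418ULL };

/* Ratio a/b; the counts are far below 2^53, so the conversion is exact. */
static double model_ratio(uint64_t a, uint64_t b)
{
	return (double)a / (double)b;
}

/* Print cycle and instruction ratios of one run relative to a baseline. */
static void model_report(const char *label,
			 const struct model_perf_sample *run,
			 const struct model_perf_sample *base)
{
	printf("%s: cycles x%.4f, instructions x%.4f\n", label,
	       model_ratio(run->cycles, base->cycles),
	       model_ratio(run->instructions, base->instructions));
}

static void model_report_all(void)
{
	model_report("fast refcount vs atomic_t",
		     &model_refcount_fast, &model_atomic);
	model_report("REFCOUNT_FULL vs fast refcount",
		     &model_refcount_full, &model_refcount_fast);
}
```

Calling model_report_all() shows that the fast variant executes more
instructions than the plain atomic_t run at an essentially unchanged
cycle count, while CONFIG_REFCOUNT_FULL costs substantially more cycles.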
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
8 files changed