random: Use a lockless fast path for get_random_uXX()

Currently, the implementations of the get_random_uXX() API protect their
critical section by taking a local lock and disabling interrupts, to
ensure that the code does not race with itself when called from
interrupt context.

Given that the fast path does nothing more than read a single uXX
quantity from a linear buffer and bump the position pointer, poking the
hardware registers to disable and re-enable interrupts is
disproportionately costly, and best avoided.

There are two conditions under which the batched entropy buffer is
replenished, which is what forms the critical section:
- the buffer is exhausted, or
- the base_crng generation counter has incremented.

By combining the position and generation counters into a single u64, we
can use compare-and-exchange to implement the fast path without taking
the local lock or disabling interrupts. By constructing the expected and
next values carefully, the compare-and-exchange will only succeed if
- we did not race with ourselves, i.e., the compare-and-exchange
  increments the position counter by exactly 1;
- the buffer is not exhausted; and
- the generation counter equals the base_crng generation counter.
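The combined-counter scheme can be sketched in plain C with C11
atomics. This is a simplified userspace model, not the kernel code:
the buffer size, the refill logic, and names such as pos_gen and
refill_batch() are illustrative stand-ins.

```c
#include <stdatomic.h>
#include <stdint.h>

#define BATCH_SIZE 64

/* Hypothetical stand-ins for the kernel's per-CPU batch state */
static uint8_t entropy[BATCH_SIZE];
static _Atomic uint64_t pos_gen;          /* generation << 32 | position */
static _Atomic uint32_t base_generation;  /* bumped when base_crng reseeds */

/* Slow path: refill the buffer and reset position to 0 under the
 * current generation (the kernel does this under the local lock). */
static void refill_batch(void)
{
    uint32_t gen = atomic_load(&base_generation);
    for (int i = 0; i < BATCH_SIZE; i++)
        entropy[i] = (uint8_t)(i ^ gen);  /* placeholder for CRNG output */
    atomic_store(&pos_gen, (uint64_t)gen << 32);
}

static uint8_t get_random_u8_sketch(void)
{
    for (;;) {
        uint64_t old = atomic_load(&pos_gen);
        uint32_t gen = old >> 32;
        uint32_t pos = (uint32_t)old;

        /* Fast path: buffer not exhausted, generation still current */
        if (pos < BATCH_SIZE && gen == atomic_load(&base_generation)) {
            uint8_t val = entropy[pos];
            /* Succeeds only if position advances by exactly 1 and the
             * generation half of the word is unchanged. */
            if (atomic_compare_exchange_weak(&pos_gen, &old, old + 1))
                return val;
            continue;  /* raced with another reader; retry */
        }
        /* Slow path */
        refill_batch();
    }
}
```

Because both counters live in one 64-bit word, a single
compare-and-exchange simultaneously detects racing readers, buffer
exhaustion, and a stale generation.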

The key insight here is that batch->position never assumes the value 0x0
except inside the critical section, and so using it as the 'old' value
in the compare-and-exchange forces it to fail and take the slow path,
in which case we take the local lock. This results in a considerable
speedup (~2x) when benchmarking get_random_u8() in a tight loop (tested
on x86 and arm64).
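The forced-failure trick can be shown in isolation: feeding the
compare-and-exchange an 'expected' value the variable never holds
guarantees failure and steers the caller into the slow path. The helper
name below is hypothetical, used only for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdbool.h>

/* Returns true iff the compare-and-exchange succeeded. If 'expected'
 * is a value *v can never hold, this always fails, so the caller is
 * forced onto the slow path. */
static bool try_fast_path(_Atomic uint64_t *v, uint64_t expected,
                          uint64_t next)
{
    return atomic_compare_exchange_strong(v, &expected, next);
}
```

On failure, C11 compare-and-exchange also writes the actual value back
into 'expected', which the slow path can use to decide what to do next.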

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>