ratelimit: Reduce ratelimit's false-positive misses

The current ratelimit implementation can have false-positive misses.
That is, ___ratelimit() might return zero (causing the caller to
invoke rate limiting, for example, by dropping printk()s) even when
the current burst had not yet been used up.  This happens when one CPU
holds a given ratelimit structure's lock and some other CPU concurrently
invokes ___ratelimit().  The fact that the lock is a raw irq-disabled
spinlock might make low-contention trylock failure seem unlikely,
but vCPU preemption, NMIs, and firmware interrupts can each greatly
extend the trylock-failure window.

Avoiding these false-positive misses is especially important when
correlating hardware failures logged on the console with other
information.

Therefore, instead of attempting to acquire the lock on each call to
___ratelimit(), construct a lockless fastpath and only acquire the lock
when retriggering for the next burst and when resynchronizing due to
either a long idle period or ratelimiting having been disabled.
This not only reduces the number of lock-hold periods that can be extended
by vCPU preemption, NMIs, and firmware interrupts, but also means that these
extensions must be of much longer durations (generally from milliseconds
to seconds) to cause false-positive drops.
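
As a rough illustration only, the general shape of such a fastpath can
be modeled in userspace C11.  This is a simplified sketch, not the code
in this patch: struct rl_sketch, rl_init(), rl_allow(), and
dec_if_positive() are hypothetical names, time is a caller-supplied
tick count, and an atomic_flag stands in for the spinlock.  It assumes
that the slowpath still uses a trylock, and it ignores the
disabled-ratelimit and long-idle resynchronization cases.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; these names are not the kernel's. */
struct rl_sketch {
	atomic_flag lock;      /* stand-in for the spinlock, slowpath only */
	atomic_int remaining;  /* calls still allowed in the current burst */
	atomic_long begin;     /* tick at which the current interval began */
	long interval;         /* interval length in ticks */
	int burst;             /* calls allowed per interval */
};

static void rl_init(struct rl_sketch *rs, long interval, int burst, long now)
{
	atomic_flag_clear(&rs->lock);
	atomic_init(&rs->remaining, burst);
	atomic_init(&rs->begin, now);
	rs->interval = interval;
	rs->burst = burst;
}

/* Atomically decrement *v only if it is positive; true on success. */
static bool dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);

	while (old > 0)
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return true;
	return false;
}

/* Returns true if the caller may proceed, false if it should be limited. */
static bool rl_allow(struct rl_sketch *rs, long now)
{
	/* Lockless fastpath: interval still open, so just claim a slot. */
	if (now - atomic_load(&rs->begin) < rs->interval)
		return dec_if_positive(&rs->remaining);

	/* Slowpath: interval expired, so retrigger under the lock. */
	if (atomic_flag_test_and_set(&rs->lock))
		return false;  /* Trylock failed: the one remaining miss window. */
	if (now - atomic_load(&rs->begin) >= rs->interval) {
		atomic_store(&rs->remaining, rs->burst);
		atomic_store(&rs->begin, now);
	}
	atomic_flag_clear(&rs->lock);
	return dec_if_positive(&rs->remaining);
}
```

In this model, a caller that arrives while another thread holds the lock
is refused only if the interval has already expired, which is the
narrowing of the miss window described above.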

In addition, the lockless fastpath gets a 10-20% speedup on my x86
laptop, though mileage may vary depending on your hardware, workload,
and configuration.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Jon Pan-Doh <pandoh@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Karolina Stolarek <karolina.stolarek@oracle.com>
3 files changed