mm/page_alloc: Add remote draining support to per-cpu lists
page_alloc.c's per-cpu page lists are currently protected using local
locks. While fast, this doesn't allow for remote access to these
structures. CPUs requiring system-wide per-cpu list drains get around
this by scheduling drain work on all CPUs. That said, some setups, such
as systems with NOHZ_FULL CPUs, aren't well suited to this, as they
can't handle interruptions of any sort.
To mitigate this, introduce a new lock-less remote draining mechanism. It
leverages the fact that the per-cpu page lists are accessed through
indirection, and that the pointer can be updated atomically. It goes
like this:
- Atomically switch the per-cpu lists pointers to ones pointing to
  empty lists.
- Wait for a grace period so that all concurrent users of the old
  per-cpu lists pointer finish updating them. Note that whatever
  they were doing, the result was going to be flushed anyway[1].
- Remotely flush the old lists now that we know nobody is using them.
  Once empty, these per-cpu lists will be used for the next drain.
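The three steps above can be modelled in userspace as follows. This is
only a rough sketch under stated assumptions, not the patch's code: the
names (pcp_model, pcp_remote_drain, wait_for_readers) are hypothetical,
the grace period is an empty placeholder standing in for
synchronize_rcu_expedited(), a plain counter stands in for the page
lists, and the drain mutex is left out.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names come from the kernel. */
struct pcp_lists {
	int count;			/* pages cached on this set of lists */
};

struct pcp_model {
	struct pcp_lists sets[2];	/* one live set, one spare set */
	_Atomic(struct pcp_lists *) lp;	/* pointer the fast path dereferences */
	struct pcp_lists *drain;	/* empty spare, swapped in by a drain */
};

static void pcp_init(struct pcp_model *p)
{
	p->sets[0].count = 0;
	p->sets[1].count = 0;
	atomic_init(&p->lp, &p->sets[0]);
	p->drain = &p->sets[1];
}

/* Local fast path: cache a page on whichever set is currently live. */
static void pcp_free_page(struct pcp_model *p)
{
	struct pcp_lists *lp = atomic_load_explicit(&p->lp, memory_order_acquire);

	lp->count++;
}

/*
 * Placeholder for the grace period. This model is single-threaded, so
 * there are no concurrent readers to wait for; the commit message names
 * synchronize_rcu_expedited() for this step.
 */
static void wait_for_readers(void)
{
}

/* Remote drain: returns the number of pages flushed from the old set. */
static int pcp_remote_drain(struct pcp_model *p)
{
	struct pcp_lists *old;
	int flushed;

	/* Step 1: atomically publish the empty spare set. */
	old = atomic_exchange_explicit(&p->lp, p->drain, memory_order_acq_rel);

	/* Step 2: wait until nobody can still be using the old pointer. */
	wait_for_readers();

	/* Step 3: flush the old set; once empty it is the next spare. */
	flushed = old->count;
	old->count = 0;
	p->drain = old;

	return flushed;
}
```

Note that the draining side never touches the set that is currently
published, so in this model the fast path needs no extra
synchronization with the drain beyond the pointer load.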
Concurrent access to the drain process is protected by a mutex.
RCU guarantees atomicity both while dereferencing the per-cpu lists
pointer and while replacing it. It also checks for RCU critical
section/locking correctness, as all readers have to hold their per-cpu
pageset's local lock. Also, synchronize_rcu_expedited() is used to
minimize hangs during low-memory situations, without interrupting
NOHZ_FULL CPUs, since they are in an extended quiescent state.
As a side effect of all this, we now have to promote the spin_lock() in
free_pcppages_bulk() to spin_lock_irqsave(), since not all of the
function's users enter with interrupts disabled.
Accesses to the pcplists like the ones in mm/vmstat.c don't require RCU
supervision since they can handle outdated data.
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
[1] The old mechanism disabled preemption as the means of
serialization, so per-cpu drain works were already stepping over
whatever was being processed concurrently with the drain call.
3 files changed