ARM: CPU hotplug: Delegate complete() to surviving CPU

The ARM implementation of arch_cpu_idle_dead() invokes complete(), but
does so after RCU has stopped watching the outgoing CPU, which results
in lockdep complaints because complete() invokes functions containing RCU
readers.  In addition, if the outgoing CPU really were to consume several
seconds of its five-second allotment, multiple RCU updates could
complete, possibly giving the outgoing CPU an inconsistent view of the
scheduler data structures on which complete() relies.

This (untested, probably does not build) commit avoids this problem by
polling the outgoing CPU.  The polling strategy in this prototype patch
is quite naive, with one jiffy between polls and without any sort
of adaptive spin phase.  The key point is that the polling CPU uses
atomic_dec_and_test(), which evicts the flag from the outgoing CPU's
cache.  The outgoing CPU simply does an atomic_set() of the value 1, which
causes the next atomic_dec_and_test() to return true, and which also
minimizes opportunities for other data to get pulled into the outgoing
CPU's cache.  This pulling of values from the outgoing CPU's cache is
important because the outgoing CPU might be unceremoniously powered off
before it has time to execute any code after the atomic_set().
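To make the handshake concrete, here is a userspace analogue built on C11
atomics and POSIX threads rather than kernel code.  It is only a sketch
under stated assumptions: the names cpu_dead_flag, outgoing_cpu(),
dec_and_test() and wait_for_outgoing() are hypothetical, a thread stands
in for the outgoing CPU, and a one-millisecond nanosleep() stands in for
a jiffy (as if HZ=1000).  It is not the arch code from this patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the per-offline handshake flag. */
static atomic_int cpu_dead_flag;

/* Outgoing side: finish "teardown", then do a single store and nothing
 * else, analogous to atomic_set(&flag, 1). */
static void *outgoing_cpu(void *arg)
{
	const struct timespec *delay = arg;

	nanosleep(delay, NULL);
	atomic_store(&cpu_dead_flag, 1);
	return NULL;	/* a real CPU might lose power right here */
}

/* Analogue of atomic_dec_and_test(): true if the new value is zero.
 * An atomic read-modify-write typically needs the cache line in
 * exclusive state, pulling it away from the other CPU's cache. */
static bool dec_and_test(atomic_int *v)
{
	return atomic_fetch_sub(v, 1) == 1;
}

/* Surviving side: poll once per "jiffy", at most max_polls times.
 * Returns the number of polls used, or -1 on timeout. */
static int wait_for_outgoing(long delay_ms, int max_polls)
{
	pthread_t tid;
	struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
	const struct timespec jiffy = { 0, 1000000L };	/* assumes HZ=1000 */
	int polls;

	assert(max_polls > 0);
	atomic_store(&cpu_dead_flag, 0);	/* reset for this offline operation */
	pthread_create(&tid, NULL, outgoing_cpu, &delay);

	for (polls = 1; polls <= max_polls; polls++) {
		if (dec_and_test(&cpu_dead_flag))
			break;			/* outgoing side stored 1 */
		nanosleep(&jiffy, NULL);	/* naive: no adaptive spin phase */
	}
	pthread_join(tid, NULL);
	return polls <= max_polls ? polls : -1;
}
```

Build with -pthread.  For example, wait_for_outgoing(10, 5000) returns a
positive poll count, while wait_for_outgoing(500, 3) gives up and returns
-1.  Because the outgoing side only stores and never reads, whatever
negative value the poller has left in the flag is simply overwritten.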

Underflow is avoided because there can be at most 5,000 invocations of
atomic_dec_and_test() for a given offline operation, and the counter is
reset to zero at the start of each offline operation.
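The bound can be checked with a toy model in which every offline
operation times out, so the outgoing side never stores 1.  The function
lowest_value_after() is hypothetical and models only the counter
arithmetic, not the kernel code; the 5,000 figure is the one given above.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical model: for each offline operation the counter is reset
 * to 0 and then decremented at most max_polls times.  Returns the most
 * negative value the counter ever reaches across offline_ops operations. */
static int lowest_value_after(int offline_ops, int max_polls)
{
	int lowest = 0;

	assert(offline_ops >= 0 && max_polls >= 0);
	for (int op = 0; op < offline_ops; op++) {
		int counter = 0;		/* reset per offline operation */

		for (int i = 0; i < max_polls; i++)
			counter--;		/* worst case: no response */
		if (counter < lowest)
			lowest = counter;
	}
	return lowest;
}
```

Even after 1,000 timed-out offline operations, lowest_value_after(1000,
5000) is -5000, far from INT_MIN.  Without the per-operation reset, the
same sequence would instead drive the counter down to -5,000,000.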

Reported-by: Peng Fan <van.freenix@gmail.com>
Reported-by: Russell King - ARM Linux <linux@armlinux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Fabio Estevam <fabio.estevam@nxp.com>
Cc: <linux-arm-kernel@lists.infradead.org>
1 file changed