Stack tracing and RCU has been having issues with each other and lockdep
has been pointing out constant problems. The changes have been going into
the stack tracer, but it has been discovered that the problem isn't
with the stack tracer itself, but it is with calling save_stack_trace()
from within the internals of RCU. The stack tracer is the one that
can trigger the issue the easiest, but examining the problem further,
it could also happen from a WARN() in the wrong place, or even if
an NMI happened in this area and it did an rcu_read_lock().

The critical area is where RCU is not watching. Which can happen while
going to and from idle, or bringing up or taking down a CPU.

The final fix was to put the protection in kernel_text_address() as it
is the one that requires RCU to be watching while doing the stack trace.

To make this work properly, Paul had to allow rcu_irq_enter() happen after
rcu_nmi_enter(). This should have been done anyway, since an NMI can
page fault (reading vmalloc area), and a page fault triggers rcu_irq_enter().

One patch is just a consolidation of code so that the fix only needed
to be done in one location.
tracing: Remove RCU work arounds from stack tracer

Currently the stack tracer calls rcu_irq_enter() to make sure RCU
is watching when it records a stack trace. But if the stack tracer
is triggered while tracing inside of a rcu_irq_enter(), calling
rcu_irq_enter() unconditionally can be problematic.

The reason for having rcu_irq_enter() in the first place has been
fixed from within the saving of the stack trace code, and there's no
reason for doing it in the stack tracer itself. Just remove it.

Cc: stable@vger.kernel.org
Fixes: 0be964be0 ("module: Sanitize RCU usage and locking")
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Suggested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
1 file changed