blob: e0b2d455fa8aa8c516275f799e7a183061aa5cad [file] [log] [blame]
% formal/dyntickrcu.tex
% mainfile: ../perfbook.tex
% SPDX-License-Identifier: CC-BY-SA-3.0
% Disable frame around VerbatimN in two-column layout
\IfTwoColumn{
\RecustomVerbatimEnvironment{VerbatimN}{Verbatim}%
{numbers=left,numbersep=5pt,xleftmargin=10pt,xrightmargin=0pt,frame=none}
\setlength{\lnlblraise}{0pt}
}{}
\subsection{Promela Parable:
{dynticks} and Preemptible RCU}
\label{sec:formal:Promela Parable: dynticks and Preemptible RCU}
In early 2008, a preemptible variant of RCU was accepted into
mainline Linux in support of real-time workloads,
a variant similar to the RCU implementations in
the \rt\ patchset~\cite{IngoMolnar05a}
since August 2005.
Preemptible RCU is needed for real-time workloads because older
RCU implementations disable preemption across RCU read-side
critical sections, resulting in excessive real-time latencies.
However, one disadvantage of the older \rt\ implementation
was that each \IX{grace period}
requires work to be done on each CPU, even if that CPU is in a low-power
``dynticks-idle'' state,
and thus incapable of executing RCU read-side critical sections.
The idea behind the dynticks-idle state is that idle CPUs
should be physically powered down in order to conserve energy.
In short, preemptible RCU can disable a valuable energy-conservation
feature of recent Linux kernels.
Although Josh Triplett and Paul McKenney
had discussed some approaches for allowing
CPUs to remain in low-power state throughout an RCU grace period
(thus preserving the Linux kernel's ability to conserve energy), matters
did not come to a head until Steve Rostedt integrated a new dyntick
implementation with preemptible RCU in the \rt\ patchset.
This combination caused one of Steve's systems to hang on boot, so in
October, Paul coded up a dynticks-friendly modification to preemptible RCU's
grace-period processing.
Steve coded up \co{rcu_irq_enter()} and \co{rcu_irq_exit()}
interfaces called from the
\co{irq_enter()} and \co{irq_exit()} interrupt
entry/exit functions.
These \co{rcu_irq_enter()} and \co{rcu_irq_exit()}
functions are needed to allow RCU to reliably handle situations where
a dynticks-idle CPU is momentarily powered up for an interrupt
handler containing RCU read-side critical sections.
With these changes in place, Steve's system booted reliably,
but Paul continued inspecting the code periodically on the assumption
that we could not possibly have gotten the code right on the first try.
Paul reviewed the code repeatedly from October 2007 to February 2008,
and almost always found at least one bug.
In one case, Paul even coded and tested a fix before realizing that the
bug was illusory, and in fact in all cases, the ``bug'' turned out to be
illusory.
Near the end of February, Paul grew tired of this game.
He therefore decided to enlist the aid of
Promela and Spin.
The following presents a series of seven increasingly realistic
Promela models, the last of which passes, consuming about
40\,GB of main memory for the state space.
More important, Promela and Spin did find a very subtle bug for me!
\QuickQuiz{
Yeah, that's just great!
Now, just what am I supposed to do if I don't happen to have a
machine with 40\,GB of main memory???
}\QuickQuizAnswer{
Relax, there are a number of lawful answers to
this question:
\begin{enumerate}
\item Try compiler flags \co{-DCOLLAPSE} and \co{-DMA=N}
to reduce memory consumption.
See \cref{sec:formal:Running the QRCU Example}.
\item Further optimize the model, reducing its memory consumption.
\item Work out a pencil-and-paper proof, perhaps starting with the
comments in the code in the Linux kernel.
\item Devise careful torture tests, which, though they cannot prove
the code correct, can find hidden bugs.
\item There is some movement towards tools that do model
checking on clusters of smaller machines.
However, please note that we have not actually used such
tools myself, courtesy of some large machines that Paul has
occasional access to.
\item Wait for memory sizes of affordable systems to expand
to fit your problem.
\item Use one of a number of cloud-computing services to rent
a large system for a short time period.
\end{enumerate}
}\QuickQuizEnd
Still better would be to come up with a simpler and faster algorithm
that has a smaller state space.
Even better would be an algorithm so simple that its correctness was
obvious to the casual observer!
\Crefrange{sec:formal:Introduction to Preemptible RCU and dynticks}
{sec:formal:Grace-Period Interface}
give an overview of preemptible RCU's dynticks interface,
followed by
\cref{sec:formal:Validating Preemptible RCU and dynticks}'s
discussion of the validation of the interface.
\subsubsection{Introduction to Preemptible RCU and dynticks}
\label{sec:formal:Introduction to Preemptible RCU and dynticks}
The per-CPU \co{dynticks_progress_counter} variable is
central to the interface between dynticks and preemptible RCU\@.
This variable has an even value whenever the corresponding CPU
is in dynticks-idle mode, and an odd value otherwise.
A CPU exits dynticks-idle mode for the following three reasons:
\begin{enumerate}
\item To start running a task,
\item When entering the outermost of a possibly nested set of interrupt
handlers, and
\item When entering an \IXacr{nmi} handler.
\end{enumerate}
Preemptible RCU's grace-period machinery samples the value of
the \co{dynticks_progress_counter} variable in order to
determine when a dynticks-idle CPU may safely be ignored.
The following three sections give an overview of the task
interface, the interrupt/NMI interface, and the use of
the \co{dynticks_progress_counter} variable by the
grace-period machinery as of Linux kernel v2.6.25-rc4.
\subsubsection{Task Interface}
\label{sec:formal:Task Interface}
When a given CPU enters dynticks-idle mode because it has no more
tasks to run, it invokes \co{rcu_enter_nohz()}:
\begin{VerbatimN}
static inline void rcu_enter_nohz(void)
{
mb();
__get_cpu_var(dynticks_progress_counter)++;
WARN_ON(__get_cpu_var(dynticks_progress_counter) &
0x1);
}
\end{VerbatimN}
This function simply increments \co{dynticks_progress_counter} and
checks that the result is even, but first executing a memory barrier
to ensure that any other CPU that sees the new value of
\co{dynticks_progress_counter} will also see the completion
of any prior RCU read-side critical sections.
Similarly, when a CPU that is in dynticks-idle mode prepares to
start executing a newly runnable task, it invokes
\co{rcu_exit_nohz()}:
\begin{VerbatimN}
static inline void rcu_exit_nohz(void)
{
__get_cpu_var(dynticks_progress_counter)++;
mb();
WARN_ON(!(__get_cpu_var(dynticks_progress_counter) &
0x1));
}
\end{VerbatimN}
This function again increments \co{dynticks_progress_counter},
but follows it with a memory barrier to ensure that if any other CPU
sees the result of any subsequent RCU read-side critical section,
then that other CPU will also see the incremented value of
\co{dynticks_progress_counter}.
Finally, \co{rcu_exit_nohz()} checks that the result of the
increment is an odd value.
The \co{rcu_enter_nohz()} and \co{rcu_exit_nohz()}
functions handle the case where a CPU enters and exits dynticks-idle
mode due to task execution, but does not handle interrupts, which are
covered in the following section.
\subsubsection{Interrupt Interface}
\label{sec:formal:Interrupt Interface}
The \co{rcu_irq_enter()} and \co{rcu_irq_exit()}
functions handle interrupt/NMI entry and exit, respectively.
Of course, nested interrupts must also be properly accounted for.
The possibility of nested interrupts is handled by a second per-CPU
variable, \co{rcu_update_flag}, which is incremented upon
entry to an interrupt or NMI handler (in \co{rcu_irq_enter()})
and is decremented upon exit (in \co{rcu_irq_exit()}).
In addition, the pre-existing \co{in_interrupt()} primitive is
used to distinguish between an outermost or a nested interrupt/NMI\@.
Interrupt entry is handled by the \co{rcu_irq_enter()}
shown below:
\begin{fcvlabel}[ln:formal:dyntickrcu:rcu_irq_enter]
\begin{VerbatimN}[commandchars=\\\[\]]
void rcu_irq_enter(void)
{
int cpu = smp_processor_id(); \lnlbl[fetch]
if (per_cpu(rcu_update_flag, cpu)) \lnlbl[inc:b]
per_cpu(rcu_update_flag, cpu)++; \lnlbl[inc:e]
if (!in_interrupt() && \lnlbl[chk_lv:b]
(per_cpu(dynticks_progress_counter,
cpu) & 0x1) == 0) { \lnlbl[chk_lv:e]
per_cpu(dynticks_progress_counter, cpu)++; \lnlbl[inc_cnt]
smp_mb(); \lnlbl[mb]
per_cpu(rcu_update_flag, cpu)++;\lnlbl[inc_flg]
}
}
\end{VerbatimN}
\end{fcvlabel}
\begin{fcvref}[ln:formal:dyntickrcu:rcu_irq_enter]
\Clnref{fetch} fetches the current CPU's number, while \clnref{inc:b,inc:e}
increment the \co{rcu_update_flag} nesting counter if it
is already non-zero.
\Clnrefrange{chk_lv:b}{chk_lv:e} check to see whether we are
the outermost level of
interrupt, and, if so, whether \co{dynticks_progress_counter}
needs to be incremented.
If so, \clnref{inc_cnt} increments \co{dynticks_progress_counter},
\clnref{mb} executes a memory barrier, and \clnref{inc_flg} increments
\co{rcu_update_flag}.
As with \co{rcu_exit_nohz()}, the memory barrier ensures that
any other CPU that sees the effects of an RCU read-side critical section
in the interrupt handler (following the \co{rcu_irq_enter()}
invocation) will also see the increment of
\co{dynticks_progress_counter}.
\end{fcvref}
\QuickQuizSeries{%
\QuickQuizB{
Why not simply increment \co{rcu_update_flag}, and then only
increment \co{dynticks_progress_counter} if the old value
of \co{rcu_update_flag} was zero???
}\QuickQuizAnswerB{
This fails in presence of NMIs.
To see this, suppose an NMI was received just after
\co{rcu_irq_enter()} incremented \co{rcu_update_flag},
but before it incremented \co{dynticks_progress_counter}.
The instance of \co{rcu_irq_enter()} invoked by the NMI
would see that the original value of \co{rcu_update_flag}
was non-zero, and would therefore refrain from incrementing
\co{dynticks_progress_counter}.
This would leave the RCU grace-period machinery no clue that the
NMI handler was executing on this CPU, so that any RCU read-side
critical sections in the NMI handler would lose their RCU protection.
The possibility of NMI handlers, which, by definition cannot
be masked, does complicate this code.
}\QuickQuizEndB
%
\QuickQuizE{
\begin{fcvref}[ln:formal:dyntickrcu:rcu_irq_enter]
But if \clnref{chk_lv:b} finds that we are the outermost interrupt,
wouldn't we \emph{always} need to increment
\co{dynticks_progress_counter}?
\end{fcvref}
}\QuickQuizAnswerE{
Not if we interrupted a running task!
In that case, \co{dynticks_progress_counter} would
have already been incremented by \co{rcu_exit_nohz()},
and there would be no need to increment it again.
}\QuickQuizEndE
}
Interrupt exit is handled similarly by
\co{rcu_irq_exit()}:
\begin{fcvlabel}[ln:formal:dyntickrcu:rcu_irq_exit]
\begin{VerbatimN}[commandchars=\\\[\]]
void rcu_irq_exit(void)
{
int cpu = smp_processor_id(); \lnlbl[fetch]
if (per_cpu(rcu_update_flag, cpu)) { \lnlbl[chk_flg]
if (--per_cpu(rcu_update_flag, cpu)) \lnlbl[dec_flg]
return;
WARN_ON(in_interrupt()); \lnlbl[verify]
smp_mb(); \lnlbl[mb]
per_cpu(dynticks_progress_counter, cpu)++; \lnlbl[inc_cnt]
WARN_ON(per_cpu(dynticks_progress_counter, \lnlbl[vrf_even:b]
cpu) & 0x1); \lnlbl[vrf_even:e]
}
}
\end{VerbatimN}
\end{fcvlabel}
\begin{fcvref}[ln:formal:dyntickrcu:rcu_irq_exit]
\Clnref{fetch} fetches the current CPU's number, as before.
\Clnref{chk_flg} checks to see if the \co{rcu_update_flag} is
non-zero, returning immediately (via falling off the end of the
function) if not.
Otherwise, \clnrefthro{dec_flg}{vrf_even:e} come into play.
\Clnref{dec_flg} decrements \co{rcu_update_flag}, returning
if the result is not zero.
\Clnref{verify} verifies that we are indeed leaving the outermost
level of nested interrupts, \clnref{mb} executes a memory barrier,
\clnref{inc_cnt} increments \co{dynticks_progress_counter},
and \clnref{vrf_even:b,vrf_even:e} verify that this
variable is now even.
As with \co{rcu_enter_nohz()}, the memory barrier ensures that
any other CPU that sees the increment of
\co{dynticks_progress_counter}
will also see the effects of an RCU read-side critical section
in the interrupt handler (preceding the \co{rcu_irq_exit()}
invocation).
\end{fcvref}
These two sections have described how the
\co{dynticks_progress_counter} variable is maintained during
entry to and exit from dynticks-idle mode, both by tasks and by
interrupts and NMIs.
The following section describes how this variable is used by
preemptible RCU's grace-period machinery.
\subsubsection{Grace-Period Interface}
\label{sec:formal:Grace-Period Interface}
Of the four preemptible RCU grace-period states shown in
\cref{fig:formal:Preemptible RCU State Machine},
only the \co{rcu_try_flip_waitack_state}
and \co{rcu_try_flip_waitmb_state} states need to wait
for other CPUs to respond.
\begin{figure}
\centering
\resizebox{2.5in}{!}{\includegraphics{formal/RCUpreemptStates}}
\caption{Preemptible RCU State Machine}
\label{fig:formal:Preemptible RCU State Machine}
\end{figure}
Of course, if a given CPU is in dynticks-idle state, we shouldn't
wait for it.
Therefore, just before entering one of these two states,
the preceding state takes a snapshot of each CPU's
\co{dynticks_progress_counter} variable, placing the
snapshot in another per-CPU variable,
\co{rcu_dyntick_snapshot}.
This is accomplished by invoking
\co{dyntick_save_progress_counter()}, shown below:
\begin{VerbatimN}
static void dyntick_save_progress_counter(int cpu)
{
per_cpu(rcu_dyntick_snapshot, cpu) =
per_cpu(dynticks_progress_counter, cpu);
}
\end{VerbatimN}
The \co{rcu_try_flip_waitack_state} state invokes
\co{rcu_try_flip_waitack_needed()}, shown below:
\begin{fcvlabel}[ln:formal:dyntickrcu:rcu_try_flip_waitack_needed]
\begin{VerbatimN}[samepage=true,commandchars=\\\[\]]
static inline int
rcu_try_flip_waitack_needed(int cpu)
{
long curr;
long snap;
curr = per_cpu(dynticks_progress_counter, cpu); \lnlbl[curr]
snap = per_cpu(rcu_dyntick_snapshot, cpu); \lnlbl[snap]
smp_mb(); \lnlbl[mb]
if ((curr == snap) && ((curr & 0x1) == 0)) \lnlbl[chk_remain]
return 0; \lnlbl[ret_0_r]
if ((curr - snap) > 2 || (snap & 0x1) == 0) \lnlbl[chk_idle]
return 0; \lnlbl[ret_0_i]
return 1; \lnlbl[ret_1]
}
\end{VerbatimN}
\end{fcvlabel}
\begin{fcvref}[ln:formal:dyntickrcu:rcu_try_flip_waitack_needed]
\Clnref{curr,snap} pick up current and snapshot versions of
\co{dynticks_progress_counter}, respectively.
The memory barrier on \clnref{mb} ensures that the counter checks
in the later \co{rcu_try_flip_waitzero_state} follow
the fetches of these counters.
\Clnref{chk_remain,ret_0_r} return zero
(meaning no communication with the
specified CPU is required) if that CPU has remained in dynticks-idle
state since the time that the snapshot was taken.
Similarly, \clnref{chk_idle,ret_0_i} return zero
if that CPU was initially
in dynticks-idle state or if it has completely passed through a
dynticks-idle state.
In both these cases, there is no way that the CPU could have retained
the old value of the grace-period counter.
If neither of these conditions hold, \clnref{ret_1} returns one, meaning
that the CPU needs to explicitly respond.
\end{fcvref}
For its part, the \co{rcu_try_flip_waitmb_state} state
invokes \co{rcu_try_flip_waitmb_needed()}, shown below:
\begin{fcvlabel}[ln:formal:dyntickrcu:rcu_try_flip_waitmb_needed]
\begin{VerbatimN}[commandchars=\\\[\]]
static inline int
rcu_try_flip_waitmb_needed(int cpu)
{
long curr;
long snap;
curr = per_cpu(dynticks_progress_counter, cpu);
snap = per_cpu(rcu_dyntick_snapshot, cpu);
smp_mb();
if ((curr == snap) && ((curr & 0x1) == 0))
return 0;
if (curr != snap) \lnlbl[chk_to_from]
return 0; \lnlbl[ret_0]
return 1;
}
\end{VerbatimN}
\end{fcvlabel}
\begin{fcvref}[ln:formal:dyntickrcu:rcu_try_flip_waitmb_needed]
This is quite similar to \co{rcu_try_flip_waitack_needed()},
the difference being in \clnref{chk_to_from,ret_0}, because any transition
either to or from dynticks-idle state executes the memory barrier
needed by the \co{rcu_try_flip_waitmb_state} state.
\end{fcvref}
We now have seen all the code involved in the interface between
RCU and the dynticks-idle state.
The next section builds up the Promela model used to verify this
code.
\QuickQuiz{
Can you spot any bugs in any of the code in this section?
}\QuickQuizAnswer{
Read the next section to see if you were correct.
}\QuickQuizEnd
\subsection{Validating Preemptible RCU and dynticks}
\label{sec:formal:Validating Preemptible RCU and dynticks}
This section develops a Promela model for the interface between
dynticks and RCU step by step, with each of
\crefrange{sec:formal:Basic Model}{sec:formal:Validating NMI Handlers}
illustrating one step, starting with the process-level code,
adding assertions, interrupts, and finally NMIs.
\Cref{sec:formal:Lessons (Re)Learned} lists
lessons (re)learned during this effort, and
\crefrange{sec:formal:Simplicity Avoids Formal Verification}{sec:formal:Discussion}
present a simpler solution to RCU's dynticks problem.
\subsubsection{Basic Model}
\label{sec:formal:Basic Model}
This section translates the process-level dynticks entry/exit
code and the grace-period processing into
Promela~\cite{Holzmann03a}.
We start with \co{rcu_exit_nohz()} and
\co{rcu_enter_nohz()}
from the 2.6.25-rc4 kernel, placing these in a single Promela
process that models exiting and entering dynticks-idle mode in
a loop as follows:
\input{CodeSamples/formal/promela/dyntick/dyntickRCU-base@dyntick_nohz.fcv}
\begin{fcvref}[ln:formal:promela:dyntick:dyntickRCU-base:dyntick_nohz]
\Clnref{do,od} define a loop.
\Clnref{break} exits the loop once the loop counter~\co{i}
has exceeded the limit \co{MAX_DYNTICK_LOOP_NOHZ}.
\Clnref{kick_loop} tells the loop construct to execute
\clnrefrange{ex_inc:b}{inc_i}
for each pass through the loop.
Because the conditionals on \clnref{break,kick_loop}
are exclusive of
each other, the normal Promela random selection of true conditions
is disabled.
\Clnref{ex_inc:b,ex_inc:e} model
\co{rcu_exit_nohz()}'s non-atomic
increment of \co{dynticks_progress_counter}, while
\clnref{ex_assert} models the \co{WARN_ON()}.
The \co{atomic} construct simply reduces the Promela state space,
given that the \co{WARN_ON()} is not strictly speaking part
of the algorithm.
\Clnrefrange{ent_inc:b}{ent_inc:e} similarly model the increment and
\co{WARN_ON()} for \co{rcu_enter_nohz()}.
Finally, \clnref{inc_i} increments the loop counter.
\end{fcvref}
Each pass through the loop therefore models a CPU exiting
dynticks-idle mode (for example, starting to execute a task), then
re-entering dynticks-idle mode (for example, that same task blocking).
\QuickQuizSeries{%
\QuickQuizB{
Why isn't the memory barrier in \co{rcu_exit_nohz()}
and \co{rcu_enter_nohz()} modeled in Promela?
}\QuickQuizAnswerB{
Promela assumes sequential consistency, so
it is not necessary to model memory barriers.
In fact, one must instead explicitly model lack of memory barriers,
for example, as shown in
\cref{lst:formal:QRCU Unordered Summation} on
\cpageref{lst:formal:QRCU Unordered Summation}.
}\QuickQuizEndB
%
\QuickQuizE{
Isn't it a bit strange to model \co{rcu_exit_nohz()}
followed by \co{rcu_enter_nohz()}?
Wouldn't it be more natural to instead model entry before exit?
}\QuickQuizAnswerE{
It probably would be more natural, but we will need
this particular order for the liveness checks that we will add later.
}\QuickQuizEndE
}
The next step is to model the interface to RCU's grace-period
processing.
For this, we need to model
\co{dyntick_save_progress_counter()},
\co{rcu_try_flip_waitack_needed()},
\co{rcu_try_flip_waitmb_needed()},
as well as portions of
\co{rcu_try_flip_waitack()} and
\co{rcu_try_flip_waitmb()}, all from the 2.6.25-rc4 kernel.
The following \co{grace_period()} Promela process models
these functions as they would be invoked during a single pass
through preemptible RCU's grace-period processing.
\input{CodeSamples/formal/promela/dyntick/dyntickRCU-base@grace_period.fcv}
\begin{fcvref}[ln:formal:promela:dyntick:dyntickRCU-base:grace_period]
\Clnrefrange{print:b}{print:e} print out the loop limit
(but only into the ``.trail'' file
in case of error) and models a line of code
from \co{rcu_try_flip_idle()} and its call to
\co{dyntick_save_progress_counter()}, which takes a
snapshot of the current CPU's \co{dynticks_progress_counter}
variable.
These two lines are executed atomically to reduce state space.
\Clnrefrange{do1}{od1} model the relevant code in
\co{rcu_try_flip_waitack()} and its call to
\co{rcu_try_flip_waitack_needed()}.
This loop is modeling the grace-period state machine waiting for
a counter-flip acknowledgement from each CPU, but only that part
that interacts with dynticks-idle CPUs.
\Clnref{snap} models a line from \co{rcu_try_flip_waitzero()}
and its call to \co{dyntick_save_progress_counter()}, again
taking a snapshot of the CPU's \co{dynticks_progress_counter}
variable.
Finally, \clnrefrange{do2}{od2} model the relevant code in
\co{rcu_try_flip_waitack()} and its call to
\co{rcu_try_flip_waitack_needed()}.
This loop is modeling the grace-period state-machine waiting for
each CPU to execute a memory barrier, but again only that part
that interacts with dynticks-idle CPUs.
\end{fcvref}
\QuickQuiz{
Wait a minute!
In the Linux kernel, both \co{dynticks_progress_counter} and
\co{rcu_dyntick_snapshot} are per-CPU variables.
So why are they instead being modeled as single global variables?
}\QuickQuizAnswer{
Because the grace-period code processes each
CPU's \co{dynticks_progress_counter} and
\co{rcu_dyntick_snapshot} variables separately,
we can collapse the state onto a single CPU\@.
If the grace-period code were instead to do something special
given specific values on specific CPUs, then we would indeed need
to model multiple CPUs.
But fortunately, we can safely confine ourselves to two CPUs, the
one running the grace-period processing and the one entering and
leaving dynticks-idle mode.
}\QuickQuizEnd
The resulting model (\path{dyntickRCU-base.spin}),
when run with the
\path{runspin.sh} script,
generates 691 states and
passes without errors, which is not at all surprising given that
it completely lacks the assertions that could find failures.
The next section therefore adds safety assertions.
\subsubsection{Validating Safety}
\label{sec:formal:Validating Safety}
A safe RCU implementation must never permit a grace period to
complete before the completion of any RCU readers that started
before the start of the grace period.
This is modeled by a \co{grace_period_state} variable that
can take on three states as follows:
\input{CodeSamples/formal/promela/dyntick/dyntickRCU-base-s@grace_period_state.fcv}
The \co{grace_period()} process sets this variable as it
progresses through the grace-period phases, as shown below:
\input{CodeSamples/formal/promela/dyntick/dyntickRCU-base-s@grace_period.fcv}
\begin{fcvref}[ln:formal:promela:dyntick:dyntickRCU-base-s:grace_period]
\Clnref{upd_gps1,upd_gps2,upd_gps3,upd_gps4,upd_gps5,upd_gps6}
update this variable (combining
atomically with algorithmic operations where feasible) to
allow the \co{dyntick_nohz()} process to verify the basic
RCU safety property.
The form of this verification is to assert that the value of the
\co{grace_period_state} variable cannot jump from
\co{GP_IDLE} to \co{GP_DONE} during a time period
over which RCU readers could plausibly persist.
\end{fcvref}
\QuickQuiz{
\begin{fcvref}[ln:formal:promela:dyntick:dyntickRCU-base-s:grace_period]
Given there are a pair of back-to-back changes to
\co{grace_period_state} on \clnref{upd_gps3,upd_gps4},
how can we be sure that \clnref{upd_gps3}'s changes won't be lost?
\end{fcvref}
}\QuickQuizAnswer{
Recall that Promela and Spin trace out
every possible sequence of state changes.
Therefore, timing is irrelevant:
Promela/Spin will be quite happy to jam the entire rest of
the model between those two statements unless some state
variable specifically prohibits doing so.
}\QuickQuizEnd
The \co{dyntick_nohz()} Promela process implements
this verification as shown below:
\input{CodeSamples/formal/promela/dyntick/dyntickRCU-base-s@dyntick_nohz.fcv}
\begin{fcvref}[ln:formal:promela:dyntick:dyntickRCU-base-s:dyntick_nohz]
\Clnref{new_flg} sets a new \co{old_gp_idle} flag if the
value of the \co{grace_period_state} variable is
\co{GP_IDLE} at the beginning of task execution,
and the assertion at \clnref{assert:b,assert:e}
fire if the \co{grace_period_state}
variable has advanced to \co{GP_DONE} during task execution,
which would be illegal given that a single RCU read-side critical
section could span the entire intervening time period.
\end{fcvref}
The resulting
model (\path{dyntickRCU-base-s.spin}),
when run with the \path{runspin.sh} script,
generates 964 states and passes without errors, which is reassuring.
That said, although safety is critically important, it is also quite
important to avoid indefinitely stalling grace periods.
The next section therefore covers verifying liveness.
\subsubsection{Validating Liveness}
\label{sec:formal:Validating Liveness}
\begin{fcvref}[ln:formal:promela:dyntick:dyntickRCU-base-sl-busted:dyntick_nohz]
Although liveness can be difficult to prove, there is a simple
trick that applies here.
The first step is to make \co{dyntick_nohz()} indicate that
it is done via a \co{dyntick_nohz_done} variable, as shown on
\clnref{done} of the following:
\end{fcvref}
\input{CodeSamples/formal/promela/dyntick/dyntickRCU-base-sl-busted@dyntick_nohz.fcv}
With this variable in place, we can add assertions to
\co{grace_period()} to check for unnecessary blockage
as follows:
\input{CodeSamples/formal/promela/dyntick/dyntickRCU-base-sl-busted@grace_period.fcv}
\begin{fcvref}[ln:formal:promela:dyntick:dyntickRCU-base-sl-busted:grace_period]
We have added the \co{shouldexit} variable on \clnref{shex},
which we initialize to zero on \clnref{init_shex}.
\Clnref{assert_shex} asserts that \co{shouldexit} is not set, while
\clnref{set_shex} sets \co{shouldexit} to the \co{dyntick_nohz_done}
variable maintained by \co{dyntick_nohz()}.
This assertion will therefore trigger if we attempt to take more than
one pass through the wait-for-counter-flip-acknowledgement
loop after \co{dyntick_nohz()} has completed
execution.
After all, if \co{dyntick_nohz()} is done, then there cannot be
any more state changes to force us out of the loop, so going through twice
in this state means an infinite loop, which in turn means no end to the
grace period.
\Clnref{init_shex2,assert_shex2,set_shex2} operate in a similar manner
for the second (memory-barrier) loop.
However, running this
model (\path{dyntickRCU-base-sl-busted.spin})
results in failure, as \clnref{chk_2} is checking that
the wrong variable
is even.
Upon failure, \co{spin} writes out a
``trail'' file
(\path{dyntickRCU-base-sl-busted.spin.trail}),
which records the sequence of states that lead to the failure.
Use the ``\co{spin -t -p -g -l\ }%
\path{dyntickRCU-base-sl-busted.spin}''
command to cause \co{spin} to retrace this sequence of states,
printing the statements executed and the values of variables
(\path{dyntickRCU-base-sl-busted.spin.trail.txt}).
Note that the line numbers do not match the listing above due to
the fact that \co{spin} takes both functions in a single file.
However, the line numbers \emph{do} match the full
model (\path{dyntickRCU-base-sl-busted.spin}).
We see that the \co{dyntick_nohz()} process completed
at step 34 (search for ``34:''), but that the
\co{grace_period()} process nonetheless failed to exit the loop.
The value of \co{curr} is \co{6} (see step 35)
and that the value of \co{snap} is \co{5} (see step 17).
Therefore the first condition on \clnref{chk_1} above
does not hold because
\qco{curr != snap}, and the second condition on \clnref{chk_2}
does not hold either because \co{snap} is odd and because
\co{curr} is only one greater than \co{snap}.
\end{fcvref}
So one of these two conditions has to be incorrect.
Referring to the comment block in \co{rcu_try_flip_waitack_needed()}
for the first condition:
\begin{quote}
If the CPU remained in dynticks mode for the entire time
and didn't take any interrupts, NMIs, SMIs, or whatever,
then it cannot be in the middle of an \co{rcu_read_lock()}, so
the next \co{rcu_read_lock()} it executes must use the new value
of the counter.
So we can safely pretend that this CPU already acknowledged
the counter.
\end{quote}
The first condition does match this, because if \qco{curr == snap}
and if \co{curr} is even, then the corresponding CPU has been
in dynticks-idle mode the entire time, as required.
So let's look at the comment block for the second condition:
\begin{quote}
If the CPU passed through or entered a dynticks idle phase with
no active irq handlers, then, as above, we can safely pretend
that this CPU already acknowledged the counter.
\end{quote}
The first part of the condition is correct, because if \co{curr}
and \co{snap} differ by two, there will be at least one even
number in between, corresponding to having passed completely through
a dynticks-idle phase.
However, the second part of the condition corresponds to having
\emph{started} in dynticks-idle mode, not having \emph{finished}
in this mode.
We therefore need to be testing \co{curr} rather than
\co{snap} for being an even number.
The corrected C code is as follows:
\begin{fcvlabel}[ln:formal:dyntickrcu:rcu_try_flip_waitack_needed_fixed]
\begin{VerbatimN}[commandchars=\\\[\]]
static inline int
rcu_try_flip_waitack_needed(int cpu)
{
long curr;
long snap;
curr = per_cpu(dynticks_progress_counter, cpu);
snap = per_cpu(rcu_dyntick_snapshot, cpu);
smp_mb();
if ((curr == snap) && ((curr & 0x1) == 0)) \lnlbl[if:b]
return 0;
if ((curr - snap) > 2 || (curr & 0x1) == 0)
return 0; \lnlbl[if:e]
return 1;
}
\end{VerbatimN}
\end{fcvlabel}
\begin{fcvref}[ln:formal:dyntickrcu:rcu_try_flip_waitack_needed_fixed]
\Clnrefrange{if:b}{if:e} can now be combined and simplified,
resulting in the following.
A similar simplification can be applied to
\co{rcu_try_flip_waitmb_needed()}.
\end{fcvref}
\begin{VerbatimN}[commandchars=\\\[\]]
static inline int
rcu_try_flip_waitack_needed(int cpu)
{
long curr;
long snap;
curr = per_cpu(dynticks_progress_counter, cpu);
snap = per_cpu(rcu_dyntick_snapshot, cpu);
smp_mb();
if ((curr - snap) >= 2 || (curr & 0x1) == 0)
return 0;
return 1;
}
\end{VerbatimN}
Making the corresponding correction in the
model (\path{dyntickRCU-base-sl.spin})
results in a correct verification with 661 states that passes without
errors.
However, it is worth noting that the first version of the liveness
verification failed to catch this bug, due to a bug in the liveness
verification itself.
This liveness-verification bug was located by inserting an infinite
loop in the \co{grace_period()} process, and noting that
the liveness-verification code failed to detect this problem!
We have now successfully verified both safety and liveness
conditions, but only for processes running and blocking.
We also need to handle interrupts, a task taken up in the next section.
\subsubsection{Interrupts}
\label{sec:formal:Interrupts}
There are a couple of ways to model interrupts in Promela:
\begin{enumerate}
\item Using C-preprocessor tricks to insert the interrupt handler
between each and every statement of the \co{dynticks_nohz()}
process, or
\item Modeling the interrupt handler with a separate process.
\end{enumerate}
A bit of thought indicated that the second approach would have a
smaller state space, though it requires that the interrupt handler
somehow run atomically with respect to the \co{dynticks_nohz()}
process, but not with respect to the \co{grace_period()}
process.
Fortunately, it turns out that Promela permits you to branch
out of atomic statements.
This trick allows us to have the interrupt handler set a flag, and
recode \co{dynticks_nohz()} to atomically check this flag
and execute only when the flag is not set.
This can be accomplished with a C-preprocessor macro that takes
a label and a Promela statement as follows:
\input{CodeSamples/formal/promela/dyntick/dyntickRCU-irqnn-ssl@EXECUTE_MAINLINE.fcv}
One might use this macro as follows:
\begin{VerbatimU}
EXECUTE_MAINLINE(stmt1,
tmp = dynticks_progress_counter)
\end{VerbatimU}
\begin{fcvref}[ln:formal:promela:dyntick:dyntickRCU-irqnn-ssl:EXECUTE_MAINLINE]
\Clnref{label} of the macro creates the specified statement label.
\Clnrefrange{atm:b}{atm:e} are an atomic block that tests
the \co{in_dyntick_irq}
variable, and if this variable is set (indicating that the interrupt
handler is active), branches out of the atomic block back to the
label.
Otherwise, \clnref{else} executes the specified statement.
The overall effect is that mainline execution stalls any time an interrupt
is active, as required.
\end{fcvref}
\subsubsection{Validating Interrupt Handlers}
\label{sec:formal:Validating Interrupt Handlers}
The first step is to convert \co{dyntick_nohz()} to
\co{EXECUTE_MAINLINE()} form, as follows:
\input{CodeSamples/formal/promela/dyntick/dyntickRCU-irqnn-ssl@dyntick_nohz.fcv}
\begin{fcvref}[ln:formal:promela:dyntick:dyntickRCU-irqnn-ssl:dyntick_nohz]
It is important to note that when a group of statements is passed
to \co{EXECUTE_MAINLINE()}, as in \clnrefrange{stmt2:b}{stmt2:e}, all
statements in that group execute atomically.
\end{fcvref}
\QuickQuizSeries{%
\QuickQuizB{
But what would you do if you needed the statements in a single
\co{EXECUTE_MAINLINE()} group to execute non-atomically?
}\QuickQuizAnswerB{
The easiest thing to do would be to put
each such statement in its own \co{EXECUTE_MAINLINE()}
statement.
}\QuickQuizEndB
%
\QuickQuizE{
But what if the \co{dynticks_nohz()} process had
\qco{if} or \qco{do} statements with conditions,
where the statement bodies of these constructs
needed to execute non-atomically?
}\QuickQuizAnswerE{
One approach, as we will see in a later section,
is to use explicit labels and \qco{goto} statements.
For example, the construct:
\begin{VerbatimU}
if
:: i == 0 -> a = -1;
:: else -> a = -2;
fi;
\end{VerbatimU}
%
could be modeled as something like:
\begin{VerbatimU}
EXECUTE_MAINLINE(stmt1,
if
:: i == 0 -> goto stmt1_then;
:: else -> goto stmt1_else;
fi)
stmt1_then: skip;
EXECUTE_MAINLINE(stmt1_then1, a = -1; goto stmt1_end)
stmt1_else: skip;
EXECUTE_MAINLINE(stmt1_then1, a = -2)
stmt1_end: skip;
\end{VerbatimU}
However, it is not clear that the macro is helping much in the case
of the \qco{if} statement, so these sorts of situations will
be open-coded in the following sections.
}\QuickQuizEndE
}
The next step is to write a \co{dyntick_irq()} process
to model an interrupt handler:
\input{CodeSamples/formal/promela/dyntick/dyntickRCU-irqnn-ssl@dyntick_irq.fcv}
\begin{fcvref}[ln:formal:promela:dyntick:dyntickRCU-irqnn-ssl:dyntick_irq]
The loop from \clnrefrange{do}{od} models up to \co{MAX_DYNTICK_LOOP_IRQ}
interrupts, with \clnref{cond1,cond2} forming the loop condition and
\clnref{inc_i} incrementing the control variable.
\Clnref{in_irq} tells \co{dyntick_nohz()} that an interrupt handler
is running, and \clnref{clr_in_irq} tells \co{dyntick_nohz()} that this
handler has completed.
\Clnref{irq_done} is used for liveness verification, just like the corresponding
line of \co{dyntick_nohz()}.
\end{fcvref}
\QuickQuiz{
\begin{fcvref}[ln:formal:promela:dyntick:dyntickRCU-irqnn-ssl:dyntick_irq]
Why are \clnref{clr_in_irq,inc_i} (the \qtco{in_dyntick_irq = 0;}
and the \qco{i++;}) executed atomically?
\end{fcvref}
}\QuickQuizAnswer{
These lines of code pertain to controlling the
model, not to the code being modeled, so there is no reason to
model them non-atomically.
The motivation for modeling them atomically is to reduce the size
of the state space.
}\QuickQuizEnd
\begin{fcvref}[ln:formal:promela:dyntick:dyntickRCU-irqnn-ssl:dyntick_irq]
\Clnrefrange{enter:b}{enter:e} model \co{rcu_irq_enter()}, and
\clnref{add_prmt_cnt:b,add_prmt_cnt:e} model the relevant snippet
of \co{__irq_enter()}.
\Clnrefrange{vrf_safe:b}{vrf_safe:e} verify safety in much the same manner
as do the corresponding lines of \co{dynticks_nohz()}.
\Clnref{irq_exit:b,irq_exit:e} model the relevant snippet of \co{__irq_exit()},
and finally \clnrefrange{exit:b}{exit:e} model \co{rcu_irq_exit()}.
\end{fcvref}
\QuickQuiz{
What property of interrupts is this \co{dynticks_irq()}
process unable to model?
}\QuickQuizAnswer{
One such property is nested interrupts,
which are handled in the following section.
}\QuickQuizEnd
The \co{grace_period()} process then becomes as follows:
\input{CodeSamples/formal/promela/dyntick/dyntickRCU-irqnn-ssl@grace_period.fcv}
\begin{fcvref}[ln:formal:promela:dyntick:dyntickRCU-irqnn-ssl:grace_period]
The implementation of \co{grace_period()} is very similar
to the earlier one.
The only changes are the addition of \clnref{MDLI} to add the new
interrupt-count parameter, changes to
\clnref{edit1,edit3} to add the new \co{dyntick_irq_done} variable
to the liveness checks, and of course the optimizations on \clnref{edit2,edit4}.
\end{fcvref}
This model (\path{dyntickRCU-irqnn-ssl.spin})
results in a correct verification with roughly half a million
states, passing without errors.
However, this version of the model does not handle nested
interrupts.
This topic is taken up in the next section.
\subsubsection{Validating Nested Interrupt Handlers}
\label{sec:formal:Validating Nested Interrupt Handlers}
Nested interrupt handlers may be modeled by splitting the body of
the loop in \co{dyntick_irq()} as follows:
\input{CodeSamples/formal/promela/dyntick/dyntickRCU-irq-ssl@dyntick_irq.fcv}
\begin{fcvref}[ln:formal:promela:dyntick:dyntickRCU-irq-ssl:dyntick_irq]
This is similar to the earlier \co{dynticks_irq()} process.
It adds a second counter variable~\co{j} on \clnref{j}, so that
\co{i}~counts entries to interrupt handlers and \co{j}~counts
exits.
The \co{outermost} variable on \clnref{om} helps determine
when the \co{grace_period_state} variable needs to be sampled
for the safety checks.
The loop-exit check on \clnref{chk_ex:b,chk_ex:e} is updated to require that the
specified number of interrupt handlers are exited as well as entered,
and the increment of~\co{i} is moved to \clnref{inc_i}, which is
the end of the interrupt-entry model.
\Clnrefrange{atm1:b}{atm1:e} set the \co{outermost} variable to indicate
whether this is the outermost of a set of nested interrupts and to
set the \co{in_dyntick_irq} variable that is used by the
\co{dyntick_nohz()} process.
\Clnrefrange{atm2:b}{atm2:e} capture the state of the \co{grace_period_state}
variable, but only when in the outermost interrupt handler.
\Clnref{cnd_ex} has the do-loop conditional for interrupt-exit modeling:
As long as we have exited fewer interrupts than we have entered, it is
legal to exit another interrupt.
\Clnrefrange{atm3:b}{atm3:e}
check the safety criterion, but only if we are exiting
from the outermost interrupt level.
Finally, \clnrefrange{atm4:b}{atm4:e} increment the interrupt-exit count~\co{j}
and, if this is the outermost interrupt level, clears
\co{in_dyntick_irq}.
\end{fcvref}
This model (\path{dyntickRCU-irq-ssl.spin})
results in a correct verification with a bit more than half a million
states, passing without errors.
However, this version of the model does not handle NMIs,
which are taken up in the next section.
\subsubsection{Validating NMI Handlers}
\label{sec:formal:Validating NMI Handlers}
We take the same general approach for NMIs as we do for interrupts,
keeping in mind that NMIs do not nest.
This results in a \co{dyntick_nmi()} process as follows:
\input{CodeSamples/formal/promela/dyntick/dyntickRCU-irq-nmi-ssl@dyntick_nmi.fcv}
Of course, the fact that we have NMIs requires adjustments in
the other components.
For example, the \co{EXECUTE_MAINLINE()} macro now needs to
pay attention to the NMI handler (\co{in_dyntick_nmi}) as well
as the interrupt handler (\co{in_dyntick_irq}) by checking
the \co{dyntick_nmi_done} variable as follows:
\input{CodeSamples/formal/promela/dyntick/dyntickRCU-irq-nmi-ssl@EXECUTE_MAINLINE.fcv}
We will also need to introduce an \co{EXECUTE_IRQ()}
macro that checks \co{in_dyntick_nmi} in order to allow
\co{dyntick_irq()} to exclude \co{dyntick_nmi()}:
\input{CodeSamples/formal/promela/dyntick/dyntickRCU-irq-nmi-ssl@EXECUTE_IRQ.fcv}
It is further necessary to convert \co{dyntick_irq()}
to \co{EXECUTE_IRQ()} as follows:
\input{CodeSamples/formal/promela/dyntick/dyntickRCU-irq-nmi-ssl@dyntick_irq.fcv}
\begin{fcvref}[ln:formal:promela:dyntick:dyntickRCU-irq-nmi-ssl:dyntick_irq]
Note that we have open-coded the \qco{if} statements
(for example, \clnrefrange{stmt1:b}{stmt1:e}).
In addition, statements that process strictly local state
(such as \clnref{inc_i}) need not exclude \co{dyntick_nmi()}.
\end{fcvref}
Finally, \co{grace_period()} requires only a few changes:
\input{CodeSamples/formal/promela/dyntick/dyntickRCU-irq-nmi-ssl@grace_period.fcv}
\begin{fcvref}[ln:formal:promela:dyntick:dyntickRCU-irq-nmi-ssl:grace_period]
We have added the \co{printf()} for the new
\co{MAX_DYNTICK_LOOP_NMI} parameter on \clnref{MDL_NMI} and
added \co{dyntick_nmi_done} to the \co{shouldexit}
assignments on \clnref{nmi_done1,nmi_done2}.
\end{fcvref}
The model (\path{dyntickRCU-irq-nmi-ssl.spin})
results in a correct verification with several hundred million
states, passing without errors.
\QuickQuiz{
Does Paul \emph{always} write his code in this painfully incremental
manner?
}\QuickQuizAnswer{
Not always, but more and more frequently.
In this case, Paul started with the smallest slice of code that
included an interrupt handler, because he was not sure how best
to model interrupts in Promela.
Once he got that working, he added other features.
(But if he was doing it again, he would start with a ``toy'' handler.
For example, he might have the handler increment a variable twice and
have the mainline code verify that the value was always even.)
Why the incremental approach?
Consider the following, attributed to Brian W. Kernighan:
\begin{quote}
Debugging is twice as hard as writing the code in the first
place.
Therefore, if you write the code as cleverly as possible,
you are, by definition, not smart enough to debug it.
\end{quote}
This means that any attempt to optimize the production of code should
place at least 66\pct\ of its emphasis on optimizing the debugging process,
even at the expense of increasing the time and effort spent coding.
Incremental coding and testing is one way to optimize the debugging
process, at the expense of some increase in coding effort.
Paul uses this approach because he rarely has the luxury of
devoting full days (let alone weeks) to coding and debugging.
}\QuickQuizEnd
\subsubsection{Lessons (Re)Learned}
\label{sec:formal:Lessons (Re)Learned}
This effort provided some lessons (re)learned:
\begin{enumerate}
\item {\bf Promela and Spin can verify interrupt\slash NMI\-/handler
interactions.}
\item {\bf Documenting code can help locate bugs.}
In this case, the documentation effort located
a misplaced memory barrier in
\co{rcu_enter_nohz()} and \co{rcu_exit_nohz()},
as shown by the following patch~\cite{PaulEMcKenney2008commit:ae66be9b71b1}.
\begin{VerbatimU}
static inline void rcu_enter_nohz(void)
{
+ mb();
__get_cpu_var(dynticks_progress_counter)++;
- mb();
}
static inline void rcu_exit_nohz(void)
{
- mb();
__get_cpu_var(dynticks_progress_counter)++;
+ mb();
}
\end{VerbatimU}
\item {\bf Validate your code early, often, and up to the point
of destruction.}
This effort located one subtle bug in
\co{rcu_try_flip_waitack_needed()}
that would have been quite difficult to test or debug, as
shown by the following patch~\cite{PaulEMcKenney2008commit:d7c0651390b6}.
\begin{VerbatimU}
- if ((curr - snap) > 2 || (snap & 0x1) == 0)
+ if ((curr - snap) > 2 || (curr & 0x1) == 0)
\end{VerbatimU}
\item {\bf Always verify your verification code.}
The usual way to do this is to insert a deliberate bug
and verify that the verification code catches it.
Of course, if the verification code fails to catch this bug,
you may also need to verify the bug itself, and so on,
recursing infinitely.
However, if you find yourself in this position,
getting a good night's sleep
can be an extremely effective debugging technique.
You will then see that the obvious verify-the-verification
technique is to deliberately insert bugs in the code being
verified.
If the verification fails to find them, the verification clearly
is buggy.
\item {\bf Use of atomic instructions can simplify verification.}
Unfortunately, use of the \co{cmpxchg} atomic instruction
would also slow down the critical \IXacr{irq} fastpath, so they
are not appropriate in this case.
\item {\bf The need for complex formal verification often indicates
a need to re-think your design.}
\end{enumerate}
To this last point, it turns out that there is a much simpler solution to
the dynticks problem, which is presented in the next section.
\subsubsection{Simplicity Avoids Formal Verification}
\label{sec:formal:Simplicity Avoids Formal Verification}
The complexity of the dynticks interface for preemptible RCU is primarily
due to the fact that both \IRQ s and NMIs use the same code path and the
same state variables.
This leads to the notion of providing separate code paths and variables
for \IRQ s and NMIs, as has been done for
hierarchical RCU~\cite{PaulEMcKenney2008HierarchicalRCU}
as indirectly suggested by
Manfred Spraul~\cite{ManfredSpraul2008StateMachineRCU}.
This work was pulled into mainline kernel during the v2.6.29
development cycle~\cite{PaulEMcKenney2008commit:64db4cfff99c}.
\subsubsection{State Variables for Simplified Dynticks Interface}
\label{sec:formal:State Variables for Simplified Dynticks Interface}
\Cref{lst:formal:Variables for Simple Dynticks Interface}
shows the new per-CPU state variables.
These variables are grouped into structs to allow multiple independent
RCU implementations (e.g., \co{rcu} and \co{rcu_bh}) to conveniently
and efficiently share dynticks state.
In what follows, they can be thought of as independent per-CPU variables.
\begin{listing}
\begin{VerbatimL}
struct rcu_dynticks {
int dynticks_nesting;
int dynticks;
int dynticks_nmi;
};
struct rcu_data {
...
int dynticks_snap;
int dynticks_nmi_snap;
...
};
\end{VerbatimL}
\caption{Variables for Simple Dynticks Interface}
\label{lst:formal:Variables for Simple Dynticks Interface}
\end{listing}
The \co{dynticks_nesting}, \co{dynticks}, and \co{dynticks_snap} variables
are for the \IRQ\ code paths, and the \co{dynticks_nmi} and
\co{dynticks_nmi_snap} variables are for the NMI code paths, although
the NMI code path will also reference (but not modify) the
\co{dynticks_nesting} variable.
These variables are used as follows:
\begin{description}[style=nextline]
\item [\tco{dynticks_nesting}]
This counts the number of reasons that the corresponding
CPU should be monitored for RCU read-side critical sections.
If the CPU is in dynticks-idle mode, then this counts the
\IRQ\ nesting level, otherwise it is one greater than the
\IRQ\ nesting level.
\item [\tco{dynticks}]
This counter's value is even if the corresponding CPU is
in dynticks-idle mode and there are no \IRQ\ handlers currently
running on that CPU, otherwise the counter's value is odd.
In other words, if this counter's value is odd, then the
corresponding CPU might be in an RCU read-side critical section.
\item [\tco{dynticks_nmi}]
This counter's value is odd if the corresponding CPU is
in an NMI handler, but only if the NMI arrived while this
CPU was in dyntick-idle mode with no \IRQ\ handlers running.
Otherwise, the counter's value will be even.
\item [\tco{dynticks_snap}]
This will be a snapshot of the \co{dynticks} counter, but
only if the current RCU grace period has extended for too
long a duration.
\item [\tco{dynticks_nmi_snap}]
This will be a snapshot of the \co{dynticks_nmi} counter, but
again only if the current RCU grace period has extended for too
long a duration.
\end{description}
If both \co{dynticks} and \co{dynticks_nmi} have taken on an even
value during a given time interval, then the corresponding CPU has
passed through a \IX{quiescent state} during that interval.
\QuickQuiz{
But what happens if an NMI handler starts running before
an \IRQ\ handler completes, and if that NMI handler continues
running until a second \IRQ\ handler starts?
}\QuickQuizAnswer{
This cannot happen within the confines of a single CPU\@.
The first \IRQ\ handler cannot complete until the NMI handler
returns.
Therefore, if each of the \co{dynticks} and \co{dynticks_nmi}
variables have taken on an even value during a given time
interval, the corresponding CPU really was in a quiescent
state at some time during that interval.
}\QuickQuizEnd
\subsubsection{Entering and Leaving Dynticks-Idle Mode}
\label{sec:formal:Entering and Leaving Dynticks-Idle Mode}
\Cref{lst:formal:Entering and Exiting Dynticks-Idle Mode}
shows the \co{rcu_enter_nohz()} and \co{rcu_exit_nohz()},
which enter and exit dynticks-idle mode, also known as ``nohz'' mode.
These two functions are invoked from process context.
\begin{listing}
\begin{fcvlabel}[ln:formal:Entering and Exiting Dynticks-Idle Mode]
\begin{VerbatimL}[commandchars=\\\[\]]
void rcu_enter_nohz(void)
{
unsigned long flags;
struct rcu_dynticks *rdtp;
smp_mb(); \lnlbl[mb]
local_irq_save(flags); \lnlbl[irq_sv]
rdtp = &__get_cpu_var(rcu_dynticks); \lnlbl[get_ptr]
rdtp->dynticks++; \lnlbl[inc_cnt]
rdtp->dynticks_nesting--; \lnlbl[dec_nst]
WARN_ON(rdtp->dynticks & 0x1);
local_irq_restore(flags); \lnlbl[irq_rs]
}
void rcu_exit_nohz(void)
{
unsigned long flags;
struct rcu_dynticks *rdtp;
local_irq_save(flags);
rdtp = &__get_cpu_var(rcu_dynticks);
rdtp->dynticks++;
rdtp->dynticks_nesting++;
WARN_ON(!(rdtp->dynticks & 0x1));
local_irq_restore(flags);
smp_mb();
}
\end{VerbatimL}
\end{fcvlabel}
\caption{Entering and Exiting Dynticks-Idle Mode}
\label{lst:formal:Entering and Exiting Dynticks-Idle Mode}
\end{listing}
\begin{fcvref}[ln:formal:Entering and Exiting Dynticks-Idle Mode]
\Clnref{mb} ensures that any prior memory accesses (which might
include accesses from RCU read-side critical sections) are seen
by other CPUs before those marking entry to dynticks-idle mode.
\Clnref{irq_sv,irq_rs} disable and reenable \IRQ s.
\Clnref{get_ptr} acquires a pointer to the current CPU's \co{rcu_dynticks}
structure, and
\clnref{inc_cnt} increments the current CPU's \co{dynticks} counter, which
should now be even, given that we are entering dynticks-idle mode
in process context.
Finally, \clnref{dec_nst} decrements \co{dynticks_nesting},
which should now be zero.
\end{fcvref}
The \co{rcu_exit_nohz()} function is quite similar, but increments
\co{dynticks_nesting} rather than decrementing it and checks for
the opposite \co{dynticks} polarity.
\subsubsection{NMIs From Dynticks-Idle Mode}
\label{sec:formal:NMIs From Dynticks-Idle Mode}
\begin{fcvref}[ln:formal:NMIs From Dynticks-Idle Mode]
\Cref{lst:formal:NMIs From Dynticks-Idle Mode}
shows the \co{rcu_nmi_enter()} and \co{rcu_nmi_exit()} functions,
which inform RCU of NMI entry and exit, respectively, from dynticks-idle
mode.
However, if the NMI arrives during an \IRQ\ handler, then RCU will already
be on the lookout for RCU read-side critical sections from this CPU,
so \clnref{chk_ext1,ret1} of \co{rcu_nmi_enter()} and \clnref{chk_ext2,ret2}
of \co{rcu_nmi_exit()} silently return if \co{dynticks} is odd.
Otherwise, the two functions increment \co{dynticks_nmi}, with
\co{rcu_nmi_enter()} leaving it with an odd value and \co{rcu_nmi_exit()}
leaving it with an even value.
Both functions execute memory barriers between this increment
and possible RCU read-side critical sections on \clnref{mb1,mb2},
respectively.
\end{fcvref}
\begin{listing}
\begin{fcvlabel}[ln:formal:NMIs From Dynticks-Idle Mode]
\begin{VerbatimL}[commandchars=\\\[\]]
void rcu_nmi_enter(void)
{
struct rcu_dynticks *rdtp;
rdtp = &__get_cpu_var(rcu_dynticks);
if (rdtp->dynticks & 0x1) \lnlbl[chk_ext1]
return; \lnlbl[ret1]
rdtp->dynticks_nmi++;
WARN_ON(!(rdtp->dynticks_nmi & 0x1));
smp_mb(); \lnlbl[mb1]
}
void rcu_nmi_exit(void)
{
struct rcu_dynticks *rdtp;
rdtp = &__get_cpu_var(rcu_dynticks);
if (rdtp->dynticks & 0x1) \lnlbl[chk_ext2]
return; \lnlbl[ret2]
smp_mb(); \lnlbl[mb2]
rdtp->dynticks_nmi++;
WARN_ON(rdtp->dynticks_nmi & 0x1);
}
\end{VerbatimL}
\end{fcvlabel}
\caption{NMIs From Dynticks-Idle Mode}
\label{lst:formal:NMIs From Dynticks-Idle Mode}
\end{listing}
\subsubsection{Interrupts From Dynticks-Idle Mode}
\label{sec:formal:Interrupts From Dynticks-Idle Mode}
\begin{fcvref}[ln:formal:Interrupts From Dynticks-Idle Mode]
\Cref{lst:formal:Interrupts From Dynticks-Idle Mode}
shows \co{rcu_irq_enter()} and \co{rcu_irq_exit()}, which
inform RCU of entry to and exit from, respectively, \IRQ\ context.
\Clnref{inc_nst} of \co{rcu_irq_enter()} increments \co{dynticks_nesting},
and if this variable was already non-zero, \clnref{ret1} silently returns.
Otherwise, \clnref{inc_dynt1} increments \co{dynticks},
which will then have
an odd value, consistent with the fact that this CPU can now
execute RCU read-side critical sections.
\Clnref{mb1} therefore executes a memory barrier to ensure that
the increment of \co{dynticks} is seen before any
RCU read-side critical sections that the subsequent \IRQ\ handler
might execute.
\begin{listing}
\begin{fcvlabel}[ln:formal:Interrupts From Dynticks-Idle Mode]
\begin{VerbatimL}[commandchars=\\\[\]]
void rcu_irq_enter(void)
{
struct rcu_dynticks *rdtp;
rdtp = &__get_cpu_var(rcu_dynticks);
if (rdtp->dynticks_nesting++) \lnlbl[inc_nst]
return; \lnlbl[ret1]
rdtp->dynticks++; \lnlbl[inc_dynt1]
WARN_ON(!(rdtp->dynticks & 0x1));
smp_mb(); \lnlbl[mb1]
}
void rcu_irq_exit(void)
{
struct rcu_dynticks *rdtp;
rdtp = &__get_cpu_var(rcu_dynticks);
if (--rdtp->dynticks_nesting) \lnlbl[dec_nst]
return; \lnlbl[ret2]
smp_mb(); \lnlbl[mb2]
rdtp->dynticks++; \lnlbl[inc_dynt2]
WARN_ON(rdtp->dynticks & 0x1); \lnlbl[chk_even]
if (__get_cpu_var(rcu_data).nxtlist || \lnlbl[chk_cb:b]
__get_cpu_var(rcu_bh_data).nxtlist)
set_need_resched(); \lnlbl[chk_cb:e]
}
\end{VerbatimL}
\end{fcvlabel}
\caption{Interrupts From Dynticks-Idle Mode}
\label{lst:formal:Interrupts From Dynticks-Idle Mode}
\end{listing}
\Clnref{dec_nst} of \co{rcu_irq_exit()} decrements
\co{dynticks_nesting}, and
if the result is non-zero, \clnref{ret2} silently returns.
Otherwise, \clnref{mb2} executes a memory barrier to ensure that the
increment of \co{dynticks} on \clnref{inc_dynt2} is seen after any RCU
read-side critical sections that the prior \IRQ\ handler might have executed.
\Clnref{chk_even} verifies that \co{dynticks} is now even, consistent with
the fact that no RCU read-side critical sections may appear in
dynticks-idle mode.
\Clnrefrange{chk_cb:b}{chk_cb:e} check to see
if the prior \IRQ\ handlers enqueued any
RCU callbacks, forcing this CPU out of dynticks-idle mode via
a reschedule API if so.
\end{fcvref}
\subsubsection{Checking For Dynticks Quiescent States}
\label{sec:formal:Checking For Dynticks Quiescent States}
\begin{fcvref}[ln:formal:Saving Dyntick Progress Counters]
\Cref{lst:formal:Saving Dyntick Progress Counters}
shows \co{dyntick_save_progress_counter()}, which takes a snapshot
of the specified CPU's \co{dynticks} and \co{dynticks_nmi}
counters.
\Clnref{snap,snapn} snapshot these two variables to locals, \clnref{mb}
executes a memory barrier to pair with the memory barriers in the functions in
\cref{lst:formal:Entering and Exiting Dynticks-Idle Mode,%
lst:formal:NMIs From Dynticks-Idle Mode,%
lst:formal:Interrupts From Dynticks-Idle Mode}.
\Clnref{rec_snap,rec_snapn} record the snapshots for later calls to
\co{rcu_implicit_dynticks_qs()},
and \clnref{chk_prgs} checks to see if the CPU is in dynticks-idle mode with
neither \IRQ s nor NMIs in progress (in other words, both snapshots
have even values), hence in an extended quiescent state.
If so, \clnref{cnt:b,cnt:e} count this event, and \clnref{ret} returns
true if the CPU was in a quiescent state.
\end{fcvref}
\begin{listing}
\begin{fcvlabel}[ln:formal:Saving Dyntick Progress Counters]
\begin{VerbatimL}[commandchars=\\\[\]]
static int
dyntick_save_progress_counter(struct rcu_data *rdp)
{
int ret;
int snap;
int snap_nmi;
snap = rdp->dynticks->dynticks; \lnlbl[snap]
snap_nmi = rdp->dynticks->dynticks_nmi; \lnlbl[snapn]
smp_mb(); \lnlbl[mb]
rdp->dynticks_snap = snap; \lnlbl[rec_snap]
rdp->dynticks_nmi_snap = snap_nmi; \lnlbl[rec_snapn]
ret = ((snap & 0x1) == 0) && ((snap_nmi & 0x1) == 0); \lnlbl[chk_prgs]
if (ret) \lnlbl[cnt:b]
rdp->dynticks_fqs++; \lnlbl[cnt:e]
return ret; \lnlbl[ret]
}
\end{VerbatimL}
\end{fcvlabel}
\caption{Saving Dyntick Progress Counters}
\label{lst:formal:Saving Dyntick Progress Counters}
\end{listing}
\begin{fcvref}[ln:formal:Checking Dyntick Progress Counters]
\Cref{lst:formal:Checking Dyntick Progress Counters}
shows \co{rcu_implicit_dynticks_qs()}, which is called to check
whether a CPU has entered dyntick-idle mode subsequent to a call
to \co{dynticks_save_progress_counter()}.
\Clnref{curr,currn} take new snapshots of the corresponding CPU's
\co{dynticks} and \co{dynticks_nmi} variables, while
\clnref{snap,snapn} retrieve the snapshots saved earlier by
\co{dynticks_save_progress_counter()}.
\Clnref{mb} then
executes a memory barrier to pair with the memory barriers in
the functions in
\cref{lst:formal:Entering and Exiting Dynticks-Idle Mode,%
lst:formal:NMIs From Dynticks-Idle Mode,%
lst:formal:Interrupts From Dynticks-Idle Mode}.
\Clnrefrange{chk_q:b}{chk_q:e}
then check to see if the CPU is either currently in
a quiescent state (\co{curr} and \co{curr_nmi} having even values) or
has passed through a quiescent state since the last call to
\co{dynticks_save_progress_counter()} (the values of
\co{dynticks} and \co{dynticks_nmi} having changed).
If these checks confirm that the CPU has passed through a dyntick-idle
quiescent state, then \clnref{cnt} counts that fact and
\clnref{ret_1} returns an indication of this fact.
Either way, \clnref{chk_race}
checks for race conditions that can result in RCU
waiting for a CPU that is offline.
\end{fcvref}
\begin{listing}
\begin{fcvlabel}[ln:formal:Checking Dyntick Progress Counters]
\begin{VerbatimL}[commandchars=\\\[\]]
static int
rcu_implicit_dynticks_qs(struct rcu_data *rdp)
{
long curr;
long curr_nmi;
long snap;
long snap_nmi;
curr = rdp->dynticks->dynticks; \lnlbl[curr]
snap = rdp->dynticks_snap; \lnlbl[snap]
curr_nmi = rdp->dynticks->dynticks_nmi; \lnlbl[currn]
snap_nmi = rdp->dynticks_nmi_snap; \lnlbl[snapn]
smp_mb(); \lnlbl[mb]
if ((curr != snap || (curr & 0x1) == 0) && \lnlbl[chk_q:b]
(curr_nmi != snap_nmi || (curr_nmi & 0x1) == 0)) { \lnlbl[chk_q:e]
rdp->dynticks_fqs++; \lnlbl[cnt]
return 1; \lnlbl[ret_1]
}
return rcu_implicit_offline_qs(rdp); \lnlbl[chk_race]
}
\end{VerbatimL}
\end{fcvlabel}
\caption{Checking Dyntick Progress Counters}
\label{lst:formal:Checking Dyntick Progress Counters}
\end{listing}
\QuickQuiz{
This is still pretty complicated.
Why not just have a \co{cpumask_t} with per-CPU bits, clearing
the bit when entering an \IRQ\ or NMI handler, and setting it
upon exit?
}\QuickQuizAnswer{
Although this approach would be functionally correct, it
would result in excessive \IRQ\ entry/exit overhead on
large machines.
In contrast, the approach laid out in this section allows
each CPU to touch only per-CPU data on \IRQ\ and NMI entry/exit,
resulting in much lower \IRQ\ entry/exit overhead, especially
on large machines.
}\QuickQuizEnd
Linux-kernel RCU's dyntick-idle code has since been rewritten yet again
based on a suggestion from
Andy Lutomirski~\cite{PaulMcKenney2015dyntickAndyNMI},
but it is time to sum up and move on to other topics.
\subsubsection{Discussion}
\label{sec:formal:Discussion}
A slight shift in viewpoint resulted in a substantial simplification
of the dynticks interface for RCU\@.
The key change leading to this simplification was minimizing of
sharing between \IRQ\ and NMI contexts.
The only sharing in this simplified interface is references from NMI
context to \IRQ\ variables (the \co{dynticks} variable).
This type of sharing is benign, because the NMI functions never update
this variable, so that its value remains constant through the lifetime
of the NMI handler.
This limitation of sharing allows the individual functions to be
understood one at a time, in happy contrast to the situation
described in
\cref{sec:formal:Promela Parable: dynticks and Preemptible RCU},
where an NMI might change shared state at any point during execution of
the \IRQ\ functions.
Verification can be a good thing, but simplicity is even better.
% Restore frame around VerbatimN in two-column layout
\IfTwoColumn{
\RecustomVerbatimEnvironment{VerbatimN}{Verbatim}%
{numbers=left,numbersep=3pt,xleftmargin=5pt,xrightmargin=5pt,frame=single}
\setlength{\lnlblraise}{6pt}
}{}