formal/spinhint.tex - pub/scm/linux/kernel/git/paulmck/perfbook - Git at Google

 % formal/spinhint.html

 \section{General-Purpose State-Space Search}
 \label{sec:formal:General-Purpose State-Space Search}

 This section features the general-purpose Promela and spin tools,
 which may be used to carry out a full
 state-space search of many types of multi-threaded code.
 They are also quite useful for verifying data communication protocols.
 Section~\ref{sec:formal:Promela and Spin}
 introduces Promela and spin, including a couple of warm-up exercises
 verifying both non-atomic and atomic increment.
 Section~\ref{sec:formal:How to Use Promela}
 describes use of Promela, including example command lines and a
 comparison of Promela syntax to that of C.
 Section~\ref{sec:formal:Promela Example: Locking}
 shows how Promela may be used to verify locking,
 \ref{sec:formal:Promela Example: QRCU}
 uses Promela to verify an unusual implementation of RCU named ``QRCU'',
 and finally
 Section~\ref{sec:formal:Promela Parable: dynticks and Preemptible RCU}
 applies Promela to RCU's dyntick-idle implementation.

 \subsection{Promela and Spin}
 \label{sec:formal:Promela and Spin}

 Promela is a language designed to help verify protocols, but which
 can also be used to verify small parallel algorithms.
 You recode your algorithm and correctness constraints in the C-like
 language Promela, and then use Spin to translate it into a C program
 that you can compile and run.
 The resulting program conducts a full state-space search of your
 algorithm, either verifying or finding counter-examples for
 assertions that you can include in your Promela program.

 This full-state search can be extremely powerful, but can also be a two-edged
 sword.
 If your algorithm is too complex or your Promela implementation is
 careless, there might be more states than fit in memory.
 Furthermore, even given sufficient memory, the state-space search might
 well run for longer than the expected lifetime of the universe.
 Therefore, use this tool for compact but complex parallel algorithms.
 Attempts to naively apply it to even moderate-scale algorithms (let alone
 the full Linux kernel) will end badly.

 Promela and Spin may be downloaded from
 \url{http://spinroot.com/spin/whatispin.html}.

 The above site also gives links to Gerard Holzmann's excellent
 book~\cite{Holzmann03a} on Promela and Spin,
 as well as searchable online references starting at:
 \url{http://www.spinroot.com/spin/Man/index.html}.

 The remainder of this section describes how to use Promela to debug
 parallel algorithms, starting with simple examples and progressing to
 more complex uses.

 \subsubsection{Promela Warm-Up: Non-Atomic Increment}
 \label{sec:formal:Promela Warm-Up: Non-Atomic Increment}

 \begin{figure}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 #define NUMPROCS 2
   2
   3 byte counter = 0;
   4 byte progress[NUMPROCS];
   5
   6 proctype incrementer(byte me)
   7 {
   8   int temp;
   9
  10   temp = counter;
  11   counter = temp + 1;
  12   progress[me] = 1;
  13 }
  14
  15 init {
  16   int i = 0;
  17   int sum = 0;
  18
  19   atomic {
  20     i = 0;
  21     do
  22     :: i < NUMPROCS ->
  23       progress[i] = 0;
  24       run incrementer(i);
  25       i++
  26     :: i >= NUMPROCS -> break
  27     od;
  28   }
  29   atomic {
  30     i = 0;
  31     sum = 0;
  32     do
  33     :: i < NUMPROCS ->
  34       sum = sum + progress[i];
  35       i++
  36     :: i >= NUMPROCS -> break
  37     od;
  38     assert(sum < NUMPROCS || counter == NUMPROCS)
  39   }
  40 }
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{Promela Code for Non-Atomic Increment}
 \label{fig:analysis:Promela Code for Non-Atomic Increment}
 \end{figure}

 Figure~\ref{fig:analysis:Promela Code for Non-Atomic Increment}
 demonstrates the textbook race condition
 resulting from non-atomic increment.
 Line~1 defines the number of processes to run (we will vary this
 to see the effect on state space), line~3 defines the counter,
 and line~4 is used to implement the assertion that appears on
 lines~29-39.

 Lines~6-13 define a process that increments the counter non-atomically.
 The argument \co{me} is the process number, set by the initialization
 block later in the code.
 Because simple Promela statements are each assumed atomic, we must
 break the increment into the two statements on lines~10-11.
 The assignment on line~12 marks the process's completion.
 Because the Spin system will fully search the state space, including
 all possible sequences of states, there is no need for the loop
 that would be used for conventional testing.

 Lines~15-40 are the initialization block, which is executed first.
 Lines~19-28 actually do the initialization, while lines~29-39
 perform the assertion.
 Both are atomic blocks in order to avoid unnecessarily increasing
 the state space: because they are not part of the algorithm proper,
 we lose no verification coverage by making them atomic.

 The do-od construct on lines~21-27 implements a Promela loop,
 which can be thought of as a C {\tt for (;;)} loop containing a
 \co{switch} statement that allows expressions in case labels.
 The condition blocks (prefixed by {\tt ::})
 are scanned non-deterministically,
 though in this case only one of the conditions can possibly hold at a given
 time.
 The first block of the do-od from lines~22-25 initializes the i-th
 incrementer's progress cell, runs the i-th incrementer's process, and
 then increments the variable \co{i}.
 The second block of the do-od on line~26 exits the loop once
 these processes have been started.

 The atomic block on lines~29-39 also contains a similar do-od
 loop that sums up the progress counters.
 The {\tt assert()} statement on line~38 verifies that if all processes
 have been completed, then all counts have been correctly recorded.

 You can build and run this program as follows:

 \vspace{5pt}
 \begin{minipage}[t]{\columnwidth}
 \scriptsize
 \begin{verbatim}
 spin -a increment.spin		# Translate the model to C
 cc -DSAFETY -o pan pan.c	# Compile the model
 ./pan				# Run the model
 \end{verbatim}
 \end{minipage}
 \vspace{5pt}

 \begin{figure*}[tbp]
 { \scriptsize
 \begin{verbbox}
 pan: assertion violated ((sum<2)||(counter==2)) (at depth 20)
 pan: wrote increment.spin.trail
 (Spin Version 4.2.5 -- 2 April 2005)
 Warning: Search not completed
         + Partial Order Reduction

 Full statespace search for:
         never claim             - (none specified)
         assertion violations    +
         cycle checks            - (disabled by -DSAFETY)
         invalid end states      +

 State-vector 40 byte, depth reached 22, errors: 1
       45 states, stored
       13 states, matched
       58 transitions (= stored+matched)
       51 atomic steps
 hash conflicts: 0 (resolved)

 2.622  memory usage (Mbyte)
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{Non-Atomic Increment spin Output}
 \label{fig:analysis:Non-Atomic Increment spin Output}
 \end{figure*}

 This will produce output as shown in
 Figure~\ref{fig:analysis:Non-Atomic Increment spin Output}.
 The first line tells us that our assertion was violated (as expected
 given the non-atomic increment!).
 The second line that a \co{trail} file was written describing how the
 assertion was violated.
 The ``Warning'' line reiterates that all was not well with our model.
 The second paragraph describes the type of state-search being carried out,
 in this case for assertion violations and invalid end states.
 The third paragraph gives state-size statistics: this small model had only
 45 states.
 The final line shows memory usage.

 The \co{trail} file may be rendered human-readable as follows:

 \vspace{5pt}
 \begin{minipage}[t]{\columnwidth}
 \scriptsize
 \begin{verbatim}
 spin -t -p increment.spin
 \end{verbatim}
 \end{minipage}
 \vspace{5pt}

 \begin{figure*}[htbp]
 { \scriptsize
 \begin{verbbox}
 Starting :init: with pid 0
  1: proc 0 (:init:) line 20 "increment.spin" (state 1) [i = 0]
  2: proc 0 (:init:) line 22 "increment.spin" (state 2) [((i<2))]
  2: proc 0 (:init:) line 23 "increment.spin" (state 3) [progress[i] = 0]
 Starting incrementer with pid 1
  3: proc 0 (:init:) line 24 "increment.spin" (state 4) [(run incrementer(i))]
  3: proc 0 (:init:) line 25 "increment.spin" (state 5) [i = (i+1)]
  4: proc 0 (:init:) line 22 "increment.spin" (state 2) [((i<2))]
  4: proc 0 (:init:) line 23 "increment.spin" (state 3) [progress[i] = 0]
 Starting incrementer with pid 2
  5: proc 0 (:init:) line 24 "increment.spin" (state 4) [(run incrementer(i))]
  5: proc 0 (:init:) line 25 "increment.spin" (state 5) [i = (i+1)]
  6: proc 0 (:init:) line 26 "increment.spin" (state 6) [((i>=2))]
  7: proc 0 (:init:) line 21 "increment.spin" (state 10) [break]
  8: proc 2 (incrementer) line 10 "increment.spin" (state 1) [temp = counter]
  9: proc 1 (incrementer) line 10 "increment.spin" (state 1) [temp = counter]
 10: proc 2 (incrementer) line 11 "increment.spin" (state 2) [counter = (temp+1)]
 11: proc 2 (incrementer) line 12 "increment.spin" (state 3) [progress[me] = 1]
 12: proc 2 terminates
 13: proc 1 (incrementer) line 11 "increment.spin" (state 2) [counter = (temp+1)]
 14: proc 1 (incrementer) line 12 "increment.spin" (state 3) [progress[me] = 1]
 15: proc 1 terminates
 16: proc 0 (:init:) line 30 "increment.spin" (state 12)	[i = 0]
 16: proc 0 (:init:) line 31 "increment.spin" (state 13)	[sum = 0]
 17: proc 0 (:init:) line 33 "increment.spin" (state 14)	[((i<2))]
 17: proc 0 (:init:) line 34 "increment.spin" (state 15)	[sum = (sum+progress[i])]
 17: proc 0 (:init:) line 35 "increment.spin" (state 16)	[i = (i+1)]
 18: proc 0 (:init:) line 33 "increment.spin" (state 14)	[((i<2))]
 18: proc 0 (:init:) line 34 "increment.spin" (state 15)	[sum = (sum+progress[i])]
 18: proc 0 (:init:) line 35 "increment.spin" (state 16)	[i = (i+1)]
 19: proc 0 (:init:) line 36 "increment.spin" (state 17)	[((i>=2))]
 20: proc 0 (:init:) line 32 "increment.spin" (state 21)	[break]
 spin: line  38 "increment.spin", Error: assertion violated
 spin: text of failed assertion: assert(((sum<2)||(counter==2)))
  21:	proc  0 (:init:) line  38 "increment.spin" (state 22)	[assert(((sum<2)||(counter==2)))]
 spin: trail ends after 21 steps
 #processes: 1
                 counter = 1
                 progress[0] = 1
                 progress[1] = 1
 21: proc 0 (:init:) line 40 "increment.spin" (state 24) <valid end state>
 3 processes created
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{Non-Atomic Increment Error Trail}
 \label{fig:analysis:Non-Atomic Increment Error Trail}
 \end{figure*}

 This gives the output shown in
 Figure~\ref{fig:analysis:Non-Atomic Increment Error Trail}.
 As can be seen, the first portion of the init block created both
 incrementer processes, both of which first fetched the counter,
 then both incremented and stored it, losing a count.
 The assertion then triggered, after which the global state is displayed.

 \subsubsection{Promela Warm-Up: Atomic Increment}
 \label{sec:formal:Promela Warm-Up: Atomic Increment}

 \begin{figure}[htbp]
 { \scriptsize
 \begin{verbbox}
   1 proctype incrementer(byte me)
   2 {
   3   int temp;
   4
   5   atomic {
   6     temp = counter;
   7     counter = temp + 1;
   8   }
   9   progress[me] = 1;
  10 }
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{Promela Code for Atomic Increment}
 \label{fig:analysis:Promela Code for Atomic Increment}
 \end{figure}

 \begin{figure}[htbp]
 { \scriptsize
 \begin{verbbox}
 (Spin Version 4.2.5 -- 2 April 2005)
         + Partial Order Reduction

 Full statespace search for:
         never claim             - (none specified)
         assertion violations    +
         cycle checks            - (disabled by -DSAFETY)
         invalid end states      +

 State-vector 40 byte, depth reached 20, errors: 0
       52 states, stored
       21 states, matched
       73 transitions (= stored+matched)
       66 atomic steps
 hash conflicts: 0 (resolved)

 2.622   memory usage (Mbyte)

 unreached in proctype incrementer
         (0 of 5 states)
 unreached in proctype :init:
         (0 of 24 states)
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{Atomic Increment spin Output}
 \label{fig:analysis:Atomic Increment spin Output}
 \end{figure}

 It is easy to fix this example by placing the body of the incrementer
 processes in an atomic blocks as shown in
 Figure~\ref{fig:analysis:Promela Code for Atomic Increment}.
 One could also have simply replaced the pair of statements with
 {\tt counter = counter + 1}, because Promela statements are
 atomic.
 Either way, running this modified model gives us an error-free traversal
 of the state space, as shown in
 Figure~\ref{fig:analysis:Atomic Increment spin Output}.

 \begin{table}
 \footnotesize
 \centering
 \begin{tabular}{c|r|r}
 	\# incrementers & \# states &	megabytes \\
 	\hline
 	\hline
 	1 &		        11 &          2.6 \\
 	\hline
 	2 &		        52 &          2.6 \\
 	\hline
 	3 &		       372 &          2.6 \\
 	\hline
 	4 &		     3,496 &          2.7 \\
 	\hline
 	5 &		    40,221 &          5.0 \\
 	\hline
 	6 &		   545,720 &         40.5 \\
 	\hline
 	7 &		 8,521,450 &        652.7 \\
 \end{tabular}
 \caption{Memory Usage of Increment Model}
 \label{tab:advsync:Memory Usage of Increment Model}
 \end{table}

 Table~\ref{tab:advsync:Memory Usage of Increment Model}
 shows the number of states and memory consumed
 as a function of number of incrementers modeled
 (by redefining {\tt NUMPROCS}):

 Running unnecessarily large models is thus subtly discouraged, although
 652MB is well within the limits of modern desktop and laptop machines.

 With this example under our belt, let's take a closer look at the
 commands used to analyze Promela models and then look at more
 elaborate examples.

 \subsection{How to Use Promela}
 \label{sec:formal:How to Use Promela}

 Given a source file \path{qrcu.spin}, one can use the following commands:

 \begin{description}[style=nextline]
 \item	[\tco{spin -a qrcu.spin}]
 	Create a file \path{pan.c} that fully searches the state machine.
 \item	[\tco{cc -DSAFETY -o pan pan.c}]
 	Compile the generated state-machine search.  The \co{-DSAFETY}
 	generates optimizations that are appropriate if you have only
 	assertions (and perhaps \co{never} statements).  If you have
 	liveness, fairness, or forward-progress checks, you may need
 	to compile without \co{-DSAFETY}.  If you leave off \co{-DSAFETY}
 	when you could have used it, the program will let you know.

 	The optimizations produced by \co{-DSAFETY} greatly speed things
 	up, so you should use it when you can.
 	An example situation where you cannot use \co{-DSAFETY} is
 	when checking for livelocks (AKA ``non-progress cycles'')
 	via \co{-DNP}.
 \item	[\tco{./pan}]
 	This actually searches the state space.  The number of states
 	can reach into the tens of millions with very small state
 	machines, so you will need a machine with large memory.
 	For example, \path{qrcu.spin} with 3 readers and 2 updaters required
 	2.7GB of memory.

 	If you aren't sure whether your machine has enough memory,
 	run \co{top} in one window and \co{./pan} in another.  Keep the
 	focus on the \co{./pan} window so that you can quickly kill
 	execution if need be.  As soon as CPU time drops much below
 	100\%, kill \co{./pan}.  If you have removed focus from the
 	window running \co{./pan}, you may wait a long time for the
 	windowing system to grab enough memory to do anything for
 	you.

 	Don't forget to capture the output, especially
 	if you are working on a remote machine.

 	If your model includes forward-progress checks, you will likely
 	need to enable ``weak fairness'' via the \co{-f} command-line
 	argument to \co{./pan}.
 	If your forward-progress checks involve \co{accept} labels,
 	you will also need the \co{-a} argument.
 	% forward reference to model: formal.2009.02.19a in
 	% /home/linux/git/userspace-rcu/formal-model.
 \item	[\tco{spin -t -p qrcu.spin}]
 	Given \co{trail} file output by a run that encountered an
 	error, output the sequence of steps leading to that error.
 	The \co{-g} flag will also include the values of changed
 	global variables, and the  \co{-l} flag will also include
 	the values of changed local variables.
 \end{description}

 \subsubsection{Promela Peculiarities}
 \label{sec:formal:Promela Peculiarities}

 Although all computer languages have underlying similarities,
 Promela will provide some surprises to people used to coding in C,
 C++, or Java.

 \begin{enumerate}
 \item	In C, ``\co{;}'' terminates statements.  In Promela it separates them.
 	Fortunately, more recent versions of Spin have become
 	much more forgiving of ``extra'' semicolons.
 \item	Promela's looping construct, the \co{do} statement, takes
 	conditions.
 	This \co{do} statement closely resembles a looping if-then-else
 	statement.
 \item	In C's \co{switch} statement, if there is no matching case, the whole
 	statement is skipped.  In Promela's equivalent, confusingly called
 	\co{if}, if there is no matching guard expression, you get an error
 	without a recognizable corresponding error message.
 	So, if the error output indicates an innocent line of code,
 	check to see if you left out a condition from an \co{if} or \co{do}
 	statement.
 \item	When creating stress tests in C, one usually races suspect operations
 	against each other repeatedly.	In Promela, one instead sets up
 	a single race, because Promela will search out all the possible
 	outcomes from that single race.	Sometimes you do need to loop
 	in Promela, for example, if multiple operations overlap, but
 	doing so greatly increases the size of your state space.
 \item	In C, the easiest thing to do is to maintain a loop counter to track
 	progress and terminate the loop.  In Promela, loop counters
 	must be avoided like the plague because they cause the state
 	space to explode.  On the other hand, there is no penalty for
 	infinite loops in Promela as long as none of the variables
 	monotonically increase or decrease---Promela will figure out
 	how many passes through the loop really matter, and automatically
 	prune execution beyond that point.
 \item	In C torture-test code, it is often wise to keep per-task control
 	variables.  They are cheap to read, and greatly aid in debugging the
 	test code.  In Promela, per-task control variables should be used
 	only when there is no other alternative.  To see this, consider
 	a 5-task verification with one bit each to indicate completion.
 	This gives 32 states.  In contrast, a simple counter would have
 	only six states, more than a five-fold reduction.  That factor
 	of five might not seem like a problem, at least not until you
 	are struggling with a verification program possessing more than
 	150 million states consuming more than 10GB of memory!
 \item	One of the most challenging things both in C torture-test code and
 	in Promela is formulating good assertions.  Promela also allows
 	\co{never} claims that act sort of like an assertion replicated
 	between every line of code.
 \item	Dividing and conquering is extremely helpful in Promela in keeping
 	the state space under control.  Splitting a large model into two
 	roughly equal halves will result in the state space of each
 	half being roughly the square root of the whole.
 	For example, a million-state combined model might reduce to a
 	pair of thousand-state models.
 	Not only will Promela handle the two smaller models much more
 	quickly with much less memory, but the two smaller algorithms
 	are easier for people to understand.
 \end{enumerate}


 \subsubsection{Promela Coding Tricks}
 \label{sec:formal:Promela Coding Tricks}

 Promela was designed to analyze protocols, so using it on parallel programs
 is a bit abusive.
 The following tricks can help you to abuse Promela safely:

 \begin{enumerate}
 \item	Memory reordering.  Suppose you have a pair of statements
 	copying globals x and y to locals r1 and r2, where ordering
 	matters (e.g., unprotected by locks), but where you have
 	no memory barriers.  This can be modeled in Promela as follows:

 \vspace{5pt}
 \begin{minipage}[t]{\columnwidth}
 \scriptsize
 \begin{verbatim}
   1 if
   2 :: 1 -> r1 = x;
   3   r2 = y
   4 :: 1 -> r2 = y;
   5   r1 = x
   6 fi
 \end{verbatim}
 \end{minipage}
 \vspace{5pt}

 	The two branches of the \co{if} statement will be selected
 	nondeterministically, since they both are available.
 	Because the full state space is searched, \emph{both} choices
 	will eventually be made in all cases.

 	Of course, this trick will cause your state space to explode
 	if used too heavily.
 	In addition, it requires you to anticipate possible reorderings.

 \begin{figure}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 i = 0;
   2 sum = 0;
   3 do
   4 :: i < N_QRCU_READERS ->
   5   sum = sum + (readerstart[i] == 1 &&
   6     readerprogress[i] == 1);
   7   i++
   8 :: i >= N_QRCU_READERS ->
   9   assert(sum == 0);
  10   break
  11 od
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{Complex Promela Assertion}
 \label{fig:analysis:Complex Promela Assertion}
 \end{figure}

 \begin{figure}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 atomic {
   2   i = 0;
   3   sum = 0;
   4   do
   5   :: i < N_QRCU_READERS ->
   6     sum = sum + (readerstart[i] == 1 &&
   7       readerprogress[i] == 1);
   8     i++
   9   :: i >= N_QRCU_READERS ->
  10     assert(sum == 0);
  11     break
  12   od
  13 }
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{Atomic Block for Complex Promela Assertion}
 \label{fig:analysis:Atomic Block for Complex Promela Assertion}
 \end{figure}

 \item	State reduction.  If you have complex assertions, evaluate
 	them under \co{atomic}.  After all, they are not part of the
 	algorithm.  One example of a complex assertion (to be discussed
 	in more detail later) is as shown in
 	Figure~\ref{fig:analysis:Complex Promela Assertion}.

 	There is no reason to evaluate this assertion
 	non-atomically, since it is not actually part of the algorithm.
 	Because each statement contributes to state, we can reduce
 	the number of useless states by enclosing it in an \co{atomic}
 	block as shown in
 	Figure~\ref{fig:analysis:Atomic Block for Complex Promela Assertion}.

 \item	Promela does not provide functions.
 	You must instead use C preprocessor macros.
 	However, you must use them carefully in order to avoid
 	combinatorial explosion.
 \end{enumerate}

 Now we are ready for more complex examples.

 \subsection{Promela Example: Locking}
 \label{sec:formal:Promela Example: Locking}

 \begin{figure}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 #define spin_lock(mutex) \
   2   do \
   3   :: 1 -> atomic { \
   4       if \
   5       :: mutex == 0 -> \
   6         mutex = 1; \
   7         break \
   8       :: else -> skip \
   9       fi \
  10     } \
  11   od
  12
  13 #define spin_unlock(mutex) \
  14   mutex = 0
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{Promela Code for Spinlock}
 \label{fig:analysis:Promela Code for Spinlock}
 \end{figure}

 Since locks are generally useful, \co{spin_lock()} and
 \co{spin_unlock()}
 macros are provided in \path{lock.h}, which may be included from
 multiple Promela models, as shown in
 Figure~\ref{fig:analysis:Promela Code for Spinlock}.
 The \co{spin_lock()} macro contains an infinite do-od loop
 spanning lines~2-11,
 courtesy of the single guard expression of ``1'' on line~3.
 The body of this loop is a single atomic block that contains
 an if-fi statement.
 The if-fi construct is similar to the do-od construct, except
 that it takes a single pass rather than looping.
 If the lock is not held on line~5, then line~6 acquires it and
 line~7 breaks out of the enclosing do-od loop (and also exits
 the atomic block).
 On the other hand, if the lock is already held on line~8,
 we do nothing (\co{skip}), and fall out of the if-fi and the
 atomic block so as to take another pass through the outer
 loop, repeating until the lock is available.

 The \co{spin_unlock()} macro simply marks the lock as no
 longer held.

 Note that memory barriers are not needed because Promela assumes
 full ordering.
 In any given Promela state, all processes agree on both the current
 state and the order of state changes that caused us to arrive at
 the current state.
 This is analogous to the ``sequentially consistent'' memory model
 used by a few computer systems (such as MIPS and PA-RISC).
 As noted earlier, and as will be seen in a later example,
 weak memory ordering must be explicitly coded.

 \begin{figure}[tb]
 { \scriptsize
 \begin{verbbox}
   1 #include "lock.h"
   2
   3 #define N_LOCKERS 3
   4
   5 bit mutex = 0;
   6 bit havelock[N_LOCKERS];
   7 int sum;
   8
   9 proctype locker(byte me)
  10 {
  11   do
  12   :: 1 ->
  13     spin_lock(mutex);
  14     havelock[me] = 1;
  15     havelock[me] = 0;
  16     spin_unlock(mutex)
  17   od
  18 }
  19
  20 init {
  21   int i = 0;
  22   int j;
  23
  24 end:  do
  25   :: i < N_LOCKERS ->
  26     havelock[i] = 0;
  27     run locker(i);
  28     i++
  29   :: i >= N_LOCKERS ->
  30     sum = 0;
  31     j = 0;
  32     atomic {
  33       do
  34       :: j < N_LOCKERS ->
  35         sum = sum + havelock[j];
  36         j = j + 1
  37       :: j >= N_LOCKERS ->
  38         break
  39       od
  40     }
  41     assert(sum <= 1);
  42     break
  43   od
  44 }
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{Promela Code to Test Spinlocks}
 \label{fig:analysis:Promela Code to Test Spinlocks}
 \end{figure}

 These macros are tested by the Promela code shown in
 Figure~\ref{fig:analysis:Promela Code to Test Spinlocks}.
 This code is similar to that used to test the increments,
 with the number of locking processes defined by the \co{N_LOCKERS}
 macro definition on line~3.
 The mutex itself is defined on line~5, an array to track the lock owner
 on line~6, and line~7 is used by assertion
 code to verify that only one process holds the lock.

 The locker process is on lines~9-18, and simply loops forever
 acquiring the lock on line~13, claiming it on line~14,
 unclaiming it on line~15, and releasing it on line~16.

 The init block on lines~20-44 initializes the current locker's
 havelock array entry on line~26, starts the current locker on
 line~27, and advances to the next locker on line~28.
 Once all locker processes are spawned, the do-od loop
 moves to line~29, which checks the assertion.
 Lines~30 and~31 initialize the control variables,
 lines~32-40 atomically sum the havelock array entries,
 line~41 is the assertion, and line~42 exits the loop.

 We can run this model by placing the above two code fragments into
 files named \path{lock.h} and \path{lock.spin}, respectively, and then running
 the following commands:

 \vspace{5pt}
 \begin{minipage}[t]{\columnwidth}
 \scriptsize
 \begin{verbatim}
 spin -a lock.spin
 cc -DSAFETY -o pan pan.c
 ./pan
 \end{verbatim}
 \end{minipage}
 \vspace{5pt}

 \begin{figure}[htbp]
 { \scriptsize
 \begin{verbbox}
 (Spin Version 4.2.5 -- 2 April 2005)
         + Partial Order Reduction

 Full statespace search for:
         never claim             - (none specified)
         assertion violations    +
         cycle checks            - (disabled by -DSAFETY)
         invalid end states      +

 State-vector 40 byte, depth reached 357, errors: 0
      564 states, stored
      929 states, matched
     1493 transitions (= stored+matched)
      368 atomic steps
 hash conflicts: 0 (resolved)

 2.622   memory usage (Mbyte)

 unreached in proctype locker
         line 18, state 20, "-end-"
         (1 of 20 states)
 unreached in proctype :init:
         (0 of 22 states)
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{Output for Spinlock Test}
 \label{fig:analysis:Output for Spinlock Test}
 \end{figure}

 The output will look something like that shown in
 Figure~\ref{fig:analysis:Output for Spinlock Test}.
 As expected, this run has no assertion failures (``errors: 0'').

 \QuickQuiz{}
 	Why is there an unreached statement in
 	locker?  After all, isn't this a \emph{full} state-space
 	search?
 \QuickQuizAnswer{
 	The locker process is an infinite loop, so control
 	never reaches the end of this process.
 	However, since there are no monotonically increasing variables,
 	Promela is able to model this infinite loop with a small
 	number of states.
 } \QuickQuizEnd

 \QuickQuiz{}
 	What are some Promela code-style issues with this example?
 \QuickQuizAnswer{
 	There are several:
 	\begin{enumerate}
 	\item	The declaration of {\tt sum} should be moved to within
 		the init block, since it is not used anywhere else.
 	\item	The assertion code should be moved outside of the
 		initialization loop.  The initialization loop can
 		then be placed in an atomic block, greatly reducing
 		the state space (by how much?).
 	\item	The atomic block covering the assertion code should
 		be extended to include the initialization of {\tt sum}
 		and {\tt j}, and also to cover the assertion.
 		This also reduces the state space (again, by how
 		much?).
 	\end{enumerate}
 } \QuickQuizEnd


 \subsection{Promela Example: QRCU}
 \label{sec:formal:Promela Example: QRCU}

 This final example demonstrates a real-world use of Promela on Oleg
 Nesterov's
 QRCU~\cite{OlegNesterov2006QRCU,OlegNesterov2006aQRCU},
 but modified to speed up the \co{synchronize_qrcu()}
 fastpath.

 But first, what is QRCU?

 QRCU is a variant of SRCU~\cite{PaulEMcKenney2006c}
 that trades somewhat higher read overhead
 (atomic increment and decrement on a global variable) for extremely
 low grace-period latencies.
 If there are no readers, the grace period will be detected in less
 than a microsecond, compared to the multi-millisecond grace-period
 latencies of most other RCU implementations.

 \begin{enumerate}
 \item	There is a \co{qrcu_struct} that defines a QRCU domain.
 	Like SRCU (and unlike other variants of RCU) QRCU's action
 	is not global, but instead focused on the specified
 	\co{qrcu_struct}.
 \item	There are \co{qrcu_read_lock()} and \co{qrcu_read_unlock()}
 	primitives that delimit QRCU read-side critical sections.
 	The corresponding \co{qrcu_struct} must be passed into
 	these primitives, and the return value from \co{rcu_read_lock()}
 	must be passed to \co{rcu_read_unlock()}.

 	For example:

 \vspace{5pt}
 \begin{minipage}[t]{\columnwidth}
 \scriptsize
 \begin{verbatim}
 idx = qrcu_read_lock(&my_qrcu_struct);
 /* read-side critical section. */
 qrcu_read_unlock(&my_qrcu_struct, idx);
 \end{verbatim}
 \end{minipage}
 \vspace{5pt}

 \item	There is a \co{synchronize_qrcu()} primitive that blocks until
 	all pre-existing QRCU read-side critical sections complete,
 	but, like SRCU's \co{synchronize_srcu()}, QRCU's
 	\co{synchronize_qrcu()} need wait only for those read-side
 	critical sections that are using the same \co{qrcu_struct}.

 	For example, \co{synchronize_qrcu(&your_qrcu_struct)}
 	would \emph{not} need to wait on the earlier QRCU read-side
 	critical section.
 	In contrast, \co{synchronize_qrcu(&my_qrcu_struct)}
 	\emph{would} need to wait, since it shares the same
 	\co{qrcu_struct}.
 \end{enumerate}

 A Linux-kernel patch for QRCU has been
 produced~\cite{PaulMcKenney2007QRCUpatch},
 but has not yet been included in the Linux kernel as of
 April 2008.

 \begin{figure}[htbp]
 { \scriptsize
 \begin{verbbox}
   1 #include "lock.h"
   2
   3 #define N_QRCU_READERS 2
   4 #define N_QRCU_UPDATERS 2
   5
   6 bit idx = 0;
   7 byte ctr[2];
   8 byte readerprogress[N_QRCU_READERS];
   9 bit mutex = 0;
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{QRCU Global Variables}
 \label{fig:analysis:QRCU Global Variables}
 \end{figure}

 Returning to the Promela code for QRCU, the global variables are as shown in
 Figure~\ref{fig:analysis:QRCU Global Variables}.
 This example uses locking, hence including \path{lock.h}.
 Both the number of readers and writers can be varied using the
 two \co{#define} statements, giving us not one but two ways to create
 combinatorial explosion.
 The \co{idx} variable controls which of the two elements of the \co{ctr}
 array will be used by readers, and the \co{readerprogress} variable
 allows an assertion to determine when all the readers are finished
 (since a QRCU update cannot be permitted to complete until all
 pre-existing readers have completed their QRCU read-side critical
 sections).
 The \co{readerprogress} array elements have values as follows,
 indicating the state of the corresponding reader:

 \begin{enumerate}
 \item	0: not yet started.
 \item	1: within QRCU read-side critical section.
 \item	2: finished with QRCU read-side critical section.
 \end{enumerate}

 Finally, the \co{mutex} variable is used to serialize updaters' slowpaths.

 \begin{figure}[htbp]
 { \scriptsize
 \begin{verbbox}
   1 proctype qrcu_reader(byte me)
   2 {
   3   int myidx;
   4
   5   do
   6   :: 1 ->
   7     myidx = idx;
   8     atomic {
   9       if
  10       :: ctr[myidx] > 0 ->
  11         ctr[myidx]++;
  12         break
  13       :: else -> skip
  14       fi
  15     }
  16   od;
  17   readerprogress[me] = 1;
  18   readerprogress[me] = 2;
  19   atomic { ctr[myidx]-- }
  20 }
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{QRCU Reader Process}
 \label{fig:analysis:QRCU Reader Process}
 \end{figure}

 QRCU readers are modeled by the \co{qrcu_reader()} process shown in
 Figure~\ref{fig:analysis:QRCU Reader Process}.
 A do-od loop spans lines~5-16, with a single guard of ``1''
 on line~6 that makes it an infinite loop.
 Line~7 captures the current value of the global index, and lines~8-15
 atomically increment it (and break from the infinite loop)
 if its value was non-zero (\co{atomic_inc_not_zero()}).
 Line~17 marks entry into the RCU read-side critical section, and
 line~18 marks exit from this critical section, both lines for the benefit of
 the {\tt assert()} statement that we shall encounter later.
 Line~19 atomically decrements the same counter that we incremented,
 thereby exiting the RCU read-side critical section.

 \begin{figure}[htbp]
 { \scriptsize
 \begin{verbbox}
   1 #define sum_unordered \
   2   atomic { \
   3     do \
   4     :: 1 -> \
   5       sum = ctr[0]; \
   6       i = 1; \
   7       break \
   8     :: 1 -> \
   9       sum = ctr[1]; \
  10       i = 0; \
  11       break \
  12     od; \
  13   } \
  14   sum = sum + ctr[i]
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{QRCU Unordered Summation}
 \label{fig:analysis:QRCU Unordered Summation}
 \end{figure}

 The C-preprocessor macro shown in
 Figure~\ref{fig:analysis:QRCU Unordered Summation}
 sums the pair of counters so as to emulate weak memory ordering.
 Lines~2-13 fetch one of the counters, and line~14 fetches the other
 of the pair and sums them.
 The atomic block consists of a single do-od statement.
 This do-od statement (spanning lines~3-12) is unusual in that
 it contains two unconditional
 branches with guards on lines~4 and~8, which causes Promela to
 non-deterministically choose one of the two (but again, the full
 state-space search causes Promela to eventually make all possible
 choices in each applicable situation).
 The first branch fetches the zero-th counter and sets \co{i} to 1 (so
 that line~14 will fetch the first counter), while the second
 branch does the opposite, fetching the first counter and setting \co{i}
 to 0 (so that line~14 will fetch the second counter).

 \QuickQuiz{}
 	Is there a more straightforward way to code the do-od statement?
 \QuickQuizAnswer{
 	Yes.
 	Replace it with {\tt if-fi} and remove the two {\tt break} statements.
 } \QuickQuizEnd

 \begin{figure}[htbp]
 { \scriptsize
 \begin{verbbox}
   1 proctype qrcu_updater(byte me)
   2 {
   3   int i;
   4   byte readerstart[N_QRCU_READERS];
   5   int sum;
   6
   7   do
   8   :: 1 ->
   9
  10     /* Snapshot reader state. */
  11
  12     atomic {
  13       i = 0;
  14       do
  15       :: i < N_QRCU_READERS ->
  16         readerstart[i] = readerprogress[i];
  17         i++
  18       :: i >= N_QRCU_READERS ->
  19         break
  20       od
  21     }
  22
  23     sum_unordered;
  24     if
  25     :: sum <= 1 -> sum_unordered
  26     :: else -> skip
  27     fi;
  28     if
  29     :: sum > 1 ->
  30       spin_lock(mutex);
  31       atomic { ctr[!idx]++ }
  32       idx = !idx;
  33       atomic { ctr[!idx]-- }
  34       do
  35       :: ctr[!idx] > 0 -> skip
  36       :: ctr[!idx] == 0 -> break
  37       od;
  38       spin_unlock(mutex);
  39     :: else -> skip
  40     fi;
  41
  42     /* Verify reader progress. */
  43
  44     atomic {
  45       i = 0;
  46       sum = 0;
  47       do
  48       :: i < N_QRCU_READERS ->
  49         sum = sum + (readerstart[i] == 1 &&
  50                readerprogress[i] == 1);
  51         i++
  52       :: i >= N_QRCU_READERS ->
  53         assert(sum == 0);
  54         break
  55       od
  56     }
  57   od
  58 }
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{QRCU Updater Process}
 \label{fig:analysis:QRCU Updater Process}
 \end{figure}

 With the \co{sum_unordered} macro in place, we can now proceed
 to the update-side process shown in
 Figure~\ref{fig:analysis:QRCU Updater Process}.
 The update-side process repeats indefinitely, with the corresponding
 do-od loop ranging over lines~7-57.
 Each pass through the loop first snapshots the global {\tt readerprogress}
 array into the local {\tt readerstart} array on lines~12-21.
 This snapshot will be used for the assertion on line~53.
 Line~23 invokes \co{sum_unordered}, and then lines~24-27
 re-invoke \co{sum_unordered} if the fastpath is potentially
 usable.

 Lines~28-40 execute the slowpath code if need be, with
 lines~30 and~38 acquiring and releasing the update-side lock,
 lines~31-33 flipping the index, and lines~34-37 waiting for
 all pre-existing readers to complete.

 Lines~44-56 then compare the current values in the {\tt readerprogress}
 array to those collected in the {\tt readerstart} array,
 forcing an assertion failure should any readers that started before
 this update still be in progress.

 \QuickQuiz{}
 	Why are there atomic blocks at lines~12-21
 	and lines~44-56, when the operations within those atomic
 	blocks have no atomic implementation on any current
 	production microprocessor?
 \QuickQuizAnswer{
 	Because those operations are for the benefit of the
 	assertion only.  They are not part of the algorithm itself.
 	There is therefore no harm in marking them atomic, and
 	so marking them greatly reduces the state space that must
 	be searched by the Promela model.
 } \QuickQuizEnd

 \QuickQuiz{}
 	Is the re-summing of the counters on lines~24-27
 	\emph{really} necessary?
 \QuickQuizAnswer{
 	Yes.  To see this, delete these lines and run the model.

 	Alternatively, consider the following sequence of steps:

 	\begin{enumerate}
 	\item	One process is within its RCU read-side critical
 		section, so that the value of {\tt ctr[0]} is zero and
 		the value of {\tt ctr[1]} is two.
 	\item	An updater starts executing, and sees that the sum of
 		the counters is two so that the fastpath cannot be
 		executed.  It therefore acquires the lock.
 	\item	A second updater starts executing, and fetches the value
 		of {\tt ctr[0]}, which is zero.
 	\item	The first updater adds one to {\tt ctr[0]}, flips
 		the index (which now becomes zero), then subtracts
 		one from {\tt ctr[1]} (which now becomes one).
 	\item	The second updater fetches the value of {\tt ctr[1]},
 		which is now one.
 	\item	The second updater now incorrectly concludes that it
 		is safe to proceed on the fastpath, despite the fact
 		that the original reader has not yet completed.
 	\end{enumerate}
 } \QuickQuizEnd

 \begin{figure}[htbp]
 { \scriptsize
 \begin{verbbox}
   1 init {
   2   int i;
   3
   4   atomic {
   5     ctr[idx] = 1;
   6     ctr[!idx] = 0;
   7     i = 0;
   8     do
   9     :: i < N_QRCU_READERS ->
  10       readerprogress[i] = 0;
  11       run qrcu_reader(i);
  12       i++
  13     :: i >= N_QRCU_READERS -> break
  14     od;
  15     i = 0;
  16     do
  17     :: i < N_QRCU_UPDATERS ->
  18       run qrcu_updater(i);
  19       i++
  20     :: i >= N_QRCU_UPDATERS -> break
  21     od
  22   }
  23 }
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{QRCU Initialization Process}
 \label{fig:analysis:QRCU Initialization Process}
 \end{figure}

 All that remains is the initialization block shown in
 Figure~\ref{fig:analysis:QRCU Initialization Process}.
 This block simply initializes the counter pair on lines~5-6,
 spawns the reader processes on lines~7-14, and spawns the updater
 processes on lines~15-21.
 This is all done within an atomic block to reduce state space.

 \subsubsection{Running the QRCU Example}
 \label{sec:formal:Running the QRCU Example}

 To run the QRCU example, combine the code fragments in the previous
 section into a single file named \path{qrcu.spin}, and place the definitions
 for \co{spin_lock()} and \co{spin_unlock()} into a file named
 \path{lock.h}.
 Then use the following commands to build and run the QRCU model:

 \vspace{5pt}
 \begin{minipage}[t]{\columnwidth}
 \scriptsize
 \begin{verbatim}
 spin -a qrcu.spin
 cc -DSAFETY -o pan pan.c
 ./pan
 \end{verbatim}
 \end{minipage}
 \vspace{5pt}

 \begin{table}
 \footnotesize
 \centering
 \begin{tabular}{c|r|r|r}
 	updaters &
 	    readers &
 		   \# states & MB \\
 	\hline
 	1 & 1 &         376 &      2.6 \\
 	\hline
 	1 & 2 &       6,177 &      2.9 \\
 	\hline
 	1 & 3 &      82,127 &      7.5 \\
 	\hline
 	2 & 1 &      29,399 &      4.5 \\
 	\hline
 	2 & 2 &   1,071,180 &     75.4 \\
 	\hline
 	2 & 3 &  33,866,700 &  2,715.2 \\
 	\hline
 	3 & 1 &     258,605 &     22.3 \\
 	\hline
 	3 & 2 & 169,533,000 & 14,979.9 \\
 \end{tabular}
 \caption{Memory Usage of QRCU Model}
 \label{tab:advsync:Memory Usage of QRCU Model}
 \end{table}

 The resulting output shows that this model passes all of the cases
 shown in
 Table~\ref{tab:advsync:Memory Usage of QRCU Model}.
 Now, it would be nice to run the case with three readers and three
 updaters, however, simple extrapolation indicates that this will
 require on the order of a terabyte of memory best case.
 So, what to do?
 Here are some possible approaches:

 \begin{enumerate}
 \item	See whether a smaller number of readers and updaters suffice
 	to prove the general case.
 \item	Manually construct a proof of correctness.
 \item	Use a more capable tool.
 \item	Divide and conquer.
 \end{enumerate}

 The following sections discuss each of these approaches.

 \subsubsection{How Many Readers and Updaters Are Really Needed?}
 \label{sec:formal:How Many Readers and Updaters Are Really Needed?}

 One approach is to look carefully at the Promela code for
 \co{qrcu_updater()} and notice that the only global state
 change is happening under the lock.
 Therefore, only one updater at a time can possibly be modifying
 state visible to either readers or other updaters.
 This means that any sequences of state changes can be carried
 out serially by a single updater due to the fact that Promela does a full
 state-space search.
 Therefore, at most two updaters are required: one to change state
 and a second to become confused.

 The situation with the readers is less clear-cut, as each reader
 does only a single read-side critical section then terminates.
 It is possible to argue that the useful number of readers is limited,
 due to the fact that the fastpath must see at most a zero and a one
 in the counters.
 This is a fruitful avenue of investigation, in fact, it leads to
 the full proof of correctness described in the next section.

 \subsubsection{Alternative Approach: Proof of Correctness}
 \label{sec:formal:Alternative Approach: Proof of Correctness}

 An informal proof~\cite{PaulMcKenney2007QRCUpatch}
 follows:

 \begin{enumerate}
 \item	For \co{synchronize_qrcu()} to exit too early, then
 	by definition there must have been at least one reader
 	present during \co{synchronize_qrcu()}'s full
 	execution.
 \item	The counter corresponding to this reader will have been
 	at least 1 during this time interval.
 \item	The \co{synchronize_qrcu()} code forces at least one
 	of the counters to be at least 1 at all times.
 \item	Therefore, at any given point in time, either one of the
 	counters will be at least 2, or both of the counters will
 	be at least one.
 \item	However, the \co{synchronize_qrcu()} fastpath code
 	can read only one of the counters at a given time.
 	It is therefore possible for the fastpath code to fetch
 	the first counter while zero, but to race with a counter
 	flip so that the second counter is seen as one.
 \item	There can be at most one reader persisting through such
 	a race condition, as otherwise the sum would be two or
 	greater, which would cause the updater to take the slowpath.
 \item	But if the race occurs on the fastpath's first read of the
 	counters, and then again on its second read, there have
 	to have been two counter flips.
 \item	Because a given updater flips the counter only once, and
 	because the update-side lock prevents a pair of updaters
 	from concurrently flipping the counters, the only way that
 	the fastpath code can race with a flip twice is if the
 	first updater completes.
 \item	But the first updater will not complete until after all
 	pre-existing readers have completed.
 \item	Therefore, if the fastpath races with a counter flip
 	twice in succession, all pre-existing readers must have
 	completed, so that it is safe to take the fastpath.
 \end{enumerate}

 Of course, not all parallel algorithms have such simple proofs.
 In such cases, it may be necessary to enlist more capable tools.

 \subsubsection{Alternative Approach: More Capable Tools}
 \label{sec:formal:Alternative Approach: More Capable Tools}

 Although Promela and Spin are quite useful,
 much more capable tools are available, particularly for verifying
 hardware.
 This means that if it is possible to translate your algorithm
 to the hardware-design VHDL language, as it often will be for
 low-level parallel algorithms, then it is possible to apply these
 tools to your code (for example, this was done for the first
 realtime RCU algorithm).
 However, such tools can be quite expensive.

 Although the advent of commodity multiprocessing
 might eventually result in powerful free-software model-checkers
 featuring fancy state-space-reduction capabilities,
 this does not help much in the here and now.

 As an aside, there are Spin features that support approximate searches
 that require fixed amounts of memory, however, I have never been able
 to bring myself to trust approximations when verifying parallel
 algorithms.

 Another approach might be to divide and conquer.

 \subsubsection{Alternative Approach: Divide and Conquer}
 \label{sec:formal:Alternative Approach: Divide and Conquer}

 It is often possible to break down a larger parallel algorithm into
 smaller pieces, which can then be proven separately.
 For example, a 10-billion-state model might be broken into a pair
 of 100,000-state models.
 Taking this approach not only makes it easier for tools such as
 Promela to verify your algorithms, it can also make your algorithms
 easier to understand.

 \subsubsection{Is QRCU Really Correct?}
 \label{sec:formal:Is QRCU Really Correct?}

 Is QRCU really correct?
 We have a Promela-based mechanical proof and a by-hand proof that both
 say that it is.
 However, a recent paper by Alglave et al.~\cite{JadeAlglave2013-cav}
 says otherwise (see Section~5.1 of the paper at the bottom of page~12).
 Which is it?

 It turns out that both are correct!
 When QRCU was added to a suite of formal-verification benchmarks,
 its memory barriers were omitted, thus resulting in a buggy version
 of QRCU.
 So the real news here is that a number of formal-verification tools
 incorrectly proved this buggy QRCU correct.
 And this is why formal-verification tools themselves should be tested
 using bug-injected versions of the code being verified.
 If a given tool cannot find the injected bugs, then that tool is
 clearly untrustworthy.

 \QuickQuiz{}
 	But different formal-verification tools are often designed to
 	locate particular classes of bugs.
 	For example, very few formal-verification tools will find
 	an error in the specification.
 	So isn't this ``clearly untrustworthy'' judgment a bit harsh?
 \QuickQuizAnswer{
 	It is certainly true that many formal-verification tools are
 	specialized in some way.
 	For example, Promela does not handle realistic memory models
 	(though they can be programmed into
 	Promela~\cite{Desnoyers:2013:MSM:2506164.2506174}),
 	CBMC~\cite{EdmundClarke2004CBMC} does not detect probabilistic
 	hangs and deadlocks, and
 	Nidhugg~\cite{CarlLeonardsson2014Nidhugg} does not detect
 	bugs involving data nondeterminism.
 	But this means that that these tools cannot be trusted to find
 	bugs that they are not designed to locate.

 	And therefore people creating formal-verification tools should
 	``tell the truth on the label'', clearly calling out what
 	classes of bugs their tools can and cannot detect.
 	Otherwise, the first time a practitioner finds a tool
 	failing to detect a bug, that practitioner is likely to
 	make extremely harsh and extremely public denunciations
 	of that tool.
 	Yes, yes, there is something to be said for putting your
 	best foot forward, but putting it too far forward without
 	appropriate disclaimers can easily trigger a land mine of
 	negative reaction that your tool might or might not be able
 	to recover from.

 	You have been warned!
 } \QuickQuizEnd

 Therefore, if you do intend to use QRCU, please take care.
 Its proofs of correctness might or might not themselves be correct.
 Which is one reason why formal verification is unlikely to
 completely replace testing, as Donald Knuth pointed out so long ago.

 \QuickQuiz{}
 	Given that we have two independent proofs of correctness for
 	the QRCU algorithm described herein, and given that the
 	proof of incorrectness covers what is known to be a different
 	algorithm, why is there any room for doubt?
 \QuickQuizAnswer{
 	There is always room for doubt.
 	In this case, it is important to keep in mind that the two proofs
 	of correctness preceded the formalization of real-world memory
 	models, raising the possibility that these two proofs are based
 	on incorrect memory-ordering assumptions.
 	Furthermore, since both proofs were constructed by the same person,
 	it is quite possible that they contain a common error.
 	Again, there is always room for doubt.
 } \QuickQuizEnd