| % formal/spinhint.html |
| |
| \section{General-Purpose State-Space Search} |
| \label{sec:formal:General-Purpose State-Space Search} |
| |
| This section features the general-purpose Promela and spin tools, |
| which may be used to carry out a full |
| state-space search of many types of multi-threaded code. |
| They are also quite useful for verifying data communication protocols. |
| Section~\ref{sec:formal:Promela and Spin} |
| introduces Promela and spin, including a couple of warm-up exercises |
| verifying both non-atomic and atomic increment. |
| Section~\ref{sec:formal:How to Use Promela} |
| describes use of Promela, including example command lines and a |
| comparison of Promela syntax to that of C. |
| Section~\ref{sec:formal:Promela Example: Locking} |
| shows how Promela may be used to verify locking, |
| \ref{sec:formal:Promela Example: QRCU} |
| uses Promela to verify an unusual implementation of RCU named ``QRCU'', |
| and finally |
| Section~\ref{sec:formal:Promela Parable: dynticks and Preemptible RCU} |
| applies Promela to RCU's dyntick-idle implementation. |
| |
| \subsection{Promela and Spin} |
| \label{sec:formal:Promela and Spin} |
| |
| Promela is a language designed to help verify protocols, but which |
| can also be used to verify small parallel algorithms. |
| You recode your algorithm and correctness constraints in the C-like |
| language Promela, and then use Spin to translate it into a C program |
| that you can compile and run. |
| The resulting program conducts a full state-space search of your |
| algorithm, either verifying or finding counter-examples for |
| assertions that you can include in your Promela program. |
| |
| This full-state search can be extremely powerful, but can also be a two-edged |
| sword. |
| If your algorithm is too complex or your Promela implementation is |
| careless, there might be more states than fit in memory. |
| Furthermore, even given sufficient memory, the state-space search might |
| well run for longer than the expected lifetime of the universe. |
| Therefore, use this tool for compact but complex parallel algorithms. |
| Attempts to naively apply it to even moderate-scale algorithms (let alone |
| the full Linux kernel) will end badly. |
| |
| Promela and Spin may be downloaded from |
| \url{http://spinroot.com/spin/whatispin.html}. |
| |
| The above site also gives links to Gerard Holzmann's excellent |
| book~\cite{Holzmann03a} on Promela and Spin, |
| as well as searchable online references starting at: |
| \url{http://www.spinroot.com/spin/Man/index.html}. |
| |
| The remainder of this section describes how to use Promela to debug |
| parallel algorithms, starting with simple examples and progressing to |
| more complex uses. |
| |
| \subsubsection{Promela Warm-Up: Non-Atomic Increment} |
| \label{sec:formal:Promela Warm-Up: Non-Atomic Increment} |
| |
| \begin{figure}[tbp] |
| { \scriptsize |
| \begin{verbbox} |
| 1 #define NUMPROCS 2 |
| 2 |
| 3 byte counter = 0; |
| 4 byte progress[NUMPROCS]; |
| 5 |
| 6 proctype incrementer(byte me) |
| 7 { |
| 8 int temp; |
| 9 |
| 10 temp = counter; |
| 11 counter = temp + 1; |
| 12 progress[me] = 1; |
| 13 } |
| 14 |
| 15 init { |
| 16 int i = 0; |
| 17 int sum = 0; |
| 18 |
| 19 atomic { |
| 20 i = 0; |
| 21 do |
| 22 :: i < NUMPROCS -> |
| 23 progress[i] = 0; |
| 24 run incrementer(i); |
| 25 i++ |
| 26 :: i >= NUMPROCS -> break |
| 27 od; |
| 28 } |
| 29 atomic { |
| 30 i = 0; |
| 31 sum = 0; |
| 32 do |
| 33 :: i < NUMPROCS -> |
| 34 sum = sum + progress[i]; |
| 35 i++ |
| 36 :: i >= NUMPROCS -> break |
| 37 od; |
| 38 assert(sum < NUMPROCS || counter == NUMPROCS) |
| 39 } |
| 40 } |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{Promela Code for Non-Atomic Increment} |
| \label{fig:analysis:Promela Code for Non-Atomic Increment} |
| \end{figure} |
| |
| Figure~\ref{fig:analysis:Promela Code for Non-Atomic Increment} |
| demonstrates the textbook race condition |
| resulting from non-atomic increment. |
| Line~1 defines the number of processes to run (we will vary this |
| to see the effect on state space), line~3 defines the counter, |
| and line~4 is used to implement the assertion that appears on |
| lines~29-39. |
| |
| Lines~6-13 define a process that increments the counter non-atomically. |
| The argument \co{me} is the process number, set by the initialization |
| block later in the code. |
| Because simple Promela statements are each assumed atomic, we must |
| break the increment into the two statements on lines~10-11. |
| The assignment on line~12 marks the process's completion. |
| Because the Spin system will fully search the state space, including |
| all possible sequences of states, there is no need for the loop |
| that would be used for conventional testing. |
| |
| Lines~15-40 are the initialization block, which is executed first. |
| Lines~19-28 actually do the initialization, while lines~29-39 |
| perform the assertion. |
| Both are atomic blocks in order to avoid unnecessarily increasing |
| the state space: because they are not part of the algorithm proper, |
| we lose no verification coverage by making them atomic. |
| |
| The do-od construct on lines~21-27 implements a Promela loop, |
| which can be thought of as a C {\tt for (;;)} loop containing a |
| \co{switch} statement that allows expressions in case labels. |
| The condition blocks (prefixed by {\tt ::}) |
| are scanned non-deterministically, |
| though in this case only one of the conditions can possibly hold at a given |
| time. |
| The first block of the do-od from lines~22-25 initializes the i-th |
| incrementer's progress cell, runs the i-th incrementer's process, and |
| then increments the variable \co{i}. |
| The second block of the do-od on line~26 exits the loop once |
| these processes have been started. |
| |
| The atomic block on lines~29-39 also contains a similar do-od |
| loop that sums up the progress counters. |
| The {\tt assert()} statement on line~38 verifies that if all processes |
| have been completed, then all counts have been correctly recorded. |
| |
| You can build and run this program as follows: |
| |
| \vspace{5pt} |
| \begin{minipage}[t]{\columnwidth} |
| \scriptsize |
| \begin{verbatim} |
| spin -a increment.spin # Translate the model to C |
| cc -DSAFETY -o pan pan.c # Compile the model |
| ./pan # Run the model |
| \end{verbatim} |
| \end{minipage} |
| \vspace{5pt} |
| |
| \begin{figure*}[tbp] |
| { \scriptsize |
| \begin{verbbox} |
| pan: assertion violated ((sum<2)||(counter==2)) (at depth 20) |
| pan: wrote increment.spin.trail |
| (Spin Version 4.2.5 -- 2 April 2005) |
| Warning: Search not completed |
| + Partial Order Reduction |
| |
| Full statespace search for: |
| never claim - (none specified) |
| assertion violations + |
| cycle checks - (disabled by -DSAFETY) |
| invalid end states + |
| |
| State-vector 40 byte, depth reached 22, errors: 1 |
| 45 states, stored |
| 13 states, matched |
| 58 transitions (= stored+matched) |
| 51 atomic steps |
| hash conflicts: 0 (resolved) |
| |
| 2.622 memory usage (Mbyte) |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{Non-Atomic Increment spin Output} |
| \label{fig:analysis:Non-Atomic Increment spin Output} |
| \end{figure*} |
| |
| This will produce output as shown in |
| Figure~\ref{fig:analysis:Non-Atomic Increment spin Output}. |
| The first line tells us that our assertion was violated (as expected |
| given the non-atomic increment!). |
| The second line that a \co{trail} file was written describing how the |
| assertion was violated. |
| The ``Warning'' line reiterates that all was not well with our model. |
| The second paragraph describes the type of state-search being carried out, |
| in this case for assertion violations and invalid end states. |
| The third paragraph gives state-size statistics: this small model had only |
| 45 states. |
| The final line shows memory usage. |
| |
| The \co{trail} file may be rendered human-readable as follows: |
| |
| \vspace{5pt} |
| \begin{minipage}[t]{\columnwidth} |
| \scriptsize |
| \begin{verbatim} |
| spin -t -p increment.spin |
| \end{verbatim} |
| \end{minipage} |
| \vspace{5pt} |
| |
| \begin{figure*}[htbp] |
| { \scriptsize |
| \begin{verbbox} |
| Starting :init: with pid 0 |
| 1: proc 0 (:init:) line 20 "increment.spin" (state 1) [i = 0] |
| 2: proc 0 (:init:) line 22 "increment.spin" (state 2) [((i<2))] |
| 2: proc 0 (:init:) line 23 "increment.spin" (state 3) [progress[i] = 0] |
| Starting incrementer with pid 1 |
| 3: proc 0 (:init:) line 24 "increment.spin" (state 4) [(run incrementer(i))] |
| 3: proc 0 (:init:) line 25 "increment.spin" (state 5) [i = (i+1)] |
| 4: proc 0 (:init:) line 22 "increment.spin" (state 2) [((i<2))] |
| 4: proc 0 (:init:) line 23 "increment.spin" (state 3) [progress[i] = 0] |
| Starting incrementer with pid 2 |
| 5: proc 0 (:init:) line 24 "increment.spin" (state 4) [(run incrementer(i))] |
| 5: proc 0 (:init:) line 25 "increment.spin" (state 5) [i = (i+1)] |
| 6: proc 0 (:init:) line 26 "increment.spin" (state 6) [((i>=2))] |
| 7: proc 0 (:init:) line 21 "increment.spin" (state 10) [break] |
| 8: proc 2 (incrementer) line 10 "increment.spin" (state 1) [temp = counter] |
| 9: proc 1 (incrementer) line 10 "increment.spin" (state 1) [temp = counter] |
| 10: proc 2 (incrementer) line 11 "increment.spin" (state 2) [counter = (temp+1)] |
| 11: proc 2 (incrementer) line 12 "increment.spin" (state 3) [progress[me] = 1] |
| 12: proc 2 terminates |
| 13: proc 1 (incrementer) line 11 "increment.spin" (state 2) [counter = (temp+1)] |
| 14: proc 1 (incrementer) line 12 "increment.spin" (state 3) [progress[me] = 1] |
| 15: proc 1 terminates |
| 16: proc 0 (:init:) line 30 "increment.spin" (state 12) [i = 0] |
| 16: proc 0 (:init:) line 31 "increment.spin" (state 13) [sum = 0] |
| 17: proc 0 (:init:) line 33 "increment.spin" (state 14) [((i<2))] |
| 17: proc 0 (:init:) line 34 "increment.spin" (state 15) [sum = (sum+progress[i])] |
| 17: proc 0 (:init:) line 35 "increment.spin" (state 16) [i = (i+1)] |
| 18: proc 0 (:init:) line 33 "increment.spin" (state 14) [((i<2))] |
| 18: proc 0 (:init:) line 34 "increment.spin" (state 15) [sum = (sum+progress[i])] |
| 18: proc 0 (:init:) line 35 "increment.spin" (state 16) [i = (i+1)] |
| 19: proc 0 (:init:) line 36 "increment.spin" (state 17) [((i>=2))] |
| 20: proc 0 (:init:) line 32 "increment.spin" (state 21) [break] |
| spin: line 38 "increment.spin", Error: assertion violated |
| spin: text of failed assertion: assert(((sum<2)||(counter==2))) |
| 21: proc 0 (:init:) line 38 "increment.spin" (state 22) [assert(((sum<2)||(counter==2)))] |
| spin: trail ends after 21 steps |
| #processes: 1 |
| counter = 1 |
| progress[0] = 1 |
| progress[1] = 1 |
| 21: proc 0 (:init:) line 40 "increment.spin" (state 24) <valid end state> |
| 3 processes created |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{Non-Atomic Increment Error Trail} |
| \label{fig:analysis:Non-Atomic Increment Error Trail} |
| \end{figure*} |
| |
| This gives the output shown in |
| Figure~\ref{fig:analysis:Non-Atomic Increment Error Trail}. |
| As can be seen, the first portion of the init block created both |
| incrementer processes, both of which first fetched the counter, |
| then both incremented and stored it, losing a count. |
| The assertion then triggered, after which the global state is displayed. |
| |
| \subsubsection{Promela Warm-Up: Atomic Increment} |
| \label{sec:formal:Promela Warm-Up: Atomic Increment} |
| |
| \begin{figure}[htbp] |
| { \scriptsize |
| \begin{verbbox} |
| 1 proctype incrementer(byte me) |
| 2 { |
| 3 int temp; |
| 4 |
| 5 atomic { |
| 6 temp = counter; |
| 7 counter = temp + 1; |
| 8 } |
| 9 progress[me] = 1; |
| 10 } |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{Promela Code for Atomic Increment} |
| \label{fig:analysis:Promela Code for Atomic Increment} |
| \end{figure} |
| |
| \begin{figure}[htbp] |
| { \scriptsize |
| \begin{verbbox} |
| (Spin Version 4.2.5 -- 2 April 2005) |
| + Partial Order Reduction |
| |
| Full statespace search for: |
| never claim - (none specified) |
| assertion violations + |
| cycle checks - (disabled by -DSAFETY) |
| invalid end states + |
| |
| State-vector 40 byte, depth reached 20, errors: 0 |
| 52 states, stored |
| 21 states, matched |
| 73 transitions (= stored+matched) |
| 66 atomic steps |
| hash conflicts: 0 (resolved) |
| |
| 2.622 memory usage (Mbyte) |
| |
| unreached in proctype incrementer |
| (0 of 5 states) |
| unreached in proctype :init: |
| (0 of 24 states) |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{Atomic Increment spin Output} |
| \label{fig:analysis:Atomic Increment spin Output} |
| \end{figure} |
| |
| It is easy to fix this example by placing the body of the incrementer |
| processes in an atomic blocks as shown in |
| Figure~\ref{fig:analysis:Promela Code for Atomic Increment}. |
| One could also have simply replaced the pair of statements with |
| {\tt counter = counter + 1}, because Promela statements are |
| atomic. |
| Either way, running this modified model gives us an error-free traversal |
| of the state space, as shown in |
| Figure~\ref{fig:analysis:Atomic Increment spin Output}. |
| |
| \begin{table} |
| \footnotesize |
| \centering |
| \begin{tabular}{c|r|r} |
| \# incrementers & \# states & megabytes \\ |
| \hline |
| \hline |
| 1 & 11 & 2.6 \\ |
| \hline |
| 2 & 52 & 2.6 \\ |
| \hline |
| 3 & 372 & 2.6 \\ |
| \hline |
| 4 & 3,496 & 2.7 \\ |
| \hline |
| 5 & 40,221 & 5.0 \\ |
| \hline |
| 6 & 545,720 & 40.5 \\ |
| \hline |
| 7 & 8,521,450 & 652.7 \\ |
| \end{tabular} |
| \caption{Memory Usage of Increment Model} |
| \label{tab:advsync:Memory Usage of Increment Model} |
| \end{table} |
| |
| Table~\ref{tab:advsync:Memory Usage of Increment Model} |
| shows the number of states and memory consumed |
| as a function of number of incrementers modeled |
| (by redefining {\tt NUMPROCS}): |
| |
| Running unnecessarily large models is thus subtly discouraged, although |
| 652MB is well within the limits of modern desktop and laptop machines. |
| |
| With this example under our belt, let's take a closer look at the |
| commands used to analyze Promela models and then look at more |
| elaborate examples. |
| |
| \subsection{How to Use Promela} |
| \label{sec:formal:How to Use Promela} |
| |
| Given a source file \path{qrcu.spin}, one can use the following commands: |
| |
| \begin{description}[style=nextline] |
| \item [\tco{spin -a qrcu.spin}] |
| Create a file \path{pan.c} that fully searches the state machine. |
| \item [\tco{cc -DSAFETY -o pan pan.c}] |
| Compile the generated state-machine search. The \co{-DSAFETY} |
| generates optimizations that are appropriate if you have only |
| assertions (and perhaps \co{never} statements). If you have |
| liveness, fairness, or forward-progress checks, you may need |
| to compile without \co{-DSAFETY}. If you leave off \co{-DSAFETY} |
| when you could have used it, the program will let you know. |
| |
| The optimizations produced by \co{-DSAFETY} greatly speed things |
| up, so you should use it when you can. |
| An example situation where you cannot use \co{-DSAFETY} is |
| when checking for livelocks (AKA ``non-progress cycles'') |
| via \co{-DNP}. |
| \item [\tco{./pan}] |
| This actually searches the state space. The number of states |
| can reach into the tens of millions with very small state |
| machines, so you will need a machine with large memory. |
| For example, \path{qrcu.spin} with 3 readers and 2 updaters required |
| 2.7GB of memory. |
| |
| If you aren't sure whether your machine has enough memory, |
| run \co{top} in one window and \co{./pan} in another. Keep the |
| focus on the \co{./pan} window so that you can quickly kill |
| execution if need be. As soon as CPU time drops much below |
| 100\%, kill \co{./pan}. If you have removed focus from the |
| window running \co{./pan}, you may wait a long time for the |
| windowing system to grab enough memory to do anything for |
| you. |
| |
| Don't forget to capture the output, especially |
| if you are working on a remote machine. |
| |
| If your model includes forward-progress checks, you will likely |
| need to enable ``weak fairness'' via the \co{-f} command-line |
| argument to \co{./pan}. |
| If your forward-progress checks involve \co{accept} labels, |
| you will also need the \co{-a} argument. |
| % forward reference to model: formal.2009.02.19a in |
| % /home/linux/git/userspace-rcu/formal-model. |
| \item [\tco{spin -t -p qrcu.spin}] |
| Given \co{trail} file output by a run that encountered an |
| error, output the sequence of steps leading to that error. |
| The \co{-g} flag will also include the values of changed |
| global variables, and the \co{-l} flag will also include |
| the values of changed local variables. |
| \end{description} |
| |
| \subsubsection{Promela Peculiarities} |
| \label{sec:formal:Promela Peculiarities} |
| |
| Although all computer languages have underlying similarities, |
| Promela will provide some surprises to people used to coding in C, |
| C++, or Java. |
| |
| \begin{enumerate} |
| \item In C, ``\co{;}'' terminates statements. In Promela it separates them. |
| Fortunately, more recent versions of Spin have become |
| much more forgiving of ``extra'' semicolons. |
| \item Promela's looping construct, the \co{do} statement, takes |
| conditions. |
| This \co{do} statement closely resembles a looping if-then-else |
| statement. |
| \item In C's \co{switch} statement, if there is no matching case, the whole |
| statement is skipped. In Promela's equivalent, confusingly called |
| \co{if}, if there is no matching guard expression, you get an error |
| without a recognizable corresponding error message. |
| So, if the error output indicates an innocent line of code, |
| check to see if you left out a condition from an \co{if} or \co{do} |
| statement. |
| \item When creating stress tests in C, one usually races suspect operations |
| against each other repeatedly. In Promela, one instead sets up |
| a single race, because Promela will search out all the possible |
| outcomes from that single race. Sometimes you do need to loop |
| in Promela, for example, if multiple operations overlap, but |
| doing so greatly increases the size of your state space. |
| \item In C, the easiest thing to do is to maintain a loop counter to track |
| progress and terminate the loop. In Promela, loop counters |
| must be avoided like the plague because they cause the state |
| space to explode. On the other hand, there is no penalty for |
| infinite loops in Promela as long as none of the variables |
| monotonically increase or decrease---Promela will figure out |
| how many passes through the loop really matter, and automatically |
| prune execution beyond that point. |
| \item In C torture-test code, it is often wise to keep per-task control |
| variables. They are cheap to read, and greatly aid in debugging the |
| test code. In Promela, per-task control variables should be used |
| only when there is no other alternative. To see this, consider |
| a 5-task verification with one bit each to indicate completion. |
| This gives 32 states. In contrast, a simple counter would have |
| only six states, more than a five-fold reduction. That factor |
| of five might not seem like a problem, at least not until you |
| are struggling with a verification program possessing more than |
| 150 million states consuming more than 10GB of memory! |
| \item One of the most challenging things both in C torture-test code and |
| in Promela is formulating good assertions. Promela also allows |
| \co{never} claims that act sort of like an assertion replicated |
| between every line of code. |
| \item Dividing and conquering is extremely helpful in Promela in keeping |
| the state space under control. Splitting a large model into two |
| roughly equal halves will result in the state space of each |
| half being roughly the square root of the whole. |
| For example, a million-state combined model might reduce to a |
| pair of thousand-state models. |
| Not only will Promela handle the two smaller models much more |
| quickly with much less memory, but the two smaller algorithms |
| are easier for people to understand. |
| \end{enumerate} |
| |
| |
| \subsubsection{Promela Coding Tricks} |
| \label{sec:formal:Promela Coding Tricks} |
| |
| Promela was designed to analyze protocols, so using it on parallel programs |
| is a bit abusive. |
| The following tricks can help you to abuse Promela safely: |
| |
| \begin{enumerate} |
| \item Memory reordering. Suppose you have a pair of statements |
| copying globals x and y to locals r1 and r2, where ordering |
| matters (e.g., unprotected by locks), but where you have |
| no memory barriers. This can be modeled in Promela as follows: |
| |
| \vspace{5pt} |
| \begin{minipage}[t]{\columnwidth} |
| \scriptsize |
| \begin{verbatim} |
| 1 if |
| 2 :: 1 -> r1 = x; |
| 3 r2 = y |
| 4 :: 1 -> r2 = y; |
| 5 r1 = x |
| 6 fi |
| \end{verbatim} |
| \end{minipage} |
| \vspace{5pt} |
| |
| The two branches of the \co{if} statement will be selected |
| nondeterministically, since they both are available. |
| Because the full state space is searched, \emph{both} choices |
| will eventually be made in all cases. |
| |
| Of course, this trick will cause your state space to explode |
| if used too heavily. |
| In addition, it requires you to anticipate possible reorderings. |
| |
| \begin{figure}[tbp] |
| { \scriptsize |
| \begin{verbbox} |
| 1 i = 0; |
| 2 sum = 0; |
| 3 do |
| 4 :: i < N_QRCU_READERS -> |
| 5 sum = sum + (readerstart[i] == 1 && |
| 6 readerprogress[i] == 1); |
| 7 i++ |
| 8 :: i >= N_QRCU_READERS -> |
| 9 assert(sum == 0); |
| 10 break |
| 11 od |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{Complex Promela Assertion} |
| \label{fig:analysis:Complex Promela Assertion} |
| \end{figure} |
| |
| \begin{figure}[tbp] |
| { \scriptsize |
| \begin{verbbox} |
| 1 atomic { |
| 2 i = 0; |
| 3 sum = 0; |
| 4 do |
| 5 :: i < N_QRCU_READERS -> |
| 6 sum = sum + (readerstart[i] == 1 && |
| 7 readerprogress[i] == 1); |
| 8 i++ |
| 9 :: i >= N_QRCU_READERS -> |
| 10 assert(sum == 0); |
| 11 break |
| 12 od |
| 13 } |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{Atomic Block for Complex Promela Assertion} |
| \label{fig:analysis:Atomic Block for Complex Promela Assertion} |
| \end{figure} |
| |
| \item State reduction. If you have complex assertions, evaluate |
| them under \co{atomic}. After all, they are not part of the |
| algorithm. One example of a complex assertion (to be discussed |
| in more detail later) is as shown in |
| Figure~\ref{fig:analysis:Complex Promela Assertion}. |
| |
| There is no reason to evaluate this assertion |
| non-atomically, since it is not actually part of the algorithm. |
| Because each statement contributes to state, we can reduce |
| the number of useless states by enclosing it in an \co{atomic} |
| block as shown in |
| Figure~\ref{fig:analysis:Atomic Block for Complex Promela Assertion}. |
| |
| \item Promela does not provide functions. |
| You must instead use C preprocessor macros. |
| However, you must use them carefully in order to avoid |
| combinatorial explosion. |
| \end{enumerate} |
| |
| Now we are ready for more complex examples. |
| |
| \subsection{Promela Example: Locking} |
| \label{sec:formal:Promela Example: Locking} |
| |
| \begin{figure}[tbp] |
| { \scriptsize |
| \begin{verbbox} |
| 1 #define spin_lock(mutex) \ |
| 2 do \ |
| 3 :: 1 -> atomic { \ |
| 4 if \ |
| 5 :: mutex == 0 -> \ |
| 6 mutex = 1; \ |
| 7 break \ |
| 8 :: else -> skip \ |
| 9 fi \ |
| 10 } \ |
| 11 od |
| 12 |
| 13 #define spin_unlock(mutex) \ |
| 14 mutex = 0 |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{Promela Code for Spinlock} |
| \label{fig:analysis:Promela Code for Spinlock} |
| \end{figure} |
| |
| Since locks are generally useful, \co{spin_lock()} and |
| \co{spin_unlock()} |
| macros are provided in \path{lock.h}, which may be included from |
| multiple Promela models, as shown in |
| Figure~\ref{fig:analysis:Promela Code for Spinlock}. |
| The \co{spin_lock()} macro contains an infinite do-od loop |
| spanning lines~2-11, |
| courtesy of the single guard expression of ``1'' on line~3. |
| The body of this loop is a single atomic block that contains |
| an if-fi statement. |
| The if-fi construct is similar to the do-od construct, except |
| that it takes a single pass rather than looping. |
| If the lock is not held on line~5, then line~6 acquires it and |
| line~7 breaks out of the enclosing do-od loop (and also exits |
| the atomic block). |
| On the other hand, if the lock is already held on line~8, |
| we do nothing (\co{skip}), and fall out of the if-fi and the |
| atomic block so as to take another pass through the outer |
| loop, repeating until the lock is available. |
| |
| The \co{spin_unlock()} macro simply marks the lock as no |
| longer held. |
| |
| Note that memory barriers are not needed because Promela assumes |
| full ordering. |
| In any given Promela state, all processes agree on both the current |
| state and the order of state changes that caused us to arrive at |
| the current state. |
| This is analogous to the ``sequentially consistent'' memory model |
| used by a few computer systems (such as MIPS and PA-RISC). |
| As noted earlier, and as will be seen in a later example, |
| weak memory ordering must be explicitly coded. |
| |
| \begin{figure}[tb] |
| { \scriptsize |
| \begin{verbbox} |
| 1 #include "lock.h" |
| 2 |
| 3 #define N_LOCKERS 3 |
| 4 |
| 5 bit mutex = 0; |
| 6 bit havelock[N_LOCKERS]; |
| 7 int sum; |
| 8 |
| 9 proctype locker(byte me) |
| 10 { |
| 11 do |
| 12 :: 1 -> |
| 13 spin_lock(mutex); |
| 14 havelock[me] = 1; |
| 15 havelock[me] = 0; |
| 16 spin_unlock(mutex) |
| 17 od |
| 18 } |
| 19 |
| 20 init { |
| 21 int i = 0; |
| 22 int j; |
| 23 |
| 24 end: do |
| 25 :: i < N_LOCKERS -> |
| 26 havelock[i] = 0; |
| 27 run locker(i); |
| 28 i++ |
| 29 :: i >= N_LOCKERS -> |
| 30 sum = 0; |
| 31 j = 0; |
| 32 atomic { |
| 33 do |
| 34 :: j < N_LOCKERS -> |
| 35 sum = sum + havelock[j]; |
| 36 j = j + 1 |
| 37 :: j >= N_LOCKERS -> |
| 38 break |
| 39 od |
| 40 } |
| 41 assert(sum <= 1); |
| 42 break |
| 43 od |
| 44 } |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{Promela Code to Test Spinlocks} |
| \label{fig:analysis:Promela Code to Test Spinlocks} |
| \end{figure} |
| |
| These macros are tested by the Promela code shown in |
| Figure~\ref{fig:analysis:Promela Code to Test Spinlocks}. |
| This code is similar to that used to test the increments, |
| with the number of locking processes defined by the \co{N_LOCKERS} |
| macro definition on line~3. |
| The mutex itself is defined on line~5, an array to track the lock owner |
| on line~6, and line~7 is used by assertion |
| code to verify that only one process holds the lock. |
| |
| The locker process is on lines~9-18, and simply loops forever |
| acquiring the lock on line~13, claiming it on line~14, |
| unclaiming it on line~15, and releasing it on line~16. |
| |
| The init block on lines~20-44 initializes the current locker's |
| havelock array entry on line~26, starts the current locker on |
| line~27, and advances to the next locker on line~28. |
| Once all locker processes are spawned, the do-od loop |
| moves to line~29, which checks the assertion. |
| Lines~30 and~31 initialize the control variables, |
| lines~32-40 atomically sum the havelock array entries, |
| line~41 is the assertion, and line~42 exits the loop. |
| |
| We can run this model by placing the above two code fragments into |
| files named \path{lock.h} and \path{lock.spin}, respectively, and then running |
| the following commands: |
| |
| \vspace{5pt} |
| \begin{minipage}[t]{\columnwidth} |
| \scriptsize |
| \begin{verbatim} |
| spin -a lock.spin |
| cc -DSAFETY -o pan pan.c |
| ./pan |
| \end{verbatim} |
| \end{minipage} |
| \vspace{5pt} |
| |
| \begin{figure}[htbp] |
| { \scriptsize |
| \begin{verbbox} |
| (Spin Version 4.2.5 -- 2 April 2005) |
| + Partial Order Reduction |
| |
| Full statespace search for: |
| never claim - (none specified) |
| assertion violations + |
| cycle checks - (disabled by -DSAFETY) |
| invalid end states + |
| |
| State-vector 40 byte, depth reached 357, errors: 0 |
| 564 states, stored |
| 929 states, matched |
| 1493 transitions (= stored+matched) |
| 368 atomic steps |
| hash conflicts: 0 (resolved) |
| |
| 2.622 memory usage (Mbyte) |
| |
| unreached in proctype locker |
| line 18, state 20, "-end-" |
| (1 of 20 states) |
| unreached in proctype :init: |
| (0 of 22 states) |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{Output for Spinlock Test} |
| \label{fig:analysis:Output for Spinlock Test} |
| \end{figure} |
| |
| The output will look something like that shown in |
| Figure~\ref{fig:analysis:Output for Spinlock Test}. |
| As expected, this run has no assertion failures (``errors: 0''). |
| |
| \QuickQuiz{} |
| Why is there an unreached statement in |
| locker? After all, isn't this a \emph{full} state-space |
| search? |
| \QuickQuizAnswer{ |
| The locker process is an infinite loop, so control |
| never reaches the end of this process. |
| However, since there are no monotonically increasing variables, |
| Promela is able to model this infinite loop with a small |
| number of states. |
| } \QuickQuizEnd |
| |
| \QuickQuiz{} |
| What are some Promela code-style issues with this example? |
| \QuickQuizAnswer{ |
| There are several: |
| \begin{enumerate} |
| \item The declaration of {\tt sum} should be moved to within |
| the init block, since it is not used anywhere else. |
| \item The assertion code should be moved outside of the |
| initialization loop. The initialization loop can |
| then be placed in an atomic block, greatly reducing |
| the state space (by how much?). |
| \item The atomic block covering the assertion code should |
| be extended to include the initialization of {\tt sum} |
| and {\tt j}, and also to cover the assertion. |
| This also reduces the state space (again, by how |
| much?). |
| \end{enumerate} |
| } \QuickQuizEnd |
| |
| |
| \subsection{Promela Example: QRCU} |
| \label{sec:formal:Promela Example: QRCU} |
| |
| This final example demonstrates a real-world use of Promela on Oleg |
| Nesterov's |
| QRCU~\cite{OlegNesterov2006QRCU,OlegNesterov2006aQRCU}, |
| but modified to speed up the \co{synchronize_qrcu()} |
| fastpath. |
| |
| But first, what is QRCU? |
| |
| QRCU is a variant of SRCU~\cite{PaulEMcKenney2006c} |
| that trades somewhat higher read overhead |
| (atomic increment and decrement on a global variable) for extremely |
| low grace-period latencies. |
| If there are no readers, the grace period will be detected in less |
| than a microsecond, compared to the multi-millisecond grace-period |
| latencies of most other RCU implementations. |
| |
| \begin{enumerate} |
| \item There is a \co{qrcu_struct} that defines a QRCU domain. |
| Like SRCU (and unlike other variants of RCU) QRCU's action |
| is not global, but instead focused on the specified |
| \co{qrcu_struct}. |
| \item There are \co{qrcu_read_lock()} and \co{qrcu_read_unlock()} |
| primitives that delimit QRCU read-side critical sections. |
| The corresponding \co{qrcu_struct} must be passed into |
| these primitives, and the return value from \co{rcu_read_lock()} |
| must be passed to \co{rcu_read_unlock()}. |
| |
| For example: |
| |
| \vspace{5pt} |
| \begin{minipage}[t]{\columnwidth} |
| \scriptsize |
| \begin{verbatim} |
| idx = qrcu_read_lock(&my_qrcu_struct); |
| /* read-side critical section. */ |
| qrcu_read_unlock(&my_qrcu_struct, idx); |
| \end{verbatim} |
| \end{minipage} |
| \vspace{5pt} |
| |
| \item There is a \co{synchronize_qrcu()} primitive that blocks until |
| all pre-existing QRCU read-side critical sections complete, |
| but, like SRCU's \co{synchronize_srcu()}, QRCU's |
| \co{synchronize_qrcu()} need wait only for those read-side |
| critical sections that are using the same \co{qrcu_struct}. |
| |
| For example, \co{synchronize_qrcu(&your_qrcu_struct)} |
| would \emph{not} need to wait on the earlier QRCU read-side |
| critical section. |
| In contrast, \co{synchronize_qrcu(&my_qrcu_struct)} |
| \emph{would} need to wait, since it shares the same |
| \co{qrcu_struct}. |
| \end{enumerate} |
| |
| A Linux-kernel patch for QRCU has been |
| produced~\cite{PaulMcKenney2007QRCUpatch}, |
| but has not yet been included in the Linux kernel as of |
| April 2008. |
| |
| \begin{figure}[htbp] |
| { \scriptsize |
| \begin{verbbox} |
| 1 #include "lock.h" |
| 2 |
| 3 #define N_QRCU_READERS 2 |
| 4 #define N_QRCU_UPDATERS 2 |
| 5 |
| 6 bit idx = 0; |
| 7 byte ctr[2]; |
| 8 byte readerprogress[N_QRCU_READERS]; |
| 9 bit mutex = 0; |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{QRCU Global Variables} |
| \label{fig:analysis:QRCU Global Variables} |
| \end{figure} |
| |
| Returning to the Promela code for QRCU, the global variables are as shown in |
| Figure~\ref{fig:analysis:QRCU Global Variables}. |
| This example uses locking, hence including \path{lock.h}. |
| Both the number of readers and writers can be varied using the |
| two \co{#define} statements, giving us not one but two ways to create |
| combinatorial explosion. |
| The \co{idx} variable controls which of the two elements of the \co{ctr} |
| array will be used by readers, and the \co{readerprogress} variable |
| allows an assertion to determine when all the readers are finished |
| (since a QRCU update cannot be permitted to complete until all |
| pre-existing readers have completed their QRCU read-side critical |
| sections). |
| The \co{readerprogress} array elements have values as follows, |
| indicating the state of the corresponding reader: |
| |
| \begin{enumerate} |
| \item 0: not yet started. |
| \item 1: within QRCU read-side critical section. |
| \item 2: finished with QRCU read-side critical section. |
| \end{enumerate} |
| |
| Finally, the \co{mutex} variable is used to serialize updaters' slowpaths. |
| |
| \begin{figure}[htbp] |
| { \scriptsize |
| \begin{verbbox} |
| 1 proctype qrcu_reader(byte me) |
| 2 { |
| 3 int myidx; |
| 4 |
| 5 do |
| 6 :: 1 -> |
| 7 myidx = idx; |
| 8 atomic { |
| 9 if |
| 10 :: ctr[myidx] > 0 -> |
| 11 ctr[myidx]++; |
| 12 break |
| 13 :: else -> skip |
| 14 fi |
| 15 } |
| 16 od; |
| 17 readerprogress[me] = 1; |
| 18 readerprogress[me] = 2; |
| 19 atomic { ctr[myidx]-- } |
| 20 } |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{QRCU Reader Process} |
| \label{fig:analysis:QRCU Reader Process} |
| \end{figure} |
| |
| QRCU readers are modeled by the \co{qrcu_reader()} process shown in |
| Figure~\ref{fig:analysis:QRCU Reader Process}. |
| A do-od loop spans lines~5-16, with a single guard of ``1'' |
| on line~6 that makes it an infinite loop. |
| Line~7 captures the current value of the global index, and lines~8-15 |
| atomically increment it (and break from the infinite loop) |
| if its value was non-zero (\co{atomic_inc_not_zero()}). |
| Line~17 marks entry into the RCU read-side critical section, and |
| line~18 marks exit from this critical section, both lines for the benefit of |
| the {\tt assert()} statement that we shall encounter later. |
| Line~19 atomically decrements the same counter that we incremented, |
| thereby exiting the RCU read-side critical section. |
| |
| \begin{figure}[htbp] |
| { \scriptsize |
| \begin{verbbox} |
| 1 #define sum_unordered \ |
| 2 atomic { \ |
| 3 do \ |
| 4 :: 1 -> \ |
| 5 sum = ctr[0]; \ |
| 6 i = 1; \ |
| 7 break \ |
| 8 :: 1 -> \ |
| 9 sum = ctr[1]; \ |
| 10 i = 0; \ |
| 11 break \ |
| 12 od; \ |
| 13 } \ |
| 14 sum = sum + ctr[i] |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{QRCU Unordered Summation} |
| \label{fig:analysis:QRCU Unordered Summation} |
| \end{figure} |
| |
| The C-preprocessor macro shown in |
| Figure~\ref{fig:analysis:QRCU Unordered Summation} |
| sums the pair of counters so as to emulate weak memory ordering. |
| Lines~2-13 fetch one of the counters, and line~14 fetches the other |
| of the pair and sums them. |
| The atomic block consists of a single do-od statement. |
| This do-od statement (spanning lines~3-12) is unusual in that |
| it contains two unconditional |
| branches with guards on lines~4 and~8, which causes Promela to |
| non-deterministically choose one of the two (but again, the full |
| state-space search causes Promela to eventually make all possible |
| choices in each applicable situation). |
| The first branch fetches the zero-th counter and sets \co{i} to 1 (so |
| that line~14 will fetch the first counter), while the second |
| branch does the opposite, fetching the first counter and setting \co{i} |
| to 0 (so that line~14 will fetch the second counter). |
| |
| \QuickQuiz{} |
| Is there a more straightforward way to code the do-od statement? |
| \QuickQuizAnswer{ |
| Yes. |
| Replace it with {\tt if-fi} and remove the two {\tt break} statements. |
| } \QuickQuizEnd |
| |
| \begin{figure}[htbp] |
| { \scriptsize |
| \begin{verbbox} |
| 1 proctype qrcu_updater(byte me) |
| 2 { |
| 3 int i; |
| 4 byte readerstart[N_QRCU_READERS]; |
| 5 int sum; |
| 6 |
| 7 do |
| 8 :: 1 -> |
| 9 |
| 10 /* Snapshot reader state. */ |
| 11 |
| 12 atomic { |
| 13 i = 0; |
| 14 do |
| 15 :: i < N_QRCU_READERS -> |
| 16 readerstart[i] = readerprogress[i]; |
| 17 i++ |
| 18 :: i >= N_QRCU_READERS -> |
| 19 break |
| 20 od |
| 21 } |
| 22 |
| 23 sum_unordered; |
| 24 if |
| 25 :: sum <= 1 -> sum_unordered |
| 26 :: else -> skip |
| 27 fi; |
| 28 if |
| 29 :: sum > 1 -> |
| 30 spin_lock(mutex); |
| 31 atomic { ctr[!idx]++ } |
| 32 idx = !idx; |
| 33 atomic { ctr[!idx]-- } |
| 34 do |
| 35 :: ctr[!idx] > 0 -> skip |
| 36 :: ctr[!idx] == 0 -> break |
| 37 od; |
| 38 spin_unlock(mutex); |
| 39 :: else -> skip |
| 40 fi; |
| 41 |
| 42 /* Verify reader progress. */ |
| 43 |
| 44 atomic { |
| 45 i = 0; |
| 46 sum = 0; |
| 47 do |
| 48 :: i < N_QRCU_READERS -> |
| 49 sum = sum + (readerstart[i] == 1 && |
| 50 readerprogress[i] == 1); |
| 51 i++ |
| 52 :: i >= N_QRCU_READERS -> |
| 53 assert(sum == 0); |
| 54 break |
| 55 od |
| 56 } |
| 57 od |
| 58 } |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{QRCU Updater Process} |
| \label{fig:analysis:QRCU Updater Process} |
| \end{figure} |
| |
| With the \co{sum_unordered} macro in place, we can now proceed |
| to the update-side process shown in |
| Figure~\ref{fig:analysis:QRCU Updater Process}. |
| The update-side process repeats indefinitely, with the corresponding |
| do-od loop ranging over lines~7-57. |
| Each pass through the loop first snapshots the global {\tt readerprogress} |
| array into the local {\tt readerstart} array on lines~12-21. |
| This snapshot will be used for the assertion on line~53. |
| Line~23 invokes \co{sum_unordered}, and then lines~24-27 |
| re-invoke \co{sum_unordered} if the fastpath is potentially |
| usable. |
| |
| Lines~28-40 execute the slowpath code if need be, with |
| lines~30 and~38 acquiring and releasing the update-side lock, |
| lines~31-33 flipping the index, and lines~34-37 waiting for |
| all pre-existing readers to complete. |
| |
| Lines~44-56 then compare the current values in the {\tt readerprogress} |
| array to those collected in the {\tt readerstart} array, |
| forcing an assertion failure should any readers that started before |
| this update still be in progress. |
| |
| \QuickQuiz{} |
| Why are there atomic blocks at lines~12-21 |
| and lines~44-56, when the operations within those atomic |
| blocks have no atomic implementation on any current |
| production microprocessor? |
| \QuickQuizAnswer{ |
| Because those operations are for the benefit of the |
| assertion only. They are not part of the algorithm itself. |
| There is therefore no harm in marking them atomic, and |
| so marking them greatly reduces the state space that must |
| be searched by the Promela model. |
| } \QuickQuizEnd |
| |
| \QuickQuiz{} |
| Is the re-summing of the counters on lines~24-27 |
| \emph{really} necessary? |
| \QuickQuizAnswer{ |
| Yes. To see this, delete these lines and run the model. |
| |
| Alternatively, consider the following sequence of steps: |
| |
| \begin{enumerate} |
| \item One process is within its RCU read-side critical |
| section, so that the value of {\tt ctr[0]} is zero and |
| the value of {\tt ctr[1]} is two. |
| \item An updater starts executing, and sees that the sum of |
| the counters is two so that the fastpath cannot be |
| executed. It therefore acquires the lock. |
| \item A second updater starts executing, and fetches the value |
| of {\tt ctr[0]}, which is zero. |
| \item The first updater adds one to {\tt ctr[0]}, flips |
| the index (which now becomes zero), then subtracts |
| one from {\tt ctr[1]} (which now becomes one). |
| \item The second updater fetches the value of {\tt ctr[1]}, |
| which is now one. |
| \item The second updater now incorrectly concludes that it |
| is safe to proceed on the fastpath, despite the fact |
| that the original reader has not yet completed. |
| \end{enumerate} |
| } \QuickQuizEnd |
| |
| \begin{figure}[htbp] |
| { \scriptsize |
| \begin{verbbox} |
| 1 init { |
| 2 int i; |
| 3 |
| 4 atomic { |
| 5 ctr[idx] = 1; |
| 6 ctr[!idx] = 0; |
| 7 i = 0; |
| 8 do |
| 9 :: i < N_QRCU_READERS -> |
| 10 readerprogress[i] = 0; |
| 11 run qrcu_reader(i); |
| 12 i++ |
| 13 :: i >= N_QRCU_READERS -> break |
| 14 od; |
| 15 i = 0; |
| 16 do |
| 17 :: i < N_QRCU_UPDATERS -> |
| 18 run qrcu_updater(i); |
| 19 i++ |
| 20 :: i >= N_QRCU_UPDATERS -> break |
| 21 od |
| 22 } |
| 23 } |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{QRCU Initialization Process} |
| \label{fig:analysis:QRCU Initialization Process} |
| \end{figure} |
| |
| All that remains is the initialization block shown in |
| Figure~\ref{fig:analysis:QRCU Initialization Process}. |
| This block simply initializes the counter pair on lines~5-6, |
| spawns the reader processes on lines~7-14, and spawns the updater |
| processes on lines~15-21. |
| This is all done within an atomic block to reduce state space. |
| |
| \subsubsection{Running the QRCU Example} |
| \label{sec:formal:Running the QRCU Example} |
| |
| To run the QRCU example, combine the code fragments in the previous |
| section into a single file named \path{qrcu.spin}, and place the definitions |
| for \co{spin_lock()} and \co{spin_unlock()} into a file named |
| \path{lock.h}. |
| Then use the following commands to build and run the QRCU model: |
| |
| \vspace{5pt} |
| \begin{minipage}[t]{\columnwidth} |
| \scriptsize |
| \begin{verbatim} |
| spin -a qrcu.spin |
| cc -DSAFETY -o pan pan.c |
| ./pan |
| \end{verbatim} |
| \end{minipage} |
| \vspace{5pt} |
| |
| \begin{table} |
| \footnotesize |
| \centering |
| \begin{tabular}{c|r|r|r} |
| updaters & |
| readers & |
| \# states & MB \\ |
| \hline |
| 1 & 1 & 376 & 2.6 \\ |
| \hline |
| 1 & 2 & 6,177 & 2.9 \\ |
| \hline |
| 1 & 3 & 82,127 & 7.5 \\ |
| \hline |
| 2 & 1 & 29,399 & 4.5 \\ |
| \hline |
| 2 & 2 & 1,071,180 & 75.4 \\ |
| \hline |
| 2 & 3 & 33,866,700 & 2,715.2 \\ |
| \hline |
| 3 & 1 & 258,605 & 22.3 \\ |
| \hline |
| 3 & 2 & 169,533,000 & 14,979.9 \\ |
| \end{tabular} |
| \caption{Memory Usage of QRCU Model} |
| \label{tab:advsync:Memory Usage of QRCU Model} |
| \end{table} |
| |
| The resulting output shows that this model passes all of the cases |
| shown in |
| Table~\ref{tab:advsync:Memory Usage of QRCU Model}. |
| Now, it would be nice to run the case with three readers and three |
| updaters, however, simple extrapolation indicates that this will |
| require on the order of a terabyte of memory best case. |
| So, what to do? |
| Here are some possible approaches: |
| |
| \begin{enumerate} |
| \item See whether a smaller number of readers and updaters suffice |
| to prove the general case. |
| \item Manually construct a proof of correctness. |
| \item Use a more capable tool. |
| \item Divide and conquer. |
| \end{enumerate} |
| |
| The following sections discuss each of these approaches. |
| |
| \subsubsection{How Many Readers and Updaters Are Really Needed?} |
| \label{sec:formal:How Many Readers and Updaters Are Really Needed?} |
| |
| One approach is to look carefully at the Promela code for |
| \co{qrcu_updater()} and notice that the only global state |
| change is happening under the lock. |
| Therefore, only one updater at a time can possibly be modifying |
| state visible to either readers or other updaters. |
| This means that any sequences of state changes can be carried |
| out serially by a single updater due to the fact that Promela does a full |
| state-space search. |
| Therefore, at most two updaters are required: one to change state |
| and a second to become confused. |
| |
| The situation with the readers is less clear-cut, as each reader |
| does only a single read-side critical section then terminates. |
| It is possible to argue that the useful number of readers is limited, |
| due to the fact that the fastpath must see at most a zero and a one |
| in the counters. |
| This is a fruitful avenue of investigation, in fact, it leads to |
| the full proof of correctness described in the next section. |
| |
| \subsubsection{Alternative Approach: Proof of Correctness} |
| \label{sec:formal:Alternative Approach: Proof of Correctness} |
| |
| An informal proof~\cite{PaulMcKenney2007QRCUpatch} |
| follows: |
| |
| \begin{enumerate} |
| \item For \co{synchronize_qrcu()} to exit too early, then |
| by definition there must have been at least one reader |
| present during \co{synchronize_qrcu()}'s full |
| execution. |
| \item The counter corresponding to this reader will have been |
| at least 1 during this time interval. |
| \item The \co{synchronize_qrcu()} code forces at least one |
| of the counters to be at least 1 at all times. |
| \item Therefore, at any given point in time, either one of the |
| counters will be at least 2, or both of the counters will |
| be at least one. |
| \item However, the \co{synchronize_qrcu()} fastpath code |
| can read only one of the counters at a given time. |
| It is therefore possible for the fastpath code to fetch |
| the first counter while zero, but to race with a counter |
| flip so that the second counter is seen as one. |
| \item There can be at most one reader persisting through such |
| a race condition, as otherwise the sum would be two or |
| greater, which would cause the updater to take the slowpath. |
| \item But if the race occurs on the fastpath's first read of the |
| counters, and then again on its second read, there have |
| to have been two counter flips. |
| \item Because a given updater flips the counter only once, and |
| because the update-side lock prevents a pair of updaters |
| from concurrently flipping the counters, the only way that |
| the fastpath code can race with a flip twice is if the |
| first updater completes. |
| \item But the first updater will not complete until after all |
| pre-existing readers have completed. |
| \item Therefore, if the fastpath races with a counter flip |
| twice in succession, all pre-existing readers must have |
| completed, so that it is safe to take the fastpath. |
| \end{enumerate} |
| |
| Of course, not all parallel algorithms have such simple proofs. |
| In such cases, it may be necessary to enlist more capable tools. |
| |
| \subsubsection{Alternative Approach: More Capable Tools} |
| \label{sec:formal:Alternative Approach: More Capable Tools} |
| |
| Although Promela and Spin are quite useful, |
| much more capable tools are available, particularly for verifying |
| hardware. |
| This means that if it is possible to translate your algorithm |
| to the hardware-design VHDL language, as it often will be for |
| low-level parallel algorithms, then it is possible to apply these |
| tools to your code (for example, this was done for the first |
| realtime RCU algorithm). |
| However, such tools can be quite expensive. |
| |
| Although the advent of commodity multiprocessing |
| might eventually result in powerful free-software model-checkers |
| featuring fancy state-space-reduction capabilities, |
| this does not help much in the here and now. |
| |
| As an aside, there are Spin features that support approximate searches |
| that require fixed amounts of memory, however, I have never been able |
| to bring myself to trust approximations when verifying parallel |
| algorithms. |
| |
| Another approach might be to divide and conquer. |
| |
| \subsubsection{Alternative Approach: Divide and Conquer} |
| \label{sec:formal:Alternative Approach: Divide and Conquer} |
| |
| It is often possible to break down a larger parallel algorithm into |
| smaller pieces, which can then be proven separately. |
| For example, a 10-billion-state model might be broken into a pair |
| of 100,000-state models. |
| Taking this approach not only makes it easier for tools such as |
| Promela to verify your algorithms, it can also make your algorithms |
| easier to understand. |
| |
| \subsubsection{Is QRCU Really Correct?} |
| \label{sec:formal:Is QRCU Really Correct?} |
| |
| Is QRCU really correct? |
| We have a Promela-based mechanical proof and a by-hand proof that both |
| say that it is. |
| However, a recent paper by Alglave et al.~\cite{JadeAlglave2013-cav} |
| says otherwise (see Section~5.1 of the paper at the bottom of page~12). |
| Which is it? |
| |
| It turns out that both are correct! |
| When QRCU was added to a suite of formal-verification benchmarks, |
| its memory barriers were omitted, thus resulting in a buggy version |
| of QRCU. |
| So the real news here is that a number of formal-verification tools |
| incorrectly proved this buggy QRCU correct. |
| And this is why formal-verification tools themselves should be tested |
| using bug-injected versions of the code being verified. |
| If a given tool cannot find the injected bugs, then that tool is |
| clearly untrustworthy. |
| |
| \QuickQuiz{} |
| But different formal-verification tools are often designed to |
| locate particular classes of bugs. |
| For example, very few formal-verification tools will find |
| an error in the specification. |
| So isn't this ``clearly untrustworthy'' judgment a bit harsh? |
| \QuickQuizAnswer{ |
| It is certainly true that many formal-verification tools are |
| specialized in some way. |
| For example, Promela does not handle realistic memory models |
| (though they can be programmed into |
| Promela~\cite{Desnoyers:2013:MSM:2506164.2506174}), |
| CBMC~\cite{EdmundClarke2004CBMC} does not detect probabilistic |
| hangs and deadlocks, and |
| Nidhugg~\cite{CarlLeonardsson2014Nidhugg} does not detect |
| bugs involving data nondeterminism. |
| But this means that that these tools cannot be trusted to find |
| bugs that they are not designed to locate. |
| |
| And therefore people creating formal-verification tools should |
| ``tell the truth on the label'', clearly calling out what |
| classes of bugs their tools can and cannot detect. |
| Otherwise, the first time a practitioner finds a tool |
| failing to detect a bug, that practitioner is likely to |
| make extremely harsh and extremely public denunciations |
| of that tool. |
| Yes, yes, there is something to be said for putting your |
| best foot forward, but putting it too far forward without |
| appropriate disclaimers can easily trigger a land mine of |
| negative reaction that your tool might or might not be able |
| to recover from. |
| |
| You have been warned! |
| } \QuickQuizEnd |
| |
| Therefore, if you do intend to use QRCU, please take care. |
| Its proofs of correctness might or might not themselves be correct. |
| Which is one reason why formal verification is unlikely to |
| completely replace testing, as Donald Knuth pointed out so long ago. |
| |
| \QuickQuiz{} |
| Given that we have two independent proofs of correctness for |
| the QRCU algorithm described herein, and given that the |
| proof of incorrectness covers what is known to be a different |
| algorithm, why is there any room for doubt? |
| \QuickQuizAnswer{ |
| There is always room for doubt. |
| In this case, it is important to keep in mind that the two proofs |
| of correctness preceded the formalization of real-world memory |
| models, raising the possibility that these two proofs are based |
| on incorrect memory-ordering assumptions. |
| Furthermore, since both proofs were constructed by the same person, |
| it is quite possible that they contain a common error. |
| Again, there is always room for doubt. |
| } \QuickQuizEnd |