| % formal/axiomatic.tex |
| |
| \section{Axiomatic Approaches} |
| \label{sec:formal:Axiomatic Approaches} |
| \OriginallyPublished{Section}{sec:formal:Axiomatic Approaches}{Axiomatic Approaches}{Linux Weekly News}{PaulEMcKenney2014weakaxiom} |
| |
| \begin{figure*}[tb] |
| { \scriptsize |
| \begin{verbbox} |
| 1 PPC IRIW.litmus |
| 2 "" |
| 3 (* Traditional IRIW. *) |
| 4 { |
| 5 0:r1=1; 0:r2=x; |
| 6 1:r1=1; 1:r4=y; |
| 7 2:r2=x; 2:r4=y; |
| 8 3:r2=x; 3:r4=y; |
| 9 } |
| 10 P0 | P1 | P2 | P3 ; |
| 11 stw r1,0(r2) | stw r1,0(r4) | lwz r3,0(r2) | lwz r3,0(r4) ; |
| 12 | | sync | sync ; |
| 13 | | lwz r5,0(r4) | lwz r5,0(r2) ; |
| 14 |
| 15 exists |
| 16 (2:r3=1 /\ 2:r5=0 /\ 3:r3=1 /\ 3:r5=0) |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{IRIW Litmus Test} |
| \label{fig:formal:IRIW Litmus Test} |
| \end{figure*} |
| |
| Although the PPCMEM tool can solve the famous ``independent reads of |
| independent writes'' (IRIW) litmus test shown in |
| Figure~\ref{fig:formal:IRIW Litmus Test}, doing so requires no less than |
| fourteen CPU hours and generates no less than ten gigabytes of state space. |
| That said, this situation is a great improvement over that before the advent |
| of PPCMEM, where solving this problem required perusing volumes of |
| reference manuals, attempting proofs, discussing with experts, and |
| being unsure of the final answer. |
| Although fourteen hours can seem like a long time, it is much shorter |
| than weeks or even months. |
| |
| However, the time required is a bit surprising given the simplicity |
| of the litmus test, which has two threads storing to two separate variables |
| and two other threads loading from these two variables in opposite |
| orders. |
| The assertion triggers if the two loading threads disagree on the order |
| of the two stores. |
| This litmus test is simple, even by the standards of memory-order litmus |
| tests. |
| |
| One reason for the amount of time and space consumed is that PPCMEM does |
| a trace-based full-state-space search, which means that it must generate |
| and evaluate all possible orders and combinations of events at the |
| architectural level. |
| At this level, both loads and stores correspond to ornate sequences |
| of events and actions, resulting in a very large state space that must |
| be completely searched, in turn resulting in large memory and CPU |
| consumption. |
| |
| Of course, many of the traces are quite similar to one another, which |
| suggests that an approach that treated similar traces as one might |
| improve performace. |
| One such approach is the axiomatic approach of |
| Alglave et al.~\cite{Alglave:2014:HCM:2594291.2594347}, |
| which creates a set of axioms to represent the memory model and then |
| converts litmus tests to theorems that might be proven or disproven |
| over this set of axioms. |
| The resulting tool, called ``herd'', conveniently takes as input the |
| same litmus tests as PPCMEM, including the IRIW litmus test shown in |
| Figure~\ref{fig:formal:IRIW Litmus Test}. |
| |
| \begin{figure*}[tb] |
| { \scriptsize |
| \begin{verbbox} |
| 1 PPC IRIW5.litmus |
| 2 "" |
| 3 (* Traditional IRIW, but with five stores instead of just one. *) |
| 4 { |
| 5 0:r1=1; 0:r2=x; |
| 6 1:r1=1; 1:r4=y; |
| 7 2:r2=x; 2:r4=y; |
| 8 3:r2=x; 3:r4=y; |
| 9 } |
| 10 P0 | P1 | P2 | P3 ; |
| 11 stw r1,0(r2) | stw r1,0(r4) | lwz r3,0(r2) | lwz r3,0(r4) ; |
| 12 addi r1,r1,1 | addi r1,r1,1 | sync | sync ; |
| 13 stw r1,0(r2) | stw r1,0(r4) | lwz r5,0(r4) | lwz r5,0(r2) ; |
| 14 addi r1,r1,1 | addi r1,r1,1 | | ; |
| 15 stw r1,0(r2) | stw r1,0(r4) | | ; |
| 16 addi r1,r1,1 | addi r1,r1,1 | | ; |
| 17 stw r1,0(r2) | stw r1,0(r4) | | ; |
| 18 addi r1,r1,1 | addi r1,r1,1 | | ; |
| 19 stw r1,0(r2) | stw r1,0(r4) | | ; |
| 20 |
| 21 exists |
| 22 (2:r3=1 /\ 2:r5=0 /\ 3:r3=1 /\ 3:r5=0) |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{Expanded IRIW Litmus Test} |
| \label{fig:formal:Expanded IRIW Litmus Test} |
| \end{figure*} |
| |
| However, where PPCMEM requires 14 CPU hours to solve IRIW, herd does so |
| in 17 milliseconds, which represents a speedup of more than six orders |
| of magnitude. |
| That said, the problem is exponential in nature, so we should expect |
| herd to exhibit exponential slowdowns for larger problems. |
| And this is exactly what happens, for example, if we add four more writes |
| per writing CPU as shown in |
| Figure~\ref{fig:formal:Expanded IRIW Litmus Test}, |
| herd slows down by a factor of more than 50,000, requiring more than |
| 15 \emph{minutes} of CPU time. |
| Adding threads also results in exponential |
| slowdowns~\cite{PaulEMcKenney2014weakaxiom}. |
| |
| Despite their exponential nature, both PPCMEM and herd have proven quite |
| useful for checking key parallel algorithms, including the queued-lock |
| handoff on x86 systems. |
| The weaknesses of the herd tool are similar to those of PPCMEM, which |
| were described in |
| Section~\ref{sec:formal:PPCMEM Discussion}. |
| There are some obscure (but very real) cases for which the PPCMEM and |
| herd tools disagree, and as of late 2014 resolving these disagreements |
| was ongoing. |
| |
| Longer term, the hope is that the axiomatic approaches incorporate |
| axioms describing higher-level software artifacts. |
| This could potentially allow axiomatic verification of much larger |
| software systems. |
| Another alternative is to press the axioms of boolean logic into service, |
| as described in the next section. |