formal/axiomatic.tex - pub/scm/linux/kernel/git/paulmck/perfbook - Git at Google

 % formal/axiomatic.tex

 \section{Axiomatic Approaches}
 \label{sec:formal:Axiomatic Approaches}
 \OriginallyPublished{Section}{sec:formal:Axiomatic Approaches}{Axiomatic Approaches}{Linux Weekly News}{PaulEMcKenney2014weakaxiom}

 \begin{figure*}[tb]
 { \scriptsize
 \begin{verbbox}
  1 PPC IRIW.litmus
  2 ""
  3 (* Traditional IRIW. *)
  4 {
  5 0:r1=1; 0:r2=x;
  6 1:r1=1;         1:r4=y;
  7         2:r2=x; 2:r4=y;
  8         3:r2=x; 3:r4=y;
  9 }
 10 P0           | P1           | P2           | P3           ;
 11 stw r1,0(r2) | stw r1,0(r4) | lwz r3,0(r2) | lwz r3,0(r4) ;
 12              |              | sync         | sync         ;
 13              |              | lwz r5,0(r4) | lwz r5,0(r2) ;
 14
 15 exists
 16 (2:r3=1 /\ 2:r5=0 /\ 3:r3=1 /\ 3:r5=0)
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{IRIW Litmus Test}
 \label{fig:formal:IRIW Litmus Test}
 \end{figure*}

 Although the PPCMEM tool can solve the famous ``independent reads of
 independent writes'' (IRIW) litmus test shown in
 Figure~\ref{fig:formal:IRIW Litmus Test}, doing so requires no less than
 fourteen CPU hours and generates no less than ten gigabytes of state space.
 That said, this situation is a great improvement over that before the advent
 of PPCMEM, where solving this problem required perusing volumes of
 reference manuals, attempting proofs, discussing with experts, and
 being unsure of the final answer.
 Although fourteen hours can seem like a long time, it is much shorter
 than weeks or even months.

 However, the time required is a bit surprising given the simplicity
 of the litmus test, which has two threads storing to two separate variables
 and two other threads loading from these two variables in opposite
 orders.
 The assertion triggers if the two loading threads disagree on the order
 of the two stores.
 This litmus test is simple, even by the standards of memory-order litmus
 tests.

 One reason for the amount of time and space consumed is that PPCMEM does
 a trace-based full-state-space search, which means that it must generate
 and evaluate all possible orders and combinations of events at the
 architectural level.
 At this level, both loads and stores correspond to ornate sequences
 of events and actions, resulting in a very large state space that must
 be completely searched, in turn resulting in large memory and CPU
 consumption.

 Of course, many of the traces are quite similar to one another, which
 suggests that an approach that treated similar traces as one might
 improve performace.
 One such approach is the axiomatic approach of
 Alglave et al.~\cite{Alglave:2014:HCM:2594291.2594347},
 which creates a set of axioms to represent the memory model and then
 converts litmus tests to theorems that might be proven or disproven
 over this set of axioms.
 The resulting tool, called ``herd'',  conveniently takes as input the
 same litmus tests as PPCMEM, including the IRIW litmus test shown in
 Figure~\ref{fig:formal:IRIW Litmus Test}.

 \begin{figure*}[tb]
 { \scriptsize
 \begin{verbbox}
  1 PPC IRIW5.litmus
  2 ""
  3 (* Traditional IRIW, but with five stores instead of just one. *)
  4 {
  5 0:r1=1; 0:r2=x;
  6 1:r1=1;         1:r4=y;
  7         2:r2=x; 2:r4=y;
  8         3:r2=x; 3:r4=y;
  9 }
 10 P0           | P1           | P2           | P3           ;
 11 stw r1,0(r2) | stw r1,0(r4) | lwz r3,0(r2) | lwz r3,0(r4) ;
 12 addi r1,r1,1 | addi r1,r1,1 | sync         | sync         ;
 13 stw r1,0(r2) | stw r1,0(r4) | lwz r5,0(r4) | lwz r5,0(r2) ;
 14 addi r1,r1,1 | addi r1,r1,1 |              |              ;
 15 stw r1,0(r2) | stw r1,0(r4) |              |              ;
 16 addi r1,r1,1 | addi r1,r1,1 |              |              ;
 17 stw r1,0(r2) | stw r1,0(r4) |              |              ;
 18 addi r1,r1,1 | addi r1,r1,1 |              |              ;
 19 stw r1,0(r2) | stw r1,0(r4) |              |              ;
 20
 21 exists
 22 (2:r3=1 /\ 2:r5=0 /\ 3:r3=1 /\ 3:r5=0)
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{Expanded IRIW Litmus Test}
 \label{fig:formal:Expanded IRIW Litmus Test}
 \end{figure*}

 However, where PPCMEM requires 14 CPU hours to solve IRIW, herd does so
 in 17 milliseconds, which represents a speedup of more than six orders
 of magnitude.
 That said, the problem is exponential in nature, so we should expect
 herd to exhibit exponential slowdowns for larger problems.
 And this is exactly what happens, for example, if we add four more writes
 per writing CPU as shown in
 Figure~\ref{fig:formal:Expanded IRIW Litmus Test},
 herd slows down by a factor of more than 50,000, requiring more than
 15 \emph{minutes} of CPU time.
 Adding threads also results in exponential
 slowdowns~\cite{PaulEMcKenney2014weakaxiom}.

 Despite their exponential nature, both PPCMEM and herd have proven quite
 useful for checking key parallel algorithms, including the queued-lock
 handoff on x86 systems.
 The weaknesses of the herd tool are similar to those of PPCMEM, which
 were described in
 Section~\ref{sec:formal:PPCMEM Discussion}.
 There are some obscure (but very real) cases for which the PPCMEM and
 herd tools disagree, and as of late 2014 resolving these disagreements
 was ongoing.

 Longer term, the hope is that the axiomatic approaches incorporate
 axioms describing higher-level software artifacts.
 This could potentially allow axiomatic verification of much larger
 software systems.
 Another alternative is to press the axioms of boolean logic into service,
 as described in the next section.
	% formal/axiomatic.tex

	\section{Axiomatic Approaches}
	\label{sec:formal:Axiomatic Approaches}
	\OriginallyPublished{Section}{sec:formal:Axiomatic Approaches}{Axiomatic Approaches}{Linux Weekly News}{PaulEMcKenney2014weakaxiom}

	\begin{figure*}[tb]
	{ \scriptsize
	\begin{verbbox}
	1 PPC IRIW.litmus
	2 ""
	3 (* Traditional IRIW. *)
	4 {
	5 0:r1=1; 0:r2=x;
	6 1:r1=1; 1:r4=y;
	7 2:r2=x; 2:r4=y;
	8 3:r2=x; 3:r4=y;
	9 }
	10 P0 \| P1 \| P2 \| P3 ;
	11 stw r1,0(r2) \| stw r1,0(r4) \| lwz r3,0(r2) \| lwz r3,0(r4) ;
	12 \| \| sync \| sync ;
	13 \| \| lwz r5,0(r4) \| lwz r5,0(r2) ;
	14
	15 exists
	16 (2:r3=1 /\ 2:r5=0 /\ 3:r3=1 /\ 3:r5=0)
	\end{verbbox}
	}
	\centering
	\theverbbox
	\caption{IRIW Litmus Test}
	\label{fig:formal:IRIW Litmus Test}
	\end{figure*}

	Although the PPCMEM tool can solve the famous ``independent reads of
	independent writes'' (IRIW) litmus test shown in
	Figure~\ref{fig:formal:IRIW Litmus Test}, doing so requires no less than
	fourteen CPU hours and generates no less than ten gigabytes of state space.
	That said, this situation is a great improvement over that before the advent
	of PPCMEM, where solving this problem required perusing volumes of
	reference manuals, attempting proofs, discussing with experts, and
	being unsure of the final answer.
	Although fourteen hours can seem like a long time, it is much shorter
	than weeks or even months.

	However, the time required is a bit surprising given the simplicity
	of the litmus test, which has two threads storing to two separate variables
	and two other threads loading from these two variables in opposite
	orders.
	The assertion triggers if the two loading threads disagree on the order
	of the two stores.
	This litmus test is simple, even by the standards of memory-order litmus
	tests.

	One reason for the amount of time and space consumed is that PPCMEM does
	a trace-based full-state-space search, which means that it must generate
	and evaluate all possible orders and combinations of events at the
	architectural level.
	At this level, both loads and stores correspond to ornate sequences
	of events and actions, resulting in a very large state space that must
	be completely searched, in turn resulting in large memory and CPU
	consumption.

	Of course, many of the traces are quite similar to one another, which
	suggests that an approach that treated similar traces as one might
	improve performace.
	One such approach is the axiomatic approach of
	Alglave et al.~\cite{Alglave:2014:HCM:2594291.2594347},
	which creates a set of axioms to represent the memory model and then
	converts litmus tests to theorems that might be proven or disproven
	over this set of axioms.
	The resulting tool, called ``herd'', conveniently takes as input the
	same litmus tests as PPCMEM, including the IRIW litmus test shown in
	Figure~\ref{fig:formal:IRIW Litmus Test}.

	\begin{figure*}[tb]
	{ \scriptsize
	\begin{verbbox}
	1 PPC IRIW5.litmus
	2 ""
	3 (* Traditional IRIW, but with five stores instead of just one. *)
	4 {
	5 0:r1=1; 0:r2=x;
	6 1:r1=1; 1:r4=y;
	7 2:r2=x; 2:r4=y;
	8 3:r2=x; 3:r4=y;
	9 }
	10 P0 \| P1 \| P2 \| P3 ;
	11 stw r1,0(r2) \| stw r1,0(r4) \| lwz r3,0(r2) \| lwz r3,0(r4) ;
	12 addi r1,r1,1 \| addi r1,r1,1 \| sync \| sync ;
	13 stw r1,0(r2) \| stw r1,0(r4) \| lwz r5,0(r4) \| lwz r5,0(r2) ;
	14 addi r1,r1,1 \| addi r1,r1,1 \| \| ;
	15 stw r1,0(r2) \| stw r1,0(r4) \| \| ;
	16 addi r1,r1,1 \| addi r1,r1,1 \| \| ;
	17 stw r1,0(r2) \| stw r1,0(r4) \| \| ;
	18 addi r1,r1,1 \| addi r1,r1,1 \| \| ;
	19 stw r1,0(r2) \| stw r1,0(r4) \| \| ;
	20
	21 exists
	22 (2:r3=1 /\ 2:r5=0 /\ 3:r3=1 /\ 3:r5=0)
	\end{verbbox}
	}
	\centering
	\theverbbox
	\caption{Expanded IRIW Litmus Test}
	\label{fig:formal:Expanded IRIW Litmus Test}
	\end{figure*}

	However, where PPCMEM requires 14 CPU hours to solve IRIW, herd does so
	in 17 milliseconds, which represents a speedup of more than six orders
	of magnitude.
	That said, the problem is exponential in nature, so we should expect
	herd to exhibit exponential slowdowns for larger problems.
	And this is exactly what happens, for example, if we add four more writes
	per writing CPU as shown in
	Figure~\ref{fig:formal:Expanded IRIW Litmus Test},
	herd slows down by a factor of more than 50,000, requiring more than
	15 \emph{minutes} of CPU time.
	Adding threads also results in exponential
	slowdowns~\cite{PaulEMcKenney2014weakaxiom}.

	Despite their exponential nature, both PPCMEM and herd have proven quite
	useful for checking key parallel algorithms, including the queued-lock
	handoff on x86 systems.
	The weaknesses of the herd tool are similar to those of PPCMEM, which
	were described in
	Section~\ref{sec:formal:PPCMEM Discussion}.
	There are some obscure (but very real) cases for which the PPCMEM and
	herd tools disagree, and as of late 2014 resolving these disagreements
	was ongoing.

	Longer term, the hope is that the axiomatic approaches incorporate
	axioms describing higher-level software artifacts.
	This could potentially allow axiomatic verification of much larger
	software systems.
	Another alternative is to press the axioms of boolean logic into service,
	as described in the next section.