% advsync/advsync.tex
% mainfile: ../perfbook.tex
% SPDX-License-Identifier: CC-BY-SA-3.0
\QuickQuizChapter{sec:advsync:Advanced Synchronization}{Advanced Synchronization}{qqzadvsync}
%
\Epigraph{If a little knowledge is a dangerous thing, just think what
you could do with a lot of knowledge!}{\emph{Unknown}}
This chapter covers synchronization techniques used for lockless
algorithms and parallel real-time systems.
Although lockless algorithms can be quite helpful when faced with
extreme requirements, they are no panacea.
For example, as noted at the end of \cref{chp:Counting},
you should thoroughly apply partitioning, batching, and
well-tested packaged weak APIs
(see \cref{chp:Data Ownership,chp:Deferred Processing})
before even thinking about lockless algorithms.
But after doing all that, you still might find yourself needing the
advanced techniques described in this chapter.
To that end,
\cref{sec:advsync:Avoiding Locks}
summarizes techniques used thus far for avoiding locks and
\cref{sec:advsync:Non-Blocking Synchronization}
gives a brief overview of non-blocking synchronization.
Memory ordering is also quite important, but it warrants its own
\cref{chp:Advanced Synchronization: Memory Ordering}.
The second form of advanced synchronization provides the stronger
forward-progress guarantees needed for parallel real-time computing,
which is the topic of
\cref{sec:advsync:Parallel Real-Time Computing}.
\section{Avoiding Locks}
\label{sec:advsync:Avoiding Locks}
%
\epigraph{We are confronted with insurmountable opportunities.}
{\emph{Walt Kelly}}
Although locking is the workhorse of parallelism in production, in
many situations performance, scalability, and real-time response can
all be greatly improved through use of lockless techniques.
A particularly impressive example of such lockless techniques is
the statistical-counters algorithm described in
\cref{sec:count:Statistical Counters},
which avoids not only locks, but also atomic operations, memory barriers,
and even cache misses for counter increments.
Other examples we have covered include:
\begin{enumerate}
\item The fastpaths through a number of other counting algorithms
in \cref{chp:Counting}.
\item The fastpath through resource allocator caches in
\cref{sec:SMPdesign:Resource Allocator Caches}.
\item The maze solver in \cref{sec:SMPdesign:Beyond Partitioning}.
\item The data-ownership techniques in \cref{chp:Data Ownership}.
\item The reference-counting, hazard-pointer, and RCU techniques
in \cref{chp:Deferred Processing}.
\item The lookup code paths in \cref{chp:Data Structures}.
\item Many of the techniques in \cref{chp:Putting It All Together}.
\end{enumerate}
In short, lockless techniques are quite useful and are heavily used.
However, it is best if lockless techniques are hidden behind
well-defined APIs, such as \co{inc_count()}, \co{memblock_alloc()},
and \co{rcu_read_lock()}.
The reason for this is that undisciplined use of lockless techniques
is a good way to create difficult bugs.
If you believe that finding and fixing such bugs is easier than avoiding
them, please re-read
\cref{chp:Validation,chp:Formal Verification}.
\section{Non-Blocking Synchronization}
\label{sec:advsync:Non-Blocking Synchronization}
%
\epigraph{Never worry about theory as long as the machinery does what
it's supposed to do.}
{\emph{Robert A. Heinlein}}
The term \emph{non-blocking synchronization} (NBS)~\cite{MauriceHerlihy90a}
describes seven classes of linearizable algorithms with differing
\emph{forward-progress guarantees}~\cite{DanAlitarh2013PracticalProgress},
which are as follows:
\begin{enumerate}
\item \emph{Bounded wait-free synchronization}: Every thread
will make progress within
a specific finite period of time~\cite{Herlihy91}.
This level is widely considered to be unachievable, which might be why
Alistarh et al.\ omitted it~\cite{DanAlitarh2013PracticalProgress}.
\item \emph{Wait-free synchronization}: Every thread will make progress
in finite time~\cite{Herlihy93}.
\item \emph{Lock-free synchronization}: At least one thread will
make progress in finite time~\cite{Herlihy93}.
\item \emph{Obstruction-free synchronization}: Every thread will
make progress in finite time in the absence of
contention~\cite{HerlihyLM03}.
\item \emph{Clash-free synchronization}: At least one thread will
make progress in finite time in the absence of
contention~\cite{DanAlitarh2013PracticalProgress}.
\item \emph{Starvation-free synchronization}: Every thread will
make progress in finite time in the absence of
failures~\cite{DanAlitarh2013PracticalProgress}.
\item \emph{Deadlock-free synchronization}: At least one thread will
make progress in finite time in the absence of
failures~\cite{DanAlitarh2013PracticalProgress}.
\end{enumerate}
NBS classes~1, 2, and~3 were first formulated in the early 1990s,
class~4 was first formulated in the early 2000s,
and class~5 was first formulated in 2013.
The final two classes have seen informal use for a great many decades,
but were reformulated in 2013.
In theory, any parallel algorithm can be cast into wait-free form,
but only a relatively small subset of NBS algorithms is
in common use.
A few of these are listed in the following section.
\subsection{Simple NBS}
\label{sec:advsync:Simple NBS}
Perhaps the simplest NBS algorithm is atomic update of an integer
counter using fetch-and-add (\co{atomic_add_return()}) primitives.
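For example, here is a minimal sketch of such a counter, assuming
GCC-style \co{__atomic} builtins in place of the Linux kernel's
\co{atomic_add_return()}:

\begin{VerbatimL}
static unsigned long counter;

/* One fetch-and-add, no retry loop. */
static inline unsigned long inc_counter(void)
{
	return __atomic_add_fetch(&counter, 1,
	                          __ATOMIC_RELAXED);
}
\end{VerbatimL}

On architectures providing a true fetch-and-add instruction, every
invocation completes in a bounded number of instructions, making this
counter wait-free; on LL/SC architectures, the underlying retry loop
instead provides lock-free behavior.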
\subsubsection{NBS Sets}
\label{sec:advsync:NBS Sets}
Another simple NBS algorithm implements a set of integers in an array.
Here the array index indicates a value that might be a member of the set
and the array element indicates whether or not that value actually is
a set member.
The linearizability criterion for NBS algorithms requires that reads from
and updates to the array either use atomic instructions or be accompanied
by memory barriers, but in the not-uncommon case where linearizability
is not important, simple volatile loads and stores suffice, for example,
using \co{READ_ONCE()} and \co{WRITE_ONCE()}.
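For example, here is a minimal sketch of such an array-based set,
assuming Linux-kernel-style \co{READ_ONCE()} and \co{WRITE_ONCE()}
and forgoing linearizability:

\begin{VerbatimL}
#define SET_SIZE 1024

/* Index is the candidate value. */
static bool set_member[SET_SIZE];

static inline bool set_contains(int v)
{
	return READ_ONCE(set_member[v]);
}

static inline void set_add(int v)
{
	WRITE_ONCE(set_member[v], true);
}

static inline void set_del(int v)
{
	WRITE_ONCE(set_member[v], false);
}
\end{VerbatimL}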
An NBS set may also be implemented using a bitmap, where each value that
might be a member of the set corresponds to one bit.
Reads and updates must normally be carried out via atomic bit-manipulation
instructions, although compare-and-swap (\co{cmpxchg()} or CAS)
instructions can also be used.
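A corresponding bitmap sketch, again assuming GCC-style \co{__atomic}
builtins standing in for the kernel's atomic bit-manipulation
primitives, might look like this:

\begin{VerbatimL}
#define SET_BITS 1024
#define BITS_PER_WORD (8 * sizeof(unsigned long))

static unsigned long set_bm[SET_BITS / BITS_PER_WORD];

static inline void bm_add(int v)
{
	/* Atomic OR sets the bit for value v. */
	__atomic_fetch_or(&set_bm[v / BITS_PER_WORD],
	                  1UL << (v % BITS_PER_WORD),
	                  __ATOMIC_RELAXED);
}

static inline void bm_del(int v)
{
	/* Atomic AND clears the bit for value v. */
	__atomic_fetch_and(&set_bm[v / BITS_PER_WORD],
	                   ~(1UL << (v % BITS_PER_WORD)),
	                   __ATOMIC_RELAXED);
}

static inline bool bm_contains(int v)
{
	return !!(__atomic_load_n(&set_bm[v / BITS_PER_WORD],
	                          __ATOMIC_RELAXED) &
	          (1UL << (v % BITS_PER_WORD)));
}
\end{VerbatimL}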
\subsubsection{NBS Counters}
\label{sec:advsync:NBS Counters}
The statistical counters algorithm discussed in
\cref{sec:count:Statistical Counters}
can be considered to be bounded-wait-free, but only by using a cute
definitional trick in which the sum is considered to be approximate
rather than exact.\footnote{
Citation needed.
I heard of this trick verbally from Mark Moir.}
Given sufficiently wide error bounds that are a function of the length
of time that the \co{read_count()} function takes to sum the counters,
it is not possible to prove that any non-linearizable behavior occurred.
This definitely (if a bit artificially) classifies the statistical-counters
algorithm as bounded wait-free.
This algorithm is probably the most heavily used NBS algorithm in the
Linux kernel.
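The overall structure of this algorithm can be sketched as follows,
assuming a simple fixed array of per-thread counters in place of the
\co{DEFINE_PER_THREAD()} machinery used in
\cref{sec:count:Statistical Counters}:

\begin{VerbatimL}
#define NR_THREADS 128

/* Real implementations pad to avoid false sharing. */
static unsigned long counters[NR_THREADS];

static inline void inc_count(int tid)
{
	/* Only thread tid writes counters[tid]. */
	WRITE_ONCE(counters[tid],
	           READ_ONCE(counters[tid]) + 1);
}

/* Approximate sum: counters move while being read. */
static unsigned long read_count(void)
{
	int i;
	unsigned long sum = 0;

	for (i = 0; i < NR_THREADS; i++)
		sum += READ_ONCE(counters[i]);
	return sum;
}
\end{VerbatimL}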
\subsubsection{Half-NBS Queue}
\label{sec:advsync:Half-NBS Queue}
\begin{fcvref}[ln:count:NBS Enqueue Algorithm]
Another common NBS algorithm is the atomic queue where elements are
enqueued using an atomic exchange instruction~\cite{MagedMichael1993JPDC},
followed by a store into the \co{->next} pointer of the new element's
predecessor, as shown in \cref{lst:count:NBS Enqueue Algorithm},
which presents the userspace-RCU library
implementation~\cite{MathieuDesnoyers2009URCU}.
\Clnref{tail} updates the tail pointer to reference the new element while
returning a reference to its predecessor, which is stored in
local variable \co{old_tail}.
\Clnref{pred} then updates the predecessor's \co{->next} pointer to
reference the newly added element, and finally \clnref{ret}
returns an indication as to whether or not the queue was initially
empty.
\begin{listing}[tbp]
\begin{fcvlabel}[ln:count:NBS Enqueue Algorithm]
\begin{VerbatimL}[commandchars=\\\[\]]
static inline bool
___cds_wfcq_append(struct cds_wfcq_head *head,
struct cds_wfcq_tail *tail,
struct cds_wfcq_node *new_head,
struct cds_wfcq_node *new_tail)
{
struct cds_wfcq_node *old_tail;
old_tail = uatomic_xchg(&tail->p, new_tail); \lnlbl[tail]
CMM_STORE_SHARED(old_tail->next, new_head); \lnlbl[pred]
return old_tail != &head->node; \lnlbl[ret]
}
static inline bool
_cds_wfcq_enqueue(struct cds_wfcq_head *head,
struct cds_wfcq_tail *tail,
struct cds_wfcq_node *new_tail)
{
return ___cds_wfcq_append(head, tail,
new_tail, new_tail);
}
\end{VerbatimL}
\end{fcvlabel}
\caption{NBS Enqueue Algorithm}
\label{lst:count:NBS Enqueue Algorithm}
\end{listing}
Although mutual exclusion is required to dequeue a single element
(so that dequeue is blocking), it is possible to carry out a non-blocking
removal of the entire contents of the queue.
What is not possible is to dequeue any given element in a non-blocking
manner: The enqueuer might have failed between \clnref{tail,pred} of the
listing, so that the element in question is only partially enqueued.
This results in a half-NBS algorithm where enqueues are NBS but
dequeues are blocking.
This algorithm is nevertheless heavily used in practice, in part because
most production software is not required to tolerate arbitrary fail-stop
errors.
\end{fcvref}
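For those wishing to experiment, here is a usage sketch based on the
userspace-RCU \path{urcu/wfcqueue.h} API\@; exact names and
initialization details might vary across library versions:

\begin{VerbatimL}
#include <urcu/wfcqueue.h>

struct my_elem {
	struct cds_wfcq_node node;
	int data;
};

static struct cds_wfcq_head qhead;
static struct cds_wfcq_tail qtail;

/* Call cds_wfcq_init(&qhead, &qtail) once at start. */
static void example_enqueue(struct my_elem *e)
{
	cds_wfcq_node_init(&e->node);
	/* Returns false if the queue was empty. */
	(void)cds_wfcq_enqueue(&qhead, &qtail, &e->node);
}
\end{VerbatimL}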
\subsubsection{NBS Stack}
\label{sec:advsync:NBS Stack}
\begin{fcvref}[ln:advsync:lifo_push:whole]
\Cref{lst:advsync:NBS Stack Algorithm}
shows the LIFO push algorithm, which boasts lock-free push and
bounded wait-free pop (\path{lifo-push.c}), forming an NBS stack.
The origins of this algorithm are unknown, but it was referred to in
a patent granted in 1975~\cite{PaulJBrown1975LIFOpush}.
This patent was filed in 1973, a few months before your editor
saw his first computer, which had but one CPU.
\begin{listing}[tbp]
\input{CodeSamples/advsync/lifo_push@whole.fcv}
\caption{NBS Stack Algorithm}
\label{lst:advsync:NBS Stack Algorithm}
\end{listing}
\Clnrefrange{struct:b}{struct:e} shows the \co{node_t} structure,
which contains an arbitrary value and a pointer to the next structure
on the stack, and
\clnref{top} shows the top-of-stack pointer.
The \co{list_push()} function spans \clnrefrange{push:b}{push:e}.
\Clnref{push:alloc} allocates a new node and
\clnref{push:initialize} initializes it.
\Clnref{push:next} initializes the newly allocated node's \co{->next}
pointer, and \clnref{push:cmpxchg} attempts to push it on the stack.
If \clnref{push:check} detects \co{cmpxchg()} failure, another pass
through the loop retries.
Otherwise, the new node has been successfully pushed, and this function
returns to its caller.
Note that \clnref{push:check} resolves races in which two concurrent
instances of \co{list_push()} attempt to push onto the stack.
The \co{cmpxchg()} will succeed for one and fail for the other,
causing the other to retry, thereby selecting an arbitrary order for
the two nodes on the stack.
The \co{list_pop_all()} function spans \clnrefrange{popall:b}{popall:e}.
The \co{xchg()} statement on \clnref{popall:xchg} atomically removes
all nodes on the stack, placing the head of the resulting list in local
variable \co{p} and setting \co{top} to \co{NULL}.
This atomic operation serializes concurrent calls to \co{list_pop_all()}:
One of them will get the list, and the other a \co{NULL} pointer, at
least assuming that there were no concurrent calls to \co{list_push()}.
An instance of \co{list_pop_all()} that obtains a non-empty list in
\co{p} processes this list in the loop spanning
\clnrefrange{popall:loop:b}{popall:loop:e}.
\Clnref{popall:next} prefetches the \co{->next} pointer,
\clnref{popall:foo} invokes the function referenced by \co{foo()} on the
current node,
\clnref{popall:free} frees the current node, and
\clnref{popall:pnext} sets up \co{p} for the next pass through the loop.
But suppose that a pair of \co{list_push()} instances run concurrently
with a \co{list_pop_all()} on a list initially containing a single
node~A.
Here is one way that this scenario might play out:
\begin{enumerate}
\item The first \co{list_push()} instance pushes a new node~B,
executing through \clnref{push:next}, having just stored
a pointer to node A into node~B's \co{->next} pointer.
\item The \co{list_pop_all()} instance runs to completion,
setting \co{top} to \co{NULL} and freeing node~A.
\item The second \co{list_push()} instance runs to completion,
pushing a new node~C, but happens to allocate the memory
that used to belong to node~A.
\item The first \co{list_push()} instance executes the \co{cmpxchg()}
on \clnref{push:cmpxchg}.
Because new node~C has the same address as the newly freed node~A,
this \co{cmpxchg()} succeeds and this \co{list_push()} instance
runs to completion.
\end{enumerate}
Note that both pushes and the popall all ran successfully despite the
reuse of node~A's memory.
This is an unusual property: Most data structures require protection
against what is often called the ABA problem.
But this property holds only for algorithms written in assembly
language.
The sad fact is that most languages (including C and C++) do not support
pointers to lifetime-ended objects, such as the pointer to the old node~A
contained in node~B's \co{->next} pointer.
In fact, compilers are within their rights to assume that if two pointers
(call them \co{p} and \co{q}) were returned from two different calls to
\co{malloc()}, then those pointers must not be equal.
Real compilers really will generate the constant \co{false} in
response to a \co{p==q} comparison.
This profoundly counter-intuitive behavior is termed \emph{lifetime-end
pointer zap}.
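The following fragment illustrates the hazard:
Because it uses \co{p} after \co{free()}, its behavior is undefined,
so the compiler may legitimately fold the comparison to constant
\co{false} even when both allocations occupy the same address:

\begin{VerbatimL}
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	void *p = malloc(16);

	free(p);              /* p's lifetime ends here.  */
	void *q = malloc(16); /* Might reuse p's memory.  */
	if (p == q)           /* May be folded to false!  */
		printf("Same address!\n");
	free(q);
	return 0;
}
\end{VerbatimL}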
Many concurrent applications avoid this problem by carefully hiding the
memory allocator from the compiler, thus preventing the compiler from
making inappropriate assumptions.
This obfuscatory approach currently works in practice, but might well
one day fall victim to increasingly aggressive optimizers.
There is work underway in both the C and C++ standards committees
to address this
problem~\cite{PaulEMcKenney2019PointerLifetimeEndZap,PaulEMcKenney2020PointerLifetimeEndZapCpp}.
In the meantime, please exercise great care when coding ABA-tolerant
algorithms.
\end{fcvref}
\QuickQuiz{
So why not ditch antique languages like C and C++ for something
more modern?
}\QuickQuizAnswer{
That won't help unless the proponents of those more-modern
languages are energetic enough to write their own compiler backends.
The usual practice of re-using existing backends also reuses
charming properties such as lifetime-end pointer zap.
}\QuickQuizEnd
\subsection{Applicability of NBS Benefits}
\label{sec:advsync:Applicability of NBS Benefits}
The most heavily cited NBS benefits stem from its forward-progress
guarantees, its tolerance of fail-stop bugs, and its linearizability.
Each of these is discussed in one of the following sections.
\subsubsection{NBS Forward Progress Guarantees}
\label{sec:advsync:NBS Forward Progress Guarantees}
NBS's forward-progress guarantees have caused many to suggest its use in
real-time systems, and NBS algorithms are in fact used in a great many
such systems.
However, it is important to note that forward-progress guarantees are
largely orthogonal to those that form the basis of real-time programming:
\begin{enumerate}
\item Real-time forward-progress guarantees usually have some
definite time associated with them, for example,
``scheduling latency must be less than 100 microseconds.''
In contrast, the most popular forms of NBS guarantee only
that progress will be made in finite time, with no definite
bound.
\item Real-time forward-progress guarantees are often
probabilistic, as in the soft-real-time guarantee that
``at least 99.9\,\% of the time, scheduling latency must
be less than 100 microseconds.''
In contrast, NBS's forward-progress
guarantees have traditionally been unconditional.
\item Real-time forward-progress guarantees are often conditioned on
environmental constraints, for example, only being honored:
(1)~for the highest-priority tasks,
(2)~when each CPU spends at least a certain fraction of its time idle,
or (3)~when I/O rates are below some specified maximum.
In contrast, NBS's forward-progress
guarantees are usually unconditional.\footnote{
As we will see below, some recent NBS work relaxes
this guarantee.}
\item An important component of a real-time program's environment
is the scheduler.
NBS algorithms assume a worst-case \emph{demonic scheduler}.
In contrast, real-time systems assume that the scheduler is
doing its level best to satisfy any scheduling constraints
it knows about, and, in the absence of such constraints,
its level best to honor process priorities and to provide
fair scheduling to processes of the same priority.
This assumption of a non-demonic scheduler allows real-time
programs to use simpler algorithms than those required for
NBS~\cite{DanAlitarh2013PracticalProgress,BjoernBrandenburgPhD}.
\item Real-time forward-progress guarantees usually apply only
in the absence of software bugs.
In contrast, most NBS guarantees apply even in the face of
fail-stop bugs.\footnote{
Again, some recent NBS work relaxes this guarantee.}
\item NBS forward-progress guarantee classes imply linearizability.
In contrast, real-time forward-progress guarantees are often
independent of ordering constraints such as linearizability.
\end{enumerate}
To reiterate, despite these differences, a number of NBS algorithms are
extremely useful in real-time programs.
\subsubsection{NBS Fail-Stop Tolerance}
\label{sec:advsync:NBS Fail-Stop Tolerance}
Of the classes of NBS algorithms, wait-free synchronization (bounded or
otherwise), lock-free synchronization, obstruction-free synchronization,
and clash-free synchronization guarantee forward progress even in the
presence of fail-stop bugs.
An example fail-stop bug would be a scheduler bug that causes a random
thread to be preempted indefinitely.
As we will see, this fail-stop-tolerant property can be useful, but the
fact is that composing a set of fail-stop-tolerant mechanisms does not
necessarily result in a fail-stop-tolerant system.
To see this, consider a system made up of a series of wait-free queues,
where an element is removed from one queue in the series, processed,
and then added to the next queue.
If a thread is preempted in the midst of a queuing operation, all is well
because the wait-free nature of the queue will guarantee forward progress.
But suppose that preemption occurs not during the queuing, but instead
during the between-queues processing.
In this case, the element being processed is lost because
the fail-stop-tolerant nature of the wait-free queues does not
extend to the code using those queues.
Nevertheless, there are a few applications where NBS's rather limited
fail-stop tolerance is useful.
For example, in some network-based or web applications, a fail-stop
event will eventually result in a retransmission, which will restart
any work that was lost due to the fail-stop event.
Systems running such applications can therefore be heavily loaded, even
to the point where the scheduler can no longer provide any reasonable
fairness guarantee.
In contrast, if a thread fail-stops while holding a lock, the application
will normally need to be restarted.
Nevertheless, NBS is not a panacea even within this restricted area,
due to the possibility of spurious retransmissions caused by pure
scheduling delays.
In some cases, it may be more efficient to reduce the load in order
to avoid queueing delays.
Reducing the load also improves the scheduler's ability to
provide fair access, which might in turn reduce or even eliminate the
apparent fail-stop events, thus also shedding the load imposed by retry
operations initiated by retransmissions.
\subsubsection{NBS Linearizability}
\label{sec:advsync:NBS Linearizability}
It is important to note that linearizability can be quite useful,
especially when analyzing concurrent code made up of strict locking
and fully ordered atomic operations.\footnote{
For example, the Linux kernel's value-returning atomic operations.}
Furthermore, this handling of fully ordered atomic operations
automatically covers simple NBS algorithms.
However, the linearization points of a complex NBS algorithm are often
buried deep within that algorithm, and thus not visible to users of
a library function implementing a part of such an algorithm.
Therefore, any claims that users benefit from the linearizability properties
of complex NBS algorithms should be regarded with deep
suspicion~\cite{AndreasHaas2012FIFOisnt}.
It is sometimes asserted that linearizability is necessary for developers
to produce proofs of correctness for their concurrent code.
However, such proofs are the exception rather than the rule, and
developers who do produce proofs often use modern proof techniques that
do not depend on linearizability.
Furthermore, these techniques frequently do not require a full
specification, given that developers often learn their specification
after the fact, one bug at a time.
A few such proof techniques were discussed in
\cref{chp:Formal Verification}.\footnote{
A memorable verbal discussion with an advocate of linearizability
resulted in this advocate being asked:
``So the reason linearizability is important is to rescue 1980s
proof techniques?''
The advocate immediately replied in the affirmative, then spent
some time disparaging a particular modern proof technique.
Oddly enough, that technique was one of those successfully
applied to Linux-kernel RCU.}
It is often asserted that linearizability maps well to sequential
specifications, which are said to be more natural than are concurrent
specifications~\cite{SergioRajsbaum2020HistoryLinearizability}.
But this assertion flies in the face of reality in the form of a highly
concurrent objective universe.
Therefore, nature would be expected to select for beings able to cope
with at least some level of concurrency, and anyone watching a team sport
or a person successfully overseeing a number of small children would be
hard-pressed to argue that nature has not done exactly that.
In addition, given that the teaching of sequential
computing is still believed to be somewhat of a black
art~\cite{ElizabethPatitsas2020GradesNotBimodal}, it is reasonable
to expect that teaching of concurrent computing is in a similar state
of disarray.
Therefore, focusing on only one proof technique is unlikely to be a
good way forward.
Again, please understand that linearizability is quite useful in many
situations.
Then again, so is that venerable tool, the hammer.
But there comes a point in the field of computing where one should put
down the hammer and pick up a keyboard.
Similarly, it appears that there are times when linearizability is not
the best tool for the job.
To their credit, there are some linearizability advocates who are aware
of some of its shortcomings~\cite{SergioRajsbaum2020HistoryLinearizability}.
There are also proposals to extend linearizability, for example,
interval-linearizability, which is intended to handle the common case
of operations that require non-zero time to
complete~\cite{10.1145/3266457}.
It remains to be seen whether these proposals will result in theories
able to handle modern concurrent software artifacts, especially given
that several of the proof techniques discussed in \cref{chp:Formal
Verification} already handle many modern concurrent software artifacts.
\subsection{NBS Discussion}
\label{sec:advsync:NBS Discussion}
It is possible to create fully non-blocking queues~\cite{MichaelScott96};
however, such queues are much more complex than the half-NBS algorithm
outlined above.
The lesson here is to carefully consider your actual requirements.
Relaxing irrelevant requirements can often result in great
improvements in simplicity, performance, and scalability.
Recent research points to another important way to relax requirements.
It turns out that systems providing fair scheduling can enjoy most
of the benefits of wait-free synchronization even when running
algorithms that provide only non-blocking
synchronization, both in theory~\cite{DanAlitarh2013PracticalProgress}
and in practice~\cite{SamyAlBahra2013NBS}.
Because a great many schedulers used in production do in fact
provide fairness,
the more-complex algorithms providing wait-free synchronization
usually provide no practical advantages over their simpler
and often faster non-blocking-synchronization counterparts.
Interestingly enough, fair scheduling is but one beneficial
constraint that is often respected in practice.
Other sets of constraints can permit blocking algorithms to
achieve deterministic real-time response.
For example, given fair locks that are granted to requesters in FIFO order at
a given priority level,
a method of avoiding priority inversion (such as priority
inheritance~\cite{Takada:1995:RSN:527074.828566,Cai-DongWang1996PrioInherLock}
or priority ceiling), a bounded number of threads,
bounded critical sections,
bounded load,
and avoidance of fail-stop bugs,
lock-based applications can provide deterministic
response times~\cite{BjoernBrandenburgPhD,DipankarSarma2004OLSscalability}.
This approach of course blurs the distinction between blocking and wait-free
synchronization, which is all to the good.
Hopefully, theoretical frameworks will continue to grow in their
ability to
describe how software is actually constructed in practice.
Those who feel that theory should lead the way are referred to the
inimitable Peter Denning, who said of operating systems:
``Theory follows practice''~\cite{Denning:2015:POF:2830903.2830904},
or to the eminent Tony Hoare, who said of the whole of engineering:
``In all branches of engineering science, the engineering starts before
the science; indeed, without the early products of engineering, there
would be nothing for the scientist to
study!''~\cite{RichardMorris2007TonyHoareInterview}
However, once an appropriate body of theory becomes available,\footnote{
Admittedly often much later than the first \emph{proposed}
body of theory.}
it is wise to make use of it.
\input{advsync/rt}
\QuickQuizAnswersChp{qqzadvsync}