| % advsync/advsync.tex |
| % mainfile: ../perfbook.tex |
| % SPDX-License-Identifier: CC-BY-SA-3.0 |
| |
| \QuickQuizChapter{sec:advsync:Advanced Synchronization}{Advanced Synchronization}{qqzadvsync} |
| % |
| \Epigraph{If a little knowledge is a dangerous thing, just think what |
| you could do with a lot of knowledge!}{\emph{Unknown}} |
| |
| This chapter covers synchronization techniques used for lockless |
| algorithms and parallel real-time systems. |
| |
| Although lockless algorithms can be quite helpful when faced with |
| extreme requirements, they are no panacea. |
| For example, as noted at the end of \cref{chp:Counting}, |
| you should thoroughly apply partitioning, batching, and |
| well-tested packaged weak APIs |
| (see \cref{chp:Data Ownership,chp:Deferred Processing}) |
| before even thinking about lockless algorithms. |
| |
| But after doing all that, you still might find yourself needing the |
| advanced techniques described in this chapter. |
| To that end, |
| \cref{sec:advsync:Avoiding Locks} |
| summarizes techniques used thus far for avoiding locks and |
| \cref{sec:advsync:Non-Blocking Synchronization} |
| gives a brief overview of non-blocking synchronization. |
| Memory ordering is also quite important, but it warrants its own |
| \cref{chp:Advanced Synchronization: Memory Ordering}. |
| |
| The second form of advanced synchronization provides the stronger |
| forward-progress guarantees needed for parallel real-time computing, |
| which is the topic of |
| \cref{sec:advsync:Parallel Real-Time Computing}. |
| |
| \section{Avoiding Locks} |
| \label{sec:advsync:Avoiding Locks} |
| % |
| \epigraph{We are confronted with insurmountable opportunities.} |
| {\emph{Walt Kelly}} |
| |
| Although locking is the workhorse of parallelism in production, in |
| many situations performance, scalability, and real-time response can |
| all be greatly improved through use of lockless techniques. |
| A particularly impressive example of such a lockless technique are |
| the statistical counters described in |
| \cref{sec:count:Statistical Counters}, |
| which avoids not only locks, but also atomic operations, memory barriers, |
| and even cache misses for counter increments. |
| Other examples we have covered include: |
| |
| \begin{enumerate} |
| \item The fastpaths through a number of other counting algorithms |
| in \cref{chp:Counting}. |
| \item The fastpath through resource allocator caches in |
| \cref{sec:SMPdesign:Resource Allocator Caches}. |
| \item The maze solver in \cref{sec:SMPdesign:Beyond Partitioning}. |
| \item The data-ownership techniques in \cref{chp:Data Ownership}. |
| \item The reference-counting, hazard-pointer, and RCU techniques |
| in \cref{chp:Deferred Processing}. |
| \item The lookup code paths in \cref{chp:Data Structures}. |
| \item Many of the techniques in \cref{chp:Putting It All Together}. |
| \end{enumerate} |
| |
| In short, lockless techniques are quite useful and are heavily used. |
| However, it is best if lockless techniques are hidden behind a |
| well-defined API, such as the \co{inc_count()}, \co{memblock_alloc()}, |
| \co{rcu_read_lock()}, and so on. |
| The reason for this is that undisciplined use of lockless techniques |
| is a good way to create difficult bugs. |
| If you believe that finding and fixing such bugs is easier than avoiding |
| them, please re-read |
| \cref{chp:Validation,chp:Formal Verification}. |
| |
| \section{Non-Blocking Synchronization} |
| \label{sec:advsync:Non-Blocking Synchronization} |
| % |
| \epigraph{Never worry about theory as long as the machinery does what |
| it's supposed to do.} |
| {\emph{Robert A. Heinlein}} |
| |
| The term \emph{non-blocking synchronization} (NBS)~\cite{MauriceHerlihy90a} |
| describes seven classes of linearizable algorithms with differing |
| \emph{forward-progress guarantees}~\cite{DanAlitarh2013PracticalProgress}, |
| which are as follows: |
| |
| \begin{enumerate} |
| \item \emph{Bounded wait-free synchronization}: Every thread |
| will make progress within |
| a specific finite period of time~\cite{Herlihy91}. |
| This level is widely considered to be unachievable, which might be why |
| Alitarh et al.\ omitted it~\cite{DanAlitarh2013PracticalProgress}. |
| \item \emph{Wait-free synchronization}: Every thread will make progress |
| in finite time~\cite{Herlihy93}. |
| \item \emph{Lock-free synchronization}: At least one thread will |
| make progress in finite time~\cite{Herlihy93}. |
| \item \emph{Obstruction-free synchronization}: Every thread will |
| make progress in finite time in the absence of |
| contention~\cite{HerlihyLM03}. |
| \item \emph{Clash-free synchronization}: At least one thread will |
| make progress in finite time in the absence of |
| contention~\cite{DanAlitarh2013PracticalProgress}. |
| \item \emph{Starvation-free synchronization}: Every thread will |
| make progress in finite time in the absence of |
| failures~\cite{DanAlitarh2013PracticalProgress}. |
| \item \emph{Deadlock-free synchronization}: At least one thread will |
| make progress in finite time in the absence of |
| failures~\cite{DanAlitarh2013PracticalProgress}. |
| \end{enumerate} |
| |
| NBS classes~1, 2 and~3 were first formulated in the early 1990s, |
| class~4 was first formulated in the early 2000s, |
| and class~5 was first formulated in 2013. |
| The final two classes have seen informal use for a great many decades, |
| but were reformulated in 2013. |
| |
| In theory, any parallel algorithm can be cast into wait-free form, |
| but there are a relatively small subset of NBS algorithms that are |
| in common use. |
| A few of these are listed in the following section. |
| |
| \subsection{Simple NBS} |
| \label{sec:advsync:Simple NBS} |
| |
| Perhaps the simplest NBS algorithm is atomic update of an integer |
| counter using fetch-and-add (\co{atomic_add_return()}) primitives. |
| |
| \subsubsection{NBS Sets} |
| \label{sec:advsync:NBS Sets} |
| |
| Another simple NBS algorithm implements a set of integers in an array. |
| Here the array index indicates a value that might be a member of the set |
| and the array element indicates whether or not that value actually is |
| a set member. |
| The linearizability criterion for NBS algorithms requires that reads from |
| and updates to the array either use atomic instructions or be accompanied |
| by memory barriers, but in the not-uncommon case where linearizability |
| is not important, simple volatile loads and stores suffice, for example, |
| using \co{READ_ONCE()} and \co{WRITE_ONCE()}. |
| |
| An NBS set may also be implemented using a bitmap, where each value that |
| might be a member of the set corresponds to one bit. |
| Reads and updates must normally be carried out via atomic bit-manipulation |
| instructions, although compare-and-swap (\co{cmpxchg()} or CAS) |
| instructions can also be used. |
| |
| \subsubsection{NBS Counters} |
| \label{sec:advsync:NBS Counters} |
| |
| The statistical counters algorithm discussed in |
| \cref{sec:count:Statistical Counters} |
| can be considered to be bounded-wait-free, but only by using a cute |
| definitional trick in which the sum is considered to be approximate |
| rather than exact.\footnote{ |
| Citation needed. |
| I heard of this trick verbally from Mark Moir.} |
| Given sufficiently wide error bounds that are a function of the length |
| of time that the \co{read_count()} function takes to sum the counters, |
| it is not possible to prove that any non-linearizable behavior occurred. |
| This definitely (if a bit artificially) classifies the statistical-counters |
| algorithm as wait-free. |
| This algorithm is probably the most heavily used NBS algorithm in the |
| Linux kernel. |
| |
| \subsubsection{Half-NBS Queue} |
| \label{sec:advsync:Half-NBS Queue} |
| |
| \begin{fcvref}[ln:count:NBS Enqueue Algorithm] |
| Another common NBS algorithm is the atomic queue where elements are |
| enqueued using an atomic exchange instruction~\cite{MagedMichael1993JPDC}, |
| followed by a store into the \co{->next} pointer of the new element's |
| predecessor, as shown in \cref{lst:count:NBS Enqueue Algorithm}, |
| which shows the userspace-RCU library |
| implementation~\cite{MathieuDesnoyers2009URCU}. |
| \Clnref{tail} updates the tail pointer to reference the new element while |
| returning a reference to its predecessor, which is stored in |
| local variable \co{old_tail}. |
| \Clnref{pred} then updates the predecessor's \co{->next} pointer to |
| reference the newly added element, and finally \clnref{ret} |
| returns an indication as to whether or not the queue was initially |
| empty. |
| |
| \begin{listing}[tbp] |
| \begin{fcvlabel}[ln:count:NBS Enqueue Algorithm] |
| \begin{VerbatimL}[commandchars=\\\[\]] |
| static inline bool |
| ___cds_wfcq_append(struct cds_wfcq_head *head, |
| struct cds_wfcq_tail *tail, |
| struct cds_wfcq_node *new_head, |
| struct cds_wfcq_node *new_tail) |
| { |
| struct cds_wfcq_node *old_tail; |
| |
| old_tail = uatomic_xchg(&tail->p, new_tail); \lnlbl[tail] |
| CMM_STORE_SHARED(old_tail->next, new_head); \lnlbl[pred] |
| return old_tail != &head->node; \lnlbl[ret] |
| } |
| |
| static inline bool |
| _cds_wfcq_enqueue(struct cds_wfcq_head *head, |
| struct cds_wfcq_tail *tail, |
| struct cds_wfcq_node *new_tail) |
| { |
| return ___cds_wfcq_append(head, tail, |
| new_tail, new_tail); |
| } |
| \end{VerbatimL} |
| \end{fcvlabel} |
| \caption{NBS Enqueue Algorithm} |
| \label{lst:count:NBS Enqueue Algorithm} |
| \end{listing} |
| |
| Although mutual exclusion is required to dequeue a single element |
| (so that dequeue is blocking), it is possible to carry out a non-blocking |
| removal of the entire contents of the queue. |
| What is not possible is to dequeue any given element in a non-blocking |
| manner: The enqueuer might have failed between \clnref{tail,pred} of the |
| listing, so that the element in question is only partially enqueued. |
| This results in a half-NBS algorithm where enqueues are NBS but |
| dequeues are blocking. |
| This algorithm is nevertheless heavily used in practice, in part because |
| most production software is not required to tolerate arbitrary fail-stop |
| errors. |
| \end{fcvref} |
| |
| \subsubsection{NBS Stack} |
| \label{sec:advsync:NBS Stack} |
| |
| \begin{fcvref}[ln:advsync:lifo_push:whole] |
| \Cref{lst:advsync:NBS Stack Algorithm} |
| shows the LIFO push algorithm, which boasts lock-free push and |
| bounded wait-free pop (\path{lifo-push.c}), forming an NBS stack. |
| The origins of this algorithm are unknown, but it was referred to in |
| a patent granted in 1975~\cite{PaulJBrown1975LIFOpush}. |
| This patent was filed in 1973, a few months before your editor |
| saw his first computer, which had but one CPU. |
| |
| \begin{listing}[tbp] |
| \input{CodeSamples/advsync/lifo_push@whole.fcv} |
| \caption{NBS Stack Algorithm} |
| \label{lst:advsync:NBS Stack Algorithm} |
| \end{listing} |
| |
| \Clnrefrange{struct:b}{struct:e} shows the \co{node_t} structure, |
| which contains an arbitrary value and a pointer to the next structure |
| on the stack and |
| \Clnref{top} shows the top-of-stack pointer. |
| |
| The \co{list_push()} function spans \clnrefrange{push:b}{push:e}. |
| \Clnref{push:alloc} allocates a new node and |
| \clnref{push:initialize} initializes it. |
| \Clnref{push:next} initializes the newly allocated node's \co{->next} |
| pointer, and \clnref{push:cmpxchg} attempts to push it on the stack. |
| If \clnref{push:check} detects \co{cmpxchg()} failure, another pass |
| through the loop retries. |
| Otherwise, the new node has been successfully pushed, and this function |
| returns to its caller. |
| Note that \clnref{push:check} resolves races in which two concurrent |
| instances of \co{list_push()} attempt to push onto the stack. |
| The \co{cmpxchg()} will succeed for one and fail for the other, |
| causing the other to retry, thereby selecting an arbitrary order for |
| the two node on the stack. |
| |
| The \co{list_pop_all()} function spans \clnrefrange{popall:b}{popall:e}. |
| The \co{xchg()} statement on \clnref{popall:xchg} atomically removes |
| all nodes on the stack, placing the head of the resulting list in local |
| variable \co{p} and setting \co{top} to \co{NULL}. |
| This atomic operation serializes concurrent calls to \co{list_pop_all()}: |
| One of them will get the list, and the other a \co{NULL} pointer, at |
| least assuming that there were no concurrent calls to \co{list_push()}. |
| |
| An instance of \co{list_pop_all()} that obtains a non-empty list in |
| \co{p} processes this list in the loop spanning |
| \clnrefrange{popall:loop:b}{popall:loop:e}. |
| \Clnref{popall:next} prefetches the \co{->next} pointer, |
| \clnref{popall:foo} invokes the function referenced by \co{foo()} on the |
| current node, |
| \clnref{popall:free} frees the current node, and |
| \clnref{popall:pnext} sets up \co{p} for the next pass through the loop. |
| |
| But suppose that a pair of \co{list_push()} instances run concurrently |
| with a \co{list_pop_all()} with a list initially containing a single |
| node~A. |
| Here is one way that this scenario might play out: |
| |
| \begin{enumerate} |
| \item The first \co{list_push()} instance pushes a new node~B, |
| executing through \clnref{push:next}, having just stored |
| a pointer to node A into node~B's \co{->next} pointer. |
| \item The \co{list_pop_all()} instance runs to completion, |
| setting \co{top} to \co{NULL} and freeing node~A. |
| \item The second \co{list_push()} instance runs to completion, |
| pushing a new node~C, but happens to allocate the memory |
| that used to belong to node~A. |
| \item The first \co{list_push()} instance executes the \co{cmpxchg()} |
| on \clnref{push:cmpxchg}. |
| Because new node~C has the same address as the newly freed node~A, |
| this \co{cmpxchg()} succeeds and this \co{list_push()} instance |
| runs to completion. |
| \end{enumerate} |
| |
| Note that both pushes and the popall all ran successfully despite the |
| reuse of node~A's memory. |
| This is an unusual property: Most data structures require protection |
| against what is often called the ABA problem. |
| |
| But this property holds only for algorithm written in assembly |
| language. |
| The sad fact is that most languages (including C and C++) do not support |
| pointers to lifetime-ended objects, such as the pointer to the old node~A |
| contained in node~B's \co{->next} pointer. |
| In fact, compilers are within their rights to assume that if two pointers |
| (call them \co{p} and \co{q}) were returned from two different calls to |
| \co{malloc()}, then those pointers must not be equal. |
| Real compilers really will generate the constant \co{false} in |
| response to a \co{p==q} comparison. |
| This profoundly counter-intuitive behavior is termed \emph{lifetime-end |
| pointer zap}. |
| |
| Many concurrent applications avoid this problem by carefully hiding the |
| memory allocator from the compiler, thus preventing the compiler from |
| making inappropriate assumptions. |
| This obfuscatory approach currently works in practice, but might well |
| one day fall victim to increasingly aggressive optimizers. |
| There is work underway in both the C and C++ standards committees |
| to address this |
| problem~\cite{PaulEMcKenney2019PointerLifetimeEndZap,PaulEMcKenney2020PointerLifetimeEndZapCpp}. |
| In the meantime, please exercise great care when coding ABA-tolerant |
| algorithms. |
| \end{fcvref} |
| |
| \QuickQuiz{ |
| So why not ditch antique languages like C and C++ for something |
| more modern? |
| }\QuickQuizAnswer{ |
| That won't help unless the more-modern languages proponents |
| are energetic enough to write their own compiler backends. |
| The usual practice of re-using existing backends also reuses |
| charming properties such as lifetime-end pointer zap. |
| }\QuickQuizEnd |
| |
| \subsection{Applicability of NBS Benefits} |
| \label{sec:advsync:Applicability of NBS Benefits} |
| |
| The most heavily cited NBS benefits stem from its forward-progress |
| guarantees, its tolerance of fail-stop bugs, and from its linearizability. |
| Each of these is discussed in one of the following sections. |
| |
| \subsubsection{NBS Forward Progress Guarantees} |
| \label{sec:advsync:NBS Forward Progress Guarantees} |
| |
| NBS's forward-progress guarantees have caused many to suggest its use in |
| real-time systems, and NBS algorithms are in fact used in a great many |
| such systems. |
| However, it is important to note that forward-progress guarantees are |
| largely orthogonal to those that form the basis of real-time programming: |
| |
| \begin{enumerate} |
| \item Real-time forward-progress guarantees usually have some |
| definite time associated with them, for example, |
| ``scheduling latency must be less than 100 microseconds.'' |
| In contrast, the most popular forms of NBS only guarantees |
| that progress will be made in finite time, with no definite |
| bound. |
| \item Real-time forward-progress guarantees are often |
| probabilistic, as in the soft-real-time guarantee that |
| ``at least 99.9\,\% of the time, scheduling latency must |
| be less than 100 microseconds.'' |
| In contrast, NBS's forward-progress |
| guarantees have traditionally been unconditional. |
| \item Real-time forward-progress guarantees are often conditioned on |
| environmental constraints, for example, only being honored: |
| (1)~for the highest-priority tasks, |
| (2)~when each CPU spends at least a certain fraction of its time idle, |
| or (3)~when I/O rates are below some specified maximum. |
| In contrast, NBS's forward-progress |
| guarantees are usually unconditional.\footnote{ |
| As we will see below, some recent NBS work relaxes |
| this guarantee.} |
| \item An important component of a real-time program's environment |
| is the scheduler. |
| NBS algorithms assume a worst-case \emph{demonic scheduler}. |
| In contrast, real-time systems assume that the scheduler is |
| doing its level best to satisfy any scheduling constraints |
| it knows about, and, in the absence of such constraints, |
| its level best to honor process priorities and to provide |
| fair scheduling to processes of the same priority. |
| This assumption of a non-demonic scheduler allows real-time |
| programs to use simpler algorithms than those required for |
| NBS~\cite{DanAlitarh2013PracticalProgress,BjoernBrandenburgPhD}. |
| \item Real-time forward-progress guarantees usually apply only |
| in the absence of software bugs. |
| In contrast, most NBS guarantees apply even in the face of |
| fail-stop bugs.\footnote{ |
| Again, some recent NBS work relaxes this guarantee.} |
| \item NBS forward-progress guarantee classes imply linearizability. |
| In contrast, real-time forward progress guarantees are often |
| independent of ordering constraints such as linearizability. |
| \end{enumerate} |
| |
| To reiterate, despite these differences, a number of NBS algorithms are |
| extremely useful in real-time programs. |
| |
| \subsubsection{NBS Fail-Stop Tolerance} |
| \label{sec:advsync:NBS Fail-Stop Tolerance} |
| |
| Of the classes of NBS algorithms, wait-free synchronization (bounded or |
| otherwise), lock-free synchronization, obstruction-free synchronization, |
| and clash-free synchronization guarantee forward progress even in the |
| presence of fail-stop bugs. |
| An example fail-stop bug would be a scheduler bug that causes a random |
| thread to be preempted indefinitely. |
| As we will see, this fail-stop-tolerant property can be useful, but the |
| fact is that composing a set of fail-stop-tolerate mechanisms does not |
| necessarily result in a fail-stop-tolerant system. |
| To see this, consider a system made up of a series of wait-free queues, |
| where an element is removed from one queue in the series, processed, |
| and then added to the next queue. |
| |
| If a thread is preempted in the midst of a queuing operation, all is well |
| because the wait-free nature of the queue will guarantee forward progress. |
| But suppose that preemption occurs not during the queuing, but instead |
| during the between-queues processing. |
| In this case, the element being processed is lost because |
| the fail-stop-tolerant nature of the wait-free queues does not |
| extend to the code using those queues. |
| |
| Nevertheless, there are a few applications where NBS's rather limited |
| fail-stop-tolerance is useful. |
| For example, in some network-based or web applications, a fail-stop |
| event will eventually result in a retransmission, which will restart |
| any work that was lost due to the fail-stop event. |
| Systems running such applications can therefore be heavily loaded, even |
| to the point where the scheduler can no longer provide any reasonable |
| fairness guarantee. |
| In constrast, if a thread fail-stops while holding a lock, the application |
| will normally need to be restarted. |
| Nevertheless, NBS is not a panacea even within this restricted area, |
| due to the possibility of spurious retransmissions due to pure scheduling |
| delays. |
| In some cases, it may be more efficient to reduce the load to avoid |
| queueing delays, which will also improve the scheduler's ability to |
| provide fair access, which might turn reduce or even eliminate the |
| apparent fail-stop events, thus shedding the load due to the retry |
| operations initiated by retransmissions. |
| |
| \subsubsection{NBS Linearizability} |
| \label{sec:advsync:NBS Linearizability} |
| |
| It is important to note that linearizability can be quite useful, |
| especially when analyzing concurrent code made up of strict locking |
| and fully ordered atomic operations.\footnote{ |
| For example, the Linux kernel's value-returning atomic operations.} |
| Furthermore, this handling of fully ordered atomic operations |
| automatically covers simple NBS algorithms. |
| |
| However, the linearization points of a complex NBS algorithms are often |
| buried deep within that algorithm, and thus not visible to users of |
| a library function implementing a part of such an algorithm. |
| Therefore, any claims that users benefit from the linearizability properties |
| of complex NBS algorithms should be regarded with deep |
| suspicion~\cite{AndreasHaas2012FIFOisnt}. |
| |
| It is sometimes asserted that linearizability is necessary for developers |
| to produce proofs of correctness for their concurrent code. |
| However, such proofs are the exception rather than the rule, and modern |
| developers who do produce proofs often use modern proof techniques that |
| do not depend on linearizability. |
| Furthermore, developers frequently use modern proof techniques that do |
| not require a full specification, given that developers often learn |
| their specification after the fact, one bug at a time. |
| A few such proof techniques were discussed in |
| \cref{chp:Formal Verification}.\footnote{ |
| A memorable verbal discussion with an advocate of linearizability |
| resulted in this advocate being asked: |
| ``So the reason linearizability is important is to rescue 1980s |
| proof techniques?'' |
| The advocate immediately replied in the affirmative, then spent |
| some time disparaging a particular modern proof technique. |
| Oddly enough, that technique was one of those successfully |
| applied to Linux-kernel RCU.} |
| |
| It is often asserted that linearizability maps well to sequential |
| specifications, which are said to be more natural than are concurrent |
| specifications~\cite{SergioRajsbaum2020HistoryLinearizability}. |
| But this assertion flies in the face of reality in the form of a highly |
| concurrent objective universe. |
| Therefore, nature would be expected to select for beings able to cope |
| with at least some level of concurrency, and anyone watching a team sport |
| or a person successfully overseeing a number of small children would be |
| hard-pressed to argue that nature has not done exactly that. |
| In addition, given that the teaching of sequential |
| computing is still believed to be somewhat of a black |
| art~\cite{ElizabethPatitsas2020GradesNotBimodal}, it is reasonable |
| to expect that teaching of concurrent computing is in a similar state |
| of disarray. |
| Therefore, focusing on only one proof technique is unlikely to be a |
| good way forward. |
| |
| Again, please understand that linearizability is quite useful in many |
| situations. |
| Then again, so is that venerable tool, the hammer. |
| But there comes a point in the field of computing where one should put |
| down the hammer and pick up a keyboard. |
| Similarly, it appears that there are times when linearizability is not |
| the best tool for the job. |
| |
| To their credit, there are some linearizability advocates who are aware |
| of some of its shortcomings~\cite{SergioRajsbaum2020HistoryLinearizability}. |
| There are also proposals to extend linearizability, for example, |
| interval-linearizability, which is intended to handle the common case |
| of operations that require non-zero time to |
| complete~\cite{10.1145/3266457}. |
| It remains to be seen whether these proposals will result in theories |
| able to handle modern concurrent software artifacts, especially given |
| that several of the proof techniques discussed in \cref{chp:Formal |
| Verification} already handle many modern concurrent software artifacts. |
| |
| \subsection{NBS Discussion} |
| \label{sec:advsync:NBS Discussion} |
| |
| It is possible to create fully non-blocking queues~\cite{MichaelScott96}, |
| however, such queues are much more complex than the half-NBS algorithm |
| outlined above. |
| The lesson here is to carefully consider your actual requirements. |
| Relaxing irrelevant requirements can often result in great |
| improvements in simplicity, performance, and scalability. |
| |
| Recent research points to another important way to relax requirements. |
| It turns out that systems providing fair scheduling can enjoy most |
| of the benefits of wait-free synchronization even when running |
| algorithms that provide only non-blocking |
| synchronization, both in theory~\cite{DanAlitarh2013PracticalProgress} |
| and in practice~\cite{SamyAlBahra2013NBS}. |
| Because a great many schedulers used in production do in fact |
| provide fairness, |
| the more-complex algorithms providing wait-free synchronization |
| usually provide no practical advantages over their simpler |
| and often faster non-blocking-synchronization counterparts. |
| |
| Interestingly enough, fair scheduling is but one beneficial |
| constraint that is often respected in practice. |
| Other sets of constraints can permit blocking algorithms to |
| achieve deterministic real-time response. |
| For example, given fair locks that are granted to requesters in FIFO order at |
| a given priority level, |
| a method of avoiding priority inversion (such as priority |
| inheritance~\cite{Takada:1995:RSN:527074.828566,Cai-DongWang1996PrioInherLock} |
| or priority ceiling), a bounded number of threads, |
| bounded critical sections, |
| bounded load, |
| and avoidance of fail-stop bugs, |
| lock-based applications can provide deterministic |
| response times~\cite{BjoernBrandenburgPhD,DipankarSarma2004OLSscalability}. |
| This approach of course blurs the distinction between blocking and wait-free |
| synchronization, which is all to the good. |
| Hopefully theoretical frameworks continue to grow, further increasing |
| their ability to |
| describe how software is actually constructed in practice. |
| |
| Those who feel that theory should lead the way are referred to the |
| inimitable Peter Denning, who said of operating systems: |
| ``Theory follows practice''~\cite{Denning:2015:POF:2830903.2830904}, |
| or to the eminent Tony Hoare, who said of the whole of engineering: |
| ``In all branches of engineering science, the engineering starts before |
| the science; indeed, without the early products of engineering, there |
| would be nothing for the scientist to |
| study!''~\cite{RichardMorris2007TonyHoareInterview} |
| However, once an appropriate body of theory becomes available,\footnote{ |
| Admittedly often much later than the first \emph{proposed} |
| body of theory.} |
| it is wise to make use of it. |
| |
| \input{advsync/rt} |
| |
| \QuickQuizAnswersChp{qqzadvsync} |