| % defer/whichtochoose.tex |
| |
| \section{Which to Choose?} |
| \label{sec:Which to Choose?} |
| |
| \begin{table*} |
| \small |
| \begin{center} |
| \begin{tabular}{l||c|c|c|c|c|c|c} |
| ~ ~ ~ ~ ~ ~ ~ ~ ~ |
| & \begin{picture}(6,100)(0,0) |
| \rotatebox{90}{Existence Guarantee} |
| \end{picture} |
| & \begin{picture}(7,100)(0,0) |
| \rotatebox{90}{Updates and Readers} |
| \end{picture} |
| \begin{picture}(7,100)(0,0) |
| \rotatebox{90}{Progress Concurrently} |
| \end{picture} |
| & \begin{picture}(6,100)(0,0) |
| \rotatebox{90}{Read-Side Overhead} |
| \end{picture} |
| & \begin{picture}(6,100)(0,0) |
| \rotatebox{90}{Bulk Reference} |
| \end{picture} |
| & \begin{picture}(6,100)(0,0) |
| \rotatebox{90}{Low Memory Footprint} |
| \end{picture} |
| & \begin{picture}(6,100)(0,0) |
| \rotatebox{90}{Unconditional Acquisition} |
| \end{picture} |
| & \begin{picture}(6,100)(0,0) |
| \rotatebox{90}{Non-Blocking Updates} |
| \end{picture} |
| \\ |
| \hline |
| % EX CO RSO BU LMF UA |
| \hline |
| Reference Counting & Y & ~ & ++ $\rightarrow$ atomic \dag |
| & ~ & Y & ~ & ? \\ |
| \hline |
| Hazard Pointers & Y & ~ & MB \dag & ~ & Y & ~ & Y \\ |
| \hline |
| Sequence Locks & ~ & Y & 2 MB \ddag |
| & N/A & N/A & ~ & \\ |
| \hline |
| RCU & Y & ~ & 0 $\rightarrow$ 2 MB |
| & Y & ~ & Y & \\ |
| \multicolumn{7}{l}{} \\ |
| \multicolumn{7}{l}{\dag Incurred on each element traversed on |
| each retry} \\ |
| \multicolumn{7}{l}{\ddag Incurred on each retry} \\ |
| \multicolumn{7}{l}{atomic: Atomic operation} \\ |
| \multicolumn{7}{l}{MB: Memory barrier} \\ |
| \end{tabular} |
| \end{center} |
| \caption{Which Deferred Technique to Choose?} |
| \label{tab:defer:Which Deferred Technique to Choose?} |
| \end{table*} |
| |
| Table~\ref{tab:defer:Which Deferred Technique to Choose?} |
| provides some rough rules of thumb that can help you choose among the |
| four deferred-processing techniques presented in this chapter. |
| |
| As shown in the ``Existence Guarantee'' column, |
| if you need existence guarantees for linked |
| data elements, you must use reference counting, hazard pointers, or RCU. |
| Sequence locks do not provide existence guarantees, instead providing |
| detection of updates, retrying any read-side critical sections |
| that do encounter an update. |
| |
| Of course, as shown in the ``Updates and Readers Progress Concurrently'' |
| column, this detection of updates implies |
| that sequence locking does not permit updates and readers to make forward |
| progress concurrently. |
| After all, preventing such forward progress is the whole point of using |
| sequence locking in the first place! |
| This situation points the way to using sequence locking in conjunction |
| with reference counting, hazard pointers, or RCU in order to provide |
| both existence guarantees and update detection. |
| In fact, the Linux kernel combines RCU and sequence locking in |
| this manner during pathname lookup. |
| |
| The ``Read-Side Overhead'' column gives a rough sense of the read-side |
| overhead of these techniques. |
| The overhead of reference counting can vary widely. |
| At the low end, a simple non-atomic increment suffices, at least in the |
| case where the reference is acquired under the protection of a lock |
| that must acquired for other reasons. |
| At the high end, a fully ordered atomic operation is required. |
| Reference counting incurs this overhead on each and every data element |
| traversed. |
| Hazard pointers incur the overhead of a memory barrier for each data element |
| traversed, and sequence locks incur the overhead of a pair of memory barriers |
| for each attempt to execute the critical section. |
| The overhead of RCU implemntations vary from nothing to that of a pair of |
| memory barriers for each read-side critical section, thus providing RCU |
| with the best performance, particularly for read-side critical sections |
| that traverse many data elements. |
| |
| The ``Bulk Reference'' column indicates that only RCU is capable of acquiring |
| multiple references with constant overhead. |
| The entry for sequence locks is ``N/A'' because, again, sequence locks |
| detect updates rather than acquiring references. |
| |
| \QuickQuiz{} |
| But can't both reference counting and hazard pointers can also acquire |
| a reference to multiple data elements with constant overhead? |
| A single reference count can cover multiple data elements, right? |
| \QuickQuizAnswer{ |
| Almost. |
| As we will see in the ``Unconditional Acquisition'' column, |
| neither reference counting |
| nor hazard pointers provide unconditional acquisition of references, |
| so acquiring a reference can have non-constant overhead in the face |
| of conflicting updates. |
| |
| In addition, using a single reference count to cover multiple |
| data items can have severe consequences, for example, you cannot |
| remove any of the data items until all references to all of them |
| have been released. |
| This can result in more complex data-element-cleanup code, |
| and can also increase memory footprint to rival that of RCU. |
| In other words, the increased memory footprint is a consequence |
| not of RCU in particular, but of bulk reference-count acquisition |
| in general. |
| } \QuickQuizEnd |
| |
| The ``Low Memory Footprint'' column indicates which techniques enjoy low |
| memory footprint. |
| This column ends up being the mirror image of the ``Bulk Reference'' |
| column: The ability to acquire references on large numbers of data |
| elements implies that all these data elements must persist, which |
| in turn implies a large memory footprint in some cases. |
| For example, one thread might delete a large number of data |
| elements while another thread concurrently executes a long RCU read-side |
| critical section. |
| Because the read-side critical section could potentially retain a reference |
| to any of the newly deleted data elements, all those data elements must |
| be retained for the full duration of that critical section. |
| In contrast, reference counting and hazard pointers would retain only |
| those specific data elements actually referenced by concurrent readers. |
| |
| However, this low-memory-footprint advantage comes at a price, as shown |
| in the ``Unconditional Acquisition'' column. |
| To see this, imagine a large linked data structure in which a |
| reference-counting or hazard-pointer reader (call it Thread~A) |
| holds a reference to |
| an isolated data element in the middle of that structure. |
| Consider the following sequence of events: |
| |
| \begin{enumerate} |
| \item Thread~B removes the data element referenced by Thread~A. |
| Because of this reference, the data element cannot yet be freed. |
| \item Thread~B removes all the data elements adjacent to the one |
| referenced by Thread~A. |
| Because there are no references held for these data elements, |
| they are all immediately freed. |
| Because Thread~A's data element has already been removed, |
| its outgoing pointers are not updated. |
| \item All of Thread~A's data element's outgoing pointers now reference |
| the freelist, and therefore cannot safely be traversed. |
| \item The reference-counting or hazard-pointer implementation therefore |
| has no choice but to fail any attempt by Thread~A to acquire |
| a reference via any of the pointers emanating from its data |
| element. |
| \end{enumerate} |
| |
| In short, any defered-processing technique that offers precise tracking |
| of references must also be prepared to fail attempts to acquire references. |
| Therefore, RCU's memory-footprint disadvantage implies an ease-of-use |
| advantage, namely that RCU readers need not deal with acquisition failure. |
| |
| This tension between memory footprint, precise tracking, and acquisition |
| failures is sometimes resolved within the Linux kernel by combining use |
| of RCU and reference counters. |
| RCU is used for short-lived references, which means that RCU read-side |
| critical sections can be short. |
| These short RCU read-side critical sections in turn mean that the corresponding |
| RCU grace periods can also be short, limiting the memory footprint. |
| For the few data elements that need longer-lived references, reference |
| counting is used. |
| This means that the complexity of reference-acquisition failure only |
| needs to be dealt with for those few data elements: The bulk of |
| the reference acquisitions are unconditional, courtesy of RCU. |
| |
| Finally, the ``Non-Blocking Updates'' column shows that hazard pointers |
| can provide non-blocking updates~\cite{MagedMichael04a,HerlihyLM02}. |
| Reference counting might or might not, depending on the implementation. |
| However, sequence locking cannot provide non-blocking updates, courtesy |
| of its update-side lock. |
| RCU updaters must wait on readers, which also rules out fully non-blocking |
| updates. |
| However, there are situations in which the only blocking operation is |
| a wait to free memory, which results in an situation that, for many |
| purposes, is as good as non-blocking~\cite{MathieuDesnoyers2012URCU}. |
| |
| As more experience is gained using these techniques, both separately |
| and in combination, the rules of thumb laid out in this section will |
| need to be refined. |
| However, this section does reflect the current state of the art. |