% defer/whichtochoose.tex
\section{Which to Choose?}
\label{sec:Which to Choose?}
\begin{table*}
\small
\begin{center}
\begin{tabular}{l||c|c|c|c|c|c|c}
~ ~ ~ ~ ~ ~ ~ ~ ~
& \begin{picture}(6,100)(0,0)
\rotatebox{90}{Existence Guarantee}
\end{picture}
& \begin{picture}(7,100)(0,0)
\rotatebox{90}{Updates and Readers}
\end{picture}
\begin{picture}(7,100)(0,0)
\rotatebox{90}{Progress Concurrently}
\end{picture}
& \begin{picture}(6,100)(0,0)
\rotatebox{90}{Read-Side Overhead}
\end{picture}
& \begin{picture}(6,100)(0,0)
\rotatebox{90}{Bulk Reference}
\end{picture}
& \begin{picture}(6,100)(0,0)
\rotatebox{90}{Low Memory Footprint}
\end{picture}
& \begin{picture}(6,100)(0,0)
\rotatebox{90}{Unconditional Acquisition}
\end{picture}
& \begin{picture}(6,100)(0,0)
\rotatebox{90}{Non-Blocking Updates}
\end{picture}
\\
\hline
% EX CO RSO BU LMF UA
\hline
Reference Counting & Y & ~ & ++ $\rightarrow$ atomic \dag
& ~ & Y & ~ & ? \\
\hline
Hazard Pointers & Y & ~ & MB \dag & ~ & Y & ~ & Y \\
\hline
Sequence Locks & ~ & Y & 2 MB \ddag
& N/A & N/A & ~ & \\
\hline
RCU & Y & ~ & 0 $\rightarrow$ 2 MB
& Y & ~ & Y & \\
\multicolumn{8}{l}{} \\
\multicolumn{8}{l}{\dag Incurred on each element traversed on
	each retry} \\
\multicolumn{8}{l}{\ddag Incurred on each retry} \\
\multicolumn{8}{l}{++: Non-atomic increment} \\
\multicolumn{8}{l}{atomic: Atomic operation} \\
\multicolumn{8}{l}{MB: Memory barrier} \\
\end{tabular}
\end{center}
\caption{Which Deferred Technique to Choose?}
\label{tab:defer:Which Deferred Technique to Choose?}
\end{table*}
Table~\ref{tab:defer:Which Deferred Technique to Choose?}
provides some rough rules of thumb that can help you choose among the
four deferred-processing techniques presented in this chapter.
As shown in the ``Existence Guarantee'' column,
if you need existence guarantees for linked
data elements, you must use reference counting, hazard pointers, or RCU.
Sequence locks do not provide existence guarantees, instead providing
detection of updates, so that any read-side critical section that
encounters an update is retried.
Of course, as shown in the ``Updates and Readers Progress Concurrently''
column, this detection of updates implies
that sequence locking does not permit updates and readers to make forward
progress concurrently.
After all, preventing such forward progress is the whole point of using
sequence locking in the first place!
This situation points the way to using sequence locking in conjunction
with reference counting, hazard pointers, or RCU in order to provide
both existence guarantees and update detection.
In fact, the Linux kernel combines RCU and sequence locking in
this manner during pathname lookup.
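
A reader combining the two might look roughly as follows, loosely
following the Linux-kernel API.
The sequence lock \texttt{sl} and the RCU-protected pointer
\texttt{gp} are assumed purely for purposes of illustration:

\begin{verbatim}
  /* Sketch: RCU provides existence,       */
  /* the sequence lock detects updates.    */
  rcu_read_lock();
  do {
    seq = read_seqbegin(&sl);
    p = rcu_dereference(gp);
    /* ... speculatively read *p ... */
  } while (read_seqretry(&sl, seq));
  rcu_read_unlock();
\end{verbatim}

Because RCU guarantees that \texttt{p} remains in existence throughout,
the loop may safely dereference it even on a retry.
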
The ``Read-Side Overhead'' column gives a rough sense of the read-side
overhead of these techniques.
The overhead of reference counting can vary widely.
At the low end, a simple non-atomic increment suffices, at least in the
case where the reference is acquired under the protection of a lock
that must be acquired for other reasons.
At the high end, a fully ordered atomic operation is required.
Reference counting incurs this overhead on each and every data element
traversed.
Hazard pointers incur the overhead of a memory barrier for each data element
traversed, and sequence locks incur the overhead of a pair of memory barriers
for each attempt to execute the critical section.
The overhead of RCU implementations varies from nothing to that of a pair of
memory barriers for each read-side critical section, thus providing RCU
with the best performance, particularly for read-side critical sections
that traverse many data elements.
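
The two endpoints of the reference-counting range might look as
follows, using Linux-kernel-style locking and C11 atomics purely for
illustration; the fields \texttt{lock}, \texttt{refcnt}, and
\texttt{refcnt\_a} are assumed:

\begin{verbatim}
  /* Low end: ->lock must be held anyway,  */
  /* so a plain increment suffices.        */
  spin_lock(&p->lock);
  p->refcnt++;
  spin_unlock(&p->lock);

  /* High end: no such lock, so each       */
  /* element traversed pays for a fully    */
  /* ordered atomic operation.             */
  atomic_fetch_add(&p->refcnt_a, 1);
\end{verbatim}
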
The ``Bulk Reference'' column indicates that only RCU is capable of acquiring
multiple references with constant overhead.
The entry for sequence locks is ``N/A'' because, again, sequence locks
detect updates rather than acquiring references.
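
For example, given an RCU-protected linked list with hypothetical
\texttt{head} and \texttt{->next} pointers, a single pair of calls
covers any number of elements:

\begin{verbatim}
  /* One rcu_read_lock() covers every      */
  /* element traversed, so this "bulk      */
  /* reference" has constant overhead.     */
  rcu_read_lock();
  for (p = rcu_dereference(head); p;
       p = rcu_dereference(p->next))
    do_something_with(p);
  rcu_read_unlock();
\end{verbatim}
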
\QuickQuiz{}
But can't both reference counting and hazard pointers also acquire
a reference to multiple data elements with constant overhead?
A single reference count can cover multiple data elements, right?
\QuickQuizAnswer{
Almost.
As we will see in the ``Unconditional Acquisition'' column,
neither reference counting
nor hazard pointers provide unconditional acquisition of references,
so acquiring a reference can have non-constant overhead in the face
of conflicting updates.
In addition, using a single reference count to cover multiple
data items can have severe consequences: For example, you cannot
remove any of the data items until all references to all of them
have been released.
This can result in more complex data-element-cleanup code,
and can also increase memory footprint to rival that of RCU.
In other words, the increased memory footprint is a consequence
not of RCU in particular, but of bulk reference-count acquisition
in general.
} \QuickQuizEnd

The ``Low Memory Footprint'' column indicates which techniques enjoy low
memory footprint.
This column ends up being the mirror image of the ``Bulk Reference''
column: The ability to acquire references on large numbers of data
elements implies that all these data elements must persist, which
in turn implies a large memory footprint in some cases.
For example, one thread might delete a large number of data
elements while another thread concurrently executes a long RCU read-side
critical section.
Because the read-side critical section could potentially retain a reference
to any of the newly deleted data elements, all those data elements must
be retained for the full duration of that critical section.
In contrast, reference counting and hazard pointers would retain only
those specific data elements actually referenced by concurrent readers.

However, this low-memory-footprint advantage comes at a price, as shown
in the ``Unconditional Acquisition'' column.
To see this, imagine a large linked data structure in which a
reference-counting or hazard-pointer reader (call it Thread~A)
holds a reference to
an isolated data element in the middle of that structure.
Consider the following sequence of events:
\begin{enumerate}
\item Thread~B removes the data element referenced by Thread~A.
Because of this reference, the data element cannot yet be freed.
\item Thread~B removes all the data elements adjacent to the one
referenced by Thread~A.
Because there are no references held for these data elements,
they are all immediately freed.
Because Thread~A's data element has already been removed,
its outgoing pointers are not updated.
\item All of Thread~A's data element's outgoing pointers now reference
the freelist, and therefore cannot safely be traversed.
\item The reference-counting or hazard-pointer implementation therefore
has no choice but to fail any attempt by Thread~A to acquire
a reference via any of the pointers emanating from its data
element.
\end{enumerate}
In short, any deferred-processing technique that offers precise tracking
of references must also be prepared to fail attempts to acquire references.
Therefore, RCU's memory-footprint disadvantage implies an ease-of-use
advantage, namely that RCU readers need not deal with acquisition failure.
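
In code, each such acquisition must therefore be checked, as in the
following sketch, in which \texttt{hp\_try\_record()} is a hypothetical
primitive that records a pointer in a hazard pointer, but that fails
if the element has been removed in the meantime:

\begin{verbatim}
  /* Sketch: conditional acquisition must  */
  /* be prepared to fail.                  */
  p = hp_try_record(&elem->next, &my_hp);
  if (!p) {
    /* Neighbors may already be freed:     */
    /* fall back, for example, restart     */
    /* the traversal from the list head.   */
    goto restart;
  }
\end{verbatim}
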
This tension between memory footprint, precise tracking, and acquisition
failures is sometimes resolved within the Linux kernel by combining use
of RCU and reference counters.
RCU is used for short-lived references, which means that RCU read-side
critical sections can be short.
These short RCU read-side critical sections in turn mean that the corresponding
RCU grace periods can also be short, limiting the memory footprint.
For the few data elements that need longer-lived references, reference
counting is used.
This means that the complexity of reference-acquisition failure only
needs to be dealt with for those few data elements: The bulk of
the reference acquisitions are unconditional, courtesy of RCU.
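
One common Linux-kernel idiom for this combination relies on
\texttt{atomic\_inc\_not\_zero()}, sketched below with a hypothetical
\texttt{lookup()} function:

\begin{verbatim}
  /* Sketch: RCU covers the short lookup,  */
  /* the per-element reference count       */
  /* covers the longer-lived reference.    */
  rcu_read_lock();
  p = lookup(key);
  if (p && !atomic_inc_not_zero(&p->refcnt))
    p = NULL; /* Lost race with deletion. */
  rcu_read_unlock();
  /* A non-NULL p may now be used beyond   */
  /* the RCU read-side critical section,   */
  /* but must eventually be released.      */
\end{verbatim}
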
Finally, the ``Non-Blocking Updates'' column shows that hazard pointers
can provide non-blocking updates~\cite{MagedMichael04a,HerlihyLM02}.
Reference counting might or might not, depending on the implementation.
However, sequence locking cannot provide non-blocking updates, courtesy
of its update-side lock.
RCU updaters must wait on readers, which also rules out fully non-blocking
updates.
However, there are situations in which the only blocking operation is
a wait to free memory, which results in a situation that, for many
purposes, is as good as non-blocking~\cite{MathieuDesnoyers2012URCU}.

As more experience is gained using these techniques, both separately
and in combination, the rules of thumb laid out in this section will
need to be refined.
However, this section does reflect the current state of the art.