blob: 17462aae86d86230c444585429091e57e6976f81 [file] [log] [blame]
% defer/rcufundamental.tex
% mainfile: ../perfbook.tex
% SPDX-License-Identifier: CC-BY-SA-3.0
\subsection{RCU Fundamentals}
\label{sec:defer:RCU Fundamentals}
\OriginallyPublished{sec:defer:RCU Fundamentals}{RCU Fundamentals}{Linux Weekly News}{PaulEMcKenney2007WhatIsRCUFundamentally}
This section re-examines the ground covered in the previous section, but
independent of any particular example or use case.
People who prefer to live their lives very close to the actual code may
wish to skip the underlying fundamentals presented in this section.
RCU is made up of three fundamental mechanisms, the first being
used for insertion, the second being used for deletion, and the third
being used to allow readers to tolerate concurrent insertions and deletions.
\Cref{sec:defer:Publish-Subscribe Mechanism}
describes the publish-subscribe mechanism used for insertion,
\cref{sec:defer:Wait For Pre-Existing RCU Readers}
describes how waiting for pre-existing RCU readers enables deletion,
and
\cref{sec:defer:Maintain Multiple Versions of Recently Updated Objects}
discusses how maintaining multiple versions of recently updated objects
permits concurrent insertions and deletions.
Finally,
\cref{sec:defer:Summary of RCU Fundamentals}
summarizes RCU fundamentals.
\subsubsection{Publish-Subscribe Mechanism}
\label{sec:defer:Publish-Subscribe Mechanism}
Because RCU readers are not excluded by RCU updaters, an RCU-protected
data structure might change while a reader accesses it.
The accessed data item might be moved, removed, or replaced.
Because the data structure does not ``hold still'' for the reader,
each reader's access can be thought of as subscribing to the current
version of the RCU-protected data item.
For their part, updaters can be thought of as publishing new versions.
% @@@ Merge usage section into "Which to Choose?" and suggest choices.
\begin{figure}
\centering
\resizebox{3in}{!}{\includegraphics{defer/pubsub}}
\caption{Publication/Subscription Constraints}
\label{fig:defer:Publication/Subscription Constraints}
\end{figure}
Unfortunately, as laid out in
\cref{sec:toolsoftrade:Shared-Variable Shenanigans}
and reiterated in
\cref{sec:defer:Minimal Insertion and Deletion},
it is unwise to use \IXplx{plain access}{es} for these publication and subscription
operations.
It is instead necessary to inform both the compiler and the CPU
of the need for care, as can be seen from
\cref{fig:defer:Publication/Subscription Constraints},
which illustrates interactions between concurrent executions of
\co{ins_route()} (and its caller) and \co{access_route()} from
\cref{lst:defer:Insertion and Deletion With Concurrent Readers}.
The \co{ins_route()} column from
\cref{fig:defer:Publication/Subscription Constraints}
shows \co{ins_route()}'s caller allocating a new \co{route} structure,
which then contains pre-initialization garbage.
The caller then initializes the newly allocated structure, and then
invokes \co{ins_route()} to publish a pointer to the new \co{route}
structure.
Publication does not affect the contents of the structure, which
therefore remain valid after publication.
The \co{access_route()} column from this same figure shows
the pointer being subscribed to and dereferenced.
This dereference operation absolutely must see a valid \co{route}
structure rather than pre-initialization garbage because referencing
garbage could result in memory corruption, crashes, and hangs.
As noted earlier, avoiding such garbage means that the publish and
subscribe operations must inform both the compiler and the CPU of the
need to maintain the needed ordering.
Publication is carried out by \co{rcu_assign_pointer()}, which ensures that
\co{ins_route()}'s caller's initialization is ordered before the actual
publication operation's store of the pointer.
In addition, \co{rcu_assign_pointer()} must be atomic in the sense that
concurrent readers see either the old value of the pointer or the new
value of the pointer, but not some mash-up of these two values.
These requirements are met by the C11 store-release operation, and in
fact in the Linux kernel, \co{rcu_assign_pointer()} is defined in terms
of \co{smp_store_release()}, which is similar to C11 store-release.
Note that if concurrent updates are required, some sort of synchronization
mechanism will be required to mediate among multiple concurrent
\co{rcu_assign_pointer()} calls on the same pointer.
In the Linux kernel, locking is the mechanism of choice, but pretty
much any synchronization mechanism may be used.
An example of a particularly lightweight synchronization mechanism is
\cref{chp:Data Ownership}'s data ownership:
If each pointer is owned by a particular thread, then that thread may
execute \co{rcu_assign_pointer()} on that pointer with no additional
synchronization overhead.
\QuickQuiz{
Wouldn't use of data ownership for RCU updaters mean that
the updates could use exactly the same sequence of instructions
as would the corresponding single-threaded code?
}\QuickQuizAnswer{
Sometimes, for example, on TSO systems such as x86 or the IBM
mainframe where a store-release operation emits a single store
instruction.
However, weakly ordered systems must also emit a memory barrier
or perhaps a store-release instruction.
In addition, removing data requires quite a bit of additional
work because it is necessary to wait for pre-existing readers
before freeing the removed data.
}\QuickQuizEnd
Subscription is carried out by \co{rcu_dereference()}, which orders
the subscription operation's load from the pointer is before the
dereference.
Similar to \co{rcu_assign_pointer()}, \co{rcu_dereference()} must be
atomic in the sense that the value loaded must be that from a single
store, for example, the compiler must not tear the load.\footnote{
That is, the compiler must not break the load into multiple
smaller loads, as described under ``load tearing'' in
\cref{sec:toolsoftrade:Shared-Variable Shenanigans}.}
Unfortunately, compiler support for \co{rcu_dereference()} is at best
a work in progress~\cite{PaulEMcKennneyConsumeP0190R4,PaulEMcKenney2017markconsumeP0462R1,JFBastien2018P0750R1consume}.
In the meantime, the Linux kernel relies on volatile loads, the details of
the various CPU architectures, coding
restrictions~\cite{PaulEMcKenney2014rcu-dereference},
and, on DEC Alpha~\cite{ALPHA2002}, a memory-barrier instruction.
However, on other architectures, \co{rcu_dereference()} typically
emits a single load instruction, just as would the equivalent single-threaded
code.
The coding restrictions are described in more detail in
\cref{sec:memorder:Address- and Data-Dependency Difficulties},
however, the common case of field selection (\qtco{->}) works quite well.
Software that does not require the ultimate in read-side performance
can instead use C11 \IXpl{acquire load}, which provide the needed ordering and
more, albeit at a cost.
It is hoped that lighter-weight compiler support for \co{rcu_dereference()}
will appear in due course.
In short, use of \co{rcu_assign_pointer()} for publishing pointers and
use of \co{rcu_dereference()} for subscribing to them successfully avoids the
``Not OK'' garbage loads depicted in
\cref{fig:defer:Publication/Subscription Constraints}.
These two primitives can therefore be used to add new data to linked
structures without disrupting concurrent readers.
\QuickQuiz{
But suppose that updaters are adding and removing multiple data
items from a linked list while a reader is iterating over that
same list.
Specifically, suppose that a list initially contains elements
A, B, and~C, and that an updater removes element A and then
adds a new element D at the end of the list.
The reader might well see \{A, B, C, D\}, when that sequence of
elements never actually ever existed!
In what alternate universe would that qualify as ``not disrupting
concurrent readers''???
}\QuickQuizAnswer{
In the universe where an iterating reader is only required to
traverse elements that were present throughout the full duration
of the iteration.
In the example, that would be elements B and~C\@.
Because elements A and~D were each present for only part of the
iteration, the reader is permitted to iterate over them, but not
obliged to.
Note that this supports the common case where the reader is simply
looking up a single item, and does not know or care about the
presence or absence of other items.
If stronger consistency is required, then higher-cost
synchronization mechanisms are required, for example, sequence
locking or reader-writer locking.
But if stronger consistency is \emph{not} required (and it very often
is not), then why pay the higher cost?
}\QuickQuizEnd
Adding data to a linked structure without disrupting readers is a good thing,
as are the cases where this can be done with no added read-side cost compared
to single-threaded readers.
However, in most cases it is also necessary to remove data, and this is the
subject of the next section.
\subsubsection{Wait For Pre-Existing RCU Readers}
\label{sec:defer:Wait For Pre-Existing RCU Readers}
In its most basic form, RCU is a way of waiting for things to finish.
Of course, there are a great many other ways of waiting for things to
finish, including reference counts, reader-writer locks, events, and so on.
The great advantage of RCU is that it can wait for each of
(say) 20,000 different things without having to explicitly
track each and every one of them, and without having to worry about
the performance degradation, scalability limitations, complex deadlock
scenarios, and memory-leak hazards that are inherent in schemes
using explicit tracking.
In RCU's case, each of the things waited on is called an
\emph{\IXBhmr{RCU read-side}{critical section}}.
As noted in
\cref{tab:defer:Core RCU API}, an RCU read-side critical
section starts with an \co{rcu_read_lock()} primitive, and ends with a
corresponding \co{rcu_read_unlock()} primitive.
RCU read-side critical sections can be nested, and may contain pretty
much any code, as long as that code does not contain a quiescent state.
For example, within the Linux kernel, it is illegal to sleep within
an RCU read-side critical section because a context switch is a quiescent
state.\footnote{
However, a special form of RCU called SRCU~\cite{PaulEMcKenney2006c}
does permit general sleeping in SRCU read-side critical sections.}
If you abide by these conventions, you can use RCU to wait for \emph{any}
pre-existing RCU read-side critical section to complete, and
\co{synchronize_rcu()} uses indirect means to do the actual
waiting~\cite{MathieuDesnoyers2012URCU,McKenney:2013:SDS:2483852.2483867}.
\begin{figure}
\centering
\resizebox{3in}{!}{\includegraphics{defer/RCUGuaranteeFwd}}
\caption{RCU Reader and Later Grace Period}
\label{fig:defer:RCU Reader and Later Grace Period}
\end{figure}
The relationship between an RCU read-side critical section and a later
RCU grace period is an if-then relationship, as illustrated by
\cref{fig:defer:RCU Reader and Later Grace Period}.
If any portion of a given critical section precedes the beginning of
a given grace period, then RCU guarantees that all of that critical
section will precede the end of that grace period.
In the figure, \co{P0()}'s access to \co{x} precedes \co{P1()}'s access
to this same variable, and thus also precedes the grace period generated
by \co{P1()}'s call to \co{synchronize_rcu()}.
It is therefore guaranteed that \co{P0()}'s access to \co{y} will precede
\co{P1()}'s access.
In this case, if \co{r1}'s final value is 0, then \co{r2}'s final value
is guaranteed to also be 0.
\QuickQuiz{
What other final values of \co{r1} and \co{r2} are possible in
\cref{fig:defer:RCU Reader and Later Grace Period}?
}\QuickQuizAnswer{
The \co{r1 == 0 && r2 == 0} possibility was called out in the text.
Given that \co{r1 == 0} implies \co{r2 == 0}, we know that
\co{r1 == 0 && r2 == 1} is forbidden.
The following discussion will show that both
\co{r1 == 1 && r2 == 1} and \co{r1 == 1 && r2 == 0} are possible.
}\QuickQuizEnd
\begin{figure}
\centering
\resizebox{3in}{!}{\includegraphics{defer/RCUGuaranteeRev}}
\caption{RCU Reader and Earlier Grace Period}
\label{fig:defer:RCU Reader and Earlier Grace Period}
\end{figure}
The relationship between an RCU read-side critical section and an earlier
RCU grace period is also an if-then relationship, as illustrated by
\cref{fig:defer:RCU Reader and Earlier Grace Period}.
If any portion of a given critical section follows the end of
a given grace period, then RCU guarantees that all of that critical
section will follow the beginning of that grace period.
In the figure, \co{P0()}'s access to \co{y} follows \co{P1()}'s access
to this same variable, and thus follows the grace period generated by
\co{P1()}'s call to \co{synchronize_rcu()}.
It is therefore guaranteed that \co{P0()}'s access to \co{x} will follow
\co{P1()}'s access.
In this case, if \co{r2}'s final value is 1, then \co{r1}'s final value
is guaranteed to also be 1.
\QuickQuiz{
What would happen if the order of \co{P0()}'s two accesses was
reversed in
\cref{fig:defer:RCU Reader and Earlier Grace Period}?
}\QuickQuizAnswer{
Absolutely nothing would change.
The fact that \co{P0()}'s loads from \co{x} and \co{y} are
in the same RCU read-side critical section suffices;
their order is irrelevant.
}\QuickQuizEnd
\begin{figure}
\centering
\resizebox{3in}{!}{\includegraphics{defer/RCUGuaranteeMid}}
\caption{RCU Reader Within Grace Period}
\label{fig:defer:RCU Reader Within Grace Period}
\end{figure}
Finally, as shown in
\cref{fig:defer:RCU Reader Within Grace Period},
an RCU read-side critical section can be completely overlapped by
an RCU grace period.
In this case, \co{r1}'s final value is 1 and \co{r2}'s final value is 0.
However, it cannot be the case that \co{r1}'s final value is 0 and \co{r2}'s
final value is 1.
This would mean that an RCU read-side critical section had completely
overlapped a grace period, which is forbidden (or at the very least
constitutes a bug in RCU)\@.
RCU's wait-for-readers guarantee therefore has two parts:
\begin{enumerate*}[(1)]
\item If any part of a given RCU read-side critical section precedes
the beginning of a given grace period, then the entirety of that
critical section precedes the end of that grace period.
\item If any part of a given RCU read-side critical section follows
the end of a given grace period, then the entirety of that
critical section follows the beginning of that grace period.
\end{enumerate*}
This definition is sufficient for almost all RCU-based algorithms, but
for those wanting more,
simple executable formal models of RCU are available
as part of Linux kernel v4.17 and later, as discussed in
\cref{sec:formal:Axiomatic Approaches and RCU}.
In addition, RCU's ordering properties are examined in much
greater detail in \cref{sec:memorder:RCU}.
\QuickQuiz{
What would happen if \co{P0()}'s accesses in
\crefrange{fig:defer:RCU Reader and Later Grace Period}{fig:defer:RCU Reader Within Grace Period}
were stores?
}\QuickQuizAnswer{
The exact same ordering rules would apply, that is,
(1)~If any part of \co{P0()}'s RCU read-side critical section
preceded the beginning of \co{P1()}'s grace period, all of
\co{P0()}'s RCU read-side critical section would precede the
end of \co{P1()}'s grace period, and
(2)~If any part of \co{P0()}'s RCU read-side critical section
followed the end of \co{P1()}'s grace period, all of \co{P0()}'s
RCU read-side critical section would follow the beginning of
\co{P1()}'s grace period.
It might seem strange to have RCU read-side critical sections
containing writes, but this capability is not only permitted,
but also highly useful.
For example, the Linux kernel frequently carries out an
RCU-protected traversal of a linked data structure and then
acquires a reference to the destination data element.
Because this data element must not be freed in the meantime,
that element's reference counter must necessarily be incremented
within the traversal's RCU read-side critical section.
However, that increment entails a write to memory.
Therefore, it is a very good thing that memory writes are
permitted within RCU read-side critical sections.
If having writes in RCU read-side critical sections still seems
strange, please review
\cref{sec:count:Applying Exact Limit Counters},
which presented a use case for writes in reader-writer locking
read-side critical sections.
}\QuickQuizEnd
Although RCU's wait-for-readers capability really is sometimes used to
order the assignment of values to variables as shown in
\crefrange{fig:defer:RCU Reader and Later Grace Period}
{fig:defer:RCU Reader Within Grace Period},
it is more frequently used to safely free data elements removed from
a linked structure, as was done in
\cref{sec:defer:Introduction to RCU}.
The general process is illustrated by the following pseudocode:
\begin{enumerate}
\item Make a change, for example, remove an element from a linked list.
\item Wait for all pre-existing RCU read-side critical sections to
completely finish (for example, by using
\co{synchronize_rcu()}).
\item Clean up, for example, free the element that was replaced above.
\end{enumerate}
\begin{figure}
\centering
\IfEbookSize{
\resizebox{2in}{!}{\includegraphics{defer/RCUGPorderingSummary}}
}{
\resizebox{2.5in}{!}{\includegraphics{defer/RCUGPorderingSummary}}
}
\caption{Summary of RCU Grace-Period Ordering Guarantees}
\label{fig:defer:Summary of RCU Grace-Period Ordering Guarantees}
\end{figure}
This more abstract procedure requires a more abstract diagram than
\crefrange{fig:defer:RCU Reader and Later Grace Period}
{fig:defer:RCU Reader Within Grace Period},
which are specific to a particular litmus test.
After all, an RCU implementation must work correctly regardless of
the form of the RCU updates and the RCU read-side critical sections.
\Cref{fig:defer:Summary of RCU Grace-Period Ordering Guarantees}
fills this need, showing the four possible scenarios, with time
advancing from top to bottom within each scenario.
Within each scenario, an RCU reader is represented by the left-hand
stack of boxes and RCU updater by the right-hand stack.
In the first scenario, the reader starts execution before the
updater starts the removal, so it is possible that this reader
has a reference to the removed data element.
Therefore, the updater must not free this element until after the
reader completes.
In the second scenario, the reader does not start execution until
after the removal has completed.
The reader cannot possibly obtain a reference to the already-removed
data element, so this element may be freed before the reader completes.
The third scenario is like the second, but illustrates that even when the
reader cannot possibly obtain a reference to an element, it is still
permissible to defer the freeing of that element until after the
reader completes.
In the fourth and final scenario, the reader starts execution before
the updater starts removing the data element, but this element
is (incorrectly) freed before the reader completed.
A correct RCU implementation will not allow this fourth scenario to
occur.
This diagram thus illustrates RCU's wait-for-readers functionality:
Given a grace period, each reader ends before the end of that grace
period, starts after the beginning of that grace period, or both, in
which case it is wholly contained within that grace period.
Because RCU readers can make forward progress while updates
are in progress, different readers might disagree about the state
of the data structure, a topic taken up by the next section.
\subsubsection{Maintain Multiple Versions of Recently Updated Objects}
\label{sec:defer:Maintain Multiple Versions of Recently Updated Objects}
This section discusses how RCU accommodates synchronization-free readers
by maintaining multiple versions of data.
Because these synchronization-free readers provide very weak temporal
synchronization, RCU users compensate via spatial synchronization.
Spatial synchronization was discussed in
\cref{chp:Partitioning and Synchronization Design}, and is heavily used
in practice to obtain good performance and scalability.
In this section, spatial synchronization will be used to attain a weak
(but useful) form of correctness as well as excellent performance and
scalability.
\Cref{fig:defer:Deletion With Concurrent Readers}
in
\cref{sec:defer:Minimal Insertion and Deletion}
showed a simple variant of spatial synchronization, in which different
readers running concurrently with \co{del_route()}
(see \cref{lst:defer:Insertion and Deletion With Concurrent Readers})
might see the old \co{route} structure or an empty list, but either
way get a valid result.
Of course, a closer look at
\cref{fig:defer:Insertion With Concurrent Readers}
shows that calls to \co{ins_route()} can also result in concurrent
readers seeing different versions:
Either the initial empty list or the newly inserted \co{route} structure.
Note that both reference counting
(\cref{sec:defer:Reference Counting})
and hazard pointers
(\cref{sec:defer:Hazard Pointers})
can also cause concurrent readers to see different versions, but
RCU's lightweight readers make this more likely.
\begin{figure}
\centering
\resizebox{3in}{!}{\includegraphics{defer/multver}}
\caption{Multiple RCU Data-Structure Versions}
\label{fig:defer:Multiple RCU Data-Structure Versions}
\end{figure}
However, maintaining multiple weakly consistent versions can provide
some surprises.
For example, consider
\cref{fig:defer:Multiple RCU Data-Structure Versions},
in which a reader is traversing a linked list that is concurrently
updated.\footnote{
RCU linked-list APIs may be found in
\cref{sec:defer:RCU Linux-Kernel API}.}
In the first row of the figure, the reader is referencing data item~A,
and in the second row, it advances to~B, having thus far seen A followed by~B\@.
In the third row, an updater removes element~A and in the fourth row
an updater adds element~E to the end of the list.
In the fifth and final row, the reader completes its traversal, having
seeing elements~A through~E\@.
Except that there was no time at which such a list existed.
This situation might be even more surprising than that shown in
\cref{fig:defer:Deletion With Concurrent Readers},
in which different concurrent readers see different versions.
In contrast, in
\cref{fig:defer:Multiple RCU Data-Structure Versions}
the reader sees a version that never actually existed!
One way to resolve this strange situation is via weaker semanitics.
A reader traversal must encounter any data item that was present
during the full traversal (B, C, and~D), and might or might not
encounter data items that were present for only part of the
traversal (A and~E)\@.
Therefore, in this particular case, it is perfectly legitimate for
the reader traversal to encounter all five elements.
If this outcome is problematic, another way to resolve this situation is
through use of stronger synchronization mechanisms, such as reader-writer
locking, or clever use of timestamps and versioning,
as discussed in \cref{sec:defer:Quasi Multi-Version Concurrency Control}.
Of course, stronger mechanisms will be more expensive, but then again
the engineering life is all about choices and tradeoffs.
Strange though this situation might seem, it is entirely consistent with
the real world.
As we saw in
\cref{sec:cpu:Overheads},
the finite speed of light cannot be ignored within a computer system,
and it most certainly cannot be ignored outside of this system.
This in turn means that any data within the system representing state
in the real world outside of the system is always and forever outdated,
and thus inconsistent with the real world.
Therefore, it is quite possible that the sequence \{A, B, C, D, E\}
occurred in the real world, but due to speed-of-light delays was
never represented in the computer system's memory.
In this case, the reader's surprising traversal would correctly reflect
reality.
As a result, algorithms operating on real-world data must account for
inconsistent data, either by tolerating inconsistencies or by taking
steps to exclude or reject them.
In many cases, these algorithms are also perfectly capable of dealing
with inconsistencies within the system.
The pre-BSD packet routing example laid out in
\cref{sec:defer:Running Example}
is a case in point.
The contents of a routing list is set by routing protocols, and these
protocols feature significant delays (seconds or even minutes) to avoid
routing instabilities.
Therefore, once a routing update reaches a given system,
it might well have been sending packets the wrong way for quite some time.
Sending a few more packets the wrong way for the few microseconds during
which the update is in flight is clearly not a problem because the same
higher-level protocol actions that deal with delayed routing updates
will also deal with internal inconsistencies.
Nor is Internet routing the only situation tolerating inconsistencies.
To repeat, any algorithm in which data within a system tracks
outside-of-system state must tolerate inconsistencies, which includes
security policies (often set by committees of humans), storage configuration,
and WiFi access points, to say nothing of removable hardware such as
microphones, headsets, cameras, mice, printers, and much else besides.
Furthermore, the large number of Linux-kernel RCU API uses shown in
\cref{fig:defer:RCU Usage in the Linux Kernel},
combined with the Linux kernel's heavy use of reference counting
and with increasing use of \IXpl{hazard pointer} in other projects, demonstrates
that tolerance for such inconsistencies is more common than one might
imagine.
One root cause of this common-case tolerance of inconsistencies is
that single-item lookups are much more common in practice than are
full-data-structure traversals.
After all, full-data-structure traversals are much more expensive than
single-item lookups, so developers are motivated to avoid such traversals.
Not only are concurrent updates less likely to affect a single-item
lookup than they are a full traversal, but it is also the case that an
isolated single-item lookup has no way of detecting such inconsistencies.
As a result, in the common case, such inconsistencies are not just
tolerable, they are in fact invisible.
In such cases, RCU readers can be considered to be fully ordered with
updaters, despite the fact that these readers might be executing the
exact same sequence of machine instructions that would be executed by
a single-threaded program, as hinted on
\cpageref{sec:defer:Mysteries RCU}.
For example, referring back to
\cref{lst:defer:Insertion and Deletion With Concurrent Readers}
on \cpageref{lst:defer:Insertion and Deletion With Concurrent Readers},
suppose that each reader thread invokes \co{access_route()} exactly
once during its lifetime, and that there is no other communication among
reader and updater threads.
\begin{fcvref}[ln:defer:Insertion and Deletion With Concurrent Readers]
Then each invocation of \co{access_route()} can be ordered after the
\co{ins_route()} invocation that produced the \co{route} structure
accessed by \clnref{access_rp} of the listing in \co{access_route()}
and ordered before any subsequent
\co{ins_route()} or \co{del_route()} invocation.
\end{fcvref}
In summary, maintaining multiple versions is exactly what enables the
extremely low overheads of RCU readers, and as noted earlier, many
algorithms are unfazed by multiple versions.
However, there are algorithms that absolutely cannot handle multiple versions.
There are techniques for adapting such algorithms to
RCU~\cite{PaulEdwardMcKenneyPhD}, for example, the use of sequence locking
described in \cref{sec:together:Correlated Data Elements}.
\paragraph{Exercises}
\label{sec:defer:Exercises}
These examples assumed that a mutex was held across the entire
update operation, which would mean that there could be at most two
versions of the list active at a given time.
\QuickQuizSeries{%
\QuickQuizB{
How would you modify the deletion example to permit more than two
versions of the list to be active?
}\QuickQuizAnswerB{
One way of accomplishing this is as shown in
\cref{lst:defer:Concurrent RCU Deletion}.
\begin{listing}
\begin{VerbatimL}
spin_lock(&mylock);
p = search(head, key);
if (p == NULL)
spin_unlock(&mylock);
else {
list_del_rcu(&p->list);
spin_unlock(&mylock);
synchronize_rcu();
kfree(p);
}
\end{VerbatimL}
\caption{Concurrent RCU Deletion}
\label{lst:defer:Concurrent RCU Deletion}
\end{listing}
Note that this means that multiple concurrent deletions might be
waiting in \co{synchronize_rcu()}.
}\QuickQuizEndB
%
\QuickQuizM{
How many RCU versions of a given list can be
active at any given time?
}\QuickQuizAnswerM{
That depends on the synchronization design.
If a semaphore protecting the update is held across the grace period,
then there can be at most two versions, the old and the new.
However, suppose that only the search, the update, and the
\co{list_replace_rcu()} were protected by a lock, so that
the \co{synchronize_rcu()} was outside of that lock, similar
to the code shown in
\cref{lst:defer:Concurrent RCU Deletion}.
Suppose further that a large number of threads undertook an
RCU replacement at about the same time, and that readers
are also constantly traversing the data structure.
Then the following sequence of events could occur, starting from
the end state of
\cref{fig:defer:Multiple RCU Data-Structure Versions}:
\begin{enumerate}
\item Thread~A traverses the list, obtaining a reference to
Element~C.
\item Thread~B replaces Element~C with a new
Element~F, then waits for its \co{synchronize_rcu()}
call to return.
\item Thread~C traverses the list, obtaining a reference to
Element~F.
\item Thread~D replaces Element~F with a new
Element~G, then waits for its \co{synchronize_rcu()}
call to return.
\item Thread~E traverses the list, obtaining a reference to
Element~G.
\item Thread~F replaces Element~G with a new
Element~H, then waits for its \co{synchronize_rcu()}
call to return.
\item Thread~G traverses the list, obtaining a reference to
Element~H.
\item And the previous two steps repeat quickly with additional
new elements, so that all of them happen before any of
the \co{synchronize_rcu()} calls return.
\end{enumerate}
Thus, there can be an arbitrary number of versions active,
limited only by memory and by how many updates could be completed
within a grace period.
But please note that data structures that are updated so frequently
are not likely to be good candidates for RCU\@.
Nevertheless, RCU can handle high update rates when necessary.
}\QuickQuizEndM
%
\QuickQuizE{
How can the per-update overhead of RCU be reduced?
}\QuickQuizAnswerE{
The most effective way to reduce the per-update overhead
of RCU is to increase the number of updates served by
a given grace period.
This works because the per-grace period overhead is nearly
independent of the number of updates served by that
grace period.
One way to do this is to delay the start of a given grace
period in the hope that more updates requiring that grace
period appear in the meantime.
Another way is to slow down execution of the grace period
in the hope that more updates requiring an additional
grace period will accumulate in the meantime.
There are many other possible optimizations, and fanatically
devoted readers are referred to the Linux-kernel RCU
implementation.
}\QuickQuizEndE
}
\subsubsection{Summary of RCU Fundamentals}
\label{sec:defer:Summary of RCU Fundamentals}
This section has described the three fundamental components of RCU-based
algorithms:
\begin{enumerate}
\item A publish-subscribe mechanism for adding new data featuring
\co{rcu_assign_pointer()} for update-side publication and
\co{rcu_dereference()} for read-side subscription,
\item A way of waiting for pre-existing RCU readers to finish
based on readers being delimited by \co{rcu_read_lock()}
and \co{rcu_read_unlock()} on the one hand and
updaters waiting via \co{synchronize_rcu()} or
\co{call_rcu()} on the other
(see \cref{sec:memorder:RCU} for a formal description),
and
\item A discipline of maintaining multiple versions to permit
change without harming or unduly delaying concurrent RCU readers.
\end{enumerate}
\QuickQuiz{
How can RCU updaters possibly delay RCU readers, given that
neither \co{rcu_read_lock()} nor \co{rcu_read_unlock()}
spin or block?
}\QuickQuizAnswer{
The modifications undertaken by a given RCU updater will cause the
corresponding CPU to invalidate cache lines containing the data,
forcing the CPUs running concurrent RCU readers to incur expensive
cache misses.
(Can you design an algorithm that changes a data structure
\emph{without}
inflicting expensive cache misses on concurrent readers?
On subsequent readers?)
}\QuickQuizEnd
These three RCU components allow data to be updated in the face of concurrent
readers that might be executing the same sequence of machine instructions
that would be used by a reader in a single-threaded implementation.
These RCU components can be combined in different ways to implement a
surprising variety of different types of RCU-based algorithms, a number
of which are presented in
\cref{sec:defer:RCU Usage}.
However, it is usually better to work at higher levels of abstraction.
To this end, the next section describes the Linux-kernel API, which
includes simple data structures such as lists.