advsync/rcu.tex - pub/scm/linux/kernel/git/paulmck/perfbook - Git at Google

 % advsync/rcu.tex

 \section{Read-Copy Update}
 \label{sec:advsync:Read-Copy Update}

 Read-copy update (RCU) can be thought of as a replacement for
 reader-writer locking for
 which the read-side primitives are extremely light weight, and where
 readers and writers do not exclude each other.
 This last means that writers must leave old versions
 of RCU-protected data structures
 in place until all concurrent readers have completed,
 so that RCU write-side primitives can be quite expensive.
 RCU is thus best-suited for read-mostly situations, or for situations
 in which realtime constraints require deterministic readers.

 RCU provides the \co{rcu_read_lock()} and \co{rcu_read_unlock}
 primitives to demark read-side critical sections, and
 \co{synchronize_rcu} to allow writers to wait for completion
 of all pre-existing readers.
 The time that waiters much wait is termed a ``grace period''.
 RCU read-side critical sections may be nested, which results in
 the inner critical sections being subsumed into the outermost critical
 section.

 \subsection{RCU Read-Side Critical Sections and Grace Periods}
 \label{sec:advsync:RCU Read-Side Critical Sections and Grace Periods}

 Legal and illegal relationships of RCU read-side critical sections
 to a grace periods are shown in
 Figure~\ref{fig:advsync:RCU Readers and Grace Periods}.
 Each blue box in the diagram represents an RCU read-side critical
 section, as shown below.
 The green box represents update-initiation code, for example, a
 \co{list_del_rcu()} removing an element from an RCU-protected linked list,
 with the color green representing the fact that readers can freely
 access the element being removed.
 The yellow box represents an RCU grace period, for example, a
 \co{synchronize_rcu()}, with the color yell representing the fact that
 pre-existing readers may still access the removed element, but new
 readers cannot.
 Finally, the red box represents completion
 of the update, for example, a \co{kfree()} call to reclaim the removed
 element's memory, with the color red indicating that no readers may
 access the removed element.

 \vspace{5pt}
 \begin{minipage}[t]{\columnwidth}
 \begin{verbatim}
 rcu_read_lock();
 /* RCU read-side critical section. */
 rcu_read_unlock();
 \end{verbatim}
 \end{minipage}
 \vspace{5pt}

 \begin{figure}[htb]
 \centering
 \resizebox{3in}{!}{\includegraphics{advsync/RCUReaderGP}}
 \caption{RCU Readers and Grace Periods}
 \label{fig:advsync:RCU Readers and Grace Periods}
 \end{figure}

 This color coding indicates why it is forbidden for Reader~8 to
 begin before the grace period starts, but persist until after the
 grace period has ended.
 This situation might permit Reader~8 to still be holding a reference
 to the removed element, which could in turn result in memory corruption
 failures.

 So what happens if a reader executes for longer than planned?

 The grace period extends in this case for as long as necessary to
 accommodate any pre-existing reader, as shown in
 Figure~\ref{fig:advsync:RCU Readers Extending Grace Periods}.
 Note, however, that the grace period need not extend to cover Readers~2
 and 8, since these two readers started after the beginning of the
 grace period, and therefore cannot possibly have gained a reference
 to the removed element.

 \begin{figure}[htb]
 \centering
 \resizebox{3in}{!}{\includegraphics{advsync/RCUReaderGPExtends}}
 \caption{RCU Readers Extending Grace Period}
 \label{fig:advsync:RCU Readers Extending Grace Periods}
 \end{figure}

 The next sections show how these concepts can be applied to linked-list
 data structures.

 \subsection{Deletion From an RCU-Protected Linked List}
 \label{sec:advsync:Deletion From an RCU-Protected Linked List}

 Figure~\ref{fig:advsync:RCU Deletion From Linked List}
 illustrates deletion from an RCU-protected linked list,
 with time advancing from top to bottom.

 \begin{figure}[htb]
 \centering
 \resizebox{3in}{!}{\includegraphics{advsync/RCUDeletion}}
 \caption{RCU Deletion From Linked List}
 \label{fig:advsync:RCU Deletion From Linked List}
 \end{figure}

 Initially, the list contains elements~A, B, and C, in that order.
 RCU readers traverse the list concurrently with the update,
 perhaps executing code like the following:

 \vspace{5pt}
 \begin{minipage}[t]{\columnwidth}
 \begin{verbatim}
 rcu_read_lock();
 list_for_each_entry_rcu(p, h, le)
   if (p->key == k)
   	break;
 present = p != NULL;
 rcu_read_unlock();
 \end{verbatim}
 \end{minipage}
 \vspace{5pt}

 The \co{rcu_read_lock()} and \co{rcu_read_unlock()} primitives
 delimit the RCU read-side critical section, preventing a grace
 period from elapsing during this section of code.
 Note that a grace period that commenced prior to the \co{rcu_read_lock()}
 might complete during this critical section, but that a grace period
 that begins after the \co{rcu_read_lock()} must not complete until
 after completion of the \co{rcu_read_unlock()}.

 Updaters might be executing code like the following:

 \vspace{5pt}
 \begin{minipage}[t]{\columnwidth}
 \begin{verbatim}
 /* p == element to delete. */
 down(&update_sema);
 list_del_rcu(p);
 up(&update_sema);
 synchronize_rcu();
 kfree(p);
 \end{verbatim}
 \end{minipage}
 \vspace{5pt}

 The \co{down()} and \co{up()} Linux-kernel primitives act as sleeplock,
 with \co{down()} acquiring the sleeplock ``\co{update_sema}''
 and \co{up()} releasing it.

 Note that both the \co{rcu_read_lock()} and \co{rcu_read_unlock()}
 primitives have $O\left(1\right)$ execution time, and in particular, that
 neither of them block or spin.
 This means that the update code has no way of excluding readers.

 Let us now walk through the execution of the update code
 while referring to
 Figure~\ref{fig:advsync:RCU Deletion From Linked List}.
 A \co{list_del_rcu()} removes element~B from the list, so that
 pre-existing readers might still be referencing
 element~B, but new readers will advance from A directly to C.
 Element~B's pointers remain intact, but so that readers still
 referencing element~B will still advance to element~C.
 A \co{synchronize_kernel()} waits for a grace period to elapse,
 so that all readers that might have been referencing element~B have
 advanced through the list.
 Because new readers can no longer gain a reference to element~B,
 it is now safe to execute \co{kfree()} to free up element~B's memory.

 The \co{synchronize_rcu()} primitive can be thought of as a temporal
 mutual-exclusion primitive, since it prevents the following \co{kfree()}
 from executing until after any pre-existing readers complete.
 The \co{kfree()} might well run concurrently with \emph{new}
 RCU read-side critical sections, but it is guaranteed not to run
 until after all \emph{old} RCU read-side critical sections complete.

 The next section describes in-place update of an RCU-protected
 linked list, the procedure that gave RCU its name.

 \subsection{Replacement in an RCU-Protected Linked List}
 \label{sec:advsync:Replacement in an RCU-Protected Linked List}

 Figure~\ref{fig:advsync:RCU Replacement in Linked List}
 shows how an RCU-protected data structure may be updated via
 copying.
 The read-side code is the same as for the previous example, but
 the update code is outlined as follows:

 \vspace{5pt}
 \begin{minipage}[t]{\columnwidth}
 \begin{verbatim}
 down(&update_sema);
 /* p == element to update */
 q = kmalloc(sizeof(*q), GFP_KERNEL);
 *q = *p;
 list_replace_rcu(p, q);
 up(&update_sema);
 synchronize_rcu();
 kfree(p);
 \end{verbatim}
 \end{minipage}
 \vspace{5pt}

 As before, we start with a linked list with elements~A, B, and C
 in that order, and, again as before, readers might be traversing
 this list throughout the update process.
 To update element~B, we first allocate a new element and copy element~B
 to it, then update the copy to produce element~B'.
 We then execute \co{list_replace_rcu()} so that element~A now
 references the new element~B'---however, element~B still references
 element~C so that any pre-existing readers still referencing old element~B
 are still able to advance to element~C.
 New readers will find element~B'.

 After executing \co{synchronize_rcu()}, which again waits for a grace
 period to elapse, all readers that were referencing old element~B'
 will have advanced through the list, so that \co{kfree()} may now
 be invoked on element~B, resulting in the final state with the list
 now containing elements~A, B', and C.

 This procedure where \emph{readers} continue traversing the list
 while a \emph{copy} operation is used to carry out an \emph{update}
 is what gives RCU---or read-copy update---its name.

 \begin{figure}[p]
 \centering
 \resizebox{3in}{!}{\includegraphics{advsync/RCUReplacement}}
 \caption{RCU Replacement in Linked List}
 \label{fig:advsync:RCU Replacement in Linked List}
 \end{figure}

 \subsection{Insertion into an RCU-Protected Linked List}
 \label{sec:advsync:Insertion into an RCU-Protected Linked List}

 Insertion into an RCU-protected linked list is straightforward, as
 illustrated by
 Figure~\ref{fig:advsync:RCU Insertion into Linked List}.
 The read-side code is again that shown in
 Section~\ref{sec:advsync:Deletion From an RCU-Protected Linked List},
 and the update code might be as follows, where ``h'' is the list header:

 \vspace{5pt}
 \begin{minipage}[t]{\columnwidth}
 \begin{verbatim}
 down(&update_sema);
 p = kmalloc(sizeof(*p), GFP_KERNEL);
 initialize(p);
 list_add_rcu(p, h);
 up(&update_sema);
 \end{verbatim}
 \end{minipage}
 \vspace{5pt}

 Referring to
 Figure~\ref{fig:advsync:RCU Insertion into Linked List},
 the \co{kmalloc()} primitive and the programmer-supplied \co{initialize()}
 function allocate and initialize the new element~B,
 and the \co{list_add_rcu()} primitive inserts it into the list.
 There is no need for grace periods in this case: readers will either
 see the new element~B or not, depending on when they execute relative
 to the update.

 \begin{figure}[htb]
 \centering
 \resizebox{3in}{!}{\includegraphics{advsync/RCUInsertion}}
 \caption{RCU Insertion into Linked List}
 \label{fig:advsync:RCU Insertion into Linked List}
 \end{figure}

 \subsection{RCU API}
 \label{sec:advsync:RCU API}

 RCU can be best thought of as an API, particularly given the large number
 of implementations available for the RCU API.
 Figure~\ref{fig:advsync:RCU API}
 shows a color-coded guide to the major members of the Linux-kernel
 RCU API, with the read-side primitives in blue boxes, the update-initiation
 primitives in the green box, the grace-period primitives in the yellow box,
 and the update-completion primitives in the red box.

 \begin{figure}[htb]
 \centering
 \includegraphics{advsync/RCU-API}
 \caption{RCU API}
 \label{fig:advsync:RCU API}
 \end{figure}

 As noted earlier, \co{rcu_read_lock()} and \co{rcu_read_unlock()}
 delimit RCU read-side critical sections.
 The \co{list_for_each_entry_rcu(p,h,f)} primitive is an iterator that
 traverses the RCU-protected linked list with header ``h'' threaded
 through struct field ``f'' using
 local variable ``p'' as the iteration pointer.
 The \co{rcu_dereference()} primitive is used to traverse RCU-protected
 pointers, as in \co{rcu_dereference(p)->next}.

 The \co{list_add_rcu()}, \co{list_del_rcu()}, and
 \co{list_replace_rcu()} primitives perform the expected linked-list
 operations, as illustrated in the examples in the preceding sections.
 The \co{rcu_assign_pointer()} primitive assigns a new value to an
 RCU-protected field, for example, \co{rcu_assign_pointer(p->next,q)}
 would add the structure referenced by ``q'' following that referenced
 by ``p''.

 The \co{synchronize_rcu()} primitive waits for an RCU grace period
 to elapse, as see in the examples in the previous sections.
 The \co{call_rcu()} primitive causes a function to be invoked
 at the end of a subsequent grace period, so that \co{call_rcu(p,f)}
 would cause \co{f(p)} to be invoked after a grace period.
 The \co{call_rcu()} primitive is thus the continuation form of the
 \co{synchronize_rcu()} primitive.

 Finally note that \co{kfree()} is simply an example, as any primitive
 that frees memory can be used to complete an update.

 The RCU API in the Linux kernel is somewhat more elaborate in order to
 allow for the kernel's variety of different contexts and environments,
 including interrupt and NMI handlers.

 \QuickQuiz{}
 	Why is an \co{rcu_dereference()} primitive required?
 \QuickQuizAnswer{
 	The \co{rcu_dereference()} primitive provides the memory
 	barrier required by the DEC Alpha.
 	Without such a memory barrier, the Alpha can dereference
 	the new pointer and find pre-initialization
 	data~\cite{Compaq01,WilliamPugh2000Gharachorloo}.
 	Perhaps more importantly, it serves to document which
 	pointers are are protected by RCU.
 	But most important of all, it disables compiler optimizations
 	that might otherwise result in double fetches of the
 	argument to \co{rcu_dereference()}.
 } \QuickQuizEnd

 \QuickQuiz{}
 	Why is a separate \co{list_for_each_entry_rcu()} required?
 	In other words, why can't the existing
 	\co{list_for_each_entry()} primitive be used instead?
 \QuickQuizAnswer{
 	The \co{list_for_each_entry_rcu()} primitive includes the
 	use of \co{rcu_dereference()} required by the DEC Alpha.
 } \QuickQuizEnd

 \QuickQuiz{}
 	Why would one need to use \co{rcu_assign_pointer(p->next,q)}
 	instead of the straightforward \co{p->next=q}?
 \QuickQuizAnswer{
 	On weakly consistent machines, a memory barrier is required
 	between the initialization of a data structure and the
 	assignment of the corresponding pointer that allows other
 	CPUs to access the data structure.
 	Without such a memory barrier, these other CPUs might
 	see the assignments out of order, thus seeing the preinitialization
 	contents of the data structure.
 	The \co{rcu_assign_pointer()} primitive includes such a
 	memory barrier on any architectures requiring it.
 } \QuickQuizEnd

 \QuickQuiz{}
 	Why are there separate \co{list_add_rcu()},
 	\co{list_del_rcu()},
 	and \co{list_replace_rcu()} primitives?
 	Why couldn't the existing \co{list_add()}, \co{list_del()},
 	and \co{list_replace()} primitives be used instead?
 \QuickQuizAnswer{
 	The \co{list_add_rcu()} and \co{list_replace_rcu()} primitives
 	must use the \co{rcu_assign_pointer()} primitive
 	(or, alternatively, explicit memory barriers).
 	The \co{list_del_rcu()} primitive must omit the ``poisoning''
 	of list pointers that is used by \co{list_del()} as a
 	debug check.
 } \QuickQuizEnd

 \subsection{RCU Properties}
 \label{sec:advsync:RCU Properties}

 The following are important properties of RCU, some of which are
 beneficial, some adverse, and others dependent on the situation.

 \begin{enumerate}
 \item	Read-side primitives have very fast and deterministic
 	execution time.
 \item	Grace-period-detection primitives can have rather large
 	and non-deterministic overheads.
 \item	Read-side primitives cannot participate in a deadlock cycle.
 \item	Read-side primitives cannot participate in a priority-inversion
 	scenarios.
 \item	When updates are guarded by locking, read-side critical
 	sections can unconditionally acquire the update-side lock.
 \item	Read-side critical sections run concurrently with updates.
 	Without this read/update concurrency, the read-side primitives could
 	not possibly be fast and deterministic, however, some algorithms
 	do not straightforwardly tolerate such concurrency.
 \item	Some implementations of the \co{call_rcu()} primitive
 	can place very low overhead on the thread invoking
 	\co{call_rcu()}, although the large overhead of grace-period
 	detection is merely deferred, rather than avoided, in this
 	case.
 \end{enumerate}

 RCU is therefore attractive in read-mostly situations where either
 good performance and scalability or realtime response are required.
 These properties will be discussed at greater length @@@.

 @@@@ move RCU usage to its own chapter?

 \begin{enumerate}
 \item	Pure RCU~(\ref{sec:advsync:Pure RCU}).
 \item	Reader-Writer-Lock/RCU
         Analogy~(\ref{sec:advsync:Reader-Writer-Lock/RCU Analogy}).
 \item	RCU Existence Locks(\ref{sec:advsync:RCU Existence Locks}).
 \item	RCU Readers With NBS
 	Writers~(\ref{sec:advsync:RCU Readers With NBS Writers}).
 \end{enumerate}

 \subsection{Pure RCU}
 \label{sec:advsync:Pure RCU}

 Pure RCU is used to effect a lazy change in mode, for example, preventing
 any new threads from accessing a subsystem, then waiting for all
 pre-existing threads to complete before tearing it down.

 \emph{@@@ Focus on change-in-mode uses.  (One can argue that the interrupt/NMI
 uses are more closely related to reader-writer-lock/RCU analogy.)
 Potential example: RCU guarding a variable prohibiting additional reference
 counts for cleanup of the corresponding data structure.}

 \subsection{Reader-Writer-Lock/RCU Analogy}
 \label{sec:advsync:Reader-Writer-Lock/RCU Analogy}

 \subsection{RCU Existence Locks}
 \label{sec:advsync:RCU Existence Locks}

 \subsection{RCU Readers With NBS Writers}
 \label{sec:advsync:RCU Readers With NBS Writers}

 \emph{@@@ simple intro, along with performance data on example from SMPdesign.}

 \emph{@@@ consider moving advanced RCU usage to a separate chapter.
 Add more material from chapter 5 of dissertation, particularly transformational
 patterns.
 \url{http://www.rdrop.com/users/paulmck/RCU/RCUdissertation.2004.07.14e1.pdf}.}

 \emph{@@@ consider creating an RCU applications chapter/appendix, with
 information from chapter 6 of dissertation.}
	% advsync/rcu.tex

	\section{Read-Copy Update}
	\label{sec:advsync:Read-Copy Update}

	Read-copy update (RCU) can be thought of as a replacement for
	reader-writer locking for
	which the read-side primitives are extremely light weight, and where
	readers and writers do not exclude each other.
	This last means that writers must leave old versions
	of RCU-protected data structures
	in place until all concurrent readers have completed,
	so that RCU write-side primitives can be quite expensive.
	RCU is thus best-suited for read-mostly situations, or for situations
	in which realtime constraints require deterministic readers.

	RCU provides the \co{rcu_read_lock()} and \co{rcu_read_unlock}
	primitives to demark read-side critical sections, and
	\co{synchronize_rcu} to allow writers to wait for completion
	of all pre-existing readers.
	The time that waiters much wait is termed a ``grace period''.
	RCU read-side critical sections may be nested, which results in
	the inner critical sections being subsumed into the outermost critical
	section.

	\subsection{RCU Read-Side Critical Sections and Grace Periods}
	\label{sec:advsync:RCU Read-Side Critical Sections and Grace Periods}

	Legal and illegal relationships of RCU read-side critical sections
	to a grace periods are shown in
	Figure~\ref{fig:advsync:RCU Readers and Grace Periods}.
	Each blue box in the diagram represents an RCU read-side critical
	section, as shown below.
	The green box represents update-initiation code, for example, a
	\co{list_del_rcu()} removing an element from an RCU-protected linked list,
	with the color green representing the fact that readers can freely
	access the element being removed.
	The yellow box represents an RCU grace period, for example, a
	\co{synchronize_rcu()}, with the color yell representing the fact that
	pre-existing readers may still access the removed element, but new
	readers cannot.
	Finally, the red box represents completion
	of the update, for example, a \co{kfree()} call to reclaim the removed
	element's memory, with the color red indicating that no readers may
	access the removed element.

	\vspace{5pt}
	\begin{minipage}[t]{\columnwidth}
	\begin{verbatim}
	rcu_read_lock();
	/* RCU read-side critical section. */
	rcu_read_unlock();
	\end{verbatim}
	\end{minipage}
	\vspace{5pt}

	\begin{figure}[htb]
	\centering
	\resizebox{3in}{!}{\includegraphics{advsync/RCUReaderGP}}
	\caption{RCU Readers and Grace Periods}
	\label{fig:advsync:RCU Readers and Grace Periods}
	\end{figure}

	This color coding indicates why it is forbidden for Reader~8 to
	begin before the grace period starts, but persist until after the
	grace period has ended.
	This situation might permit Reader~8 to still be holding a reference
	to the removed element, which could in turn result in memory corruption
	failures.

	So what happens if a reader executes for longer than planned?

	The grace period extends in this case for as long as necessary to
	accommodate any pre-existing reader, as shown in
	Figure~\ref{fig:advsync:RCU Readers Extending Grace Periods}.
	Note, however, that the grace period need not extend to cover Readers~2
	and 8, since these two readers started after the beginning of the
	grace period, and therefore cannot possibly have gained a reference
	to the removed element.

	\begin{figure}[htb]
	\centering
	\resizebox{3in}{!}{\includegraphics{advsync/RCUReaderGPExtends}}
	\caption{RCU Readers Extending Grace Period}
	\label{fig:advsync:RCU Readers Extending Grace Periods}
	\end{figure}

	The next sections show how these concepts can be applied to linked-list
	data structures.

	\subsection{Deletion From an RCU-Protected Linked List}
	\label{sec:advsync:Deletion From an RCU-Protected Linked List}

	Figure~\ref{fig:advsync:RCU Deletion From Linked List}
	illustrates deletion from an RCU-protected linked list,
	with time advancing from top to bottom.

	\begin{figure}[htb]
	\centering
	\resizebox{3in}{!}{\includegraphics{advsync/RCUDeletion}}
	\caption{RCU Deletion From Linked List}
	\label{fig:advsync:RCU Deletion From Linked List}
	\end{figure}

	Initially, the list contains elements~A, B, and C, in that order.
	RCU readers traverse the list concurrently with the update,
	perhaps executing code like the following:

	\vspace{5pt}
	\begin{minipage}[t]{\columnwidth}
	\begin{verbatim}
	rcu_read_lock();
	list_for_each_entry_rcu(p, h, le)
	if (p->key == k)
	break;
	present = p != NULL;
	rcu_read_unlock();
	\end{verbatim}
	\end{minipage}
	\vspace{5pt}

	The \co{rcu_read_lock()} and \co{rcu_read_unlock()} primitives
	delimit the RCU read-side critical section, preventing a grace
	period from elapsing during this section of code.
	Note that a grace period that commenced prior to the \co{rcu_read_lock()}
	might complete during this critical section, but that a grace period
	that begins after the \co{rcu_read_lock()} must not complete until
	after completion of the \co{rcu_read_unlock()}.

	Updaters might be executing code like the following:

	\vspace{5pt}
	\begin{minipage}[t]{\columnwidth}
	\begin{verbatim}
	/* p == element to delete. */
	down(&update_sema);
	list_del_rcu(p);
	up(&update_sema);
	synchronize_rcu();
	kfree(p);
	\end{verbatim}
	\end{minipage}
	\vspace{5pt}

	The \co{down()} and \co{up()} Linux-kernel primitives act as sleeplock,
	with \co{down()} acquiring the sleeplock ``\co{update_sema}''
	and \co{up()} releasing it.

	Note that both the \co{rcu_read_lock()} and \co{rcu_read_unlock()}
	primitives have $O\left(1\right)$ execution time, and in particular, that
	neither of them block or spin.
	This means that the update code has no way of excluding readers.

	Let us now walk through the execution of the update code
	while referring to
	Figure~\ref{fig:advsync:RCU Deletion From Linked List}.
	A \co{list_del_rcu()} removes element~B from the list, so that
	pre-existing readers might still be referencing
	element~B, but new readers will advance from A directly to C.
	Element~B's pointers remain intact, but so that readers still
	referencing element~B will still advance to element~C.
	A \co{synchronize_kernel()} waits for a grace period to elapse,
	so that all readers that might have been referencing element~B have
	advanced through the list.
	Because new readers can no longer gain a reference to element~B,
	it is now safe to execute \co{kfree()} to free up element~B's memory.

	The \co{synchronize_rcu()} primitive can be thought of as a temporal
	mutual-exclusion primitive, since it prevents the following \co{kfree()}
	from executing until after any pre-existing readers complete.
	The \co{kfree()} might well run concurrently with \emph{new}
	RCU read-side critical sections, but it is guaranteed not to run
	until after all \emph{old} RCU read-side critical sections complete.

	The next section describes in-place update of an RCU-protected
	linked list, the procedure that gave RCU its name.

	\subsection{Replacement in an RCU-Protected Linked List}
	\label{sec:advsync:Replacement in an RCU-Protected Linked List}

	Figure~\ref{fig:advsync:RCU Replacement in Linked List}
	shows how an RCU-protected data structure may be updated via
	copying.
	The read-side code is the same as for the previous example, but
	the update code is outlined as follows:

	\vspace{5pt}
	\begin{minipage}[t]{\columnwidth}
	\begin{verbatim}
	down(&update_sema);
	/* p == element to update */
	q = kmalloc(sizeof(*q), GFP_KERNEL);
	q = p;
	list_replace_rcu(p, q);
	up(&update_sema);
	synchronize_rcu();
	kfree(p);
	\end{verbatim}
	\end{minipage}
	\vspace{5pt}

	As before, we start with a linked list with elements~A, B, and C
	in that order, and, again as before, readers might be traversing
	this list throughout the update process.
	To update element~B, we first allocate a new element and copy element~B
	to it, then update the copy to produce element~B'.
	We then execute \co{list_replace_rcu()} so that element~A now
	references the new element~B'---however, element~B still references
	element~C so that any pre-existing readers still referencing old element~B
	are still able to advance to element~C.
	New readers will find element~B'.

	After executing \co{synchronize_rcu()}, which again waits for a grace
	period to elapse, all readers that were referencing old element~B'
	will have advanced through the list, so that \co{kfree()} may now
	be invoked on element~B, resulting in the final state with the list
	now containing elements~A, B', and C.

	This procedure where \emph{readers} continue traversing the list
	while a \emph{copy} operation is used to carry out an \emph{update}
	is what gives RCU---or read-copy update---its name.

	\begin{figure}[p]
	\centering
	\resizebox{3in}{!}{\includegraphics{advsync/RCUReplacement}}
	\caption{RCU Replacement in Linked List}
	\label{fig:advsync:RCU Replacement in Linked List}
	\end{figure}

	\subsection{Insertion into an RCU-Protected Linked List}
	\label{sec:advsync:Insertion into an RCU-Protected Linked List}

	Insertion into an RCU-protected linked list is straightforward, as
	illustrated by
	Figure~\ref{fig:advsync:RCU Insertion into Linked List}.
	The read-side code is again that shown in
	Section~\ref{sec:advsync:Deletion From an RCU-Protected Linked List},
	and the update code might be as follows, where ``h'' is the list header:

	\vspace{5pt}
	\begin{minipage}[t]{\columnwidth}
	\begin{verbatim}
	down(&update_sema);
	p = kmalloc(sizeof(*p), GFP_KERNEL);
	initialize(p);
	list_add_rcu(p, h);
	up(&update_sema);
	\end{verbatim}
	\end{minipage}
	\vspace{5pt}

	Referring to
	Figure~\ref{fig:advsync:RCU Insertion into Linked List},
	the \co{kmalloc()} primitive and the programmer-supplied \co{initialize()}
	function allocate and initialize the new element~B,
	and the \co{list_add_rcu()} primitive inserts it into the list.
	There is no need for grace periods in this case: readers will either
	see the new element~B or not, depending on when they execute relative
	to the update.

	\begin{figure}[htb]
	\centering
	\resizebox{3in}{!}{\includegraphics{advsync/RCUInsertion}}
	\caption{RCU Insertion into Linked List}
	\label{fig:advsync:RCU Insertion into Linked List}
	\end{figure}

	\subsection{RCU API}
	\label{sec:advsync:RCU API}

	RCU can be best thought of as an API, particularly given the large number
	of implementations available for the RCU API.
	Figure~\ref{fig:advsync:RCU API}
	shows a color-coded guide to the major members of the Linux-kernel
	RCU API, with the read-side primitives in blue boxes, the update-initiation
	primitives in the green box, the grace-period primitives in the yellow box,
	and the update-completion primitives in the red box.

	\begin{figure}[htb]
	\centering
	\includegraphics{advsync/RCU-API}
	\caption{RCU API}
	\label{fig:advsync:RCU API}
	\end{figure}

	As noted earlier, \co{rcu_read_lock()} and \co{rcu_read_unlock()}
	delimit RCU read-side critical sections.
	The \co{list_for_each_entry_rcu(p,h,f)} primitive is an iterator that
	traverses the RCU-protected linked list with header ``h'' threaded
	through struct field ``f'' using
	local variable ``p'' as the iteration pointer.
	The \co{rcu_dereference()} primitive is used to traverse RCU-protected
	pointers, as in \co{rcu_dereference(p)->next}.

	The \co{list_add_rcu()}, \co{list_del_rcu()}, and
	\co{list_replace_rcu()} primitives perform the expected linked-list
	operations, as illustrated in the examples in the preceding sections.
	The \co{rcu_assign_pointer()} primitive assigns a new value to an
	RCU-protected field, for example, \co{rcu_assign_pointer(p->next,q)}
	would add the structure referenced by ``q'' following that referenced
	by ``p''.

	The \co{synchronize_rcu()} primitive waits for an RCU grace period
	to elapse, as see in the examples in the previous sections.
	The \co{call_rcu()} primitive causes a function to be invoked
	at the end of a subsequent grace period, so that \co{call_rcu(p,f)}
	would cause \co{f(p)} to be invoked after a grace period.
	The \co{call_rcu()} primitive is thus the continuation form of the
	\co{synchronize_rcu()} primitive.

	Finally note that \co{kfree()} is simply an example, as any primitive
	that frees memory can be used to complete an update.

	The RCU API in the Linux kernel is somewhat more elaborate in order to
	allow for the kernel's variety of different contexts and environments,
	including interrupt and NMI handlers.

	\QuickQuiz{}
	Why is an \co{rcu_dereference()} primitive required?
	\QuickQuizAnswer{
	The \co{rcu_dereference()} primitive provides the memory
	barrier required by the DEC Alpha.
	Without such a memory barrier, the Alpha can dereference
	the new pointer and find pre-initialization
	data~\cite{Compaq01,WilliamPugh2000Gharachorloo}.
	Perhaps more importantly, it serves to document which
	pointers are are protected by RCU.
	But most important of all, it disables compiler optimizations
	that might otherwise result in double fetches of the
	argument to \co{rcu_dereference()}.
	} \QuickQuizEnd

	\QuickQuiz{}
	Why is a separate \co{list_for_each_entry_rcu()} required?
	In other words, why can't the existing
	\co{list_for_each_entry()} primitive be used instead?
	\QuickQuizAnswer{
	The \co{list_for_each_entry_rcu()} primitive includes the
	use of \co{rcu_dereference()} required by the DEC Alpha.
	} \QuickQuizEnd

	\QuickQuiz{}
	Why would one need to use \co{rcu_assign_pointer(p->next,q)}
	instead of the straightforward \co{p->next=q}?
	\QuickQuizAnswer{
	On weakly consistent machines, a memory barrier is required
	between the initialization of a data structure and the
	assignment of the corresponding pointer that allows other
	CPUs to access the data structure.
	Without such a memory barrier, these other CPUs might
	see the assignments out of order, thus seeing the preinitialization
	contents of the data structure.
	The \co{rcu_assign_pointer()} primitive includes such a
	memory barrier on any architectures requiring it.
	} \QuickQuizEnd

	\QuickQuiz{}
	Why are there separate \co{list_add_rcu()},
	\co{list_del_rcu()},
	and \co{list_replace_rcu()} primitives?
	Why couldn't the existing \co{list_add()}, \co{list_del()},
	and \co{list_replace()} primitives be used instead?
	\QuickQuizAnswer{
	The \co{list_add_rcu()} and \co{list_replace_rcu()} primitives
	must use the \co{rcu_assign_pointer()} primitive
	(or, alternatively, explicit memory barriers).
	The \co{list_del_rcu()} primitive must omit the ``poisoning''
	of list pointers that is used by \co{list_del()} as a
	debug check.
	} \QuickQuizEnd

	\subsection{RCU Properties}
	\label{sec:advsync:RCU Properties}

	The following are important properties of RCU, some of which are
	beneficial, some adverse, and others dependent on the situation.

	\begin{enumerate}
	\item Read-side primitives have very fast and deterministic
	execution time.
	\item Grace-period-detection primitives can have rather large
	and non-deterministic overheads.
	\item Read-side primitives cannot participate in a deadlock cycle.
	\item Read-side primitives cannot participate in a priority-inversion
	scenarios.
	\item When updates are guarded by locking, read-side critical
	sections can unconditionally acquire the update-side lock.
	\item Read-side critical sections run concurrently with updates.
	Without this read/update concurrency, the read-side primitives could
	not possibly be fast and deterministic, however, some algorithms
	do not straightforwardly tolerate such concurrency.
	\item Some implementations of the \co{call_rcu()} primitive
	can place very low overhead on the thread invoking
	\co{call_rcu()}, although the large overhead of grace-period
	detection is merely deferred, rather than avoided, in this
	case.
	\end{enumerate}

	RCU is therefore attractive in read-mostly situations where either
	good performance and scalability or realtime response are required.
	These properties will be discussed at greater length @@@.

	@@@@ move RCU usage to its own chapter?

	\begin{enumerate}
	\item Pure RCU~(\ref{sec:advsync:Pure RCU}).
	\item Reader-Writer-Lock/RCU
	Analogy~(\ref{sec:advsync:Reader-Writer-Lock/RCU Analogy}).
	\item RCU Existence Locks(\ref{sec:advsync:RCU Existence Locks}).
	\item RCU Readers With NBS
	Writers~(\ref{sec:advsync:RCU Readers With NBS Writers}).
	\end{enumerate}

	\subsection{Pure RCU}
	\label{sec:advsync:Pure RCU}

	Pure RCU is used to effect a lazy change in mode, for example, preventing
	any new threads from accessing a subsystem, then waiting for all
	pre-existing threads to complete before tearing it down.

	\emph{@@@ Focus on change-in-mode uses. (One can argue that the interrupt/NMI
	uses are more closely related to reader-writer-lock/RCU analogy.)
	Potential example: RCU guarding a variable prohibiting additional reference
	counts for cleanup of the corresponding data structure.}

	\subsection{Reader-Writer-Lock/RCU Analogy}
	\label{sec:advsync:Reader-Writer-Lock/RCU Analogy}

	\subsection{RCU Existence Locks}
	\label{sec:advsync:RCU Existence Locks}

	\subsection{RCU Readers With NBS Writers}
	\label{sec:advsync:RCU Readers With NBS Writers}

	\emph{@@@ simple intro, along with performance data on example from SMPdesign.}

	\emph{@@@ consider moving advanced RCU usage to a separate chapter.
	Add more material from chapter 5 of dissertation, particularly transformational
	patterns.
	\url{http://www.rdrop.com/users/paulmck/RCU/RCUdissertation.2004.07.14e1.pdf}.}

	\emph{@@@ consider creating an RCU applications chapter/appendix, with
	information from chapter 6 of dissertation.}