| % defer/rcuAPI.tex |
| |
| \subsection{RCU Linux-Kernel API} |
| \label{sec:defer:RCU Linux-Kernel API} |
| \OriginallyPublished{Section}{sec:defer:RCU Linux-Kernel API}{RCU Linux-Kernel API}{Linux Weekly News}{PaulEMcKenney2008WhatIsRCUAPI} |
| |
| This section looks at RCU from the viewpoint of its Linux-kernel API. |
| Section~\ref{sec:defer:RCU has a Family of Wait-to-Finish APIs} |
| presents RCU's wait-to-finish APIs, and |
| Section~\ref{sec:defer:RCU has Publish-Subscribe and Version-Maintenance APIs} |
| presents RCU's publish-subscribe and version-maintenance APIs. |
| Finally, |
| Section~\ref{sec:defer:So, What is RCU Really?} |
| presents concluding remarks. |
| |
| \subsubsection{RCU has a Family of Wait-to-Finish APIs} |
| \label{sec:defer:RCU has a Family of Wait-to-Finish APIs} |
| |
| \begin{table*}[p] |
| \begin{center} |
| \scriptsize |
| \begin{tabular}{p{1.1in}|p{1.0in}|p{1.1in}|p{1.0in}|p{1.0in}} |
| Attribute & |
| RCU Classic & |
| RCU BH & |
| RCU Sched & |
| Realtime RCU \\ |
| \hline |
| \hline |
| Purpose & |
| Original & |
| Prevent DDoS attacks & |
| { \raggedright Wait for preempt-disable regions, |
| hardirqs, \& NMIs } & |
| Realtime response \\ |
| \hline |
| Availability & |
| 2.5.43 & |
| 2.6.9 & |
| 2.6.12 & |
| 2.6.26 \\ |
| \hline |
| Read-side primitives & |
| { \raggedright |
| \co{rcu_read_lock()}~! \\ |
| \co{rcu_read_unlock()}~! } & |
| { \raggedright |
| \co{rcu_read_lock_bh()} \\ |
| \co{rcu_read_unlock_bh()} } & |
| { \raggedright |
| \co{preempt_disable()} \\ |
| \co{preempt_enable()} \\ |
| (and friends) } & |
| { \raggedright |
| \co{rcu_read_lock()} \\ |
| \co{rcu_read_unlock()} } \\ |
| \hline |
| { Update-side primitives (synchronous) } & |
| { \co{synchronize_rcu()} \co{synchronize_net()} } & |
| & |
| \co{synchronize_sched()} & |
| { \co{synchronize_rcu()} \co{synchronize_net()} } \\ |
| \hline |
| { Update-side primitives (asynchronous/callback) } & |
| \co{call_rcu()} ! & |
| \co{call_rcu_bh()} & |
| \co{call_rcu_sched()} & |
| \co{call_rcu()} \\ |
| \hline |
| { Update-side primitives (wait for callbacks) } & |
| \co{rcu_barrier()} & |
| \co{rcu_barrier_bh()} & |
| \co{rcu_barrier_sched()} & |
| \co{rcu_barrier()} \\ |
| \hline |
| Type-safe memory & |
| \co{SLAB_DESTROY_BY_RCU} & |
| & |
| & |
| \co{SLAB_DESTROY_BY_RCU} \\ |
| \hline |
| Read side constraints & |
| No blocking & |
| No irq enabling & |
| No blocking & |
| Only preemption and lock acquisition \\ |
| \hline |
| Read side overhead & |
| Preempt disable/enable (free on non-PREEMPT) & |
| BH disable/enable & |
| Preempt disable/enable (free on non-PREEMPT) & |
| Simple instructions, irq disable/enable \\ |
| \hline |
| Asynchronous update-side overhead & |
| sub-microsecond & |
| sub-microsecond & |
| & |
| sub-microsecond \\ |
| \hline |
| Grace-period latency & |
| 10s of milliseconds & |
| 10s of milliseconds & |
| 10s of milliseconds & |
| 10s of milliseconds \\ |
| \hline |
| Non-\co{PREEMPT_RT} implementation & |
| RCU Classic & |
| RCU BH & |
| RCU Classic & |
| Preemptible RCU \\ |
| \hline |
| \co{PREEMPT_RT} implementation & |
| Preemptible RCU & |
| Realtime RCU & |
| Forced Schedule on all CPUs & |
| Realtime RCU \\ |
| \end{tabular} |
| \end{center} |
| \caption{RCU Wait-to-Finish APIs} |
| \label{tab:defer:RCU Wait-to-Finish APIs} |
| \end{table*} |
| |
| \begin{table*}[p] |
| \begin{center} |
| \scriptsize |
| \begin{tabular}{p{1.1in}|p{1.5in}|p{1.5in}} |
| Attribute & |
| SRCU & |
| QRCU \\ |
| \hline |
| \hline |
| Purpose & |
| Sleeping readers & |
| Sleeping readers and fast grace periods \\ |
| \hline |
| Availability & |
| 2.6.19 & |
| \\ |
| \hline |
| Read-side primitives & |
| { \raggedright |
| \co{srcu_read_lock()} \\ |
| \co{srcu_read_unlock()} } & |
| { \raggedright |
| \co{qrcu_read_lock()} \\ |
| \co{qrcu_read_unlock()} } \\ |
| \hline |
| { Update-side primitives (synchronous) } & |
| \co{synchronize_srcu()} & |
| \co{synchronize_qrcu()} \\ |
| \hline |
| { Update-side primitives (asynchronous/callback) } & |
| N/A & |
| N/A \\ |
| \hline |
| { Update-side primitives (wait for callbacks) } & |
| N/A & |
| N/A \\ |
| \hline |
| Type-safe memory & |
| & |
| \\ |
| \hline |
| Read side constraints & |
| No \co{synchronize_srcu()} & |
| No \co{synchronize_qrcu()} \\ |
| \hline |
| Read side overhead & |
| Simple instructions, preempt disable/enable & |
| Atomic increment and decrement of shared variable \\ |
| \hline |
| Asynchronous update-side overhead & |
| N/A & |
| N/A \\ |
| \hline |
| Grace-period latency & |
| 10s of milliseconds & |
| 10s of \emph{nanoseconds} in absence of readers \\ |
| \hline |
| Non-\co{PREEMPT_RT} implementation & |
| SRCU & |
| N/A \\ |
| \hline |
| \co{PREEMPT_RT} implementation & |
| SRCU & |
| N/A \\ |
| \end{tabular} |
| \end{center} |
| \caption{Sleepable RCU Wait-to-Finish APIs} |
| \label{tab:defer:Sleepable RCU Wait-to-Finish APIs} |
| \end{table*} |
| |
| The most straightforward answer to ``what is RCU'' is that RCU is |
| an API used in the Linux kernel, as summarized by |
| Tables~\ref{tab:defer:RCU Wait-to-Finish APIs} and |
| \ref{tab:defer:Sleepable RCU Wait-to-Finish APIs}, |
| which shows the wait-for-RCU-readers portions of the non-sleepable and |
| sleepable APIs, respectively, |
| and by |
| Table~\ref{tab:defer:RCU Publish-Subscribe and Version Maintenance APIs}, |
| which shows the publish/subscribe portions of the API. |
| |
| If you are new to RCU, you might consider focusing on just one |
| of the columns in |
| Table~\ref{tab:defer:RCU Wait-to-Finish APIs}, |
| each of which summarizes one member of the Linux kernel's RCU API family. |
| For example, if you are primarily interested in understanding how RCU |
| is used in the Linux kernel, ``RCU Classic'' would be the place to start, |
| as it is used most frequently. |
| On the other hand, if you want to understand RCU for its own sake, |
| ``SRCU'' has the simplest API. |
| You can always come back for the other columns later. |
| |
| If you are already familiar with RCU, these tables can |
| serve as a useful reference. |
| |
| \QuickQuiz{} |
| Why do some of the cells in |
| Table~\ref{tab:defer:RCU Wait-to-Finish APIs} |
| have exclamation marks (``!'')? |
| \QuickQuizAnswer{ |
| The API members with exclamation marks (\co{rcu_read_lock()}, |
| \co{rcu_read_unlock()}, and \co{call_rcu()}) were the |
| only members of the Linux RCU API that Paul E. McKenney was aware |
| of back in the mid-90s. |
| During this timeframe, he was under the mistaken impression that |
| he knew all that there is to know about RCU. |
| } \QuickQuizEnd |
| |
| The ``RCU Classic'' column corresponds to the original RCU implementation, |
| in which RCU read-side critical sections are delimited by |
| \co{rcu_read_lock()} and \co{rcu_read_unlock()}, which |
| may be nested. |
| The corresponding synchronous update-side primitives, |
| \co{synchronize_rcu()}, along with its synonym |
| \co{synchronize_net()}, wait for any currently executing |
| RCU read-side critical sections to complete. |
| The length of this wait is known as a ``grace period''. |
| The asynchronous update-side primitive, \co{call_rcu()}, |
| invokes a specified function with a specified argument after a |
| subsequent grace period. |
| For example, \co{call_rcu(p,f);} will result in |
| the ``RCU callback'' \co{f(p)} |
| being invoked after a subsequent grace period. |
| There are situations, |
| such as when unloading a Linux-kernel module that uses \co{call_rcu()}, |
| when it is necessary to wait for all |
| outstanding RCU callbacks to complete~\cite{PaulEMcKenney2007rcubarrier}. |
| The \co{rcu_barrier()} primitive does this job. |
| Note that the more recent hierarchical |
| RCU~\cite{PaulEMcKenney2008HierarchicalRCU} |
| implementation described in |
| Sections~\ref{app:rcuimpl:rcutree:Hierarchical RCU Overview} and |
| \ref{app:rcuimpl:rcutreewt:Hierarchical RCU Code Walkthrough} |
| also adheres to ``RCU Classic'' semantics. |
| |
| Finally, RCU may be used to provide |
| type-safe memory~\cite{Cheriton96a}, as described in |
| Section~\ref{sec:defer:RCU is a Way of Providing Type-Safe Memory}. |
| In the context of RCU, type-safe memory guarantees that a given |
| data element will not change type during any RCU read-side critical section |
| that accesses it. |
| To make use of RCU-based type-safe memory, pass |
| \co{SLAB_DESTROY_BY_RCU} to |
| \co{kmem_cache_create()}. |
| It is important to note that \co{SLAB_DESTROY_BY_RCU} will |
| \emph{in no way} |
| prevent \co{kmem_cache_alloc()} from immediately reallocating |
| memory that was just now freed via \co{kmem_cache_free()}! |
| In fact, the \co{SLAB_DESTROY_BY_RCU}-protected data structure |
| just returned by \co{rcu_dereference} might be freed and reallocated |
| an arbitrarily large number of times, even when under the protection |
| of \co{rcu_read_lock()}. |
| Instead, \co{SLAB_DESTROY_BY_RCU} operates by preventing |
| \co{kmem_cache_free()} |
| from returning a completely freed-up slab of data structures |
| to the system until after an RCU grace period elapses. |
| In short, although the data element might be freed and reallocated arbitrarily |
| often, at least its type will remain the same. |
| |
| \QuickQuiz{} |
| How do you prevent a huge number of RCU read-side critical |
| sections from indefinitely blocking a \co{synchronize_rcu()} |
| invocation? |
| \QuickQuizAnswer{ |
| There is no need to do anything to prevent RCU read-side |
| critical sections from indefinitely blocking a |
| \co{synchronize_rcu()} invocation, because the |
| \co{synchronize_rcu()} invocation need wait only for |
| \emph{pre-existing} RCU read-side critical sections. |
| So as long as each RCU read-side critical section is |
| of finite duration, there should be no problem. |
| } \QuickQuizEnd |
| |
| \QuickQuiz{} |
| The \co{synchronize_rcu()} API waits for all pre-existing |
| interrupt handlers to complete, right? |
| \QuickQuizAnswer{ |
| Absolutely not! |
| And especially not when using preemptible RCU! |
| You instead want \co{synchronize_irq()}. |
| Alternatively, you can place calls to \co{rcu_read_lock()} |
| and \co{rcu_read_unlock()} in the specific interrupt handlers that |
| you want \co{synchronize_rcu()} to wait for. |
| } \QuickQuizEnd |
| |
| In the ``RCU BH'' column, \co{rcu_read_lock_bh()} and |
| \co{rcu_read_unlock_bh()} delimit RCU read-side critical |
| sections, and \co{call_rcu_bh()} invokes the specified |
| function and argument after a subsequent grace period. |
| Note that RCU BH does not have a synchronous \co{synchronize_rcu_bh()} |
| interface, |
| though one could easily be added if required. |
| |
| \QuickQuiz{} |
| What happens if you mix and match? |
| For example, suppose you use \co{rcu_read_lock()} and |
| \co{rcu_read_unlock()} to delimit RCU read-side critical |
| sections, but then use \co{call_rcu_bh()} to post an |
| RCU callback? |
| \QuickQuizAnswer{ |
| If there happened to be no RCU read-side critical |
| sections delimited by \co{rcu_read_lock_bh()} and |
| \co{rcu_read_unlock_bh()} at the time \co{call_rcu_bh()} |
| was invoked, RCU would be within its rights to invoke the callback |
| immediately, possibly freeing a data structure still being used by |
| the RCU read-side critical section! |
| This is not merely a theoretical possibility: a long-running RCU |
| read-side critical section delimited by \co{rcu_read_lock()} |
| and \co{rcu_read_unlock()} is vulnerable to this failure mode. |
| |
| However, the \co{rcu_dereference()} family of functions apply |
| to all flavors of RCU. |
| (There was an attempt to have per-flavor variants of |
| \co{rcu_dereference()}, but it was just too messy.) |
| } \QuickQuizEnd |
| |
| \QuickQuiz{} |
| Hardware interrupt handlers can be thought of as being |
| under the protection of an implicit \co{rcu_read_lock_bh()}, |
| right? |
| \QuickQuizAnswer{ |
| Absolutely not! |
| And especially not when using preemptible RCU! |
| If you need to access ``rcu\_bh''-protected data structures |
| in an interrupt handler, you need to provide explicit calls to |
| \co{rcu_read_lock_bh()} and \co{rcu_read_unlock_bh()}. |
| } \QuickQuizEnd |
| |
| In the ``RCU Sched'' column, anything that disables preemption |
| acts as an RCU read-side critical section, and \co{synchronize_sched()} |
| waits for the corresponding RCU grace period. |
| This RCU API family was added in the 2.6.12 kernel, which split the |
| old \co{synchronize_kernel()} API into the current |
| \co{synchronize_rcu()} (for RCU Classic) and |
| \co{synchronize_sched()} (for RCU Sched). |
| Note that RCU Sched did not originally have an asynchronous |
| \co{call_rcu_sched()} interface, but one was added in 2.6.26. |
| In accordance with the quasi-minimalist philosophy of the Linux |
| community, APIs are added on an as-needed basis. |
| |
| \QuickQuiz{} |
| What happens if you mix and match RCU Classic and RCU Sched? |
| \QuickQuizAnswer{ |
| In a non-\co{PREEMPT} or a \co{PREEMPT} kernel, mixing these |
| two works ``by accident'' because in those kernel builds, RCU Classic |
| and RCU Sched map to the same implementation. |
| However, this mixture is fatal in \co{PREEMPT_RT} builds using the -rt |
| patchset, due to the fact that Realtime RCU's read-side critical |
| sections can be preempted, which would permit |
| \co{synchronize_sched()} to return before the |
| RCU read-side critical section reached its \co{rcu_read_unlock()} |
| call. |
| This could in turn result in a data structure being freed before the |
| read-side critical section was finished with it, |
| which could in turn greatly increase the actuarial risk experienced |
| by your kernel. |
| |
| In fact, the split between RCU Classic and RCU Sched was inspired |
| by the need for preemptible RCU read-side critical sections. |
| } \QuickQuizEnd |
| |
| \QuickQuiz{} |
| In general, you cannot rely on \co{synchronize_sched()} to |
| wait for all pre-existing interrupt handlers, |
| right? |
| \QuickQuizAnswer{ |
| That is correct! |
| Because -rt Linux uses threaded interrupt handlers, there can |
| be context switches in the middle of an interrupt handler. |
| Because \co{synchronize_sched()} waits only until each |
| CPU has passed through a context switch, it can return |
| before a given interrupt handler completes. |
| |
| If you need to wait for a given interrupt handler to complete, |
| you should instead use \co{synchronize_irq()} or place |
| explicit RCU read-side critical sections in the interrupt |
| handlers that you wish to wait on. |
| } \QuickQuizEnd |
| |
| The ``Realtime RCU'' column has the same API as does |
| RCU Classic, the only difference being that RCU read-side critical |
| sections may be preempted and may block while acquiring spinlocks. |
| The design of Realtime RCU is described |
| elsewhere~\cite{PaulEMcKenney2007PreemptibleRCU}. |
| |
| \QuickQuiz{} |
| Why do both SRCU and QRCU lack asynchronous \co{call_srcu()} |
| or \co{call_qrcu()} interfaces? |
| \QuickQuizAnswer{ |
| Given an asynchronous interface, a single task |
| could register an arbitrarily large number of SRCU or QRCU callbacks, |
| thereby consuming an arbitrarily large quantity of memory. |
| In contrast, given the current synchronous |
| \co{synchronize_srcu()} and \co{synchronize_qrcu()} |
| interfaces, a given task must finish waiting for a given grace period |
| before it can start waiting for the next one. |
| } \QuickQuizEnd |
| |
| The ``SRCU'' column in |
| Table~\ref{tab:defer:Sleepable RCU Wait-to-Finish APIs} |
| displays a specialized RCU API that permits |
| general sleeping in RCU read-side critical sections |
| (see Appendix~\ref{app:rcuimpl:Sleepable RCU Implementation} for more details). |
| Of course, |
| use of \co{synchronize_srcu()} in an SRCU read-side |
| critical section can result in |
| self-deadlock, so should be avoided. |
| SRCU differs from earlier RCU implementations in that the caller |
| allocates an \co{srcu_struct} for each distinct SRCU |
| usage. |
| This approach prevents SRCU read-side critical sections from blocking |
| unrelated \co{synchronize_srcu()} invocations. |
| In addition, in this variant of RCU, \co{srcu_read_lock()} |
| returns a value that must be passed into the corresponding |
| \co{srcu_read_unlock()}. |
| |
| The ``QRCU'' column presents an RCU implementation with the same |
| API structure as SRCU, but optimized for extremely low-latency |
| grace periods in absence of readers, as described |
| elsewhere~\cite{PaulEMcKenney2007QRCUspin}. |
| As with SRCU, use of \co{synchronize_qrcu()} in a QRCU read-side |
| critical section can result in |
| self-deadlock, so should be avoided. |
| Although QRCU has not yet been accepted into the Linux kernel, it |
| is worth mentioning given that it is the only kernel-level |
| RCU implementation |
| that can boast deep sub-microsecond grace-period latencies. |
| |
| \QuickQuiz{} |
| Under what conditions can \co{synchronize_srcu()} be safely |
| used within an SRCU read-side critical section? |
| \QuickQuizAnswer{ |
| In principle, you can use |
| \co{synchronize_srcu()} with a given \co{srcu_struct} |
| within an SRCU read-side critical section that uses some other |
| \co{srcu_struct}. |
| In practice, however, doing this is almost certainly a bad idea. |
| In particular, the code shown in |
| Figure~\ref{fig:defer:Multistage SRCU Deadlocks} |
| could still result in deadlock. |
| |
| \begin{figure}[htbp] |
| { \centering |
| \begin{verbatim} |
| 1 idx = srcu_read_lock(&ssa); |
| 2 synchronize_srcu(&ssb); |
| 3 srcu_read_unlock(&ssa, idx); |
| 4 |
| 5 /* . . . */ |
| 6 |
| 7 idx = srcu_read_lock(&ssb); |
| 8 synchronize_srcu(&ssa); |
| 9 srcu_read_unlock(&ssb, idx); |
| \end{verbatim} |
| } |
| \caption{Multistage SRCU Deadlocks} |
| \label{fig:defer:Multistage SRCU Deadlocks} |
| \end{figure} |
| |
| } \QuickQuizEnd |
| |
| The Linux kernel currently has a surprising number of RCU APIs and |
| implementations. |
| There is some hope of reducing this number, evidenced by the fact |
| that a given build of the Linux kernel currently has at most |
| three implementations behind four APIs (given that RCU Classic |
| and Realtime RCU share the same API). |
| However, careful inspection and analysis will be required, just as |
| would be required in order to eliminate one of the many locking APIs. |
| |
| The various RCU APIs are distinguished by the forward-progress |
| guarantees that their RCU read-side critical sections must provide, |
| and also by their scope, as follows: |
| |
| \begin{enumerate} |
| \item RCU BH: read-side critical sections |
| must guarantee forward progress against everything except for |
| NMI and interrupt handlers, but not including software-interrupt |
| (\co{softirq}) handlers. |
| RCU BH is global in scope. |
| \item RCU Sched: read-side critical sections must guarantee forward |
| progress against everything except for NMI and irq handlers, |
| including \co{softirq} handlers. |
| RCU Sched is global in scope. |
| \item RCU (both classic and real-time): read-side critical sections |
| must guarantee forward progress against everything except for |
| NMI handlers, irq handlers, \co{softirq} handlers, and (in the |
| real-time case) higher-priority real-time tasks. |
| RCU is global in scope. |
| \item SRCU and QRCU: read-side critical sections need not guarantee |
| forward progress unless some other task is waiting for the |
| corresponding grace period to complete, in which case these |
| read-side critical sections should complete in no more than |
| a few seconds (and preferably much more quickly).\footnote{ |
| Thanks to James Bottomley for urging me to this |
| formulation, as opposed to simply saying that |
| there are no forward-progress guarantees.} |
| SRCU's and QRCU's scope is defined by the use of the corresponding |
| \co{srcu_struct} or \co{qrcu_struct}, respectively. |
| \end{enumerate} |
| |
| In other words, SRCU and QRCU compensate for their extremely weak |
| forward-progress guarantees by permitting the developer to restrict |
| their scope. |
| |
| \subsubsection{RCU has Publish-Subscribe and Version-Maintenance APIs} |
| \label{sec:defer:RCU has Publish-Subscribe and Version-Maintenance APIs} |
| |
| Fortunately, the RCU publish-subscribe and version-maintenance |
| primitives shown in the following |
| table apply to all of the variants of RCU discussed above. |
| This commonality can in some cases allow more code to be shared, |
| which certainly reduces the API proliferation that would otherwise |
| occur. |
| The original purpose of the RCU publish-subscribe APIs was to |
| bury memory barriers into these APIs, so that Linux kernel |
| programmers could use RCU without needing to become expert on |
| the memory-ordering models of each of the 20+ CPU families |
| that Linux supports~\cite{Spraul01}. |
| |
| \begin{table*}[tb] |
| \begin{center} |
| % \scriptsize |
| \begin{tabular}{p{1.1in}|p{1.9in}|p{0.7in}|p{1.3in}} |
| Category & |
| Primitives & |
| Availability & |
| Overhead \\ |
| \hline |
| \hline |
| List traversal & |
| \co{list_for_each_entry_rcu()} & |
| 2.5.59 & |
| { \raggedright |
| Simple instructions (memory barrier on Alpha) } \\ |
| \hline |
| List update & |
| \co{list_add_rcu()} & |
| 2.5.44 & |
| Memory barrier \\ |
| & |
| \co{list_add_tail_rcu()} & |
| 2.5.44 & |
| Memory barrier \\ |
| & |
| \co{list_del_rcu()} & |
| 2.5.44 & |
| Simple instructions \\ |
| & |
| \co{list_replace_rcu()} & |
| 2.6.9 & |
| Memory barrier \\ |
| & |
| \co{list_splice_init_rcu()} & |
| 2.6.21 & |
| Grace-period latency \\ |
| \hline |
| Hlist traversal & |
| \co{hlist_for_each_entry_rcu()} & |
| 2.6.8 & |
| { \raggedright |
| Simple instructions (memory barrier on Alpha) } \\ |
| & |
| \co{hlist_add_after_rcu()} & |
| 2.6.14 & |
| Memory barrier \\ |
| & |
| \co{hlist_add_before_rcu()} & |
| 2.6.14 & |
| Memory barrier \\ |
| & |
| \co{hlist_add_head_rcu()} & |
| 2.5.64 & |
| Memory barrier \\ |
| & |
| \co{hlist_del_rcu()} & |
| 2.5.64 & |
| Simple instructions \\ |
| & |
| \co{hlist_replace_rcu()} & |
| 2.6.15 & |
| Memory barrier \\ |
| \hline |
| Pointer traversal & |
| \co{rcu_dereference()} & |
| 2.6.9 & |
| { \raggedright |
| Simple instructions (memory barrier on Alpha) } \\ |
| \hline |
| Pointer update & |
| \co{rcu_assign_pointer()} & |
| 2.6.10 & |
| Memory barrier \\ |
| \end{tabular} |
| \end{center} |
| \caption{RCU Publish-Subscribe and Version Maintenance APIs} |
| \label{tab:defer:RCU Publish-Subscribe and Version Maintenance APIs} |
| \end{table*} |
| |
| The first pair of categories operate on Linux |
| \co{struct list_head} lists, which are circular, doubly-linked |
| lists. |
| The \co{list_for_each_entry_rcu()} primitive traverses an |
| RCU-protected list in a type-safe manner, while also enforcing |
| memory ordering for situations where a new list element is inserted |
| into the list concurrently with traversal. |
| On non-Alpha platforms, this primitive incurs little or no performance |
| penalty compared to \co{list_for_each_entry()}. |
| The \co{list_add_rcu()}, \co{list_add_tail_rcu()}, |
| and \co{list_replace_rcu()} primitives are analogous to |
| their non-RCU counterparts, but incur the overhead of an additional |
| memory barrier on weakly-ordered machines. |
| The \co{list_del_rcu()} primitive is also analogous to its |
| non-RCU counterpart, but oddly enough is very slightly faster due to the |
| fact that it poisons only the \co{prev} pointer rather than |
| both the \co{prev} and \co{next} pointers as |
| \co{list_del()} must do. |
| Finally, the \co{list_splice_init_rcu()} primitive is similar |
| to its non-RCU counterpart, but incurs a full grace-period latency. |
| The purpose of this grace period is to allow RCU readers to finish |
| their traversal of the source list before completely disconnecting |
| it from the list header -- failure to do this could prevent such |
| readers from ever terminating their traversal. |
| |
| \QuickQuiz{} |
| Why doesn't \co{list_del_rcu()} poison both the \co{next} |
| and \co{prev} pointers? |
| \QuickQuizAnswer{ |
| Poisoning the \co{next} pointer would interfere |
| with concurrent RCU readers, who must use this pointer. |
| However, RCU readers are forbidden from using the \co{prev} |
| pointer, so it may safely be poisoned. |
| } \QuickQuizEnd |
| |
| The second pair of categories operate on Linux's |
| \co{struct hlist_head}, which is a linear linked list. |
| One advantage of \co{struct hlist_head} over |
| \co{struct list_head} is that the former requires only |
| a single-pointer list header, which can save significant memory in |
| large hash tables. |
| The \co{struct hlist_head} primitives in the table |
| relate to their non-RCU counterparts in much the same way as do the |
| \co{struct list_head} primitives. |
| |
| The final pair of categories operate directly on pointers, and |
| are useful for creating RCU-protected non-list data structures, |
| such as RCU-protected arrays and trees. |
| The \co{rcu_assign_pointer()} primitive ensures that any |
| prior initialization remains ordered before the assignment to the |
| pointer on weakly ordered machines. |
| Similarly, the \co{rcu_dereference()} primitive ensures that subsequent |
| code dereferencing the pointer will see the effects of initialization code |
| prior to the corresponding \co{rcu_assign_pointer()} on |
| Alpha CPUs. |
| On non-Alpha CPUs, \co{rcu_dereference()} documents which pointer |
| dereferences are protected by RCU. |
| |
| \QuickQuiz{} |
| Normally, any pointer subject to \co{rcu_dereference()} \emph{must} |
| always be updated using \co{rcu_assign_pointer()}. |
| What is an exception to this rule? |
| \QuickQuizAnswer{ |
| One such exception is when a multi-element linked |
| data structure is initialized as a unit while inaccessible to other |
| CPUs, and then a single \co{rcu_assign_pointer()} is used |
| to plant a global pointer to this data structure. |
| The initialization-time pointer assignments need not use |
| \co{rcu_assign_pointer()}, though any such assignments that |
| happen after the structure is globally visible \emph{must} use |
| \co{rcu_assign_pointer()}. |
| |
| However, unless this initialization code is on an impressively hot |
| code-path, it is probably wise to use \co{rcu_assign_pointer()} |
| anyway, even though it is in theory unnecessary. |
| It is all too easy for a ``minor'' change to invalidate your cherished |
| assumptions about the initialization happening privately. |
| } \QuickQuizEnd |
| |
| \QuickQuiz{} |
| Are there any downsides to the fact that these traversal and update |
| primitives can be used with any of the RCU API family members? |
| \QuickQuizAnswer{ |
| It can sometimes be difficult for automated |
| code checkers such as ``sparse'' (or indeed for human beings) to |
| work out which type of RCU read-side critical section a given |
| RCU traversal primitive corresponds to. |
| For example, consider the code shown in |
| Figure~\ref{fig:defer:Diverse RCU Read-Side Nesting}. |
| |
| \begin{figure}[htbp] |
| { \centering |
| \begin{verbatim} |
| 1 rcu_read_lock(); |
| 2 preempt_disable(); |
| 3 p = rcu_dereference(global_pointer); |
| 4 |
| 5 /* . . . */ |
| 6 |
| 7 preempt_enable(); |
| 8 rcu_read_unlock(); |
| \end{verbatim} |
| } |
| \caption{Diverse RCU Read-Side Nesting} |
| \label{fig:defer:Diverse RCU Read-Side Nesting} |
| \end{figure} |
| |
| Is the \co{rcu_dereference()} primitive in an RCU Classic |
| or an RCU Sched critical section? |
| What would you have to do to figure this out? |
| } \QuickQuizEnd |
| |
| \subsubsection{Where Can RCU's APIs Be Used?} |
| \label{sec:defer:Where Can RCU's APIs Be Used?} |
| |
| \begin{figure}[tb] |
| \begin{center} |
| \resizebox{3in}{!}{\includegraphics{defer/RCUenvAPI}} |
| \end{center} |
| \caption{RCU API Usage Constraints} |
| \label{fig:defer:RCU API Usage Constraints} |
| \end{figure} |
| |
| Figure~\ref{fig:defer:RCU API Usage Constraints} |
| shows which APIs may be used in which in-kernel environments. |
| The RCU read-side primitives may be used in any environment, including NMI, |
| the RCU mutation and asynchronous grace-period primitives may be used in any |
| environment other than NMI, and, finally, the RCU synchronous grace-period |
| primitives may be used only in process context. |
| The RCU list-traversal primitives include \co{list_for_each_entry_rcu()}, |
| \co{hlist_for_each_entry_rcu()}, etc. |
| Similarly, the RCU list-mutation primitives include |
| \co{list_add_rcu()}, \co{hlist_del_rcu()}, etc. |
| |
| Note that primitives from other families of RCU may be substituted, |
| for example, \co{srcu_read_lock()} may be used in any context |
| in which \co{rcu_read_lock()} may be used. |
| |
| \subsubsection{So, What \emph{is} RCU Really?} |
| \label{sec:defer:So, What is RCU Really?} |
| |
| At its core, RCU is nothing more nor less than an API that supports |
| publication and subscription for insertions, waiting for all RCU readers |
| to complete, and maintenance of multiple versions. |
| That said, it is possible to build higher-level constructs |
| on top of RCU, including the reader-writer-locking, reference-counting, |
| and existence-guarantee constructs listed in the companion article. |
| Furthermore, I have no doubt that the Linux community will continue to |
| find interesting new uses for RCU, |
| just as they do for any of a number of synchronization |
| primitives throughout the kernel. |
| |
| Of course, a more-complete view of RCU would also include |
| all of the things you can do with these APIs. |
| |
| However, for many people, a complete view of RCU must include sample |
| RCU implementations. |
| The next section therefore presents a series of ``toy'' RCU implementations |
| of increasing complexity and capability. |