% defer/refcnt.tex
\section{Reference Counting}
\label{sec:defer:Reference Counting}
Reference counting tracks the number of references
to a given object in order to prevent that object from being prematurely
freed.
Although this is a conceptually simple technique, many devils hide in
the details.
After all, if the object was not subject to premature disposal,
there would be no need for the reference counter in the first place.
But if the object can be disposed of, what prevents disposal during
the reference-acquisition process itself?
There are a number of possible answers to this question, including:
\begin{enumerate}
\item A lock residing outside of the object must be held while
manipulating the reference count.
\item The object is created with a non-zero reference count, and new
references may be acquired only when the current value of
the reference counter is non-zero.
If a thread does not have a reference to a given object,
it may obtain one with the help of another thread that
already has a reference.
\item An existence guarantee is provided for the object, preventing
it from being freed while some other
entity might be attempting to acquire a reference.
Existence guarantees are often provided by automatic
garbage collectors, and, as will be seen in
Section~\ref{sec:defer:Read-Copy Update (RCU)}, by RCU.
\item A type-safety guarantee is provided for the object.
An additional identity check must be performed once
the reference is acquired.
Type-safety guarantees can be provided by special-purpose
memory allocators, for example, by the
\co{SLAB_DESTROY_BY_RCU} feature within the Linux kernel,
as will be seen in Section~\ref{sec:defer:Read-Copy Update (RCU)}.
\end{enumerate}
Of course, any mechanism that provides existence guarantees
by definition also provides type-safety guarantees.
This section will therefore group the last two answers together under the
rubric of RCU, leaving us with three general categories of
reference-acquisition protection: Locking, reference counting,
and RCU.
\QuickQuiz{}
Why not implement reference-acquisition using
a simple compare-and-swap operation that only
acquires a reference if the reference counter is
non-zero?
\QuickQuizAnswer{
Although this can resolve the race between the release of
the last reference and acquisition of a new reference,
it does absolutely nothing to prevent the data structure
from being freed and reallocated, possibly as some completely
different type of structure.
It is quite likely that the ``simple compare-and-swap
operation'' would give undefined results if applied to the
differently typed structure.
In short, use of atomic operations such as compare-and-swap
absolutely requires either type-safety or existence guarantees.
} \QuickQuizEnd
\begin{table}
\centering
\begin{tabular}{l||c|c|c}
& \multicolumn{3}{c}{Release Synchronization} \\
\cline{2-4}
Acquisition & & Reference & \\
Synchronization & Locking & Counting & RCU \\
\hline
\hline
Locking & - & CAM & CA \\
\hline
Reference & A & AM & A \\
Counting & & & \\
\hline
RCU & CA & MCA & CA \\
\end{tabular}
\caption{Reference Counting and Synchronization Mechanisms}
\label{tab:defer:Reference Counting and Synchronization Mechanisms}
\end{table}
Given that the key reference-counting issue
is synchronization between acquisition
of a reference and freeing of the object, we have nine possible
combinations of mechanisms, as shown in
Table~\ref{tab:defer:Reference Counting and Synchronization Mechanisms}.
This table
divides reference-counting mechanisms into the following broad categories:
\begin{enumerate}
\item Simple counting with neither atomic operations, memory
barriers, nor alignment constraints \makebox{(``-'')}.
\item Atomic counting without memory barriers (``A'').
\item Atomic counting, with memory barriers required only on release
(``AM'').
\item Atomic counting with a check combined with the atomic acquisition
operation, and with memory barriers required only on release
(``CAM'').
\item Atomic counting with a check combined with the atomic acquisition
operation (``CA'').
\item Atomic counting with a check combined with the atomic acquisition
operation, and with memory barriers also required on acquisition
(``MCA'').
\end{enumerate}
However, because all Linux-kernel atomic operations that return a
value are defined to contain memory barriers, all release operations
contain memory barriers, and all checked acquisition operations also
contain memory barriers.
Therefore, cases ``CA'' and ``MCA'' are equivalent to ``CAM'', so that
there are sections below for only the first four cases:
\makebox{``-''}, ``A'', ``AM'', and ``CAM''.
The Linux primitives that support reference counting are presented in
Section~\ref{sec:defer:Linux Primitives Supporting Reference Counting}.
Later sections cite optimizations that can improve performance
if reference acquisition and release is very frequent, and the
reference count need be checked for zero only very rarely.
\subsection{Implementation of Reference-Counting Categories}
\label{sec:defer:Implementation of Reference-Counting Categories}
Simple counting protected by locking (\makebox{``-''}) is described in
Section~\ref{sec:defer:Simple Counting},
atomic counting with no memory barriers (``A'') is described in
Section~\ref{sec:defer:Atomic Counting},
atomic counting with release memory barrier (``AM'') is described in
Section~\ref{sec:defer:Atomic Counting With Release Memory Barrier},
and
atomic counting with check and release memory barrier (``CAM'') is described in
Section~\ref{sec:defer:Atomic Counting With Check and Release Memory Barrier}.
\subsubsection{Simple Counting}
\label{sec:defer:Simple Counting}
Simple counting, with neither atomic operations nor memory barriers,
can be used when the reference-counter acquisition and release are
both protected by the same lock.
In this case, it should be clear that the reference count itself
may be manipulated non-atomically, because the lock provides any
necessary exclusion, memory barriers, atomic instructions, and disabling
of compiler optimizations.
This is the method of choice when the lock is required to protect
other operations in addition to the reference count, but where
a reference to the object must be held after the lock is released.
Figure~\ref{fig:defer:Simple Reference-Count API} shows a simple
API that might be used to implement simple non-atomic reference
counting -- although simple reference counting is almost always
open-coded instead.
\begin{figure}[htbp]
{ \scriptsize
\begin{verbatim}
1 struct sref {
2 int refcount;
3 };
4
5 void sref_init(struct sref *sref)
6 {
7 sref->refcount = 1;
8 }
9
10 void sref_get(struct sref *sref)
11 {
12 sref->refcount++;
13 }
14
15 int sref_put(struct sref *sref,
16 void (*release)(struct sref *sref))
17 {
18 WARN_ON(release == NULL);
19 WARN_ON(release == (void (*)(struct sref *))kfree);
20
21 if (--sref->refcount == 0) {
22 release(sref);
23 return 1;
24 }
25 return 0;
26 }
\end{verbatim}
}
\caption{Simple Reference-Count API}
\label{fig:defer:Simple Reference-Count API}
\end{figure}
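For concreteness, the following sketch shows how such an API might be
used under a single global lock.
The \co{struct mydata}, \co{mylock}, and \co{mydata_release()} names
are hypothetical, introduced only for illustration; because the lock
is external to the object, it remains safe to hold it across the
final \co{sref_put()}.

{ \scriptsize
\begin{verbatim}
spinlock_t mylock;   /* guards all sref manipulations */

struct mydata {
  struct sref refcount;
  /* other fields */
};

static void mydata_release(struct sref *sref)
{
  kfree(container_of(sref, struct mydata, refcount));
}

struct mydata *mydata_get(struct mydata *p)
{
  spin_lock(&mylock);
  sref_get(&p->refcount); /* lock provides exclusion */
  spin_unlock(&mylock);
  return p;  /* reference may outlive the lock */
}

void mydata_put(struct mydata *p)
{
  spin_lock(&mylock);
  sref_put(&p->refcount, mydata_release);
  spin_unlock(&mylock);
}
\end{verbatim}
}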
\subsubsection{Atomic Counting}
\label{sec:defer:Atomic Counting}
Simple atomic counting may be used in cases where any CPU acquiring
a reference must already hold a reference.
This style is used when a single CPU creates an object for its
own private use, but must allow other CPUs, tasks, timer handlers,
or I/O completion handlers that it later spawns to also access this object.
Any CPU that hands the object off must first acquire a new reference
on behalf of the recipient.
In the Linux kernel, the \co{kref} primitives are used to implement
this style of reference counting, as shown in
Figure~\ref{fig:defer:Linux Kernel kref API}.
Atomic counting is required
because locking is not used to protect all reference-count operations,
which means that it is possible for two different CPUs to concurrently
manipulate the reference count.
If normal increment and decrement were used, a pair of CPUs might both
fetch the reference count concurrently, perhaps both obtaining
the value ``3''.
If both of them increment their value, they will both obtain ``4'',
and both will store this value back into the counter.
Since the new value of the counter should instead be ``5'', one
of the two increments has been lost.
Therefore, atomic operations must be used both for counter increments
and for counter decrements.
If releases are guarded by locking or RCU,
memory barriers are \emph{not} required, but for different reasons.
In the case of locking, the locks provide any needed memory barriers
(and disabling of compiler optimizations), and the locks also
prevent a pair of releases from running concurrently.
In the case of RCU, cleanup must be deferred until all currently
executing RCU read-side critical sections have completed, and
any needed memory barriers or disabling of compiler optimizations
will be provided by the RCU infrastructure.
Therefore, if two CPUs release the final two references concurrently,
the actual cleanup will be deferred until both CPUs exit their
RCU read-side critical sections.
\QuickQuiz{}
Why isn't it necessary to guard against cases where one CPU
acquires a reference just after another CPU releases the last
reference?
\QuickQuizAnswer{
Because a CPU must already hold a reference in order
to legally acquire another reference.
Therefore, if one CPU releases the last reference,
there cannot possibly be any CPU that is permitted
to acquire a new reference.
This same fact allows the non-atomic check in line~12
of Figure~\ref{fig:defer:Linux Kernel kref API}.
} \QuickQuizEnd
\begin{figure}[htbp]
{ \scriptsize
\begin{verbatim}
1 struct kref {
2 atomic_t refcount;
3 };
4
5 void kref_init(struct kref *kref)
6 {
7 atomic_set(&kref->refcount, 1);
8 }
9
10 void kref_get(struct kref *kref)
11 {
12 WARN_ON(!atomic_read(&kref->refcount));
13 atomic_inc(&kref->refcount);
14 }
15
16 static inline int
17 kref_sub(struct kref *kref, unsigned int count,
18 void (*release)(struct kref *kref))
19 {
20 WARN_ON(release == NULL);
21
22 if (atomic_sub_and_test((int) count,
23 &kref->refcount)) {
24 release(kref);
25 return 1;
26 }
27 return 0;
28 }
\end{verbatim}
}
\caption{Linux Kernel kref API}
\label{fig:defer:Linux Kernel kref API}
\end{figure}
The \co{kref} structure itself, consisting of a single atomic
data item, is shown in lines~1-3 of
Figure~\ref{fig:defer:Linux Kernel kref API}.
The \co{kref_init()} function on lines~5-8 initializes the counter
to the value ``1''.
Note that the \co{atomic_set()} primitive is a simple
assignment; the name stems from the \co{atomic_t} data type
rather than from the operation.
The \co{kref_init()} function must be invoked during object creation,
before the object has been made available to any other CPU.
The \co{kref_get()} function on lines~10-14 unconditionally atomically
increments the counter.
The \co{atomic_inc()} primitive does not necessarily explicitly
disable compiler
optimizations on all platforms, but the fact that the \co{kref}
primitives are in a separate module and that the Linux kernel build
process does no cross-module optimizations has the same effect.
The \co{kref_sub()} function on lines~16-28 atomically decrements the
counter, and if the result is zero, line~24 invokes the specified
\co{release()} function and line~25 returns one, informing the caller
that \co{release()} was invoked.
Otherwise, \co{kref_sub()} returns zero, informing the caller that
\co{release()} was not called.
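As a rough illustration of the hand-off rule, a thread that already
holds a reference might pass the object to another entity as follows.
The \co{struct myobj}, \co{myobj_release()}, and
\co{queue_for_recipient()} names are hypothetical, not taken from the
Linux kernel.

{ \scriptsize
\begin{verbatim}
struct myobj {
  struct kref ref;
  /* other fields */
};

static void myobj_release(struct kref *kref)
{
  kfree(container_of(kref, struct myobj, ref));
}

/* Caller must already hold a reference to p. */
void myobj_hand_off(struct myobj *p)
{
  kref_get(&p->ref);       /* reference on behalf of recipient */
  queue_for_recipient(p);  /* hypothetical hand-off mechanism */
}

/* Invoked by the recipient when it is finished with p. */
void myobj_recipient_done(struct myobj *p)
{
  kref_sub(&p->ref, 1, myobj_release);
}
\end{verbatim}
}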
\QuickQuiz{}
Suppose that just after the \co{atomic_sub_and_test()}
on line~22 of
Figure~\ref{fig:defer:Linux Kernel kref API} is invoked,
that some other CPU invokes \co{kref_get()}.
Doesn't this result in that other CPU now having an illegal
reference to a released object?
\QuickQuizAnswer{
This cannot happen if these functions are used correctly.
It is illegal to invoke \co{kref_get()} unless you already
hold a reference, in which case the \co{kref_sub()} could
not possibly have decremented the counter to zero.
} \QuickQuizEnd
\QuickQuiz{}
Suppose that \co{kref_sub()} returns zero, indicating that
the \co{release()} function was not invoked.
Under what conditions can the caller rely on the continued
existence of the enclosing object?
\QuickQuizAnswer{
The caller cannot rely on the continued existence of the
object unless it knows that at least one reference will
continue to exist.
Normally, the caller will have no way of knowing this, and
must therefore carefully avoid referencing the object after
the call to \co{kref_sub()}.
} \QuickQuizEnd
\subsubsection{Atomic Counting With Release Memory Barrier}
\label{sec:defer:Atomic Counting With Release Memory Barrier}
This style of reference counting is used in the Linux kernel's networking
layer to track the destination caches that are used in packet routing.
The actual implementation is quite a bit more involved; this section
focuses on the aspects of \co{struct dst_entry} reference-count
handling that match this use case,
shown in Figure~\ref{fig:defer:Linux Kernel dst-clone API}.
\begin{figure}[htbp]
{ \scriptsize
\begin{verbatim}
1 static inline
2 struct dst_entry * dst_clone(struct dst_entry * dst)
3 {
4 if (dst)
5 atomic_inc(&dst->__refcnt);
6 return dst;
7 }
8
9 static inline
10 void dst_release(struct dst_entry * dst)
11 {
12 if (dst) {
13 WARN_ON(atomic_read(&dst->__refcnt) < 1);
14 smp_mb__before_atomic_dec();
15 atomic_dec(&dst->__refcnt);
16 }
17 }
\end{verbatim}
}
\caption{Linux Kernel dst\_clone API}
\label{fig:defer:Linux Kernel dst-clone API}
\end{figure}
The \co{dst_clone()} primitive may be used if the caller
already has a reference to the specified \co{dst_entry},
in which case it obtains another reference that may be handed off
to some other entity within the kernel.
Because a reference is already held by the caller, \co{dst_clone()}
need not execute any memory barriers.
The act of handing the \co{dst_entry} to some other entity might
or might not require a memory barrier, but if such a memory barrier
is required, it will be embedded in the mechanism used to hand the
\co{dst_entry} off.
The \co{dst_release()} primitive may be invoked from any environment,
and the caller might well reference elements of the \co{dst_entry}
structure immediately prior to the call to \co{dst_release()}.
The \co{dst_release()} primitive therefore contains a memory
barrier on line~14 preventing both the compiler and the CPU
from misordering accesses.
Please note that the programmer making use of \co{dst_clone()} and
\co{dst_release()} need not be aware of the memory barriers, only
of the rules for using these two primitives.
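The following sketch illustrates the intended usage; the
\co{struct myroute}, \co{install_route()}, and \co{teardown_route()}
names are hypothetical stand-ins for the actual networking code.

{ \scriptsize
\begin{verbatim}
struct myroute {
  struct dst_entry *dst;
  /* other fields */
};

/* Caller already holds a reference to dst. */
void install_route(struct myroute *rt, struct dst_entry *dst)
{
  rt->dst = dst_clone(dst);  /* reference for the route */
}

/* Fields of rt->dst may be used right up to this call; the
 * barrier in dst_release() keeps those accesses ordered
 * before the decrement. */
void teardown_route(struct myroute *rt)
{
  dst_release(rt->dst);
  rt->dst = NULL;
}
\end{verbatim}
}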
\subsubsection{Atomic Counting With Check and Release Memory Barrier}
\label{sec:defer:Atomic Counting With Check and Release Memory Barrier}
Consider a situation where the caller must be able to acquire a new
reference to an object to which it does not already hold a reference.
The fact that initial reference-count acquisition can now run concurrently
with reference-count release adds further complications.
Suppose that a reference-count release finds that the new
value of the reference count is zero, signalling that it is
now safe to clean up the reference-counted object.
We clearly cannot allow a reference-count acquisition to
start after such clean-up has commenced, so the acquisition
must include a check for a zero reference count.
This check must be part of the atomic increment operation,
as shown below.
\QuickQuiz{}
Why can't the check for a zero reference count be
made in a simple ``if'' statement with an atomic
increment in its ``then'' clause?
\QuickQuizAnswer{
Suppose that the ``if'' condition completed, finding
the reference counter value equal to one.
Suppose that a release operation executes, decrementing
the reference counter to zero and therefore starting
cleanup operations.
But now the ``then'' clause can increment the counter
back to a value of one, allowing the object to be
used after it has been cleaned up.
} \QuickQuizEnd
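One common way of combining the check with the increment is a
compare-and-swap loop along the following lines.
This is only an illustrative sketch, not the Linux kernel's actual
implementation of \co{atomic_inc_not_zero()}.

{ \scriptsize
\begin{verbatim}
static int my_inc_not_zero(atomic_t *cnt)
{
  int old;

  for (;;) {
    old = atomic_read(cnt);
    if (old == 0)
      return 0;    /* object is being cleaned up: fail */
    if (atomic_cmpxchg(cnt, old, old + 1) == old)
      return 1;    /* check and increment were atomic */
    /* Lost a race with some other update: retry. */
  }
}
\end{verbatim}
}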
The Linux kernel's \co{fget()} and \co{fput()} primitives
use this style of reference counting.
Simplified versions of these functions are shown in
Figure~\ref{fig:defer:Linux Kernel fget/fput API}.
\begin{figure}[htbp]
{ \scriptsize
\begin{verbatim}
1 struct file *fget(unsigned int fd)
2 {
3 struct file *file;
4 struct files_struct *files = current->files;
5
6 rcu_read_lock();
7 file = fcheck_files(files, fd);
8 if (file) {
9 if (!atomic_inc_not_zero(&file->f_count)) {
10 rcu_read_unlock();
11 return NULL;
12 }
13 }
14 rcu_read_unlock();
15 return file;
16 }
17
18 struct file *
19 fcheck_files(struct files_struct *files, unsigned int fd)
20 {
21 struct file * file = NULL;
22 struct fdtable *fdt = rcu_dereference((files)->fdt);
23
24 if (fd < fdt->max_fds)
25 file = rcu_dereference(fdt->fd[fd]);
26 return file;
27 }
28
29 void fput(struct file *file)
30 {
31 if (atomic_dec_and_test(&file->f_count))
32 call_rcu(&file->f_u.fu_rcuhead, file_free_rcu);
33 }
34
35 static void file_free_rcu(struct rcu_head *head)
36 {
37 struct file *f;
38
39 f = container_of(head, struct file, f_u.fu_rcuhead);
40 kmem_cache_free(filp_cachep, f);
41 }
\end{verbatim}
}
\caption{Linux Kernel fget/fput API}
\label{fig:defer:Linux Kernel fget/fput API}
\end{figure}
Line~4 of \co{fget()} fetches the pointer to the current
process's file-descriptor table, which might well be shared
with other processes.
Line~6 invokes \co{rcu_read_lock()}, which
enters an RCU read-side critical section.
The callback function from any subsequent \co{call_rcu()} primitive
will be deferred until a matching \co{rcu_read_unlock()} is reached
(line~10 or 14 in this example).
Line~7 looks up the file structure corresponding to the file
descriptor specified by the \co{fd} argument, as will be
described later.
If there is an open file corresponding to the specified file descriptor,
then line~9 attempts to atomically acquire a reference count.
If it fails to do so, lines~10-11 exit the RCU read-side critical
section and report failure.
Otherwise, if the attempt is successful, lines~14-15 exit the read-side
critical section and return a pointer to the file structure.
The \co{fcheck_files()} primitive is a helper function for
\co{fget()}.
It uses the \co{rcu_dereference()} primitive to safely fetch an
RCU-protected pointer for later dereferencing (this emits a
memory barrier on CPUs such as DEC Alpha in which data dependencies
do not enforce memory ordering).
Line~22 uses \co{rcu_dereference()} to fetch a pointer to this
task's current file-descriptor table,
and line~24 checks to see if the specified file descriptor is in range.
If so, line~25 fetches the pointer to the file structure, again using
the \co{rcu_dereference()} primitive.
Line~26 then returns a pointer to the file structure or \co{NULL}
in case of failure.
The \co{fput()} primitive releases a reference to a file structure.
Line~31 atomically decrements the reference count, and, if the result
was zero, line~32 invokes the \co{call_rcu()} primitives in order to
free up the file structure (via the \co{file_free_rcu()} function
specified in \co{call_rcu()}'s second argument),
but only after all currently-executing
RCU read-side critical sections complete.
The time period required for all currently-executing RCU read-side
critical sections to complete is termed a ``grace period''.
Note that the \co{atomic_dec_and_test()} primitive contains
a memory barrier.
This memory barrier is not necessary in this example, since the structure
cannot be destroyed until the RCU read-side critical section completes,
but in Linux, all atomic operations that return a result must
by definition contain memory barriers.
Once the grace period completes, the \co{file_free_rcu()} function
obtains a pointer to the file structure on line~39, and frees it
on line~40.
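For reference, a minimal caller of these primitives might look as
follows; the \co{my_fd_query()} function and its use of the
\co{f_pos} field are purely illustrative.

{ \scriptsize
\begin{verbatim}
long my_fd_query(unsigned int fd)
{
  struct file *file;
  long ret;

  file = fget(fd);    /* fails if fd is invalid or count is zero */
  if (!file)
    return -EBADF;
  ret = file->f_pos;  /* safe: we hold a reference */
  fput(file);         /* may trigger RCU-deferred free */
  return ret;
}
\end{verbatim}
}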
This approach is also used by Linux's virtual-memory system;
see \co{get_page_unless_zero()} and \co{put_page_testzero()} for
page structures as well as \co{try_to_unuse()} and \co{mmput()}
for memory-map structures.
\subsection{Hazard Pointers}
\label{sec:defer:Hazard Pointers}
All of the reference-counting mechanisms discussed in the previous
section require some other mechanism to prevent the data element from
being deleted while the reference count is being acquired.
This other mechanism might be a pre-existing reference held on that
data element, locking, RCU, or atomic operations, but all of them
either degrade performance and scalability or restrict use cases.
One way of avoiding these problems is to implement the reference counters
inside out, that is, rather than incrementing an integer stored in the
data element, instead store a pointer to that data element in
per-CPU (or per-thread) lists.
Each element of these lists is called a
\emph{hazard pointer}~\cite{MagedMichael04a}.\footnote{
Also independently invented by others~\cite{HerlihyLM02}.}
The value of a given data element's ``virtual reference counter'' can
then be obtained by counting the number of hazard pointers referencing
that element.
Therefore, if that element has been rendered inaccessible to readers,
and there are no longer any hazard pointers referencing it, that element
may safely be freed.
\begin{figure}[tbp]
{ \scriptsize
\begin{verbatim}
1 int hp_store(void **p, void **hp)
2 {
3 void *tmp;
4
5 tmp = ACCESS_ONCE(*p);
6 ACCESS_ONCE(*hp) = tmp;
7 smp_mb();
8 if (tmp != ACCESS_ONCE(*p) ||
9 tmp == HAZPTR_POISON) {
10 ACCESS_ONCE(*hp) = NULL;
11 return 0;
12 }
13 return 1;
14 }
15
16 void hp_erase(void **hp)
17 {
18 smp_mb();
19 ACCESS_ONCE(*hp) = NULL;
20 hp_free(hp);
21 }
\end{verbatim}
}
\caption{Hazard-Pointer Storage and Erasure}
\label{fig:defer:Hazard-Pointer Storage and Erasure}
\end{figure}
Of course, this means that hazard-pointer acquisition must be carried
out quite carefully in order to avoid destructive races with concurrent
deletion.
One implementation is shown in
Figure~\ref{fig:defer:Hazard-Pointer Storage and Erasure},
which shows \co{hp_store()} on lines~1-14 and \co{hp_erase()} on
lines~16-21.
The \co{smp_mb()} primitive will be described in detail in
Section~\ref{sec:advsync:Memory Barriers}, but may be ignored for
the purposes of this brief overview.
The \co{hp_store()} function records a hazard pointer at \co{hp} for the data
element whose pointer is referenced by \co{p}, while checking for
concurrent modifications.
If a concurrent modification occurred, \co{hp_store()} refuses to record
a hazard pointer, and returns zero to indicate that the caller must
restart its traversal from the beginning.
Otherwise, \co{hp_store()} returns one to indicate that it successfully
recorded a hazard pointer for the data element.
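For example, a reader searching a hazard-pointer-protected linked list
might use \co{hp_store()} roughly as shown below, restarting from the
head whenever a concurrent modification is detected.
The \co{struct myelem} list and the management of the two
hazard-pointer slots are hypothetical; two slots are alternated so
that the current element remains protected while a pointer to its
successor is being recorded.

{ \scriptsize
\begin{verbatim}
struct myelem {
  struct myelem *next;
  int data;
};

/* Return the first element with ->data == val, protected by
 * one of the two hazard-pointer slots in hp[], or NULL.
 * The caller eventually invokes hp_erase() on both slots. */
struct myelem *search(struct myelem **headp, int val,
                      void **hp[2])
{
  struct myelem **p;
  struct myelem *e;
  int i;

retry:
  p = headp;
  i = 0;
  for (;;) {
    if (!hp_store((void **)p, hp[i & 1]))
      goto retry;          /* concurrent change: restart */
    e = (struct myelem *)*hp[i & 1];
    if (e == NULL)
      return NULL;         /* end of list */
    if (e->data == val)
      return e;
    p = &e->next;          /* e stays protected by hp[i & 1] */
    i++;
  }
}
\end{verbatim}
}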
\QuickQuiz{}
Why does \co{hp_store()} in
Figure~\ref{fig:defer:Hazard-Pointer Storage and Erasure}
take a double indirection to the data element?
Why not \co{void *} instead of \co{void **}?
\QuickQuizAnswer{
Because \co{hp_store()} must check for concurrent modifications.
To do that job, it needs a pointer to a pointer to the element,
so that it can check for a modification to the pointer to the
element.
} \QuickQuizEnd
\QuickQuiz{}
Why does \co{hp_store()}'s caller need to restart its
traversal from the beginning in case of failure?
Isn't that inefficient for large data structures?
\QuickQuizAnswer{
It might be inefficient in some sense, but the fact is that
such restarting is absolutely required for correctness.
To see this, consider a hazard-pointer-protected linked list
containing elements~A, B, and~C that is subjected to the
following sequence of events:
\begin{enumerate}
\item Thread~0 stores a hazard pointer to element~B
(having presumably traversed to element~B from element~A).
\item Thread~1 removes element~B from the list, which sets
the pointer from element~B to element~C to a special
\co{HAZPTR_POISON} value in order to mark the deletion.
Because Thread~0 has a hazard pointer to element~B,
it cannot yet be freed.
\item Thread~1 removes element~C from the list.
Because there are no hazard pointers referencing element~C,
it is immediately freed.
\item Thread~0 attempts to acquire a hazard pointer to
now-removed element~B's successor, but sees the
\co{HAZPTR_POISON} value, and thus returns zero,
forcing the caller to restart its traversal from the
beginning of the list.
\end{enumerate}
Which is a very good thing, because otherwise Thread~0 would
have attempted to access the now-freed element~C,
which might have resulted in arbitrarily horrible
memory corruption, especially if the memory for
element~C had since been re-allocated for some other
purpose.
} \QuickQuizEnd
\QuickQuiz{}
Given that papers on hazard pointers use the bottom bits
of each pointer to mark deleted elements, what is up with
\co{HAZPTR_POISON}?
\QuickQuizAnswer{
The published implementations of hazard pointers used
non-blocking synchronization techniques for insertion
and deletion.
These techniques require that readers traversing the
data structure ``help'' updaters complete their updates,
which in turn means that readers need to look at the successor
of a deleted element.
In contrast, we will be using locking to synchronize updates,
which does away with the need for readers to help updaters
complete their updates, which in turn allows us to leave
pointers' bottom bits alone.
This approach allows read-side code to be simpler and faster.
} \QuickQuizEnd
Because algorithms using hazard pointers might be restarted at any
step of their traversal through the data structure, such algorithms
must typically take care to avoid making any changes to the data
structure until after they have acquired all relevant hazard pointers.
\QuickQuiz{}
But don't these restrictions on hazard pointers also apply
to other forms of reference counting?
\QuickQuizAnswer{
These restrictions apply only to reference-counting mechanisms whose
reference acquisition can fail.
} \QuickQuizEnd
In exchange for these restrictions, hazard pointers offer excellent
performance and scalability for readers.
Performance comparisons with other mechanisms may be found in
Chapter~\ref{chp:Data Structures}
and in other publications~\cite{ThomasEHart2007a,McKenney:2013:SDS:2483852.2483867,MagedMichael04a}.
\subsection{Linux Primitives Supporting Reference Counting}
\label{sec:defer:Linux Primitives Supporting Reference Counting}
The Linux-kernel primitives used in the above examples are
summarized in the following list.
\IfInBook{}{The RCU primitives may be unfamiliar to some readers,
so a brief overview with citations is included in
Section~\ref{sec:defer:Background on RCU}.}
\begin{itemize}
\item \co{atomic_t}
Type definition for 32-bit quantity to be manipulated atomically.
\item \co{void atomic_dec(atomic_t *var);}
Atomically decrements the referenced variable without necessarily
issuing a memory barrier or disabling compiler optimizations.
\item \co{int atomic_dec_and_test(atomic_t *var);}
Atomically decrements the referenced variable, returning
\co{true} (non-zero) if the result is zero.
Issues a memory barrier and disables compiler optimizations that
might otherwise move memory references across this primitive.
\item \co{void atomic_inc(atomic_t *var);}
Atomically increments the referenced variable without necessarily
issuing a memory barrier or disabling compiler optimizations.
\item \co{int atomic_inc_not_zero(atomic_t *var);}
Atomically increments the referenced variable, but only if its
value is non-zero, returning \co{true} (non-zero) if the
increment occurred.
Issues a memory barrier and disables compiler optimizations that
might otherwise move memory references across this primitive.
\item \co{int atomic_read(atomic_t *var);}
Returns the integer value of the referenced variable.
This is not an atomic operation, and it does not issue any
memory-barrier instructions.
Instead of thinking of it as ``an atomic read,'' think of it as
``a normal read from an atomic variable.''
\item \co{void atomic_set(atomic_t *var, int val);}
Sets the value of the referenced atomic variable to ``val''.
This is not an atomic operation, and it neither issues memory
barriers nor disables compiler optimizations.
Instead of thinking of it as ``an atomic set,'' think of it as
``a normal set of an atomic variable.''
\item \co{void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *head));}
Invokes \co{func(head)} some time after all currently executing RCU
read-side critical sections complete; however, the \co{call_rcu()}
primitive returns immediately.
Note that \co{head} is normally a field within an RCU-protected
data structure, and that \co{func} is normally a function that
frees up this data structure.
The time interval between the invocation of \co{call_rcu()} and
the invocation of \co{func} is termed a ``grace period''.
Any interval of time containing a grace period is itself a
grace period.
\item \co{type *container_of(p, type, f);}
Given a pointer \co{p} to a field \co{f} within a structure
of the specified type, return a pointer to the structure.
\item \co{void rcu_read_lock(void);}
Marks the beginning of an RCU read-side critical section.
\item \co{void rcu_read_unlock(void);}
Marks the end of an RCU read-side critical section.
RCU read-side critical sections may be nested.
\item \co{void smp_mb__before_atomic_dec(void);}
Issues a memory barrier and disables code-motion compiler
optimizations only if the platform's \co{atomic_dec()}
primitive does not already do so.
\item \co{struct rcu_head}
A data structure used by the RCU infrastructure to track
objects awaiting a grace period.
This is normally included as a field within an RCU-protected
data structure.
\end{itemize}
\QuickQuiz{}
An \co{atomic_read()} and an \co{atomic_set()} that are
non-atomic?
Is this some kind of bad joke???
\QuickQuizAnswer{
It might well seem that way, but in situations where no other
CPU has access to the atomic variable in question, the overhead
of an actual atomic instruction would be wasteful.
Two examples where no other CPU has access are
during initialization and cleanup.
} \QuickQuizEnd
\subsection{Counter Optimizations}
\label{sec:defer:Counter Optimizations}
In some cases where increments and decrements are common, but checks
for zero are rare, it makes sense to maintain per-CPU or per-task
counters, as was discussed in Chapter~\ref{chp:Counting}.
\IfInBook{See Appendix~\ref{app:rcuimpl:Sleepable RCU Implementation}
for an example of this technique applied to RCU.}{
See the paper on sleepable read-copy update
(SRCU) for an example of this technique applied to
RCU~\cite{PaulEMcKenney2006c}.}
This approach eliminates the need for atomic instructions or memory
barriers on the increment and decrement primitives, but still requires
that code-motion compiler optimizations be disabled.
In addition, the primitives such as \co{synchronize_srcu()}
that check for the aggregate reference
count reaching zero can be quite slow.
This underscores the fact that these techniques are designed
for situations where the references are frequently acquired and
released, but where it is rarely necessary to check for a zero
reference count.
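A minimal sketch of this approach appears below, with one counter per
thread and a slow aggregate check.
The \co{NR_THREADS} bound, the use of a thread-ID index, and the
omission of the synchronization needed to make the zero-check exact
are all simplifying assumptions.

{ \scriptsize
\begin{verbatim}
#define NR_THREADS 64

/* Each counter is updated only by its owning thread, so no
 * atomic instructions or memory barriers are needed on the
 * fast paths.  ACCESS_ONCE() suppresses compiler code motion. */
long counter[NR_THREADS];

static inline void ref_acquire(int tid)
{
  ACCESS_ONCE(counter[tid])++;
}

static inline void ref_release(int tid)
{
  ACCESS_ONCE(counter[tid])--;
}

/* Slow path: sum the per-thread counters.  As with
 * synchronize_srcu(), additional synchronization (not shown)
 * is needed to keep this sum from racing with concurrent
 * acquisitions. */
long ref_total(void)
{
  long sum = 0;
  int i;

  for (i = 0; i < NR_THREADS; i++)
    sum += ACCESS_ONCE(counter[i]);
  return sum;
}
\end{verbatim}
}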
% @@@ Difficulty of managing reference counts: leaks, premature freeing.
However, it is usually the case that use of reference counts requires
writing (often atomically) to a data structure that is otherwise
read only.
In this case, reference counts impose expensive cache misses
on readers.
\QuickQuiz{}
But hazard pointers don't write to the data structure!
\QuickQuizAnswer{
Indeed, they do not.
However, they do write to the hazard pointers themselves,
and, more important, require that possible failures be
handled for all \co{hp_store()} calls, each of which
might fail.
Therefore, although hazard pointers are extremely useful,
it is still worth looking for improved mechanisms.
} \QuickQuizEnd
It is therefore worthwhile to look into synchronization mechanisms
that do not require readers to do writes at all.
One such synchronization mechanism, sequence locks, is covered in
the next section.