% together/seqlock.tex
% mainfile: ../perfbook.tex
% SPDX-License-Identifier: CC-BY-SA-3.0
\section{Sequence-Locking Specials}
\label{sec:together:Sequence-Locking Specials}
%
\epigraph{The girl who can't dance says the band can't play.}
{Yiddish proverb}
This section looks at some special uses of sequence locks.
\subsection{Dueling Sequence Locks}
\label{sec:together:Dueling Sequence Locks}
The classic sequence-locking use case enables a reader to see a consistent
snapshot of a small collection of variables, for example, calibration
constants for timekeeping.
This works quite well in practice because calibration constants are
rarely updated and, when updated, are updated quickly.
Readers therefore almost never need to retry.
However, if the updater is delayed during the update, readers will
also be delayed.
Such delays might be due to interrupts, NMIs, or even
virtual-CPU preemption.
One way to prevent updater delays from causing reader delays is to
maintain two sets of calibration constants.
Each set is updated in turn, but frequently enough that readers can
make good use of either set.
Each set has its own sequence lock (\co{seqlock_t} structure).
The updater alternates between the two sets, so that a delayed updater
delays readers of at most one of the sets.
Each reader attempts to access the first set, but upon retry attempts
to access the second set.
If the second set also forces a retry, the reader repeats starting
again from the first set.
If the updater is stuck, only one of the two sets will force
readers to retry, and therefore readers will succeed as soon as
they attempt to access the other set.
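
This scheme might be sketched using the sequence-lock implementation
from \cref{sec:defer:Sequence Locks}, whose \co{read_seqbegin()} does
not spin on a write in progress, so that a held write lock simply
forces a read-side retry.
The following minimal sketch assumes a hypothetical pair of structures
holding timekeeping calibration constants:

\begin{verbatim}
struct calib {
        seqlock_t lock;  /* seqlock_init() at boot. */
        unsigned long mult;
        unsigned long shift;
};

struct calib calib[2];

/* Updater: alternate between the two sets, so that
 * a delayed update forces retries on at most one. */
void calib_update(int i, unsigned long mult,
                  unsigned long shift)
{
        write_seqlock(&calib[i].lock);
        calib[i].mult = mult;
        calib[i].shift = shift;
        write_sequnlock(&calib[i].lock);
}

/* Reader: start with set 0, switch sets on retry. */
void calib_read(unsigned long *mult,
                unsigned long *shift)
{
        unsigned long seq;
        int i = 0;

        for (;;) {
                seq = read_seqbegin(&calib[i].lock);
                *mult = calib[i].mult;
                *shift = calib[i].shift;
                if (!read_seqretry(&calib[i].lock, seq))
                        return;
                i = !i; /* Stuck updater holds at most
                         * one of the two locks. */
        }
}
\end{verbatim}
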
\QuickQuiz{
Why don't all sequence-locking use cases replicate the
data in this fashion?
}\QuickQuizAnswer{
Such replication is impractical if the data is too
large, as it might be in the Schr\"odinger's-zoo example
described in
\cref{sec:together:Correlated Data Elements}.
Such replication is unnecessary if delays are prevented,
for example, when updaters disable interrupts when running
on bare-metal hardware (that is, without the use of
a vCPU-preemption-prone hypervisor).
Alternatively, if readers can tolerate the occasional delay,
then replication is again unnecessary.
Consider the example of reader-writer locking, where
writers always delay readers and vice versa.
However, if the data to be replicated is reasonably
small, if delays are possible, and if readers cannot
tolerate these delays, replicating the data is an
excellent approach.
}\QuickQuizEnd
\subsection{Correlated Data Elements}
\label{sec:together:Correlated Data Elements}
Suppose we have a hash table where we need correlated views of two or
more of the elements.
These elements are updated together, and we do not want to see an old
version of the first element along with new versions of the other
elements.
For example, Schr\"odinger decided to add his extended family to his
in-memory database along with all his animals.
Although Schr\"odinger understands that marriages and divorces do not
happen instantaneously, he is also a traditionalist.
As such, he absolutely does not want his database ever to show that the
bride is now married, but the groom is not, and vice versa.
Plus, if you think Schr\"odinger is a traditionalist, you just
try conversing with some of his family members!
In other words, Schr\"odinger wants to be able to carry out a
wedlock-consistent traversal of his database.
One approach is to use sequence locks
(see \cref{sec:defer:Sequence Locks}),
so that wedlock-related updates are carried out under the
protection of \co{write_seqlock()}, while reads requiring
wedlock consistency are carried out within
a \co{read_seqbegin()} / \co{read_seqretry()} loop.
Note that sequence locks are not a replacement for RCU protection:
Sequence locks protect against concurrent modifications, but RCU
is still needed to protect against concurrent deletions.
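
For example, assuming a hypothetical \co{struct animal} with a
\co{->married} field and again using the sequence-lock implementation
from \cref{sec:defer:Sequence Locks}, the update and read sides might
be sketched as follows:

\begin{verbatim}
seqlock_t wedlock;  /* seqlock_init() at boot. */

/* Updater: change both statuses under the lock. */
void marry(struct animal *bride, struct animal *groom)
{
        write_seqlock(&wedlock);
        WRITE_ONCE(bride->married, 1);
        WRITE_ONCE(groom->married, 1);
        write_sequnlock(&wedlock);
}

/* Reader: retry until a wedlock-consistent snapshot
 * is obtained.  RCU protects the elements themselves
 * against concurrent deletion. */
void read_couple(struct animal *a, struct animal *b,
                 int *am, int *bm)
{
        unsigned long seq;

        rcu_read_lock();
        do {
                seq = read_seqbegin(&wedlock);
                *am = READ_ONCE(a->married);
                *bm = READ_ONCE(b->married);
        } while (read_seqretry(&wedlock, seq));
        rcu_read_unlock();
}
\end{verbatim}
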
This approach works quite well when the number of correlated elements is
small, the time to read these elements is short, and the update rate is
low.
Otherwise, updates might happen so quickly that readers might never complete.
Although Schr\"odinger does not expect that even his least-sane relatives
will marry and divorce quickly enough for this to be a problem,
he does realize that this problem could well arise in other situations.
One way to avoid this reader-starvation problem is to have the readers
use the update-side primitives if there have been too many retries,
but this can degrade both performance and scalability.
Another way to avoid \IX{starvation} is to have multiple sequence locks,
in Schr\"odinger's case, perhaps one per species.
In addition, if the update-side primitives are used too frequently,
poor performance and scalability will result due to lock contention.
One way to avoid this is to maintain a per-element sequence lock,
and to hold both spouses' locks when updating their marital status.
Readers can do their retry looping on either of the spouses' locks
to gain a stable view of any change in marital status involving both
members of the pair.
This avoids contention due to high marriage and divorce rates, but
complicates gaining a stable view of all marital statuses during a
single scan of the database.
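
A sketch of this per-element approach, again assuming a hypothetical
\co{struct animal}, this time containing its own \co{seqlock_t}:

\begin{verbatim}
/* Updater: acquire both spouses' locks in address
 * order to avoid deadlock. */
void marry_pair(struct animal *a, struct animal *b)
{
        struct animal *first = a < b ? a : b;
        struct animal *last = a < b ? b : a;

        write_seqlock(&first->seqlock);
        write_seqlock(&last->seqlock);
        WRITE_ONCE(a->married, 1);
        WRITE_ONCE(b->married, 1);
        write_sequnlock(&last->seqlock);
        write_sequnlock(&first->seqlock);
}

/* Reader: either spouse's lock gives a stable view
 * of a change involving both members of the pair. */
int is_married(struct animal *a)
{
        unsigned long seq;
        int m;

        do {
                seq = read_seqbegin(&a->seqlock);
                m = READ_ONCE(a->married);
        } while (read_seqretry(&a->seqlock, seq));
        return m;
}
\end{verbatim}
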
If the element groupings are well-defined and persistent, which marital
status is hoped to be,
then one approach is to add pointers to the data elements to link
together the members of a given group.
Readers can then traverse these pointers to access all the data elements
in the same group as the first one located.
This technique is used heavily in the Linux kernel, perhaps most
notably in the dcache subsystem~\cite{NeilBrown2015RCUwalk}.
Note that it is likely that similar schemes also work with hazard
pointers.
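
For example, a hypothetical layout loosely patterned after the dcache
techniques cited above might link each group member directly to the
others, here simplified to a single \co{->spouse} pointer:

\begin{verbatim}
struct animal {
        seqlock_t seqlock;
        struct animal *spouse;  /* NULL if unmarried. */
        int married;
        /* ... */
};

/* Reader: having located one member of the group,
 * reach the other without a second hash lookup. */
void visit_couple(struct animal *a)
{
        struct animal *sp;

        rcu_read_lock();
        sp = rcu_dereference(a->spouse);
        if (sp) {
                /* Access *a and *sp, subject to the
                 * usual per-element retry discipline. */
        }
        rcu_read_unlock();
}
\end{verbatim}
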
This approach provides sequential consistency to successful readers,
each of which will either see the effects of a given update or not,
with any partial updates resulting in a read-side retry.
Sequential consistency is an extremely strong guarantee, incurring equally
strong restrictions and equally high overheads.
In this case, we saw that readers might be starved on the one hand, or
might need to acquire the update-side lock on the other.
Although this works very well in cases where updates are infrequent,
it unnecessarily forces read-side retries even when the update does not
affect any of the data that a retried reader accesses.
\Cref{sec:together:Correlated Fields} therefore covers a much weaker form
of consistency that not only avoids reader starvation, but also avoids
any form of read-side retry.
The next section instead presents a weaker form of consistency that
can be provided with much lower probabilities of reader starvation.
\subsection{Atomic Move}
\label{sec:together:Atomic Move}
Suppose that individual data elements are moved from one data structure
to another, and that readers look up only single data structures.
However, when a data element moves, readers must never see it as
being in both structures at the same time, nor as missing from both.
Furthermore, any reader seeing the element in its new location
must never subsequently see it in its old location.
In addition, the move may be implemented by inserting a new copy of the
old data element into the destination location.
For example, consider a hash table that supports an atomic-to-readers
rename operation.
Expanding on Schr\"odinger's zoo, suppose that an animal's name changes,
for example, each of the brides in Schr\"odinger's traditionalist family
might change their last name to match that of their groom.
But changing their name might change the hash value, and might also
require that the bride's element move from one hash chain to another.
The consistency requirements set forth above dictate that if a reader successfully
looks up the new name, then any subsequent lookup of the old name by
that reader must result in failure.
Similarly, if a reader's lookup of the old name results in lookup failure,
then any subsequent lookup of the new name by that reader must succeed.
In short, a given reader should not see a bride momentarily blinking out
of existence, nor should that reader look up a bride under her new name
and then later look up that bride under her old name.
This consistency guarantee could be enforced with a single global sequence
lock as described in \cref{sec:together:Correlated Data Elements}, but
this can result in reader starvation even for readers that are not looking
up a bride who is currently undergoing a name change.
This guarantee could also be enforced by requiring that readers acquire
a per-hash-chain lock, but reviewing
\cref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo}
shows that this results in poor performance and scalability, even for
single-socket systems.
Another more reader-friendly way to implement this is to use RCU and to
place a sequence lock on each element.
Readers looking up a given element act as sequence-lock readers across
their full set of accesses to that element.
Note that these sequence-lock operations will order each reader's lookups.
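
For example, given a hypothetical RCU-protected \co{hashtab_lookup()}
function and the per-element \co{seqlock_t} used earlier, a reader
might be structured as follows:

\begin{verbatim}
/* Reader: all accesses to the element are enclosed
 * in that element's read-side section. */
int lookup_animal(const char *name)
{
        struct animal *ap;
        unsigned long seq;
        int retry;

        do {
                retry = 0;
                rcu_read_lock();
                ap = hashtab_lookup(name);
                if (ap) {
                        seq = read_seqbegin(&ap->seqlock);
                        /* ... read fields of *ap ... */
                        retry = read_seqretry(&ap->seqlock,
                                              seq);
                }
                rcu_read_unlock();
        } while (retry);
        return ap != NULL;
}
\end{verbatim}
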
Renaming an element can then proceed roughly as follows:
\begin{enumerate}
\item Acquire a global lock protecting rename operations.
\item Allocate and initialize a copy of the element with the new name.
\item Write-acquire the sequence lock on the element with the old name,
which has the side effect of ordering this acquisition with the
following insertion.
Concurrent lookups of the old name will now repeatedly retry.
\item Insert the copy of the element with the new name.
Lookups of the new name will now succeed.
\item Execute \co{smp_wmb()} to order the prior insertion with the
subsequent removal.
\item Remove the element with the old name.
Concurrent lookups of the old name will now fail.
\item Write-release the sequence lock if necessary, for example, if
required by lock dependency checkers.
\item Release the global lock.
\end{enumerate}
Thus, readers looking up the old name will retry until the new name
is available, at which point their final retry will fail.
Any subsequent lookups of the new name will succeed.
Any reader succeeding in looking up the new name is guaranteed that
any subsequent lookup of the old name will fail, perhaps after a series
of retries.
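
This procedure might be encoded as follows, assuming hypothetical
\co{hashtab_add()}, \co{hashtab_del()}, and \co{copy_animal()}
functions, with the numbered comments corresponding to the steps
in the above list:

\begin{verbatim}
spinlock_t rename_lock;  /* Global rename lock. */

void rename_animal(struct animal *oldp,
                   const char *newname)
{
        struct animal *newp;

        spin_lock(&rename_lock);            /* Step 1. */
        newp = copy_animal(oldp, newname);  /* Step 2. */
        write_seqlock(&oldp->seqlock);      /* Step 3. */
        hashtab_add(newp);                  /* Step 4. */
        smp_wmb();                          /* Step 5. */
        hashtab_del(oldp);                  /* Step 6. */
        write_sequnlock(&oldp->seqlock);    /* Step 7. */
        spin_unlock(&rename_lock);          /* Step 8. */
        /* Free oldp after a grace period,
         * for example, using call_rcu(). */
}
\end{verbatim}
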
\QuickQuizSeries{%
\QuickQuizB{
Is it possible to write-acquire the sequence lock on
the new element before it is inserted instead of acquiring
that of the old element before it is removed?
}\QuickQuizAnswerB{
Yes, and the details are left as an exercise to the reader.
The term \emph{tombstone} is sometimes used to refer to the
element with the old name after its sequence lock is acquired.
Similarly, the term \emph{birthstone} is sometimes used to refer
to the element with the new name while its sequence lock is
still held.
}\QuickQuizEndB
\QuickQuizE{
Is it possible to avoid the global lock?
}\QuickQuizAnswerE{
Yes, and one way to do this would be to use per-hash-chain locks.
The updater could acquire lock(s) corresponding to both the old
and the new element, acquiring them in address order.
In this case, the insertion and removal operations would of
course need to refrain from acquiring and releasing these
same per-hash-chain locks.
This complexity can be worthwhile if rename operations are
frequent, and of course can allow rename operations to execute
concurrently.
}\QuickQuizEndE
}% End of \QuickQuizSeries
It is of course possible to implement this procedure somewhat
more efficiently using simple flags.
However, the flag-based approach can be thought of as a simplified
variant of sequence locking, one that relies on the fact that a given
element's sequence lock is never write-acquired more than once.
% @@@ Reference TBD flag-for-deletion section.
\subsection{Upgrade to Writer}
\label{sec:together:Upgrade to Writer}
As discussed in
\cref{sec:defer:Quasi Reader-Writer Lock},
RCU permits readers to upgrade to writers.
This capability can be quite useful when a reader scanning an
RCU-protected data structure notices that the current element
needs to be updated.
What happens when you try this trick with sequence locking?
It turns out that this sequence-locking trick is actually used in
the Linux kernel, for example, by the \co{sdma_flush()} function in
\path{drivers/infiniband/hw/hfi1/sdma.c}.
The effect is to doom the enclosing reader to retry.
This trick is therefore used when the reader detects some condition
that requires a retry.
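
A schematic of this trick, not to be confused with the actual
\co{sdma_flush()} code, and assuming a hypothetical
\co{needs_cleanup()} predicate:

\begin{verbatim}
/* Reader that upgrades to writer, thus dooming
 * itself to retry. */
void scan_and_maybe_fix(struct foo *fp)
{
        unsigned long seq;

        do {
                seq = read_seqbegin(&fp->seqlock);
                /* ... read-side accesses ... */
                if (needs_cleanup(fp)) {
                        write_seqlock(&fp->seqlock);
                        /* ... do the update ... */
                        write_sequnlock(&fp->seqlock);
                        /* Guarantees one more retry. */
                }
        } while (read_seqretry(&fp->seqlock, seq));
}
\end{verbatim}

On the retry following the update, \co{needs_cleanup()} presumably
no longer holds, so the loop then terminates normally.
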