% defer/seqlock.tex
% mainfile: ../perfbook.tex
% SPDX-License-Identifier: CC-BY-SA-3.0
\section{Sequence Locks}
\label{sec:defer:Sequence Locks}
%
\epigraph{It'll be just like starting over.}{John Lennon}
The published sequence-lock
record~\cite{10.1145/800212.806505,10.1145/359863.359878}
extends back as far as that of reader-writer locking, but sequence locks
nevertheless remain in relative obscurity.
Sequence locks are used in the Linux kernel for read-mostly data that
must be seen in a consistent state by readers.
However, unlike reader-writer locking, readers do not exclude writers.
Instead, like hazard pointers, sequence locks force readers to
\emph{retry} an operation if they detect activity from a concurrent writer.
As can be seen from
\cref{fig:defer:Reader And Uncooperative Sequence Lock},
it is important to design code using sequence locks so that readers
very rarely need to retry.
\begin{figure}
\centering
\resizebox{3in}{!}{\includegraphics{cartoons/r-2014-Start-over}}
\caption{Reader And Uncooperative Sequence Lock}
\label{fig:defer:Reader And Uncooperative Sequence Lock}
\end{figure}
\QuickQuiz{
Why isn't this sequence-lock discussion in \cref{chp:Locking},
you know, the one on \emph{locking}?
}\QuickQuizAnswer{
The sequence-lock mechanism is really a combination of two
separate synchronization mechanisms, sequence counts and
locking.
In fact, the sequence-count mechanism is available separately
in the Linux kernel via the
\co{write_seqcount_begin()} and \co{write_seqcount_end()}
primitives.
However, the combined \co{write_seqlock()} and
\co{write_sequnlock()} primitives are used much more heavily
in the Linux kernel.
More importantly, many more people will understand what you
mean if you say ``sequence lock'' than if you say
``sequence count''.
So this section is entitled ``Sequence Locks'' so that people
will understand what it is about just from the title, and
it appears in the ``Deferred Processing'' chapter because of
(1) the emphasis on the ``sequence count'' aspect of ``sequence locks''
and (2) the fact that a ``sequence lock'' is much more than merely
a lock.
}\QuickQuizEnd
\begin{listing}
\begin{VerbatimL}
do {
  seq = read_seqbegin(&test_seqlock);
  /* read-side access. */
} while (read_seqretry(&test_seqlock, seq));
\end{VerbatimL}
\caption{Sequence-Locking Reader}
\label{lst:defer:Sequence-Locking Reader}
\end{listing}
\begin{listing}
\begin{VerbatimL}
write_seqlock(&test_seqlock);
/* Update */
write_sequnlock(&test_seqlock);
\end{VerbatimL}
\caption{Sequence-Locking Writer}
\label{lst:defer:Sequence-Locking Writer}
\end{listing}
The key component of sequence locking is the sequence number, which has
an even value in the absence of updaters and an odd value if there
is an update in progress.
Readers can then snapshot the value before and after each access.
If either snapshot has an odd value, or if the two snapshots differ,
there has been a concurrent update, and the reader must discard
the results of the access and then retry it.
Readers therefore use the \co{read_seqbegin()} and \co{read_seqretry()}
functions shown in \cref{lst:defer:Sequence-Locking Reader}
when accessing data protected by a sequence lock.
Writers must increment the value before and after each update,
and only one writer is permitted at a given time.
Writers therefore use the \co{write_seqlock()} and \co{write_sequnlock()}
functions shown in \cref{lst:defer:Sequence-Locking Writer}
when updating data protected by a sequence lock.
As a result, sequence-lock-protected data can have an arbitrarily
large number of concurrent readers, but only one writer at a time.
Sequence locking is used in the Linux kernel to protect calibration
quantities used for timekeeping.
It is also used in pathname traversal to detect concurrent rename operations.
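As a concrete illustration, the following sketch shows how such a
use case might look, assuming a hypothetical two-field
\co{struct calibration} protected by a \co{seqlock_t}:
the reader retries until it obtains a mutually consistent pair of
values, while the writer updates both fields under the sequence lock,
combining the patterns from
\cref{lst:defer:Sequence-Locking Reader,lst:defer:Sequence-Locking Writer}.
\begin{VerbatimU}
struct calibration {   /* Hypothetical example structure. */
  unsigned long mult;
  unsigned long shift;
};
struct calibration cal;
seqlock_t cal_seqlock; /* Initialized elsewhere via seqlock_init(). */

/* Reader: retry until a consistent pair of values is read. */
unsigned long mult, shift, seq;

do {
  seq = read_seqbegin(&cal_seqlock);
  mult = cal.mult;
  shift = cal.shift;
} while (read_seqretry(&cal_seqlock, seq));

/* Writer: new_mult and new_shift computed by the caller. */
write_seqlock(&cal_seqlock);
cal.mult = new_mult;
cal.shift = new_shift;
write_sequnlock(&cal_seqlock);
\end{VerbatimU}
A production-quality version might also mark the racing accesses to
\co{cal.mult} and \co{cal.shift} with \co{READ_ONCE()} and
\co{WRITE_ONCE()}.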
\begin{listing}
\input{CodeSamples/defer/seqlock@impl.fcv}
\caption{Sequence-Locking Implementation}
\label{lst:defer:Sequence-Locking Implementation}
\end{listing}
A simple implementation of sequence locks is shown in
\cref{lst:defer:Sequence-Locking Implementation}
(\path{seqlock.h}).
\begin{fcvref}[ln:defer:seqlock:impl:typedef]
The \co{seqlock_t} data structure is shown on
\clnrefrange{b}{e}, and contains
the sequence number along with a lock to serialize writers.
\end{fcvref}
\begin{fcvref}[ln:defer:seqlock:impl:init]
\Clnrefrange{b}{e} show \co{seqlock_init()}, which, as the name indicates,
initializes a \co{seqlock_t}.
\end{fcvref}
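For reference, a minimal sketch of what this data structure and its
initializer might look like follows; the actual definitions appear in
the listing (\path{seqlock.h}), and the \co{spinlock_t} usage here is
merely illustrative.
\begin{VerbatimU}
typedef struct {
  unsigned long seq;  /* Sequence number, even unless writer present. */
  spinlock_t lock;    /* Lock serializing writers. */
} seqlock_t;

static inline void seqlock_init(seqlock_t *slp)
{
  slp->seq = 0;
  spin_lock_init(&slp->lock);
}
\end{VerbatimU}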
\begin{fcvref}[ln:defer:seqlock:impl:read_seqbegin]
\Clnrefrange{b}{e} show \co{read_seqbegin()}, which begins a sequence-lock
\IXh{read-side}{critical section}.
\Clnref{fetch} takes a snapshot of the sequence counter, and
\clnref{mb} orders
this snapshot operation before the caller's critical section.
Finally, \clnref{ret} returns the value of the snapshot (with the least-significant
bit cleared), which the caller
will pass to a later call to \co{read_seqretry()}.
\end{fcvref}
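A minimal sketch consistent with this description (the actual code
appears in the listing):
\begin{VerbatimU}
static inline unsigned long read_seqbegin(seqlock_t *slp)
{
  unsigned long s;

  s = READ_ONCE(slp->seq);  /* Snapshot the sequence counter. */
  smp_mb();                 /* Order snapshot before critical section. */
  return s & ~0x1UL;        /* Return snapshot, low-order bit cleared. */
}
\end{VerbatimU}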
\QuickQuiz{
Why not have \co{read_seqbegin()} in
\cref{lst:defer:Sequence-Locking Implementation}
check whether the sequence-number value is odd, and, if so,
retry internally rather than entering a doomed read-side critical
section?
}\QuickQuizAnswer{
This would be a legitimate implementation.
But please keep in mind that
\begin{enumerate*}[(1)]
\item This added check is a relatively expensive conditional branch,
\item It cannot be substituted for the later check done by
\tco{read_seqretry()}, which must happen after the
critical section completes, and
\item Sequence locking is intended for read-mostly workloads,
which means that this extra check would slow down the
common case.
\end{enumerate*}
On the other hand, in an alternate universe having a sufficiently
large fraction of updates and sufficiently high-overhead readers,
having this internal-to-\co{read_seqbegin()} check might be
preferable.
\begin{fcvref}[ln:defer:seqlock:impl]
Of course, the full memory barriers
on \clnref{read_seqbegin:mb,read_seqretry:mb} of
\cref{lst:defer:Sequence-Locking Implementation}
are quite heavyweight as instructions go, which suggests that the
overhead of the added check might be negligible.
Except that, in userspace code, the \co{membarrier()} system
call~\cite{Corbet2010membarrier,MathieuDesnoyers2017membarrier,Linuxmanpage2018sys-membarrier}
can be used to eliminate that \co{smp_mb()} overhead on the
read side in exchange for the added overhead of \co{membarrier()}
on the update side.
This feature is on its way into the C++ standard under the name
of ``asymmetric fences''~\cite{DavidGoldblatt2022asymmetricFences}.
Either way, this trick eliminates the read-side \co{smp_mb()}
overhead, which in turn makes eliminating the check more
attractive, arriving again at the form of the code shown in
\cref{lst:defer:Sequence-Locked Pre-BSD Routing Table Lookup}.
\end{fcvref}
This same trick may be applied to Linux-kernel code using tools
such as \co{smp_call_function()}, at least in non-realtime builds
of the Linux kernel.
}\QuickQuizEnd
\begin{fcvref}[ln:defer:seqlock:impl:read_seqretry]
\Clnrefrange{b}{e} show \co{read_seqretry()}, which returns \co{true} if there
was at least one writer since the time of the corresponding
call to \co{read_seqbegin()}.
\Clnref{mb} orders the caller's prior critical section before \clnref{fetch}'s
fetch of the new snapshot of the sequence counter.
\Clnref{ret} checks whether the sequence counter has changed,
in other words, whether there has been at least one writer, and returns
\co{true} if so.
\end{fcvref}
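Again, a minimal sketch consistent with this description:
\begin{VerbatimU}
static inline int read_seqretry(seqlock_t *slp, unsigned long oldseq)
{
  unsigned long s;

  smp_mb();                 /* Order critical section before fetch. */
  s = READ_ONCE(slp->seq);  /* Fetch new snapshot of the counter. */
  return s != oldseq;       /* Return true if there was a writer. */
}
\end{VerbatimU}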
\QuickQuizSeries{%
\QuickQuizB{
Why is the \co{smp_mb()} on
\clnrefr{ln:defer:seqlock:impl:read_seqretry:mb} of
\cref{lst:defer:Sequence-Locking Implementation}
needed?
}\QuickQuizAnswerB{
If it were omitted, both the compiler and the CPU would be
within their rights to move the critical section preceding
the call to \co{read_seqretry()} down below this function.
This would prevent the sequence lock from protecting the
critical section.
The \co{smp_mb()} primitive prevents such reordering.
}\QuickQuizEndB
%
\QuickQuizM{
Can't weaker memory barriers be used in the code in
\cref{lst:defer:Sequence-Locking Implementation}?
}\QuickQuizAnswerM{
In older versions of the Linux kernel, no.
\begin{fcvref}[ln:defer:seqlock:impl]
In very new versions of the Linux kernel,
\clnref{read_seqbegin:fetch} could use
\co{smp_load_acquire()} instead of \co{READ_ONCE()}, which
in turn would allow the \co{smp_mb()} on
\clnref{read_seqbegin:mb} to be dropped.
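For example, using an illustrative variable name, the modified fetch
might look as follows:
\begin{VerbatimU}
ret = smp_load_acquire(&slp->seq);
\end{VerbatimU}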
Similarly, \clnref{write_sequnlock:inc} could use an
\co{smp_store_release()}, for
example, as follows:
\begin{VerbatimU}
smp_store_release(&slp->seq, READ_ONCE(slp->seq) + 1);
\end{VerbatimU}
This would allow the \co{smp_mb()} on
\clnref{write_sequnlock:mb} to be dropped.
\end{fcvref}
}\QuickQuizEndM
%
\QuickQuizE{
What prevents sequence-locking updaters from starving readers?
}\QuickQuizAnswerE{
Nothing.
This is one of the weaknesses of sequence locking, and as a
result, you should use sequence locking only in read-mostly
situations.
Unless of course read-side starvation is acceptable in your
situation, in which case, go wild with the sequence-locking updates!
}\QuickQuizEndE
}
\begin{fcvref}[ln:defer:seqlock:impl:write_seqlock]
\Clnrefrange{b}{e} show \co{write_seqlock()}, which simply acquires the lock,
increments the sequence number, and executes a memory barrier to ensure
that this increment is ordered before the caller's critical section.
\end{fcvref}
\begin{fcvref}[ln:defer:seqlock:impl:write_sequnlock]
\Clnrefrange{b}{e} show \co{write_sequnlock()}, which executes a memory barrier
to ensure that the caller's critical section is ordered before the
increment of the sequence number on \clnref{inc}, then releases the lock.
\end{fcvref}
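Minimal sketches consistent with these descriptions follow, again with
the actual code appearing in the listing:
\begin{VerbatimU}
static inline void write_seqlock(seqlock_t *slp)
{
  spin_lock(&slp->lock);    /* Exclude other writers. */
  ++slp->seq;               /* Counter now odd: update in progress. */
  smp_mb();                 /* Order increment before critical section. */
}

static inline void write_sequnlock(seqlock_t *slp)
{
  smp_mb();                 /* Order critical section before increment. */
  ++slp->seq;               /* Counter now even: update complete. */
  spin_unlock(&slp->lock);  /* Allow other writers to proceed. */
}
\end{VerbatimU}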
\QuickQuizSeries{%
\QuickQuizB{
What if something else serializes writers, so that the lock
is not needed?
}\QuickQuizAnswerB{
In this case, the \co{->lock} field could be omitted, as it
is in \co{seqcount_t} in the Linux kernel.
}\QuickQuizEndB
%
\QuickQuizE{
Why isn't \co{seq} on
\clnrefr{ln:defer:seqlock:impl:typedef:seq} of
\cref{lst:defer:Sequence-Locking Implementation}
\co{unsigned} rather than \co{unsigned long}?
After all, if \co{unsigned} is good enough for the Linux
kernel, shouldn't it be good enough for everyone?
}\QuickQuizAnswerE{
Not at all.
The Linux kernel has a number of special attributes that allow
it to ignore the following sequence of events:
\begin{enumerate}
\item Thread~0 executes \co{read_seqbegin()}, picking up
\co{->seq} in
\clnrefr{ln:defer:seqlock:impl:read_seqbegin:fetch},
noting that the value is even,
and thus returning to the caller.
\item Thread~0 starts executing its read-side critical section,
but is then preempted for a long time.
\item Other threads repeatedly invoke \co{write_seqlock()} and
\co{write_sequnlock()}, until the value of \co{->seq}
overflows back to the value that Thread~0 fetched.
\item Thread~0 resumes execution, completing its read-side
critical section with inconsistent data.
\item Thread~0 invokes \co{read_seqretry()}, which incorrectly
concludes that Thread~0 has seen a consistent view of
the data protected by the sequence lock.
\end{enumerate}
The Linux kernel uses sequence locking for things that are
updated rarely, with time-of-day information being a case
in point.
This information is updated at most once per millisecond,
so that seven weeks would be required to overflow the counter.
If a kernel thread was preempted for seven weeks, the Linux
kernel's soft-lockup code would be emitting warnings every two
minutes for that entire time.
In contrast, with a 64-bit counter, more than five centuries
would be required to overflow, even given an update every
\emph{nano}second.
Therefore, this implementation uses a type for \co{->seq}
that is 64 bits on 64-bit systems.
}\QuickQuizEndE
}
\begin{listing}
\input{CodeSamples/defer/route_seqlock@lookup.fcv}
\caption{Sequence-Locked Pre-BSD Routing Table Lookup (BUGGY!!!)}
\label{lst:defer:Sequence-Locked Pre-BSD Routing Table Lookup}
\end{listing}
\begin{listing}
\input{CodeSamples/defer/route_seqlock@add_del.fcv}
\caption{Sequence-Locked Pre-BSD Routing Table Add\slash Delete (BUGGY!!!)}
\label{lst:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete}
\end{listing}
So what happens when sequence locking is applied to the Pre-BSD
routing table?
\Cref{lst:defer:Sequence-Locked Pre-BSD Routing Table Lookup}
shows the data structures and \co{route_lookup()}, and
\cref{lst:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete}
shows \co{route_add()} and \co{route_del()} (\path{route_seqlock.c}).
This implementation is once again similar to its counterparts in earlier
sections, so only the differences will be highlighted.
\begin{fcvref}[ln:defer:route_seqlock:lookup]
In
\cref{lst:defer:Sequence-Locked Pre-BSD Routing Table Lookup},
\clnref{struct:re_freed} adds \co{->re_freed}, which is checked on
\clnref{lookup:chk_freed,lookup:abort}.
\Clnref{struct:sl} adds a sequence lock, which is used by \co{route_lookup()}
\end{fcvref}
\begin{fcvref}[ln:defer:route_seqlock:lookup:lookup]
on \clnref{r_sqbegin,r_sqretry1,r_sqretry2},
with \clnref{goto_retry1,goto_retry2} branching back to
the \co{retry} label on \clnref{retry}.
The effect is to retry any lookup that runs concurrently with an update.
\end{fcvref}
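In outline, and with the list-traversal details and variable names
simplified, the lookup's retry pattern looks roughly like this:
\begin{VerbatimU}
retry:
  seq = read_seqbegin(&sl);
  /* ... traverse the list looking for the desired entry ... */
  if (READ_ONCE(rep->re_freed))  /* Entry already freed? */
    abort();                     /* Report the use-after-free bug. */
  if (read_seqretry(&sl, seq))
    goto retry;                  /* Concurrent update, so retry. */
  /* ... return the result of the lookup ... */
\end{VerbatimU}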
\begin{fcvref}[ln:defer:route_seqlock:add_del]
In
\cref{lst:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete},
\clnref{add:w_sqlock,add:w_squnlock,del:w_sqlock,%
del:w_squnlock1,del:w_squnlock2}
acquire and release the sequence lock,
while \clnref{add:clr_freed,del:set_freed} handle \co{->re_freed}.
This implementation is therefore quite straightforward.
\end{fcvref}
\begin{figure}
\centering
\resizebox{2.5in}{!}{\includegraphics{CodeSamples/defer/data/hps.2019.12.17a/perf-seqlock}}
\caption{Pre-BSD Routing Table Protected by Sequence Locking}
\label{fig:defer:Pre-BSD Routing Table Protected by Sequence Locking}
\end{figure}
It also performs better on the read-only workload, as can be seen in
\cref{fig:defer:Pre-BSD Routing Table Protected by Sequence Locking},
though its performance is still far from ideal.
Worse yet, it suffers use-after-free failures.
The problem is that the reader might encounter a segmentation violation
due to accessing an already-freed structure before \co{read_seqretry()}
has a chance to warn of the concurrent update.
\QuickQuiz{
Can this bug be fixed?
In other words, can you use sequence locks as the \emph{only}
synchronization mechanism protecting a linked list supporting
concurrent addition, deletion, and lookup?
}\QuickQuizAnswer{
One trivial way of accomplishing this is to surround all
accesses, including the read-only accesses, with
\co{write_seqlock()} and \co{write_sequnlock()}.
Of course, this solution also prohibits all read-side
parallelism, resulting in massive lock contention,
and furthermore could just as easily be implemented
using simple locking.
If you do come up with a solution that uses \co{read_seqbegin()}
and \co{read_seqretry()} to protect read-side accesses, make
sure that you correctly handle the following sequence of events:
\begin{enumerate}
\item CPU~0 is traversing the linked list, and picks up a pointer
to list element~A.
\item CPU~1 removes element~A from the list and frees it.
\item CPU~2 allocates an unrelated data structure, and gets
the memory formerly occupied by element~A\@.
In this unrelated data structure, the memory previously
used for element~A's \co{->next} pointer is now occupied
by a floating-point number.
\item CPU~0 picks up what used to be element~A's \co{->next}
pointer, gets random bits, and therefore gets a
segmentation fault.
\end{enumerate}
One way to protect against this sort of problem requires use
of ``type-safe memory'', which will be discussed in
\cref{sec:defer:Type-Safe Memory}.
Roughly similar solutions are possible using the hazard pointers
discussed in
\cref{sec:defer:Hazard Pointers}.
But in either case, you would be using some other synchronization
mechanism in addition to sequence locks!
}\QuickQuizEnd
As hinted on
\cpageref{sec:defer:Mysteries sequence locking},
both the read-side and write-side critical sections of a sequence lock
can be thought of as transactions, and sequence locking therefore can
be thought of as a limited form of transactional memory, which will be
discussed in \cref{sec:future:Transactional Memory}.
The limitations of sequence locking are:
\begin{enumerate*}[(1)]
\item Sequence locking restricts updates and
\item Sequence locking does not permit traversal of pointers
to objects that might be freed by updaters.
\end{enumerate*}
These limitations are of course overcome by transactional memory, but
can also be overcome by combining other synchronization primitives
with sequence locking.
Sequence locks allow writers to defer readers, but not vice versa.
This can result in \IX{unfairness} and even \IX{starvation}
in writer-heavy workloads.\footnote{
Dmitry Vyukov describes one way to reduce (but, sadly, not eliminate)
reader starvation:
\url{http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/improved-lock-free-seqlock}.}
On the other hand, in the absence of writers, sequence-lock readers are
reasonably fast and scale linearly.
It is only human to want the best of both worlds:
Fast readers without the possibility of read-side failure,
let alone starvation.
It would also be nice to overcome sequence locking's limitations
with pointers.
The following section presents a synchronization mechanism with exactly
these properties.