% defer/seqlock.tex
\section{Sequence Locks}
\label{sec:defer:Sequence Locks}
\begin{figure}[tb]
\centering
\resizebox{3in}{!}{\includegraphics{cartoons/r-2014-Start-over}}
\caption{Reader And Uncooperative Sequence Lock}
\label{fig:defer:Reader And Uncooperative Sequence Lock}
\end{figure}
Sequence locks are used in the Linux kernel for read-mostly data that
must be seen in a consistent state by readers.
However, unlike reader-writer locking, readers do not exclude writers.
Instead, like hazard pointers, sequence locks force readers to
\emph{retry} an operation if they detect activity from a concurrent writer.
As can be seen from
Figure~\ref{fig:defer:Reader And Uncooperative Sequence Lock},
it is important to design code using sequence locks so that readers
very rarely need to retry.
\QuickQuiz{}
Why isn't this sequence-lock discussion in Chapter~\ref{chp:Locking},
you know, the one on \emph{locking}?
\QuickQuizAnswer{
The sequence-lock mechanism is really a combination of two
separate synchronization mechanisms, sequence counts and
locking.
In fact, the sequence-count mechanism is available separately
in the Linux kernel via the
\co{write_seqcount_begin()} and \co{write_seqcount_end()}
primitives.
However, the combined \co{write_seqlock()} and
\co{write_sequnlock()} primitives are used much more heavily
in the Linux kernel.
More importantly, many more people will understand what you
mean if you say ``sequence lock'' than if you say
``sequence count''.
So this section is entitled ``Sequence Locks'' so that people
will understand what it is about just from the title, and
it appears in the ``Deferred Processing'' chapter because of
(1)~the emphasis on the ``sequence count'' aspect of ``sequence
locks'' and (2)~the fact that a ``sequence lock'' is much more
than merely a lock.
} \QuickQuizEnd
\begin{figure}[bp]
{ \scriptsize
\begin{verbbox}
1 do {
2 seq = read_seqbegin(&test_seqlock);
3 /* read-side access. */
4 } while (read_seqretry(&test_seqlock, seq));
\end{verbbox}
}
\centering
\theverbbox
\caption{Sequence-Locking Reader}
\label{fig:defer:Sequence-Locking Reader}
\end{figure}
\begin{figure}[bp]
{ \scriptsize
\begin{verbbox}
1 write_seqlock(&test_seqlock);
2 /* Update */
3 write_sequnlock(&test_seqlock);
\end{verbbox}
}
\centering
\theverbbox
\caption{Sequence-Locking Writer}
\label{fig:defer:Sequence-Locking Writer}
\end{figure}
The key component of sequence locking is the sequence number, which has
an even value in the absence of updaters and an odd value if there
is an update in progress.
Readers can then snapshot the value before and after each access.
If either snapshot has an odd value, or if the two snapshots differ,
there has been a concurrent update, and the reader must discard
the results of the access and then retry it.

Readers therefore use the \co{read_seqbegin()} and \co{read_seqretry()}
functions shown in Figure~\ref{fig:defer:Sequence-Locking Reader}
when accessing data protected by a sequence lock.

Writers must increment the value before and after each update,
and only one writer is permitted at a given time.
Writers therefore use the \co{write_seqlock()} and \co{write_sequnlock()}
functions shown in Figure~\ref{fig:defer:Sequence-Locking Writer}
when updating data protected by a sequence lock.
As a result, sequence-lock-protected data can have an arbitrarily
large number of concurrent readers, but only one writer at a time.

Sequence locking is used in the Linux kernel to protect calibration
quantities used for timekeeping.
It is also used in pathname traversal to detect concurrent rename operations.
\begin{figure}[tb]
{ \scriptsize
\begin{verbbox}
1 typedef struct {
2 unsigned long seq;
3 spinlock_t lock;
4 } seqlock_t;
5
6 static void seqlock_init(seqlock_t *slp)
7 {
8 slp->seq = 0;
9 spin_lock_init(&slp->lock);
10 }
11
12 static unsigned long read_seqbegin(seqlock_t *slp)
13 {
14 unsigned long s;
15
16 s = ACCESS_ONCE(slp->seq);
17 smp_mb();
18 return s & ~0x1UL;
19 }
20
21 static int read_seqretry(seqlock_t *slp,
22 unsigned long oldseq)
23 {
24 unsigned long s;
25
26 smp_mb();
27 s = ACCESS_ONCE(slp->seq);
28 return s != oldseq;
29 }
30
31 static void write_seqlock(seqlock_t *slp)
32 {
33 spin_lock(&slp->lock);
34 ++slp->seq;
35 smp_mb();
36 }
37
38 static void write_sequnlock(seqlock_t *slp)
39 {
40 smp_mb();
41 ++slp->seq;
42 spin_unlock(&slp->lock);
43 }
\end{verbbox}
}
\centering
\theverbbox
\caption{Sequence-Locking Implementation}
\label{fig:defer:Sequence-Locking Implementation}
\end{figure}
A simple implementation of sequence locks is shown in
Figure~\ref{fig:defer:Sequence-Locking Implementation}
(\path{seqlock.h}).
The \co{seqlock_t} data structure is shown on lines~1-4, and contains
the sequence number along with a lock to serialize writers.
Lines~6-10 show \co{seqlock_init()}, which, as the name indicates,
initializes a \co{seqlock_t}.
Lines~12-19 show \co{read_seqbegin()}, which begins a sequence-lock
read-side critical section.
Line~16 takes a snapshot of the sequence counter, and line~17 orders
this snapshot operation before the caller's critical section.
Finally, line~18 returns the value of the snapshot (with the least-significant
bit cleared), which the caller
will pass to a later call to \co{read_seqretry()}.
\QuickQuiz{}
Why not have \co{read_seqbegin()} in
Figure~\ref{fig:defer:Sequence-Locking Implementation}
check for the low-order bit being set, and retry
internally, rather than allowing a doomed read to start?
\QuickQuizAnswer{
That would be a legitimate implementation.
However, if the workload is read-mostly, it would likely
increase the overhead of the common-case successful read,
which could be counter-productive.
However, given a sufficiently large fraction of updates
and sufficiently high-overhead readers, having the
check internal to \co{read_seqbegin()} might be preferable.
} \QuickQuizEnd
Lines~21-29 show \co{read_seqretry()}, which returns true if at least
one writer was present since the time of the corresponding
call to \co{read_seqbegin()}, in which case the caller must retry.
Line~26 orders the caller's prior critical section before line~27's
fetch of the new snapshot of the sequence counter.
Finally, line~28 checks whether the sequence counter has changed,
in other words, whether a writer has been present, and returns
true if so.
\QuickQuiz{}
Why is the \co{smp_mb()} on line~26 of
Figure~\ref{fig:defer:Sequence-Locking Implementation}
needed?
\QuickQuizAnswer{
If it were omitted, both the compiler and the CPU would be
within their rights to move the critical section preceding
the call to \co{read_seqretry()} down below this function.
This would prevent the sequence lock from protecting the
critical section.
The \co{smp_mb()} primitive prevents such reordering.
} \QuickQuizEnd
\QuickQuiz{}
Can't weaker memory barriers be used in the code in
Figure~\ref{fig:defer:Sequence-Locking Implementation}?
\QuickQuizAnswer{
In older versions of the Linux kernel, no.
In very new versions of the Linux kernel, line~16 could use
\co{smp_load_acquire()} instead of \co{ACCESS_ONCE()}, which
in turn would allow the \co{smp_mb()} on line~17 to be dropped.
Similarly, line~41 could use an \co{smp_store_release()}, for
example, as follows: \\
\co{smp_store_release(&slp->seq, ACCESS_ONCE(slp->seq) + 1);} \\
This would allow the \co{smp_mb()} on line~40 to be dropped.
} \QuickQuizEnd
\QuickQuiz{}
What prevents sequence-locking updaters from starving readers?
\QuickQuizAnswer{
Nothing.
This is one of the weaknesses of sequence locking, and as a
result, you should use sequence locking only in read-mostly
situations.
Unless of course read-side starvation is acceptable in your
situation, in which case, go wild with the sequence-locking updates!
} \QuickQuizEnd
Lines~31-36 show \co{write_seqlock()}, which simply acquires the lock,
increments the sequence number, and executes a memory barrier to ensure
that this increment is ordered before the caller's critical section.

Lines~38-43 show \co{write_sequnlock()}, which executes a memory barrier
to ensure that the caller's critical section is ordered before the
increment of the sequence number on line~41, then releases the lock.
\QuickQuiz{}
What if something else serializes writers, so that the lock
is not needed?
\QuickQuizAnswer{
In this case, the \co{->lock} field could be omitted, as it
is in \co{seqcount_t} in the Linux kernel.
} \QuickQuizEnd
\QuickQuiz{}
Why isn't \co{seq} on line~2 of
Figure~\ref{fig:defer:Sequence-Locking Implementation}
\co{unsigned} rather than \co{unsigned long}?
After all, if \co{unsigned} is good enough for the Linux
kernel, shouldn't it be good enough for everyone?
\QuickQuizAnswer{
Not at all.
The Linux kernel has a number of special attributes that allow
it to ignore the following sequence of events:
\begin{enumerate}
\item Thread~0 executes \co{read_seqbegin()}, picking up
\co{->seq} in line~16, noting that the value is even,
and thus returning to the caller.
\item Thread~0 starts executing its read-side critical section,
but is then preempted for a long time.
\item Other threads repeatedly invoke \co{write_seqlock()} and
\co{write_sequnlock()}, until the value of \co{->seq}
overflows back to the value that Thread~0 fetched.
\item Thread~0 resumes execution, completing its read-side
critical section with inconsistent data.
\item Thread~0 invokes \co{read_seqretry()}, which incorrectly
concludes that Thread~0 has seen a consistent view of
the data protected by the sequence lock.
\end{enumerate}
The Linux kernel uses sequence locking for things that are
updated rarely, with time-of-day information being a case
in point.
This information is updated at most once per millisecond,
so that seven weeks would be required to overflow the counter.
If a kernel thread was preempted for seven weeks, the Linux
kernel's soft-lockup code would be emitting warnings every two
minutes for that entire time.
In contrast, with a 64-bit counter, more than five centuries
would be required to overflow, even given an update every
\emph{nano}second.
Therefore, this implementation uses a type for \co{->seq}
that is 64 bits on 64-bit systems.
} \QuickQuizEnd
\begin{figure}[tbp]
{ \scriptsize
\begin{verbbox}
1 struct route_entry {
2 struct route_entry *re_next;
3 unsigned long addr;
4 unsigned long iface;
5 int re_freed;
6 };
7 struct route_entry route_list;
8 DEFINE_SEQ_LOCK(sl);
9
10 unsigned long route_lookup(unsigned long addr)
11 {
12 struct route_entry *rep;
13 struct route_entry **repp;
14 unsigned long ret;
15 unsigned long s;
16
17 retry:
18 s = read_seqbegin(&sl);
19 repp = &route_list.re_next;
20 do {
21 rep = ACCESS_ONCE(*repp);
22 if (rep == NULL) {
23 if (read_seqretry(&sl, s))
24 goto retry;
25 return ULONG_MAX;
26 }
27 repp = &rep->re_next;
28 } while (rep->addr != addr);
29 if (ACCESS_ONCE(rep->re_freed))
30 abort();
31 ret = rep->iface;
32 if (read_seqretry(&sl, s))
33 goto retry;
34 return ret;
35 }
\end{verbbox}
}
\centering
\theverbbox
\caption{Sequence-Locked Pre-BSD Routing Table Lookup (BUGGY!!!)}
\label{fig:defer:Sequence-Locked Pre-BSD Routing Table Lookup}
\end{figure}
\begin{figure}[tbp]
{ \scriptsize
\begin{verbbox}
1 int route_add(unsigned long addr,
2 unsigned long interface)
3 {
4 struct route_entry *rep;
5
6 rep = malloc(sizeof(*rep));
7 if (!rep)
8 return -ENOMEM;
9 rep->addr = addr;
10 rep->iface = interface;
11 rep->re_freed = 0;
12 write_seqlock(&sl);
13 rep->re_next = route_list.re_next;
14 route_list.re_next = rep;
15 write_sequnlock(&sl);
16 return 0;
17 }
18
19 int route_del(unsigned long addr)
20 {
21 struct route_entry *rep;
22 struct route_entry **repp;
23
24 write_seqlock(&sl);
25 repp = &route_list.re_next;
26 for (;;) {
27 rep = *repp;
28 if (rep == NULL)
29 break;
30 if (rep->addr == addr) {
31 *repp = rep->re_next;
32 write_sequnlock(&sl);
33 smp_mb();
34 rep->re_freed = 1;
35 free(rep);
36 return 0;
37 }
38 repp = &rep->re_next;
39 }
40 write_sequnlock(&sl);
41 return -ENOENT;
42 }
\end{verbbox}
}
\centering
\theverbbox
\caption{Sequence-Locked Pre-BSD Routing Table Add/Delete (BUGGY!!!)}
\label{fig:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete}
\end{figure}
So what happens when sequence locking is applied to the Pre-BSD
routing table?
Figure~\ref{fig:defer:Sequence-Locked Pre-BSD Routing Table Lookup}
shows the data structures and \co{route_lookup()}, and
Figure~\ref{fig:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete}
shows \co{route_add()} and \co{route_del()} (\path{route_seqlock.c}).
This implementation is once again similar to its counterparts in earlier
sections, so only the differences will be highlighted.
In
Figure~\ref{fig:defer:Sequence-Locked Pre-BSD Routing Table Lookup},
line~5 adds \co{->re_freed}, which is checked on lines~29 and~30.
Line~8 adds a sequence lock, which is used by \co{route_lookup()}
on lines~18, 23, and~32, with lines~24 and~33 branching back to
the \co{retry} label on line~17.
The effect is to retry any lookup that runs concurrently with an update.
In
Figure~\ref{fig:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete},
lines~12, 15, 24, 32, and~40 acquire and release the sequence lock,
while lines~11, 33, and~34 handle \co{->re_freed}.
This implementation is therefore quite straightforward.
\begin{figure}[tb]
\centering
\resizebox{2.5in}{!}{\includegraphics{CodeSamples/defer/perf-seqlock}}
\caption{Pre-BSD Routing Table Protected by Sequence Locking}
\label{fig:defer:Pre-BSD Routing Table Protected by Sequence Locking}
\end{figure}
It also performs better on the read-only workload, as can be seen in
Figure~\ref{fig:defer:Pre-BSD Routing Table Protected by Sequence Locking},
though its performance is still far from ideal.
Unfortunately, it also suffers use-after-free failures.
The problem is that the reader might encounter a segmentation violation
due to accessing an already-freed structure before it reaches the
\co{read_seqretry()} check that would otherwise have forced a retry.
\QuickQuiz{}
Can this bug be fixed?
In other words, can you use sequence locks as the \emph{only}
synchronization mechanism protecting a linked list supporting
concurrent addition, deletion, and lookup?
\QuickQuizAnswer{
One trivial way of accomplishing this is to surround all
accesses, including the read-only accesses, with
\co{write_seqlock()} and \co{write_sequnlock()}.
Of course, this solution also prohibits all read-side
parallelism, resulting in massive lock contention,
and furthermore could just as easily be implemented
using simple locking.
If you do come up with a solution that uses \co{read_seqbegin()}
and \co{read_seqretry()} to protect read-side accesses, make
sure that you correctly handle the following sequence of events:
\begin{enumerate}
\item CPU~0 is traversing the linked list, and picks up a pointer
to list element~A.
\item CPU~1 removes element~A from the list and frees it.
\item CPU~2 allocates an unrelated data structure, and gets
the memory formerly occupied by element~A.
In this unrelated data structure, the memory previously
used for element~A's \co{->next} pointer is now occupied
by a floating-point number.
\item CPU~0 picks up what used to be element~A's \co{->next}
pointer, gets random bits, and therefore gets a
segmentation fault.
\end{enumerate}
One way to protect against this sort of problem requires use
of ``type-safe memory'', which will be discussed in
Section~\ref{sec:defer:RCU is a Way of Providing Type-Safe Memory}.
But in that case, you would be using some other synchronization
mechanism in addition to sequence locks!
} \QuickQuizEnd
Both the read-side and write-side critical sections of a sequence lock
can be thought of as transactions, and sequence locking therefore
can be thought of as a limited form of transactional memory, which
will be discussed in Section~\ref{sec:future:Transactional Memory}.
The limitations of sequence locking are: (1)~Sequence locking restricts
updates and (2)~sequence locking does not permit traversal of pointers
to objects that might be freed by updaters.
These limitations are of course overcome by transactional memory, but
can also be overcome by combining other synchronization primitives
with sequence locking.
Sequence locks allow writers to defer readers, but not vice versa.
This can result in unfairness and even starvation
in writer-heavy workloads.
On the other hand, in the absence of writers, sequence-lock readers are
reasonably fast and scale linearly.
It is only human to want the best of both worlds: fast readers without
the possibility of read-side failure, let alone starvation.
In addition, it would also be nice to overcome sequence locking's limitations
with pointers.
The following section presents a synchronization mechanism with exactly
these properties.