defer/seqlock.tex - pub/scm/linux/kernel/git/paulmck/perfbook - Git at Google

 % defer/seqlock.tex

 \section{Sequence Locks}
 \label{sec:defer:Sequence Locks}

 \begin{figure}[tb]
 \centering
 \resizebox{3in}{!}{\includegraphics{cartoons/r-2014-Start-over}}
 \caption{Reader And Uncooperative Sequence Lock}
 \label{fig:defer:Reader And Uncooperative Sequence Lock}
 \end{figure}

 Sequence locks are used in the Linux kernel for read-mostly data that
 must be seen in a consistent state by readers.
 However, unlike reader-writer locking, readers do not exclude writers.
 Instead, like hazard pointers, sequence locks force readers to
 \emph{retry} an operation if they detect activity from a concurrent writer.
 As can be seen from
 Figure~\ref{fig:defer:Reader And Uncooperative Sequence Lock},
 it is important to design code using sequence locks so that readers
 very rarely need to retry.

 \QuickQuiz{}
 	Why isn't this sequence-lock discussion in Chapter~\ref{chp:Locking},
 	you know, the one on \emph{locking}?
 \QuickQuizAnswer{
 	The sequence-lock mechanism is really a combination of two
 	separate synchronization mechanisms, sequence counts and
 	locking.
 	In fact, the sequence-count mechanism is available separately
 	in the Linux kernel via the
 	\co{write_seqcount_begin()} and \co{write_seqcount_end()}
 	primitives.

 	However, the combined \co{write_seqlock()} and
 	\co{write_sequnlock()} primitives are used much more heavily
 	in the Linux kernel.
 	More importantly, many more people will understand what you
 	mean if you say ``sequence lock'' than if you say
 	``sequence count''.

 	So this section is entitled ``Sequence Locks'' so that people
 	will understand what it is about just from the title, and
 	it appears in the ``Deferred Processing'' because (1) of the
 	emphasis on the ``sequence count'' aspect of ``sequence locks''
 	and (2) because a ``sequence lock'' is much more than merely
 	a lock.
 } \QuickQuizEnd

 \begin{figure}[bp]
 { \scriptsize
 \begin{verbbox}
   1 do {
   2   seq = read_seqbegin(&test_seqlock);
   3   /* read-side access. */
   4 } while (read_seqretry(&test_seqlock, seq));
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{Sequence-Locking Reader}
 \label{fig:defer:Sequence-Locking Reader}
 \end{figure}

 \begin{figure}[bp]
 { \scriptsize
 \begin{verbbox}
   1 write_seqlock(&test_seqlock);
   2 /* Update */
   3 write_sequnlock(&test_seqlock);
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{Sequence-Locking Writer}
 \label{fig:defer:Sequence-Locking Writer}
 \end{figure}

 The key component of sequence locking is the sequence number, which has
 an even value in the absence of updaters and an odd value if there
 is an update in progress.
 Readers can then snapshot the value before and after each access.
 If either snapshot has an odd value, or if the two snapshots differ,
 there has been a concurrent update, and the reader must discard
 the results of the access and then retry it.
 Readers therefore use the \co{read_seqbegin()} and \co{read_seqretry()}
 functions shown in Figure~\ref{fig:defer:Sequence-Locking Reader}
 when accessing data protected by a sequence lock.
 Writers must increment the value before and after each update,
 and only one writer is permitted at a given time.
 Writers therefore use the \co{write_seqlock()} and \co{write_sequnlock()}
 functions shown in Figure~\ref{fig:defer:Sequence-Locking Writer}
 when updating data protected by a sequence lock.

 As a result, sequence-lock-protected data can have an arbitrarily
 large number of concurrent readers, but only one writer at a time.
 Sequence locking is used in the Linux kernel to protect calibration
 quantities used for timekeeping.
 It is also used in pathname traversal to detect concurrent rename operations.

 \begin{figure}[tb]
 { \scriptsize
 \begin{verbbox}
  1  typedef struct {
  2    unsigned long seq;
  3    spinlock_t lock;
  4  } seqlock_t;
  5
  6  static void seqlock_init(seqlock_t *slp)
  7  {
  8    slp->seq = 0;
  9    spin_lock_init(&slp->lock);
 10  }
 11
 12  static unsigned long read_seqbegin(seqlock_t *slp)
 13  {
 14    unsigned long s;
 15
 16    s = ACCESS_ONCE(slp->seq);
 17    smp_mb();
 18    return s & ~0x1UL;
 19  }
 20
 21  static int read_seqretry(seqlock_t *slp,
 22                           unsigned long oldseq)
 23  {
 24    unsigned long s;
 25
 26    smp_mb();
 27    s = ACCESS_ONCE(slp->seq);
 28    return s != oldseq;
 29  }
 30
 31  static void write_seqlock(seqlock_t *slp)
 32  {
 33    spin_lock(&slp->lock);
 34    ++slp->seq;
 35    smp_mb();
 36  }
 37
 38  static void write_sequnlock(seqlock_t *slp)
 39  {
 40    smp_mb();
 41    ++slp->seq;
 42    spin_unlock(&slp->lock);
 43  }
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{Sequence-Locking Implementation}
 \label{fig:defer:Sequence-Locking Implementation}
 \end{figure}

 A simple implementation of sequence locks is shown in
 Figure~\ref{fig:defer:Sequence-Locking Implementation}
 (\path{seqlock.h}).
 The \co{seqlock_t} data structure is shown on lines~1-4, and contains
 the sequence number along with a lock to serialize writers.
 Lines~6-10 show \co{seqlock_init()}, which, as the name indicates,
 initializes a \co{seqlock_t}.

 Lines~12-19 show \co{read_seqbegin()}, which begins a sequence-lock
 read-side critical section.
 Line~16 takes a snapshot of the sequence counter, and line~17 orders
 this snapshot operation before the caller's critical section.
 Finally, line~18 returns the value of the snapshot (with the least-significant
 bit cleared), which the caller
 will pass to a later call to \co{read_seqretry()}.

 \QuickQuiz{}
 	Why not have \co{read_seqbegin()} in
 	Figure~\ref{fig:defer:Sequence-Locking Implementation}
 	check for the low-order bit being set, and retry
 	internally, rather than allowing a doomed read to start?
 \QuickQuizAnswer{
 	That would be a legitimate implementation.
 	However, if the workload is read-mostly, it would likely
 	increase the overhead of the common-case successful read,
 	which could be counter-productive.
 	However, given a sufficiently large fraction of updates
 	and sufficiently high-overhead readers, having the
 	check internal to \co{read_seqbegin()} might be preferable.
 } \QuickQuizEnd

 Lines~21-29 show \co{read_seqretry()}, which returns true if there
 were no writers present since the time of the corresponding
 call to \co{read_seqbegin()}.
 Line~26 orders the caller's prior critical section before line~27's
 fetch of the new snapshot of the sequence counter.
 Finally, line~28 checks that the sequence counter has not changed,
 in other words, that there has been no writer, and returns true if so.

 \QuickQuiz{}
 	Why is the \co{smp_mb()} on line~26 of
 	Figure~\ref{fig:defer:Sequence-Locking Implementation}
 	needed?
 \QuickQuizAnswer{
 	If it was omitted, both the compiler and the CPU would be
 	within their rights to move the critical section preceding
 	the call to \co{read_seqretry()} down below this function.
 	This would prevent the sequence lock from protecting the
 	critical section.
 	The \co{smp_mb()} primitive prevents such reordering.
 } \QuickQuizEnd

 \QuickQuiz{}
 	Can't weaker memory barriers be used in the code in
 	Figure~\ref{fig:defer:Sequence-Locking Implementation}?
 \QuickQuizAnswer{
 	In older versions of the Linux kernel, no.

 	In very new versions of the Linux kernel, line~16 could use
 	\co{smp_load_acquire()} instead of \co{ACCESS_ONCE()}, which
 	in turn would allow the \co{smp_mb()} on line~17 to be dropped.
 	Similarly, line~41 could use an \co{smp_store_release()}, for
 	example, as follows: \\
 	\co{smp_store_release(&slp->seq, ACCESS_ONCE(slp->seq) + 1);} \\
 	This would allow the \co{smp_mb()} on line~40 to be dropped.
 } \QuickQuizEnd

 \QuickQuiz{}
 	What prevents sequence-locking updaters from starving readers?
 \QuickQuizAnswer{
 	Nothing.
 	This is one of the weaknesses of sequence locking, and as a
 	result, you should use sequence locking only in read-mostly
 	situations.
 	Unless of course read-side starvation is acceptable in your
 	situation, in which case, go wild with the sequence-locking updates!
 } \QuickQuizEnd

 Lines~31-36 show \co{write_seqlock()}, which simply acquires the lock,
 increments the sequence number, and executes a memory barrier to ensure
 that this increment is ordered before the caller's critical section.
 Lines~38-43 show \co{write_sequnlock()}, which executes a memory barrier
 to ensure that the caller's critical section is ordered before the
 increment of the sequence number on line~44, then releases the lock.

 \QuickQuiz{}
 	What if something else serializes writers, so that the lock
 	is not needed?
 \QuickQuizAnswer{
 	In this case, the \co{->lock} field could be omitted, as it
 	is in \co{seqcount_t} in the Linux kernel.
 } \QuickQuizEnd

 \QuickQuiz{}
 	Why isn't \co{seq} on line~2 of
 	Figure~\ref{fig:defer:Sequence-Locking Implementation}
 	\co{unsigned} rather than \co{unsigned long}?
 	After all, if \co{unsigned} is good enough for the Linux
 	kernel, shouldn't it be good enough for everyone?
 \QuickQuizAnswer{
 	Not at all.
 	The Linux kernel has a number of special attributes that allow
 	it to ignore the following sequence of events:
 	\begin{enumerate}
 	\item	Thread~0 executes \co{read_seqbegin()}, picking up
 		\co{->seq} in line~16, noting that the value is even,
 		and thus returning to the caller.
 	\item	Thread~0 starts executing its read-side critical section,
 		but is then preempted for a long time.
 	\item	Other threads repeatedly invoke \co{write_seqlock()} and
 		\co{write_sequnlock()}, until the value of \co{->seq}
 		overflows back to the value that Thread~0 fetched.
 	\item	Thread~0 resumes execution, completing its read-side
 		critical section with inconsistent data.
 	\item	Thread~0 invokes \co{read_seqretry()}, which incorrectly
 		concludes that Thread~0 has seen a consistent view of
 		the data protected by the sequence lock.
 	\end{enumerate}

 	The Linux kernel uses sequence locking for things that are
 	updated rarely, with time-of-day information being a case
 	in point.
 	This information is updated at most once per millisecond,
 	so that seven weeks would be required to overflow the counter.
 	If a kernel thread was preempted for seven weeks, the Linux
 	kernel's soft-lockup code would be emitting warnings every two
 	minutes for that entire time.

 	In contrast, with a 64-bit counter, more than five centuries
 	would be required to overflow, even given an update every
 	\emph{nano}second.
 	Therefore, this implementation uses a type for \co{->seq}
 	that is 64 bits on 64-bit systems.
 } \QuickQuizEnd

 \begin{figure}[tbp]
 { \scriptsize
 \begin{verbbox}
  1 struct route_entry {
  2   struct route_entry *re_next;
  3   unsigned long addr;
  4   unsigned long iface;
  5   int re_freed;
  6 };
  7 struct route_entry route_list;
  8 DEFINE_SEQ_LOCK(sl);
  9
 10 unsigned long route_lookup(unsigned long addr)
 11 {
 12   struct route_entry *rep;
 13   struct route_entry **repp;
 14   unsigned long ret;
 15   unsigned long s;
 16
 17 retry:
 18   s = read_seqbegin(&sl);
 19   repp = &route_list.re_next;
 20   do {
 21     rep = ACCESS_ONCE(*repp);
 22     if (rep == NULL) {
 23       if (read_seqretry(&sl, s))
 24         goto retry;
 25       return ULONG_MAX;
 26     }
 27     repp = &rep->re_next;
 28   } while (rep->addr != addr);
 29   if (ACCESS_ONCE(rep->re_freed))
 30     abort();
 31   ret = rep->iface;
 32   if (read_seqretry(&sl, s))
 33     goto retry;
 34   return ret;
 35 }
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{Sequence-Locked Pre-BSD Routing Table Lookup (BUGGY!!!)}
 \label{fig:defer:Sequence-Locked Pre-BSD Routing Table Lookup}
 \end{figure}

 \begin{figure}[tbp]
 { \scriptsize
 \begin{verbbox}
  1 int route_add(unsigned long addr,
  2               unsigned long interface)
  3 {
  4   struct route_entry *rep;
  5
  6   rep = malloc(sizeof(*rep));
  7   if (!rep)
  8     return -ENOMEM;
  9   rep->addr = addr;
 10   rep->iface = interface;
 11   rep->re_freed = 0;
 12   write_seqlock(&sl);
 13   rep->re_next = route_list.re_next;
 14   route_list.re_next = rep;
 15   write_sequnlock(&sl);
 16   return 0;
 17 }
 18
 19 int route_del(unsigned long addr)
 20 {
 21   struct route_entry *rep;
 22   struct route_entry **repp;
 23
 24   write_seqlock(&sl);
 25   repp = &route_list.re_next;
 26   for (;;) {
 27     rep = *repp;
 28     if (rep == NULL)
 29       break;
 30     if (rep->addr == addr) {
 31       *repp = rep->re_next;
 32       write_sequnlock(&sl);
 33       smp_mb();
 34       rep->re_freed = 1;
 35       free(rep);
 36       return 0;
 37     }
 38     repp = &rep->re_next;
 39   }
 40   write_sequnlock(&sl);
 41   return -ENOENT;
 42 }
 \end{verbbox}
 }
 \centering
 \theverbbox
 \caption{Sequence-Locked Pre-BSD Routing Table Add/Delete (BUGGY!!!)}
 \label{fig:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete}
 \end{figure}

 So what happens when sequence locking is applied to the Pre-BSD
 routing table?
 Figure~\ref{fig:defer:Sequence-Locked Pre-BSD Routing Table Lookup}
 shows the data structures and \co{route_lookup()}, and
 Figure~\ref{fig:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete}
 shows \co{route_add()} and \co{route_del()} (\path{route_seqlock.c}).
 This implementation is once again similar to its counterparts in earlier
 sections, so only the differences will be highlighted.

 In
 Figure~\ref{fig:defer:Sequence-Locked Pre-BSD Routing Table Lookup},
 line~5 adds \co{->re_freed}, which is checked on lines~29 and~30.
 Line~8 adds a sequence lock, which is used by \co{route_lookup()}
 on lines~18, 23, and~32, with lines~24 and~33 branching back to
 the \co{retry} label on line~17.
 The effect is to retry any lookup that runs concurrently with an update.

 In
 Figure~\ref{fig:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete},
 lines~12, 15, 24, and~40 acquire and release the sequence lock,
 while lines~11, 33, and~44 handle \co{->re_freed}.
 This implementation is therefore quite straightforward.

 \begin{figure}[tb]
 \centering
 \resizebox{2.5in}{!}{\includegraphics{CodeSamples/defer/perf-seqlock}}
 \caption{Pre-BSD Routing Table Protected by Sequence Locking}
 \label{fig:defer:Pre-BSD Routing Table Protected by Sequence Locking}
 \end{figure}

 It also performs better on the read-only workload, as can be seen in
 Figure~\ref{fig:defer:Pre-BSD Routing Table Protected by Sequence Locking},
 though its performance is still far from ideal.

 Unfortunately, it also suffers use-after-free failures.
 The problem is that the reader might encounter a segmentation violation
 due to accessing an already-freed structure before it comes to the
 \co{read_seqretry()}.

 \QuickQuiz{}
 	Can this bug be fixed?
 	In other words, can you use sequence locks as the \emph{only}
 	synchronization mechanism protecting a linked list supporting
 	concurrent addition, deletion, and lookup?
 \QuickQuizAnswer{
 	One trivial way of accomplishing this is to surround all
 	accesses, including the read-only accesses, with
 	\co{write_seqlock()} and \co{write_sequnlock()}.
 	Of course, this solution also prohibits all read-side
 	parallelism, resulting in massive lock contention,
 	and furthermore could just as easily be implemented
 	using simple locking.

 	If you do come up with a solution that uses \co{read_seqbegin()}
 	and \co{read_seqretry()} to protect read-side accesses, make
 	sure that you correctly handle the following sequence of events:

 	\begin{enumerate}
 	\item	CPU~0 is traversing the linked list, and picks up a pointer
 		to list element~A.
 	\item	CPU~1 removes element~A from the list and frees it.
 	\item	CPU~2 allocates an unrelated data structure, and gets
 		the memory formerly occupied by element~A.
 		In this unrelated data structure, the memory previously
 		used for element~A's \co{->next} pointer is now occupied
 		by a floating-point number.
 	\item	CPU~0 picks up what used to be element~A's \co{->next}
 		pointer, gets random bits, and therefore gets a
 		segmentation fault.
 	\end{enumerate}

 	One way to protect against this sort of problem requires use
 	of ``type-safe memory'', which will be discussed in
 	Section~\ref{sec:defer:RCU is a Way of Providing Type-Safe Memory}.
 	But in that case, you would be using some other synchronization
 	mechanism in addition to sequence locks!
 } \QuickQuizEnd

 Both the read-side and write-side critical sections of a sequence lock
 can be thought of as transactions, and sequence locking therefore
 can be thought of as a limited form of transactional memory, which
 will be discussed in Section~\ref{sec:future:Transactional Memory}.
 The limitations of sequence locking are: (1)~Sequence locking restricts
 updates and (2)~sequence locking does not permit traversal of pointers
 to objects that might be freed by updaters.
 These limitations are of course overcome by transactional memory, but
 can also be overcome by combining other synchronization primitives
 with sequence locking.

 Sequence locks allow writers to defer readers, but not vice versa.
 This can result in unfairness and even starvation
 in writer-heavy workloads.
 On the other hand, in the absence of writers, sequence-lock readers are
 reasonably fast and scale linearly.
 It is only human to want the best of both worlds: fast readers without
 the possibility of read-side failure, let alone starvation.
 In addition, it would also be nice to overcome sequence locking's limitations
 with pointers.
 The following section presents a synchronization mechanism with exactly
 these properties.
	% defer/seqlock.tex

	\section{Sequence Locks}
	\label{sec:defer:Sequence Locks}

	\begin{figure}[tb]
	\centering
	\resizebox{3in}{!}{\includegraphics{cartoons/r-2014-Start-over}}
	\caption{Reader And Uncooperative Sequence Lock}
	\label{fig:defer:Reader And Uncooperative Sequence Lock}
	\end{figure}

	Sequence locks are used in the Linux kernel for read-mostly data that
	must be seen in a consistent state by readers.
	However, unlike reader-writer locking, readers do not exclude writers.
	Instead, like hazard pointers, sequence locks force readers to
	\emph{retry} an operation if they detect activity from a concurrent writer.
	As can be seen from
	Figure~\ref{fig:defer:Reader And Uncooperative Sequence Lock},
	it is important to design code using sequence locks so that readers
	very rarely need to retry.

	\QuickQuiz{}
	Why isn't this sequence-lock discussion in Chapter~\ref{chp:Locking},
	you know, the one on \emph{locking}?
	\QuickQuizAnswer{
	The sequence-lock mechanism is really a combination of two
	separate synchronization mechanisms, sequence counts and
	locking.
	In fact, the sequence-count mechanism is available separately
	in the Linux kernel via the
	\co{write_seqcount_begin()} and \co{write_seqcount_end()}
	primitives.

	However, the combined \co{write_seqlock()} and
	\co{write_sequnlock()} primitives are used much more heavily
	in the Linux kernel.
	More importantly, many more people will understand what you
	mean if you say ``sequence lock'' than if you say
	``sequence count''.

	So this section is entitled ``Sequence Locks'' so that people
	will understand what it is about just from the title, and
	it appears in the ``Deferred Processing'' because (1) of the
	emphasis on the ``sequence count'' aspect of ``sequence locks''
	and (2) because a ``sequence lock'' is much more than merely
	a lock.
	} \QuickQuizEnd

	\begin{figure}[bp]
	{ \scriptsize
	\begin{verbbox}
	1 do {
	2 seq = read_seqbegin(&test_seqlock);
	3 /* read-side access. */
	4 } while (read_seqretry(&test_seqlock, seq));
	\end{verbbox}
	}
	\centering
	\theverbbox
	\caption{Sequence-Locking Reader}
	\label{fig:defer:Sequence-Locking Reader}
	\end{figure}

	\begin{figure}[bp]
	{ \scriptsize
	\begin{verbbox}
	1 write_seqlock(&test_seqlock);
	2 /* Update */
	3 write_sequnlock(&test_seqlock);
	\end{verbbox}
	}
	\centering
	\theverbbox
	\caption{Sequence-Locking Writer}
	\label{fig:defer:Sequence-Locking Writer}
	\end{figure}

	The key component of sequence locking is the sequence number, which has
	an even value in the absence of updaters and an odd value if there
	is an update in progress.
	Readers can then snapshot the value before and after each access.
	If either snapshot has an odd value, or if the two snapshots differ,
	there has been a concurrent update, and the reader must discard
	the results of the access and then retry it.
	Readers therefore use the \co{read_seqbegin()} and \co{read_seqretry()}
	functions shown in Figure~\ref{fig:defer:Sequence-Locking Reader}
	when accessing data protected by a sequence lock.
	Writers must increment the value before and after each update,
	and only one writer is permitted at a given time.
	Writers therefore use the \co{write_seqlock()} and \co{write_sequnlock()}
	functions shown in Figure~\ref{fig:defer:Sequence-Locking Writer}
	when updating data protected by a sequence lock.

	As a result, sequence-lock-protected data can have an arbitrarily
	large number of concurrent readers, but only one writer at a time.
	Sequence locking is used in the Linux kernel to protect calibration
	quantities used for timekeeping.
	It is also used in pathname traversal to detect concurrent rename operations.

	\begin{figure}[tb]
	{ \scriptsize
	\begin{verbbox}
	1 typedef struct {
	2 unsigned long seq;
	3 spinlock_t lock;
	4 } seqlock_t;
	5
	6 static void seqlock_init(seqlock_t *slp)
	7 {
	8 slp->seq = 0;
	9 spin_lock_init(&slp->lock);
	10 }
	11
	12 static unsigned long read_seqbegin(seqlock_t *slp)
	13 {
	14 unsigned long s;
	15
	16 s = ACCESS_ONCE(slp->seq);
	17 smp_mb();
	18 return s & ~0x1UL;
	19 }
	20
	21 static int read_seqretry(seqlock_t *slp,
	22 unsigned long oldseq)
	23 {
	24 unsigned long s;
	25
	26 smp_mb();
	27 s = ACCESS_ONCE(slp->seq);
	28 return s != oldseq;
	29 }
	30
	31 static void write_seqlock(seqlock_t *slp)
	32 {
	33 spin_lock(&slp->lock);
	34 ++slp->seq;
	35 smp_mb();
	36 }
	37
	38 static void write_sequnlock(seqlock_t *slp)
	39 {
	40 smp_mb();
	41 ++slp->seq;
	42 spin_unlock(&slp->lock);
	43 }
	\end{verbbox}
	}
	\centering
	\theverbbox
	\caption{Sequence-Locking Implementation}
	\label{fig:defer:Sequence-Locking Implementation}
	\end{figure}

	A simple implementation of sequence locks is shown in
	Figure~\ref{fig:defer:Sequence-Locking Implementation}
	(\path{seqlock.h}).
	The \co{seqlock_t} data structure is shown on lines~1-4, and contains
	the sequence number along with a lock to serialize writers.
	Lines~6-10 show \co{seqlock_init()}, which, as the name indicates,
	initializes a \co{seqlock_t}.

	Lines~12-19 show \co{read_seqbegin()}, which begins a sequence-lock
	read-side critical section.
	Line~16 takes a snapshot of the sequence counter, and line~17 orders
	this snapshot operation before the caller's critical section.
	Finally, line~18 returns the value of the snapshot (with the least-significant
	bit cleared), which the caller
	will pass to a later call to \co{read_seqretry()}.

	\QuickQuiz{}
	Why not have \co{read_seqbegin()} in
	Figure~\ref{fig:defer:Sequence-Locking Implementation}
	check for the low-order bit being set, and retry
	internally, rather than allowing a doomed read to start?
	\QuickQuizAnswer{
	That would be a legitimate implementation.
	However, if the workload is read-mostly, it would likely
	increase the overhead of the common-case successful read,
	which could be counter-productive.
	However, given a sufficiently large fraction of updates
	and sufficiently high-overhead readers, having the
	check internal to \co{read_seqbegin()} might be preferable.
	} \QuickQuizEnd

	Lines~21-29 show \co{read_seqretry()}, which returns true if there
	were no writers present since the time of the corresponding
	call to \co{read_seqbegin()}.
	Line~26 orders the caller's prior critical section before line~27's
	fetch of the new snapshot of the sequence counter.
	Finally, line~28 checks that the sequence counter has not changed,
	in other words, that there has been no writer, and returns true if so.

	\QuickQuiz{}
	Why is the \co{smp_mb()} on line~26 of
	Figure~\ref{fig:defer:Sequence-Locking Implementation}
	needed?
	\QuickQuizAnswer{
	If it was omitted, both the compiler and the CPU would be
	within their rights to move the critical section preceding
	the call to \co{read_seqretry()} down below this function.
	This would prevent the sequence lock from protecting the
	critical section.
	The \co{smp_mb()} primitive prevents such reordering.
	} \QuickQuizEnd

	\QuickQuiz{}
	Can't weaker memory barriers be used in the code in
	Figure~\ref{fig:defer:Sequence-Locking Implementation}?
	\QuickQuizAnswer{
	In older versions of the Linux kernel, no.

	In very new versions of the Linux kernel, line~16 could use
	\co{smp_load_acquire()} instead of \co{ACCESS_ONCE()}, which
	in turn would allow the \co{smp_mb()} on line~17 to be dropped.
	Similarly, line~41 could use an \co{smp_store_release()}, for
	example, as follows: \\
	\co{smp_store_release(&slp->seq, ACCESS_ONCE(slp->seq) + 1);} \\
	This would allow the \co{smp_mb()} on line~40 to be dropped.
	} \QuickQuizEnd

	\QuickQuiz{}
	What prevents sequence-locking updaters from starving readers?
	\QuickQuizAnswer{
	Nothing.
	This is one of the weaknesses of sequence locking, and as a
	result, you should use sequence locking only in read-mostly
	situations.
	Unless of course read-side starvation is acceptable in your
	situation, in which case, go wild with the sequence-locking updates!
	} \QuickQuizEnd

	Lines~31-36 show \co{write_seqlock()}, which simply acquires the lock,
	increments the sequence number, and executes a memory barrier to ensure
	that this increment is ordered before the caller's critical section.
	Lines~38-43 show \co{write_sequnlock()}, which executes a memory barrier
	to ensure that the caller's critical section is ordered before the
	increment of the sequence number on line~44, then releases the lock.

	\QuickQuiz{}
	What if something else serializes writers, so that the lock
	is not needed?
	\QuickQuizAnswer{
	In this case, the \co{->lock} field could be omitted, as it
	is in \co{seqcount_t} in the Linux kernel.
	} \QuickQuizEnd

	\QuickQuiz{}
	Why isn't \co{seq} on line~2 of
	Figure~\ref{fig:defer:Sequence-Locking Implementation}
	\co{unsigned} rather than \co{unsigned long}?
	After all, if \co{unsigned} is good enough for the Linux
	kernel, shouldn't it be good enough for everyone?
	\QuickQuizAnswer{
	Not at all.
	The Linux kernel has a number of special attributes that allow
	it to ignore the following sequence of events:
	\begin{enumerate}
	\item Thread~0 executes \co{read_seqbegin()}, picking up
	\co{->seq} in line~16, noting that the value is even,
	and thus returning to the caller.
	\item Thread~0 starts executing its read-side critical section,
	but is then preempted for a long time.
	\item Other threads repeatedly invoke \co{write_seqlock()} and
	\co{write_sequnlock()}, until the value of \co{->seq}
	overflows back to the value that Thread~0 fetched.
	\item Thread~0 resumes execution, completing its read-side
	critical section with inconsistent data.
	\item Thread~0 invokes \co{read_seqretry()}, which incorrectly
	concludes that Thread~0 has seen a consistent view of
	the data protected by the sequence lock.
	\end{enumerate}

	The Linux kernel uses sequence locking for things that are
	updated rarely, with time-of-day information being a case
	in point.
	This information is updated at most once per millisecond,
	so that seven weeks would be required to overflow the counter.
	If a kernel thread was preempted for seven weeks, the Linux
	kernel's soft-lockup code would be emitting warnings every two
	minutes for that entire time.

	In contrast, with a 64-bit counter, more than five centuries
	would be required to overflow, even given an update every
	\emph{nano}second.
	Therefore, this implementation uses a type for \co{->seq}
	that is 64 bits on 64-bit systems.
	} \QuickQuizEnd

	\begin{figure}[tbp]
	{ \scriptsize
	\begin{verbbox}
	1 struct route_entry {
	2 struct route_entry *re_next;
	3 unsigned long addr;
	4 unsigned long iface;
	5 int re_freed;
	6 };
	7 struct route_entry route_list;
	8 DEFINE_SEQ_LOCK(sl);
	9
	10 unsigned long route_lookup(unsigned long addr)
	11 {
	12 struct route_entry *rep;
	13 struct route_entry **repp;
	14 unsigned long ret;
	15 unsigned long s;
	16
	17 retry:
	18 s = read_seqbegin(&sl);
	19 repp = &route_list.re_next;
	20 do {
	21 rep = ACCESS_ONCE(*repp);
	22 if (rep == NULL) {
	23 if (read_seqretry(&sl, s))
	24 goto retry;
	25 return ULONG_MAX;
	26 }
	27 repp = &rep->re_next;
	28 } while (rep->addr != addr);
	29 if (ACCESS_ONCE(rep->re_freed))
	30 abort();
	31 ret = rep->iface;
	32 if (read_seqretry(&sl, s))
	33 goto retry;
	34 return ret;
	35 }
	\end{verbbox}
	}
	\centering
	\theverbbox
	\caption{Sequence-Locked Pre-BSD Routing Table Lookup (BUGGY!!!)}
	\label{fig:defer:Sequence-Locked Pre-BSD Routing Table Lookup}
	\end{figure}

	\begin{figure}[tbp]
	{ \scriptsize
	\begin{verbbox}
	1 int route_add(unsigned long addr,
	2 unsigned long interface)
	3 {
	4 struct route_entry *rep;
	5
	6 rep = malloc(sizeof(*rep));
	7 if (!rep)
	8 return -ENOMEM;
	9 rep->addr = addr;
	10 rep->iface = interface;
	11 rep->re_freed = 0;
	12 write_seqlock(&sl);
	13 rep->re_next = route_list.re_next;
	14 route_list.re_next = rep;
	15 write_sequnlock(&sl);
	16 return 0;
	17 }
	18
	19 int route_del(unsigned long addr)
	20 {
	21 struct route_entry *rep;
	22 struct route_entry **repp;
	23
	24 write_seqlock(&sl);
	25 repp = &route_list.re_next;
	26 for (;;) {
	27 rep = *repp;
	28 if (rep == NULL)
	29 break;
	30 if (rep->addr == addr) {
	31 *repp = rep->re_next;
	32 write_sequnlock(&sl);
	33 smp_mb();
	34 rep->re_freed = 1;
	35 free(rep);
	36 return 0;
	37 }
	38 repp = &rep->re_next;
	39 }
	40 write_sequnlock(&sl);
	41 return -ENOENT;
	42 }
	\end{verbbox}
	}
	\centering
	\theverbbox
	\caption{Sequence-Locked Pre-BSD Routing Table Add/Delete (BUGGY!!!)}
	\label{fig:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete}
	\end{figure}

	So what happens when sequence locking is applied to the Pre-BSD
	routing table?
	Figure~\ref{fig:defer:Sequence-Locked Pre-BSD Routing Table Lookup}
	shows the data structures and \co{route_lookup()}, and
	Figure~\ref{fig:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete}
	shows \co{route_add()} and \co{route_del()} (\path{route_seqlock.c}).
	This implementation is once again similar to its counterparts in earlier
	sections, so only the differences will be highlighted.

	In
	Figure~\ref{fig:defer:Sequence-Locked Pre-BSD Routing Table Lookup},
	line~5 adds \co{->re_freed}, which is checked on lines~29 and~30.
	Line~8 adds a sequence lock, which is used by \co{route_lookup()}
	on lines~18, 23, and~32, with lines~24 and~33 branching back to
	the \co{retry} label on line~17.
	The effect is to retry any lookup that runs concurrently with an update.

	In
	Figure~\ref{fig:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete},
	lines~12, 15, 24, and~40 acquire and release the sequence lock,
	while lines~11, 33, and~44 handle \co{->re_freed}.
	This implementation is therefore quite straightforward.

	\begin{figure}[tb]
	\centering
	\resizebox{2.5in}{!}{\includegraphics{CodeSamples/defer/perf-seqlock}}
	\caption{Pre-BSD Routing Table Protected by Sequence Locking}
	\label{fig:defer:Pre-BSD Routing Table Protected by Sequence Locking}
	\end{figure}

	It also performs better on the read-only workload, as can be seen in
	Figure~\ref{fig:defer:Pre-BSD Routing Table Protected by Sequence Locking},
	though its performance is still far from ideal.

	Unfortunately, it also suffers use-after-free failures.
	The problem is that the reader might encounter a segmentation violation
	due to accessing an already-freed structure before it comes to the
	\co{read_seqretry()}.

	\QuickQuiz{}
	Can this bug be fixed?
	In other words, can you use sequence locks as the \emph{only}
	synchronization mechanism protecting a linked list supporting
	concurrent addition, deletion, and lookup?
	\QuickQuizAnswer{
	One trivial way of accomplishing this is to surround all
	accesses, including the read-only accesses, with
	\co{write_seqlock()} and \co{write_sequnlock()}.
	Of course, this solution also prohibits all read-side
	parallelism, resulting in massive lock contention,
	and furthermore could just as easily be implemented
	using simple locking.

	If you do come up with a solution that uses \co{read_seqbegin()}
	and \co{read_seqretry()} to protect read-side accesses, make
	sure that you correctly handle the following sequence of events:

	\begin{enumerate}
	\item CPU~0 is traversing the linked list, and picks up a pointer
	to list element~A.
	\item CPU~1 removes element~A from the list and frees it.
	\item CPU~2 allocates an unrelated data structure, and gets
	the memory formerly occupied by element~A.
	In this unrelated data structure, the memory previously
	used for element~A's \co{->next} pointer is now occupied
	by a floating-point number.
	\item CPU~0 picks up what used to be element~A's \co{->next}
	pointer, gets random bits, and therefore gets a
	segmentation fault.
	\end{enumerate}

	One way to protect against this sort of problem requires use
	of ``type-safe memory'', which will be discussed in
	Section~\ref{sec:defer:RCU is a Way of Providing Type-Safe Memory}.
	But in that case, you would be using some other synchronization
	mechanism in addition to sequence locks!
	} \QuickQuizEnd

	Both the read-side and write-side critical sections of a sequence lock
	can be thought of as transactions, and sequence locking therefore
	can be thought of as a limited form of transactional memory, which
	will be discussed in Section~\ref{sec:future:Transactional Memory}.
	The limitations of sequence locking are: (1)~Sequence locking restricts
	updates and (2)~sequence locking does not permit traversal of pointers
	to objects that might be freed by updaters.
	These limitations are of course overcome by transactional memory, but
	can also be overcome by combining other synchronization primitives
	with sequence locking.

	Sequence locks allow writers to defer readers, but not vice versa.
	This can result in unfairness and even starvation
	in writer-heavy workloads.
	On the other hand, in the absence of writers, sequence-lock readers are
	reasonably fast and scale linearly.
	It is only human to want the best of both worlds: fast readers without
	the possibility of read-side failure, let alone starvation.
	In addition, it would also be nice to overcome sequence locking's limitations
	with pointers.
	The following section presents a synchronization mechanism with exactly
	these properties.