| % defer/seqlock.tex |
| |
| \section{Sequence Locks} |
| \label{sec:defer:Sequence Locks} |
| |
| \begin{figure}[tb] |
| \centering |
| \resizebox{3in}{!}{\includegraphics{cartoons/r-2014-Start-over}} |
| \caption{Reader And Uncooperative Sequence Lock} |
| \label{fig:defer:Reader And Uncooperative Sequence Lock} |
| \end{figure} |
| |
| Sequence locks are used in the Linux kernel for read-mostly data that |
| must be seen in a consistent state by readers. |
| However, unlike reader-writer locking, readers do not exclude writers. |
| Instead, like hazard pointers, sequence locks force readers to |
| \emph{retry} an operation if they detect activity from a concurrent writer. |
| As can be seen from |
| Figure~\ref{fig:defer:Reader And Uncooperative Sequence Lock}, |
| it is important to design code using sequence locks so that readers |
| very rarely need to retry. |
| |
| \QuickQuiz{} |
| Why isn't this sequence-lock discussion in Chapter~\ref{chp:Locking}, |
| you know, the one on \emph{locking}? |
| \QuickQuizAnswer{ |
| The sequence-lock mechanism is really a combination of two |
| separate synchronization mechanisms, sequence counts and |
| locking. |
| In fact, the sequence-count mechanism is available separately |
| in the Linux kernel via the |
| \co{write_seqcount_begin()} and \co{write_seqcount_end()} |
| primitives. |
| |
| However, the combined \co{write_seqlock()} and |
| \co{write_sequnlock()} primitives are used much more heavily |
| in the Linux kernel. |
| More importantly, many more people will understand what you |
| mean if you say ``sequence lock'' than if you say |
| ``sequence count''. |
| |
| So this section is entitled ``Sequence Locks'' so that people |
| will understand what it is about just from the title, and |
| it appears in the ``Deferred Processing'' because (1) of the |
| emphasis on the ``sequence count'' aspect of ``sequence locks'' |
| and (2) because a ``sequence lock'' is much more than merely |
| a lock. |
| } \QuickQuizEnd |
| |
| \begin{figure}[bp] |
| { \scriptsize |
| \begin{verbbox} |
| 1 do { |
| 2 seq = read_seqbegin(&test_seqlock); |
| 3 /* read-side access. */ |
| 4 } while (read_seqretry(&test_seqlock, seq)); |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{Sequence-Locking Reader} |
| \label{fig:defer:Sequence-Locking Reader} |
| \end{figure} |
| |
| \begin{figure}[bp] |
| { \scriptsize |
| \begin{verbbox} |
| 1 write_seqlock(&test_seqlock); |
| 2 /* Update */ |
| 3 write_sequnlock(&test_seqlock); |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{Sequence-Locking Writer} |
| \label{fig:defer:Sequence-Locking Writer} |
| \end{figure} |
| |
| The key component of sequence locking is the sequence number, which has |
| an even value in the absence of updaters and an odd value if there |
| is an update in progress. |
| Readers can then snapshot the value before and after each access. |
| If either snapshot has an odd value, or if the two snapshots differ, |
| there has been a concurrent update, and the reader must discard |
| the results of the access and then retry it. |
| Readers therefore use the \co{read_seqbegin()} and \co{read_seqretry()} |
| functions shown in Figure~\ref{fig:defer:Sequence-Locking Reader} |
| when accessing data protected by a sequence lock. |
| Writers must increment the value before and after each update, |
| and only one writer is permitted at a given time. |
| Writers therefore use the \co{write_seqlock()} and \co{write_sequnlock()} |
| functions shown in Figure~\ref{fig:defer:Sequence-Locking Writer} |
| when updating data protected by a sequence lock. |
| |
| As a result, sequence-lock-protected data can have an arbitrarily |
| large number of concurrent readers, but only one writer at a time. |
| Sequence locking is used in the Linux kernel to protect calibration |
| quantities used for timekeeping. |
| It is also used in pathname traversal to detect concurrent rename operations. |
| |
| \begin{figure}[tb] |
| { \scriptsize |
| \begin{verbbox} |
| 1 typedef struct { |
| 2 unsigned long seq; |
| 3 spinlock_t lock; |
| 4 } seqlock_t; |
| 5 |
| 6 static void seqlock_init(seqlock_t *slp) |
| 7 { |
| 8 slp->seq = 0; |
| 9 spin_lock_init(&slp->lock); |
| 10 } |
| 11 |
| 12 static unsigned long read_seqbegin(seqlock_t *slp) |
| 13 { |
| 14 unsigned long s; |
| 15 |
| 16 s = ACCESS_ONCE(slp->seq); |
| 17 smp_mb(); |
| 18 return s & ~0x1UL; |
| 19 } |
| 20 |
| 21 static int read_seqretry(seqlock_t *slp, |
| 22 unsigned long oldseq) |
| 23 { |
| 24 unsigned long s; |
| 25 |
| 26 smp_mb(); |
| 27 s = ACCESS_ONCE(slp->seq); |
| 28 return s != oldseq; |
| 29 } |
| 30 |
| 31 static void write_seqlock(seqlock_t *slp) |
| 32 { |
| 33 spin_lock(&slp->lock); |
| 34 ++slp->seq; |
| 35 smp_mb(); |
| 36 } |
| 37 |
| 38 static void write_sequnlock(seqlock_t *slp) |
| 39 { |
| 40 smp_mb(); |
| 41 ++slp->seq; |
| 42 spin_unlock(&slp->lock); |
| 43 } |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{Sequence-Locking Implementation} |
| \label{fig:defer:Sequence-Locking Implementation} |
| \end{figure} |
| |
| A simple implementation of sequence locks is shown in |
| Figure~\ref{fig:defer:Sequence-Locking Implementation} |
| (\path{seqlock.h}). |
| The \co{seqlock_t} data structure is shown on lines~1-4, and contains |
| the sequence number along with a lock to serialize writers. |
| Lines~6-10 show \co{seqlock_init()}, which, as the name indicates, |
| initializes a \co{seqlock_t}. |
| |
| Lines~12-19 show \co{read_seqbegin()}, which begins a sequence-lock |
| read-side critical section. |
| Line~16 takes a snapshot of the sequence counter, and line~17 orders |
| this snapshot operation before the caller's critical section. |
| Finally, line~18 returns the value of the snapshot (with the least-significant |
| bit cleared), which the caller |
| will pass to a later call to \co{read_seqretry()}. |
| |
| \QuickQuiz{} |
| Why not have \co{read_seqbegin()} in |
| Figure~\ref{fig:defer:Sequence-Locking Implementation} |
| check for the low-order bit being set, and retry |
| internally, rather than allowing a doomed read to start? |
| \QuickQuizAnswer{ |
| That would be a legitimate implementation. |
| However, if the workload is read-mostly, it would likely |
| increase the overhead of the common-case successful read, |
| which could be counter-productive. |
| However, given a sufficiently large fraction of updates |
| and sufficiently high-overhead readers, having the |
| check internal to \co{read_seqbegin()} might be preferable. |
| } \QuickQuizEnd |
| |
| Lines~21-29 show \co{read_seqretry()}, which returns true if there |
| were no writers present since the time of the corresponding |
| call to \co{read_seqbegin()}. |
| Line~26 orders the caller's prior critical section before line~27's |
| fetch of the new snapshot of the sequence counter. |
| Finally, line~28 checks that the sequence counter has not changed, |
| in other words, that there has been no writer, and returns true if so. |
| |
| \QuickQuiz{} |
| Why is the \co{smp_mb()} on line~26 of |
| Figure~\ref{fig:defer:Sequence-Locking Implementation} |
| needed? |
| \QuickQuizAnswer{ |
| If it was omitted, both the compiler and the CPU would be |
| within their rights to move the critical section preceding |
| the call to \co{read_seqretry()} down below this function. |
| This would prevent the sequence lock from protecting the |
| critical section. |
| The \co{smp_mb()} primitive prevents such reordering. |
| } \QuickQuizEnd |
| |
| \QuickQuiz{} |
| Can't weaker memory barriers be used in the code in |
| Figure~\ref{fig:defer:Sequence-Locking Implementation}? |
| \QuickQuizAnswer{ |
| In older versions of the Linux kernel, no. |
| |
| In very new versions of the Linux kernel, line~16 could use |
| \co{smp_load_acquire()} instead of \co{ACCESS_ONCE()}, which |
| in turn would allow the \co{smp_mb()} on line~17 to be dropped. |
| Similarly, line~41 could use an \co{smp_store_release()}, for |
| example, as follows: \\ |
| \co{smp_store_release(&slp->seq, ACCESS_ONCE(slp->seq) + 1);} \\ |
| This would allow the \co{smp_mb()} on line~40 to be dropped. |
| } \QuickQuizEnd |
| |
| \QuickQuiz{} |
| What prevents sequence-locking updaters from starving readers? |
| \QuickQuizAnswer{ |
| Nothing. |
| This is one of the weaknesses of sequence locking, and as a |
| result, you should use sequence locking only in read-mostly |
| situations. |
| Unless of course read-side starvation is acceptable in your |
| situation, in which case, go wild with the sequence-locking updates! |
| } \QuickQuizEnd |
| |
| Lines~31-36 show \co{write_seqlock()}, which simply acquires the lock, |
| increments the sequence number, and executes a memory barrier to ensure |
| that this increment is ordered before the caller's critical section. |
| Lines~38-43 show \co{write_sequnlock()}, which executes a memory barrier |
| to ensure that the caller's critical section is ordered before the |
| increment of the sequence number on line~44, then releases the lock. |
| |
| \QuickQuiz{} |
| What if something else serializes writers, so that the lock |
| is not needed? |
| \QuickQuizAnswer{ |
| In this case, the \co{->lock} field could be omitted, as it |
| is in \co{seqcount_t} in the Linux kernel. |
| } \QuickQuizEnd |
| |
| \QuickQuiz{} |
| Why isn't \co{seq} on line~2 of |
| Figure~\ref{fig:defer:Sequence-Locking Implementation} |
| \co{unsigned} rather than \co{unsigned long}? |
| After all, if \co{unsigned} is good enough for the Linux |
| kernel, shouldn't it be good enough for everyone? |
| \QuickQuizAnswer{ |
| Not at all. |
| The Linux kernel has a number of special attributes that allow |
| it to ignore the following sequence of events: |
| \begin{enumerate} |
| \item Thread~0 executes \co{read_seqbegin()}, picking up |
| \co{->seq} in line~16, noting that the value is even, |
| and thus returning to the caller. |
| \item Thread~0 starts executing its read-side critical section, |
| but is then preempted for a long time. |
| \item Other threads repeatedly invoke \co{write_seqlock()} and |
| \co{write_sequnlock()}, until the value of \co{->seq} |
| overflows back to the value that Thread~0 fetched. |
| \item Thread~0 resumes execution, completing its read-side |
| critical section with inconsistent data. |
| \item Thread~0 invokes \co{read_seqretry()}, which incorrectly |
| concludes that Thread~0 has seen a consistent view of |
| the data protected by the sequence lock. |
| \end{enumerate} |
| |
| The Linux kernel uses sequence locking for things that are |
| updated rarely, with time-of-day information being a case |
| in point. |
| This information is updated at most once per millisecond, |
| so that seven weeks would be required to overflow the counter. |
| If a kernel thread was preempted for seven weeks, the Linux |
| kernel's soft-lockup code would be emitting warnings every two |
| minutes for that entire time. |
| |
| In contrast, with a 64-bit counter, more than five centuries |
| would be required to overflow, even given an update every |
| \emph{nano}second. |
| Therefore, this implementation uses a type for \co{->seq} |
| that is 64 bits on 64-bit systems. |
| } \QuickQuizEnd |
| |
| \begin{figure}[tbp] |
| { \scriptsize |
| \begin{verbbox} |
| 1 struct route_entry { |
| 2 struct route_entry *re_next; |
| 3 unsigned long addr; |
| 4 unsigned long iface; |
| 5 int re_freed; |
| 6 }; |
| 7 struct route_entry route_list; |
| 8 DEFINE_SEQ_LOCK(sl); |
| 9 |
| 10 unsigned long route_lookup(unsigned long addr) |
| 11 { |
| 12 struct route_entry *rep; |
| 13 struct route_entry **repp; |
| 14 unsigned long ret; |
| 15 unsigned long s; |
| 16 |
| 17 retry: |
| 18 s = read_seqbegin(&sl); |
| 19 repp = &route_list.re_next; |
| 20 do { |
| 21 rep = ACCESS_ONCE(*repp); |
| 22 if (rep == NULL) { |
| 23 if (read_seqretry(&sl, s)) |
| 24 goto retry; |
| 25 return ULONG_MAX; |
| 26 } |
| 27 repp = &rep->re_next; |
| 28 } while (rep->addr != addr); |
| 29 if (ACCESS_ONCE(rep->re_freed)) |
| 30 abort(); |
| 31 ret = rep->iface; |
| 32 if (read_seqretry(&sl, s)) |
| 33 goto retry; |
| 34 return ret; |
| 35 } |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{Sequence-Locked Pre-BSD Routing Table Lookup (BUGGY!!!)} |
| \label{fig:defer:Sequence-Locked Pre-BSD Routing Table Lookup} |
| \end{figure} |
| |
| \begin{figure}[tbp] |
| { \scriptsize |
| \begin{verbbox} |
| 1 int route_add(unsigned long addr, |
| 2 unsigned long interface) |
| 3 { |
| 4 struct route_entry *rep; |
| 5 |
| 6 rep = malloc(sizeof(*rep)); |
| 7 if (!rep) |
| 8 return -ENOMEM; |
| 9 rep->addr = addr; |
| 10 rep->iface = interface; |
| 11 rep->re_freed = 0; |
| 12 write_seqlock(&sl); |
| 13 rep->re_next = route_list.re_next; |
| 14 route_list.re_next = rep; |
| 15 write_sequnlock(&sl); |
| 16 return 0; |
| 17 } |
| 18 |
| 19 int route_del(unsigned long addr) |
| 20 { |
| 21 struct route_entry *rep; |
| 22 struct route_entry **repp; |
| 23 |
| 24 write_seqlock(&sl); |
| 25 repp = &route_list.re_next; |
| 26 for (;;) { |
| 27 rep = *repp; |
| 28 if (rep == NULL) |
| 29 break; |
| 30 if (rep->addr == addr) { |
| 31 *repp = rep->re_next; |
| 32 write_sequnlock(&sl); |
| 33 smp_mb(); |
| 34 rep->re_freed = 1; |
| 35 free(rep); |
| 36 return 0; |
| 37 } |
| 38 repp = &rep->re_next; |
| 39 } |
| 40 write_sequnlock(&sl); |
| 41 return -ENOENT; |
| 42 } |
| \end{verbbox} |
| } |
| \centering |
| \theverbbox |
| \caption{Sequence-Locked Pre-BSD Routing Table Add/Delete (BUGGY!!!)} |
| \label{fig:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete} |
| \end{figure} |
| |
| So what happens when sequence locking is applied to the Pre-BSD |
| routing table? |
| Figure~\ref{fig:defer:Sequence-Locked Pre-BSD Routing Table Lookup} |
| shows the data structures and \co{route_lookup()}, and |
| Figure~\ref{fig:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete} |
| shows \co{route_add()} and \co{route_del()} (\path{route_seqlock.c}). |
| This implementation is once again similar to its counterparts in earlier |
| sections, so only the differences will be highlighted. |
| |
| In |
| Figure~\ref{fig:defer:Sequence-Locked Pre-BSD Routing Table Lookup}, |
| line~5 adds \co{->re_freed}, which is checked on lines~29 and~30. |
| Line~8 adds a sequence lock, which is used by \co{route_lookup()} |
| on lines~18, 23, and~32, with lines~24 and~33 branching back to |
| the \co{retry} label on line~17. |
| The effect is to retry any lookup that runs concurrently with an update. |
| |
| In |
| Figure~\ref{fig:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete}, |
| lines~12, 15, 24, and~40 acquire and release the sequence lock, |
| while lines~11, 33, and~44 handle \co{->re_freed}. |
| This implementation is therefore quite straightforward. |
| |
| \begin{figure}[tb] |
| \centering |
| \resizebox{2.5in}{!}{\includegraphics{CodeSamples/defer/perf-seqlock}} |
| \caption{Pre-BSD Routing Table Protected by Sequence Locking} |
| \label{fig:defer:Pre-BSD Routing Table Protected by Sequence Locking} |
| \end{figure} |
| |
| It also performs better on the read-only workload, as can be seen in |
| Figure~\ref{fig:defer:Pre-BSD Routing Table Protected by Sequence Locking}, |
| though its performance is still far from ideal. |
| |
| Unfortunately, it also suffers use-after-free failures. |
| The problem is that the reader might encounter a segmentation violation |
| due to accessing an already-freed structure before it comes to the |
| \co{read_seqretry()}. |
| |
| \QuickQuiz{} |
| Can this bug be fixed? |
| In other words, can you use sequence locks as the \emph{only} |
| synchronization mechanism protecting a linked list supporting |
| concurrent addition, deletion, and lookup? |
| \QuickQuizAnswer{ |
| One trivial way of accomplishing this is to surround all |
| accesses, including the read-only accesses, with |
| \co{write_seqlock()} and \co{write_sequnlock()}. |
| Of course, this solution also prohibits all read-side |
| parallelism, resulting in massive lock contention, |
| and furthermore could just as easily be implemented |
| using simple locking. |
| |
| If you do come up with a solution that uses \co{read_seqbegin()} |
| and \co{read_seqretry()} to protect read-side accesses, make |
| sure that you correctly handle the following sequence of events: |
| |
| \begin{enumerate} |
| \item CPU~0 is traversing the linked list, and picks up a pointer |
| to list element~A. |
| \item CPU~1 removes element~A from the list and frees it. |
| \item CPU~2 allocates an unrelated data structure, and gets |
| the memory formerly occupied by element~A. |
| In this unrelated data structure, the memory previously |
| used for element~A's \co{->next} pointer is now occupied |
| by a floating-point number. |
| \item CPU~0 picks up what used to be element~A's \co{->next} |
| pointer, gets random bits, and therefore gets a |
| segmentation fault. |
| \end{enumerate} |
| |
| One way to protect against this sort of problem requires use |
| of ``type-safe memory'', which will be discussed in |
| Section~\ref{sec:defer:RCU is a Way of Providing Type-Safe Memory}. |
| But in that case, you would be using some other synchronization |
| mechanism in addition to sequence locks! |
| } \QuickQuizEnd |
| |
| Both the read-side and write-side critical sections of a sequence lock |
| can be thought of as transactions, and sequence locking therefore |
| can be thought of as a limited form of transactional memory, which |
| will be discussed in Section~\ref{sec:future:Transactional Memory}. |
| The limitations of sequence locking are: (1)~Sequence locking restricts |
| updates and (2)~sequence locking does not permit traversal of pointers |
| to objects that might be freed by updaters. |
| These limitations are of course overcome by transactional memory, but |
| can also be overcome by combining other synchronization primitives |
| with sequence locking. |
| |
| Sequence locks allow writers to defer readers, but not vice versa. |
| This can result in unfairness and even starvation |
| in writer-heavy workloads. |
| On the other hand, in the absence of writers, sequence-lock readers are |
| reasonably fast and scale linearly. |
| It is only human to want the best of both worlds: fast readers without |
| the possibility of read-side failure, let alone starvation. |
| In addition, it would also be nice to overcome sequence locking's limitations |
| with pointers. |
| The following section presents a synchronization mechanism with exactly |
| these properties. |