| % defer/seqlock.tex |
| % mainfile: ../perfbook.tex |
| % SPDX-License-Identifier: CC-BY-SA-3.0 |
| |
| \section{Sequence Locks} |
| \label{sec:defer:Sequence Locks} |
| % |
| \epigraph{It'll be just like starting over.}{John Lennon} |
| |
| The published sequence-lock |
| record~\cite{10.1145/800212.806505,10.1145/359863.359878} |
| extends back as far as that of reader-writer locking, but sequence locks |
| nevertheless remain in relative obscurity. |
| Sequence locks are used in the Linux kernel for read-mostly data that |
| must be seen in a consistent state by readers. |
| However, unlike reader-writer locking, readers do not exclude writers. |
| Instead, like hazard pointers, sequence locks force readers to |
| \emph{retry} an operation if they detect activity from a concurrent writer. |
| As can be seen from |
| \cref{fig:defer:Reader And Uncooperative Sequence Lock}, |
| it is important to design code using sequence locks so that readers |
| very rarely need to retry. |
| |
| \begin{figure} |
| \centering |
| \resizebox{3in}{!}{\includegraphics{cartoons/r-2014-Start-over}} |
| \caption{Reader And Uncooperative Sequence Lock} |
| \label{fig:defer:Reader And Uncooperative Sequence Lock} |
| \end{figure} |
| |
| \QuickQuiz{ |
| Why isn't this sequence-lock discussion in \cref{chp:Locking}, |
| you know, the one on \emph{locking}? |
| }\QuickQuizAnswer{ |
| The sequence-lock mechanism is really a combination of two |
| separate synchronization mechanisms, sequence counts and |
| locking. |
| In fact, the sequence-count mechanism is available separately |
| in the Linux kernel via the |
| \co{write_seqcount_begin()} and \co{write_seqcount_end()} |
| primitives. |
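	For example, an updater that is already serialized by some
	other means might use these primitives roughly as follows,
	where \co{my_lock} and \co{my_seqcount} are hypothetical and
	the code is only a sketch:

\begin{VerbatimU}
spin_lock(&my_lock);                /* Writer serialization. */
write_seqcount_begin(&my_seqcount); /* Counter becomes odd. */
/* Carry out the update. */
write_seqcount_end(&my_seqcount);   /* Counter becomes even. */
spin_unlock(&my_lock);
\end{VerbatimU}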
| |
| However, the combined \co{write_seqlock()} and |
| \co{write_sequnlock()} primitives are used much more heavily |
| in the Linux kernel. |
| More importantly, many more people will understand what you |
| mean if you say ``sequence lock'' than if you say |
| ``sequence count''. |
| |
	So this section is entitled ``Sequence Locks'' so that people
	will understand what it is about just from the title, and
	it appears in the ``Deferred Processing'' chapter because of
	(1) the emphasis on the ``sequence count'' aspect of
	``sequence locks'' and (2) the fact that a ``sequence lock''
	is much more than merely a lock.
| }\QuickQuizEnd |
| |
| \begin{listing} |
| \begin{VerbatimL} |
| do { |
| seq = read_seqbegin(&test_seqlock); |
| /* read-side access. */ |
| } while (read_seqretry(&test_seqlock, seq)); |
| \end{VerbatimL} |
| \caption{Sequence-Locking Reader} |
| \label{lst:defer:Sequence-Locking Reader} |
| \end{listing} |
| |
| \begin{listing} |
| \begin{VerbatimL} |
| write_seqlock(&test_seqlock); |
| /* Update */ |
| write_sequnlock(&test_seqlock); |
| \end{VerbatimL} |
| \caption{Sequence-Locking Writer} |
| \label{lst:defer:Sequence-Locking Writer} |
| \end{listing} |
| |
| The key component of sequence locking is the sequence number, which has |
| an even value in the absence of updaters and an odd value if there |
| is an update in progress. |
| Readers can then snapshot the value before and after each access. |
| If either snapshot has an odd value, or if the two snapshots differ, |
| there has been a concurrent update, and the reader must discard |
| the results of the access and then retry it. |
| Readers therefore use the \co{read_seqbegin()} and \co{read_seqretry()} |
| functions shown in \cref{lst:defer:Sequence-Locking Reader} |
| when accessing data protected by a sequence lock. |
| Writers must increment the value before and after each update, |
| and only one writer is permitted at a given time. |
| Writers therefore use the \co{write_seqlock()} and \co{write_sequnlock()} |
| functions shown in \cref{lst:defer:Sequence-Locking Writer} |
| when updating data protected by a sequence lock. |
| |
| As a result, sequence-lock-protected data can have an arbitrarily |
| large number of concurrent readers, but only one writer at a time. |
| Sequence locking is used in the Linux kernel to protect calibration |
| quantities used for timekeeping. |
| It is also used in pathname traversal to detect concurrent rename operations. |
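
For example, consider a hypothetical pair of calibration values
that must always be observed as a matched pair.
The following sketch (with invented names throughout) combines the
reader and writer patterns from
\cref{lst:defer:Sequence-Locking Reader,lst:defer:Sequence-Locking Writer}
so that readers always see consistent values of the two fields:

\begin{VerbatimU}
struct calib {
        seqlock_t lock;
        unsigned long mult;   /* Must match ->shift. */
        unsigned long shift;
};

unsigned long calib_scale(struct calib *cp,
                          unsigned long cycles)
{
        unsigned long seq, mult, shift;

        do {    /* Retry if an update intervenes. */
                seq = read_seqbegin(&cp->lock);
                mult = cp->mult;
                shift = cp->shift;
        } while (read_seqretry(&cp->lock, seq));
        return (cycles * mult) >> shift;
}

void calib_update(struct calib *cp, unsigned long mult,
                  unsigned long shift)
{
        write_seqlock(&cp->lock);  /* Exclude other writers. */
        cp->mult = mult;
        cp->shift = shift;
        write_sequnlock(&cp->lock);
}
\end{VerbatimU}

Without the sequence lock, a reader running concurrently with
\co{calib_update()} might see the new \co{->mult} paired with the
old \co{->shift}, yielding a nonsensical result.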
| |
| \begin{listing} |
| \input{CodeSamples/defer/seqlock@impl.fcv} |
| \caption{Sequence-Locking Implementation} |
| \label{lst:defer:Sequence-Locking Implementation} |
| \end{listing} |
| |
| A simple implementation of sequence locks is shown in |
| \cref{lst:defer:Sequence-Locking Implementation} |
| (\path{seqlock.h}). |
| \begin{fcvref}[ln:defer:seqlock:impl:typedef] |
| The \co{seqlock_t} data structure is shown on |
| \clnrefrange{b}{e}, and contains |
| the sequence number along with a lock to serialize writers. |
| \end{fcvref} |
| \begin{fcvref}[ln:defer:seqlock:impl:init] |
| \Clnrefrange{b}{e} show \co{seqlock_init()}, which, as the name indicates, |
| initializes a \co{seqlock_t}. |
| \end{fcvref} |
| |
| \begin{fcvref}[ln:defer:seqlock:impl:read_seqbegin] |
| \Clnrefrange{b}{e} show \co{read_seqbegin()}, which begins a sequence-lock |
| \IXh{read-side}{critical section}. |
| \Clnref{fetch} takes a snapshot of the sequence counter, and |
| \clnref{mb} orders |
| this snapshot operation before the caller's critical section. |
| Finally, \clnref{ret} returns the value of the snapshot (with the least-significant |
| bit cleared), which the caller |
| will pass to a later call to \co{read_seqretry()}. |
| \end{fcvref} |
| |
| \QuickQuiz{ |
| Why not have \co{read_seqbegin()} in |
| \cref{lst:defer:Sequence-Locking Implementation} |
| check whether the sequence-number value is odd, and, if so, |
| retry internally rather than entering a doomed read-side critical |
| section? |
| }\QuickQuizAnswer{ |
| This would be a legitimate implementation. |
| |
| But please keep in mind that |
| \begin{enumerate*}[(1)] |
| \item This added check is a relatively expensive conditional branch, |
| \item It cannot be substituted for the later check done by |
| \tco{read_seqretry()}, which must happen after the |
| critical section completes, and |
| \item Sequence locking is intended for read-mostly workloads, |
| which means that this extra check would slow down the |
| common case. |
| \end{enumerate*} |
| |
| On the other hand, in an alternate universe having a sufficiently |
| large fraction of updates and sufficiently high-overhead readers, |
| having this internal-to-\co{read_seqbegin()} check might be |
| preferable. |
| |
| \begin{fcvref}[ln:defer:seqlock:impl] |
| Of course, the full memory barriers |
| on \clnref{read_seqbegin:mb,read_seqretry:mb} of |
| \cref{lst:defer:Sequence-Locking Implementation} |
| are quite heavyweight as instructions go, which suggests that the |
| overhead of the added check might be negligible. |
| Except that, in userspace code, the \co{membarrier()} system |
| call~\cite{Corbet2010membarrier,MathieuDesnoyers2017membarrier,Linuxmanpage2018sys-membarrier} |
| can be used to eliminate that \co{smp_mb()} overhead on the |
| read side in exchange for the added overhead of \co{membarrier()} |
| on the update side. |
| This feature is on its way into the C++ standard under the name |
| of ``asymmetric fences''~\cite{DavidGoldblatt2022asymmetricFences}. |
	Either way, this trick eliminates the read-side \co{smp_mb()}
| overhead, which in turn makes eliminating the check more |
| attractive, arriving again at the form of the code shown in |
	\cref{lst:defer:Sequence-Locking Implementation}.
| \end{fcvref} |
| |
| This same trick may be applied to Linux-kernel code using tools |
| such as \co{smp_call_function()}, at least in non-realtime builds |
| of the Linux kernel. |
| }\QuickQuizEnd |
| |
| \begin{fcvref}[ln:defer:seqlock:impl:read_seqretry] |
| \Clnrefrange{b}{e} show \co{read_seqretry()}, which returns \co{true} if there |
| was at least one writer since the time of the corresponding |
| call to \co{read_seqbegin()}. |
| \Clnref{mb} orders the caller's prior critical section before \clnref{fetch}'s |
| fetch of the new snapshot of the sequence counter. |
| \Clnref{ret} checks whether the sequence counter has changed, |
| in other words, whether there has been at least one writer, and returns |
| \co{true} if so. |
| \end{fcvref} |
| |
| \QuickQuizSeries{% |
| \QuickQuizB{ |
| Why is the \co{smp_mb()} on |
| \clnrefr{ln:defer:seqlock:impl:read_seqretry:mb} of |
| \cref{lst:defer:Sequence-Locking Implementation} |
| needed? |
| }\QuickQuizAnswerB{ |
| If it was omitted, both the compiler and the CPU would be |
| within their rights to move the critical section preceding |
| the call to \co{read_seqretry()} down below this function. |
| This would prevent the sequence lock from protecting the |
| critical section. |
| The \co{smp_mb()} primitive prevents such reordering. |
| }\QuickQuizEndB |
| % |
| \QuickQuizM{ |
| Can't weaker memory barriers be used in the code in |
| \cref{lst:defer:Sequence-Locking Implementation}? |
| }\QuickQuizAnswerM{ |
| In older versions of the Linux kernel, no. |
| |
| \begin{fcvref}[ln:defer:seqlock:impl] |
| In very new versions of the Linux kernel, |
| \clnref{read_seqbegin:fetch} could use |
| \co{smp_load_acquire()} instead of \co{READ_ONCE()}, which |
| in turn would allow the \co{smp_mb()} on |
| \clnref{read_seqbegin:mb} to be dropped. |
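	For example, the fetch might then be written as follows, with
	\co{s} standing in for the local snapshot variable:

\begin{VerbatimU}
s = smp_load_acquire(&slp->seq);
\end{VerbatimU}
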
| Similarly, \clnref{write_sequnlock:inc} could use an |
| \co{smp_store_release()}, for |
| example, as follows: |
| |
| \begin{VerbatimU} |
| smp_store_release(&slp->seq, READ_ONCE(slp->seq) + 1); |
| \end{VerbatimU} |
| |
| This would allow the \co{smp_mb()} on |
| \clnref{write_sequnlock:mb} to be dropped. |
| \end{fcvref} |
| }\QuickQuizEndM |
| % |
| \QuickQuizE{ |
| What prevents sequence-locking updaters from starving readers? |
| }\QuickQuizAnswerE{ |
| Nothing. |
| This is one of the weaknesses of sequence locking, and as a |
| result, you should use sequence locking only in read-mostly |
| situations. |
| Unless of course read-side starvation is acceptable in your |
| situation, in which case, go wild with the sequence-locking updates! |
| }\QuickQuizEndE |
| } |
| |
| \begin{fcvref}[ln:defer:seqlock:impl:write_seqlock] |
| \Clnrefrange{b}{e} show \co{write_seqlock()}, which simply acquires the lock, |
| increments the sequence number, and executes a memory barrier to ensure |
| that this increment is ordered before the caller's critical section. |
| \end{fcvref} |
| \begin{fcvref}[ln:defer:seqlock:impl:write_sequnlock] |
| \Clnrefrange{b}{e} show \co{write_sequnlock()}, which executes a memory barrier |
| to ensure that the caller's critical section is ordered before the |
| increment of the sequence number on \clnref{inc}, then releases the lock. |
| \end{fcvref} |
| |
| \QuickQuizSeries{% |
| \QuickQuizB{ |
| What if something else serializes writers, so that the lock |
| is not needed? |
| }\QuickQuizAnswerB{ |
| In this case, the \co{->lock} field could be omitted, as it |
| is in \co{seqcount_t} in the Linux kernel. |
| }\QuickQuizEndB |
| % |
| \QuickQuizE{ |
| Why isn't \co{seq} on |
| \clnrefr{ln:defer:seqlock:impl:typedef:seq} of |
| \cref{lst:defer:Sequence-Locking Implementation} |
| \co{unsigned} rather than \co{unsigned long}? |
| After all, if \co{unsigned} is good enough for the Linux |
| kernel, shouldn't it be good enough for everyone? |
| }\QuickQuizAnswerE{ |
| Not at all. |
| The Linux kernel has a number of special attributes that allow |
| it to ignore the following sequence of events: |
| \begin{enumerate} |
| \item Thread~0 executes \co{read_seqbegin()}, picking up |
| \co{->seq} in |
| \clnrefr{ln:defer:seqlock:impl:read_seqbegin:fetch}, |
| noting that the value is even, |
| and thus returning to the caller. |
| \item Thread~0 starts executing its read-side critical section, |
| but is then preempted for a long time. |
| \item Other threads repeatedly invoke \co{write_seqlock()} and |
| \co{write_sequnlock()}, until the value of \co{->seq} |
| overflows back to the value that Thread~0 fetched. |
| \item Thread~0 resumes execution, completing its read-side |
| critical section with inconsistent data. |
| \item Thread~0 invokes \co{read_seqretry()}, which incorrectly |
| concludes that Thread~0 has seen a consistent view of |
| the data protected by the sequence lock. |
| \end{enumerate} |
| |
| The Linux kernel uses sequence locking for things that are |
| updated rarely, with time-of-day information being a case |
| in point. |
	This information is updated at most once per millisecond,
	so that about seven weeks would be required to overflow a
	32-bit counter.
| If a kernel thread was preempted for seven weeks, the Linux |
| kernel's soft-lockup code would be emitting warnings every two |
| minutes for that entire time. |
| |
| In contrast, with a 64-bit counter, more than five centuries |
| would be required to overflow, even given an update every |
| \emph{nano}second. |
| Therefore, this implementation uses a type for \co{->seq} |
| that is 64 bits on 64-bit systems. |
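
	To check the arithmetic behind these two claims:
	\[
	2^{32}\,\mbox{ms} \approx 4.3 \times 10^{9}\,\mbox{ms}
	\approx 50\,\mbox{days} \approx 7\,\mbox{weeks}
	\]
	\[
	2^{64}\,\mbox{ns} \approx 1.8 \times 10^{19}\,\mbox{ns}
	\approx 585\,\mbox{years}
	\]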
| }\QuickQuizEndE |
| } |
| |
| \begin{listing} |
| \input{CodeSamples/defer/route_seqlock@lookup.fcv} |
| \caption{Sequence-Locked Pre-BSD Routing Table Lookup (BUGGY!!!)} |
| \label{lst:defer:Sequence-Locked Pre-BSD Routing Table Lookup} |
| \end{listing} |
| |
| \begin{listing} |
| \input{CodeSamples/defer/route_seqlock@add_del.fcv} |
| \caption{Sequence-Locked Pre-BSD Routing Table Add\slash Delete (BUGGY!!!)} |
| \label{lst:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete} |
| \end{listing} |
| |
| So what happens when sequence locking is applied to the Pre-BSD |
| routing table? |
| \Cref{lst:defer:Sequence-Locked Pre-BSD Routing Table Lookup} |
| shows the data structures and \co{route_lookup()}, and |
| \cref{lst:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete} |
| shows \co{route_add()} and \co{route_del()} (\path{route_seqlock.c}). |
| This implementation is once again similar to its counterparts in earlier |
| sections, so only the differences will be highlighted. |
| |
| \begin{fcvref}[ln:defer:route_seqlock:lookup] |
| In |
| \cref{lst:defer:Sequence-Locked Pre-BSD Routing Table Lookup}, |
| \clnref{struct:re_freed} adds \co{->re_freed}, which is checked on |
| \clnref{lookup:chk_freed,lookup:abort}. |
| \Clnref{struct:sl} adds a sequence lock, which is used by \co{route_lookup()} |
| \end{fcvref} |
| \begin{fcvref}[ln:defer:route_seqlock:lookup:lookup] |
| on \clnref{r_sqbegin,r_sqretry1,r_sqretry2}, |
| with \clnref{goto_retry1,goto_retry2} branching back to |
| the \co{retry} label on \clnref{retry}. |
| The effect is to retry any lookup that runs concurrently with an update. |
| \end{fcvref} |
| |
| \begin{fcvref}[ln:defer:route_seqlock:add_del] |
| In |
| \cref{lst:defer:Sequence-Locked Pre-BSD Routing Table Add/Delete}, |
| \clnref{add:w_sqlock,add:w_squnlock,del:w_sqlock,% |
| del:w_squnlock1,del:w_squnlock2} |
| acquire and release the sequence lock, |
| while \clnref{add:clr_freed,del:set_freed} handle \co{->re_freed}. |
| This implementation is therefore quite straightforward. |
| \end{fcvref} |
| |
| \begin{figure} |
| \centering |
| \resizebox{2.5in}{!}{\includegraphics{CodeSamples/defer/data/hps.2019.12.17a/perf-seqlock}} |
| \caption{Pre-BSD Routing Table Protected by Sequence Locking} |
| \label{fig:defer:Pre-BSD Routing Table Protected by Sequence Locking} |
| \end{figure} |
| |
| It also performs better on the read-only workload, as can be seen in |
| \cref{fig:defer:Pre-BSD Routing Table Protected by Sequence Locking}, |
| though its performance is still far from ideal. |
| Worse yet, it suffers use-after-free failures. |
| The problem is that the reader might encounter a segmentation violation |
| due to accessing an already-freed structure before \co{read_seqretry()} |
| has a chance to warn of the concurrent update. |
| |
| \QuickQuiz{ |
| Can this bug be fixed? |
| In other words, can you use sequence locks as the \emph{only} |
| synchronization mechanism protecting a linked list supporting |
| concurrent addition, deletion, and lookup? |
| }\QuickQuizAnswer{ |
| One trivial way of accomplishing this is to surround all |
| accesses, including the read-only accesses, with |
| \co{write_seqlock()} and \co{write_sequnlock()}. |
| Of course, this solution also prohibits all read-side |
| parallelism, resulting in massive lock contention, |
| and furthermore could just as easily be implemented |
| using simple locking. |
| |
| If you do come up with a solution that uses \co{read_seqbegin()} |
| and \co{read_seqretry()} to protect read-side accesses, make |
| sure that you correctly handle the following sequence of events: |
| |
| \begin{enumerate} |
| \item CPU~0 is traversing the linked list, and picks up a pointer |
| to list element~A. |
| \item CPU~1 removes element~A from the list and frees it. |
| \item CPU~2 allocates an unrelated data structure, and gets |
| the memory formerly occupied by element~A\@. |
| In this unrelated data structure, the memory previously |
| used for element~A's \co{->next} pointer is now occupied |
| by a floating-point number. |
| \item CPU~0 picks up what used to be element~A's \co{->next} |
| pointer, gets random bits, and therefore gets a |
| segmentation fault. |
| \end{enumerate} |
| |
| One way to protect against this sort of problem requires use |
| of ``type-safe memory'', which will be discussed in |
| \cref{sec:defer:Type-Safe Memory}. |
| Roughly similar solutions are possible using the hazard pointers |
| discussed in |
| \cref{sec:defer:Hazard Pointers}. |
| But in either case, you would be using some other synchronization |
| mechanism in addition to sequence locks! |
| }\QuickQuizEnd |
| |
| As hinted on |
| \cpageref{sec:defer:Mysteries sequence locking}, |
| both the read-side and write-side critical sections of a sequence lock |
| can be thought of as transactions, and sequence locking therefore can |
| be thought of as a limited form of transactional memory, which will be |
| discussed in \cref{sec:future:Transactional Memory}. |
| The limitations of sequence locking are: |
| \begin{enumerate*}[(1)] |
| \item Sequence locking restricts updates and |
| \item Sequence locking does not permit traversal of pointers |
| to objects that might be freed by updaters. |
| \end{enumerate*} |
| These limitations are of course overcome by transactional memory, but |
| can also be overcome by combining other synchronization primitives |
| with sequence locking. |
| |
| Sequence locks allow writers to defer readers, but not vice versa. |
| This can result in \IX{unfairness} and even \IX{starvation} |
| in writer-heavy workloads.\footnote{ |
| Dmitry Vyukov describes one way to reduce (but, sadly, not eliminate) |
| reader starvation: |
| \url{http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/improved-lock-free-seqlock}.} |
| On the other hand, in the absence of writers, sequence-lock readers are |
| reasonably fast and scale linearly. |
| It is only human to want the best of both worlds: |
| Fast readers without the possibility of read-side failure, |
| let alone starvation. |
| In addition, it would also be nice to overcome sequence locking's limitations |
| with pointers. |
| The following section presents a synchronization mechanism with exactly |
| these properties. |