| % together/seqlock.tex |
| % mainfile: ../perfbook.tex |
| % SPDX-License-Identifier: CC-BY-SA-3.0 |
| |
| \section{Sequence-Locking Specials} |
| \label{sec:together:Sequence-Locking Specials} |
| % |
| \epigraph{The girl who can't dance says the band can't play.} |
| {Yiddish proverb} |
| |
| This section looks at some special uses of sequence locks. |
| |
| \subsection{Dueling Sequence Locks} |
| \label{sec:together:Dueling Sequence Locks} |
| |
| The classic sequence-locking use case enables a reader to see a consistent |
| snapshot of a small collection of variables, for example, calibration |
| constants for timekeeping. |
| This works quite well in practice because calibration constants are |
| rarely updated and, when updated, are updated quickly. |
| Readers therefore almost never need to retry. |
| |
| However, if the updater is delayed during the update, readers will |
| also be delayed. |
| Such delays might be due to interrupts, NMIs, or even |
| virtual-CPU preemption. |
| |
| One way to prevent updater delays from causing reader delays is to |
| maintain two sets of calibration constants. |
| Each set is updated in turn, but frequently enough that readers can |
| make good use of either set. |
| Each set has its own sequence lock (\co{seqlock_t} structure). |
| |
| The updater alternates between the two sets, so that an delayed updater |
| delays readers of at most one of the sets. |
| |
| Each reader attempts to access the first set, but upon retry attempts |
| to access the second set. |
| If the second set also forces a retry, the reader repeats starting |
| again from the first set. |
| If the updater is stuck, only one of the two sets will force |
| readers to retry, and therefore readers will succeed as soon as |
| they attempt to access the other set. |
| |
| \QuickQuiz{ |
| Why don't all sequence-locking use cases replicate the |
| data in this fashion? |
| }\QuickQuizAnswer{ |
| Such replication is impractical if the data is too |
| large, as it might be in the Schr\"odinger's-zoo example |
| described in |
| \cref{sec:together:Correlated Data Elements}. |
| |
| Such replication is unnecessary if delays are prevented, |
| for example, when updaters disable interrupts when running |
| on bare-metal hardware (that is, without the use of |
| a vCPU-preemption-prone hypervisor). |
| |
| Alternatively, if readers can tolerate the occasional delay, |
| then replication is again unnecessary. |
| Consider the example of reader-writer locking, where |
| writers always delay readers and vice versa. |
| |
| However, if the data to be replicated is reasonably |
| small, if delays are possible, and if readers cannot |
| tolerate these delays, replicating the data is an |
| excellent approach. |
| }\QuickQuizEnd |
| |
| \subsection{Correlated Data Elements} |
| \label{sec:together:Correlated Data Elements} |
| |
| Suppose we have a hash table where we need correlated views of two or |
| more of the elements. |
| These elements are updated together, and we do not want to see an old |
| version of the first element along with new versions of the other |
| elements. |
| For example, Schr\"odinger decided to add his extended family to his |
| in-memory database along with all his animals. |
| Although Schr\"odinger understands that marriages and divorces do not |
| happen instantaneously, he is also a traditionalist. |
| As such, he absolutely does not want his database ever to show that the |
| bride is now married, but the groom is not, and vice versa. |
| Plus, if you think Schr\"odinger is a traditionalist, you just |
| try conversing with some of his family members! |
| In other words, Schr\"odinger wants to be able to carry out a |
| wedlock-consistent traversal of his database. |
| |
| One approach is to use sequence locks |
| (see \cref{sec:defer:Sequence Locks}), |
| so that wedlock-related updates are carried out under the |
| protection of \co{write_seqlock()}, while reads requiring |
| wedlock consistency are carried out within |
| a \co{read_seqbegin()} / \co{read_seqretry()} loop. |
| Note that sequence locks are not a replacement for RCU protection: |
| Sequence locks protect against concurrent modifications, but RCU |
| is still needed to protect against concurrent deletions. |
| |
| This approach works quite well when the number of correlated elements is |
| small, the time to read these elements is short, and the update rate is |
| low. |
| Otherwise, updates might happen so quickly that readers might never complete. |
| Although Schr\"odinger does not expect that even his least-sane relatives |
| will marry and divorce quickly enough for this to be a problem, |
| he does realize that this problem could well arise in other situations. |
| One way to avoid this reader-starvation problem is to have the readers |
| use the update-side primitives if there have been too many retries, |
| but this can degrade both performance and scalability. |
| Another way to avoid \IX{starvation} is to have multiple sequence locks, |
| in Schr\"odinger's case, perhaps one per species. |
| |
| In addition, if the update-side primitives are used too frequently, |
| poor performance and scalability will result due to lock contention. |
| One way to avoid this is to maintain a per-element sequence lock, |
| and to hold both spouses' locks when updating their marital status. |
| Readers can do their retry looping on either of the spouses' locks |
| to gain a stable view of any change in marital status involving both |
| members of the pair. |
| This avoids contention due to high marriage and divorce rates, but |
| complicates gaining a stable view of all marital statuses during a |
| single scan of the database. |
| |
| If the element groupings are well-defined and persistent, which marital |
| status is hoped to be, |
| then one approach is to add pointers to the data elements to link |
| together the members of a given group. |
| Readers can then traverse these pointers to access all the data elements |
| in the same group as the first one located. |
| |
| This technique is used heavily in the Linux kernel, perhaps most |
| notably in the dcache subsystem~\cite{NeilBrown2015RCUwalk}. |
| Note that it is likely that similar schemes also work with hazard |
| pointers. |
| |
| This approach provides sequential consistency to successful readers, |
| each of which will either see the effects of a given update or not, |
| with any partial updates resulting in a read-side retry. |
| Sequential consistency is an extremely strong guarantee, incurring equally |
| strong restrictions and equally high overheads. |
| In this case, we saw that readers might be starved on the one hand, or |
| might need to acquire the update-side lock on the other. |
| Although this works very well in cases where updates are infrequent, |
| it unnecessarily forces read-side retries even when the update does not |
| affect any of the data that a retried reader accesses. |
| \Cref{sec:together:Correlated Fields} therefore covers a much weaker form |
| of consistency that not only avoids reader starvation, but also avoids |
| any form of read-side retry. |
| The next section instead presents a weaker form of consistency that |
| can be provided with much lower probabilities of reader starvation. |
| |
| \subsection{Atomic Move} |
| \label{sec:together:Atomic Move} |
| |
| Suppose that individual data elements are moved from one data structure |
| to another, and that readers look up only single data structures. |
| However, when a data element moves, readers must must never see it as |
| being in both structures at the same time and must also never see it as |
| missing from both structures at the same time. |
| At the same time, any reader seeing the element in its new location |
| must never subsequently see it in its old location. |
| In addition, the move may be implemented by inserting a new copy of the |
| old data element into the destination location. |
| |
| For example, consider a hash table that supports an atomic-to-readers |
| rename operation. |
| Expanding on Schr\"odinger's zoo, suppose that an animal's name changes, |
| for example, each of the brides in Schr\"odinger's traditionalist family |
| might change their last name to match that of their groom. |
| |
| But changing their name might change the hash value, and might also |
| require that the bride's element move from one hash chain to another. |
| The consistency set forth above requires that if a reader successfully |
| looks up the new name, then any subsequent lookup of the old name by |
| that reader must result in failure. |
| Similarly, if a reader's lookup of the old name results in lookup failure, |
| then any subsequent lookup of the new name by that reader must succeed. |
| In short, a given reader should not see a bride momentarily blinking out |
| of existence, nor should that reader lookup a bride under her new name |
| and then later lookup that bride under her old name. |
| |
| This consistency guarantee could be enforced with a single global sequence |
| lock as described in \cref{sec:together:Correlated Data Elements}, but |
| this can result in reader starvation even for readers that are not looking |
| up a bride who is currently undergoing a name change. |
| This guarantee could also be enforced by requiring that readers acquire |
| a per-hash-chain lock, but reviewing |
| \cref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo} |
| shows that this results in poor performance and scalabilty, even for |
| single-socket systems. |
| |
| Another more reader-friendly way to implement this is to use RCU and to |
| place a sequence lock on each element. |
| Readers looking up a given element act as sequence-lock readers across |
| their full set of accesses to that element. |
| Note that these sequence-lock operations will order each reader's lookups. |
| |
| Renaming an element can then proceed roughly as follows: |
| |
| \begin{enumerate} |
| \item Acquire a global lock protecting rename operations. |
| \item Allocate and initialize a copy of the element with the new name. |
| \item Write-acquire the sequence lock on the element with the old name, |
| which has the side effect of ordering this acquisition with the |
| following insertion. |
| Concurrent lookups of the old name will now repeatedly retry. |
| \item Insert the copy of the element with the new name. |
| Lookups of the new name will now succeed. |
| \item Execute \co{smp_wmb()} to order the prior insertion with the |
| subsequent removal. |
| \item Remove the element with the old name. |
| Concurrent lookups of the old name will now fail. |
| \item Write-release the sequence lock if necessary, for example, if |
| required by lock dependency checkers. |
| \item Release the global lock. |
| \end{enumerate} |
| |
| Thus, readers looking up the old name will retry until the new name |
| is available, at which point their final retry will fail. |
| Any subsequent lookups of the new name will succeed. |
| Any reader succeeding in looking up the new name is guaranteed that |
| any subsequent lookup of the old name will fail, perhaps after a series |
| of retries. |
| |
| \QuickQuizSeries{% |
| \QuickQuizB{ |
| Is it possible to write-acquire the sequence lock on |
| the new element before it is inserted instead of acquiring |
| that of the old element before it is removed? |
| }\QuickQuizAnswerB{ |
| Yes, and the details are left as an exercise to the reader. |
| |
| The term \emph{tombstone} is sometimes used to refer to the |
| element with the old name after its sequence lock is acquired. |
| Similarly, the term \emph{birthstone} is sometimes used to refer |
| to the element with the new name while its sequence lock is |
| still held. |
| }\QuickQuizEndB |
| |
| \QuickQuizE{ |
| Is it possible to avoid the global lock? |
| }\QuickQuizAnswerE{ |
| Yes, and one way to do this would be to use per-hash-chain locks. |
| The updater could acquire lock(s) corresponding to both the old |
| and the new element, acquiring them in address order. |
| In this case, the insertion and removal operations would of |
| course need to refrain from acquiring and releasing these |
| same per-hash-chain locks. |
| This complexity can be worthwhile if rename operations are |
| frequent, and of course can allow rename operations to execute |
| concurrently. |
| }\QuickQuizEndE |
| }% End of \QuickQuizSeries |
| |
| It is of course possible to instead implement this procedure somewhat |
| more efficiently using simple flags. |
| However, this can be thought of as a simplified variant of sequence |
| locking that relies on the fact that a given element's sequence lock is |
| never write-acquired more than once. |
| |
| % @@@ Reference TBD flag-for-deletion section. |
| |
| \subsection{Upgrade to Writer} |
| \label{sec:together:Upgrade to Writer} |
| |
| As discussed in |
| \cref{sec:defer:Quasi Reader-Writer Lock}, |
| RCU permits readers to upgrade to writers. |
| This capability can be quite useful when a reader scanning an |
| RCU-protected data structure notices that the current element |
| needs to be updated. |
| What happens when you try this trick with sequence locking? |
| |
| It turns out that this sequence-locking trick is actually used in |
| the Linux kernel, for example, by the \co{sdma_flush()} function in |
| \path{drivers/infiniband/hw/hfi1/sdma.c}. |
| The effect is to doom the enclosing reader to retry. |
| This trick is therefore used when the reader detects some condition |
| that requires a retry. |