% together/hash.tex
% mainfile: ../perfbook.tex
% SPDX-License-Identifier: CC-BY-SA-3.0

\section{Hashing Hassles}
\label{sec:together:Hashing Hassles}
%
\epigraph{The girl who can't dance says the band can't play.}
{\emph{Yiddish proverb}}

This section looks at some issues that can arise when dealing with
hash tables.
Please note that these issues also apply to many other search structures.

\subsection{Correlated Data Elements}
\label{sec:together:Correlated Data Elements}

This situation is analogous to that in
\cref{sec:together:Correlated Fields}:
We have a hash table where we need correlated views of two or more of
the elements.
These elements are updated together, and we do not want to see an old
version of the first element along with new versions of the other
elements.
For example, Schr\"odinger decided to add his extended family to his
in-memory database along with all his animals.
Although Schr\"odinger understands that marriages and divorces do not
happen instantaneously, he is also a traditionalist.
As such, he absolutely does not want his database ever to show that the
bride is now married, but the groom is not, and vice versa.
In other words, Schr\"odinger wants to be able to carry out a
wedlock-consistent traversal of his database.

One approach is to use sequence locks
(see \cref{sec:defer:Sequence Locks}),
so that wedlock-related updates are carried out under the
protection of \co{write_seqlock()}, while reads requiring
wedlock consistency are carried out within
a \co{read_seqbegin()} / \co{read_seqretry()} loop.
Note that sequence locks are not a replacement for RCU protection:
Sequence locks protect against concurrent modifications, but RCU
is still needed to protect against concurrent deletions.

This approach works quite well when the number of correlated elements is
small, the time to read these elements is short, and the update rate is
low.
Otherwise, updates might happen so quickly that readers might never complete.
Although Schr\"odinger does not expect that even his least-sane relatives
will marry and divorce quickly enough for this to be a problem,
he does realize that this problem could well arise in other situations.
One way to avoid this reader-starvation problem is to have the readers
use the update-side primitives if there have been too many retries,
but this can degrade both performance and scalability.

In addition, if the update-side primitives are used too frequently,
poor performance and scalability will result due to lock contention.
One way to avoid this is to maintain a per-element sequence lock,
and to hold both spouses' locks when updating their marital status.
Readers can do their retry looping on either of the spouses' locks
to gain a stable view of any change in marital status involving both
members of the pair.
This avoids contention due to high marriage and divorce rates, but
complicates gaining a stable view of all marital statuses during a
single scan of the database.

If the element groupings are well-defined and persistent, as marital
status is hoped to be,
then one approach is to add pointers to the data elements to link
together the members of a given group.
Readers can then traverse these pointers to access all the data elements
in the same group as the first one located.

Other approaches using version numbering are left as exercises for the
interested reader.

\subsection{Update-Friendly Hash-Table Traversal}
\label{sec:together:Update-Friendly Hash-Table Traversal}

Suppose that a statistical scan of all elements in a hash table is
required.
For example, Schr\"odinger might wish to compute the average
length-to-weight ratio over all of his animals.\footnote{
	Why would such a quantity be useful?
	Beats me!
	But group statistics in general are often useful.}
Suppose further that Schr\"odinger is willing to ignore slight
errors due to animals being added to and removed from the hash
table while this statistical scan is being carried out.
What should Schr\"odinger do to control concurrency?

One approach is to enclose the statistical scan in an RCU read-side
critical section.
This permits updates to proceed concurrently without unduly impeding
the scan.
In particular, the scan does not block the updates and vice versa,
which allows scans of hash tables containing very large numbers of
elements to proceed gracefully, even in the face of very high
update rates.

\QuickQuiz{}
	But how does this scan work while a resizable hash table
	is being resized?
	In that case, neither the old nor the new hash table is
	guaranteed to contain all the elements in the hash table!
\QuickQuizAnswer{
	True, resizable hash tables as described in
	\cref{sec:datastruct:Non-Partitionable Data Structures}
	cannot be fully scanned while being resized.
	One simple way around this is to acquire the
	\co{hashtab} structure's \co{->ht_lock} while scanning,
	but this prevents more than one scan from proceeding
	concurrently.

	Another approach is for updates to mutate the old hash
	table as well as the new one while resizing is in
	progress.
	This would allow scans to find all elements in the old
	hash table.
	Implementing this is left as an exercise for the reader.
} \QuickQuizEnd