| % defer/hazptr.tex |
| % mainfile: ../perfbook.tex |
| % From an C++ Standards Committee meeting: "Can I hazptr cheezeberger?" |
| |
| \section{Hazard Pointers} |
| \label{sec:defer:Hazard Pointers} |
| % |
| \epigraph{If in doubt, turn it inside out.}{Zara Carpenter} |
| |
| One way of avoiding problems with concurrent reference counting |
| is to implement the reference counters |
| inside out, that is, rather than incrementing an integer stored in the |
| data element, instead store a pointer to that data element in |
| per-CPU (or per-thread) lists. |
| Each element of these lists is called a |
| \emph{\IX{hazard pointer}}~\cite{MagedMichael04a}.\footnote{ |
| Also independently invented by others~\cite{HerlihyLM02}.} |
| The value of a given data element's ``virtual reference counter'' can |
| then be obtained by counting the number of hazard pointers referencing |
| that element. |
| Therefore, if that element has been rendered inaccessible to readers, |
| and there are no longer any hazard pointers referencing it, that element |
| may safely be freed. |
| |
| \begin{listing} |
| \input{CodeSamples/defer/hazptr@record_clear.fcv} |
| \caption{Hazard-Pointer Recording and Clearing} |
| \label{lst:defer:Hazard-Pointer Recording and Clearing} |
| \end{listing} |
| |
| Of course, this means that hazard-pointer acquisition must be carried |
| out quite carefully in order to avoid destructive races with concurrent |
| deletion. |
| \begin{fcvref}[ln:defer:hazptr:record_clear] |
| One implementation is shown in |
| \cref{lst:defer:Hazard-Pointer Recording and Clearing}, |
| which shows \co{hp_try_record()} on \clnrefrange{htr:b}{htr:e}, |
| \co{hp_record()} on \clnrefrange{hr:b}{hr:e}, and |
| \co{hp_clear()} on |
| \clnrefrange{hc:b}{hc:e} (\path{hazptr.h}). |
| |
| The \co{hp_try_record()} macro on \clnref{htr:e} is simply a casting |
| wrapper for the \co{_h_t_r_impl()} function, which attempts to store |
| the pointer referenced by \co{p} into the hazard pointer referenced |
| by \co{hp}. |
| If successful, it returns the value of the stored pointer. |
| If it fails due to that pointer being \co{NULL}, it returns \co{NULL}. |
| Finally, if it fails due to racing with an update, it returns a special |
| \co{HAZPTR_POISON} token. |
| |
| \QuickQuiz{ |
| Given that papers on hazard pointers use the bottom bits |
| of each pointer to mark deleted elements, what is up with |
| \co{HAZPTR_POISON}? |
| }\QuickQuizAnswer{ |
| The published implementations of hazard pointers used |
| non-blocking synchronization techniques for insertion |
| and deletion. |
| These techniques require that readers traversing the |
| data structure ``help'' updaters complete their updates, |
| which in turn means that readers need to look at the successor |
| of a deleted element. |
| |
| In contrast, we will be using locking to synchronize updates, |
| which does away with the need for readers to help updaters |
| complete their updates, which in turn allows us to leave |
| pointers' bottom bits alone. |
| This approach allows read-side code to be simpler and faster. |
| }\QuickQuizEnd |
| |
| \Clnref{htr:ro1} reads the pointer to the object to be protected. |
| If \clnref{htr:race1} finds that this pointer was either \co{NULL} or |
| the special \co{HAZPTR_POISON} deleted-object token, it returns |
| the pointer's value to inform the caller of the failure. |
| Otherwise, \clnref{htr:store} stores the pointer into the specified |
| hazard pointer, and \clnref{htr:mb} forces full ordering of that |
| store with the reload of the original pointer on \clnref{htr:ro2}. |
| (See \cref{chp:Advanced Synchronization: Memory Ordering} |
| for more information on memory ordering.) |
| If the value of the original pointer has not changed, then the hazard |
| pointer protects the pointed-to object, and in that case, |
| \clnref{htr:success} returns a pointer to that object, which also |
| indicates success to the caller. |
| Otherwise, if the pointer changed between the two \co{READ_ONCE()} |
| invocations, \clnref{htr:race2} indicates failure. |
| |
| \QuickQuiz{ |
| Why does \co{hp_try_record()} in |
| \cref{lst:defer:Hazard-Pointer Recording and Clearing} |
| take a double indirection to the data element? |
| Why not \co{void *} instead of \co{void **}? |
| }\QuickQuizAnswer{ |
| Because \co{hp_try_record()} must check for concurrent modifications. |
| To do that job, it needs a pointer to a pointer to the element, |
| so that it can check for a modification to the pointer to the |
| element. |
| }\QuickQuizEnd |
| |
| The \co{hp_record()} function is quite straightforward: |
| It repeatedly invokes \co{hp_try_record()} until the return value |
| is something other than \co{HAZPTR_POISON}. |
| |
| \QuickQuiz{ |
| Why bother with \co{hp_try_record()}? |
| Wouldn't it be easier to just use the failure-immune |
| \co{hp_record()} function? |
| }\QuickQuizAnswer{ |
| It might be easier in some sense, but as will be seen in the |
| Pre-BSD routing example, there are situations for which |
| \co{hp_record()} simply does not work. |
| }\QuickQuizEnd |
| |
| The \co{hp_clear()} function is even more straightforward, with |
| an \co{smp_mb()} to force full ordering between the caller's uses |
| of the object protected by the hazard pointer and the setting of |
| the hazard pointer to \co{NULL}. |
| \end{fcvref} |
| |
| \begin{listing} |
| \ebresizeverb{.91}{\input{CodeSamples/defer/hazptr@scan_free.fcv}} |
| \caption{Hazard-Pointer Scanning and Freeing} |
| \label{lst:defer:Hazard-Pointer Scanning and Freeing} |
| \end{listing} |
| |
| \begin{fcvref}[ln:defer:hazptr:scan_free:free] |
| Once a hazard-pointer-protected object has been removed from its |
| linked data structure, so that it is now inaccessible to future |
| hazard-pointer readers, it is passed to \co{hazptr_free_later()}, |
| which is shown on \clnrefrange{b}{e} of |
| \cref{lst:defer:Hazard-Pointer Scanning and Freeing} |
| (\path{hazptr.c}). |
| \Clnref{enq:b,enq:e} |
| enqueue the object on a per-thread list \co{rlist} |
| and \clnref{count} counts the object in \co{rcount}. |
| If \clnref{check} sees that a sufficiently large number of objects are now |
| queued, \clnref{scan} invokes \co{hazptr_scan()} to attempt to |
| free some of them. |
| \end{fcvref} |
| |
| \begin{fcvref}[ln:defer:hazptr:scan_free:scan] |
| The \co{hazptr_scan()} function is shown on \clnrefrange{b}{e} |
| of the listing. |
| This function relies on a fixed maximum number of threads (\co{NR_THREADS}) |
| and a fixed maximum number of hazard pointers per thread (\co{K}), |
| which allows a fixed-size array of hazard pointers to be used. |
| Because any thread might need to scan the hazard pointers, each thread |
| maintains its own array, which is referenced by the per-thread variable |
| \co{gplist}. |
| If \clnref{check} determines that this thread has not yet allocated its |
| \co{gplist}, \clnrefrange{alloc:b}{alloc:e} carry out the allocation. |
| The \IX{memory barrier} on \clnref{mb1} ensures that all threads see the |
| removal of all objects by this thread before |
| \clnrefrange{loop:b}{loop:e} scan |
| all of the hazard pointers, accumulating non-NULL pointers into |
| the \co{plist} array and counting them in \co{psize}. |
| The memory barrier on \clnref{mb2} ensures that the reads of |
| the hazard pointers |
| happen before any objects are freed. |
| \Clnref{sort} then sorts this array to enable use of binary search below. |
| |
| \Clnref{rem:b,rem:e} |
| remove all elements from this thread's list of |
| to-be-freed objects, placing them on the local \co{tmplist} |
| and \clnref{zero} zeroes the count. |
| Each pass through the loop spanning |
| \clnrefrange{loop2:b}{loop2:e} processes each |
| of the to-be-freed objects. |
| \Clnref{rem1st:b,rem1st:e} |
| remove the first object from \co{tmplist}, |
| and if \clnref{chkhazp:b,chkhazp:e} |
| determine that there is a hazard pointer |
| protecting this object, \clnrefrange{back:b}{back:e} |
| place it back onto \co{rlist}. |
| Otherwise, \clnref{free} frees the object. |
| \end{fcvref} |
| |
| \begin{listing} |
| \input{CodeSamples/defer/route_hazptr@lookup.fcv} |
| \caption{Hazard-Pointer Pre-BSD Routing Table Lookup} |
| \label{lst:defer:Hazard-Pointer Pre-BSD Routing Table Lookup} |
| \end{listing} |
| |
| The Pre-BSD routing example can use hazard pointers as shown in |
| \cref{lst:defer:Hazard-Pointer Pre-BSD Routing Table Lookup} |
| for data structures and \co{route_lookup()}, and in |
| \cref{lst:defer:Hazard-Pointer Pre-BSD Routing Table Add/Delete} |
| for \co{route_add()} and \co{route_del()} |
| (\path{route_hazptr.c}). |
| As with reference counting, the hazard-pointers implementation |
| is quite similar to the sequential algorithm shown in |
| \cref{lst:defer:Sequential Pre-BSD Routing Table} |
| on |
| \cpageref{lst:defer:Sequential Pre-BSD Routing Table}, |
| so only differences will be discussed. |
| |
| \begin{fcvref}[ln:defer:route_hazptr:lookup] |
| Starting with |
| \cref{lst:defer:Hazard-Pointer Pre-BSD Routing Table Lookup}, |
| \clnref{hh} shows the \co{->hh} field used to queue objects pending |
| hazard-pointer free, |
| \clnref{re_freed} shows the \co{->re_freed} field used to detect |
| use-after-free bugs, and \clnref{tryrecord} invokes |
| \co{hp_try_record()} to attempt to acquire a hazard pointer. |
| If the return value is \co{NULL}, \clnref{NULL} returns a not-found |
| indication to the caller. |
| If the call to \co{hp_try_record()} raced with deletion, \clnref{deleted} |
| branches back to \clnref{retry}'s \co{retry} to re-traverse the list |
| from the beginning. |
| The \co{do}--\co{while} loop falls through when the desired element is |
| located, but if this element has already been freed, \clnref{abort} |
| terminates the program. |
| Otherwise, the element's \co{->iface} field is returned to the caller. |
| |
| Note that \clnref{tryrecord} invokes \co{hp_try_record()} rather |
| than the easier-to-use \co{hp_record()}, restarting the full search |
| upon \co{hp_try_record()} failure. |
| And such restarting is absolutely required for correctness. |
| To see this, consider a hazard-pointer-protected linked list |
| containing elements~A, B, and~C that is subjected to the following |
| sequence of events: |
| \end{fcvref} |
| |
| \begin{enumerate} |
| \item Thread~0 stores a hazard pointer to element~B |
| (having presumably traversed to element~B from element~A). |
| \item Thread~1 removes element~B from the list, which sets |
| the pointer from element~B to element~C to the special |
| \co{HAZPTR_POISON} value in order to mark the deletion. |
| Because Thread~0 has a hazard pointer to element~B, |
| it cannot yet be freed. |
| \item Thread~1 removes element~C from the list. |
| Because there are no hazard pointers referencing element~C, |
| it is immediately freed. |
| \item Thread~0 attempts to acquire a hazard pointer to now-removed |
| element~B's successor, but \co{hp_try_record()} returns the |
| \co{HAZPTR_POISON} value, forcing the caller to restart its |
| traversal from the beginning of the list. |
| \end{enumerate} |
| |
| Which is a very good thing, because B's successor is the now-freed |
| element~C, which means that Thread~0's subsequent accesses might have |
| resulted in arbitrarily horrible memory corruption, especially if the |
| memory for element~C had since been re-allocated for some other purpose. |
| Therefore, hazard-pointer readers must typically restart the full |
| traversal in the face of a concurrent deletion. |
| Often the restart must go back to some global (and thus immortal) pointer, |
| but it is sometimes possible to restart at some intermediate location |
| if that location is guaranteed to still be live, for example, due to |
| the current thread holding a lock, a reference count, etc. |
| |
| \QuickQuiz{ |
| Readers must ``typically'' restart? |
| What are some exceptions? |
| }\QuickQuizAnswer{ |
| If the pointer emanates from a global variable or is otherwise |
| not subject to being freed, then \co{hp_record()} may be |
| used to repeatedly attempt to record the hazard pointer, |
| even in the face of concurrent deletions. |
| |
| In certain cases, restart can be avoided by using link counting |
| as exemplified by the UnboundedQueue and ConcurrentHashMap data |
| structures implemented in Folly open-source library.\footnote{ |
| \url{https://github.com/facebook/folly}} |
| }\QuickQuizEnd |
| |
| Because algorithms using hazard pointers might be restarted at any |
| step of their traversal through the linked data structure, such algorithms |
| must typically take care to avoid making any changes to the data |
| structure until after they have acquired all the hazard pointers that |
| are required for the update in question. |
| |
| \QuickQuiz{ |
| But don't these restrictions on hazard pointers also apply |
| to other forms of reference counting? |
| }\QuickQuizAnswer{ |
| Yes and no. |
| These restrictions apply only to reference-counting mechanisms whose |
| reference acquisition can fail. |
| }\QuickQuizEnd |
| |
| These hazard-pointer restrictions result in great benefits to readers, |
| courtesy of the fact that the hazard pointers are stored local to each |
| CPU or thread, which in turn allows traversals to be carried out without |
| any writes to the data structures being traversed. |
| Referring back to |
| \cref{fig:count:Optimization and the Four Parallel-Programming Tasks} |
| on |
| \cpageref{fig:count:Optimization and the Four Parallel-Programming Tasks}, |
| hazard pointers enable the CPU caches to do resource replication, which |
| in turn allows weakening of the parallel-access-control mechanism, |
| thus boosting performance and scalability. |
| |
| Another advantage of restarting hazard pointers traversals is a reduction in |
| minimal memory footprint: |
| Any object not currently referenced by some hazard pointer may be |
| immediately freed. |
| In contrast, |
| \cref{sec:defer:Read-Copy Update (RCU)} |
| will discuss a mechanism that avoids read-side retries (and minimizes |
| read-side overhead), but which can result in a much larger memory |
| footprint. |
| |
| \begin{listing} |
| \input{CodeSamples/defer/route_hazptr@add_del.fcv} |
| \caption{Hazard-Pointer Pre-BSD Routing Table Add\slash Delete} |
| \label{lst:defer:Hazard-Pointer Pre-BSD Routing Table Add/Delete} |
| \end{listing} |
| |
| \begin{fcvref}[ln:defer:route_hazptr:add_del] |
| The \co{route_add()} and \co{route_del()} functions are shown in |
| \cref{lst:defer:Hazard-Pointer Pre-BSD Routing Table Add/Delete}. |
| \Clnref{init_freed} initializes \co{->re_freed}, |
| \clnref{poison} poisons the \co{->re_next} field of the newly removed |
| object, and |
| \clnref{free_later} passes that object to the |
| \co{hazptr_free_later()} function, which will free that object once it |
| is safe to do so. |
| The spinlocks work the same as in |
| \cref{lst:defer:Reference-Counted Pre-BSD Routing Table Add/Delete}. |
| \end{fcvref} |
| |
| \begin{figure} |
| \centering |
| \resizebox{2.5in}{!}{\includegraphics{CodeSamples/defer/data/hps.2019.12.17a/perf-hazptr}} |
| \caption{Pre-BSD Routing Table Protected by Hazard Pointers} |
| \label{fig:defer:Pre-BSD Routing Table Protected by Hazard Pointers} |
| \end{figure} |
| |
| \Cref{fig:defer:Pre-BSD Routing Table Protected by Hazard Pointers} |
| shows the hazard-pointers-protected Pre-BSD routing algorithm's |
| performance on the same read-only workload as for |
| \cref{fig:defer:Pre-BSD Routing Table Protected by Reference Counting}. |
| Although hazard pointers scale far better than does reference counting, |
| hazard pointers still require readers to do writes to shared |
| memory (albeit with much improved locality of reference), |
| and also require a full memory barrier and retry check for each |
| object traversed. |
| Therefore, hazard-pointers performance is still far short of ideal. |
| On the other hand, unlike naive approaches to concurrent |
| reference-counting, hazard pointers not only operate correctly for |
| workloads involving concurrent updates, but also exhibit excellent |
| scalability. |
| Additional performance comparisons with other mechanisms may be found in |
| \cref{chp:Data Structures} |
| and in other publications~\cite{ThomasEHart2007a,McKenney:2013:SDS:2483852.2483867,MagedMichael04a}. |
| |
| \QuickQuizSeries{% |
| \QuickQuizB{ |
| \Cref{fig:defer:Pre-BSD Routing Table Protected by Hazard Pointers} |
| shows no sign of hyperthread-induced flattening at 224 threads. |
| Why is that? |
| }\QuickQuizAnswerB{ |
| Modern microprocessors are complicated beasts, so significant |
| skepticism is appropriate for any simple answer. |
| That aside, the most likely reason is the full memory barriers |
| required by hazard-pointers readers. |
| Any delays resulting from those memory barriers would make time |
| available to the other hardware thread sharing the core, resulting |
| in greater scalability at the expense of per-hardware-thread |
| performance. |
| }\QuickQuizEndB |
| % |
| \QuickQuizE{ |
| The paper ``Structured Deferral: |
| Synchronization via |
| Procrastination''~\cite{McKenney:2013:SDS:2483852.2483867} |
| shows that hazard pointers have near-ideal performance. |
| Whatever happened in |
| \cref{fig:defer:Pre-BSD Routing Table Protected by Hazard Pointers}??? |
| }\QuickQuizAnswerE{ |
| First, |
| \cref{fig:defer:Pre-BSD Routing Table Protected by Hazard Pointers} |
| has a linear y-axis, while most of the graphs in the |
| ``Structured Deferral'' paper have logscale y-axes. |
| Next, that paper uses lightly-loaded hash tables, while |
| \cref{fig:defer:Pre-BSD Routing Table Protected by Hazard Pointers}'s |
| uses a 10-element simple linked list, which means that hazard pointers |
| face a larger memory-barrier penalty in this workload than in |
| that of the ``Structured Deferral'' paper. |
| Finally, that paper used an older modest-sized x86 system, while |
| a much newer and larger system was used to generate the data |
| shown in |
| \cref{fig:defer:Pre-BSD Routing Table Protected by Hazard Pointers}. |
| |
| In addition, use of pairwise asymmetric |
| barriers~\cite{Windows2008FlushProcessWriteBuffers,JonathanCorbet2010sys-membarrier,Linuxmanpage2018sys-membarrier} |
| has been proposed to eliminate the read-side hazard-pointer |
| memory barriers on systems supporting this notion~\cite{DavidGoldblatt2018asymmetricFences}, |
| which might improve the performance of hazard pointers beyond |
| what is shown in the figure. |
| |
| As always, your mileage may vary. |
| Given the difference in performance, it is clear that hazard |
| pointers give you the best performance either for |
| very large data structures (where the memory-barrier overhead |
| will at least partially overlap cache-miss penalties) and |
| for data structures such as hash tables where a lookup |
| operation needs a minimal number of hazard pointers. |
| }\QuickQuizEndE |
| } |
| |
| On June 17, 2023, the ISO C++ Standards committee voted hazard pointers |
| into C++26~\cite{MagedMichael2023C++26Hazptr3}. |
| Daniel Anderson has produced a prototype C++ atomic shared-pointer |
| implementation based on hazard |
| pointers~\cite{DanielAnderson2023HazptrSharedPtr}. |
| |
| And hazard pointers are the concurrent reference counter mentioned |
| on \cpageref{sec:defer:Mysteries hazard pointers}. |
| The next section attempts to improve on hazard pointers by using |
| sequence locks, which avoid both read-side writes and per-object memory |
| barriers. |