| % glossary.tex |
| % mainfile: perfbook.tex |
| % SPDX-License-Identifier: CC-BY-SA-3.0 |
| |
| \chapter{Glossary} |
| % |
| \Epigraph{Dictionaries are inherently circular in nature.} |
| {\emph{Self Reference in word definitions}, |
| David~Levary~et~al.} |
| |
| \begin{description} |
| \item[\IXG{Acquire Load}:] |
| A read from memory that has acquire semantics. |
| Normal use cases pair an acquire load with a release store, |
| in which case if the load returns the value stored, then all |
| code executed by the loading CPU after that acquire load will |
| see the effects of all memory-reference instructions executed |
| by the storing CPU prior to that release store. |
| Acquiring a lock provides similar memory-ordering semantics, |
| hence the ``acquire'' in ``acquire load''. |
| (See also ``memory barrier'' and ``release store''.) |
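For example, a minimal C11 sketch of the usual pairing (using
hypothetical variables \co{data} and \co{ready}) might look as follows:
\begin{VerbatimN}
#include <assert.h>
#include <stdatomic.h>

int data;
atomic_int ready;

void producer(void)     /* The storing CPU. */
{
	data = 42;
	atomic_store_explicit(&ready, 1,
	                      memory_order_release);
}

void consumer(void)     /* The loading CPU. */
{
	if (atomic_load_explicit(&ready,
	                         memory_order_acquire))
		assert(data == 42); /* Guaranteed visible. */
}
\end{VerbatimN}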
| \item[\IXGh{Address}{dependency}:] |
| The value returned by a load instruction is used to compute |
| a later memory-reference instruction's address. |
| Address dependencies provide weak memory ordering as described |
| in \cref{sec:memorder:Address Dependencies}. |
| However, because compilers do not understand them, address |
| dependencies are fragile, so please pay close attention to the |
| potential difficulties discussed in |
| \cref{sec:memorder:Address- and Data-Dependency Difficulties}. |
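For example, in the following Linux-kernel-style sketch (with a
hypothetical structure \co{foo} and global pointer \co{gp}), the
address used by the final load is computed from the value returned
by \co{rcu_dereference()}:
\begin{VerbatimN}
struct foo {
	int a;
};
struct foo *gp;  /* Assumed to be RCU-protected. */

int reader(void)
{
	struct foo *p;
	int ret = -1;

	rcu_read_lock();
	p = rcu_dereference(gp); /* The value loaded here ...  */
	if (p)
		ret = p->a;      /* ... forms this address.    */
	rcu_read_unlock();
	return ret;
}
\end{VerbatimN}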
| \item[\IXGr{Amdahl's Law}:] |
| If sufficient numbers of CPUs are used to run a job that has both |
| a sequential portion and a concurrent portion, performance and |
| scalability will be limited by the overhead of the sequential |
| portion. |
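For example, if a fraction $p$ of a given job is perfectly
parallelizable across $n$ CPUs and the remaining fraction $1 - p$
must run sequentially, the classic formulation of this law limits
the speedup to:
\[
S(n) = \frac{1}{(1 - p) + p/n},
\]
so that even given unlimited CPUs, the speedup cannot exceed
$1 / (1 - p)$.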
| \item[\IXGalt{Associativity}{Cache associativity}:] |
| The number of cache lines that can be held simultaneously in |
| a given cache, when all of these cache lines hash identically |
| in that cache. |
| A cache that could hold four cache lines for each possible |
| hash value would be termed a ``four-way set-associative'' cache, |
| while a cache that could hold only one cache line for each |
| possible hash value would be termed a ``direct-mapped'' cache. |
| A cache whose associativity was equal to its capacity would |
| be termed a ``fully associative'' cache. |
| Fully associative caches have the advantage of eliminating |
| associativity misses, but, due to hardware limitations, |
| fully associative caches are normally quite limited in size. |
| The associativity of the large caches found on modern microprocessors |
typically ranges from two-way to eight-way.
| \item[\IXGalth{Associativity Miss}{associativity}{cache miss}:] |
| A cache miss incurred because the corresponding CPU has recently |
| accessed more data hashing to a given set of the cache than will |
| fit in that set. |
| Fully associative caches are not subject to associativity misses |
| (or, equivalently, in fully associative caches, associativity |
| and capacity misses are identical). |
| \item[\IXG{Atomic}:] |
| An operation is considered ``atomic'' if it is not possible to |
| observe any intermediate state. |
| For example, on most CPUs, a store to a properly aligned pointer |
| is atomic, because other CPUs will see either the old value or |
| the new value, but are guaranteed not to see some mixed value |
| containing some pieces of the new and old values. |
| \item[\IXG{Atomic Read-Modify-Write Operation}:] |
| An atomic operation that both reads and writes memory is |
| considered an atomic read-modify-write operation, or atomic RMW |
| operation for short. |
| Although the value written usually depends on the value read, |
| \co{atomic_xchg()} is the exception that proves this rule. |
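For example, in C11 (with a hypothetical \co{counter}),
\co{atomic_fetch_add()} stores a value computed from the value it
reads, whereas \co{atomic_exchange()}, the C11 counterpart of
\co{atomic_xchg()}, stores a value that is independent of the value
read:
\begin{VerbatimN}
#include <stdatomic.h>

atomic_int counter;

void rmw_examples(void)
{
	int old;

	/* Stored value depends on the value read. */
	atomic_fetch_add(&counter, 1);

	/* Stored value (zero) is independent of the
	 * value read, which is simply returned. */
	old = atomic_exchange(&counter, 0);
	(void)old;
}
\end{VerbatimN}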
| \item[\IXGh{Bounded}{Wait Free}:] |
| A forward-progress guarantee in which every thread makes |
| progress within a specific finite period of time, the specific |
| time being the bound. |
| \item[\IXGh{Bounded Population-Oblivious}{Wait Free}:] |
| A forward-progress guarantee in which every thread makes |
| progress within a specific finite period of time, the specific |
| time being the bound, where this bound is independent of the |
| number of threads. |
| \item[\IXG{Cache}:] |
| In modern computer systems, CPUs have caches in which to hold |
| frequently used data. |
| These caches can be thought of as hardware hash tables with very |
| simple hash functions, |
| but in which each hash bucket (termed a ``set'' by hardware types) |
| can hold only a limited number of data items. |
| The number of data items that can be held by each of a cache's hash |
| buckets is termed the cache's ``associativity''. |
| These data items are normally called ``cache lines'', which |
can be thought of as fixed-length blocks of data that circulate
| among the CPUs and memory. |
| \item[\IXG{Cache Coherence}:] |
| A property of most modern SMP machines where all CPUs will |
| observe a sequence of values for a given variable that is |
| consistent with at least one global order of values for |
| that variable. |
| Cache coherence also guarantees that at the end of a group |
| of stores to a given variable, all CPUs will agree |
| on the final value for that variable. |
| Note that cache coherence applies only to the series of values |
| taken on by a single variable. |
| In contrast, the memory consistency model for a given machine |
| describes the order in which loads and stores to groups of |
| variables will appear to occur. |
| See \cref{sec:memorder:Cache Coherence} |
| for more information. |
| \item[\IXG{Cache-Coherence Protocol}:] |
| A communications protocol, normally implemented in hardware, |
| that enforces memory consistency and ordering, preventing |
| different CPUs from seeing inconsistent views of data held |
| in their caches. |
| \item[\IXG{Cache Geometry}:] |
The size and associativity of a cache are termed its geometry.
| Each cache may be thought of as a two-dimensional array, |
| with rows of cache lines (``sets'') that have the same hash |
| value, and columns of cache lines (``ways'') in which every |
| cache line has a different hash value. |
| The associativity of a given cache is its number of |
| columns (hence the name ``way''---a two-way set-associative |
| cache has two ``ways''), and the size of the cache is its |
| number of rows multiplied by its number of columns. |
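For example, a cache with 64 sets (rows) and eight ways (columns)
holds $64 \times 8 = 512$ cache lines, which with 64-byte cache
lines amounts to 32\,KB of data.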
| \item[\IXG{Cache Line}:] |
| (1) The unit of data that circulates among the CPUs and memory, |
| usually a moderate power of two in size. |
| Typical cache-line sizes range from 16 to 256 bytes. \\ |
| (2) A physical location in a CPU cache capable of holding |
| one cache-line unit of data. \\ |
| (3) A physical location in memory capable of holding one |
cache-line unit of data, but that is also aligned
| on a cache-line boundary. |
| For example, the address of the first word of a cache line |
| in memory will end in 0x00 on systems with 256-byte cache lines. |
| \item[\IXG{Cache Miss}:] |
| A cache miss occurs when data needed by the CPU is not in |
| that CPU's cache. |
| The data might be missing because of a number of reasons, |
| including: |
| \begin{enumerate*}[(1)] |
| \item This CPU has never accessed the data before |
| (``warm-up'' miss), |
| \item This CPU has recently accessed more |
| data than would fit in its cache, so that some of the older |
| data had to be removed (``capacity'' miss), |
| \item This CPU |
| has recently accessed more data in a given set\footnote{ |
| In hardware-cache terminology, the word ``set'' |
| is used in the same way that the word ``bucket'' |
| is used when discussing software caches.} |
| than that set could hold (``associativity'' miss), |
| \item Some other CPU has written to the data (or some other |
| data in the same cache line) since this CPU has accessed it |
| (``communication'' miss), or |
| \item This CPU attempted to write to a cache line that is |
| currently read-only, possibly due to that line being replicated |
| in other CPUs' caches (``write'' miss). |
| \end{enumerate*} |
| Note that cache misses can be split into multiple partially |
| overlapping categories, for example, read misses vs.\@ write misses |
| as opposed to the above categorization. |
| \item[\IXGalth{Capacity Miss}{capacity}{cache miss}:] |
| A cache miss incurred because the corresponding CPU has recently |
| accessed more data than will fit into the cache. |
| \item[CAS:]\glsuseriii{cas} |
| Compare-and-swap operation, which is an atomic operation that |
takes a pointer, an old value, and a new value.
| If the pointed-to value is equal to the old value, it is atomically |
| replaced with the new value. |
There is some variety among CAS APIs.
| One variation returns the actual pointed-to value, so that the |
| caller compares the CAS return value to the specified old value, |
| with equality indicating a successful CAS operation. |
Another variation returns a boolean success indication, in which
case a pointer to the old value may be passed in, and if so, the
pointed-to old value is overwritten with the actual value in the
CAS failure case.
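For example, the following C sketch (with hypothetical variables
\co{x} and \co{y}) shows one value-returning variant (a GCC
\co{__sync} builtin) and one boolean variant (the C11 generic
function):
\begin{VerbatimN}
#include <stdatomic.h>

int y;         /* Value-returning variant. */
atomic_int x;  /* Boolean variant. */

void cas_examples(void)
{
	int old, expected;

	/* Returns the value actually observed, which the
	 * caller compares against the specified old value. */
	old = __sync_val_compare_and_swap(&y, 0, 1);
	if (old == 0)
		; /* Success. */

	/* Returns a boolean, and on failure writes the
	 * value actually observed back into "expected". */
	expected = 0;
	if (!atomic_compare_exchange_strong(&x, &expected, 1))
		old = expected; /* The actual value. */
}
\end{VerbatimN}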
| \item[\IXG{Clash Free}:] |
| A forward-progress guarantee in which, in the absence of |
| contention, at least one thread makes progress within a finite |
| period of time. |
| \item[\IXGalth{Code Locking}{code}{locking}:] |
| A simple locking design in which a ``global lock'' is used to protect |
| a set of critical sections, so that access by a given thread |
| to that set is |
| granted or denied based only on the set of threads currently |
| occupying the set of critical sections, not based on what |
| data the thread intends to access. |
| The scalability of a code-locked program is limited by the code; |
| increasing the size of the data set will normally not increase |
| scalability (in fact, will typically \emph{decrease} scalability |
| by increasing ``lock contention''). |
| Contrast with ``data locking''. |
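For example, a minimal pthreads sketch of code locking (with a
hypothetical \co{struct counter}) uses a single global lock for
every instance:
\begin{VerbatimN}
#include <pthread.h>

pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

struct counter {
	long value;
};

void inc_counter(struct counter *cp)
{
	pthread_mutex_lock(&global_lock);   /* All threads    */
	cp->value++;                        /* serialize on   */
	pthread_mutex_unlock(&global_lock); /* this one lock. */
}
\end{VerbatimN}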
| \item[\IXG{Combinatorial Explosion}:] |
| Denotes the exponential increase in executions that |
| formal-verification tools must analyze as problem size increases. |
| \item[\IXG{Combinatorial Implosion}:] |
| Denotes the exponential decrease in executions that |
| formal-verification tools must analyze when a given |
| code fragment is partitioned. |
| \item[\IXGalth{Communication Miss}{communication}{cache miss}:] |
| A cache miss incurred because some other CPU has written to |
| the cache line since the last time this CPU accessed it. |
| \item[\IXG{Concurrent}:] |
| In this book, a synonym of parallel. |
| Please see \cref{sec:app:questions:What is the Difference Between ``Concurrent'' and ``Parallel''?} |
| on \cpageref{sec:app:questions:What is the Difference Between ``Concurrent'' and ``Parallel''?} |
| for a discussion of the recent distinction between these two |
| terms. |
| \item[\IXGh{Control}{dependency}:] |
| The value returned by a load instruction is used to determine |
| whether or not a later store instruction is executed. |
| Control dependencies provide weak memory ordering as described |
| in \cref{sec:memorder:Control Dependencies}. |
| However, because compilers do not understand them, control |
| dependencies are exceedingly fragile, so please avoid using them. |
| If severe performance requirements mean that you absolutely must |
| use control dependencies, please carefully consider the potential |
| calamities discussed in |
| \cref{sec:memorder:Control-Dependency Calamities}. |
| Also, please think carefully about alternative approaches that |
| might permit you to meet your performance requirements without |
| use of control dependencies. |
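For example, in the following Linux-kernel-style sketch (with
hypothetical variables \co{flag} and \co{count}), the value loaded
from \co{flag} determines whether the store to \co{count} executes
at all:
\begin{VerbatimN}
int flag;
int count;

void maybe_count(void)
{
	if (READ_ONCE(flag))          /* The value loaded here */
		WRITE_ONCE(count, 1); /* decides whether this  */
	                              /* store executes.       */
}
\end{VerbatimN}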
| \item[\IXG{Critical Section}:] |
| A section of code guarded by some synchronization mechanism, |
so that its execution is constrained by that mechanism.
For example, if a set of critical sections is guarded by
| the same global lock, then only one of those critical sections |
| may be executing at a given time. |
| If a thread is executing in one such critical section, |
| any other threads must wait until the first thread completes |
| before executing any of the critical sections in the set. |
| \item[\IXGh{Data}{dependency}:] |
| The value returned by a load instruction is used to compute |
| the value stored by a later store instruction. |
| Data dependencies provide weak memory ordering as described |
| in \cref{sec:memorder:Data Dependencies}. |
| However, because compilers do not understand them, data |
| dependencies are fragile, so please pay close attention to the |
| potential difficulties discussed in |
| \cref{sec:memorder:Address- and Data-Dependency Difficulties}. |
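For example, in the following Linux-kernel-style sketch (with
hypothetical variables \co{a} and \co{b}), the value stored to
\co{b} is computed from the value loaded from \co{a}:
\begin{VerbatimN}
int a, b;

void copy_incremented(void)
{
	int tmp;

	tmp = READ_ONCE(a);     /* The value loaded here ...   */
	WRITE_ONCE(b, tmp + 1); /* ... feeds the value stored. */
}
\end{VerbatimN}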
| \item[\IXGh{Data}{Locking}:] |
| A scalable locking design in which each instance of a given |
| data structure has its own lock. |
| If each thread is using a different instance of the |
| data structure, then all of the threads may be executing in |
| the set of critical sections simultaneously. |
| Data locking has the advantage of automatically scaling to |
| increasing numbers of CPUs as the number of instances of |
| data grows. |
| Contrast with ``code locking''. |
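For example, a minimal pthreads sketch of data locking (again with
a hypothetical \co{struct counter}) embeds a lock in each instance:
\begin{VerbatimN}
#include <pthread.h>

struct counter {
	pthread_mutex_t lock;  /* One lock per instance. */
	long value;
};

void inc_counter(struct counter *cp)
{
	pthread_mutex_lock(&cp->lock);   /* Only threads using */
	cp->value++;                     /* this instance need */
	pthread_mutex_unlock(&cp->lock); /* serialize here.    */
}
\end{VerbatimN}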
| \item[\IXG{Data Race}:] |
| A race condition in which several CPUs or threads access |
| a variable concurrently, and in which at least one of those |
| accesses is a store and at least one of those accesses |
| is a plain access. |
| It is important to note that while the presence of data races |
| often indicates the presence of bugs, the absence of data races |
| in no way implies the absence of bugs. |
| (See ``Plain access'' and ``Race condition''.) |
| \item[\IXG{Deadlock}:] |
| A failure mode in which each of several threads is unable to |
| make progress until some other thread makes progress. |
| For example, if two threads acquire a pair of locks in opposite |
| orders, deadlock can result. |
| More information is provided in |
| \cref{sec:locking:Deadlock}. |
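For example, the following pthreads sketch (with hypothetical locks
\co{lock_a} and \co{lock_b}) can deadlock if the two threads run
concurrently, each holding one lock while waiting for the other:
\begin{VerbatimN}
#include <pthread.h>

pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

void *thread_1(void *arg)  /* Acquires A, then B. */
{
	pthread_mutex_lock(&lock_a);
	pthread_mutex_lock(&lock_b); /* May wait forever ...   */
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

void *thread_2(void *arg)  /* Acquires B, then A. */
{
	pthread_mutex_lock(&lock_b);
	pthread_mutex_lock(&lock_a); /* ... as may this one.   */
	pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
	return NULL;
}
\end{VerbatimN}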
| \item[\IXG{Deadlock Free}:] |
| A forward-progress guarantee in which, in the absence of |
| failures, at least one thread makes progress within a finite |
| period of time. |
| \item[\IXGh{Direct-Mapped}{Cache}:] |
| A cache with only one way, so that it may hold only one cache |
| line with a given hash value. |
| \item[\IXG{Efficiency}:] |
| A measure of effectiveness normally expressed as a ratio |
| of some metric actually achieved to some maximum value. |
| The maximum value might be a theoretical maximum, but in |
| parallel programming is often based on the corresponding |
| measured single-threaded metric. |
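For example, one common measure of parallel efficiency divides the
speedup by the number of CPUs:\ given a single-threaded runtime
$T_1$ and a runtime $T_n$ on $n$ CPUs, $E(n) = T_1 / (n \, T_n)$,
with $E(n) = 1$ indicating perfect linear scaling.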
| \item[\IXG{Embarrassingly Parallel}:] |
| A problem or algorithm where adding threads does not significantly |
| increase the overall cost of the computation, resulting in |
| linear speedups as threads are added (assuming sufficient |
| CPUs are available). |
| \item[\IXGalth{Energy Efficiency}{energy}{efficiency}:] |
| Shorthand for ``energy-efficient use'' in which the goal is to |
| carry out a given computation with reduced energy consumption. |
| Sublinear scalability can be an obstacle to energy-efficient use |
| of a multicore system. |
| \item[Epoch-Based Reclamation (EBR):]\glsuseriii{ebr} |
| An \acr{rcu} implementation style put forward by |
| \ppl{Keir}{Fraser}~\cite{KeirAnthonyFraserPhD,UCAM-CL-TR-579,KeirFraser2007withoutLocks}. |
| \item[\IXG{Existence Guarantee}:] |
| An existence guarantee is provided by a synchronization mechanism |
| that prevents a given dynamically allocated object from being |
| freed for the duration of that guarantee. |
| For example, \acr{rcu} provides existence guarantees for the duration |
| of \acr{rcu} read-side critical sections. |
| A similar but strictly weaker guarantee is provided by |
| type-safe memory. |
| \item[\IXGh{Exclusive}{Lock}:] |
| An exclusive lock is a mutual-exclusion mechanism that |
| permits only one thread at a time into the |
| set of critical sections guarded by that lock. |
| \item[\IXG{False Sharing}:] |
| If two CPUs each frequently write to one of a pair of data items, |
| but the pair of data items are located in the same cache line, |
| this cache line will be repeatedly invalidated, ``ping-ponging'' |
| back and forth between the two CPUs' caches. |
| This is a common cause of ``cache thrashing'', also called |
| ``cacheline bouncing'' (the latter most commonly in the Linux |
| community). |
| False sharing can dramatically reduce both performance and |
| scalability. |
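For example, the following sketch (assuming 64-byte cache lines and
a GCC-style alignment attribute) contrasts a layout prone to false
sharing with one that avoids it:
\begin{VerbatimN}
struct counters_bad {
	long a;  /* Written by CPU 0 ...          */
	long b;  /* ... and by CPU 1: both fields */
	         /* likely share a cache line.    */
};

struct counters_good {
	long a __attribute__((aligned(64)));
	long b __attribute__((aligned(64)));
	         /* Each field now occupies its
	          * own cache line. */
};
\end{VerbatimN}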
| \item[\IXG{Forward-Progress Guarantee}:] |
| Algorithms or programs that guarantee that execution will |
| progress at some rate under specified conditions. |
| Academic forward-progress guarantees are grouped into a |
| formal hierarchy shown in |
| \cref{sec:advsync:Non-Blocking Synchronization}. |
| A wide variety of practical forward-progress guarantees are |
| provided by real-time systems, as discussed in |
| \cref{sec:advsync:Parallel Real-Time Computing}. |
| \item[\IXG{Fragmentation}:] |
A memory pool that has a large amount of unused memory, but that
is not laid out so as to permit satisfying a relatively small
request, is said to be fragmented.
| External fragmentation occurs when the space is divided up |
| into small fragments lying between allocated blocks of memory, |
| while internal fragmentation occurs when specific requests or |
| types of requests have been allotted more memory than they |
| actually requested. |
| \item[\IXGh{Fully Associative}{Cache}:] |
| A fully associative cache contains only |
| one set, so that it can hold any subset of |
| memory that fits within its capacity. |
| \item[\IXG{Grace Period}:] |
| A grace period is any contiguous time interval such that |
| any \acr{rcu} read-side critical section that began before the |
| start of that interval has |
| completed before the end of that same interval. |
| Many \acr{rcu} implementations define a grace period to be a |
| time interval during which each thread has passed through at |
| least one quiescent state. |
| Since \acr{rcu} read-side critical sections by definition cannot |
| contain quiescent states, these two definitions are almost |
| always interchangeable. |
| \item[Hardware Transactional Memory (HTM):]\glsuseriii{htm} |
| A transactional-memory system based on hardware instructions |
| provided for this purpose, as discussed in |
| \cref{sec:future:Hardware Transactional Memory}. |
| (See ``Transactional memory''.) |
| \item[\IXG{Hazard Pointer}:] |
| A scalable counterpart to a reference counter in which an |
| object's reference count is represented implicitly by a count |
| of the number of special hazard pointers referencing that object. |
| \item[\IXG{Heisenbug}:] |
| A timing-sensitive bug that disappears from sight when you |
| add print statements or tracing in an attempt to track it |
| down. |
| \item[\IXG{Hot Spot}:] |
| Data structure that is very heavily used, resulting in high |
| levels of contention on the corresponding lock. |
| One example of this situation would be a hash table with |
| a poorly chosen hash function. |
| \item[\IXG{Humiliatingly Parallel}:] |
| A problem or algorithm where adding threads significantly |
| \emph{decreases} the overall cost of the computation, resulting in |
| large superlinear speedups as threads are added (assuming sufficient |
| CPUs are available). |
| \item[\IXG{Immutable}:] |
In this book, a synonym for read-only.
| \item[\IXG{Invalidation}:] |
| When a CPU wishes to write to a data item, it must first ensure |
| that this data item is not present in any other CPUs' cache. |
| If necessary, the item is removed from the other CPUs' caches |
| via ``invalidation'' messages from the writing CPUs to any |
| CPUs having a copy in their caches. |
| \item[IPI:]\glsuseriii{ipi} |
| Inter-processor interrupt, which is an |
| interrupt sent from one CPU to another. |
| IPIs are used heavily in the Linux kernel, for example, within |
| the scheduler to alert CPUs that a high-priority process is now |
| runnable. |
| \item[IRQ:]\glsuseriii{irq} |
| Interrupt request, often used as an abbreviation for ``interrupt'' |
| within the Linux kernel community, as in ``irq handler''. |
| \item[\IXG{Latency}:] |
| The wall-clock time required for a given operation to complete. |
| \item[\IXG{Linearizable}:] |
| A sequence of operations is ``linearizable'' if there is at |
| least one global ordering of the sequence that is consistent |
| with the observations of all CPUs and/or threads. |
| Linearizability is much prized by many researchers, but less |
| useful in practice than one might |
| expect~\cite{AndreasHaas2012FIFOisnt}. |
| \item[\IXG{Livelock}:] |
| A failure mode in which each of several threads is able to |
| execute, but in which a repeating series of failed operations |
| prevents any of the threads from making any useful forward progress. |
| For example, incorrect use of conditional locking |
| (for example, \co{spin_trylock()} in the Linux kernel) |
| can result in livelock. |
| More information is provided in |
| \cref{sec:locking:Livelock and Starvation}. |
| \item[\IXG{Lock}:] |
A software abstraction that can be used to guard critical sections
and that is, as such, an example of a ``mutual-exclusion mechanism''.
| An ``exclusive lock'' permits only one thread at a time into the |
| set of critical sections guarded by that lock, while a |
| ``reader-writer lock'' permits any number of reading |
| threads, or but one writing thread, into the set of critical |
| sections guarded by that lock. |
| (Just to be clear, the presence of a writer thread in any of |
| a given reader-writer lock's critical sections will prevent |
| any reader from entering any of that lock's critical sections |
| and vice versa.) |
| \item[\IXG{Lock Contention}:] |
| A lock is said to be suffering contention when it is being |
| used so heavily that there is often a CPU waiting on it. |
| Reducing lock contention is often a concern when designing |
| parallel algorithms and when implementing parallel programs. |
| \item[\IXG{Lock Free}:] |
| A forward-progress guarantee in which at least one thread makes |
| progress within a finite period of time. |
| \item[\IXG{Marked Access}:] |
| A source-code memory access that uses a special function or |
| macro, such as \co{READ_ONCE()}, \co{WRITE_ONCE()}, |
| \co{atomic_inc()}, and so on, in order to protect that access |
| from compiler and/or hardware optimizations. |
| In contrast, a plain access simply mentions the name of |
| the object being accessed, so that in the following, line~2 |
| is the plain-access equivalent of line~1: |
| \begin{VerbatimN} |
| WRITE_ONCE(a, READ_ONCE(b) + READ_ONCE(c)); |
| a = b + c; |
| \end{VerbatimN} |
| \item[\IXG{Memory}:] |
| From the viewpoint of memory models, the main memory, |
| caches, and store buffers in which values might be stored. |
| However, this term is often used to denote the main memory |
| itself, excluding caches and store buffers. |
| \item[\IXG{Memory Barrier}:] |
| A compiler directive that might also include a special |
| memory-barrier instruction. |
The purpose of a memory barrier is to ensure that memory-reference
instructions executed before the memory barrier are ordered before
those executed after it.
| (See also ``read memory barrier'' and ``write memory barrier''.) |
| \item[\IXG{Memory Consistency}:] |
| A set of properties that impose constraints on the order in |
| which accesses to groups of variables appear to occur. |
| Memory consistency models range from sequential consistency, |
| a very constraining model popular in academic circles, through |
| process consistency, release consistency, and weak consistency. |
| \item[\IXGaltr{MESI Protocol}{MESI protocol}:] |
| The |
| cache-coherence protocol featuring |
| modified, exclusive, shared, and invalid (MESI) states, |
| so that this protocol is named after the states that the |
| cache lines in a given cache can take on. |
| A modified line has been recently written to by this CPU, |
| and is the sole representative of the current value of |
| the corresponding memory location. |
| An exclusive cache line has not been written to, but this |
| CPU has the right to write to it at any time, as the line |
| is guaranteed not to be replicated into any other CPU's cache |
| (though the corresponding location in main memory is up to date). |
| A shared cache line is (or might be) replicated in some other |
CPUs' caches, meaning that this CPU must interact with those other
| CPUs before writing to this cache line. |
| An invalid cache line contains no value, instead representing |
| ``empty space'' in the cache into which data from memory might |
| be loaded. |
| \item[\IXGaltr{Moore's Law}{Moore's Law}:] |
| A 1965 empirical projection by Gordon Moore that |
| transistor density increases exponentially over |
| time~\cite{GordonMoore1965MooresLaw}. |
| \item[\IXG{Multi-Copy Atomic}:] |
| A memory-consistency model in which all writes to memory appear |
| to occur in an order consistent with a single global order. |
| A read of a given location does not return a given value written |
| until that value has propagated throughout the system. |
| Note that this means that if a CPU does a write to a given |
| location followed immediately by a read from that same location, |
| it might take significant time for that read to complete. |
| Sometimes called ``full(y) multi-copy atomic'' for distinction. |
| (See also ``other-multi-copy atomic'' and ``non-multi-copy atomic''.) |
| \item[\IXG{Mutual-Exclusion Mechanism}:] |
| A software abstraction that regulates threads' access to |
| ``critical sections'' and corresponding data. |
| \item[NMI:]\glsuseriii{nmi} |
| Non-maskable interrupt. |
| As the name indicates, this is an extremely high-priority |
| interrupt that cannot be masked. |
| These are used for hardware-specific purposes such as profiling. |
| The advantage of using NMIs for profiling is that it allows you |
| to profile code that runs with interrupts disabled. |
| \item[\IXG{Non-Blocking}:] |
| A group of academic forward-progress guarantees that includes |
| bounded population-oblivious wait free, |
| bounded wait free, |
| wait free, |
| lock free, |
| obstruction free, |
| clash free, |
| starvation free, and |
| deadlock free. |
| See \cref{sec:advsync:Non-Blocking Synchronization} |
| for more information. |
| \item[Non-Blocking Synchronization (NBS):]\glsuseriii{nbs} |
| The use of algorithms, mechanisms, or techniques that provide |
| non-blocking forward-progress guarantees. |
| NBS is often used in a more restrictive sense of providing one |
| of the stronger forward-progress guarantees, usually wait free or |
| lock free, but sometimes also obstruction free. |
| (See ``Non-blocking''.) |
| \item[\IXGhy{Non-}{Multi-Copy Atomic}:] |
| A memory-consistency model in which, in the absence of explicit |
| memory-ordering instructions, writes to memory may appear to |
| occur in different orders from the perspective of different CPUs. |
| (See also ``multi-copy atomic'' and ``other-multi-copy atomic''.) |
| \item[NUCA:]\glsuseriii{nuca} |
| Non-uniform cache architecture, where groups of CPUs share |
| caches and/or store buffers. |
| CPUs in a group can therefore exchange cache lines with each |
| other much more quickly than they can with CPUs in other groups. |
| Systems comprised of CPUs with hardware threads will generally |
| have a NUCA architecture. |
| \item[NUMA:]\glsuseriii{numa} |
| Non-uniform memory architecture, where memory is split into |
| banks and each such bank is ``close'' to a group of CPUs, |
| the group being termed a ``NUMA node''. |
| An example NUMA machine is Sequent's NUMA-Q system, where |
| each group of four CPUs had a bank of memory nearby. |
| The CPUs in a given group can access their memory much |
| more quickly than another group's memory. |
| \item[\IXGaltr{NUMA Node}{NUMA node}:] |
| A group of closely placed CPUs and associated memory within |
a larger NUMA machine.
| \item[\IXG{Obstruction Free}:] |
| A forward-progress guarantee in which, in the absence of |
| contention, every thread makes progress within a finite |
| period of time. |
| \item[\IXGhy{Other-}{Multi-Copy Atomic}:] |
| A memory-consistency model in which a given set of writes to |
| memory appear to occur in an order consistent with a single |
| global order, but only from the perspective of CPUs that did |
| not execute any of the writes in that set. |
| Note that this means that if a CPU does a write to a given |
| location followed immediately by a read from that same location, |
| that read can immediately return the value written. |
| (See also ``multi-copy atomic'' and ``non-multi-copy atomic''.) |
| \item[\IXG{Overhead}:] |
| Operations that must be executed, but which do not contribute |
| directly to the work that must be accomplished. |
| For example, lock acquisition and release is normally considered |
| to be overhead, and specifically to be synchronization overhead. |
| \item[\IXG{Parallel}:] |
| In this book, a synonym of concurrent. |
| Please see \cref{sec:app:questions:What is the Difference Between ``Concurrent'' and ``Parallel''?} |
| on \cpageref{sec:app:questions:What is the Difference Between ``Concurrent'' and ``Parallel''?} |
| for a discussion of the recent distinction between these two |
| terms. |
| \item[\IXG{Performance}:] |
| Rate at which work is done, expressed as work per unit time. |
| If this work is fully serialized, then the performance will |
| be the reciprocal of the mean latency of the work items. |
| \item[\IXGr{Pipelined CPU}:] |
| A CPU with a pipeline, which is |
a flow of instructions internal to the CPU that
| is in some way similar to an assembly line, with many of |
| the same advantages and disadvantages. |
| In the 1960s through the early 1980s, pipelined CPUs were the |
| province of supercomputers, but started appearing in microprocessors |
| (such as the 80486) in the late 1980s. |
| \item[\IXG{Plain Access}:] |
| A source-code memory access that simply mentions the name of |
| the object being accessed. |
| (See ``Marked access''.) |
| \item[\IXGalth{Process Consistency}{process}{memory consistency}:] |
| A memory-consistency model in which each CPU's stores appear to |
| occur in program order, but in which different CPUs might see |
| accesses from more than one CPU as occurring in different orders. |
| \item[\IXG{Program Order}:] |
| The order in which a given thread's instructions |
| would be executed by a now-mythical ``in-order'' CPU that |
| completely executed each instruction before proceeding to |
| the next instruction. |
| (The reason such CPUs are now the stuff of ancient myths |
| and legends is that they were extremely slow. |
| These dinosaurs were one of the many victims of |
| Moore's-Law-driven increases in CPU clock frequency. |
Some claim that these beasts will roam the earth once again,
while others vehemently disagree.)
| \item[\IXG{Quiescent State}:] |
| In \acr{rcu}, a point in the code where there can be no references held |
| to \acr{rcu}-protected data structures, which is normally any point |
| outside of an \acr{rcu} read-side critical section. |
| Any interval of time during which all threads pass through at |
| least one quiescent state each is termed a ``grace period''. |
| \item[Quiescent-State-Based Reclamation (QSBR):]\glsuseriii{qsbr} |
| An \acr{rcu} implementation style characterized by explicit quiescent |
| states. |
| In \acr{qsbr} implementations, read-side markers |
| (\co{rcu_read_lock()} and \co{rcu_read_unlock()} in the Linux |
| kernel) are no-ops~\cite{McKenney98,Slingwine95}. |
| Hooks in other parts of the software (for example, the Linux-kernel |
| scheduler) provide the quiescent states. |
| \item[\IXG{Race Condition}:] |
| Any situation where multiple CPUs or threads can interact, |
| though this term is often used in cases where such interaction |
| is undesirable. |
| (See ``Data race''.) |
| \item[\IXGaltr{RCU-Protected Data}{RCU-protected data}:] |
| A block of dynamically allocated memory whose freeing will be |
| deferred such that an \acr{rcu} grace period will elapse between the |
| time that there were no longer any \acr{rcu}-reader-accessible pointers |
| to that block and the time that that block is freed. |
| This ensures that no \acr{rcu} readers will have access to that block at |
| the time that it is freed. |
| \item[\IXGaltr{RCU-Protected Pointer}{RCU-protected pointer}:] |
| A pointer to \acr{rcu}-protected data. |
| Such pointers must be handled carefully, for example, any reader |
| that intends to dereference an \acr{rcu}-protected pointer must |
| use \co{rcu_dereference()} (or stronger) to load that pointer, |
| and any updater must use \co{rcu_assign_pointer()} (or stronger) |
| to store to that pointer. |
| More information is provided in |
| \cref{sec:memorder:Address- and Data-Dependency Difficulties}. |
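For example, a Linux-kernel-style sketch (with a hypothetical
structure \co{foo} and global pointer \co{gp}) might access such a
pointer as follows:
\begin{VerbatimN}
struct foo {
	int a;
};
struct foo *gp;  /* RCU-protected pointer. */

int reader(void)
{
	struct foo *p;
	int ret = -1;

	rcu_read_lock();
	p = rcu_dereference(gp);      /* Reader-side load.   */
	if (p)
		ret = p->a;
	rcu_read_unlock();
	return ret;
}

void updater(struct foo *newp)
{
	rcu_assign_pointer(gp, newp); /* Updater-side store. */
}
\end{VerbatimN}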
| \item[\IXGalthmr{RCU Read-Side Critical Section}{RCU read-side}{critical section}:] |
| A section of code protected by \acr{rcu}, for example, beginning with |
| \co{rcu_read_lock()} and ending with \co{rcu_read_unlock()}. |
| (See ``Read-side critical section''.) |
| \item[Read-Copy Update (RCU):]\glsuseriii{rcu} |
| A synchronization mechanism that can be thought of as a replacement |
| for reader-writer locking or reference counting. |
| RCU provides extremely low-overhead access for readers, while |
| writers incur additional overhead maintaining old versions |
| for the benefit of pre-existing readers. |
| Readers neither block nor spin, and thus cannot participate in |
| deadlocks, however, they also can see stale data and can |
| run concurrently with updates. |
| RCU is thus best-suited for read-mostly situations where |
| stale data can either be tolerated (as in routing tables) |
| or avoided (as in the Linux kernel's System V IPC implementation). |
| \item[\IXGh{Read}{Memory Barrier}:] |
| A memory barrier that is only guaranteed to affect the ordering |
| of load instructions, that is, reads from memory. |
| (See also ``memory barrier'' and ``write memory barrier''.) |
| \item[\IXGalth{Read Miss}{read}{cache miss}:] |
| A cache miss incurred because the corresponding CPU attempted |
| to either read from or write to a cache line that is absent |
| from its cache. |
| Note that cache lines are normally larger than machine words, |
| so a store instruction must read an entire cache line in order |
| to be able to write to a portion of it. |
| \item[\IXG{Read Mostly}:] |
| Read-mostly data is (again, as the name implies) rarely updated. |
| However, it might be updated at any time. |
| \item[\IXG{Read Only}:] |
| Read-only data is, as the name implies, never updated except |
| by beginning-of-time initialization. |
| In this book, a synonym for immutable. |
| \item[\IXGh{Read-Side}{Critical Section}:] |
| A section of code guarded by read-acquisition of |
| some reader-writer synchronization mechanism. |
For example, if one set of critical sections is guarded by
read-acquisition of
a given global reader-writer lock, while a second set of critical
sections is guarded by write-acquisition of that same reader-writer
| lock, then the first set of critical sections will be the |
| read-side critical sections for that lock. |
| Any number of threads may concurrently execute the read-side |
| critical sections, but only if no thread is executing one of |
| the write-side critical sections. |
| (See also ``RCU read-side critical section''.) |
| \item[\IXGh{Reader-Writer}{Lock}:] |
| A reader-writer lock is a mutual-exclusion mechanism that |
| permits any number of reading |
| threads, or but one writing thread, into the set of critical |
| sections guarded by that lock. |
| Threads attempting to write must wait until all pre-existing |
| reading threads release the lock, and, similarly, if there |
| is a pre-existing writer, any threads attempting to write must |
| wait for the writer to release the lock. |
| A key concern for reader-writer locks is ``fairness'': |
| Can an unending stream of readers starve a writer or vice versa? |
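For example, a minimal pthreads sketch (with a hypothetical
\co{shared_value}) allows concurrent readers but exclusive writers:
\begin{VerbatimN}
#include <pthread.h>

pthread_rwlock_t rwl = PTHREAD_RWLOCK_INITIALIZER;
int shared_value;

int read_value(void)     /* Many readers may run at once. */
{
	int v;

	pthread_rwlock_rdlock(&rwl);
	v = shared_value;
	pthread_rwlock_unlock(&rwl);
	return v;
}

void write_value(int v)  /* Writers exclude everyone else. */
{
	pthread_rwlock_wrlock(&rwl);
	shared_value = v;
	pthread_rwlock_unlock(&rwl);
}
\end{VerbatimN}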
| \item[\IXG{Real Time}:] |
| A situation in which getting the correct result is not sufficient, |
| but where this result must also be obtained within a given amount |
| of time. |
| \item[\IXG{Reference Count}:] |
| A counter that tracks the number of users of a given object or |
| entity. |
| Reference counters provide existence guarantees and are sometimes |
| used to implement garbage collectors. |
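For example, a minimal C11 sketch (with a hypothetical
\co{struct obj}, and with memory-ordering details omitted) might
acquire and release references as follows:
\begin{VerbatimN}
#include <stdatomic.h>
#include <stdlib.h>

struct obj {
	atomic_int refcnt; /* Number of current users. */
	/* ... payload ... */
};

void obj_get(struct obj *p)
{
	/* Caller must already hold a reference or
	 * otherwise have an existence guarantee. */
	atomic_fetch_add(&p->refcnt, 1);
}

void obj_put(struct obj *p)
{
	/* Drop a reference, freeing on the last one. */
	if (atomic_fetch_sub(&p->refcnt, 1) == 1)
		free(p);
}
\end{VerbatimN}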
| \item[\IXG{Release Store}:] |
| A write to memory that has release semantics. |
| Normal use cases pair an acquire load with a release store, |
| in which case if the load returns the value stored, then all |
| code executed by the loading CPU after that acquire load will |
| see the effects of all memory-reference instructions executed |
| by the storing CPU prior to that release store. |
| Releasing a lock provides similar memory-ordering semantics, |
| hence the ``release'' in ``release store''. |
| (See also ``acquire load'' and ``memory barrier''.) |
| \item[\IXG{Scalability}:] |
| A measure of how effectively a given system is able to utilize |
| additional resources. |
| For parallel computing, the additional resources are usually |
| additional CPUs. |
| \item[\IXGh{Sequence}{Lock}:] |
| A reader-writer synchronization mechanism in which readers |
| retry their operations if a writer was present. |
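For example, a Linux-kernel-style sketch (with a hypothetical
\co{seqlock_t} instance \co{my_lock} protecting a pair of values)
has readers retry until they see a consistent snapshot:
\begin{VerbatimN}
seqlock_t my_lock;  /* Initialization omitted. */
unsigned long val_lo, val_hi;

void read_vals(unsigned long *lo, unsigned long *hi)
{
	unsigned int seq;

	do {                 /* Retry if a writer intervened. */
		seq = read_seqbegin(&my_lock);
		*lo = val_lo;
		*hi = val_hi;
	} while (read_seqretry(&my_lock, seq));
}

void write_vals(unsigned long lo, unsigned long hi)
{
	write_seqlock(&my_lock);
	val_lo = lo;
	val_hi = hi;
	write_sequnlock(&my_lock);
}
\end{VerbatimN}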
| \item[\IXGalth{Sequential Consistency}{sequential}{memory consistency}:] |
| A memory-consistency model where all memory references appear to occur |
| in an order consistent with |
| a single global order, and where each CPU's memory references |
| appear to all CPUs to occur in program order. |
\item[Software Transactional Memory (STM):]\glsuseriii{stm}
A transactional-memory system capable of running on computer systems
| without special hardware support. |
| (See ``Transactional memory''.) |
| \item[\IXG{Starvation}:] |
| A condition where at least one CPU or thread is unable to make |
| progress due to an unfortunate series of resource-allocation |
| decisions, as discussed in |
| \cref{sec:locking:Livelock and Starvation}. |
| For example, in a multisocket system, CPUs on one socket having |
| privileged access to the data structure implementing a given lock |
| could prevent CPUs on other sockets from ever acquiring that lock. |
| \item[\IXG{Starvation Free}:] |
| A forward-progress guarantee in which, in the absence of |
| failures, every thread makes progress within a finite |
| period of time. |
| \item[\IXG{Store Buffer}:] |
| A small set of internal registers used by a given CPU |
| to record pending stores |
| while the corresponding cache lines are making their |
| way to that CPU\@. |
| Also called ``store queue''. |
| \item[\IXG{Store Forwarding}:] |
| An arrangement where a given CPU refers to its store buffer |
| as well as its cache so as to ensure that the software sees |
| the memory operations performed by this CPU as if they |
| were carried out in program order. |
| \item[\IXGr{Superscalar CPU}:] |
| A scalar (non-vector) CPU capable of executing multiple instructions |
| concurrently. |
| This is a step up from a pipelined CPU that executes multiple |
| instructions in an assembly-line fashion---in a superscalar |
| CPU, each stage of the pipeline would be capable of handling |
| more than one instruction. |
| For example, if the conditions were exactly right, |
| the Intel Pentium Pro CPU from the mid-1990s could |
| execute two (and sometimes three) instructions per clock cycle. |
| Thus, a 200\,MHz Pentium Pro CPU could ``retire'', or complete the |
| execution of, up to 400 million instructions per second. |
| \item[\IXG{Synchronization}:] |
| Means for avoiding destructive interactions among CPUs or threads. |
| Synchronization mechanisms include atomic RMW operations, memory |
| barriers, locking, reference counting, hazard pointers, sequence |
| locking, RCU, non-blocking synchronization, and transactional |
| memory. |
| \item[\IXG{Teachable}:] |
| A topic, concept, method, or mechanism that teachers believe that |
| they understand completely and are therefore comfortable teaching. |
| \item[\IXG{Throughput}:] |
| A performance metric featuring work items completed per unit time. |
| \item[Transactional Lock Elision (TLE):]\glsuseriii{tle} |
| The use of transactional memory to emulate locking. |
| Synchronization is instead carried out by conflicting accesses |
| to the data to be protected by the lock. |
| In some cases, this can increase performance because TLE |
| avoids contention on the lock |
| word~\cite{MartinPohlack2011HTM2TLE,Kleen:2014:SEL:2566590.2576793,PascalFelber2016rwlockElision,SeongJaePark2020HTMRCUlock}. |
| \item[Transactional Memory (TM):]\glsuseriii{tm} |
| A synchronization mechanism that gathers groups of memory |
| accesses so as to execute them atomically from the viewpoint |
| of transactions on other CPUs or threads, discussed in |
| \cref{sec:future:Transactional Memory,sec:future:Hardware Transactional Memory}. |
| \item[\IXG{Type-Safe Memory}:] |
| Type-safe memory~\cite{Cheriton96a} is provided by a |
| synchronization mechanism that prevents a given dynamically |
| allocated object from changing to an incompatible type. |
| Note that the object might well be freed and then reallocated, but |
| the reallocated object is guaranteed to be of a compatible type. |
| Within the Linux kernel, type-safe memory is provided within |
| RCU read-side critical sections for memory allocated from slabs |
| marked with the \co{SLAB_TYPESAFE_BY_RCU} flag. |
| The strictly stronger existence guarantee also prevents freeing |
| of the protected object. |
| \item[Unbounded Transactional Memory (UTM):]\glsuseriii{utm} |
| A transactional-memory system based on hardware instructions |
| provided for this purpose, but with special hardware or |
| software capabilities that allow a given transaction to |
| have a very large memory footprint. |
| Such a system would at least partially avoid |
| HTM's transaction-size limitations called out in |
| \cref{sec:future:Transaction-Size Limitations}. |
| (See ``Hardware transactional memory''.) |
| \item[\IXG{Unfairness}:] |
| A condition where the progress of at least one CPU or thread |
| is impeded by an unfortunate series of resource-allocation |
| decisions, as discussed in |
| \cref{sec:locking:Livelock and Starvation}. |
| Extreme levels of unfairness are termed ``starvation''. |
| \item[\IXG{Unteachable}:] |
| A topic, concept, method, or mechanism that the teacher does |
not understand well and is therefore uncomfortable teaching.
| \item[\IXGr{Vector CPU}:] |
| A CPU that can apply a single instruction to multiple items of |
| data concurrently. |
| In the 1960s through the 1980s, only supercomputers had vector |
| capabilities, but the advent of MMX in x86 CPUs and VMX in |
| PowerPC CPUs brought vector processing to the masses. |
| \item[\IXG{Wait Free}:] |
| A forward-progress guarantee in which every thread makes |
| progress within a finite period of time. |
| \item[\IXGalth{Warm-up Miss}{warm-up}{cache miss}:] |
| A cache miss incurred because the corresponding cache line |
| never was in the corresponding CPU's cache. |
| In a benchmark, warm-up misses can be avoided during |
| the measurement phase by warming the cache with a few |
| pre-measurement-phase iterations of the benchmark. |
| \item[\IXGh{Write}{Memory Barrier}:] |
| A memory barrier that is only guaranteed to affect the ordering |
| of store instructions, that is, writes to memory. |
| (See also ``memory barrier'' and ``read memory barrier''.) |
| \item[\IXGalth{Write Miss}{write}{cache miss}:] |
| A cache miss incurred because the corresponding CPU attempted |
| to write to a cache line that is read-only, for example, due |
| to that cache line being replicated in other CPUs' caches. |
| \item[\IXG{Write Mostly}:] |
| Write-mostly data is (yet again, as the name implies) frequently |
| updated. |
| \item[\IXGh{Write-Side}{Critical Section}:] |
| A section of code guarded by write-acquisition of |
| some reader-writer synchronization mechanism. |
For example, if one set of critical sections is guarded by
write-acquisition of
a given global reader-writer lock, while a second set of critical
sections is guarded by read-acquisition of that same reader-writer
| lock, then the first set of critical sections will be the |
| write-side critical sections for that lock. |
| Only one thread may execute in the write-side critical section |
at a time, and even then only if no threads are
| executing concurrently in any of the corresponding read-side |
| critical sections. |
| \end{description} |