appendix/whymb: Add Quick Quiz on invalidate acknowledgments
Appendix C.4.2 ("Invalidate Queues and Invalidate Acknowledge") is
a bit abrupt and causes readers some angst. So add a Quick Quiz to
explain things. This will probably cause even more angst, but so it goes...
Reported-by: Philipp Stanner <stanner@posteo.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
diff --git a/appendix/whymb/whymemorybarriers.tex b/appendix/whymb/whymemorybarriers.tex
index 01bdbc0..4930f88 100644
--- a/appendix/whymb/whymemorybarriers.tex
+++ b/appendix/whymb/whymemorybarriers.tex
@@ -1124,12 +1124,61 @@
in a short time period, a given CPU might fall behind in processing
them, thus possibly stalling all the other CPUs.
-However, the CPU need not actually invalidate the cache line
-before sending the acknowledgement.
-It could instead queue the invalidate message with the understanding
+However, in real-world systems, the CPU might not actually invalidate
+the cache line before sending the acknowledgement.
+It might instead queue the invalidate message with the understanding
that the message will be processed before the CPU sends any further
messages regarding that cache line.
+\QuickQuiz{
+ How can the CPU possibly send the invalidate acknowledge before
+ actually invalidating the corresponding cache line without
+ wreaking all kinds of havoc, including different CPUs seeing
+ multiple different values for the same variable at the same
+ time???
+}\QuickQuizAnswer{
+ To start with, CPUs can already see multiple values for a given
+ variable at the same time, as amply demonstrated by
+ \cref{fig:memorder:A Variable With More Simultaneous Values}
+ on
+ \cpageref{fig:memorder:A Variable With More Simultaneous Values}.
+ Furthermore, weakly ordered architectures provide very few
+ ordering guarantees, so if the CPU can prove that the cache-coherence
+ protocol allows it to order its subsequent loads before the
+ other CPU's store (the one that caused the invalidate to be sent),
+ then it can continue supplying loads from the doomed cache line.
+
+ Moreover, the CPU might be able to proceed with speculative
+ execution, which can be squashed if need be.
+ In short, speculative execution allows the CPU to violate the
+ rules if (and only if!\@) it can hide any such violations from
+ the user code.
+
+ However, if the cache line was in ``modified'' or ``exclusive''
+ state when the invalidate was received, the CPU must first move
+ it to ``shared'' (or all the way to ``invalid'') before sending
+ the invalidate acknowledge message.
+ Otherwise, stores might be lost or atomic read-modify-write
+ operations on the other CPU might be executed incorrectly.
+ And even then, the CPU will need to keep track of the fact that
+ the cache line is doomed, which means that we are using something
+ other than strict MESI\@.
+ On the other hand, hardware architects can and do use much more
+ complex cache-coherence protocols than MESI, which in turn permit
+ more optimizations courtesy of the additional state information
+ that these more complex protocols track.
+
+ This is all in the name of optimization, although such hardware
+ optimizations must be implemented extremely carefully.
+ Of course, if the code contains memory-ordering instructions
+ such as memory barriers, then the CPU's ability to optimize
+ will be more limited.
+
+ But never forget that CPU-based optimizations are quite limited
+ compared to those of modern optimizing compilers!
+ Then again, compiler optimizations are also limited by memory-ordering constraints.
+}\QuickQuizEnd
+
\subsection{Invalidate Queues and Invalidate Acknowledge}
\label{sec:app:whymb:Invalidate Queues and Invalidate Acknowledge}