% cpu/cpu.tex
% SPDX-License-Identifier: CC-BY-SA-3.0
\QuickQuizChapter{chp:Hardware and its Habits}{Hardware and its Habits}
%
\Epigraph{Premature abstraction is the root of all evil.}
{\emph{A cast of thousands}}
Most people have an intuitive understanding that passing messages between
systems is considerably more expensive than performing simple calculations
within the confines of a single system.
However, it is not always so clear that communicating among threads within
the confines of a single shared-memory system can also be quite expensive.
This chapter therefore looks at the cost of synchronization and communication
within a shared-memory system.
These few pages can do no more than scratch the surface of shared-memory
parallel hardware design; readers desiring more detail would do well
to start with a recent edition of Hennessy and Patterson's classic
text~\cite{Hennessy2011,Hennessy95a}.
\QuickQuiz{}
Why should parallel programmers bother learning low-level
properties of the hardware?
Wouldn't it be easier, better, and more general to remain at
a higher level of abstraction?
\QuickQuizAnswer{
It might well be easier to ignore the detailed properties of
the hardware, but in most cases it would be quite foolish
to do so.
If you accept that the only purpose of parallelism is to
increase performance, and if you further accept that
performance depends on detailed properties of the hardware,
then it logically follows that parallel programmers are going
to need to know at least a few hardware properties.
This is the case in most engineering disciplines.
Would \emph{you} want to use a bridge designed by an
engineer who did not understand the properties of
the concrete and steel making up that bridge?
If not, why would you expect a parallel programmer to be
able to develop competent parallel software without at least
\emph{some} understanding of the underlying hardware?
} \QuickQuizEnd
\input{cpu/overview}
\input{cpu/overheads}
\input{cpu/hwfreelunch}
\input{cpu/swdesign}
So, to sum up:
\begin{enumerate}
\item The good news is that multicore systems are inexpensive and
readily available.
\item More good news: The overhead of many synchronization operations
is much lower than it was on parallel systems from the early 2000s.
\item The bad news is that the overhead of cache misses is still high,
especially on large systems.
\end{enumerate}
The remainder of this book describes ways of handling this bad news.
In particular,
Chapter~\ref{chp:Tools of the Trade} will cover some of the low-level
tools used for parallel programming,
Chapter~\ref{chp:Counting} will investigate problems and solutions to
parallel counting, and
Chapter~\ref{cha:Partitioning and Synchronization Design}
will discuss design disciplines that promote performance and scalability.