defer/rcuexercises.tex - pub/scm/linux/kernel/git/paulmck/perfbook - Git at Google

 % defer/rcuexercises.tex

 \subsection{RCU Exercises}
 \label{sec:defer:RCU Exercises}

 This section is organized as a series of Quick Quizzes that invite you
 to apply RCU to a number of examples earlier in this book.
 The answer to each Quick Quiz gives some hints, and also contains a
 pointer to a later section where the solution is explained at length.
 The \co{rcu_read_lock()}, \co{rcu_read_unlock()}, \co{rcu_dereference()},
 \co{rcu_assign_pointer()}, and \co{synchronize_rcu()} primitives should
 suffice for most of these exercises.

 \QuickQuiz{}
 	The statistical-counter implementation shown in
 	Figure~\ref{fig:count:Per-Thread Statistical Counters}
 	(\url{count_end.c})
 	used a global lock to guard the summation in \co{read_count()},
 	which resulted in poor performance and negative scalability.
 	How could you use RCU to provide \co{read_count()} with
 	excellent performance and good scalability.
 	(Keep in mind that \co{read_count()}'s scalability will
 	necessarily be limited by its need to scan all threads'
 	counters.)
 \QuickQuizAnswer{
 	Hint: place the global variable \co{finalcount} and the
 	array \co{counterp[]} into a single RCU-protected struct.
 	At initialization time, this structure would be allocated
 	and set to all zero and \co{NULL}.

 	The \co{inc_count()} function would be unchanged.

 	The \co{read_count()} function would use \co{rcu_read_lock()}
 	instead of acquiring \co{final_mutex}, and would need to
 	use \co{rcu_dereference()} to acquire a reference to the
 	current structure.

 	The \co{count_register_thread()} function would set the
 	array element corresponding to the newly created thread
 	to reference that thread's per-thread \co{counter} variable.

 	The \co{count_unregister_thread()} function would need to
 	allocate a new structure, acquire \co{final_mutex},
 	copy the old structure to the new one, add the outgoing
 	thread's \co{counter} variable to the total, \co{NULL}
 	the pointer to this same \co{counter} variable,
 	use \co{rcu_assign_pointer()} to install the new structure
 	in place of the old one, release \co{final_mutex},
 	wait for a grace period, and finally free the old structure.

 	Does this really work?
 	Why or why not?

 	See
 	Section~\ref{sec:together:RCU and Per-Thread-Variable-Based Statistical Counters}
 	on
 	page~\pageref{sec:together:RCU and Per-Thread-Variable-Based Statistical Counters}
 	for more details.
 } \QuickQuizEnd


 \QuickQuiz{}
 	Section~\ref{sec:count:Applying Specialized Parallel Counters}
 	showed a fanciful pair of code fragments that dealt with counting
 	I/O accesses to removable devices.
 	These code fragments suffered from high overhead on the fastpath
 	(starting an I/O) due to the need to acquire a reader-writer
 	lock.
 	How would you use RCU to provide excellent performance and
 	scalability?
 	(Keep in mind that the performance of the common-case first
 	code fragment that does I/O accesses is much more important
 	than that of the device-removal code fragment.)
 \QuickQuizAnswer{
 	Hint: replace the read-acquisitions of the reader-writer lock
 	with RCU read-side critical sections, then adjust the
 	device-removal code fragment to suit.

 	See
 	Section~\ref{sec:together:RCU and Counters for Removable I/O Devices}
 	on
 	Page~\pageref{sec:together:RCU and Counters for Removable I/O Devices}
 	for one solution to this problem.
 } \QuickQuizEnd
	% defer/rcuexercises.tex

	\subsection{RCU Exercises}
	\label{sec:defer:RCU Exercises}

	This section is organized as a series of Quick Quizzes that invite you
	to apply RCU to a number of examples earlier in this book.
	The answer to each Quick Quiz gives some hints, and also contains a
	pointer to a later section where the solution is explained at length.
	The \co{rcu_read_lock()}, \co{rcu_read_unlock()}, \co{rcu_dereference()},
	\co{rcu_assign_pointer()}, and \co{synchronize_rcu()} primitives should
	suffice for most of these exercises.

	\QuickQuiz{}
	The statistical-counter implementation shown in
	Figure~\ref{fig:count:Per-Thread Statistical Counters}
	(\url{count_end.c})
	used a global lock to guard the summation in \co{read_count()},
	which resulted in poor performance and negative scalability.
	How could you use RCU to provide \co{read_count()} with
	excellent performance and good scalability.
	(Keep in mind that \co{read_count()}'s scalability will
	necessarily be limited by its need to scan all threads'
	counters.)
	\QuickQuizAnswer{
	Hint: place the global variable \co{finalcount} and the
	array \co{counterp[]} into a single RCU-protected struct.
	At initialization time, this structure would be allocated
	and set to all zero and \co{NULL}.

	The \co{inc_count()} function would be unchanged.

	The \co{read_count()} function would use \co{rcu_read_lock()}
	instead of acquiring \co{final_mutex}, and would need to
	use \co{rcu_dereference()} to acquire a reference to the
	current structure.

	The \co{count_register_thread()} function would set the
	array element corresponding to the newly created thread
	to reference that thread's per-thread \co{counter} variable.

	The \co{count_unregister_thread()} function would need to
	allocate a new structure, acquire \co{final_mutex},
	copy the old structure to the new one, add the outgoing
	thread's \co{counter} variable to the total, \co{NULL}
	the pointer to this same \co{counter} variable,
	use \co{rcu_assign_pointer()} to install the new structure
	in place of the old one, release \co{final_mutex},
	wait for a grace period, and finally free the old structure.

	Does this really work?
	Why or why not?

	See
	Section~\ref{sec:together:RCU and Per-Thread-Variable-Based Statistical Counters}
	on
	page~\pageref{sec:together:RCU and Per-Thread-Variable-Based Statistical Counters}
	for more details.
	} \QuickQuizEnd


	\QuickQuiz{}
	Section~\ref{sec:count:Applying Specialized Parallel Counters}
	showed a fanciful pair of code fragments that dealt with counting
	I/O accesses to removable devices.
	These code fragments suffered from high overhead on the fastpath
	(starting an I/O) due to the need to acquire a reader-writer
	lock.
	How would you use RCU to provide excellent performance and
	scalability?
	(Keep in mind that the performance of the common-case first
	code fragment that does I/O accesses is much more important
	than that of the device-removal code fragment.)
	\QuickQuizAnswer{
	Hint: replace the read-acquisitions of the reader-writer lock
	with RCU read-side critical sections, then adjust the
	device-removal code fragment to suit.

	See
	Section~\ref{sec:together:RCU and Counters for Removable I/O Devices}
	on
	Page~\pageref{sec:together:RCU and Counters for Removable I/O Devices}
	for one solution to this problem.
	} \QuickQuizEnd