========================
FILESYSTEM LOCAL CACHING
========================
========
CONTENTS
========
(*) Introduction.
(*) Setting up a cache.
(*) Using the cache with NFS.
(*) Setting cache cull limits.
(*) Monitoring.
(*) Relocating the cache.
(*) Further information.
============
INTRODUCTION
============
Linux now supports local caching of certain filesystems (currently only NFS and
the in-kernel AFS filesystems). This permits remote data to be cached on local
disk, thus potentially speeding up future accesses to that data by avoiding the
need to go to the network and fetch it again.
This facility (known as FS-Cache) is designed to be as transparent as possible
to a user of the system. Applications should just be able to use NFS files as
normal, without any knowledge of there being a cache.
The administrator has to set up the cache in the first place, tell the system
to use it and then mark the NFS mount points they want caching, but the user
need not see any of that.
The facility can be conceptualised by the following diagram:
+--------+ +--------+ +--------+ +--------+
| | /\ | | | | | |
| NFS |--- \ ---->| NFS |------>| Page |---->| User |
| Server | \/ | Client | ^ | Cache | | App |
| | Network | | | | (RAM) | | |
+--------+ +--------+ | +--------+ +--------+
| |
| +-----+
V |
+--------+ +--------+ +---------+
| | | | | |
| FS- |<--->| Cache |<--->| /var/ |
| Cache | | Files | | fscache|
| | | | | |
+--------+ +--------+ +---------+
When a user application reads data, data flows left to right along the top row.
When a local cache is available, the NFS client copies any data it doesn't
already have a local copy of into the cache, space permitting, so that the
second and subsequent times it reads that data, it can retrieve it from the
cache instead of fetching it over the network.
FS-Cache is an intermediary between the network filesystems (such as NFS) and
the actual cache backends (such as CacheFiles) that do the real work. If there
aren't any caches available, FS-Cache will smooth over the fact, with as little
extra latency as possible.
CacheFiles is the only cache backend currently available. It uses files in a
directory nominated by the administrator to store the data given to it. The
contents of the cache are persistent over reboots.
==================
SETTING UP A CACHE
==================
Setting up a cache should be straightforward. The configuration for the
in-filesystem cache backend (CacheFiles) is placed in /etc/cachefilesd.conf.
There is a manual page that covers the options in detail, but they are briefly
summarised here. The cachefilesd package must be installed to use the cache.
The administrator first needs to decide which directory they want to place the
cache in (typically /var/cache/fscache) and specify that to the system:
[/etc/cachefilesd.conf]
dir /var/cache/fscache
The cache will be stored in the filesystem that hosts that directory. For
something like a laptop, you'll probably just use a directory on the root
filesystem, but for a main desktop machine you might want to mount a disk
partition specifically for the cache.
The filesystem must support user-defined extended attributes, as these are
used by CacheFiles to store coherency maintenance information. User-defined
extended attributes can be enabled on an Ext3 filesystem by doing the
following:
tune2fs -o user_xattr /dev/hdxN
or by mounting the filesystem like this:
mount /dev/hda6 /var/cache/fscache/ -o user_xattr
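To make the user_xattr option persistent across reboots, it can be recorded in
/etc/fstab instead. A minimal sketch, assuming the cache lives on a dedicated
Ext3 partition (/dev/sda6 is purely an illustrative device name):
[/etc/fstab]
/dev/sda6  /var/cache/fscache  ext3  defaults,user_xattr  0 2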
All other requirements should be met by using a RHEL5+ or FC6+ kernel and using
Ext3 (ReiserFS and XFS will also meet the requirements). See the "Further
information" section for more information.
The CacheFiles backend works by using up free space on the disk, caching remote
data in it. See the section on "Setting cache cull limits" for configuring how
much free space it maintains. This is, however, optional as defaults are set.
Once the configuration file is in place, just start up the cachefilesd service:
systemctl start cachefilesd.service
And the cache is ready to go. This can be made to happen automatically on boot
by running this as root:
systemctl enable cachefilesd.service
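To check that the daemon started and the cache bound successfully, the service
status and the kernel log can be inspected. Note that /proc/fs/fscache/stats is
only present on kernels built with FS-Cache statistics support, so treat these
paths as assumptions about your particular kernel configuration:
systemctl status cachefilesd.service
dmesg | grep -i cachefiles
cat /proc/fs/fscache/stats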
========================
USING THE CACHE WITH NFS
========================
NFS will not use the cache unless explicitly told to do so. This is done by
attaching an extra option to an NFS mount ("-o fsc"), for instance:
mount fred:/ /fred -o fsc
All the accesses to files under /fred will then be put through the cache,
provided they aren't opened for direct I/O or opened for writing (see below).
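The "fsc" option can also be set persistently in /etc/fstab. A minimal sketch,
reusing the server and mount point names from the example above:
[/etc/fstab]
fred:/  /fred  nfs  defaults,fsc  0 0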
NFS supports caching for protocol versions 2, 3 and 4, though each version
uses a different branch of the cache.
NFS keys the contents of the cache on the server and the NFS file handle,
meaning that hard linked files share the cache correctly.
CACHE LIMITATIONS WITH NFS
--------------------------
If a file is opened for direct-I/O, the cache will be bypassed because the I/O
must be direct to the server.
If the file is opened for writing, the NFS version 2 and 3 protocols don't
provide sufficient coherency management information for the client to be able
to detect a write from another client that overlapped with one of its own.
So if a file is opened for direct-I/O or for writing, the copy of the data
cached on disk will be retired and that file will cease being cached until it
is no longer being used by that client.
=========================
SETTING CACHE CULL LIMITS
=========================
The CacheFiles backend works by using up free space on the disk, caching remote
data in it. This could, potentially, consume the entirety of the free space,
which if it was also your root partition, would be bad. To control this,
CacheFiles tries to maintain a certain amount of free space, and will shrink
the cache to compensate if whatever else is on the disk grows.
This can be controlled by three settings:
[/etc/cachefilesd.conf]
brun 20%
bcull 10%
bstop 5%
These are specified as percentages of the total disk space. When the amount of
available free space drops below the "bcull" limit, the cache management
daemon will start culling data from the cache, and when the available free
space rises back above the "brun" limit, the culling will cease. This provides
hysteresis. Note that the following must hold true:
0 <= bstop < bcull < brun < 100
Similarly, some filesystems have a limited number of files (inodes) that they
can actually support (Ext3, for instance, falls into this category). If the
data being pulled from the server is spread across lots of small files, this
can quickly use up all the files available to the cache without consuming much
of the disk space. To counter this problem, the cache tries to maintain a
minimum percentage of free files, just as it does for available free space.
This can also be configured:
[/etc/cachefilesd.conf]
frun 20%
fcull 10%
fstop 5%
And this must hold true:
0 <= fstop < fcull < frun < 100
The defaults are 7% (run), 5% (cull) and 1% (stop) for both groups of settings.
When the bstop or fstop limit is reached, no more data will be added to the
cache until culling has brought the amount of free space or free files back
above that limit.
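Putting the pieces together, a complete /etc/cachefilesd.conf might look like
the following. This is only a sketch: the directory and percentages are the
illustrative values used above and should be tuned to the size of the hosting
partition:
[/etc/cachefilesd.conf]
dir /var/cache/fscache
brun 20%
bcull 10%
bstop 5%
frun 20%
fcull 10%
fstop 5%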
==========
MONITORING
==========
The state of NFS filesystem caching can be monitored to a certain extent by the
data exposed through files in /proc/sys/fs/nfs/:
(*) nfs_fscache_to_pages
The number of pages of data NFS has added to the cache.
(*) nfs_fscache_from_pages
The number of pages of data NFS has retrieved from the cache.
(*) nfs_fscache_uncache_page
The number of active page bindings that NFS has removed from the
cache. (Note that just because a page binding has been released, it
does not mean the page has been removed from the cache, just that NFS
is no longer using that particular bit of the cache at the moment).
(*) nfs_fscache_from_error
The last error incurred when reading page(s) from the cache.
(*) nfs_fscache_to_error
The last error incurred when writing a page to the cache.
Note that these sysctl parameters are only temporary and will be integrated
into the NFS per-mount statistics at some point in the future.
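A quick way to snapshot all of these counters at once, assuming the kernel in
use still exposes them under /proc/sys/fs/nfs/:
grep . /proc/sys/fs/nfs/nfs_fscache_*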
Furthermore, the caching state of individual mountpoints can be examined
through other /proc files. For instance:
[root@andromeda ~]# cat /proc/fs/nfsfs/servers
NV SERVER PORT USE HOSTNAME
v4 ac101209 801 1 home0
[root@andromeda ~]# cat /proc/fs/nfsfs/volumes
NV SERVER PORT DEV FSID FSC
v4 ac101209 801 0:16 9:2 no
v4 ac101209 801 0:17 9:3 yes
The "FSC" column says "yes" when the system has been asked to cache a
particular NFS share/volume/export, and "no" when it hasn't.
====================
RELOCATING THE CACHE
====================
By default, the cache is located in /var/cache/fscache, but this may be
undesirable. Unless SELinux is being used in enforcing mode, relocating the
cache is simply a matter of changing the "dir" line in /etc/cachefilesd.conf.
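A minimal sketch of the non-SELinux case, assuming the new location is
/mnt/cachedisk/fscache (an illustrative path) on a filesystem mounted with
user_xattr as described above:
systemctl stop cachefilesd.service
mkdir -p /mnt/cachedisk/fscache
# then change the "dir" line in /etc/cachefilesd.conf to the new path
systemctl start cachefilesd.service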
However, if SELinux is being used in enforcing mode, then it's not that
simple. The security policy that governs access to the cache must be changed.
For more information, see:
move-cache.txt
===================
FURTHER INFORMATION
===================
On the subject of the CacheFiles facility and configuring it:
/usr/share/doc/cachefilesd/README
/usr/share/man/man5/cachefilesd.conf.5.gz
/usr/share/man/man8/cachefilesd.8.gz
For general information, including the design constraints and capabilities,
see:
/usr/share/doc/kernel-doc-2.6.17/Documentation/filesystems/caching/fscache.txt