========================
FILESYSTEM LOCAL CACHING
========================
========
CONTENTS
========
(*) Introduction.
(*) Setting up a cache.
(*) Using the cache with NFS.
(*) Setting cache cull limits.
(*) Monitoring.
(*) Relocating the cache.
(*) Further information.
============
INTRODUCTION
============
Linux now supports local caching of certain filesystems (currently only NFS and
the in-kernel AFS filesystems). This permits remote data to be cached on local
disk, thus potentially speeding up future accesses to that data by avoiding the
need to go to the network and fetch it again.
This facility (known as FS-Cache) is designed to be as transparent as possible
to a user of the system. Applications should just be able to use NFS files as
normal, without any knowledge of there being a cache.
The administrator has to set up the cache in the first place, tell the system
to use it and then mark the NFS mount points they want caching, but the user
need not see any of that.
The facility can be conceptualised by the following diagram:
+--------+ +--------+ +--------+ +--------+
| | /\ | | | | | |
| NFS |--- \ ---->| NFS |------>| Page |---->| User |
| Server | \/ | Client | ^ | Cache | | App |
| | Network | | | | (RAM) | | |
+--------+ +--------+ | +--------+ +--------+
| |
| +-----+
V |
+--------+ +--------+ +---------+
| | | | | |
| FS- |<--->| Cache |<--->| /var/ |
| Cache | | Files | | fscache|
| | | | | |
+--------+ +--------+ +---------+
When a user application reads data, data flows left to right along the top row.
When a local cache is available, the NFS client copies any data it doesn't
already have a local copy of into the cache, space permitting, so that the
second and subsequent times it reads that data, it can retrieve it from the
cache instead of fetching it over the network.
FS-Cache is an intermediary between the network filesystems (such as NFS) and
the actual cache backends (such as CacheFiles) that do the real work. If there
aren't any caches available, FS-Cache will smooth over the fact, with as little
extra latency as possible.
CacheFiles is the only cache backend currently available. It uses files in a
directory nominated by the administrator to store the data given to it. The
contents of the cache are persistent over reboots.
==================
SETTING UP A CACHE
==================
Setting up a cache should be straightforward. The configuration for the
in-filesystem cache backend (CacheFiles) is placed in /etc/cachefilesd.conf.
There is a manual page that covers the options in detail, but they are briefly
summarised here. The cachefilesd package must be installed to use the cache.
The administrator first needs to decide which directory they want to place the
cache in (typically /var/cache/fscache) and specify that to the system:
[/etc/cachefilesd.conf]
dir /var/cache/fscache
The cache will be stored in the filesystem that hosts that directory. For
something like a laptop, you'll probably just use a directory on the root
filesystem, but for a main desktop machine you might want to mount a disk
partition specifically for the cache.
The filesystem must support user-defined extended attributes, as these are
used by CacheFiles to store coherency maintenance information. User-defined
extended attributes can be enabled on an Ext3 filesystem by doing the
following:
tune2fs -o user_xattr /dev/hdxN
or by mounting the filesystem like this:
mount /dev/hda6 /var/cache/fscache/ -o user_xattr
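To make the user_xattr option persistent across reboots, it can be recorded in
/etc/fstab instead. A minimal sketch, assuming the cache lives on a dedicated
Ext3 partition (/dev/sda6 is purely an illustrative device name):
[/etc/fstab]
/dev/sda6  /var/cache/fscache  ext3  defaults,user_xattr  0 2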
All other requirements should be met by using a RHEL5+ or FC6+ kernel and using
Ext3 (ReiserFS and XFS will also meet the requirements). See the "Further
information" section for more information.
The CacheFiles backend works by using up free space on the disk, caching remote
data in it. See the section on "Setting cache cull limits" for configuring how
much free space it maintains. This is, however, optional as defaults are set.
Once the configuration file is in place, just start up the cachefilesd service:
systemctl start cachefilesd.service
And the cache is ready to go. This can be made to happen automatically on boot
by running this as root:
systemctl enable cachefilesd.service
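To check that the daemon started and the cache bound successfully, the service
status and the kernel log can be inspected. Note that /proc/fs/fscache/stats is
only present on kernels built with FS-Cache statistics support, so treat these
paths as assumptions about your particular kernel configuration:
systemctl status cachefilesd.service
dmesg | grep -i cachefiles
cat /proc/fs/fscache/stats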
========================
USING THE CACHE WITH NFS
========================
NFS will not use the cache unless explicitly told to do so. This is done by
attaching an extra option to an NFS mount ("-o fsc"), for instance:
mount fred:/ /fred -o fsc
All the accesses to files under /fred will then be put through the cache,
provided they aren't opened for direct I/O or opened for writing (see below).
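The "fsc" option can also be set persistently in /etc/fstab. A minimal sketch,
reusing the server and mount point names from the example above:
[/etc/fstab]
fred:/  /fred  nfs  defaults,fsc  0 0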
NFS supports caching for protocol versions 2, 3 and 4, though each version
uses a different branch of the cache.
NFS keys the contents of the cache on the server and the NFS file handle,
meaning that hard linked files share the cache correctly.
CACHE LIMITATIONS WITH NFS
--------------------------
If a file is opened for direct-I/O, the cache will be bypassed because the I/O
must be direct to the server.
If the file is opened for writing, the NFS version 2 and 3 protocols don't
provide sufficient coherency management information for the client to be able
to detect a write from another client that overlapped with one of its own.
So if a file is opened for direct-I/O or for writing, the copy of the data
cached on disk will be retired and that file will cease being cached until it
is no longer being used by that client.
=========================
SETTING CACHE CULL LIMITS
=========================
The CacheFiles backend works by using up free space on the disk, caching remote
data in it. This could, potentially, consume the entirety of the free space,
which if it was also your root partition, would be bad. To control this,
CacheFiles tries to maintain a certain amount of free space, and will shrink
the cache to compensate if whatever else is on the disk grows.
This can be controlled by three settings:
[/etc/cachefilesd.conf]
brun 20%
bcull 10%
bstop 5%
These are specified as percentages of the total disk space. When the amount of
available free space drops below the "bcull" limit, the cache management
daemon will start culling data from the cache, and when the available free
space rises back above the "brun" limit, the culling will cease. This provides
hysteresis. Note that the following must hold true:
0 <= bstop < bcull < brun < 100
Similarly, some filesystems have a limited number of files (inodes) that they
can actually support (Ext3, for instance, falls into this category). If the
data being pulled from the server is spread across lots of small files, this
can quickly use up all the files available to the cache without consuming much
of the disk space. To counter this problem, the cache tries to maintain a
minimum percentage of free files, just as it does for available free space.
This can also be configured:
[/etc/cachefilesd.conf]
frun 20%
fcull 10%
fstop 5%
And this must hold true:
0 <= fstop < fcull < frun < 100
The defaults are 7% (run), 5% (cull) and 1% (stop) for both groups of settings.
When the bstop or fstop limit is reached, no more data will be added to the
cache until culling has brought the amount of free space or free files back
above that limit.
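Putting the pieces together, a complete /etc/cachefilesd.conf might look like
the following. This is only a sketch: the directory and percentages are the
illustrative values used above and should be tuned to the size of the hosting
partition:
[/etc/cachefilesd.conf]
dir /var/cache/fscache
brun 20%
bcull 10%
bstop 5%
frun 20%
fcull 10%
fstop 5%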
==========
MONITORING
==========
The state of NFS filesystem caching can be monitored to a certain extent by the
data exposed through files in /proc/sys/fs/nfs/:
(*) nfs_fscache_to_pages
The number of pages of data NFS has added to the cache.
(*) nfs_fscache_from_pages
The number of pages of data NFS has retrieved from the cache.
(*) nfs_fscache_uncache_page
The number of active page bindings that NFS has removed from the
cache. (Note that just because a page binding has been released, it
does not mean the page has been removed from the cache, just that NFS
is no longer using that particular bit of the cache at the moment).
(*) nfs_fscache_from_error
The last error incurred when reading page(s) from the cache.
(*) nfs_fscache_to_error
The last error incurred when writing a page to the cache.
Note that these sysctl parameters are only temporary and will be integrated
into the NFS per-mount statistics at some point in the future.
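A quick way to snapshot all of these counters at once, assuming the kernel in
use still exposes them under /proc/sys/fs/nfs/:
grep . /proc/sys/fs/nfs/nfs_fscache_*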
Furthermore, the caching state of individual mountpoints can be examined
through other /proc files. For instance:
[root@andromeda ~]# cat /proc/fs/nfsfs/servers
NV SERVER PORT USE HOSTNAME
v4 ac101209 801 1 home0
[root@andromeda ~]# cat /proc/fs/nfsfs/volumes
NV SERVER PORT DEV FSID FSC
v4 ac101209 801 0:16 9:2 no
v4 ac101209 801 0:17 9:3 yes
The "FSC" column says "yes" when the system has been asked to cache a
particular NFS share/volume/export, and "no" when it hasn't.
====================
RELOCATING THE CACHE
====================
By default, the cache is located in /var/cache/fscache, but this may be
undesirable. Unless SELinux is being used in enforcing mode, relocating the
cache is simply a matter of changing the "dir" line in /etc/cachefilesd.conf.
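A minimal sketch of the non-SELinux case, assuming the new location is
/mnt/cachedisk/fscache (an illustrative path) on a filesystem mounted with
user_xattr as described above:
systemctl stop cachefilesd.service
mkdir -p /mnt/cachedisk/fscache
# then change the "dir" line in /etc/cachefilesd.conf to the new path
systemctl start cachefilesd.service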
However, if SELinux is being used in enforcing mode, then it's not that
simple. The security policy that governs access to the cache must be changed.
For more information, see:
move-cache.txt
===================
FURTHER INFORMATION
===================
On the subject of the CacheFiles facility and configuring it:
/usr/share/doc/cachefilesd/README
/usr/share/man/man5/cachefilesd.conf.5.gz
/usr/share/man/man8/cachefilesd.8.gz
For general information, including the design constraints and capabilities,
see:
/usr/share/doc/kernel-doc-2.6.17/Documentation/filesystems/caching/fscache.txt