<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
]>
<chapter id="xfs-overview">
<title>XFS Overview</title>
<section>
<title>XFS Filesystem Structure</title>
<para>This section gives an overview of the structure of an XFS filesystem</para>
<para>More detailed examination of the filesystem structure is covered later in the course</para>
<para>An XFS filesystem is divided into equally sized allocation groups (AGs); the last one may be smaller</para>
<para>An allocation group can be from 16MiB to just under 1TiB in size</para>
<para>See <command>xfs(5)</command></para>
</section>
<section>
<title>Allocation Groups</title>
<mediaobject><imageobject>
<imagedata fileref="images/XFS-allocation-groups.png" />
</imageobject></mediaobject>
</section>
<section>
<title>Allocation Group Structure</title>
<para>Each allocation group includes</para>
<itemizedlist>
<listitem><para>A copy of the superblock, which describes the entire filesystem (the copy in AG 0 is the primary)</para></listitem>
<listitem><para>Free space management (within the allocation group)</para></listitem>
<listitem><para>Inode allocation and tracking (within the allocation group)</para></listitem>
</itemizedlist>
<para>Inode clusters within an allocation group are created when needed</para>
<itemizedlist>
<listitem><para>mkfs.xfs does not pre-create inodes throughout the filesystem</para></listitem>
</itemizedlist>
<mediaobject><imageobject>
<imagedata fileref="images/XFS-allocation-group-structure.png" />
</imageobject></mediaobject>
</section>
<section>
<title>XFS Limits</title>
<para>32-bit Linux</para>
<itemizedlist>
<listitem><para>Maximum File Size = 16TB (O_LARGEFILE)</para></listitem>
<listitem><para>Maximum Filesystem Size = 16TB</para></listitem>
</itemizedlist>
<para>64-bit Linux</para>
<itemizedlist>
<listitem><para>Maximum File Size = 9 million TB = 9 EB</para></listitem>
<listitem><para>Maximum Filesystem Size = 18 million TB = 18 EB</para></listitem>
</itemizedlist>
</section>
<section>
<title>Filesystem Block Size (FSB)</title>
<para>Filesystem blocks (FSBs) are the unit of space allocation in a filesystem</para>
<itemizedlist>
<listitem><para>Filesystem blocks are made up of one or more device-level sectors.</para></listitem>
</itemizedlist>
<para>The page management implementation in Linux has historically limited the FSB size to the page size</para>
<itemizedlist>
<listitem><para>4KB on ia32 and x86_64 architectures</para></listitem>
<listitem><para>16KB on ia64</para></listitem>
</itemizedlist>
<para>Choosing a block size that suits the typical I/O request size and file size can improve performance</para>
<itemizedlist>
<listitem><para>Larger blocks also waste more disk space on small (&lt;1 FSB) files, since each such file still occupies a whole block</para></listitem>
</itemizedlist>
</section>
<section>
<title>Extents</title>
<para>An extent is a set of one or more contiguous FSBs that define a region in the filesystem for file data or metadata</para>
<itemizedlist>
<listitem><para>A single extent can be up to 2<superscript>21</superscript> - 1 FSBs long (just under 8GiB with 4KiB blocks)</para></listitem>
</itemizedlist>
<para>A file’s inode lists the extents associated with that file</para>
<itemizedlist>
<listitem><para>For very large files, the inode may reference thousands of extents or only a handful of very large ones; in practice it is usually somewhere in between.</para></listitem>
</itemizedlist>
<para>Extents are also used for file and directory metadata (such as directory entries, extended attributes and large extent lists) when that information no longer fits in the space reserved within the inode</para>
</section>
<section>
<title>Unwritten Extents</title>
<para>An unwritten extent is an extent that has been marked on disk as "not yet written".</para>
<para>Unwritten extents can be created by preallocating file space using:</para>
<itemizedlist>
<listitem><para>XFS-specific interfaces (<command>xfsctl(3)</command>)</para></listitem>
<listitem><para>The <command>fallocate(2)</command> system call on kernels 2.6.23 and later</para></listitem>
<listitem><para><command>posix_fallocate(3)</command> in recent glibc
<itemizedlist>
<listitem><para>falls back to writing zeroes if the kernel or filesystem lacks support</para></listitem>
</itemizedlist>
</para></listitem>
<listitem><para>The <command>fallocate(1)</command> utility from util-linux</para></listitem>
<listitem><para>Direct I/O with specific (un)alignment</para></listitem>
</itemizedlist>
<para>They are a security measure: they ensure that allocated but not yet initialised space
on disk is never exposed to users, because reads from an unwritten extent return zeroes rather than stale disk contents</para>
<para>Unwritten extents apply only to regular files.</para>
<para>Once such an extent is written to, fully or partially, a transaction is
issued to convert the written part into a regular (written) extent; any remaining
parts, at most two extents, stay marked as unwritten.</para>
<para>Use the -p option to xfs_bmap to view unwritten extents.</para>
<para><command># xfs_io -f -c 'resvsp 0 10m' -c 'bmap -vp' /tmp/foo</command></para>
</section>
<section>
<title>Inodes</title>
<para>XFS has three inode structures</para>
<para>Ondisk inode</para>
<itemizedlist>
<listitem><para>Used for storing the metadata for all files, directories and other file types</para></listitem>
<listitem><para>256 bytes by default on older (v4) filesystems and 512 bytes on v5 (CRC-enabled) filesystems; can be up to 2KiB</para></listitem>
</itemizedlist>
<para>Linux inode</para>
<itemizedlist>
<listitem><para>xfs_inode_t has the Linux inode embedded in it</para></listitem>
</itemizedlist>
<para>XFS inode</para>
<itemizedlist>
<listitem><para>xfs_inode holds an in-memory copy of the ondisk inode, along with in-core state</para></listitem>
</itemizedlist>
</section>
<section>
<title>Directory and File Inodes</title>
<mediaobject><imageobject>
<imagedata fileref="images/XFS-directory-file-inodes.png" />
</imageobject></mediaobject>
</section>
<section>
<title>Journal Log</title>
<para>The XFS journal logs all metadata transactions</para>
<itemizedlist>
<listitem><para>File data is not logged; only metadata changes, such as a change in file size, are recorded</para></listitem>
</itemizedlist>
<para>Allows the filesystem to be recovered by replaying the log, typically in seconds</para>
<itemizedlist>
<listitem><para>No requirement to run fsck</para></listitem>
</itemizedlist>
<para>Log replay applies metadata changes that had been logged but may not yet
have been written back to the filesystem when it went down</para>
<para>The log may be located on a separate device</para>
</section>
</chapter>