| <?xml version='1.0' encoding='utf-8' ?> |
| <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [ |
| ]> |
| <chapter id="xfs-overview"> |
| <title>XFS Overview</title> |
| <section> |
| <title>XFS Filesystem Structure</title> |
| <para>This section gives an overview of the structure of an XFS filesystem</para> |
| <para>More detailed examination of the filesystem structure is covered later in the course</para> |
| <para>An XFS filesystem is divided evenly into allocation groups</para> |
| <para>An allocation group can be from 16MB to 1TB in size</para> |
| <para>See <command>xfs(5)</command></para> |
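| <para>As a rough illustration of the size bounds above, the following Python sketch splits a filesystem evenly into allocation groups and checks the result against the 16MB and 1TB limits. The helper names <literal>ag_size</literal> and <literal>min_ag_count</literal> are hypothetical, and the sketch does not reproduce how mkfs.xfs actually chooses the number of groups.</para> |
| <programlisting><![CDATA[ |
```python
MIB = 1024 ** 2
TIB = 1024 ** 4

AG_MIN = 16 * MIB   # smallest allocation group size quoted above
AG_MAX = 1 * TIB    # largest allocation group size quoted above


def ag_size(fs_bytes, ag_count):
    """Return the per-group size when fs_bytes is split evenly into ag_count groups."""
    size = fs_bytes // ag_count
    if not AG_MIN <= size <= AG_MAX:
        raise ValueError(f"{size} bytes per group is outside the 16MB to 1TB range")
    return size


def min_ag_count(fs_bytes):
    """Fewest groups that keep every group at or below the 1TB ceiling."""
    return -(-fs_bytes // AG_MAX)  # ceiling division


print(ag_size(4 * TIB, 4) // MIB)   # size of each group in MiB
print(min_ag_count(10 * TIB))       # groups needed for a 10TB filesystem
```
| ]]></programlisting> |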
| </section> |
| <section> |
| <title>Allocation Groups</title> |
| <mediaobject><imageobject> |
| <imagedata fileref="images/XFS-allocation-groups.png" /> |
| </imageobject></mediaobject> |
| </section> |
| <section> |
| <title>Allocation Group Structure</title> |
| <para>Each allocation group includes</para> |
| <itemizedlist> |
| <listitem><para>Superblock information about the entire filesystem</para></listitem> |
| <listitem><para>Free space management (within the allocation group)</para></listitem> |
| <listitem><para>Inode allocation and tracking (within the allocation group)</para></listitem> |
| </itemizedlist> |
| <para>Inode clusters within an allocation group are created when needed</para> |
| <itemizedlist> |
| <listitem><para>mkfs.xfs does not pre-create inodes throughout the filesystem</para></listitem> |
| </itemizedlist> |
| <mediaobject><imageobject> |
| <imagedata fileref="images/XFS-allocation-group-structure.png" /> |
| </imageobject></mediaobject> |
| </section> |
| <section> |
| <title>XFS Limits</title> |
| <para>32 bit Linux</para> |
| <itemizedlist> |
| <listitem><para>Maximum File Size = 16TB (O_LARGEFILE)</para></listitem> |
| <listitem><para>Maximum Filesystem Size = 16TB</para></listitem> |
| </itemizedlist> |
| <para>64 bit Linux</para> |
| <itemizedlist> |
| <listitem><para>Maximum File Size = 9 million TB = 9 EB</para></listitem> |
| <listitem><para>Maximum Filesystem Size = 18 million TB = 18 EB</para></listitem> |
| </itemizedlist> |
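| <para>The limits above are consistent with simple power-of-two arithmetic. The sketch below assumes that the 32 bit limit comes from a 32 bit page index with 4KB pages, that the 64 bit file size limit is the largest signed 64 bit byte offset, and that the filesystem size limit is a full 64 bit byte count. These derivations are assumptions made for illustration, not statements taken from this course.</para> |
| <programlisting><![CDATA[ |
```python
TB = 10 ** 12    # decimal terabyte
TIB = 2 ** 40    # binary terabyte
PAGE = 4096      # assumed 4KB page size

limit_32bit = (2 ** 32) * PAGE    # assumed: 32 bit page index times page size
max_file_64bit = 2 ** 63 - 1      # assumed: largest signed 64 bit byte offset
max_fs_64bit = 2 ** 64            # assumed: full unsigned 64 bit byte count

print(limit_32bit // TIB)                     # 16
print(round(max_file_64bit / TB / 1e6, 1))    # 9.2 (millions of TB)
print(round(max_fs_64bit / TB / 1e6, 1))      # 18.4 (millions of TB)
```
| ]]></programlisting> |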
| </section> |
| <section> |
| <title>Filesystem Block Size (FSB)</title> |
| <para>Filesystem blocks (FSBs) are the unit of space for a filesystem</para> |
| <itemizedlist> |
| <listitem><para>Filesystem blocks consist of one or more device-level sectors.</para></listitem> |
| </itemizedlist> |
| <para>The page management implementation in Linux limits the FSB size to no more than the page size</para> |
| <itemizedlist> |
| <listitem><para>4KB on ia32 and x86_64 architectures</para></listitem> |
| <listitem><para>16KB on ia64</para></listitem> |
| </itemizedlist> |
| <para>Performance can improve with different block sizes depending on the size of I/O requests and the size of files</para> |
| <itemizedlist> |
| <listitem><para>Larger blocks will also use more disk space for small (&lt; 1 FSB) files</para></listitem> |
| </itemizedlist> |
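| <para>The space cost of larger blocks for small files can be sketched with a little arithmetic. The helper <literal>allocated_bytes</literal> below is hypothetical and only rounds file data up to whole blocks; it ignores inodes and other metadata.</para> |
| <programlisting><![CDATA[ |
```python
def allocated_bytes(file_size, block_size):
    """Space consumed when file data is stored in whole filesystem blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


# A 1000 byte file stored with three different block sizes.
for block_size in (512, 4096, 16384):
    used = allocated_bytes(1000, block_size)
    print(block_size, used, used - 1000)  # block size, space used, space wasted
```
| ]]></programlisting> |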
| </section> |
| <section> |
| <title>Extents</title> |
| <para>An extent is a set of one or more contiguous FSBs that defines a region in the filesystem for file data or metadata</para> |
| <itemizedlist> |
| <listitem><para>A single extent can be up to 8GB in length</para></listitem> |
| </itemizedlist> |
| <para>A file’s inode lists the extents associated with that file</para> |
| <itemizedlist> |
| <listitem><para>For very large files, the file’s inode may have thousands of extents or a single very large extent; usually it is something in between.</para></listitem> |
| </itemizedlist> |
| <para>Extents are also used for file and directory metadata when the information exceeds the space reserved for an inode</para> |
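| <para>An extent list can be pictured as a sequence of (file offset, start block, length) records. The sketch below is a simplified model, not the XFS on-disk format: the <literal>Extent</literal> class and the <literal>layout</literal> and <literal>min_extents</literal> helpers are hypothetical, and it assumes 4KB blocks and a cap of 2^21 blocks per extent so that one extent spans the 8GB quoted above.</para> |
| <programlisting><![CDATA[ |
```python
from dataclasses import dataclass

GIB = 2 ** 30
BLOCK = 4096                  # assumed 4KB filesystem block
MAX_EXTENT_BLOCKS = 2 ** 21   # assumed cap: 2^21 blocks of 4KB is 8GB


@dataclass
class Extent:
    file_offset: int   # first file block covered by this extent
    start_block: int   # first filesystem block backing it
    length: int        # number of contiguous blocks


def min_extents(file_bytes):
    """Fewest extents that can map a file of file_bytes bytes."""
    blocks = -(-file_bytes // BLOCK)          # ceiling division
    return -(-blocks // MAX_EXTENT_BLOCKS)


def layout(file_bytes, first_block):
    """Best case mapping: back-to-back extents, each as long as allowed."""
    blocks = -(-file_bytes // BLOCK)
    extents = []
    offset = 0
    while offset < blocks:
        length = min(MAX_EXTENT_BLOCKS, blocks - offset)
        extents.append(Extent(offset, first_block + offset, length))
        offset += length
    return extents


for extent in layout(20 * GIB, first_block=1000):
    print(extent)
```
| ]]></programlisting> |
| <para>A fragmented file would need many more, shorter records to describe the same amount of data.</para> |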
| </section> |
| <section> |
| <title>Unwritten Extents</title> |
| <para>An unwritten extent is an extent which has been marked as "not yet written" on disk.</para> |
| <para>Unwritten extents can be created by preallocating file space using:</para> |
| <itemizedlist> |
| <listitem><para>XFS specific interfaces (<command>xfsctl(3)</command>)</para></listitem> |
| <listitem><para><command>sys_fallocate</command> on kernels 2.6.23 and later</para></listitem> |
| <listitem><para><command>posix_fallocate(3)</command> on recent glibc |
| <itemizedlist> |
| <listitem><para>falls back to writing zeros if the kernel or filesystem has no support</para></listitem> |
| </itemizedlist> |
| </para></listitem> |
| <listitem><para><command>fallocate(1)</command> from newer util-linux versions</para></listitem> |
| <listitem><para>Direct I/O writes of specific (un)alignment.</para></listitem> |
| </itemizedlist> |
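| <para>As one example of the interfaces listed above, Python exposes <command>posix_fallocate(3)</command> as <literal>os.posix_fallocate</literal>. The sketch below preallocates 10MB for a scratch file; the helper name <literal>preallocate</literal> and the file name are hypothetical, and whether the reserved space ends up as unwritten extents depends on the filesystem and kernel in use.</para> |
| <programlisting><![CDATA[ |
```python
import os
import tempfile


def preallocate(path, nbytes):
    """Reserve nbytes for path via posix_fallocate(3) and return the file size."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.posix_fallocate(fd, 0, nbytes)   # offset 0, length nbytes
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    size = preallocate(os.path.join(tmp, "prealloc.dat"), 10 * 1024 * 1024)
    print(size)
```
| ]]></programlisting> |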
| <para>They are a security measure, ensuring that allocated but not yet initialised space |
| on disk is not visible to arbitrary users</para> |
| <para>Unwritten extents apply only to regular files.</para> |
| <para>Once such an extent is written to, or partially written to, a transaction is |
| issued to convert the written part into a regular written extent, and mark the |
| remaining (up to 2) extents as unwritten.</para> |
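| <para>The conversion described above can be modelled as splitting one range into at most three pieces. The function below is a hypothetical sketch that works on plain (start, length) block ranges; it is not the XFS transaction code.</para> |
| <programlisting><![CDATA[ |
```python
def write_into_unwritten(start, length, w_start, w_len):
    """Split the unwritten extent [start, start + length) around a write.

    Returns (start, length, state) tuples in offset order: the written part
    plus up to two remaining unwritten parts.
    """
    end = start + length
    w_end = w_start + w_len
    if w_len <= 0 or w_start < start or w_end > end:
        raise ValueError("write must fall inside the extent")
    pieces = []
    if w_start > start:
        pieces.append((start, w_start - start, "unwritten"))
    pieces.append((w_start, w_len, "written"))
    if w_end < end:
        pieces.append((w_end, end - w_end, "unwritten"))
    return pieces


# Writing 20 blocks into the middle of a 100 block unwritten extent.
print(write_into_unwritten(0, 100, 40, 20))
```
| ]]></programlisting> |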
| <para>Use the -p option to xfs_bmap to view unwritten extents.</para> |
| <para><command># xfs_io -f -c 'resvsp 0 10m' -c 'bmap -vp' /tmp/foo</command></para> |
| </section> |
| <section> |
| <title>Inodes</title> |
| <para>XFS has three inode structures</para> |
| <para>On-disk inode</para> |
| <itemizedlist> |
| <listitem><para>Used for storing the metadata for all files, directories and other file types</para></listitem> |
| <listitem><para>By default 256 bytes but can be up to 2KiB</para></listitem> |
| </itemizedlist> |
| <para>Linux inode</para> |
| <itemizedlist> |
| <listitem><para>xfs_inode_t has the Linux inode embedded in it</para></listitem> |
| </itemizedlist> |
| <para>XFS inode</para> |
| <itemizedlist> |
| <listitem><para>xfs_inode contains the on-disk inode structure in memory</para></listitem> |
| </itemizedlist> |
| </section> |
| <section> |
| <title>Directory and File Inodes</title> |
| <mediaobject><imageobject> |
| <imagedata fileref="images/XFS-directory-file-inodes.png" /> |
| </imageobject></mediaobject> |
| </section> |
| <section> |
| <title>Journal Log</title> |
| <para>The XFS journal logs all metadata transactions</para> |
| <itemizedlist> |
| <listitem><para>File data is not recorded, only metadata changes such as a change in file size</para></listitem> |
| </itemizedlist> |
| <para>Allows the filesystem to replay the log and recover in seconds</para> |
| <itemizedlist> |
| <listitem><para>No requirement to run fsck</para></listitem> |
| </itemizedlist> |
| <para>Log replay will apply filesystem and metadata changes that had been |
| logged but may not have been applied to the filesystem when it went down</para> |
| <para>The log may be located on a separate device</para> |
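| <para>The idea of log replay can be shown with a toy model. The sketch below assumes a log that is simply an ordered list of metadata updates and a hypothetical <literal>replay</literal> helper; it is a general illustration of journal replay and makes no attempt to reflect the actual XFS log format.</para> |
| <programlisting><![CDATA[ |
```python
def replay(metadata, log):
    """Apply logged metadata changes, in order, to a copy of metadata."""
    recovered = dict(metadata)
    for transaction in log:
        for key, value in transaction.items():
            recovered[key] = value   # reapplying an update gives the same result
    return recovered


# State found on disk after a crash, and the transactions found in the log.
on_disk = {"inode 128 size": 0}
log = [
    {"inode 128 size": 4096},
    {"inode 128 size": 8192, "inode 129 size": 512},
]

recovered = replay(on_disk, log)
print(recovered)
```
| ]]></programlisting> |
| <para>Replaying the same log a second time leaves the result unchanged, which is why a replay that is itself interrupted can simply be restarted.</para> |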
| </section> |
| </chapter> |