<?xml version='1.0'?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
]>
<chapter id="Allocation_Groups">
<title>Allocation Groups</title>
<para>
XFS filesystems are divided into a number of equally sized chunks called Allocation Groups (AGs). Each AG can almost be thought of as an individual filesystem that maintains its own space usage. Each AG can be up to one terabyte in size (512 bytes * 2<superscript>31</superscript>), regardless of the underlying device's sector size.
</para>
<para>
Each AG has the following characteristics:
</para>
<itemizedlist>
<listitem>
<para>A super block describing overall filesystem info</para>
</listitem>
<listitem>
<para>Free space management</para>
</listitem>
<listitem>
<para>Inode allocation and tracking</para>
</listitem>
</itemizedlist>
<para>
Having multiple AGs allows XFS to handle most operations in parallel without degrading performance as the number of concurrent accesses increases.
</para>
<para>
The only global information maintained by the first AG (primary) is free space across the filesystem and total inode counts. If the <command>XFS_SB_VERSION2_LAZYSBCOUNTBIT</command> flag is set in the superblock, these are only updated on-disk when the filesystem is cleanly unmounted (umount or shutdown).
</para>
<para>
Immediately after a mkfs.xfs, the primary AG has the following disk layout; the subsequent AGs do not have any inodes allocated:
</para>
<para>
<mediaobject>
<imageobject><imagedata fileref="images/6.png" format="PNG" width="100%" scalefit="0"/></imageobject>
<textobject><phrase>6</phrase></textobject>
</mediaobject>
</para>
<para>
Each of these structures is expanded upon in the following sections.
</para>
<section id="Superblocks">
<title>Superblocks</title>
<para>
Each AG starts with a superblock. The first one is the primary superblock that stores aggregate AG information. Secondary superblocks are only used by xfs_repair when the primary superblock has been corrupted.
</para>
<para>
The superblock is defined by the following structure. The description of each field follows.
</para>
<programlisting>
typedef struct xfs_sb
{
__uint32_t        sb_magicnum;
__uint32_t        sb_blocksize;
xfs_drfsbno_t     sb_dblocks;
xfs_drfsbno_t     sb_rblocks;
xfs_drtbno_t      sb_rextents;
uuid_t        sb_uuid;
xfs_dfsbno_t      sb_logstart;
xfs_ino_t        sb_rootino;
xfs_ino_t        sb_rbmino;
xfs_ino_t        sb_rsumino;
xfs_agblock_t    sb_rextsize;
xfs_agblock_t    sb_agblocks;
xfs_agnumber_t    sb_agcount;
xfs_extlen_t      sb_rbmblocks;
xfs_extlen_t      sb_logblocks;
__uint16_t        sb_versionnum;
__uint16_t        sb_sectsize;
__uint16_t        sb_inodesize;
__uint16_t        sb_inopblock;
char        sb_fname[12];
__uint8_t        sb_blocklog;
__uint8_t        sb_sectlog;
__uint8_t        sb_inodelog;
__uint8_t        sb_inopblog;
__uint8_t        sb_agblklog;
__uint8_t        sb_rextslog;
__uint8_t        sb_inprogress;
__uint8_t        sb_imax_pct;
__uint64_t        sb_icount;
__uint64_t        sb_ifree;
__uint64_t        sb_fdblocks;
__uint64_t        sb_frextents;
xfs_ino_t        sb_uquotino;
xfs_ino_t        sb_gquotino;
__uint16_t        sb_qflags;
__uint8_t        sb_flags;
__uint8_t        sb_shared_vn;
xfs_extlen_t      sb_inoalignmt;
__uint32_t        sb_unit;
__uint32_t        sb_width;
__uint8_t        sb_dirblklog;
__uint8_t        sb_logsectlog;
__uint16_t        sb_logsectsize;
__uint32_t        sb_logsunit;
__uint32_t        sb_features2;
} xfs_sb_t;
</programlisting>
<variablelist>
<varlistentry>
<term>sb_magicnum</term>
<listitem><para>Identifies the filesystem. Its value is <command>XFS_SB_MAGIC = 0x58465342 "XFSB"</command>.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_blocksize</term>
<listitem><para>The size of a basic unit of space allocation in bytes. Typically, this is 4096 (4KB) but can range from 512 to 65536 bytes.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_dblocks</term>
<listitem><para>Total number of blocks available for data and metadata on the filesystem.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_rblocks</term>
<listitem><para>Number of blocks in the real-time disk device. Refer to <xref linkend="Real-time_Devices"/> for more information.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_rextents</term>
<listitem><para>Number of extents on the real-time device.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_uuid</term>
<listitem><para>UUID (Universally Unique ID) for the filesystem. Filesystems can be mounted by the UUID instead of device name.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_logstart</term>
<listitem><para>First block number for the journaling log if the log is internal (i.e. not on a separate disk device). For an external log device, this will be zero (the log will also start on the first block on the log device).</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_rootino</term>
<listitem><para>Root inode number for the filesystem. Typically, this is 128 when using a 4KB block size.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_rbmino</term>
<listitem><para>Bitmap inode for real-time extents.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_rsumino</term>
<listitem><para>Summary inode for real-time bitmap.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_rextsize</term>
<listitem><para>Realtime extent size in blocks.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_agblocks</term>
<listitem><para>Size of each AG in blocks. For the actual size of the last AG, refer to the <xref linkend="AG_Free_Space_Management"/> <command>agf_length</command> value.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_agcount</term>
<listitem><para>Number of AGs in the filesystem.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_rbmblocks</term>
<listitem><para>Number of real-time bitmap blocks.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_logblocks</term>
<listitem><para>Number of blocks for the journaling log.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_versionnum</term>
<listitem>
<para>Filesystem version number. This is a bitmask specifying the features enabled when creating the filesystem. Any disk checking tools or drivers that do not recognize any set bits must not operate upon the filesystem. Most of the flags indicate features introduced over time. The version value must be 4, combined with any of the following flags:
<informaltable frame="all">
<tgroup cols="2"><thead><row>
<entry>
<para>Flag</para>
</entry>
<entry>
<para>Description</para>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<para><command>XFS_SB_VERSION_ATTRBIT</command></para>
</entry>
<entry>
<para>Set if any inodes have extended attributes.</para>
</entry>
</row>
<row>
<entry>
<para><command>XFS_SB_VERSION_NLINKBIT</command></para>
</entry>
<entry>
<para>Set if any inodes use 32-bit di_nlink values.</para>
</entry>
</row>
<row>
<entry>
<para><command>XFS_SB_VERSION_QUOTABIT</command></para>
</entry>
<entry>
<para>Quotas are enabled on the filesystem. This also brings in the various quota fields in the superblock.</para>
</entry>
</row>
<row>
<entry>
<para><command>XFS_SB_VERSION_ALIGNBIT</command></para>
</entry>
<entry>
<para>Set if sb_inoalignmt is used.</para>
</entry>
</row>
<row>
<entry>
<para><command>XFS_SB_VERSION_DALIGNBIT</command></para>
</entry>
<entry>
<para>Set if sb_unit and sb_width are used.</para>
</entry>
</row>
<row>
<entry>
<para><command>XFS_SB_VERSION_SHAREDBIT</command></para>
</entry>
<entry>
<para>Set if sb_shared_vn is used.</para>
</entry>
</row>
<row>
<entry>
<para><command>XFS_SB_VERSION_LOGV2BIT</command></para>
</entry>
<entry>
<para>Version 2 journaling logs are used.</para>
</entry>
</row>
<row>
<entry>
<para><command>XFS_SB_VERSION_SECTORBIT</command></para>
</entry>
<entry>
<para>Set if sb_sectsize is not 512.</para>
</entry>
</row>
<row>
<entry>
<para><command>XFS_SB_VERSION_EXTFLGBIT</command></para>
</entry>
<entry>
<para>Unwritten extents are used. This is always set.</para>
</entry>
</row>
<row>
<entry>
<para><command>XFS_SB_VERSION_DIRV2BIT</command></para>
</entry>
<entry>
<para>Version 2 directories are used. This is always set.</para>
</entry>
</row>
<row>
<entry>
<para><command>XFS_SB_VERSION_MOREBITSBIT</command></para>
</entry>
<entry>
<para>Set if the sb_features2 field in the superblock contains more flags.</para>
</entry>
</row></tbody></tgroup>
</informaltable>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>sb_sectsize</term>
<listitem><para>Specifies the underlying disk sector size in bytes. In most cases, this is 512 bytes. This determines the minimum I/O alignment, including for Direct I/O.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_inodesize</term>
<listitem><para>Size of the inode in bytes. The default is 256 (2 inodes per standard sector) but can be made as large as 2048 bytes when creating the filesystem.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_inopblock</term>
<listitem><para>Number of inodes per block. This is equivalent to <command>sb_blocksize / sb_inodesize</command>.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_fname[12]</term>
<listitem><para>Name for the filesystem. This value can be used in the mount command.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_blocklog</term>
<listitem><para>log<subscript>2</subscript> value of <command>sb_blocksize</command>. In other terms, <command>sb_blocksize</command> = 2<superscript>sb_blocklog</superscript>.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_sectlog</term>
<listitem><para>log<subscript>2</subscript> value of <command>sb_sectsize</command>.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_inodelog</term>
<listitem><para>log<subscript>2</subscript> value of <command>sb_inodesize</command>.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_inopblog</term>
<listitem><para>log<subscript>2</subscript> value of <command>sb_inopblock</command>.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_agblklog</term>
<listitem><para>log<subscript>2</subscript> value of <command>sb_agblocks</command> (rounded up). This value is used to generate inode numbers and absolute block numbers defined in extent maps.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_rextslog</term>
<listitem><para>log<subscript>2</subscript> value of <command>sb_rextents</command>.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_inprogress</term>
<listitem><para>Flag specifying that the filesystem is being created.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_imax_pct</term>
<listitem><para>Maximum percentage of filesystem space that can be used for inodes. The default value is 25%.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_icount</term>
<listitem><para>Global count of inodes allocated on the filesystem. This is only maintained in the first superblock.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_ifree</term>
<listitem><para>Global count of free inodes on the filesystem. This is only maintained in the first superblock.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_fdblocks</term>
<listitem><para>Global count of free data blocks on the filesystem. This is only maintained in the first superblock.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_frextents</term>
<listitem><para>Global count of free real-time extents on the filesystem. This is only maintained in the first superblock.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_uquotino</term>
<listitem><para>Inode for user quotas. This and the following two quota fields only apply if <command>XFS_SB_VERSION_QUOTABIT</command> flag is set in <command>sb_versionnum</command>. Refer to <xref linkend="Quota_Inodes"/> for more information.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_gquotino</term>
<listitem><para>Inode for group or project quotas. Group and Project quotas cannot be used at the same time.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_qflags</term>
<listitem><para>
Quota flags. It can be a combination of the following flags:
<informaltable frame="all">
<tgroup cols="2"><thead><row>
<entry>
<para>Flag</para>
</entry>
<entry>
<para>Description</para>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<para><command>XFS_UQUOTA_ACCT</command></para>
</entry>
<entry>
<para>User quota accounting is enabled.</para>
</entry>
</row>
<row>
<entry>
<para><command>XFS_UQUOTA_ENFD</command></para>
</entry>
<entry>
<para>User quotas are enforced.</para>
</entry>
</row>
<row>
<entry>
<para><command>XFS_UQUOTA_CHKD</command></para>
</entry>
<entry>
<para>User quotas have been checked and updated on disk.</para>
</entry>
</row>
<row>
<entry>
<para><command>XFS_PQUOTA_ACCT</command></para>
</entry>
<entry>
<para>Project quota accounting is enabled.</para>
</entry>
</row>
<row>
<entry>
<para><command>XFS_OQUOTA_ENFD</command></para>
</entry>
<entry>
<para>Other (group/project) quotas are enforced.</para>
</entry>
</row>
<row>
<entry>
<para><command>XFS_OQUOTA_CHKD</command></para>
</entry>
<entry>
<para>Other (group/project) quotas have been checked.</para>
</entry>
</row>
<row>
<entry>
<para><command>XFS_GQUOTA_ACCT</command></para>
</entry>
<entry>
<para>Group quota accounting is enabled.</para>
</entry>
</row></tbody></tgroup>
</informaltable>
</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_flags</term>
<listitem><para>Miscellaneous flags.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_shared_vn</term>
<listitem><para>Reserved and must be zero ("vn" stands for version number).</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_inoalignmt</term>
<listitem><para>Inode chunk alignment in fsblocks. </para></listitem>
</varlistentry>
<varlistentry>
<term>sb_unit</term>
<listitem><para>Underlying stripe or RAID unit in blocks.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_width</term>
<listitem><para>Underlying stripe or RAID width in blocks.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_dirblklog</term>
<listitem><para>log<subscript>2</subscript> value multiplier that determines the granularity of directory block allocations in fsblocks.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_logsectlog</term>
<listitem><para>log<subscript>2</subscript> value of the log subvolume's sector size. This is only used if the journaling log is on a separate disk device (i.e. not internal).</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_logsectsize</term>
<listitem><para>The log's sector size in bytes if the filesystem uses an external log device.</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_logsunit</term>
<listitem><para>The log device's stripe or RAID unit size. This only applies to version 2 logs (<command>XFS_SB_VERSION_LOGV2BIT</command> is set in <command>sb_versionnum</command>).</para></listitem>
</varlistentry>
<varlistentry>
<term>sb_features2</term>
<listitem><para>
Additional version flags if <command>XFS_SB_VERSION_MOREBITSBIT</command> is set in <command>sb_versionnum</command>. The currently defined additional features include:
<orderedlist>
<listitem>
<para><command>XFS_SB_VERSION2_LAZYSBCOUNTBIT</command>  (0x02): Lazy global counters. Making a filesystem with this bit set can improve performance. The global free space and inode counts are only updated in the primary superblock when the filesystem is cleanly unmounted.</para>
</listitem>
<listitem>
<para><command>XFS_SB_VERSION2_ATTR2BIT</command>  (0x08): Extended attributes version 2. Making a filesystem with this optimises the inode layout of extended attributes. </para>
</listitem>
<listitem>
<para><command>XFS_SB_VERSION2_PARENTBIT</command>  (0x10): Parent pointers. Every inode must have an extended attribute that points back to its parent inode. The primary purpose for this information is in backup systems.</para>
</listitem>
</orderedlist>
</para></listitem>
</varlistentry>
</variablelist>
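<para>Several of the fields above are derived from one another, so a reader of the superblock can cross-check them. The following is a minimal sketch of such a check, not the validation performed by the kernel or by xfs_repair. The <command>sb_geometry</command> structure and both function names are hypothetical, and the fields are assumed to have already been converted to host byte order.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

#define XFS_SB_MAGIC 0x58465342u /* "XFSB", from the sb_magicnum description */

/* Hypothetical host-endian subset of the superblock fields checked here. */
struct sb_geometry {
    uint32_t magicnum;
    uint32_t blocksize;
    uint32_t agblocks;
    uint16_t sectsize;
    uint16_t inodesize;
    uint16_t inopblock;
    uint8_t  blocklog;
    uint8_t  sectlog;
    uint8_t  inodelog;
    uint8_t  inopblog;
    uint8_t  agblklog;
};

/* Smallest n such that (1 << n) >= value, i.e. log2 rounded up. */
static unsigned log2_roundup(uint32_t value)
{
    unsigned n = 0;

    assert(value != 0);
    while (((uint64_t)1 << n) < value)
        n++;
    return n;
}

/* Returns 1 if the derived fields agree with the sizes they describe. */
static int sb_geometry_consistent(const struct sb_geometry *sb)
{
    if (sb->magicnum != XFS_SB_MAGIC)
        return 0;
    if (sb->inodesize == 0 || sb->agblocks == 0)
        return 0;
    /* Reject shift counts that cannot describe the documented size ranges. */
    if (sb->blocklog > 16 || sb->sectlog > 15 ||
        sb->inodelog > 15 || sb->inopblog > 15)
        return 0;

    return sb->blocksize == (1u << sb->blocklog) &&
           sb->sectsize  == (1u << sb->sectlog) &&
           sb->inodesize == (1u << sb->inodelog) &&
           sb->inopblock == (1u << sb->inopblog) &&
           sb->inopblock == sb->blocksize / sb->inodesize &&
           sb->agblklog  == log2_roundup(sb->agblocks);
}
```
]]></programlisting>
<para>With a 4096-byte block size and 256-byte inodes, for instance, this check expects <command>sb_inopblock</command> to be 16 and <command>sb_inopblog</command> to be 4.</para>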
<bridgehead>xfs_db Example:</bridgehead>
<para>A filesystem is made on a single SATA disk with the following command:</para>
<programlisting>
# mkfs.xfs -i attr=2 -n size=16384 -f /dev/sda7
meta-data=/dev/sda7              isize=256    agcount=16, agsize=3923122 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=62769952, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=16384
log      =internal log           bsize=4096   blocks=30649, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
</programlisting>
<para>And in xfs_db, inspecting the superblock:</para>
<programlisting>
xfs_db> sb
xfs_db> p
magicnum = 0x58465342
blocksize = 4096
dblocks = 62769952
rblocks = 0
rextents = 0
uuid = 32b24036-6931-45b4-b68c-cd5e7d9a1ca5
logstart = 33554436
rootino = 128
rbmino = 129
rsumino = 130
rextsize = 16
agblocks = 3923122
agcount = 16
rbmblocks = 0
logblocks = 30649
versionnum = 0xb084
sectsize = 512
inodesize = 256
inopblock = 16
fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
blocklog = 12
sectlog = 9
inodelog = 8
inopblog = 4
agblklog = 22
rextslog = 0
inprogress = 0
imax_pct = 25
icount = 64
ifree = 61
fdblocks = 62739235
frextents = 0
uquotino = 0
gquotino = 0
qflags = 0
flags = 0
shared_vn = 0
inoalignmt = 2
unit = 0
width = 0
dirblklog = 2
logsectlog = 0
logsectsize = 0
logsunit = 0
features2 = 8
</programlisting>
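<para>The <command>versionnum</command> value in this output can be decoded by hand or with a few lines of C. The sketch below is illustrative only: the numeric bit values are not given in this chapter and are written here as recalled from the XFS headers, so they should be treated as assumptions and checked against the headers in use. The helper names are hypothetical.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit values; verify against the XFS headers before relying on them. */
#define XFS_SB_VERSION_NUMBITS     0x000f
#define XFS_SB_VERSION_ATTRBIT     0x0010
#define XFS_SB_VERSION_NLINKBIT    0x0020
#define XFS_SB_VERSION_QUOTABIT    0x0040
#define XFS_SB_VERSION_ALIGNBIT    0x0080
#define XFS_SB_VERSION_DALIGNBIT   0x0100
#define XFS_SB_VERSION_SHAREDBIT   0x0200
#define XFS_SB_VERSION_LOGV2BIT    0x0400
#define XFS_SB_VERSION_SECTORBIT   0x0800
#define XFS_SB_VERSION_EXTFLGBIT   0x1000
#define XFS_SB_VERSION_DIRV2BIT    0x2000
#define XFS_SB_VERSION_MOREBITSBIT 0x8000

struct version_flag {
    uint16_t    bit;
    const char *name;
};

static const struct version_flag version_flags[] = {
    { XFS_SB_VERSION_ATTRBIT,     "ATTRBIT" },
    { XFS_SB_VERSION_NLINKBIT,    "NLINKBIT" },
    { XFS_SB_VERSION_QUOTABIT,    "QUOTABIT" },
    { XFS_SB_VERSION_ALIGNBIT,    "ALIGNBIT" },
    { XFS_SB_VERSION_DALIGNBIT,   "DALIGNBIT" },
    { XFS_SB_VERSION_SHAREDBIT,   "SHAREDBIT" },
    { XFS_SB_VERSION_LOGV2BIT,    "LOGV2BIT" },
    { XFS_SB_VERSION_SECTORBIT,   "SECTORBIT" },
    { XFS_SB_VERSION_EXTFLGBIT,   "EXTFLGBIT" },
    { XFS_SB_VERSION_DIRV2BIT,    "DIRV2BIT" },
    { XFS_SB_VERSION_MOREBITSBIT, "MOREBITSBIT" },
};

/* The version number proper is held in the low four bits. */
static unsigned sb_version_num(uint16_t versionnum)
{
    return versionnum & XFS_SB_VERSION_NUMBITS;
}

/* Returns non-zero if the given feature bit is set. */
static int sb_version_has(uint16_t versionnum, uint16_t bit)
{
    assert(bit != 0);
    return (versionnum & bit) != 0;
}

/* Prints the version and the set flags; returns how many flags are set. */
static int sb_version_print(uint16_t versionnum)
{
    size_t i;
    int count = 0;

    printf("version %u:", sb_version_num(versionnum));
    for (i = 0; i < sizeof(version_flags) / sizeof(version_flags[0]); i++) {
        if (sb_version_has(versionnum, version_flags[i].bit)) {
            printf(" %s", version_flags[i].name);
            count++;
        }
    }
    printf("\n");
    return count;
}
```
]]></programlisting>
<para>Under those assumed bit values, <command>0xb084</command> decodes to version 4 with <command>ALIGNBIT</command>, <command>EXTFLGBIT</command>, <command>DIRV2BIT</command> and <command>MOREBITSBIT</command> set. That agrees with the rest of the output: <command>inoalignmt</command> is in use, <command>features2</command> is non-zero, and the log is version 1, so <command>LOGV2BIT</command> is clear.</para>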
</section>
<section id="AG_Free_Space_Management">
<title>AG Free Space Management</title>
<para>The XFS filesystem tracks free space in an allocation group using two B+trees. One B+tree tracks space by block number, the second by the size of the free space block. This scheme allows XFS to quickly find free space near a given block or of a given size.</para>
<para>All block numbers, indexes and counts are AG relative.</para>
<section id="AG_Free_Space_Block">
<title>AG Free Space Block</title>
<para>The second sector in an AG contains the information about the two free space B+trees and associated free space information for the AG. The "AG Free Space Block", also known as the AGF, uses the following structure:</para>
<programlisting>
typedef struct xfs_agf {
__be32 agf_magicnum;
__be32 agf_versionnum;
__be32 agf_seqno;
__be32 agf_length;
__be32 agf_roots[XFS_BTNUM_AGF];
__be32 agf_spare0;
__be32 agf_levels[XFS_BTNUM_AGF];
__be32 agf_spare1;
__be32 agf_flfirst;
__be32 agf_fllast;
__be32 agf_flcount;
__be32 agf_freeblks;
__be32 agf_longest;
__be32 agf_btreeblks;
} xfs_agf_t;
</programlisting>
<para>
The rest of the bytes in the sector are zeroed. <command>XFS_BTNUM_AGF</command> is set to 2: index 0 is for the B+tree sorted by block number and index 1 is for the B+tree sorted by size (block count).
</para>
<variablelist>
<varlistentry>
<term>agf_magicnum</term>
<listitem><para>Specifies the magic number for the AGF sector: "XAGF" (0x58414746).</para></listitem>
</varlistentry>
<varlistentry>
<term>agf_versionnum</term>
<listitem><para>Set to <command>XFS_AGF_VERSION</command> which is currently 1.</para></listitem>
</varlistentry>
<varlistentry>
<term>agf_seqno</term>
<listitem><para>Specifies the AG number for the sector.</para></listitem>
</varlistentry>
<varlistentry>
<term>agf_length</term>
<listitem><para>Specifies the size of the AG in filesystem blocks. For all AGs except the last, this must be equal to the superblock's <command>sb_agblocks</command> value. For the last AG, this could be less than the <command>sb_agblocks</command> value. It is this value that should be used to determine the size of the AG.</para></listitem>
</varlistentry>
<varlistentry>
<term>agf_roots</term>
<listitem><para>Specifies the block number for the root of the two free space B+trees. </para></listitem>
</varlistentry>
<varlistentry>
<term>agf_levels</term>
<listitem><para>Specifies the level or depth of the two free space B+trees. For a fresh AG, this will be one, and the "roots" will point to a single leaf of level 0.</para></listitem>
</varlistentry>
<varlistentry>
<term>agf_flfirst</term>
<listitem><para>Specifies the index of the first "free list" block. Free lists are covered in more detail later on.</para></listitem>
</varlistentry>
<varlistentry>
<term>agf_fllast</term>
<listitem><para>Specifies the index of the last "free list" block.</para></listitem>
</varlistentry>
<varlistentry>
<term>agf_flcount</term>
<listitem><para>Specifies the number of blocks in the "free list".</para></listitem>
</varlistentry>
<varlistentry>
<term>agf_freeblks</term>
<listitem><para>Specifies the current number of free blocks in the AG.</para></listitem>
</varlistentry>
<varlistentry>
<term>agf_longest</term>
<listitem><para>Specifies the number of blocks of longest contiguous free space in the AG.</para></listitem>
</varlistentry>
<varlistentry>
<term>agf_btreeblks</term>
<listitem><para>Specifies the number of blocks used for the free space B+trees. This is only used if the <command>XFS_SB_VERSION2_LAZYSBCOUNTBIT</command> bit is set in <command>sb_features2</command>.</para></listitem>
</varlistentry>
</variablelist>
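<para>Because every AGF field is a big-endian 32-bit value, the sector can be decoded with simple byte arithmetic. The following sketch is one way to do that and is not taken from the XFS sources. The <command>agf_summary</command> structure, the offset names and the function names are hypothetical, the offsets are computed from the structure layout shown above, and index 0 of <command>agf_roots</command> is assumed to be the by-block-number tree.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XFS_AGF_MAGIC   0x58414746u /* "XAGF" */
#define XFS_AGF_VERSION 1u

/* Byte offsets implied by the structure: every field is 4 bytes wide. */
enum {
    AGF_OFF_MAGICNUM   = 0,
    AGF_OFF_VERSIONNUM = 4,
    AGF_OFF_SEQNO      = 8,
    AGF_OFF_LENGTH     = 12,
    AGF_OFF_ROOTS      = 16, /* two entries, followed by agf_spare0 */
    AGF_OFF_FREEBLKS   = 52,
    AGF_OFF_LONGEST    = 56,
    AGF_SIZE           = 64
};

/* Hypothetical host-endian copy of a few AGF fields. */
struct agf_summary {
    uint32_t seqno;
    uint32_t length;
    uint32_t bnoroot;  /* assumed: agf_roots[0] */
    uint32_t cntroot;  /* assumed: agf_roots[1] */
    uint32_t freeblks;
    uint32_t longest;
};

/* Reads a big-endian 32-bit value (__be32) at a byte offset. */
static uint32_t get_be32(const uint8_t *buf, size_t off)
{
    return ((uint32_t)buf[off] << 24) | ((uint32_t)buf[off + 1] << 16) |
           ((uint32_t)buf[off + 2] << 8) | (uint32_t)buf[off + 3];
}

/* Decodes an AGF sector; returns 0 on success and -1 if it looks invalid. */
static int agf_decode(const uint8_t *sector, size_t len,
                      uint32_t sb_agblocks, struct agf_summary *out)
{
    assert(sector != NULL && out != NULL);

    if (len < (size_t)AGF_SIZE)
        return -1;
    if (get_be32(sector, AGF_OFF_MAGICNUM) != XFS_AGF_MAGIC)
        return -1;
    if (get_be32(sector, AGF_OFF_VERSIONNUM) != XFS_AGF_VERSION)
        return -1;

    out->seqno    = get_be32(sector, AGF_OFF_SEQNO);
    out->length   = get_be32(sector, AGF_OFF_LENGTH);
    out->bnoroot  = get_be32(sector, AGF_OFF_ROOTS);
    out->cntroot  = get_be32(sector, AGF_OFF_ROOTS + 4);
    out->freeblks = get_be32(sector, AGF_OFF_FREEBLKS);
    out->longest  = get_be32(sector, AGF_OFF_LONGEST);

    /* The last AG may be shorter than sb_agblocks, but never longer. */
    if (out->length > sb_agblocks)
        return -1;
    return 0;
}
```
]]></programlisting>
<para>A caller would read the second sector of the AG into a buffer and pass it in together with the superblock's <command>sb_agblocks</command> value.</para>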
</section>
<section id="AG_Free_Space_Btrees">
<title>AG Free Space B+trees</title>
<para>The two Free Space B+trees store a sorted array of block offset and block counts in the leaves of the B+tree. The first B+tree is sorted by the offset, the second by the count or size.</para>
<para>The trees use the following header:</para>
<programlisting>
typedef struct xfs_btree_sblock {
__be32 bb_magic;
__be16 bb_level;
__be16 bb_numrecs;
__be32 bb_leftsib;
__be32 bb_rightsib;
} xfs_btree_sblock_t;
</programlisting>
<para>Leaves contain a sorted array of offset/count pairs which are also used for node keys:</para>
<programlisting>
typedef struct xfs_alloc_rec {
__be32 ar_startblock;
__be32 ar_blockcount;
} xfs_alloc_rec_t, xfs_alloc_key_t;
</programlisting>
<para>Node pointers are an AG relative block pointer:</para>
<programlisting>typedef __be32 xfs_alloc_ptr_t;</programlisting>
<itemizedlist>
<listitem>
<para>As the free space tracking is AG relative, all the block numbers are only 32-bits.</para>
</listitem>
<listitem>
<para>The <command>bb_magic</command> value depends on the B+tree: "ABTB" (0x41425442) for the block offset B+tree, "ABTC" (0x41425443) for the block count B+tree.</para>
</listitem>
<listitem>
<para>The <command>xfs_btree_sblock_t</command> header is used for intermediate B+tree nodes as well as the leaves.</para>
</listitem>
<listitem>
<para>For a typical 4KB filesystem block size, the offset for the <command>xfs_alloc_ptr_t</command> array would be <command>0xab0</command> (2736 decimal).</para>
</listitem>
<listitem>
<para>There is a series of macros in <command>xfs_btree.h</command> for deriving the offsets, counts, maximums, etc. for the B+trees used in XFS.</para>
</listitem>
</itemizedlist>
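<para>The <command>0xab0</command> pointer offset quoted above can be reproduced from the structure sizes. The sketch below assumes that a node stores its keys first, with room for the maximum number of key/pointer pairs, and that the pointer array starts directly after that key area. The constant and function names are hypothetical and stand in for the real macros in <command>xfs_btree.h</command>.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Sizes taken from the structures above. */
#define SBLOCK_HDR_SIZE 16u /* xfs_btree_sblock: 4 + 2 + 2 + 4 + 4 bytes */
#define ALLOC_REC_SIZE   8u /* xfs_alloc_rec_t, also used as the key */
#define ALLOC_PTR_SIZE   4u /* xfs_alloc_ptr_t */

/* Maximum number of records that fit in a leaf block. */
static size_t alloc_leaf_maxrecs(size_t blocksize)
{
    assert(blocksize >= SBLOCK_HDR_SIZE);
    return (blocksize - SBLOCK_HDR_SIZE) / ALLOC_REC_SIZE;
}

/* Maximum number of key/pointer pairs that fit in an intermediate node. */
static size_t alloc_node_maxrecs(size_t blocksize)
{
    assert(blocksize >= SBLOCK_HDR_SIZE);
    return (blocksize - SBLOCK_HDR_SIZE) / (ALLOC_REC_SIZE + ALLOC_PTR_SIZE);
}

/* Byte offset of the pointer array: header, then space for maxrecs keys. */
static size_t alloc_ptr_offset(size_t blocksize)
{
    return SBLOCK_HDR_SIZE + alloc_node_maxrecs(blocksize) * ALLOC_REC_SIZE;
}
```
]]></programlisting>
<para>For a 4096-byte block this gives 340 key/pointer pairs per node and a pointer array offset of 16 + 340 * 8 = 2736 bytes, which is <command>0xab0</command>.</para>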
<para>The following diagram shows a single level B+tree which consists of one leaf:</para>
<para>
<inlinemediaobject>
<imageobject><imagedata fileref="images/15a.png" format="PNG" width="100%" scalefit="0"/></imageobject>
<textobject><phrase>15a</phrase></textobject>
</inlinemediaobject>
</para>
<para>With the intermediate nodes, the associated leaf pointers are stored in a separate array about two thirds into the block. The following diagram illustrates a 2-level B+tree for a free space B+tree:</para>
<para>
<mediaobject>
<imageobject><imagedata fileref="images/15b.png" format="PNG" width="100%" scalefit="0"/></imageobject>
<textobject><phrase>15b</phrase></textobject>
</mediaobject>
</para>
</section>
<section id="AG_Free_List"><title>AG Free List</title>
<para>The AG Free List is located in the 4<superscript>th</superscript> sector of each AG and is known as the AGFL. It is an array of AG relative block pointers for reserved space for growing the free space B+trees. This space cannot be used for general user data including inodes, data, directories and extended attributes.</para>
<para>With a freshly made filesystem, 4 blocks are reserved immediately after the free space B+tree root blocks (blocks 4 to 7). As they are used up as the free space fragments, additional blocks will be reserved from the AG and added to the free list array.</para>
<para>As the free list array is located within a single sector, a typical device will have space for 128 elements in the array (512 bytes per sector, 4 bytes per AG relative block pointer). The actual size can be determined by using the <command>XFS_AGFL_SIZE</command> macro.</para>
<para>Active elements in the array are specified by the AGF's (<xref linkend="AG_Free_Space_Block"/>) <command>agf_flfirst</command>, <command>agf_fllast</command> and <command>agf_flcount</command> values. The array is managed as a circular list.</para>
<para>
<mediaobject>
<imageobject><imagedata fileref="images/16.png" format="PNG" /></imageobject>
<textobject><phrase>16</phrase></textobject>
</mediaobject>
</para>
<para>The presence of these reserved blocks guarantees that the free space B+trees can be updated if any blocks are freed by extent changes in a full AG.</para>
<bridgehead>xfs_db Examples:</bridgehead>
<para>These examples are derived from an AG that has been deliberately fragmented.</para>
<para>The AGF:</para>
<programlisting>
xfs_db&gt; agf &lt;ag#&gt;
xfs_db> p
magicnum = 0x58414746
versionnum = 1
seqno = 0
length = 3923122
bnoroot = 7
cntroot = 83343
bnolevel = 2
cntlevel = 2
flfirst = 22
fllast = 27
flcount = 6
freeblks = 3654234
longest = 3384327
btreeblks = 0
</programlisting>
<para>In the AGFL, the active elements are from 22 to 27 inclusive which are obtained from the <command>flfirst</command> and <command>fllast</command> values from the <command>agf</command> in the previous example:</para>
<programlisting>
xfs_db> agfl 0
xfs_db> p
bno[0-127] = 0:4 1:5 2:6 3:7 4:83342 5:83343 6:83344 7:83345 8:83346 9:83347
10:4 11:5 12:80205 13:80780 14:81496 15:81766 16:83346 17:4 18:5
19:80205 20:82449 21:81496 22:81766 23:82455 24:80780 25:5
26:80205 27:83344
</programlisting>
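<para>Walking the active part of the AGFL is a matter of stepping from <command>flfirst</command> to <command>fllast</command> and wrapping around at the end of the array. The following sketch shows one possible way to do this. The function name and its parameters are hypothetical, and the array size is passed in explicitly rather than derived from <command>XFS_AGFL_SIZE</command>.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/*
 * Copies the active AGFL entries into out[], in list order.
 * Returns the number of entries copied, or -1 if the AGF values
 * do not describe a valid circular list of agfl_size slots.
 */
static int agfl_active(const uint32_t *bno, uint32_t agfl_size,
                       uint32_t flfirst, uint32_t fllast,
                       uint32_t flcount, uint32_t *out)
{
    uint32_t i, idx;

    assert(agfl_size > 0);

    if (flfirst >= agfl_size || fllast >= agfl_size || flcount > agfl_size)
        return -1;

    /* From flfirst to fllast, wrapping, there must be flcount slots. */
    if (flcount != 0 &&
        (fllast + agfl_size - flfirst) % agfl_size + 1 != flcount)
        return -1;

    idx = flfirst;
    for (i = 0; i < flcount; i++) {
        out[i] = bno[idx];
        idx = (idx + 1) % agfl_size; /* wrap to slot 0 after the last slot */
    }
    return (int)flcount;
}
```
]]></programlisting>
<para>Called with the values from the AGF above (<command>flfirst</command> = 22, <command>fllast</command> = 27, <command>flcount</command> = 6) and a 128-entry array, the sketch returns the six blocks 81766, 82455, 80780, 5, 80205 and 83344. Entries outside that range are stale and are ignored.</para>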
<para>The root block of the free space B+tree sorted by block offset is given by the AGF's <command>bnoroot</command> value:</para>
<programlisting>
xfs_db> fsblock 7
xfs_db> type bnobt
xfs_db> p
magic = 0x41425442
level = 1
numrecs = 4
leftsib = null
rightsib = null
keys[1-4] = [startblock,blockcount]
1:[12,16] 2:[184586,3] 3:[225579,1] 4:[511629,1]
ptrs[1-4] = 1:2 2:83347 3:6 4:4
</programlisting>
<para>Blocks 2, 83347, 6 and 4 contain the leaves for the free space B+tree by starting block. Block 2 would contain offsets 12 up to but not including 184586 while block 4 would have all offsets from 511629 to the end of the AG.</para>
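<para>Choosing which leaf to descend into amounts to finding the last key whose starting block does not exceed the block being looked up. The sketch below illustrates that selection with a binary search over host-endian keys. It is an illustration of the idea rather than the kernel's lookup routine, and the structure and function names are hypothetical.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-endian copy of a by-block-number node key. */
struct alloc_key {
    uint32_t startblock;
    uint32_t blockcount;
};

/*
 * Returns the 0-based index of the child to follow for agbno:
 * the last key with startblock <= agbno. If agbno lies before
 * every key, the first child is returned.
 */
static size_t bnobt_child_index(const struct alloc_key *keys,
                                size_t numrecs, uint32_t agbno)
{
    size_t lo = 0, hi = numrecs;

    assert(numrecs > 0);

    /* Binary search for the first key that starts after agbno. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (keys[mid].startblock <= agbno)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? 0 : lo - 1;
}
```
]]></programlisting>
<para>With the four keys shown above, a lookup for block 100 selects the first pointer (block 2), a lookup for block 184586 selects the second (block 83347), and a lookup for block 600000 selects the fourth (block 4). Note that xfs_db numbers keys and pointers from 1 while the sketch uses 0-based indexes.</para>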
<para>The root block of the free space B+tree sorted by block count is given by the AGF's <command>cntroot</command> value:</para>
<programlisting>
xfs_db> fsblock 83343
xfs_db> type cntbt
xfs_db> p
magic = 0x41425443
level = 1
numrecs = 4
leftsib = null
rightsib = null
keys[1-4] = [blockcount,startblock]
1:[1,81496] 2:[1,511729] 3:[3,191875] 4:[6,184595]
ptrs[1-4] = 1:3 2:83345 3:83342 4:83346
</programlisting>
<para>The leaf in block 3, in this example, would only contain single block counts. The offsets are sorted in ascending order if the block count is the same.</para>
<para>Inspecting the leaf in block 83346, we can see the largest block at the end:</para>
<programlisting>
xfs_db> fsblock 83346
xfs_db> type cntbt
xfs_db> p
magic = 0x41425443
level = 0
numrecs = 344
leftsib = 83342
rightsib = null
recs[1-344] = [startblock,blockcount]
1:[184595,6] 2:[187573,6] 3:[187776,6]
...
342:[513712,755] 343:[230317,258229] 344:[538795,3384327]
</programlisting>
<para>The longest block count must be the same as the AGF's <command>longest</command> value.</para>
</section>
</section>
<section id="AG_Inode_Management">
<title>AG Inode Management</title>
<section id="Inode_Numbers">
<title>Inode Numbers</title>
<para>Inode numbers in XFS come in two forms: AG relative and absolute.</para>
<para>AG relative inode numbers always fit within 32 bits. The number of bits actually used is determined by the sum of the superblock's (<xref linkend="Superblocks"/>) <command>sb_inopblog</command> and <command>sb_agblklog</command> values. Relative inode numbers are found within the AG's inode structures.</para>
<para>Absolute inode numbers include the AG number in the high bits, above the bits used for the AG relative inode number. Absolute inode numbers are found in directory (<xref linkend="Directories"/>) entries.</para>
<para>
<mediaobject>
<imageobject><imagedata fileref="images/18.png" format="PNG" width="100%" scalefit="0"/></imageobject>
<textobject><phrase>18</phrase></textobject>
</mediaobject>
</para>
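<para>The relationship between the two forms can be expressed with shifts and masks. The sketch below assumes that an AG relative inode number is itself made of an AG relative block number in its upper <command>sb_agblklog</command> bits and the inode's index within that block in its lower <command>sb_inopblog</command> bits, as the field names suggest. The structure and function names are hypothetical.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical breakdown of an absolute inode number. */
struct ino_parts {
    uint32_t agno;   /* allocation group number */
    uint32_t agbno;  /* AG relative block holding the inode */
    uint32_t offset; /* index of the inode within that block */
};

/* Builds an absolute inode number from an AG number and AG relative inode. */
static uint64_t ino_make(uint32_t agno, uint32_t agino,
                         unsigned agblklog, unsigned inopblog)
{
    assert(agblklog + inopblog <= 32);
    return ((uint64_t)agno << (agblklog + inopblog)) | agino;
}

/* Splits an absolute inode number into its three components. */
static struct ino_parts ino_split(uint64_t ino,
                                  unsigned agblklog, unsigned inopblog)
{
    unsigned agino_bits = agblklog + inopblog;
    struct ino_parts p;
    uint64_t agino;

    assert(agino_bits <= 32);

    agino    = ino & (((uint64_t)1 << agino_bits) - 1);
    p.agno   = (uint32_t)(ino >> agino_bits);
    p.agbno  = (uint32_t)(agino >> inopblog);
    p.offset = (uint32_t)(agino & (((uint64_t)1 << inopblog) - 1));
    return p;
}
```
]]></programlisting>
<para>For a filesystem with <command>sb_agblklog</command> = 22 and <command>sb_inopblog</command> = 4, inode 128 splits into AG 0, block 8, offset 0 under these assumptions. The same AG relative inode number in AG 1 would have the absolute inode number 2<superscript>26</superscript> + 128.</para>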
</section>
<section id="Inode_Information">
<title>Inode Information</title>
<para>Each AG manages its own inodes. The third sector in the AG contains information about the AG's inodes and is known as the AGI.</para>
<para>The AGI uses the following structure:</para>
<programlisting>
typedef struct xfs_agi {
__be32 agi_magicnum;
__be32 agi_versionnum;
__be32 agi_seqno;
__be32 agi_length;
__be32 agi_count;
__be32 agi_root;
__be32 agi_level;
__be32 agi_freecount;
__be32 agi_newino;
__be32 agi_dirino;
__be32 agi_unlinked[64];
} xfs_agi_t;
</programlisting>
<variablelist>
<varlistentry>
<term>agi_magicnum</term>
<listitem><para>Specifies the magic number for the AGI sector: "XAGI" (0x58414749).</para></listitem>
</varlistentry>
<varlistentry>
<term>agi_versionnum</term>
<listitem><para>Set to <command>XFS_AGI_VERSION</command> which is currently 1.</para></listitem>
</varlistentry>
<varlistentry>
<term>agi_seqno</term>
<listitem><para>Specifies the AG number for the sector.</para></listitem>
</varlistentry>
<varlistentry>
<term>agi_length</term>
<listitem><para>Specifies the size of the AG in filesystem blocks. </para></listitem>
</varlistentry>
<varlistentry>
<term>agi_count</term>
<listitem><para>Specifies the number of inodes allocated for the AG.</para></listitem>
</varlistentry>
<varlistentry>
<term>agi_root</term>
<listitem><para>Specifies the block number in the AG containing the root of the inode B+tree.</para></listitem>
</varlistentry>
<varlistentry>
<term>agi_level</term>
<listitem><para>Specifies the number of levels in the inode B+tree.</para></listitem>
</varlistentry>
<varlistentry>
<term>agi_freecount</term>
<listitem><para>Specifies the number of free inodes in the AG.</para></listitem>
</varlistentry>
<varlistentry>
<term>agi_newino</term>
<listitem><para>Specifies the AG relative inode number most recently allocated.</para></listitem>
</varlistentry>
<varlistentry>
<term>agi_dirino</term>
<listitem><para>Deprecated and not used; it is always set to NULL (-1).</para></listitem>
</varlistentry>
<varlistentry>
<term>agi_unlinked[64]</term>
<listitem><para>Hash table of unlinked (deleted) inodes that are still being referenced. Refer to <xref linkend="Unlinked_Pointer"/> for more information.</para></listitem>
</varlistentry>
</variablelist>
</section>
<section id="Inode_Btrees"><title>Inode B+trees</title>
<para>Inodes are allocated in chunks of 64, and a B+tree is used to track these chunks of inodes as they are allocated and freed. The block containing the root of the B+tree is defined by the AGI's <command>agi_root</command> value.</para>
<para>The B+tree header for the nodes and leaves uses the <command>xfs_btree_sblock</command> structure, which is the same as the header used in the AGF B+trees (<xref linkend="AG_Free_Space_Btrees"/>):</para>
<programlisting>typedef struct xfs_btree_sblock xfs_inobt_block_t;</programlisting>
<para>Leaves contain an array of the following structure:</para>
<programlisting>
typedef struct xfs_inobt_rec {
__be32 ir_startino;
__be32 ir_freecount;
__be64 ir_free;
} xfs_inobt_rec_t;
</programlisting>
<para>Nodes contain key/pointer pairs using the following types:</para>
<programlisting>
typedef struct xfs_inobt_key {
__be32 ir_startino;
} xfs_inobt_key_t;
typedef __be32 xfs_inobt_ptr_t;
</programlisting>
<para>For the leaf entries, <command>ir_startino</command> specifies the starting inode number for the chunk, <command>ir_freecount</command> specifies the number of free entries in the chunk, and <command>ir_free</command> is a 64 element bit array specifying which entries are free in the chunk.</para>
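<para>The free count and the free bitmask describe the same chunk, so they can be checked against each other. The sketch below assumes that bit 0 (the least significant bit) of <command>ir_free</command> corresponds to the inode at <command>ir_startino</command> and that a set bit means the inode is free. The structure is a hypothetical host-endian copy of the on-disk record, and the function names are illustrative.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INODES_PER_CHUNK 64

/* Hypothetical host-endian copy of an inode B+tree leaf record. */
struct inobt_rec {
    uint32_t startino;  /* first inode number in the chunk */
    uint32_t freecount; /* number of free inodes in the chunk */
    uint64_t free;      /* assumed: bit i set means startino + i is free */
};

/* Counts the set bits in the free mask. */
static unsigned inobt_count_free(uint64_t free_mask)
{
    unsigned count = 0;

    while (free_mask != 0) {
        free_mask &= free_mask - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Returns non-zero if freecount matches the number of set bits. */
static int inobt_rec_consistent(const struct inobt_rec *rec)
{
    assert(rec != NULL);
    return inobt_count_free(rec->free) == rec->freecount;
}

/* Returns the lowest free inode number in the chunk, or -1 if none. */
static int64_t inobt_first_free(const struct inobt_rec *rec)
{
    int i;

    assert(rec != NULL);
    for (i = 0; i < INODES_PER_CHUNK; i++) {
        if (rec->free & ((uint64_t)1 << i))
            return (int64_t)rec->startino + i;
    }
    return -1;
}
```
]]></programlisting>
<para>As an example, a chunk starting at inode 128 in which only inodes 128, 129 and 130 are in use would have a free mask of <command>0xfffffffffffffff8</command> and a free count of 61, and under the assumed bit order the first free inode would be 131.</para>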
<para>The following diagram illustrates a single level inode B+tree:</para>
<para>
<mediaobject>
<imageobject><imagedata fileref="images/20a.png" format="PNG" width="100%" scalefit="0"/></imageobject>
<textobject><phrase>20a</phrase></textobject>
</mediaobject>
</para>
<para>And a 2-level inode B+tree:</para>
<para>
<mediaobject>
<imageobject><imagedata fileref="images/20b.png" format="PNG" width="100%" scalefit="0"/></imageobject>
<textobject><phrase>20b</phrase></textobject>
</mediaobject>
</para>
<bridgehead>xfs_db Examples:</bridgehead>
<para>TODO:</para></section></section><section id="Real-time_Devices"><title>
Real-time Devices</title>
<para>TODO:</para></section></chapter>