[[Superblocks]]
== Superblocks
Each AG starts with a superblock. The first one, in AG 0, is the primary
superblock which stores aggregate AG information. Secondary superblocks are
only used by xfs_repair when the primary superblock has been corrupted. A
superblock is one sector in length.
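Because each AG begins with its own superblock, the byte offset of any superblock copy follows from the AG geometry. The following is a minimal sketch, assuming the +sb_agblocks+ and +sb_blocksize+ values have already been read from the primary superblock and converted to host byte order. The helper name +sb_copy_offset+ is hypothetical and is not part of any XFS API.

```c
#include <stdint.h>

/*
 * Byte offset of the superblock at the start of AG `agno`.
 * Hypothetical helper: agblocks and blocksize are the sb_agblocks and
 * sb_blocksize values in host byte order.
 */
static uint64_t sb_copy_offset(uint32_t agno, uint32_t agblocks,
                               uint32_t blocksize)
{
    /* Widen before multiplying so large filesystems do not overflow. */
    return (uint64_t)agno * agblocks * blocksize;
}
```

AG 0 always yields offset zero, which is where the primary superblock lives.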
The superblock is defined by the following structure. The description of each
field follows.
[source, c]
----
struct xfs_dsb {
__be32 sb_magicnum;
__be32 sb_blocksize;
__be64 sb_dblocks;
__be64 sb_rblocks;
__be64 sb_rextents;
uuid_t sb_uuid;
__be64 sb_logstart;
__be64 sb_rootino;
__be64 sb_rbmino;
__be64 sb_rsumino;
__be32 sb_rextsize;
__be32 sb_agblocks;
__be32 sb_agcount;
__be32 sb_rbmblocks;
__be32 sb_logblocks;
__be16 sb_versionnum;
__be16 sb_sectsize;
__be16 sb_inodesize;
__be16 sb_inopblock;
char sb_fname[XFSLABEL_MAX];
__u8 sb_blocklog;
__u8 sb_sectlog;
__u8 sb_inodelog;
__u8 sb_inopblog;
__u8 sb_agblklog;
__u8 sb_rextslog;
__u8 sb_inprogress;
__u8 sb_imax_pct;
__be64 sb_icount;
__be64 sb_ifree;
__be64 sb_fdblocks;
__be64 sb_frextents;
__be64 sb_uquotino;
__be64 sb_gquotino;
__be16 sb_qflags;
__u8 sb_flags;
__u8 sb_shared_vn;
__be32 sb_inoalignmt;
__be32 sb_unit;
__be32 sb_width;
__u8 sb_dirblklog;
__u8 sb_logsectlog;
__be16 sb_logsectsize;
__be32 sb_logsunit;
__be32 sb_features2;
__be32 sb_bad_features2;
/* version 5 superblock fields start here */
__be32 sb_features_compat;
__be32 sb_features_ro_compat;
__be32 sb_features_incompat;
__be32 sb_features_log_incompat;
__le32 sb_crc;
__be32 sb_spino_align;
__be64 sb_pquotino;
__be64 sb_lsn;
uuid_t sb_meta_uuid;
__be64 sb_metadirino;
__be32 sb_rgcount;
__be32 sb_rgextents;
__u8 sb_rgblklog;
__u8 sb_pad[7];
__be64 sb_rtstart;
__be64 sb_rtreserved;
/* must be padded to 64 bit alignment */
};
----
*sb_magicnum*::
Identifies the filesystem. Its value is +XFS_SB_MAGIC+ ``XFSB'' (0x58465342).
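Since the on-disk fields are big-endian, a reader has to decode them byte by byte before comparing against +XFS_SB_MAGIC+. A minimal sketch of that check follows; the helper names +be32_decode+ and +sb_magic_ok+ are illustrative assumptions, not functions from the XFS sources.

```c
#include <stdint.h>

#define XFS_SB_MAGIC 0x58465342U /* "XFSB" */

/* Decode a big-endian 32-bit value from raw on-disk bytes. */
static uint32_t be32_decode(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return 1 if the sector starts with the XFS superblock magic, else 0. */
static int sb_magic_ok(const unsigned char *sector)
{
    return be32_decode(sector) == XFS_SB_MAGIC;
}
```

Decoding byte by byte keeps the check independent of the host's endianness.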
*sb_blocksize*::
The size of a basic unit of space allocation in bytes. Typically, this is 4096
(4KB) but can range from 512 to 65536 bytes.
*sb_dblocks*::
Total number of blocks available for data and metadata on the filesystem.
*sb_rblocks*::
Number of blocks on the real-time disk device. Refer to
xref:Real-time_Devices[real-time sub-volumes] for more information.
*sb_rextents*::
Number of extents on the real-time device.
*sb_uuid*::
UUID (Universally Unique ID) for the filesystem. Filesystems can be mounted by
the UUID instead of device name.
*sb_logstart*::
First block number for the journaling log if the log is internal (i.e. not on a
separate disk device). For an external log device, this will be zero (the log
will also start on the first block on the log device). The identity of the log
device is not recorded in the filesystem, but the UUIDs of the filesystem and
the log device are compared to prevent corruption.
*sb_rootino*::
Root inode number for the filesystem. Normally, the root inode is at the
start of the first possible inode chunk in AG 0. This is 128 when using a 4KB
block size.
*sb_rbmino*::
Bitmap inode for real-time extents.
*sb_rsumino*::
Summary inode for real-time bitmap.
*sb_rextsize*::
Realtime extent size in blocks.
*sb_agblocks*::
Size of each AG in blocks. For the actual size of the last AG, refer to the
xref:AG_Free_Space_Management[free space] +agf_length+ value.
*sb_agcount*::
Number of AGs in the filesystem.
*sb_rbmblocks*::
Number of real-time bitmap blocks.
*sb_logblocks*::
Number of blocks for the journaling log.
*sb_versionnum*::
Filesystem version number. This is a bitmask specifying the features enabled
when creating the filesystem. Any disk checking tools or drivers that do not
recognize any set bits must not operate upon the filesystem. Most of the flags
indicate features introduced over time. If the value of the lower nibble is >=
4, the higher bits indicate feature flags as follows:
.Version 4 Superblock version flags
[options="header"]
|=====
| Flag | Description
| +XFS_SB_VERSION_ATTRBIT+ |
Set if any inode has extended attributes. If this bit is set, the
+XFS_SB_VERSION2_ATTR2BIT+ is not set, and the +attr2+ mount flag is not
specified, the +di_forkoff+ inode field will not be dynamically adjusted.
See the section about xref:Extended_Attribute_Versions[extended attribute
versions] for more information.
| +XFS_SB_VERSION_NLINKBIT+ | Set if any inodes use 32-bit di_nlink values.
| +XFS_SB_VERSION_QUOTABIT+ |
Quotas are enabled on the filesystem. This
also brings in the various quota fields in the superblock.
| +XFS_SB_VERSION_ALIGNBIT+ | Set if sb_inoalignmt is used.
| +XFS_SB_VERSION_DALIGNBIT+ | Set if sb_unit and sb_width are used.
| +XFS_SB_VERSION_SHAREDBIT+ | Set if sb_shared_vn is used.
| +XFS_SB_VERSION_LOGV2BIT+ | Version 2 journaling logs are used.
| +XFS_SB_VERSION_SECTORBIT+ | Set if sb_sectsize is not 512.
| +XFS_SB_VERSION_EXTFLGBIT+ | Unwritten extents are used. This is always set.
| +XFS_SB_VERSION_DIRV2BIT+ |
Version 2 directories are used. This is always set.
| +XFS_SB_VERSION_MOREBITSBIT+ |
Set if the sb_features2 field in the superblock contains more flags.
|=====
If the lower nibble of this value is 5, then this is a v5 filesystem; the
+XFS_SB_VERSION2_CRCBIT+ feature must be set in +sb_features2+.
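The split between the version nibble and the feature bits can be sketched as follows. The numeric flag values are those found in the Linux kernel's +xfs_format.h+ and are not listed in this section, so treat them as an assumption to verify against your headers. The helper names are hypothetical.

```c
#include <stdint.h>

/* Values as defined in the Linux kernel's xfs_format.h (assumption). */
#define XFS_SB_VERSION_NUMBITS     0x000f
#define XFS_SB_VERSION_ALIGNBIT    0x0080
#define XFS_SB_VERSION_LOGV2BIT    0x0400
#define XFS_SB_VERSION_EXTFLGBIT   0x1000
#define XFS_SB_VERSION_DIRV2BIT    0x2000
#define XFS_SB_VERSION_MOREBITSBIT 0x8000

/* Version number stored in the lower nibble of sb_versionnum. */
static unsigned int sb_version_num(uint16_t versionnum)
{
    return versionnum & XFS_SB_VERSION_NUMBITS;
}

/* The higher bits are feature flags only when the version is >= 4. */
static int sb_version_has(uint16_t versionnum, uint16_t bit)
{
    return sb_version_num(versionnum) >= 4 && (versionnum & bit) != 0;
}
```

Applied to the +versionnum = 0xb084+ value from the xfs_db example at the end of this section, this reports version 4 with the alignment, unwritten extent, version 2 directory, and "more bits" flags set, and the version 2 log flag clear.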
*sb_sectsize*::
Specifies the underlying disk sector size in bytes. Typically this is 512 or
4096 bytes. This determines the minimum I/O alignment, especially for direct I/O.
*sb_inodesize*::
Size of the inode in bytes. The default is 256 (2 inodes per standard sector)
but can be made as large as 2048 bytes when creating the filesystem. On a v5
filesystem, the default and minimum inode size are both 512 bytes.
*sb_inopblock*::
Number of inodes per block. This is equivalent to +sb_blocksize / sb_inodesize+.
*sb_fname[12]*::
Name for the filesystem. This value can be used in the mount command.
*sb_blocklog*::
log~2~ value of +sb_blocksize+. In other terms, +sb_blocksize = 2^sb_blocklog^+.
*sb_sectlog*::
log~2~ value of +sb_sectsize+.
*sb_inodelog*::
log~2~ value of +sb_inodesize+.
*sb_inopblog*::
log~2~ value of +sb_inopblock+.
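The size fields and their log~2~ counterparts are redundant, which makes them easy to cross-check. The sketch below is one possible sanity check, not the validation that the kernel or xfs_repair actually performs. The +struct sb_geom+ type and the function name are hypothetical, and all values are assumed to be in host byte order.

```c
#include <stdint.h>

/* Hypothetical host-order copy of the size-related superblock fields. */
struct sb_geom {
    uint32_t blocksize;
    uint16_t sectsize, inodesize, inopblock;
    uint8_t  blocklog, sectlog, inodelog, inopblog;
};

/* Return 1 if every size field equals 2 raised to its log2 field. */
static int sb_logs_consistent(const struct sb_geom *g)
{
    /* Block sizes top out at 65536 bytes, so no log may exceed 16. */
    if (g->blocklog > 16 || g->sectlog > 16 ||
        g->inodelog > 16 || g->inopblog > 16)
        return 0;
    return g->blocksize == (1U << g->blocklog) &&
           g->sectsize  == (1U << g->sectlog) &&
           g->inodesize == (1U << g->inodelog) &&
           g->inopblock == (1U << g->inopblog) &&
           g->inopblog  == g->blocklog - g->inodelog;
}
```

The last comparison restates +sb_inopblock = sb_blocksize / sb_inodesize+ in terms of the log fields.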
*sb_agblklog*::
log~2~ value of +sb_agblocks+ (rounded up). This value is used to generate inode
numbers and absolute block numbers defined in extent maps.
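As an illustration of how +sb_agblklog+ is used, the sketch below splits an absolute block number and an inode number into their AG-relative parts. It assumes the usual XFS packing, in which the AG number sits above the low +sb_agblklog+ bits of a block number, and an inode number additionally carries +sb_inopblog+ bits of inode offset below the block number. The function names are hypothetical.

```c
#include <stdint.h>

/* AG number encoded in an absolute filesystem block number. */
static uint32_t fsb_to_agno(uint64_t fsbno, uint8_t agblklog)
{
    return (uint32_t)(fsbno >> agblklog);
}

/* AG-relative block number encoded in an absolute block number. */
static uint32_t fsb_to_agbno(uint64_t fsbno, uint8_t agblklog)
{
    return (uint32_t)(fsbno & ((1ULL << agblklog) - 1));
}

/* AG number encoded in an inode number. */
static uint32_t ino_to_agno(uint64_t ino, uint8_t agblklog, uint8_t inopblog)
{
    return (uint32_t)(ino >> (agblklog + inopblog));
}

/* AG-relative block that holds the given inode. */
static uint32_t ino_to_agbno(uint64_t ino, uint8_t agblklog, uint8_t inopblog)
{
    return (uint32_t)((ino >> inopblog) & ((1ULL << agblklog) - 1));
}
```

With the values from the xfs_db example at the end of this section (+agblklog = 22+, +inopblog = 4+), +logstart = 33554436+ decodes to block 4 of AG 8, and +rootino = 128+ decodes to block 8 of AG 0.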
*sb_rextslog*::
log~2~ value of +sb_rextents+.
*sb_inprogress*::
Flag specifying that the filesystem is being created.
*sb_imax_pct*::
Maximum percentage of filesystem space that can be used for inodes. The default
value is 5%.
*sb_icount*::
Global count of inodes allocated on the filesystem. This is only
maintained in the first superblock.
*sb_ifree*::
Global count of free inodes on the filesystem. This is only maintained in the
first superblock.
*sb_fdblocks*::
Global count of free data blocks on the filesystem. This is only maintained in
the first superblock.
*sb_frextents*::
Global count of free real-time extents on the filesystem. This is only
maintained in the first superblock.
*sb_uquotino*::
Inode for user quotas. This and the following two quota fields only apply if
the +XFS_SB_VERSION_QUOTABIT+ flag is set in +sb_versionnum+. Refer to
xref:Quota_Inodes[quota inodes] for more information.
*sb_gquotino*::
Inode for group or project quotas. Group and project quotas cannot be used at
the same time on v4 filesystems. On a v5 filesystem, this inode always stores
group quota information.
*sb_qflags*::
Quota flags. It can be a combination of the following flags:
.Superblock quota flags
[options="header"]
|=====
| Flag | Description
| +XFS_UQUOTA_ACCT+ | User quota accounting is enabled.
| +XFS_UQUOTA_ENFD+ | User quotas are enforced.
| +XFS_UQUOTA_CHKD+ | User quotas have been checked.
| +XFS_PQUOTA_ACCT+ | Project quota accounting is enabled.
| +XFS_OQUOTA_ENFD+ | Other (group/project) quotas are enforced.
| +XFS_OQUOTA_CHKD+ | Other (group/project) quotas have been checked.
| +XFS_GQUOTA_ACCT+ | Group quota accounting is enabled.
| +XFS_GQUOTA_ENFD+ | Group quotas are enforced.
| +XFS_GQUOTA_CHKD+ | Group quotas have been checked.
| +XFS_PQUOTA_ENFD+ | Project quotas are enforced.
| +XFS_PQUOTA_CHKD+ | Project quotas have been checked.
|=====
If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, the +sb_qflags+ field
will persist across mounts if no quota mount options are provided.
*sb_flags*::
Miscellaneous flags.
.Superblock flags
[options="header"]
|=====
| Flag | Description
| +XFS_SBF_READONLY+ | Only read-only mounts allowed.
|=====
*sb_shared_vn*::
Reserved and must be zero (``vn'' stands for version number).
*sb_inoalignmt*::
Inode chunk alignment in fsblocks. Prior to v5, the default value gave inode
chunks an 8KiB alignment. Starting with v5, the default value
scales with the multiple of the inode size over 256 bytes. Concretely, this
means an alignment of 16KiB for 512-byte inodes, 32KiB for 1024-byte inodes,
etc. If sparse inodes are enabled, the +ir_startino+ field of each inode
B+tree record must be aligned to this block granularity, even if the inode
given by +ir_startino+ itself is sparse.
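The default alignment rule described above can be written out as a small calculation. This is only a sketch of the stated rule, not the logic that mkfs.xfs actually runs, and the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Default inode chunk alignment in filesystem blocks, following the rule
 * in the text: 8KiB before v5, scaled by (inodesize / 256) on v5.
 * The result is 0 when the alignment is smaller than one block.
 */
static uint32_t default_inoalignmt(int is_v5, uint32_t inodesize,
                                   uint32_t blocksize)
{
    uint32_t align_bytes = 8192;

    if (is_v5)
        align_bytes *= inodesize / 256;
    return align_bytes / blocksize;
}
```

For a v4 filesystem with 4096-byte blocks this gives 2 blocks, which matches the +inoalignmt = 2+ value in the xfs_db example at the end of this section.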
*sb_unit*::
Underlying stripe or RAID unit in blocks.
*sb_width*::
Underlying stripe or RAID width in blocks.
*sb_dirblklog*::
log~2~ multiplier that determines the granularity of directory block allocations
in fsblocks.
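Put differently, a directory block spans 2^sb_dirblklog^ filesystem blocks. A one-line sketch of the resulting size in bytes follows; the function name is hypothetical.

```c
#include <stdint.h>

/* Directory block size in bytes: 2^dirblklog filesystem blocks. */
static uint32_t dir_block_size(uint32_t blocksize, uint8_t dirblklog)
{
    return blocksize << dirblklog;
}
```

With +blocksize = 4096+ and +dirblklog = 2+ this yields 16384 bytes, the +-n size=16384+ requested in the mkfs.xfs example at the end of this section.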
*sb_logsectlog*::
log~2~ value of the log subvolume's sector size. This is only used if the
journaling log is on a separate disk device (i.e. not internal).
*sb_logsectsize*::
The log's sector size in bytes if the filesystem uses an external log device.
*sb_logsunit*::
The log device's stripe or RAID unit size. This only applies to version 2 logs,
i.e. when +XFS_SB_VERSION_LOGV2BIT+ is set in +sb_versionnum+.
*sb_features2*::
Additional version flags if +XFS_SB_VERSION_MOREBITSBIT+ is set in
+sb_versionnum+. The currently defined additional features include:
.Extended Version 4 Superblock flags
[options="header"]
|=====
| Flag | Description
| +XFS_SB_VERSION2_LAZYSBCOUNTBIT+ |
Lazy global counters. Making a filesystem with this bit set can improve
performance. The global free space and inode counts are only updated in the
primary superblock when the filesystem is cleanly unmounted.
| +XFS_SB_VERSION2_ATTR2BIT+ |
Extended attributes version 2. Making a filesystem with this bit set optimises the
inode layout of extended attributes. If this bit is set and the +noattr2+
mount flag is not specified, the +di_forkoff+ inode field will be dynamically
adjusted. See the section about xref:Extended_Attribute_Versions[extended
attribute versions] for more information.
| +XFS_SB_VERSION2_PARENTBIT+ |
Parent pointers. Each inode must have an extended attribute that points back to
its parent inode. The primary purpose for this information is in backup systems.
| +XFS_SB_VERSION2_PROJID32BIT+ |
32-bit Project ID. Inodes can be associated with a project ID number, which
can be used to enforce disk space usage quotas for a particular group of
directories. This flag indicates that project IDs can be 32 bits in size.
| +XFS_SB_VERSION2_CRCBIT+ |
Metadata checksumming. All metadata blocks have an extended header containing
the block checksum, a copy of the metadata UUID, the log sequence number of the
last update to prevent stale replays, and a back pointer to the owner of the
block. This feature must be and can only be set if the lowest nibble of
+sb_versionnum+ is set to 5.
| +XFS_SB_VERSION2_FTYPE+ |
Directory file type. Each directory entry records the type of the inode to
which the entry points. This speeds up directory iteration by removing the
need to load every inode into memory.
|=====
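The two conditions attached to +sb_features2+ (it is only valid when +XFS_SB_VERSION_MOREBITSBIT+ is set, and the CRC bit goes hand in hand with version 5) can be sketched as below. The numeric values come from the Linux kernel's +xfs_format.h+ rather than from this section, so treat them as an assumption. The helper names are hypothetical.

```c
#include <stdint.h>

/* Values as defined in the Linux kernel's xfs_format.h (assumption). */
#define XFS_SB_VERSION_NUMBITS         0x000f
#define XFS_SB_VERSION_MOREBITSBIT     0x8000
#define XFS_SB_VERSION2_LAZYSBCOUNTBIT 0x00000002
#define XFS_SB_VERSION2_ATTR2BIT       0x00000008
#define XFS_SB_VERSION2_CRCBIT         0x00000100

/* sb_features2 only counts if sb_versionnum says it holds more flags. */
static int sb_has_feature2(uint16_t versionnum, uint32_t features2,
                           uint32_t bit)
{
    return (versionnum & XFS_SB_VERSION_MOREBITSBIT) != 0 &&
           (features2 & bit) != 0;
}

/* The CRC bit must be set if, and only if, the version nibble is 5. */
static int sb_crc_bit_consistent(uint16_t versionnum, uint32_t features2)
{
    int is_v5 = (versionnum & XFS_SB_VERSION_NUMBITS) == 5;
    int has_crc = sb_has_feature2(versionnum, features2,
                                  XFS_SB_VERSION2_CRCBIT);

    return is_v5 == has_crc;
}
```

For the xfs_db example at the end of this section (+versionnum = 0xb084+, +features2 = 8+), only the version 2 extended attribute flag is reported, matching the +-i attr=2+ mkfs option.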
*sb_bad_features2*::
This field mirrors +sb_features2+, due to past 64-bit alignment errors.
*sb_features_compat*::
Read-write compatible feature flags. The kernel can still read and write this
FS even if it doesn't understand the flag. Currently, there are no valid
flags.
*sb_features_ro_compat*::
Read-only compatible feature flags. The kernel can still read this FS even if
it doesn't understand the flag.
.Extended Version 5 Superblock Read-Only compatibility flags
[options="header"]
|=====
| Flag | Description
| +XFS_SB_FEAT_RO_COMPAT_FINOBT+ |
Free inode B+tree. Each allocation group contains a B+tree to track inode chunks
containing free inodes. This is a performance optimization to reduce the time
required to allocate inodes.
| +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ |
Reverse mapping B+tree. Each allocation group contains a B+tree containing
records mapping AG blocks to their owners. See the section about
xref:Reconstruction[reconstruction] for more details.
| +XFS_SB_FEAT_RO_COMPAT_REFLINK+ |
Reference count B+tree. Each allocation group contains a B+tree to track the
reference counts of AG blocks. This enables files to share data blocks safely.
See the section about xref:Reflink_Deduplication[reflink and deduplication] for
more details.
| +XFS_SB_FEAT_RO_COMPAT_INOBTCNT+ |
Inode B+tree block counters. Each allocation group's inode (AGI) header
tracks the number of blocks in each of the inode B+trees. This allows us
to have a slightly higher level of redundancy over the shape of the inode
btrees, and decreases the amount of time to compute the metadata B+tree
preallocations at mount time.
|=====
*sb_features_incompat*::
Read-write incompatible feature flags. The kernel cannot read or write this
FS if it doesn't understand the flag.
.Extended Version 5 Superblock Read-Write incompatibility flags
[options="header"]
|=====
| Flag | Description
| +XFS_SB_FEAT_INCOMPAT_FTYPE+ |
Directory file type. Each directory entry tracks the type of the inode to
which the entry points. This is a performance optimization to remove the need
to load every inode into memory to iterate a directory.
| +XFS_SB_FEAT_INCOMPAT_SPINODES+ |
Sparse inodes. This feature relaxes the requirement to allocate inodes in
chunks of 64. When the free space is heavily fragmented, there might exist
plenty of free space but not enough contiguous free space to allocate a new
inode chunk. With this feature, the user can continue to create files until
all free space is exhausted.
Unused space in the inode B+tree records is used to track which parts of the
inode chunk are not inodes.
See the chapter on xref:Sparse_Inodes[Sparse Inodes] for more information.
| +XFS_SB_FEAT_INCOMPAT_META_UUID+ |
Metadata UUID. The UUID stamped into each metadata block must match the value
in +sb_meta_uuid+. This enables the administrator to change +sb_uuid+ at will
without having to rewrite the entire filesystem.
| +XFS_SB_FEAT_INCOMPAT_BIGTIME+ |
Large timestamps. Inode timestamps and quota expiration timers are extended to
support times through the year 2486. See the section on
xref:Timestamps[timestamps] for more information.
| +XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR+ |
The filesystem is not in operable condition, and must be run through
xfs_repair before it can be mounted.
| +XFS_SB_FEAT_INCOMPAT_NREXT64+ |
Large file fork extent counts. This greatly expands the maximum number of
space mappings allowed in data and extended attribute file forks.
| +XFS_SB_FEAT_INCOMPAT_EXCHRANGE+ |
Atomic file mapping exchanges. The filesystem is capable of exchanging a range
of mappings between two arbitrary ranges of a file's fork by using log intent
items to track the progress of the high level exchange operation. In other
words, the exchange operation can be restarted if the system goes down, which
is necessary for userspace to commit new file contents atomically. This
flag has user-visible impacts, which is why it is a permanent incompat flag.
See the section about xref:XMI_Log_Item[mapping exchange log intents] for more
information.
| +XFS_SB_FEAT_INCOMPAT_PARENT+ |
Directory parent pointers. See the section about xref:Parent_Pointers[parent
pointers] for more information.
| +XFS_SB_FEAT_INCOMPAT_METADIR+ |
Metadata directory tree. See the section about the xref:Metadata_Directories[
metadata directory tree] for more information.
| +XFS_SB_FEAT_INCOMPAT_ZONED+ |
Zoned RT device. See the section about the xref:Zoned[Zoned Real-time Devices]
for more information.
| +XFS_SB_FEAT_INCOMPAT_ZONE_GAPS+ |
Each hardware zone has unusable space at the end of its LBA range, which is
mirrored by unusable filesystem blocks at the end of the rtgroup. The
+xfs_rtblock_t startblock+ in file mappings is linearly mapped to the
hardware LBA space.
|=====
*sb_features_log_incompat*::
Read-write incompatible feature flags for the log. The kernel cannot recover
the FS log if it doesn't understand the flag.
.Extended Version 5 Superblock Log incompatibility flags
[options="header"]
|=====
| Flag | Description
| +XFS_SB_FEAT_INCOMPAT_LOG_XATTRS+ |
Extended attribute updates have been committed to the ondisk log.
|=====
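The way the feature fields gate access can be summarised in a few lines. This is a sketch of the policy described above rather than the kernel's mount code. The +known_*+ masks stand for whatever set of flags a given implementation understands and are hypothetical parameters. +sb_features_compat+ is omitted because unknown bits there never restrict access, and +sb_features_log_incompat+ is omitted because it only governs log recovery.

```c
#include <stdint.h>

enum mount_mode { MOUNT_REFUSE, MOUNT_READ_ONLY, MOUNT_READ_WRITE };

/*
 * Decide how a filesystem may be mounted from its feature fields.
 * known_ro_compat and known_incompat are the flag sets understood by
 * the (hypothetical) implementation doing the mount.
 */
static enum mount_mode sb_mount_mode(uint32_t ro_compat, uint32_t incompat,
                                     uint32_t known_ro_compat,
                                     uint32_t known_incompat)
{
    if (incompat & ~known_incompat)
        return MOUNT_REFUSE;     /* cannot safely read or write */
    if (ro_compat & ~known_ro_compat)
        return MOUNT_READ_ONLY;  /* reading is safe, writing is not */
    return MOUNT_READ_WRITE;
}
```

Checking the incompat field first matters: an unknown incompat flag overrides everything else.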
*sb_crc*::
Superblock checksum.
*sb_spino_align*::
Sparse inode alignment, in fsblocks. Each chunk of inodes referenced by a
sparse inode B+tree record must be aligned to this block granularity.
*sb_pquotino*::
Project quota inode.
*sb_lsn*::
Log sequence number of the last superblock update.
*sb_meta_uuid*::
If the +XFS_SB_FEAT_INCOMPAT_META_UUID+ feature is set, then the UUID field in
all metadata blocks must match this UUID. If not, the block header UUID field
must match +sb_uuid+.
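The choice of which UUID a metadata block header must carry can be sketched as follows. The flag value is the one used in the Linux kernel's +xfs_format.h+ and is an assumption as far as this section is concerned. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Value as defined in the Linux kernel's xfs_format.h (assumption). */
#define XFS_SB_FEAT_INCOMPAT_META_UUID (1U << 2)

/* Return 1 if a metadata block's UUID matches the one the superblock expects. */
static int block_uuid_ok(const unsigned char block_uuid[16],
                         const unsigned char sb_uuid[16],
                         const unsigned char sb_meta_uuid[16],
                         uint32_t features_incompat)
{
    const unsigned char *expected =
        (features_incompat & XFS_SB_FEAT_INCOMPAT_META_UUID)
            ? sb_meta_uuid
            : sb_uuid;

    return memcmp(block_uuid, expected, 16) == 0;
}
```

A block stamped with the original UUID therefore stays valid after +sb_uuid+ is changed, as long as the feature flag is set and +sb_meta_uuid+ preserves the old value.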
*sb_metadirino*::
If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is set, this field points to
the inode of the root directory of the metadata directory tree.
This field is zero otherwise.
*sb_rgcount*::
Count of realtime groups in the filesystem, if the
+XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled. If no realtime subvolume
exists, this value will be zero.
*sb_rgextents*::
Maximum number of realtime extents that can be contained within a realtime
group, if the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled.
*sb_rgblklog*::
If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, this is the log~2~
value of +sb_rgextents+ * +sb_rextsize+ (rounded up). This value is used to
generate absolute block numbers defined in extent maps from the segmented
+xfs_rtblock_t+ values.
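The rounded-up log~2~ used for +sb_rgblklog+ (and for +sb_agblklog+) is the smallest exponent whose power of two covers the group size. A minimal sketch follows, with hypothetical function names.

```c
#include <stdint.h>

/* Smallest log such that 2^log >= n; returns 0 for n <= 1. */
static uint8_t log2_roundup(uint64_t n)
{
    uint8_t log = 0;

    while (log < 63 && (1ULL << log) < n)
        log++;
    return log;
}

/* Rounded-up log2 of the realtime group size in filesystem blocks. */
static uint8_t compute_rgblklog(uint32_t rgextents, uint32_t rextsize)
{
    return log2_roundup((uint64_t)rgextents * rextsize);
}
```

The same rounding explains the xfs_db example at the end of this section, where +agblocks = 3923122+ gives +agblklog = 22+.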
*sb_pad[7]*::
Zeroes, if the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled.
*sb_rtstart*::
If the +XFS_SB_FEAT_INCOMPAT_ZONED+ feature is enabled, this is the start
of the internal RT section. That is, the RT section is placed on the same
device as the data device, and starts at this offset into the device.
The value is in units of file system blocks.
*sb_rtreserved*::
If the +XFS_SB_FEAT_INCOMPAT_ZONED+ feature is enabled, this is the amount
of space in the realtime section that is reserved for internal use
by garbage collection and reorganization algorithms.
=== xfs_db Superblock Example
A filesystem is made on a single disk with the following command:
----
# mkfs.xfs -i attr=2 -n size=16384 -f /dev/sda7
meta-data=/dev/sda7 isize=256 agcount=16, agsize=3923122 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=62769952, imaxpct=25
= sunit=0 swidth=0 blks, unwritten=1
naming =version 2 bsize=16384
log =internal log bsize=4096 blocks=30649, version=1
= sectsz=512 sunit=0 blks
realtime =none extsz=65536 blocks=0, rtextents=0
----
And in xfs_db, inspecting the superblock:
----
xfs_db> sb
xfs_db> p
magicnum = 0x58465342
blocksize = 4096
dblocks = 62769952
rblocks = 0
rextents = 0
uuid = 32b24036-6931-45b4-b68c-cd5e7d9a1ca5
logstart = 33554436
rootino = 128
rbmino = 129
rsumino = 130
rextsize = 16
agblocks = 3923122
agcount = 16
rbmblocks = 0
logblocks = 30649
versionnum = 0xb084
sectsize = 512
inodesize = 256
inopblock = 16
fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
blocklog = 12
sectlog = 9
inodelog = 8
inopblog = 4
agblklog = 22
rextslog = 0
inprogress = 0
imax_pct = 25
icount = 64
ifree = 61
fdblocks = 62739235
frextents = 0
uquotino = 0
gquotino = 0
qflags = 0
flags = 0
shared_vn = 0
inoalignmt = 2
unit = 0
width = 0
dirblklog = 2
logsectlog = 0
logsectsize = 0
logsunit = 0
features2 = 8
----