[[Journaling_Log]]
= Journaling Log
[NOTE]
Only v2 log format is covered here.
The XFS journal exists on disk as a reserved extent of blocks within the
filesystem, or as a separate journal device. The journal itself can be thought
of as a series of log records; each log record contains a part of or a whole
transaction. A transaction consists of a series of log operation headers
(``log items''), formatting structures, and raw data. The first operation in a
transaction establishes the transaction ID and the last operation is a commit
record. The operations recorded between the start and commit operations
represent the metadata changes made by the transaction. If the commit
operation is missing, the transaction is incomplete and cannot be recovered.
[[Log_Records]]
== Log Records
The XFS log is split into a series of log records. Log records seem to
correspond to an in-core log buffer, which can be up to 256KiB in size. Each
record has a log sequence number, which is the same LSN recorded in the v5
metadata integrity fields.
Log sequence numbers are a 64-bit quantity consisting of two 32-bit quantities.
The upper 32 bits are the ``cycle number'', which increments every time XFS
cycles through the log. The lower 32 bits are the ``block number'', which is
assigned when a transaction is committed, and should correspond to the block
offset within the log.
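The split described above can be sketched with a few helpers. This is a minimal illustration only; the function names +lsn_cycle+, +lsn_block+, and +lsn_make+ are hypothetical and are not taken from the XFS sources.

```c
#include <stdint.h>

/* Upper 32 bits of an LSN: the cycle number. */
uint32_t lsn_cycle(uint64_t lsn)
{
	return (uint32_t)(lsn >> 32);
}

/* Lower 32 bits of an LSN: the block number within the log. */
uint32_t lsn_block(uint64_t lsn)
{
	return (uint32_t)(lsn & 0xffffffffu);
}

/* Combine a cycle number and a block number into a 64-bit LSN. */
uint64_t lsn_make(uint32_t cycle, uint32_t block)
{
	return ((uint64_t)cycle << 32) | block;
}
```

Comparing two LSNs as plain 64-bit integers orders them first by cycle and then by block, which is why the cycle number occupies the upper half.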
A log record begins with the following header, which occupies 512 bytes on
disk:
[source, c]
----
typedef struct xlog_rec_header {
__be32 h_magicno;
__be32 h_cycle;
__be32 h_version;
__be32 h_len;
__be64 h_lsn;
__be64 h_tail_lsn;
__le32 h_crc;
__be32 h_prev_block;
__be32 h_num_logops;
__be32 h_cycle_data[XLOG_HEADER_CYCLE_SIZE / BBSIZE];
/* new fields */
__be32 h_fmt;
uuid_t h_fs_uuid;
__be32 h_size;
} xlog_rec_header_t;
----
*h_magicno*::
The magic number of log records, 0xfeedbabe.
*h_cycle*::
Cycle number of this log record.
*h_version*::
Log record version, currently 2.
*h_len*::
Length of the log record, in bytes. Must be aligned to a 64-bit boundary.
*h_lsn*::
Log sequence number of this record.
*h_tail_lsn*::
Log sequence number of the first log record with uncommitted buffers.
*h_crc*::
Checksum of the log record header, the cycle data, and the log records
themselves.
*h_prev_block*::
Block number of the previous log record.
*h_num_logops*::
The number of log operations in this record.
*h_cycle_data*::
The first u32 of each log sector must contain the cycle number. Since log
item buffers are formatted without regard to this requirement, the original
contents of the first four bytes of each sector in the log are copied into the
corresponding element of this array. After that, the first four bytes of those
sectors are stamped with the cycle number. This process is reversed at
recovery time. If there are more sectors in this log record than there are
slots in this array, the cycle data continues for as many sectors as needed;
each sector is formatted as type +xlog_rec_ext_header+.
*h_fmt*::
Format of the log record. This is one of the following values:
.Log record formats
[options="header"]
|=====
| Format value | Log format
| +XLOG_FMT_UNKNOWN+ | Unknown. Perhaps this log is corrupt.
| +XLOG_FMT_LINUX_LE+ | Little-endian Linux.
| +XLOG_FMT_LINUX_BE+ | Big-endian Linux.
| +XLOG_FMT_IRIX_BE+ | Big-endian Irix.
|=====
*h_fs_uuid*::
Filesystem UUID.
*h_size*::
In-core log record size. This is somewhere between 16 and 256KiB, with 32KiB
being the default.
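As a rough sketch of how the leading header fields might be decoded from a raw 512-byte sector, the following reads the big-endian values at the byte offsets implied by the declaration order of +xlog_rec_header+. The names +get_be32+, +get_be64+, +rec_summary+, and +parse_rec_header+ are hypothetical helpers for this illustration, and the offsets assume the structure is laid out without padding.

```c
#include <stdint.h>

#define XLOG_HEADER_MAGIC 0xfeedbabeu	/* h_magicno, as given above */

/* Read a big-endian 32-bit integer from a raw byte buffer. */
uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a big-endian 64-bit integer from a raw byte buffer. */
uint64_t get_be64(const unsigned char *p)
{
	return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/* Hypothetical summary of the first few header fields. */
struct rec_summary {
	uint32_t cycle;		/* h_cycle */
	uint32_t version;	/* h_version */
	uint32_t len;		/* h_len */
	uint64_t lsn;		/* h_lsn */
	uint64_t tail_lsn;	/* h_tail_lsn */
};

/*
 * Decode the leading fields of a record header sector.
 * Returns 0 on success, or -1 if the magic number does not match.
 */
int parse_rec_header(const unsigned char *sector, struct rec_summary *out)
{
	if (get_be32(sector) != XLOG_HEADER_MAGIC)
		return -1;
	out->cycle = get_be32(sector + 4);
	out->version = get_be32(sector + 8);
	out->len = get_be32(sector + 12);
	out->lsn = get_be64(sector + 16);
	out->tail_lsn = get_be64(sector + 24);
	return 0;
}
```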
As mentioned earlier, if this log record spans more sectors than +h_cycle_data+
has slots (64, since +XLOG_HEADER_CYCLE_SIZE+ is 32KiB and +BBSIZE+ is 512
bytes), the cycle data overflows into the next sector(s) in the log. Each of
those sectors is formatted as follows:
[source, c]
----
typedef struct xlog_rec_ext_header {
__be32 xh_cycle;
__be32 xh_cycle_data[XLOG_HEADER_CYCLE_SIZE / BBSIZE];
} xlog_rec_ext_header_t;
----
*xh_cycle*::
Cycle number of this log record. Should match +h_cycle+.
*xh_cycle_data*::
Overflow cycle data.
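The reversal of cycle stamping at recovery time can be sketched as follows. This is a simplified illustration, not the kernel's implementation: it handles only the sectors covered by the first header, the name +unpack_cycle_data+ is hypothetical, and the slot count of 64 assumes +XLOG_HEADER_CYCLE_SIZE+ is 32KiB with 512-byte sectors.

```c
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE		512	/* BBSIZE in the structures above */
#define CYCLE_DATA_OFFSET	44	/* byte offset of h_cycle_data in the header */
#define CYCLE_SLOTS		64	/* assumed: XLOG_HEADER_CYCLE_SIZE / BBSIZE */

/*
 * Undo the cycle stamping for the sectors covered by the first header:
 * copy the four saved bytes in h_cycle_data[i] back to the start of
 * body sector i. The body buffer must hold nsect whole sectors.
 * Sectors beyond CYCLE_SLOTS need the extended headers and are not
 * handled here. Returns the number of sectors restored.
 */
size_t unpack_cycle_data(const unsigned char *header,
			 unsigned char *body, size_t nsect)
{
	size_t i;

	if (nsect > CYCLE_SLOTS)
		nsect = CYCLE_SLOTS;
	for (i = 0; i < nsect; i++)
		memcpy(body + i * SECTOR_SIZE,
		       header + CYCLE_DATA_OFFSET + 4 * i, 4);
	return nsect;
}
```

The saved bytes are copied back verbatim, without any byte swapping, because they are the original contents of the log item data rather than a header field.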
[[Log_Operations]]
== Log Operations
Within a log record, log operations are recorded as a series consisting of an
operation header immediately followed by a data region. The operation header
has the following format:
[source, c]
----
typedef struct xlog_op_header {
__be32 oh_tid;
__be32 oh_len;
__u8 oh_clientid;
__u8 oh_flags;
__u16 oh_res2;
} xlog_op_header_t;
----
*oh_tid*::
Transaction ID of this operation.
*oh_len*::
Number of bytes in the data region.
*oh_clientid*::
The originator of this operation. This can be one of the following:
.Log Operation Client ID
[options="header"]
|=====
| Client ID | Originator
| +XFS_TRANSACTION+ | Operation came from a transaction.
| +XFS_VOLUME+ | ???
| +XFS_LOG+ | ???
|=====
*oh_flags*::
Specifies flags associated with this operation. This can be a combination of
the following values (though most likely only one will be set at a time):
.Log Operation Flags
[options="header"]
|=====
| Flag | Description
| +XLOG_START_TRANS+ | Start a new transaction. The next operation header should describe a transaction header.
| +XLOG_COMMIT_TRANS+ | Commit this transaction.
| +XLOG_CONTINUE_TRANS+ | Continue this transaction into a new log record.
| +XLOG_WAS_CONT_TRANS+ | This transaction started in a previous log record.
| +XLOG_END_TRANS+ | End of a continued transaction.
| +XLOG_UNMOUNT_TRANS+ | Transaction to unmount a filesystem.
|=====
*oh_res2*::
Padding.
The data region follows immediately after the operation header and is exactly
+oh_len+ bytes long. These payloads are in host-endian order, which means that
one cannot replay the log from an unclean XFS filesystem on a system with a
different byte order.
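To illustrate the header-then-data layout, here is a minimal sketch that walks a buffer of operations by following +oh_len+. It assumes the buffer holds the record body after cycle data has been restored and that no operation is split across records; +count_log_ops+ and +get_be32+ are hypothetical names used only for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define OP_HEADER_SIZE 12	/* oh_tid, oh_len, oh_clientid, oh_flags, oh_res2 */

/* Read a big-endian 32-bit integer from a raw byte buffer. */
uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Walk the operations packed into buf. Each one is an operation header
 * followed by oh_len bytes of data. Returns the number of complete
 * operations, or -1 if a header or data region runs past the end.
 */
int count_log_ops(const unsigned char *buf, size_t len)
{
	size_t off = 0;
	int nops = 0;

	while (off < len) {
		uint32_t oh_len;

		if (len - off < OP_HEADER_SIZE)
			return -1;
		oh_len = get_be32(buf + off + 4);	/* oh_len follows oh_tid */
		off += OP_HEADER_SIZE;
		if (oh_len > len - off)
			return -1;
		off += oh_len;				/* skip the data region */
		nops++;
	}
	return nops;
}
```

Note that only the operation header is big-endian; the skipped data region is in the byte order of the host that wrote it.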
[[Log_Items]]
== Log Items
Following are the types of log item payloads that can follow an
+xlog_op_header+. Except for buffer data and inode cores, all log items have a
magic number to distinguish themselves. Buffer data items only appear after
+xfs_buf_log_format+ items; and inode core items only appear after
+xfs_inode_log_format+ items.
.Log Operation Magic Numbers
[options="header"]
|=====
| Magic | Hexadecimal | Operation Type
| +XFS_TRANS_HEADER_MAGIC+ | 0x5452414e | xref:Log_Transaction_Headers[Log Transaction Header]
| +XFS_LI_EFI+ | 0x1236 | xref:EFI_Log_Item[Extent Freeing Intent]
| +XFS_LI_EFD+ | 0x1237 | xref:EFD_Log_Item[Extent Freeing Done]
| +XFS_LI_IUNLINK+ | 0x1238 | Unknown?
| +XFS_LI_INODE+ | 0x123b | xref:Inode_Log_Item[Inode Updates]
| +XFS_LI_BUF+ | 0x123c | xref:Buffer_Log_Item[Buffer Writes]
| +XFS_LI_DQUOT+ | 0x123d | xref:Quota_Update_Log_Item[Update Quota]
| +XFS_LI_QUOTAOFF+ | 0x123e | xref:Quota_Off_Log_Item[Quota Off]
| +XFS_LI_ICREATE+ | 0x123f | xref:Inode_Create_Log_Item[Inode Creation]
| +XFS_LI_RUI+ | 0x1240 | xref:RUI_Log_Item[Reverse Mapping Update Intent]
| +XFS_LI_RUD+ | 0x1241 | xref:RUD_Log_Item[Reverse Mapping Update Done]
| +XFS_LI_CUI+ | 0x1242 | xref:CUI_Log_Item[Reference Count Update Intent]
| +XFS_LI_CUD+ | 0x1243 | xref:CUD_Log_Item[Reference Count Update Done]
| +XFS_LI_BUI+ | 0x1244 | xref:BUI_Log_Item[File Block Mapping Update Intent]
| +XFS_LI_BUD+ | 0x1245 | xref:BUD_Log_Item[File Block Mapping Update Done]
|=====
Note that all log items (except for transaction headers) MUST start with
the following header structure. The type and size fields are baked into
each log item header, but there is not a separately defined header.
[source, c]
----
struct xfs_log_item {
__uint16_t magic;
__uint16_t size;
};
----
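A payload can be classified by reading its leading 16-bit type code and looking it up in the table of magic numbers above. The sketch below assumes the log was written by a host with the same byte order as the reader; +log_item_type+ and +log_item_name+ are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Read the leading host-endian 16-bit type code of a log item payload. */
uint16_t log_item_type(const unsigned char *payload)
{
	uint16_t type;

	memcpy(&type, payload, sizeof(type));
	return type;
}

/* Map a type code to its name, per the magic number table above. */
const char *log_item_name(uint16_t type)
{
	switch (type) {
	case 0x1236: return "XFS_LI_EFI";
	case 0x1237: return "XFS_LI_EFD";
	case 0x1238: return "XFS_LI_IUNLINK";
	case 0x123b: return "XFS_LI_INODE";
	case 0x123c: return "XFS_LI_BUF";
	case 0x123d: return "XFS_LI_DQUOT";
	case 0x123e: return "XFS_LI_QUOTAOFF";
	case 0x123f: return "XFS_LI_ICREATE";
	case 0x1240: return "XFS_LI_RUI";
	case 0x1241: return "XFS_LI_RUD";
	case 0x1242: return "XFS_LI_CUI";
	case 0x1243: return "XFS_LI_CUD";
	case 0x1244: return "XFS_LI_BUI";
	case 0x1245: return "XFS_LI_BUD";
	default:     return "unknown";
	}
}
```

Buffer data and inode core payloads carry no magic number, so a caller has to track how many data regions the preceding format item announced before applying this lookup.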
[[Log_Transaction_Headers]]
=== Transaction Headers
A transaction header is an operation payload that starts a transaction.
[source, c]
----
typedef struct xfs_trans_header {
uint th_magic;
uint th_type;
__int32_t th_tid;
uint th_num_items;
} xfs_trans_header_t;
----
*th_magic*::
The signature of a transaction header, ``TRAN'' (0x5452414e). Note that this
value is in host-endian order, not big-endian like the rest of XFS.
*th_type*::
Transaction type. This is one of the following values:
[options="header"]
|=====
| Type | Description
| +XFS_TRANS_SETATTR_NOT_SIZE+ | Set an inode attribute that isn't the inode's size.
| +XFS_TRANS_SETATTR_SIZE+ | Setting the size attribute of an inode.
| +XFS_TRANS_INACTIVE+ | Freeing blocks from an unlinked inode.
| +XFS_TRANS_CREATE+ | Create a file.
| +XFS_TRANS_CREATE_TRUNC+ | Unused?
| +XFS_TRANS_TRUNCATE_FILE+ | Truncate a quota file.
| +XFS_TRANS_REMOVE+ | Remove a file.
| +XFS_TRANS_LINK+ | Link an inode into a directory.
| +XFS_TRANS_RENAME+ | Rename a path.
| +XFS_TRANS_MKDIR+ | Create a directory.
| +XFS_TRANS_RMDIR+ | Remove a directory.
| +XFS_TRANS_SYMLINK+ | Create a symbolic link.
| +XFS_TRANS_SET_DMATTRS+ | Set the DMAPI attributes of an inode.
| +XFS_TRANS_GROWFS+ | Expand the filesystem.
| +XFS_TRANS_STRAT_WRITE+ | Convert an unwritten extent or delayed-allocate some blocks to handle a write.
| +XFS_TRANS_DIOSTRAT+ | Allocate some blocks to handle a direct I/O write.
| +XFS_TRANS_WRITEID+ | Update an inode's preallocation flag.
| +XFS_TRANS_ADDAFORK+ | Add an attribute fork to an inode.
| +XFS_TRANS_ATTRINVAL+ | Erase the attribute fork of an inode.
| +XFS_TRANS_ATRUNCATE+ | Unused?
| +XFS_TRANS_ATTR_SET+ | Set an extended attribute.
| +XFS_TRANS_ATTR_RM+ | Remove an extended attribute.
| +XFS_TRANS_ATTR_FLAG+ | Unused?
| +XFS_TRANS_CLEAR_AGI_BUCKET+ | Clear a bad inode pointer in the AGI unlinked inode hash bucket.
| +XFS_TRANS_SB_CHANGE+ | Write the superblock to disk.
| +XFS_TRANS_QM_QUOTAOFF+ | Start disabling quotas.
| +XFS_TRANS_QM_DQALLOC+ | Allocate a disk quota structure.
| +XFS_TRANS_QM_SETQLIM+ | Adjust quota limits.
| +XFS_TRANS_QM_DQCLUSTER+ | Unused?
| +XFS_TRANS_QM_QINOCREATE+ | Create a (quota) inode with reference taken.
| +XFS_TRANS_QM_QUOTAOFF_END+ | Finish disabling quotas.
| +XFS_TRANS_FSYNC_TS+ | Update only inode timestamps.
| +XFS_TRANS_GROWFSRT_ALLOC+ | Grow the realtime bitmap and summary data for growfs.
| +XFS_TRANS_GROWFSRT_ZERO+ | Zero space in the realtime bitmap and summary data.
| +XFS_TRANS_GROWFSRT_FREE+ | Free space in the realtime bitmap and summary data.
| +XFS_TRANS_SWAPEXT+ | Swap data fork of two inodes.
| +XFS_TRANS_CHECKPOINT+ | Checkpoint the log.
| +XFS_TRANS_ICREATE+ | Unknown?
| +XFS_TRANS_CREATE_TMPFILE+ | Create a temporary file.
|=====
*th_tid*::
Transaction ID.
*th_num_items*::
The number of operations appearing after this operation, not including the
commit operation. In effect, this tracks the number of metadata change
operations in this transaction.
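Because +th_magic+ is stored in host-endian order, its raw bytes reveal the byte order of the machine that wrote the log. The following sketch, with the hypothetical name +trans_header_byte_order+, compares the first four payload bytes against both possible encodings of 0x5452414e.

```c
#include <string.h>

/*
 * Guess the byte order of the host that wrote a transaction header from
 * the raw bytes of th_magic (0x5452414e, "TRAN").
 * Returns 'B' for big-endian, 'L' for little-endian, or 0 if the bytes
 * match neither encoding.
 */
int trans_header_byte_order(const unsigned char *payload)
{
	if (memcmp(payload, "TRAN", 4) == 0)
		return 'B';	/* most significant byte stored first */
	if (memcmp(payload, "NART", 4) == 0)
		return 'L';	/* least significant byte stored first */
	return 0;
}
```

Comparing raw bytes rather than a loaded integer keeps the check independent of the byte order of the machine doing the inspection.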
[[EFI_Log_Item]]
=== Intent to Free an Extent
The next two operation types work together to handle the freeing of filesystem
blocks. Naturally, the ranges of blocks to be freed can be expressed in terms
of extents:
[source, c]
----
typedef struct xfs_extent_32 {
__uint64_t ext_start;
__uint32_t ext_len;
} __attribute__((packed)) xfs_extent_32_t;
typedef struct xfs_extent_64 {
__uint64_t ext_start;
__uint32_t ext_len;
__uint32_t ext_pad;
} xfs_extent_64_t;
----
*ext_start*::
Start block of this extent.
*ext_len*::
Length of this extent.
The ``extent freeing intent'' operation comes first; it tells the log that XFS
wants to free some extents. This record is crucial for correct log recovery
because it prevents the log from replaying blocks that are subsequently freed.
If the log lacks a corresponding ``extent freeing done'' operation, the
recovery process will free the extents.
[source, c]
----
typedef struct xfs_efi_log_format {
__uint16_t efi_type;
__uint16_t efi_size;
__uint32_t efi_nextents;
__uint64_t efi_id;
xfs_extent_t efi_extents[1];
} xfs_efi_log_format_t;
----
*efi_type*::
The signature of an EFI operation, 0x1236. This value is in host-endian order,
not big-endian like the rest of XFS.
*efi_size*::
Size of this log item. Should be 1.
*efi_nextents*::
Number of extents to free.
*efi_id*::
A 64-bit number that binds the corresponding EFD log item to this EFI log item.
*efi_extents*::
Variable-length array of extents to be freed. The array length is given by
+efi_nextents+. The record type will be either +xfs_extent_64_t+ or
+xfs_extent_32_t+; this can be determined from the log item size (+oh_len+) and
the number of extents (+efi_nextents+).
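The size check described for +efi_extents+ can be sketched as below. The sizes are derived from the structure definitions above (a 16-byte fixed part, 12 bytes per packed +xfs_extent_32_t+, 16 bytes per +xfs_extent_64_t+); the function name +efi_extent_format+ is hypothetical.

```c
#include <stdint.h>

#define EFI_HEADER_SIZE	16	/* efi_type + efi_size + efi_nextents + efi_id */
#define EXTENT_32_SIZE	12	/* packed xfs_extent_32: 8 + 4 bytes */
#define EXTENT_64_SIZE	16	/* xfs_extent_64: 8 + 4 + 4 bytes */

/*
 * Decide which extent record type an EFI payload uses.
 * Returns 32 or 64, or 0 if oh_len matches neither layout (or if there
 * are no extents, in which case the two layouts cannot be told apart).
 */
int efi_extent_format(uint32_t oh_len, uint32_t efi_nextents)
{
	uint64_t n = efi_nextents;

	if (n == 0)
		return 0;
	if (oh_len == EFI_HEADER_SIZE + n * EXTENT_32_SIZE)
		return 32;
	if (oh_len == EFI_HEADER_SIZE + n * EXTENT_64_SIZE)
		return 64;
	return 0;
}
```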
[[EFD_Log_Item]]
=== Completion of Intent to Free an Extent
The ``extent freeing done'' operation complements the ``extent freeing intent''
operation. This second operation indicates that the block freeing actually
happened, so that log recovery needn't try to free the blocks. Typically, the
operations to update the free space B+trees follow immediately after the EFD.
[source, c]
----
typedef struct xfs_efd_log_format {
__uint16_t efd_type;
__uint16_t efd_size;
__uint32_t efd_nextents;
__uint64_t efd_efi_id;
xfs_extent_t efd_extents[1];
} xfs_efd_log_format_t;
----
*efd_type*::
The signature of an EFD operation, 0x1237. This value is in host-endian order,
not big-endian like the rest of XFS.
*efd_size*::
Size of this log item. Should be 1.
*efd_nextents*::
Number of extents to free.
*efd_efi_id*::
A 64-bit number that binds the corresponding EFI log item to this EFD log item.
*efd_extents*::
Variable-length array of extents to be freed. The array length is given by
+efd_nextents+. The record type will be either +xfs_extent_64_t+ or
+xfs_extent_32_t+; this can be determined from the log item size (+oh_len+) and
the number of extents (+efd_nextents+).
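The pairing of intent and done items can be pictured as a set of pending IDs: each EFI adds its +efi_id+, each EFD removes the matching +efd_efi_id+, and whatever remains after scanning the log is work that recovery still has to finish. The sketch below is a simplified model under that assumption, not the kernel's recovery code; the structure and function names are hypothetical, and the fixed table size is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 64	/* arbitrary bound for this sketch */

struct pending_intents {
	uint64_t ids[MAX_PENDING];
	size_t count;
};

/* Record an intent item. Returns 0, or -1 if the table is full. */
int intent_seen(struct pending_intents *p, uint64_t id)
{
	if (p->count == MAX_PENDING)
		return -1;
	p->ids[p->count++] = id;
	return 0;
}

/*
 * Record a done item. Returns 1 if it cancelled a pending intent,
 * or 0 if no intent with that ID was pending.
 */
int done_seen(struct pending_intents *p, uint64_t id)
{
	size_t i;

	for (i = 0; i < p->count; i++) {
		if (p->ids[i] == id) {
			/* Fill the hole with the last entry. */
			p->ids[i] = p->ids[--p->count];
			return 1;
		}
	}
	return 0;
}
```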
[[RUI_Log_Item]]
=== Reverse Mapping Updates Intent
The next two operation types work together to handle deferred reverse mapping
updates. Naturally, the mappings to be updated can be expressed in terms of
mapping extents:
[source, c]
----
struct xfs_map_extent {
__uint64_t me_owner;
__uint64_t me_startblock;
__uint64_t me_startoff;
__uint32_t me_len;
__uint32_t me_flags;
};
----
*me_owner*::
Owner of this reverse mapping. See the values in the section about
xref:Reverse_Mapping_Btree[reverse mapping] for more information.
*me_startblock*::
Filesystem block of this mapping.
*me_startoff*::
Logical block offset of this mapping.
*me_len*::
The length of this mapping.
*me_flags*::
The lower byte of this field is a type code indicating what sort of
reverse mapping operation we want. The upper three bytes are flag bits.
.Reverse mapping update log intent types
[options="header"]
|=====
| Value | Description
| +XFS_RMAP_EXTENT_MAP+ | Add a reverse mapping for file data.
| +XFS_RMAP_EXTENT_MAP_SHARED+ | Add a reverse mapping for file data for a file with shared blocks.
| +XFS_RMAP_EXTENT_UNMAP+ | Remove a reverse mapping for file data.
| +XFS_RMAP_EXTENT_UNMAP_SHARED+ | Remove a reverse mapping for file data for a file with shared blocks.
| +XFS_RMAP_EXTENT_CONVERT+ | Convert a reverse mapping for file data between unwritten and normal.
| +XFS_RMAP_EXTENT_CONVERT_SHARED+ | Convert a reverse mapping for file data between unwritten and normal for a file with shared blocks.
| +XFS_RMAP_EXTENT_ALLOC+ | Add a reverse mapping for non-file data.
| +XFS_RMAP_EXTENT_FREE+ | Remove a reverse mapping for non-file data.
|=====
.Reverse mapping update log intent flags
[options="header"]
|=====
| Value | Description
| +XFS_RMAP_EXTENT_ATTR_FORK+ | Extent is for the attribute fork.
| +XFS_RMAP_EXTENT_BMBT_BLOCK+ | Extent is for a block mapping btree block.
| +XFS_RMAP_EXTENT_UNWRITTEN+ | Extent is unwritten.
|=====
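Separating the type code from the flag bits in +me_flags+ is a matter of masking. A minimal sketch follows; the macro and function names are hypothetical, and the numeric values of the individual type codes and flags are deliberately not assumed here.

```c
#include <stdint.h>

#define EXTENT_TYPE_MASK 0xffu	/* lower byte: operation type code */

/* Extract the operation type code from me_flags. */
uint32_t map_extent_type(uint32_t me_flags)
{
	return me_flags & EXTENT_TYPE_MASK;
}

/* Extract the flag bits (upper three bytes) from me_flags. */
uint32_t map_extent_flag_bits(uint32_t me_flags)
{
	return me_flags & ~EXTENT_TYPE_MASK;
}
```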
The ``rmap update intent'' operation comes first; it tells the log that XFS
wants to update some reverse mappings. This record is crucial for correct log
recovery because it enables us to spread a complex metadata update across
multiple transactions while ensuring that a crash midway through the complex
update will be replayed fully during log recovery.
[source, c]
----
struct xfs_rui_log_format {
__uint16_t rui_type;
__uint16_t rui_size;
__uint32_t rui_nextents;
__uint64_t rui_id;
struct xfs_map_extent rui_extents[1];
};
----
*rui_type*::
The signature of an RUI operation, 0x1240. This value is in host-endian order,
not big-endian like the rest of XFS.
*rui_size*::
Size of this log item. Should be 1.
*rui_nextents*::
Number of reverse mappings.
*rui_id*::
A 64-bit number that binds the corresponding RUD log item to this RUI log item.
*rui_extents*::
Variable-length array of reverse mappings to update.
[[RUD_Log_Item]]
=== Completion of Reverse Mapping Updates
The ``reverse mapping update done'' operation complements the ``reverse mapping
update intent'' operation. This second operation indicates that the update
actually happened, so that log recovery needn't replay the update. The RUD and
the actual updates are typically found in a new transaction following the
transaction in which the RUI was logged.
[source, c]
----
struct xfs_rud_log_format {
__uint16_t rud_type;
__uint16_t rud_size;
__uint32_t __pad;
__uint64_t rud_rui_id;
};
----
*rud_type*::
The signature of an RUD operation, 0x1241. This value is in host-endian order,
not big-endian like the rest of XFS.
*rud_size*::
Size of this log item. Should be 1.
*rud_rui_id*::
A 64-bit number that binds the corresponding RUI log item to this RUD log item.
[[CUI_Log_Item]]
=== Reference Count Updates Intent
The next two operation types work together to handle reference count updates.
Naturally, the ranges of extents having reference count updates can be
expressed in terms of physical extents:
[source, c]
----
struct xfs_phys_extent {
__uint64_t pe_startblock;
__uint32_t pe_len;
__uint32_t pe_flags;
};
----
*pe_startblock*::
Filesystem block of this extent.
*pe_len*::
The length of this extent.
*pe_flags*::
The lower byte of this field is a type code indicating what sort of
reference count operation we want. The upper three bytes are flag bits.
.Reference count update log intent types
[options="header"]
|=====
| Value | Description
| +XFS_REFCOUNT_EXTENT_INCREASE+ | Increase the reference count for this extent.
| +XFS_REFCOUNT_EXTENT_DECREASE+ | Decrease the reference count for this extent.
| +XFS_REFCOUNT_EXTENT_ALLOC_COW+ | Reserve an extent for staging copy on write.
| +XFS_REFCOUNT_EXTENT_FREE_COW+ | Unreserve an extent for staging copy on write.
|=====
The ``reference count update intent'' operation comes first; it tells the log
that XFS wants to update some reference counts. This record is crucial for
correct log recovery because it enables us to spread a complex metadata update
across multiple transactions while ensuring that a crash midway through the
complex update will be replayed fully during log recovery.
[source, c]
----
struct xfs_cui_log_format {
__uint16_t cui_type;
__uint16_t cui_size;
__uint32_t cui_nextents;
__uint64_t cui_id;
struct xfs_phys_extent cui_extents[1];
};
----
*cui_type*::
The signature of a CUI operation, 0x1242. This value is in host-endian order,
not big-endian like the rest of XFS.
*cui_size*::
Size of this log item. Should be 1.
*cui_nextents*::
Number of reference count updates.
*cui_id*::
A 64-bit number that binds the corresponding CUD log item to this CUI log item.
*cui_extents*::
Variable-length array of reference count update information.
[[CUD_Log_Item]]
=== Completion of Reference Count Updates
The ``reference count update done'' operation complements the ``reference count
update intent'' operation. This second operation indicates that the update
actually happened, so that log recovery needn't replay the update. The CUD and
the actual updates are typically found in a new transaction following the
transaction in which the CUI was logged.
[source, c]
----
struct xfs_cud_log_format {
__uint16_t cud_type;
__uint16_t cud_size;
__uint32_t __pad;
__uint64_t cud_cui_id;
};
----
*cud_type*::
The signature of a CUD operation, 0x1243. This value is in host-endian order,
not big-endian like the rest of XFS.
*cud_size*::
Size of this log item. Should be 1.
*cud_cui_id*::
A 64-bit number that binds the corresponding CUI log item to this CUD log item.
[[BUI_Log_Item]]
=== File Block Mapping Intent
The next two operation types work together to handle deferred file block
mapping updates. The extents to be mapped are expressed via the
+xfs_map_extent+ structure discussed in the section about
xref:RUI_Log_Item[reverse mapping intents].
The lower byte of the +me_flags+ field is a type code indicating what sort of
file block mapping operation we want. The upper three bytes are flag bits.
.File block mapping update log intent types
[options="header"]
|=====
| Value | Description
| +XFS_BMAP_EXTENT_MAP+ | Add a mapping for file data.
| +XFS_BMAP_EXTENT_UNMAP+ | Remove a mapping for file data.
|=====
.File block mapping update log intent flags
[options="header"]
|=====
| Value | Description
| +XFS_BMAP_EXTENT_ATTR_FORK+ | Extent is for the attribute fork.
| +XFS_BMAP_EXTENT_UNWRITTEN+ | Extent is unwritten.
|=====
The ``file block mapping update intent'' operation comes first; it tells the
log that XFS wants to map or unmap some extents in a file. This record is
crucial for correct log recovery because it enables us to spread a complex
metadata update across multiple transactions while ensuring that a crash midway
through the complex update will be replayed fully during log recovery.
[source, c]
----
struct xfs_bui_log_format {
__uint16_t bui_type;
__uint16_t bui_size;
__uint32_t bui_nextents;
__uint64_t bui_id;
struct xfs_map_extent bui_extents[1];
};
----
*bui_type*::
The signature of a BUI operation, 0x1244. This value is in host-endian order,
not big-endian like the rest of XFS.
*bui_size*::
Size of this log item. Should be 1.
*bui_nextents*::
Number of file mappings. Should be 1.
*bui_id*::
A 64-bit number that binds the corresponding BUD log item to this BUI log item.
*bui_extents*::
Variable-length array of file block mappings to update. There should only
be one mapping present.
[[BUD_Log_Item]]
=== Completion of File Block Mapping Updates
The ``file block mapping update done'' operation complements the ``file block
mapping update intent'' operation. This second operation indicates that the
update actually happened, so that log recovery needn't replay the update. The
BUD and the actual updates are typically found in a new transaction following
the transaction in which the BUI was logged.
[source, c]
----
struct xfs_bud_log_format {
__uint16_t bud_type;
__uint16_t bud_size;
__uint32_t __pad;
__uint64_t bud_bui_id;
};
----
*bud_type*::
The signature of a BUD operation, 0x1245. This value is in host-endian order,
not big-endian like the rest of XFS.
*bud_size*::
Size of this log item. Should be 1.
*bud_bui_id*::
A 64-bit number that binds the corresponding BUI log item to this BUD log item.
[[Inode_Log_Item]]
=== Inode Updates
This operation records changes to an inode record. There are several types of
inode updates, each corresponding to different parts of the inode record.
Allowing updates to proceed at a sub-inode granularity reduces contention for
the inode, since different parts of the inode can be updated simultaneously.
The actual buffer data are stored in subsequent log items.
The inode log format header is as follows:
[source, c]
----
typedef struct xfs_inode_log_format_64 {
__uint16_t ilf_type;
__uint16_t ilf_size;
__uint32_t ilf_fields;
__uint16_t ilf_asize;
__uint16_t ilf_dsize;
__uint32_t ilf_pad;
__uint64_t ilf_ino;
union {
__uint32_t ilfu_rdev;
uuid_t ilfu_uuid;
} ilf_u;
__int64_t ilf_blkno;
__int32_t ilf_len;
__int32_t ilf_boffset;
} xfs_inode_log_format_64_t;
----
*ilf_type*::
The signature of an inode update operation, 0x123b. This value is in
host-endian order, not big-endian like the rest of XFS.
*ilf_size*::
Number of operations involved in this update, including this format operation.
*ilf_fields*::
Specifies which parts of the inode are being updated. This can be certain
combinations of the following:
[options="header"]
|=====
| Flag | Inode changes to log include:
| +XFS_ILOG_CORE+ | The standard inode fields.
| +XFS_ILOG_DDATA+ | Data fork's local data.
| +XFS_ILOG_DEXT+ | Data fork's extent list.
| +XFS_ILOG_DBROOT+ | Data fork's B+tree root.
| +XFS_ILOG_DEV+ | Data fork's device number.
| +XFS_ILOG_UUID+ | Data fork's UUID contents.
| +XFS_ILOG_ADATA+ | Attribute fork's local data.
| +XFS_ILOG_AEXT+ | Attribute fork's extent list.
| +XFS_ILOG_ABROOT+ | Attribute fork's B+tree root.
| +XFS_ILOG_DOWNER+ | Change the data fork owner on replay.
| +XFS_ILOG_AOWNER+ | Change the attr fork owner on replay.
| +XFS_ILOG_TIMESTAMP+ | Timestamps are dirty, but not necessarily anything else. Should never appear on disk.
| +XFS_ILOG_NONCORE+ | ( +XFS_ILOG_DDATA+ \| +XFS_ILOG_DEXT+ \| +XFS_ILOG_DBROOT+ \| +XFS_ILOG_DEV+ \| +XFS_ILOG_UUID+ \| +XFS_ILOG_ADATA+ \| +XFS_ILOG_AEXT+ \| +XFS_ILOG_ABROOT+ \| +XFS_ILOG_DOWNER+ \| +XFS_ILOG_AOWNER+ )
| +XFS_ILOG_DFORK+ | ( +XFS_ILOG_DDATA+ \| +XFS_ILOG_DEXT+ \| +XFS_ILOG_DBROOT+ )
| +XFS_ILOG_AFORK+ | ( +XFS_ILOG_ADATA+ \| +XFS_ILOG_AEXT+ \| +XFS_ILOG_ABROOT+ )
| +XFS_ILOG_ALL+ | ( +XFS_ILOG_CORE+ \| +XFS_ILOG_DDATA+ \| +XFS_ILOG_DEXT+ \| +XFS_ILOG_DBROOT+ \| +XFS_ILOG_DEV+ \| +XFS_ILOG_UUID+ \| +XFS_ILOG_ADATA+ \| +XFS_ILOG_AEXT+ \| +XFS_ILOG_ABROOT+ \| +XFS_ILOG_TIMESTAMP+ \| +XFS_ILOG_DOWNER+ \| +XFS_ILOG_AOWNER+ )
|=====
*ilf_asize*::
Size of the attribute fork, in bytes.
*ilf_dsize*::
Size of the data fork, in bytes.
*ilf_ino*::
Absolute inode number.
*ilfu_rdev*::
Device number information, for a device file update.
*ilfu_uuid*::
UUID, for a UUID update?
*ilf_blkno*::
Block number of the inode buffer, in sectors.
*ilf_len*::
Length of inode buffer, in sectors.
*ilf_boffset*::
Byte offset of the inode in the buffer.
Be aware that there is a nearly identical +xfs_inode_log_format_32+ which may
appear on disk. It is the same as +xfs_inode_log_format_64+, except that it is
missing the +ilf_pad+ field and is 52 bytes long as opposed to 56 bytes.
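Given the two sizes stated above, a reader can tell the variants apart from the +oh_len+ of the operation carrying the inode log format. A minimal sketch, with hypothetical macro and function names:

```c
#include <stdint.h>

#define INODE_LOG_FORMAT_32_SIZE 52	/* variant without ilf_pad */
#define INODE_LOG_FORMAT_64_SIZE 56	/* variant with ilf_pad */

/*
 * Returns 32 or 64 to name the matching structure, or 0 if oh_len is
 * neither of the two known sizes.
 */
int inode_log_format_width(uint32_t oh_len)
{
	if (oh_len == INODE_LOG_FORMAT_32_SIZE)
		return 32;
	if (oh_len == INODE_LOG_FORMAT_64_SIZE)
		return 64;
	return 0;
}
```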
[[Inode_Data_Log_Item]]
=== Inode Data Log Item
This region contains the new contents of a part of an inode, as described in
the xref:Inode_Log_Item[previous section]. There are no magic numbers.
If +XFS_ILOG_CORE+ is set in +ilf_fields+, the corresponding data buffer must
be in the format +struct xfs_icdinode+, which has the same format as the first
96 bytes of an xref:On-disk_Inode[inode], but is recorded in host byte order.
[[Buffer_Log_Item]]
=== Buffer Log Item
This operation writes parts of a buffer to disk. The regions to write are
tracked in the data map; the actual buffer data are stored in subsequent log
items.
[source, c]
----
typedef struct xfs_buf_log_format {
unsigned short blf_type;
unsigned short blf_size;
ushort blf_flags;
ushort blf_len;
__int64_t blf_blkno;
unsigned int blf_map_size;
unsigned int blf_data_map[XFS_BLF_DATAMAP_SIZE];
} xfs_buf_log_format_t;
----
*blf_type*::
Magic number to specify a buffer log item, 0x123c.
*blf_size*::
Number of log items in this buffer update, including this format item. The
buffer data items follow this item.
*blf_flags*::
Specifies flags associated with the buffer item. This can be any of the
following:
[options="header"]
|=====
| Flag | Description
| +XFS_BLF_INODE_BUF+ | Inode buffer. These must be recovered before replaying items that change this buffer.
| +XFS_BLF_CANCEL+ | Don't recover this buffer, blocks are being freed.
| +XFS_BLF_UDQUOT_BUF+ | User quota buffer, don't recover if there's a subsequent quotaoff.
| +XFS_BLF_PDQUOT_BUF+ | Project quota buffer, don't recover if there's a subsequent quotaoff.
| +XFS_BLF_GDQUOT_BUF+ | Group quota buffer, don't recover if there's a subsequent quotaoff.
|=====
*blf_len*::
Number of sectors affected by this buffer.
*blf_blkno*::
Block number to write, in sectors.
*blf_map_size*::
The size of +blf_data_map+, in 32-bit words.
*blf_data_map*::
This variable-sized array acts as a dirty bitmap for the logged buffer. Each
1 bit represents a dirty region in the buffer, and each run of 1 bits
corresponds to a subsequent log item containing the new contents of the buffer
area. Each bit represents +XFS_BLF_CHUNK+ (i.e. 128) bytes.
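The relationship between runs of 1 bits and dirty regions can be sketched as follows. This illustration assumes that bit numbers count upward from the least significant bit of the first map word, and that every run maps to exactly one region; both are assumptions to verify against the XFS sources. The names +dirty_region+, +map_bit_set+, and +dirty_regions+ are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define XFS_BLF_CHUNK 128	/* bytes of buffer covered by each bitmap bit */

struct dirty_region {
	size_t offset;	/* byte offset within the buffer */
	size_t length;	/* byte length of the region */
};

/* Test one bit, counting from the least significant bit of map[0]. */
static int map_bit_set(const uint32_t *map, size_t bit)
{
	return (map[bit / 32] >> (bit % 32)) & 1;
}

/*
 * Collect each run of 1 bits as one dirty region. Stores at most max
 * regions in out and returns the total number of runs found.
 */
size_t dirty_regions(const uint32_t *map, size_t map_size,
		     struct dirty_region *out, size_t max)
{
	size_t nbits = map_size * 32;
	size_t bit = 0, nregions = 0;

	while (bit < nbits) {
		size_t start;

		if (!map_bit_set(map, bit)) {
			bit++;
			continue;
		}
		start = bit;
		while (bit < nbits && map_bit_set(map, bit))
			bit++;
		if (nregions < max) {
			out[nregions].offset = start * XFS_BLF_CHUNK;
			out[nregions].length = (bit - start) * XFS_BLF_CHUNK;
		}
		nregions++;
	}
	return nregions;
}
```

For example, a single map word with bit 0 and bits 5 through 9 set describes a 128-byte region at offset 0 and a 640-byte region at offset 640.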
[[Buffer_Data_Log_Item]]
=== Buffer Data Log Item
This region contains the new contents of a part of a buffer, as described in
the xref:Buffer_Log_Item[previous section]. There are no magic numbers.
[[Quota_Update_Log_Item]]
=== Update Quota File
This updates a block in a quota file. The buffer data must be in the next log
item.
[source, c]
----
typedef struct xfs_dq_logformat {
__uint16_t qlf_type;
__uint16_t qlf_size;
xfs_dqid_t qlf_id;
__int64_t qlf_blkno;
__int32_t qlf_len;
__uint32_t qlf_boffset;
} xfs_dq_logformat_t;
----
*qlf_type*::
The signature of a quota update operation, 0x123d. This value is in
host-endian order, not big-endian like the rest of XFS.
*qlf_size*::
Size of this log item. Should be 2.
*qlf_id*::
The user/group/project ID to alter.
*qlf_blkno*::
Block number of the quota buffer, in sectors.
*qlf_len*::
Length of the quota buffer, in sectors.
*qlf_boffset*::
Buffer offset of the quota data to update, in bytes.
[[Quota_Update_Data_Log_Item]]
=== Quota Update Data Log Item
This region contains the new contents of a part of a buffer, as described in
the xref:Quota_Update_Log_Item[previous section]. There are no magic numbers.
[[Quota_Off_Log_Item]]
=== Disable Quota Log Item
A request to disable quota controls has the following format:
[source, c]
----
typedef struct xfs_qoff_logformat {
unsigned short qf_type;
unsigned short qf_size;
unsigned int qf_flags;
char qf_pad[12];
} xfs_qoff_logformat_t;
----
*qf_type*::
The signature of a quota off operation, 0x123e. This value is in
host-endian order, not big-endian like the rest of XFS.
*qf_size*::
Size of this log item. Should be 1.
*qf_flags*::
Specifies which quotas are being turned off. Can be a combination of the
following:
[options="header"]
|=====
| Flag | Quota type to disable
| +XFS_UQUOTA_ACCT+ | User quotas.
| +XFS_PQUOTA_ACCT+ | Project quotas.
| +XFS_GQUOTA_ACCT+ | Group quotas.
|=====
[[Inode_Create_Log_Item]]
=== Inode Creation Log Item
This log item is created when inodes are allocated in-core. When replaying
this item, the specified inode records will be zeroed and some of the inode
fields populated with default values.
[source, c]
----
struct xfs_icreate_log {
__uint16_t icl_type;
__uint16_t icl_size;
__be32 icl_ag;
__be32 icl_agbno;
__be32 icl_count;
__be32 icl_isize;
__be32 icl_length;
__be32 icl_gen;
};
----
*icl_type*::
The signature of an inode create operation, 0x123f. This value is in
host-endian order, not big-endian like the rest of XFS.
*icl_size*::
Size of this log item. Should be 1.
*icl_ag*::
AG number of the inode chunk to create.
*icl_agbno*::
AG block number of the inode chunk.
*icl_count*::
Number of inodes to initialize.
*icl_isize*::
Size of each inode, in bytes.
*icl_length*::
Length of the extent being initialized, in blocks.
*icl_gen*::
Inode generation number to write into the new inodes.
== xfs_logprint Example
Here's an example of dumping the XFS log contents with +xfs_logprint+:
----
# xfs_logprint /dev/sda
xfs_logprint: /dev/sda contains a mounted and writable filesystem
xfs_logprint:
data device: 0xfc03
log device: 0xfc03 daddr: 900931640 length: 879816
cycle: 48 version: 2 lsn: 48,0 tail_lsn: 47,879760
length of Log Record: 19968 prev offset: 879808 num ops: 53
uuid: 24afeec2-f418-46a2-a573-10091f5e200e format: little endian linux
h_size: 32768
----
This is the log record header.
----
Oper (0): tid: 30483aec len: 0 clientid: TRANS flags: START
----
This operation indicates that we're starting a transaction, so the next
operation should record the transaction header.
----
Oper (1): tid: 30483aec len: 16 clientid: TRANS flags: none
TRAN: type: CHECKPOINT tid: 30483aec num_items: 50
----
This operation records a transaction header. There should be fifty operations
in this transaction and the transaction ID is 0x30483aec.
----
Oper (2): tid: 30483aec len: 24 clientid: TRANS flags: none
BUF: #regs: 2 start blkno: 145400496 (0x8aaa2b0) len: 8 bmap size: 1 flags: 0x2000
Oper (3): tid: 30483aec len: 3712 clientid: TRANS flags: none
BUF DATA
...
Oper (4): tid: 30483aec len: 24 clientid: TRANS flags: none
BUF: #regs: 3 start blkno: 59116912 (0x3860d70) len: 8 bmap size: 1 flags: 0x2000
Oper (5): tid: 30483aec len: 128 clientid: TRANS flags: none
BUF DATA
0 43544241 49010000 fa347000 2c357000 3a40b200 13000000 2343c200 13000000
8 3296d700 13000000 375deb00 13000000 8a551501 13000000 56be1601 13000000
10 af081901 13000000 ec741c01 13000000 9e911c01 13000000 69073501 13000000
18 4e539501 13000000 6549501 13000000 5d0e7f00 14000000 c6908200 14000000
Oper (6): tid: 30483aec len: 640 clientid: TRANS flags: none
BUF DATA
0 7f47c800 21000000 23c0e400 21000000 2d0dfe00 21000000 e7060c01 21000000
8 34b91801 21000000 9cca9100 22000000 26e69800 22000000 4c969900 22000000
...
90 1cf69900 27000000 42f79c00 27000000 6a99e00 27000000 6a99e00 27000000
98 6a99e00 27000000 6a99e00 27000000 6a99e00 27000000 6a99e00 27000000
----
Operations 4-6 describe two updates to a single dirty buffer at disk address
59,116,912. The first chunk of dirty data is 128 bytes long. Notice that the
first four bytes of the first chunk are printed as 0x43544241. Remembering
that log items are in host byte order, reverse that to 0x41425443, which is the
magic number for the free space B+tree ordered by size.
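
The byte reversal is easy to check by hand, but a short sketch makes it
concrete. The helper names +swap32+ and +magic_to_ascii+ are hypothetical and
exist only for this illustration.

[source, c]
----
#include <stdint.h>

/* Reverse the byte order of a 32-bit word. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/*
 * Spell out a magic number as ASCII, most significant byte first.
 * The caller supplies a buffer of at least five bytes.
 */
static void magic_to_ascii(uint32_t magic, char out[5])
{
    out[0] = (char)((magic >> 24) & 0xff);
    out[1] = (char)((magic >> 16) & 0xff);
    out[2] = (char)((magic >> 8) & 0xff);
    out[3] = (char)(magic & 0xff);
    out[4] = '\0';
}
----

Passing 0x43544241 to +swap32()+ yields 0x41425443, and +magic_to_ascii()+
turns that value into the string ``ABTC''.
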
The second chunk is 640 bytes. There are more buffer changes, so we'll skip
ahead a few operations:
----
Oper (19): tid: 30483aec len: 56 clientid: TRANS flags: none
INODE: #regs: 2 ino: 0x63a73b4e flags: 0x1 dsize: 40
blkno: 1412688704 len: 16 boff: 7168
Oper (20): tid: 30483aec len: 96 clientid: TRANS flags: none
INODE CORE
magic 0x494e mode 0100600 version 2 format 3
nlink 1 uid 1000 gid 1000
atime 0x5633d58d mtime 0x563a391b ctime 0x563a391b
size 0x109dc8 nblocks 0x111 extsize 0x0 nextents 0x1b
naextents 0x0 forkoff 0 dmevmask 0x0 dmstate 0x0
flags 0x0 gen 0x389071be
----
This is an update to the core of inode 0x63a73b4e. There were similar inode
core updates after this, so we'll skip ahead a bit:
----
Oper (32): tid: 30483aec len: 56 clientid: TRANS flags: none
INODE: #regs: 3 ino: 0x4bde428 flags: 0x5 dsize: 16
blkno: 79553568 len: 16 boff: 4096
Oper (33): tid: 30483aec len: 96 clientid: TRANS flags: none
INODE CORE
magic 0x494e mode 0100644 version 2 format 2
nlink 1 uid 1000 gid 1000
atime 0x563a3924 mtime 0x563a3931 ctime 0x563a3931
size 0x1210 nblocks 0x2 extsize 0x0 nextents 0x1
naextents 0x0 forkoff 0 dmevmask 0x0 dmstate 0x0
flags 0x0 gen 0x2829c6f9
Oper (34): tid: 30483aec len: 16 clientid: TRANS flags: none
EXTENTS inode data
----
This inode update changes both the core and the data fork. Since we're
changing the block map, it's unsurprising that one of the subsequent operations
is an EFI:
----
Oper (37): tid: 30483aec len: 32 clientid: TRANS flags: none
EFI: #regs: 1 num_extents: 1 id: 0xffff8801147b5c20
(s: 0x720daf, l: 1)
\----------------------------------------------------------------------------
Oper (38): tid: 30483aec len: 32 clientid: TRANS flags: none
EFD: #regs: 1 num_extents: 1 id: 0xffff8801147b5c20
\----------------------------------------------------------------------------
Oper (39): tid: 30483aec len: 24 clientid: TRANS flags: none
BUF: #regs: 2 start blkno: 8 (0x8) len: 8 bmap size: 1 flags: 0x2800
Oper (40): tid: 30483aec len: 128 clientid: TRANS flags: none
AGF Buffer: XAGF
ver: 1 seq#: 0 len: 56308224
root BNO: 18174905 CNT: 18175030
level BNO: 2 CNT: 2
1st: 41 last: 46 cnt: 6 freeblks: 35790503 longest: 19343245
\----------------------------------------------------------------------------
Oper (41): tid: 30483aec len: 24 clientid: TRANS flags: none
BUF: #regs: 3 start blkno: 145398760 (0x8aa9be8) len: 8 bmap size: 1 flags: 0x2000
Oper (42): tid: 30483aec len: 128 clientid: TRANS flags: none
BUF DATA
Oper (43): tid: 30483aec len: 128 clientid: TRANS flags: none
BUF DATA
\----------------------------------------------------------------------------
Oper (44): tid: 30483aec len: 24 clientid: TRANS flags: none
BUF: #regs: 3 start blkno: 145400224 (0x8aaa1a0) len: 8 bmap size: 1 flags: 0x2000
Oper (45): tid: 30483aec len: 128 clientid: TRANS flags: none
BUF DATA
Oper (46): tid: 30483aec len: 3584 clientid: TRANS flags: none
BUF DATA
\----------------------------------------------------------------------------
Oper (47): tid: 30483aec len: 24 clientid: TRANS flags: none
BUF: #regs: 3 start blkno: 59066216 (0x3854768) len: 8 bmap size: 1 flags: 0x2000
Oper (48): tid: 30483aec len: 128 clientid: TRANS flags: none
BUF DATA
Oper (49): tid: 30483aec len: 768 clientid: TRANS flags: none
BUF DATA
----
Here we see an EFI, followed by an EFD with the same id, followed by updates
to the AGF and the free space B+trees. Most probably, we just unmapped a block
from a file: the EFI covers a single extent that is one block long.
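
The pairing of intent and done items can be sketched as a simple id match.
This is a minimal illustration, assuming (as the output above suggests) that
an EFD refers to its EFI through the shared +id+ value. The function name
+count_unfinished_efis+ is hypothetical, and real log recovery does
considerably more work than this.

[source, c]
----
#include <stddef.h>
#include <stdint.h>

/* Count the EFI ids for which no EFD with the same id was logged. */
static size_t count_unfinished_efis(const uint64_t *efi_ids, size_t nr_efi,
                                    const uint64_t *efd_ids, size_t nr_efd)
{
    size_t unfinished = 0;

    for (size_t i = 0; i < nr_efi; i++) {
        int done = 0;

        for (size_t j = 0; j < nr_efd; j++) {
            if (efd_ids[j] == efi_ids[i]) {
                done = 1;
                break;
            }
        }
        if (!done)
            unfinished++;
    }
    return unfinished;
}
----

In the dump above, the only EFI (id 0xffff8801147b5c20) has a matching EFD, so
the count is zero. Had the EFD been missing, the count would be one and the
extent free would still be pending.
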
----
Oper (50): tid: 30483aec len: 56 clientid: TRANS flags: none
INODE: #regs: 2 ino: 0x3906f20 flags: 0x1 dsize: 16
blkno: 59797280 len: 16 boff: 0
Oper (51): tid: 30483aec len: 96 clientid: TRANS flags: none
INODE CORE
magic 0x494e mode 0100644 version 2 format 2
nlink 1 uid 1000 gid 1000
atime 0x563a3938 mtime 0x563a3938 ctime 0x563a3938
size 0x0 nblocks 0x0 extsize 0x0 nextents 0x0
naextents 0x0 forkoff 0 dmevmask 0x0 dmstate 0x0
flags 0x0 gen 0x35ed661
\----------------------------------------------------------------------------
Oper (52): tid: 30483aec len: 0 clientid: TRANS flags: COMMIT
----
One more inode core update and this transaction commits.
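
A transaction is only recoverable if its commit record made it to disk. As a
hedged sketch, the check below scans a list of operations and reports whether
a given transaction ID has both a START and a COMMIT operation. The
+struct op_summary+ type and the function name are hypothetical, and the flag
values are assumed to match the operation header flags that +xfs_logprint+
decodes as START and COMMIT.

[source, c]
----
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OP_FLAG_START   0x01    /* assumed value of the START flag */
#define OP_FLAG_COMMIT  0x02    /* assumed value of the COMMIT flag */

/* Hypothetical in-memory summary of one log operation header. */
struct op_summary {
    uint32_t tid;       /* transaction ID */
    uint8_t  flags;     /* operation flags */
};

/* Returns true if tid has a START followed, eventually, by a COMMIT. */
static bool transaction_is_complete(const struct op_summary *ops, size_t nr,
                                    uint32_t tid)
{
    bool started = false;

    for (size_t i = 0; i < nr; i++) {
        if (ops[i].tid != tid)
            continue;
        if (ops[i].flags & OP_FLAG_START)
            started = true;
        if (started && (ops[i].flags & OP_FLAG_COMMIT))
            return true;
    }
    return false;
}
----

For transaction 0x30483aec in the dump above, operation 0 carries the START
flag and operation 52 carries the COMMIT flag, so the check would succeed. If
the log had been cut short before operation 52, it would fail.
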