blob: 6f905a1ba7143cc76a9d433357dd8c501d6b3888 [file] [log] [blame]
[[Extended_Attributes]]
= Extended Attributes
Extended attributes enable users and administrators to attach (name: value)
pairs to inodes within the XFS filesystem. They could be used to store
meta-information about the file.
Attribute names can be up to 256 bytes in length, terminated by the first 0
byte. The intent is that they be printable ASCII (or other character set) names
for the attribute. The values can contain up to 64KB of arbitrary binary data.
Some XFS internal attributes (eg. parent pointers) use non-printable names for
the attribute.
Access Control Lists (ACLs) and Data Migration Facility (DMF) use extended
attributes to store their associated metadata with an inode.
XFS uses two disjoint attribute name spaces associated with every inode. These
are the root and user address spaces. The root address space is accessible only
to the superuser, and then only by specifying a flag argument to the function
call. Other users will not see or be able to modify attributes in the root
address space. The user address space is protected by the normal file
permissions mechanism, so the owner of the file can decide who is able to see
and/or modify the value of attributes on any particular file.
To view extended attributes from the command line, use the +getfattr+ command.
To set or delete extended attributes, use the +setfattr+ command. ACLs control
should use the +getfacl+ and +setfacl+ commands.
XFS attributes supports three namespaces: ``user'', ``trusted'' (or ``root'' using
IRIX terminology), and ``secure''.
See the section about xref:Extended_Attribute_Versions[extended attributes] in
the inode for instructions on how to calculate the location of the attributes.
The following four sections describe each of the on-disk formats.
[[Shortform_Attributes]]
== Short Form Attributes
When the all extended attributes can fit within the inode's attribute fork, the
inode's +di_aformat+ is set to ``local'' and the attributes are stored in the
inode's literal area starting at offset +di_forkoff × 8+.
Shortform attributes use the following structures:
[source, c]
----
typedef struct xfs_attr_shortform {
struct xfs_attr_sf_hdr {
__be16 totsize;
__u8 count;
} hdr;
struct xfs_attr_sf_entry {
__uint8_t namelen;
__uint8_t valuelen;
__uint8_t flags;
__uint8_t nameval[1];
} list[1];
} xfs_attr_shortform_t;
typedef struct xfs_attr_sf_hdr xfs_attr_sf_hdr_t;
typedef struct xfs_attr_sf_entry xfs_attr_sf_entry_t;
----
*totsize*::
Total size of the attribute structure in bytes.
*count*::
The number of entries that can be found in this structure.
*namelen* and *valuelen*::
These values specify the size of the two byte arrays containing the name and
value pairs. +valuelen+ is zero for extended attributes with no value.
*nameval[]*::
A single array whose size is the sum of +namelen+ and +valuelen+. The names and
values are not null terminated on-disk. The value immediately follows the name
in the array.
[[Attribute_Flags]]
*flags*::
A combination of the following:
.Attribute Namespaces
[options="header"]
|=====
| Flag | Description
| 0 | The attribute's namespace is ``user''.
| +XFS_ATTR_ROOT+ | The attribute's namespace is ``trusted''.
| +XFS_ATTR_SECURE+ | The attribute's namespace is ``secure''.
| +XFS_ATTR_INCOMPLETE+ | This attribute is being modified.
| +XFS_ATTR_LOCAL+ | The attribute value is contained within this block.
| +XFS_ATTR_PARENT+ | This attribute is a parent pointer.
|=====
.Short form attribute layout
image::images/64.png[]
=== xfs_db Short Form Attribute Example
A file is created and two attributes are set:
----
# setfattr -n user.empty few_attr
# setfattr -n trusted.trust -v val1 few_attr
----
Using xfs_db, we dump the inode:
----
xfs_db> inode <inode#>
xfs_db> p
core.magic = 0x494e
core.mode = 0100644
...
core.naextents = 0
core.forkoff = 15
core.aformat = 1 (local)
...
a.sfattr.hdr.totsize = 24
a.sfattr.hdr.count = 2
a.sfattr.list[0].namelen = 5
a.sfattr.list[0].valuelen = 0
a.sfattr.list[0].root = 0
a.sfattr.list[0].secure = 0
a.sfattr.list[0].name = "empty"
a.sfattr.list[1].namelen = 5
a.sfattr.list[1].valuelen = 4
a.sfattr.list[1].root = 1
a.sfattr.list[1].secure = 0
a.sfattr.list[1].name = "trust"
a.sfattr.list[1].value = "val1"
----
We can determine the actual inode offset to be 220 (15 x 8 + 100) or +0xdc+.
Examining the raw dump, the second attribute is highlighted:
[subs="quotes"]
----
xfs_db> type text
xfs_db> p
09: 49 4e 81 a4 01 02 00 01 00 00 00 00 00 00 00 00 IN..............
10: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 02 ................
20: 44 be 19 be 38 d1 26 98 44 be 1a be 38 d1 26 98 D...8...D...8...
30: 44 be 1a e1 3a 9a ea 18 00 00 00 00 00 00 00 04 D...............
40: 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 01 ................
50: 00 00 0f 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
60: ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 12 ................
70: 53 a0 00 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
d0: 00 00 00 00 00 00 00 00 00 00 00 00 *00 18* 02 00 ................ &lt;-- hdr.totsize = 0x18
e0: 05 00 00 65 6d 70 74 79 *05 04 02 74 72 75 73 74* ...empty...trust
f0: *76 61 6c 31* 00 00 00 00 00 00 00 00 00 00 00 00 val1............
----
Adding another attribute with attr1, the format is converted to extents and
+di_forkoff+ remains unchanged (and all those zeros in the dump above remain
unused):
----
xfs_db> inode <inode#>
xfs_db> p
...
core.naextents = 1
core.forkoff = 15
core.aformat = 2 (extents)
...
a.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,37534,1,0]
----
Performing the same steps with attr2, adding one attribute at a time, you can
see +di_forkoff+ change as attributes are added:
----
xfs_db> inode <inode#>
xfs_db> p
...
core.naextents = 0
core.forkoff = 15
core.aformat = 1 (local)
...
a.sfattr.hdr.totsize = 17
a.sfattr.hdr.count = 1
a.sfattr.list[0].namelen = 10
a.sfattr.list[0].valuelen = 0
a.sfattr.list[0].root = 0
a.sfattr.list[0].secure = 0
a.sfattr.list[0].name = "empty_attr"
----
Attribute added:
----
xfs_db> p
...
core.naextents = 0
core.forkoff = 15
core.aformat = 1 (local)
...
a.sfattr.hdr.totsize = 31
a.sfattr.hdr.count = 2
a.sfattr.list[0].namelen = 10
a.sfattr.list[0].valuelen = 0
a.sfattr.list[0].root = 0
a.sfattr.list[0].secure = 0
a.sfattr.list[0].name = "empty_attr"
a.sfattr.list[1].namelen = 7
a.sfattr.list[1].valuelen = 4
a.sfattr.list[1].root = 1
a.sfattr.list[1].secure = 0
a.sfattr.list[1].name = "trust_a"
a.sfattr.list[1].value = "val1"
----
Another attribute is added:
[subs="quotes"]
----
xfs_db> p
...
core.naextents = 0
*core.forkoff = 13*
core.aformat = 1 (local)
...
a.sfattr.hdr.totsize = 52
a.sfattr.hdr.count = 3
a.sfattr.list[0].namelen = 10
a.sfattr.list[0].valuelen = 0
a.sfattr.list[0].root = 0
a.sfattr.list[0].secure = 0
a.sfattr.list[0].name = "empty_attr"
a.sfattr.list[1].namelen = 7
a.sfattr.list[1].valuelen = 4
a.sfattr.list[1].root = 1
a.sfattr.list[1].secure = 0
a.sfattr.list[1].name = "trust_a"
a.sfattr.list[1].value = "val1"
a.sfattr.list[2].namelen = 6
a.sfattr.list[2].valuelen = 12
a.sfattr.list[2].root = 0
a.sfattr.list[2].secure = 0
a.sfattr.list[2].name = "second"
a.sfattr.list[2].value = "second_value"
----
One more is added:
----
xfs_db> p
core.naextents = 0
core.forkoff = 10
core.aformat = 1 (local)
...
a.sfattr.hdr.totsize = 69
a.sfattr.hdr.count = 4
a.sfattr.list[0].namelen = 10
a.sfattr.list[0].valuelen = 0
a.sfattr.list[0].root = 0
a.sfattr.list[0].secure = 0
a.sfattr.list[0].name = "empty_attr"
a.sfattr.list[1].namelen = 7
a.sfattr.list[1].valuelen = 4
a.sfattr.list[1].root = 1
a.sfattr.list[1].secure = 0
a.sfattr.list[1].name = "trust_a"
a.sfattr.list[1].value = "val1"
a.sfattr.list[2].namelen = 6
a.sfattr.list[2].valuelen = 12
a.sfattr.list[2].root = 0
a.sfattr.list[2].secure = 0
a.sfattr.list[2].name = "second"
a.sfattr.list[2].value = "second_value"
a.sfattr.list[3].namelen = 6
a.sfattr.list[3].valuelen = 8
a.sfattr.list[3].root = 0
a.sfattr.list[3].secure = 1
a.sfattr.list[3].name = "policy"
a.sfattr.list[3].value = "contents"
----
A raw dump is shown to compare with the attr1 dump on a prior page, the header
is highlighted:
[subs="quotes"]
----
xfs_db> type text
xfs_db> p
00: 49 4e 81 a4 01 02 00 01 00 00 00 00 00 00 00 00 IN..............
10: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 05 ................
20: 44 be 24 cd 0f b0 96 18 44 be 24 cd 0f b0 96 18 D.......D.......
30: 44 be 2d f5 01 62 7a 18 00 00 00 00 00 00 00 04 D....bz.........
40: 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 01 ................
50: 00 00 0a 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
60: ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 01 ................
70: 41 c0 00 01 00 00 00 00 00 00 00 00 00 00 00 00 A...............
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
b0: 00 00 00 00 *00 45 04 00* 0a 00 00 65 6d 70 74 79 .....E.....empty
c0: 5f 61 74 74 72 07 04 02 74 72 75 73 74 5f 61 76 .attr...trust.av
d0: 61 6c 31 06 0c 00 73 65 63 6f 6e 64 73 65 63 6f all...secondseco
e0: 6e 64 5f 76 61 6c 75 65 06 08 04 70 6f 6c 69 63 nd.value...polic
f0: 79 63 6f 6e 74 65 6e 74 73 64 5f 76 61 6c 75 65 ycontentsd.value
----
It can be clearly seen that attr2 allows many more attributes to be stored in
an inode before they are moved to another filesystem block.
[[Leaf_Attributes]]
== Leaf Attributes
When an inode's attribute fork space is used up with shortform attributes and
more are added, the attribute format is migrated to ``extents''.
Extent based attributes use hash/index pairs to speed up an attribute lookup.
The first part of the ``leaf'' contains an array of fixed size hash/index pairs
with the flags stored as well. The remaining part of the leaf block contains the
array name/value pairs, where each element varies in length.
Each leaf is based on the +xfs_da_blkinfo_t+ block header declared in the
section about xref:Directory_Attribute_Block_Header[directories]. On a v5
filesystem, the block header is +xfs_da3_blkinfo_t+. The structure
encapsulating all other structures in the attribute block is
+xfs_attr_leafblock_t+.
The structures involved are:
[source, c]
----
typedef struct xfs_attr_leaf_map {
__be16 base;
__be16 size;
} xfs_attr_leaf_map_t;
----
*base*::
Block offset of the free area, in bytes.
*size*::
Size of the free area, in bytes.
[source, c]
----
typedef struct xfs_attr_leaf_hdr {
xfs_da_blkinfo_t info;
__be16 count;
__be16 usedbytes;
__be16 firstused;
__u8 holes;
__u8 pad1;
xfs_attr_leaf_map_t freemap[3];
} xfs_attr_leaf_hdr_t;
----
*info*::
Directory/attribute block header.
*count*::
Number of entries.
*usedbytes*::
Number of bytes used in the leaf block.
*firstused*::
Block offset of the first entry in use, in bytes.
*holes*::
Set to 1 if block compaction is necessary.
*pad1*::
Padding to maintain alignment to 64-bit boundaries.
[source, c]
-----
typedef struct xfs_attr_leaf_entry {
__be32 hashval;
__be16 nameidx;
__u8 flags;
__u8 pad2;
} xfs_attr_leaf_entry_t;
----
*hashval*::
Hash value of the attribute name.
*nameidx*::
Block offset of the name entry, in bytes.
*flags*::
Attribute flags, as specified xref:Attribute_Flags[above].
*pad2*::
Pads the structure to 64-bit boundaries.
[source, c]
----
typedef struct xfs_attr_leaf_name_local {
__be16 valuelen;
__u8 namelen;
__u8 nameval[1];
} xfs_attr_leaf_name_local_t;
----
*valuelen*::
Length of the value, in bytes.
*namelen*::
Length of the name, in bytes.
*nameval*::
The name and the value. String values are not zero-terminated.
[source, c]
----
typedef struct xfs_attr_leaf_name_remote {
__be32 valueblk;
__be32 valuelen;
__u8 namelen;
__u8 name[1];
} xfs_attr_leaf_name_remote_t;
----
*valueblk*::
The logical block in the attribute map where the value is located.
*valuelen*::
Length of the value, in bytes.
*namelen*::
Length of the name, in bytes.
*nameval*::
The name. String values are not zero-terminated.
[source, c]
----
typedef struct xfs_attr_leafblock {
xfs_attr_leaf_hdr_t hdr;
xfs_attr_leaf_entry_t entries[1];
xfs_attr_leaf_name_local_t namelist;
xfs_attr_leaf_name_remote_t valuelist;
} xfs_attr_leafblock_t;
----
*hdr*::
Attribute block header.
*entries*::
A variable-length array of attribute entries.
*namelist*::
A variable-length array of descriptors of local attributes. The location and
size of these entries is determined dynamically.
*valuelist*::
A variable-length array of descriptors of remote attributes. The location and
size of these entries is determined dynamically.
On a v5 filesystem, the header becomes +xfs_da3_blkinfo_t+ to accommodate the
extra metadata integrity fields:
[source, c]
----
typedef struct xfs_attr3_leaf_hdr {
xfs_da3_blkinfo_t info;
__be16 count;
__be16 usedbytes;
__be16 firstused;
__u8 holes;
__u8 pad1;
xfs_attr_leaf_map_t freemap[3];
__be32 pad2;
} xfs_attr3_leaf_hdr_t;
typedef struct xfs_attr3_leafblock {
xfs_attr3_leaf_hdr_t hdr;
xfs_attr_leaf_entry_t entries[1];
xfs_attr_leaf_name_local_t namelist;
xfs_attr_leaf_name_remote_t valuelist;
} xfs_attr3_leafblock_t;
----
Each leaf header uses the magic number +XFS_ATTR_LEAF_MAGIC+ (0xfbee). On a
v5 filesystem, the magic number is +XFS_ATTR3_LEAF_MAGIC+ (0x3bee).
The hash/index elements in the +entries[]+ array are packed from the top of the
block. Name/values grow from the bottom but are not packed. The freemap contains
run-length-encoded entries for the free bytes after the +entries[]+ array, but
only the three largest runs are stored (smaller runs are dropped). When the
+freemap+ doesn't show enough space for an allocation, the name/value area is
compacted and allocation is tried again. If there still isn't enough space, then
the block is split. The name/value structures (both local and remote versions)
must be 32-bit aligned.
For attributes with small values (ie. the value can be stored within the leaf),
the +XFS_ATTR_LOCAL+ flag is set for the attribute. The entry details are stored
using the +xfs_attr_leaf_name_local_t+ structure. For large attribute values
that cannot be stored within the leaf, separate filesystem blocks are allocated
to store the value. They use the +xfs_attr_leaf_name_remote_t+ structure. See
xref:Remote_Values[Remote Values] for more information.
.Leaf attribute layout
image::images/69.png[]
Both local and remote entries can be interleaved as they are only addressed by
the hash/index entries. The flag is stored with the hash/index pairs so the
appropriate structure can be used.
Since duplicate hash keys are possible, for each hash that matches during a
lookup, the actual name string must be compared.
An ``incomplete'' bit is also used for attribute flags. It shows that an attribute
is in the middle of being created and should not be shown to the user if we
crash during the time that the bit is set. The bit is cleared when attribute
has finished being set up. This is done because some large attributes cannot
be created inside a single transaction.
=== xfs_db Leaf Attribute Example
A single 30KB extended attribute is added to an inode:
----
xfs_db> inode <inode#>
xfs_db> p
...
core.nblocks = 9
core.nextents = 0
core.naextents = 1
core.forkoff = 15
core.aformat = 2 (extents)
...
a.bmx[0] = [startoff,startblock,blockcount,extentflag]
0:[0,37535,9,0]
xfs_db> ablock 0
xfs_db> p
hdr.info.forw = 0
hdr.info.back = 0
hdr.info.magic = 0xfbee
hdr.count = 1
hdr.usedbytes = 20
hdr.firstused = 4076
hdr.holes = 0
hdr.freemap[0-2] = [base,size] 0:[40,4036] 1:[0,0] 2:[0,0]
entries[0] = [hashval,nameidx,incomplete,root,secure,local]
0:[0xfcf89d4f,4076,0,0,0,0]
nvlist[0].valueblk = 0x1
nvlist[0].valuelen = 30692
nvlist[0].namelen = 8
nvlist[0].name = "big_attr"
----
Attribute blocks 1 to 8 (filesystem blocks 37536 to 37543) contain the raw
binary value data for the attribute.
Index 4076 (0xfec) is the offset into the block where the name/value information
is. As can be seen by the value, it's at the end of the block:
----
xfs_db> type text
xfs_db> p
000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 01 00 14 ................
010: 0f ec 00 00 00 28 0f c4 00 00 00 00 00 00 00 00 ................
020: fc f8 9d 4f 0f ec 00 00 00 00 00 00 00 00 00 00 ...O............
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
...
fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 ................
ff0: 00 00 77 e4 08 62 69 67 5f 61 74 74 72 00 00 00 ..w..big.attr...
----
A 30KB attribute and a couple of small attributes are added to a file:
----
xfs_db> inode <inode#>
xfs_db> p
...
core.nblocks = 10
core.extsize = 0
core.nextents = 1
core.naextents = 2
core.forkoff = 15
core.aformat = 2 (extents)
...
u.bmx[0] = [startoff,startblock,blockcount,extentflag]
0:[0,81857,1,0]
a.bmx[0-1] = [startoff,startblock,blockcount,extentflag]
0:[0,81858,1,0]
1:[1,182398,8,0]
xfs_db> ablock 0
xfs_db> p
hdr.info.forw = 0
hdr.info.back = 0
hdr.info.magic = 0xfbee
hdr.count = 3
hdr.usedbytes = 52
hdr.firstused = 4044
hdr.holes = 0
hdr.freemap[0-2] = [base,size] 0:[56,3988] 1:[0,0] 2:[0,0]
entries[0-2] = [hashval,nameidx,incomplete,root,secure,local]
0:[0x1e9d3934,4044,0,0,0,1]
1:[0x1e9d3937,4060,0,0,0,1]
2:[0xfcf89d4f,4076,0,0,0,0]
nvlist[0].valuelen = 6
nvlist[0].namelen = 5
nvlist[0].name = "attr2"
nvlist[0].value = "value2"
nvlist[1].valuelen = 6
nvlist[1].namelen = 5
nvlist[1].name = "attr1"
nvlist[1].value = "value1"
nvlist[2].valueblk = 0x1
nvlist[2].valuelen = 30692
nvlist[2].namelen = 8
nvlist[2].name = "big_attr"
----
As can be seen in the entries array, the two small attributes have the local
flag set and the values are printed.
A raw disk dump shows the attributes. The last attribute added is highlighted
(offset 4044 or 0xfcc):
[subs="quotes"]
----
000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 03 00 34 ...............4
010: 0f cc 00 00 00 38 0f 94 00 00 00 00 00 00 00 00 .....8..........
020: 1e 9d 39 34 0f cc 01 00 1e 9d 39 37 0f dc 01 00 ..94......97....
030: fc f8 9d 4f 0f ec 00 00 00 00 00 00 00 00 00 00 ...0............
040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00.................
...
fc0: 00 00 00 00 00 00 00 00 00 00 00 00 *00 06 05 61* ...............a
fd0: *74 74 72 32 76 61 6c 75 65 32* 00 00 00 06 05 61 ttr2value2.....a
fe0: 74 74 72 31 76 61 6c 75 65 31 00 00 00 00 00 01 ttr1value1......
ff0: 00 00 77 e4 08 62 69 67 5f 61 74 74 72 00 00 00 ..w..big.attr...
----
[[Node_Attributes]]
== Node Attributes
When the number of attributes exceeds the space that can fit in one filesystem
block (ie. hash, flag, name and local values), the first attribute block becomes
the root of a B+tree where the leaves contain the hash/name/value information
that was stored in a single leaf block. The inode's attribute format itself
remains extent based. The nodes use the +xfs_da_intnode_t+ or
+xfs_da3_intnode_t+ structures introduced in the section about
xref:Directory_Attribute_Internal_Node[directories].
The location of the attribute leaf blocks can be in any order. The only way to
find an attribute is by walking the node block hash/before values. Given a hash
to look up, search the node's btree array for the first +hashval+ in the array
that exceeds the given hash. The entry is in the block pointed to by the
+before+ value.
Each attribute node block has a magic number of +XFS_DA_NODE_MAGIC+ (0xfebe).
On a v5 filesystem this is +XFS_DA3_NODE_MAGIC+ (0x3ebe).
.Node attribute layout
image::images/72.png[]
=== xfs_db Node Attribute Example
An inode with 1000 small attributes with the naming ``attribute_n'' where 'n' is a
number:
----
xfs_db> inode <inode#>
xfs_db> p
...
core.nblocks = 15
core.nextents = 0
core.naextents = 1
core.forkoff = 15
core.aformat = 2 (extents)
...
a.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,525144,15,0]
xfs_db> ablock 0
xfs_db> p
hdr.info.forw = 0
hdr.info.back = 0
hdr.info.magic = 0xfebe
hdr.count = 14
hdr.level = 1
btree[0-13] = [hashval,before]
0:[0x3435122d,1]
1:[0x343550a9,14]
2:[0x343553a6,13]
3:[0x3436122d,12]
4:[0x343650a9,8]
5:[0x343653a6,7]
6:[0x343691af,6]
7:[0x3436d0ab,11]
8:[0x3436d3a7,10]
9:[0x3437122d,9]
10:[0x3437922e,3]
11:[0x3437d22a,5]
12:[0x3e686c25,4]
13:[0x3e686fad,2]
----
The hashes are in ascending order in the btree array, and if the hash for the
attribute we are looking up is before the entry, we go to the addressed
attribute block.
For example, to lookup attribute ``attribute_267'':
----
xfs_db> hash attribute_267
0x3437d1a8
----
In the root btree node, this falls between +0x3437922e+ and +0x3437d22a+,
therefore leaf 11 or attribute block 5 will contain the entry.
[subs="quotes"]
----
xfs_db> ablock 5
xfs_db> p
hdr.info.forw = 4
hdr.info.back = 3
hdr.info.magic = 0xfbee
hdr.count = 96
hdr.usedbytes = 2688
hdr.firstused = 1408
hdr.holes = 0
hdr.freemap[0-2] = [base,size] 0:[800,608] 1:[0,0] 2:[0,0]
entries[0.95] = [hashval,nameidx,incomplete,root,secure,local]
0:[0x3437922f,4068,0,0,0,1]
1:[0x343792a6,4040,0,0,0,1]
2:[0x343792a7,4012,0,0,0,1]
3:[0x343792a8,3984,0,0,0,1]
...
82:[0x3437d1a7,2892,0,0,0,1]
*83:[0x3437d1a8,2864,0,0,0,1]*
84:[0x3437d1a9,2836,0,0,0,1]
...
95:[0x3437d22a,2528,0,0,0,1]
nvlist[0].valuelen = 10
nvlist[0].namelen = 13
nvlist[0].name = "attribute_310"
nvlist[0].value = "value_316\d"
nvlist[1].valuelen = 16
nvlist[1].namelen = 13
nvlist[1].name = "attribute_309"
nvlist[1].value = "value_309\d"
nvlist[2].valuelen = 10
nvlist[2].namelen = 13
nvlist[2].name = "attribute_308"
nvlist[2].value = "value_308\d"
nvlist[3].valuelen = 10
nvlist[3].namelen = 13
nvlist[3].name = "attribute_307"
nvlist[3].value = "value_307\d"
...
nvlist[82].valuelen = 10
nvlist[82].namelen = 13
nvlist[82].name = "attribute_268"
nvlist[82].value = "value_268\d"
nvlist[83].valuelen = 10
nvlist[83].namelen = 13
nvlist[83].name = "attribute_267"
nvlist[83].value = "value_267\d"
nvlist[84].valuelen = 10
nvlist[84].namelen = 13
nvlist[84].name = "attribute_266"
nvlist[84].value = "value_266\d"
...
----
Each of the hash entries has +XFS_ATTR_LOCAL+ flag set (1), which means the
attribute's value follows immediately after the name. Raw disk of the name/value
pair at offset 2864 (0xb30), highlighted with ``value_267'' following
immediately after the name:
[subs="quotes"]
----
b00: 62 75 74 65 5f 32 36 35 76 61 6c 75 65 5f 32 36 bute.265value.26
b10: 35 0a 00 00 00 0a 0d 61 74 74 72 69 62 75 74 65 5......attribute
b20: 51 32 36 36 76 61 6c 75 65 5f 32 36 36 0a 00 00 .266value.266...
b30: *00 0a 0d 61 74 74 72 69 62 75 74 65 5f 32 36 37* ...attribute.267
b40: *76 61 6c 75 65 5f 32 36 37 0a* 00 00 00 0a 0d 61 value.267......a
b50: 74 74 72 69 62 75 74 65 5f 32 36 38 76 61 6c 75 ttribute.268va1u
b60: 65 5f 32 36 38 0a 00 00 00 0a 0d 61 74 74 72 69 e.268......attri
b70: 62 75 74 65 5f 32 36 39 76 61 6c 75 65 5f 32 36 bute.269value.26
----
Each entry starts on a 32-bit (4 byte) boundary, therefore the highlighted entry
has 2 unused bytes after it.
[[Btree_Attributes]]
== B+tree Attributes
When the attribute's extent map in an inode grows beyond the available space,
the inode's attribute format is changed to a ``btree''. The inode contains root
node of the extent B+tree which then address the leaves that contains the extent
arrays for the attribute data. The attribute data itself in the allocated
filesystem blocks use the same layout and structures as described in
xref:Node_Attributes[Node Attributes].
Refer to the previous section on xref:Btree_Extent_List[B+tree Data Extents] for
more information on XFS B+tree extents.
=== xfs_db B+tree Attribute Example
Added 2000 attributes with 729 byte values to a file:
----
xfs_db> inode <inode#>
xfs_db> p
...
core.nblocks = 640
core.extsize = 0
core.nextents = 1
core.naextents = 274
core.forkoff = 15
core.aformat = 3 (btree)
...
a.bmbt.level = 1
a.bmbt.numrecs = 2
a.bmbt.keys[1-2] = [startoff] 1:[0] 2:[219]
a.bmbt.ptrs[1-2] = 1:83162 2:109968
xfs_db> fsblock 83162
xfs_db> type bmapbtd
xfs_db> p
magic = 0x424d4150
level = 0
numrecs = 127
leftsib = null
rightsib = 109968
recs[1-127] = [startoff,startblock,blockcount,extentflag]
1:[0,81870,1,0]
...
xfs_db> fsblock 109968
xfs_db> type bmapbtd
xfs_db> p
magic = 0x424d4150
level = 0
numrecs = 147
leftsib = 83162
rightsib = null
recs[1-147] = [startoff,startblock,blockcount,extentflag]
...
(which is fsblock 81870)
xfs_db> ablock 0
xfs_db> p
hdr.info.forw = 0
hdr.info.back = 0
hdr.info.magic = 0xfebe
hdr.count = 2
hdr.level = 2
btree[0-1] = [hashval,before] 0:[0x343612a6,513] 1:[0x3e686fad,512]
----
The extent B+tree has two leaves that specify the 274 extents used for the
attributes. Looking at the first block, it can be seen that the attribute B+tree
is two levels deep. The two blocks at offset 513 and 512 (ie. access using the
+ablock+ command) are intermediate +xfs_da_intnode_t+ nodes that index all the
attribute leaves.
[[Remote_Values]]
== Remote Attribute Values
On a v5 filesystem, all remote value blocks start with this header:
[source, c]
----
struct xfs_attr3_rmt_hdr {
__be32 rm_magic;
__be32 rm_offset;
__be32 rm_bytes;
__be32 rm_crc;
uuid_t rm_uuid;
__be64 rm_owner;
__be64 rm_blkno;
__be64 rm_lsn;
};
----
*rm_magic*::
Specifies the magic number for the remote value block: "XARM" (0x5841524d).
*rm_offset*::
Offset of the remote value data, in bytes.
*rm_bytes*::
Number of bytes used to contain the remote value data.
*rm_crc*::
Checksum of the remote value block.
*rm_uuid*::
The UUID of this block, which must match either +sb_uuid+ or +sb_meta_uuid+
depending on which features are set.
*rm_owner*::
The inode number that this remote value block belongs to.
*rm_blkno*::
Disk block number of this remote value block.
*rm_lsn*::
Log sequence number of the last write to this block.
Filesystems formatted prior to v5 do not have this header in the remote block.
Value data begins immediately at offset zero.
[[Parent_Pointers]]
== Directory Parent Pointers
If this feature is enabled, each directory entry pointing from a parent
directory to a child file has a corresponding back link from the child file
back to the parent. In other words, if directory P has an entry "foo" pointing
to child C, then child C will have a parent pointer entry "foo" pointing to
parent P. This redundancy enables validation and repairs of the directory tree
if the tree structure is damaged.
Parent pointers are stored in a private namespace within the extended attribute
structure. The attribute name contains the following binary structure, and
the attribute value contains the directory entry name.
[source, c]
----
struct xfs_parent_name_rec {
__be64 p_ino;
__be32 p_gen;
__be32 p_namehash;
};
----
*p_ino*::
Inode number of the parent directory.
*p_gen*::
Generation number of the parent directory.
*p_namehash*::
The directory name hash of the directory entry name in the parent.
=== xfs_db Parent Pointer Example
Create a directory tree with the following structure, assuming that the
XFS filesystem is mounted on +/mnt+:
----
$ mkdir /mnt/a/ /mnt/b
$ touch /mnt/a/autoexec.bat
$ ln /mnt/a/autoexec.bat /mnt/b/config.sys
----
Now we open this up in the debugger:
----
xfs_db> path /a
xfs_db> ls
8 131 directory 0x0000002e 1 . (good)
10 128 directory 0x0000172e 2 .. (good)
12 132 regular 0x5a1f6ea0 12 autoexec.bat (good)
xfs_db> path /b
xfs_db> ls
8 8388736 directory 0x0000002e 1 . (good)
10 128 directory 0x0000172e 2 .. (good)
15 132 regular 0x9a01678c 10 config.sys (good)
xfs_db> path /b/config.sys
xfs_db> p a
a.sfattr.hdr.totsize = 64
a.sfattr.hdr.count = 2
a.sfattr.list[0].namelen = 16
a.sfattr.list[0].valuelen = 12
a.sfattr.list[0].root = 0
a.sfattr.list[0].secure = 0
a.sfattr.list[0].parent = 1
a.sfattr.list[0].parent_ino = 131
a.sfattr.list[0].parent_gen = 3772462576
a.sfattr.list[0].parent_namehash = 0x5a1f6ea0
a.sfattr.list[0].parent_name = "autoexec.bat"
a.sfattr.list[1].namelen = 16
a.sfattr.list[1].valuelen = 10
a.sfattr.list[1].root = 0
a.sfattr.list[1].secure = 0
a.sfattr.list[1].parent = 1
a.sfattr.list[1].parent_ino = 8388736
a.sfattr.list[1].parent_gen = 1161632072
a.sfattr.list[1].parent_namehash = 0x9a01678c
a.sfattr.list[1].parent_name = "config.sys"
----
In this example, +/a+ and +/b+ are subdirectories of the root. A regular file
is hardlinked into both subdirectories, under different names. Directory +/a+
is inode 131 and has an entry +autoexec.bat+ pointing to the child file.
Directory +/b+ is inode 8388736 and has an entry +config.sys+ pointing to the
same child file.
Within the child file, notice that there are two parent pointers in the
extended attribute structure. The first parent pointer tells us that directory
inode 131 should have an entry +autoexec.bat+ pointing down to the child; the
second parent pointer tells us that directory inode 8388736 should have an
entry +config.sys+ pointing down to the child. Note that the name hashes are
the same between each directory entry and its parent pointer.
== Key Differences Between Directories and Extended Attributes
Directories and extended attributes share the function of mapping names to
information, but the differences in the functionality requirements applied to
each type of structure influence their respective internal formats.
Directories map variable length names to iterable directory entry records
(dirent records), whereas extended attributes map variable length names to
non-iterable attribute records. Both structures can take advantage of variable
length record btree structures (i.e the dabtree) to map name hashes, but there
are major differences in the way each type of structure integrate the dabtree
index within the information being stored. The directory dabtree leaf nodes
contain mappings between a name hash and the location of a dirent record inside
the directory entry segment. Extended attributes, on the other hand, store
attribute records directly in the leaf nodes of the dabtree.
When XFS adds or removes an attribute record in any dabtree, it splits or
merges leaf nodes of the tree based on where the name hash index determines a
record needs to be inserted into or removed. In the attribute dabtree, XFS
splits or merges sparse leaf nodes of the dabtree as a side effect of inserting
or removing attribute records.
Directories, however, are subject to stricter constraints. The userspace
readdir/seekdir/telldir directory cookie API places a requirement on the
directory structure that dirent record cookie cannot change for the life of the
dirent record. XFS uses the dirent record's logical offset into the directory
data segment as the cookie, and hence the dirent record cannot change location.
Therefore, XFS cannot store dirent records in the leaf nodes of the dabtree
because the offset into the tree would change as other entries are inserted and
removed.
Dirent records are therefore stored within directory data blocks, all of which
are mapped in the first directory segment. The directory dabtree is mapped
into the second directory segment. Therefore, directory blocks require
external free space tracking because they are not part of the dabtree itself.
Because the dabtree only stores pointers to dirent records in the first data
segment, there is no need to leave holes in the dabtree itself. The dabtree
splits or merges leaf nodes as required as pointers to the directory data
segment are added or removed, and needs no free space tracking.
When XFS adds a dirent record, it needs to find the best-fitting free space in
the directory data segment to turn into the new record. This requires a free
space index for the directory data segment. The free space index is held in
the third directory segment. Once XFS has used the free space index to find
the block with that best free space, it modifies the directory data block and
updates the dabtree to point the name hash at the new record. When XFS removes
dirent records, it leaves hole in the data segment so that the rest of the
entries do not move, and removes the corresponding dabtree name hash mapping.
Note that for small directories, XFS collapses the name hash mappings and
the free space information into the directory data blocks to save space.
In summary, the requirement for a free space map in the directory structure
results from storing the dirent records externally to the dabtree. Attribute
records are stored directly in the dabtree leaf nodes of the dabtree (except
for remote attribute values which can be anywhere in the attr fork address
space) and do not need external free space tracking to determine where to best
insert them. As a result, extended attributes exhibit nearly perfect scaling
until the computer runs out of memory.