[[Reconstruction]]
= Metadata Reconstruction

[NOTE]
This is a theoretical discussion of how reconstruction could work; none of this
is implemented as of 2015.

A simple UNIX filesystem can be thought of in terms of a directed acyclic graph.
To a first approximation, there exists a root directory node, which points to
other nodes. Those other nodes can themselves be directories or they can be
files. Each file, in turn, points to data blocks.

XFS adds a few more details to this picture:

* The real root(s) of an XFS filesystem are the allocation group headers
(superblock, AGF, AGI, AGFL);
* Each allocation group's headers point to various per-AG B+trees (free space,
inode, free inodes, free list, etc.);
* The free space B+trees point to unused extents;
* The inode B+trees point to blocks containing inode chunks;
* All superblocks point to the root directory and the log;
* Hardlinks mean that multiple directories can point to a single file node;
* File data block pointers are indexed by file offset;
* Files and directories can have a second collection of pointers to data blocks
which contain extended attributes;
* Large directories require multiple data blocks to store all the subpointers;
* Still larger directories use high-offset data blocks to store a B+tree of
hashes to directory entries;
* Large extended attribute forks similarly use high-offset data blocks to store
a B+tree of hashes to attribute keys; and
* Symbolic links can point to data blocks.

The beauty of this massive graph structure is that under normal circumstances,
everything known to the filesystem is discoverable (access controls
notwithstanding) from the root. The major weakness of this structure, of course,
is that breaking an edge in the graph can render entire subtrees inaccessible.
+xfs_repair+ recovers from broken directories by scanning for unlinked inodes
and connecting them to +/lost+found+, but this isn't sufficiently general to
recover from breaks in other parts of the graph structure. Wouldn't it be
useful to have back pointers as a secondary data structure? The current repair
strategy is to reconstruct whatever can be rebuilt, but to scrap anything that
doesn't check out.
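The reachability argument can be made concrete with a toy model. The sketch
below is purely illustrative and is not XFS code: the node names, the +edges+
dictionary, and the helpers +reachable+ and +find_orphans+ are all hypothetical.
It walks the graph from the root, then breaks a single edge and lists the nodes
that a full scan would find but that can no longer be discovered from the root.

```python
from collections import deque


def reachable(edges, root):
    """Return the set of nodes discoverable from root by following edges."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def find_orphans(edges, all_nodes, root):
    """Nodes that exist but cannot be reached from root."""
    return set(all_nodes) - reachable(edges, root)


# Hypothetical toy filesystem: directories point to files, files to blocks.
# "file:notes" is hard linked, so two directories point to it.
edges = {
    "dir:root": ["dir:home", "dir:etc"],
    "dir:home": ["file:notes"],
    "dir:etc": ["file:passwd", "file:notes"],
    "file:notes": ["blk:10", "blk:11"],
    "file:passwd": ["blk:20"],
}
all_nodes = set(edges) | {child for kids in edges.values() for child in kids}

# Intact graph: everything is discoverable from the root.
intact_orphans = find_orphans(edges, all_nodes, "dir:root")

# Break the edge dir:root -> dir:etc and look again.
broken = dict(edges)
broken["dir:root"] = ["dir:home"]
orphans = find_orphans(broken, all_nodes, "dir:root")
```

In this model the hard-linked file survives because a second path to it exists,
while everything that was reachable only through the broken edge is orphaned.
Back pointers would record, for each orphan, where it used to be attached.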
The xref:Reverse_Mapping_Btree[reverse-mapping B+tree] fills in part of the
puzzle. Since it contains copies of every entry in each inode’s data and
attribute forks, we can fix a corrupted block map with these records.
Furthermore, if the inode B+trees become corrupt, it is possible to visit all
inode chunks using the reverse-mapping data. Should XFS ever gain the ability
to store parent directory information in each inode, it would also become
possible to resurrect damaged directory trees, which should reduce the
complaints about inodes ending up in +/lost+found+. Everything else in the
per-AG primary
metadata can already be reconstructed via +xfs_repair+. Hopefully,
reconstruction will not turn out to be a fool's errand.
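As a rough illustration of rebuilding a block map from reverse-mapping records,
the sketch below filters a list of records by owner and fork, then orders the
result by file offset. The +RmapRecord+ layout here is a simplified, hypothetical
stand-in rather than the on-disk format, and +rebuild_block_map+ is an invented
helper name.

```python
from typing import NamedTuple


class RmapRecord(NamedTuple):
    # Simplified, hypothetical record layout for illustration only.
    start_block: int  # first physical block of the extent
    block_count: int  # length of the extent, in blocks
    owner: int        # inode number that owns the extent
    offset: int       # logical offset within the fork, in blocks
    attr_fork: bool   # True if the extent belongs to the attribute fork


def rebuild_block_map(records, inode, attr_fork=False):
    """Collect one fork's extents for an inode, ordered by file offset.

    Each result tuple is (offset, start_block, block_count).
    """
    extents = [
        (r.offset, r.start_block, r.block_count)
        for r in records
        if r.owner == inode and r.attr_fork == attr_fork
    ]
    return sorted(extents)


# Reverse mappings are kept in physical block order, not file offset order.
records = [
    RmapRecord(100, 8, owner=131, offset=0, attr_fork=False),
    RmapRecord(300, 1, owner=131, offset=0, attr_fork=True),
    RmapRecord(500, 4, owner=131, offset=8, attr_fork=False),
    RmapRecord(700, 2, owner=132, offset=0, attr_fork=False),
]

data_map = rebuild_block_map(records, inode=131)
attr_map = rebuild_block_map(records, inode=131, attr_fork=True)
```

A real implementation would also have to cross-check the recovered extents
against the rest of the metadata before trusting them; this sketch only shows
that the records carry enough information to regenerate the mapping.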