repair/README - pub/scm/linux/kernel/git/djwong/xfsprogs-dev - Git at Google

 # SPDX-License-Identifier: GPL-2.0

 A living document.  The basic algorithm.

 TODO: (D == DONE)

 0)	Need to bring some sanity into the case of flags that can
 	be set in the secondaries at mkfs time but reset or cleared
 	in the primary later in the filesystem's life.

 0)	Clear the persistent read-only bit if set.  Clear the
 	shared bit if set and the version number is zero.  This
 	brings the filesystem back to a known state.

 0)	make sure that superblock geometry code checks the logstart
 	value against whether or not we have an internal log.
 	If we have an internal log and a logdev, that's ok.
 	(Maybe we just aren't using it).  If we have an external
 	log (logstart == 0) but no logdev, that's right out.

 0)	write secondary superblock search code.  Rewrite initial
 	superblock parsing code to be less complicated.  Just
 	use variables to indicate primary, secondary, etc.,
 	and use a function to get the SB given a specific location
 	or something.

 2)	For inode alignment, if the SB bit is set and the
 	inode alignment size field in the SB is set, then
 	believe that the fs inodes MUST be aligned and
 	disallow any non-aligned inodes.  Likewise, if
 	the SB bit isn't set (or earlier version) and
 	the inode alignment size field is zero, then
 	never set the bit even if the inodes are aligned.
 	Note that the bits and alignment values are
 	replicated in the secondary superblocks.

 0)  add feature specification options to parse_arguments

 0)	add logic to add_inode_ref(), add_inode_reached()
 	to detect nlink overflows in cases where the fs
 	(or user had indicated fs) doesn't support new nlinks.

 6) check to make sure that the inodes containing btree blocks
 	with # recs < minrecs aren't legit -- e.g. the only
 	descendant of a root block.

 7)  inode di_size value sanity checking -- should always be less than
 	the biggest filebno offset mentioned in the bmaps.  Doesn't
 	have to be equal though since we're allowed to overallocate
 	(it just wastes a little space).  This is for both regular
 	files and directories (have to modify the existing directory
 	check).

 	Add tracking of largest offset in bmap scanning code.  Compare
 	value against di_size.  Should be >= di_size.

 	Alternatively, you could pass the inode into down through
 	the extent record processing layer and make the checks
 	there.

 	Add knowledge of quota inodes.  size of quota inode is
 	always zero.  We should maintain that.

 8)  Basic quota stuff.

 	Invariants
 		if quota feature bit is set, the quota inodes
 		if set, should point to disconnected, 0 len inodes.

 D -		if quota inodes exist, the quota bits must be
 		turned on.  It's ok for the quota flags to be
 		zeroed but they should be in a legal state
 		(see xfs_quota.h).

 D -		if the quota flags are non-zero, the corresponding
 		quota inodes must exist.

 		quota inodes are never deleted, only their space
 		is freed.

 	if quotas are being downgraded, then check quota inodes
 	at the end of phase 3.  If they haven't been cleared yet,
 	clear them.  Regardless, then clear sb flags (quota inode
 	fields, quota flags, and quota bit).


 5) look at verify_inode_chunk().  it's probably really broken.


 9)  Complicated quota stuff.  Add code to bmap scan code to
 	track used blocks.  Add another pair of AVL trees
 	to track user and project quota limits.  Set AVL
 	trees up at the beginning of phase 3.  Quota inodes
 	can be rebuilt or corrected later if damaged.


 D - 0)	fix directory processing.  phase 3, if an entry references
 	a free inode, *don't* mark it used.  wait for the rest of
 	phase 3 processing to hit that inode.  If it looks like it's
 	in use, we'll mark in use then.  If not, we'll clear it and
 	mark the inode map.  then in phase 4, you can depend on the
 	inode map.  should probably set the parent info in phase 4.
 	So we have a check_dups flag.  Maybe we should change the
 	name of check_dir to discover_inodes.  During phase 3
 	(discover_inodes == 1), uncertain inodes are added to list.
 	During phase 4 (discover_inodes == 0), they aren't.  And
 	we never mark inodes in use from the directory code.
 	During phase 4, we shouldn't complain about names with
 	a leading '/' since we made those names in phase 3.

 	Have to change dino_chunks.c (parent setting), dinode.c
 	and dir.c.

 D - 0)	make sure we don't screw up filesystems with real-time inodes.
 	remember to initialize real-time map with all blocks XR_E_FREE.

 D - 4) check contents of symlinks as well as lengths in process_symlinks()
 	in dinode.c.  Right now, we only check lengths.


 D - 1)	Feature mismatches -- for quotas and attributes,
 	if the stuff exists in the filesystem, set the
 	superblock version bits.

 D - 0)	rewrite directory leaf block holemap comparison code.
 	probably should just check the leaf block hole info
 	against our incore bitmap.  If the hole flag is not
 	set, then we know that there can only be one hole and
 	it has to be between the entry table and the top of heap.
 	If the hole flag is set, then it's ok if the on-disk
 	holemap doesn't describe everything as long as what
 	it does describe doesn't conflict with reality.

 D - 0)	rewrite setting nlinks handling -- for version 1
 	inodes, set both nlinks and onlinks (zero projid_lo/hi
 	and pad) if we have to change anything.  For
 	version 2, I think we're ok.

 D - 0)	Put awareness of quota inode into mark_standalone_inodes.


 D - 8) redo handling of superblocks with bad version numbers.  need
 	to bail out (without harming) fs's that have sbs that
 	are newer than we are.

 D - 0)  How do we handle feature mismatches between fs and
 	superblock?  For nlink, check each inode after you
 	know it's good.  If onlinks is 0 and nlinks is > 0
 	and it's a version 2 inode, then it really is a version
 	2 inode and the nlinks flag in the SB needs to be set.
 	If it's a version 2 inode and the SB agrees but onlink
 	is non-zero, then clear onlink.

 D - 3)  keep cumulative counts of freeblocks, inodes, etc. to set in
 	the superblock at the end of phase 5.  Remember that
 	agf freeblock counters don't include blocks used by
 	the non-root levels of the freespace trees but that
 	the sb free block counters include those.

 D - 0)  Do parent setting in directory code (called by phase 3).
 	actually, I put it in process_inode_set and propagated
 	the parent up to it from the process_dinode/process_dir
 	routines.  seemed cleaner than pushing the irec down
 	and letting them bang on it.

 D - 0)  If we clear a file in phase 4, make sure that if it's
 	a directory that the parent info is cleared also.

 D - 0) put inode tree flashover (call to add_ino_backptrs) into phase 5.

 D - 0) do set/get_inode_parent functions in incore_ino.c.
 	also do is/set/ inode_processed.

 D - 0) do a versions.c to extract feature info and set global vars
 	from the superblock version number and possibly feature bits

 D - 0) change longform_dir_entry_check + shortform_dir_entry_check
 	to return a count of how many illegal '/' entries exist.
 	if > 0, then process_dirstack needs to call prune_dir_entry
 	with a hash value of 0 to delete the entries.

 D - 0)  add the "processed" bitfield
 	to the backptrs_t struct that gets attached after
 	phase 4.

 D- )  Phase 6 !!!

 D - 0) look at usage of XFS_MAKE_IPTR().  It does the right
 	arithmetic assuming you count your offsets from the
 	beginning of the buffer.


 D - 0) look at references to XFS_INODES_PER_CHUNK.  change the
 	ones that really mean sizeof(uint64_t)*NBBY to
 	something else (like that only defined as a constant
 	INOS_PER_IREC. this isn't as important since
 	XFS_INODES_PER_CHUNK will never chang


 D - 0) look at junk_zerolen_dir_leaf_entries() to make sure it isn't hosing
 	the freemap since it assumed that bytes between the
 	end of the table and firstused didn't show up in the
 	freemap when they actually do.

 D - 0) track down XFS_INO_TO_OFFSET() usage.  I don't think I'm
 	using it right.  (e.g. I think
 	it gives you the offset of an inode into a block but
 	on small block filesystems, I may be reading in inodes
 	in multiblock buffers and working from the start of
 	the buffer plus I'm using it to get offsets into
 	my ino_rec's which may not be a good idea since I
 	use 64-inode ino_rec's whereas the offset macro
 	works off blocksize).

 D - 0.0) put buffer -> dirblock conversion macros into xfs kernel code

 D - 0.2) put in sibling pointer checking and path fixup into
 	bmap (long form) scan routines in scan.c
 D - 0.3) find out if bmap btrees with only root blocks are legal.  I'm
 	betting that they're not because they'd be extent inodes
 	instead.  If that's the case, rip some code out of
 	process_btinode()


 Algorithm (XXX means not done yet):

 Phase 1 -- get a superblock and zero log

 	get a superblock -- either read in primary or
 		find a secondary (ag header), check ag headers

 		To find secondary:

 			Go for brute force and read in the filesystem N meg
 				at a time looking for a superblock.  as a
 				slight optimization, we could maybe skip
 				ahead some number of blocks to try and get
 				towards the end of the first ag.

 			After you find a secondary, try and find at least
 				other ags as a verification that the
 				secondary is a good superblock.

 XXX -		Ugh.  Have to take growfs'ed filesystems into account.
 		The root superblock geometry info may not be right if
 		recovery hasn't run or it's been trashed.  The old ag's
 		may or may not be right since the system could have crashed
 		during growfs or the bwrite() to the superblocks could have
 		failed and the buffer been reused.  So we need to check
 		to see if another ag exists beyond the "last" ag
 		to see if a growfs happened.  If not, then we know that
 		the geometry info is good and treat the fs as a non-growfs'ed
 		fs.  If we do have inconsistencies, then the smaller geometry
 		is the old fs and the larger the new.  We can check the
 		new superblocks to see if they're good.  If not, then we
 		know the system crashed at or soon after the growfs and
 		we can choose to either accept the new geometry info or
 		trash it and truncate the fs back to the old geometry
 		parameters.

 	Cross-check geometry information in secondary sb's with
 	primary to ensure that it's correct.

 	Use sim code to allow mount filesystems *without* reading
 	in root inode.  This sets up the xfs_mount_t structure
 	and allows us to use XFS_* macros that we wouldn't
 	otherwise be able to use.

 	Note, I split phase 1 and 2 into separate pieces because I want
 	to initialize the xfs_repair incore data structures after phase 1.

 	parse superblock version and feature flags and set appropriate
 		global vars to reflect the flags (attributes, quotas, etc.)

 	Workaround for the mkfs "not zeroing the superblock buffer" bug.
 	Determine what field is the last valid non-zero field in
 	the superblock.  The trick here is to be able to differentiate
 	the last valid non-zero field in the primary superblock and
 	secondaries because they may not be the same.  Fields in
 	the primary can be set as the filesystem gets upgraded but
 	the upgrades won't touch the secondaries.  This means that
 	we need to find some number of secondaries and check them.
 	So we do the checking here and the setting in phase2.

 Phase 2 -- check integrity of allocation group allocation structures

 	zero the log if in no modify mode

 	sanity check ag headers -- superblocks match, agi isn't
 				trashed -- the agf and agfl
 				don't really matter because we can
 				just recreate them later.

 		Zero part of the superblock buffer if necessary

 		Walk the freeblock trees to get an
 			initial idea of what the fs thinks is free.
 			Files that disagree (claim free'd blocks)
 			can be salvaged or deleted.  If the btree is
 			internally inconsistent, when in doubt, mark
 			blocks free.  If they're used, they'll be stolen
 			back later.  don't have to check sibling pointers
 			for each level since we're going to regenerate
 			all the trees anyway.
 		Walk the inode allocation trees and
 			make sure they're ok, otherwise the sim
 			inode routines will probably just barf.
 			mark inode allocation tree blocks and ag header
 			blocks as used blocks.  If the trees are
 			corrupted, this phase will generate "uncertain"
 			inode chunks.  Those chunks go on a list and
 			will have to verified later.  Record the blocks
 			that are used to detect corruption and multiply
 			claimed blocks.  These trees will be regenerated
 			later.  Mark the blocks containing inodes referenced
 			by uncorrupted inode trees as being used by inodes.
 			The other blocks will get marked when/if the inodes
 			are verified.

 	calculate root and realtime inode numbers from the
 		filesystem geometry, fix up mount structure's
 		incore superblock if they're wrong.

 ASSUMPTION:  at end of phase 2, we've got superblocks and ag headers
 	that are not garbage (some data in them like counters and the
 	freeblock and inode trees may be inconsistent but the header
 	is readable and otherwise makes sense).

 XXX	if in no_modify mode, check for blocks claimed by one freespace
 	btree and not the other

 Phase 3 -- traverse inodes to make the inodes, bmaps and freespace maps
 		consistent.  For each ag, use either the incore inode map or
 		scan the ag for inodes.
 		Let's use the incore inode map, now that we've made one
 		up in phase2.  If we lose the maps, we'll locate inodes
 		when we traverse the directory heirarchy.  If we lose both,
 		we could scan the disk.  Ugh.  Maybe make that a command-line
 		option that we support later.

 	ASSUMPTION: we know if the ag allocation btrees are intact (phase 2)

 	First - Walk and clear the ag unlinked lists.  We'll process
 		the inodes later.  Check and make sure that the unlinked
 		lists reference known inodes.  If not, add to the list
 		of uncertain inodes.

 	Second, check the uncertain inode list generated in phase2 and
 		above and get them into the inode tree if they're good.
 		The incore inode cluster tree *always* has good
 		clusters (alignment, etc.) in it.

 	Third, make sure that the root inode is known.  If not,
 		and we know the inode number from the superblock,
 		discover that inode and its chunk.

 	Then, walk the incore inode-cluster tree.

 	Maintain an in-core bitmap over the entire fs for block allocation.

 	traverse each inode, make sure inode mode field matches free/allocated
 		bit in the incore inode allocation tree.  If there's a mismatch,
 		assume that the inode is in use.

 		- for each in-use inode, traverse each bmap/dir/attribute
 			map or tree.  Maintain a map (extent list?) for the
 			current inode.

 		- For each block marked as used, check to see if already known
 			(referenced by another file or directory) and sanity
 			check the contents of the block as well if possible
 			(in the case of meta-blocks).

 		- if the inode claims already used blocks, mark the blocks
 			as multiply claimed (duplicate) and go on.  the inode
 			will be cleared in phase 4.

 		- if metablocks are garbaged, clear the inode after
 			traversing what you can of the bmap and
 			proceed to next inode.  We don't have to worry
 			about trashing the maps or trees in cleared inodes
 			because the blocks will show up as free in the
 			ag freespace trees that we set up in phase 5.

 		- clear the di_next_unlinked pointer -- all unlinked
 			but active files go bye-bye.

 		- All blocks start out unknown.  We need the last state
 			in case we run into a case where we need to step
 			on a block to store filesystem meta-data and it
 			turns out later that it's referenced by some inode's
 			bmap.  In that case, the inode loses because we've
 			already trashed the block.  This shouldn't happen
 			in the first version unless some inode has a bogus
 			bmap referencing blocks in the ag header but the
 			4th state will keep us from inadvertently doing
 			something stupid in that case.

 		- If inode is allocated, mark all blocks allocated to the
 			current inode as allocated in the incore freespace
 			bitmap.

 		- If inode is good and a directory, scan through it to
 			find leaf entries and discover any unknown inodes.

 			For shortform, we correct what we can.

 			If the directory is corrupt, we try and fix it in
 			place.  If it has zero good entries, then we blast it.

 			All unknown inodes get put onto the uncertain inode
 			list.  This is safe because we only put inodes onto
 			the list when we're processing known inodes so the
 			uncertain inode list isn't in use.

 			We fix only one problem -- an entry that has
 			a mathematically invalid inode numbers in them.
 			If that's the case, we replace the inode number
 			with NULLFSINO and we'll fix up the entry in
 			phase 6.

 			That info may conflict with the inode information,
 			but we'll straighten out any inconsistencies there
 			in phase4 when we process the inodes again.

 			Errors involving bogus forward/back links,
 			zero-length entries make the directory get
 			trashed.

 			if an entry references a free inode, ignore that
 			fact for now.  wait for the rest of phase 3
 			processing to hit that inode.  If it looks like it's
 			in use, we'll mark in use then.  If not, we'll
 			clear it and mark the inode map.  then in phase
 			4, you can depend on the inode map.

 			Entries that point to non-existent or free
 			inodes, and extra blocks in the directory
 			will get fixed in place in a later pass.

 			Entries that point to a quota inode are
 			marked TBD.

 			If the directory internally points to the same
 			block twice, the directory gets blown away.

 	Note that processing uncertain inodes can add more inodes
 	to the uncertain list if they're directories.  So we loop
 	until the uncertain list is empty.

 	During inode verification, if the inode blocks are unknown,
 	mark then as in-use by inodes.

 XXX	HEURISTIC -- if we blow an inode away that has space,
 	assume that the freespace btree is now out of wack.
 	If it was ok earlier, it's certain to be wrong now.
 	And the odds of this space free cancelling out the
 	existing error is so small I'm willing to ignore it.
 	Should probably do this via a global var and complain
 	about this later.

 Assumption:  All known inodes are now marked as in-use or free.  Any
 	inodes that we haven't found by now are hosed (lost) since
 	we can't reach them via either the inode btrees or via directory
 	entries.

 	Directories are semi-clean.  All '.' entries are good.
 	Root '..' entry is good if root inode exists.  All entries
 	referencing non-existent inodes, free inodes, etc.

 XXX	verify that either quota inode is 0 or NULLFSINO or
 	if sb quota flag is non zero, verify that quota inode
 	is NULLFSINO or is referencing a used, but disconnected
 	inode.

 XXX	if in no_modify mode, check for unclaimed blocks

 - Phase 4 - Check for inodes referencing duplicate blocks

 	At this point, all known duplicate blocks are marked in
 	the block map.  However, some of the claimed blocks in
 	the bmap may in fact be free because they belong to inodes
 	that have to be cleared either due to being a trashed
 	directory or because it's the first inode to claim a
 	block that was then claimed later.  There's a similar
 	problem with meta-data blocks that are referenced by
 	inode bmaps that are going to be freed once the inode
 	(or directory) gets cleared.

 	So at this point, we collect the duplicate blocks into
 	extents and put them into the duplicate extent list.

 	Mark the ag header blocks as in use.

 	We then process each inode twice -- the first time
 	we check to see if the inode claims a duplicate extent
 	and we do NOT set the block bitmap.  If the inode claims
 	a duplicate extent, we clear the inode.  Since the bitmap
 	hasn't been set, that automatically frees all blocks associated
 	with the cleared inode.  If the inode is ok, process it a second
 	time and set the bitmap since we know that this inode will live.

 	The unlinked list gets cleared in every inode at this point as
 	well.  We no longer need to preserve it since we've discovered
 	every inode we're going to find from it.

 	verify existence of root inode.  if it exists, check for
 	existence of "lost+found".  If it exists, mark the entry
 	to be deleted, and clear the inode.  All the inodes that
 	were connected to the lost+found will be reconnected later.

 XXX	HEURISTIC -- if we blow an inode away that has space,
 	assume that the freespace btree is now out of wack.
 	If it was ok earlier, it's certain to be wrong now.
 	And the odds of this space free cancelling out the
 	existing error is so small I'm willing to ignore it.
 	Should probably do this via a global var and complain
 	about this later.

 	Clear the quota inodes if the inode btree says that
 	they're not in use.  The space freed will get picked
 	up by phase 5.

 XXX	Clear the quota inodes if the filesystem is being downgraded.

 - Phase 5 - Build inode allocation trees, freespace trees and
 		agfl's for each ag.  After this, we should be able to
 		unmount the filesystem and remount it for real.

 	For each ag: (if no in no_modify mode)

 	scan bitmap first to figure out number of extents.

 	calculate space required for all trees.  Start with inode trees.
 	Setup the btree cursor which includes the list of preallocated
 	blocks.  As a by-product, this will delete the extents required
 	for the inode tree from the incore extent tree.

 	Calculate how many extents will be required to represent the
 	remaining free extent tree on disk (twice, one for bybno and
 	one for bycnt).  You have to iterate on this because consuming
 	extents can alter the number of blocks required to represent
 	the remaining extents.  If there's slop left over, you can
 	put it in the agfl though.

 	Then, manually build the trees, agi, agfs, and agfls.

 XXX	if in no_modify mode, scan the on-disk inode allocation
 	trees and compare against the incore versions.  Don't have
 	to scan the freespace trees because we caught the problems
 	there in phase2 and phase3.  But if we cleared any inodes
 	with space during phases 3 or 4, now is the time to complain.

 XXX -	Free duplicate extent lists. ???

 Assumptions:  at this point, sim code having to do with inode
 		creation/modification/deletion and space allocation
 		work because the inode maps, space maps, and bmaps
 		for all files in the filesystem are good.  The only
 		structures that are screwed up are the directory contents,
 		which means that lookup may not work for beans, the
 		root inode which exists but may be completely bogus and
 		the link counts on all inodes which may also be bogus.

 	Free the bitmap, the freespace tree.

 	Flash the incore inode tree over from parent list to having
 	full backpointers.

 	realtime processing, if any --

 		(Skip to below if running in no_modify mode).

 		Generate the realtime bitmap from the incore realtime
 		extent map and slam the info into the realtime bitmap
 		inode.  Generate summary info from the realtime extent map.

 XXX		if in no_modify mode, compare contents of realtime bitmap
 		inode to the incore realtime extent map.  generate the
 		summary info from the incore realtime extent map.
 		compare against the contents of the realtime summary inode.
 		complain if bad.

 	reset superblock counters, sync version numbers

 - Phase 6 - directory traversal -- check reference counts,
 		attach disconnected inodes, fix up bogus directories

 	Assumptions:  all on-disk space and inode trees are structurally
 		sound.  Incore and on-disk inode trees agree on whether
 		an inode is in use.

 		Directories are structurally sound.  All hashvalues
 		are monotonically increasing and interior nodes are
 		correct so lookups work.  All legal directory entries
 		point to inodes that are in use and exist.  Shortform
 		directories are fine except that the links haven't been
 		checked for conflicts (cycles, ".." being correct, etc.).
 		Longform directories haven't been checked for those problems
 		either PLUS longform directories may still contain
 		entries beginning with '/'.  No zero-length entries
 		exist (they've been deleted or converted to '/').

 		Root directory may or may not exist.  orphange may
 		or may not exist.  Contents of either may be completely
 		bogus.

 		Entries may point to free or non-existent inodes.

 	At this we point, we may need new incore structures and
 		may be able to trash an old one (like the filesystem
 		block map)

 	If '/' is trashed, then reinitialize it.

 	If no realtime inodes, make them and if necessary, slam the
 		summary info into the realtime summary
 		inode.  Ditto with the realtime bitmap inode.

 	Make orphanage (lost+found ???).

 	Traverse each directory from '/' (unless it was created).
 		Check directory structure and each directory entry.
 		If the entry is bogus (points to a non-existent or
 		free inode, for example), mark that entry TBD.  Maintain
 		link counts on all inodes.  Currently, traversal is
 		depth-first.

 		Mark every inode reached as "reached" (includes
 		bumping up link counts).

 		If a entry points to a directory but the parent (..)
 		disagrees, then blow away the entry.  if the directory
 		being pointed to winds up disconnected, it'll be moved
 		to the orphanage (and the link count incremented to
 		account for the link and the reached bit set then).

 		If an entry points to a directory that we've already
 		reached, then some entry is bad and should be blown
 		away.  It's easiest to blow away the current entry
 		plus since presumably the parent entry in the
 		reached directory points to another directory,
 		then it's far more likely that the current
 		entry is bogus (otherwise the parent should point
 		at it).

 		If an entry points to a non-existent of free inode,
 		blow the entry away.

 		Every time a good entry is encountered update the
 		link count for the inode that the entry points to.

 	After traversal, scan incore inode map for directories not
 		reached.  Go to first one and try and find its root
 		by following .. entries.  Once at root, run traversal
 		algorithm.  When algorithm terminates, move subtree
 		root inode to the orphanage.  Repeat as necessary
 		until all disconnected directories are attached.

 	Move all disconnected inodes to orphanage.

 - Phase 7:  reset reference counts if required.

 	Now traverse the on-disk inodes again, and make sure on-disk
 		reference counts are correct.  Reset if necessary.

 		SKIP all unused inodes -- that also makes us
 		skip the orphanage inode which we think is
 		unused but is really used.  However, the ref counts
 		on that should be right so that's ok.

 ---

 multiple TB xfs_repair

 modify above to work in a couple of AGs at a time.  The bitmaps
 should span only the current set of AGs.

 The key it scan the inode bmaps and keep a list of inodes
 that span multiple AG sets and keep the list in a data structure
 that's keyed off AG set # as well as inode # and also has a bit
 to indicate whether or not the inode will be cleared.

 Then in each AG set, when doing duplicate extent processing,
 you have to process all multi-AG-set inodes that claim blocks in
 the current AG set.  If there's a conflict, you mark clear the
 inode in the current AG and you mark the multi-AG inode as
 "to be cleared".

 After going through all AGs, you can clear the to-be-cleared
 multi-AG-set inodes and pull them off the list.

 When building up the AG freespace trees, you walk the bmaps
 of all multi-AG-set inodes that are in the AG-set and include
 blocks claimed in the AG by the inode as used.

 This probably involves adding a phase 3-0 which would have to
 check all the inodes to see which ones are multi-AG-set inodes
 and set up the multi-AG-set inode data structure.  Plus the
 process_dinode routines may have to be altered just a bit
 to do the right thing if running in tera-byte mode (call
 out to routines that check the multi-AG-set inodes when
 appropriate).

 To make things go faster, phase 3-0 could probably run
 in parallel.  It should be possible to run phases 2-5
 in parallel as well once the appropriate synchronization
 is added to the incore routines and the static directory
 leaf block bitmap is changed to be on the stack.

 Phase 7 probably can be in parallel as well.

 By in parallel, I mean that assuming that an AG-set
 contains 4 AGs, you could run 4 threads, 1 per AG
 in parallel to process the AG set.

 I don't see how phase 6 can be run in parallel though.

 And running Phase 8 in parallel is just silly.