| # SPDX-License-Identifier: GPL-2.0 |
| |
| A living document. The basic algorithm. |
| |
| TODO: (D == DONE) |
| |
| 0) Need to bring some sanity into the case of flags that can |
| be set in the secondaries at mkfs time but reset or cleared |
| in the primary later in the filesystem's life. |
| |
| 0) Clear the persistent read-only bit if set. Clear the |
| shared bit if set and the version number is zero. This |
| brings the filesystem back to a known state. |
| |
| 0) make sure that superblock geometry code checks the logstart |
| value against whether or not we have an internal log. |
| If we have an internal log and a logdev, that's ok. |
| (Maybe we just aren't using it). If we have an external |
| log (logstart == 0) but no logdev, that's right out. |
| |
| 0) write secondary superblock search code. Rewrite initial |
| superblock parsing code to be less complicated. Just |
| use variables to indicate primary, secondary, etc., |
| and use a function to get the SB given a specific location |
| or something. |
| |
| 2) For inode alignment, if the SB bit is set and the |
| inode alignment size field in the SB is set, then |
| believe that the fs inodes MUST be aligned and |
| disallow any non-aligned inodes. Likewise, if |
| the SB bit isn't set (or earlier version) and |
| the inode alignment size field is zero, then |
| never set the bit even if the inodes are aligned. |
| Note that the bits and alignment values are |
| replicated in the secondary superblocks. |
| |
| 0) add feature specification options to parse_arguments |
| |
| 0) add logic to add_inode_ref(), add_inode_reached() |
| to detect nlink overflows in cases where the fs |
| (or user had indicated fs) doesn't support new nlinks. |
| |
| 6) check to make sure that the inodes containing btree blocks |
| with # recs < minrecs aren't legit -- e.g. the only |
| descendant of a root block. |
| |
| 7) inode di_size value sanity checking -- should always be less than |
| the biggest filebno offset mentioned in the bmaps. Doesn't |
| have to be equal though since we're allowed to overallocate |
| (it just wastes a little space). This is for both regular |
| files and directories (have to modify the existing directory |
| check). |
| |
| Add tracking of largest offset in bmap scanning code. Compare |
| value against di_size. Should be >= di_size. |
| |
| Alternatively, you could pass the inode into down through |
| the extent record processing layer and make the checks |
| there. |
| |
| Add knowledge of quota inodes. size of quota inode is |
| always zero. We should maintain that. |
| |
| 8) Basic quota stuff. |
| |
| Invariants |
| if quota feature bit is set, the quota inodes |
| if set, should point to disconnected, 0 len inodes. |
| |
| D - if quota inodes exist, the quota bits must be |
| turned on. It's ok for the quota flags to be |
| zeroed but they should be in a legal state |
| (see xfs_quota.h). |
| |
| D - if the quota flags are non-zero, the corresponding |
| quota inodes must exist. |
| |
| quota inodes are never deleted, only their space |
| is freed. |
| |
| if quotas are being downgraded, then check quota inodes |
| at the end of phase 3. If they haven't been cleared yet, |
| clear them. Regardless, then clear sb flags (quota inode |
| fields, quota flags, and quota bit). |
| |
| |
| 5) look at verify_inode_chunk(). it's probably really broken. |
| |
| |
| 9) Complicated quota stuff. Add code to bmap scan code to |
| track used blocks. Add another pair of AVL trees |
| to track user and project quota limits. Set AVL |
| trees up at the beginning of phase 3. Quota inodes |
| can be rebuilt or corrected later if damaged. |
| |
| |
| D - 0) fix directory processing. phase 3, if an entry references |
| a free inode, *don't* mark it used. wait for the rest of |
| phase 3 processing to hit that inode. If it looks like it's |
| in use, we'll mark in use then. If not, we'll clear it and |
| mark the inode map. then in phase 4, you can depend on the |
| inode map. should probably set the parent info in phase 4. |
| So we have a check_dups flag. Maybe we should change the |
| name of check_dir to discover_inodes. During phase 3 |
| (discover_inodes == 1), uncertain inodes are added to list. |
| During phase 4 (discover_inodes == 0), they aren't. And |
| we never mark inodes in use from the directory code. |
| During phase 4, we shouldn't complain about names with |
| a leading '/' since we made those names in phase 3. |
| |
| Have to change dino_chunks.c (parent setting), dinode.c |
| and dir.c. |
| |
| D - 0) make sure we don't screw up filesystems with real-time inodes. |
| remember to initialize real-time map with all blocks XR_E_FREE. |
| |
| D - 4) check contents of symlinks as well as lengths in process_symlinks() |
| in dinode.c. Right now, we only check lengths. |
| |
| |
| D - 1) Feature mismatches -- for quotas and attributes, |
| if the stuff exists in the filesystem, set the |
| superblock version bits. |
| |
| D - 0) rewrite directory leaf block holemap comparison code. |
| probably should just check the leaf block hole info |
| against our incore bitmap. If the hole flag is not |
| set, then we know that there can only be one hole and |
| it has to be between the entry table and the top of heap. |
| If the hole flag is set, then it's ok if the on-disk |
| holemap doesn't describe everything as long as what |
| it does describe doesn't conflict with reality. |
| |
| D - 0) rewrite setting nlinks handling -- for version 1 |
| inodes, set both nlinks and onlinks (zero projid_lo/hi |
| and pad) if we have to change anything. For |
| version 2, I think we're ok. |
| |
| D - 0) Put awareness of quota inode into mark_standalone_inodes. |
| |
| |
| D - 8) redo handling of superblocks with bad version numbers. need |
| to bail out (without harming) fs's that have sbs that |
| are newer than we are. |
| |
| D - 0) How do we handle feature mismatches between fs and |
| superblock? For nlink, check each inode after you |
| know it's good. If onlinks is 0 and nlinks is > 0 |
| and it's a version 2 inode, then it really is a version |
| 2 inode and the nlinks flag in the SB needs to be set. |
| If it's a version 2 inode and the SB agrees but onlink |
| is non-zero, then clear onlink. |
| |
| D - 3) keep cumulative counts of freeblocks, inodes, etc. to set in |
| the superblock at the end of phase 5. Remember that |
| agf freeblock counters don't include blocks used by |
| the non-root levels of the freespace trees but that |
| the sb free block counters include those. |
| |
| D - 0) Do parent setting in directory code (called by phase 3). |
| actually, I put it in process_inode_set and propagated |
| the parent up to it from the process_dinode/process_dir |
| routines. seemed cleaner than pushing the irec down |
| and letting them bang on it. |
| |
| D - 0) If we clear a file in phase 4, make sure that if it's |
| a directory that the parent info is cleared also. |
| |
| D - 0) put inode tree flashover (call to add_ino_backptrs) into phase 5. |
| |
| D - 0) do set/get_inode_parent functions in incore_ino.c. |
| also do is/set/ inode_processed. |
| |
| D - 0) do a versions.c to extract feature info and set global vars |
| from the superblock version number and possibly feature bits |
| |
| D - 0) change longform_dir_entry_check + shortform_dir_entry_check |
| to return a count of how many illegal '/' entries exist. |
| if > 0, then process_dirstack needs to call prune_dir_entry |
| with a hash value of 0 to delete the entries. |
| |
| D - 0) add the "processed" bitfield |
| to the backptrs_t struct that gets attached after |
| phase 4. |
| |
| D- ) Phase 6 !!! |
| |
| D - 0) look at usage of XFS_MAKE_IPTR(). It does the right |
| arithmetic assuming you count your offsets from the |
| beginning of the buffer. |
| |
| |
| D - 0) look at references to XFS_INODES_PER_CHUNK. change the |
| ones that really mean sizeof(uint64_t)*NBBY to |
| something else (like that only defined as a constant |
| INOS_PER_IREC. this isn't as important since |
| XFS_INODES_PER_CHUNK will never chang |
| |
| |
| D - 0) look at junk_zerolen_dir_leaf_entries() to make sure it isn't hosing |
| the freemap since it assumed that bytes between the |
| end of the table and firstused didn't show up in the |
| freemap when they actually do. |
| |
| D - 0) track down XFS_INO_TO_OFFSET() usage. I don't think I'm |
| using it right. (e.g. I think |
| it gives you the offset of an inode into a block but |
| on small block filesystems, I may be reading in inodes |
| in multiblock buffers and working from the start of |
| the buffer plus I'm using it to get offsets into |
| my ino_rec's which may not be a good idea since I |
| use 64-inode ino_rec's whereas the offset macro |
| works off blocksize). |
| |
| D - 0.0) put buffer -> dirblock conversion macros into xfs kernel code |
| |
| D - 0.2) put in sibling pointer checking and path fixup into |
| bmap (long form) scan routines in scan.c |
| D - 0.3) find out if bmap btrees with only root blocks are legal. I'm |
| betting that they're not because they'd be extent inodes |
| instead. If that's the case, rip some code out of |
| process_btinode() |
| |
| |
| Algorithm (XXX means not done yet): |
| |
| Phase 1 -- get a superblock and zero log |
| |
| get a superblock -- either read in primary or |
| find a secondary (ag header), check ag headers |
| |
| To find secondary: |
| |
| Go for brute force and read in the filesystem N meg |
| at a time looking for a superblock. as a |
| slight optimization, we could maybe skip |
| ahead some number of blocks to try and get |
| towards the end of the first ag. |
| |
| After you find a secondary, try and find at least |
| other ags as a verification that the |
| secondary is a good superblock. |
| |
| XXX - Ugh. Have to take growfs'ed filesystems into account. |
| The root superblock geometry info may not be right if |
| recovery hasn't run or it's been trashed. The old ag's |
| may or may not be right since the system could have crashed |
| during growfs or the bwrite() to the superblocks could have |
| failed and the buffer been reused. So we need to check |
| to see if another ag exists beyond the "last" ag |
| to see if a growfs happened. If not, then we know that |
| the geometry info is good and treat the fs as a non-growfs'ed |
| fs. If we do have inconsistencies, then the smaller geometry |
| is the old fs and the larger the new. We can check the |
| new superblocks to see if they're good. If not, then we |
| know the system crashed at or soon after the growfs and |
| we can choose to either accept the new geometry info or |
| trash it and truncate the fs back to the old geometry |
| parameters. |
| |
| Cross-check geometry information in secondary sb's with |
| primary to ensure that it's correct. |
| |
| Use sim code to allow mount filesystems *without* reading |
| in root inode. This sets up the xfs_mount_t structure |
| and allows us to use XFS_* macros that we wouldn't |
| otherwise be able to use. |
| |
| Note, I split phase 1 and 2 into separate pieces because I want |
| to initialize the xfs_repair incore data structures after phase 1. |
| |
| parse superblock version and feature flags and set appropriate |
| global vars to reflect the flags (attributes, quotas, etc.) |
| |
| Workaround for the mkfs "not zeroing the superblock buffer" bug. |
| Determine what field is the last valid non-zero field in |
| the superblock. The trick here is to be able to differentiate |
| the last valid non-zero field in the primary superblock and |
| secondaries because they may not be the same. Fields in |
| the primary can be set as the filesystem gets upgraded but |
| the upgrades won't touch the secondaries. This means that |
| we need to find some number of secondaries and check them. |
| So we do the checking here and the setting in phase2. |
| |
| Phase 2 -- check integrity of allocation group allocation structures |
| |
| zero the log if in no modify mode |
| |
| sanity check ag headers -- superblocks match, agi isn't |
| trashed -- the agf and agfl |
| don't really matter because we can |
| just recreate them later. |
| |
| Zero part of the superblock buffer if necessary |
| |
| Walk the freeblock trees to get an |
| initial idea of what the fs thinks is free. |
| Files that disagree (claim free'd blocks) |
| can be salvaged or deleted. If the btree is |
| internally inconsistent, when in doubt, mark |
| blocks free. If they're used, they'll be stolen |
| back later. don't have to check sibling pointers |
| for each level since we're going to regenerate |
| all the trees anyway. |
| Walk the inode allocation trees and |
| make sure they're ok, otherwise the sim |
| inode routines will probably just barf. |
| mark inode allocation tree blocks and ag header |
| blocks as used blocks. If the trees are |
| corrupted, this phase will generate "uncertain" |
| inode chunks. Those chunks go on a list and |
| will have to verified later. Record the blocks |
| that are used to detect corruption and multiply |
| claimed blocks. These trees will be regenerated |
| later. Mark the blocks containing inodes referenced |
| by uncorrupted inode trees as being used by inodes. |
| The other blocks will get marked when/if the inodes |
| are verified. |
| |
| calculate root and realtime inode numbers from the |
| filesystem geometry, fix up mount structure's |
| incore superblock if they're wrong. |
| |
| ASSUMPTION: at end of phase 2, we've got superblocks and ag headers |
| that are not garbage (some data in them like counters and the |
| freeblock and inode trees may be inconsistent but the header |
| is readable and otherwise makes sense). |
| |
| XXX if in no_modify mode, check for blocks claimed by one freespace |
| btree and not the other |
| |
| Phase 3 -- traverse inodes to make the inodes, bmaps and freespace maps |
| consistent. For each ag, use either the incore inode map or |
| scan the ag for inodes. |
| Let's use the incore inode map, now that we've made one |
| up in phase2. If we lose the maps, we'll locate inodes |
| when we traverse the directory heirarchy. If we lose both, |
| we could scan the disk. Ugh. Maybe make that a command-line |
| option that we support later. |
| |
| ASSUMPTION: we know if the ag allocation btrees are intact (phase 2) |
| |
| First - Walk and clear the ag unlinked lists. We'll process |
| the inodes later. Check and make sure that the unlinked |
| lists reference known inodes. If not, add to the list |
| of uncertain inodes. |
| |
| Second, check the uncertain inode list generated in phase2 and |
| above and get them into the inode tree if they're good. |
| The incore inode cluster tree *always* has good |
| clusters (alignment, etc.) in it. |
| |
| Third, make sure that the root inode is known. If not, |
| and we know the inode number from the superblock, |
| discover that inode and its chunk. |
| |
| Then, walk the incore inode-cluster tree. |
| |
| Maintain an in-core bitmap over the entire fs for block allocation. |
| |
| traverse each inode, make sure inode mode field matches free/allocated |
| bit in the incore inode allocation tree. If there's a mismatch, |
| assume that the inode is in use. |
| |
| - for each in-use inode, traverse each bmap/dir/attribute |
| map or tree. Maintain a map (extent list?) for the |
| current inode. |
| |
| - For each block marked as used, check to see if already known |
| (referenced by another file or directory) and sanity |
| check the contents of the block as well if possible |
| (in the case of meta-blocks). |
| |
| - if the inode claims already used blocks, mark the blocks |
| as multiply claimed (duplicate) and go on. the inode |
| will be cleared in phase 4. |
| |
| - if metablocks are garbaged, clear the inode after |
| traversing what you can of the bmap and |
| proceed to next inode. We don't have to worry |
| about trashing the maps or trees in cleared inodes |
| because the blocks will show up as free in the |
| ag freespace trees that we set up in phase 5. |
| |
| - clear the di_next_unlinked pointer -- all unlinked |
| but active files go bye-bye. |
| |
| - All blocks start out unknown. We need the last state |
| in case we run into a case where we need to step |
| on a block to store filesystem meta-data and it |
| turns out later that it's referenced by some inode's |
| bmap. In that case, the inode loses because we've |
| already trashed the block. This shouldn't happen |
| in the first version unless some inode has a bogus |
| bmap referencing blocks in the ag header but the |
| 4th state will keep us from inadvertently doing |
| something stupid in that case. |
| |
| - If inode is allocated, mark all blocks allocated to the |
| current inode as allocated in the incore freespace |
| bitmap. |
| |
| - If inode is good and a directory, scan through it to |
| find leaf entries and discover any unknown inodes. |
| |
| For shortform, we correct what we can. |
| |
| If the directory is corrupt, we try and fix it in |
| place. If it has zero good entries, then we blast it. |
| |
| All unknown inodes get put onto the uncertain inode |
| list. This is safe because we only put inodes onto |
| the list when we're processing known inodes so the |
| uncertain inode list isn't in use. |
| |
| We fix only one problem -- an entry that has |
| a mathematically invalid inode numbers in them. |
| If that's the case, we replace the inode number |
| with NULLFSINO and we'll fix up the entry in |
| phase 6. |
| |
| That info may conflict with the inode information, |
| but we'll straighten out any inconsistencies there |
| in phase4 when we process the inodes again. |
| |
| Errors involving bogus forward/back links, |
| zero-length entries make the directory get |
| trashed. |
| |
| if an entry references a free inode, ignore that |
| fact for now. wait for the rest of phase 3 |
| processing to hit that inode. If it looks like it's |
| in use, we'll mark in use then. If not, we'll |
| clear it and mark the inode map. then in phase |
| 4, you can depend on the inode map. |
| |
| Entries that point to non-existent or free |
| inodes, and extra blocks in the directory |
| will get fixed in place in a later pass. |
| |
| Entries that point to a quota inode are |
| marked TBD. |
| |
| If the directory internally points to the same |
| block twice, the directory gets blown away. |
| |
| Note that processing uncertain inodes can add more inodes |
| to the uncertain list if they're directories. So we loop |
| until the uncertain list is empty. |
| |
| During inode verification, if the inode blocks are unknown, |
| mark then as in-use by inodes. |
| |
| XXX HEURISTIC -- if we blow an inode away that has space, |
| assume that the freespace btree is now out of wack. |
| If it was ok earlier, it's certain to be wrong now. |
| And the odds of this space free cancelling out the |
| existing error is so small I'm willing to ignore it. |
| Should probably do this via a global var and complain |
| about this later. |
| |
| Assumption: All known inodes are now marked as in-use or free. Any |
| inodes that we haven't found by now are hosed (lost) since |
| we can't reach them via either the inode btrees or via directory |
| entries. |
| |
| Directories are semi-clean. All '.' entries are good. |
| Root '..' entry is good if root inode exists. All entries |
| referencing non-existent inodes, free inodes, etc. |
| |
| XXX verify that either quota inode is 0 or NULLFSINO or |
| if sb quota flag is non zero, verify that quota inode |
| is NULLFSINO or is referencing a used, but disconnected |
| inode. |
| |
| XXX if in no_modify mode, check for unclaimed blocks |
| |
| - Phase 4 - Check for inodes referencing duplicate blocks |
| |
| At this point, all known duplicate blocks are marked in |
| the block map. However, some of the claimed blocks in |
| the bmap may in fact be free because they belong to inodes |
| that have to be cleared either due to being a trashed |
| directory or because it's the first inode to claim a |
| block that was then claimed later. There's a similar |
| problem with meta-data blocks that are referenced by |
| inode bmaps that are going to be freed once the inode |
| (or directory) gets cleared. |
| |
| So at this point, we collect the duplicate blocks into |
| extents and put them into the duplicate extent list. |
| |
| Mark the ag header blocks as in use. |
| |
| We then process each inode twice -- the first time |
| we check to see if the inode claims a duplicate extent |
| and we do NOT set the block bitmap. If the inode claims |
| a duplicate extent, we clear the inode. Since the bitmap |
| hasn't been set, that automatically frees all blocks associated |
| with the cleared inode. If the inode is ok, process it a second |
| time and set the bitmap since we know that this inode will live. |
| |
| The unlinked list gets cleared in every inode at this point as |
| well. We no longer need to preserve it since we've discovered |
| every inode we're going to find from it. |
| |
| verify existence of root inode. if it exists, check for |
| existence of "lost+found". If it exists, mark the entry |
| to be deleted, and clear the inode. All the inodes that |
| were connected to the lost+found will be reconnected later. |
| |
| XXX HEURISTIC -- if we blow an inode away that has space, |
| assume that the freespace btree is now out of wack. |
| If it was ok earlier, it's certain to be wrong now. |
| And the odds of this space free cancelling out the |
| existing error is so small I'm willing to ignore it. |
| Should probably do this via a global var and complain |
| about this later. |
| |
| Clear the quota inodes if the inode btree says that |
| they're not in use. The space freed will get picked |
| up by phase 5. |
| |
| XXX Clear the quota inodes if the filesystem is being downgraded. |
| |
| - Phase 5 - Build inode allocation trees, freespace trees and |
| agfl's for each ag. After this, we should be able to |
| unmount the filesystem and remount it for real. |
| |
| For each ag: (if no in no_modify mode) |
| |
| scan bitmap first to figure out number of extents. |
| |
| calculate space required for all trees. Start with inode trees. |
| Setup the btree cursor which includes the list of preallocated |
| blocks. As a by-product, this will delete the extents required |
| for the inode tree from the incore extent tree. |
| |
| Calculate how many extents will be required to represent the |
| remaining free extent tree on disk (twice, one for bybno and |
| one for bycnt). You have to iterate on this because consuming |
| extents can alter the number of blocks required to represent |
| the remaining extents. If there's slop left over, you can |
| put it in the agfl though. |
| |
| Then, manually build the trees, agi, agfs, and agfls. |
| |
| XXX if in no_modify mode, scan the on-disk inode allocation |
| trees and compare against the incore versions. Don't have |
| to scan the freespace trees because we caught the problems |
| there in phase2 and phase3. But if we cleared any inodes |
| with space during phases 3 or 4, now is the time to complain. |
| |
| XXX - Free duplicate extent lists. ??? |
| |
| Assumptions: at this point, sim code having to do with inode |
| creation/modification/deletion and space allocation |
| work because the inode maps, space maps, and bmaps |
| for all files in the filesystem are good. The only |
| structures that are screwed up are the directory contents, |
| which means that lookup may not work for beans, the |
| root inode which exists but may be completely bogus and |
| the link counts on all inodes which may also be bogus. |
| |
| Free the bitmap, the freespace tree. |
| |
| Flash the incore inode tree over from parent list to having |
| full backpointers. |
| |
| realtime processing, if any -- |
| |
| (Skip to below if running in no_modify mode). |
| |
| Generate the realtime bitmap from the incore realtime |
| extent map and slam the info into the realtime bitmap |
| inode. Generate summary info from the realtime extent map. |
| |
| XXX if in no_modify mode, compare contents of realtime bitmap |
| inode to the incore realtime extent map. generate the |
| summary info from the incore realtime extent map. |
| compare against the contents of the realtime summary inode. |
| complain if bad. |
| |
| reset superblock counters, sync version numbers |
| |
| - Phase 6 - directory traversal -- check reference counts, |
| attach disconnected inodes, fix up bogus directories |
| |
| Assumptions: all on-disk space and inode trees are structurally |
| sound. Incore and on-disk inode trees agree on whether |
| an inode is in use. |
| |
| Directories are structurally sound. All hashvalues |
| are monotonically increasing and interior nodes are |
| correct so lookups work. All legal directory entries |
| point to inodes that are in use and exist. Shortform |
| directories are fine except that the links haven't been |
| checked for conflicts (cycles, ".." being correct, etc.). |
| Longform directories haven't been checked for those problems |
| either PLUS longform directories may still contain |
| entries beginning with '/'. No zero-length entries |
| exist (they've been deleted or converted to '/'). |
| |
| Root directory may or may not exist. orphange may |
| or may not exist. Contents of either may be completely |
| bogus. |
| |
| Entries may point to free or non-existent inodes. |
| |
| At this we point, we may need new incore structures and |
| may be able to trash an old one (like the filesystem |
| block map) |
| |
| If '/' is trashed, then reinitialize it. |
| |
| If no realtime inodes, make them and if necessary, slam the |
| summary info into the realtime summary |
| inode. Ditto with the realtime bitmap inode. |
| |
| Make orphanage (lost+found ???). |
| |
| Traverse each directory from '/' (unless it was created). |
| Check directory structure and each directory entry. |
| If the entry is bogus (points to a non-existent or |
| free inode, for example), mark that entry TBD. Maintain |
| link counts on all inodes. Currently, traversal is |
| depth-first. |
| |
| Mark every inode reached as "reached" (includes |
| bumping up link counts). |
| |
| If a entry points to a directory but the parent (..) |
| disagrees, then blow away the entry. if the directory |
| being pointed to winds up disconnected, it'll be moved |
| to the orphanage (and the link count incremented to |
| account for the link and the reached bit set then). |
| |
| If an entry points to a directory that we've already |
| reached, then some entry is bad and should be blown |
| away. It's easiest to blow away the current entry |
| plus since presumably the parent entry in the |
| reached directory points to another directory, |
| then it's far more likely that the current |
| entry is bogus (otherwise the parent should point |
| at it). |
| |
| If an entry points to a non-existent of free inode, |
| blow the entry away. |
| |
| Every time a good entry is encountered update the |
| link count for the inode that the entry points to. |
| |
| After traversal, scan incore inode map for directories not |
| reached. Go to first one and try and find its root |
| by following .. entries. Once at root, run traversal |
| algorithm. When algorithm terminates, move subtree |
| root inode to the orphanage. Repeat as necessary |
| until all disconnected directories are attached. |
| |
| Move all disconnected inodes to orphanage. |
| |
| - Phase 7: reset reference counts if required. |
| |
| Now traverse the on-disk inodes again, and make sure on-disk |
| reference counts are correct. Reset if necessary. |
| |
| SKIP all unused inodes -- that also makes us |
| skip the orphanage inode which we think is |
| unused but is really used. However, the ref counts |
| on that should be right so that's ok. |
| |
| --- |
| |
| multiple TB xfs_repair |
| |
| modify above to work in a couple of AGs at a time. The bitmaps |
| should span only the current set of AGs. |
| |
| The key it scan the inode bmaps and keep a list of inodes |
| that span multiple AG sets and keep the list in a data structure |
| that's keyed off AG set # as well as inode # and also has a bit |
| to indicate whether or not the inode will be cleared. |
| |
| Then in each AG set, when doing duplicate extent processing, |
| you have to process all multi-AG-set inodes that claim blocks in |
| the current AG set. If there's a conflict, you mark clear the |
| inode in the current AG and you mark the multi-AG inode as |
| "to be cleared". |
| |
| After going through all AGs, you can clear the to-be-cleared |
| multi-AG-set inodes and pull them off the list. |
| |
| When building up the AG freespace trees, you walk the bmaps |
| of all multi-AG-set inodes that are in the AG-set and include |
| blocks claimed in the AG by the inode as used. |
| |
| This probably involves adding a phase 3-0 which would have to |
| check all the inodes to see which ones are multi-AG-set inodes |
| and set up the multi-AG-set inode data structure. Plus the |
| process_dinode routines may have to be altered just a bit |
| to do the right thing if running in tera-byte mode (call |
| out to routines that check the multi-AG-set inodes when |
| appropriate). |
| |
| To make things go faster, phase 3-0 could probably run |
| in parallel. It should be possible to run phases 2-5 |
| in parallel as well once the appropriate synchronization |
| is added to the incore routines and the static directory |
| leaf block bitmap is changed to be on the stack. |
| |
| Phase 7 probably can be in parallel as well. |
| |
| By in parallel, I mean that assuming that an AG-set |
| contains 4 AGs, you could run 4 threads, 1 per AG |
| in parallel to process the AG set. |
| |
| I don't see how phase 6 can be run in parallel though. |
| |
| And running Phase 8 in parallel is just silly. |