| <?xml version='1.0' encoding='utf-8' ?> |
| <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [ |
| ]> |
| <chapter id="xfs-repair"> |
| <title>XFS Repair</title> |
| <section> |
| <title>Causes of Filesystem Corruption</title> |
| <para>Filesystems can be corrupted by:</para> |
| <itemizedlist> |
| <listitem><para>Hardware errors |
| <itemizedlist> |
| <listitem><para>Media errors are common</para></listitem> |
| <listitem><para>Disks keep getting bigger, so there is more media on which errors can occur</para></listitem> |
| </itemizedlist> |
| </para></listitem> |
| <listitem><para>To a much lesser degree, bugs in the filesystem</para></listitem> |
| </itemizedlist> |
| <para>Filesystems can “repair” themselves because they consist of lists, links |
| and reference counts that can be validated.</para> |
| <itemizedlist> |
| <listitem><para>Not all information can always be recovered, however. Inodes without a |
| parent directory are common when the directory structure has been corrupted.</para></listitem> |
| </itemizedlist> |
| </section> |
| <section> |
| <title>xfs_check</title> |
| <para>xfs_check is a script that runs xfs_db to do a filesystem check.</para> |
| <para>The "check" command in xfs_db scans all the metadata structures for inconsistencies.</para> |
| <para>xfs_check uses a different codebase from xfs_repair.</para> |
| <itemizedlist> |
| <listitem><para>xfs_check and xfs_repair can be used to cross-check each other |
| <itemizedlist> |
| <listitem><para>Compare the results of xfs_check with those of xfs_repair -n (no-modify mode)</para></listitem> |
| </itemizedlist> |
| </para></listitem> |
| </itemizedlist> |
| </section> |
| <section> |
| <title>xfs_repair</title> |
| <para>xfs_repair scans the filesystem and corrects any problems encountered.</para> |
| <para>xfs_repair performs a scan and repair in seven phases.</para> |
| <para>Each phase relies on the previous phase to fix a certain class of potential errors.</para> |
| <para>xfs_repair uses libxfs, which is a partial port of the XFS kernel code to user space.</para> |
| </section> |
| <section> |
| <title>xfs_repair – Phase 1</title> |
| <para>Find, verify and fix superblocks.</para> |
| <para>If a superblock is not found, xfs_repair will stop.</para> |
| <para>Sets up a virtual mount structure for the common XFS code base (libxfs) to work from.</para> |
| </section> |
| <section> |
| <title>xfs_repair – Phase 2</title> |
| <para>Checks the AG header structures (AGI, AGF and AGFL) and scans the AGF and AGI btrees.</para> |
| </section> |
| <section> |
| <title>xfs_repair – Phase 3</title> |
| <para>Using the AGI btree from phase 2, scan the inode tree, processing the unlinked list for |
| deleted inodes and finding possible missing inode clusters.</para> |
| <para>Walk all the found inodes, recording used filesystem blocks (extents).</para> |
| <para>For directory inodes, scan the directory structure for more lost inodes.</para> |
| <para>Any bad inodes are trashed, including unrecoverably corrupted directories.</para> |
| </section> |
| <section> |
| <title>xfs_repair – Phase 4</title> |
| <para>Scan inode extents again. Any inode with an extent covering blocks that are already in use (duplicate blocks) is trashed.</para> |
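| <para>The duplicate-block check can be illustrated with a toy model. This is only a sketch of the idea, not the data structures xfs_repair actually uses: the function name find_duplicate_claims and the inode and block numbers are hypothetical, and the sketch simply flags every inode that claims a shared block.</para> |
| <para><programlisting language="python"> |
```python
# Toy model of the phase 4 idea; not xfs_repair's actual data structures.
def find_duplicate_claims(extents_by_inode):
    """Return the set of inodes whose extents cover a block claimed by another inode.

    extents_by_inode maps an inode number to a list of (start_block, length) extents.
    """
    owners = {}  # block number -> set of inodes that claim it
    for ino, extents in extents_by_inode.items():
        for start, length in extents:
            for block in range(start, start + length):
                owners.setdefault(block, set()).add(ino)

    duplicates = set()
    for claimants in owners.values():
        if len(claimants) > 1:
            duplicates.update(claimants)
    return duplicates


# Hypothetical example: inodes 131 and 132 both claim blocks 108 and 109.
example = {
    130: [(10, 4)],
    131: [(100, 10)],
    132: [(108, 4), (200, 2)],
}
print(sorted(find_duplicate_claims(example)))
```
| </programlisting></para> |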
| </section> |
| <section> |
| <title>xfs_repair – Phase 5</title> |
| <para>Rebuild AG headers and structures including the AGI btree, AGF btrees and AGFL, |
| regardless of whether any errors have been found.</para> |
| <para>Realtime inodes are also reconstructed.</para> |
| </section> |
| <section> |
| <title>xfs_repair – Phase 6</title> |
| <para>At this stage, the filesystem is in a mountable state.</para> |
| <para>Scan the directories, analysing all data.</para> |
| <itemizedlist> |
| <listitem><para>Any directories with any corruption are rebuilt with whatever entries can be recovered.</para></listitem> |
| <listitem><para>A missing root directory is recreated.</para></listitem> |
| <listitem><para>All inodes that are in a directory are marked reached.</para></listitem> |
| </itemizedlist> |
| <para>At the end, any unreached inodes are put into lost+found.</para> |
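| <para>The reachability pass can be pictured as a walk of the directory tree from the root. The following is a minimal sketch under simplifying assumptions, not the xfs_repair implementation: directories are modelled as plain dictionaries, and the function name find_unreached and all inode numbers are hypothetical.</para> |
| <para><programlisting language="python"> |
```python
# Toy model of phase 6 reachability; inode numbers and names are hypothetical.
def find_unreached(root, directories, all_inodes):
    """Walk the directory tree from root and return the inodes never reached.

    directories maps a directory inode to a dict of {entry name: child inode}.
    all_inodes is the set of every in-use inode found by the earlier phases.
    """
    reached = {root}
    stack = [root]
    while stack:
        ino = stack.pop()
        for child in directories.get(ino, {}).values():
            if child not in reached:
                reached.add(child)
                stack.append(child)
    return sorted(all_inodes - reached)


directories = {
    128: {"etc": 131, "home": 132},
    132: {"notes.txt": 140},
    150: {"orphan.txt": 151},  # no directory has an entry pointing at 150
}
all_inodes = {128, 131, 132, 140, 150, 151}

# Both 150 and its child 151 are unreachable from the root.
print(find_unreached(128, directories, all_inodes))
```
| </programlisting></para> |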
| </section> |
| <section> |
| <title>xfs_repair – Phase 7</title> |
| <para>Link counts (nlinks) for inodes are corrected based on the data collected in phase 6.</para> |
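| <para>The link count correction amounts to comparing the count stored in each inode with the number of directory entries that refer to it. The sketch below assumes a simplified model that ignores the "." and ".." entries of directories; the function names and inode numbers are hypothetical.</para> |
| <para><programlisting language="python"> |
```python
from collections import Counter


def count_links(directories):
    """Count how many directory entries refer to each inode."""
    counted = Counter()
    for entries in directories.values():
        for child in entries.values():
            counted[child] += 1
    return counted


def link_count_fixes(on_disk_nlink, counted):
    """Return {inode: (on-disk nlink, counted links)} for every mismatch."""
    fixes = {}
    for ino, nlink in on_disk_nlink.items():
        actual = counted.get(ino, 0)
        if actual != nlink:
            fixes[ino] = (nlink, actual)
    return fixes


# Hypothetical data: inode 140 has two names but records only one link.
directories = {128: {"a": 140, "b": 140, "c": 141}}
on_disk_nlink = {140: 1, 141: 1}

print(link_count_fixes(on_disk_nlink, count_links(directories)))
```
| </programlisting></para> |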
| </section> |
| <section> |
| <title>Triaging xfs_check and xfs_repair problems</title> |
| <para>Most of the time, inode information is required:</para> |
| <para><programlisting> |
| > inode &lt;inode number&gt; |
| > print |
| </programlisting></para> |
| <para>The root inode number can be derived from the superblock:</para> |
| <para><programlisting> |
| > sb 0 |
| > print rootino |
| </programlisting></para> |
| <para>For directories, we can also dump the contents from the extent list shown in the inode:</para> |
| <para><programlisting> |
| > dblock &lt;file offset in blocks&gt; |
| > print |
| </programlisting></para> |
| <para>Directories have file offsets typically starting at 0, 8388608 and 16777216. |
| Each of these offsets stores different information for a directory.</para> |
| <para>Filenames and inode numbers are stored at 0, hash values at 8388608 and free-space information at 16777216.</para> |
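| <para>These three numbers are consistent with each region starting at a multiple of 32 GiB of byte offset within the directory, expressed in filesystem blocks of 4096 bytes. Both the 32 GiB spacing and the 4096-byte block size are assumptions not stated above. Under those assumptions, the arithmetic can be sketched as follows (the names are hypothetical):</para> |
| <para><programlisting language="python"> |
```python
# Assumption: each directory region starts at a multiple of 32 GiB (byte offset).
SEGMENT_BYTES = 32 * 1024**3
SEGMENTS = [
    "data (filenames and inode numbers)",
    "leaf (hash values)",
    "free (free-space information)",
]


def segment_start_blocks(block_size):
    """Return the starting file offset, in filesystem blocks, of each region."""
    return [index * SEGMENT_BYTES // block_size for index in range(len(SEGMENTS))]


# Assumption: a filesystem block size of 4096 bytes.
for name, start in zip(SEGMENTS, segment_start_blocks(4096)):
    print(f"{name}: dblock {start}")
```
| </programlisting></para> |
| <para>On a filesystem with a different block size, the same calculation would give different block offsets.</para> |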
| </section> |
| <section> |
| <title>xfs_repair and xfs_check should agree</title> |
| <para>If one of the tools reports a problem that the other does not, there is |
| a problem with one of the tools:</para> |
| <itemizedlist> |
| <listitem><para>Most likely xfs_repair.</para></listitem> |
| </itemizedlist> |
| <para><ulink url="http://oss.sgi.com/bugzilla/show_bug.cgi?id=723" /></para> |
| <para>xfs_check finds some errors on the filesystem:</para> |
| <para><programlisting> |
| link count mismatch for inode 387655 (name ?), nlink 0, counted 2 |
| link count mismatch for inode 13313696 (name ?), nlink 0, counted 2 |
| link count mismatch for inode 17197100 (name ?), nlink 0, counted 2 |
| </programlisting></para> |
| <para>xfs_repair reports no problems:</para> |
| <para><programlisting> |
| Phase 1 - find and verify superblock... |
| Phase 2 - using internal log |
| - zero log... |
| - scan filesystem freespace and inode maps... |
| - found root inode chunk |
| Phase 3 - for each AG... |
| - scan and clear agi unlinked lists... |
| - process known inodes and perform inode discovery... |
| - agno = 0 |
| - agno = 1 |
| - agno = 2 |
| - agno = 3 |
| - agno = 4 |
| - process newly discovered inodes... |
| Phase 4 - check for duplicate blocks... |
| - setting up duplicate extent list... |
| - clear lost+found (if it exists) ... |
| - clearing existing "lost+found" inode |
| - marking entry "lost+found" to be deleted |
| - check for inodes claiming duplicate blocks... |
| - agno = 0 |
| - agno = 1 |
| - agno = 2 |
| - agno = 3 |
| - agno = 4 |
| Phase 5 - rebuild AG headers and trees... |
| - reset superblock... |
| Phase 6 - check inode connectivity... |
| - resetting contents of realtime bitmap and summary |
| - ensuring existence of lost+found directory |
| - traversing filesystem starting at / ... |
| rebuilding directory inode 128 |
| - traversal finished ... |
| - traversing all unattached subtrees ... |
| - traversals finished ... |
| - moving disconnected inodes to lost+found ... |
| Phase 7 - verify and correct link counts... |
| done</programlisting></para> |
| </section> |
| <section> |
| <title>Dump the offending inodes</title> |
| <para><programlisting> |
| # xfs_db -c "inode 387655" -c "print" /dev/sda6 |
| core.magic = 0x494e |
| core.mode = 040755 |
| core.version = 1 |
| core.format = 1 (local) |
| core.nlinkv1 = 0 |
| ... |
| core.size = 6 |
| core.nblocks = 0 |
| core.extsize = 0 |
| core.nextents = 0 |
| ... |
| next_unlinked = null |
| u.sfdir2.hdr.count = 0 |
| u.sfdir2.hdr.i8count = 0 |
| u.sfdir2.hdr.parent.i4 = 135</programlisting></para> |
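| <para>When xfs_check reports many inodes, the dump commands can be generated from its output. The following is a minimal sketch, assuming the message format quoted in this chapter. The function name dump_commands is hypothetical, and the -r (read-only) option of xfs_db is added here as a precaution; it is not part of the command shown above.</para> |
| <para><programlisting language="python"> |
```python
import re

# Matches the xfs_check message format quoted in this chapter.
MISMATCH = re.compile(r"link count mismatch for inode (\d+) .*nlink (\d+), counted (\d+)")


def dump_commands(check_output, device):
    """Build one read-only xfs_db command line per inode named in check_output."""
    commands = []
    for line in check_output.splitlines():
        match = MISMATCH.search(line)
        if match:
            ino = match.group(1)
            commands.append(["xfs_db", "-r", "-c", f"inode {ino}", "-c", "print", device])
    return commands


sample = """\
link count mismatch for inode 387655 (name ?), nlink 0, counted 2
link count mismatch for inode 13313696 (name ?), nlink 0, counted 2
link count mismatch for inode 17197100 (name ?), nlink 0, counted 2
"""

# Print the commands rather than running them; /dev/sda6 is the device
# used in the example above.
for command in dump_commands(sample, "/dev/sda6"):
    print(" ".join(command))
```
| </programlisting></para> |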
| </section> |
| <section> |
| <title>Mount and Repair Fails – Corrupted Log</title> |
| <para>If the log is corrupted you will see an error like:</para> |
| <para><programlisting> |
| # mount &lt;filesystem&gt; |
| mount: Unknown error 990 |
| # dmesg | tail -20 |
| Filesystem "&lt;filesystem&gt;": xfs_inode_recover: Bad inode magic number . . . |
| Filesystem "dm-0": XFS internal error xlog_recover_do_inode_trans(1) at line 2352 Caller 0xffffffff88307729 |
| XFS: log mount/recovery failed: error 990 |
| XFS: log mount failed |
| # xfs_repair &lt;device&gt; |
| Phase 1 - find and verify superblock... |
| Phase 2 - using internal log |
| - zero log... |
| ERROR: The filesystem has valuable metadata changes in a log which needs to |
| be replayed. Mount the filesystem to replay the log, and unmount it before |
| re-running xfs_repair. If you are unable to mount the filesystem, then use |
| the -L option to destroy the log and attempt a repair. |
| Note that destroying the log may cause corruption -- please attempt a mount |
| of the filesystem before doing this.</programlisting></para> |
| <para>Useful information can be collected for triage:</para> |
| <para><programlisting> |
| # /usr/sbin/xfs_logprint -C &lt;filename&gt; &lt;device&gt; |
| # /usr/sbin/xfs_logprint -t &lt;device&gt;</programlisting></para> |
| <para>But in this case, the only option may be to throw the log away:</para> |
| <para><programlisting> |
| # xfs_repair -L &lt;device&gt;</programlisting></para> |
| <warning><para>Destroying the log throws away valuable metadata and may cause further corruption. |
| Attempt a mount before using -L, and use this option only as a last resort.</para></warning> |
| </section> |
| </chapter> |