| From 1e249cb5b7fc09ff216aa5a12f6c302e434e88f9 Mon Sep 17 00:00:00 2001 |
| From: Eric Biggers <ebiggers@google.com> |
| Date: Tue, 12 Jan 2021 11:02:43 -0800 |
| Subject: fs: fix lazytime expiration handling in __writeback_single_inode() |
| |
| From: Eric Biggers <ebiggers@google.com> |
| |
| commit 1e249cb5b7fc09ff216aa5a12f6c302e434e88f9 upstream. |
| |
| When lazytime is enabled and an inode is being written due to its |
| in-memory updated timestamps having expired, either due to a sync() or |
| syncfs() system call or due to dirtytime_expire_interval having elapsed, |
| the VFS needs to inform the filesystem so that the filesystem can copy |
| the inode's timestamps out to the on-disk data structures. |
| |
| This is done by __writeback_single_inode() calling |
| mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC). |
| |
| However, this occurs after __writeback_single_inode() has already |
| cleared the dirty flags from ->i_state. This causes two bugs: |
| |
| - mark_inode_dirty_sync() redirties the inode, causing it to remain |
| dirty. This wastefully causes the inode to be written twice. But |
| more importantly, it breaks cases where sync_filesystem() is expected |
| to clean dirty inodes. This includes the FS_IOC_REMOVE_ENCRYPTION_KEY |
| ioctl (as reported at |
| https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well |
| as possibly filesystem freezing (freeze_super()). |
| |
| - Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is |
| called from __writeback_single_inode() for lazytime expiration, |
| xfs_fs_dirty_inode() ignores the notification. (XFS only cares about |
| lazytime expirations, and it assumes that i_state will contain |
| I_DIRTY_TIME during those.) Therefore, lazy timestamps aren't |
| persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS. |
| |
| Fix this by moving the call to mark_inode_dirty_sync() to earlier in |
| __writeback_single_inode(), before the dirty flags are cleared from |
| i_state. This makes filesystems be properly notified of the timestamp |
| expiration, and it avoids incorrectly redirtying the inode. |
| |
| This fixes xfstest generic/580 (which tests |
| FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime |
| enabled. It also fixes the new lazytime xfstest I've proposed, which |
| reproduces the above-mentioned XFS bug |
| (https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org). |
| |
| Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly. But |
| due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the |
| right thing to do because mark_inode_dirty_sync() now knows not to move |
| the inode to a writeback list if it is currently queued for sync. |
| |
| Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option") |
| Cc: stable@vger.kernel.org |
| Depends-on: 5afced3bf281 ("writeback: Avoid skipping inode writeback") |
| Link: https://lore.kernel.org/r/20210112190253.64307-2-ebiggers@kernel.org |
| Suggested-by: Jan Kara <jack@suse.cz> |
| Reviewed-by: Christoph Hellwig <hch@lst.de> |
| Reviewed-by: Jan Kara <jack@suse.cz> |
| Signed-off-by: Eric Biggers <ebiggers@google.com> |
| Signed-off-by: Jan Kara <jack@suse.cz> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| fs/fs-writeback.c | 24 +++++++++++++----------- |
| 1 file changed, 13 insertions(+), 11 deletions(-) |
| |
| --- a/fs/fs-writeback.c |
| +++ b/fs/fs-writeback.c |
| @@ -1474,21 +1474,25 @@ __writeback_single_inode(struct inode *i |
| } |
| |
| /* |
| - * Some filesystems may redirty the inode during the writeback |
| - * due to delalloc, clear dirty metadata flags right before |
| - * write_inode() |
| + * If the inode has dirty timestamps and we need to write them, call |
| + * mark_inode_dirty_sync() to notify the filesystem about it and to |
| + * change I_DIRTY_TIME into I_DIRTY_SYNC. |
| */ |
| - spin_lock(&inode->i_lock); |
| - |
| - dirty = inode->i_state & I_DIRTY; |
| if ((inode->i_state & I_DIRTY_TIME) && |
| - ((dirty & I_DIRTY_INODE) || |
| - wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync || |
| + (wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync || |
| time_after(jiffies, inode->dirtied_time_when + |
| dirtytime_expire_interval * HZ))) { |
| - dirty |= I_DIRTY_TIME; |
| trace_writeback_lazytime(inode); |
| + mark_inode_dirty_sync(inode); |
| } |
| + |
| + /* |
| + * Some filesystems may redirty the inode during the writeback |
| + * due to delalloc, clear dirty metadata flags right before |
| + * write_inode() |
| + */ |
| + spin_lock(&inode->i_lock); |
| + dirty = inode->i_state & I_DIRTY; |
| inode->i_state &= ~dirty; |
| |
| /* |
| @@ -1509,8 +1513,6 @@ __writeback_single_inode(struct inode *i |
| |
| spin_unlock(&inode->i_lock); |
| |
| - if (dirty & I_DIRTY_TIME) |
| - mark_inode_dirty_sync(inode); |
| /* Don't write the inode if only I_DIRTY_PAGES was set */ |
| if (dirty & ~I_DIRTY_PAGES) { |
| int err = write_inode(inode, wbc); |