| From 0d085a529b427d97710e6a41f8a4f23e1757cd12 Mon Sep 17 00:00:00 2001 |
| From: Dave Chinner <dchinner@redhat.com> |
| Date: Tue, 23 Sep 2014 15:36:27 +1000 |
| Subject: xfs: ensure WB_SYNC_ALL writeback handles partial pages correctly |
| |
| From: Dave Chinner <dchinner@redhat.com> |
| |
| commit 0d085a529b427d97710e6a41f8a4f23e1757cd12 upstream. |
| |
| XFS has been having trouble with stray delayed allocation extents |
| beyond EOF for a long time. Recent changes to the collapse range |
| code have triggered erroneous EBUSY errors on page invalidation for |
| filesystems with a block size smaller than the page size. These |
| have been caused by dirty buffers beyond EOF on a partial page which |
| do not get written to disk during a sync. |
| |
| The issue is that write-ahead in xfs_cluster_write() finds such a |
| partial page and handles it by leaving the page dirty but pushing it |
| into a writeback state. This used to work just fine, as the |
| write_cache_pages() code would then find the dirty partial page in |
| the next mapping tree lookup as the dirty tag is still set. |
| |
| Unfortunately, when we moved to a mark and sweep approach to |
| writeback to fix other writeback sync issues, we broke this. The |
| act of marking the page as under writeback now clears the TOWRITE |
| tag in the radix tree, even though the page is still dirty. As a |
| result, the next lookup on the mapping tree does not find the dirty |
| partial page and so doesn't try to write it again. |
| |
| This same writeback bug was found recently in ext4 and fixed in |
| commit 1c8349a ("ext4: fix data integrity sync in ordered mode") |
| without communication to the wider filesystem community. We can use |
| exactly the same fix here so the TOWRITE flag is not cleared on |
| partial page writes. |
| |
| cc: stable@vger.kernel.org # dependent on 1c8349a17137b93f0a83f276c764a6df1b9a116e |
| Root-cause-found-by: Brian Foster <bfoster@redhat.com> |
| Signed-off-by: Dave Chinner <dchinner@redhat.com> |
| Reviewed-by: Brian Foster <bfoster@redhat.com> |
| Signed-off-by: Dave Chinner <david@fromorbit.com> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| fs/xfs/xfs_aops.c | 16 ++++++++++++++-- |
| 1 file changed, 14 insertions(+), 2 deletions(-) |
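
Note for readers: the tag behaviour described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code. The `toy_*` names and the `struct toy_page` layout are hypothetical and exist only for illustration; the sketch collapses the radix tree to a single page and treats the TOWRITE tag as a boolean.

```c
#include <stdbool.h>

/* Hypothetical toy model of one page and its mapping tree tags. */
struct toy_page {
	bool dirty;     /* stands in for the dirty tag */
	bool towrite;   /* stands in for PAGECACHE_TAG_TOWRITE */
	bool writeback; /* page is under writeback */
};

/* Mark phase of a WB_SYNC_ALL sweep: dirty pages get the TOWRITE mark. */
static void toy_tag_for_writeback(struct toy_page *p)
{
	if (p->dirty)
		p->towrite = true;
}

/*
 * Put the page under writeback. With keep_write == false the TOWRITE
 * mark is dropped even if the page is still dirty, which is the
 * behaviour the commit message describes as the bug trigger.
 */
static void toy_set_writeback(struct toy_page *p, bool keep_write)
{
	p->writeback = true;
	if (!keep_write)
		p->towrite = false;
}

static void toy_end_writeback(struct toy_page *p)
{
	p->writeback = false;
}

/* Sweep phase lookup: only pages still marked TOWRITE are found. */
static bool toy_sweep_finds(const struct toy_page *p)
{
	return p->towrite;
}

/*
 * Model a partial page write: the page is left dirty, pushed through
 * writeback, and then looked up again by the same sweep.
 */
static bool toy_partial_write_then_lookup(bool keep_write)
{
	struct toy_page p = { .dirty = true };

	toy_tag_for_writeback(&p);
	toy_set_writeback(&p, keep_write); /* dirty flag is left set */
	toy_end_writeback(&p);
	return toy_sweep_finds(&p);
}
```

In this model, `toy_partial_write_then_lookup(false)` reports that the still-dirty page is not found again, while `toy_partial_write_then_lookup(true)` reports that it is, mirroring the switch to `set_page_writeback_keepwrite()` in the `else` branch of the patch.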
| |
| --- a/fs/xfs/xfs_aops.c |
| +++ b/fs/xfs/xfs_aops.c |
| @@ -434,10 +434,22 @@ xfs_start_page_writeback( |
| { |
| ASSERT(PageLocked(page)); |
| ASSERT(!PageWriteback(page)); |
| - if (clear_dirty) |
| + |
| + /* |
| + * if the page was not fully cleaned, we need to ensure that the higher |
| + * layers come back to it correctly. That means we need to keep the page |
| + * dirty, and for WB_SYNC_ALL writeback we need to ensure the |
| + * PAGECACHE_TAG_TOWRITE index mark is not removed so another attempt to |
| + * write this page in this writeback sweep will be made. |
| + */ |
| + if (clear_dirty) { |
| clear_page_dirty_for_io(page); |
| - set_page_writeback(page); |
| + set_page_writeback(page); |
| + } else |
| + set_page_writeback_keepwrite(page); |
| + |
| unlock_page(page); |
| + |
| /* If no buffers on the page are to be written, finish it here */ |
| if (!buffers) |
| end_page_writeback(page); |