| From 8d610a1b1bbbe314d20fddea587d397140a1732f Mon Sep 17 00:00:00 2001 |
| From: Brian Foster <bfoster@redhat.com> |
| Date: Fri, 27 Jan 2017 23:22:57 -0800 |
| Subject: [PATCH] xfs: fix eofblocks race with file extending async dio writes |
| |
| commit e4229d6b0bc9280f29624faf170cf76a9f1ca60e upstream. |
| |
| It's possible for post-eof blocks to end up being used for direct I/O |
| writes. dio write performs an upfront unwritten extent allocation, sends |
| the dio and then updates the inode size (if necessary) on write |
| completion. If a file release occurs while a file extending dio write is |
| in flight, it is possible to mistake the post-eof blocks for speculative |
| preallocation and incorrectly truncate them from the inode. This means |
| that the resulting dio write completion can discover a hole and allocate |
| new blocks rather than perform unwritten extent conversion. |
| |
| This requires a strange mix of I/O and is thus not likely to reproduce |
| in real world workloads. It is intermittently reproduced by generic/299. |
| The error manifests as an assert failure due to transaction overrun |
| because the aforementioned write completion transaction has only |
| reserved enough blocks for btree operations: |
| |
| XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, \ |
| file: fs/xfs//xfs_trans.c, line: 309 |
| |
| The root cause is that xfs_free_eofblocks() uses i_size to truncate |
| post-eof blocks from the inode, but async, file extending direct writes |
| do not update i_size until write completion, long after inode locks are |
| dropped. Therefore, xfs_free_eofblocks() effectively truncates the inode |
| to the incorrect size. |
| |
| Update xfs_free_eofblocks() to serialize against dio similar to how |
| extending writes are serialized against i_size updates before post-eof |
| block zeroing. Specifically, wait on dio while under the iolock. This |
| ensures that dio write completions have updated i_size before post-eof |
| blocks are processed. |
| |
| Signed-off-by: Brian Foster <bfoster@redhat.com> |
| Reviewed-by: Christoph Hellwig <hch@lst.de> |
| Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> |
| Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
| Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> |
| |
| diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c |
| index 191b77d84f9b..ec5d281f978e 100644 |
| --- a/fs/xfs/xfs_bmap_util.c |
| +++ b/fs/xfs/xfs_bmap_util.c |
| @@ -820,6 +820,9 @@ xfs_free_eofblocks( |
| if (error) |
| return error; |
| |
| + /* wait on dio to ensure i_size has settled */ |
| + inode_dio_wait(VFS_I(ip)); |
| + |
| error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, |
| &tp); |
| if (error) { |
| -- |
| 2.12.0 |
| |