| From 06bd3c36a733ac27962fea7d6f47168841376824 Mon Sep 17 00:00:00 2001 |
| From: Jan Kara <jack@suse.cz> |
| Date: Sun, 24 Apr 2016 00:56:03 -0400 |
| Subject: ext4: fix data exposure after a crash |
| |
| From: Jan Kara <jack@suse.cz> |
| |
| commit 06bd3c36a733ac27962fea7d6f47168841376824 upstream. |
| |
| Huang has reported that in his powerfail testing he is seeing stale |
| block contents in some of recently allocated blocks although he mounts |
| ext4 in data=ordered mode. After some investigation I have found out |
| that indeed when delayed allocation is used, we don't add inode to |
| transaction's list of inodes needing flushing before commit. Originally |
| we were doing that but commit f3b59291a69d removed the logic with a |
| flawed argument that it is not needed. |
| |
| The problem is that although for delayed allocated blocks we write their |
| contents immediately after allocating them, there is no guarantee that |
| the IO scheduler or device doesn't reorder things and thus transaction |
| allocating blocks and attaching them to inode can reach stable storage |
| before actual block contents. Actually whenever we attach freshly |
| allocated blocks to inode using a written extent, we should add inode to |
| transaction's ordered inode list to make sure we properly wait for block |
| contents to be written before committing the transaction. So that is |
| what we do in this patch. This also handles other cases where stale data |
| exposure was possible - like filling hole via mmap in |
| data=ordered,nodelalloc mode. |
| |
| The only exception to the above rule are extending direct IO writes where |
| blkdev_direct_IO() waits for IO to complete before increasing i_size and |
| thus stale data exposure is not possible. For now we don't complicate |
| the code with optimizing this special case since the overhead is pretty |
| low. In case this is observed to be a performance problem we can always |
| handle it using a special flag to ext4_map_blocks(). |
| |
| Fixes: f3b59291a69d0b734be1fc8be489fef2dd846d3d |
| Reported-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com> |
| Tested-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com> |
| Signed-off-by: Jan Kara <jack@suse.cz> |
| Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| fs/ext4/inode.c | 24 +++++++++++++++--------- |
| 1 file changed, 15 insertions(+), 9 deletions(-) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -684,6 +684,21 @@ out_sem: |
| ret = check_block_validity(inode, map); |
| if (ret != 0) |
| return ret; |
| + |
| + /* |
| + * Inodes with freshly allocated blocks where contents will be |
| + * visible after transaction commit must be on transaction's |
| + * ordered data list. |
| + */ |
| + if (map->m_flags & EXT4_MAP_NEW && |
| + !(map->m_flags & EXT4_MAP_UNWRITTEN) && |
| + !(flags & EXT4_GET_BLOCKS_ZERO) && |
| + !IS_NOQUOTA(inode) && |
| + ext4_should_order_data(inode)) { |
| + ret = ext4_jbd2_file_inode(handle, inode); |
| + if (ret) |
| + return ret; |
| + } |
| } |
| return retval; |
| } |
| @@ -1289,15 +1304,6 @@ static int ext4_write_end(struct file *f |
| int i_size_changed = 0; |
| |
| trace_ext4_write_end(inode, pos, len, copied); |
| - if (ext4_test_inode_state(inode, EXT4_STATE_ORDERED_MODE)) { |
| - ret = ext4_jbd2_file_inode(handle, inode); |
| - if (ret) { |
| - unlock_page(page); |
| - put_page(page); |
| - goto errout; |
| - } |
| - } |
| - |
| if (ext4_has_inline_data(inode)) { |
| ret = ext4_write_inline_data_end(inode, pos, len, |
| copied, page); |