| From foo@baz Mon Dec 18 15:03:25 CET 2017 |
| From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> |
| Date: Fri, 10 Mar 2017 16:45:44 -0500 |
| Subject: btrfs: add missing memset while reading compressed inline extents |
| |
| From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> |
| |
| |
| [ Upstream commit e1699d2d7bf6e6cce3e1baff19f9dd4595a58664 ] |
| |
| This is a story about 4 distinct (and very old) btrfs bugs. |
| |
| Commit c8b978188c ("Btrfs: Add zlib compression support") added |
| three data corruption bugs for inline extents (bugs #1-3). |
| |
| Commit 93c82d5750 ("Btrfs: zero page past end of inline file items") |
| fixed bug #1: uncompressed inline extents followed by a hole and more |
| extents could get non-zero data in the hole as they were read. The fix |
| was to add a memset in btrfs_get_extent to zero out the hole. |
| |
| Commit 166ae5a418 ("btrfs: fix inline compressed read err corruption") |
| fixed bug #2: compressed inline extents which contained non-zero bytes |
| might be replaced with zero bytes in some cases. This patch removed an |
| unhelpful memset from uncompress_inline, but the case where memset is |
| required was missed. |
| |
| There is also a memset in the decompression code, but this only covers |
| decompressed data that is shorter than the ram_bytes from the extent |
| ref record. This memset doesn't cover the region between the end of the |
| decompressed data and the end of the page. It has also moved around a |
| few times over the years, so there's no single patch to refer to. |
| |
| This patch fixes bug #3: compressed inline extents followed by a hole |
| and more extents could get non-zero data in the hole as they were read |
| (i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/). |
| The fix is the same: zero out the hole in the compressed case too, |
| by putting a memset back in uncompress_inline, but this time with |
| correct parameters. |
| |
| The last and oldest bug, bug #0, is the cause of the offending inline |
| extent/hole/extent pattern. Bug #0 is a subtle and mostly-harmless quirk |
| of behavior somewhere in the btrfs write code. In a few special cases, |
| an inline extent and hole are allowed to persist where they normally |
| would be combined with later extents in the file. |
| |
| A fast reproducer for bug #0 is presented below. A few offending extents |
| are also created in the wild during large rsync transfers with the -S |
| flag. A Linux kernel build (git checkout; make allyesconfig; make -j8) |
| will produce a handful of offending files as well. Once an offending |
| file is created, it can present different content to userspace each |
| time it is read. |
| |
| Bug #0 is at least 4 and possibly 8 years old. I verified every vX.Y |
| kernel back to v3.5 has this behavior. There are fossil records of this |
| bug's effects in commits all the way back to v2.6.32. I have no reason |
| to believe bug #0 wasn't present at the beginning of btrfs compression |
| support in v2.6.29, but I can't easily test kernels that old to be sure. |
| |
| It is not clear whether bug #0 is worth fixing. A fix would likely |
| require injecting extra reads into currently write-only paths, and most |
| of the exceptional cases caused by bug #0 are already handled now. |
| |
| Whether we like them or not, bug #0's inline extents followed by holes |
| are part of the btrfs de-facto disk format now, and we need to be able |
| to read them without data corruption or an infoleak. So enough about |
| bug #0, let's get back to bug #3 (this patch). |
| |
| An example of on-disk structure leading to data corruption found in |
| the wild: |
| |
| item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160 |
| inode generation 50 transid 50 size 47424 nbytes 49141 |
| block group 0 mode 100644 links 1 uid 0 gid 0 |
| rdev 0 flags 0x0(none) |
| item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20 |
| inode ref index 3 namelen 10 name: DB_File.so |
| item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362 |
| inline extent data size 1341 ram 4085 compress(zlib) |
| item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53 |
| extent data disk byte 5367308288 nr 20480 |
| extent data offset 0 nr 45056 ram 45056 |
| extent compression(zlib) |
| |
| Different data appears in userspace during each read of the 11 bytes |
| between 4085 and 4096. The extent in item 63 is not long enough to |
| fill the first page of the file, so a memset is required to fill the |
| space between item 63 (ending at 4085) and item 64 (beginning at 4096) |
| with zero. |
| |
| Here is a reproducer from Liu Bo, which demonstrates another method |
| of creating the same inline extent and hole pattern: |
| |
| Using 'page_poison=on' kernel command line (or enable |
| CONFIG_PAGE_POISONING) run the following: |
| |
| # touch foo |
| # chattr +c foo |
| # xfs_io -f -c "pwrite -W 0 1000" foo |
| # xfs_io -f -c "falloc 4 8188" foo |
| # od -x foo |
| # echo 3 >/proc/sys/vm/drop_caches |
| # od -x foo |
| |
| This produce the following on my box: |
| |
| Correct output: file contains 1000 data bytes followed |
| by zeros: |
| |
| 0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd |
| * |
| 0001740 cdcd cdcd cdcd cdcd 0000 0000 0000 0000 |
| 0001760 0000 0000 0000 0000 0000 0000 0000 0000 |
| * |
| 0020000 |
| |
| Actual output: the data after the first 1000 bytes |
| will be different each run: |
| |
| 0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd |
| * |
| 0001740 cdcd cdcd cdcd cdcd 6c63 7400 635f 006d |
| 0001760 5f74 6f43 7400 435f 0053 5f74 7363 7400 |
| 0002000 435f 0056 5f74 6164 7400 645f 0062 5f74 |
| (...) |
| |
| Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> |
| Reviewed-by: Liu Bo <bo.li.liu@oracle.com> |
| Reviewed-by: Chris Mason <clm@fb.com> |
| Signed-off-by: Chris Mason <clm@fb.com> |
| Signed-off-by: Sasha Levin <alexander.levin@verizon.com> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| --- |
| fs/btrfs/inode.c | 14 ++++++++++++++ |
| 1 file changed, 14 insertions(+) |
| |
| --- a/fs/btrfs/inode.c |
| +++ b/fs/btrfs/inode.c |
| @@ -6325,6 +6325,20 @@ static noinline int uncompress_inline(st |
| max_size = min_t(unsigned long, PAGE_CACHE_SIZE, max_size); |
| ret = btrfs_decompress(compress_type, tmp, page, |
| extent_offset, inline_size, max_size); |
| + |
| + /* |
| + * decompression code contains a memset to fill in any space between the end |
| + * of the uncompressed data and the end of max_size in case the decompressed |
| + * data ends up shorter than ram_bytes. That doesn't cover the hole between |
| + * the end of an inline extent and the beginning of the next block, so we |
| + * cover that region here. |
| + */ |
| + |
| + if (max_size + pg_offset < PAGE_SIZE) { |
| + char *map = kmap(page); |
| + memset(map + pg_offset + max_size, 0, PAGE_SIZE - max_size - pg_offset); |
| + kunmap(page); |
| + } |
| kfree(tmp); |
| return ret; |
| } |