| From 6a18243d889fa2721d6832a171632a8c78720e9e Mon Sep 17 00:00:00 2001 |
| From: Josef Bacik <josef@toxicpanda.com> |
| Date: Fri, 3 Jan 2020 10:38:44 -0500 |
| Subject: [PATCH] btrfs: fix improper setting of scanned for range cyclic write |
| cache pages |
| |
| commit 556755a8a99be8ca3cd9fbe36aaf9b3b0339a00d upstream. |
| |
| We noticed that we were having regular cgroup OOM kills in cases where
| there were still enough dirty pages to avoid OOMing. It turned out
| there is a corner case in btrfs's handling of range_cyclic writeback
| where files that were being redirtied were not getting fully written
| out.
| |
| We were unconditionally setting scanned = 1 the first time we found any
| pages in the inode. This isn't actually what we want; we want it to be
| set only once we've scanned the entire file. With range_cyclic we could
| be starting in the middle or towards the end of the file, so we could
| write one page and then skip all of the other dirty pages in the file
| because we set scanned = 1.
| |
| Fix this by not setting scanned = 1 simply because we found pages. The
| rules for setting scanned should be as follows (see the sketch after
| the list):
| |
| 1) !range_cyclic. In this case we have a specified range to write out. |
| 2) range_cyclic && index == 0. In this case we've started at the |
| beginning and there is no need to loop around a second time. |
| 3) range_cyclic && we started at index > 0 and we've reached the end of |
| the file without satisfying our nr_to_write. |
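|
| As a rough illustration only (not part of the patch), here is a
| minimal userspace model of how the fixed logic behaves. The names
| toy_writeback, dirty[] and NR_PAGES are made up, and rule 1 (an
| explicit, non-cyclic range) is omitted for brevity:
|
|   /* Toy model of the range_cyclic "scanned" rules; not kernel code. */
|   #include <stdbool.h>
|   #include <stdio.h>
|
|   #define NR_PAGES 8
|
|   /* Write back dirty pages starting at *index, wrapping at most once. */
|   static int toy_writeback(bool dirty[NR_PAGES], unsigned int *index)
|   {
|           /* Rule 2: starting at index 0 covers the file in one pass. */
|           bool scanned = (*index == 0);
|           int written = 0;
|
|   retry:
|           for (unsigned int i = *index; i < NR_PAGES; i++) {
|                   if (dirty[i]) {
|                           dirty[i] = false;
|                           written++;
|                   }
|           }
|           if (!scanned) {
|                   /*
|                    * Rule 3: we started at index > 0 and reached the end
|                    * of the file; wrap around once to pick up the dirty
|                    * pages we skipped on the first pass.
|                    */
|                   scanned = true;
|                   *index = 0;
|                   goto retry;
|           }
|           return written;
|   }
|
|   int main(void)
|   {
|           bool dirty[NR_PAGES] = { 1, 1, 0, 0, 1, 0, 1, 0 };
|           unsigned int index = 5; /* as if writeback_index left off here */
|
|           printf("wrote %d pages\n", toy_writeback(dirty, &index));
|           return 0;
|   }
|
| With the old behaviour, scanned would be set as soon as the first pass
| found any pages, so the wrap-around would never happen and pages 0, 1
| and 4 in this example would stay dirty.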
| |
| This patch fixes both of our writepages implementations to make sure
| these rules hold true. This fixed our overzealous cgroup OOMs in
| production.
| |
| Fixes: d1310b2e0cd9 ("Btrfs: Split the extent_map code into two parts") |
| Signed-off-by: Josef Bacik <josef@toxicpanda.com> |
| Reviewed-by: David Sterba <dsterba@suse.com> |
| [ add comment ] |
| Signed-off-by: David Sterba <dsterba@suse.com> |
| Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> |
| |
| diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c |
| index 50bb700672f3..9ea710982438 100644 |
| --- a/fs/btrfs/extent_io.c |
| +++ b/fs/btrfs/extent_io.c |
| @@ -3941,6 +3941,11 @@ int btree_write_cache_pages(struct address_space *mapping, |
| if (wbc->range_cyclic) { |
| index = mapping->writeback_index; /* Start from prev offset */ |
| end = -1; |
| + /* |
| + * Starting from the beginning does not need to cycle over the
| + * range, so mark it as scanned.
| + */ |
| + scanned = (index == 0); |
| } else { |
| index = wbc->range_start >> PAGE_SHIFT; |
| end = wbc->range_end >> PAGE_SHIFT; |
| @@ -3958,7 +3963,6 @@ int btree_write_cache_pages(struct address_space *mapping, |
| tag))) { |
| unsigned i; |
| |
| - scanned = 1; |
| for (i = 0; i < nr_pages; i++) { |
| struct page *page = pvec.pages[i]; |
| |
| @@ -4083,6 +4087,11 @@ static int extent_write_cache_pages(struct address_space *mapping, |
| if (wbc->range_cyclic) { |
| index = mapping->writeback_index; /* Start from prev offset */ |
| end = -1; |
| + /* |
| + * Starting from the beginning does not need to cycle over the
| + * range, so mark it as scanned.
| + */ |
| + scanned = (index == 0); |
| } else { |
| index = wbc->range_start >> PAGE_SHIFT; |
| end = wbc->range_end >> PAGE_SHIFT; |
| @@ -4116,7 +4125,6 @@ static int extent_write_cache_pages(struct address_space *mapping, |
| &index, end, tag))) { |
| unsigned i; |
| |
| - scanned = 1; |
| for (i = 0; i < nr_pages; i++) { |
| struct page *page = pvec.pages[i]; |
| |
| -- |
| 2.7.4 |
| |