| From 63660bff439adc4f212c58af84485e108bcaa84f Mon Sep 17 00:00:00 2001 |
| From: Tejun Heo <tj@kernel.org> |
| Date: Thu, 3 Oct 2019 07:27:13 -0700 |
| Subject: [PATCH] btrfs: Avoid getting stuck during cyclic writebacks |
| |
| commit f7bddf1e27d18fbc7d3e3056ba449cfbe4e20b0a upstream. |
| |
| During a cyclic writeback, extent_write_cache_pages() uses done_index |
| to update the writeback_index after the current run is over. However, |
| instead of current index + 1, it gets to to the current index itself. |
| |
| Unfortunately, this, combined with returning on EOF instead of looping |
| back, can lead to the following pathlogical behavior. |
| |
| 1. There is a single file which has accumulated enough dirty pages to |
| trigger balance_dirty_pages() and the writer appending to the file |
| with a series of short writes. |
| |
| 2. balance_dirty_pages kicks in, wakes up background writeback and sleeps. |
| |
| 3. Writeback kicks in and the cursor is on the last page of the dirty |
| file. Writeback is started or skipped if already in progress. As |
| it's EOF, extent_write_cache_pages() returns and the cursor is set |
| to done_index which is pointing to the last page. |
| |
| 4. Writeback is done. Nothing happens till balance_dirty_pages |
| finishes, at which point we go back to #1. |
| |
| This can almost completely stall out writing back of the file and keep |
| the system over dirty threshold for a long time which can mess up the |
| whole system. We encountered this issue in production with a package |
| handling application which can reliably reproduce the issue when |
| running under tight memory limits. |
| |
| Reading the comment in the error handling section, this seems to be to |
| avoid accidentally skipping a page in case the write attempt on the |
| page doesn't succeed. However, this concern seems bogus. |
| |
| On each page, the code either: |
| |
| * Skips and moves onto the next page. |
| |
| * Fails issue and sets done_index to index + 1. |
| |
| * Successfully issues and continue to the next page if budget allows |
| and not EOF. |
| |
| IOW, as long as it's not EOF and there's budget, the code never |
| retries writing back the same page. Only when a page happens to be |
| the last page of a particular run, we end up retrying the page, which |
| can't possibly guarantee anything data integrity related. Besides, |
| cyclic writes are only used for non-syncing writebacks meaning that |
| there's no data integrity implication to begin with. |
| |
| Fix it by always setting done_index past the current page being |
| processed. |
| |
| Note that this problem exists in other writepages too. |
| |
| CC: stable@vger.kernel.org # 4.19+ |
| Signed-off-by: Tejun Heo <tj@kernel.org> |
| Reviewed-by: David Sterba <dsterba@suse.com> |
| Signed-off-by: David Sterba <dsterba@suse.com> |
| Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> |
| |
| diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c |
| index 0d16ed19e84b..6b70bb3ef862 100644 |
| --- a/fs/btrfs/extent_io.c |
| +++ b/fs/btrfs/extent_io.c |
| @@ -4080,7 +4080,7 @@ static int extent_write_cache_pages(struct address_space *mapping, |
| for (i = 0; i < nr_pages; i++) { |
| struct page *page = pvec.pages[i]; |
| |
| - done_index = page->index; |
| + done_index = page->index + 1; |
| /* |
| * At this point we hold neither the i_pages lock nor |
| * the page lock: the page may be truncated or |
| @@ -4115,16 +4115,6 @@ static int extent_write_cache_pages(struct address_space *mapping, |
| |
| ret = __extent_writepage(page, wbc, epd); |
| if (ret < 0) { |
| - /* |
| - * done_index is set past this page, |
| - * so media errors will not choke |
| - * background writeout for the entire |
| - * file. This has consequences for |
| - * range_cyclic semantics (ie. it may |
| - * not be suitable for data integrity |
| - * writeout). |
| - */ |
| - done_index = page->index + 1; |
| done = 1; |
| break; |
| } |
| -- |
| 2.7.4 |
| |