release/5.2.31/btrfs-Avoid-getting-stuck-during-cyclic-writebacks.patch - pub/scm/linux/kernel/git/paulg/longterm-queue-5.2 - Git at Google

 From 63660bff439adc4f212c58af84485e108bcaa84f Mon Sep 17 00:00:00 2001
 From: Tejun Heo <tj@kernel.org>
 Date: Thu, 3 Oct 2019 07:27:13 -0700
 Subject: [PATCH] btrfs: Avoid getting stuck during cyclic writebacks

 commit f7bddf1e27d18fbc7d3e3056ba449cfbe4e20b0a upstream.

 During a cyclic writeback, extent_write_cache_pages() uses done_index
 to update the writeback_index after the current run is over.  However,
 instead of current index + 1, it gets to to the current index itself.

 Unfortunately, this, combined with returning on EOF instead of looping
 back, can lead to the following pathlogical behavior.

 1. There is a single file which has accumulated enough dirty pages to
    trigger balance_dirty_pages() and the writer appending to the file
    with a series of short writes.

 2. balance_dirty_pages kicks in, wakes up background writeback and sleeps.

 3. Writeback kicks in and the cursor is on the last page of the dirty
    file.  Writeback is started or skipped if already in progress.  As
    it's EOF, extent_write_cache_pages() returns and the cursor is set
    to done_index which is pointing to the last page.

 4. Writeback is done.  Nothing happens till balance_dirty_pages
    finishes, at which point we go back to #1.

 This can almost completely stall out writing back of the file and keep
 the system over dirty threshold for a long time which can mess up the
 whole system.  We encountered this issue in production with a package
 handling application which can reliably reproduce the issue when
 running under tight memory limits.

 Reading the comment in the error handling section, this seems to be to
 avoid accidentally skipping a page in case the write attempt on the
 page doesn't succeed.  However, this concern seems bogus.

 On each page, the code either:

 * Skips and moves onto the next page.

 * Fails issue and sets done_index to index + 1.

 * Successfully issues and continue to the next page if budget allows
   and not EOF.

 IOW, as long as it's not EOF and there's budget, the code never
 retries writing back the same page.  Only when a page happens to be
 the last page of a particular run, we end up retrying the page, which
 can't possibly guarantee anything data integrity related.  Besides,
 cyclic writes are only used for non-syncing writebacks meaning that
 there's no data integrity implication to begin with.

 Fix it by always setting done_index past the current page being
 processed.

 Note that this problem exists in other writepages too.

 CC: stable@vger.kernel.org # 4.19+
 Signed-off-by: Tejun Heo <tj@kernel.org>
 Reviewed-by: David Sterba <dsterba@suse.com>
 Signed-off-by: David Sterba <dsterba@suse.com>
 Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

 diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
 index 0d16ed19e84b..6b70bb3ef862 100644
 --- a/fs/btrfs/extent_io.c
 +++ b/fs/btrfs/extent_io.c
 @@ -4080,7 +4080,7 @@ static int extent_write_cache_pages(struct address_space *mapping,
  		for (i = 0; i < nr_pages; i++) {
  			struct page *page = pvec.pages[i];

 -			done_index = page->index;
 +			done_index = page->index + 1;
  			/*
  			 * At this point we hold neither the i_pages lock nor
  			 * the page lock: the page may be truncated or
 @@ -4115,16 +4115,6 @@ static int extent_write_cache_pages(struct address_space *mapping,

  			ret = __extent_writepage(page, wbc, epd);
  			if (ret < 0) {
 -				/*
 -				 * done_index is set past this page,
 -				 * so media errors will not choke
 -				 * background writeout for the entire
 -				 * file. This has consequences for
 -				 * range_cyclic semantics (ie. it may
 -				 * not be suitable for data integrity
 -				 * writeout).
 -				 */
 -				done_index = page->index + 1;
  				done = 1;
  				break;
  			}
 --
 2.7.4
	From 63660bff439adc4f212c58af84485e108bcaa84f Mon Sep 17 00:00:00 2001
	From: Tejun Heo <tj@kernel.org>
	Date: Thu, 3 Oct 2019 07:27:13 -0700
	Subject: [PATCH] btrfs: Avoid getting stuck during cyclic writebacks

	commit f7bddf1e27d18fbc7d3e3056ba449cfbe4e20b0a upstream.

	During a cyclic writeback, extent_write_cache_pages() uses done_index
	to update the writeback_index after the current run is over. However,
	instead of current index + 1, it gets to to the current index itself.

	Unfortunately, this, combined with returning on EOF instead of looping
	back, can lead to the following pathlogical behavior.

	1. There is a single file which has accumulated enough dirty pages to
	trigger balance_dirty_pages() and the writer appending to the file
	with a series of short writes.

	2. balance_dirty_pages kicks in, wakes up background writeback and sleeps.

	3. Writeback kicks in and the cursor is on the last page of the dirty
	file. Writeback is started or skipped if already in progress. As
	it's EOF, extent_write_cache_pages() returns and the cursor is set
	to done_index which is pointing to the last page.

	4. Writeback is done. Nothing happens till balance_dirty_pages
	finishes, at which point we go back to #1.

	This can almost completely stall out writing back of the file and keep
	the system over dirty threshold for a long time which can mess up the
	whole system. We encountered this issue in production with a package
	handling application which can reliably reproduce the issue when
	running under tight memory limits.

	Reading the comment in the error handling section, this seems to be to
	avoid accidentally skipping a page in case the write attempt on the
	page doesn't succeed. However, this concern seems bogus.

	On each page, the code either:

	* Skips and moves onto the next page.

	* Fails issue and sets done_index to index + 1.

	* Successfully issues and continue to the next page if budget allows
	and not EOF.

	IOW, as long as it's not EOF and there's budget, the code never
	retries writing back the same page. Only when a page happens to be
	the last page of a particular run, we end up retrying the page, which
	can't possibly guarantee anything data integrity related. Besides,
	cyclic writes are only used for non-syncing writebacks meaning that
	there's no data integrity implication to begin with.

	Fix it by always setting done_index past the current page being
	processed.

	Note that this problem exists in other writepages too.

	CC: stable@vger.kernel.org # 4.19+
	Signed-off-by: Tejun Heo <tj@kernel.org>
	Reviewed-by: David Sterba <dsterba@suse.com>
	Signed-off-by: David Sterba <dsterba@suse.com>
	Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

	diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
	index 0d16ed19e84b..6b70bb3ef862 100644
	--- a/fs/btrfs/extent_io.c
	+++ b/fs/btrfs/extent_io.c
	@@ -4080,7 +4080,7 @@ static int extent_write_cache_pages(struct address_space *mapping,
	for (i = 0; i < nr_pages; i++) {
	struct page *page = pvec.pages[i];

	- done_index = page->index;
	+ done_index = page->index + 1;
	/*
	* At this point we hold neither the i_pages lock nor
	* the page lock: the page may be truncated or
	@@ -4115,16 +4115,6 @@ static int extent_write_cache_pages(struct address_space *mapping,

	ret = __extent_writepage(page, wbc, epd);
	if (ret < 0) {
	- /*
	- * done_index is set past this page,
	- * so media errors will not choke
	- * background writeout for the entire
	- * file. This has consequences for
	- * range_cyclic semantics (ie. it may
	- * not be suitable for data integrity
	- * writeout).
	- */
	- done_index = page->index + 1;
	done = 1;
	break;
	}
	--
	2.7.4