releases/4.4.140/dm-bufio-avoid-sleeping-while-holding-the-dm_bufio-lock.patch - pub/scm/linux/kernel/git/stable/stable-queue - Git at Google

 From 9ea61cac0b1ad0c09022f39fd97e9b99a2cfc2dc Mon Sep 17 00:00:00 2001
 From: Douglas Anderson <dianders@chromium.org>
 Date: Thu, 17 Nov 2016 11:24:20 -0800
 Subject: dm bufio: avoid sleeping while holding the dm_bufio lock

 From: Douglas Anderson <dianders@chromium.org>

 commit 9ea61cac0b1ad0c09022f39fd97e9b99a2cfc2dc upstream.

 We've seen in-field reports showing _lots_ (18 in one case, 41 in
 another) of tasks all sitting there blocked on:

   mutex_lock+0x4c/0x68
   dm_bufio_shrink_count+0x38/0x78
   shrink_slab.part.54.constprop.65+0x100/0x464
   shrink_zone+0xa8/0x198

 In the two cases analyzed, we see one task that looks like this:

   Workqueue: kverityd verity_prefetch_io

   __switch_to+0x9c/0xa8
   __schedule+0x440/0x6d8
   schedule+0x94/0xb4
   schedule_timeout+0x204/0x27c
   schedule_timeout_uninterruptible+0x44/0x50
   wait_iff_congested+0x9c/0x1f0
   shrink_inactive_list+0x3a0/0x4cc
   shrink_lruvec+0x418/0x5cc
   shrink_zone+0x88/0x198
   try_to_free_pages+0x51c/0x588
   __alloc_pages_nodemask+0x648/0xa88
   __get_free_pages+0x34/0x7c
   alloc_buffer+0xa4/0x144
   __bufio_new+0x84/0x278
   dm_bufio_prefetch+0x9c/0x154
   verity_prefetch_io+0xe8/0x10c
   process_one_work+0x240/0x424
   worker_thread+0x2fc/0x424
   kthread+0x10c/0x114

 ...and that looks to be the one holding the mutex.

 The problem has been reproduced on fairly easily:
 0. Be running Chrome OS w/ verity enabled on the root filesystem
 1. Pick test patch: http://crosreview.com/412360
 2. Install launchBalloons.sh and balloon.arm from
      http://crbug.com/468342
    ...that's just a memory stress test app.
 3. On a 4GB rk3399 machine, run
      nice ./launchBalloons.sh 4 900 100000
    ...that tries to eat 4 * 900 MB of memory and keep accessing.
 4. Login to the Chrome web browser and restore many tabs

 With that, I've seen printouts like:
   DOUG: long bufio 90758 ms
 ...and stack trace always show's we're in dm_bufio_prefetch().

 The problem is that we try to allocate memory with GFP_NOIO while
 we're holding the dm_bufio lock.  Instead we should be using
 GFP_NOWAIT.  Using GFP_NOIO can cause us to sleep while holding the
 lock and that causes the above problems.

 The current behavior explained by David Rientjes:

   It will still try reclaim initially because __GFP_WAIT (or
   __GFP_KSWAPD_RECLAIM) is set by GFP_NOIO.  This is the cause of
   contention on dm_bufio_lock() that the thread holds.  You want to
   pass GFP_NOWAIT instead of GFP_NOIO to alloc_buffer() when holding a
   mutex that can be contended by a concurrent slab shrinker (if
   count_objects didn't use a trylock, this pattern would trivially
   deadlock).

 This change significantly increases responsiveness of the system while
 in this state.  It makes a real difference because it unblocks kswapd.
 In the bug report analyzed, kswapd was hung:

    kswapd0         D ffffffc000204fd8     0    72      2 0x00000000
    Call trace:
    [<ffffffc000204fd8>] __switch_to+0x9c/0xa8
    [<ffffffc00090b794>] __schedule+0x440/0x6d8
    [<ffffffc00090bac0>] schedule+0x94/0xb4
    [<ffffffc00090be44>] schedule_preempt_disabled+0x28/0x44
    [<ffffffc00090d900>] __mutex_lock_slowpath+0x120/0x1ac
    [<ffffffc00090d9d8>] mutex_lock+0x4c/0x68
    [<ffffffc000708e7c>] dm_bufio_shrink_count+0x38/0x78
    [<ffffffc00030b268>] shrink_slab.part.54.constprop.65+0x100/0x464
    [<ffffffc00030dbd8>] shrink_zone+0xa8/0x198
    [<ffffffc00030e578>] balance_pgdat+0x328/0x508
    [<ffffffc00030eb7c>] kswapd+0x424/0x51c
    [<ffffffc00023f06c>] kthread+0x10c/0x114
    [<ffffffc000203dd0>] ret_from_fork+0x10/0x40

 By unblocking kswapd memory pressure should be reduced.

 Suggested-by: David Rientjes <rientjes@google.com>
 Reviewed-by: Guenter Roeck <linux@roeck-us.net>
 Signed-off-by: Douglas Anderson <dianders@chromium.org>
 Signed-off-by: Mike Snitzer <snitzer@redhat.com>
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 ---
  drivers/md/dm-bufio.c |    5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

 --- a/drivers/md/dm-bufio.c
 +++ b/drivers/md/dm-bufio.c
 @@ -818,7 +818,8 @@ static struct dm_buffer *__alloc_buffer_
  	 * dm-bufio is resistant to allocation failures (it just keeps
  	 * one buffer reserved in cases all the allocations fail).
  	 * So set flags to not try too hard:
 -	 *	GFP_NOIO: don't recurse into the I/O layer
 +	 *	GFP_NOWAIT: don't wait; if we need to sleep we'll release our
 +	 *		    mutex and wait ourselves.
  	 *	__GFP_NORETRY: don't retry and rather return failure
  	 *	__GFP_NOMEMALLOC: don't use emergency reserves
  	 *	__GFP_NOWARN: don't print a warning in case of failure
 @@ -828,7 +829,7 @@ static struct dm_buffer *__alloc_buffer_
  	 */
  	while (1) {
  		if (dm_bufio_cache_size_latch != 1) {
 -			b = alloc_buffer(c, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
 +			b = alloc_buffer(c, GFP_NOWAIT | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
  			if (b)
  				return b;
  		}
	From 9ea61cac0b1ad0c09022f39fd97e9b99a2cfc2dc Mon Sep 17 00:00:00 2001
	From: Douglas Anderson <dianders@chromium.org>
	Date: Thu, 17 Nov 2016 11:24:20 -0800
	Subject: dm bufio: avoid sleeping while holding the dm_bufio lock

	From: Douglas Anderson <dianders@chromium.org>

	commit 9ea61cac0b1ad0c09022f39fd97e9b99a2cfc2dc upstream.

	We've seen in-field reports showing _lots_ (18 in one case, 41 in
	another) of tasks all sitting there blocked on:

	mutex_lock+0x4c/0x68
	dm_bufio_shrink_count+0x38/0x78
	shrink_slab.part.54.constprop.65+0x100/0x464
	shrink_zone+0xa8/0x198

	In the two cases analyzed, we see one task that looks like this:

	Workqueue: kverityd verity_prefetch_io

	__switch_to+0x9c/0xa8
	__schedule+0x440/0x6d8
	schedule+0x94/0xb4
	schedule_timeout+0x204/0x27c
	schedule_timeout_uninterruptible+0x44/0x50
	wait_iff_congested+0x9c/0x1f0
	shrink_inactive_list+0x3a0/0x4cc
	shrink_lruvec+0x418/0x5cc
	shrink_zone+0x88/0x198
	try_to_free_pages+0x51c/0x588
	__alloc_pages_nodemask+0x648/0xa88
	__get_free_pages+0x34/0x7c
	alloc_buffer+0xa4/0x144
	__bufio_new+0x84/0x278
	dm_bufio_prefetch+0x9c/0x154
	verity_prefetch_io+0xe8/0x10c
	process_one_work+0x240/0x424
	worker_thread+0x2fc/0x424
	kthread+0x10c/0x114

	...and that looks to be the one holding the mutex.

	The problem has been reproduced on fairly easily:
	0. Be running Chrome OS w/ verity enabled on the root filesystem
	1. Pick test patch: http://crosreview.com/412360
	2. Install launchBalloons.sh and balloon.arm from
	http://crbug.com/468342
	...that's just a memory stress test app.
	3. On a 4GB rk3399 machine, run
	nice ./launchBalloons.sh 4 900 100000
	...that tries to eat 4 * 900 MB of memory and keep accessing.
	4. Login to the Chrome web browser and restore many tabs

	With that, I've seen printouts like:
	DOUG: long bufio 90758 ms
	...and stack trace always show's we're in dm_bufio_prefetch().

	The problem is that we try to allocate memory with GFP_NOIO while
	we're holding the dm_bufio lock. Instead we should be using
	GFP_NOWAIT. Using GFP_NOIO can cause us to sleep while holding the
	lock and that causes the above problems.

	The current behavior explained by David Rientjes:

	It will still try reclaim initially because __GFP_WAIT (or
	__GFP_KSWAPD_RECLAIM) is set by GFP_NOIO. This is the cause of
	contention on dm_bufio_lock() that the thread holds. You want to
	pass GFP_NOWAIT instead of GFP_NOIO to alloc_buffer() when holding a
	mutex that can be contended by a concurrent slab shrinker (if
	count_objects didn't use a trylock, this pattern would trivially
	deadlock).

	This change significantly increases responsiveness of the system while
	in this state. It makes a real difference because it unblocks kswapd.
	In the bug report analyzed, kswapd was hung:

	kswapd0 D ffffffc000204fd8 0 72 2 0x00000000
	Call trace:
	[<ffffffc000204fd8>] __switch_to+0x9c/0xa8
	[<ffffffc00090b794>] __schedule+0x440/0x6d8
	[<ffffffc00090bac0>] schedule+0x94/0xb4
	[<ffffffc00090be44>] schedule_preempt_disabled+0x28/0x44
	[<ffffffc00090d900>] __mutex_lock_slowpath+0x120/0x1ac
	[<ffffffc00090d9d8>] mutex_lock+0x4c/0x68
	[<ffffffc000708e7c>] dm_bufio_shrink_count+0x38/0x78
	[<ffffffc00030b268>] shrink_slab.part.54.constprop.65+0x100/0x464
	[<ffffffc00030dbd8>] shrink_zone+0xa8/0x198
	[<ffffffc00030e578>] balance_pgdat+0x328/0x508
	[<ffffffc00030eb7c>] kswapd+0x424/0x51c
	[<ffffffc00023f06c>] kthread+0x10c/0x114
	[<ffffffc000203dd0>] ret_from_fork+0x10/0x40

	By unblocking kswapd memory pressure should be reduced.

	Suggested-by: David Rientjes <rientjes@google.com>
	Reviewed-by: Guenter Roeck <linux@roeck-us.net>
	Signed-off-by: Douglas Anderson <dianders@chromium.org>
	Signed-off-by: Mike Snitzer <snitzer@redhat.com>
	Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

	---
	drivers/md/dm-bufio.c \| 5 +++--
	1 file changed, 3 insertions(+), 2 deletions(-)

	--- a/drivers/md/dm-bufio.c
	+++ b/drivers/md/dm-bufio.c
	@@ -818,7 +818,8 @@ static struct dm_buffer *__alloc_buffer_
	* dm-bufio is resistant to allocation failures (it just keeps
	* one buffer reserved in cases all the allocations fail).
	* So set flags to not try too hard:
	- * GFP_NOIO: don't recurse into the I/O layer
	+ * GFP_NOWAIT: don't wait; if we need to sleep we'll release our
	+ * mutex and wait ourselves.
	* __GFP_NORETRY: don't retry and rather return failure
	* __GFP_NOMEMALLOC: don't use emergency reserves
	* __GFP_NOWARN: don't print a warning in case of failure
	@@ -828,7 +829,7 @@ static struct dm_buffer *__alloc_buffer_
	*/
	while (1) {
	if (dm_bufio_cache_size_latch != 1) {
	- b = alloc_buffer(c, GFP_NOIO \| __GFP_NORETRY \| __GFP_NOMEMALLOC \| __GFP_NOWARN);
	+ b = alloc_buffer(c, GFP_NOWAIT \| __GFP_NORETRY \| __GFP_NOMEMALLOC \| __GFP_NOWARN);
	if (b)
	return b;
	}