percpu-refcount: Use call_rcu_flush() for atomic switch

The call_rcu() changes to save power will slow down the percpu refcount
per-CPU to atomic switch path, which uses RCU when switching to atomic
mode.

The enqueued async callback wakes up the tasks waiting in
percpu_ref_switch_waitq, so deferring it will slow down per-CPU
refcount users such as blk_pre_runtime_suspend().
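
To show why the callback sits on the critical path, here is a
single-threaded userspace toy model. It is a simplified sketch, not the
kernel code: struct toy_ref, toy_get(), toy_put(), toy_start_switch()
and toy_switch_to_atomic_cb() are hypothetical names, and the real
counter's bias and wait queue are omitted. The mode flips immediately,
but the per-CPU slots are folded into the atomic counter, and waiters
woken, only when the callback runs. Anything that delays the callback
therefore delays every waiter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CPU count for the toy model. */
#define TOY_NR_CPUS 4

/* Toy stand-in for struct percpu_ref; not the kernel layout. */
struct toy_ref {
	long percpu_count[TOY_NR_CPUS]; /* per-CPU deltas, may be negative */
	long atomic_count;              /* authoritative after the switch */
	bool atomic_mode;               /* new gets/puts go to atomic_count */
	bool switch_done;               /* per-CPU slots folded in */
	int  waiters_woken;             /* stands in for wake_up_all() */
};

static void toy_get(struct toy_ref *ref, int cpu)
{
	if (ref->atomic_mode)
		ref->atomic_count++;
	else
		ref->percpu_count[cpu]++;	/* per-CPU fast path */
}

static void toy_put(struct toy_ref *ref, int cpu)
{
	if (ref->atomic_mode)
		ref->atomic_count--;
	else
		ref->percpu_count[cpu]--;
}

/* Flip the mode; in the kernel this is where the RCU callback is queued. */
static void toy_start_switch(struct toy_ref *ref)
{
	ref->atomic_mode = true;
}

/*
 * Models the RCU callback: after a grace period no CPU can still be
 * updating its per-CPU slot, so the slots are summed into the atomic
 * counter and the waiters are woken.
 */
static void toy_switch_to_atomic_cb(struct toy_ref *ref)
{
	assert(ref->atomic_mode && !ref->switch_done);
	for (size_t cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		ref->atomic_count += ref->percpu_count[cpu];
		ref->percpu_count[cpu] = 0;
	}
	ref->switch_done = true;
	ref->waiters_woken++;
}
```

Until toy_switch_to_atomic_cb() runs, waiters_woken stays 0 even though
atomic_mode is already set.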

Use the call_rcu_flush() API instead, which reverts to the old behavior.
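
The latency difference can be sketched with a toy cost model. This is
an illustration under stated assumptions: "lazy" is used here as a
label for the power-saving call_rcu() path, the name
toy_cb_invoke_tick() is hypothetical, and TOY_GP_TICKS and
TOY_LAZY_HOLD_TICKS are made-up placeholders, not kernel tunables. A
deferred callback waits for its batch to be released and then for a
grace period, while a flushed callback waits only for the grace
period.

```c
#include <stdbool.h>

/* Made-up tick counts for illustration only; not kernel tunables. */
#define TOY_GP_TICKS        3L     /* length of one grace period */
#define TOY_LAZY_HOLD_TICKS 1000L  /* how long a deferred batch is held */

/* Tick at which a callback queued at 'now' runs in this toy model. */
static long toy_cb_invoke_tick(long now, bool lazy)
{
	/* A deferred (lazy) callback first waits for its batch release... */
	long start = lazy ? now + TOY_LAZY_HOLD_TICKS : now;

	/* ...and every callback then waits for a full grace period. */
	return start + TOY_GP_TICKS;
}
```

With these placeholder values, a waiter on the switch path is woken at
tick 3 via the flush path but only at tick 1003 via the deferred path.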

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index e5c5315..65c58a0 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -230,7 +230,8 @@ static void __percpu_ref_switch_to_atomic(struct percpu_ref *ref,
 		percpu_ref_noop_confirm_switch;
 
 	percpu_ref_get(ref);	/* put after confirmation */
-	call_rcu(&ref->data->rcu, percpu_ref_switch_to_atomic_rcu);
+	call_rcu_flush(&ref->data->rcu,
+		       percpu_ref_switch_to_atomic_rcu);
 }
 
 static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref)