| From 588377d6199034c36d335e7df5818b731fea072c Mon Sep 17 00:00:00 2001 |
| From: Alex Elder <elder@inktank.com> |
| Date: Mon, 8 Oct 2012 20:37:30 -0700 |
| Subject: rbd: reset BACKOFF if unable to re-queue |
| |
| From: Alex Elder <elder@inktank.com> |
| |
| commit 588377d6199034c36d335e7df5818b731fea072c upstream. |
| |
| If ceph_fault() is unable to queue work after a delay, it sets the |
| BACKOFF connection flag so con_work() will attempt to do so. |
| |
| In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't |
| result in newly-queued work, it simply ignores this condition and |
| proceeds as if no backoff delay were desired. There are two |
| problems with this--one of which is a bug. |
| |
| The first problem is simply that the intended behavior is to back |
| off, and if we aren't able queue the work item to run after a delay |
| we're not doing that. |
| |
| The only reason queue_delayed_work() won't queue work is if the |
| provided work item is already queued. In the messenger, this |
| means that con_work() is already scheduled to be run again. So |
| if we simply set the BACKOFF flag again when this occurs, we know |
| the next con_work() call will again attempt to hold off activity |
| on the connection until after the delay. |
| |
| The second problem--the bug--is a leak of a reference count. If |
| queue_delayed_work() returns 0 in con_work(), con->ops->put() drops |
| the connection reference held on entry to con_work(). However, |
| processing is (was) allowed to continue, and at the end of the |
| function a second con->ops->put() is called. |
| |
| This patch fixes both problems. |
| |
| Signed-off-by: Alex Elder <elder@inktank.com> |
| Reviewed-by: Sage Weil <sage@inktank.com> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| net/ceph/messenger.c | 3 ++- |
| 1 file changed, 2 insertions(+), 1 deletion(-) |
| |
| --- a/net/ceph/messenger.c |
| +++ b/net/ceph/messenger.c |
| @@ -2300,10 +2300,11 @@ restart: |
| mutex_unlock(&con->mutex); |
| return; |
| } else { |
| - con->ops->put(con); |
| dout("con_work %p FAILED to back off %lu\n", con, |
| con->delay); |
| + set_bit(CON_FLAG_BACKOFF, &con->flags); |
| } |
| + goto done; |
| } |
| |
| if (con->state == CON_STATE_STANDBY) { |