| From foo@baz Thu Jan 12 21:36:14 CET 2017 |
| From: Daniel Borkmann <daniel@iogearbox.net> |
| Date: Wed, 21 Dec 2016 18:04:11 +0100 |
| Subject: net, sched: fix soft lockup in tc_classify |
| |
| From: Daniel Borkmann <daniel@iogearbox.net> |
| |
| |
| [ Upstream commit 628185cfddf1dfb701c4efe2cfd72cf5b09f5702 ] |
| |
| Shahar reported a soft lockup in tc_classify(), where we run into an |
| endless loop when walking the classifier chain due to tp->next == tp |
| which is a state we should never run into. The issue only seems to |
| trigger under load in the tc control path. |
| |
| What happens is that in tc_ctl_tfilter(), thread A allocates a new |
| tp, initializes it, sets tp_created to 1, and calls into tp->ops->change() |
| with it. In that classifier callback we had to unlock/lock the rtnl |
| mutex and returned with -EAGAIN. One reason why we need to drop there |
| is, for example, that we need to request an action module to be loaded. |
| |
| This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning |
| after we loaded and found the requested action, we need to redo the |
| whole request so we don't race against others. While we had to unlock |
| rtnl in that time, thread B's request was processed next on that CPU. |
| Thread B added a new tp instance successfully to the classifier chain. |
| When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN |
| and destroying its tp instance which never got linked, we goto replay |
| and redo A's request. |
| |
| This time when walking the classifier chain in tc_ctl_tfilter() for |
| checking for existing tp instances we had a priority match and found |
| the tp instance that was created and linked by thread B. Now calling |
| again into tp->ops->change() with that tp was successful and returned |
| without error. |
| |
| tp_created was never cleared in the second round, thus kernel thinks |
| that we need to link it into the classifier chain (once again). tp and |
| *back point to the same object due to the match we had earlier on. Thus |
| for thread B's already public tp, we reset tp->next to tp itself and |
| link it into the chain, which eventually causes the mentioned endless |
| loop in tc_classify() once a packet hits the data path. |
| |
| Fix is to clear tp_created at the beginning of each request, also when |
| we replay it. On the paths that can cause -EAGAIN we already destroy |
| the original tp instance we had and on replay we really need to start |
| from scratch. It seems that this issue was first introduced in commit |
| 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining |
| and avoid kernel panic when we use cls_cgroup"). |
| |
| Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup") |
| Reported-by: Shahar Klein <shahark@mellanox.com> |
| Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> |
| Cc: Cong Wang <xiyou.wangcong@gmail.com> |
| Acked-by: Eric Dumazet <edumazet@google.com> |
| Tested-by: Shahar Klein <shahark@mellanox.com> |
| Signed-off-by: David S. Miller <davem@davemloft.net> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| --- |
| net/sched/cls_api.c | 4 +++- |
| 1 file changed, 3 insertions(+), 1 deletion(-) |
| |
| --- a/net/sched/cls_api.c |
| +++ b/net/sched/cls_api.c |
| @@ -137,13 +137,15 @@ static int tc_ctl_tfilter(struct sk_buff |
| unsigned long cl; |
| unsigned long fh; |
| int err; |
| - int tp_created = 0; |
| + int tp_created; |
| |
| if ((n->nlmsg_type != RTM_GETTFILTER) && |
| !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) |
| return -EPERM; |
| |
| replay: |
| + tp_created = 0; |
| + |
| err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, NULL); |
| if (err < 0) |
| return err; |