| From foo@baz Fri Apr 11 08:46:36 PDT 2014 |
| From: Nikolay Aleksandrov <nikolay@redhat.com> |
| Date: Mon, 3 Mar 2014 23:19:18 +0100 |
| Subject: net: fix for a race condition in the inet frag code |
| |
| From: Nikolay Aleksandrov <nikolay@redhat.com> |
| |
| [ Upstream commit 24b9bf43e93e0edd89072da51cf1fab95fc69dec ] |
| |
| I stumbled upon this very serious bug while hunting for another one, |
| it's a very subtle race condition between inet_frag_evictor, |
| inet_frag_intern and the IPv4/6 frag_queue and expire functions |
| (basically the users of inet_frag_kill/inet_frag_put). |
| |
| What happens is that after a fragment has been added to the hash chain |
| but before it's been added to the lru_list (inet_frag_lru_add) in |
| inet_frag_intern, it may get deleted (either by an expired timer if |
| the system load is high or the timer sufficiently low, or by the |
| fraq_queue function for different reasons) before it's added to the |
| lru_list, then after it gets added it's a matter of time for the |
| evictor to get to a piece of memory which has been freed leading to a |
| number of different bugs depending on what's left there. |
| |
| I've been able to trigger this on both IPv4 and IPv6 (which is normal |
| as the frag code is the same), but it's been much more difficult to |
| trigger on IPv4 due to the protocol differences about how fragments |
| are treated. |
| |
| The setup I used to reproduce this is: 2 machines with 4 x 10G bonded |
| in a RR bond, so the same flow can be seen on multiple cards at the |
| same time. Then I used multiple instances of ping/ping6 to generate |
| fragmented packets and flood the machines with them while running |
| other processes to load the attacked machine. |
| |
| *It is very important to have the _same flow_ coming in on multiple CPUs |
| concurrently. Usually the attacked machine would die in less than 30 |
| minutes, if configured properly to have many evictor calls and timeouts |
| it could happen in 10 minutes or so. |
| |
| An important point to make is that any caller (frag_queue or timer) of |
| inet_frag_kill will remove both the timer refcount and the |
| original/guarding refcount thus removing everything that's keeping the |
| frag from being freed at the next inet_frag_put. All of this could |
| happen before the frag was ever added to the LRU list, then it gets |
| added and the evictor uses a freed fragment. |
| |
| An example for IPv6 would be if a fragment is being added and is at |
| the stage of being inserted in the hash after the hash lock is |
| released, but before inet_frag_lru_add executes (or is able to obtain |
| the lru lock) another overlapping fragment for the same flow arrives |
| at a different CPU which finds it in the hash, but since it's |
| overlapping it drops it invoking inet_frag_kill and thus removing all |
| guarding refcounts, and afterwards freeing it by invoking |
| inet_frag_put which removes the last refcount added previously by |
| inet_frag_find, then inet_frag_lru_add gets executed by |
| inet_frag_intern and we have a freed fragment in the lru_list. |
| |
| The fix is simple, just move the lru_add under the hash chain locked |
| region so when a removing function is called it'll have to wait for |
| the fragment to be added to the lru_list, and then it'll remove it (it |
| works because the hash chain removal is done before the lru_list one |
| and there's no window between the two list adds when the frag can get |
| dropped). With this fix applied I couldn't kill the same machine in 24 |
| hours with the same setup. |
| |
| Fixes: 3ef0eb0db4bf ("net: frag, move LRU list maintenance outside of |
| rwlock") |
| |
| CC: Florian Westphal <fw@strlen.de> |
| CC: Jesper Dangaard Brouer <brouer@redhat.com> |
| CC: David S. Miller <davem@davemloft.net> |
| |
| Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> |
| Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> |
| Signed-off-by: David S. Miller <davem@davemloft.net> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| --- |
| net/ipv4/inet_fragment.c | 3 ++- |
| 1 file changed, 2 insertions(+), 1 deletion(-) |
| |
| --- a/net/ipv4/inet_fragment.c |
| +++ b/net/ipv4/inet_fragment.c |
| @@ -283,9 +283,10 @@ static struct inet_frag_queue *inet_frag |
| |
| atomic_inc(&qp->refcnt); |
| hlist_add_head(&qp->list, &hb->chain); |
| + inet_frag_lru_add(nf, qp); |
| spin_unlock(&hb->chain_lock); |
| read_unlock(&f->lock); |
| - inet_frag_lru_add(nf, qp); |
| + |
| return qp; |
| } |
| |