From: Al Viro <viro@zeniv.linux.org.uk>
Date: Thu, 9 Aug 2018 17:51:32 -0400
Subject: fix __legitimize_mnt()/mntput() race

commit 119e1ef80ecfe0d1deb6378d4ab41f5b71519de1 upstream.

__legitimize_mnt() has two problems - one is that in case of success
the check of mount_lock is not ordered wrt preceding increment of
refcount, making it possible to have successful __legitimize_mnt()
on one CPU just before the otherwise final mntput() on another,
with __legitimize_mnt() not seeing mntput() taking the lock and
mntput() not seeing the increment done by __legitimize_mnt().
Solved by a pair of barriers.

Another is that failure of __legitimize_mnt() on the second
read_seqretry() leaves us with a reference that'll need to be
dropped by the caller; however, if that races with the final mntput()
we can end up with the caller dropping rcu_read_lock() and doing
mntput() to release that reference - with the first mntput()
having freed the damn thing just as rcu_read_lock() had been
dropped.  Solution: in the "do mntput() yourself" failure case
grab mount_lock, check if MNT_DOOMED has been set by a racing
final mntput() that has missed our increment, and if it has,
undo the increment and treat that as the "failure, caller doesn't
need to drop anything" case.

It's not easy to hit - the final mntput() has to come right
after the first read_seqretry() in __legitimize_mnt() *and*
manage to miss the increment done by __legitimize_mnt() before
the second read_seqretry() in there.  The things that are almost
impossible to hit on bare hardware are not impossible on SMP
KVM, though...

Reported-by: Oleg Nesterov <oleg@redhat.com>
Fixes: 48a066e72d97 ("RCU'd vfsmounts")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.16: __legitimize_mnt() has not been split out
 from legitimize_mnt(). Adjust the added return statement and
 comments accordingly.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 fs/namespace.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -592,12 +592,20 @@ bool legitimize_mnt(struct vfsmount *bas
 		return true;
 	mnt = real_mount(bastard);
 	mnt_add_count(mnt, 1);
+	smp_mb();	// see mntput_no_expire()
 	if (likely(!read_seqretry(&mount_lock, seq)))
 		return true;
 	if (bastard->mnt_flags & MNT_SYNC_UMOUNT) {
 		mnt_add_count(mnt, -1);
 		return false;
 	}
+	lock_mount_hash();
+	if (unlikely(bastard->mnt_flags & MNT_DOOMED)) {
+		mnt_add_count(mnt, -1);
+		unlock_mount_hash();
+		return false;
+	}
+	unlock_mount_hash();
 	rcu_read_unlock();
 	mntput(bastard);
 	rcu_read_lock();
@@ -984,6 +992,11 @@ put_again:
 		return;
 	}
 	lock_mount_hash();
+	/*
+	 * make sure that if legitimize_mnt() has not seen us grab
+	 * mount_lock, we'll see their refcount increment here.
+	 */
+	smp_mb();
 	mnt_add_count(mnt, -1);
 	if (mnt_get_count(mnt)) {
 		rcu_read_unlock();