| From 7d36665a5886c27ca4c4d0afd3ecc50b400f3587 Mon Sep 17 00:00:00 2001 |
| From: Chunguang Xu <brookxu@tencent.com> |
| Date: Sat, 21 Mar 2020 18:22:10 -0700 |
| Subject: memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event |
| MIME-Version: 1.0 |
| Content-Type: text/plain; charset=UTF-8 |
| Content-Transfer-Encoding: 8bit |
| |
| From: Chunguang Xu <brookxu@tencent.com> |
| |
| commit 7d36665a5886c27ca4c4d0afd3ecc50b400f3587 upstream. |
| |
| When an eventfd that monitors multiple memory thresholds of a cgroup is |
| closed, the kernel deletes all events related to this eventfd. If, before |
| all events are deleted, another eventfd starts monitoring a memory |
| threshold of this cgroup, the kernel crashes: |
| |
| BUG: kernel NULL pointer dereference, address: 0000000000000004 |
| #PF: supervisor write access in kernel mode |
| #PF: error_code(0x0002) - not-present page |
| PGD 800000033058e067 P4D 800000033058e067 PUD 3355ce067 PMD 0 |
| Oops: 0002 [#1] SMP PTI |
| CPU: 2 PID: 14012 Comm: kworker/2:6 Kdump: loaded Not tainted 5.6.0-rc4 #3 |
| Hardware name: LENOVO 20AWS01K00/20AWS01K00, BIOS GLET70WW (2.24 ) 05/21/2014 |
| Workqueue: events memcg_event_remove |
| RIP: 0010:__mem_cgroup_usage_unregister_event+0xb3/0x190 |
| RSP: 0018:ffffb47e01c4fe18 EFLAGS: 00010202 |
| RAX: 0000000000000001 RBX: ffff8bb223a8a000 RCX: 0000000000000001 |
| RDX: 0000000000000001 RSI: ffff8bb22fb83540 RDI: 0000000000000001 |
| RBP: ffffb47e01c4fe48 R08: 0000000000000000 R09: 0000000000000010 |
| R10: 000000000000000c R11: 071c71c71c71c71c R12: ffff8bb226aba880 |
| R13: ffff8bb223a8a480 R14: 0000000000000000 R15: 0000000000000000 |
| FS: 0000000000000000(0000) GS:ffff8bb242680000(0000) knlGS:0000000000000000 |
| CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 |
| CR2: 0000000000000004 CR3: 000000032c29c003 CR4: 00000000001606e0 |
| Call Trace: |
| memcg_event_remove+0x32/0x90 |
| process_one_work+0x172/0x380 |
| worker_thread+0x49/0x3f0 |
| kthread+0xf8/0x130 |
| ret_from_fork+0x35/0x40 |
| CR2: 0000000000000004 |
| |
| We can reproduce this problem as follows: |
| |
| 1. We create a new cgroup subdirectory and a new eventfd, and then we |
| monitor multiple memory thresholds of the cgroup through this eventfd. |
| |
| 2. We close this eventfd, and __mem_cgroup_usage_unregister_event() |
| will be called multiple times to delete all events related to this |
| eventfd. |
| |
| The first time __mem_cgroup_usage_unregister_event() is called, the |
| kernel will clear all items related to this eventfd in |
| thresholds->primary. |
| |
| Since there is currently only one eventfd, thresholds->primary becomes |
| empty, so the kernel will set thresholds->primary and thresholds->spare |
| to NULL. If at this time the user creates a new eventfd and monitors |
| the memory threshold of this cgroup, the kernel will re-initialize |
| thresholds->primary. |
| |
| Then when __mem_cgroup_usage_unregister_event() is called for the |
| second time, because thresholds->primary is not empty, the system will |
| access thresholds->spare, but thresholds->spare is NULL, which will |
| trigger a crash. |
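
Step 1 of the reproduction can be sketched from userspace roughly as below. This is a minimal sketch, assuming a cgroup v1 memory controller; the helper names format_event_control() and register_threshold() are hypothetical and not part of this patch. It relies on the documented cgroup v1 interface of writing "<event_fd> <fd of memory.usage_in_bytes> <threshold>" to cgroup.event_control.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: build the "<event_fd> <usage_fd> <threshold>" line
 * expected by cgroup.event_control. Returns the line length, or -1 if the
 * buffer is too small.
 */
static int format_event_control(char *buf, size_t len, int event_fd,
				int usage_fd, unsigned long long threshold)
{
	int n = snprintf(buf, len, "%d %d %llu", event_fd, usage_fd, threshold);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Hypothetical helper: register one usage threshold of the cgroup at
 * cgroup_dir on an already-created eventfd. Returns 0 on success, -1 on
 * failure. Closing usage_fd after the write is an assumption of this sketch.
 */
static int register_threshold(const char *cgroup_dir, int event_fd,
			      unsigned long long threshold)
{
	char path[4096], line[64];
	int usage_fd, ctl_fd, n, ret = -1;

	snprintf(path, sizeof(path), "%s/memory.usage_in_bytes", cgroup_dir);
	usage_fd = open(path, O_RDONLY);
	if (usage_fd < 0)
		return -1;

	snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup_dir);
	ctl_fd = open(path, O_WRONLY);
	if (ctl_fd >= 0) {
		n = format_event_control(line, sizeof(line), event_fd,
					 usage_fd, threshold);
		if (n > 0 && write(ctl_fd, line, (size_t)n) == n)
			ret = 0;
		close(ctl_fd);
	}
	close(usage_fd);
	return ret;
}
```

Calling register_threshold() several times with the same eventfd (created with eventfd(2)) and different thresholds gives the "one eventfd, multiple thresholds" state from step 1; closing that eventfd then corresponds to step 2.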
| |
| In general, the longer it takes to delete all events related to this |
| eventfd, the easier it is to trigger this problem. |
| |
| The solution is to check whether the thresholds associated with the |
| eventfd have already been cleared when deleting the event. If so, we do |
| nothing. |
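
The effect of this check can be illustrated with a simplified userspace model. This is only a sketch under stated assumptions: the struct and function names below are hypothetical stand-ins for the kernel's threshold arrays, integer IDs replace eventfd contexts, and locking and RCU are omitted. It is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct threshold { int eventfd_id; unsigned long value; };
struct threshold_ary { int size; struct threshold entries[]; };
struct thresholds { struct threshold_ary *primary, *spare; };

static struct threshold_ary *ary_alloc(int size)
{
	struct threshold_ary *a;

	a = malloc(sizeof(*a) + (size_t)size * sizeof(a->entries[0]));
	if (a)
		a->size = size;
	return a;
}

/* Remove all entries of eventfd_id; returns how many were removed. */
static int unregister_event(struct thresholds *t, int eventfd_id)
{
	struct threshold_ary *new;
	int i, j, size = 0, entries = 0;

	if (!t->primary)
		return 0;

	/* Count surviving entries (size) and matching entries (entries). */
	for (i = 0; i < t->primary->size; i++) {
		if (t->primary->entries[i].eventfd_id != eventfd_id)
			size++;
		else
			entries++;
	}

	/* The guard: nothing registered for this eventfd, so do not touch
	 * spare, which may be NULL after a re-initialization. */
	if (!entries)
		return 0;

	new = t->spare;
	if (!size) {
		/* No thresholds left: drop both arrays. */
		free(new);
		free(t->primary);
		t->primary = NULL;
		t->spare = NULL;
		return entries;
	}

	/* Copy survivors into the spare array, then swap the arrays.
	 * The model assumes spare can hold at least size entries. */
	new->size = size;
	for (i = 0, j = 0; i < t->primary->size; i++) {
		if (t->primary->entries[i].eventfd_id == eventfd_id)
			continue;
		new->entries[j++] = t->primary->entries[i];
	}
	t->spare = t->primary;
	t->primary = new;
	return entries;
}
```

With primary holding only entries of a newly registered eventfd and spare set to NULL, a late call for the old eventfd finds no matching entries and returns before spare is dereferenced, which is the situation the crash report describes.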
| |
| [akpm@linux-foundation.org: fix comment, per Kirill] |
| Fixes: 907860ed381a ("cgroups: make cftype.unregister_event() void-returning") |
| Signed-off-by: Chunguang Xu <brookxu@tencent.com> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| Acked-by: Michal Hocko <mhocko@suse.com> |
| Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
| Cc: Johannes Weiner <hannes@cmpxchg.org> |
| Cc: Vladimir Davydov <vdavydov.dev@gmail.com> |
| Cc: <stable@vger.kernel.org> |
| Link: http://lkml.kernel.org/r/077a6f67-aefa-4591-efec-f2f3af2b0b02@gmail.com |
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| mm/memcontrol.c | 10 ++++++++-- |
| 1 file changed, 8 insertions(+), 2 deletions(-) |
| |
| --- a/mm/memcontrol.c |
| +++ b/mm/memcontrol.c |
| @@ -3480,7 +3480,7 @@ static void __mem_cgroup_usage_unregiste |
| struct mem_cgroup_thresholds *thresholds; |
| struct mem_cgroup_threshold_ary *new; |
| unsigned long usage; |
| - int i, j, size; |
| + int i, j, size, entries; |
| |
| mutex_lock(&memcg->thresholds_lock); |
| |
| @@ -3500,14 +3500,20 @@ static void __mem_cgroup_usage_unregiste |
| __mem_cgroup_threshold(memcg, type == _MEMSWAP); |
| |
| /* Calculate new number of threshold */ |
| - size = 0; |
| + size = entries = 0; |
| for (i = 0; i < thresholds->primary->size; i++) { |
| if (thresholds->primary->entries[i].eventfd != eventfd) |
| size++; |
| + else |
| + entries++; |
| } |
| |
| new = thresholds->spare; |
| |
| + /* If no items related to eventfd have been cleared, nothing to do */ |
| + if (!entries) |
| + goto unlock; |
| + |
| /* Set thresholds array to NULL if we don't have thresholds */ |
| if (!size) { |
| kfree(new); |