| From fc89213a636c3735eb3386f10a34c082271b4192 Mon Sep 17 00:00:00 2001 |
| From: Huang Ying <ying.huang@intel.com> |
| Date: Tue, 22 Mar 2022 14:46:05 -0700 |
| Subject: mm,migrate: fix establishing demotion target |
| |
| From: Huang Ying <ying.huang@intel.com> |
| |
| commit fc89213a636c3735eb3386f10a34c082271b4192 upstream. |
| |
| In commit ac16ec835314 ("mm: migrate: support multiple target nodes |
| demotion"), after the first demotion target node is found, we continue |
| to check the next candidate obtained via find_next_best_node(). This is |
| to find all demotion target nodes with the same NUMA distance. But one |
| side effect of find_next_best_node() is that the candidate node it |
| returns is set in the "used" parameter. So even if the candidate node |
| fails the subsequent NUMA distance check, it can no longer be used as a |
| demotion target node for the following nodes. For example, for a system |
| as follows, |
| |
| node distances: |
| node   0   1   2   3 |
|   0:  10  21  17  28 |
|   1:  21  10  28  17 |
|   2:  17  28  10  28 |
|   3:  28  17  28  10 |
| |
| when we establish the demotion target node for node 0, in the first round |
| node 2 is added to the demotion target node set. Then in the second |
| round, node 3 is checked and rejected because distance(0, 3) > |
| distance(0, 2). But node 3 is set in the "used" nodemask too. When we |
| establish the demotion target node for node 1, there is no available |
| node. This is wrong: node 3 should be set as the demotion target of |
| node 1. |
| |
| To fix this, if the candidate node fails the distance check, it is |
| cleared in the "used" nodemask, so that it can be used for the |
| following nodes. |
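
The example above can be replayed outside the kernel with a small model. The following is only a sketch under stated assumptions, not the code in mm/migrate.c: the toy_* helpers are hypothetical names, the nodemask is modelled as a plain bitmask, nodes 0 and 1 are assumed to be the only source nodes, and toy_find_next_best() ignores everything the real find_next_best_node() considers besides distance.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 4
#define NO_NODE  (-1)

/* Distance table copied from the example in this commit message. */
static const int distance[NR_NODES][NR_NODES] = {
	{ 10, 21, 17, 28 },
	{ 21, 10, 28, 17 },
	{ 17, 28, 10, 28 },
	{ 28, 17, 28, 10 },
};

/*
 * Hypothetical stand-in for find_next_best_node(): return the nearest
 * node not yet in *used and, as a side effect, mark it used.
 */
static int toy_find_next_best(int node, unsigned int *used)
{
	int best = NO_NODE;

	assert(node >= 0 && node < NR_NODES);
	for (int n = 0; n < NR_NODES; n++) {
		if (*used & (1u << n))
			continue;
		if (best == NO_NODE || distance[node][n] < distance[node][best])
			best = n;
	}
	if (best != NO_NODE)
		*used |= 1u << best;	/* the side effect discussed above */
	return best;
}

/*
 * Collect every target for one source node and return them as a bitmask.
 * clear_on_reject models the fix: a candidate that fails the distance
 * check is dropped from *used again.
 */
static unsigned int toy_targets_for(int node, unsigned int *used,
				    bool clear_on_reject)
{
	unsigned int targets = 0;
	int best_distance = -1;

	for (;;) {
		int cand = toy_find_next_best(node, used);

		if (cand == NO_NODE)
			break;
		if (best_distance != -1 &&
		    distance[node][cand] > best_distance) {
			if (clear_on_reject)
				*used &= ~(1u << cand);
			break;
		}
		best_distance = distance[node][cand];
		targets |= 1u << cand;
	}
	return targets;
}

/* Establish targets for node 0, then node 1; out[i] is node i's mask. */
static void toy_establish(bool clear_on_reject, unsigned int out[2])
{
	/* Assumption: the source nodes themselves start out as used. */
	unsigned int used = (1u << 0) | (1u << 1);

	out[0] = toy_targets_for(0, &used, clear_on_reject);
	out[1] = toy_targets_for(1, &used, clear_on_reject);
}
```

Calling toy_establish(false, out) models the behaviour before this patch: node 0 gets node 2 and node 1 gets nothing. With toy_establish(true, out), node 1 gets node 3 as expected.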
| |
| The bug can be reproduced, and is fixed by this patch, on a 2 socket |
| server machine with DRAM and PMEM. |
| |
| Link: https://lkml.kernel.org/r/20220128055940.1792614-1-ying.huang@intel.com |
| Fixes: ac16ec835314 ("mm: migrate: support multiple target nodes demotion") |
| Signed-off-by: "Huang, Ying" <ying.huang@intel.com> |
| Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> |
| Cc: Baolin Wang <baolin.wang@linux.alibaba.com> |
| Cc: Dave Hansen <dave.hansen@linux.intel.com> |
| Cc: Zi Yan <ziy@nvidia.com> |
| Cc: Oscar Salvador <osalvador@suse.de> |
| Cc: Yang Shi <shy828301@gmail.com> |
| Cc: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com> |
| Cc: Xunlei Pang <xlpang@linux.alibaba.com> |
| Cc: Mel Gorman <mgorman@techsingularity.net> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| --- |
| mm/migrate.c | 7 +++++-- |
| 1 file changed, 5 insertions(+), 2 deletions(-) |
| |
| --- a/mm/migrate.c |
| +++ b/mm/migrate.c |
| @@ -3085,18 +3085,21 @@ static int establish_migrate_target(int |
|  	if (best_distance != -1) { |
|  		val = node_distance(node, migration_target); |
|  		if (val > best_distance) |
| -			return NUMA_NO_NODE; |
| +			goto out_clear; |
|  	} |
| |
|  	index = nd->nr; |
|  	if (WARN_ONCE(index >= DEMOTION_TARGET_NODES, |
|  		      "Exceeds maximum demotion target nodes\n")) |
| -		return NUMA_NO_NODE; |
| +		goto out_clear; |
| |
|  	nd->nodes[index] = migration_target; |
|  	nd->nr++; |
| |
|  	return migration_target; |
| +out_clear: |
| +	node_clear(migration_target, *used); |
| +	return NUMA_NO_NODE; |
|  } |
| |
| /* |