| From: Jiaqi Yan <jiaqiyan@google.com> |
| Subject: mm/memory-failure: userspace controls soft-offlining pages |
| Date: Wed, 26 Jun 2024 05:08:16 +0000 |
| |
| Correctable memory errors are very common on servers with large amounts of |
| memory, and are corrected by ECC. Soft offline is the kernel's additional |
| recovery handling for memory pages having (excessive) corrected memory |
| errors. An impacted page is migrated to a healthy page if it is in use; the |
| original page is discarded and never reused. |
| |
| The actual policy on whether (and when) to soft offline should be |
| maintained by userspace, especially in the case of a 1G HugeTLB page. |
| Soft offline dissolves the HugeTLB page, whether in use or free, into |
| chunks of 4K pages, reducing the HugeTLB pool capacity by 1 hugepage. If |
| userspace has not acknowledged such behavior, it may be surprised when |
| a later mmap of hugepages fails due to a lack of hugepages. In the case of |
| a transparent hugepage, it will be split into 4K pages as well; userspace |
| will lose the performance benefit of the transparent hugepage. |
| |
| In addition, discarding an entire 1G HugeTLB page only because of |
| corrected memory errors sounds very costly, and the kernel had better not |
| do it under the hood. But today there are at least 2 cases where it does: |
| 1. when the GHES driver sees both GHES_SEV_CORRECTED and |
| CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER. |
| 2. when the RAS Correctable Errors Collector, which counts correctable |
| errors per PFN, sees the counter for a PFN reach the threshold. |
| In both cases, userspace has no control over the soft offline performed |
| by the kernel's memory failure recovery. |
| |
| This commit gives userspace control over soft-offlining any page: the |
| kernel only soft offlines a raw page / transparent hugepage / HugeTLB |
| hugepage if userspace has agreed to it. The interface to userspace is a new |
| sysctl at /proc/sys/vm/enable_soft_offline. By default its value is set to |
| 1 to preserve the existing kernel behavior. When set to 0, soft offline |
| (e.g. MADV_SOFT_OFFLINE) will fail with EOPNOTSUPP. |
| |
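For illustration only, a minimal userspace sketch of how the new sysctl might be driven. The helper names (read_int_file, write_int_file, try_soft_offline) are hypothetical and not part of this patch; the fallback value 101 for MADV_SOFT_OFFLINE is an assumption taken from the generic uapi headers, used only when libc does not expose the constant. Writing the sysctl and calling madvise(MADV_SOFT_OFFLINE) are expected to need root (CAP_SYS_ADMIN).

```c
#define _DEFAULT_SOURCE		/* ask glibc to expose MADV_SOFT_OFFLINE */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>

#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101	/* assumption: generic uapi value */
#endif

/* Path of the sysctl added by this patch. */
#define SOFT_OFFLINE_SYSCTL "/proc/sys/vm/enable_soft_offline"

/*
 * Read one integer from a sysctl-style file.
 * Returns 0 and stores the value in *val, or a negated errno.
 */
int read_int_file(const char *path, int *val)
{
	FILE *f;
	int ret = 0;

	assert(path && val);
	f = fopen(path, "r");
	if (!f)
		return -errno;
	if (fscanf(f, "%d", val) != 1)
		ret = -EINVAL;
	fclose(f);
	return ret;
}

/*
 * Write one integer to a sysctl-style file.
 * stdio buffers the data, so a value rejected by the kernel is
 * reported when the stream is closed; check fclose() as well.
 */
int write_int_file(const char *path, int val)
{
	FILE *f;
	int ret = 0;

	assert(path);
	f = fopen(path, "w");
	if (!f)
		return -errno;
	if (fprintf(f, "%d\n", val) < 0)
		ret = -EIO;
	if (fclose(f) != 0 && !ret)
		ret = -errno;
	return ret;
}

/*
 * Ask the kernel to soft offline the pages backing [addr, addr + len).
 * Returns 0 on success or a negated errno. With this patch applied and
 * the sysctl set to 0, the expected result is -EOPNOTSUPP.
 */
int try_soft_offline(void *addr, size_t len)
{
	if (madvise(addr, len, MADV_SOFT_OFFLINE) == 0)
		return 0;
	return -errno;
}
```

A caller could run write_int_file(SOFT_OFFLINE_SYSCTL, 0) and then treat -EOPNOTSUPP from try_soft_offline() as "soft offline is disabled by policy" rather than as a hardware or migration failure. Note that the same errno is also returned when hwpoison_filter() filters the event, so the two cases cannot be told apart from the return value alone.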
| [jiaqiyan@google.com: v7] |
| Link: https://lkml.kernel.org/r/20240628205958.2845610-3-jiaqiyan@google.com |
| Link: https://lkml.kernel.org/r/20240626050818.2277273-3-jiaqiyan@google.com |
| Signed-off-by: Jiaqi Yan <jiaqiyan@google.com> |
| Acked-by: Miaohe Lin <linmiaohe@huawei.com> |
| Acked-by: David Rientjes <rientjes@google.com> |
| Cc: Frank van der Linden <fvdl@google.com> |
| Cc: Jane Chu <jane.chu@oracle.com> |
| Cc: Jonathan Corbet <corbet@lwn.net> |
| Cc: Lance Yang <ioworker0@gmail.com> |
| Cc: Muchun Song <muchun.song@linux.dev> |
| Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> |
| Cc: Oscar Salvador <osalvador@suse.de> |
| Cc: Randy Dunlap <rdunlap@infradead.org> |
| Cc: Shuah Khan <shuah@kernel.org> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| --- |
| |
| mm/memory-failure.c | 22 ++++++++++++++++++++-- |
| 1 file changed, 20 insertions(+), 2 deletions(-) |
| |
| --- a/mm/memory-failure.c~mm-memory-failure-userspace-controls-soft-offlining-pages |
| +++ a/mm/memory-failure.c |
| @@ -68,6 +68,8 @@ static int sysctl_memory_failure_early_k |
| |
| static int sysctl_memory_failure_recovery __read_mostly = 1; |
| |
| +static int sysctl_enable_soft_offline __read_mostly = 1; |
| + |
| atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0); |
| |
| static bool hw_memory_failure __read_mostly = false; |
| @@ -141,6 +143,15 @@ static struct ctl_table memory_failure_t |
| .extra1 = SYSCTL_ZERO, |
| .extra2 = SYSCTL_ONE, |
| }, |
| + { |
| + .procname = "enable_soft_offline", |
| + .data = &sysctl_enable_soft_offline, |
| + .maxlen = sizeof(sysctl_enable_soft_offline), |
| + .mode = 0644, |
| + .proc_handler = proc_dointvec_minmax, |
| + .extra1 = SYSCTL_ZERO, |
| + .extra2 = SYSCTL_ONE, |
| + } |
| }; |
| |
| /* |
| @@ -2758,8 +2769,9 @@ static int soft_offline_in_use_page(stru |
| * @pfn: pfn to soft-offline |
| * @flags: flags. Same as memory_failure(). |
| * |
| - * Returns 0 on success |
| - * -EOPNOTSUPP for hwpoison_filter() filtered the error event |
| + * Returns 0 on success, |
| + * -EOPNOTSUPP for hwpoison_filter() filtered the error event, or |
| + * disabled by /proc/sys/vm/enable_soft_offline, |
| * < 0 otherwise negated errno. |
| * |
| * Soft offline a page, by migration or invalidation, |
| @@ -2795,6 +2807,12 @@ int soft_offline_page(unsigned long pfn, |
| return -EIO; |
| } |
| |
| + if (!sysctl_enable_soft_offline) { |
| + pr_info_once("disabled by /proc/sys/vm/enable_soft_offline\n"); |
| + put_ref_page(pfn, flags); |
| + return -EOPNOTSUPP; |
| + } |
| + |
| mutex_lock(&mf_mutex); |
| |
| if (PageHWPoison(page)) { |
| _ |