| From a3fe92dcb4057571c6bb648902690b4e3b2ffc2f Mon Sep 17 00:00:00 2001 |
| From: David Hildenbrand <david@redhat.com> |
| Date: Sat, 28 Mar 2020 19:17:19 -0700 |
| Subject: [PATCH] drivers/base/memory.c: indicate all memory blocks as |
| removable |
| |
| commit 53cdc1cb29e87ce5a61de5bb393eb08925d14ede upstream. |
| |
| We see multiple issues with the implementation/interface to compute |
| whether a memory block can be offlined (exposed via |
| /sys/devices/system/memory/memoryX/removable) and would like to simplify |
| it (remove the implementation). |
| |
| 1. It runs basically lockless. While this might be good for performance, |
| we see possible races with memory offlining that will require at |
| least some sort of locking to fix. |
| |
| 2. Nowadays, more false positives are possible. No arch-specific checks |
| are performed that validate if memory offlining will not be denied |
| right away (and such check will require locking). For example, arm64 |
| won't allow to offline any memory block that was added during boot - |
| which will imply a very high error rate. Other archs have other |
| constraints. |
| |
| 3. The interface is inherently racy. E.g., if a memory block is detected |
| to be removable (and was not a false positive at that time), there is |
| still no guarantee that offlining will actually succeed. So any |
| caller already has to deal with false positives. |
| |
| 4. It is unclear which performance benefit this interface actually |
| provides. The introducing commit 5c755e9fd813 ("memory-hotplug: add |
| sysfs removable attribute for hotplug memory remove") mentioned |
| |
| "A user-level agent must be able to identify which sections |
| of memory are likely to be removable before attempting the |
| potentially expensive operation." |
| |
| However, no actual performance comparison was included. |
| |
| Known users: |
| |
| - lsmem: Will group memory blocks based on the "removable" property. [1] |
| |
| - chmem: Indirect user. It has a RANGE mode where one can specify |
| removable ranges identified via lsmem to be offlined. However, |
| it also has a "SIZE" mode, which allows a sysadmin to skip the |
| manual "identify removable blocks" step. [2] |
| |
| - powerpc-utils: Uses the "removable" attribute to skip some memory |
| blocks right away when trying to find some to offline+remove. |
| However, with ballooning enabled, it already skips this |
| information completely (because it once resulted in many false |
| negatives). Therefore, the implementation can deal with false |
| positives properly already. [3] |
| |
| According to Nathan Fontenot, DLPAR on powerpc is nowadays no longer |
| driven from userspace via the drmgr command (powerpc-utils). Nowadays |
| it's managed in the kernel - including onlining/offlining of memory |
| blocks - triggered by drmgr writing to /sys/kernel/dlpar. So the |
| affected legacy userspace handling is only active on old kernels. Only |
| very old versions of drmgr on a new kernel (unlikely) might execute |
| slower - totally acceptable. |
| |
| With CONFIG_MEMORY_HOTREMOVE, always indicating "removable" should not |
| break any user space tool. We implement a very bad heuristic now. |
| Without CONFIG_MEMORY_HOTREMOVE we cannot offline anything, so report |
| "not removable" as before. |
| |
| Original discussion can be found in [4] ("[PATCH RFC v1] mm: |
| is_mem_section_removable() overhaul"). |
| |
| Other users of is_mem_section_removable() will be removed next, so that |
| we can remove is_mem_section_removable() completely. |
| |
| [1] http://man7.org/linux/man-pages/man1/lsmem.1.html |
| [2] http://man7.org/linux/man-pages/man8/chmem.8.html |
| [3] https://github.com/ibm-power-utilities/powerpc-utils |
| [4] https://lkml.kernel.org/r/20200117105759.27905-1-david@redhat.com |
| |
| Also, this patch probably fixes a crash reported by Steve. |
| http://lkml.kernel.org/r/CAPcyv4jpdaNvJ67SkjyUJLBnBnXXQv686BiVW042g03FUmWLXw@mail.gmail.com |
| |
| Reported-by: "Scargall, Steve" <steve.scargall@intel.com> |
| Suggested-by: Michal Hocko <mhocko@kernel.org> |
| Signed-off-by: David Hildenbrand <david@redhat.com> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| Reviewed-by: Nathan Fontenot <ndfont@gmail.com> |
| Acked-by: Michal Hocko <mhocko@suse.com> |
| Cc: Dan Williams <dan.j.williams@intel.com> |
| Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| Cc: "Rafael J. Wysocki" <rafael@kernel.org> |
| Cc: Badari Pulavarty <pbadari@us.ibm.com> |
| Cc: Robert Jennings <rcj@linux.vnet.ibm.com> |
| Cc: Heiko Carstens <heiko.carstens@de.ibm.com> |
| Cc: Karel Zak <kzak@redhat.com> |
| Cc: <stable@vger.kernel.org> |
| Link: http://lkml.kernel.org/r/20200128093542.6908-1-david@redhat.com |
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
| Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> |
| |
| diff --git a/drivers/base/memory.c b/drivers/base/memory.c |
| index acd5814ed20e..94e37a56df6f 100644 |
| --- a/drivers/base/memory.c |
| +++ b/drivers/base/memory.c |
| @@ -121,27 +121,13 @@ static ssize_t phys_index_show(struct device *dev, |
| } |
| |
| /* |
| - * Show whether the section of memory is likely to be hot-removable |
| + * Legacy interface that we cannot remove. Always indicate "removable" |
| + * with CONFIG_MEMORY_HOTREMOVE - bad heuristic. |
| */ |
| static ssize_t removable_show(struct device *dev, struct device_attribute *attr, |
| char *buf) |
| { |
| - unsigned long i, pfn; |
| - int ret = 1; |
| - struct memory_block *mem = to_memory_block(dev); |
| - |
| - if (mem->state != MEM_ONLINE) |
| - goto out; |
| - |
| - for (i = 0; i < sections_per_block; i++) { |
| - if (!present_section_nr(mem->start_section_nr + i)) |
| - continue; |
| - pfn = section_nr_to_pfn(mem->start_section_nr + i); |
| - ret &= is_mem_section_removable(pfn, PAGES_PER_SECTION); |
| - } |
| - |
| -out: |
| - return sprintf(buf, "%d\n", ret); |
| + return sprintf(buf, "%d\n", (int)IS_ENABLED(CONFIG_MEMORY_HOTREMOVE)); |
| } |
| |
| /* |
| -- |
| 2.7.4 |
| |