mm, memory_hotplug: allocate memmap from the added memory range for sparse-vmemmap
Physical memory hotadd has to allocate a memmap (struct page array) for
the newly added memory section. kmalloc is currently used for those
allocations.
This has some disadvantages: a) existing memory is consumed for
that purpose (~2MB per 128MB memory section) and b) if the whole node
is movable, then we have off-node struct pages, which has performance
drawbacks.
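The ~2MB figure follows from the section size. A minimal C sketch of the
arithmetic, assuming 4KB base pages and a 64-byte struct page (typical for
x86-64, but not stated above); the macro and function names are
illustrative only:

```c
/* Assumptions (not from the patch): 4KB base pages and a 64-byte
 * struct page, as is typical on x86-64. */
#define PAGE_SIZE_BYTES   4096UL
#define STRUCT_PAGE_BYTES 64UL
#define SECTION_BYTES     (128UL << 20)	/* one 128MB memory section */

/* Number of struct pages needed to describe one section. */
static unsigned long pages_per_section(void)
{
	return SECTION_BYTES / PAGE_SIZE_BYTES;	/* 32768 */
}

/* Bytes of memmap (struct page array) consumed per hot-added section. */
static unsigned long memmap_bytes_per_section(void)
{
	return pages_per_section() * STRUCT_PAGE_BYTES;	/* 2MB */
}
```

In other words, under these assumptions the memmap costs 1/64 of every
hot-added section.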
a) has turned out to be a problem for memory hotplug based ballooning
because userspace might not react in time to online the memory while
the memory consumed during physical hotadd is enough to push the
system to OOM. 31bc3858ea3e ("memory-hotplug: add automatic onlining
policy for the newly added memory") has been added to work around that
problem.
We can do much better when CONFIG_SPARSEMEM_VMEMMAP=y because vmemmap
page tables can map arbitrary memory. That means that we can simply
use the beginning of each memory section and map struct pages there.
The struct pages which back the allocated space then just need to be
treated carefully so that we know they are not usable.
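To illustrate what using the beginning of each section means, here is a
sketch computing which pfns of a hot-added section would back its own
memmap. It assumes 4KB pages, a 64-byte struct page and 128MB sections;
memmap_pages, first_usable_pfn and pfn_backs_memmap are hypothetical
helpers, not kernel functions:

```c
/* Assumptions (not from the patch): 4KB pages, 64-byte struct page,
 * 128MB sections. All helper names are hypothetical. */
#define PAGE_SIZE_BYTES   4096UL
#define STRUCT_PAGE_BYTES 64UL
#define PAGES_PER_SECTION ((128UL << 20) / PAGE_SIZE_BYTES)	/* 32768 */

/* Pages needed to hold the memmap for nr_pages, rounded up. */
static unsigned long memmap_pages(unsigned long nr_pages)
{
	unsigned long bytes = nr_pages * STRUCT_PAGE_BYTES;

	return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* First pfn of the section that may be handed to the page allocator. */
static unsigned long first_usable_pfn(unsigned long section_start_pfn)
{
	return section_start_pfn + memmap_pages(PAGES_PER_SECTION);
}

/* Non-zero if pfn backs the section's own struct pages. */
static int pfn_backs_memmap(unsigned long section_start_pfn, unsigned long pfn)
{
	return pfn >= section_start_pfn &&
	       pfn < first_usable_pfn(section_start_pfn);
}
```

Under these assumptions the first 512 pfns (2MB) of every section hold
the memmap, and only the remaining 32256 pfns are usable memory.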
Add __{Set,Clear}PageVmemmap helpers to distinguish those pages in pfn
walkers. We do not have any spare page flag for this purpose, so use a
combination of the PageReserved bit, which already tells the core mm
code that the page should be ignored, and VMEMMAP_PAGE (a value with
all bits set except PAGE_MAPPING_FLAGS) stored in page->mapping.
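A userspace model of how the flag combination could be encoded. struct
page is reduced to two fields, PG_RESERVED_BIT is a stand-in for
PG_reserved, and the PAGE_MAPPING_* values mirror the low mapping flag
bits; the helper bodies are an assumption about one plausible shape, not
the patch's code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced model: PG_RESERVED_BIT stands in for the kernel's PG_reserved. */
#define PG_RESERVED_BIT      0
#define PAGE_MAPPING_ANON    ((uintptr_t)0x1)
#define PAGE_MAPPING_MOVABLE ((uintptr_t)0x2)
#define PAGE_MAPPING_FLAGS   (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
/* All bits set except the PAGE_MAPPING_FLAGS bits. */
#define VMEMMAP_PAGE         (~PAGE_MAPPING_FLAGS)

struct page {
	unsigned long flags;
	void *mapping;
};

static void __SetPageVmemmap(struct page *page)
{
	page->flags |= 1UL << PG_RESERVED_BIT;
	page->mapping = (void *)VMEMMAP_PAGE;
}

static void __ClearPageVmemmap(struct page *page)
{
	page->flags &= ~(1UL << PG_RESERVED_BIT);
	page->mapping = NULL;
}

/* Both conditions must hold: PageReserved alone is shared with other
 * reserved pages, and the mapping value is not a real page flag. */
static bool PageVmemmap(const struct page *page)
{
	return (page->flags & (1UL << PG_RESERVED_BIT)) &&
	       (uintptr_t)page->mapping == VMEMMAP_PAGE;
}
```

Requiring both conditions keeps ordinary reserved pages (mapping == NULL)
from being mistaken for memmap backing, and because VMEMMAP_PAGE leaves
the PAGE_MAPPING_FLAGS bits clear, code that tests those bits does not
treat the page as anonymous or movable.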
On the memory hotplug front, reuse the vmem_altmap infrastructure to
override the default allocator used by __vmemap_populate. Once the memmap
is allocated, we need a way to mark the altmap pfns used for the
allocation, and this is done by a new vmem_altmap::flush_alloc_pfns
callback. The mark_vmemmap_pages implementation then simply calls
__SetPageVmemmap on all struct pages backing those pfns. The callback is
called from sparse_add_one_section after the memmap has been initialized
to 0.
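The allocation and marking flow could be modeled as follows. struct
altmap is a simplified stand-in for struct vmem_altmap; the extra ctx
argument of flush_alloc_pfns, altmap_alloc_pfns, add_section and the
byte array standing in for __SetPageVmemmap on the real struct pages are
all assumptions made to keep the sketch self-contained:

```c
#include <string.h>

/* Hypothetical, reduced stand-in for struct vmem_altmap. The ctx argument
 * of flush_alloc_pfns is an assumption made for this sketch. */
struct altmap {
	unsigned long base_pfn;	/* first pfn of the hot-added range */
	unsigned long free;	/* pfns still available for the memmap */
	unsigned long alloc;	/* pfns handed out so far */
	void (*flush_alloc_pfns)(struct altmap *altmap, void *ctx);
};

/* Take nr_pfns from the start of the range instead of the regular
 * allocator. Returns the first pfn, or (unsigned long)-1 if exhausted. */
static unsigned long altmap_alloc_pfns(struct altmap *altmap,
				       unsigned long nr_pfns)
{
	unsigned long pfn;

	if (nr_pfns > altmap->free)
		return (unsigned long)-1;
	pfn = altmap->base_pfn + altmap->alloc;
	altmap->alloc += nr_pfns;
	altmap->free -= nr_pfns;
	return pfn;
}

/* Callback: mark every pfn handed out so far; memmap[i] = 1 stands in
 * for __SetPageVmemmap() on the struct page of pfn base_pfn + i. */
static void mark_vmemmap_pages(struct altmap *altmap, void *ctx)
{
	unsigned char *memmap = ctx;

	for (unsigned long i = 0; i < altmap->alloc; i++)
		memmap[i] = 1;
}

/* Reduced add-section flow: allocate the memmap via the altmap, zero it,
 * and only then flush the allocated pfns so the marks survive. */
static int add_section(struct altmap *altmap, unsigned char *memmap,
		       unsigned long nr_pages, unsigned long memmap_pfns)
{
	if (altmap_alloc_pfns(altmap, memmap_pfns) == (unsigned long)-1)
		return -1;
	memset(memmap, 0, nr_pages);	/* memmap initialized to 0 */
	if (altmap->flush_alloc_pfns)
		altmap->flush_alloc_pfns(altmap, memmap);
	return 0;
}
```

The ordering is the point of the sketch: marking after the memmap has
been zeroed means the initialization cannot wipe the marks.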
We also have to be careful about those pages during online and offline
operations. They are simply ignored.
Finally __ClearPageVmemmap is called when the vmemmap page tables are
torn down.
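The online, offline and teardown handling can be sketched with the same
reduced struct page model. online_pages_range, offline_pages_range and
teardown_vmemmap are hypothetical names; real onlining frees pages to
the buddy allocator, which is modeled here only by a freed flag:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced model; PG_RESERVED_BIT stands in for PG_reserved. */
#define PG_RESERVED_BIT    0
#define PAGE_MAPPING_FLAGS ((uintptr_t)0x3)
#define VMEMMAP_PAGE       (~PAGE_MAPPING_FLAGS)

struct page {
	unsigned long flags;
	void *mapping;
	bool freed;	/* models "handed to the page allocator" */
};

static bool PageVmemmap(const struct page *page)
{
	return (page->flags & (1UL << PG_RESERVED_BIT)) &&
	       (uintptr_t)page->mapping == VMEMMAP_PAGE;
}

/* Online: free every page except those backing the memmap. */
static unsigned long online_pages_range(struct page *pages, unsigned long nr)
{
	unsigned long onlined = 0;

	for (unsigned long i = 0; i < nr; i++) {
		if (PageVmemmap(&pages[i]))
			continue;	/* memmap backing, never freed */
		pages[i].freed = true;
		onlined++;
	}
	return onlined;
}

/* Offline: take pages back, again skipping the memmap backing, which
 * stays in use until the section is removed. */
static unsigned long offline_pages_range(struct page *pages, unsigned long nr)
{
	unsigned long offlined = 0;

	for (unsigned long i = 0; i < nr; i++) {
		if (PageVmemmap(&pages[i]))
			continue;
		pages[i].freed = false;
		offlined++;
	}
	return offlined;
}

/* Vmemmap page table teardown: drop the marking (models __ClearPageVmemmap). */
static void teardown_vmemmap(struct page *pages, unsigned long nr)
{
	for (unsigned long i = 0; i < nr; i++) {
		if (PageVmemmap(&pages[i])) {
			pages[i].flags &= ~(1UL << PG_RESERVED_BIT);
			pages[i].mapping = NULL;
		}
	}
}
```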
Please note that only memory hotplug currently uses this
allocation scheme. The boot time memmap allocation could use the same
trick as well, but this is not done yet.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
13 files changed