| ************************************************************************* |
| How to choose the correct memmap kernel parameter for PMEM on your system |
| ************************************************************************* |
| |
| When selecting a memmap kernel parameter for PMEM, you have to make sure |
| that the physical addresses you are trying to reserve are backed by usable |
| RAM. This information is easy to find in the e820 table, which the kernel |
| prints at boot and which you can read back via dmesg. |
| |
| Here is an example setup using a virtual machine with 20 GiB of memory: |
| |
| :: |
| |
| # dmesg | grep BIOS-e820 |
| [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable |
| [ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved |
| [ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved |
| [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable |
| [ 0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved |
| [ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved |
| [ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved |
| [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000053fffffff] usable |
| |
| In this output the regions marked as "usable" are fair game to be |
| reserved for the PMEM driver, while the "reserved" regions are not. The |
| last usable region represents the bulk of our available space, so we'll |
| use that. |
| |
| :: |
| |
| [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000053fffffff] usable |
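For a short table like this one, picking the largest usable range by eye is easy. The same step can also be scripted. The following is a rough sketch that assumes the `BIOS-e820: [mem 0x...-0x...] type` line format shown above, which is log output and not a stable kernel ABI. The helper names `parse_e820` and `largest_usable` are made up for this example.

```python
import re

# Matches dmesg lines such as:
# [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000053fffffff] usable
E820_RE = re.compile(
    r"(?P<src>BIOS-e820|user): \[mem 0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+)\] (?P<type>.+)$"
)

# The BIOS-e820 table from the example virtual machine above.
SAMPLE_DMESG = """\
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000053fffffff] usable
"""


def parse_e820(text, source="BIOS-e820"):
    """Return (start, end_inclusive, type) tuples for one e820 table in dmesg text."""
    regions = []
    for line in text.splitlines():
        m = E820_RE.search(line.rstrip())
        if m and m.group("src") == source:
            regions.append((int(m.group("start"), 16), int(m.group("end"), 16), m.group("type")))
    return regions


def largest_usable(regions):
    """Return the "usable" region that covers the most bytes."""
    usable = [r for r in regions if r[2] == "usable"]
    return max(usable, key=lambda r: r[1] - r[0] + 1)


start, end, _ = largest_usable(parse_e820(SAMPLE_DMESG))
print(hex(start), hex(end))              # 0x100000000 0x53fffffff
print(start / 2**30, (end + 1) / 2**30)  # 4.0 21.0
```

On a live system you can pass the output of `dmesg` to the same functions instead. Reading it may require root if `kernel.dmesg_restrict` is set.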
| |
| Plugging these physical addresses into a hex calculator shows that the |
| region starts at 0x0000000100000000 (4 GiB) and ends at 0x000000053fffffff |
| (just below 21 GiB). Say we want to reserve 16 GiB to be used by PMEM. We |
| can start this reservation at 4 GiB, and with a size of 16 GiB it will end |
| at 20 GiB, which is still within this usable range. The memmap syntax takes |
| the size first and the start address after the "!", so this reservation |
| is written as follows: |
| |
| :: |
| |
| memmap=16G!4G |
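The arithmetic above can be written as a small helper. This is a sketch and not a kernel tool. `memmap_param` and `fmt_size` are hypothetical names. The only syntax assumed is the size!start form used here, with the K/M/G suffixes accepted by the kernel's memparse(), which also accepts a plain byte count.

```python
KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30


def fmt_size(nbytes):
    """Format a byte count with the largest K/M/G suffix that divides it exactly."""
    for suffix, unit in (("G", GiB), ("M", MiB), ("K", KiB)):
        if nbytes % unit == 0:
            return f"{nbytes // unit}{suffix}"
    return str(nbytes)  # fall back to a plain byte count


def memmap_param(size, start, region_start, region_end):
    """Build "memmap=<size>!<start>", refusing ranges outside [region_start, region_end]."""
    end = start + size - 1  # last byte of the reservation (inclusive)
    if size <= 0 or start < region_start or end > region_end:
        raise ValueError(f"{hex(start)}-{hex(end)} is not inside {hex(region_start)}-{hex(region_end)}")
    return f"memmap={fmt_size(size)}!{fmt_size(start)}"


# Usable region from the BIOS-e820 table: 0x100000000-0x53fffffff (4 GiB up to 21 GiB).
print(memmap_param(16 * GiB, 4 * GiB, 0x100000000, 0x53fffffff))  # memmap=16G!4G
```

The helper rejects ranges that do not fit. For example, asking for 18 GiB at 4 GiB raises `ValueError`, because that reservation would end at 22 GiB, past the end of the usable region.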
| |
| After rebooting with the new kernel parameter, we can also see the new |
| user-defined e820 table via dmesg (the original BIOS-e820 table is still |
| printed, in case you want to compare): |
| |
| :: |
| |
| # dmesg | grep user: |
| [ 0.000000] user: [mem 0x0000000000000000-0x000000000009fbff] usable |
| [ 0.000000] user: [mem 0x000000000009fc00-0x000000000009ffff] reserved |
| [ 0.000000] user: [mem 0x00000000000f0000-0x00000000000fffff] reserved |
| [ 0.000000] user: [mem 0x0000000000100000-0x00000000bffdffff] usable |
| [ 0.000000] user: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved |
| [ 0.000000] user: [mem 0x00000000feffc000-0x00000000feffffff] reserved |
| [ 0.000000] user: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved |
| [ 0.000000] user: [mem 0x0000000100000000-0x00000004ffffffff] persistent (type 12) |
| [ 0.000000] user: [mem 0x0000000500000000-0x000000053fffffff] usable |
| |
| We can see that the new persistent memory range does indeed start at 4 |
| GiB and end at 20 GiB, and that it lies entirely within the usable memory |
| range reported by the original BIOS-e820 table. |
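You can also confirm this without reading hex by hand, by parsing the user: table the same way. Here is a minimal sketch that assumes the `user: [mem ...] type` format shown above. It works on an excerpt of that table. `persistent_ranges` is a hypothetical helper, and the usable bounds are the ones taken from the BIOS-e820 table earlier.

```python
import re

USER_RE = re.compile(r"user: \[mem 0x([0-9a-f]+)-0x([0-9a-f]+)\] (.+)$")

# Excerpt of the user-defined e820 table above.
SAMPLE_USER = """\
[    0.000000] user: [mem 0x0000000000100000-0x00000000bffdffff] usable
[    0.000000] user: [mem 0x0000000100000000-0x00000004ffffffff] persistent (type 12)
[    0.000000] user: [mem 0x0000000500000000-0x000000053fffffff] usable
"""


def persistent_ranges(text):
    """Return (start, end_inclusive) for every persistent (type 12) entry in the user: table."""
    ranges = []
    for line in text.splitlines():
        m = USER_RE.search(line.rstrip())
        if m and m.group(3).startswith("persistent"):
            ranges.append((int(m.group(1), 16), int(m.group(2), 16)))
    return ranges


USABLE_START, USABLE_END = 0x100000000, 0x53fffffff  # from the BIOS-e820 table

for start, end in persistent_ranges(SAMPLE_USER):
    inside = USABLE_START <= start and end <= USABLE_END
    print(f"{start / 2**30:g} GiB to {(end + 1) / 2**30:g} GiB, inside usable range: {inside}")
# 4 GiB to 20 GiB, inside usable range: True
```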
| |
| If we have the pmem driver loaded, we will see this reserved memory |
| range as /dev/pmem0: |
| |
| :: |
| |
| # fdisk -l /dev/pmem0 |
| Disk /dev/pmem0: 16 GiB, 17179869184 bytes, 33554432 sectors |
| Units: sectors of 1 * 512 = 512 bytes |
| Sector size (logical/physical): 512 bytes / 4096 bytes |
| I/O size (minimum/optimal): 4096 bytes / 4096 bytes |
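The sizes fdisk reports follow directly from the persistent range in the user: table. This quick check uses only values from this page and fdisk's 512-byte logical sectors:

```python
start, end = 0x100000000, 0x4ffffffff  # persistent (type 12) range, end inclusive
size = end - start + 1                 # bytes reserved by memmap=16G!4G
print(size, size // 2**30, size // 512)  # 17179869184 16 33554432
```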
| |
| Another thing you may need to be aware of is the CONFIG_RANDOMIZE_BASE |
| (KASLR) kernel config option. When enabled, it randomizes the physical |
| address at which the kernel image is decompressed and the virtual address |
| at which the kernel image is mapped. On kernels without the fix referenced |
| at the end of this section, this random address is chosen without regard |
| to the memmap kernel command line parameter. |
| |
| This means that the kernel can choose to put itself in the middle of |
| your reserved memmap area. You can observe this behavior via |
| /proc/iomem. |
| |
| Here is /proc/iomem from a system with CONFIG_RANDOMIZE_BASE turned off: |
| |
| :: |
| |
| # cat /proc/iomem |
| 00000000-00000fff : reserved |
| 00001000-0009fbff : System RAM |
| 0009fc00-0009ffff : reserved |
| 000a0000-000bffff : PCI Bus 0000:00 |
| 000c0000-000c97ff : Video ROM |
| 000c9800-000ca5ff : Adapter ROM |
| 000ca800-000ccbff : Adapter ROM |
| 000f0000-000fffff : reserved |
| 000f0000-000fffff : System ROM |
| 00100000-bffd8fff : System RAM |
| 01000000-01b18598 : Kernel code |
| 01b18599-023f53ff : Kernel data |
| 0276d000-0365efff : Kernel bss |
| bffd9000-bfffffff : reserved |
| c0000000-febfffff : PCI Bus 0000:00 |
| f4000000-f7ffffff : 0000:00:02.0 |
| f8000000-fbffffff : 0000:00:02.0 |
| fc000000-fc03ffff : 0000:00:03.0 |
| fc050000-fc051fff : 0000:00:02.0 |
| fc052000-fc052fff : 0000:00:03.0 |
| fc053000-fc053fff : 0000:00:04.0 |
| fc054000-fc054fff : 0000:00:05.7 |
| fc054000-fc054fff : ehci_hcd |
| fc055000-fc055fff : 0000:00:06.0 |
| fec00000-fec003ff : IOAPIC 0 |
| fee00000-fee00fff : Local APIC |
| feffc000-feffffff : reserved |
| fffc0000-ffffffff : reserved |
| 100000000-4ffffffff : Persistent Memory (legacy) |
| 100000000-4ffffffff : namespace0.0 |
| 500000000-53fffffff : System RAM |
| |
| The interesting bits for us are the "System RAM" region from |
| 00100000-bffd8fff, and the "Persistent Memory (legacy)" region from |
| 100000000-4ffffffff. |
| |
| If I turn on CONFIG_RANDOMIZE_BASE on this same system, I get the |
| following: |
| |
| :: |
| |
| # cat /proc/iomem |
| 00000000-00000fff : reserved |
| 00001000-0009fbff : System RAM |
| 0009fc00-0009ffff : reserved |
| 000a0000-000bffff : PCI Bus 0000:00 |
| 000c0000-000c97ff : Video ROM |
| 000c9800-000ca5ff : Adapter ROM |
| 000ca800-000ccbff : Adapter ROM |
| 000f0000-000fffff : reserved |
| 000f0000-000fffff : System ROM |
| 00100000-bffd8fff : System RAM |
| bffd9000-bfffffff : reserved |
| c0000000-febfffff : PCI Bus 0000:00 |
| f4000000-f7ffffff : 0000:00:02.0 |
| f8000000-fbffffff : 0000:00:02.0 |
| fc000000-fc03ffff : 0000:00:03.0 |
| fc050000-fc051fff : 0000:00:02.0 |
| fc052000-fc052fff : 0000:00:03.0 |
| fc053000-fc053fff : 0000:00:04.0 |
| fc054000-fc054fff : 0000:00:05.7 |
| fc054000-fc054fff : ehci_hcd |
| fc055000-fc055fff : 0000:00:06.0 |
| fec00000-fec003ff : IOAPIC 0 |
| fee00000-fee00fff : Local APIC |
| feffc000-feffffff : reserved |
| fffc0000-ffffffff : reserved |
| 100000000-4e6ffffff : Persistent Memory (legacy) |
| 4e7000000-4e968bfff : System RAM |
| 4e7000000-4e7b185d8 : Kernel code |
| 4e7b185d9-4e83f54bf : Kernel data |
| 4e876d000-4e965efff : Kernel bss |
| 4e968c000-4ffffffff : Persistent Memory (legacy) |
| 500000000-53fffffff : System RAM |
| |
| The "System RAM" region containing the kernel image now sits in the middle |
| of my "Persistent Memory (legacy)" region, splitting it in two. This |
| results in the following kernel WARNING: |
| |
| :: |
| |
| [ 6.356180] WARNING: CPU: 4 PID: 689 at kernel/memremap.c:300 devm_memremap_pages+0x3b2/0x4c0 |
| [ 6.357757] devm_memremap_pages attempted on mixed region [mem 0x4e968c000-0x4ffffffff flags 0x200] |
| |
| As a result, no /dev/pmem\* devices are created. |
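You can spot this situation by checking whether any kernel image entry in /proc/iomem overlaps the range you passed to memmap=. The sketch below assumes the `start-end : name` line format shown above. It matches nested entries regardless of indentation. `kernel_in_memmap` is a hypothetical helper name. The sample is an excerpt of the CONFIG_RANDOMIZE_BASE output above, with child entries indented as /proc/iomem prints them. On recent kernels, unprivileged readers may see zeroed addresses, so read /proc/iomem as root.

```python
import re

IOMEM_RE = re.compile(r"^\s*([0-9a-f]+)-([0-9a-f]+) : (.+)$")

# Excerpt of the CONFIG_RANDOMIZE_BASE=y /proc/iomem output above.
SAMPLE_IOMEM = """\
100000000-4e6ffffff : Persistent Memory (legacy)
4e7000000-4e968bfff : System RAM
  4e7000000-4e7b185d8 : Kernel code
  4e7b185d9-4e83f54bf : Kernel data
  4e876d000-4e965efff : Kernel bss
4e968c000-4ffffffff : Persistent Memory (legacy)
500000000-53fffffff : System RAM
"""


def parse_iomem(text):
    """Yield (start, end_inclusive, name) for every /proc/iomem entry, nested or not."""
    for line in text.splitlines():
        m = IOMEM_RE.match(line.rstrip())
        if m:
            yield int(m.group(1), 16), int(m.group(2), 16), m.group(3)


def kernel_in_memmap(text, memmap_start, memmap_size):
    """Return names of kernel image entries overlapping [memmap_start, memmap_start + memmap_size)."""
    lo, hi = memmap_start, memmap_start + memmap_size - 1
    return [name for start, end, name in parse_iomem(text)
            if name.startswith("Kernel ") and start <= hi and end >= lo]


GiB = 1 << 30
print(kernel_in_memmap(SAMPLE_IOMEM, 4 * GiB, 16 * GiB))
# ['Kernel code', 'Kernel data', 'Kernel bss']
```

An empty list means the kernel image stayed clear of the reservation. This is the case for the CONFIG_RANDOMIZE_BASE=n output above, where the kernel sits around 16 to 58 MiB.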
| |
| The CONFIG_RANDOMIZE_BASE (KASLR) collision should be fixed by commit |
| f28442497b5caf ("x86/boot: Fix KASLR and memmap= collision"). |
| |
| However, there seems to be an issue with CONFIG_KASAN at the moment. |