| From: Patrick Roy <roypat@amazon.co.uk> |
| Subject: secretmem: disable memfd_secret() if arch cannot set direct map |
| Date: Tue, 1 Oct 2024 09:00:41 +0100 |
| |
| Return -ENOSYS from the memfd_secret() syscall if !can_set_direct_map(). |
| This is the case, for example, on some arm64 configurations, where marking 4k |
| PTEs in the direct map not present can only be done if the direct map is |
| set up at 4k granularity in the first place (as ARM's break-before-make |
| semantics do not easily allow breaking apart large/gigantic pages). |
| |
| More precisely, on arm64 systems with !can_set_direct_map(), |
| set_direct_map_invalid_noflush() is a no-op, but it returns success |
| (0) instead of an error. This means that memfd_secret() will seemingly |
| "work" (e.g. syscall succeeds, you can mmap the fd and fault in pages), |
| but it does not actually achieve its goal of removing its memory from the |
| direct map. |
| |
| Note that with this patch, memfd_secret() will start erroring on systems |
| where can_set_direct_map() returns false (arm64 with |
| CONFIG_RODATA_FULL_DEFAULT_ENABLED=n, CONFIG_DEBUG_PAGEALLOC=n and |
| CONFIG_KFENCE=n), but that still seems better than the current silent |
| failure. Since CONFIG_RODATA_FULL_DEFAULT_ENABLED defaults to 'y', most |
| arm64 systems actually have a working memfd_secret() and won't be |
| affected. |
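
For userspace callers, the practical consequence is that ENOSYS from memfd_secret() has to be treated as "not available here" rather than as a fatal error. The following is a minimal sketch of such a probe, not part of this patch. The helper name secret_buffer_probe() and its return convention are hypothetical, and it assumes the libc headers may or may not define SYS_memfd_secret.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not part of the patch).
 * Returns 0 if a secretmem page could be created, mapped and written,
 * 1 if memfd_secret() is unavailable (ENOSYS), and -1 on any other error.
 */
static int secret_buffer_probe(void)
{
#ifdef SYS_memfd_secret
	long page = sysconf(_SC_PAGESIZE);
	int fd;
	char *p;

	if (page <= 0)
		return -1;

	fd = (int)syscall(SYS_memfd_secret, 0);
	if (fd < 0)
		/* ENOSYS: disabled, or the arch cannot set the direct map */
		return errno == ENOSYS ? 1 : -1;

	if (ftruncate(fd, page) < 0) {
		close(fd);
		return -1;
	}

	/* secretmem mappings must be shared */
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}

	p[0] = 1;	/* fault in the first page */

	munmap(p, (size_t)page);
	close(fd);
	return 0;
#else
	/* Headers too old to know the syscall number: treat as unavailable. */
	return 1;
#endif
}
```

A caller would fall back to ordinary memory (for example mmap() plus mlock()) when the probe returns 1. Note that a return of 0 only shows that the syscall path works; on kernels without this patch it does not prove that the pages actually left the direct map.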
| |
| From going through the iterations of the original memfd_secret patch |
| series, it seems that disabling the syscall in these scenarios was the |
| intended behavior [1] (preferred over having |
| set_direct_map_invalid_noflush() return an error, as that would result in |
| SIGBUSes at page-fault time). However, the check for it got dropped between |
| v16 [2] and v17 [3], when secretmem moved away from CMA allocations. |
| |
| [1]: https://lore.kernel.org/lkml/20201124164930.GK8537@kernel.org/ |
| [2]: https://lore.kernel.org/lkml/20210121122723.3446-11-rppt@kernel.org/#t |
| [3]: https://lore.kernel.org/lkml/20201125092208.12544-10-rppt@kernel.org/ |
| |
| Link: https://lkml.kernel.org/r/20241001080056.784735-1-roypat@amazon.co.uk |
| Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas") |
| Signed-off-by: Patrick Roy <roypat@amazon.co.uk> |
| Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> |
| Cc: Alexander Graf <graf@amazon.com> |
| Cc: David Hildenbrand <david@redhat.com> |
| Cc: James Gowans <jgowans@amazon.com> |
| Cc: <stable@vger.kernel.org> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| --- |
| |
| mm/secretmem.c | 4 ++-- |
| 1 file changed, 2 insertions(+), 2 deletions(-) |
| |
| --- a/mm/secretmem.c~secretmem-disable-memfd_secret-if-arch-cannot-set-direct-map |
| +++ a/mm/secretmem.c |
| @@ -238,7 +238,7 @@ SYSCALL_DEFINE1(memfd_secret, unsigned i |
| /* make sure local flags do not confict with global fcntl.h */ |
| BUILD_BUG_ON(SECRETMEM_FLAGS_MASK & O_CLOEXEC); |
| |
| - if (!secretmem_enable) |
| + if (!secretmem_enable || !can_set_direct_map()) |
| return -ENOSYS; |
| |
| if (flags & ~(SECRETMEM_FLAGS_MASK | O_CLOEXEC)) |
| @@ -280,7 +280,7 @@ static struct file_system_type secretmem |
| |
| static int __init secretmem_init(void) |
| { |
| - if (!secretmem_enable) |
| + if (!secretmem_enable || !can_set_direct_map()) |
| return 0; |
| |
| secretmem_mnt = kern_mount(&secretmem_fs); |
| _ |