| .\" Copyright (C) Michael Kerrisk, 2004 |
| .\" using some material drawn from earlier man pages |
| .\" written by Thomas Kuhn, Copyright 1996 |
| .\" |
| .\" %%%LICENSE_START(GPLv2+_DOC_FULL) |
| .\" This is free documentation; you can redistribute it and/or |
| .\" modify it under the terms of the GNU General Public License as |
| .\" published by the Free Software Foundation; either version 2 of |
| .\" the License, or (at your option) any later version. |
| .\" |
| .\" The GNU General Public License's references to "object code" |
| .\" and "executables" are to be interpreted as the output of any |
| .\" document formatting or typesetting system, including |
| .\" intermediate and printed output. |
| .\" |
| .\" This manual is distributed in the hope that it will be useful, |
| .\" but WITHOUT ANY WARRANTY; without even the implied warranty of |
| .\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
| .\" GNU General Public License for more details. |
| .\" |
| .\" You should have received a copy of the GNU General Public |
| .\" License along with this manual; if not, see |
| .\" <http://www.gnu.org/licenses/>. |
| .\" %%%LICENSE_END |
| .\" |
| .TH MLOCK 2 2021-03-22 "Linux" "Linux Programmer's Manual" |
| .SH NAME |
| mlock, mlock2, munlock, mlockall, munlockall \- lock and unlock memory |
| .SH SYNOPSIS |
| .nf |
| .B #include <sys/mman.h> |
| .PP |
| .BI "int mlock(const void *" addr ", size_t " len ); |
| .BI "int mlock2(const void *" addr ", size_t " len ", unsigned int " flags ); |
| .BI "int munlock(const void *" addr ", size_t " len ); |
| .PP |
| .BI "int mlockall(int " flags ); |
| .B int munlockall(void); |
| .fi |
| .SH DESCRIPTION |
| .BR mlock (), |
| .BR mlock2 (), |
| and |
| .BR mlockall () |
| lock part or all of the calling process's virtual address |
| space into RAM, preventing that memory from being paged to the |
| swap area. |
| .PP |
| .BR munlock () |
| and |
| .BR munlockall () |
| perform the converse operation, |
| unlocking part or all of the calling process's virtual |
| address space, so that pages in the specified virtual address range can |
| once more be swapped out if required by the kernel memory manager. |
| .PP |
| Memory locking and unlocking are performed in units of whole pages. |
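| .PP |
| To illustrate this page granularity, |
| the following sketch computes the page-aligned range that a call such as |
| .BR mlock () |
| would actually cover: |
| the start is rounded down and the end is rounded up to a page boundary. |
| The function name |
| .I locked_span |
| is hypothetical. |
| The page size is passed in explicitly so that the arithmetic can be checked on any machine |
| (in practice it would come from |
| .IR sysconf(_SC_PAGESIZE) ), |
| and overflow of |
| .IR addr + len |
| is not handled. |

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: for the range [addr, addr + len) and a page size
   that is a power of two, store the page-aligned start and length that a
   page-granular call such as mlock() would cover, and return the number
   of pages.  Overflow of addr + len is not checked in this sketch. */
static size_t locked_span(uintptr_t addr, size_t len, size_t page,
                          uintptr_t *start, size_t *span_len)
{
    uintptr_t mask = ~(uintptr_t)(page - 1);
    uintptr_t first = addr & mask;                   /* round start down */
    uintptr_t end = (addr + len + page - 1) & mask;  /* round end up */

    *start = first;
    *span_len = (size_t)(end - first);
    return *span_len / page;
}
```

| For example, with 4096-byte pages, a 10-byte range starting at address 0x1001 |
| lies entirely within the page at 0x1000, so exactly one page is locked. |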
| .SS mlock(), mlock2(), and munlock() |
| .BR mlock () |
| locks pages in the address range starting at |
| .I addr |
| and continuing for |
| .I len |
| bytes. |
| All pages that contain a part of the specified address range are |
| guaranteed to be resident in RAM when the call returns successfully; |
| the pages are guaranteed to stay in RAM until later unlocked. |
| .PP |
| .BR mlock2 () |
| .\" commit a8ca5d0ecbdde5cc3d7accacbd69968b0c98764e |
| .\" commit de60f5f10c58d4f34b68622442c0e04180367f3f |
| .\" commit b0f205c2a3082dd9081f9a94e50658c5fa906ff1 |
| also locks pages in the specified range starting at |
| .I addr |
| and continuing for |
| .I len |
| bytes. |
| However, the state of the pages contained in that range after the call |
| returns successfully will depend on the value in the |
| .I flags |
| argument. |
| .PP |
| The |
| .I flags |
| argument can be either 0 or the following constant: |
| .TP |
| .B MLOCK_ONFAULT |
| Lock pages that are currently resident and mark the entire range so |
| that the remaining nonresident pages are locked when they are populated |
| by a page fault. |
| .PP |
| If |
| .I flags |
| is 0, |
| .BR mlock2 () |
| behaves exactly the same as |
| .BR mlock (). |
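| .PP |
| A minimal sketch of using |
| .B MLOCK_ONFAULT |
| follows. |
| It assumes glibc 2.27 or later, which declares |
| .BR mlock2 () |
| when |
| .B _GNU_SOURCE |
| is defined. |
| The helper name |
| .I lock_on_fault |
| is hypothetical, and its fallback to |
| .BR mlock () |
| when the kernel reports |
| .B ENOSYS |
| (kernels before Linux 4.4) is a design choice of this example, not part of the API; |
| note that the fallback locks, and therefore populates, the whole range. |

```c
#define _GNU_SOURCE        /* expose mlock2() and MLOCK_ONFAULT (glibc >= 2.27) */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: lock [addr, addr + len) so that resident pages are
   locked now and the remaining pages are locked as they are faulted in.
   If the kernel lacks mlock2() (ENOSYS, before Linux 4.4), fall back to
   mlock(), which locks and populates every page in the range at once. */
static int lock_on_fault(const void *addr, size_t len)
{
    if (mlock2(addr, len, MLOCK_ONFAULT) == 0)
        return 0;
    if (errno == ENOSYS)
        return mlock(addr, len);
    return -1;             /* errno is left as set by mlock2() */
}
```

| The range would typically be a large anonymous mapping obtained with |
| .BR mmap (2), |
| of which only a few pages are expected to be touched. |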
| .PP |
| .BR munlock () |
| unlocks pages in the address range starting at |
| .I addr |
| and continuing for |
| .I len |
| bytes. |
| After this call, all pages that contain a part of the specified |
| memory range can be moved to external swap space again by the kernel. |
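| .PP |
| The following sketch shows the usual lock, use, scrub, unlock sequence |
| for a buffer that holds secret data. |
| The helper name |
| .I use_secret |
| and its placeholder work step are hypothetical. |
| The buffer is cleared before |
| .BR munlock () |
| because afterward its pages may be swapped out again; |
| .BR explicit_bzero (3) |
| (glibc 2.25 and later) is used because an ordinary |
| .BR memset (3) |
| of a buffer that is never read again may be removed by the compiler. |

```c
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: keep a caller-supplied secret buffer resident while
   it is in use, then scrub it and drop the lock. */
static int use_secret(unsigned char *buf, size_t len)
{
    if (mlock(buf, len) == -1)
        return -1;                 /* e.g., ENOMEM or EPERM; see ERRORS */

    memset(buf, 0xA5, len);        /* placeholder for real secret handling */

    explicit_bzero(buf, len);      /* scrub while the pages are still locked */
    return munlock(buf, len);      /* pages may now be swapped out again */
}
```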
| .SS mlockall() and munlockall() |
| .BR mlockall () |
| locks all pages mapped into the address space of the |
| calling process. |
| This includes the pages of the code, data, and stack |
| segments, as well as shared libraries, user-space kernel data, shared |
| memory, and memory-mapped files. |
| All mapped pages are guaranteed |
| to be resident in RAM when the call returns successfully; |
| the pages are guaranteed to stay in RAM until later unlocked. |
| .PP |
| The |
| .I flags |
| argument is constructed as the bitwise OR of one or more of the |
| following constants: |
| .TP |
| .B MCL_CURRENT |
| Lock all pages which are currently mapped into the address space of |
| the process. |
| .TP |
| .B MCL_FUTURE |
| Lock all pages which will become mapped into the address space of the |
| process in the future. |
| These could be, for instance, new pages required |
| by a growing heap and stack as well as new memory-mapped files or |
| shared memory regions. |
| .TP |
| .BR MCL_ONFAULT " (since Linux 4.4)" |
| Used together with |
| .BR MCL_CURRENT , |
| .BR MCL_FUTURE , |
| or both. |
| Mark all current (with |
| .BR MCL_CURRENT ) |
| or future (with |
| .BR MCL_FUTURE ) |
| mappings to lock pages when they are faulted in. |
| When used with |
| .BR MCL_CURRENT , |
| all present pages are locked, but |
| .BR mlockall () |
| will not fault in non-present pages. |
| When used with |
| .BR MCL_FUTURE , |
| all future mappings will be marked to lock pages when they are faulted |
| in, but they will not be populated by the lock when the mapping is |
| created. |
| .B MCL_ONFAULT |
| must be used with either |
| .B MCL_CURRENT |
| or |
| .B MCL_FUTURE |
| or both. |
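| .PP |
| Because |
| .B MCL_ONFAULT |
| is meaningful only in combination with the other flags, |
| a program might validate the combination before calling |
| .BR mlockall (). |
| The helper |
| .I mlockall_flags |
| below is hypothetical. |
| Besides rejecting |
| .B MCL_ONFAULT |
| on its own, it also rejects an empty flags word, since without |
| .B MCL_CURRENT |
| or |
| .B MCL_FUTURE |
| there is nothing to lock. |
| It assumes a C library whose |
| .I <sys/mman.h> |
| defines |
| .BR MCL_ONFAULT . |

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical helper: build an mlockall() flags word.  Combinations that
   cannot lock anything (neither MCL_CURRENT nor MCL_FUTURE), including
   MCL_ONFAULT on its own, are rejected with errno set to EINVAL. */
static int mlockall_flags(bool current, bool future, bool onfault)
{
    int flags = (current ? MCL_CURRENT : 0) | (future ? MCL_FUTURE : 0);

    if (flags == 0) {
        errno = EINVAL;          /* the error mlockall() itself reports */
        return -1;
    }
    if (onfault)
        flags |= MCL_ONFAULT;    /* valid only alongside CURRENT/FUTURE */
    return flags;
}
```

| A nonnegative result can be passed directly to |
| .BR mlockall (); |
| a result of \-1 leaves |
| .I errno |
| set to |
| .BR EINVAL . |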
| .PP |
| If |
| .B MCL_FUTURE |
| has been specified, then a later call that allocates memory (e.g., |
| .BR mmap (2), |
| .BR sbrk (2), |
| or |
| .BR malloc (3)) |
| may fail if it would cause the number of locked bytes to exceed |
| the permitted maximum (see below). |
| In the same circumstances, stack growth may likewise fail: |
| the kernel will deny stack expansion and deliver a |
| .B SIGSEGV |
| signal to the process. |
| .PP |
| .BR munlockall () |
| unlocks all pages mapped into the address space of the |
| calling process. |
| .SH RETURN VALUE |
| On success, these system calls return 0. |
| On error, \-1 is returned, |
| .I errno |
| is set to indicate the error, |
| and no changes are made to any locks in the |
| address space of the process. |
| .SH ERRORS |
| .TP |
| .B ENOMEM |
| (Linux 2.6.9 and later) the caller had a nonzero |
| .B RLIMIT_MEMLOCK |
| soft resource limit, but tried to lock more memory than the limit |
| permitted. |
| This limit is not enforced if the process is privileged |
| .RB ( CAP_IPC_LOCK ). |
| .TP |
| .B ENOMEM |
| (Linux 2.4 and earlier) the calling process tried to lock more than |
| half of RAM. |
| .\" In the case of mlock(), this check is somewhat buggy: it doesn't |
| .\" take into account whether the to-be-locked range overlaps with |
| .\" already locked pages. Thus, suppose we allocate |
| .\" (num_physpages / 4 + 1) of memory, and lock those pages once using |
| .\" mlock(), and then lock the *same* page range a second time. |
| .\" In the case, the second mlock() call will fail, since the check |
| .\" calculates that the process is trying to lock (num_physpages / 2 + 2) |
| .\" pages, which of course is not true. (MTK, Nov 04, kernel 2.4.28) |
| .TP |
| .B EPERM |
| The caller is not privileged, but needs privilege |
| .RB ( CAP_IPC_LOCK ) |
| to perform the requested operation. |
| .\"SVr4 documents an additional EAGAIN error code. |
| .PP |
| For |
| .BR mlock (), |
| .BR mlock2 (), |
| and |
| .BR munlock (): |
| .TP |
| .B EAGAIN |
| Some or all of the specified address range could not be locked. |
| .TP |
| .B EINVAL |
| The result of the addition |
| .IR addr + len |
| was less than |
| .I addr |
| (e.g., the addition may have resulted in an overflow). |
| .TP |
| .B EINVAL |
| (Not on Linux) |
| .I addr |
| was not a multiple of the page size. |
| .TP |
| .B ENOMEM |
| Some of the specified address range does not correspond to mapped |
| pages in the address space of the process. |
| .TP |
| .B ENOMEM |
| Locking or unlocking a region would result in the total number of |
| mappings with distinct attributes (e.g., locked versus unlocked) |
| exceeding the allowed maximum. |
| .\" I.e., the number of VMAs would exceed the 64kB maximum |
| (For example, unlocking a range in the middle of a currently locked |
| mapping would result in three mappings: |
| two locked mappings at each end and an unlocked mapping in the middle.) |
| .PP |
| For |
| .BR mlock2 (): |
| .TP |
| .B EINVAL |
| Unknown \fIflags\fP were specified. |
| .PP |
| For |
| .BR mlockall (): |
| .TP |
| .B EINVAL |
| Unknown \fIflags\fP were specified or |
| .B MCL_ONFAULT |
| was specified without either |
| .B MCL_FUTURE |
| or |
| .BR MCL_CURRENT . |
| .PP |
| For |
| .BR munlockall (): |
| .TP |
| .B EPERM |
| (Linux 2.6.8 and earlier) The caller was not privileged |
| .RB ( CAP_IPC_LOCK ). |
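| .PP |
| When reporting a failure, a program may want a more specific message than |
| .BR strerror (3) |
| provides. |
| The following hypothetical helper, |
| .IR mlock_error_hint , |
| maps the |
| .I errno |
| values listed above for |
| .BR mlock (), |
| .BR mlock2 (), |
| and |
| .BR munlock () |
| to short hints; the wording is illustrative only. |

```c
#include <errno.h>

/* Hypothetical helper: map an errno value from mlock(), mlock2(), or
   munlock() to a short diagnostic based on the ERRORS list. */
static const char *mlock_error_hint(int err)
{
    switch (err) {
    case ENOMEM:
        return "over RLIMIT_MEMLOCK, unmapped range, or too many mappings";
    case EPERM:
        return "caller lacks CAP_IPC_LOCK and the operation requires it";
    case EAGAIN:
        return "some or all of the range could not be locked";
    case EINVAL:
        return "addr + len overflowed, or unknown flags were given";
    default:
        return "unexpected error";
    }
}
```

| A caller would pass |
| .I errno |
| immediately after the failed call, before any other library call can change it. |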
| .SH VERSIONS |
| .BR mlock2 () |
| is available since Linux 4.4; |
| glibc support was added in version 2.27. |
| .SH CONFORMING TO |
| .BR mlock (), |
| .BR munlock (), |
| .BR mlockall (), |
| and |
| .BR munlockall (): |
| POSIX.1-2001, POSIX.1-2008, SVr4. |
| .PP |
| .BR mlock2 () |
| is Linux-specific. |
| .PP |
| On POSIX systems on which |
| .BR mlock () |
| and |
| .BR munlock () |
| are available, |
| .B _POSIX_MEMLOCK_RANGE |
| is defined in \fI<unistd.h>\fP and the number of bytes in a page |
| can be determined from the constant |
| .B PAGESIZE |
| (if defined) in \fI<limits.h>\fP or by calling |
| .IR sysconf(_SC_PAGESIZE) . |
| .PP |
| On POSIX systems on which |
| .BR mlockall () |
| and |
| .BR munlockall () |
| are available, |
| .B _POSIX_MEMLOCK |
| is defined in \fI<unistd.h>\fP to a value greater than 0. |
| (See also |
| .BR sysconf (3).) |
| .\" POSIX.1-2001: It shall be defined to -1 or 0 or 200112L. |
| .\" -1: unavailable, 0: ask using sysconf(). |
| .\" glibc defines it to 1. |
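| .PP |
| The following sketch performs these checks. |
| It follows the POSIX convention for option macros: |
| undefined or \-1 means unsupported, |
| 0 means that availability must be queried with |
| .BR sysconf (3) |
| at run time, |
| and a positive value means supported. |
| The helper names are hypothetical. |

```c
#include <unistd.h>

/* Are mlock() and munlock() available? */
static int memlock_range_supported(void)
{
#if !defined(_POSIX_MEMLOCK_RANGE) || _POSIX_MEMLOCK_RANGE == -1
    return 0;                                  /* not supported */
#elif _POSIX_MEMLOCK_RANGE == 0
    return sysconf(_SC_MEMLOCK_RANGE) > 0;     /* ask at run time */
#else
    return 1;                                  /* supported */
#endif
}

/* Are mlockall() and munlockall() available? */
static int memlockall_supported(void)
{
#if !defined(_POSIX_MEMLOCK) || _POSIX_MEMLOCK == -1
    return 0;
#elif _POSIX_MEMLOCK == 0
    return sysconf(_SC_MEMLOCK) > 0;
#else
    return 1;
#endif
}

/* Page size to which mlock() ranges should be aligned; -1 on error. */
static long memlock_page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}
```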
| .SH NOTES |
| Memory locking has two main applications: real-time algorithms and |
| high-security data processing. |
| Real-time applications require |
| deterministic timing, and, like scheduling, paging is one major cause |
| of unexpected program execution delays. |
| Real-time applications will |
| usually also switch to a real-time scheduler with |
| .BR sched_setscheduler (2). |
| Cryptographic security software often handles critical bytes like |
| passwords or secret keys as data structures. |
| As a result of paging, |
| these secrets could be transferred onto a persistent swap store medium, |
| where they might be accessible to an attacker long after the security |
| software has erased the secrets in RAM and terminated. |
| (But be aware that the suspend mode on laptops and some desktop |
| computers will save a copy of the system's RAM to disk, regardless |
| of memory locks.) |
| .PP |
| Real-time processes that are using |
| .BR mlockall () |
| to prevent delays on page faults should reserve enough |
| locked stack pages before entering the time-critical section, |
| so that no page fault can be caused by function calls. |
| This can be achieved by calling a function that allocates a |
| sufficiently large automatic variable (an array) and writes to the |
| memory occupied by this array in order to touch these stack pages. |
| This way, enough pages will be mapped for the stack and can be |
| locked into RAM. |
| The dummy writes ensure that not even copy-on-write |
| page faults can occur in the critical section. |
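| .PP |
| A minimal sketch of such a function follows. |
| The name |
| .I prefault_stack |
| and the 64 KiB value of |
| .I MAX_SAFE_STACK |
| are assumptions; the size should be chosen from the stack actually used by the time-critical code. |
| The array is declared |
| .I volatile |
| so that the writes are not optimized away, and one write per page is enough to fault each page in. |

```c
#include <stddef.h>
#include <unistd.h>

/* Assumed upper bound on the stack used by the time-critical section. */
#define MAX_SAFE_STACK (64 * 1024)

/* Hypothetical helper: touch every page of a MAX_SAFE_STACK-byte stack
   array so those pages are mapped (and, under mlockall(), locked) before
   the time-critical section starts.  Returns the number of bytes covered. */
static size_t prefault_stack(void)
{
    volatile unsigned char dummy[MAX_SAFE_STACK];
    long page = sysconf(_SC_PAGESIZE);
    size_t step = page > 0 ? (size_t)page : 4096;  /* fallback if sysconf fails */

    for (size_t i = 0; i < sizeof dummy; i += step)
        dummy[i] = 0;                 /* one write per page faults it in */
    dummy[sizeof dummy - 1] = 0;      /* array may straddle one extra page */
    return sizeof dummy;
}
```

| It would be called once after |
| .I "mlockall(MCL_CURRENT | MCL_FUTURE)" |
| and before entering the time-critical section. |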
| .PP |
| Memory locks are not inherited by a child created via |
| .BR fork (2) |
| and are automatically removed (unlocked) during an |
| .BR execve (2) |
| or when the process terminates. |
| The |
| .BR mlockall () |
| .B MCL_FUTURE |
| and |
| .B MCL_FUTURE | MCL_ONFAULT |
| settings are not inherited by a child created via |
| .BR fork (2) |
| and are cleared during an |
| .BR execve (2). |
| .PP |
| Note that |
| .BR fork (2) |
| will prepare the address space for a copy-on-write operation. |
| The consequence is that any write access that follows will cause |
| a page fault that in turn may cause high latencies for a real-time process. |
| Therefore, it is crucial not to invoke |
| .BR fork (2) |
| after an |
| .BR mlockall () |
| or |
| .BR mlock () |
| operation\(emnot even from a thread which runs at a low priority within |
| a process which also has a thread running at elevated priority. |
| .PP |
| The memory lock on an address range is automatically removed |
| if the address range is unmapped via |
| .BR munmap (2). |
| .PP |
| Memory locks do not stack; that is, pages that have been locked several times |
| by calls to |
| .BR mlock (), |
| .BR mlock2 (), |
| or |
| .BR mlockall () |
| will be unlocked by a single call to |
| .BR munlock () |
| for the corresponding range or by |
| .BR munlockall (). |
| Pages which are mapped to several locations or by several processes stay |
| locked into RAM as long as they are locked at least at one location or by |
| at least one process. |
| .PP |
| If a call to |
| .BR mlockall () |
| which uses the |
| .B MCL_FUTURE |
| flag is followed by another call that does not specify this flag, the |
| changes made by the |
| .B MCL_FUTURE |
| call will be lost. |
| .PP |
| The |
| .BR mlock2 () |
| .B MLOCK_ONFAULT |
| flag and the |
| .BR mlockall () |
| .B MCL_ONFAULT |
| flag allow efficient memory locking for applications that deal with |
| large mappings where only a (small) portion of pages in the mapping are touched. |
| In such cases, locking all of the pages in a mapping would incur |
| a significant penalty for memory locking. |
| .SS Linux notes |
| Under Linux, |
| .BR mlock (), |
| .BR mlock2 (), |
| and |
| .BR munlock () |
| automatically round |
| .I addr |
| down to the nearest page boundary. |
| However, the POSIX.1 specification of |
| .BR mlock () |
| and |
| .BR munlock () |
| allows an implementation to require that |
| .I addr |
| is page aligned, so portable applications should ensure this. |
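| .PP |
| One portable approach is to allocate the buffer with |
| .BR posix_memalign (3), |
| using the page size as the alignment, and to round the length up to a whole number of pages, |
| as in the following sketch. |
| The helper name |
| .I alloc_locked |
| is hypothetical. |

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: allocate a page-aligned buffer of at least len
   bytes (rounded up to whole pages) and lock it, so that mlock() receives
   a page-aligned addr on any POSIX system.  On success, stores the
   rounded length in *alloc_len; returns NULL on failure. */
static void *alloc_locked(size_t len, size_t *alloc_len)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf;

    if (page <= 0 || len == 0 || len > SIZE_MAX - (size_t)page)
        return NULL;                         /* bad input or would overflow */
    len = (len + (size_t)page - 1) / (size_t)page * (size_t)page;

    if (posix_memalign(&buf, (size_t)page, len) != 0)
        return NULL;
    if (mlock(buf, len) == -1) {             /* e.g., RLIMIT_MEMLOCK exceeded */
        free(buf);
        return NULL;
    }
    *alloc_len = len;
    return buf;
}
```

| The buffer is released by calling |
| .BR munlock () |
| with the returned length and then |
| .BR free (3). |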
| .PP |
| The |
| .I VmLck |
| field of the Linux-specific |
| .I /proc/[pid]/status |
| file shows how many kilobytes of memory the process with ID |
| .I pid |
| has locked using |
| .BR mlock (), |
| .BR mlock2 (), |
| .BR mlockall (), |
| and |
| .BR mmap (2) |
| .BR MAP_LOCKED . |
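| .PP |
| The following sketch reads that field for the calling process through |
| .IR /proc/self/status . |
| The helper names |
| .I parse_vmlck_kb |
| and |
| .I read_vmlck_kb |
| are hypothetical, and the 4096-byte read buffer is an assumption that suffices for typical status files. |

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the VmLck value (in kB) found in the text
   of a /proc/[pid]/status file, or -1 if the field is missing. */
static long parse_vmlck_kb(const char *status)
{
    const char *p;
    long kb;

    if (strncmp(status, "VmLck:", 6) == 0)
        p = status;                          /* field on the first line */
    else if ((p = strstr(status, "\nVmLck:")) != NULL)
        p++;                                 /* skip the newline */
    else
        return -1;

    return sscanf(p, "VmLck: %ld", &kb) == 1 ? kb : -1;
}

/* Hypothetical helper: VmLck of the calling process, or -1 on failure. */
static long read_vmlck_kb(void)
{
    char buf[4096];                          /* assumed large enough */
    FILE *f = fopen("/proc/self/status", "r");
    size_t n;

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_vmlck_kb(buf);
}
```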
| .SS Limits and permissions |
| In Linux 2.6.8 and earlier, |
| a process must be privileged |
| .RB ( CAP_IPC_LOCK ) |
| in order to lock memory and the |
| .B RLIMIT_MEMLOCK |
| soft resource limit defines a limit on how much memory the process may lock. |
| .PP |
| Since Linux 2.6.9, no limits are placed on the amount of memory |
| that a privileged process can lock and the |
| .B RLIMIT_MEMLOCK |
| soft resource limit instead defines a limit on how much memory an |
| unprivileged process may lock. |
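| .PP |
| An unprivileged process can compare a planned lock against its soft limit before calling |
| .BR mlock (), |
| as in the following sketch. |
| The helper |
| .I within_memlock_limit |
| is hypothetical. |
| The caller must supply the amount already locked |
| (for example, from the |
| .I VmLck |
| field described above), |
| and the check ignores both |
| .B CAP_IPC_LOCK |
| and the rounding of ranges to whole pages, so the return value of |
| .BR mlock () |
| itself remains authoritative. |

```c
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical pre-check: would locking `extra` more bytes on top of
   `already_locked` bytes stay within the soft RLIMIT_MEMLOCK limit?
   Returns 1 if yes, 0 if no, -1 if getrlimit() fails.  Privilege
   (CAP_IPC_LOCK) and page rounding are not taken into account. */
static int within_memlock_limit(size_t already_locked, size_t extra)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) == -1)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return already_locked <= rl.rlim_cur &&
           extra <= rl.rlim_cur - already_locked;   /* no unsigned underflow */
}
```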
| .SH BUGS |
| In Linux 4.8 and earlier, |
| a bug in the kernel's accounting of locked memory for unprivileged processes |
| (i.e., without |
| .BR CAP_IPC_LOCK ) |
| meant that if the region specified by |
| .I addr |
| and |
| .I len |
| overlapped an existing lock, |
| then the already locked bytes in the overlapping region were counted twice |
| when checking against the limit. |
| Such double accounting could incorrectly calculate a "total locked memory" |
| value for the process that exceeded the |
| .BR RLIMIT_MEMLOCK |
| limit, with the result that |
| .BR mlock () |
| and |
| .BR mlock2 () |
| would fail on requests that should have succeeded. |
| This bug was fixed |
| .\" commit 0cf2f6f6dc605e587d2c1120f295934c77e810e8 |
| in Linux 4.9. |
| .PP |
| In the 2.4 series Linux kernels up to and including 2.4.17, |
| a bug caused the |
| .BR mlockall () |
| .B MCL_FUTURE |
| flag to be inherited across a |
| .BR fork (2). |
| This was rectified in kernel 2.4.18. |
| .PP |
| Since kernel 2.6.9, if a privileged process calls |
| .I mlockall(MCL_FUTURE) |
| and later drops privileges (loses the |
| .B CAP_IPC_LOCK |
| capability by, for example, |
| setting its effective UID to a nonzero value), |
| then subsequent memory allocations (e.g., |
| .BR mmap (2), |
| .BR brk (2)) |
| will fail if the |
| .B RLIMIT_MEMLOCK |
| resource limit is encountered. |
| .\" See the following LKML thread: |
| .\" http://marc.theaimsgroup.com/?l=linux-kernel&m=113801392825023&w=2 |
| .\" "Rationale for RLIMIT_MEMLOCK" |
| .\" 23 Jan 2006 |
| .SH SEE ALSO |
| .BR mincore (2), |
| .BR mmap (2), |
| .BR setrlimit (2), |
| .BR shmctl (2), |
| .BR sysconf (3), |
| .BR proc (5), |
| .BR capabilities (7) |