| .\" This manpage is Copyright (C) 1992 Drew Eckhardt; |
| .\" and Copyright (C) 1993 Michael Haardt, Ian Jackson; |
| .\" and Copyright (C) 1998 Jamie Lokier; |
| .\" and Copyright (C) 2002-2010, 2014 Michael Kerrisk; |
| .\" and Copyright (C) 2014 Jeff Layton |
| .\" and Copyright (C) 2014 David Herrmann |
| .\" and Copyright (C) 2017 Jens Axboe |
| .\" |
| .\" %%%LICENSE_START(VERBATIM) |
| .\" Permission is granted to make and distribute verbatim copies of this |
| .\" manual provided the copyright notice and this permission notice are |
| .\" preserved on all copies. |
| .\" |
| .\" Permission is granted to copy and distribute modified versions of this |
| .\" manual under the conditions for verbatim copying, provided that the |
| .\" entire resulting derived work is distributed under the terms of a |
| .\" permission notice identical to this one. |
| .\" |
| .\" Since the Linux kernel and libraries are constantly changing, this |
| .\" manual page may be incorrect or out-of-date. The author(s) assume no |
| .\" responsibility for errors or omissions, or for damages resulting from |
| .\" the use of the information contained herein. The author(s) may not |
| .\" have taken the same level of care in the production of this manual, |
| .\" which is licensed free of charge, as they might when working |
| .\" professionally. |
| .\" |
| .\" Formatted or processed versions of this manual, if unaccompanied by |
| .\" the source, must acknowledge the copyright and authors of this work. |
| .\" %%%LICENSE_END |
| .\" |
| .\" Modified 1993-07-24 by Rik Faith <faith@cs.unc.edu> |
| .\" Modified 1995-09-26 by Andries Brouwer <aeb@cwi.nl> |
| .\" and again on 960413 and 980804 and 981223. |
| .\" Modified 1998-12-11 by Jamie Lokier <jamie@imbolc.ucc.ie> |
| .\" Applied correction by Christian Ehrhardt - aeb, 990712 |
| .\" Modified 2002-04-23 by Michael Kerrisk <mtk.manpages@gmail.com> |
| .\" Added note on F_SETFL and O_DIRECT |
| .\" Complete rewrite + expansion of material on file locking |
| .\" Incorporated description of F_NOTIFY, drawing on |
| .\" Stephen Rothwell's notes in Documentation/dnotify.txt. |
| .\" Added description of F_SETLEASE and F_GETLEASE |
| .\" Corrected and polished, aeb, 020527. |
| .\" Modified 2004-03-03 by Michael Kerrisk <mtk.manpages@gmail.com> |
| .\" Modified description of file leases: fixed some errors of detail |
| .\" Replaced the term "lease contestant" by "lease breaker" |
| .\" Modified, 27 May 2004, Michael Kerrisk <mtk.manpages@gmail.com> |
| .\" Added notes on capability requirements |
| .\" Modified 2004-12-08, added O_NOATIME after note from Martin Pool |
| .\" 2004-12-10, mtk, noted F_GETOWN bug after suggestion from aeb. |
| .\" 2005-04-08 Jamie Lokier <jamie@shareable.org>, mtk |
| .\" Described behavior of F_SETOWN/F_SETSIG in |
| .\" multithreaded processes, and generally cleaned |
| .\" up the discussion of F_SETOWN. |
| .\" 2005-05-20, Johannes Nicolai <johannes.nicolai@hpi.uni-potsdam.de>, |
| .\" mtk: Noted F_SETOWN bug for socket file descriptor in Linux 2.4 |
| .\" and earlier. Added text on permissions required to send signal. |
| .\" 2009-09-30, Michael Kerrisk |
| .\" Note obsolete F_SETOWN behavior with threads. |
| .\" Document F_SETOWN_EX and F_GETOWN_EX |
| .\" 2010-06-17, Michael Kerrisk |
| .\" Document F_SETPIPE_SZ and F_GETPIPE_SZ. |
| .\" 2014-07-08, David Herrmann <dh.herrmann@gmail.com> |
| .\" Document F_ADD_SEALS and F_GET_SEALS |
| .\" 2017-06-26, Jens Axboe <axboe@kernel.dk> |
| .\" Document F_{GET,SET}_RW_HINT and F_{GET,SET}_FILE_RW_HINT |
| .\" |
| .TH FCNTL 2 2021-03-22 "Linux" "Linux Programmer's Manual" |
| .SH NAME |
| fcntl \- manipulate file descriptor |
| .SH SYNOPSIS |
| .nf |
| .B #include <unistd.h> |
| .B #include <fcntl.h> |
| .PP |
| .BI "int fcntl(int " fd ", int " cmd ", ... /* " arg " */ );" |
| .fi |
| .SH DESCRIPTION |
| .BR fcntl () |
| performs one of the operations described below on the open file descriptor |
| .IR fd . |
| The operation is determined by |
| .IR cmd . |
| .PP |
| .BR fcntl () |
| can take an optional third argument. |
| Whether or not this argument is required is determined by |
| .IR cmd . |
| The required argument type is indicated in parentheses after each |
| .I cmd |
| name (in most cases, the required type is |
| .IR int , |
| and we identify the argument using the name |
| .IR arg ), |
| or |
| .I void |
| is specified if the argument is not required. |
| .PP |
| Certain of the operations below are supported only since a particular |
| Linux kernel version. |
| The preferred method of checking whether the host kernel supports |
| a particular operation is to invoke |
| .BR fcntl () |
| with the desired |
| .IR cmd |
| value and then test whether the call failed with |
| .BR EINVAL , |
| indicating that the kernel does not recognize this value. |
| .SS Duplicating a file descriptor |
| .TP |
| .BR F_DUPFD " (\fIint\fP)" |
| Duplicate the file descriptor |
| .IR fd |
| using the lowest-numbered available file descriptor greater than or equal to |
| .IR arg . |
| This is different from |
| .BR dup2 (2), |
| which uses exactly the file descriptor specified. |
| .IP |
| On success, the new file descriptor is returned. |
| .IP |
| See |
| .BR dup (2) |
| for further details. |
| .TP |
| .BR F_DUPFD_CLOEXEC " (\fIint\fP; since Linux 2.6.24)" |
| As for |
| .BR F_DUPFD , |
| but additionally set the |
| close-on-exec flag for the duplicate file descriptor. |
| Using this command permits a program to avoid an additional |
| .BR fcntl () |
| .B F_SETFD |
| operation to set the |
| .B FD_CLOEXEC |
| flag. |
| For an explanation of why this flag is useful, |
| see the description of |
| .B O_CLOEXEC |
| in |
| .BR open (2). |
| .SS File descriptor flags |
| The following commands manipulate the flags associated with |
| a file descriptor. |
| Currently, only one such flag is defined: |
| .BR FD_CLOEXEC , |
| the close-on-exec flag. |
| If the |
| .B FD_CLOEXEC |
| bit is set, |
| the file descriptor will automatically be closed during a successful |
| .BR execve (2). |
| (If the |
| .BR execve (2) |
| fails, the file descriptor is left open.) |
| If the |
| .B FD_CLOEXEC |
| bit is not set, the file descriptor will remain open across an |
| .BR execve (2). |
| .TP |
| .BR F_GETFD " (\fIvoid\fP)" |
| Return (as the function result) the file descriptor flags; |
| .I arg |
| is ignored. |
| .TP |
| .BR F_SETFD " (\fIint\fP)" |
| Set the file descriptor flags to the value specified by |
| .IR arg . |
| .PP |
| In multithreaded programs, using |
| .BR fcntl () |
| .B F_SETFD |
| to set the close-on-exec flag at the same time as another thread performs a |
| .BR fork (2) |
| plus |
| .BR execve (2) |
| is vulnerable to a race condition that may unintentionally leak |
| the file descriptor to the program executed in the child process. |
| See the discussion of the |
| .BR O_CLOEXEC |
| flag in |
| .BR open (2) |
| for details and a remedy to the problem. |
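| .PP |
| Where the flag has to be changed after the descriptor already exists, |
| a read-modify-write sequence keeps the code correct should further |
| file descriptor flags be defined in the future. |
| The following is a sketch only; the helper name |
| .IR set_cloexec () |
| is an assumption made for illustration, |
| and the sequence remains subject to the race described above |
| in multithreaded programs. |
| .PP |
| .in +4n |
| .EX |
```c
#include <fcntl.h>

/* Turn the close-on-exec flag on (enable != 0) or off (enable == 0)
   without disturbing any other file descriptor flags.
   Returns 0 on success, -1 with errno set on failure. */
int
set_cloexec(int fd, int enable)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;

    if (enable)
        flags |= FD_CLOEXEC;
    else
        flags &= ~FD_CLOEXEC;

    return fcntl(fd, F_SETFD, flags) == -1 ? -1 : 0;
}
```
| .EE |
| .in |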
| .SS File status flags |
| Each open file description has certain associated status flags, |
| initialized by |
| .BR open (2) |
| .\" or |
| .\" .BR creat (2), |
| and possibly modified by |
| .BR fcntl (). |
| Duplicated file descriptors |
| (made with |
| .BR dup (2), |
| .BR fcntl (F_DUPFD), |
| .BR fork (2), |
| etc.) refer to the same open file description, and thus |
| share the same file status flags. |
| .PP |
| The file status flags and their semantics are described in |
| .BR open (2). |
| .TP |
| .BR F_GETFL " (\fIvoid\fP)" |
| Return (as the function result) |
| the file access mode and the file status flags; |
| .I arg |
| is ignored. |
| .TP |
| .BR F_SETFL " (\fIint\fP)" |
| Set the file status flags to the value specified by |
| .IR arg . |
| File access mode |
| .RB ( O_RDONLY ", " O_WRONLY ", " O_RDWR ) |
| and file creation flags |
| (i.e., |
| .BR O_CREAT ", " O_EXCL ", " O_NOCTTY ", " O_TRUNC ) |
| in |
| .I arg |
| are ignored. |
| On Linux, this command can change only the |
| .BR O_APPEND , |
| .BR O_ASYNC , |
| .BR O_DIRECT , |
| .BR O_NOATIME , |
| and |
| .B O_NONBLOCK |
| flags. |
| It is not possible to change the |
| .BR O_DSYNC |
| and |
| .BR O_SYNC |
| flags; see BUGS, below. |
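| .PP |
| Because |
| .B F_SETFL |
| replaces the whole set of changeable status flags, |
| a caller normally fetches the current value with |
| .B F_GETFL |
| and modifies only the bit of interest. |
| The sketch below enables |
| .B O_NONBLOCK |
| that way and extracts the access mode with |
| .BR O_ACCMODE ; |
| the helper names |
| .IR set_nonblocking () |
| and |
| .IR access_mode () |
| are hypothetical. |
| .PP |
| .in +4n |
| .EX |
```c
#include <fcntl.h>

/* Enable O_NONBLOCK on the open file description referred to by fd,
   preserving the other file status flags.
   Returns 0 on success, -1 with errno set on failure. */
int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Return the file access mode (O_RDONLY, O_WRONLY, or O_RDWR),
   or -1 with errno set on failure. */
int
access_mode(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return flags & O_ACCMODE;
}
```
| .EE |
| .in |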
| .SS Advisory record locking |
| Linux implements traditional ("process-associated") UNIX record locks, |
| as standardized by POSIX. |
| For a Linux-specific alternative with better semantics, |
| see the discussion of open file description locks below. |
| .PP |
| .BR F_SETLK , |
| .BR F_SETLKW , |
| and |
| .BR F_GETLK |
| are used to acquire, release, and test for the existence of record |
| locks (also known as byte-range, file-segment, or file-region locks). |
| The third argument, |
| .IR lock , |
| is a pointer to a structure that has at least the following fields |
| (in unspecified order). |
| .PP |
| .in +4n |
| .EX |
| struct flock { |
| ... |
| short l_type; /* Type of lock: F_RDLCK, |
| F_WRLCK, F_UNLCK */ |
| short l_whence; /* How to interpret l_start: |
| SEEK_SET, SEEK_CUR, SEEK_END */ |
| off_t l_start; /* Starting offset for lock */ |
| off_t l_len; /* Number of bytes to lock */ |
| pid_t l_pid; /* PID of process blocking our lock |
| (set by F_GETLK and F_OFD_GETLK) */ |
| ... |
| }; |
| .EE |
| .in |
| .PP |
| The |
| .IR l_whence ", " l_start ", and " l_len |
| fields of this structure specify the range of bytes we wish to lock. |
| Bytes past the end of the file may be locked, |
| but not bytes before the start of the file. |
| .PP |
| .I l_start |
| is the starting offset for the lock, and is interpreted |
| relative to either: |
| the start of the file (if |
| .I l_whence |
| is |
| .BR SEEK_SET ); |
| the current file offset (if |
| .I l_whence |
| is |
| .BR SEEK_CUR ); |
| or the end of the file (if |
| .I l_whence |
| is |
| .BR SEEK_END ). |
| In the final two cases, |
| .I l_start |
| can be a negative number provided the |
| offset does not lie before the start of the file. |
| .PP |
| .I l_len |
| specifies the number of bytes to be locked. |
| If |
| .I l_len |
| is positive, then the range to be locked covers bytes |
| .I l_start |
| up to and including |
| .IR l_start + l_len \-1. |
| Specifying 0 for |
| .I l_len |
| has the special meaning: lock all bytes starting at the |
| location specified by |
| .IR l_whence " and " l_start |
| through to the end of file, no matter how large the file grows. |
| .PP |
| POSIX.1-2001 allows (but does not require) |
| an implementation to support a negative |
| .I l_len |
| value; if |
| .I l_len |
| is negative, the interval described by |
| .I lock |
| covers bytes |
| .IR l_start + l_len |
| up to and including |
| .IR l_start \-1. |
| This is supported by Linux since kernel versions 2.4.21 and 2.5.49. |
| .PP |
| The |
| .I l_type |
| field can be used to place a read |
| .RB ( F_RDLCK ) |
| or a write |
| .RB ( F_WRLCK ) |
| lock on a file. |
| Any number of processes may hold a read lock (shared lock) |
| on a file region, but only one process may hold a write lock |
| (exclusive lock). |
| An exclusive lock excludes all other locks, |
| both shared and exclusive. |
| A single process can hold only one type of lock on a file region; |
| if a new lock is applied to an already-locked region, |
| then the existing lock is converted to the new lock type. |
| (Such conversions may involve splitting, shrinking, or coalescing with |
| an existing lock if the byte range specified by the new lock does not |
| precisely coincide with the range of the existing lock.) |
| .TP |
| .BR F_SETLK " (\fIstruct flock *\fP)" |
| Acquire a lock (when |
| .I l_type |
| is |
| .B F_RDLCK |
| or |
| .BR F_WRLCK ) |
| or release a lock (when |
| .I l_type |
| is |
| .BR F_UNLCK ) |
| on the bytes specified by the |
| .IR l_whence ", " l_start ", and " l_len |
| fields of |
| .IR lock . |
| If a conflicting lock is held by another process, |
| this call returns \-1 and sets |
| .I errno |
| to |
| .B EACCES |
| or |
| .BR EAGAIN . |
| (The error returned in this case differs across implementations, |
| so POSIX requires a portable application to check for both errors.) |
| .TP |
| .BR F_SETLKW " (\fIstruct flock *\fP)" |
| As for |
| .BR F_SETLK , |
| but if a conflicting lock is held on the file, then wait for that |
| lock to be released. |
| If a signal is caught while waiting, then the call is interrupted |
| and (after the signal handler has returned) |
| returns immediately (with return value \-1 and |
| .I errno |
| set to |
| .BR EINTR ; |
| see |
| .BR signal (7)). |
| .TP |
| .BR F_GETLK " (\fIstruct flock *\fP)" |
| On input to this call, |
| .I lock |
| describes a lock we would like to place on the file. |
| If the lock could be placed, |
| .BR fcntl () |
| does not actually place it, but returns |
| .B F_UNLCK |
| in the |
| .I l_type |
| field of |
| .I lock |
| and leaves the other fields of the structure unchanged. |
| .IP |
| If one or more incompatible locks would prevent |
| this lock being placed, then |
| .BR fcntl () |
| returns details about one of those locks in the |
| .IR l_type ", " l_whence ", " l_start ", and " l_len |
| fields of |
| .IR lock . |
| If the conflicting lock is a traditional (process-associated) record lock, |
| then the |
| .I l_pid |
| field is set to the PID of the process holding that lock. |
| If the conflicting lock is an open file description lock, then |
| .I l_pid |
| is set to \-1. |
| Note that the returned information |
| may already be out of date by the time the caller inspects it. |
| .PP |
| In order to place a read lock, |
| .I fd |
| must be open for reading. |
| In order to place a write lock, |
| .I fd |
| must be open for writing. |
| To place both types of lock, open a file read-write. |
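| .PP |
| The following sketch shows one way the commands above might be wrapped. |
| The helper names |
| .IR try_lock () |
| and |
| .IR can_lock () |
| are assumptions made for illustration. |
| As POSIX advises, the first helper treats both |
| .B EACCES |
| and |
| .B EAGAIN |
| as "a conflicting lock is held". |
| .PP |
| .in +4n |
| .EX |
```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Apply a lock of the given type (F_RDLCK, F_WRLCK, or F_UNLCK) to
   len bytes starting at offset start, without blocking.
   Returns 0 on success, 1 if another process holds a conflicting
   lock, and -1 (with errno set) on any other error. */
int
try_lock(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {
        .l_type = type,
        .l_whence = SEEK_SET,
        .l_start = start,
        .l_len = len,
    };

    if (fcntl(fd, F_SETLK, &fl) == 0)
        return 0;
    return (errno == EACCES || errno == EAGAIN) ? 1 : -1;
}

/* Test whether a lock of the given type could be placed.
   Returns 1 if it could, 0 if a conflicting lock exists (in which
   case *holder receives the l_pid value reported by the kernel),
   and -1 (with errno set) on error. */
int
can_lock(int fd, short type, off_t start, off_t len, pid_t *holder)
{
    struct flock fl = {
        .l_type = type,
        .l_whence = SEEK_SET,
        .l_start = start,
        .l_len = len,
    };

    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    if (fl.l_type == F_UNLCK)
        return 1;
    *holder = fl.l_pid;
    return 0;
}
```
| .EE |
| .in |
| .PP |
| A process's own record locks never conflict with one another, so |
| .IR can_lock () |
| reports a conflict only when a different process (or an open file |
| description lock) holds an incompatible lock on the range. |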
| .PP |
| When placing locks with |
| .BR F_SETLKW , |
| the kernel detects |
| .IR deadlocks , |
| whereby two or more processes have their |
| lock requests mutually blocked by locks held by the other processes. |
| For example, suppose process A holds a write lock on byte 100 of a file, |
| and process B holds a write lock on byte 200. |
| If each process then attempts to lock the byte already |
| locked by the other process using |
| .BR F_SETLKW , |
| then, without deadlock detection, |
| both processes would remain blocked indefinitely. |
| When the kernel detects such deadlocks, |
| it causes one of the blocking lock requests to immediately fail with the error |
| .BR EDEADLK ; |
| an application that encounters such an error should release |
| some of its locks to allow other applications to proceed before |
| attempting to regain the locks that it requires. |
| Circular deadlocks involving more than two processes are also detected. |
| Note, however, that there are limitations to the kernel's |
| deadlock-detection algorithm; see BUGS. |
| .PP |
| As well as being removed by an explicit |
| .BR F_UNLCK , |
| record locks are automatically released when the process terminates. |
| .PP |
| Record locks are not inherited by a child created via |
| .BR fork (2), |
| but are preserved across an |
| .BR execve (2). |
| .PP |
| Because of the buffering performed by the |
| .BR stdio (3) |
| library, the use of record locking with routines in that package |
| should be avoided; use |
| .BR read (2) |
| and |
| .BR write (2) |
| instead. |
| .PP |
| The record locks described above are associated with the process |
| (unlike the open file description locks described below). |
| This has some unfortunate consequences: |
| .IP * 3 |
| If a process closes |
| .I any |
| file descriptor referring to a file, |
| then all of the process's locks on that file are released, |
| regardless of the file descriptor(s) on which the locks were obtained. |
| .\" (Additional file descriptors referring to the same file |
| .\" may have been obtained by calls to |
| .\" .BR open "(2), " dup "(2), " dup2 "(2), or " fcntl ().) |
| This is bad: it means that a process can lose its locks on |
| a file such as |
| .I /etc/passwd |
| or |
| .I /etc/mtab |
| when for some reason a library function decides to open, read, |
| and close the same file. |
| .IP * |
| The threads in a process share locks. |
| In other words, |
| a multithreaded program can't use record locking to ensure |
| that threads don't simultaneously access the same region of a file. |
| .PP |
| Open file description locks solve both of these problems. |
| .SS Open file description locks (non-POSIX) |
| Open file description locks are advisory byte-range locks whose operation is |
| in most respects identical to the traditional record locks described above. |
| This lock type is Linux-specific, |
| and available since Linux 3.15. |
| (There is a proposal with the Austin Group |
| .\" FIXME . Review progress into POSIX |
| .\" http://austingroupbugs.net/view.php?id=768 |
| to include this lock type in the next revision of POSIX.1.) |
| For an explanation of open file descriptions, see |
| .BR open (2). |
| .PP |
| The principal difference between the two lock types |
| is that whereas traditional record locks |
| are associated with a process, |
| open file description locks are associated with the |
| open file description on which they are acquired, |
| much like locks acquired with |
| .BR flock (2). |
| Consequently (and unlike traditional advisory record locks), |
| open file description locks are inherited across |
| .BR fork (2) |
| (and |
| .BR clone (2) |
| with |
| .BR CLONE_FILES ), |
| and are only automatically released on the last close |
| of the open file description, |
| instead of being released on any close of the file. |
| .PP |
| Conflicting lock combinations |
| (i.e., a read lock and a write lock or two write locks) |
| where one lock is an open file description lock and the other |
| is a traditional record lock conflict |
| even when they are acquired by the same process on the same file descriptor. |
| .PP |
| Open file description locks placed via the same open file description |
| (i.e., via the same file descriptor, |
| or via a duplicate of the file descriptor created by |
| .BR fork (2), |
| .BR dup (2), |
| .BR fcntl () |
| .BR F_DUPFD , |
| and so on) are always compatible: |
| if a new lock is placed on an already locked region, |
| then the existing lock is converted to the new lock type. |
| (Such conversions may result in splitting, shrinking, or coalescing with |
| an existing lock as discussed above.) |
| .PP |
| On the other hand, open file description locks may conflict with |
| each other when they are acquired via different open file descriptions. |
| Thus, the threads in a multithreaded program can use |
| open file description locks to synchronize access to a file region |
| by having each thread perform its own |
| .BR open (2) |
| on the file and applying locks via the resulting file descriptor. |
| .PP |
| As with traditional advisory locks, the third argument to |
| .BR fcntl (), |
| .IR lock , |
| is a pointer to an |
| .IR flock |
| structure. |
| By contrast with traditional record locks, the |
| .I l_pid |
| field of that structure must be set to zero |
| when using the commands described below. |
| .PP |
| The commands for working with open file description locks are analogous |
| to those used with traditional locks: |
| .TP |
| .BR F_OFD_SETLK " (\fIstruct flock *\fP)" |
| Acquire an open file description lock (when |
| .I l_type |
| is |
| .B F_RDLCK |
| or |
| .BR F_WRLCK ) |
| or release an open file description lock (when |
| .I l_type |
| is |
| .BR F_UNLCK ) |
| on the bytes specified by the |
| .IR l_whence ", " l_start ", and " l_len |
| fields of |
| .IR lock . |
| If a conflicting lock is held (by another process, or via a different |
| open file description in the same process), |
| this call returns \-1 and sets |
| .I errno |
| to |
| .BR EAGAIN . |
| .TP |
| .BR F_OFD_SETLKW " (\fIstruct flock *\fP)" |
| As for |
| .BR F_OFD_SETLK , |
| but if a conflicting lock is held on the file, then wait for that lock to be |
| released. |
| If a signal is caught while waiting, then the call is interrupted |
| and (after the signal handler has returned) returns immediately |
| (with return value \-1 and |
| .I errno |
| set to |
| .BR EINTR ; |
| see |
| .BR signal (7)). |
| .TP |
| .BR F_OFD_GETLK " (\fIstruct flock *\fP)" |
| On input to this call, |
| .I lock |
| describes an open file description lock we would like to place on the file. |
| If the lock could be placed, |
| .BR fcntl () |
| does not actually place it, but returns |
| .B F_UNLCK |
| in the |
| .I l_type |
| field of |
| .I lock |
| and leaves the other fields of the structure unchanged. |
| If one or more incompatible locks would prevent this lock being placed, |
| then details about one of these locks are returned via |
| .IR lock , |
| as described above for |
| .BR F_GETLK . |
| .PP |
| In the current implementation, |
| .\" commit 57b65325fe34ec4c917bc4e555144b4a94d9e1f7 |
| no deadlock detection is performed for open file description locks. |
| (This contrasts with process-associated record locks, |
| for which the kernel does perform deadlock detection.) |
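| .PP |
| As a sketch of the behavior described in this section, |
| the hypothetical helper |
| .IR ofd_try_lock () |
| below places a nonblocking open file description lock. |
| It assumes a glibc that exposes the |
| .B F_OFD_* |
| constants when |
| .B _GNU_SOURCE |
| is defined, and a kernel of version 3.15 or later. |
| .PP |
| .in +4n |
| .EX |
```c
#define _GNU_SOURCE             /* Exposes F_OFD_SETLK in <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Apply an open file description lock of the given type (F_RDLCK,
   F_WRLCK, or F_UNLCK) to len bytes starting at offset start,
   without blocking.
   Returns 0 on success, 1 if a conflicting lock is held, and
   -1 (with errno set) on any other error. */
int
ofd_try_lock(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {
        .l_type = type,
        .l_whence = SEEK_SET,
        .l_start = start,
        .l_len = len,
        .l_pid = 0,             /* Must be zero for F_OFD_* commands */
    };

    if (fcntl(fd, F_OFD_SETLK, &fl) == 0)
        return 0;
    return errno == EAGAIN ? 1 : -1;
}
```
| .EE |
| .in |
| .PP |
| If the same file is opened twice in one process, |
| a write lock taken through the first descriptor causes |
| .IR ofd_try_lock () |
| on the second descriptor to report a conflict for an overlapping range, |
| which is what allows threads that each perform their own |
| .BR open (2) |
| to exclude one another. |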
| .\" |
| .SS Mandatory locking |
| .IR Warning : |
| the Linux implementation of mandatory locking is unreliable. |
| See BUGS below. |
| Because of these bugs, |
| and the fact that the feature is believed to be little used, |
| since Linux 4.5, mandatory locking has been made an optional feature, |
| governed by a configuration option |
| .RB ( CONFIG_MANDATORY_FILE_LOCKING ). |
| This is an initial step toward removing this feature completely. |
| .PP |
| By default, both traditional (process-associated) and open file description |
| record locks are advisory. |
| Advisory locks are not enforced and are useful only between |
| cooperating processes. |
| .PP |
| Both lock types can also be mandatory. |
| Mandatory locks are enforced for all processes. |
| If a process tries to perform an incompatible access (e.g., |
| .BR read (2) |
| or |
| .BR write (2)) |
| on a file region that has an incompatible mandatory lock, |
| then the result depends upon whether the |
| .B O_NONBLOCK |
| flag is enabled for its open file description. |
| If the |
| .B O_NONBLOCK |
| flag is not enabled, then |
| the system call is blocked until the lock is removed |
| or converted to a mode that is compatible with the access. |
| If the |
| .B O_NONBLOCK |
| flag is enabled, then the system call fails with the error |
| .BR EAGAIN . |
| .PP |
| To make use of mandatory locks, mandatory locking must be enabled |
| both on the filesystem that contains the file to be locked, |
| and on the file itself. |
| Mandatory locking is enabled on a filesystem |
| using the "\-o mand" option to |
| .BR mount (8), |
| or the |
| .B MS_MANDLOCK |
| flag for |
| .BR mount (2). |
| Mandatory locking is enabled on a file by disabling |
| group execute permission on the file and enabling the set-group-ID |
| permission bit (see |
| .BR chmod (1) |
| and |
| .BR chmod (2)). |
| .PP |
| Mandatory locking is not specified by POSIX. |
| Some other systems also support mandatory locking, |
| although the details of how to enable it vary across systems. |
| .\" |
| .SS Lost locks |
| When an advisory lock is obtained on a networked filesystem such as |
| NFS, it is possible that the lock might get lost. |
| This may happen due to administrative action on the server, or due to a |
| network partition (i.e., loss of network connectivity with the server) |
| which lasts long enough for the server to assume |
| that the client is no longer functioning. |
| .PP |
| When the filesystem determines that a lock has been lost, future |
| .BR read (2) |
| or |
| .BR write (2) |
| requests may fail with the error |
| .BR EIO . |
| This error will persist until the lock is removed or the file |
| descriptor is closed. |
| Since Linux 3.12, |
| .\" commit ef1820f9be27b6ad158f433ab38002ab8131db4d |
| this happens at least for NFSv4 (including all minor versions). |
| .PP |
| Some versions of UNIX send a signal |
| .RB ( SIGLOST ) |
| in this circumstance. |
| Linux does not define this signal, and does not provide any |
| asynchronous notification of lost locks. |
| .\" |
| .SS Managing signals |
| .BR F_GETOWN , |
| .BR F_SETOWN , |
| .BR F_GETOWN_EX , |
| .BR F_SETOWN_EX , |
| .BR F_GETSIG , |
| and |
| .B F_SETSIG |
| are used to manage I/O availability signals: |
| .TP |
| .BR F_GETOWN " (\fIvoid\fP)" |
| Return (as the function result) |
| the process ID or process group ID currently receiving |
| .B SIGIO |
| and |
| .B SIGURG |
| signals for events on file descriptor |
| .IR fd . |
| Process IDs are returned as positive values; |
| process group IDs are returned as negative values (but see BUGS below). |
| .I arg |
| is ignored. |
| .TP |
| .BR F_SETOWN " (\fIint\fP)" |
| Set the process ID or process group ID that will receive |
| .B SIGIO |
| and |
| .B SIGURG |
| signals for events on the file descriptor |
| .IR fd . |
| The target process or process group ID is specified in |
| .IR arg . |
| A process ID is specified as a positive value; |
| a process group ID is specified as a negative value. |
| Most commonly, the calling process specifies itself as the owner |
| (that is, |
| .I arg |
| is specified as |
| .BR getpid (2)). |
| .IP |
| As well as setting the file descriptor owner, |
| one must also enable generation of signals on the file descriptor. |
| This is done by using the |
| .BR fcntl () |
| .B F_SETFL |
| command to set the |
| .B O_ASYNC |
| file status flag on the file descriptor. |
| Subsequently, a |
| .B SIGIO |
| signal is sent whenever input or output becomes possible |
| on the file descriptor. |
| The |
| .BR fcntl () |
| .B F_SETSIG |
| command can be used to obtain delivery of a signal other than |
| .BR SIGIO . |
| .IP |
| Sending a signal to the owner process (group) specified by |
| .B F_SETOWN |
| is subject to the same permissions checks as are described for |
| .BR kill (2), |
| where the sending process is the one that employs |
| .B F_SETOWN |
| (but see BUGS below). |
| If this permission check fails, then the signal is |
| silently discarded. |
| .IR Note : |
| The |
| .BR F_SETOWN |
| operation records the caller's credentials at the time of the |
| .BR fcntl () |
| call, |
| and it is these saved credentials that are used for the permission checks. |
| .IP |
| If the file descriptor |
| .I fd |
| refers to a socket, |
| .B F_SETOWN |
| also selects |
| the recipient of |
| .B SIGURG |
| signals that are delivered when out-of-band |
| data arrives on that socket. |
| .RB ( SIGURG |
| is sent in any situation where |
| .BR select (2) |
| would report the socket as having an "exceptional condition".) |
| .\" The following appears to be rubbish. It doesn't seem to |
| .\" be true according to the kernel source, and I can write |
| .\" a program that gets a terminal-generated SIGIO even though |
| .\" it is not the foreground process group of the terminal. |
| .\" -- MTK, 8 Apr 05 |
| .\" |
| .\" If the file descriptor |
| .\" .I fd |
| .\" refers to a terminal device, then SIGIO |
| .\" signals are sent to the foreground process group of the terminal. |
| .IP |
| The following was true in 2.6.x kernels up to and including |
| kernel 2.6.11: |
| .RS |
| .IP |
| If a nonzero value is given to |
| .B F_SETSIG |
| in a multithreaded process running with a threading library |
| that supports thread groups (e.g., NPTL), |
| then a positive value given to |
| .B F_SETOWN |
| has a different meaning: |
| .\" The relevant place in the (2.6) kernel source is the |
| .\" 'switch' in fs/fcntl.c::send_sigio_to_task() -- MTK, Apr 2005 |
| instead of being a process ID identifying a whole process, |
| it is a thread ID identifying a specific thread within a process. |
| Consequently, it may be necessary to pass |
| .B F_SETOWN |
| the result of |
| .BR gettid (2) |
| instead of |
| .BR getpid (2) |
| to get sensible results when |
| .B F_SETSIG |
| is used. |
| (In current Linux threading implementations, |
| a main thread's thread ID is the same as its process ID. |
| This means that a single-threaded program can equally use |
| .BR gettid (2) |
| or |
| .BR getpid (2) |
| in this scenario.) |
| Note, however, that the statements in this paragraph do not apply |
| to the |
| .B SIGURG |
| signal generated for out-of-band data on a socket: |
| this signal is always sent to either a process or a process group, |
| depending on the value given to |
| .BR F_SETOWN . |
| .\" send_sigurg()/send_sigurg_to_task() bypasses |
| .\" kill_fasync()/send_sigio()/send_sigio_to_task() |
| .\" to directly call send_group_sig_info() |
| .\" -- MTK, Apr 2005 (kernel 2.6.11) |
| .RE |
| .IP |
| The above behavior was accidentally dropped in Linux 2.6.12, |
| and won't be restored. |
| From Linux 2.6.32 onward, use |
| .BR F_SETOWN_EX |
| to target |
| .B SIGIO |
| and |
| .B SIGURG |
| signals at a particular thread. |
| .TP |
| .BR F_GETOWN_EX " (\fIstruct f_owner_ex *\fP) (since Linux 2.6.32)" |
| Return the current file descriptor owner settings |
| as defined by a previous |
| .BR F_SETOWN_EX |
| operation. |
| The information is returned in the structure pointed to by |
| .IR arg , |
| which has the following form: |
| .IP |
| .in +4n |
| .EX |
| struct f_owner_ex { |
| int type; |
| pid_t pid; |
| }; |
| .EE |
| .in |
| .IP |
| The |
| .I type |
| field will have one of the values |
| .BR F_OWNER_TID , |
| .BR F_OWNER_PID , |
| or |
| .BR F_OWNER_PGRP . |
| The |
| .I pid |
| field is a positive integer representing a thread ID, process ID, |
| or process group ID. |
| See |
| .B F_SETOWN_EX |
| for more details. |
| .TP |
| .BR F_SETOWN_EX " (\fIstruct f_owner_ex *\fP) (since Linux 2.6.32)" |
| This operation performs a similar task to |
| .BR F_SETOWN . |
| It allows the caller to direct I/O availability signals |
| to a specific thread, process, or process group. |
| The caller specifies the target of signals via |
| .IR arg , |
| which is a pointer to a |
| .IR f_owner_ex |
| structure. |
| The |
| .I type |
| field has one of the following values, which define how |
| .I pid |
| is interpreted: |
| .RS |
| .TP |
| .BR F_OWNER_TID |
| Send the signal to the thread whose thread ID |
| (the value returned by a call to |
| .BR clone (2) |
| or |
| .BR gettid (2)) |
| is specified in |
| .IR pid . |
| .TP |
| .BR F_OWNER_PID |
| Send the signal to the process whose ID |
| is specified in |
| .IR pid . |
| .TP |
| .BR F_OWNER_PGRP |
| Send the signal to the process group whose ID |
| is specified in |
| .IR pid . |
| (Note that, unlike with |
| .BR F_SETOWN , |
| a process group ID is specified as a positive value here.) |
| .RE |
| .TP |
| .BR F_GETSIG " (\fIvoid\fP)" |
| Return (as the function result) |
| the signal sent when input or output becomes possible. |
| A value of zero means |
| .B SIGIO |
| is sent. |
| Any other value (including |
| .BR SIGIO ) |
| is the |
| signal sent instead, and in this case additional info is available to |
| the signal handler if installed with |
| .BR SA_SIGINFO . |
| .I arg |
| is ignored. |
| .TP |
| .BR F_SETSIG " (\fIint\fP)" |
| Set the signal sent when input or output becomes possible |
| to the value given in |
| .IR arg . |
| A value of zero means to send the default |
| .B SIGIO |
| signal. |
| Any other value (including |
| .BR SIGIO ) |
| is the signal to send instead, and in this case additional info |
| is available to the signal handler if installed with |
| .BR SA_SIGINFO . |
| .\" |
| .\" The following was true only up until 2.6.11: |
| .\" |
| .\" Additionally, passing a nonzero value to |
| .\" .B F_SETSIG |
| .\" changes the signal recipient from a whole process to a specific thread |
| .\" within a process. |
| .\" See the description of |
| .\" .B F_SETOWN |
| .\" for more details. |
| .IP |
| By using |
| .B F_SETSIG |
| with a nonzero value, and setting |
| .B SA_SIGINFO |
| for the |
| signal handler (see |
| .BR sigaction (2)), |
| extra information about I/O events is passed to |
| the handler in a |
| .I siginfo_t |
| structure. |
| If the |
| .I si_code |
| field indicates the source is |
| .BR SI_SIGIO , |
| the |
| .I si_fd |
| field gives the file descriptor associated with the event. |
| Otherwise, |
| there is no indication which file descriptors are pending, and you |
| should use the usual mechanisms |
| .RB ( select (2), |
| .BR poll (2), |
| .BR read (2) |
| with |
| .B O_NONBLOCK |
| set etc.) to determine which file descriptors are available for I/O. |
| .IP |
| Note that the file descriptor provided in |
| .I si_fd |
| is the one that was specified during the |
| .BR F_SETSIG |
| operation. |
| This can lead to an unusual corner case. |
| If the file descriptor is duplicated |
| .RB ( dup (2) |
| or similar), and the original file descriptor is closed, |
| then I/O events will continue to be generated, but the |
| .I si_fd |
| field will contain the number of the now closed file descriptor. |
| .IP |
| By selecting a real-time signal (value >= |
| .BR SIGRTMIN ), |
| multiple I/O events may be queued using the same signal number. |
| (Queuing is dependent on available memory.) |
| Extra information is available |
| if |
| .B SA_SIGINFO |
| is set for the signal handler, as above. |
| .IP |
| Note that Linux imposes a limit on the |
| number of real-time signals that may be queued to a |
| process (see |
| .BR getrlimit (2) |
| and |
| .BR signal (7)) |
| and if this limit is reached, then the kernel reverts to |
| delivering |
| .BR SIGIO , |
| and this signal is delivered to the entire |
| process rather than to a specific thread. |
| .\" See fs/fcntl.c::send_sigio_to_task() (2.4/2.6) sources -- MTK, Apr 05 |
| .PP |
| Using these mechanisms, a program can implement fully asynchronous I/O |
| without using |
| .BR select (2) |
| or |
| .BR poll (2) |
| most of the time. |
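| .PP |
| As a rough illustration of the steps described above, |
| the following C sketch installs an |
| .B SA_SIGINFO |
| handler, selects the signal with |
| .BR F_SETSIG , |
| and enables |
| .B O_ASYNC |
| on a descriptor. |
| The names |
| .IR enable_io_signal (), |
| .IR last_io_fd (), |
| and |
| .IR on_io_signal () |
| are hypothetical helpers invented for this example, |
| and the handler records |
| .I si_fd |
| without inspecting |
| .IR si_code , |
| which a robust program should check first. |

```c
#define _GNU_SOURCE             /* for F_SETSIG, F_GETSIG */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Descriptor reported by the most recent I/O signal, or -1 if none yet.
   (Hypothetical bookkeeping for this sketch.) */
static volatile sig_atomic_t last_ready_fd = -1;

/* SA_SIGINFO handler: remember which descriptor the kernel reported.
   A robust handler would examine info->si_code before trusting si_fd. */
static void on_io_signal(int signo, siginfo_t *info, void *ucontext)
{
    (void) signo;
    (void) ucontext;
    last_ready_fd = info->si_fd;
}

/* Return the descriptor recorded by the handler, or -1. */
int last_io_fd(void)
{
    return last_ready_fd;
}

/* Arrange for signal `signo` to be sent to the calling process when
   I/O becomes possible on `fd`.
   Returns 0 on success, or -1 with errno set. */
int enable_io_signal(int fd, int signo)
{
    struct sigaction sa;
    int flags;

    assert(signo > 0);          /* zero would select plain SIGIO behavior */

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_io_signal;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) == -1)
        return -1;

    if (fcntl(fd, F_SETOWN, getpid()) == -1)    /* who receives the signal */
        return -1;
    if (fcntl(fd, F_SETSIG, signo) == -1)       /* which signal is sent */
        return -1;

    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK) == -1)
        return -1;
    return 0;
}
```

| .PP |
| A caller might pass |
| .B SIGRTMIN |
| as the signal number so that several events can be queued, |
| and should still be prepared for the fallback to |
| .B SIGIO |
| described above. |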
| .PP |
| The use of |
| .BR O_ASYNC |
| is specific to BSD and Linux. |
| The only use of |
| .BR F_GETOWN |
| and |
| .B F_SETOWN |
| specified in POSIX.1 is in conjunction with the use of the |
| .B SIGURG |
| signal on sockets. |
| (POSIX does not specify the |
| .BR SIGIO |
| signal.) |
| .BR F_GETOWN_EX , |
| .BR F_SETOWN_EX , |
| .BR F_GETSIG , |
| and |
| .B F_SETSIG |
| are Linux-specific. |
| POSIX has asynchronous I/O and the |
| .I aio_sigevent |
| structure to achieve similar things; these are also available |
| in Linux as part of the GNU C Library (glibc). |
| .SS Leases |
| .B F_SETLEASE |
| and |
| .B F_GETLEASE |
| (Linux 2.4 onward) are used, respectively, to establish a new lease |
| and to retrieve the current lease on the open file description |
| referred to by the file descriptor |
| .IR fd . |
| A file lease provides a mechanism whereby the process holding |
| the lease (the "lease holder") is notified (via delivery of a signal) |
| when a process (the "lease breaker") tries to |
| .BR open (2) |
| or |
| .BR truncate (2) |
| the file referred to by that file descriptor. |
| .TP |
| .BR F_SETLEASE " (\fIint\fP)" |
| Set or remove a file lease according to which of the following |
| values is specified in the integer |
| .IR arg : |
| .RS |
| .TP |
| .B F_RDLCK |
| Take out a read lease. |
| This will cause the calling process to be notified when |
| the file is opened for writing or is truncated. |
| .\" The following became true in kernel 2.6.10: |
| .\" See the man-pages-2.09 Changelog for further info. |
| A read lease can be placed only on a file descriptor that |
| is opened read-only. |
| .TP |
| .B F_WRLCK |
| Take out a write lease. |
| This will cause the caller to be notified when |
| the file is opened for reading or writing or is truncated. |
| A write lease may be placed on a file only if there are no |
| other open file descriptors for the file. |
| .TP |
| .B F_UNLCK |
| Remove our lease from the file. |
| .RE |
| .PP |
| Leases are associated with an open file description (see |
| .BR open (2)). |
| This means that duplicate file descriptors (created by, for example, |
| .BR fork (2) |
| or |
| .BR dup (2)) |
| refer to the same lease, and this lease may be modified |
| or released using any of these descriptors. |
| Furthermore, the lease is released either by an explicit |
| .B F_UNLCK |
| operation on any of these duplicate file descriptors, or when all |
| such file descriptors have been closed. |
| .PP |
| Leases may be taken out only on regular files. |
| An unprivileged process may take out a lease only on a file whose |
| UID (owner) matches the filesystem UID of the process. |
| A process with the |
| .B CAP_LEASE |
| capability may take out leases on arbitrary files. |
| .TP |
| .BR F_GETLEASE " (\fIvoid\fP)" |
| Indicates what type of lease is associated with the file descriptor |
| .I fd |
| by returning either |
| .BR F_RDLCK ", " F_WRLCK ", or " F_UNLCK , |
| indicating, respectively, a read lease, a write lease, or no lease. |
| .I arg |
| is ignored. |
| .PP |
| When a process (the "lease breaker") performs an |
| .BR open (2) |
| or |
| .BR truncate (2) |
| that conflicts with a lease established via |
| .BR F_SETLEASE , |
| the system call is blocked by the kernel and |
| the kernel notifies the lease holder by sending it a signal |
| .RB ( SIGIO |
| by default). |
| The lease holder should respond to receipt of this signal by doing |
| whatever cleanup is required in preparation for the file to be |
| accessed by another process (e.g., flushing cached buffers) and |
| then either removing or downgrading its lease. |
| A lease is removed by performing an |
| .B F_SETLEASE |
| command specifying |
| .I arg |
| as |
| .BR F_UNLCK . |
| If the lease holder currently holds a write lease on the file, |
| and the lease breaker is opening the file for reading, |
| then it is sufficient for the lease holder to downgrade |
| the lease to a read lease. |
| This is done by performing an |
| .B F_SETLEASE |
| command specifying |
| .I arg |
| as |
| .BR F_RDLCK . |
| .PP |
| If the lease holder fails to downgrade or remove the lease within |
| the number of seconds specified in |
| .IR /proc/sys/fs/lease\-break\-time , |
| then the kernel forcibly removes or downgrades the lease holder's lease. |
| .PP |
| Once a lease break has been initiated, |
| .B F_GETLEASE |
| returns the target lease type (either |
| .B F_RDLCK |
| or |
| .BR F_UNLCK , |
| depending on what would be compatible with the lease breaker) |
| until the lease holder voluntarily downgrades or removes the lease or |
| the kernel forcibly does so after the lease break timer expires. |
| .PP |
| Once the lease has been voluntarily or forcibly removed or downgraded, |
| and assuming the lease breaker has not unblocked its system call, |
| the kernel permits the lease breaker's system call to proceed. |
| .PP |
| If the lease breaker's blocked |
| .BR open (2) |
| or |
| .BR truncate (2) |
| is interrupted by a signal handler, |
| then the system call fails with the error |
| .BR EINTR , |
| but the other steps still occur as described above. |
| If the lease breaker is killed by a signal while blocked in |
| .BR open (2) |
| or |
| .BR truncate (2), |
| then the other steps still occur as described above. |
| If the lease breaker specifies the |
| .B O_NONBLOCK |
| flag when calling |
| .BR open (2), |
| then the call immediately fails with the error |
| .BR EWOULDBLOCK , |
| but the other steps still occur as described above. |
| .PP |
| The default signal used to notify the lease holder is |
| .BR SIGIO , |
| but this can be changed using the |
| .B F_SETSIG |
| command to |
| .BR fcntl (). |
| If a |
| .B F_SETSIG |
| command is performed (even one specifying |
| .BR SIGIO ), |
| and the signal |
| handler is established using |
| .BR SA_SIGINFO , |
| then the handler will receive a |
| .I siginfo_t |
| structure as its second argument, and the |
| .I si_fd |
| field of this argument will hold the file descriptor of the leased file |
| that has been accessed by another process. |
| (This is useful if the caller holds leases against multiple files.) |
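| .PP |
| The basic life cycle of a lease can be sketched in C as follows. |
| This is a minimal example under stated assumptions, |
| not a definitive implementation: |
| .IR open_with_read_lease () |
| and |
| .IR release_lease () |
| are hypothetical helper names, |
| and the sketch omits the signal handler that a real lease holder |
| needs in order to react to a lease break. |

```c
#define _GNU_SOURCE             /* for F_SETLEASE, F_GETLEASE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open `path` read-only and take out a read lease on it.
   On success, store the descriptor in *fd_out and return the lease
   type reported by F_GETLEASE; on failure, return -1 with errno set.
   (Hypothetical helper for this sketch.) */
int open_with_read_lease(const char *path, int *fd_out)
{
    int fd, type, saved;

    assert(fd_out != NULL);

    fd = open(path, O_RDONLY);  /* a read lease needs a read-only fd */
    if (fd == -1)
        return -1;

    if (fcntl(fd, F_SETLEASE, F_RDLCK) == -1)
        goto fail;
    type = fcntl(fd, F_GETLEASE);
    if (type == -1)
        goto fail;

    *fd_out = fd;
    return type;

fail:
    saved = errno;              /* keep the fcntl() error across close() */
    close(fd);
    errno = saved;
    return -1;
}

/* Remove the lease, as a lease holder would do after cleaning up in
   response to a lease-break notification.
   Returns 0 on success, or -1 with errno set. */
int release_lease(int fd)
{
    return fcntl(fd, F_SETLEASE, F_UNLCK);
}
```

| .PP |
| Because the lease belongs to the open file description, |
| calling the release helper on any duplicate of the descriptor |
| has the same effect. |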
| .SS File and directory change notification (dnotify) |
| .TP |
| .BR F_NOTIFY " (\fIint\fP)" |
| (Linux 2.4 onward) |
| Provide notification when the directory referred to by |
| .I fd |
| or any of the files that it contains is changed. |
| The events to be notified are specified in |
| .IR arg , |
| which is a bit mask specified by ORing together zero or more of |
| the following bits: |
| .PP |
| .RS |
| .PD 0 |
| .TP |
| .B DN_ACCESS |
| A file was accessed |
| .RB ( read (2), |
| .BR pread (2), |
| .BR readv (2), |
| and similar). |
| .TP |
| .B DN_MODIFY |
| A file was modified |
| .RB ( write (2), |
| .BR pwrite (2), |
| .BR writev (2), |
| .BR truncate (2), |
| .BR ftruncate (2), |
| and similar). |
| .TP |
| .B DN_CREATE |
| A file was created |
| .RB ( open (2), |
| .BR creat (2), |
| .BR mknod (2), |
| .BR mkdir (2), |
| .BR link (2), |
| .BR symlink (2), |
| .BR rename (2) |
| into this directory). |
| .TP |
| .B DN_DELETE |
| A file was unlinked |
| .RB ( unlink (2), |
| .BR rename (2) |
| to another directory, |
| .BR rmdir (2)). |
| .TP |
| .B DN_RENAME |
| A file was renamed within this directory |
| .RB ( rename (2)). |
| .TP |
| .B DN_ATTRIB |
| The attributes of a file were changed |
| .RB ( chown (2), |
| .BR chmod (2), |
| .BR utime (2), |
| .BR utimensat (2), |
| and similar). |
| .PD |
| .RE |
| .IP |
| (In order to obtain these definitions, the |
| .B _GNU_SOURCE |
| feature test macro must be defined before including |
| .I any |
| header files.) |
| .IP |
| Directory notifications are normally "one-shot", and the application |
| must reregister to receive further notifications. |
| Alternatively, if |
| .B DN_MULTISHOT |
| is included in |
| .IR arg , |
| then notification will remain in effect until explicitly removed. |
| .IP |
| .\" The following does seem a poor API-design choice... |
| A series of |
| .B F_NOTIFY |
| requests is cumulative, with the events in |
| .I arg |
| being added to the set already monitored. |
| To disable notification of all events, make an |
| .B F_NOTIFY |
| call specifying |
| .I arg |
| as 0. |
| .IP |
| Notification occurs via delivery of a signal. |
| The default signal is |
| .BR SIGIO , |
| but this can be changed using the |
| .B F_SETSIG |
| command to |
| .BR fcntl (). |
| (Note that |
| .B SIGIO |
| is one of the nonqueuing standard signals; |
| switching to the use of a real-time signal means that |
| multiple notifications can be queued to the process.) |
| In the latter case, the signal handler receives a |
| .I siginfo_t |
| structure as its second argument (if the handler was |
| established using |
| .BR SA_SIGINFO ) |
| and the |
| .I si_fd |
| field of this structure contains the file descriptor which |
| generated the notification (useful when establishing notification |
| on multiple directories). |
| .IP |
| Especially when using |
| .BR DN_MULTISHOT , |
| a real-time signal should be used for notification, |
| so that multiple notifications can be queued. |
| .IP |
| .B NOTE: |
| New applications should use the |
| .I inotify |
| interface (available since kernel 2.6.13), |
| which provides a much superior interface for obtaining notifications of |
| filesystem events. |
| See |
| .BR inotify (7). |
| .SS Changing the capacity of a pipe |
| .TP |
| .BR F_SETPIPE_SZ " (\fIint\fP; since Linux 2.6.35)" |
| Change the capacity of the pipe referred to by |
| .I fd |
| to be at least |
| .I arg |
| bytes. |
| An unprivileged process can adjust the pipe capacity to any value |
| between the system page size and the limit defined in |
| .IR /proc/sys/fs/pipe\-max\-size |
| (see |
| .BR proc (5)). |
| Attempts to set the pipe capacity below the page size are silently |
| rounded up to the page size. |
| Attempts by an unprivileged process to set the pipe capacity above the limit in |
| .IR /proc/sys/fs/pipe\-max\-size |
| yield the error |
| .BR EPERM ; |
| a privileged process |
| .RB ( CAP_SYS_RESOURCE ) |
| can override the limit. |
| .IP |
| When allocating the buffer for the pipe, |
| the kernel may use a capacity larger than |
| .IR arg , |
| if that is convenient for the implementation. |
| (In the current implementation, |
| the allocation is the next higher power-of-two page-size multiple |
| of the requested size.) |
| The actual capacity (in bytes) that is set is returned as the function result. |
| .IP |
| Attempting to set the pipe capacity smaller than the amount |
| of buffer space currently used to store data produces the error |
| .BR EBUSY . |
| .IP |
| Note that because of the way the pages of the pipe buffer |
| are employed when data is written to the pipe, |
| the number of bytes that can be written may be less than the nominal size, |
| depending on the size of the writes. |
| .TP |
| .BR F_GETPIPE_SZ " (\fIvoid\fP; since Linux 2.6.35)" |
| Return (as the function result) the capacity of the pipe referred to by |
| .IR fd . |
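| .PP |
| The two pipe-capacity operations can be combined as in the following C sketch, |
| which grows a pipe only when its current capacity is too small. |
| The helper name |
| .IR ensure_pipe_capacity () |
| is hypothetical and exists only for this example. |

```c
#define _GNU_SOURCE             /* for F_SETPIPE_SZ, F_GETPIPE_SZ */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Make sure the pipe referred to by `fd` can hold at least `want` bytes.
   Returns the resulting capacity in bytes, or -1 with errno set
   (for example EPERM or EBUSY, as described above).
   (Hypothetical helper for this sketch.) */
int ensure_pipe_capacity(int fd, int want)
{
    int current;

    assert(want > 0);

    current = fcntl(fd, F_GETPIPE_SZ);
    if (current == -1 || current >= want)
        return current;         /* error, or already large enough */

    /* The kernel may round the request up; the actual capacity
       is returned as the function result. */
    return fcntl(fd, F_SETPIPE_SZ, want);
}
```

| .PP |
| The returned value, rather than the requested one, |
| is the figure to use for any later sizing decisions. |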
| .\" |
| .SS File sealing |
| File seals limit the set of allowed operations on a given file. |
| For each seal that is set on a file, |
| a specific set of operations will fail with |
| .B EPERM |
| on this file from now on. |
| The file is said to be sealed. |
| The default set of seals depends on the type of the underlying |
| file and filesystem. |
| For an overview of file sealing, a discussion of its purpose, |
| and some code examples, see |
| .BR memfd_create (2). |
| .PP |
| Currently, |
| file seals can be applied only to a file descriptor returned by |
| .BR memfd_create (2) |
| (if the |
| .B MFD_ALLOW_SEALING |
| flag was employed). |
| On other filesystems, all |
| .BR fcntl () |
| operations that operate on seals will return |
| .BR EINVAL . |
| .PP |
| Seals are a property of an inode. |
| Thus, all open file descriptors referring to the same inode share |
| the same set of seals. |
| Furthermore, seals can never be removed, only added. |
| .TP |
| .BR F_ADD_SEALS " (\fIint\fP; since Linux 3.17)" |
| Add the seals given in the bit-mask argument |
| .I arg |
| to the set of seals of the inode referred to by the file descriptor |
| .IR fd . |
| Seals cannot be removed again. |
| Once this call succeeds, the seals are enforced by the kernel immediately. |
| If the current set of seals includes |
| .BR F_SEAL_SEAL |
| (see below), then this call will be rejected with |
| .BR EPERM . |
| Adding a seal that is already set is a no-op, provided that |
| .B F_SEAL_SEAL |
| is not already set. |
| In order to place a seal, the file descriptor |
| .I fd |
| must be writable. |
| .TP |
| .BR F_GET_SEALS " (\fIvoid\fP; since Linux 3.17)" |
| Return (as the function result) the current set of seals |
| of the inode referred to by |
| .IR fd . |
| If no seals are set, 0 is returned. |
| If the file does not support sealing, \-1 is returned and |
| .I errno |
| is set to |
| .BR EINVAL . |
| .PP |
| The following seals are available: |
| .TP |
| .BR F_SEAL_SEAL |
| If this seal is set, any further call to |
| .BR fcntl () |
| with |
| .B F_ADD_SEALS |
| fails with the error |
| .BR EPERM . |
| Therefore, this seal prevents any modifications to the set of seals itself. |
| If the initial set of seals of a file includes |
| .BR F_SEAL_SEAL , |
| then this effectively causes the set of seals to be constant and locked. |
| .TP |
| .BR F_SEAL_SHRINK |
| If this seal is set, the file in question cannot be reduced in size. |
| This affects |
| .BR open (2) |
| with the |
| .B O_TRUNC |
| flag as well as |
| .BR truncate (2) |
| and |
| .BR ftruncate (2). |
| Those calls fail with |
| .B EPERM |
| if you try to shrink the file in question. |
| Increasing the file size is still possible. |
| .TP |
| .BR F_SEAL_GROW |
| If this seal is set, the size of the file in question cannot be increased. |
| This affects |
| .BR write (2) |
| beyond the end of the file, |
| .BR truncate (2), |
| .BR ftruncate (2), |
| and |
| .BR fallocate (2). |
| These calls fail with |
| .B EPERM |
| if you use them to increase the file size. |
| If you keep the size or shrink it, those calls still work as expected. |
| .TP |
| .BR F_SEAL_WRITE |
| If this seal is set, you cannot modify the contents of the file. |
| Note that shrinking or growing the size of the file is |
| still possible and allowed. |
| .\" One or more other seals are typically used with F_SEAL_WRITE |
| .\" because, given a file with the F_SEAL_WRITE seal set, then, |
| .\" while it would no longer be possible to (say) write zeros into |
| .\" the last 100 bytes of a file, it would still be possible |
| .\" to (say) shrink the file by 100 bytes using ftruncate(), and |
| .\" then increase the file size by 100 bytes, which would have |
| .\" the effect of replacing the last hundred bytes by zeros. |
| .\" |
| Thus, this seal is normally used in combination with one of the other seals. |
| This seal affects |
| .BR write (2) |
| and |
| .BR fallocate (2) |
| (only in combination with the |
| .B FALLOC_FL_PUNCH_HOLE |
| flag). |
| Those calls fail with |
| .B EPERM |
| if this seal is set. |
| Furthermore, trying to create new shared, writable memory-mappings via |
| .BR mmap (2) |
| will also fail with |
| .BR EPERM . |
| .IP |
| Using the |
| .B F_ADD_SEALS |
| operation to set the |
| .B F_SEAL_WRITE |
| seal fails with |
| .B EBUSY |
| if any writable, shared mapping exists. |
| Such mappings must be unmapped before you can add this seal. |
| Furthermore, if there are any asynchronous I/O operations |
| .RB ( io_submit (2)) |
| pending on the file, |
| all outstanding writes will be discarded. |
| .TP |
| .BR F_SEAL_FUTURE_WRITE " (since Linux 5.1)" |
| The effect of this seal is similar to |
| .BR F_SEAL_WRITE , |
| but the contents of the file can still be modified via |
| shared writable mappings that were created prior to the seal being set. |
| Any attempt to create a new writable mapping on the file via |
| .BR mmap (2) |
| will fail with |
| .BR EPERM . |
| Likewise, an attempt to write to the file via |
| .BR write (2) |
| will fail with |
| .BR EPERM . |
| .IP |
| Using this seal, |
| one process can create a memory buffer that it can continue to modify |
| while sharing that buffer on a "read-only" basis with other processes. |
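| .PP |
| As a minimal sketch of how the seals above combine, |
| the following C fragment fills an anonymous file and then makes both |
| its size and its contents immutable. |
| The helper name |
| .IR make_sealed_buffer () |
| and the file name |
| .I sealed\-buffer |
| are hypothetical, chosen only for this example; |
| .BR memfd_create (2) |
| contains fuller examples. |

```c
#define _GNU_SOURCE             /* for memfd_create(), F_ADD_SEALS, F_SEAL_* */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create an anonymous file containing `len` bytes copied from `data`,
   then seal it against shrinking, growing, writing, and further sealing.
   Returns the file descriptor, or -1 on failure.
   (Hypothetical helper for this sketch.) */
int make_sealed_buffer(const void *data, size_t len)
{
    int fd, saved;

    assert(data != NULL);

    fd = memfd_create("sealed-buffer", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd == -1)
        return -1;

    /* A short write is treated as a failure in this sketch. */
    if (write(fd, data, len) != (ssize_t) len)
        goto fail;

    /* F_SEAL_WRITE alone would still allow truncation and regrowth,
       so it is combined with the size seals; F_SEAL_SEAL then fixes
       the set of seals itself. */
    if (fcntl(fd, F_ADD_SEALS,
              F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL) == -1)
        goto fail;

    return fd;

fail:
    saved = errno;              /* keep the original error across close() */
    close(fd);
    errno = saved;
    return -1;
}
```

| .PP |
| A recipient of such a descriptor can call |
| .B F_GET_SEALS |
| to confirm which seals are in place before relying on the contents. |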
| .\" |
| .SS File read/write hints |
| Write lifetime hints can be used to inform the kernel about the relative |
| expected lifetime of writes on a given inode or |
| via a particular open file description. |
| (See |
| .BR open (2) |
| for an explanation of open file descriptions.) |
| In this context, the term "write lifetime" means |
| the expected time the data will live on media, before |
| being overwritten or erased. |
| .PP |
| An application may use the different hint values specified below to |
| separate writes into different write classes, |
| so that multiple users or applications running on a single storage back-end |
| can aggregate their I/O patterns in a consistent manner. |
| However, there are no functional semantics implied by these flags, |
| and different I/O classes can use the write lifetime hints |
| in arbitrary ways, so long as the hints are used consistently. |
| .PP |
| The following operations can be applied to the file descriptor, |
| .IR fd : |
| .TP |
| .BR F_GET_RW_HINT " (\fIuint64_t *\fP; since Linux 4.13)" |
| Returns the value of the read/write hint associated with the underlying inode |
| referred to by |
| .IR fd . |
| .TP |
| .BR F_SET_RW_HINT " (\fIuint64_t *\fP; since Linux 4.13)" |
| Sets the read/write hint value associated with the |
| underlying inode referred to by |
| .IR fd . |
| This hint persists until either it is explicitly modified or |
| the underlying filesystem is unmounted. |
| .TP |
| .BR F_GET_FILE_RW_HINT " (\fIuint64_t *\fP; since Linux 4.13)" |
| Returns the value of the read/write hint associated with |
| the open file description referred to by |
| .IR fd . |
| .TP |
| .BR F_SET_FILE_RW_HINT " (\fIuint64_t *\fP; since Linux 4.13)" |
| Sets the read/write hint value associated with the open file description |
| referred to by |
| .IR fd . |
| .PP |
| If an open file description has not been assigned a read/write hint, |
| then it shall use the value assigned to the inode, if any. |
| .PP |
| The following read/write |
| hints are valid since Linux 4.13: |
| .TP |
| .BR RWH_WRITE_LIFE_NOT_SET |
| No specific hint has been set. |
| This is the default value. |
| .TP |
| .BR RWH_WRITE_LIFE_NONE |
| No specific write lifetime is associated with this file or inode. |
| .TP |
| .BR RWH_WRITE_LIFE_SHORT |
| Data written to this inode or via this open file description |
| is expected to have a short lifetime. |
| .TP |
| .BR RWH_WRITE_LIFE_MEDIUM |
| Data written to this inode or via this open file description |
| is expected to have a lifetime longer than |
| data written with |
| .BR RWH_WRITE_LIFE_SHORT . |
| .TP |
| .BR RWH_WRITE_LIFE_LONG |
| Data written to this inode or via this open file description |
| is expected to have a lifetime longer than |
| data written with |
| .BR RWH_WRITE_LIFE_MEDIUM . |
| .TP |
| .BR RWH_WRITE_LIFE_EXTREME |
| Data written to this inode or via this open file description |
| is expected to have a lifetime longer than |
| data written with |
| .BR RWH_WRITE_LIFE_LONG . |
| .PP |
| All the write-specific hints are relative to each other, |
| and no individual absolute meaning should be attributed to them. |
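| .PP |
| The following C sketch shows one way the inode-level operations might be used; |
| note that the argument is a pointer to a |
| .I uint64_t |
| rather than an integer. |
| The helper name |
| .IR set_file_write_hint () |
| is hypothetical, |
| and the sketch assumes a C library whose |
| .I <fcntl.h> |
| exposes these definitions when |
| .B _GNU_SOURCE |
| is defined. |

```c
#define _GNU_SOURCE             /* for F_SET_RW_HINT, F_GET_RW_HINT, RWH_* */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Set the write-lifetime hint on the inode behind `path`, then read it
   back into *current.
   Returns 0 on success, or -1 with errno set.
   (Hypothetical helper for this sketch.) */
int set_file_write_hint(const char *path, uint64_t hint, uint64_t *current)
{
    int fd, result, saved;

    assert(current != NULL);

    fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    /* Both operations take a pointer to a 64-bit value. */
    result = fcntl(fd, F_SET_RW_HINT, &hint);
    if (result != -1)
        result = fcntl(fd, F_GET_RW_HINT, current);

    saved = errno;              /* keep any fcntl() error across close() */
    close(fd);
    errno = saved;
    return result;
}
```

| .PP |
| Since the hints carry no functional semantics, |
| a failure here can reasonably be treated as non-fatal. |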
| .SH RETURN VALUE |
| For a successful call, the return value depends on the operation: |
| .TP |
| .B F_DUPFD |
| The new file descriptor. |
| .TP |
| .B F_GETFD |
| Value of file descriptor flags. |
| .TP |
| .B F_GETFL |
| Value of file status flags. |
| .TP |
| .B F_GETLEASE |
| Type of lease held on file descriptor. |
| .TP |
| .B F_GETOWN |
| Value of file descriptor owner. |
| .TP |
| .B F_GETSIG |
| Value of signal sent when read or write becomes possible, or zero |
| for traditional |
| .B SIGIO |
| behavior. |
| .TP |
| .BR F_GETPIPE_SZ ", " F_SETPIPE_SZ |
| The pipe capacity. |
| .TP |
| .BR F_GET_SEALS |
| A bit mask identifying the seals that have been set |
| for the inode referred to by |
| .IR fd . |
| .TP |
| All other commands |
| Zero. |
| .PP |
| On error, \-1 is returned, and |
| .I errno |
| is set to indicate the error. |
| .SH ERRORS |
| .TP |
| .BR EACCES " or " EAGAIN |
| Operation is prohibited by locks held by other processes. |
| .TP |
| .B EAGAIN |
| The operation is prohibited because the file has been memory-mapped by |
| another process. |
| .TP |
| .B EBADF |
| .I fd |
| is not an open file descriptor. |
| .TP |
| .B EBADF |
| .I cmd |
| is |
| .B F_SETLK |
| or |
| .B F_SETLKW |
| and the file descriptor open mode doesn't match with the |
| type of lock requested. |
| .TP |
| .BR EBUSY |
| .I cmd |
| is |
| .BR F_SETPIPE_SZ |
| and the new pipe capacity specified in |
| .I arg |
| is smaller than the amount of buffer space currently |
| used to store data in the pipe. |
| .TP |
| .B EBUSY |
| .I cmd |
| is |
| .BR F_ADD_SEALS , |
| .IR arg |
| includes |
| .BR F_SEAL_WRITE , |
| and there exists a writable, shared mapping on the file referred to by |
| .IR fd . |
| .TP |
| .B EDEADLK |
| It was detected that the specified |
| .B F_SETLKW |
| command would cause a deadlock. |
| .TP |
| .B EFAULT |
| .I lock |
| is outside your accessible address space. |
| .TP |
| .B EINTR |
| .I cmd |
| is |
| .BR F_SETLKW |
| or |
| .BR F_OFD_SETLKW |
| and the operation was interrupted by a signal; see |
| .BR signal (7). |
| .TP |
| .B EINTR |
| .I cmd |
| is |
| .BR F_GETLK , |
| .BR F_SETLK , |
| .BR F_OFD_GETLK , |
| or |
| .BR F_OFD_SETLK , |
| and the operation was interrupted by a signal before the lock was checked or |
| acquired. |
| This is most likely when locking a remote file (e.g., locking over |
| NFS), but can sometimes happen locally. |
| .TP |
| .B EINVAL |
| The value specified in |
| .I cmd |
| is not recognized by this kernel. |
| .TP |
| .B EINVAL |
| .I cmd |
| is |
| .BR F_ADD_SEALS |
| and |
| .I arg |
| includes an unrecognized sealing bit. |
| .TP |
| .BR EINVAL |
| .I cmd |
| is |
| .BR F_ADD_SEALS |
| or |
| .BR F_GET_SEALS |
| and the filesystem containing the inode referred to by |
| .I fd |
| does not support sealing. |
| .TP |
| .B EINVAL |
| .I cmd |
| is |
| .BR F_DUPFD |
| and |
| .I arg |
| is negative or is greater than the maximum allowable value |
| (see the discussion of |
| .BR RLIMIT_NOFILE |
| in |
| .BR getrlimit (2)). |
| .TP |
| .B EINVAL |
| .I cmd |
| is |
| .BR F_SETSIG |
| and |
| .I arg |
| is not an allowable signal number. |
| .TP |
| .B EINVAL |
| .I cmd |
| is |
| .BR F_OFD_SETLK , |
| .BR F_OFD_SETLKW , |
| or |
| .BR F_OFD_GETLK , |
| and |
| .I l_pid |
| was not specified as zero. |
| .TP |
| .B EMFILE |
| .I cmd |
| is |
| .BR F_DUPFD |
| and the per-process limit on the number of open file descriptors |
| has been reached. |
| .TP |
| .B ENOLCK |
| Too many segment locks open, lock table is full, or a remote locking |
| protocol failed (e.g., locking over NFS). |
| .TP |
| .B ENOTDIR |
| .B F_NOTIFY |
| was specified in |
| .IR cmd , |
| but |
| .IR fd |
| does not refer to a directory. |
| .TP |
| .BR EPERM |
| .I cmd |
| is |
| .BR F_SETPIPE_SZ |
| and the soft or hard user pipe limit has been reached; see |
| .BR pipe (7). |
| .TP |
| .B EPERM |
| Attempted to clear the |
| .B O_APPEND |
| flag on a file that has the append-only attribute set. |
| .TP |
| .B EPERM |
| .I cmd |
| was |
| .BR F_ADD_SEALS , |
| but |
| .I fd |
| was not open for writing |
| or the current set of seals on the file already includes |
| .BR F_SEAL_SEAL . |
| .SH CONFORMING TO |
| SVr4, 4.3BSD, POSIX.1-2001. |
| Only the operations |
| .BR F_DUPFD , |
| .BR F_GETFD , |
| .BR F_SETFD , |
| .BR F_GETFL , |
| .BR F_SETFL , |
| .BR F_GETLK , |
| .BR F_SETLK , |
| and |
| .BR F_SETLKW |
| are specified in POSIX.1-2001. |
| .PP |
| .BR F_GETOWN |
| and |
| .B F_SETOWN |
| are specified in POSIX.1-2001. |
| (To get their definitions, define either |
| .\" .BR _BSD_SOURCE , |
| .\" or |
| .BR _XOPEN_SOURCE |
| with the value 500 or greater, or |
| .BR _POSIX_C_SOURCE |
| with the value 200809L or greater.) |
| .PP |
| .B F_DUPFD_CLOEXEC |
| is specified in POSIX.1-2008. |
| (To get this definition, define |
| .B _POSIX_C_SOURCE |
| with the value 200809L or greater, or |
| .B _XOPEN_SOURCE |
| with the value 700 or greater.) |
| .PP |
| .BR F_GETOWN_EX , |
| .BR F_SETOWN_EX , |
| .BR F_SETPIPE_SZ , |
| .BR F_GETPIPE_SZ , |
| .BR F_GETSIG , |
| .BR F_SETSIG , |
| .BR F_NOTIFY , |
| .BR F_GETLEASE , |
| and |
| .B F_SETLEASE |
| are Linux-specific. |
| (Define the |
| .B _GNU_SOURCE |
| macro to obtain these definitions.) |
| .\" .PP |
| .\" SVr4 documents additional EIO, ENOLINK and EOVERFLOW error conditions. |
| .PP |
| .BR F_OFD_SETLK , |
| .BR F_OFD_SETLKW , |
| and |
| .BR F_OFD_GETLK |
| are Linux-specific (and one must define |
| .BR _GNU_SOURCE |
| to obtain their definitions), |
| but work is being done to have them included in the next version of POSIX.1. |
| .PP |
| .BR F_ADD_SEALS |
| and |
| .BR F_GET_SEALS |
| are Linux-specific. |
| .\" FIXME . Once glibc adds support, add a note about FTM requirements |
| .SH NOTES |
| The errors returned by |
| .BR dup2 (2) |
| are different from those returned by |
| .BR F_DUPFD . |
| .\" |
| .SS File locking |
| The original Linux |
| .BR fcntl () |
| system call was not designed to handle large file offsets |
| (in the |
| .I flock |
| structure). |
| Consequently, an |
| .BR fcntl64 () |
| system call was added in Linux 2.4. |
| The newer system call employs a different structure for file locking, |
| .IR flock64 , |
| and corresponding commands, |
| .BR F_GETLK64 , |
| .BR F_SETLK64 , |
| and |
| .BR F_SETLKW64 . |
| However, these details can be ignored by applications using glibc, whose |
| .BR fcntl () |
| wrapper function transparently employs the more recent system call |
| where it is available. |
| .\" |
| .SS Record locks |
| Since kernel 2.0, there is no interaction between the types of lock |
| placed by |
| .BR flock (2) |
| and |
| .BR fcntl (). |
| .PP |
| Several systems have more fields in |
| .I "struct flock" |
| such as, for example, |
| .IR l_sysid |
| (to identify the machine where the lock is held). |
| .\" e.g., Solaris 8 documents this field in fcntl(2), and Irix 6.5 |
| .\" documents it in fcntl(5). mtk, May 2007 |
| .\" Also, FreeBSD documents it (Apr 2014). |
| Clearly, |
| .I l_pid |
| alone is not going to be very useful if the process holding the lock |
| may live on a different machine; |
| on Linux, while |
| .I l_sysid |
| is present on some architectures (such as MIPS32), |
| this field is not used. |
| .SS Record locking and NFS |
| Before Linux 3.12, if an NFSv4 client |
| loses contact with the server for a period of time |
| (defined as more than 90 seconds with no communication), |
| .\" |
| .\" Neil Brown: With NFSv3 the failure mode is the reverse. If |
| .\" the server loses contact with a client then any lock stays in place |
| .\" indefinitely ("why can't I read my mail"... I remember it well). |
| .\" |
| it might lose and regain a lock without ever being aware of the fact. |
| (The period of time after which contact is assumed lost is known as |
| the NFSv4 leasetime. |
| On a Linux NFS server, this can be determined by looking at |
| .IR /proc/fs/nfsd/nfsv4leasetime , |
| which expresses the period in seconds. |
| The default value for this file is 90.) |
| .\" |
| .\" Jeff Layton: |
| .\" Note that this is not a firm timeout. The server runs a job |
| .\" periodically to clean out expired stateful objects, and it's likely |
| .\" that there is some time (maybe even up to another whole lease period) |
| .\" between when the timeout expires and the job actually runs. If the |
| .\" client gets a RENEW in there within that window, its lease will be |
| .\" renewed and its state preserved. |
| .\" |
| This scenario potentially risks data corruption, |
| since another process might acquire a lock in the intervening period |
| and perform file I/O. |
| .PP |
| Since Linux 3.12, |
| .\" commit ef1820f9be27b6ad158f433ab38002ab8131db4d |
| if an NFSv4 client loses contact with the server, |
| any I/O to the file by a process which "thinks" it holds |
| a lock will fail until that process closes and reopens the file. |
| A kernel parameter, |
| .IR nfs.recover_lost_locks , |
| can be set to 1 to obtain the pre-3.12 behavior, |
| whereby the client will attempt to recover lost locks |
| when contact is reestablished with the server. |
| Because of the attendant risk of data corruption, |
| .\" commit f6de7a39c181dfb8a2c534661a53c73afb3081cd |
| this parameter defaults to 0 (disabled). |
| .SH BUGS |
| .SS F_SETFL |
| It is not possible to use |
| .BR F_SETFL |
| to change the state of the |
| .BR O_DSYNC |
| and |
| .BR O_SYNC |
| flags. |
| .\" FIXME . According to POSIX.1-2001, O_SYNC should also be modifiable |
| .\" via fcntl(2), but currently Linux does not permit this |
| .\" See http://bugzilla.kernel.org/show_bug.cgi?id=5994 |
| Attempts to change the state of these flags are silently ignored. |
| .SS F_GETOWN |
| A limitation of the Linux system call conventions on some |
| architectures (notably i386) means that if a (negative) |
| process group ID to be returned by |
| .B F_GETOWN |
| falls in the range \-1 to \-4095, then the return value is wrongly |
| interpreted by glibc as an error in the system call; |
| .\" glibc source: sysdeps/unix/sysv/linux/i386/sysdep.h |
| that is, the return value of |
| .BR fcntl () |
| will be \-1, and |
| .I errno |
| will contain the (positive) process group ID. |
| The Linux-specific |
| .BR F_GETOWN_EX |
| operation avoids this problem. |
| .\" mtk, Dec 04: some limited testing on alpha and ia64 seems to |
| .\" indicate that ANY negative PGID value will cause F_GETOWN |
| .\" to misinterpret the return as an error. Some other architectures |
| .\" seem to have the same range check as i386. |
| Since glibc version 2.11, glibc makes the kernel |
| .B F_GETOWN |
| problem invisible by implementing |
| .B F_GETOWN |
| using |
| .BR F_GETOWN_EX . |
| .SS F_SETOWN |
| In Linux 2.4 and earlier, there is a bug that can occur |
| when an unprivileged process uses |
| .B F_SETOWN |
| to specify the owner |
| of a socket file descriptor |
| as a process (group) other than the caller. |
| In this case, |
| .BR fcntl () |
| can return \-1 with |
| .I errno |
| set to |
| .BR EPERM , |
| even when the owner process (group) is one that the caller |
| has permission to send signals to. |
| Despite this error return, the file descriptor owner is set, |
| and signals will be sent to the owner. |
| .\" |
| .SS Deadlock detection |
| The deadlock-detection algorithm employed by the kernel when dealing with |
| .BR F_SETLKW |
| requests can yield both |
| false negatives (failures to detect deadlocks, |
| leaving a set of deadlocked processes blocked indefinitely) |
| and false positives |
| .RB ( EDEADLK |
| errors when there is no deadlock). |
| For example, |
| the kernel limits the lock depth of its dependency search to 10 steps, |
| meaning that circular deadlock chains longer than |
| 10 steps will not be detected. |
| In addition, the kernel may falsely indicate a deadlock |
| when two or more processes created using the |
| .BR clone (2) |
| .B CLONE_FILES |
| flag place locks that appear (to the kernel) to conflict. |
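| .PP |
| Since an |
| .B EDEADLK |
| error may be a false positive, an application may want to |
| distinguish it from other failures of |
| .BR F_SETLKW . |
| The following is a minimal sketch of one way to do so; |
| the helper name |
| .I lock_whole_file_wait |
| and its return-value convention are hypothetical. |

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Block until a write lock on the whole file is obtained.
 * Returns 0 on success, 1 if the kernel reported EDEADLK (which may
 * be a false positive), and -1 on any other error, including EINTR;
 * errno is left as set by fcntl().
 * lock_whole_file_wait is a hypothetical helper name.
 */
int
lock_whole_file_wait(int fd)
{
    struct flock fl = {
        .l_type = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,             /* 0 means "to end of file" */
    };

    if (fcntl(fd, F_SETLKW, &fl) == 0)
        return 0;

    return (errno == EDEADLK) ? 1 : -1;
}
```

| This sketch does nothing about false negatives: |
| an undetected deadlock still leaves the caller blocked in |
| .BR fcntl (). |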
| .\" |
| .SS Mandatory locking |
| The Linux implementation of mandatory locking |
| is subject to race conditions which render it unreliable: |
| .\" http://marc.info/?l=linux-kernel&m=119013491707153&w=2 |
| .\" |
| .\" Reconfirmed by Jeff Layton |
| .\" From: Jeff Layton <jlayton <at> redhat.com> |
| .\" Subject: Re: Status of fcntl() mandatory locking |
| .\" Newsgroups: gmane.linux.file-systems |
| .\" Date: 2014-04-28 10:07:57 GMT |
| .\" http://thread.gmane.org/gmane.linux.file-systems/84481/focus=84518 |
| a |
| .BR write (2) |
| call that overlaps with a lock may modify data after the mandatory lock is |
| acquired; |
| a |
| .BR read (2) |
| call that overlaps with a lock may detect changes to data that were made |
| only after a write lock was acquired. |
| Similar races exist between mandatory locks and |
| .BR mmap (2). |
| It is therefore inadvisable to rely on mandatory locking. |
| .SH SEE ALSO |
| .BR dup2 (2), |
| .BR flock (2), |
| .BR open (2), |
| .BR socket (2), |
| .BR lockf (3), |
| .BR capabilities (7), |
| .BR feature_test_macros (7), |
| .BR lslocks (8) |
| .PP |
| .IR locks.txt , |
| .IR mandatory\-locking.txt , |
| and |
| .I dnotify.txt |
| in the Linux kernel source directory |
| .IR Documentation/filesystems/ |
| (on older kernels, these files are directly under the |
| .I Documentation/ |
| directory, and |
| .I mandatory\-locking.txt |
| is called |
| .IR mandatory.txt ) |