| .\" This manpage is Copyright (C) 1992 Drew Eckhardt; |
| .\" and Copyright (C) 1993 Michael Haardt, Ian Jackson. |
| .\" and Copyright (C) 2008 Greg Banks |
| .\" and Copyright (C) 2006, 2008, 2013, 2014 Michael Kerrisk <mtk.manpages@gmail.com> |
| .\" |
| .\" %%%LICENSE_START(VERBATIM) |
| .\" Permission is granted to make and distribute verbatim copies of this |
| .\" manual provided the copyright notice and this permission notice are |
| .\" preserved on all copies. |
| .\" |
| .\" Permission is granted to copy and distribute modified versions of this |
| .\" manual under the conditions for verbatim copying, provided that the |
| .\" entire resulting derived work is distributed under the terms of a |
| .\" permission notice identical to this one. |
| .\" |
| .\" Since the Linux kernel and libraries are constantly changing, this |
| .\" manual page may be incorrect or out-of-date. The author(s) assume no |
| .\" responsibility for errors or omissions, or for damages resulting from |
| .\" the use of the information contained herein. The author(s) may not |
| .\" have taken the same level of care in the production of this manual, |
| .\" which is licensed free of charge, as they might when working |
| .\" professionally. |
| .\" |
| .\" Formatted or processed versions of this manual, if unaccompanied by |
| .\" the source, must acknowledge the copyright and authors of this work. |
| .\" %%%LICENSE_END |
| .\" |
| .\" Modified 1993-07-21 by Rik Faith <faith@cs.unc.edu> |
| .\" Modified 1994-08-21 by Michael Haardt |
| .\" Modified 1996-04-13 by Andries Brouwer <aeb@cwi.nl> |
| .\" Modified 1996-05-13 by Thomas Koenig |
| .\" Modified 1996-12-20 by Michael Haardt |
| .\" Modified 1999-02-19 by Andries Brouwer <aeb@cwi.nl> |
| .\" Modified 1998-11-28 by Joseph S. Myers <jsm28@hermes.cam.ac.uk> |
| .\" Modified 1999-06-03 by Michael Haardt |
| .\" Modified 2002-05-07 by Michael Kerrisk <mtk.manpages@gmail.com> |
| .\" Modified 2004-06-23 by Michael Kerrisk <mtk.manpages@gmail.com> |
| .\" 2004-12-08, mtk, reordered flags list alphabetically |
| .\" 2004-12-08, Martin Pool <mbp@sourcefrog.net> (& mtk), added O_NOATIME |
| .\" 2007-09-18, mtk, Added description of O_CLOEXEC + other minor edits |
| .\" 2008-01-03, mtk, with input from Trond Myklebust |
| .\" <trond.myklebust@fys.uio.no> and Timo Sirainen <tss@iki.fi> |
| .\" Rewrite description of O_EXCL. |
| .\" 2008-01-11, Greg Banks <gnb@melbourne.sgi.com>: add more detail |
| .\" on O_DIRECT. |
| .\" 2008-02-26, Michael Haardt: Reorganized text for O_CREAT and mode |
| .\" |
| .\" FIXME . Apr 08: The next POSIX revision has O_EXEC, O_SEARCH, and |
| .\" O_TTYINIT. Eventually these may need to be documented. --mtk |
| .\" |
| .TH OPEN 2 2017-09-15 "Linux" "Linux Programmer's Manual" |
| .SH NAME |
| open, openat, creat \- open and possibly create a file |
| .SH SYNOPSIS |
| .nf |
| .B #include <sys/types.h> |
| .B #include <sys/stat.h> |
| .B #include <fcntl.h> |
| .PP |
| .BI "int open(const char *" pathname ", int " flags ); |
| .BI "int open(const char *" pathname ", int " flags ", mode_t " mode ); |
| .PP |
| .BI "int creat(const char *" pathname ", mode_t " mode ); |
| .PP |
| .BI "int openat(int " dirfd ", const char *" pathname ", int " flags ); |
| .BI "int openat(int " dirfd ", const char *" pathname ", int " flags \ |
| ", mode_t " mode ); |
| .fi |
| .PP |
| .in -4n |
| Feature Test Macro Requirements for glibc (see |
| .BR feature_test_macros (7)): |
| .in |
| .PP |
| .BR openat (): |
| .PD 0 |
| .ad l |
| .RS 4 |
| .TP 4 |
| Since glibc 2.10: |
| _POSIX_C_SOURCE\ >=\ 200809L |
| .TP |
| Before glibc 2.10: |
| _ATFILE_SOURCE |
| .RE |
| .ad |
| .PD |
| .SH DESCRIPTION |
| The |
| .BR open () |
| system call opens the file specified by |
| .IR pathname . |
| If the specified file does not exist, |
| it may optionally (if |
| .B O_CREAT |
| is specified in |
| .IR flags ) |
| be created by |
| .BR open (). |
| .PP |
| The return value of |
| .BR open () |
| is a file descriptor, a small, nonnegative integer that is used |
| in subsequent system calls |
| .RB ( read "(2), " write "(2), " lseek "(2), " fcntl (2), |
| etc.) to refer to the open file. |
| The file descriptor returned by a successful call will be |
| the lowest-numbered file descriptor not currently open for the process. |
| .PP |
| By default, the new file descriptor is set to remain open across an |
| .BR execve (2) |
| (i.e., the |
| .B FD_CLOEXEC |
| file descriptor flag described in |
| .BR fcntl (2) |
| is initially disabled); the |
| .B O_CLOEXEC |
| flag, described below, can be used to change this default. |
| The file offset is set to the beginning of the file (see |
| .BR lseek (2)). |
| .PP |
| A call to |
| .BR open () |
| creates a new |
| .IR "open file description" , |
| an entry in the system-wide table of open files. |
| The open file description records the file offset and the file status flags |
| (see below). |
| A file descriptor is a reference to an open file description; |
| this reference is unaffected if |
| .I pathname |
| is subsequently removed or modified to refer to a different file. |
| For further details on open file descriptions, see NOTES. |
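| .PP |
| As a minimal sketch of the point that a file descriptor stays usable |
| after its pathname is removed, the following helper (the name |
| .I read_after_unlink |
| is hypothetical, chosen only for illustration) creates a file, |
| writes to it, unlinks it, and then reads the data back through |
| the descriptor that is still open: |
| .PP |
| .in +4n |
| .EX |

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create 'path', write 5 bytes, remove the name,
   then read the data back through the still-open descriptor.
   Returns the number of bytes read, or -1 on error. */
ssize_t read_after_unlink(const char *path, char *buf, size_t size)
{
    ssize_t n = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);

    if (fd == -1)
        return -1;

    if (write(fd, "hello", 5) == 5 &&
        unlink(path) == 0 &&            /* the name is gone ... */
        lseek(fd, 0, SEEK_SET) == 0)    /* ... but fd is still valid */
        n = read(fd, buf, size);

    close(fd);      /* last reference dropped; the file is now freed */
    return n;
}
```

| .EE |
| .in |
| .PP |
| The sketch assumes that |
| .I path |
| does not already exist and that its directory is writable. |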
| .PP |
| The argument |
| .I flags |
| must include one of the following |
| .IR "access modes" : |
| .BR O_RDONLY ", " O_WRONLY ", or " O_RDWR . |
| These request opening the file read-only, write-only, or read/write, |
| respectively. |
| .PP |
| In addition, zero or more file creation flags and file status flags |
| can be |
| .RI bitwise- or 'd |
| in |
| .IR flags . |
| The |
| .I file creation flags |
| are |
| .BR O_CLOEXEC , |
| .BR O_CREAT , |
| .BR O_DIRECTORY , |
| .BR O_EXCL , |
| .BR O_NOCTTY , |
| .BR O_NOFOLLOW , |
| .BR O_TMPFILE , |
| and |
| .BR O_TRUNC . |
| The |
| .I file status flags |
| are all of the remaining flags listed below. |
| .\" SUSv4 divides the flags into: |
| .\" * Access mode |
| .\" * File creation |
| .\" * File status |
| .\" * Other (O_CLOEXEC, O_DIRECTORY, O_NOFOLLOW) |
| .\" though it's not clear what the difference between "other" and |
| .\" "File creation" flags is. I raised an Aardvark to see if this |
| .\" can be clarified in SUSv4; 10 Oct 2008. |
| .\" http://thread.gmane.org/gmane.comp.standards.posix.austin.general/64/focus=67 |
| .\" TC1 (balloted in 2013), resolved this, so that those three constants |
| .\" are also categorized as file status flags. |
| .\" |
| The distinction between these two groups of flags is that |
| the file creation flags affect the semantics of the open operation itself, |
| while the file status flags affect the semantics of subsequent I/O operations. |
| The file status flags can be retrieved and (in some cases) |
| modified; see |
| .BR fcntl (2) |
| for details. |
| .PP |
| The full list of file creation flags and file status flags is as follows: |
| .TP |
| .B O_APPEND |
| The file is opened in append mode. |
| Before each |
| .BR write (2), |
| the file offset is positioned at the end of the file, |
| as if with |
| .BR lseek (2). |
| The modification of the file offset and the write operation |
| are performed as a single atomic step. |
| .IP |
| .B O_APPEND |
| may lead to corrupted files on NFS filesystems if more than one process |
| appends data to a file at once. |
| .\" For more background, see |
| .\" http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=453946 |
| .\" http://nfs.sourceforge.net/ |
| This is because NFS does not support |
| appending to a file, so the client kernel has to simulate it, which |
| can't be done without a race condition. |
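| .IP |
| The append behavior on a local filesystem can be sketched as follows. |
| The helper name |
| .I append_line |
| is an assumption made for this example, not part of any API: |
| .IP |
| .in +4n |
| .EX |

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append 'line' to the file at 'path',
   creating the file if it does not exist.
   Returns 0 on success, -1 on error. */
int append_line(const char *path, const char *line)
{
    size_t len = strlen(line);
    int ok;
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND,
                  S_IRUSR | S_IWUSR);

    if (fd == -1)
        return -1;

    /* With O_APPEND, the offset is moved to end-of-file and the
       data is written as one atomic step. */
    ok = (write(fd, line, len) == (ssize_t) len);

    if (close(fd) == -1)
        ok = 0;
    return ok ? 0 : -1;
}
```

| .EE |
| .in |
| .IP |
| Two successive calls leave both lines in the file; |
| without |
| .BR O_APPEND , |
| the second call would start writing at offset 0 and overwrite the first line. |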
| .TP |
| .B O_ASYNC |
| Enable signal-driven I/O: |
| generate a signal |
| .RB ( SIGIO |
| by default, but this can be changed via |
| .BR fcntl (2)) |
| when input or output becomes possible on this file descriptor. |
| This feature is available only for terminals, pseudoterminals, |
| sockets, and (since Linux 2.6) pipes and FIFOs. |
| See |
| .BR fcntl (2) |
| for further details. |
| See also BUGS, below. |
| .TP |
| .BR O_CLOEXEC " (since Linux 2.6.23)" |
| .\" NOTE! several other man pages refer to this text |
| Enable the close-on-exec flag for the new file descriptor. |
| .\" FIXME . for later review when Issue 8 is one day released... |
| .\" POSIX proposes to fix many APIs that provide hidden FDs |
| .\" http://austingroupbugs.net/tag_view_page.php?tag_id=8 |
| .\" http://austingroupbugs.net/view.php?id=368 |
| Specifying this flag permits a program to avoid additional |
| .BR fcntl (2) |
| .B F_SETFD |
| operations to set the |
| .B FD_CLOEXEC |
| flag. |
| .IP |
| Note that the use of this flag is essential in some multithreaded programs, |
| because using a separate |
| .BR fcntl (2) |
| .B F_SETFD |
| operation to set the |
| .B FD_CLOEXEC |
| flag does not suffice to avoid race conditions |
| where one thread opens a file descriptor and |
| attempts to set its close-on-exec flag using |
| .BR fcntl (2) |
| at the same time as another thread does a |
| .BR fork (2) |
| plus |
| .BR execve (2). |
| Depending on the order of execution, |
| the race may lead to the file descriptor returned by |
| .BR open () |
| being unintentionally leaked to the program executed by the child process |
| created by |
| .BR fork (2). |
| (This kind of race is in principle possible for any system call |
| that creates a file descriptor whose close-on-exec flag should be set, |
| and various other Linux system calls provide an equivalent of the |
| .BR O_CLOEXEC |
| flag to deal with this problem.) |
| .\" This flag fixes only one form of the race condition; |
| .\" The race can also occur with, for example, file descriptors |
| .\" returned by accept(), pipe(), etc. |
| .TP |
| .B O_CREAT |
| If |
| .I pathname |
| does not exist, create it as a regular file. |
| .IP |
| The owner (user ID) of the new file is set to the effective user ID |
| of the process. |
| .IP |
| The group ownership (group ID) of the new file is set either to |
| the effective group ID of the process (System V semantics) |
| or to the group ID of the parent directory (BSD semantics). |
| On Linux, the behavior depends on whether the |
| set-group-ID mode bit is set on the parent directory: |
| if that bit is set, then BSD semantics apply; |
| otherwise, System V semantics apply. |
| For some filesystems, the behavior also depends on the |
| .I bsdgroups |
| and |
| .I sysvgroups |
| mount options described in |
| .BR mount (8). |
| .\" As at 2.6.25, bsdgroups is supported by ext2, ext3, ext4, and |
| .\" XFS (since 2.6.14). |
| .RS |
| .PP |
| The |
| .I mode |
| argument specifies the file mode bits to be applied when a new file is created. |
| This argument must be supplied when |
| .B O_CREAT |
| or |
| .B O_TMPFILE |
| is specified in |
| .IR flags ; |
| if neither |
| .B O_CREAT |
| nor |
| .B O_TMPFILE |
| is specified, then |
| .I mode |
| is ignored. |
| The effective mode is modified by the process's |
| .I umask |
| in the usual way: in the absence of a default ACL, the mode of the |
| created file is |
| .IR "(mode\ &\ ~umask)" . |
| Note that this mode applies only to future accesses of the |
| newly created file; the |
| .BR open () |
| call that creates a read-only file may well return a read/write |
| file descriptor. |
| .PP |
| The following symbolic constants are provided for |
| .IR mode : |
| .TP 9 |
| .B S_IRWXU |
| 00700 user (file owner) has read, write, and execute permission |
| .TP |
| .B S_IRUSR |
| 00400 user has read permission |
| .TP |
| .B S_IWUSR |
| 00200 user has write permission |
| .TP |
| .B S_IXUSR |
| 00100 user has execute permission |
| .TP |
| .B S_IRWXG |
| 00070 group has read, write, and execute permission |
| .TP |
| .B S_IRGRP |
| 00040 group has read permission |
| .TP |
| .B S_IWGRP |
| 00020 group has write permission |
| .TP |
| .B S_IXGRP |
| 00010 group has execute permission |
| .TP |
| .B S_IRWXO |
| 00007 others have read, write, and execute permission |
| .TP |
| .B S_IROTH |
| 00004 others have read permission |
| .TP |
| .B S_IWOTH |
| 00002 others have write permission |
| .TP |
| .B S_IXOTH |
| 00001 others have execute permission |
| .RE |
| .IP |
| According to POSIX, the effect when other bits are set in |
| .I mode |
| is unspecified. |
| On Linux, the following bits are also honored in |
| .IR mode : |
| .RS |
| .TP 9 |
| .B S_ISUID |
| 0004000 set-user-ID bit |
| .TP |
| .B S_ISGID |
| 0002000 set-group-ID bit (see |
| .BR inode (7)). |
| .TP |
| .B S_ISVTX |
| 0001000 sticky bit (see |
| .BR inode (7)). |
| .RE |
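| .IP |
| The interaction between |
| .I mode |
| and the umask can be checked with a sketch like the following. |
| The helper name |
| .I created_mode |
| is hypothetical, and the sketch assumes that the parent directory |
| has no default ACL: |
| .IP |
| .in +4n |
| .EX |

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create 'path' with requested 'mode' while the
   process umask is 'mask'; return the mode bits the new file actually
   received, or (mode_t) -1 on error.
   Note: umask() is process-wide, so this is not thread-safe. */
mode_t created_mode(const char *path, mode_t mode, mode_t mask)
{
    struct stat sb;
    int rc;
    mode_t old = umask(mask);
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);

    umask(old);                 /* restore the caller's umask */
    if (fd == -1)
        return (mode_t) -1;

    rc = fstat(fd, &sb);
    close(fd);
    if (rc == -1)
        return (mode_t) -1;
    return sb.st_mode & 07777;  /* permission and set-ID/sticky bits */
}
```

| .EE |
| .in |
| .IP |
| Under these assumptions, requesting mode 0666 with a umask of 022 |
| yields a file with mode 0644. |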
| .TP |
| .BR O_DIRECT " (since Linux 2.4.10)" |
| Try to minimize cache effects of the I/O to and from this file. |
| In general this will degrade performance, but it is useful in |
| special situations, such as when applications do their own caching. |
| File I/O is done directly to/from user-space buffers. |
| The |
| .B O_DIRECT |
| flag on its own makes an effort to transfer data synchronously, |
| but does not give the guarantees of the |
| .B O_SYNC |
| flag that data and necessary metadata are transferred. |
| To guarantee synchronous I/O, |
| .B O_SYNC |
| must be used in addition to |
| .BR O_DIRECT . |
| See NOTES below for further discussion. |
| .IP |
| A semantically similar (but deprecated) interface for block devices |
| is described in |
| .BR raw (8). |
| .TP |
| .B O_DIRECTORY |
| If \fIpathname\fP is not a directory, cause the open to fail. |
| .\" But see the following and its replies: |
| .\" http://marc.theaimsgroup.com/?t=112748702800001&r=1&w=2 |
| .\" [PATCH] open: O_DIRECTORY and O_CREAT together should fail |
| .\" O_DIRECTORY | O_CREAT causes O_DIRECTORY to be ignored. |
| This flag was added in kernel version 2.1.126, to |
| avoid denial-of-service problems if |
| .BR opendir (3) |
| is called on a |
| FIFO or tape device. |
| .TP |
| .B O_DSYNC |
| Write operations on the file will complete according to the requirements of |
| synchronized I/O |
| .I data |
| integrity completion. |
| .IP |
| By the time |
| .BR write (2) |
| (and similar) |
| return, the output data |
| has been transferred to the underlying hardware, |
| along with any file metadata that would be required to retrieve that data |
| (i.e., as though each |
| .BR write (2) |
| was followed by a call to |
| .BR fdatasync (2)). |
| .IR "See NOTES below" . |
| .TP |
| .B O_EXCL |
| Ensure that this call creates the file: |
| if this flag is specified in conjunction with |
| .BR O_CREAT , |
| and |
| .I pathname |
| already exists, then |
| .BR open () |
| fails with the error |
| .BR EEXIST . |
| .IP |
| When these two flags are specified, symbolic links are not followed: |
| .\" POSIX.1-2001 explicitly requires this behavior. |
| if |
| .I pathname |
| is a symbolic link, then |
| .BR open () |
| fails regardless of where the symbolic link points. |
| .IP |
| In general, the behavior of |
| .B O_EXCL |
| is undefined if it is used without |
| .BR O_CREAT . |
| There is one exception: on Linux 2.6 and later, |
| .B O_EXCL |
| can be used without |
| .B O_CREAT |
| if |
| .I pathname |
| refers to a block device. |
| If the block device is in use by the system (e.g., mounted), |
| .BR open () |
| fails with the error |
| .BR EBUSY . |
| .IP |
| On NFS, |
| .B O_EXCL |
| is supported only when using NFSv3 or later on kernel 2.6 or later. |
| In NFS environments where |
| .B O_EXCL |
| support is not provided, programs that rely on it |
| for performing locking tasks will contain a race condition. |
| Portable programs that want to perform atomic file locking using a lockfile, |
| and need to avoid reliance on NFS support for |
| .BR O_EXCL , |
| can create a unique file on |
| the same filesystem (e.g., incorporating hostname and PID), and use |
| .BR link (2) |
| to make a link to the lockfile. |
| If |
| .BR link (2) |
| returns 0, the lock is successful. |
| Otherwise, use |
| .BR stat (2) |
| on the unique file to check if its link count has increased to 2, |
| in which case the lock is also successful. |
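| .IP |
| On filesystems where |
| .B O_EXCL |
| is supported, the basic lockfile idiom can be sketched as follows. |
| The helper name |
| .I try_lockfile |
| is an assumption made for this example: |
| .IP |
| .in +4n |
| .EX |

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try to take a lock by exclusively creating
   'lockpath'.  Returns 1 if this call created the file (lock taken),
   0 if the file already existed, and -1 on any other error. */
int try_lockfile(const char *lockpath)
{
    int fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL,
                  S_IRUSR | S_IWUSR);

    if (fd == -1)
        return (errno == EEXIST) ? 0 : -1;

    close(fd);
    return 1;
}
```

| .EE |
| .in |
| .IP |
| The lock is released by removing the file with |
| .BR unlink (2). |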
| .TP |
| .B O_LARGEFILE |
| (LFS) |
| Allow files whose sizes cannot be represented in an |
| .I off_t |
| (but can be represented in an |
| .IR off64_t ) |
| to be opened. |
| The |
| .B _LARGEFILE64_SOURCE |
| macro must be defined |
| (before including |
| .I any |
| header files) |
| in order to obtain this definition. |
| Setting the |
| .B _FILE_OFFSET_BITS |
| feature test macro to 64 (rather than using |
| .BR O_LARGEFILE ) |
| is the preferred |
| method of accessing large files on 32-bit systems (see |
| .BR feature_test_macros (7)). |
| .TP |
| .BR O_NOATIME " (since Linux 2.6.8)" |
| Do not update the file's last access time |
| .RI ( st_atime |
| in the inode) |
| when the file is |
| .BR read (2). |
| .IP |
| This flag can be employed only if one of the following conditions is true: |
| .RS |
| .IP * 3 |
| The effective UID of the process |
| .\" Strictly speaking: the filesystem UID |
| matches the owner UID of the file. |
| .IP * |
| The calling process has the |
| .BR CAP_FOWNER |
| capability in its user namespace and |
| the owner UID of the file has a mapping in the namespace. |
| .RE |
| .IP |
| This flag is intended for use by indexing or backup programs, |
| where its use can significantly reduce the amount of disk activity. |
| This flag may not be effective on all filesystems. |
| One example is NFS, where the server maintains the access time. |
| .\" The O_NOATIME flag also affects the treatment of st_atime |
| .\" by mmap() and readdir(2), MTK, Dec 04. |
| .TP |
| .B O_NOCTTY |
| If |
| .I pathname |
| refers to a terminal device\(emsee |
| .BR tty (4)\(emit |
| will not become the process's controlling terminal even if the |
| process does not have one. |
| .TP |
| .B O_NOFOLLOW |
| If \fIpathname\fP is a symbolic link, then the open fails, with the error |
| .BR ELOOP . |
| Symbolic links in earlier components of the pathname will still be |
| followed. |
| (Note that the |
| .B ELOOP |
| error that can occur in this case is indistinguishable from the case where |
| an open fails because there are too many symbolic links found |
| while resolving components in the prefix part of the pathname.) |
| .IP |
| This flag is a FreeBSD extension, which was added to Linux in version 2.1.126, |
| and has subsequently been standardized in POSIX.1-2008. |
| .IP |
| See also |
| .BR O_PATH |
| below. |
| .\" The headers from glibc 2.0.100 and later include a |
| .\" definition of this flag; \fIkernels before 2.1.126 will ignore it if |
| .\" used\fP. |
| .TP |
| .BR O_NONBLOCK " or " O_NDELAY |
| When possible, the file is opened in nonblocking mode. |
| Neither the |
| .BR open () |
| nor any subsequent operations on the file descriptor which is |
| returned will cause the calling process to wait. |
| .IP |
| Note that this flag has no effect for regular files and block devices; |
| that is, I/O operations will (briefly) block when device activity |
| is required, regardless of whether |
| .B O_NONBLOCK |
| is set. |
| Since |
| .B O_NONBLOCK |
| semantics might eventually be implemented, |
| applications should not depend upon blocking behavior |
| when specifying this flag for regular files and block devices. |
| .IP |
| For the handling of FIFOs (named pipes), see also |
| .BR fifo (7). |
| For a discussion of the effect of |
| .B O_NONBLOCK |
| in conjunction with mandatory file locks and with file leases, see |
| .BR fcntl (2). |
| .TP |
| .BR O_PATH " (since Linux 2.6.39)" |
| .\" commit 1abf0c718f15a56a0a435588d1b104c7a37dc9bd |
| .\" commit 326be7b484843988afe57566b627fb7a70beac56 |
| .\" commit 65cfc6722361570bfe255698d9cd4dccaf47570d |
| .\" |
| .\" http://thread.gmane.org/gmane.linux.man/2790/focus=3496 |
| .\" Subject: Re: [PATCH] open(2): document O_PATH |
| .\" Newsgroups: gmane.linux.man, gmane.linux.kernel |
| .\" |
| Obtain a file descriptor that can be used for two purposes: |
| to indicate a location in the filesystem tree and |
| to perform operations that act purely at the file descriptor level. |
| The file itself is not opened, and other file operations (e.g., |
| .BR read (2), |
| .BR write (2), |
| .BR fchmod (2), |
| .BR fchown (2), |
| .BR fgetxattr (2), |
| .BR ioctl (2), |
| .BR mmap (2)) |
| fail with the error |
| .BR EBADF . |
| .IP |
| The following operations |
| .I can |
| be performed on the resulting file descriptor: |
| .RS |
| .IP * 3 |
| .BR close (2). |
| .IP * |
| .BR fchdir (2), |
| if the file descriptor refers to a directory |
| (since Linux 3.5). |
| .\" commit 332a2e1244bd08b9e3ecd378028513396a004a24 |
| .IP * |
| .BR fstat (2) |
| (since Linux 3.6). |
| .IP * |
| .\" fstat(): commit 55815f70147dcfa3ead5738fd56d3574e2e3c1c2 |
| .BR fstatfs (2) |
| (since Linux 3.12). |
| .\" fstatfs(): commit 9d05746e7b16d8565dddbe3200faa1e669d23bbf |
| .IP * |
| Duplicating the file descriptor |
| .RB ( dup (2), |
| .BR fcntl (2) |
| .BR F_DUPFD , |
| etc.). |
| .IP * |
| Getting and setting file descriptor flags |
| .RB ( fcntl (2) |
| .BR F_GETFD |
| and |
| .BR F_SETFD ). |
| .IP * |
| Retrieving open file status flags using the |
| .BR fcntl (2) |
| .BR F_GETFL |
| operation: the returned flags will include the bit |
| .BR O_PATH . |
| .IP * |
| Passing the file descriptor as the |
| .IR dirfd |
| argument of |
| .BR openat () |
| and the other "*at()" system calls. |
| This includes |
| .BR linkat (2) |
| with |
| .BR AT_EMPTY_PATH |
| (or via procfs using |
| .BR AT_SYMLINK_FOLLOW ) |
| even if the file is not a directory. |
| .IP * |
| Passing the file descriptor to another process via a UNIX domain socket |
| (see |
| .BR SCM_RIGHTS |
| in |
| .BR unix (7)). |
| .RE |
| .IP |
| When |
| .B O_PATH |
| is specified in |
| .IR flags , |
| flag bits other than |
| .BR O_CLOEXEC , |
| .BR O_DIRECTORY , |
| and |
| .BR O_NOFOLLOW |
| are ignored. |
| .IP |
| Opening a file or directory with the |
| .B O_PATH |
| flag requires no permissions on the object itself |
| (but does require execute permission on the directories in the path prefix). |
| Depending on the subsequent operation, |
| a check for suitable file permissions may be performed (e.g., |
| .BR fchdir (2) |
| requires execute permission on the directory referred to |
| by its file descriptor argument). |
| By contrast, |
| obtaining a reference to a filesystem object by opening it with the |
| .B O_RDONLY |
| flag requires that the caller have read permission on the object, |
| even when the subsequent operation (e.g., |
| .BR fchdir (2), |
| .BR fstat (2)) |
| does not require read permission on the object. |
| .IP |
| If |
| .I pathname |
| is a symbolic link and the |
| .BR O_NOFOLLOW |
| flag is also specified, |
| then the call returns a file descriptor referring to the symbolic link. |
| This file descriptor can be used as the |
| .I dirfd |
| argument in calls to |
| .BR fchownat (2), |
| .BR fstatat (2), |
| .BR linkat (2), |
| and |
| .BR readlinkat (2) |
| with an empty pathname to have the calls operate on the symbolic link. |
| .IP |
| If |
| .I pathname |
| refers to an automount point that has not yet been triggered, so no |
| other filesystem is mounted on it, then the call returns a file |
| descriptor referring to the automount directory without triggering a mount. |
| .BR fstatfs (2) |
| can then be used to determine if it is, in fact, an untriggered |
| automount point |
| .RB ( ".f_type == AUTOFS_SUPER_MAGIC" ). |
| .IP |
| One use of |
| .B O_PATH |
| for regular files is to provide the equivalent of POSIX.1's |
| .B O_EXEC |
| functionality. |
| This permits us to open a file for which we have execute |
| permission but not read permission, and then execute that file, |
| with steps something like the following: |
| .IP |
| .in +4n |
| .EX |
| char buf[PATH_MAX]; |
| fd = open("some_prog", O_PATH); |
| snprintf(buf, PATH_MAX, "/proc/self/fd/%d", fd); |
| execl(buf, "some_prog", (char *) NULL); |
| .EE |
| .in |
| .IP |
| An |
| .B O_PATH |
| file descriptor can also be passed as the argument of |
| .BR fexecve (3). |
| .TP |
| .B O_SYNC |
| Write operations on the file will complete according to the requirements of |
| synchronized I/O |
| .I file |
| integrity completion |
| (by contrast with the |
| synchronized I/O |
| .I data |
| integrity completion |
| provided by |
| .BR O_DSYNC .) |
| .IP |
| By the time |
| .BR write (2) |
| (or similar) |
| returns, the output data and associated file metadata |
| have been transferred to the underlying hardware |
| (i.e., as though each |
| .BR write (2) |
| was followed by a call to |
| .BR fsync (2)). |
| .IR "See NOTES below" . |
| .TP |
| .BR O_TMPFILE " (since Linux 3.11)" |
| .\" commit 60545d0d4610b02e55f65d141c95b18ccf855b6e |
| .\" commit f4e0c30c191f87851c4a53454abb55ee276f4a7e |
| .\" commit bb458c644a59dbba3a1fe59b27106c5e68e1c4bd |
| Create an unnamed temporary regular file. |
| The |
| .I pathname |
| argument specifies a directory; |
| an unnamed inode will be created in that directory's filesystem. |
| Anything written to the resulting file will be lost when |
| the last file descriptor is closed, unless the file is given a name. |
| .IP |
| .B O_TMPFILE |
| must be specified with one of |
| .B O_RDWR |
| or |
| .B O_WRONLY |
| and, optionally, |
| .BR O_EXCL . |
| If |
| .B O_EXCL |
| is not specified, then |
| .BR linkat (2) |
| can be used to link the temporary file into the filesystem, making it |
| permanent, using code like the following: |
| .IP |
| .in +4n |
| .EX |
| char path[PATH_MAX]; |
| fd = open("/path/to/dir", O_TMPFILE | O_RDWR, |
| S_IRUSR | S_IWUSR); |
| |
| /* File I/O on 'fd'... */ |
| |
| snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd); |
| linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file", |
| AT_SYMLINK_FOLLOW); |
| .EE |
| .in |
| .IP |
| In this case, |
| the |
| .BR open () |
| .I mode |
| argument determines the file permission mode, as with |
| .BR O_CREAT . |
| .IP |
| Specifying |
| .B O_EXCL |
| in conjunction with |
| .B O_TMPFILE |
| prevents a temporary file from being linked into the filesystem |
| in the above manner. |
| (Note that the meaning of |
| .B O_EXCL |
| in this case is different from the meaning of |
| .B O_EXCL |
| otherwise.) |
| .IP |
| There are two main use cases for |
| .\" Inspired by http://lwn.net/Articles/559147/ |
| .BR O_TMPFILE : |
| .RS |
| .IP * 3 |
| Improved |
| .BR tmpfile (3) |
| functionality: race-free creation of temporary files that |
| (1) are automatically deleted when closed; |
| (2) can never be reached via any pathname; |
| (3) are not subject to symlink attacks; and |
| (4) do not require the caller to devise unique names. |
| .IP * |
| Creating a file that is initially invisible, which is then populated |
| with data and adjusted to have appropriate filesystem attributes |
| .RB ( fchown (2), |
| .BR fchmod (2), |
| .BR fsetxattr (2), |
| etc.) |
| before being atomically linked into the filesystem |
| in a fully formed state (using |
| .BR linkat (2) |
| as described above). |
| .RE |
| .IP |
| .B O_TMPFILE |
| requires support by the underlying filesystem; |
| only a subset of Linux filesystems provide that support. |
| In the initial implementation, support was provided in |
| the ext2, ext3, ext4, UDF, Minix, and shmem filesystems. |
| .\" To check for support, grep for "tmpfile" in kernel sources |
| Support for other filesystems has subsequently been added as follows: |
| XFS (Linux 3.15); |
| .\" commit 99b6436bc29e4f10e4388c27a3e4810191cc4788 |
| .\" commit ab29743117f9f4c22ac44c13c1647fb24fb2bafe |
| Btrfs (Linux 3.16); |
| .\" commit ef3b9af50bfa6a1f02cd7b3f5124b712b1ba3e3c |
| F2FS (Linux 3.16); |
| .\" commit 50732df02eefb39ab414ef655979c2c9b64ad21c |
| and ubifs (Linux 4.9). |
| .TP |
| .B O_TRUNC |
| If the file already exists and is a regular file and the access mode allows |
| writing (i.e., is |
| .B O_RDWR |
| or |
| .BR O_WRONLY ), |
| it will be truncated to length 0. |
| If the file is a FIFO or terminal device file, the |
| .B O_TRUNC |
| flag is ignored. |
| Otherwise, the effect of |
| .B O_TRUNC |
| is unspecified. |
| .SS creat() |
| A call to |
| .BR creat () |
| is equivalent to calling |
| .BR open () |
| with |
| .I flags |
| equal to |
| .BR O_CREAT|O_WRONLY|O_TRUNC . |
| .SS openat() |
| The |
| .BR openat () |
| system call operates in exactly the same way as |
| .BR open (), |
| except for the differences described here. |
| .PP |
| If the pathname given in |
| .I pathname |
| is relative, then it is interpreted relative to the directory |
| referred to by the file descriptor |
| .I dirfd |
| (rather than relative to the current working directory of |
| the calling process, as is done by |
| .BR open () |
| for a relative pathname). |
| .PP |
| If |
| .I pathname |
| is relative and |
| .I dirfd |
| is the special value |
| .BR AT_FDCWD , |
| then |
| .I pathname |
| is interpreted relative to the current working |
| directory of the calling process (like |
| .BR open ()). |
| .PP |
| If |
| .I pathname |
| is absolute, then |
| .I dirfd |
| is ignored. |
| .SH RETURN VALUE |
| .BR open (), |
| .BR openat (), |
| and |
| .BR creat () |
| return the new file descriptor, or \-1 if an error occurred |
| (in which case, |
| .I errno |
| is set appropriately). |
| .SH ERRORS |
| .BR open (), |
| .BR openat (), |
| and |
| .BR creat () |
| can fail with the following errors: |
| .TP |
| .B EACCES |
| The requested access to the file is not allowed, or search permission |
| is denied for one of the directories in the path prefix of |
| .IR pathname , |
| or the file did not exist yet and write access to the parent directory |
| is not allowed. |
| (See also |
| .BR path_resolution (7).) |
| .TP |
| .B EDQUOT |
| Where |
| .B O_CREAT |
| is specified, the file does not exist, and the user's quota of disk |
| blocks or inodes on the filesystem has been exhausted. |
| .TP |
| .B EEXIST |
| .I pathname |
| already exists and |
| .BR O_CREAT " and " O_EXCL |
| were used. |
| .TP |
| .B EFAULT |
| .I pathname |
| points outside your accessible address space. |
| .TP |
| .B EFBIG |
| See |
| .BR EOVERFLOW . |
| .TP |
| .B EINTR |
| While blocked waiting to complete an open of a slow device |
| (e.g., a FIFO; see |
| .BR fifo (7)), |
| the call was interrupted by a signal handler; see |
| .BR signal (7). |
| .TP |
| .B EINVAL |
| The filesystem does not support the |
| .BR O_DIRECT |
| flag. |
| See |
| .BR NOTES |
| for more information. |
| .TP |
| .B EINVAL |
| Invalid value in |
| .\" In particular, __O_TMPFILE instead of O_TMPFILE |
| .IR flags . |
| .TP |
| .B EINVAL |
| .B O_TMPFILE |
| was specified in |
| .IR flags , |
| but neither |
| .B O_WRONLY |
| nor |
| .B O_RDWR |
| was specified. |
| .TP |
| .B EINVAL |
| .B O_CREAT |
| was specified in |
| .I flags |
| and the final component ("basename") of the new file's |
| .I pathname |
| is invalid |
| (e.g., it contains characters not permitted by the underlying filesystem). |
| .TP |
| .B EISDIR |
| .I pathname |
| refers to a directory and the access requested involved writing |
| (that is, |
| .B O_WRONLY |
| or |
| .B O_RDWR |
| is set). |
| .TP |
| .B EISDIR |
| .I pathname |
| refers to an existing directory, |
| .B O_TMPFILE |
| and one of |
| .B O_WRONLY |
| or |
| .B O_RDWR |
| were specified in |
| .IR flags , |
| but this kernel version does not provide the |
| .B O_TMPFILE |
| functionality. |
| .TP |
| .B ELOOP |
| Too many symbolic links were encountered in resolving |
| .IR pathname . |
| .TP |
| .B ELOOP |
| .I pathname |
| was a symbolic link, and |
| .I flags |
| specified |
| .BR O_NOFOLLOW |
| but not |
| .BR O_PATH . |
| .TP |
| .B EMFILE |
| The per-process limit on the number of open file descriptors has been reached |
| (see the description of |
| .BR RLIMIT_NOFILE |
| in |
| .BR getrlimit (2)). |
| .TP |
| .B ENAMETOOLONG |
| .I pathname |
| was too long. |
| .TP |
| .B ENFILE |
| The system-wide limit on the total number of open files has been reached. |
| .TP |
| .B ENODEV |
| .I pathname |
| refers to a device special file and no corresponding device exists. |
| (This is a Linux kernel bug; in this situation |
| .B ENXIO |
| must be returned.) |
| .TP |
| .B ENOENT |
| .B O_CREAT |
| is not set and the named file does not exist. |
| Or, a directory component in |
| .I pathname |
| does not exist or is a dangling symbolic link. |
| .TP |
| .B ENOENT |
| .I pathname |
| refers to a nonexistent directory, |
| .B O_TMPFILE |
| and one of |
| .B O_WRONLY |
| or |
| .B O_RDWR |
| were specified in |
| .IR flags , |
| but this kernel version does not provide the |
| .B O_TMPFILE |
| functionality. |
| .TP |
| .B ENOMEM |
| The named file is a FIFO, |
| but memory for the FIFO buffer can't be allocated because |
| the per-user hard limit on memory allocation for pipes has been reached |
| and the caller is not privileged; see |
| .BR pipe (7). |
| .TP |
| .B ENOMEM |
| Insufficient kernel memory was available. |
| .TP |
| .B ENOSPC |
| .I pathname |
| was to be created but the device containing |
| .I pathname |
| has no room for the new file. |
| .TP |
| .B ENOTDIR |
| A component used as a directory in |
| .I pathname |
| is not, in fact, a directory, or \fBO_DIRECTORY\fP was specified and |
| .I pathname |
| was not a directory. |
| .TP |
| .B ENXIO |
| .BR O_NONBLOCK " | " O_WRONLY |
| is set, the named file is a FIFO, and |
| no process has the FIFO open for reading. |
| .TP |
| .B ENXIO |
| The file is a device special file and no corresponding device exists. |
| .TP |
| .BR EOPNOTSUPP |
| The filesystem containing |
| .I pathname |
| does not support |
| .BR O_TMPFILE . |
| .TP |
| .B EOVERFLOW |
| .I pathname |
| refers to a regular file that is too large to be opened. |
| The usual scenario here is that an application compiled |
| on a 32-bit platform without |
| .I -D_FILE_OFFSET_BITS=64 |
| tried to open a file whose size exceeds |
| .I (1<<31)-1 |
| bytes; |
| see also |
| .B O_LARGEFILE |
| above. |
| This is the error specified by POSIX.1; |
| in kernels before 2.6.24, Linux gave the error |
| .B EFBIG |
| for this case. |
| .\" See http://bugzilla.kernel.org/show_bug.cgi?id=7253 |
| .\" "Open of a large file on 32-bit fails with EFBIG, should be EOVERFLOW" |
| .\" Reported 2006-10-03 |
| .TP |
| .B EPERM |
| The |
| .B O_NOATIME |
| flag was specified, but the effective user ID of the caller |
| .\" Strictly speaking, it's the filesystem UID... (MTK) |
| did not match the owner of the file and the caller was not privileged. |
| .TP |
| .B EPERM |
| The operation was prevented by a file seal; see |
| .BR fcntl (2). |
| .TP |
| .B EROFS |
| .I pathname |
| refers to a file on a read-only filesystem and write access was |
| requested. |
| .TP |
| .B ETXTBSY |
| .I pathname |
| refers to an executable image which is currently being executed and |
| write access was requested. |
| .TP |
| .B ETXTBSY |
| .I pathname |
| refers to a file that is currently in use as a swap file, and the |
| .B O_TRUNC |
| flag was specified. |
| .TP |
| .B ETXTBSY |
| .I pathname |
| refers to a file that is currently being read by the kernel (e.g., for |
| module/firmware loading), and write access was requested. |
| .TP |
| .B EWOULDBLOCK |
| The |
| .B O_NONBLOCK |
| flag was specified, and an incompatible lease was held on the file |
| (see |
| .BR fcntl (2)). |
| .PP |
| The following additional errors can occur for |
| .BR openat (): |
| .TP |
| .B EBADF |
| .I dirfd |
| is not a valid file descriptor. |
| .TP |
| .B ENOTDIR |
| .I pathname |
| is a relative pathname and |
| .I dirfd |
| is a file descriptor referring to a file other than a directory. |
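| .PP |
| As a minimal sketch of handling the |
| .B EINTR |
| case listed above, a caller might simply repeat the call. |
| The helper name |
| .I open_retry |
| is hypothetical and is not part of any library; |
| whether retrying is appropriate depends on the application. |

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical helper (not a library function): repeat open() for as
   long as it fails because a signal handler interrupted it. */
int
open_retry(const char *pathname, int flags, mode_t mode)
{
    int fd;

    do {
        fd = open(pathname, flags, mode);
    } while (fd == -1 && errno == EINTR);

    return fd;          /* -1 with errno set for any other error */
}
```

| .PP |
| Every other error is passed back unchanged in |
| .IR errno , |
| so the caller can still distinguish the cases described in this section. |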
| .SH VERSIONS |
| .BR openat () |
| was added to Linux in kernel 2.6.16; |
| library support was added to glibc in version 2.4. |
| .SH CONFORMING TO |
| .BR open (), |
| .BR creat (): |
| SVr4, 4.3BSD, POSIX.1-2001, POSIX.1-2008. |
| .PP |
| .BR openat (): |
| POSIX.1-2008. |
| .PP |
| The |
| .BR O_DIRECT , |
| .BR O_NOATIME , |
| .BR O_PATH , |
| and |
| .BR O_TMPFILE |
| flags are Linux-specific. |
| One must define |
| .B _GNU_SOURCE |
| to obtain their definitions. |
| .PP |
| The |
| .BR O_CLOEXEC , |
| .BR O_DIRECTORY , |
| and |
| .BR O_NOFOLLOW |
| flags are not specified in POSIX.1-2001, |
| but are specified in POSIX.1-2008. |
| Since glibc 2.12, one can obtain their definitions by defining either |
| .B _POSIX_C_SOURCE |
| with a value greater than or equal to 200809L or |
| .BR _XOPEN_SOURCE |
| with a value greater than or equal to 700. |
| In glibc 2.11 and earlier, one obtains the definitions by defining |
| .BR _GNU_SOURCE . |
| .PP |
| As noted in |
| .BR feature_test_macros (7), |
| feature test macros such as |
| .BR _POSIX_C_SOURCE , |
| .BR _XOPEN_SOURCE , |
| and |
| .B _GNU_SOURCE |
| must be defined before including |
| .I any |
| header files. |
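| .PP |
| The ordering requirement can be sketched as follows. |
| This is an illustration only: the helper |
| .I open_path_only |
| is a hypothetical name, and the example assumes a glibc system on which |
| .B _GNU_SOURCE |
| exposes the Linux-specific |
| .B O_PATH |
| flag. |

```c
#define _GNU_SOURCE         /* must come before every #include */
#include <fcntl.h>

/* Hypothetical helper: obtain a descriptor that merely identifies a
   location in the filesystem tree, using the Linux-specific O_PATH
   flag together with O_CLOEXEC. */
int
open_path_only(const char *pathname)
{
    return open(pathname, O_PATH | O_CLOEXEC);
}
```

| .PP |
| If the macro definition were moved below the |
| .I #include |
| line, the flag would likely be undeclared and compilation would fail. |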
| .SH NOTES |
| Under Linux, the |
| .B O_NONBLOCK |
| flag indicates that one wants to open the file |
| but does not necessarily intend to read from or write to it. |
| This is typically used to open devices in order to get a file descriptor |
| for use with |
| .BR ioctl (2). |
| .PP |
| The (undefined) effect of |
| .B "O_RDONLY | O_TRUNC" |
| varies among implementations. |
| On many systems the file is actually truncated. |
| .\" Linux 2.0, 2.5: truncate |
| .\" Solaris 5.7, 5.8: truncate |
| .\" Irix 6.5: truncate |
| .\" Tru64 5.1B: truncate |
| .\" HP-UX 11.22: truncate |
| .\" FreeBSD 4.7: truncate |
| .PP |
| Note that |
| .BR open () |
| can open device special files, but |
| .BR creat () |
| cannot create them; use |
| .BR mknod (2) |
| instead. |
| .PP |
| If the file is newly created, its |
| .IR st_atime , |
| .IR st_ctime , |
| .I st_mtime |
| fields |
| (respectively, time of last access, time of last status change, and |
| time of last modification; see |
| .BR stat (2)) |
| are set |
| to the current time, and so are the |
| .I st_ctime |
| and |
| .I st_mtime |
| fields of the |
| parent directory. |
| Otherwise, if the file is modified because of the |
| .B O_TRUNC |
| flag, its |
| .I st_ctime |
| and |
| .I st_mtime |
| fields are set to the current time. |
| .PP |
| The files in the |
| .I /proc/[pid]/fd |
| directory show the open file descriptors of the process with the PID |
| .IR pid . |
| The files in the |
| .I /proc/[pid]/fdinfo |
| directory show even more information about these file descriptors. |
| See |
| .BR proc (5) |
| for further details of both of these directories. |
| .\" |
| .\" |
| .SS Open file descriptions |
| The term "open file description" is the one used by POSIX to refer to the |
| entries in the system-wide table of open files. |
| In other contexts, this object is |
| variously also called an "open file object", |
| a "file handle", an "open file table entry", |
| or\(emin kernel-developer parlance\(ema |
| .IR "struct file" . |
| .PP |
| When a file descriptor is duplicated (using |
| .BR dup (2) |
| or similar), |
| the duplicate refers to the same open file description |
| as the original file descriptor, |
| and the two file descriptors consequently share |
| the file offset and file status flags. |
| Such sharing can also occur between processes: |
| a child process created via |
| .BR fork (2) |
| inherits duplicates of its parent's file descriptors, |
| and those duplicates refer to the same open file descriptions. |
| .PP |
| Each |
| .BR open () |
| of a file creates a new open file description; |
| thus, there may be multiple open file descriptions |
| corresponding to a file inode. |
| .PP |
| On Linux, one can use the |
| .BR kcmp (2) |
| .B KCMP_FILE |
| operation to test whether two file descriptors |
| (in the same process or in two different processes) |
| refer to the same open file description. |
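| .PP |
| The sharing of the file offset can be observed with a small experiment. |
| The following is a sketch only; the function names are hypothetical, |
| the check works just for seekable files, and |
| .BR kcmp (2) |
| remains the reliable test. |

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a seek performed on fd_a is visible
   through fd_b, 0 if it is not, and -1 on error.  The offset of fd_a is
   put back before returning. */
int
shares_offset(int fd_a, int fd_b)
{
    off_t a0, b0, b1;

    a0 = lseek(fd_a, 0, SEEK_CUR);
    b0 = lseek(fd_b, 0, SEEK_CUR);
    if (a0 == -1 || b0 == -1)
        return -1;
    if (a0 != b0)
        return 0;               /* different offsets: not shared */

    if (lseek(fd_a, a0 + 1, SEEK_SET) == -1)
        return -1;
    b1 = lseek(fd_b, 0, SEEK_CUR);
    lseek(fd_a, a0, SEEK_SET);  /* restore the original offset */

    return b1 == a0 + 1;
}

/* Hypothetical demo: returns 0 if, for the existing file pathname, a
   dup(2) duplicate shares the offset while a second open() does not. */
int
ofd_demo(const char *pathname)
{
    int fd, dupfd, fd2, result = -1;

    fd = open(pathname, O_RDONLY);
    if (fd == -1)
        return -1;
    dupfd = dup(fd);                    /* same open file description */
    fd2 = open(pathname, O_RDONLY);     /* new open file description */

    if (dupfd != -1 && fd2 != -1 &&
        shares_offset(fd, dupfd) == 1 && shares_offset(fd, fd2) == 0)
        result = 0;

    if (dupfd != -1)
        close(dupfd);
    if (fd2 != -1)
        close(fd2);
    close(fd);
    return result;
}
```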
| .\" |
| .\" |
| .SS Synchronized I/O |
| The POSIX.1-2008 "synchronized I/O" option |
| specifies different variants of synchronized I/O, |
| and specifies the |
| .BR open () |
| flags |
| .BR O_SYNC , |
| .BR O_DSYNC , |
| and |
| .BR O_RSYNC |
| for controlling the behavior. |
| Regardless of whether an implementation supports this option, |
| it must at least support the use of |
| .BR O_SYNC |
| for regular files. |
| .PP |
| Linux implements |
| .BR O_SYNC |
| and |
| .BR O_DSYNC , |
| but not |
| .BR O_RSYNC . |
| (Somewhat incorrectly, glibc defines |
| .BR O_RSYNC |
| to have the same value as |
| .BR O_SYNC .) |
| .PP |
| .BR O_SYNC |
| provides synchronized I/O |
| .I file |
| integrity completion, |
| meaning write operations will flush data and all associated metadata |
| to the underlying hardware. |
| .BR O_DSYNC |
| provides synchronized I/O |
| .I data |
| integrity completion, |
| meaning write operations will flush data |
| to the underlying hardware, |
| but will only flush metadata updates that are required |
| to allow a subsequent read operation to complete successfully. |
| Data integrity completion can reduce the number of disk operations |
| that are required for applications that don't need the guarantees |
| of file integrity completion. |
| .PP |
| To understand the difference between the two types of completion, |
| consider two pieces of file metadata: |
| the file's last modification timestamp |
| .RI ( st_mtime ) |
| and the file length. |
| All write operations will update the last modification timestamp, |
| but only writes that add data to the end of the |
| file will change the file length. |
| The last modification timestamp is not needed to ensure that |
| a read completes successfully, but the file length is. |
| Thus, |
| .BR O_DSYNC |
| would only guarantee to flush updates to the file length metadata |
| (whereas |
| .BR O_SYNC |
| would also always flush the last modification timestamp metadata). |
| .PP |
| Before Linux 2.6.33, Linux implemented only the |
| .BR O_SYNC |
| flag for |
| .BR open (). |
| However, when that flag was specified, |
| most filesystems actually provided the equivalent of synchronized I/O |
| .I data |
| integrity completion (i.e., |
| .BR O_SYNC |
| was actually implemented as the equivalent of |
| .BR O_DSYNC ). |
| .PP |
| Since Linux 2.6.33, proper |
| .BR O_SYNC |
| support is provided. |
| However, to ensure backward binary compatibility, |
| .BR O_DSYNC |
| was defined with the same value as the historical |
| .BR O_SYNC , |
| and |
| .BR O_SYNC |
| was defined as a new (two-bit) flag value that includes the |
| .BR O_DSYNC |
| flag value. |
| This ensures that applications compiled against |
| new headers get at least |
| .BR O_DSYNC |
| semantics on pre-2.6.33 kernels. |
| .\" |
| .SS C library/kernel differences |
| Since version 2.26, |
| the glibc wrapper function for |
| .BR open () |
| employs the |
| .BR openat () |
| system call, rather than the kernel's |
| .BR open () |
| system call. |
| For certain architectures, this is also true in glibc versions before 2.26. |
| .\" |
| .SS NFS |
| There are many infelicities in the protocol underlying NFS, affecting, |
| among others, |
| .BR O_SYNC " and " O_NDELAY . |
| .PP |
| On NFS filesystems with UID mapping enabled, |
| .BR open () |
| may |
| return a file descriptor but, for example, |
| .BR read (2) |
| requests are denied |
| with \fBEACCES\fP. |
| This is because the client performs |
| .BR open () |
| by checking the |
| permissions, but UID mapping is performed by the server upon |
| read and write requests. |
| .\" |
| .\" |
| .SS FIFOs |
| Opening the read or write end of a FIFO blocks until the other |
| end is also opened (by another process or thread). |
| See |
| .BR fifo (7) |
| for further details. |
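| .PP |
| With |
| .BR O_NONBLOCK , |
| the open does not block. |
| The following sketch, in which the function name is hypothetical, |
| exercises the behavior described for |
| .B ENXIO |
| above and in |
| .BR fifo (7). |

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: creates a FIFO at path and returns 0 if the
   nonblocking opens behave as described, -1 otherwise.  The FIFO is
   removed again before returning. */
int
fifo_nonblock_demo(const char *path)
{
    int rfd, wfd, result = -1;

    if (mkfifo(path, 0600) == -1)
        return -1;

    /* No reader yet: a nonblocking write-only open fails with ENXIO. */
    wfd = open(path, O_WRONLY | O_NONBLOCK);
    if (wfd == -1 && errno == ENXIO) {
        /* A nonblocking read-only open succeeds without a writer. */
        rfd = open(path, O_RDONLY | O_NONBLOCK);
        if (rfd != -1) {
            /* A reader now exists, so the write end can be opened. */
            wfd = open(path, O_WRONLY | O_NONBLOCK);
            if (wfd != -1) {
                result = 0;
                close(wfd);
            }
            close(rfd);
        }
    } else if (wfd != -1) {
        close(wfd);
    }

    unlink(path);
    return result;
}
```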
| .\" |
| .\" |
| .SS File access mode |
| Unlike the other values that can be specified in |
| .IR flags , |
| the |
| .I "access mode" |
| values |
| .BR O_RDONLY ", " O_WRONLY ", and " O_RDWR |
| do not specify individual bits. |
| Rather, they define the low order two bits of |
| .IR flags , |
| and are defined respectively as 0, 1, and 2. |
| In other words, the combination |
| .B "O_RDONLY | O_WRONLY" |
| is a logical error, and certainly does not have the same meaning as |
| .BR O_RDWR . |
| .PP |
| Linux reserves the special, nonstandard access mode 3 (binary 11) in |
| .I flags |
| to mean: |
| check for read and write permission on the file and return a file descriptor |
| that can't be used for reading or writing. |
| This nonstandard access mode is used by some Linux drivers to return a |
| file descriptor that is to be used only for device-specific |
| .BR ioctl (2) |
| operations. |
| .\" See for example util-linux's disk-utils/setfdprm.c |
| .\" For some background on access mode 3, see |
| .\" http://thread.gmane.org/gmane.linux.kernel/653123 |
| .\" "[RFC] correct flags to f_mode conversion in __dentry_open" |
| .\" LKML, 12 Mar 2008 |
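| .PP |
| Because the access mode is a two-bit field rather than a set of flags, |
| it is extracted with the |
| .B O_ACCMODE |
| mask and then compared for equality. |
| A minimal sketch follows; the helper name |
| .I fd_is_writable |
| is hypothetical. |

```c
#include <fcntl.h>

/* Hypothetical helper: returns 1 if fd was opened with O_WRONLY or
   O_RDWR, 0 if it was not, and -1 if the flags cannot be fetched. */
int
fd_is_writable(int fd)
{
    int flags, mode;

    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;

    /* Mask out the access mode, then compare; testing individual
       bits would be wrong because O_RDONLY is 0. */
    mode = flags & O_ACCMODE;
    return mode == O_WRONLY || mode == O_RDWR;
}
```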
| .\" |
| .\" |
| .SS Rationale for openat() and other "directory file descriptor" APIs |
| .BR openat () |
| and the other system calls and library functions that take |
| a directory file descriptor argument |
| (i.e., |
| .BR execveat (2), |
| .BR faccessat (2), |
| .BR fanotify_mark (2), |
| .BR fchmodat (2), |
| .BR fchownat (2), |
| .BR fstatat (2), |
| .BR futimesat (2), |
| .BR linkat (2), |
| .BR mkdirat (2), |
| .BR mknodat (2), |
| .BR name_to_handle_at (2), |
| .BR readlinkat (2), |
| .BR renameat (2), |
| .BR statx (2), |
| .BR symlinkat (2), |
| .BR unlinkat (2), |
| .BR utimensat (2), |
| .BR mkfifoat (3), |
| and |
| .BR scandirat (3)) |
| address two problems with the older interfaces that preceded them. |
| Here, the explanation is in terms of the |
| .BR openat () |
| call, but the rationale is analogous for the other interfaces. |
| .PP |
| First, |
| .BR openat () |
| allows an application to avoid race conditions that could |
| occur when using |
| .BR open () |
| to open files in directories other than the current working directory. |
| These race conditions result from the fact that some component |
| of the directory prefix given to |
| .BR open () |
| could be changed in parallel with the call to |
| .BR open (). |
| Suppose, for example, that we wish to create the file |
| .I dir1/dir2/xxx.dep |
| if the file |
| .I dir1/dir2/xxx |
| exists. |
| The problem is that between the existence check and the file-creation step, |
| .I dir1 |
| or |
| .I dir2 |
| (which might be symbolic links) |
| could be modified to point to a different location. |
| Such races can be avoided by |
| opening a file descriptor for the target directory, |
| and then specifying that file descriptor as the |
| .I dirfd |
| argument of (say) |
| .BR fstatat (2) |
| and |
| .BR openat (). |
| The use of the |
| .I dirfd |
| file descriptor also has other benefits: |
| .IP * 3 |
| the file descriptor is a stable reference to the directory, |
| even if the directory is renamed; and |
| .IP * |
| the open file descriptor prevents the underlying filesystem from |
| being dismounted, |
| just as when a process has a current working directory on a filesystem. |
| .PP |
| Second, |
| .BR openat () |
| allows the implementation of a per-thread "current working |
| directory", via file descriptor(s) maintained by the application. |
| (This functionality can also be obtained by tricks based |
| on the use of |
| .IR /proc/self/fd/ dirfd, |
| but less efficiently.) |
| .\" |
| .\" |
| .SS O_DIRECT |
| .PP |
| The |
| .B O_DIRECT |
| flag may impose alignment restrictions on the length and address |
| of user-space buffers and the file offset of I/Os. |
| On Linux, alignment |
| restrictions vary by filesystem and kernel version and might be |
| absent entirely. |
| However, there is currently no filesystem\-independent |
| interface for an application to discover these restrictions for a given |
| file or filesystem. |
| Some filesystems provide their own interfaces |
| for doing so, for example the |
| .B XFS_IOC_DIOINFO |
| operation in |
| .BR xfsctl (3). |
| .PP |
| Under Linux 2.4, transfer sizes and the alignment of the user buffer |
| and the file offset must all be multiples of the logical block size |
| of the filesystem. |
| Since Linux 2.6.0, alignment to the logical block size of the |
| underlying storage (typically 512 bytes) suffices. |
| The logical block size can be determined using the |
| .BR ioctl (2) |
| .B BLKSSZGET |
| operation or from the shell using the command: |
| .PP |
| .EX |
| blockdev \-\-getss |
| .EE |
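| .PP |
| A minimal sketch of an aligned |
| .B O_DIRECT |
| write follows. |
| The function name and the constant |
| .I ASSUMED_ALIGN |
| are hypothetical; the value 4096 is only an assumption that happens to be |
| a multiple of common logical block sizes, and a careful application |
| would query the real value as described above. |

```c
#define _GNU_SOURCE             /* for O_DIRECT; before every #include */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumption: a multiple of the logical block size of the storage. */
#define ASSUMED_ALIGN 4096

/* Hypothetical helper: write one aligned block to pathname using
   O_DIRECT.  Returns the number of bytes written, or -1 with errno
   set; EINVAL may mean that the filesystem lacks O_DIRECT support or
   that the assumed alignment is insufficient. */
ssize_t
direct_write_block(const char *pathname)
{
    void *buf;
    ssize_t n;
    int fd, err;

    /* The buffer address must be aligned, so malloc() is not enough. */
    err = posix_memalign(&buf, ASSUMED_ALIGN, ASSUMED_ALIGN);
    if (err != 0) {
        errno = err;
        return -1;
    }
    memset(buf, 'x', ASSUMED_ALIGN);

    fd = open(pathname, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd == -1) {
        err = errno;
        free(buf);
        errno = err;
        return -1;
    }

    /* Length, buffer address, and file offset are all aligned. */
    n = pwrite(fd, buf, ASSUMED_ALIGN, 0);

    err = errno;                /* keep errno across the cleanup */
    close(fd);
    free(buf);
    errno = err;
    return n;
}
```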
| .PP |
| .B O_DIRECT |
| I/Os should never be run concurrently with the |
| .BR fork (2) |
| system call, |
| if the memory buffer is a private mapping |
| (i.e., any mapping created with the |
| .BR mmap (2) |
| .BR MAP_PRIVATE |
| flag; |
| this includes memory allocated on the heap and statically allocated buffers). |
| Any such I/Os, whether submitted via an asynchronous I/O interface or from |
| another thread in the process, |
| should be completed before |
| .BR fork (2) |
| is called. |
| Failure to do so can result in data corruption and undefined behavior in |
| parent and child processes. |
| This restriction does not apply when the memory buffer for the |
| .B O_DIRECT |
| I/Os was created using |
| .BR shmat (2) |
| or |
| .BR mmap (2) |
| with the |
| .B MAP_SHARED |
| flag. |
| Nor does this restriction apply when the memory buffer has been advised as |
| .B MADV_DONTFORK |
| with |
| .BR madvise (2), |
| ensuring that it will not be available |
| to the child after |
| .BR fork (2). |
| .PP |
| The |
| .B O_DIRECT |
| flag was introduced in SGI IRIX, where it has alignment |
| restrictions similar to those of Linux 2.4. |
| IRIX also has a |
| .BR fcntl (2) |
| call to query appropriate alignments and sizes. |
| FreeBSD 4.x introduced |
| a flag of the same name, but without alignment restrictions. |
| .PP |
| .B O_DIRECT |
| support was added under Linux in kernel version 2.4.10. |
| Older Linux kernels simply ignore this flag. |
| Some filesystems may not implement the flag, in which case |
| .BR open () |
| fails with the error |
| .B EINVAL |
| if it is used. |
| .PP |
| Applications should avoid mixing |
| .B O_DIRECT |
| and normal I/O to the same file, |
| and especially to overlapping byte regions in the same file. |
| Even when the filesystem correctly handles the coherency issues in |
| this situation, overall I/O throughput is likely to be slower than |
| using either mode alone. |
| Likewise, applications should avoid mixing |
| .BR mmap (2) |
| of files with direct I/O to the same files. |
| .PP |
| The behavior of |
| .B O_DIRECT |
| with NFS differs from its behavior on local filesystems. |
| Older kernels, or |
| kernels configured in certain ways, may not support this combination. |
| The NFS protocol does not support passing the flag to the server, so |
| .B O_DIRECT |
| I/O will bypass the page cache only on the client; the server may |
| still cache the I/O. |
| The client asks the server to make the I/O |
| synchronous to preserve the synchronous semantics of |
| .BR O_DIRECT . |
| Some servers will perform poorly under these circumstances, especially |
| if the I/O size is small. |
| Some servers may also be configured to |
| lie to clients about the I/O having reached stable storage; this |
| will avoid the performance penalty at some risk to data integrity |
| in the event of server power failure. |
| The Linux NFS client places no alignment restrictions on |
| .B O_DIRECT |
| I/O. |
| .PP |
| In summary, |
| .B O_DIRECT |
| is a potentially powerful tool that should be used with caution. |
| It is recommended that applications treat use of |
| .B O_DIRECT |
| as a performance option which is disabled by default. |
| .PP |
| .RS |
| "The thing that has always disturbed me about O_DIRECT is that the whole |
| interface is just stupid, and was probably designed by a deranged monkey |
| on some serious mind-controlling substances."\(emLinus |
| .RE |
| .SH BUGS |
| Currently, it is not possible to enable signal-driven |
| I/O by specifying |
| .B O_ASYNC |
| when calling |
| .BR open (); |
| use |
| .BR fcntl (2) |
| to enable this flag. |
| .\" FIXME . Check bugzilla report on open(O_ASYNC) |
| .\" See http://bugzilla.kernel.org/show_bug.cgi?id=5993 |
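| .PP |
| The workaround can be sketched as follows. |
| The helper name |
| .I enable_async |
| is hypothetical, and directing the signal to the calling process with |
| .B F_SETOWN |
| is one possible choice rather than a requirement. |

```c
#define _GNU_SOURCE             /* exposes O_ASYNC on glibc */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: enable signal-driven I/O on a descriptor that
   is already open.  Returns 0 on success, -1 with errno set. */
int
enable_async(int fd)
{
    int flags;

    /* Deliver the I/O signal to this process. */
    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;

    /* Add O_ASYNC to the existing file status flags. */
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_ASYNC) == -1)
        return -1;

    return 0;
}
```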
| .PP |
| One must check for two different error codes, |
| .B EISDIR |
| and |
| .BR ENOENT , |
| when trying to determine whether the kernel supports |
| .B O_TMPFILE |
| functionality. |
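| .PP |
| A probe along these lines might look as follows. |
| This is a sketch only: the function name is hypothetical, |
| and the treatment of |
| .B EOPNOTSUPP |
| as "unsupported" follows the ERRORS section above. |
| The directory passed in should be known to exist, because |
| .B ENOENT |
| is otherwise ambiguous. |

```c
#define _GNU_SOURCE             /* for O_TMPFILE; before every #include */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if an unnamed temporary file could be
   created in dir, 0 if the kernel or the filesystem appears to lack
   O_TMPFILE support, and -1 for any other error. */
int
tmpfile_supported(const char *dir)
{
    int fd;

    fd = open(dir, O_TMPFILE | O_RDWR, 0600);
    if (fd != -1) {
        close(fd);
        return 1;
    }

    switch (errno) {
    case EISDIR:        /* old kernel, dir exists */
    case ENOENT:        /* old kernel, dir missing (or simply missing) */
    case EOPNOTSUPP:    /* filesystem does not support O_TMPFILE */
        return 0;
    default:
        return -1;
    }
}
```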
| .PP |
| When both |
| .B O_CREAT |
| and |
| .B O_DIRECTORY |
| are specified in |
| .IR flags |
| and the file specified by |
| .I pathname |
| does not exist, |
| .BR open () |
| will create a regular file (i.e., |
| .B O_DIRECTORY |
| is ignored). |
| .SH SEE ALSO |
| .BR chmod (2), |
| .BR chown (2), |
| .BR close (2), |
| .BR dup (2), |
| .BR fcntl (2), |
| .BR link (2), |
| .BR lseek (2), |
| .BR mknod (2), |
| .BR mmap (2), |
| .BR mount (2), |
| .BR open_by_handle_at (2), |
| .BR read (2), |
| .BR socket (2), |
| .BR stat (2), |
| .BR umask (2), |
| .BR unlink (2), |
| .BR write (2), |
| .BR fopen (3), |
| .BR acl (5), |
| .BR fifo (7), |
| .BR inode (7), |
| .BR path_resolution (7), |
| .BR symlink (7) |