.. SPDX-License-Identifier: GPL-2.0

===================================
File management in the Linux kernel
===================================

This document describes how locking works for files (struct file)
and for the file descriptor table (struct files_struct).

Up until 2.6.12, the file descriptor table was protected
with a lock (files->file_lock) and a reference count (files->count).
->file_lock protected accesses to all the file-related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with the CLONE_FILES flag, which is
typically the case for POSIX threads. As with the common
refcounting model in the kernel, the last task doing
a put_files_struct() frees the file descriptor (fd) table.
The files (struct file) themselves are protected by a
reference count (->f_count).

In the newer lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements: the fd sets (open_fds and close_on_exec), the
array of file pointers, the sizes of the sets and the array,
etc. In order for updates to appear atomic to
a lock-free reader, all the elements of the file descriptor
table are kept in a separate structure, struct fdtable.
files_struct contains a pointer to struct fdtable (files->fdt)
through which the actual fd table is accessed. Initially the
fdtable is embedded in files_struct itself (files->fdtab) and
files->fdt points to it. On a subsequent expansion, a new
fdtable structure is allocated and files->fdt is switched to
point to it. The fdtable structure is freed with RCU, so
lock-free readers either see the old fdtable or the new one,
which makes the update appear atomic. Here are the locking
rules for the fdtable structure:

1. All references to the fdtable must be done through
   the files_fdtable() macro::

	struct fdtable *fdt;

	rcu_read_lock();

	fdt = files_fdtable(files);
	....
	if (n < fdt->max_fds)	/* valid fds are 0 .. max_fds - 1 */
		....
	...
	rcu_read_unlock();

   files_fdtable() uses the rcu_dereference() macro, which takes care of
   the memory barrier requirements for lock-free dereference.
   The fdtable pointer must be read within the read-side
   critical section.

2. Reading of the fdtable as described above must be protected
   by rcu_read_lock()/rcu_read_unlock().

3. For any update to the fd table, files->file_lock must
   be held.

4. To look up the file structure given an fd, a reader
   must use either the lookup_fdget_rcu() or the files_lookup_fdget_rcu()
   API. These take care of barrier requirements due to lock-free lookup.

   An example::

	struct file *file;

	rcu_read_lock();
	file = lookup_fdget_rcu(fd);
	rcu_read_unlock();
	if (file) {
		/* the lookup took a reference, so file is safe to use */
		...
		fput(file);	/* drop that reference when done */
	}
	....

5. Since both fdtable and file structures can be looked up
   lock-free, they must be installed using the rcu_assign_pointer()
   API. If they are looked up lock-free, rcu_dereference()
   must be used. However, it is advisable to use files_fdtable()
   and lookup_fdget_rcu()/files_lookup_fdget_rcu(), which take care
   of these issues.

6. While updating, the fdtable pointer must be looked up while
   holding files->file_lock. If ->file_lock is dropped, then
   another thread can expand the files, thereby creating a new
   fdtable and making the earlier fdtable pointer stale.

   For example::

	spin_lock(&files->file_lock);
	fd = locate_fd(files, file, start);
	if (fd >= 0) {
		/* locate_fd() may have expanded fdtable, load the ptr */
		fdt = files_fdtable(files);
		__set_open_fd(fd, fdt);
		__clear_close_on_exec(fd, fdt);
	}
	spin_unlock(&files->file_lock);
	.....

   Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
   the fdtable pointer (fdt) must be loaded after locate_fd().

On newer kernels, RCU-based file lookup has been switched to rely on
SLAB_TYPESAFE_BY_RCU instead of call_rcu(). It is no longer sufficient
to just acquire a reference to the file in question under RCU using
atomic_long_inc_not_zero(), since the file might have already been
recycled and someone else might have bumped the reference. In other
words, callers might see reference count bumps from newer users. For
this reason, it is necessary to verify that the pointer is the same
before and after the reference count increment, and to drop the
reference again if it is not. This pattern can be seen in
get_file_rcu() and __files_get_rcu().

In addition, it isn't possible to access or check fields in struct file
without first acquiring a reference on it under an RCU lookup. Not doing
that was always very dodgy, and it was only usable for non-pointer data
in struct file. With SLAB_TYPESAFE_BY_RCU, callers must either acquire
a reference first or hold ->file_lock of the files_struct that owns
the fdtable.