|  | =================================== | 
|  | NT synchronization primitive driver | 
|  | =================================== | 
|  |  | 
|  | This page documents the user-space API for the ntsync driver. | 
|  |  | 
|  | ntsync is a support driver for emulation of NT synchronization | 
|  | primitives by user-space NT emulators. It exists because implementation | 
|  | in user-space, using existing tools, cannot match Windows performance | 
|  | while offering accurate semantics. It is implemented entirely in | 
|  | software, and does not drive any hardware device. | 
|  |  | 
|  | This interface is meant as a compatibility tool only, and should not | 
|  | be used for general synchronization. Instead use generic, versatile | 
|  | interfaces such as futex(2) and poll(2). | 
|  |  | 
|  | Synchronization primitives | 
|  | ========================== | 
|  |  | 
|  | The ntsync driver exposes three types of synchronization primitives: | 
|  | semaphores, mutexes, and events. | 
|  |  | 
|  | A semaphore holds a single volatile 32-bit counter, and a static 32-bit | 
|  | integer denoting the maximum value. It is considered signaled (that is, | 
|  | can be acquired without contention, or will wake up a waiting thread) | 
|  | when the counter is nonzero. The counter is decremented by one when a | 
|  | wait is satisfied. Both the initial and maximum count are established | 
|  | when the semaphore is created. | 
|  |  | 
|  | A mutex holds a volatile 32-bit recursion count, and a volatile 32-bit | 
|  | identifier denoting its owner. A mutex is considered signaled when its | 
|  | owner is zero (indicating that it is not owned). The recursion count is | 
|  | incremented when a wait is satisfied, and ownership is set to the given | 
|  | identifier. | 
|  |  | 
|  | A mutex also holds an internal flag denoting whether its previous owner | 
|  | has died; such a mutex is said to be abandoned. Owner death is not | 
|  | tracked automatically based on thread death, but rather must be | 
|  | communicated using ``NTSYNC_IOC_MUTEX_KILL``. An abandoned mutex is | 
|  | inherently considered unowned. | 
|  |  | 
|  | Except for the "unowned" semantics of zero, the actual value of the | 
|  | owner identifier is not interpreted by the ntsync driver at all. The | 
|  | intended use is to store a thread identifier; however, the ntsync | 
|  | driver does not actually validate that a calling thread provides | 
|  | consistent or unique identifiers. | 
|  |  | 
|  | An event is similar to a semaphore with a maximum count of one. It holds | 
|  | a volatile boolean state denoting whether it is signaled or not. There | 
|  | are two types of events, auto-reset and manual-reset. An auto-reset | 
|  | event is designaled when a wait is satisfied; a manual-reset event is | 
|  | not. The event type is specified when the event is created. | 
|  |  | 
|  | Unless specified otherwise, all operations on an object are atomic and | 
|  | totally ordered with respect to other operations on the same object. | 
|  |  | 
|  | Objects are represented by files. When all file descriptors to an | 
|  | object are closed, that object is deleted. | 
|  |  | 
|  | Char device | 
|  | =========== | 
|  |  | 
|  | The ntsync driver creates a single char device /dev/ntsync. Each file | 
|  | description opened on the device represents a unique instance intended | 
|  | to back an individual NT virtual machine. Objects created by one ntsync | 
|  | instance may only be used with other objects created by the same | 
|  | instance. | 
|  |  | 
|  | ioctl reference | 
|  | =============== | 
|  |  | 
|  | All operations on the device are done through ioctls. There are four | 
|  | structures used in ioctl calls:: | 
|  |  | 
|  | struct ntsync_sem_args { | 
|  | __u32 count; | 
|  | __u32 max; | 
|  | }; | 
|  |  | 
|  | struct ntsync_mutex_args { | 
|  | __u32 owner; | 
|  | __u32 count; | 
|  | }; | 
|  |  | 
|  | struct ntsync_event_args { | 
|  | __u32 signaled; | 
|  | __u32 manual; | 
|  | }; | 
|  |  | 
|  | struct ntsync_wait_args { | 
|  | __u64 timeout; | 
|  | __u64 objs; | 
|  | __u32 count; | 
|  | __u32 owner; | 
|  | __u32 index; | 
|  | __u32 alert; | 
|  | __u32 flags; | 
|  | __u32 pad; | 
|  | }; | 
|  |  | 
|  | Depending on the ioctl, members of the structure may be used as input, | 
|  | output, or not at all. | 
|  |  | 
|  | The ioctls on the device file are as follows: | 
|  |  | 
|  | .. c:macro:: NTSYNC_IOC_CREATE_SEM | 
|  |  | 
|  | Create a semaphore object. Takes a pointer to struct | 
|  | :c:type:`ntsync_sem_args`, which is used as follows: | 
|  |  | 
|  | .. list-table:: | 
|  |  | 
|  | * - ``count`` | 
|  | - Initial count of the semaphore. | 
|  | * - ``max`` | 
|  | - Maximum count of the semaphore. | 
|  |  | 
|  | Fails with ``EINVAL`` if ``count`` is greater than ``max``. | 
|  | On success, returns a file descriptor the created semaphore. | 
|  |  | 
|  | .. c:macro:: NTSYNC_IOC_CREATE_MUTEX | 
|  |  | 
|  | Create a mutex object. Takes a pointer to struct | 
|  | :c:type:`ntsync_mutex_args`, which is used as follows: | 
|  |  | 
|  | .. list-table:: | 
|  |  | 
|  | * - ``count`` | 
|  | - Initial recursion count of the mutex. | 
|  | * - ``owner`` | 
|  | - Initial owner of the mutex. | 
|  |  | 
|  | If ``owner`` is nonzero and ``count`` is zero, or if ``owner`` is | 
|  | zero and ``count`` is nonzero, the function fails with ``EINVAL``. | 
|  | On success, returns a file descriptor the created mutex. | 
|  |  | 
|  | .. c:macro:: NTSYNC_IOC_CREATE_EVENT | 
|  |  | 
|  | Create an event object. Takes a pointer to struct | 
|  | :c:type:`ntsync_event_args`, which is used as follows: | 
|  |  | 
|  | .. list-table:: | 
|  |  | 
|  | * - ``signaled`` | 
|  | - If nonzero, the event is initially signaled, otherwise | 
|  | nonsignaled. | 
|  | * - ``manual`` | 
|  | - If nonzero, the event is a manual-reset event, otherwise | 
|  | auto-reset. | 
|  |  | 
|  | On success, returns a file descriptor the created event. | 
|  |  | 
|  | The ioctls on the individual objects are as follows: | 
|  |  | 
|  | .. c:macro:: NTSYNC_IOC_SEM_POST | 
|  |  | 
|  | Post to a semaphore object. Takes a pointer to a 32-bit integer, | 
|  | which on input holds the count to be added to the semaphore, and on | 
|  | output contains its previous count. | 
|  |  | 
|  | If adding to the semaphore's current count would raise the latter | 
|  | past the semaphore's maximum count, the ioctl fails with | 
|  | ``EOVERFLOW`` and the semaphore is not affected. If raising the | 
|  | semaphore's count causes it to become signaled, eligible threads | 
|  | waiting on this semaphore will be woken and the semaphore's count | 
|  | decremented appropriately. | 
|  |  | 
|  | .. c:macro:: NTSYNC_IOC_MUTEX_UNLOCK | 
|  |  | 
|  | Release a mutex object. Takes a pointer to struct | 
|  | :c:type:`ntsync_mutex_args`, which is used as follows: | 
|  |  | 
|  | .. list-table:: | 
|  |  | 
|  | * - ``owner`` | 
|  | - Specifies the owner trying to release this mutex. | 
|  | * - ``count`` | 
|  | - On output, contains the previous recursion count. | 
|  |  | 
|  | If ``owner`` is zero, the ioctl fails with ``EINVAL``. If ``owner`` | 
|  | is not the current owner of the mutex, the ioctl fails with | 
|  | ``EPERM``. | 
|  |  | 
|  | The mutex's count will be decremented by one. If decrementing the | 
|  | mutex's count causes it to become zero, the mutex is marked as | 
|  | unowned and signaled, and eligible threads waiting on it will be | 
|  | woken as appropriate. | 
|  |  | 
|  | .. c:macro:: NTSYNC_IOC_SET_EVENT | 
|  |  | 
|  | Signal an event object. Takes a pointer to a 32-bit integer, which on | 
|  | output contains the previous state of the event. | 
|  |  | 
|  | Eligible threads will be woken, and auto-reset events will be | 
|  | designaled appropriately. | 
|  |  | 
|  | .. c:macro:: NTSYNC_IOC_RESET_EVENT | 
|  |  | 
|  | Designal an event object. Takes a pointer to a 32-bit integer, which | 
|  | on output contains the previous state of the event. | 
|  |  | 
|  | .. c:macro:: NTSYNC_IOC_PULSE_EVENT | 
|  |  | 
|  | Wake threads waiting on an event object while leaving it in an | 
|  | unsignaled state. Takes a pointer to a 32-bit integer, which on | 
|  | output contains the previous state of the event. | 
|  |  | 
|  | A pulse operation can be thought of as a set followed by a reset, | 
|  | performed as a single atomic operation. If two threads are waiting on | 
|  | an auto-reset event which is pulsed, only one will be woken. If two | 
|  | threads are waiting a manual-reset event which is pulsed, both will | 
|  | be woken. However, in both cases, the event will be unsignaled | 
|  | afterwards, and a simultaneous read operation will always report the | 
|  | event as unsignaled. | 
|  |  | 
|  | .. c:macro:: NTSYNC_IOC_READ_SEM | 
|  |  | 
|  | Read the current state of a semaphore object. Takes a pointer to | 
|  | struct :c:type:`ntsync_sem_args`, which is used as follows: | 
|  |  | 
|  | .. list-table:: | 
|  |  | 
|  | * - ``count`` | 
|  | - On output, contains the current count of the semaphore. | 
|  | * - ``max`` | 
|  | - On output, contains the maximum count of the semaphore. | 
|  |  | 
|  | .. c:macro:: NTSYNC_IOC_READ_MUTEX | 
|  |  | 
|  | Read the current state of a mutex object. Takes a pointer to struct | 
|  | :c:type:`ntsync_mutex_args`, which is used as follows: | 
|  |  | 
|  | .. list-table:: | 
|  |  | 
|  | * - ``owner`` | 
|  | - On output, contains the current owner of the mutex, or zero | 
|  | if the mutex is not currently owned. | 
|  | * - ``count`` | 
|  | - On output, contains the current recursion count of the mutex. | 
|  |  | 
|  | If the mutex is marked as abandoned, the function fails with | 
|  | ``EOWNERDEAD``. In this case, ``count`` and ``owner`` are set to | 
|  | zero. | 
|  |  | 
|  | .. c:macro:: NTSYNC_IOC_READ_EVENT | 
|  |  | 
|  | Read the current state of an event object. Takes a pointer to struct | 
|  | :c:type:`ntsync_event_args`, which is used as follows: | 
|  |  | 
|  | .. list-table:: | 
|  |  | 
|  | * - ``signaled`` | 
|  | - On output, contains the current state of the event. | 
|  | * - ``manual`` | 
|  | - On output, contains 1 if the event is a manual-reset event, | 
|  | and 0 otherwise. | 
|  |  | 
|  | .. c:macro:: NTSYNC_IOC_KILL_OWNER | 
|  |  | 
|  | Mark a mutex as unowned and abandoned if it is owned by the given | 
|  | owner. Takes an input-only pointer to a 32-bit integer denoting the | 
|  | owner. If the owner is zero, the ioctl fails with ``EINVAL``. If the | 
|  | owner does not own the mutex, the function fails with ``EPERM``. | 
|  |  | 
|  | Eligible threads waiting on the mutex will be woken as appropriate | 
|  | (and such waits will fail with ``EOWNERDEAD``, as described below). | 
|  |  | 
|  | .. c:macro:: NTSYNC_IOC_WAIT_ANY | 
|  |  | 
|  | Poll on any of a list of objects, atomically acquiring at most one. | 
|  | Takes a pointer to struct :c:type:`ntsync_wait_args`, which is | 
|  | used as follows: | 
|  |  | 
|  | .. list-table:: | 
|  |  | 
|  | * - ``timeout`` | 
|  | - Absolute timeout in nanoseconds. If ``NTSYNC_WAIT_REALTIME`` | 
|  | is set, the timeout is measured against the REALTIME clock; | 
|  | otherwise it is measured against the MONOTONIC clock. If the | 
|  | timeout is equal to or earlier than the current time, the | 
|  | function returns immediately without sleeping. If ``timeout`` | 
|  | is U64_MAX, the function will sleep until an object is | 
|  | signaled, and will not fail with ``ETIMEDOUT``. | 
|  | * - ``objs`` | 
|  | - Pointer to an array of ``count`` file descriptors | 
|  | (specified as an integer so that the structure has the same | 
|  | size regardless of architecture). If any object is | 
|  | invalid, the function fails with ``EINVAL``. | 
|  | * - ``count`` | 
|  | - Number of objects specified in the ``objs`` array. | 
|  | If greater than ``NTSYNC_MAX_WAIT_COUNT``, the function fails | 
|  | with ``EINVAL``. | 
|  | * - ``owner`` | 
|  | - Mutex owner identifier. If any object in ``objs`` is a mutex, | 
|  | the ioctl will attempt to acquire that mutex on behalf of | 
|  | ``owner``. If ``owner`` is zero, the ioctl fails with | 
|  | ``EINVAL``. | 
|  | * - ``index`` | 
|  | - On success, contains the index (into ``objs``) of the object | 
|  | which was signaled. If ``alert`` was signaled instead, | 
|  | this contains ``count``. | 
|  | * - ``alert`` | 
|  | - Optional event object file descriptor. If nonzero, this | 
|  | specifies an "alert" event object which, if signaled, will | 
|  | terminate the wait. If nonzero, the identifier must point to a | 
|  | valid event. | 
|  | * - ``flags`` | 
|  | - Zero or more flags. Currently the only flag is | 
|  | ``NTSYNC_WAIT_REALTIME``, which causes the timeout to be | 
|  | measured against the REALTIME clock instead of MONOTONIC. | 
|  | * - ``pad`` | 
|  | - Unused, must be set to zero. | 
|  |  | 
|  | This function attempts to acquire one of the given objects. If unable | 
|  | to do so, it sleeps until an object becomes signaled, subsequently | 
|  | acquiring it, or the timeout expires. In the latter case the ioctl | 
|  | fails with ``ETIMEDOUT``. The function only acquires one object, even | 
|  | if multiple objects are signaled. | 
|  |  | 
|  | A semaphore is considered to be signaled if its count is nonzero, and | 
|  | is acquired by decrementing its count by one. A mutex is considered | 
|  | to be signaled if it is unowned or if its owner matches the ``owner`` | 
|  | argument, and is acquired by incrementing its recursion count by one | 
|  | and setting its owner to the ``owner`` argument. An auto-reset event | 
|  | is acquired by designaling it; a manual-reset event is not affected | 
|  | by acquisition. | 
|  |  | 
|  | Acquisition is atomic and totally ordered with respect to other | 
|  | operations on the same object. If two wait operations (with different | 
|  | ``owner`` identifiers) are queued on the same mutex, only one is | 
|  | signaled. If two wait operations are queued on the same semaphore, | 
|  | and a value of one is posted to it, only one is signaled. | 
|  |  | 
|  | If an abandoned mutex is acquired, the ioctl fails with | 
|  | ``EOWNERDEAD``. Although this is a failure return, the function may | 
|  | otherwise be considered successful. The mutex is marked as owned by | 
|  | the given owner (with a recursion count of 1) and as no longer | 
|  | abandoned, and ``index`` is still set to the index of the mutex. | 
|  |  | 
|  | The ``alert`` argument is an "extra" event which can terminate the | 
|  | wait, independently of all other objects. | 
|  |  | 
|  | It is valid to pass the same object more than once, including by | 
|  | passing the same event in the ``objs`` array and in ``alert``. If a | 
|  | wakeup occurs due to that object being signaled, ``index`` is set to | 
|  | the lowest index corresponding to that object. | 
|  |  | 
|  | The function may fail with ``EINTR`` if a signal is received. | 
|  |  | 
|  | .. c:macro:: NTSYNC_IOC_WAIT_ALL | 
|  |  | 
|  | Poll on a list of objects, atomically acquiring all of them. Takes a | 
|  | pointer to struct :c:type:`ntsync_wait_args`, which is used | 
|  | identically to ``NTSYNC_IOC_WAIT_ANY``, except that ``index`` is | 
|  | always filled with zero on success if not woken via alert. | 
|  |  | 
|  | This function attempts to simultaneously acquire all of the given | 
|  | objects. If unable to do so, it sleeps until all objects become | 
|  | simultaneously signaled, subsequently acquiring them, or the timeout | 
|  | expires. In the latter case the ioctl fails with ``ETIMEDOUT`` and no | 
|  | objects are modified. | 
|  |  | 
|  | Objects may become signaled and subsequently designaled (through | 
|  | acquisition by other threads) while this thread is sleeping. Only | 
|  | once all objects are simultaneously signaled does the ioctl acquire | 
|  | them and return. The entire acquisition is atomic and totally ordered | 
|  | with respect to other operations on any of the given objects. | 
|  |  | 
|  | If an abandoned mutex is acquired, the ioctl fails with | 
|  | ``EOWNERDEAD``. Similarly to ``NTSYNC_IOC_WAIT_ANY``, all objects are | 
|  | nevertheless marked as acquired. Note that if multiple mutex objects | 
|  | are specified, there is no way to know which were marked as | 
|  | abandoned. | 
|  |  | 
|  | As with "any" waits, the ``alert`` argument is an "extra" event which | 
|  | can terminate the wait. Critically, however, an "all" wait will | 
|  | succeed if all members in ``objs`` are signaled, *or* if ``alert`` is | 
|  | signaled. In the latter case ``index`` will be set to ``count``. As | 
|  | with "any" waits, if both conditions are filled, the former takes | 
|  | priority, and objects in ``objs`` will be acquired. | 
|  |  | 
|  | Unlike ``NTSYNC_IOC_WAIT_ANY``, it is not valid to pass the same | 
|  | object more than once, nor is it valid to pass the same object in | 
|  | ``objs`` and in ``alert``. If this is attempted, the function fails | 
|  | with ``EINVAL``. |