futex: Add support for private attached futexes
This patch adds support for the futex OP FUTEX_ATTACHED which can only
be used together with FUTEX_PRIVATE_FLAG (i.e. without FLAGS_SHARED):
it is limited to private FUTEXes.
This FUTEX_ATTACHED flag can not be made default because it changes the
ATTACHED futex, usage howto:
- before usage it needs to be `attached' to initialize the in-kernel
	cookie = sys_futex(&mutex->__data.__lock,
			   FUTEX_ATTACH | FUTEX_PRIVATE_FLAG,
			   0, 0, 0, 0);
The return value is either <0 for an error or >= 0, in which case it is
the `cookie' which should be used for further operations.
- any operation on this FUTEX should use the `cookie', for example the
	ret = sys_futex((void *)(unsigned long)cookie,
			FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG | FUTEX_ATTACHED,
			0, 0, 0, 0);
The return value is <0 for an error and 0 for success.
- once the lock is considered removed, FUTEX_DETACH should be invoked
in order to remove the in-kernel state for the FUTEX. The return value
is 0 for success and <0 for failure. A FUTEX cannot be detached if
there is an operation pending, i.e. a LOCK_PI which did not
The struct mm_struct is extended by struct futex_cache. This struct holds
the `slots' array of struct futex_cache_slot. Each entry is deployed
after a `FUTEX_ATTACH' operation and holds a pointer to struct
futex_state. The array is extended on demand (never shrunk) and RCU
protected.
In the cache_map bitmap, a bit is set if the corresponding `slots' entry
is in use. The size is limited to 4096 bits, which means there cannot be
more than 4096 FUTEXes per process attached / in use.
Size in bits of the currently deployed slots member.
A lock which is taken in the slowpath when extending the slots member
and on removal of the fs members.
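
To illustrate the bookkeeping described above, here is a minimal
single-threaded userspace model, not the patch's code. The member names
slots, cache_map and fs come from this changelog; cache_size, the initial
size of 16, the doubling policy and the -ENOSPC return value are
assumptions made for the sketch, and the lock and the RCU handling are
left out.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define FUTEX_CACHE_MAX_BITS 4096	/* limit stated in the changelog */
#define FUTEX_CACHE_MIN_BITS 16		/* assumed initial size */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct futex_state { int users; };	/* placeholder for the real state */

struct futex_cache_slot {
	struct futex_state *fs;		/* per-FUTEX state, NULL if unused */
};

struct futex_cache {
	struct futex_cache_slot *slots;	/* grown on demand, never shrunk */
	unsigned long cache_map[FUTEX_CACHE_MAX_BITS / BITS_PER_WORD];
	unsigned int cache_size;	/* assumed name: deployed slots */
};

/* Double the slots array up to the 4096 limit (growth policy assumed). */
int cache_grow(struct futex_cache *fc)
{
	unsigned int new_size = fc->cache_size ?
		fc->cache_size * 2 : FUTEX_CACHE_MIN_BITS;
	struct futex_cache_slot *n;

	if (fc->cache_size >= FUTEX_CACHE_MAX_BITS)
		return -ENOSPC;		/* assumed error code */
	if (new_size > FUTEX_CACHE_MAX_BITS)
		new_size = FUTEX_CACHE_MAX_BITS;
	n = calloc(new_size, sizeof(*n));
	if (!n)
		return -ENOMEM;
	if (fc->slots)
		memcpy(n, fc->slots, fc->cache_size * sizeof(*n));
	free(fc->slots);	/* the kernel would defer this past RCU readers */
	fc->slots = n;
	fc->cache_size = new_size;
	return 0;
}

/* Claim the first free slot; returns the slot number or a negative errno. */
int cache_attach(struct futex_cache *fc, struct futex_state *fs)
{
	unsigned int i;
	int ret;

	for (;;) {
		for (i = 0; i < fc->cache_size; i++) {
			unsigned long bit = 1UL << (i % BITS_PER_WORD);

			if (!(fc->cache_map[i / BITS_PER_WORD] & bit)) {
				fc->slots[i].fs = fs;
				fc->cache_map[i / BITS_PER_WORD] |= bit;
				return (int)i;
			}
		}
		ret = cache_grow(fc);	/* no free slot: extend and retry */
		if (ret)
			return ret;
	}
}

/* Release a slot so that a later attach can reuse it. */
void cache_detach(struct futex_cache *fc, unsigned int slot)
{
	assert(slot < fc->cache_size);
	fc->cache_map[slot / BITS_PER_WORD] &= ~(1UL << (slot % BITS_PER_WORD));
	fc->slots[slot].fs = NULL;
}
```

In this model the value returned by cache_attach() plays the role of the
cookie, and a detached slot number is handed out again by the next attach.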
On each `FUTEX_ATTACH' operation an in-kernel state of the userland
FUTEX is allocated: futex_state. This state contains a dedicated
futex_hash_bucket which is used exclusively for the lock. This avoids
lock contention on the global futex_hash_bucket, which means two
different locks never share the same futex_hash_bucket. Also the memory
for the in-kernel state is allocated on the current NUMA node, which
should reduce cross-NUMA memory accesses to the futex_hash_bucket.
The global futex_hash_bucket is used to ensure that a FUTEX is only
enqueued once. A second FUTEX_ATTACH operation on the same uaddr will
fail because it already exists in the global futex_hash_bucket.
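
The uniqueness check can be pictured with the following toy model. It is
only a sketch: the table size, the fixed bucket depth, the hash function
and the -EEXIST / -ENOMEM return values are assumptions for illustration,
and the changelog does not say which error the kernel returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_BUCKETS 8	/* hypothetical table size */
#define BUCKET_DEPTH 4	/* hypothetical; the kernel uses lists instead */

struct global_hash {
	const void *uaddr[HASH_BUCKETS][BUCKET_DEPTH];
};

static unsigned int hash_uaddr(const void *uaddr)
{
	/* Toy hash, not the kernel's. */
	return (unsigned int)(((uintptr_t)uaddr >> 2) % HASH_BUCKETS);
}

void global_hash_init(struct global_hash *gh)
{
	memset(gh, 0, sizeof(*gh));
}

/* Returns 0 on success, -EEXIST if uaddr is already attached (assumed). */
int global_hash_attach(struct global_hash *gh, const void *uaddr)
{
	const void **b = gh->uaddr[hash_uaddr(uaddr)];
	int i, free_idx = -1;

	assert(uaddr != NULL);
	for (i = 0; i < BUCKET_DEPTH; i++) {
		if (b[i] == uaddr)
			return -EEXIST;	/* second attach on the same uaddr */
		if (!b[i] && free_idx < 0)
			free_idx = i;
	}
	if (free_idx < 0)
		return -ENOMEM;	/* toy bucket is full */
	b[free_idx] = uaddr;
	return 0;
}

/* Forget uaddr so that it can be attached again later. */
void global_hash_detach(struct global_hash *gh, const void *uaddr)
{
	const void **b = gh->uaddr[hash_uaddr(uaddr)];
	int i;

	for (i = 0; i < BUCKET_DEPTH; i++)
		if (b[i] == uaddr)
			b[i] = NULL;
}
```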
Upon a `FUTEX_ATTACH' operation the slot number of the ->slots array
which holds the in-kernel state is returned. This number is used in the
following FUTEX operations, e.g. FUTEX_LOCK_PI. In the hotpath, the
cache_map is checked to see if the array member is deployed. The slots
array and the fs member are dereferenced within an RCU read section. This
avoids holding any locks in the hotpath. The futex_state has a `users'
reference counter. A value of zero means that the structure exists
within this RCU read section but is subject to removal and therefore
shall not be used. atomic_inc_not_zero() ensures that the object can
still be used after leaving the RCU read section.
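
The hotpath lookup can be sketched in userspace with C11 atomics. This is
a model under stated assumptions, not the patch's code: the fixed 64-slot
cache, the struct futex_cache_model type and the helper names
users_inc_not_zero(), futex_state_get() and futex_state_put() are
hypothetical, and the comments only mark where the kernel would enter and
leave the RCU read section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NSLOTS 64	/* hypothetical fixed size to keep the model small */

struct futex_state {
	atomic_int users;	/* 0: subject to removal, must not be used */
};

struct futex_cache_slot {
	struct futex_state *fs;
};

struct futex_cache_model {
	uint64_t cache_map;	/* one bit per deployed slot */
	struct futex_cache_slot slots[NSLOTS];
};

/* Userspace stand-in for the kernel's atomic_inc_not_zero(). */
bool users_inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Map a cookie to its state and take a reference, or return NULL. */
struct futex_state *futex_state_get(struct futex_cache_model *fc,
				    unsigned int cookie)
{
	struct futex_state *fs;

	if (cookie >= NSLOTS)
		return NULL;
	/* kernel: the RCU read section would begin here */
	if (!(fc->cache_map & (UINT64_C(1) << cookie)))
		return NULL;	/* slot not deployed */
	fs = fc->slots[cookie].fs;
	if (fs && !users_inc_not_zero(&fs->users))
		fs = NULL;	/* state is on its way out */
	/* kernel: the RCU read section would end here; the reference
	 * taken above keeps fs usable afterwards */
	return fs;
}

/* Drop a reference; freeing the state at zero is omitted in this model. */
void futex_state_put(struct futex_state *fs)
{
	int old = atomic_fetch_sub(&fs->users, 1);

	assert(old > 0);
	(void)old;
}
```

A lookup on a slot whose users counter has already dropped to zero fails
in the same way as a lookup on a slot that was never deployed.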
Mixing operations with and without the `FUTEX_ATTACHED' flag on the same
FUTEX has the same outcome as mixing the `FUTEX_PRIVATE_FLAG' flag: the
kernel won't find the correct futex_hash_bucket and the operation will
block.
It is believed that the `FUTEX_ATTACH' operation can be hidden within
the pthread_mutex_init() function and the `FUTEX_DETACH' operation within
pthread_mutex_destroy(). glibc could turn it on for all private locks.
An automatic in-kernel switch-on does not exist because the current
interface supplies the address of the lock instead of the returned
cookie. A lookup in the kernel would involve a lock-protected list or
hashtable, which would bring back the locks which we try to avoid with
the per-lock
Signed-off-by: Thomas Gleixner <firstname.lastname@example.org>
7 files changed