| From: Zhen Ni <zhen.ni@easystack.cn> |
| Subject: kernel/sys: optimize do_prlimit lock scope to reduce contention |
| Date: Wed, 20 Nov 2024 21:21:56 +0800 |
| |
| Narrow the lock scope in do_prlimit() to reduce contention on |
| task_lock(tsk->group_leader). The lock now covers only the sections that |
| read or modify the shared resource limits (rlim). The permission check |
| (capable()) and the security validation (security_task_setrlimit()) move |
| outside the lock, since neither modifies rlim and neither relies on the |
| lock for shared data protection. |
| |
| security_task_setrlimit() is a Linux Security Module (LSM) hook that |
| evaluates resource limit changes against the active security policy. It |
| does not alter the rlim data structure, as confirmed by existing LSM |
| implementations (e.g., SELinux and AppArmor). It therefore does not need |
| to run under task_lock(), and moving it out of the critical section |
| improves concurrency without affecting correctness. |
| |
| Link: https://lkml.kernel.org/r/20241120132156.207250-1-zhen.ni@easystack.cn |
| Signed-off-by: Zhen Ni <zhen.ni@easystack.cn> |
| Cc: Alexander Viro <viro@zeniv.linux.org.uk> |
| Cc: Catalin Marinas <catalin.marinas@arm.com> |
| Cc: Christian Brauner <brauner@kernel.org> |
| Cc: Oleg Nesterov <oleg@redhat.com> |
| Cc: Zev Weiss <zev@bewilderbeest.net> |
| Cc: James Morris <jmorris@namei.org> |
| Cc: Paul Moore <paul@paul-moore.com> |
| Cc: Serge E. Hallyn <serge@hallyn.com> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| --- |
| |
| kernel/sys.c | 12 +++++++----- |
| 1 file changed, 7 insertions(+), 5 deletions(-) |
| |
| --- a/kernel/sys.c~kernel-sys-optimize-do_prlimit-lock-scope-to-reduce-contention |
| +++ a/kernel/sys.c |
| @@ -1481,18 +1481,20 @@ static int do_prlimit(struct task_struct |
| |
| /* Holding a refcount on tsk protects tsk->signal from disappearing. */ |
| rlim = tsk->signal->rlim + resource; |
| - task_lock(tsk->group_leader); |
| if (new_rlim) { |
| /* |
| * Keep the capable check against init_user_ns until cgroups can |
| * contain all limits. |
| */ |
| if (new_rlim->rlim_max > rlim->rlim_max && |
| - !capable(CAP_SYS_RESOURCE)) |
| - retval = -EPERM; |
| - if (!retval) |
| - retval = security_task_setrlimit(tsk, resource, new_rlim); |
| + !capable(CAP_SYS_RESOURCE)) |
| + return -EPERM; |
| + retval = security_task_setrlimit(tsk, resource, new_rlim); |
| + if (retval) |
| + return retval; |
| } |
| + |
| + task_lock(tsk->group_leader); |
| if (!retval) { |
| if (old_rlim) |
| *old_rlim = *rlim; |
| _ |
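
The lock-scope change described in the commit message can be illustrated with a minimal userspace sketch. This is not kernel code: struct limit, shared_limit, limit_lock, set_limit() and the "allowed" flag are hypothetical stand-ins for struct rlimit, tsk->signal->rlim, task_lock(), do_prlimit() and the capable()/LSM checks. It assumes, as the commit message argues for the real hooks, that the checks do not touch the shared state, so they can fail fast before the mutex is taken.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rlimit; not the kernel type. */
struct limit {
	unsigned long cur;
	unsigned long max;
};

/* Hypothetical shared state; every access goes through limit_lock. */
static struct limit shared_limit = { .cur = 10, .max = 100 };
static pthread_mutex_t limit_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Validate first, lock second. The checks below look only at the
 * caller's arguments, so they run outside the critical section and
 * return early on failure without ever taking the mutex.
 */
static int set_limit(const struct limit *new_limit, struct limit *old_limit,
		     bool allowed)
{
	if (new_limit) {
		if (new_limit->cur > new_limit->max)
			return -EINVAL;
		/* "allowed" stands in for a permission/policy decision. */
		if (!allowed)
			return -EPERM;
	}

	/* Only the read and update of shared state are serialized. */
	pthread_mutex_lock(&limit_lock);
	if (old_limit)
		*old_limit = shared_limit;
	if (new_limit)
		shared_limit = *new_limit;
	pthread_mutex_unlock(&limit_lock);

	return 0;
}
```

Because the failure paths return before the lock is acquired, rejected requests never contend with other threads, which is the effect the patch aims for on task_lock(tsk->group_leader).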