| From ce3614daabea8a2d01c1dd17ae41d1ec5e5ae7db Mon Sep 17 00:00:00 2001 |
| From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> |
| Date: Mon, 6 Jul 2020 16:49:10 -0400 |
| Subject: sched: Fix unreliable rseq cpu_id for new tasks |
| |
| From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> |
| |
| commit ce3614daabea8a2d01c1dd17ae41d1ec5e5ae7db upstream. |
| |
| While integrating rseq into glibc and replacing glibc's sched_getcpu |
| implementation with rseq, glibc's tests discovered that the |
| __rseq_abi.cpu_id field can hold an incorrect value right after a newly |
| created process issues sched_setaffinity for the first time. |
| |
| For the record, it triggers after building glibc and running tests, and |
| then issuing: |
| |
| for x in {1..2000} ; do posix/tst-affinity-static & done |
| |
| and shows up as: |
| |
| error: Unexpected CPU 2, expected 0 |
| error: Unexpected CPU 2, expected 0 |
| error: Unexpected CPU 2, expected 0 |
| error: Unexpected CPU 2, expected 0 |
| error: Unexpected CPU 138, expected 0 |
| error: Unexpected CPU 138, expected 0 |
| error: Unexpected CPU 138, expected 0 |
| error: Unexpected CPU 138, expected 0 |
| |
| This is caused by the scheduler invoking __set_task_cpu() directly from |
| sched_fork() and wake_up_new_task(), thus bypassing rseq_migrate() which |
| is done by set_task_cpu(). |
| |
| Add the missing rseq_migrate() to both functions. The only other direct |
| use of __set_task_cpu() is done by init_idle(), which does not involve a |
| user-space task. |
| |
| Based on my testing with the glibc test-case, just adding rseq_migrate() |
| to wake_up_new_task() is sufficient to fix the observed issue. Also add |
| it to sched_fork() to keep things consistent. |
| |
| It is unclear why this has never triggered so far with the |
| rseq/basic_test selftest. |
| |
| The current use of sched_getcpu(3) does not typically require it to be |
| always accurate. However, use of the __rseq_abi.cpu_id field within rseq |
| critical sections requires it to be accurate. If it is not accurate, it |
| can cause corruption in the per-cpu data targeted by rseq critical |
| sections in user-space. |
| |
| Reported-By: Florian Weimer <fweimer@redhat.com> |
| Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> |
| Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> |
| Tested-By: Florian Weimer <fweimer@redhat.com> |
| Cc: stable@vger.kernel.org # v4.18+ |
| Link: https://lkml.kernel.org/r/20200707201505.2632-1-mathieu.desnoyers@efficios.com |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| kernel/sched/core.c | 2 ++ |
| 1 file changed, 2 insertions(+) |
| |
| --- a/kernel/sched/core.c |
| +++ b/kernel/sched/core.c |
| @@ -2889,6 +2889,7 @@ int sched_fork(unsigned long clone_flags |
|  	 * Silence PROVE_RCU. |
|  	 */ |
|  	raw_spin_lock_irqsave(&p->pi_lock, flags); |
| +	rseq_migrate(p); |
|  	/* |
|  	 * We're setting the CPU for the first time, we don't migrate, |
|  	 * so use __set_task_cpu(). |
| @@ -2953,6 +2954,7 @@ void wake_up_new_task(struct task_struct |
|  	 * as we're not fully set-up yet. |
|  	 */ |
|  	p->recent_used_cpu = task_cpu(p); |
| +	rseq_migrate(p); |
|  	__set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0)); |
|  #endif |
|  	rq = __task_rq_lock(p, &rf); |