| Subject: sched: Queue RT tasks to head when prio drops |
| From: Thomas Gleixner <tglx@linutronix.de> |
| Date: Tue, 04 Dec 2012 08:56:41 +0100 |
| |
| The following scenario does not work correctly: |
| |
| Runqueue of CPUx contains two runnable and pinned tasks: |
| T1: SCHED_FIFO, prio 80 |
| T2: SCHED_FIFO, prio 80 |
| |
| T1 is on the cpu and executes the following syscalls (classic priority |
| ceiling scenario): |
| |
| sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 90); |
| ... |
| sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 80); |
| ... |
| |
| Now T1 gets preempted by T3 (SCHED_FIFO, prio 95). After T3 goes back |
| to sleep, the scheduler picks T2 instead of T1. Surprise! |
| |
| The same happens without actual preemption when T1 is forced into the |
| scheduler due to a sporadic NEED_RESCHED event. The scheduler invokes |
| pick_next_task() which returns T2. So T1 gets preempted and scheduled |
| out. |
| |
| This happens because sched_setscheduler() dequeues T1 from the prio 90 |
| list and then enqueues it on the tail of the prio 80 list behind T2. |
| This violates the POSIX spec and surprises user space, which relies on |
| the guarantee that SCHED_FIFO tasks are not scheduled out unless they |
| give the CPU up voluntarily or are preempted by a higher priority |
| task. In the latter case the preempted task must get back on the CPU |
| after the preempting task schedules out again. |
| |
| We fixed a similar issue already in commit 60db48c (sched: Queue a |
| deboosted task to the head of the RT prio queue). The same treatment |
| is necessary for sched_setscheduler(). So enqueue to head of the prio |
| bucket list if the priority of the task is lowered. |
| |
| Existing user space might rely on the current behaviour, but that is |
| highly unlikely given the corner-case nature of the application |
| scenario. |
| |
| Signed-off-by: Thomas Gleixner <tglx@linutronix.de> |
| Cc: stable@vger.kernel.org |
| Cc: stable-rt@vger.kernel.org |
| --- |
| kernel/sched/core.c | 9 +++++++-- |
| 1 file changed, 7 insertions(+), 2 deletions(-) |
| |
| --- a/kernel/sched/core.c |
| +++ b/kernel/sched/core.c |
| @@ -4166,8 +4166,13 @@ recheck: |
| |
| if (running) |
| p->sched_class->set_curr_task(rq); |
| - if (on_rq) |
| - enqueue_task(rq, p, 0); |
| + if (on_rq) { |
| + /* |
| + * We enqueue to tail when the priority of a task is |
| + * increased (user space view). |
| + */ |
| + enqueue_task(rq, p, oldprio <= p->prio ? ENQUEUE_HEAD : 0); |
| + } |
| |
| check_class_changed(rq, p, prev_class, oldprio); |
| task_rq_unlock(rq, p, &flags); |
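
The queueing behaviour described in the changelog can be replayed with a
small user-space toy model. This is only a sketch and not kernel code:
the toy_* names, the array-based queues and the head_on_drop switch are
all hypothetical, and priorities use the user-space view (a higher number
means a higher priority), which is the inverse of the kernel's p->prio.
The head-or-tail decision mirrors the condition in the hunk above.

```c
#include <stddef.h>

#define TOY_MAX_PRIO  100	/* user-space RT priorities 1..99 */
#define TOY_MAX_TASKS 8

/* One FIFO list per priority level; index 0 is the head. */
struct toy_prio_queue {
	int task[TOY_MAX_TASKS];
	size_t len;
};

struct toy_rq {
	struct toy_prio_queue q[TOY_MAX_PRIO];
};

static void toy_enqueue(struct toy_rq *rq, int task, int prio, int at_head)
{
	struct toy_prio_queue *pq = &rq->q[prio];
	size_t i;

	if (pq->len >= TOY_MAX_TASKS)
		return;
	if (at_head) {
		/* Shift everybody back by one and take the head slot. */
		for (i = pq->len; i > 0; i--)
			pq->task[i] = pq->task[i - 1];
		pq->task[0] = task;
	} else {
		pq->task[pq->len] = task;
	}
	pq->len++;
}

static void toy_dequeue(struct toy_rq *rq, int task, int prio)
{
	struct toy_prio_queue *pq = &rq->q[prio];
	size_t i, j;

	for (i = 0; i < pq->len; i++) {
		if (pq->task[i] != task)
			continue;
		for (j = i + 1; j < pq->len; j++)
			pq->task[j - 1] = pq->task[j];
		pq->len--;
		return;
	}
}

/* Head of the highest non-empty priority list, or -1 if idle. */
static int toy_pick_next(const struct toy_rq *rq)
{
	int prio;

	for (prio = TOY_MAX_PRIO - 1; prio >= 0; prio--)
		if (rq->q[prio].len)
			return rq->q[prio].task[0];
	return -1;
}

/*
 * Model of a priority change. With head_on_drop set, a task whose
 * priority is lowered or unchanged goes to the head of its new list;
 * otherwise it always goes to the tail (the old behaviour).
 */
static void toy_set_prio(struct toy_rq *rq, int task, int old_prio,
			 int new_prio, int head_on_drop)
{
	toy_dequeue(rq, task, old_prio);
	toy_enqueue(rq, task, new_prio,
		    head_on_drop && new_prio <= old_prio);
}

/* Replays the T1/T2/T3 scenario; returns the task picked after T3 sleeps. */
int toy_scenario(int head_on_drop)
{
	enum { T1 = 1, T2 = 2, T3 = 3 };
	struct toy_rq rq = {0};

	toy_enqueue(&rq, T1, 80, 0);
	toy_enqueue(&rq, T2, 80, 0);
	toy_set_prio(&rq, T1, 80, 90, head_on_drop);	/* raise to ceiling */
	toy_set_prio(&rq, T1, 90, 80, head_on_drop);	/* drop back */
	toy_enqueue(&rq, T3, 95, 0);			/* T3 wakes up */
	toy_dequeue(&rq, T3, 95);			/* T3 goes to sleep */
	return toy_pick_next(&rq);
}
```

In this model toy_scenario(0) picks T2, which is the surprise described
in the changelog, while toy_scenario(1) picks T1 again. The model keeps
the running task on its priority list and ignores CPU affinity, locking
and every scheduling class other than the FIFO lists.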