livepatch: send a fake signal to all tasks

The kGraft consistency model is LEAVE_KERNEL and SWITCH_THREAD. This
means that all tasks in the system have to be marked, one by one, as
safe to call a new patched function. The safe place is the boundary
between kernel and userspace. Patching waits for all tasks to cross
this boundary and finishes the process afterwards.

The problem is that a task can block the finalization of the patching
process for quite a long time, if not forever. The task could sleep
somewhere in the kernel or could be running in userspace with no
prospect of entering the kernel and thus going through the safe place.

Luckily we can force the task to do that by sending it a fake signal,
that is, a signal with no data in the signal pending structures (no
handler, no sign of a proper signal being delivered). Suspend/freezer
uses this to freeze tasks as well. The task gets TIF_SIGPENDING set and
is woken up (if it has been sleeping in the kernel) or kicked by a
rescheduling IPI (if it was running on another CPU). This causes the
task to go to the kernel/userspace boundary, where the signal would be
handled and the task would be marked as safe in terms of live patching.
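
As a rough illustration of the flow described above, here is a small
userspace model. It is only a sketch: struct model_task and both
functions are hypothetical names invented for this example, not kernel
APIs, and the boolean fields merely stand in for TIF_SIGPENDING, the
sleep state and the per-task "safe" mark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a task; not a kernel structure. */
struct model_task {
	bool is_kthread;  /* kernel thread: never crosses to userspace */
	bool sleeping;    /* sleeping interruptibly in the kernel */
	bool sigpending;  /* stands in for TIF_SIGPENDING */
	bool patch_safe;  /* marked as safe to call patched functions */
};

/*
 * Model of sending the fake signal: only the pending flag is set, no
 * signal data is queued anywhere. Kthreads are skipped. Returns the
 * number of tasks that were signalled.
 */
static size_t model_send_fake_signal(struct model_task *tasks, size_t n)
{
	size_t sent = 0;

	assert(tasks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct model_task *t = &tasks[i];

		if (t->is_kthread)
			continue;

		t->sigpending = true;  /* the "fake" signal */
		t->sleeping = false;   /* wake-up, or IPI if running */
		sent++;
	}

	return sent;
}

/*
 * Model of a task reaching the kernel/userspace boundary: there is
 * nothing to deliver, so the flag is cleared and the task is marked.
 */
static void model_return_to_user(struct model_task *t)
{
	assert(t != NULL);

	if (t->is_kthread)
		return;

	t->sigpending = false;
	t->patch_safe = true;
}
```

Calling model_send_fake_signal() on an array of tasks and then
model_return_to_user() on each of them leaves every non-kthread task
marked as safe, which is the condition the patching process waits for.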

There are tasks which are not affected by this technique, though. The
fake signal is not sent to kthreads; they should be handled in a
different way. Also, if the task is in TASK_RUNNING state but not
currently running on any CPU, it doesn't get the IPI, but it would
eventually handle the signal anyway. Finally, if the task runs in the
kernel (in TASK_RUNNING state), it gets the IPI, but the signal is not
handled on return from the interrupt. It would be handled on the next
return to userspace.

If the task was sleeping in a syscall, it would be woken by our fake
signal, check whether TIF_SIGPENDING is set (by calling the
signal_pending() predicate) and return ERESTART* or EINTR. Syscalls
with ERESTART* return values are restarted in case of the fake signal
(see do_signal()). EINTR is propagated back to the userspace program.
This could disturb the program, but...

  * each process dealing with signals should react appropriately to
    EINTR return values.
  * syscalls returning EINTR are quite common in the system even if no
    fake signal is sent.
  * freezer sends the fake signal and does not deal with EINTR in any
    way. Thus EINTR values are returned when the system is resumed.
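
For reference, reacting to EINTR in a userspace program usually means
retrying the interrupted call. The following is a minimal sketch of
that common pattern; read_retry() is a hypothetical helper name used
only for this illustration and is not part of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Wrapper around read() that retries when the call is interrupted
 * before any data was transferred (read() returns -1 with errno set
 * to EINTR). Any other result is passed back to the caller unchanged.
 */
static ssize_t read_retry(int fd, void *buf, size_t count)
{
	ssize_t ret;

	assert(buf != NULL || count == 0);

	do {
		ret = read(fd, buf, count);
	} while (ret == -1 && errno == EINTR);

	return ret;
}
```

A program written this way does not notice the fake signal at all: the
interrupted read() is simply issued again.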

The actual marking as safe is done in entry_64.S on the syscall and
interrupt/exception exit paths.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2 files changed