| ========= | 
 | Livepatch | 
 | ========= | 
 |  | 
 | This document outlines basic information about kernel livepatching. | 
 |  | 
 | .. Table of Contents: | 
 |  | 
 | .. contents:: :local: | 
 |  | 
 |  | 
 | 1. Motivation | 
 | ============= | 
 |  | 
 | There are many situations where users are reluctant to reboot a system. It may | 
 | be because their system is performing complex scientific computations or under | 
 | heavy load during peak usage. In addition to keeping systems up and running, | 
 | users want to also have a stable and secure system. Livepatching gives users | 
 | both by allowing for function calls to be redirected; thus, fixing critical | 
 | functions without a system reboot. | 
 |  | 
 |  | 
 | 2. Kprobes, Ftrace, Livepatching | 
 | ================================ | 
 |  | 
 | There are multiple mechanisms in the Linux kernel that are directly related | 
 | to redirection of code execution; namely: kernel probes, function tracing, | 
 | and livepatching: | 
 |  | 
 |   - The kernel probes are the most generic. The code can be redirected by | 
 |     putting a breakpoint instruction instead of any instruction. | 
 |  | 
 |   - The function tracer calls the code from a predefined location that is | 
 |     close to the function entry point. This location is generated by the | 
 |     compiler using the '-pg' gcc option. | 
 |  | 
 |   - Livepatching typically needs to redirect the code at the very beginning | 
 |     of the function entry before the function parameters or the stack | 
 |     are in any way modified. | 
 |  | 
 | All three approaches need to modify the existing code at runtime. Therefore | 
 | they need to be aware of each other and not step over each other's toes. | 
 | Most of these problems are solved by using the dynamic ftrace framework as | 
 | a base. A Kprobe is registered as a ftrace handler when the function entry | 
 | is probed, see CONFIG_KPROBES_ON_FTRACE. Also an alternative function from | 
 | a live patch is called with the help of a custom ftrace handler. But there are | 
 | some limitations, see below. | 
 |  | 
 |  | 
 | 3. Consistency model | 
 | ==================== | 
 |  | 
 | Functions are there for a reason. They take some input parameters, get or | 
 | release locks, read, process, and even write some data in a defined way, | 
 | have return values. In other words, each function has a defined semantic. | 
 |  | 
 | Many fixes do not change the semantic of the modified functions. For | 
 | example, they add a NULL pointer or a boundary check, fix a race by adding | 
 | a missing memory barrier, or add some locking around a critical section. | 
 | Most of these changes are self contained and the function presents itself | 
 | the same way to the rest of the system. In this case, the functions might | 
 | be updated independently one by one. | 
 |  | 
 | But there are more complex fixes. For example, a patch might change | 
 | ordering of locking in multiple functions at the same time. Or a patch | 
 | might exchange meaning of some temporary structures and update | 
 | all the relevant functions. In this case, the affected unit | 
 | (thread, whole kernel) need to start using all new versions of | 
 | the functions at the same time. Also the switch must happen only | 
 | when it is safe to do so, e.g. when the affected locks are released | 
 | or no data are stored in the modified structures at the moment. | 
 |  | 
 | The theory about how to apply functions a safe way is rather complex. | 
 | The aim is to define a so-called consistency model. It attempts to define | 
 | conditions when the new implementation could be used so that the system | 
 | stays consistent. | 
 |  | 
 | Livepatch has a consistency model which is a hybrid of kGraft and | 
 | kpatch:  it uses kGraft's per-task consistency and syscall barrier | 
 | switching combined with kpatch's stack trace switching.  There are also | 
 | a number of fallback options which make it quite flexible. | 
 |  | 
 | Patches are applied on a per-task basis, when the task is deemed safe to | 
 | switch over.  When a patch is enabled, livepatch enters into a | 
 | transition state where tasks are converging to the patched state. | 
 | Usually this transition state can complete in a few seconds.  The same | 
 | sequence occurs when a patch is disabled, except the tasks converge from | 
 | the patched state to the unpatched state. | 
 |  | 
 | An interrupt handler inherits the patched state of the task it | 
 | interrupts.  The same is true for forked tasks: the child inherits the | 
 | patched state of the parent. | 
 |  | 
 | Livepatch uses several complementary approaches to determine when it's | 
 | safe to patch tasks: | 
 |  | 
 | 1. The first and most effective approach is stack checking of sleeping | 
 |    tasks.  If no affected functions are on the stack of a given task, | 
 |    the task is patched.  In most cases this will patch most or all of | 
 |    the tasks on the first try.  Otherwise it'll keep trying | 
 |    periodically.  This option is only available if the architecture has | 
 |    reliable stacks (HAVE_RELIABLE_STACKTRACE). | 
 |  | 
 | 2. The second approach, if needed, is kernel exit switching.  A | 
 |    task is switched when it returns to user space from a system call, a | 
 |    user space IRQ, or a signal.  It's useful in the following cases: | 
 |  | 
 |    a) Patching I/O-bound user tasks which are sleeping on an affected | 
 |       function.  In this case you have to send SIGSTOP and SIGCONT to | 
 |       force it to exit the kernel and be patched. | 
 |    b) Patching CPU-bound user tasks.  If the task is highly CPU-bound | 
 |       then it will get patched the next time it gets interrupted by an | 
 |       IRQ. | 
 |  | 
 | 3. For idle "swapper" tasks, since they don't ever exit the kernel, they | 
 |    instead have a klp_update_patch_state() call in the idle loop which | 
 |    allows them to be patched before the CPU enters the idle state. | 
 |  | 
 |    (Note there's not yet such an approach for kthreads.) | 
 |  | 
 | Architectures which don't have HAVE_RELIABLE_STACKTRACE solely rely on | 
 | the second approach. It's highly likely that some tasks may still be | 
 | running with an old version of the function, until that function | 
 | returns. In this case you would have to signal the tasks. This | 
 | especially applies to kthreads. They may not be woken up and would need | 
 | to be forced. See below for more information. | 
 |  | 
 | Unless we can come up with another way to patch kthreads, architectures | 
 | without HAVE_RELIABLE_STACKTRACE are not considered fully supported by | 
 | the kernel livepatching. | 
 |  | 
 | The /sys/kernel/livepatch/<patch>/transition file shows whether a patch | 
 | is in transition.  Only a single patch can be in transition at a given | 
 | time.  A patch can remain in transition indefinitely, if any of the tasks | 
 | are stuck in the initial patch state. | 
 |  | 
 | A transition can be reversed and effectively canceled by writing the | 
 | opposite value to the /sys/kernel/livepatch/<patch>/enabled file while | 
 | the transition is in progress.  Then all the tasks will attempt to | 
 | converge back to the original patch state. | 
 |  | 
 | There's also a /proc/<pid>/patch_state file which can be used to | 
 | determine which tasks are blocking completion of a patching operation. | 
 | If a patch is in transition, this file shows 0 to indicate the task is | 
 | unpatched and 1 to indicate it's patched.  Otherwise, if no patch is in | 
 | transition, it shows -1.  Any tasks which are blocking the transition | 
 | can be signaled with SIGSTOP and SIGCONT to force them to change their | 
 | patched state. This may be harmful to the system though. Sending a fake signal | 
 | to all remaining blocking tasks is a better alternative. No proper signal is | 
 | actually delivered (there is no data in signal pending structures). Tasks are | 
 | interrupted or woken up, and forced to change their patched state. The fake | 
 | signal is automatically sent every 15 seconds. | 
 |  | 
 | Administrator can also affect a transition through | 
 | /sys/kernel/livepatch/<patch>/force attribute. Writing 1 there clears | 
 | TIF_PATCH_PENDING flag of all tasks and thus forces the tasks to the patched | 
 | state. Important note! The force attribute is intended for cases when the | 
 | transition gets stuck for a long time because of a blocking task. Administrator | 
 | is expected to collect all necessary data (namely stack traces of such blocking | 
 | tasks) and request a clearance from a patch distributor to force the transition. | 
 | Unauthorized usage may cause harm to the system. It depends on the nature of the | 
 | patch, which functions are (un)patched, and which functions the blocking tasks | 
 | are sleeping in (/proc/<pid>/stack may help here). Removal (rmmod) of patch | 
 | modules is permanently disabled when the force feature is used. It cannot be | 
 | guaranteed there is no task sleeping in such module. It implies unbounded | 
 | reference count if a patch module is disabled and enabled in a loop. | 
 |  | 
 | Moreover, the usage of force may also affect future applications of live | 
 | patches and cause even more harm to the system. Administrator should first | 
 | consider to simply cancel a transition (see above). If force is used, reboot | 
 | should be planned and no more live patches applied. | 
 |  | 
 | 3.1 Adding consistency model support to new architectures | 
 | --------------------------------------------------------- | 
 |  | 
 | For adding consistency model support to new architectures, there are a | 
 | few options: | 
 |  | 
 | 1) Add CONFIG_HAVE_RELIABLE_STACKTRACE.  This means porting objtool, and | 
 |    for non-DWARF unwinders, also making sure there's a way for the stack | 
 |    tracing code to detect interrupts on the stack. | 
 |  | 
 | 2) Alternatively, ensure that every kthread has a call to | 
 |    klp_update_patch_state() in a safe location.  Kthreads are typically | 
 |    in an infinite loop which does some action repeatedly.  The safe | 
 |    location to switch the kthread's patch state would be at a designated | 
 |    point in the loop where there are no locks taken and all data | 
 |    structures are in a well-defined state. | 
 |  | 
 |    The location is clear when using workqueues or the kthread worker | 
 |    API.  These kthreads process independent actions in a generic loop. | 
 |  | 
 |    It's much more complicated with kthreads which have a custom loop. | 
 |    There the safe location must be carefully selected on a case-by-case | 
 |    basis. | 
 |  | 
 |    In that case, arches without HAVE_RELIABLE_STACKTRACE would still be | 
 |    able to use the non-stack-checking parts of the consistency model: | 
 |  | 
 |    a) patching user tasks when they cross the kernel/user space | 
 |       boundary; and | 
 |  | 
 |    b) patching kthreads and idle tasks at their designated patch points. | 
 |  | 
 |    This option isn't as good as option 1 because it requires signaling | 
 |    user tasks and waking kthreads to patch them.  But it could still be | 
 |    a good backup option for those architectures which don't have | 
 |    reliable stack traces yet. | 
 |  | 
 |  | 
 | 4. Livepatch module | 
 | =================== | 
 |  | 
 | Livepatches are distributed using kernel modules, see | 
 | samples/livepatch/livepatch-sample.c. | 
 |  | 
 | The module includes a new implementation of functions that we want | 
 | to replace. In addition, it defines some structures describing the | 
 | relation between the original and the new implementation. Then there | 
 | is code that makes the kernel start using the new code when the livepatch | 
 | module is loaded. Also there is code that cleans up before the | 
 | livepatch module is removed. All this is explained in more details in | 
 | the next sections. | 
 |  | 
 |  | 
 | 4.1. New functions | 
 | ------------------ | 
 |  | 
 | New versions of functions are typically just copied from the original | 
 | sources. A good practice is to add a prefix to the names so that they | 
 | can be distinguished from the original ones, e.g. in a backtrace. Also | 
 | they can be declared as static because they are not called directly | 
 | and do not need the global visibility. | 
 |  | 
 | The patch contains only functions that are really modified. But they | 
 | might want to access functions or data from the original source file | 
 | that may only be locally accessible. This can be solved by a special | 
 | relocation section in the generated livepatch module, see | 
 | Documentation/livepatch/module-elf-format.rst for more details. | 
 |  | 
 |  | 
 | 4.2. Metadata | 
 | ------------- | 
 |  | 
 | The patch is described by several structures that split the information | 
 | into three levels: | 
 |  | 
 |   - struct klp_func is defined for each patched function. It describes | 
 |     the relation between the original and the new implementation of a | 
 |     particular function. | 
 |  | 
 |     The structure includes the name, as a string, of the original function. | 
 |     The function address is found via kallsyms at runtime. | 
 |  | 
 |     Then it includes the address of the new function. It is defined | 
 |     directly by assigning the function pointer. Note that the new | 
 |     function is typically defined in the same source file. | 
 |  | 
 |     As an optional parameter, the symbol position in the kallsyms database can | 
 |     be used to disambiguate functions of the same name. This is not the | 
 |     absolute position in the database, but rather the order it has been found | 
 |     only for a particular object ( vmlinux or a kernel module ). Note that | 
 |     kallsyms allows for searching symbols according to the object name. | 
 |  | 
 |   - struct klp_object defines an array of patched functions (struct | 
 |     klp_func) in the same object. Where the object is either vmlinux | 
 |     (NULL) or a module name. | 
 |  | 
 |     The structure helps to group and handle functions for each object | 
 |     together. Note that patched modules might be loaded later than | 
 |     the patch itself and the relevant functions might be patched | 
 |     only when they are available. | 
 |  | 
 |  | 
 |   - struct klp_patch defines an array of patched objects (struct | 
 |     klp_object). | 
 |  | 
 |     This structure handles all patched functions consistently and eventually, | 
 |     synchronously. The whole patch is applied only when all patched | 
 |     symbols are found. The only exception are symbols from objects | 
 |     (kernel modules) that have not been loaded yet. | 
 |  | 
 |     For more details on how the patch is applied on a per-task basis, | 
 |     see the "Consistency model" section. | 
 |  | 
 |  | 
 | 5. Livepatch life-cycle | 
 | ======================= | 
 |  | 
 | Livepatching can be described by five basic operations: | 
 | loading, enabling, replacing, disabling, removing. | 
 |  | 
 | Where the replacing and the disabling operations are mutually | 
 | exclusive. They have the same result for the given patch but | 
 | not for the system. | 
 |  | 
 |  | 
 | 5.1. Loading | 
 | ------------ | 
 |  | 
 | The only reasonable way is to enable the patch when the livepatch kernel | 
 | module is being loaded. For this, klp_enable_patch() has to be called | 
 | in the module_init() callback. There are two main reasons: | 
 |  | 
 | First, only the module has an easy access to the related struct klp_patch. | 
 |  | 
 | Second, the error code might be used to refuse loading the module when | 
 | the patch cannot get enabled. | 
 |  | 
 |  | 
 | 5.2. Enabling | 
 | ------------- | 
 |  | 
 | The livepatch gets enabled by calling klp_enable_patch() from | 
 | the module_init() callback. The system will start using the new | 
 | implementation of the patched functions at this stage. | 
 |  | 
 | First, the addresses of the patched functions are found according to their | 
 | names. The special relocations, mentioned in the section "New functions", | 
 | are applied. The relevant entries are created under | 
 | /sys/kernel/livepatch/<name>. The patch is rejected when any above | 
 | operation fails. | 
 |  | 
 | Second, livepatch enters into a transition state where tasks are converging | 
 | to the patched state. If an original function is patched for the first | 
 | time, a function specific struct klp_ops is created and an universal | 
 | ftrace handler is registered\ [#]_. This stage is indicated by a value of '1' | 
 | in /sys/kernel/livepatch/<name>/transition. For more information about | 
 | this process, see the "Consistency model" section. | 
 |  | 
 | Finally, once all tasks have been patched, the 'transition' value changes | 
 | to '0'. | 
 |  | 
 | .. [#] | 
 |  | 
 |     Note that functions might be patched multiple times. The ftrace handler | 
 |     is registered only once for a given function. Further patches just add | 
 |     an entry to the list (see field `func_stack`) of the struct klp_ops. | 
 |     The right implementation is selected by the ftrace handler, see | 
 |     the "Consistency model" section. | 
 |  | 
 |     That said, it is highly recommended to use cumulative livepatches | 
 |     because they help keeping the consistency of all changes. In this case, | 
 |     functions might be patched two times only during the transition period. | 
 |  | 
 |  | 
 | 5.3. Replacing | 
 | -------------- | 
 |  | 
 | All enabled patches might get replaced by a cumulative patch that | 
 | has the .replace flag set. | 
 |  | 
 | Once the new patch is enabled and the 'transition' finishes then | 
 | all the functions (struct klp_func) associated with the replaced | 
 | patches are removed from the corresponding struct klp_ops. Also | 
 | the ftrace handler is unregistered and the struct klp_ops is | 
 | freed when the related function is not modified by the new patch | 
 | and func_stack list becomes empty. | 
 |  | 
 | See Documentation/livepatch/cumulative-patches.rst for more details. | 
 |  | 
 |  | 
 | 5.4. Disabling | 
 | -------------- | 
 |  | 
 | Enabled patches might get disabled by writing '0' to | 
 | /sys/kernel/livepatch/<name>/enabled. | 
 |  | 
 | First, livepatch enters into a transition state where tasks are converging | 
 | to the unpatched state. The system starts using either the code from | 
 | the previously enabled patch or even the original one. This stage is | 
 | indicated by a value of '1' in /sys/kernel/livepatch/<name>/transition. | 
 | For more information about this process, see the "Consistency model" | 
 | section. | 
 |  | 
 | Second, once all tasks have been unpatched, the 'transition' value changes | 
 | to '0'. All the functions (struct klp_func) associated with the to-be-disabled | 
 | patch are removed from the corresponding struct klp_ops. The ftrace handler | 
 | is unregistered and the struct klp_ops is freed when the func_stack list | 
 | becomes empty. | 
 |  | 
 | Third, the sysfs interface is destroyed. | 
 |  | 
 |  | 
 | 5.5. Removing | 
 | ------------- | 
 |  | 
 | Module removal is only safe when there are no users of functions provided | 
 | by the module. This is the reason why the force feature permanently | 
 | disables the removal. Only when the system is successfully transitioned | 
 | to a new patch state (patched/unpatched) without being forced it is | 
 | guaranteed that no task sleeps or runs in the old code. | 
 |  | 
 |  | 
 | 6. Sysfs | 
 | ======== | 
 |  | 
 | Information about the registered patches can be found under | 
 | /sys/kernel/livepatch. The patches could be enabled and disabled | 
 | by writing there. | 
 |  | 
 | /sys/kernel/livepatch/<patch>/force attributes allow administrator to affect a | 
 | patching operation. | 
 |  | 
 | See Documentation/ABI/testing/sysfs-kernel-livepatch for more details. | 
 |  | 
 |  | 
 | 7. Limitations | 
 | ============== | 
 |  | 
 | The current Livepatch implementation has several limitations: | 
 |  | 
 |   - Only functions that can be traced could be patched. | 
 |  | 
 |     Livepatch is based on the dynamic ftrace. In particular, functions | 
 |     implementing ftrace or the livepatch ftrace handler could not be | 
 |     patched. Otherwise, the code would end up in an infinite loop. A | 
 |     potential mistake is prevented by marking the problematic functions | 
 |     by "notrace". | 
 |  | 
 |  | 
 |  | 
 |   - Livepatch works reliably only when the dynamic ftrace is located at | 
 |     the very beginning of the function. | 
 |  | 
 |     The function need to be redirected before the stack or the function | 
 |     parameters are modified in any way. For example, livepatch requires | 
 |     using -fentry gcc compiler option on x86_64. | 
 |  | 
 |     One exception is the PPC port. It uses relative addressing and TOC. | 
 |     Each function has to handle TOC and save LR before it could call | 
 |     the ftrace handler. This operation has to be reverted on return. | 
 |     Fortunately, the generic ftrace code has the same problem and all | 
 |     this is handled on the ftrace level. | 
 |  | 
 |  | 
 |   - Kretprobes using the ftrace framework conflict with the patched | 
 |     functions. | 
 |  | 
 |     Both kretprobes and livepatches use a ftrace handler that modifies | 
 |     the return address. The first user wins. Either the probe or the patch | 
 |     is rejected when the handler is already in use by the other. | 
 |  | 
 |  | 
 |   - Kprobes in the original function are ignored when the code is | 
 |     redirected to the new implementation. | 
 |  | 
 |     There is a work in progress to add warnings about this situation. |