| Using the Linux Kernel Latency Histograms |
| |
| |
| This document gives a short explanation how to enable, configure and use |
| latency histograms. Latency histograms are primarily relevant in the |
| context of real-time enabled kernels (CONFIG_PREEMPT/CONFIG_PREEMPT_RT) |
| and are used in the quality management of the Linux real-time |
| capabilities. |
| |
| |
| * Purpose of latency histograms |
| |
| A latency histogram continuously accumulates the frequencies of latency |
| data. There are two types of histograms |
| - potential sources of latencies |
| - effective latencies |
| |
| |
| * Potential sources of latencies |
| |
| Potential sources of latencies are code segments where interrupts, |
| preemption or both are disabled (aka critical sections). To create |
| histograms of potential sources of latency, the kernel stores the time |
| stamp at the start of a critical section, determines the time elapsed |
| when the end of the section is reached, and increments the frequency |
| counter of that latency value - irrespective of whether any concurrently |
| running process is affected by latency or not. |
| - Configuration items (in the Kernel hacking/Tracers submenu) |
| CONFIG_INTERRUPT_OFF_LATENCY |
| CONFIG_PREEMPT_OFF_LATENCY |
| |
| |
| * Effective latencies |
| |
| Effective latencies are actually occuring during wakeup of a process. To |
| determine effective latencies, the kernel stores the time stamp when a |
| process is scheduled to be woken up, and determines the duration of the |
| wakeup time shortly before control is passed over to this process. Note |
| that the apparent latency in user space may be somewhat longer, since the |
| process may be interrupted after control is passed over to it but before |
| the execution in user space takes place. Simply measuring the interval |
| between enqueuing and wakeup may also not appropriate in cases when a |
| process is scheduled as a result of a timer expiration. The timer may have |
| missed its deadline, e.g. due to disabled interrupts, but this latency |
| would not be registered. Therefore, the offsets of missed timers are |
| recorded in a separate histogram. If both wakeup latency and missed timer |
| offsets are configured and enabled, a third histogram may be enabled that |
| records the overall latency as a sum of the timer latency, if any, and the |
| wakeup latency. This histogram is called "timerandwakeup". |
| - Configuration items (in the Kernel hacking/Tracers submenu) |
| CONFIG_WAKEUP_LATENCY |
| CONFIG_MISSED_TIMER_OFSETS |
| |
| |
| * Usage |
| |
| The interface to the administration of the latency histograms is located |
| in the debugfs file system. To mount it, either enter |
| |
| mount -t sysfs nodev /sys |
| mount -t debugfs nodev /sys/kernel/debug |
| |
| from shell command line level, or add |
| |
| nodev /sys sysfs defaults 0 0 |
| nodev /sys/kernel/debug debugfs defaults 0 0 |
| |
| to the file /etc/fstab. All latency histogram related files are then |
| available in the directory /sys/kernel/debug/tracing/latency_hist. A |
| particular histogram type is enabled by writing non-zero to the related |
| variable in the /sys/kernel/debug/tracing/latency_hist/enable directory. |
| Select "preemptirqsoff" for the histograms of potential sources of |
| latencies and "wakeup" for histograms of effective latencies etc. The |
| histogram data - one per CPU - are available in the files |
| |
| /sys/kernel/debug/tracing/latency_hist/preemptoff/CPUx |
| /sys/kernel/debug/tracing/latency_hist/irqsoff/CPUx |
| /sys/kernel/debug/tracing/latency_hist/preemptirqsoff/CPUx |
| /sys/kernel/debug/tracing/latency_hist/wakeup/CPUx |
| /sys/kernel/debug/tracing/latency_hist/wakeup/sharedprio/CPUx |
| /sys/kernel/debug/tracing/latency_hist/missed_timer_offsets/CPUx |
| /sys/kernel/debug/tracing/latency_hist/timerandwakeup/CPUx |
| |
| The histograms are reset by writing non-zero to the file "reset" in a |
| particular latency directory. To reset all latency data, use |
| |
| #!/bin/sh |
| |
| TRACINGDIR=/sys/kernel/debug/tracing |
| HISTDIR=$TRACINGDIR/latency_hist |
| |
| if test -d $HISTDIR |
| then |
| cd $HISTDIR |
| for i in `find . | grep /reset$` |
| do |
| echo 1 >$i |
| done |
| fi |
| |
| |
| * Data format |
| |
| Latency data are stored with a resolution of one microsecond. The |
| maximum latency is 10,240 microseconds. The data are only valid, if the |
| overflow register is empty. Every output line contains the latency in |
| microseconds in the first row and the number of samples in the second |
| row. To display only lines with a positive latency count, use, for |
| example, |
| |
| grep -v " 0$" /sys/kernel/debug/tracing/latency_hist/preemptoff/CPU0 |
| |
| #Minimum latency: 0 microseconds. |
| #Average latency: 0 microseconds. |
| #Maximum latency: 25 microseconds. |
| #Total samples: 3104770694 |
| #There are 0 samples greater or equal than 10240 microseconds |
| #usecs samples |
| 0 2984486876 |
| 1 49843506 |
| 2 58219047 |
| 3 5348126 |
| 4 2187960 |
| 5 3388262 |
| 6 959289 |
| 7 208294 |
| 8 40420 |
| 9 4485 |
| 10 14918 |
| 11 18340 |
| 12 25052 |
| 13 19455 |
| 14 5602 |
| 15 969 |
| 16 47 |
| 17 18 |
| 18 14 |
| 19 1 |
| 20 3 |
| 21 2 |
| 22 5 |
| 23 2 |
| 25 1 |
| |
| |
| * Wakeup latency of a selected process |
| |
| To only collect wakeup latency data of a particular process, write the |
| PID of the requested process to |
| |
| /sys/kernel/debug/tracing/latency_hist/wakeup/pid |
| |
| PIDs are not considered, if this variable is set to 0. |
| |
| |
| * Details of the process with the highest wakeup latency so far |
| |
| Selected data of the process that suffered from the highest wakeup |
| latency that occurred in a particular CPU are available in the file |
| |
| /sys/kernel/debug/tracing/latency_hist/wakeup/max_latency-CPUx. |
| |
| In addition, other relevant system data at the time when the |
| latency occurred are given. |
| |
| The format of the data is (all in one line): |
| <PID> <Priority> <Latency> (<Timeroffset>) <Command> \ |
| <- <PID> <Priority> <Command> <Timestamp> |
| |
| The value of <Timeroffset> is only relevant in the combined timer |
| and wakeup latency recording. In the wakeup recording, it is |
| always 0, in the missed_timer_offsets recording, it is the same |
| as <Latency>. |
| |
| When retrospectively searching for the origin of a latency and |
| tracing was not enabled, it may be helpful to know the name and |
| some basic data of the task that (finally) was switching to the |
| late real-tlme task. In addition to the victim's data, also the |
| data of the possible culprit are therefore displayed after the |
| "<-" symbol. |
| |
| Finally, the timestamp of the time when the latency occurred |
| in <seconds>.<microseconds> after the most recent system boot |
| is provided. |
| |
| These data are also reset when the wakeup histogram is reset. |