| From: Tom Zanussi <tom.zanussi@linux.intel.com> |
| Date: Tue, 5 Sep 2017 16:57:49 -0500 |
| Subject: [PATCH 37/40] tracing: Add inter-event hist trigger Documentation |
| |
| Add background and details on inter-event hist triggers, including |
| hist variables, synthetic events, and actions. |
| |
| Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> |
| Signed-off-by: Baohong Liu <baohong.liu@intel.com> |
| Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
| --- |
| Documentation/trace/events.txt | 385 +++++++++++++++++++++++++++++++++++++++++ |
| 1 file changed, 385 insertions(+) |
| |
| --- a/Documentation/trace/events.txt |
| +++ b/Documentation/trace/events.txt |
| @@ -571,6 +571,7 @@ triggers (you have to use '!' for each o |
| .sym-offset display an address as a symbol and offset |
| .syscall display a syscall id as a system call name |
| .execname display a common_pid as a program name |
| + .usecs display a $common_timestamp in microseconds |
| |
| Note that in general the semantics of a given field aren't |
| interpreted when applying a modifier to it, but there are some |
| @@ -2101,3 +2102,387 @@ triggers (you have to use '!' for each o |
| Hits: 489 |
| Entries: 7 |
| Dropped: 0 |
| + |
| +6.3 Inter-event hist triggers |
| +----------------------------- |
| + |
| +Inter-event hist triggers are hist triggers that combine values from |
| +one or more other events and create a histogram using that data. Data |
| +from an inter-event histogram can in turn become the source for |
| +further combined histograms, thus providing a chain of related |
| +histograms, which is important for some applications. |
| + |
| +The most important example of an inter-event quantity that can be used |
| +in this manner is latency, which is simply a difference in timestamps |
| +between two events (although trace events don't have an externally |
| +visible timestamp field, the inter-event hist trigger support adds a |
| +pseudo-field to all events named '$common_timestamp' which can be used |
| +as if it were an actual event field). Although latency is the most |
| +important inter-event quantity, note that because the support is |
| +completely general across the trace event subsystem, any event field |
| +can be used in an inter-event quantity. |
| + |
| +An example of a histogram that combines data from other histograms |
| +into a useful chain would be a 'wakeupswitch latency' histogram that |
| +combines a 'wakeup latency' histogram and a 'switch latency' |
| +histogram. |
| + |
| +Normally, a hist trigger specification consists of a (possibly |
| +compound) key along with one or more numeric values, which are |
| +continually updated sums associated with that key. A histogram |
| +specification in this case consists of individual key and value |
| +specifications that refer to trace event fields associated with a |
| +single event type. |
| + |
| +The inter-event hist trigger extension allows fields from multiple |
| +events to be referenced and combined into a multi-event histogram |
| +specification. In support of this overall goal, a few enabling |
| +features have been added to the hist trigger support: |
| + |
| + - In order to compute an inter-event quantity, a value from one |
| + event needs to saved and then referenced from another event. This |
| + requires the introduction of support for histogram 'variables'. |
| + |
| + - The computation of inter-event quantities and their combination |
| + require some minimal amount of support for applying simple |
| + expressions to variables (+ and -). |
| + |
| + - A histogram consisting of inter-event quantities isn't logically a |
| + histogram on either event (so having the 'hist' file for either |
| + event host the histogram output doesn't really make sense). To |
| + address the idea that the histogram is associated with a |
| + combination of events, support is added allowing the creation of |
| + 'synthetic' events that are events derived from other events. |
| + These synthetic events are full-fledged events just like any other |
| + and can be used as such, as for instance to create the |
| + 'combination' histograms mentioned previously. |
| + |
| + - A set of 'actions' can be associated with histogram entries - |
| + these can be used to generate the previously mentioned synthetic |
| + events, but can also be used for other purposes, such as for |
| + example saving context when a 'max' latency has been hit. |
| + |
| + - Trace events don't have a 'timestamp' associated with them, but |
| + there is an implicit timestamp saved along with an event in the |
| + underlying ftrace ring buffer. This timestamp is now exposed as a |
| + a synthetic field named '$common_timestamp' which can be used in |
| + histograms as if it were any other event field. Note that it has |
| + a '$' prefixed to it - this is meant to indicate that it isn't an |
| + actual field in the trace format but rather is a synthesized value |
| + that nonetheless can be used as if it were an actual field. By |
| + default it is in units of nanoseconds; appending '.usecs' to a |
| + common_timestamp field changes the units to microseconds. |
| + |
| +These features are decribed in more detail in the following sections. |
| + |
| +6.3.1 Histogram Variables |
| +------------------------- |
| + |
| +Variables are simply named locations used for saving and retrieving |
| +values between matching events. A 'matching' event is defined as an |
| +event that has a matching key - if a variable is saved for a histogram |
| +entry corresponding to that key, any subsequent event with a matching |
| +key can access that variable. |
| + |
| +A variable's value is normally available to any subsequent event until |
| +it is set to something else by a subsequent event. The one exception |
| +to that rule is that any variable used in an expression is essentially |
| +'read-once' - once it's used by an expression in a subsequent event, |
| +it's reset to its 'unset' state, which means it can't be used again |
| +unless it's set again. This ensures not only that an event doesn't |
| +use an uninitialized variable in a calculation, but that that variable |
| +is used only once and not for any unrelated subsequent match. |
| + |
| +The basic syntax for saving a variable is to simply prefix a unique |
| +variable name not corresponding to any keyword along with an '=' sign |
| +to any event field. |
| + |
| +Either keys or values can be saved and retrieved in this way. This |
| +creates a variable named 'ts0' for a histogram entry with the key |
| +'next_pid': |
| + |
| + # echo 'hist:keys=next_pid:vals=$ts0:ts0=$common_timestamp ... >> \ |
| + event/trigger |
| + |
| +The ts0 variable can be accessed by any subsequent event having the |
| +same pid as 'next_pid'. |
| + |
| +Variable references are formed by prepending the variable name with |
| +the '$' sign. Thus for example, the ts0 variable above would be |
| +referenced as '$ts0' in expressions. |
| + |
| +Because 'vals=' is used, the $common_timestamp variable value above |
| +will also be summed as a normal histogram value would (though for a |
| +timestamp it makes little sense). |
| + |
| +The below shows that a key value can also be saved in the same way: |
| + |
| + # echo 'hist:timer_pid=common_pid:key=timer_pid ...' >> event/trigger |
| + |
| +If a variable isn't a key variable or prefixed with 'vals=', the |
| +associated event field will be saved in a variable but won't be summed |
| +as a value: |
| + |
| + # echo 'hist:keys=next_pid:ts1=$common_timestamp ... >> event/trigger |
| + |
| +Multiple variables can be assigned at the same time. The below would |
| +result in both ts0 and b being created as variables, with both |
| +common_timestamp and field1 additionally being summed as values: |
| + |
| + # echo 'hist:keys=pid:vals=$ts0,$b:ts0=$common_timestamp,b=field1 ... >> \ |
| + event/trigger |
| + |
| +Note that variable assignments can appear either preceding or |
| +following their use. The command below behaves identically to the |
| +command above: |
| + |
| + # echo 'hist:keys=pid:ts0=$common_timestamp,b=field1:vals=$ts0,$b ... >> \ |
| + event/trigger |
| + |
| +Any number of variables not bound to a 'vals=' prefix can also be |
| +assigned by simply separating them with colons. Below is the same |
| +thing but without the values being summed in the histogram: |
| + |
| + # echo 'hist:keys=pid:ts0=$common_timestamp:b=field1 ... >> event/trigger |
| + |
| +Variables set as above can be referenced and used in expressions on |
| +another event. |
| + |
| +For example, here's how a latency can be calculated: |
| + |
| + # echo 'hist:keys=pid,prio:ts0=$common_timestamp ... >> event1/trigger |
| + # echo 'hist:keys=next_pid:wakeup_lat=$common_timestamp-$ts0 ... >> event2/trigger |
| + |
| +In the first line above, the event's timetamp is saved into the |
| +variable ts0. In the next line, ts0 is subtracted from the second |
| +event's timestamp to produce the latency, which is then assigned into |
| +yet another variable, 'wakeup_lat'. The hist trigger below in turn |
| +makes use of the wakeup_lat variable to compute a combined latency |
| +using the same key and variable from yet another event: |
| + |
| + # echo 'hist:key=pid:wakeupswitch_lat=$wakeup_lat+$switchtime_lat ... >> event3/trigger |
| + |
| +6.3.2 Synthetic Events |
| +---------------------- |
| + |
| +Synthetic events are user-defined events generated from hist trigger |
| +variables or fields associated with one or more other events. Their |
| +purpose is to provide a mechanism for displaying data spanning |
| +multiple events consistent with the existing and already familiar |
| +usage for normal events. |
| + |
| +To define a synthetic event, the user writes a simple specification |
| +consisting of the name of the new event along with one or more |
| +variables and their types, which can be any valid field type, |
| +separated by semicolons, to the tracing/synthetic_events file. |
| + |
| +For instance, the following creates a new event named 'wakeup_latency' |
| +with 3 fields: lat, pid, and prio. Each of those fields is simply a |
| +variable reference to a variable on another event: |
| + |
| + # echo 'wakeup_latency \ |
| + u64 lat; \ |
| + pid_t pid; \ |
| + int prio' >> \ |
| + /sys/kernel/debug/tracing/synthetic_events |
| + |
| +Reading the tracing/synthetic_events file lists all the currently |
| +defined synthetic events, in this case the event defined above: |
| + |
| + # cat /sys/kernel/debug/tracing/synthetic_events |
| + wakeup_latency u64 lat; pid_t pid; int prio |
| + |
| +An existing synthetic event definition can be removed by prepending |
| +the command that defined it with a '!': |
| + |
| + # echo '!wakeup_latency u64 lat pid_t pid int prio' >> \ |
| + /sys/kernel/debug/tracing/synthetic_events |
| + |
| +At this point, there isn't yet an actual 'wakeup_latency' event |
| +instantiated in the event subsytem - for this to happen, a 'hist |
| +trigger action' needs to be instantiated and bound to actual fields |
| +and variables defined on other events (see Section 6.3.3 below). |
| + |
| +Once that is done, an event instance is created, and a histogram can |
| +be defined using it: |
| + |
| + # echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat' >> \ |
| + /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger |
| + |
| +The new event is created under the tracing/events/synthetic/ directory |
| +and looks and behaves just like any other event: |
| + |
| + # ls /sys/kernel/debug/tracing/events/synthetic/wakeup_latency |
| + enable filter format hist id trigger |
| + |
| +Like any other event, once a histogram is enabled for the event, the |
| +output can be displayed by reading the event's 'hist' file. |
| + |
| +6.3.3 Hist trigger 'actions' |
| +---------------------------- |
| + |
| +A hist trigger 'action' is a function that's executed whenever a |
| +histogram entry is added or updated. |
| + |
| +The default 'action' if no special function is explicity specified is |
| +as it always has been, to simply update the set of values associated |
| +with an entry. Some applications, however, may want to perform |
| +additional actions at that point, such as generate another event, or |
| +compare and save a maximum. |
| + |
| +The following additional actions are available. To specify an action |
| +for a given event, simply specify the action between colons in the |
| +hist trigger specification. |
| + |
| + - onmatch(matching.event).<synthetic_event_name>(param list) |
| + |
| + The 'onmatch(matching.event).<synthetic_event_name>(params)' hist |
| + trigger action is invoked whenever an event matches and the |
| + histogram entry would be added or updated. It causes the named |
| + synthetic event to be generated with the values given in the |
| + 'param list'. The result is the generation of a synthetic event |
| + that consists of the values contained in those variables at the |
| + time the invoking event was hit. |
| + |
| + The 'param list' consists of one or more parameters which may be |
| + either variables or fields defined on either the 'matching.event' |
| + or the target event. The variables or fields specified in the |
| + param list may be either fully-qualified or unqualified. If a |
| + variable is specified as unqualified, it must be unique between |
| + the two events. A field name used as a param can be unqualified |
| + if it refers to the target event, but must be fully qualified if |
| + it refers to the matching event. A fully-qualified name is of the |
| + form 'system.event_name.$var_name' or 'system.event_name.field'. |
| + |
| + The 'matching.event' specification is simply the fully qualified |
| + event name of the event that matches the target event for the |
| + onmatch() functionality, in the form 'system.event_name'. |
| + |
| + Finally, the number and type of variables/fields in the 'param |
| + list' must match the number and types of the fields in the |
| + synthetic event being generated. |
| + |
| + As an example the below defines a simple synthetic event and uses |
| + a variable defined on the sched_wakeup_new event as a parameter |
| + when invoking the synthetic event. Here we define the synthetic |
| + event: |
| + |
| + # echo 'wakeup_new_test pid_t pid' >> \ |
| + /sys/kernel/debug/tracing/synthetic_events |
| + |
| + # cat /sys/kernel/debug/tracing/synthetic_events |
| + wakeup_new_test pid_t pid |
| + |
| + The following hist trigger both defines the missing testpid |
| + variable and specifies an onmatch() action that generates a |
| + wakeup_new_test synthetic event whenever a sched_wakeup_new event |
| + occurs, which because of the 'if comm == "cyclictest"' filter only |
| + happens when the executable is cyclictest: |
| + |
| + # echo 'hist:keys=$testpid:testpid=pid:onmatch(sched.sched_wakeup_new).\ |
| + wakeup_new_test($testpid) if comm=="cyclictest"' >> \ |
| + /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger |
| + |
| + Creating and displaying a histogram based on those events is now |
| + just a matter of using the fields and new synthetic event in the |
| + tracing/events/synthetic directory, as usual: |
| + |
| + # echo 'hist:keys=pid:sort=pid' >> \ |
| + /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/trigger |
| + |
| + Running 'cyclictest' should cause wakeup_new events to generate |
| + wakeup_new_test synthetic events which should result in histogram |
| + output in the wakeup_new_test event's hist file: |
| + |
| + # cat /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/hist |
| + |
| + A more typical usage would be to use two events to calculate a |
| + latency. The following example uses a set of hist triggers to |
| + produce a 'wakeup_latency' histogram: |
| + |
| + First, we define a 'wakeup_latency' synthetic event: |
| + |
| + # echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \ |
| + /sys/kernel/debug/tracing/synthetic_events |
| + |
| + Next, we specify that whenever we see a sched_waking event for a |
| + cyclictest thread, save the timestamp in a 'ts0' variable: |
| + |
| + # echo 'hist:keys=$saved_pid:saved_pid=pid:ts0=$common_timestamp.usecs \ |
| + if comm=="cyclictest"' >> \ |
| + /sys/kernel/debug/tracing/events/sched/sched_waking/trigger |
| + |
| + Then, when the corresponding thread is actually scheduled onto the |
| + CPU by a sched_switch event, calculate the latency and use that |
| + along with another variable and an event field to generate a |
| + wakeup_latency synthetic event: |
| + |
| + # echo 'hist:keys=next_pid:wakeup_lat=$common_timestamp.usecs-$ts0:\ |
| + onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,\ |
| + $saved_pid,next_prio) if next_comm=="cyclictest"' >> \ |
| + /sys/kernel/debug/tracing/events/sched/sched_switch/trigger |
| + |
| + We also need to create a histogram on the wakeup_latency synthetic |
| + event in order to aggregate the generated synthetic event data: |
| + |
| + # echo 'hist:keys=pid,prio,lat:sort=pid,lat' >> \ |
| + /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger |
| + |
| + Finally, once we've run cyclictest to actually generate some |
| + events, we can see the output by looking at the wakeup_latency |
| + synthetic event's hist file: |
| + |
| + # cat /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/hist |
| + |
| + - onmax(var).save(field,.. .) |
| + |
| + The 'onmax(var).save(field,...)' hist trigger action is invoked |
| + whenever the value of 'var' associated with a histogram entry |
| + exceeds the current maximum contained in that variable. |
| + |
| + The end result is that the trace event fields specified as the |
| + onmax.save() params will be saved if 'var' exceeds the current |
| + maximum for that hist trigger entry. This allows context from the |
| + event that exhibited the new maximum to be saved for later |
| + reference. When the histogram is displayed, additional fields |
| + displaying the saved values will be printed. |
| + |
| + As an example the below defines a couple of hist triggers, one for |
| + sched_waking and another for sched_switch, keyed on pid. Whenever |
| + a sched_waking occurs, the timestamp is saved in the entry |
| + corresponding to the current pid, and when the scheduler switches |
| + back to that pid, the timestamp difference is calculated. If the |
| + resulting latency, stored in wakeup_lat, exceeds the current |
| + maximum latency, the values specified in the save() fields are |
| + recoreded: |
| + |
| + # echo 'hist:keys=pid:ts0=$common_timestamp.usecs \ |
| + if comm=="cyclictest"' >> \ |
| + /sys/kernel/debug/tracing/events/sched/sched_waking/trigger |
| + |
| + # echo 'hist:keys=next_pid:\ |
| + wakeup_lat=$common_timestamp.usecs-$ts0:\ |
| + onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) \ |
| + if next_comm=="cyclictest"' >> \ |
| + /sys/kernel/debug/tracing/events/sched/sched_switch/trigger |
| + |
| + When the histogram is displayed, the max value and the saved |
| + values corresponding to the max are displayed following the rest |
| + of the fields: |
| + |
| + # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist |
| + { next_pid: 2255 } hitcount: 239 |
| + common_timestamp-ts0: 0 |
| + max: 27 |
| + next_comm: cyclictest |
| + prev_pid: 0 prev_prio: 120 prev_comm: swapper/1 |
| + |
| + { next_pid: 2256 } hitcount: 2355 |
| + common_timestamp-ts0: 0 |
| + max: 49 next_comm: cyclictest |
| + prev_pid: 0 prev_prio: 120 prev_comm: swapper/0 |
| + |
| + Totals: |
| + Hits: 12970 |
| + Entries: 2 |
| + Dropped: 0 |