| For a higher level overview, try: perf report --sort comm,dso | 
 | Sample related events with: perf record -e '{cycles,instructions}:S' | 
 | Compare performance results with: perf diff [<old file> <new file>] | 
 | Boolean options have negative forms, e.g.: perf report --no-children | 
 | To not accumulate CPU time of children symbols add --no-children | 
 | Customize output of perf script with: perf script -F event,ip,sym | 
 | Generate a script for your data: perf script -g <lang> | 
 | Save output of perf stat using: perf stat record <target workload> | 
 | Create an archive with symtabs to analyse on another machine: perf archive | 
 | Search options using a keyword: perf report -h <keyword> | 
 | Use parent filter to see specific call path: perf report -p <regex> | 
 | List events using substring match: perf list <keyword> | 
 | To see the list of saved events and attributes: perf evlist -v | 
 | Use --symfs <dir> if your symbol files are in non-standard locations | 
 | To see callchains in a more compact form: perf report -g folded | 
 | To see call chains by final symbol taking CPU time (bottom up) use perf report -G | 
 | Show individual samples with: perf script | 
 | Limit to show entries above 5% only: perf report --percent-limit 5 | 
 | Profile branch (mis)predictions with: perf record -b / perf report | 
 | To show assembler sample context control flow use perf record -b / perf report --samples 10 and then browse context | 
 | To adjust path to source files to local file system use perf report --prefix=... --prefix-strip=... | 
 | Treat branches as callchains: perf record -b ... ; perf report --branch-history | 
 | To show estimated cycles per function and IPC in annotate use perf record -b ... ; perf report --total-cycles | 
 | To count events every 1000 msec: perf stat -I 1000 | 
 | Print event counts in machine readable CSV format with: perf stat -x\; | 
 | If you have debuginfo enabled, try: perf report -s sym,srcline | 
 | For memory address profiling, try: perf mem record / perf mem report | 
 | For tracepoint events, try: perf report -s trace_fields | 
 | To record callchains for each sample: perf record -g | 
 | If call chains don't work try perf record --call-graph dwarf or --call-graph lbr | 
 | To record every process run by a user: perf record -u <user> | 
 | To show inline functions in call traces add --inline to perf report | 
 | To not record events from perf itself add --exclude-perf | 
 | Skip collecting build-id when recording: perf record -B | 
 | To change sampling frequency to 100 Hz: perf record -F 100 | 
 | To show information about the system the samples were collected on use perf report --header | 
 | To only collect call graph on one event use perf record -e cpu/cpu-cycles,callgraph=1/,branches ; perf report --show-ref-call-graph | 
 | To set sampling period of individual events use perf record -e cpu/cpu-cycles,period=100001/,cpu/branches,period=10001/ ... | 
 | To group events which need to be collected together for accuracy use {}: perf record -e '{cycles,branches}' ... | 
 | To compute metrics for samples use perf record -e '{cycles,instructions}' ... ; perf script -F +metric | 
 | See assembly instructions with percentage: perf annotate <symbol> | 
 | If you prefer Intel style assembly, try: perf annotate -M intel | 
 | When collecting LBR backtraces use --stitch-lbr to handle more than 32 deep entries: perf record --call-graph lbr ; perf report --stitch-lbr | 
 | For hierarchical output, try: perf report --hierarchy | 
 | Order by the overhead of source file name and line number: perf report -s srcline | 
 | System-wide collection from all CPUs: perf record -a | 
 | Show current config key-value pairs: perf config --list | 
 | To collect Processor Trace with samples use perf record -e '{intel_pt//,cycles}' ; perf script --call-trace or --insn-trace --xed -F +ipc (remove --xed if no xed) | 
 | To trace calls using Processor Trace use perf record -e intel_pt// ... ; perf script --call-trace. Then use perf script --time A-B --insn-trace to look at region of interest. | 
 | To measure approximate function latency with Processor Trace use perf record -e intel_pt// ... ; perf script --call-ret-trace | 
 | To trace only a single function with Processor Trace use perf record --filter 'filter func @ program' -e intel_pt//u ./program ; perf script --insn-trace | 
 | Show user configuration overrides: perf config --user --list | 
 | To add Node.js USDT (User-Level Statically Defined Tracing): perf buildid-cache --add `which node` | 
 | To analyze cache line scalability issues use perf c2c record ... ; perf c2c report | 
 | To browse sample contexts use perf report --samples 10 and select in context menu | 
 | To separate samples by time use perf report --sort time,overhead,sym | 
 | To filter a subset of samples with report or script add --time X-Y or --cpu A,B,C or --socket-filter ... | 
 | To set sample time separation other than 100ms with --sort time use --time-quantum | 
 | Add -I to perf record to sample register values, which will be visible in perf report sample context. | 
 | To show IPC for sampling periods use perf record -e '{cycles,instructions}:S' and then browse context | 
 | To show context switches in perf report sample context add --switch-events to perf record. | 
 | To show time in nanoseconds in record/report add --ns | 
 | To compare hot regions in two workloads use perf record -b -o file ... ; perf diff --stream file1 file2 | 
 | To compare scalability of two workload samples use perf diff -c ratio file1 file2 | 
 | For latency profiling, try: perf record/report --latency | 
 | For parallelism histogram, try: perf report --hierarchy --sort latency,parallelism,comm,symbol | 
 | To analyze particular parallelism levels, try: perf report --latency --parallelism=32-64 | 
 | To see how parallelism changes over time, try: perf report -F time,latency,parallelism --time-quantum=1s |
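
The machine readable output from perf stat -x\; is meant to be post-processed. As a minimal sketch, the helper below derives instructions per cycle from that output. The function name ipc_from_perf_csv is hypothetical (it is not part of perf), and the sketch assumes the default field order documented for -x when neither -I nor per-CPU aggregation is used: counter value, unit, event name, run time, percentage of time running.

```shell
# Hypothetical helper (not part of perf): read `perf stat -x\;` lines on stdin
# and print instructions per cycle.
# Assumed field order (no -I, no per-CPU options):
#   value;unit;event;run-time;percent-running;...
ipc_from_perf_csv() {
    awk -F';' '
        # Match "cycles" and "instructions", including suffixed forms like cycles:u.
        $3 ~ /^cycles/       { cycles = $1 }
        $3 ~ /^instructions/ { insns  = $1 }
        END {
            # "+ 0" forces numeric comparison, so "<not counted>" is treated as 0.
            if (cycles + 0 > 0) {
                printf "%.2f\n", insns / cycles
            } else {
                print "no usable cycles count found" > "/dev/stderr"
                exit 1
            }
        }'
}

# Possible use: perf stat prints counts on stderr, so send stderr into the pipe
# and discard the workload's own stdout.
# perf stat -x\; -e cycles,instructions -- <target workload> 2>&1 >/dev/null | ipc_from_perf_csv
```

Event names can differ between systems (hybrid CPUs, for example, may report PMU-prefixed names), so the patterns in the sketch may need adjusting.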