perf, tools, script: Add brstackasm output for branch stacks
Implement printing of full disassembled sequences for branch stacks in
perf script. This makes it possible to print hot paths for individual
samples directly, together with branch misprediction and cycle count /
IPC information if available (on Skylake systems).
% perf record -b ...
% perf script -F brstackasm
...
000055b55d1147d0 pushq %rbp
000055b55d1147d1 pushq %r15
000055b55d1147d3 pushq %r14
000055b55d1147d5 pushq %r13
000055b55d1147d7 pushq %r12
000055b55d1147d9 pushq %rbx
000055b55d1147da sub $0x18, %rsp
000055b55d1147de mov %r8, %r13
000055b55d1147e1 mov %rcx, %rbp
000055b55d1147e4 mov %rdx, %r14
000055b55d1147e7 mov %rsi, %r15
000055b55d1147ea mov %rdi, %rbx
000055b55d1147ed movl $0x0, 0xc(%rsp)
000055b55d1147f5 movq (%rbp), %rax
000055b55d1147f9 test $0x1, %al
000055b55d1147fb jnz 0x55b55d114890 # PRED 4 cycles 3.75 IPC
000055b55d114890 mov %eax, %ecx
000055b55d114892 and $0x3, %ecx
000055b55d114895 cmp $0x1, %rcx
000055b55d114899 jnz 0x55b55d1148f8
000055b55d11489b movq -0x1(%rax), %rcx
000055b55d11489f cmpb $0x81, 0xb(%rcx)
000055b55d1148a3 jnz 0x55b55d1148fe # PRED 1 cycles 6.00 IPC
...
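To make the listing easier to read, here is a minimal sketch of how a
branch stack can be turned into straight-line blocks and how an IPC figure
can be derived. It is an illustration, not the code in this patch: the
BranchEntry layout, the function names and the call-site address in the
usage note are assumptions. The figures in the sample are consistent with
dividing the number of instructions before the closing branch by the cycle
count (15 / 4 = 3.75 and 6 / 1 = 6.00); that reading is inferred from the
output above, not stated by the patch.

```python
from typing import List, NamedTuple, Tuple


class BranchEntry(NamedTuple):
    """One branch stack record (hypothetical layout for this sketch)."""
    from_addr: int  # address of the branch instruction
    to_addr: int    # address the branch jumped to
    cycles: int     # cycles attributed to this record, 0 if unknown


def blocks_from_branch_stack(
        entries: List[BranchEntry]) -> List[Tuple[int, int, int]]:
    """Return (start, end, cycles) per straight-line block, oldest first.

    `entries` is ordered newest first. The code executed between two
    consecutive branches starts at the target of the older branch and
    ends at the source of the newer one.
    """
    blocks = []
    for i in range(len(entries) - 2, -1, -1):
        older, newer = entries[i + 1], entries[i]
        blocks.append((older.to_addr, newer.from_addr, newer.cycles))
    return blocks


def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle; 0.0 when no cycle count is available."""
    return instructions / cycles if cycles else 0.0
```

Fed with the two annotated jumps from the sample plus a made-up call site
that targets 0x55b55d1147d0, blocks_from_branch_stack() yields the two
ranges shown above: 0x55b55d1147d0..0x55b55d1147fb and
0x55b55d114890..0x55b55d1148a3.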
Open issues:
- Occasionally the path does not reach up to the sample IP, as the LBRs
may be frozen earlier.
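One way to picture this issue is a sanity check on the final stretch
between the target of the newest branch and the sample IP. The sketch
below is hypothetical: the function name and the max_len limit are
assumptions for illustration and are not taken from this patch.

```python
from typing import Optional, Tuple


def tail_block(last_target: int, sample_ip: int,
               max_len: int = 1024) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the final block, or None if implausible.

    If the LBRs were frozen before the sample was taken, the sample IP
    can lie before the newest branch target or far beyond it. In that
    case no straight-line path connects the two. `max_len` is an
    arbitrary limit chosen for this sketch.
    """
    if sample_ip < last_target:
        return None
    if sample_ip - last_target > max_len:
        return None
    return (last_target, sample_ip)
```

A caller could print the final block only when tail_block() returns a
range and otherwise stop at the newest recorded branch.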
v2:
Use low level abstracted disassembler interface.
Print symbols and source lines as labels.
Print first jump in LBR too.
Patch up blocks with filtered ring transfers.
Show IPC.
Lots of cleanups and improvements.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2 files changed