dwarves_fprintf: Fixup cacheline boundary printing on expanded structs

A diff for 'pahole -EC task_struct vmlinux' should clarify what this fixes:

  [acme@jouet linux]$ diff -u /tmp/before.c /tmp/after.c | head -30
  --- /tmp/before.c	2016-06-29 17:00:38.082647281 -0300
  +++ /tmp/a.c	2016-06-29 17:03:36.913124779 -0300
  @@ -43,8 +43,8 @@
 			struct list_head * prev;                                         /*   176     8 */
 		} group_node; /*   168    16 */
 		unsigned int       on_rq;                                                /*   184     4 */
  +		/* --- cacheline 3 boundary (192 bytes) --- */
 		/* typedef u64 */ long long unsigned int exec_start;                     /*   192     8 */
  -		/* --- cacheline 1 boundary (64 bytes) was 4 bytes ago --- */
 		/* typedef u64 */ long long unsigned int sum_exec_runtime;               /*   200     8 */
 		/* typedef u64 */ long long unsigned int vruntime;                       /*   208     8 */
 		/* typedef u64 */ long long unsigned int prev_sum_exec_runtime;          /*   216     8 */
  @@ -53,40 +53,40 @@
 			/* typedef u64 */ long long unsigned int wait_start;             /*   232     8 */
 			/* typedef u64 */ long long unsigned int wait_max;               /*   240     8 */
 			/* typedef u64 */ long long unsigned int wait_count;             /*   248     8 */
  +			/* --- cacheline 4 boundary (256 bytes) --- */
 			/* typedef u64 */ long long unsigned int wait_sum;               /*   256     8 */
 			/* typedef u64 */ long long unsigned int iowait_count;           /*   264     8 */
 			/* typedef u64 */ long long unsigned int iowait_sum;             /*   272     8 */
 			/* typedef u64 */ long long unsigned int sleep_start;            /*   280     8 */
 			/* typedef u64 */ long long unsigned int sleep_max;              /*   288     8 */
  -			/* --- cacheline 1 boundary (64 bytes) --- */
 			/* typedef s64 */ long long int sum_sleep_runtime;               /*   296     8 */
 			/* typedef u64 */ long long unsigned int block_start;            /*   304     8 */
 			/* typedef u64 */ long long unsigned int block_max;              /*   312     8 */
  +			/* --- cacheline 5 boundary (320 bytes) --- */
 			/* typedef u64 */ long long unsigned int exec_max;               /*   320     8 */
 			/* typedef u64 */ long long unsigned int slice_max;              /*   328     8 */
 			/* typedef u64 */ long long unsigned int nr_migrations_cold;     /*   336     8 */
  [acme@jouet linux]$

I.e. the boundary detection was being reset at each expanded struct. Do the
math globally instead, using the member offset, which was already being
computed globally and correctly.
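
The idea can be sketched as below. This is only an illustration of doing the
boundary math from the absolute member offset, not the actual dwarves_fprintf
code: cacheline_index(), boundary_comment() and the last_cacheline state are
hypothetical names, and the cacheline size is assumed to be passed in by the
caller (64 in the output above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not a dwarves function): index of the cacheline
 * that holds the byte at absolute offset 'offset'. */
static size_t cacheline_index(size_t offset, size_t cacheline_size)
{
	assert(cacheline_size > 0);
	return offset / cacheline_size;
}

/*
 * Hypothetical helper: if the member at absolute 'offset' is the first one
 * seen in a new cacheline, format the boundary comment into 'buf' and return
 * the snprintf() result; otherwise return 0 and leave 'buf' untouched.
 *
 * '*last_cacheline' is shared by the outermost struct and every expanded
 * struct inside it, so it is never reset when descending into a member.
 */
static int boundary_comment(char *buf, size_t len, size_t offset,
			    size_t cacheline_size, size_t *last_cacheline)
{
	size_t cacheline = cacheline_index(offset, cacheline_size);
	size_t bytes_past = offset % cacheline_size;

	if (cacheline <= *last_cacheline)
		return 0;

	*last_cacheline = cacheline;

	if (bytes_past == 0)
		return snprintf(buf, len,
				"/* --- cacheline %zu boundary (%zu bytes) --- */",
				cacheline, cacheline * cacheline_size);

	/* The member starts past the boundary, say by how much. */
	return snprintf(buf, len,
			"/* --- cacheline %zu boundary (%zu bytes) was %zu bytes ago --- */",
			cacheline, cacheline * cacheline_size, bytes_past);
}
```

With a 64 byte cacheline and last_cacheline at 2, calling boundary_comment()
for offset 184 (on_rq) produces nothing, while offset 192 (exec_start)
produces the "cacheline 3 boundary (192 bytes)" line seen in the 'after'
output, no matter how deeply nested the member is.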

Reported-and-Tested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>