pahole: Encode BTF serially in a reproducible build
Now we will ask the cus instance for the next processable CU, i.e. one
that is loaded and is in the same CU order as in the original DWARF
file, under the BTF lock.
With this we can keep loading the DWARF file in parallel and serialize
only the BTF encoding, keeping that order. As a result the BTF ids end
up the same as with a fully serial encoding.
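To illustrate the idea, here is a minimal sketch, not the actual pahole
code: worker threads "load" CUs in parallel and, under a lock standing in
for the BTF lock, encode every CU that is both loaded and next in the
original order. All names in it (struct cus_sketch, load_cu(), worker(),
encode_in_order(), MAX_CUS, MAX_THREADS) are made up for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CUS     256	/* hypothetical limits for this sketch */
#define MAX_THREADS 64

/* Hypothetical stand-in for the shared state, not pahole's real structs. */
struct cus_sketch {
	pthread_mutex_t claim_lock;	/* hands out CU indexes to loader threads */
	pthread_mutex_t btf_lock;	/* serializes the "BTF encoding" step */
	int  nr_cus;
	int  next_to_load;		/* next CU index a thread may claim */
	int  next_to_encode;		/* next CU index allowed to be encoded */
	bool loaded[MAX_CUS];		/* CU i finished "DWARF loading" */
	int  encoded[MAX_CUS];		/* order in which CUs were encoded */
	int  nr_encoded;
};

/* Simulated DWARF loading: uneven work, so CUs finish out of order. */
static void load_cu(int idx)
{
	volatile unsigned long spin = 0;
	unsigned long rounds = (unsigned long)((idx * 7919) % 13 + 1) * 10000;

	while (spin < rounds)
		spin++;
}

static void *worker(void *arg)
{
	struct cus_sketch *cus = arg;

	for (;;) {
		int idx;

		/* Claim the next CU to load, in original file order. */
		pthread_mutex_lock(&cus->claim_lock);
		idx = cus->next_to_load < cus->nr_cus ? cus->next_to_load++ : -1;
		pthread_mutex_unlock(&cus->claim_lock);
		if (idx < 0)
			break;

		load_cu(idx);	/* runs in parallel with the other threads */

		/* Under the lock: encode every CU that is loaded and next in order. */
		pthread_mutex_lock(&cus->btf_lock);
		cus->loaded[idx] = true;
		while (cus->next_to_encode < cus->nr_cus &&
		       cus->loaded[cus->next_to_encode]) {
			assert(cus->nr_encoded < MAX_CUS);
			cus->encoded[cus->nr_encoded++] = cus->next_to_encode++;
		}
		pthread_mutex_unlock(&cus->btf_lock);
	}
	return NULL;
}

/*
 * Copies the encoding order into 'order' and returns the number of CUs
 * encoded, or -1 on bad arguments or if no thread could be started.
 */
int encode_in_order(int nr_cus, int nr_threads, int *order)
{
	struct cus_sketch cus;
	pthread_t threads[MAX_THREADS];
	int started = 0;

	if (nr_cus < 1 || nr_cus > MAX_CUS ||
	    nr_threads < 1 || nr_threads > MAX_THREADS)
		return -1;

	memset(&cus, 0, sizeof(cus));
	cus.nr_cus = nr_cus;
	pthread_mutex_init(&cus.claim_lock, NULL);
	pthread_mutex_init(&cus.btf_lock, NULL);

	for (int i = 0; i < nr_threads; i++) {
		if (pthread_create(&threads[started], NULL, worker, &cus) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(threads[i], NULL);

	pthread_mutex_destroy(&cus.claim_lock);
	pthread_mutex_destroy(&cus.btf_lock);

	if (started == 0)
		return -1;
	memcpy(order, cus.encoded, sizeof(int) * cus.nr_encoded);
	return cus.nr_encoded;
}
```

Calling encode_in_order(100, 8, order) from a main() (built with
-pthread) should fill order[] with the CU indexes in ascending order no
matter how the threads are scheduled, which is the property that keeps
the BTF ids stable in the sketch.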
And here are some numbers with a Release build:
$ cat buildcmd.sh
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
cd ..
make -j $(getconf _NPROCESSORS_ONLN) -C build
$ rm -rf build
$ ./buildcmd.sh
It's an Intel hybrid system, so tasks migrate between efficiency and
performance cores:
$ getconf _NPROCESSORS_ONLN
28
$ grep -m1 'model name' /proc/cpuinfo
model name : Intel(R) Core(TM) i7-14700K
$
8 performance cores (16 threads), 12 efficiency cores.
Serial encoding:
$ time perf stat -e cycles -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux
Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf.serial vmlinux' (5 runs):
13,313,169,305 cpu_atom/cycles:u/ ( +- 30.61% ) (0.00%)
27,985,776,096 cpu_core/cycles:u/ ( +- 0.17% ) (100.00%)
5.18276 +- 0.00952 seconds time elapsed ( +- 0.18% )
real 0m25.937s
user 0m25.337s
sys 0m0.533s
$
Parallel, but non-reproducible:
$ time perf stat -e cycles -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux
Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux' (5 runs):
65,781,092,442 cpu_atom/cycles:u/ ( +- 0.99% ) (42.99%)
88,578,827,055 cpu_core/cycles:u/ ( +- 0.90% ) (60.93%)
1.8529 +- 0.0159 seconds time elapsed ( +- 0.86% )
real 0m9.293s
user 1m21.599s
sys 0m11.348s
$
Now what we want: a reproducible build done using parallel DWARF loading
+ CUs-ordered-as-in-vmlinux serial BTF encoding:
$ time perf stat -e cycles -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux
Performance counter stats for 'pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux' (5 runs):
21,255,687,225 cpu_atom/cycles:u/ ( +- 0.76% ) (35.06%)
33,852,263,760 cpu_core/cycles:u/ ( +- 0.24% ) (72.70%)
2.3632 +- 0.0164 seconds time elapsed ( +- 0.69% )
real 0m11.840s
user 0m35.952s
sys 0m1.534s
$
Fastest is of course the non-reproducible, fully parallel DWARF loading/
BTF encoding at 1.8529 +- 0.0159 seconds, but doing a reproducible build
in 2.3632 +- 0.0164 seconds is still about 2.2x faster than completely
disabling -j (fully serial) at 5.18276 +- 0.00952 seconds.
Comparing the BTF generated:
$ bpftool btf dump file vmlinux.btf.serial > output.vmlinux.btf.serial
$ bpftool btf dump file vmlinux.btf.parallel > output.vmlinux.btf.parallel
$ bpftool btf dump file vmlinux.btf.parallel.reproducible > output.vmlinux.btf.parallel.reproducible
$ wc -l output.vmlinux.btf.serial output.vmlinux.btf.parallel output.vmlinux.btf.parallel.reproducible
313404 output.vmlinux.btf.serial
314345 output.vmlinux.btf.parallel
313404 output.vmlinux.btf.parallel.reproducible
941153 total
$
Non-reproducible parallel BTF encoding:
$ diff -u output.vmlinux.btf.serial output.vmlinux.btf.parallel | head
--- output.vmlinux.btf.serial 2024-04-02 11:11:56.665027947 -0300
+++ output.vmlinux.btf.parallel 2024-04-02 11:12:38.490895460 -0300
@@ -1,1708 +1,2553 @@
[1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
-[2] CONST '(anon)' type_id=1
-[3] VOLATILE '(anon)' type_id=2
-[4] ARRAY '(anon)' type_id=1 index_type_id=21 nr_elems=2
-[5] PTR '(anon)' type_id=8
-[6] CONST '(anon)' type_id=5
-[7] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
$
Reproducible:
$ diff -u output.vmlinux.btf.serial output.vmlinux.btf.parallel.reproducible
$
And using a test script that I'll add to a nascent repository of
regression tests:
$ time tests/reproducible_build.sh vmlinux
Parallel reproducible DWARF Loading/Serial BTF encoding: Ok
real 1m13.844s
user 3m3.601s
sys 0m9.049s
$
The script fails if the number of threads started by pahole differs
from what was requested via its -j command line option, or if the
output of 'bpftool btf dump' for the BTF encoded fully serially differs
from that of any of the detached BTF files encoded using reproducible
DWARF loading/BTF encoding.
In verbose mode:
$ time VERBOSE=1 tests/reproducible_build.sh vmlinux
Parallel reproducible DWARF Loading/Serial BTF encoding:
serial encoding...
1 threads encoding
1 threads started
diff from serial encoding:
-----------------------------
2 threads encoding
2 threads started
diff from serial encoding:
-----------------------------
3 threads encoding
3 threads started
diff from serial encoding:
-----------------------------
4 threads encoding
4 threads started
diff from serial encoding:
-----------------------------
5 threads encoding
5 threads started
diff from serial encoding:
-----------------------------
6 threads encoding
6 threads started
diff from serial encoding:
-----------------------------
7 threads encoding
7 threads started
diff from serial encoding:
-----------------------------
8 threads encoding
8 threads started
diff from serial encoding:
-----------------------------
9 threads encoding
9 threads started
diff from serial encoding:
-----------------------------
10 threads encoding
10 threads started
diff from serial encoding:
-----------------------------
11 threads encoding
11 threads started
diff from serial encoding:
-----------------------------
12 threads encoding
12 threads started
diff from serial encoding:
-----------------------------
13 threads encoding
13 threads started
diff from serial encoding:
-----------------------------
14 threads encoding
14 threads started
diff from serial encoding:
-----------------------------
15 threads encoding
15 threads started
diff from serial encoding:
-----------------------------
16 threads encoding
16 threads started
diff from serial encoding:
-----------------------------
17 threads encoding
17 threads started
diff from serial encoding:
-----------------------------
18 threads encoding
18 threads started
diff from serial encoding:
-----------------------------
19 threads encoding
19 threads started
diff from serial encoding:
-----------------------------
20 threads encoding
20 threads started
diff from serial encoding:
-----------------------------
21 threads encoding
21 threads started
diff from serial encoding:
-----------------------------
22 threads encoding
22 threads started
diff from serial encoding:
-----------------------------
23 threads encoding
23 threads started
diff from serial encoding:
-----------------------------
24 threads encoding
24 threads started
diff from serial encoding:
-----------------------------
25 threads encoding
25 threads started
diff from serial encoding:
-----------------------------
26 threads encoding
26 threads started
diff from serial encoding:
-----------------------------
27 threads encoding
27 threads started
diff from serial encoding:
-----------------------------
28 threads encoding
28 threads started
diff from serial encoding:
-----------------------------
Ok
real 1m14.800s
user 3m4.315s
sys 0m8.977s
$
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
1 file changed