blob: 5dcce1e9cc2563f2118c138c6aea643859f01d22 [file] [log] [blame]
Link time optimization (LTO) for the Linux kernel
This is an experimental feature.
Link Time Optimization allows the compiler to optimize the complete program
instead of just each file. LTO requires at least gcc 4.8 (but
works more efficiently with 4.9+) LTO requires Linux binutils (the normal FSF
releases used in many distributions do not work at the moment)
The compiler can inline functions between files and do various other global
optimizations, like specializing functions for common parameters,
determing when global variables are clobbered, making functions pure/const,
propagating constants globally, removing unneeded data and others.
It will also drop unused functions which can make the kernel
image smaller in some circumstances, in particular for small kernel
configurations.
For small monolithic kernels it can throw away unused code very effectively
(especially when modules are disabled) and usually shrinks
the code size.
Build time and memory consumption at build time will increase, depending
on the size of the largest binary. Modular kernels are less affected.
With LTO incremental builds are less incremental, as always the whole
binary needs to be re-optimized (but not re-parsed)
Oops can be somewhat more difficult to read, due to the more aggressive
inlining.
Normal "reasonable" builds work with less than 4GB of RAM, but very large
configurations like allyesconfig may need more memory. The actual
memory needed depends on the available memory (gcc sizes its garbage
collector pools based on that or on the ulimit -m limits) and
the compiler version.
gcc 4.9+ has much better build performance and less memory consumption
- A few kernel features are currently incompatible with LTO, in particular
function tracing, because they require special compiler flags for
specific files, which is not supported in LTO right now.
- Jobserver control for -j does not work correctly for the final
LTO phase due to some problems with the kernel's pipe code.
The makefiles hard codes -j<number of online cpus> for the final
LTO phase to work around for this
Configuration:
- Enable CONFIG_LTO_MENU and then disable CONFIG_LTO_DISABLE.
This is mainly to not have allyesconfig default to LTO.
- FUNCTION_TRACER, STACK_TRACER, FUNCTION_GRAPH_TRACER, KALLSYMS_ALL, GCOV
have to disabled because they are currently incompatible with LTO.
- MODVERSIONS have to be disabled (may work with 4.9+)
Requirements:
- Enough memory: 4GB for a standard build, more for allyesconfig
The peak memory usage happens single threaded (when lto-wpa merges types),
so dialing back -j options will not help much.
A 32bit compiler is unlikely to work due to the memory requirements.
You can however build a kernel targeted at 32bit on a 64bit host.
Example build procedure:
Simplified procedure for distributions that have gcc 4.8, but not
the Linux binutils (for example openSUSE 13.1 or FC20):
The LTO builds requires gcc-nm/gcc-ar. Some distributions ship
those in separate packages, which may need to be explicitely installed.
- Get the latest Linux binutils from
http://www.kernel.org/pub/linux/devel/binutils/
and unpack it.
We install it in a separate directory to not overwrite the system binutils.
# replace VERSION with respective version numbers
cd binutils*
# don't forget the --enable-plugins!
./configure --prefix=/opt/binutils-VERSION --enable-plugins
make -j $(getconf _NPROCESSORS_ONLN) && sudo make install
Fix up the kernel configuration to allow LTO:
<start with a suitable kernel configuration>
./source/scripts/config --disable function_tracer \
--disable function_graph_tracer \
--disable stack_tracer --enable lto_menu \
--disable lto_disable \
--disable gcov \
--disable kallsyms_all \
--disable modversions
make oldconfig
Then you can build with
# The COMPILER_PATH is needed to let gcc use the new binutils
# as the LTO plugin linker
# if you installed gcc in a separate directory like below also
# add it to the PATH line below before the regular $PATH
# The COMPILER_PATH setting is only needed if the gcc was not built
# with --with-plugin-ld pointing to the Linux binutils ld
# The AR/NM setting works around a Makefile bug
COMPILER_PATH=/opt/binutils-VERSION/bin PATH=$COMPILER_PATH:$PATH \
make -j$(getconf _NPROCESSORS_ONLN) AR=gcc-ar NM=gcc-nm
If you don't have gcc 4.8+ as system compiler you would also need
to install that compiler. In this case I recommend getting
a gcc 4.9+ snapshot from http://gcc.gnu.org (or release when available),
as it builds much faster for LTO than 4.8.
Here's an example build procedure:
Assuming gcc is unpacked in gcc-VERSION
cd gcc-VERSION
./contrib/download_preqrequisites
cd ..
mkdir obj-gcc
# please don't skip this cd. the build will not work correctly in the
# source dir, you have to use the separate object dir
cd obj-gcc
../gcc-VERSION/configure --prefix=/opt/gcc-VERSION --enable-lto \
--with-plugin-ld=/opt/binutils-VERSION/bin/ld
--disable-nls --enable-languages=c,c++ \
--disable-libstdcxx-pch
make -j$(getconf _NPROCESSORS_ONLN)
sudo make install-no-fixedincludes
FAQs:
Q: I get a section type attribute conflict
A: Usually because of someone doing
const __initdata (should be const __initconst) or const __read_mostly
(should be just const). Check both symbols reported by gcc.
Q: I see lots of undefined symbols for memcmp etc.
A: Usually because NM=gcc-nm AR=gcc-ar are missing.
The Makefile tries to set those automatically, but it doesn't always
work. Better to set it manually on the make command line.
Q: It's quite slow / uses too much memory.
A: Consider a gcc 4.9 snapshot/release (not released yet)
The main problem in 4.8 is the type merging in the single threaded WPA pass,
which has been improved considerably in 4.9 by running it distributed.
Q: It's still slow
A: It'll always be somewhat slower than non LTO sorry.
Q: What's up with .XXXXX numeric post fixes
A: This is due LTO turning (near) all symbols to static
Use gcc 4.9, it avoids them in most cases. They are also filtered out
in kallsyms.
References:
Presentation on Kernel LTO
(note, performance numbers/details outdated. In particular gcc 4.9 fixed
most of the build time problems):
http://halobates.de/kernel-lto.pdf
Generic gcc LTO:
http://www.ucw.cz/~hubicka/slides/labs2013.pdf
http://www.hipeac.net/system/files/barcelona.pdf
Somewhat outdated too:
http://gcc.gnu.org/projects/lto/lto.pdf
http://gcc.gnu.org/projects/lto/whopr.pdf
Happy Link-Time-Optimizing!
Andi Kleen