| Link time optimization (LTO) for the Linux kernel |
| |
| This is an experimental feature. |
| |
| Link Time Optimization allows the compiler to optimize the complete program |
| instead of just each file. LTO requires at least gcc 4.8 (but |
| works more efficiently with 4.9+) LTO requires Linux binutils (the normal FSF |
| releases used in many distributions do not work at the moment) |
| |
| The compiler can inline functions between files and do various other global |
| optimizations, like specializing functions for common parameters, |
| determing when global variables are clobbered, making functions pure/const, |
| propagating constants globally, removing unneeded data and others. |
| |
| It will also drop unused functions which can make the kernel |
| image smaller in some circumstances, in particular for small kernel |
| configurations. |
| |
| For small monolithic kernels it can throw away unused code very effectively |
| (especially when modules are disabled) and usually shrinks |
| the code size. |
| |
| Build time and memory consumption at build time will increase, depending |
| on the size of the largest binary. Modular kernels are less affected. |
| With LTO incremental builds are less incremental, as always the whole |
| binary needs to be re-optimized (but not re-parsed) |
| |
| Oops can be somewhat more difficult to read, due to the more aggressive |
| inlining. |
| |
| Normal "reasonable" builds work with less than 4GB of RAM, but very large |
| configurations like allyesconfig may need more memory. The actual |
| memory needed depends on the available memory (gcc sizes its garbage |
| collector pools based on that or on the ulimit -m limits) and |
| the compiler version. |
| |
| gcc 4.9+ has much better build performance and less memory consumption |
| |
| - A few kernel features are currently incompatible with LTO, in particular |
| function tracing, because they require special compiler flags for |
| specific files, which is not supported in LTO right now. |
| - Jobserver control for -j does not work correctly for the final |
| LTO phase due to some problems with the kernel's pipe code. |
| The makefiles hard codes -j<number of online cpus> for the final |
| LTO phase to work around for this |
| |
| Configuration: |
| - Enable CONFIG_LTO_MENU and then disable CONFIG_LTO_DISABLE. |
| This is mainly to not have allyesconfig default to LTO. |
| - FUNCTION_TRACER, STACK_TRACER, FUNCTION_GRAPH_TRACER, KALLSYMS_ALL, GCOV |
| have to disabled because they are currently incompatible with LTO. |
| - MODVERSIONS have to be disabled (may work with 4.9+) |
| |
| Requirements: |
| - Enough memory: 4GB for a standard build, more for allyesconfig |
| The peak memory usage happens single threaded (when lto-wpa merges types), |
| so dialing back -j options will not help much. |
| |
| A 32bit compiler is unlikely to work due to the memory requirements. |
| You can however build a kernel targeted at 32bit on a 64bit host. |
| |
| Example build procedure: |
| |
| Simplified procedure for distributions that have gcc 4.8, but not |
| the Linux binutils (for example openSUSE 13.1 or FC20): |
| |
| The LTO builds requires gcc-nm/gcc-ar. Some distributions ship |
| those in separate packages, which may need to be explicitely installed. |
| |
| - Get the latest Linux binutils from |
| http://www.kernel.org/pub/linux/devel/binutils/ |
| and unpack it. |
| |
| We install it in a separate directory to not overwrite the system binutils. |
| |
| # replace VERSION with respective version numbers |
| |
| cd binutils* |
| # don't forget the --enable-plugins! |
| ./configure --prefix=/opt/binutils-VERSION --enable-plugins |
| make -j $(getconf _NPROCESSORS_ONLN) && sudo make install |
| |
| Fix up the kernel configuration to allow LTO: |
| |
| <start with a suitable kernel configuration> |
| ./source/scripts/config --disable function_tracer \ |
| --disable function_graph_tracer \ |
| --disable stack_tracer --enable lto_menu \ |
| --disable lto_disable \ |
| --disable gcov \ |
| --disable kallsyms_all \ |
| --disable modversions |
| make oldconfig |
| |
| Then you can build with |
| |
| # The COMPILER_PATH is needed to let gcc use the new binutils |
| # as the LTO plugin linker |
| # if you installed gcc in a separate directory like below also |
| # add it to the PATH line below before the regular $PATH |
| # The COMPILER_PATH setting is only needed if the gcc was not built |
| # with --with-plugin-ld pointing to the Linux binutils ld |
| # The AR/NM setting works around a Makefile bug |
| COMPILER_PATH=/opt/binutils-VERSION/bin PATH=$COMPILER_PATH:$PATH \ |
| make -j$(getconf _NPROCESSORS_ONLN) AR=gcc-ar NM=gcc-nm |
| |
| If you don't have gcc 4.8+ as system compiler you would also need |
| to install that compiler. In this case I recommend getting |
| a gcc 4.9+ snapshot from http://gcc.gnu.org (or release when available), |
| as it builds much faster for LTO than 4.8. |
| |
| Here's an example build procedure: |
| |
| Assuming gcc is unpacked in gcc-VERSION |
| |
| cd gcc-VERSION |
| ./contrib/download_preqrequisites |
| cd .. |
| |
| mkdir obj-gcc |
| # please don't skip this cd. the build will not work correctly in the |
| # source dir, you have to use the separate object dir |
| cd obj-gcc |
| ../gcc-VERSION/configure --prefix=/opt/gcc-VERSION --enable-lto \ |
| --with-plugin-ld=/opt/binutils-VERSION/bin/ld |
| --disable-nls --enable-languages=c,c++ \ |
| --disable-libstdcxx-pch |
| make -j$(getconf _NPROCESSORS_ONLN) |
| sudo make install-no-fixedincludes |
| |
| FAQs: |
| |
| Q: I get a section type attribute conflict |
| A: Usually because of someone doing |
| const __initdata (should be const __initconst) or const __read_mostly |
| (should be just const). Check both symbols reported by gcc. |
| |
| Q: I see lots of undefined symbols for memcmp etc. |
| A: Usually because NM=gcc-nm AR=gcc-ar are missing. |
| The Makefile tries to set those automatically, but it doesn't always |
| work. Better to set it manually on the make command line. |
| |
| Q: It's quite slow / uses too much memory. |
| A: Consider a gcc 4.9 snapshot/release (not released yet) |
| The main problem in 4.8 is the type merging in the single threaded WPA pass, |
| which has been improved considerably in 4.9 by running it distributed. |
| |
| Q: It's still slow |
| A: It'll always be somewhat slower than non LTO sorry. |
| |
| Q: What's up with .XXXXX numeric post fixes |
| A: This is due LTO turning (near) all symbols to static |
| Use gcc 4.9, it avoids them in most cases. They are also filtered out |
| in kallsyms. |
| |
| References: |
| |
| Presentation on Kernel LTO |
| (note, performance numbers/details outdated. In particular gcc 4.9 fixed |
| most of the build time problems): |
| http://halobates.de/kernel-lto.pdf |
| |
| Generic gcc LTO: |
| http://www.ucw.cz/~hubicka/slides/labs2013.pdf |
| http://www.hipeac.net/system/files/barcelona.pdf |
| |
| Somewhat outdated too: |
| http://gcc.gnu.org/projects/lto/lto.pdf |
| http://gcc.gnu.org/projects/lto/whopr.pdf |
| |
| Happy Link-Time-Optimizing! |
| |
| Andi Kleen |