JIT Compiler Design
Copyright (C) 2006 Pekka Enberg
This file is released under the GPLv2.
Introduction
============
The compiler is divided into the following passes: control-flow graph
construction, bytecode parsing, instruction selection, and code
emission. The compiler first analyzes the given bytecode sequence to
find the basic blocks from which the control-flow graph is constructed.
This pass is done first to simplify parsing of bytecode branches. The
bytecode sequence is then parsed and converted to an expression tree.
The tree is given to the instruction selector, which lowers it to
three-address code. The code emission phase converts the three-address
code to executable machine code.
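As an illustration of the first pass, the following is a minimal sketch
of how basic block boundaries could be found in a bytecode sequence.
It assumes a hypothetical toy bytecode (the OP_* opcodes and their
one-byte absolute branch operands are made up for this example) and is
not the compiler's actual implementation. A block starts at offset 0, at
every branch target, and after every branch or return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy bytecode, not a real instruction set. */
enum { OP_NOP, OP_GOTO, OP_IF, OP_RETURN };

/* Size in bytes of the instruction that starts with opcode op. */
size_t insn_size(unsigned char op)
{
    /* Branches carry a one-byte absolute target offset. */
    return (op == OP_GOTO || op == OP_IF) ? 2 : 1;
}

/*
 * Mark the offsets at which basic blocks start. leaders[] must have
 * room for len entries. Returns the number of basic blocks found.
 */
size_t find_leaders(const unsigned char *code, size_t len, bool *leaders)
{
    size_t pc, count = 0;

    memset(leaders, 0, len * sizeof(*leaders));
    if (len > 0)
        leaders[0] = true;

    for (pc = 0; pc < len; pc += insn_size(code[pc])) {
        unsigned char op = code[pc];
        size_t next = pc + insn_size(op);
        bool is_branch = (op == OP_GOTO || op == OP_IF);

        if (is_branch) {
            assert(next <= len);            /* operand byte must exist */
            if (code[pc + 1] < len)
                leaders[code[pc + 1]] = true;   /* branch target */
        }
        /* The instruction after a branch or return starts a new block. */
        if ((is_branch || op == OP_RETURN) && next < len)
            leaders[next] = true;
    }

    for (pc = 0; pc < len; pc++)
        count += leaders[pc];
    return count;
}
```

Knowing every block boundary up front means that, when the parser later
reaches a branch, the block it jumps to already exists.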
Programs are compiled one method at a time. Invocation of a method is
replaced with an invocation of a special per-method JIT trampoline
that is responsible for compiling the actual target method upon first
invocation.
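The lazy compilation scheme can be sketched in portable C with a
function pointer standing in for the generated trampoline code. All
names here (struct method, jit_compile, jit_trampoline, invoke) are
hypothetical, and the "compiler" is a stub that returns a fixed
function; the point is only that the first call compiles and later
calls go straight to the compiled code.

```c
#include <assert.h>
#include <stddef.h>

struct method;
typedef int (*method_fn)(struct method *self, int arg);

/* Hypothetical method descriptor; field names are illustrative. */
struct method {
    method_fn entry;    /* what callers invoke */
    int compile_count;  /* how many times the method was compiled */
};

/* Stand-in for JIT-generated native code of the target method. */
int compiled_body(struct method *self, int arg)
{
    (void)self;
    return arg * 2;
}

/* Stand-in for the compiler: "compiles" m and returns its code. */
method_fn jit_compile(struct method *m)
{
    m->compile_count++;
    return compiled_body;
}

/* Compile on first invocation, redirect later calls, then run. */
int jit_trampoline(struct method *self, int arg)
{
    self->entry = jit_compile(self);
    assert(self->entry != jit_trampoline);  /* entry must be replaced */
    return self->entry(self, arg);
}

/* Call sites always go through the method's current entry point. */
int invoke(struct method *m, int arg)
{
    return m->entry(m, arg);
}
```

A method initialized as `{ jit_trampoline, 0 }` is compiled by the
first invoke() call only; its compile_count stays at 1 afterwards.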
Intermediate Representations
============================
The compiler has two intermediate representations: an expression tree
and three-address code. The expression tree is constructed from the
bytecode sequence of a method, whereas three-address code is the result
of instruction selection. Three-address code is translated to
executable machine code.
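To make the relation between the two representations concrete, here is
a small sketch that lowers an expression tree to a generic
register-based three-address form. The node and instruction layouts
(struct node, struct tac, virtual register numbering) are assumptions
made for this example and do not reflect the compiler's actual
instruction format.

```c
#include <assert.h>
#include <stddef.h>

enum node_kind { NODE_CONST, NODE_ADD, NODE_MUL };

/* Hypothetical expression tree node. */
struct node {
    enum node_kind kind;
    int value;                       /* used by NODE_CONST only */
    const struct node *left, *right; /* operands of ADD and MUL */
};

enum tac_op { TAC_LOAD_CONST, TAC_ADD, TAC_MUL };

/* dst = src1 op src2; for TAC_LOAD_CONST, src1 holds the constant. */
struct tac {
    enum tac_op op;
    int dst, src1, src2;
};

#define TAC_MAX 32

struct tac_buf {
    struct tac insns[TAC_MAX];
    size_t count;
    int next_reg;   /* next free virtual register */
};

/*
 * Lower the tree rooted at n in post-order and return the virtual
 * register that holds its value.
 */
int lower(const struct node *n, struct tac_buf *out)
{
    struct tac t = { TAC_LOAD_CONST, 0, 0, 0 };

    if (n->kind == NODE_CONST) {
        t.src1 = n->value;
    } else {
        t.op = (n->kind == NODE_ADD) ? TAC_ADD : TAC_MUL;
        t.src1 = lower(n->left, out);
        t.src2 = lower(n->right, out);
    }
    t.dst = out->next_reg++;
    assert(out->count < TAC_MAX);
    out->insns[out->count++] = t;
    return t.dst;
}
```

For the tree of (1 + 2) * 3 this produces five instructions: three
constant loads, one add, and one multiply whose operands are the
registers written by the add and by the load of 3.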
The JIT compiler operates on one method at a time; the method being
compiled is called a compilation unit. A compilation unit is made up
of one or more basic blocks, which represent straight-line code. Each
basic block has a list of one or more statements, each of which is
either standalone or operates on one or two expression trees.
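The containment hierarchy described above might be laid out as follows.
This is a sketch only: the type names, field names, and the particular
statement and expression kinds are hypothetical, chosen to show a
standalone statement, a one-tree statement, and a two-tree statement.

```c
#include <assert.h>
#include <stddef.h>

enum expr_kind { EXPR_CONST, EXPR_LOCAL, EXPR_ADD };

/* Hypothetical expression tree node. */
struct expr {
    enum expr_kind kind;
    int value;                  /* constant or local variable index */
    struct expr *left, *right;  /* operands of EXPR_ADD, else NULL */
};

enum stmt_kind {
    STMT_VOID_RETURN,   /* standalone, no expression tree */
    STMT_RETURN,        /* one tree: the returned value */
    STMT_STORE          /* two trees: destination and source */
};

struct stmt {
    enum stmt_kind kind;
    struct expr *operands[2];
    struct stmt *next;          /* next statement in the block */
};

struct basic_block {
    struct stmt *stmts;         /* straight-line statement list */
    struct basic_block *next;
};

struct compilation_unit {
    struct basic_block *blocks; /* one method's basic blocks */
};

/* Number of nodes in one expression tree. */
size_t expr_nodes(const struct expr *e)
{
    if (!e)
        return 0;
    return 1 + expr_nodes(e->left) + expr_nodes(e->right);
}

/* Walk unit -> blocks -> statements -> trees and count tree nodes. */
size_t unit_expr_nodes(const struct compilation_unit *cu)
{
    const struct basic_block *bb;
    const struct stmt *s;
    size_t total = 0;

    for (bb = cu->blocks; bb; bb = bb->next) {
        for (s = bb->stmts; s; s = s->next) {
            /* A store needs both of its trees. */
            assert(s->kind != STMT_STORE ||
                   (s->operands[0] && s->operands[1]));
            total += expr_nodes(s->operands[0]);
            total += expr_nodes(s->operands[1]);
        }
    }
    return total;
}
```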
The instruction selector emits three-address code for a compilation
unit from the expression tree. This intermediate representation is
essentially a sequence of instructions that mimic the native
instruction set. One notable exception is branch targets, which are
represented as pointers to instructions rather than as addresses. The
pointers are converted to real machine code targets by back-patching
during code emission.
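The back-patching step can be sketched as follows, assuming a
hypothetical toy encoding (a one-byte NOP, and a JMP made of an opcode
byte plus a one-byte absolute target offset). A forward branch is
emitted with a placeholder because its target has no offset yet; once
every instruction has been placed, the placeholder is overwritten. The
struct and function names are illustrative, and this sketch patches in a
second walk over the list rather than tracking individual patch sites.

```c
#include <assert.h>
#include <stddef.h>

#define NO_OFFSET ((size_t)-1)
#define PLACEHOLDER 0xff

enum insn_kind { INSN_NOP, INSN_JMP };

struct insn {
    enum insn_kind kind;
    struct insn *target;  /* branch target as a pointer (JMP only) */
    size_t offset;        /* position in the buffer once emitted */
    struct insn *next;
};

/* Emit the list into buf and return the number of bytes written. */
size_t emit_code(struct insn *list, unsigned char *buf, size_t cap)
{
    struct insn *i;
    size_t len = 0;

    for (i = list; i; i = i->next)
        i->offset = NO_OFFSET;

    /* Emit; targets that are not placed yet get a placeholder. */
    for (i = list; i; i = i->next) {
        i->offset = len;
        if (i->kind == INSN_JMP) {
            assert(len + 2 <= cap);
            buf[len++] = 0x01;
            buf[len++] = (i->target->offset != NO_OFFSET)
                ? (unsigned char)i->target->offset
                : PLACEHOLDER;
        } else {
            assert(len + 1 <= cap);
            buf[len++] = 0x00;
        }
    }

    /* Back-patch: every target offset is known now. */
    for (i = list; i; i = i->next) {
        if (i->kind == INSN_JMP)
            buf[i->offset + 1] = (unsigned char)i->target->offset;
    }
    return len;
}
```

Backward branches are already correct after the first walk; only
forward branches actually change during patching.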