Guile’s compiler is quite simple – its compilers, to put it more accurately. Guile defines a tower of languages, starting at Scheme and progressively simplifying down to languages that resemble the VM instruction set (see Instruction Set).
Each language knows how to compile to the next, so each step is simple and understandable. Furthermore, this set of languages is not hardcoded into Guile, so it is possible for the user to add new high-level languages, new passes, or even different compilation targets.
Languages are registered in the module (system base language):
(use-modules (system base language))
They are registered with the define-language form:

(define-language name #:keyword value ...)

This syntax defines a <language> object, bound to name in the current environment. In addition, the language is added to the global language set. For example, this is the language definition for Scheme:
(define-language scheme
  #:title       "Scheme"
  #:reader      (lambda (port env) ...)
  #:compilers   `((tree-il . ,compile-tree-il))
  #:decompilers `((tree-il . ,decompile-tree-il))
  #:evaluator   (lambda (x module) (primitive-eval x))
  #:printer     write
  #:make-default-environment (lambda () ...))
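As an illustration, here is a minimal sketch of registering a new language with define-language. It assumes Guile 2.2 or later. The language name my-scheme and its title are hypothetical. The sketch reuses compile-tree-il from the (language scheme compile-tree-il) module as its only compiler and uses the plain core read as its reader. The result behaves like a stripped-down copy of Scheme whose chain continues from Tree-IL downward.

```scheme
(use-modules (system base language)
             (system base compile)
             (language scheme compile-tree-il))

;; Hypothetical language: same surface syntax as Scheme, but registered
;; under its own name.  Its only compiler targets Tree-IL, so everything
;; from Tree-IL downward is shared with Scheme.
(define-language my-scheme
  #:title "My Scheme (illustrative)"
  #:reader (lambda (port env) (read port))   ; ignores ENV, uses the core reader
  #:printer write
  #:compilers `((tree-il . ,compile-tree-il)))

;; define-language bound my-scheme to a <language> object.
(display (language-name my-scheme))
(newline)

;; Compile an expression starting from the new language, then run it.
(display (compile '(* 6 7) #:from my-scheme #:to 'value))
(newline)
```

Passing the language object itself as #:from works because compile accepts either a language or a language name.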
The interesting thing about having languages defined this way is that they present a uniform interface to the read-eval-print loop. This allows the user to change the current language of the REPL:
scheme@(guile-user)> ,language tree-il
Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
tree-il@(guile-user)> ,L scheme
Happy hacking with Scheme! To switch back, type `,L tree-il'.
scheme@(guile-user)>
Languages can be looked up by name, as they were above.
Scheme Procedure: lookup-language name

Looks up a language named name, autoloading it if necessary.

Languages are autoloaded by looking for a variable named name in a module named (language name spec).

The language object is returned, or #f if no language with that name exists.
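A short sketch of looking up built-in languages and reading back the fields that define-language set. It assumes Guile 2.2 or later and the accessors language-name, language-title and language-compilers exported by (system base language). Only successful lookups are shown.

```scheme
(use-modules (system base language))

;; Look up languages by name; (language NAME spec) is autoloaded on first use.
(define scheme-lang  (lookup-language 'scheme))
(define tree-il-lang (lookup-language 'tree-il))

;; Read back fields that define-language set.
(display (language-title scheme-lang))
(newline)
;; The compilers field is an alist from target-language names to procedures.
(display (map car (language-compilers scheme-lang)))
(newline)
(display (language-name tree-il-lang))
(newline)
```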
When Guile goes to compile Scheme to bytecode, it asks the Scheme language to choose a compiler from Scheme to the next language on the path to bytecode. Repeating this choice at each step builds a flexible chain of compilers, one transformation per link. Each link is obtained by invoking the current language’s compiler chooser or, if it has none, from the language’s compilers field.
A language can specify an analyzer, which is run before a term of that language is lowered and compiled. This is where compiler warnings are issued.
If a language specifies a lowerer, that procedure is called on expressions before compilation. This is where optimizations and canonicalizations go.
Finally, a language’s compiler translates a lowered term from one language to the next one in the chain.
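The search for a chain of compilers can be approximated in a few lines. This is a simplified sketch, not Guile's implementation. The helper compiler-path is hypothetical. It follows only each language's compilers field, ignoring compiler choosers, analyzers and lowerers, and it returns the first path it finds depth-first. It uses any and last from SRFI-1.

```scheme
(use-modules (system base language)
             (srfi srfi-1))

;; Hypothetical helper: return the list of language names leading from FROM
;; to TO by following compilers fields depth-first, or #f if there is none.
(define (compiler-path from to)
  (let search ((name from) (seen '()))
    (cond
     ((eq? name to) (list to))
     ((memq name seen) #f)                  ; avoid cycles
     (else
      (any (lambda (entry)                  ; entry = (target-name . procedure)
             (let ((rest (search (car entry) (cons name seen))))
               (and rest (cons name rest))))
           (language-compilers (lookup-language name)))))))

(display (compiler-path 'scheme 'bytecode))
(newline)
```

Which intermediate languages appear after Tree-IL depends on the Guile version and on how each language's compilers field is ordered.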
There is a notion of a “current language”, which is maintained in the current-language parameter, defined in the core (guile) module. This language is normally Scheme, and may be rebound by the user. The run-time compilation interfaces (see Read/Load/Eval/Compile) also allow you to choose other source and target languages.
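A sketch of choosing source and target languages explicitly through compile. It assumes Guile 2.2 or later, with const?, const-exp and unparse-tree-il exported by (language tree-il). Stopping at tree-il returns the intermediate term instead of running it.

```scheme
(use-modules (system base compile)
             (language tree-il))

;; Stop the chain at Tree-IL instead of running the code.
(define tree (compile 42 #:from 'scheme #:to 'tree-il))
(display (const? tree))            ; a self-evaluating datum expands to a <const> node
(newline)
(display (unparse-tree-il tree))   ; s-expression view of the Tree-IL term
(newline)

;; Target value: compile, load, and run, yielding a Scheme value.
(display (compile '(* 6 7) #:from 'scheme #:to 'value))
(newline)
```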
The normal tower of languages when compiling Scheme goes like this:

* Scheme
* Tree Intermediate Language (Tree-IL)
* Continuation-Passing Style (CPS)
* Bytecode
As discussed before (see Object File Format), bytecode is in ELF format, ready to be serialized to disk. But when compiling Scheme at run time, you want a Scheme value: for example, a compiled procedure. For this reason, so as not to break the abstraction, Guile defines a fake language at the bottom of the tower:
* Value

Compiling to value loads the bytecode into a procedure, turning cold bytes into warm code.
Perhaps this strangeness can be explained by example: compile-file defaults to compiling to bytecode, because it produces object code that has to live in the barren world outside the Guile runtime; but compile defaults to compiling to value, as its product re-enters the Guile world.
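The difference between the two bottom rungs can be seen by asking compile for each target. This sketch assumes Guile 2.2 or later. It also assumes that load-thunk-from-memory from (system vm loader) accepts the bytevector produced for the bytecode target. Loading those bytes by hand is roughly what the value target does internally.

```scheme
(use-modules (system base compile)
             (system vm loader)
             (rnrs bytevectors))

;; Targeting bytecode yields raw ELF bytes (a bytevector), not a procedure.
(define bv (compile '(+ 1 2) #:to 'bytecode))
(display (bytevector? bv))
(newline)

;; Loading the bytes by hand gives a thunk; calling it runs the code.
(define thunk (load-thunk-from-memory bv))
(display (thunk))
(newline)

;; The default target, value, does the loading and running for us.
(display (compile '(+ 1 2)))
(newline)
```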
Indeed, the process of compilation can circulate through these different worlds indefinitely, as shown by the following quine:
((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))