9.4.5 Bytecode

As mentioned before, Guile compiles all code to bytecode, and that bytecode is contained in ELF images. See Object File Format, for more on Guile’s use of ELF.
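As a quick check of the claim above, the following sketch compiles a trivial expression to bytecode and inspects the first four bytes of the result. It assumes that compile from (system base compile) returns the linked image as a bytevector when given #:to 'bytecode; the helper elf-magic? is a hypothetical name introduced only for this illustration.

```scheme
(use-modules (system base compile)   ; compile
             (rnrs bytevectors))     ; bytevector?, bytevector-u8-ref

;; Compile a trivial expression down to bytecode.  The result is
;; assumed to be a bytevector holding a complete ELF image.
(define image (compile '(+ 1 2) #:to 'bytecode))

;; Hypothetical helper: every ELF image starts with the magic bytes
;; 0x7f, 'E', 'L', 'F'.
(define (elf-magic? bv)
  (and (bytevector? bv)
       (>= (bytevector-length bv) 4)
       (= (bytevector-u8-ref bv 0) #x7f)
       (= (bytevector-u8-ref bv 1) (char->integer #\E))
       (= (bytevector-u8-ref bv 2) (char->integer #\L))
       (= (bytevector-u8-ref bv 3) (char->integer #\F))))

(display (elf-magic? image))
(newline)
```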

To produce a bytecode image, Guile provides an assembler and a linker.

The assembler, defined in the (system vm assembler) module, has a relatively straightforward imperative interface. It provides a make-assembler function to instantiate an assembler and a set of emit-inst procedures to emit instructions of each kind.

The emit-inst procedures are actually generated at compile-time from a machine-readable description of the VM. With a few exceptions for certain operand types, each operand of an emit procedure corresponds to an operand of the corresponding instruction.

Consider allocate-words (see Memory Access Instructions). It is documented as:

Instruction: allocate-words s12:dst s12:nwords

Therefore the emit procedure has the form:

Scheme Procedure: emit-allocate-words asm dst nwords

All emit procedures take the assembler as their first argument, and return no useful values.
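The emit- naming convention can be explored from the REPL. The sketch below lists the bindings exported by (system vm assembler) whose names start with emit-. The exact set depends on the Guile version, so the count is printed rather than assumed; emit-procedure-names is a hypothetical helper, not part of the assembler module.

```scheme
(use-modules (system vm assembler)
             (srfi srfi-1))          ; filter

;; Hypothetical helper: collect the names of all exported bindings of
;; the assembler module that follow the emit- convention.
(define (emit-procedure-names)
  (let ((iface (resolve-interface '(system vm assembler))))
    (filter (lambda (name)
              (string-prefix? "emit-" (symbol->string name)))
            ;; module-map calls its procedure with each (name, variable)
            ;; pair bound in the given module.
            (module-map (lambda (name var) name) iface))))

(define names (emit-procedure-names))

(display (length names))
(display " emit procedures exported")
(newline)
```

The helpers documented in this section, such as emit-label and emit-load-constant, should appear in the resulting list alongside the per-instruction emitters.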

The argument types depend on the operand types. See Instruction Set. Most are integers within a restricted range, though labels are generally expressed as opaque symbols.

Besides the emitters that correspond to instructions, there are a few additional helpers defined in the assembler module.

Scheme Procedure: emit-label asm label

Define a label at the current program point.

Scheme Procedure: emit-source asm source

Associate source with the current program point.

Scheme Procedure: emit-cache-ref asm dst key
Scheme Procedure: emit-cache-set! asm key val

Macro-instructions to implement compilation-unit caches. A single cache cell corresponding to key will be allocated for the compilation unit.

Scheme Procedure: emit-load-constant asm dst constant

Load the Scheme datum constant into dst.

Scheme Procedure: emit-begin-program asm label properties
Scheme Procedure: emit-end-program asm

Delimit the bounds of a procedure, with the given label and the metadata properties.

Scheme Procedure: emit-load-static-procedure asm dst label

Load a procedure with the given label into local dst. This macro-instruction should only be used with procedures without free variables – procedures that are not closures.

Scheme Procedure: emit-begin-standard-arity asm req nlocals alternate
Scheme Procedure: emit-begin-opt-arity asm req opt rest nlocals alternate
Scheme Procedure: emit-begin-kw-arity asm req opt rest kw-indices allow-other-keys? nlocals alternate
Scheme Procedure: emit-end-arity asm

Delimit a clause of a procedure.

The linker is a complicated beast. Hackers interested in how it works would do well to read Ian Lance Taylor’s series of articles on linkers. Searching the internet should find them easily.

From the user’s perspective, there is only one knob to control: whether the resulting image will be written out to a file or not. If the user passes #:to-file? #t as part of the compiler options (see The Scheme Compiler), the linker will align the resulting segments on page boundaries, and otherwise not.

Scheme Procedure: link-assembly asm [#:page-aligned?=#t]

Link an ELF image, and return the bytevector. If page-aligned? is true, Guile will align the segments with different permissions on page-sized boundaries, in order to maximize code sharing between different processes. Otherwise, padding is minimized, to minimize address space consumption.

To write an image to disk, just use put-bytevector from (ice-9 binary-ports).

Compiling object code to the fake language, value, is performed by loading the object code as a thunk, then executing that thunk with respect to the compilation environment. Normally the environment propagates through the compiler transparently, but users may also specify the compilation environment manually, as a module.

Procedures to load images can be found in the (system vm loader) module:

(use-modules (system vm loader))

Scheme Variable: load-thunk-from-file file
C Function: scm_load_thunk_from_file (file)

Load object code from a file named file. The file will be mapped into memory via mmap, so this is a very fast operation.
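The round trip through a file might look like the sketch below. It assumes that compile accepts the #:to-file? option through #:opts, as described above, so that the image is page-aligned before it is written with put-bytevector. The scratch path image-file is a hypothetical name chosen for the example.

```scheme
(use-modules (system base compile)   ; compile
             (system vm loader)      ; load-thunk-from-file
             (ice-9 binary-ports))   ; put-bytevector

;; Hypothetical scratch location for the image.
(define image-file
  (string-append "/tmp/guile-bytecode-example-"
                 (number->string (getpid)) ".go"))

;; Ask for a page-aligned image, since it is destined for a file.
(define image
  (compile '(+ 1 2) #:to 'bytecode #:opts '(#:to-file? #t)))

;; Write the raw ELF bytes to disk.
(call-with-output-file image-file
  (lambda (port) (put-bytevector port image))
  #:binary #t)

;; Map the file back in and run its entry thunk.
(define thunk (load-thunk-from-file image-file))
(define result (thunk))

(display result)
(newline)

;; Clean up the scratch file; the mapping stays valid after unlinking
;; on POSIX systems.
(delete-file image-file)
```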

Scheme Variable: load-thunk-from-memory bv
C Function: scm_load_thunk_from_memory (bv)

Load object code from a bytevector. The data will be copied out of the bytevector in order to ensure proper alignment of embedded Scheme values.
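For in-memory use, a minimal sketch is to compile an expression to a bytevector and hand it straight to load-thunk-from-memory. This is roughly what compiling to value does; the comparison at the end assumes the default compilation environment, and the variable names are illustrative only.

```scheme
(use-modules (system base compile)   ; compile
             (system vm loader))     ; load-thunk-from-memory

;; Compile to an in-memory ELF image; no page alignment is requested.
(define image (compile '(* 6 7) #:to 'bytecode))

;; Load the image and obtain its entry thunk.
(define thunk (load-thunk-from-memory image))

;; Running the thunk evaluates the compiled expression.
(define manual-result (thunk))

;; Compiling directly to a value should yield the same answer.
(define direct-result (compile '(* 6 7) #:to 'value))

(display (equal? manual-result direct-result))
(newline)
```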

Additionally there are procedures to find the ELF image for a given pointer, or to list all mapped ELF images:

Scheme Variable: find-mapped-elf-image ptr

Given the integer value ptr, find and return the ELF image that contains that pointer, as a bytevector. If no image is found, return #f. This routine is mostly used by debuggers and other introspective tools.

Scheme Variable: all-mapped-elf-images

Return all mapped ELF images, as a list of bytevectors.
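A debugger-style lookup could be sketched as follows. The example assumes that program-code from (system vm program) returns the address of a program's code as an integer, which is then a suitable ptr argument; the variable names are illustrative only.

```scheme
(use-modules (system base compile)   ; compile
             (system vm loader)      ; loaders and image lookup
             (system vm program)     ; program-code
             (rnrs bytevectors))     ; bytevector?

;; Load a small image so that at least one freshly mapped image exists.
(define thunk
  (load-thunk-from-memory (compile '(* 6 7) #:to 'bytecode)))

;; All images currently known to the loader, as bytevectors.
(define images (all-mapped-elf-images))

;; Assumption: program-code yields an address inside the thunk's image.
(define image (find-mapped-elf-image (program-code thunk)))

;; Address 0 lies in no image, so the lookup reports failure.
(define no-image (find-mapped-elf-image 0))

(display (length images))
(display " mapped ELF images")
(newline)
```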