
9.4.5 Bytecode

As mentioned before, Guile compiles all code to bytecode, and that bytecode is contained in ELF images. See Object File Format, for more on Guile’s use of ELF.

To produce a bytecode image, Guile provides an assembler and a linker.

The assembler, defined in the (system vm assembler) module, has a relatively straightforward imperative interface. It provides a make-assembler function to instantiate an assembler and a set of emit-inst procedures to emit instructions of each kind.

The emit-inst procedures are actually generated at compile-time from a machine-readable description of the VM. With a few exceptions for certain operand types, each operand of an emit procedure corresponds to an operand of the corresponding instruction.

Consider vector-length, from Miscellaneous Instructions. It is documented as:

Instruction: vector-length u12:dst u12:src

Therefore the emit procedure has the form:

Scheme Procedure: emit-vector-length asm dst src

All emit procedures take the assembler as their first argument, and return no useful values.

The argument types depend on the operand types. See Instruction Set. Most are integers within a restricted range, though labels are generally expressed as opaque symbols.

There are a few macro-instructions as well.

Scheme Procedure: emit-label asm label

Define a label at the current program point.

Scheme Procedure: emit-source asm source

Associate source with the current program point.

Scheme Procedure: emit-cache-current-module! asm module scope
Scheme Procedure: emit-cached-toplevel-box asm dst scope sym bound?
Scheme Procedure: emit-cached-module-box asm dst module-name sym public? bound?

Macro-instructions to implement caching of top-level variables. The first takes the current module, in the slot module, and associates it with a cache location identified by scope. The second takes a scope, and resolves the variable. See Top-Level Environment Instructions. The last does not need a cached module, rather taking the module name directly.

Scheme Procedure: emit-load-constant asm dst constant

Load the Scheme datum constant into dst.

Scheme Procedure: emit-begin-program asm label properties
Scheme Procedure: emit-end-program asm

Delimit the bounds of a procedure, with the given label and the metadata properties.

Scheme Procedure: emit-load-static-procedure asm dst label

Load a procedure with the given label into local dst. This macro-instruction should only be used with procedures without free variables – procedures that are not closures.

Scheme Procedure: emit-begin-standard-arity asm req nlocals alternate
Scheme Procedure: emit-begin-opt-arity asm req opt rest nlocals alternate
Scheme Procedure: emit-begin-kw-arity asm req opt rest kw-indices allow-other-keys? nlocals alternate
Scheme Procedure: emit-end-arity asm

Delimit a clause of a procedure.

Scheme Procedure: emit-br-if-symbol asm slot invert? label
Scheme Procedure: emit-br-if-variable asm slot invert? label
Scheme Procedure: emit-br-if-vector asm slot invert? label
Scheme Procedure: emit-br-if-string asm slot invert? label
Scheme Procedure: emit-br-if-bytevector asm slot invert? label
Scheme Procedure: emit-br-if-bitvector asm slot invert? label

TC7-specific test-and-branch instructions. The TC7 is a 7-bit code that is part of a heap object’s type. See The SCM Type in Guile. See also Branch Instructions.

The linker is a complicated beast. Hackers interested in how it works would do well to read Ian Lance Taylor’s series of articles on linkers. Searching the internet should find them easily. From the user’s perspective, there is only one knob to control: whether the resulting image will be written out to a file or not. If the user passes #:to-file? #t as part of the compiler options (see The Scheme Compiler), the linker will align the resulting segments on page boundaries, and otherwise not.

Scheme Procedure: link-assembly asm #:page-aligned?=#t

Link an ELF image, and return the bytevector. If page-aligned? is true, Guile will align the segments with different permissions on page-sized boundaries, in order to maximize code sharing between different processes. Otherwise, padding is minimized, to minimize address space consumption.
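
The effect of page alignment can be observed without driving the assembler by hand. The following is a minimal sketch, assuming that compile from (system base compile) forwards #:to-file? #t from its #:opts list to the linker, as the paragraph above describes. The form being compiled is an arbitrary illustration.

```scheme
(use-modules (system base compile)
             (rnrs bytevectors))

;; Hypothetical form to compile; any Scheme expression would do.
(define form '(lambda (x) (* x x)))

;; With #:to-file? #t, the linker is asked for page-aligned segments.
(define aligned-image
  (compile form #:to 'bytecode #:opts '(#:to-file? #t)))

;; Without the option, padding between segments is minimized.
(define compact-image
  (compile form #:to 'bytecode))

(format #t "page-aligned: ~a bytes\n" (bytevector-length aligned-image))
(format #t "compact:      ~a bytes\n" (bytevector-length compact-image))
```

The exact sizes depend on the Guile version and the platform page size, so none are shown here.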

To write an image to disk, just use put-bytevector from (ice-9 binary-ports).
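
As a sketch of that step, the following writes an image to a file. It obtains the image from compile rather than from link-assembly directly, which is a shortcut taken for this illustration; the file name is hypothetical.

```scheme
(use-modules (system base compile)
             (ice-9 binary-ports)
             (rnrs bytevectors))

;; Obtain an ELF image as a bytevector.  Page alignment is requested
;; because the image is destined for a file.
(define image
  (compile '(+ 1 2) #:to 'bytecode #:opts '(#:to-file? #t)))

;; Hypothetical output path, chosen for this example only.
(define image-file "/tmp/guile-bytecode-example.go")

;; Open the file in binary mode and write the raw bytes.
(call-with-output-file image-file
  (lambda (port)
    (put-bytevector port image))
  #:binary #t)
```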

Compiling bytecode to the fake language value is performed by loading the bytecode into a program (a thunk), then executing that thunk with respect to the compilation environment. Normally the environment propagates through the compiler transparently, but users may also specify the compilation environment manually, as a module. Procedures to load images can be found in the (system vm loader) module:

(use-modules (system vm loader))

Scheme Procedure: load-thunk-from-file file
C Function: scm_load_thunk_from_file (file)

Load object code from a file named file. The file will be mapped into memory via mmap, so this is a very fast operation.
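
A minimal sketch of loading from a file follows. It produces the image with compile-file and its #:output-file keyword, then maps it and calls the resulting thunk. Both file names are hypothetical and exist only for this example.

```scheme
(use-modules (system base compile)
             (system vm loader))

;; Hypothetical paths, chosen for this example only.
(define source-file "/tmp/guile-loader-example.scm")
(define image-file "/tmp/guile-loader-example.go")

;; Write a one-form source file.
(call-with-output-file source-file
  (lambda (port)
    (write '(* 6 7) port)
    (newline port)))

;; Compile the source file to a bytecode image on disk.
(compile-file source-file #:output-file image-file)

;; Map the image and obtain a thunk that runs its top-level forms.
(define thunk (load-thunk-from-file image-file))

;; Calling the thunk evaluates the compiled forms.
(define result (thunk))
```

Here result holds the value of the last form in the source file, which is the product of 6 and 7.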

Scheme Procedure: load-thunk-from-memory bv
C Function: scm_load_thunk_from_memory (bv)

Load object code from a bytevector. The data will be copied out of the bytevector in order to ensure proper alignment of embedded Scheme values.
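
The relationship between bytecode and the value language can be sketched as follows. The example compiles a form to a bytevector, loads it, and calls the thunk; it then compiles the same form directly to a value for comparison. The use of compile from (system base compile) to obtain the bytevector is a convenience for this illustration.

```scheme
(use-modules (system base compile)
             (system vm loader))

;; Compile a form to an in-memory ELF image.
(define image (compile '(+ 1 2) #:to 'bytecode))

;; Load the image and run it by hand.
(define thunk (load-thunk-from-memory image))
(define by-hand (thunk))

;; Let the compiler do the loading and calling itself.
(define direct (compile '(+ 1 2) #:to 'value))
```

Both by-hand and direct should hold the same value, since compiling to value amounts to loading the image and calling its thunk.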

Additionally there are procedures to find the ELF image for a given pointer, or to list all mapped ELF images:

Scheme Procedure: find-mapped-elf-image ptr

Given the integer value ptr, find and return the ELF image that contains that pointer, as a bytevector. If no image is found, return #f. This routine is mostly used by debuggers and other introspective tools.

Scheme Procedure: all-mapped-elf-images

Return all mapped ELF images, as a list of bytevectors.
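
As a sketch of how an introspective tool might use these procedures, the following looks up the image that contains a freshly loaded thunk. It assumes that program-code from (system vm program) returns the address of the thunk's code as an integer, and that images loaded from memory are registered with the loader.

```scheme
(use-modules (system base compile)
             (system vm loader)
             (system vm program)
             (rnrs bytevectors))

;; Load a small image so that at least one known image is mapped.
(define thunk
  (load-thunk-from-memory (compile '(quote hello) #:to 'bytecode)))

;; Assumption: program-code yields the integer address of the code.
(define address (program-code thunk))

;; Find the image containing that address; #f if none is found.
(define image (find-mapped-elf-image address))

(format #t "images currently mapped: ~a\n"
        (length (all-mapped-elf-images)))
(when image
  (format #t "containing image size: ~a bytes\n"
          (bytevector-length image)))
```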
