9.3.5 Compiled Procedures are VM Programs

By default, when you enter in expressions at Guile’s REPL, they are first compiled to bytecode. Then that bytecode is executed to produce a value. If the expression evaluates to a procedure, the result of this process is a compiled procedure.

A compiled procedure is a compound object consisting of its bytecode and a reference to any captured lexical variables. In addition, when a procedure is compiled, it has associated metadata written to side tables, for instance a line number mapping, or its docstring. You can pick apart these pieces with the accessors in (system vm program). See Compiled Procedures, for a full API reference.

A procedure may reference data that was statically allocated when the procedure was compiled. For example, a pair of immediate objects (see Immediate Objects) can be allocated directly in the memory segment that contains the compiled bytecode, and accessed directly by the bytecode.

Another use for statically allocated data is to serve as a cache for a bytecode. Top-level variable lookups are handled in this way; the first time a top-level binding is referenced, the resolved variable will be stored in a cache. Thereafter all access to the variable goes through the cache cell. The variable’s value may change in the future, but the variable itself will not.

We can see how these concepts tie together by disassembling the foo function we defined earlier to see what is going on:

scheme@(guile-user)> (define (foo a) (lambda (b) (vector foo a b)))
scheme@(guile-user)> ,x foo
Disassembly of #<procedure foo (a)> at #xf1da30:

   0    (instrument-entry 164)                                at (unknown file):5:0
   2    (assert-nargs-ee/locals 2 1)    ;; 3 slots (1 arg)
   3    (allocate-words/immediate 2 3)                        at (unknown file):5:16
   4    (load-u64 0 0 65605)
   7    (word-set!/immediate 2 0 0)
   8    (load-label 0 7)                ;; anonymous procedure at #xf1da6c
  10    (word-set!/immediate 2 1 0)
  11    (scm-set!/immediate 2 2 1)
  12    (reset-frame 1)                 ;; 1 slot
  13    (handle-interrupts)
  14    (return-values)

Disassembly of anonymous procedure at #xf1da6c:

   0    (instrument-entry 183)                                at (unknown file):5:16
   2    (assert-nargs-ee/locals 2 3)    ;; 5 slots (1 arg)
   3    (static-ref 2 152)              ;; #<variable 112e530 value: #<procedure foo (a)>>
   5    (immediate-tag=? 2 7 0)         ;; heap-object?
   7    (je 19)                         ;; -> L2
   8    (static-ref 2 119)              ;; #<directory (guile-user) ca9750>
  10    (static-ref 1 127)              ;; foo
  12    (call-scm<-scm-scm 2 2 1 40)
  14    (immediate-tag=? 2 7 0)         ;; heap-object?
  16    (jne 8)                         ;; -> L1
  17    (scm-ref/immediate 0 2 1)
  18    (immediate-tag=? 0 4095 2308)   ;; undefined?
  20    (je 4)                          ;; -> L1
  21    (static-set! 2 134)             ;; #<variable 112e530 value: #<procedure foo (a)>>
  23    (j 3)                           ;; -> L2
  24    (throw/value 1 151)             ;; #(unbound-variable #f "Unbound variable: ~S")
  26    (scm-ref/immediate 2 2 1)
  27    (allocate-words/immediate 1 4)                        at (unknown file):5:28
  28    (load-u64 0 0 781)
  31    (word-set!/immediate 1 0 0)
  32    (scm-set!/immediate 1 1 2)
  33    (scm-ref/immediate 4 4 2)
  34    (scm-set!/immediate 1 2 4)
  35    (scm-set!/immediate 1 3 3)
  36    (mov 4 1)
  37    (reset-frame 1)                 ;; 1 slot
  38    (handle-interrupts)
  39    (return-values)

The first thing to notice is that the bytecode is at a fairly low level. When a program is compiled from Scheme to bytecode, it is expressed in terms of more primitive operations. As such, there can be more instructions than you might expect.

The first chunk of instructions is the outer foo procedure. It is followed by the code for the contained closure. The code can look daunting at first glance, but with practice it quickly becomes comprehensible, and indeed being able to read bytecode is an important step to understanding the low-level performance of Guile programs.

The foo function begins with a prelude. The instrument-entry bytecode increments a counter associated with the function. If the counter reaches a certain threshold, Guile will emit machine code (“JIT-compile”) for foo. Emitting machine code is fairly cheap but it does take time, so it’s not something you want to do for every function. Using a per-function counter and a global threshold allows Guile to spend time JIT-compiling only the “hot” functions.

Next in the prelude is an argument-checking instruction, which checks that it was called with only 1 argument (plus the callee function itself makes 2) and then reserves stack space for an additional 1 local.

Then from ip 3 to 11, we allocate a new closure by allocating a three-word object, initializing its first word to store a type tag, setting its second word to its code pointer, and finally at ip 11, storing local value 1 (the a argument) into the third word (the first free variable).

Before returning, foo “resets the frame” to hold only one local (the return value), runs any pending interrupts (see Asynchronous Interrupts) and then returns.

Note that local variables in Guile’s virtual machine are usually addressed relative to the stack pointer, which leads to a pleasantly efficient sp[n] access. However it can make the disassembly hard to read, because the sp can change during the function, and because incoming arguments are relative to the fp, not the sp.

To know what fp-relative slot corresponds to an sp-relative reference, scan up in the disassembly until you get to a “n slots” annotation; in our case, 3, indicating that the frame has space for 3 slots. Thus a zero-indexed sp-relative slot of 2 corresponds to the fp-relative slot of 0, which initially held the value of the closure being called. This means that Guile doesn’t need the value of the closure to compute its result, and so slot 0 was free for re-use, in this case for the result of making a new closure.

A closure is code with data. As you can see, making the closure involved making an object (ip 3), putting a code pointer in it (ip 8 and 10), and putting in the closure’s free variable (ip 11).

The second stanza disassembles the code for the closure. After the prelude, all of the code between ip 5 and 24 is related to loading the toplevel variable foo into slot 1. This lookup happens only once, and is associated with a cache; after the first run, the value in the cache will be a bound variable, and the code will jump from ip 7 to 26. On the first run, Guile gets the module associated with the function, calls out to a run-time routine to look up the variable, and checks that the variable is bound before initializing the cache. Either way, ip 26 dereferences the variable into local 2.

What follows is the allocation and initialization of the vector return value. Ip 27 does the allocation, and the following two instructions initialize the type-and-length tag for the object’s first word. Ip 32 sets word 1 of the object (the first vector slot) to the value of foo; ip 33 fetches the closure variable for a, then in ip 34 stores it in the second vector slot; and finally, in ip 35, local b is stored to the third vector slot. This is followed by the return sequence.