Compiling CPS in Guile has three phases: conversion, optimization, and code generation.
CPS conversion is the process of taking a higher-level language and
compiling it to CPS. Source languages can do this directly, or they can
convert to Tree-IL (which is probably easier) and let Tree-IL convert to
CPS later. Going through Tree-IL has the advantage of running Tree-IL
optimization passes, like partial evaluation. Also, the compiler from
Tree-IL to CPS handles assignment conversion, in which assigned local
variables (in Tree-IL, locals that are
converted to being boxed values on the heap. See Variables and the VM.
After CPS conversion, Guile runs some optimization passes. The major
optimization performed on CPS is contification, in which functions that
are always called with the same continuation are incorporated directly
into a function’s body. This opens up space for more optimizations, and
turns procedure calls into
goto. It can also make loops out of
recursive function nests.
At the time of this writing (2014), most high-level optimization in Guile is done on Tree-IL. We would like to rewrite many of these passes to operate on CPS instead, as it is easier to reason about CPS.
The rest of the optimization passes are really cleanups and canonicalizations. CPS spans the gap between high-level languages and low-level bytecodes, which allows much of the compilation process to be expressed as source-to-source transformations. Such is the case for closure conversion, in which references to variables that are free in a function are converted to closure references, and in which functions are converted to closures. There are a few more passes to ensure that the only primcalls left in the term are those that have a corresponding instruction in the virtual machine, and that their continuations expect the right number of values.
Finally, the backend of the CPS compiler emits bytecode for each function, one by one. To do so, it determines the set of live variables at all points in the function. Using this liveness information, it allocates stack slots to each variable, such that a variable can live in one slot for the duration of its lifetime, without shuffling. (Of course, variables with disjoint lifetimes can share a slot.) Finally the backend emits code, typically just one VM instruction, for each continuation in the function.