9.4.4.4 CPS Soup

We describe programs in Guile’s CPS language as being a kind of “soup” because all continuations in the program are mixed into the same “pot”, so to speak, without explicit markers as to what function or scope a continuation is in. A program in CPS is a map from continuation labels to continuation values. As discussed in the introduction, a continuation label is an integer. No label may be negative.

As a matter of convention, label 0 should map to the $kfun continuation of the entry to the program, which should be a function of no arguments. The body of a function consists of the labelled continuations that are reachable from the function entry. A program can refer to other functions, either via $fun and $rec in higher-order CPS, or via $const-fun, $callk, and allocated closures in first-order CPS. The program logically contains all continuations of all functions reachable from the entry function. A compiler pass may leave unreachable continuations in a program; subsequent compiler passes should ensure that their transformations and analyses only take reachable continuations into account. It’s OK though if transformation runs over all continuations if including the unreachable continuations has no effect on the transformations on the live continuations.

The “soup” itself is implemented as an intmap, a functional array-mapped trie specialized for integer keys. Intmaps associate integers with values of any kind. Currently intmaps are a private data structure only used by the CPS phase of the compiler. To work with intmaps, load the (language cps intmap) module:

(use-modules (language cps intmap))

Intmaps are functional data structures, so there is no constructor as such: one can simply start with the empty intmap and add entries to it.

(intmap? empty-intmap) ⇒ #t
(define x (intmap-add empty-intmap 42 "hi"))
(intmap? x) ⇒ #t
(intmap-ref x 42) ⇒ "hi"
(intmap-ref x 43) ⇒ error: 43 not present
(intmap-ref x 43 (lambda (k) "yo!")) ⇒ "yo"
(intmap-add x 42 "hej") ⇒ error: 42 already present

intmap-ref and intmap-add are the core of the intmap interface. There is also intmap-replace, which replaces the value associated with a given key, requiring that the key was present already, and intmap-remove, which removes a key from an intmap.

Intmaps have a tree-like structure that is well-suited to set operations such as union and intersection, so there are also the binary intmap-union and intmap-intersect procedures. If the result is equivalent to either argument, that argument is returned as-is; in that way, one can detect whether the set operation produced a new result simply by checking with eq?. This makes intmaps useful when computing fixed points.

If a key is present in both intmaps and the associated values are not the same in the sense of eq?, the resulting value is determined by a “meet” procedure, which is the optional last argument to intmap-union, intmap-intersect, and also to intmap-add, intmap-replace, and similar functions. The meet procedure will be called with the two values and should return the intersected or unioned value in some domain-specific way. If no meet procedure is given, the default meet procedure will raise an error.

To traverse over the set of values in an intmap, there are the intmap-next and intmap-prev procedures. For example, if intmap x has one entry mapping 42 to some value, we would have:

(intmap-next x) ⇒ 42
(intmap-next x 0) ⇒ 42
(intmap-next x 42) ⇒ 42
(intmap-next x 43) ⇒ #f
(intmap-prev x) ⇒ 42
(intmap-prev x 42) ⇒ 42
(intmap-prev x 41) ⇒ #f

There is also the intmap-fold procedure, which folds over keys and values in the intmap from lowest to highest value, and intmap-fold-right which does so in the opposite direction. These procedures may take up to 3 seed values. The number of values that the fold procedure returns is the number of seed values.

(define q (intmap-add (intmap-add empty-intmap 1 2) 3 4))
(intmap-fold acons q '()) ⇒ ((3 . 4) (1 . 2))
(intmap-fold-right acons q '()) ⇒ ((1 . 2) (3 . 4))

When an entry in an intmap is updated (removed, added, or changed), a new intmap is created that shares structure with the original intmap. This operation ensures that the result of existing computations is not affected by future computations: no mutation is ever visible to user code. This is a great property in a compiler data structure, as it lets us hold a copy of a program before a transformation and use it while we build a post-transformation program. Updating an intmap is O(log n) in the size of the intmap.

However, the O(log n) allocation costs are sometimes too much, especially in cases when we know that we can just update the intmap in place. As an example, say we have an intmap mapping the integers 1 to 100 to the integers 42 to 141. Let’s say that we want to transform this map by adding 1 to each value. There is already an efficient intmap-map procedure in the (language cps utils) module, but if we didn’t know about that we might do:

(define (intmap-increment map)
  (let lp ((k 0) (map map))
    (let ((k (intmap-next map k)))
      (if k
          (let ((v (intmap-ref map k)))
            (lp (1+ k) (intmap-replace map k (1+ v))))
          map))))

Observe that the intermediate values created by intmap-replace are completely invisible to the program – only the last result of intmap-replace value is needed. The rest might as well share state with the last one, and we could update in place. Guile allows this kind of interface via transient intmaps, inspired by Clojure’s transient interface (http://clojure.org/transients).

The in-place intmap-add! and intmap-replace! procedures return transient intmaps. If one of these in-place procedures is called on a normal persistent intmap, a new transient intmap is created. This is an O(1) operation. In all other respects the interface is like their persistent counterparts, intmap-add and intmap-replace. If an in-place procedure is called on a transient intmap, the intmap is mutated in-place and the same value is returned.

If a persistent operation like intmap-add is called on a transient intmap, the transient’s mutable substructure is then marked as persistent, and intmap-add then runs on a new persistent intmap sharing structure but not state with the original transient. Mutating a transient will cause enough copying to ensure that it can make its change, but if part of its substructure is already “owned” by it, no more copying is needed.

We can use transients to make intmap-increment more efficient. The two changed elements have been marked like this.

(define (intmap-increment map)
  (let lp ((k 0) (map map))
    (let ((k (intmap-next map k)))
      (if k
          (let ((v (intmap-ref map k)))
            (lp (1+ k) (intmap-replace! map k (1+ v))))
          (persistent-intmap map)))))

Be sure to tag the result as persistent using the persistent-intmap procedure to prevent the mutability from leaking to other parts of the program. For added paranoia, you could call persistent-intmap on the incoming map, to ensure that if it were already transient, that the mutations in the body of intmap-increment wouldn’t affect the incoming value.

In summary, programs in CPS are intmaps whose values are continuations. See the source code of (language cps utils) for a number of useful facilities for working with CPS values.