5.4.2 Garbage Collection

As explained above, the SCM type can represent all Scheme values. Some values fit entirely into a SCM value (such as small integers), but other values require additional storage in the heap (such as strings and vectors). This additional storage is managed automatically by Guile. You don’t need to explicitly deallocate it when a SCM value is no longer used.

Two things must be guaranteed so that Guile is able to manage the storage automatically: it must know about all blocks of memory that have ever been allocated for Scheme values, and it must know about all Scheme values that are still being used. Given this knowledge, Guile can periodically free all blocks that have been allocated but are not used by any active Scheme values. This activity is called garbage collection.

Guile’s garbage collector will automatically discover references to SCM objects that originate in global variables, static data sections, function arguments or local variables on the C and Scheme stacks, and values in machine registers. Other references to SCM objects, such as those in other random data structures in the C heap that contain fields of type SCM, can be made visible to the garbage collector by calling the functions scm_gc_protect_object or scm_permanent_object. Collectively, these values form the “root set” of garbage collection; any value on the heap that is referenced directly or indirectly by a member of the root set is preserved, and all other objects are eligible for reclamation.

In Guile, garbage collection has two logical phases: the mark phase, in which the collector discovers the set of all live objects, and the sweep phase, in which the collector reclaims the resources associated with dead objects. The mark phase pauses the program and traces all SCM object references, starting with the root set. The sweep phase actually runs concurrently with the main program, incrementally reclaiming memory as needed by allocation.

In the mark phase, the garbage collector traces the Scheme stack and heap precisely. Because the Scheme stack and heap are managed by Guile, Guile can know precisely where in those data structures it might find references to other heap objects. This is not the case, unfortunately, for pointers on the C stack and static data segment. Instead of requiring the user to inform Guile about all variables in C that might point to heap objects, Guile traces the C stack and static data segment conservatively. That is to say, Guile just treats every word on the C stack and every C global variable as a potential reference in to the Scheme heap4. Any value that looks like a pointer to a GC-managed object is treated as such, whether it actually is a reference or not. Thus, scanning the C stack and static data segment is guaranteed to find all actual references, but it might also find words that only accidentally look like references. These “false positives” might keep SCM objects alive that would otherwise be considered dead. While this might waste memory, keeping an object around longer than it strictly needs to is harmless. This is why this technique is called “conservative garbage collection”. In practice, the wasted memory seems to be no problem, as the static C root set is almost always finite and small, given that the Scheme stack is separate from the C stack.

The stack of every thread is scanned in this way and the registers of the CPU and all other memory locations where local variables or function parameters might show up are included in this scan as well.

The consequence of the conservative scanning is that you can just declare local variables and function parameters of type SCM and be sure that the garbage collector will not free the corresponding objects.

However, a local variable or function parameter is only protected as long as it is really on the stack (or in some register). As an optimization, the C compiler might reuse its location for some other value and the SCM object would no longer be protected. Normally, this leads to exactly the right behavior: the compiler will only overwrite a reference when it is no longer needed and thus the object becomes unprotected precisely when the reference disappears, just as wanted.

There are situations, however, where a SCM object needs to be around longer than its reference from a local variable or function parameter. This happens, for example, when you retrieve some pointer from a foreign object and work with that pointer directly. The reference to the SCM foreign object might be dead after the pointer has been retrieved, but the pointer itself (and the memory pointed to) is still in use and thus the foreign object must be protected. The compiler does not know about this connection and might overwrite the SCM reference too early.

To get around this problem, you can use scm_remember_upto_here_1 and its cousins. It will keep the compiler from overwriting the reference. See Foreign Object Memory Management.


Footnotes

(4)

Note that Guile does not scan the C heap for references, so a reference to a SCM object from a memory segment allocated with malloc will have to use some other means to keep the SCM object alive. See Function related to Garbage Collection.