Semantic analysis

The translation phase takes a top-level form (or body), and generates a ModuleExp, which is a top-level expression. This is done using a Translator, which keeps track of lexical bindings and other translation state.

class Translator
{ ...;
  public Expression rewrite(Object exp)
  { ... }
  public Expression syntaxError
   (String message) { ... }
}

The rewrite method converts a Scheme source form to an Expression. The syntaxError method is called when a syntax error is seen. It prints out the current source filename and line number with the given message.

Name resolution and scanning

In addition to handling special forms (such as lambda), the rewrite phase also handles name resolution of identifiers to their lexical declarations. This is complicated because a variable reference may appear before the declaration that defines it, for example in a letrec. The solution is to use two phases: The first scan sub-phase looks for declarations, macro-expanding outermost macro applications to determine if the result is a declaration. The second rewrite sub-phase takes the result of the scan phase, using the declarations that scan produced, and rewrites any deferred forms. The phases are actually interleaved, because a Scheme body may contain internal definitions. Thus to rewrite a body we first have to scan it for definitions, before we rewrite the result.

We won't go into further details about the scan phase.

Syntax and Macros

class Syntax
{ ...;
  public abstract Expression rewriteForm
    (Object obj, Translator tr);
}

The rewrite method in Translator checks for syntactic keywords and macros. If the car of a call is a Syntax or if it is a Symbol that is bound to a Syntax, then the rewriteForm method of the Syntax is called.

As an example, this trivial class implements quote:

class quote extends Syntax
{ ...;
  public Expression rewriteForm (Object form, Translator tr)
  { // Error-checking is left out.
    return new QuoteExp((((Pair) form).cdr).car);
  }
}

(The actual implementation of quote is more complex, since it has to interpolate forms bound to pattern variables.)

A Macro is a Syntax where rewriteForm calls a transformer function:

class Macro extends Syntax
{ ...;
  Procedure transformer;
  Expression rewriteForm (Object form, Translator tr)
  {
    return tr.rewrite(expand(form, tr));
  }
  Object expand (Object form, Translator tr)
  { // Much simplified.
    return transformer.apply1(form);
  }
}

When Kawa sees a define-syntax, it creates a Macro that contains the transformer resulting from the transformer expression. The latter may be a lambda expression; that is normally the case when using syntax-case. Alternatively, the transformer expression may be a syntax-rules form. That gets compiled to a SyntaxRules object: This contains an encoded representation of the patterns and templates in the syntax-rules.

class SyntaxRules extends Procedure1
{ ...;
  SyntaxRule[] rules;
  public Object apply1 (Object arg)
  {
    Translator tr = getCurrentTranslator();
    Object[] v = new Object[maxVars];
    for (int i = 0;  i < rules.length;)
    {
      SyntaxRule r = rules[i++];
      if (r.match (obj, v))
        return r.execute(v, tr);
    }
    return tr.syntaxError
      ("no matching syntax-rule");
  }
}

Hygienic macros

A problem when writing a macro is that identifiers you use internally in the macro might conflict with identifiers that appear at the macro call site. User-supplied code might accidentally reference a temporary variable in the macro expansion or vice versa. A macro system is hygienic when it automatically separates the names in the macro (and the macro definition context) from those in the macro expansion context. Conceptually, this is done by magically renaming certain names. Macros defined using the standard syntax-rules are by default hygienic, as are by default those defined using the syntax-case system [Dybvig93] [Waddell99] provided by Kawa and some other Scheme implementations. It is possible to override hygiene when using syntax-case, though we won't go into that.

The following discussion will focus on syntax-case, since we can view syntax-rules as a short-hand syntax for combinating a syntax-case with a syntax template for each alternative.

Syntax objects

The key to understanding syntax-case is that it uses syntax objects, which combine the actual Scheme source form (a list, symbol, or literal) with its syntactic context. A syntax object is implemented in two ways. An implicit syntax object is just the Scheme source form. In that case the syntactic context is implicitly the current lexical context. An explicit syntax object is implemented using SyntaxForm:

class SyntaxForm
{
  Object form;
  TemplateScope scope;
}

The form is the orginal form from the Scheme reader, while the scope field identifies the lexical context. A SyntaxForm object is created when a syntax template (i.e. a syntax expression) is evaluated; the scope field is the scope of the syntax form within the macro definition, which may be unrelated to the lexical scope of the expansion context.

Syntax templates

If a syntax form doesn't contain any pattern variables, then the result is a single SyntaxForm object that wraps the syntax argument expression. However, any pattern variables are returned as-is. This may require some destructuring of the template, rather like quasi-quotation. Consider this example:

(syntax-case form ()
  ((_ e1 e2)
   (syntax (list a e1 e2 b c))))

The syntax form is compiled as if it were (using the cons* function from SRFI-1):

(cons* (syntax list) (syntax a) e1 e1 (syntax b c))

Note that in Scheme code you'd have to write e1 and e2 as (syntax e1) and (syntax e2), since you're only allowed to reference pattern variables in syntax templates. However, syntax on a pattern variable is implemented as the identity function.

When the Kawa compiler sees a syntax, it translates the template into a SyntaxTemplate object. This object contains a compact encoding of the template, plus a reference to the current lexical scope. Evaluating the syntax form translates into invoking the execute method of the SyntaxTemplate. The execute creates a fresh TemplateScope instance, when is used for the scope field of any generated SyntaxForm objects.

Syntax patterns

The pattern in a syntax-case or syntax-rules is matched against a syntax object, which can be an explicit SyntaxForm object, an implicit syntax object (a plain Scheme value), or a mixture of the two. The destructuring specified by the pattern may need to decend into a SyntaxForm. Consider for example this syntax object:

`(,(syntax a) ,(syntax (b c d)) f)

being matched against this pattern:

(x (y . z) . r)

Then we get these bindings:

x: (syntax a)
y: (syntax b)
z: (syntax (c d))
r: (f)

The scope fields of y and z are taken from the original (syntax (b c d)). Note that the last element of the original list is a plain Scheme form - i.e. an implicit syntax form, so that's what the pattern variable is bound to, and that's what get inserted into any template that references r.

The Kawa compiler compiles a syntax-case into a SyntaxPattern object. A single rule in a syntax-rules is compiled to a SyntaxRule, which is a combination of a SyntaxPattern with a SyntaxTemplate. The whole syntax-rules is compiled to a SyntaxRules, which is basically a collection of SyntaxRule objects that implements Procedure1 so it can be used as the transformer of a define-syntax macro.

Template scopes

A TemplateScope is created when a syntax template is expanded.

class TemplateScope extends LetExp
{
}

Initially, a TemplateScope doesn't contain any declarations, but it may gain implicit alias declarations as explained below. In addition, a TemplateScope inherits declarations from the parent scope, which is set to the context of the macro definition.

Kawa's rewrite stage translates an input syntax object to an Expression. The initial syntax object is an implicit syntax object, consisting of a form from the Scheme reader. If we're processing the output from a macro, then the syntax object may contain SyntaxForm objects. To rewrite a SyntaxForm we temporarily change the current scope to the TemplateScope. We then rewrite the form value of the SyntaxForm, using that TemplateScope. This means that identifiers will be looked up in the TemplateScope. This normally doesn't have direct declarations, so effectively we're searching the TemplateScope's parent, which is the syntactic context of the macro definition, as desired. The current scope is restored to the orginal scope when we're done rewriting the SyntaxForm. (Setting and restoring the current scope isn't implemented very efficiently, though it probably doesn't matter in normal code.)

Things get more complicated if the syntax template creates new definitions, but using names coming from the expansion site rather than the template. Consider this example:

(define-syntax mac
  (syntax-rules ()
    ((mac v1 init exp)
     (let ((v1 init))
       (let ((i 2))
         (list exp i))))))
(define j 10)
(define k (mac i 1 (+ i j))) ;; => (11 2)

Here we have a pair of nested let-expressions, both of which end up declaring variables named i. The list-expression references the inner i literally. After macro-expansion the exp also references i, but its syntactic scope comes from the macro application, so it should match the i declaration whose name also comes from the macro application - which is that of the outer let.

This is how it is done: When rewriting the outer let we create a normal LetExp. Rewriting the inner let also creates a LetExp, nested in the normal way. However, the name of this inner i declaration comes from the macro template, so we need to hide it from any code not from the same template. This is where we use the TemplateScope: We place the declaration for the inner i in the TemplateScope, rather than in the inner LetExp where it actually belongs. When we rewrite the first operand of the list, it uses the normal scoping, so it doesn't see the inner i, since that's in the TemplateScope. On the other hand the second operand of list is a SyntaxForm whose scope is the TemplateScope, so we change the current scope to that TemplateScope. This hides the outer i and makes the inner i visible. When we're done rewriting the inner let, we move the inner i declaration out of the TemplateScope (which gets garbage collected) and into the inner LetExp where it stays for the rest of compilation. (The implementation doesn't actually move the declaration. Instead, it initially creates the declaration with a null name in the LetExp and an alias that references to it from the TemplateScope. When we're done, it sets the name from the alias.)