Kawa: Modules and how they are compiled to classes

Modules provide a way to organize Scheme into reusable parts with explicitly defined interfaces to the rest of the program. A module is a set of definitions that the module exports, as well as some actions (expressions evaluated for their side effects). The top-level forms in a Scheme source file comprise a module; the source file is the module source. When Kawa compiles the module source, the result is the module class. Each exported definition is translated to a public field in the module class.

Name visibility

The definitions that a module exports are accessible to other modules. These are the "public" definitions, to use Java terminology. By default, all the identifiers declared at the top-level of a module are exported, except those defined using define-private. (If compiling with the --main flag, then by default no identifiers are exported.) However, a major purpose of using modules is to control the set of names exported. One reason is to reduce the chance of accidental name conflicts between separately developed modules. An even more important reason is to enforce an interface: Client modules should only use the names that are part of a documented interface, and should not use internal implementation procedures (since those may change).

If there is a module-export (or export) declaration in the module, then only those names listed are exported. There can be more than one module-export, and they can be anywhere in the Scheme file. The recommended style has a single module-export near the beginning of the file.

Syntax: module-export export-spec*

Syntax: export export-spec*

The forms export and module-export are equivalent. (The older Kawa name is module-export; export comes from R7RS.) Either form specifies a list of identifiers which can be made visible to other libraries or programs.

export-spec ::= identifier
| (rename identifier₁ identifier₂)

In the former variant, an identifier names a single binding defined within or imported into the library, where the external name for the export is the same as the name of the binding within the library. A rename spec exports the binding defined within or imported into the library and named by identifier₁, using identifier₂ as the external name.

Note that it is an error if there is no definition for identifier (or identifier₁) in the current module, or if it is defined using define-private.
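As a minimal sketch of a rename spec, the module below exports fact under its own name and exports the internal helper fact-iter under the external name factorial-loop. Both procedure names are hypothetical, chosen only for illustration.

```scheme
;; Hypothetical module: fact-iter and factorial-loop are illustrative names.
(module-export fact (rename fact-iter factorial-loop))

;; Accumulator-style helper; other modules see it as factorial-loop.
(define (fact-iter n acc)
  (if (<= n 1)
      acc
      (fact-iter (- n 1) (* acc n))))

;; Exported under the same name it has inside the module.
(define (fact n)
  (fact-iter n 1))
```

A client module would then refer to fact and factorial-loop; the internal name fact-iter is not visible to it.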

As a matter of style, export or module-export should appear after module-name but before other commands (including import or require). (This is a requirement if there are any cycles.)

In this module, fact is public and worker is private:

(module-export fact)
(define (worker x) ...)
(define (fact x) ...)

Alternatively, you can write:

(define-private (worker x) ...)
(define (fact x) ...)

R7RS explicit library modules

An R7RS define-library form is another way to create a module. The R7RS term library is roughly the same as a Kawa module. In Kawa, each source file is an implicit module, which may contain zero or more explicit sub-modules (in the form of define-library) optionally followed by the definitions and expressions of the implicit (file-level) module.

Syntax: define-library library-name library-declaration*

library-name ::= ( library-name-parts )
library-name-parts ::= identifier⁺

A library-name is a list whose members are identifiers and exact non-negative integers. It is used to identify the library uniquely when importing from other programs or libraries. Libraries whose first identifier is scheme are reserved for use by the R7RS report and future versions of that report. Libraries whose first identifier is srfi are reserved for libraries implementing Scheme Requests for Implementation. It is inadvisable, but not an error, for identifiers in library names to contain any of the characters | \ ? * < " : > + [ ] / . or control characters after escapes are expanded.

See module-name for how a library-name is mapped to a class name.

The begin, include, and include-ci declarations are used to specify the body of the library. They have the same syntax and semantics as the corresponding expression types. This form of begin is analogous to, but not the same as, regular begin. A plain statement (which is allowed as a Kawa extension) is also part of the body of the library, as if it were wrapped in a begin.

The include-library-declarations declaration is similar to include except that the contents of the file are spliced directly into the current library definition. This can be used, for example, to share the same export declaration among multiple libraries as a simple form of library interface.

The cond-expand declaration has the same syntax and semantics as the cond-expand expression type, except that it expands to spliced-in library declarations rather than expressions enclosed in begin.

When a library is loaded, its expressions are executed in textual order. If a library’s definitions are referenced in the expanded form of a program or library body, then that library must be loaded before the expanded program or library body is evaluated. This rule applies transitively. If a library is imported by more than one program or library, it may possibly be loaded additional times.

Similarly, during the expansion of a library (foo), if any syntax keywords imported from another library (bar) are needed to expand the library, then the library (bar) must be expanded and its syntax definitions evaluated before the expansion of (foo).

Regardless of the number of times that a library is loaded, each program or library that imports bindings from a library must do so from a single loading of that library, regardless of the number of import declarations in which it appears. That is, (import (only (foo) a)) followed by (import (only (foo) b)) has the same effect as (import (only (foo) a b)).
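The declarations described above can be combined as in the following sketch. The library name (example counter) and its procedures are hypothetical; the point is only to show export (including a rename spec), import, and begin working together inside define-library.

```scheme
;; Hypothetical library; all names are illustrative.
(define-library (example counter)
  (export make-counter (rename counter-next! next!))
  (import (scheme base))
  (begin
    ;; A counter is a one-element vector holding the current count.
    (define (make-counter) (vector 0))
    ;; Increment the count and return the new value.
    (define (counter-next! c)
      (vector-set! c 0 (+ 1 (vector-ref c 0)))
      (vector-ref c 0))))
```

A program that wants these bindings would write (import (example counter)) and call make-counter and next!; the internal name counter-next! is not exported.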

How a module becomes a class

If you want to just use a Scheme module as a module (i.e. load or require it), you don’t care how it gets translated into a module class. However, Kawa gives you some control over how this is done, and you can use a Scheme module to define a class which you can use with other Java classes. This style of class definition is an alternative to define-class, which lets you define classes and instances fairly conveniently.

The default name of the module class is the main part of the filename of the Scheme source file (with directories and extensions stripped off). That can be overridden by the -T Kawa command-line flag. The package-prefix specified by the -P flag is prepended to give the fully-qualified class name.

Syntax: module-name name

Syntax: module-name <name>

Syntax: module-name library-name

Sets the name of the generated class, overriding the default. If there is no ‘.’ in the name, the package-prefix (specified by the -P Kawa command-line flag) is prepended.

If the form library-name is used, then the class name is the result of taking each identifier in the library-name-parts, mangling if needed, and concatenating them separated by periods. For example (org example doc-utils) becomes org.example.doc-utils. (You can’t reference the class name doc-utils directly in Java, but the JVM has no problems with it. In Java you can use reflection to access classes with such names.)

As a matter of style, module-name should be the first command in a file (after possible comments). It must appear before a require or import, in case of cycles.
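A minimal sketch of the recommended ordering, assuming a hypothetical source file whose generated class should be named from the library-name (org example doc-utils). The procedure shout is illustrative only.

```scheme
;; Hypothetical file; the names below are assumptions for illustration.
(module-name (org example doc-utils))  ; first command: sets the class name
(module-export shout)                  ; then the export list

;; Definitions follow the declarations.
(define (shout s)
  (string-append s "!"))
```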

By default, the base class of the generated module class is unspecified; you cannot count on it being more specific than Object. However, you can override it with module-extends.

Syntax: module-extends class

Specifies that the class generated from the immediately surrounding module should extend (be a sub-class of) the class class.

Syntax: module-implements interface ...

Specifies that the class generated from the immediately surrounding module should implement the interfaces listed.

Note that the compiler does not currently check that all the abstract methods required by the base class or implemented interfaces are actually provided, and have the correct signatures. This will hopefully be fixed, but for now, if you forget a method, you will probably get a verifier error.
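The following is a sketch, not a verified recipe, of a module meant to implement java.util.Comparator. The class name org.example.ByLength is hypothetical, and the sketch assumes that the top-level procedure compare, with its declared int return type, is compiled to an instance method matching Comparator.compare(Object, Object). As noted above, the compiler will not confirm that the signature matches.

```scheme
;; Hypothetical module; org.example.ByLength is an illustrative name.
(module-name org.example.ByLength)
(module-implements java.util.Comparator)

;; Intended to provide the interface's compare method.
;; Orders strings by length; a and b are left untyped (Object).
(define (compare a b) :: int
  (- (string-length a) (string-length b)))
```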

For each top-level exported definition the compiler creates a corresponding public field with a similar (mangled) name. By default, there is some indirection: The value of the Scheme variable is not that of the field itself. Instead, the field is a gnu.mapping.Location object, and the value of the Scheme variable is defined to be the value stored in the Location. However, if you specify an explicit type, then the field will have the specified type, instead of being a Location. The indirection using Location is also avoided if you use define-constant.
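The three cases just described can be put side by side. The variable names here are hypothetical; the comments restate what the text above says about the resulting field, rather than output inspected from the compiler.

```scheme
;; Hypothetical definitions; names are illustrative.

;; No explicit type: by default the field is a gnu.mapping.Location
;; and the variable's value is stored inside it.
(define counter 0)

;; Explicit type: the field has the declared type instead.
(define limit :: int 100)

;; define-constant: the Location indirection is also avoided.
(define-constant greeting "hello")
```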

If the Scheme definition defines a procedure (which is not re-assigned in the module), then the compiler assumes the variable is bound as a constant procedure. The compiler generates one or more methods corresponding to the body of the Scheme procedure. It also generates a public field with the same name; the value of the field is an instance of a subclass of <gnu.mapping.Procedure> which when applied will execute the correct method (depending on the actual arguments). The field is used when the procedure is used as a value (such as being passed as an argument to map), but when the compiler is able to do so, it will generate code to call the correct method directly.

You can control the signature of the generated method by declaring the parameter types and the return type of the method. See the applet (see Applet compilation) example for how this can be done. If the procedure has optional parameters, then the compiler will generate multiple methods, one for each argument list length. (In rare cases the default expression may be such that this is not possible, in which case a "variable argument list" method is generated instead. This only happens when there is a nested scope inside the default expression, which is very contrived.) If there are #!keyword or #!rest arguments, the compiler generates a "variable argument list" method. This is a method whose last parameter is either an array or a <list>, and whose name has $V appended to indicate the last parameter is a list.
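A short sketch of the three situations described above, using hypothetical procedure names. The comments summarize the behavior stated in the text; the exact generated method names depend on mangling and are not shown.

```scheme
;; Hypothetical procedures; names are illustrative.

;; Declared parameter and return types control the method signature.
(define (scale (x :: double) (factor :: double)) :: double
  (* x factor))

;; One optional parameter: per the text, one method is generated
;; for each argument list length (here, one and two arguments).
(define (greet name #!optional (greeting "Hello"))
  (string-append greeting ", " name))

;; A #!rest parameter: per the text, a "variable argument list"
;; method is generated, with $V appended to its name.
(define (sum-all #!rest args)
  (apply + args))
```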

Top-level macros (defined using either define-syntax or defmacro) create a field whose type is currently a sub-class of kawa.lang.Syntax; this allows importing modules to detect that the field is a macro and apply the macro at compile time.

Unfortunately, the Java class verifier does not allow fields to have arbitrary names. Therefore, the name of a field that represents a Scheme variable is "mangled" (see Mangling) into an acceptable Java name. The implementation can recover the original name of a field X as ((gnu.mapping.Named) X).getName() because all the standard compiler-generated field types implement the Named interface.

Same class for module and defined class

You can declare a class using define-simple-class with the same name as the module class, for example the following in a file named foo.scm:

(define-simple-class foo ...)

In this case the defined class will serve dual-purpose as the module class.

To avoid confusion, in this case you must not specify module-extends, module-implements, or (module-static #t). Also, the defined class should not have public static members. In that case it works out pretty well: public static members represent bindings exported by the module; other non-private members “belong” to the defined class.

In this case (module-static 'init-run) is implied.
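As a rough sketch of a dual-purpose class, assume a file named foo.scm with the contents below. The field count, the method increment, and the procedure make-foo are all hypothetical names. The class declares no public static members of its own, so the only public static members are the ones representing module bindings such as make-foo.

```scheme
;; Assumed to live in foo.scm, so the defined class and the
;; module class share the name foo. All member names are illustrative.
(define-simple-class foo ()
  ;; Instance field belonging to the defined class.
  (count :: int init: 0)
  ;; Instance method: bump the count and return the new value.
  ((increment) :: int
   (set! count (+ count 1))
   count))

;; A module-level binding exported by the module.
(define (make-foo)
  (foo))
```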

Static vs non-static modules

There are two kinds of module class: A static module is a class (or gets compiled to a class) all of whose public fields are static, and that does not have a public constructor. A JVM can only have a single global instance of a static module. An instance module has a public default constructor, and usually has at least one non-static public field. There can be multiple instances of an instance module; each instance is called a module instance. However, only a single instance of a module can be registered in an environment, so in most cases there is only a single instance of instance modules. Registering an instance in an environment means creating a binding mapping a magic name (derived from the class name) to the instance.

In fact, any Java class that has the properties of either an instance module or a static module is a module, and can be loaded or imported as such; the class need not have been written using Scheme.

You can control whether a module is compiled to a static or a non-static class using either a command-line flag to the compiler, or using the module-static special form.

--module-static: Generate a static module (as if (module-static #t) were specified). This is (now) the default.
--module-nonstatic
--no-module-static: Generate a non-static module (as if (module-static #f) were specified). This used to be the default.
--module-static-run: Generate a static module (as if (module-static 'init-run) were specified).

Syntax: module-static name ...

Syntax: module-static #t

Syntax: module-static #f

Syntax: module-static 'init-run

Controls whether the generated fields and methods are static. If #t or 'init-run is specified, then the module will be a static module, and all definitions will be static. If 'init-run is specified, the module body is additionally evaluated in the class’s static initializer; otherwise, it is run the first time the module is require’d. In all other cases, the module is an instance module. If there is a non-empty list of names, then the module is an instance module, but the names that are explicitly listed will be compiled to static fields and methods. If #f is specified, then all exported names will be compiled to non-static (instance) fields and methods.
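The name-list form is the least obvious of these variants, so here is a minimal sketch of it. The names helper, total, and add! are hypothetical; the comments restate the behavior described above.

```scheme
;; Hypothetical module; names are illustrative.
;; Only helper is listed, so the module is an instance module
;; in which helper alone is compiled to static members.
(module-static helper)

;; Listed: compiled to a static field and method.
(define (helper x) (* x x))

;; Not listed: compiled to instance (non-static) members.
(define total 0)
(define (add! n)
  (set! total (+ total (helper n)))
  total)
```

Replacing the declaration with (module-static #t) or (module-static 'init-run) would instead make every definition static.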

By default, if no module-static is specified:

If there is a module-extends or module-implements declaration, or one of the --applet or --servlet command-line flags was specified, then (module-static #f) is implied.

If one of the command-line flags --no-module-static, --module-nonstatic, --module-static, or --module-static-run was specified, then the default is #f, #f, #t, or 'init-run, respectively.

If the module class is dual-purpose then (module-static 'init-run) is implied.

Otherwise the default is (module-static #t). (It used to be (module-static #f) in older Kawa versions.)

The default is (module-static #t). It usually produces more efficient code, and is recommended if a module contains only procedure or macro definitions. However, a static module means that all environments in a JVM share the same bindings, which you may not want if you use multiple top-level environments.

The top-level actions of a module will get compiled to a run method. If there is an explicit module-extends, then the module class will also automatically implement java.lang.Runnable. (Otherwise, the class does not implement Runnable, since in that case the run method returns an Object rather than void. This will likely change.)

Module options

Certain compilation options can be specified either on the command-line when compiling, or in the module itself.

Syntax: module-compile-options [key: value] ...

This sets the value of the key option to value for the current module (source file). It takes effect as soon as it is seen during the first macro-expansion pass, and is active thereafter (unless overridden by with-compile-options).

The key: is one of the supported option names (the ending colon makes it a Kawa keyword). Valid option keys are:

main: - Generate an application, with a main method.

full-tailcalls: - Use a calling convention that supports proper tail recursion.

warn-undefined-variable: - Warn if no compiler-visible binding for a variable.

warn-unknown-member: - Warn if referencing an unknown method or field.

warn-invoke-unknown-method: - Warn if invoke calls an unknown method (subsumed by warn-unknown-member).

warn-unused: - Warn if a variable is unused or code is never executed.

warn-uninitialized: - Warn if accessing an uninitialized variable.

warn-unreachable: - Warn if this code can never be executed.

warn-void-used: - Warn if an expression depends on the value of a void sub-expression (one that never returns a value).

warn-as-error: - Treat a compilation warning as if it were an error.

The value must be a literal value: either a boolean (#t or #f), a number, or a string, depending on the key. (All the options so far are boolean options.)

(module-compile-options warn-undefined-variable: #t)
;; This causes a warning message that y is unknown.
(define (func x) (list x y))

Syntax: with-compile-options [key: value] ... body

Similar to module-compile-options, but the option is only active within body.

The module option key main: has no effect when applied to a particular body via the with-compile-options syntax.

(define (func x)
  (with-compile-options warn-invoke-unknown-method: #f
    (invoke x 'size)))