Ideas and tasks for contributing to Kawa

Kawa (like other Free Software projects) has no lack of tasks and projects to work on. Here are some ideas.

Compiler should use class-file reading instead of reflection

The Kawa compiler currently uses reflection to determine properties (such as exported function definitions) of referenced classes. It would be better to read class files directly. This should not be too difficult, since the gnu.bytecode library abstracts over class information, whether it was obtained by reflection or by reading class files.
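As a rough illustration of what reading class files (rather than loading classes reflectively) involves, here is a minimal Java sketch. It walks the class-file structure defined in chapter 4 of the JVM specification and lists each method's name and descriptor without loading or initializing the class. It is not Kawa code: the class name ClassFileMethods is made up, it ignores everything except method names, and a real implementation would go through gnu.bytecode instead.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch: list a class's methods by parsing its class file, without loading the class. */
final class ClassFileMethods {
    /** Returns name + descriptor (e.g. "hashCode()I") for every method declared in the class file. */
    static List<String> methods(InputStream raw) {
        try (DataInputStream in = new DataInputStream(raw)) {
            if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
            in.readUnsignedShort();                            // minor_version
            in.readUnsignedShort();                            // major_version
            int cpCount = in.readUnsignedShort();
            String[] utf8 = new String[cpCount];               // only CONSTANT_Utf8 entries are kept
            for (int i = 1; i < cpCount; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;     // Utf8: same encoding as readUTF
                    case 3: case 4: in.readInt(); break;       // Integer, Float
                    case 5: case 6: in.readLong(); i++; break; // Long, Double occupy two slots
                    case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                        in.readUnsignedShort(); break;
                    case 9: case 10: case 11: case 12: case 17: case 18: // member refs, NameAndType, (Invoke)Dynamic
                        in.readInt(); break;
                    case 15: in.readUnsignedByte(); in.readUnsignedShort(); break; // MethodHandle
                    default: throw new IOException("unknown constant-pool tag " + tag);
                }
            }
            in.readUnsignedShort();                            // access_flags
            in.readUnsignedShort();                            // this_class
            in.readUnsignedShort();                            // super_class
            int interfaceCount = in.readUnsignedShort();
            for (int i = 0; i < interfaceCount; i++) in.readUnsignedShort();
            int fieldCount = in.readUnsignedShort();
            for (int i = 0; i < fieldCount; i++) {             // skip fields: flags, name, descriptor, attributes
                in.readUnsignedShort();
                in.readUnsignedShort();
                in.readUnsignedShort();
                skipAttributes(in);
            }
            List<String> result = new ArrayList<>();
            int methodCount = in.readUnsignedShort();
            for (int i = 0; i < methodCount; i++) {
                in.readUnsignedShort();                        // access_flags
                String name = utf8[in.readUnsignedShort()];
                String descriptor = utf8[in.readUnsignedShort()];
                result.add(name + descriptor);
                skipAttributes(in);                            // Code, Signature, annotations, ...
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void skipAttributes(DataInputStream in) throws IOException {
        int count = in.readUnsignedShort();
        for (int i = 0; i < count; i++) {
            in.readUnsignedShort();                            // attribute_name_index
            in.readFully(new byte[in.readInt()]);              // attribute body
        }
    }
}
```

For example, reading java/lang/Object.class this way yields entries such as hashCode()I and toString()Ljava/lang/String;.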

Make use of Java-7 MethodHandles

Java 7 supports MethodHandles, which are meant to (ultimately) provide better performance for dynamic languages. See JSR 292 and the Da Vinci Machine Project. MethodHandles will be used to compile lambdas in Java 8. Kawa can already be compiled to use MethodHandles, but only in one unimportant way. There is much more to be done. For example, we can start by optimizing arithmetic when the types are unknown at compile-time. MethodHandles could also make implementing generic functions (multimethods) more efficient. At some point we want to compile lambdas in the same way as the experimental Java 8 branch does. This can potentially be more efficient than Kawa’s current mechanism.
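Here is a hypothetical sketch of the arithmetic idea: a generic + whose dispatch is built with the standard java.lang.invoke API. MethodHandles.guardWithTest selects a fast path when both arguments are Integers and otherwise falls back to a slow generic path. The class and method names are invented for illustration, and the slow path handles only Double and integral Number arguments. A real implementation would more likely install such handles in an invokedynamic call site with an inline cache, which this sketch does not do.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

/** Sketch: a generic "+" whose Integer fast path is selected by a MethodHandle guard. */
final class DynamicAdd {
    static boolean bothIntegers(Object a, Object b) {
        return a instanceof Integer && b instanceof Integer;
    }

    static Object addIntegers(Object a, Object b) {
        long sum = (long) ((Integer) a).intValue() + ((Integer) b).intValue(); // cannot overflow a long
        if (sum == (int) sum) return Integer.valueOf((int) sum);
        return Long.valueOf(sum);                    // promote instead of wrapping around
    }

    static Object addGeneric(Object a, Object b) {
        Number x = (Number) a, y = (Number) b;       // sketch: assumes both arguments are numbers
        if (x instanceof Double || y instanceof Double) return x.doubleValue() + y.doubleValue();
        return x.longValue() + y.longValue();        // long overflow is ignored in this sketch
    }

    static final MethodHandle ADD;

    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType binary = MethodType.methodType(Object.class, Object.class, Object.class);
            MethodType test = MethodType.methodType(boolean.class, Object.class, Object.class);
            ADD = MethodHandles.guardWithTest(
                    lookup.findStatic(DynamicAdd.class, "bothIntegers", test),
                    lookup.findStatic(DynamicAdd.class, "addIntegers", binary),
                    lookup.findStatic(DynamicAdd.class, "addGeneric", binary));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static Object add(Object a, Object b) {
        try {
            return (Object) ADD.invokeExact(a, b);   // exact type: (Object, Object) -> Object
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

With this, DynamicAdd.add(2, 3) takes the fast path and returns the Integer 5, while add(2, 1.5) takes the generic path and returns 3.5.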

R6RS and R7RS libraries and syntax

Kawa supports most of the functionality of R6RS and (draft) R7RS. However, various R6RS or (more importantly) R7RS features are missing and should be added. For example, neither the R6RS nor the R7RS library definition syntax is implemented. In a related matter, Kawa supports most of the functionality of syntax-case, but some pieces are missing, and no doubt some of it is incorrect. Adding the missing pieces and testing the correctness of corner cases is needed. Andre van Tonder’s R6RS expander may be helpful.

It would be useful to extend the import form (and also the require form when no explicit filename is given) to search a “source path” for a matching source file, automatically compiling it as needed (as the require form already does when an explicit filename is given). In interactive mode, if the module is already loaded, check whether it is up to date; if not, recompile and reload it.
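The source-path lookup could work roughly as follows. This Java sketch is an assumption, not existing Kawa behavior. It maps a library name such as (my util) to the relative file my/util.scm, searches a list of directories in order, and uses file modification times to decide whether the compiled output is stale. The class name SourcePathResolver and the .scm-only convention are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Sketch: resolve a library name such as (srfi 1) or (my util) against a list of source directories. */
final class SourcePathResolver {
    private final List<Path> sourcePath;

    SourcePathResolver(List<Path> sourcePath) {
        this.sourcePath = sourcePath;
    }

    /** (my util) -> my/util.scm; the ".scm" suffix is an assumption of this sketch. */
    static Path relativeFile(List<String> libraryName) {
        String[] parts = libraryName.toArray(new String[0]);
        parts[parts.length - 1] += ".scm";
        return Paths.get(parts[0], Arrays.copyOfRange(parts, 1, parts.length));
    }

    /** First matching source file, searching the directories in order. */
    Optional<Path> find(List<String> libraryName) {
        Path relative = relativeFile(libraryName);
        for (Path dir : sourcePath) {
            Path candidate = dir.resolve(relative);
            if (Files.isRegularFile(candidate)) return Optional.of(candidate);
        }
        return Optional.empty();
    }

    /** Recompile (and, in a REPL, reload) when the compiled output is missing or older than the source. */
    static boolean needsRecompile(Path source, Path compiled) {
        try {
            return !Files.exists(compiled)
                    || Files.getLastModifiedTime(compiled).compareTo(Files.getLastModifiedTime(source)) < 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```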

Optimize switches (case)

Implement SwitchExp as a new class extending Expression, and compile it using the existing gnu.bytecode.SwitchState. Use it to optimize Scheme’s case form. This might be better done without a new Expression, instead using a special Procedure with custom “validation” and code-generation.

(This is a fairly small starter project.)
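To give a feel for the code-generation decision involved, here is a hypothetical Java sketch. The names are invented and it does not use SwitchState. It flattens the datums of a case form whose keys are all small integers or characters into a sorted key-to-clause map, then chooses between the JVM's tableswitch and lookupswitch instructions. The cost comparison is modeled on the heuristic javac uses for Java switch statements; a Kawa implementation might weigh things differently.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Sketch: plan the bytecode for a Scheme case whose datums are all small integers or characters. */
final class CaseSwitchPlanner {
    enum Kind { TABLESWITCH, LOOKUPSWITCH }

    /** Maps each datum to the index of the first clause listing it, in ascending key order. */
    static SortedMap<Integer, Integer> keyToClause(List<int[]> clauseDatums) {
        SortedMap<Integer, Integer> map = new TreeMap<>();
        for (int clause = 0; clause < clauseDatums.size(); clause++)
            for (int key : clauseDatums.get(clause))
                map.putIfAbsent(key, clause);
        return map;
    }

    /** Dense key ranges favor a jump table; sparse ones favor a sorted key/offset table. */
    static Kind choose(SortedMap<Integer, Integer> keys) {
        if (keys.isEmpty()) return Kind.LOOKUPSWITCH;
        long lo = keys.firstKey(), hi = keys.lastKey(), n = keys.size();
        long tableSpace = 4 + (hi - lo + 1), tableTime = 3;
        long lookupSpace = 3 + 2 * n, lookupTime = n;
        return tableSpace + 3 * tableTime <= lookupSpace + 3 * lookupTime
                ? Kind.TABLESWITCH : Kind.LOOKUPSWITCH;
    }
}
```

For instance, a case over the digits 0 through 9 gets a tableswitch, while one over the five vowel characters (spread from 97 to 117) gets a lookupswitch.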

Parameterized types

Kawa has some limited support for parameterized types, but it’s not used much. Improve type inference. Support definition of parameterized classes. Make better use of parameterized types for sequence classes. Support wildcards. (It might be better to have wildcarding be associated with declarations, as in Scala, rather than with uses.)
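For comparison, Java attaches variance to each use site with wildcards, as this small illustrative sketch shows. In Scala the same information is stated once in the declaration (for example, Scala's immutable List is declared as List[+A]), so individual signatures don't need wildcards. The class and method names are made up.

```java
import java.util.List;

/** Java puts variance at the use site: each signature must say "? extends" or "? super". */
final class UseSiteVariance {
    /** Only reads from the list (a producer), so List<Integer>, List<Double>, ... are all accepted. */
    static double sum(List<? extends Number> xs) {
        double total = 0;
        for (Number x : xs) total += x.doubleValue();
        return total;
    }

    /** Only writes into the list (a consumer), so List<Integer>, List<Number>, or List<Object> is accepted. */
    static void addCounts(List<? super Integer> out, int n) {
        for (int i = 0; i < n; i++) out.add(i);
    }
}
```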

Function types

Kawa doesn’t have true function types: Parameter and result types are only handled for “known” functions. Adding first-class function types would be a major task, possibly depending on improvements in Parameterized types.
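A first-class function type needs at least parameter types, a result type, and a subtyping rule. The hypothetical sketch below uses java.lang.Class as a stand-in for Kawa's own type objects and implements the usual rule: one function type is a subtype of another if it accepts at least the same arguments (parameters are contravariant) and returns something at least as specific (the result is covariant). Generics, varargs, and optional parameters are ignored.

```java
import java.util.List;

/** Hypothetical first-class function type: parameter types plus a result type. */
final class FunctionType {
    final List<Class<?>> params;
    final Class<?> result;

    FunctionType(Class<?> result, Class<?>... params) {
        this.result = result;
        this.params = List.of(params);
    }

    /** Contravariant in the parameters, covariant in the result. */
    boolean isSubtypeOf(FunctionType other) {
        if (params.size() != other.params.size()) return false;
        for (int i = 0; i < params.size(); i++)
            if (!params.get(i).isAssignableFrom(other.params.get(i)))
                return false;                          // must accept whatever `other` accepts
        return other.result.isAssignableFrom(result);  // must return something `other` may return
    }
}
```

So a function of type Number -> Integer can be used where an Integer -> Number function is expected, but not the other way around.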

Full continuations

Add support for full continuations, which is the major feature missing for Kawa to qualify as a “true Scheme”. One way to implement continuations is to add a pass that converts the abstract syntax tree to continuation-passing style, and then extend the existing full-tail-call support to manage a stack. There are other ways to solve the problem. This may benefit from Faster tailcalls.
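The idea behind the CPS approach can be sketched in Java. This is a hand-converted example, not the output of any Kawa pass. After CPS conversion every call passes an explicit continuation, and a trampoline loop (standing in for the full-tail-call machinery) runs one step at a time, so pending work lives in heap-allocated closures rather than on the JVM stack. Because continuations are then ordinary values, escaping to an outer one (as call/cc would allow) is just a call. The names Step, Done, Bounce, and CpsDemo are invented for illustration.

```java
import java.util.function.LongFunction;
import java.util.function.Supplier;

/** One step of a CPS computation: either a finished value or a thunk for the next step. */
abstract class Step {
    /** Trampoline: bounce until done, so the Java stack does not grow with the Scheme call depth. */
    static long run(Step step) {
        while (step instanceof Bounce) step = ((Bounce) step).next.get();
        return ((Done) step).value;
    }
}

final class Done extends Step {
    final long value;
    Done(long value) { this.value = value; }
}

final class Bounce extends Step {
    final Supplier<Step> next;
    Bounce(Supplier<Step> next) { this.next = next; }
}

final class CpsDemo {
    // (define (fact n) (if (= n 0) 1 (* n (fact (- n 1))))) after CPS conversion:
    // the pending (* n ...) becomes a continuation closure instead of a stack frame.
    static Step fact(long n, LongFunction<Step> k) {
        if (n == 0) return new Bounce(() -> k.apply(1));
        return new Bounce(() -> fact(n - 1, v -> new Bounce(() -> k.apply(n * v))));
    }

    static long factorial(long n) {
        return Step.run(fact(n, Done::new));
    }

    // A zero element jumps straight to the outer continuation, abandoning pending multiplications.
    static Step product(long[] xs, int i, LongFunction<Step> k, LongFunction<Step> escape) {
        if (i == xs.length) return new Bounce(() -> k.apply(1));
        if (xs[i] == 0) return new Bounce(() -> escape.apply(0));
        return new Bounce(() -> product(xs, i + 1, v -> new Bounce(() -> k.apply(xs[i] * v)), escape));
    }

    static long product(long[] xs) {
        LongFunction<Step> top = Done::new;
        return Step.run(product(xs, 0, top, top));
    }
}
```

For instance, CpsDemo.factorial(20) returns 2432902008176640000, and product(new long[] {2, 0, 4}) returns 0 without performing the pending multiplication by 2. factorial(100000) runs without a StackOverflowError (the long result simply wraps around).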

Faster tailcalls

Make --full-tailcalls run faster. This may depend on (or incorporate) TreeList-optimization.

TreeList-optimization

The TreeList class is a data structure for “flattened” trees. It is used for XML-style nodes, for multiple values, and for the full-tail-call API. The basic concept is fine, but it could do with some rethinking to make random-access indexing fast. Also, support for updating is insufficient. (This needs someone who enjoys designing and hacking on low-level data structures, along with lots of profiling and testing.)
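To make the indexing problem concrete, here is a hypothetical flattened-tree layout in Java; it is not TreeList's actual representation. Nodes are stored in document order in one list, and each start-of-element marker is back-patched with the index just past its end marker, so a whole subtree can be skipped in constant time. Finding the k-th child then costs O(k) rather than time proportional to the size of the preceding subtrees. The sketch has no bounds checks and no support for updates.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical flattened tree: nodes stored in document order in one list. */
final class FlatTree {
    /** Start-of-element marker; end is the index just past the matching end marker. */
    static final class Begin {
        final String name;
        int end = -1;
        Begin(String name) { this.name = name; }
    }

    static final Object END = new Object();

    private final List<Object> data = new ArrayList<>();
    private final Deque<Begin> open = new ArrayDeque<>();

    FlatTree beginElement(String name) {
        Begin begin = new Begin(name);
        data.add(begin);
        open.push(begin);
        return this;
    }

    FlatTree text(String s) {
        data.add(s);
        return this;
    }

    FlatTree endElement() {
        data.add(END);
        open.pop().end = data.size();   // back-patch the matching begin marker
        return this;
    }

    Object get(int index) {
        return data.get(index);
    }

    /** O(1): an element's whole subtree is skipped via its stored end index. */
    int nextSibling(int index) {
        Object node = data.get(index);
        return node instanceof Begin ? ((Begin) node).end : index + 1;
    }

    /** Index of the k-th child of the element at index: O(k), independent of subtree sizes. */
    int child(int index, int k) {
        int pos = index + 1;
        for (int i = 0; i < k; i++) pos = nextSibling(pos);
        return pos;
    }
}
```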

Implement R7RS exceptions

R6RS and R7RS specify an exception handling mechanism using the procedure with-exception-handler and the guard syntax. This needs to be implemented. We also need to figure out whether and how the R6RS exception handling model should co-exist with the Java exception handling model; the guard form may co-exist better. Exceptions generated by Kawa should be caught by with-exception-handler and guard, though that may be difficult to achieve. This message has some suggestions.
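One possible way to layer the R7RS model onto Java exceptions is sketched below, under several simplifying assumptions. Handlers are kept on a per-thread stack, raise-continuable calls the current handler with the outer handlers installed, guard is implemented by installing a handler that unwinds with a Java exception, and ordinary Java RuntimeExceptions are made visible to guard clauses (but not to with-exception-handler handlers). It deviates from R7RS in places: a handler returning from raise throws a Java IllegalStateException rather than raising a secondary Scheme exception, and an unmatched guard re-throws instead of re-raising in the original dynamic environment. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Supplier;

/** Sketch of R7RS-style handlers on the JVM; simplified, not a conforming implementation. */
final class SchemeExceptions {
    /** Carries a raised Scheme object while Java unwinds the stack. */
    static final class RaisedObject extends RuntimeException {
        final Object payload;
        RaisedObject(Object payload) {
            super(String.valueOf(payload), null, false, false);   // no stack trace: used for control flow
            this.payload = payload;
        }
    }

    private static final ThreadLocal<Deque<Function<Object, Object>>> HANDLERS =
            ThreadLocal.withInitial(ArrayDeque::new);

    /** (with-exception-handler handler thunk) */
    static <T> T withExceptionHandler(Function<Object, Object> handler, Supplier<T> thunk) {
        Deque<Function<Object, Object>> stack = HANDLERS.get();
        stack.push(handler);
        try {
            return thunk.get();
        } finally {
            stack.pop();
        }
    }

    /** (raise-continuable obj): call the current handler with the outer handlers installed. */
    static Object raiseContinuable(Object obj) {
        Deque<Function<Object, Object>> stack = HANDLERS.get();
        if (stack.isEmpty()) throw new RaisedObject(obj);          // no handler at all: unwind
        Function<Object, Object> handler = stack.pop();
        try {
            return handler.apply(obj);
        } finally {
            stack.push(handler);
        }
    }

    /** (raise obj): returning from the handler is an error. */
    static Object raise(Object obj) {
        raiseContinuable(obj);
        throw new IllegalStateException("handler returned from non-continuable raise of " + obj);
    }

    /** Simplified (guard (e clause ...) body ...): clauses return Optional.empty() if none match. */
    static <T> T guard(Function<Object, Optional<T>> clauses, Supplier<T> body) {
        RuntimeException caught;
        Object condition;
        try {
            return withExceptionHandler(obj -> { throw new RaisedObject(obj); }, body);
        } catch (RaisedObject r) {
            caught = r;
            condition = r.payload;
        } catch (RuntimeException e) {                              // Kawa/Java-generated exceptions
            caught = e;
            condition = e;
        }
        Optional<T> result = clauses.apply(condition);
        if (result.isPresent()) return result.get();
        throw caught;                                               // no clause matched: propagate
    }
}
```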

Asynchronous evaluation

C# recently added async and await keywords for asynchronous programming. Kawa’s recently improved support for lazy programming seems like a good framework for equivalent functionality: Instead of an async method that returns a Task<T>, the Kawa programmer would write a function that returns a lazy[T]. This involves some design work, and modifying the compiler to rewrite the function body as needed.
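The compiler rewrite is essentially the one C# performs: each await splits the function body, and the remainder becomes a continuation attached to the awaited value. The Java sketch below uses CompletableFuture as a stand-in for Task<T>. Kawa's lazy[T] is a different type, and this is not a claim about how Kawa would implement it. fetchA and fetchB are hypothetical asynchronous operations.

```java
import java.util.concurrent.CompletableFuture;

final class AsyncRewrite {
    // Hypothetical asynchronous operations standing in for real I/O.
    static CompletableFuture<Integer> fetchA() {
        return CompletableFuture.supplyAsync(() -> 20);
    }

    static CompletableFuture<Integer> fetchB(int a) {
        return CompletableFuture.supplyAsync(() -> a + 2);
    }

    // Source the programmer might write (C#-style pseudocode):
    //   async Task<int> Sum() { int x = await FetchA(); int y = await FetchB(x); return x + y; }
    // Rewritten form: everything after each await becomes a callback on the awaited future.
    static CompletableFuture<Integer> sum() {
        return fetchA().thenCompose(x ->           // code after the first await: x is available
               fetchB(x).thenApply(y -> x + y));   // code after the second await: y is available
    }
}
```

AsyncRewrite.sum().join() evaluates to 42.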

REPL console and other REPL improvements

Improvements to the read-eval-print console. In addition to a traditional Swing console, it would be useful to support using a web browser as a remote terminal, possibly using web-sockets. (This allows “printing” HTML-expressions, which can be a useful way to learn and experiment with web technologies.) See here for an article on the existing Swing REPL, along with some to-do items. Being able to hide and show different parts of the output might be nice. Being able to link from error messages to source might be nice. Better handling of redefinitions is discussed here in the context of JavaFX Script; this is a general REPL issue, mostly independent of the GUI for it.

XQuery-1.1-functionality

It would be nice to update the XQuery (Qexo) support to XQuery 1.1.

XQuery-updates

It would be nice to support XQuery updates. This depends on TreeList-optimization.

Common Lisp support

Kawa supports a small subset of the Common Lisp language, but it supports a much larger subset of core Common Lisp concepts and data structures, some designed with Common Lisp functionality in mind. Examples include packages, arrays, expanded function declarations, type specifications, and format. A lot could be done to improve the Common Lisp support with modest effort. Some Common Lisp features could also be useful for Scheme: Documentation strings (or markup) as Java annotations, better MOP-like introspection, and generic methods a la defmethod (i.e. with multiple definition statements, possibly in separate files, as opposed to the current make-procedure) all come to mind. The goal should be to run some existing Common Lisp code bases with at most modest changes. One such package to start with might be an existing test framework, perhaps FiveAM. Full Common Lisp compatibility would be nice, but let’s walk before we run.
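Here is a hypothetical sketch of documentation strings stored as Java annotations. A runtime-retained @DocString annotation could be attached by a compiler to the method generated for a definition, and a reflective lookup could retrieve it, in the spirit of Common Lisp's documentation function. Neither the annotation nor the helper exists in Kawa today.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Hypothetical annotation a compiler could emit for a documentation string. */
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.FIELD, ElementType.TYPE})
@interface DocString {
    String value();
}

final class DocStringDemo {
    // Roughly what (defun square (x) "Return the square of X." (* x x)) might compile to.
    @DocString("Return the square of X.")
    public static long square(long x) {
        return x * x;
    }

    /** Analogue of (documentation 'square 'function), implemented with reflection. */
    static String documentation(Class<?> owner, String name, Class<?>... paramTypes) {
        try {
            Method method = owner.getMethod(name, paramTypes);
            DocString doc = method.getAnnotation(DocString.class);
            return doc == null ? null : doc.value();
        } catch (NoSuchMethodException e) {
            return null;
        }
    }
}
```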

JEmacs improvements

A lot of work is needed to make JEmacs useful. One could try to import a useful package and see what works and what fails. Or one may look at basic editing primitives. Enhancements may be needed to core Emacs Lisp language primitives (enhancing Common Lisp support may help), or to the display engine.

Improved IDE integration

There is some Kawa support for Eclipse (SchemeWay), and possibly other IDEs (NetBeans, IntelliJ). But many improvements are desirable. REPL improvements may be a component of this.

Plugin for NetBeans IDE

Kawa-Scheme support for the NetBeans IDE would be useful. One could perhaps build on the Clojure plugin.

Plugin for Eclipse IDE

Kawa-Scheme support for the Eclipse IDE would be useful. It probably makes sense to enhance SchemeWay. It may also make sense to build on the Dynamic Languages Toolkit (DLTK), possibly making use of Schemeide, though DLTK seems more oriented towards interpreted, non-JVM-based languages.

Improve Emacs integration

SLIME is an Emacs mode that provides IDE-like functionality. It supports Kawa.

JDEE is a Java development environment, so it might have better hooks to the JVM and the Java debugging architecture.

CEDET is a more general framework of development tools.

Implement javax.tools and code-range support in Kawa compiler

Kawa currently records the line and column position of syntactic elements. For an IDE it is desirable to have both start and end position of an element, for example so it can put a squiggly line under an erroneous form.
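The javax.tools.Diagnostic interface already has slots for a range: getStartPosition and getEndPosition alongside getPosition, getLineNumber, and getColumnNumber. A hypothetical Kawa diagnostic could implement it as sketched below. For simplicity the source is identified by a String rather than a JavaFileObject, offsets are character offsets into the source text, and line and column numbers are 1-based.

```java
import java.util.Locale;
import javax.tools.Diagnostic;

/** Sketch: a javax.tools diagnostic carrying the start and end offsets of the offending form. */
final class RangeDiagnostic implements Diagnostic<String> {
    private final String sourceName, text, message;
    private final long start, end;   // character offsets into text; end is exclusive

    RangeDiagnostic(String sourceName, String text, long start, long end, String message) {
        this.sourceName = sourceName;
        this.text = text;
        this.start = start;
        this.end = end;
        this.message = message;
    }

    @Override public Kind getKind() { return Kind.ERROR; }
    @Override public String getSource() { return sourceName; }
    @Override public long getPosition() { return start; }
    @Override public long getStartPosition() { return start; }
    @Override public long getEndPosition() { return end; }   // lets an IDE underline the whole form
    @Override public String getCode() { return null; }
    @Override public String getMessage(Locale locale) { return message; }

    /** 1-based line of the start offset. */
    @Override public long getLineNumber() {
        return 1 + text.substring(0, (int) start).chars().filter(c -> c == '\n').count();
    }

    /** 1-based column of the start offset. */
    @Override public long getColumnNumber() {
        return start - text.lastIndexOf('\n', (int) start - 1);
    }
}
```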

Output formatting and pretty printing

Kawa’s write should support back-references, as in SRFI-38. This isn’t difficult, but requires understanding and modifying the existing pretty-printer data structures. Once you have figured that out, it would be nice to integrate the pretty-printer with the REPL, so that window re-sizing re-breaks the output lines. It would also be nice to enhance the pretty-printer to handle variable-width fonts and other “rich” text. Figuring out how to make the output formatter more flexible, more efficient, and more customizable is also desirable.
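Back-references are usually produced in two passes. The first walks the structure and counts how often each object is reached (by identity). The second prints, assigning a label #n= to each object reached more than once and writing #n# on later encounters. The hypothetical Java sketch below does this for a minimal Pair class (null stands for the empty list, and atoms are printed with toString). It is independent of Kawa's pretty-printer data structures.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical minimal pair standing in for Scheme pairs; null is the empty list. */
final class Pair {
    Object car, cdr;
    Pair(Object car, Object cdr) { this.car = car; this.cdr = cdr; }
}

/** SRFI-38-style writer sketch: labels pairs that are reachable more than once. */
final class SharedWriter {
    private final Map<Object, Integer> seen = new IdentityHashMap<>();   // pass 1: visit counts
    private final Map<Object, Integer> labels = new IdentityHashMap<>(); // pass 2: assigned labels
    private final StringBuilder out = new StringBuilder();

    static String write(Object obj) {
        SharedWriter writer = new SharedWriter();
        writer.count(obj);
        writer.print(obj);
        return writer.out.toString();
    }

    private void count(Object obj) {
        if (!(obj instanceof Pair)) return;
        if (seen.merge(obj, 1, Integer::sum) > 1) return;   // seen before: shared, don't descend again
        count(((Pair) obj).car);
        count(((Pair) obj).cdr);
    }

    private boolean shared(Object obj) {
        return seen.getOrDefault(obj, 0) > 1;
    }

    private void print(Object obj) {
        if (!(obj instanceof Pair)) {                       // atoms; null prints as ()
            out.append(obj == null ? "()" : obj);
            return;
        }
        Integer label = labels.get(obj);
        if (label != null) {                                // already printed: back-reference
            out.append('#').append(label).append('#');
            return;
        }
        if (shared(obj)) {                                  // first of several references: define a label
            label = labels.size();
            labels.put(obj, label);
            out.append('#').append(label).append('=');
        }
        Pair pair = (Pair) obj;
        out.append('(');
        print(pair.car);
        Object rest = pair.cdr;
        while (rest instanceof Pair && !shared(rest)) {     // keep the list flat until a labelled tail
            out.append(' ');
            print(((Pair) rest).car);
            rest = ((Pair) rest).cdr;
        }
        if (rest != null) {                                 // improper or shared tail
            out.append(" . ");
            print(rest);
        }
        out.append(')');
    }
}
```

A two-element circular list prints as #0=(a b . #0#), and a list containing the same sublist twice prints as (#0=(x) #0#).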

Shell/process syntax

See this message for a rough design.

Rich String Literals

Implement multi-line string literals with expression escapes. See Racket’s Scribble for some ideas and desired functionality. The exact syntax needs to be decided. Presumably the literal starts with the dispatch character ‘#’ followed by some other character (most likely ‘"’, though possibly ‘{’). In the following, the escape character is assumed to be ‘\’, but other possibilities include ‘@’ (as in Scribble) or ‘&’ (as in XML).

Various goals and ideas:

  • General Scribble-like syntax:

    \cmd[ expression ...]{textbody}

    This is roughly equivalent to embedding the result of evaluating:

    (cmd expression ... "textbody")

    Both [ expression ...] and {textbody} are optional. (We need some convention or delimiter to end cmd, especially when textbody is missing.) The cmd is also optional, in which case the expressions are embedded in the output.

  • The start delimiter and the end delimiter must be different - both for the entire string, and for embedded textbodies. This is for easier parsing (and catching errors) for both humans and tools. E.g. #"some text"# or #{some text} or #{some text}# depending on the delimiters chosen.

  • There should be some convention for stripping off indentation.

  • Should support optional end delimiters. This is useful to make long texts more readable, and also to catch errors. For example:

    \cmd{some body stuff \cmd}
    

    Here the two instances of cmd have to match. (The end \cmd has to be immediately followed by ‘}’ to avoid ambiguity.)

  • Support format specifiers:

    \~3d(length l)
    

    is equivalent to:

    \[(format "~3d" (length l))]
    

    Should also handle printf-style format specifiers: \%3d(length l).

  • Handle localization. I.e. some marker to indicate the string should be localized, in the manner of (and compatible with) GNU gettext. Optionally specify a localization key (to use as an index in the translation database); if there is no key specified, default to using the literal parts of the string.

  • Note in general the result is a “text”, which is a generalization of a string: A text is a sequence of characters *or* other values. If an embedded expression evaluates to a non-string, it is not automatically converted to a string. Conversion to a string is done on demand, for example on printing. (This is similar to values in the XML data model.)
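One candidate convention for stripping indentation is similar to the rule Java text blocks use (see String.stripIndent): remove the whitespace prefix common to all non-blank lines. Below is a minimal sketch of that rule. It assumes tabs and spaces each count as one column and that blank lines are emptied; the class name is hypothetical, and a trailing newline, if any, is dropped.

```java
import java.util.List;
import java.util.stream.Collectors;

final class IndentStripper {
    /** Removes the whitespace prefix shared by all non-blank lines; blank lines become empty. */
    static String stripCommonIndent(String text) {
        List<String> lines = text.lines().collect(Collectors.toList());
        int common = lines.stream()
                .filter(line -> !line.isBlank())
                .mapToInt(IndentStripper::leadingWhitespace)
                .min()
                .orElse(0);
        return lines.stream()
                .map(line -> line.isBlank() ? "" : line.substring(common))
                .collect(Collectors.joining("\n"));
    }

    /** Number of leading whitespace characters (a tab counts as one). */
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) i++;
        return i;
    }
}
```

Relative indentation is preserved: a line indented two columns more than its neighbors keeps those two extra columns.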