Qexo - an XQuery-to-Java compiler

Qexo runs on the Java™ platform. It is written in Java, and it compiles XQuery expressions and programs to Java bytecodes (.class files). Qexo is Free Software (or open-source, if you prefer), available from the Qexo website.

Qexo is based on, and is part of, the Kawa framework. Kawa (no relation to the now-defunct IDE of the same name) was originally written in 1996 at Cygnus Solutions (now part of Red Hat) to compile the Scheme functional programming language to Java bytecodes. Since then Kawa has been generalized to handle other programming languages, now including XQuery.

Kawa depends on a Java feature called a ClassLoader, which can take the bytecode representation of a Java program (same format as a .class file, but stored in an array in memory), and convert that into a runnable class in an existing Java executable (or "virtual machine"). (The same mechanism is used when a browser downloads and runs an "applet".)
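
As a rough illustration of that mechanism (not Kawa's actual code), the sketch below defines a class from a byte array held in memory and then calls it. The names InMemoryLoaderDemo, ByteArrayLoader, and Greeter are made up for this example, and the bytes are simply read back from an already-compiled class rather than produced by a compiler.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class InMemoryLoaderDemo {
    /** Hypothetical payload; its compiled bytecode stands in for compiler output. */
    public static class Greeter {
        public static String greet() { return "hello from an in-memory class"; }
    }

    /** ClassLoader that exposes the protected defineClass for a byte array. */
    static class ByteArrayLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytecode) {
            return defineClass(name, bytecode, 0, bytecode.length);
        }
    }

    /** Reads the .class bytes of an already-loaded class into memory. */
    static byte[] readBytecode(Class<?> cls) throws IOException {
        String resource = cls.getName().replace('.', '/') + ".class";
        try (InputStream in = cls.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) throw new IOException("cannot find " + resource);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) > 0; ) out.write(buf, 0, n);
            return out.toByteArray();
        }
    }

    /** Defines Greeter a second time from the byte array and invokes greet(). */
    public static String loadAndGreet() {
        try {
            byte[] bytecode = readBytecode(Greeter.class);
            Class<?> loaded =
                new ByteArrayLoader().define(Greeter.class.getName(), bytecode);
            return (String) loaded.getMethod("greet").invoke(null);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadAndGreet());
    }
}
```

The class returned by define is distinct from the Greeter that the JVM loaded normally, which is why the call goes through reflection.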

Compiling to bytecodes and then using a ClassLoader gives Qexo both fast interactive responsiveness and fast execution of repetitive code. You can also save compiled code in a .class file so it is available for future use, and it can even be compiled to machine code using a Java-to-machine-code compiler such as GCJ.

The Qexo web site gives instructions for how you can get Qexo. The easiest way is to download the latest version of the Kawa jar file, for example kawa-1.7.jar, and put it in your class path.

Running Qexo

Running the Qexo application

In the following, we write 'qexo' to mean the command you use to start up Qexo. There are a number of ways you can actually run Qexo. If you have downloaded Kawa as a jar file (for example kawa-1.7.jar), you can start up Qexo using either command:

java -cp kawa-1.7.jar kawa.repl --xquery
or
java -jar kawa-1.7.jar --xquery

In the following we'll assume you've defined qexo as an alias for one of the above. On a Unix or GNU/Linux system, you can make an alias like this:

$ alias qexo='java -jar kawa-1.7.jar --xquery'
and then just do:
$ qexo

We use $ to stand for the prompt for your command-line processor (shell or console), and we use boldface for commands you type.

Alternatively, you can place the kawa-1.7.jar file in your class path, and just type 'java kawa.repl --xquery' instead of 'qexo':

$ java kawa.repl --xquery

Interactive use

If you start up Qexo without specifying any file parameters, it will enter an interactive loop. Here are some examples, with user input shown in bold.

$ qexo
(: 1 :) for $i in 1 to 3 return 10*$i
10 20 30

The command line prompt includes the current input line number, and has the form of an XQuery comment, to make it easier to cut and paste. Following the prompt you can type some complete XQuery expression, in the example for $i in 1 to 3 return 10*$i, and hit Enter (or Return on some keyboards). The Qexo processor evaluates the expression, and writes out the result, in this case a sequence of 3 integers.

How does Qexo know when an expression is "complete"? When should it evaluate what it has, as opposed to prompting for more input? The rule is that if the current input forms a complete valid expression, it evaluates it. If it has seen a syntax error, it prints out a message and discards the input. Otherwise, it prints a prompt, and waits for more input.

Let us continue, this time with some multi-line expressions:

(: 2 :) (3
(: 3(:) +10)
13
(: 4 :) if (3<2)
(: 5i:) then "it's true"
(: 6i:) else "it's false"
it's false

Notice how the prompt gains a '(' or an 'i' to indicate that we're inside an incomplete parenthetical or if expression, respectively.

Next some examples of syntax errors.

(: 7 :) (for $x := 10 return $x
<stdin>:7:9: missing 'in' in 'for' clause
(: 8 :) %+1
<stdin>:8:1: invalid character '%'
(: 9 :) = 5
<stdin>:9:1: missing expression

Qexo prints out the "file name" of the error (in this case the standard console input), followed by the line and column numbers. For the last error, it couldn't be more specific than missing expression.
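
The evaluate, discard, or keep-reading rule described earlier can be sketched in Java. This is only a toy under loud assumptions: the class ReplLoopSketch and its Status values are invented here, and the "parser" looks at nothing but parenthesis balance, whereas Qexo uses its real XQuery parser to make the decision.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplLoopSketch {
    enum Status { COMPLETE, INCOMPLETE, ERROR }

    /** Toy stand-in for a parser: it only checks parenthesis balance. */
    static Status classify(String input) {
        int depth = 0;
        for (char c : input.toCharArray()) {
            if (c == '(') depth++;
            else if (c == ')' && --depth < 0) return Status.ERROR;
        }
        return depth == 0 ? Status.COMPLETE : Status.INCOMPLETE;
    }

    /** Feeds lines one at a time; returns the inputs that would be evaluated. */
    static List<String> run(String... lines) {
        List<String> evaluated = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (String line : lines) {
            if (pending.length() > 0) pending.append('\n');
            pending.append(line);
            switch (classify(pending.toString())) {
                case COMPLETE:      // complete: hand it to the evaluator
                    evaluated.add(pending.toString());
                    pending.setLength(0);
                    break;
                case ERROR:         // syntax error: report and discard
                    System.err.println("syntax error; input discarded");
                    pending.setLength(0);
                    break;
                case INCOMPLETE:    // keep what we have and read another line
                    break;
            }
        }
        return evaluated;
    }

    public static void main(String[] args) {
        System.out.println(run("(3", "+10)", ")", "1+2"));
    }
}
```

Here the first two lines are joined into one input before being evaluated, the lone ")" is rejected and dropped, and "1+2" is evaluated on its own.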

Next is an example of an element constructor expression. Notice how the prompt changes to an XML comment.

(: 2 :) <a>
<!--3--><b>{for $i in 1 to 3 return 10*$i}</b>
<!--4--></a>
<a>
  <b>10 20 30</b>
</a>

You can also define XQuery functions interactively:

(: 5 :) define function repeat ($count, $values) {
(: 6{:)   for $i in 1 to $count return $values
(: 7{:) }
(: 8 :) "[", repeat(4, (1,2)), "]"
[1 2 1 2 1 2 1 2]
(: 9 :)

Running XQuery programs

The XQuery specification defines a program as a collection of declarations followed by a top-level expression. The "normal" way of running a program is to put it in a file, and evaluate it. You can use the -f command-line flag to specify the name of a file containing a program:

$ qexo -f pictures.xql

You can also specify a (short!) XQuery program on the command line following a -e flag:

$ qexo -e '<img src="file.png"></img>'
<img src="file.png" />

The output is by default printed using the XHTML style, which is XML in a style that most HTML browsers can handle. You can override the output format using an --output-format option. For example you can specify HTML format:

$ qexo --output-format html -e '<img src="file.png"></img>'
<img src="file.png">

You can even specify a format for Scheme programmers:

$ qexo --output-format scheme -e '<img src="file.png"></img>'
(img src: file.png )

Compiling an XQuery program to an application

If you have an application you'll be running repeatedly, it makes sense to compile it and save the compiled form for future use. If you run Qexo with the -C flag followed by one or more filenames, then those files will be compiled, producing one or more .class files. The --main option specifies that Qexo should generate a main method, creating an application that can be run by the java command. Assume pictures.xql is the name of a file containing an XQuery program:

$ qexo --main -C pictures.xql

This creates a file pictures.class. (It may in rare cases create some other classes as well. These have the form pictures*.class.) You can run this as follows:

$ java -cp .:kawa-1.7.jar pictures

This should be the same as, but faster than, running:

$ qexo -f pictures.xql

Run XQuery Servlets in a Web Server

A servlet is a Java class that can be loaded into a Web server to process and answer HTTP requests. It is an efficient way to provide server-side computation, because the servlet can be loaded and allocated once, and then process thousands of requests. An XQuery program can be compiled by Qexo into a servlet. See here and chapter 12 for more information and examples of servlets using Qexo.

Calling Java methods from XQuery

A Qexo extension allows you to call an arbitrary Java method in an XQuery expression, using XQuery function call notation.

The following example uses Drew Noakes' EXIF extraction library for extracting EXIF meta-data (time-stamps, focal length, etc.) commonly produced by digital cameras. The code assumes that exifExtractor.jar is in your class path. The code first declares a number of namespaces as aliases for Java classes.

declare namespace exif-extractor = "class:com.drew.imaging.exif.ExifExtractor"
declare namespace exif-loader = "class:com.drew.imaging.exif.ExifLoader"
declare namespace ImageInfo = "class:com.drew.imaging.exif.ImageInfo"
declare namespace File = "class:java.io.File"

Remember that a namespace defines a prefix alias for a URI literal, which can be any string, used as a unique name. Qexo uses the convention that a URI string starting with class: refers to a Java class. Specifically, it acts as if all Java methods are pre-bound to a QName whose local name is the method name, and whose namespace URI is class: followed by the fully-qualified Java class name. For example, if the Qexo processor sees a call to a function exif-loader:getImageInfo, with the namespaces as defined above, then it will translate that into a call to a method named getImageInfo in the class com.drew.imaging.exif.ExifLoader. (That is assuming you haven't explicitly defined a function by that name!) If the method is overloaded, Qexo uses the argument types to select a method. The method name new is used specially for creating new objects, being equivalent to a Java new expression.

define function get-image-info ($filename as xs:string)
{
<pre>{
  let $info := exif-loader:getImageInfo(File:new($filename))
  for $i in iterator-items(ImageInfo:getTagIterator($info)) return
 ( "
", ImageInfo:getTagName($i),": ", ImageInfo:getDescription($info, $i))
}</pre>
}

The function takes a single parameter: $filename, which is the name of a JPEG image file as a string. It uses that to create a new File, which is used to create an ImageInfo object. The getTagIterator method creates a java.util.Iterator instance, which you can use to get all the EXIF tags in the image. The Qexo function iterator-items takes an Iterator and turns it into an XQuery sequence consisting of the values returned by the Iterator. The for "loops" over this sequence, and we format each tag item into a readable output line.
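
To make the class: convention more concrete, here is a small Java sketch that mimics it with reflection. It is an illustration only, not how Qexo is implemented: the class ClassNamespaceSketch and its call helper are hypothetical, overloads are chosen by a naive "first one whose parameters accept the arguments" test, primitive parameter types are not handled, and an instance method is assumed to take its receiver as the first argument, as in the ImageInfo calls above.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;

public class ClassNamespaceSketch {
    static final String PREFIX = "class:";

    /** Hypothetical helper: resolve {class:Name}localName and invoke it. */
    public static Object call(String namespaceUri, String localName, Object... args) {
        if (!namespaceUri.startsWith(PREFIX))
            throw new IllegalArgumentException("not a class: URI: " + namespaceUri);
        try {
            Class<?> cls = Class.forName(namespaceUri.substring(PREFIX.length()));
            if (localName.equals("new")) {          // "new" means a constructor
                for (Constructor<?> c : cls.getConstructors())
                    if (accepts(c.getParameterTypes(), args))
                        return c.newInstance(args);
            } else {
                for (Method m : cls.getMethods()) {
                    if (!m.getName().equals(localName)) continue;
                    if (Modifier.isStatic(m.getModifiers())) {
                        if (accepts(m.getParameterTypes(), args))
                            return m.invoke(null, args);
                    } else if (args.length > 0 && cls.isInstance(args[0])) {
                        // Instance method: the first argument is the receiver.
                        Object[] rest = Arrays.copyOfRange(args, 1, args.length);
                        if (accepts(m.getParameterTypes(), rest))
                            return m.invoke(args[0], rest);
                    }
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        throw new IllegalArgumentException("no match for " + localName);
    }

    /** Naive overload test: same arity, each argument an instance of its parameter. */
    static boolean accepts(Class<?>[] params, Object[] args) {
        if (params.length != args.length) return false;
        for (int i = 0; i < params.length; i++)
            if (args[i] != null && !params[i].isInstance(args[i])) return false;
        return true;
    }

    public static void main(String[] args) {
        Object file = call("class:java.io.File", "new", "photo.jpg");
        System.out.println(call("class:java.io.File", "getName", file));
    }
}
```

The main method plays the role of File:new("photo.jpg") followed by File:getName on the result.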

For more information, see here.

Calling XQuery Expressions from Java

Often XQuery will be used as part of a larger Java application. In this section we will see how you can use Qexo to evaluate an XQuery expression in a Java program. The following statement creates an XQuery evaluation context, and assigns it to the variable named xq:

  XQuery xq = new XQuery();

You can then use the eval method to evaluate an XQuery expression, returning a Java Object:

  Object result = xq.eval(expression);

The following application reads the strings on the command line, evaluates them as XQuery expressions, and prints the result.

import gnu.xquery.lang.XQuery;
public class RunXQuery
{
  public static void main (String[] args) throws Throwable
  {
    XQuery xq = new XQuery();
    for (int i = 0;  i < args.length;  i++)
      {
	String exp = args[i];
	Object result = xq.eval(exp);
	System.out.print(exp);
	System.out.print(" => ");
	System.out.println(result);
      }
  }
}

You can use these commands to compile and run this application, assuming that kawa-1.7.jar is in your class path:

$ javac -g RunXQuery.java
$ java RunXQuery '3+4' 'for $i in 1 to 5 return $i+10' '<a>{3+4}</a>'
3+4 => 7
for $i in 1 to 5 return $i+10 => 11, 12, 13, 14, 15
<a>{3+4}</a> => <a>7</a>

The println method calls the generic toString method, which is fine for quick-and-dirty output (such as for debugging), but isn't recommended for printing real data. One reason is that it requires allocating a temporary string, which then has to get copied into the PrintStream's output buffer, which is wasteful for large data structures. Another reason is that none of the output appears until it has all been converted, which can also hurt performance. (If the toString gets into a loop, which is quite possible for cyclic data structures, you just sit there waiting with no idea what is going on!) Another reason to avoid toString is that it doesn't provide any control over the output format, such as whether you want characters like '<' escaped as '&lt;', or whether you want HTML-style or XML-style output. Formatting to a specific line width is also difficult.

In Qexo you can instead send the output to a special Consumer, which is something you can send data to. It's like a Writer (or a SAX2 ContentHandler), but it works with abstract data rather than characters. The gnu.xml.XMLPrinter class implements Consumer and extends PrintWriter, so you can use it as either of those two. It writes out the received data in XML format, though there are options to produce HTML and other styles. Below is a revised version of RunXQuery that uses an XMLPrinter:

import gnu.xquery.lang.XQuery;
import gnu.xml.XMLPrinter;
public class RunXQuery
{
  public static void main (String[] args) throws Throwable
  {
    XQuery xq = new XQuery();
    XMLPrinter pp = new XMLPrinter(System.out);
    for (int i = 0;  i < args.length;  i++)
      {
	String exp = args[i];
	System.out.print(exp);
	System.out.print(" => ");
	Object x = xq.eval(exp);
	pp.writeObject(x);
	pp.println();
	pp.flush();
      }
  }
}
$ java RunXQuery 'for $i in 1 to 5 return $i+10'
for $i in 1 to 5 return $i+10 => 11 12 13 14 15

Note the flush call to make sure that the output from the XMLPrinter is sent to System.out before we write anything on the latter directly. This produces mostly the same output as before, except that sequence items are separated by a space instead of comma-space. (Also, XML quoting is handled correctly.)

This still isn't the best way to evaluate-and-print. It is more efficient to have the evaluator print directly to the output, rather than create an intermediate data structure. To do that we can pass the XMLPrinter directly to the eval call.

import gnu.xquery.lang.XQuery;
import gnu.xml.XMLPrinter;
public class RunXQuery
{
  public static void main (String[] args) throws Throwable
  {
    XQuery xq = new XQuery();
    XMLPrinter pp = new XMLPrinter(System.out);
    for (int i = 0;  i < args.length;  i++)
      {
	String exp = args[i];
	System.out.print(exp);
	System.out.print(" => ");
	xq.eval(exp, pp);
	pp.println();
	pp.flush();
      }
  }
}

This produces the same output as before. Whether it is more efficient will depend on the expression you evaluate (and how clever Qexo is). But for XQuery programs that generate large XML data sets it can make a large difference, and in general it's a good idea to pass the Consumer directly to the evaluator.

If the XQuery program is in a file, rather than a String, you can use an eval method that takes a Reader.

  xq.eval (new FileReader("file.xql"), new XMLPrinter(System.out));

You can also call Qexo functions that have been compiled to .class files, directly using Java method invocation. How to do so is a bit complicated and likely to change; it will be documented later.

Setting the context item from Java

(This feature is only available in the CVS version of Qexo so far. It will be in the next release.)

When you evaluate an XQuery expression from Java, you may want to set the context item, position, and size (collectively known as the focus) of the expression. The preceding eval methods evaluate the expression without the focus defined, and if you evaluate an expression that assumes a focus (such as a top-level path expression) then Qexo will report a syntax error.

If you want to specify the focus for an expression, you can use the evalWithFocus methods of gnu.xquery.lang.XQuery. For example:

import gnu.xquery.lang.XQuery;
public class EvalWithFocus1
{
  public static void main (String[] args) throws Throwable
  {
    XQuery xq = new XQuery();
    Object a = xq.eval("<a><b id='1'/><b id='2'/></a>");
    Object b = xq.evalWithFocus("<r size='{last()}'>{b}</r>", a, 1, 9);
    System.out.println(b);
  }
}

The <r> element constructor has an enclosed path expression b. This is evaluated relative to the context item, which is the second argument to evalWithFocus, in this case the result of the previous eval in variable a. So the b returns the two <b> children of the <a> element. The remaining two parameters to evalWithFocus are the context position and context size. (In this case the 8 other items of the context sequence don't exist.) So the above program prints out:

<r size="9"><b id="1" /><b id="2" /></r>

If there is more than one item in the context sequence, you will usually want to evaluate the expression for each item in the sequence. Instead of writing a loop in Java, use the two-operand form of evalWithFocus and pass it the whole sequence:

import gnu.xquery.lang.XQuery;
public class EvalWithFocus2
{
  public static void main (String[] args) throws Throwable
  {
    XQuery xq = new XQuery();
    Object a = xq.eval("<a><b id='1'/></a>, <a><b id='2'/></a>");
    Object b = xq.evalWithFocus("<r pos='{position()}'>{b}</r>", a);
    System.out.println(b);
  }
}

This results in a 2-item sequence, one for each item in a. (Note that a in this example is different than before.)

<r pos="1"><b id="1" /></r>, <r pos="2"><b id="2" /></r>

Note that if v1 is the result of evaluating e1, then the result of evalWithFocus("e2", v1) is equivalent to evaluating e1/e2.

There are variants of these methods where the output is written to a Consumer, and the expression is read from a Reader. There are also methods so you can pre-compile the expression (using evalToFocusProc) and then repeatedly apply that to different values (using applyWithFocus).

Using Qexo with SAX2

The Simple API for XML (SAX) is a set of classes for "copying" XML data (infosets) using method calls, not necessarily doing any physical copying. It is a popular API because it is an efficient way to process large datasets. The Consumer interface is similar to the SAX2 ContentHandler interface. If you have a class that implements ContentHandler you can use a ContentConsumer filter to convert it to a Consumer. The following code snippet shows how you can pass the result of evaluating an XQuery expression to a ContentHandler.

import org.xml.sax.ContentHandler;
  ...
  ContentHandler ch = ...;
  xq.eval(exp, new ContentConsumer(ch));

The Consumer interface

The Consumer interface (like the SAX2 ContentHandler) is very useful and efficient for any kind of processing of XML data that can be done in a single pass. A Consumer is a passive output "sink". It doesn't do anything on its own. Instead, it is used as the output of a producer, which is the application that does the actual work, and sends the results to the Consumer. The separation between a producer (which generates results) and a Consumer (which uses the results) allows for great flexibility in plugging together modules. Note that a Consumer can pass the data along to another Consumer, acting as the latter's producer. This allows you to chain together a pipeline of Consumer filters.
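
The chaining idea can be shown with a deliberately tiny stand-in. The Sink interface below is hypothetical and has a single method; it is not the Kawa Consumer interface, which has many more. The sketch only shows how a filter receives data, does its own work, and then acts as producer for the next stage.

```java
import java.util.ArrayList;
import java.util.List;

public class SinkPipelineSketch {
    /** Hypothetical, much-reduced stand-in for a Consumer. */
    interface Sink {
        void write(Object item);
    }

    /** Terminal stage: records whatever reaches the end of the pipeline. */
    static class Collector implements Sink {
        final List<Object> items = new ArrayList<>();
        public void write(Object item) { items.add(item); }
    }

    /** Filter: counts every item, then forwards it unchanged. */
    static class Counter implements Sink {
        final Sink next;
        int count = 0;
        Counter(Sink next) { this.next = next; }
        public void write(Object item) { count++; next.write(item); }
    }

    /** Filter: forwards only Integer items and drops the rest. */
    static class IntegersOnly implements Sink {
        final Sink next;
        IntegersOnly(Sink next) { this.next = next; }
        public void write(Object item) {
            if (item instanceof Integer) next.write(item);
        }
    }

    /** Producer: does the "actual work" and pushes results into a Sink. */
    static void produce(Sink out) {
        out.write(10);
        out.write("text");
        out.write(20);
    }

    public static void main(String[] args) {
        Collector end = new Collector();
        Counter counter = new Counter(new IntegersOnly(end));
        produce(counter);
        System.out.println(counter.count + " items written, kept " + end.items);
    }
}
```

The producer knows nothing about what sits downstream, so stages can be added, removed, or reordered without changing it.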

Here is a Java program that counts the number of different kinds of elements produced by evaluating XQuery expressions. It is a class that extends the basic gnu.lists.FilterConsumer, which provides dummy implementations of the Consumer methods.

import gnu.xquery.lang.XQuery;
import java.util.*;
import gnu.lists.*;
import java.io.PrintStream;

public class CountElements extends FilterConsumer
{
  CountElements()
  {
    super(VoidConsumer.getInstance());
  }

  List elementNames = new ArrayList();
  int numAttributes = 0;
  int numInts = 0;
  int numObjects = 0;

  public void beginGroup(String typeName, Object type)
  {
    elementNames.add(typeName);
    super.beginGroup(typeName, type);
  }

  public void beginAttribute(String attrName, Object attrType)
  {
    numAttributes++;
    super.beginAttribute(attrName, attrType);
  }

  public void writeInt(int v)
  {
    numInts++;
    super.writeInt(v);
  }

  public void writeObject(Object v)
  {
    numObjects++;
    super.writeObject(v);
  }

  void dump (PrintStream out)
  {
    Collections.sort(elementNames);
    int total = 0;
    ListIterator it = elementNames.listIterator();
    String previous = null;
    int count = 0;
    for (;;)
      {
	boolean done = ! it.hasNext();
	String cur = done ? "" : (String) it.next();
	if (previous != null && ! previous.equals(cur))
	  {
	    out.println("<" + previous + "> - " + count + " times");
	    count = 0;
	  }
	if (done)
	  break;
	previous = cur;
	count++;
	total++;
      }
    out.println("TOTAL: " + total);
    if (numAttributes > 0)
      out.println("Attributes: " + numAttributes);
    if (numInts > 0)
      out.println("ints: " + numInts);
    if (numObjects > 0)
      out.println("Objects: " + numObjects);
  }

  public static void main(String[] args) throws Throwable
  {
    XQuery xq = new XQuery();
    CountElements counter = new CountElements();
    for (int i = 0;  i < args.length;  i++)
      {
	String exp = args[i];
	xq.eval(exp, counter);
      }
    counter.dump(System.out);
  }
}

The producer (in this case the XQuery.eval method called by the main method) calls the beginGroup method when it wants to "write out" an XML element. The beginGroup implementation in this class just adds the element's string name (the typeName) to a List elementNames. It then calls super.beginGroup to do the default processing of beginGroup, which calls beginGroup in the next Consumer in the filter. In this case, that is a VoidConsumer, which ignores everything it receives, so the super.beginGroup isn't really needed, but we include it to illustrate the general idea.

We also count attributes using the beginAttribute method as well as calls to writeInt and writeObject. These are used for non-XML typed values, which SAX doesn't handle.

At the end the dump method is called. It sorts the list of elements and writes out the number of times each has been seen, along with some other statistics. Here is a sample run.

$ javac -g CountElements.java
$ java CountElements '<a><b/>{10 to 20}<b/>{1+1}<b/></a>'
<a> - 1 times
<b> - 3 times
TOTAL: 4
ints: 11
Objects: 1

Note how the sequence 10 to 20 produces 11 calls to writeInt, while the expression 1+1 produces a single call to writeObject. Whether an XQuery integer produces a call to writeInt or writeObject is up to the Qexo implementation and how clever it is.

The TreeList DOM class

When Qexo needs to store a document in a data structure it uses an instance of the class gnu.lists.TreeList. The name of the class isn't Document because it's actually a lot more general than what is needed for plain XML documents. It can handle typed values, and it is also used to represent sequences containing multiple items.

The TreeList class is used to implement a Document Object Model (DOM), but it does not implement the standard org.w3c.dom.Node or org.w3c.dom.Document interfaces. The reason for that is that the W3C DOM APIs use a separate Node object for each conceptual node (element, attribute, etc) in a document. This is very inefficient, as it wastes a lot of space and makes a lot of work for the garbage collector. Instead, TreeList uses a much more compact array-based representation, using one char array and one Object array for the entire document. A "node" is just an index into the former array, which makes it efficient to traverse a document.
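
The following Java sketch shows the general flavor of such a representation. It is not TreeList's actual encoding: the class CompactTreeSketch, its two marker characters, and its methods are all invented for illustration, and text that happens to contain the marker characters would confuse it. What it does share with the description above is that the whole tree lives in one char array plus one Object array, and a "node" is just an int index.

```java
import java.util.Arrays;

public class CompactTreeSketch {
    // Marker characters (hypothetical encoding, chosen only for this sketch).
    static final char BEGIN = '\uF000'; // followed by one char: index into objects
    static final char END = '\uF001';

    private char[] data = new char[16];       // the whole tree, flattened
    private int used = 0;
    private Object[] objects = new Object[4]; // side table holding element names
    private int numObjects = 0;

    private void append(char c) {
        if (used == data.length) data = Arrays.copyOf(data, used * 2);
        data[used++] = c;
    }

    /** Starts an element and returns its "node", which is just an index. */
    public int beginElement(String name) {
        int node = used;
        if (numObjects == objects.length)
            objects = Arrays.copyOf(objects, numObjects * 2);
        objects[numObjects] = name;
        append(BEGIN);
        append((char) numObjects++);
        return node;
    }

    public void endElement() { append(END); }

    /** Text is stored inline as plain characters. */
    public void text(String s) {
        for (int i = 0; i < s.length(); i++) append(s.charAt(i));
    }

    public String name(int node) { return (String) objects[data[node + 1]]; }

    /** Counts direct child elements by scanning forward from the node. */
    public int countChildElements(int node) {
        int depth = 0, count = 0;
        for (int i = node + 2; i < used; i++) {
            char c = data[i];
            if (c == BEGIN) {
                if (depth == 0) count++;
                depth++;
                i++; // skip the name index that follows BEGIN
            } else if (c == END) {
                if (depth == 0) return count;
                depth--;
            }
        }
        throw new IllegalStateException("unterminated element at " + node);
    }

    public static void main(String[] args) {
        CompactTreeSketch t = new CompactTreeSketch();
        int a = t.beginElement("a");   // builds <a><b/>hi<b/></a>
        t.beginElement("b"); t.endElement();
        t.text("hi");
        t.beginElement("b"); t.endElement();
        t.endElement();
        System.out.println(t.name(a) + ": " + t.countChildElements(a) + " child elements");
    }
}
```

No per-node objects are allocated while building or traversing, which is the property the paragraph above attributes to the array-based design.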

The following example shows how you can modify the CountElements application so that the command line arguments are the URLs of XML files (instead of XQuery expressions). Replace the main method by the following, leaving the rest of the CountElements class as before. Each URL is opened and parsed as an XML file, to create a TreeList object. You can now do a lot of things with this TreeList; in this example all we do is invoke its consume method, which "writes out" all of its data to a Consumer, which in this case is a CountElements object.

  public static void main(String[] args) throws Throwable
  {
    CountElements counter = new CountElements();
    for (int i = 0;  i < args.length;  i++)
      {
	String url = args[i];
	TreeList doc = gnu.kawa.xml.Document.parse(url);
	doc.consume(counter);
      }
    counter.dump(System.out);
  }

Per Bothner
Last modified: Sun Nov 9 19:02:37 PST 2003