Numbers

Scheme defines a “numerical tower” of numerical types: number, complex, real, rational, and integer. Java has primitive “unboxed” number types (such as int), just like C, and also has some “wrapper” classes that are basically boxed versions of the unboxed number types. Specifically, the standard Java number classes are not organized in any particularly useful hierarchy, except that they all inherit from Number. Kawa implements the full “tower” of Scheme number types, using its own set of sub-classes of the abstract class Quantity, a sub-class of Number we will discuss later.

public class Complex extends Quantity
{ ...;
  public abstract RealNum re();
  public abstract RealNum im();
}

Complex is the class of abstract complex numbers. It has three subclasses: the abstract class RealNum of real numbers; the general class CComplex where the components are arbitrary RealNum fields; and the optimized DComplex where the components are represented by double fields.

public class RealNum extends Complex
{ ...;
  public final RealNum re()
  { return this; }
  public final RealNum im()
  { return IntNum.zero(); }
  public abstract boolean isNegative();
}

public class DFloNum extends RealNum
{ ...;
  double value;
}

Concrete class for double-precision (64-bit) floating-point real numbers.

public class RatNum extends RealNum
{ ...;
  public abstract IntNum numerator();
  public abstract IntNum denominator();
}

RatNum, the abstract class for exact rational numbers, has two sub-classes: IntFraction and IntNum.

public class IntFraction extends RatNum
{ ...;
  IntNum num;
  IntNum den;
}

The IntFraction class implements fractions in the obvious way. Exact real infinities are identified with the fractions 1/0 and -1/0.

public class IntNum extends RatNum
{ ...;
  int ival;
  int[] words;
}

The IntNum concrete class implements infinite-precision integers. The value is stored in the first ival elements of words, in 2's complement form (with the low-order bits in word[0]).

There are already many bignum packages, including one that Sun added for JDK 1.1. What are the advantages of this one?

A complete set of operations, including gcd and lcm; logical, bit, and shift operations; power by repeated squaring; all of the division modes from Common Lisp (floor, ceiling, truncate, and round); and exact conversion to double.
Consistency and integration with a complete “numerical tower.” Specifically, consistency and integration with “fixnum” (see below).
Most bignum packages use a signed-magnitude representation, while Kawa uses 2's complement. This makes for easier integration with fixnums, and also makes it cheap to implement logical and bit-fiddling operations.
Use of all 32 bits of each “big-digit” word, which is the “expected” space-efficient representation. More importantly, it is compatible with the mpn routines from the Gnu Multi-Precision library [gmp]. The mpn routines are low-level algorithms that work on unsigned pre-allocated bignums; they have been transcribed into Java in the MPN class. If better efficiency is desired, it is straight-forward to replace the MPN methods with native ones that call the highly-optimized mpn functions.

If the integer value fits within a signed 32-bit int, then it is stored in ival and words is null. This avoids the need for extra memory allocation for the words array, and also allows us to special-case the common case.

As a further optimization, the integers in the range -100 to 1024 are pre-allocated.

Mixed-type arithmetic

Many operations are overloaded to have different definitions depending on the argument types. The classic examples are the functions of arithmetic such as “+”, which needs to use different algorithms depending on the argument types. If there is a fixed and reasonably small set of number types (as is the case with standard Scheme), then we can just enumerate each possibility. However, the Kawa system is meant to be more extensible and support adding new number types.

The solution is straight-forward in the case of a one-operand function such as “negate”, since we can use method overriding and virtual method calls to dynamically select the correct method. However, it is more difficult in the case of a binary method like “+,” since classic object-oriented languages (including Java) only support dynamic method selection using the type of the first argument (“this”). Common Lisp and some Scheme dialects support dynamic method selection using all the arguments, and in fact the problem of binary arithmetic operations is probably the most obvious example where “multi-dispatch” is useful.

Since Java does not have multi-dispatch, we have to solve the problem in other ways. Smalltalk has the same problems, and solved it using “coercive generality”: Each number class has a generality number, and operands of lower generality are converted to the class with the higher generality. This is inefficient because of all the conversions and temporary objects (see [Budd91Arith]), and it is limited to what extent you can add new kinds of number types.

In “double dispatch” [Ingalls86] the expression x-y is implemented as x.sub(y). Assuming the (run-time) class of x is Tx and that of y is Ty, this causes the sub method defined in Tx to be invoked, which just does y.subTx(x). That invokes the subTx method defined in Ty which can without further testing do the subtraction for types Tx and Ty.

The problem with this approach is that it is difficult to add a new Tz class, since you have to also add subTz methods in all the existing number classes, not to mention addTz and all the other operations.

In Kawa, x-y is also implemented by x.sub(y). The sub method of Tx checks if Ty is one of the types it knows how to handle. If so, it does the subtraction and returns the result itself. Otherwise, Tx.sub does y.subReversed(x). This invokes Ty.subReversed (or subReversed as defined in a super-class of Ty). Now Ty (or one of its super-classes) gets a chance to see if it knows how to subtract itself from a Tx object.

The advantage of this scheme is flexibility. The knowledge of how to handle a binary operation for types Tx and Ty can be in either of Tx or Ty or either of their super-classes. This makes is easier to add new classes without having to modify existing ones.

Quantities

The DSSSL language [DSSSL] is a dialect of Scheme used to process SGML documents. DSSSL has “quantities” in addition to real and integer numbers. Since DSSSL is used to format documents, it provides length values that are a multiple of a meter (e.g. 0.2m), as well as derived units like cm and pt (point). A DSSSL quantity is a product of a dimension-less number with an integral power of a length unit (the meter). A (pure) number is a quantity where the length power is zero.

For Kawa, I wanted to merge the Scheme number types with the DSSSL number types, and also generalize the DSSSL quantities to support other dimensions (such as mass and time) and units (such as kg and seconds). Quantities are implemented by the abstract class Quantity. A quantity is a product of a Unit and a pure number. The number can be an arbitrary complex number.

public class Quantity extends Number
{ ...;
  public Unit unit()
  { return Unit.Empty; }
  public abstract Complex number();
}

public class CQuantity extends Quantity
{ ...;
  Complex num;
  Unit unt;
  public Complex number()
  { return num; }
  public Unit unit()
  { return unt; }
}

A CQuantity is a concrete class that implements general Quantities. But usually we don't need that much generality, and instead use DQuanity.

public class DQuantity extends Quantity
{ ...;
  double factor;
  Unit unt;
  public final Unit unit()
  { return unt; }
  public final Complex number()
  { return new DFloNum(factor); }
}

public class Unit
{ ...;
  String name; // Optional.
  Dimensions dims;
  double factor;
}

A Unit is a product of a floating-point factor and one or more primitive units, combined into a Dimensions object. The Unit may have a name (such as “kg”), which is used for printing, and when parsing literals.

public class BaseUnit extends Unit
{ ...;
  int index;
}

A BaseUnit is a primitive unit that is not defined in terms of any other Unit, for example the meter. Each BaseUnit has a different index, which is used for identification and comparison purposes. Two BaseUnits have the same index if and only if they are the same BaseUnit.

public class Dimensions
{
  BaseUnit[] bases;
  short[] powers;
}

A Dimensions object is a product and/or ratio of BaseUnits. You can think of it as a data structure that maps every BaseUnit to an integer power. The bases array is a list of the BaseUnits that have a non-zero power, in order of the index of the BaseUnit. The powers array gives the power (exponent) of the BaseUnit that has the same index in the bases array.

Two Dimensions objects are equal if they have the same list of bases and powers. Dimensions objects are “interned” (using a global hash table) so that they are equal only if they are the same object. This makes it easy to implement addition and subtraction:

public static DQuantity add
  (DQuantity x, DQuantity y)
{
  if (x.unit().dims != y.unit().dims)
     throw new ArithmeticException
       ("units mis-match");
  double r = y.unit().factor
     / x.unit().factor;
  double s = x.factor + r * y.factor;
  return new DQuantity (s, x.unit());
}

The Unit of the result of an addition or subtraction is the Unit of the first operand. This makes it easy to convert units:

(+ 0cm 2.5m)    ==>    250cm

Because Kawa represents quantities relative to user-specified units, instead of representing them relative to primitive base units, it can display quantities using the user's preferred units, rather than having to use prmitive units. However, this does make multiplication and division a problem The actual calculation (finding the right Dimensions and multiplying the constant factors) is straight-forward. The difficulty is that we have to generate a new compound Unit, and print it out in a reasonable fashion. Exactly how this should best be done is not obvious.