Q2 - ideas for a programming language

Per Bothner


Table of Contents

Names
Applications
Sequences
Attributes
Objects
Paths
Functions
Parameter lists
Patterns
Variable definition
Arrays
Strings

Names

A name is either a simple name or a prefixed (or qualified) name. A simple name or an identifier will probably have the same syntax as an XML NCName - i.e. a XML name without a colon. A prefixed name has two parts: an optional prefix (which if not empty is a simple name), followed by a colon, followed by a local name (which is a simple name). (No whitespace or comments are allowed between the prefix and the colon, or between the colon and the local name.) A prefix is bound to a namespace using a (static) namespace declaration. A namespace in XML is just a globally unique string - a URI. However, we may associate more information with a namespace, such a default function definition. An expanded name is a local name and a namespace URI.

name ::= simple-name | qualified-name
simple-name ::= NCName [[refer to XML standard]]
qualified-name ::= namespace-prefix":"simple-name
namespace-prefix ::= simple-name

All names must be statically bound, to detect typos. In some case there may be definitions for all the local names associated with a prefix.

There are no reserved names. Though there are predefined (builtin) names and prefixes, these can all be over-ridden. The lack of reserved names may complicate the language and/or parsing, however.

Applications

An application is the main syntactic building block Q2. A program is normally an application, which will contain nested applications. An application can be a function call, an object/element creation expression, or a special form.

application ::= name attribute* simple-expression*
expression := application | simple-expression
simple-expression := literal | variable-reference | "(" expression ")"

Sequences

A value is a sequence of zero or more items. Sequences are as in XQuery: They do not nest, and an item is in all respects equivalent to a sequence of that one item. (If you need nested sequences, use an element or an array.)

The concatenation operator takes two sequences and concatenates the result. XQuery uses the comma operator for this. An alternative to consider is to use juxtaposition for concatenation. (Snobol did this, as did APL for array literals.) This is appealing, but has various consequences.

Most programming languages distinguish between expressions and statements. Q2 does not: A statement is just an expression that returns a zero-item sequence. Even expression-oriented languages in the Lisp family use statements whose value is ignored. The use of sequences allows a powerful unification. A "block" of one or more statements that are evaluated sequentially becomes one or more expressions thata are combined by sequence concatenation. A loop that evaluates a statement until some condition is true becomes an expression that concatenates the results from each iteration of the loop. This is very powerful, as shown by the XQuery for form.

Sequences are normally accessed sequentially. For indexed access, convert the sequence to an array.

A sequence can be infinitely long. This corresponds to unbounded loops, or an input to a loop that gets terminated by some condition. The sequence of natural numbers is infinitely long and can be used as the loop control "while" loop.

Attributes

An attribute consists of a name and a value. Attribute are normally written using an attribute-expression:

attribute-name: primary-expression

No space is allowed between the attribute-name and the colon, and there must either be whitespace between the colon and the expression, or the expression must start with a non-name character.

An attribute is a generalization of attributes in XML infosets, in that they can contain an arbitrary value (including possibly a sequence), while in XML the values can only be a string.

Attributes do not have identity, unlike in the XPath data-model. (We might go further and make attributes syntax only.)

Objects

An object consists of type, plus a sequence of zero or more public components, some of whome may be attributes. The non-attribute components of an object are the also called its children. This is similar to XML elements. In addition, an object may have private components (and maybe methods). Some objects are atomic; these have no public components. These include numbers and characters. (An object have have zero componets without neceaarily being atomic.)

There can be at most one attribute with a given (expanded) name. The order of attributes is not generally semantically significant, but the order is preserved, for use in printing and for use in operations that depend on "document order".

Object have identity. Unlike in XML, an object does not have a unique parent: An object can be the child of multiple other objects, and there is no link from a child to its parent. However, we will probably support "trails" or "paths" which are references to objects with the way we got to that object, and the parent of a trail is the trail with the final reference removed.

A constructor is a named function that creates an object. A constructor is also a type - the type or class of objects created by the constructor. Not all functions are constructors. However, merging the concepts of constructor and class is insufficient. A class, when used as a pattern, should be able to match any object that has maching attributes, children, and name/type/constructor.

struct name () attr-name:type ...

This declarares a type with name name, which must have the specified attributes. A value matches the type it it has all the named attributes, and the values of each attribute matches the specified attributes types. In addition, the value must has been contructed such that name is a member of the (magic) type attribute.

The object type may specify one or more ancestors:

struct name (base-name ...) attr-name:type ...

When viewed as a type, this additionally must match all the inherited types and their attributes. If an attr-name appears more than once in the inheritance hierarchy, that is equivalent to the intersection type. Question: should user be required to specify intersection type - i.e. as in co-variant attribute types?

Methods are parameterized attributes or attributes that are functions. First need to decide how functions are declared.

We will have some sub-typing mechanisms, where a constructor can inherit from or extend some other type.

struct name (super-type-names) attr-name:type[=default] ...
let val = name attr-name: value ...

Paths

The syntax objects@/type evaluates objects, which must evaluate to a sequence of objects. Then concatenate all the children of each object that satisfy the give type.

For example:

((std:vector 2 3.4 5) (std:vector 0.5 1 1.5))*/std:integer
  ==> (2 5 1)

Functions

A function (method) definition has this syntax:

rule function-name parameter-list => body
parameter-list ::= attribute-pattern* value-pattern* 

At a call site the set of method definitions in the static scope are "merged", and the most specific matching function is applied.

Parameter lists

A function takes as parameters a sequence of items. In XQuery in contrast, a parameter list is zero or more values, each of which can be a sequence. However, using a single sequence for all the parameters provides an element unification, and lets us use pattern matching to bind parameters.

Patterns

A pattern can matched against a value. If it matches, a variable may be bound to the matched value. A simple pattern looks like:

name!type

If the value matched against this pattern has the specified type, then the pattern matches (succeeds), and the name gets bound to the value. The following example declares i to have the type Integer and value 10.

let i!Integer = 10

Both the name and the type are optional. The default for type is item which matches any single item. (I.e. not a sequence that is empty or has more than one value, nor an attribute. Nor a function??) The default for name is a unique compiler-generated name.

A sequence pattern can "take apart" a sequence:

pattern1 pattern2 ...

A tricky part is that a patterni might match zero or more than one item.

A "guarded pattern" adds a conditional Boolean expression:

pattern when condition

(We might use if or where or some other syntax instead of when.)

An optional pattern may have a default value:

pattern?=default-value

If pattern matches, then the ?=default-value is ignored. Otherwise, -'[]_ '[]

A "constructor pattern" can match against objects created using constructors.

complex re:!real im:!real
Since a constructor "field" is an attribute, this is syntactic sugar for:
complex re:(re!real) im:(im!real)

A single * matches any item. (Should probably not match attributes.) A doubled ** matches any sequence.

Variable definition

A variable definition has the form:

let pattern = expresssion

This evaluates expresssion and matches it against the pattern. For example:

let i!integer = 2+3

A variable definition is a void expression - i.e. one that results in a sequence of zero values. Thus you can intermix definitions and other expressions:

let i!integer = 1
let j!integer = i+1
i+j

This is an expresson with two sub-expressions: The first two are the definitions, and the final is the i+j. The latter is a singleton integer, so the result of the combined expression is also a singleton integer.

The scope of the "variable" defined in a variable definition is undecided. Tentatively, assume it encompasses sub-sequent sub-expressions in the encloding compound expression.

Arrays

An array is a mapping from a sequence of integers to a value. The number of integers used is the rank of the array. A vector is an array of rank one. A non-array item (such as a numbers) is viewed as an array of rank zero. An array is a single item.

There are primitives to convert a sequence to an array and vice versa. We might use [square brackets] to convert a sequence to a vector:

[ val1, val2, ... ]

The function size returns the dimensions of an array as a sequence of integers. The size of a scalar is the empty sequence. For example (size 3) => (), and (size [2 3 4]) => 3. In contrast length returns the number of iterms in a sequence.

As a general rules you use arrays for "random-access" indexing, while you use sequences for iteration and composition.

Strings

A string is a sequence of one or more characters. A string is not an array. A problem with this is that you cannot create a sequence of strings, since that would just be there concatenation. This may be a disadvantage, but one can always wrap sequences as arrays.

Because a string is a sequence, we can use the regular sequence-based patterns to match string. We do not need regexps.