GNU Smalltalk User's Guide ************************** This document describes installing and operating the GNU Smalltalk programming environment. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License". Introduction ************ GNU Smalltalk is an implementation that closely follows the Smalltalk-80 language as described in the book `Smalltalk-80: the Language and its Implementation' by Adele Goldberg and David Robson, which will hereinafter be referred to as `the Blue Book'. The Smalltalk programming language is an object oriented programming language. This means, for one thing, that when programming you are thinking of not only the data that an object contains, but also of the operations available on that object. The object's data representation capabilities and the operations available on the object are "inseparable"; the set of things that you can do with an object is defined precisely by the set of operations, which Smalltalk calls "methods", that are available for that object: each object belongs to a "class" (a datatype and the set of functions that operate on it) or, better, it is an "instance" of that class. You cannot even examine the contents of an object from the outside--to an outsider, the object is a black box that has some state and some operations available, but that's all you know: when you want to perform an operation on an object, you can only send it a "message", and the object picks up the method that corresponds to that message. In the Smalltalk language, everything is an object. 
This includes not only numbers and all data structures, but even classes, methods, pieces of code within a method ("blocks" or "closures"), stack frames ("contexts"), etc. Even `if' and `while' structures are implemented as methods sent to particular objects. Unlike other Smalltalks (including Smalltalk-80), GNU Smalltalk emphasizes Smalltalk's rapid prototyping features rather than the graphical and easy-to-use nature of the programming environment (did you know that the first GUIs ever ran under Smalltalk?). The availability of a large body of system classes, once you master them, makes it pretty easy to write complex programs which are usually a task for the so called "scripting languages". Therefore, even though we have a nice GUI environment including a class browser (*note Blox::), the goal of the GNU Smalltalk project is currently to produce a complete system to be used to write your scripts in a clear, aesthetically pleasing, and philosophically appealing programming language. An example of what can be obtained with Smalltalk in this novel way can be found in *Note Class reference: (gst-libs)Top. That part of the manual is entirely generated by a Smalltalk program, starting from the source code for the class libraries distributed together with the system. 1 Using GNU Smalltalk ********************* 1.1 Command line arguments ========================== The GNU Smalltalk virtual machine may be invoked via the following command: gst [ flags ... ] [ file ... ] When you invoke GNU Smalltalk, it will ensure that the binary image file (called `gst.im') is up to date; if not, it will build a new one as described in *Note Loading an image or creating a new one: Loading or creating an image. Your first invocation should look something like this: "Global garbage collection... done" GNU Smalltalk ready st> If you specify one or more FILEs, they will be read and executed in order, and Smalltalk will exit when end of file is reached. 
If you don't specify FILE, GNU Smalltalk reads standard input, issuing a `st>' prompt if the standard input is a terminal. You may specify `-' for the name of a file to invoke an explicit read from standard input.

To exit while at the `st>' prompt, use `Ctrl-d', or type `ObjectMemory quit' followed by the Return key. Use `ObjectMemory snapshot' first to save a new image that you can reload later, if you wish.

As is standard for GNU-style options, specifying `--' stops the interpretation of options so that every argument that follows is considered a file name even if it begins with a `-'.

You can specify both short and long flags; for example, `--version' is exactly the same as `-v', but is easier to remember. Short flags may be specified one at a time, or in a group. A short flag or a group of short flags always starts off with a single dash to indicate that what follows is a flag or set of flags instead of a file name; a long flag starts off with two consecutive dashes, without spaces between them.

In the current implementation the flags can be intermixed with file names, but their effect is as if they were all specified first. The various flags are interpreted as follows:

`-a'
`--smalltalk-args'
     Treat all options afterward as arguments to be given to Smalltalk code retrievable with `Smalltalk arguments', ignoring them as arguments to GNU Smalltalk itself.

     Examples:

     command line           Options seen by GNU Smalltalk   `Smalltalk arguments'
     (empty)                (none)                          `#()'
     `-Via foo bar'         `-Vi'                           `#('foo' 'bar')'
     `-Vai test'            `-Vi'                           `#('test')'
     `-Vaq'                 `-Vq'                           `#()'
     `--verbose -aq -c'     `--verbose -q'                  `#('-c')'

`-c'
`--core-dump'
     When a fatal signal occurs, produce a core dump before terminating. Without this option, only a backtrace is provided.

`-D'
`--declaration-trace'
     Print the class name, the method name, and the byte codes that the compiler generates as it compiles methods. Only applies to files that are named explicitly on the command line, unless the flag is given multiple times on the command line.
`-E' `--execution-trace' Print the byte codes being executed as the interpreter operates. Only works for statements explicitly issued by the user (either interactively or from files given on the command line), unless the flag is given multiple times on the command line. `--kernel-directory' Specify the directory from which the kernel source files will be loaded. This is used mostly while compiling GNU Smalltalk itself. Smalltalk code can retrieve this information with `Directory kernel'. `--no-user-files' Don't load any files from `~/.st/' (*note Loading an image or creating a new one: Loading or creating an image.).(1) This is used mostly while compiling GNU Smalltalk itself, to ensure that the installed image is built only from files in the source tree. `-K FILE' `--kernel-file FILE' Load FILE in the usual way, but look for it relative to the kernel directory's parent directory, which is usually `/usr/local/share/smalltalk/'. See `--kernel-dir' above. `-f' `--file' The following two command lines are equivalent: gst -f FILE `args...' gst -q FILE -a `args...' This is meant to be used in the so called "sharp-bang" sequence at the beginning of a file, as in #! /usr/bin/gst -f ... Smalltalk source code ... GNU Smalltalk treats the first line as a comment, and the `-f' option ensures that the arguments are passed properly to the script. Use this instead to avoid hard-coding the path to `gst':(2) #! /bin/sh "exec" "gst" "-f" "$0" "$@" ... Smalltalk source code ... `-g' `--no-gc-messages' Suppress garbage collection messages. `-h' `--help' Print out a brief summary of the command line syntax of GNU Smalltalk, including the definitions of all of the option flags, and then exit. `-i' `--rebuild-image' Always build and save a new image file; see *Note Loading an image or creating a new one: Loading or creating an image. `--maybe-rebuild-image' Perform the image checks and rebuild as described in *Note Loading an image or creating a new one: Loading or creating an image. 
This is the default when `-I' is not given. `-I FILE' `--image-file FILE' Use the image file named FILE as the image file to load instead of the default location, and set FILE's directory part as the image path. This option completely bypasses checking the file dates on the kernel files; use `--maybe-rebuild-image' to restore the usual behavior, writing the newly built image to FILE if needed. `-q' `--quiet' `--silent' Suppress the printing of answered values from top-level expressions while GNU Smalltalk runs. `-r' `--regression-test' This is used by the regression testing system and is probably not of interest to the general user. It controls printing of certain information. `-S' `--snapshot' Save the image after loading files from the command line. Of course this "snapshot" is not saved if you include - (stdin) on the command line and exit by typing `Ctrl-c'. `-v' `--version' Print out the GNU Smalltalk version number, then exit. `-V' `--verbose' Print various diagnostic messages while executing (the name of each file as it's loaded, plus messages about the beginning of execution or how many byte codes were executed). ---------- Footnotes ---------- (1) The directory would be called `_st/' under MS-DOS. Under OSes that don't use home directories, it would be looked for in the current directory. (2) The words in the shell command `exec' are all quoted, so GNU Smalltalk parses them as five separate comments. 1.2 Startup sequence ==================== *Caveat*: _The startup sequence is pretty complicated. If you are not interested in its customization, you can skip the first two sections below. These two sections also don't apply when using the command-line option `-I', unless also using `--maybe-rebuild-image'._ You can abort GNU Smalltalk at any time during this procedure with `Ctrl-c'. 
1.2.1 Picking an image path and a kernel path --------------------------------------------- When GNU Smalltalk is invoked, it first chooses two paths, the "image path" and the "kernel path". The image path is set by considering these paths in succession: * the directory part of the `--image-file' option if it is given; * the value of the `SMALLTALK_IMAGE' environment variable if it is defined and readable; this step will disappear in a future release; * the path compiled in the binary (usually, under Unix systems, `/usr/local/var/lib/smalltalk' or a similar path under `/var') if it exists and it is readable; * the current directory. The current directory is also used if the image has to be rebuilt but you cannot write to a directory chosen according to the previous criteria. The "kernel path" is the directory in which to look for Smalltalk code compiled into the base image. The possibilities in this case are: * the argument to the `--kernel-dir' option if it is given; * the value of the `SMALLTALK_KERNEL' environment variable if it is defined and readable; this step will disappear in a future release; * the path compiled in the binary (usually, under Unix systems, `/usr/local/share/smalltalk/kernel' or a similar data file path) if it exists and it is readable; * a subdirectory named `kernel' of the image path. 1.2.2 Loading an image or creating a new one -------------------------------------------- GNU Smalltalk can load images created on any system with the same pointer size as its host system by approximately the same version of GNU Smalltalk, even if they have different endianness. For example, images created on 32-bit PowerPC can be loaded with a 32-bit x86 `gst' VM, provided that the GNU Smalltalk versions are similar enough. Such images are called "compatible images". It cannot load images created on systems with different pointer sizes; for example, our x86 `gst' cannot load an image created on x86-64. 
Unless the `-i' flag is used, GNU Smalltalk first tries to load the file named by `--image-file', defaulting to `gst.im' in the image path. If this is found, GNU Smalltalk ensures the image is "not stale", meaning its write date is newer than the write dates of all of the kernel method definition files. It also ensures that the image is "compatible", as described above. If both tests pass, GNU Smalltalk loads the image and continues with *Note After the image is created or restored: Starting the system.

If that fails, a new image has to be created. The image path may now be changed to the current directory if the previous choice is not writeable.

To build an image, GNU Smalltalk loads the set of files that make up the kernel, one at a time. The list can be found in `libgst/lib.c', in the `standard_files' variable. You can override kernel files by placing your own copies in `~/.st/kernel/'.(1) For example, if you create a file `~/.st/kernel/Builtins.st', it will be loaded instead of the `Builtins.st' in the kernel path.

To aid with image customization and local bug fixes, GNU Smalltalk loads two more files (if present) before saving the image. The first is `site-pre.st', found in the parent directory of the kernel directory. Unless users at a site change the kernel directory when running `gst', `/usr/local/share/smalltalk/site-pre.st' provides a convenient place for site-wide customization. The second is `~/.st/pre.st', which can be different for each user's home directory.(2)

Before the next steps, GNU Smalltalk takes a snapshot of the new memory image, saving it over the old image file if it can, or in the current directory otherwise.

---------- Footnotes ----------

(1) The directory is called `_st/kernel' under MS-DOS. Under OSes that don't use home directories, it is looked for in the current directory.
(2) The file is looked up as `_st/pre.st' under MS-DOS and again, under OSes that don't use home directories it is looked for as `pre.st' in the current directory.

1.2.3 After the image is created or restored
--------------------------------------------

Next, GNU Smalltalk sends the `returnFromSnapshot' event to the dependents of the special class `ObjectMemory' (*note Memory access::). Afterwards, it loads `~/.st/init.st' if available.(1)

You can remember the difference between `pre.st' and `init.st' by remembering that `pre.st' is the _pre_-snapshot file and `init.st' is the post-image-load _init_ialization file.

Finally, GNU Smalltalk loads files listed on the command line, or prompts for input at the terminal, as described in *Note Command line arguments: Invocation.

---------- Footnotes ----------

(1) The same considerations made above hold here too. The file is called `_st/init.st' under MS-DOS, and is looked for in the current directory under OSes that don't use home directories.

1.3 Syntax of GNU Smalltalk
===========================

The language that GNU Smalltalk accepts is basically the same that other Smalltalk environments accept and the same syntax used in the "Blue Book", also known as `Smalltalk-80: The Language and Its Implementation'. The return operator, which is represented in the Blue Book as an up-arrow, is mapped to the ASCII caret symbol `^'; the assignment operator (left-arrow) is usually represented as `:='(1).

Actually, the grammar of GNU Smalltalk is slightly different from the grammar of other Smalltalk environments in order to simplify interaction with the system in a command-line environment as well as in full-screen editors.

Statements are executed one by one; multiple statements are separated by a period. At end-of-line, if a valid statement is complete, a period is implicit. For example,

     8r300. 16rFFFF

prints out the decimal value of octal `300' and hex `FFFF', each followed by a newline.
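As a small sketch of the radix notation just described, the statements below print a few literals in decimal with `printNl'. The binary literal and the equality checks are additions for illustration only; they are not part of the original example.

```smalltalk
"A radix literal is written as the base, the letter r, and the digits."
8r300 printNl.            "octal 300, that is 3 * 64 = 192"
16rFFFF printNl.          "hexadecimal FFFF, that is 65535"
2r1010 printNl.           "binary 1010, that is 10"

"The same values compared with their decimal spelling."
(8r300 = 192) printNl.
(16rFFFF = 65535) printNl.
```

Each statement ends with an explicit period here, so the snippet behaves the same whether it is typed at the prompt or loaded from a file.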
Multiple statements share the same local variables, which are automatically declared. To delete the local variables, terminate a statement with `!' rather than `.' or newline. Here,

     a := 42
     a!
     a

the first two `a's are printed as `42', but the third one is uninitialized and thus printed as `nil'.

In order to evaluate multiple statements in a single block, wrap them into an "eval block" as follows:

     Eval [ a := 42. a printString ]

This won't print the intermediate result (the integer 42), only the final result (the string `'42'').

     ObjectMemory quit

exits from the system. You can also type a `C-d' to exit from Smalltalk if it's reading statements from standard input.

GNU Smalltalk provides three extensions to the language that make it simpler to write complete programs in an editor. However, it is also compatible with the "file out" syntax as shown in the "Green Book" (also known as `Smalltalk-80: Bits of History, Words of Advice' by Glenn Krasner).

A new class is created using this syntax:

     SUPERCLASS-NAME subclass: NEW-CLASS-NAME [
         | INSTANCE VARIABLES |
         PRAGMAS

         MESSAGE-PATTERN-1 [ STATEMENTS ]
         MESSAGE-PATTERN-2 [ STATEMENTS ]
         ...

         CLASS-VARIABLE-1 := EXPRESSION.
         CLASS-VARIABLE-2 := EXPRESSION.
         ...
     ]

In short:

   * Instance variables are defined with the same syntax as method temporary variables.

   * Unlike other Smalltalks, method statements are inside brackets.

   * Class variables are defined the same as variable assignments.

   * Pragmas define class comment, class category, imported namespaces, and the shape of indexed instance variables.

A similar syntax is used to define new methods in an existing class.

     CLASS-EXPRESSION extend [
         ...
     ]

The CLASS-EXPRESSION is an expression that evaluates to a class object, which is typically just the name of a class, although it can be the name of a class followed by the word `class', which causes the method definitions that follow to apply to the named class itself, rather than to its instances.
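The class definition syntax and the class-side form of `extend' described above can be sketched as follows. The class `Counter', its selectors and its class variable `Created' are hypothetical names chosen only for illustration; they are not part of the GNU Smalltalk class library.

```smalltalk
"Counter is a hypothetical class used only to illustrate the syntax."
Object subclass: Counter [
    | count |
    <comment: 'A counter that starts at zero'>

    "Instance-side methods; count is nil until first incremented."
    count [ ^count ifNil: [0] ]
    increment [ count := self count + 1. ^count ]

    "Class variable, written like an assignment."
    Created := 0.
]

"The word class after the name makes these class-side methods."
Counter class extend [
    create [ Created := Created + 1. ^self new ]
    created [ ^Created ]
]

Eval [
    | c |
    c := Counter create.
    c increment; increment.
    c count printNl.
    Counter created printNl
]
```

The lazy `ifNil:' default is a design choice that keeps the sketch short; a real class would more likely set up its state in an initialization method.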
     Number extend [
         radiusToArea [
             ^self squared * Float pi
         ]
         radiusToCircumference [
             ^self * 2 * Float pi
         ]
     ]

A complete treatment of the Smalltalk syntax and of the class library can be found in the included tutorial and class reference (*note Class Reference: (gst-libs)Top.).

More information on the implementation of the language can be found in the `Blue Book'; the relevant parts are also available online as HTML documents, at `http://users.ipa.net/~dwighth/smalltalk/bluebook/bluebook_imp_toc.html'.

---------- Footnotes ----------

(1) It also bears mentioning that there are two assignment operators: `_' and `:='. Both are usable interchangeably, provided that they are surrounded by spaces. The GNU Smalltalk kernel code uses the `:=' form exclusively, but `_' is supported a) for compatibility with previous versions of GNU Smalltalk b) because this is the correct mapping between the assignment operator mentioned in the Blue Book and the current ASCII definition. In the ancient days (like the middle 70's), the ASCII underscore character was also printed as a back-arrow, and many terminals would display it that way, thus its current usage. Anyway, using `_' may lead to portability problems.

1.4 Running the test suite
==========================

GNU Smalltalk comes with a set of files that provides a simple regression test suite.

To run the test suite, change to the top-level Smalltalk directory and type

     make check

You should see the names of the test suite files as they are processed, but that's it. Any other output indicates some problem.

1.5 Licensing of GNU Smalltalk
==============================

Different parts of GNU Smalltalk come under two licenses: the virtual machine and the development environment (compiler and browser) come under the GNU General Public License, while the system class libraries come under the Lesser General Public License.
1.5.1 Complying with the GNU GPL
--------------------------------

The GPL licensing of the virtual machine means that all derivatives of the virtual machine must be put under the same license. In other words, it is strictly forbidden to distribute programs that include the GNU Smalltalk virtual machine under a license that is not the GPL. This also includes any bindings to external libraries. For example, the bindings to Gtk+ are released under the GPL.

In principle, the GPL would not extend to Smalltalk programs, since these are merely input data for the virtual machine. On the other hand, using bindings that are under the GPL via dynamic linking would constitute combining two parts (the Smalltalk program and the bindings) into one program. Therefore, we added a special exception to the GPL in order to avoid gray areas that could adversely hit both the project and its users:

     In addition, as a special exception, the Free Software Foundation give you permission to combine GNU Smalltalk with free software programs or libraries that are released under the GNU LGPL and with independent programs running under the GNU Smalltalk virtual machine.

     You may copy and distribute such a system following the terms of the GNU GPL for GNU Smalltalk and the licenses of the other code concerned, provided that you include the source code of that other code when and as the GNU GPL requires distribution of source code.

     Note that people who make modified versions of GNU Smalltalk are not obligated to grant this special exception for their modified versions; it is their choice whether to do so. The GNU General Public License gives permission to release a modified version without this exception; this exception also makes it possible to release a modified version which carries forward this exception.

1.5.2 Complying with the GNU LGPL
---------------------------------

Smalltalk programs that run under GNU Smalltalk are linked with the system classes in the GNU Smalltalk class library.
Therefore, they must respect the terms of the Lesser General Public License(1). The interpretation of this license for architectures different from that of the C language is often difficult; the accepted one for Smalltalk is as follows. The image file can be considered as an object file, falling under Subsection 6a of the license, as long as it allows a user to load an image, upgrade the library or otherwise apply modifications to it, and save a modified image: this is most conveniently obtained by allowing the user to use the read-eval-print loop that is embedded in the GNU Smalltalk virtual machine. In other words, provided that you leave access to the loop in a documented way, or that you provide a way to file in arbitrary files in an image and save the result to a new image, you are obeying Subsection 6a of the Lesser General Public License, which is reported here: a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.) In the future, alternative mechanisms similar to shared libraries may be provided, so that it is possible to comply with the GNU LGPL in other ways. ---------- Footnotes ---------- (1) Of course, they may be more constrained by usage of GPL class libraries. 2 Features of GNU Smalltalk *************************** In this section, the features which are specific to GNU Smalltalk are described. 
These features include support for calling C functions from within Smalltalk, accessing environment variables, and controlling various aspects of compilation and execution monitoring.

Note that, in general, GNU Smalltalk is much more powerful than the original Smalltalk-80, as it contains a lot of methods that are common in today's Smalltalk implementations and are present in the ANSI Standard for Smalltalk, but were absent in the Blue Book. Examples include Collection's `allSatisfy:' and `anySatisfy:' methods and many methods in SystemDictionary (the Smalltalk dictionary's class).

2.1 Extended streams
====================

The basic image in GNU Smalltalk includes powerful extensions to the _Stream_ hierarchy found in ANSI Smalltalk (and Smalltalk-80). In particular:

   * Read streams support all the iteration protocols available for collections. In some cases (like `fold:', `detect:', `inject:into:') these are completely identical. For messages that return a new stream, such as `select:' and `collect:', the blocks are evaluated lazily, as elements are requested from the stream using `next'.

   * Read streams can be concatenated using `,' like SequenceableCollections.

   * "Generators" are supported as a quick way to create a Stream. A generator is a kind of pluggable stream, in that a user-supplied block defines which values are in a stream.

     For example, here is an empty generator and two infinite generators:

          "Returns an empty stream"
          Generator on: [ :gen | ]

          "Return an infinite stream of 1's"
          Generator on: [ :gen | [ gen yield: 1 ] repeat ]

          "Return an infinite stream of integers counting up from 1"
          Generator inject: 1 into: [ :value | value + 1 ]

     The block is put "on hold" and starts executing as soon as `#next' or `#atEnd' is sent to the generator. When the block sends `#yield:' to the generator, it is again put on hold and the argument becomes the next object in the stream.
     Generators use "continuations", but they shield the users from their complexity by presenting the same simple interface as streams.

2.2 Regular expression matching
===============================

_Regular expressions_, or "regexes", are a sophisticated way to efficiently match patterns of text. If you are unfamiliar with regular expressions in general, see *Note Syntax of Regular Expressions: (emacs)Regexps, for a guide for those who have never used regular expressions.

GNU Smalltalk supports regular expressions in the core image with methods on `String'.

The GNU Smalltalk regular expression library is derived from GNU libc, with modifications made originally for Ruby to support Perl-like syntax. It will always use its included library, and never the ones installed on your system; this may change in the future in backwards-compatible ways. Regular expressions are currently 8-bit clean, meaning they can work with any ordinary String, but do not support full Unicode, even when package `I18N' is loaded.

Broadly speaking, these regexes support Perl 5 syntax; register groups `()' and repetition `{}' must not be given with backslashes, and their counterpart literal characters should. For example, `\{{1,3}' matches `{', `{{', `{{{'; correspondingly, `(a)(\()' matches `a(', with `a' and `(' as the first and second register groups respectively. GNU Smalltalk also supports the regex modifiers `imsx', as in Perl. You can't put regex modifiers like `im' after Smalltalk strings to specify them, because they aren't part of Smalltalk syntax. Instead, use the inline modifier syntax. For example, `(?is:abc.)' is equivalent to `[Aa][Bb][Cc](?:.|\n)'.

In most cases, you should specify regular expressions as ordinary strings. GNU Smalltalk always caches compiled regexes, and uses a special high-efficiency caching when looking up literal strings (i.e. most regexes), to hide the compiled `Regex' objects from most code.
For special cases where this caching is not good enough, simply send `#asRegex' to a string to retrieve a compiled form, which works in all places in the public API where you would specify a regex string. You should always rely on the cache until you have demonstrated that using Regex objects makes a noticeable performance difference in your code.

Smalltalk strings only have one escape, the `'' given by `''', so backslashes used in regular expression strings will be understood as backslashes, and a literal backslash can be given directly with `\\'(1).

The methods on the compiled Regex object are private to this interface. As a public interface, GNU Smalltalk provides methods on String, in the category `regex'. There are several methods for matching, replacing, pattern expansion, iterating over matches, and other useful things.

The fundamental operator is `#searchRegex:', usually written as `#=~', reminiscent of Perl syntax. This method will always return a `RegexResults', which you can query for whether the regex matched, the location Interval and contents of the match and any register groups as a collection, and other features. For example, here is a simple configuration file line parser:

     | file config |
     config := LookupTable new.
     file := (File name: 'myapp.conf') readStream.
     file linesDo: [:line |
         (line =~ '(\w+)\s*=\s*((?: ?\w+)+)') ifMatched: [:match |
             config at: (match at: 1) put: (match at: 2)]].
     file close.
     config printNl.

As with Perl, `=~' will scan the entire string and answer the leftmost match if any is to be found, consuming as many characters as possible from that position. You can anchor the search with variant messages like `#matchRegex:', or of course `^' and `$' with their usual semantics if you prefer.

You shouldn't modify the string while you want a particular RegexResults object matched on it to remain valid, because changes to the matched text may propagate to the RegexResults object.
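A minimal sketch of querying the `RegexResults' object answered by `=~', without needing a file on disk. The input strings and the pattern are made up for illustration; `matched', `match' and `at:' are assumed to answer, respectively, whether the regex matched, the matched text, and a register group, as outlined above.

```smalltalk
| results |
results := 'width = 80' =~ '(\w+)\s*=\s*(\d+)'.
results matched printNl.         "whether the pattern was found"
results match printNl.           "the text of the whole match"
(results at: 1) printNl.         "first register group"
(results at: 2) printNl.         "second register group"

"ifMatched: evaluates its block only when the search succeeded,
 so nothing is printed for a subject without digits."
('no digits here' =~ '\d+') ifMatched: [:m | m match printNl].
```

Because Smalltalk strings have no backslash escapes, the pattern can be written exactly as it would appear in Perl.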
Analogously to the Perl `s' operator, GNU Smalltalk provides `#replacingRegex:with:'. Unlike Perl, GNU Smalltalk employs the pattern expansion syntax of the `#%' message here. For example, `'The ratio is 16/9.' replacingRegex: '(\d+)/(\d+)' with: '$%1\over%2$'' answers `'The ratio is $16\over9$.''. In place of the `g' modifier, use the `#replacingAllRegex:with:' message instead. One other interesting String message is `#onOccurrencesOfRegex:do:', which invokes its second argument, a block, on every successful match found in the receiver. Internally, every search will start at the end of the previous successful match. For example, this will print all the words in a stream: stream contents onOccurrencesOfRegex: '\w+' do: [:each | each match printNl] ---------- Footnotes ---------- (1) Whereas it must be given as `\\\\' in a literal Emacs Lisp string, for example. 2.3 Namespaces ============== [This section (and the implementation of namespaces in GNU Smalltalk) is based on the paper `Structured Symbolic Name Spaces in Smalltalk', by Augustin Mrazik.] 2.3.1 Introduction ------------------ The Smalltalk-80 programming environment, upon which GNU Smalltalk is historically based, supports symbolic identification of objects in one global namespace--in the `Smalltalk' system dictionary. This means that each global variable in the system has its unique name which is used for symbolic identification of the particular object in the source code (e.g. in expressions or methods). The most important of these global variables are classes defining the behavior of objects. In development dealing with modelling of real systems, "polymorphic symbolic identification" is often needed. By this, we mean that it should be possible to use the same name for different classes or other global variables. Selection of the proper variable binding should be context-specific. 
By way of illustration, let us consider class `Statement' as an example which would mean totally different things in different domains:

GNU Smalltalk or other programming language
     An expression in the top level of a code body, possibly with special syntax available such as assignment or branching.

Bank
     A customer's trace report of recent transactions.

AI, logical derivation
     An assertion of a truth within a logical system.

This issue becomes inevitable if we start to work persistently, using `ObjectMemory snapshot' to save after each session for later resumption. For example, you might have the class `Statement' already in your image with the "Bank" meaning above (e.g. in the live bank support systems we all run in our images) and you might decide to start developing YAC [Yet Another C]. Upon starting to write parse nodes for the compiler, you would find that `#Statement' is bound in the banking package. You could replace it with your parse node class, and the bank's `Statement' could remain in the system as an unbound class with full functionality; however, it could not be accessed anymore at the symbolic level in the source code. Whether this would be a problem or not would depend on whether any of the bank's code refers to the class `Statement', and when these references occur.

Objects which have to be identified in source code by their names are included in `Smalltalk', the sole instance of `SystemDictionary'. Such objects may be identified simply by writing their names as you would any variable names. The code is compiled in the default environment, and if the variable is found in `Smalltalk', without being shadowed by a class pool or local variables, its value is retrieved and used as the value of the expression. In this way `Smalltalk' represents the sole symbolic namespace. In the following text the symbolic namespace, as a concept, will be called simply "environment" to make the text more clear.
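As a sketch of `Smalltalk' acting as the single global namespace, the statements below define and look up a global through the ordinary dictionary protocol. `DemoGlobal' is a hypothetical name used only for this illustration.

```smalltalk
"Smalltalk is the sole instance of SystemDictionary."
Smalltalk class printNl.

"Define a global under a hypothetical name and read it back."
Smalltalk at: #DemoGlobal put: 42.
(Smalltalk includesKey: #DemoGlobal) printNl.
(Smalltalk at: #DemoGlobal) printNl.

"Classes are globals too, stored under their names."
((Smalltalk at: #Object) == Object) printNl.
```

With a single dictionary like this, a second `at:put:' under the same key simply replaces the first binding, which is the name clash the `Statement' example describes.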
2.3.2 Concepts -------------- To support polymorphic symbolical identification several environments will be needed. The same name may exist concurrently in several environments as a key, pointing to diverse objects in each. Symbolic navigation between these environments is needed. Before approaching the problem of the syntax and semantics to be implemented, we have to decide on structural relations to be established between environments. Since the environment must first be symbolically identified to direct access to its global variables, it must first itself be a global variable in another environment. `Smalltalk' is a great choice for the root environment, from which selection of other environments and their variables begins. From `Smalltalk' some of the existing sub-environments may be seen; from these other sub-environments may be seen, etc. This means that environments represent nodes in a graph where symbolic selections from one environment to another one represent branches. The symbolic identification should be unambiguous, although it will be polymorphic. This is why we should avoid cycles in the environment graph. Cycles in the graph could cause also other problems in the implementation, e.g. inability to use trivially recursive algorithms. Thus, in general, the environments must build a directed acyclic graph; GNU Smalltalk currently limits this to an n-ary tree, with the extra feature that environments can be used as pool dictionaries. Let us call the partial ordering relation which occurs between environments "inheritance". Sub-environments inherit from their super-environments. The feature of inheritance in the meaning of object-orientation is associated with this relation: all associations of the super-environment are valid also in its sub-environments, unless they are locally redefined in the sub-environment. A super-environment includes all its sub-enviroments as `Association's under their names. 
The sub-environment includes its super-environment under the symbol `#Super'. Most environments inherit from `Smalltalk', the standard root environment, but they are not required to do so; this is similar to how most classes derive from `Object', yet one can derive a class directly from `nil'. Since they all inherit `Smalltalk''s global variables, it is not necessary to define `Smalltalk' as pointing to `Smalltalk''s `Smalltalk' in each environment. The inheritance links to the super-environments are used in the lookup for a potentially inherited global variable. This includes lookups by a compiler searching for a variable binding and lookups via methods such as `#at:' and `#includesKey:'. 2.3.3 Syntax ------------ Global objects of an environment, be they local or inherited, may be referenced by their symbol variable names used in the source code, e.g. John goHome if the `#John -> aMan' association exists in the particular environment or one of its super-environments, all along the way to the root environment. If an object must be referenced from another environment (i.e. which is not one of its sub-environments) it has to be referenced either _relatively_ to the position of the current environment, using the `Super' symbol, or _absolutely_, using the "full pathname" of the object, navigating from the tree root (usually `Smalltalk') through the tree of sub-environments. For the identification of global objects in another environment, we use a "pathname" of symbols. The symbols are separated by periods; the "look" to appear is that of Smalltalk.Tasks.MyTask and of Super.Super.Peter. As is custom in Smalltalk, we are reminded by capitalization that we are accessing global objects. Another syntax returns the "variable binding", the `Association' for a particular global. The first example above is equivalently: #{Smalltalk.Tasks.MyTask} value The latter syntax, a "variable binding", is also valid inside literal arrays. 
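As a rough sketch of the two notations just described, the statements below create a sub-environment and then reach a global in it by absolute pathname and by variable binding. The names `Tasks' and `MyTask' are taken from the pathname example above; creating them with `addSubspace:' and `at:put:' is an assumption made for illustration only. The statements are meant to be evaluated one at a time, as the REPL does, so that each name exists before the next statement is compiled.

     "Hypothetical setup: a sub-environment Tasks holding a global MyTask."
     Smalltalk addSubspace: #Tasks.
     Smalltalk.Tasks at: #MyTask put: 'an example task'.

     "Absolute pathname, navigating from the root environment."
     Smalltalk.Tasks.MyTask printNl.

     "Variable binding syntax: the literal is an Association, and
      sending it value answers the same object as the pathname."
     #{Smalltalk.Tasks.MyTask} value printNl.

     "Inherited globals should be visible from the sub-environment too."
     (Smalltalk.Tasks includesKey: #Transcript) printNl.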
2.3.4 Implementation -------------------- A superclass of `SystemDictionary' called `RootNamespace' is defined, and many of the features of the Smalltalk-80 `SystemDictionary' will be hosted by that class. `Namespace' and `RootNamespace' are in turn subclasses of `AbstractNamespace'. To handle inheritance, the following methods have to be defined or redefined in Namespace (_not_ in RootNamespace): Accessors like `#at:ifAbsent:' and `#includesKey:' Inheritance must be implemented. When `Namespace', trying to read a variable, finds an association in its own dictionary or a super-environment dictionary, it uses that; for `Dictionary''s writes and when a new association must be created, `Namespace' creates it in its own dictionary. There are special methods like `#set:to:' for cases in which you want to modify a binding in a super-environment if that is the relevant variable's binding. Enumerators like `#do:' and `#keys' This should return *all* the objects in the namespace, including those which are inherited. Hierarchy access `AbstractNamespace' will also implement a new set of methods that allow one to navigate through the namespace hierarchy; these parallel those found in `Behavior' for the class hierarchy. The most important task of the `Namespace' class is to provide organization for the most important global objects in the Smalltalk system--for the classes. This importance becomes even more crucial in a structure of multiple environments intended to change the semantics of code compiled for those classes. In Smalltalk the classes have the instance variable `name' which holds the name of the class. Each "defined class" is included in `Smalltalk', or another environment, under this name. In a framework with several environments the class should know the environment in which it has been created and compiled. This is a new property of `Class' which must be defined and properly used in relevant methods. 
In the mother environment the class shall be included under its name. Any class, as with any other object, may be included concurrently in several environments, even under different symbols in the same or in diverse environments. We can consider these "alias names" of the particular class or other value. A class may be referenced under the other names or in other environments than its mother environment, e.g. for the purpose of instance creation or messages to the class, but it should not compile code in these environments, even if this compilation is requested from another environment. If the syntax is not correct in the mother environment, a compilation error occurs. This follows from the existence of class "mother environments", as a class is responsible for compiling its own methods. An important issue is also the name of the class answered by the class for the purpose of its identification in diverse tools (e.g. in a browser). This must be changed to reflect the environment in which it is shown, i.e. the method `nameIn: environment' must be implemented and used in proper places. Other changes must be made to the Smalltalk system to achieve the full functionality of structured environments. In particular, changes have to be made to the behavior classes, the user interface, the compiler, and a few classes supporting persistence. One small detail of note is that evaluation in the REPL or `Workspace', implemented by compiling methods on `UndefinedObject', makes more sense if `UndefinedObject''s environment is the "current environment" as reachable by `Namespace current', even though its mother environment by any other sensibility is `Smalltalk'. 2.3.5 Using namespaces ---------------------- Using namespaces is often merely a matter of adding a `namespace' option to the GNU Smalltalk XML package description used by `PackageLoader', or wrapping your code like this: Namespace current: NewNS [ ...
] Namespaces can be imported into classes like this: Stream subclass: EncodedStream [ ] Alternatively, paths to classes (and other objects) in the namespaces will have to be specified completely. Importing a namespace into a class is similar to C++'s `using namespace' declaration within the class proper's definition. Finally, be careful when working with fundamental system classes. Although you can use code like Namespace current: NewNS [ Smalltalk.Set subclass: #Set [ ... ] ] this approach won't work when applied to core classes. For example, you might be successful with a `Set' or `WriteStream' object, but subclassing `SmallInteger' this way can bite you in strange ways: integer literals will still belong to the `Smalltalk' dictionary's version of the class (this holds for `Array's, `String's, etc. too), primitive operations will still answer standard Smalltalk `SmallIntegers', and so on. Similarly, word-shaped will recognize 32-bit `Smalltalk.LargeInteger' objects, but not `LargeInteger's belonging to your own namespace. Unfortunately, this problem is not easy to solve since Smalltalk has to know the OOPs of determinate class objects for speed--it would not be feasible to look up the environment to which the sender of a message belongs every time the `+' message was sent to an Integer. So, GNU Smalltalk namespaces cannot yet solve 100% of the problem of clashes between extensions to a class--for that you'll still have to rely on prefixes to method names. But they _do_ solve the problem of clashes between class names, or between class names and pool dictionary names. Namespaces are unrelated to packages; loading a package does not import the corresponding namespace. 2.4 Disk file-IO primitive messages =================================== Four classes (`FileDescriptor', `FileStream', `File', `Directory') allow you to create files and access the file system in a fully object-oriented way.
`FileDescriptor' and `FileStream' are much more powerful than the corresponding C language facilities (the difference between the two is that, like the C `stdio' library, `FileStream' does buffering). For one thing, they allow you to write raw binary data in a portable endian-neutral format. But, more importantly, these classes transparently implement virtual filesystems and asynchronous I/O. Asynchronous I/O means that an input/output operation blocks the Smalltalk Process that is doing it, but not the others, which makes them very useful in the context of network programming. Virtual file systems mean that these objects can transparently extract files from archives such as `tar' and `gzip' files, through a mechanism that can be extended through either shell scripting or Smalltalk programming. For more information on these classes, look in the class reference, under the `VFS' namespace. URLs may be used as file names; though, unless you have loaded the `NetClients' package (*note Network support::), only `file' URLs will be accepted. In addition, the three files, `stdin', `stdout', and `stderr' are declared as global instances of `FileStream' that are bound to the proper values as passed to the C virtual machine. They can be accessed as either `stdout' or `FileStream stdout'--the former is easier to type, but the latter can be clearer. Finally, `Object' defines four other methods: `print' and `printNl', `store' and `storeNl'. These do a `printOn:' or `storeOn:' to the "Transcript" object; this object, which is the sole instance of class `TextCollector', normally delegates write operations to `stdout'. If you load the Blox GUI, instead, the Transcript Window will be attached to the Transcript object (*note Blox::). The `fileIn:' message sent to the FileStream class, with a file name as a string argument, will cause that file to be loaded into Smalltalk. For example, FileStream fileIn: 'foo.st' ! will cause `foo.st' to be loaded into GNU Smalltalk.
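The following minimal sketch shows how the stream classes described in this section might be used to write a small text file and read it back. The file name `example.txt' is hypothetical, and the sketch assumes that `open:mode:', `FileStream write' and `FileStream read' are available as documented in the class reference; check there before relying on it.

     | out in |
     "Write one line to a (hypothetical) file, then close it."
     out := FileStream open: 'example.txt' mode: FileStream write.
     out nextPutAll: 'first line'; nl.
     out close.

     "Read the file back line by line and echo each line."
     in := FileStream open: 'example.txt' mode: FileStream read.
     [in atEnd] whileFalse: [in nextLine displayNl].
     in close.

     "The global stdout is itself a FileStream."
     stdout nextPutAll: 'done'; nl.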
2.5 The GNU Smalltalk ObjectDumper ================================== Another GNU Smalltalk-specific class, the `ObjectDumper' class, allows you to dump objects in a portable, endian-neutral, binary format. Note that you can use the `ObjectDumper' on ByteArrays too, thanks to another GNU Smalltalk-specific class, `ByteStream', which allows you to treat ByteArrays the same way you would treat disk files. For more information on the usage of the `ObjectDumper', look in the class reference. 2.6 Dynamic loading =================== The `DLD' class enhances the C callout mechanism to automatically look for unresolved functions in a series of program-specified libraries. To add a library to the list, evaluate code like the following: DLD addLibrary: 'libc' The extension (`.so', `.sl', `.a', `.dll' depending on your operating system) will be added automatically. You are advised not to specify it for portability reasons. You will then be able to use the standard C call-out mechanisms to define all the functions in the C run-time library. Note that this is a potential security problem (especially if your program is SUID root under Unix), so you might want to disable dynamic loading when using GNU Smalltalk as an extension language. To disable dynamic loading, configure GNU Smalltalk passing the `--disable-dld' switch. Note that a `DLD' class will be present even if dynamic loading is disabled (either because your system is not supported, or by the `--disable-dld' configure switch) but any attempt to perform dynamic linking will result in an error. 2.7 Automatic documentation generator ===================================== GNU Smalltalk includes an automatic documentation generator invoked via the `gst-doc' command. The code is actually part of the `ClassPublisher' package, and `gst-doc' takes care of reading the code to be documented and firing a `ClassPublisher'. Currently, `gst-doc' can only generate output in Texinfo format, though this will change in future releases. 
`gst-doc' can document code that is already in the image, or it can load external files and packages. Note that the latter approach will not work for files and packages that programmatically create code or file in other files/packages. `gst-doc' is invoked as follows: gst-doc [ FLAG ... ] CLASS ... The following options are supported: `-p PACKAGE' `--package=PACKAGE' Produce documentation for the classes inside the PACKAGE package. `-f FILE' `--file=FILE' Produce documentation for the classes inside the FILE file. `-I' `--image-file' Produce documentation for the code that is already in the given image. `-o' `--output=FILE' Emit documentation in the named file. CLASS is either a class name, or a namespace name followed by `.*'. Documentation will be written for classes that are specified in the command line. CLASS can be omitted if a `-f' or `-p' option is given. In this case, documentation will be written for all the classes in the package. 2.8 Memory accessing methods ============================ GNU Smalltalk provides methods to query its own internal data structures. You may determine the real memory address of an object or the real memory address of the OOP table that points to a given object, by using messages to the `Memory' class, described below. -- Method on Object: asOop Returns the index of the OOP for anObject. This index is immune from garbage collection and is the same value used by default as a hash value for anObject (it is returned by Object's implementation of `hash' and `identityHash'). -- Method on Integer: asObject Converts the given OOP _index_ (not address) back to an object. Fails if no object is associated to the given index. -- Method on Integer: asObjectNoFail Converts the given OOP _index_ (not address) back to an object. Returns nil if no object is associated to the given index. Other methods in ByteArray and Memory allow one to read various C types (`doubleAt:', `ucharAt:', etc.).
For examples of using `asOop' and `asObject', look at the Blox source code in `blox-tk/BloxBasic.st'. Another interesting class is ObjectMemory. This provides a few methods that enable one to tune the virtual machine's usage of memory; many methods that in the past were instance methods of Smalltalk or class methods of Memory are now class methods of ObjectMemory. In addition, and that's what the rest of this section is about, the virtual machine signals events to its dependents exactly through this class. The events that can be received are "returnFromSnapshot" This is sent every time an image is restarted, and substitutes the concept of an "init block" that was present in previous versions. "aboutToQuit" This is sent just before the interpreter is exiting, either because `ObjectMemory quit' was sent or because the specified files were all filed in. Exiting from within this event might cause an infinite loop, so be careful. "aboutToSnapshot" This is sent just before an image file is created. Exiting from within this event will leave any preexisting image untouched. "finishedSnapshot" This is sent just after an image file is created. Exiting from within this event will not make the image unusable. 2.9 Memory management in GNU Smalltalk ====================================== The GNU Smalltalk virtual machine is equipped with a garbage collector, a facility that reclaims the space occupied by objects that are no longer accessible from the system roots. The collector is composed of several parts, each of which can be invoked by the virtual machine using various tunable strategies, or invoked manually by the programmer. These parts include a "generation scavenger", a "mark & sweep" collector with an incremental sweep phase, and a "compactor". Each of these facilities works on a different memory space and differs from the others in its scope, speed and disadvantages (which are hopefully balanced by the availability of different algorithms).
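As a small illustration of the conversions described above, the sketch below takes an object's OOP index with `asOop' and maps it back with `asObject'. The variable names are arbitrary; the only assumption is that the two methods behave as documented in this section.

     | obj index |
     obj := Object new.

     "The OOP index is an Integer that identifies the object."
     index := obj asOop.
     index printNl.

     "Converting the index back should answer the very same object."
     (index asObject == obj) printNl.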
What follows is a description of these algorithms and of the memory spaces they work in. "NewSpace" is the memory space where young objects live. It is composed of three sub-spaces: an object-creation space ("Eden") and two "SurvivorSpaces". When an object is first created, it is placed in Eden. When Eden starts to fill up (i.e., when the number of used bytes in Eden exceeds the scavenge threshold), objects that are housed in Eden or in the occupied SurvivorSpace and that are still reachable from the system roots are copied to the unoccupied SurvivorSpace. As an object survives different scavenging passes, it will be shuffled by the scavenger from the occupied SurvivorSpace to the unoccupied one. When the number of used bytes in SurvivorSpace is high enough that the scavenge pause might be excessively long, the scavenger will move some of the older surviving objects from NewSpace to "OldSpace". In the garbage collection jargon, we say that such objects are being "tenured" to OldSpace. This garbage collection algorithm is designed to reclaim short-lived objects, that is those objects that expire while residing in NewSpace, and to decide when enough data is residing in NewSpace that it is useful to move some of it in OldSpace. A "copying" garbage collector is particularly efficient in an object population whose members are more likely to die than survive, because this kind of scavenger spends most of its time copying survivors, who will be few in number in such populations, rather than tracing corpses, who will be many in number. This fact makes copying collection especially well suited to NewSpace, where a percentage of 90% or more objects often fails to survive across a single scavenge. The particular structure of NewSpace has many advantages. 
On one hand, having a large Eden and two small SurvivorSpaces has a smaller memory footprint than having two equally big semi-spaces and allocating new objects directly from the occupied one (by default, GNU Smalltalk uses 420=300+60*2 kilobytes of memory, while a simpler configuration would use 720=360*2 kilobytes). On the other hand, it makes tenuring decisions particularly simple: the copying order is such that short-lived objects tend to be copied last, while objects that are being referred from OldSpace tend to be copied first: this is because the tenuring strategy of the scavenger is simply to treat the destination SurvivorSpace as a circular buffer, tenuring objects with a First-In-First-Out policy. An object might become part of the scavenger root set for several reasons: objects that have been tenured are roots if their data lives in an OldSpace page that has been written to since the last scavenge (more on this later), plus all objects can be roots if they are known to be referenced from C code or from the Smalltalk stacks. In turn, some of the old objects can be made to live in a special area, called "FixedSpace". Objects that reside in FixedSpace are special in that their body is guaranteed to remain at a fixed address (in general, GNU Smalltalk only ensures that the header of the object remains at a fixed address in the Object Table). Because the garbage collector can and does move objects, passing objects to foreign code which uses the object's address as a fixed key, or which uses a ByteArray as a buffer, presents difficulties. One can use `CObject' to manipulate C data on the `malloc' heap, which indeed does not move, but this can be tedious and requires the same attentions to avoid memory leaks as coding in C. 
FixedSpace provides a much more convenient mechanism: once an object is deemed fixed, the object's body will never move throughout its lifetime; the space it occupies will however still be returned automatically to the FixedSpace pool when the object is garbage collected. Note that because objects in FixedSpace cannot move, FixedSpace cannot be compacted and can therefore suffer from extensive fragmentation. For this reason, FixedSpace should be used carefully. FixedSpace however is rebuilt (of course) every time an image is brought up, so a kind of compaction of FixedSpace can be achieved by saving a snapshot, quitting, and then restarting the newly saved image. Memory for OldSpace and FixedSpace is allocated using a variation of the system allocator `malloc': in fact, GNU Smalltalk uses the same allocator for its own internal needs, for OldSpace and for FixedSpace, but it ensures that a given memory page never hosts objects that reside in separate spaces. New pages are mapped into the address space as needed and devoted to OldSpace or FixedSpace segments; similarly, when unused they may be subsequently unmapped, or they might be left in place waiting to be reused by `malloc' or by another Smalltalk data space. Garbage that is created among old objects is taken care of by a mark & sweep collector which, unlike the scavenger which only reclaims objects in NewSpace, can only reclaim objects in OldSpace. Note that as objects are allocated, they will not only use the space that was previously occupied in the Eden by objects that have survived, but they will also reuse the entries in the global Object Table that have been freed by objects that the scavenger could reclaim. This quest for free object table entries can be combined with the sweep phase of the OldSpace collector, which can then be done incrementally, limiting the disruptive part of OldSpace garbage collection to the mark phase.
Several runs of the mark & sweep collector can lead to fragmentation (where objects are allocated from several pages, and then become garbage in an order such that a bunch of objects remain in each page and the system is not able to recycle them). For this reason, the system periodically tries to compact OldSpace. It does so simply by looping through every old object and copying it into a new OldSpace. Since the OldSpace allocator does not suffer from fragmentation until objects start to be freed nor after all objects are freed, at the end of the copy all the pages in the fragmented OldSpace will have been returned to the system (some of them might already have been used by the compacted OldSpace), and the new, compacted OldSpace is ready to be used as the system oldspace. Growing the object heap (which is done when it is found to be quite full even after a mark & sweep collection) automatically triggers a compaction. You can run the compactor without marking live objects. Since the amount of garbage in OldSpace is usually quite limited, the overhead incurred by copying potentially dead objects is small enough that the compactor still runs considerably faster than a full garbage collection, and can still give the application some breathing room. Keeping OldSpace and FixedSpace in the same heap would then make compaction of OldSpace (whereby it is rebuilt from time to time in order to limit fragmentation) much less effective. Also, the `malloc' heap is not used for FixedSpace objects because GNU Smalltalk needs to track writes to OldSpace and FixedSpace in order to support efficient scavenging of young objects. To do so, the grey page table(1) contains one entry for each page in OldSpace or FixedSpace that is thought to contain at least a reference to an object housed in NewSpace. Every page in OldSpace is created as grey, and is considered grey until a scavenging pass finds out that it actually does not contain pointers to NewSpace. 
Then the page is recolored black(2), and will stay black until it is written to or another object is allocated in it (either a new fixed object, or a young object being tenured). The grey page table is expanded and shrunk as needed by the virtual machine. Drawing a histogram of object sizes shows that there are only a few sources of large objects on average (i.e., objects greater than a page in size), but that enough of these objects are created dynamically that they must be handled specially. Such objects should not be allocated in NewSpace along with ordinary objects, since they would fill up NewSpace prematurely (or might not even fit in it), thus accelerating the scavenging rate, reducing performance and resulting in an increase in tenured garbage. Even though this is not an optimal solution because it effectively tenures these objects at the time they are created, a benefit can be obtained by allocating these objects directly in FixedSpace. The reason why FixedSpace is used is that these objects are big enough that they don't result in fragmentation(3); and using FixedSpace instead of OldSpace prevents the compactor from copying them, which would not provide any benefit in terms of reduced fragmentation. Smalltalk activation records are allocated from another special heap, the context pool. This is because it is often the case that they can be deallocated in a Last-In-First-Out (stack) fashion, thereby saving the work needed to allocate entries in the object table for them, and quickly reusing the memory that they use. When the activation record is accessed by Smalltalk, however, the activation record must be turned into a first-class `OOP'(4). Since even these objects are usually very short-lived, the data is however not copied to the Eden: the eviction of the object bodies from the context pool is delayed to the next scavenging, which will also empty the context pool just like it empties Eden.
If few objects are allocated and the context pool turns full before the Eden, a scavenging is also triggered; this is however quite rare. Optionally, GNU Smalltalk can avoid the overhead of interpretation by executing a given Smalltalk method only after that method has been compiled into the underlying microprocessor's machine code. This machine-code generation is performed automatically, and the resulting machine code is then placed in `malloc'-managed memory. Once executed, a method's machine code is left there for subsequent execution. However, since it would require way too much memory to permanently house the machine-code version of every Smalltalk method, methods might be compiled more than once: when a translation is not used at the time that two garbage collection actions are taken (scavenges and global garbage collections count equally), the incremental sweeper discards it, so that it will be recomputed if and when necessary. ---------- Footnotes ---------- (1) The denomination "grey" comes from the lexicon of "tri-color marking", which is an abstraction of every possible garbage collection algorithm: in tri-color marking, grey objects are those that are known to be reachable or that we are not interested in reclaiming, yet have not been scanned yet to mark the objects that they refer to as reachable. (2) Black objects are those that are known to be reachable or that we are not interested in reclaiming, and are known to have references only to other black or grey objects (in case you're curious, the tri-color marking algorithm goes on like this: objects not yet known to be reachable are white, and when all objects are either black or white, the white ones are garbage). (3) Remember that free pages are shared among the three heaps, that is, OldSpace, FixedSpace and the `malloc' heap. When a large object is freed, the memory that it used can be reused by `malloc' or by OldSpace allocation. (4) This is short for "Ordinary Object Pointer".
2.10 Security in GNU Smalltalk ============================== 2.11 Special kinds of objects ============================= A few methods in Object support the creation of particular objects. These include:

   * finalizable objects
   * weak and ephemeron objects (i.e. objects whose contents are considered specially, during the heap scanning phase of garbage collection).
   * read-only objects (like literals found in methods)
   * fixed objects (guaranteed not to move across garbage collections)

They are: -- Method on Object: makeWeak Marks the object so that it is considered weak in subsequent garbage collection passes. The garbage collector will consider dead an object which has references only inside weak objects, and will replace references to such an "almost-dead" object with nils, and then send the `mourn' message to the object. -- Method on Object: makeEphemeron Marks the object so that it is considered specially in subsequent garbage collection passes. Ephemeron objects are sent the message `mourn' when the first instance variable is not referenced or is referenced _only through another instance variable in the ephemeron_. Ephemerons provide a very versatile base on which complex interactions with the garbage collector can be programmed (for example, finalization which is described below is implemented with ephemerons). -- Method on Object: addToBeFinalized Marks the object so that, as soon as it becomes unreferenced, its `finalize' method is called. Before `finalize' is called, the VM implicitly removes the objects from the list of finalizable ones. If necessary, the `finalize' method can mark again the object as finalizable, but by default finalization will only occur once.
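The memory management section above notes that each part of the collector can also be invoked manually by the programmer. A minimal sketch of doing so follows; it assumes that the class-side selectors `scavenge', `globalGarbageCollect' and `compact' are present on `ObjectMemory' in your image, which should be checked against the class reference.

     "Collect garbage among young objects (NewSpace) only."
     ObjectMemory scavenge.

     "Run a full mark & sweep collection that also covers OldSpace."
     ObjectMemory globalGarbageCollect.

     "Rebuild OldSpace to reduce fragmentation."
     ObjectMemory compact.

Forcing a collection is seldom necessary in ordinary programs, since the virtual machine triggers each of these phases on its own; it is mostly of use when measuring memory consumption or just before saving a snapshot.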
Note that a finalizable object is kept in memory even when it has no references, because tricky finalizers might "resuscitate" the object; automatic marking of the object as not to be finalized has the nice side effect that the VM can simply delay the releasing of the memory associated to the object, instead of being forced to waste memory even after finalization happens. An object must be explicitly marked as to be finalized _every time the image is loaded_; that is, finalizability is not preserved by an image save. This was done because in most cases finalization is used together with `CObject's that would be stale when the image is loaded again, causing a segmentation violation as soon as they are accessed by the finalization method. -- Method on Object: removeToBeFinalized Removes the to-be-finalized mark from the object. As I noted above, the finalize code for the object does not have to do this explicitly. -- Method on Object: finalize This method is called by the VM when there are no more references to the object (or, of course, if it only has references inside weak objects). -- Method on Object: isReadOnly This method answers whether the VM will refuse to make changes to the objects when methods like `become:', `basicAt:put:', and possibly `at:put:' too (depending on the implementation of the method). Note that GNU Smalltalk won't try to intercept assignments to fixed instance variables, nor assignments via `instVarAt:put:'. Many objects (Characters, `nil', `true', `false', method literals) are read-only by default. -- Method on Object: makeReadOnly: aBoolean Changes the read-only or read-write status of the receiver to that indicated by `aBoolean'. -- Method on Object: basicNewInFixedSpace Same as `#basicNew', but the object won't move across garbage collections. -- Method on Object: basicNewInFixedSpace: Same as `#basicNew:', but the object won't move across garbage collections. 
 -- Method on Object: makeFixed
     Ensure that the receiver won't move across garbage collections. This can be used either if you decide after its creation that an object must be fixed, or if a class does not support using `#new' or `#new:' to create an object.

Note that, although particular applications will indeed have a need for fixed, read-only or finalizable objects, the `#makeWeak' primitive is seldom needed and weak objects are normally used only indirectly, through the so called "weak collections". These are easier to use because they provide additional functionality (for example, `WeakArray' is able to determine whether an item has been garbage collected, and `WeakSet' implements hash table functionality); they are:

   * `WeakArray'

   * `WeakSet'

   * `WeakKeyDictionary'

   * `WeakValueLookupTable'

   * `WeakIdentitySet'

   * `WeakKeyIdentityDictionary'

   * `WeakValueIdentityDictionary'

Versions of GNU Smalltalk preceding 2.1 included a `WeakKeyLookupTable' class which has been replaced by `WeakKeyDictionary'; the usage is completely identical, but the implementation was changed to use a more efficient approach based on ephemeron objects.

3 Packages
**********

GNU Smalltalk includes a packaging system which allows one to file in components (often called "goodies" in Smalltalk's very folkloristic terminology) without caring whether they need other goodies to be loaded first.

The packaging system is implemented by a Smalltalk class, `PackageLoader', which looks for information about packages in the XML file named (guess what) `packages.xml', in one of three places:

   * the kernel directory's parent directory; this is where an installed `packages.xml' resides, in a system-wide data directory such as `/usr/local/share/smalltalk';

   * in the file `.st/packages.xml', hosting per-user packages;

   * finally, there can be a `packages.xml' in the same directory as the current image.

There are two ways to load something using the packaging system.
The first way is to use the PackageLoader's `fileInPackage:' and `fileInPackages:' methods. For example:

     PackageLoader fileInPackages: #('Blox' 'Browser').
     PackageLoader fileInPackage: 'Compiler'.

The second way is to use the `gst-load' script which is installed together with the virtual machine. For example, you can do:

     gst-load Browser Blox Compiler

and GNU Smalltalk will automatically file in:

   * BloxTK, needed by Blox

   * Blox, loaded first because Browser needs it

   * Parser, not specified, but needed by Browser and Compiler

   * Browser

   * Compiler (Blox is skipped because it has already been loaded)

Then it will save the Smalltalk image, and finally exit.

`gst-load' supports several options:

`-I'
`--image-file'
     Load the packages inside the given image.

`-q'
`--quiet'
     Hide the script's output.

`-v'
`--verbose'
     Show which files are loaded, one by one.

`-f'
`--force'
     If a package given on the command-line is already present, reload it. This does not apply to automatically selected prerequisites.

`-t'
`--test'
     Run the package testsuite before installing, and exit with a failure if the tests fail. Currently, the testsuites are placed in the image together with the package, but this may change in future versions.

`-n'
`--dry-run'
     Do not save the image after loading.

`--start[=ARG]'
     Start the services identified by the package. If an argument is given, only one package can be specified on the command-line. If at least one package specifies a startup script, `gst-load' won't exit.

To provide support for this system, you have to distribute with your GNU Smalltalk goodies a small file (usually called `package.xml') which looks like this:

     <package>
       <name>BloxGTK</name>
       <namespace>BLOX</namespace>
       <directory>blox-gtk</directory>
       <prereq>GTK</prereq>
       <provides>BLOX</provides>

       <filein>BloxBasic.st</filein>
       <filein>BloxWidgets.st</filein>
       <filein>BloxText.st</filein>
       <filein>BloxExtend.st</filein>
       <filein>Blox.st</filein>

       <file>Blox.st</file>
       <file>BloxBasic.st</file>
       <file>BloxWidgets.st</file>
       <file>BloxText.st</file>
       <file>BloxExtend.st</file>
     </package>

Other tags exist:

`module'
     Loads a dynamic shared object and calls the `gst_initModule' function in it.
     Modules can register functions so that Smalltalk code can call them, and can interact with or manipulate Smalltalk objects. The `TCP' package uses a module to provide a bridge to the socket functions.

`library'
     Loads a dynamic shared object and registers the functions in it so that they can all be called from Smalltalk code. The `GTK' package registers the GTK+ library in this way, so that the bindings can use them.

`callout'
     Instructs the loader to load the package only if the C function whose name is within the tag is available to be called from Smalltalk code.

`sunit'
     Specifies a testing script that `gst-sunit' (*note SUnit::) will run in order to test the package. If this is specified, the package should list `SUnit' among the prerequisites.

`start'
     Specifies a Smalltalk script that `gst-load' and `gst-remote' will execute in order to start the execution of the service implemented in the package. Before executing the script, `%1' is replaced with either `nil' or a String literal.

`stop'
     Specifies a Smalltalk script that `gst-remote' will execute in order to shut down the service implemented in the package. Before executing the script, `%1' is replaced with either `nil' or a String literal.

`test'
     Specifies a subpackage that is only loaded by `gst-sunit' in order to test the package. The subpackage may include arbitrary tags (including `file', `filein' and `sunit') but not `name'. The `SUnit' package is implicitly made a prerequisite of the testing subpackage, and the default value of `directory' and `namespace' is the one given for the outer package.

To install your package, you only have to do

     gst-package path/to/package.xml

`gst-package' is a Smalltalk script which will create a `.star' archive in the current image directory, with the files specified in the `file' tags. By default the package is placed in the system-wide package directory; you can use the option `--target-directory' to create the `.star' file elsewhere.
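As a minimal sketch of the step that follows installation: once `gst-package' has installed the archive, the package should be loadable by name like any bundled one, using the `fileInPackage:' method shown earlier. The package name `MyGoodie' is hypothetical; use whatever your own `name' tag contains.

```smalltalk
"Hypothetical package name: replace it with the content of the
 name tag in your own package.xml."
PackageLoader fileInPackage: 'MyGoodie'.
```

Prerequisites listed in the package description are resolved by the loader, so they do not need to be filed in by hand.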
Alternatively, `gst-package' can be used to create a skeleton GNU style source tree. This includes a `configure.ac' that will find the installation path of GNU Smalltalk, and a `Makefile.am' to support all the standard Makefile targets (including `make install' and `make dist'). To do so, go to the directory that is to become the top of the source tree and type:

     gst-package --prepare path1/package.xml path2/package.xml

In this case the generated configure script and Makefile will use more features of `gst-package', which are yet to be documented. The GNU Smalltalk makefile similarly uses `gst-package' to install packages and to prepare the distribution tarballs.

The rest of this chapter discusses some of the packages provided with GNU Smalltalk.

3.1 Blox
========

Blox is a GUI building block tool kit. It is an abstraction on top of a platform's native GUI toolkit that is common across all platforms. Writing to the Blox interface means your GUI based application will be portable to any platform where Blox is supported.

The Blox classes, which reside in the `BLOX' namespace and are fully documented in *Note Graphical users interfaces with BLOX: (gst-libs)BLOX, act as wrappers around other toolkits, which constitute the required portability layer; currently the only one supported is Tcl/Tk but alternative versions of Blox, for example based on Gtk+ and GNOME, have been considered and might even replace Tcl/Tk in the future(1). Instead of having to rewrite widgets and support for each platform, Blox simply asks the other toolkit to do so (currently, it hands valid Tcl code to a standard Tcl 8.0 environment); the abstraction from the operating system being used is then extracted out of GNU Smalltalk.
Together with the toolkit, there is a browsing system in the `browser' directory that will allow the programmer to view the source code for existing classes, to modify existing classes and methods, to get detailed information about the classes and methods, and to evaluate code within the browser. In addition, some simple debugging tools are provided. An Inspector window allows the programmer to graphically inspect and modify the representation of an object and a walkback inspector was designed which will display a backtrace when the program encounters an error. The Transcript global object is redirected to print to the transcript window instead of printing to stdout, and the transcript window as well as the workspaces, unlike the console read-eval-print loop, support variables that live across multiple evaluations: a := 2 "Do-it" a + 2 "Print-it: 4 will be shown" This browser evolved from an Xt-based version developed around 1993 written by Brad Diller (bdiller@docent.com). Because of legal concerns about possible copyright infringement because his initial implementation used parts of ParcPlace's Model-View-Controller (MVC) message interface, he and Richard Stallman devised a new window update scheme which is more flexible and powerful than MVC's dependency mechanism, and allowed him to purge all the MVC elements from the implementation. The code was then further improved to employ a better class design (for example, Brad used Dictionaries for classes still to be fleshed out), to be aesthetically more appealing (taking advantage of the new Blox text widget, the code browsers were enhanced with syntax highlighting), and to be more complete (adding multiple "views" to the inspector, namespace support and a complete debugger). To start the browser you can simply type: gst-blox This will load any requested packages, then, if all goes well, a worksheet window with a menu named "Smalltalk" will appear in the top-left corner of the screen. 
---------- Footnotes ---------- (1) The Gtk+ bindings for GNU Smalltalk are in an embryonic state; you can find them in the `GTK' package if you have Gtk+ 2.0 or later installed 3.2 The Smalltalk-in-Smalltalk library ====================================== The Smalltalk-in-Smalltalk library is a set of classes for looking at Smalltalk code, constructing models of Smalltalk classes that can later be created for real, analyzing and performing changes to the image, finding smelly code and automatically doing repetitive changes. This package incredibly enhances the reflective capabilities of Smalltalk. A fundamental part of the system is the recursive-descent parser which creates parse nodes in the form of instances of subclasses of `RBProgramNode'. The parser's extreme flexibility can be exploited in three ways, all of which are demonstrated by source code available in the distribution: * First, actions are not hard-coded in the parser itself: the parser creates a parse tree, then hands it to methods in `RBParser' that can be overridden in different `RBParser' subclasses. This is done by the compiler itself, in which a subclass of `RBParser' (class `STFileInParser') hands the parse trees to the `STCompiler' class. * Second, an implementation of the "visitor" pattern is provided to help in dealing with parse trees created along the way; this approach is demonstrated by the Smalltalk code pretty-printer in class `RBFormatter', by the syntax highlighting engine included with the browser, and by the compiler. * The parser is able to perform complex tree searches and rewrites, through the ParseTreeSearcher and ParseTreeRewriter classes. In addition, two applications were created on top of this library which are specific to GNU Smalltalk. The first is a compiler for Smalltalk methods written in Smalltalk itself, whose source code provides good insights into the GNU Smalltalk virtual machine. 
The second is the automatic documentation extractor, contained in two files, `packages/stinst/compiler/STLoader.st' and `packages/stinst/compiler/STLoaderObjs.st'. To be able to create Texinfo files even if the library cannot be loaded (for example, `BLOX' requires a running X server), Smalltalk source code is interpreted and objects for the classes and methods being read in are created; then, polymorphism allows one to treat these exactly like usual classes which can be fed to GNU Smalltalk's `ClassPublisher' (found in `packages/stinst/doc/Publish.st').

3.3 Database connectivity
=========================

GNU Smalltalk includes support for connecting to databases. Currently this support is limited to retrieving result sets from SQL selection queries and executing SQL data manipulation queries; in the future however a full object model will be available that hides the usage of SQL.

Classes that are independent of the database management system that is in use reside in package `DBI', while the drivers proper reside in separate packages which have `DBI' as a prerequisite; currently, drivers are supplied for _MySQL_ and _PostgreSQL_, in packages `DBD-MySQL' and `DBD-PostgreSQL' respectively.

Using the library is fairly simple. To execute a query you need to create a connection to the database, create a statement on the connection, and execute your query. For example, let's say I want to connect to the `test' database on the localhost. My user name is `doe' and my password is `mypass'.

     | connection statement result |
     connection := DBI.Connection
                       connect: 'dbi:MySQL:dbname=test:host=localhost'
                       user: 'doe'
                       password: 'mypass'.

You can see that the DBMS-specific classes live in a sub-namespace of `DBI', while DBMS-independent classes live in `DBI'.

Here is how I execute a query.

     statement := connection execute: 'insert into aTable (aField) values (123)'.

The result that is returned is a `ResultSet'. For data manipulation queries (such as the insert above) the object returns the number of rows affected.
For read queries (such as selection queries) the result set supports the standard stream protocol (`next' and `atEnd', to read rows off the result stream) and can also supply a collection of column information. These are instances of `ColumnInfo' and describe the type, size, and other characteristics of the returned column.

A common usage of a ResultSet would be:

     | resultSet values |
     "resultSet is assumed to hold the answer of an earlier selection query"
     values := OrderedCollection new.
     [resultSet atEnd]
         whileFalse: [values add: (resultSet next at: 'columnName')].

3.4 Internationalization and localization support
=================================================

Different countries and cultures have varying conventions for how to communicate. These conventions range from very simple ones, such as the format for representing dates and times, to very complex ones, such as the language spoken.

Provided the programs are written to obey the choice of conventions, they will follow the conventions preferred by the user. GNU Smalltalk provides two packages to ease you in doing so. The `I18N' package covers both "internationalization" and "multilingualization"; the lighter-weight `Iconv' package covers only the latter, as it is a prerequisite for correct internationalization.

"Multilingualizing" software means programming it to be able to support languages from every part of the world. In particular, it includes understanding multi-byte character sets (such as UTF-8) and Unicode characters whose "code point" (the equivalent of the ASCII value) is above 127. To this end, GNU Smalltalk provides the `UnicodeString' class that stores its data as 32-bit Unicode values. In addition, `Character' will provide support for all of the over one million code points available in Unicode.

Loading the `I18N' package improves this support through the `EncodedStream' class(1), which interprets and transcodes non-ASCII Unicode characters. This support is mostly transparent, because the base classes `Character', `UnicodeCharacter' and `UnicodeString' are enhanced to use it.
Sending `asString' or `printString' to an instance of `Character' and `UnicodeString' will convert Unicode characters so that they are printed correctly in the current locale. For example, `$<279> printNl' will print a small Latin letter `e' with a dot above, when the `I18N' package is loaded. Dually, you can convert `String' or `ByteArray' objects to Unicode with a single method call. If the current locale's encoding is UTF-8, `#[196 151] asUnicodeString' will return a Unicode string with the same character as above, the small Latin letter `e' with a dot above. The implementation of multilingualization support is not yet complete. For example, methods such as `asLowercase', `asUppercase', `isLetter' do not yet recognize Unicode characters. You need to exercise some care, or your program will be buggy when Unicode characters are used. In particular, Characters must *not* be compared with `=='(2) and should be printed on a Stream with `display:' rather than `nextPut:'. Also, Characters need to be created with the class method `codePoint:' if you are referring to their Unicode value; `codePoint:' is also the only method to create characters that is accepted by the ANSI Standard for Smalltalk. The method `value:', instead, should be used if you are referring to a byte in a particular encoding. This subtle difference means that, for example, the last two of the following examples will fail: "Correct. Use #value: with Strings, #codePoint: with UnicodeString." String with: (Character value: 65) String with: (Character value: 128) UnicodeString with: (Character codePoint: 65) UnicodeString with: (Character codePoint: 128) "Correct. Only works for characters in the 0-127 range, which may be considered as defensive programming." String with: (Character codePoint: 65) "Dubious, and only works for characters in the 0-127 range. With UnicodeString, probably you always want #codePoint:." 
     UnicodeString with: (Character value: 65)

     "Fails, we try to use a high character in a String"
     String with: (Character codePoint: 128)

     "Fails, we try to use an encoding in a Unicode string"
     UnicodeString with: (Character value: 128)

"Internationalizing" software, instead, means programming it to be able to adapt to the user's favorite conventions. These conventions can get pretty complex; for example, the user might specify the locale `espana-castellano' for most purposes, but specify the locale `usa-english' for currency formatting: this might make sense if the user is a Spanish-speaking American, working in Spanish, but representing monetary amounts in US dollars. You can see that this system is simple but, at the same time, very complete. This manual, however, is not the right place for a thorough discussion of how a user would set up their system for these conventions; for more information, refer to your operating system's manual or to the GNU C library's manual.

GNU Smalltalk inherits from ISO C the concept of a "locale", that is, a collection of conventions, one convention for each purpose, and maps each of these purposes to a Smalltalk class defined by the `I18N' package. These classes form a small hierarchy with class `Locale' as its root:

   * `LcNumeric' formats numbers; `LcMonetary' and `LcMonetaryISO' format currency amounts.

   * `LcTime' formats dates and times.

   * `LcMessages' translates your program's output. Of course, the package can't automatically translate your program's output messages into other languages; the only way you can support output in the user's favorite language is to translate these messages by hand. The package does, though, provide methods to easily handle translations into multiple languages.

Basic usage of the `I18N' package involves a single selector, the question mark (`?'), which is a rarely used yet valid character for a Smalltalk binary message. The meaning of the question mark selector is "Hey, how do you say ... under your convention?". You can send `?' to either a specific instance of a subclass of `Locale', or to the class itself; in the latter case, rules for the default locale (which is specified via environment variables) apply. You might say, for example, `LcTime ? Date today' or, for example, `germanMonetaryLocale ? account balance'. This syntax can be confusing at first, but turns out to be convenient because of its consistency and overall simplicity.

Here is how `?' works for different classes:

 -- Method on LcTime: ? aString
     Format a date, a time or a timestamp (`DateTime' object).

 -- Method on LcNumeric: ? aString
     Format a number.

 -- Method on LcMonetary: ? aString
     Format a monetary value together with its currency symbol.

 -- Method on LcMonetaryISO: ? aString
     Format a monetary value together with its ISO currency symbol.

 -- Method on LcMessages: ? aString
     Answer an `LcMessagesDomain' that retrieves translations from the specified file.

 -- Method on LcMessagesDomain: ? aString
     Retrieve the translation of the given string.(3)

These two packages provide much more functionality, including more advanced formatting options, support for Unicode, and conversion to and from several character sets. For more information, refer to *Note Multilingual and international support with Iconv and I18N: (gst-libs)I18N.

As an aside, the representation of locales that the package uses is exactly the same as the C library's, which has many advantages: the burden of maintaining locale data is removed from GNU Smalltalk's maintainers; the need of having two copies of the same data is removed from GNU Smalltalk's users; and finally, uniformity of the conventions assumed by different internationalized programs is guaranteed to the end user.

In addition, the representation of translated strings is the standard MO file format adopted by the GNU `gettext' library.

---------- Footnotes ----------

(1) All the classes mentioned in this section reside in the `I18N' namespace.
(2) Character equality with `=' will be as fast as with `=='.

(3) The `?' method does not apply to the LcMessagesDomain class itself, but only to its instances. This is because LcMessagesDomain is not a subclass of Locale.

3.5 The Seaside web framework
=============================

Seaside is a framework to build highly interactive web applications quickly, reusably and maintainably. Features of Seaside include callback-based request handling, hierarchical (component-based) page design, and modal session management to easily implement complex workflows.

A simple Seaside component looks like this:

     Seaside.WAComponent subclass: MyCounter [
         | count |
         MyCounter class >> canBeRoot [ ^true ]

         initialize [
             super initialize.
             count := 0.
         ]
         states [ ^{ self } ]
         renderContentOn: html [
             html heading: count.
             html anchor callback: [ count := count + 1 ]; with: '++'.
             html space.
             html anchor callback: [ count := count - 1 ]; with: '--'.
         ]
     ]

     MyCounter registerAsApplication: 'mycounter'

Most of the time, you will run Seaside in a background virtual machine. First of all, you should load the Seaside packages into an image like this:

     $ gst
     st> PackageLoader fileInPackage: 'Seaside'
     st> PackageLoader fileInPackage: 'Seaside-Development'
     st> PackageLoader fileInPackage: 'Seaside-Examples'
     st> ObjectMemory snapshot: 'seaside.im'

Then, you can start Seaside with

     $ gst-remote -I seaside.im --daemon --start=Seaside

which will start serving pages at `http://localhost:8080/seaside'. You can stop serving Seaside pages, and bring down the server, respectively with these commands:

     $ gst-remote --stop=Seaside
     $ gst-remote --kill

3.6 The Swazoo web server
=========================

Swazoo (Smalltalk Web Application Zoo) is a free Smalltalk HTTP server supporting both static web serving and a fully-featured web request resolution framework.
The server can be started using

     $ gst-load --start[=ARG] Swazoo

or loaded into a background GNU Smalltalk virtual machine with

     $ gst-remote --start=Swazoo[:ARG]

Usually, the first time you start Swazoo, ARG is `swazoodemo' (which starts a simple "Hello, World!" servlet) or a path to a configuration file.

After this initial step, ARG can take the following meanings:

   * if omitted altogether, all the sites registered on the server are started;

   * if a number, all the sites registered on the server on that port are started;

   * if a configuration file name, the server configuration is _replaced_ with the one loaded from that file;

   * if any other string, the site named ARG is started.

In addition, a background server can be stopped using

     $ gst-remote --stop=Swazoo[:ARG]

where ARG can have the same meanings, except for being a configuration file.

In addition, package `WebServer' implements an older web server engine which is now superseded by Swazoo. It is based on the GPL'ed WikiWorks project. Apart from porting to GNU Smalltalk, a number of changes were made to the code, including refactoring of classes, better aesthetics, authentication support, virtual hosting, and HTTP 1.1 compliance.

3.7 The SUnit testing package
=============================

`SUnit' is a framework to write and perform test cases in Smalltalk, originally written by the father of Extreme Programming(1), Kent Beck. `SUnit' allows one to write the tests and check results in Smalltalk; while this approach has the disadvantage that testers need to be able to write simple Smalltalk programs, the resulting tests are very stable.

What follows is a description of the philosophy of `SUnit' and a description of its usage, excerpted from Kent Beck's paper in which he describes `SUnit'.

3.7.1 Where should you start?
-----------------------------

Testing is one of those impossible tasks. You'd like to be absolutely complete, so you can be sure the software will work.
On the other hand, the number of possible states of your program is so large that you can't possibly test all combinations. If you start with a vague idea of what you'll be testing, you'll never get started. Far better to _start with a single configuration whose behavior is predictable_. As you get more experience with your software, you will be able to add to the list of configurations. Such a configuration is called a "fixture". Two example fixtures for testing Floats can be `1.0' and `2.0'; two fixtures for testing Arrays can be `#()' and `#(1 2 3)'. By choosing a fixture you are saying what you will and won't test for. A complete set of tests for a community of objects will have many fixtures, each of which will be tested many ways. To design a test fixture you have to * Subclass TestCase * Add an instance variable for each known object in the fixture * Override setUp to initialize the variables 3.7.2 How do you represent a single unit of testing? ---------------------------------------------------- You can predict the results of sending a message to a fixture. You need to represent such a predictable situation somehow. The simplest way to represent this is interactively. You open an Inspector on your fixture and you start sending it messages. There are two drawbacks to this method. First, you keep sending messages to the same fixture. If a test happens to mess that object up, all subsequent tests will fail, even though the code may be correct. More importantly, though, you can't easily communicate interactive tests to others. If you give someone else your objects, the only way they have of testing them is to have you come and inspect them. By representing each predictable situation as an object, each with its own fixture, no two tests will ever interfere. Also, you can easily give tests to others to run. _Represent a predictable reaction of a fixture as a method._ Add a method to TestCase subclass, and stimulate the fixture in the method. 
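The two steps described so far (designing a fixture, then representing one predictable reaction as a method) can be sketched as follows. This is only an illustration: the class name `SetTestCase' and the variables `empty' and `full' are chosen to match the Set example used later in this chapter, and the sketch assumes the `SUnit' package is installed and written in the bracketed class syntax.

```smalltalk
"Illustrative sketch; assumes the SUnit package is available."
PackageLoader fileInPackage: 'SUnit'.

TestCase subclass: SetTestCase [
    "One instance variable for each known object in the fixture."
    | empty full |

    setUp [
        "Runs before every test method, so each test gets a fresh fixture."
        empty := Set new.
        full := Set with: 5 with: #abc
    ]

    testAdd [
        "Stimulate the fixture; checking the outcome is a separate step."
        empty add: 5
    ]
]
```

Because `setUp' rebuilds the fixture for each test method, a test that damages `empty' or `full' cannot affect the tests that run after it.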
3.7.3 How do you test for expected results? ------------------------------------------- If you're testing interactively, you check for expected results directly, by printing and inspecting your objects. Since tests are in their own objects, you need a way to programmatically look for problems. One way to accomplish this is to use the standard error handling mechanism (`#error:') with testing logic to signal errors: 2 + 3 = 5 ifFalse: [self error: 'Wrong answer'] When you're testing, you'd like to distinguish between errors you are checking for, like getting six as the sum of two and three, and errors you didn't anticipate, like subscripts being out of bounds or messages not being understood. There's not a lot you can do about unanticipated errors (if you did something about them, they wouldn't be unanticipated any more, would they?) When a catastrophic error occurs, the framework stops running the test case, records the error, and runs the next test case. Since each test case has its own fixture, the error in the previous case will not affect the next. The testing framework makes checking for expected values simple by providing a method, `#should:', that takes a Block as an argument. If the Block evaluates to true, everything is fine. Otherwise, the test case stops running, the failure is recorded, and the next test case runs. So, you have to _turn checks into a Block evaluating to a Boolean, and send the Block as the parameter to `#should:'_. In the example, after stimulating the fixture by adding an object to an empty Set, we want to check and make sure it's in there: SetTestCase>>#testAdd empty add: 5. self should: [empty includes: 5] There is a variant on `TestCase>>#should:'. `TestCase>>#shouldnt:' causes the test case to fail if the Block argument evaluates to true. It is there so you don't have to use `(...) not'. Once you have a test case this far, you can run it. Create an instance of your TestCase subclass, giving it the selector of the testing method. 
Send `run' to the resulting object: (SetTestCase selector: #testAdd) run If it runs to completion, the test worked. If you get a walkback, something went wrong. 3.7.4 How do you collect and run many different test cases? ----------------------------------------------------------- As soon as you have two test cases running, you'll want to run them both one after the other without having to execute two do it's. You could just string together a bunch of expressions to create and run test cases. However, when you then wanted to run "this bunch of cases and that bunch of cases" you'd be stuck. The testing framework provides an object to represent "a bunch of tests", `TestSuite'. A `TestSuite' runs a collection of test cases and reports their results all at once. Taking advantage of polymorphism, `TestSuites' can also contain other `TestSuites', so you can put Joe's tests and Tammy's tests together by creating a higher level suite. _Combine test cases into a test suite._ (TestSuite named: 'Money') add: (MoneyTestCase selector: #testAdd); add: (MoneyTestCase selector: #testSubtract); run The result of sending `#run' to a `TestSuite' is a `TestResult' object. It records all the test cases that caused failures or errors, and the time at which the suite was run. All of these objects are suitable for being stored in the image and retrieved. You can easily store a suite, then bring it in and run it, comparing results with previous runs. 3.7.5 Running testsuites from the command line ---------------------------------------------- GNU Smalltalk includes a Smalltalk script to simplify running SUnit test suites. It is called `gst-sunit'. The command-line to `gst-sunit' specifies the packages, files and classes to test: `-I' `--image-file' Run tests inside the given image. `-q' `--quiet' Hide the program's output. The results are still communicated with the program's exit code. 
`-v' `--verbose' Be more verbose, in particular this will cause `gst-sunit' to write which test is currently being executed. `-f FILE' `--file=FILE' Load FILE before running the required test cases. `-p PACKAGE' `--package=PACKAGE' Load PACKAGE and its dependencies, and add PACKAGE's tests to the set of test cases to run. `CLASS' `CLASS*' Add CLASS to the set of test cases to run. An asterisk after the class name adds all the classes in CLASS's hierarchy. In particular, each selector whose name starts with `test' constitutes a separate test case. `VAR=VALUE' Associate variable VAR with a value. Variables allow customization of the testing environment. For example, the username with which to access a database can be specified with variables. From within a test, variables are accessible with code like this: TestSuitesScripter variableAt: 'mysqluser' ifAbsent: [ 'root' ] Note that a `#variableAt:' variant does _not_ exist, because the testsuite should pick default values in case the variables are not specified by the user. ---------- Footnotes ---------- (1) Extreme Programming is a software engineering technique that focuses on team work (to the point that a programmer looks in real-time at what another one is typing), frequent testing of the program, and incremental design. 3.8 TCP, WebServer, NetClients ============================== GNU Smalltalk includes an almost complete abstraction of the TCP, UDP and IP protocols. Although based on the standard BSD sockets, this library provides facilities such as buffering and preemptive I/O which a C programmer usually has to implement manually. The distribution includes a few tests (mostly loopback tests that demonstrate both client and server connection), which are class methods in `Socket'. This code should guide you in the process of creating and using both server and client sockets; after creation, sockets behave practically the same as standard Smalltalk streams, so you should not have particular problems. 
For more information, refer to *Note Network programming with TCP: (gst-libs)TCP. The library is also used by many other packages, including Swazoo and the MySQL driver.

There is also code implementing the most popular Internet protocols: FTP, HTTP, NNTP, SMTP, POP3 and IMAP. These classes, loaded by the `NetClients' package, are derived from multiple public domain and free software packages available for other Smalltalk dialects and ported to GNU Smalltalk. Future versions of GNU Smalltalk will include documentation for these as well.

3.9 An XML parser and object model for GNU Smalltalk
====================================================

The XML parser library for Smalltalk, loaded as package `XML', includes a validating XML parser and Document Object Model. This library is rapidly becoming a standard in the Smalltalk world, and an XSLT interpreter based on it is bundled with GNU Smalltalk as well (see packages `XPath' and `XSL').

Parts of the basic XML package can be loaded independently using packages `XML-DOM', `XML-SAXParser', `XML-XMLParser', `XML-SAXDriver', `XML-XMLNodeBuilder'.

3.10 Other packages
===================

Various other "minor" packages are provided, typically as examples of writing modules for GNU Smalltalk (*note Linking your libraries to the virtual machine: External modules.). These include:

Complex
     which adds transparent operations with complex numbers

GDBM
     which is an interface to the GNU database manager

Digest
     which provides two easy to use classes to quickly compute cryptographically strong hash values using the MD5 and SHA1 algorithms.

NCurses
     which provides bindings to ncurses

Continuations
     which provides more examples and tests for continuations (an advanced feature to support complex control flow).

DebugTools
     which provides a way to attach to another Smalltalk process and execute it a bytecode or a method at a time.
4 Smalltalk interface for GNU Emacs
***********************************

GNU Smalltalk comes with its own Emacs mode for hacking Smalltalk code. It also provides tools for interacting with a running Smalltalk system in an Emacs subwindow.

4.1 Autoloading GNU Smalltalk mode
==================================

To cause Emacs to automatically go into Smalltalk mode when you edit a Smalltalk file (one with the extension `.st'), you need to add the following lines to your `.emacs' file:

     (setq auto-mode-alist
           (append '(("\\.st\\'" . smalltalk-mode))
                   auto-mode-alist))

     (autoload 'smalltalk-mode "PATH/smalltalk-mode.elc" "" t)

where PATH is the path to where the two Emacs Lisp files included with GNU Smalltalk are installed (by default, something like `/usr/local/share/emacs/site-lisp').

4.2 Smalltalk editing mode
==========================

The GNU Smalltalk editing mode is there to assist you in editing your Smalltalk code. It tries to be smart about indentation and provides a few cooked templates to save you keystrokes.

Since Smalltalk syntax is highly context sensitive, the Smalltalk editing mode will occasionally get confused when you are editing expressions instead of method definitions. In particular, using local variables, thus:

     | foo |
     foo := 3.
     ^foo squared !

will confuse the Smalltalk editing mode, as this might also be a definition of the binary operator `|', with second argument called `foo'. If you find yourself confused when editing this type of expression, put a dummy method name before the start of the expression, and take it out when you're done editing, thus:

     x
     | foo |
     foo := 3.
     ^foo squared !

4.3 Smalltalk interactor mode
=============================

An interesting feature of Emacs Smalltalk is the Smalltalk interactor, which basically allows you to run GNU Emacs with Smalltalk files in one window, and Smalltalk in the other.
You can, with a single command, edit and change method definitions in the live Smalltalk system, evaluate expressions, make image snapshots of the system so you can pick up where you left off, file in an entire Smalltalk file, etc. It makes a tremendous difference in the productivity and enjoyment that you'll have when using GNU Smalltalk.

To start up the Smalltalk interactor, you must be running GNU Emacs and in a buffer that's in Smalltalk mode. Then type `C-c m'. A second window will appear with GNU Smalltalk running in it. This window is in most respects like a Shell mode window. You can type Smalltalk expressions to it directly and re-execute previous things in the window by moving the cursor back to the line that contains the expression that you wish to re-execute and typing return.

Notice the status in the mode line (e.g. `starting-up', `idle', etc). This status will change when you issue various commands from Smalltalk mode.

When you first fire up the Smalltalk interactor, it puts you in the window in which Smalltalk is running. You'll want to switch back to the window with your file in it to explore the rest of the interactor mode, so do it now.

To execute a range of code, mark the region around it and type `C-c e'. The expression in the region is sent to Smalltalk and evaluated. The status will change to indicate that the expression is executing. This will work for any region that you create. If the region does not end with an exclamation point (which is syntactically required by Smalltalk), one will be added for you.

There is also a shortcut, `C-c d' (also invocable as `M-x smalltalk-doit'), which uses a simple heuristic to figure out the start and end of the expression: it searches forward for a line that begins with an exclamation point, and backward for a line that does not begin with space, tab, or the comment character, and sends all the text in between to Smalltalk.
If you provide a prefix argument (by typing `C-u C-c d' for instance), it will bypass the heuristic and use the region instead (just like `C-c e' does). `C-c c' will compile a method; it uses a similar heuristic to determine the bounds of the method definition. Typically, you'll change a method definition, type `C-c c' and move on to whatever's next. If you want to compile a whole bunch of method definitions, you'll have to mark the entire set of method definitions (from the `methodsFor:' line to the `! !') as the region and use `C-c e'. After you've compiled and executed some expressions, you may want to take a snapshot of your work so that you don't have to re-do things next time you fire up Smalltalk. To do this, you use the `C-c s' command, which invokes `ObjectMemory snapshot'. If you invoke this command with a prefix argument, you can specify a different name for the image file, and you can have that image file loaded instead of the default one by using the `-I' flag on the command line when invoking Smalltalk. You can also evaluate an expression and have the result of the evaluation printed by using the `C-c p' command. Mark the region and use the command. To file in an entire file (perhaps the one that you currently have in the buffer that you are working on), type `C-c f'. You can type the name of a file to load at the prompt, or just type return and the file associated with the current buffer will be loaded into Smalltalk. When you're ready to quit using GNU Smalltalk, you can quit cleanly by using the `C-c q' command. If you want to fire up Smalltalk again, or if (heaven forbid) Smalltalk dies on you, you can use the `C-c m' command, and Smalltalk will be reincarnated. Even if it's running, but the Smalltalk window is not visible, `C-c m' will cause it to be displayed right away. You might notice that as you use this mode, the Smalltalk window will scroll to keep the bottom of the buffer in focus, even when the Smalltalk window is not the current window. 
This was a design choice that I made to see how it would work. On the whole, I guess I'm pretty happy with it, but I am interested in hearing your opinions on the subject.

5 Interoperability between C and GNU Smalltalk
**********************************************

5.1 Linking your libraries to the virtual machine
=================================================

A nice thing you can do with GNU Smalltalk is enhancing it with your own goodies. If they're written in Smalltalk only, no problem: getting them to work as packages (*note Packages::), and to fit in with the GNU Smalltalk packaging system, is likely to be a five-minute task.

If your goodie is mostly written in C and you don't need particular glue to link it to Smalltalk (for example, there are no callbacks from C code to Smalltalk code), you can use the `dynamic library linking' system. When using this system, you have to link GNU Smalltalk with the library at run-time using DLD; the method to be used here is `DLD class>>#addLibrary:'.

But if you want to provide a more intimate link between C and Smalltalk, as is the case with Blox, you should use the `dynamic module linking' system. This section explains what to do, taking the Blox library as a guide.

Modules are searched for in the `gnu-smalltalk' subdirectory of the system library path, or in the directory that the `SMALLTALK_MODULES' environment variable points to. A module is distinguished from a standard shared library because it has a function which Smalltalk calls to initialize the module; the name of this function must be `gst_initModule'.
Here is the initialization function used by Blox:

     void
     gst_initModule(proxy)
          VMProxy *proxy;
     {
       vmProxy = proxy;
       vmProxy->defineCFunc("Tcl_Eval", Tcl_Eval);
       vmProxy->defineCFunc("Tcl_GetStringResult", Tcl_GetStringResult);
       vmProxy->defineCFunc("tclInit", tclInit);
       vmProxy->defineCFunc("bloxIdle", bloxIdle);
     }

Note that the `defineCFunc' function is called through a function pointer in `gst_initModule', and that Blox saves the value of its parameter to be used elsewhere in its code. This is not strictly necessary on many platforms, namely those where the module is effectively _linked with the Smalltalk virtual machine_ at run-time; but since some platforms(1) cannot do this, for maximum portability you must always call the virtual machine through the proxy and never refer to any symbol which the virtual machine exports. For uniformity, even programs that link with `libgst.a' should not call these functions directly, but through a `VMProxy' exported by `libgst.a' and accessible through the `gst_interpreter_proxy' variable.

First of all, you have to build your package as a shared library; using GNU Automake and `libtool', this is as easy as changing your `Makefile.am' file so that it reads like this

     pkglib_LTLIBRARIES = libblox.la
     libblox_la_LDFLAGS = -module -no-undefined "... more flags ..."
     libblox_la_SOURCES = "... your source files ..."

instead of reading like this

     pkglib_LIBRARIES = libblox.a
     libblox_a_LDFLAGS = "... more flags ..."
     libblox_a_SOURCES = "... your source files ..."

As you see, you only have to change `.a' extensions to `.la', `LIBRARIES' targets to `LTLIBRARIES', and add appropriate options to `LDFLAGS'(2). You will also have to run `libtoolize' and follow its instructions, but this is really simpler than it looks.

Note that this example uses `pkglib' because BLOX is installed together with Smalltalk, but in general this is not necessary.
You can install the library wherever you want; `libtool' will even generate appropriate warnings to the installer if `ldconfig' (or an equivalent program) has to be re-run.

Finally, you will have to add the name of the module in the `packages.xml' file. In this case, the relevant entry in that file will be

     <package>
       <name>BloxTK</name>
       <namespace>BLOX</namespace>
       <filein>BloxBasic.st</filein>
       <filein>BloxWidgets.st</filein>
       <filein>BloxText.st</filein>
       <filein>BloxCanvas.st</filein>
       <filein>BloxExtend.st</filein>
       <filein>Blox.st</filein>
       <module>blox-tk</module>

       <directory>blox-tk</directory>

       <file>Blox.st</file>
       <file>BloxBasic.st</file>
       <file>BloxWidgets.st</file>
       <file>BloxText.st</file>
       <file>BloxCanvas.st</file>
       <file>BloxExtend.st</file>
       <file>colors.txt</file>
     </package>

---------- Footnotes ----------

(1) The most notable are AIX and Windows.

(2) Specifying `-no-undefined' is not necessary, but it does check that the portability conditions explained above (no reference to symbols in the virtual machine) are satisfied.

5.2 Using the C callout mechanism
=================================

To use the C callout mechanism, you first need to inform Smalltalk about the C functions that you wish to call. You currently need to do this in two places: 1) you need to establish the mapping between your C function's address and the name that you wish to refer to it by, and 2) you need to describe that function to the Smalltalk interpreter, along with how the argument objects should be mapped to C data types.

As an example, let us use the pre-defined (to GNU Smalltalk) functions of `system' and `getenv'.

First, the mapping between these functions and string names for the functions needs to be established in your module. If you are writing an external Smalltalk module (which can look at Smalltalk objects and manipulate them), see *Note Linking your libraries to the virtual machine: External modules.; if you are using functions from a dynamically loaded library, see *Note Dynamic loading::.

Second, we need to define a method that will invoke these C functions and describe its arguments to the Smalltalk runtime system.
Such a method is defined with a primitive-like syntax, similar to the following example (taken from `kernel/CFuncs.st')

     system: aString
         <cCall: 'system' returning: #int args: #(string)>

     getenv: aString
         <cCall: 'getenv' returning: #string args: #(string)>

These methods were defined on class `SystemDictionary', so that we would invoke them thus:

     Smalltalk system: 'lpr README' !

However, there is no special significance to which class receives the method; it could have just as well been Float, but it might look kind of strange to see:

     1701.0 system: 'mail help-smalltalk@gnu.org' !

The various keyword arguments are described below.

`cCall: 'system''
     This says that we are defining the C function `system'. This name must be *exactly* the same as the string passed to `defineCFunc'.

     The name of the method does not have to match the name of the C function; we could have just as easily defined the selector to be `'rambo: fooFoo''; it's just good practice to define the method with a similar name and the argument names to reflect the data types that should be passed.

`returning: #int'
     This defines the C data type that will be returned. It is converted to the corresponding Smalltalk data type. The set of valid return types is:

    `char'
          Single C character value

    `string'
          A C char *, converted to a Smalltalk string

    `stringOut'
          A C char *, converted to a Smalltalk string and then freed.

    `symbol'
          A C char *, converted to a Smalltalk symbol

    `symbolOut'
          A C char *, converted to a Smalltalk symbol and then freed.
    `int'
          A C int value

    `uInt'
          A C unsigned int value

    `long'
          A C long value

    `uLong'
          A C unsigned long value

    `double'
          A C double, converted to an instance of FloatD

    `longDouble'
          A C long double, converted to an instance of FloatQ

    `void'
          No returned value (`self' returned from Smalltalk)

    `wchar'
          Single C wide character (`wchar_t') value

    `wstring'
          Wide C string (`wchar_t *'), converted to a UnicodeString

    `wstringOut'
          Wide C string (`wchar_t *'), converted to a UnicodeString and then freed

    `cObject'
          An anonymous C pointer; useful to pass back to some C function later

    `smalltalk'
          An anonymous (to C) Smalltalk object pointer; should have been passed to C at some point in the past or created by the program by calling other public GNU Smalltalk functions (*note Smalltalk types::).

    `CTYPE'
          You can pass an instance of CType or one of its subclasses (*note C data types::). In this case the object will be sent `#narrow' before being returned: an example of this feature is given in the experimental Gtk+ bindings.

`args: #(string)'
     This is an array of symbols that describes the types of the arguments in order. For example, to specify a call to open(2), the arguments might look something like:

          args: #(string int int)

     The following argument types are supported; see above for details.
    `unknown'
          Smalltalk will make the best conversion that it can guess for this object; see the mapping table below

    `boolean'
          passed as `char', which is promoted to `int'

    `char'
          passed as `char', which is promoted to `int'

    `wchar'
          passed as `wchar_t'

    `string'
          passed as `char *'

    `stringOut'
          passed as `char *', the contents are expected to be overwritten with a new C string, and the object that was passed becomes the new string on return

    `wstring'
          passed as `wchar_t *'

    `wstringOut'
          passed as `wchar_t *', the contents are expected to be overwritten with a new C wide string, and the object that was passed becomes the new string on return

    `symbol'
          passed as `char *'

    `byteArray'
          passed as `char *', even though it may contain NULs

    `int'
          passed as `int'

    `uInt'
          passed as `unsigned int'

    `long'
          passed as `long'

    `uLong'
          passed as `unsigned long'

    `double'
          passed as `double'

    `longDouble'
          passed as `long double'

    `cObject'
          C object value passed as `void *'

    `cObjectPtr'
          Pointer to C object value passed as `void **'. The `CObject' is modified on output to reflect the value stored into the passed object.

    `smalltalk'
          Pass the object pointer to C. The C routine should treat the value as a pointer to anonymous storage. This pointer can be returned to Smalltalk at some later point in time.

    `variadic'
    `variadicSmalltalk'
          An Array is expected; each of the elements of the array will be converted like an `unknown' parameter if `variadic' is used, or passed as a raw object pointer for `variadicSmalltalk'.

    `self'
    `selfSmalltalk'
          Pass the receiver, converting it to C like an `unknown' parameter if `self' is used or passing the raw object pointer for `selfSmalltalk'. Parameters passed this way don't map to the message's arguments; instead they map to the message's receiver.
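The phrase "passed as `char', which is promoted to `int'" presumably refers to the usual C default argument promotions. The following is a minimal sketch of that C rule only, not of anything inside GNU Smalltalk; the function name `sum_promoted' is made up for illustration.

```c
#include <stdarg.h>

/* Hypothetical variadic function: adds up `count' small integer values.
   It is not part of GNU Smalltalk; it only illustrates the C rule. */
int sum_promoted(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++) {
        /* A char argument has already been promoted to int by the
           caller, so it is read back as int, never as char. */
        total += va_arg(ap, int);
    }
    va_end(ap);
    return total;
}
```

Calling `sum_promoted(2, c, (char) 1)' with a `char' variable `c' yields `c + 1', because both arguments travel as `int'.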
Table of parameter conversions:

     Declared param type       Object type            C parameter type used
     ------------------------  ---------------------  ---------------------
     boolean                   Boolean (True, False)  int
     byteArray                 ByteArray              char *
     cObject                   CObject                void *
     cObjectPtr                CObject                void **
     char                      Boolean (True, False)  int
     char                      Character              int (C promotion rule)
     char                      Integer                int
     double                    Float                  double (C promotion)
     longDouble                Float                  long double
     int                       Boolean (True, False)  int
     int                       Integer                int
     uInt                      Boolean (True, False)  unsigned int
     uInt                      Integer                unsigned int
     long                      Boolean (True, False)  long
     long                      Integer                long
     uLong                     Boolean (True, False)  unsigned long
     uLong                     Integer                unsigned long
     smalltalk, selfSmalltalk  anything               OOP
     string                    String                 char *
     string                    Symbol                 char *
     stringOut                 String                 char *
     symbol                    Symbol                 char *
     unknown, self             Boolean (True, False)  int
     unknown, self             ByteArray              char *
     unknown, self             CObject                void *
     unknown, self             Character              int
     unknown, self             Float                  double
     unknown, self             Integer                long
     unknown, self             String                 char *
     unknown, self             Symbol                 char *
     unknown, self             anything else          OOP
     variadic                  Array                  each element is passed according to "unknown"
     variadicSmalltalk         Array                  each element is passed as an OOP
     wchar                     Character              wchar_t
     wstring                   UnicodeString          wchar_t *
     wstringOut                UnicodeString          wchar_t *

When your call-out returns `#void', depending on your application you might consider using "asynchronous call-outs". These are call-outs that do not suspend the process that initiated them, so the process might be scheduled again, executing the code that follows the call-out, during the execution of the call-out itself. This is particularly handy when writing event loops (the most common place where you call back into Smalltalk) because then _you can handle events that arrive during the handling of an outer event_ before the outer event's processing has ended. Depending on your application this might be correct or not, of course. In the future, asynchronous call-outs might be started into a separate thread.

An asynchronous call-out is defined using an alternate primitive-like syntax, `asyncCCall:args:'.
Note that the returned value parameter is missing because an asynchronous call-out always returns `nil'.

5.3 The C data type manipulation system
=======================================

`CType' is a class used to represent C data types themselves (no storage, just the type). There are subclasses called things like `CMUMBLECType'. The instances can answer their size and alignment. Their `valueType' is the underlying type of data. It's either an integer, which is interpreted by the interpreter as the scalar type, or the underlying element type, which is another `CType' subclass instance.

To make life easier, there are global variables which hold onto instances of `CScalarCType': they are called `CMUMBLEType' (like `CIntType', not like `CIntCType'), and can be used wherever a C datatype is used. If you had an array of strings, the elements would be CStringType's (a specific instance of CScalarCType).

`CObject' is the base class of the instances of C data. It has a subclass called `CScalar', which has subclasses called `CMUMBLE'. These subclasses can answer size and alignment information.

Instances of `CObject' hold a pointer to a C type variable. The variable can be allocated from Smalltalk by doing `TYPE new', where TYPE is a `CType' subclass instance, or it may have been returned through the C callout mechanism as a return value. Remember that `CObject' and its subclasses represent a pointer to a C object and as such provide the full range of operations supported by C pointers. For example, `+' `anInteger' returns a CObject which is higher in memory by `anInteger' times the size of each item. There is also `-', which acts like `+' if it is given an integer as its parameter. If a CObject is given, it returns the difference between the two pointers. `incr', `decr', `incrBy:', `decrBy:' adjust the string either forward or backward, by either 1 or `n' characters. Only the pointer to the string is changed; the actual characters in the string remain untouched.
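Since these operations are described as mirroring C pointers, a plain C sketch may help fix the semantics. The helper names below (`advance', `distance', `skip_chars') are illustrative only and are not part of GNU Smalltalk; they show the C behaviour that the text compares `+', `-' and `incrBy:' to.

```c
#include <stddef.h>

/* Hypothetical helper: the C analogue of `+ anInteger'.
   The result is n items (not n bytes) higher in memory. */
long *advance(long *p, ptrdiff_t n)
{
    return p + n;
}

/* Hypothetical helper: the C analogue of `-' given another pointer.
   In C the difference is counted in items of the pointed-to type. */
ptrdiff_t distance(long *hi, long *lo)
{
    return hi - lo;
}

/* Hypothetical helper: the C analogue of `incrBy:' on a string.
   Only the pointer moves; the characters themselves are untouched. */
const char *skip_chars(const char *s, size_t n)
{
    return s + n;
}
```

For an array `long a[4]', `advance(a, 2)' points at `a[2]', and `distance(&a[3], &a[0])' is 3, regardless of how many bytes a `long' occupies.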
CObjects can be divided into two families, scalars and non-scalars, just like C data types. Scalars fetch a Smalltalk object when sent the `value' message, and change their value when sent the `value:' message. Non-scalars do not support these two messages. `replaceWith:' `aString' replaces the string the instance points to with the new string. Actually, it copies the bytes from the Smalltalk `String' instance aString into the C string object, and null terminates. Be sure that the C string has enough room! You can also use a Smalltalk `ByteArray' as the data source. Non-scalars include instances of `CArray', `CPtr' and subclasses of `CStruct' and `CUnion'. CPtrs and CArrays get their underlying element type through a `CType' subclass instance which is associated with the `CArray' or `CPtr' instance. CPtr's also have `value' and `value:' which get or change the underlying value that's pointed to. In practice, `value' dereferences the pointer. CString is a subclass that answers a Smalltalk `String' when sent `value', and automatically allocates storage to copy and null-terminate a Smalltalk `String' when sent `value:'. Note that a `CPtr' to `long' points to a place in memory where a pointer to long is stored. In other words it is really a `long **' and must be dereferenced twice with `cPtr value value' to get the `long'. Finally, there are `CStruct' and `CUnion', which are abstract subclasses of `CObject'(1). In the following I will refer to CStruct, but the same considerations apply to CUnion as well, with the only difference that CUnions of course implement the semantics of a C union. 
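Two of the C-level facts mentioned above, the double dereference needed for a pointer to a pointer to `long' and the difference between struct and union layout, can be sketched in plain C. The type and function names here are made up for illustration and are not taken from GNU Smalltalk.

```c
#include <stddef.h>

/* Illustrative types only; they are not taken from GNU Smalltalk. */
struct pair_s { long n; double d; };  /* members stored one after another */
union  pair_u { long n; double d; };  /* members overlap at offset 0 */

/* A `long **': two dereferences are needed to reach the long,
   much like the `cPtr value value' expression in the text. */
long fetch_twice(long **pp)
{
    return **pp;
}

/* Offset of member `d' in each layout. */
size_t struct_offset_of_d(void) { return offsetof(struct pair_s, d); }
size_t union_offset_of_d(void)  { return offsetof(union pair_u, d); }
```

In the union both members start at offset 0 and share storage, while in the struct `d' starts after `n'. `CUnion' and `CStruct' are described as following these two layouts respectively.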
These classes provide direct access to C data structures including * `long' (unsigned too) * `short' (unsigned too) * `char' (unsigned too) & byte type * `double', `long double', `float' * `string' (NUL terminated char *, with special accessors) * arrays of any type * pointers to any type * other structs containing any fixed size types Here is an example struct decl in C: struct audio_prinfo { unsigned channels; unsigned precision; unsigned encoding; unsigned gain; unsigned port; unsigned _xxx[4]; unsigned samples; unsigned eof; unsigned char pause; unsigned char error; unsign