GNU Smalltalk User’s Guide: Inside Arrays

Smalltalk provides a very adequate selection of predefined classes from which to choose. Eventually, however, you will find the need to code a new basic data structure. Because Smalltalk’s most fundamental storage allocation facilities are arrays, it is important that you understand how to use them to gain efficient access to this kind of storage.

The Array Class. Our examples have already shown the Array class, and its use is fairly obvious. For many applications, it will fill all your needs—when you need an array in a new class, you keep an instance variable, allocate a new Array and assign it to the variable, and then send array accesses via the instance variable.

This technique even works for string-like objects, although it is wasteful of storage. An Array object uses a Smalltalk pointer for each slot in the array; its exact size is transparent to the programmer, but you can generally guess that it’ll be roughly the word size of your machine. ³⁹ For storing an array of characters, therefore, an Array works but is inefficient.

Arrays at a Lower Level. So let’s step down to a lower level of data structure. A ByteArray is much like an Array, but each slot holds only an integer from 0 to 255-and each slot uses only a byte of storage. If you only needed to store small quantities in each array slot, this would therefore be a much more efficient choice than an Array. As you might guess, this is the type of array which a String uses.

Aha! But when you go back to chapter 9 and look at the Smalltalk hierarchy, you notice that String does not inherit from ByteArray. To see why, we must delve down yet another level, and arrive at the basic methods for setting up the structure of the instances of a class.

When we implemented our NiledArray example, we used <shape: #inherit>. The shape is exactly that: the fundamental structure of Smalltalk objects created within a given class. Let’s consider the differences in the next sub-sections.

There are many more shapes for more specialized usage. All of them specify the same kind of “array-like” behavior, with different data types.

How to access this new arrays? You already know how to access instance variables—by name. But there doesn’t seem to be a name for this new storage. The way an object accesses it is to send itself array-type messages like at:, at:put:, and so forth.

The problem is when an object wants to add a new level of interpretation to these messages. Consider a Dictionary—it is a pointer-holding object but its at: message is in terms of a key, not an integer index of its storage. Since it has redefined the at: message, how does it access its fundamental storage?

The answer is that Smalltalk has defined basicAt: and basicAt:put:, which will access the basic storage even when the at: and at:put: messages have been defined to provide a different abstraction.

This can get pretty confusing in the abstract, so let’s do an example to show how it’s pretty simple in practice. Smalltalk arrays tend to start at 1; let’s define an array type whose permissible range is arbitrary.

   ArrayedCollection subclass: RangedArray [
       | offset |
       <comment: 'I am an Array whose base is arbitrary'>
       RangedArray class >> new: size [
           <category: 'instance creation'>
           ^self new: size base: 1
       ]
       RangedArray class >> new: size base: b [
           <category: 'instance creation'>
           ^(super new: size) init: b
       ]

       init: b [
           <category: 'init'>
           offset := (b - 1).   "- 1 because basicAt: works with a 1 base"
           ^self
      ]
      rangeCheck: i [
           <category: 'basic'>
           (i <= offset) | (i > (offset + self basicSize)) ifTrue: [
               'Bad index value: ' printOn: stderr.
               i printOn: stderr.
               Character nl printOn: stderr.
               ^self error: 'illegal index'
           ]
       ]
       at: [
           self rangeCheck: i.
           ^self basicAt: i - offset
       ]
       at: i put: v [
           self rangeCheck: i.
           ^self basicAt: i - offset put: v
       ]
   ]

The code has two parts; an initialization, which simply records what index you wish the array to start with, and the at: messages, which adjust the requested index so that the underlying storage receives its 1-based index instead. We’ve included a range check; its utility will demonstrate itself in a moment:

Since 4 is below our base of 5, a range check error occurs. But this check can catch more than just our own misbehavior!

Our do: message handling is broken! The stack backtrace pretty much tells the story:

Our code received a do: message. We didn’t define one, so we inherited the existing do: handling. We see that an Integer loop was constructed, that a code block was invoked, and that our own at: code was invoked. When we range checked, we trapped an illegal index. Just by coincidence, this version of our range checking code also dumps the index. We see that do: has assumed that all arrays start at 1.

But the issues start to run deep. If our parent class believed that it knew enough to assume a starting index of 1⁴², why didn’t it also assume that it could call basicAt:? The answer is that of the two choices, the designer of the parent class chose the one which was less likely to cause trouble; in fact all standard Smalltalk collections do have indices starting at 1, yet not all of them are implemented so that calling basicAt: would work.⁴³

Object-oriented methodology says that one object should be entirely opaque to another. But what sort of privacy should there be between a higher class and its subclasses? How many assumption can a subclass make about its superclass, and how many can the superclass make before it begins infringing on the sovereignty of its subclasses?

Alas, there are rarely easy answers, and this is just an example. For this particular problem, there is an easy solution. When the storage need not be accessed with peak efficiency, you can use the existing array classes. When every access counts, having the storage be an integral part of your own object allows for the quickest access—but remember that when you move into this area, inheritance and polymorphism become trickier, as each level must coordinate its use of the underlying array with other levels.

6.12.1 How Arrays Work

Footnotes

(39)

(40)

(41)

(42)

(43)