Lists diagrammed (Programming in Emacs Lisp)

Next: Symbols as a Chest of Drawers, Up: How Lists are Implemented [Contents][Index]

Lists diagrammed

For example, the list (rose violet buttercup) has three elements, ‘rose’, ‘violet’, and ‘buttercup’. In the computer, the electronic address of ‘rose’ is recorded in a segment of computer memory called a cons cell (because it’s what the function cons actually creates). That cons cell also holds the address of a second cons cell, whose CAR is the atom ‘violet’; and that address (the one that tells where to find ‘violet’) is kept along with the address of a third cons cell which holds the address for the atom ‘buttercup’.

This sounds more complicated than it is and is easier seen in a diagram:

    ___ ___      ___ ___      ___ ___
   |___|___|--> |___|___|--> |___|___|--> nil
     |            |            |
     |            |            |
      --> rose     --> violet   --> buttercup

In the diagram, each box represents a word of computer memory that holds a Lisp object, usually in the form of a memory address. The boxes, i.e., the addresses, are in pairs. Each arrow points to what the address is the address of, either an atom or another pair of addresses. The first box is the electronic address of ‘rose’ and the arrow points to ‘rose’; the second box is the address of the next pair of boxes, the first part of which is the address of ‘violet’ and the second part of which is the address of the next pair. The very last box points to the symbol nil, which marks the end of the list.

When a variable is set to a list with an operation such as setq, it stores the address of the first box in the variable. Thus, evaluation of the expression

(setq bouquet '(rose violet buttercup))

creates a situation like this:

bouquet
     |
     |     ___ ___      ___ ___      ___ ___
      --> |___|___|--> |___|___|--> |___|___|--> nil
            |            |            |
            |            |            |
             --> rose     --> violet   --> buttercup

In this example, the symbol bouquet holds the address of the first pair of boxes.

This same list can be illustrated in a different sort of box notation like this:

bouquet
 |
 |    --------------       ---------------       ----------------
 |   | car   | cdr  |     | car    | cdr  |     | car     | cdr  |
  -->| rose  |   o------->| violet |   o------->| butter- |  nil |
     |       |      |     |        |      |     | cup     |      |
      --------------       ---------------       ----------------

(Symbols consist of more than pairs of addresses, but the structure of a symbol is made up of addresses. Indeed, the symbol bouquet consists of a group of address-boxes, one of which is the address of the printed word ‘bouquet’, a second of which is the address of a function definition attached to the symbol, if any, a third of which is the address of the first pair of address-boxes for the list (rose violet buttercup), and so on. Here we are showing that the symbol’s third address-box points to the first pair of address-boxes for the list.)

If a symbol is set to the CDR of a list, the list itself is not changed; the symbol simply has an address further down the list. (In the jargon, CAR and CDR are “non-destructive”.) Thus, evaluation of the following expression

(setq flowers (cdr bouquet))

produces this:

bouquet        flowers
  |              |
  |     ___ ___  |     ___ ___      ___ ___
   --> |   |   |  --> |   |   |    |   |   |
       |___|___|----> |___|___|--> |___|___|--> nil
         |              |            |
         |              |            |
          --> rose       --> violet   --> buttercup

The value of flowers is (violet buttercup), which is to say, the symbol flowers holds the address of the pair of address-boxes, the first of which holds the address of violet, and the second of which holds the address of buttercup.

A pair of address-boxes is called a cons cell or dotted pair. See Cons Cell and List Types in The GNU Emacs Lisp Reference Manual, and Dotted Pair Notation in The GNU Emacs Lisp Reference Manual, for more information about cons cells and dotted pairs.

The function cons adds a new pair of addresses to the front of a series of addresses like that shown above. For example, evaluating the expression

(setq bouquet (cons 'lily bouquet))

produces:

bouquet                       flowers
  |                             |
  |     ___ ___        ___ ___  |     ___ ___       ___ ___
   --> |   |   |      |   |   |  --> |   |   |     |   |   |
       |___|___|----> |___|___|----> |___|___|---->|___|___|--> nil
         |              |              |             |
         |              |              |             |
          --> lily      --> rose       --> violet    --> buttercup

However, this does not change the value of the symbol flowers, as you can see by evaluating the following,

(eq (cdr (cdr bouquet)) flowers)

which returns t for true.

Until it is reset, flowers still has the value (violet buttercup); that is, it has the address of the cons cell whose first address is of violet. Also, this does not alter any of the pre-existing cons cells; they are all still there.

Thus, in Lisp, to get the CDR of a list, you just get the address of the next cons cell in the series; to get the CAR of a list, you get the address of the first element of the list; to cons a new element on a list, you add a new cons cell to the front of the list. That is all there is to it! The underlying structure of Lisp is brilliantly simple!

And what does the last address in a series of cons cells refer to? It is the address of the empty list, of nil.

In summary, when a Lisp variable is set to a value, it is provided with the address of the list to which the variable refers.