Next: , Up: Full Source Bootstrap   [Contents][Index]


1.6.1 Stage0

June 2016 I learnt about Stage0. Jeremiah Orians created hex0 a ~500 byte self-hosting hex assembler. The source code is well documented and the binary is the exact mirror of the source code. I was inspired.

Here is an example of what the hex0 program looks like; the start of the hex function

00000060: 4883 f830 7c6f 4883 f83a 7c5a 4883 f841  H..0|oH..:|ZH..A
…
000000d0: 48c7 c0ff ffff ffc3 0000 0000 0000 0000  H...............
000000e0: 4883 e830 c300 0000 0000 0000 0000 0000  H..0............

All computer programs look like this: an opaque list of computer codes. The initial programs that we take for granted—the bootstrap binary seed—are about 250MB of such numbers: think 250,000 pages full of numbers.

Most computers work pretty well so apparently there is not a pressing need to inspect and study all of these codes. At the same time it is tricky to fully trust9 a computer that was bootstrapped in this way.

Here is what the source code of the hex0 assembler looks like

## function: hex
48 83 f8 30                # cmp $0x30,%rax
7c 6f                      # jl 6000f3 <ascii_other>
48 83 f8 3a                # cmp $0x3a,%rax
7c 5a                      # jl 6000e4 <ascii_num>
48 83 f8 41                # cmp $0x41,%rax
…
## function: ascii_other
48 c7 c0 ff ff ff ff       # mov $0xffffffffffffffff,%rax
c3                         # ret
…
## function: ascii_num
48 83 e8 30                # sub $0x30,%rax
c3                         # ret

While it may be hard to understand what this piece of the program does, it should be possible for anyone to verify that the computer codes above correspond to the source code with comments.

One step beyond these annotated codes is Assembly language. To write a program in Assembly, you only need to provide the instructions; the codes are computed by the assembler program.

hex:
	# deal all ascii less than 0
	cmp $48, %rax
	jl ascii_other
	# deal with 0-9
	cmp $58, %rax
	jl ascii_num
…
ascii_other:
	mov $-1, %rax
	ret
ascii_num:
	sub $48, %rax
	ret

More readable still, a similar program text in the C programming language.

int
hex (int c)
{
  if (c >= '0' && c <= '9')
    return c - 48;
…
}

What if we could bootstrap our entire system from only this one hex0 assembler binary seed? We would only ever need to inspect these 500 bytes of computer codes. Every10 later program is written in a more friendly programming language: Assembly, C, … Scheme.

Inspecting all these programs is a lot of work, but it can certainly be done. We might be able to create a fully inspectable path from almost nothing to all of the programs that our computer runs. Something that seemed to be an impossible dream is suddenly starting to look like “just a couple years of work”.


Footnotes

(9)

Ken Thompson’s 1984 Turing award acceptance speech Reflections on Trusting Tust.

(10)

Some program languages have become very hard or practically impossible to bootstrap. Instead of depending on a simple language such as C, they depend on a recent version of itself, or on other binary or ASCII seeds, on other recent programs written in that language, or even on manual intervention. Programs written in a language that cannot be bootstrapped can still run on our systems, but cannot enjoy any of the trust we intend to create.


Next: M2-Planet, Up: Full Source Bootstrap   [Contents][Index]