This is part 2 in a small series of notes explaining my opinion on what is a good system structure for the Hurd. While the ideas in part 1 motivate the system structure presented here, the feasibility of this system structure in turn justifies my opinion as presented in part 1. However, either part can also be taken individually. There will probably not be a third part.
I will start with presenting the process hierarchy, explain some abstract design patterns, and then show some specific applications.
Note that within this document, I will limit myself to certain types of operations and features. This does not mean that the system itself, by design, contains any measures to forbid or ban other types of operations.
A process is a protection domain. The initial configuration of the machine contains one or more processes with specific, but unspecified, relationships. These processes are called the "root processes". From the initial configuration, processes can be created and destroyed.
I do not make a distinction between data and capability pages. Both are, for the course of this discussion, memory pages.
Processes require at the very least some memory resources to keep the process state. Memory is allocated from containers, which therefore provide an abstraction for memory reserves. It is required that one of the root processes is a server implementing container objects.
A container provides an interface for allocating and returning memory frames, and for creating new containers with a new reserve limit (thus, containers form a hierarchy). Any successful allocation or deallocation from such a derived container is also accounted for in all containers from which it is derived. A container can be destroyed, which returns all memory frames allocated from it and thus recursively destroys all containers derived from it as well.
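To make the accounting rule concrete, here is a minimal sketch of such a container hierarchy in Python. It is only a model under stated assumptions: the names `Container`, `ReserveExceeded`, `derive`, `allocate`, `deallocate` and `destroy` are hypothetical and not taken from any existing Hurd interface, and memory frames are reduced to plain counters.

```python
class ReserveExceeded(Exception):
    """Raised when an allocation would exceed some container's reserve limit."""


class Container:
    """Toy model of a memory container (all names are assumptions)."""

    def __init__(self, limit, parent=None):
        self.limit = limit      # reserve limit, counted in frames
        self.used = 0           # frames accounted here (own + all derived)
        self.own = 0            # frames allocated directly from this container
        self.parent = parent
        self.children = []
        self.alive = True

    def _chain(self):
        """Yield this container and every container it is derived from."""
        node = self
        while node is not None:
            yield node
            node = node.parent

    def derive(self, limit):
        """Create a new container with its own reserve limit."""
        child = Container(limit, parent=self)
        self.children.append(child)
        return child

    def allocate(self, frames):
        if not self.alive:
            raise ValueError("container has been destroyed")
        chain = list(self._chain())
        # The allocation must fit into every container along the chain.
        for c in chain:
            if c.used + frames > c.limit:
                raise ReserveExceeded("reserve limit reached")
        for c in chain:
            c.used += frames
        self.own += frames

    def deallocate(self, frames):
        if frames > self.own:
            raise ValueError("cannot return more frames than were allocated")
        self.own -= frames
        for c in self._chain():
            c.used -= frames

    def destroy(self):
        """Return all frames and recursively destroy derived containers."""
        for child in list(self.children):
            child.destroy()
        self.deallocate(self.own)
        self.alive = False
        if self.parent is not None:
            self.parent.children.remove(self)


# Usage: a three-level hierarchy.
root = Container(limit=100)
user = root.derive(limit=40)
task = user.derive(limit=10)
task.allocate(8)   # accounted in task, user and root
user.allocate(5)   # accounted in user and root
user.destroy()     # also destroys task and returns all 13 frames to root
```

In this sketch a derived container's limit is only a cap; frames are charged to the ancestors at allocation time, which is one possible reading of the accounting rule above.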
Any process which has access to a container from which a sufficient amount of memory can be allocated, can convert this memory into a process. The process is destroyed by deallocating the memory from which it was created.
The above description is actually mostly complete. What is missing is the description of a somewhat unrelated feature which allows process identification, a description of what the default mechanisms are in the system to support common design patterns, and an illustration that these design patterns are sufficient.
By default, every process is associated with one memory container, the primary container of the process. This is the container from which the process is allocated, and from which the process does all allocations for its own needs. Primary containers are by default not shared.
To create a new process, by default, a process, the parent, creates a new container from its primary container, allocates some memory from it and converts it into a new process, the child. It then prepares the process to get it into a runnable state. This includes the following steps: First, a special executable image (allocated from the primary container of the child) is installed into the child's address space, which runs a cooperative protocol with the parent. Then, the parent provides the primary container of the child, and any other initial state that the child should receive, to the startup code. The startup code finally installs this initial state and starts to execute it.
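The default creation sequence can be sketched roughly as follows. This is an illustration only: `Process`, `spawn`, `run_startup_code`, the frame costs and the stand-in `Container` (which does no hierarchical accounting here) are all hypothetical names and simplifications, and the cooperative protocol between parent and startup code is collapsed into a single method call.

```python
from dataclasses import dataclass, field
from typing import Optional

PROCESS_FRAMES = 2   # assumed cost of the bare process state
STARTUP_FRAMES = 1   # assumed cost of the startup image


@dataclass
class Container:
    """Minimal stand-in for a memory container (no hierarchy accounting)."""
    limit: int
    used: int = 0

    def derive(self, limit):
        return Container(limit)

    def allocate(self, frames):
        if self.used + frames > self.limit:
            raise MemoryError("container reserve exhausted")
        self.used += frames


@dataclass
class Process:
    name: str
    primary: Optional[Container] = None
    image: Optional[str] = None
    state: dict = field(default_factory=dict)
    running: bool = False
    # Primary containers of children, retained by the parent.
    child_containers: dict = field(default_factory=dict)

    def run_startup_code(self, primary, initial_state):
        """Child side of the protocol: install the state, then start."""
        self.primary = primary
        self.state = dict(initial_state)
        self.image = initial_state.get("program", self.image)
        self.running = True


def spawn(parent, name, limit, initial_state):
    # 1. Derive a new container from the parent's primary container.
    container = parent.primary.derive(limit)
    # 2. Allocate memory from it and convert that memory into a process.
    container.allocate(PROCESS_FRAMES)
    child = Process(name)
    # 3. Install the startup image, paid for by the child's own container.
    container.allocate(STARTUP_FRAMES)
    child.image = "startup"
    # 4./5. Hand over the primary container and the initial state; the
    #       startup code installs the state and begins executing it.
    child.run_startup_code(container, initial_state)
    # The parent keeps the child's primary container capability.
    parent.child_containers[name] = container
    return child


shell = Process("shell", primary=Container(limit=64), running=True)
editor = spawn(shell, "editor", limit=16,
               initial_state={"program": "editor-image"})
```

Note that the parent keeps a reference to the child's primary container; every later decision about the child's fate goes through that reference.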
It is clear from this description that the child's existence is completely determined by the parent.
Process destruction can be done either cooperatively, or forcibly. The difference corresponds approximately to the difference between SIGTERM and SIGKILL in Unix. To destroy a process cooperatively, a request message is sent to a special capability implemented by the child process. The child can then begin to tear down the program, and at some time send a request back to the parent process to ask for forced process destruction.
Forced process destruction can be done by the parent process without any cooperation by the child process. The parent process simply destroys the primary container of the child (this means that the parent process should retain the primary container capability).
Because container destruction works recursively, forced process destruction works recursively as well.
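The two destruction paths can be modelled in a few lines. Again this is a hedged sketch, not an actual interface: the class `Proc` and its methods `request_termination`, `cleanup` and `force_destroy_child` are invented for illustration, and destroying a child's primary container is reduced to marking the whole subtree as dead.

```python
class Proc:
    """Toy process node; destroying its container kills the whole subtree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.alive = True
        self.cleaned_up = False
        if parent is not None:
            parent.children.append(self)

    def cleanup(self):
        """Whatever orderly teardown the program wants to perform."""
        self.cleaned_up = True

    def request_termination(self):
        """Cooperative path (roughly SIGTERM): the child tears itself down,
        then asks its parent for forced destruction."""
        if self.parent is None:
            raise ValueError("a root process has no parent to ask")
        self.cleanup()
        self.parent.force_destroy_child(self)

    def force_destroy_child(self, child):
        """Forced path (roughly SIGKILL): no cooperation needed."""
        if child not in self.children:
            raise ValueError("not a direct child")
        child._container_destroyed()
        self.children.remove(child)

    def _container_destroyed(self):
        # Container destruction is recursive, so descendants die as well,
        # without getting a chance to clean up.
        for c in list(self.children):
            c._container_destroyed()
        self.children.clear()
        self.alive = False


a = Proc("A")
c = Proc("C", a)
d = Proc("D", c)
e = Proc("E", c)
c.request_termination()   # C cleans up; C, D and E are gone afterwards
```

In this model only the process that received the request gets to clean up; its descendants are simply removed, which mirrors the recursive nature of container destruction.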
From the above description it should be clear that containers and processes are organized in the same hierarchical tree structure, where every node corresponds to a process and its primary container, and every edge corresponds to a parent-child relationship.
The ability to subdivide the container's resource reserves provides the ability to completely isolate sibling processes in the process hierarchy. By default, two processes, where neither is an ancestor of the other process, are completely isolated. Also, an ancestor is partially isolated from its child. To overcome this isolation, the two processes need the cooperation of at least all their respective ancestors up to the first common ancestor in the tree. An example should illustrate that:
      A
     / \
    B   C
       / \
      D   E

In this picture, A is the direct parent of B and C, and C is the direct parent of D and E. A is a common ancestor of B, C, D and E. C is a common ancestor of D and E. The isolation is by default complete between (B C), (B D), (B E), and (D E). There is partial isolation between (A B), (A C), (A D), (A E), (C D) and (C E). The isolation properties of A are, if it is a root node, defined by the initial configuration.
If, for example, B and D should be able to communicate, the explicit or implicit permission needs to be provided by both A and C.
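The rule can be checked mechanically for the example tree. The following sketch assumes the tree is given as a simple child-to-parent map; the function names `ancestors`, `isolation` and `required_cooperation` are made up for this illustration.

```python
# The example hierarchy as a child -> parent map (A is a root process).
PARENT = {"A": None, "B": "A", "C": "A", "D": "C", "E": "C"}


def ancestors(p, parent=PARENT):
    """Proper ancestors of p, nearest first."""
    out = []
    node = parent[p]
    while node is not None:
        out.append(node)
        node = parent[node]
    return out


def isolation(x, y, parent=PARENT):
    """Default isolation between two distinct processes."""
    if x in ancestors(y, parent) or y in ancestors(x, parent):
        return "partial"
    return "complete"


def required_cooperation(x, y, parent=PARENT):
    """Ancestors whose permission is needed before x and y can communicate:
    all ancestors of either process up to the first common ancestor."""
    if isolation(x, y, parent) != "complete":
        raise ValueError("only defined for completely isolated processes")
    ax, ay = ancestors(x, parent), ancestors(y, parent)
    common = next((a for a in ax if a in ay), None)
    if common is None:
        raise ValueError("no common ancestor; see the initial configuration")
    return set(ax[:ax.index(common) + 1]) | set(ay[:ay.index(common) + 1])


print(isolation("B", "D"))                      # complete
print(isolation("A", "E"))                      # partial
print(sorted(required_cooperation("B", "D")))   # ['A', 'C']
print(sorted(required_cooperation("D", "E")))   # ['C']
```

For B and D the result is A and C, as stated above; for the siblings D and E, the cooperation of C alone suffices.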
Because of the recursive nature of the process hierarchy, and because the existence of a child is completely determined by its direct parent (whose existence is in turn completely determined by its direct parent, etc), processes can be confined, and the confinement extends to all their child processes as well.
In the above example, A confines B, C, D and E. C confines D and E. Thus, B and C are only confined by A, whereas D and E are confined by A and C.
Because the existence of a child process is completely defined by its parent, its understanding of what is secure, what its needs are, what is "external" to itself and what is internal, etc, is completely defined by the parent as well. It therefore does not make sense to object to the above model by claiming that the child can not do what it wants to do, because what the child wants to do is completely defined by the parent, as are its abilities to do it. It also does not make sense to object that the child can not determine if a capability it got from the parent is safe to use, because it is the parent which defines for the child if a capability is safe to use or not.
Any such objection has, at its root, some assumption that is different from the assumptions made in this model, and thus needs to be analysed and reasoned about outside of the model.
A branding operation exists which, at the micro-level, allows a server process to check if a certain capability is implemented by itself. The server can then provide an identify operation to its clients, which allows the clients to check with the server if a certain capability is implemented by it. The client can then refuse to use the capability if it is not authentic.
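As a rough model of how branding and identification fit together, consider the following sketch. It rests on assumptions: the names `Server`, `Capability`, `is_mine`, `identify` and `use_if_authentic` are hypothetical, and the brand is modelled as a private Python object compared by identity, which only imitates the unforgeability that the real system would have to guarantee.

```python
class Capability:
    """Toy capability: an opaque handle carrying its implementor's brand."""

    def __init__(self, brand, label):
        self._brand = brand
        self.label = label


class Server:
    def __init__(self, name):
        self.name = name
        # A fresh object serves as the brand; only this server holds it.
        self._brand = object()

    def make_capability(self, label):
        return Capability(self._brand, label)

    def is_mine(self, cap):
        """Branding check: is this capability implemented by this server?"""
        return getattr(cap, "_brand", None) is self._brand

    def identify(self, cap):
        """Operation offered to clients, built on the branding check."""
        return self.is_mine(cap)


def use_if_authentic(trusted_server, cap):
    """Client side: refuse capabilities the trusted server does not vouch for."""
    if not trusted_server.identify(cap):
        raise PermissionError("capability is not implemented by " +
                              trusted_server.name)
    return "using " + cap.label


files = Server("file server")
impostor = Server("impostor")
good = files.make_capability("notes.txt")
fake = impostor.make_capability("notes.txt")

print(use_if_authentic(files, good))   # using notes.txt
print(files.identify(fake))            # False
```

The client must of course already hold an authentic capability to the server it asks; the sketch takes that for granted.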
I will now describe some common applications that need to be supported, and how they can be supported in the above system structure. To make this brief, I only include applications that have any significance in the confined+isolated discussion. There are other applications (pipes, daemonization, process management), which are important to discuss, but can be solved in identical ways in both types of system structures, so I am excluding them here.
Unix-style suid applications have been proposed as one application for alternative process construction mechanisms. However, suid applications in Unix are, from the perspective of the parent, not confined, only isolated. Thus, they are readily replaced by a system service that is created by the system software, and that runs as a sibling to any user process. Only the ability to invoke the system service needs to be given to the user, not the ability to instantiate it.
In fact, no gain can be derived from letting the user instantiate system services. In Unix, system services run on durable resources, which the user can not revoke. Thus, the system service needs to acquire its resources from a container that is not derived from the user's primary container.
In "Design of the EROS Trusted Window System", Shap et al. describe a uni-directional communication mechanism that can be used for a cut&paste operation in a window manager, and that is guaranteed to not allow backflow of information. The main challenge in doing this is format conversion, which traditionally requires negotiation between the two parties. In the mechanism proposed, confined constructors are used to allow the sending party to provide format converters that can be used by the receiving party to convert into a format it understands.
I think that in the context of a free software operating system, and considering the threat caused by proprietary document formats, it is fully sufficient and in fact appropriate for our needs to replace this mechanism with one in which the format converters are not provided as isolated programs, but where instead at least the binary image of the format converter is provided in read-only fashion to the receiver.
Accepting this means, in practice, that in the proposed protocol, the format converter constructor capability can be replaced by the vector of capabilities (which must be transitively read-only) that the sending party puts into the constructor before sealing. The sending party then can instantiate these programs itself.
This alternative mechanism breaks with the principle of least authority, because it values other principles with a higher priority.
Two agents in the system can collaborate suspiciously by means of a third agent. In the process, they rely on the third agent to implement the common will. This third agent can even be a constructor-like service. The validity of the service can either be established by the "Identify" operation described above, or, in principle, if the underlying operating system exposes the functionality of a "trusted computing" component, the two agents can even get all the guarantees and restrictions imposed by such a component. There is nothing in the system structure above that can prevent this. The changes needed in the underlying operating system are purely local changes with no effect on the overall system structure.
I should add here that my analysis is limited to technical constraints. There may be further legal constraints imposed by software licenses such as the upcoming GPL v3, whose draft has an anti-DRM provision.
I said earlier that this makes it hard for me to understand why it has been said that the above system structure constitutes a "ban" on this mechanism. I believe, without having inquired further, that the reason must be that the suspicious collaboration in the above sense is a contract with limited scope. Any information that is passed from the mediating agent to either of the two parties will subsequently not be controlled further. This is in fact always true. The only difference is what the scope of the mediating agent is.
In "locked down" computer systems, the mediating agent has a scope that extends to all of the operating system. For example, the window manager would be part of the mediating agent, and conspire with other components to not allow some information displayed to be read out or modified. Or it could reduce the quality of the information if such a read out occurs (as is required by HDCP licenses, for example). At the risk of repeating myself here, the differences that surfaced in the discussion are probably rooted in the issue of scope. The scope problem is not visible under a microscope, but is only revealed as emergent behaviour by a macroscopic analysis of the resulting system.