On Unix systems, fork is a rather simple system call.

Our implementation in glibc is, and needs to be, rather bulky.

For example, it has to duplicate all port rights for the new Mach task. The address space can simply be duplicated by standard Mach means, but file descriptors, among others, are a concept implemented inside glibc (on top of Mach ports), so they have to be duplicated from userspace. That requires a small number of RPCs for each descriptor, and in sum this hurts performance when new processes are being spawned continuously, for example from the shell.
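A rough way to observe this cost on a given system is a micro-benchmark that keeps a varying number of descriptors open and times bare fork calls. The sketch below is hypothetical (the function name `time_forks` and the use of `/dev/null` are illustrative choices, not part of glibc) and reports no measured numbers; the expectation, following the reasoning above, is that on the Hurd the time per fork grows with the number of open descriptors.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: open NFDS extra descriptors on /dev/null, then
   time ITERATIONS rounds of fork() + _exit() + waitpid().  Returns the
   average time per round in microseconds, or -1.0 on error.  Since each
   descriptor has to be duplicated from userspace on the Hurd, the
   result there is expected to grow with NFDS.  */
static double time_forks(int nfds, int iterations)
{
    if (nfds < 0 || iterations <= 0)
        return -1.0;

    int *fds = malloc(sizeof *fds * (nfds > 0 ? nfds : 1));
    if (fds == NULL)
        return -1.0;

    int opened = 0;
    double result = -1.0;
    struct timespec start, end;

    /* Inflate the descriptor table that fork has to copy.  */
    for (; opened < nfds; opened++) {
        fds[opened] = open("/dev/null", O_RDONLY);
        if (fds[opened] == -1)
            goto out;
    }

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        pid_t pid = fork();
        if (pid == -1)
            goto out;
        if (pid == 0)
            _exit(0);           /* child: do nothing, exit at once */
        if (waitpid(pid, NULL, 0) == -1)
            goto out;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    result = ((end.tv_sec - start.tv_sec) * 1e6
              + (end.tv_nsec - start.tv_nsec) / 1e3) / iterations;

out:
    /* Close only the descriptors that were actually opened.  */
    while (opened > 0)
        close(fds[--opened]);
    free(fds);
    return result;
}
```

Comparing, say, `time_forks(0, 100)` with `time_forks(200, 100)` on the Hurd and on another Unix-like system would show how much of the fork cost depends on the descriptor table.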

Often, a fork call is eventually followed by an exec, which may in turn close (most of) the duplicated port rights. Unfortunately, this cannot be known at the time fork executes, so to optimize this case the code calling fork has to be modified instead: the fork/exec combination can be replaced by, for example, a posix_spawn call, which avoids the work of duplicating each port right only to close it again.
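As an illustration of such a modification, the following sketch replaces a fork/execvp pair with `posix_spawnp`. The helper name `run_and_wait` is hypothetical; how much duplication is actually saved depends on how the C library implements `posix_spawn` on the system at hand.

```c
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run ARGV[0] (looked up in PATH) with arguments
   ARGV in a new process and wait for it.  Returns the child's exit
   status, or -1 on failure.  Where code would otherwise call fork()
   and then execvp() in the child, posix_spawnp() lets the C library
   set up the new process image directly, instead of duplicating every
   port right in fork only for exec to close most of them again.  */
static int run_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    /* No file actions and no spawn attributes: plain fork+exec semantics.  */
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0) {
        fprintf(stderr, "posix_spawnp: %s\n", strerror(err));
        return -1;
    }

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `run_and_wait((char *[]){ "sh", "-c", "exit 3", NULL })` runs a shell that exits with status 3 and returns 3. Note that `posix_spawnp` reports errors through its return value rather than through `errno`, which is why the sketch passes `err` to `strerror`.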

As far as we know, Cygwin has the same problem of fork being a nontrivial operation. Perhaps we can learn from what they've been doing? Perhaps they also have patches for software packages that avoid using fork followed by exec.
