We offer a wide range of possible projects to choose from. If you have an idea not listed here, we'd love to hear about it!

In either case, we encourage you to contact us (on IRC and/or our developer mailing lists), so we can discuss your idea, or help you pick a suitable task -- we will gladly explain the tasks in more detail, if the descriptions are not clear enough.

In fact, we suggest you discuss your choice with us even if you have no trouble finding a task that suits you: as explained in the introduction section of the student application form, we ask all students to get into regular communication with us for the application to be considered complete. Talking about your project choice is a good start :-)

(We strongly suggest that you generally take a look at the student application form right now -- the sooner you know what we expect, the better you can cater to it :-) )

Many of the project descriptions suggest some "exercise". The reason is that for the application to be complete, we require you to make a change to the Hurd code, and send us the resulting patch. (This is also explained in the student application form.) If possible, the change should make some improvement to the code you will be working on during the summer, or to some related code.

The "exercise" bit in the project description is trying to give you some ideas what kind of change this could be. In most cases it is quite obvious, though: Try to find something to improve in the relevant code, by looking at known issues in the Savannah bug tracker; by running the code and testing stuff; and by looking through the code. If you don't find anything, try with some related code -- if your task involves translator programming, make some improvement to an existing translator; if it involves glibc hacking, make an improvement to glibc; if it involves driver hacking, make an improvement to the driver framework; and so on... Makes sense, doesn't it? :-)

Sometimes it's hard to come up with a useful improvement to the code in question, that isn't too complicated for the purposes of the application. In this case, we need to find a good alternative. You could for example make an improvement to some Hurd code that is not directly related to your project: this way you won't get familiar with working on the code you will actually need for the task, but at least you can show that you are able to work with the Hurd code in general.

Another possible alternative would be making some change to the code in question, that isn't really a useful improvement, while still making sense in some way -- this could suffice to prove that you are able to work with the code.

Don't despair if you can't come up with anything suitable by yourself. Contact us, and we will think of something together :-)

In either case, we strongly suggest that you talk to us about the change you want to make up front, to be sure that it is something that will get our approval -- especially if the idea is not directly taken from the project description.

Also, don't let this whole patch stuff discourage you from applying! As explained in the student application form, it's not a problem if you do not yet have all the necessary knowledge to do this alone -- we don't expect that. After all, the purpose of GSoC is to introduce you to free software development :-) We only want to see that you are able to obtain the necessary knowledge before the end of the application process, with our help -- contact us, and we will assist you as well as we can.

See also the list of Hurd-related X.org project ideas.

The main idea of the Hurd design is giving users the ability to easily modify/extend the system's functionality (extensible system). This is done by creating filesystem translators and other kinds of Hurd servers.

However, in practice this is not as easy as it should be, because creating translators and other servers is quite involved -- the interfaces for doing that are not exactly simple, and available only for C programs. Being able to easily create simple translators in RAD languages is highly desirable, to really be able to reap the advantages of the Hurd architecture.

Originally Lisp was meant to be the second system language besides C in the GNU system; but that doesn't mean we are bound to Lisp. Bindings for any popular high-level language that helps in quickly creating simple programs are highly welcome.

Several approaches are possible when creating such bindings. One way is simply to provide wrappers to all the available C libraries (libtrivfs, libnetfs etc.). While this is easy (it requires relatively little consideration), it may not be the optimal solution. It is preferable to hook in at a lower level, thus being able to create interfaces that are specially adapted to make good use of the features available in the respective language.

These more specialised bindings could hook in at some of the lower level library interfaces (libports, glibc, etc.); use the MIG-provided RPC stubs directly; or even create native stubs directly from the interface definitions. The Lisp bindings created by Flavio Cruz in last year's GSoC mostly use the last of these approaches, and can serve as a good example.

There is another possible reason for preferring lower-level bindings: Presently, the Hurd server libraries use the cthreads threading library, which predates the pthread standard prevalent today. There is a pthread library for the Hurd as well, but it's not possible to use both cthreads and pthreads in the same executable. Thus, until porting the Hurd libraries to pthreads is finished, implementing bindings for any language that uses pthreads (in the runtime environment or the actual programs) is only possible when not using the standard Hurd server libraries at all -- i.e. when binding at MIG stub level or interface definition level.

The task is to create easy-to-use Hurd bindings for a language of the student's choice, and some example servers to prove that it works well in practice. This project will require gaining a very good understanding of the various Hurd interfaces. Skills in designing nice programming interfaces are a must.

Anatoly A. Kazantsev started working on Python bindings last year -- if Python is your language of choice, you probably should take his work and complete it.

There was also some previous work on Perl bindings, which might serve as a reference if you want to work on Perl.

Possible mentors: Anatoly A. Kazantsev (anatoly) for Python

Posted 2009-03-05 18:20:56 UTC

The main idea behind the Hurd design is to allow users to replace almost any system functionality (extensible system). Any user can easily create a subenvironment using some custom servers instead of the default system servers. This can be seen as an advanced lightweight virtualization mechanism, which allows implementing all kinds of standard and nonstandard virtualization scenarios.

However, though the basic mechanisms are there, currently it's not easy to make use of these possibilities, because we lack tools to automatically launch the desired constellations.

The goal is to create a set of powerful tools for managing at least one desirable virtualization scenario. One possible starting point could be the subhurd/neighborhurd mechanism, which allows running a second, almost totally independent instance of the Hurd in parallel to the main one.

While subhurds allow creating a complete second system instance, with its own set of Hurd servers and UNIX daemons and all, there are also situations where it is desirable to have a smaller subenvironment, living within the main system and using most of its facilities -- similar to a chroot environment. A simple way to create such a subenvironment with a single command would be very helpful.

It might be possible to implement (perhaps as a prototype) a wrapper using existing tools (chroot and unionfs); or it might require more specific tools, like some kind of unionfs-like filesystem proxy that mirrors other parts of the filesystem, but allows overriding individual locations, in conjunction with either chroot or some similar mechanism to create a subenvironment with a different root filesystem.

It's also desirable to have a mechanism allowing a user to set up such a custom environment in a way that it will automatically get launched on login -- practically allowing the user to run a customized operating system in his own account.

Yet another interesting scenario would be a subenvironment -- using some kind of special filesystem proxy again -- in which the user serves as root, being able to create local sub-users and/or sub-groups.

This would allow the user to run "dangerous" applications (web browser, chat client etc.) in a confined fashion, allowing them access to only a subset of the user's files and other resources. (This could be done either using a lot of groups for individual resources, and lots of users for individual applications; adding a user to a group would give the corresponding application access to the corresponding resource -- an advanced ACL mechanism. Or leave out the groups, assigning the resources to users instead, and use the Hurd's ability for a process to have multiple user IDs, to equip individual applications with sets of user IDs giving them access to the necessary resources -- basically a capability mechanism.)
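
The second variant (resources assigned to sub-users, applications holding several user IDs at once) can be pictured with a tiny model. This is only a sketch in Python: the sub-user names, the resource paths, and the two tables are made up for illustration, and a real implementation would rely on the Hurd's own authentication mechanisms rather than on dictionaries.

```python
# Hypothetical sub-users, each owning exactly one resource.
RESOURCE_OWNER = {
    "~/mail": "u-mail",
    "~/downloads": "u-downloads",
    "~/keys": "u-keys",
}

# Hypothetical sets of user IDs handed to each confined application.
APP_UIDS = {
    "browser": {"u-downloads"},
    "mailer": {"u-mail", "u-downloads"},
}


def may_access(app, resource):
    """An application may use a resource if it holds the owning user ID."""
    return RESOURCE_OWNER[resource] in APP_UIDS.get(app, set())
```

Granting an application access to one more resource then amounts to adding one more user ID to its set, which is what makes the scheme behave like a capability mechanism.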

The student will have to pick (at least) one of the described scenarios -- or come up with some other one in a similar spirit -- and implement all the tools (scripts, translators) necessary to make it available to users in an easy-to-use fashion. While the Hurd by default already offers the necessary mechanisms for that, these are not perfect and could be further refined for even better virtualization capabilities. Should need or desire for specific improvements in that regard come up in the course of this project, implementing these improvements can be considered part of the task.

Completing this project will require gaining a very good understanding of the Hurd architecture and spirit. Previous experience with other virtualization solutions would be very helpful.

Possible mentors: Olaf Buddenhagen (antrik)

Over the years, UNIX has acquired a host of different file locking mechanisms. Some of them work on the Hurd, while others are buggy or only partially implemented. This breaks many applications.

The goal is to make all file locking mechanisms work properly. This requires finding all existing shortcomings (through systematic testing and/or checking for known issues in the bug tracker and mailing list archives), and fixing them.
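
As an illustration of what one such systematic test could look like, here is a minimal sketch in Python that checks a single property for two of the mechanisms (BSD-style `flock` and POSIX record locks via `lockf`): while one process holds an exclusive lock, a second process must be refused the same lock. The helper names `conflicts` and `run_checks` are made up for this example; a real test suite would cover many more cases (shared locks, byte ranges, lock release on close, and so on).

```python
import fcntl
import os
import tempfile


def conflicts(lock_fn, path):
    """Return True if a second process is refused an exclusive lock
    on `path` while this process holds one taken with `lock_fn`."""
    with open(path, "w") as holder:
        lock_fn(holder, fcntl.LOCK_EX)
        pid = os.fork()
        if pid == 0:
            # Child: try to take the same lock without blocking.
            status = 1  # assume "refused" unless the lock is granted
            try:
                with open(path, "w") as contender:
                    lock_fn(contender, fcntl.LOCK_EX | fcntl.LOCK_NB)
                status = 0  # lock was granted: locking is broken
            except OSError:
                pass
            finally:
                os._exit(status)
        _, wait_status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(wait_status) == 1


def run_checks():
    """Run the conflict check for each locking mechanism."""
    with tempfile.NamedTemporaryFile() as tmp:
        return {
            "flock": conflicts(fcntl.flock, tmp.name),
            "lockf": conflicts(fcntl.lockf, tmp.name),
        }
```

On a system where both mechanisms work, `run_checks()` reports `True` for each entry; a `False` points at a mechanism worth investigating.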

This task will require digging into parts of the code to understand how file locking works on the Hurd. Only general programming skills are required.

Possible mentors: Samuel Thibault (youpi)

Exercise: Find one of the existing issues, either by looking at the task/bug trackers on Savannah, or by trying things out yourself; and have a go at it. Note though that most of these issues are probably not trivial -- it's quite likely that you won't be able to actually fix any of them in the time available during the application process. However, you might be able to spot something else that could be improved while looking into this.

If after trying for a while you haven't found anything easy enough to improve in the locking-related code, talk to us about some alternative exercise task. Perhaps you actually find something you could do while looking through the bug tracker or trying stuff yourself in search of locking issues :-)

Although a driver framework in userspace would be desirable, presently the Hurd uses kernel drivers in the microkernel, GNU Mach. (And changing this would be far beyond a GSoC project...)

The problem is that the drivers in GNU Mach are presently old Linux drivers (mostly from 2.0.x) accessed through a glue code layer. This is not an ideal solution, but works quite OK, except that the drivers are very old. The goal of this project is to redo the glue code, so we can use drivers from current Linux versions, or from one of the free BSD variants.

While it would certainly be possible to create custom glue code again, a more sustainable and probably also easier approach is to use ddekit instead -- it already does the hard work of providing an environment where the foreign drivers can run, and has the additional advantage of being externally maintained.

This is a doable, but pretty involved project. Previous experience with driver programming probably is a must. (No Hurd-specific knowledge is required, though.)

This is GNU Savannah task #5488.

Possible mentors: Samuel Thibault (youpi)

Exercise: Take a driver for some newer piece of hardware (e.g. Intel e1000 ethernet) from a recent system, and try to port it to run in the existing driver framework in GNU Mach. Completing the port might be too involved for the exercise; but it's pretty likely that you will find something else to improve in the glue code while working on this...

The main idea of the Hurd is that every user can influence almost all system functionality (extensible system), by running private Hurd servers that replace or proxy the global default implementations.

However, running such a customized subenvironment presently is not easy, because there is no standard mechanism to easily replace an individual standard server, keeping everything else. (Presently there is only the subhurd method, which creates a completely new system instance with a completely independent set of servers.)

The goal of this project is to provide a simple method for overriding individual standard servers, using environment variables, or a special subshell, or something like that. It is closely related to the virtualization task.

Various approaches for such a mechanism have been discussed before. Probably the easiest (1) would be to modify the Hurd-specific parts of glibc, which are contacting various standard servers to implement certain system calls, so that instead of always looking for the servers in default locations, they first check for overrides in environment variables, and use these instead if present. Take a look at the socket server overriding patch for an example.
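
The lookup logic of approach (1) is simple enough to sketch in a few lines. The following is written in Python for brevity and is not glibc code: the `HURD_SERVER_<NAME>` variable naming scheme, the `server_path` helper, and the table of defaults are all assumptions made up for this illustration.

```python
import os

# Hypothetical table of default server locations; in reality these
# paths are hard-coded in the Hurd-specific parts of glibc.
DEFAULT_SERVERS = {
    "socket_inet": "/servers/socket/2",
    "crash": "/servers/crash",
}


def server_path(name, environ=None):
    """Return the path of the server to contact for `name`.

    A non-empty override in the environment wins; otherwise the
    default location is used.
    """
    if environ is None:
        environ = os.environ
    override = environ.get("HURD_SERVER_" + name.upper())
    if override:
        return override
    return DEFAULT_SERVERS[name]
```

With this scheme, a user could point a single program at a private server by setting one variable for that program only, leaving every other server at its default.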

A somewhat more generic solution (2) could use some mechanism for arbitrary client-side namespace overrides. The client-side part of the filename lookup mechanism would have to check an override table on each lookup, and apply the desired replacement whenever a match is found.
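
A rough sketch of such an override table, again in Python rather than in the C of the actual lookup code: the `apply_overrides` helper and the example paths are hypothetical, and the sketch assumes that the table keys are already normalized absolute paths. It also normalizes paths purely lexically, which a real implementation could not do blindly because of symlinks and translators along the way.

```python
import posixpath


def apply_overrides(path, overrides):
    """Rewrite `path` according to the longest matching prefix in
    `overrides` (a dict mapping absolute paths to replacements)."""
    path = posixpath.normpath(path)
    # Try the longest prefixes first so that specific entries win.
    for prefix in sorted(overrides, key=len, reverse=True):
        stem = prefix.rstrip("/")
        if path == prefix:
            return overrides[prefix]
        if path.startswith(stem + "/"):
            rest = path[len(stem) + 1:]
            return posixpath.join(overrides[prefix], rest)
    return path
```

Because every lookup consults the table, this variant can redirect arbitrary locations, not only the servers that glibc happens to know about.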

Another approach would be server-side overrides. Again there are various variants. The actual servers themselves could provide a mechanism to redirect to other servers on request. (3) Or we could use some more generic server-side namespace overrides: Either all filesystem servers could provide a mechanism to modify the namespace they export to certain clients (4), or proxies could be used that mirror the default namespace but override certain locations. (5)

Variants (4) and (5) are the most powerful. They are intimately related to chroots: (4) is how the current chroot implementation works in the Hurd, and (5) has been proposed as an alternative. The generic overriding mechanism could be implemented on top of chroot, or chroot could be implemented on top of the generic overriding mechanism. But this is out of scope for this project...

In practice, probably a mix of the different approaches would prove most useful for various servers and use cases. It is strongly recommended that the student starts with (1) as the simplest approach, perhaps augmenting it with (3) for certain servers that don't work with (1) because of indirect invocation.

This task requires some understanding of the Hurd internals, especially a good understanding of the file name lookup mechanism. It's probably not too heavy on the coding side.

This is GNU Savannah task #6612. Also there are quite a few emails discussing this topic, from a previous year's GSoC application -- see http://lists.gnu.org/archive/html/bug-hurd/2007-03/msg00050.html, http://lists.gnu.org/archive/html/bug-hurd/2007-03/msg00114.html, http://lists.gnu.org/archive/html/bug-hurd/2007-06/msg00082.html, http://lists.gnu.org/archive/html/bug-hurd/2008-03/msg00039.html.

Possible mentors: Olaf Buddenhagen (antrik)

Exercise: Come up with a glibc patch that allows overriding one specific standard server using method (1).

The Hurd presently uses a TCP/IP stack based on code from an old Linux version. This works, but lacks some rather important features (like PPP/PPPoE), and the design is not hurdish at all.

A true hurdish network stack will use a set of translator processes, each implementing a different protocol layer. This way not only does the implementation get more modular, but the network stack can also be used far more flexibly. Rather than just having the standard socket interface, plus some lower-level hooks for special needs, there are explicit (perhaps filesystem-based) interfaces at all the individual levels; special applications can just directly access the desired layer. All kinds of packet filtering, routing, tunneling etc. can be easily achieved by stacking components in the desired constellation.

Implementing a complete modular network stack is not feasible as a GSoC project, though. Instead, the task is to take some existing user space TCP/IP implementation, and make it run as a single Hurd server for now, so it can be used in place of the existing pfinet. The idea is to split it up into individual layers later. The initial implementation, and the choice of a TCP/IP stack, should be done with this in mind -- it needs to be modular enough to make such a split later on feasible.

This is GNU Savannah task #5469.

Possible mentors: ?

Exercise: You could try making some improvement to the existing pfinet implementation; or you could work towards running some existing userspace TCP/IP stack on Hurd. (As a normal program for now, not a proper Hurd server yet.)

The Hurd has both NFS server and client implementations, which work, but not very well: File locking doesn't work properly (at least in conjunction with a GNU/Linux server), and performance is extremely poor. Part of the problems could be due to the fact that only NFSv2 is supported so far.

(Note though that locking on the Hurd is problematic in general, not only in conjunction with NFS -- see the file locking task.)

This project encompasses implementing NFSv3 support, fixing bugs and performance problems -- the goal is to have good NFS support. The work done in a previous unfinished GSoC project can serve as a starting point.

Both client and server parts need work, though the client is probably much more important for now, and shall be the major focus of this project.

Some discussion of NFS improvements has been done for a former GSoC application -- it might give you some pointers. But don't take any of the statements made there for granted -- check the facts yourself!

This task, GNU Savannah task #5497, has no special prerequisites besides general programming skills, and an interest in file systems and network protocols.

Possible mentors: ?

Exercise: Look into one of the existing issues in the NFS code. It's quite possible that you will not be able to fix any of the visible problems before the end of the application process; but you might discover something else you could improve in the code while working on it :-)

If you can't find anything suitable, talk to us about possible other exercise tasks.

Nowadays the most often encountered cause of Hurd crashes seems to be lockups in the ext2fs server. One of these could be traced recently, and turned out to be a lock inside libdiskfs that was taken and not released in some cases. There is reason to believe that there are more faulty paths causing these lockups.

The task is to systematically check the libdiskfs code for this kind of locking issue. To achieve this, some kind of test harness has to be implemented: for example, instrumenting the code to check locking correctness constantly at runtime, or implementing a unit testing framework that explicitly checks locking in various code paths. (The latter could serve as a template for implementing unit checks in other parts of the Hurd codebase...)
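
To give an idea of the runtime-instrumentation variant, here is a minimal sketch in Python (the real code is C, so this only illustrates the principle). A wrapper records which thread holds each lock, and a decorator reports any lock still held by the calling thread when an instrumented function returns. All names (`CheckedLock`, `check_locks`, `leak_reports`, `update_node`) are made up for this example.

```python
import threading

# (function name, [lock names]) for every detected leak.
leak_reports = []


class CheckedLock:
    """A lock that remembers which thread currently holds it."""

    def __init__(self, name):
        self.name = name
        self.owner = None
        self._lock = threading.Lock()

    def acquire(self):
        self._lock.acquire()
        self.owner = threading.get_ident()

    def release(self):
        self.owner = None
        self._lock.release()


def check_locks(*locks):
    """Decorator: report locks still held by the caller on return."""
    def decorate(func):
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            finally:
                me = threading.get_ident()
                held = [lock.name for lock in locks if lock.owner == me]
                if held:
                    leak_reports.append((func.__name__, held))
        return wrapper
    return decorate


node_lock = CheckedLock("node")


@check_locks(node_lock)
def update_node(fail):
    node_lock.acquire()
    if fail:
        return -1  # bug: early return without releasing the lock
    node_lock.release()
    return 0
```

The `update_node` function mimics the kind of bug described above: the normal path releases the lock, while the error path forgets to, and only the instrumented run makes that visible.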

(A systematic code review would probably suffice to find the existing locking issues; but it wouldn't document the work in terms of actual code produced, and thus it's not suitable for a GSoC project...)

This task requires experience with debugging locking issues in multithreaded applications.

Possible mentors: Samuel Thibault (youpi)

Exercise: If you could actually track down and fix one of the existing locking errors before the end of the application process, that would be excellent. This might be rather tough though, so probably you need to talk to us about an alternative exercise task...

The Hurd was originally created at a time when the pthreads standard didn't exist yet. Thus all Hurd servers and libraries are using the old cthreads package that came with Mach, which is not compatible with pthreads.

This means that people hacking on Hurd internals have to deal with a non-standard thread package, which nobody is familiar with. What's more, although a pthreads implementation for the Hurd was created in the meantime, it's not possible to use both cthreads and pthreads in the same program. Consequently, pthreads can't presently be used in any Hurd servers -- including translators.

(Thus it's impossible to use the Hurd libfuse with any FUSE modules depending on pthreads for example.)

Most of the conversion has already been done in previous efforts (see GNU Savannah task #5487) -- but the tricky parts are still missing.

The goal of this project is to have all the Hurd code use pthreads. Should any limitations in the existing pthreads implementation turn up that hinder this transition, they will have to be fixed as well.

This project requires relatively little Hurd-specific knowledge. Experience with multithreaded programming in general and pthreads in particular is required, though.

Possible mentors: Barry deFreese (bddebian), Samuel Thibault (youpi)

Exercise: Try to fix one of the outstanding issues with the work done so far. It's not yet complete, and there hasn't been much debugging yet, so it should not be too hard to find something needing improvement -- but if you don't see anything obvious, feel free to talk to us about an alternative exercise task.

The Hurd presently has no sound support. Fixing this, GNU Savannah task #5485, requires two steps: the first is to port some other kernel's drivers to GNU Mach so we can get access to actual sound hardware. The second is to implement a userspace server (translator), that implements an interface on top of the kernel device that can be used by applications -- probably OSS or maybe ALSA.

Completing this task requires porting at least one driver (e.g. from Linux) for a popular piece of sound hardware, and the basic userspace server. For the driver part, previous experience with programming kernel drivers is strongly advisable. The userspace part requires some knowledge about programming Hurd translators, but shouldn't be too hard.

Once the basic support is working, it's up to the student to use the remaining time for porting more drivers, or implementing a more sophisticated userspace infrastructure. The latter requires good understanding of the Hurd philosophy, to come up with an appropriate design.

Another option would be to evaluate whether a driver that is completely running in user-space is feasible.

Possible mentors: Samuel Thibault (youpi)

Exercise: This project requires kernel (driver framework) hacking as well as some Hurd server hacking; so the exercise should involve either of these, or even both. You could for example port some newer driver to run in the existing framework (see the device driver project description), or try to make some fix(es) to the unfinished random device implementation created by Michael Casadevall.

The most obvious reason for the Hurd feeling slow compared to mainstream systems like GNU/Linux is very slow hard disk access.

The reason for this slowness is the lack and/or poor implementation of common optimisation techniques, like scheduling reads and writes to minimize head movement; effective block caching; effective reads/writes to partial blocks; reading/writing multiple blocks at once; and read-ahead. The ext2 filesystem server might also need some optimisations at a higher logical level.

The goal of this project is to analyze the current situation, and implement/fix various optimisations, to achieve significantly better disk performance. It requires understanding the data flow through the various layers involved in disk access on the Hurd (filesystem, pager, driver), and general experience with optimising complex systems. That said, the killer feature we are definitely missing is read-ahead, and even a very simple implementation would bring very large speedups.
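
To illustrate why even a very simple read-ahead helps, here is a toy model in Python. It is not based on the Hurd pager code: the `ReadAheadCache` class, the fixed window size, and the "previous block plus one" heuristic are assumptions chosen only to show the principle. When a miss directly follows the previously read block, the cache fetches a whole window of blocks in a single device request.

```python
class ReadAheadCache:
    """Toy block cache with a simple sequential read-ahead heuristic."""

    def __init__(self, read_blocks, window=8):
        # read_blocks(start, count) -> list of `count` blocks
        self.read_blocks = read_blocks
        self.window = window
        self.cache = {}       # unbounded, for simplicity
        self.last = None      # number of the previously read block
        self.requests = 0     # how many device requests were issued

    def read(self, n):
        if n not in self.cache:
            # Fetch extra blocks only if the access looks sequential.
            sequential = self.last is not None and n == self.last + 1
            count = self.window if sequential else 1
            self.requests += 1
            for i, data in enumerate(self.read_blocks(n, count)):
                self.cache[n + i] = data
        self.last = n
        return self.cache[n]
```

Reading 16 consecutive blocks through this cache with a window of 8 issues 3 device requests instead of 16: one for the first block, then two windows of 8.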

Possible mentors: Samuel Thibault (youpi)

Exercise: Look through all the code involved in disk I/O, and try something easy to improve. It's quite likely though that you will find nothing obvious -- in this case, please contact us about a different exercise task.

Hurd/Mach presently make very bad use of the available physical memory in the system. Some of the problems are inherent to the system design (the kernel can't distinguish between important application data and discardable disk buffers for example), and can't be fixed without fundamental changes. Other problems however are an ordinary lack of optimisation, like extremely crude heuristics when to start paging. (See http://lists.gnu.org/archive/html/bug-hurd/2007-08/msg00034.html for example.) Many parameters are based on assumptions from a time when typical machines had like 16 MiB of RAM, or simply have been set to arbitrary values and never tuned for actual use.

The goal of this project is to bring the virtual memory management in Hurd/Mach closer to that of modern mainstream kernels (Linux, FreeBSD), by comparing the implementation to other systems, implementing any worthwhile improvements, and general optimisation/tuning. It requires very good understanding of the Mach VM, and virtual memory in general.

This project is related to GNU Savannah task #5489.

Possible mentors: Samuel Thibault (youpi)

Exercise: Make some modification to the existing VM code. You could try to find a piece of code that can be improved with simple code optimization, for example.

In traditional monolithic systems, the kernel keeps track of all mounts; the information is available through /proc/mounts (on Linux at least), and in a very similar form in /etc/mtab.

The Hurd on the other hand has a totally decentralized file system. There is no single entity involved in all mounts. Rather, only the parent file system to which a mountpoint (translator) is attached is involved. As a result, there is no central place keeping track of mounts.

As a consequence, there is currently no easy way to obtain a listing of all mounted file systems. This also means that commands like df can only work on explicitly specified mountpoints, instead of displaying the usual listing.

One possible solution to this would be for the translator startup mechanism to update the mtab on any mount/unmount, like in traditional systems. However, there are some problems with this approach. Most notably: what to do with passive translators, i.e., translators that are not presently running, but set up to be started automatically whenever the node is accessed? Probably these should be counted among the mounted filesystems; but how to handle the mtab updates for a translator that is not started yet? Generally, being centralized and event-based, this is a pretty inelegant, non-hurdish solution.

A more promising approach is to have mtab exported by a special translator, which gathers the necessary information on demand. This could work by traversing the tree of translators, asking each one for mount points attached to it. (Theoretically, it could also be done by just traversing all nodes, checking each one for attached translators. That would be very inefficient, though. Thus a special interface is probably required, that allows asking a translator to list mount points only.)

There are also some other issues to keep in mind. Traversing arbitrary translators set by other users can be quite dangerous -- and it's probably not very interesting anyways what private filesystems some other user has mounted. But what about the global /etc/mtab? Should it list only root-owned filesystems? Or should it create different listings depending on what user contacts it?...

That leads to a more generic question: which translators should actually be listed? There are different kinds of translators, ranging from traditional filesystems (disks and other actual stores), through purely virtual filesystems like ftpfs or unionfs, to things that have very little to do with a traditional filesystem, like a gzip translator, mbox translator, xml translator, or various device file translators... Listing all of these in /etc/mtab would be pretty pointless, so some kind of classification mechanism is necessary. By default it probably should list only translators that claim to be real filesystems, though alternative views with other filtering rules might be desirable.

After taking decisions on the outstanding design questions, the student will implement both the actual mtab translator, and the necessary interface(s) for gathering the data. It requires getting a good understanding of the translator mechanism and Hurd interfaces in general.
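
The on-demand traversal and the filtering question can be pictured with a small in-memory model. This Python sketch makes several assumptions: the `Translator` class, its `kind` attribute, and the `list_mount_points` method stand in for an interface that does not exist yet, and the classification labels are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Translator:
    # Hypothetical self-declared class, e.g. "filesystem" or "device".
    kind: str
    # Relative mount point -> translator attached there.
    children: dict = field(default_factory=dict)

    def list_mount_points(self):
        """Stand-in for the proposed 'list mount points only' call."""
        return self.children.items()


def gather_mtab(root, root_path="/", wanted=("filesystem",)):
    """Walk the tree of translators and list those of a wanted kind."""
    entries = []
    stack = [(root_path, root)]
    while stack:
        path, translator = stack.pop()
        if translator.kind in wanted:
            entries.append((path, translator.kind))
        for rel, child in translator.list_mount_points():
            stack.append((path.rstrip("/") + "/" + rel, child))
    return sorted(entries)
```

Alternative views then only differ in the `wanted` argument, for instance including virtual filesystems as well as real ones.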

Possible mentors: Olaf Buddenhagen (antrik)

Exercise: Make some improvement to any of the existing Hurd translators. Especially those in hurdextras are often quite rudimentary, and it shouldn't be hard to find something to improve.

Although there are some attempts to move to a more modern microkernel altogether, the current Hurd implementation is based on GNU Mach, which is only a slightly modified variant of the original CMU Mach.

Unfortunately, Mach was created about two decades ago, and is in turn based on even older BSD code. Parts of the BSD kernel -- file systems, UNIX mechanisms like processes and signals, etc. -- were ripped out (to be implemented in userspace servers instead); while other mechanisms were added to allow implementing stuff in userspace. (Pager interface, IPC, etc.)

Also, Mach being a research project, many things were tried, adding lots of optional features not really needed.

The result of all this is that the current code base is in a pretty bad shape. It's rather hard to make modifications -- to make better use of modern hardware for example, or even to fix bugs. The goal of this project is to improve the situation.

There are various things you can do here: Fixing compiler warnings; removing dead or unneeded code paths; restructuring code for readability and maintainability etc. -- a glance at the source code should quickly give you some ideas.

This task requires good knowledge of C, and experience with working on a large existing code base. Previous kernel hacking experience is an advantage, but not really necessary.

Possible mentors: Samuel Thibault (youpi)

Exercise: You should have no trouble finding something to improve when looking at the gnumach code, or even just at compiler warnings.

Hurd translators allow presenting underlying data in a different format. This is a very powerful ability: it allows using standard tools on all kinds of data, and combining existing components in new ways, once you have the necessary translators.

A typical example for such a translator would be xmlfs: a translator that presents the contents of an underlying XML file in the form of a directory tree, so it can be studied and edited with standard filesystem tools, or using a graphical file manager, or to easily extract data from an XML file in a script etc.

The exported directory tree should represent the DOM structure of the document, or implement XPath/XQuery, or both, or some combination thereof (perhaps XPath/XQuery could be implemented as a second translator working on top of the DOM one) -- whatever works well, while sticking to XML standards as much as possible.

Ideally, the translation should be reversible, so that another, complementary translator applied on the expanded directory tree would yield the original XML file again; and also the other way round, applying the complementary translator on top of some directory tree and xmlfs on top of that would yield the original directory again. However, with the different semantics of directory trees and XML files, it might not be possible to create such a universal mapping. Thus it is a desirable goal, but not a strict requirement.

The goal of this project is to create a fully usable XML translator that allows both reading and writing any XML file. Implementing the complementary translator as well would be nice if time permits, but is not a mandatory part of the task.

The existing partial (read-only) xmlfs implementation can serve as a starting point.

This task requires pretty good designing skills. Very good knowledge of XML is also necessary. Learning translator programming will obviously be necessary to complete the task.

Possible mentors: Olaf Buddenhagen (antrik)

Exercise: Make some improvement to the existing xmlfs, or some other existing Hurd translator. (Especially those in hurdextras are often quite rudimentary -- it shouldn't be hard to find something to improve...)

Posted 2009-03-05 18:20:56 UTC

In UNIX systems, traditionally most software is installed in a common directory hierarchy, where files from various packages live beside each other, grouped by function: user-invokable executables in /bin, system-wide configuration files in /etc, architecture specific static files in /lib, variable data in /var, and so on. To allow clean installation, deinstallation, and upgrade of software packages, GNU/Linux distributions usually come with a package manager, which keeps track of all files upon installation/removal in some kind of central database.

An alternative approach is the one implemented by GNU Stow: each package is actually installed in a private directory tree. The actual standard directory structure is then created by collecting the individual files from all the packages, and presenting them in the common /bin, /lib, etc. locations.

While the normal Stow package (for traditional UNIX systems) uses symlinks to the actual files, updated on installation/deinstallation events, the Hurd translator mechanism allows a much more elegant solution: stowfs (which is actually a special mode of unionfs) creates virtual directories on the fly, composed of all the files from the individual package directories.

The problem with this approach is that unionfs presently can be launched only once the system is booted up, meaning the virtual directories are not available at boot time. But the boot process itself already needs access to files from various packages. So to make this design actually usable, it is necessary to come up with a way to launch unionfs very early at boot time, along with the root filesystem.

Completing this task will require gaining a very good understanding of the Hurd boot process and other parts of the design. It requires some design skills also to come up with a working mechanism.

Possible mentors: ?

Posted 2009-03-05 18:20:56 UTC

In some situations it is desirable to have a file system that is not backed by actual disk storage, but only by anonymous memory, i.e. lives in the RAM (and possibly swap space).

A simplistic way to implement such a memory filesystem is literally creating a ramdisk, i.e. simply allocating a big chunk of RAM (called a memory store in Hurd terminology), and creating a normal filesystem like ext2 on that. However, this is not very efficient, and not very convenient either (the filesystem needs to be recreated each time the ramdisk is invoked). A nicer solution is having a real tmpfs, which creates all filesystem structures directly in RAM, allocating memory on demand.

The Hurd has had such a tmpfs for a long time. However, the existing implementation doesn't work anymore -- it got broken by changes in other parts of the Hurd design.

There are several issues. The most serious known problem seems to be that for technical reasons it receives RPCs from two different sources on one port, and gets mixed up with them. Fixing this is non-trivial, and requires a good understanding of the involved mechanisms.

The goal of this project is to get a fully working, full featured tmpfs implementation. It requires digging into some parts of the Hurd, including the pager interface and translator programming. This task probably doesn't require any design work, only good debugging skills.

Possible mentors: ?

Exercise: Take a look at tmpfs and try to fix one of the existing issues. Some of them are probably not too tricky; or you might discover something else you could improve while working on it. If you don't find anything obvious, contact us about a different exercise task.

Posted 2009-03-05 18:20:56 UTC

For historical reasons, UNIX filesystems have a real (hard) .. link from each directory pointing to its parent. However, this is problematic, because the meaning of "parent" really depends on context. If you have a symlink for example, you can reach a certain node in the filesystem by a different path. If you go to .. from there, UNIX will traditionally take you to the hard-coded parent node -- but this is usually not what you want. Usually you want to go back to the logical parent from which you came. That is called "lexical" resolution.
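To make the difference concrete, here is a minimal sketch of purely lexical path normalization in C. It only illustrates the idea of resolving .. textually: lexical_normalize is a hypothetical helper, not the Hurd's or glibc's actual lookup code, and it never consults the filesystem.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Resolve "." and ".." components of PATH purely textually, without
   looking at the filesystem (so symlinks are never followed).
   Returns a malloc()ed string, or NULL if allocation fails.
   Hypothetical helper, for illustration only.  */
char *
lexical_normalize (const char *path)
{
  assert (path != NULL);

  int absolute = (path[0] == '/');
  /* The result is at most one byte longer than the input ("" -> ".").  */
  char *out = malloc (strlen (path) + 2);
  if (out == NULL)
    return NULL;

  size_t o = 0;                 /* Write position in OUT.  */
  if (absolute)
    out[o++] = '/';
  size_t base = o;              /* ".." may not remove anything before this.  */

  const char *p = path;
  while (*p != '\0')
    {
      while (*p == '/')         /* Skip (possibly repeated) separators.  */
        p++;
      if (*p == '\0')
        break;

      const char *end = p;
      while (*end != '\0' && *end != '/')
        end++;
      size_t n = (size_t) (end - p);

      if (n == 1 && p[0] == '.')
        {
          /* "." refers to the current directory: nothing to do.  */
        }
      else if (n == 2 && p[0] == '.' && p[1] == '.')
        {
          if (o > base)
            {
              /* Drop the previous component and its separator.  */
              while (o > base && out[o - 1] != '/')
                o--;
              if (o > base)
                o--;
            }
          else if (!absolute)
            {
              /* Nothing left to drop in a relative path: keep the "..".  */
              if (o > 0)
                out[o++] = '/';
              memcpy (out + o, "..", 2);
              o += 2;
              base = o;
            }
          /* For an absolute path, ".." at the root stays at the root.  */
        }
      else
        {
          if (o > 0 && out[o - 1] != '/')
            out[o++] = '/';
          memcpy (out + o, p, n);
          o += n;
        }
      p = end;
    }

  if (o == 0)
    out[o++] = '.';
  out[o] = '\0';
  return out;
}
```

With this, lexical_normalize ("/home/user/link/..") yields "/home/user" no matter where the symlink "link" points, whereas traditional resolution would end up in the parent of the symlink's target.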

Some applications already use lexical resolution internally for that reason. It is generally agreed that many problems could be avoided if the standard filesystem lookup calls used lexical resolution as well. The compatibility problems probably would be negligible.

The goal of this project is to modify the filename lookup mechanism in the Hurd to use lexical resolution, and to check that the system is still fully functional afterwards. This task requires understanding the filename resolution mechanism.

See also GNU Savannah bug #17133.

Possible mentors: ?

Exercise: This project requires changes to the name lookup mechanism in the Hurd-related glibc parts, as well as the Hurd servers. Thus, the exercise task should involve hacking glibc or Hurd servers, or even both. Fixing the bug in the client-side nfs translator (/hurd/nfs) that makes "rmdir foo/" fail while "rmdir foo" works, seems a good candidate.

Posted 2009-03-05 18:20:56 UTC

As the Hurd attempts to be (almost) fully UNIX-compatible, it also implements a chroot() system call. However, the current implementation is not really good, as it allows easily escaping the chroot, for example by use of passive translators.

Many solutions have been suggested for this problem -- ranging from simple workarounds that change the behaviour of passive translators in a chroot; through changing the context in which passive translators are executed, or changing the interpretation of filenames in a chroot; to reworking the whole passive translator mechanism. Some involve a completely different approach to chroot implementation, using a proxy instead of a special system call in the filesystem servers.

See http://tri-ceps.blogspot.com/2007/07/theory-of-filesystem-relativity.html for some suggestions, as well as the followup discussions on http://lists.gnu.org/archive/html/gnu-system-discuss/2007-09/msg00118.html and http://lists.gnu.org/archive/html/bug-hurd/2008-03/msg00089.html.

The task is to pick and implement one approach for fixing chroot.

This task is pretty heavy: it requires a very good understanding of file name lookup and the translator mechanism, as well as of security concerns in general -- the student must prove that he really understands the security implications of the UNIX namespace approach, and how they are affected by the introduction of new mechanisms (translators). More important than the actual code is the documentation of what he did: he must be able to defend why he chose a certain approach, and explain why he believes this approach is really secure.

Possible mentors: ?

Exercise: It's hard to come up with a relevant exercise, as there are so many possible solutions... Probably best to make an improvement to one of the existing translators -- if possible, something touching name resolution or such, e.g. implementing file_reparent() in a translator that doesn't support it yet.

Posted 2009-03-05 18:20:56 UTC

Most GNU/Linux systems use pretty sophisticated package managers, to ease the management of installed software. These keep track of all installed files, and various kinds of other necessary information, in special databases. On package installation, deinstallation, and upgrade, scripts are used that make all kinds of modifications to other parts of the system, making sure the packages get properly integrated.

This approach creates various problems. For one, all management has to be done with the distribution package management tools, or otherwise they would lose track of the system state. This is reinforced by the fact that the state information is stored in special databases that only the special package management tools can work with.

Also, as changes to various parts of the system are made on certain events (installation/deinstallation/update), managing the various possible state transitions becomes very complex and bug-prone.

For the official (Hurd-based) GNU system, a different approach is intended: making use of Hurd translators -- more specifically their ability to present existing data in a different form -- the whole system state will be created on the fly, directly from the information provided by the individual packages. The visible system state is always a reflection of the sum of packages installed at a certain moment; it doesn't matter how this state came about. There are no global databases of any kind. (Some things might require caching for better performance, but this must happen transparently.)

The core of this approach is formed by stowfs, which creates a traditional UNIX directory structure from all the files in the individual package directories. But this only handles the lowest level of package management. Additional mechanisms are necessary to handle stuff like dependencies on other packages.

The goal of this task is to create these mechanisms.

Possible mentors: Ben Asselstine (bing)

Exercise: Make some improvement to any of the existing Hurd translators. Especially those in hurdextras are often quite rudimentary, and it shouldn't be hard to find something to improve.

Posted 2009-03-05 18:20:56 UTC

The primary means of distributing the Hurd is through Debian GNU/Hurd. However, the installation CDs presently use an ancient, non-native installer. The situation could be much improved by making sure that the newer Debian Installer works on the Hurd.

Some preliminary work has been done, see http://wiki.debian.org/DebianInstaller/Hurd.

The goal is to have the Debian Installer fully working on the Hurd. It requires relatively little Hurd-specific knowledge.

Possible mentors: Samuel Thibault (youpi)

Exercise: Try to get one piece of the installer running on Hurd.

Posted 2009-03-05 18:20:56 UTC

The Hurd design facilitates splitting up large applications into independent, generic components, which can be easily combined in different contexts, by moving common functionality into separate Hurd servers (translators), accessible through filesystem interfaces and/or specialized RPC interfaces.

Download protocols like FTP, HTTP, BitTorrent etc. are very good candidates for this kind of modularization: a program could simply use the download functionality by accessing FTP, HTTP etc. translators.

There is already an ftpfs translator in the Hurd tree, as well as an httpfs translator on hurdextras; however, these are only suitable for very simple use cases: they just provide the actual file contents downloaded from the URL, but none of the additional status information that is necessary for interactive use. (Progress indication, error codes, HTTP redirects etc.)

A new interface providing all this additional information (either as an extension to the existing translators, or as distinct translators) is required to make such translators usable as backends for programs like apt-get for example.

The goal of this project is to design a suitable interface, implement it for at least one download protocol, and adapt apt-get (or some other program) to use this as a backend.

This task requires some design skills and some knowledge of internet protocols, to create a suitable interface. Translator programming knowledge will have to be obtained while implementing it.

It is not an easy task, but it shouldn't pose any really hard problems either.

Possible mentors: Olaf Buddenhagen (antrik)

Exercise: Make some improvement to one of the existing download translators -- httpfs in particular is known to be buggy.

Posted 2009-03-13 11:46:00 UTC

libgtop is a library used by many applications (especially GNOME applications) to abstract the system-specific methods for obtaining information about the current state of the system -- processes running, system load etc.

A Linux-compatible procfs implementation has been created during GSoC 2008, and should cover a large part of the functionality of libgtop. However, not all necessary information is exported via /proc (even on Linux); there are some bits still missing in the Hurd procfs implementation; and there are a couple of bugs that need to be fixed to make it fully usable.

The goal of this project is a fully functional libgtop in Debian GNU/Hurd. Some application(s) using it also need to be ported, e.g. gnome-system-monitor.

Some bits of this work are easy, others need some digging into Hurd internals. This task doesn't require any specific previous knowledge (besides general C/UNIX programming skills of course); but during the course of the project, some knowledge about Hurd internals will have to be obtained, along with a bit of Debian stuff.

Possible mentors: Samuel Thibault (youpi)

Exercise: Fix one of the shortcomings in the existing procfs implementation.

Posted 2009-03-19 16:13:27 UTC

POSIX describes some constants (or rather macros) like PATH_MAX/MAXPATHLEN and similar, which may be defined by the system to indicate certain limits. Many people overlook the "may" though: systems should define them only if they actually have such fixed limits. The Hurd, following the GNU Coding Standards, tries to avoid this kind of arbitrary limit, and consequently doesn't define the macros.

Many programs however just assume their presence, and use them unconditionally. This is simply sloppy coding: not only does it violate POSIX and fail on systems not defining the macros, but in fact most common use cases of these macros are simply wrong! (See http://insanecoding.blogspot.com/2007/11/pathmax-simply-isnt.html for some hints as to why this is so.)

There are a few hundred packages in Debian GNU/Hurd failing to build because of this -- simply grep for the offending macros in the list_of_build_failures.

Fixing these issues usually boils down to replacing char foo[PATH_MAX] by char *foo, and using dynamic memory allocation, e.g. a loop that tries geometrically growing sizes. Sometimes this is tricky, but more often not very hard. Sometimes it is even trivial because the GNU system has proper replacements. See the corresponding section of the porting guidelines page for more details. With a bit of practice, it should be easily possible to fix several programs per day.
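As a rough illustration of such a loop, the following sketch reads a symlink target without any fixed-size buffer. The function name readlink_alloc and the initial size of 64 bytes are arbitrary choices made for this example, not something taken from the porting guidelines; only standard POSIX calls are used.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Return the target of the symlink PATH in a malloc()ed,
   NUL-terminated buffer, or NULL on error (errno is set).
   The buffer is doubled until the whole target fits, so no
   PATH_MAX is needed.  Hypothetical helper for illustration.  */
char *
readlink_alloc (const char *path)
{
  assert (path != NULL);

  size_t size = 64;             /* Arbitrary starting size.  */
  char *buf = NULL;

  for (;;)
    {
      char *tmp = realloc (buf, size);
      if (tmp == NULL)
        {
          free (buf);
          return NULL;
        }
      buf = tmp;

      ssize_t n = readlink (path, buf, size);
      if (n < 0)
        {
          int saved = errno;    /* free() must not clobber the error.  */
          free (buf);
          errno = saved;
          return NULL;
        }

      /* readlink() truncates silently, so a completely filled buffer
         may mean the target was cut off: grow and try again.  */
      if ((size_t) n < size)
        {
          buf[n] = '\0';
          return buf;
        }
      size *= 2;
    }
}
```

A caller that used to declare char target[PATH_MAX] would instead call readlink_alloc (path), check for NULL, and free() the result when done.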

The goal of this project is to fix the PATH_MAX and related problems in a significant number of packages, and make the fixes ready for inclusion in Debian and (where possible) upstream. No Hurd-specific knowledge is needed, nor any other special knowledge aside from general C programming skills.

Possible mentors: Samuel Thibault (youpi)

Exercise: Fix the PATH_MAX issues in some Debian package.

Posted 2009-03-21 01:00:35 UTC

The GNU Ada Translator (GNAT) isn't available for the Hurd so far. There are also a number of other Debian packages depending on GNAT, and thus not buildable on the Hurd.

The goal of this project is getting GNAT fully working in Debian GNU/Hurd. It requires implementing some explicitly system-specific stuff in GNAT, and maybe fixing a few other problems. Good knowledge of Ada is a must; some Hurd knowledge will have to be acquired while working on the project.

Possible mentors: Samuel Thibault (youpi)

Exercise: Fix one of the problems preventing GNAT from working on the Hurd.

Posted 2009-03-21 22:19:19 UTC

Many programs use special libraries to access certain hardware devices, like libusb, libbluetooth, libraw1394, libiw-dev (though there already is a wireless-tools-gnumach package), etc.

The Hurd presently doesn't support these devices. Nevertheless, all of these programs (kdebase for instance) could still be built -- and most of them would indeed be useful -- without actual support for these hardware devices. However, as the libraries are presently not available for the Hurd, the programs can't be easily built in Debian GNU/Hurd due to missing dependencies.

This could be avoided by providing dummy libraries, which the programs could link against, but which wouldn't actually do any hardware access: instead, they would simply return appropriate error codes, reporting that no devices were found.
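To give an idea of what such a dummy library amounts to, here is a small sketch in C. The demo_* names and types form a made-up API for illustration; a real stub would have to export exactly the symbols and signatures of the library it replaces (libusb, libbluetooth, etc.), which are not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Opaque types of a hypothetical device-access library.  */
struct demo_device;
struct demo_handle;

/* Initialising the library always succeeds: there is nothing to set up.  */
int
demo_init (void)
{
  return 0;
}

/* Enumerate devices.  The stub reports success with an empty list,
   which callers see as "no devices found".  */
int
demo_get_device_list (struct demo_device ***list)
{
  assert (list != NULL);
  *list = NULL;
  return 0;                     /* Number of devices found.  */
}

/* Opening a device can never succeed, as no device exists.  */
struct demo_handle *
demo_open (struct demo_device *dev)
{
  (void) dev;                   /* Unused in the stub.  */
  errno = ENODEV;
  return NULL;
}
```

A program linked against this stub builds and runs normally; it just behaves as it would on a machine where the hardware in question is not plugged in.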

There are two possible approaches for providing such stub libraries: Either implement replacement libraries providing the same API as the real ones; or implement dummy backends for the Hurd in the proper libraries. Which approach to prefer probably depends on the structure of the various libraries.

The goal of this project is to create working dummy libraries/backends for the mentioned devices, and get them into Debian GNU/Hurd. It shouldn't require any special previous knowledge, though some experience with build systems would be helpful. Finishing this task will probably require learning a bit about the hardware devices in question, and about Debian packaging.

Possible mentors: Samuel Thibault (youpi)

Exercise: Get one of the libraries to compile on Debian GNU/Hurd. It doesn't need to report reasonable error codes yet -- just make it build at all for now.

Posted 2009-03-22 19:40:30 UTC

The Hurd presently supports CD-ROMs, but not audio extraction ("grabbing"). As a result, cdparanoia (and other extraction libraries/utilities) are not available; and many other packages depending on these can't be built in Debian GNU/Hurd either.

Adding support for audio extraction shouldn't be too hard. It requires implementing a number of additional ioctl()s, generating the appropriate ATAPI commands.

The goal of this task is fully working cdparanoia in Debian GNU/Hurd. It will require digging a bit into Hurd internals and ATAPI commands, but should be quite doable without any previous knowledge about either.

Possible mentors: Samuel Thibault (youpi)

Exercise: Look at the implementation of the existing ioctl()s, and try to find something that could be easily added/improved. If you don't see anything obvious, talk to us about a different exercise task.

Posted 2009-03-22 19:59:46 UTC

Perl is available on the Hurd, but there are quite a lot of test suite failures. These could be caused by problems in the system-specific implementation bits of Perl, and/or shortcomings in the actual system functionality which Perl depends on.

The goal of this project is to fix all of these problems if possible, or at least some of them. Some issues might require digging quite deep into Hurd internals, while others are probably easy to fix.

Note that while some Perl knowledge is probably necessary to understand what the test suite failures are about, the actual work necessary to fix these issues is mostly C programming -- in the implementation of Perl and/or the Hurd.

Possible mentors: Samuel Thibault (youpi)

Exercise: Make some improvement to Perl support on the Hurd, e.g. fixing one of the known test suite failures.

Posted 2009-03-22 20:12:17 UTC

libcap is a library providing the API to access POSIX capabilities. These allow giving various kinds of specific privileges to individual users, without giving them full root permissions.

Although the Hurd design should facilitate implementing such features in a quite natural fashion, there is no support for POSIX capabilities yet. As a consequence, libcap is not available on the Hurd, and thus various packages using it cannot be easily built in Debian GNU/Hurd.

The first goal of this project is implementing a dummy libcap, which doesn't actually do anything useful yet, but returns appropriate status messages, so programs using the library can be built and run on Debian GNU/Hurd.

Having this, actual support for at least some of the capabilities should be implemented, as time permits. This will require some digging into Hurd internals.

Some knowledge of POSIX capabilities will need to be obtained, and for the latter part also some knowledge about the Hurd architecture. This project is probably doable without previous experience with either, though.

Possible mentors: Samuel Thibault (youpi)

Exercise: Make libcap compile on Debian GNU/Hurd. It doesn't need to actually do anything yet -- just make it build at all for now.

Posted 2009-03-24 17:57:47 UTC

Extended attributes (xattr) are a standardized, generic method for storing additional metadata along with a file (inode). Most modern UNIX filesystems support xattrs.

In general, xattrs should be used sparingly, as they are less transparent than data stored as explicit file contents; however, there are some cases where they really make sense. The Hurd's variant of ext2 presently uses some additional fields in the inode to store Hurd-specific metadata: most notably passive translator settings. As these fields are Hurd-specific, they can't be accessed by the standard methods from Linux for example, so it's not possible to fully work with a Hurd filesystem on GNU/Linux (copy, backup etc.); and also, even when on Hurd, only tools that explicitly support the Hurd-specific information can handle them.

Using extended attributes instead of custom fields for the Hurd-specific information would be very helpful.

The most important goal of this project thus is to make the Hurd ext2fs server able to store and read the Hurd-specific information with extended attributes instead of the custom fields, so it becomes accessible from other systems. Being able to access the information through the standard xattr API instead of Hurd-specific calls is also desirable. (This in turn requires implementing the generic xattr API first, which can be useful for other purposes as well.)
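For orientation, this is roughly what reading such metadata through the xattr API looks like from the GNU/Linux side. The sketch uses getxattr() from <sys/xattr.h> as provided by glibc on Linux; read_xattr_alloc is a hypothetical helper, and which attribute names the Hurd-specific information would end up under is a design decision of the project, not something assumed here.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Read the extended attribute NAME of the file PATH into a malloc()ed,
   NUL-terminated buffer and store its length in *LEN.  Returns NULL on
   error, with errno set (e.g. ENODATA if the attribute does not exist).
   Hypothetical helper for illustration.  */
char *
read_xattr_alloc (const char *path, const char *name, size_t *len)
{
  assert (path != NULL && name != NULL && len != NULL);

  /* A zero-sized buffer only asks for the current size of the value.  */
  ssize_t size = getxattr (path, name, NULL, 0);
  if (size < 0)
    return NULL;

  char *value = malloc ((size_t) size + 1);
  if (value == NULL)
    return NULL;

  ssize_t got = getxattr (path, name, value, (size_t) size);
  if (got < 0 || got > size)
    {
      /* Either a real error, or the value grew between the two calls.  */
      int saved = (got < 0) ? errno : ERANGE;
      free (value);
      errno = saved;
      return NULL;
    }

  value[got] = '\0';
  *len = (size_t) got;
  return value;
}
```

A backup tool on GNU/Linux could call this with whatever attribute name the Hurd ext2fs ends up using, and copy the value along with the file, without knowing anything about the Hurd.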

Completing this project will require digging into some parts of the Hurd, but it should be quite doable without previous Hurd experience. Some experience with xattrs might help a bit, but shouldn't be really necessary either.

Some previous work on xattr support is available in GNU Savannah patch #5126, and might serve as a starting point.

Possible mentors: Samuel Thibault (youpi)

Exercise: Implement support for different inode sizes (other than 128 bytes) in Hurd's ext2fs.

Posted 2009-03-25 04:24:15 UTC