Summer of Code project suggestions for GNU

2007 Ideas list

This page lists the project proposals for 2007. The Summer of Code program for 2007 is finished, and the Free Software Foundation is collecting project ideas on the current Google Summer of Code ideas page. If you have an idea, please email it to summer-of-code@gnu.org.

STUDENTS - BEFORE YOU SUBMIT YOUR PROJECT PROPOSAL:
Please make sure that you have read the GNU Project's guidelines for Summer of Code projects. In particular, please make sure you include all the information we need.

Many GNU projects have more than one suggestion, so they're listed in alphabetical order by project.

Bison - CSSC - DotGNU - Findutils - GNOME - GNOWSYS - GNUNet - GRUB - Hurd - libextractor - Mailutils - phpGroupWare - Shishi - www.gnu.org webserver

About adding ideas to this page:

If you are an eligible student and have an idea that is not listed here, you should propose it normally through the Google Summer of Code web site. Also, if you are a student, feel free to ask questions about the overall process at Summer-Discuss-2006@googlegroups.com, and see the guidelines for more info.
If you are a GNU package developer, have an idea for a Summer of Code project for your own package, and can mentor it yourself, please email the idea to us at summer-of-code@gnu.org and we will add it.
In all other cases (e.g., you are a developer with an idea for another package), please contact the maintainer for the package. If you can find a mentor for the project (or, hopefully, can mentor it yourself), then we will add it if it is feasible. The project must meet the Summer of Code criteria; see the guidelines.

GNU Bison

GNU Bison is the GNU project's parser generator.

Extension to other languages.
Currently Bison generates LALR(1) and GLR parsers in C and C++. Support for more output languages would be useful: Scheme and C#, to name just two.
Extension of the front-end features.
Many features are desirable in Bison, all of them very reachable, provided one has the time to address them:
- One would like to be able to split the grammar into several components. An "import" feature would be welcomed by our users.
- Labeling symbols would dramatically improve the readability and maintainability of our grammars. For instance
  exp (res): exp (a) '+' exp (b) { $res = $a + $b; };
  or even
  r:exp -> a:exp '+' b:exp { r = a + b; };
  The Bison community is converging on a syntax, so the candidate will not be left alone.
- The handling of precedences and associativities is not flexible enough. For instance, one sometimes wants to specify precedences within two distinct groups of operators, but not between the two groups. That is currently not possible. Using partial orders seems a better option.
- Many more! (Contact us.)
Burke-Fisher Error Correction
The YACC error recovery scheme is quite poor: the author of the grammar has to clutter her grammar with special annotations specifying how to recover from an error. As the term "recovery" suggests, this is not even correction: most of the time, the intent is merely to recover from the error, i.e., to continue as far as we can while completely ignoring the contents of the erroneous text.
The Burke-Fisher Error Correction (or repair) algorithm tries to address these issues by attempting, by itself, the insertion, deletion and replacement of various tokens around the error spot. In addition, the user is provided with new directives to specify the semantic values of the tokens to create.
This scheme is
- nice for the grammar author
- nice for the users
Some inspiration can be taken from SML YACC, which provides this service.
The work would consist of implementing Burke-Fisher in Bison, for C output only as a first stage. Andrew Appel's "Modern Compiler Implementation" gives some information about it, but it is certainly not sufficient. Writing documentation and test cases in the current testing framework is a mandatory part of the project.
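The repair idea can be illustrated with a toy sketch. The Python code below is not Bison code and not the full Burke-Fisher algorithm: it assumes a tiny hand-written expression recognizer, and it accepts a repair only if the whole input then parses, whereas the real algorithm checks that parsing can continue some distance past the error. The names TOKENS, error_position and repairs are hypothetical.

```python
# Toy illustration of single-token repair in the spirit of Burke-Fisher.
# The grammar, the token set, and all helper names are hypothetical.

TOKENS = ["num", "+", "*", "(", ")"]


def error_position(tokens):
    """Return the index of the first rejected token, len(tokens) if the
    input ends too early, or None if the input parses."""
    pos = 0

    class Reject(Exception):
        pass

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise Reject
        pos += 1

    def expr():            # expr : term ('+' term)*
        term()
        while peek() == "+":
            expect("+")
            term()

    def term():            # term : factor ('*' factor)*
        factor()
        while peek() == "*":
            expect("*")
            factor()

    def factor():          # factor : num | '(' expr ')'
        if peek() == "(":
            expect("(")
            expr()
            expect(")")
        else:
            expect("num")

    try:
        expr()
        if pos != len(tokens):
            raise Reject
    except Reject:
        return pos
    return None


def repairs(tokens, window=2):
    """List single-token edits near the error that make the input parse."""
    err = error_position(tokens)
    if err is None:
        return []
    found = []
    for i in range(max(0, err - window), min(err, len(tokens)) + 1):
        for tok in TOKENS:                        # insert tok before position i
            if error_position(tokens[:i] + [tok] + tokens[i:]) is None:
                found.append(("insert", i, tok))
        if i < len(tokens):
            if error_position(tokens[:i] + tokens[i + 1:]) is None:
                found.append(("delete", i, tokens[i]))
            for tok in TOKENS:                    # replace token i with tok
                if tok != tokens[i]:
                    cand = tokens[:i] + [tok] + tokens[i + 1:]
                    if error_position(cand) is None:
                        found.append(("replace", i, tok))
    return found


print(repairs(["num", "+", "+", "num"]))
```

For the input num + + num, the candidates include deleting one of the two '+' tokens and inserting a num between them. Choosing among candidates, and giving inserted tokens a semantic value, is where the new directives mentioned above would come in.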
XML Automaton Report
Building a parser for a complex language is a very iterative process. The automaton report in the *.output file plays a central role in this process; it contains all the relevant information:
- A summary of the grammar
- A table of conflicts
- Information about the symbols
- The description of the automaton, state by state:
  - Current items (pointed rules) and look-aheads
  - Actions (shift/reduce etc.)
  - Conflicts
Unfortunately, navigating this file is painful. Hyperlinks would make navigation much easier and more efficient: clicking in the table of conflicts would take the reader to the right spot, clicking on an item would lead to the rule in the grammar, clicking on a symbol would show its definition, and so forth.
A graphical representation is sometimes helpful. Bison already features VCG output, but it is poor, and not very useful.
Some users would like richer information, for instance a complete section for precedence and associativity conflict resolution, etc.
All these needs could be addressed simply if Bison had an XML format for its automaton output. XSLT transformations would then enable any kind of output or use, including graphical output with hyperlinks to an HTML page, and back.
The candidate will have to:
- Define an XML grammar for Bison reports
- Implement its production in Bison
- Implement an XSLT transformation to produce an HTML page
- Implement an XSLT transformation to produce a Dot file
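As a rough sketch of the last step, the Python code below turns a small XML report into a Dot graph. It stands in for the XSLT transformation the project asks for, and the report format is invented for illustration: the element and attribute names (automaton, state, item, transition, number, rule, symbol, target) are assumptions, not an existing Bison output format.

```python
import xml.etree.ElementTree as ET

# Hypothetical report format, invented for this sketch.
REPORT = """\
<automaton>
  <state number="0">
    <item rule="exp: . exp '+' exp"/>
    <transition symbol="exp" target="1"/>
  </state>
  <state number="1">
    <item rule="exp: exp . '+' exp"/>
    <transition symbol="'+'" target="0"/>
  </state>
</automaton>
"""


def to_dot(xml_text):
    """Render the states and transitions of the report as a Dot graph."""
    root = ET.fromstring(xml_text)
    lines = ["digraph automaton {"]
    for state in root.findall("state"):
        num = state.get("number")
        rules = [item.get("rule") for item in state.findall("item")]
        # A literal backslash followed by n is Dot's line break in a label.
        label = "\\n".join([f"State {num}"] + rules)
        lines.append(f'  s{num} [shape=box, label="{label}"];')
        for tr in state.findall("transition"):
            target = tr.get("target")
            symbol = tr.get("symbol")
            lines.append(f'  s{num} -> s{target} [label="{symbol}"];')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(REPORT))
```

An HTML transformation would follow the same pattern, emitting one anchor per state and rule so that conflicts and items can link to each other.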

GNU CSSC

DotGNU

If you want to discuss these project ideas or want to ask about an idea of your own, you can contact the DotGNU developers by email on the DotGNU mailing list, or by IRC on the #portable.net or #dotgnu channels on irc.freenode.net.

Finish libJIT ELF writer (Complexity: medium)
Read the libJIT rationale for an introduction to the DotGNU JIT Library (libJIT) and the reasoning behind it. The libJIT library contains routines that permit pre-compiling JIT'ed functions to an on-disk representation. This representation can be loaded at some future time, to avoid the overhead of compiling the functions at runtime. We use the ELF format for this purpose, which is a common binary format used by modern operating systems and compilers. GNU/Linux uses ELF. However, it isn't necessary for your operating system to be based on ELF natively. We use our own routines to read and write ELF binaries. We chose ELF because it has all of the features that we require, and reusing an existing format was better than inventing a completely new one.
Port libJIT to a new architecture (Complexity: medium)
You could port libJIT to a new architecture, for example OpenRISC, SPARC, MIPSEL and so on. For this project, you should be familiar with compiler implementation techniques and the particulars of the target CPU's instruction set. The libJIT manual describes the steps needed for porting libJIT to new architectures.
Enhance the libJIT interpreter (Complexity: medium-high)
LibJIT includes an interpreter for running code on platforms that don't have a native code generator yet. This reduces the need for programmers to write their own interpreters for such platforms. Essentially, this project means making the regression tests run by 'make check' in the Portable .NET directory pass when the interpreter is used.
Finish the implementation of libJIT support for ARM or x86-64 (Complexity: medium)
For this project, you should be familiar with compiler implementation techniques and the particulars of the target CPU's instruction set. The libJIT manual describes the steps needed for porting libJIT to new architectures. We can provide access to ARM and x86-64 machines, and indeed machines with other CPUs too.
Enhance libJIT support for x86 (Complexity: medium)
LibJIT includes a set of primitive code generators. However, the current implementation calls intrinsic functions for opcodes with long and float values. These need to be implemented as primitive code generators instead.
Enhance libJIT optimization (Complexity: medium-high)
For example, implement inlining, enhance constant propagation or dead-code elimination.
Porting Application (Complexity: medium)
There are a number of Free applications using .NET which currently do not run under DotGNU. Pick any non-trivial Free application and propose a Summer-of-Code project to make it work under DotGNU. The CodeProject contains many software projects that are interesting, but they are likely small. Ports should aim to create a helper class library to assist in the porting. Basically, every time a P/Invoke is found in one of these applications or a dependency exists on a third-party control or library, some stubs or primitive implementation should be exposed in this "helper" library. This includes Windows.Forms, XML, and Internet applications.
Enhance Windows.Forms (Complexity: medium)
The Portable .NET Windows.Forms library implements much of .NET 1.1, but many parts are still missing. None of the .NET 2.0-specific Windows.Forms features are implemented yet. This project would significantly improve the completeness of the .NET 1.1 or .NET 2.0 implementation.
Replacing CIL with native code. (Complexity: very high)
DotGNU contains a code generator that can be used for Just-in-Time compilation at runtime. Code can also be compiled ahead of time to produce native code before it's needed. JIT compilation is more commonly used, but for some systems where memory is restricted or where startup time is important, pre-compiling the code can be a significant win. The goal is to modify the runtime and compilation so that the bytecodes can be safely removed from a program and a single image is shipped containing both metadata and native code.
Implement generics or any other C# 2.0, 3.0 feature. (Complexity: very high)
The Portable .NET C# compiler is based on the treecc tool.
Object-oriented C# bindings for Allegro (Complexity: medium-high)
This project would provide C# bindings for the Allegro library. This includes not only being able to call Allegro functions from C#, but also being able to do so in a way which 'feels natural' for the C# language. While the first part of the task is technically straightforward to define, the second part will require some thoughtful interface design. The Allegro library is a free video game software library, with functions for basic 2D graphics, image manipulation, text output, audio output, MIDI music, input and timers. It also includes additional routines for things like fixed-point and floating-point matrix arithmetic, Unicode strings, file system access, file manipulation, data files, and (limited, software-only) 3D graphics.

GNU findutils

Replacement of updatedb and better support in xargs for ARG_MAX
This project suggestion contains a number of smaller tasks, so it has more than one deliverable item.
- Enhance locate
  1. Implement a replacement for the current updatedb shell script which does pretty much the same thing but is less ugly. Don't introduce a dependency on anything not in the base system install (i.e. /bin/sh and C are OK, but Perl probably isn't).
- Enhance xargs
  1. Implement an optional feature in which xargs figures out how long a command line it can pass to exec() without necessarily believing ARG_MAX (because for example with the Linux kernel this can be an underestimate).
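The probing idea for xargs can be sketched as follows. This is a Python illustration, not findutils code, and it assumes a POSIX system where an over-long command line makes exec fail with E2BIG. The helper names fits and probe_limit and the CHUNK size are hypothetical. The environment also counts towards the limit, so the result depends on the environment in which the probe runs.

```python
import errno
import os
import subprocess
import sys

CHUNK = 1024  # size of each probe argument, kept small on purpose


def fits(total_bytes):
    """Return True if a command line of roughly total_bytes can be exec'ed."""
    args = ["x" * (CHUNK - 1)] * (total_bytes // CHUNK)
    try:
        # The child ignores its arguments; only the exec itself matters.
        subprocess.run([sys.executable, "-c", "pass"] + args, check=True)
        return True
    except OSError as exc:
        if exc.errno == errno.E2BIG:
            return False
        raise


def probe_limit(lo, hi):
    """Binary search for the largest size in [lo, hi] that exec accepts.

    Assumes fits(lo) is True; if fits(hi) is also True, hi is returned.
    """
    if fits(hi):
        return hi
    while hi - lo > CHUNK:
        mid = (lo + hi) // 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo


# The advertised value, for comparison with a probed one.
print("advertised ARG_MAX:", os.sysconf("SC_ARG_MAX"))
```

A call such as probe_limit(4096, 8 * 1024 * 1024) would then report how much the system actually accepts, to within one CHUNK. A real implementation would want to cache the result rather than spawn probe processes on every run.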
Enhanced test coverage for find
There are two distinct parts to this project suggestion:
- Time-based predicates
  Many of the find predicates are not well tested by the findutils regression test suite. It would be good to find a way of enhancing the test suite to cover these predicates well without having to do things like touch foo; sleep 1; touch bar (which in any case will probably not work well on FAT filesystems, since they have a coarse timestamp granularity).
- Enhance the test suite to measure its own test coverage, and improve the test coverage
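One way to avoid sleeping is to set the timestamps explicitly. The Python sketch below is only an illustration of that approach and is not part of the findutils test suite: make_file and newer_than are hypothetical helpers, and it assumes a find program is available on the PATH. The timestamps are whole, even seconds so that a coarse granularity such as FAT's does not change the ordering.

```python
import os
import subprocess
import tempfile


def make_file(directory, name, mtime):
    """Create an empty file whose access and modification times are mtime."""
    path = os.path.join(directory, name)
    with open(path, "w"):
        pass
    os.utime(path, (mtime, mtime))
    return path


def newer_than(directory, reference):
    """Run find's -newer predicate and return the matching base names."""
    result = subprocess.run(
        ["find", directory, "-type", "f", "-newer", reference],
        check=True, capture_output=True, text=True,
    )
    return sorted(os.path.basename(p) for p in result.stdout.splitlines())


with tempfile.TemporaryDirectory() as tmp:
    foo = make_file(tmp, "foo", 1_000_000_000)   # older file
    make_file(tmp, "bar", 1_000_000_010)         # ten seconds newer
    print(newer_than(tmp, foo))
```

The same pattern extends to predicates such as -mtime or -mmin by choosing timestamps relative to the current time instead of fixed values.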
Be more NFS friendly
Allow locate to pick up databases from mount points, so that many NFS clients can share the same locate database which we build on the server. Automatically adjust the path prefix of the results where required. Ideally this should work without any requirement for locate to read a separate external configuration file. If you change the interpretation of $LOCATE_PATH, do this in a backward-compatible way.
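The prefix adjustment itself is simple, as the following Python sketch suggests. It is not locate code: adjust_prefix is a hypothetical helper, and the example paths /export/home and /mnt/home are made up. How locate would discover the server root and the mount point is the harder part of the task and is not addressed here.

```python
import posixpath


def adjust_prefix(path, server_root, mount_point):
    """Map a path recorded on the server to where it appears on the client.

    Returns None when the path lies outside the exported tree.
    """
    server_root = server_root.rstrip("/") or "/"
    if path == server_root:
        return mount_point
    prefix = server_root if server_root.endswith("/") else server_root + "/"
    if not path.startswith(prefix):
        return None
    return posixpath.join(mount_point, path[len(prefix):])


print(adjust_prefix("/export/home/alice/notes.txt", "/export/home", "/mnt/home"))
```

Comparing against the root plus a trailing slash keeps a sibling directory such as /export/homework from being mistaken for part of /export/home.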
Near-real-time enhancements for locate and updatedb
At the moment the locate database is rebuilt periodically, but already there are a number of systems (for example, inotify) for monitoring filesystems for changes in real-time. Perhaps one of those is suitable for monitoring an entire filesystem. If so, it may be possible to build a locate command whose results are no more than seconds out of date. It's likely that this task would fit best with the previous one, since a near-real-time database might apply only to one filesystem.
Update: Feedback from GNU/Linux filesystem developers raises a few problems with this idea:
1. The dnotify interface is apparently preferred over inotify.
2. Requesting a notification for every directory on a filesystem would be tremendously resource intensive.
3. Many systems have a number of directories which have huge rates of change (for example mail queues).
Plugin architecture
Design and build a plugin architecture for findutils (and, in particular, find). Plugins should be able to:
- Add new predicates, actions, and options
- Replace existing ones
- Not be implemented in such a way that one plugin normally prevents the use of another
It should be possible to build a plugin separately from the findutils source code. This may mean that findutils may need to install a header file to define the plugin interface.
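The requirements above can be pictured with a small registry. The Python sketch below only illustrates the shape of such an interface; findutils is written in C, and the Registry class, the predicate names -bigger and -endswith, and the registration functions are all hypothetical.

```python
import os


class Registry:
    """Maps predicate names (for example "-bigger") to callables."""

    def __init__(self):
        self._predicates = {}

    def add(self, name, func, replace=False):
        """Register func under name; replacing must be requested explicitly."""
        if name in self._predicates and not replace:
            raise ValueError(f"{name} is already defined; pass replace=True")
        self._predicates[name] = func

    def evaluate(self, name, path, arg):
        """Apply the named predicate to one path."""
        return self._predicates[name](path, arg)


# Two hypothetical plugins. Each needs only the Registry interface, so each
# could be built separately from the program that loads it.
def register_size_plugin(registry):
    registry.add("-bigger", lambda path, arg: os.path.getsize(path) > int(arg))


def register_name_plugin(registry):
    registry.add("-endswith", lambda path, arg: path.endswith(arg))


registry = Registry()
register_size_plugin(registry)
register_name_plugin(registry)
print(registry.evaluate("-endswith", "notes.txt", ".txt"))
```

Refusing silent replacement is one way to keep two plugins from interfering by accident while still allowing a plugin to replace an existing predicate on purpose. In C, the Registry interface would correspond to the installed header file mentioned above.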

GNOWSYS

GNOWSYS (Gnowledge Networking and Organizing System) is a GNU project emerging as a kernel for semantic computing. See the project website and the project documentation site for more details.

The following specific tasks under this project can be taken up by students as two-month projects.

Currently GNOWSYS saves the data, metadata and catalog in ZODB, the native ZOPE object database. GNOWSYS is currently used to store massive knowledge bases such as Wikipedia and WordNet, along with free software documentation such as TLDP, man pages, info pages and other free knowledge resources, and to organize them into a distributed network of free knowledge resources, e.g. the gnowledge portal.
We found that several of ZOPE's features are not necessary for GNOWSYS, so its dependency on ZOPE can be gradually reduced. One step in this direction is to store each object as a pickled Python object in the file system, holding both the data and the metadata of each knowledge resource, while keeping an index of the objects inside ZODB for serving them through an HTTP server. This will make GNOWSYS scale well for massive distributed knowledge bases.

Skills required: Programming in Python, basic understanding of ZOPE, sound understanding of file systems.
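A minimal sketch of the storage scheme described above, assuming one pickle file per object. It is not GNOWSYS code: the store and load helpers, the file naming, and the object layout are hypothetical, and a plain dictionary stands in for the index that would be kept inside ZODB.

```python
import os
import pickle
import tempfile


def store(root, index, obj_id, data, metadata):
    """Pickle one knowledge resource to its own file and record it in the index."""
    path = os.path.join(root, f"{obj_id}.pickle")
    with open(path, "wb") as fh:
        pickle.dump({"data": data, "metadata": metadata}, fh)
    index[obj_id] = path   # stand-in for the index kept in ZODB
    return path


def load(index, obj_id):
    """Look the object up in the index and unpickle it from the file system."""
    with open(index[obj_id], "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as root:
    index = {}
    store(root, index, "node-1", "Bison is a parser generator.", {"source": "info"})
    print(load(index, "node-1")["metadata"])
```

With millions of objects, a real layout would probably spread the files over subdirectories rather than keep them in one directory.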
Automated scripts to harvest data and metadata from structured free software documentation (GNU project documentation, Texinfo pages, TLDP, man pages, etc.) and to create granular, reusable objects for building courseware for free software. These objects will be made available as seed content for organizing courses at the gnowledge portal. It is desirable that the harvested objects are encoded in a standard XML representation.
Skills required: Python/Perl/Bash scripting abilities, particularly in string processing, and working knowledge of the HTTP protocol. Since most other projects use Python extensively, it is desirable that these scripts are also written in Python.
GNOWSYS is designed with the emerging semantic web in mind. GNOWSYS does not use XML for storing the metadata of the resources. However, in order to work with other agent-oriented systems on the network, it is necessary to build a single unified framework for importing and exporting the knowledge base and its organization in RDF, OWL, CL, KIF, LISP, Prolog, CNL and other such languages for seamless interoperability.
Skills required: Python, parsing XML files using SAX/DOM libraries, good understanding of FOL (first-order logic) and its mapping to XML.
Gnowser: The knowledge represented in GNOWSYS can be drawn as concept maps in the Mozilla Firefox browser or a GTK client. Using appropriate toolkits on the client side, e.g. XUL, JavaScript, or Ajax, a graphical representation of the data can be made for semantic navigation, as opposed to monotonous hypertext navigation. Such a tool will be very useful for e-learning applications.
Skills required: PyGTK for GTK based gnowser, or XUL, XPCOM, JavaScript for Mozilla Firefox.
GnowQL (the GNOWSYS query library) is the basic API of GNOWSYS. Since GNOWSYS is a database, we need to make GnowQL work like SQL and OQL. The requirement is to parse SQL and OQL statements and map them to GnowQL, and vice versa.
Skills required: Python, sound knowledge of RDBMS, SQL, and familiarity with OODBMS, OQL.

GNUnet and GNU libextractor

The team would like to receive students' own ideas on how to improve both projects (including of course the subprojects gnunet-gtk and gnunet-qt). Talk to us! You can find us on the IRC channel #gnunet and on the gnunet-developers mailing list.

GRUB

There is a list of project suggestions for GRUB.

GNU Hurd

The GNU Hurd is the GNU project's replacement for the Unix kernel. The Hurd is a collection of servers that run on the Mach microkernel to implement file systems, network protocols, file access control, and other features that are implemented by the Unix kernel or similar kernels (such as Linux).

The following is a list of items you might want to work on. If you want to modify these task proposals or have your own ideas on what to work on, then please don't hesitate to contact us on the bug-hurd mailing list or the #hurd IRC channel.

Design and implement libchannel, a library for streams.
Rewrite pfinet, our interface to the IPv4 world; create a pfinet6 to interface to the IPv6 world.
Make GNU Mach use more up-to-date device drivers.
Design and implement a sound system.
Introduce the world of the Andrew File System (AFS) to the Hurd.
Work on enhancing our NFS client and NFSd.
Implement support for Logical Volume Management (LVM).

GNU Mailutils

Add TLS capability to IMAP and POP client code.
All Mailutils servers (pop3d and imap4d) support encryption via TLS. However, none of the client programs (mail, sieve, etc.) support it. This should be fixed.
Implement debugging capability.
We need a universal mechanism for enabling and disabling debugging output in various parts of the library. A debug_t object is currently implemented, but it is not widely used in the library.
Universal aliasing support
At the moment, only Mailutils MH utilities support email aliasing. It would be nice to have aliasing support in the base libmailutils library.
Generic mail search interface.
Various parts of the package implement searching in mailboxes or folders, for example imap4d and MH. However, each of them implements it separately. We would like to have a generic mailbox search interface in the library, so that it can be used by application programs. The interface should allow searching for messages containing arbitrary strings in given parts of the message (envelope, headers, body). It should support various encodings and character sets.
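To make the shape of such an interface concrete, here is a small Python sketch using the standard email module. It is not Mailutils code, which is written in C: the search and body_text functions and the sample message are hypothetical, only headers and body are covered (not the envelope), and matching is a plain case-insensitive substring test.

```python
from email import message_from_string

RAW = """\
From: alice@example.org
To: bob@example.org
Subject: Release plans

The next release is scheduled for autumn.
"""


def body_text(msg):
    """Concatenate the decoded text parts of a message."""
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)


def search(messages, needle, where="body"):
    """Yield the messages that contain needle in the headers or the body."""
    for msg in messages:
        if where == "headers":
            haystack = "\n".join(f"{k}: {v}" for k, v in msg.items())
        elif where == "body":
            haystack = body_text(msg)
        else:
            raise ValueError(f"unknown message part: {where}")
        if needle.lower() in haystack.lower():
            yield msg


mailbox = [message_from_string(RAW)]
print([m["Subject"] for m in search(mailbox, "autumn")])
```

Decoding each part with its declared character set before matching is what lets one search routine serve messages in different encodings. A library version would also need to decode encoded header words.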

phpGroupWare

phpGroupWare is open to suggestions from students. If you have an idea that is not on this list, please post a message on the forum so we can discuss it. The list below is what the developers came up with (in no particular order).

GNU Shishi

Shishi is a free Kerberos 5 implementation. The goal is to be compatible with MIT Kerberos, Heimdal and Windows, and most basic features work. It can support Kerberos authentication in SSH (OpenSSH and LSH) and SASL (Cyrus SASL and GNU SASL), and supports Kerberos rsh/telnet via GNU InetUtils.

The following is a list of items you might want to work on. If you want to modify or extend these tasks or have your own ideas on what to work on, please feel free to contact us on the help-shishi mailing list.

Implement the set/change password protocol, see draft-ietf-krb-wg-kerberos-set-passwd-06.txt. This would make it possible to change passwords remotely, through a standardized protocol.
Implement cross-realm authentication logic.
Improve the PAM module for host-based authentication using Shishi, possibly including a mechanism to automatically create Kerberos principals for existing "password"-based users, to allow smooth transition.
Implement an LDAP backend for the Kerberos server.
Implement Public-Key Cryptography for Initial Authentication in Kerberos; see RFC 4556. This is another way to support X.509 authentication in Kerberos, compared to the one which Shishi already supports through TLS.

www.gnu.org webserver

The GNU Project web server is an essential resource for people who want to know about the GNU Project or locate and use GNU software. Maintaining and enhancing it is important to the GNU Project. There is a list of tasks needed to improve the www.gnu.org web server. You might find that one of the tasks there would make a good Summer of Code project.

Other GNU Projects

Some GNU projects are registered as separate projects in the Google Summer of Code (or at least were in 2006; the registration process for 2007 is not yet complete). These include:

GCC (but please also see the GCC Summer of Code wiki page, which will be updated for 2007 soon)
GIMP
GNOME