Information for GNU grep developers

1  Generic GNU information

A good start is to read the GNU coding standards and the Information for maintainers of GNU software.

2  Mailing lists

GNU grep's mailing lists are hosted on

2.1  The bug-grep mailing list

To report bugs, suggest features, ask questions, or help in the development of GNU grep, please send email to the bug-grep mailing list. You can attach bug fixes and patches to your email. To save time, you may want to first look at GNU grep's bug report log to see whether the bug has already been reported. If you see, for example, that Bug#16979 is similar to the symptoms you observe, you can follow up to that bug report by sending email to <>.

Before you contribute significant changes to GNU grep, the Free Software Foundation (FSF) requires you to sign copyright assignment papers. If you have not already done so and are not willing or able to, it may be better to just describe bugs or proposed features rather than post actual code (or documentation), since such contributions would have to be rewritten anyway.

2.2  The grep-commit mailing list

The grep-commit read-only mailing list tracks all changes made to GNU grep.

2.3  Other deprecated mailing lists

Older GNU grep releases directed users to the bug-gnu-utils mailing list. As a consequence, some still post their bug reports and questions there. For this reason, it is a good idea for GNU grep developers to monitor this mailing list and follow up on related threads started there by redirecting them to the bug-grep mailing list. New threads about GNU grep should not be intentionally started there.

3  Project page on Savannah

The Savannah project page for GNU grep features development-related tools.

4  Git repository

4.1  Source code

See the Savannah web page about the Git repository for GNU grep's source code.

4.2  Web site

See the Savannah web page about the CVS repository for GNU grep's web pages.

4.3  Tools

Developers with write access to the repositories will need to create an account on Savannah and upload their SSH public identity information there.

6  Release procedure

A number of tasks must be performed before every release. See README-release.

6.1  Source code compatibility with GNU awk

Drop dfa.[ch] into a copy of gawk and run “make check”. This step will soon be obsolete: we're syncing the two dfa.c files.

7  To do

7.1  Other implementations

See this list of grep implementations.

Take a look at these and consider opportunities for merging or cloning:

7.2  POSIX

In general, interesting things to check in POSIX/OpenGroup include:

7.2.1  POSIX and --ignore-case

For this issue, interesting things to check in POSIX include:

In particular, consider the following with POSIX's approach to case folding in mind. Assume a non-Turkic locale with a character repertoire reduced to the following forms of “LATIN LETTER I”:

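0049;LATIN CAPITAL LETTER I;Lu;0;L;;;;;N;;;;0069;
0069;LATIN SMALL LETTER I;Ll;0;L;;;;;N;;;0049;;0049
0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;LATIN CAPITAL LETTER I DOT;;;0069;
0131;LATIN SMALL LETTER DOTLESS I;Ll;0;L;;;;;N;;;0049;;0049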

First note the differing UTF-8 octet lengths of U+0049 (0x49) and U+0069 (0x69) versus U+0130 (0xC4 0xB0) and U+0131 (0xC4 0xB1). This implies that whole UTF-8 strings cannot be case-converted in place, using the same memory buffer, and that the needed octet-size of the new buffer cannot merely be guessed.
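The octet lengths quoted above can be checked with a small sketch. Python is used here only for illustration (GNU grep itself is written in C), and the variable names are hypothetical.

```python
# The four forms of LATIN LETTER I discussed in the text.
chars = ["\u0049", "\u0069", "\u0130", "\u0131"]

# Map each code point label to the number of octets in its UTF-8 encoding.
lengths = {f"U+{ord(c):04X}": len(c.encode("utf-8")) for c in chars}

for label, size in lengths.items():
    print(label, size)
```

Because lc(U+0130) is a one-octet character while U+0130 itself takes two octets, a converted string can be shorter or longer than the original, which is why the conversion cannot safely reuse the input buffer.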

We have

lc(I) = i, uc(I) = I
lc(i) = i, uc(i) = I
lc(İ) = i, uc(İ) = İ
lc(ı) = ı, uc(ı) = I

where lc() and uc() denote lower-case and upper-case conversions.

There are several candidate --ignore-case logics (including the one mandated by POSIX):

Any optimization in the implementation of each logic must not change its basic semantics.
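To see why the choice of logic matters, here is a minimal sketch that groups the four characters by their lc() image and by their uc() image, using the lc()/uc() values given above. The two comparison rules ("compare lower-cased forms" and "compare upper-cased forms") are assumptions chosen for illustration; they are not claimed to be the candidate logics this section refers to.

```python
# lc() and uc() tables exactly as listed in the text.
LC = {"I": "i", "i": "i", "\u0130": "i", "\u0131": "\u0131"}
UC = {"I": "I", "i": "I", "\u0130": "\u0130", "\u0131": "I"}


def classes(mapping):
    """Group characters that share the same image under the mapping."""
    groups = {}
    for char, image in mapping.items():
        groups.setdefault(image, []).append(char)
    return sorted(groups.values())


lc_classes = classes(LC)  # characters equal after lower-casing
uc_classes = classes(UC)  # characters equal after upper-casing
print(lc_classes)
print(uc_classes)
```

Under the lower-case rule, U+0130 joins "I" and "i" while U+0131 stands alone; under the upper-case rule it is the other way around. Any optimized implementation has to preserve whichever grouping its logic defines.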

7.3  Unicode

In general, interesting things to check in Unicode include:

7.3.1  Unicode and --ignore-case

For this issue, interesting things to check in Unicode include:

Unicode uses the

if (toCasefold(input_wchar_string) == toCasefold(pattern_wchar_string))

logic for caseless matching. Let's consider the “LATIN LETTER I” example mentioned above. In a non-Turkic locale, simple case folding yields

toCasefold_simple(U+0049) = U+0069
toCasefold_simple(U+0069) = U+0069
toCasefold_simple(U+0130) = U+0130
toCasefold_simple(U+0131) = U+0131

which leads to the following matches:

  \in  I  i  İ  ı
pat\   ----------
"I" |  Y  Y  n  n
"i" |  Y  Y  n  n
"İ" |  n  n  Y  n
"ı" |  n  n  n  Y

This is different from anything so far!
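The simple case folding table above can be reproduced with a short sketch. The mapping is copied from the toCasefold_simple() values listed in this section; the function and variable names are hypothetical.

```python
# toCasefold_simple() values as listed in the text (non-Turkic locale).
SIMPLE_FOLD = {
    "I": "i",
    "i": "i",
    "\u0130": "\u0130",  # LATIN CAPITAL LETTER I WITH DOT ABOVE
    "\u0131": "\u0131",  # LATIN SMALL LETTER DOTLESS I
}
CHARS = ["I", "i", "\u0130", "\u0131"]


def match_matrix(fold):
    """Rows are patterns, columns are inputs; True means a caseless match."""
    return [[fold[p] == fold[c] for c in CHARS] for p in CHARS]


matrix = match_matrix(SIMPLE_FOLD)
for pattern, row in zip(CHARS, matrix):
    print(pattern, " ".join("Y" if hit else "n" for hit in row))
```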

In a non-Turkic locale, full case folding yields

toCasefold_full(U+0049) = U+0069
toCasefold_full(U+0069) = U+0069
toCasefold_full(U+0130) = <U+0069, U+0307>
toCasefold_full(U+0131) = U+0131



which leads to the following matches:

  \in  I  i  İ  ı
pat\   ----------
"I" |  Y  Y  *  n
"i" |  Y  Y  *  n
"İ" |  n  n  Y  n
"ı" |  n  n  n  Y

This is just sad!
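The full case folding case can be sketched with Python's str.casefold(), which applies Unicode full case folding without locale tailoring. The sketch assumes that grep-style matching means "the folded pattern occurs somewhere in the folded input", and it reads the "*" cells in the table as matches that cover only part of the folded input character; both readings are assumptions, not statements from the original table.

```python
CHARS = ["I", "i", "\u0130", "\u0131"]


def found(pattern, text):
    """Substring search on the fully case-folded forms."""
    return pattern.casefold() in text.casefold()


# Rows are patterns, columns are inputs.
matrix = [[found(p, c) for c in CHARS] for p in CHARS]

# U+0130 folds to two code points, U+0069 followed by U+0307.
folded_dotted = [f"U+{ord(ch):04X}" for ch in "\u0130".casefold()]
print(folded_dotted)
for pattern, row in zip(CHARS, matrix):
    print(pattern, " ".join("Y" if hit else "n" for hit in row))
```

Here the patterns "I" and "i" are found inside the folded form of U+0130, but only its first code point is covered, while the pattern U+0130 is not found in the inputs "I" or "i". The relation is therefore not symmetric.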

Note that having toCasefold(U+0131), simple or full, map to itself instead of U+0069 contradicts the rules of Section 5.18 of the Unicode Standard, since toUpperCase(U+0131) is U+0049. The same holds for toCasefold_simple(U+0130), since toLowerCase(U+0130) is U+0069. The justification for the weird toCasefold_full(U+0130) mapping is unknown; it doesn't even make sense to add a dot (U+0307) to a letter that already has one (U+0069). It would have been so simple to put them all in the same equivalence class!

Also consider the following problem with Unicode's approach to case folding in mind. Assume that we want to perform

echo 'AßBC' | grep -i 'Sb'

which corresponds to

input:    U+0041 U+00DF U+0042 U+0043 U+000A
pattern:  U+0053 U+0062

Following “CaseFolding-4.1.0.txt”, applying the toCasefold() transformation to these yields

input:    U+0061 U+0073 U+0073 U+0062 U+0063 U+000A
pattern:                U+0073 U+0062

so, according to this approach, the input should match the pattern. As long as the original input line is to be reported to the user as a whole, there is no problem (from the user's point-of-view; implementation is complicated by this).

However, consider both these GNU extensions:

echo 'AßBC' | grep -i --only-matching 'Sb'
echo 'AßBC' | grep -i --color=always  'Sb'

What is to be reported in these cases, since the match begins in the middle of the original input character 'ß'?
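The difficulty can be made concrete with a sketch that folds the line one character at a time and remembers which original character each folded code point came from. It relies on Python's str.casefold() for full case folding; the helper name fold_with_origins and the variable names are hypothetical, and this is not how GNU grep is implemented.

```python
def fold_with_origins(text):
    """Case-fold text, recording the source index of each folded code point."""
    folded = []
    origins = []
    for index, char in enumerate(text):
        piece = char.casefold()  # may expand, e.g. U+00DF becomes "ss"
        folded.append(piece)
        origins.extend([index] * len(piece))
    return "".join(folded), origins


line = "A\u00dfBC"  # the input line from the example, without the newline
pattern = "Sb"

folded_line, origins = fold_with_origins(line)
folded_pattern = pattern.casefold()

start = folded_line.find(folded_pattern)
end = start + len(folded_pattern)

first_char = origins[start]    # original index where the match begins
last_char = origins[end - 1]   # original index where the match ends
# The match starts cleanly only if the previous folded code point
# belongs to a different original character.
starts_on_boundary = start == 0 or origins[start - 1] != first_char

print(folded_line, origins, start, end, first_char, last_char, starts_on_boundary)
```

The match covers the second half of the folded 'ß' plus 'B', so an implementation of --only-matching or --color has to choose between reporting the whole 'ß', reporting none of it, or rejecting the match; the sketch only detects the situation and does not decide it.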

Note that Unicode's toCasefold() cannot be implemented in terms of POSIX' towctrans() since that can only return a single wint_t value per input wint_t value.

7.4  Miscellaneous

8  Distributors

The purpose of this listing is to help GNU grep maintainers track down bug fixes and improvements made by distributors so they can be integrated back into the upstream releases from GNU, if appropriate.

Users should not use this listing to find an alternative place to send their bug reports. These are still best sent upstream, to the GNU grep team, through the mailing list or the GNU grep project page on Savannah.

This listing is not exhaustive; priority is given to listing distributors who actually maintain patches to the upstream package from GNU.

Please keep this listing sorted by entry. Each field type may appear more than once if appropriate, the field order being significant.

Debian GNU/Linux
  Web site:
  Package database entry: Old stable
  Maintainer: Robert van der Meulen <rvdm at>
  Package database entry: Stable
  Maintainer: Ryan M. Golbeck <rmgolbeck at>
  Maintainer: Jeff Bailey <jbailey at>
  Package database entry: Testing
  Package database entry: Unstable
  Maintainer: Anibal Monsalve Salazar <anibal at>
  Maintainer: Santiago Ruano Rincon <santiago at>
  Bug tracking:
  Source package name: grep
  Binary package name: grep
  Entry updated: 2005-11-08

Fedora Core/Red Hat
  Web site:
  Web site:
  Maintainer: Tim Waugh <twaugh at>
  Bug tracking: Red Hat Bugzilla
  Managed repository: cvs co devel/grep
  Managed repository:
  Source package name: grep
  Binary package name: grep
  Entry updated: 2005-05-05

  Web site:
  Bug tracking:
  Managed repository: CVS_RSH=ssh cvs co src/gnu/usr.bin/grep
  Managed repository:
  Entry updated: 2005-05-05

Gentoo Linux
  Web site:
  Package database entry: ;name=grep
  Bug tracking: Gentoo Bugzilla
  Managed repository:
  Source package name: grep
  Binary package name: grep
  Entry updated: 2005-05-05

Mandriva Linux
  Web site:
  Bug tracking: Mandriva Bugzilla
  Source package name: grep
  Binary package name: grep
  Entry updated: 2005-05-05

  Web site:
  Package database entry:
  Bug tracking:
  Managed repository: cvs co pkgsrc/textproc/grep
  Managed repository:
  Source package name: grep
  Binary package name: grep
  Entry updated: 2005-05-05

  Web site:
  Package database entry:
  Maintainer: Christian Weisgerber <naddy at>
  Bug tracking:
  Managed repository: cvs co ports/sysutils/ggrep
  Managed repository:
  Source package name: ggrep
  Binary package name: ggrep
  Entry updated: 2005-11-08

  Web site:
  Maintainer: Ralf S. Engelschall <rse at>
  Managed repository: cvs -d co openpkg-src/grep
  Managed repository: rsync -av rsync:// .
  Managed repository:
  Source package name: grep
  Binary package name: grep
  Entry updated: 2005-06-19

SuSE Linux
  Web site:
  Maintainer: Andreas Schwab <schwab at>
  Package database entry: Professional
  Source package name: grep
  Binary package name: grep
  Entry updated: 2005-06-19
