GNU Goose has been decommissioned. Please see the GNU Scientific Library. The remainder of this page is kept just for historical purposes.

- What is Goose?
- Getting Goose
- Goose's Features
- Goose Mailing List Information
- Developers
- Goose Alternatives

Goose is an LGPLed C++ library dedicated to statistical computation. The two design goals of this project are:

- To create a useful and complete system that takes advantage of C++'s features to make statistical code clearer and easier for programmers to write and use.
- To produce a complete set of Guile bindings, exporting all of the C++ library's functionality to that environment.

Goose is being developed primarily under GNU/Linux, but an effort is being made to ensure that it is portable both to other Un*x systems and to Win32.

You should be aware that **Goose is still in the early stages of
development**, and parts of it are prone to breakage,
bugginess, and sudden, sweeping API changes. This is alpha software.
Anyone who wants to use Goose in a non-trivial way at this time should
stay in touch with the developers via the mailing
list.

That said, the core parts of Goose are relatively stable and debugged. Goose has a reasonable number of useful features, and more functionality is being added all the time.

The current version of Goose is **0.0.11**,
which was released on 18 Oct 1999. It can be
downloaded from
http://ftp.gnu.org/pub/gnu/goose.
(You might also want to use one of the
FSF mirror sites.)
A copy of the latest version can usually also be found at
ftp://ftp.gnome.org/pub/guppi.

RPMs are available from http://ftp.gnu.org/pub/gnu/goose/RPMS.

Development versions of Goose currently live in the
Gnome Project's
CVS server.
Using the anonymous CVS server, just check out `goose`.
That server also has a nice mechanism for
browsing the cross-referenced
source code.

The following is a list of features in Goose that should (more or less) work. Additional features may be available in the development version.

- Numerical functions that are useful for statistical computation, including:
  - Combinatorial functions: factorial, log factorial, binomial coefficient, log of binomial coefficient.
  - CDF and inverse CDF functions for many common distributions, including normal (Gaussian), binomial, negative binomial, beta, chi-square, F, gamma, Poisson, hypergeometric, and Student's t.
  - Other useful special functions: gamma function, log of gamma, incomplete gamma function, log of incomplete gamma.

- A fast, high-quality Mersenne Twister-based random number generator.
- The RealSet class, an optimized container class for statistical data that offers:
  - Copy-on-write semantics that allow large containers to be efficiently copied and passed by value.
  - Cached mean, standard deviation, minimum and maximum.
  - Caching of a sorted version of the data, and automatic detection of sorted data. All unnecessary sort operations are eliminated, and the user never needs to worry about whether the data is sorted.
  - Optimized data transformations: linear, exp, log, logit. Sorting. Replacement of values by their ranks. Rearrangement by arbitrary permutations. Random re-ordering.
  - Efficient calculation of descriptive statistics (many in constant time): minimum, maximum, range, sum, mean, variance, standard deviation, sample standard deviation, percentile, median, quartiles, interquartile range, deciles, trimmed mean, winsorized mean, arbitrary moments, geometric mean, harmonic mean, RMS, mean deviation, median deviation, kurtosis, skewness, Durbin-Watson, autocorrelation.
  - Descriptive statistics involving two variables or data sets: covariance, correlation, Spearman's rho, Kendall's tau, pooled mean, pooled variance, weighted mean.
  - Calculations on empirical distribution functions: Kolmogorov-Smirnov D, D+, D-, Kuiper's V.

- Statistical tests: t-test, F-test, Kruskal-Wallis, Spearman, McNemar, Cochran's Q.
- An implementation of simple linear regression that includes:
  - Calculation of confidence intervals for the slope and intercept.
  - t- and p-values for the model.
  - Pointwise diagnostics: leverage, DFBETAS, DFFITS, and Cook's D.

- Optimized, optionally multi-threaded resampling routines for bootstrapping the mean, median, standard deviation, skewness, kurtosis, or the slope and intercept of a simple linear regression.
- Kernel density estimation using Epanechnikov, Biweight, Triweight, Gaussian and Uniform kernels.
- An "automagical" ASCII import system that can analyze and make intelligent guesses about the format/layout of text files containing numeric data.
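
To make the resampling feature in the list above more concrete, here is a minimal sketch of bootstrapping the standard error of the mean. It does not use Goose's own interface: it relies only on the C++ standard library, with `std::mt19937` standing in for Goose's Mersenne Twister generator, and the function names `sample_mean` and `bootstrap_se_of_mean` are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Arithmetic mean of a non-empty sample.
double sample_mean(const std::vector<double>& x) {
    assert(!x.empty());
    double sum = 0.0;
    for (double v : x) sum += v;
    return sum / static_cast<double>(x.size());
}

// Bootstrap estimate of the standard error of the mean.
// Draws `replicates` resamples of the data (with replacement, same size
// as the original), computes the mean of each resample, and returns the
// sample standard deviation of those means.
double bootstrap_se_of_mean(const std::vector<double>& data,
                            std::size_t replicates,
                            std::mt19937& rng) {
    assert(!data.empty() && replicates > 1);
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);

    std::vector<double> means;
    means.reserve(replicates);
    for (std::size_t r = 0; r < replicates; ++r) {
        double sum = 0.0;
        for (std::size_t i = 0; i < data.size(); ++i) {
            sum += data[pick(rng)];  // one draw with replacement
        }
        means.push_back(sum / static_cast<double>(data.size()));
    }

    const double center = sample_mean(means);
    double ss = 0.0;
    for (double m : means) ss += (m - center) * (m - center);
    return std::sqrt(ss / static_cast<double>(replicates - 1));
}
```

A caller would seed a `std::mt19937`, pass it in together with the data and a replicate count (a few thousand is a common choice), and compare the result with the textbook estimate of the standard deviation divided by the square root of the sample size. The same loop structure applies to the median or other statistics by swapping the per-resample computation.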

- Most of the numerical functions.
- The random number generator.
- Pretty much all of the RealSet's functionality.
- The basics of simple linear regression.
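
The copy-on-write and caching behavior described for RealSet can be illustrated with a small sketch. The class below is hypothetical (the name `CowSample` and all of its members are assumptions made for this example, not Goose's actual API), and it is single-threaded only. It shows the general technique: copies share one representation until one of them is modified, and derived values such as the mean and a sorted copy are computed once and reused until the data changes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical illustration; not Goose's RealSet interface.
class CowSample {
    // Shared representation: the data plus lazily filled caches.
    struct Rep {
        explicit Rep(std::vector<double> v) : values(std::move(v)) {}
        std::vector<double> values;
        bool mean_valid = false;
        double mean = 0.0;
        bool sorted_valid = false;
        std::vector<double> sorted;
    };
    std::shared_ptr<Rep> rep_;

    // Make sure this object is the sole owner before modifying the data.
    void detach() {
        if (rep_.use_count() > 1) rep_ = std::make_shared<Rep>(*rep_);
    }

    // Sorted copy of the data, built on first use and then reused.
    const std::vector<double>& sorted() const {
        if (!rep_->sorted_valid) {
            rep_->sorted = rep_->values;
            std::sort(rep_->sorted.begin(), rep_->sorted.end());
            rep_->sorted_valid = true;
        }
        return rep_->sorted;
    }

public:
    explicit CowSample(std::vector<double> values)
        : rep_(std::make_shared<Rep>(std::move(values))) {}

    std::size_t size() const { return rep_->values.size(); }

    // True while two objects still share the same underlying storage.
    bool shares_storage_with(const CowSample& other) const {
        return rep_ == other.rep_;
    }

    // Mutation: detach from other owners, then invalidate the caches.
    void add(double x) {
        detach();
        rep_->values.push_back(x);
        rep_->mean_valid = false;
        rep_->sorted_valid = false;
    }

    // Mean is computed once and cached until the data changes.
    double mean() const {
        assert(size() > 0);
        if (!rep_->mean_valid) {
            double sum = 0.0;
            for (double v : rep_->values) sum += v;
            rep_->mean = sum / static_cast<double>(size());
            rep_->mean_valid = true;
        }
        return rep_->mean;
    }

    // Median read from the cached sorted copy; the original order is kept.
    double median() const {
        assert(size() > 0);
        const std::vector<double>& s = sorted();
        const std::size_t n = s.size();
        return (n % 2 == 1) ? s[n / 2] : 0.5 * (s[n / 2 - 1] + s[n / 2]);
    }
};
```

Copying a `CowSample` copies only a shared pointer, so passing one by value is cheap regardless of the data size. The first call to `add` on a shared object pays for the deep copy, which is the usual trade-off of copy-on-write designs.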

The current "official" forum for discussing Goose is the guppi-list mailing list. (We still share a pretty low-traffic mailing list with Guppi.) Subscription requests should be sent to guppi-list-request@gnome.org.

Questions and comments can also be sent to Jon Trowbridge <trow@gnu.org>.

Goose is mainly being coded by Jon Trowbridge <trow@gnu.org>, but not without a significant amount of help from other dutiful programmers (in alphabetical order):

- Bradford Hovinen (Guile Extensions, Hypothesis Testing)
- Asger Alstrup Nielsen (Infrastructure, ASCII import)
- Havoc Pennington (General Hacking, Autoconf magic, Aura of Coolness)
- Mikkel Munck Rasmussen (Statistical tests)

Goose is just one of the GNU projects that involves statistical computation, and may not be the right tool for your job. Other useful GNU tools include:

- R, an S clone, is a system for statistical computation and graphics.
- PSPP (previously known as Fiasco) is an SPSS clone. It interprets commands in the SPSS language and produces tabular output in ASCII, HTML, or PostScript format.
- GSL - The GNU Scientific Library is a collection of routines for numerical computing. The routines are written from scratch by the GSL team in ANSI C, and are meant to present a modern Applications Programming Interface (API) for C programmers, while allowing wrappers to be written for very high level languages. It contains some support for statistical functions.
- GNU Octave is a high-level language, primarily intended for numerical computations, that provides a convenient command line interface for solving linear and nonlinear problems numerically. It also offers some limited statistical functionality.

If you are aware of any other good free statistics tools that I've omitted, please e-mail me so that I can add them to the list.

Copyright (C) 1999 Free Software Foundation, Inc.

Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.

Updated: 19 Oct 1999 trow