Previous: , Up: Grouping and Aggregates   [Contents][Index]


10.2 Aggregate Functions

recutils supports aggregate functions. These are so called because they accept a record set and a field name as inputs and generate a single result. Usually this result is numerical.

The supported aggregate functions are the following:

Count(FIELD)

Counts the number of occurrences of a field.

Avg(FIELD)

Calculates the average (mean) of the numerical values of a field.

Sum(FIELD)

Calculates the sum of the numerical values of a field.

Min(FIELD)

Calculates the minimum numerical value of a field.

Max(FIELD)

Calculates the maximum numerical value of a field.

The aggregate functions are to be invoked in the field expressions in recsel. By default they are applied to the totality of the records in a record set. For example, using the items database from the previous section, we can do calculations as in the following examples.

The SQL aggregate functions can be applied to the totality of the tuples in the relation. For example, using the Count aggregate function we can calculate the number of fields named Category present in the record set as follows:

$ recsel -p "Count(Category)" items.rec
Count_Category: 5

The result is a field whose name is derived from the function name and the field passed as its parameter, separated by an underline. This name scheme probably suffices for most purposes, but it is always possible to use a rewrite rule to obtain something different:

$ recsel -p "Count(Category):NumCategories" items.rec
NumCategories: 5

You can use different letter case in writing the name of the aggregate, and this will be reflected in the field name:

$ recsel -p "CoUnT(Category)" items.rec
CoUnT_Category: 5

It is possible to use more than one aggregate function in the field expression. Suppose we are also interested in the average price of the items we sell. We can use the Avg aggregate:

$ recsel -p "Count(Category),Avg(Price)" items.rec
Count_Category: 5
Avg_Price: 4.240000

Now let’s add a field along with an aggregate function to the field expression and see what we get:

$ recsel -p "Type,Avg(Price)" items.rec
Type: EC Car
Avg_Price: 12.200000

Type: Terria
Avg_Price: 0.600000

Type: Typex
Avg_Price: 1.200000

Type: Notebook
Avg_Price: 1

Type: Sexy Puzzle
Avg_Price: 6.200000

We get five records! The reason is that when only aggregate functions are part of the field expression, they are applied to the single record that would result from concatenating all the records in the record set together. However, when a regular field appears in the field expression the aggregate functions are applied to the individual records. This is still useful in some cases, such as a database of maintainers:

Name: Jose E. Marchesi
Email: jemarch@gnu.org
Email: jemarch@es.gnu.org

Name: Luca Saiu 
Email: positron@gnu.org

Lets see how many emails each maintainer has:

$ recsel -p "Name,Count(Email)" maintainers.rec
Name: Jose E. Marchesi
Count_Email: 2

Name: Luca Saiu 
Count_Email: 1

Aggregate functions are most useful when we combine them with grouping. This is when we are interested in some property of a subset of the records in the database. For example, the average prices of each item category stored in the database can be obtained by executing:

$ recsel -p "Category,Avg(Price)" -G Category items.rec
Category: Food
Avg_Price: 0.600000

Category: Office
Avg_Price: 1.100000

Category: Toy
Avg_Price: 9.200000

If we were interested in the actual prices that result in each average we can do:

$ recsel -p "Category,Price,Avg(Price)" -G Category items.rec
Category: Food
Price: 0.60
Avg_Price: 0.600000

Category: Office
Price: 1.20
Price: 1.00
Avg_Price: 1.100000

Category: Toy
Price: 12.2
Price: 6.20
Avg_Price: 9.200000

Previous: , Up: Grouping and Aggregates   [Contents][Index]