GNU Astronomy Utilities



12.1.1 Headers

The compiler reads C source code from top to bottom, so program components (for example, variables, data structures and functions) should all be defined or declared near the top of the source file, before they are used. Defining something in C or C++ is jargon for providing its full details. Declaring it, on the other hand, is jargon for only providing the minimum information needed for the compiler to pass it temporarily and fill in the detailed definition later.

For a function, the declaration only contains the inputs and their data types along with the output’s type (243). The definition adds to the declaration by including the exact details of what operations are done to the inputs to generate the output. As an example, take this simple summation function:

double
sum(double a, double b)
{
  return a + b;
}

What you see above is the definition of this function: it shows you (and the compiler) exactly what it does to the two double type inputs and that the output also has a double type. A function’s internal operations are rarely so simple and short; they can be arbitrarily long and complicated. This unreasonably short and simple function was chosen here for ease of reading. The declaration for this function is:

double
sum(double a, double b);

You can think of a function’s declaration as a building’s address in the city, and the definition as the building’s complete blueprints. When the compiler confronts a call to a function during its processing, it does not need to know anything about how the inputs are processed to generate the output, just as the postman does not need to know the inner structure of a building when delivering the mail. The declaration (address) is enough. Therefore, by declaring the functions once at the start of the source file, we do not have to worry about whether their definitions come before or after the place they are used.
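As a minimal sketch of this point, the snippet below declares sum at the top, calls it from a second function, and only defines it afterwards. The average function is hypothetical and is introduced here only to have a caller; it is not part of Gnuastro.

```c
/* Declaration: the "address". This is all the compiler needs in
   order to process the call inside average() below. */
double
sum(double a, double b);

/* A hypothetical caller: it uses sum() before sum() is defined. */
double
average(double a, double b)
{
  return sum(a, b) / 2.0;
}

/* Definition: the "blueprints". It can come after the call because
   the declaration above already told the compiler the types. */
double
sum(double a, double b)
{
  return a + b;
}
```

If the declaration at the top were removed, the compiler would reach the call in average without knowing the types of sum, and modern compilers reject or warn about that.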

Even for a simple real-world operation (not a simple summation like above!), you will soon need many functions (for example, some for reading/preparing the inputs, some for the processing, and some for preparing the output). Although it is technically possible, managing all the necessary functions in one file is not easy and is contrary to the modularity principle (see Review of library fundamentals). For example, the functions for preparing the input can be reused in your other projects that do a different kind of processing. Therefore, as we will see later (in Linking), the functions do not necessarily need to be defined in the source file where they are used. As long as their definitions are ultimately linked to the final executable, everything will be fine. For now, it is just important to remember that the functions that are called within one source file must be declared within that source file (declarations are mandatory), but not necessarily defined there.

In the spirit of modularity, it is common to define contextually similar functions in one source file. For example, in Gnuastro, functions that calculate the median, mean and other statistical functions are defined in lib/statistics.c, while functions that deal directly with FITS files are defined in lib/fits.c.

Keeping the definitions of similar functions in a separate file greatly helps their management and modularity, but this fact alone does not make things much easier for the caller’s source code: recall that while definitions are optional, declarations are mandatory. If this were all, the caller would have to manually copy and paste (include) all the declarations from the various source files into the file they are working on. To address this problem, programmers have adopted the header file convention: the header file of a source file contains all the declarations that a caller would need to be able to use any of its functions. For example, in Gnuastro, lib/statistics.c (file containing function definitions) comes with lib/gnuastro/statistics.h (only containing function declarations).
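The convention can be sketched as follows. The file names mystats.h and mystats.c and both functions are hypothetical (they are not Gnuastro's actual statistics interface), and the two files are shown in a single block only so the sketch can be compiled as one unit. The #ifndef guard is a common practice that is not discussed above; it is assumed here to keep the header safe to include more than once.

```c
/* ---- mystats.h (hypothetical header): declarations only ---- */
#ifndef MYSTATS_H
#define MYSTATS_H

#include <stddef.h>   /* for size_t */

double
mystats_mean(const double *values, size_t n);

double
mystats_max(const double *values, size_t n);

#endif

/* ---- mystats.c (hypothetical source): the definitions ----
   In a real project this file would start with
   #include "mystats.h" instead of sharing a file with it. */
double
mystats_mean(const double *values, size_t n)
{
  double total = 0.0;
  size_t i;

  if (n == 0) return 0.0;            /* avoid dividing by zero */
  for (i = 0; i < n; ++i) total += values[i];
  return total / n;
}

double
mystats_max(const double *values, size_t n)
{
  double max;
  size_t i;

  if (n == 0) return 0.0;            /* nothing to compare */
  max = values[0];
  for (i = 1; i < n; ++i)
    if (values[i] > max) max = values[i];
  return max;
}
```

A caller would then only need the header: including it gives the compiler every declaration, while the definitions are supplied later at link time.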

The discussion above was mainly focused on functions. However, there are many more programming constructs, such as preprocessor macros and data structures. Like functions, they also need to be known to the compiler when it encounters a use of them. So the header file also contains their definitions or declarations when they are necessary for the functions.

Preprocessor macros (or macros for short) are replaced with their defined value by the preprocessor before compilation. Conventionally they are written only in capital letters to be easily recognized. It is just important to understand that the compiler does not see the macros; it sees their fixed values. So when a header specifies macros, you can do your programming without worrying about the actual values. The standard C types (for example, int or float) are very low-level and basic. We can collect multiple C types into a structure for a higher-level way to keep and pass along data. See Generic data container (gal_data_t) for some examples of macros and data structures.
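As a rough illustration of both constructs, the sketch below defines one macro and one structure of the kind a header might provide. The names POINT_NDIM, struct point and point_sum are hypothetical and are unrelated to gal_data_t.

```c
#include <stddef.h>   /* for size_t */

/* Macro: the preprocessor replaces every POINT_NDIM below with 2
   before the compiler sees the code. */
#define POINT_NDIM 2

/* Structure: collects several basic values into one higher-level
   object that can be kept and passed along as a unit. */
struct point
{
  double coord[POINT_NDIM];
};

/* Add all the coordinates of one point. The caller never needs to
   know the actual value behind POINT_NDIM. */
double
point_sum(const struct point *p)
{
  double total = 0.0;
  size_t i;

  for (i = 0; i < POINT_NDIM; ++i) total += p->coord[i];
  return total;
}
```

Changing the macro's value in one place would change both the array size and the loop limit, which is why callers are encouraged to use the name instead of the number.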

The contents of the header need to be included into the caller’s source code with a special preprocessor command: #include <path/to/header.h>. As the name suggests, the preprocessor goes through the source code before the compiler proper does. One of its jobs is to include, or merge, the contents of files that are mentioned with this directive in the source code. Therefore the compiler sees a single entity containing the contents of the main file and all the included files. This allows you to include many (sometimes thousands of) declarations into your code with only one line. Since the headers are also installed with the library into your system, you do not even need to keep a copy of them for each separate program, making things even more convenient.
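For instance, the two include lines in the sketch below bring in the declarations of snprintf and strlen from the standard C library, so the function can call them without declaring anything by hand. The make_fits_name function is a hypothetical helper written for this illustration only.

```c
#include <stdio.h>    /* declares snprintf() */
#include <string.h>   /* declares strlen() */

/* Write BASE followed by ".fits" into OUT (which can hold OUTSIZE
   bytes). Return the length of the result, or 0 if it did not fit. */
size_t
make_fits_name(const char *base, char *out, size_t outsize)
{
  int n = snprintf(out, outsize, "%s.fits", base);

  if (n < 0 || (size_t)n >= outsize) return 0;
  return strlen(out);
}
```

Neither header had to be copied into the project: the preprocessor finds them in the system's include directories and merges their contents before compilation.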

Try opening some of the .c files in Gnuastro’s lib/ directory with a text editor to check out the include directives at the start of the file (after the copyright notice). Let’s take lib/fits.c as an example. You will notice that Gnuastro’s header files (like gnuastro/fits.h) are indeed within this directory (the fits.h file is in the gnuastro/ directory). However, you will also notice that files like stdio.h or string.h are not in this directory (or anywhere within Gnuastro).

On most systems the basic C header files (like stdio.h and string.h mentioned above) are located in /usr/include/ (244). Your compiler is configured to automatically search that directory (and possibly others), so you do not have to explicitly mention these directories. Go ahead, look into the /usr/include directory and find stdio.h for example. When the necessary header files are not in those specific directories, the preprocessor can also search in places other than the current directory. You can specify those directories with this preprocessor option (245):

-I DIR

“Add the directory DIR to the list of directories to be searched for header files. Directories named by ’-I’ are searched before the standard system include directories. If the directory DIR is a standard system include directory, the option is ignored to ensure that the default search order for system directories and the special treatment of system headers are not defeated...” (quoted from the GNU Compiler Collection manual). Note that the space between I and the directory is optional and commonly not used.

If the preprocessor cannot find the included files, it will abort with an error. In fact, a common error when building programs that depend on a library is that the compiler does not know where a library’s header is (see Known issues). So you have to manually tell the compiler where to look for the library’s headers with the -I option. For small software with one or two source files, this can be done manually (see Summary and example on libraries). However, to enhance modularity, Gnuastro (and most other programs and libraries) contain many source files, so the compiler is invoked many times (246). This makes manual addition or modification of this option practically impossible.

To solve this problem, in the GNU build system, there are conventional environment variables for the various kinds of compiler options (or flags). These environment variables are used in every call to the compiler (they can be empty). The environment variable used for the C preprocessor (or CPP) is CPPFLAGS. By giving CPPFLAGS a value once, you can be sure that each call to the compiler will be affected. See Known issues for an example of how to set this variable at configure time.

As described in Installation directory, you can select the top installation directory of a software using the GNU build system when you ./configure it. All the separate components will be put in their own sub-directories under that. For example, the programs, compiled libraries and library headers will go into $prefix/bin (replace $prefix with a directory), $prefix/lib, and $prefix/include respectively. For enhanced modularity, libraries that contain diverse collections of functions (like GSL, WCSLIB, and Gnuastro) put their header files in a sub-directory unique to themselves. For example, all Gnuastro’s header files are installed in $prefix/include/gnuastro. In your source code, you need to keep the library’s sub-directory when including the headers from such libraries, for example, #include <gnuastro/fits.h> (247). Not all libraries need to follow this convention. For example, CFITSIO only has one header (fitsio.h), which is directly installed in $prefix/include.


Footnotes

(243)

Recall that in C, functions only have one output.

(244)

The include/ directory name is taken from the preprocessor’s #include directive, which is also the motivation behind the ‘I’ in the -I option to the preprocessor.

(245)

Try running Gnuastro’s make and find the directories given to the compiler with the -I option.

(246)

Nearly every command you see being executed after running make is one call to the compiler.

(247)

The top $prefix/include directory is usually known to the compiler.