Currently the programming language that is most commonly used in scientific applications is C++192, Python193, and Julia194 (which is a newcomer but swiftly gaining ground). One of the main reasons behind this choice is their high-level abstractions. However, GNU Astronomy Utilities is fully written in the C programming language195. The reasons can be summarized with simplicity, portability and efficiency/speed. All three are very important in a scientific software and we will discuss them below.
Simplicity can best be demonstrated in a comparison of the main books of C++ and C. The “C programming language”196 book, written by the authors of C, is only 286 pages and covers a very good fraction of the language, it has also remained unchanged from 1988. C is the main programming language of nearly all operating systems and there is no plan of any significant update. On the other hand, the most recent “C++ programming language”197 book, also written by its author, has 1366 pages and its fourth edition came out in 2013! As discussed in Science and its tools, it is very important for other scientists to be able to readily read the code of a program at their will with minimum requirements.
In C++, inheriting objects in the object oriented programming paradigm and
their internal functions make the code very easy to write for a programmer
who is deeply invested in those objects and understands all their relations
well. But it simultaneously makes reading the program for a first time
reader (a curious scientist who wants to know only how a small step was
done) extremely hard. Before understanding the methods, the scientist has
to invest a lot of time and energy in understanding those objects and their
relations. But in C, everything is done with basic language types for
floats and their pointers to define
arrays. So when an outside reader is only interested in one part of the
program, that part is all they have to understand.
Recently it is also becoming common to write scientific software in Python, or a combination of it with C or C++. Python is a high level scripting language which doesn’t need compilation. It is very useful when you want to do something on the go and don’t want to be halted by the troubles of compiling, linking, memory checking, etc. When the datasets are small and the job is temporary, this ability of Python is great and is highly encouraged. A very good example might be plotting, in which Python is undoubtedly one of the best.
But as the data sets increase in size and the processing becomes more complicated, the speed of Python scripts significantly decrease. So when the program doesn’t change too often and is widely used in a large community, mostly on large data sets (like astronomical images), using Python will waste a lot of valuable research-hours. It is possible to wrap C or C++ functions with Python to fix the speed issue. But this creates further complexity, because the interested scientist has to master two programming languages and their connection (which is not trivial).
Like C++, Python is object oriented, so as explained above, it needs a high level of experience with that particular program to reasonably understand its inner workings. To make things worse, since it is mainly for on-the-go programming198, it can undergo significant changes. One recent example is how Python 2.x and Python 3.x are not compatible. Lots of research teams that invested heavily in Python 2.x cannot benefit from Python 3.x or future versions any more. Some converters are available, but since they are automatic, lots of complications might arise in the conversion199.
If a research project begins using Python 3.x today, there is no telling how compatible their investments will be when Python 4.x or 5.x will come out. This stems from the core principles of Python, which are very useful when you look in the ‘on the go’ basis as described before and not future usage. Reproducibility (ability to run the code in the future) is a core principal of any scientific result, or the software that produced that result. Rebuilding all the dependencies of a software in an obsolete language is not easy. Future-proof code (as long as current operating systems will be used) is written in C.
The portability of C is best demonstrated by the fact that both C++ and Python are part of the C-family of programming languages which also include Julia, Java, Perl, and many other languages. C libraries can be immediately included in C++, and it is easy to write wrappers for them in all C-family programming languages. This will allow other scientists to benefit from C libraries using any C-family language that they prefer. As a result, Gnuastro’s library is already usable in C and C++, and wrappers will be200 added for higher-level languages like Python, Julia and Java.
The final reason was speed. This is another very important aspect of C which is not independent of simplicity (first reason discussed above). The abstractions provided by the higher-level languages (which also makes learning them harder for a newcomer) comes at the cost of speed. Since C is a low-level language201 (closer to the hardware), it is much less complex for both the human reader and the computer. The benefits of simplicity for a human were discussed above. Simplicity for the computer translates into more efficient (faster) programs. This creates a much closer relation between the scientist/programmer (or their program) and the actual data and processing. The GNU coding standards202 also encourage the use of C over all other languages when generality of usage and “high speed” is desired.
Brian Kernighan, Dennis Ritchie. The C programming language. Prentice Hall, Inc., Second edition, 1988. It is also commonly known as K&R and is based on the ANSI C and ISO C90 standards.
Bjarne Stroustrup. The C++ programming language. Addison-Wesley Professional; 4 edition, 2013.
Note that Python is good for fast programming, not fast programs.
For example see Jenness (2017) which describes how LSST is managing the transition.
Low-level languages are those that directly operate the hardware like assembly languages. So C is actually a high-level language, but it can be considered one of the lowest-level languages among all high-level languages.