Writing a test case (DejaGnu)

5.6 Writing a test case

The easiest way to prepare a new test case is to base it on an existing one for a similar situation. There are two major categories of tests: batch-oriented and interactive. Batch-oriented tests are usually easier to write.

The GCC tests are a good example of batch-oriented tests. All GCC tests consist primarily of a call to a single common procedure, since all the tests either have no output, or only have a few warning messages when successfully compiled. Any non-warning output constitutes a test failure. All the C code needed is kept in the test directory. The test driver, written in Tcl, need only get a listing of all the C files in the directory, and compile them all using a generic procedure. This procedure and a few others supporting these tests are kept in the library module lib/c-torture.exp of the GCC testsuite. Most tests of this kind use very few Expect features, and are coded almost purely in Tcl.

Writing the complete suite of C tests, then, consisted of these steps:

1. Copying all the C code into the test directory. These tests were based on the C-torture test created by Torbjorn Granlund (on behalf of the Free Software Foundation) for GCC development.
2. Writing (and debugging) the generic Tcl procedures for compilation.
3. Writing the simple test driver: its main task is to search the directory (using the Tcl procedure glob for filename expansion with wildcards) and call a Tcl procedure with each filename. It also checks for a few errors from the testing procedure.
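
The driver step can be sketched in plain Tcl. This is a minimal illustration, not the actual GCC driver: the procedure name run_c_tests and its arguments are hypothetical, and a real DejaGnu driver would report problems through the framework rather than with puts.

```tcl
# Hypothetical driver helper: call test_proc once for each C file in dir.
# Returns the number of files visited.
proc run_c_tests {dir test_proc} {
    set count 0
    # -nocomplain makes glob return an empty list, instead of raising
    # an error, when no file matches the wildcard.
    foreach src [lsort [glob -nocomplain -directory $dir -- *.c]] {
        # catch keeps one broken test from aborting the whole run.
        if {[catch {$test_proc $src} err]} {
            puts stderr "ERROR: $src: $err"
        }
        incr count
    }
    return $count
}
```

Sorting the glob result keeps the order of the tests stable from run to run, which makes logs easier to compare.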

Testing interactive programs is intrinsically more complex. Tests for most interactive programs require some trial and error before they are complete.

However, some interactive programs can be tested in a simple fashion reminiscent of batch tests. For example, prior to the creation of DejaGnu, the GDB distribution already included a wide-ranging testing procedure. This procedure was very robust, and had already undergone much more debugging and error checking than many recent DejaGnu test cases. Accordingly, the best approach was simply to encapsulate the existing GDB tests, for reporting purposes. Thereafter, new GDB tests built up a family of Tcl procedures specialized for GDB testing.

5.6.1 Hints on writing a test case

To preserve basic sanity, no test should ever pass if there was any kind of problem in the test case. To take an extreme case, tests that pass even when the tool will not spawn are misleading. Ideally, a test in this sort of situation should not fail either. Instead, print an error message by calling one of the DejaGnu procedures perror or warning. Note that using perror will cause the next test result to be reported as ‘UNRESOLVED’, so printing an error and allowing the test to fail is a good option.
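
As a rough sketch of this advice, a test case can check that the tool is usable before running any tests and call perror if it is not. The helper name tool_available is hypothetical, and the stand-in definition of perror exists only so that the fragment also runs outside DejaGnu; inside a testsuite the framework's own perror is used.

```tcl
# Stand-in for use outside DejaGnu only; skipped when the framework
# has already defined perror.
if {[info commands perror] eq ""} {
    proc perror {msg args} { puts stderr "ERROR: $msg" }
}

# Hypothetical helper: returns 1 if the tool can be executed, 0 otherwise.
proc tool_available {path} {
    if {![file executable $path]} {
        # Report the problem instead of letting later tests pass or
        # fail for the wrong reason.
        perror "cannot execute $path"
        return 0
    }
    return 1
}
```

A test script might then begin with a check such as `if {![tool_available $tool]} { return }`, where the variable tool is likewise an assumption of this sketch.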

If you have trouble understanding why a pattern does not match the program output, try using the --debug option to runtest, and examine the debug log carefully.

If you use glob patterns, you will need to escape any ‘*’, ‘?’, ‘[’, ‘]’, and ‘\’ characters that are meant to match literally. If you use regular expressions, see the re_syntax(n) manual page from Tcl for the syntax details, and be sure to understand what punctuation characters match literally and what characters have special meanings in regular expressions.
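
The effect of escaping in glob patterns can be tried in plain Tcl, since Expect's glob patterns follow the same rules as the Tcl command string match. The helper glob_escape below is a hypothetical convenience, not part of DejaGnu.

```tcl
# Unescaped, * matches any run of characters.
puts [string match {a*b} "axxb"]              ;# 1
# Escaped, \* matches only a literal asterisk.
puts [string match {a\*b} "axxb"]             ;# 0
puts [string match {a\*b} "a*b"]              ;# 1
# Brackets must be escaped to match literally; unescaped, [1] is a
# character set that matches the single character 1.
puts [string match {\[1\] done} {[1] done}]   ;# 1
puts [string match {[1] done} {[1] done}]     ;# 0

# Hypothetical helper: escape every glob metacharacter in text so the
# result matches text literally.
proc glob_escape {text} {
    return [string map {\\ \\\\ * \\* ? \\? [ \\[ ] \\]} $text]
}
```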

Tcl has a few options for quoting; the most notable are ‘{}’ and ‘""’. These quotes behave differently: ‘{}’ must balance, while ‘""’ performs various interpolations. In ‘{}’ quotes, unbalanced ‘{’ or ‘}’ characters must be escaped with ‘\’ and these escapes are not removed; fortunately, backslash-escaped braces match literal braces in Tcl regular expressions. In ‘""’ quotes, any embedded ‘"’ characters must be escaped, an unescaped ‘$’ begins a variable substitution, and unescaped ‘[]’ introduce a Tcl command substitution.
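
The difference between the two quoting styles is easiest to see side by side. The variable names in this short sketch are arbitrary.

```tcl
set name world
set quoted "hello $name"    ;# "" substitutes variables: hello world
set braced {hello $name}    ;# {} does not: hello $name

# In "" quotes a literal $, [ or " needs a backslash.
set literal "cost: \$5 \[approx\] \"net\""   ;# cost: $5 [approx] "net"

# In {} quotes an unbalanced brace needs a backslash, and the backslash
# stays in the string. Used as a regular expression, \{ still matches
# a literal open brace.
set open_brace_re {\{}
puts [string length $open_brace_re]        ;# 2
puts [regexp $open_brace_re "struct \{"]   ;# 1
```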

Synchronization with the tested program

A DejaGnu testsuite executes concurrently with the programs that it tests. As a result, DejaGnu may see parts of the tested program’s output while the tested program is still producing more output. Expect patterns must be written to handle the possibility that incomplete output from the tested program will be considered for matching.

Expect reads the output from the tested program into an internal matching buffer and removes everything from the start of the buffer to the end of the match when a match is found. Any given character can be matched at most once, or skipped if a match is found starting later in the buffer or the buffer reaches its capacity. Anything left in the buffer after the end of the match remains in the buffer and is considered for the next expect command. If expect is invoked and no patterns match, Expect waits for more text to arrive. New text is appended to the buffer as it is read. If the buffer reaches its capacity, the entire contents of the buffer are discarded and Expect resumes reading.

In Expect patterns, the regular expression anchors ‘^’ and ‘$’ match at the beginning and end of the buffer, not at line boundaries. The ‘$’ anchor must be used with care because it will match at the end of what Expect has read, but the program may have produced more output that Expect has not yet read. Similarly, regular expressions ending with the ‘*’ quantifier can potentially match a prefix of the intended text, only for the rest to arrive shortly thereafter.
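
Both hazards can be reproduced with the plain Tcl regexp command, treating a string as a stand-in for Expect's buffer. The buffer contents here are invented for illustration.

```tcl
# ^ and $ anchor to the whole string (in Expect, the whole buffer),
# not to individual lines.
set buffer "first line\nsecond line\n"
puts [regexp {^second} $buffer]        ;# 0: not at the start of the buffer
puts [regexp {first line$} $buffer]    ;# 0: more text follows

# A pattern ending in * can match a prefix of the intended text.
set partial  "count = 12"       ;# what has been read so far
set complete "count = 1234\n"   ;# what the program actually printed
regexp {count = ([0-9]*)} $partial -> early
puts $early                     ;# 12, although 1234 was intended

# Requiring the line terminator makes the partial buffer fail to match,
# so the match can only succeed once the whole number has arrived.
puts [regexp {count = ([0-9]*)\r?\n} $partial]     ;# 0
regexp {count = ([0-9]*)\r?\n} $complete -> full
puts $full                      ;# 1234
```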

Maintaining synchronization with the tested program is easier if the patterns match all of the output generated by the tested program; this is called closure.

For interactive programs, a prompt is usually a good synchronization point, provided that the program’s prompt can be uniquely recognized. Since the prompt is usually the last output until the program receives further input, the ‘$’ anchor can be useful here.

If the output from the tested program is organized into lines, matching end-of-line using ‘\n’ is usually a good way to process one line at a time. Note that terminal settings may result in the insertion of additional ‘\r’ characters, usually translating ‘\n’ to ‘\r\n’.
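
A line-at-a-time pattern that tolerates both line endings might look like the one in the following sketch, which models Expect's buffer with an ordinary Tcl variable. The procedure take_line is hypothetical; in a real test the same regular expression would be given to expect with the -re flag.

```tcl
# Hypothetical helper: remove one complete line from the buffer variable
# named by bufVar and store it, without its terminator, in lineVar.
# Returns 1 if a line was taken, 0 if no complete line is available.
proc take_line {bufVar lineVar} {
    upvar 1 $bufVar buf $lineVar line
    # \r?\n accepts both \n and \r\n; the ^ anchor ensures that no
    # text is skipped.
    if {![regexp -indices {^([^\r\n]*)\r?\n} $buf whole text]} {
        return 0   ;# incomplete line: wait for more output
    }
    set line [string range $buf {*}$text]
    set buf [string range $buf [expr {[lindex $whole 1] + 1}] end]
    return 1
}
```

Text that does not yet end in a newline is left in the buffer, which mirrors how Expect keeps unmatched text for the next expect command.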

Be careful not to neglect output generated by setup rather than by the interesting parts of a test case. For example, while testing GDB, a ‘set height 0\n’ command is issued. The purpose is simply to make sure GDB never calls a paging program. The ‘set height’ command in GDB does not generate any output; but running any command makes GDB issue a new ‘(gdb) ’ prompt. If no expect command matches this prompt, the ‘(gdb) ’ prompt remains in the buffer and begins the text seen by the next expect command, which might make that pattern fail to match.

Priority of Expect patterns

Be particularly careful about how you write the patterns. Expect attempts to match each pattern in the order in which they are written in the expect command. Unless a regexp pattern is anchored at the beginning of the buffer, Expect can search ahead for a match for a pattern that appears earlier in the expect command and skip over text that would match a later pattern. The text thus skipped is discarded. This is a source of bugs that are very hard to trace, especially when reading input from batch-oriented unit tests.

For example, consider a simple model once used by the DejaGnu testsuite for unit testing. In this example, a test has failed, but the tests before and after it have passed. First the relevant input to DejaGnu:

PASSED: foo
FAILED: bar
PASSED: baz

The test script is reading this with two Expect patterns, simplified for this example by omitting handling of the actual messages and other possible results:

expect {
       -re {PASSED: [^\r\n]+[\r\n]+} { pass ... }
       -re {FAILED: [^\r\n]+[\r\n]+} { fail ... }
}

At every cycle, Expect attempts to match each pattern in the order that they are written against the available input. If DejaGnu is processing the input as quickly as it arrives, this example will actually work. However, if the system scheduler sets DejaGnu aside for a bit, or the external program produces output in a burst, Expect can find that its input buffer contains the text in the first example above all at once as the cycle begins.

If this occurs, Expect will first attempt to match {PASSED: [^\r\n]+[\r\n]+} against the input and will succeed, since the input begins with ‘PASSED: foo’. The pass procedure is called and the test result recorded. Expect then starts a new matching cycle.

If the input had been presented one line at a time, the expected result would occur: the {FAILED: [^\r\n]+[\r\n]+} pattern would match and the test driver would work correctly. But we are considering the case where all three lines arrived “at once” so we must examine what Expect will do in this case. After the first line has been processed, the Expect buffer now contains:

FAILED: bar
PASSED: baz

Expect again attempts to match each pattern in order. Expect will attempt to match {PASSED: [^\r\n]+[\r\n]+} before attempting to match {FAILED: [^\r\n]+[\r\n]+} and the first attempt succeeds because the pattern is not anchored. The ‘FAILED: bar’ message is simply discarded when Expect finds the later ‘PASSED: baz’ message in the buffer.

How to prevent this? There are two ways: either group all of your test matches into a single regexp using alternation, or ensure that all patterns can match only at the start of Expect’s buffer. Both options can be made to work. Grouping all status results into a single regexp allows some other unspecified text to still be silently discarded, while ensuring that all patterns are anchored absolutely requires closure, as any unmatched text will cause Expect to run out of buffer space. Expect discards the entire buffer when this occurs.
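
The behavior described above, and both remedies, can be reproduced without Expect using a simplified model written in plain Tcl. The procedures expect_once and run_model are hypothetical: they imitate only the pattern ordering and buffer trimming described in this section, and ignore timeouts, buffer capacity, and incremental input.

```tcl
# Simplified model of one expect command: the first pattern, in listed
# order, that matches anywhere in the buffer wins, and everything up to
# the end of that match is removed from the buffer.
proc expect_once {bufVar cases} {
    upvar 1 $bufVar buf
    foreach {re verdict} $cases {
        if {[regexp -indices -- $re $buf span]} {
            set buf [string range $buf [expr {[lindex $span 1] + 1}] end]
            return $verdict
        }
    }
    return ""
}

# Repeat expect_once until nothing matches; return the verdicts seen.
proc run_model {input cases} {
    set results {}
    while {[set v [expect_once input $cases]] ne ""} {
        lappend results $v
    }
    return $results
}

set input "PASSED: foo\nFAILED: bar\nPASSED: baz\n"
set unanchored {
    {PASSED: [^\r\n]+[\r\n]+} pass
    {FAILED: [^\r\n]+[\r\n]+} fail
}
set anchored {
    {^PASSED: [^\r\n]+[\r\n]+} pass
    {^FAILED: [^\r\n]+[\r\n]+} fail
}
puts [run_model $input $unanchored]   ;# pass pass (the failure is lost)
puts [run_model $input $anchored]     ;# pass fail pass

# Alternative: a single pattern using alternation. The regexp engine
# finds the leftmost match, so whichever status comes first wins.
set combined {(PASSED|FAILED): [^\r\n]+[\r\n]+}
set statuses {}
foreach {whole status} [regexp -all -inline $combined $input] {
    lappend statuses $status
}
puts $statuses                        ;# PASSED FAILED PASSED
```

With the anchored patterns, any line that neither pattern matches would stop the model, which corresponds to the closure requirement noted above.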