When comparing two files,
diff finds sequences of lines common to
both files, interspersed with groups of differing lines called
hunks. Comparing two identical files yields one sequence of
common lines and no hunks, because no lines differ. Comparing two
entirely different files yields no common lines and one large hunk that
contains all lines of both files. In general, there are many ways to
match up lines between two given files.
diff tries to minimize
the total hunk size by finding large sequences of common lines
interspersed with small hunks of differing lines.
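To make the bookkeeping concrete, here is a minimal Python sketch. It assumes a matching is a list of 0-based (i, j) index pairs, where each pair says that line i of F is treated as the same line as line j of G. The names `hunks` and `total_hunk_size` are hypothetical and are not part of diff. Every unmatched line lands in exactly one hunk, so the total hunk size is len(F) + len(G) minus twice the number of matched lines. Minimizing it therefore means matching as many lines as possible.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: (i, j) means line i of F is treated as the
# same line as line j of G (both 0-based).
Pair = Tuple[int, int]
Hunk = Tuple[List[str], List[str]]


def hunks(f: Sequence[str], g: Sequence[str], matching: Sequence[Pair]) -> List[Hunk]:
    """Split F and G into hunks: the runs of unmatched lines between matches."""
    pairs = list(matching) + [(len(f), len(g))]  # sentinel flushes the tail
    result: List[Hunk] = []
    prev_i, prev_j = -1, -1
    for n, (i, j) in enumerate(pairs):
        if i <= prev_i or j <= prev_j:
            raise ValueError("matched pairs must increase in both files")
        if n < len(pairs) - 1 and f[i] != g[j]:
            raise ValueError(f"line {i} of F differs from line {j} of G")
        old, new = list(f[prev_i + 1:i]), list(g[prev_j + 1:j])
        if old or new:  # consecutive matches leave nothing between them
            result.append((old, new))
        prev_i, prev_j = i, j
    return result


def total_hunk_size(f: Sequence[str], g: Sequence[str], matching: Sequence[Pair]) -> int:
    """Count the lines of F and G that fall inside some hunk."""
    return sum(len(old) + len(new) for old, new in hunks(f, g, matching))
```

When two identical files have every line matched, the sketch returns no hunks. An empty matching puts all lines of both files into a single hunk, which matches the two extreme cases described above.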
For example, suppose the file F contains the three lines
‘a’, ‘b’, ‘c’, and the file G contains the same
three lines in reverse order ‘c’, ‘b’, ‘a’. If
diff finds the line ‘c’ as common, then the command
‘diff F G’ produces this output:
    1,2d0
    < a
    < b
    3a2,3
    > b
    > a
If diff notices the common line ‘b’ instead, it produces this output:
    1c1
    < a
    ---
    > c
    3c3
    < c
    ---
    > a
It is also possible to find ‘a’ as the common line.
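Both outputs above follow mechanically from the choice of common lines. The sketch below is a simplified, hypothetical `normal_diff` helper, not diff's own code. It assumes the same 0-based (i, j) pair representation and assumes the pairs are valid. It emits only the `a`, `d`, and `c` commands of the normal output format shown here.

```python
from typing import List, Sequence, Tuple


def _span(start: int, end: int) -> str:
    """Format a 1-based inclusive line range as normal diff output does."""
    return str(start) if start == end else f"{start},{end}"


def normal_diff(f: Sequence[str], g: Sequence[str],
                matching: Sequence[Tuple[int, int]]) -> List[str]:
    """Render the hunks implied by `matching` in diff's normal format.

    `matching` holds 0-based (i, j) pairs, strictly increasing in both
    coordinates, with f[i] == g[j]; this sketch assumes they are valid.
    """
    out: List[str] = []
    prev_i, prev_j = -1, -1
    for i, j in list(matching) + [(len(f), len(g))]:  # sentinel flushes the tail
        old = f[prev_i + 1:i]   # lines present only in F
        new = g[prev_j + 1:j]   # lines present only in G
        f_before, g_before = prev_i + 1, prev_j + 1  # lines preceding the hunk
        old_span = _span(f_before + 1, f_before + len(old))
        new_span = _span(g_before + 1, g_before + len(new))
        if old and new:
            out.append(f"{old_span}c{new_span}")   # change
        elif old:
            out.append(f"{old_span}d{g_before}")   # delete, after G line g_before
        elif new:
            out.append(f"{f_before}a{new_span}")   # add, after F line f_before
        out += [f"< {line}" for line in old]
        if old and new:
            out.append("---")
        out += [f"> {line}" for line in new]
        prev_i, prev_j = i, j
    return out


F = ["a", "b", "c"]
G = ["c", "b", "a"]
print("\n".join(normal_diff(F, G, [(2, 0)])))  # ‘c’ as the common line
```

The matching [(2, 0)] keeps ‘c’ as the common line and reproduces the first listing above. The matching [(1, 1)] keeps ‘b’ and reproduces the second. The matching [(0, 2)] keeps ‘a’ instead.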
diff does not always find an optimal matching between the files; it takes
shortcuts to run faster. But its output is usually close to the
shortest possible. You can adjust this tradeoff with the
--minimal (-d) option (see diff Performance).
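For small inputs, you can compute a matching with the smallest possible total hunk size exactly, as a longest common subsequence of the two line lists. The sketch below uses the textbook dynamic-programming LCS, whose time and memory grow with the product of the two file lengths. It is a swapped-in illustration of what an optimal matching means, not the algorithm diff itself uses. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def optimal_matching(f: Sequence[str], g: Sequence[str]) -> List[Tuple[int, int]]:
    """Return a longest list of 0-based (i, j) pairs with f[i] == g[j],
    strictly increasing in both i and j (a longest common subsequence)."""
    n, m = len(f), len(g)
    # lcs[i][j] = length of a longest common subsequence of f[i:] and g[j:].
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if f[i] == g[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    # Walk forward through the table, recording one optimal set of matches.
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if f[i] == g[j]:
            pairs.append((i, j))   # matching equal lines is always optimal here
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            i += 1                 # skipping f[i] loses nothing
        else:
            j += 1                 # skipping g[j] loses nothing
    return pairs


def min_total_hunk_size(f: Sequence[str], g: Sequence[str]) -> int:
    """Smallest possible number of lines of F and G left inside hunks."""
    return len(f) + len(g) - 2 * len(optimal_matching(f, g))
```

In the three-line F and G example, every optimal matching pairs exactly one line, so the minimum total hunk size is 4. That makes all three choices of common line discussed above equally good.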