

7.7 Text search across multiple lines

This section uses the N and D commands to search for consecutive words that span multiple lines. See Multiline techniques.

These examples deal with finding doubled occurrences of words in a document.

Finding doubled words within a single line is easy with GNU grep, and similarly with GNU sed:

$ cat two-cities-dup1.txt
It was the best of times,
it was the worst of times,
it was the the age of wisdom,
it was the age of foolishness,

$ grep -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
it was the the age of wisdom,

$ grep -n -E '\b(\w+)\s+\1\b' two-cities-dup1.txt
3:it was the the age of wisdom,

$ sed -En '/\b(\w+)\s+\1\b/p' two-cities-dup1.txt
it was the the age of wisdom,

$ sed -En '/\b(\w+)\s+\1\b/{=;p}' two-cities-dup1.txt
3
it was the the age of wisdom,

When the doubled word spans two lines, the above regular expression will not find it, because grep and sed operate line by line.
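
The limitation is easy to reproduce. The following is a minimal sketch, assuming GNU grep and a hypothetical two-line input in which "the" ends one line and begins the next; the variable names are illustrative only:

```shell
# Hypothetical two-line input: "the" ends line 1 and starts line 2.
split_input=$(printf '%s\n' 'worst of times, it was the' 'the age of wisdom,')

# grep tests each line separately, so no single line contains a doubled word.
# -c prints the number of matching lines; "|| true" absorbs grep's
# non-zero exit status when nothing matches.
count=$(printf '%s\n' "$split_input" | grep -c -E '\b(\w+)\s+\1\b' || true)

printf 'matching lines: %s\n' "$count"
```

No line is reported, even though the word is doubled across the line break.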

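With the N and D commands, sed can apply a regular expression across multiple lines: N appends the next input line to the pattern space, the regular expression is matched against the combined text, and D deletes the first line of the pattern space before the next cycle starts: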

$ cat two-cities-dup2.txt
It was the best of times, it was the
worst of times, it was the
the age of wisdom,
it was the age of foolishness,

$ sed -En '{N; /\b(\w+)\s+\1\b/{=;p} ; D}' two-cities-dup2.txt
3
worst of times, it was the
the age of wisdom,
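
To see what the regular expression is actually matched against, it can help to print the pattern space on every cycle. The following is a rough sketch, assuming GNU sed; it recreates the sample input in a temporary file and uses the l command (with -l 0 to disable line wrapping) in place of the search. The variable names are illustrative only:

```shell
# Recreate the sample input from this section in a temporary file.
tmp=$(mktemp)
printf '%s\n' \
  'It was the best of times, it was the' \
  'worst of times, it was the' \
  'the age of wisdom,' \
  'it was the age of foolishness,' > "$tmp"

# N appends the next input line to the pattern space.
# l prints the pattern space unambiguously: the embedded newline is
#   shown as \n and the end of the pattern space is marked with $.
# D deletes up to the first newline and restarts the cycle without
#   reading new input, so the window slides forward by one line.
windows=$(sed -n -l 0 'N; l; D' "$tmp")
printf '%s\n' "$windows"

rm -f "$tmp"
```

Each printed line is one two-line window. The doubled word shows up as the\nthe in the window that joins the second and third input lines, which is why the search prints line number 3.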

See the GNU coreutils manual for an alternative solution using tr -s and uniq at https://gnu.org/s/coreutils/manual/html_node/Squeezing-and-deleting.html.
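
The general shape of that alternative is sketched below, assuming GNU tr and uniq; the coreutils manual remains the authoritative reference, and the temporary file and variable names here are illustrative only. The idea is to put every word on its own line, fold case, and let uniq report adjacent repeats:

```shell
# Recreate the sample input from this section in a temporary file.
tmp=$(mktemp)
printf '%s\n' \
  'It was the best of times, it was the' \
  'worst of times, it was the' \
  'the age of wisdom,' \
  'it was the age of foolishness,' > "$tmp"

# 1. Turn punctuation and blanks into newlines, squeezing repeated newlines
#    so that each word ends up on its own line.
# 2. Fold to lower case so "The the" would also count as a repeat.
# 3. uniq -d prints one copy of each word that repeats on adjacent lines.
doubled=$(tr -s '[:punct:][:blank:]' '[\n*]' < "$tmp" \
  | tr '[:upper:]' '[:lower:]' \
  | uniq -d)
printf '%s\n' "$doubled"

rm -f "$tmp"
```

Unlike the sed approach, this reports only the repeated word itself, not the line number or the surrounding text.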

