Sometimes when you you write text, you duplicate words—as with “you
you” near the beginning of this sentence. I find that most
frequently, I duplicate “the”; hence, I call the function for
detecting duplicated words, ‘the-the’.
As a first step, you could use the following regular expression to search for duplicates:
\\(\\w+[ \t\n]+\\)\\1
This regexp matches one or more word-constituent characters followed by one or more spaces, tabs, or newlines, and then requires the same text to occur again. However, it does not detect duplicated words on different lines, since the ending of the first word, the end of the line, is different from the ending of the second word, a space. (For more information about regular expressions, see Regular Expression Searches, as well as Syntax of Regular Expressions in The GNU Emacs Manual, and Regular Expressions in The GNU Emacs Lisp Reference Manual.)
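The limitation described above can be checked with ‘string-match’. This is a minimal sketch: it assumes the first regexp is written as a word plus trailing whitespace in a group, followed by a back-reference, and the variable name is hypothetical.

```elisp
(defvar my-same-line-regexp "\\(\\w+[ \t\n]+\\)\\1"
  "Assumed form of the first regexp; the variable name is hypothetical.")

;; Both copies of "the " end with a space, so the back-reference matches.
(string-match my-same-line-regexp "Paris in the the spring")   ; → 9

;; The first copy is "the\n" but the second is "the ", so nothing matches.
(string-match my-same-line-regexp "Paris in the\nthe spring")  ; → nil
```

Because the whitespace sits inside the group, the back-reference demands exactly the same whitespace after the second word as after the first.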
You might try searching just for duplicated word-constituent characters, but that does not work, since the pattern detects doubles such as the two occurrences of “th” in “with the”.
Another possible regexp searches for word-constituent characters followed by non-word-constituent characters, reduplicated. Here, ‘\\w+’ matches one or more word-constituent characters and ‘\\W*’ matches zero or more non-word-constituent characters.
\\(\\(\\w+\\)\\W*\\)\\1
Again, not useful.
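To see why neither attempt is useful, here is a small sketch that runs both on short strings. The first pattern is only one assumed way of writing "duplicated word-constituent characters"; the second is an assumed form of the reduplicated pattern just described. The function names are hypothetical.

```elisp
(defun my-first-false-positive ()
  "Match duplicated word characters in \"with the\" (assumed pattern)."
  (let ((text "with the"))
    (when (string-match "\\(\\w+\\)\\W+\\1" text)
      (match-string 0 text))))

(defun my-second-false-positive ()
  "Match a word plus non-word characters, reduplicated, in \"this is it\"."
  (let ((text "this is it"))
    (when (string-match "\\(\\(\\w+\\)\\W*\\)\\1" text)
      (match-string 0 text))))

(my-first-false-positive)    ; → "th th"
(my-second-false-positive)   ; → "is is "
```

Neither pattern anchors the match at a word boundary, so the tail of one word can pair with the start of the next.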
Here is the pattern that I use. It is not perfect, but good enough. ‘\\b’ matches the empty string, provided it is at the beginning or end of a word; ‘[^@ \n\t]+’ matches one or more occurrences of any characters that are not an @-sign, space, newline, or tab.
\\b\\([^@ \n\t]+\\)[ \n\t]+\\1\\b
One can write more complicated expressions, but I found that this expression is good enough, so I use it.
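For readers who find the backslashes hard to follow, the same pattern can be spelled out with the ‘rx’ macro. This is a sketch rather than the form used in the function; the variable name is hypothetical, and the string ‘rx’ generates may order the characters inside the brackets differently while matching the same text.

```elisp
(require 'rx)

(defvar my-the-the-rx
  (rx word-boundary
      (group (+ (not (any "@ \n\t"))))  ; run of non-@, non-whitespace chars
      (+ (any " \n\t"))                 ; whitespace, possibly a line break
      (backref 1)                       ; the same run again
      word-boundary)
  "Hypothetical name: rx spelling of the duplicated-word pattern.")

;; The whitespace is outside the group, so a line break is fine.
(string-match my-the-the-rx "in the\nthe spring")   ; → 3

;; The word boundaries rule out the partial matches seen earlier.
(string-match my-the-the-rx "with the")             ; → nil
(string-match my-the-the-rx "this is it")           ; → nil
```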
Here is the ‘the-the’ function, as I include it in my ‘.emacs’ file, along with a handy global key binding:
(defun the-the ()
  "Search forward for for a duplicated word."
  (interactive)
  (message "Searching for for duplicated words ...")
  (push-mark)
  ;; This regexp is not perfect
  ;; but is fairly good over all:
  (if (re-search-forward
       "\\b\\([^@ \n\t]+\\)[ \n\t]+\\1\\b" nil 'move)
      (message "Found duplicated word.")
    (message "End of buffer")))

;; Bind 'the-the' to C-c \
(global-set-key "\C-c\\" 'the-the)
Here is test text:
one two two three four five five six seven
You can substitute the other regular expressions shown above in the function definition and try each of them on this list.
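If you would rather compare the expressions without editing the function each time, a small helper can collect every match in a string. This is only a sketch: ‘my-list-duplicates’ and ‘my-test-text’ are hypothetical names, and the test text is assumed here to break across two lines between the two occurrences of “five”.

```elisp
(defun my-list-duplicates (regexp text)
  "Return every match of REGEXP in TEXT, in order.
Hypothetical helper, not part of the original function."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (matches)
      (while (re-search-forward regexp nil t)
        (push (match-string 0) matches))
      (nreverse matches))))

;; Assumption: the line break falls between the two "five"s.
(defvar my-test-text "one two two three four five\nfive six seven")

;; The pattern used in the function finds both duplicates.
(my-list-duplicates "\\b\\([^@ \n\t]+\\)[ \n\t]+\\1\\b" my-test-text)
;; → ("two two" "five\nfive")

;; The first attempt misses the pair split across lines.
(my-list-duplicates "\\(\\w+[ \t\n]+\\)\\1" my-test-text)
;; → ("two two ")
```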