Previous: Coprocesses, Up: Shell Commands   [Contents][Index]


3.2.6 GNU Parallel

GNU Parallel, as its name suggests, can be used to build and run commands in parallel. You may run the same command with different arguments, whether they are filenames, usernames, hostnames, or lines read from files.

For a complete description, refer to the GNU Parallel documentation. A few examples should provide a brief introduction to its use.

It is easy to prefix each line in a text file with a specified string:

cat file | parallel -k echo prefix_string

The -k option is required to preserve the order of the lines, because jobs may otherwise finish, and print their output, out of order.

Similarly, you can append a specified string to each line in a text file:

cat file | parallel -k echo {} append_string
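Because echo separates its arguments with spaces, both commands above print a space between the added string and the line. The following is a minimal sketch, assuming GNU Parallel is installed, of placing {} explicitly so that the string is joined directly to each line. The function names prefix_lines and append_lines are hypothetical and are not part of Parallel.

```shell
# Hypothetical helpers (not part of GNU Parallel). Each reads lines from
# standard input and takes the string to add as its first argument.
# Assumption: the string contains no whitespace or shell metacharacters,
# since it becomes part of the command line that parallel hands to a shell.

prefix_lines() {
    # {} marks where each input line is substituted; -k keeps input order.
    parallel -k echo "$1"{}
}

append_lines() {
    parallel -k echo {}"$1"
}
```

For instance, `prefix_lines pre_ < file` would write each line of file with pre_ attached directly to its front.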

You can use Parallel to move files from the current directory when the number of files is too large to process with one mv invocation:

ls | parallel mv {} destdir

As you can see, the {} is replaced with each line read from standard input. This will run as many mv commands as there are files in the current directory. You can emulate a parallel xargs by adding the -X option, which places as many arguments on each command line as its length permits:

ls | parallel -X mv {} destdir
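Parsing the output of ls breaks on filenames that contain newlines. One possible alternative, sketched here under the assumption that GNU Parallel and a mv that accepts -- are available, is to pass NUL-terminated names and read them with Parallel's -0 option. The function name move_all is hypothetical.

```shell
# Hypothetical helper: move every non-hidden file in the current
# directory to the directory named by the first argument.
# Assumption: the destination path contains no whitespace or shell
# metacharacters, and the current directory is not empty.
move_all() {
    # printf is a shell builtin, so the expanded glob is not limited by
    # the system's maximum argument length. -0 makes parallel split its
    # input on NUL bytes; -- stops mv from treating names as options.
    printf '%s\0' * | parallel -0 -X mv -- {} "$1"
}
```

As with the ls version, the destination should not be one of the entries matched in the current directory, or mv will report an error for it.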

GNU Parallel can replace certain common idioms that operate on lines read from a file (in this case, filenames):

	for x in $(cat list); do
		do-something1 "$x" "config-$x"
		do-something2 < "$x"
	done | process-output

with a more compact syntax reminiscent of lambdas:

cat list | parallel "do-something1 {} config-{} ; do-something2 < {}" | process-output

Parallel provides a built-in mechanism to remove filename extensions, which lends itself to batch file transformations or renaming:

ls *.gz | parallel -j+0 "zcat {} | bzip2 >{.}.bz2 && rm {}"

This will recompress all files in the current directory with names ending in .gz using bzip2, running one job per CPU (-j+0) in parallel.
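Since the command above removes its input files, it can be worth previewing what would be run. The sketch below assumes a GNU Parallel version that supports the --dry-run option and the ::: syntax for passing arguments on the command line. The function name preview_recompress is hypothetical.

```shell
# Hypothetical helper: print, without executing them, the commands that
# the recompression example would run for the file names given as
# arguments.
preview_recompress() {
    # --dry-run prints each command after {} and {.} have been replaced;
    # ::: supplies the arguments directly instead of via standard input.
    parallel -k --dry-run "zcat {} | bzip2 >{.}.bz2 && rm {}" ::: "$@"
}
```

Running `preview_recompress *.gz` should list one pipeline per file, with {.} replaced by the name minus its final extension. Nothing is compressed or removed until the command is run without --dry-run.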

If a command generates output, you may want to preserve the input order in the output. For instance, the following command

{ echo foss.org.my ; echo debian.org; echo freenetproject.org; } | parallel traceroute

will display first the output of whichever traceroute invocation finishes first. Adding the -k option, as we saw above

{ echo foss.org.my ; echo debian.org; echo freenetproject.org; } | parallel -k traceroute

will ensure that the output of traceroute foss.org.my is displayed first.
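The effect of -k can be observed without network access. In this minimal sketch, which assumes GNU Parallel is installed, each job sleeps for the number of seconds it receives and then prints that number. The function names demo_unordered and demo_ordered are hypothetical.

```shell
# Hypothetical helpers. -j3 runs all three jobs at once, so the job
# given the shortest sleep finishes first.

demo_unordered() {
    # Output appears as each job completes.
    parallel -j3 'sleep {}; echo {}' ::: 2 0 1
}

demo_ordered() {
    # -k buffers the output and emits it in the order of the arguments.
    parallel -j3 -k 'sleep {}; echo {}' ::: 2 0 1
}
```

With -k the numbers are printed in the order they were given (2, 0, 1), although the job that sleeps for 2 seconds is the last to finish. Without -k they typically appear in order of completion.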

