Spinning off threads is not necessarily the most efficient way to run an application, because creating a new thread isn’t a cheap operation for the operating system. Threads are most useful when the input data are fixed and you want the same operation to be done on different parts of it. For example, you might give one input image to Crop and ask for multiple crops from various parts of it. In this case, the image is loaded into memory once, the crops are divided internally between the available threads, and each thread cuts out the crops assigned to it from that same image. On the other hand, if you have multiple images and you want to crop the same region(s) out of all of them, it is much more efficient to set --numthreads=1 (so no threads spin off) and run Crop multiple times simultaneously; see How to run simultaneous operations.
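The many-images case above can be sketched in plain shell with `xargs -P`. This is a minimal sketch, not the method from How to run simultaneous operations. The function name `crop_all_parallel` and the variables `CROP_CMD`, `CROP_ARGS` and `JOBS` are hypothetical. `CROP_ARGS` stands for whatever Crop options define your region(s), which this sketch does not attempt to guess. Each image gets its own single-threaded Crop process, and up to `JOBS` of them run at once.

```shell
#!/bin/sh
# Sketch: crop the same region(s) from many images by running several
# single-threaded Crop processes side by side (one image per process),
# instead of one multi-threaded process per image.
#
# Hypothetical settings (not from the manual):
#   CROP_CMD  - the Crop executable; astcrop is Gnuastro's Crop program.
#   CROP_ARGS - the options that define your crop region(s); empty here.
#   JOBS      - how many Crop processes to keep running at once.
CROP_CMD=${CROP_CMD:-astcrop}
CROP_ARGS=${CROP_ARGS:-}

crop_all_parallel() {
    # Nothing to do without input images.
    [ "$#" -gt 0 ] || return 0
    # Default to the number of processing units the OS reports online.
    njobs=${JOBS:-$(getconf _NPROCESSORS_ONLN)}
    # NUL-separate the file names so spaces in them survive, give xargs
    # one name per command (-n 1), and keep up to $njobs commands running
    # at the same time (-P). Inside sh -c, the file name arrives as $0.
    printf '%s\0' "$@" | xargs -0 -n 1 -P "$njobs" \
        sh -c "$CROP_CMD --numthreads=1 $CROP_ARGS \"\$0\""
}
```

For example, `crop_all_parallel img1.fits img2.fits img3.fits` (hypothetical file names) starts one `astcrop --numthreads=1` process per image, never running more than `JOBS` of them at the same time.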
You can check the speed boost by running a program twice on the same data set with everything else identical: once with the maximum number of threads and once with only one thread. You will notice that the wall-clock time of the multi-threaded run (reported by most programs at their end) is longer than the single-threaded time divided by the number of physical CPU cores (not threads) available to your operating system. At best these two times can be equal, but most of the time they aren’t. So it is more efficient to limit each program to one thread and run as many independent instances as there are available threads.
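The comparison described above can be reduced to one number. Here is a minimal sketch, assuming you have already noted the two wall-clock times in seconds and know your number of physical cores. The function `thread_efficiency` is hypothetical, not part of Gnuastro. It prints t_single / (ncores × t_multi). A value of 1 would mean the threads gave a perfect ncores-fold speed-up. Values below 1 mean the multi-threaded run took longer than t_single / ncores, which is the common situation described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gnuastro): compare a single-threaded
# and a multi-threaded run of the same program on the same data.
#   $1 = wall-clock time with one thread (seconds)
#   $2 = wall-clock time with many threads (seconds)
#   $3 = number of physical CPU cores
# Prints t_single / (ncores * t_multi), rounded to 3 decimals.
thread_efficiency() {
    awk -v ts="$1" -v tm="$2" -v nc="$3" 'BEGIN {
        # Reject values that would make the ratio meaningless.
        if (tm <= 0 || nc <= 0) {
            print "invalid input" > "/dev/stderr"
            exit 1
        }
        printf "%.3f\n", ts / (nc * tm)
    }'
}
```

For example, with hypothetical times of 8 s on one thread and 4 s on four cores, `thread_efficiency 8 4 4` prints `0.500`. The four-thread run took twice as long as an ideal four-fold speed-up would allow.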
Note that the operating system keeps a cache of recently processed data, so the second time you process an identical data set (independent of the number of threads used), you will usually get faster results. To make an unbiased comparison, you have to clear the system’s cache between the two runs with the following command.
$ sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
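The two timed runs and the cache clearing can be combined in one helper. This is a minimal sketch, assuming a Linux system (the `/proc/sys/vm/drop_caches` file and `sudo` used in the command above) and a program that accepts the `--numthreads` option. The function `time_run` and the variable `DROP_CACHES` are hypothetical. Wall-clock time is measured with `date +%s`, so it has only one-second resolution. That is only adequate for runs that take much longer than a second.

```shell
#!/bin/sh
# Hypothetical benchmark helper: clear the OS file cache, then run a
# command with the given number of threads and print its wall-clock time
# in whole seconds. Returns non-zero if either step fails.
#
# DROP_CACHES: command that clears the cache (Linux-specific; needs sudo).
DROP_CACHES=${DROP_CACHES:-'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null'}

time_run() {
    nthreads=$1; shift
    # Clear the cache so this run cannot reuse data cached by a previous one.
    sh -c "$DROP_CACHES" || return 1
    start=$(date +%s)
    # The program's own output is discarded; only the timing is printed.
    "$@" --numthreads="$nthreads" > /dev/null || return 1
    end=$(date +%s)
    echo $((end - start))
}
```

To use it, call `time_run 1` and then `time_run` with your maximum thread count, each followed by the same program and options. The two printed times are the ones to compare.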
SUMMARY: Should I use multiple threads? It depends: