Parallelizing shell tasks with project middleman and xargs


While poking around the Internet, I came across a link to project middleman. The project provides an easy way for administrators to parallelize tasks inside shell scripts, and is described rather nicely in the README file that comes with the source code:

“The philosophy behind mdm is that users should benefit from their multi-core systems without making drastic changes to their shell scripts. With mdm, you annotate your scripts to specify which commands might benefit from parallelization, and then you run it under the supervision of the mdm system. At runtime, the mdm system dynamically discovers parallelization opportunities and run the annotated commands in parallel as appropriate.”

And when they mention annotating a shell script, it really is as simple as placing the “mdm-run” binary in front of tasks that can be parallelized (you can also define an I/O profile if tasks will interfere with each other's I/O streams):

$ mdm-run convert2ogg *.mp3

This is pretty sweet, and I need to play around with this a bit more on my quad-core desktop. Rock on!
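For comparison, here is a minimal sketch of doing the same job by hand in a plain POSIX shell, without mdm. Each conversion is started in the background with “&”, and “wait” blocks until all of them have exited. This is not how mdm works internally. It has no cap on concurrency (a directory of 500 files launches 500 jobs at once) and no coordination of I/O, which is the bookkeeping mdm-run is meant to take off your hands. The convert_one and convert_all function names are hypothetical, and convert_one only creates an empty .ogg file as a stand-in for convert2ogg.

```shell
#!/bin/sh
# Hypothetical stand-in for convert2ogg: "converts" a file by creating an
# empty .ogg beside it. Swap in the real converter command.
convert_one() {
    : > "${1%.mp3}.ogg"
}

# Start one background job per file, then wait for all of them.
# There is no limit on how many jobs run at the same time.
convert_all() {
    for f in "$@"; do
        [ -e "$f" ] || continue   # skip an unmatched, literal '*.mp3' glob
        convert_one "$f" &        # '&' runs the conversion in the background
    done
    wait                          # block until every background job exits
}
```

From the directory holding the MP3 files, this would be run as “convert_all *.mp3”.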

I just came across “Parallelizing Jobs with xargs”, which describes how to use the xargs “-P” option to parallelize tasks:

$ ls *.mp3 | xargs -P 8 -n 1 convert2ogg

The “-P” option tells xargs to run up to 8 convert2ogg processes at a time (as soon as one finishes, xargs starts the next), and the “-n” option ensures that only one file name from the ls output is passed to each process. This is sweet, and I can DEFINITELY see myself using this super useful option in the future!
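Piping ls into xargs splits file names on whitespace, so a file like “My Song.mp3” reaches convert2ogg as two bogus arguments. A slightly sturdier sketch, assuming GNU or BSD xargs (both support “-0” and “-P”), passes the names NUL-delimited and sizes the job count to the machine's online CPUs via “getconf _NPROCESSORS_ONLN”, falling back to 4 if that query fails. The parallel_each function name is hypothetical.

```shell
#!/bin/sh
# Number of CPUs currently online; fall back to 4 if getconf fails.
JOBS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4)

# Usage: parallel_each COMMAND FILE...
# Runs COMMAND once per FILE, with at most $JOBS copies running at a time.
# printf writes each name followed by a NUL byte and xargs -0 splits on NUL,
# so names containing spaces or newlines arrive intact (unlike 'ls | xargs').
# -n 1 hands exactly one file name to each invocation.
parallel_each() {
    cmd=$1; shift
    # With no files, printf would emit a lone NUL and xargs would run
    # $cmd once with an empty argument, so bail out early instead.
    [ "$#" -gt 0 ] || return 0
    printf '%s\0' "$@" | xargs -0 -P "$JOBS" -n 1 "$cmd"
}
```

With that in place, the command from the post becomes “parallel_each convert2ogg *.mp3”.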

This article was posted by Matty on 2009-03-14 10:45:00 -0400