Using xargs and lscpu to spawn one process per CPU core


One of my friends reached out to me earlier this week to ask if there was an easy way to run multiple Linux processes in parallel. There are several ways to approach this problem, but most of them don’t take hardware cores and threads into account. My preferred solution for CPU-intensive operations is to use the xargs parallel option ("-P") along with the CPU core count reported by lscpu. This lets me run one process per core, so the jobs aren’t competing with each other for CPU time. But enough talk, let’s see an example.

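Let’s say you need to compress a directory full of log files and want to run one compression job on each CPU core. The parsable output of lscpu prints one line per logical CPU, so to get the number of online cores you can filter for online CPUs and count the unique core IDs:

$ CPU_CORES=$(lscpu -p=CORE,ONLINE | grep ',Y$' | sort -u | wc -l)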
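
The count above relies on lscpu being installed and reporting core IDs. As a rough sketch, the helper below counts unique online cores and falls back to nproc, and then to getconf, when lscpu is missing or reports nothing. The name count_cores is made up for illustration and is not a standard tool. Keep in mind that the fallbacks report logical CPUs, so on hyper-threaded machines they can return a larger number than the core count.

```shell
#!/bin/sh
# count_cores: hypothetical helper that prints the number of online cores.
count_cores() {
    n=""
    if command -v lscpu >/dev/null 2>&1; then
        # lscpu -p prints one "CORE,ONLINE" line per logical CPU.
        # Keep the online ones and collapse hyper-thread siblings
        # that share a core ID.
        n=$(lscpu -p=CORE,ONLINE 2>/dev/null | grep ',Y$' | sort -u | wc -l | tr -d ' ')
    fi
    # Fall back to the logical CPU count if lscpu was missing or found nothing.
    if [ -z "$n" ] || [ "$n" -lt 1 ]; then
        n=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
    fi
    echo "$n"
}

CPU_CORES=$(count_cores)
echo "Using ${CPU_CORES} parallel jobs"
```

With this in place, CPU_CORES can be passed to xargs in the same way as the value computed by the one-liner.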

To generate a list of files we can run find and pass the output of that to xargs:

$ find . -type f -name '*.log' | xargs -n1 -P${CPU_CORES} bzip2

The xargs command listed above will keep up to CPU_CORES bzip2 processes running at once, one per core, and hand each of them a single log file ("-n1"). To make sure the pipeline is working as intended, we can watch it with a simple while loop:

$ while :; do ps auxwww | grep '[b]zip'; sleep 1; done

matty 14322 0.0 0.0 113968 1228 pts/0 S+ 07:24 0:00 xargs -n1 -P4 bzip2
matty 14323 95.0 0.0 13748 7624 pts/0 R+ 07:24 0:11 bzip2 ./log10.txt.log
matty 14324 95.9 0.0 13748 7616 pts/0 R+ 07:24 0:11 bzip2 ./log2.txt.log
matty 14325 96.0 0.0 13748 7664 pts/0 R+ 07:24 0:11 bzip2 ./log3.txt.log
matty 14326 94.9 0.0 13748 7632 pts/0 R+ 07:24 0:11 bzip2 ./log4.txt.log
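
One caveat with the pipeline above: plain xargs splits its input on whitespace, so a log file with a space in its name would be handed to bzip2 in pieces. Here is a minimal sketch of a more defensive variant, assuming GNU find and xargs (for the -print0, -0 and -r options). The function name compress_logs and its arguments are made up for illustration. The optional third argument lets you swap in another compressor such as gzip.

```shell
#!/bin/sh
# compress_logs DIR NJOBS [COMPRESSOR]
# Hypothetical helper: compress every *.log file under DIR, running at most
# NJOBS compressor processes at a time. COMPRESSOR defaults to bzip2.
compress_logs() {
    dir=$1
    njobs=$2
    compressor=${3:-bzip2}

    # -print0 / -0 separate file names with NUL bytes, so names containing
    # spaces or newlines reach the compressor intact.
    # -r (GNU xargs) skips running the compressor when no files match.
    find "$dir" -type f -name '*.log' -print0 |
        xargs -0 -r -n1 -P"$njobs" "$compressor"
}
```

Calling compress_logs . "$CPU_CORES" should behave like the original pipeline, just without tripping over awkward file names.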

There are a number of other useful things you can do with the items listed above but I will leave that to your imagination. Viva la xargs!

This article was posted by Matty on 2017-08-17 14:38:00 -0400