Managing shell commands with Python

Let's admit it: shell scripts can sometimes be hard to write, especially when you have thousands of shell commands to run and want to limit the resources they use on, say, a 10-core CPU. Whether they are test programs or simulations, it would be nice to have a manager that throttles them without starving any of them.

Background

I was once asked to write a Python script to run over 500 simulations on a 40-core server, with each one taking several hours to complete on a single core. I did not want to run them all at the same time, as each one would only get about 0.08 of a core, stretching the completion time to several days.

I also did not want my script to be "dumb": it should start the next task as soon as a previous one finishes. Nor did I want it to be complicated by synchronization between threads. So I turned to Python's asyncio library, which runs coroutines asynchronously in a comprehensible manner. The result was the parallel-manager package, which is super easy to use; even its simplest manager can schedule the tasks automatically.

parallel-manager to the rescue!

Using the package is quite simple. First, we install it from PyPI:

pip install parallel-manager

Then we initialize the manager and add shell commands to its task queue, all in a single async function:

from parallel_manager.manager import BaseShellManager
from parallel_manager.workerGroup import ShellWorkerGroup
import logging
import asyncio

async def Main():
    # Init the manager
    # Worker group: name, logger, log output directory, and worker count
    simpleShellWorkergroup = ShellWorkerGroup("simpleShellWorkergroup",
                                        logging.getLogger(),
                                        "./log",
                                        10)
    simpleShellManager = BaseShellManager("simpleShellManager")
    simpleShellManager.add_workergroup("shell", simpleShellWorkergroup)
    await simpleShellManager.init()

    # Adding tasks
    # Here we just run 100 echo commands
    for i in range(100):
        simpleShellManager.add_shell_request(f"echoing loop-{i}", f"echo {i}")

    # Wait for the manager to finish
    await simpleShellManager.done()

# That's it! We can run the manager now for these commands!
asyncio.run(Main())

You can add any shell commands to the manager; just replace the for loop with your own.
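
For example, if your commands live in a plain-text file (a hypothetical commands.txt here, one command per line), the loop could become:

    # Hypothetical: queue every non-empty line of commands.txt as a request
    with open("commands.txt") as f:
        for i, cmd in enumerate(line.strip() for line in f if line.strip()):
            simpleShellManager.add_shell_request(f"task-{i}", cmd)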

We also add an await simpleShellManager.done() at the end to wait for all the requests to finish.

Then we run the manager with asyncio.run(Main()), which will create the manager, add shell commands, and wait for them to finish. All the requests will have their stdout and stderr redirected under the ./log folder.
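
The exact log file names are up to the package, but once the run is done you can inspect whatever ended up under ./log with a couple of lines of standard-library Python:

from pathlib import Path

# Print each produced log file and its size
for log_file in sorted(Path("./log").iterdir()):
    print(log_file, log_file.stat().st_size, "bytes")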

Explanation

The above script will create 10 workers, each of which starts a single process for the shell command it acquires. Thus, even though we have 100 echo commands to run, at most 10 of them will be running at any given time.
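
Under the hood this is the classic asyncio pattern of bounding concurrency. The package has its own worker abstraction, but the core idea can be sketched with nothing more than a semaphore and asyncio's subprocess support (a minimal, hypothetical sketch, not the package's actual implementation):

import asyncio

async def run_one(cmd, sem):
    # One semaphore slot corresponds to one free worker
    async with sem:
        proc = await asyncio.create_subprocess_shell(cmd)
        await proc.wait()
        return proc.returncode

async def main():
    sem = asyncio.Semaphore(10)  # at most 10 commands run concurrently
    cmds = [f"echo {i}" for i in range(100)]
    # As soon as one command exits, its slot is reused by the next one
    await asyncio.gather(*(run_one(c, sem) for c in cmds))

asyncio.run(main())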

You could also increase the worker count based on your computer configuration:

# Creating 40 workers in a worker group
simpleShellWorkergroup = ShellWorkerGroup("simpleShellWorkergroup",
                                        logging.getLogger(),
                                        "./log",
                                        40)
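
For CPU-bound commands, one worker per core is a reasonable starting point. Sizing it with os.cpu_count() is just a suggestion on my part, not something the package requires:

import os

# One worker per CPU core (e.g. 40 on a 40-core server)
worker_count = os.cpu_count() or 1
simpleShellWorkergroup = ShellWorkerGroup("simpleShellWorkergroup",
                                    logging.getLogger(),
                                    "./log",
                                    worker_count)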

Links

You can visit the package's GitHub page to report bugs or find detailed instructions: https://github.com/William-An/parallel-manager
