Wednesday, June 22, 2016

Python Module: joblib - make parallelism easy!

This week, I found a nice Python module for quick parallel computing: joblib. I used to do parallel computing with Python's multiprocessing module, but for a quick and dirty way to parallelize a for loop, joblib is a very nice tool! Here's an example. You can download the example from Qingkai's GitHub.

Serial Version

In [1]:
def square_int(i):
    return i * i
In [2]:
results = []
for i in range(10):
    # Call the function once per loop iteration and collect the result
    results.append(square_int(i))
print(results)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Parallel Version

In [3]:
from joblib import Parallel, delayed

# n_jobs=-1 uses all available CPUs; results come back in input order
results = Parallel(n_jobs=-1, backend="threading")(
    delayed(square_int)(i) for i in range(10))

print(results)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
We can see that square_int is the function that you want to run in the for loop. The main restriction is that the function (here square_int) must be a top-level function. The backend parameter can be either "threading" or "multiprocessing". If you choose "multiprocessing", under the hood the Parallel object creates a multiprocessing pool that forks separate Python worker processes to execute tasks concurrently on separate CPUs.

If you know that the function you are calling is based on a compiled extension that releases the Python Global Interpreter Lock (GIL) during most of its computation, then it might be more efficient to use threads instead of Python processes as concurrent workers.
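
To make the backend choice a bit more concrete, here is a minimal sketch that runs the same job with both backends and times each one. The names slow_square and run_with_backend are made up for this illustration, and time.sleep is only a stand-in (an assumption on my part) for a compiled extension that releases the GIL while it works. The timings you get will depend on your machine.

```python
import time

from joblib import Parallel, delayed


def slow_square(i):
    # Hypothetical workload: time.sleep releases the GIL, so it stands in
    # for a compiled extension that does the same during its computation.
    time.sleep(0.1)
    return i * i


def run_with_backend(backend, n_jobs=4, n_tasks=8):
    # Hypothetical helper: run slow_square over range(n_tasks) with the
    # given joblib backend and return (results, elapsed_seconds).
    start = time.time()
    results = Parallel(n_jobs=n_jobs, backend=backend)(
        delayed(slow_square)(i) for i in range(n_tasks))
    return results, time.time() - start


if __name__ == "__main__":
    # slow_square is defined at module top level so that worker processes
    # can import it when the "multiprocessing" backend is used.
    for backend in ("threading", "multiprocessing"):
        results, elapsed = run_with_backend(backend)
        print("%s: %s in %.2f s" % (backend, results, elapsed))
```

Because the workers spend their time sleeping with the GIL released, the threads can overlap here, which is the situation the paragraph above describes. For a pure-Python function like square_int, threads would not overlap in the same way.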

1 comment:

  1. The Serial Version's execution time is less than the Parallel Version's.
    Serial Version execution time:
    real 0m0.287s
    user 0m0.579s
    sys 0m6.112s

    Parallel Version execution time:

    real 0m0.460s
    user 0m0.614s
    sys 0m6.084s