Use Joblib to run your Python code in parallel
For many problems, parallel computing can significantly increase computing speed. As PC computing power grows, we can easily speed up our computations by running parallel code on our own machines. Joblib is a package that turns our Python code into parallel computing mode with little effort and, of course, increases the computing speed.
Joblib is a set of tools to provide lightweight pipelining in Python. In particular:
- transparent disk-caching of functions and lazy re-evaluation (memoize pattern)
- easy simple parallel computing

Joblib is optimized to be fast and robust, in particular on large data, and has specific optimizations for numpy arrays.
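The disk-caching (memoize) feature mentioned above is provided by joblib.Memory. A minimal sketch, assuming a local cache directory of our own choosing:

```python
from joblib import Memory

# Cache results under a local directory (the path is our own choice)
memory = Memory("./joblib_cache", verbose=0)

@memory.cache
def slow_square(x):
    # Pretend this is expensive; repeated calls with the same
    # argument are served from the disk cache instead of recomputed
    return x ** 2

print(slow_square(4))  # computed on the first call
print(slow_square(4))  # loaded from the disk cache
```

The second call with the same argument returns the cached result without re-running the function body.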
Here we use a simple example to demonstrate the parallel computing functionality. We define a simple function my_fun with a single parameter i. This function will wait 1 second and then compute the square root of i**2.
from joblib import Parallel, delayed
import time, math

def my_fun(i):
    """ We define a simple function here.
    """
    time.sleep(1)
    return math.sqrt(i**2)
Here we set the total number of iterations to 10. We use the time.time() function to measure the running time of my_fun(). Using a simple for loop, the computation takes about 10 seconds.
num = 10
start = time.time()
for i in range(num):
    my_fun(i)

end = time.time()
print('{:.4f} s'.format(end-start))

10.0387 s
With the Parallel and delayed functions from Joblib, we can easily configure a parallel run of the my_fun() function. n_jobs is the number of parallel jobs, and we set it to 2 here. i is the input parameter of the my_fun() function, and we'd like to run 10 iterations. Unsurprisingly, the 2 parallel jobs take about half of the original for-loop running time, that is, about 5 seconds.
start = time.time()
# n_jobs is the number of parallel jobs
Parallel(n_jobs=2)(delayed(my_fun)(i) for i in range(num))
end = time.time()
print('{:.4f} s'.format(end-start))

5.6560 s
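Note that Parallel also returns the results as a list, in the same order as the inputs; the timing snippet above simply discards that list. A small sketch:

```python
from joblib import Parallel, delayed
import math

# Parallel collects the return values into a list, ordered by input
results = Parallel(n_jobs=2)(delayed(math.sqrt)(i**2) for i in range(5))
print(results)  # [0.0, 1.0, 2.0, 3.0, 4.0]
```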
It’s that easy! What if our function has more than one parameter? That’s also very simple. Let’s define a new function with two parameters, my_fun_2p(i, j).
def my_fun_2p(i, j):
    """ We define a simple function with two parameters.
    """
    time.sleep(1)
    return math.sqrt(i**j)

j_num = 3
num = 10
start = time.time()
for i in range(num):
    for j in range(j_num):
        my_fun_2p(i, j)

end = time.time()
print('{:.4f} s'.format(end-start))

30.0778 s

start = time.time()
# n_jobs is the number of parallel jobs
Parallel(n_jobs=2)(delayed(my_fun_2p)(i, j) for i in range(num) for j in range(j_num))
end = time.time()
print('{:.4f} s'.format(end-start))

15.0622 s
Besides the parallel computing functionality, Joblib also has the following features:
- Transparent and fast disk-caching of output value: a memoize or make-like functionality for Python functions that works well for arbitrary Python objects, including very large numpy arrays.
- Fast compressed persistence: a replacement for pickle to work efficiently on Python objects containing large data (joblib.dump & joblib.load).
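As a quick illustration of the persistence feature, joblib.dump and joblib.load can be used much like pickle (the file name and compression level here are our own choices):

```python
from joblib import dump, load

obj = {"weights": [0.1, 0.2, 0.3], "name": "model"}

# Persist to disk; compress takes a level from 0 (off) to 9
dump(obj, "model.joblib", compress=3)

# Read the object back from disk
restored = load("model.joblib")
print(restored == obj)  # True
```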
More details can be found at Joblib official website.
More tutorials and articles can be found at my blog-Measure Space and my YouTube channel.