Trivial Python Multiprocessing

I just wrote up a notebook for a fellow PhD student on how I use Python’s built-in multiprocessing library to do embarrassingly parallel computations much faster. Every time I think about it, I’m floored at how simple the library is to use for certain operations.

There’s a ton of uncertainty out there around the state of parallel computing in Python, and I’m not an expert. But I figure if it’s good enough for the unicorn I worked for, it’s good enough for a computational social scientist. And since you can prototype so fast, it’s easy to run far more simulations in parallel than you could ever expect to run sequentially.

I mostly use multiprocessing for easy stuff like Monte Carlo simulation and GIS processing, where many of the operations are embarrassingly parallel, meaning that no information is shared between runs of the procedure. The classic case is a Monte Carlo simulation, where each run computes some statistic about one realization of a stochastic data generating process.
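To make the "no information is shared" point concrete, here is a minimal sketch of a Monte Carlo run written so that each call depends only on its own argument. The names simulate and run_all, the choice of statistic (the mean of 1,000 standard normal draws), and the use of a seed as the argument are all assumptions of mine for illustration, not something from the original notebook.

```python
import multiprocessing as mp
import random
import statistics


def simulate(seed):
    """One independent run: draw 1,000 standard normal values, return their mean."""
    rng = random.Random(seed)  # private generator, so runs share no state
    draws = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    return statistics.fmean(draws)


def run_all(seeds, processes=None):
    """Map simulate over the seeds using a pool of worker processes."""
    # processes=None lets the pool default to the number of CPUs.
    with mp.Pool(processes) as pool:
        return pool.map(simulate, seeds)


if __name__ == "__main__":
    means = run_all(range(200))
    # Spread of the simulated means across the 200 runs.
    print(len(means), statistics.stdev(means))
```

Because each run builds its own generator from its seed, the result of a run doesn’t depend on which worker picks it up or in what order, which is exactly what makes the problem embarrassingly parallel.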

Many GIS operations and geoprocessing techniques are also embarrassingly parallel, like constructing the minimum bounding circles for a set of polygons. This parallelizes easily, since each polygon’s minimum bounding circle is independent of every other polygon’s. So, if you can define your function to take one set of parameters and compute one result, then you can map that function over your simulation matrix.

For some experiment function, experiment, and a matrix of configurations, data, multiprocessing in Python is often as simple as adding this below the declaration of your function:

import multiprocessing as mp

# One worker process per CPU core; the with block shuts the pool down afterwards.
with mp.Pool(mp.cpu_count()) as pool:
    results = pool.map(experiment, data)
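If it helps to see what experiment and data might look like, here is one possible sketch. Everything specific in it is an assumption for illustration: the (n, p, seed) layout of each configuration row, the coin-flipping experiment itself, and the grid values. The only part taken from above is the pattern of mapping one function over a collection of configurations.

```python
import itertools
import multiprocessing as mp
import random


def experiment(config):
    """Take one (n, p, seed) row and return one result: the share of successes."""
    n, p, seed = config
    rng = random.Random(seed)
    successes = sum(rng.random() < p for _ in range(n))
    return successes / n


# Every combination of sample size, success probability, and seed is one row.
data = list(itertools.product([100, 1000], [0.1, 0.5, 0.9], range(5)))

if __name__ == "__main__":
    with mp.Pool(mp.cpu_count()) as pool:
        results = pool.map(experiment, data)
    # pool.map returns results in the same order as the rows of data.
    for config, result in zip(data[:3], results[:3]):
        print(config, result)
```

The if __name__ == "__main__" guard doesn’t matter much inside a notebook on Linux, but in a script it keeps worker processes from re-running the pool setup when they import the module on platforms that spawn rather than fork.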

So, say you’re computing the Isoperimetric Quotient for a ton of shapes. You can just:

import math
import multiprocessing as mp

def ipq(polygon):
    # Isoperimetric quotient: 4 * pi * area / perimeter^2
    return (4 * math.pi * polygon.area) / (polygon.perimeter ** 2)

with mp.Pool(mp.cpu_count()) as pool:
    results = pool.map(ipq, polygons)

And then results contains the IPQ for each polygon, in the same order as polygons. This is super simple, and can save tons of time when you can’t figure out how to vectorize a particular operation, or just plain need to do a ton of processing.
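For anyone who wants to try this without a GIS library installed, here is a self-contained sketch. SimplePolygon and regular_polygon are hypothetical stand-ins I made up for illustration; a real polygon class from your GIS library of choice may name its attributes differently (for example, length rather than perimeter), so treat the attribute names as assumptions.

```python
import math
import multiprocessing as mp


class SimplePolygon:
    """Hypothetical stand-in for a GIS polygon: a ring of (x, y) vertices."""

    def __init__(self, vertices):
        self.vertices = list(vertices)

    def _edges(self):
        # Pair each vertex with the next one, wrapping around to close the ring.
        return zip(self.vertices, self.vertices[1:] + self.vertices[:1])

    @property
    def area(self):
        # Shoelace formula; abs() makes the result independent of winding order.
        return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in self._edges())) / 2

    @property
    def perimeter(self):
        return sum(math.dist(a, b) for a, b in self._edges())


def ipq(polygon):
    return (4 * math.pi * polygon.area) / (polygon.perimeter ** 2)


def regular_polygon(n_sides):
    """Regular n-gon inscribed in the unit circle."""
    return SimplePolygon(
        (math.cos(2 * math.pi * k / n_sides), math.sin(2 * math.pi * k / n_sides))
        for k in range(n_sides)
    )


polygons = [regular_polygon(n) for n in (3, 4, 6, 12, 100)]

if __name__ == "__main__":
    with mp.Pool(mp.cpu_count()) as pool:
        results = pool.map(ipq, polygons)
    for polygon, value in zip(polygons, results):
        print(len(polygon.vertices), round(value, 3))
```

The IPQ of a circle is 1, so the values should climb toward 1 as the number of sides grows. Five polygons are far too few to see a speedup, since starting the worker processes costs more than the arithmetic; the payoff comes when the list is long or each call is expensive.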

imported from: yetanothergeographer