Today we provide a helpful script inspired by R’s microbenchmark package for benchmarking in Python. We then use this script to compare random number generation (RNG) in Python vs R. You may want to view a previous blog on R vs Python.
At the moment if you want to benchmark functions in Python, there are limited tools that allow you to do this apart from basic clock time. The timeit module is available but it doesn’t have the most convenient usage syntax. So after using the microbenchmark package, we were inspired to outline a similar simple script in Python.
The code in this blog entry is run/compiled on Ubuntu 14.04, core i7 processor having 16 GB RAM, using the g++ version 4.8.2 compiler, R version 3.1.0, Python version 2.7.6. The script is now located in github.
We load a few python modules that will be used in the script:
import collections as col
import numpy as np
import time as tt
import tabulate as tab
Next we create a SimpleTable object, which is literally a simple table type object that inherits from an OrderedDict
in the collections
library. It is rudimentary and designed to hold the data from the benchmarks but does not have any checking or advanced features. The keys of the dict
object represent the column headers and column data is held in the values:
class SimpleTable(col.OrderedDict):
"""
SimpleTable is just a OrderedDict with some convenience functions
It is not designed to be full featured nor does it do any checking
of the table structure.
The keys represent the column headers and the values represent the
columns.
"""
def nrow(self):
"""
Returns the number of rows in the table
"""
return self[list(self.viewkeys())[0]].__len__()
def ncol(self):
"""
Returns the number of columns in the table
"""
return self.__len__()
def rbind(self, row):
"""
Binds any suitable dict, OrderedDict, or SimpleTable
To the current table
:row: suitable dict or SimpleTable of rows to be bound
"""
for k in self.viewkeys():
for i in np.arange(row[k].__len__()):
self[k].append(row[k][i])
def __repr__(self):
return tab.tabulate(self, headers = "keys")
Below we have a function that formats and rounds units of time from seconds to either "ms"
miliseconds, "us"
microsceonds, "ns"
nanoseconds, "s"
seconds (rounding), or simply return the "raw"
timing output from tt.time
function.
# Function to transform and round time units
def timeUnits(times, unit = "ms", ndigits = 5):
"""
Function formats times for the bench function
:param times: list of times to be formatted
:param unit: string the units of time: "ms": (milliseconds), "us": (microseconds),
"ns": nanoseconds, "s": time in seconds, "raw": raw time in seconds
:param ndigits: number of decimal places to round down the times
:return: list of formatted times
"""
if unit == "ms":
return [round(i * 1E3, ndigits) for i in times]
elif unit == "us":
return [round(i * 1E6, ndigits) for i in times]
elif unit == "ns":
return [round(i * 1E9, ndigits) for i in times]
elif unit == "s":
return [round(i, ndigits) for i in times]
elif unit == "raw":
return times
Next we provide three functions for benchmarking, bench
is the workhorse function that carries out the benchmark of a single case defined in a string, the lbenchmark
function carries out benchmarks of a list of of strings to be evaluated, and benchmark
allows multiple strings to be entered one after the other as different arguments in the function. So for common use, the lbenchmark
and benchmark
functions are the available - take your pick.
The bench
function is given below:
def bench(sExpr, neval=100, units = "ms", ndigits = 5):
"""
:param expr: string expression to be evaluated
:param neval: number of evaluations that the statistics will be calculated from
:param units: string the units of time: "ms": (milliseconds), "us": (microseconds),
"ns": nanoseconds, "s": time in seconds, "raw": raw time in seconds
:param ndigits: number of decimal places to round down the times
:return: SimpleTable of times min, lower, mid, upper quartiles and max time, and the expression run
"""
times = np.ndarray(shape=(neval,), dtype = np.float64)
expr = compile(sExpr, "<string>", "eval")
for i in np.arange(neval):
start = tt.time()
out = eval(expr)
end = tt.time()
times[i] = end - start
times = np.percentile(times, [0, 25, 50, 75, 100])
times = timeUnits(times, units, ndigits)
summ = SimpleTable([('expr', [sExpr]), ('min', [times[0]]), ('lq', [times[1]]), ('median', [times[2]]), ('uq', [times[3]]), ('max', [times[4]])], ('neval', [neval]))
return summ
The lbenchmark
function is given below:
def lbenchmark(lExpr, **kwargs):
"""
List version of benchmark, takes in a list of string expressions as lExpr
:param lExpr: list of strings to be evaluated
:param neval: number of evaluations that the statistics will be calculated from
:param units: string the units of time: "ms": (milliseconds), "us": (microseconds),
"ns": nanoseconds, "s": time in seconds, "raw": raw time in seconds
:param ndigits: number of decimal places to round down the times
:return: SimpleTable of times min, lower, mid, upper quartiles and max time, and the expression run
"""
nExpr = lExpr.__len__()
out = bench(lExpr[0], **kwargs)
if nExpr > 1:
for i in np.arange(1, nExpr):
out.rbind(bench(lExpr[i], **kwargs))
return out
The benchmark
function is given below:
def benchmark(*args, **kwargs):
"""
Function running bench underneath for multiple strings
:param args: the strings to be evaluated
:param kwargs: arguments passed to bench
:return: SimpleTable of times min, lower, mid, upper quartiles and max time, and the expression run
"""
nExpr = args.__len__()
out = bench(args[0], **kwargs)
if nExpr > 1:
for i in np.arange(1, nExpr):
out.rbind(bench(args[i], **kwargs))
return out
Here we give examples that benchmark a few functions for sampling from statistical distributions and compare them to their R counterparts. Both R and Python use the Mersenne Twister as the RNG engine:
# Units are microseconds ...
>>> evalStrings = ["np.random.uniform(size=1000)", "np.random.normal(size=1000)",
"np.random.binomial(size=1000, n=10, p = .5)", "np.random.poisson(size=1000, lam=1)"]
>>> bench(evalStrings[0], units = "us")
expr min lq median uq max neval
---------------------------- ------- ------- -------- ------ ------- -------
np.random.uniform(size=1000) 23.8419 24.0803 25.034 25.034 46.9685 100
>>> print benchmark("np.random.uniform(size=1000)", "np.random.normal(size=1000)",
"np.random.binomial(size=1000, n=10, p = .5)", "np.random.poisson(size=1000, lam=1)",
units="us")
expr min lq median uq max neval
------------------------------------------- ------- ------- -------- -------- -------- -------
np.random.uniform(size=1000) 15.974 16.9277 16.9277 17.345 36.9549 100
np.random.normal(size=1000) 56.982 57.9357 57.9357 58.8894 148.058 100
np.random.binomial(size=1000, n=10, p = .5) 57.9357 58.8894 64.1346 147.104 190.02 100
np.random.poisson(size=1000, lam=1) 61.9888 63.1809 66.0419 71.0487 105.143 100
>>> lbenchmark(evalStrings, units = "us")
expr min lq median uq max neval
------------------------------------------- ------- ------- -------- ------- -------- -------
np.random.uniform(size=1000) 24.7955 25.9876 25.9876 28.1334 56.0284 100
np.random.normal(size=1000) 76.0555 78.9166 85.1154 86.0691 168.085 100
np.random.binomial(size=1000, n=10, p = .5) 59.1278 64.8499 66.5188 96.0827 120.163 100
np.random.poisson(size=1000, lam=1) 64.8499 69.6778 70.0951 71.0487 97.99 100
> require(microbenchmark)
Loading required package: microbenchmark
> n = 1000> microbenchmark(runif(n), rnorm(n), rbinom(n, p = .5, size = 10), rpois(n, 1))
Unit: microseconds
expr min lq median uq max neval
runif(n) 37.317 39.6890 40.914 42.2355 70.507 100
rnorm(n) 82.480 85.4910 86.836 88.2970 154.761 100
rbinom(n, p = 0.5, size = 10) 92.181 94.5645 95.781 97.2435 153.588 100
rpois(n, 1) 39.534 41.3360 42.541 43.7010 85.183 100
We have created a few easy to use functions here for quickly carrying out benchmarks. These tools allow programmers to have an idea of what the performance of different functions will be and we provided some examples of usage. The scripts brings some R-like sugar to Python.
The timed RNG tests reveals a little better RNG performance from Python overall (particularly for the uniform distribution at more than x2 faster). We will probably come back to this topic at some later blog.