# Benchmarks of speed (Numpy vs all)

Personally I am a big fan of the **numpy** package, since it makes the code clean and still quite fast. However, I am quite worried about its speed, so I decided to collect different benchmarks.

**numpy** vs **cython** vs **weave**: numpy is about 2 times slower than the others (posted in 2011).

http://technicaldiscovery.blogspot.ru/2011/06/speeding-up-python-numpy-cython-and.html

This post is primarily about numba: pairwise distances are computed with **cython**, **numpy**, and **numba**. **Numba** is claimed to be the fastest, around 10 times faster than numpy (posted in 2013).

https://jakevdp.github.io/blog/2013/06/15/numba-vs-cython-take-2/

**Julia** is claimed by its developers to be a very fast language. Well, if that were true, there would be no need to write easily vectorizable operations in pure Python for the comparison (yes, I mean they are simply cheating). Currently these 'benchmarks' are on the main page:

http://julialang.org/

The following post was written in 2011. As usual, the problem considered is solving the Laplace equation.

Numpy is ~10 times slower than the pure C++ solution (and on par with MATLAB), while **weave** shows speed comparable to pure C++ (I don't know anyone who uses weave now):

http://wiki.scipy.org/PerformancePython
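To make the Laplace comparison more concrete, here is a minimal sketch of my own (not the code from the PerformancePython page) of a single Jacobi iteration on a square grid. It is written once with plain Python loops and once with numpy slicing. The grid size, the boundary condition (top edge held at 1), and the function names `laplace_step_python`, `laplace_step_numpy`, and `make_grid` are all hypothetical and only serve as illustration.

```python
import time
import numpy as np

def laplace_step_python(u):
    """One Jacobi sweep over the interior points using explicit Python loops."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]  # copy; boundary values stay fixed
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return new

def laplace_step_numpy(u):
    """The same Jacobi sweep expressed with array slicing (no Python-level loop)."""
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:])
    return new

def make_grid(n=100):
    # Hypothetical boundary condition: top edge held at 1, everything else 0.
    u = np.zeros((n, n))
    u[0, :] = 1.0
    return u

if __name__ == "__main__":
    grid = make_grid()
    t0 = time.perf_counter()
    py_result = laplace_step_python(grid.tolist())
    t1 = time.perf_counter()
    np_result = laplace_step_numpy(grid)
    t2 = time.perf_counter()
    assert np.allclose(np.array(py_result), np_result)
    print(f"python loops: {t1 - t0:.4f} s, numpy slicing: {t2 - t1:.6f} s")
```

The timings depend on the machine. The slicing version moves the inner loop into compiled code. I believe the remaining gap to C++ comes partly from the temporary arrays that numpy allocates for each `+`.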

A fresh (2014) benchmark of different Python tools evaluates the simple vectorized expression `A*B - 4.1*A > 2.5*B` with **numpy, cython, numba, numexpr, and parakeet**. The last two are the fastest, taking about 10 times less time than numpy, which they achieve by multithreading on two cores:

http://nbviewer.ipython.org/github/rasbt/One-Python-benchmark-per-day/blob/master/ipython_nbs/day7_2_jit_numpy.ipynb
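For reference, the sketch below shows the expression in plain numpy, as a pure Python loop, and as a chunked numpy variant. The chunked version is only my illustration of one trick numexpr uses, namely evaluating in small blocks so that temporaries stay small. It is single-threaded and is not numexpr itself. The array sizes, `chunk=4096`, and the function names are arbitrary assumptions, and the sketch assumes 1-D float arrays.

```python
import time
import numpy as np

def expr_numpy(A, B):
    # Plain numpy: each operator allocates a full-size temporary array.
    return A * B - 4.1 * A > 2.5 * B

def expr_python(A, B):
    # Pure Python loop over two lists of floats, for reference.
    return [a * b - 4.1 * a > 2.5 * b for a, b in zip(A, B)]

def expr_chunked(A, B, chunk=4096):
    # Evaluate 1-D arrays block by block so the temporaries stay small.
    # This only mimics numexpr's blocking idea: no threads, no compiled kernel.
    out = np.empty(A.shape, dtype=bool)
    for start in range(0, A.size, chunk):
        stop = start + chunk
        a, b = A[start:stop], B[start:stop]
        out[start:stop] = a * b - 4.1 * a > 2.5 * b
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, B = rng.random(1_000_000), rng.random(1_000_000)
    for name, fn, args in [
        ("numpy", expr_numpy, (A, B)),
        ("chunked numpy", expr_chunked, (A, B)),
        ("pure python", expr_python, (A.tolist(), B.tolist())),
    ]:
        t0 = time.perf_counter()
        fn(*args)
        print(f"{name}: {time.perf_counter() - t0:.4f} s")
```

Whether chunking helps at all in pure numpy depends on the array size and the cache. The per-chunk Python overhead can easily eat the gain, which is one reason numexpr does this in C.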

I haven't found any general-purpose **theano vs numpy** benchmarks. However, the following article compares neural network implementations, and theano is expected to be much faster than numpy/torch (C++)/MATLAB, especially on GPU:

http://conference.scipy.org/proceedings/scipy2010/pdfs/bergstra.pdf

Here is one more detailed review of numpy vs cython vs C (from 2014):

http://notes-on-cython.readthedocs.org/en/latest/std_dev.html

Here are the example results (computation of the standard deviation):

Method | Time (ms) | Speedup vs. Python | Speedup vs. Numpy |
---|---|---|---|
Pure Python | 183 | x1 | x0.03 |
Numpy | 5.97 | x31 | x1 |
Naive Cython | 7.76 | x24 | x0.8 |
Optimised Cython | 2.18 | x84 | x2.7 |
Cython calling C | 2.22 | x82 | x2.7 |
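Below is a minimal sketch of the two extremes of that table (pure Python and numpy). It is not the code from the linked page. It assumes the population standard deviation (ddof=0) of a 1-D float array, and `std_python` and `std_numpy` are hypothetical names.

```python
import math
import timeit
import numpy as np

def std_python(values):
    # Population standard deviation of a list: one pass for the mean,
    # then one pass for the mean squared deviation.
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def std_numpy(arr):
    # Same definition (ddof=0), evaluated in numpy's compiled code.
    return float(arr.std())

if __name__ == "__main__":
    arr = np.random.default_rng(0).random(100_000)
    values = arr.tolist()
    assert math.isclose(std_python(values), std_numpy(arr), rel_tol=1e-9)
    for name, fn, arg in [("pure python", std_python, values), ("numpy", std_numpy, arr)]:
        # Best of 3 repeats, each timing 5 calls; report time per call.
        t = min(timeit.repeat(lambda: fn(arg), number=5, repeat=3)) / 5
        print(f"{name}: {t * 1e3:.2f} ms")
```

Using `timeit.repeat` and taking the minimum gives a less noisy estimate than a single measurement.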

A simple but comprehensive comparison of Python accelerators was prepared by Jake VanderPlas and Olivier Grisel:

http://nbviewer.ipython.org/github/ogrisel/notebooks/blob/master/Numba%20Parakeet%20Cython.ipynb

The problem is computing pairwise distances. This time the results seem unbiased:

Method | Time |
---|---|
Python | 9.51 s |
Naive numpy | 64.7 ms |
Numba | 6.72 ms |
Cython | 6.57 ms |
Parakeet | 12.3 ms |
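The sketch below covers the first two rows. It assumes that "naive numpy" means the broadcasting approach, but I am not certain this is exactly what the notebook uses. The sizes and the function names are arbitrary.

```python
import math
import time
import numpy as np

def pairwise_python(X):
    # Explicit loops: for every pair of points, accumulate squared differences.
    n, d = len(X), len(X[0])
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(d):
                diff = X[i][k] - X[j][k]
                s += diff * diff
            D[i][j] = math.sqrt(s)
    return D

def pairwise_numpy(X):
    # Broadcasting creates an (n, n, d) array of differences, then sums over d.
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

if __name__ == "__main__":
    X = np.random.default_rng(0).random((300, 3))
    t0 = time.perf_counter()
    D_py = pairwise_python(X.tolist())
    t1 = time.perf_counter()
    D_np = pairwise_numpy(X)
    t2 = time.perf_counter()
    assert np.allclose(D_py, D_np)
    print(f"python: {t1 - t0:.3f} s, broadcasting numpy: {(t2 - t1) * 1e3:.2f} ms")
```

The broadcasting version materializes the whole (n, n, d) difference array. This is a plausible reason why it lags behind the numba and cython loops, which accumulate one scalar at a time.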

By the way, I was recently looking for a slow spot in my code, and it had totally slipped my mind that *computing the remainder (modulus) is a slow operation*:

http://embeddedgurus.com/stack-overflow/2011/02/efficient-c-tip-13-use-the-modulus-operator-with-caution/

The time needed for integer division and remainder computation depends significantly on the bit width: 16-bit operations are ~2 times faster than 32-bit ones, and division takes roughly 10 times longer than multiplication.

Surprisingly, multiplication, addition, and bitwise operations take the same time (compared on numpy 1.9), which I personally find very strange. This behavior may be caused by the rather complicated addressing in numpy arrays becoming the bottleneck, but this is only a guess.
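You can check these observations on your own machine with something like the sketch below, which uses `timeit` to time a few elementwise integer operations for several dtypes. The array size, the value ranges (chosen so that int16 products do not overflow), and the `time_ops` helper are my assumptions. Results depend on the CPU and the numpy version, so I don't quote any numbers here.

```python
import timeit
import numpy as np

def time_ops(dtype, size=1_000_000, repeat=5, number=10):
    """Best-of-`repeat` time per call (seconds) for elementwise integer ops."""
    rng = np.random.default_rng(0)
    # Values below 180 keep a * b under 32767, so int16 products do not overflow;
    # b starts at 1, so division and remainder never divide by zero.
    a = rng.integers(1, 180, size=size).astype(dtype)
    b = rng.integers(1, 180, size=size).astype(dtype)
    ops = {
        "add": lambda: a + b,
        "mul": lambda: a * b,
        "and": lambda: a & b,
        "floordiv": lambda: a // b,
        "mod": lambda: a % b,
    }
    return {
        name: min(timeit.repeat(op, number=number, repeat=repeat)) / number
        for name, op in ops.items()
    }

if __name__ == "__main__":
    for dtype in (np.int16, np.int32, np.int64):
        timings = time_ops(dtype)
        row = ", ".join(f"{name} {t * 1e3:.2f} ms" for name, t in timings.items())
        print(f"{np.dtype(dtype).name}: {row}")
```

Running it with a small `size` (fits in cache) and a large one may help tell whether memory traffic, rather than the arithmetic itself, dominates the timings.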