Faster alternatives to numpy?
Personally, I am a big fan of the numpy package, since it makes the code clean and still quite fast.
However, I worry a lot about speed, so I decided to collect different benchmarks.
numpy vs cython vs weave (numpy is about 2 times slower than the others)
(posted in 2011)
http://technicaldiscovery.blogspot.ru/2011/06/speeding-up-python-numpy-cython-and.html
The post is primarily about numba; pairwise distances are computed with cython, numpy, and numba.
Numba is claimed to be the fastest, around 10 times faster than numpy.
(posted in 2013)
https://jakevdp.github.io/blog/2013/06/15/numba-vs-cython-take-2/
Julia is claimed by its developers to be a very fast language.
Well, if that were true, there would be no need to write easily vectorizable operations in pure python for the comparison (yep, I mean they are simply cheating). Currently these 'benchmarks' are on the main page:
http://julialang.org/
The following post was written in 2011; the problem considered is, as usual, the solution of the Laplace equation.
Numpy is ~10 times slower than the pure C++ solution (and on par with matlab), while weave shows speed comparable with pure C++ (I don't know anyone using weave now).
http://wiki.scipy.org/PerformancePython
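For orientation, here is roughly what the vectorized numpy version of such a Laplace solver looks like. This is only a sketch under my own assumptions: a plain Jacobi sweep on a square grid with equal spacing, a made-up boundary condition, and a hypothetical function name `laplace_step`; it is not the code from the linked page.

```python
import numpy as np

def laplace_step(u):
    """One Jacobi sweep: every interior point becomes the mean of its 4 neighbours."""
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (
        u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
    )
    return new

# Assumed setup: 50x50 grid, top edge held at 1, other edges at 0.
u = np.zeros((50, 50))
u[0, :] = 1.0
for _ in range(500):
    u = laplace_step(u)

print(u[1, 25], u[25, 25])
```

The whole inner double loop of the pure python version collapses into one slicing expression, which is where numpy gets its speedup.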
A fresh (2014) benchmark of different python tools: the simple vectorized expression A*B-4.1*A > 2.5*B is evaluated with numpy, cython, numba, numexpr, and parakeet. The last two are the fastest, taking about 10 times less time than numpy, which is achieved by multithreading on two cores.
http://nbviewer.ipython.org/github/rasbt/One-Python-benchmark-per-day/blob/master/ipython_nbs/day7_2_jit_numpy.ipynb
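To get a feeling for the numpy baseline in this benchmark, the expression can be timed with a few lines. This is a sketch with my own assumptions (array size of one million, normally distributed data, the helper name `numpy_expr`), not the notebook's code.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal(1_000_000)  # assumed size, not taken from the notebook
B = rng.standard_normal(1_000_000)

def numpy_expr(A, B):
    # Each intermediate (A*B, 4.1*A, 2.5*B, the difference) is a full temporary array.
    return A * B - 4.1 * A > 2.5 * B

mask = numpy_expr(A, B)
t = min(timeit.repeat(lambda: numpy_expr(A, B), number=10, repeat=3)) / 10
print(f"numpy: {t * 1e3:.2f} ms per call")
```

Those temporaries are one reason why tools that fuse the whole expression into a single pass, such as numexpr, can come out ahead even before multithreading.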
I haven't found any general-purpose theano vs numpy benchmarks, but the following article compares neural network implementations, and theano is expected to give much better speed than numpy/torch (C++)/matlab; it is especially fast on GPU.
http://conference.scipy.org/proceedings/scipy2010/pdfs/bergstra.pdf
One more detailed review of numpy vs cython vs C (done in 2014):
http://notes-on-cython.readthedocs.org/en/latest/std_dev.html
Let me copy-paste the example results (computation of std):
Method | Time (ms) | Compared to Python | Compared to Numpy |
---|---|---|---|
Pure Python | 183 | x1 | x0.03 |
Numpy | 5.97 | x31 | x1 |
Naive Cython | 7.76 | x24 | x0.8 |
Optimised Cython | 2.18 | x84 | x2.7 |
Cython calling C | 2.22 | x82 | x2.7 |
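
The first two rows of the table are easy to reproduce. Below is a sketch of what the "pure python" and "numpy" variants compute; the function name `std_pure_python` and the data size are my assumptions, and the linked page may implement it differently.

```python
import math
import numpy as np

def std_pure_python(values):
    """Population standard deviation with plain python loops."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

data = np.random.default_rng(0).random(100_000)  # assumed size

py_result = std_pure_python(data.tolist())
np_result = float(np.std(data))  # ddof=0 by default, so the same definition

print(py_result, np_result)
```

Both compute the same quantity, so wrapping each call in `timeit` gives a fair comparison on your own machine.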
A simple but comprehensive comparison of python accelerators was prepared by Jake Vanderplas and Olivier Grisel:
http://nbviewer.ipython.org/github/ogrisel/notebooks/blob/master/Numba%20Parakeet%20Cython.ipynb
The problem is the computation of pairwise distances. The results this time seem to be unbiased:
Method | Time |
---|---|
Python | 9.51 s |
Naive numpy | 64.7 ms |
Numba | 6.72 ms |
Cython | 6.57 ms |
Parakeet | 12.3 ms |
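
For reference, this is the kind of code behind the "Python" and "Naive numpy" rows. It is a sketch in the spirit of the notebook, not a copy: the function names and the small input size are my assumptions.

```python
import numpy as np

def pairwise_python(X):
    """Euclidean distance matrix with explicit loops (the slow baseline)."""
    n, d = X.shape
    D = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(d):
                diff = X[i, k] - X[j, k]
                s += diff * diff
            D[i, j] = np.sqrt(s)
    return D

def pairwise_numpy(X):
    """Same result via broadcasting; builds an (n, n, d) temporary array."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

X = np.random.default_rng(0).random((100, 3))  # assumed small test input
D_loop = pairwise_python(X)
D_vec = pairwise_numpy(X)
print(np.allclose(D_loop, D_vec))
```

The triple loop is exactly what numba and cython compile to machine code, while the broadcasting version pays for the large temporary array. That is a plausible reason why naive numpy lands in between.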
By the way, I was recently looking for a slow spot in my code, and it had totally slipped my mind that computing a remainder is a slow operation:
http://embeddedgurus.com/stack-overflow/2011/02/efficient-c-tip-13-use-the-modulus-operator-with-caution/
Integer division / remainder computation time depends significantly on the bit depth (16-bit operations are ~2 times faster than 32-bit ones; division takes roughly 10 times longer than multiplication).
Surprisingly, multiplication, addition, and binary operations take the same time (compared on numpy 1.9), which I personally find very strange. This may be caused by the fairly complicated addressing in numpy arrays becoming the bottleneck, but this is only a guess.
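If you want to check this on your own machine, a small timing script is enough. This is a sketch with my own choice of array size, value range, and operations; the numbers will depend on your CPU and numpy version, so I am not quoting any expected output.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
ops = (
    ("add", np.add),
    ("multiply", np.multiply),
    ("floor_divide", np.floor_divide),
    ("remainder", np.remainder),
)

results = {}
for dtype in (np.int16, np.int32, np.int64):
    # Values in [1, 100) avoid division by zero and int16 overflow on multiply.
    a = rng.integers(1, 100, size=1_000_000).astype(dtype)
    b = rng.integers(1, 100, size=1_000_000).astype(dtype)
    for name, op in ops:
        t = min(timeit.repeat(lambda: op(a, b), number=5, repeat=3)) / 5
        results[(np.dtype(dtype).name, name)] = t
        print(f"{np.dtype(dtype).name:>6} {name:<13} {t * 1e3:7.3f} ms")
```

Taking the minimum over several repeats reduces noise from other processes, which matters when the differences between operations are small.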