When you start writing in Python, the typical reaction is disappointment at how slow it is compared to any compiled language. After a while, you learn numpy and find out it's actually not so bad.

After a month with numpy, I found that many things can be written in it.

After a year with it, I found that almost any algorithm can be vectorized, though sometimes it's non-trivial.

I'm still quite disappointed by the majority of answers on StackOverflow, where people prefer plain Python loops for anything more complicated than computing the sum of an array.


For instance, suppose you need to compute the order statistics (i.e. the rank) of each value in an array.

There is a function in the `scipy.stats` library created specifically for this purpose:

from scipy.stats import rankdata
order_statistics = rankdata(initial_array)

Another option is to sort the array and keep track of the initial positions (this is quite vectorizable).
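A minimal sketch of this idea, assuming 0-based ranks are wanted: a single `argsort` gives the sorting permutation, and scattering `arange` through that permutation inverts it. The helper name `ranks_via_single_sort` is hypothetical.

```python
import numpy as np

def ranks_via_single_sort(values):
    """0-based ranks from one argsort plus an inverse-permutation scatter."""
    values = np.asarray(values)
    # order[k] is the position (in values) of the k-th smallest element
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=np.intp)
    # Invert the permutation: the element at position order[k] has rank k
    ranks[order] = np.arange(len(values))
    return ranks

initial_array = np.array([30, 10, 20, 40])
print(ranks_via_single_sort(initial_array))  # [2 0 1 3]
```

Compared to sorting twice, the second step here is a plain O(n) scatter instead of another O(n log n) sort.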

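Alternatively, you can compute the ranks in pure numpy with a one-liner (note that these ranks start from 0 and tied values get distinct ranks, whereas `rankdata` starts from 1 and averages ties by default):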

order_statistics = numpy.argsort(numpy.argsort(initial_array))

(Isn’t this beautiful? I’m not saying simple, I’m saying beautiful.)
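To see why the double `argsort` works: the inner call returns the permutation that sorts the array, and the outer call inverts that permutation, which yields each element's position in the sorted array, i.e. its 0-based rank. A small check on an arbitrary example array:

```python
import numpy as np

initial_array = np.array([0.5, 2.0, -1.0, 3.5])

# Inner argsort: indices that would sort the array (a permutation)
sorting_permutation = np.argsort(initial_array)       # [2 0 1 3]
# Outer argsort inverts that permutation: where each element lands after sorting
order_statistics = np.argsort(sorting_permutation)    # [1 2 0 3]

# The element with rank r is exactly the r-th element of the sorted array
assert np.array_equal(np.sort(initial_array)[order_statistics], initial_array)
print(order_statistics)  # [1 2 0 3]
```

If the array contains ties, passing `kind="stable"` to the inner `argsort` makes tied values ranked by their original position.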


Want to compute the mean value over each group of events? With a one-liner? Here you go:

means = numpy.bincount(group_indices, weights=values) / numpy.bincount(group_indices)
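A small worked example with made-up data: the weighted `bincount` sums the values per group, the unweighted one counts events per group, and their ratio is the per-group mean. This assumes `group_indices` holds non-negative integers.

```python
import numpy as np

# Hypothetical data: each event belongs to a group given by a non-negative integer
group_indices = np.array([0, 1, 0, 2, 1, 0])
values = np.array([1.0, 4.0, 3.0, 10.0, 6.0, 5.0])

sums = np.bincount(group_indices, weights=values)  # per-group sum of values
counts = np.bincount(group_indices)                # per-group number of events
means = sums / counts
print(means)  # [ 3.  5. 10.]
```

A group index that never occurs gets a count of 0, so its mean comes out as `nan` (with a runtime warning); `minlength` can be passed to `bincount` to fix the total number of groups.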

Writing an oblivious decision tree (one in which all nodes at the same depth use the same feature and threshold) in numpy is very simple, and the computations are really fast.
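A minimal sketch of applying such a tree, assuming it is stored as one `(feature, threshold)` pair per depth level plus an array of `2**depth` leaf values; this storage layout and the name `apply_oblivious_tree` are illustrative assumptions. Each sample's path is just the vector of comparisons, read as a binary number.

```python
import numpy as np

def apply_oblivious_tree(X, features, thresholds, leaf_values):
    """Predict with an oblivious tree of depth d.

    features[i], thresholds[i]: the split shared by all nodes at depth i.
    leaf_values: array of length 2**d.
    """
    X = np.asarray(X)
    features = np.asarray(features)
    thresholds = np.asarray(thresholds)
    depth = len(features)
    # Boolean matrix (n_samples, depth): outcome of each level's split
    decisions = X[:, features] > thresholds
    # Read the decisions as bits of the leaf index (first level = most significant bit)
    powers = 2 ** np.arange(depth - 1, -1, -1)
    leaf_index = (decisions * powers).sum(axis=1)
    return leaf_values[leaf_index]

X = np.array([[0.2, 5.0], [0.8, 1.0], [0.9, 7.0]])
leaf_values = np.array([10.0, 20.0, 30.0, 40.0])
print(apply_oblivious_tree(X, [0, 1], [0.5, 3.0], leaf_values))  # [20. 30. 40.]
```

All samples are processed at once with no Python loop over samples or levels.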

As a non-trivial problem: can you write the application of a generic decision tree (like the ones in sklearn) in pure numpy? For simplicity, you can first consider only trees in which all leaves have the same depth.

See also: