Numpy exercises
When one starts writing in python, the typical reaction is disappointment at how slow it is compared to any compiled language. After a while, you learn numpy and find out it's actually not so bad.
Having spent a month with numpy, I found out that many things can be written in it.
Having spent a year with it, I found out that almost any algorithm may be vectorized, though it's sometimes non-trivial.
I'm still quite disappointed by the majority of answers on StackOverflow, where people prefer plain python for anything more complicated than computing the sum of an array.
For instance, say you need to compute the order statistics (ranks) of the values in an array.
The `scipy.stats` library has a function created specifically for this purpose:
from scipy.stats import rankdata
order_statistics = rankdata(initial_array)
Another option is to sort the array and keep track of the initial positions (quite vectorizable).
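A minimal sketch of the sort-and-track-positions approach: `argsort` gives the position of each element in sorted order, and a single scatter assignment inverts that permutation into ranks. The function name `ranks_by_scatter` is hypothetical, and the ranks are 0-based (unlike `rankdata`, which starts from 1).

```python
import numpy as np

def ranks_by_scatter(x):
    """0-based ranks of x; tied values are ranked by their original position."""
    x = np.asarray(x)
    # order[k] is the index of the k-th smallest element
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=np.int64)
    # Scatter 0, 1, 2, ... back to the original positions: this inverts the permutation
    ranks[order] = np.arange(len(x))
    return ranks
```

For example, `ranks_by_scatter([40, 10, 30, 20])` gives `[3, 0, 2, 1]`.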
Alternatively, you can compute the ranks in numpy with a one-liner (the result is 0-based, whereas `rankdata` starts from 1 and averages ties by default):
order_statistics = numpy.argsort(numpy.argsort(initial_array))
(isn’t this beautiful? I’m not saying simple, I’m saying beautiful)
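Why the double `argsort` works, as a small demo: the inner `argsort` is the sorting permutation, and `argsort` of a permutation is its inverse, which maps every original position to its place in sorted order. The sketch assumes you want ties ranked by position, so the inner sort uses `kind="stable"`; the default sort is not guaranteed to be stable, so tied elements could otherwise get their ranks in either order.

```python
import numpy as np

x = np.array([40, 10, 30, 10])
# Inner argsort: indices that would sort x (stable, so the first 10 comes before the second)
# Outer argsort: inverse of that permutation, i.e. the 0-based rank of each element
ranks = np.argsort(np.argsort(x, kind="stable"))
# Ranks index back into the sorted array to recover the original values
restored = np.sort(x)[ranks]
```

Here `ranks` is `[3, 0, 2, 1]` and `restored` equals `x`.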
Want to compute the mean value within each group of events? With a one-liner? Here you go:
means = numpy.bincount(group_indices, weights=values) / numpy.bincount(group_indices)
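The same `bincount` trick wrapped in a small helper, as a sketch: `bincount` with `weights` sums the values per group, plain `bincount` counts the members, and the ratio is the mean. The helper name `group_means` and its `n_groups` parameter are hypothetical; the sketch assumes `group_indices` are non-negative integers and that `n_groups`, if given, is at least `max(group_indices) + 1`. Empty groups get NaN instead of a division-by-zero warning.

```python
import numpy as np

def group_means(group_indices, values, n_groups=None):
    group_indices = np.asarray(group_indices)
    values = np.asarray(values, dtype=float)
    if n_groups is None:
        n_groups = group_indices.max() + 1
    # Per-group sums and per-group counts, padded to n_groups entries
    sums = np.bincount(group_indices, weights=values, minlength=n_groups)
    counts = np.bincount(group_indices, minlength=n_groups)
    # Divide only where a group has members; leave NaN for empty groups
    means = np.full(n_groups, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means
```

For example, `group_means([0, 1, 0, 2, 1], [1., 2., 3., 4., 6.])` gives `[2., 4., 4.]`.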
Writing an oblivious decision tree (one that uses the same split for all nodes at a given depth) in numpy is very simple, and applying it is really fast.
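A minimal sketch of applying an oblivious tree, under a hypothetical layout: `features[l]` and `thresholds[l]` hold the single split used at level `l`, a sample goes right when its value is strictly greater than the threshold, and `leaf_values` has `2 ** depth` entries indexed by the path read as a binary number (first level is the most significant bit). These names and conventions are illustrative, not taken from any particular library.

```python
import numpy as np

def apply_oblivious_tree(X, features, thresholds, leaf_values):
    X = np.asarray(X)
    features = np.asarray(features)
    thresholds = np.asarray(thresholds)
    leaf_values = np.asarray(leaf_values)
    depth = len(features)
    # decisions[i, l] is True when sample i goes right at level l (all levels at once)
    decisions = X[:, features] > thresholds
    # Read each row of decisions as a binary number: first level = most significant bit
    powers = 2 ** np.arange(depth - 1, -1, -1)
    leaf_indices = decisions.astype(np.int64) @ powers
    return leaf_values[leaf_indices]
```

For example, with `X = [[0.5, 2.0], [1.5, 0.0], [1.5, 3.0]]`, `features = [0, 1]`, `thresholds = [1.0, 1.0]` and `leaf_values = [10., 20., 30., 40.]`, the result is `[20., 30., 40.]`. There is no loop over samples or levels at all, which is why oblivious trees are so fast to apply.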
As a non-trivial exercise: can you write the application of a generic decision tree (like the one in sklearn) in pure numpy? For simplicity, you can first consider only trees in which all leaves have equal depth.
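If you want to check your answer, here is one possible sketch. It assumes the tree is stored as flat arrays laid out like the fitted `tree_` attributes in scikit-learn (`children_left`, `children_right`, `feature`, `threshold`, with `-1` marking a leaf and `x <= threshold` going left). `apply_tree` and `leaf_value` are hypothetical names; sklearn keeps per-node outputs in `tree_.value`, which has a different shape. The Python loop runs once per tree level while all samples move down in one vectorized step, so it also handles leaves of unequal depth.

```python
import numpy as np

def apply_tree(X, children_left, children_right, feature, threshold, leaf_value):
    X = np.asarray(X)
    children_left = np.asarray(children_left)
    children_right = np.asarray(children_right)
    feature = np.asarray(feature)
    threshold = np.asarray(threshold)
    leaf_value = np.asarray(leaf_value)
    is_leaf = children_left == -1
    # Every sample starts at the root (node 0)
    node = np.zeros(X.shape[0], dtype=np.int64)
    while True:
        # Samples that have not reached a leaf yet
        idx = np.nonzero(~is_leaf[node])[0]
        if len(idx) == 0:
            break
        current = node[idx]
        # Evaluate each active sample's split at its current node, all at once
        go_left = X[idx, feature[current]] <= threshold[current]
        node[idx] = np.where(go_left, children_left[current], children_right[current])
    return leaf_value[node]
```

For a tree with `children_left = [1, -1, 3, -1, -1]`, `children_right = [2, -1, 4, -1, -1]`, `feature = [0, -2, 1, -2, -2]`, `threshold = [1.0, -2.0, 0.5, -2.0, -2.0]` and `leaf_value = [0., 10., 0., 30., 40.]`, the samples `[[0.5, 0.0], [2.0, 0.0], [2.0, 1.0]]` land in leaves with values `[10., 30., 40.]`.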
See also:
- Numpy tips and tricks for data analysis, part 1 and part 2.
- Numpy speed benchmark.
- 100 numpy exercises, for challenging problems. Most of them are also solved with one-liners.