Over the weekend I've been working on an implementation of the decision train.

The decision train is a classification/regression model which I first introduced as a way to speed up gradient boosting with bit-hacks, so its main purpose was to demonstrate how bit operations, coupled with proper preprocessing of the data and revisiting, can speed up both training and application of the trained formula. It may sound strange, but this incredibly fast algorithm is written in Python (all the computationally expensive parts use numpy).
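To give a flavour of the "preprocessing plus cheap operations" idea, here is a minimal, illustrative numpy sketch (the function names are mine and this is not the actual decision-train code): features are pre-binned into small integer codes once, so that every later cut becomes a single vectorized comparison on uint8 arrays instead of work on raw floats.

```python
import numpy as np

def bin_features(X, n_bins=64):
    """Replace each feature with the index of its quantile bin (fits in uint8)."""
    percentiles = np.linspace(0, 100, n_bins + 1)[1:-1]
    X_binned = np.zeros(X.shape, dtype=np.uint8)
    thresholds = []
    for j in range(X.shape[1]):
        edges = np.percentile(X[:, j], percentiles)
        X_binned[:, j] = np.searchsorted(edges, X[:, j])
        thresholds.append(edges)
    return X_binned, thresholds

def apply_cut(X_binned, feature, bin_threshold):
    """Evaluate a cut as one vectorized comparison on the pre-binned data."""
    return X_binned[:, feature] > bin_threshold

# usage: bin once, then every cut is a single cheap pass over uint8 data
X = np.random.normal(size=(10000, 5))
X_binned, thresholds = bin_features(X)
mask = apply_cut(X_binned, feature=2, bin_threshold=31)
```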

In general, I found its basic version to be no worse than the GBRT from scikit-learn, and my implementation is of course much faster.

An interesting aspect of my approach: I am writing the hep_ml library in a modular way. For instance, in the GBRT implementation there are separate classes/functions for:

  • boosting algorithm
  • base estimator (at the moment only different kinds of trees are supported, but I believe one can use any clustering algorithm, like k-means, which sounds a bit strange; after experiments with pruning, however, it becomes obvious that picking estimators from some predefined pool can yield very good results in classification)
  • loss function. The only difference between regression, binary classification and ranking in gradient boosting is the loss you use.
  • splitting criterion (actually, this may be called a loss function too): the FOM we minimize when building a new tree; GBRT usually uses plain MSE as this criterion.
This 'modular structure' made it possible to write loss functions once and use them with different classifiers.
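Here is a hedged sketch of what that modularity looks like in code (the class and method names are my own illustration, not the hep_ml API): the boosting loop only talks to the loss through its gradient, so the same loop serves regression or classification simply by swapping the loss object.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class MSELoss:
    """Squared error: negative gradient of 0.5 * (y - pred)^2 w.r.t. pred."""
    def negative_gradient(self, y, pred):
        return y - pred

class LogLoss:
    """Binary log-loss for labels in {0, 1}, computed on raw scores."""
    def negative_gradient(self, y, pred):
        return y - 1.0 / (1.0 + np.exp(-pred))

class SimpleGradientBoosting:
    """Minimal gradient boosting: the loop never looks inside the loss."""
    def __init__(self, loss, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.loss, self.n_estimators = loss, n_estimators
        self.learning_rate, self.max_depth = learning_rate, max_depth

    def fit(self, X, y):
        self.estimators_ = []
        pred = np.zeros(len(y))
        for _ in range(self.n_estimators):
            # each base estimator fits the negative gradient of the current loss
            residual = self.loss.negative_gradient(y, pred)
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residual)
            pred += self.learning_rate * tree.predict(X)
            self.estimators_.append(tree)
        return self

    def decision_function(self, X):
        return self.learning_rate * sum(t.predict(X) for t in self.estimators_)

# same loop, different problem: regression with MSELoss, classification with LogLoss
# clf = SimpleGradientBoosting(loss=LogLoss()).fit(X_train, y_train)
```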
Maybe I'll find time to add support for FlatnessLoss inside my neural networks (since it is flexible and uses only the gradient, this shouldn't be very complicated).
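The reason this should be simple is that a gradient-only loss plugs into any model trained by gradient descent. A hedged sketch under that assumption (FlatnessLoss itself is not shown; any loss exposing negative_gradient(y, pred), like the LogLoss above, would do):

```python
import numpy as np

def train_linear_model(X, y, loss, n_epochs=100, lr=0.01):
    """Gradient descent for a linear model, driven only by loss.negative_gradient."""
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        pred = X.dot(w)
        grad_pred = loss.negative_gradient(y, pred)   # dL/d(pred), supplied by the loss object
        w += lr * X.T.dot(grad_pred) / len(y)         # chain rule through the model's parameters
    return w
```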