Well, now that I have defended my PhD thesis, I have more spare time to write something for my blog.

First, I'd better write down my thoughts about the things I want to experiment with in the near future:


  • probabilistic gradient boosting; to start with, modifications of AdaBoost that model not point predictions but distributions over particular events (see the first sketch after this list).
  • separate experiments on probabilistic pruning of oblivious decision trees.
  • pruning + boosting over bagging + probabilistic GB + optimal selection of values for leaves for RankBoost with a proper loss function (the leaf-value part is sketched after this list). I expect this mixture of algorithms and techniques to provide a very efficient and reliable ranking method. At the moment, pruning works fine for classification purposes.
  • rotation-invariant Conformal FT-like neural networks. It seems that I have resolved the main issues with the final formula, but there are still some problems with pretraining, since I don't use binary hidden variables. PCA is the strongest candidate for pretraining at the moment (see the PCA sketch below).
  • finally, after the Avazu CTR contest I came to the strong opinion that good results in this area can be achieved only after finding good vector representations for categories (like word2vec for words). This may be a bit tricky, since I only have ideas about heuristics that may prove useful, not a proper optimization approach (one such heuristic is sketched below).
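
To make the first item concrete, here is a minimal sketch (not a reconstruction of any specific published algorithm): gradient boosting on logistic loss, where each stage fits the negative gradient and the ensemble outputs a probability P(y=1|x) for the event instead of a hard label. All names and parameters here are my own illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_probabilistic_gb(X, y, n_stages=100, learning_rate=0.1, max_depth=3):
    """y: array of 0/1 labels. Returns the list of fitted stage trees."""
    y = np.asarray(y, dtype=float)
    F = np.zeros(len(y))              # additive score; F = 0 means p = 0.5
    trees = []
    for _ in range(n_stages):
        residuals = y - sigmoid(F)    # negative gradient of the log-loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        F += learning_rate * tree.predict(X)
        trees.append(tree)
    return trees

def predict_proba(trees, X, learning_rate=0.1):
    F = sum(learning_rate * tree.predict(X) for tree in trees)
    return sigmoid(F)                 # P(y = 1 | x), not a hard label
```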
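For the leaf-value piece of the third item, here is the standard Newton-step leaf update for logistic loss, just to illustrate what "optimal selection of values for leaves" means; a RankBoost variant would swap in the gradient and Hessian of the ranking loss. This is generic boosting machinery, not my RankBoost setup itself.

```python
import numpy as np

def newton_leaf_values(leaf_index, y, F):
    """leaf_index: leaf id per sample, y: 0/1 labels, F: current scores.
    Returns {leaf_id: additive value} from one Newton step on log-loss."""
    p = 1.0 / (1.0 + np.exp(-F))
    grad = y - p                      # negative gradient of the log-loss
    hess = p * (1.0 - p)              # its second derivative
    return {leaf: grad[leaf_index == leaf].sum()
                  / (hess[leaf_index == leaf].sum() + 1e-12)
            for leaf in np.unique(leaf_index)}
```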
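And a tiny illustration of PCA as a pretraining step: initialize the first layer's weights from the principal components of the inputs, instead of RBM-style pretraining with binary hidden units. The layer shape and the tanh activation are my assumptions, only to show the idea.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_pretrained_layer(X, n_hidden):
    """Weights and bias for a first layer, initialized from PCA of X."""
    pca = PCA(n_components=n_hidden).fit(X)
    W = pca.components_.T             # shape: (n_features, n_hidden)
    b = -pca.mean_ @ W                # so the layer acts on centered inputs
    return W, b

# hidden representation before any gradient fine-tuning:
# W, b = pca_pretrained_layer(X, n_hidden=32)
# h = np.tanh(X @ W + b)
```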
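Finally, one heuristic for category vectors that I might try (an assumption on my part, not a settled method): treat each row of categorical features as a "sentence" of tokens like site_id=abc and run word2vec over those sentences, so values that co-occur in impressions end up with nearby vectors. The sketch assumes gensim 4.x; the column names and data are made up.

```python
from gensim.models import Word2Vec

def rows_to_sentences(rows, columns):
    # each row of categorical values becomes one "sentence" of tokens
    return [[f"{col}={val}" for col, val in zip(columns, row)] for row in rows]

rows = [("site_a", "app_1", "device_x"),
        ("site_a", "app_2", "device_x"),
        ("site_b", "app_1", "device_y")]
sentences = rows_to_sentences(rows, ["site_id", "app_id", "device_id"])

# skip-gram over category tokens; vector_size keyword assumes gensim 4.x
model = Word2Vec(sentences, vector_size=16, window=3, min_count=1, sg=1)
vector = model.wv["site_id=site_a"]   # embedding for one category value
```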