Well, now that I have defended my PhD thesis, I have more spare time to write for my blog.
First, I'd like to write down the things I want to experiment with in the near future:
- probabilistic gradient boosting; primarily modifications of AdaBoost that model not point predictions
but distributions over outcomes for particular events
- separate experiments on probabilistic pruning of oblivious decision trees.
- pruning + boosting over bagging + probabilistic GB + optimal selection of leaf values for RankBoost
with a proper loss function. I expect this mixture of algorithms and techniques to yield a very efficient and
reliable ranking method. At the moment, pruning works fine for classification.
- rotation-invariant Conformal FT-like neural networks. It seems I have resolved the main issues with the final formula,
but there are still some problems with pretraining, since I don't use binary hidden variables. PCA is the
strongest candidate for pretraining at the moment.
- finally, after the Avazu CTR contest I came to the strong opinion that good results in this area can be achieved only by finding good vector representations for categories (like word2vec for words). This may be a bit tricky, since I only have ideas about heuristics that may prove useful, not a principled optimization approach.
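To make the first idea more concrete, here is a minimal sketch of boosting a distribution rather than a point prediction: the model outputs the parameters of a Gaussian N(mu(x), sigma^2), and each stage fits the negative gradient of the negative log-likelihood with respect to mu (which, for fixed sigma, is proportional to the residual y - mu). The fixed sigma, the toy data, and all names here are my own illustrative assumptions; a full version would boost sigma as well.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)

# Boost the mean of a Gaussian predictive distribution N(mu(x), sigma^2).
# The negative gradient of the NLL w.r.t. mu is (y - mu) / sigma^2; with
# sigma held fixed, the constant factor folds into the learning rate.
sigma = 0.3                      # assumed fixed noise scale for this sketch
lr = 0.1
base = y.mean()
mu = np.full_like(y, base)
trees = []
for _ in range(100):
    t = DecisionTreeRegressor(max_depth=2).fit(X, y - mu)
    mu += lr * t.predict(X)
    trees.append(t)

def predict_dist(Xq):
    """Return the parameters (mu, sigma) of the predictive distribution."""
    m = np.full(len(Xq), base)
    for t in trees:
        m += lr * t.predict(Xq)
    return m, np.full(len(Xq), sigma)

m, s = predict_dist(X)
print(np.mean((y - m) ** 2))     # training MSE of the boosted mean
```

The point of the exercise: the output is a full distribution per example, so one can score the model by likelihood and quantify uncertainty, not just by squared error of a point estimate.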
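As a sketch of the last idea, one heuristic is to treat each impression's set of categorical values as a word2vec-style "sentence" and train embeddings with a skip-gram-like objective. Everything below (the toy impressions, dimensions, and training schedule) is hypothetical, just to show the shape of the approach, not a real CTR pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: each impression is a set of categorical values
# (site, app), treated like a word2vec "sentence" of tokens.
impressions = [
    ["site_news", "app_mail"],
    ["site_games", "app_chat"],
] * 100

vocab = sorted({t for row in impressions for t in row})
idx = {t: i for i, t in enumerate(vocab)}
V, D = len(vocab), 4

W_in = rng.normal(scale=0.1, size=(V, D))   # embeddings being learned
W_out = rng.normal(scale=0.1, size=(V, D))  # context-side parameters
lr = 0.1

def step(c, o, label):
    """One logistic SGD step on a (center, context) index pair."""
    p = 1.0 / (1.0 + np.exp(-(W_in[c] @ W_out[o])))
    g = p - label
    gi, go = g * W_out[o], g * W_in[c]
    W_in[c] -= lr * gi
    W_out[o] -= lr * go

for row in impressions:
    for c in row:
        for o in row:
            if c == o:
                continue
            step(idx[c], idx[o], 1.0)            # observed pair
            step(idx[c], rng.integers(V), 0.0)   # one negative sample

def score(a, b):
    """Higher score = the pair looks more like an observed co-occurrence."""
    return float(W_in[idx[a]] @ W_out[idx[b]])

print(score("site_news", "app_mail"))   # co-occurring pair
print(score("site_news", "app_chat"))   # never co-occur
```

After training, the rows of W_in are dense vectors for categories that could be fed into a CTR model in place of raw one-hot features; co-occurring categories end up with higher pair scores than unrelated ones.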