This year my team at Yandex organized the MLHEP (Machine Learning in High Energy Physics) summer school in Lund, Sweden.
There were two tracks, basic and advanced, each lasting three days, followed by two days on neural networks for both tracks together.
The school was accompanied by two Kaggle challenges: one for both tracks and one for the advanced track. This is the most productive way to try techniques and learn them in practice.
Just as a year ago, I gave the lectures for the basic track. The previous materials were enriched with new topics and more explanations.
I’ve also added many more visualizations and animations than in the previous year.
This 3-day course is about as short as a machine learning course gets, yet it
still gives a nice introduction to some advanced topics!
Introduction to machine learning terminology. Applications within High Energy Physics and outside HEP.
- Basic problems: classification and regression.
- Nearest neighbours approach and spatial indices
- Overfitting (intro)
- Curse of dimensionality
- ROC curve, ROC AUC
- Bayes optimal classifier
- Density estimation: KDE and histograms
- Parametric density estimation
- Mixtures for density estimation and EM algorithm
- Generative approach vs discriminative approach
- Linear models:
  - Linear decision rule, intro to logistic regression
  - Linear regression
- Linear models: logistic regression
- Polynomial decision rule and polynomial regression
- SVM (Support Vector Machine) and kernel trick
- Overfitting: two definitions
- Model selection
- Regularization: L1, L2, elastic net
- Decision trees
- Splitting criteria for classification and regression
- Overfitting in trees: pre-stopping and post-pruning
- Instability of trees
- Feature importance
- RSM, subsampling, bagging.
- Random Forest
- Gradient Boosting for regression
- Gradient Boosting for classification
- Second-order information
- Losses: regression, classification, ranking
- Multiclass classification:
  - softmax modifications
- Feature engineering and output engineering
- Feature selection
- Dimensionality reduction:
  - LDA, CSP
- Hyperparameter optimization:
  - ML-based approach
  - Gaussian processes
Day 4, part 1
Slides by Tatiana Likhomanenko on non-trivial applications of boosting in High Energy Physics.