This year my team at Yandex organized the MLHEP (Machine Learning in High Energy Physics) summer school in Lund, Sweden.

There were two tracks, basic and advanced, each lasting three days, followed by two days on neural networks for both tracks together.

The school was accompanied by two Kaggle challenges: one for both tracks and one for the advanced track. This is the most productive way to try out and learn techniques in practice.

Just as a year ago, I gave the lectures for the basic track. The previous materials were enriched with new topics and more explanations.

Also, I’ve added many visualizations and animations compared to the previous year.

This 3-day course is the shortest machine learning course, and it still gives a nice introduction to some advanced topics!

## Day 1

Introduction to machine learning terminology. Applications within High Energy Physics and outside HEP.

• Basic problems: classification and regression.
• Nearest neighbours approach and spatial indices
• Overfitting (intro)
• Curse of dimensionality
• ROC curve, ROC AUC
• Bayes optimal classifier
• Density estimation: KDE and histograms
• Parametric density estimation
• Mixtures for density estimation and EM algorithm
• Generative approach vs discriminative approach
• Linear models:
  • Linear decision rule, intro to logistic regression
  • Linear regression
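
To make the ROC AUC item above a bit more concrete, here is a minimal sketch (not taken from the lecture materials). It uses the interpretation of ROC AUC as the probability that a randomly chosen signal event gets a higher score than a randomly chosen background event, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical, chosen only for illustration.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 (0 = background, 1 = signal)
    scores: classifier outputs, higher means "more signal-like"
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one event of each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # correctly ordered pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four signal/background pairs are ordered correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)
```

The pairwise loop is quadratic in the number of events, which is fine for a toy example; for real datasets a sort-based computation is the usual choice.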

## Day 2

• Linear models: logistic regression
• Polynomial decision rule and polynomial regression
• SVM (Support Vector Machine) and kernel trick
• Overfitting: two definitions
• Model selection
• Regularization: L1, L2, elastic net
• Decision trees
• Splitting criteria for classification and regression
• Overfitting in trees: pre-stopping and post-pruning
• Instability of trees
• Feature importance
• Ensembling
• RSM, subsampling, bagging.
• Random Forest
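
As an illustration of the splitting criteria item above, here is a minimal sketch (not from the slides) that picks the best threshold on a single feature by minimizing the weighted Gini impurity of the two children. It assumes binary 0/1 labels, and the helper names `gini` and `best_split` are hypothetical.

```python
def gini(labels):
    """Gini impurity of a set of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n          # fraction of class 1
    return 2.0 * p * (1.0 - p)


def best_split(xs, ys):
    """Return (threshold, weighted_impurity) of the best split on one feature.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, inf) if all feature values are equal.
    """
    pairs = sorted(zip(xs, ys))
    best_threshold, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue             # no threshold fits between equal values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Toy example: the classes are perfectly separable at x = 2.5.
threshold, impurity = best_split([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
print(threshold, impurity)
```

A full tree would apply this search over all features and recurse on the two children until a stopping rule is met.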

## Day 3

• Ensembles
• Second-order information
• Losses: regression, classification, ranking
• Multiclass classification:
  • ensembling
  • softmax modifications
• Feature engineering and output engineering
• Feature selection
• Dimensionality reduction:
  • PCA
  • LDA, CSP
  • LLE
  • Isomap
• Hyperparameter optimization
• ML-based approach
• Gaussian processes
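
For the multiclass item above, here is a minimal sketch of the plain softmax function and the corresponding log-loss for one event (this is the standard form, not any of the modifications discussed in the lecture). The names `softmax` and `cross_entropy` are hypothetical, and the scores are made-up numbers.

```python
import math


def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    shift = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, true_class):
    """Log-loss for one event: minus the log-probability of the true class."""
    return -math.log(softmax(scores)[true_class])


# Toy example with three classes; the true class is the one with the highest score.
probabilities = softmax([1.0, 2.0, 3.0])
loss = cross_entropy([1.0, 2.0, 3.0], true_class=2)
print(probabilities, loss)
```

Subtracting the maximum score before exponentiating does not change the result, but it avoids overflow when the scores are large.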

## Day 4, part 1

Slides by Tatiana Likhomanenko on non-trivial applications of boosting in High Energy Physics.

1. All materials from the school are available in the MLHEP 2016 repository
2. Official page on Indico
3. Kaggle competitions for the school: exotic Higgs and triggers