# MLHEP 2016 lecture slides

This year my team at Yandex organized the MLHEP (Machine Learning in High Energy Physics) summer school in Lund, Sweden.

There were two tracks, basic and advanced, each lasting three days, plus two days on neural networks for both tracks together.

The school was accompanied by two Kaggle challenges: one for both tracks and one for the advanced track. This is the most productive way to try out and learn techniques in practice.

Just like last year, I gave the lectures for the basic track. The previous materials were enriched with new topics and more explanations.

Also, I’ve added many visualizations and animations compared to the previous year.

This 3-day course is the *shortest course of machine learning*, and it still gives a nice introduction to some advanced topics!

## Day 1

Introduction to machine learning terminology. Applications within High Energy Physics and outside HEP.

- Basic problems: classification and regression.
- Nearest neighbours approach and spatial indexes
- Overfitting (intro)
- Curse of dimensionality
- ROC curve, ROC AUC
- Bayes optimal classifier
- Density estimation: KDE and histograms
- Parametric density estimation
- Mixtures for density estimation and EM algorithm

- Generative approach vs discriminative approach
- Linear models:
  - Linear decision rule, intro to logistic regression
  - Linear regression
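To make one of the Day 1 topics concrete, here is a minimal sketch of ROC AUC computed directly from its probabilistic meaning: the chance that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counted as one half. This is an illustration only, not code from the slides; the function name `roc_auc` and the toy data are assumptions made for the example.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of random positive > score of random negative).

    `labels` are 0/1, `scores` are real-valued classifier outputs.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): three of the four positive/negative pairs are ordered correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

The double loop is quadratic in the number of examples, which is fine for illustration; practical implementations usually sort the scores once instead.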

## Day 2

- Linear models: logistic regression
- Polynomial decision rule and polynomial regression
- SVM (Support Vector Machine) and kernel trick
- Overfitting: two definitions
- Model selection
- Regularization: L1, L2, elastic net
- Decision trees
  - Splitting criteria for classification and regression
  - Overfitting in trees: pre-stopping and post-pruning
  - Instability of trees
  - Feature importance

- Ensembling
  - RSM, subsampling, bagging
  - Random Forest
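As a small illustration of a splitting criterion for classification trees, the sketch below picks the threshold on a single feature that minimises the weighted Gini impurity of the two children. It is a simplified example under my own assumptions (one numeric feature, binary split of the form `value <= threshold`), not code from the slides; `gini` and `best_threshold` are hypothetical helper names.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the best split `value <= threshold`.

    If no split improves on the unsplit node, the threshold is None.
    """
    n = len(values)
    best = (None, gini(labels))  # baseline: do not split at all
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (t, impurity)
    return best


# Toy data (hypothetical): the two classes are separated by a gap between 3 and 10.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(values, labels)
print(threshold, impurity)
```

A full tree would apply the same search recursively to each child and over all features; pre-stopping corresponds to refusing to split when a node is too small or the gain is too low.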

## Day 3

- Ensembles
  - AdaBoost
  - Gradient Boosting for regression
  - Gradient Boosting for classification
  - Second-order information
  - Losses: regression, classification, ranking

- Multiclass classification:
  - ensembling
  - softmax modifications

- Feature engineering and output engineering
- Feature selection
- Dimensionality reduction:
  - PCA
  - LDA, CSP
  - LLE
  - Isomap

- Hyperparameter optimization
  - ML-based approach
  - Gaussian processes
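To illustrate gradient boosting for regression from this day, here is a minimal sketch with squared loss, where each stage fits a one-split tree (a stump) to the current residuals. It is a toy version under my own assumptions (1-D inputs with at least two distinct values, fixed learning rate), not code from the slides; `fit_stump`, `gradient_boosting` and `mse` are hypothetical names.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree on 1-D inputs by minimising squared error.

    Assumes `xs` contains at least two distinct values.
    """
    best = None
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def gradient_boosting(xs, ys, n_stages=50, learning_rate=0.1):
    """Boosting with squared loss: each stump fits the residuals y - F(x)."""
    base = sum(ys) / len(ys)  # initial constant prediction
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_stages):
        # For squared loss the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of `model` on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (hypothetical): a quadratic curve sampled at eight points.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [x * x for x in xs]
model = gradient_boosting(xs, ys, n_stages=100)
print(mse(model, xs, ys))
```

Replacing the residual with the negative gradient of another loss gives boosting for classification or ranking; the rest of the loop stays the same.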

## Day 4, part 1

Slides by Tatiana Likhomanenko on non-trivial applications of boosting in High Energy Physics.

## Links

- All materials from the school are available in the MLHEP 2016 repository
- Official page on Indico
- Kaggle competitions for the school: exotic Higgs and triggers

Looking for a **research scientist in machine learning** to join your team?

Drop me an email, I'm currently open for opportunities! My CV.