This year my team at Yandex organized the MLHEP (Machine Learning in High Energy Physics) summer school in Lund, Sweden.

There were two tracks, basic and advanced, each lasting three days, plus two days on neural networks shared by both tracks.

The school was accompanied by two Kaggle challenges: one for both tracks and one for the advanced track only. This is the most productive way to try out techniques and learn them in practice.

Just as a year ago, I gave the lectures for the basic track. The previous materials were enriched with new topics and more explanations.

I’ve also added many visualizations and animations compared to the previous year.

This three-day course is about as short as a machine learning course gets, and it still gives a nice introduction to some advanced topics!

Day 1

Slides: MLHEP Lectures - day 1, basic track (from arogozhnikov)

Introduction to machine learning terminology. Applications within High Energy Physics and outside of it.

  • Basic problems: classification and regression
  • Nearest neighbours approach and spatial indices
  • Overfitting (intro)
  • Curse of dimensionality
  • ROC curve, ROC AUC
  • Bayes optimal classifier
  • Density estimation: KDE and histograms (see the sketch after this list)
  • Parametric density estimation
    • Mixtures for density estimation and EM algorithm
  • Generative approach vs discriminative approach
  • Linear models:
    • Linear decision rule, intro to logistic regression
    • Linear regression
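
Several Day 1 topics fit into one tiny example. Below is a minimal sketch (not from the school materials; scikit-learn and a synthetic dataset are my own choices) of a generative classifier: fit a KDE per class, score with the Bayes-rule posterior, and evaluate with ROC AUC.

```python
# A minimal sketch (not the school's notebooks): a generative classifier
# built from per-class kernel density estimates, scored via Bayes' rule
# and evaluated with ROC AUC. Dataset and bandwidth are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KernelDensity

X, y = make_classification(n_samples=2000, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)

# Nonparametric density estimation: one KDE per class, p(x | class).
kde = {label: KernelDensity(bandwidth=0.5).fit(Xtr[ytr == label])
       for label in (0, 1)}
prior = {label: np.mean(ytr == label) for label in (0, 1)}

# Bayes' rule: the posterior ratio p(1 | x) / p(0 | x) is monotone in
# log p(x | 1) + log p(1) - log p(x | 0) - log p(0), which is all that
# a ranking metric like ROC AUC needs.
log_ratio = (kde[1].score_samples(Xte) + np.log(prior[1])
             - kde[0].score_samples(Xte) - np.log(prior[0]))
print('ROC AUC:', roc_auc_score(yte, log_ratio))
```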

Day 2

Slides: MLHEP Lectures - day 2, basic track (from arogozhnikov)

  • Linear models: logistic regression
  • Polynomial decision rule and polynomial regression
  • SVM (Support Vector Machine) and kernel trick
  • Overfitting: two definitions
  • Model selection
  • Regularizations: L1, L2, elastic net
  • Decision trees
    • Splitting criteria for classification and regression
    • Overfitting in trees: pre-stopping and post-pruning
    • Instability of trees
    • Feature importance
  • Ensembling
    • RSM (random subspace method), subsampling, bagging
    • Random Forest (see the sketch below)
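
To tie the tree and ensembling bullets together, here is a minimal sketch (scikit-learn on synthetic data, my own choice, not the school's notebooks) of a Random Forest: bagging plus random feature subsets per split, with an out-of-bag quality estimate and feature importances.

```python
# A minimal sketch (scikit-learn, synthetic data; illustrative only):
# bagging + random feature subsets at each split = Random Forest, with
# an out-of-bag quality estimate and per-feature importances for free.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=6, n_informative=3,
                           random_state=0)
forest = RandomForestClassifier(n_estimators=200, max_depth=6,
                                oob_score=True, random_state=0)
forest.fit(X, y)

# Averaging many decorrelated trees tames the instability of a single
# decision tree; out-of-bag samples give a built-in validation set.
print('OOB accuracy:', forest.oob_score_)
print('feature importances:', forest.feature_importances_)
```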

Day 3

Slides: MLHEP Lectures - day 3, basic track (from arogozhnikov)

  • Ensembles
    • AdaBoost
    • Gradient Boosting for regression
    • Gradient Boosting for classification (see the sketch after this list)
    • Second-order information
    • Losses: regression, classification, ranking
  • Multiclass classification:
    • ensembling
    • softmax modifications
  • Feature engineering and output engineering
  • Feature selection
  • Dimensionality reduction:
    • PCA
    • LDA, CSP
    • LLE
    • Isomap
  • Hyperparameter optimization
    • ML-based approach
    • Gaussian processes
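
As a companion to the boosting bullets, a minimal sketch (scikit-learn, synthetic data; illustrative only) of gradient boosting for classification, using staged predictions to watch test quality as trees are added.

```python
# A minimal sketch (scikit-learn, synthetic data; illustrative only):
# gradient boosting for classification, with staged predictions to watch
# test quality grow - the practical way to pick the number of trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=10, random_state=1)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=1)

gb = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05,
                                max_depth=3, random_state=1)
gb.fit(Xtr, ytr)

# staged_decision_function yields the ensemble's raw score after each
# added tree, so overfitting shows up as ROC AUC flattening or dropping.
for n_trees, score in enumerate(gb.staged_decision_function(Xte), start=1):
    if n_trees % 100 == 0:
        print(n_trees, 'trees -> test ROC AUC:',
              roc_auc_score(yte, score.ravel()))
```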

Day 4, part 1

Slides: Reweighting and Boosting to uniformity in HEP (from arogozhnikov)

Slides by Tatiana Likhomanenko on non-trivial applications of boosting in High Energy Physics.
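
Gradient-boosted reweighting of this kind is implemented in the hep_ml Python package; below is a minimal sketch assuming its GBReweighter interface, with toy Gaussian samples standing in for simulation and real data.

```python
# A minimal sketch assuming the hep_ml package's GBReweighter interface
# (gradient-boosted reweighting); toy Gaussians stand in for simulation
# and real data here.
import numpy as np
from hep_ml.reweight import GBReweighter

rng = np.random.RandomState(0)
mc_sample = rng.normal(loc=0.0, scale=1.2, size=(10000, 2))  # simulation
real_data = rng.normal(loc=0.3, scale=1.0, size=(10000, 2))  # observed data

# Each boosting iteration builds a tree over all features at once, so
# correlations between variables are reweighted, not just 1D projections.
reweighter = GBReweighter(n_estimators=50, learning_rate=0.1, max_depth=3)
reweighter.fit(mc_sample, real_data)

# Per-event weights that make the simulated sample resemble the target.
weights = reweighter.predict_weights(mc_sample)
print(weights[:5])
```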

Links

  1. All materials from the school are available in the MLHEP 2016 repository
  2. Official page at Indico
  3. Kaggle competitions for the school: exotic Higgs and triggers