Metric functions
The hep_ml.metrics module currently contains metric functions that measure non-uniformity of predictions.
These metrics are more complicated than the usual ones and require more information: not only predictions and classes, but also mass (or other variables along which we want uniformity).
Available metrics of prediction uniformity (each comes in a bin-based and a kNN-based version):
SDE - the standard deviation of efficiency
Theil - Theil index of Efficiency (Theil index is used in economics)
CVM - based on Cramer-von Mises similarity between distributions
- uniform_label:
1 to measure non-uniformity in signal predictions,
0 to measure it in background predictions.
Metrics follow REP conventions: first fit, then compute the metric on the same dataset. For these metrics the fit stage is crucial, since it precomputes information from the dataset X. This step takes quite long, so it is better to do it once. Other quality metrics with the same interface can be found in the REP package.
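The fit-then-call convention can be illustrated with a toy metric. This is only a sketch under assumptions: the class ToyBinSpread and everything inside it are hypothetical and are not hep_ml code. It merely mimics the interface used in the examples of this section, fit(X, y, sample_weight) followed by metric(y, proba, sample_weight), and takes X to be anything indexable by column name (a dict of arrays here).

```python
import numpy as np


class ToyBinSpread:
    """Hypothetical toy metric, NOT part of hep_ml.

    Measures the weighted spread of the mean prediction across bins
    of one feature, for events of a single class.
    """

    def __init__(self, uniform_feature, uniform_label, n_bins=10):
        self.uniform_feature = uniform_feature
        self.uniform_label = uniform_label
        self.n_bins = n_bins

    def fit(self, X, y, sample_weight=None):
        # Prediction-independent work (class mask, binning) is done once here.
        y = np.asarray(y)
        values = np.asarray(X[self.uniform_feature], dtype=float)
        if sample_weight is None:
            weights = np.ones(len(y))
        else:
            weights = np.asarray(sample_weight, dtype=float)
        self.mask_ = y == self.uniform_label
        values = values[self.mask_]
        edges = np.linspace(values.min(), values.max(), self.n_bins + 1)
        # Interior edges only, so bin indices fall in 0 .. n_bins - 1.
        self.bin_indices_ = np.digitize(values, edges[1:-1])
        self.weights_ = weights[self.mask_]
        return self

    def __call__(self, y, proba, sample_weight=None):
        # Cheap step: the bins from fit are reused for any set of predictions.
        # y and sample_weight are accepted only to mirror the interface.
        score = np.asarray(proba, dtype=float)[self.mask_][:, self.uniform_label]
        bin_weight = np.bincount(self.bin_indices_, weights=self.weights_,
                                 minlength=self.n_bins)
        bin_sum = np.bincount(self.bin_indices_, weights=self.weights_ * score,
                              minlength=self.n_bins)
        filled = bin_weight > 0
        bin_mean = bin_sum[filled] / bin_weight[filled]
        global_mean = np.average(score, weights=self.weights_)
        deviation = (bin_mean - global_mean) ** 2
        return np.sqrt(np.average(deviation, weights=bin_weight[filled]))
```

Once fitted, the same object can score the predictions of several classifiers on the same dataset without repeating the binning.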
Examples
We want to check whether our predictions are uniform in mass for background events:
>>> from hep_ml.metrics import BinBasedCvM
>>> metric = BinBasedCvM(uniform_features=['mass'], uniform_label=0)
>>> metric.fit(X, y, sample_weight=sample_weight)
>>> result = metric(y, classifier.predict_proba(X), sample_weight=sample_weight)
To check predictions over two variables in signal (for dimensions > 2 always use kNN, not bins):
>>> from hep_ml.metrics import KnnBasedCvM
>>> metric = KnnBasedCvM(uniform_features=['mass12', 'mass23'], uniform_label=1)
>>> metric.fit(X, y, sample_weight=sample_weight)
>>> result = metric(y, classifier.predict_proba(X), sample_weight=sample_weight)
To check uniformity of signal predictions at a global signal efficiency of 0.7:
>>> from hep_ml.metrics import KnnBasedSDE
>>> metric = KnnBasedSDE(uniform_features=['mass12', 'mass23'], uniform_label=1, target_rcp=[0.7])
>>> metric.fit(X, y, sample_weight=sample_weight)
>>> result = metric(y, classifier.predict_proba(X), sample_weight=sample_weight)
Generally, kNN versions are slower but more stable in higher dimensions. Don’t forget to scale the features if they are of different nature.
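One possible way to put features of different nature on a comparable scale before using a kNN-based metric is to standardize each column. The sketch below uses plain NumPy. The helper name standardize and the feature values are hypothetical, and whether standardization is the right scaling depends on the data.

```python
import numpy as np


def standardize(columns):
    """Return a copy of a 2D array with each column at zero mean and unit std."""
    columns = np.asarray(columns, dtype=float)
    mean = columns.mean(axis=0)
    std = columns.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (columns - mean) / std


# Hypothetical features on very different scales: a mass and an angle.
features = np.array([[5279.0, 0.10],
                     [5300.0, 0.35],
                     [5250.0, 0.20],
                     [5310.0, 0.05]])
scaled = standardize(features)
```

Without such a step, Euclidean distances between events would be dominated by the feature with the largest numeric range, so the neighbourhoods would effectively ignore the other feature.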
- class hep_ml.metrics.BinBasedCvM(uniform_features, uniform_label, n_bins=10, power=2.0)
Bases: hep_ml.metrics.AbstractBinMetric
Nonuniformity metric based on Cramer-von Mises distance between distributions, computed on bins.
- Parameters
uniform_features (list[str]) – features along which non-uniformity is computed.
uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).
n_bins (int) – number of bins used along each axis.
power (float) – power used in the CvM formula (default is 2.0).
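To make the idea of a bin-based Cramer-von Mises measure concrete, here is a minimal NumPy sketch. It is an illustration under assumptions, not the hep_ml implementation: the function names are hypothetical, the inputs are taken to be the predictions, weights and precomputed bin indices of the events of the chosen class, and the distance is taken to be the integral of |F_bin - F_global|^power over the global distribution, averaged over bins with their weights.

```python
import numpy as np


def weighted_cdf(points, sample, weights):
    """Weighted empirical CDF of `sample`, evaluated at `points`."""
    order = np.argsort(sample)
    cumulative = np.cumsum(weights[order]) / np.sum(weights)
    # The number of sample values <= each point selects the cumulative weight.
    counts = np.searchsorted(sample[order], points, side='right')
    return np.concatenate([[0.0], cumulative])[counts]


def bin_cvm_sketch(score, weights, bin_indices, power=2.0):
    """Bin-weighted average of the CvM-like distance to the global distribution."""
    score = np.asarray(score, dtype=float)
    weights = np.asarray(weights, dtype=float)
    bin_indices = np.asarray(bin_indices)
    global_cdf = weighted_cdf(score, score, weights)
    total = 0.0
    for b in np.unique(bin_indices):
        in_bin = bin_indices == b
        bin_cdf = weighted_cdf(score, score[in_bin], weights[in_bin])
        # Integrate over the global distribution: each event adds its weight share.
        difference = np.abs(bin_cdf - global_cdf) ** power
        distance = np.sum(weights * difference) / np.sum(weights)
        total += weights[in_bin].sum() / weights.sum() * distance
    return total
```

The value is close to zero when the predictions in every bin follow the global distribution and grows when the predictions depend on the bin.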
- class hep_ml.metrics.BinBasedSDE(uniform_features, uniform_label, n_bins=10, target_rcp=None, power=2.0)
Bases: hep_ml.metrics.AbstractBinMetric
Standard Deviation of Efficiency, computed using bins.
- Parameters
uniform_features (list[str]) – features along which non-uniformity is computed.
uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).
n_bins (int) – number of bins used along each axis.
target_rcp (list[float]) – global right-classified parts. Thresholds are selected so that this fraction of the class is correctly classified. Default values are [0.5, 0.6, 0.7, 0.8, 0.9].
power (float) – power used in the SDE formula (default is 2.0).
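The role of target_rcp and power can be sketched as follows. This is a hedged illustration, not the library code: the function names are hypothetical, and the inputs are assumed to be the predictions, weights and precomputed bin indices of the events of the chosen class. For each target efficiency a global threshold is chosen, the efficiency is computed in every bin, and the weighted deviations from the mean efficiency are averaged over thresholds.

```python
import numpy as np


def threshold_for_efficiency(score, weights, efficiency):
    """Threshold such that roughly a weighted fraction `efficiency` passes score >= threshold."""
    order = np.argsort(score)
    cumulative = np.cumsum(weights[order]) / np.sum(weights)
    # First event whose cumulative weight exceeds 1 - efficiency sets the cut.
    index = np.searchsorted(cumulative, 1.0 - efficiency, side='right')
    return score[order][min(index, len(score) - 1)]


def bin_sde_sketch(score, weights, bin_indices,
                   target_rcp=(0.5, 0.6, 0.7, 0.8, 0.9), power=2.0):
    """Deviation of per-bin efficiencies from their mean, averaged over thresholds."""
    score = np.asarray(score, dtype=float)
    weights = np.asarray(weights, dtype=float)
    bin_indices = np.asarray(bin_indices)
    bin_weight = np.bincount(bin_indices, weights=weights)
    filled = bin_weight > 0
    total = 0.0
    for efficiency in target_rcp:
        cut = threshold_for_efficiency(score, weights, efficiency)
        passed = np.bincount(bin_indices, weights=weights * (score >= cut))
        bin_eff = passed[filled] / bin_weight[filled]
        mean_eff = np.average(bin_eff, weights=bin_weight[filled])
        deviation = np.abs(bin_eff - mean_eff) ** power
        total += np.average(deviation, weights=bin_weight[filled])
    return (total / len(target_rcp)) ** (1.0 / power)
```

With power=2 the inner term is the weighted variance of the bin efficiencies, which is where the name "standard deviation of efficiency" comes from.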
- class hep_ml.metrics.BinBasedTheil(uniform_features, uniform_label, n_bins=10, target_rcp=None)
Bases: hep_ml.metrics.AbstractBinMetric
Theil index of Efficiency, computed using bins.
- Parameters
uniform_features (list[str]) – features along which non-uniformity is computed.
uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).
n_bins (int) – number of bins used along each axis.
target_rcp (list[float]) – global right-classified parts. Thresholds are selected so that this fraction of the class is correctly classified. Default values are [0.5, 0.6, 0.7, 0.8, 0.9].
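For readers unfamiliar with the Theil index, the sketch below computes it for a set of per-bin efficiencies at a single threshold. It assumes the common weighted form, the average of (x / mean) * log(x / mean). The function name and the efficiency values are hypothetical, and how the library combines several thresholds is not shown here.

```python
import numpy as np


def theil_index(values, weights):
    """Weighted Theil index of non-negative values; zero when all values are equal."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    normed = values / np.average(values, weights=weights)
    # x * log(x) tends to 0 as x -> 0, so clip to avoid evaluating log(0).
    normed = np.clip(normed, 1e-20, None)
    return np.average(normed * np.log(normed), weights=weights)


# Hypothetical per-bin efficiencies at one threshold, with equal bin weights.
uniform = theil_index([0.7, 0.7, 0.7, 0.7], [1, 1, 1, 1])
nonuniform = theil_index([0.4, 0.6, 0.8, 1.0], [1, 1, 1, 1])
```

Equal efficiencies give a value of (numerically) zero, while efficiencies that vary from bin to bin give a positive value.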
- class hep_ml.metrics.KnnBasedCvM(uniform_features, uniform_label, n_neighbours=50, power=2.0)
Bases: hep_ml.metrics.AbstractKnnMetric
Nonuniformity metric based on Cramer-von Mises distance between distributions, computed on nearest neighbours.
- Parameters
uniform_features (list[str]) – features along which non-uniformity is computed.
uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).
n_neighbours (int) – number of nearest neighbours.
power (float) – power used in the CvM formula (default is 2.0).
- class hep_ml.metrics.KnnBasedSDE(uniform_features, uniform_label, n_neighbours=50, target_rcp=None, power=2.0)
Bases: hep_ml.metrics.AbstractKnnMetric
Standard Deviation of Efficiency, computed using k nearest neighbours.
- Parameters
uniform_features (list[str]) – features along which non-uniformity is computed.
uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).
n_neighbours (int) – number of nearest neighbours.
target_rcp (list[float]) – global right-classified parts. Thresholds are selected so that this fraction of the class is correctly classified. Default values are [0.5, 0.6, 0.7, 0.8, 0.9].
power (float) – power used in the SDE formula (default is 2.0).
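The kNN variant replaces bins with neighbourhoods: each event gets a local efficiency computed from its nearest neighbours in the uniform features. The sketch below is a simplified, unweighted illustration under assumptions. The function names are hypothetical, the neighbour search is brute force, a single threshold cut is passed in directly, and none of this is taken from the hep_ml source.

```python
import numpy as np


def knn_indices(points, n_neighbours):
    """Brute-force k nearest neighbours; each point counts as its own neighbour."""
    points = np.asarray(points, dtype=float)
    squared = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(squared, axis=1)[:, :n_neighbours]


def knn_sde_sketch(score, points, cut, n_neighbours=50, power=2.0):
    """Spread of the local efficiency around the global efficiency at one threshold."""
    passed = (np.asarray(score, dtype=float) >= cut).astype(float)
    # Local efficiency: fraction of an event's neighbours that pass the cut.
    local_eff = passed[knn_indices(points, n_neighbours)].mean(axis=1)
    return np.mean(np.abs(local_eff - passed.mean()) ** power) ** (1.0 / power)
```

The pairwise distance matrix grows quadratically with the number of events, so this brute-force form is only suitable for small samples. A tree-based neighbour search would be the usual choice for larger ones.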
- class hep_ml.metrics.KnnBasedTheil(uniform_features, uniform_label, n_neighbours=50, target_rcp=None)
Bases: hep_ml.metrics.AbstractKnnMetric
Theil index of Efficiency, computed using k nearest neighbours.
- Parameters
uniform_features (list[str]) – features along which non-uniformity is computed.
uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).
n_neighbours (int) – number of nearest neighbours.
target_rcp (list[float]) – global right-classified parts. Thresholds are selected so that this fraction of the class is correctly classified. Default values are [0.5, 0.6, 0.7, 0.8, 0.9].