Metric functions

Currently, the hep_ml.metrics module contains metric functions that measure non-uniformity in predictions.

These metrics are unfortunately more complicated than the usual ones and require more information: not only predictions and classes, but also mass (or other variables along which we want uniformity).

Available metrics of uniformity of predictions (a bin-based version and a kNN-based version are available for each):

  • SDE - the standard deviation of efficiency

  • Theil - the Theil index of efficiency (the Theil index is used in economics to measure inequality)

  • CvM - based on the Cramér-von Mises similarity between distributions

uniform_label:
  • 1, if you want to measure non-uniformity in signal predictions

  • 0, if you want to measure non-uniformity in background predictions.

Metrics follow REP conventions: first fit, then compute the metric on the same dataset. For these metrics the fit stage is crucial, since it precomputes information from the dataset X. This step takes quite long, so it is better to do it once. Other quality metrics with the same interface can be found in the REP package.
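The fit-then-call convention can be pictured with a small stand-in class. This is only a sketch: ToyBinMetric, its equal-width bin rule, the fixed threshold and the returned spread (largest minus smallest per-bin efficiency) are assumptions made for illustration, not the hep_ml or REP implementation. It also takes a plain list of signal probabilities instead of the two-column output of predict_proba.

```python
from bisect import bisect_right


class ToyBinMetric:
    """Hypothetical stand-in that mimics the fit-then-call convention."""

    def __init__(self, n_bins=2, uniform_label=1):
        self.n_bins = n_bins
        self.uniform_label = uniform_label

    def fit(self, mass, y):
        # Expensive part, done once: assign every event to an equal-width mass bin
        # and remember which events belong to the class of interest.
        lo, hi = min(mass), max(mass)
        width = (hi - lo) / self.n_bins
        edges = [lo + width * i for i in range(1, self.n_bins)]
        self.mask_ = [label == self.uniform_label for label in y]
        self.bin_index_ = [bisect_right(edges, m) for m in mass]
        return self

    def __call__(self, y, signal_proba, threshold=0.5):
        # Cheap part, repeated for every classifier: reuse the stored bins.
        # y is accepted only to mirror the call signature used in this document.
        passed = [[] for _ in range(self.n_bins)]
        for keep, b, p in zip(self.mask_, self.bin_index_, signal_proba):
            if keep:
                passed[b].append(p > threshold)
        effs = [sum(flags) / len(flags) for flags in passed if flags]
        return max(effs) - min(effs)


mass = [1.0, 2.0, 3.0, 4.0]
y = [1, 1, 1, 1]
metric = ToyBinMetric(n_bins=2, uniform_label=1).fit(mass, y)

flat = metric(y, [0.9, 0.8, 0.9, 0.7])    # same efficiency in both mass bins
peaked = metric(y, [0.9, 0.8, 0.2, 0.1])  # efficiency depends on mass
```

The same fitted object is evaluated twice here, which is the point of the convention: the binning is computed once and reused for every set of predictions.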

Examples

We want to check whether our predictions are uniform in mass for background events:

>>> from hep_ml.metrics import BinBasedCvM
>>> metric = BinBasedCvM(uniform_features=['mass'], uniform_label=0)
>>> metric.fit(X, y, sample_weight=sample_weight)
>>> result = metric(y, classifier.predict_proba(X), sample_weight=sample_weight)

To check predictions over two variables in signal (for more than two dimensions, always use kNN rather than bins):

>>> from hep_ml.metrics import KnnBasedCvM
>>> metric = KnnBasedCvM(uniform_features=['mass12', 'mass23'], uniform_label=1)
>>> metric.fit(X, y, sample_weight=sample_weight)
>>> result = metric(y, classifier.predict_proba(X), sample_weight=sample_weight)

To check the uniformity of signal predictions at a global signal efficiency of 0.7:

>>> from hep_ml.metrics import KnnBasedSDE
>>> metric = KnnBasedSDE(uniform_features=['mass12', 'mass23'], uniform_label=1, target_rcp=[0.7])
>>> metric.fit(X, y, sample_weight=sample_weight)
>>> result = metric(y, classifier.predict_proba(X), sample_weight=sample_weight)

Generally, kNN versions are slower but more stable in higher dimensions. Don’t forget to scale features if they are of different nature.
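To see why scaling matters for the kNN versions, the sketch below finds the nearest neighbour of one event before and after standardizing the columns. The events, their units and the helper names (standardize, nearest) are hypothetical. Standardization is only one possible choice of scaling, and the library may search for neighbours differently.

```python
from math import dist
from statistics import mean, pstdev


def standardize(rows):
    """Rescale each column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    centers = [mean(col) for col in columns]
    scales = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return [
        tuple((value - c) / s for value, c, s in zip(row, centers, scales))
        for row in rows
    ]


def nearest(rows, index):
    """Index of the closest other row in Euclidean distance."""
    others = (i for i in range(len(rows)) if i != index)
    return min(others, key=lambda i: dist(rows[index], rows[i]))


# Hypothetical events: (mass in MeV, an angle-like variable in [0, 1]).
events = [(5000.0, 0.10), (5005.0, 0.90), (5050.0, 0.12), (5300.0, 0.50)]

neighbour_raw = nearest(events, 0)                  # mass dominates the distance
neighbour_scaled = nearest(standardize(events), 0)  # both features contribute
```

Without scaling, the first event is matched to the second one because their masses differ by only 5 MeV, although the other variable is very different. After scaling, it is matched to the third event instead.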

class hep_ml.metrics.BinBasedCvM(uniform_features, uniform_label, n_bins=10, power=2.0)

Bases: hep_ml.metrics.AbstractBinMetric

Non-uniformity metric based on the Cramér-von Mises distance between distributions, computed on bins.

Parameters
  • uniform_features (list[str]) – features along which non-uniformity is computed.

  • uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).

  • n_bins (int) – number of bins used along each axis.

  • power (float) – power used in the CvM formula (default is 2.0).
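A minimal sketch of the idea behind a bin-based CvM measure follows. It assumes unweighted events that all belong to the class selected by uniform_label, and bin indices that were computed beforehand. The function names are hypothetical and the exact formula used by the library may differ. For every bin, the empirical distribution function of the predictions in that bin is compared with the global one, and the differences raised to power are averaged.

```python
from bisect import bisect_right


def ecdf(sorted_values, x):
    """Fraction of values that are <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def binned_cvm(predictions, bin_indices, power=2.0):
    """Weighted average over bins of the mean |F_bin(s) - F(s)| ** power.

    The mean runs over all predictions s, which approximates an integral
    over the global distribution. Bin weights are the event fractions.
    """
    everything = sorted(predictions)
    bins = {}
    for p, b in zip(predictions, bin_indices):
        bins.setdefault(b, []).append(p)

    total = 0.0
    for members in bins.values():
        members.sort()
        gap = sum(
            abs(ecdf(members, s) - ecdf(everything, s)) ** power
            for s in everything
        ) / len(everything)
        total += len(members) / len(everything) * gap
    return total


# Same prediction distribution in both bins: no non-uniformity.
uniform = binned_cvm([0.2, 0.8, 0.2, 0.8], [0, 0, 1, 1])
# Low predictions in bin 0, high predictions in bin 1.
skewed = binned_cvm([0.2, 0.2, 0.8, 0.8], [0, 0, 1, 1])
```

Note that no threshold appears in this sketch: the whole distribution of predictions is compared, which is why the CvM classes have no target_rcp parameter.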

class hep_ml.metrics.BinBasedSDE(uniform_features, uniform_label, n_bins=10, target_rcp=None, power=2.0)

Bases: hep_ml.metrics.AbstractBinMetric

Standard Deviation of Efficiency, computed using bins.

Parameters
  • uniform_features (list[str]) – features along which non-uniformity is computed.

  • uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).

  • n_bins (int) – number of bins used along each axis.

  • target_rcp (list[float]) – global right-classified parts. Thresholds are selected so that this fraction of the class is correctly classified. Default values are [0.5, 0.6, 0.7, 0.8, 0.9].

  • power (float) – power used in the SDE formula (default is 2.0).
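The role of target_rcp and power can be sketched as follows. This is a simplified illustration under stated assumptions: unweighted events of a single class, precomputed bin indices, and hypothetical helper names. The rounding rule used to pick the threshold is also an assumption, and the library's computation may differ in its details. For each target, a threshold is chosen so that roughly that fraction of events passes, and the per-bin efficiencies are then compared with the global one.

```python
def threshold_for_efficiency(predictions, target):
    """Cut value such that roughly `target` of the predictions reach it."""
    ranked = sorted(predictions, reverse=True)
    n_pass = max(1, round(target * len(ranked)))
    return ranked[n_pass - 1]


def binned_sde(predictions, bin_indices,
               target_rcp=(0.5, 0.6, 0.7, 0.8, 0.9), power=2.0):
    """Spread of per-bin efficiencies around the global efficiency."""
    bins = {}
    for p, b in zip(predictions, bin_indices):
        bins.setdefault(b, []).append(p)
    n_total = len(predictions)

    spread = 0.0
    for target in target_rcp:
        cut = threshold_for_efficiency(predictions, target)
        global_eff = sum(p >= cut for p in predictions) / n_total
        for members in bins.values():
            local_eff = sum(p >= cut for p in members) / len(members)
            weight = len(members) / n_total  # fraction of events in the bin
            spread += weight * abs(local_eff - global_eff) ** power
    # Average over the targets, then undo the power.
    return (spread / len(target_rcp)) ** (1.0 / power)


bin_indices = [0, 0, 0, 0, 1, 1, 1, 1]
uniform = binned_sde([0.9, 0.8, 0.2, 0.1, 0.9, 0.8, 0.2, 0.1],
                     bin_indices, target_rcp=[0.5])
skewed = binned_sde([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1],
                    bin_indices, target_rcp=[0.5])
```

In the second call every event of bin 0 passes the cut and no event of bin 1 does, so both local efficiencies are 0.5 away from the global efficiency of 0.5.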

class hep_ml.metrics.BinBasedTheil(uniform_features, uniform_label, n_bins=10, target_rcp=None)

Bases: hep_ml.metrics.AbstractBinMetric

Theil index of Efficiency, computed using bins.

Parameters
  • uniform_features (list[str]) – features along which non-uniformity is computed.

  • uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).

  • n_bins (int) – number of bins used along each axis.

  • target_rcp (list[float]) – global right-classified parts. Thresholds are selected so that this fraction of the class is correctly classified. Default values are [0.5, 0.6, 0.7, 0.8, 0.9].
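For a single threshold, a Theil index of efficiency can be sketched as below. The function name and its inputs (per-bin efficiencies, bin weights that sum to one, and the global efficiency) are assumptions chosen for illustration. The library may normalize or average over thresholds differently.

```python
from math import log


def binned_theil(local_effs, weights, global_eff):
    """Theil index of per-bin efficiencies relative to the global efficiency."""
    total = 0.0
    for eff, weight in zip(local_effs, weights):
        ratio = eff / global_eff
        if ratio > 0:  # x * ln(x) tends to 0 as x tends to 0
            total += weight * ratio * log(ratio)
    return total


# Equal efficiency in both bins: the index vanishes.
uniform = binned_theil([0.5, 0.5], [0.5, 0.5], global_eff=0.5)
# All passing events sit in the first bin.
skewed = binned_theil([1.0, 0.0], [0.5, 0.5], global_eff=0.5)
```

With two equally populated bins, the most extreme case (all passing events in one bin) gives ln 2, which is the largest value the index can take in that configuration.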

class hep_ml.metrics.KnnBasedCvM(uniform_features, uniform_label, n_neighbours=50, power=2.0)

Bases: hep_ml.metrics.AbstractKnnMetric

Non-uniformity metric based on the Cramér-von Mises distance between distributions, computed on nearest neighbours.

Parameters
  • uniform_features (list[str]) – features along which non-uniformity is computed.

  • uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).

  • n_neighbours (int) – number of nearest neighbours.

  • power (float) – power used in the CvM formula (default is 2.0).
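One way to see why a fixed n_neighbours behaves better than a fixed n_bins per axis as the number of uniform features grows is to count events per bin. The sample size below is hypothetical and the helper name is an assumption. The defaults n_bins=10 and n_neighbours=50 are taken from the signatures in this document.

```python
def mean_events_per_bin(n_events, n_bins, n_features):
    """Average occupancy of a cell when every axis is cut into n_bins."""
    return n_events / n_bins ** n_features


n_events = 10_000   # hypothetical sample size
n_neighbours = 50   # default of the kNN-based classes

# n_bins=10 is the default of the bin-based classes.
occupancy = {d: mean_events_per_bin(n_events, 10, d) for d in (1, 2, 3, 4)}

# Dimensions in which an average bin holds fewer events than one kNN group.
sparse_dims = [d for d, n in occupancy.items() if n < n_neighbours]
```

With these numbers an average bin holds 1000, 100, 10 and 1 events in one to four dimensions, while every kNN group always contains n_neighbours events. Bins therefore become too sparse for a stable local estimate from three dimensions on.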

class hep_ml.metrics.KnnBasedSDE(uniform_features, uniform_label, n_neighbours=50, target_rcp=None, power=2.0)

Bases: hep_ml.metrics.AbstractKnnMetric

Standard Deviation of Efficiency, computed using k nearest neighbours.

Parameters
  • uniform_features (list[str]) – features along which non-uniformity is computed.

  • uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).

  • n_neighbours (int) – number of nearest neighbours.

  • target_rcp (list[float]) – global right-classified parts. Thresholds are selected so that this fraction of the class is correctly classified. Default values are [0.5, 0.6, 0.7, 0.8, 0.9].

  • power (float) – power used in the SDE formula (default is 2.0).
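The kNN variant replaces bins with neighbourhoods: each event gets a local efficiency computed from its nearest neighbours in the uniform features. The sketch below is a brute-force illustration with hypothetical function names, unweighted events of one class and a single fixed threshold. It is not the library's implementation, which would normally use a faster neighbour search.

```python
from math import dist


def knn_groups(points, n_neighbours):
    """For every point, indices of its n_neighbours closest points (itself included)."""
    groups = []
    for center in points:
        order = sorted(range(len(points)), key=lambda i: dist(center, points[i]))
        groups.append(order[:n_neighbours])
    return groups


def knn_sde(points, predictions, threshold, n_neighbours=3, power=2.0):
    """Spread of neighbourhood efficiencies around the global efficiency."""
    passed = [p > threshold for p in predictions]
    global_eff = sum(passed) / len(passed)
    groups = knn_groups(points, n_neighbours)
    spread = sum(
        abs(sum(passed[i] for i in group) / len(group) - global_eff) ** power
        for group in groups
    ) / len(groups)
    return spread ** (1.0 / power)


# Two well-separated clusters in a hypothetical (mass12, mass23) plane.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]

uniform = knn_sde(points, [0.9, 0.1, 0.9, 0.9, 0.1, 0.9], threshold=0.5)
skewed = knn_sde(points, [0.9, 0.9, 0.9, 0.1, 0.1, 0.1], threshold=0.5)
```

In the first call both clusters have the same local efficiency as the whole sample, so the result is zero. In the second call one cluster passes entirely and the other not at all.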

class hep_ml.metrics.KnnBasedTheil(uniform_features, uniform_label, n_neighbours=50, target_rcp=None)

Bases: hep_ml.metrics.AbstractKnnMetric

Theil index of Efficiency, computed using k nearest neighbours.

Parameters
  • uniform_features (list[str]) – features along which non-uniformity is computed.

  • uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).

  • n_neighbours (int) – number of nearest neighbours.

  • target_rcp (list[float]) – global right-classified parts. Thresholds are selected so that this fraction of the class is correctly classified. Default values are [0.5, 0.6, 0.7, 0.8, 0.9].