Metric functions

Currently, the hep_ml.metrics module contains metric functions that measure non-uniformity of predictions.

These metrics are, unfortunately, more complicated than the usual ones and require more information: not only predictions and classes, but also the mass (or other variables along which we want predictions to be uniform).

Available metrics of uniformity of predictions (each one is available in both a bin-based and a kNN-based version):

  • SDE - the standard deviation of efficiency

  • Theil - Theil index of efficiency (the Theil index is a measure of inequality used in economics)

  • CvM - based on the Cramer-von Mises distance between distributions
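All three metrics start from the same idea: split the events of the chosen class into regions of the uniform features and compare how the classifier behaves in each region. Below is a minimal pure-Python sketch of the basic ingredient of the efficiency-based metrics (SDE and Theil), the per-bin efficiency at one threshold. It assumes a single uniform feature cut into equal-width bins; `bin_indices` and `bin_efficiencies` are hypothetical helpers, not hep_ml functions.

```python
# Hypothetical helpers for illustration only; not part of hep_ml.


def bin_indices(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant input
    # the maximum value would land in bin n_bins, so clip it into the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def bin_efficiencies(predictions, bins, n_bins, threshold, weights=None):
    """Weighted fraction of events in each bin whose prediction exceeds threshold."""
    if weights is None:
        weights = [1.0] * len(predictions)
    passed = [0.0] * n_bins
    total = [0.0] * n_bins
    for p, b, w in zip(predictions, bins, weights):
        total[b] += w
        if p > threshold:
            passed[b] += w
    # empty bins get efficiency 0 here; a real metric would give them zero weight
    return [pa / t if t > 0 else 0.0 for pa, t in zip(passed, total)]


mass = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pred = [0.9, 0.2, 0.8, 0.7, 0.1, 0.6]
bins = bin_indices(mass, n_bins=3)                      # [0, 0, 1, 1, 2, 2]
print(bin_efficiencies(pred, bins, 3, threshold=0.5))   # [0.5, 1.0, 0.5]
```

Perfectly uniform predictions would give the same efficiency in every bin; SDE and Theil summarize how far the per-bin values are from that.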

uniform_label:
  • 1, if you want to measure non-uniformity in signal predictions

  • 0, if you want to measure non-uniformity in background predictions.

Metrics follow REP conventions (first fit, then compute the metric on the same dataset). For these metrics the fit stage is crucial: it precomputes information from the dataset X, which is slow, so it is better to do it only once. Other quality metrics with the same interface can be found in the REP package.
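The fit-then-call convention pays off when one metric is evaluated for several classifiers on the same data: the expensive preparation runs once in fit, and each call only reuses it. Below is a minimal pure-Python sketch of that pattern, assuming a single uniform feature and unit weights. `ToyBinMetric` is a hypothetical class; its non-uniformity measure (the spread of per-bin mean scores) is a toy stand-in rather than one of the metrics listed above, and reading column `uniform_label` of the probabilities is an assumption of the sketch.

```python
class ToyBinMetric:
    """Toy metric following the fit-then-call convention (hypothetical, not part of hep_ml)."""

    def __init__(self, uniform_label, n_bins=3):
        self.uniform_label = uniform_label
        self.n_bins = n_bins

    def fit(self, x, y):
        # Done once: select events of the chosen class and bin their uniform feature.
        self._mask = [label == self.uniform_label for label in y]
        values = [v for v, keep in zip(x, self._mask) if keep]
        lo, hi = min(values), max(values)
        width = (hi - lo) / self.n_bins or 1.0
        self._bins = [min(int((v - lo) / width), self.n_bins - 1) for v in values]
        return self

    def __call__(self, y, proba):
        # Repeated for every classifier: reuse the stored mask and bins.
        # y is accepted only to mirror the metric(y, proba) call signature.
        # Assumption: the score of interest is column `uniform_label` of proba.
        scores = [row[self.uniform_label] for row, keep in zip(proba, self._mask) if keep]
        sums = [0.0] * self.n_bins
        counts = [0] * self.n_bins
        for s, b in zip(scores, self._bins):
            sums[b] += s
            counts[b] += 1
        means = [s / c for s, c in zip(sums, counts) if c > 0]
        # Toy non-uniformity: spread of per-bin mean scores (not one of the metrics above).
        return max(means) - min(means)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]   # uniform feature, e.g. mass
y = [0, 0, 0, 0, 0, 0, 1]
metric = ToyBinMetric(uniform_label=0).fit(x, y)
print(metric(y, [[0.5, 0.5]] * 7))        # 0.0: identical scores in every bin
```

Several classifiers' outputs can then be passed to the same fitted `metric` without recomputing the bins.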

Examples

To check whether predictions are uniform in mass for background events:

>>> from hep_ml.metrics import BinBasedCvM
>>> metric = BinBasedCvM(uniform_features=['mass'], uniform_label=0)
>>> metric.fit(X, y, sample_weight=sample_weight)
>>> result = metric(y, classifier.predict_proba(X), sample_weight=sample_weight)

To check uniformity of signal predictions over two variables (for more than two dimensions, always use kNN rather than bins):

>>> from hep_ml.metrics import KnnBasedCvM
>>> metric = KnnBasedCvM(uniform_features=['mass12', 'mass23'], uniform_label=1)
>>> metric.fit(X, y, sample_weight=sample_weight)
>>> result = metric(y, classifier.predict_proba(X), sample_weight=sample_weight)

To check uniformity of signal predictions at a global signal efficiency of 0.7:

>>> from hep_ml.metrics import KnnBasedSDE
>>> metric = KnnBasedSDE(uniform_features=['mass12', 'mass23'], uniform_label=1, target_rcp=[0.7])
>>> metric.fit(X, y, sample_weight=sample_weight)
>>> result = metric(y, classifier.predict_proba(X), sample_weight=sample_weight)

Generally, kNN versions are slower but more stable in higher dimensions. Don’t forget to scale the features if they are of different nature.
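Scaling matters because kNN neighbourhoods are defined by a distance over all uniform features, so a feature measured in large units (say, a mass in MeV) can swamp one in small units (an angle in radians). Below is a minimal pure-Python sketch, assuming Euclidean distance and z-score standardization. `standardize` and `nearest_neighbour` are hypothetical helpers; in practice a tool such as scikit-learn's StandardScaler performs the same rescaling.

```python
import math


# Hypothetical helpers for illustration only; not part of hep_ml.
def standardize(columns):
    """Rescale each feature column to zero mean and unit (population) standard deviation."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        scaled.append([(v - mean) / std for v in col])
    return scaled


def nearest_neighbour(columns, i):
    """Index of the event closest to event i, using Euclidean distance over all columns."""
    def distance(j):
        return math.sqrt(sum((col[i] - col[j]) ** 2 for col in columns))
    others = [j for j in range(len(columns[0])) if j != i]
    return min(others, key=distance)


mass = [5000.0, 5010.0, 5020.0, 6000.0]  # large units, e.g. MeV
angle = [0.0, 3.0, 0.0, 1.5]             # small units, e.g. radians
print(nearest_neighbour([mass, angle], 0))               # 1: mass alone decides
print(nearest_neighbour(standardize([mass, angle]), 0))  # 2: both features count
```

Without scaling, event 0's neighbour is the one closest in mass even though its angle is very different; after standardization the event with the same angle and a nearby mass wins.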

class hep_ml.metrics.BinBasedCvM(uniform_features, uniform_label, n_bins=10, power=2.0)

Bases: AbstractBinMetric

Nonuniformity metric based on Cramer-von Mises distance between distributions, computed on bins.

Parameters:
  • uniform_features (list[str]) – features along which non-uniformity is computed.

  • uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).

  • n_bins (int) – number of bins used along each axis.

  • power (float) – power used in the CvM formula (default is 2.0).
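The bin-based CvM metric compares, bin by bin, the distribution of predictions inside the bin with the global distribution. Below is a minimal pure-Python sketch under simplifying assumptions: unit weights, bins already assigned, the CDF difference raised to `power` and averaged over the global sample points, and each bin weighted by its share of events. `ecdf` and `bin_based_cvm` are hypothetical names; hep_ml's exact evaluation points and normalization may differ.

```python
from bisect import bisect_right


# Hypothetical helpers for illustration only; not part of hep_ml.
def ecdf(sample):
    """Return a function t -> fraction of sample values <= t."""
    ordered = sorted(sample)
    return lambda t: bisect_right(ordered, t) / len(ordered)


def bin_based_cvm(predictions, bins, power=2.0):
    """Bin-weighted Cramer-von Mises style distance between local and global CDFs."""
    global_cdf = ecdf(predictions)
    n = len(predictions)
    result = 0.0
    for b in set(bins):
        local = [p for p, bb in zip(predictions, bins) if bb == b]
        local_cdf = ecdf(local)
        # average |F_global - F_local|^power over the global sample points
        distance = sum(abs(global_cdf(t) - local_cdf(t)) ** power for t in predictions) / n
        result += len(local) / n * distance  # weight each bin by its share of events
    return result


print(bin_based_cvm([0.1, 0.9, 0.1, 0.9], [0, 0, 1, 1]))  # 0.0: same distribution in each bin
print(bin_based_cvm([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))  # 0.09375: low scores in bin 0 only
```

Unlike SDE and Theil, this needs no threshold: it compares whole distributions, so it is sensitive to non-uniformity at every cut at once.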

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BinBasedCvM

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns

self : object

The updated object.

class hep_ml.metrics.BinBasedSDE(uniform_features, uniform_label, n_bins=10, target_rcp=None, power=2.0)

Bases: AbstractBinMetric

Standard Deviation of Efficiency, computed using bins.

Parameters:
  • uniform_features (list[str]) – features along which non-uniformity is computed.

  • uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).

  • n_bins (int) – number of bins used along each axis.

  • target_rcp (list[float]) – global right-classified parts: thresholds are selected so that this fraction of the class is correctly classified. Default is [0.5, 0.6, 0.7, 0.8, 0.9].

  • power (float) – power used in the SDE formula (default is 2.0).
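Each target_rcp value turns into a threshold: the cut is chosen so that roughly that fraction of the class passes it, and the spread of per-bin efficiencies at that cut is measured. Below is a minimal pure-Python sketch, assuming unit weights, ties ignored, deviations weighted by bin population, and averaging over the target_rcp values before taking the 1/power root. `threshold_for_rcp` and `sde` are hypothetical helpers, not hep_ml's implementation.

```python
# Hypothetical helpers for illustration only; not part of hep_ml.
def threshold_for_rcp(predictions, rcp):
    """Cut such that about a fraction rcp of predictions lie strictly above it (ties ignored)."""
    ordered = sorted(predictions)
    n_fail = len(ordered) - round(rcp * len(ordered))
    return ordered[n_fail - 1] if n_fail > 0 else float("-inf")


def sde(predictions, bins, target_rcp=(0.5, 0.6, 0.7, 0.8, 0.9), power=2.0):
    """Bin-based standard deviation of efficiency, averaged over target_rcp cuts."""
    n_bins = max(bins) + 1
    total = [0] * n_bins
    for b in bins:
        total[b] += 1
    weights = [t / len(bins) for t in total]  # each bin weighted by its population
    accumulated = 0.0
    for rcp in target_rcp:
        cut = threshold_for_rcp(predictions, rcp)
        passed = [0] * n_bins
        for p, b in zip(predictions, bins):
            if p > cut:
                passed[b] += 1
        effs = [pa / t if t else 0.0 for pa, t in zip(passed, total)]
        mean = sum(w * e for w, e in zip(weights, effs))
        accumulated += sum(w * abs(e - mean) ** power for w, e in zip(weights, effs))
    return (accumulated / len(target_rcp)) ** (1.0 / power)


print(sde([0.1, 0.6, 0.1, 0.6], [0, 0, 1, 1], target_rcp=[0.5]))  # 0.0
print(sde([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1], target_rcp=[0.5]))  # 0.5
```

With power=2 this is an ordinary weighted standard deviation; a larger power puts more emphasis on the bins that deviate most.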

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BinBasedSDE

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns

self : object

The updated object.

class hep_ml.metrics.BinBasedTheil(uniform_features, uniform_label, n_bins=10, target_rcp=None)

Bases: AbstractBinMetric

Theil index of Efficiency, computed using bins.

Parameters:
  • uniform_features (list[str]) – features along which non-uniformity is computed.

  • uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).

  • n_bins (int) – number of bins used along each axis.

  • target_rcp (list[float]) – global right-classified parts: thresholds are selected so that this fraction of the class is correctly classified. Default is [0.5, 0.6, 0.7, 0.8, 0.9].
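The Theil index of efficiencies e_b with weights w_b is the weighted mean of (e_b/ē) ln(e_b/ē), where ē is the weighted mean efficiency. It is 0 when all bins have the same efficiency and grows as they become unequal. Below is a minimal pure-Python sketch, assuming the per-bin efficiencies at one threshold are already computed (combining several target_rcp values is left out) and that zero ratios are clipped to keep the logarithm finite. `theil` is a hypothetical helper, not hep_ml's code.

```python
import math


# Hypothetical helper for illustration only; not part of hep_ml.
def theil(values, weights):
    """Weighted Theil index: mean of (v/mean) * ln(v/mean); 0 means perfectly uniform."""
    total_weight = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total_weight
    if mean == 0:
        return 0.0  # no event passes anywhere: treated as uniform in this sketch
    result = 0.0
    for v, w in zip(values, weights):
        ratio = max(v / mean, 1e-10)  # clip zero ratios so the logarithm stays finite
        result += w * ratio * math.log(ratio)
    return result / total_weight


print(theil([0.5, 0.5, 0.5], [1, 1, 1]))  # 0.0: equal efficiency in every bin
print(theil([0.0, 1.0], [1, 1]))          # about 0.693 (ln 2): all efficiency in one bin
```

For n equally weighted bins the index is bounded by ln n, reached when a single bin holds all the passing events.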

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BinBasedTheil

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns

self : object

The updated object.

class hep_ml.metrics.KnnBasedCvM(uniform_features, uniform_label, n_neighbours=50, power=2.0)

Bases: AbstractKnnMetric

Nonuniformity metric based on Cramer-von Mises distance between distributions, computed on nearest neighbours.

Parameters:
  • uniform_features (list[str]) – features along which non-uniformity is computed.

  • uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).

  • n_neighbours (int) – number of nearest neighbours.

  • power (float) – power used in the CvM formula (default is 2.0).
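In the kNN versions, the neighbourhood of each event of the chosen class takes the place of a bin. Below is a minimal pure-Python sketch of that idea for CvM, assuming brute-force Euclidean neighbours (each event counts as its own neighbour), unit weights, and averaging over all events. `knn_indices` and `knn_based_cvm` are hypothetical names; hep_ml's actual neighbour search and weighting may differ.

```python
import math
from bisect import bisect_right


# Hypothetical helpers for illustration only; not part of hep_ml.
def knn_indices(points, k):
    """For each point, indices of its k nearest points (itself included), by brute force."""
    return [
        sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))[:k]
        for p in points
    ]


def knn_based_cvm(points, predictions, n_neighbours, power=2.0):
    """Average over events of the CvM-style distance between local and global CDFs."""
    n = len(predictions)
    global_sorted = sorted(predictions)

    def cdf(ordered, t):
        # empirical CDF: fraction of values <= t
        return bisect_right(ordered, t) / len(ordered)

    total = 0.0
    for neighbours in knn_indices(points, n_neighbours):
        local_sorted = sorted(predictions[j] for j in neighbours)
        total += sum(abs(cdf(global_sorted, t) - cdf(local_sorted, t)) ** power
                     for t in predictions) / n
    return total / n


points = [(1.0,), (2.0,), (10.0,), (11.0,)]  # one uniform feature, e.g. mass
print(knn_based_cvm(points, [0.1, 0.2, 0.8, 0.9], n_neighbours=2))  # 0.09375
```

Because neighbourhoods adapt to the local density of events, this avoids empty or nearly empty bins, which is why kNN versions stay stable in more dimensions at the price of the neighbour search.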

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KnnBasedCvM

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns

self : object

The updated object.

class hep_ml.metrics.KnnBasedSDE(uniform_features, uniform_label, n_neighbours=50, target_rcp=None, power=2.0)

Bases: AbstractKnnMetric

Standard Deviation of Efficiency, computed using k nearest neighbours.

Parameters:
  • uniform_features (list[str]) – features along which non-uniformity is computed.

  • uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).

  • n_neighbours (int) – number of nearest neighbours.

  • target_rcp (list[float]) – global right-classified parts: thresholds are selected so that this fraction of the class is correctly classified. Default is [0.5, 0.6, 0.7, 0.8, 0.9].

  • power (float) – power used in the SDE formula (default is 2.0).
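For kNN-based SDE, each event gets a local efficiency: the fraction of its neighbours whose prediction passes the cut. Below is a minimal pure-Python sketch for a single cut, assuming the neighbour lists were already computed (for example by a kNN search over the scaled uniform features) and unit weights. `knn_sde` is a hypothetical helper; a full metric would repeat this for every cut derived from target_rcp and combine the results.

```python
# Hypothetical helper for illustration only; not part of hep_ml.
def knn_sde(predictions, neighbours, cut, power=2.0):
    """Power-mean deviation of local efficiencies at one cut (unit weights).

    neighbours[i] lists the indices of the events in event i's neighbourhood.
    """
    # local efficiency: fraction of the neighbourhood whose prediction passes the cut
    local_eff = [sum(predictions[j] > cut for j in nb) / len(nb) for nb in neighbours]
    mean = sum(local_eff) / len(local_eff)
    deviation = sum(abs(e - mean) ** power for e in local_eff) / len(local_eff)
    return deviation ** (1.0 / power)


neighbours = [[0, 1], [1, 0], [2, 3], [3, 2]]
print(knn_sde([0.1, 0.9, 0.1, 0.9], neighbours, cut=0.5))  # 0.0
print(knn_sde([0.1, 0.2, 0.8, 0.9], neighbours, cut=0.5))  # 0.5
```

The only difference from the bin-based sketch is where the efficiencies come from: one value per event's neighbourhood instead of one per bin.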

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KnnBasedSDE

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns

self : object

The updated object.

class hep_ml.metrics.KnnBasedTheil(uniform_features, uniform_label, n_neighbours=50, target_rcp=None)

Bases: AbstractKnnMetric

Theil index of Efficiency, computed using k nearest neighbours.

Parameters:
  • uniform_features (list[str]) – features along which non-uniformity is computed.

  • uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal).

  • n_neighbours (int) – number of nearest neighbours.

  • target_rcp (list[float]) – global right-classified parts: thresholds are selected so that this fraction of the class is correctly classified. Default is [0.5, 0.6, 0.7, 0.8, 0.9].

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KnnBasedTheil

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns

self : object

The updated object.