Metric functions
Currently the hep_ml.metrics module contains metric functions that measure non-uniformity in predictions.
These metrics are more complicated than the usual ones and require more information: not only predictions and classes, but also the mass (or other variables along which uniformity is wanted).
Available metrics of uniformity of predictions (each has a bin-based and a kNN-based version):
SDE - the standard deviation of efficiency
Theil - the Theil index of efficiency (the Theil index is an inequality measure used in economics)
CvM - based on the Cramer-von Mises similarity between distributions
- uniform_label:
1, if you want to measure non-uniformity in signal predictions;
0, if you want to measure it in background predictions.
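To make the idea of "standard deviation of efficiency" concrete, here is a minimal pure-Python sketch. It is not hep_ml's implementation: the function name `sde_in_bins`, its arguments and the toy data are assumptions made for illustration. It computes the weighted spread of the per-bin efficiency around the global efficiency, for the class chosen by `uniform_label`.

```python
def sde_in_bins(bin_index, labels, passed, weights, uniform_label, power=2.0):
    """Weighted spread of per-bin efficiency around the global efficiency."""
    total = {}  # bin -> summed weight of events of the chosen class
    kept = {}   # bin -> summed weight of those events that pass the cut
    for b, label, ok, w in zip(bin_index, labels, passed, weights):
        if label != uniform_label:
            continue  # only the class named by uniform_label is measured
        total[b] = total.get(b, 0.0) + w
        kept[b] = kept.get(b, 0.0) + (w if ok else 0.0)
    all_weight = sum(total.values())
    global_eff = sum(kept.values()) / all_weight
    spread = sum(
        (total[b] / all_weight) * abs(kept[b] / total[b] - global_eff) ** power
        for b in total
    )
    return spread ** (1.0 / power)


# Hypothetical toy data: two mass bins, signal events only (label 1).
bins = [0, 0, 0, 0, 1, 1, 1, 1]
labels = [1] * 8
weights = [1.0] * 8
uniform = sde_in_bins(bins, labels, [True, True, False, False] * 2, weights, 1)
skewed = sde_in_bins(bins, labels, [True] * 4 + [False] * 4, weights, 1)
```

In this toy case `uniform` is zero because both bins have the same efficiency, while `skewed` is large because one bin passes every event and the other passes none.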
Metrics follow the REP conventions (first fit, then compute the metric on the same dataset). The fit stage is crucial for these metrics: it precomputes information from the dataset X, which takes quite long, so it is better to do it once. Other quality metrics with the same interface can be found in the REP package.
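The fit-then-call convention can be illustrated with a toy class. This is only a sketch of the pattern under stated assumptions: `BinnedEfficiency`, its `edges` argument and the data are hypothetical and are not part of hep_ml or REP. The point is that `fit` stores the event-to-bin assignment once, and every later call reuses it.

```python
from bisect import bisect_right


class BinnedEfficiency:
    """Toy metric: fit() bins the events once, __call__ reuses the bins."""

    def __init__(self, edges):
        self.edges = list(edges)  # interior bin edges along one feature

    def fit(self, values):
        # Stands in for the expensive precomputation; here just a bin lookup.
        self.bin_index_ = [bisect_right(self.edges, v) for v in values]
        return self

    def __call__(self, passed):
        # Per-bin fraction of events passing a cut, using the stored bins.
        counts, kept = {}, {}
        for b, ok in zip(self.bin_index_, passed):
            counts[b] = counts.get(b, 0) + 1
            kept[b] = kept.get(b, 0) + int(ok)
        return {b: kept[b] / counts[b] for b in sorted(counts)}


mass = [0.1, 0.4, 0.6, 0.9]
metric = BinnedEfficiency(edges=[0.5]).fit(mass)
eff_a = metric([True, True, False, False])  # first set of predictions
eff_b = metric([True, False, True, False])  # second set, same fitted bins
```

Evaluating several classifiers on the same dataset then costs one `fit` and one cheap call per classifier.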
Examples
To check whether predictions are uniform in mass for background events:
>>> from hep_ml.metrics import BinBasedCvM
>>> metric = BinBasedCvM(uniform_features=['mass'], uniform_label=0)
>>> metric.fit(X, y, sample_weight=sample_weight)
>>> result = metric(y, classifier.predict_proba(X), sample_weight=sample_weight)
To check predictions over two variables in signal (for dimensions > 2 always use kNN, not bins):
>>> from hep_ml.metrics import KnnBasedCvM
>>> metric = KnnBasedCvM(uniform_features=['mass12', 'mass23'], uniform_label=1)
>>> metric.fit(X, y, sample_weight=sample_weight)
>>> result = metric(y, classifier.predict_proba(X), sample_weight=sample_weight)
To check uniformity of signal predictions at a global signal efficiency of 0.7:
>>> from hep_ml.metrics import KnnBasedSDE
>>> metric = KnnBasedSDE(uniform_features=['mass12', 'mass23'], uniform_label=1, target_rcp=[0.7])
>>> metric.fit(X, y, sample_weight=sample_weight)
>>> result = metric(y, classifier.predict_proba(X), sample_weight=sample_weight)
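One way to read "global signal efficiency of 0.7" is as a cut on the classifier output that keeps 70% of the signal weight. The sketch below shows that reading in pure Python; how hep_ml actually selects thresholds is not shown in this page, so the helper `threshold_for_efficiency` and the scores are assumptions for illustration only.

```python
def threshold_for_efficiency(scores, weights, target_efficiency):
    """Highest cut that still keeps at least target_efficiency of the weight."""
    order = sorted(zip(scores, weights), reverse=True)  # best scores first
    total = sum(weights)
    kept = 0.0
    for score, w in order:
        kept += w
        # Small tolerance so that float rounding does not skip the target.
        if kept >= target_efficiency * total - 1e-12:
            return score  # events with score >= this value are kept
    return order[-1][0]


# Hypothetical classifier outputs for ten unit-weight signal events.
signal_scores = [0.95, 0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]
cut = threshold_for_efficiency(signal_scores, [1.0] * 10, 0.7)
efficiency = sum(s >= cut for s in signal_scores) / len(signal_scores)
```

The uniformity metrics then ask whether the efficiency at this single global cut stays the same in every region of the uniform features.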
Generally, kNN versions are slower but more stable in higher dimensions. Don't forget to scale the features if they are of different nature.
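Scaling matters because nearest neighbours are found by distance, so a feature with a large numeric range would otherwise dominate. A minimal sketch of one common choice (standardization to zero mean and unit variance) follows; the helper `standardize` and the feature values are hypothetical, and other scalings may suit a given analysis better.

```python
import math


def standardize(rows):
    """Rescale every column to zero mean and unit standard deviation."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [
        # A constant column has zero spread; divide by 1.0 to leave it at 0.
        math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
        for col, m in zip(columns, means)
    ]
    return [
        [(v - m) / s for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical features of different nature: a mass-like and an angle-like one.
raw = [[5000.0, 0.1], [5200.0, 0.3], [5400.0, 0.2], [5600.0, 0.4]]
scaled = standardize(raw)
```

After this step both columns contribute on a comparable scale to the distances used for the neighbour search.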
- class hep_ml.metrics.BinBasedCvM(uniform_features, uniform_label, n_bins=10, power=2.0)
Bases: AbstractBinMetric
Nonuniformity metric based on Cramer-von Mises distance between distributions, computed on bins.
- Parameters:
uniform_features (list[str]) – features along which non-uniformity is computed.
uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal)
n_bins (int) – number of bins used along each axis.
power (float) – power used in CvM formula (default is 2.)
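As an illustration of a Cramer-von Mises style comparison on bins, the sketch below compares the empirical distribution of predictions inside each bin with the global one and averages the difference, raised to `power`, over the global sample. It is a simplified, unweighted sketch under stated assumptions, not the library's code; `cvm_in_bins` and the toy data are hypothetical.

```python
from bisect import bisect_right


def cvm_in_bins(bin_index, scores, power=2.0):
    """Population-weighted average distance between bin and global CDFs."""
    all_sorted = sorted(scores)
    n = len(scores)
    groups = {}
    for b, s in zip(bin_index, scores):
        groups.setdefault(b, []).append(s)
    result = 0.0
    for members in groups.values():
        members.sort()
        # Average |F_bin - F_global| ** power over the global sample.
        distance = sum(
            abs(bisect_right(members, s) / len(members)
                - bisect_right(all_sorted, s) / n) ** power
            for s in all_sorted
        ) / n
        result += (len(members) / n) * distance
    return result


# Hypothetical predictions in two bins of the uniform feature.
same = cvm_in_bins([0, 0, 1, 1], [0.2, 0.8, 0.2, 0.8])
separated = cvm_in_bins([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```

Unlike SDE and Theil, this comparison does not depend on a chosen threshold, which matches the absence of a `target_rcp` parameter in the CvM classes.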
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BinBasedCvM
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
- Returns:
self (object) – The updated object.
- class hep_ml.metrics.BinBasedSDE(uniform_features, uniform_label, n_bins=10, target_rcp=None, power=2.0)
Bases: AbstractBinMetric
Standard Deviation of Efficiency, computed using bins.
- Parameters:
uniform_features (list[str]) – features along which non-uniformity is computed.
uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal)
n_bins (int) – number of bins used along each axis.
target_rcp (list[float]) – global right-classified parts. Thresholds are selected so that this fraction of the class is correctly classified. Default values are [0.5, 0.6, 0.7, 0.8, 0.9]
power (float) – power used in SDE formula (default is 2.)
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BinBasedSDE
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
- Returns:
self (object) – The updated object.
- class hep_ml.metrics.BinBasedTheil(uniform_features, uniform_label, n_bins=10, target_rcp=None)
Bases: AbstractBinMetric
Theil index of Efficiency, computed using bins.
- Parameters:
uniform_features (list[str]) – features along which non-uniformity is computed.
uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal)
n_bins (int) – number of bins used along each axis.
target_rcp (list[float]) – global right-classified parts. Thresholds are selected so that this fraction of the class is correctly classified. Default values are [0.5, 0.6, 0.7, 0.8, 0.9]
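For orientation, the standard Theil T index applied to per-bin efficiencies can be sketched as follows. This is a hedged illustration that assumes the textbook definition of the index; the helper `theil_of_efficiency`, the regularization constant and the numbers are assumptions, not taken from the library.

```python
import math


def theil_of_efficiency(bin_efficiencies, bin_weights):
    """Theil T index of per-bin efficiencies, weighted by bin population."""
    total = sum(bin_weights)
    mean_eff = sum(e * w for e, w in zip(bin_efficiencies, bin_weights)) / total
    index = 0.0
    for e, w in zip(bin_efficiencies, bin_weights):
        # Guard against log(0) when a bin passes no events at all.
        ratio = max(e / mean_eff, 1e-20)
        index += (w / total) * ratio * math.log(ratio)
    return index


# Hypothetical efficiencies in three bins with populations 1 : 2 : 1.
flat = theil_of_efficiency([0.7, 0.7, 0.7], [1.0, 2.0, 1.0])
uneven = theil_of_efficiency([0.9, 0.7, 0.5], [1.0, 2.0, 1.0])
```

The index is zero when every bin has the same efficiency and grows as the efficiencies spread apart.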
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BinBasedTheil
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
- Returns:
self (object) – The updated object.
- class hep_ml.metrics.KnnBasedCvM(uniform_features, uniform_label, n_neighbours=50, power=2.0)
Bases: AbstractKnnMetric
Nonuniformity metric based on Cramer-von Mises distance between distributions, computed on nearest neighbours.
- Parameters:
uniform_features (list[str]) – features along which non-uniformity is computed.
uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal)
n_neighbours (int) – number of nearest neighbours used.
power (float) – power used in CvM formula (default is 2.)
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KnnBasedCvM
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
- Returns:
self (object) – The updated object.
- class hep_ml.metrics.KnnBasedSDE(uniform_features, uniform_label, n_neighbours=50, target_rcp=None, power=2.0)
Bases: AbstractKnnMetric
Standard Deviation of Efficiency, computed using k nearest neighbours.
- Parameters:
uniform_features (list[str]) – features along which non-uniformity is computed.
uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal)
n_neighbours (int) – number of nearest neighbours used.
target_rcp (list[float]) – global right-classified parts. Thresholds are selected so that this fraction of the class is correctly classified. Default values are [0.5, 0.6, 0.7, 0.8, 0.9]
power (float) – power used in SDE formula (default is 2.)
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KnnBasedSDE
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
- Returns:
self (object) – The updated object.
- class hep_ml.metrics.KnnBasedTheil(uniform_features, uniform_label, n_neighbours=50, target_rcp=None)
Bases: AbstractKnnMetric
Theil index of Efficiency, computed using k nearest neighbours.
- Parameters:
uniform_features (list[str]) – features along which non-uniformity is computed.
uniform_label – label of the class in which uniformity is measured (0 for background, 1 for signal)
n_neighbours (int) – number of nearest neighbours used.
target_rcp (list[float]) – global right-classified parts. Thresholds are selected so that this fraction of the class is correctly classified. Default values are [0.5, 0.6, 0.7, 0.8, 0.9]
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KnnBasedTheil
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
- Returns:
self (object) – The updated object.