Fast predictions
hep_ml.speedup is a module for obtaining machine-learning formulas that can be applied very fast (at a speed comparable to simple selections) while keeping high classification quality.
In many applications (e.g. triggers in HEP) it is essential to have a really fast formula. This module contains tools to prepare formulas that can be applied at a speed comparable to cuts.
Example
Let’s show how one can use a really heavy classifier and still have fast predictions:
>>> from sklearn.ensemble import RandomForestClassifier
>>> from hep_ml.speedup import LookupClassifier
>>> base_classifier = RandomForestClassifier(n_estimators=1000, max_depth=25)
>>> classifier = LookupClassifier(base_estimator=base_classifier, keep_trained_estimator=False)
>>> classifier.fit(X, y, sample_weight=sample_weight)
Although training takes a long time, all predictions are precomputed and saved to a lookup table, so you can predict millions of events per second on a single CPU:
>>> classifier.predict_proba(testX)
- class hep_ml.speedup.LookupClassifier(base_estimator, n_bins=16, max_cells=500000000, keep_trained_estimator=True)
Bases: BaseEstimator, ClassifierMixin
LookupClassifier splits each feature into bins and trains base_estimator on the binned data. The predictions of base_estimator are then computed for all possible combinations of bins and stored together in a lookup table, so predicting the class of a new observation reduces to finding its bins and reading the stored result.
- Parameters:
base_estimator – classifier used to build predictions
n_bins (int | dict) –
int: how many bins to use for each axis
dict: feature_name -> int, specifies how many bins to use for each axis
dict: feature_name -> list of floats, sets the bin edges manually
By default, the (weighted) quantiles are used to compute bin edges.
max_cells (int) – raise an error if the lookup table would have more items than this.
keep_trained_estimator (bool) – if True, the trained estimator is kept.
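Since predictions are stored for every combination of bins, the table size is the product of the per-feature bin counts, and this is the quantity max_cells guards against. A rough check, assuming exactly one cell per combination of bins (the helper `lookup_table_cells` is hypothetical and not part of hep_ml):

```python
from math import prod


def lookup_table_cells(bins_per_feature):
    """Number of cells in a lookup table with the given bin count per feature."""
    return prod(bins_per_feature)


MAX_CELLS = 500_000_000  # default max_cells from the class signature

# 7 features with the default 16 bins each: 16**7 = 268_435_456 cells.
cells_7 = lookup_table_cells([16] * 7)
# 8 features with 16 bins each: 16**8 = 4_294_967_296 cells.
cells_8 = lookup_table_cells([16] * 8)

print(cells_7 <= MAX_CELLS)  # True: fits under the default limit
print(cells_8 <= MAX_CELLS)  # False: would exceed the default limit
```

In practice this means either a small number of features or a per-feature n_bins dict that gives fewer bins to less important features.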
See also: this idea is used inside LHCb triggers, see V. Gligorov, M. Williams, ‘Bonsai BDT’
The resulting formula is very simple and can be rewritten in another language or environment (C++, CUDA, etc.).
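The lookup idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not hep_ml's implementation: the bin edges, the `slow_model` stand-in for a trained classifier, and all helper names are hypothetical.

```python
from bisect import bisect_right
from itertools import product

# Hypothetical inner bin edges for two features (3 bins and 2 bins).
EDGES = [[0.0, 1.0], [10.0]]


def to_bins(event):
    """Map raw feature values to a tuple of bin indices."""
    return tuple(bisect_right(edges, value) for edges, value in zip(EDGES, event))


def slow_model(bins):
    """Stand-in for an expensive classifier that was trained on bin indices."""
    return (bins[0] + 2 * bins[1]) / 4.0  # made-up signal score in [0, 1]


# Precompute the score once for every combination of bins.
N_BINS = [len(edges) + 1 for edges in EDGES]
LOOKUP = {cell: slow_model(cell) for cell in product(*(range(n) for n in N_BINS))}


def fast_score(event):
    """Prediction is one binning step plus one table lookup."""
    return LOOKUP[to_bins(event)]


print(fast_score([0.5, 12.0]))  # 0.75
```

After the table is built, the expensive model is no longer needed at prediction time, which is why such a formula is easy to port to other environments.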
- convert_bins_to_lookup_index(bins_indices)
- Parameters:
bins_indices – numpy.array of shape [n_samples, n_columns], filled with indices of bins.
- Returns:
numpy.array of shape [n_samples] with corresponding index in lookup table
- convert_lookup_index_to_bins(lookup_indices)
- Parameters:
lookup_indices – array of shape [n_samples] with positions in the lookup table
- Returns:
array of shape [n_samples, n_features] with indices of bins.
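The two conversions are inverses of each other: one flattens a tuple of per-feature bin indices into a single table position, the other recovers the bins from that position. A minimal sketch for a single event, assuming a row-major (mixed-radix) ordering; the ordering actually used by hep_ml is not stated here and may differ, and both function names are hypothetical:

```python
def bins_to_lookup_index(bin_indices, n_bins_per_feature):
    """Flatten per-feature bin indices into one position (row-major order)."""
    index = 0
    for b, n in zip(bin_indices, n_bins_per_feature):
        if not 0 <= b < n:
            raise ValueError("bin index out of range")
        index = index * n + b
    return index


def lookup_index_to_bins(index, n_bins_per_feature):
    """Inverse of bins_to_lookup_index: recover the per-feature bin indices."""
    bins = []
    for n in reversed(n_bins_per_feature):
        index, b = divmod(index, n)
        bins.append(b)
    return bins[::-1]


n_bins = [3, 4, 2]                           # 3 * 4 * 2 = 24 cells in total
position = bins_to_lookup_index([2, 1, 1], n_bins)
print(position)                              # 19
print(lookup_index_to_bins(position, n_bins))  # [2, 1, 1]
```

A single flat index keeps the table in one contiguous array, which is convenient when the formula is ported to C++ or CUDA.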
- fit(X, y, sample_weight=None)
Train a classifier and collect predictions for all possible combinations.
- Parameters:
X – pandas.DataFrame or numpy.array with data of shape [n_samples, n_features]
y – array with labels of shape [n_samples]
sample_weight – None or array of shape [n_samples] with weights of events
- Returns:
self
- predict(X)
Predict class for each event
- Parameters:
X – pandas.DataFrame with data
- Returns:
array of shape [n_samples] with predicted class labels.
- predict_proba(X)
Predict probabilities for new observations
- Parameters:
X – pandas.DataFrame with data
- Returns:
probabilities, array of shape [n_samples, n_classes]
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> LookupClassifier
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
- Returns:
self (object) – The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> LookupClassifier
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self (object) – The updated object.