This module contains an implementation of the uBoost algorithm.
The main goal of uBoost is to reduce the correlation between predictions and some variables (e.g. the mass of a particle).
uBoostBDT is a modified version of AdaBoost that aims for uniform efficiency at a specified level (the global efficiency).
uBoostClassifier is a combination of uBoostBDTs trained for different efficiencies.
This implementation is more advanced than the one described in the original paper:
it contains smoothing, trains classifiers in threads, has learning_rate and uniforming_rate parameters,
renormalizes weights automatically and supports the SAMME.R modification, which uses predicted probabilities.
uBoostBDT is an AdaBoostClassifier modified to have flat
efficiency of signal (class=1) along some variables.
Flat efficiency is only guaranteed at the cut
corresponding to global efficiency == target_efficiency.
uBoostBDT can be used alone, without uBoostClassifier.
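The relation between the cut and the global efficiency can be illustrated with a small, self-contained sketch. It is not taken from the library: the helper name cut_for_efficiency is hypothetical, sample weights are ignored, and the cut is placed halfway between two neighbouring scores purely for illustration.

```python
def cut_for_efficiency(signal_scores, target_efficiency):
    """Return a threshold t such that roughly `target_efficiency`
    of the signal scores satisfy score > t (unweighted sketch)."""
    if not 0.0 < target_efficiency < 1.0:
        raise ValueError("target_efficiency must be in (0, 1)")
    ordered = sorted(signal_scores, reverse=True)
    # Number of signal events that should pass the cut.
    n_pass = int(round(target_efficiency * len(ordered)))
    n_pass = min(max(n_pass, 1), len(ordered) - 1)
    # Place the cut halfway between the last passing and the first failing score.
    return 0.5 * (ordered[n_pass - 1] + ordered[n_pass])


scores = [0.05, 0.2, 0.35, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9, 0.95]
cut = cut_for_efficiency(scores, target_efficiency=0.7)
# Fraction of signal events above the cut, i.e. the global efficiency.
efficiency = sum(s > cut for s in scores) / len(scores)
```

In this toy sample seven of the ten signal scores lie above the cut, so the global efficiency equals the requested target_efficiency.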
Parameters:
uniform_features – list of strings, names of variables, along which
flatness is desired
uniform_label – int, label of class on which uniformity is desired
(typically 0 for background, 1 for signal).
target_efficiency – float, the global efficiency at which flatness is
obtained, i.e. at the global BDT cut corresponding to this efficiency
n_neighbors – int, (default=50) the number of neighbours,
which are used to compute local efficiency
subsample – float (default=1.0), part of training dataset used
to build each base estimator.
base_estimator – classifier, optional (default=DecisionTreeClassifier(max_depth=2))
The base estimator from which the boosted ensemble is built.
Support for sample weighting is required, as well as proper
classes_ and n_classes_ attributes.
n_estimators – integer, optional (default=50)
number of estimators used.
learning_rate – float, optional (default=1.)
Learning rate shrinks the contribution of each classifier by
learning_rate. There is a trade-off between learning_rate
and n_estimators.
uniforming_rate – float, optional (default=1.)
how strongly the uniformity of signal is taken into account.
There is a trade-off between uniforming_rate and the speed of
uniforming; a zero value corresponds to plain AdaBoost
train_features – list of strings, names of variables used in
fit/predict. If None, all the variables are used
(including uniform_features)
smoothing – float, (default=0.), used to smooth the computation of local
efficiencies; 0.0 corresponds to usual uBoost
random_state – int, RandomState instance or None (default None)
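A minimal sketch of how n_neighbors, target_efficiency and uniforming_rate could interact is given below. It is an illustration under stated assumptions, not the library's code: the helper names are hypothetical, a single uniform variable is used, neighbours are found by brute force (each event counts as its own neighbour), and the exponential weight factor follows the general idea of uBoost rather than the exact update used here.

```python
import math


def local_efficiencies(uniform_values, scores, cut, n_neighbors):
    """For each event, the fraction of its n_neighbors nearest events
    (in a one-dimensional uniform variable) whose score passes the cut."""
    indices = range(len(uniform_values))
    result = []
    for x in uniform_values:
        # Brute-force nearest neighbours; the event itself is included.
        nearest = sorted(indices, key=lambda j: abs(uniform_values[j] - x))[:n_neighbors]
        result.append(sum(scores[j] > cut for j in nearest) / n_neighbors)
    return result


def uniforming_factors(local_eff, target_efficiency, uniforming_rate):
    """Illustrative (assumed) multiplicative weight factor: events in regions
    whose local efficiency is below the target get a factor above 1."""
    return [math.exp(uniforming_rate * (target_efficiency - e)) for e in local_eff]


# Toy signal sample: high scores at low mass, low scores at high mass.
mass = [1.0, 1.1, 1.2, 1.3, 5.0, 5.1, 5.2, 5.3]
scores = [0.9, 0.8, 0.9, 0.7, 0.2, 0.3, 0.6, 0.1]
cut = 0.5  # 5 of 8 events pass, so the global efficiency is 0.625
eff = local_efficiencies(mass, scores, cut, n_neighbors=4)
factors = uniforming_factors(eff, target_efficiency=0.625, uniforming_rate=1.0)
```

With uniforming_rate=0 every factor in this sketch equals 1, which matches the statement that a zero value corresponds to plain AdaBoost.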
Return the feature importances for train_features.
Returns:
array of shape [n_features], the order is the same as in train_features
fit(X, y, sample_weight=None, neighbours_matrix=None)
Build a boosted classifier from the training set (X, y).
Parameters:
X – array-like of shape [n_samples, n_features]
y – labels, array of shape [n_samples] with 0 and 1.
sample_weight – array-like of shape [n_samples] or None
neighbours_matrix – array-like of shape [n_samples, n_neighbours],
each row contains the indices of the signal neighbours of one event
(rows are needed for background events too).
If None, this matrix is computed.
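The expected layout of neighbours_matrix can be sketched as follows. This is a brute-force illustration, not the routine used by the library: the function name signal_neighbours is hypothetical, distances are Euclidean in the uniform features, and a signal event is allowed to appear among its own neighbours, which is an assumption.

```python
def signal_neighbours(uniform_rows, labels, n_neighbours, signal_label=1):
    """Row i holds the indices of the n_neighbours signal events
    closest to event i in the space of uniform features."""
    signal_idx = [i for i, lab in enumerate(labels) if lab == signal_label]
    if n_neighbours > len(signal_idx):
        raise ValueError("not enough signal events for n_neighbours")
    matrix = []
    for row in uniform_rows:
        # Squared Euclidean distance from the current event to signal event j.
        def sq_dist(j, row=row):
            return sum((a - b) ** 2 for a, b in zip(uniform_rows[j], row))
        matrix.append(sorted(signal_idx, key=sq_dist)[:n_neighbours])
    return matrix


# Six events with one uniform feature; events 2 and 5 are background.
points = [[0.0], [0.1], [0.2], [1.0], [1.1], [0.15]]
labels = [1, 1, 0, 1, 1, 0]
matrix = signal_neighbours(points, labels, n_neighbours=2)
```

Every row, including those of background events, refers only to signal events, so the result has shape [n_samples, n_neighbours].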
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
param uniform_features:
list of strings, names of variables
along which flatness is desired
param uniform_label:
int,
the label of the class for which uniformity is desired
param train_features:
list of strings,
names of variables used in fit/predict.
If None, all the variables are used (including uniform_features).
param n_neighbors:
int, (default=50) the number of neighbours,
which are used to compute local efficiency
param n_estimators:
integer, optional (default=50)
The maximum number of estimators at which boosting is terminated.
In case of perfect fit, the learning procedure is stopped early.
param efficiency_steps:
integer, optional (default=20),
How many uBoostBDTs should be trained
(each with its own target_efficiency)
param base_estimator:
object, optional (default=DecisionTreeClassifier(max_depth=2))
The base estimator from which the boosted ensemble is built.
Support for sample weighting is required,
as well as proper classes_ and n_classes_ attributes.
param subsample:
float (default=1.), part of the training dataset used
to train each base classifier.
param smoothing:
float, default=None, used to smooth the computation of
local efficiencies; 0.0 corresponds to usual uBoost.
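To make efficiency_steps concrete, the sketch below builds one target efficiency per uBoostBDT. The even spacing over the open interval (0, 1) is an assumption made for illustration, and efficiency_grid is a hypothetical helper, not part of the documented API.

```python
def efficiency_grid(efficiency_steps):
    """Evenly spaced target efficiencies strictly between 0 and 1
    (assumed spacing, one value per uBoostBDT)."""
    if efficiency_steps < 1:
        raise ValueError("efficiency_steps must be positive")
    return [i / (efficiency_steps + 1) for i in range(1, efficiency_steps + 1)]


grid = efficiency_grid(4)             # one target_efficiency per classifier
default_grid = efficiency_grid(20)    # the documented default of 20 steps
```

Each value in the grid would then serve as the target_efficiency of one uBoostBDT in the combination.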