Losses for Gradient Boosting¶
hep_ml.losses contains different loss functions to use in gradient boosting.
Apart from standard classification losses, hep_ml contains losses for uniform classification
(see BinFlatnessLossFunction, KnnFlatnessLossFunction, KnnAdaLossFunction)
and for ranking (see RankBoostLossFunction).
Interface¶
Loss functions inside hep_ml are stateful estimators and require initial fitting, which is done automatically inside gradient boosting.
All loss functions should be derived from AbstractLossFunction and implement its interface.
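For orientation, a minimal custom loss might look like the sketch below (HalfMSELoss is a hypothetical example and not part of the library; only the method names come from the interface documented on this page):
>>> import numpy
>>> from hep_ml.losses import AbstractLossFunction
>>> class HalfMSELoss(AbstractLossFunction):
...     """Illustrative loss: half of the weighted squared error."""
...     def fit(self, X, y, sample_weight):
...         # remember labels and weights; heavy preprocessing would go here
...         self.y = numpy.array(y, dtype=float)
...         self.sample_weight = numpy.array(sample_weight, dtype=float)
...         return self
...     def prepare_tree_params(self, y_pred):
...         # the residual (y - y_pred) is used as the tree target,
...         # event weights are passed to the tree separately
...         return self.y - y_pred, self.sample_weight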
Examples¶
Training gradient boosting, optimizing LogLoss and using all features:
>>> from hep_ml.gradientboosting import UGradientBoostingClassifier, LogLossFunction
>>> classifier = UGradientBoostingClassifier(loss=LogLossFunction(), n_estimators=100)
>>> classifier.fit(X, y, sample_weight=sample_weight)
Using composite loss function and subsampling:
>>> loss = CompositeLossFunction()
>>> classifier = UGradientBoostingClassifier(loss=loss, subsample=0.5)
To get uniform predictions in mass for the background (note that mass should not be present among the training features):
>>> loss = BinFlatnessLossFunction(uniform_features=['mass'], uniform_label=0, train_features=['pt', 'flight_time'])
>>> classifier = UGradientBoostingClassifier(loss=loss)
To get uniform predictions in both signal and background:
>>> loss = BinFlatnessLossFunction(uniform_features=['mass'], uniform_label=[0, 1], train_features=['pt', 'flight_time'])
>>> classifier = UGradientBoostingClassifier(loss=loss)
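After fitting, the classifier exposes the usual scikit-learn prediction interface, so (assuming the classifier above has been fitted on a suitable pandas DataFrame) probabilities can be obtained with:
>>> proba = classifier.predict_proba(X)
>>> signal_proba = proba[:, 1]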
- class hep_ml.losses.AbstractLossFunction[source]¶
Bases:
sklearn.base.BaseEstimator
This is the base class for loss functions used in hep_ml. Main differences compared to scikit-learn loss functions:
losses are stateful and may require fitting on training data before usage;
hence, when computing the gradient or hessian, one should provide predictions for all events;
losses are objects that are passed as estimators to gradient boosting (see examples);
only the two-class case is supported, and different classes may have different roles and meanings.
- compute_optimal_step(y_pred)[source]¶
Compute the optimal global step. This method is typically used to make an optimal step before fitting trees, in order to reduce variance.
- Parameters
y_pred – initial predictions, numpy.array of shape [n_samples]
- Returns
float
- fit(X, y, sample_weight)[source]¶
This method is optional; it is called before all the others. Heavy preprocessing should be done here.
- prepare_new_leaves_values(terminal_regions, leaf_values, y_pred)[source]¶
A loss function can provide better values for leaves by overriding this method.
- Parameters
terminal_regions – indices of terminal regions of each event.
leaf_values – numpy.array, current mapping of leaf indices to prediction values.
y_pred – predictions before adding new tree.
- Returns
numpy.array with new prediction values for all leaves.
- prepare_tree_params(y_pred)[source]¶
Prepares parameters for a regression tree that minimizes MSE.
- Parameters
y_pred – predictions for all events passed to the fit method, in the same order.
- Returns
tuple (tree_target, tree_weight) with target and weight to be used in decision tree
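To illustrate how these parameters are consumed, here is a hedged sketch of one boosting iteration (the zero initial predictions, the tree settings, and X, y from the examples above are assumptions, not the library's internals):
>>> import numpy
>>> from sklearn.tree import DecisionTreeRegressor
>>> from hep_ml.losses import LogLossFunction
>>> loss = LogLossFunction()
>>> loss.fit(X, y, sample_weight=numpy.ones(len(y)))
>>> y_pred = numpy.zeros(len(y))  # predictions before the first tree
>>> tree_target, tree_weight = loss.prepare_tree_params(y_pred)
>>> tree = DecisionTreeRegressor(max_depth=3)
>>> tree.fit(X, tree_target, sample_weight=tree_weight)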
- class hep_ml.losses.AdaLossFunction(regularization=5.0)[source]¶
Bases:
hep_ml.losses.HessianLossFunction
AdaLossFunction is the same as the exponential loss function (aka exploss).
- Parameters
regularization – float, penalty for leaves with few events, corresponds roughly to the number of added events of both classes to each leaf.
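For reference, with class labels encoded as \(y_i = \pm 1\), the exponential loss has the usual AdaBoost form (given here for orientation; the implementation may differ in normalization and regularization details):
\(\text{loss} = \sum_i w_i \exp(-y_i \cdot \text{pred}_i)\)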
- fit(X, y, sample_weight)[source]¶
This method is optional; it is called before all the others. Heavy preprocessing should be done here.
- hessian(y_pred)[source]¶
Returns the diagonal of the hessian matrix.
- Parameters
y_pred – numpy.array of shape [n_samples] with events passed in the same order as in fit.
- Returns
numpy.array of shape [n_samples] with second derivatives with respect to each prediction.
- prepare_tree_params(y_pred)[source]¶
Prepares parameters for a regression tree that minimizes MSE.
- Parameters
y_pred – predictions for all events passed to the fit method, in the same order.
- Returns
tuple (tree_target, tree_weight) with target and weight to be used in decision tree
- class hep_ml.losses.BinFlatnessLossFunction(uniform_features, uniform_label, n_bins=10, power=2.0, fl_coefficient=3.0, allow_wrong_signs=True)[source]¶
Bases:
hep_ml.losses.AbstractFlatnessLossFunction
This loss function contains separately penalty for non-flatness and for bad prediction quality. See [FL] for details.
\(\text{loss} =\text{ExpLoss} + c \times \text{FlatnessLoss}\)
FlatnessLoss is computed using binning of the uniform features.
- Parameters
uniform_features (list[str]) – names of features, along which we want to obtain uniformity of predictions
uniform_label (int|list[int]) – the label(s) of classes for which uniformity is desired
n_bins (int) – number of bins along each variable
power (float) – the loss contains the difference \(|F - F_{\text{bin}}|^p\), where p is the power
fl_coefficient (float) – multiplier for flatness_loss. Controls the tradeoff of quality vs uniformity.
allow_wrong_signs (bool) – defines whether the gradient may have a sign different from the “sign of the class” (i.e. may have a negative gradient on signal). If False, such values will be clipped to zero.
- FL
A. Rogozhnikov et al, New approaches for boosting to uniformity http://arxiv.org/abs/1410.4140
- class hep_ml.losses.CompositeLossFunction(regularization=5.0)[source]¶
Bases:
hep_ml.losses.HessianLossFunction
Composite loss function is defined as exploss for background events and logloss for signal, with proper constants.
This kind of loss function is very useful for optimizing AMS or in situations where a very clean signal is expected.
- Parameters
regularization – float, penalty for leaves with few events, corresponds roughly to the number of added events of both classes to each leaf.
- fit(X, y, sample_weight)[source]¶
This method is optional; it is called before all the others. Heavy preprocessing should be done here.
- class hep_ml.losses.KnnAdaLossFunction(uniform_features, uniform_label, knn=10, row_norm=1.0)[source]¶
Bases:
hep_ml.losses.AbstractMatrixLossFunction
Modification of AdaLoss to achieve uniformity of predictions.
\(\text{loss} = \sum_i w_i \exp(- \sum_j a_{ij} y_j \text{score}_j)\)
The matrix \(a\) is square; each row corresponds to a single event in the training dataset, and if that event belongs to a uniform class, its row contains ones at the positions of its closest neighbours. See [BU] for details.
- Parameters
uniform_features (list[str]) – the features, along which uniformity is desired
uniform_label (int|list[int]) – the label (labels) of ‘uniform classes’
knn (int) – the number of nonzero elements in a row corresponding to an event from a ‘uniform class’
- BU
A. Rogozhnikov et al, New approaches for boosting to uniformity http://arxiv.org/abs/1410.4140
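A hedged usage sketch, mirroring the examples at the top of this page (the feature names are placeholders; X is assumed to be a pandas DataFrame containing the 'mass' column):
>>> loss = KnnAdaLossFunction(uniform_features=['mass'], uniform_label=0, knn=10)
>>> classifier = UGradientBoostingClassifier(loss=loss)
>>> classifier.fit(X, y, sample_weight=sample_weight)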
- class hep_ml.losses.KnnFlatnessLossFunction(uniform_features, uniform_label, n_neighbours=100, power=2.0, fl_coefficient=3.0, max_groups=5000, allow_wrong_signs=True, random_state=42)[source]¶
Bases:
hep_ml.losses.AbstractFlatnessLossFunction
This loss function contains separately penalty for non-flatness and for bad prediction quality. See [FL] for details.
\(\text{loss} = \text{ExpLoss} + c \times \text{FlatnessLoss}\)
FlatnessLoss is computed using the nearest neighbours in the space of uniform features.
- Parameters
uniform_features (list[str]) – names of features, along which we want to obtain uniformity of predictions
uniform_label (int|list[int]) – the label(s) of classes for which uniformity is desired
n_neighbours (int) – number of neighbors used in flatness loss
power (float) – the loss contains the difference \(|F - F_{\text{bin}}|^p\), where p is the power
fl_coefficient (float) – multiplier for flatness_loss. Controls the tradeoff of quality vs uniformity.
allow_wrong_signs (bool) – defines whether the gradient may have a sign different from the “sign of the class” (i.e. may have a negative gradient on signal). If False, such values will be clipped to zero.
max_groups (int) – to limit memory consumption when the training sample is large, we randomly pick this number of points, each together with its members.
- FL
A. Rogozhnikov et al, New approaches for boosting to uniformity http://arxiv.org/abs/1410.4140
- class hep_ml.losses.LogLossFunction(regularization=5.0)[source]¶
Bases:
hep_ml.losses.HessianLossFunction
Logistic loss function (logloss), aka binomial deviance, aka cross-entropy, aka log-likelihood loss.
- Parameters
regularization – float, penalty for leaves with few events, corresponds roughly to the number of added events of both classes to each leaf.
- fit(X, y, sample_weight)[source]¶
This method is optional; it is called before all the others. Heavy preprocessing should be done here.
- hessian(y_pred)[source]¶
Returns the diagonal of the hessian matrix.
- Parameters
y_pred – numpy.array of shape [n_samples] with events passed in the same order as in fit.
- Returns
numpy.array of shape [n_samples] with second derivatives with respect to each prediction.
- prepare_tree_params(y_pred)[source]¶
Prepares parameters for a regression tree that minimizes MSE.
- Parameters
y_pred – predictions for all events passed to the fit method, in the same order.
- Returns
tuple (tree_target, tree_weight) with target and weight to be used in decision tree
- class hep_ml.losses.MAELossFunction[source]¶
Bases:
hep_ml.losses.AbstractLossFunction
Mean absolute error loss function, used for regression. \(\text{loss} = \sum_i |y_i - \hat{y}_i|\)
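For regression, the loss is passed to the gradient boosting regressor analogously to the classification examples above (a sketch; it assumes UGradientBoostingRegressor from hep_ml.gradientboosting and a training sample X, y):
>>> from hep_ml.gradientboosting import UGradientBoostingRegressor
>>> from hep_ml.losses import MAELossFunction
>>> regressor = UGradientBoostingRegressor(loss=MAELossFunction(), n_estimators=100)
>>> regressor.fit(X, y)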
- compute_optimal_step(y_pred)[source]¶
Compute the optimal global step. This method is typically used to make an optimal step before fitting trees, in order to reduce variance.
- Parameters
y_pred – initial predictions, numpy.array of shape [n_samples]
- Returns
float
- fit(X, y, sample_weight)[source]¶
This method is optional; it is called before all the others. Heavy preprocessing should be done here.
- prepare_new_leaves_values(terminal_regions, leaf_values, y_pred)[source]¶
A loss function can provide better values for leaves by overriding this method.
- Parameters
terminal_regions – indices of terminal regions of each event.
leaf_values – numpy.array, current mapping of leaf indices to prediction values.
y_pred – predictions before adding new tree.
- Returns
numpy.array with new prediction values for all leaves.
- prepare_tree_params(y_pred)[source]¶
Prepares parameters for a regression tree that minimizes MSE.
- Parameters
y_pred – predictions for all events passed to the fit method, in the same order.
- Returns
tuple (tree_target, tree_weight) with target and weight to be used in decision tree
- class hep_ml.losses.MSELossFunction(regularization=5.0)[source]¶
Bases:
hep_ml.losses.HessianLossFunction
Mean squared error loss function, used for regression. \(\text{loss} = \sum_i (y_i - \hat{y}_i)^2\)
- Parameters
regularization – float, penalty for leaves with few events, corresponds roughly to the number of added events of both classes to each leaf.
- compute_optimal_step(y_pred)[source]¶
Optimal step is computed using Newton-Raphson algorithm (10 iterations).
- Parameters
y_pred – predictions (usually, zeros)
- Returns
float
- fit(X, y, sample_weight)[source]¶
This method is optional; it is called before all the others. Heavy preprocessing should be done here.
- hessian(y_pred)[source]¶
Returns the diagonal of the hessian matrix.
- Parameters
y_pred – numpy.array of shape [n_samples] with events passed in the same order as in fit.
- Returns
numpy.array of shape [n_samples] with second derivatives with respect to each prediction.
- prepare_tree_params(y_pred)[source]¶
Prepares parameters for a regression tree that minimizes MSE.
- Parameters
y_pred – predictions for all events passed to the fit method, in the same order.
- Returns
tuple (tree_target, tree_weight) with target and weight to be used in decision tree
- class hep_ml.losses.RankBoostLossFunction(request_column, penalty_power=1.0, update_iterations=1)[source]¶
Bases:
hep_ml.losses.HessianLossFunction
RankBoostLossFunction is the target of optimization in the RankBoost [RB] algorithm, which was developed for ranking and introduces penalties for the wrong order of predictions.
This implementation goes further: optimal leaf values are selected by an iterative procedure, and a matrix decomposition of the loss function is used, which is very effective when labels come from a very limited set (usually 0, 1, 2, 3, 4).
\(\text{loss} = \sum_{ij} w_{ij} \exp(\text{pred}_i - \text{pred}_j)\),
\(w_{ij} = (\alpha + \beta \, [\text{query}_i = \text{query}_j]) \, R_{\text{label}_i, \text{label}_j}\), where \(R_{ij} = 0\) if \(i \leq j\), else \(R_{ij} = (i - j)^{p}\)
- Parameters
request_column (str) – name of the column with search query ids. Higher attention is paid to pairs of samples with the same query.
penalty_power (float) – describes the dependence of the penalty on the difference between target labels.
update_iterations (int) – number of minimization steps used to find optimal values in leaves.
- RB
Freund et al. An Efficient Boosting Algorithm for Combining Preferences
- fit(X, y, sample_weight)[source]¶
This method is optional; it is called before all the others. Heavy preprocessing should be done here.
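A hedged usage sketch (the query column name is a placeholder; X is assumed to be a pandas DataFrame that contains this column, y holds small integer relevance labels, and in practice the query id column is usually excluded from the training features; whether the classifier or the regressor wrapper fits your setup better depends on the labels, the regressor is shown here):
>>> from hep_ml.gradientboosting import UGradientBoostingRegressor
>>> loss = RankBoostLossFunction(request_column='query_id')
>>> ranker = UGradientBoostingRegressor(loss=loss, n_estimators=100)
>>> ranker.fit(X, y)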
- class hep_ml.losses.ReweightLossFunction(regularization=5.0)[source]¶
Bases:
hep_ml.losses.AbstractLossFunction
Loss function used to reweight distributions. Works inside hep_ml.reweight.GBReweighter. See [Rew] for details.
Conventions: \(y=0\) – target distribution, \(y=1\) – original distribution.
The resulting weights look like:
\(w = w_0\) for events from the target distribution,
\(w = w_0 \exp(\text{pred})\) for events from the original distribution (so predictions for the target distribution are ignored).
- Parameters
regularization (float) – roughly, the number of events added to each leaf to prevent overfitting.
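This loss is normally not constructed by hand; it is created internally by the reweighter. A hedged sketch of typical reweighter usage (original and target are assumed to be two samples, e.g. pandas DataFrames with the same columns):
>>> from hep_ml.reweight import GBReweighter
>>> reweighter = GBReweighter(n_estimators=50, max_depth=3)
>>> reweighter.fit(original, target)
>>> new_weights = reweighter.predict_weights(original)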
- fit(X, y, sample_weight)[source]¶
This method is optional; it is called before all the others. Heavy preprocessing should be done here.
- prepare_new_leaves_values(terminal_regions, leaf_values, y_pred)[source]¶
A loss function can provide better values for leaves by overriding this method.
- Parameters
terminal_regions – indices of terminal regions of each event.
leaf_values – numpy.array, current mapping of leaf indices to prediction values.
y_pred – predictions before adding new tree.
- Returns
numpy.array with new prediction values for all leaves.
- prepare_tree_params(y_pred)[source]¶
Prepares parameters for a regression tree that minimizes MSE.
- Parameters
y_pred – predictions for all events passed to the fit method, in the same order.
- Returns
tuple (tree_target, tree_weight) with target and weight to be used in decision tree