Losses for Gradient Boosting

hep_ml.losses contains different loss functions to use in gradient boosting.

Apart from standard classification losses, hep_ml contains losses for uniform classification (see BinFlatnessLossFunction, KnnFlatnessLossFunction, KnnAdaLossFunction) and for ranking (see RankBoostLossFunction).

Interface

Loss functions inside hep_ml are stateful estimators and require initial fitting, which is done automatically inside gradient boosting.

All loss functions should be derived from AbstractLossFunction and implement its interface.
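
To see how these pieces fit together, here is a hedged sketch of a single boosting iteration driving a loss object (a simplified illustration, not the actual boosting loop in hep_ml; the regression tree here is an ordinary scikit-learn tree):

>>> import numpy
>>> from sklearn.tree import DecisionTreeRegressor
>>> loss.fit(X, y, sample_weight)                    # heavy preprocessing happens once
>>> y_pred = numpy.zeros(len(X))                     # current additive predictions
>>> target, weight = loss.prepare_tree_params(y_pred)
>>> tree = DecisionTreeRegressor(max_depth=3).fit(X, target, sample_weight=weight)
>>> leaves = tree.apply(X)                           # terminal region of each event
>>> values = loss.prepare_new_leaves_values(leaves, tree.tree_.value.ravel(), y_pred)
>>> y_pred = y_pred + 0.1 * values[leaves]           # update predictions with learning rate 0.1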

Examples

Training gradient boosting, optimizing LogLoss and using all features:

>>> from hep_ml.gradientboosting import UGradientBoostingClassifier, LogLossFunction
>>> classifier = UGradientBoostingClassifier(loss=LogLossFunction(), n_estimators=100)
>>> classifier.fit(X, y, sample_weight=sample_weight)

Using composite loss function and subsampling:

>>> loss = CompositeLossFunction()
>>> classifier = UGradientBoostingClassifier(loss=loss, subsample=0.5)

To get uniform predictions in mass for background (note that mass should not be present among the training features):

>>> loss = BinFlatnessLossFunction(uniform_features=['mass'], uniform_label=0, train_features=['pt', 'flight_time'])
>>> classifier = UGradientBoostingClassifier(loss=loss)

To get uniform predictions in both signal and background:

>>> loss = BinFlatnessLossFunction(uniform_features=['mass'], uniform_label=[0, 1], train_features=['pt', 'flight_time'])
>>> classifier = UGradientBoostingClassifier(loss=loss)
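
Note that the uniform features ('mass' here) still have to be present as columns of the data passed to fit, since the loss bins events along them. A hedged sketch with a hypothetical pandas DataFrame (mass, pt and flight_time are assumed numpy arrays):

>>> import pandas
>>> data = pandas.DataFrame({'mass': mass, 'pt': pt, 'flight_time': flight_time})
>>> classifier.fit(data, y, sample_weight=sample_weight)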
class hep_ml.losses.AbstractLossFunction[source]

Bases: sklearn.base.BaseEstimator

This is the base class for loss functions used in hep_ml. The main differences compared to scikit-learn loss functions:

  1. losses are stateful and may require fitting on training data before usage.

  2. thus, when computing the gradient or hessian, one shall provide predictions for all events.

  3. losses are objects that shall be passed as estimators to gradient boosting (see the examples above).

  4. only the two-class case is supported, and different classes may have different roles and meanings.

compute_optimal_step(y_pred)[source]

Compute the optimal global step. This method is typically used to make an optimal step before fitting trees, in order to reduce variance.

Parameters

y_pred – initial predictions, numpy.array of shape [n_samples]

Returns

float

fit(X, y, sample_weight)[source]

This method is optional; it is called before all the others. Heavy preprocessing should be done here.

prepare_new_leaves_values(terminal_regions, leaf_values, y_pred)[source]

A loss function can provide better values for the leaves by overriding this method.

Parameters
  • terminal_regions – indices of terminal regions of each event.

  • leaf_values – numpy.array, current mapping of leaf indices to prediction values.

  • y_pred – predictions before adding new tree.

Returns

numpy.array with new prediction values for all leaves.

prepare_tree_params(y_pred)[source]

Prepares parameters for a regression tree that minimizes MSE.

Parameters

y_pred – predictions for all events passed to the fit method, in the same order as in fit.

Returns

tuple (tree_target, tree_weight) with target and weight to be used in decision tree

class hep_ml.losses.AdaLossFunction(regularization=5.0)[source]

Bases: hep_ml.losses.HessianLossFunction

AdaLossFunction is the same as the exponential loss function (aka exploss).

Parameters

regularization – float, penalty for leaves with few events; corresponds roughly to the number of events of both classes added to each leaf.
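
For reference, a minimal numpy sketch of the (unregularized) exponential loss and its negative gradient for labels y in {0, 1}; this mirrors the textbook definition rather than the exact implementation:

>>> import numpy
>>> y_signed = 2 * y - 1                                   # map {0, 1} labels to {-1, +1}
>>> exploss = numpy.sum(sample_weight * numpy.exp(-y_signed * y_pred))
>>> neg_gradient = sample_weight * y_signed * numpy.exp(-y_signed * y_pred)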

fit(X, y, sample_weight)[source]

This method is optional; it is called before all the others. Heavy preprocessing should be done here.

hessian(y_pred)[source]

Returns the diagonal of the hessian matrix.

Parameters

y_pred – numpy.array of shape [n_samples], predictions for events passed in the same order as in fit.

Returns

numpy.array of shape [n_samples] with second derivatives with respect to each prediction.

negative_gradient(y_pred)[source]
prepare_tree_params(y_pred)[source]

Prepares parameters for a regression tree that minimizes MSE.

Parameters

y_pred – predictions for all events passed to the fit method, in the same order as in fit.

Returns

tuple (tree_target, tree_weight) with target and weight to be used in decision tree

class hep_ml.losses.BinFlatnessLossFunction(uniform_features, uniform_label, n_bins=10, power=2.0, fl_coefficient=3.0, allow_wrong_signs=True)[source]

Bases: hep_ml.losses.AbstractFlatnessLossFunction

This loss function contains separate penalties for non-flatness and for poor prediction quality. See [FL] for details.

\(\text{loss} = \text{ExpLoss} + c \times \text{FlatnessLoss}\)

FlatnessLoss is computed using binning of the uniform variables.

Parameters
  • uniform_features (list[str]) – names of features, along which we want to obtain uniformity of predictions

  • uniform_label (int|list[int]) – the label(s) of classes for which uniformity is desired

  • n_bins (int) – number of bins along each variable

  • power (float) – the loss contains the difference \(|F - F_\text{bin}|^p\), where p is power

  • fl_coefficient (float) – multiplier for flatness_loss. Controls the tradeoff of quality vs uniformity.

  • allow_wrong_signs (bool) – defines whether the gradient may have a different sign from the “sign of the class” (i.e. may have a negative gradient on signal events). If False, such values are clipped to zero.

FL

A. Rogozhnikov et al., New approaches for boosting to uniformity, http://arxiv.org/abs/1410.4140
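
A rough numpy sketch of the binned flatness penalty for a single uniform feature: in each bin, the local distribution of predictions is compared to the global one (an illustration of the idea from [FL], not the exact implementation; mass and predictions are assumed numpy arrays):

>>> import numpy
>>> edges = numpy.percentile(mass, numpy.linspace(0, 100, 11)[1:-1])
>>> bins = numpy.digitize(mass, edges)                     # 10 bins of roughly equal population
>>> flatness = 0.
>>> for b in range(10):
...     in_bin = predictions[bins == b]
...     if len(in_bin) == 0:
...         continue
...     # squared difference between in-bin CDF and global CDF of predictions
...     diffs = [numpy.mean(in_bin <= s) - numpy.mean(predictions <= s) for s in in_bin]
...     flatness += len(in_bin) / len(predictions) * numpy.mean(numpy.array(diffs) ** 2)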

class hep_ml.losses.CompositeLossFunction(regularization=5.0)[source]

Bases: hep_ml.losses.HessianLossFunction

Composite loss function is defined as exploss for background events and logloss for signal events, with proper constants.

This kind of loss function is very useful for optimizing AMS, or in situations where a very clean signal is expected.

Parameters

regularization – float, penalty for leaves with few events; corresponds roughly to the number of events of both classes added to each leaf.
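
Schematically (up to the "proper constants" mentioned above, which are not reproduced here), the two pieces are combined as follows for labels y in {0, 1}:

>>> import numpy
>>> is_signal = (y == 1)
>>> signal_part = numpy.sum(sample_weight[is_signal] * numpy.log1p(numpy.exp(-y_pred[is_signal])))
>>> bck_part = numpy.sum(sample_weight[~is_signal] * numpy.exp(y_pred[~is_signal]))
>>> composite_loss = signal_part + bck_part                # logloss for signal + exploss for background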

fit(X, y, sample_weight)[source]

This method is optional; it is called before all the others. Heavy preprocessing should be done here.

hessian(y_pred)[source]

Returns the diagonal of the hessian matrix.

Parameters

y_pred – numpy.array of shape [n_samples], predictions for events passed in the same order as in fit.

Returns

numpy.array of shape [n_samples] with second derivatives with respect to each prediction.

negative_gradient(y_pred)[source]
class hep_ml.losses.KnnAdaLossFunction(uniform_features, uniform_label, knn=10, row_norm=1.0)[source]

Bases: hep_ml.losses.AbstractMatrixLossFunction

Modification of AdaLoss to achieve uniformity of predictions

\(\text{loss} = \sum_i w_i \exp\left(- \sum_j a_{ij} y_j \text{score}_j\right)\)

The matrix A is square; each row corresponds to a single event in the training dataset, and in each row we put ones for the closest neighbours if the event belongs to a uniform class. See [BU] for details.

Parameters
  • uniform_features (list[str]) – the features, along which uniformity is desired

  • uniform_label (int|list[int]) – the label (labels) of ‘uniform classes’

  • knn (int) – the number of nonzero elements in a row corresponding to an event from a ‘uniform class’

BU

A. Rogozhnikov et al., New approaches for boosting to uniformity, http://arxiv.org/abs/1410.4140
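
A hedged sketch of how such a matrix can be built with scikit-learn's NearestNeighbors (an illustration of the idea rather than the exact compute_parameters routine; X is assumed to be a pandas DataFrame):

>>> import numpy
>>> from scipy import sparse
>>> from sklearn.neighbors import NearestNeighbors
>>> knn = 10
>>> is_uniform = numpy.isin(y, [0])                        # events of the 'uniform' class(es)
>>> uniform_data = X.loc[is_uniform, uniform_features]
>>> _, nearest = NearestNeighbors(n_neighbors=knn).fit(uniform_data).kneighbors(uniform_data)
>>> uniform_indices = numpy.where(is_uniform)[0]
>>> A = sparse.lil_matrix((len(X), len(X)))
>>> for row, neighbours in zip(uniform_indices, uniform_indices[nearest]):
...     A[row, neighbours] = 1.
>>> for row in numpy.where(~is_uniform)[0]:                # other events: a single one on the diagonal
...     A[row, row] = 1.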

compute_parameters(trainX, trainY, trainW)[source]

This method should be overridden in descendants and should return A, w (a matrix and a vector).

class hep_ml.losses.KnnFlatnessLossFunction(uniform_features, uniform_label, n_neighbours=100, power=2.0, fl_coefficient=3.0, max_groups=5000, allow_wrong_signs=True, random_state=42)[source]

Bases: hep_ml.losses.AbstractFlatnessLossFunction

This loss function contains separate penalties for non-flatness and for poor prediction quality. See [FL] for details.

\(\text{loss} = \text{ExpLoss} + c \times \text{FlatnessLoss}\)

FlatnessLoss is computed using nearest neighbours in the space of uniform features.

Parameters
  • uniform_features (list[str]) – names of features, along which we want to obtain uniformity of predictions

  • uniform_label (int|list[int]) – the label(s) of classes for which uniformity is desired

  • n_neighbours (int) – number of neighbors used in flatness loss

  • power (float) – the loss contains the difference \(|F - F_\text{bin}|^p\), where p is power

  • fl_coefficient (float) – multiplier for flatness_loss. Controls the tradeoff of quality vs uniformity.

  • allow_wrong_signs (bool) – defines whether the gradient may have a different sign from the “sign of the class” (i.e. may have a negative gradient on signal events). If False, such values are clipped to zero.

  • max_groups (int) – to limit memory consumption when the training sample is large, we randomly pick this number of points together with their members.

FL

A. Rogozhnikov et al., New approaches for boosting to uniformity, http://arxiv.org/abs/1410.4140

class hep_ml.losses.LogLossFunction(regularization=5.0)[source]

Bases: hep_ml.losses.HessianLossFunction

Logistic loss function (logloss), aka binomial deviance, aka cross-entropy, aka log-likelihood loss.

Parameters

regularization – float, penalty for leaves with few events; corresponds roughly to the number of events of both classes added to each leaf.
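
For reference, a minimal numpy sketch of the (unregularized) logistic loss, its negative gradient and diagonal hessian for labels y in {0, 1}:

>>> import numpy
>>> from scipy.special import expit                        # logistic sigmoid
>>> proba = expit(y_pred)                                  # P(y = 1) given current predictions
>>> logloss = -numpy.sum(sample_weight * (y * numpy.log(proba) + (1 - y) * numpy.log(1 - proba)))
>>> neg_gradient = sample_weight * (y - proba)
>>> hess = sample_weight * proba * (1 - proba)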

fit(X, y, sample_weight)[source]

This method is optional; it is called before all the others. Heavy preprocessing should be done here.

hessian(y_pred)[source]

Returns the diagonal of the hessian matrix.

Parameters

y_pred – numpy.array of shape [n_samples], predictions for events passed in the same order as in fit.

Returns

numpy.array of shape [n_samples] with second derivatives with respect to each prediction.

negative_gradient(y_pred)[source]
prepare_tree_params(y_pred)[source]

Prepares parameters for a regression tree that minimizes MSE.

Parameters

y_pred – predictions for all events passed to the fit method, in the same order as in fit.

Returns

tuple (tree_target, tree_weight) with target and weight to be used in decision tree

class hep_ml.losses.MAELossFunction[source]

Bases: hep_ml.losses.AbstractLossFunction

Mean absolute error loss function, used for regression. \(\text{loss} = \sum_i |y_i - \hat{y}_i|\)
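
As a reminder, the constant shift that minimizes the absolute error is the median of the residuals, while for the squared error it is the mean; a quick numpy illustration (unweighted for brevity):

>>> import numpy
>>> residuals = y - y_pred
>>> best_mae_step = numpy.median(residuals)                # minimizes sum(|residual - step|)
>>> best_mse_step = numpy.mean(residuals)                  # minimizes sum((residual - step) ** 2)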

compute_optimal_step(y_pred)[source]

Compute the optimal global step. This method is typically used to make an optimal step before fitting trees, in order to reduce variance.

Parameters

y_pred – initial predictions, numpy.array of shape [n_samples]

Returns

float

fit(X, y, sample_weight)[source]

This method is optional; it is called before all the others. Heavy preprocessing should be done here.

negative_gradient(y_pred)[source]
prepare_new_leaves_values(terminal_regions, leaf_values, y_pred)[source]

A loss function can provide better values for the leaves by overriding this method.

Parameters
  • terminal_regions – indices of terminal regions of each event.

  • leaf_values – numpy.array, current mapping of leaf indices to prediction values.

  • y_pred – predictions before adding new tree.

Returns

numpy.array with new prediction values for all leaves.

prepare_tree_params(y_pred)[source]

Prepares parameters for a regression tree that minimizes MSE.

Parameters

y_pred – predictions for all events passed to the fit method, in the same order as in fit.

Returns

tuple (tree_target, tree_weight) with target and weight to be used in decision tree

class hep_ml.losses.MSELossFunction(regularization=5.0)[source]

Bases: hep_ml.losses.HessianLossFunction

Mean squared error loss function, used for regression. \(\text{loss} = \sum_i (y_i - \hat{y}_i)^2\)

Parameters

regularization – float, penalty for leaves with few events; corresponds roughly to the number of events of both classes added to each leaf.

compute_optimal_step(y_pred)[source]

The optimal step is computed using the Newton-Raphson algorithm (10 iterations).

Parameters

y_pred – predictions (usually zeros)

Returns

float
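
A hedged sketch of such a Newton-Raphson iteration, assuming a fitted loss; for the quadratic MSE loss a single iteration already gives the exact answer:

>>> import numpy
>>> step = 0.
>>> for _ in range(10):
...     shifted = y_pred + step
...     # Newton step: summed negative gradient divided by summed hessian
...     step += numpy.sum(loss.negative_gradient(shifted)) / numpy.sum(loss.hessian(shifted))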

fit(X, y, sample_weight)[source]

This method is optional; it is called before all the others. Heavy preprocessing should be done here.

hessian(y_pred)[source]

Returns the diagonal of the hessian matrix.

Parameters

y_pred – numpy.array of shape [n_samples], predictions for events passed in the same order as in fit.

Returns

numpy.array of shape [n_samples] with second derivatives with respect to each prediction.

negative_gradient(y_pred)[source]
prepare_tree_params(y_pred)[source]

Prepares parameters for a regression tree that minimizes MSE.

Parameters

y_pred – predictions for all events passed to the fit method, in the same order as in fit.

Returns

tuple (tree_target, tree_weight) with target and weight to be used in decision tree

class hep_ml.losses.RankBoostLossFunction(request_column, penalty_power=1.0, update_iterations=1)[source]

Bases: hep_ml.losses.HessianLossFunction

RankBoostLossFunction is the target of optimization in the RankBoost [RB] algorithm, which was developed for ranking and introduces penalties for a wrong order of predictions.

However, this implementation goes further: optimal leaf values are selected with an iterative procedure. It also uses a matrix decomposition of the loss function, which is very effective when the labels come from a small set (typically 0, 1, 2, 3, 4).

\(\text{loss} = \sum_{ij} w_{ij} \exp(\text{pred}_i - \text{pred}_j)\),

\(w_{ij} = (\alpha + \beta \, [\text{query}_i = \text{query}_j]) \, R_{\text{label}_i, \text{label}_j}\), where \(R_{ij} = 0\) if \(i \leq j\), else \(R_{ij} = (i - j)^{p}\)

Parameters
  • request_column (str) – name of the column with search query ids. Higher attention is paid to samples with the same query.

  • penalty_power (float) – describes dependence of penalty on the difference between target labels.

  • update_iterations (int) – number of minimization steps to provide optimal values in leaves.

RB
Y. Freund et al., An Efficient Boosting Algorithm for Combining Preferences
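
A hedged usage sketch (the 'query_id' column and relevance labels are hypothetical names for illustration):

>>> from hep_ml.gradientboosting import UGradientBoostingRegressor
>>> from hep_ml.losses import RankBoostLossFunction
>>> loss = RankBoostLossFunction(request_column='query_id')
>>> ranker = UGradientBoostingRegressor(loss=loss, n_estimators=100)
>>> ranker.fit(data, relevance)                            # data contains the 'query_id' column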

fit(X, y, sample_weight)[source]

This method is optional; it is called before all the others. Heavy preprocessing should be done here.

hessian(y_pred)[source]

Returns the diagonal of the hessian matrix.

Parameters

y_pred – numpy.array of shape [n_samples], predictions for events passed in the same order as in fit.

Returns

numpy.array of shape [n_samples] with second derivatives with respect to each prediction.

negative_gradient(y_pred)[source]
prepare_new_leaves_values(terminal_regions, leaf_values, y_pred)[source]

The leaf values come from optimizing a second-order approximation of the loss function.

class hep_ml.losses.ReweightLossFunction(regularization=5.0)[source]

Bases: hep_ml.losses.AbstractLossFunction

Loss function used to reweight distributions. It is used inside hep_ml.reweight.GBReweighter. See [Rew] for details.

Conventions: \(y=0\) - target distribution, \(y=1\) - original distribution.

After training, the weights look like:

  • \(w = w_0\) for the target distribution

  • \(w = w_0 \exp(\text{pred})\) for events from the original distribution (so predictions for the target distribution are ignored)

Parameters

regularization (float) – roughly, the number of events added to each leaf to prevent overfitting.

Rew

http://arogozhnikov.github.io/2015/10/09/gradient-boosted-reweighter.html
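
This loss is normally not instantiated directly; a typical hedged sketch goes through GBReweighter, which uses it internally (original_data and target_data are assumed pandas DataFrames with identical columns):

>>> from hep_ml.reweight import GBReweighter
>>> reweighter = GBReweighter(n_estimators=50, learning_rate=0.1, max_depth=3)
>>> reweighter.fit(original=original_data, target=target_data)
>>> new_weights = reweighter.predict_weights(original_data)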

fit(X, y, sample_weight)[source]

This method is optional; it is called before all the others. Heavy preprocessing should be done here.

negative_gradient(y_pred)[source]
prepare_new_leaves_values(terminal_regions, leaf_values, y_pred)[source]

A loss function can provide better values for the leaves by overriding this method.

Parameters
  • terminal_regions – indices of terminal regions of each event.

  • leaf_values – numpy.array, current mapping of leaf indices to prediction values.

  • y_pred – predictions before adding new tree.

Returns

numpy.array with new prediction values for all leaves.

prepare_tree_params(y_pred)[source]

Prepares parameters for a regression tree that minimizes MSE.

Parameters

y_pred – predictions for all events passed to the fit method, in the same order as in fit.

Returns

tuple (tree_target, tree_weight) with target and weight to be used in decision tree