Neural networks

hep_ml.nnet is a minimalistic theano-powered implementation of feed-forward neural networks. The neural networks from this library provide the sklearn classifier interface.

Loss functions and trainers of neural networks are defined in this file as well. The main point of this library is black-box stochastic optimization of any given loss function, which makes it possible to define any activation expression (at the cost of pretraining being unavailable).

This file contains examples of neural networks; users are encouraged to write their own specific architectures, which can be much more complex than those commonly used.

If you don't want to dive into details, use hep_ml.nnet.MLPClassifier and hep_ml.nnet.MLPRegressor.

This library is a good choice for experimenting with architectures. Also, hep_ml.nnet allows optimization of parameters in any differentiable decision function.

Being written in theano, these neural networks are able to make use of your GPU.

See also libraries: keras, mxnet, pytorch.

Examples

Training a neural network with two hidden layers using the IRPROP- algorithm:

>>> network = MLPClassifier(layers=[7, 7], loss='log_loss', trainer='irprop-', epochs=1000)
>>> network.fit(X, y)
>>> probability = network.predict_proba(X)

Training an AdaBoost ensemble over a neural network with the adadelta trainer. A trainer-specific parameter (minibatch size) is used:

>>> from sklearn.ensemble import AdaBoostClassifier
>>> base_network = MLPClassifier(layers=[10], trainer='adadelta', trainer_parameters={'batch': 600})
>>> classifier = AdaBoostClassifier(base_estimator=base_network, n_estimators=20)
>>> classifier.fit(X, y)

Using a custom pretransformer and exponential loss:

>>> from sklearn.preprocessing import PolynomialFeatures
>>> network = MLPClassifier(layers=[10], scaler=PolynomialFeatures(), loss='exp_loss')

To create a custom neural network, see the code of SimpleNeuralNetwork, which is a good place to start; a hedged sketch of the pattern is given below.
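
For illustration, here is a hedged sketch of what a custom one-hidden-layer classifier could look like, following the pattern of SimpleNeuralNetwork. The internals used below (the prepare method, the layers_ attribute and the _create_matrix_parameter helper) are assumptions based on a reading of the library and should be checked against the actual source:

>>> import theano.tensor as T
>>> from hep_ml.nnet import AbstractNeuralNetworkClassifier
>>> class MyNeuralNetwork(AbstractNeuralNetworkClassifier):
...     def prepare(self):
...         # layers_ and _create_matrix_parameter are assumed internals, verify against the source
...         n_input, n_hidden, n_output = self.layers_
...         W1 = self._create_matrix_parameter('W1', n_input, n_hidden)
...         W2 = self._create_matrix_parameter('W2', n_hidden, n_output)
...         def activation(input):
...             hidden = T.nnet.sigmoid(T.dot(input, W1))
...             return T.dot(hidden, W2)
...         return activation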

Interface

The interface of the classifier and the regressor is shown below.

class hep_ml.nnet.AbstractNeuralNetworkClassifier(layers=(10,), scaler='standard', loss='log_loss', trainer='irprop-', epochs=100, trainer_parameters=None, random_state=None)[source]

Base class for classification neural networks. Supports only binary classification and supports sample weights, which makes it usable in boosting.

Works as a usual sklearn classifier: can be used in pipelines and ensembles, and can be pickled.

Parameters
  • layers – list of int, e.g. [9, 7] - the number of units in each hidden layer

  • scaler – 'standard', 'minmax', 'iron' or some other Transformer used to transform features. Default is 'standard', which applies StandardScaler. 'iron' is intended for heavy-tailed distributions.

  • loss – loss function used. Options: 'exp_loss', 'log_loss', 'exp_log_loss', 'squared_loss'

  • trainer – string, name of the optimization method used. Options: 'sgd', 'irprop-', 'irprop+', 'adadelta'

  • epochs – number of times each sample takes part in training

  • trainer_parameters (dict) – parameters passed to the trainer function (learning_rate, etc., trainer-specific). See the trainer documentation below for available parameters.

fit(X, y, sample_weight=None)

Prepare the model by optimizing the selected loss function with the selected trainer.

Parameters
  • X – numpy.array of shape [n_samples, n_features]

  • y – numpy.array of shape [n_samples]

  • sample_weight – numpy.array of shape [n_samples], leave None for array of 1’s

Returns

self

predict(X)[source]

Predict the classes for new events (not recommended; use predict_proba instead).

Parameters

X (numpy.array) – of shape [n_samples, n_features]

Returns

numpy.array of shape [n_samples] with labels of predicted classes.

predict_proba(X)[source]

Computes the probability of each event belonging to each class.

Parameters

X (numpy.array) – of shape [n_samples, n_features]

Returns

numpy.array of shape [n_samples, n_classes]

class hep_ml.nnet.AbstractNeuralNetworkRegressor(layers=(10,), scaler='standard', loss='mse_loss', trainer='irprop-', epochs=100, trainer_parameters=None, random_state=None)[source]

Base class for regression neural networks. Supports sample weights.

Works as a usual sklearn regressor: can be used in pipelines and ensembles, and can be pickled.

Parameters
  • layers – list of int, e.g. [9, 7] - the number of units in each hidden layer

  • scaler – 'standard', 'minmax', 'iron' or some other Transformer used to transform features. Default is 'standard', which applies StandardScaler. 'iron' is intended for heavy-tailed distributions.

  • loss – loss function used. Options: 'mse_loss', 'smooth_huber_loss'

  • trainer – string, name of the optimization method used. Options: 'sgd', 'irprop-', 'irprop+', 'adadelta'

  • epochs – number of times each sample takes part in training

  • trainer_parameters (dict) – parameters passed to the trainer function (learning_rate, etc., trainer-specific). See the trainer documentation below for available parameters.

fit(X, y, sample_weight=None)

Prepare the model by optimizing the selected loss function with the selected trainer.

Parameters
  • X – numpy.array of shape [n_samples, n_features]

  • y – numpy.array of shape [n_samples]

  • sample_weight – numpy.array of shape [n_samples], leave None for array of 1’s

Returns

self

predict(X)[source]

Compute predictions for new events.

Parameters

X (numpy.array) – of shape [n_samples, n_features]

Returns

numpy.array of shape [n_samples] with predicted values
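
A minimal usage sketch of the regressor interface (MLPRegressor is the concrete class mentioned at the top of this page; X and y stand for your feature matrix and targets):

>>> from hep_ml.nnet import MLPRegressor
>>> regressor = MLPRegressor(layers=[10, 10], loss='mse_loss', trainer='irprop-', epochs=500)
>>> regressor.fit(X, y)
>>> predictions = regressor.predict(X)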

Custom networks

Below are some examples of custom networks. They are a good starting point for constructing your own architectures.

class hep_ml.nnet.SimpleNeuralNetwork(layers=(10,), scaler='standard', loss='log_loss', trainer='irprop-', epochs=100, trainer_parameters=None, random_state=None)[source]

The simplest NN with one hidden layer (sigmoid activation), for example purposes. Supports only one hidden layer.

See the source code as an example of a custom NN.

Parameters
  • layers – list of int, e.g. [9, 7] - the number of units in each hidden layer

  • scaler – 'standard', 'minmax', 'iron' or some other Transformer used to transform features. Default is 'standard', which applies StandardScaler. 'iron' is intended for heavy-tailed distributions.

  • loss – loss function used. Options: 'exp_loss', 'log_loss', 'exp_log_loss', 'squared_loss'

  • trainer – string, name of the optimization method used. Options: 'sgd', 'irprop-', 'irprop+', 'adadelta'

  • epochs – number of times each sample takes part in training

  • trainer_parameters (dict) – parameters passed to the trainer function (learning_rate, etc., trainer-specific). See the trainer documentation below for available parameters.

class hep_ml.nnet.SoftmaxNeuralNetwork(layers=(10,), scaler='standard', loss='log_loss', trainer='irprop-', epochs=100, trainer_parameters=None, random_state=None)[source]

Neural network with one hidden layer and softmax activation function.

Parameters
  • layers – list of int, e.g. [9, 7] - the number of units in each hidden layer

  • scaler – 'standard', 'minmax', 'iron' or some other Transformer used to transform features. Default is 'standard', which applies StandardScaler. 'iron' is intended for heavy-tailed distributions.

  • loss – loss function used. Options: 'exp_loss', 'log_loss', 'exp_log_loss', 'squared_loss'

  • trainer – string, name of the optimization method used. Options: 'sgd', 'irprop-', 'irprop+', 'adadelta'

  • epochs – number of times each sample takes part in training

  • trainer_parameters (dict) – parameters passed to the trainer function (learning_rate, etc., trainer-specific). See the trainer documentation below for available parameters.

class hep_ml.nnet.RBFNeuralNetwork(layers=(10,), scaler='standard', loss='log_loss', trainer='irprop-', epochs=100, trainer_parameters=None, random_state=None)[source]

Neural network with one hidden layer with normalized RBF activation (Radial Basis Function).

Parameters
  • layers – list of int, e.g. [9, 7] - the number of units in each hidden layer

  • scaler – 'standard', 'minmax', 'iron' or some other Transformer used to transform features. Default is 'standard', which applies StandardScaler. 'iron' is intended for heavy-tailed distributions.

  • loss – loss function used. Options: 'exp_loss', 'log_loss', 'exp_log_loss', 'squared_loss'

  • trainer – string, name of the optimization method used. Options: 'sgd', 'irprop-', 'irprop+', 'adadelta'

  • epochs – number of times each sample takes part in training

  • trainer_parameters (dict) – parameters passed to the trainer function (learning_rate, etc., trainer-specific). See the trainer documentation below for available parameters.

class hep_ml.nnet.PairwiseNeuralNetwork(layers=(10,), scaler='standard', loss='log_loss', trainer='irprop-', epochs=100, trainer_parameters=None, random_state=None)[source]

The result is computed as \(h = sigmoid(Ax)\), \(output = \sum_{ij} B_{ij} h_i (1 - h_j)\). This is a good example of a case where it is easier to define the activation expression directly than to implement it inside a standard framework (a small numeric illustration follows the parameter list below).

Parameters
  • layers – list of int, e.g. [9, 7] - the number of units in each hidden layer

  • scaler – 'standard', 'minmax', 'iron' or some other Transformer used to transform features. Default is 'standard', which applies StandardScaler. 'iron' is intended for heavy-tailed distributions.

  • loss – loss function used. Options: 'exp_loss', 'log_loss', 'exp_log_loss', 'squared_loss'

  • trainer – string, name of the optimization method used. Options: 'sgd', 'irprop-', 'irprop+', 'adadelta'

  • epochs – number of times each sample takes part in training

  • trainer_parameters (dict) – parameters passed to the trainer function (learning_rate, etc., trainer-specific). See the trainer documentation below for available parameters.
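
A small numeric illustration of the activation formula above in plain numpy (not library code; the matrix shapes are arbitrary and chosen only for demonstration):

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> A = rng.normal(size=(3, 5))                    # hypothetical hidden-layer weights
>>> B = rng.normal(size=(3, 3))                    # hypothetical pairwise weights
>>> x = rng.normal(size=5)                         # one event with 5 features
>>> h = 1.0 / (1.0 + np.exp(-A @ x))               # h = sigmoid(A x)
>>> output = np.einsum('ij,i,j->', B, h, 1 - h)    # sum_ij B_ij * h_i * (1 - h_j)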

class hep_ml.nnet.PairwiseSoftplusNeuralNetwork(layers=(10,), scaler='standard', loss='log_loss', trainer='irprop-', epochs=100, trainer_parameters=None, random_state=None)[source]

The result is computed as \(h^1 = softplus(A_1 x)\), \(h^2 = sigmoid(A_2 x)\), \(output = \sum_{ij} B_{ij} h^1_i h^2_j\).

Parameters
  • layers – list of int, e.g. [9, 7] - the number of units in each hidden layer

  • scaler – 'standard', 'minmax', 'iron' or some other Transformer used to transform features. Default is 'standard', which applies StandardScaler. 'iron' is intended for heavy-tailed distributions.

  • loss – loss function used. Options: 'exp_loss', 'log_loss', 'exp_log_loss', 'squared_loss'

  • trainer – string, name of the optimization method used. Options: 'sgd', 'irprop-', 'irprop+', 'adadelta'

  • epochs – number of times each sample takes part in training

  • trainer_parameters (dict) – parameters passed to the trainer function (learning_rate, etc., trainer-specific). See the trainer documentation below for available parameters.

Loss functions

The following loss functions are available for classification:

hep_ml.nnet.log_loss(y, pred, w)[source]

Logistic loss for classification (aka cross-entropy, aka binomial deviance)

hep_ml.nnet.exp_loss(y, pred, w)[source]

Exponential loss for classification (aka AdaLoss function)

hep_ml.nnet.exp_log_loss(y, pred, w)[source]

Classification loss function, combines logistic loss for signal and exponential loss for background

hep_ml.nnet.squared_loss(y, pred, w)[source]

Squared loss for classification, not to be confused with MSE

The following loss functions are available for regression:

hep_ml.nnet.mse_loss(y, pred, w)[source]

Regression loss function, mean squared error.

hep_ml.nnet.smooth_huber_loss(y, pred, w)[source]

Regression loss function, smooth version of Huber loss function.
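
A loss function is selected by passing its name to the classifier or regressor, for example (a hedged sketch using the option names listed above):

>>> classifier = MLPClassifier(layers=[10], loss='exp_log_loss')
>>> regressor = MLPRegressor(layers=[10], loss='smooth_huber_loss')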

Trainers

Trainers are the optimization algorithms used to minimize the target loss function of a neural network. They are implemented as functions with some standard parameters and some optional, trainer-specific ones.

hep_ml.nnet.sgd_trainer(x, y, w, parameters, loss, random_stream, batch=30, learning_rate=0.1, l2_penalty=0.001, momentum=0.9)[source]

Stochastic gradient descent with momentum, trivial but very popular.

Parameters
  • batch (int) – size of the minibatch; at each step the gradient is averaged over a minibatch.

  • learning_rate (float) – size of the step

  • l2_penalty (float) – strength of weight decay (l2 regularization), which prevents overfitting

  • momentum (float) – momentum used to stabilize the learning process.
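
Trainer parameters are passed through trainer_parameters of the network, for example (values are only illustrative):

>>> network = MLPClassifier(layers=[10], trainer='sgd',
...                         trainer_parameters={'batch': 100, 'learning_rate': 0.05, 'momentum': 0.9})
>>> network.fit(X, y)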

hep_ml.nnet.irprop_minus_trainer(x, y, w, parameters, loss, random_stream, positive_step=1.2, negative_step=0.5, max_step=1.0, min_step=1e-06)[source]

IRPROP- is a batch trainer; for details see http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.3428 . This is the default trainer and is very stable for classification.

Parameters
  • positive_step – factor by which the step is increased when the gradient keeps the same sign (the direction is unchanged)

  • negative_step – factor by which the step is multiplied (i.e. decreased, since it is less than 1) when the gradient changes sign

  • min_step – minimal change of a weight during an iteration

  • max_step – maximal change of a weight during an iteration
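
The step factors can likewise be tuned via trainer_parameters (values below are only illustrative):

>>> network = MLPClassifier(trainer='irprop-',
...                         trainer_parameters={'positive_step': 1.1, 'negative_step': 0.45, 'max_step': 1.0})
>>> network.fit(X, y)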

hep_ml.nnet.irprop_plus_trainer(x, y, w, parameters, loss, random_stream, positive_step=1.2, negative_step=0.5, max_step=1.0, min_step=1e-06)[source]

IRPROP+ is a batch trainer; for details see http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.3428

Parameters
  • positive_step – factor by which the step is increased when the gradient keeps the same sign (the direction is unchanged)

  • negative_step – factor by which the step is multiplied (i.e. decreased, since it is less than 1) when the gradient changes sign

  • min_step – minimal change of a weight during an iteration

  • max_step – maximal change of a weight during an iteration

hep_ml.nnet.adadelta_trainer(x, y, w, parameters, loss, random_stream, batch=30, learning_rate=0.1, half_life=1000, epsilon=0.0001)[source]

AdaDelta is a trainer with an adaptive learning rate.

Parameters
  • half_life – momentum-like parameter. The accumulated gradient statistics decay by a factor of 2 after this many events. For small datasets it is recommended to set half_life to the number of samples in the dataset.

  • learning_rate – size of the step

  • batch – size of the minibatch

  • epsilon – regularization term
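
For example, following the half_life recommendation above on a small dataset (a hedged sketch; X and y are your training arrays):

>>> network = MLPClassifier(layers=[10], trainer='adadelta',
...                         trainer_parameters={'batch': 100, 'half_life': len(X)})
>>> network.fit(X, y)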