Neural networks

hep_ml.nnet is a minimalistic theano-powered implementation of feed-forward neural networks. The neural networks from this library provide the sklearn classifier interface.

Loss functions and trainers of neural networks are defined in this file as well. The main point of this library is black-box stochastic optimization of any given loss function, which makes it possible to define any activation expression (at the cost of pretraining being unavailable).

This file contains examples of neural networks; users are encouraged to write their own specific architectures, which can be much more complex than those commonly used.

If you don't want to dive into details, use hep_ml.nnet.MLPClassifier and hep_ml.nnet.MLPRegressor.

This library is a good choice for experimenting with architectures. Also, hep_ml.nnet allows optimization of parameters in any differentiable decision function.

Being written in theano, these neural networks are able to make use of your GPU.

See also libraries: keras, mxnet, pytorch.

Examples

Training a neural network with two hidden layers using the IRPROP- algorithm:

>>> network = MLPClassifier(layers=[7, 7], loss='log_loss', trainer='irprop-', epochs=1000)
>>> network.fit(X, y)
>>> probability = network.predict_proba(X)

Training an AdaBoost ensemble over a neural network with the adadelta trainer. A trainer-specific parameter (minibatch size) is used:

>>> from sklearn.ensemble import AdaBoostClassifier
>>> base_network = MLPClassifier(layers=[10], trainer='adadelta', trainer_parameters={'batch': 600})
>>> classifier = AdaBoostClassifier(base_estimator=base_network, n_estimators=20)
>>> classifier.fit(X, y)

Using a custom pretransformer and exponential loss:

>>> from sklearn.preprocessing import PolynomialFeatures
>>> network = MLPClassifier(layers=[10], scaler=PolynomialFeatures(), loss='exp_loss')

To create a custom neural network, see the code of SimpleNeuralNetwork, which is a good place to start; a hedged sketch of the pattern is given below.
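
For illustration, here is a hedged sketch of what a custom one-hidden-layer classifier could look like, following the pattern of SimpleNeuralNetwork. The internals used below (the prepare method, the layers_ attribute and the _create_matrix_parameter helper) are assumptions based on a reading of the library and should be checked against the actual source:

>>> import theano.tensor as T
>>> from hep_ml.nnet import AbstractNeuralNetworkClassifier
>>> class MyNeuralNetwork(AbstractNeuralNetworkClassifier):
...     def prepare(self):
...         # layers_ and _create_matrix_parameter are assumed internals, verify against the source
...         n_input, n_hidden, n_output = self.layers_
...         W1 = self._create_matrix_parameter('W1', n_input, n_hidden)
...         W2 = self._create_matrix_parameter('W2', n_hidden, n_output)
...         def activation(input):
...             hidden = T.nnet.sigmoid(T.dot(input, W1))
...             return T.dot(hidden, W2)
...         return activation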

Interface

The interface of the classifier and the regressor is shown below.

class hep_ml.nnet.AbstractNeuralNetworkClassifier(layers=(10,), scaler='standard', loss='log_loss', trainer='irprop-', epochs=100, trainer_parameters=None, random_state=None)[source]

Base class for classification neural networks. Supports only binary classification and supports sample weights, which makes it usable in boosting.

Works as a usual sklearn classifier: can be used in pipelines and ensembles, and can be pickled.

Parameters
  • layers – list of int, e.g. [9, 7] - the number of units in each hidden layer

  • scaler – 'standard', 'minmax', 'iron' or some other Transformer used to transform features. Default is 'standard', which applies StandardScaler. 'iron' is intended for heavy-tailed distributions.

  • loss – loss function used. Options: 'exp_loss', 'log_loss', 'exp_log_loss', 'squared_loss'

  • trainer – string, name of the optimization method used. Options: 'sgd', 'irprop-', 'irprop+', 'adadelta'

  • epochs – number of times each sample takes part in training

  • trainer_parameters (dict) – parameters passed to the trainer function (learning_rate, etc., trainer-specific). See the trainer documentation below for available parameters.

fit(X, y, sample_weight=None)

Prepare the model by optimizing the selected loss function with the selected trainer.

Parameters
  • X – numpy.array of shape [n_samples, n_features]

  • y – numpy.array of shape [n_samples]

  • sample_weight – numpy.array of shape [n_samples], leave None for array of 1’s

Returns

self

predict(X)[source]

Predict the classes for new events (not recommended; use predict_proba instead).

Parameters

X (numpy.array) – of shape [n_samples, n_features]

Returns

numpy.array of shape [n_samples] with labels of predicted classes.

predict_proba(X)[source]

Computes the probability of each event belonging to each class.

Parameters

X (numpy.array) – of shape [n_samples, n_features]

Returns

numpy.array of shape [n_samples, n_classes]

class hep_ml.nnet.AbstractNeuralNetworkRegressor(layers=(10,), scaler='standard', loss='mse_loss', trainer='irprop-', epochs=100, trainer_parameters=None, random_state=None)[source]

Base class for regression neural networks. Supports sample weights.

Works as a usual sklearn regressor: can be used in pipelines and ensembles, and can be pickled.

Parameters
  • layers – list of int, e.g. [9, 7] - the number of units in each hidden layer

  • scaler – 'standard', 'minmax', 'iron' or some other Transformer used to transform features. Default is 'standard', which applies StandardScaler. 'iron' is intended for heavy-tailed distributions.

  • loss – loss function used. Options: 'mse_loss', 'smooth_huber_loss'

  • trainer – string, name of the optimization method used. Options: 'sgd', 'irprop-', 'irprop+', 'adadelta'

  • epochs – number of times each sample takes part in training

  • trainer_parameters (dict) – parameters passed to the trainer function (learning_rate, etc., trainer-specific). See the trainer documentation below for available parameters.

fit(X, y, sample_weight=None)

Prepare the model by optimizing the selected loss function with the selected trainer.

Parameters
  • X – numpy.array of shape [n_samples, n_features]

  • y – numpy.array of shape [n_samples]

  • sample_weight – numpy.array of shape [n_samples], leave None for array of 1’s

Returns

self

predict(X)[source]

Compute predictions for new events.

Parameters

X (numpy.array) – of shape [n_samples, n_features]

Returns

numpy.array of shape [n_samples] with predicted values
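
A minimal usage sketch of the regressor interface (MLPRegressor is the concrete class mentioned at the top of this page; X and y stand for your feature matrix and targets):

>>> from hep_ml.nnet import MLPRegressor
>>> regressor = MLPRegressor(layers=[10, 10], loss='mse_loss', trainer='irprop-', epochs=500)
>>> regressor.fit(X, y)
>>> predictions = regressor.predict(X)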

Custom networks

Below are some examples of custom networks. They are a good starting point for constructing your own architectures.

class hep_ml.nnet.SimpleNeuralNetwork(layers=(10,), scaler='standard', loss='log_loss', trainer='irprop-', epochs=100, trainer_parameters=None, random_state=None)[source]

The simplest NN with one hidden layer (sigmoid activation), for example purposes. Supports only one hidden layer.

See the source code as an example of a custom NN.

Parameters
  • layers – list of int, e.g. [9, 7] - the number of units in each hidden layer

  • scaler – 'standard', 'minmax', 'iron' or some other Transformer used to transform features. Default is 'standard', which applies StandardScaler. 'iron' is intended for heavy-tailed distributions.

  • loss – loss function used. Options: 'exp_loss', 'log_loss', 'exp_log_loss', 'squared_loss'

  • trainer – string, name of the optimization method used. Options: 'sgd', 'irprop-', 'irprop+', 'adadelta'

  • epochs – number of times each sample takes part in training

  • trainer_parameters (dict) – parameters passed to the trainer function (learning_rate, etc., trainer-specific). See the trainer documentation below for available parameters.

class hep_ml.nnet.SoftmaxNeuralNetwork(layers=(10,), scaler='standard', loss='log_loss', trainer='irprop-', epochs=100, trainer_parameters=None, random_state=None)[source]

Neural network with one hidden layer and softmax activation function.

Parameters
  • layers – list of int, e.g. [9, 7] - the number of units in each hidden layer

  • scaler – 'standard', 'minmax', 'iron' or some other Transformer used to transform features. Default is 'standard', which applies StandardScaler. 'iron' is intended for heavy-tailed distributions.

  • loss – loss function used. Options: 'exp_loss', 'log_loss', 'exp_log_loss', 'squared_loss'

  • trainer – string, name of the optimization method used. Options: 'sgd', 'irprop-', 'irprop+', 'adadelta'

  • epochs – number of times each sample takes part in training

  • trainer_parameters (dict) – parameters passed to the trainer function (learning_rate, etc., trainer-specific). See the trainer documentation below for available parameters.

class hep_ml.nnet.RBFNeuralNetwork(layers=(10,), scaler='standard', loss='log_loss', trainer='irprop-', epochs=100, trainer_parameters=None, random_state=None)[source]

Neural network with one hidden layer with normalized RBF activation (Radial Basis Function).

Parameters
  • layers – list of int, e.g. [9, 7] - the number of units in each hidden layer

  • scaler – 'standard', 'minmax', 'iron' or some other Transformer used to transform features. Default is 'standard', which applies StandardScaler. 'iron' is intended for heavy-tailed distributions.

  • loss – loss function used. Options: 'exp_loss', 'log_loss', 'exp_log_loss', 'squared_loss'

  • trainer – string, name of the optimization method used. Options: 'sgd', 'irprop-', 'irprop+', 'adadelta'

  • epochs – number of times each sample takes part in training

  • trainer_parameters (dict) – parameters passed to the trainer function (learning_rate, etc., trainer-specific). See the trainer documentation below for available parameters.

class hep_ml.nnet.PairwiseNeuralNetwork(layers=(10,), scaler='standard', loss='log_loss', trainer='irprop-', epochs=100, trainer_parameters=None, random_state=None)[source]

The result is computed as \(h = sigmoid(Ax)\), \(output = \sum_{ij} B_{ij} h_i (1 - h_j)\). This is a good example of a case where it is easier to define the activation expression directly than to implement it inside a standard framework (a small numeric illustration follows the parameter list below).

Parameters
  • layers – list of int, e.g. [9, 7] - the number of units in each hidden layer

  • scaler – 'standard', 'minmax', 'iron' or some other Transformer used to transform features. Default is 'standard', which applies StandardScaler. 'iron' is intended for heavy-tailed distributions.

  • loss – loss function used. Options: 'exp_loss', 'log_loss', 'exp_log_loss', 'squared_loss'

  • trainer – string, name of the optimization method used. Options: 'sgd', 'irprop-', 'irprop+', 'adadelta'

  • epochs – number of times each sample takes part in training

  • trainer_parameters (dict) – parameters passed to the trainer function (learning_rate, etc., trainer-specific). See the trainer documentation below for available parameters.
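
A small numeric illustration of the activation formula above in plain numpy (not library code; the matrix shapes are arbitrary and chosen only for demonstration):

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> A = rng.normal(size=(3, 5))                    # hypothetical hidden-layer weights
>>> B = rng.normal(size=(3, 3))                    # hypothetical pairwise weights
>>> x = rng.normal(size=5)                         # one event with 5 features
>>> h = 1.0 / (1.0 + np.exp(-A @ x))               # h = sigmoid(A x)
>>> output = np.einsum('ij,i,j->', B, h, 1 - h)    # sum_ij B_ij * h_i * (1 - h_j)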

class hep_ml.nnet.PairwiseSoftplusNeuralNetwork(layers=(10,), scaler='standard', loss='log_loss', trainer='irprop-', epochs=100, trainer_parameters=None, random_state=None)[source]

The result is computed as \(h^1 = softplus(A_1 x)\), \(h^2 = sigmoid(A_2 x)\), \(output = \sum_{ij} B_{ij} h^1_i h^2_j\).

Parameters
  • layers – list of int, e.g. [9, 7] - the number of units in each hidden layer

  • scaler – 'standard', 'minmax', 'iron' or some other Transformer used to transform features. Default is 'standard', which applies StandardScaler. 'iron' is intended for heavy-tailed distributions.

  • loss – loss function used. Options: 'exp_loss', 'log_loss', 'exp_log_loss', 'squared_loss'

  • trainer – string, name of the optimization method used. Options: 'sgd', 'irprop-', 'irprop+', 'adadelta'

  • epochs – number of times each sample takes part in training

  • trainer_parameters (dict) – parameters passed to the trainer function (learning_rate, etc., trainer-specific). See the trainer documentation below for available parameters.

Loss functions

The following loss functions are available for classification:

hep_ml.nnet.log_loss(y, pred, w)[source]

Logistic loss for classification (aka cross-entropy, aka binomial deviance)

hep_ml.nnet.exp_loss(y, pred, w)[source]

Exponential loss for classification (aka AdaLoss function)

hep_ml.nnet.exp_log_loss(y, pred, w)[source]

Classification loss function, combines logistic loss for signal and exponential loss for background

hep_ml.nnet.squared_loss(y, pred, w)[source]

Squared loss for classification, not to be confused with MSE

The following loss functions are available for regression:

hep_ml.nnet.mse_loss(y, pred, w)[source]

Regression loss function, mean squared error.

hep_ml.nnet.smooth_huber_loss(y, pred, w)[source]

Regression loss function, smooth version of Huber loss function.
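
A loss function is selected by passing its name to the classifier or regressor, for example (a hedged sketch using the option names listed above):

>>> classifier = MLPClassifier(layers=[10], loss='exp_log_loss')
>>> regressor = MLPRegressor(layers=[10], loss='smooth_huber_loss')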

Trainers

Trainers are the optimization algorithms used to minimize the target loss function of a neural network. They are implemented as functions with some standard parameters and some optional, trainer-specific ones.

hep_ml.nnet.sgd_trainer(x, y, w, parameters, loss, random_stream, batch=30, learning_rate=0.1, l2_penalty=0.001, momentum=0.9)[source]

Stochastic gradient descent with momentum, trivial but very popular.

Parameters
  • batch (int) – size of the minibatch; at each step the gradient is averaged over a minibatch.

  • learning_rate (float) – size of the step

  • l2_penalty (float) – strength of weight decay (l2 regularization), which prevents overfitting

  • momentum (float) – momentum used to stabilize the learning process.
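
Trainer parameters are passed through trainer_parameters of the network, for example (values are only illustrative):

>>> network = MLPClassifier(layers=[10], trainer='sgd',
...                         trainer_parameters={'batch': 100, 'learning_rate': 0.05, 'momentum': 0.9})
>>> network.fit(X, y)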

hep_ml.nnet.irprop_minus_trainer(x, y, w, parameters, loss, random_stream, positive_step=1.2, negative_step=0.5, max_step=1.0, min_step=1e-06)[source]

IRPROP- is a batch trainer; for details see http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.3428 . This is the default trainer and is very stable for classification.

Parameters
  • positive_step – factor by which the step is increased when the gradient keeps the same sign (the direction is unchanged)

  • negative_step – factor by which the step is multiplied (i.e. decreased, since it is less than 1) when the gradient changes sign

  • min_step – minimal change of a weight during an iteration

  • max_step – maximal change of a weight during an iteration
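
The step factors can likewise be tuned via trainer_parameters (values below are only illustrative):

>>> network = MLPClassifier(trainer='irprop-',
...                         trainer_parameters={'positive_step': 1.1, 'negative_step': 0.45, 'max_step': 1.0})
>>> network.fit(X, y)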

hep_ml.nnet.irprop_plus_trainer(x, y, w, parameters, loss, random_stream, positive_step=1.2, negative_step=0.5, max_step=1.0, min_step=1e-06)[source]

IRPROP+ is a batch trainer; for details see http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.3428

Parameters
  • positive_step – factor by which the step is increased when the gradient keeps the same sign (the direction is unchanged)

  • negative_step – factor by which the step is multiplied (i.e. decreased, since it is less than 1) when the gradient changes sign

  • min_step – minimal change of a weight during an iteration

  • max_step – maximal change of a weight during an iteration

hep_ml.nnet.adadelta_trainer(x, y, w, parameters, loss, random_stream, batch=30, learning_rate=0.1, half_life=1000, epsilon=0.0001)[source]

AdaDelta is a trainer with an adaptive learning rate.

Parameters
  • half_life – momentum-like parameter. The accumulated gradient statistics decay by a factor of 2 after this many events. For small datasets it is recommended to set half_life to the number of samples in the dataset.

  • learning_rate – size of the step

  • batch – size of the minibatch

  • epsilon – regularization term
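
For example, following the half_life recommendation above on a small dataset (a hedged sketch; X and y are your training arrays):

>>> network = MLPClassifier(layers=[10], trainer='adadelta',
...                         trainer_parameters={'batch': 100, 'half_life': len(X)})
>>> network.fit(X, y)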