# Neural Networks

This post describes one of my experiments with neural networks.

Neural networks were initially developed as simulations of real neurons, and the first training rules (e.g. Hebb's rule) 'reproduced' the behaviour we observe in nature.

However, I don't expect this approach to be very fruitful today. I prefer to think of a neural network as just one way to define a function of its input (I will call it the activation function, although the term is more often used for the nonlinearity $\sigma$ itself).

For instance, the activation function of a one-layer perceptron can be written as

$$f(x) = \sigma( a^i \, x_i )$$

where, following the Einstein summation convention, the sum over the repeated index $i$ is left implicit. The $a^i$ are weights, and $\sigma$ is the sigmoid function.
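
A minimal plain-Python sketch of this one-layer perceptron, assuming $\sigma$ is the logistic sigmoid $1/(1+e^{-z})$. The names `sigmoid` and `one_layer` are my own, and the explicit `sum` is exactly the summation that the Einstein convention leaves implicit.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def one_layer(a, x):
    """f(x) = sigma(a^i x_i), with the sum over i written out explicitly."""
    return sigmoid(sum(a_i * x_i for a_i, x_i in zip(a, x)))

# Hypothetical weights and input: the weighted sum is 1*2 + (-1)*2 = 0.
print(one_layer([1.0, -1.0], [2.0, 2.0]))  # 0.5
```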

The activation function of a two-layer perceptron ($a^i_j$ and $b^j$ are weights, and both $i$ and $j$ are summed) is

$$f(x) = \sigma( b^j \, \sigma( a^i_j \, x_i )) $$

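If one works with vector variables, writing $Ax$ for the matrix-by-vector product (with $A$ the matrix of weights $a^i_j$, so that $(Ax)_j = a^i_j x_i$), $b\,y$ for the dot product of vectors $b$ and $y$, and $\sigma x$ for the elementwise sigmoid, then the activation function can be written in a pretty simple way: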

$$f(x) = \sigma b \sigma A x $$
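
The compact form can be read right to left as a sequence of plain operations. A minimal sketch in plain Python, assuming the logistic sigmoid and storing $A$ as a list of rows (row $j$ holds the weights $a^i_j$ of hidden neuron $j$); the helper names `matvec`, `dot` and `two_layer` are hypothetical.

```python
import math

def sigmoid_vec(v):
    """Elementwise sigmoid, i.e. sigma applied to a vector."""
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def matvec(A, x):
    """Matrix-by-vector product: (A x)_j = a^i_j x_i, one row of A per hidden neuron j."""
    return [sum(a_ij * x_i for a_ij, x_i in zip(row, x)) for row in A]

def dot(b, y):
    """Dot product b^j y_j."""
    return sum(b_j * y_j for b_j, y_j in zip(b, y))

def two_layer(A, b, x):
    """f(x) = sigma b sigma A x: hidden layer first, then the scalar output."""
    hidden = sigmoid_vec(matvec(A, x))
    return 1.0 / (1.0 + math.exp(-dot(b, hidden)))

A = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2x2 weights a^i_j
b = [1.0, -1.0]               # hypothetical output weights b^j
print(two_layer(A, b, [0.5, 0.5]))  # 0.5, since both hidden neurons agree and b cancels them
```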

This is how one can define a two-layer perceptron in Theano, for instance, and a three- or four-layer perceptron is not really more complicated.
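
I don't reproduce Theano code here, but the claim that depth adds little complexity can be sketched in plain Python: a perceptron of any depth is the step $y \mapsto \sigma W y$ repeated once per weight matrix $W$. The function name `mlp`, the logistic sigmoid, and the example weights are assumptions for illustration; the last matrix has a single row so that $f(x)$ is a number.

```python
import math

def sigmoid_vec(v):
    """Elementwise logistic sigmoid."""
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def matvec(W, y):
    """Matrix-by-vector product, W stored as a list of rows."""
    return [sum(w * y_i for w, y_i in zip(row, y)) for row in W]

def mlp(weights, x):
    """Apply y -> sigma(W y) for each matrix W in order and return the single output.

    With weights = [A, [b]] this is the two-layer perceptron sigma b sigma A x;
    appending more matrices adds more layers.
    """
    y = x
    for W in weights:
        y = sigmoid_vec(matvec(W, y))
    return y[0]

# Hypothetical three-layer network: 2 inputs -> 3 hidden -> 2 hidden -> 1 output.
weights = [
    [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]],
    [[0.3, -0.1, 0.2], [-0.4, 0.1, 0.5]],
    [[1.0, -1.0]],
]
print(mlp(weights, [1.0, 2.0]))
```

A four-layer perceptron only needs one more matrix in the `weights` list; the function itself does not change.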

But defining the function is only part of the story: what about training the network?

I'm sure that the most efficient training algorithms will come not from neurobiology but from pure mathematics. This is also how today's guides to neural networks do it: you define the activation function, define some figure of merit (log loss, for instance), and then minimize it with your favourite optimization method.
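
A minimal sketch of that recipe, with my own choice of ingredients: the one-layer perceptron $f(x) = \sigma(a^i x_i)$ as the function, the mean log loss as the figure of merit, and plain gradient descent as the optimizer. For this model the derivative $\partial L / \partial a^i$ is simply the average of $(f(x) - y)\, x_i$ over the training pairs. The toy data set, learning rate and number of steps are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(a, x):
    """One-layer perceptron f(x) = sigma(a^i x_i)."""
    return sigmoid(sum(a_i * x_i for a_i, x_i in zip(a, x)))

def log_loss(a, data):
    """Mean of -[y log f(x) + (1 - y) log(1 - f(x))] over the (x, y) pairs."""
    total = 0.0
    for x, y in data:
        p = predict(a, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def gradient_step(a, data, lr):
    """One gradient-descent step; dL/da_i is the mean of (f(x) - y) * x_i."""
    grad = [0.0] * len(a)
    for x, y in data:
        err = predict(a, x) - y
        for i, x_i in enumerate(x):
            grad[i] += err * x_i / len(data)
    return [a_i - lr * g_i for a_i, g_i in zip(a, grad)]

# Hypothetical toy data; the constant second feature plays the role of a bias.
data = [([0.0, 1.0], 0), ([1.0, 1.0], 0), ([3.0, 1.0], 1), ([4.0, 1.0], 1)]
a = [0.0, 0.0]
print(log_loss(a, data))  # ≈ 0.693 (log 2) at zero weights
for _ in range(500):
    a = gradient_step(a, data, lr=0.5)
print(log_loss(a, data))  # smaller after training
```

For a deeper network only `predict` and the gradient change; Theano can derive such gradients symbolically instead of by hand.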

I hope that activation functions, too, will soon be inspired by mathematics, though I haven't had much success in this direction.

One of the activation functions I tried is the following.

First layer:

$$y = \sigma A x $$

Second (pairwise) layer:

$$f(x) = \sigma (b^{ij} y_i y_j ) $$

The difference is that the output now depends not only on the activations of individual neurons but also on pairwise interactions between them (both $i$ and $j$ are summed). Unfortunately, I didn't notice much difference between this modification and a simple two-layer network.
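
A plain-Python sketch of this pairwise network, again assuming the logistic sigmoid. The name `pairwise_net` is hypothetical, $A$ is stored as a list of rows, and $b^{ij}$ is stored as a square list-of-lists `B` with one row and one column per hidden neuron. Because both indices are summed, the second layer has $n^2$ terms for $n$ hidden neurons rather than $n$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pairwise_net(A, B, x):
    """First layer y = sigma A x, then f(x) = sigma(b^{ij} y_i y_j) summed over i and j."""
    y = [sigmoid(sum(a * x_k for a, x_k in zip(row, x))) for row in A]
    n = len(y)
    z = sum(B[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    return sigmoid(z)

zero = [[0.0, 0.0], [0.0, 0.0]]  # hypothetical first-layer weights: every y_i = 0.5
ones = [[1.0, 1.0], [1.0, 1.0]]  # hypothetical pairwise weights
print(pairwise_net(zero, ones, [1.0, 2.0]))  # sigma(4 * 0.25) = sigma(1) ≈ 0.731
```

Since $y_i y_j = y_j y_i$, only the symmetric part of $b^{ij}$ affects the output: an antisymmetric `B` always gives $\sigma(0) = 0.5$, up to rounding.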

Thanks to Theano, it is very easy to play with different activation functions :)