Link to article: http://arxiv.org/abs/1505.00387

With the proposed technique one can build very deep neural networks (up to hundreds of layers). The key reason this works is very simple: $$ x_{n+1} = x_n + f(x_n), $$ where the second summand is small enough.
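Here is a minimal NumPy sketch of the additive update written above (not the gated formulation from the paper itself); the layer width, depth, tanh nonlinearity, and small-initialization scale are illustrative assumptions, chosen so that $f(x_n)$ stays a small perturbation of $x_n$:

```python
import numpy as np

def make_layer(dim, scale=0.01, rng=None):
    # Small random weights keep f(x) a small perturbation of x.
    rng = rng or np.random.default_rng(0)
    W = rng.normal(0.0, scale, size=(dim, dim))
    b = np.zeros(dim)
    return W, b

def residual_step(x, W, b):
    # x_{n+1} = x_n + f(x_n), with f a single tanh layer here.
    return x + np.tanh(x @ W + b)

dim, depth = 64, 200  # hundreds of layers stay numerically stable
x = np.random.default_rng(1).normal(size=dim)
layers = [make_layer(dim) for _ in range(depth)]
for W, b in layers:
    x = residual_step(x, W, b)
```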

There are actually two points:

  • First, one uses very many layers and is therefore able to approximate all the needed functions.
  • Second, since the first summand dominates, there is no vanishing gradient problem (see the sketch after this list).
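To make the second point concrete, a one-line sketch under the additive form above: differentiating the update gives a Jacobian that stays close to the identity, $$ \frac{\partial x_{n+1}}{\partial x_n} = I + \frac{\partial f(x_n)}{\partial x_n} \approx I, $$ so the product of Jacobians across many layers does not shrink toward zero the way it does in a plain deep stack.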

Not sure whether this really has advantages over shallow ANNs, but it is still an interesting approach.

So, it's a way to train a deep network, though it doesn't have much relation to what people usually call 'deep learning', since here we are not trying to establish some new hidden categories.