The proposed technique makes it possible to build very deep neural networks (up to hundreds of layers). The key principle is simple: the activation of the next layer is computed from the explicitly given activation of the previous layer, $$x_{n+1} = x_n + f(x_n),$$ where the second summand contains the non-linearity (and this term is small, at least during the first iterations).
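A minimal sketch of this residual update rule, using a scalar toy "layer" $f(x) = \tanh(wx + b)$ with a small weight so the non-linear term stays small (the function names and parameter values here are illustrative assumptions, not part of the original):

```python
import math

def residual_step(x, w, b):
    # One layer: x_{n+1} = x_n + f(x_n), with f(x) = tanh(w*x + b)
    # as a stand-in non-linearity. With small w, f(x_n) is a small
    # correction, so activations change gradually layer to layer.
    return x + math.tanh(w * x + b)

def deep_residual(x, depth, w=0.01, b=0.0):
    # Stack hundreds of such layers; the identity shortcut keeps
    # the activation from vanishing or exploding.
    for _ in range(depth):
        x = residual_step(x, w, b)
    return x

y = deep_residual(1.0, depth=100)
```

Because each step adds roughly `w * x` when the argument of `tanh` is small, stacking 100 layers with `w = 0.01` grows the activation smoothly (approximately like `(1 + w) ** depth`), rather than blowing up or dying out as a plain 100-layer composition easily would.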