Who invented deep residual learning?

Schmidhuber, Juergen

arXiv.org Artificial Intelligence 

Modern AI is based on deep artificial neural networks (NNs). As of 2025, the most cited scientific article of the 21st century is an NN paper on deep residual learning with residual connections. Here is the timeline of the evolution of deep residual learning:

1991: recurrent residual connections (weight 1.0) solve the vanishing gradient problem
1997 LSTM: plain recurrent residual connections (weight 1.0)
1999 LSTM: gated recurrent residual connections (gates initially open: 1.0)
2005: unfolding LSTM, from recurrent to feedforward residual NNs
May 2015: very deep Highway Net, gated feedforward residual connections (initially 1.0)
Dec 2015: ResNet, like an open-gated Highway Net (or an unfolded 1997 LSTM)

The 1991 recurrent residual connection was mathematically derived from first principles to overcome the fundamental deep learning problem of vanishing or exploding gradients, first identified and analyzed in that same 1991 thesis. At every time step of information processing, such a unit simply adds its current input to its previous activation value. These invariant residual connections transport error signals back to the typically highly nonlinear adaptive parts of the NN, where they can cause appropriate weight changes.
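The additive mechanism described above can be sketched in a few lines. This is a minimal illustration, not code from any of the cited papers: the function names and the toy inputs are assumptions chosen for clarity. The recurrent case adds the current input to the previous activation at each time step; the feedforward (Highway Net/ResNet-style) case adds a transformed signal to the unchanged identity path.

```python
import math


def residual_step(prev_activation, current_input):
    # Recurrent residual connection with fixed weight 1.0:
    # the unit simply adds its current input to its previous activation.
    return prev_activation + current_input


def feedforward_residual_block(x, f):
    # Feedforward residual connection: output = identity path + transformed path.
    # The identity path is what lets error signals flow back unattenuated.
    return x + f(x)


# Recurrent case: accumulate inputs over three time steps.
h = 0.0
for x_t in [0.5, -0.2, 1.0]:
    h = residual_step(h, x_t)
print(h)

# Feedforward case: identity plus a nonlinear transformation (here tanh,
# standing in for an arbitrary adaptive sub-network).
y = feedforward_residual_block(2.0, math.tanh)
print(y)
```

Because the identity path has a constant weight of 1.0, the gradient of the output with respect to the input always contains an undiminished additive term, which is how such connections counteract vanishing gradients.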