Biologically plausible deep learning
The success of deep networks in a range of prediction tasks has raised the question of whether they can serve as models for processing in the cortex [4, 8]. These networks are most successful when trained using versions of stochastic gradient descent, where small random subsets of the training data are used to compute the gradient of the loss function with respect to the weights connecting successive layers, and the weights are then updated. Due to the particular structure of the function represented by these multi-layer networks, the gradient is computed using back-propagation, an algorithmic formulation of the chain rule. In all but the last layer, the gradient of a weight in this algorithm is a product of the activities of the units it connects: the pre-synaptic input unit in the lower layer and the post-synaptic output unit in the higher layer. In that sense it performs a form of local Hebbian learning: the update depends on the product of the feedback activity in the post-synaptic unit (the error signal) and the activity of the pre-synaptic unit.
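The product structure of the weight gradient can be illustrated with a minimal sketch. It is not taken from the paper: the two-layer network, the tanh hidden units, the squared-error loss, the weight values, and every function name (`matvec`, `outer`, `forward`, `loss`, `backprop`, `numeric_grad_W1`) are assumptions chosen only for illustration. The point it shows is that each weight's gradient is the product of the error signal at its post-synaptic unit and the activity of its pre-synaptic unit.

```python
import math

def matvec(W, v):
    # Multiply matrix W (a list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def outer(post, pre):
    # Entry [i][j] is post[i] * pre[j]: a purely local product.
    return [[p * q for q in pre] for p in post]

def forward(W1, W2, x):
    # Hidden layer with tanh units, followed by a linear output layer.
    h = [math.tanh(a) for a in matvec(W1, x)]
    y = matvec(W2, h)
    return h, y

def loss(W1, W2, x, t):
    # Squared-error loss (an assumption; the source names no loss).
    _, y = forward(W1, W2, x)
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, t))

def backprop(W1, W2, x, t):
    h, y = forward(W1, W2, x)
    # Error signal at the output units.
    delta2 = [yi - ti for yi, ti in zip(y, t)]
    # Feedback to the hidden units: W2 transposed times delta2,
    # scaled by the tanh derivative 1 - h^2.
    back = [sum(W2[k][j] * delta2[k] for k in range(len(W2)))
            for j in range(len(h))]
    delta1 = [b * (1.0 - hj ** 2) for b, hj in zip(back, h)]
    # Each gradient is (post-synaptic error) x (pre-synaptic activity).
    return outer(delta1, x), outer(delta2, h)

# Hypothetical weights, input, and target.
W1 = [[0.1, -0.2, 0.3], [0.4, 0.0, -0.5]]
W2 = [[0.2, -0.1]]
x = [1.0, 0.5, -1.0]
t = [0.3]

g1, g2 = backprop(W1, W2, x, t)

def numeric_grad_W1(i, j, eps=1e-6):
    # Central finite difference, used only to check the analytic gradient.
    Wp = [row[:] for row in W1]
    Wm = [row[:] for row in W1]
    Wp[i][j] += eps
    Wm[i][j] -= eps
    return (loss(Wp, W2, x, t) - loss(Wm, W2, x, t)) / (2 * eps)
```

Comparing `g1[i][j]` with `numeric_grad_W1(i, j)` for each weight is one way to confirm that the local product agrees with the true gradient of the loss.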
Nov-19-2018