On the Role of Noise in the Sample Complexity of Learning Recurrent Neural Networks: Exponential Gaps for Long Sequences

Neural Information Processing Systems 

We consider the class of noisy multi-layered sigmoid recurrent neural networks with w (unbounded) weights for classification of sequences of length T, where independent noise distributed according to \mathcal{N}(0,\sigma^2) is added to the output of each neuron in the network. Our main result shows that the sample complexity of PAC learning this class can be bounded by O(w\log(T/\sigma)). For the non-noisy version of the same class (i.e., \sigma = 0), we prove a lower bound of \Omega(wT) on the sample complexity. Our results indicate an exponential gap in the dependence of the sample complexity on T for noisy versus non-noisy networks. Moreover, given the mild logarithmic dependence of the upper bound on 1/\sigma, this gap persists even for numerically negligible values of \sigma.
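To make the noise model concrete, below is a minimal sketch of a forward pass through a single-layer noisy sigmoid RNN, assuming a binary label read off as the sign of a final linear layer. The function name noisy_sigmoid_rnn, the weight names, and the scalar readout are illustrative assumptions, not the paper's construction; the key point is that independent \mathcal{N}(0,\sigma^2) noise is added to every neuron's output at every time step.

```python
import numpy as np

def noisy_sigmoid_rnn(x_seq, weights, sigma, rng):
    """Forward pass of a (hypothetical) single-layer noisy sigmoid RNN.

    x_seq:   array of shape (T, d_in), the input sequence.
    weights: dict with 'W_in' (d_h, d_in), 'W_rec' (d_h, d_h), 'w_out' (d_h,).
    sigma:   noise scale; sigma = 0 recovers the non-noisy network.
    """
    W_in, W_rec, w_out = weights["W_in"], weights["W_rec"], weights["w_out"]
    h = np.zeros(W_rec.shape[0])
    for x_t in x_seq:
        pre = W_in @ x_t + W_rec @ h
        h = 1.0 / (1.0 + np.exp(-pre))            # sigmoid activation
        h = h + rng.normal(0.0, sigma, h.shape)   # per-neuron N(0, sigma^2) noise
    # Binary label from the sign of the final linear readout.
    return np.sign(w_out @ h)

# Example usage with random weights and inputs.
rng = np.random.default_rng(0)
d_in, d_h, T = 3, 5, 10
weights = {"W_in": rng.normal(size=(d_h, d_in)),
           "W_rec": rng.normal(size=(d_h, d_h)),
           "w_out": rng.normal(size=d_h)}
x_seq = rng.normal(size=(T, d_in))
label = noisy_sigmoid_rnn(x_seq, weights, sigma=0.01, rng=rng)
```

As a rough numerical illustration of why the gap survives tiny noise: with T = 10^6 and \sigma = 10^{-100}, the upper bound's factor \log(T/\sigma) = \log(10^{106}) is only about 244 (natural log), whereas the non-noisy lower bound's factor T is 10^6.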