How Many Samples are Needed to Estimate a Convolutional Neural Network?

Neural Information Processing Systems 

A widespread folklore for explaining the success of Convolutional Neural Networks (CNNs) is that CNNs use a more compact representation than the Fully-connected Neural Network (FNN) and thus require fewer training samples to accurately estimate their parameters. We initiate the study of rigorously characterizing the sample complexity of estimating CNNs. We show that for an m -dimensional convolutional filter with linear activation acting on a d -dimensional input, the sample complexity of achieving population prediction error of \epsilon is \widetilde{O(m/\epsilon 2), whereas the sample-complexity for its FNN counterpart is lower bounded by \Omega(d/\epsilon 2) samples. Since, in typical settings m \ll d, this result demonstrates the advantage of using a CNN. We further consider the sample complexity of estimating a one-hidden-layer CNN with linear activation where both the m -dimensional convolutional filter and the r -dimensional output weights are unknown.