Initialization-Dependent Sample Complexity of Linear Predictors and Neural Networks

Neural Information Processing Systems 

Clearly, in order for learning to be possible, we must impose some constraints on the size of the function class. One possibility is to bound the number of parameters (i.e., the dimensions of the matrix W), in which case learnability follows from standard VC-dimension or covering number arguments (see Anthony and Bartlett [1999]).