Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Sparse Neural Networks

Oct-9-2024, 14:53:01 GMT–Neural Information Processing Systems

The lottery ticket hypothesis (LTH) states that learning on a properly pruned network (the winning ticket) has improved test accuracy over the original unpruned network. Although LTH has been justified empirically in a broad range of deep neural network (DNN) involved applications like computer vision and natural language processing, the theoretical validation of the improved generalization of a winning ticket remains elusive. To the best of our knowledge, our work, for the first time, characterizes the performance of training a pruned neural network by analyzing the geometric structure of the objective function and the sample complexity to achieve zero generalization error. We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned, indicating the structural importance of a winning ticket. Moreover, as the algorithm for training a pruned neural network is specified as an (accelerated) stochastic gradient descent algorithm, we theoretically show that the number of samples required for achieving zero generalization error is proportional to the number of the non-pruned weights in the hidden layer.

neural network, sparse neural network, theoretical perspective, (4 more...)

Neural Information Processing Systems

Oct-9-2024, 14:53:01 GMT

Conferences Web Page

Add feedback

Genre:
- Contests & Prizes (1.00)

Industry:
- Leisure & Entertainment > Gambling (0.64)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks (1.00)
  - Statistical Learning > Gradient Descent (0.61)