Plant 'n' Seek: Can You Find the Winning Ticket?

Jonas Fischer, Rebekka Burkholz

arXiv.org Artificial Intelligence 

The lottery ticket hypothesis has sparked the rapid development of pruning algorithms that perform structure learning by identifying a sparse subnetwork of a large randomly initialized neural network. The existence of such 'winning tickets' has been proven theoretically, but only at suboptimal sparsity levels. Contemporary pruning algorithms have furthermore struggled to identify sparse lottery tickets for complex learning tasks. Is this suboptimal sparsity merely an artifact of existence proofs and algorithms, or a general limitation of the pruning approach? And, if very sparse tickets exist, are current algorithms able to find them, or are further improvements needed to achieve effective network compression? To answer these questions systematically, we derive a framework to plant and hide target architectures within large randomly initialized neural networks. For three common challenges in machine learning, we hand-craft extremely sparse network topologies, plant them in large neural networks, and evaluate state-of-the-art lottery ticket pruning methods. We find that the current limitations of pruning algorithms in identifying extremely sparse tickets are likely algorithmic rather than fundamental in nature. As our planting framework addresses the issue of missing baselines in the field raised by Frankle et al. (2021), we anticipate that it will facilitate the future development of efficient pruning algorithms.

Deep learning has achieved breakthroughs in multiple challenging areas of machine learning, in particular in areas for which we lack competitive hand-crafted algorithms. The benefits of overparameterization for training with SGD (Belkin et al., 2019) seem to call for ever wider and deeper neural network (NN) architectures, which are computationally demanding to learn and deploy. Training smaller, adequately regularized NNs from scratch could be a remedy, but it commonly seems to fail due to inadequate parameter initialization, as Frankle & Carbin (2019) noted in their seminal paper. As proof of concept that this problem is solvable, they proposed the lottery ticket (LT) hypothesis, which states that a small, well-trainable subnetwork can be identified by pruning a large, randomly initialized NN, opening the field to the discovery of such subnetworks, or 'winning tickets'. Based on the findings of Zhou et al. (2019), Ramanujan et al. (2020b) went even further and conjectured the existence of strong lottery tickets, i.e., subnetworks of randomly initialized NNs that do not require any further training. This strong LT hypothesis holds the promise that training NNs could potentially be replaced by efficient NN pruning, which simultaneously performs structure learning by identifying a task-specific sparse neural network architecture. The existence of strong LTs has also been proven formally for networks without (Malach et al., 2020; Pensia et al., 2020; Orseau et al., 2020) and with potentially nonzero biases (Fischer & Burkholz, 2021).
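To make the planting idea concrete, the following is a minimal sketch of how a hand-crafted sparse target can be hidden inside a large randomly initialized layer together with a ground-truth mask. It is an illustration under our own assumptions, not the paper's actual construction: the function name plant_ticket, the 256-unit host layer, and the 4x4 target weights are hypothetical choices made only for this example.

# Minimal sketch (assumed names and sizes): plant a tiny hand-crafted weight
# block inside a large randomly initialized layer and record a binary mask
# that marks the planted entries, i.e., the ground-truth "winning ticket".
import torch
import torch.nn as nn

torch.manual_seed(0)

def plant_ticket(host_layer: nn.Linear, target_weight: torch.Tensor,
                 rows: torch.Tensor, cols: torch.Tensor) -> torch.Tensor:
    # Overwrite a small subset of the host's random weights with the
    # hand-crafted target and return the mask of planted positions.
    mask = torch.zeros_like(host_layer.weight)
    with torch.no_grad():
        host_layer.weight[rows.unsqueeze(1), cols] = target_weight
    mask[rows.unsqueeze(1), cols] = 1.0
    return mask

# Large, randomly initialized host layer (256 x 256, no bias for simplicity).
host = nn.Linear(256, 256, bias=False)

# Extremely sparse hand-crafted target: a 4 x 4 weight block.
target = torch.tensor([[ 1.0, -1.0,  0.0,  0.0],
                       [ 0.0,  1.0, -1.0,  0.0],
                       [ 0.0,  0.0,  1.0, -1.0],
                       [-1.0,  0.0,  0.0,  1.0]])

rows = torch.randperm(256)[:4]   # hidden units chosen to host the target
cols = torch.randperm(256)[:4]   # input units chosen to host the target
mask = plant_ticket(host, target, rows, cols)

# Density of the planted ticket relative to the host layer.
print(f"planted {int(mask.sum())} of {mask.numel()} weights "
      f"({mask.mean().item():.4%} density)")

# A pruning algorithm would then be evaluated on whether the subnetwork it
# extracts recovers, or matches the performance of, this planted ground truth.

In this setup the planted mask serves as the missing baseline: since the sparsity and performance of the hidden target are known by construction, any pruning method can be scored against it directly.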