On How Iterative Magnitude Pruning Discovers Local Receptive Fields in Fully Connected Neural Networks

Redman, William T., Wang, Zhangyang, Ingrosso, Alessandro, Goldt, Sebastian

arXiv.org Artificial Intelligence 

Iterative magnitude pruning (IMP) [1] has emerged as a powerful tool for identifying sparse subnetworks ("winning tickets") that can be trained to perform as well as the dense models they are extracted from [2, 3]. That IMP, despite its simplicity, is more robust in discovering such winning tickets than other, more complex pruning schemes [4] suggests that its iterative coarse-graining [5] is especially capable of extracting and maintaining strong inductive biases. This perspective is strengthened by observations that winning tickets discovered by IMP: 1) have properties that make them transferable across related tasks [6-13] and architectures [14]; 2) can outperform dense models on classes with limited data [15]; and 3) make less overconfident predictions [16]. The first direct evidence that IMP discovers good inductive biases came from studying the winning tickets extracted by IMP in fully connected neural networks (FCNs) [17]. Pellegrini and Biroli (2022) [17] found that the sparse subnetworks identified by IMP had local receptive field (RF) structure (Figure 1A), an architectural feature found in visual cortex [18] and convolutional neural networks (CNNs) [19]. Comparing IMP-derived winning tickets with the sparse subnetworks found by one-shot pruning (Figure 1B), Pellegrini and Biroli (2022) [17] argued that the iterative nature of IMP was essential for refining the local RF structure. However, to date, how IMP, a pruning method based purely on the magnitude of the network parameters, is able to "sift out" non-localized weights remains unknown. Resolving this will not only shed light on the effect of IMP on FCNs, but also provide new insight into the success of IMP more broadly.
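To make the procedure concrete, the sketch below implements a minimal version of IMP with weight rewinding in PyTorch: train, prune the smallest-magnitude fraction of surviving weights in each layer, rewind the survivors to their initial values, and repeat. The model, data, training loop, and hyperparameters (`prune_frac`, `rounds`) are illustrative assumptions, not the experimental setup of [1] or [17]; one-shot pruning corresponds to a single round with a correspondingly larger pruning fraction.

```python
import copy
import torch
import torch.nn as nn

def imp(model, train_fn, prune_frac=0.2, rounds=5):
    """Minimal iterative magnitude pruning with rewinding to initial weights.

    train_fn(model, masks) should train the model in place while keeping
    masked weights at zero. prune_frac is the fraction of *remaining*
    weights removed per round; one-shot pruning is rounds=1.
    """
    init_state = copy.deepcopy(model.state_dict())  # theta_0, for rewinding
    # Prune weight matrices only (dim > 1), not biases.
    masks = {n: torch.ones_like(p)
             for n, p in model.named_parameters() if p.dim() > 1}
    for _ in range(rounds):
        train_fn(model, masks)
        # Prune the smallest-magnitude surviving weights, layer by layer.
        for n, p in model.named_parameters():
            if n not in masks:
                continue
            alive = p.detach().abs()[masks[n].bool()]
            thresh = alive.quantile(prune_frac)
            masks[n] *= (p.detach().abs() > thresh).float()
        # Rewind surviving weights to their initial values (the "winning ticket").
        model.load_state_dict(init_state)
        with torch.no_grad():
            for n, p in model.named_parameters():
                if n in masks:
                    p *= masks[n]
    return masks

def make_train_fn(x, y, epochs=100, lr=1e-2):
    """Placeholder regression loop; any task-appropriate loop works here."""
    loss_fn = nn.MSELoss()
    def train_fn(model, masks):
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            # Re-apply masks after each step so pruned weights stay at zero.
            with torch.no_grad():
                for n, p in model.named_parameters():
                    if n in masks:
                        p *= masks[n]
    return train_fn

if __name__ == "__main__":
    torch.manual_seed(0)
    x, y = torch.randn(256, 64), torch.randn(256, 1)
    fcn = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))
    masks = imp(fcn, make_train_fn(x, y), prune_frac=0.2, rounds=5)
    # After 5 rounds at 20% per round, roughly 0.8**5 ~ 33% of weights survive.
```

In an image task, the surviving mask of the first `nn.Linear` layer can be reshaped to the input image dimensions and inspected per hidden unit; the observation of [17] is that, under IMP, these masks concentrate on spatially contiguous patches (local RFs) rather than being scattered.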