Continuous Sparsification
Winning the Lottery with Continuous Sparsification
The search for efficient, sparse deep neural network models is most prominently performed by pruning: training a dense, overparameterized network and removing parameters, usually by following a manually crafted heuristic. Additionally, the recent Lottery Ticket Hypothesis conjectures that, for a typically-sized neural network, it is possible to find small sub-networks which, when trained from scratch on a comparable budget, match the performance of the original dense counterpart. We revisit fundamental aspects of pruning algorithms, point out missing ingredients in previous approaches, and develop a method, Continuous Sparsification, which searches for sparse networks based on a novel approximation of an intractable $\ell_0$ regularization. We compare against dominant heuristic-based methods on pruning as well as on ticket search -- finding sparse subnetworks that can be successfully re-trained from an early iterate. Empirical results show that we surpass the state of the art for both objectives, across models and datasets, including VGG trained on CIFAR-10 and ResNet-50 trained on ImageNet. In addition to setting a new standard for pruning, Continuous Sparsification also offers fast parallel ticket search, opening doors to new applications of the Lottery Ticket Hypothesis.
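To make the idea concrete, below is a minimal PyTorch sketch of the kind of sigmoid-based relaxation the abstract describes: each weight is gated by sigmoid(beta * s), where s is a learnable score and beta is a temperature annealed upward during training, and the sum of gate values serves as a differentiable surrogate for the intractable $\ell_0$ penalty. Class and attribute names here are ours, not the paper's.

```python
import torch
import torch.nn as nn

class SoftMaskedLinear(nn.Module):
    """Illustrative sketch (not the authors' code): a linear layer whose
    weights are gated by a continuous, trainable mask."""

    def __init__(self, in_features, out_features, mask_init=0.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Learnable mask scores s; the effective gate is sigmoid(beta * s).
        self.mask_scores = nn.Parameter(
            torch.full((out_features, in_features), mask_init))
        self.beta = 1.0  # temperature, annealed upward during training

    def forward(self, x):
        gate = torch.sigmoid(self.beta * self.mask_scores)  # values in (0, 1)
        return nn.functional.linear(x, self.weight * gate, self.bias)

    def l0_surrogate(self):
        # Sum of gate values: a differentiable stand-in for the L0 norm,
        # which counts nonzero weights and is itself intractable to optimize.
        return torch.sigmoid(self.beta * self.mask_scores).sum()
```

In training, one would add a term like lam * layer.l0_surrogate() to the task loss and multiply layer.beta by a growth factor each epoch; as beta grows, the gates saturate toward 0 or 1, yielding a discrete sparse subnetwork.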
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
Mondorf, Philipp, Wold, Sondre, Plank, Barbara
A fundamental question in interpretability research is to what extent neural networks, particularly language models, implement reusable functions via subnetworks that can be composed to perform more complex tasks. Recent developments in mechanistic interpretability have made progress in identifying subnetworks, often referred to as circuits, which represent the minimal computational subgraph responsible for a model's behavior on specific tasks. However, most studies focus on identifying circuits for individual tasks without investigating how functionally similar circuits relate to each other. To address this gap, we examine the modularity of neural networks by analyzing circuits for highly compositional subtasks within a transformer-based language model. Specifically, given a probabilistic context-free grammar, we identify and compare circuits responsible for ten modular string-edit operations. Our results indicate that functionally similar circuits exhibit both notable node overlap and cross-task faithfulness. Moreover, we demonstrate that the circuits identified can be reused and combined through subnetwork set operations to represent more complex functional capabilities of the model.

Neural networks can be effectively modeled as causal graphs that illustrate how inputs are mapped to the output space (Mueller et al., 2024). For instance, the feed-forward and attention modules within the Transformer architecture (Vaswani et al., 2017) can be interpreted as a series of causal nodes that guide the transformation from input to output via the residual stream (Ferrando et al., 2024). This abstraction is commonly used in mechanistic interpretability to identify computational subgraphs, or circuits, responsible for the network's behavior on specific tasks (Wang et al., 2023). Circuits are typically identified through causal mediation analysis, which quantifies the causal influence of model components on the network's predictions (Mueller et al., 2024). However, a notable limitation of existing studies is their focus on identifying circuits for isolated, individual tasks. Few studies compare circuits responsible for different functional behaviors of the model, and those that do primarily focus on tasks with limited cross-functional similarity (Hanna et al., 2024b).
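As a toy illustration of the subnetwork set operations mentioned above, a circuit can be represented simply as the set of model components it retains; node overlap is then an intersection-over-union, and composition is a set union. The node names below are hypothetical, not from the paper.

```python
# Toy illustration (hypothetical node names): circuits as sets of components.
circuit_a = {"L0.H3", "L1.H0", "L2.MLP", "L3.H5"}  # e.g., one string-edit circuit
circuit_b = {"L0.H3", "L1.H7", "L2.MLP", "L4.H1"}  # a functionally similar one

def node_overlap(c1, c2):
    """Intersection-over-union of two circuits' node sets."""
    return len(c1 & c2) / len(c1 | c2)

combined = circuit_a | circuit_b  # union: candidate circuit for a composite task
shared = circuit_a & circuit_b    # intersection: components both functions rely on

print(f"node overlap (IoU) = {node_overlap(circuit_a, circuit_b):.2f}")
```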
Growing Efficient Deep Networks by Structured Continuous Sparsification
Yuan, Xin, Savarese, Pedro, Maire, Michael
We develop an approach to training deep networks while dynamically adjusting their architecture, driven by a principled combination of accuracy and sparsity objectives. Unlike conventional pruning approaches, our method adopts a gradual continuous relaxation of discrete network structure optimization and then samples sparse subnetworks, enabling efficient deep networks to be trained by both growing and pruning structure. Extensive experiments across CIFAR-10, ImageNet, PASCAL VOC, and Penn Treebank, with convolutional models for image classification and semantic segmentation, and recurrent models for language modeling, show that our training scheme yields efficient networks that are smaller and more accurate than those produced by competing pruning methods.
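A hedged sketch of what structured, channel-level continuous sparsification might look like follows: one continuous gate per output channel of a convolution, so entire filters can be switched on (grown) or off (pruned) during training. This is our simplification for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    """Illustrative simplification (not the paper's implementation): a
    convolution with one continuous gate per output channel, so whole
    filters can be grown or pruned during training."""

    def __init__(self, in_ch, out_ch, kernel_size=3, gate_init=0.0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)
        self.gate_scores = nn.Parameter(torch.full((out_ch,), gate_init))
        self.beta = 1.0  # temperature; raising it hardens the gates toward 0/1

    def forward(self, x):
        gates = torch.sigmoid(self.beta * self.gate_scores)
        return self.conv(x) * gates.view(1, -1, 1, 1)  # scale each channel

    def sparsity_penalty(self):
        # Differentiable proxy for the number of active channels.
        return torch.sigmoid(self.beta * self.gate_scores).sum()
```

Because the gates act on whole channels rather than individual weights, pruned filters can be dropped outright after training, shrinking the architecture itself rather than just zeroing entries.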
Winning the Lottery with Continuous Sparsification
Savarese, Pedro, Silva, Hugo, Maire, Michael
The Lottery Ticket Hypothesis from Frankle & Carbin (2019) conjectures that, for typically-sized neural networks, it is possible to find small sub-networks which train faster and perform better than their original counterparts. The proposed algorithm to search for "winning tickets", Iterative Magnitude Pruning, consistently finds sub-networks with $90$-$95\%$ fewer parameters which train faster and better than the overparameterized models they were extracted from, creating potential applications to problems such as transfer learning. In this paper, we propose Continuous Sparsification, a new algorithm to search for winning tickets which continuously removes parameters from a network during training, and learns the sub-network's structure with gradient-based methods instead of relying on pruning strategies. We show empirically that our method is capable of finding tickets that outperform the ones learned by Iterative Magnitude Pruning, while at the same time providing a faster search, whether measured in number of training epochs or in wall-clock time.
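For contrast with the baseline, here is a minimal sketch of one round of Iterative Magnitude Pruning as the abstract summarizes it: train the masked network, prune the smallest-magnitude surviving weights, and rewind the remainder to their initial values. The function signature and the train_fn hook are ours, for illustration only.

```python
import torch

def imp_round(model, train_fn, init_state, masks, prune_frac=0.2):
    """One round of Iterative Magnitude Pruning, sketched from the abstract's
    description (this signature and train_fn are ours): train the masked
    network, prune the smallest-magnitude surviving weights, and rewind
    the remainder to their initial values."""
    train_fn(model, masks)  # train with the current binary masks applied
    # Collect magnitudes of surviving weights across all parameters.
    surviving = torch.cat([
        p.detach().abs().flatten()[m.flatten().bool()]
        for p, m in zip(model.parameters(), masks)
    ])
    threshold = torch.quantile(surviving, prune_frac)
    # Zero out the smallest prune_frac fraction of the surviving weights.
    new_masks = [m * (p.detach().abs() > threshold).float()
                 for p, m in zip(model.parameters(), masks)]
    model.load_state_dict(init_state)  # rewind weights to initialization
    return new_masks
```

Repeating this round until the target sparsity is reached is what makes the search iterative; Continuous Sparsification instead learns the mask by gradient descent in a single training run, which is the source of its speed advantage.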