Bayesian Sparsification
Shaving Weights with Occam's Razor: Bayesian Sparsification for Neural Networks using the Marginal Likelihood
Neural network sparsification is a promising avenue to save computational time and memory costs, especially in an age where many successful AI models are becoming too large to naively deploy on consumer hardware. While much work has focused on different weight pruning criteria, the overall sparsifiability of the network, i.e., its capacity to be pruned without quality loss, has often been overlooked. We present Sparsifiability via the Marginal likelihood (SpaM), a sparsification framework that highlights the effectiveness of using the Bayesian marginal likelihood in conjunction with sparsity-inducing priors for making neural networks more sparsifiable. Our approach implements an automatic Occam's razor that selects the most sparsifiable model that still explains the data well, both for structured and unstructured sparsification. In addition, we demonstrate that the pre-computed posterior precision from the Laplace approximation can be re-used to define a cheap pruning criterion, which outperforms many existing (more expensive) approaches.
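To make the last point concrete: with a (diagonal) Laplace approximation in hand, a natural pruning score for each weight is its squared magnitude times the corresponding posterior-precision entry. The sketch below is a minimal illustration of that idea, not SpaM's exact implementation; the function name, the stand-in precision values, and the keep-ratio are all illustrative assumptions.

```python
import torch

def laplace_prune_mask(weights, posterior_precision_diag, sparsity=0.9):
    """Illustrative pruning mask: score each weight by w_i^2 * Lambda_ii,
    where Lambda_ii is the diagonal of the Laplace posterior precision.
    Low scores ~ weights the approximate posterior is happy to set to zero."""
    scores = weights.pow(2) * posterior_precision_diag
    k = int(sparsity * scores.numel())
    threshold = torch.kthvalue(scores.flatten(), k).values
    return scores > threshold  # True = keep the weight

# Usage: flatten all model weights into one vector, obtain the diagonal
# posterior precision from a Laplace fit (e.g., a diagonal GGN/Fisher),
# then zero out the weights where the mask is False.
w = torch.randn(10_000)
prec = torch.rand(10_000) + 1e-3   # stand-in for a fitted precision
mask = laplace_prune_mask(w, prec, sparsity=0.9)
w_sparse = w * mask
```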
Bayesian sparsification for deep neural networks with Bayesian model reduction
Marković, Dimitrije, Friston, Karl J., Kiebel, Stefan J.
Deep learning's immense capabilities are often constrained by the complexity of its models, leading to an increasing demand for effective sparsification techniques. Bayesian sparsification for deep learning emerges as a crucial approach, facilitating the design of models that are both computationally efficient and competitive in terms of performance across various deep learning applications. The state of the art in Bayesian sparsification of deep neural networks combines structural shrinkage priors on model weights with an approximate inference scheme based on stochastic variational inference. However, model inversion of the full generative model is exceptionally computationally demanding, especially when compared to standard deep learning of point estimates. In this context, we advocate for Bayesian model reduction (BMR) as a more efficient alternative for pruning model weights. As a generalization of the Savage-Dickey ratio, BMR allows post-hoc elimination of redundant model weights based on the posterior estimates under a straightforward (non-hierarchical) generative model. Our comparative study highlights the advantages of the BMR method relative to established approaches based on hierarchical horseshoe priors over model weights. We illustrate the potential of BMR across various deep learning architectures, from classical networks like LeNet to modern frameworks such as Vision Transformers and MLP-Mixers.
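For a single Gaussian weight, the BMR identity alluded to above has a closed form: given a posterior N(mu, 1/p), a prior N(0, 1/p0), and a reduced "spike" prior N(0, 1/p0~) with p0~ large, the change in log-evidence caused by the reduction can be evaluated post hoc from the posterior alone. The sketch below works through that scalar identity; the function name, the spike precision, and the prune rule dF >= 0 are illustrative choices, not the paper's code.

```python
import numpy as np

def bmr_delta_evidence(mu, post_prec, prior_prec, reduced_prior_prec=1e6):
    """Change in log-evidence from shrinking a weight's prior N(0, 1/prior_prec)
    to a near-spike N(0, 1/reduced_prior_prec), given a posterior N(mu, 1/post_prec).
    Scalar Gaussian Bayesian-model-reduction identity:
        dF = 0.5 * ln(p0~ * p / (p0 * p~)) - 0.5 * mu^2 * p * (p0~ - p0) / p~,
    where p~ = p + p0~ - p0 is the reduced posterior precision."""
    d = reduced_prior_prec - prior_prec
    red_post_prec = post_prec + d
    return (0.5 * np.log(reduced_prior_prec * post_prec
                         / (prior_prec * red_post_prec))
            - 0.5 * mu**2 * post_prec * d / red_post_prec)

# Prune weights whose removal does not lower the evidence (dF >= 0):
mu = np.array([0.001, 0.5, -0.02])
post_prec = np.array([50.0, 200.0, 10.0])
dF = bmr_delta_evidence(mu, post_prec, prior_prec=1.0)
prune = dF >= 0.0
```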
Structured Sparsification of Gated Recurrent Neural Networks
Lobacheva, Ekaterina, Chirkova, Nadezhda, Markovich, Alexander, Vetrov, Dmitry
Recently, many techniques have been developed to sparsify the weights of neural networks and to remove structural units, e.g., neurons. We adapt these existing sparsification approaches to gated recurrent architectures. Specifically, in addition to sparsifying weights and neurons, we propose sparsifying the preactivations of gates. This makes some gates constant and simplifies the LSTM structure. We test our approach on text classification and language modeling tasks. We observe that the resulting structure of gate sparsity depends on the task, and we connect the learned structure to the specifics of the particular tasks. Our method also improves neuron-wise compression of the model on most tasks.
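One simple (non-Bayesian) way to realize "sparsifying the preactivations of gates" is a group penalty on each gate row's incoming weights: once a row of both the input-to-hidden and hidden-to-hidden matrices is zeroed, that unit's gate preactivation reduces to its bias and the gate becomes constant. The PyTorch sketch below illustrates this with a group-lasso surrogate; it is a stand-in for the idea, not the paper's training objective.

```python
import torch
import torch.nn as nn

def gate_group_penalty(lstm: nn.LSTM, lam: float = 1e-3) -> torch.Tensor:
    """Group-lasso-style penalty on per-unit gate preactivations.
    PyTorch stacks the four gates (input, forget, cell, output) along dim 0
    of weight_ih_l0 / weight_hh_l0. Zeroing one row of both matrices makes
    that unit's gate preactivation constant (just its bias), which is the
    kind of structured gate sparsity the abstract describes."""
    w = torch.cat([lstm.weight_ih_l0, lstm.weight_hh_l0], dim=1)  # (4H, I+H)
    group_norms = w.norm(dim=1)   # one norm per (gate, hidden unit) row
    return lam * group_norms.sum()

lstm = nn.LSTM(input_size=32, hidden_size=64)
x = torch.randn(10, 8, 32)        # (seq, batch, features)
out, _ = lstm(x)
loss = out.pow(2).mean() + gate_group_penalty(lstm)  # toy loss + penalty
loss.backward()
```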
Bayesian Sparsification of Gated Recurrent Neural Networks
Lobacheva, Ekaterina, Chirkova, Nadezhda, Vetrov, Dmitry
Bayesian methods have been successfully applied to sparsify the weights of neural networks and to remove structural units from networks, e.g., neurons. We apply and further develop this approach for gated recurrent architectures. Specifically, in addition to sparsifying individual weights and neurons, we propose sparsifying the preactivations of gates and the information flow in the LSTM. This makes some gates and information-flow components constant, which speeds up the forward pass and improves compression. Moreover, the resulting structure of gate sparsity is interpretable and depends on the task. Code is available on GitHub: https://github.com/tipt0p/SparseBayesianRNN
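Why constant gates speed up the forward pass: if, say, a forget gate's incoming weights are pruned away entirely, its value is a fixed vector sigmoid(b_f), and two matrix-vector products vanish from every time step. The NumPy sketch below shows one such step; the parameter names and dict layout are hypothetical, used only to make the saving visible.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step_constant_forget(x, h, c, params, b_f):
    """One LSTM step where the forget gate's incoming weights were pruned,
    so f = sigmoid(b_f) is a constant vector: the W_f @ x and U_f @ h
    matvecs disappear from every step, which is the forward-pass speedup
    the abstract mentions. `params` holds W_*, U_*, b_* for the remaining
    gates (hypothetical layout)."""
    f = sigmoid(b_f)                                           # constant gate
    i = sigmoid(params["W_i"] @ x + params["U_i"] @ h + params["b_i"])
    g = np.tanh(params["W_g"] @ x + params["U_g"] @ h + params["b_g"])
    o = sigmoid(params["W_o"] @ x + params["U_o"] @ h + params["b_o"])
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```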
Bayesian Sparsification of Recurrent Neural Networks
Lobacheva, Ekaterina, Chirkova, Nadezhda, Vetrov, Dmitry
Recurrent neural networks show state-of-the-art results in many text analysis tasks but often require a lot of memory to store their weights. The recently proposed Sparse Variational Dropout (Molchanov et al., 2017) eliminates the majority of the weights in a feed-forward neural network without significant loss of quality. We apply this technique to sparsify recurrent neural networks. To account for recurrent specifics, we also rely on Binary Variational Dropout for RNNs (Gal & Ghahramani, 2016b). We report a 99.5% sparsity level on a sentiment analysis task without a quality drop, and up to 87% sparsity on a language modeling task with a slight loss of accuracy.
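For reference, the pruning rule typically used with Sparse Variational Dropout scores each weight by log alpha = log sigma^2 - 2 log |mu| (the log of the effective dropout rate) and zeroes weights above a threshold, commonly 3. The sketch below implements that rule; the function name and example numbers are illustrative.

```python
import numpy as np

def sparse_vd_mask(mu, log_sigma2, log_alpha_thresh=3.0):
    """Pruning rule for Sparse Variational Dropout (Molchanov et al., 2017):
    with a factorized Gaussian posterior N(mu, sigma^2) per weight,
    alpha = sigma^2 / mu^2 is the effective dropout rate; weights whose
    log alpha exceeds the threshold are dominated by noise and are pruned."""
    log_alpha = log_sigma2 - 2.0 * np.log(np.abs(mu) + 1e-8)
    return log_alpha < log_alpha_thresh   # True = keep the weight

mu = np.array([0.7, 1e-4, -0.3])
log_sigma2 = np.array([-4.0, -6.0, 2.0])
keep = sparse_vd_mask(mu, log_sigma2)
sparsity = 1.0 - keep.mean()              # fraction of weights pruned
```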