Neural Network Compression


WoodFisher: Efficient Second-Order Approximation for Neural Network Compression

Neural Information Processing Systems

Second-order information, in the form of Hessian- or Inverse-Hessian-vector products, is a fundamental tool for solving optimization problems. Recently, there has been significant interest in utilizing this information in the context of deep neural networks; however, relatively little is known about the quality of existing approximations in this context. Our work considers this question, examines the accuracy of existing approaches, and proposes a method called WoodFisher to compute a faithful and efficient estimate of the inverse Hessian. Our main application is to neural network compression, where we build on the classic Optimal Brain Damage/Surgeon framework. We demonstrate that WoodFisher significantly outperforms popular state-of-the-art methods for one-shot pruning. Even when iterative, gradual pruning is allowed, our method yields a gain in test accuracy over state-of-the-art approaches on popular image classification datasets such as ImageNet ILSVRC. Finally, we show how our method can be extended to take first-order information into account, and illustrate its ability to automatically set layer-wise pruning thresholds, or to perform compression in the limited-data regime.
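For context, the Optimal Brain Damage/Surgeon framework referenced above scores each weight by the loss increase its removal would cause under a local quadratic model, and uses the inverse Hessian both for that score and for a compensating update of the surviving weights. The following minimal NumPy sketch shows one such pruning step; the dense `H_inv` and the function name are illustrative, not the authors' actual implementation:

```python
import numpy as np

def obs_prune_one(w, H_inv):
    """One Optimal Brain Surgeon step: remove the weight whose deletion
    increases the quadratic loss model the least, and compensate the rest.

    w     : (d,) current weights
    H_inv : (d, d) estimate of the inverse Hessian (e.g., from WoodFisher)
    """
    diag = np.diag(H_inv)
    # OBS saliency: rho_q = w_q^2 / (2 * [H^{-1}]_qq)
    saliency = w ** 2 / (2.0 * diag)
    q = int(np.argmin(saliency))
    # Compensating update: delta_w = -(w_q / [H^{-1}]_qq) * H^{-1} e_q
    delta_w = -(w[q] / diag[q]) * H_inv[:, q]
    w_new = w + delta_w
    w_new[q] = 0.0  # enforce exact removal despite numerical error
    return q, w_new
```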


Is Quantum Optimization Ready? An Effort Towards Neural Network Compression using Adiabatic Quantum Computing

Wang, Zhehui, Choong, Benjamin Chen Ming, Huang, Tian, Gerlinghoff, Daniel, Goh, Rick Siow Mong, Liu, Cheng, Luo, Tao

arXiv.org Artificial Intelligence

Quantum optimization is the most mature quantum computing technology to date, providing a promising approach towards efficiently solving complex combinatorial problems. Methods such as adiabatic quantum computing (AQC) have been employed in recent years on important optimization problems across various domains. In deep learning, deep neural networks (DNN) have reached immense sizes to support new predictive capabilities. Optimization of large-scale models is critical for sustainable deployment, but becomes increasingly challenging with ever-growing model sizes and complexity. While quantum optimization is suitable for solving complex problems, its application to DNN optimization is not straightforward, requiring thorough reformulation for compatibility with commercially available quantum devices. In this work, we explore the potential of adopting AQC for fine-grained pruning-quantization of convolutional neural networks. We rework established heuristics to formulate model compression as a quadratic unconstrained binary optimization (QUBO) problem, and assess the solution space offered by commercial quantum annealing devices. Through our exploratory efforts of reformulation, we demonstrate that AQC can achieve effective compression of practical DNN models. Experiments demonstrate that AQC not only outperforms classical algorithms such as genetic algorithms and reinforcement learning in terms of time efficiency but also excels at identifying global optima.
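As a toy illustration of what a QUBO encoding of pruning can look like (a generic sketch, not the paper's formulation): let binary variable x_i = 1 mean weight i is kept, reward keeping high-importance weights on the diagonal, and turn a keep-budget constraint into a quadratic penalty. A brute-force solver stands in for the quantum annealer on this tiny instance:

```python
import itertools
import numpy as np

def pruning_qubo(scores, k, penalty):
    """Build Q so that minimizing x^T Q x over binary x keeps the k
    highest-scoring weights. Objective: -sum_i s_i x_i + penalty*(sum_i x_i - k)^2.
    Using x_i^2 = x_i, the penalty folds into Q's diagonal and cross terms."""
    n = len(scores)
    Q = np.full((n, n), float(penalty))   # cross terms of the budget penalty
    np.fill_diagonal(Q, -np.asarray(scores, dtype=float) + penalty * (1.0 - 2.0 * k))
    return Q

def brute_force_qubo(Q):
    """Exhaustive QUBO solver for tiny n (stand-in for a quantum annealer)."""
    n = Q.shape[0]
    best_x, best_e = None, np.inf
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits, dtype=float)
        e = x @ Q @ x
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

# Keep 2 of 4 weights; scores could be, e.g., squared weight magnitudes.
scores = [0.9, 0.1, 0.5, 0.3]
x, _ = brute_force_qubo(pruning_qubo(scores, k=2, penalty=10.0))
print(x)  # -> [1. 0. 1. 0.]: the two highest-scoring weights are kept
```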


Review for NeurIPS paper: WoodFisher: Efficient Second-Order Approximation for Neural Network Compression

Neural Information Processing Systems

Weaknesses: --- Missing details about lambda. While it is mentioned at line 138, the dampening parameter lambda does not appear in the experimental section of the main body, and I only found a value of 1e-5 in the appendix (l799). How do you select its value? I expect your final algorithm to be very sensitive to lambda, since \delta_L as defined in eq. 4 will select directions with the smallest curvature. Another comment about lambda: if you set it to a very large value k, it becomes dominant compared to the eigenvalues of F, and your technique basically amounts to magnitude pruning. In that regard, it means that MP is just a special case of your technique, obtained with a large dampening value.
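The reviewer's limiting argument is easy to check numerically: with a dampened Fisher F + lambda*I, as lambda grows, (F + lambda*I)^{-1} approaches I/lambda, so the OBS saliency w_q^2 / (2 [(F + lambda*I)^{-1}]_qq) approaches lambda * w_q^2 / 2, which ranks weights purely by magnitude. A quick sketch (the random F here is only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
G = rng.normal(size=(40, d))
F = G.T @ G / 40.0                      # empirical-Fisher-like PSD matrix
w = rng.normal(size=d)

def obs_ranking(lam):
    H_inv = np.linalg.inv(F + lam * np.eye(d))
    saliency = w ** 2 / (2.0 * np.diag(H_inv))
    return np.argsort(saliency)         # pruning order, least salient first

print(obs_ranking(1e-5))                # curvature-aware order
print(obs_ranking(1e6))                 # matches the magnitude order below
print(np.argsort(w ** 2))               # magnitude-pruning order
```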


Review for NeurIPS paper: WoodFisher: Efficient Second-Order Approximation for Neural Network Compression

Neural Information Processing Systems

The focus of the submission is training neural networks using 2nd-order information. In particular, the goal of the work is the approximation of the inverse of the empirical Fisher matrix as defined in the displayed equation under (1). The authors notice that the empirical Fisher is an average of dyads (a a^T, where T denotes transposition), hence its inverse can be computed recursively via the Woodbury matrix identity. The resulting inverse is applied to pruning of convolutional neural networks (CNNs) and is compared against other unstructured pruning methods. Training and pruning neural networks are central problems of machine learning.
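Concretely, since the empirical Fisher F_hat = (1/N) sum_n g_n g_n^T grows by one rank-one dyad per gradient, its inverse can be maintained with the Sherman-Morrison special case of the Woodbury identity. A minimal NumPy sketch of this recursion follows; the dense matrices are for clarity only, and the sanity check against a direct inverse is mine, not from the paper:

```python
import numpy as np

def woodfisher_inverse(grads, lam=1e-5):
    """Recursively build (F_hat + lam*I)^{-1}, F_hat = (1/N) sum_n g_n g_n^T,
    via Sherman-Morrison: adding (1/N) g g^T to A updates
    A^{-1} -= (A^{-1} g)(A^{-1} g)^T / (N + g^T A^{-1} g).

    grads : (N, d) per-sample gradients
    """
    N, d = grads.shape
    H_inv = np.eye(d) / lam              # inverse of the dampened start, lam*I
    for g in grads:
        Hg = H_inv @ g
        H_inv -= np.outer(Hg, Hg) / (N + g @ Hg)
    return H_inv

# Sanity check against a direct inverse on a tiny problem.
rng = np.random.default_rng(0)
grads = rng.normal(size=(32, 5))
F = grads.T @ grads / 32.0
direct = np.linalg.inv(F + 1e-5 * np.eye(5))
assert np.allclose(woodfisher_inverse(grads), direct)
```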


Reviews: Frequency-Domain Dynamic Pruning for Convolutional Neural Networks

Neural Information Processing Systems

[My only major issue has been addressed, and the same is true for my minor questions and issues, except for (5), which I do not consider crucial, particularly given that the authors only have one page for their response. Since most of my issues were questions that I had or minor details that should be added to the paper, I have raised my confidence of reproducibility to 3.] The paper introduces a novel method for parameter pruning in convolutional neural networks that operates in the frequency domain. The latter is a natural domain in which to determine parameter importance for convolutional filters: most filters of a trained neural network are smooth and thus have high energy (i.e., large coefficient magnitudes) concentrated in the low-frequency bands. An additional advantage of the method is that pruning is not performed as a single post-training step; instead, parameters can be pruned and re-introduced continuously during training, which has been shown to be beneficial in previous pruning schemes. The method is evaluated on three different image classification tasks (with a separate network architecture each) and outperforms the methods it is compared against.
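A generic sketch of the frequency-domain idea (not the paper's exact criterion or schedule): transform each spatial filter with a 2-D DCT, zero the smallest-magnitude coefficients, and transform back, pruning in the domain where smooth filters are sparse. SciPy is assumed for the transforms:

```python
import numpy as np
from scipy.fft import dctn, idctn

def prune_filter_in_frequency_domain(filt, keep_ratio=0.25):
    """Zero the smallest-magnitude DCT coefficients of a 2-D conv filter.

    filt : (kh, kw) spatial filter
    """
    coeffs = dctn(filt, norm="ortho")
    flat = np.abs(coeffs).ravel()
    k = max(1, int(keep_ratio * flat.size))
    threshold = np.sort(flat)[-k]            # magnitude of k-th largest coeff
    mask = np.abs(coeffs) >= threshold
    return idctn(coeffs * mask, norm="ortho")

filt = np.outer(np.hanning(5), np.hanning(5))  # a smooth example filter
pruned = prune_filter_in_frequency_domain(filt)
print(np.max(np.abs(filt - pruned)))           # small: energy was low-frequency
```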


Unified Framework for Neural Network Compression via Decomposition and Optimal Rank Selection

Aghababaei-Harandi, Ali, Amini, Massih-Reza

arXiv.org Artificial Intelligence

Despite their high accuracy, complex neural networks demand significant computational resources, posing challenges for deployment on resource-constrained devices such as mobile phones and embedded systems. Compression algorithms have been developed to address these challenges by reducing model size and computational demands while maintaining accuracy. Among these approaches, factorization methods based on tensor decomposition are theoretically sound and effective. However, they face difficulties in selecting the appropriate rank for decomposition. This paper tackles this issue by presenting a unified framework that simultaneously applies decomposition and optimal rank selection, employing a composite compression loss within defined rank constraints. Our approach includes an automatic rank search in a continuous space, identifying optimal rank configurations without the use of training data, which makes it computationally efficient. Combined with a subsequent fine-tuning step, our approach keeps the performance of highly compressed models on par with their original counterparts. Using various benchmark datasets, we demonstrate the efficacy of our method through a comprehensive analysis.
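For intuition, the decomposition step such frameworks build on can be sketched as a truncated SVD of a layer's weight matrix. The energy-threshold rank heuristic below is only a stand-in for the paper's continuous rank search:

```python
import numpy as np

def factorize(W, energy=0.95):
    """Replace W (m, n) by factors A (m, r) and B (r, n), with the smallest
    rank r retaining `energy` of the total squared singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(cum, energy)) + 1
    A = U[:, :r] * s[:r]                 # fold singular values into A
    B = Vt[:r]
    return A, B                          # W ~= A @ B, r*(m+n) params vs m*n

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 8)) @ rng.normal(size=(8, 64))   # exactly rank 8
A, B = factorize(W)
print(A.shape[1], np.linalg.norm(W - A @ B) / np.linalg.norm(W))
```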


Understanding the Effect of the Long Tail on Neural Network Compression

Dam, Harvey, Joseph, Vinu, Bhaskara, Aditya, Gopalakrishnan, Ganesh, Muralidharan, Saurav, Garland, Michael

arXiv.org Artificial Intelligence

Network compression is now a mature sub-field of neural network research: over the last decade, significant progress has been made towards reducing the size of models and speeding up inference, while maintaining classification accuracy. However, many works have observed that focusing on just the overall accuracy can be misguided. For example, it has been shown that mismatches between the full and compressed models can be biased towards under-represented classes. This raises an important research question: can we achieve network compression while maintaining "semantic equivalence" with the original network? In this work, we study this question in the context of the "long tail" phenomenon in computer vision datasets observed by Feldman et al. They argue that memorization of certain inputs (appropriately defined) is essential to achieving good generalization. As compression limits the capacity of a network (and hence also its ability to memorize), we study the question: are mismatches between the full and compressed models correlated with the memorized training data? We present positive evidence in this direction for image classification tasks, by considering different base architectures and compression schemes.
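A sketch of the kind of measurement this question calls for (all names hypothetical, and the memorization scores are assumed precomputed, e.g., Feldman-style influence estimates): flag the training examples where the compressed model disagrees with the full model, then compare memorization scores across the two groups.

```python
import numpy as np

def mismatch_vs_memorization(full_preds, compressed_preds, mem_scores):
    """Mean memorization score of examples where the compressed model
    disagrees with the full model vs. where it agrees.

    full_preds, compressed_preds : (N,) predicted class labels
    mem_scores : (N,) precomputed per-example memorization estimates
    """
    mismatched = full_preds != compressed_preds
    return mem_scores[mismatched].mean(), mem_scores[~mismatched].mean()

# Illustrative synthetic data: mismatches made more likely for memorized points.
rng = np.random.default_rng(0)
mem = rng.beta(0.5, 5.0, size=1000)          # long-tailed scores in [0, 1]
preds_full = np.zeros(1000, dtype=int)
preds_comp = (rng.random(1000) < 0.05 + 0.4 * mem).astype(int)
print(mismatch_vs_memorization(preds_full, preds_comp, mem))
```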


Low Rank Optimization for Efficient Deep Learning: Making A Balance between Compact Architecture and Fast Training

Ou, Xinwei, Chen, Zhangxin, Zhu, Ce, Liu, Yipeng

arXiv.org Artificial Intelligence

Deep neural networks have achieved great success in many data processing applications. However, their high computational complexity and storage cost make deep learning hard to use on resource-constrained devices, and the associated power consumption is not environmentally friendly. In this paper, we focus on low-rank optimization for efficient deep learning techniques. In the space domain, deep neural networks are compressed by low-rank approximation of the network parameters, which directly reduces the storage requirement through a smaller number of network parameters. In the time domain, the network parameters can be trained in a few subspaces, which enables efficient training with fast convergence. Model compression in the spatial domain is summarized into three categories: pre-train, pre-set, and compression-aware methods. Together with a series of integrable techniques, such as sparse pruning, quantization, and entropy coding, these can be assembled into an integrated framework with lower computational complexity and storage. Besides summarizing recent technical advances, we offer two findings to motivate future work: one is that the effective rank outperforms other sparsity measures for network compression; the other is the spatial-temporal balance for tensorized neural networks.
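The "effective rank" mentioned in the findings has a standard definition due to Roy and Vetterli, plausibly the one the survey refers to: the exponential of the entropy of the normalized singular-value distribution, a smooth, scale-invariant surrogate for matrix rank. A small sketch:

```python
import numpy as np

def effective_rank(W, eps=1e-12):
    """exp(entropy) of the normalized singular values of W."""
    s = np.linalg.svd(W, compute_uv=False)
    p = s / (s.sum() + eps)
    entropy = -np.sum(p * np.log(p + eps))
    return float(np.exp(entropy))

rng = np.random.default_rng(0)
low = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 50))
print(effective_rank(low))         # at most 4: only 4 nonzero singular values
print(effective_rank(np.eye(50)))  # exactly 50: uniform spectrum
```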


Neural Network Compression for Noisy Storage Devices

Isik, Berivan, Choi, Kristy, Zheng, Xin, Weissman, Tsachy, Ermon, Stefano, Wong, H. -S. Philip, Alaghi, Armin

arXiv.org Artificial Intelligence

Compression and efficient storage of neural network (NN) parameters is critical for applications that run on resource-constrained devices. Despite the significant progress in NN model compression, there has been considerably less investigation in the actual physical storage of NN parameters. Conventionally, model compression and physical storage are decoupled, as digital storage media with error-correcting codes (ECCs) provide robust error-free storage. However, this decoupled approach is inefficient as it ignores the overparameterization present in most NNs and forces the memory device to allocate the same amount of resources to every bit of information regardless of its importance. In this work, we investigate analog memory devices as an alternative to digital media -- one that naturally provides a way to add more protection for significant bits unlike its counterpart, but is noisy and may compromise the stored model's performance if used naively. We develop a variety of robust coding strategies for NN weight storage on analog devices, and propose an approach to jointly optimize model compression and memory resource allocation. We then demonstrate the efficacy of our approach on models trained on MNIST, CIFAR-10 and ImageNet datasets for existing compression techniques. Compared to conventional error-free digital storage, our method reduces the memory footprint by up to one order of magnitude, without significantly compromising the stored model's accuracy.
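A toy sketch of the importance-aware protection idea described above (purely illustrative, not the authors' coding scheme): simulate analog storage as additive Gaussian noise on the weights, and allocate a fixed noise budget so that high-magnitude weights (a crude importance proxy) receive less noise than near-zero ones.

```python
import numpy as np

def store_with_importance_aware_noise(w, noise_budget=0.01, seed=0):
    """Simulate analog storage: perturb each weight with Gaussian noise,
    allocating per-weight noise std inversely to |w| under a fixed budget."""
    importance = np.abs(w) + 1e-8
    inv = 1.0 / importance
    sigma = noise_budget * inv / inv.sum() * w.size   # more noise where unimportant
    return w + np.random.default_rng(seed).normal(scale=sigma)

rng = np.random.default_rng(1)
w = rng.normal(size=1000) * (rng.random(1000) < 0.2)  # sparse-ish weight vector
noisy = store_with_importance_aware_noise(w)
big = np.abs(w) > 0.5
# Large weights are nearly unchanged; near-zero weights absorb the noise.
print(np.abs(noisy - w)[big].mean(), np.abs(noisy - w)[~big].mean())
```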