AITopics | model compression technique

model compression technique

Aligned Structured Sparsity Learning for Efficient Image Super-Resolution

Neural Information Processing Systems, Feb-7-2026, 14:56:43 GMT

Lightweight image super-resolution (SR) networks have obtained promising results with moderate model size.

artificial intelligence, machine learning, regularization, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

Aligned Structured Sparsity Learning for Efficient Image Super-Resolution

Neural Information Processing Systems, Dec-23-2025, 19:28:14 GMT

Lightweight image super-resolution (SR) networks have obtained promising results with moderate model size. Many SR methods have focused on designing lightweight architectures, but neglect to further reduce the redundancy of network parameters. On the other hand, model compression techniques, like neural architecture search and knowledge distillation, typically consume considerable memory and computation resources. In contrast, network pruning is a cheap and effective model compression technique. However, it is hard to apply to SR networks directly, because filter pruning for residual blocks is notoriously tricky.
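One reason residual blocks complicate filter pruning is that layers joined by a skip connection are added element-wise, so they must keep the same channel indices. The following is a minimal sketch of that constraint in plain Python, not the method proposed in the paper. It assumes L1-norm filter scoring, and the function names and toy weights are hypothetical.

```python
# Illustrative sketch only; NOT the method proposed in the paper.
# Each "layer" is a list of filters; each filter is a flat list of weights.

def l1_norm(filter_weights):
    """Sum of absolute weights, a common importance score for a filter."""
    return sum(abs(w) for w in filter_weights)

def independent_keep_indices(layer, keep):
    """Indices of the `keep` highest-scoring filters in a single layer."""
    ranked = sorted(range(len(layer)), key=lambda i: l1_norm(layer[i]), reverse=True)
    return sorted(ranked[:keep])

def aligned_keep_indices(layers, keep):
    """One shared index set, chosen by summing scores across all layers."""
    n = len(layers[0])
    totals = [sum(l1_norm(layer[i]) for layer in layers) for i in range(n)]
    ranked = sorted(range(n), key=lambda i: totals[i], reverse=True)
    return sorted(ranked[:keep])

# Two toy layers whose outputs are added by a skip connection (hypothetical values).
layer_a = [[0.9, -0.8], [0.1, 0.0], [0.5, 0.4], [0.05, 0.02]]
layer_b = [[0.1, 0.1], [0.7, -0.9], [0.6, 0.3], [0.01, 0.0]]

indep_a = independent_keep_indices(layer_a, keep=2)
indep_b = independent_keep_indices(layer_b, keep=2)
shared = aligned_keep_indices([layer_a, layer_b], keep=2)

print("independent:", indep_a, indep_b)
print("aligned:", shared)
```

Scored independently, the two layers keep different channels (`[0, 2]` and `[1, 2]`), so their outputs could no longer be added channel by channel. Scoring them jointly yields a single index set (`[0, 2]`) that both layers can share.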

aligned structured sparsity learning, efficient image super-resolution, name change, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency

Park, Sihyeong, Jeon, Sungryeol, Lee, Chaelyn, Jeon, Seokhun, Kim, Byung-Soo, Lee, Jemin

arXiv.org Artificial Intelligence, Nov-27-2025

Large language models (LLMs) are widely applied in chatbots, code generators, and search engines. Workloads such as chain-of-thought, complex reasoning, and agent services significantly increase inference cost by invoking the model repeatedly. Optimization methods such as parallelism, compression, and caching have been adopted to reduce costs, but the diverse service requirements make it hard to select the right method. Recently, specialized LLM inference engines have emerged as a key component for integrating the optimization methods into service-oriented infrastructures. However, a systematic study on inference engines is still lacking. This paper provides a comprehensive evaluation of 25 open-source and commercial inference engines. We examine each inference engine in terms of ease-of-use, ease-of-deployment, general-purpose support, scalability, and suitability for throughput- and latency-aware computation. Furthermore, we explore the design goals of each inference engine by investigating the optimization techniques it supports. In addition, we assess the ecosystem maturity of open-source inference engines and discuss the performance and cost policy of commercial solutions. We outline future research directions that include support for complex LLM-based services, support of various hardware, and enhanced security, offering practical guidance to researchers and developers in selecting and designing optimized LLM inference engines. We also provide a public repository to continually track developments in this fast-evolving field: https://github.com/sihyeong/Awesome-LLM-Inference-Engine.

artificial intelligence, large language model, natural language, (20 more...)

arXiv.org Artificial Intelligence

arXiv: 2505.01658

Country: Asia > South Korea (0.28)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Workflow (0.92)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.92)
Information Technology > Services (0.67)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

CognitiveArm: Enabling Real-Time EEG-Controlled Prosthetic Arm Using Embodied Machine Learning

Basit, Abdul, Nawaz, Maha, Rehman, Saim, Shafique, Muhammad

arXiv.org Artificial Intelligence, Sep-22-2025

Efficient control of prosthetic limbs via non-invasive brain-computer interfaces (BCIs) requires advanced EEG processing, including pre-filtering, feature extraction, and action prediction, performed in real time on edge AI hardware. Achieving this on resource-constrained devices presents challenges in balancing model complexity, computational efficiency, and latency. We present CognitiveArm, an EEG-driven, brain-controlled prosthetic system implemented on embedded AI hardware, achieving real-time operation without compromising accuracy. The system integrates BrainFlow, an open-source library for EEG data acquisition and streaming, with optimized deep learning (DL) models for precise brain signal classification. Using evolutionary search, we identify Pareto-optimal DL configurations through hyperparameter tuning, optimizer analysis, and window selection, analyzed individually and in ensemble configurations. We apply model compression techniques such as pruning and quantization to optimize models for embedded deployment, balancing efficiency and accuracy. We collected an EEG dataset and designed an annotation pipeline enabling precise labeling of brain signals corresponding to specific intended actions, forming the basis for training our optimized DL models. CognitiveArm also supports voice commands for seamless mode switching, enabling control of the prosthetic arm's 3 degrees of freedom (DoF). Running entirely on embedded hardware, it ensures low latency and real-time responsiveness. A full-scale prototype, interfaced with the OpenBCI UltraCortex Mark IV EEG headset, achieved up to 90% accuracy in classifying three core actions (left, right, idle). Voice integration enables multiplexed, variable movement for everyday tasks (e.g., handshake, cup picking), enhancing real-world performance and demonstrating CognitiveArm's potential for advanced prosthetic control.
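As a rough illustration of the quantization step mentioned in the abstract, the sketch below applies symmetric per-tensor int8 quantization to a list of weights. This is a generic textbook scheme in plain Python, assumed here for illustration. The abstract does not state which quantization scheme CognitiveArm uses, and the function names and sample weights are hypothetical.

```python
# Generic symmetric int8 quantization sketch; the scheme actually used
# by the authors is not specified in the abstract.

def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.0, 1.0]  # hypothetical layer weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("codes:", q)
print("max reconstruction error:", max_error)
```

Each weight is stored as one 8-bit integer plus a single shared scale, and the reconstruction error per weight is bounded by half the scale. That bound is the accuracy cost traded for the smaller memory footprint.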

accuracy, machine learning, real time system, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/DAC63849.2025.11132917

arXiv: 2508.07731

Country: Asia > Middle East > UAE (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Orthopedics/Orthopedic Surgery (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(2 more...)

RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models

Chen, Yuxuan, Li, Xiao

arXiv.org Artificial Intelligence, Jun-24-2025

Vision-Language-Action models (VLA) have demonstrated remarkable capabilities and promising potential in solving complex robotic manipulation tasks. However, their substantial parameter sizes and high inference latency pose significant challenges for real-world deployment, particularly on resource-constrained robotic platforms. To address this issue, we begin by conducting an extensive empirical study to explore the effectiveness of model compression techniques when applied to VLAs. Building on the insights gained from these preliminary experiments, we propose RLRC, a three-stage recovery method for compressed VLAs, including structured pruning, performance recovery based on SFT and RL, and further quantization. RLRC achieves up to an 8× reduction in memory usage and a 2.3× improvement in inference throughput, while maintaining or even surpassing the original VLA's task success rate. Extensive experiments show that RLRC consistently outperforms existing compression baselines, demonstrating strong potential for on-device deployment of VLAs.

I. INTRODUCTION

Recent advances in the field of robot learning have demonstrated new breakthroughs in both the accuracy and generalization of robotic policies for task execution. Since the introduction of RT-2 [1], Vision-Language-Action (VLA) models have attracted increasing attention. These models, built upon large foundation models, exhibit strong generalization capabilities, suggesting a promising path toward the development of general-purpose robots capable of performing a wide range of manipulation tasks. VLA models leverage the general knowledge embedded in pretrained Vision-Language Models (VLMs), while possessing the capability to comprehend language instructions, perceive the visual environment, and generate appropriate actions [2][3][4].

large language model, machine learning, quantization, (18 more...)

arXiv.org Artificial Intelligence

arXiv: 2506.17639

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.83)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)

Aligned Structured Sparsity Learning for Efficient Image Super-Resolution

Neural Information Processing Systems, Oct-9-2024, 14:52:53 GMT

aligned structured sparsity learning, efficient image super-resolution, model compression technique, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (0.65)
Information Technology > Artificial Intelligence > Machine Learning (0.43)

Adversarial Attacks on Machine Learning in Embedded and IoT Platforms

Westbrook, Christian, Pasricha, Sudeep

arXiv.org Artificial Intelligence, Mar-3-2023

Machine learning (ML) algorithms are increasingly being integrated into embedded and IoT systems that surround us, and they are vulnerable to adversarial attacks. The deployment of these ML algorithms on resource-limited embedded platforms also requires the use of model compression techniques. The impact of such model compression techniques on adversarial robustness in ML is an important and emerging area of research. This article provides an overview of the landscape of adversarial attacks and ML model compression techniques relevant to embedded systems. We then describe efforts that seek to understand the relationship between adversarial attacks and ML model compression before discussing open problems in this area.

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

arXiv: 2303.02214

Country: Asia (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Introduction to DistilBERT in Student Model - Analytics Vidhya

#artificialintelligence, Nov-3-2022, 18:35:24 GMT

In 2018, Google AI researchers released the BERT model, a landmark work that transformed the NLP domain. However, BERT had drawbacks: it was bulky and hence somewhat slow. To address these issues, researchers from Hugging Face proposed DistilBERT, which employs knowledge distillation for model compression. In this article, we will look at this work in more detail.
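To make the idea of knowledge distillation concrete, here is a minimal sketch of a temperature-softened distillation loss in plain Python. It shows only the generic soft-target term, in which a student is trained to match a teacher's output distribution. It is not necessarily the exact training objective used for DistilBERT, and the logits, temperature, and function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (higher = softer)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

teacher = [4.0, 1.0, 0.5]        # hypothetical teacher logits
close_student = [3.5, 1.2, 0.4]  # student that roughly agrees
far_student = [0.5, 1.0, 4.0]    # student that disagrees

loss_same = distillation_loss(teacher, teacher)
loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)

print(loss_same, loss_close, loss_far)
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge, which is the signal a smaller student model is trained to minimize.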

distilbert, knowledge distillation, probability, (10 more...)

#artificialintelligence

Country: North America > Canada > Ontario > Toronto (0.05)

Genre: Research Report (0.68)

Industry: Education > Educational Technology > Educational Software (0.42)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

How Knowledge Distillation is utilised part 1 (Artificial Intelligence + Data Mining)

#artificialintelligence, Oct-28-2022, 11:50:06 GMT

Abstract: We present Referee, a novel framework for sentence summarization that can be trained reference-free (i.e., requiring no gold summaries for supervision), while allowing direct control for compression ratio. Our work is the first to demonstrate that reference-free, controlled sentence summarization is feasible via the conceptual framework of Symbolic Knowledge Distillation (West et al., 2022), where latent knowledge in pre-trained language models is distilled via explicit examples sampled from the teacher models, further purified with three types of filters: length, fidelity, and Information Bottleneck. Moreover, we uniquely propose iterative distillation of knowledge, where student models from the previous iteration of distillation serve as teacher models in the next iteration. Starting off from a relatively modest set of GPT3-generated summaries, we demonstrate how iterative knowledge distillation can lead to considerably smaller, but better summarizers with sharper controllability. A useful by-product of this iterative distillation process is a high-quality dataset of sentence-summary pairs with varying degrees of compression ratios.

knowledge distillation, machine domain, teacher model, (11 more...)

#artificialintelligence

Industry: Education (0.42)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.38)

Model Compression for Resource-Constrained Mobile Robots

Souroulla, Timotheos, Hata, Alberto, Terra, Ahmad, Özkahraman, Özer, Inam, Rafia

arXiv.org Artificial Intelligence, Jul-20-2022

The number of mobile robots with constrained computing resources that need to execute complex machine learning models has been increasing during the past decade. Commonly, these robots rely on edge infrastructure accessible over wireless communication to execute computationally heavy, complex tasks. However, the edge might become unavailable, forcing the tasks to be executed on the robot itself. This work focuses on making it possible to execute the tasks on the robots by reducing the complexity and the total number of parameters of pre-trained computer vision models. This is achieved by using model compression techniques such as Pruning and Knowledge Distillation. These compression techniques have strong theoretical and practical foundations, but their combined usage has not been widely explored in the literature. Therefore, this work especially focuses on investigating the effects of combining these two compression techniques. The results of this work reveal that up to 90% of the total number of parameters of a computer vision model can be removed without any considerable reduction in the model's accuracy.
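To give a sense of what removing 90% of a model's parameters means mechanically, the sketch below applies unstructured magnitude pruning to a toy weight list in plain Python. It is an assumed, simplified stand-in: the abstract does not say which pruning criterion the authors use, and the function name and weights are hypothetical.

```python
# Unstructured magnitude pruning sketch; the pruning criterion used in
# the paper is not specified in the abstract.

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = round(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

# Ten hypothetical weights; one dominates in magnitude.
weights = [0.02, -0.9, 0.01, 0.03, -0.04, 0.05, 0.08, -0.06, 0.07, -0.015]
pruned = magnitude_prune(weights, sparsity=0.9)
remaining = [w for w in pruned if w != 0.0]

print("pruned:", pruned)
print("kept", len(remaining), "of", len(weights), "weights")
```

In practice the pruned model is usually fine-tuned afterwards. Knowledge distillation from the original model is one way to supply that recovery signal, which is the combination the abstract investigates.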

experiment, knowledge distillation, student model, (11 more...)

arXiv.org Artificial Intelligence

doi: 10.4204/EPTCS.362.7

arXiv: 2207.10082

Country:

Europe > Sweden > Stockholm > Stockholm (0.05)
South America > Brazil (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Locomotion (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
