AITopics | distillation model

Collaborating Authors

distillation model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Rectifying Soft-Label Entangled Bias in Long-Tailed Dataset Distillation

Neural Information Processing SystemsJun-22-2026, 15:01:25 GMT

However, existing research primarily focuses on balanced datasets and struggles to perform under real-world long-tailed distributions. In this work, we emphasize the critical role of soft labels in long-tailed dataset distillation and uncover the underlying mechanisms contributing to performance degradation. Specifically, we derive an imbalance-aware generalization bound for model trained on distilled dataset. We then identify two primary sources of soft-label bias, which originate from the distillation model and the distilled images, through systematic perturbation of the data imbalance levels. To address this, we propose ADSA, an Adaptive Soft-label Alignment module that calibrates the entangled biases. This lightweight module integrates seamlessly into existing distillation pipelines and consistently improves performance. On ImageNet-1k-LT with EDC and IPC=50, ADSA improves tailclass accuracy by up to 11.8% and raises overall accuracy to 41.4%. Extensive experiments demonstrate that ADSA provides a robust and generalizable solution under limited label budgets and across a range of distillation techniques.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Rectifying Soft-Label Entangled Bias in Long-Tailed Dataset Distillation

Jiang, Chenyang, Zhao, Hang, Zhang, Xinyu, Li, Zhengcen, Shan, Qiben, Wu, Shaocong, Su, Jingyong

arXiv.org Artificial IntelligenceNov-25-2025

Dataset distillation compresses large-scale datasets into compact, highly informative synthetic data, significantly reducing storage and training costs. However, existing research primarily focuses on balanced datasets and struggles to perform under real-world long-tailed distributions. In this work, we emphasize the critical role of soft labels in long-tailed dataset distillation and uncover the underlying mechanisms contributing to performance degradation. Specifically, we derive an imbalance-aware generalization bound for model trained on distilled dataset. We then identify two primary sources of soft-label bias, which originate from the distillation model and the distilled images, through systematic perturbation of the data imbalance levels. To address this, we propose ADSA, an Adaptive Soft-label Alignment module that calibrates the entangled biases. This lightweight module integrates seamlessly into existing distillation pipelines and consistently improves performance. On ImageNet-1k-LT with EDC and IPC=50, ADSA improves tail-class accuracy by up to 11.8% and raises overall accuracy to 41.4%. Extensive experiments demonstrate that ADSA provides a robust and generalizable solution under limited label budgets and across a range of distillation techniques. Code is available at: https://github.com/j-cyoung/ADSA_DD.git.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.17914

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Ensemble Distribution Distillation for Self-Supervised Human Activity Recognition

Nolan, Matthew, Yao, Lina, Davidson, Robert

arXiv.org Artificial IntelligenceSep-11-2025

Human Activity Recognition (HAR) has seen significant advancements with the adoption of deep learning techniques, yet challenges remain in terms of data requirements, reliability and robustness. This paper explores a novel application of Ensemble Distribution Distillation (EDD) within a self-supervised learning framework for HAR aimed at overcoming these challenges. By leveraging unlabeled data and a partially supervised training strategy, our approach yields an increase in predictive accuracy, robust estimates of uncertainty, and substantial increases in robustness against adversarial perturbation; thereby significantly improving reliability in real-world scenarios without increasing computational complexity at inference. We demonstrate this with an evaluation on several publicly available datasets. The contributions of this work include the development of a self-supervised EDD framework, an innovative data augmentation technique designed for HAR, and empirical validation of the proposed method's effectiveness in increasing robustness and reliability.

artificial intelligence, inductive learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2509.08225

Country:

Oceania > Australia (0.46)
Asia (0.28)

Genre:

Research Report > New Finding (0.94)
Overview (0.88)

Industry:

Information Technology (0.93)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.89)

Add feedback

Do GNN-based QEC Decoders Require Classical Knowledge? Evaluating the Efficacy of Knowledge Distillation from MWPM

Ikeda, Ryota

arXiv.org Artificial IntelligenceAug-7-2025

Quantum computers hold the potential to outperform classical computers on specific computational problems, but their realization is hindered by the fragility of qubits due to decoherence. Quantum Error Correction (QEC) is an essential technology to overcome this challenge, enabling the detection and correction of errors by redundantly encoding a single logical qubit into multiple physical qubits. The performance of QEC is critically dependent on the classical "decoder" algorithm, which interprets the error syndrome to deduce the appropriate correction operation. The standard decoder for the surface code, Minimum-Weight Perfect Matching (MWPM) [1], performs well under a simplified noise model where errors are assumed to be independent and identically distributed (i.i.d.). However, noise in real quantum devices exhibits complex spatio-temporal correlations, and the discrepancy between the theoretical model and reality can degrade the decoder's performance. To address this, decoders based on machine learning, such as Graph Neural Networks (GNNs), have emerged as a promising alternative [2, 3]. GNNs have the ability to learn error patterns directly from data. It is generally anticipated that injecting physical prior knowledge into a GNN should improve its performance. Specifically, "knowledge distillation" [4], which transfers the knowledge of theoretical error structures from MWPM to a GNN, is considered a concrete method to realize this hypothesis. 1

artificial intelligence, knowledge distillation, machine learning, (10 more...)

arXiv.org Artificial Intelligence

2508.03782

Genre: Research Report > New Finding (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.35)

Add feedback

Interview with Yuki Mitsufuji: Text-to-sound generation

AIHubJul-29-2025, 13:29:45 GMT

Earlier this year, we spoke to Yuki Mitsufuji, Lead Research Scientist at Sony AI, about work concerning different aspects of image generation. Yuki and his team have since extended their work to sound generation, presenting work at ICLR 2025 entitled: SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation. We caught up with Yuki to find out more. Creating sounds for different types of multimedia, such as video games and movies, takes a lot of experimenting, as artists try to match sounds to their evolving creative ideas. New high-quality diffusion-based Text-to-Sound (T2S) generative models can help with this process, but they are often slow, which makes it harder for creators to experiment quickly.

artificial intelligence, machine learning, yuki mitsufuji, (11 more...)

AIHub

Genre:

Research Report (0.73)
Personal > Interview (0.57)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.53)
Information Technology > Artificial Intelligence > Vision (0.37)

Add feedback

Robust Planning for Autonomous Vehicles with Diffusion-Based Failure Samplers

Wang, Juanran, Schlichting, Marc R., Kochenderfer, Mykel J.

arXiv.org Artificial IntelligenceJul-17-2025

--High-risk traffic zones such as intersections are a major cause of collisions. This study leverages deep generative models to enhance the safety of autonomous vehicles in an intersection context. We train a 1000-step denoising diffusion probabilistic model to generate collision-causing sensor noise sequences for an autonomous vehicle navigating a four-way intersection based on the current relative position and velocity of an intruder . Using the generative adversarial architecture, the 1000-step model is distilled into a single-step denoising diffusion model which demonstrates fast inference speed while maintaining similar sampling quality. We demonstrate one possible application of the single-step model in building a robust planner for the autonomous vehicle. The planner uses the single-step model to efficiently sample potential failure cases based on the currently measured traffic state to inform its decision-making. Through simulation experiments, the robust planner demonstrates significantly lower failure rate and delay rate compared with the baseline Intelligent Driver Model controller .

artificial intelligence, machine learning, vehicle, (20 more...)

arXiv.org Artificial Intelligence

2507.11991

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report (1.00)

Industry: Transportation (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

A Survey on Pre-Trained Diffusion Model Distillations

Fan, Xuhui, Wu, Zhangkai, Wu, Hongyu

arXiv.org Artificial IntelligenceFeb-12-2025

Diffusion Models~(DMs) have emerged as the dominant approach in Generative Artificial Intelligence (GenAI), owing to their remarkable performance in tasks such as text-to-image synthesis. However, practical DMs, such as stable diffusion, are typically trained on massive datasets and thus usually require large storage. At the same time, many steps may be required, i.e., recursively evaluating the trained neural network, to generate a high-quality image, which results in significant computational costs during sample generation. As a result, distillation methods on pre-trained DM have become widely adopted practices to develop smaller, more efficient models capable of rapid, few-step generation in low-resource environment. When these distillation methods are developed from different perspectives, there is an urgent need for a systematic survey, particularly from a methodological perspective. In this survey, we review distillation methods through three aspects: output loss distillation, trajectory distillation and adversarial distillation. We also discuss current challenges and outline future research directions in the conclusion.

artificial intelligence, distillation, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2502.08364

Country: Oceania > Australia (0.04)

Genre: Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Add feedback

Inference-Time Diffusion Model Distillation

Park, Geon Yeong, Lee, Sang Wan, Ye, Jong Chul

arXiv.org Artificial IntelligenceDec-11-2024

Diffusion distillation models effectively accelerate reverse sampling by compressing the process into fewer steps. However, these models still exhibit a performance gap compared to their pre-trained diffusion model counterparts, exacerbated by distribution shifts and accumulated errors during multi-step sampling. To address this, we introduce Distillation++, a novel inference-time distillation framework that reduces this gap by incorporating teacher-guided refinement during sampling. Inspired by recent advances in conditional sampling, our approach recasts student model sampling as a proximal optimization problem with a score distillation sampling loss (SDS). To this end, we integrate distillation optimization during reverse sampling, which can be viewed as teacher guidance that drives student sampling trajectory towards the clean manifold using pre-trained diffusion models. Thus, Distillation++ improves the denoising process in real-time without additional source data or fine-tuning. Distillation++ demonstrates substantial improvements over state-of-the-art distillation baselines, particularly in early sampling stages, positioning itself as a robust guided sampling process crafted for diffusion distillation models. Code: https://github.com/geonyeong-park/inference_distillation.

artificial intelligence, distillation, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2412.08871

Country: Asia > South Korea > Daejeon > Daejeon (0.04)

Genre: Research Report (1.00)

Industry: Education (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning

Li, Yichuan, Ma, Xiyao, Lu, Sixing, Lee, Kyumin, Liu, Xiaohu, Guo, Chenlei

arXiv.org Artificial IntelligenceMar-12-2024

Large Language models (LLMs) have demonstrated impressive in-context learning (ICL) capabilities, where a LLM makes predictions for a given test input together with a few input-output pairs (demonstrations). Nevertheless, the inclusion of demonstrations leads to a quadratic increase in the computational overhead of the self-attention mechanism. Existing solutions attempt to distill lengthy demonstrations into compact vectors. However, they often require task-specific retraining or compromise LLM's in-context learning performance. To mitigate these challenges, we present Meta dEmonstratioN Distillation (MEND), where a language model learns to distill any lengthy demonstrations into vectors without retraining for a new downstream task. We exploit the knowledge distillation to enhance alignment between MEND and LLM, achieving both efficiency and effectiveness simultaneously. MEND is endowed with the meta-knowledge of distilling demonstrations through a two-stage training process, which includes meta-distillation pretraining and fine-tuning. Comprehensive evaluations across seven diverse ICL task partitions using decoder-only (GPT-2) and encoder-decoder (T5) attest to MEND's prowess. It not only matches but often outperforms the Vanilla ICL as well as other state-of-the-art distillation models, while significantly reducing the computational demands. This innovation promises enhanced scalability and efficiency for the practical deployment of large language models

demonstration, distillation, mend, (14 more...)

arXiv.org Artificial Intelligence

2403.06914

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Boosting the Cross-Architecture Generalization of Dataset Distillation through an Empirical Study

Zhao, Lirui, Zhang, Yuxin, Lin, Mingbao, Chao, Fei, Ji, Rongrong

arXiv.org Artificial IntelligenceDec-9-2023

The poor cross-architecture generalization of dataset distillation greatly weakens its practical significance. This paper attempts to mitigate this issue through an empirical study, which suggests that the synthetic datasets undergo an inductive bias towards the distillation model. Therefore, the evaluation model is strictly confined to having similar architectures of the distillation model. We propose a novel method of EvaLuation with distillation Feature (ELF), which utilizes features from intermediate layers of the distillation model for the cross-architecture evaluation. In this manner, the evaluation model learns from bias-free knowledge therefore its architecture becomes unfettered while retaining performance. By performing extensive experiments, we successfully prove that ELF can well enhance the cross-architecture generalization of current DD methods. Code of this project is at \url{https://github.com/Lirui-Zhao/ELF}.

dataset, distillation model, evaluation model, (13 more...)

arXiv.org Artificial Intelligence

2312.05598

Country:

North America > United States > California > San Mateo County > Menlo Park (0.04)
Asia > China > Fujian Province > Xiamen (0.04)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
(3 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

Add feedback