AITopics | dd method

Collaborating Authors

dd method

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Understanding Dataset Distillation via Spectral Filtering

Bo, Deyu, Liu, Songhua, Wang, Xinchao

arXiv.org Artificial IntelligenceMar-3-2025

Dataset distillation (DD) has emerged as a promising approach to compress datasets and speed up model training. However, the underlying connections among various DD methods remain largely unexplored. In this paper, we introduce UniDD, a spectral filtering framework that unifies diverse DD objectives. UniDD interprets each DD objective as a specific filter function that affects the eigenvalues of the feature-feature correlation (FFC) matrix and modulates the frequency components of the feature-label correlation (FLC) matrix. In this way, UniDD reveals that the essence of DD fundamentally lies in matching frequency-specific features. Moreover, according to the filter behaviors, we classify existing methods into low-frequency matching and high-frequency matching, encoding global texture and local details, respectively. However, existing methods rely on fixed filter functions throughout distillation, which cannot capture the low- and high-frequency information simultaneously. To address this limitation, we further propose Curriculum Frequency Matching (CFM), which gradually adjusts the filter parameter to cover both low- and high-frequency information of the FFC and FLC matrices. Extensive experiments on small-scale datasets, such as CIFAR-10/100, and large-scale datasets, including ImageNet-1K, demonstrate the superior performance of CFM over existing baselines and validate the practicality of UniDD.

dataset, dd method, distillation, (16 more...)

arXiv.org Artificial Intelligence

2503.01212

Country: Asia > Singapore (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Dark Distillation: Backdooring Distilled Datasets without Accessing Raw Data

Yang, Ziyuan, Yan, Ming, Zhang, Yi, Zhou, Joey Tianyi

arXiv.org Artificial IntelligenceFeb-6-2025

Dataset distillation (DD) enhances training efficiency and reduces bandwidth by condensing large datasets into smaller synthetic ones. It enables models to achieve performance comparable to those trained on the raw full dataset and has become a widely adopted method for data sharing. However, security concerns in DD remain underexplored. Existing studies typically assume that malicious behavior originates from dataset owners during the initial distillation process, where backdoors are injected into raw datasets. In contrast, this work is the first to address a more realistic and concerning threat: attackers may intercept the dataset distribution process, inject backdoors into the distilled datasets, and redistribute them to users. While distilled datasets were previously considered resistant to backdoor attacks, we demonstrate that they remain vulnerable to such attacks. Furthermore, we show that attackers do not even require access to any raw data to inject the backdoors successfully. Specifically, our approach reconstructs conceptual archetypes for each class from the model trained on the distilled dataset. Backdoors are then injected into these archetypes to update the distilled dataset. Moreover, we ensure the updated dataset not only retains the backdoor but also preserves the original optimization trajectory, thus maintaining the knowledge of the raw dataset. To achieve this, a hybrid loss is designed to integrate backdoor information along the benign optimization trajectory, ensuring that previously learned information is not forgotten. Extensive experiments demonstrate that distilled datasets are highly vulnerable to backdoor attacks, with risks pervasive across various raw datasets, distillation methods, and downstream training strategies. Moreover, our attack method is efficient, capable of synthesizing a malicious distilled dataset in under one minute in certain cases.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.04229

Country:

North America > United States > District of Columbia > Washington (0.05)
Asia > Singapore > Central Region > Singapore (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Exploring the Impact of Dataset Bias on Dataset Distillation

Lu, Yao, Gu, Jianyang, Chen, Xuguang, Vahidian, Saeed, Xuan, Qi

arXiv.org Artificial IntelligenceMar-24-2024

Dataset Distillation (DD) is a promising technique to synthesize a smaller dataset that preserves essential information from the original dataset. This synthetic dataset can serve as a substitute for the original large-scale one, and help alleviate the training workload. However, current DD methods typically operate under the assumption that the dataset is unbiased, overlooking potential bias issues within the dataset itself. To fill in this blank, we systematically investigate the influence of dataset bias on DD. To the best of our knowledge, this is the first exploration in the DD domain. Given that there are no suitable biased datasets for DD, we first construct two biased datasets, CMNIST-DD and CCIFAR10-DD, to establish a foundation for subsequent analysis. Then we utilize existing DD methods to generate synthetic datasets on CMNIST-DD and CCIFAR10-DD, and evaluate their performance following the standard process. Experiments demonstrate that biases present in the original dataset significantly impact the performance of the synthetic dataset in most cases, which highlights the necessity of identifying and mitigating biases in the original datasets during DD. Finally, we reformulate DD within the context of a biased dataset. Our code along with biased datasets are available at https://github.com/yaolu-zjut/Biased-DD.

dataset, distillation, synthetic dataset, (14 more...)

arXiv.org Artificial Intelligence

2403.16028

Country: Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Boosting the Cross-Architecture Generalization of Dataset Distillation through an Empirical Study

Zhao, Lirui, Zhang, Yuxin, Lin, Mingbao, Chao, Fei, Ji, Rongrong

arXiv.org Artificial IntelligenceDec-9-2023

The poor cross-architecture generalization of dataset distillation greatly weakens its practical significance. This paper attempts to mitigate this issue through an empirical study, which suggests that the synthetic datasets undergo an inductive bias towards the distillation model. Therefore, the evaluation model is strictly confined to having similar architectures of the distillation model. We propose a novel method of EvaLuation with distillation Feature (ELF), which utilizes features from intermediate layers of the distillation model for the cross-architecture evaluation. In this manner, the evaluation model learns from bias-free knowledge therefore its architecture becomes unfettered while retaining performance. By performing extensive experiments, we successfully prove that ELF can well enhance the cross-architecture generalization of current DD methods. Code of this project is at \url{https://github.com/Lirui-Zhao/ELF}.

dataset, distillation model, evaluation model, (13 more...)

arXiv.org Artificial Intelligence

2312.05598

Country:

North America > United States > California > San Mateo County > Menlo Park (0.04)
Asia > China > Fujian Province > Xiamen (0.04)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
(3 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

Add feedback

A working likelihood approach to support vector regression with a data-driven insensitivity parameter

Wu, Jinran, Wang, You-Gan

arXiv.org Machine LearningMar-8-2020

The insensitive parameter in support vector regression determines the set of support vectors that greatly impacts the prediction. A data-driven approach is proposed to determine an approximate value for this insensitive parameter by minimizing a generalized loss function originating from the likelihood principle. This data-driven support vector regression also statistically standardizes samples using the scale of noises. Nonlinear and linear numerical simulations with three types of noises ($\epsilon$-Laplacian distribution, normal distribution, and uniform distribution), and in addition, five real benchmark data sets, are used to test the capacity of the proposed method. Based on all of the simulations and the five case studies, the proposed support vector regression using a working likelihood, data-driven insensitive parameter is superior and has lower computational costs.

dd method, insensitive parameter, noise, (14 more...)

arXiv.org Machine Learning

2003.03893

Country:

Oceania > Australia > Queensland > Brisbane (0.04)
Asia > China > Yunnan Province > Kunming (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback