Data Distillation: A Survey
Noveen Sachdeva, Julian McAuley
arXiv.org Artificial Intelligence
The popularity of deep learning has led to the curation of a vast number of massive and multifarious datasets. Although deep models achieve close-to-human performance on individual tasks, training these parameter-hungry models on large datasets poses multi-faceted problems, such as (a) long model-training times; (b) slow research iteration; and (c) poor eco-sustainability. As an alternative to training on the full dataset, data distillation approaches aim to synthesize terse data summaries that can serve as effective drop-in replacements for the original dataset in scenarios such as model training, inference, and architecture search. In this survey, we present a formal framework for data distillation, along with a detailed taxonomy of existing approaches. Additionally, we cover data distillation approaches for different data modalities, namely images, graphs, and user-item interactions (recommender systems), while also identifying current challenges and future research directions.
Sep-26-2023
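The abstract describes data distillation as synthesizing a small data summary that can stand in for the full dataset during training. A common way to pose this is as a bilevel problem: choose a tiny synthetic set so that a model trained only on it performs well on the real data. The following is a minimal sketch of that idea under toy assumptions, not the survey's own method. The task (noisy 2-D linear regression), the inner learner (closed-form ridge regression), the finite-difference outer gradient, and the names `ridge_fit`, `outer_loss`, and `distill` are all hypothetical choices made for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Inner problem: closed-form ridge-regression weights trained on (X, y)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def outer_loss(X_syn, y_syn, X_real, y_real, lam):
    """Outer objective: MSE on the real data of a model fit only to the synthetic data."""
    w = ridge_fit(X_syn, y_syn, lam)
    return float(np.mean((X_real @ w - y_real) ** 2))


def distill(X_real, y_real, n_syn=2, lam=1e-2, lr=0.1, steps=200, eps=1e-6, seed=0):
    """Learn n_syn synthetic (x, y) pairs by gradient descent on the outer loss.

    The gradient is estimated with central finite differences (a hypothetical
    choice to keep the sketch dependency-free). Returns the synthetic inputs,
    synthetic labels, and the outer-loss history.
    """
    rng = np.random.default_rng(seed)
    d = X_real.shape[1]
    n_x = n_syn * d

    def unpack(p):
        # First n_syn*d entries are the synthetic inputs, the rest their labels.
        return p[:n_x].reshape(n_syn, d), p[n_x:]

    def loss(p):
        return outer_loss(*unpack(p), X_real, y_real, lam)

    params = rng.normal(size=n_x + n_syn)  # random initial synthetic set
    history = [loss(params)]
    for _ in range(steps):
        grad = np.zeros_like(params)
        for i in range(params.size):
            e = np.zeros_like(params)
            e[i] = eps
            grad[i] = (loss(params + e) - loss(params - e)) / (2 * eps)
        # Backtracking: halve the step until the outer loss actually decreases.
        step = lr
        while step > 1e-8:
            candidate = params - step * grad
            if loss(candidate) < history[-1]:
                params = candidate
                break
            step /= 2
        history.append(loss(params))
    X_syn, y_syn = unpack(params)
    return X_syn, y_syn, history


# Toy usage: distill 500 noisy linear-regression samples into 2 synthetic points.
rng = np.random.default_rng(1)
X_real = rng.normal(size=(500, 2))
w_true = np.array([2.0, -1.0])
y_real = X_real @ w_true + 0.1 * rng.normal(size=500)

X_syn, y_syn, history = distill(X_real, y_real)
print(f"outer loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

The closed-form inner learner keeps the "train on the synthetic set" step exact, so the only approximation is the outer gradient. Practical distillation methods avoid finite differences, for example by backpropagating through the inner training procedure, and they scale to far larger synthetic sets and models than this toy.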