DMin: Scalable Training Data Influence Estimation for Diffusion Models
Huawei Lin, Yingjie Lao, Weijie Zhao
arXiv.org Artificial Intelligence
Identifying the training data samples that most influence a generated image is a critical task in understanding diffusion models, yet existing influence estimation methods are constrained to small-scale or LoRA-tuned models due to computational limitations. As diffusion models scale up, these methods become impractical. To address this challenge, we propose DMin (Diffusion Model influence), a scalable framework for estimating the influence of each training data sample on a given generated image. By leveraging efficient gradient compression and retrieval techniques, DMin reduces storage requirements from 339.39 TB to only 726 MB and retrieves the top-k most influential training samples in under 1 second, all while maintaining performance. Our empirical results demonstrate DMin is both effective in identifying influential training samples and efficient in terms of computational and storage requirements.
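The core idea sketched in the abstract — compress per-sample training gradients once, then score a generated image against the compressed store by inner product and return the top-k matches — can be illustrated with a toy random-projection sketch. This is not the paper's implementation; the dimensions, the shared Gaussian projection matrix, and the synthetic gradients are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: n training samples, each with a d-dimensional gradient
# (in practice, per-example gradients of the diffusion training loss).
n, d, k_dim = 1000, 4096, 64  # k_dim: compressed dimension (assumption)
grads = rng.standard_normal((n, d)).astype(np.float32)

# Gradient compression via a shared random projection: each d-dim
# gradient shrinks to k_dim floats, cutting storage by a factor d / k_dim.
proj = rng.standard_normal((d, k_dim)).astype(np.float32) / np.sqrt(k_dim)
compressed = grads @ proj  # (n, k_dim) — only this is kept on disk

# Query: the generated image's gradient under the same model. Here we
# hypothetically reuse training sample 42's gradient as the query, so
# that sample should rank as most influential.
query = grads[42].copy()
q_comp = query @ proj

# Influence scores approximated by inner products in compressed space;
# retrieval is a single matrix-vector product plus a top-k sort.
scores = compressed @ q_comp
k = 5
top_k = np.argsort(scores)[::-1][:k]
print(top_k)
```

Random projections approximately preserve inner products (Johnson–Lindenstrauss), which is why ranking in the compressed space recovers the strongly influential samples; at scale, an approximate nearest-neighbor index over the compressed vectors would replace the brute-force sort.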
Dec-11-2024