Toward Understanding Generative Data Augmentation

Jan-19-2025, 18:35:05 GMT–Neural Information Processing Systems

Generative data augmentation, which scales datasets by obtaining fake labeled examples from a trained conditional generative model, boosts classification performance in various learning tasks, including (semi-)supervised learning, few-shot learning, and adversarially robust learning. However, little work has theoretically investigated the effect of generative data augmentation. To fill this gap, we establish a general stability bound in this non-i.i.d. (not independently and identically distributed) setting, where the learned distribution depends on the original training set and is generally not the same as the true distribution. Our theoretical result includes the divergence between the learned distribution and the true distribution. It shows that generative data augmentation can enjoy a faster learning rate when the order of the divergence term is o\left(\max\left(\log(m)\beta_m, 1/\sqrt{m}\right)\right), where m is the training set size and \beta_m is the corresponding stability constant.
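As a rough illustration of the setting described in the abstract, the sketch below fits a class-conditional Gaussian to each label of a small training set and samples extra labeled points from it. This is only a toy stand-in for a trained conditional generative model, not the method analyzed in the paper; the function names (`fit_class_gaussians`, `sample_synthetic`, `augment`) and the toy data are assumptions made for this example. Note that the sampler is fitted on the same training set it augments, which is why the augmented data is not i.i.d.

```python
import numpy as np

def fit_class_gaussians(X, y):
    """Estimate one mean and covariance per class (a toy conditional generative model)."""
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)
        # A small ridge term keeps the estimated covariance positive definite.
        cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        params[int(label)] = (mean, cov)
    return params

def sample_synthetic(params, n_per_class, rng):
    """Draw n_per_class labeled samples from each fitted class-conditional Gaussian."""
    Xs, ys = [], []
    for label, (mean, cov) in params.items():
        Xs.append(rng.multivariate_normal(mean, cov, size=n_per_class))
        ys.append(np.full(n_per_class, label))
    return np.vstack(Xs), np.concatenate(ys)

def augment(X, y, n_per_class, seed=0):
    """Return the original data followed by synthetic labeled samples."""
    rng = np.random.default_rng(seed)
    # The generator depends on (X, y), so the synthetic points are not
    # independent of the training set.
    params = fit_class_gaussians(X, y)
    X_syn, y_syn = sample_synthetic(params, n_per_class, rng)
    return np.vstack([X, X_syn]), np.concatenate([y, y_syn])

# Toy training set (assumed for illustration): m = 20 points, two classes in 2-D.
data_rng = np.random.default_rng(0)
m = 20
X_train = np.vstack([
    data_rng.normal(-1.0, 1.0, size=(m // 2, 2)),
    data_rng.normal(1.0, 1.0, size=(m // 2, 2)),
])
y_train = np.array([0] * (m // 2) + [1] * (m // 2))

X_aug, y_aug = augment(X_train, y_train, n_per_class=50)
print(X_aug.shape, y_aug.shape)
```

A classifier would then be trained on `(X_aug, y_aug)` instead of `(X_train, y_train)`. How much this helps depends on how close the fitted distribution is to the true one, which is the divergence term the abstract refers to.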

gaussian mixture model, generative data augmentation, learning


Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)