

training data


Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation

Neural Information Processing Systems

We introduce ALIA (Automated Language-guided Image Augmentation), a method that uses large vision and language models to automatically generate natural-language descriptions of a dataset's domains and then augments the training data through language-guided image editing.
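The three stages named in the abstract (describe the images, derive domain descriptions, edit the images with language guidance) can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not ALIA's implementation: the abstract only says "large vision and language models", so `caption`, `summarize`, and `edit` are hypothetical callables standing in for a captioning model, a language model, and a text-guided image editor, and the function name `language_guided_augment` is likewise illustrative.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")  # any image representation (hypothetical placeholder type)


def language_guided_augment(
    images: Sequence[Image],
    caption: Callable[[Image], str],               # hypothetical image captioner
    summarize: Callable[[List[str]], List[str]],   # hypothetical LLM: captions -> domain descriptions
    edit: Callable[[Image, str], Image],           # hypothetical text-guided image editor
) -> List[Tuple[int, str, Image]]:
    """Return (source index, domain prompt, edited image) triples."""
    # Stage 1: describe every training image in natural language.
    captions = [caption(img) for img in images]
    # Stage 2: condense the captions into a short list of domain descriptions.
    domains = summarize(captions)
    # Stage 3: edit each image once per domain description.
    augmented: List[Tuple[int, str, Image]] = []
    for idx, img in enumerate(images):
        for prompt in domains:
            augmented.append((idx, prompt, edit(img, prompt)))
    return augmented
```

With toy stand-ins (strings as "images"), two images and two domain descriptions yield four edited samples, which would then be added to, not substituted for, the original training set.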

Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective

Huayang Li, Tian Lan, Zihao Fu, Deng Cai, Lemao Liu, Nigel Collier

Neural Information Processing Systems

In this work, we aim to advance the understanding of neural text degeneration by presenting a straightforward and fundamental explanation from the data perspective. Our preliminary investigation reveals a strong correlation between degeneration and the presence of repetitions in the training data. Subsequent experiments further show that selectively dropping out attention to repetitive words in the training data significantly reduces degeneration.
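The idea of "dropping out the attention to repetitive words" can be illustrated with a causal attention mask. This is a minimal sketch, not the authors' procedure: it assumes that a token counts as repetitive when it belongs to an n-gram that already occurred earlier in the same sequence, and that each repetitive key position is dropped with probability `p` for a given training example. The names `repeated_token_mask` and `repetition_dropout_mask` and the parameters `n` and `p` are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def repeated_token_mask(tokens: Sequence[str], n: int = 2) -> List[bool]:
    """Mark every position inside an n-gram that already appeared earlier (assumed definition)."""
    seen = set()
    mask = [False] * len(tokens)
    for i in range(n - 1, len(tokens)):
        gram = tuple(tokens[i - n + 1 : i + 1])
        if gram in seen:
            for j in range(i - n + 1, i + 1):
                mask[j] = True
        seen.add(gram)
    return mask


def repetition_dropout_mask(
    tokens: Sequence[str], p: float = 0.5, n: int = 2, rng: Optional[random.Random] = None
) -> List[List[bool]]:
    """Causal mask (True = may attend) where repetitive keys are dropped with probability p."""
    rng = rng or random.Random()
    rep = repeated_token_mask(tokens, n)
    # Decide once per example which repetitive key positions are hidden.
    dropped = [r and rng.random() < p for r in rep]
    size = len(tokens)
    # Query i may attend to key j if j is not in the future and j was not dropped;
    # attending to itself is always kept so no row is fully masked.
    return [[j <= i and (j == i or not dropped[j]) for j in range(size)] for i in range(size)]
```

For `["the", "cat", "the", "cat", "sat"]` with `n=2`, the second "the cat" is marked repetitive, so with `p=1.0` the final token "sat" can attend only to positions 0, 1, and 4; with `p=0.0` the result is an ordinary causal mask.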

Appendix

Neural Information Processing Systems

The annotation tool is a free-form painting tool that lets raters draw the instance mask freely. We ask raters to draw within the bounding box (bbox), but if the object clearly extends beyond the bbox, they may draw outside it. The stroke size is adjustable.