Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models
Chandu, Khyathi Raghavi, Sharma, Piyush, Changpinyo, Soravit, Thapliyal, Ashish, Soricut, Radu
arXiv.org Artificial Intelligence
Training large-scale image captioning (IC) models demands access to a rich and diverse set of training examples gathered from the wild, often from noisy alt-text data. However, recent modeling approaches to IC often fall short in this setting because they assume a clean annotated dataset (as opposed to the noisier alt-text-based annotations) and employ an end-to-end generation approach, which often lacks both controllability and interpretability. We address these problems by breaking the task down into two simpler, more controllable tasks: skeleton prediction and skeleton-based caption generation. Specifically, we show that selecting content words as skeletons helps in generating improved and denoised captions when leveraging rich yet noisy alt-text-based uncurated datasets. We also show that the predicted English skeletons can be leveraged cross-lingually to generate non-English captions, and present experimental results covering caption generation in French, Italian, German, Spanish, and Hindi. Finally, we show that skeleton-based prediction allows for better control of certain caption properties, such as length, content, and gender expression, providing a handle for human-in-the-loop semi-automatic corrections.
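As a rough illustration of what a content-word skeleton might look like, the toy sketch below strips function words from a caption and shows how editing the skeleton gives a handle on length and content. Everything in it is an assumption made for illustration: the abstract does not say how content words are chosen, and the function-word list and the helper names `extract_skeleton`, `truncate_skeleton`, and `replace_word` are hypothetical. In the approach described above the skeleton is predicted by a model and a second model generates the caption from it; neither model is reproduced here.

```python
import re
from typing import Dict, List

# Hypothetical function-word list, for illustration only.
# The abstract does not state the actual content-word criterion.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "in", "on", "at", "with",
    "and", "is", "are", "to", "for", "by",
}


def extract_skeleton(caption: str) -> List[str]:
    """Keep only the words that are not in FUNCTION_WORDS, in order."""
    tokens = re.findall(r"[a-z]+", caption.lower())
    return [t for t in tokens if t not in FUNCTION_WORDS]


def truncate_skeleton(skeleton: List[str], max_words: int) -> List[str]:
    """Length control: cap the number of skeleton words."""
    return skeleton[:max_words]


def replace_word(skeleton: List[str], mapping: Dict[str, str]) -> List[str]:
    """Content control: swap selected skeleton words (a human-in-the-loop edit)."""
    return [mapping.get(w, w) for w in skeleton]


skeleton = extract_skeleton("A dog is running on the beach")
shorter = truncate_skeleton(skeleton, 2)
edited = replace_word(skeleton, {"dog": "puppy"})
```

An edited skeleton such as `edited` would then be passed to the caption generator in place of the original one, which is where the controllability described in the abstract would come from.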
Oct-30-2022
- Country:
  - North America
    - United States
      - Washington > King County
        - Seattle (0.04)
      - New York > New York County
        - New York City (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - Massachusetts > Suffolk County
        - Boston (0.04)
      - California > Los Angeles County
        - Long Beach (0.04)
    - Canada
      - Quebec > Montreal (0.04)
      - British Columbia > Metro Vancouver Regional District
        - Vancouver (0.04)
  - Europe
    - Belgium (0.04)
    - Sweden > Stockholm
      - Stockholm (0.04)
    - Spain > Valencian Community
      - Valencia Province > Valencia (0.04)
    - Germany
      - Berlin (0.04)
      - Bavaria > Upper Bavaria
        - Munich (0.04)
  - Asia
    - China > Hong Kong (0.04)
    - Japan (0.04)
    - Taiwan > Taiwan Province
      - Taipei (0.04)
    - South Korea
      - Seoul > Seoul (0.04)
      - Gyeonggi-do > Suwon (0.04)
- Genre:
  - Research Report (0.82)
- Technology: