Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory

Anagnostopoulou, Aliki, Hartmann, Mareike, Sonntag, Daniel

Jun-6-2023, arXiv.org Artificial Intelligence

Interactive machine learning (IML) is a beneficial learning paradigm when data availability is limited, as human feedback is incrementally integrated into the training process. In this paper, we present an IML pipeline for image captioning that allows us to incrementally adapt a pre-trained image captioning model to a new data distribution based on user input. To incorporate user input into the model, we explore a combination of simple data augmentation methods that yield larger data batches for each newly annotated data instance, and we implement continual learning methods to prevent catastrophic forgetting from repeated updates. For our experiments, we split a domain-specific image captioning dataset, namely VizWiz, into non-overlapping parts to simulate an incremental input flow for continually adapting the model to new data. We find that data augmentation worsens results, even when relatively small amounts of data are available, whereas episodic memory is an effective strategy to retain knowledge from previously seen clusters.
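
The episodic-memory idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `EpisodicMemory`, the function `incremental_batches`, and the parameters `keep_fraction` and `replay_size` are hypothetical names chosen for illustration, and the abstract does not specify the sampling scheme. The sketch assumes that a random subset of each seen data part is stored and mixed into later update batches; data augmentation and the captioning model itself are left out.

```python
import random
from dataclasses import dataclass, field


@dataclass
class EpisodicMemory:
    """Hypothetical replay buffer: keeps a random sample of each seen cluster."""
    keep_fraction: float = 0.1  # assumed value, not taken from the paper
    seed: int = 0
    store: dict = field(default_factory=dict)

    def __post_init__(self):
        self._rng = random.Random(self.seed)

    def remember(self, cluster_id, examples):
        # Keep at least one example so every seen cluster stays represented.
        examples = list(examples)
        k = max(1, int(len(examples) * self.keep_fraction))
        self.store[cluster_id] = self._rng.sample(examples, k)

    def replay(self, n):
        # Draw up to n stored examples across all previously seen clusters.
        pool = [ex for kept in self.store.values() for ex in kept]
        return self._rng.sample(pool, min(n, len(pool)))


def incremental_batches(clusters, memory, replay_size=2):
    """Yield (cluster_id, batch), where batch = new examples + replayed old ones."""
    for cluster_id, examples in clusters:
        batch = list(examples) + memory.replay(replay_size)
        memory.remember(cluster_id, examples)
        yield cluster_id, batch
```

In use, each yielded batch would be passed to a model update step, so that every update on a new part also revisits a few examples from earlier parts.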

artificial intelligence, caption, machine learning

Country:
- North America
  - United States
    - Pennsylvania > Philadelphia County
      - Philadelphia (0.04)
    - New York > New York County
      - New York City (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Michigan > Washtenaw County
      - Ann Arbor (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
  - Canada > Newfoundland and Labrador
    - Labrador (0.04)
- Europe
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
  - Germany
    - Saarland (0.04)
    - Berlin (0.04)
  - France > Hauts-de-France
    - Nord > Lille (0.04)
- Asia > China
  - Hong Kong (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Health & Medicine > Consumer Health (0.72)
- Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning > Scripts & Frames (0.62)
  - Machine Learning
    - Neural Networks > Deep Learning (0.94)
    - Statistical Learning (0.68)
