KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models

Jia, Zhiwei, Narayana, Pradyumna, Akula, Arjun R., Pruthi, Garima, Su, Hao, Basu, Sugato, Jampani, Varun

May-28-2023–arXiv.org Artificial Intelligence

Image ad understanding is a crucial task with wide real-world applications. Although highly challenging with the involvement of diverse atypical scenes, real-world entities, and reasoning over scene-texts, how to interpret image ads is relatively under-explored, especially in the era of foundational vision-language models (VLMs) featuring impressive generalizability and adaptability. In this paper, we perform the first empirical study of image ad understanding through the lens of pre-trained VLMs. We benchmark and reveal practical challenges in adapting these VLMs to image ad understanding. We propose a simple feature adaptation strategy to effectively fuse multimodal information for image ads and further empower it with knowledge of real-world entities. We hope our study draws more attention to image ad understanding which is broadly relevant to the advertising industry.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

May-28-2023

arXiv.org PDF

Add feedback

Country:
- North America > United States > California > San Diego County > San Diego (0.04)

Genre:
- Research Report (0.50)

Industry:
- Marketing (0.94)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language > Text Processing (0.94)
  - Machine Learning > Neural Networks (0.67)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found