KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models
Jia, Zhiwei, Narayana, Pradyumna, Akula, Arjun R., Pruthi, Garima, Su, Hao, Basu, Sugato, Jampani, Varun
arXiv.org Artificial Intelligence
Image ad understanding is a crucial task with wide real-world applications. Although the task is highly challenging, involving diverse atypical scenes, real-world entities, and reasoning over scene text, the interpretation of image ads remains relatively under-explored, especially in the era of foundational vision-language models (VLMs), which offer impressive generalizability and adaptability. In this paper, we perform the first empirical study of image ad understanding through the lens of pre-trained VLMs. We benchmark these VLMs and reveal practical challenges in adapting them to image ad understanding. We propose a simple feature adaptation strategy that effectively fuses multimodal information for image ads and further augments it with knowledge of real-world entities. We hope our study draws more attention to image ad understanding, which is broadly relevant to the advertising industry.
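The abstract does not specify how the feature adaptation works. As a rough illustration only, the following sketch shows one generic way to fuse a CLIP-style image embedding with auxiliary embeddings (for example, encoded scene text or descriptions of recognized entities): attention pooling with the image embedding as the query, a residual connection, and ranking of candidate ad messages by cosine similarity. This is not KAFA's published architecture; the function names (`fuse_features`, `rank_candidates`), the projection matrices `w_q`, `w_k`, `w_v`, and the candidate-ranking setup are all hypothetical assumptions.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def fuse_features(image_feat, aux_feats, w_q, w_k, w_v):
    """Hypothetical fusion: attention-pool auxiliary embeddings
    (e.g. scene-text or entity features) using the image embedding
    as the query, then add the result to the image embedding."""
    q = image_feat @ w_q                      # query, shape (d,)
    k = aux_feats @ w_k                       # keys, shape (n, d)
    v = aux_feats @ w_v                       # values, shape (n, d)
    scores = k @ q / np.sqrt(q.shape[-1])     # scaled dot-product scores, (n,)
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    attended = weights @ v                    # weighted sum of values, (d,)
    return l2_normalize(image_feat + attended)  # residual connection


def rank_candidates(fused, candidate_feats):
    """Rank candidate message embeddings by cosine similarity to `fused`."""
    sims = l2_normalize(candidate_feats) @ fused
    return np.argsort(-sims), sims


# Usage with random stand-in embeddings (no real encoder is involved).
rng = np.random.default_rng(0)
d = 8
image = rng.normal(size=d)
aux = rng.normal(size=(3, d))   # e.g. three scene-text / entity embeddings
eye = np.eye(d)                 # identity projections, assumed for simplicity
fused = fuse_features(image, aux, eye, eye, eye)
candidates = rng.normal(size=(4, d))
order, sims = rank_candidates(fused, candidates)
print(order, sims.round(3))
```

In a trained system the projections would be learned and the embeddings would come from pre-trained VLM encoders; the residual connection keeps the original image embedding dominant when the auxiliary signal is weak.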
May-28-2023
- Genre:
  - Research Report (0.50)
- Industry:
  - Marketing (0.94)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks (0.67)
    - Natural Language > Text Processing (0.94)
    - Representation & Reasoning (1.00)
    - Vision (1.00)