Improving Contextual Congruence Across Modalities for Effective Multimodal Marketing using Knowledge-infused Learning
Padhi, Trilok; Kursuncu, Ugur; Kumar, Yaman; Shalin, Valerie L.; Fronczek, Lane Peterson
The prevalence of smart devices with the ability to capture moments in multiple modalities has enabled users to experience multimodal information online. However, Large Language Models (LLMs) and Large Vision Models (LVMs) are still limited in capturing holistic meaning with cross-modal semantic relationships. Without explicit commonsense knowledge (e.g., a knowledge graph), Visual Language Models (VLMs) learn only implicit representations by capturing high-level patterns in vast corpora, missing essential contextual cross-modal cues. In this work, we design a framework to couple explicit commonsense knowledge in the form of knowledge graphs with large VLMs to improve the performance of a downstream task: predicting the effectiveness of multimodal marketing campaigns. While the marketing application provides a compelling metric for assessing our methods, our approach enables the early detection of likely persuasive multimodal campaigns and the assessment and augmentation of marketing theory.
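
The abstract does not spell out how the knowledge graph and the VLM are coupled. The sketch below shows one plausible reading, a late-fusion classifier over VLM-derived image-text features and KG-derived commonsense features; the feature dimensions, the gated-fusion design, and all module names are illustrative assumptions rather than the authors' implementation (in practice the VLM features might come from a frozen CLIP-style encoder and the KG features from ConceptNet-style entity embeddings).

```python
# Illustrative sketch only (not the paper's released code): knowledge-infused
# late fusion of VLM features with commonsense KG features for predicting
# the effectiveness of a multimodal marketing campaign.
import torch
import torch.nn as nn


class KnowledgeInfusedEffectivenessModel(nn.Module):
    def __init__(self, vlm_dim=768, kg_dim=300, hidden_dim=512, num_classes=2):
        super().__init__()
        # Project VLM (image + text) features and KG features to a shared space.
        self.vlm_proj = nn.Linear(vlm_dim, hidden_dim)
        self.kg_proj = nn.Linear(kg_dim, hidden_dim)
        # Gated fusion: learn how much explicit commonsense context to inject.
        self.gate = nn.Sequential(nn.Linear(2 * hidden_dim, hidden_dim), nn.Sigmoid())
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_dim, num_classes),  # campaign-effectiveness label
        )

    def forward(self, vlm_feats, kg_feats):
        v = self.vlm_proj(vlm_feats)              # (batch, hidden_dim)
        k = self.kg_proj(kg_feats)                # (batch, hidden_dim)
        g = self.gate(torch.cat([v, k], dim=-1))  # per-dimension mixing weights
        fused = g * v + (1 - g) * k               # knowledge-infused representation
        return self.classifier(fused)


if __name__ == "__main__":
    # Stand-in random features; real inputs would come from a frozen VLM and
    # from embeddings of KG entities linked to objects/text in the ad creative.
    model = KnowledgeInfusedEffectivenessModel()
    logits = model(torch.randn(4, 768), torch.randn(4, 300))
    print(logits.shape)  # torch.Size([4, 2])
```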
arXiv.org Artificial Intelligence
Feb-5-2024
- Country:
- Asia (0.14)
- North America > United States (0.14)
- Oceania > Australia (0.14)
- Genre:
- Research Report > New Finding (0.93)
- Industry:
- Health & Medicine > Therapeutic Area (0.93)
- Information Technology (0.67)
- Media (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning
- Neural Networks > Deep Learning (0.68)
- Performance Analysis > Accuracy (0.68)
- Natural Language
- Large Language Model (1.00)
- Text Processing (1.00)
- Representation & Reasoning (1.00)
- Vision (1.00)