DashCLIP: Leveraging multimodal models for generating semantic embeddings for DoorDash

Gurjar, Omkar, Liu, Kin Sum, Kolli, Praveen, Kumar, Utsaw, Rahurkar, Mandar

Nov-7-2025–arXiv.org Artificial Intelligence

Despite the success of vision-language models in various generative tasks, obtaining high-quality semantic representations for products and user intents is still challenging due to the inability of off-the-shelf models to capture nuanced relationships between the entities. In this paper, we introduce a joint training framework for product and user queries by aligning uni-modal and multi-modal encoders through contrastive learning on image-text data. Our novel approach trains a query encoder with an LLM-curated relevance dataset, eliminating the reliance on engagement history. These embeddings demonstrate strong generalization capabilities and improve performance across applications, including product categorization and relevance prediction. For personalized ads recommendation, a significant uplift in the click-through rate and conversion rate after the deployment further confirms the impact on key business metrics. We believe that the flexibility of our framework makes it a promising solution toward enriching the user experience across the e-commerce landscape.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

Nov-7-2025

arXiv.org PDF

Add feedback

Country:
- Europe > Switzerland (0.28)

Genre:
- Research Report
  - Promising Solution (0.54)
  - New Finding (0.46)

Industry:
- Information Technology > Services (1.00)

Technology:
- Information Technology
  - Information Management > Search (1.00)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Natural Language
      - Text Processing (1.00)
      - Large Language Model (1.00)
    - Machine Learning
      - Statistical Learning (1.00)
      - Neural Networks > Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found