Zero-Shot Retrieval for Scalable Visual Search in a Two-Sided Marketplace
Rusli, Andre, Ishimoto, Shoma, Akiyama, Sho, Singh, Aman Kumar
arXiv.org Artificial Intelligence
Visual search offers an intuitive way for customers to explore diverse product catalogs, particularly in consumer-to-consumer (C2C) marketplaces where listings are often unstructured and visually driven. This paper presents a scalable visual search system deployed in Mercari's C2C marketplace, where end-users act as buyers and sellers. We evaluate recent vision-language models for zero-shot image retrieval and compare their performance with an existing fine-tuned baseline. The system integrates real-time inference and background indexing workflows, supported by a unified embedding pipeline optimized through dimensionality reduction. Offline evaluation using user interaction logs shows that the multilingual SigLIP model outperforms other models across multiple retrieval metrics, achieving a 13.3% increase in nDCG@5 over the baseline. A one-week online A/B test in production further confirms real-world impact, with the treatment group showing substantial gains in engagement and conversion, up to a 40.9% increase in transaction rate via image search. Our findings highlight that recent zero-shot models can serve as a strong and practical baseline for production use, which enables teams to deploy effective visual search systems with minimal overhead, while retaining the flexibility to fine-tune based on future data or domain-specific needs.
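The abstract reports its offline result as a relative gain in nDCG@5. As a minimal sketch of how such a number can be computed, the Python below implements the standard nDCG@k formula and a relative-lift helper. The function names, the use of binary relevance labels derived from interaction logs, and the choice to compute the ideal ranking from the returned list only are assumptions made for illustration; the abstract does not describe the authors' exact evaluation protocol.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked items."""
    # Rank is 0-based here, so the discount is log2(rank + 2).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal ordering of the same items."""
    # Assumption: the ideal ranking is built from the returned list only.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def relative_lift(candidate, baseline):
    """Relative change of a metric against a baseline, as a fraction."""
    return (candidate - baseline) / baseline


# Hypothetical example: 1 marks a result the user interacted with.
ranked_relevance = [0, 1, 0, 0, 1]
score = ndcg_at_k(ranked_relevance, k=5)
```

Under this definition, a reported 13.3% increase would correspond to `relative_lift(candidate, baseline)` returning roughly 0.133 when both arguments are nDCG@5 values averaged over the same query set.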
Aug-11-2025
- Country:
- Asia > Japan
- Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)
- North America
- Canada > Ontario
- Toronto (0.05)
- United States > New York
- New York County > New York City (0.04)
- Genre:
- Research Report > Experimental Study (0.69)
- Industry:
- Information Technology (0.48)
- Technology: