Optimizing Product Deduplication in E-Commerce with Multimodal Embeddings
Kulunk, Aysenur, Taskin, Berk, Eseoglu, M. Furkan, Sahin, H. Bahadir
arXiv.org Artificial Intelligence
Abstract--In large-scale e-commerce marketplaces, duplicate product listings frequently cause consumer confusion and operational inefficiencies, degrading trust in the platform and increasing costs. Traditional keyword-based search methods struggle to identify duplicates accurately because they rely on exact textual matches and neglect the semantic similarities inherent in product titles. To address these challenges, we introduce a scalable, multimodal product deduplication system designed specifically for the e-commerce domain. Our approach employs a domain-specific text model grounded in the BERT architecture in conjunction with Masked Autoencoders for image representations. Both architectures are augmented with dimensionality reduction techniques to produce compact 128-dimensional embeddings without significant information loss. Complementing this, we also developed a novel decider model that leverages both text and image vectors. By integrating these feature extraction mechanisms with Milvus, an optimized vector database, our system can perform efficient, high-precision similarity searches across extensive product catalogs exceeding 200 million items while consuming just 100 GB of system RAM. Empirical evaluations demonstrate that our matching system achieves a macro-average F1 score of 0.90, outperforming third-party solutions, which attain an F1 score of 0.83. Our findings show the potential of combining domain-specific adaptations with state-of-the-art machine learning techniques to mitigate duplicate listings in large-scale e-commerce environments.
In today's vast e-commerce marketplaces, particularly within the Turkish e-commerce landscape, customers frequently encounter duplicate product listings that create confusion and frustration during shopping.
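To make the scale figures in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes float32 storage (the abstract does not state the precision or the index type), uses a brute-force cosine search as a stand-in for Milvus, and replaces the learned decider model with a hypothetical rule that averages the text and image similarities against a threshold. All function names and the threshold value are illustrative.

```python
import math


def raw_vector_bytes(num_items, dim=128, bytes_per_value=4):
    """Raw storage for one vector per item, ignoring any index overhead.

    bytes_per_value=4 assumes float32, which the abstract does not confirm.
    """
    return num_items * dim * bytes_per_value


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, catalog, k=3):
    """Brute-force nearest neighbours; a vector database replaces this at scale."""
    scored = [(cosine(query, vec), item_id) for item_id, vec in catalog.items()]
    scored.sort(reverse=True)
    return [(item_id, score) for score, item_id in scored[:k]]


def is_duplicate(text_a, image_a, text_b, image_b, threshold=0.9):
    """Hypothetical decider: average both modality similarities, then threshold."""
    score = 0.5 * cosine(text_a, text_b) + 0.5 * cosine(image_a, image_b)
    return score >= threshold


if __name__ == "__main__":
    # 200 million items x 128 dimensions x 4 bytes = 102.4 GB of raw vectors.
    print(raw_vector_bytes(200_000_000) / 1e9, "GB")
```

Under these assumptions, a single 128-dimensional float32 vector per item for 200 million items comes to about 102.4 GB of raw data, the same order as the 100 GB figure quoted above. The abstract does not say how text and image vectors share that budget, so the estimate should be read as a rough consistency check only.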
Dec-2-2025
- Country:
- Europe (0.28)
- North America > United States > Minnesota (0.28)
- Genre:
- Research Report > New Finding (0.86)
- Industry:
- Information Technology > Services > e-Commerce Services (1.00)
- Technology: