Text-Based Product Matching -- Semi-Supervised Clustering Approach
Martinek, Alicja, Łukasik, Szymon, Gandomi, Amir H.
arXiv.org Artificial Intelligence
Matching identical products across multiple product feeds is a crucial element of many e-commerce tasks, such as comparing product offerings, dynamic price optimization, and selecting an assortment personalized for the client. It corresponds to the well-known machine learning task of entity matching, with its own specifics, such as omnipresent unstructured data and inaccurate or inconsistent product descriptions. This paper presents a new approach to product matching based on semi-supervised clustering. We study the properties of this method by experimenting with the IDEC algorithm on a real-world dataset, using predominantly textual features and fuzzy string matching, with more standard approaches as a point of reference. The encouraging results show that unsupervised matching, enriched with a small annotated sample of product links, could be a possible alternative to the dominant supervised strategy, which requires extensive manual data labeling.
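To make the notion of fuzzy string matching on textual product features more concrete, the following is a minimal sketch using only Python's standard library. It is not the authors' pipeline and does not implement IDEC; the helper names (`normalize`, `fuzzy_similarity`) and the sample offers are hypothetical and chosen purely for illustration.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase and sort tokens so word order does not affect the score."""
    return " ".join(sorted(title.lower().split()))


def fuzzy_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two product titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Hypothetical offers from two product feeds (not taken from the paper's dataset).
feed_a = ["Apple iPhone 13 128GB Blue", "Samsung Galaxy S21 5G 128GB"]
feed_b = [
    "iphone 13 apple blue 128gb",
    "Galaxy S21 Samsung 128GB 5G",
    "Sony WH-1000XM4 Headphones",
]

# Pairwise similarity matrix: rows are feed_a offers, columns are feed_b offers.
similarity = [[fuzzy_similarity(a, b) for b in feed_b] for a in feed_a]

# For each feed_a offer, the index of the most similar feed_b offer.
best_match = [max(range(len(feed_b)), key=row.__getitem__) for row in similarity]
```

A similarity matrix of this kind could serve as one textual feature for a clustering step. How such features are combined with a small annotated sample of product links is specific to the paper and is not reproduced here.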
Feb-1-2024
- Country:
- Oceania > Australia
- New South Wales > Sydney (0.04)
- North America > United States
- New York > New York County > New York City (0.04)
- Europe > Poland
- Lesser Poland Province > Kraków (0.04)
- Asia > India
- Genre:
- Research Report > New Finding (0.66)
- Industry:
- Retail (1.00)
- Information Technology > Services
- e-Commerce Services (0.50)
- Technology: