Text-Based Product Matching -- Semi-Supervised Clustering Approach

Martinek, Alicja, Łukasik, Szymon, Gandomi, Amir H.

Feb-1-2024, arXiv.org Artificial Intelligence

Matching identical products across multiple product feeds is a crucial element of many e-commerce tasks, such as comparing product offerings, dynamic price optimization, and selecting an assortment personalized for the client. It corresponds to the well-known machine learning task of entity matching, with its own specifics, such as omnipresent unstructured data and inaccurate or inconsistent product descriptions. This paper presents a new approach to product matching based on semi-supervised clustering. We study the properties of this method by experimenting with the IDEC algorithm on a real-world dataset, using predominantly textual features and fuzzy string matching, with more standard approaches as a point of reference. Encouraging results show that unsupervised matching, enriched with a small annotated sample of product links, could be an alternative to the dominant supervised strategy, which requires extensive manual data labeling.
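To make the idea of fuzzy string matching guided by a few annotated product links more concrete, here is a minimal Python sketch. It is not the paper's method and does not implement IDEC: it swaps in the standard library's `difflib.SequenceMatcher` for the similarity score and a simple union-find grouping for the clustering step. The function names (`normalize`, `similarity`, `group_offers`), the `must_link` parameter, and the threshold value are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(title.lower().split())


def similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1] based on difflib's SequenceMatcher ratio."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_offers(titles, must_link=(), threshold=0.8):
    """Group offer titles into candidate products.

    Illustrative assumption: annotated pairs in `must_link` are always merged,
    and any other pair is merged when its fuzzy similarity reaches `threshold`.
    Returns a sorted list of groups, each a list of indices into `titles`.
    """
    parent = list(range(len(titles)))

    def find(i):
        # Follow parent pointers to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # The small annotated sample: known links between offers.
    for i, j in must_link:
        union(i, j)

    # The unsupervised part: pairwise fuzzy comparison of titles.
    for i in range(len(titles)):
        for j in range(i + 1, len(titles)):
            if similarity(titles[i], titles[j]) >= threshold:
                union(i, j)

    groups = {}
    for i in range(len(titles)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    offers = [
        "Apple iPhone 13 128GB Blue",
        "apple iphone 13   128gb blue",
        "Bosch GSR 18V drill",
        "Cordless screwdriver 18 volt by Bosch",
    ]
    print(group_offers(offers, threshold=0.9))
    print(group_offers(offers, must_link=[(2, 3)], threshold=0.9))
```

The pairwise loop is quadratic in the number of offers, so this sketch only suits small samples. Its one purpose is to show how a handful of labeled links can connect offers whose titles are too dissimilar for string matching alone.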

Country:
- Oceania > Australia
  - New South Wales > Sydney (0.04)
- North America > United States
  - New York > New York County > New York City (0.04)
- Europe > Poland
  - Lesser Poland Province > Kraków (0.04)
- Asia > India
  - NCT > New Delhi (0.04)

Genre:
- Research Report > New Finding (0.66)

Industry:
- Retail (1.00)
- Information Technology > Services
  - e-Commerce Services (0.50)

Technology:
- Information Technology
  - Data Science > Data Mining (1.00)
  - Artificial Intelligence
    - Natural Language > Text Processing (0.88)
    - Machine Learning
      - Neural Networks > Deep Learning (0.69)
      - Statistical Learning > Clustering (0.47)