Deep Contextual Embeddings for Address Classification in E-commerce
Mangalgi, Shreyas; Kumar, Lakshya; Tallamraju, Ravindra Babu
E-commerce customers in developing nations like India tend to follow no fixed format when entering shipping addresses. Parsing such addresses is challenging because they lack inherent structure or hierarchy. Understanding the language of addresses is essential for routing shipments without delay. In this paper, we propose a novel approach to understanding customer addresses, motivated by recent advances in Natural Language Processing (NLP). We first formulate pre-processing steps for addresses using a combination of edit distance and phonetic algorithms. We then create vector representations of addresses using Word2Vec with TF-IDF, Bi-LSTM, and BERT-based approaches, and compare them on a sub-region classification task for North and South Indian cities. Through experiments, we demonstrate the effectiveness of a generalized RoBERTa model pre-trained on a large address corpus for the language modelling task. Our proposed RoBERTa model achieves a classification accuracy of around 90% on the sub-region classification task with minimal text pre-processing, outperforming all other approaches. Once pre-trained, the RoBERTa model can be fine-tuned for various downstream supply chain tasks such as pincode suggestion and geo-coding, and it generalizes well to such tasks even with limited labelled data. To the best of our knowledge, this is the first work to propose understanding customer addresses in the e-commerce domain by pre-training language models and fine-tuning them for different purposes.
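The abstract mentions pre-processing that combines edit distance with phonetic algorithms but gives no details. The following is only a minimal sketch of how such a combination might normalize spelling variants in address tokens. It assumes Soundex as the phonetic algorithm and Levenshtein distance as the edit distance; the `canonicalize` and `normalize_address` helpers, the distance threshold, and the toy vocabulary are all hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: map misspelled address tokens to a known vocabulary
# when they share a phonetic key AND are within a small edit distance.

def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Standard American Soundex consonant groups.
_SOUNDEX_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Four-character Soundex key; returns '' for input with no letters."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    encoded = word[0].upper()
    prev_code = _SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev_code:
            encoded += code
        prev_code = code  # vowels reset prev_code to ''
    return (encoded + "000")[:4]


def canonicalize(token: str, vocabulary, max_distance: int = 2) -> str:
    """Return the closest vocabulary word that sounds alike, else the token."""
    token = token.lower()
    key = soundex(token)
    best, best_dist = token, max_distance + 1
    for candidate in vocabulary:
        if soundex(candidate) != key:
            continue
        dist = levenshtein(token, candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


def normalize_address(address: str, vocabulary, max_distance: int = 2) -> str:
    """Lowercase, split on whitespace, and canonicalize each token."""
    return " ".join(canonicalize(t, vocabulary, max_distance)
                    for t in address.split())


# Toy vocabulary, assumed purely for illustration.
VOCAB = ["bengaluru", "koramangala", "apartment"]
print(normalize_address("Appartment 4 Koramangla Bengalooru", VOCAB))
```

Requiring both a matching phonetic key and a small edit distance is one way to limit false merges: the phonetic key filters out words that merely look similar, while the distance threshold filters out words that only sound similar. Soundex was designed for English surnames, so a production system for Indian addresses would likely need a different phonetic scheme.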
Jul-6-2020
- Country:
- North America > United States
- New York > New York County
- New York City (0.04)
- Illinois > Cook County
- Chicago (0.04)
- California > San Diego County
- San Diego (0.05)
- Europe
- Middle East > Malta
- Port Region > Southern Harbour District > Valletta (0.04)
- Italy > Tuscany
- Florence (0.04)
- France > Île-de-France
- Denmark > Capital Region
- Copenhagen (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Asia
- Nepal (0.04)
- Bangladesh (0.04)
- India
- Karnataka > Bengaluru (0.14)
- Haryana > Faridabad (0.04)
- Uttar Pradesh (0.04)
- Maharashtra > Mumbai (0.04)
- Genre:
- Research Report (1.00)
- Overview (0.68)
- Industry:
- Information Technology > Services > e-Commerce Services (0.82)
- Technology: