Collaborating Authors

Modeling Product Search Relevance in e-Commerce

Machine Learning

With the rapid growth of e-Commerce, online product search has emerged as a popular and effective paradigm for customers to find desired products and engage in online shopping. However, a large gap remains between the products that customers actually want to purchase and the relevance of the products suggested in response to their queries. In this paper, we propose a robust way of predicting relevance scores given a search query and a product, using techniques from machine learning, natural language processing and information retrieval. We compare conventional information retrieval models such as BM25 and Indri with deep learning models such as word2vec, sentence2vec and paragraph2vec. We share some of the insights and findings from our experiments.
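To make the BM25 baseline mentioned above concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. The function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75`, the whitespace tokenization and the toy product titles are all illustrative assumptions, not the configuration used in the paper.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    k1 and b are common default values, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1.0" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical product titles, tokenized on whitespace.
products = [
    "stainless steel kitchen sink".split(),
    "kitchen faucet with pull down sprayer".split(),
    "cordless drill battery pack".split(),
]
scores = bm25_scores("kitchen sink".split(), products)
for title, score in zip(products, scores):
    print(f"{score:.3f}  {' '.join(title)}")
```

The product matching both query terms ranks first, the one matching only "kitchen" ranks second, and the unrelated product scores zero.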

Machine Learning in Python: Building a Linear Regression Model


In this video, I will be showing you how to build a linear regression model in Python using the scikit-learn package. We will be using the Diabetes dataset (built into scikit-learn) and the Boston Housing dataset (downloaded from GitHub). This video is part of the [Python Data Science Project] series.
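A minimal sketch of the workflow described above, using the Diabetes dataset that ships with scikit-learn. The 80/20 split and the `random_state` value are assumptions chosen for illustration and may differ from what the video uses.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Load the built-in Diabetes dataset as a feature matrix and target vector.
X, y = load_diabetes(return_X_y=True)

# Hold out 20% of the rows for evaluation (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit an ordinary least squares model on the training split.
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out split.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
print(f"Mean squared error: {mse:.2f}")
print(f"R^2: {r2:.2f}")
```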

Build PMML-based Applications and Generate Predictions in AWS

Amazon Web Services


If you generate machine learning (ML) models, you know that the key challenge is exporting and importing them into other frameworks to separate model generation and prediction. Many applications use PMML (Predictive Model Markup Language) to move ML models from one framework to another. PMML is an XML representation of a data mining model. In this post, I show how to build a PMML application on AWS. First, you build a PMML model in Apache Spark using Amazon EMR.
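Because PMML is plain XML, a consumer can read a model without the framework that produced it. The sketch below parses a small hand-written regression fragment with the Python standard library and evaluates it. The fragment, its coefficients, the field names `x1` and `x2`, and the helper `predict` are hypothetical; a real PMML document also carries elements such as `Header`, `DataDictionary` and `MiningSchema`, which are omitted here, and this is not the application built in the post.

```python
import xml.etree.ElementTree as ET

# Hand-written, deliberately incomplete PMML fragment (illustrative only).
PMML_DOC = """<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def predict(pmml_text, features):
    """Evaluate intercept + sum(coefficient * feature) from a regression table."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        name = predictor.get("name")
        result += float(predictor.get("coefficient")) * features[name]
    return result


prediction = predict(PMML_DOC, {"x1": 3.0, "x2": 4.0})
print(prediction)  # 1.5 + 2.0 * 3.0 - 0.5 * 4.0 = 5.5
```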

Large Scale Online Brand Networks to Study Brand Effects

AAAI Conferences

Mining consumer perceptions of brands has been a dominant research area in marketing. The marketing literature provides a well-developed rationale for proposing brands as intangible assets that significantly contribute to firm performance. Consumer-brand perceptions, typically collected through surveys or focus groups, require recruitment of and interaction with a large set of participants, leading to cost, feasibility and validity issues. The advent of Web 2.0 opens the door to a wide range of data-centric approaches that can automate and scale beyond the traditional methods used in marketing science. We contribute to this area by exploiting social-media-based brand communities to generate a brand network, incorporating consumer perceptions across a broad ecosystem of brands. A brand network is one in which individual nodes represent brands, and a weighted link between two nodes represents the strength of consumer co-interest in these two brands. The implicit brand-brand network is used to examine two branding effects: positioning and performance. We use hard and soft clustering algorithms, Walktrap Clustering and Stochastic Block Modelling respectively, to identify subsets of closely related brands, which provides the basis for examining brand positioning. We also examine how a focal brand’s location in the brand network relates to performance, measured in terms of relative market share. For this, a hierarchical regression analysis is conducted between brand network variables and brand performance. While the size of a brand's community on Twitter does relate to brand performance, brand network variables such as degree, eigenvector centrality and between-industry links improve the model fit considerably.
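The brand network definition above can be sketched in a few lines of Python. Here the link weight is taken to be the number of followers two brands share; that choice, the brand names, the user IDs and both helper functions are assumptions made for illustration, and the paper may measure co-interest differently.

```python
from itertools import combinations

# Hypothetical brand communities: brand name -> set of follower IDs.
followers = {
    "BrandA": {"u1", "u2", "u3", "u4"},
    "BrandB": {"u2", "u3", "u5"},
    "BrandC": {"u4", "u6"},
    "BrandD": {"u7"},
}


def build_brand_network(followers):
    """Return {(brand_a, brand_b): weight} for every pair with shared followers."""
    edges = {}
    for a, b in combinations(sorted(followers), 2):
        shared = len(followers[a] & followers[b])
        if shared:  # Only link brands with at least one common follower.
            edges[(a, b)] = shared
    return edges


def weighted_degree(edges):
    """Sum the weights of all links incident to each brand."""
    degree = {}
    for (a, b), weight in edges.items():
        degree[a] = degree.get(a, 0) + weight
        degree[b] = degree.get(b, 0) + weight
    return degree


edges = build_brand_network(followers)
degree = weighted_degree(edges)
print(edges)
print(degree)
```

In this toy data, BrandD shares no followers with any other brand, so it has no links and does not appear in the degree table.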

Atlas: A Dataset and Benchmark for E-commerce Clothing Product Categorization

Machine Learning

In e-commerce, it is common practice to organize the product catalog using a product taxonomy. This enables buyers to easily locate the item they are looking for and to explore the items available under a category. A product taxonomy is a tree structure with 3 or more levels of depth and several leaf nodes. Product categorization is a large-scale classification task that assigns a category path to a particular product. Research in this area is restricted by the unavailability of good real-world datasets and by variations in taxonomy caused by the absence of a standard across e-commerce stores. In this paper, we introduce a high-quality product taxonomy dataset focusing on clothing products, which contains 186,150 images under the clothing category, with 3 levels and 52 leaf nodes in the taxonomy. We explain the methodology used to collect and label this dataset. Further, we establish a benchmark by comparing image classification and attention-based sequence models for predicting the category path. Our benchmark model reaches a micro F-score of 0.92 on the test set. The dataset, code and pre-trained models are publicly available. We invite the community to improve upon these baselines.
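For readers unfamiliar with the metric reported above, the following sketch computes a micro-averaged F-score by pooling true positives, false positives and false negatives over all classes. The category paths and predictions are made up for illustration and are not taken from the Atlas dataset; the function `micro_f1` is likewise hypothetical.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label predictions.

    Counts are pooled over all classes before computing the score.
    """
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp += 1
        else:
            fp += 1  # The predicted class gains a false positive.
            fn += 1  # The true class gains a false negative.
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical three-level category paths.
y_true = [
    "Men > Tops > T-Shirts",
    "Men > Tops > Shirts",
    "Women > Bottoms > Jeans",
    "Women > Tops > Kurtas",
    "Men > Bottoms > Trousers",
]
y_pred = [
    "Men > Tops > T-Shirts",
    "Men > Tops > T-Shirts",  # Wrong leaf node.
    "Women > Bottoms > Jeans",
    "Women > Tops > Kurtas",
    "Men > Bottoms > Trousers",
]
score = micro_f1(y_true, y_pred)
print(f"micro F1: {score:.2f}")
```

When every product receives exactly one predicted path, as in this sketch, the micro F-score coincides with plain accuracy, here 4 correct out of 5.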