Perera, Sujan
Learning Personalized Page Content Ranking Using Customer Representation
Shen, Xin, Zhao, Yan, Perera, Sujan, Liu, Yujia, Yan, Jinyun, Goodman, Mitchell
On E-commerce stores, there are rich recommendation content to help shoppers shopping more efficiently. However given numerous products, it's crucial to select most relevant content to reduce the burden of information overload. We introduced a content ranking service powered by a linear causal bandit algorithm to rank and select content for each shopper under each context. The algorithm mainly leverages aggregated customer behavior features, and ignores single shopper level past activities. We study the problem of inferring shoppers interest from historical activities. We propose a deep learning based bandit algorithm that incorporates historical shopping behavior, customer latent shopping goals, and the correlation between customers and content categories. This model produces more personalized content ranking measured by 12.08% nDCG lift.
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Sheth, Amit, Perera, Sujan, Wijeratne, Sanjaya
Machine Learning has been a big success story during the AI resurgence. One particular stand out success relates to unsupervised learning from a massive amount of data, albeit much of it relates to one modality/type of data at a time. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition of utilizing knowledge whenever it is available or can be created purposefully. In this paper, we focus on discussing the indispensable role of knowledge for deeper understanding of complex text and multimodal data in situations where (i) large amounts of training data (labeled/unlabeled) are not available or labor intensive to create, (ii) the objects (particularly text) to be recognized are complex (i.e., beyond simple entity-person/location/organization names), such as implicit entities and highly subjective content, and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create knowledge, varying from comprehensive or cross domain to domain or application specific, and (b) carefully exploit the knowledge to further empower or extend the applications of ML/NLP techniques. Using the early results in several diverse situations - both in data types and applications - we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data.
Optimizing Hierarchical Classification with Adaptive Node Collapses
Perera, Sujan (IBM Watson Health) | Raz, Orna (IBM Research) | Routray, Ramani (IBM Watson Health) | Bao, Shenghua (IBM Watson Health) | Zalmanovici, Marcel (IBM Research)
Data intensive solutions, such as solutions that include machine learning components, are becoming more and more prevalent. The standard way of developing such solutions is to train machine learning models with manually annotated or labeled data for a given task. This methodology assumes the existence of ample human annotated data. Unfortunately, this is often not the case, due to imbalanced distribution of classes and lack of human annotation resources. This challenge is exasperated when thousands of hierarchical classes are introduced. Therefore, it is critical to quantify the sufficiency of the data for a given task before applying standard machine learning algorithms. Moreover, it may be the case that there is ample labeled training data to only solve a sub-problem. In particular, in the hierarchical classification problem, the sufficiency level of training data could vary significantly depending on the granularity level of hierarchy we use for classification. We identify a need to decompose the given problem to sub-problems for which there is ample training data. In this paper we propose a methodology to decompose a hierarchical classification problem considering the characteristics of a given dataset. We define an optimization problem of adaptive node collapse that identifies an appropriate hierarchy decomposition based on a trade-off between multiple goals. In our experiments, we consider the trade-off between the learning accuracy and the hierarchy abstraction level.