high cardinality
Handling Large-scale Cardinality in building recommendation systems
Kurra, Dhruva Dixith, Ling, Bo, Zh, Chun, Ashrafzadeh, Seyedshahin
Effective recommendation systems rely on capturing user preferences, which often requires incorporating numerous features such as universally unique identifiers (UUIDs) of entities. However, the exceptionally high cardinality of UUIDs poses a significant challenge, leading to model degradation and increased model size due to sparsity. This paper presents two techniques to address the challenge of high cardinality in recommendation systems. Specifically, we propose a bag-of-words approach, combined with layer sharing, to substantially decrease the model size while improving performance. We evaluated our techniques through offline and online experiments on Uber use cases, with promising results that demonstrate the approach's effectiveness in optimizing recommendation systems and enhancing their overall performance.
Edelweiss improves cross-sell using machine learning on Amazon SageMaker
This post is co-written by Nikunj Agarwal, lead data scientist at Edelweiss Tokio Life Insurance. Edelweiss Tokio Life Insurance Company Ltd is a leading life insurance company in India. Its broad spectrum of offerings includes life insurance, health insurance, retirement policies, wealth enhancement schemes, education funding, and more. How are you being recommended a credit card based on your savings account behavior? How about a life insurance product when you buy car insurance, or a side dish when you order a main course on your food ordering app?
High number of unique values and tree based models
Columns of data with high cardinality can adversely affect the performance of your models. The idea for this article stemmed from my personal experience of employing tree-based solutions in various projects. In this article I will attempt to show the effects of high cardinality on a couple of datasets using a simple decision tree. In the machine learning context, cardinality can be defined as the number of unique values in a column. Examples of fields with a high number of unique values include cities, countries, medical diagnosis codes, movie categories on Netflix, flavours of ice cream, etc.
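To make the notion of cardinality concrete, here is a minimal sketch that counts the distinct values in each column of a small table. The helper name `column_cardinality` and the toy rows are hypothetical and chosen only for illustration; they do not come from the article's datasets.

```python
from collections import defaultdict


def column_cardinality(rows):
    """Return the number of distinct values observed in each column.

    `rows` is a list of dicts mapping column name -> value.
    """
    distinct = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            distinct[column].add(value)
    return {column: len(values) for column, values in distinct.items()}


# Hypothetical toy data: "city" varies more than "plan".
rows = [
    {"city": "Pune", "plan": "basic"},
    {"city": "Mumbai", "plan": "basic"},
    {"city": "Delhi", "plan": "premium"},
    {"city": "Pune", "plan": "basic"},
]

cardinality = column_cardinality(rows)
print(cardinality)  # {'city': 3, 'plan': 2}
```

On real data, a column such as a city or diagnosis code can have thousands of distinct values, which is the situation the article has in mind when it speaks of high cardinality.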