Vasudevan, Sriram
Weak Supervision for Improved Precision in Search Systems
Vasudevan, Sriram
Labeled datasets are essential for modern search engines, which increasingly rely on supervised learning methods like Learning to Rank and massive amounts of data to power deep learning models. However, creating these datasets is both time-consuming and costly, leading to the common use of user click and activity logs as proxies for relevance. In this paper, we present a weak supervision approach to infer the quality of query-document pairs and apply it within a Learning to Rank framework to enhance the precision of a large-scale search system.
Learning to Retrieve for Job Matching
Shen, Jianqiang, Juan, Yuchin, Zhang, Shaobo, Liu, Ping, Pu, Wen, Vasudevan, Sriram, Song, Qingquan, Borisyuk, Fedor, Shen, Kay Qianqi, Wei, Haichao, Ren, Yunxiang, Chiou, Yeou S., Kuang, Sicong, Yin, Yuan, Zheng, Ben, Wu, Muchen, Gharghabi, Shaghayegh, Wang, Xiaoqing, Xue, Huichao, Guo, Qi, Hewlett, Daniel, Simon, Luke, Hong, Liangjie, Zhang, Wenjing
Web-scale search systems typically tackle the scalability challenge As one of the largest professional networking platforms globally, with a two-step paradigm: retrieval and ranking. The retrieval step, LinkedIn is a hub for job seekers and recruiters, with 65M+ job also known as candidate selection, often involves extracting standardized seekers utilizing the search and recommendation services weekly entities, creating an inverted index, and performing term to discover millions of open job listings. To enable realtime personalization matching for retrieval. Such traditional methods require manual for job seekers, we adopted the classic two-stage paradigm and time-consuming development of query models. In this paper, of retrieval and ranking to tackle the scalability challenge. The retrieval we discuss applying learning-to-retrieve technology to enhance layer, also known as candidate selection, chooses a small set LinkedIn's job search and recommendation systems. In the realm of of relevant jobs from the set of all jobs, after which the ranking layer promoted jobs, the key objective is to improve the quality of applicants, performs a more computationally expensive second-pass scoring thereby delivering value to recruiter customers. To achieve and sorting of the resulting candidate set. This paper focuses on this, we leverage confirmed hire data to construct a graph that improving the methodology and systems for retrieval.
LiFT: A Scalable Framework for Measuring Fairness in ML Applications
Vasudevan, Sriram, Kenthapadi, Krishnaram
Many internet applications are powered by machine learned models, which are usually trained on labeled datasets obtained through either implicit / explicit user feedback signals or human judgments. Since societal biases may be present in the generation of such datasets, it is possible for the trained models to be biased, thereby resulting in potential discrimination and harms for disadvantaged groups. Motivated by the need for understanding and addressing algorithmic bias in web-scale ML systems and the limitations of existing fairness toolkits, we present the LinkedIn Fairness Toolkit (LiFT), a framework for scalable computation of fairness metrics as part of large ML systems. We highlight the key requirements in deployed settings, and present the design of our fairness measurement system. We discuss the challenges encountered in incorporating fairness tools in practice and the lessons learned during deployment at LinkedIn. Finally, we provide open problems based on practical experience.