Goto

Collaborating Authors

 Gharghabi, Shaghayegh


Learning to Retrieve for Job Matching

arXiv.org Artificial Intelligence

Web-scale search systems typically tackle the scalability challenge As one of the largest professional networking platforms globally, with a two-step paradigm: retrieval and ranking. The retrieval step, LinkedIn is a hub for job seekers and recruiters, with 65M+ job also known as candidate selection, often involves extracting standardized seekers utilizing the search and recommendation services weekly entities, creating an inverted index, and performing term to discover millions of open job listings. To enable realtime personalization matching for retrieval. Such traditional methods require manual for job seekers, we adopted the classic two-stage paradigm and time-consuming development of query models. In this paper, of retrieval and ranking to tackle the scalability challenge. The retrieval we discuss applying learning-to-retrieve technology to enhance layer, also known as candidate selection, chooses a small set LinkedIn's job search and recommendation systems. In the realm of of relevant jobs from the set of all jobs, after which the ranking layer promoted jobs, the key objective is to improve the quality of applicants, performs a more computationally expensive second-pass scoring thereby delivering value to recruiter customers. To achieve and sorting of the resulting candidate set. This paper focuses on this, we leverage confirmed hire data to construct a graph that improving the methodology and systems for retrieval.


The UCR Time Series Archive

arXiv.org Machine Learning

The UCR Time Series Archive - introduced in 2002, has become an important resource in the time series data mining community, with at least one thousand published papers making use of at least one dataset from the archive. The original incarnation of the archive had sixteen datasets but since that time, it has gone through periodic expansions. The last expansion took place in the summer of 2015 when the archive grew from 45 datasets to 85 datasets. This paper introduces and will focus on the new data expansion from 85 to 128 datasets. Beyond expanding this valuable resource, this paper offers pragmatic advice to anyone who may wish to evaluate a new algorithm on the archive. Finally, this paper makes a novel and yet actionable claim: of the hundreds of papers that show an improvement over the standard baseline (1-Nearest Neighbor classification), a large fraction may be misattributing the reasons for their improvement. Moreover, they may have been able to achieve the same improvement with a much simpler modification, requiring just a single line of code.