AITopics | Borisyuk, Fedor

Borisyuk, Fedor

Efficient AI in Practice: Training and Deployment of Efficient LLMs for Industry Applications

Behdin, Kayhan, Dai, Yun, Fatahibaarzi, Ata, Gupta, Aman, Song, Qingquan, Tang, Shao, Sang, Hejian, Dexter, Gregory, Zhu, Sirou, Zhu, Siyu, Dharamsi, Tejas, Sanjabi, Maziar, Kothapalli, Vignesh, Firooz, Hamed, Fu, Zhoutong, Cao, Yihan, Hsu, Pin-Lun, Borisyuk, Fedor, Wang, Zhipeng, Mazumder, Rahul, Pillai, Natesh, Simon, Luke

arXiv.org Artificial Intelligence, Feb-20-2025

Large language models (LLMs) have demonstrated remarkable performance across a wide range of industrial applications, from search and recommendations to generative tasks. Although scaling laws indicate that larger models generally yield better generalization and performance, their substantial computational requirements often render them impractical for many real-world scenarios at scale. In this paper, we present methods and insights for training small language models (SLMs) that deliver high performance and efficiency in deployment. We focus on two key techniques: (1) knowledge distillation and (2) model compression via quantization and pruning. These approaches enable SLMs to retain much of the quality of their larger counterparts while significantly reducing training and serving costs as well as latency. We detail the impact of these techniques on a variety of use cases at a large professional social network platform and share deployment lessons, including hardware optimization strategies that enhance speed and throughput for both predictive and reasoning-based applications.
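
As a rough illustration of the first technique named in this abstract, the sketch below computes a common form of knowledge-distillation loss: the KL divergence between temperature-softened teacher and student output distributions. The abstract does not state which loss the authors use, so the function names, the temperature value, and the T^2 scaling are assumptions taken from the standard formulation, not from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper for illustration; the temperature is an assumed value.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(
        p * (math.log(p) - math.log(q))
        for p, q in zip(teacher, student)
        if p > 0
    )
    return kl * temperature ** 2


# A student that matches the teacher incurs no loss; a mismatched one does.
matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In practice this term is often mixed with the ordinary supervised loss on ground-truth labels; the mixing weight would be another assumption and is left out here.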

arxiv preprint arxiv, large language model, machine learning, (17 more...)

2502.14305

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry: Information Technology (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

From Features to Transformers: Redefining Ranking for Scalable Impact

Borisyuk, Fedor, Hertel, Lars, Parameswaran, Ganesh, Srivastava, Gaurav, Ramanujam, Sudarshan Srinivasa, Ocejo, Borja, Du, Peng, Akterskii, Andrei, Daftary, Neil, Tang, Shao, Sun, Daqi, Xiao, Qiang Charles, Nathani, Deepesh, Kothari, Mohit, Dai, Yun, Gupta, Aman

arXiv.org Artificial Intelligence, Feb-5-2025

We present LiGR, a large-scale ranking framework developed at LinkedIn that brings state-of-the-art transformer-based modeling architectures into production. We introduce a modified transformer architecture that incorporates learned normalization and simultaneous set-wise attention to user history and ranked items. This architecture enables several breakthrough achievements, including: (1) the deprecation of most manually designed feature engineering, outperforming the prior state-of-the-art system using only a few features (compared to hundreds in the baseline), (2) validation of the scaling law for ranking systems, showing improved performance with larger models, more training data, and longer context sequences, and (3) simultaneous joint scoring of items in a set-wise manner, leading to automated improvements in diversity. To enable efficient serving of large ranking models, we describe techniques to scale inference effectively using single-pass processing of user history and set-wise attention. We also summarize key insights from various ablation studies and A/B tests, highlighting the most impactful technical approaches.

large language model, machine learning, natural language, (20 more...)

2502.03417

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)

Efficient user history modeling with amortized inference for deep learning recommendation models

Hertel, Lars, Daftary, Neil, Borisyuk, Fedor, Gupta, Aman, Mazumder, Rahul

arXiv.org Artificial Intelligence, Dec-9-2024

We study user history modeling via Transformer encoders in deep learning recommendation models (DLRM). Such architectures can significantly improve recommendation quality, but usually incur high latency costs, necessitating infrastructure upgrades or very small Transformer models. An important part of user history modeling is early fusion of the candidate item, and various methods have been studied. We revisit early fusion and compare concatenation of the candidate to each history item against appending it to the end of the list as a separate item. Using the latter method allows us to reformulate the recently proposed amortized history inference algorithm M-FALCON (Zhai et al., 2024) for the case of DLRM models. We show via experimental results that appending with cross-attention performs on par with concatenation and that amortization significantly reduces inference costs. We conclude with results from deploying this model on the LinkedIn Feed and Ads surfaces, where amortization reduces latency by 30% compared to non-amortized inference.

artificial intelligence, machine learning, social media, (18 more...)

2412.06924

Country: North America > United States > California (0.16)

Genre: Research Report (0.50)

Industry: Information Technology > Services (0.50)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

LiNR: Model Based Neural Retrieval on GPUs at LinkedIn

Borisyuk, Fedor, Song, Qingquan, Zhou, Mingzhou, Parameswaran, Ganesh, Arun, Madhu, Popuri, Siva, Bingol, Tugrul, Pei, Zhuotao, Lee, Kuang-Hsuan, Zheng, Lu, Shao, Qizhan, Naqvi, Ali, Zhou, Sen, Gupta, Aman

arXiv.org Artificial Intelligence, Jul-18-2024

This paper introduces LiNR, LinkedIn's large-scale, GPU-based retrieval system. LiNR supports a billion-sized index on GPU models. We discuss our experiences and challenges in creating scalable, differentiable search indexes using TensorFlow and PyTorch at production scale. In LiNR, both items and model weights are integrated into the model binary. Viewing index construction as a form of model training, we describe scaling our system for large indexes, incorporating full scans and efficient filtering. A key focus is on enabling attribute-based pre-filtering for exhaustive GPU searches, addressing the common challenge of post-filtering in KNN searches that often reduces system quality. We further provide multi-embedding retrieval algorithms and strategies for tackling cold start issues in retrieval. Our advancements in supporting larger indexes through quantization are also discussed. We believe LiNR represents one of the industry's first live-updated model-based retrieval indexes. Applied to out-of-network post recommendations on LinkedIn Feed, LiNR has contributed to a 3% relative increase in professional daily active users. We envisage LiNR as a step towards integrating retrieval and ranking into a single GPU model, simplifying complex infrastructures and enabling end-to-end optimization of the entire differentiable infrastructure through gradient descent.
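
The difference between pre-filtering and post-filtering in an exhaustive KNN search, which this abstract highlights, can be shown with a small CPU-only sketch. This is not LiNR's GPU implementation: the item schema, the "lang" attribute, and both function names are hypothetical and chosen only to make the quality problem of post-filtering visible.

```python
def dot(a, b):
    """Inner-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def knn_post_filter(query, items, k, allowed):
    """Take the global top-k first, then drop items that fail the filter."""
    top = sorted(items, key=lambda it: dot(query, it["emb"]), reverse=True)[:k]
    return [it["id"] for it in top if it["lang"] in allowed]


def knn_pre_filter(query, items, k, allowed):
    """Apply the attribute filter first, then scan all eligible items."""
    eligible = [it for it in items if it["lang"] in allowed]
    top = sorted(eligible, key=lambda it: dot(query, it["emb"]), reverse=True)[:k]
    return [it["id"] for it in top]


# Hypothetical toy index: the two best matches fail the attribute filter.
ITEMS = [
    {"id": "a", "emb": [0.9, 0.1], "lang": "de"},
    {"id": "b", "emb": [0.8, 0.2], "lang": "de"},
    {"id": "c", "emb": [0.5, 0.5], "lang": "en"},
    {"id": "d", "emb": [0.1, 0.9], "lang": "en"},
]

post = knn_post_filter([1.0, 0.0], ITEMS, k=2, allowed={"en"})  # nothing survives
pre = knn_pre_filter([1.0, 0.0], ITEMS, k=2, allowed={"en"})    # two valid results
```

With post-filtering the caller asked for two results and received none, because the filter ran after truncation to the top k. Pre-filtering returns a full result set at the cost of evaluating the filter over the whole index.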

artificial intelligence, data mining, machine learning, (22 more...)

2407.13218

Country:

North America > United States > Idaho (0.14)
North America > United States > New York (0.14)

Genre: Research Report (0.64)

Industry: Information Technology > Services (0.88)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

LiMAML: Personalization of Deep Recommender Models via Meta Learning

Wang, Ruofan, Prabhakar, Prakruthi, Srivastava, Gaurav, Wang, Tianqi, Jalali, Zeinab S., Bharill, Varun, Ouyang, Yunbo, Nigam, Aastha, Venugopalan, Divya, Gupta, Aman, Borisyuk, Fedor, Keerthi, Sathiya, Muralidharan, Ajith

arXiv.org Artificial Intelligence, Feb-23-2024

In the realm of recommender systems, the ubiquitous adoption of deep neural networks has emerged as a dominant paradigm for modeling diverse business objectives. As user bases continue to expand, the necessity of personalization and frequent model updates has assumed paramount significance to ensure the delivery of relevant and refreshed experiences to a diverse array of members. In this work, we introduce an innovative meta-learning solution tailored to the personalization of models for individual members and other entities, coupled with frequent updates based on the latest user interaction signals. Specifically, we leverage the Model-Agnostic Meta Learning (MAML) algorithm to adapt per-task sub-networks using recent user interaction data. Given the near infeasibility of productionizing original MAML-based models in online recommendation systems, we propose an efficient strategy to operationalize meta-learned sub-networks in production, which involves transforming them into fixed-sized vectors, termed meta embeddings, thereby enabling the seamless deployment of models with hundreds of billions of parameters for online serving. Through extensive experimentation on production data drawn from various applications at LinkedIn, we demonstrate that the proposed solution consistently outperforms the baseline models of those applications, including strong baselines such as a wide-and-deep ID-based personalization approach. Our approach has enabled the deployment of a range of highly personalized AI models across diverse LinkedIn applications, leading to substantial improvements in business metrics as well as refreshed experience for our members.
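
For readers unfamiliar with MAML, the generic two-level update from the original MAML formulation is sketched below. The notation is an assumption and does not come from this abstract: \theta denotes the meta-learned parameters, \mathcal{T}_i a task (for example, one member), \alpha and \beta the inner and outer step sizes. The abstract says only that per-task sub-networks are adapted, so which parameters play the role of \theta in LiMAML is not specified here.

```latex
% Inner loop: adapt to task T_i with one gradient step on its recent data
\theta_i' = \theta - \alpha \, \nabla_{\theta} \mathcal{L}_{\mathcal{T}_i}(\theta)

% Outer loop: update the shared initialization using the adapted parameters
\theta \leftarrow \theta - \beta \, \nabla_{\theta} \sum_{i} \mathcal{L}_{\mathcal{T}_i}(\theta_i')
```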

artificial intelligence, limaml, machine learning, (15 more...)

2403.00803

Country:

Europe > Spain (0.16)
North America > United States (0.14)

Genre: Research Report (0.50)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Learning to Retrieve for Job Matching

Shen, Jianqiang, Juan, Yuchin, Zhang, Shaobo, Liu, Ping, Pu, Wen, Vasudevan, Sriram, Song, Qingquan, Borisyuk, Fedor, Shen, Kay Qianqi, Wei, Haichao, Ren, Yunxiang, Chiou, Yeou S., Kuang, Sicong, Yin, Yuan, Zheng, Ben, Wu, Muchen, Gharghabi, Shaghayegh, Wang, Xiaoqing, Xue, Huichao, Guo, Qi, Hewlett, Daniel, Simon, Luke, Hong, Liangjie, Zhang, Wenjing

arXiv.org Artificial Intelligence, Feb-20-2024

Web-scale search systems typically tackle the scalability challenge with a two-step paradigm: retrieval and ranking. The retrieval step, also known as candidate selection, often involves extracting standardized entities, creating an inverted index, and performing term matching for retrieval. Such traditional methods require manual and time-consuming development of query models. In this paper, we discuss applying learning-to-retrieve technology to enhance LinkedIn's job search and recommendation systems. In the realm of promoted jobs, the key objective is to improve the quality of applicants, thereby delivering value to recruiter customers. To achieve this, we leverage confirmed hire data to construct a graph that ...

As one of the largest professional networking platforms globally, LinkedIn is a hub for job seekers and recruiters, with 65M+ job seekers utilizing the search and recommendation services weekly to discover millions of open job listings. To enable realtime personalization for job seekers, we adopted the classic two-stage paradigm of retrieval and ranking to tackle the scalability challenge. The retrieval layer, also known as candidate selection, chooses a small set of relevant jobs from the set of all jobs, after which the ranking layer performs a more computationally expensive second-pass scoring and sorting of the resulting candidate set. This paper focuses on improving the methodology and systems for retrieval.
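
The two-stage paradigm of retrieval followed by ranking that this entry describes can be sketched in a few lines. The example shows the traditional inverted-index term matching that the abstract calls the conventional approach, not the learned retrieval the paper proposes. The index contents, job ids, and the score table standing in for a ranking model are all hypothetical.

```python
def retrieve(query_terms, inverted_index, limit):
    """Cheap first pass: count query-term matches per job via an inverted index."""
    hits = {}
    for term in query_terms:
        for job_id in inverted_index.get(term, ()):
            hits[job_id] = hits.get(job_id, 0) + 1
    # Most matching terms first; ties broken by job id for determinism.
    ordered = sorted(hits, key=lambda job_id: (-hits[job_id], job_id))
    return ordered[:limit]


def rank(candidates, score_fn):
    """Expensive second pass: score only the retrieved candidates and sort them."""
    return sorted(candidates, key=score_fn, reverse=True)


# Hypothetical inverted index mapping terms to job ids.
INDEX = {
    "python": ["j1", "j2"],
    "ml": ["j2", "j3"],
    "sales": ["j4"],
}
# Hypothetical stand-in for a ranking model's output.
MODEL_SCORES = {"j1": 0.9, "j2": 0.4, "j3": 0.7, "j4": 0.1}

candidates = retrieve(["python", "ml"], INDEX, limit=2)
results = rank(candidates, MODEL_SCORES.get)
```

The retrieval step never looks at MODEL_SCORES, which is the point of the split: the costly scorer runs on a small candidate set instead of on every job.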

artificial intelligence, machine learning, natural language, (20 more...)

2402.13435

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Information Technology > Services (0.86)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)
(2 more...)

LinkSAGE: Optimizing Job Matching Using Graph Neural Networks

Liu, Ping, Wei, Haichao, Hou, Xiaochen, Shen, Jianqiang, He, Shihai, Shen, Kay Qianqi, Chen, Zhujun, Borisyuk, Fedor, Hewlett, Daniel, Wu, Liang, Veeraraghavan, Srikant, Tsun, Alex, Jiang, Chengming, Zhang, Wenjing

arXiv.org Artificial Intelligence, Feb-20-2024

We present LinkSAGE, an innovative framework that integrates Graph Neural Networks (GNNs) into large-scale personalized job matching systems, designed to address the complex dynamics of LinkedIn's extensive professional network. Our approach capitalizes on a novel job marketplace graph, the largest and most intricate of its kind in industry, with billions of nodes and edges. This graph is not merely extensive but also richly detailed, encompassing member and job nodes along with key attributes, thus creating an expansive and interwoven network. A key innovation in LinkSAGE is its training and serving methodology, which effectively combines inductive graph learning on a heterogeneous, evolving graph with an encoder-decoder GNN model. This methodology decouples the training of the GNN model from that of existing Deep Neural Nets (DNN) models, eliminating the need for frequent GNN retraining while maintaining up-to-date graph signals in near realtime, allowing for the effective integration of GNN insights through transfer learning. The subsequent nearline inference system serves the GNN encoder within a real-world setting, significantly reducing online latency and obviating the need for costly real-time GNN infrastructure. Validated across multiple online A/B tests in diverse product scenarios, LinkSAGE demonstrates marked improvements in member engagement, relevance matching, and member retention, confirming its generalizability and practical impact.

artificial intelligence, graph, machine learning, (14 more...)

2402.13430

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Services (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

LiRank: Industrial Large Scale Ranking Models at LinkedIn

Borisyuk, Fedor, Zhou, Mingzhou, Song, Qingquan, Zhu, Siyu, Tiwana, Birjodh, Parameswaran, Ganesh, Dangi, Siddharth, Hertel, Lars, Xiao, Qiang, Hou, Xiaochen, Ouyang, Yunbo, Gupta, Aman, Singh, Sheallika, Liu, Dan, Cheng, Hailing, Le, Lei, Hung, Jonathan, Keerthi, Sathiya, Wang, Ruoyan, Zhang, Fengyu, Kothari, Mohit, Zhu, Chen, Sun, Daqi, Dai, Yun, Luan, Xun, Zhu, Sirou, Wang, Zhiwei, Daftary, Neil, Shen, Qianqi, Jiang, Chengming, Wei, Haichao, Varshney, Maneesh, Ghoting, Amol, Ghosh, Souvik

arXiv.org Artificial Intelligence, Feb-9-2024

We present LiRank, a large-scale ranking framework at LinkedIn that brings to production state-of-the-art modeling architectures and optimization methods. We unveil several modeling improvements, including Residual DCN, which adds attention and residual connections to the famous DCNv2 architecture. We share insights into combining and tuning SOTA architectures to create a unified model, including Dense Gating, Transformers and Residual DCN. We also propose novel techniques for calibration and describe how we productionalized deep-learning-based explore/exploit methods. To enable effective, production-grade serving of large ranking models, we detail how to train and compress models using quantization and vocabulary compression. We provide details about the deployment setup for large-scale use cases of Feed ranking, Jobs Recommendations, and Ads click-through rate (CTR) prediction. We summarize our learnings from various A/B tests by elucidating the most effective technical approaches. These ideas have contributed to relative metric improvements across the board at LinkedIn: +0.5% member sessions in the Feed, +1.76% qualified job applications for Jobs search and recommendations, and +4.3% for Ads CTR. We hope this work can provide practical insights and solutions for practitioners interested in leveraging large-scale deep ranking systems.
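
As background for the Residual DCN mentioned in this abstract, the cross layer of the baseline DCNv2 architecture is commonly written as below. The notation is an assumption: x_0 is the input feature vector, x_l the output of layer l, W_l and b_l the layer's learned weight matrix and bias, and \odot the element-wise product. The attention and residual additions that define Residual DCN are not detailed in the abstract and are not reproduced here.

```latex
% DCNv2 cross layer: explicit feature crossing with the original input x_0
x_{l+1} = x_0 \odot \left( W_l \, x_l + b_l \right) + x_l
```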

artificial intelligence, deep learning, machine learning, (15 more...)

2402.06859

Country:

Europe > Spain (0.16)
North America > United States (0.14)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MultiSlot ReRanker: A Generic Model-based Re-Ranking Framework in Recommendation Systems

Xiao, Qiang Charles, Muralidharan, Ajith, Tiwana, Birjodh, Jia, Johnson, Borisyuk, Fedor, Gupta, Aman, Woodard, Dawn

arXiv.org Artificial Intelligence, Jan-11-2024

In this paper, we propose a generic model-based re-ranking framework, MultiSlot ReRanker, which simultaneously optimizes relevance, diversity, and freshness. Specifically, our Sequential Greedy Algorithm (SGA) is efficient enough (linear time complexity) for large-scale production recommendation engines. It achieved a lift of +6% to +10% offline Area Under the receiver operating characteristic Curve (AUC) which is mainly due to explicitly modeling mutual influences among items of a list, and leveraging the second pass ranking scores of multiple objectives. In addition, we have generalized the offline replay theory to multi-slot re-ranking scenarios, with trade-offs among multiple objectives. The offline replay results can be further improved by Pareto Optimality. Moreover, we've built a multi-slot re-ranking simulator based on OpenAI Gym integrated with the Ray framework. It can be easily configured for different assumptions to quickly benchmark both reinforcement learning and supervised learning algorithms.
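
To give a feel for slot-by-slot greedy re-ranking, the sketch below fills one slot at a time and lets already-chosen items influence the score of the remaining ones. It is a generic illustration, not the paper's SGA: the abstract does not give the scoring function, so the topic-repetition penalty, the diversity_weight parameter, and the item schema are all hypothetical.

```python
def sequential_greedy(candidates, num_slots, diversity_weight=0.5):
    """Fill slots in order, each time picking the best remaining item.

    An item's adjusted score is its second-pass score minus a penalty for
    every already-placed item that shares its topic (an assumed penalty form).
    """
    remaining = list(candidates)
    chosen = []
    placed_per_topic = {}
    for _ in range(min(num_slots, len(remaining))):
        def adjusted(item):
            repeats = placed_per_topic.get(item["topic"], 0)
            return item["score"] - diversity_weight * repeats

        # One pass over the remaining candidates per slot.
        best = max(remaining, key=adjusted)
        chosen.append(best["id"])
        remaining.remove(best)
        placed_per_topic[best["topic"]] = placed_per_topic.get(best["topic"], 0) + 1
    return chosen


# Hypothetical candidates with second-pass relevance scores and topics.
CANDIDATES = [
    {"id": "a", "score": 0.9, "topic": "x"},
    {"id": "b", "score": 0.8, "topic": "x"},
    {"id": "c", "score": 0.7, "topic": "y"},
    {"id": "d", "score": 0.3, "topic": "y"},
]

diverse = sequential_greedy(CANDIDATES, num_slots=3)
relevance_only = sequential_greedy(CANDIDATES, num_slots=3, diversity_weight=0.0)
```

With the penalty enabled, item "c" moves ahead of "b" because "b" repeats the topic already shown in the first slot. Setting the weight to zero falls back to plain score order.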

artificial intelligence, machine learning, multislot reranker, (15 more...)

2401.06293

Country:

North America > United States (0.14)
Asia (0.14)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)
