Wei, Haichao
Control LLM: Controlled Evolution for Intelligence Retention in LLM
Wei, Haichao, Ren, Yunxiang, Fu, Zhoutong, Lunia, Aman, Chen, Yi-Lin, Leung, Alice, Xu, Ya
Large Language Models (LLMs) demand significant computational resources, making it essential to enhance their capabilities without retraining from scratch. A key challenge in this domain is \textit{catastrophic forgetting} (CF), which hampers performance during Continuous Pre-training (CPT) and Continuous Supervised Fine-Tuning (CSFT). We propose \textbf{Control LLM}, a novel approach that leverages parallel pre-trained and expanded transformer blocks, aligning their hidden states through interpolation strategies. This method effectively preserves performance on existing tasks while seamlessly integrating new knowledge. Extensive experiments demonstrate the effectiveness of Control LLM in both CPT and CSFT. On Llama3.1-8B-Instruct, it achieves significant improvements in mathematical reasoning ($+14.4\%$ on Math-Hard) and coding performance ($+10\%$ on MBPP-PLUS). On Llama3.1-8B, it enhances multilingual capabilities ($+10.6\%$ on C-Eval, $+6.8\%$ on CMMLU, and $+30.2\%$ on CMMLU-0shot-CoT). It surpasses existing methods and achieves SOTA among open-source models tuned from the same base model, using substantially less data and compute. Crucially, these gains are realized while preserving strong original capabilities, with minimal degradation ($<4.3\%$ on MMLU) compared to $>35\%$ in open-source Math and Coding models. This approach has been successfully deployed in LinkedIn's GenAI-powered job seeker and Ads unit products. To support further research, we release the training and evaluation code (https://github.com/linkedin/ControlLLM) along with models trained on public datasets (https://huggingface.co/ControlLLM) to the community.
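A minimal PyTorch sketch of the hidden-state interpolation idea the abstract describes: a frozen pre-trained transformer block runs in parallel with a trainable expanded copy, and their outputs are blended. The class and parameter names (ParallelControlBlock, alpha) are illustrative assumptions, not the paper's actual API.

# Sketch only: parallel frozen + expanded blocks with learnable interpolation.
import copy
import torch
import torch.nn as nn

class ParallelControlBlock(nn.Module):
    def __init__(self, pretrained_block: nn.Module, init_alpha: float = 0.0):
        super().__init__()
        self.frozen = pretrained_block                    # original block, kept frozen
        self.expanded = copy.deepcopy(pretrained_block)   # trainable parallel copy
        for p in self.frozen.parameters():
            p.requires_grad_(False)
        # Learnable interpolation weight; sigmoid keeps the gate in (0, 1).
        self.alpha = nn.Parameter(torch.tensor(init_alpha))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        h_old = self.frozen(hidden_states)     # preserves original capabilities
        h_new = self.expanded(hidden_states)   # learns the new domain or task
        gate = torch.sigmoid(self.alpha)
        # Linear interpolation keeps the fused hidden state close to the frozen
        # branch early in training, which is the forgetting-mitigation intuition.
        return (1.0 - gate) * h_old + gate * h_new

# Usage with a stand-in block (a real model would wrap each transformer layer).
block = ParallelControlBlock(nn.Linear(16, 16))
out = block(torch.randn(2, 16))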
Learning to Retrieve for Job Matching
Shen, Jianqiang, Juan, Yuchin, Zhang, Shaobo, Liu, Ping, Pu, Wen, Vasudevan, Sriram, Song, Qingquan, Borisyuk, Fedor, Shen, Kay Qianqi, Wei, Haichao, Ren, Yunxiang, Chiou, Yeou S., Kuang, Sicong, Yin, Yuan, Zheng, Ben, Wu, Muchen, Gharghabi, Shaghayegh, Wang, Xiaoqing, Xue, Huichao, Guo, Qi, Hewlett, Daniel, Simon, Luke, Hong, Liangjie, Zhang, Wenjing
Web-scale search systems typically tackle the scalability challenge with a two-step paradigm: retrieval and ranking. The retrieval step, also known as candidate selection, often involves extracting standardized entities, creating an inverted index, and performing term matching for retrieval. Such traditional methods require manual and time-consuming development of query models. In this paper, we discuss applying learning-to-retrieve technology to enhance LinkedIn's job search and recommendation systems. In the realm of promoted jobs, the key objective is to improve the quality of applicants, thereby delivering value to recruiter customers. To achieve this, we leverage confirmed hire data to construct a graph that …
As one of the largest professional networking platforms globally, LinkedIn is a hub for job seekers and recruiters, with 65M+ job seekers utilizing the search and recommendation services weekly to discover millions of open job listings. To enable realtime personalization for job seekers, we adopted the classic two-stage paradigm of retrieval and ranking to tackle the scalability challenge. The retrieval layer, also known as candidate selection, chooses a small set of relevant jobs from the set of all jobs, after which the ranking layer performs a more computationally expensive second-pass scoring and sorting of the resulting candidate set. This paper focuses on improving the methodology and systems for retrieval.
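An illustrative sketch of the retrieve-then-rank flow the abstract describes: a cheap embedding-based first pass (KNN-style scoring) selects a small candidate set of jobs, and a heavier scorer re-ranks only those candidates. The function names and scoring model are assumptions for illustration, not LinkedIn's production code.

# Sketch only: two-stage retrieval and ranking over job embeddings.
import numpy as np

def retrieve(seeker_emb: np.ndarray, job_embs: np.ndarray, k: int = 100) -> np.ndarray:
    """First pass: approximate relevance via embedding dot product (KNN-style)."""
    scores = job_embs @ seeker_emb          # one score per job
    return np.argsort(-scores)[:k]          # indices of the top-k candidate jobs

def rank(candidate_ids: np.ndarray, expensive_score) -> list:
    """Second pass: costly per-candidate scoring on the small retrieved set only."""
    scored = [(job_id, expensive_score(job_id)) for job_id in candidate_ids]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Example: 100k jobs with 64-dim embeddings; only 100 reach the expensive ranker.
rng = np.random.default_rng(0)
jobs = rng.normal(size=(100_000, 64)).astype(np.float32)
seeker = rng.normal(size=64).astype(np.float32)
top_jobs = rank(retrieve(seeker, jobs, k=100),
                expensive_score=lambda j: float(jobs[j] @ seeker))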
LinkSAGE: Optimizing Job Matching Using Graph Neural Networks
Liu, Ping, Wei, Haichao, Hou, Xiaochen, Shen, Jianqiang, He, Shihai, Shen, Kay Qianqi, Chen, Zhujun, Borisyuk, Fedor, Hewlett, Daniel, Wu, Liang, Veeraraghavan, Srikant, Tsun, Alex, Jiang, Chengming, Zhang, Wenjing
We present LinkSAGE, an innovative framework that integrates Graph Neural Networks (GNNs) into large-scale personalized job matching systems, designed to address the complex dynamics of LinkedIn's extensive professional network. Our approach capitalizes on a novel job marketplace graph, the largest and most intricate of its kind in industry, with billions of nodes and edges. This graph is not merely extensive but also richly detailed, encompassing member and job nodes along with key attributes, thus creating an expansive and interwoven network. A key innovation in LinkSAGE is its training and serving methodology, which effectively combines inductive graph learning on a heterogeneous, evolving graph with an encoder-decoder GNN model. This methodology decouples the training of the GNN model from that of existing Deep Neural Nets (DNN) models, eliminating the need for frequent GNN retraining while maintaining up-to-date graph signals in near realtime, allowing for the effective integration of GNN insights through transfer learning. The subsequent nearline inference system serves the GNN encoder within a real-world setting, significantly reducing online latency and obviating the need for costly real-time GNN infrastructure. Validated across multiple online A/B tests in diverse product scenarios, LinkSAGE demonstrates marked improvements in member engagement, relevance matching, and member retention, confirming its generalizability and practical impact.
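A small sketch of the decoupling pattern the abstract describes: the GNN encoder is trained and served separately (nearline), and an existing DNN ranker simply consumes its precomputed node embeddings as additional input features (transfer learning). The class and feature names below are illustrative assumptions, not the LinkSAGE codebase.

# Sketch only: downstream ranker consuming precomputed GNN embeddings.
import torch
import torch.nn as nn

class RankerWithGraphFeatures(nn.Module):
    def __init__(self, dense_dim: int, gnn_dim: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dense_dim + 2 * gnn_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, dense_feats, member_gnn_emb, job_gnn_emb):
        # GNN embeddings are looked up from a nearline store and treated as fixed
        # inputs, so the ranker can be retrained without re-running GNN training.
        x = torch.cat([dense_feats, member_gnn_emb, job_gnn_emb], dim=-1)
        return self.mlp(x).squeeze(-1)

# Usage: a batch of 4 member-job pairs, 32 dense features, 64-dim GNN embeddings.
model = RankerWithGraphFeatures(dense_dim=32, gnn_dim=64)
scores = model(torch.randn(4, 32), torch.randn(4, 64), torch.randn(4, 64))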
LiRank: Industrial Large Scale Ranking Models at LinkedIn
Borisyuk, Fedor, Zhou, Mingzhou, Song, Qingquan, Zhu, Siyu, Tiwana, Birjodh, Parameswaran, Ganesh, Dangi, Siddharth, Hertel, Lars, Xiao, Qiang, Hou, Xiaochen, Ouyang, Yunbo, Gupta, Aman, Singh, Sheallika, Liu, Dan, Cheng, Hailing, Le, Lei, Hung, Jonathan, Keerthi, Sathiya, Wang, Ruoyan, Zhang, Fengyu, Kothari, Mohit, Zhu, Chen, Sun, Daqi, Dai, Yun, Luan, Xun, Zhu, Sirou, Wang, Zhiwei, Daftary, Neil, Shen, Qianqi, Jiang, Chengming, Wei, Haichao, Varshney, Maneesh, Ghoting, Amol, Ghosh, Souvik
We present LiRank, a large-scale ranking framework at LinkedIn that brings to production state-of-the-art modeling architectures and optimization methods. We unveil several modeling improvements, including Residual DCN, which adds attention and residual connections to the famous DCNv2 architecture. We share insights into combining and tuning SOTA architectures to create a unified model, including Dense Gating, Transformers and Residual DCN. We also propose novel techniques for calibration and describe how we productionalized deep learning based explore/exploit methods. To enable effective, production-grade serving of large ranking models, we detail how to train and compress models using quantization and vocabulary compression. We provide details about the deployment setup for large-scale use cases of Feed ranking, Jobs Recommendations, and Ads click-through rate (CTR) prediction. We summarize our learnings from various A/B tests by elucidating the most effective technical approaches. These ideas have contributed to relative metrics improvements across the board at LinkedIn: +0.5% member sessions in the Feed, +1.76% qualified job applications for Jobs search and recommendations, and +4.3% for Ads CTR. We hope this work can provide practical insights and solutions for practitioners interested in leveraging large-scale deep ranking systems.
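A minimal sketch of a DCNv2-style cross layer, the building block the LiRank abstract extends into Residual DCN by adding attention and residual connections. The attention gate shown is an illustrative assumption of how such a term could be wired in, not the paper's exact formulation.

# Sketch only: DCNv2-style cross layer with an assumed attention gate and residual.
import torch
import torch.nn as nn

class CrossLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.w = nn.Linear(dim, dim)      # DCNv2 cross weight
        self.attn = nn.Linear(dim, dim)   # assumed per-feature attention gate

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:
        cross = x0 * self.w(xl)                        # explicit feature crossing (DCNv2)
        gate = torch.softmax(self.attn(xl), dim=-1)    # reweights the crossed features
        return xl + gate * cross                       # residual keeps the layer near identity

# Usage: stack three cross layers over a batch of 8 examples with 16 features.
x0 = torch.randn(8, 16)
x = x0
for layer in [CrossLayer(16) for _ in range(3)]:
    x = layer(x0, x)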