Huang, Chengkai
Towards Agentic Recommender Systems in the Era of Multimodal Large Language Models
Huang, Chengkai, Wu, Junda, Xia, Yu, Yu, Zixu, Wang, Ruhan, Yu, Tong, Zhang, Ruiyi, Rossi, Ryan A., Kveton, Branislav, Zhou, Dongruo, McAuley, Julian, Yao, Lina
Recent breakthroughs in Large Language Models (LLMs) have led to the emergence of agentic AI systems that extend beyond the capabilities of standalone models. By empowering LLMs to perceive external environments, integrate multimodal information, and interact with various tools, these agentic systems exhibit greater autonomy and adaptability across complex tasks. This evolution brings new opportunities to recommender systems (RS): LLM-based Agentic RS (LLM-ARS) can offer more interactive, context-aware, and proactive recommendations, potentially reshaping the user experience and broadening the application scope of RS. Despite promising early results, fundamental challenges remain, including how to effectively incorporate external knowledge, balance autonomy with controllability, and evaluate performance in dynamic, multimodal settings. In this perspective paper, we first present a systematic analysis of LLM-ARS: (1) clarifying core concepts and architectures; (2) highlighting how agentic capabilities -- such as planning, memory, and multimodal reasoning -- can enhance recommendation quality; and (3) outlining key research questions in areas such as safety, efficiency, and lifelong personalization. We also discuss open problems and future directions, arguing that LLM-ARS will drive the next wave of RS innovation. Ultimately, we foresee a paradigm shift toward intelligent, autonomous, and collaborative recommendation experiences that more closely align with users' evolving needs and complex decision-making processes.
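Below is a minimal Python sketch of the kind of agentic recommendation loop the abstract describes (planning, memory, and tool use). All names (CatalogTool, AgenticRecommender, plan, recommend) are illustrative assumptions, and the planning step is a keyword heuristic standing in for an LLM call; this is not the paper's system.

```python
# Minimal sketch of an agentic recommendation loop: plan, use a tool, update memory.
# All names are hypothetical; the "planning" step is a keyword heuristic, not an LLM.
from dataclasses import dataclass, field

@dataclass
class CatalogTool:
    """Stand-in for an external tool the agent can query (e.g., item search)."""
    catalog: dict  # item_id -> set of tags

    def search(self, tags: set) -> list:
        # Return items whose tags overlap the query, most overlapping first.
        scored = [(len(tags & t), i) for i, t in self.catalog.items() if tags & t]
        return [i for _, i in sorted(scored, reverse=True)]

@dataclass
class AgenticRecommender:
    tool: CatalogTool
    memory: list = field(default_factory=list)  # long-term record of past turns

    def plan(self, utterance: str) -> set:
        # Planning step: a real system would let an LLM decompose the request;
        # here we simply extract keyword "tags" from the user's message.
        return {w.strip(".,!?").lower() for w in utterance.split() if len(w) > 3}

    def recommend(self, utterance: str, k: int = 3) -> list:
        tags = self.plan(utterance)
        if self.memory:                          # reuse context from earlier turns
            tags |= set().union(*self.memory)
        self.memory.append(tags)                 # update memory for future turns
        return self.tool.search(tags)[:k]

if __name__ == "__main__":
    tool = CatalogTool({"i1": {"hiking", "boots"}, "i2": {"trail", "hiking"},
                        "i3": {"office", "shoes"}})
    agent = AgenticRecommender(tool)
    print(agent.recommend("I need hiking gear for a trail trip"))
```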
Dual Contrastive Transformer for Hierarchical Preference Modeling in Sequential Recommendation
Huang, Chengkai, Wang, Shoujin, Wang, Xianzhi, Yao, Lina
Sequential recommender systems (SRSs) aim to predict the subsequent items that may interest users by comprehensively modeling the complex preferences embedded in the sequence of user-item interactions. However, most existing SRSs model only users' low-level preferences based on item ID information while ignoring the high-level preferences revealed by item attribute information, such as item category. Furthermore, they often utilize limited sequence context to predict the next item while overlooking richer inter-item semantic relations. To this end, in this paper, we propose a novel hierarchical preference modeling framework to model the complex low- and high-level preference dynamics for accurate sequential recommendation. Specifically, the framework contains a novel dual-transformer module and a novel dual contrastive learning scheme, designed to discriminatively learn users' low- and high-level preferences and to effectively enhance both low- and high-level preference learning, respectively. In addition, a novel semantics-enhanced context embedding module is devised to generate more informative context embeddings for further improving recommendation performance. Extensive experiments on six real-world datasets demonstrate both the superiority of our proposed method over state-of-the-art ones and the rationality of our design.
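As a rough illustration of the dual contrastive idea above, the sketch below contrasts a low-level (item-ID) sequence representation with a high-level (category) one via a symmetric InfoNCE loss. The single-layer Transformer encoders, mean pooling, and temperature are assumptions, not the paper's exact architecture.

```python
# Contrast low-level (item-ID) and high-level (category) sequence representations.
# Encoders and hyperparameters here are illustrative, not the paper's design.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.2):
    """Symmetric InfoNCE between two batches of sequence representations."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                    # [B, B] similarity matrix
    labels = torch.arange(z1.size(0))             # positives sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

B, L, d = 8, 20, 64
low_enc  = torch.nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
high_enc = torch.nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)

item_seq = torch.randn(B, L, d)          # embedded item-ID sequence (low level)
cate_seq = torch.randn(B, L, d)          # embedded category sequence (high level)

z_low  = low_enc(item_seq).mean(dim=1)   # pooled low-level preference
z_high = high_enc(cate_seq).mean(dim=1)  # pooled high-level preference
print(info_nce(z_low, z_high).item())
```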
Modeling Temporal Positive and Negative Excitation for Sequential Recommendation
Huang, Chengkai, Wang, Shoujin, Wang, Xianzhi, Yao, Lina
Sequential recommendation aims to predict the next item that interests a user by modeling his/her interest in items over time. Most existing works on sequential recommendation model users' dynamic interest in specific items while overlooking users' static interest revealed by static item attribute information, e.g., category or brand. Moreover, existing works often consider only the positive excitation of a user's historical interactions on his/her next choice among candidate items while ignoring the commonly existing negative excitation, resulting in insufficient modeling of dynamic interest. Overlooking static interest and negative excitation leads to incomplete interest modeling and thus impedes recommendation performance. To this end, in this paper, we propose modeling both static interest and negative excitation for dynamic interest to further improve recommendation performance. Accordingly, we design a novel Static-Dynamic Interest Learning (SDIL) framework featuring a novel Temporal Positive and Negative Excitation Modeling (TPNE) module for accurate sequential recommendation. TPNE is specially designed to comprehensively model dynamic interest based on temporal positive and negative excitation learning. Extensive experiments on three real-world datasets show that SDIL can effectively capture both static and dynamic interest and outperforms state-of-the-art baselines.
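The sketch below illustrates, under assumptions, how temporal positive and negative excitation might be scored: each past interaction contributes an exciting (+1) or inhibiting (-1) signal to a candidate item, decayed by elapsed time. The dot-product similarity and exponential decay kernel are illustrative choices, not SDIL's learned components.

```python
# Toy temporal excitation score: past interactions excite (+) or inhibit (-) a
# candidate item, with older events decayed away. Illustrative, not the paper's model.
import numpy as np

def excitation_score(candidate, history, now, decay=0.1):
    """history: list of (item_vec, sign, timestamp); sign=+1 excites, -1 inhibits."""
    score = 0.0
    for item_vec, sign, t in history:
        similarity = float(np.dot(candidate, item_vec))   # relatedness of items
        kernel = np.exp(-decay * (now - t))               # older events matter less
        score += sign * similarity * kernel
    return score

cand = np.array([0.9, 0.1])
hist = [(np.array([1.0, 0.0]), +1, 5.0),   # recently liked similar item: excites
        (np.array([0.8, 0.2]), -1, 2.0)]   # disliked similar item: inhibits
print(excitation_score(cand, hist, now=6.0))
```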
Dual Conditional Diffusion Models for Sequential Recommendation
Huang, Hongtao, Huang, Chengkai, Chang, Xiaojun, Hu, Wen, Yao, Lina
Recent advances in diffusion models have shown promising results for sequential recommendation (SR). However, current diffusion-based methods still exhibit two key limitations. First, they implicitly model the diffusion process over target item embeddings rather than over the discrete target item itself, leading to inconsistency in the recommendation process. Second, existing methods rely on either implicit or explicit conditional diffusion models, limiting their ability to fully capture the context of user behavior and leading to less robust target item embeddings. In this paper, we propose the Dual Conditional Diffusion Models for Sequential Recommendation (DCRec), introducing a discrete-to-continuous sequential recommendation diffusion framework. Our framework introduces a complete Markov chain to model the transition from the reversed target item representation to the discrete item index, bridging the discrete and continuous item spaces for diffusion models and ensuring consistency with the diffusion framework. Building on this framework, we present the Dual Conditional Diffusion Transformer (DCDT), which incorporates both implicit and explicit conditions for diffusion-based SR. Extensive experiments on public benchmark datasets demonstrate that DCRec outperforms state-of-the-art methods.
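As an illustration of the discrete-to-continuous bridge described above, the sketch below denoises a target-item embedding conditioned on a pooled sequence encoding and then maps the result back to a discrete item index via the item-embedding table. The tiny MLP denoiser and single reverse step are assumptions, not the DCDT architecture.

```python
# Toy discrete-to-continuous bridge: denoise a target-item embedding given the
# sequence condition, then round it to the nearest item index. Illustrative only.
import torch
import torch.nn as nn

n_items, d = 1000, 32
item_table = nn.Embedding(n_items, d)          # links discrete items and embeddings

denoiser = nn.Sequential(nn.Linear(2 * d + 1, 128), nn.GELU(), nn.Linear(128, d))

def reverse_step(x_t, t, seq_condition):
    """One reverse (denoising) step, explicitly conditioned on the sequence encoding."""
    t_feat = torch.full((x_t.size(0), 1), float(t))
    return denoiser(torch.cat([x_t, seq_condition, t_feat], dim=-1))

def to_discrete(x0_hat):
    """Map the denoised continuous embedding back to a discrete item index."""
    sims = x0_hat @ item_table.weight.t()      # [B, n_items] similarities
    return sims.argmax(dim=-1)

B = 4
seq_cond = torch.randn(B, d)                   # pooled encoding of the user's history
x_T = torch.randn(B, d)                        # pure noise at the final step
x0_hat = reverse_step(x_T, t=1, seq_condition=seq_cond)
print(to_discrete(x0_hat))                     # predicted next-item indices
```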
Federated Large Language Models: Current Progress and Future Directions
Yao, Yuhang, Zhang, Jianyi, Wu, Junda, Huang, Chengkai, Xia, Yu, Yu, Tong, Zhang, Ruiyi, Kim, Sungchul, Rossi, Ryan, Li, Ang, Yao, Lina, McAuley, Julian, Chen, Yiran, Joe-Wong, Carlee
Large language models (LLMs) are rapidly gaining popularity and have been widely adopted in real-world applications. While the quality of training data is essential, privacy concerns arise during data collection. Federated learning (FL) offers a solution by allowing multiple clients to collaboratively train LLMs without sharing local data. However, FL introduces new challenges, such as model convergence issues due to heterogeneous data and high communication costs. A comprehensive study is required to address these challenges and guide future research. This paper surveys federated learning for LLMs (FedLLM), highlighting recent advances and future directions. We focus on two key aspects: fine-tuning and prompt learning in a federated setting, discussing existing work and the associated research challenges. Finally, we propose potential research directions for federated LLMs, including pre-training and how LLMs can further enhance federated learning.
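A minimal FedAvg-style sketch of the federated fine-tuning setting surveyed above: clients fine-tune locally and only their parameter updates are aggregated, never their raw data. The small linear model stands in for LoRA-style trainable adapter weights and is an illustrative assumption, not any specific FedLLM system.

```python
# FedAvg-style aggregation over clients' locally fine-tuned weights.
# The linear model is a stand-in for LoRA-style adapters; illustrative only.
import torch

def local_update(global_w, data, lr=0.1, steps=5):
    """One client's local fine-tuning on its private (x, y) pairs."""
    w = global_w.clone().requires_grad_(True)
    x, y = data
    for _ in range(steps):
        loss = ((x @ w - y) ** 2).mean()
        grad, = torch.autograd.grad(loss, w)
        w = (w - lr * grad).detach().requires_grad_(True)
    return w.detach(), x.size(0)

def fed_avg(global_w, clients):
    """Aggregate client weights, weighted by local dataset size; raw data stays local."""
    updates = [local_update(global_w, d) for d in clients]
    total = sum(n for _, n in updates)
    return sum(w * (n / total) for w, n in updates)

d = 8
global_w = torch.zeros(d, 1)
clients = [(torch.randn(20, d), torch.randn(20, 1)),   # each tuple is one client's
           (torch.randn(50, d), torch.randn(50, 1))]   # private local dataset
for _ in range(3):                                     # three communication rounds
    global_w = fed_avg(global_w, clients)
print(global_w.squeeze())
```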
Learn When (not) to Trust Language Models: A Privacy-Centric Adaptive Model-Aware Approach
Huang, Chengkai, Wang, Rui, Xie, Kaige, Yu, Tong, Yao, Lina
Retrieval-augmented large language models (LLMs) have proven remarkably competent in various NLP tasks. Despite their great success, the knowledge provided by the retrieval process is not always useful for improving the model prediction, since for some samples the LLM may already be knowledgeable enough to answer the question correctly without retrieval. Aiming to save retrieval cost, previous work has proposed determining when to perform or skip retrieval in a data-aware manner by analyzing the LLM's pretraining data. However, these data-aware methods pose privacy risks and memory limitations, especially when they require access to sensitive or extensive pretraining data. Moreover, they offer limited adaptability under fine-tuning or continual-learning settings. We hypothesize that token embeddings are able to capture the model's intrinsic knowledge, which offers a safer and more straightforward way to judge the need for retrieval without the privacy risks associated with accessing pre-training data. It also alleviates the need to retain all the data used during model pre-training, requiring only that the token embeddings be kept. Extensive experiments and in-depth analyses demonstrate the superiority of our model-aware approach.
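The sketch below illustrates the model-aware idea under assumptions: decide whether retrieval is needed from the question's token embeddings alone, without touching pre-training data. The toy vocabulary, embedding table, and logistic gate are stand-ins, not the paper's trained components.

```python
# Toy model-aware retrieval gate: pool the question's token embeddings and decide
# whether to retrieve. Vocabulary, embeddings, and gate are illustrative stand-ins.
import torch
import torch.nn as nn

vocab_size, d = 5000, 64
token_emb = nn.Embedding(vocab_size, d)     # stands in for the LLM's embedding table
gate = nn.Linear(d, 1)                      # small head: "does this need retrieval?"

def needs_retrieval(token_ids, threshold=0.5):
    """Pool the question's token embeddings and apply the retrieval gate."""
    pooled = token_emb(token_ids).mean(dim=0)          # [d] question representation
    return torch.sigmoid(gate(pooled)).item() > threshold

question = torch.randint(0, vocab_size, (12,))         # a tokenized question
if needs_retrieval(question):
    print("retrieve external documents, then answer")
else:
    print("answer directly; the model is likely already knowledgeable")
```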