Personal Assistant Systems
Toward Safe and Human-Aligned Game Conversational Recommendation via Multi-Agent Decomposition
Hui, Zheng, Wei, Xiaokai, Jiang, Yexi, Gao, Kevin, Wang, Chen, Ong, Frank, Yoon, Se-eun, Pareek, Rachit, Gong, Michelle
Conversational recommender systems (CRS) have advanced with large language models, showing strong results in domains like movies. These domains typically involve fixed content and passive consumption, where user preferences can be matched by genre or theme. In contrast, games present distinct challenges: fast-evolving catalogs, interaction-driven preferences (e.g., skill level, mechanics, hardware), and increased risk of unsafe responses in open-ended conversation. We propose MATCHA, a multi-agent framework for CRS that assigns specialized agents for intent parsing, tool-augmented retrieval, multi-LLM ranking with reflection, explanation, and risk control which enabling finer personalization, long-tail coverage, and stronger safety. Evaluated on real user request dataset, MATCHA outperforms six baselines across eight metrics, improving Hit@5 by 20%, reducing popularity bias by 24%, and achieving 97.9% adversarial defense. Human and virtual-judge evaluations confirm improved explanation quality and user alignment.
Enhance Large Language Models as Recommendation Systems with Collaborative Filtering
Yang, Zhisheng, Xu, Xiaofei, Deng, Ke, Li, Li
As powerful tools in Natural Language Processing (NLP), Large Language Models (LLMs) have been leveraged for crafting recommendations to achieve precise alignment with user preferences and elevate the quality of the recommendations. The existing approaches implement both non-tuning and tuning strategies. Compared to following the tuning strategy, the approaches following the non-tuning strategy avoid the relatively costly, time-consuming, and expertise-requiring process of further training pre-trained LLMs on task-specific datasets, but they suffer the issue of not having the task-specific business or local enterprise knowledge. To the best of our knowledge, none of the existing approaches following the non-tuning strategy explicitly integrates collaborative filtering, one of the most successful recommendation techniques. This study aims to fill the gap by proposing critique-based LLMs as recommendation systems (Critic-LLM-RS). For our purpose, we train a separate machine-learning model called Critic that implements collaborative filtering for recommendations by learning from the interactions between many users and items. The Critic provides critiques to LLMs to significantly refine the recommendations. Extensive experiments have verified the effectiveness of Critic-LLM-RS on real datasets.
State Your Intention to Steer Your Attention: An AI Assistant for Intentional Digital Living
Choi, Juheon, Lee, Juyong, Kim, Jian, Kim, Chanyoung, Min, Taywon, Knox, W. Bradley, Lee, Min Kyung, Lee, Kimin
When working on digital devices, people often face distractions that can lead to a decline in productivity and efficiency, as well as negative psychological and emotional impacts. To address this challenge, we introduce a novel Artificial Intelligence (AI) assistant that elicits a user's intention, assesses whether ongoing activities are in line with that intention, and provides gentle nudges when deviations occur. The system leverages a large language model to analyze screenshots, application titles, and URLs, issuing notifications when behavior diverges from the stated goal. Its detection accuracy is refined through initial clarification dialogues and continuous user feedback. In a three-week, within-subjects field deployment with 22 participants, we compared our assistant to both a rule-based intent reminder system and a passive baseline that only logged activity. Results indicate that our AI assistant effectively supports users in maintaining focus and aligning their digital behavior with their intentions. Our source code is publicly available at https://intentassistant.github.io
Learning Correlated Reward Models: Statistical Barriers and Opportunities
Cherapanamjeri, Yeshwanth, Daskalakis, Constantinos, Farina, Gabriele, Mohammadpour, Sobhan
Random Utility Models (RUMs) are a classical framework for modeling user preferences and play a key role in reward modeling for Reinforcement Learning from Human Feedback (RLHF). However, a crucial shortcoming of many of these techniques is the Independence of Irrelevant Alternatives (IIA) assumption, which collapses \emph{all} human preferences to a universal underlying utility function, yielding a coarse approximation of the range of human preferences. On the other hand, statistical and computational guarantees for models avoiding this assumption are scarce. In this paper, we investigate the statistical and computational challenges of learning a \emph{correlated} probit model, a fundamental RUM that avoids the IIA assumption. First, we establish that the classical data collection paradigm of pairwise preference data is \emph{fundamentally insufficient} to learn correlational information, explaining the lack of statistical and computational guarantees in this setting. Next, we demonstrate that \emph{best-of-three} preference data provably overcomes these shortcomings, and devise a statistically and computationally efficient estimator with near-optimal performance. These results highlight the benefits of higher-order preference data in learning correlated utilities, allowing for more fine-grained modeling of human preferences. Finally, we validate these theoretical guarantees on several real-world datasets, demonstrating improved personalization of human preferences.
Gemini in Google Home Keeps Mistaking My Dog for a Cat
Google's Gemini AI has offended my dog, but at least it can automatically turn the lights on for me. A cat jumped up on my couch. The alert about the leaping feline is something my Google Home app sent me when I was out at a party. Turns out it was my dog. This notification came through a day after I turned on Google's Gemini for Home capability in the Google Home app.
Cognitive-Aligned Spatio-Temporal Large Language Models For Next Point-of-Interest Prediction
Zhai, Penglong, Li, Jie, Di, Fanyi, Liu, Yue, Yuan, Yifang, Huang, Jie, Wu, Peng, Wang, Sicong, Yin, Mingyang, Hu, Tingting, Xu, Yao, Li, Xin
The next point-of-interest (POI) recommendation task aims to predict the users' immediate next destinations based on their preferences and historical check-ins, holding significant value in location-based services. Recently, large language models (LLMs) have shown great potential in recommender systems, which treat the next POI prediction in a generative manner. However, these LLMs, pretrained primarily on vast corpora of unstructured text, lack the native understanding of structured geographical entities and sequential mobility patterns required for next POI prediction tasks. Moreover, in industrial-scale POI prediction applications, incorporating world knowledge and alignment of human cognition, such as seasons, weather conditions, holidays, and users' profiles (such as habits, occupation, and preferences), can enhance the user experience while improving recommendation performance. To address these issues, we propose CoAST (Cognitive-Aligned Spatial-Temporal LLMs), a framework employing natural language as an interface, allowing for the incorporation of world knowledge, spatio-temporal trajectory patterns, profiles, and situational information. Specifically, CoAST mainly comprises of 2 stages: (1) Recommendation Knowledge Acquisition through continued pretraining on the enriched spatial-temporal trajectory data of the desensitized users; (2) Cognitive Alignment to align cognitive judgments with human preferences using enriched training data through Supervised Fine-Tuning (SFT) and a subsequent Reinforcement Learning (RL) phase. Extensive offline experiments on various real-world datasets and online experiments deployed in "Guess Where You Go" of AMAP App homepage demonstrate the effectiveness of CoAST.
Causality Enhancement for Cross-Domain Recommendation
Wu, Zhibo, Wu, Yunfan, Jiang, Lin, Yang, Ping, Hu, Yao
Cross-domain recommendation forms a crucial component in recommendation systems. It leverages auxiliary information through source domain tasks or features to enhance target domain recommendations. However, incorporating inconsistent source domain tasks may result in insufficient cross-domain modeling or negative transfer. While incorporating source domain features without considering the underlying causal relationships may limit their contribution to final predictions. Thus, a natural idea is to directly train a cross-domain representation on a causality-labeled dataset from the source to target domain. Yet this direction has been rarely explored, as identifying unbiased real causal labels is highly challenging in real-world scenarios. In this work, we attempt to take a first step in this direction by proposing a causality-enhanced framework, named CE-CDR. Specifically, we first reformulate the cross-domain recommendation as a causal graph for principled guidance. We then construct a causality-aware dataset heuristically. Subsequently, we derive a theoretically unbiased Partial Label Causal Loss to generalize beyond the biased causality-aware dataset to unseen cross-domain patterns, yielding an enriched cross-domain representation, which is then fed into the target model to enhance target-domain recommendations. Theoretical and empirical analyses, as well as extensive experiments, demonstrate the rationality and effectiveness of CE-CDR and its general applicability as a model-agnostic plugin. Moreover, it has been deployed in production since April 2025, showing its practical value in real-world applications.
GemiRec: Interest Quantization and Generation for Multi-Interest Recommendation
Wu, Zhibo, Wu, Yunfan, Liu, Quan, Jiang, Lin, Yang, Ping, Hu, Yao
Multi-interest recommendation has gained attention, especially in industrial retrieval stage. Unlike classical dual-tower methods, it generates multiple user representations instead of a single one to model comprehensive user interests. However, prior studies have identified two underlying limitations: the first is interest collapse, where multiple representations homogenize. The second is insufficient modeling of interest evolution, as they struggle to capture latent interests absent from a user's historical behavior. We begin with a thorough review of existing works in tackling these limitations. Then, we attempt to tackle these limitations from a new perspective. Specifically, we propose a framework-level refinement for multi-interest recommendation, named GemiRec. The proposed framework leverages interest quantization to enforce a structural interest separation and interest generation to learn the evolving dynamics of user interests explicitly. It comprises three modules: (a) Interest Dictionary Maintenance Module (IDMM) maintains a shared quantized interest dictionary. (b) Multi-Interest Posterior Distribution Module (MIPDM) employs a generative model to capture the distribution of user future interests. (c) Multi-Interest Retrieval Module (MIRM) retrieves items using multiple user-interest representations. Both theoretical and empirical analyses, as well as extensive experiments, demonstrate its advantages and effectiveness. Moreover, it has been deployed in production since March 2025, showing its practical value in industrial applications.
Large Scale Retrieval for the LinkedIn Feed using Causal Language Models
Ramanujam, Sudarshan Srinivasa, Alonso, Antonio, Kataria, Saurabh, Dangi, Siddharth, Gupta, Akhilesh, Tiwana, Birjodh Singh, Somaiya, Manas, Simon, Luke, Byrne, David, Ha, Sojeong, Zhou, Sen, Akterskii, Andrei, Liu, Zhanglong, Sriram, Samira, Xiong, Crescent, Pei, Zhoutao, Shao, Angela, Li, Alex, Xiao, Annie, Kolb, Caitlin, Kistler, Thomas, Moore, Zach, Firooz, Hamed
In large-scale recommendation systems like the LinkedIn Feed, the retrieval stage is critical for narrowing hundreds of millions of potential candidates to a manageable subset for ranking. LinkedIn's Feed serves suggested content from outside of the member's network (based on the member's topical interests), where 2000 candidates are retrieved from a pool of hundreds of millions candidate with a latency budget of a few milliseconds and inbound QPS of several thousand per second. This paper presents a novel retrieval approach that fine-tunes a large causal language model (Meta's LLaMA 3) as a dual encoder to generate high quality embeddings for both users (members) and content (items), using only textual input. We describe the end-to-end pipeline, including prompt design for embedding generation, techniques for fine-tuning at LinkedIn's scale, and infrastructure for low latency, cost effective online serving. We share our findings on how quantiz-ing numerical features in the prompt enables the information to get properly encoded in the embedding, facilitating greater alignment between the retrieval and ranking layer. The system was evaluated using offline metrics and an online A/B test, which showed substantial improvements in member engagement. We observed significant gains among newer members, who often lack strong network connections, indicating that high-quality suggested content aids retention. This work demonstrates how generative language models can be effectively adapted for real time, high throughput retrieval in industrial applications.
GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents
Yu, Xi, Yang, Yang, Liu, Qun, Du, Yonghua, McSweeney, Sean, Lin, Yuewei
Cellular image segmentation is essential for quantitative biology yet remains difficult due to heterogeneous modalities, morphological variability, and limited annotations. We present GenCellAgent, a training-free multi-agent framework that orchestrates specialist segmenters and generalist vision-language models via a planner-executor-evaluator loop (choose tool $\rightarrow$ run $\rightarrow$ quality-check) with long-term memory. The system (i) automatically routes images to the best tool, (ii) adapts on the fly using a few reference images when imaging conditions differ from what a tool expects, (iii) supports text-guided segmentation of organelles not covered by existing models, and (iv) commits expert edits to memory, enabling self-evolution and personalized workflows. Across four cell-segmentation benchmarks, this routing yields a 15.7\% mean accuracy gain over state-of-the-art baselines. On endoplasmic reticulum and mitochondria from new datasets, GenCellAgent improves average IoU by 37.6\% over specialist models. It also segments novel objects such as the Golgi apparatus via iterative text-guided refinement, with light human correction further boosting performance. Together, these capabilities provide a practical path to robust, adaptable cellular image segmentation without retraining, while reducing annotation burden and matching user preferences.