Personal Assistant Systems
What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context
Ouyang, Zhongyu, Wen, Qianlong, Zhang, Chunhui, Ye, Yanfang, Vosoughi, Soroush
Sequential recommendation systems aspire to profile users by interpreting their interaction histories, echoing how humans make decisions by weighing experience, relative preference strength, and situational relevance. Yet, existing large language model (LLM)-based recommenders often fall short of mimicking the flexible, context-aware decision strategies humans exhibit, neglecting the structured, dynamic, and context-aware mechanisms fundamental to human behaviors. To bridge this gap, we propose RecPO, a preference optimization framework that models structured feedback and contextual delay to emulate human-like prioritization in sequential recommendation. RecPO exploits adaptive reward margins based on inferred preference hierarchies and temporal signals, enabling the model to favor immediately relevant items and to distinguish between varying degrees of preference and aversion. Extensive experiments across five real-world datasets demonstrate that RecPO not only yields performance gains over state-of-the-art baselines, but also mirrors key characteristics of human decision-making: favoring timely satisfaction, maintaining coherent preferences, and exercising discernment under shifting contexts.
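The abstract does not give the RecPO objective, so the following is only a minimal sketch of what a pairwise preference loss with an adaptive margin could look like. The function names, the margin formula (rating gap divided by one plus delay), and the parameters `scale` and `beta` are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def adaptive_margin(score_chosen, score_rejected, delay_chosen, scale=1.0):
    """Hypothetical margin: grows with the preference gap, shrinks with delay.

    score_* are explicit feedback values (e.g. ratings); delay_chosen is how
    long after the context the chosen item was consumed (assumed >= 0).
    """
    gap = max(score_chosen - score_rejected, 0.0)
    return scale * gap / (1.0 + delay_chosen)


def margin_preference_loss(logit_chosen, logit_rejected, margin, beta=1.0):
    """Compute -log(sigmoid(beta * (logit_chosen - logit_rejected) - margin))."""
    z = beta * (logit_chosen - logit_rejected) - margin
    # Numerically stable form of log(1 + exp(-z)).
    return math.log1p(math.exp(-abs(z))) + max(-z, 0.0)


if __name__ == "__main__":
    # A strongly preferred, immediately consumed item gets a larger margin.
    near = adaptive_margin(5.0, 1.0, delay_chosen=0.0)
    far = adaptive_margin(5.0, 1.0, delay_chosen=3.0)
    print(near, far)
    print(margin_preference_loss(2.0, 1.0, near), margin_preference_loss(2.0, 1.0, far))
```

Under these assumptions, a larger margin makes the loss larger for the same pair of logits, so the model is pushed to separate strongly preferred, timely items from rejected ones more decisively.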
Exploring human-SAV interaction using LLMs: The impact of psychological factors on user experience
Guo, Lirui, Burke, Michael G., Griggs, Wynita M.
There has been extensive prior work exploring how psychological factors such as anthropomorphism affect the adoption of Shared Autonomous Vehicles (SAVs). However, limited research has been conducted on how prompt strategies in large language model (LLM)-powered conversational SAV agents affect users' perceptions, experiences, and intentions to adopt such technology. In this work, we investigate how conversational SAV agents powered by LLMs drive these psychological factors, such as psychological ownership, the sense of possession a user may come to feel towards an entity or object they may not legally own. We designed four SAV agents with varying levels of anthropomorphic characteristics and psychological ownership triggers. Quantitative measures of psychological ownership, anthropomorphism, quality of service, disclosure tendency, sentiment of SAV responses, and overall acceptance were collected after participants interacted with each SAV. Qualitative feedback was also gathered regarding the experience of psychological ownership during the interactions. The results indicate that an SAV designed to be more anthropomorphic and to induce psychological ownership improved users' perceptions of the SAV's human-like qualities, and its responses were perceived as more positive but also more subjective compared to the control conditions. Qualitative findings support established routes to psychological ownership in the SAV context and suggest that the conversational agent's perceived performance may also influence psychological ownership. Both quantitative and qualitative outcomes highlight the importance of personalization in designing effective SAV interactions. These findings provide practical guidance for designing conversational SAV agents that enhance user experience and adoption.
Bandits with Single-Peaked Preferences and Limited Resources
Keinan, Gur, Torkan, Rotem, Ben-Porat, Omer
Modern recommendation systems often face the challenge of personalization at scale--learning individual user preferences while simultaneously satisfying global resource allocation constraints. To illustrate, consider a content platform that must decide which content creators to commission daily, where each creator has a different cost and produces ephemeral content on specific topics. Each user has preferences over all creators' content styles and topics. After commissioning a subset of creators that fit the platform's budget, it matches each user to content from one of these creators, where the same creator's content can be recommended to multiple users. The challenge lies in learning individual user preferences for each creator's content while selecting which creators to commission and how to assign their content to maximize user satisfaction. This problem fits the combinatorial multi-armed bandit framework, where the decision-maker must choose structured action sets [8], such as assigning each user to an item. The goal is to maximize cumulative reward, or equivalently, minimize regret by balancing exploration and exploitation. Unfortunately, combinatorial problems like the one in the example above are NP-complete even for offline settings. Therefore, traditional approaches settle for weaker notions of α-regret [8], competing against the best ef- (All authors contributed equally to this work.)
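To make the offline version of the commissioning example concrete, here is a minimal sketch that assumes user utilities are already known (no learning) and simply enumerates every affordable subset of creators. The function name `best_commission` and the toy data are hypothetical. The exhaustive search is exponential in the number of creators, which is the cost the passage alludes to; it is not the algorithm proposed in the paper.

```python
from itertools import combinations


def best_commission(costs, utility, budget):
    """Exhaustively search affordable creator subsets (exponential time).

    costs:   {creator: cost}
    utility: {user: {creator: utility of that creator's content}}
    Each user is matched to their favourite commissioned creator, and one
    creator may serve any number of users.
    """
    creators = list(costs)
    best_value, best_subset = 0.0, ()
    for size in range(1, len(creators) + 1):
        for subset in combinations(creators, size):
            if sum(costs[c] for c in subset) > budget:
                continue  # subset does not fit the budget
            value = sum(max(row[c] for c in subset) for row in utility.values())
            if value > best_value:
                best_value, best_subset = value, subset
    return best_subset, best_value


if __name__ == "__main__":
    costs = {"a": 2, "b": 2, "c": 3}
    utility = {
        "u1": {"a": 0.9, "b": 0.1, "c": 0.5},
        "u2": {"a": 0.2, "b": 0.8, "c": 0.9},
    }
    print(best_commission(costs, utility, budget=4))
```

In the toy data, creator "c" is the best single choice, yet the pair ("a", "b") fits the same budget and serves both users better, which shows why creator selection and user assignment have to be decided jointly.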
From Entity Reliability to Clean Feedback: An Entity-Aware Denoising Framework Beyond Interaction-Level Signals
Liu, Ze, Wang, Xianquan, Liu, Shuochen, Ma, Jie, Xu, Huibo, Han, Yupeng, Zhang, Kai, Zhou, Jun
Implicit feedback is central to modern recommender systems but is inherently noisy, often impairing model training and degrading user experience. At scale, such noise can mislead learning processes, reducing both recommendation accuracy and platform value. Existing denoising strategies typically overlook the entity-specific nature of noise while introducing high computational costs and complex hyperparameter tuning. To address these challenges, we propose EARD (Entity-Aware Reliability-Driven Denoising), a lightweight framework that shifts the focus from interaction-level signals to entity-level reliability. Motivated by the empirical observation that training loss correlates with noise, EARD quantifies user and item reliability via their average training losses as a proxy for reputation, and integrates these entity-level factors with interaction-level confidence. The framework is model-agnostic, computationally efficient, and requires only two intuitive hyperparameters. Extensive experiments across multiple datasets and backbone models demonstrate that EARD yields substantial improvements over state-of-the-art baselines (e.g., up to 27.01% gain in NDCG@50), while incurring negligible additional computational cost. Comprehensive ablation studies and mechanism analyses further confirm EARD's robustness to hyperparameter choices and its practical scalability. These results highlight the importance of entity-aware reliability modeling for denoising implicit feedback and pave the way for more robust recommendation research.
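The idea of scoring users and items by their average training loss can be sketched as follows. The abstract does not state how entity-level reliability is combined with interaction-level confidence, so the exponential weighting below, the function names, and the hyperparameters `alpha` and `beta` are assumptions chosen for illustration rather than EARD's actual formulation.

```python
import math
from collections import defaultdict


def mean_loss_by_entity(records):
    """Average per-interaction training loss for each user and each item.

    records: iterable of (user_id, item_id, loss) tuples.
    """
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    item_sum, item_cnt = defaultdict(float), defaultdict(int)
    for user, item, loss in records:
        user_sum[user] += loss
        user_cnt[user] += 1
        item_sum[item] += loss
        item_cnt[item] += 1
    user_mean = {u: user_sum[u] / user_cnt[u] for u in user_sum}
    item_mean = {i: item_sum[i] / item_cnt[i] for i in item_sum}
    return user_mean, item_mean


def interaction_weights(records, alpha=1.0, beta=1.0):
    """Hypothetical rule: lower average loss means a more reliable entity."""
    records = list(records)
    user_mean, item_mean = mean_loss_by_entity(records)
    weights = {}
    for user, item, loss in records:
        user_rel = math.exp(-alpha * user_mean[user])  # entity level: user
        item_rel = math.exp(-beta * item_mean[item])   # entity level: item
        confidence = math.exp(-loss)                   # interaction level
        weights[(user, item)] = user_rel * item_rel * confidence
    return weights


if __name__ == "__main__":
    log = [("u1", "a", 0.2), ("u1", "b", 0.4), ("u2", "a", 2.0)]
    for pair, weight in interaction_weights(log).items():
        print(pair, round(weight, 4))
```

The resulting weights could then rescale each interaction's contribution to the training loss of any backbone model, which is one way to read the abstract's "model-agnostic" claim.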
Controlled Personalization in Legacy Media Online Services: A Case Study in News Recommendation
Holzleitner, Marlene, Leitner, Stephan, Jorgensen, Hanna Lind, Schmitz, Christoph, Welander, Jacob, Jannach, Dietmar
Personalized news recommendations have become a standard feature of large news aggregation services, optimizing user engagement through automated content selection. In contrast, legacy news media often approach personalization cautiously, striving to balance technological innovation with core editorial values. As a result, online platforms of traditional news outlets typically combine editorially curated content with algorithmically selected articles - a strategy we term controlled personalization. In this industry paper, we evaluate the effectiveness of controlled personalization through an A/B test conducted on the website of a major Norwegian legacy news organization. Our findings indicate that even a modest level of personalization yields substantial benefits. Specifically, we observe that users exposed to personalized content demonstrate higher click-through rates and reduced navigation effort, suggesting improved discovery of relevant content. Moreover, our analysis reveals that controlled personalization contributes to greater content diversity and catalog coverage and in addition reduces popularity bias. Overall, our results suggest that controlled personalization can successfully align user needs with editorial goals, offering a viable path for legacy media to adopt personalization technologies while upholding journalistic values.
MATT-CTR: Unleashing a Model-Agnostic Test-Time Paradigm for CTR Prediction with Confidence-Guided Inference Paths
Zhang, Moyu, Chen, Yun, Jin, Yujun, Hu, Jinxin, Zhang, Yu, Zeng, Xiaoyi
Recently, a growing body of research has focused on either optimizing CTR model architectures to better model feature interactions or refining training objectives to aid parameter learning, thereby achieving better predictive performance. However, previous efforts have primarily focused on the training phase, largely neglecting opportunities for optimization during the inference phase. Infrequently occurring feature combinations, in particular, can degrade prediction performance, leading to unreliable or low-confidence outputs. To unlock the predictive potential of trained CTR models, we propose a Model-Agnostic Test-Time paradigm (MATT), which leverages the confidence scores of feature combinations to guide the generation of multiple inference paths, thereby mitigating the influence of low-confidence features on the final prediction. Specifically, to quantify the confidence of feature combinations, we introduce a hierarchical probabilistic hashing method to estimate the occurrence frequencies of feature combinations at various orders, which serve as their corresponding confidence scores. Then, using the confidence scores as sampling probabilities, we generate multiple instance-specific inference paths through iterative sampling and subsequently aggregate the prediction scores from multiple paths to produce robust predictions. Finally, extensive offline experiments and online A/B tests strongly validate the compatibility and effectiveness of MATT across existing CTR models.
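The sampling-and-aggregation step described above can be illustrated with a simplified sketch. It assumes one confidence score per feature (supplied by the caller rather than estimated by hashing), independent Bernoulli sampling to form each path, and plain averaging of the path scores. The names `sample_path` and `predict_with_paths`, the `model` callable, and the `n_paths` and `seed` parameters are hypothetical, and the paper's hierarchical probabilistic hashing is not reproduced here.

```python
import random


def sample_path(features, confidence, rng):
    """Keep each feature with probability equal to its confidence score."""
    return [f for f in features if rng.random() < confidence[f]]


def predict_with_paths(features, confidence, model, n_paths=8, seed=0):
    """Average a trained model's scores over several sampled feature subsets.

    model: any callable mapping a list of features to a score; a stand-in
    for a trained CTR model, which stays unchanged at test time.
    """
    rng = random.Random(seed)
    scores = [model(sample_path(features, confidence, rng)) for _ in range(n_paths)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy model: the score is the number of features that survive sampling.
    toy_model = lambda feats: float(len(feats))
    conf = {"user_id": 1.0, "rare_combo": 0.1}
    print(predict_with_paths(["user_id", "rare_combo"], conf, toy_model))
```

Because low-confidence features are dropped from most paths, they influence the averaged prediction less than frequently observed ones, which is the effect the abstract attributes to confidence-guided inference paths.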
'I realised I'd been ChatGPT-ed into bed': how 'Chatfishing' made finding love on dating apps even weirder
Where once people were duped by soft-focus photos and borrowed chat-up lines, now they have to watch out for computer-generated charm. But it's one thing to use a witty phrase - another thing entirely to build a whole fake persona. Standing outside the pub, 36-year-old business owner Rachel took a final tug on her vape and steeled herself to meet the man she'd spent the last three weeks opening up to. They'd matched on the dating app Hinge and built a rapport that quickly became something deeper. "From the beginning he was asking very open-ended questions, and that felt refreshing," says Rachel. One early message from her match read: "I've been reading a bit about attachment styles lately, it's helped me to understand myself better - and the type of partner I should be looking for. Have you ever looked at yours? Do you know your attachment style?" "It was like he was genuinely trying to get to know me on a deeper level. The questions felt a lot more thoughtful than the usual, 'How's your day going?'"