Personal Assistant Systems
The Value of Personalized Recommendations: Evidence from Netflix
Zielnicki, Kevin, Aridor, Guy, Bibaut, Aurélien, Tran, Allen, Chou, Winston, Kallus, Nathan
Personalized recommendation systems shape much of user choice online, yet their targeted nature makes separating out the value of recommendation and the underlying goods challenging. We build a discrete choice model that embeds recommendation-induced utility, low-rank heterogeneity, and flexible state dependence and apply the model to viewership data at Netflix. We exploit idiosyncratic variation introduced by the recommendation algorithm to identify and separately value these components as well as to recover model-free diversion ratios that we can use to validate our structural model. We use the model to evaluate counterfactuals that quantify the incremental engagement generated by personalized recommendations. First, we show that replacing the current recommender system with a matrix factorization or popularity-based algorithm would lead to 4% and 12% reduction in engagement, respectively, and decreased consumption diversity. Second, most of the consumption increase from recommendations comes from effective targeting, not mechanical exposure, with the largest gains for mid-popularity goods (as opposed to broadly appealing or very niche goods).
TBGRecall: A Generative Retrieval Model for E-commerce Recommendation Scenarios
Liang, Zida, Wu, Changfa, Huang, Dunxian, Sun, Weiqiang, Wang, Ziyang, Yan, Yuliang, Wu, Jian, Jiang, Yuning, Zheng, Bo, Chen, Ke, Zhou, Silu, Zhang, Yu
Recommendation systems are essential tools in modern e-commerce, facilitating personalized user experiences by suggesting relevant products. Recent advancements in generative models have demonstrated potential in enhancing recommendation systems; however, these models often exhibit limitations in optimizing retrieval tasks, primarily due to their reliance on autoregressive generation mechanisms. Conventional approaches introduce sequential dependencies that impede efficient retrieval, as they are inherently unsuitable for generating multiple items without positional constraints within a single request session. To address these limitations, we propose TBGRecall, a framework integrating Next Session Prediction (NSP), designed to enhance generative retrieval models for e-commerce applications. Our framework reformulation involves partitioning input samples into multi-session sequences, where each sequence comprises a session token followed by a set of item tokens, and then further incorporate multiple optimizations tailored to the generative task in retrieval scenarios. In terms of training methodology, our pipeline integrates limited historical data pre-training with stochastic partial incremental training, significantly improving training efficiency and emphasizing the superiority of data recency over sheer data volume. Our extensive experiments, conducted on public benchmarks alongside a large-scale industrial dataset from TaoBao, show TBGRecall outperforms the state-of-the-art recommendation methods, and exhibits a clear scaling law trend. Ultimately, NSP represents a significant advancement in the effectiveness of generative recommendation systems for e-commerce applications.
G-UBS: Towards Robust Understanding of Implicit Feedback via Group-Aware User Behavior Simulation
Chen, Boyu, Chen, Siran, Yue, Zhengrong, Yan, Kainan, Yu, Chenyun, Kong, Beibei, Lei, Cheng, Zhuo, Chengxiang, Li, Zang, Wang, Yali
User feedback is critical for refining recommendation systems, yet explicit feedback (e.g., likes or dislikes) remains scarce in practice. As a more feasible alternative, inferring user preferences from massive implicit feedback has shown great potential (e.g., a user quickly skipping a recommended video usually indicates disinterest). Unfortunately, implicit feedback is often noisy: a user might skip a video due to accidental clicks or other reasons, rather than disliking it. Such noise can easily misjudge user interests, thereby undermining recommendation performance. To address this issue, we propose a novel Group-aware User Behavior Simulation (G-UBS) paradigm, which leverages contextual guidance from relevant user groups, enabling robust and in-depth interpretation of implicit feedback for individual users. Specifically, G-UBS operates via two key agents. First, the User Group Manager (UGM) effectively clusters users to generate group profiles utilizing a ``summarize-cluster-reflect" workflow based on LLMs. Second, the User Feedback Modeler (UFM) employs an innovative group-aware reinforcement learning approach, where each user is guided by the associated group profiles during the reinforcement learning process, allowing UFM to robustly and deeply examine the reasons behind implicit feedback. To assess our G-UBS paradigm, we have constructed a Video Recommendation benchmark with Implicit Feedback (IF-VR). To the best of our knowledge, this is the first multi-modal benchmark for implicit feedback evaluation in video recommendation, encompassing 15k users, 25k videos, and 933k interaction records with implicit feedback. Extensive experiments on IF-VR demonstrate that G-UBS significantly outperforms mainstream LLMs and MLLMs, with a 4.0% higher proportion of videos achieving a play rate > 30% and 14.9% higher reasoning accuracy on IF-VR.
GFlowGR: Fine-tuning Generative Recommendation Frameworks with Generative Flow Networks
Wang, Yejing, Zhou, Shengyu, Lu, Jinyu, Liu, Qidong, Li, Xinhang, Zhang, Wenlin, Li, Feng, Wang, Pengjie, Xu, Jian, Zheng, Bo, Zhao, Xiangyu
Generative recommendations (GR), which usually include item tokenizers and generative Large Language Models (LLMs), have demonstrated remarkable success across a wide range of scenarios. The majority of existing research efforts primarily concentrate on developing powerful item tokenizers or advancing LLM decoding strategies to attain superior performance. However, the critical fine-tuning step in GR frameworks, which is essential for adapting LLMs to recommendation data, remains largely unexplored. Current approaches predominantly rely on either the next-token prediction loss of supervised fine-tuning (SFT) or recommendationspecific direct preference optimization (DPO) strategies. Both methods ignore the exploration of possible positive unobserved samples, which is commonly referred to as the exposure bias problem. To mitigate this problem, this paper treats the GR as a multi-step generation task and constructs a GFlowNets-based fine-tuning framework (GFlowGR). The proposed framework integrates collaborative knowledge from traditional recommender systems to create an adaptive trajectory sampler and a comprehensive reward model. Leveraging the diverse generation property of GFlowNets, along with sampling and heuristic weighting techniques, GFlowGR emerges as a promising approach to mitigate the exposure bias problem. Extensive empirical results on two real-world datasets and with two different GR backbones highlight the effectiveness and robustness of GFlowGR.
Constructing Political Coordinates: Aggregating Over the Opposition for Diverse News Recommendation
Earl, Eamon, Ding, Chen, Valenzano, Richard, Paulen-Patterson, Drai
Abstract--In the past two decades, open access to news and information has increased rapidly, empowering educated political growth within democratic societies. News recommender systems (NRSs) have shown to be useful in this process, minimizing political disengagement and information overload by providing individuals with articles on topics that matter to them. Unfortunately, NRSs often conflate underlying user interest with the partisan bias of the articles in their reading history and with the most popular biases present in the coverage of their favored topics. Over extended interaction, this can result in the formation of filter bubbles and the polarization of user partisanship. In this paper, we propose a novel embedding space called Constructed Political Coordinates (CPC), which models the political partisanship of users over a given topic-space, relative to a larger sample population. We apply a simple collaborative filtering (CF) framework using CPC-based correlation to recommend articles sourced from oppositional users, who have different biases from the user in question. We compare against classical CF methods and find that CPC-based methods promote pointed bias diversity and better match the true political tolerance of users, while classical methods implicitly exploit biases to maximize interaction. Recommender system (RS) utility has two main value measurements: users seeing content that they engage positively with, and the content providers maximizing engagement with their content or platform. While the two are evidently correlated (i.e. a user who is not properly catered to will likely cease to use the platform), the latter provides motivation for recommendation algorithms to shift a user's preferences to make them easier to cater to, resulting in higher expectations of long-term engagement [1]. Previous research [2] on the relationship between recom-mender systems and American political typology suggests that users with more extreme political preferences exhibit higher engagement metrics with their recommended news. Additionally, it was found that their engagement can be maximized by recommending articles among which a dominant percentage express a singular partisan bias. This establishes an implicit incentive for a News Recommender System (NRS) to shift user preferences toward political extremes through selection bias, particularly in long-term value systems or those leveraging popularity [1]. This phenomenon results in the formation of filter bubbles, where users are eventually shown only perspectives in their news which comply with their preexisting opinions, and users with heterogeneous partisanship over distinct topics have their political ideology homogenized over time.
Humanlike Multi-user Agent (HUMA): Designing a Deceptively Human AI Facilitator for Group Chats
Jacniacki, Mateusz, Serrat, Martí Carmona
Conversational agents built on large language models (LLMs) are becoming increasingly prevalent, yet most systems are designed for one-on-one, turn-based exchanges rather than natural, asynchronous group chats. As AI assistants become widespread throughout digital platforms, from virtual assistants to customer service, developing natural and humanlike interaction patterns seems crucial for maintaining user trust and engagement. We present the Humanlike Multi-user Agent (HUMA), an LLM-based facilitator that participates in multi-party conversations using human-like strategies and timing. HUMA extends prior multi-user chatbot work with an event-driven architecture that handles messages, replies, reactions and introduces realistic response-time simulation. HUMA comprises three components--Router, Action Agent, and Reflection--which together adapt LLMs to group conversation dynamics. We evaluate HUMA in a controlled study with 97 participants in four-person role-play chats, comparing AI and human community managers (CMs). Participants classified CMs as human at near-chance rates in both conditions, indicating they could not reliably distinguish HUMA agents from humans. Subjective experience was comparable across conditions: community-manager effectiveness, social presence, and engagement/satisfaction differed only modestly with small effect sizes. Our results suggest that, in natural group chat settings, an AI facilitator can match human quality while remaining difficult to identify as nonhuman.
A*-based Temporal Logic Path Planning with User Preferences on Relaxed Task Satisfaction
Kamale, Disha, Yu, Xi, Vasile, Cristian-Ioan
In this work, we consider the problem of planning for temporal logic tasks in large robot environments. When full task compliance is unattainable, we aim to achieve the best possible task satisfaction by integrating user preferences for relaxation into the planning process. Utilizing the automata-based representations for temporal logic goals and user preferences, we propose an A*-based planning framework. This approach effectively tackles large-scale problems while generating near-optimal high-level trajectories. To facilitate this, we propose a simple, efficient heuristic that allows for planning over large robot environments in a fraction of time and search memory as compared to uninformed search algorithms. We present extensive case studies to demonstrate the scalability, runtime analysis as well as empirical bounds on the suboptimality of the proposed heuristic.
Generalization Bounds for Semi-supervised Matrix Completion with Distributional Side Information
Ledent, Antoine, Soo, Mun Chong, Hieu, Nong Minh
We study a matrix completion problem where both the ground truth $R$ matrix and the unknown sampling distribution $P$ over observed entries are low-rank matrices, and \textit{share a common subspace}. We assume that a large amount $M$ of \textit{unlabeled} data drawn from the sampling distribution $P$ is available, together with a small amount $N$ of labeled data drawn from the same distribution and noisy estimates of the corresponding ground truth entries. This setting is inspired by recommender systems scenarios where the unlabeled data corresponds to `implicit feedback' (consisting in interactions such as purchase, click, etc. ) and the labeled data corresponds to the `explicit feedback', consisting of interactions where the user has given an explicit rating to the item. Leveraging powerful results from the theory of low-rank subspace recovery, together with classic generalization bounds for matrix completion models, we show error bounds consisting of a sum of two error terms scaling as $\widetilde{O}\left(\sqrt{\frac{nd}{M}}\right)$ and $\widetilde{O}\left(\sqrt{\frac{dr}{N}}\right)$ respectively, where $d$ is the rank of $P$ and $r$ is the rank of $M$. In synthetic experiments, we confirm that the true generalization error naturally splits into independent error terms corresponding to the estimations of $P$ and and the ground truth matrix $\ground$ respectively. In real-life experiments on Douban and MovieLens with most explicit ratings removed, we demonstrate that the method can outperform baselines relying only on the explicit ratings, demonstrating that our assumptions provide a valid toy theoretical setting to study the interaction between explicit and implicit feedbacks in recommender systems.
Mum of two left penniless by Tinder scammer
A mother of two says she was left penniless after giving her savings to Tinder predator Christopher Harkins in a fake investment scam. The pair matched on the dating app in London in 2020. Caitlyn - not her real name - told how the fraudster and rapist initially tried to talk her into going on holiday with him - a regular ruse of Harkins, now 38. When she said she couldn't afford a holiday, he offered to help by doubling what money she had via his foreign currency exchange business. She's one of four women the BBC is aware of who were targeted by Harkins in the capital - where he fled to after his crimes were exposed in Scotland.
Beyond Parity: Fairness Objectives for Collaborative Filtering
We study fairness in collaborative-filtering recommender systems, which are sensitive to discrimination that exists in historical data. Biased data can lead collaborative-filtering methods to make unfair predictions for users from minority groups. We identify the insufficiency of existing fairness metrics and propose four new metrics that address different forms of unfairness. These fairness metrics can be optimized by adding fairness terms to the learning objective. Experiments on synthetic and real data show that our new metrics can better measure fairness than the baseline, and that the fairness objectives effectively help reduce unfairness.