Goto

Collaborating Authors

 Personal Assistant Systems


Counterfactual Risk Minimization with IPS-Weighted BPR and Self-Normalized Evaluation in Recommender Systems

arXiv.org Artificial Intelligence

Learning and evaluating recommender systems from logged implicit feedback is challenging due to exposure bias. While inverse propensity scoring (IPS) corrects this bias, it often suffers from high variance and instability. In this paper, we present a simple and effective pipeline that integrates IPS-weighted training with an IPS-weighted Bayesian Personalized Ranking (BPR) objective augmented by a Propensity Regularizer (PR). We compare Direct Method (DM), IPS, and Self-Normalized IPS (SNIPS) for offline policy evaluation, and demonstrate how IPS-weighted training improves model robustness under biased exposure. The proposed PR further mitigates variance amplification from extreme propensity weights, leading to more stable estimates. Experiments on synthetic and MovieLens 100K data show that our approach generalizes better under unbiased exposure while reducing evaluation variance compared to naive and standard IPS methods, offering practical guidance for counterfactual learning and evaluation in real-world recommendation settings.


Algorithm Adaptation Bias in Recommendation System Online Experiments

arXiv.org Artificial Intelligence

Online experiments (A/B tests) are widely regarded as the gold standard for evaluating recommender system variants and guiding launch decisions. However, a variety of biases can distort the results of the experiment and mislead decision-making. An underexplored but critical bias is algorithm adaptation effect. This bias arises from the flywheel dynamics among production models, user data, and training pipelines: new models are evaluated on user data whose distributions are shaped by the incumbent system or tested only in a small treatment group. As a result, the measured effect of a new product change in modeling and user experience in this constrained experimental setting can diverge substantially from its true impact in full deployment. In practice, the experiment results often favor the production variant with large traffic while underestimating the performance of the test variant with small traffic, which leads to missing opportunities to launch a true winning arm or underestimating the impact. This paper aims to raise awareness of algorithm adaptation bias, situate it within the broader landscape of RecSys evaluation biases, and motivate discussion of solutions that span experiment design, measurement, and adjustment. We detail the mechanisms of this bias, present empirical evidence from real-world experiments, and discuss potential methods for a more robust online evaluation.


Bias Mitigation for AI-Feedback Loops in Recommender Systems: A Systematic Literature Review and Taxonomy

arXiv.org Artificial Intelligence

Recommender systems continually retrain on user reactions to their own predictions, creating AI feedback loops that amplify biases and diminish fairness over time. Despite this well-known risk, most bias mitigation techniques are tested only on static splits, so their long-term fairness across multiple retraining rounds remains unclear. We therefore present a systematic literature review of bias mitigation methods that explicitly consider AI feedback loops and are validated in multi-round simulations or live A/B tests. Screening 347 papers yields 24 primary studies published between 2019-2025. Each study is coded on six dimensions: mitigation technique, biases addressed, dynamic testing set-up, evaluation focus, application domain, and ML task, organising them into a reusable taxonomy. The taxonomy offers industry practitioners a quick checklist for selecting robust methods and gives researchers a clear roadmap to the field's most urgent gaps. Examples include the shortage of shared simulators, varying evaluation metrics, and the fact that most studies report either fairness or performance; only six use both.


Google sets date for Nest Cam, Gemini for Home reveal

PCWorld

After years of waiting for new Nest smart gear from Google, it appears we'll actually get our wish next month. "Gemini is coming to Google Home," a post reads on Google's official X account. "Come back October 1st," the message continues above an animation of a Nest camera peeking into the frame. A link labeled "sign up for updates" sends users to the Google Store, where they can sign up for news. It's not clear what type of reveal Google is planning for October 1; it could simply be a news release, or perhaps it will be a full-on media event.


What Data is Really Necessary? A Feasibility Study of Inference Data Minimization for Recommender Systems

arXiv.org Artificial Intelligence

Data minimization is a legal principle requiring personal data processing to be limited to what is necessary for a specified purpose. Operationalizing this principle for recommender systems, which rely on extensive personal data, remains a significant challenge. This paper conducts a feasibility study on minimizing implicit feedback inference data for such systems. We propose a novel problem formulation, analyze various minimization techniques, and investigate key factors influencing their effectiveness. We demonstrate that substantial inference data reduction is technically feasible without significant performance loss. However, its practicality is critically determined by two factors: the technical setting (e.g., performance targets, choice of model) and user characteristics (e.g., history size, preference complexity). Thus, while we establish its technical feasibility, we conclude that data minimization remains practically challenging and its dependence on the technical and user context makes a universal standard for data `necessity' difficult to implement.


Diffusion-based Multi-modal Synergy Interest Network for Click-through Rate Prediction

arXiv.org Artificial Intelligence

In click-through rate prediction, click-through rate prediction is used to model users' interests. However, most of the existing CTR prediction methods are mainly based on the ID modality. As a result, they are unable to comprehensively model users' multi-modal preferences. Therefore, it is necessary to introduce multi-modal CTR prediction. Although it seems appealing to directly apply the existing multi-modal fusion methods to click-through rate prediction models, these methods (1) fail to effectively disentangle commonalities and specificities across different modalities; (2) fail to consider the synergistic effects between modalities and model the complex interactions between modalities. To address the above issues, this paper proposes the Diffusion-based Multi-modal Synergy Interest Network (Diff-MSIN) framework for click-through prediction. This framework introduces three innovative modules: the Multi-modal Feature Enhancement (MFE) Module Synergistic Relationship Capture (SRC) Module, and the Feature Dynamic Adaptive Fusion (FDAF) Module. The MFE Module and SRC Module extract synergistic, common, and special information among different modalities. They effectively enhances the representation of the modalities, improving the overall quality of the fusion. To encourage distinctiveness among different features, we design a Knowledge Decoupling method. Additionally, the FDAF Module focuses on capturing user preferences and reducing fusion noise. To validate the effectiveness of the Diff-MSIN framework, we conducted extensive experiments using the Rec-Tmall and three Amazon datasets. The results demonstrate that our approach yields a significant improvement of at least 1.67% compared to the baseline, highlighting its potential for enhancing multi-modal recommendation systems. Our code is available at the following link: https://github.com/Cxx-0/Diff-MSIN.


Morae: Proactively Pausing UI Agents for User Choices

arXiv.org Artificial Intelligence

User interface (UI) agents promise to make inaccessible or complex UIs easier to access for blind and low-vision (BLV) users. However, current UI agents typically perform tasks end-to-end without involving users in critical choices or making them aware of important contextual information, thus reducing user agency. For example, in our field study, a BLV participant asked to buy the cheapest available sparkling water, and the agent automatically chose one from several equally priced options, without mentioning alternative products with different flavors or better ratings. To address this problem, we introduce Morae, a UI agent that automatically identifies decision points during task execution and pauses so that users can make choices. Morae uses large multimodal models to interpret user queries alongside UI code and screenshots, and prompt users for clarification when there is a choice to be made. In a study over real-world web tasks with BLV participants, Morae helped users complete more tasks and select options that better matched their preferences, as compared to baseline agents, including OpenAI Operator. More broadly, this work exemplifies a mixed-initiative approach in which users benefit from the automation of UI agents while being able to express their preferences.


Stairway to Fairness: Connecting Group and Individual Fairness

arXiv.org Artificial Intelligence

Fairness in recommender systems (RSs) is commonly categorised into group fairness and individual fairness. However, there is no established scientific understanding of the relationship between the two fairness types, as prior work on both types has used different evaluation measures or evaluation objectives for each fairness type, thereby not allowing for a proper comparison of the two. As a result, it is currently not known how increasing one type of fairness may affect the other. To fill this gap, we study the relationship of group and individual fairness through a comprehensive comparison of evaluation measures that can be used for both fairness types. Our experiments with 8 runs across 3 datasets show that recommendations that are highly fair for groups can be very unfair for individuals. Our finding is novel and useful for RS practitioners aiming to improve the fairness of their systems. Our code is available at: https://github.com/theresiavr/stairway-to-fairness.


Breaking the Cold-Start Barrier: Reinforcement Learning with Double and Dueling DQNs

arXiv.org Artificial Intelligence

Recommender systems struggle to provide accurate suggestions to new users with limited interaction history, a challenge known as the cold-user problem. This paper proposes a reinforcement learning approach using Double and Dueling Deep Q-Networks (DQN) to dynamically learn user preferences from sparse feedback, enhancing recommendation accuracy without relying on sensitive demographic data. By integrating these advanced DQN variants with a matrix factorization model, we achieve superior performance on a large e-commerce dataset compared to traditional methods like popularity-based and active learning strategies. Experimental results show that our method, particularly Dueling DQN, reduces Root Mean Square Error (RMSE) for cold users, offering an effective solution for privacy-constrained environments.


Designing Smarter Conversational Agents for Kids: Lessons from Cognitive Work and Means-Ends Analyses

arXiv.org Artificial Intelligence

This paper presents two studies on how Brazilian children (ages 9--11) use conversational agents (CAs) for schoolwork, discovery, and entertainment, and how structured scaffolds can enhance these interactions. In Study 1, a seven-week online investigation with 23 participants (children, parents, teachers) employed interviews, observations, and Cognitive Work Analysis to map children's information-processing flows, the role of more knowledgeable others, functional uses, contextual goals, and interaction patterns to inform conversation-tree design. We identified three CA functions: School, Discovery, Entertainment, and derived ``recipe'' scaffolds mirroring parent-child support. In Study 2, we prompted GPT-4o-mini on 1,200 simulated child-CA exchanges, comparing conversation-tree recipes based on structured-prompting to an unstructured baseline. Quantitative evaluation of readability, question count/depth/diversity, and coherence revealed gains for the recipe approach. Building on these findings, we offer design recommendations: scaffolded conversation-trees, child-dedicated profiles for personalized context, and caregiver-curated content. Our contributions include the first CWA application with Brazilian children, an empirical framework of child-CA information flows, and an LLM-scaffolding ``recipe'' (i.e., structured-prompting) for effective, scaffolded learning.