Goto

Collaborating Authors

 Personal Assistant Systems


ImageGem: In-the-wild Generative Image Interaction Dataset for Generative Model Personalization

arXiv.org Artificial Intelligence

W e introduce ImageGem, a dataset for studying generative models that understand fine-grained individual preferences. W e posit that a key challenge hindering the development of such a generative model is the lack of in-the-wild and fine-grained user preference annotations. Our dataset features real-world interaction data from 57K users, who collectively have built 242K customized LoRAs, written 3M text prompts, and created 5M generated images. With user preference annotations from our dataset, we were able to train better preference alignment models. In addition, leveraging individual user preference, we investigated the performance of retrieval models and a vision-language model on personalized image retrieval and generative model recommendation. Finally, we propose an end-to-end framework for editing customized diffusion models in a latent weight space to align with individual user preferences. Our results demonstrate that the ImageGem dataset enables, for the first time, a new paradigm for generative model personalization.


From AutoRecSys to AutoRecLab: A Call to Build, Evaluate, and Govern Autonomous Recommender-Systems Research Labs

arXiv.org Artificial Intelligence

Recommender-systems research has accelerated model and evaluation advances, yet largely neglects automating the research process itself. We argue for a shift from narrow AutoRecSys tools -- focused on algorithm selection and hyper-parameter tuning -- to an Autonomous Recommender-Systems Research Lab (AutoRecLab) that integrates end-to-end automation: problem ideation, literature analysis, experimental design and execution, result interpretation, manuscript drafting, and provenance logging. Drawing on recent progress in automated science (e.g., multi-agent AI Scientist and AI Co-Scientist systems), we outline an agenda for the RecSys community: (1) build open AutoRecLab prototypes that combine LLM-driven ideation and reporting with automated experimentation; (2) establish benchmarks and competitions that evaluate agents on producing reproducible RecSys findings with minimal human input; (3) create review venues for transparently AI-generated submissions; (4) define standards for attribution and reproducibility via detailed research logs and metadata; and (5) foster interdisciplinary dialogue on ethics, governance, privacy, and fairness in autonomous research. Advancing this agenda can increase research throughput, surface non-obvious insights, and position RecSys to contribute to emerging Artificial Research Intelligence. We conclude with a call to organise a community retreat to coordinate next steps and co-author guidance for the responsible integration of automated research systems.


Human-AI Interactions: Cognitive, Behavioral, and Emotional Impacts

arXiv.org Artificial Intelligence

As stories of human-AI interactions continue to be highlighted in the news and research platforms, the challenges are becoming more pronounced, including potential risks of overreliance, cognitive offloading, social and emotional manipulation, and the nuanced degradation of human agency and judgment. This paper surveys recent research on these issues through the lens of the psychological triad: cognition, behavior, and emotion. Observations seem to suggest that while AI can substantially enhance memory, creativity, and engagement, it also introduces risks such as diminished critical thinking, skill erosion, and increased anxiety. Emotional outcomes are similarly mixed, with AI systems showing promise for support and stress reduction, but raising concerns about dependency, inappropriate attachments, and ethical oversight. This paper aims to underscore the need for responsible and context-aware AI design, highlighting gaps for longitudinal research and grounded evaluation frameworks to balance benefits with emerging human-centric risks.


An Automated Multi-modal Evaluation Framework for Mobile Intelligent Assistants Based on Large Language Models and Multi-Agent Collaboration

arXiv.org Artificial Intelligence

With the rapid development of mobile intelligent assistant technologies, multi-modal AI assistants have become essential interfaces for daily user interactions. However, current evaluation methods face challenges including high manual costs, inconsistent standards, and subjective bias. This paper proposes an automated multi-modal evaluation framework based on large language models and multi-agent collaboration. The framework employs a three-tier agent architecture consisting of interaction evaluation agents, semantic verification agents, and experience decision agents. Through supervised fine-tuning on the Qwen3-8B model, we achieve a significant evaluation matching accuracy with human experts. Experimental results on eight major intelligent agents demonstrate the framework's effectiveness in predicting users' satisfaction and identifying generation defects.


Large-scale User Game Lifecycle Representation Learning

arXiv.org Artificial Intelligence

However, existing representation learning methods crafted for handling billions of items in recommendation systems are unsuitable for game advertising and recommendation. This is primarily due to game sparsity, where the mere hundreds of games fall short for large-scale user representation learning, and game imbalance, where user behaviors are overwhelmingly dominated by a handful of popular games. To address the sparsity issue, we introduce the User Game Lifecycle (UGL), designed to enrich user behaviors in games. Additionally, we propose two innovative strategies aimed at manipulating user behaviors to more effectively extract both short and long-term interests. To tackle the game imbalance challenge, we present an Inverse Probability Masking strategy for UGL representation learning. The offline and online experimental results demonstrate that the UGL representations significantly enhance model by achieving a 1.83% AUC offline increase on average and a 21.67% CVR online increase on average for game advertising and a 0.5% AUC offline increase and a 0.82% ARPU online increase for in-game item recommendation.


Visibility Allocation Systems: How Algorithmic Design Shapes Online Visibility and Societal Outcomes

arXiv.org Artificial Intelligence

Throughout application domains, we now rely extensively on algorithmic systems to engage with ever-expanding datasets of information. Despite their benefits, these systems are often complex (comprising of many intricate tools, e.g., moderation, recommender systems, prediction models), of unknown structure (due to the lack of accompanying documentation), and having hard-to-predict yet potentially severe downstream consequences (due to the extensive use, systematic enactment of existing errors, and many comprising feedback loops). As such, understanding and evaluating these systems as a whole remains a challenge for both researchers and legislators. To aid ongoing efforts, we introduce a formal framework for such visibility allocation systems (VASs) which we define as (semi-)automated systems deciding which (processed) data to present a human user with. We review typical tools comprising VASs and define the associated computational problems they solve. By doing so, VASs can be decomposed into sub-processes and illustrated via data flow diagrams. Moreover, we survey metrics for evaluating VASs throughout the pipeline, thus aiding system diagnostics. Using forecasting-based recommendations in school choice as a case study, we demonstrate how our framework can support VAS evaluation. We also discuss how our framework can support ongoing AI-legislative efforts to locate obligations, quantify systemic risks, and enable adaptive compliance.


The Burden of Interactive Alignment with Inconsistent Preferences

arXiv.org Artificial Intelligence

From media platforms to chatbots, algorithms shape how people interact, learn, and discover information. Such interactions between users and an algorithm often unfold over multiple steps, during which strategic users can guide the algorithm to better align with their true interests by selectively engaging with content. However, users frequently exhibit inconsistent preferences: they may spend considerable time on content that offers little long-term value, inadvertently signaling that such content is desirable. Focusing on the user side, this raises a key question: what does it take for such users to align the algorithm with their true interests? To investigate these dynamics, we model the user's decision process as split between a rational system 2 that decides whether to engage and an impulsive system 1 that determines how long engagement lasts. We then study a multi-leader, single-follower extensive Stackelberg game, where users, specifically system 2, lead by committing to engagement strategies and the algorithm best-responds based on observed interactions. We define the burden of alignment as the minimum horizon over which users must optimize to effectively steer the algorithm. We show that a critical horizon exists: users who are sufficiently foresighted can achieve alignment, while those who are not are instead aligned to the algorithm's objective. This critical horizon can be long, imposing a substantial burden. However, even a small, costly signal (e.g., an extra click) can significantly reduce it. Overall, our framework explains how users with inconsistent preferences can align an engagement-driven algorithm with their interests in a Stackelberg equilibrium, highlighting both the challenges and potential remedies for achieving alignment.


BPL: Bias-adaptive Preference Distillation Learning for Recommender System

arXiv.org Artificial Intelligence

Abstract--Recommender systems suffer from biases that cause the collected feedback to incompletely reveal user preference. While debiasing learning has been extensively studied, they mostly focused on the specialized (called counterfactual) test environment simulated by random exposure of items, significantly degrading accuracy in the typical (called factual) test environment based on actual user-item interactions. In fact, each test environment highlights the benefit of a different aspect: the counterfactual test emphasizes user satisfaction in the long-terms, while the factual test focuses on predicting subsequent user behaviors on platforms. Therefore, it is desirable to have a model that performs well on both tests rather than only one. In this work, we introduce a new learning framework, called Bias-adaptive Preference distillation Learning (BPL), to gradually uncover user preferences with dual distillation strategies. These distillation strategies are designed to drive high performance in both factual and counterfactual test environments. Employing a specialized form of teacher-student distillation from a biased model, BPL retains accurate preference knowledge aligned with the collected feedback, leading to high performance in the factual test. This enables the model to produce more accurate predictions across a broader range of user-item combinations, thereby improving performance in the counterfactual test. Real-world recommender systems form a feedback loop in which the systems' recommendations influence user behaviors, which in turn serve as training data for the system [1]. This feedback loop leads to the creation and amplification of various biases affected by multiple factors, including but not limited to user selection patterns, item exposure mechanism, and influence of public opinions [2], [3]. These biases progressively cause the training data to deviate from users' true preference, ultimately degrading the user satisfaction. SeongKu Kang is with the Department of Computer Science and Engineering, Korea University, Seoul, South Korea. Dongha Lee is with the Department of Aritifial Intelligence, Y onsei University, Seoul, South Korea, E-mail:donalee@yonsei.ac.kr. Jindong Wang is with William & Mary, Virginia, United States.


Aligning Language Models with Investor and Market Behavior for Financial Recommendations

arXiv.org Artificial Intelligence

Most financial recommendation systems often fail to account for key behavioral and regulatory factors, leading to advice that is misaligned with user preferences, difficult to interpret, or unlikely to be followed. We present FLARKO (Financial Language-model for Asset Recommendation with Knowledge-graph Optimization), a novel framework that integrates Large Language Models (LLMs), Knowledge Graphs (KGs), and Kahneman-Tversky Optimization (KTO) to generate asset recommendations that are both profitable and behaviorally aligned. FLARKO encodes users' transaction histories and asset trends as structured KGs, providing interpretable and controllable context for the LLM. To demonstrate the adaptability of our approach, we develop and evaluate both a centralized architecture (CenFLARKO) and a federated variant (FedFLARKO). To our knowledge, this is the first demonstration of combining KTO for fine-tuning of LLMs for financial asset recommendation. We also present the first use of structured KGs to ground LLM reasoning over behavioral financial data in a federated learning (FL) setting. Evaluated on the FAR-Trans dataset, FLARKO consistently outperforms state-of-the-art recommendation baselines on behavioral alignment and joint profitability, while remaining interpretable and resource-efficient.


HERO: Heterogeneous Continual Graph Learning via Meta-Knowledge Distillation

arXiv.org Artificial Intelligence

Heterogeneous graph neural networks have seen rapid progress in web applications such as social networks, knowledge graphs, and recommendation systems, driven by the inherent heterogeneity of web data. However, existing methods typically assume static graphs, while real-world graphs are continuously evolving. This dynamic nature requires models to adapt to new data while preserving existing knowledge. To this end, this work introduces HERO (HEterogeneous continual gRaph learning via meta-knOwledge distillation), a unified framework for continual learning on heterogeneous graphs. HERO employs meta-adaptation, a gradient-based meta-learning strategy that provides directional guidance for rapid adaptation to new tasks with limited samples. To enable efficient and effective knowledge reuse, we propose DiSCo (Diversity Sampling with semantic Consistency), a heterogeneity-aware sampling method that maximizes target node diversity and expands subgraphs along metapaths, retaining critical semantic and structural information with minimal overhead. Furthermore, HERO incorporates heterogeneity-aware knowledge distillation, which aligns knowledge at both the node and semantic levels to balance adaptation and retention across tasks. Extensive experiments on four web-related heterogeneous graph benchmarks demonstrate that HERO substantially mitigates catastrophic forgetting while achieving efficient and consistent knowledge reuse in dynamic web environments.