Personal Assistant Systems
Unified Representation Learning for Multi-Intent Diversity and Behavioral Uncertainty in Recommender Systems
Xu, Wei, Zheng, Jiasen, Lin, Junjiang, Han, Mingxuan, Du, Junliang
This paper addresses the challenge of jointly modeling user intent diversity and behavioral uncertainty in recommender systems. A unified representation learning framework is proposed. The framework builds a multi-intent representation module and an uncertainty modeling mechanism. It extracts multi-granularity interest structures from user behavior sequences. Behavioral ambiguity and preference fluctuation are captured using Bayesian distribution modeling. In the multi-intent modeling part, the model introduces multiple latent intent vectors. These vectors are weighted and fused using an attention mechanism to generate semantically rich representations of long-term user preferences. In the uncertainty modeling part, the model learns the mean and covariance of behavior representations through Gaussian distributions. This reflects the user's confidence in different behavioral contexts. Next, a learnable fusion strategy is used to combine long-term intent and short-term behavior signals. This produces the final user representation, improving both recommendation accuracy and robustness. The method is evaluated on standard public datasets. Experimental results show that it outperforms existing representative models across multiple metrics. It also demonstrates greater stability and adaptability under cold-start and behavioral disturbance scenarios. The approach alleviates modeling bottlenecks faced by traditional methods when dealing with complex user behavior. These findings confirm the effectiveness and practical value of the unified modeling strategy in real-world recommendation tasks.
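The three components named in the abstract (attention-weighted fusion of latent intent vectors, Gaussian modeling of behavior representations, and a learnable long-term/short-term fusion) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dot-product attention score, the diagonal covariance, and the single scalar gate are all assumptions chosen to keep the example small.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_intents(intent_vectors, query):
    """Attention-weighted sum of latent intent vectors (long-term preference).

    Assumption: attention scores are plain dot products with a query vector.
    """
    weights = softmax([dot(v, query) for v in intent_vectors])
    dim = len(query)
    fused = [sum(w * v[i] for w, v in zip(weights, intent_vectors))
             for i in range(dim)]
    return fused, weights


def sample_behavior(mean, log_var, rng):
    """Reparameterized draw from a Gaussian: mean + std * eps.

    Assumption: the covariance is diagonal and stored as log-variances.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def gated_fusion(long_term, short_term, gate_logit):
    """Blend long-term and short-term signals with one learnable scalar gate."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [g * l + (1.0 - g) * s for l, s in zip(long_term, short_term)]


# Usage with toy values (all numbers are illustrative).
rng = random.Random(0)
long_term, attn = fuse_intents([[1.0, 0.0], [0.0, 1.0]], query=[1.0, 0.0])
short_term = sample_behavior(mean=[0.2, 0.8], log_var=[-2.0, -2.0], rng=rng)
user_repr = gated_fusion(long_term, short_term, gate_logit=0.0)
```

In a trained model the intent vectors, the query, the Gaussian parameters, and the gate would all be learned; larger log-variances would correspond to contexts where the behavior signal is less certain.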
ELIXIR: Efficient and LIghtweight model for eXplaIning Recommendations
Kabongo, Ben, Guigue, Vincent, Lemberger, Pirmin
Collaborative filtering drives many successful recommender systems but struggles with fine-grained user-item interactions and explainability. As users increasingly seek transparent recommendations, generating textual explanations through language models has become a critical research area. Existing methods employ either RNNs or Transformers. However, RNN-based approaches fail to leverage the capabilities of pre-trained Transformer models, whereas Transformer-based methods often suffer from suboptimal adaptation and neglect aspect modeling, which is crucial for personalized explanations. We propose ELIXIR (Efficient and LIghtweight model for eXplaIning Recommendations), a multi-task model combining rating prediction with personalized review generation. ELIXIR jointly learns global and aspect-specific representations of users and items, optimizing overall rating, aspect-level ratings, and review generation, with personalized attention to emphasize aspect importance. Based on a T5-small (60M) model, we demonstrate the effectiveness of our aspect-based architecture in guiding text generation in a personalized context, where state-of-the-art approaches exploit much larger models but fail to match user preferences as well. Experimental results on TripAdvisor and RateBeer demonstrate that ELIXIR significantly outperforms strong baseline models, especially in review generation.
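A multi-task objective of the kind the abstract describes (overall rating, aspect-level ratings, and review generation optimized together) is commonly written as a weighted sum of per-task losses. The sketch below assumes mean squared error for the ratings, token-level negative log-likelihood for the review, and hypothetical weights w_overall, w_aspect, and w_text; the abstract does not state which losses or weights ELIXIR uses.

```python
import math


def mse(pred, target):
    """Mean squared error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def token_nll(token_probs):
    """Mean negative log-likelihood of the reference review tokens.

    token_probs holds the probability the model assigned to each
    reference token; every value must be in (0, 1].
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def multitask_loss(overall_pred, overall_true,
                   aspect_pred, aspect_true,
                   token_probs,
                   w_overall=1.0, w_aspect=1.0, w_text=1.0):
    """Weighted sum of the three task losses (weights are assumptions)."""
    return (w_overall * mse([overall_pred], [overall_true])
            + w_aspect * mse(aspect_pred, aspect_true)
            + w_text * token_nll(token_probs))


# Usage with toy values: one overall rating, three aspect ratings,
# and a four-token reference review.
loss = multitask_loss(
    overall_pred=4.2, overall_true=4.0,
    aspect_pred=[3.5, 4.5, 4.0], aspect_true=[4.0, 4.0, 4.0],
    token_probs=[0.9, 0.6, 0.8, 0.7],
)
```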
First Steps Towards Overhearing LLM Agents: A Case Study With Dungeons & Dragons Gameplay
Zhu, Andrew, Osgood, Evan, Callison-Burch, Chris
Much work has been done on conversational LLM agents which directly assist human users with tasks. We present an alternative paradigm for interacting with LLM agents, which we call "overhearing agents". These overhearing agents do not actively participate in conversation -- instead, they "listen in" on human-to-human conversations and perform background tasks or provide suggestions to assist the user. In this work, we explore the overhearing agents paradigm through the lens of Dungeons & Dragons gameplay. We present an in-depth study using large multimodal audio-language models as overhearing agents to assist a Dungeon Master. We perform a human evaluation to examine the helpfulness of such agents and find that some large audio-language models have the emergent ability to perform overhearing agent tasks using implicit audio cues. Finally, we release Python libraries and our project code to support further research into the overhearing agents paradigm at https://github.com/zhudotexe/overhearing_agents.
The biggest dating app photo turn-offs (and no, it's not holding a fish)
Choosing what pictures to include in your online dating profile is a big deal. Most people want to present a decent mix of flattering, fun and relaxed photos that showcase their best side. But there are some in particular that should be avoided at all costs, experts say. A team from dating app Wisp asked 1,200 people which photo red flags make them swipe left. The survey revealed 83 per cent of singles judge profiles on photos before reading a single word of the personal bio.
Oops! Google's unannounced new Nest Cams spotted in Google Home app
The big smart home manufacturers have been leaking like sieves as of late, giving us juicy early previews of their super-secret upcoming releases. Philips Hue recently fell victim to its own leak that revealed its entire fall product lineup, and now Google appears to have unwittingly shared images of its new Nest cam hardware. First, a quick recap: Google had already teased, intentionally, a new Gemini smart speaker during its Pixel event a couple of weeks back, and just days ago it promised an upcoming Google Home update on October 1, complete with a partial image of what appears to be a new Nest camera. Rather than waiting for that date, it seems Google may have inadvertently left images of its new Nest hardware in the Google Home app following a recent update. The images, which were spotted by Android Authority and appear to have been subsequently yanked from the app, don't reveal anything startlingly new about the new Nest cams, aside from the fact that they exist.
Designing Gaze Analytics for ELA Instruction: A User-Centered Dashboard with Conversational AI Support
Davalos, Eduardo, Zhang, Yike, Jain, Shruti, Srivastava, Namrata, Truong, Trieu, Haque, Nafees-ul, Van, Tristan, Salas, Jorge, McFadden, Sara, Cho, Sun-Joo, Biswas, Gautam, Goodwin, Amanda
Eye-tracking offers rich insights into student cognition and engagement, but remains underutilized in classroom-facing educational technology due to challenges in data interpretation and accessibility. In this paper, we present the iterative design and evaluation of a gaze-based learning analytics dashboard for English Language Arts (ELA), developed through five studies involving teachers and students. Guided by user-centered design and data storytelling principles, we explored how gaze data can support reflection, formative assessment, and instructional decision-making. Our findings demonstrate that gaze analytics can be approachable and pedagogically valuable when supported by familiar visualizations, layered explanations, and narrative scaffolds. We further show how a conversational agent, powered by a large language model (LLM), can lower cognitive barriers to interpreting gaze data by enabling natural language interactions with multimodal learning analytics. We conclude with design implications for future EdTech systems that aim to integrate novel data modalities in classroom contexts.
ACT: Automated Constraint Targeting for Multi-Objective Recommender Systems
Chang, Daryl, Wu, Yi, She, Jennifer, Wei, Li, Heldt, Lukasz
Recommender systems often must maximize a primary objective while ensuring secondary ones satisfy minimum thresholds, or "guardrails." This is critical for maintaining a consistent user experience and platform ecosystem, but enforcing these guardrails despite orthogonal system changes is challenging and often requires manual hyperparameter tuning. We introduce the Automated Constraint Targeting (ACT) framework, which automatically finds the minimal set of hyperparameter changes needed to satisfy these guardrails. ACT uses an offline pairwise evaluation on unbiased data to find solutions and continuously retrains to adapt to system and user behavior changes. We empirically demonstrate its efficacy and describe its deployment in a large-scale production environment.
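The idea of finding a minimal set of hyperparameter changes that satisfies the guardrails can be sketched as a search over subsets of candidate changes, smallest subsets first. This is only an illustration under stated assumptions: "minimal" is read as "fewest changed hyperparameters", the search is brute force, and the evaluate callable stands in for the offline pairwise evaluation, whose details the abstract does not give. All names and numbers are hypothetical.

```python
from itertools import combinations


def find_minimal_changes(baseline, candidates, evaluate, guardrails):
    """Return (changed_names, config) for the smallest feasible change set.

    baseline:   dict of hyperparameter -> current value
    candidates: dict of hyperparameter -> alternative value to try
    evaluate:   callable(config) -> dict of metric -> value, incl. "primary"
    guardrails: dict of secondary metric -> minimum allowed value
    Among feasible sets of equal size, the highest primary metric wins.
    Returns None if no combination satisfies every guardrail.
    """
    names = sorted(candidates)
    for size in range(len(names) + 1):
        feasible = []
        for subset in combinations(names, size):
            config = dict(baseline)
            for name in subset:
                config[name] = candidates[name]
            metrics = evaluate(config)
            if all(metrics[m] >= t for m, t in guardrails.items()):
                feasible.append((metrics["primary"], subset, config))
        if feasible:
            _, subset, config = max(feasible, key=lambda item: item[0])
            return subset, config
    return None


def toy_evaluate(config):
    """Hypothetical stand-in for an offline evaluator."""
    return {
        "primary": 1.0 - 0.1 * config["diversity_weight"],
        "diversity": 0.2 + 0.5 * config["diversity_weight"],
        "freshness": 0.3 + 0.4 * config["freshness_boost"],
    }


changed, config = find_minimal_changes(
    baseline={"diversity_weight": 0.0, "freshness_boost": 0.5},
    candidates={"diversity_weight": 0.6, "freshness_boost": 1.0},
    evaluate=toy_evaluate,
    guardrails={"diversity": 0.4, "freshness": 0.4},
)
```

In this toy setup the baseline violates the diversity guardrail, and changing diversity_weight alone is enough to restore it, so the search stops at subsets of size one. A production system would need something cheaper than exhaustive enumeration once the number of hyperparameters grows.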
Short-Form Video Recommendations with Multimodal Embeddings: Addressing Cold-Start and Bias Challenges
Dzhoha, Andrii, Mirylenka, Katya, Malykh, Egor, Buchmann, Marco-Andrea, Catino, Francesca
In recent years, social media users have spent significant amounts of time on short-form video platforms. As a result, established platforms in other domains, such as e-commerce, have begun introducing short-form video content to engage users and increase their time spent on the platform. The success of these experiences is due not only to the content itself but also to a unique UI innovation: instead of offering users a list of choices to click, platforms actively recommend content for users to watch one at a time. This creates new challenges for recommender systems, especially when launching a new video experience. Beyond the limited interaction data, immersive feed experiences introduce stronger position bias due to the UI and duration bias when optimizing for watch-time, as models tend to favor shorter videos. These issues, together with the feedback loop inherent in recommender systems, make it difficult to build effective solutions. In this paper, we highlight the challenges faced when introducing a new short-form video experience and present our experience showing that, even with sufficient video interaction data, it can be more beneficial to leverage a video retrieval system using a fine-tuned multimodal vision-language model to overcome these challenges. This approach demonstrated greater effectiveness compared to conventional supervised learning methods in online experiments conducted on our e-commerce platform.
Decoupled Entity Representation Learning for Pinterest Ads Ranking
Liu, Jie, Li, Yinrui, Sun, Jiankai, Li, Kungang, Sun, Han, Wang, Sihan, Wu, Huasen, Gao, Siyuan, Soares, Paulo, Li, Nan, Liu, Zhifang, Li, Haoyang, Ji, Siping, Leng, Ling, Deshikachar, Prathibha
In this paper, we introduce a novel framework following an upstream-downstream paradigm to construct user and item (Pin) embeddings from diverse data sources, which are essential for Pinterest to deliver personalized Pins and ads effectively. Our upstream models are trained on extensive data sources featuring varied signals, utilizing complex architectures to capture intricate relationships between users and Pins on Pinterest. To ensure scalability of the upstream models, entity embeddings are learned and regularly refreshed rather than computed in real time, allowing for asynchronous interaction between the upstream and downstream models. These embeddings are then integrated as input features in numerous downstream tasks, including ad retrieval and ranking models for CTR and CVR predictions. We demonstrate that our framework achieves notable performance improvements in both offline and online settings across various downstream tasks. This framework has been deployed in Pinterest's production ad ranking systems, resulting in significant gains in online metrics.
RankGraph: Unified Heterogeneous Graph Learning for Cross-Domain Recommendation
Wu, Renzhi, Yang, Junjie, Chen, Li, Li, Hong, Yu, Li, Yan, Hong
Cross-domain recommendation systems face the challenge of integrating fine-grained user and item relationships across various product domains. To address this, we introduce RankGraph, a scalable graph learning framework designed to serve as a core component in recommendation foundation models (FMs). By constructing and leveraging graphs composed of heterogeneous nodes and edges across multiple products, RankGraph enables the integration of complex relationships between users, posts, ads, and other entities. Our framework employs a GPU-accelerated Graph Neural Network and contrastive learning, allowing for dynamic extraction of subgraphs such as item-item and user-user graphs to support similarity-based retrieval and real-time clustering. Furthermore, RankGraph integrates graph-based pretrained representations as contextual tokens into FM sequence models, enriching them with structured relational knowledge. RankGraph has demonstrated improvements in click (+0.92%) and conversion rates (+2.82%) in online A/B tests, showcasing its effectiveness in cross-domain recommendation scenarios.
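The abstract does not say which contrastive objective RankGraph uses. As a generic illustration of contrastive learning over node embeddings, the sketch below computes an InfoNCE-style loss for one anchor node, one positive (for example, a graph neighbor), and a set of negatives. The cosine similarity, the temperature value, and the function names are assumptions, not details from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: -log softmax score of the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# Usage with toy 2-d embeddings: the aligned positive yields a lower loss
# than a positive pointing away from the anchor.
anchor = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
loss_aligned = info_nce(anchor, [0.9, 0.1], negatives)
loss_misaligned = info_nce(anchor, [-0.9, 0.1], negatives)
```

Minimizing this loss pulls the anchor toward its positive and pushes it away from the negatives, which is what makes the resulting embeddings usable for similarity-based retrieval.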