Personal Assistant Systems
On Component Interactions in Two-Stage Recommender Systems
Thanks to their scalability, two-stage recommenders are used by many of today's largest online platforms, including YouTube, LinkedIn, and Pinterest. These systems produce recommendations in two steps: (i) multiple nominators--tuned for low prediction latency--preselect a small subset of candidates from the whole item pool; (ii) a slower but more accurate ranker further narrows down the nominated items, and serves to the user. Despite their popularity, the literature on two-stage recommenders is relatively scarce, and the algorithms are often treated as mere sums of their parts. Such treatment presupposes that the two-stage performance is explained by the behavior of the individual components in isolation. This is not the case: using synthetic and real-world data, we demonstrate that interactions between the ranker and the nominators substantially affect the overall performance.
Tuning Language Models for Robust Prediction of Diverse User Behaviors
Meng, Fanjin, Ding, Jingtao, Gong, Jiahui, Yang, Chen, Chen, Hong, Wang, Zuojian, Lu, Haisheng, Li, Yong
Predicting user behavior is essential for intelligent assistant services, yet deep learning models often struggle to capture long-tailed behaviors. Large language models (LLMs), with their pretraining on vast corpora containing rich behavioral knowledge, offer promise. However, existing fine-tuning approaches tend to overfit to frequent ``anchor'' behaviors, reducing their ability to predict less common ``tail'' behaviors. In this paper, we introduce BehaviorLM, a progressive fine-tuning approach that addresses this issue. In the first stage, LLMs are fine-tuned on anchor behaviors while preserving general behavioral knowledge. In the second stage, fine-tuning uses a balanced subset of all behaviors based on sample difficulty to improve tail behavior predictions without sacrificing anchor performance. Experimental results on two real-world datasets demonstrate that BehaviorLM robustly predicts both anchor and tail behaviors and effectively leverages LLM behavioral knowledge to master tail behavior prediction with few-shot examples.
BehaveGPT: A Foundation Model for Large-scale User Behavior Modeling
Gong, Jiahui, Ding, Jingtao, Meng, Fanjin, Yang, Chen, Chen, Hong, Wang, Zuojian, Lu, Haisheng, Li, Yong
In recent years, foundational models have revolutionized the fields of language and vision, demonstrating remarkable abilities in understanding and generating complex data; however, similar advances in user behavior modeling have been limited, largely due to the complexity of behavioral data and the challenges involved in capturing intricate temporal and contextual relationships in user activities. To address this, we propose BehaveGPT, a foundational model designed specifically for large-scale user behavior prediction. Leveraging transformer-based architecture and a novel pretraining paradigm, BehaveGPT is trained on vast user behavior datasets, allowing it to learn complex behavior patterns and support a range of downstream tasks, including next behavior prediction, long-term generation, and cross-domain adaptation. Our approach introduces the DRO-based pretraining paradigm tailored for user behavior data, which improves model generalization and transferability by equitably modeling both head and tail behaviors. Extensive experiments on real-world datasets demonstrate that BehaveGPT outperforms state-of-the-art baselines, achieving more than a 10% improvement in macro and weighted recall, showcasing its ability to effectively capture and predict user behavior. Furthermore, we measure the scaling law in the user behavior domain for the first time on the Honor dataset, providing insights into how model performance scales with increased data and parameter sizes.
Large language model as user daily behavior data generator: balancing population diversity and individual personality
Li, Haoxin, Ding, Jingtao, Gong, Jiahui, Li, Yong
Predicting human daily behavior is challenging due to the complexity of routine patterns and short-term fluctuations. While data-driven models have improved behavior prediction by leveraging empirical data from various platforms and devices, the reliance on sensitive, large-scale user data raises privacy concerns and limits data availability. Synthetic data generation has emerged as a promising solution, though existing methods are often limited to specific applications. In this work, we introduce BehaviorGen, a framework that uses large language models (LLMs) to generate high-quality synthetic behavior data. By simulating user behavior based on profiles and real events, BehaviorGen supports data augmentation and replacement in behavior prediction models. We evaluate its performance in scenarios such as pertaining augmentation, fine-tuning replacement, and fine-tuning augmentation, achieving significant improvements in human mobility and smartphone usage predictions, with gains of up to 18.9%. Our results demonstrate the potential of BehaviorGen to enhance user behavior modeling through flexible and privacy-preserving synthetic data generation.
Predictively Combatting Toxicity in Health-related Online Discussions through Machine Learning
Paz-Ruza, Jorge, Alonso-Betanzos, Amparo, Guijarro-Berdiรฑas, Bertha, Eiras-Franco, Carlos
--In health-related topics, user toxicity in online discussions frequently becomes a source of social conflict or promotion of dangerous, unscientific behaviour; common approaches for battling it include different forms of detection, flagging and/or removal of existing toxic comments, which is often counterproductive for platforms and users alike. In this work, we propose the alternative of combatting user toxicity predictively, anticipating where a user could interact toxically in health-related online discussions. The hierarchical and decentralised structure made Reddit a hub of heated debate during the onset of the COVID pandemic, with over 200,000 related posts per day. Center accredited by Galician University System, is funded by "Conseller Conversely, volunteer-based moderation is generally more susceptible to bias and under-moderation, depending on the platform's audience. The design of an adapted Leave Out Last Item data partitioning method suitable for binary classification-oriented Collaborative Filtering tasks. We remove "generic comments'' from the set, i.e. those Label comments as "generic'' if they do not contain any words from Authors have temporarily removed this link to the work's repository to The majority of users do not post toxic comments when discussing health on Reddit, with 9.96% of toxic comments in the aggregate, similar to previous work. Furthermore, as Figure 2 shows, a user's toxicity on a subreddit tends to be consistent (toxic or non-toxic, as indicated by the peaks in the distribution at toxicities 0 Note the logarithmic scale on the y-axis. To tag the toxicity of comments we use Detoxify-original [7], a pre-trained language model. Instead of only detecting and punishing the toxicity of existing interactions like common content moderation methods, which is ineffective and counterproductive in the long term, this work's proposal is to predict the toxicity of an unobserved interaction Figure 5. Topology of the Machine Learning model proposed to predict the toxicity of health-related conversations in unobserved user-subreddit interactions on the Reddit platform. We assessed the predictive ability of our model and baselines using classical binary classification metrics: sensitivity, specificity, and geometric mean (G.Mean) of the class-wise We identify different avenues of future work. U. Naseem, J. Kim, M. Khushi, and A. G. Dunn, "Identification of disease or symptom terms in reddit to improve health mention classification," in "R/redditsecurity - understanding hate on reddit, and the impact of our Iii, "Toxicity detection is not all you need: Measuring the gaps to "Meta to replace'biased' fact-checkers with moderation by users -- J. Brownlee, Imbalanced classification with Python: better metrics, balance skewed classes, cost-sensitive learning .
Embedding-to-Prefix: Parameter-Efficient Personalization for Pre-Trained Large Language Models
Huber, Bernd, Fazelnia, Ghazal, Damianou, Andreas, Peleato, Sebastian, Lefarov, Max, Ravichandran, Praveen, De Nadai, Marco, Lalmas-Roellke, Mounia, Bennett, Paul N.
Large language models (LLMs) excel at generating contextually relevant content. However, tailoring these outputs to individual users for effective personalization is a significant challenge. While rich user-specific information often exists as pre-existing user representations, such as embeddings learned from preferences or behaviors, current methods to leverage these for LLM personalization typically require costly fine-tuning or token-heavy prompting. We propose Embedding-to-Prefix (E2P), a parameter-efficient method that injects pre-computed context embeddings into an LLM's hidden representation space through a learned projection to a single soft token prefix. This enables effective personalization while keeping the backbone model frozen and avoiding expensive adaptation techniques. We evaluate E2P across two public datasets and in a production setting: dialogue personalization on Persona-Chat, contextual headline generation on PENS, and large-scale personalization for music and podcast consumption. Results show that E2P preserves contextual signals and achieves strong performance with minimal computational overhead, offering a scalable, efficient solution for contextualizing generative AI systems.
'Alexa, what do you know about us?' What I discovered when I asked Amazon to tell me everything my family's smart speaker had heard
She needs to be spoken to slowly and clearly, as you'd talk to an aged relative with diminished faculties. '"Alexa, how long do wasps live for?" "Alexa, how long do wasps live if you hit them with a tea towel and then a saucepan?" In September 2016, a new presence appears in our house, squatting on the kitchen counter between the kettle and the coffee machine. It is blandly futuristic, a minimal cylinder with an LED ring that glows blue to alert us to the fact that it is ready, poised to answer our questions or carry out our instructions, as long as those instructions are clearly stated and fall within a narrow band of available "skills".
Children's Mental Models of AI Reasoning: Implications for AI Literacy Education
Dangol, Aayushi, Wolfe, Robert, Zhao, Runhua, Kim, JaeWon, Ramanan, Trushaa, Davis, Katie, Kientz, Julie A.
As artificial intelligence (AI) advances in reasoning capabilities, most recently with the emergence of Large Reasoning Models (LRMs), understanding how children conceptualize AI's reasoning processes becomes critical for fostering AI literacy. While one of the "Five Big Ideas" in AI education highlights reasoning algorithms as central to AI decision-making, less is known about children's mental models in this area. Through a two-phase approach, consisting of a co-design session with 8 children followed by a field study with 106 children (grades 3-8), we identified three models of AI reasoning: Deductive, Inductive, and Inherent. Our findings reveal that younger children (grades 3-5) often attribute AI's reasoning to inherent intelligence, while older children (grades 6-8) recognize AI as a pattern recognizer. We highlight three tensions that surfaced in children's understanding of AI reasoning and conclude with implications for scaffolding AI curricula and designing explainable AI tools.
Conf-GNNRec: Quantifying and Calibrating the Prediction Confidence for GNN-based Recommendation Methods
Yan, Meng, Xu, Cai, Wang, Xujing, Guan, Ziyu, Zhao, Wei, Zhou, Yuhang
Recommender systems based on graph neural networks perform well in tasks such as rating and ranking. However, in real-world recommendation scenarios, noise such as user misuse and malicious advertisement gradually accumulates through the message propagation mechanism. Even if existing studies mitigate their effects by reducing the noise propagation weights, the severe sparsity of the recommender system still leads to the low-weighted noisy neighbors being mistaken as meaningful information, and the prediction result obtained based on the polluted nodes is not entirely trustworthy. Therefore, it is crucial to measure the confidence of the prediction results in this highly noisy framework. Furthermore, our evaluation of the existing representative GNN-based recommendation shows that it suffers from overconfidence. Based on the above considerations, we propose a new method to quantify and calibrate the prediction confidence of GNN-based recommendations (Conf-GNNRec). Specifically, we propose a rating calibration method that dynamically adjusts excessive ratings to mitigate overconfidence based on user personalization. We also design a confidence loss function to reduce the overconfidence of negative samples and effectively improve recommendation performance. Experiments on public datasets demonstrate the validity of Conf-GNNRec in prediction confidence and recommendation performance.
Hisense taps new Google Home APIs to expand smart home integration
Google issued 100 announcements during its Google I/O developers conference this week, none of which involved the smart home. That apparent lack of enthusiasm for a topic close to our heart didn't dissuade TV and smart-appliance manufacturer Hisense from announcing plans to integrate new Google Home APIs into its own ConnectLife app, so that third-party smart home devices can be folded into that ecosystem. Hisense first announced that it would open its ConnectLife app to third-party products in December, 2024. Today, it announced it will incorporate the latest Google Home APIs into the app by the fall of 2025, Hisense says this will enable users to onboard a wide range of third-party smart home devices--including Matter and Works With Google Home-certified products--to create a more integrated smart home experience. Hisense cited two examples of how this would benefit ConnectLife users: "One-touch modes and customized automations can blend Hisense products with third-party devices to create intelligent home responses, such as air conditioners automatically adjusting based on third-party air quality sensors, or smart lights providing visual notifications when the Hisense refrigerator's VersaTemp drawer reaches the ideal temperature for chilling drinks."