Personal Assistant Systems
DUALRec: A Hybrid Sequential and Language Model Framework for Context-Aware Movie Recommendation
The modern recommender systems are facing an increasing challenge of modelling and predicting the dynamic and context-rich user preferences. Traditional collaborative filtering and content-based methods often struggle to capture the temporal patternings and evolving user intentions. While Large Language Models (LLMs) have gained gradual attention in recent years, by their strong semantic understanding and reasoning abilities, they are not inherently designed to model chronologically evolving user preference and intentions. On the other hand, for sequential models like LSTM (Long-Short-Term-Memory) which is good at capturing the temporal dynamics of user behaviour and evolving user preference over time, but still lacks a rich semantic understanding for comprehensive recommendation generation. In this study, we propose DUALRec (Dynamic User-Aware Language-based Recommender), a novel recommender that leverages the complementary strength of both models, which combines the temporal modelling abilities of LSTM networks with semantic reasoning power of the fine-tuned Large Language Models. The LSTM component will capture users evolving preference through their viewing history, while the fine-tuned LLM variants will leverage these temporal user insights to generate next movies that users might enjoy. Experimental results on MovieLens-1M dataset shows that the DUALRec model outperforms a wide range of baseline models, with comprehensive evaluation matrices of Hit Rate (HR@k), Normalized Discounted Cumulative Gain (NDCG@k), and genre similarity metrics. This research proposes a novel architecture that bridges the gap between temporal sequence modeling and semantic reasoning, and offers a promising direction for developing more intelligent and context-aware recommenders.
Point of Interest Recommendation: Pitfalls and Viable Solutions
Bellogín, Alejandro, Dietz, Linus W., Ricci, Francesco, Sánchez, Pablo
Point of interest (POI) recommendation can play a pivotal role in enriching tourists' experiences by suggesting context-dependent and preference-matching locations and activities, such as restaurants, landmarks, itineraries, and cultural attractions. Unlike some more common recommendation domains (e.g., music and video), POI recommendation is inherently high-stakes: users invest significant time, money, and effort to search, choose, and consume these suggested POIs. Despite the numerous research works in the area, several fundamental issues remain unresolved, hindering the real-world applicability of the proposed approaches. In this paper, we discuss the current status of the POI recommendation problem and the main challenges we have identified. The first contribution of this paper is a critical assessment of the current state of POI recommendation research and the identification of key shortcomings across three main dimensions: datasets, algorithms, and evaluation methodologies. We highlight persistent issues such as the lack of standardized benchmark datasets, flawed assumptions in the problem definition and model design, and inadequate treatment of biases in the user behavior and system performance. The second contribution is a structured research agenda that, starting from the identified issues, introduces important directions for future work related to multistakeholder design, context awareness, data collection, trustworthiness, novel interactions, and real-world evaluation.
Off-Policy Evaluation and Learning for Matching Markets
Hayashi, Yudai, Goda, Shuhei, Saito, Yuta
Matching users based on mutual preferences is a fundamental aspect of services driven by reciprocal recommendations, such as job search and dating applications. Although A/B tests remain the gold standard for evaluating new policies in recommender systems for matching markets, it is costly and impractical for frequent policy updates. Off-Policy Evaluation (OPE) thus plays a crucial role by enabling the evaluation of recommendation policies using only offline logged data naturally collected on the platform. However, unlike conventional recommendation settings, the large scale and bidirectional nature of user interactions in matching platforms introduce variance issues and exacerbate reward sparsity, making standard OPE methods unreliable. To address these challenges and facilitate effective offline evaluation, we propose novel OPE estimators, \textit{DiPS} and \textit{DPR}, specifically designed for matching markets. Our methods combine elements of the Direct Method (DM), Inverse Propensity Score (IPS), and Doubly Robust (DR) estimators while incorporating intermediate labels, such as initial engagement signals, to achieve better bias-variance control in matching markets. Theoretically, we derive the bias and variance of the proposed estimators and demonstrate their advantages over conventional methods. Furthermore, we show that these estimators can be seamlessly extended to offline policy learning methods for improving recommendation policies for making more matches. We empirically evaluate our methods through experiments on both synthetic data and A/B testing logs from a real job-matching platform. The empirical results highlight the superiority of our approach over existing methods in off-policy evaluation and learning tasks for a variety of configurations.
PrefPalette: Personalized Preference Modeling with Latent Attributes
Li, Shuyue Stella, Sclar, Melanie, Lang, Hunter, Ni, Ansong, He, Jacqueline, Xu, Puxin, Cohen, Andrew, Park, Chan Young, Tsvetkov, Yulia, Celikyilmaz, Asli
Personalizing AI systems requires understanding not just what users prefer, but the reasons that underlie those preferences - yet current preference models typically treat human judgment as a black box. We introduce PrefPalette, a framework that decomposes preferences into attribute dimensions and tailors its preference prediction to distinct social community values in a human-interpretable manner. PrefPalette operationalizes a cognitive science principle known as multi-attribute decision making in two ways: (1) a scalable counterfactual attribute synthesis step that involves generating synthetic training data to isolate for individual attribute effects (e.g., formality, humor, cultural values), and (2) attention-based preference modeling that learns how different social communities dynamically weight these attributes. This approach moves beyond aggregate preference modeling to capture the diverse evaluation frameworks that drive human judgment. When evaluated on 45 social communities from the online platform Reddit, PrefPalette outperforms GPT-4o by 46.6% in average prediction accuracy. Beyond raw predictive improvements, PrefPalette also shed light on intuitive, community-specific profiles: scholarly communities prioritize verbosity and stimulation, conflict-oriented communities value sarcasm and directness, and support-based communities emphasize empathy. By modeling the attribute-mediated structure of human judgment, PrefPalette delivers both superior preference modeling and transparent, interpretable insights, and serves as a first step toward more trustworthy, value-aware personalized applications.
Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?
Li, Shuangning, Wang, Chonghuan, Wang, Jingyan
Recommendation systems are widely deployed across online platforms. Users receive numerous recommendations every day, including news and creators' content on social media, products in online marketplaces, services in freelancing labor markets, ads on websites, and so on. During the development of such recommendation systems, a crucial task that companies face all the time is to compare the performance of different recommendation algorithms, and make business decisions on which one to eventually deploy in production. A common approach to comparing the performance of two recommendation algorithms is through randomized controlled trials, also known as A/B experiments. In a typical user-randomized A/B experiment, each user is assigned to a treatment group (running one recommendation algorithm) or a control group (running the other recommendation algorithm), uniformly at random. The metric to measure the performance of the two algorithms can be, for example, user engagement, click-through rates, purchase revenues, etc. Our goal is to estimate the global treatment effect (GTE), the difference between the treatment group and the control group in terms of this performance metric. More precisely, the GTE is defined as the difference in this performance metric between deploying the treatment algorithm to all users versus deploying the control algorithm to all users.
Looking for Fairness in Recommender Systems
Recommender systems can be found everywhere today, shaping our everyday experience whenever we're consuming content, ordering food, buying groceries online, or even just reading the news. Let's imagine we're in the process of building a recommender system to make content suggestions to users on social media. When thinking about fairness, it becomes clear there are several perspectives to consider: the users asking for tailored suggestions, the content creators hoping for some limelight, and society at large, navigating the repercussions of algorithmic recommendations. A shared fairness concern across all three is the emergence of filter bubbles, a side-effect that takes place when recommender systems are almost "too good", making recommendations so tailored that users become inadvertently confined to a narrow set of opinions/themes and isolated from alternative ideas. From the user's perspective, this is akin to manipulation. From the small content creator's perspective, this is an obstacle preventing them access to a whole range of potential fans. From society's perspective, the potential consequences are far-reaching, influencing collective opinions, social behavior and political decisions. How can our recommender system be fine-tuned to avoid the creation of filter bubbles, and ensure a more inclusive and diverse content landscape? Approaching this problem involves defining one (or more) performance metric to represent diversity, and tweaking our recommender system's performance through the lens of fairness. By incorporating this metric into our evaluation framework, we aim to strike a balance between personalized recommendations and the broader societal goal of fostering rich and varied cultures and points of view.
I spent a day with Amazon's Alexa : It's not perfect, but it's much smarter
"Alexa," I asked the Echo display in my kitchen, "what was that song from The Hills? You know, that MTV show? Can you play it on the Echo Show in the office?" The old Alexa wouldn't have had a prayer of answering such a poorly worded query. But the new Alexa, now packing AI-enhanced smarts, handled it easily.
Alleviating User-Sensitive bias with Fair Generative Sequential Recommendation Model
Liu, Yang, Wu, Feng, Zhu, Xuefang
Recommendation fairness has recently attracted much attention. In the real world, recommendation systems are driven by user behavior, and since users with the same sensitive feature (e.g., gender and age) tend to have the same patterns, recommendation models can easily capture the strong correlation preference of sensitive features and thus cause recommendation unfairness. Diffusion model (DM) as a new generative model paradigm has achieved great success in recommendation systems. DM's ability to model uncertainty and represent diversity, and its modeling mechanism has a high degree of adaptability with the real-world recommendation process with bias. Therefore, we use DM to effectively model the fairness of recommendation and enhance the diversity. This paper proposes a FairGENerative sequential Recommendation model based on DM, FairGENRec. In the training phase, we inject random noise into the original distribution under the guidance of the sensitive feature recognition model, and a sequential denoise model is designed for the reverse reconstruction of items. Simultaneously, recommendation fairness modeling is completed by injecting multi-interests representational information that eliminates the bias of sensitive user features into the generated results. In the inference phase, the model obtains the noise in the form of noise addition by using the history interactions which is followed by reverse iteration to reconstruct the target item representation. Finally, our extensive experiments on three datasets demonstrate the dual enhancement effect of FairGENRec on accuracy and fairness, while the statistical analysis of the cases visualizes the degree of improvement on the fairness of the recommendation.
How Stylistic Similarity Shapes Preferences in Dialogue Dataset with User and Third Party Evaluations
Numaya, Ikumi, Moriya, Shoji, Sato, Shiki, Akama, Reina, Suzuki, Jun
Recent advancements in dialogue generation have broadened the scope of human-bot interactions, enabling not only contextually appropriate responses but also the analysis of human affect and sensitivity. While prior work has suggested that stylistic similarity between user and system may enhance user impressions, the distinction between subjective and objective similarity is often overlooked. To investigate this issue, we introduce a novel dataset that includes users' preferences, subjective stylistic similarity based on users' own perceptions, and objective stylistic similarity annotated by third party evaluators in open-domain dialogue settings. Analysis using the constructed dataset reveals a strong positive correlation between subjective stylistic similarity and user preference. Furthermore, our analysis suggests an important finding: users' subjective stylistic similarity differs from third party objective similarity. This underscores the importance of distinguishing between subjective and objective evaluations and understanding the distinct aspects each captures when analyzing the relationship between stylistic similarity and user preferences. The dataset presented in this paper is available online.
Deako has a new smart light switch and a next-gen app
Deako takes a unique approach to smart lighting with its modular system, and now it's offering a next-generation switch and app that promises to make installation and configuration even easier than before. The appeal of Deako's system lies in its use of backplates that, once hardwired to your home's electrical system, allow you to hot-swap one type of Deako lighting control for another--you don't even need to turn off the power. If you have an on/off switch controlling your ceiling cans, for example, but you'd prefer to be able to dim those lights, you can just yank the switch out of the backplate and pop in a dimmer. Once you've installed a Deako backplate in your electrical box, you can plug in–and later hot-swap–any of its smart or simple lighting controls. Deako is celebrating its 10th anniversary by taking the wraps off an all-new smart switch: the Deako Smart Switch Gen 2 ( 100, pictured at the top of this page) features an integrated, dual-function LED light bar at its base.