Personal Assistant Systems
BayesCNS: A Unified Bayesian Approach to Address Cold Start and Non-Stationarity in Search Systems at Scale
Ardywibowo, Randy, Sunki, Rakesh, Kuo, Lucy, Nayak, Sankalp
Information Retrieval (IR) systems used in search and recommendation platforms frequently employ Learning-to-Rank (LTR) models to rank items in response to user queries. These models heavily rely on features derived from user interactions, such as clicks and engagement data. This dependence introduces cold start issues for items lacking user engagement and poses challenges in adapting to non-stationary shifts in user behavior over time. We address both challenges holistically as an online learning problem and propose BayesCNS, a Bayesian approach designed to handle cold start and non-stationary distribution shifts in search systems at scale. BayesCNS achieves this by estimating prior distributions for user-item interactions, which are continuously updated with new user interactions gathered online. This online learning procedure is guided by a ranker model, enabling efficient exploration of relevant items using contextual information provided by the ranker. We successfully deployed BayesCNS in a large-scale search system and demonstrated its efficacy through comprehensive offline and online experiments. Notably, an online A/B experiment showed a 10.60% increase in new item interactions and a 1.05% improvement in overall success metrics over the existing production baseline.
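The idea of a prior over user-item interactions that is updated online can be illustrated with a small Beta-Bernoulli sketch. This is not the BayesCNS implementation: the `BetaPosterior` class, the use of Thompson sampling for exploration, and the prior pseudo-counts are all assumptions made for illustration, and the ranker-guided contextual part described in the abstract is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta distribution over an item's interaction rate (illustrative only)."""
    alpha: float  # assumed prior pseudo-count of interactions
    beta: float   # assumed prior pseudo-count of impressions without interaction

    def update(self, interacted: bool) -> None:
        # Conjugate update: each observed impression shifts one pseudo-count.
        if interacted:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def sample(self, rng: random.Random) -> float:
        # One plausible interaction rate drawn from the current posterior.
        return rng.betavariate(self.alpha, self.beta)


def thompson_rank(posteriors: dict, rng: random.Random) -> list:
    """Rank items by one posterior draw each (Thompson sampling)."""
    draws = {item: post.sample(rng) for item, post in posteriors.items()}
    return sorted(draws, key=draws.get, reverse=True)


# A cold-start item starts from its prior and sharpens as feedback arrives.
new_item = BetaPosterior(alpha=1.0, beta=1.0)
for interacted in (True, True, True, False):
    new_item.update(interacted)
```

Because a new item's posterior is wide, its sampled rate is occasionally high, which gives it exposure without hand-tuned boosts; as feedback accumulates, the draws concentrate around the observed rate.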
Bridging Conversational and Collaborative Signals for Conversational Recommendation
Rabiah, Ahmad Bin, Sadeq, Nafis, McAuley, Julian
Conversational recommendation systems (CRS) leverage contextual information from conversations to generate recommendations but often struggle due to a lack of collaborative filtering (CF) signals, which capture user-item interaction patterns essential for accurate recommendations. We introduce Reddit-ML32M, a dataset that links Reddit conversations with interactions on MovieLens 32M, to enrich item representations by leveraging collaborative knowledge and addressing interaction sparsity in conversational datasets. We propose an LLM-based framework that uses Reddit-ML32M to align LLM-generated recommendations with CF embeddings, refining rankings for better performance. We evaluate our framework against three sets of baselines: CF-based recommenders using only interactions from CRS tasks, traditional CRS models, and LLM-based methods relying on conversational context without item representations. Our approach achieves consistent improvements, including a 12.32% increase in Hit Rate and a 9.9% improvement in NDCG, outperforming the best-performing baseline that relies on conversational context but lacks collaborative item representations.
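For readers unfamiliar with the two reported metrics, the sketch below computes Hit Rate@k and NDCG@k for a single ranked list. It assumes exactly one ground-truth item per conversation and binary relevance; the abstract does not state the evaluation protocol or the cutoff k, so the function names and that setup are illustrative assumptions.

```python
import math


def hit_rate_at_k(ranked: list, target: str, k: int) -> float:
    """1.0 if the ground-truth item appears in the top k, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked: list, target: str, k: int) -> float:
    """NDCG@k with a single relevant item.

    With one relevant item the ideal DCG is 1, so NDCG reduces to
    1 / log2(position + 1) for a 1-based position within the top k.
    """
    for position, item in enumerate(ranked[:k], start=1):
        if item == target:
            return 1.0 / math.log2(position + 1)
    return 0.0


ranked = ["Heat", "Alien", "Arrival"]        # hypothetical model output
print(hit_rate_at_k(ranked, "Alien", k=3))   # 1.0
print(round(ndcg_at_k(ranked, "Alien", k=3), 4))  # 0.6309
```

Hit Rate only checks whether the item was retrieved, while NDCG also rewards placing it higher, which is why the two metrics can improve by different amounts.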
'It feels like admin': why are people falling out of love with dating apps?
About 1.4 million people in the UK have left the online dating scene in the past 12 months, but is that a sign that the apps don't work or that people are turning away from dating altogether? Ofcom's 2024 Online Nation report shows that dating app use declined significantly between 2023 and 2024, with a drop of nearly 16% in the use of the top 10 most popular dating apps this year. Tinder experienced the biggest loss, with more than half a million users abandoning the platform since May 2023. Bumble and Hinge were also hit hard, losing 368,000 and 131,000 users respectively in the same period. According to researchers at the University of Leeds' Centre for Love, Sex, and Relationships (CLSR), a contributing factor in the decline of online dating could be a sense of detachment from reality and fatigue at the process.
Fuzzy Norm-Explicit Product Quantization for Recommender Systems
Jamalifard, Mohammadreza, Andreu-Perez, Javier, Hagras, Hani, López, Luis Martínez
As data resources grow, providing recommendations that best meet user demands has become a vital requirement in business and everyday life for overcoming information overload. However, how to build a system that suggests relevant recommendations remains a point of debate. One of the most cost-efficient techniques for producing relevant recommendations at low complexity is Product Quantization (PQ), and PQ approaches have continued to develop in recent years. The crucial challenge is improving PQ's recall without increasing its complexity. Doing so makes the algorithm suitable for problems that require retrieving a larger number of potentially relevant items without disregarding others, at high speed and low cost to keep up with traffic. This is the case for online shops, where recommendations matching the customer's purpose are important, although customers may also be open to browsing other products. This research proposes a fuzzy approach to norm-based product quantization. Type-2 Fuzzy Sets (T2FSs) define the codebook, allowing sub-vectors (T2FSs) to be associated with more than one element of the codebook; their norm is then computed by means of integration. Our method improves recall, making the algorithm suitable for problems that require querying as many potentially relevant items as possible without disregarding others. The proposed method outperforms all PQ approaches such as NEQ, PQ, and RQ by up to +6%, +5%, and +8%, achieving a recall of 94%, 69%, and 59% on the Netflix, Audio, and Cifar60k datasets, respectively. Moreover, its computing time and complexity are nearly equal to those of the most computationally efficient existing PQ method.
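As background, plain (crisp) product quantization can be sketched in a few lines: a vector is split into sub-vectors, each sub-vector is replaced by the index of its nearest codeword, and inner products are approximated from the codewords. This is not the fuzzy, norm-explicit method proposed in the paper; the tiny hand-written codebooks and all function names are assumptions for illustration, and a real system would learn the codebooks (for example with k-means).

```python
def split(vec: list, m: int) -> list:
    """Split vec into m equal-length sub-vectors (len(vec) must divide by m)."""
    size = len(vec) // m
    return [vec[i * size:(i + 1) * size] for i in range(m)]


def sq_dist(a: list, b: list) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vec: list, codebooks: list) -> list:
    """Replace each sub-vector by the index of its nearest codeword."""
    codes = []
    for part, codebook in zip(split(vec, len(codebooks)), codebooks):
        nearest = min(range(len(codebook)), key=lambda j: sq_dist(part, codebook[j]))
        codes.append(nearest)
    return codes


def approx_inner_product(query: list, codes: list, codebooks: list) -> float:
    """Approximate <query, item> using the item's codewords instead of the item."""
    total = 0.0
    for part, code, codebook in zip(split(query, len(codebooks)), codes, codebooks):
        total += sum(q * c for q, c in zip(part, codebook[code]))
    return total


# Two sub-spaces, two codewords each (hypothetical values).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
item = [0.9, 1.1, 0.1, 0.8]
codes = encode(item, codebooks)  # [1, 0]
score = approx_inner_product([1.0, 2.0, 3.0, 4.0], codes, codebooks)  # 7.0
```

Each item is stored as a handful of small integers rather than a full vector, which is where the low memory and compute cost comes from; the recall loss the abstract targets comes from the gap between an item and its codewords.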
A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks
Liang, Chia Xin, Tian, Pu, Yin, Caitlyn Heqi, Yua, Yao, An-Hou, Wei, Ming, Li, Wang, Tianyang, Bi, Ziqian, Liu, Ming
This survey and application guide to multimodal large language models (MLLMs) explores the rapidly developing field of MLLMs, examining their architectures, applications, and impact on AI and Generative Models. Starting with foundational concepts, we delve into how MLLMs integrate various data types, including text, images, video, and audio, to enable complex AI systems for cross-modal understanding and generation. The guide covers essential topics such as training methods, architectural components, and practical applications in various fields, from visual storytelling to enhanced accessibility. Through detailed case studies and technical analysis, the text examines prominent MLLM implementations while addressing key challenges in scalability, robustness, and cross-modal learning. Concluding with a discussion of ethical considerations, responsible AI development, and future directions, this authoritative resource provides both theoretical frameworks and practical insights. It offers a balanced perspective on the opportunities and challenges in the development and deployment of MLLMs, and is highly valuable for researchers, practitioners, and students interested in the intersection of natural language processing and computer vision.
Path-based summary explanations for graph recommenders (extended version)
Karidi, Danae Pla, Pitoura, Evaggelia
Path-based explanations provide intrinsic insights into graph-based recommendation models. However, most previous work has focused on explaining an individual recommendation of an item to a user. In this paper, we propose summary explanations, i.e., explanations that highlight why a user or a group of users receive a set of item recommendations and why an item, or a group of items, is recommended to a set of users as an effective means to provide insights into the collective behavior of the recommender. We also present a novel method to summarize explanations using efficient graph algorithms, specifically the Steiner Tree and the Prize-Collecting Steiner Tree. Our approach reduces the size and complexity of summary explanations while preserving essential information, making explanations more comprehensible for users and more useful to model developers. Evaluations across multiple metrics demonstrate that our summaries outperform baseline explanation methods in most scenarios, in a variety of quality aspects.
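To give a feel for why a tree-shaped summary is smaller than a set of individual explanation paths, the sketch below merges the shortest paths from one user to several recommended items into a single edge set. It is a simple shortest-path-tree heuristic on an unweighted graph, not the Steiner Tree or Prize-Collecting Steiner Tree algorithms the paper uses, and the toy graph and function names are assumptions.

```python
from collections import deque


def bfs_parents(graph: dict, root: str) -> dict:
    """Breadth-first search returning each reachable node's parent."""
    parents = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return parents


def summary_edges(graph: dict, root: str, terminals: list) -> set:
    """Union of the BFS shortest paths from root to every reachable terminal."""
    parents = bfs_parents(graph, root)
    edges = set()
    for terminal in terminals:
        if terminal not in parents:
            continue  # unreachable item: nothing to explain
        node = terminal
        while parents[node] is not None:
            edges.add(frozenset((parents[node], node)))
            node = parents[node]
    return edges


# Hypothetical user-item graph: user "u" reaches items via intermediate nodes.
graph = {
    "u": ["a", "b"],
    "a": ["u", "i1", "i2"],
    "b": ["u", "i3"],
    "i1": ["a"], "i2": ["a"], "i3": ["b"],
}
summary = summary_edges(graph, "u", ["i1", "i2"])
print(len(summary))  # 3 edges, versus 4 for the two paths listed separately
```

The shared edge `u - a` appears once in the summary, which is the size reduction that exact or approximate Steiner tree methods pursue more aggressively.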
Google Home preview users just got another Gemini AI feature
Google is stepping up its Gemini AI rollout for smart home users, delivering access to a new Gemini-powered feature in the Google Home app. Android users enrolled in Google Home's Public Preview can now help test "Help me create," a Gemini AI-infused tool that lets you create Google Home automations from natural-language prompts. Here's how it works: You open the Google Home app, tap the Automations tab, tap Add, then tap "Help me create." Next, just describe in English (yes, just English, at least for now) what you'd like the automation to do. For example, you could type (or say) "Turn off the lights at 11 p.m. every night," "Lock all the doors when everyone's away," or "Set a meditation time reminder."
Unifying Generative and Dense Retrieval for Sequential Recommendation
Yang, Liu, Paischer, Fabian, Hassani, Kaveh, Li, Jiacheng, Shao, Shuai, Li, Zhang Gabriel, He, Yun, Feng, Xue, Noorshams, Nima, Park, Sem, Long, Bo, Nowak, Robert D, Gao, Xiaoli, Eghbalzadeh, Hamid
Sequential dense retrieval models utilize advanced sequence learning techniques to compute item and user representations, which are then used to rank relevant items for a user through inner product computation between the user and all item representations. However, this approach requires storing a unique representation for each item, resulting in significant memory requirements as the number of items grows. In contrast, the recently proposed generative retrieval paradigm offers a promising alternative by directly predicting item indices using a generative model trained on semantic IDs that encapsulate items' semantic information. Despite its potential for large-scale applications, a comprehensive comparison between generative retrieval and sequential dense retrieval under fair conditions is still lacking, leaving open questions regarding performance and computation trade-offs. To address this, we compare these two approaches under controlled conditions on academic benchmarks and propose LIGER (LeveragIng dense retrieval for GEnerative Retrieval), a hybrid model that combines the strengths of these two widely used methods. LIGER integrates sequential dense retrieval into generative retrieval, mitigating performance differences and enhancing cold-start item recommendation in the datasets evaluated. This hybrid approach provides insights into the trade-offs between these approaches and demonstrates improvements in efficiency and effectiveness for recommendation systems in small-scale benchmarks.
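The dense retrieval scoring step described above, an inner product between one user representation and every stored item representation, can be sketched as follows. The embeddings are made-up values and the function names are assumptions; the sequence model that would produce the user vector, and LIGER itself, are outside this sketch.

```python
def inner_product(a: list, b: list) -> float:
    return sum(x * y for x, y in zip(a, b))


def rank_items(user_vec: list, item_vecs: dict, k: int) -> list:
    """Score every stored item against the user vector and keep the top k."""
    scores = {item: inner_product(user_vec, vec) for item, vec in item_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# One stored vector per item: memory grows linearly with the catalogue size.
item_vecs = {
    "item_a": [0.9, 0.1],
    "item_b": [0.1, 0.9],
    "item_c": [0.5, 0.5],
}
print(rank_items([1.0, 0.0], item_vecs, k=2))  # ['item_a', 'item_c']
```

The `item_vecs` table is the per-item storage cost the abstract refers to; generative retrieval avoids it by predicting item identifiers directly.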
AI's assigned gender affects human-AI cooperation
Bazazi, Sepideh, Karpus, Jurgis, Yasseri, Taha
Cooperation between humans and machines is increasingly vital as artificial intelligence (AI) becomes more integrated into daily life. Research indicates that people are often less willing to cooperate with AI agents than with humans, more readily exploiting AI for personal gain. While prior studies have shown that giving AI agents human-like features influences people's cooperation with them, the impact of AI's assigned gender remains underexplored. This study investigates how human cooperation varies based on gender labels assigned to AI agents with which they interact. In the Prisoner's Dilemma game, 402 participants interacted with partners labelled as AI (bot) or humans. The partners were also labelled male, female, non-binary, or gender-neutral. Results revealed that participants tended to exploit female-labelled and distrust male-labelled AI agents more than their human counterparts, reflecting gender biases similar to those in human-human interactions. These findings highlight the significance of gender biases in human-AI interactions that must be considered in future policy, design of interactive AI systems, and regulation of their use.
CALICO: Conversational Agent Localization via Synthetic Data Generation
Rosenbaum, Andy, Kharazmi, Pegah, Banijamali, Ershad, Zeng, Lu, DiPersio, Christopher, Wei, Pan, Oz, Gokmen, Chung, Clement, Owczarzak, Karolina, Triefenbach, Fabian, Hamza, Wael
We present CALICO, a method to fine-tune Large Language Models (LLMs) to localize conversational agent training data from one language to another. For slots (named entities), CALICO supports three operations: verbatim copy, literal translation, and localization, i.e., generating slot values more appropriate in the target language, such as city and airport names located in countries where the language is spoken. Furthermore, we design an iterative filtering mechanism to discard noisy generated samples, which we show boosts the performance of the downstream conversational agent. To demonstrate the effectiveness of CALICO, we build and release a new human-localized (HL) version of the MultiATIS++ travel information test set in 8 languages. Compared to the original human-translated (HT) version of the test set, we show that our new HL version is more challenging. We also show that CALICO outperforms the state-of-the-art LINGUIST (which relies on literal slot translation out of context) both on the HT case, where CALICO generates more accurate slot translations, and on the HL case, where CALICO generates localized slots that are closer to the HL test set.