Personal Assistant Systems
Optimizing Online Advertising with Multi-Armed Bandits: Mitigating the Cold Start Problem under Auction Dynamics
Soboleva, Anastasiia, Pudovikov, Andrey, Snetkov, Roman, Babenko, Alina, Samosvat, Egor, Dorn, Yuriy
Online advertising platforms often face a common challenge: the cold start problem. Insufficient behavioral data (clicks) makes accurate click-through rate (CTR) forecasting of new ads challenging. CTR for "old" items can also be significantly underestimated due to their early performance influencing their long-term behavior on the platform. The cold start problem has far-reaching implications for businesses, including missed long-term revenue opportunities. To mitigate this issue, we developed a UCB-like algorithm under multi-armed bandit (MAB) setting for positional-based model (PBM), specifically tailored to auction pay-per-click systems. Our proposed algorithm successfully combines theory and practice: we obtain theoretical upper estimates of budget regret, and conduct a series of experiments on synthetic and real-world data that confirm the applicability of the method on the real platform. In addition to increasing the platform's long-term profitability, we also propose a mechanism for maintaining short-term profits through controlled exploration and exploitation of items.
SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions
Wagner, Dominik, Churchill, Alexander, Sigtia, Siddharth, Marchi, Erik
In this work, we present and evaluate SELMA, a Speech-Enabled Language Model for virtual Assistant interactions that integrates audio and text as inputs to a Large Language Model (LLM). SELMA is designed to handle three primary and two auxiliary tasks related to interactions with virtual assistants simultaneously within a single end-to-end model. We employ low-rank adaptation modules for parameter-efficient training of both the audio encoder and the LLM. Additionally, we implement a feature pooling strategy enabling the system to recognize global patterns and improve accuracy on tasks less reliant on individual sequence elements. Experimental results on Voice Trigger (VT) detection, Device-Directed Speech Detection (DDSD), and Automatic Speech Recognition (ASR), demonstrate that our approach both simplifies the typical input processing pipeline of virtual assistants significantly and also improves performance compared to dedicated models for each individual task. SELMA yields relative Equal-Error Rate improvements of 64% on the VT detection task, and 22% on DDSD, while also achieving word error rates close to the baseline.
Explainability in Practice: A Survey of Explainable NLP Across Various Domains
Mohammadi, Hadi, Bagheri, Ayoub, Giachanou, Anastasia, Oberski, Daniel L.
Natural Language Processing (NLP) has become a cornerstone in many critical sectors, including healthcare, finance, and customer relationship management. This is especially true with the development and use of advanced models such as GPT-based architectures and BERT, which are widely used in decision-making processes. However, the black-box nature of these advanced NLP models has created an urgent need for transparency and explainability. This review explores explainable NLP (XNLP) with a focus on its practical deployment and real-world applications, examining its implementation and the challenges faced in domain-specific contexts. The paper underscores the importance of explainability in NLP and provides a comprehensive perspective on how XNLP can be designed to meet the unique demands of various sectors, from healthcare's need for clear insights to finance's emphasis on fraud detection and risk assessment. Additionally, this review aims to bridge the knowledge gap in XNLP literature by offering a domain-specific exploration and discussing underrepresented areas such as real-world applicability, metric evaluation, and the role of human interaction in model assessment. The paper concludes by suggesting future research directions that could enhance the understanding and broader application of XNLP.
Detecting Ambiguities to Guide Query Rewrite for Robust Conversations in Enterprise AI Assistants
Tanjim, Md Mehrab, Chen, Xiang, Bursztyn, Victor S., Bhattacharya, Uttaran, Mai, Tung, Muppala, Vaishnavi, Maharaj, Akash, Mitra, Saayan, Koh, Eunyee, Li, Yunyao, Russell, Ken
Multi-turn conversations with an Enterprise AI Assistant can be challenging due to conversational dependencies in questions, leading to ambiguities and errors. To address this, we propose an NLU-NLG framework for ambiguity detection and resolution through reformulating query automatically and introduce a new task called "Ambiguity-guided Query Rewrite." To detect ambiguities, we develop a taxonomy based on real user conversational logs and draw insights from it to design rules and extract features for a classifier which yields superior performance in detecting ambiguous queries, outperforming LLM-based baselines. Furthermore, coupling the query rewrite module with our ambiguity detecting classifier shows that this end-to-end framework can effectively mitigate ambiguities without risking unnecessary insertions of unwanted phrases for clear queries, leading to an improvement in the overall performance of the AI Assistant. Due to its significance, this has been deployed in the real world application, namely Adobe Experience Platform AI Assistant.
Superhuman AI Disclosure: Impacts on Toxicity, Fairness, and Trust Vary by Expertise and Persona Attributes
Chua, Jaymari, Wang, Chen, Yao, Lina
As artificial intelligence demonstrates surpassing human performance across real-world tasks, disclosing superhuman capabilities poses challenges for fairness, accountability, and trust. To investigate how transparency impacts attitudes and perceptions, we introduce a grounded and validated set of synthetic personas reflecting diverse fairness concerns and technology acceptance levels. Then we evaluate responses in two contrasting domains: (1) a competitive player in StarCraft II, where strategy and high-skill gameplay often elicit toxic interactions, and (2) a cooperative personal-assistant in providing information. Across numerous interactions spanning persona profiles, we test non-disclosure versus explicit superhuman labelling under controlled game outcomes and usage contexts. Our findings reveal sharp domain-specific effects: in StarCraft II, explicitly labelling AI as superhuman, novice personas who learned of it reported lower toxicity and higher fairness-attributing defeat to advanced skill rather than hidden cheating-whereas expert personas found the disclosure statements irksome but still less deceptive than non-disclosure. Conversely, in the LLM as personal-assistant setting, disclosure of superhuman capabilities improved perceived trustworthiness, though it risked AI overreliance among certain persona segments. We release Dataset X-containing persona cards-including profile attributes, disclosure prompts, and detailed interaction logs, accompanied by reproducible protocols and disclaimers for adapting them to diverse tasks. Our results demonstrate that transparency is not a cure-all: while it reduces suspicion and enhances trust in cooperative contexts, it may inflame resistance or disappointment in competitive domains.
Men who like MEAT are more likely to bag a date - because women see them as more masculine than vegetarians, study finds
Whether it's Tinder or Hinge, anyone with an online dating account will know that choosing the perfect pictures and words for your profile is a tricky business. From candid photos to funny jokes, it can be difficult to know what will help you bag the likes in a sea of profiles. But help is at hand, as scientists have revealed the one word you definitely should not include on your profile. According to researchers from the University of Warsaw, the word'vegetarian' will immediately put off potential dates. In a new study, the team found that being a vegetarian makes both women and men less attractive as potential partners.
International AI Safety Report
Bengio, Yoshua, Mindermann, Sören, Privitera, Daniel, Besiroglu, Tamay, Bommasani, Rishi, Casper, Stephen, Choi, Yejin, Fox, Philip, Garfinkel, Ben, Goldfarb, Danielle, Heidari, Hoda, Ho, Anson, Kapoor, Sayash, Khalatbari, Leila, Longpre, Shayne, Manning, Sam, Mavroudis, Vasilios, Mazeika, Mantas, Michael, Julian, Newman, Jessica, Ng, Kwan Yee, Okolo, Chinasa T., Raji, Deborah, Sastry, Girish, Seger, Elizabeth, Skeadas, Theodora, South, Tobin, Strubell, Emma, Tramèr, Florian, Velasco, Lucia, Wheeler, Nicole, Acemoglu, Daron, Adekanmbi, Olubayo, Dalrymple, David, Dietterich, Thomas G., Felten, Edward W., Fung, Pascale, Gourinchas, Pierre-Olivier, Heintz, Fredrik, Hinton, Geoffrey, Jennings, Nick, Krause, Andreas, Leavy, Susan, Liang, Percy, Ludermir, Teresa, Marda, Vidushi, Margetts, Helen, McDermid, John, Munga, Jane, Narayanan, Arvind, Nelson, Alondra, Neppel, Clara, Oh, Alice, Ramchurn, Gopal, Russell, Stuart, Schaake, Marietje, Schölkopf, Bernhard, Song, Dawn, Soto, Alvaro, Tiedrich, Lee, Varoquaux, Gaël, Yao, Andrew, Zhang, Ya-Qin, Albalawi, Fahad, Alserkal, Marwan, Ajala, Olubunmi, Avrin, Guillaume, Busch, Christian, de Carvalho, André Carlos Ponce de Leon Ferreira, Fox, Bronwyn, Gill, Amandeep Singh, Hatip, Ahmet Halit, Heikkilä, Juha, Jolly, Gill, Katzir, Ziv, Kitano, Hiroaki, Krüger, Antonio, Johnson, Chris, Khan, Saif M., Lee, Kyoung Mu, Ligot, Dominic Vincent, Molchanovskyi, Oleksii, Monti, Andrea, Mwamanzi, Nusu, Nemer, Mona, Oliver, Nuria, Portillo, José Ramón López, Ravindran, Balaraman, Rivera, Raquel Pezoa, Riza, Hammam, Rugege, Crystal, Seoighe, Ciarán, Sheehan, Jerry, Sheikh, Haroon, Wong, Denise, Zeng, Yi
I am honoured to present the International AI Safety Report. It is the work of 96 international AI experts who collaborated in an unprecedented effort to establish an internationally shared scientific understanding of risks from advanced AI and methods for managing them. We embarked on this journey just over a year ago, shortly after the countries present at the Bletchley Park AI Safety Summit agreed to support the creation of this report. Since then, we published an Interim Report in May 2024, which was presented at the AI Seoul Summit. We are now pleased to publish the present, full report ahead of the AI Action Summit in Paris in February 2025. Since the Bletchley Summit, the capabilities of general-purpose AI, the type of AI this report focuses on, have increased further. For example, new models have shown markedly better performance at tests of Professor Yoshua Bengio programming and scientific reasoning.
HyperZero: A Customized End-to-End Auto-Tuning System for Recommendation with Hourly Feedback
Cai, Xufeng, Guan, Ziwei, Yuan, Lei, Aydin, Ali Selman, Xu, Tengyu, Liu, Boying, Ren, Wenbo, Xiang, Renkai, He, Songyi, Yang, Haichuan, Li, Serena, Gao, Mingze, Weng, Yue, Liu, Ji
Modern recommendation systems can be broadly divided into two key stages: the ranking stage, where the system predicts various user engagements (e.g., click-through rate, like rate, follow rate, watch time), and the value model stage, which aggregates these predictive scores through a function (e.g., a linear combination defined by a weight vector) to measure the value of each content by a single numerical score. Both stages play roughly equally important roles in real industrial systems; however, how to optimize the model weights for the second stage still lacks systematic study. This paper focuses on optimizing the second stage through auto-tuning technology. Although general auto-tuning systems and solutions - both from established production practices and open-source solutions - can address this problem, they typically require weeks or even months to identify a feasible solution. Such prolonged tuning processes are unacceptable in production environments for recommendation systems, as suboptimal value models can severely degrade user experience. An effective auto-tuning solution is required to identify a viable model within 2-3 days, rather than the extended timelines typically associated with existing approaches. In this paper, we introduce a practical auto-tuning system named HyperZero that addresses these time constraints while effectively solving the unique challenges inherent in modern recommendation systems. Moreover, this framework has the potential to be expanded to broader tuning tasks within recommendation systems.
Towards Recommender Systems LLMs Playground (RecSysLLMsP): Exploring Polarization and Engagement in Simulated Social Networks
Bojic, Ljubisa, Dodevska, Zorica, Deldjoo, Yashar, Pantelic, Nenad
Given the exponential advancement in AI technologies and the potential escalation of harmful effects from recommendation systems, it is crucial to simulate and evaluate these effects early on. Doing so can help prevent possible damage to both societies and technology companies. This paper introduces the Recommender Systems LLMs Playground (RecSysLLMsP), a novel simulation framework leveraging Large Language Models (LLMs) to explore the impacts of different content recommendation setups on user engagement and polarization in social networks. By creating diverse AI agents (AgentPrompts) with descriptive, static, and dynamic attributes, we assess their autonomous behaviour across three scenarios: Plurality, Balanced, and Similarity. Our findings reveal that the Similarity Scenario, which aligns content with user preferences, maximizes engagement while potentially fostering echo chambers. Conversely, the Plurality Scenario promotes diverse interactions but produces mixed engagement results. Our study emphasizes the need for a careful balance in recommender system designs to enhance user satisfaction while mitigating societal polarization. It underscores the unique value and challenges of incorporating LLMs into simulation environments. The benefits of RecSysLLMsP lie in its potential to calculate polarization effects, which is crucial for assessing societal impacts and determining user engagement levels with diverse recommender system setups. This advantage is essential for developing and maintaining a successful business model for social media companies. However, the study's limitations revolve around accurately emulating reality. Future efforts should validate the similarity in behaviour between real humans and AgentPrompts and establish metrics for measuring polarization scores.
From tools to thieves: Measuring and understanding public perceptions of AI through crowdsourced metaphors
Cheng, Myra, Lee, Angela Y., Rapuano, Kristina, Niederhoffer, Kate, Liebscher, Alex, Hancock, Jeffrey
How has the public responded to the increasing prevalence of artificial intelligence (AI)-based technologies? We investigate public perceptions of AI by collecting over 12,000 responses over 12 months from a nationally representative U.S. sample. Participants provided open-ended metaphors reflecting their mental models of AI, a methodology that overcomes the limitations of traditional self-reported measures. Using a mixed-methods approach combining quantitative clustering and qualitative coding, we identify 20 dominant metaphors shaping public understanding of AI. To analyze these metaphors systematically, we present a scalable framework integrating language modeling (LM)-based techniques to measure key dimensions of public perception: anthropomorphism (attribution of human-like qualities), warmth, and competence. We find that Americans generally view AI as warm and competent, and that over the past year, perceptions of AI's human-likeness and warmth have significantly increased ($+34\%, r = 0.80, p < 0.01; +41\%, r = 0.62, p < 0.05$). Furthermore, these implicit perceptions, along with the identified dominant metaphors, strongly predict trust in and willingness to adopt AI ($r^2 = 0.21, 0.18, p < 0.001$). We further explore how differences in metaphors and implicit perceptions--such as the higher propensity of women, older individuals, and people of color to anthropomorphize AI--shed light on demographic disparities in trust and adoption. In addition to our dataset and framework for tracking evolving public attitudes, we provide actionable insights on using metaphors for inclusive and responsible AI development.