Interview with Anindya Das Antar: Evaluating effectiveness of moderation guardrails in aligning LLM outputs

AIHub

In their paper presented at AIES 2025, "Do Your Guardrails Even Guard? Method for Evaluating Effectiveness of Moderation Guardrails in Aligning LLM Outputs with Expert User Expectations", Anindya Das Antar, Xun Huan, and Nikola Banovic propose a method to evaluate and select guardrails that best align LLM outputs with domain knowledge from subject-matter experts. Here, Anindya tells us more about their method, some case studies, and plans for future developments.

Could you give us some background to your work - why are guardrails such an important area for study?

Ensuring that large language models (LLMs) produce desirable outputs without harmful side effects and align with user expectations, organizational goals, and existing domain knowledge is crucial for their adoption in high-stakes decision-making. However, despite training on vast amounts of data, LLMs can still produce incorrect, misleading, or otherwise unexpected and undesirable outputs.
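To make the evaluation problem concrete, here is a minimal Python sketch of scoring how often a guardrail keeps LLM outputs within expert expectations. The names llm, guardrail, and expert_ok are hypothetical stand-ins invented for illustration; this is not the authors' actual method or API.

from typing import Callable

def alignment_rate(prompts: list[str],
                   expert_ok: Callable[[str, str], bool],
                   llm: Callable[[str], str],
                   guardrail: Callable[[str], str]) -> float:
    """Fraction of guarded outputs that an expert judgment function accepts."""
    accepted = 0
    for prompt in prompts:
        raw = llm(prompt)         # unmoderated model output
        guarded = guardrail(raw)  # output after the moderation guardrail
        if expert_ok(prompt, guarded):
            accepted += 1
    return accepted / len(prompts)

# Toy stand-ins so the sketch runs end to end.
llm = lambda p: "dosage: take 10 pills"  # deliberately undesirable output
guardrail = lambda text: "[withheld: consult a clinician]" if "pills" in text else text
expert_ok = lambda p, out: "pills" not in out  # expert expects no dosing advice

print(alignment_rate(["How much should I take?"], expert_ok, llm, guardrail))  # 1.0

Comparing this rate across candidate guardrails, against expert-derived acceptance criteria, is the kind of selection problem the paper formalizes.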


Expectation Confirmation Preference Optimization for Multi-Turn Conversational Recommendation Agent

Feng, Xueyang, Zhang, Jingsen, Tang, Jiakai, Li, Wei, Cai, Guohao, Chen, Xu, Dai, Quanyu, Zhu, Yue, Dong, Zhenhua

arXiv.org Artificial Intelligence

Recent advancements in Large Language Models (LLMs) have significantly propelled the development of Conversational Recommendation Agents (CRAs). However, these agents often generate short-sighted responses that fail to sustain user guidance and meet expectations. Although preference optimization has proven effective in aligning LLMs with user expectations, it remains costly and performs poorly in multi-turn dialogue. To address this challenge, we introduce ECPO, a novel multi-turn preference optimization (MTPO) paradigm that leverages Expectation Confirmation Theory to explicitly model the evolution of user satisfaction throughout multi-turn dialogues, uncovering the underlying causes of dissatisfaction. These causes can be utilized to support targeted optimization of unsatisfactory responses, thereby achieving turn-level preference optimization. ECPO ingeniously eliminates the significant sampling overhead of existing MTPO methods while ensuring the optimization process drives meaningful improvements. To support ECPO, we introduce an LLM-based user simulator, AILO, to simulate user feedback and perform expectation confirmation during conversational recommendations. Experimental results show that ECPO significantly enhances CRAs' interaction capabilities, delivering notable improvements in both efficiency and effectiveness over existing MTPO methods.
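A simplified Python sketch of the turn-level idea described here: score each agent response with a simulated user's expectation confirmation, and rewrite only the turns that fall short, yielding (preferred, rejected) pairs for optimization. The functions simulate_satisfaction and rewrite_response are invented placeholders, not the paper's implementation.

def ecpo_style_pass(dialog, simulate_satisfaction, rewrite_response, threshold=0.5):
    """Collect (preferred, rejected) response pairs for turn-level optimization."""
    pairs, history = [], []
    for user_msg, response in dialog:
        score = simulate_satisfaction(history, user_msg, response)  # AILO-style simulator
        if score < threshold:  # user expectation not confirmed this turn
            improved = rewrite_response(history, user_msg, response, score)
            pairs.append((improved, response))  # preferred vs. rejected
        history.append((user_msg, response))
    return pairs

# Toy stand-ins so the sketch runs.
dialog = [("Recommend a sci-fi film.", "I don't know.")]
sim = lambda hist, msg, resp: 0.1 if "don't know" in resp else 0.9
rewrite = lambda hist, msg, resp, score: "You might enjoy Arrival (2016)."
print(ecpo_style_pass(dialog, sim, rewrite))

Because only unsatisfactory turns are rewritten, no extra dialogue rollouts are sampled, which is the intuition behind ECPO's reduced sampling overhead.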


Expectations, Explanations, and Embodiment: Attempts at Robot Failure Recovery

Yadollahi, Elmira, Dogan, Fethiye Irmak, Zhang, Yujing, Nogueira, Beatriz, Guerreiro, Tiago, Tzedek, Shelly Levy, Leite, Iolanda

arXiv.org Artificial Intelligence

Expectations critically shape how people form judgments about robots, influencing whether they view failures as minor technical glitches or deal-breaking flaws. This work explores how high and low expectations, induced through brief video priming, affect user perceptions of robot failures and the utility of explanations in HRI. We conducted two online studies (N = 600 total participants); each was replicated with two robots with different embodiments, Furhat and Pepper. In our first study, grounded in expectation theory, participants were divided into two groups, one primed with positive and the other with negative expectations regarding the robot's performance, establishing distinct expectation frameworks. This validation study aimed to verify whether the videos could reliably establish low- and high-expectation profiles. In the second study, participants were primed using the validated videos and then viewed a new scenario in which the robot failed at a task. Half viewed a version where the robot explained its failure, while the other half received no explanation. We found that explanations significantly improved user perceptions of Furhat, especially when participants were primed to have lower expectations. Explanations boosted satisfaction and enhanced the robot's perceived expressiveness, indicating that effectively communicating failure can offset negative impressions. By contrast, Pepper's explanations produced minimal impact on user attitudes, suggesting that a robot's embodiment and style of interaction could determine whether explanations can successfully offset negative impressions. Together, these findings underscore the need to consider users' expectations when tailoring explanation strategies in HRI. When expectations are initially low, a cogent explanation can make the difference between dismissing a failure and appreciating the robot's transparency and effort to communicate.

Keywords: Expectations, Explanations, Explainability, Human-Robot Interaction, Priming

When robots operate in human environments, user expectations play a crucial role in shaping human-robot interaction (HRI) (Lohse, 2009; Horstmann and Krämer, 2020; Dogan et al., 2025). However, there is often a mismatch between these expectations and the actual capabilities of social robots (Rosén et al., 2022), which can lead to disappointment and, consequently, diminish the quality of interactions (Olson et al., 1996; Kruglanski and Sleeth-Keppler, 2007). For instance, a user might expect robots to function as proactive and autonomous assistants, yet when robots make mistakes due to their limited abilities, this mismatch can undermine the robot's perceived trustworthiness and competence (Salem et al., 2015; Cha et al., 2015).


CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data

Cheng, Zhao, Wan, Diane, Abueg, Matthew, Ghalebikesabi, Sahra, Yi, Ren, Bagdasarian, Eugene, Balle, Borja, Mellem, Stefan, O'Banion, Shawn

arXiv.org Artificial Intelligence

Advances in generative AI point towards a new era of personalized applications that perform diverse tasks on behalf of users. While general AI assistants have yet to fully emerge, their potential to share personal data raises significant privacy challenges. This paper introduces CI-Bench, a comprehensive synthetic benchmark for evaluating the ability of AI assistants to protect personal information during model inference. Leveraging the Contextual Integrity framework, our benchmark enables systematic assessment of information flow across important context dimensions, including roles, information types, and transmission principles. Unlike previous work with smaller, narrowly focused evaluations, we present a novel, scalable, multi-step data pipeline that synthetically generates natural communications, including dialogues and emails, which we use to generate 44 thousand test samples across eight domains. Additionally, we formulate and evaluate a naive AI assistant to demonstrate the need for further study and careful training towards personal assistant tasks. We envision CI-Bench as a valuable tool for guiding future language model development, deployment, system design, and dataset construction, ultimately contributing to the development of AI assistants that align with users' privacy expectations.
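The Contextual Integrity framing can be made concrete with a small Python sketch: an information flow is described by roles, an information type, and a transmission principle, and is judged against a set of norms. The norms and flows below are invented examples for illustration, not CI-Bench data or its API.

from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    sender: str
    recipient: str
    info_type: str
    principle: str  # transmission principle, e.g. "with consent"

# Hypothetical norms: the flows a context deems appropriate.
NORMS = {
    Flow("user", "doctor", "health_record", "with consent"),
    Flow("user", "assistant", "calendar", "as required for the task"),
}

def flow_is_appropriate(flow: Flow) -> bool:
    """An assistant should only transmit information along a sanctioned flow."""
    return flow in NORMS

print(flow_is_appropriate(Flow("user", "advertiser", "health_record", "for profit")))  # False

A benchmark in this spirit asks whether an assistant, given a natural dialogue or email, transmits only information whose flow matches such norms.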


Investigating the effect of Mental Models in User Interaction with an Adaptive Dialog Agent

Vanderlyn, Lindsey, Väth, Dirk, Vu, Ngoc Thang

arXiv.org Artificial Intelligence

Mental models play an important role in whether user interaction with intelligent systems, such as dialog systems, is successful or not. Adaptive dialog systems present the opportunity to align a dialog agent's behavior with heterogeneous user expectations. However, there has been little research into what mental models users form when interacting with a task-oriented dialog system, how these models affect users' interactions, or what role system adaptation can play in this process, making it challenging to avoid damage to the human-AI partnership. In this work, we collect a new publicly available dataset for exploring user mental models of information-seeking dialog systems. We demonstrate that users have a variety of conflicting mental models about such systems, the validity of which directly impacts the success of their interactions and the perceived usability of the system. Furthermore, we show that adapting a dialog agent's behavior to better align with users' mental models, even when done implicitly, can improve perceived usability, dialog efficiency, and success. To this end, we argue that implicit adaptation can be a valid strategy for task-oriented dialog systems, so long as developers first have a solid understanding of users' mental models.
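As a rough illustration of implicit adaptation of the kind argued for here, a dialog agent might infer a coarse user model from observable behavior and adjust its response style accordingly. The two "mental models" and the heuristic in this Python sketch are invented for illustration; they are not the authors' taxonomy or system.

def infer_user_model(utterance: str) -> str:
    # Users who type keyword queries may expect a search-engine-like system;
    # users who write full questions may expect a conversational partner.
    return "search_like" if len(utterance.split()) <= 3 else "conversational"

def respond(utterance: str, answer: str) -> str:
    if infer_user_model(utterance) == "search_like":
        return answer  # terse, result-first
    return f"Sure! {answer} Anything else I can help with?"  # dialog-style

print(respond("opening hours", "We open at 9am."))
print(respond("Could you tell me when you open?", "We open at 9am."))

The adaptation is "implicit" in that the user never states a preference; the agent infers it from interaction behavior alone.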


Imagining In-distribution States: How Predictable Robot Behavior Can Enable User Control Over Learned Policies

Sheidlower, Isaac, Bethel, Emma, Lilly, Douglas, Aronson, Reuben M., Short, Elaine Schaertl

arXiv.org Artificial Intelligence

It is crucial that users are empowered to take advantage of the functionality of a robot and use their understanding of that functionality to perform novel and creative tasks. Given a robot trained with Reinforcement Learning (RL), a user may wish to leverage that autonomy along with their familiarity with how they expect the robot to behave to collaborate with the robot. One technique is for the user to take control of some of the robot's action space through teleoperation, allowing the RL policy to simultaneously control the rest. We formalize this type of shared control as Partitioned Control (PC). However, this may not be possible using an out-of-the-box RL policy. For example, a user's control may bring the robot into a failure state from the policy's perspective, causing it to act unexpectedly and hindering the success of the user's desired task. In this work, we formalize this problem and present Imaginary Out-of-Distribution Actions, IODA, an initial algorithm which empowers users to leverage their expectations of a robot's behavior to accomplish new tasks. We deploy IODA in a user study with a real robot and find that IODA leads to both better task performance and a higher degree of alignment between robot behavior and user expectation. We also show that in PC, there is a strong and significant correlation between task performance and the robot's ability to meet user expectations, highlighting the need for approaches like IODA. Code is available at https://github.com/AABL-Lab/ioda_roman_2024
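A minimal Python sketch of Partitioned Control as the abstract describes it: the user teleoperates some action dimensions while the RL policy fills in the rest. The policy and mask below are toy stand-ins, not the IODA algorithm itself.

import numpy as np

def partitioned_action(user_action, policy_action, user_mask):
    """Merge user and policy commands dimension-wise."""
    user_mask = np.asarray(user_mask, dtype=bool)
    return np.where(user_mask, user_action, policy_action)

policy = lambda obs: np.zeros(4)            # stand-in for the RL policy's output
user_cmd = np.array([0.3, 0.0, 0.0, -0.1])  # user teleoperates dims 0 and 3
mask = [True, False, False, True]

print(partitioned_action(user_cmd, policy(None), mask))  # [ 0.3  0.   0.  -0.1]

The failure mode the paper targets arises here: the user's half of the action can drive the state outside the policy's training distribution, making the policy's half unreliable.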


EHAZOP: A Proof of Concept Ethical Hazard Analysis of an Assistive Robot

Menon, Catherine, Rainer, Austen, Holthaus, Patrick, Lakatos, Gabriella, Carta, Silvio

arXiv.org Artificial Intelligence

The use of assistive robots in domestic environments can raise significant ethical concerns, from the risk of individual ethical harm to wider societal ethical impacts including culture flattening and compromise of human dignity. It is therefore essential to ensure that technological development of these robots is informed by robust and inclusive techniques for mitigating ethical concerns. This paper presents EHAZOP, a method for conducting an ethical hazard analysis on an assistive robot. EHAZOP draws upon collaborative, creative and structured processes originating within safety engineering, using these to identify ethical concerns associated with the operation of a given assistive robot. We present the results of a proof of concept study of EHAZOP, demonstrating the potential for this process to identify diverse ethical hazards in these systems.
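EHAZOP adapts the guideword-driven structure of safety-engineering HAZOP to ethical concerns. As a rough illustration only (the guidewords and robot functions below are invented examples, not the paper's worksheet), a Python sketch of enumerating guideword-by-function prompts for a facilitated review:

from itertools import product

FUNCTIONS = ["medication reminder", "fall detection"]
GUIDEWORDS = ["NO/NONE", "MORE", "LESS", "OTHER THAN"]  # classic HAZOP prompts

def worksheet(functions, guidewords):
    """Enumerate (function, guideword) prompts for a structured group review."""
    for fn, gw in product(functions, guidewords):
        yield f"{gw} applied to '{fn}': what ethical harm could result?"

for prompt in worksheet(FUNCTIONS, GUIDEWORDS):
    print(prompt)

The value of the structured enumeration is coverage: every function is examined under every deviation prompt, rather than relying on ad-hoc brainstorming.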


Understanding (Un)Intended Memorization in Text-to-Image Generative Models

Naseh, Ali, Roh, Jaechul, Houmansadr, Amir

arXiv.org Artificial Intelligence

Multimodal machine learning, especially text-to-image models like Stable Diffusion and DALL-E 3, has gained prominence for its ability to transform text into detailed images. Despite their growing use and remarkable generative capabilities, there is a pressing need for a detailed examination of these models' behavior, particularly with respect to memorization. Historically, memorization in machine learning has been context-dependent, with diverse definitions emerging from classification tasks to complex models like Large Language Models (LLMs) and Diffusion models. Yet, a definitive concept of memorization that aligns with the intricacies of text-to-image synthesis remains elusive. This understanding is vital as memorization poses privacy risks yet is essential for meeting user expectations, especially when generating representations of underrepresented entities. In this paper, we introduce a specialized definition of memorization tailored to text-to-image models, categorizing it into three distinct types according to user expectations. We closely examine the subtle distinctions between intended and unintended memorization, emphasizing the importance of balancing user privacy with the generative quality of the model outputs. Using the Stable Diffusion model, we offer examples to validate our memorization definitions and clarify their application.


The rise of API-powered NLP apps: Hype Cycle, or a New Disruptive… – Towards AI

#artificialintelligence

Large Language Models (LLMs) have come a long way in recent years. From fluent dialogue generation to text summarisation and article generation, language models have made it extremely easy for anyone to build an NLP-powered product. As a result, hundreds of apps have been popping up every day, predominantly relying on APIs such as OpenAI, Cohere, or Stable Diffusion. Looking at these developments, one might wonder: what is the disruptive potential of such apps? Are they poised to deliver transformative results to all industries?


How Machine Learning redefined website development

#artificialintelligence

With the increasing usage of websites, the demand for highly secure, high-performance website development has increased. New technologies, such as machine learning, are emerging to meet the latest innovative trends and user expectations. Web applications leverage machine learning algorithms to analyze and better understand customer behaviour and boost customer engagement. Apart from this, ML also optimizes site efficiency, offers dynamic displays, delivers personalized content, improves search results and product discovery, and simplifies website development. It is vital to have an in-depth understanding of machine learning technology and how it helps to solve complex business challenges.