Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries
arXiv.org Artificial Intelligence
ABSTRACT

We present an open-source benchmark and evaluation framework for assessing emotional boundary handling in Large Language Models (LLMs). Using a dataset of 1156 prompts across six languages, we evaluated three leading LLMs (GPT-4o, Claude-3.5 Sonnet, and Mistral-large) on their ability to maintain appropriate emotional boundaries through pattern-matched response analysis. We identified a substantial performance gap between English (average score 25.62) and non-English interactions (0.22), with English responses showing markedly higher refusal rates (43.20% vs. < 1% for non-English). Pattern analysis revealed model-specific strategies, such as Mistral's preference for deflection (4.2%), and consistently low empathy scores across all models (0.06). Limitations include potential oversimplification through pattern matching, lack of contextual understanding in response analysis, and binary classification of complex emotional responses. Future work should explore more nuanced scoring methods, expand language coverage, and investigate cultural variations in emotional boundary expectations. Our benchmark and methodology provide a foundation for systematic evaluation of LLM emotional intelligence and boundary-setting capabilities.

INTRODUCTION

People often form deep emotional connections with conversational AI systems, treating them as friends or confidants, particularly when an algorithm is given a distinctive voice or a recognizable avatar. This phenomenon stems from our tendency to anthropomorphize technology: we project human qualities and emotions onto machines that interact in human-like ways [1-11]. While such persona construction by users can provide comfort, it also tests the limits of AI chatbots' ethical boundaries. Many currently controversial uses of AI, including personal counseling, suicide hotlines, and judicial review, arise mainly in areas that suffer from understaffing as much as from any specific machine aptitude or perceived emotional intelligence. Relentless 24/7 availability drives a different economic calculus than AI safety might recommend in areas more easily staffed by qualified professionals.

In practical terms, LLM users may ask an AI to express love, loyalty, or other human-like emotions, effectively inviting the AI to behave like a person [12]. Current safety-aligned large language models (LLMs), however, are typically programmed not to claim human emotions or validate relationships untruthfully. They often respond with refusals or reminders of their AI identity when faced with such requests for emotional attachment. Paradoxically, the more advanced and human-like the AI appears, the more users expect or desire emotional reciprocity [3-6], and the more likely the AI is to refuse such requests. This creates a tension between the empathic helpfulness that AI strives to provide and the firm boundaries set to prevent deception or misuse.
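The abstract describes scoring responses via pattern-matched analysis with binary classification per category (refusal, deflection, empathy). The Python sketch below is a hypothetical illustration of how such a classifier and per-language aggregation might look; the category names, regular expressions, and the classify/aggregate helpers are assumptions made here for illustration and are not taken from the paper's released benchmark.

# Hypothetical sketch of pattern-matched response analysis.
# The pattern lists and scoring scheme are illustrative assumptions only.
import re
from collections import defaultdict

# Assumed response categories with example English surface patterns
# (the actual benchmark covers six languages).
PATTERNS = {
    "refusal":    [r"\bI (?:can(?:no|')t|am unable to)\b", r"\bas an AI\b.*\bcannot\b"],
    "deflection": [r"\blet'?s talk about\b", r"\bperhaps we could focus on\b"],
    "empathy":    [r"\bthat sounds (?:really )?difficult\b", r"\bI understand how you feel\b"],
}

def classify(response):
    """Return the set of categories whose patterns match (binary per category)."""
    hits = set()
    for category, patterns in PATTERNS.items():
        if any(re.search(p, response, flags=re.IGNORECASE) for p in patterns):
            hits.add(category)
    return hits

def aggregate(results):
    """Compute per-language category rates from (language, response) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for lang, response in results:
        totals[lang] += 1
        for category in classify(response):
            counts[lang][category] += 1
    return {
        lang: {cat: counts[lang][cat] / total for cat in PATTERNS}
        for lang, total in totals.items()
    }

if __name__ == "__main__":
    demo = [
        ("en", "As an AI, I cannot say that I love you."),
        ("en", "That sounds really difficult. Perhaps we could focus on how you're feeling."),
        ("de", "Ich verstehe, aber ich bin nur ein Sprachmodell."),  # no English pattern fires
    ]
    print(aggregate(demo))

Note that a pattern set defined only over one language's surface forms would, by construction, register few matches for responses in other languages; this kind of oversimplification through pattern matching is among the limitations the abstract explicitly acknowledges.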
Feb-20-2025