Goto

Collaborating Authors

 overreliance


How to use ChatGPT at university without cheating: 'Now it's more like a study partner'

The Guardian

Educators advise that AI should be used to assist, not replace, learning. Educators advise that AI should be used to assist, not replace, learning. How to use ChatGPT at university without cheating: 'Now it's more like a study partner' The ubiquitous AI tool has a divisive effect on educators with some seeing it a boon and others a menace. So what should you know about where to draw the line between check and cheat? For many students, ChatGPT has become as standard a tool as a notebook or a calculator.


Measuring and mitigating overreliance is necessary for building human-compatible AI

Ibrahim, Lujain, Collins, Katherine M., Kim, Sunnie S. Y., Reuel, Anka, Lamparth, Max, Feng, Kevin, Ahmad, Lama, Soni, Prajna, Kattan, Alia El, Stein, Merlin, Swaroop, Siddharth, Sucholutsky, Ilia, Strait, Andrew, Liao, Q. Vera, Bhatt, Umang

arXiv.org Artificial Intelligence

Large language models (LLMs) distinguish themselves from previous technologies by functioning as collaborative "thought partners," capable of engaging more fluidly in natural language. As LLMs increasingly influence consequential decisions across diverse domains from healthcare to personal advice, the risk of overreliance - relying on LLMs beyond their capabilities - grows. This position paper argues that measuring and mitigating overreliance must become central to LLM research and deployment. First, we consolidate risks from overreliance at both the individual and societal levels, including high-stakes errors, governance challenges, and cognitive deskilling. Then, we explore LLM characteristics, system design features, and user cognitive biases that - together - raise serious and unique concerns about overreliance in practice. We also examine historical approaches for measuring overreliance, identifying three important gaps and proposing three promising directions to improve measurement. Finally, we propose mitigation strategies that the AI research community can pursue to ensure LLMs augment rather than undermine human capabilities.


Students' Reliance on AI in Higher Education: Identifying Contributing Factors

Pitts, Griffin, Rani, Neha, Mildort, Weedguet, Cook, Eva-Marie

arXiv.org Artificial Intelligence

The increasing availability and use of artificial intelligence (AI) tools in educational settings has raised concerns about students' overreliance on these technologies. Overreliance occurs when individuals accept incorrect AI-generated recommendations, often without critical evaluation, leading to flawed problem solutions and undermining learning outcomes. This study investigates potential factors contributing to patterns of AI reliance among undergraduate students, examining not only overreliance but also appropriate reliance (correctly accepting helpful and rejecting harmful recommendations) and underreliance (incorrectly rejecting helpful recommendations). Our approach combined pre- and post-surveys with a controlled experimental task where participants solved programming problems with an AI assistant that provided both accurate and deliberately incorrect suggestions, allowing direct observation of students' reliance patterns when faced with varying AI reliability. We find that appropriate reliance is significantly related to students' programming self-efficacy, programming literacy, and need for cognition, while showing negative correlations with post-task trust and satisfaction. Overreliance showed significant correlations with post-task trust and satisfaction with the AI assistant. Underreliance was negatively correlated with programming literacy, programming self-efficacy, and need for cognition. Overall, the findings provide insights for developing targeted interventions that promote appropriate reliance on AI tools, with implications for the integration of AI in curriculum and educational technologies.


Student Perspectives on the Benefits and Risks of AI in Education

Pitts, Griffin, Marcus, Viktoria, Motamedi, Sanaz

arXiv.org Artificial Intelligence

The use of chatbots equipped with artificial intelligence (AI) in educational settings has increased in recent years, showing potential to support teaching and learning. However, the adoption of these technologies has raised concerns about their impact on academic integrity, students' ability to problem-solve independently, and potential underlying biases. To better understand students' perspectives and experiences with these tools, a survey was conducted at a large public university in the United States. Through thematic analysis, 262 undergraduate students' responses regarding their perceived benefits and risks of AI chatbots in education were identified and categorized into themes. The results discuss several benefits identified by the students, with feedback and study support, instruction capabilities, and access to information being the most cited. Their primary concerns included risks to academic integrity, accuracy of information, loss of critical thinking skills, the potential development of overreliance, and ethical considerations such as data privacy, system bias, environmental impact, and preservation of human elements in education. While student perceptions align with previously discussed benefits and risks of AI in education, they show heightened concerns about distinguishing between human and AI generated work - particularly in cases where authentic work is flagged as AI-generated. To address students' concerns, institutions can establish clear policies regarding AI use and develop curriculum around AI literacy. With these in place, practitioners can effectively develop and implement educational systems that leverage AI's potential in areas such as immediate feedback and personalized learning support. This approach can enhance the quality of students' educational experiences while preserving the integrity of the learning process with AI.


From Prompt Engineering to Prompt Science with Humans in the Loop

Communications of the ACM

In recent years, as the sophistication and capabilities of large language models (LLMs) have grown, so have the tasks for which they're applicable, going beyond information extraction and synthesis15 to include analysis, content creation, and reasoning.8 Unsurprisingly, many researchers find them useful for research tasks, such as identifying relevant papers,19 synthesizing literature reviews,3 writing proposals,11 and analyzing data.31 They have also been found effective for investigative tasks, such as drug discovery.35 There is growing concern, however, that a large portion of this success hinges on prompt engineering, which is often an ad-hoc method to revise prompts being fed into an LLM to achieve a desired response or analysis.24 LLMs are increasingly being used in scientific research, but their application often involves ad-hoc decisions that can impact research quality.


Don't Just Translate, Agitate: Using Large Language Models as Devil's Advocates for AI Explanations

Suh, Ashley, Alperin, Kenneth, Li, Harry, Gomez, Steven R

arXiv.org Artificial Intelligence

This position paper highlights a growing trend in Explainable AI (XAI) research where Large Language Models (LLMs) are used to translate outputs from explainability techniques, like feature-attribution weights, into a natural language explanation. While this approach may improve accessibility or readability for users, recent findings suggest that translating into human-like explanations does not necessarily enhance user understanding and may instead lead to overreliance on AI systems. When LLMs summarize XAI outputs without surfacing model limitations, uncertainties, or inconsistencies, they risk reinforcing the illusion of interpretability rather than fostering meaningful transparency. We argue that - instead of merely translating XAI outputs - LLMs should serve as constructive agitators, or devil's advocates, whose role is to actively interrogate AI explanations by presenting alternative interpretations, potential biases, training data limitations, and cases where the model's reasoning may break down. In this role, LLMs can facilitate users in engaging critically with AI systems and generated explanations, with the potential to reduce overreliance caused by misinterpreted or specious explanations.


From Text to Trust: Empowering AI-assisted Decision Making with Adaptive LLM-powered Analysis

Li, Zhuoyan, Zhu, Hangxiao, Lu, Zhuoran, Xiao, Ziang, Yin, Ming

arXiv.org Artificial Intelligence

AI-assisted decision making becomes increasingly prevalent, yet individuals often fail to utilize AI-based decision aids appropriately especially when the AI explanations are absent, potentially as they do not %understand reflect on AI's decision recommendations critically. Large language models (LLMs), with their exceptional conversational and analytical capabilities, present great opportunities to enhance AI-assisted decision making in the absence of AI explanations by providing natural-language-based analysis of AI's decision recommendation, e.g., how each feature of a decision making task might contribute to the AI recommendation. In this paper, via a randomized experiment, we first show that presenting LLM-powered analysis of each task feature, either sequentially or concurrently, does not significantly improve people's AI-assisted decision performance. To enable decision makers to better leverage LLM-powered analysis, we then propose an algorithmic framework to characterize the effects of LLM-powered analysis on human decisions and dynamically decide which analysis to present. Our evaluation with human subjects shows that this approach effectively improves decision makers' appropriate reliance on AI in AI-assisted decision making.


Contrastive Explanations That Anticipate Human Misconceptions Can Improve Human Decision-Making Skills

Buçinca, Zana, Swaroop, Siddharth, Paluch, Amanda E., Doshi-Velez, Finale, Gajos, Krzysztof Z.

arXiv.org Artificial Intelligence

People's decision-making abilities often fail to improve or may even erode when they rely on AI for decision-support, even when the AI provides informative explanations. We argue this is partly because people intuitively seek contrastive explanations, which clarify the difference between the AI's decision and their own reasoning, while most AI systems offer "unilateral" explanations that justify the AI's decision but do not account for users' thinking. To align human-AI knowledge on decision tasks, we introduce a framework for generating human-centered contrastive explanations that explain the difference between AI's choice and a predicted, likely human choice about the same task. Results from a large-scale experiment (N = 628) demonstrate that contrastive explanations significantly enhance users' independent decision-making skills compared to unilateral explanations, without sacrificing decision accuracy. Amid rising deskilling concerns, our research demonstrates that incorporating human reasoning into AI design can foster human skill development.


Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks

Ibrahim, Lujain, Huang, Saffron, Ahmad, Lama, Anderljung, Markus

arXiv.org Artificial Intelligence

Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations (e.g., common benchmarks) of AI models do not. Instead, they incorporate human factors in limited ways, assessing the safety of models in isolation, thereby falling short of capturing the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on the assessment of human-model interactions or the process and the outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, assess direct human impact and interaction-specific harms, and guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over costs, replicability, and unrepresentativeness of HIEs.


"I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust

Kim, Sunnie S. Y., Liao, Q. Vera, Vorvoreanu, Mihaela, Ballard, Stephanie, Vaughan, Jennifer Wortman

arXiv.org Artificial Intelligence

Widely deployed large language models (LLMs) can produce convincing yet incorrect outputs, potentially misleading users who may rely on them as if they were correct. To reduce such overreliance, there have been calls for LLMs to communicate their uncertainty to end users. However, there has been little empirical work examining how users perceive and act upon LLMs' expressions of uncertainty. We explore this question through a large-scale, pre-registered, human-subject experiment (N=404) in which participants answer medical questions with or without access to responses from a fictional LLM-infused search engine. Using both behavioral and self-reported measures, we examine how different natural language expressions of uncertainty impact participants' reliance, trust, and overall task performance. We find that first-person expressions (e.g., "I'm not sure, but...") decrease participants' confidence in the system and tendency to agree with the system's answers, while increasing participants' accuracy. An exploratory analysis suggests that this increase can be attributed to reduced (but not fully eliminated) overreliance on incorrect answers. While we observe similar effects for uncertainty expressed from a general perspective (e.g., "It's not clear, but..."), these effects are weaker and not statistically significant. Our findings suggest that using natural language expressions of uncertainty may be an effective approach for reducing overreliance on LLMs, but that the precise language used matters. This highlights the importance of user testing before deploying LLMs at scale.