Goto

Collaborating Authors

 self-confidence


Go ahead and swear--it's good for your health

Popular Science

Health Psychology Mental Health Go ahead and swear--it's good for your health Cursing can boost your workout, mood, and even confidence. Breakthroughs, discoveries, and DIY tips sent every weekday. Yelling a properly timed swear word isn't only emotionally satisfying--it may have real physical and psychological benefits . In fact, a well-voiced expletive might even help take you to the next level during a particularly strenuous workout. "In many situations, people hold themselves back--consciously or unconsciously--from using their full strength," explained Richard Stephens, a psychologist at Keele University in the United Kingdom.


Beyond Awareness: Investigating How AI and Psychological Factors Shape Human Self-Confidence Calibration

Cau, Federico Maria, Spano, Lucio Davide

arXiv.org Artificial Intelligence

Human-AI collaboration outcomes depend strongly on human self-confidence calibration, which drives reliance or resistance toward AI's suggestions. This work presents two studies examining whether calibration of self-confidence before decision tasks, low versus high levels of Need for Cognition (NFC), and Actively Open-Minded Thinking (AOT), leads to differences in decision accuracy, self-confidence appropriateness during the tasks, and metacognitive perceptions (global and affective). The first study presents strategies to identify well-calibrated users, also comparing decision accuracy and the appropriateness of self-confidence across NFC and AOT levels. The second study investigates the effects of calibrated self-confidence in AI-assisted decision-making (no AI, two-stage AI, and personalized AI), also considering different NFC and AOT levels. Our results show the importance of human self-confidence calibration and psychological traits when designing AI-assisted decision systems. We further propose design recommendations to address the challenge of calibrating self-confidence and supporting tailored, user-centric AI that accounts for individual traits.


Taming Object Hallucinations with Verified Atomic Confidence Estimation

Liu, Jiarui, Xuan, Weihao, Jin, Zhijing, Diab, Mona

arXiv.org Artificial Intelligence

Multimodal Large Language Models (MLLMs) often suffer from hallucinations, particularly errors in object existence, attributes, or relations, which undermine their reliability. We introduce TACO (Verified Atomic Confidence Estimation), a simple framework that mitigates hallucinations through self-verification and confidence calibration without relying on external vision experts. TACO decomposes responses into atomic queries, paraphrases them to reduce sensitivity to wording, and estimates confidence using self-consistency (black-box) or self-confidence (gray-box) aggregation, before refining answers with a language model. Experiments on five benchmarks (POPE, MME, HallusionBench, AMBER, and MM-Hal Bench) with two MLLMs (\texttt{LLaVA-1.5-7B} and \texttt{CogVLM2}) show that TACO consistently outperforms direct prompting and Visual Contrastive Decoding, reduces systematic biases, and improves confidence calibration, demonstrating its effectiveness in enhancing the faithfulness of MLLMs.


Modeling Psychological Profiles in Volleyball via Mixed-Type Bayesian Networks

Iannario, Maria, Lee, Dae-Jin, Leonelli, Manuele

arXiv.org Artificial Intelligence

Psychological attributes rarely operate in isolation: coaches reason about networks of related traits. We analyze a new dataset of 164 female volleyball players from Italy's C and D leagues that combines standardized psychological profiling with background information. To learn directed relationships among mixed-type variables (ordinal questionnaire scores, categorical demographics, continuous indicators), we introduce latent MMHC, a hybrid structure learner that couples a latent Gaussian copula and a constraint-based skeleton with a constrained score-based refinement to return a single DAG. We also study a bootstrap-aggregated variant for stability. In simulations spanning sample size, sparsity, and dimension, latent Max-Min Hill-Climbing (MMHC) attains lower structural Hamming distance and higher edge recall than recent copula-based learners while maintaining high specificity. Applied to volleyball, the learned network organizes mental skills around goal setting and self-confidence, with emotional arousal linking motivation and anxiety, and locates Big-Five traits (notably neuroticism and extraversion) upstream of skill clusters. Scenario analyses quantify how improvements in specific skills propagate through the network to shift preparation, confidence, and self-esteem. The approach provides an interpretable, data-driven framework for profiling psychological traits in sport and for decision support in athlete development.


Catch Me if You Search: When Contextual Web Search Results Affect the Detection of Hallucinations

Nahar, Mahjabin, Lee, Eun-Ju, Park, Jin Won, Lee, Dongwon

arXiv.org Artificial Intelligence

While we increasingly rely on large language models (LLMs) for various tasks, these models are known to produce inaccurate content or 'hallucinations' with potentially disastrous consequences. The recent integration of web search results into LLMs prompts the question of whether people utilize them to verify the generated content, thereby accurately detecting hallucinations. An online experiment (N=560) investigated how the provision of search results, either static (i.e., fixed search results provided by LLM) or dynamic (i.e., participant-led searches), affects participants' perceived accuracy of LLM-generated content (i.e., genuine, minor hallucination, major hallucination), self-confidence in accuracy ratings, as well as their overall evaluation of the LLM, as compared to the control condition (i.e., no search results). Results showed that participants in both static and dynamic conditions (vs. control) rated hallucinated content to be less accurate and perceived the LLM more negatively. However, those in the dynamic condition rated genuine content as more accurate and demonstrated greater overall self-confidence in their assessments than those in the static search or control conditions. We highlighted practical implications of incorporating web search functionality into LLMs in real-world contexts.


Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models

Luxembourg, Omer, Permuter, Haim, Nachmani, Eliya

arXiv.org Artificial Intelligence

Masked diffusion language models (MDLMs) promise fast, non-autoregressive text generation, yet existing samplers, which pick tokens to unmask based on model confidence, ignore interactions when unmasking multiple positions in parallel and effectively reduce to slow, autoregressive behavior. We propose the Dilated Unmasking Scheduler (DUS), an inference-only, planner-model-free method that partitions sequence positions into non-adjacent dilated groups and unmasked them in parallel so as to minimize an upper bound on joint entropy gain at each denoising step. By explicitly trading off the number of network calls against generation quality, DUS recovers most of the performance lost under traditional parallel unmasking strategies. Across math (GSM8K, MA TH500), code (HumanEval, MBPP) and general-knowledge benchmarks (BBH, MMLU-Pro), DUS outperforms confidence-based planners, without modifying the underlying denoiser, and reveals the true speed-quality frontier of MDLMs.


Maximizing Prefix-Confidence at Test-Time Efficiently Improves Mathematical Reasoning

Otth, Matthias, Hübotter, Jonas, Hakimi, Ido, Krause, Andreas

arXiv.org Artificial Intelligence

Recent work has shown that language models can self-improve by maximizing their own confidence in their predictions, without relying on external verifiers or reward signals. In this work, we study the test-time scaling of language models for mathematical reasoning tasks, where the model's own confidence is used to select the most promising attempts. Surprisingly, we find that we can achieve significant performance gains by continuing only the most promising attempt, selected by the model's prefix-confidence. We systematically evaluate prefix-confidence scaling on five mathematical reasoning datasets: the school-level GSM8K and MATH500, and the competition-level AMC23, AIME24, and AIME25. We find that prefix-confidence scaling with prefixes of only 32 tokens achieves a better accuracy-compute trade-off than majority voting. Moreover, prefix-confidence scaling appears less susceptible than BoN to length biases. Finally, we also evaluate test-time training with prefix-confidence and find that, while outperforming the base model, it does not improve over prefix-confidence scaling.


When to Trust Context: Self-Reflective Debates for Context Reliability

Zhou, Zeqi, Wu, Fang, Talaei, Shayan, Zhao, Haokai, Meixin, Cheng, Xu, Tinson, Saberi, Amin, Choi, Yejin

arXiv.org Artificial Intelligence

Large language models frequently encounter conflicts between their parametric knowledge and contextual input, often resulting in factual inconsistencies or hallucinations. We propose Self-Reflective Debate for Contextual Reliability (SR-DCR), a lightweight framework that integrates token-level self-confidence with an asymmetric multi-agent debate to adjudicate such conflicts. A critic, deprived of context, challenges a defender who argues from the given passage; a judge model evaluates the debate and determines the context's reliability. The final answer is selected by combining the verdict with model confidence. Experiments on the ClashEval benchmark demonstrate that SR-DCR consistently enhances robustness to misleading context while maintaining accuracy on trustworthy inputs, outperforming both classical debate and confidence-only baselines with minimal computational overhead. The code is available at https://github.com/smiles724/Self-Reflective-Debates.


Unlocking Learning Potentials: The Transformative Effect of Generative AI in Education Across Grade Levels

Xie, Meijuan, Luo, Liling

arXiv.org Artificial Intelligence

The advent of generative artificial intelligence (GAI) has brought about a notable surge in the field of education. The use of GAI to support learning is becoming increasingly prevalent among students. However, the manner and extent of its utilisation vary considerably from one individual to another. And researches about student's utilisation and perceptions of GAI remains relatively scarce. To gain insight into the issue, this paper proposed a hybrid-survey method to examine the impact of GAI on students across four different grades in six key areas (LIPSAL): learning interest, independent learning, problem solving, self-confidence, appropriate use, and learning enjoyment. Firstly, through questionnaire, we found that among LIPSAL, GAI has the greatest impact on the concept of appropriate use, the lowest level of learning interest and self-confidence. Secondly, a comparison of four grades revealed that the high and low factors of LIPSAL exhibited grade-related variation, and college students exhibited a higher level than high school students across LIPSAL. Thirdly, through interview, the students demonstrated a comprehensive understanding of the application of GAI. We found that students have a positive attitude towards GAI and are very willing to use it, which is why GAI has grown so rapidly in popularity. They also told us prospects and challenges in using GAI. In the future, as GAI matures technologically, it will have an greater impact on students. These findings may help better understand usage by different students and inform future research in digital education.


The Impact and Feasibility of Self-Confidence Shaping for AI-Assisted Decision-Making

Takayanagi, Takehiro, Hashimoto, Ryuji, Chen, Chung-Chi, Izumi, Kiyoshi

arXiv.org Artificial Intelligence

In AI-assisted decision-making, it is crucial but challenging for humans to appropriately rely on AI, especially in high-stakes domains such as finance and healthcare. This paper addresses this problem from a human-centered perspective by presenting an intervention for self-confidence shaping, designed to calibrate self-confidence at a targeted level. We first demonstrate the impact of self-confidence shaping by quantifying the upper-bound improvement in human-AI team performance. Our behavioral experiments with 121 participants show that self-confidence shaping can improve human-AI team performance by nearly 50% by mitigating both over- and under-reliance on AI. We then introduce a self-confidence prediction task to identify when our intervention is needed. Our results show that simple machine-learning models achieve 67% accuracy in predicting self-confidence. We further illustrate the feasibility of such interventions. The observed relationship between sentiment and self-confidence suggests that modifying sentiment could be a viable strategy for shaping self-confidence. Finally, we outline future research directions to support the deployment of self-confidence shaping in a real-world scenario for effective human-AI collaboration.