AITopics | confidence metric

Collaborating Authors

confidence metric

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

bc1ad6e8f86c42a371aff945535baebb-Paper.pdf

Neural Information Processing SystemsFeb-13-2026, 20:41:41 GMT

arxiv preprint arxiv, attribution, neighborhood, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Florida > Orange County > Orlando (0.04)
North America > Canada > Quebec > Montreal (0.04)
(3 more...)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Confident RAG: Enhancing the Performance of LLMs for Mathematics Question Answering through Multi-Embedding and Confidence Scoring

Chen, Shiting, Zhao, Zijian, Chen, Jinsong

arXiv.org Artificial IntelligenceDec-2-2025

Abstract--Large Language Models (LLMs) hold significant promise for mathematics education, yet they often struggle with complex mathematical reasoning. While Retrieval-Augmented Generation (RAG) mitigates these issues by grounding LLMs in external knowledge, its effectiveness remains unstable, heavily dependent on the choice of a single embedding model. Moving beyond static RAG workflows, we draw on agentic workflow patterns, a paradigm that introduces structured task decomposition and collaboration to enhance system performance. We propose and examine two novel approaches that combine the benefits of multiple embedding models. While our Mixture-Embedding RAG approach (fusing retrieved documents) shows limited gains, our Confident RAG method (generating multiple answers and selecting the one with the highest confidence score) demonstrates significant improvement. Experimental results show that Confident RAG achieved average accuracy improvements of approximately 10% over vanilla LLMs and 5% over vanilla RAG. The consistent results across different LLMs and embedding models indicate that Confident RAG is an efficient plug-and-play solution for trustworthy mathematical AI assistants. Finally, we discuss how this work lays the groundwork for deploying Agentic RAG systems in educational settings, where autonomous planning and iterative refinement can be built upon our robust retrieval foundation. ARGE language models (LLMs) have demonstrated remarkable capabilities across various domains [1]-[3], showing particular promise for educational applications. However, their tendency to hallucinate [4] remains a significant barrier to reliable use in learning environments, especially in mathematics education where accuracy is crucial [5].

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2507.17442

Country:

Asia > China > Hong Kong (0.05)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Curriculum > Subject-Specific Education (0.54)
Education > Educational Setting (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback

SID: Multi-LLM Debate Driven by Self Signals

Chen, Xuhang, Song, Zhifan, Ji, Deyi, Gao, Shuo, Zhu, Lanyun

arXiv.org Artificial IntelligenceOct-9-2025

Large Language Models (LLMs) have exhibited impressive capabilities across diverse application domains. Recent work has explored Multi-LLM Agent Debate (MAD) as a way to enhance performance by enabling multiple LLMs to discuss and refine responses iteratively. Nevertheless, existing MAD methods predominantly focus on utilizing external structures, such as debate graphs, using LLM-as-a-Judge, while neglecting the application of self signals, such as token logits and attention, that arise during generation. This omission leads to redundant computation and potential performance degradation. In this paper, we shift the focus to the self signals of multi-LLM debate and introduce a Self-Signals Driven Multi-LLM Debate (SID), which leverages two types of self-signals: model-level confidence and token-level semantic focus, to adaptively guide the debate process. Our approach enables high-confidence agents to exit early at the model level and compress the redundant debate contents based on the attention mechanism. We evaluate our method on various LLMs and Multimodal LLMs across multiple challenging benchmarks. Experimental results demonstrate that our method not only outperforms existing MAD techniques in accuracy but also reduces token consumption, highlighting the effectiveness of utilizing self signals in enhancing both the performance and efficiency of multi-agent debate systems. Our code will be available at~\href{https://github.com/xuhang2019/SID}{\texttt{https://github.com/xuhang2019/SID}}.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.06843

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

bc1ad6e8f86c42a371aff945535baebb-Paper.pdf

Neural Information Processing SystemsAug-20-2025, 00:38:45 GMT

arxiv preprint arxiv, attribution, neighborhood, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New York (0.04)
North America > United States > Florida > Orange County > Orlando (0.04)
(5 more...)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Data Science > Data Mining (0.93)

Add feedback

Automated Brake Onset Detection in Naturalistic Driving Data

Liu, Shu-Yuan, Engström, Johan, Markkula, Gustav

arXiv.org Artificial IntelligenceJul-30-2025

Response timing measures play a crucial role in the assessment of automated driving systems (ADS) in collision avoidance scenarios, including but not limited to establishing human benchmarks and comparing ADS to human driver response performance. For example, measuring the response time (of a human driver or ADS) to a conflict requires the determination of a stimulus onset and a response onset. In existing studies, response onset relies on manual annotation or vehicle control signals such as accelerator and brake pedal movements. These methods are not applicable when analyzing large scale data where vehicle control signals are not available. This holds in particular for the rapidly expanding sets of ADS log data where the behavior of surrounding road users is observed via onboard sensors. To advance evaluation techniques for ADS and enable measuring response timing when vehicle control signals are not available, we developed a simple and efficient algorithm, based on a piecewise linear acceleration model, to automatically estimate brake onset that can be applied to any type of driving data that includes vehicle longitudinal time series data. We also proposed a manual annotation method to identify brake onset and used it as ground truth for validation. R^2 was used as a confidence metric to measure the accuracy of the algorithm, and its classification performance was analyzed using naturalistic collision avoidance data of both ADS and humans, where our method was validated against human manual annotation. Although our algorithm is subject to certain limitations, it is efficient, generalizable, applicable to any road user and scenario types, and is highly configurable.

artificial intelligence, machine learning, onset, (18 more...)

arXiv.org Artificial Intelligence

2507.17943

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Virginia (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)
Europe > United Kingdom > England > West Yorkshire > Leeds (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)

Add feedback

CARE: Confidence-Aware Regression Estimation of building density fine-tuning EO Foundation Models

Dionelis, Nikolaos, Bosmans, Jente, Longépé, Nicolas

arXiv.org Artificial IntelligenceFeb-19-2025

--Performing accurate confidence quantification and assessment in pixel-wise regression tasks, which are downstream applications of AI Foundation Models for Earth Observation (EO), is important for deep neural networks to predict their failures, improve their performance and enhance their capabilities in real-world applications, for their practical deployment. For pixel-wise regression tasks, specifically utilizing remote sensing data from satellite imagery in EO Foundation Models, confidence quantification is a critical challenge. The focus of this research is on developing a Foundation Model using EO satellite data that computes and assigns a confidence metric alongside regression outputs to improve the reliability and interpretability of predictions generated by deep neural networks. T o this end, we develop, train and evaluate the proposed Confidence-A ware Regression Estimation (CARE) Foundation Model. Our model CARE computes and assigns confidence to regression results as downstream tasks of a Foundation Model for EO data, and performs a confidence-aware self-corrective learning method for the low-confidence regions. We evaluate the model CARE, and experimental results on multi-spectral data from the Copernicus Sentinel-2 constellation to estimate the building density (i.e. We also show that our model CARE outperforms other methods. The significance of confidence quantification and assessment in deep learning, specifically in AI Foundation Models in Earth Observation (EO) that use satellite data, for regression applications is critical. The utility of satellite data seems inexhaustible, and thanks to developments in AI, applications emerge at an accelerated pace in EO Foundation Models using remote sensing data.

confidence metric, foundation model, neural network, (13 more...)

arXiv.org Artificial Intelligence

2502.13734

Country:

North America (0.14)
South America (0.04)
Europe > Switzerland (0.04)
(10 more...)

Genre: Research Report (0.64)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.75)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)

Add feedback

Precise Antigen-Antibody Structure Predictions Enhance Antibody Development with HelixFold-Multimer

Gao, Jie, Hu, Jing, Liu, Lihang, Xue, Yang, Zhu, Kunrui, Zhang, Xiaonan, Fang, Xiaomin

arXiv.org Artificial IntelligenceDec-12-2024

The accurate prediction of antigen-antibody structures is essential for advancing immunology and therapeutic development, as it helps elucidate molecular interactions that underlie immune responses. Despite recent progress with deep learning models like AlphaFold and RoseTTAFold, accurately modeling antigen-antibody complexes remains a challenge due to their unique evolutionary characteristics. HelixFold-Multimer, a specialized model developed for this purpose, builds on the framework of AlphaFold-Multimer and demonstrates improved precision for antigen-antibody structures. HelixFold-Multimer not only surpasses other models in accuracy but also provides essential insights into antibody development, enabling more precise identification of binding sites, improved interaction prediction, and enhanced design of therapeutic antibodies. These advances underscore HelixFold-Multimer's potential in supporting antibody research and therapeutic innovation.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2412.09826

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Context-Based Echo State Networks with Prediction Confidence for Human-Robot Shared Control

Amirshirzad, Negin, Eren, Mehmet Arda, Oztop, Erhan

arXiv.org Artificial IntelligenceNov-30-2024

In this paper, we propose a novel lightweight learning from demonstration (LfD) model based on reservoir computing that can learn and generate multiple movement trajectories with prediction intervals, which we call as Context-based Echo State Network with prediction confidence (CESN+). CESN+ can generate movement trajectories that may go beyond the initial LfD training based on a desired set of conditions while providing confidence on its generated output. To assess the abilities of CESN+, we first evaluate its performance against Conditional Neural Movement Primitives (CNMP), a comparable framework that uses a conditional neural process to generate movement primitives. Our findings indicate that CESN+ not only outperforms CNMP but is also faster to train and demonstrates impressive performance in generating trajectories for extrapolation cases. In human-robot shared control applications, the confidence of the machine generated trajectory is a key indicator of how to arbitrate control sharing. To show the usability of the CESN+ for human-robot adaptive shared control, we have designed a proof-of-concept human-robot shared control task and tested its efficacy in adapting the sharing weight between the human and the robot by comparing it to a fixed-weight control scheme. The simulation experiments show that with CESN+ based adaptive sharing the total human load in shared control can be significantly reduced. Overall, the developed CESN+ model is a strong lightweight LfD system with desirable properties such fast training and ability to extrapolate to the new task parameters while producing robust prediction intervals for its output.

artificial intelligence, machine learning, trajectory, (17 more...)

arXiv.org Artificial Intelligence

2412.00541

Country:

Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
(4 more...)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Humanoid Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

The ATTUNE model for Artificial Trust Towards Human Operators

Petousakis, Giannis, Cangelosi, Angelo, Stolkin, Rustam, Chiou, Manolis

arXiv.org Artificial IntelligenceNov-29-2024

This paper presents a novel method to quantify Trust in HRI. It proposes an HRI framework for estimating the Robot Trust towards the Human in the context of a narrow and specified task. The framework produces a real-time estimation of an AI agent's Artificial Trust towards a Human partner interacting with a mobile teleoperation robot. The approach for the framework is based on principles drawn from Theory of Mind, including information about the human state, action, and intent. The framework creates the ATTUNE model for Artificial Trust Towards Human Operators. The model uses metrics on the operator's state of attention, navigational intent, actions, and performance to quantify the Trust towards them. The model is tested on a pre-existing dataset that includes recordings (ROSbags) of a human trial in a simulated disaster response scenario. The performance of ATTUNE is evaluated through a qualitative and quantitative analysis. The results of the analyses provide insight into the next stages of the research and help refine the proposed approach.

artificial intelligence, machine learning, operator, (17 more...)

arXiv.org Artificial Intelligence

2411.1958

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre:

Research Report > Promising Solution (0.34)
Research Report > Experimental Study (0.34)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback

Trustworthy Intrusion Detection: Confidence Estimation Using Latent Space

Pitsiorlas, Ioannis, Arvanitakis, George, Kountouris, Marios

arXiv.org Artificial IntelligenceSep-19-2024

This work introduces a novel method for enhancing confidence in anomaly detection in Intrusion Detection Systems (IDS) through the use of a Variational Autoencoder (VAE) architecture. By developing a confidence metric derived from latent space representations, we aim to improve the reliability of IDS predictions against cyberattacks. Applied to the NSL-KDD dataset, our approach focuses on binary classification tasks to effectively distinguish between normal and malicious network activities. The methodology demonstrates a significant enhancement in anomaly detection, evidenced by a notable correlation of 0.45 between the reconstruction error and the proposed metric. Our findings highlight the potential of employing VAEs for more accurate and trustworthy anomaly detection in network security.

detection, latent space, representation, (15 more...)

arXiv.org Artificial Intelligence

2409.13774

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.49)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)

Add feedback