AITopics | Performance Analysis


Performance Analysis


Confidence-Aware Routing for Large Language Model Reliability Enhancement: A Multi-Signal Approach to Pre-Generation Hallucination Mitigation

arXiv.org Artificial Intelligence, Oct-3-2025

Large Language Models suffer from hallucination, generating plausible yet factually incorrect content. Current mitigation strategies focus on post-generation correction, which is computationally expensive and fails to prevent unreliable content generation. We propose a confidence-aware routing system that proactively assesses model uncertainty before generation and redirects queries based on estimated reliability. Our approach combines three complementary signals: semantic alignment between internal representations and reference embeddings, internal convergence analysis across model layers, and learned confidence estimation. The unified confidence score determines routing to four pathways: local generation for high confidence, retrieval-augmented generation for medium confidence, larger models for low confidence, and human review for very low confidence. Evaluation on knowledge-intensive QA benchmarks demonstrates significant improvements in hallucination detection (0.74 vs. 0.42 baseline) while reducing computational costs by 40% compared to post-hoc methods. The F1 score improves from 0.61 to 0.82 with low false positive rates (0.09). This paradigm shift from reactive correction to proactive assessment offers a computationally efficient approach to LLM reliability enhancement.
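The routing step described in this abstract can be sketched as follows. This is a minimal illustration, assuming the three signals are already normalized to [0, 1]. The weights, the thresholds, and the names `Signals`, `unified_confidence`, and `route` are hypothetical; the abstract does not state how the signals are combined or where the cut-offs lie.

```python
from dataclasses import dataclass

# Hypothetical weights and thresholds; the abstract does not state them.
WEIGHTS = (0.4, 0.3, 0.3)
THRESHOLDS = (0.75, 0.50, 0.25)  # high, medium, low cut-offs


@dataclass
class Signals:
    semantic_alignment: float   # similarity to reference embeddings, in [0, 1]
    layer_convergence: float    # agreement across model layers, in [0, 1]
    learned_confidence: float   # output of a learned estimator, in [0, 1]


def unified_confidence(s: Signals) -> float:
    """Weighted average of the three signals (an assumed combination rule)."""
    values = (s.semantic_alignment, s.layer_convergence, s.learned_confidence)
    return sum(w * v for w, v in zip(WEIGHTS, values))


def route(score: float) -> str:
    """Map a confidence score to one of the four pathways."""
    high, medium, low = THRESHOLDS
    if score >= high:
        return "local_generation"
    if score >= medium:
        return "retrieval_augmented_generation"
    if score >= low:
        return "larger_model"
    return "human_review"
```

Because the score is computed before any text is generated, the expensive pathways (retrieval, a larger model, human review) are only invoked for queries that fall below the corresponding threshold.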

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2510.01237

Genre: Research Report (0.50)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.89)


Comparison of Machine Learning Models to Classify Documents on Digital Development

Ranaweera, Uvini, Mawitagama, Bawun, Liyanage, Sanduni, Keshan, Sandupa, de Silva, Tiloka, Hewawalpita, Supun

arXiv.org Artificial Intelligence, Oct-3-2025

Automated document classification is a trending topic in Natural Language Processing (NLP) due to the extensive growth in digital databases. However, a model that fits well for a specific classification task might perform weakly for another dataset due to differences in the context. Thus, training and evaluating several models is necessary to optimise the results. This study employs a publicly available document database on worldwide digital development interventions categorised under twelve areas. Since digital interventions are still emerging, utilising NLP in the field is relatively new. Given the exponential growth of digital interventions, this research has a vast scope for improving how digital-development-oriented organisations report their work. The paper examines the classification performance of Machine Learning (ML) algorithms, including Decision Trees, k-Nearest Neighbors, Support Vector Machine, AdaBoost, Stochastic Gradient Descent, Naive Bayes, and Logistic Regression. Accuracy, precision, recall and F1-score are utilised to evaluate the performance of these models, while oversampling is used to address the class-imbalanced nature of the dataset. Deviating from the traditional approach of fitting a single model for multiclass classification, this paper investigates the One vs Rest approach to build a combined model that optimises the performance. The study concludes that the amount of data is not the sole factor affecting the performance; features like similarity within classes and dissimilarity among classes are also crucial.
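Two of the steps mentioned in this abstract, oversampling for class imbalance and the One vs Rest relabelling, can be sketched in plain Python. This is an illustrative sketch only: the abstract does not say which oversampling method the authors used, so simple random duplication is assumed here, and the function names are hypothetical.

```python
import random
from collections import defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def one_vs_rest_labels(labels, positive_class):
    """Binary relabelling used to train one classifier per class."""
    return [1 if label == positive_class else 0 for label in labels]
```

Oversampling of this kind is normally applied to the training split only, so that duplicated documents do not leak into the evaluation data.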

artificial intelligence, classification, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-981-99-7969-1_5

2510.0072

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.96)
Government > Regional Government > North America Government > United States Government (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)


Multi-Domain Brain Vessel Segmentation Through Feature Disentanglement

Galati, Francesco, Falcetta, Daniele, Cortese, Rosa, Prados, Ferran, Burgos, Ninon, Zuluaga, Maria A.

arXiv.org Artificial Intelligence, Oct-3-2025

The intricate morphology of brain vessels poses significant challenges for automatic segmentation models, which usually focus on a single imaging modality. However, accurately treating brain-related conditions requires a comprehensive understanding of the cerebrovascular tree, regardless of the specific acquisition procedure. Our framework effectively segments brain arteries and veins in various datasets through image-to-image translation while avoiding domain-specific model design and data harmonization between the source and the target domain. This is accomplished by employing disentanglement techniques to independently manipulate different image properties, allowing them to move from one domain to another in a label-preserving manner. Specifically, we focus on manipulating vessel appearances during adaptation while preserving spatial information, such as shapes and locations, which are crucial for correct segmentation. Our evaluation effectively bridges large and varied domain gaps across medical centers, image modalities, and vessel types. Additionally, we conduct ablation studies on the optimal number of required annotations and other architectural choices. The results highlight our framework's robustness and versatility, demonstrating the potential of domain adaptation methodologies to perform cerebrovascular image segmentation in multiple scenarios accurately. Our code is available at https://github.com/i-vesseg/MultiVesSeg.

artificial intelligence, machine learning, segmentation, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.59275/j.melba.2025-4582

2510.00665

Country: Europe (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)


5781a2637b476d781eb3134581b32044-Supplemental.pdf

Neural Information Processing Systems, Oct-2-2025, 23:36:57 GMT

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Data Science > Data Mining (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)


Estimating weighted areas under the ROC curve

Neural Information Processing Systems, Oct-2-2025, 23:36:50 GMT

If $X$ is unambiguous we write $\hat{\mu} = \hat{\mu}(X)$. A summary of notation in tabular form is given in the supplement.

artificial intelligence, machine learning, probability, (18 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems, Oct-2-2025, 22:48:50 GMT

The running application in this paper is the important problem of recommending scientific articles to people based on previous rating/interaction data. CTPF draws mainly upon two recent models: collaborative topic regression (CTR) of Wang and Blei and Poisson factorization of Gopalan et al. Each document is represented by two latent vectors in K-dimensional topic space: \theta, based on the text of the document, and \epsilon, based on the document's readers. Each user is represented by a latent K-dimensional topic affinity vector, x. Observed word counts for each document are drawn from a Poisson centered on the product of \theta and the topic-word matrix, while the observed user-document ratings are drawn from a Poisson centered on x * (\theta + \epsilon), leading to a very elegant combination of text data and readership data. The authors present both batch and stochastic variational inference algorithms for approximating the posterior, and then experimental results showing state-of-the-art recall and precision @20 performance on two real-world data sets.
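The two Poisson rates described in this review can be written out as a small sketch. It assumes that "x * (\theta + \epsilon)" denotes an inner product over the K topics, and it uses `beta_word` as a hypothetical name for the column of the topic-word matrix belonging to one vocabulary word; neither detail is spelled out in the review text.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def word_rate(theta, beta_word):
    """Poisson rate for one word count: theta . beta_word."""
    return dot(theta, beta_word)


def rating_rate(x, theta, epsilon):
    """Poisson rate for one user-document rating: x . (theta + epsilon)."""
    return dot(x, [t + e for t, e in zip(theta, epsilon)])


# Toy example with K = 2 topics (all numbers are made up for illustration).
theta = [0.5, 1.0]     # document topics inferred from its text
epsilon = [0.2, 0.0]   # offset inferred from the document's readers
x = [1.0, 2.0]         # user topic affinities
print(word_rate(theta, [2.0, 1.0]))
print(rating_rate(x, theta, epsilon))
```

The offset \epsilon only enters the rating rate, so readership data can shift a document's position in topic space without changing how its words are explained.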

factorization, hyperparameter, poisson factorization, (13 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)


Online Neural Connectivity Estimation with Noisy Group Testing

Neural Information Processing Systems, Oct-2-2025, 22:48:04 GMT

Many previous approaches have attempted to estimate functional connectivity between neurons using statistical modeling of observational data, but these approaches rely heavily on parametric assumptions and are purely correlational. Recently, however, holographic photostimulation techniques have made it possible to precisely target selected ensembles of neurons, offering the possibility of establishing direct causal links.

artificial intelligence, machine learning, neuron, (17 more...)

Neural Information Processing Systems

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)


Optimizing F-Measures by Cost-Sensitive Classification

Shameem Puthiya Parambath, Nicolas Usunier, Yves Grandvalet

Neural Information Processing Systems, Oct-2-2025, 21:57:39 GMT

We present a theoretical analysis of F-measures for binary, multiclass and multilabel classification. These performance measures are non-linear, but in many scenarios they are pseudo-linear functions of the per-class false negative/false positive rate. Based on this observation, we present a general reduction of F-measure maximization to cost-sensitive classification with unknown costs. We then propose an algorithm with provable guarantees to obtain an approximately optimal classifier for the F-measure by solving a series of cost-sensitive classification problems. The strength of our analysis is to be valid on any dataset and any class of classifiers, extending the existing theoretical results on F-measures, which are asymptotic in nature. We present numerical experiments to illustrate the relative importance of cost asymmetry and thresholding when learning linear classifiers on various F-measure optimization tasks.
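The pseudo-linearity mentioned in this abstract can be illustrated for the binary case. Writing the F-measure in terms of the false negative count, the false positive count, and the number of positives, the condition "F is at least t" becomes a linear constraint on the two error counts. The sketch below is derived from the standard F-beta definition and is not taken from the paper; the function names are hypothetical, and it assumes at least one positive example.

```python
def f_beta(fn, fp, pos, beta=1.0):
    """F_beta in terms of false negatives, false positives, and positives (pos > 0)."""
    b2 = beta ** 2
    return (1 + b2) * (pos - fn) / ((1 + b2) * pos - fn + fp)


def weighted_cost(fn, fp, t, beta=1.0):
    """Cost-sensitive error with cost (1 + beta^2 - t) per FN and t per FP."""
    return (1 + beta ** 2 - t) * fn + t * fp


def reaches_level(fn, fp, pos, t, beta=1.0):
    """F_beta >= t rewritten as a linear constraint on (fn, fp)."""
    return weighted_cost(fn, fp, t, beta) <= (1 + beta ** 2) * (1 - t) * pos


# 10 positives, 2 missed, 3 false alarms.
print(f_beta(2, 3, 10))              # F1 computed from the error counts
print(reaches_level(2, 3, 10, 0.7))  # is F1 at least 0.7?
print(reaches_level(2, 3, 10, 0.8))  # is F1 at least 0.8?
```

For a fixed target level t, the classifier that minimizes the weighted cost is the best candidate for reaching that level, which is why searching over t turns F-measure maximization into a series of cost-sensitive problems.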

classification, classifier, probability, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Newton (0.04)
Europe > France > Hauts-de-France (0.04)
Europe > Bulgaria > Sofia City Province > Sofia (0.04)
Asia > Taiwan (0.04)

Genre: Research Report > New Finding (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)


Supplementary Material for Bootstrapping Neural Processes

Juho Lee, Yoonho Lee

Neural Information Processing Systems, Oct-2-2025, 20:27:08 GMT

We sampled 100 GP prior functions from zero mean and unit variance. After realizing them, the prior functions are used to optimize via Bayesian optimization. All the experiments are implemented with [8]. Same as Appendix B.1, except that all the models were trained for 200 [...] The other details are the same as in Appendix B.1. [Table: CE and sharpness on seen classes (0-9), unseen classes (10-46), and t-noise; first entry: CNP 0.448] We also measure the sharpness [10], which is essentially an average prediction variance.

artificial intelligence, experiment, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > South Korea (0.46)
Europe > United Kingdom > England (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.41)


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems, Oct-2-2025, 20:23:17 GMT

This paper considers the weighted majority algorithm and establishes consistency results (error rate of the aggregator tending to zero) under two settings: (1) when the competence level (risk of each expert) is known in advance and (2) when it is estimated. For case (2), frequentist and Bayesian methods for estimating the competence level are provided. For case (1), consistency is established by providing upper and lower bounds on the error rate of the aggregator, which involve standard calculations (apart from the fact that the upper bound is established by invoking a result by Kearns and Saul, instead of Hoeffding's inequality). For case (2) under the frequentist setting, an independent set of labeled inputs is used to estimate the competence level of each expert.
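A weighted majority vote of the kind discussed in this review can be sketched as follows. The review excerpt does not state the weighting rule, so the sketch assumes log-odds weights, a common choice when each expert's competence (probability of being correct) is known. The function names are hypothetical, and competences are assumed to lie strictly between 0 and 1.

```python
import math


def log_odds_weight(p):
    """Weight for an expert with competence p, assuming 0 < p < 1."""
    return math.log(p / (1 - p))


def weighted_majority(votes, competences):
    """Aggregate +1/-1 votes; ties are broken in favour of +1."""
    total = sum(log_odds_weight(p) * v for v, p in zip(votes, competences))
    return 1 if total >= 0 else -1


def estimate_competence(predictions, truths):
    """Frequentist estimate: fraction of correct labels on a held-out labeled set."""
    correct = sum(1 for y_hat, y in zip(predictions, truths) if y_hat == y)
    return correct / len(truths)


# One strong expert outvotes two weak ones that disagree with it.
print(weighted_majority([1, -1, -1], [0.9, 0.6, 0.6]))
```

In the estimated setting, an empirical competence of exactly 0 or 1 would make the log-odds weight undefined, so a practical implementation would need to clip or smooth the estimate before computing weights.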

inequality, kearn-saul inequality, probability, (12 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.05)

Genre:

Research Report (0.52)
Overview (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.39)
