 Performance Analysis


Confidence-Aware Routing for Large Language Model Reliability Enhancement: A Multi-Signal Approach to Pre-Generation Hallucination Mitigation

arXiv.org Artificial Intelligence

Large Language Models suffer from hallucination, generating plausible yet factually incorrect content. Current mitigation strategies focus on post-generation correction, which is computationally expensive and fails to prevent unreliable content generation. We propose a confidence-aware routing system that proactively assesses model uncertainty before generation and redirects queries based on estimated reliability. Our approach combines three complementary signals: semantic alignment between internal representations and reference embeddings, internal convergence analysis across model layers, and learned confidence estimation. The unified confidence score determines routing to four pathways: local generation for high confidence, retrieval-augmented generation for medium confidence, larger models for low confidence, and human review for very low confidence. Evaluation on knowledge-intensive QA benchmarks demonstrates significant improvements in hallucination detection (0.74 vs. 0.42 baseline) while reducing computational costs by 40% compared to post-hoc methods. The F1 score improves from 0.61 to 0.82 with low false positive rates (0.09). This paradigm shift from reactive correction to proactive assessment offers a computationally efficient approach to LLM reliability enhancement.
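The routing step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the three signals are combined or where the confidence cut-offs lie, so the weighted average, the weights, the thresholds, and all names below are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """Three pre-generation signals, each assumed to be scaled to [0, 1]."""
    semantic_alignment: float
    layer_convergence: float
    learned_confidence: float


# Hypothetical weights and cut-offs; the abstract does not report the real ones.
WEIGHTS = (0.4, 0.3, 0.3)
THRESHOLDS = (
    (0.8, "local_generation"),                 # high confidence
    (0.6, "retrieval_augmented_generation"),   # medium confidence
    (0.4, "larger_model"),                     # low confidence
)
FALLBACK = "human_review"                      # very low confidence


def unified_confidence(s: Signals) -> float:
    """Combine the signals with a weighted average (an assumed combination rule)."""
    w_sem, w_conv, w_learn = WEIGHTS
    return (w_sem * s.semantic_alignment
            + w_conv * s.layer_convergence
            + w_learn * s.learned_confidence)


def route(s: Signals) -> str:
    """Return the first pathway whose threshold the unified score reaches."""
    score = unified_confidence(s)
    for threshold, pathway in THRESHOLDS:
        if score >= threshold:
            return pathway
    return FALLBACK
```

For example, `route(Signals(0.7, 0.7, 0.7))` falls in the medium band and selects the retrieval-augmented pathway under these assumed thresholds.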


Comparison of Machine Learning Models to Classify Documents on Digital Development

arXiv.org Artificial Intelligence

Automated document classification is a trending topic in Natural Language Processing (NLP) due to the extensive growth in digital databases. However, a model that fits well for a specific classification task might perform weakly for another dataset due to differences in the context. Thus, training and evaluating several models is necessary to optimise the results. This study employs a publicly available document database on worldwide digital development interventions categorised under twelve areas. Since digital interventions are still emerging, utilising NLP in the field is relatively new. Given the exponential growth of digital interventions, this research has a vast scope for improving how digital-development-oriented organisations report their work. The paper examines the classification performance of Machine Learning (ML) algorithms, including Decision Trees, k-Nearest Neighbors, Support Vector Machine, AdaBoost, Stochastic Gradient Descent, Naive Bayes, and Logistic Regression. Accuracy, precision, recall and F1-score are utilised to evaluate the performance of these models, while oversampling is used to address the class-imbalanced nature of the dataset. Deviating from the traditional approach of fitting a single model for multiclass classification, this paper investigates the One vs Rest approach to build a combined model that optimises the performance. The study concludes that the amount of data is not the sole factor affecting the performance; features like similarity within classes and dissimilarity among classes are also crucial.
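The combination of oversampling and One vs Rest mentioned in the abstract can be illustrated with a small, dependency-free sketch. It assumes random duplication as the oversampling scheme and uses a nearest-centroid scorer as a stand-in binary model; the paper evaluates other algorithms (Decision Trees, SVM, and so on), and every function name here is hypothetical.

```python
import random


def oversample(X, y, seed=0):
    """Randomly duplicate minority-class rows until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def centroid(rows):
    """Column-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fit_one_vs_rest(X, y):
    """Fit one binary model per class: (class centroid, centroid of all other classes)."""
    models = {}
    for label in set(y):
        pos = [row for row, lab in zip(X, y) if lab == label]
        neg = [row for row, lab in zip(X, y) if lab != label]
        models[label] = (centroid(pos), centroid(neg))
    return models


def predict(models, x):
    """Pick the class whose binary model scores x highest (closer to class than to rest)."""
    def score(label):
        pos_c, neg_c = models[label]
        return sq_dist(x, neg_c) - sq_dist(x, pos_c)
    return max(models, key=score)
```

Each class gets its own binary decision, so a different model family could be chosen per class, which is the flexibility the One vs Rest approach offers over a single multiclass model.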


Multi-Domain Brain Vessel Segmentation Through Feature Disentanglement

arXiv.org Artificial Intelligence

The intricate morphology of brain vessels poses significant challenges for automatic segmentation models, which usually focus on a single imaging modality. However, accurately treating brain-related conditions requires a comprehensive understanding of the cerebrovascular tree, regardless of the specific acquisition procedure. Our framework effectively segments brain arteries and veins in various datasets through image-to-image translation while avoiding domain-specific model design and data harmonization between the source and the target domain. This is accomplished by employing disentanglement techniques to independently manipulate different image properties, allowing them to move from one domain to another in a label-preserving manner. Specifically, we focus on manipulating vessel appearances during adaptation while preserving spatial information, such as shapes and locations, which are crucial for correct segmentation. Our evaluation effectively bridges large and varied domain gaps across medical centers, image modalities, and vessel types. Additionally, we conduct ablation studies on the optimal number of required annotations and other architectural choices. The results highlight our framework's robustness and versatility, demonstrating the potential of domain adaptation methodologies to perform cerebrovascular image segmentation in multiple scenarios accurately. Our code is available at https://github.com/i-vesseg/MultiVesSeg.



Estimating weighted areas under the ROC curve

Neural Information Processing Systems

If X is unambiguous we write \hat{\mu} = \hat{\mu}(X). A summary of notation in tabular form is given in the supplement.


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

The running application in this paper is the important problem of recommending scientific articles to people based on previous rating/interaction data. CTPF draws mainly upon two recent models: collaborative topic regression (CTR) of Wang and Blei and Poisson factorization of Gopalan et al. Each document is represented by two latent vectors in K-dimensional topic space: \theta, based on the text of the document, and \epsilon, based on the document's readers. Each user is represented by a latent K-dimensional topic affinity vector, x. Observed word counts for each document are drawn from a Poisson centered on the product of \theta and the topic-word matrix, while the observed user-document ratings are drawn from a Poisson centered on x * (\theta + \epsilon), leading to a very elegant combination of text data and readership data. The authors present both batch and stochastic variational inference algorithms for approximating the posterior, followed by experimental results showing state-of-the-art recall and precision @20 performance on two real-world data sets.
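The two Poisson rates described in this review can be written out directly. The sketch below only computes the rates and draws counts from them; it assumes the topic-word matrix is stored as K rows of V entries (called `beta` here, a name not used in the review), reads x * (\theta + \epsilon) as an inner product, and uses Knuth's multiplication method as a simple stand-in Poisson sampler. It does not cover the priors or the variational inference.

```python
import math
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def word_rates(theta, beta):
    """Poisson rate for each vocabulary word: theta times the K x V topic-word matrix."""
    return [dot(theta, column) for column in zip(*beta)]


def rating_rate(x, theta, epsilon):
    """Poisson rate for one user-document rating: x . (theta + epsilon)."""
    return dot(x, [t + e for t, e in zip(theta, epsilon)])


def sample_poisson(rate, rng):
    """Draw one Poisson count with Knuth's method (adequate for small rates)."""
    limit = math.exp(-rate)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


if __name__ == "__main__":
    rng = random.Random(0)
    theta, epsilon = [1.0, 0.5], [0.0, 0.5]      # document vectors (illustrative values)
    x = [0.5, 2.0]                               # user topic affinities
    beta = [[2.0, 0.0, 1.0], [0.0, 4.0, 2.0]]    # K=2 topics, V=3 words
    print([sample_poisson(r, rng) for r in word_rates(theta, beta)])
    print(sample_poisson(rating_rate(x, theta, epsilon), rng))
```

Because \epsilon enters only the rating rate, a document with no readers is still scored through \theta alone, which is how the text and readership data combine.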


Online Neural Connectivity Estimation with Noisy Group Testing

Neural Information Processing Systems

Many previous approaches have attempted to estimate functional connectivity between neurons using statistical modeling of observational data, but these approaches rely heavily on parametric assumptions and are purely correlational. Recently, however, holographic photostimulation techniques have made it possible to precisely target selected ensembles of neurons, offering the possibility of establishing direct causal links.


Optimizing F-Measures by Cost-Sensitive Classification

Neural Information Processing Systems

We present a theoretical analysis of F-measures for binary, multiclass and multilabel classification. These performance measures are non-linear, but in many scenarios they are pseudo-linear functions of the per-class false negative/false positive rate. Based on this observation, we present a general reduction of F-measure maximization to cost-sensitive classification with unknown costs. We then propose an algorithm with provable guarantees to obtain an approximately optimal classifier for the F-measure by solving a series of cost-sensitive classification problems. The strength of our analysis is to be valid on any dataset and any class of classifiers, extending the existing theoretical results on F-measures, which are asymptotic in nature. We present numerical experiments to illustrate the relative importance of cost asymmetry and thresholding when learning linear classifiers on various F-measure optimization tasks.
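The reduction in the abstract, solving a series of cost-sensitive problems and keeping the classifier with the best F-measure, can be sketched for the binary case. This is a simplified illustration rather than the paper's algorithm: it assumes probability estimates are already available and uses a plug-in decision rule as the cost-sensitive "solver", whereas the paper trains a classifier for each cost setting. The grid size and function names are hypothetical.

```python
def f_beta(y_true, y_pred, beta=1.0):
    """F-beta from counts: (1+b^2)TP / ((1+b^2)TP + b^2 FN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = (1 + beta ** 2) * tp + beta ** 2 * fn + fp
    return (1 + beta ** 2) * tp / denom if denom else 0.0


def cost_sensitive_predict(probs, cost_fn, cost_fp):
    """Predict positive when the expected cost of a miss exceeds that of a false alarm."""
    return [1 if p * cost_fn > (1 - p) * cost_fp else 0 for p in probs]


def best_cost_for_f(y_true, probs, grid=19, beta=1.0):
    """Sweep a grid of cost asymmetries and keep the one with the highest F-beta."""
    best_score, best_cost_fp, best_preds = -1.0, None, None
    for i in range(1, grid + 1):
        cost_fp = i / (grid + 1)      # costs normalised so cost_fp + cost_fn = 1
        cost_fn = 1.0 - cost_fp
        preds = cost_sensitive_predict(probs, cost_fn, cost_fp)
        score = f_beta(y_true, preds, beta)
        if score > best_score:
            best_score, best_cost_fp, best_preds = score, cost_fp, preds
    return best_score, best_cost_fp, best_preds
```

With normalised costs the plug-in rule reduces to thresholding the probability at `cost_fp`, so in this simplified setting the cost sweep and a threshold sweep coincide; the paper's experiments compare the two when the classifier is retrained per cost.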


Supplementary Material for Bootstrapping Neural Processes Juho Lee 1,2, Yoonho Lee

Neural Information Processing Systems

We sampled 100 GP prior functions with zero mean and unit variance. After realizing them, the prior functions are optimized via Bayesian optimization. All the experiments are implemented with [8]. Same as Appendix B.1, except that all the models were trained for 200 The other details are the same as in Appendix B.1. [Table residue: CE and sharpness for seen classes (0-9), unseen classes (10-46), and t-noise; CNP 0.448] We also measure the sharpness [10], which is essentially an average prediction variance.


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper considers the weighted majority algorithm and establishes consistency (error rate of the aggregator tending to zero) under two settings: (1) when the competence level (risk of each expert) is known in advance and (2) when it is estimated. For case (2), frequentist and Bayesian methods for estimating the competence level are provided. For case (1), consistency is established by providing upper and lower bounds on the error rate of the aggregator, which involve standard calculations (apart from the fact that the upper bound is established by invoking a result by Kearns and Saul instead of Hoeffding's inequality). For case (2) under the frequentist setting, an independent set of labeled inputs is used to estimate the competence level of each expert.
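A weighted majority vote over experts with known or estimated competence can be sketched as below. The review does not state the weighting, so this assumes the classical log-odds weights for independent experts and binary votes in {+1, -1}; the additive smoothing in the frequentist estimate is also an assumption, added only to keep estimates away from 0 and 1. All function names are hypothetical.

```python
import math


def log_odds_weight(competence):
    """Weight for an expert who is correct with probability `competence` (0 < p < 1)."""
    return math.log(competence / (1.0 - competence))


def weighted_majority(votes, competences):
    """Aggregate +1/-1 votes; ties are broken in favour of +1."""
    total = sum(log_odds_weight(p) * v for v, p in zip(votes, competences))
    return 1 if total >= 0 else -1


def estimate_competence(expert_votes, labels, smoothing=1.0):
    """Frequentist estimate from an independent labeled set, with additive smoothing."""
    correct = sum(1 for v, y in zip(expert_votes, labels) if v == y)
    return (correct + smoothing) / (len(labels) + 2.0 * smoothing)
```

For instance, with votes `[+1, -1, -1]` and competences `[0.9, 0.6, 0.6]`, the single strong expert outweighs the two weak ones and the aggregate is +1, whereas an unweighted majority would return -1.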