AITopics

Industry: Health & Medicine > Therapeutic Area (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Neural Information Processing Systems, Feb-14-2026, 09:23:31 GMT

d54e99a6c03704e95e6965532dec148b-Paper.pdf

disparity, intervention, tpr, (14 more...)

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California (0.04)
North America > Canada (0.04)
(4 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.94)

Industry:

Government (1.00)
Health & Medicine (0.93)
Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining (0.69)

Neural Information Processing Systems, Feb-7-2026, 16:14:53 GMT

mhealth_ood_neurips_2021.pdf

interface, tnr, tpr95, (15 more...)

Industry: Health & Medicine > Therapeutic Area (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

arXiv.org Artificial Intelligence, Sep-23-2025

Variation in Verification: Understanding Verification Dynamics in Large Language Models

Zhou, Yefan, Xu, Austin, Zhou, Yilun, Singh, Janvijay, Gui, Jiang, Joty, Shafiq

Recent advances have shown that scaling test-time computation enables large language models (LLMs) to solve increasingly complex problems across diverse domains. One effective paradigm for test-time scaling (TTS) involves LLM generators producing multiple solution candidates, with LLM verifiers assessing the correctness of these candidates without reference answers. In this paper, we study generative verifiers, which perform verification by generating chain-of-thought (CoT) reasoning followed by a binary verdict. We systematically analyze verification dynamics across three dimensions (problem difficulty, generator capability, and verifier generation capability) with empirical studies on 12 benchmarks across mathematical reasoning, knowledge, and natural language reasoning tasks using 14 open-source models (2B to 72B parameter range) and GPT-4o. Our experiments reveal three key findings about verification effectiveness: (1) Easy problems allow verifiers to more reliably certify correct responses; (2) Weak generators produce errors that are easier to detect than those of strong generators; (3) Verification ability is generally correlated with the verifier's own problem-solving capability, but this relationship varies with problem difficulty. These findings reveal opportunities to optimize basic verification strategies in TTS applications. First, given the same verifier, some weak generators can nearly match stronger ones in post-verification TTS performance (e.g., the Gemma2-9B to Gemma2-27B performance gap shrinks by 75.5%). Second, we identify cases where strong verifiers offer limited advantage over weak ones, as both fail to provide meaningful verification gains, suggesting that verifier scaling alone cannot overcome fundamental verification challenges.

large language model, machine learning, natural language, (18 more...)

2509.17995

Country:

Asia (0.46)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

arXiv.org Artificial Intelligence, Sep-17-2025

Out of Distribution Detection in Self-adaptive Robots with AI-powered Digital Twins

Isaku, Erblin, Sartaj, Hassan, Ali, Shaukat, Sanguino, Beatriz, Wang, Tongtong, Li, Guoyuan, Zhang, Houxiang, Peyrucain, Thomas

Self-adaptive robots (SARs) in complex, uncertain environments must proactively detect and address abnormal behaviors, including out-of-distribution (OOD) cases. To this end, digital twins offer a valuable solution for OOD detection. Thus, we present a digital twin-based approach for OOD detection (ODiSAR) in SARs. ODiSAR uses a Transformer-based digital twin to forecast SAR states and employs reconstruction error and Monte Carlo dropout for uncertainty quantification. By combining reconstruction error with predictive variance, the digital twin effectively detects OOD behaviors, even in previously unseen conditions. The digital twin also includes an explainability layer that links potential OOD to specific SAR states, offering insights for self-adaptation. We evaluated ODiSAR by creating digital twins of two industrial robots: one navigating an office environment, and another performing maritime ship navigation. In both cases, ODiSAR forecasts SAR behaviors (i.e., robot trajectories and vessel motion) and proactively detects OOD events. Our results showed that ODiSAR achieved high detection performance (up to 98% AUROC, 96% TNR@TPR95, and 95% F1-score) while providing interpretable insights to support self-adaptation.
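
For readers unfamiliar with the TNR@TPR95 figure quoted above, the following is a minimal sketch of how such a metric is commonly computed: fix the decision threshold so that at least 95% of positive samples are detected, then report the true negative rate at that threshold. The function name `tnr_at_tpr`, the toy scores, and the convention that OOD samples are the positive class with higher scores are all assumptions made for illustration; this is not ODiSAR's scoring code.

```python
import math

def tnr_at_tpr(pos_scores, neg_scores, target_tpr=0.95):
    """Return (threshold, TNR) at the strictest threshold that still accepts
    at least target_tpr of the positive samples (score >= threshold)."""
    ranked = sorted(pos_scores, reverse=True)
    # Number of positives that must be detected; the epsilon guards against
    # floating-point round-up (e.g. 0.95 * 20 landing just above 19).
    k = max(1, math.ceil(target_tpr * len(ranked) - 1e-9))
    threshold = ranked[k - 1]
    true_negatives = sum(1 for s in neg_scores if s < threshold)
    return threshold, true_negatives / len(neg_scores)

# Hypothetical anomaly scores: higher means "more likely OOD" (an assumption).
ood_scores = [0.9, 0.8, 0.7, 0.6]
in_dist_scores = [0.1, 0.2, 0.65, 0.75]

# A 75% target is used here only because the toy sample has four positives.
threshold, tnr = tnr_at_tpr(ood_scores, in_dist_scores, target_tpr=0.75)
print(threshold, tnr)
```

With the default `target_tpr=0.95` the same function yields the TNR@TPR95 value; some OOD papers instead treat in-distribution data as the positive class, in which case the two score lists swap roles.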

artificial intelligence, detection, machine learning, (19 more...)

2509.12982

Country: Europe > Norway (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.96)
Energy > Renewable > Wind (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Neural Information Processing Systems, Aug-20-2025, 04:09:46 GMT

d54e99a6c03704e95e6965532dec148b-Paper.pdf

disparity, intervention, tpr, (14 more...)

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.94)

Industry:

Government (1.00)
Health & Medicine (0.93)
Banking & Finance (0.68)
Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining (0.69)

arXiv.org Machine Learning, Aug-19-2024

Area under the ROC Curve has the Most Consistent Evaluation for Binary Classification

Li, Jing

Choosing evaluation metrics is an important question for model evaluation and model selection in binary classification tasks. This study investigates how consistent metrics are at evaluating different models under different data scenarios. Analyzing over 150 data scenarios and 18 model evaluation metrics using statistical simulation, I find that for binary classification tasks, evaluation metrics that are less influenced by prevalence offer more consistent rankings of a set of different models. In particular, Area Under the ROC Curve (AUC) has the smallest variance in the ranking of different models. Matthews correlation coefficient, a stricter measure of model performance, has the second smallest variance. These patterns hold across a rich set of data scenarios and five commonly used machine learning models as well as a naive random guess model. The results have significant implications for model evaluation and model selection in binary classification tasks.
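
To make the prevalence point concrete, here is a small sketch (not the study's simulation) that computes AUC from its pairwise-ranking definition and the Matthews correlation coefficient from confusion-matrix counts. The toy scores, the 0.5 threshold, and the helper names are assumptions for illustration. Tripling the negatives changes prevalence: AUC stays the same, while precision, shown for contrast, drops.

```python
import math

def auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def precision(pos_scores, neg_scores, threshold):
    """Precision at a threshold; assumes at least one sample is predicted positive."""
    tp = sum(s >= threshold for s in pos_scores)
    fp = sum(s >= threshold for s in neg_scores)
    return tp / (tp + fp)

# Hypothetical classifier scores.
pos = [0.9, 0.6]
neg = [0.7, 0.2]
neg_x3 = neg * 3  # same score distribution, lower prevalence of positives

print(auc(pos, neg), auc(pos, neg_x3))                        # AUC is unchanged
print(precision(pos, neg, 0.5), precision(pos, neg_x3, 0.5))  # precision falls
print(mcc(tp=2, fp=0, tn=2, fn=0))                            # perfect classifier
```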

different model, prevalence, tnr, (13 more...)

2408.10193

Country:

North America > United States > Illinois (0.04)
North America > United States > Florida > Broward County (0.04)
Europe > Portugal (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

arXiv.org Artificial Intelligence, Apr-8-2024

Predicting Overtakes in Trucks Using CAN Data

Butt, Talha Hanif, Tiwari, Prayag, Alonso-Fernandez, Fernando

Safe overtakes in trucks are crucial to prevent accidents, reduce congestion, and ensure efficient traffic flow, making early prediction essential for timely and informed driving decisions. Accordingly, we investigate the detection of truck overtakes from CAN data. Three classifiers, Artificial Neural Networks (ANN), Random Forest, and Support Vector Machines (SVM), are employed for the task. Our analysis covers up to 10 seconds before the overtaking event, using an overlapping sliding window of 1 second to extract CAN features. We observe that the prediction scores of the overtake class tend to increase as we approach the overtake trigger, while those of the no-overtake class remain stable or oscillate depending on the classifier. Thus, the best accuracy is achieved when approaching the trigger, making early overtaking prediction challenging. The classifiers show good accuracy in classifying overtakes (Recall/TPR > 93%), but accuracy is suboptimal in classifying no-overtakes (TNR typically 80-90% and below 60% for one SVM variant). We further combine two classifiers (Random Forest and linear SVM) by averaging their output scores. The fusion is observed to improve no-overtake classification (TNR > 92%) at the expense of reducing overtake accuracy (TPR). However, the latter is kept above 91% near the overtake trigger. Therefore, the fusion balances TPR and TNR, providing more consistent performance than individual classifiers.
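
The score-averaging fusion described in this abstract can be sketched as follows. The scores, labels, 0.5 threshold, and function names are made up for illustration, and the sketch assumes both classifiers already emit scores on a comparable 0-1 scale; it does not reproduce the paper's features or models.

```python
def fuse_scores(scores_a, scores_b):
    """Average two classifiers' output scores sample by sample."""
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]

def tpr_tnr(scores, labels, threshold=0.5):
    """Return (TPR, TNR) when samples with score >= threshold are predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, tn / negatives

# Hypothetical data: label 1 = overtake, label 0 = no overtake.
labels = [1, 1, 0, 0]
rf_scores = [0.9, 0.6, 0.7, 0.2]   # stand-in for a Random Forest's scores
svm_scores = [0.8, 0.3, 0.2, 0.1]  # stand-in for a linear SVM's scores

fused = fuse_scores(rf_scores, svm_scores)
print(tpr_tnr(rf_scores, labels))   # high TPR, weaker TNR
print(tpr_tnr(svm_scores, labels))  # weaker TPR, high TNR
print(tpr_tnr(fused, labels))       # fusion trades some TPR for TNR here
```

In this toy case the fused scores raise TNR relative to the first classifier while lowering its TPR, which is the kind of trade-off the abstract reports; real behavior depends on the data and on how the scores are calibrated.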

classifier, individual classifier, overtake, (16 more...)

2404.05723

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
Europe > Sweden > Halland County > Halmstad (0.04)

Genre: Research Report (0.40)

Industry: Automobiles & Trucks (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.88)

arXiv.org Machine Learning, Dec-22-2023

On support vector machines under a multiple-cost scenario

Benítez-Peña, Sandra, Blanquero, Rafael, Carrizosa, Emilio, Ramírez-Cobo, Pepa

Support Vector Machine (SVM) is a powerful tool in binary classification, known to attain excellent misclassification rates. On the other hand, many real-world classification problems, such as those found in medical diagnosis, churn or fraud prediction, involve misclassification costs which may be different in the different classes. However, it may be hard for the user to provide precise values for such misclassification costs, whereas it may be much easier to identify acceptable misclassification rate values. In this paper we propose a novel SVM model in which misclassification costs are considered by incorporating performance constraints in the problem formulation. Specifically, our aim is to seek the hyperplane with maximal margin yielding misclassification rates below given threshold values. Such a maximal-margin hyperplane is obtained by solving a quadratic convex problem with linear constraints and integer variables. The reported numerical experience shows that our model gives the user control over the misclassification rates in one class (possibly at the expense of an increase in misclassification rates for the other class) and is feasible in terms of running times.
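
One way a problem of the kind described (convex quadratic objective, linear constraints, integer variables) could be written is sketched below. This is an illustrative formulation under stated assumptions, not necessarily the paper's exact model: the binary variables $z_i$, the big-$M$ constant, the index sets $I_{+}$ and $I_{-}$ of positive and negative training points, and the required rates $\lambda_{+}$ and $\lambda_{-}$ are all notation introduced here.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,z}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad
 & y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, && i = 1,\dots,n, \\
 & y_i\,(w^{\top}x_i + b) \ge 1 - M\,(1 - z_i), \quad z_i \in \{0,1\}, && i = 1,\dots,n, \\
 & \frac{1}{|I_{+}|}\sum_{i \in I_{+}} z_i \ge \lambda_{+}, \qquad
   \frac{1}{|I_{-}|}\sum_{i \in I_{-}} z_i \ge \lambda_{-}.
\end{aligned}
```

Here $z_i = 1$ certifies that point $i$ lies on the correct side of the margin, so the last two constraints force the fraction of certified points in each class to reach the user-chosen rates. The first line is the usual soft-margin SVM objective.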

artificial intelligence, constraint, machine learning, (15 more...)

doi: 10.1007/s11634-018-0330-5

2312.14795

Country:

North America > United States > Wisconsin (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > Spain > Andalusia > Seville Province > Seville (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

arXiv.org Artificial Intelligence, Nov-28-2023

FairShap: A Data Re-weighting Approach for Algorithmic Fairness based on Shapley Values

Arnaiz-Rodriguez, Adrian, Oliver, Nuria

Algorithmic fairness is of utmost societal importance, yet the current trend in large-scale machine learning models requires training with massive datasets that are frequently biased. In this context, pre-processing methods that focus on modeling and correcting bias in the data emerge as valuable approaches. In this paper, we propose FairShap, a novel instance-level data re-weighting method for fair algorithmic decision-making through data valuation by means of Shapley Values. FairShap is model-agnostic and easily interpretable, as it measures the contribution of each training data point to a predefined fairness metric. We empirically validate FairShap on several state-of-the-art datasets of different nature, with a variety of training scenarios and models, and show how it yields fairer models with accuracy similar to that of the baselines. We illustrate FairShap's interpretability by means of histograms and latent space visualizations. Moreover, we perform a utility-fairness study, and ablation and runtime experiments to illustrate the impact of the size of the reference dataset and FairShap's computational cost depending on the size of the dataset and the number of features. We believe that FairShap represents a promising direction in interpretable and model-agnostic approaches to algorithmic fairness that yield competitive accuracy even when only biased datasets are available.
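
As background for the data-valuation idea, the sketch below computes exact Shapley values for a tiny set of data points by averaging each point's marginal contribution over every ordering. This is the generic textbook definition only, not FairShap's own computation; the function `shapley_values` and the toy value function are hypothetical. In a fairness setting, the value function would be a fairness metric evaluated on a model trained with the given subset.

```python
from itertools import permutations

def shapley_values(n_points, value):
    """Exact Shapley values: average marginal contribution over all orderings.
    Cost grows factorially, so this is only feasible for very small n_points."""
    totals = [0.0] * n_points
    orderings = list(permutations(range(n_points)))
    for order in orderings:
        coalition = frozenset()
        for i in order:
            with_i = coalition | {i}
            totals[i] += value(with_i) - value(coalition)
            coalition = with_i
    return [t / len(orderings) for t in totals]

def toy_value(subset):
    """Hypothetical value function: the metric is 1 only if points 0 and 1 are both present."""
    return 1.0 if {0, 1} <= subset else 0.0

phi = shapley_values(3, toy_value)
print(phi)  # points 0 and 1 share the credit equally; point 2 gets none
```

The values sum to the value of the full dataset (the efficiency property), which is what makes them usable as per-instance weights or attributions.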

dataset, fairness, fairshap, (16 more...)

2303.01928

Country:

North America > United States > Oregon (0.04)
North America > United States > California (0.04)
Europe > Spain > Valencian Community > Alicante Province > Alicante (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Law (1.00)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)