AITopics | Performance Analysis

Feature selection and importance estimation in a model-agnostic setting is an ongoing challenge of significant interest. Wrapper methods are commonly used because they are typically model-agnostic, even though they are computationally intensive. In this paper, we focus on feature selection methods related to the Generalized Covariance Measure (GCM) and Leave-One-Covariate-Out (LOCO) estimation, and provide a comparison based on relative efficiency. In particular, we present a theoretical comparison under three model settings: linear models, non-linear additive models, and single index models that mimic a single-layer neural network. We complement this with extensive simulations and real data examples. Our theoretical results, along with empirical findings, demonstrate that GCM-related methods generally outperform LOCO under suitable regularity conditions. Furthermore, we quantify the asymptotic relative efficiency of these approaches. Our simulations and real data analysis include widely used machine learning methods such as neural networks and gradient boosting trees.

artificial intelligence, estimator, machine learning, (16 more...)

arXiv.org Machine Learning

2508.14268

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
Europe > Switzerland (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Banking & Finance > Real Estate (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Evaluation and Optimization of Leave-one-out Cross-validation for the Lasso

Burn, Ryan

arXiv.org Machine Learning, Aug-21-2025

I develop an algorithm to produce the piecewise quadratic that computes leave-one-out cross-validation for the lasso as a function of its hyperparameter. The algorithm can be used to find exact hyperparameters that optimize leave-one-out cross-validation either globally or locally, and its practicality is demonstrated on real-world data sets.

artificial intelligence, machine learning, regressor, (17 more...)

arXiv.org Machine Learning

2508.14368

Country:

North America > United States > California (0.05)
Asia > Myanmar (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.83)

The C-index Multiverse

Sierra, Begoña B., McLean, Colin, Hall, Peter S., Vallejos, Catalina A.

arXiv.org Machine Learning, Aug-21-2025

Quantifying out-of-sample discrimination performance for time-to-event outcomes is a fundamental step for model evaluation and selection in the context of predictive modelling. The concordance index, or C-index, is a widely used metric for this purpose, particularly with the growing development of machine learning methods. Beyond differences between proposed C-index estimators (e.g. Harrell's, Uno's and Antolini's), we demonstrate the existence of a C-index multiverse among available R and Python software, where seemingly equal implementations can yield different results. This can undermine reproducibility and complicate fair comparisons across models and studies. Key sources of variation include tie handling and adjustment for censoring. Additionally, the absence of a standardised approach to summarising risk from survival distributions results in another source of variation that depends on input types. We demonstrate the consequences of the C-index multiverse when quantifying predictive performance for several survival models (from Cox proportional hazards to recent deep learning approaches) on publicly available breast cancer data and semi-synthetic examples. Our work emphasises the need for better reporting to improve transparency and reproducibility. This article aims to be a useful guideline, helping analysts navigate the multiverse, providing unified documentation and highlighting potential pitfalls of existing software. All code is publicly available at: www.github.com/BBolosSierra/CindexMultiverse.
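
To illustrate how tie handling alone can change a reported C-index, here is a minimal sketch of a Harrell-style estimator. It is not the implementation of any R or Python package and not the authors' code; the function name `c_index`, the `ties` options, and the toy data are assumptions made for illustration. Tied event times are simply treated as non-comparable, and no censoring adjustment (as in Uno's estimator) is applied.

```python
from itertools import permutations


def c_index(times, events, risks, ties="half"):
    """Harrell-style concordance index for right-censored data (illustrative).

    ties controls how pairs with equal predicted risk are scored:
      "half" -> the pair contributes 0.5 to the concordant count
      "zero" -> the pair is comparable but contributes 0
      "drop" -> the pair is excluded from the comparable pairs
    """
    if ties not in {"half", "zero", "drop"}:
        raise ValueError("ties must be 'half', 'zero' or 'drop'")

    concordant = 0.0
    comparable = 0
    for i, j in permutations(range(len(times)), 2):
        # A pair is comparable only if subject i has an observed event
        # strictly before subject j's (event or censoring) time.
        if not (events[i] and times[i] < times[j]):
            continue
        if risks[i] == risks[j]:
            if ties == "drop":
                continue
            comparable += 1
            if ties == "half":
                concordant += 0.5
        else:
            comparable += 1
            # Concordant when the earlier event has the higher predicted risk.
            if risks[i] > risks[j]:
                concordant += 1.0

    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: the third subject is censored (event = 0).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.5, 0.5, 0.1]

for mode in ("half", "zero", "drop"):
    print(mode, c_index(times, events, risks, ties=mode))
```

On this toy data the three conventions give 0.9, 0.8 and 1.0 respectively, even though the predictions and outcomes are identical. This is the kind of discrepancy that makes it important to report which convention a given piece of software uses.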

artificial intelligence, machine learning, prediction, (19 more...)

arXiv.org Machine Learning

2508.14821

Country:

Europe > Austria > Vienna (0.14)
Europe > United Kingdom > Scotland (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.68)

Safe and Transparent Robots for Human-in-the-Loop Meat Processing

Parekh, Sagar, Grothoff, Casey, Wright, Ryan, White, Robin, Losey, Dylan P.

arXiv.org Artificial Intelligence, Aug-21-2025

Labor shortages have severely affected the meat processing sector. Automated technology has the potential to support the meat industry, assist workers, and enhance job quality. However, existing automation in meat processing is highly specialized, inflexible, and cost intensive. Instead of forcing manufacturers to buy a separate device for each step of the process, our objective is to develop general-purpose robotic systems that work alongside humans to perform multiple meat processing tasks. Through a recently conducted survey of industry experts, we identified two main challenges associated with integrating these collaborative robots alongside human workers. First, there must be measures to ensure the safety of human coworkers; second, the coworkers need to understand what the robot is doing. This paper addresses both challenges by introducing a safety and transparency framework for general-purpose meat processing robots. For safety, we implement a hand-detection system that continuously monitors nearby humans. This system can halt the robot when a human comes into close proximity to the operating robot. We also develop an instrumented knife equipped with a force sensor that can differentiate contact between objects such as meat, bone, or fixtures. For transparency, we introduce a method that detects the robot's uncertainty about its performance and uses an LED interface to communicate that uncertainty to the human. Additionally, we design a graphical interface that displays the robot's plans and allows the human to provide feedback on the planned cut. Overall, our framework can ensure safe operation while keeping human workers in the loop about the robot's actions, which we validate through a user study.

artificial intelligence, machine learning, robot, (19 more...)

arXiv.org Artificial Intelligence

2508.14763

Country: North America > United States (0.93)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)
(2 more...)

Industry:

Health & Medicine (1.00)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Improving Fairness in Graph Neural Networks via Counterfactual Debiasing

Wo, Zengyi, Liu, Chang, Wang, Yumeng, Shao, Minglai, Wang, Wenjun

arXiv.org Artificial Intelligence, Aug-21-2025

Graph Neural Networks (GNNs) have been successful in modeling graph-structured data. However, similar to other machine learning models, GNNs can exhibit bias in predictions based on attributes like race and gender. Moreover, bias in GNNs can be exacerbated by the graph structure and message-passing mechanisms. Recent cutting-edge methods propose mitigating bias by filtering out sensitive information from the input or representations, for example through edge dropping or feature masking. Yet, we argue that such strategies may unintentionally eliminate non-sensitive features, leading to a compromised balance between predictive accuracy and fairness. To tackle this challenge, we present a novel approach utilizing counterfactual data augmentation for bias mitigation. This method involves creating diverse neighborhoods using counterfactuals before message passing, facilitating the learning of unbiased node representations from the augmented graph. Subsequently, an adversarial discriminator is employed to diminish bias in predictions by conventional GNN classifiers. Our proposed technique, Fair-ICD, ensures the fairness of GNNs under moderate conditions. Experiments on standard datasets using three GNN backbones demonstrate that Fair-ICD notably enhances fairness metrics while preserving high predictive performance.

artificial intelligence, machine learning, node, (15 more...)

arXiv.org Artificial Intelligence

2508.14683

Country:

Asia > China (0.17)
Europe > Spain (0.16)

Genre: Research Report > Promising Solution (0.86)

Industry: Information Technology (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

EmoTale: An Enacted Speech-emotion Dataset in Danish

Hjuler, Maja J., Skat-Rørdam, Harald V., Clemmensen, Line H., Das, Sneha

arXiv.org Artificial Intelligence, Aug-21-2025

While multiple emotional speech corpora exist for commonly spoken languages, there is a lack of functional datasets for smaller (spoken) languages, such as Danish. To our knowledge, Danish Emotional Speech (DES), published in 1997, is the only other database of Danish emotional speech. We demonstrate the validity of the dataset by investigating and presenting its predictive power using speech emotion recognition (SER) models. We develop SER models for EmoTale and the reference datasets using self-supervised speech model (SSLM) embeddings and the openSMILE feature extractor. We find the embeddings superior to the hand-crafted features. The best model achieves an unweighted average recall (UAR) of 64.1% on the EmoTale corpus using leave-one-speaker-out cross-validation, comparable to the performance on DES. Speech signals are rich in information, both linguistic (in the form of sentences and words) and paralinguistic (denoting mood and affective state). Speech also carries information about multiple, potentially personal traits of the speaker, such as age, gender, and nationality. Multiple psychological and neuroscientific models of the mind hypothesize that language and emotion are certainly linked [1]. For example, some cultures express anger more vocally, while others might be more restrained.
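
For readers unfamiliar with the metric, unweighted average recall (UAR) is the mean of the per-class recalls, so every emotion class counts equally regardless of how many utterances it has. The sketch below is a minimal illustration and is not taken from the paper; the function name and the toy labels are hypothetical.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall, computed over the classes present in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    recalls = []
    for label in sorted(set(y_true)):
        # Indices of all samples whose true class is `label`.
        idx = [k for k, t in enumerate(y_true) if t == label]
        hits = sum(1 for k in idx if y_pred[k] == label)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced toy labels: four "neutral" and two "angry" utterances.
y_true = ["neutral", "neutral", "neutral", "neutral", "angry", "angry"]
y_pred = ["neutral", "neutral", "neutral", "angry", "angry", "neutral"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(uar, accuracy)
```

Here the recall is 3/4 for "neutral" and 1/2 for "angry", giving a UAR of 0.625, while plain accuracy is 4/6. The gap shows why UAR is often preferred when emotion classes are imbalanced.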

artificial intelligence, dataset, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2508.14548

Country:

Oceania > Australia (0.28)
Europe > Denmark > Capital Region (0.14)

Genre: Research Report > Experimental Study (0.47)

Industry:

Health & Medicine (0.68)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.34)

FM4NPP: A Scaling Foundation Model for Nuclear and Particle Physics

Park, David, Li, Shuhang, Huang, Yi, Luo, Xihaier, Yu, Haiwang, Go, Yeonju, Pinkenburg, Christopher, Lin, Yuewei, Yoo, Shinjae, Osborn, Joseph, Huang, Jin, Ren, Yihui

arXiv.org Artificial Intelligence, Aug-21-2025

Large language models have revolutionized artificial intelligence by enabling large, generalizable models trained through self-supervision. This paradigm has inspired the development of scientific foundation models (FMs). However, applying this capability to experimental particle physics is challenging due to the sparse, spatially distributed nature of detector data, which differs dramatically from natural language. This work asks whether an FM for particle physics can scale and generalize across diverse tasks. We introduce a new dataset with more than 11 million particle collision events and a suite of downstream tasks and labeled data for evaluation. We propose a novel self-supervised training method for detector data and demonstrate its neural scalability with models that feature up to 188 million parameters. With frozen weights and task-specific adapters, this FM consistently outperforms baseline models across all downstream tasks. The performance also exhibits robust data-efficient adaptation. Further analysis reveals that the representations extracted by the FM are task-agnostic but can be specialized via a single linear mapping for different downstream tasks.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2508.14087

Genre: Research Report > New Finding (0.92)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Benchmarking Sociolinguistic Diversity in Swahili NLP: A Taxonomy-Guided Approach

Oketch, Kezia, Lalor, John P., Abbasi, Ahmed

arXiv.org Artificial Intelligence, Aug-21-2025

We introduce the first taxonomy-guided evaluation of Swahili NLP, addressing gaps in sociolinguistic diversity. Drawing on health-related psychometric tasks, we collect a dataset of 2,170 free-text responses from Kenyan speakers. The data exhibits tribal influences, urban vernacular, code-mixing, and loanwords. We develop a structured taxonomy and use it as a lens for examining model prediction errors across pre-trained and instruction-tuned language models. Our findings advance culturally grounded evaluation frameworks and highlight the role of sociolinguistic variation in shaping model performance.

computational linguistic, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2508.14051

Country:

Africa (1.00)
North America > United States > Minnesota (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
