Performance Analysis
Comparing Model-agnostic Feature Selection Methods through Relative Efficiency
Zheng, Chenghui, Raskutti, Garvesh
Feature selection and importance estimation in a model-agnostic setting is an ongoing challenge of significant interest. Wrapper methods are commonly used because they are typically model-agnostic, even though they are computationally intensive. In this paper, we focus on feature selection methods related to the Generalized Covariance Measure (GCM) and Leave-One-Covariate-Out (LOCO) estimation, and provide a comparison based on relative efficiency. In particular, we present a theoretical comparison under three model settings: linear models, non-linear additive models, and single index models that mimic a single-layer neural network. We complement this with extensive simulations and real data examples. Our theoretical results, along with empirical findings, demonstrate that GCM-related methods generally outperform LOCO under suitable regularity conditions. Furthermore, we quantify the asymptotic relative efficiency of these approaches. Our simulations and real data analysis include widely used machine learning methods such as neural networks and gradient boosting trees.
Evaluation and Optimization of Leave-one-out Cross-validation for the Lasso
I develop an algorithm to produce the piecewise quadratic that computes leave-one-out cross-validation for the lasso as a function of its hyperparameter. The algorithm can be used to find exact hyperparameters that optimize leave-one-out cross-validation either globally or locally, and its practicality is demonstrated on real-world data sets.
The C-index Multiverse
Sierra, Begoรฑa B., McLean, Colin, Hall, Peter S., Vallejos, Catalina A.
Quantifying out-of-sample discrimination performance for time-to-event outcomes is a fundamental step for model evaluation and selection in the context of predictive modelling. The concordance index, or C-index, is a widely used metric for this purpose, particularly with the growing development of machine learning methods. Beyond differences between proposed C-index estimators (e.g. Harrell's, Uno's and Antolini's), we demonstrate the existence of a C-index multiverse among available R and python software, where seemingly equal implementations can yield different results. This can undermine reproducibility and complicate fair comparisons across models and studies. Key variation sources include tie handling and adjustment to censoring. Additionally, the absence of a standardised approach to summarise risk from survival distributions, result in another source of variation dependent on input types. We demonstrate the consequences of the C-index multiverse when quantifying predictive performance for several survival models (from Cox proportional hazards to recent deep learning approaches) on publicly available breast cancer data, and semi-synthetic examples. Our work emphasises the need for better reporting to improve transparency and reproducibility. This article aims to be a useful guideline, helping analysts when navigating the multiverse, providing unified documentation and highlighting potential pitfalls of existing software. All code is publicly available at: www.github.com/BBolosSierra/CindexMultiverse.
Safe and Transparent Robots for Human-in-the-Loop Meat Processing
Parekh, Sagar, Grothoff, Casey, Wright, Ryan, White, Robin, Losey, Dylan P.
Labor shortages have severely affected the meat processing sector. Automated technology has the potential to support the meat industry, assist workers, and enhance job quality. However, existing automation in meat processing is highly specialized, inflexible, and cost intensive. Instead of forcing manufacturers to buy a separate device for each step of the process, our objective is to develop general-purpose robotic systems that work alongside humans to perform multiple meat processing tasks. Through a recently conducted survey of industry experts, we identified two main challenges associated with integrating these collaborative robots alongside human workers. First, there must be measures to ensure the safety of human coworkers; second, the coworkers need to understand what the robot is doing. This paper addresses both challenges by introducing a safety and transparency framework for general-purpose meat processing robots. For safety, we implement a hand-detection system that continuously monitors nearby humans. This system can halt the robot in situations where the human comes into close proximity of the operating robot. We also develop an instrumented knife equipped with a force sensor that can differentiate contact between objects such as meat, bone, or fixtures. For transparency, we introduce a method that detects the robot's uncertainty about its performance and uses an LED interface to communicate that uncertainty to the human. Additionally, we design a graphical interface that displays the robot's plans and allows the human to provide feedback on the planned cut. Overall, our framework can ensure safe operation while keeping human workers in-the-loop about the robot's actions which we validate through a user study.
Improving Fairness in Graph Neural Networks via Counterfactual Debiasing
Wo, Zengyi, Liu, Chang, Wang, Yumeng, Shao, Minglai, Wang, Wenjun
Graph Neural Networks (GNNs) have been successful in modeling graph-structured data. However, similar to other machine learning models, GNNs can exhibit bias in predictions based on attributes like race and gender. Moreover, bias in GNNs can be exacerbated by the graph structure and message-passing mechanisms. Recent cutting-edge methods propose mitigating bias by filtering out sensitive information from input or representations, like edge dropping or feature masking. Yet, we argue that such strategies may unintentionally eliminate non-sensitive features, leading to a compromised balance between predictive accuracy and fairness. To tackle this challenge, we present a novel approach utilizing counterfactual data augmentation for bias mitigation. This method involves creating diverse neighborhoods using counterfactuals before message passing, facilitating unbiased node representations learning from the augmented graph. Subsequently, an adversarial discriminator is employed to diminish bias in predictions by conventional GNN classifiers. Our proposed technique, Fair-ICD, ensures the fairness of GNNs under moderate conditions. Experiments on standard datasets using three GNN backbones demonstrate that Fair-ICD notably enhances fairness metrics while preserving high predictive performance.
EmoTale: An Enacted Speech-emotion Dataset in Danish
Hjuler, Maja J., Skat-Rรธrdam, Harald V., Clemmensen, Line H., Das, Sneha
--While multiple emotional speech corpora exist for commonly spoken languages, there is a lack of functional datasets for smaller (spoken) languages, such as Danish. T o our knowledge, Danish Emotional Speech (DES), published in 1997, is the only other database of Danish emotional speech. We demonstrate the validity of the dataset by investigating and presenting its predictive power using speech emotion recognition (SER) models. We develop SER models for EmoT ale and the reference datasets using self-supervised speech model (SSLM) embeddings and the openSMILE feature extractor . We find the embeddings superior to the hand-crafted features. The best model achieves an unweighted average recall (UAR) of 64.1% on the EmoT ale corpus using leave-one-speaker-out cross-validation, comparable to the performance on DES. Speech signals are rich in information, both linguistic (in the form of sentences and words) and paralinguistic (denoting mood and affective state). Speech also carries information about multiple, potentially personal traits of the speaker, such as age, gender, and nationality. Multiple psychological and neuroscientific models of the mind hypothesize that language and emotion are certainly linked [1]. For example, some cultures express anger more vocally, while others might be more restrained.
FM4NPP: A Scaling Foundation Model for Nuclear and Particle Physics
Park, David, Li, Shuhang, Huang, Yi, Luo, Xihaier, Yu, Haiwang, Go, Yeonju, Pinkenburg, Christopher, Lin, Yuewei, Yoo, Shinjae, Osborn, Joseph, Huang, Jin, Ren, Yihui
Large language models have revolutionized artificial intelligence by enabling large, generalizable models trained through self-supervision. This paradigm has inspired the development of scientific foundation models (FMs). However, applying this capability to experimental particle physics is challenging due to the sparse, spatially distributed nature of detector data, which differs dramatically from natural language. This work addresses if an FM for particle physics can scale and generalize across diverse tasks. We introduce a new dataset with more than 11 million particle collision events and a suite of downstream tasks and labeled data for evaluation. We propose a novel self-supervised training method for detector data and demonstrate its neural scalability with models that feature up to 188 million parameters. With frozen weights and task-specific adapters, this FM consistently outperforms baseline models across all downstream tasks. The performance also exhibits robust data-efficient adaptation. Further analysis reveals that the representations extracted by the FM are task-agnostic but can be specialized via a single linear mapping for different downstream tasks.
Benchmarking Sociolinguistic Diversity in Swahili NLP: A Taxonomy-Guided Approach
Oketch, Kezia, Lalor, John P., Abbasi, Ahmed
We introduce the first taxonomy-guided evaluation of Swahili NLP, addressing gaps in sociolinguistic diversity. Drawing on health-related psychometric tasks, we collect a dataset of 2,170 free-text responses from Kenyan speakers. The data exhibits tribal influences, urban vernacular, code-mixing, and loanwords. We develop a structured taxonomy and use it as a lens for examining model prediction errors across pre-trained and instruction-tuned language models. Our findings advance culturally grounded evaluation frameworks and highlight the role of sociolinguistic variation in shaping model performance.