Materials
Regression Augmentation With Data-Driven Segmentation
Alahyari, Shayan, Ghobadlou, Shiva Mehdipour, Domaratzki, Mike
Imbalanced regression arises when the target distribution is skewed, causing models to focus on dense regions and struggle with underrepresented (minority) samples. Despite its relevance across many applications, few methods have been designed specifically for this challenge. Existing approaches often rely on fixed, ad hoc thresholds to label samples as rare or common, overlooking the continuous complexity of the joint feature-target space and failing to represent the true underlying rare regions. To address these limitations, we propose a fully data-driven GAN-based augmentation framework that uses Mahalanobis-Gaussian Mixture Modeling (GMM) to automatically identify minority samples and employs deterministic nearest-neighbour matching to enrich sparse regions. Rather than relying on preset thresholds, our method lets the data determine which observations are truly rare. Evaluation on 32 benchmark imbalanced regression datasets demonstrates that our approach consistently outperforms state-of-the-art data augmentation methods.
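As a rough illustration of measuring rarity in the joint feature-target space, the sketch below ranks samples by Mahalanobis distance from the sample mean. It is a simplification and not the authors' algorithm: it assumes a single Gaussian rather than a mixture, uses one feature and one target, and the data and the function name `mahalanobis_2d` are hypothetical.

```python
from statistics import mean

def mahalanobis_2d(points):
    """Return the Mahalanobis distance of each (x, y) point from the sample mean."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    n = len(points)
    # Sample covariance entries (unbiased, divide by n - 1).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inv(cov) * [dx dy]^T with the 2x2 inverse written out.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(max(d2, 0.0) ** 0.5)  # guard against tiny negative rounding
    return distances

# Hypothetical (feature, target) pairs: a dense 3x3 cluster plus one distant sample.
samples = [(-1.0, -1.0), (-1.0, 0.0), (-1.0, 1.0),
           (0.0, -1.0), (0.0, 0.0), (0.0, 1.0),
           (1.0, -1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0)]
dists = mahalanobis_2d(samples)
# Index of the sample farthest from the bulk of the data.
rarest_index = max(range(len(samples)), key=lambda i: dists[i])
```

Here the distant sample (index 9) receives the largest distance. A mixture model would replace the single mean and covariance with per-component statistics.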
Beyond Benchmarks: Dynamic, Automatic And Systematic Red-Teaming Agents For Trustworthy Medical Language Models
Pan, Jiazhen, Jian, Bailiang, Hager, Paul, Zhang, Yundi, Liu, Che, Jungmann, Friedrike, Li, Hongwei Bran, You, Chenyu, Wu, Junde, Zhu, Jiayuan, Liu, Fenglin, Liu, Yuyuan, Bubeck, Niklas, Wachinger, Christian, Chen, null, Chen, null, Gong, Zhenyu, Ouyang, Cheng, Kaissis, Georgios, Wiestler, Benedikt, Rueckert, Daniel
Ensuring the safety and reliability of large language models (LLMs) in clinical practice is critical to prevent patient harm and promote trustworthy healthcare applications of AI. However, LLMs are advancing so rapidly that static safety benchmarks often become obsolete upon publication, yielding only an incomplete and sometimes misleading picture of model trustworthiness. We demonstrate that a Dynamic, Automatic, and Systematic (DAS) red-teaming framework that continuously stress-tests LLMs can reveal significant weaknesses of current LLMs across four safety-critical domains: robustness, privacy, bias/fairness, and hallucination. A suite of adversarial agents is applied to autonomously mutate test cases, identify/evolve unsafe-triggering strategies, and evaluate responses, uncovering vulnerabilities in real time without human intervention. Applying DAS to 15 proprietary and open-source LLMs revealed a stark contrast between static benchmark performance and vulnerability under adversarial pressure. Despite a median MedQA accuracy exceeding 80%, 94% of previously correct answers failed our dynamic robustness tests. We observed similarly high failure rates across other domains: privacy leaks were elicited in 86% of scenarios, cognitive-bias priming altered clinical recommendations in 81% of fairness tests, and we identified hallucination rates exceeding 66% in widely used models. Such profound residual risks are incompatible with routine clinical practice. By converting red-teaming from a static checklist into a dynamic stress-test audit, DAS red-teaming offers the surveillance that hospitals/regulators/technology vendors require as LLMs become embedded in patient chatbots, decision-support dashboards, and broader healthcare workflows. Our framework delivers an evolvable, scalable, and reliable safeguard for the next generation of medical AI.
A Data-Driven Machine Learning Approach for Predicting Axial Load Capacity in Steel Storage Rack Columns
Mammadli, Bakhtiyar, Yazici, Casim, Gürbüz, Muhammed, Kocaman, İrfan, Dominguez-Gutierrez, F. Javier, Özkal, Fatih Mehmet
In this study, we present a machine learning (ML) framework to predict the axial load-bearing capacity (in kN) of cold-formed steel structural members. The methodology emphasizes robust model selection and interpretability, addressing the limitations of traditional analytical approaches in capturing the nonlinearities and geometrical complexities inherent to buckling behavior. The dataset, comprising key geometric and mechanical parameters of steel columns, was curated with appropriate pre-processing steps, including removal of non-informative identifiers and imputation of missing values. A comprehensive suite of regression algorithms, ranging from linear models to kernel-based regressors and ensemble tree methods, was evaluated. Among these, Gradient Boosting Regression exhibited superior predictive performance across multiple metrics, including the coefficient of determination (R²), root mean squared error (RMSE), and mean absolute error (MAE), and was consequently selected as the final model. Model interpretability was addressed using SHapley Additive exPlanations (SHAP), enabling insight into the relative importance and interaction of input features influencing the predicted axial capacity. To facilitate practical deployment, the model was integrated into an interactive, Python-based web interface via Streamlit. This tool allows end users, such as structural engineers and designers, to input design parameters manually or through CSV upload, and to obtain real-time predictions of axial load capacity without the need for programming expertise. Applied to the context of steel storage rack columns, the framework demonstrates how data-driven tools can enhance design safety, streamline validation workflows, and inform decision-making in structural applications where buckling is a critical failure mode.
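The three evaluation metrics named above follow their standard definitions and can be sketched in a few lines. The observed and predicted capacities below are hypothetical values chosen for easy arithmetic, not results from the study, and the function name `regression_metrics` is illustrative.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (r2, rmse, mae) for paired observations and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observations are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae

# Hypothetical axial capacities in kN (observed vs. model output).
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
r2, rmse, mae = regression_metrics(observed, predicted)
```

With these numbers every residual has magnitude 10 kN, so RMSE and MAE coincide. They diverge once errors vary in size, because RMSE weights large errors more heavily.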
GraphVSSM: Graph Variational State-Space Model for Probabilistic Spatiotemporal Inference of Dynamic Exposure and Vulnerability for Regional Disaster Resilience Assessment
Dimasaka, Joshua, Geiß, Christian, So, Emily
Regional disaster resilience quantifies the changing nature of physical risks to inform policy instruments ranging from local immediate recovery to international sustainable development. While many existing state-of-practice methods have greatly advanced the dynamic mapping of exposure and hazard, our understanding of large-scale physical vulnerability has remained static, costly, limited, region-specific, coarse-grained, overly aggregated, and inadequately calibrated. With the significant growth in the availability of time-series satellite imagery and derived products for exposure and hazard, we focus our work on the equally important yet challenging element of the risk equation: physical vulnerability. We leverage machine learning methods that flexibly capture spatial contextual relationships, limited temporal observations, and uncertainty in a unified probabilistic spatiotemporal inference framework. We therefore introduce Graph Variational State-Space Model (GraphVSSM), a novel modular spatiotemporal approach that uniquely integrates graph deep learning, state-space modeling, and variational inference using time-series data and prior expert belief systems in a weakly supervised or coarse-to-fine-grained manner. We present three major results: a city-wide demonstration in Quezon City, Philippines; an investigation of sudden changes in the cyclone-impacted coastal Khurushkul community (Bangladesh) and mudslide-affected Freetown (Sierra Leone); and an open geospatial dataset, METEOR 2.5D, that spatiotemporally enhances the existing global static dataset for UN Least Developed Countries (2020). Beyond advancing regional disaster resilience assessment and improving our understanding of global disaster risk reduction progress, our method also offers a probabilistic deep learning approach, contributing to broader urban studies that require compositional data analysis in weak supervision.
MolReasoner: Toward Effective and Interpretable Reasoning for Molecular LLMs
Zhao, Guojiang, Li, Sihang, Lu, Zixiang, Cheng, Zheng, Lin, Haitao, Wu, Lirong, Xia, Hanchen, Cai, Hengxing, Guo, Wentao, Wang, Hongshuai, Xu, Mingjun, Zhu, Siyu, Ke, Guolin, Zhang, Linfeng, Gao, Zhifeng
Large Language Models (LLMs) have demonstrated remarkable performance across various domains, yet their capabilities in molecular reasoning remain insufficiently explored. Current approaches tend to rely heavily on general-purpose prompting, which lacks domain-specific molecular semantics, while those that use fine-tuning strategies often face challenges with interpretability and reasoning depth. To address these issues, we introduce MolReasoner, a two-stage framework designed to transition LLMs from memorization towards chemical reasoning. First, we propose Mol-SFT, which initializes the model's reasoning abilities via synthetic Chain-of-Thought (CoT) samples generated by GPT-4o and verified for chemical accuracy. Subsequently, Mol-RL applies reinforcement learning with specialized reward functions designed explicitly to align chemical structures with linguistic descriptions, thereby enhancing molecular reasoning capabilities. Our approach notably enhances interpretability, improving the model's molecular understanding and enabling better generalization. Extensive experiments demonstrate that MolReasoner outperforms existing methods, marking a significant shift from memorization-based outputs to robust chemical reasoning. Our code is available at https://github.
TRACEALIGN -- Tracing the Drift: Attributing Alignment Failures to Training-Time Belief Sources in LLMs
Das, Amitava, Jain, Vinija, Chadha, Aman
Large Language Models (LLMs) fine-tuned to align with human values often exhibit alignment drift, producing unsafe or policy-violating completions when exposed to adversarial prompts, decoding perturbations, or paraphrased jailbreaks. While prior work has behaviorally characterized alignment failure, little is known about the training-time belief sources underlying these failures. We introduce TraceAlign, a unified framework for tracing unsafe completions back to their root causes in the model's training corpus. Central to our approach is the Belief Conflict Index (BCI), which quantifies semantic inconsistency between generated spans and aligned policies, based on retrieved training documents using suffix-array matching. We propose three complementary interventions: (i) TraceShield, an inference-time safety filter that refuses completions with high-BCI spans, (ii) Contrastive Belief Deconfliction Loss, a contrastive fine-tuning objective penalizing high-BCI continuations during DPO, and (iii) Prov-Decode, a provenance-aware decoding strategy that vetoes beam expansions predicted to yield high-BCI spans. Together, these defenses reduce alignment drift by up to 85% on our curated Alignment Drift Benchmark (ADB) while preserving utility on standard tasks, with delta less than 0.2 and improved refusal quality. We further derive a theoretical upper bound on drift likelihood via suffix-array span statistics, linking memorization frequency and length to adversarial reactivation risk. TraceAlign thus provides the first scalable, traceable, and grounded toolkit for understanding and mitigating alignment failures at source. To encourage further exploration and development, we open-source our implementation at: https://anonymous.4open.science/r/tracealign-2DA7
Toward using explainable data-driven surrogate models for treating performance-based seismic design as an inverse engineering problem
This study presents a methodology to treat performance-based seismic design as an inverse engineering problem, where design parameters are directly derived to achieve specific performance objectives. By implementing explainable machine learning models, this methodology directly maps design variables and performance metrics, tackling computational inefficiencies of performance-based design. The resultant machine learning model is integrated as an evaluation function into a genetic optimization algorithm to solve the inverse problem. The developed methodology is then applied to two different inventories of steel and concrete moment frames in Los Angeles and Charleston to obtain sectional properties of frame members that minimize expected annualized seismic loss in terms of repair costs. The results show high accuracy of the surrogate models (e.g., R² > 90%) across a diverse set of building types, geometries, seismic design, and site hazard, where the optimization algorithm could identify the optimum values of members' properties for a fixed set of geometric variables, consistent with engineering principles.
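The idea of using a surrogate model as the evaluation function of a genetic algorithm can be sketched as follows. Everything in this example is an assumption made for illustration: `surrogate_loss` is a hypothetical quadratic standing in for a trained model, and the operators (elitism, uniform crossover, Gaussian mutation), population size, and bounds are generic choices rather than the study's settings.

```python
import random

def surrogate_loss(design):
    """Hypothetical stand-in for a trained surrogate: design variables -> loss."""
    x, y = design
    return (x - 3.0) ** 2 + (y + 1.0) ** 2

def genetic_minimize(evaluate, bounds, pop_size=30, generations=40, seed=0):
    """Minimize `evaluate` over box `bounds` with a simple elitist GA."""
    rng = random.Random(seed)

    def clip(value, lo, hi):
        return max(lo, min(hi, value))

    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    history = []  # best loss at the start of each generation
    for _ in range(generations):
        population.sort(key=evaluate)
        history.append(evaluate(population[0]))
        elite = population[: pop_size // 5]  # best 20% survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Gaussian mutation scaled to the variable range, clipped to bounds.
            child = [clip(g + rng.gauss(0.0, 0.1 * (hi - lo)), lo, hi)
                     for g, (lo, hi) in zip(child, bounds)]
            children.append(child)
        population = elite + children
    best = min(population, key=evaluate)
    history.append(evaluate(best))
    return best, history

best, history = genetic_minimize(surrogate_loss, [(-10.0, 10.0), (-10.0, 10.0)])
```

Because the elite designs are carried over unchanged, the best loss recorded in `history` never increases from one generation to the next. Each evaluation costs only a surrogate call, which is what makes the search cheap.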
Adaptive Machine Learning-Driven Multi-Fidelity Stratified Sampling for Failure Analysis of Nonlinear Stochastic Systems
Xu, Liuyun, Spence, Seymour M. J.
Existing variance reduction techniques used in stochastic simulations for rare event analysis still require a substantial number of model evaluations to estimate small failure probabilities. In the context of complex, nonlinear finite element modeling environments, this can become computationally challenging, particularly for systems subjected to stochastic excitation. To address this challenge, a multi-fidelity stratified sampling scheme with adaptive machine learning metamodels is introduced for efficiently propagating uncertainties and estimating small failure probabilities. In this approach, a high-fidelity dataset generated through stratified sampling is used to train a deep learning-based metamodel, which then serves as a cost-effective and highly correlated low-fidelity model. An adaptive training scheme is proposed to balance the trade-off between approximation quality and computational demand associated with the development of the low-fidelity model. By integrating the low-fidelity outputs with additional high-fidelity results, an unbiased estimate of the strata-wise failure probabilities is obtained using a multi-fidelity Monte Carlo framework. The overall probability of failure is then computed using the total probability theorem. Application to a full-scale high-rise steel building subjected to stochastic wind excitation demonstrates that the proposed scheme can accurately estimate exceedance probability curves for nonlinear responses of interest, while achieving significant computational savings compared to single-fidelity variance reduction approaches.
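The last two steps described above can be sketched under stated assumptions. The strata-wise estimator below is the common control-variate form of multi-fidelity Monte Carlo, assumed here because the abstract does not give the exact estimator. The aggregation step is the total probability theorem, P_f = Σ_i P(S_i) · P(F | S_i). All function names, failure indicators, and stratum probabilities are hypothetical.

```python
def control_variate_estimate(hf_flags, lf_flags_paired, lf_flags_all, alpha=1.0):
    """Multi-fidelity estimate of a failure probability within one stratum.

    hf_flags:        0/1 failure indicators from the high-fidelity model (n samples)
    lf_flags_paired: low-fidelity indicators on the same n samples
    lf_flags_all:    low-fidelity indicators on a larger sample set
    The correction term has zero mean, so the estimate stays unbiased even if
    the low-fidelity model is imperfect. Note it can fall outside [0, 1].
    """
    n = len(hf_flags)
    if n == 0 or n != len(lf_flags_paired) or not lf_flags_all:
        raise ValueError("need paired HF/LF samples and a non-empty LF set")
    hf_mean = sum(hf_flags) / n
    lf_paired_mean = sum(lf_flags_paired) / n
    lf_all_mean = sum(lf_flags_all) / len(lf_flags_all)
    return hf_mean + alpha * (lf_all_mean - lf_paired_mean)

def total_failure_probability(stratum_probs, conditional_failure_probs):
    """Combine strata via the total probability theorem."""
    if len(stratum_probs) != len(conditional_failure_probs):
        raise ValueError("one conditional probability per stratum is required")
    if abs(sum(stratum_probs) - 1.0) > 1e-9:
        raise ValueError("stratum probabilities must sum to 1")
    return sum(w * p for w, p in zip(stratum_probs, conditional_failure_probs))

# Hypothetical indicators for a single stratum.
stratum_estimate = control_variate_estimate(
    hf_flags=[0, 1, 0, 0],
    lf_flags_paired=[0, 1, 1, 0],
    lf_flags_all=[0, 1, 1, 0, 0, 0, 1, 0],
)

# Hypothetical strata: most probability mass lies where failure is very unlikely.
overall = total_failure_probability([0.9, 0.09, 0.01], [0.0, 0.01, 0.5])
```

Stratification places samples in the low-probability, high-consequence strata that plain Monte Carlo would rarely reach. The overall estimate is then a probability-weighted sum of the strata-wise results.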
Design of a bioinspired robophysical antenna for insect-scale tactile perception and navigation
McDonnell, Parker, Meng, Lingsheng, Hariprasad, Hari Krishna, Hedrick, Alexander, Miscles, Eduardo, Gilinsky, Samuel, Mongeau, Jean-Michel, Jayaram, Kaushik
To whom correspondence should be addressed; E-mail: kaushik.jayaram@colorado.edu. Keywords: tactile sensor, capacitive sensing and robophysical antenna Abstract: The American cockroach (Periplaneta americana) uses its soft antennae to guide decision making by extracting rich tactile information from tens of thousands of distributed mechanosensors. Although tactile sensors enable robust, autonomous perception and navigation in natural systems, replicating these capabilities in insect-scale robots remains challenging due to stringent size, weight, and power constraints that limit existing sensor technologies. To overcome these limitations, we introduce CITRAS (Cockroach Inspired Tactile Robotic Antenna Sensor), a bioinspired, multi-segmented, compliant laminate sensor with embedded capacitive angle sensors. The segmented compliant structure passively bends in response to environmental stimuli, achieving accurate hinge angle measurements with maximum errors of just 0.79 Experimental evaluations demonstrate CITRAS' multifunctional tactile perception capabilities: predicting base-to-tip distances with 7.75 The future integration of this bioinspired tactile antenna in insect-scale robots addresses critical sensing gaps, promising enhanced autonomous exploration, obstacle avoidance, and environmental mapping in complex, confined environments. For instance, drawing inspiration from the compliant exoskeletons of arthropods, recent miniature robots are now capable of adaptive morphological changes, enabling unprecedented locomotion in confined spaces [8]. Notable examples include shape-morphing robots such as CLARI [9] and its miniature variant mCLARI [10], capable of lateral body compression to navigate narrow horizontal gaps. Such small-scale robots offer new opportunities for robotics, including environmental monitoring [11], high-value asset inspection [12], search-and-rescue operations [13], and targeted healthcare delivery [14].
Despite these advances, reliable autonomous operation remains elusive due to severe size, weight, and power (SWAP) constraints, significantly limiting onboard sensing and perception capabilities.
A Machine Learning Approach for Honey Adulteration Detection using Mineral Element Profiles
Al-Awadhi, Mokhtar A., Deshmukh, Ratnadeep R.
This paper aims to develop a Machine Learning (ML)-based system for detecting honey adulteration utilizing honey mineral element profiles. The proposed system comprises two phases: preprocessing and classification. The preprocessing phase involves the treatment of missing-value attributes and normalization. In the classification phase, we use three supervised ML models: logistic regression, decision tree, and random forest, to discriminate between authentic and adulterated honey. To evaluate the performance of the ML models, we use a public dataset comprising measurements of mineral element content of authentic honey, sugar syrups, and adulterated honey. Experimental findings show that mineral element content in honey provides robust discriminative information for detecting honey adulteration. Results also demonstrate that the random forest-based classifier outperforms other classifiers on this dataset, achieving the highest cross-validation accuracy of 98.37%.
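A preprocessing phase of this kind can be sketched as below. The abstract does not say which imputation or normalization scheme was used, so column-mean imputation and min-max scaling are assumptions here. The function name `impute_and_normalize` and the sample values are hypothetical.

```python
def impute_and_normalize(rows):
    """Mean-impute None entries per column, then min-max scale each column to [0, 1]."""
    if not rows:
        raise ValueError("no rows given")
    n_cols = len(rows[0])
    columns = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        col_mean = sum(observed) / len(observed)
        # Replace missing entries with the mean of the observed ones.
        filled = [row[j] if row[j] is not None else col_mean for row in rows]
        lo, hi = min(filled), max(filled)
        span = hi - lo
        # A constant column carries no information; map it to 0.0.
        columns.append([(v - lo) / span if span else 0.0 for v in filled])
    # Transpose back from column lists to row lists.
    return [list(row) for row in zip(*columns)]

# Hypothetical mineral concentrations for three samples; None marks a missing value.
raw = [[10.0, 200.0],
       [20.0, None],
       [30.0, 400.0]]
clean = impute_and_normalize(raw)
```

When this is combined with cross-validation, the column means and ranges would normally be computed on the training folds only and then applied to the held-out fold, so that no information leaks from the test data.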