AITopics | stress test

stress test

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Triangulation as an Acceptance Rule for Multilingual Mechanistic Interpretability

Long, Yanan

arXiv.org Machine Learning, Jan-1-2026

Multilingual language models achieve strong aggregate performance yet often behave unpredictably across languages, scripts, and cultures. We argue that mechanistic explanations for such models should satisfy a causal standard: claims must survive causal interventions and must cross-reference across environments that perturb surface form while preserving meaning. We formalize reference families as predicate-preserving variants and introduce triangulation, an acceptance rule requiring necessity (ablating the circuit degrades the target behavior), sufficiency (patching activations transfers the behavior), and invariance (both effects remain directionally stable and of sufficient magnitude across the reference family). To supply candidate subgraphs, we adopt automatic circuit discovery and accept or reject those candidates by triangulation. We ground triangulation in causal abstraction by casting it as an approximate transformation score over a distribution of interchange interventions, connect it to the pragmatic interpretability agenda, and present a comparative experimental protocol across multiple model families, language pairs, and tasks. Triangulation provides a falsifiable standard for mechanistic claims that filters spurious circuits passing single-environment tests but failing cross-lingual invariance.
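A minimal sketch of how an acceptance rule of this shape could be checked, assuming each environment in a reference family yields two scalar effects. The `EnvEffect` structure, the `triangulate` function, and the `min_effect` threshold are hypothetical names introduced for illustration; the abstract does not specify them, and this does not implement the approximate transformation score the paper describes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EnvEffect:
    """Effects measured in one environment of a reference family (hypothetical)."""
    ablation_drop: float  # drop in the target-behavior metric when the circuit is ablated
    patch_gain: float     # gain in the target-behavior metric when activations are patched in


def triangulate(effects: Sequence[EnvEffect], min_effect: float = 0.1) -> bool:
    """Accept a candidate circuit only if, in every environment, ablation hurts
    (necessity) and patching helps (sufficiency) by at least min_effect.

    Requiring the same sign and a minimum magnitude everywhere is a simple
    stand-in for the invariance condition; min_effect is an assumed threshold.
    """
    if not effects:
        return False  # nothing measured, so nothing can be accepted
    return all(
        e.ablation_drop >= min_effect and e.patch_gain >= min_effect
        for e in effects
    )
```

Under these assumptions, a candidate that passes in one language but shows a reversed ablation effect in another is rejected, which is the filtering behavior the abstract attributes to triangulation.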

intervention, large language model, machine learning, (18 more...)

arXiv.org Machine Learning

2512.24842

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

LLMs Show Surface-Form Brittleness Under Paraphrase Stress Tests

Carranza, Juan Miguel Navarro

arXiv.org Artificial Intelligence, Oct-13-2025

Benchmark scores for Large Language Models (LLMs) can be inflated by memorization of test items or near duplicates. We present a simple protocol that probes generalization by re-evaluating models on paraphrased versions of benchmark questions. Using Mistral-7B-Instruct and Qwen2.5-7B-Instruct, we measure the accuracy gap between original and paraphrased items on ARC-Easy and ARC-Challenge. Our pipeline controls decoding, enforces multiple-choice output format, and includes a robust paraphrase-cleaning step to preserve semantics. We find that paraphrasing induces a non-trivial accuracy drop (original vs. paraphrased), consistent with prior concerns about contamination and brittle surface-form shortcuts.
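The accuracy gap mentioned above can be sketched as follows, assuming model answers have already been collected as multiple-choice letters. The function names `accuracy` and `paraphrase_gap` are hypothetical; decoding control and paraphrase cleaning, which the abstract describes, are outside this sketch.

```python
from typing import Sequence


def accuracy(preds: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that match the gold answer letters."""
    if len(preds) != len(gold) or not gold:
        raise ValueError("preds and gold must be non-empty and the same length")
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


def paraphrase_gap(
    orig_preds: Sequence[str], para_preds: Sequence[str], gold: Sequence[str]
) -> float:
    """Accuracy on original items minus accuracy on their paraphrases.

    A positive value means the model does worse once the surface form changes.
    """
    return accuracy(orig_preds, gold) - accuracy(para_preds, gold)
```

For example, with four items, three correct on the originals and two correct on the paraphrases, the gap is 0.75 - 0.5 = 0.25.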

large language model, natural language, qwen2, (12 more...)

arXiv.org Artificial Intelligence

2510.08616

Genre: Research Report (0.53)

Industry: Education (0.89)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Counterfactual Invariance to Spurious Correlations: Why and How to Pass Stress Tests

Veitch, Victor, D'Amour, Alexander, Yadlowsky, Steve, Eisenstein, Jacob

Neural Information Processing Systems, Aug-15-2025, 16:19:06 GMT

Informally, a 'spurious correlation' is the dependence of a model on some aspect of the input data that an analyst thinks shouldn't matter. In machine learning, these have a know-it-when-you-see-it character; e.g., changing the gender of a sentence's subject changes a sentiment predictor's output. To check for spurious correlations, we can 'stress test' models by perturbing irrelevant parts of input data and seeing if model predictions change. In this paper, we study stress testing using the tools of causal inference. We introduce counterfactual invariance as a formalization of the requirement that changing irrelevant parts of the input shouldn't change model predictions.
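The perturb-and-compare check described above can be sketched with a toy example. Everything here is hypothetical and for illustration only: the crude lowercase pronoun swap, the two toy predictors, and the `flip_rate` helper. The sketch covers only the empirical stress test, not the causal formalization the paper develops.

```python
import re
from typing import Callable, Iterable

# Crude, lowercase-only pronoun swap (an assumption; "her" is ambiguous in real text).
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his"}


def swap_gender(text: str) -> str:
    """Perturb a part of the input that should not matter for sentiment."""
    return re.sub(r"\b(he|she|his|her)\b", lambda m: SWAPS[m.group(0)], text)


def flip_rate(
    predict: Callable[[str], str],
    texts: Iterable[str],
    perturb: Callable[[str], str],
) -> float:
    """Fraction of inputs whose prediction changes under the perturbation."""
    texts = list(texts)
    if not texts:
        raise ValueError("need at least one input")
    flips = sum(predict(t) != predict(perturb(t)) for t in texts)
    return flips / len(texts)


def biased_predict(text: str) -> str:
    """Toy predictor that wrongly depends on the subject's gender."""
    words = text.split()
    return "pos" if "great" in words and "he" in words else "neg"


def fair_predict(text: str) -> str:
    """Toy predictor that ignores gender."""
    return "pos" if "great" in text.split() else "neg"
```

A flip rate of zero on a set of inputs is evidence of invariance on that set only; a nonzero rate flags a dependence the analyst considers spurious.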

artificial intelligence, counterfactual invariance, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Industry:

Leisure & Entertainment (0.93)
Media > Film (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Electromechanical Dynamics of the Heart: A Study of Cardiac Hysteresis During Physical Stress Test

Karimi, Sajjad, Karimi, Shirin, Shah, Amit J., Clifford, Gari D., Sameni, Reza

arXiv.org Artificial Intelligence, Oct-25-2024

Cardiovascular diseases are best diagnosed using multiple modalities that assess both the heart's electrical and mechanical functions. While effective, imaging techniques like echocardiography and nuclear imaging are costly and not widely accessible. More affordable technologies, such as simultaneous electrocardiography (ECG) and phonocardiography (PCG), may provide valuable insights into electromechanical coupling and could be useful for prescreening in low-resource settings. Using physical stress test data from the EPHNOGRAM ECG-PCG dataset, collected from 23 healthy male subjects (age: 25.4 ± 1.9 years), we investigated electromechanical intervals (RR, QT, systolic, and diastolic) and their interactions during exercise, along with hysteresis between cardiac electrical activity and mechanical responses. Time delay analysis revealed distinct temporal relationships between QT, systolic, and diastolic intervals, with RR as the primary driver. The diastolic interval showed near-synchrony with RR, while QT responded to RR interval changes with an average delay of 10.5 s, and the systolic interval responded more slowly, with an average delay of 28.3 s. We examined QT-RR, systolic-RR, and diastolic-RR hysteresis, finding narrower loops for diastolic-RR and wider loops for systolic-RR. Significant correlations (average: 0.75) were found between heart rate changes and hysteresis loop areas, suggesting the equivalent circular area diameter as a promising biomarker for cardiac function under exercise stress. Deep learning models, including Long Short-Term Memory and Convolutional Neural Networks, estimated the QT, systolic, and diastolic intervals from RR data, confirming the nonlinear relationship between RR and other intervals. Findings highlight a significant cardiac memory effect, linking ECG and PCG morphology and timing to heart rate history.
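One common way to estimate a delay between a driving series and a responding series is to pick the lag that maximizes their correlation. The sketch below assumes that approach and evenly sampled series; the abstract does not say which time delay method the authors used, and `pearson` and `best_lag` are hypothetical helper names.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lag(driver: Sequence[float], response: Sequence[float], max_lag: int) -> int:
    """Lag k in [0, max_lag], in samples, that maximizes the correlation
    between driver[t] and response[t + k]."""
    if len(driver) != len(response):
        raise ValueError("series must have the same length")
    if max_lag < 0 or max_lag >= len(driver) - 1:
        raise ValueError("max_lag must leave at least two overlapping samples")
    best_k, best_r = 0, -2.0  # -2.0 is below any possible correlation
    for k in range(max_lag + 1):
        r = pearson(driver[: len(driver) - k], response[k:])
        if r > best_r:
            best_k, best_r = k, r
    return best_k
```

Multiplying the returned lag by the sampling interval gives a delay in seconds, comparable in kind to the delays reported above.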

artificial intelligence, machine learning, rr interval, (20 more...)

arXiv.org Artificial Intelligence

2410.19667

Country:

Europe > Portugal > Coimbra > Coimbra (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics

Cosma, Adrian, Ruseti, Stefan, Dascalu, Mihai, Caragea, Cornelia

arXiv.org Artificial Intelligence, Oct-4-2024

Natural Language Inference (NLI) evaluation is crucial for assessing language understanding models; however, popular datasets suffer from systematic spurious correlations that artificially inflate actual model performance. To address this, we propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples. We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics. This categorization significantly reduces spurious correlation measures, with examples labeled as having the highest difficulty showing markedly decreased performance and encompassing more realistic and diverse linguistic phenomena. When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset, surpassing other dataset characterization techniques. Our research addresses limitations in NLI dataset construction, providing a more authentic evaluation of model performance with implications for diverse NLU applications.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2410.03429

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(10 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

InstaGrasp: An Entirely 3D Printed Adaptive Gripper with TPU Soft Elements and Minimal Assembly Time

Zhou, Xin, Spiers, Adam J.

arXiv.org Artificial Intelligence, May-26-2023

Fabricating existing and popular open-source adaptive robotic grippers commonly involves using multiple professional machines, purchasing a wide range of parts, and tedious, time-consuming assembly processes. This poses a significant barrier to entry for some robotics researchers and drives others to opt for expensive commercial alternatives. To provide both parties with an easier and cheaper (under 100 GBP) solution, we propose a novel adaptive gripper design where every component (with the exception of actuators and the screws that come packaged with them) can be fabricated on a hobby-grade 3D printer, via a combination of inexpensive and readily available PLA and TPU filaments. This approach means that the gripper's tendons, flexure joints and finger pads are now printed, as a replacement for traditional string-tendons and molded urethane flexures and pads. A push-fit system results in an assembly time of under 10 minutes. The gripper design is also highly modular and requires only a few minutes to replace any part, leading to extremely user-friendly maintenance and part modifications. An extensive stress test has shown a level of durability more than suitable for research, whilst grasping experiments (with perturbations) using items from the YCB object set have also proven its mechanical adaptability to be highly satisfactory.

artificial intelligence, instagrasp, tendon, (15 more...)

arXiv.org Artificial Intelligence

2305.17029

Country:

North America > United States (0.05)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.50)

Industry: Machinery > Industrial Machinery (0.50)

Technology: Information Technology > Artificial Intelligence > Robots > Manipulation (0.48)

Should Bank Stress Tests Be Fair?

Glasserman, Paul, Li, Mike

arXiv.org Artificial Intelligence, May-12-2023

Regulatory stress tests have become one of the main tools for setting capital requirements at the largest U.S. banks. The Federal Reserve uses confidential models to evaluate bank-specific outcomes for bank-specific portfolios in shared stress scenarios. As a matter of policy, the same models are used for all banks, despite considerable heterogeneity across institutions; individual banks have contended that some models are not suited to their businesses. Motivated by this debate, we ask, what is a fair aggregation of individually tailored models into a common model? We argue that simply pooling data across banks treats banks equally but is subject to two deficiencies: it may distort the impact of legitimate portfolio features, and it is vulnerable to implicit misdirection of legitimate information to infer bank identity. We compare various notions of regression fairness to address these deficiencies, considering both forecast accuracy and equal treatment. In the setting of linear models, we argue for estimating and then discarding centered bank fixed effects as preferable to simply ignoring differences across banks. We present evidence that the overall impact can be material. We also discuss extensions to nonlinear models.

artificial intelligence, machine learning, modeling & simulation, (20 more...)

arXiv.org Artificial Intelligence

2207.13319

Country:

North America > United States > District of Columbia > Washington (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(6 more...)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Banking & Finance > Trading (1.00)
Banking & Finance > Loans (1.00)
Banking & Finance > Economy (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

DeScoD-ECG: Deep Score-Based Diffusion Model for ECG Baseline Wander and Noise Removal

Li, Huayu, Ditzler, Gregory, Roveda, Janet, Li, Ao

arXiv.org Artificial Intelligence, Jan-17-2023

Objective: Electrocardiogram (ECG) signals commonly suffer from noise interference, such as baseline wander. High-quality and high-fidelity reconstruction of the ECG signals is of great significance to diagnosing cardiovascular diseases. Therefore, this paper proposes a novel ECG baseline wander and noise removal technology. Methods: We extended the diffusion model in a conditional manner that was specific to the ECG signals, namely the Deep Score-Based Diffusion model for Electrocardiogram baseline wander and noise removal (DeScoD-ECG). Moreover, we deployed a multi-shots averaging strategy that improved signal reconstructions. We conducted the experiments on the QT Database and the MIT-BIH Noise Stress Test Database to verify the feasibility of the proposed method. Baseline methods are adopted for comparison, including traditional digital filter-based and deep learning-based methods. Results: The quantitative evaluation results show that the proposed method obtained outstanding performance on four distance-based similarity metrics with at least 20% overall improvement compared with the best baseline method. Conclusion: This paper demonstrates the state-of-the-art performance of the DeScoD-ECG for ECG baseline wander and noise removal, which has better approximations of the true data distribution and higher stability under extreme noise corruptions. Significance: This study is one of the first to extend the conditional diffusion-based generative model for ECG noise removal, and the DeScoD-ECG has the potential to be widely used in biomedical applications.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/JBHI.2023.3237712

2208.00542

Country:

North America > United States > Arizona > Pima County > Tucson (0.14)
North America > United States > New Jersey > Gloucester County > Glassboro (0.04)
North America > United States > Massachusetts (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Using machine learning to forecast amine emissions

AIHub, Jan-16-2023, 10:50:48 GMT

Global warming is partly due to the vast amount of carbon dioxide that we release, mostly from power generation and industrial processes, such as making steel and cement. For a while now, chemical engineers have been exploring carbon capture, a process that can separate carbon dioxide and store it in ways that keep it out of the atmosphere. This is done in dedicated carbon-capture plants, whose chemical process involves amines, compounds that are already used to capture carbon dioxide from natural gas processing and refining plants. Amines are also used in certain pharmaceuticals, epoxy resins, and dyes. The problem is that amines could also be harmful to the environment as well as a health hazard, making it essential to mitigate their impact.

artificial intelligence, europe government, machine learning, (17 more...)

AIHub

Country: Europe (0.52)

Industry:

Materials > Chemicals > Commodity Chemicals > Petrochemicals (1.00)
Energy > Oil & Gas > Downstream (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.90)

ASUS Launches Z790 Series Motherboards for 13th Gen Intel Core Processors

#artificialintelligence, Sep-28-2022, 10:36:09 GMT

ASUS announced a comprehensive lineup of Intel Z790 motherboards across the ROG Maximus, ROG Strix, TUF Gaming, and Prime product families, all built to support the latest 13th Gen Intel Core processors. Thanks to exclusive technologies like AEMP II, AI Overclocking and AI Cooling II, plus user-friendly features such as Q-Design, ASUS Z790 motherboards are ideal solutions for users aiming to build a next-gen machine or upgrade their existing system. When ASUS first launched the previous-gen Z690 models, DDR5 memory modules had only recently hit the market. Many builders looked for assurance that their new memory would work as advertised in their new motherboards, and ASUS went above and beyond to make that happen. It collaborated closely with a wide range of industry partners to offer ASUS Enhanced Memory Profile (AEMP), which ensures better and wider compatibility with popular brands of RAM.

13th gen intel core processor, ai overclocking, motherboard, (9 more...)

#artificialintelligence

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.99)
