Evaluating Control Protocols for Untrusted AI Agents

Kutasov, Jon, Loughridge, Chloe, Sun, Yuqi, Sleight, Henry, Shlegeris, Buck, Tracy, Tyler, Benton, Joe

arXiv.org Artificial Intelligence

As AI systems become more capable and widely deployed as agents, ensuring their safe operation becomes critical. AI control offers one approach to mitigating the risk from untrusted AI agents by monitoring their actions and intervening or auditing when necessary. Evaluating the safety of these protocols requires understanding both their effectiveness against current attacks and their robustness to adaptive adversaries. In this work, we systematically evaluate a range of control protocols in SHADE-Arena, a dataset of diverse agentic environments. First, we evaluate blue team protocols, including deferral to trusted models, resampling, and deferring on critical actions, against a default attack policy. We find that resampling for incrimination and deferring on critical actions perform best, increasing safety from 50% to 96%. We then iterate on red team strategies against these protocols and find that attack policies with additional affordances, such as knowledge of when resampling occurs or the ability to simulate monitors, can substantially improve attack success rates against our resampling strategy, decreasing safety to 17%. However, deferring on critical actions is highly robust to even our strongest red team strategies, demonstrating the importance of denying attack policies access to protocol internals.
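The "deferring on critical actions" protocol the abstract identifies as most robust can be pictured as a simple gating rule. The action types, threshold, and monitor score below are illustrative stand-ins, not the paper's actual protocol:

```python
# Sketch of a control protocol that audits high-suspicion actions and
# defers critical actions to a trusted model. All values are hypothetical.

CRITICAL_ACTIONS = {"execute_code", "send_email", "transfer_funds"}

def run_protocol(untrusted_action, trusted_policy, monitor_score,
                 audit_threshold=0.9):
    """Route one proposed action through the control protocol."""
    if monitor_score >= audit_threshold:
        return ("audit", untrusted_action)     # flag for human review
    if untrusted_action["type"] in CRITICAL_ACTIONS:
        # Defer: replace the untrusted agent's critical action with
        # the trusted (weaker but safe) model's proposal.
        return ("defer", trusted_policy(untrusted_action))
    return ("allow", untrusted_action)

def trusted(action):
    """Stand-in trusted model: rewrites the payload conservatively."""
    return {"type": action["type"], "payload": "trusted-rewrite"}

print(run_protocol({"type": "send_email", "payload": "..."}, trusted, 0.2))
# -> ('defer', {'type': 'send_email', 'payload': 'trusted-rewrite'})
```

The key property the paper highlights is that this rule uses no protocol internals the attacker could exploit: the deferral condition depends only on the action type, not on resampling state.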


RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation

Xue, Yuquan, Lu, Guanxing, Wu, Zhenyu, Zhang, Chuanrui, Jia, Bofang, Gu, Zhengyi, Tang, Yansong, Wang, Ziwei

arXiv.org Artificial Intelligence

Vision-Language-Action models (VLAs) have demonstrated remarkable performance on complex robotic manipulation tasks through imitation learning. However, existing imitation learning datasets contain only successful trajectories and lack failure or recovery data, especially for out-of-distribution (OOD) states where the robot deviates from the main policy due to minor perturbations or errors, leading VLA models to struggle with states deviating from the training distribution. To this end, we propose an automated OOD data augmentation framework named RESample through exploratory sampling. Specifically, we first leverage offline reinforcement learning to obtain an action-value network that accurately identifies sub-optimal actions under the current manipulation policy. We further sample potential OOD states from trajectories via rollout, and design an exploratory sampling mechanism that adaptively incorporates these action proxies into the training dataset to ensure efficiency. Subsequently, our framework explicitly encourages the VLAs to recover from OOD states and enhances their robustness against distributional shifts. We conduct extensive experiments on the LIBERO benchmark as well as real-world robotic manipulation tasks, demonstrating that RESample consistently improves the stability and generalization ability of VLA models.
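The exploratory-sampling loop described above can be sketched as follows. The toy environment, the stand-in action-value function, and the gap threshold are all invented for illustration; the paper's Q-network comes from offline RL and its sampling mechanism is more elaborate:

```python
class ToyEnv:
    """Tiny stand-in environment: states are integers on a line."""
    horizon = 5
    def reset(self): return 0
    def candidate_actions(self, s): return [-1, +1]
    def step(self, s, a): return s + a

def q_value(state, action):
    # Stand-in action-value network: always prefers moving right.
    return float(action)

def resample_augment(dataset, policy, env, n_rollouts=3, q_gap=0.5):
    """Roll out the policy, flag states where its action is clearly
    sub-optimal under the Q-network, and add recovery samples."""
    augmented = list(dataset)
    for _ in range(n_rollouts):
        s = env.reset()
        for _ in range(env.horizon):
            a = policy(s)
            best = max(env.candidate_actions(s), key=lambda b: q_value(s, b))
            if q_value(s, best) - q_value(s, a) > q_gap:
                augmented.append((s, best))   # recovery sample
            s = env.step(s, a)
    return augmented

always_left = lambda s: -1               # deliberately sub-optimal policy
data = resample_augment([], always_left, ToyEnv())
print(len(data))  # -> 15 (every step of every rollout is flagged)
```

Training on the augmented pairs is what explicitly teaches the VLA to recover from the flagged OOD states.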


Curia: A Multi-Modal Foundation Model for Radiology

Dancette, Corentin, Khlaut, Julien, Saporta, Antoine, Philippe, Helene, Ferreres, Elodie, Callard, Baptiste, Danielou, Théo, Alberge, Léo, Machado, Léo, Tordjman, Daniel, Dupuis, Julie, Floch, Korentin Le, Terrail, Jean Du, Moshiri, Mariam, Dercle, Laurent, Boeken, Tom, Gregory, Jules, Ronot, Maxime, Legou, François, Roux, Pascal, Sapoval, Marc, Manceron, Pierre, Hérent, Paul

arXiv.org Artificial Intelligence

Radiology is at the center of many medical specialties, which rely on radiologists' interpretation of images from various modalities, including CT, MRI, ultrasound, and X-ray [1]. The analysis of these images is crucial for detecting and characterizing medical conditions, quantifying disease progression, and monitoring treatment efficacy across a broad spectrum of diseases. AI has the potential to enhance radiology workflows and improve radiologists' efficiency, particularly for labor-intensive tasks such as image segmentation, or for specialized and/or complex tasks that are prone to inter-reader variability [2, 3]. To date, the dominant paradigm in radiological AI development has involved training specialized models for individual tasks such as segmentation, abnormality detection (e.g., tumor detection), or pathology classification. However, this "one-task, one-model" approach is exceptionally resource-intensive, as it necessitates the curation and manual annotation of large, task-specific datasets for each modality and clinical application [4, 5]. This is potentially one of the bottlenecks in moving AI radiology models into the clinical workflow. Foundation models (FMs) represent a significant paradigm shift in the field of AI.



Reviews: Exact sampling of determinantal point processes with sublinear time preprocessing

Neural Information Processing Systems

Review of "Exact sampling of determinantal point processes with sublinear time preprocessing." Summary: the paper proposes a new algorithm for exact sampling from determinantal point processes (DPPs). A DPP sampling problem involves an n x n matrix L; the probability of selecting a subset S of k ≤ n indices is given by the determinant of the corresponding k x k submatrix of L divided by the determinant of L + I. The proposed algorithm has preprocessing that is sublinear in the matrix size n and polynomial in the subset size k; sampling is polynomial in k and independent of n. Previous algorithms, which require an eigendecomposition or MCMC, are O(n^3). The main idea is to downsample the index set using a regularized DPP and then run a DPP sampler on this downsampled set.
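The DPP definition the review refers to, P(S) = det(L_S) / det(L + I), is easy to check numerically: the probabilities over all subsets must sum to one. The matrix below is an arbitrary positive-definite example, not from the paper:

```python
import numpy as np
from itertools import combinations

# P(S) = det(L_S) / det(L + I), where L_S is the submatrix of L
# indexed by S. Verify the probabilities sum to 1 over all subsets.
L = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.0, 0.3],
              [0.1, 0.3, 1.5]])
n = L.shape[0]
Z = np.linalg.det(L + np.eye(n))       # normalising constant det(L + I)

def dpp_prob(S):
    if not S:
        return 1.0 / Z                 # determinant of the empty matrix is 1
    idx = np.ix_(list(S), list(S))
    return np.linalg.det(L[idx]) / Z

total = sum(dpp_prob(S)
            for k in range(n + 1)
            for S in combinations(range(n), k))
print(round(total, 10))  # -> 1.0
```

The sum equals one because of the identity sum over S of det(L_S) = det(L + I), which is also why brute-force normalisation, unlike the paper's algorithm, costs exponential time in n.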


FILM: Framework for Imbalanced Learning Machines based on a new unbiased performance measure and a new ensemble-based technique

Guillén-Teruel, Antonio, Caracena, Marcos, Pardo, Jose A., de-la-Gándara, Fernando, Palma, José, Botía, Juan A.

arXiv.org Artificial Intelligence

This research addresses the challenges of handling imbalanced datasets for binary classification tasks. In such scenarios, standard evaluation metrics are often biased by the disproportionate representation of the minority class. Conducting experiments across seven datasets, we uncovered inconsistencies in evaluation metrics when determining the model that outperforms others for each binary classification problem. This justifies the need for a metric that provides a more consistent and unbiased evaluation across imbalanced datasets, thereby supporting robust model selection. To mitigate this problem, we propose a novel metric, the Unbiased Integration Coefficients (UIC), which exhibits significantly reduced bias ($p < 10^{-4}$) towards the minority class compared to conventional metrics. The UIC is constructed by aggregating existing metrics while penalising those more prone to imbalance. In addition, we introduce the Identical Partitions for Imbalance Problems (IPIP) algorithm, an ensemble-based approach for imbalanced ML problems. Our experimental results show that IPIP outperforms other baseline imbalance-aware approaches using Random Forest and Logistic Regression models in three out of seven datasets as assessed by the UIC metric, demonstrating its effectiveness in addressing imbalanced data challenges in binary classification tasks. This new framework for dealing with imbalanced datasets is materialized in the FILM (Framework for Imbalanced Learning Machines) R Package, accessible at https://github.com/antoniogt/FILM.
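The abstract only sketches the UIC construction (aggregate existing metrics, penalise those more prone to imbalance). The toy weighting below illustrates that idea; the weights and the normalisation are invented here and are not the paper's UIC definition:

```python
def toy_aggregate_metric(metric_scores, bias_weights):
    """Toy illustration of the construction the abstract describes:
    combine several metrics, down-weighting those more affected by
    class imbalance. The actual UIC formula is defined in the paper."""
    assert metric_scores.keys() == bias_weights.keys()
    weighted = [metric_scores[m] * (1.0 - bias_weights[m])
                for m in metric_scores]
    norm = sum(1.0 - w for w in bias_weights.values())
    return sum(weighted) / norm

scores = {"accuracy": 0.95, "f1": 0.60, "balanced_accuracy": 0.62}
# Higher bias weight = more distorted by imbalance (illustrative values:
# raw accuracy is the classic offender on skewed classes).
bias = {"accuracy": 0.8, "f1": 0.3, "balanced_accuracy": 0.1}
print(round(toy_aggregate_metric(scores, bias), 3))  # -> 0.649
```

Note how the inflated accuracy score contributes little to the aggregate, which is the qualitative behaviour the UIC is designed for.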


GrainPaint: A multi-scale diffusion-based generative model for microstructure reconstruction of large-scale objects

Hoffman, Nathan, Diniz, Cashen, Liu, Dehao, Rodgers, Theron, Tran, Anh, Fuge, Mark

arXiv.org Artificial Intelligence

Simulation-based approaches to microstructure generation can suffer from a variety of limitations, such as high memory usage, long computational times, and difficulties in generating complex geometries. Generative machine learning models present a way around these issues, but they have previously been limited by the fixed size of their generation area. We present a new microstructure generation methodology leveraging advances in inpainting using denoising diffusion models to overcome this generation area limitation. We show that microstructures generated with the presented methodology are statistically similar to grain structures generated with a kinetic Monte Carlo simulator, SPPARKS.

* These authors contributed equally to this work.


Time series classification with random convolution kernels based transforms: pooling operators and input representations matter

Lo, Mouhamadou Mansour, Morvan, Gildas, Rossi, Mathieu, Morganti, Fabrice, Mercier, David

arXiv.org Artificial Intelligence

This article presents a new approach based on MiniRocket, called SelF-Rocket, for fast time series classification (TSC). Unlike existing approaches based on random convolution kernels, it dynamically selects the best pairing of input representation and pooling operator during the training process. SelF-Rocket achieves state-of-the-art accuracy on the University of California Riverside (UCR) TSC benchmark datasets.
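The selection step described above can be sketched as a validation grid search over (representation, pooling) pairs. The representation names, scores, and scoring function below are placeholders; only "ppv" (proportion of positive values) is a real MiniRocket pooling operator:

```python
from itertools import product

def select_pair(representations, poolings, score):
    """Pick the (representation, pooling) pair with the best validation
    score, as a stand-in for SelF-Rocket's dynamic selection."""
    return max(product(representations, poolings),
               key=lambda pair: score(*pair))

# Placeholder validation accuracies, e.g. from a held-out split.
val_acc = {("raw", "ppv"): 0.84, ("raw", "max"): 0.80,
           ("diff", "ppv"): 0.88, ("diff", "max"): 0.83}
best = select_pair(["raw", "diff"], ["ppv", "max"],
                   lambda r, p: val_acc[(r, p)])
print(best)  # -> ('diff', 'ppv')
```

The point of doing this per dataset, rather than fixing one representation and one pooling operator globally, is that the best pair varies across the UCR datasets.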


Explorations of Self-Repair in Language Models

Rushing, Cody, Nanda, Neel

arXiv.org Artificial Intelligence

Prior interpretability research studying narrow distributions has preliminarily identified self-repair, a phenomenon in which, if components of a large language model are ablated, later components change their behavior to compensate. Our work builds on this literature, demonstrating that self-repair occurs across a variety of model families and sizes when individual attention heads are ablated on the full training distribution. We further show that on the full training distribution self-repair is imperfect, since the original direct effect of the head is not fully restored, and noisy, since the degree of self-repair varies significantly across prompts (sometimes overcorrecting beyond the original effect). We highlight two mechanisms that contribute to self-repair: changes in the final LayerNorm scaling factor and sparse sets of neurons implementing Anti-Erasure. We additionally discuss the implications of these results for interpretability practitioners, and close with a more speculative discussion of why self-repair occurs in these models at all, highlighting evidence for the Iterative Inference hypothesis in language models, a framework that predicts self-repair.
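The imperfect-and-noisy character of self-repair can be quantified with a simple ratio: how much of an ablated head's direct effect the later components restored. The ratio below is a plausible bookkeeping device, and the per-prompt numbers are invented, not the paper's measurements:

```python
def self_repair_ratio(direct_effect, total_effect_after_ablation):
    """Fraction of an ablated head's direct effect that later components
    repaired: 1.0 = full repair, < 1.0 = partial, > 1.0 = overcorrection
    beyond the original effect."""
    repaired = direct_effect - total_effect_after_ablation
    return repaired / direct_effect

# Hypothetical logit-difference numbers for three prompts, all with the
# same direct effect but different residual effects after ablation.
for de, te in [(2.0, 1.2), (2.0, 0.1), (2.0, -0.4)]:
    print(round(self_repair_ratio(de, te), 2))
# -> 0.4, 0.95, 1.2  (partial, near-full, overcorrecting)
```

The spread across the three prompts mirrors the paper's finding that the degree of self-repair varies significantly from prompt to prompt.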