Evaluating Control Protocols for Untrusted AI Agents

Kutasov, Jon, Loughridge, Chloe, Sun, Yuqi, Sleight, Henry, Shlegeris, Buck, Tracy, Tyler, Benton, Joe

arXiv.org Artificial Intelligence

As AI systems become more capable and widely deployed as agents, ensuring their safe operation becomes critical. AI control offers one approach to mitigating the risk from untrusted AI agents by monitoring their actions and intervening or auditing when necessary. Evaluating the safety of these protocols requires understanding both their effectiveness against current attacks and their robustness to adaptive adversaries. In this work, we systematically evaluate a range of control protocols in SHADE-Arena, a dataset of diverse agentic environments. First, we evaluate blue team protocols, including deferral to trusted models, resampling, and deferring on critical actions, against a default attack policy. We find that resampling for incrimination and deferring on critical actions perform best, increasing safety from 50% to 96%. We then iterate on red team strategies against these protocols and find that attack policies with additional affordances, such as knowledge of when resampling occurs or the ability to simulate monitors, can substantially improve attack success rates against our resampling strategy, decreasing safety to 17%. However, deferring on critical actions is highly robust to even our strongest red team strategies, demonstrating the importance of denying attack policies access to protocol internals.
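The "deferring on critical actions" protocol the abstract identifies as most robust can be pictured as a simple gating rule. The action types, threshold, and monitor score below are illustrative stand-ins, not the paper's actual protocol:

```python
# Sketch of a control protocol that audits high-suspicion actions and
# defers critical actions to a trusted model. All values are hypothetical.

CRITICAL_ACTIONS = {"execute_code", "send_email", "transfer_funds"}

def run_protocol(untrusted_action, trusted_policy, monitor_score,
                 audit_threshold=0.9):
    """Route one proposed action through the control protocol."""
    if monitor_score >= audit_threshold:
        return ("audit", untrusted_action)     # flag for human review
    if untrusted_action["type"] in CRITICAL_ACTIONS:
        # Defer: replace the untrusted agent's critical action with
        # the trusted (weaker but safe) model's proposal.
        return ("defer", trusted_policy(untrusted_action))
    return ("allow", untrusted_action)

def trusted(action):
    """Stand-in trusted model: rewrites the payload conservatively."""
    return {"type": action["type"], "payload": "trusted-rewrite"}

print(run_protocol({"type": "send_email", "payload": "..."}, trusted, 0.2))
# -> ('defer', {'type': 'send_email', 'payload': 'trusted-rewrite'})
```

The key property the paper highlights is that this rule uses no protocol internals the attacker could exploit: the deferral condition depends only on the action type, not on resampling state.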


RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation

Xue, Yuquan, Lu, Guanxing, Wu, Zhenyu, Zhang, Chuanrui, Jia, Bofang, Gu, Zhengyi, Tang, Yansong, Wang, Ziwei

arXiv.org Artificial Intelligence

Vision-Language-Action models (VLAs) have demonstrated remarkable performance on complex robotic manipulation tasks through imitation learning. However, existing imitation learning datasets contain only successful trajectories and lack failure or recovery data, especially for out-of-distribution (OOD) states where the robot deviates from the main policy due to minor perturbations or errors, leading VLA models to struggle with states deviating from the training distribution. To this end, we propose an automated OOD data augmentation framework named RESample through exploratory sampling. Specifically, we first leverage offline reinforcement learning to obtain an action-value network that accurately identifies sub-optimal actions under the current manipulation policy. We further sample potential OOD states from trajectories via rollout, and design an exploratory sampling mechanism that adaptively incorporates these action proxies into the training dataset to ensure efficiency. Subsequently, our framework explicitly encourages the VLAs to recover from OOD states and enhances their robustness against distributional shifts. We conduct extensive experiments on the LIBERO benchmark as well as real-world robotic manipulation tasks, demonstrating that RESample consistently improves the stability and generalization ability of VLA models.
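The exploratory-sampling loop described above can be sketched as follows. The toy environment, the stand-in action-value function, and the gap threshold are all invented for illustration; the paper's Q-network comes from offline RL and its sampling mechanism is more elaborate:

```python
class ToyEnv:
    """Tiny stand-in environment: states are integers on a line."""
    horizon = 5
    def reset(self): return 0
    def candidate_actions(self, s): return [-1, +1]
    def step(self, s, a): return s + a

def q_value(state, action):
    # Stand-in action-value network: always prefers moving right.
    return float(action)

def resample_augment(dataset, policy, env, n_rollouts=3, q_gap=0.5):
    """Roll out the policy, flag states where its action is clearly
    sub-optimal under the Q-network, and add recovery samples."""
    augmented = list(dataset)
    for _ in range(n_rollouts):
        s = env.reset()
        for _ in range(env.horizon):
            a = policy(s)
            best = max(env.candidate_actions(s), key=lambda b: q_value(s, b))
            if q_value(s, best) - q_value(s, a) > q_gap:
                augmented.append((s, best))   # recovery sample
            s = env.step(s, a)
    return augmented

always_left = lambda s: -1               # deliberately sub-optimal policy
data = resample_augment([], always_left, ToyEnv())
print(len(data))  # -> 15 (every step of every rollout is flagged)
```

Training on the augmented pairs is what explicitly teaches the VLA to recover from the flagged OOD states.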


Curia: A Multi-Modal Foundation Model for Radiology

Dancette, Corentin, Khlaut, Julien, Saporta, Antoine, Philippe, Helene, Ferreres, Elodie, Callard, Baptiste, Danielou, Théo, Alberge, Léo, Machado, Léo, Tordjman, Daniel, Dupuis, Julie, Floch, Korentin Le, Terrail, Jean Du, Moshiri, Mariam, Dercle, Laurent, Boeken, Tom, Gregory, Jules, Ronot, Maxime, Legou, François, Roux, Pascal, Sapoval, Marc, Manceron, Pierre, Hérent, Paul

arXiv.org Artificial Intelligence

Radiology is at the center of many medical specialties, which rely on radiologists' interpretation of images from various modalities, including CT, MRI, ultrasound, and X-ray [1]. The analysis of these images is crucial for detecting and characterizing medical conditions, quantifying disease progression, and monitoring treatment efficacy across a broad spectrum of diseases. AI has the potential to enhance radiology workflows and improve radiologists' efficiency, particularly for labor-intensive tasks such as image segmentation, or for specialized and/or complex tasks that are prone to inter-reader variability [2, 3]. To date, the dominant paradigm in radiological AI development has involved training specialized models for individual tasks such as segmentation, abnormality detection (e.g., tumor detection), or pathology classification. However, this "one-task, one-model" approach is exceptionally resource-intensive, as it necessitates the curation and manual annotation of large, task-specific datasets for each modality and clinical application [4, 5]. This is potentially one of the bottlenecks in moving AI radiology models into the clinical workflow. Foundation models (FMs) represent a significant paradigm shift in the field of AI.



Reviews: Exact sampling of determinantal point processes with sublinear time preprocessing

Neural Information Processing Systems

Review of "Exact sampling of determinantal point processes with sublinear time preprocessing." Summary: the paper proposes a new algorithm for exact sampling from determinantal point processes (DPPs). A DPP sampling problem involves an n x n matrix L; the probability of selecting a subset S of k ≤ n indices is given by the determinant of the corresponding k x k submatrix of L divided by the determinant of L + I. The proposed algorithm has preprocessing that is sublinear in the matrix size n and polynomial in the subset size k; sampling is polynomial in k and independent of n. Previous algorithms, which require an eigendecomposition or MCMC, are O(n^3). The main idea is to downsample the index set using a regularized DPP and then run a DPP sampler on this downsampled set.
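The DPP definition the review refers to, P(S) = det(L_S) / det(L + I), is easy to check numerically: the probabilities over all subsets must sum to one. The matrix below is an arbitrary positive-definite example, not from the paper:

```python
import numpy as np
from itertools import combinations

# P(S) = det(L_S) / det(L + I), where L_S is the submatrix of L
# indexed by S. Verify the probabilities sum to 1 over all subsets.
L = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.0, 0.3],
              [0.1, 0.3, 1.5]])
n = L.shape[0]
Z = np.linalg.det(L + np.eye(n))       # normalising constant det(L + I)

def dpp_prob(S):
    if not S:
        return 1.0 / Z                 # determinant of the empty matrix is 1
    idx = np.ix_(list(S), list(S))
    return np.linalg.det(L[idx]) / Z

total = sum(dpp_prob(S)
            for k in range(n + 1)
            for S in combinations(range(n), k))
print(round(total, 10))  # -> 1.0
```

The sum equals one because of the identity sum over S of det(L_S) = det(L + I), which is also why brute-force normalisation, unlike the paper's algorithm, costs exponential time in n.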


FILM: Framework for Imbalanced Learning Machines based on a new unbiased performance measure and a new ensemble-based technique

Guillén-Teruel, Antonio, Caracena, Marcos, Pardo, Jose A., de-la-Gándara, Fernando, Palma, José, Botía, Juan A.

arXiv.org Artificial Intelligence

This research addresses the challenges of handling imbalanced datasets for binary classification tasks. In such scenarios, standard evaluation metrics are often biased by the disproportionate representation of the minority class. Conducting experiments across seven datasets, we uncovered inconsistencies in evaluation metrics when determining the model that outperforms others for each binary classification problem. This justifies the need for a metric that provides a more consistent and unbiased evaluation across imbalanced datasets, thereby supporting robust model selection. To mitigate this problem, we propose a novel metric, the Unbiased Integration Coefficients (UIC), which exhibits significantly reduced bias ($p < 10^{-4}$) towards the minority class compared to conventional metrics. The UIC is constructed by aggregating existing metrics while penalising those more prone to imbalance. In addition, we introduce the Identical Partitions for Imbalance Problems (IPIP) algorithm, an ensemble-based approach for imbalanced ML problems. Our experimental results show that IPIP outperforms other baseline imbalance-aware approaches using Random Forest and Logistic Regression models in three out of seven datasets as assessed by the UIC metric, demonstrating its effectiveness in addressing imbalanced data challenges in binary classification tasks. This new framework for dealing with imbalanced datasets is materialized in the FILM (Framework for Imbalanced Learning Machines) R Package, accessible at https://github.com/antoniogt/FILM.
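The abstract only sketches the UIC construction (aggregate existing metrics, penalise those more prone to imbalance). The toy weighting below illustrates that idea; the weights and the normalisation are invented here and are not the paper's UIC definition:

```python
def toy_aggregate_metric(metric_scores, bias_weights):
    """Toy illustration of the construction the abstract describes:
    combine several metrics, down-weighting those more affected by
    class imbalance. The actual UIC formula is defined in the paper."""
    assert metric_scores.keys() == bias_weights.keys()
    weighted = [metric_scores[m] * (1.0 - bias_weights[m])
                for m in metric_scores]
    norm = sum(1.0 - w for w in bias_weights.values())
    return sum(weighted) / norm

scores = {"accuracy": 0.95, "f1": 0.60, "balanced_accuracy": 0.62}
# Higher bias weight = more distorted by imbalance (illustrative values:
# raw accuracy is the classic offender on skewed classes).
bias = {"accuracy": 0.8, "f1": 0.3, "balanced_accuracy": 0.1}
print(round(toy_aggregate_metric(scores, bias), 3))  # -> 0.649
```

Note how the inflated accuracy score contributes little to the aggregate, which is the qualitative behaviour the UIC is designed for.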


GrainPaint: A multi-scale diffusion-based generative model for microstructure reconstruction of large-scale objects

Hoffman, Nathan, Diniz, Cashen, Liu, Dehao, Rodgers, Theron, Tran, Anh, Fuge, Mark

arXiv.org Artificial Intelligence

Simulation-based approaches to microstructure generation can suffer from a variety of limitations, such as high memory usage, long computational times, and difficulties in generating complex geometries. Generative machine learning models present a way around these issues, but they have previously been limited by the fixed size of their generation area. We present a new microstructure generation methodology leveraging advances in inpainting using denoising diffusion models to overcome this generation area limitation. We show that microstructures generated with the presented methodology are statistically similar to grain structures generated with a kinetic Monte Carlo simulator, SPPARKS.

* These authors contributed equally to this work.


Time series classification with random convolution kernels based transforms: pooling operators and input representations matter

Lo, Mouhamadou Mansour, Morvan, Gildas, Rossi, Mathieu, Morganti, Fabrice, Mercier, David

arXiv.org Artificial Intelligence

This article presents a new approach based on MiniRocket, called SelF-Rocket, for fast time series classification (TSC). Unlike existing approaches based on random convolution kernels, it dynamically selects the best pairing of input representation and pooling operator during the training process. SelF-Rocket achieves state-of-the-art accuracy on the University of California Riverside (UCR) TSC benchmark datasets.
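The selection step described above can be sketched as a validation grid search over (representation, pooling) pairs. The representation names, scores, and scoring function below are placeholders; only "ppv" (proportion of positive values) is a real MiniRocket pooling operator:

```python
from itertools import product

def select_pair(representations, poolings, score):
    """Pick the (representation, pooling) pair with the best validation
    score, as a stand-in for SelF-Rocket's dynamic selection."""
    return max(product(representations, poolings),
               key=lambda pair: score(*pair))

# Placeholder validation accuracies, e.g. from a held-out split.
val_acc = {("raw", "ppv"): 0.84, ("raw", "max"): 0.80,
           ("diff", "ppv"): 0.88, ("diff", "max"): 0.83}
best = select_pair(["raw", "diff"], ["ppv", "max"],
                   lambda r, p: val_acc[(r, p)])
print(best)  # -> ('diff', 'ppv')
```

The point of doing this per dataset, rather than fixing one representation and one pooling operator globally, is that the best pair varies across the UCR datasets.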


Explorations of Self-Repair in Language Models

Rushing, Cody, Nanda, Neel

arXiv.org Artificial Intelligence

Prior interpretability research studying narrow distributions has preliminarily identified self-repair, a phenomenon in which, if components of a large language model are ablated, later components change their behavior to compensate. Our work builds on this literature, demonstrating that self-repair occurs across a variety of model families and sizes when individual attention heads are ablated on the full training distribution. We further show that on the full training distribution self-repair is imperfect, since the original direct effect of the head is not fully restored, and noisy, since the degree of self-repair varies significantly across prompts (sometimes overcorrecting beyond the original effect). We highlight two mechanisms that contribute to self-repair: changes in the final LayerNorm scaling factor and sparse sets of neurons implementing Anti-Erasure. We additionally discuss the implications of these results for interpretability practitioners, and close with a more speculative discussion of why self-repair occurs in these models at all, highlighting evidence for the Iterative Inference hypothesis in language models, a framework that predicts self-repair.
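The imperfect-and-noisy character of self-repair can be quantified with a simple ratio: how much of an ablated head's direct effect the later components restored. The ratio below is a plausible bookkeeping device, and the per-prompt numbers are invented, not the paper's measurements:

```python
def self_repair_ratio(direct_effect, total_effect_after_ablation):
    """Fraction of an ablated head's direct effect that later components
    repaired: 1.0 = full repair, < 1.0 = partial, > 1.0 = overcorrection
    beyond the original effect."""
    repaired = direct_effect - total_effect_after_ablation
    return repaired / direct_effect

# Hypothetical logit-difference numbers for three prompts, all with the
# same direct effect but different residual effects after ablation.
for de, te in [(2.0, 1.2), (2.0, 0.1), (2.0, -0.4)]:
    print(round(self_repair_ratio(de, te), 2))
# -> 0.4, 0.95, 1.2  (partial, near-full, overcorrecting)
```

The spread across the three prompts mirrors the paper's finding that the degree of self-repair varies significantly from prompt to prompt.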