
Overcoming Common Flaws in the Evaluation of Selective Classification Systems

Neural Information Processing Systems

While current evaluation of selective classification systems typically assumes fixed working points based on pre-defined rejection thresholds, methodological progress requires benchmarking the general performance of systems, akin to the AUROC in standard classification. In this work, we define five requirements for multi-threshold metrics in selective classification regarding task alignment, interpretability, and flexibility, and show how current approaches fail to meet them.
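To make the multi-threshold idea concrete, here is a minimal sketch of one such metric, the area under the risk-coverage curve (AURC), which sweeps all rejection thresholds instead of fixing one working point. The toy confidences and labels are invented for illustration.

```python
import numpy as np

def risk_coverage_curve(confidence, correct):
    """Sweep all rejection thresholds: at each coverage level, compute the
    selective risk (error rate on the retained, most-confident samples)."""
    order = np.argsort(-confidence)            # most confident first
    errors = 1.0 - correct[order].astype(float)
    n = len(errors)
    coverage = np.arange(1, n + 1) / n         # fraction of samples retained
    risk = np.cumsum(errors) / np.arange(1, n + 1)
    return coverage, risk

def aurc(confidence, correct):
    """Discrete area under the risk-coverage curve (lower is better)."""
    _, risk = risk_coverage_curve(confidence, correct)
    return risk.mean()

conf = np.array([0.9, 0.8, 0.7, 0.6])
corr = np.array([1, 1, 0, 1])
print(round(aurc(conf, corr), 4))  # → 0.1458
```

Metrics of this family aggregate performance over every possible threshold, which is exactly the kind of system-level benchmark the abstract argues for (and whose existing variants it critiques).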


Estimating the Event-Related Potential from Few EEG Trials

Nørskov, Anders Vestergaard, Jørgensen, Kasper, Zahid, Alexander Neergaard, Mørup, Morten

arXiv.org Artificial Intelligence

Event-related potentials (ERPs) are measurements of brain activity with wide applications in basic and clinical neuroscience; they are typically estimated by averaging many trials of electroencephalography (EEG) signals to sufficiently reduce noise and signal variability. We introduce EEG2ERP, a novel uncertainty-aware autoencoder approach that maps an arbitrary number of EEG trials to their associated ERP. To account for ERP uncertainty, we use bootstrapped training targets and introduce a separate variance decoder to model the uncertainty of the estimated ERP. We evaluate our approach in the challenging zero-shot scenario of generalizing to new subjects, considering three publicly available data sources: (i) the comprehensive ERP CORE dataset, which includes over 50,000 EEG trials across six ERP paradigms from 40 subjects; (ii) the large P300 Speller BCI dataset; and (iii) a neuroimaging dataset on face perception consisting of both EEG and magnetoencephalography (MEG) data. We consistently find that in the few-trial regime our method provides substantially better ERP estimates than commonly used conventional and robust averaging procedures. EEG2ERP is the first deep learning approach to map EEG signals to their associated ERP, moving toward reducing the number of trials necessary for ERP research.
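The conventional averaging baseline and the bootstrapped-target idea can be sketched in a few lines; the simulated waveform, noise level, and trial counts below are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 200 EEG trials x 100 time samples, a fixed ERP waveform
# buried in per-trial noise (illustrative numbers only).
t = np.linspace(0, 1, 100)
true_erp = np.sin(2 * np.pi * 3 * t) * np.exp(-3 * t)
trials = true_erp + rng.normal(scale=2.0, size=(200, 100))

# Conventional estimate: average across trials (noise shrinks ~ 1/sqrt(N)).
erp_hat = trials.mean(axis=0)

# Bootstrapped targets: resample trials with replacement and re-average;
# the spread across resamples quantifies uncertainty of the ERP estimate.
boot = np.stack([trials[rng.integers(0, 200, 200)].mean(axis=0)
                 for _ in range(500)])
erp_sd = boot.std(axis=0)  # pointwise uncertainty band

print(erp_hat.shape, erp_sd.shape)
```

With only a handful of trials instead of 200, the averaging estimate degrades sharply, which is the few-trial regime the paper targets.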


The Impact of Bootstrap Sampling Rate on Random Forest Performance in Regression Tasks

Iwaniuk, Michał, Jarosz, Mateusz, Borycki, Bartłomiej, Jezierski, Bartosz, Cwalina, Jan, Kaźmierczak, Stanisław, Mańdziuk, Jacek

arXiv.org Artificial Intelligence

Abstract--Random Forests (RFs) typically train each tree on a bootstrap sample of the same size as the training set, i.e., a bootstrap rate (BR) equal to 1.0. We systematically examine how varying BR from 0.2 to 5.0 affects RF performance across 39 heterogeneous regression datasets and 16 RF configurations, evaluating with repeated two-fold cross-validation and mean squared error. Our results demonstrate that tuning the BR can yield significant improvements over the default: the best setup relied on BR < 1.0 for 24 datasets and BR > 1.0 for 15, and BR = 1.0 was optimal in only 4 cases. We establish a link between dataset characteristics and the preferred BR: datasets with strong global feature-target relationships favor higher BRs, while those with higher local target variance benefit from lower BRs. To further investigate this relationship, we conducted experiments on synthetic datasets with controlled noise levels. These experiments reproduce the observed bias-variance trade-off: in low-noise scenarios, higher BRs effectively reduce model bias, whereas in high-noise settings, lower BRs help reduce model variance. Overall, BR is an influential hyperparameter that should be tuned to optimize RF regression models.

Random Forest (RF) is an ensemble machine learning (ML) algorithm involving a set of decision trees that collectively make a decision. In classification tasks, each tree votes for a particular class, and the predicted label is determined either by hard voting (majority vote) or soft voting (averaged class probabilities across the trees). In regression tasks, the final prediction is the mean of all individual tree outputs. RFs serve as a robust baseline across a wide range of ML problems, offering an effective balance of predictive accuracy, training speed, and moderate interpretability.
While gradient-boosted trees or deep neural networks may outperform them in heavily tuned or domain-specific settings, RF models consistently deliver near-optimal results with minimal tuning, especially on structured, tabular datasets [1], [2].
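A sketch of probing the bootstrap rate on a synthetic regression task, assuming scikit-learn's `max_samples` parameter as the knob; note that it only accepts rates up to 1.0, so the BR > 1.0 settings studied in the paper would require a custom sampler. The dataset and hyperparameters are illustrative, not those of the paper.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# scikit-learn exposes the bootstrap-sample size via `max_samples`
# (a fraction of the training set); None means the full set, i.e. BR = 1.0.
for br in [0.2, 0.5, 1.0]:
    rf = RandomForestRegressor(
        n_estimators=100,
        max_samples=br if br < 1.0 else None,
        random_state=0,
    )
    # Two-fold CV with MSE, mirroring the paper's evaluation protocol.
    mse = -cross_val_score(rf, X, y, cv=2,
                           scoring="neg_mean_squared_error").mean()
    print(f"BR={br}: MSE={mse:.1f}")
```

Which BR wins here depends on the noise level of the synthetic data, consistent with the bias-variance trade-off described above.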


A Dual-Use Framework for Clinical Gait Analysis: Attention-Based Sensor Optimization and Automated Dataset Auditing

Sadeghsalehi, Hamidreza

arXiv.org Artificial Intelligence

Objective gait analysis using wearable sensors and AI is critical for managing neurological and orthopedic conditions. However, models are vulnerable to hidden dataset biases, and task-specific sensor optimization remains a challenge. We propose a multi-stream attention-based deep learning framework that functions as both a sensor optimizer and an automated data auditor. Applied to the Voisard et al. (2025) multi-cohort gait dataset on four clinical tasks (PD, OA, CVA screening; PD vs CVA differential), the model's attention mechanism quantitatively discovered a severe dataset confound. For OA and CVA screening, tasks where bilateral assessment is clinically essential, the model assigned more than 70 percent attention to the Right Foot while statistically ignoring the Left Foot (less than 0.1 percent attention, 95 percent CI [0.0-0.1]). This was not a clinical finding but a direct reflection of a severe laterality bias (for example, 15 of 15 right-sided OA) in the public dataset. The primary contribution of this work is methodological, demonstrating that an interpretable framework can automatically audit dataset integrity. As a secondary finding, the model proposes novel, data-driven sensor synergies (for example, Head plus Foot for PD screening) as hypotheses for future optimized protocols.
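The auditing idea (averaging learned per-sensor attention across recordings and flagging streams with near-zero weight) can be sketched with a toy softmax attention over sensor-stream embeddings. The sensor names, dimensions, and the random scoring vector are all illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 7 wearable sensor streams, each summarized by a
# 16-d embedding per recording (names and sizes are invented).
sensors = ["Head", "Trunk", "L_Wrist", "R_Wrist", "L_Foot", "R_Foot", "Pelvis"]
embeddings = rng.normal(size=(100, 7, 16))  # 100 recordings

# Per-stream scoring vector (random here; learned in a real model).
w = rng.normal(size=16)

scores = embeddings @ w                                             # (100, 7)
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)   # softmax

# Auditing step: average attention per sensor across the dataset.
# A sensor stuck near 0% attention on a task where it should matter
# (e.g. the Left Foot in bilateral gait assessment) flags a dataset bias.
mean_attn = attn.mean(axis=0)
for name, a in sorted(zip(sensors, mean_attn), key=lambda x: -x[1]):
    print(f"{name}: {100 * a:.1f}%")
```

The point of the audit is that a lopsided attention profile can reflect the data (e.g. all right-sided OA cases) rather than physiology, so it should trigger a check of cohort composition before any clinical interpretation.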



Twin-Boot: Uncertainty-Aware Optimization via Online Two-Sample Bootstrapping

Brito, Carlos Stein

arXiv.org Machine Learning

Standard gradient descent methods yield point estimates with no measure of confidence. This limitation is acute in overparameterized and low-data regimes, where models have many parameters relative to available data and can easily overfit. Bootstrapping is a classical statistical framework for uncertainty estimation based on resampling, but naively applying it to deep learning is impractical: it requires training many replicas, produces post-hoc estimates that cannot guide learning, and implicitly assumes comparable optima across runs - an assumption that fails in non-convex landscapes. We introduce Twin-Bootstrap Gradient Descent (Twin-Boot), a resampling-based training procedure that integrates uncertainty estimation into optimization. Two identical models are trained in parallel on independent bootstrap samples, and a periodic mean-reset keeps both trajectories in the same basin so that their divergence reflects local (within-basin) uncertainty. During training, we use this estimate to sample weights in an adaptive, data-driven way, providing regularization that favors flatter solutions. In deep neural networks and complex high-dimensional inverse problems, the approach improves calibration and generalization and yields interpretable uncertainty maps.
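A simplified sketch of the two-sample idea on a toy linear model: two twins take gradient steps on independent bootstrap resamples, and a periodic mean-reset keeps them in one basin while their divergence is recorded as a local uncertainty estimate. The reset schedule, full-batch steps, and omission of the adaptive weight sampling are all simplifying assumptions, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression data.
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + rng.normal(scale=0.5, size=200)

w1, w2 = np.zeros(5), np.zeros(5)
lr, reset_every = 0.01, 100
divergences = []

# Independent bootstrap resamples, one per twin.
idx1 = rng.integers(0, 200, 200)
idx2 = rng.integers(0, 200, 200)

for step in range(1, 1001):
    # One gradient step per twin on its own bootstrap sample.
    w1 -= lr * X[idx1].T @ (X[idx1] @ w1 - y[idx1]) / 200
    w2 -= lr * X[idx2].T @ (X[idx2] @ w2 - y[idx2]) / 200
    if step % reset_every == 0:
        # Record within-basin divergence, then mean-reset both twins so
        # they keep exploring the same basin; redraw the resamples.
        divergences.append(np.linalg.norm(w1 - w2))
        mean_w = (w1 + w2) / 2
        w1, w2 = mean_w.copy(), mean_w.copy()
        idx1 = rng.integers(0, 200, 200)
        idx2 = rng.integers(0, 200, 200)

print(np.round((w1 + w2) / 2, 2))   # point estimate near w_true
print(len(divergences))             # one divergence reading per reset
```

In the full method the divergence estimate is used online, to sample weights adaptively during training rather than merely to report uncertainty afterwards.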



A Supplementary materials

Neural Information Processing Systems

A.1 Conditional MSE of the treatment effect estimator. The expression for the conditional mean squared error used in Section 2 can be derived by treating the error terms as the only source of randomness (cf. Abadie et al., 2010), or under the assumption that treatment periods are themselves chosen at random. A further section presents the exact mixed-integer programming formulations of the proposed models, which can be solved with available academic or commercial solvers such as SCIP (Gamrath et al., 2020), a solver that handles mixed-integer nonlinear programs (MINLPs); two additional observations allow the problem to be formulated with a quadratic objective and linear constraints. The problem becomes more complicated when there is no constraint on the number of treated units. A final section provides a proof of Theorem 1.