AITopics | surrogate data

Collaborating Authors

surrogate data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Scaling laws for learning with real and surrogate data

Neural Information Processing SystemsMar-22-2026, 11:06:28 GMT

Collecting large quantities of high-quality data can be prohibitively expensive or impractical, and a bottleneck in machine learning. One may instead augment a small set of $n$ data points from the target distribution with data from more accessible sources, e.g.

artificial intelligence, machine learning, surrogate data, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Add feedback

c7038ec7eee8dc5a99d74d594f70aa3f-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 02:21:12 GMT

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Scaling laws for learning with real and surrogate data Ayush Jain

Neural Information Processing SystemsOct-10-2025, 16:21:52 GMT

Collecting large quantities of high-quality data can be prohibitively expensive or impractical, and a bottleneck in machine learning.

dataset, regression, surrogate data, (13 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Scaling laws for learning with real and surrogate data

Neural Information Processing SystemsMay-27-2025, 16:06:46 GMT

Collecting large quantities of high-quality data can be prohibitively expensive or impractical, and a bottleneck in machine learning. One may instead augment a small set of n data points from the target distribution with data from more accessible sources, e.g. We refer to such data as surrogate data'. We study a weighted empirical risk minimization (ERM) approach for integrating surrogate data into training. We analyze mathematically this method under several classical statistical models, and validate our findings empirically on datasets from different domains.

scaling law, surrogate data, test error

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

Add feedback

Learning from the past: predicting critical transitions with machine learning trained on surrogates of historical data

Ma, Zhiqin, Zeng, Chunhua, Zhang, Yi-Cheng, Bury, Thomas M.

arXiv.org Artificial IntelligenceOct-12-2024

Complex systems can undergo critical transitions, where slowly changing environmental conditions trigger a sudden shift to a new, potentially catastrophic state. Early warning signals for these events are crucial for decision-making in fields such as ecology, biology and climate science. Generic early warning signals motivated by dynamical systems theory have had mixed success on real noisy data. More recent studies found that deep learning classifiers trained on synthetic data could improve performance. However, neither of these methods take advantage of historical, system-specific data. Here, we introduce an approach that trains machine learning classifiers directly on surrogate data of past transitions, namely surrogate data-based machine learning (SDML). The approach provides early warning signals in empirical and experimental data from geology, climatology, sociology, and cardiology with higher sensitivity and specificity than two widely used generic early warning signals -- variance and lag-1 autocorrelation. Since the approach is trained directly on surrogates of historical data, it is not bound by the restricting assumption of a local bifurcation like previous methods. This system-specific approach can contribute to improved early warning signals to help humans better prepare for or avoid undesirable critical transitions.

artificial intelligence, machine learning, transition, (15 more...)

arXiv.org Artificial Intelligence

2410.09707

Country:

North America > Canada > Quebec > Montreal (0.14)
Atlantic Ocean > Mediterranean Sea (0.05)
Europe > Switzerland > Zürich > Zürich (0.04)
(8 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.89)

Add feedback

Scaling laws for learning with real and surrogate data

Jain, Ayush, Montanari, Andrea, Sasoglu, Eren

arXiv.org Artificial IntelligenceFeb-6-2024

Collecting large quantities of high-quality data is often prohibitively expensive or impractical, and a crucial bottleneck in machine learning. One may instead augment a small set of $n$ data points from the target distribution with data from more accessible sources like public datasets, data collected under different circumstances, or synthesized by generative models. Blurring distinctions, we refer to such data as `surrogate data'. We define a simple scheme for integrating surrogate data into training and use both theoretical models and empirical studies to explore its behavior. Our main findings are: $(i)$ Integrating surrogate data can significantly reduce the test error on the original distribution; $(ii)$ In order to reap this benefit, it is crucial to use optimally weighted empirical risk minimization; $(iii)$ The test error of models trained on mixtures of real and surrogate data is well described by a scaling law. This can be used to predict the optimal weighting and the gain from surrogate data.

original data, regression, surrogate data, (13 more...)

arXiv.org Artificial Intelligence

2402.04376

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Correlated Components Analysis --- Extracting Reliable Dimensions in Multivariate Data

Parra, Lucas C., Haufe, Stefan, Dmochowski, Jacek P.

arXiv.org Machine LearningFeb-10-2018

How does one find data dimensions that are reliably expressed across repetitions? For example, in neuroscience one may want to identify combinations of brain signals that are reliably activated across multiple trials or subjects. For a clinical assessment with multiple ratings, one may want to identify an aggregate score that is reliably reproduced across raters. The approach proposed here --- "correlated components analysis" --- is to identify components that maximally correlate between repetitions (e.g. trials, subjects, raters). This can be expressed as the maximization of the ratio of between-repetition to within-repetition covariance, resulting in a generalized eigenvalue problem. We show that covariances can be computed efficiently without explicitly considering all pairs of repetitions, that the result is equivalent to multi-class linear discriminant analysis for unbiased signals, and that the approach also maximize reliability, defined as the mean divided by the deviation across repetitions. We also extend the method to non-linear components using kernels, discuss regularization to improve numerical stability, present parametric and non-parametric tests to establish statistical significance, and provide code.

artificial intelligence, correlation, machine learning, (17 more...)

arXiv.org Machine Learning

1801.08881

Country: North America > United States > New York (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Slice sampling covariance hyperparameters of latent Gaussian models

Murray, Iain, Adams, Ryan P.

Neural Information Processing SystemsDec-31-2010

The Gaussian process (GP) is a popular way to specify dependencies between random variables in a probabilistic model. In the Bayesian framework the covariance structure can be specified using unknown hyperparameters. Integrating over these hyperparameters considers different possible explanations for the data when making predictions. This integration is often performed using Markov chain Monte Carlo (MCMC) sampling. However, with non-Gaussian observations standard hyperparameter sampling approaches require careful tuning and may converge slowly. In this paper we present a slice sampling approach that requires little tuning while mixing well in both strong- and weak-data regimes.

hyperparameter, latent variable, reparameterization, (15 more...)

Neural Information Processing Systems

Country: North America > Canada > Ontario > Toronto (0.15)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback

Slice sampling covariance hyperparameters of latent Gaussian models

Murray, Iain, Adams, Ryan Prescott

arXiv.org Machine LearningOct-28-2010

Computer Science University of Toronto The Gaussian process (GP) is a popular way to specify dependencies between random variables in a probabilistic model. In the Bayesian framework the covariance structure can be specified using unknown hyperparameters. Integrating over these hyperparameters considers different possible explanations for the data when making predictions. This integration is often performed using Markov chain Monte Carlo (MCMC) sampling. However, with non-Gaussian observations standard hyperparameter sampling approaches require careful tuning and may converge slowly. In this paper we present a slice sampling approach that requires little tuning while mixing well in both strong-and weak-data regimes.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

1006.0868

Country: North America > Canada > Ontario > Toronto (0.55)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback

Estimating the Reliability of ICA Projections

Meinecke, Frank C., Ziehe, Andreas, Kawanabe, Motoaki, Müller, Klaus-Robert

Neural Information Processing SystemsDec-31-2002

When applying unsupervised learning techniques like ICA or temporal decorrelation, a key question is whether the discovered projections are reliable. In other words: can we give error bars or can we assess the quality of our separation? We use resampling methods to tackle these questions and show experimentally that our proposed variance estimations are strongly correlated to the separation error. We demonstrate that this reliability estimation can be used to choose the appropriate ICA-model, to enhance significantly the separation performance, and, most important, to mark the components that have a actual physical meaning.

algorithm, projection, separation error, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.05)
Europe > Germany > Brandenburg > Potsdam (0.05)
Europe > Sweden (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.35)

Add feedback