AITopics | test prediction

test prediction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Theory of Generalization in Deep Learning

Litman, Elon, Guo, Gabe

arXiv.org Machine Learning, May-5-2026

We present a non-asymptotic theory of generalization in deep learning where the empirical neural tangent kernel partitions the output space. In directions corresponding to signal, error dissipates rapidly; in the vast orthogonal dimensions corresponding to noise, the kernel's near-zero eigenvalues trap residual error in a test-invisible reservoir. Within the signal channel, minibatch SGD ensures that coherent population signal accumulates via fast linear drift, while idiosyncratic memorization is suppressed into a slow, diffusive random walk. We prove generalization survives even when the kernel evolves $\mathcal{O}(1)$ in operator norm, the full feature-learning regime. This theory naturally explains disparate phenomena in deep learning theory, such as benign overfitting, double descent, implicit bias, and grokking. Lastly, we derive an exact population-risk objective from a single training run with no validation data, for any architecture, loss, or optimizer, and prove that it measures precisely the noise in the signal channel. This objective reduces in practice to an SNR preconditioner on top of Adam, adding one state vector at no extra cost; it accelerates grokking by $5 \times$, suppresses memorization in PINNs and implicit neural representations, and improves DPO fine-tuning under noisy preferences while staying $3 \times$ closer to the reference policy.

artificial intelligence, machine learning, operator, (19 more...)

arXiv.org Machine Learning

2605.01172

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

On the Accuracy of Influence Functions for Measuring Group Effects

Koh, Pang Wei, Ang, Kai-Siang, Teo, Hubert H. K.

Neural Information Processing Systems, Aug-19-2025, 22:53:07 GMT

Removing such large groups can result in significant changes to the model.

actual effect, approximation, influence function, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

How to Achieve Higher Accuracy with Less Training Points?

Yang, Jinghan, Pani, Anupam, Zhang, Yunchao

arXiv.org Artificial Intelligence, Apr-21-2025

In the era of large-scale model training, the extensive use of available datasets has resulted in significant computational inefficiencies. To tackle this issue, we explore methods for identifying informative subsets of training data that can achieve comparable or even superior model performance. We propose a technique based on influence functions to determine which training samples should be included in the training set. We conducted empirical evaluations of our method on binary classification tasks utilizing logistic regression models. Our approach demonstrates performance comparable to that of training on the entire dataset while using only 10% of the data. Furthermore, we found that our method achieved even higher accuracy when trained with just 60% of the data.

artificial intelligence, machine learning, prediction, (15 more...)

arXiv.org Artificial Intelligence

2504.13586

Country: Europe (0.46)

Genre: Research Report (0.84)

Industry: Energy (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Variational Autoencoder for Heterogeneous Temporal and Longitudinal Data

Öğretir, Mine, Ramchandran, Siddharth, Papatheodorou, Dimitrios, Lähdesmäki, Harri

arXiv.org Machine Learning, Nov-20-2023

The variational autoencoder (VAE) is a popular deep latent variable model used to analyse high-dimensional datasets by learning a low-dimensional latent representation of the data. It simultaneously learns a generative model and an inference network to perform approximate posterior inference. Recently proposed extensions to VAEs that can handle temporal and longitudinal data have applications in healthcare, behavioural modelling, and predictive maintenance. However, these extensions do not account for heterogeneous data (i.e., data comprising continuous and discrete attributes), which is common in many real-life applications. In this work, we propose the heterogeneous longitudinal VAE (HL-VAE) that extends the existing temporal and longitudinal VAEs to heterogeneous data. HL-VAE provides efficient inference for high-dimensional datasets and includes likelihood models for continuous, count, categorical, and ordinal data while accounting for missing observations. We demonstrate our model's efficacy through simulated as well as clinical datasets, and show that our proposed model achieves competitive performance in missing value imputation and predictive accuracy.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

doi: 10.1109/ICMLA55696.2022.00239

2204.09369

Country:

Europe > Finland (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Relabeling Minimal Training Subset to Flip a Prediction

Yang, Jinghan, Xu, Linjie, Yu, Lequan

arXiv.org Machine Learning, Oct-16-2023

When facing an unsatisfactory prediction from a machine learning model, it is crucial to investigate the underlying reasons and explore the potential for reversing the outcome. We ask: To flip the prediction on a test point $x_t$, how can we identify the smallest training subset $\mathcal{S}_t$ that we need to relabel? We propose an efficient procedure to identify and relabel such a subset via an extended influence function. We find that relabeling fewer than 2% of the training points can always flip a prediction. This mechanism can serve multiple purposes: (1) providing an approach to challenge a model prediction by altering training points; (2) evaluating model robustness with the cardinality of the subset (i.e., $|\mathcal{S}_t|$); we show that $|\mathcal{S}_t|$ is highly related to the noise ratio in the training set and that $|\mathcal{S}_t|$ is correlated with but complementary to predicted probabilities; (3) revealing training points that lead to group attribution bias. To the best of our knowledge, we are the first to investigate identifying and relabeling the minimal training subset required to flip a given prediction.

artificial intelligence, machine learning, prediction, (18 more...)

arXiv.org Machine Learning

2305.12809

Country:

Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > China > Hong Kong (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)

BayesDLL: Bayesian Deep Learning Library

Kim, Minyoung, Hospedales, Timothy

arXiv.org Machine Learning, Sep-22-2023

We release a new Bayesian neural network library for PyTorch for large-scale deep networks. Our library implements mainstream approximate Bayesian inference algorithms: variational inference, MC-dropout, stochastic-gradient MCMC, and Laplace approximation. The main differences from other existing Bayesian neural network libraries are as follows: 1) Our library can deal with very large-scale deep networks including Vision Transformers (ViTs).

artificial intelligence, machine learning, nst, (18 more...)

arXiv.org Machine Learning

2309.12928

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)

Latent Neural ODEs with Sparse Bayesian Multiple Shooting

Iakovlev, Valerii, Yildiz, Cagatay, Heinonen, Markus, Lähdesmäki, Harri

arXiv.org Artificial Intelligence, Feb-8-2023

Training dynamic models, such as neural ODEs, on long trajectories is a hard problem that requires using various tricks, such as trajectory splitting, to make model training work in practice. These methods are often heuristics with poor theoretical justifications, and require iterative manual tuning. We propose a principled multiple shooting technique for neural ODEs that splits the trajectories into manageable short segments, which are optimised in parallel, while ensuring probabilistic control on continuity over consecutive segments. We derive variational inference for our shooting-based latent neural ODE models and propose amortized encodings of irregularly sampled trajectories with a transformer-based recognition network with temporal attention and relative positional encoding. We demonstrate efficient and stable training, and state-of-the-art performance on multiple large-scale benchmark datasets.

Dynamical systems, from biological cells to weather, evolve according to their underlying mechanisms, often described by differential equations. In data-driven system identification we aim to learn the rules governing a dynamical system by observing the system for a time interval [0, T], and fitting a model of the underlying dynamics to the observations by gradient descent. Such optimisation suffers from the curse of length: complexity of the loss function grows with the length of the observed trajectory (Ribeiro et al., 2020). For even moderate T the loss landscape can become highly complex and gradient descent fails to produce a good fit (Metz et al., 2021). To alleviate this problem previous works resort to cumbersome heuristics, such as iterative training and trajectory splitting (Yildiz et al., 2019; Kochkov et al., 2021; Han et al., 2022; Lienen & Günnemann, 2022). The optimal control literature has a long history of multiple shooting methods, where the trajectory fitting is split into piecewise segments that are easy to optimise, with constraints to ensure continuity across the segments (van Domselaar & Hemker, 1975; Bock & Plitt, 1984; Baake et al., 1992).
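The idea of multiple shooting described above can be illustrated with a minimal sketch. It is not the authors' method: it assumes a hypothetical scalar ODE dx/dt = -a·x integrated with explicit Euler steps, and it replaces the paper's probabilistic continuity control with a plain quadratic penalty. The names `euler_rollout`, `multiple_shooting_loss`, `starts`, and `penalty` are illustrative only.

```python
def euler_rollout(x0, a, dt, n_steps):
    """Integrate the toy ODE dx/dt = -a * x with explicit Euler steps."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + dt * (-a * xs[-1]))
    return xs


def multiple_shooting_loss(obs, starts, a, dt, seg_len, penalty=1.0):
    """Data-fit loss over short segments plus a continuity penalty.

    obs     : observations on a uniform grid, length len(starts) * seg_len + 1
    starts  : one free initial state per segment (assumed extra parameters)
    penalty : weight of the quadratic continuity term (a stand-in for the
              probabilistic continuity control mentioned in the abstract)
    """
    data_loss = 0.0
    continuity_loss = 0.0
    for k, start in enumerate(starts):
        begin = k * seg_len
        # Each segment is rolled out independently of the others.
        pred = euler_rollout(start, a, dt, seg_len)
        target = obs[begin:begin + seg_len + 1]
        data_loss += sum((p - y) ** 2 for p, y in zip(pred, target))
        # The end of segment k should meet the start of segment k + 1.
        if k + 1 < len(starts):
            continuity_loss += (pred[-1] - starts[k + 1]) ** 2
    return data_loss + penalty * continuity_loss


if __name__ == "__main__":
    observations = euler_rollout(1.0, 0.5, 0.1, 6)
    consistent = [observations[0], observations[3]]
    print(multiple_shooting_loss(observations, consistent, 0.5, 0.1, 3))
    print(multiple_shooting_loss(observations, [1.0, 0.5], 0.5, 0.1, 3))
```

Because each segment depends only on its own start state and the shared parameter `a`, the per-segment rollouts could be evaluated in parallel; the penalty term is what couples neighbouring segments.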

artificial intelligence, dyn, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2210.03466

Country:

North America > United States (0.14)
Europe > Finland (0.04)
Europe > Italy > Sardinia (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Heterogenous Ensemble of Models for Molecular Property Prediction

Darabi, Sajad, Fazeli, Shayan, Liu, Jiwei, Milesi, Alexandre, Morkisz, Pawel, Puget, Jean-François, Titericz, Gilberto

arXiv.org Artificial Intelligence, Nov-20-2022

The OGB Large-Scale Challenge (LSC) [Hu et al., 2021] is a Machine Learning (ML) challenge to predict a quantum chemical property, the HOMO-LUMO gap of small molecules. This ground truth is obtained via a density-functional theory (DFT) computation which is known to be time-consuming and could take several hours, even for small molecules. With the rapid advancement of machine learning technology, it is promising to use fast, GPU-accelerated and accurate ML models to replace this expensive DFT optimization process. The PCQM4Mv2 dataset, based on the PubChemQC project [Nakata and Shimazaki, 2017], provides us with a well-defined ML task of predicting the HOMO-LUMO gap of molecules given their 2D molecular graphs. Each molecule has two natural views. The 2D graph incorporates topological structures defined by bonds, and the 3D view provides spatial information that better reflects the geometry and spatial relation of the different bonds in the molecule.

artificial intelligence, base model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2211.11035

Country:

North America > United States (0.05)
Europe > France (0.04)
South America > Brazil (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Qualitative Analysis of Monte Carlo Dropout

Seoh, Ronald

arXiv.org Machine Learning, Jul-3-2020

We first consider the sources of uncertainty in NNs, and briefly review Bayesian Neural Networks (BNNs), the group of Bayesian approaches for tackling uncertainty in NNs. After presenting the mathematical formulation of MC dropout, we suggest potential benefits and associated costs of using MC dropout in typical NN models, supported by results from our experiments.
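As a rough illustration of what MC dropout computes, the sketch below keeps dropout active at prediction time and summarises many stochastic forward passes by their mean and standard deviation. It is an assumption-laden toy, not the paper's formulation or experiments: the one-hidden-layer ReLU network, its fixed weights, and the names `forward` and `mc_dropout_predict` are all hypothetical.

```python
import random
import statistics


def forward(x, w_hidden, w_out, drop_p, rng):
    """One stochastic forward pass of a tiny ReLU network with dropout."""
    hidden = [max(0.0, w * x) for w in w_hidden]
    keep = 1.0 - drop_p
    # Inverted dropout: drop each unit with probability drop_p and
    # rescale the survivors by 1 / keep.
    dropped = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    return sum(w * h for w, h in zip(w_out, dropped))


def mc_dropout_predict(x, w_hidden, w_out, drop_p=0.5, n_samples=200, seed=0):
    """Predictive mean and spread from repeated dropout-enabled passes."""
    rng = random.Random(seed)
    samples = [forward(x, w_hidden, w_out, drop_p, rng)
               for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.pstdev(samples)


if __name__ == "__main__":
    w_hidden = [0.5, -1.0, 2.0]   # hypothetical fixed weights
    w_out = [1.0, 1.0, -0.5]
    mean, spread = mc_dropout_predict(2.0, w_hidden, w_out)
    print("mean:", mean, "spread:", spread)
```

The cost side is visible here as well: each prediction needs `n_samples` forward passes instead of one.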

artificial intelligence, dropout, machine learning, (19 more...)

arXiv.org Machine Learning

2007.0172

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Second-Order Group Influence Functions for Black-Box Predictions

Basu, Samyadeep, You, Xuchen, Feizi, Soheil

arXiv.org Machine Learning, Nov-1-2019

With the rapid adoption of machine learning systems in sensitive applications, there is an increasing need to make black-box models explainable. Often we want to identify an influential group of training samples in a particular test prediction. Existing influence functions tackle this problem by using first-order approximations of the effect of removing a sample from the training set on model parameters. To compute the influence of a group of training samples (rather than an individual point) in model predictions, the change in optimal model parameters after removing that group from the training set can be large. Thus, in such cases, the first-order approximation can be loose. In this paper, we address this issue and propose second-order influence functions for identifying influential groups in test-time predictions. For linear models and across different sizes of groups, we show that using the proposed second-order influence function improves the correlation between the computed influence values and the ground truth ones. For nonlinear models based on neural networks, we empirically show that none of the existing first-order and the proposed second-order influence functions provide proper estimates of the ground-truth influences over all training samples. We empirically study this phenomenon by decomposing the influence values over contributions from different eigenvectors of the Hessian of the trained model.
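A small worked example may help show why a first-order estimate loosens as the removed group grows. The sketch assumes a one-parameter least-squares model y ≈ w·x with made-up data; it implements only the standard first-order influence estimate and compares it with exact retraining, and does not reproduce the second-order method proposed in the paper. All function names are illustrative.

```python
def fit(xs, ys):
    """Least-squares slope for y ≈ w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def first_order_group_influence(xs, ys, group):
    """First-order estimate of the change in w when `group` is removed.

    Uses delta_w ≈ H^{-1} * (sum of per-sample gradients over the group),
    where H is the Hessian of the full summed squared loss.
    """
    w = fit(xs, ys)
    hessian = sum(x * x for x in xs)
    grad_group = sum(xs[i] * (w * xs[i] - ys[i]) for i in group)
    return grad_group / hessian


def actual_group_effect(xs, ys, group):
    """Exact change in w obtained by refitting without the group."""
    removed = set(group)
    keep = [i for i in range(len(xs)) if i not in removed]
    w_removed = fit([xs[i] for i in keep], [ys[i] for i in keep])
    return w_removed - fit(xs, ys)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # made-up data
    ys = [1.1, 1.9, 3.2, 3.9, 5.3, 5.8]
    for group in ([0], [3, 4, 5]):
        estimate = first_order_group_influence(xs, ys, group)
        actual = actual_group_effect(xs, ys, group)
        print(group, "estimate:", estimate, "actual:", actual)
```

For this quadratic loss the exact change equals the first-order estimate scaled by H / (H - H_group), where H_group is the group's share of the Hessian, so the gap widens as the group accounts for more of the curvature.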

group influence function, influence function, training sample, (15 more...)

arXiv.org Machine Learning

1911.00418

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(3 more...)

Genre: Research Report (0.67)

Industry: Transportation > Air (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
