AITopics | missing data

Latent Diffusion for Missing Data

Estad, Alberte Heering, Peis, Ignacio, Frellsen, Jes

arXiv.org Machine Learning, May-28-2026

Diffusion models have emerged as powerful generative approaches for missing-data imputation, yet most existing methods operate directly in data space and degrade when training data are heavily incomplete. We investigate whether shifting diffusion to a learned latent representation improves robustness under missing-completely-at-random (MCAR) corruption. To this end, we propose a two-stage framework: a robust VAE-based imputer first learns compact semantic features from incomplete observations, and a diffusion model is then trained in the resulting latent space. Across training missing rates, we perform a controlled comparison against pixel-space diffusion models under the same incomplete-data setting. The latent diffusion model maintains high sample quality and remains stable up to 50% missingness, while pixel-space diffusion degrades progressively as missingness increases. For downstream imputation, latent diffusion also achieves consistently better performance than pixel-space diffusion. These findings indicate that latent-space modeling mitigates artifact amplification from zero-imputed inputs and provides a more robust generative prior for incomplete-data learning. Overall, our results support latent diffusion as a strong and practically useful alternative to pixel-space diffusion for missing-data problems.
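The MCAR corruption and zero-imputed inputs mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `mcar_mask` and `zero_impute`, the 28x28 grid size, and the uniform toy data are assumptions chosen only for illustration.

```python
import random

def mcar_mask(n_rows, n_cols, missing_rate, rng):
    """Return a 0/1 mask (1 = observed). Under MCAR, every entry is dropped
    independently with the same probability, regardless of its value."""
    return [[0 if rng.random() < missing_rate else 1 for _ in range(n_cols)]
            for _ in range(n_rows)]

def zero_impute(data, mask):
    """Replace missing entries with 0.0 and leave observed entries untouched."""
    return [[x if m else 0.0 for x, m in zip(row, mask_row)]
            for row, mask_row in zip(data, mask)]

rng = random.Random(0)
# Toy "image": strictly positive values, so an imputed zero is distinguishable.
data = [[rng.uniform(0.1, 1.0) for _ in range(28)] for _ in range(28)]
mask = mcar_mask(28, 28, 0.5, rng)
corrupted = zero_impute(data, mask)
observed_fraction = sum(map(sum, mask)) / (28 * 28)
```

At a 50% missing rate roughly half of the inputs are artificial zeros, which gives some intuition for why a model trained directly on such inputs could pick up imputation artifacts.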

artificial intelligence, diffusion model, machine learning, (16 more...)

2605.28427

Genre: Research Report > New Finding (0.49)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models

Neural Information Processing Systems, Dec-23-2025, 21:37:14 GMT

State-of-the-art causal discovery methods usually assume that the observational data is complete. However, the missing data problem is pervasive in many practical scenarios such as clinical trials, economics, and biology. One straightforward way to address the missing data problem is first to impute the data using off-the-shelf imputation methods and then apply existing causal discovery methods. However, such a two-step method may suffer from suboptimality, as the imputation algorithm may introduce bias for modeling the underlying data distribution. In this paper, we develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations. Focusing mainly on the assumptions of ignorable missingness and the identifiable additive noise models (ANMs), MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization (EM) framework. In the E-step, in cases where computing the posterior distributions of parameters in closed-form is not feasible, Monte Carlo EM is leveraged to approximate the likelihood. In the M-step, MissDAG leverages the density transformation to model the noise distributions with simpler and specific formulations by virtue of the ANMs and uses a likelihood-based causal discovery algorithm with directed acyclic graph constraint. We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
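The E-step and M-step roles described above can be sketched on a much simpler problem. The code below is not MissDAG: it is the classical EM algorithm for a bivariate Gaussian with entries missing at random, which corresponds to a two-variable linear model x -> y with Gaussian noise. It performs no graph search and needs no Monte Carlo approximation, because the E-step is available in closed form here. The function name `em_bivariate_gaussian` and the simulated data are assumptions for illustration.

```python
import random

def em_bivariate_gaussian(rows, n_iter=100):
    """Fit the mean and covariance of (x, y) by EM; None marks a missing entry."""
    # Rows with both entries missing carry no information under ignorable
    # missingness, so they are dropped up front.
    rows = [r for r in rows if r[0] is not None or r[1] is not None]
    n = len(rows)
    xs = [x for x, _ in rows if x is not None]
    ys = [y for _, y in rows if y is not None]
    # Initialise from the observed marginals, with zero covariance.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    vx = sum((x - mx) ** 2 for x in xs) / len(xs)
    vy = sum((y - my) ** 2 for y in ys) / len(ys)
    cxy = 0.0
    for _ in range(n_iter):
        # E-step: expected sufficient statistics given the observed entries.
        sx = sy = sxx = syy = sxy = 0.0
        for x, y in rows:
            if y is None:
                ex, exx = x, x * x
                ey = my + cxy / vx * (x - mx)            # E[y | x]
                eyy = ey * ey + (vy - cxy * cxy / vx)    # E[y^2 | x]
            elif x is None:
                ey, eyy = y, y * y
                ex = mx + cxy / vy * (y - my)            # E[x | y]
                exx = ex * ex + (vx - cxy * cxy / vy)    # E[x^2 | y]
            else:
                ex, ey, exx, eyy = x, y, x * x, y * y
            sx += ex
            sy += ey
            sxx += exx
            syy += eyy
            sxy += ex * ey
        # M-step: maximum-likelihood update from the expected statistics.
        mx, my = sx / n, sy / n
        vx, vy = sxx / n - mx * mx, syy / n - my * my
        cxy = sxy / n - mx * my
    return mx, my, vx, vy, cxy

# Simulated data from y = 2x + noise, with x or y dropped independently of the values.
rng = random.Random(0)
rows = []
for _ in range(2000):
    x = rng.gauss(0.0, 1.0)
    y = 2.0 * x + rng.gauss(0.0, 0.5)
    r = rng.random()
    if r < 0.2:
        x = None
    elif r < 0.4:
        y = None
    rows.append((x, y))
mx, my, vx, vy, cxy = em_bivariate_gaussian(rows)
slope = cxy / vx  # estimated coefficient of x in the model for y
```

The conditional-variance terms in the E-step are what distinguish EM from plugging in a single imputed value and then fitting as if the data were complete, which is the two-step approach the abstract argues can introduce bias.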

causal discovery, continuous additive noise model, missdag, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Learning Disentangled Representations of Videos with Missing Data

Neural Information Processing Systems, Dec-23-2025, 21:02:25 GMT

Missing data poses significant challenges when learning representations of video sequences. We present Disentangled Imputed Video autoEncoder (DIVE), a deep generative model that imputes and predicts future video frames in the presence of missing data. Specifically, DIVE introduces a missingness latent variable, disentangles the hidden video representations into static and dynamic appearance, pose, and missingness factors for each object, and imputes each object trajectory where data is missing. On a moving MNIST dataset with various missing scenarios, DIVE outperforms the state-of-the-art baselines by a substantial margin. We also present comparisons on a real-world MOTSChallenge pedestrian dataset, which demonstrate the practical value of our method in a more realistic setting.

learning disentangled representation, missing data, name change, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.62)

Missing Data: Datasets, Imputation, and Benchmarking

Neural Information Processing Systems, Oct-10-2025, 19:19:31 GMT

Datasets and code files are publicly accessible at Link. Our dataset will be hosted on both GitHub and a cloud storage drive. Code for TimesNet: Link. Code for SAITS: Link. 5.2 Trajectory Prediction Codes: The following are the codes for the trajectory prediction methods used in our work. The dataset was primarily created by an academic team (students and faculty). The data statistics are shown in Section 4 of the main paper.

dataset, link code, missing data, (11 more...)

Country: Asia > India > Uttarakhand > Roorkee (0.06)

Technology:

Information Technology > Artificial Intelligence (0.51)
Information Technology > Data Science > Data Quality (0.42)

Graphical Models for Inference with Missing Data

Karthika Mohan, Judea Pearl, Jin Tian

Neural Information Processing Systems, Oct-3-2025, 06:48:55 GMT

Neural Information Processing Systems http://nips.cc/

graphical model, inference, missing data

Technology:

Information Technology > Data Science > Data Quality (0.40)
Information Technology > Artificial Intelligence > Systems & Languages (0.40)

Provable Tensor Factorization with Missing Data

Prateek Jain, Sewoong Oh

Neural Information Processing Systems, Oct-3-2025, 03:58:15 GMT

Neural Information Processing Systems http://nips.cc/

missing data, provable tensor factorization

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.40)

Graph Clustering With Missing Data: Convex Algorithms and Analysis

Ramya Korlakai Vinayak, Samet Oymak, Babak Hassibi

Neural Information Processing Systems, Oct-3-2025, 01:17:14 GMT

Neural Information Processing Systems http://nips.cc/

convex algorithm and analysis, graph clustering, missing data

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.40)

Graphical Models for Recovering Probabilistic and Causal Queries from Missing Data

Karthika Mohan, Judea Pearl

Neural Information Processing Systems, Oct-2-2025, 18:22:10 GMT

Neural Information Processing Systems http://nips.cc/

graphical model, missing data, probabilistic and causal query

Technology:

Information Technology > Data Science > Data Quality (0.40)
Information Technology > Artificial Intelligence > Systems & Languages (0.40)

Provable Tensor Factorization with Missing Data

Neural Information Processing Systems, Sep-30-2025, 10:32:35 GMT

We study the problem of low-rank tensor factorization in the presence of missing data. We ask the following question: how many sampled entries do we need to efficiently and exactly reconstruct a tensor with a low-rank orthogonal decomposition? We propose a novel alternating-minimization-based method which iteratively refines estimates of the singular vectors. We show that under certain standard assumptions, our method can recover a three-mode $n\times n\times n$ dimensional rank-$r$ tensor exactly from $O(n^{3/2} r^5 \log^4 n)$ randomly sampled entries. In the process of proving this result, we solve two challenging sub-problems for tensors with missing data. First, in analyzing the initialization step, we prove a generalization of a celebrated result by Szemerédi et al. on the spectrum of random graphs. Next, we prove global convergence of alternating minimization with a good initialization. Simulations suggest that the dependence of the sample size on dimensionality $n$ is indeed tight.
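The idea of alternating minimization over observed entries can be sketched for the simplest case, a rank-1 three-mode tensor. This is a simplified illustration, not the method analyzed in the paper: it handles rank 1 only and uses a random positive initialization, whereas the abstract refers to an analyzed initialization step. The function name `complete_rank1` and the toy data are assumptions.

```python
import random

def complete_rank1(obs, n, n_sweeps=200, seed=0):
    """Fit T[i][j][k] ~ f[0][i] * f[1][j] * f[2][k] to the observed entries.

    obs maps index triples (i, j, k) to observed values; returns the factors f.
    """
    rng = random.Random(seed)
    # Random positive initialisation (an assumption made for this sketch).
    f = [[rng.uniform(0.5, 1.5) for _ in range(n)] for _ in range(3)]
    for _ in range(n_sweeps):
        for mode in range(3):
            a, b = [m for m in range(3) if m != mode]
            num = [0.0] * n
            den = [0.0] * n
            for idx, val in obs.items():
                p = f[a][idx[a]] * f[b][idx[b]]
                num[idx[mode]] += val * p
                den[idx[mode]] += p * p
            # Closed-form least-squares update of one factor with the other two
            # held fixed; an index with no observations keeps its old value.
            f[mode] = [num[i] / den[i] if den[i] > 0 else f[mode][i]
                       for i in range(n)]
    return f

# Toy example: a positive rank-1 tensor with about 60% of entries observed.
rng = random.Random(1)
n = 8
truth = [[rng.uniform(0.5, 1.5) for _ in range(n)] for _ in range(3)]
full = {(i, j, k): truth[0][i] * truth[1][j] * truth[2][k]
        for i in range(n) for j in range(n) for k in range(n)}
obs = {idx: val for idx, val in full.items() if rng.random() < 0.6}
est = complete_rank1(obs, n)
# Reconstruction error over all entries, including the unobserved ones.
max_err = max(abs(est[0][i] * est[1][j] * est[2][k] - full[(i, j, k)])
              for (i, j, k) in full)
```

Each update solves a one-dimensional least-squares problem per index, so a sweep costs time linear in the number of observed entries. The individual factors are only determined up to rescaling, which is why the error is measured on the reconstructed tensor rather than on the factors.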

missing data, name change, provable tensor factorization, (2 more...)

Technology:

Information Technology > Data Science > Data Quality (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Graph Clustering With Missing Data: Convex Algorithms and Analysis

Neural Information Processing Systems, Sep-30-2025, 09:58:56 GMT

We consider the problem of finding clusters in an unweighted graph when the graph is partially observed. We analyze two programs based on the convex optimization approach for low-rank matrix recovery using nuclear norm minimization: one that works for dense graphs, and one that works for both sparse and dense graphs but requires some a priori knowledge of the total cluster size. For the commonly used Stochastic Block Model, we obtain explicit bounds on the parameters of the problem (size and sparsity of clusters, the amount of observed data) and the regularization parameter that characterize the success and failure of the programs. We corroborate our theoretical findings through extensive simulations. We also run our algorithm on a real data set obtained from crowdsourcing an image classification task on Amazon Mechanical Turk, and observe significant performance improvement over traditional methods such as k-means.

convex algorithm and analysis, graph clustering, missing data, (2 more...)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (0.61)
Information Technology > Artificial Intelligence > Vision (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.41)
