AITopics | complete case

Collaborating Authors

complete case

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

85dca1d270f7f9aef00c9d372f114482-Paper.pdf

Neural Information Processing SystemsFeb-19-2026, 05:46:12 GMT

complete case, complete data domain, fairness, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Florida > Broward County (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Quality (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Add feedback

85dca1d270f7f9aef00c9d372f114482-Paper.pdf

Neural Information Processing SystemsAug-15-2025, 15:39:16 GMT

artificial intelligence, data mining, machine learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Florida > Broward County (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Add feedback

Don't fear the unlabelled: safe semi-supervised learning via simple debiasing

Schmutz, Hugo, Humbert, Olivier, Mattei, Pierre-Alexandre

arXiv.org Artificial IntelligenceMar-3-2023

Semi-supervised learning (SSL) provides an effective means of leveraging unlabelled data to improve a model's performance. Even though the domain has received a considerable amount of attention in the past years, most methods present the common drawback of lacking theoretical guarantees. Our starting point is to notice that the estimate of the risk that most discriminative SSL methods minimise is biased, even asymptotically. This bias impedes the use of standard statistical learning theory and can hurt empirical performance. We propose a simple way of removing the bias. Our debiasing approach is straightforward to implement and applicable to most deep SSL methods. We provide simple theoretical guarantees on the trustworthiness of these modified methods, without having to rely on the strong assumptions on the data distribution that SSL theory usually requires. In particular, we provide generalisation error bounds for the proposed methods. We evaluate debiased versions of different existing SSL methods, such as the Pseudolabel method and Fixmatch, and show that debiasing can compete with classic deep SSL techniques in various settings by providing better calibrated models. Additionally, we provide a theoretical explanation of the intuition of the popular SSL methods. The promise of semi-supervised learning (SSL) is to be able to learn powerful predictive models using partially labelled data. In turn, this would allow machine learning to be less dependent on the often costly and sometimes dangerously biased task of labelling data. Scudder's (1965) untaught pattern recognition machine--simply replaced unknown labels with predictions made by some estimate of the predictive model and used the obtained pseudo-labels to refine their initial estimate. Other more complex branches of SSL have been explored since, notably using generative models (from McLachlan, 1977, to Kingma et al., 2014) or graphs (notably following Zhu et al., 2003). Deep neural networks, which are state-of-the-art supervised predictors, have been trained successfully using SSL. Somewhat surprisingly, the main ingredient of their success is still the notion of pseudo-labels (or one of its variants), combined with systematic use of data augmentation (e.g. An obvious SSL baseline is simply throwing away the unlabelled data.

artificial intelligence, inductive learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2203.07512

Country:

Europe > France > Provence-Alpes-Côte d'Azur > Alpes-Maritimes > Nice (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.67)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Chains of Autoreplicative Random Forests for missing value imputation in high-dimensional datasets

Antonenko, Ekaterina, Read, Jesse

arXiv.org Artificial IntelligenceJan-2-2023

Missing values are a common problem in data science and machine learning. Removing instances with missing values can adversely affect the quality of further data analysis. This is exacerbated when there are relatively many more features than instances, and thus the proportion of affected instances is high. Such a scenario is common in many important domains, for example, single nucleotide polymorphism (SNP) datasets provide a large number of features over a genome for a relatively small number of individuals. To preserve as much information as possible prior to modeling, a rigorous imputation scheme is acutely needed. While Denoising Autoencoders is a state-of-the-art method for imputation in high-dimensional data, they still require enough complete cases to be trained on which is often not available in real-world problems. In this paper, we consider missing value imputation as a multi-label classification problem and propose Chains of Autoreplicative Random Forests. Using multi-label Random Forests instead of neural networks works well for low-sampled data as there are fewer parameters to optimize. Experiments on several SNP datasets show that our algorithm effectively imputes missing values based only on information from the dataset and exhibits better performance than standard algorithms that do not require any additional information. In this paper, the algorithm is implemented specifically for SNP data, but it can easily be adapted for other cases of missing value imputation.

artificial intelligence, decision tree learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2301.00595

Country:

North America > United States > Colorado (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
South America > Brazil (0.04)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.96)

Add feedback

Multiple Imputation with Neural Network Gaussian Process for High-dimensional Incomplete Data

Dai, Zongyu, Bu, Zhiqi, Long, Qi

arXiv.org Artificial IntelligenceDec-21-2022

Missing data are ubiquitous in real world applications and, if not adequately handled, may lead to the loss of information and biased findings in downstream analysis. Particularly, high-dimensional incomplete data with a moderate sample size, such as analysis of multi-omics data, present daunting challenges. Imputation is arguably the most popular method for handling missing data, though existing imputation methods have a number of limitations. Single imputation methods such as matrix completion methods do not adequately account for imputation uncertainty and hence would yield improper statistical inference. In contrast, multiple imputation (MI) methods allow for proper inference but existing methods do not perform well in high-dimensional settings. Our work aims to address these significant methodological gaps, leveraging recent advances in neural network Gaussian process (NNGP) from a Bayesian viewpoint. We propose two NNGP-based MI methods, namely MI-NNGP, that can apply multiple imputations for missing values from a joint (posterior predictive) distribution. The MI-NNGP methods are shown to significantly outperform existing state-of-the-art methods on synthetic and real datasets, in terms of imputation error, statistical inference, robustness to missing rates, and computation costs, under three missing data mechanisms, MCAR, MAR, and MNAR.

artificial intelligence, machine learning, modeling & simulation, (18 more...)

arXiv.org Artificial Intelligence

2211.13297

Country: North America > United States > Pennsylvania (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Assessing Fairness in the Presence of Missing Data

Zhang, Yiliang, Long, Qi

arXiv.org Artificial IntelligenceDec-7-2021

Missing data are prevalent and present daunting challenges in real data analysis. While there is a growing body of literature on fairness in analysis of fully observed data, there has been little theoretical work on investigating fairness in analysis of incomplete data. In practice, a popular analytical approach for dealing with missing data is to use only the set of complete cases, i.e., observations with all features fully observed to train a prediction algorithm. However, depending on the missing data mechanism, the distribution of complete cases and the distribution of the complete data may be substantially different. When the goal is to develop a fair algorithm in the complete data domain where there are no missing values, an algorithm that is fair in the complete case domain may show disproportionate bias towards some marginalized groups in the complete data domain. To fill this significant gap, we study the problem of estimating fairness in the complete data domain for an arbitrary model evaluated merely using complete cases. We provide upper and lower bounds on the fairness estimation error and conduct numerical experiments to assess our theoretical results. Our work provides the first known theoretical results on fairness guarantee in analysis of incomplete data.

complete case, experiment, fairness, (13 more...)

arXiv.org Artificial Intelligence

2112.04899

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Florida > Broward County (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback