AITopics | softimpute


softimpute

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Predicting missing values: A good idea?

van Buuren, Stef

arXiv.org Machine Learning, May-6-2026

Minimizing the Mean Squared Error (MSE) is a key objective in machine learning and is commonly used for imputing missing values. While this approach provides accurate point estimates, it introduces systematic biases in downstream analyses. These biases affect key parameters such as variance, prevalence, correlation, slope, and explained variance. The root cause is that imputed values optimized for MSE are averages, which reduce the natural variability in the data. This paper demonstrates that adding noise to imputed values can effectively eliminate these biases. The required noise level is proportional to the MSE. Using a toy example in a multivariate normal setting, we compare two methods: predictive imputation, which minimizes MSE, and stochastic imputation, which incorporates random noise. Simulation results show that predictive methods systematically introduce bias, while stochastic methods preserve the data's natural variability and produce unbiased estimates. We also evaluate three popular imputation tools -- missForest, softImpute, and mice -- and observe consistent biases in predictive methods. These findings highlight that MSE is an inadequate measure of imputation quality, as it prioritizes accuracy over variability. Incorporating noise into imputation methods is essential to prevent biases and ensure valid downstream analyses, underscoring the importance of stochastic approaches for handling incomplete data.
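The contrast between predictive and stochastic imputation can be sketched with a small simulation. This is a minimal illustration under assumptions of our own, not the paper's code: a bivariate normal with correlation rho, values of y hidden completely at random, and a least-squares regression of y on x as the predictive imputer. The stochastic variant adds Gaussian noise whose variance is the residual mean squared error on the observed rows, in line with the abstract's point that the required noise level is set by the MSE. The function name compare_imputations and all parameter values are hypothetical.

```python
import numpy as np


def compare_imputations(n=100_000, rho=0.6, miss_frac=0.5, seed=0):
    """Predictive vs. stochastic regression imputation of y in a bivariate normal."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

    # MCAR: hide a random fraction of y; x stays fully observed.
    missing = rng.random(n) < miss_frac
    obs = ~missing

    # Least-squares fit of y on x using the complete rows only.
    slope, intercept = np.polyfit(x[obs], y[obs], deg=1)
    pred = intercept + slope * x[missing]

    # Residual spread on the observed rows: sigma**2 is the in-sample MSE.
    resid = y[obs] - (intercept + slope * x[obs])
    sigma = np.sqrt(np.mean(resid ** 2))

    y_pred = y.copy()
    y_pred[missing] = pred  # predictive: conditional mean only
    y_stoch = y.copy()
    y_stoch[missing] = pred + rng.normal(0.0, sigma, missing.sum())  # mean + noise

    summary = {}
    for name, col in (("complete", y), ("predictive", y_pred), ("stochastic", y_stoch)):
        summary[name] = {"var": col.var(), "corr": np.corrcoef(x, col)[0, 1]}
    return summary


for name, stats in compare_imputations().items():
    print(f"{name:>10}: var={stats['var']:.3f}  corr={stats['corr']:.3f}")
```

With these settings the predictive column should show a variance well below that of the complete data and a correlation with x above rho, while the stochastic column should stay close to the complete-data values; exact numbers depend on the seed.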

artificial intelligence, machine learning, mechanism

arXiv.org Machine Learning

2605.03733

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Discordance Minimization-based Imputation Algorithms for Missing Values in Rating Data

Park, Young Woong, Kim, Jinhak, Zhu, Dan

arXiv.org Machine Learning, Nov-7-2023

Ratings are frequently used to evaluate and compare subjects in applications ranging from education to healthcare, because they provide succinct yet credible measures. However, when multiple rating lists are combined or considered together, subjects often have missing ratings, because most rating lists do not rate every subject in the combined list. In this study, we analyze the missing-value patterns in six real-world data sets from various applications and identify the conditions under which imputation algorithms are applicable. Based on the special structures and properties revealed by this analysis, we propose optimization models and algorithms that impute missing ratings in the combined lists by minimizing the total rating discordance across rating providers, using only the known ratings. The total rating discordance is defined as the sum of pairwise discordance metrics and can be written as a quadratic function. Computational experiments on real-world and synthetic rating data sets show that the proposed methods outperform state-of-the-art general-purpose imputation methods in imputation accuracy.
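The abstract does not spell out the discordance metric, so the sketch below substitutes a hypothetical one to show how a quadratic discordance objective turns imputation into a least-squares problem. Assumptions: ratings are stored as a providers x subjects NumPy array with NaN for missing entries, there are at least two providers and two subjects, and the discordance between providers p and q on subjects i and j is the squared difference between their rating gaps, ((r_pi - r_pj) - (r_qi - r_qj))^2. The function impute_min_gap_discordance is illustrative only and is not the authors' model or algorithm.

```python
import itertools

import numpy as np


def impute_min_gap_discordance(ratings):
    """Fill NaNs in a providers x subjects matrix by least squares on a
    hypothetical pairwise 'gap' discordance (not the paper's exact metric)."""
    R = np.asarray(ratings, dtype=float)
    cells = [(int(a), int(b)) for a, b in np.argwhere(np.isnan(R))]
    col = {cell: k for k, cell in enumerate(cells)}  # unknown -> column index
    n_prov, n_subj = R.shape

    rows, rhs = [], []
    for p, q in itertools.combinations(range(n_prov), 2):
        for i, j in itertools.combinations(range(n_subj), 2):
            # Residual: (R[p,i] - R[p,j]) - (R[q,i] - R[q,j]); zero when the two
            # providers agree on how far apart subjects i and j are.
            coef = np.zeros(len(cells))
            const = 0.0
            for cell, sign in (((p, i), 1.0), ((p, j), -1.0), ((q, i), -1.0), ((q, j), 1.0)):
                if cell in col:
                    coef[col[cell]] += sign  # unknown rating enters linearly
                else:
                    const += sign * R[cell]  # known rating is a constant term
            rows.append(coef)
            rhs.append(-const)

    filled = R.copy()
    if cells and rows:
        # Total discordance = sum of squared residuals, quadratic in the unknowns.
        z, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
        for cell, k in col.items():
            filled[cell] = z[k]
    return filled


example = [
    [5.0, 4.0, 3.0, 2.0],
    [5.0, 4.0, np.nan, 2.0],
    [4.0, 3.0, 2.0, 1.0],
]
print(impute_min_gap_discordance(example)[1, 2])  # approximately 3.0
```

In the example, provider 1 matches provider 0 wherever both are known, and provider 2 is provider 0 shifted down by one point, so setting the missing rating to 3.0 makes every gap agree and drives the objective to zero. Building one residual row per (provider pair, subject pair) grows quadratically in both dimensions, which suits small toy matrices only.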

algorithm, artificial intelligence, machine learning

arXiv.org Machine Learning

doi: 10.1007/s10994-023-06452-4

2311.04035

Country:

North America > United States > Texas > Dallas County > Dallas (0.04)
North America > United States > Iowa > Story County > Ames (0.04)
Asia > South Korea > Gyeonggi-do > Suwon (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Media (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)


Imputation and low-rank estimation with Missing Non At Random data

Sportisse, Aude, Boyer, Claire, Josse, Julie

arXiv.org Machine Learning, Jan-7-2019

Preprint submitted January 8, 2019.

… the use of the Expectation-Maximization (EM) algorithm [8], which yields maximum likelihood estimators in various incomplete-data problems [21]. The theoretical guarantees of these methods, which ensure correct prediction of missing values or correct estimation of parameters of interest, hold only under assumptions about how the data came to be missing. Rubin [31] introduced three types of missing-data mechanisms: (i) missing completely at random (MCAR), the most restrictive assumption; (ii) missing at random (MAR), where missingness may depend only on observed variables; and (iii) missing not at random (MNAR), the most general case, where whether a value is missing may depend on other variables and on the value itself. A classic example of MNAR data, which is the focus of this paper, is a survey in which wealthy respondents are less willing to disclose their income, or in which people are less inclined to answer sensitive questions about addictive behavior. Another example is the diagnosis of Alzheimer's disease, which can be based on a patient's score on a specific test. Patients who have the disease have difficulty answering the questions and are more likely to abandon the test before it ends, so the score tends to be missing precisely for the patients the test is meant to detect.
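Rubin's three mechanisms can be made concrete with a short simulation. Everything here is a hypothetical illustration: the standardized variables age and income, the logistic missingness models, and the coefficients are arbitrary choices, and regression imputation on the fully observed covariate is a simple stand-in rather than the low-rank method the paper studies. The sketch compares the complete-case mean of income and the mean after regression imputation against the true sample mean.

```python
import numpy as np


def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))


def mean_under_mechanisms(n=200_000, seed=0):
    """Complete-case vs. regression-imputed means of a hypothetical 'income'
    variable under MCAR, MAR, and MNAR missingness."""
    rng = np.random.default_rng(seed)
    age = rng.normal(0.0, 1.0, n)                 # always observed (standardized)
    income = 0.5 * age + rng.normal(0.0, 1.0, n)  # standardized, true mean 0

    missing_by = {
        "MCAR": rng.random(n) < 0.3,                            # unrelated to the data
        "MAR": rng.random(n) < logistic(-1.0 + 1.5 * age),      # depends on observed age
        "MNAR": rng.random(n) < logistic(-1.0 + 1.5 * income),  # depends on income itself
    }

    results = {}
    for name, missing in missing_by.items():
        obs = ~missing
        # Regress income on age using complete cases, then fill the gaps.
        slope, intercept = np.polyfit(age[obs], income[obs], deg=1)
        filled = np.where(missing, intercept + slope * age, income)
        results[name] = {"complete_case": income[obs].mean(), "regression": filled.mean()}
    return income.mean(), results


true_mean, results = mean_under_mechanisms()
print(f"true mean: {true_mean:.3f}")
for name, r in results.items():
    print(f"{name}: complete-case={r['complete_case']:.3f}  regression={r['regression']:.3f}")
```

Under MCAR both estimates should stay close to the true mean. Under MAR the complete-case mean is biased, but regressing on the observed covariate largely removes the bias, because missingness depends only on age. Under MNAR, where high incomes are more likely to be withheld, both estimates remain biased downward, since the observed rows no longer represent the conditional distribution of income given age.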

algorithm, mechanism, softimpute

arXiv.org Machine Learning

1812.11409

Country: Europe > France > Île-de-France > Paris > Paris (0.04)

Genre:

Research Report > New Finding (0.47)
Research Report > Experimental Study (0.47)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.88)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
