AITopics | imputation strategy

Collaborating Authors

imputation strategy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Interpretable Machine Learning for Cognitive Aging: Handling Missing Data and Uncovering Social Determinant

Mao, Xi, Wang, Zhendong, Li, Jingyu, Mao, Lingchao, Essien, Utibe, Wang, Hairong, Ni, Xuelei Sherry

arXiv.org Artificial IntelligenceOct-14-2025

Early detection of Alzheimer's disease (AD) is crucial because its neurodegenerative effects are irreversible, and neuropathologic and social-behavioral risk factors accumulate years before diagnosis. Identifying higher-risk individuals earlier enables prevention, timely care, and equitable resource allocation. We predict cognitive performance from social determinants of health (SDOH) using the NIH NIA-supported PREPARE Challenge Phase 2 dataset derived from the nationally representative Mex-Cog cohort of the 2003 and 2012 Mexican Health and Aging Study (MHAS). Data: The target is a validated composite cognitive score across seven domains-orientation, memory, attention, language, constructional praxis, and executive function-derived from the 2016 and 2021 MHAS waves. Predictors span demographic, socioeconomic, health, lifestyle, psychosocial, and healthcare access factors. Methodology: Missingness was addressed with a singular value decomposition (SVD)-based imputation pipeline treating continuous and categorical variables separately. This approach leverages latent feature correlations to recover missing values while balancing reliability and scalability. After evaluating multiple methods, XGBoost was chosen for its superior predictive performance. Results and Discussion: The framework outperformed existing methods and the data challenge leaderboard, demonstrating high accuracy, robustness, and interpretability. SHAP-based post hoc analysis identified top contributing SDOH factors and age-specific feature patterns. Notably, flooring material emerged as a strong predictor, reflecting socioeconomic and environmental disparities. Other influential factors, age, SES, lifestyle, social interaction, sleep, stress, and BMI, underscore the multifactorial nature of cognitive aging and the value of interpretable, data-driven SDOH modeling.

artificial intelligence, flooring material, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.10952

Country:

North America > United States > Texas (0.28)
North America > United States > Georgia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Tractable Representations for Convergent Approximation of Distributional HJB Equations

Alhosh, Julie, Wiltzer, Harley, Meger, David

arXiv.org Machine LearningMar-7-2025

In reinforcement learning (RL), the long-term behavior of decision-making policies is evaluated based on their average returns. Distributional RL has emerged, presenting techniques for learning return distributions, which provide additional statistics for evaluating policies, incorporating risk-sensitive considerations. When the passage of time cannot naturally be divided into discrete time increments, researchers have studied the continuous-time RL (CTRL) problem, where agent states and decisions evolve continuously. In this setting, the Hamilton-Jacobi-Bellman (HJB) equation is well established as the characterization of the expected return, and many solution methods exist. However, the study of distributional RL in the continuous-time setting is in its infancy. Recent work has established a distributional HJB (DHJB) equation, providing the first characterization of return distributions in CTRL. These equations and their solutions are intractable to solve and represent exactly, requiring novel approximation techniques. This work takes strides towards this end, establishing conditions on the method of parameterizing return distributions under which the DHJB equation can be approximately solved. Particularly, we show that under a certain topological property of the mapping between statistics learned by a distributional RL algorithm and corresponding distributions, approximation of these statistics leads to close approximations of the solution of the DHJB equation. Concretely, we demonstrate that the quantile representation common in distributional RL satisfies this topological property, certifying an efficient approximation algorithm for continuous-time distributional RL.

equation, imputation strategy, return distribution, (15 more...)

arXiv.org Machine Learning

2503.05563

Country: North America > Canada > Quebec > Montreal (0.30)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)

Add feedback

Precision Adaptive Imputation Network : An Unified Technique for Mixed Datasets

Joshi, Harsh, Mistri, Rajeshwari, Mali, Manasi, Kapure, Nachiket, Kumari, Parul

arXiv.org Machine LearningJan-18-2025

The challenge of missing data remains a significant obstacle across various scientific domains, necessitating the development of advanced imputation techniques that can effectively address complex missingness patterns. This study introduces the Precision Adaptive Imputation Network (PAIN), a novel algorithm designed to enhance data reconstruction by dynamically adapting to diverse data types, distributions, and missingness mechanisms. PAIN employs a tri-step process that integrates statistical methods, random forests, and autoencoders, ensuring balanced accuracy and efficiency in imputation. Through rigorous evaluation across multiple datasets, including those characterized by high-dimensional and correlated features, PAIN consistently outperforms traditional imputation methods, such as mean and median imputation, as well as other advanced techniques like MissForest. The findings highlight PAIN's superior ability to preserve data distributions and maintain analytical integrity, particularly in complex scenarios where missingness is not completely at random. This research not only contributes to a deeper understanding of missing data reconstruction but also provides a critical framework for future methodological innovations in data science and machine learning, paving the way for more effective handling of mixed-type datasets in real-world applications.

artificial intelligence, imputation technique, machine learning, (17 more...)

arXiv.org Machine Learning

2501.10667

Genre:

Research Report > New Finding (0.93)
Research Report > Promising Solution (0.68)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)

Add feedback

Imputation Matters: A Deeper Look into an Overlooked Step in Longitudinal Health and Behavior Sensing Research

Choube, Akshat, Majethia, Rahul, Bhattacharya, Sohini, Swain, Vedant Das, Li, Jiachen, Mishra, Varun

arXiv.org Artificial IntelligenceDec-8-2024

Longitudinal passive sensing studies for health and behavior outcomes often have missing and incomplete data. Handling missing data effectively is thus a critical data processing and modeling step. Our formative interviews with researchers working in longitudinal health and behavior passive sensing revealed a recurring theme: most researchers consider imputation a low-priority step in their analysis and inference pipeline, opting to use simple and off-the-shelf imputation strategies without comprehensively evaluating its impact on study outcomes. Through this paper, we call attention to the importance of imputation. Using publicly available passive sensing datasets for depression, we show that prioritizing imputation can significantly impact the study outcomes -- with our proposed imputation strategies resulting in up to 31% improvement in AUROC to predict depression over the original imputation strategy. We conclude by discussing the challenges and opportunities with effective imputation in longitudinal sensing studies.

artificial intelligence, data quality, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2412.06018

Country:

North America > United States > Florida > Hillsborough County > University (0.05)
Asia > India (0.04)
Asia > Nepal (0.04)
(7 more...)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Consumer Health (1.00)
Education (1.00)
(3 more...)

Technology:

Information Technology > Data Science > Data Quality (0.90)
Information Technology > Communications > Mobile (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Task-oriented Time Series Imputation Evaluation via Generalized Representers

Wang, Zhixian, Yang, Linxiao, Sun, Liang, Wen, Qingsong, Wang, Yi

arXiv.org Artificial IntelligenceOct-10-2024

Time series analysis is widely used in many fields such as power energy, economics, and transportation, including different tasks such as forecasting, anomaly detection, classification, etc. Missing values are widely observed in these tasks, and often leading to unpredictable negative effects on existing methods, hindering their further application. In response to this situation, existing time series imputation methods mainly focus on restoring sequences based on their data characteristics, while ignoring the performance of the restored sequences in downstream tasks. Considering different requirements of downstream tasks (e.g., forecasting), this paper proposes an efficient downstream task-oriented time series imputation evaluation approach. By combining time series imputation with neural network models used for downstream tasks, the gain of different imputation strategies on downstream tasks is estimated without retraining, and the most favorable imputation value for downstream tasks is given by combining different imputation strategies according to the estimated gain. The corresponding code can be found in the repository https://github.com/hkuedl/Task-Oriented-Imputation.

dataset, forecasting, imputation, (13 more...)

arXiv.org Artificial Intelligence

2410.06652

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > California (0.04)
Europe > France (0.04)

Genre: Research Report (0.82)

Industry: Energy > Power Industry (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Representation Learning for Wearable-Based Applications in the Case of Missing Data

Jungo, Janosch, Xiang, Yutong, Gashi, Shkurta, Holz, Christian

arXiv.org Artificial IntelligenceJan-12-2024

Wearable devices continuously collect sensor data and use it to infer an individual's behavior, such as sleep, physical activity, and emotions. Despite the significant interest and advancements in this field, modeling multimodal sensor data in real-world environments is still challenging due to low data quality and limited data annotations. In this work, we investigate representation learning for imputing missing wearable data and compare it with state-of-the-art statistical approaches. We investigate the performance of the transformer model on 10 physiological and behavioral signals with different masking ratios. Our results show that transformers outperform baselines for missing data imputation of signals that change more frequently, but not for monotonic signals. We further investigate the impact of imputation strategies and masking rations on downstream classification tasks. Our study provides insights for the design and development of masking-based self-supervised learning tasks and advocates the adoption of hybrid-based imputation strategies to address the challenge of missing data in wearable devices.

dataset, learning, sensor data, (15 more...)

arXiv.org Artificial Intelligence

2401.05437

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > Canada > Quebec > Capitale-Nationale Region > Québec (0.04)
North America > Canada > Quebec > Capitale-Nationale Region > Quebec City (0.04)

Genre: Research Report > New Finding (0.69)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Distributional Bellman Operators over Mean Embeddings

Wenliang, Li Kevin, Déletang, Grégoire, Aitchison, Matthew, Hutter, Marcus, Ruoss, Anian, Gretton, Arthur, Rowland, Mark

arXiv.org Machine LearningDec-9-2023

We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions. We derive several new algorithms for dynamic programming and temporal-difference learning based on this framework, provide asymptotic convergence theory, and examine the empirical performance of the algorithms on a suite of tabular tasks. Further, we show that this approach can be straightforwardly combined with deep reinforcement learning, and obtain a new deep RL agent that improves over baseline distributional approaches on the Arcade Learning Environment.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2312.07358

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.46)
Leisure & Entertainment > Sports (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Add feedback

A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values

Caruso, Camillo Maria, Guarrasi, Valerio, Ramella, Sara, Soda, Paolo

arXiv.org Artificial IntelligenceJul-28-2023

One of the most challenging fields where Artificial Intelligence (AI) can be applied is lung cancer research, specifically non-small cell lung cancer (NSCLC). In particular, overall survival (OS), the time between diagnosis and death, is a vital indicator of patient status, enabling tailored treatment and improved OS rates. In this analysis, there are two challenges to take into account. First, few studies effectively exploit the information available from each patient, leveraging both uncensored (i.e., dead) and censored (i.e., survivors) patients, considering also the events' time. Second, the handling of incomplete data is a common issue in the medical field. This problem is typically tackled through the use of imputation methods. Our objective is to present an AI model able to overcome these limits, effectively learning from both censored and uncensored patients and their available features, for the prediction of OS for NSCLC patients. We present a novel approach to survival analysis with missing values in the context of NSCLC, which exploits the strengths of the transformer architecture to account only for available features without requiring any imputation strategy. By making use of ad-hoc losses for OS, it is able to account for both censored and uncensored patients, as well as changes in risks over time. We compared our method with state-of-the-art models for survival analysis coupled with different imputation strategies. We evaluated the results obtained over a period of 6 years using different time granularities obtaining a Ct-index, a time-dependent variant of the C-index, of 71.97, 77.58 and 80.72 for time units of 1 month, 1 year and 2 years, respectively, outperforming all state-of-the-art methods regardless of the imputation method used.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2307.11465

Country:

Europe > Sweden > Västerbotten County > Umeå (0.04)
Europe > Italy (0.04)

Genre: Research Report > Promising Solution (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness

Jeanselme, Vincent, De-Arteaga, Maria, Zhang, Zhe, Barrett, Jessica, Tom, Brian

arXiv.org Artificial IntelligenceJun-30-2023

Machine learning risks reinforcing biases present in data, and, as we argue in this work, in what is absent from data. In healthcare, biases have marked medical history, leading to unequal care affecting marginalised groups. Patterns in missing data often reflect these group discrepancies, but the algorithmic fairness implications of group-specific missingness are not well understood. Despite its potential impact, imputation is often an overlooked preprocessing step, with attention placed on the reduction of reconstruction error and overall performance, ignoring how imputation can affect groups differently. Our work studies how imputation choices affect reconstruction errors across groups and algorithmic fairness properties of downstream predictions.

data mining, data quality, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2208.06648

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Oceania > New Zealand (0.04)
North America > United States > Virginia (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.92)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.93)
Health & Medicine > Health Care Technology (0.68)
(2 more...)

Technology:

Information Technology > Data Science > Data Mining (0.93)
Information Technology > Data Science > Data Quality (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

Auditing the Imputation Effect on Fairness of Predictive Analytics in Higher Education

Anahideh, Hadis, Haghighat, Parian, Nezami, Nazanin, G`andara, Denisa

arXiv.org Artificial IntelligenceDec-29-2022

Colleges and universities use predictive analytics in a variety of ways to increase student success rates. Despite the potential for predictive analytics, two major barriers exist to their adoption in higher education: (a) the lack of democratization in deployment, and (b) the potential to exacerbate inequalities. Education researchers and policymakers encounter numerous challenges in deploying predictive modeling in practice. These challenges present in different steps of modeling including data preparation, model development, and evaluation. Nevertheless, each of these steps can introduce additional bias to the system if not appropriately performed. Most large-scale and nationally representative education data sets suffer from a significant number of incomplete responses from the research participants. While many education-related studies addressed the challenges of missing data, little is known about the impact of handling missing values on the fairness of predictive outcomes in practice. In this paper, we set out to first assess the disparities in predictive modeling outcomes for college-student success, then investigate the impact of imputation techniques on the model performance and fairness using a commonly used set of metrics. We conduct a prospective evaluation to provide a less biased estimation of future performance and fairness than an evaluation of historical data. Our comprehensive analysis of a real large-scale education dataset reveals key insights on modeling disparities and how imputation techniques impact the fairness of the student-success predictive outcome under different testing scenarios. Our results indicate that imputation introduces bias if the testing set follows the historical distribution. However, if the injustice in society is addressed and consequently the upcoming batch of observations is equalized, the model would be less biased.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2109.07908

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Indiana (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting > Higher Education (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback