AITopics | imputation technique

Collaborating Authors

imputation technique

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Processing of missing data by neural networks

Marek Śmieja, Łukasz Struski, Jacek Tabor, Bartosz Zieliński, Przemysław Spurek

Neural Information Processing SystemsFeb-12-2026, 17:00:59 GMT

Our idea is to replace typical neuron's response in the firsthiddenlayerbyitsexpected value. Thisapproach canbeappliedforvarious types ofnetworksatminimal costintheirmodification. Moreover,incontrast to recent approaches, it does not require complete data for training. Experimental results performed ondifferent types ofarchitectures showthatourmethod gives better results than typical imputation strategies and other methods dedicated for incompletedata.

artificial intelligence, machine learning, neural network, (15 more...)

Neural Information Processing Systems

Country:

Europe > Poland (0.05)
North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.97)

Add feedback

Closing Gaps: An Imputation Analysis of ICU Vital Signs

Turubayev, Alisher, Shopova, Anna, Lange, Fabian, Kamalak, Mahmut, Mattes, Paul, Ayvasky, Victoria, Arnrich, Bert, Pfitzner, Bjarne, van de Water, Robin P.

arXiv.org Artificial IntelligenceOct-29-2025

As more Intensive Care Unit (ICU) data becomes available, the interest in developing clinical prediction models to improve healthcare protocols increases. However, the lack of data quality still hinders clinical prediction using Machine Learning (ML). Many vital sign measurements, such as heart rate, contain sizeable missing segments, leaving gaps in the data that could negatively impact prediction performance. Previous works have introduced numerous time-series imputation techniques. Nevertheless, more comprehensive work is needed to compare a representative set of methods for imputing ICU vital signs and determine the best practice. In reality, ad-hoc imputation techniques that could decrease prediction accuracy, like zero imputation, are still used. In this work, we compare established imputation techniques to guide researchers in improving the performance of clinical prediction models by selecting the most accurate imputation technique. We introduce an extensible and reusable benchmark with currently 15 imputation and 4 amputation methods, created for benchmarking on major ICU datasets. We hope to provide a comparative basis and facilitate further ML development to bring more models into clinical practice.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2510.24217

Country: North America > Canada (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Vital Signs (0.91)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Data as a Lever: A Neighbouring Datasets Perspective on Predictive Multiplicity

Ganesh, Prakhar, Hsu, Hsiang, Farnadi, Golnoosh

arXiv.org Artificial IntelligenceOct-27-2025

Multiplicity -- the existence of distinct models with comparable performance -- has received growing attention in recent years. While prior work has largely emphasized modelling choices, the critical role of data in shaping multiplicity has been comparatively overlooked. In this work, we introduce a neighbouring datasets framework to examine the most granular case: the impact of a single-data-point difference on multiplicity. Our analysis yields a seemingly counterintuitive finding: neighbouring datasets with greater inter-class distribution overlap exhibit lower multiplicity. This reversal of conventional expectations arises from a shared Rashomon parameter, and we substantiate it with rigorous proofs. Building on this foundation, we extend our framework to two practical domains: active learning and data imputation. For each, we establish natural extensions of the neighbouring datasets perspective, conduct the first systematic study of multiplicity in existing algorithms, and finally, propose novel multiplicity-aware methods, namely, multiplicity-aware data acquisition strategies for active learning and multiplicity-aware data imputation techniques.

artificial intelligence, machine learning, multiplicity, (17 more...)

arXiv.org Artificial Intelligence

2510.21303

Country: North America > Canada (0.28)

Genre: Research Report (1.00)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Meta-Imputation Balanced (MIB): An Ensemble Approach for Handling Missing Data in Biomedical Machine Learning

Azad, Fatemeh, Bosnić, Zoran, Kukar, Matjaž

arXiv.org Artificial IntelligenceSep-4-2025

--Missing data represents a fundamental challenge in machine learning applications, often reducing model performance and reliability. This problem is particularly acute in fields like bioinformatics and clinical machine learning, where datasets are frequently incomplete due to the nature of both data generation and data collection. While numerous imputation methods exist, from simple statistical techniques to advanced deep learning models, no single method consistently performs well across diverse datasets and missingness mechanisms. This paper proposes a novel Meta-Imputation approach that learns to combine the outputs of multiple base imputers to predict missing values more accurately. By training the proposed method called Meta-Imputation Balanced (MIB) on synthetically masked data with known ground truth, the system learns to predict the most suitable imputed value based on the behavior of each method. We evaluate our method on tabular data under the Missing Completely at Random (MCAR) assumption using both direct metrics, where Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are computed between imputed values and their corresponding original ground truth values in the artificially masked positions, and indirect metrics, which measure the RMSE of a target variable predicted by machine learning models trained on the imputed datasets. Across three benchmark datasets, the model achieved the lowest or near-lowest RMSE and delivered stable downstream predictive performance, even when individual imputers varied in performance.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.03316

Country: Europe > Slovenia (0.15)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area (0.72)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Evaluating Imputation Techniques for Short-Term Gaps in Heart Rate Data

Gupta, Vaibhav, Maleshkova, Maria

arXiv.org Artificial IntelligenceAug-13-2025

Recent advances in wearable technology have enabled the continuous monitoring of vital physiological signals, essential for predictive modeling and early detection of extreme physiological events. Among these physiological signals, heart rate (HR) plays a central role, as it is widely used in monitoring and managing cardiovascular conditions and detecting extreme physiological events such as hypoglycemia. However, data from wearable devices often suffer from missing values. To address this issue, recent studies have employed various imputation techniques. Traditionally, the effectiveness of these methods has been evaluated using predictive accuracy metrics such as RMSE, MAPE, and MAE, which assess numerical proximity to the original data. While informative, these metrics fail to capture the complex statistical structure inherent in physiological signals. This study bridges this gap by presenting a comprehensive evaluation of four statistical imputation methods, linear interpolation, K Nearest Neighbors (KNN), Piecewise Cubic Hermite Interpolating Polynomial (PCHIP), and B splines, for short term HR data gaps. We assess their performance using both predictive accuracy metrics and statistical distance measures, including the Cohen Distance Test (CDT) and Jensen Shannon Distance (JS Distance), applied to HR data from the D1NAMO dataset and the BIG IDEAs Lab Glycemic Variability and Wearable Device dataset. The analysis reveals limitations in existing imputation approaches and the absence of a robust framework for evaluating imputation quality in physiological signals. Finally, this study proposes a foundational framework to develop a composite evaluation metric to assess imputation performance.

artificial intelligence, imputation technique, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2508.08268

Genre: Research Report > New Finding (0.47)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)

Add feedback

MoCap-Impute: A Comprehensive Benchmark and Comparative Analysis of Imputation Methods for IMU-based Motion Capture Data

Bekhit, Mahmoud, Salah, Ahmad, Alrawahi, Ahmed Salim, Attia, Tarek, Ali, Ahmed, Eldesokey, Esraa, Fathalla, Ahmed

arXiv.org Artificial IntelligenceJul-15-2025

Motion capture (MoCap) data from wearable Inertial Measurement Units (IMUs) is vital for applications in sports science, but its utility is often compromised by missing data. Despite numerous imputation techniques, a systematic performance evaluation for IMU-derived MoCap time-series data is lacking. We address this gap by conducting a comprehensive comparative analysis of statistical, machine learning, and deep learning imputation methods. Our evaluation considers three distinct contexts: univariate time-series, multivariate across subjects, and multivariate across kinematic angles. To facilitate this benchmark, we introduce the first publicly available MoCap dataset designed specifically for imputation, featuring data from 53 karate practitioners. We simulate three controlled missingness mechanisms: missing completely at random (MCAR), block missingness, and a novel value-dependent pattern at signal transition points. Our experiments, conducted on 39 kinematic variables across all subjects, reveal that multivariate imputation frameworks consistently outperform univariate approaches, particularly for complex missingness. For instance, multivariate methods achieve up to a 50% mean absolute error reduction (MAE from 10.8 to 5.8) compared to univariate techniques for transition point missingness. Advanced models like Generative Adversarial Imputation Networks (GAIN) and Iterative Imputers demonstrate the highest accuracy in these challenging scenarios. This work provides a critical baseline for future research and offers practical recommendations for improving the integrity and robustness of Mo-Cap data analysis.

data quality, imputation, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2507.10334

Country:

Africa > Middle East > Egypt (0.68)
Asia > Middle East (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Leisure & Entertainment > Sports (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Filling in the Blanks: Applying Data Imputation in incomplete Water Metering Data

Amaxilatis, Dimitrios, Sarantakos, Themistoklis, Chatzigiannakis, Ioannis, Mylonas, Georgios

arXiv.org Artificial IntelligenceJun-11-2025

--In this work, we explore the application of recent data imputation techniques to enhance monitoring and management of water distribution networks using smart water meters, based on data derived from a real-world IoT water grid monitoring deployment. Despite the detailed data produced by such meters, data gaps due to technical issues can significantly impact operational decisions and efficiency. Our results, by comparing various imputation methods, such as k-Nearest Neighbors, MissForest, Transformers, and Recurrent Neural Networks, indicate that effective data imputation can substantially enhance the quality of the insights derived from water consumption data as we study their effect on accuracy and reliability of water metering data to provide solutions in applications like leak detection and predictive maintenance scheduling. In the era of smart cities and advanced utility management, the monitoring of water grids has become increasingly pivotal to ensuring efficient distribution, sustainability, and infrastructure reliability. However, despite their sophistication, the occurrence of missing data due to various factors--ranging from technical malfunctions to data transmission errors-- remains an open challenge that undermines the integrity and actionable insights that can be derived from the datasets produced by such infrastructure. Moreover, the significance of addressing missing data extends beyond mere data completeness. In the context of water grid monitoring, it impacts decision-making processes related to water management, leak detection, and predictive maintenance, all of which have profound implications for operational efficiency and environmental sustainability.

artificial intelligence, data quality, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ISC260477.2024.11004254

2506.08882

Country: Europe > Greece (0.28)

Genre: Research Report > Promising Solution (0.93)

Industry:

Water & Waste Management > Water Management (0.68)
Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

Activity and Subject Detection for UCI HAR Dataset with & without missing Sensor Data

Saha, Debashish, Malik, Piyush, Saha, Adrika

arXiv.org Artificial IntelligenceMay-13-2025

Current studies in Human Activity Recognition (HAR) primarily focus on the classification of activities through sensor data, while there is not much emphasis placed on recognizing the individuals performing these activities. This type of classification is very important for developing personalized and context-sensitive applications. Additionally, the issue of missing sensor data, which often occurs in practical situations due to hardware malfunctions, has not been explored yet. This paper seeks to fill these voids by introducing a lightweight LSTM-based model that can be used to classify both activities and subjects. The proposed model was used to classify the HAR dataset by UCI [1], achieving an accuracy of 93.89% in activity recognition (across six activities), nearing the 96.67% benchmark, and an accuracy of 80.19% in subject recognition (involving 30 subjects), thereby establishing a new baseline for this area of research. We then simulate the absence of sensor data to mirror real-world scenarios and incorporate imputation techniques, both with and without Principal Component Analysis (PCA), to restore incomplete datasets. We found that K-Nearest Neighbors (KNN) imputation performs the best for filling the missing sensor data without PCA because the use of PCA resulted in slightly lower accuracy. These results demonstrate how well the framework handles missing sensor data, which is a major step forward in using the Human Activity Recognition dataset for reliable classification tasks.

artificial intelligence, machine learning, recognition, (16 more...)

arXiv.org Artificial Intelligence

2505.0673

Country: North America > United States (0.29)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The influence of missing data mechanisms and simple missing data handling techniques on fairness

Bhatti, Aeysha, Sandrock, Trudie, Nienkemper-Swanepoel, Johane

arXiv.org Machine LearningMar-10-2025

Fairness of machine learning algorithms is receiving increasing attention, as such algorithms permeate the day-to-day aspects of our lives. One way in which bias can manifest in a dataset is through missing values. If data are missing, these data are often assumed to be missing completely randomly; in reality the propensity of data being missing is often tied to the demographic characteristics of individuals. There is limited research into how missing values and the handling thereof can impact the fairness of an algorithm. Most researchers either apply listwise deletion or tend to use the simpler methods of imputation (e.g. mean or mode) compared to the more advanced ones (e.g. multiple imputation); we therefore study the impact of the simpler methods on the fairness of algorithms. The starting point of the study is the mechanism of missingness, leading into how the missing data are processed and finally how this impacts fairness. Three popular datasets in the field of fairness are amputed in a simulation study. The results show that under certain scenarios the impact on fairness can be pronounced when the missingness mechanism is missing at random. Furthermore, elementary missing data handling techniques like listwise deletion and mode imputation can lead to higher fairness compared to more complex imputation methods like k-nearest neighbour imputation, albeit often at the cost of lower accuracy.

dataset, fairness, imputation, (17 more...)

arXiv.org Machine Learning

2503.07313

Country:

Europe > Austria > Vienna (0.14)
Africa > South Africa (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Masking the Gaps: An Imputation-Free Approach to Time Series Modeling with Missing Data

Neog, Abhilash, Daw, Arka, Khorasgani, Sepideh Fatemi, Karpatne, Anuj

arXiv.org Artificial IntelligenceFeb-17-2025

A significant challenge in time-series (TS) modeling is the presence of missing values in real-world TS datasets. Traditional two-stage frameworks, involving imputation followed by modeling, suffer from two key drawbacks: (1) the propagation of imputation errors into subsequent TS modeling, (2) the trade-offs between imputation efficacy and imputation complexity. While one-stage approaches attempt to address these limitations, they often struggle with scalability or fully leveraging partially observed features. To this end, we propose a novel imputation-free approach for handling missing values in time series termed Missing Feature-aware Time Series Modeling (MissTSM) with two main innovations. First, we develop a novel embedding scheme that treats every combination of time-step and feature (or channel) as a distinct token. Second, we introduce a novel Missing Feature-Aware Attention (MFAA) Layer to learn latent representations at every time-step based on partially observed features. We evaluate the effectiveness of MissTSM in handling missing values over multiple benchmark datasets.

dataset, misstsm, variate, (15 more...)

arXiv.org Artificial Intelligence

2502.15785

Country:

North America > United States > Virginia (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.69)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback