Quantum-Inspired Optimization Process for Data Imputation

Mohanty, Nishikanta; Behera, Bikash K.; Mukherjee, Badshah; Ferrie, Christopher

arXiv.org Artificial Intelligence 

Data imputation is a critical step in data pre-processing, particularly for datasets with missing or unreliable values. This study introduces a novel quantum-inspired imputation framework evaluated on the UCI Diabetes dataset, which contains biologically implausible missing values across several clinical features. The method integrates Principal Component Analysis (PCA) with quantum-assisted rotations, optimized through gradient-free classical optimizers (COBYLA, Simulated Annealing, and Differential Evolution), to reconstruct missing values while preserving statistical fidelity. Reconstructed values are constrained within 2 standard deviations of original feature distributions, avoiding unrealistic clustering around central tendencies. This approach achieves a substantial and statistically significant improvement, including an average reduction of over 85% in Wasserstein distance and Kolmogorov-Smirnov test p-values between 0.18 and 0.22, compared to p-values > 0.99 in classical methods such as Mean, KNN, and MICE. The method also eliminates zero-value artifacts and enhances the realism and variability of imputed data. By combining quantum-inspired transformations with a scalable classical framework, this methodology provides a robust solution for imputation tasks in domains such as healthcare and AI pipelines, where data quality and integrity are crucial.

INTRODUCTION

Data imputation is a statistical technique for addressing missing or partial data values within a dataset. Missing data may arise from various sources, including sensor faults, human errors, system failures, or privacy constraints [1]. The imputation process replaces missing values with estimates derived from the available data while preserving the dataset's integrity and minimizing bias [2]. Imputation plays a vital role in numerous sectors and scenarios where data completeness is essential for analysis and decision-making.
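The abstract scores imputation quality with the Wasserstein distance and the Kolmogorov-Smirnov test, and bounds reconstructed values to within 2 standard deviations of the feature distribution. The sketch below illustrates only those two ideas on synthetic data. It is not the authors' PCA and quantum-rotation pipeline: the synthetic feature, the zero-coded missing entries, the Gaussian draw used as a stand-in imputer, and the helper names `clip_to_two_sigma` and `distribution_metrics` are all assumptions made for illustration. Which pair of distributions the paper actually compares is not stated in this excerpt, so the comparison of observed against imputed values here is also an assumption.

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance


def clip_to_two_sigma(imputed, observed):
    """Bound imputed values to mean +/- 2 standard deviations of the observed data."""
    mu, sigma = observed.mean(), observed.std()
    return np.clip(imputed, mu - 2.0 * sigma, mu + 2.0 * sigma)


def distribution_metrics(observed, imputed):
    """Return (Wasserstein distance, KS p-value) between observed and imputed values."""
    w = wasserstein_distance(observed, imputed)
    p = ks_2samp(observed, imputed).pvalue
    return w, p


# Synthetic stand-in for one clinical feature (assumption, not the UCI data).
rng = np.random.default_rng(0)
feature = rng.normal(120.0, 30.0, size=500)
missing = rng.random(feature.size) < 0.2
feature[missing] = 0.0  # missing entries coded as implausible zeros

observed = feature[~missing]
n_missing = int(missing.sum())

# Baseline: mean imputation collapses every missing entry onto one value.
mean_fill = np.full(n_missing, observed.mean())

# Stand-in imputer (assumption): Gaussian draws, then the 2-sigma bound.
draw_fill = clip_to_two_sigma(
    rng.normal(observed.mean(), observed.std(), size=n_missing), observed
)

w_mean, p_mean = distribution_metrics(observed, mean_fill)
w_draw, p_draw = distribution_metrics(observed, draw_fill)
print(f"mean fill:   W={w_mean:.2f}, KS p={p_mean:.3g}")
print(f"bounded draw: W={w_draw:.2f}, KS p={p_draw:.3g}")
```

Mean imputation places all imputed mass at a single point, so its Wasserstein distance to the observed values is large. An imputer that keeps some spread inside the 2-sigma band stays much closer to the observed distribution, which is the kind of behavior the abstract describes.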
