TabINR: An Implicit Neural Representation Framework for Tabular Data Imputation

Ochs, Vincent, Bieder, Florentin, Hadramy, Sidaty el, Friedrich, Paul, Taha-Mehlitz, Stephanie, Taha, Anas, Cattin, Philippe C.

arXiv.org Artificial Intelligence 

Tabular data builds the basis for a wide range of applications, yet real-world datasets are frequently incomplete due to collection errors, privacy restrictions, or sensor failures. As missing values degrade the performance or hinder the applicability of downstream models, and while simple imputing strategies tend to introduce bias or distort the underlying data distribution, we require imputers that provide high-quality imputations, are robust across dataset sizes and yield fast inference. INR, an auto-decoder based Implicit Neural Representation (INR) framework that models tables as neural functions. Building on recent advances in generalizable INRs, we introduce learnable row and feature embeddings that effectively deal with the discrete structure of tabular data and can be inferred from partial observations, enabling instance adaptive imputations without modifying the trained model. We evaluate our framework across a diverse range of twelve real-world datasets and multiple missingness mechanisms, demonstrating consistently strong imputation accuracy, mostly matching or outperforming classical (KNN, MICE, MissForest) and deep learning based models (GAIN, ReMasker), with the clearest gains on high-dimensional datasets. Tabular data remains one of the most common data formats across domains such as healthcare, finance, and the social sciences (Shwartz-Ziv & Armon, 2022). In these fields, missing values are ubiquitous and can severely degrade the performance of downstream machine learning models. Poor handling of missingness not only reduces predictive accuracy but may also lead to biased decisions, with real-world consequences for applications such as medical diagnostics or financial risk assessment. These challenges make robust imputation a critical step for trustworthy tabular learning and data-driven decision making (Rubin, 1976).