AITopics | value imputation


value imputation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Simultaneous Missing Value Imputation and Structure Learning with Groups

Neural Information Processing Systems, Feb-10-2026, 04:06:45 GMT

Understanding the structural relationships among different variables provides critical insights in many real-world applications, such as medicine, economics and education [42, 62]. Thus, learning graphs from observed data, known as structure learning, has recently made remarkable progress [10, 61, 63, 64]. For many applications, variables in the data can be gathered into semantically meaningful groups, where useful insights are at group level. For example, in finance, one may be interested in how a financial situation influences different industries (i.e.

artificial intelligence, arxivpreprintarxiv, machine learning

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Spain (0.04)

Industry: Education (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.35)


LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence

Zhang, Xingxuan, Ren, Gang, Yu, Han, Yuan, Hao, Wang, Hui, Li, Jiansheng, Wu, Jiayun, Mo, Lang, Mao, Li, Hao, Mingchao, Dai, Ningbo, Xu, Renzhe, Li, Shuyang, Zhang, Tianyang, He, Yue, Wang, Yuanrui, Zhang, Yunjia, Xu, Zijing, Li, Dongzhe, Gao, Fang, Zou, Hao, Liu, Jiandong, Liu, Jiashuo, Xu, Jiawei, Cheng, Kaijie, Li, Kehan, Zhou, Linjun, Li, Qing, Fan, Shaohua, Lin, Xiaoyu, Han, Xinyan, Li, Xuanyue, Lu, Yan, Xue, Yuan, Jiang, Yuanyuan, Wang, Zimu, Wang, Zhenlei, Cui, Peng

arXiv.org Artificial Intelligence, Nov-10-2025

We argue that progress toward general intelligence requires complementary foundation models grounded in language, the physical world, and structured data. This report presents LimiX-16M and LimiX-2M, two instantiations of our large structured-data models (LDMs). Both models treat structured data as a joint distribution over variables and missingness and are thus capable of addressing a wide range of tabular tasks through query-based conditional prediction with a single model. They are pretrained using masked joint-distribution modeling with an episodic, context-conditional objective, supporting rapid, training-free adaptation at inference. We evaluate LimiX models across 11 large structured-data benchmarks spanning broad regimes of sample size, feature dimensionality, number of classes, categorical-to-numerical feature ratio, missingness, and sample-to-feature ratio. LimiX-16M consistently surpasses strong baselines, as shown in Figure 1 and Figure 2. This advantage holds across a wide range of tasks, such as classification, regression, missing value imputation, and data generation, often by substantial margins, while avoiding task-specific architectures or bespoke training per task. Notably, LimiX-2M delivers strong results under tight compute and memory budgets. We also present the first scaling law study for LDMs, revealing how data and model scaling jointly influence downstream performance and offering quantitative guidance for tabular foundation modeling. All LimiX models are publicly accessible under Apache 2.0.
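The idea of posing every tabular task as a conditional query over masked cells can be illustrated with a generic masked-reconstruction setup. The sketch below hides a random subset of a row's observed cells and records them as query targets; predicting a label is then the special case where only the label column is queried. This is a hypothetical simplification for illustration, not LimiX's pretraining objective, and the function names and the `mask_rate` parameter are assumptions.

```python
import random
from typing import List, Optional, Tuple

Cell = Optional[float]  # None marks a missing value
Query = Tuple[List[Cell], List[int], List[float]]

def make_masked_query(row: List[Cell], mask_rate: float,
                      rng: random.Random) -> Query:
    """Hide a random subset of a row's observed cells (hypothetical helper).

    Returns (model_input, query_columns, targets). Cells that were already
    missing stay missing and are never used as reconstruction targets.
    """
    model_input = list(row)
    query_columns: List[int] = []
    targets: List[float] = []
    for col, value in enumerate(row):
        if value is not None and rng.random() < mask_rate:
            model_input[col] = None  # hidden from the model
            query_columns.append(col)
            targets.append(value)
    return model_input, query_columns, targets

def label_query(row: List[Cell], label_col: int) -> Query:
    """Supervised prediction phrased as a query: mask only the label column."""
    target = row[label_col]
    if target is None:
        raise ValueError("label is missing")
    model_input = list(row)
    model_input[label_col] = None
    return model_input, [label_col], [target]

row = [0.5, None, 3.0, 1.0]
print(label_query(row, 3))  # ([0.5, None, 3.0, None], [3], [1.0])
```

Under this framing, imputation, classification, and regression differ only in which columns are queried, which is why a single model can serve all of them.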

benchmark, large language model, machine learning

arXiv.org Artificial Intelligence

2509.03505

Genre: Research Report > New Finding (0.92)

Industry:

Health & Medicine > Therapeutic Area (0.67)
Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)


Simultaneous Missing Value Imputation and Structure Learning with Groups

Pablo Morales-Alvarez (University of Granada), Wenbo Gong (Microsoft Research), Angus Lamb

Neural Information Processing Systems, Aug-16-2025, 09:41:30 GMT

For many applications, variables in the data can be gathered into semantically meaningful groups, where useful insights are at group level. For example, in finance, one may be interested in how a financial situation influences different industries (i.e.

artificial intelligence, bayesian inference, machine learning

Neural Information Processing Systems

Country: Europe > Spain (0.28)

Genre: Research Report (0.46)

Industry:

Education (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)


No Imputation of Missing Values In Tabular Data Classification Using Incremental Learning

Samad, Manar D., Akhter, Kazi Fuad B., Rabbani, Shourav B., Kowsar, Ibna

arXiv.org Machine Learning, Apr-20-2025

Tabular data sets with varying missing values are prepared for machine learning using an arbitrary imputation strategy. Synthetic values generated by imputation models often raise concerns among data stakeholders about computational complexity, data quality, and data-driven outcomes. This paper eliminates these concerns by proposing no imputation incremental learning (NIIL) of tabular data with varying missing value rates and types. The proposed method incrementally learns partitions of overlapping feature sets while using attention masks to exclude missing values from attention scoring. Ranked by average classification performance across 15 diverse tabular data sets, NIIL outperforms 11 state-of-the-art learning methods with or without missing value imputation. Further experiments substantiate the robustness of NIIL against varying missing value types and rates compared to methods that impute missing values. Our empirical analysis reveals that a feature partition size of half the original feature space is the best choice for the proposed incremental learning in terms of both computation and accuracy. The proposed method is one of the first deep learning solutions that can effectively learn from tabular data without requiring the imputation of missing values.
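Excluding missing values from attention scoring is commonly implemented as a masked softmax: a missing feature's score is set to negative infinity, so it receives exactly zero attention weight and never needs an imputed value. A minimal pure-Python sketch for a single query, assuming scaled dot-product scores over per-feature key vectors with None marking a missing feature; the function name and inputs are hypothetical, and NIIL's overlapping feature partitions are not shown.

```python
import math
from typing import List, Optional

def masked_attention_weights(query: List[float],
                             keys: List[Optional[List[float]]]) -> List[float]:
    """Attention weights of one query over per-feature keys (illustrative).

    A key of None marks a missing feature: its score is forced to -inf,
    so the softmax assigns it exactly zero weight.
    """
    scale = math.sqrt(len(query))
    scores = []
    for key in keys:
        if key is None:
            scores.append(-math.inf)  # missing feature is masked out
        else:
            scores.append(sum(q * k for q, k in zip(query, key)) / scale)
    observed = [s for s in scores if s != -math.inf]
    if not observed:
        raise ValueError("every feature is missing")
    top = max(observed)  # subtract the max for numerical stability
    exps = [0.0 if s == -math.inf else math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

weights = masked_attention_weights([1.0, 0.0], [[1.0, 0.0], None, [0.0, 1.0]])
print(weights[1])  # 0.0
```

The observed features still share the full probability mass, so the downstream representation is built only from values that were actually recorded.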

artificial intelligence, imputation, machine learning

arXiv.org Machine Learning

2504.1461

Country: North America > United States > Tennessee > Davidson County > Nashville (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.93)


DeepIFSAC: Deep Imputation of Missing Values Using Feature and Sample Attention within Contrastive Framework

Kowsar, Ibna, Rabbani, Shourav B., Hou, Yina, Samad, Manar D.

arXiv.org Machine Learning, Feb-5-2025

Missing values of varying patterns and rates in real-world tabular data pose a significant challenge in developing reliable data-driven models. Existing missing value imputation methods rely on statistical and traditional machine learning techniques and are ineffective when the missing rate is high and values are not missing at random. This paper explores row and column attention in tabular data as between-feature and between-sample attention in a novel framework to reconstruct missing values. The proposed method uses CutMix data augmentation within a contrastive learning framework to reduce the uncertainty of missing value estimation. The performance and generalizability of trained imputation models are evaluated on set-aside test data folds with missing values. The proposed framework outperforms nine state-of-the-art imputation methods across several missing value types and rates (10%-50%) on a diverse selection of twelve tabular data sets. We evaluate the quality of imputed data using real-world electronic health records with missing values, demonstrating our proposed framework's superiority to state-of-the-art statistical, machine learning, and deep imputation methods. This paper highlights the heterogeneity of tabular data sets and recommends imputation methods based on missing value types and data characteristics.
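In tabular contrastive learning, CutMix is commonly adapted by replacing a random subset of a row's feature values with the values of the same features from another randomly chosen row, yielding an augmented view of that row. A minimal sketch of this idea, assuming a per-feature swap probability `p` (a hypothetical parameter) and rows stored as lists with None for missing cells; it illustrates the augmentation only and is not the DeepIFSAC implementation.

```python
import random
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing cell

def tabular_cutmix(rows: List[Row], i: int, p: float = 0.3,
                   rng: Optional[random.Random] = None) -> Row:
    """Return an augmented copy of rows[i] (illustrative CutMix for tables).

    Each feature is replaced, with probability p, by the same feature of a
    single partner row drawn at random from the other rows.
    """
    rng = rng or random.Random()
    if len(rows) < 2:
        raise ValueError("need at least two rows to mix")
    partner_idx = rng.choice([k for k in range(len(rows)) if k != i])
    anchor, partner = rows[i], rows[partner_idx]
    # rng.random() is in [0, 1), so p=0 keeps every value and p=1 swaps all
    return [partner[f] if rng.random() < p else anchor[f]
            for f in range(len(anchor))]

rows = [[1.0, 2.0, 3.0], [4.0, None, 6.0], [7.0, 8.0, 9.0]]
view = tabular_cutmix(rows, 0, p=0.5, rng=random.Random(42))
```

In a contrastive setup, the original row and its augmented view would form a positive pair; a swapped-in cell may itself be missing, as the second row shows.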

artificial intelligence, data mining, machine learning

arXiv.org Machine Learning

2501.1091

Country: North America > United States > Tennessee > Davidson County > Nashville (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.69)
Health & Medicine > Health Care Technology > Medical Record (0.56)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Data Wrangling Task Automation Using Code-Generating Language Models

Akella, Ashlesha, Narayanam, Krishnasuri

arXiv.org Artificial Intelligence, Feb-4-2025

Ensuring data quality in large tabular datasets is a critical challenge, typically addressed through data wrangling tasks. Traditional statistical methods, though efficient, often cannot understand the semantic context, and deep learning approaches are resource-intensive, requiring task- and dataset-specific training. To overcome these shortcomings, we present an automated system that uses large language models to generate executable code for tasks such as missing value imputation, error detection, and error correction. Our system aims to identify inherent patterns in the data while leveraging external knowledge, effectively addressing both memory-dependent and memory-independent tasks.

dataset, imputation, proceedings

arXiv.org Artificial Intelligence

2502.15732

Country:

North America > United States > California (0.04)
Asia > India (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Quality > Data Cleaning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)


Data Enrichment Opportunities for Distribution Grid Cable Networks using Variational Autoencoders

Sundsgaard, Konrad, Bölat, Kutay, Yang, Guangya

arXiv.org Artificial Intelligence, Jan-18-2025

Electricity distribution cable networks suffer from incomplete and unbalanced data, hindering the effectiveness of machine learning models for predictive maintenance and reliability evaluation. Features such as the installation date of the cables are frequently missing. To address data scarcity, this study investigates the application of Variational Autoencoders (VAEs) for data enrichment, synthetic data generation, imbalanced data handling, and outlier detection. Based on a proof-of-concept case study for Denmark, targeting the imputation of missing age information in cable network asset registers, the analysis underlines the potential of generative models to support data-driven maintenance. However, the study also highlights several areas for improvement, including enhanced feature importance analysis, incorporating network characteristics and external features, and handling biases in missing data. Future initiatives should expand the application of VAEs by incorporating semi-supervised learning, advanced sampling techniques, and additional distribution grid elements, including low-voltage networks, into the analysis.

artificial intelligence, machine learning, vae

arXiv.org Artificial Intelligence

2501.1092

Country:

Europe > Netherlands > South Holland > Delft (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Europe > Spain > Community of Madrid > Madrid (0.04)
Europe > Italy > Lazio > Rome (0.04)

Genre: Research Report (0.50)

Industry: Energy > Power Industry (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


SketchFill: Sketch-Guided Code Generation for Imputing Derived Missing Values

Zhang, Yunfan, Li, Changlun, Luo, Yuyu, Tang, Nan

arXiv.org Artificial Intelligence, Dec-26-2024

Missing values are a critical issue in data science, significantly impacting the reliability of analyses and predictions. Missing value imputation (MVI) is a longstanding problem because it relies heavily on domain knowledge. Large language models (LLMs) have emerged as a promising tool for data cleaning, including MVI for tabular data, offering advanced capabilities for understanding and generating content. However, despite their promise, existing LLM techniques such as in-context learning and Chain-of-Thought (CoT) often fall short in guiding LLMs to perform complex reasoning for MVI, particularly when imputing derived missing values, which require mathematical formulas and data relationships across rows and columns. This gap underscores the need for further advances in LLM methodologies that strengthen their reasoning capabilities for more reliable imputation. To fill this gap, we propose SketchFill, a novel sketch-based method that guides LLMs to generate accurate formulas for imputing missing numerical values. Our experimental results demonstrate that SketchFill significantly outperforms state-of-the-art methods, achieving 56.2% higher accuracy than CoT-based methods and 78.8% higher accuracy than MetaGPT. This sets a new standard for automated data cleaning and advances the field of MVI for numerical values.
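A derived missing value is one that can be recomputed from other cells through a formula, for example a total equal to price times quantity. The sketch below illustrates only the final step: validating a candidate formula on rows where the value is observed, then applying it to fill the gaps. SketchFill's actual contribution, guiding an LLM to produce such formulas, is not shown; the column names, the formula, the `impute_derived` helper, and the tolerance are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]

def impute_derived(rows: List[Row], target: str,
                   formula: Callable[[Row], float],
                   tol: float = 1e-6) -> List[Row]:
    """Fill missing `target` cells with formula(row) (hypothetical helper).

    The formula is first checked against every row where `target` is
    observed and rejected if it disagrees by more than `tol`. Assumes the
    formula's input columns are themselves observed.
    """
    for row in rows:
        observed = row[target]
        if observed is not None and abs(formula(row) - observed) > tol:
            raise ValueError(f"formula contradicts observed row: {row}")
    filled = []
    for row in rows:
        new = dict(row)  # copy so the caller's rows are not mutated
        if new[target] is None:
            new[target] = formula(row)
        filled.append(new)
    return filled

rows = [
    {"price": 2.0, "qty": 3.0, "total": 6.0},
    {"price": 1.5, "qty": 4.0, "total": None},  # derived value missing
]
result = impute_derived(rows, "total", lambda r: r["price"] * r["qty"])
print(result[1]["total"])  # 6.0
```

Checking the formula against observed rows first is one simple way to catch a wrong formula before it silently fills the table.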

imputation, large language model, machine learning

arXiv.org Artificial Intelligence

2412.19113

Country: Asia (1.00)

Genre: Research Report > New Finding (0.66)

Industry:

Banking & Finance (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)


GIG: Graph Data Imputation With Graph Differential Dependencies

Hua, Jiang, Bewong, Michael, Kwashie, Selasi, Rahman, MD Geaur, Hu, Junwei, Guo, Xi, Fen, Zaiwen

arXiv.org Artificial Intelligence, Oct-21-2024

Data imputation addresses the challenge of imputing missing values in database instances while ensuring consistency with the overall semantics of the dataset. Although several heuristics that rely on statistical methods and ad-hoc rules have been proposed, these do not generalise well and often lack data context; consequently, they also lack explainability. The existing techniques also mostly focus on the relational data context, making them unsuitable for wider application contexts such as graph data. In this paper, we propose a graph data imputation approach called GIG, which relies on graph differential dependencies (GDDs). GIG learns the GDDs from a given knowledge graph and uses these rules to train a transformer model, which then predicts the values of missing data within the graph. By leveraging GDDs, GIG incorporates semantic knowledge into the data imputation process, making it more reliable and explainable. Experimental results on seven real-world datasets highlight GIG's effectiveness compared to existing state-of-the-art approaches.
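A graph differential dependency constrains the attribute values of entities that match a graph pattern: when two entities are close on some attributes (within given distance thresholds), they must also be close on others. The sketch below keeps only that distance-threshold idea, applied to flat node attributes, with the right-hand-side threshold taken as zero so a missing value is copied from the nearest node that satisfies the rule's left-hand side. It omits the graph patterns and the transformer model that GIG trains; the rule, thresholds, attribute names, and the `dd_impute` helper are hypothetical.

```python
from typing import Dict, List, Optional

Node = Dict[str, Optional[float]]

def dd_impute(nodes: List[Node], lhs: Dict[str, float], target: str) -> List[Node]:
    """Fill missing `target` values with a differential-dependency-style rule.

    Hypothetical rule: if two nodes differ by at most lhs[a] on every
    attribute a in lhs, their `target` values should agree. A missing target
    is copied from the closest qualifying node (by total lhs distance).
    """
    filled = [dict(n) for n in nodes]  # copies; the input is not mutated
    for node in filled:
        if node[target] is not None:
            continue
        best: Optional[float] = None
        best_dist = float("inf")
        for other in nodes:  # only originally observed values act as sources
            if other[target] is None:
                continue
            if any(node[a] is None or other[a] is None for a in lhs):
                continue
            if all(abs(node[a] - other[a]) <= tau for a, tau in lhs.items()):
                dist = sum(abs(node[a] - other[a]) for a in lhs)
                if dist < best_dist:
                    best, best_dist = other[target], dist
        node[target] = best  # stays None when no node satisfies the rule
    return filled

nodes = [
    {"x": 1.0, "y": 2.0, "label": 5.0},
    {"x": 1.2, "y": 2.1, "label": None},
    {"x": 9.0, "y": 9.0, "label": 1.0},
]
out = dd_impute(nodes, {"x": 0.5, "y": 0.5}, "label")
print(out[1]["label"])  # 5.0
```

Because each imputed value traces back to a specific rule and source node, this style of imputation is explainable in the sense the abstract describes, though a learned model such as GIG's transformer can generalise beyond exact rule matches.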

data mining, imputation, machine learning

arXiv.org Artificial Intelligence

2410.15747

Country:

Oceania > Australia (0.04)
Asia > China > Hubei Province > Wuhan (0.04)
Asia > British Indian Ocean Territory > Diego Garcia (0.04)

Genre: Research Report (0.84)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)


Machine Learning for Missing Value Imputation

Ahmad, Abu Fuad, Alshammari, Khaznah, Ahmed, Istiaque, Sayed, MD Shohel

arXiv.org Artificial Intelligence, Oct-10-2024

In recent times, a considerable number of research studies have been carried out to address the issue of Missing Value Imputation (MVI). MVI aims to provide a primary solution for datasets that have one or more missing attribute values. The advancements in Artificial Intelligence (AI) drive the development of new and improved machine learning (ML) algorithms and methods. The advancements in ML have opened up significant opportunities for effectively imputing these missing values. The main objective of this article is to conduct a comprehensive and rigorous review, as well as analysis, of the state-of-the-art ML applications in MVI methods. This analysis seeks to enhance researchers' understanding of the subject and facilitate the development of robust and impactful interventions in data preprocessing for Data Analytics. The review is performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) technique. More than 100 articles published between 2014 and 2023 are critically reviewed, considering the methods and findings. Furthermore, the latest literature is examined to scrutinize the trends in MVI methods and their evaluation. The accomplishments and limitations of the existing literature are discussed in detail. The survey concludes by identifying the current gaps in research and providing suggestions for future research directions and emerging trends in related fields of interest.
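As a concrete reference point for the ML-based methods such a review covers, two classical baselines that MVI comparisons commonly include are column-mean imputation and k-nearest-neighbour (kNN) imputation. A minimal pure-Python sketch with missing cells as None; the choice of k, the Euclidean distance over co-observed features, and the fallback to the column mean are assumptions, not details taken from the review.

```python
import math
from typing import List, Optional

Table = List[List[Optional[float]]]  # None marks a missing cell

def mean_impute(data: Table) -> List[List[float]]:
    """Replace each missing cell with its column's observed mean (0.0 if none)."""
    n_cols = len(data[0])
    means = []
    for c in range(n_cols):
        observed = [row[c] for row in data if row[c] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[c] if v is None else v for c, v in enumerate(row)]
            for row in data]

def knn_impute(data: Table, k: int = 2) -> List[List[float]]:
    """Fill each missing cell with the mean of that column over the k nearest
    rows that observe it. Distance is Euclidean over features observed in
    both rows; cells with no usable neighbour fall back to the column mean."""
    fallback = mean_impute(data)
    result = []
    for i, row in enumerate(data):
        new_row = list(row)
        for c, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for j, other in enumerate(data):
                if j == i or other[c] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                candidates.append((dist, other[c]))
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            new_row[c] = (sum(val for _, val in nearest) / len(nearest)
                          if nearest else fallback[i][c])
        result.append(new_row)
    return result

data = [[1.0, 2.0], [1.1, None], [5.0, 10.0], [0.9, 2.2]]
by_mean = mean_impute(data)    # fills with the mean of 2.0, 10.0, 2.2
by_knn = knn_impute(data, k=2)  # fills with the mean of the two closest rows
```

The contrast is instructive: the outlying row pulls the column mean well above the values of similar rows, while kNN uses only the nearby rows, which is the kind of local structure more advanced ML imputers try to exploit.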

data mining, imputation, machine learning

arXiv.org Artificial Intelligence

2410.08308

Country:

North America > United States > New Mexico > Doña Ana County > Las Cruces (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
North America > Mexico (0.04)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Health & Medicine (0.67)
Education (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
