Interpretable Generalized Additive Models for Datasets with Missing Values
Many important datasets contain samples that are missing one or more feature values. Maintaining the interpretability of machine learning models in the presence of such missing data is challenging. Singly or multiply imputing missing values complicates the model's mapping from features to labels. On the other hand, reasoning on indicator variables that represent missingness introduces a potentially large number of additional terms, sacrificing sparsity. We solve these problems with M-GAM, a sparse, generalized, additive modeling approach that incorporates missingness indicators and their interaction terms while maintaining sparsity through \ell_0 regularization.
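The augmented feature representation that such an approach reasons over can be sketched in a few lines: original features, binary missingness indicators, and their interaction terms. This is a minimal illustration of the representation only, not the authors' code; the \ell_0-regularized fitting itself requires a specialized solver and is omitted.

```python
import numpy as np

def augment_with_missingness(X):
    """Build an augmented design matrix containing:
    - the original features with missing entries zero-filled,
    - binary missingness indicators, and
    - feature x indicator interaction terms."""
    M = np.isnan(X).astype(float)         # missingness indicators
    X_filled = np.nan_to_num(X, nan=0.0)  # zero-fill so all terms are defined
    interactions = X_filled[:, :, None] * M[:, None, :]  # x_j * m_k terms
    return np.hstack([X_filled, M, interactions.reshape(len(X), -1)])

X = np.array([[1.0, np.nan], [2.0, 3.0]])
Z = augment_with_missingness(X)
print(Z.shape)  # (2, 8): 2 features + 2 indicators + 4 interactions
```

A sparse model fit on this matrix can then select only the few indicator or interaction terms that actually matter.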
Coresets for Clustering with Missing Values
We provide the first coreset for clustering points in \mathbb{R}^d that have multiple missing values (coordinates). Previous coreset constructions only allow one missing coordinate. The challenge in this setting is that objective functions, like k-Means, are evaluated only on the set of available (non-missing) coordinates, which varies across points. Recall that an \epsilon-coreset of a large dataset is a small proxy, usually a reweighted subset of points, that (1+\epsilon)-approximates the clustering objective for every possible center set. Our coresets for k-Means and k-Median clustering have size (jk)^{O(\min(j,k))} (\epsilon^{-1} d \log n)^2, where n is the number of data points, d is the dimension, and j is the maximum number of missing coordinates per data point. We further design an algorithm to construct these coresets in near-linear time, and consequently improve a recent quadratic-time PTAS for k-Means with missing values [Eiben et al., SODA 2021] to near-linear time. We validate our coreset construction, which is based on importance sampling and is easy to implement, on various real data sets.
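The importance-sampling idea behind such constructions can be sketched as follows: score each point by its cost to a rough center set (evaluated only on observed coordinates), sample proportionally to that score mixed with a uniform term, and reweight so the sample is unbiased. This is a generic sketch under those assumptions, not the paper's exact sensitivity bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_sq_dist(x, c):
    """Squared distance restricted to x's observed (non-NaN) coordinates."""
    obs = ~np.isnan(x)
    return np.sum((x[obs] - c[obs]) ** 2)

def importance_sample_coreset(X, centers, m):
    # Sensitivity proxy: each point's cost to its nearest rough center,
    # mixed with a uniform term so no sampling probability is zero.
    costs = np.array([min(masked_sq_dist(x, c) for c in centers) for x in X])
    p = 0.5 * costs / costs.sum() + 0.5 / len(X)
    idx = rng.choice(len(X), size=m, replace=True, p=p)
    weights = 1.0 / (m * p[idx])  # reweight so the sample is unbiased
    return X[idx], weights

X = rng.normal(size=(200, 3))
X[rng.random(X.shape) < 0.1] = np.nan  # ~10% missing coordinates
centers = X[:5].copy(); centers[np.isnan(centers)] = 0.0
S, w = importance_sample_coreset(X, centers, m=30)
print(S.shape, w.shape)
```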
Impact of Missing Values in Machine Learning: A Comprehensive Analysis
Ahmad, Abu Fuad, Sayeed, Md Shohel, Alshammari, Khaznah, Ahmed, Istiaque
Machine learning (ML) has become a ubiquitous tool across various domains of data mining and big data analysis. The efficacy of ML models depends heavily on high-quality datasets, which are often complicated by the presence of missing values. Consequently, the performance and generalization of ML models are at risk in the face of such datasets. This paper aims to examine the nuanced impact of missing values on ML workflows, including their types, causes, and consequences. Our analysis focuses on the challenges posed by missing values, including biased inferences, reduced predictive power, and increased computational burdens. The paper further explores strategies for handling missing values, including imputation techniques and removal strategies, and investigates how missing values affect model evaluation metrics and introduce complexities in cross-validation and model selection. The study employs case studies and real-world examples to illustrate the practical implications of addressing missing values. Finally, the discussion extends to future research directions, emphasizing the need for handling missing values ethically and transparently. The primary goal of this paper is to provide insights into the pervasive impact of missing values on ML models and guide practitioners toward effective strategies for achieving robust and reliable model outcomes.
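The two families of strategies the survey contrasts, removal and imputation, can be illustrated with a toy example (numpy used here for illustration; the trade-off is that deletion discards data while imputation distorts the distribution):

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [5.0, np.nan],
              [7.0, 8.0]])

# Removal (listwise deletion): keep only complete rows.
complete = X[~np.isnan(X).any(axis=1)]

# Imputation: replace each missing entry with its column mean.
col_means = np.nanmean(X, axis=0)
imputed = np.where(np.isnan(X), col_means, X)

print(complete.shape)  # (2, 2) - half the rows are discarded
print(imputed)
```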
Can time series forecasting be automated? A benchmark and analysis
Sreedhara, Anvitha Thirthapura, Vanschoren, Joaquin
In the field of machine learning and artificial intelligence, time series forecasting plays a pivotal role across various domains such as finance, healthcare, and weather. However, selecting the most suitable forecasting method for a given dataset is a complex task due to the diversity of data patterns and characteristics. This research aims to address this challenge by proposing a comprehensive benchmark for evaluating and ranking time series forecasting methods across a wide range of datasets. This study investigates the comparative performance of many methods from two prominent time series forecasting frameworks, AutoGluon-TimeSeries and sktime, to shed light on their applicability in different real-world scenarios. This research contributes to the field of time series forecasting by providing a robust benchmarking methodology and facilitating informed decision-making when choosing forecasting methods for optimal predictions.
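The core of such a benchmark, scoring several baselines on a held-out split and ranking them per dataset, can be sketched in plain numpy (a toy illustration, not the benchmark's actual protocol or the frameworks' APIs):

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

# Toy series with a trend: which baseline wins depends on the data pattern.
y = np.arange(20, dtype=float)
train, test = y[:15], y[15:]

naive_pred = np.full_like(test, train[-1])     # repeat last observed value
mean_pred  = np.full_like(test, train.mean())  # repeat training mean
drift_pred = train[-1] + (np.arange(1, len(test) + 1)
                          * (train[-1] - train[0]) / (len(train) - 1))

scores = {"naive": mae(test, naive_pred),
          "mean":  mae(test, mean_pred),
          "drift": mae(test, drift_pred)}
print(min(scores, key=scores.get))  # drift: on a trending series, drift wins
```

Running the same comparison over many datasets and aggregating the ranks is exactly the kind of evidence a benchmark like this provides.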
Not Another Imputation Method: A Transformer-based Model for Missing Values in Tabular Datasets
Caruso, Camillo Maria, Soda, Paolo, Guarrasi, Valerio
Handling missing values in tabular datasets presents a significant challenge in training and testing artificial intelligence models, an issue usually addressed using imputation techniques. Here we introduce "Not Another Imputation Method" (NAIM), a novel transformer-based model specifically designed to address this issue without the need for traditional imputation techniques. NAIM employs feature-specific embeddings and a masked self-attention mechanism that effectively learns from available data, thus avoiding the necessity to impute missing values. Additionally, a novel regularization technique is introduced to enhance the model's generalization capability from incomplete data. We extensively evaluated NAIM on 5 publicly available tabular datasets, demonstrating its superior performance over 6 state-of-the-art machine learning models and 4 deep learning models, each paired with 3 different imputation techniques when necessary. The results highlight the efficacy of NAIM in improving predictive performance and resilience in the presence of missing data. To facilitate further research and practical application in handling missing data without traditional imputation methods, we made the code for NAIM available at https://github.com/cosbidev/NAIM.
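The masked self-attention idea, letting the model attend only to features that are actually present, can be sketched in numpy. This is a minimal sketch of the mechanism, not NAIM's implementation; the feature embeddings and mask here are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_self_attention(E, observed):
    """E: (n_features, d) per-feature embeddings; observed: boolean mask.
    Missing features get -inf attention scores, so every output is a
    weighted combination of observed features only."""
    d = E.shape[1]
    scores = E @ E.T / np.sqrt(d)
    scores[:, ~observed] = -np.inf  # never attend to missing features
    A = softmax(scores, axis=1)
    return A @ E, A

rng = np.random.default_rng(0)
E = rng.normal(size=(4, 8))
observed = np.array([True, False, True, True])  # feature 1 is missing
out, A = masked_self_attention(E, observed)
print(np.allclose(A[:, 1], 0.0))  # True: missing feature gets zero weight
```

Because no value is ever substituted for the missing feature, the model avoids imputation entirely, which is the paper's central point.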
A Reproducibility Study on Quantifying Language Similarity: The Impact of Missing Values in the URIEL Knowledge Base
Toossi, Hasti, Huai, Guo Qing, Liu, Jinyu, Khiu, Eric, Doğruöz, A. Seza, Lee, En-Shiun Annie
In the pursuit of supporting more languages around the world, tools that characterize properties of languages play a key role in expanding the existing multilingual NLP research. In this study, we focus on a widely used typological knowledge base, URIEL, which aggregates linguistic information into numeric vectors. Specifically, we delve into the soundness and reproducibility of the approach taken by URIEL in quantifying language similarity. Our analysis reveals URIEL's ambiguity in calculating language distances and in handling missing values. Moreover, we find that URIEL does not provide any information about typological features for 31\% of the languages it represents, undermining the reliability of the database, particularly for low-resource languages. Our literature review suggests URIEL and lang2vec are used in papers on diverse NLP tasks, which motivates us to rigorously verify the database, as the effectiveness of these works depends on the reliability of the information the tool provides.
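One concrete way the missing-value ambiguity manifests is in distance computation: should a distance be computed over all features, or only over features defined for both languages? A sketch of the latter convention (illustrative only; URIEL/lang2vec's actual behavior is precisely what the paper finds under-specified):

```python
import numpy as np

def shared_feature_distance(u, v):
    """Cosine distance computed only on features defined for both
    languages (NaN marks a feature with no recorded value)."""
    shared = ~np.isnan(u) & ~np.isnan(v)
    if not shared.any():
        return np.nan  # no shared features: no basis for comparison
    a, b = u[shared], v[shared]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return np.nan
    return 1.0 - (a @ b) / denom

u = np.array([1.0, 0.0, np.nan, 1.0])
v = np.array([1.0, np.nan, 1.0, 1.0])
print(shared_feature_distance(u, v))  # 0.0: identical on features 0 and 3
```

Note how a language with 0% feature coverage yields NaN rather than a spurious distance, exactly the failure mode the 31% figure makes pressing.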
A Missing Value Filling Model Based on Feature Fusion Enhanced Autoencoder
Liu, Xinyao, Du, Shengdong, Li, Tianrui, Teng, Fei, Yang, Yan
With the advent of the big data era, the data quality problem is becoming more critical. Among many factors, data with missing values is one primary issue, and thus developing effective imputation models is a key topic in the research community. Recently, a major research direction has been to employ neural network models such as self-organizing maps or autoencoders for filling missing values. However, these classical methods can hardly discover interrelated features and common features simultaneously among data attributes. In particular, a very typical problem for classical autoencoders is that they often learn invalid constant mappings, which dramatically hurts the filling performance. To solve the above-mentioned problems, we propose a missing-value-filling model based on a feature-fusion-enhanced autoencoder. We first incorporate into an autoencoder a hidden layer that consists of de-tracking neurons and radial basis function neurons, which can enhance the ability to learn interrelated features and common features. In addition, we develop a missing value filling strategy based on dynamic clustering that is incorporated into an iterative optimization process. This design enhances the multi-dimensional feature fusion ability and thus improves the dynamic collaborative missing-value-filling performance. The effectiveness of the proposed model is validated by extensive experiments compared to a variety of baseline methods on thirteen data sets.
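The dynamic-clustering fill loop can be illustrated with a k-means-style sketch: assign each row to its nearest centroid using only observed coordinates, fill the missing entries from that centroid, and recompute centroids. This is a simplified stand-in for the paper's method (no autoencoder, no de-tracking or RBF neurons), intended only to show the iterative fill-and-recluster pattern.

```python
import numpy as np

rng = np.random.default_rng(0)

def cluster_fill(X, k=2, iters=10):
    """Iteratively fill missing entries from the nearest cluster centroid."""
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X, axis=0), X)        # mean-initialize
    centroids = filled[rng.choice(len(X), k, replace=False)]  # random init
    for _ in range(iters):
        # Distances on observed coordinates only.
        diff = filled[:, None, :] - centroids[None, :, :]
        diff[np.broadcast_to(mask[:, None, :], diff.shape)] = 0.0
        labels = (diff ** 2).sum(-1).argmin(1)
        filled = np.where(mask, centroids[labels], X)         # refill
        centroids = np.array([filled[labels == j].mean(0)
                              if (labels == j).any() else centroids[j]
                              for j in range(k)])
    return filled

X = np.array([[0.0, 0.1], [0.1, np.nan], [5.0, 5.1], [np.nan, 5.0]])
print(cluster_fill(X))
```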
Counterfactual Explanation with Missing Values
Kanamori, Kentaro, Takagi, Takuya, Kobayashi, Ken, Ike, Yuichi
Counterfactual Explanation (CE) is a post-hoc explanation method that provides a perturbation for altering the prediction result of a classifier. Users can interpret the perturbation as an "action" to obtain their desired decision results. Existing CE methods require complete information on the features of an input instance. However, we often encounter missing values in a given instance, and the previous methods do not work in such a practical situation. In this paper, we first empirically and theoretically show the risk that missing value imputation methods affect the validity of an action, as well as the features that the action suggests changing. Then, we propose a new framework of CE, named Counterfactual Explanation by Pairs of Imputation and Action (CEPIA), that enables users to obtain valid actions even with missing values and clarifies how actions are affected by imputation of the missing values. Specifically, our CEPIA provides a representative set of pairs of an imputation candidate for a given incomplete instance and its optimal action. We formulate the problem of finding such a set as a submodular maximization problem, which can be solved by a simple greedy algorithm with an approximation guarantee. Experimental results demonstrated the efficacy of our CEPIA in comparison with the baselines in the presence of missing values.
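The greedy algorithm with an approximation guarantee that the paper invokes is the classic one for monotone submodular maximization: repeatedly add the candidate with the largest marginal gain. A generic sketch on a toy coverage objective (CEPIA's actual objective is over imputation-action pairs, which this deliberately simplifies):

```python
def greedy_max(candidates, f, budget):
    """Greedy for monotone submodular maximization: repeatedly add the
    candidate with the largest marginal gain. For monotone submodular f
    this achieves a (1 - 1/e) approximation."""
    chosen = []
    for _ in range(budget):
        gains = [(f(chosen + [c]) - f(chosen), c)
                 for c in candidates if c not in chosen]
        if not gains:
            break
        best_gain, best = max(gains, key=lambda t: t[0])
        if best_gain <= 0:
            break
        chosen.append(best)
    return chosen

# Toy coverage objective: how many distinct elements the chosen sets cover.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}
f = lambda chosen: len(set().union(*(sets[c] for c in chosen))) if chosen else 0
print(greedy_max(list(sets), f, budget=2))  # ['c', 'a']
```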
Data Preprocessing with scikit-learn -- Missing Values
By popular demand from my previous article, in this tutorial I illustrate how to preprocess data using scikit-learn, a Python library for machine learning. Data preprocessing transforms data into a format that is more suitable for estimators. In my previous articles I illustrated how to deal with missing values, normalization, standardization, formatting and binning with Python pandas. In this tutorial I show you how to deal with missing values with scikit-learn. I will cover the other preprocessing techniques in scikit-learn in future posts.
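The workhorse for this in scikit-learn is `SimpleImputer` from `sklearn.impute`, which learns a per-column statistic on `fit` and substitutes it for missing entries on `transform`:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[7.0, 2.0],
              [4.0, np.nan],
              [np.nan, 6.0]])

# Replace each missing entry with the mean of its column.
imputer = SimpleImputer(strategy="mean")
X_imputed = imputer.fit_transform(X)
print(X_imputed)
# [[7.  2. ]
#  [4.  4. ]
#  [5.5 6. ]]
```

Besides `"mean"`, the `strategy` parameter also accepts `"median"`, `"most_frequent"`, and `"constant"` (with `fill_value`).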
Recurrent Neural Networks for Multivariate Time Series with Missing Values
Gated Recurrent Units (GRUs) are gating mechanisms introduced in 2014 by Cho et al. Unlike LSTMs, which have 3 gates, GRUs use 2 gates to process time series data. The main structure can be seen in Figure 3; for further understanding, Understanding GRU Networks is highly recommended. If you want to understand LSTMs and GRUs in one place, the article Illustrated Guide to LSTM's and GRU's: A step by step explanation is recommended.
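The two gates can be made concrete with a single GRU step in numpy (a bare-bones sketch with randomly initialized weights and no biases, following one common formulation of the update rule):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, params):
    """One GRU step with its two gates:
    the update gate z decides how much of the old state to keep,
    the reset gate r decides how much of it feeds the candidate."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1 - z) * h + z * h_tilde           # blend old state and candidate

rng = np.random.default_rng(0)
d_in, d_h = 3, 4
params = [rng.normal(scale=0.1, size=s)
          for s in [(d_h, d_in), (d_h, d_h)] * 3]
h = np.zeros(d_h)
for t in range(5):                             # run over a short sequence
    h = gru_cell(rng.normal(size=d_in), h, params)
print(h.shape)  # (4,)
```

With only z and r (versus the LSTM's input, forget, and output gates), the GRU has fewer parameters per hidden unit, which is its main practical appeal for time series data.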