AITopics | Umesh, Chaithra

Umesh, Chaithra

Preserving logical and functional dependencies in synthetic tabular data

Umesh, Chaithra, Schultz, Kristian, Mahendra, Manjunath, Bej, Saptarshi, Wolkenhauer, Olaf

arXiv.org Artificial Intelligence, Sep-26-2024

Dependencies among attributes are a common aspect of tabular data. However, whether existing tabular data generation algorithms preserve these dependencies while generating synthetic data is yet to be explored. In addition to the existing notion of functional dependencies, we introduce the notion of logical dependencies among the attributes in this article. Moreover, we provide a measure to quantify logical dependencies among attributes in tabular data. Utilizing this measure, we compare several state-of-the-art synthetic data generation algorithms and test their capability to preserve logical and functional dependencies on several publicly available datasets. We demonstrate that currently available synthetic tabular data generation algorithms do not fully preserve functional dependencies when they generate synthetic datasets. In addition, we show that some tabular synthetic data generation models can preserve inter-attribute logical dependencies. Our review and comparison of the state-of-the-art reveal research needs and opportunities to develop task-specific synthetic tabular data generation models.

Keywords: Synthetic tabular data, Logical dependencies, Functional dependencies, Generative models

1. Introduction

Dependencies among attributes are a common aspect of tabular data. A well-known fact in Database Management Systems is that if one wants to remove redundancies by dividing larger tables into smaller ones (Normalization) [1], one needs tools to identify functional dependencies present among the attributes of the larger table [2]. Preserving functional dependencies in synthetic tabular data is an area that has not been explored. Dependencies exist in both tabular and image data.
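The functional dependencies mentioned in the introduction can be illustrated with a small check: a set of attributes A functionally determines an attribute B when any two rows that agree on A also agree on B. The following is a minimal sketch of that textbook definition only. It is not the logical-dependency measure proposed in the paper, and the function name `holds_functional_dependency` and the toy `dept`/`building` tables are hypothetical.

```python
def holds_functional_dependency(rows, determinant, dependent):
    """Return True if `determinant` -> `dependent` holds in `rows`.

    rows        : list of dicts, one dict per table row
    determinant : list of column names (left-hand side of the dependency)
    dependent   : single column name (right-hand side of the dependency)
    """
    seen = {}  # maps each determinant value to the first dependent value seen
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = row[dependent]
        # setdefault stores the value on first sight, otherwise returns the stored one
        if seen.setdefault(key, value) != value:
            return False  # same determinant, different dependent: violation
    return True


# Toy "real" table (hypothetical): each department sits in exactly one building.
real_rows = [
    {"dept": "cardiology", "building": "A"},
    {"dept": "oncology", "building": "B"},
    {"dept": "cardiology", "building": "A"},
]

# Toy "synthetic" table (hypothetical): one generated row breaks the dependency.
synthetic_rows = [
    {"dept": "cardiology", "building": "A"},
    {"dept": "cardiology", "building": "B"},
    {"dept": "oncology", "building": "B"},
]

real_ok = holds_functional_dependency(real_rows, ["dept"], "building")
synthetic_ok = holds_functional_dependency(synthetic_rows, ["dept"], "building")
```

Running the same check on a real table and on its synthetic counterpart gives a simple yes/no view of whether a known dependency survived generation. A graded measure, such as the one the abstract describes, would need more than this binary test.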

data mining, machine learning, natural language, (20 more...)

arXiv: 2409.17684

Country:

Asia (0.93)
Europe > Germany > Bavaria > Upper Bavaria (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)


Convex space learning for tabular synthetic data generation

Mahendra, Manjunath, Umesh, Chaithra, Bej, Saptarshi, Schultz, Kristian, Wolkenhauer, Olaf

arXiv.org Artificial Intelligence, Jul-13-2024

Generating synthetic samples from the convex space of the minority class is a popular oversampling approach for imbalanced classification problems. Recently, deep-learning approaches have been successfully applied to modeling the convex space of minority samples. Beyond oversampling, learning the convex space of neighborhoods in training data has not been used to generate entire tabular datasets. In this paper, we introduce a deep learning architecture (NextConvGeN) with a generator and a discriminator component that can generate synthetic samples by learning to model the convex space of tabular data. The generator takes data neighborhoods as input and creates synthetic samples within the convex space of that neighborhood. Thereafter, the discriminator tries to classify these synthetic samples against a randomly sampled batch of data from the rest of the data space. We compared our proposed model with five state-of-the-art tabular generative models across ten publicly available datasets from the biomedical domain. Our analysis reveals that synthetic samples generated by NextConvGeN can better preserve classification and clustering performance across real and synthetic data than other synthetic data generation models. Synthetic data generation by deep learning of the convex space produces high scores for popular utility measures. We further compared how diverse synthetic data generation strategies perform in the privacy-utility spectrum and produced critical arguments on the necessity of high-utility models. Our research on deep learning of the convex space of tabular data opens up opportunities in clinical research, machine learning model development, decision support systems, and clinical data sharing.
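The idea of creating a sample "within the convex space" of a neighborhood can be sketched without any learning: a convex combination of the neighborhood's points, with non-negative weights that sum to one, always lies inside their convex hull. The sketch below draws the weights at random, whereas the abstract describes a generator that learns to produce samples from a neighborhood, so this illustrates only the geometric idea and is not the NextConvGeN architecture. The function name `convex_sample` and the toy numeric neighborhood are hypothetical.

```python
import random


def convex_sample(neighbourhood, rng):
    """Return (point, weights): a random convex combination of the given points.

    neighbourhood : list of equal-length lists of floats (numeric features only)
    rng           : a random.Random instance, passed in for reproducibility
    """
    # Normalised exponential draws are uniformly distributed on the simplex
    # (a Dirichlet(1, ..., 1) sample): non-negative weights that sum to 1.
    raw = [rng.expovariate(1.0) for _ in neighbourhood]
    total = sum(raw)
    weights = [r / total for r in raw]

    n_features = len(neighbourhood[0])
    point = [
        sum(w * p[d] for w, p in zip(weights, neighbourhood))
        for d in range(n_features)
    ]
    return point, weights


# Toy neighbourhood of three records with two numeric features (hypothetical values).
neighbourhood = [
    [50.0, 120.0],
    [54.0, 130.0],
    [61.0, 125.0],
]

point, weights = convex_sample(neighbourhood, random.Random(0))
```

Because the weights are non-negative and sum to one, every feature of the synthetic point stays between the smallest and largest value of that feature in the neighborhood. Categorical attributes would need separate handling, which this sketch leaves out.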

deep learning, machine learning, tabular synthetic data generation, (2 more...)

arXiv: 2407.09789

Genre: Research Report > Experimental Study (0.53)

Industry: Health & Medicine (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
