Preserving logical and functional dependencies in synthetic tabular data
Umesh, Chaithra; Schultz, Kristian; Mahendra, Manjunath; Bej, Saptarshi; Wolkenhauer, Olaf
arXiv.org Artificial Intelligence
Dependencies among attributes are a common aspect of tabular data. However, whether existing tabular data generation algorithms preserve these dependencies when generating synthetic data has yet to be explored. In addition to the existing notion of functional dependencies, we introduce the notion of logical dependencies among attributes in this article, and we provide a measure to quantify logical dependencies among attributes in tabular data. Using this measure, we compare several state-of-the-art synthetic data generation algorithms and test their ability to preserve logical and functional dependencies on several publicly available datasets. We demonstrate that currently available synthetic tabular data generation algorithms do not fully preserve functional dependencies when they generate synthetic datasets. We also show that some synthetic tabular data generation models can preserve inter-attribute logical dependencies. Our review and comparison of the state of the art reveal research needs and opportunities to develop task-specific synthetic tabular data generation models.

Keywords: Synthetic tabular data, Logical dependencies, Functional dependencies, Generative models

1. Introduction

Dependencies among attributes are a common aspect of tabular data. It is well known in Database Management Systems that removing redundancies by dividing larger tables into smaller ones (Normalization) [1] requires tools to identify the functional dependencies present among the attributes of the larger table [2]. Preserving functional dependencies in synthetic tabular data is an area that has not been explored. Dependencies exist in both tabular and image data.
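To make the notion of a functional dependency concrete, the following is a minimal sketch, not the authors' evaluation code. It assumes the standard database definition (A -> B holds when every value of A is paired with exactly one value of B) and uses hypothetical toy tables; the function name `holds_fd` and the column names `zip` and `city` are illustrative assumptions only.

```python
from collections import defaultdict


def holds_fd(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are column names.
    The dependency holds when each lhs value maps to exactly one rhs value.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return all(len(values) == 1 for values in seen.values())


# Hypothetical "real" table: each zip code belongs to exactly one city.
real = [
    {"zip": "18057", "city": "Rostock"},
    {"zip": "18057", "city": "Rostock"},
    {"zip": "80331", "city": "Munich"},
]

# Hypothetical "synthetic" table: one zip code is paired with two cities,
# so the dependency zip -> city is violated.
synthetic = [
    {"zip": "18057", "city": "Rostock"},
    {"zip": "18057", "city": "Munich"},
    {"zip": "80331", "city": "Munich"},
]

real_ok = holds_fd(real, "zip", "city")
synthetic_ok = holds_fd(synthetic, "zip", "city")
```

Running the same check on a real table and on its synthetic counterpart, for each dependency found in the real table, gives a simple indication of whether a generator preserved it. The logical dependency measure introduced in the paper is not defined in this excerpt and is not reproduced here.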
Sep-26-2024
- Country:
- Asia (0.93)
- Europe > Germany
- Bavaria > Upper Bavaria (0.14)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Health & Medicine > Therapeutic Area (0.71)
- Technology: