Synthetic Dataset Generation with Itemset-Based Generative Models

Christian Lezcano, Marta Arias

arXiv.org Artificial Intelligence 

Limited availability of real data hinders the development and growth of knowledge in many scientific and industrial endeavours. The field of synthetic data generation tries to overcome this problem by developing data generators that produce datasets free of privacy or publishing restrictions. In this paper we propose data generators that take an original real dataset as input and produce "fake copies" of it that preserve much of the structure of the original dataset without revealing actual information from it. Synthetic data should capture characteristics of the original data and should also represent them in a general way. Another important advantage of using synthetic data is therefore that, by fine-tuning the parameters of the data generation process, it may allow researchers to discover new information and insights that are not present in real datasets.
