Synthetic Dataset Generation with Itemset-Based Generative Models
Christian Lezcano, Marta Arias
arXiv.org Artificial Intelligence
Limited availability of real data hinders the development and growth of knowledge in all kinds of scientific and industrial endeavours. The field of synthetic data generation tries to overcome this problem by developing data generators that produce datasets without any privacy or publishing restrictions. In this paper we propose data generators that take an original real dataset as input and produce "fake copies" of it that preserve much of the structure of the original dataset without revealing actual information from it. Synthetic data should capture characteristics of the original data and represent them in a general way. Another important advantage of synthetic data is that, by fine-tuning the parameters of the data generation process, researchers may discover new information and insights that are not present in real datasets.
Jul-13-2020
- Country:
  - North America > United States
    - District of Columbia > Washington (0.04)
  - Asia > Middle East
    - Jordan (0.04)
- Genre:
  - Research Report (1.00)
- Technology: