Sequential Models in the Synthetic Data Vault
Kevin Zhang, Neha Patki, Kalyan Veeramachaneni
arXiv.org Artificial Intelligence
Synthetic data is machine-generated data created specifically to mimic the format and mathematical properties of real data. Its applications range from protecting the privacy of real data to creating enhanced, augmented datasets for data science. A few years ago we created an open source ecosystem called the Synthetic Data Vault (SDV), with the goal of being the most comprehensive and trusted set of approaches for creating synthetic data. To that end, the open source SDV library offers a variety of models suited to different uses, ranging from the original multi-table SDV model [4] to CTGAN, a popular GAN-based generative model [6]. SDV also provides a benchmarking system called SDGym, a set of metrics for evaluating synthetic data through a library called SDMetrics, and a set of reversible data transforms (called RDT) that convert several data types to numeric formats so that they can be modeled by generative models. With our abstractions and feedback from the community of researchers, our ability to create new models outpaced our ability to present them in a mathematically rigorous way. Researchers and users have consistently requested such a presentation. This paper is an attempt to describe the first sequential model in the SDV.
Jul-28-2022
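The reversible data transforms (RDT) mentioned in the abstract convert non-numeric columns into numbers a generative model can learn from, then map generated numbers back to the original data type. The following is a minimal Python sketch of that round trip for a single categorical column, assuming a simple integer-code encoding. The class `CategoricalLabelTransform` and its `fit`/`transform`/`reverse_transform` methods are hypothetical, chosen for illustration; they are not the RDT library's actual transformers, which the abstract does not describe.

```python
from typing import Dict, List


class CategoricalLabelTransform:
    """Hypothetical reversible transform for one categorical column.

    Illustrative sketch only; this is not the RDT library's API.
    """

    def __init__(self) -> None:
        self.to_code: Dict[str, int] = {}
        self.to_category: Dict[int, str] = {}

    def fit(self, values: List[str]) -> "CategoricalLabelTransform":
        # Assign integer codes in order of first appearance (deterministic).
        for value in values:
            if value not in self.to_code:
                code = len(self.to_code)
                self.to_code[value] = code
                self.to_category[code] = value
        return self

    def transform(self, values: List[str]) -> List[float]:
        # Numeric representation that a generative model could consume.
        return [float(self.to_code[v]) for v in values]

    def reverse_transform(self, numbers: List[float]) -> List[str]:
        # Model outputs may fall between or outside the learned codes,
        # so round to the nearest integer and clip to the valid range.
        max_code = len(self.to_code) - 1
        return [
            self.to_category[min(max(round(x), 0), max_code)] for x in numbers
        ]


# Usage: encode a column, then decode both real and "sampled" values.
colors = ["red", "green", "red", "blue"]
t = CategoricalLabelTransform().fit(colors)
encoded = t.transform(colors)  # [0.0, 1.0, 0.0, 2.0]
decoded = t.reverse_transform(encoded)
sampled = t.reverse_transform([1.3, -4.0, 9.7])  # ["green", "red", "blue"]
```

Integer codes impose an artificial ordering on the categories; a one-hot encoding avoids that at the cost of one column per category. The clipping step in `reverse_transform` is what keeps the transform usable on generated data, since a model is not guaranteed to emit exact, in-range codes.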