Sequential Models in the Synthetic Data Vault

Zhang, Kevin, Patki, Neha, Veeramachaneni, Kalyan

arXiv.org Artificial Intelligence 

Synthetic data is machine-generated data created specifically to mimic the format and mathematical properties of real data. Its applications range from protecting the privacy of real data to creating enhanced, augmented datasets for data science. A few years ago we created an open source ecosystem called the Synthetic Data Vault (SDV), with the goal of being the most comprehensive and trusted set of approaches for creating synthetic data. To that end, the open source SDV library offers a variety of models suited to different uses, ranging from the original multi-table SDV model [4] to CTGAN, a popular GAN-based generative model [6]. SDV also provides a benchmarking system called SDGym, a library of metrics for evaluating synthetic data called SDMetrics, and a set of reversible data transforms (called RDT) that convert several data types to numeric formats so that they can be modeled by generative models. Thanks to our abstractions and feedback from the research community, our ability to create new models has outpaced our ability to present them in a mathematically rigorous way, and researchers and users have consistently requested such a presentation. This paper is an attempt to describe the first sequential model in the SDV.
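To make the idea of a reversible data transform concrete, the sketch below converts a categorical column to numeric one-hot vectors and back. One-hot encoding is an assumed example chosen for illustration. This is not the RDT library's API, and the function names (`fit_categories`, `to_numeric`, `from_numeric`) are hypothetical. The reverse step takes the largest entry in each vector, so approximate vectors such as those a generative model might emit still map back to a valid category.

```python
from typing import List, Sequence


def fit_categories(values: Sequence[str]) -> List[str]:
    """Learn the sorted set of categories present in a column (hypothetical helper)."""
    return sorted(set(values))


def to_numeric(values: Sequence[str], categories: List[str]) -> List[List[float]]:
    """Forward transform: map each category to a one-hot numeric vector."""
    position = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0.0] * len(categories)
        row[position[v]] = 1.0
        rows.append(row)
    return rows


def from_numeric(rows: Sequence[Sequence[float]], categories: List[str]) -> List[str]:
    """Reverse transform: pick the index of the largest entry in each vector,
    so approximate (e.g. model-generated) vectors still decode to a valid category."""
    return [categories[max(range(len(r)), key=r.__getitem__)] for r in rows]


if __name__ == "__main__":
    column = ["red", "blue", "red", "green"]
    cats = fit_categories(column)          # ['blue', 'green', 'red']
    encoded = to_numeric(column, cats)
    assert from_numeric(encoded, cats) == column  # exact round trip
    print(from_numeric([[0.1, 0.7, 0.2]], cats))  # ['green']
```

The round trip is exact on real data, while the argmax in the reverse step is one simple way of turning a model's continuous output back into a discrete value; other reversible encodings are equally possible.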
