Evaluating Synthetically Generated Data from Small Sample Sizes: An Experimental Study

Marin, Javier

arXiv.org Artificial Intelligence 

In this paper, we propose a method for measuring the similarity between small-sample tabular data and synthetically generated data containing more samples than the original. Generating such a larger synthetic sample is also known as data augmentation. However, significance levels obtained from non-parametric tests are suspect when the sample size is small. Our method combines geometry, topology and robust statistics for hypothesis testing in order to assess the "validity" of the generated data. We also compare the results with common global metric methods from the literature that are designed for large-sample data.
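To illustrate why significance levels from non-parametric tests become unreliable at small sample sizes, the sketch below computes an exact two-sided permutation p-value for the Wilcoxon rank-sum (Mann-Whitney) statistic by enumerating every possible rank assignment. It is a generic illustration of the small-sample problem, not the method proposed in the paper. The function name `exact_rank_sum_pvalue` is hypothetical, and the sketch assumes all values are distinct (no ties).

```python
from itertools import combinations


def exact_rank_sum_pvalue(x, y):
    """Exact two-sided permutation p-value for the rank-sum statistic.

    Hypothetical helper for illustration; assumes no tied values.
    """
    pooled = sorted(x + y)
    ranks = {v: i + 1 for i, v in enumerate(pooled)}  # valid only without ties
    n1, n = len(x), len(x) + len(y)
    observed = sum(ranks[v] for v in x)
    expected = n1 * (n + 1) / 2  # mean rank sum of the first sample under H0
    # Under H0, every choice of n1 ranks out of 1..n is equally likely.
    sums = [sum(c) for c in combinations(range(1, n + 1), n1)]
    extreme = sum(1 for s in sums if abs(s - expected) >= abs(observed - expected))
    return extreme / len(sums)


# Complete separation of two groups of three observations each.
print(exact_rank_sum_pvalue([1, 2, 3], [4, 5, 6]))
```

With three observations per group there are only 20 equally likely rank assignments, so even perfectly separated samples give p = 2/20 = 0.1 and the test can never reject at the 0.05 level. With five per group the smallest attainable p-value is 2/252, which shows how strongly the discreteness of the null distribution depends on sample size.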
