 it-gan


22456f4b545572855c766df5eefc9832-Supplemental.pdf

Neural Information Processing Systems

We use t-SNE [37] to project each real/fake record onto a two-dimensional space. We summarize the statistics of our datasets as follows: 1. Adult has 22K training and 10K testing records with 6 continuous numerical, 8 categorical, and 1 discrete numerical columns. News has 32K training and 8K testing records with 45 continuous numerical, 14 categorical, and 0 discrete numerical columns. We introduce one more visualization with Credit in Figure 4. IT-GAN(Q) shows the best similarity between the real and fake points. We compare our method with the following baseline methods, including state-of-the-art VAEs and GANs for tabular data synthesis and our IT-GAN's three variations: 1. Ind is a heuristic method in which we independently sample a value from each column's ground-truth distribution. We use the hyperparameters recommended for these baselines in their original papers and/or GitHub repositories.
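The Ind baseline can be sketched in a few lines. This is a minimal sketch, assuming that each column's "ground-truth distribution" is approximated by its empirical marginal in the training data; the function name `ind_baseline` and the dict-of-columns input format are hypothetical choices, not the paper's code. Because every column is resampled with its own row indices, each marginal is preserved while cross-column dependencies are discarded, which is why Ind serves as a simple reference point for learned synthesizers.

```python
import numpy as np


def ind_baseline(columns, n_samples, seed=0):
    """Hypothetical sketch of the 'Ind' baseline.

    Samples each column independently from its empirical (marginal)
    distribution, ignoring any dependence between columns.

    columns: dict mapping column name -> 1-D sequence of observed values.
    """
    rng = np.random.default_rng(seed)
    synthetic = {}
    for name, values in columns.items():
        values = np.asarray(values)
        # Draw row indices with replacement, separately for every column,
        # so values from the same real record are NOT kept together.
        idx = rng.integers(0, len(values), size=n_samples)
        synthetic[name] = values[idx]
    return synthetic


# Usage: a tiny Adult-like table with one numerical and one categorical column.
real = {"age": [25, 40, 60], "workclass": ["Private", "State-gov", "Private"]}
fake = ind_baseline(real, n_samples=5)
```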


We list the best hyperparameter configurations for our methods on each dataset. In Tables 11 to 16, we list all results. We include the sub-optimal versions of IT-GAN because one can easily decrease the attack success score by selecting sub-optimal models (thereby potentially sacrificing synthesis quality). As shown in Table 23, IT-GAN(L) shows the lowest attack success scores overall. We also compare with various sub-optimal versions of IT-GAN. While training IT-GAN, we could obtain several sub-optimal versions around the epoch at which we obtain IT-GAN, and we call them IT-GAN(S_k) with k ∈ {1, 2, 3}.




Synthesizing Informative Training Samples with GAN

arXiv.org Artificial Intelligence

Remarkable progress has been achieved in synthesizing photo-realistic images with generative adversarial networks (GANs). Recently, GANs have been used as training-sample generators when obtaining or storing real training data is expensive or even infeasible. However, images generated by traditional GANs are not as informative as real training samples when used to train deep neural networks. In this paper, we propose a novel method to synthesize Informative Training samples with GAN (IT-GAN). Specifically, we freeze a pre-trained GAN model and learn the informative latent vectors that correspond to informative training samples. The synthesized images are required to preserve information for training deep neural networks rather than visual realism or fidelity. Experiments verify that deep neural networks can learn faster and achieve better performance when trained with our IT-GAN-generated images. We also show that our method is a promising solution to the dataset condensation problem.
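The core idea of keeping a pre-trained generator frozen and optimizing only its latent inputs can be illustrated with a toy example. This is a sketch under loud assumptions: the "generator" is a hypothetical fixed map G(z) = tanh(Wz) rather than a real GAN, and the objective (matching the mean of the generated samples to a target mean, a crude distribution-matching stand-in) is assumed for illustration only and is not the loss used by IT-GAN. Gradients are written out by hand so that only NumPy is needed.

```python
import numpy as np


def optimize_latents(W, z_init, target_mean, lr=0.1, steps=200):
    """Learn latent vectors for a FROZEN toy generator G(z) = tanh(W z).

    Only z is updated; W (standing in for the pre-trained generator) never changes.
    Hypothetical objective: L(z) = || mean_i G(z_i) - target_mean ||^2.

    W: (d_out, d_latent) fixed weights; z_init: (n, d_latent) initial latents.
    """
    z = z_init.copy()                       # do not modify the caller's array
    n = z.shape[0]
    losses = []
    for _ in range(steps):
        h = np.tanh(z @ W.T)                # (n, d_out) synthesized samples
        diff = h.mean(axis=0) - target_mean # (d_out,)
        losses.append(float(diff @ diff))
        # Chain rule: dL/dh_i = 2 * diff / n, dh/d(pre) = 1 - h^2, d(pre)/dz = W.
        grad_pre = (2.0 / n) * diff * (1.0 - h ** 2)  # (n, d_out)
        grad_z = grad_pre @ W               # (n, d_latent)
        z -= lr * grad_z                    # gradient step on the latents only
    return z, losses


# Usage: 8 latent vectors of size 4 mapped to 3-dimensional "samples".
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
z_learned, losses = optimize_latents(W, rng.normal(size=(8, 4)), np.array([0.2, -0.1, 0.3]))
```

In a real setting the objective would be whatever measures how informative the generated samples are for training a downstream network; the structure (frozen generator, gradients flowing only into z) is the part this toy is meant to show.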


Invertible Tabular GANs: Killing Two Birds with One Stone for Tabular Data Synthesis

arXiv.org Artificial Intelligence

Tabular data synthesis has received wide attention in the literature. This is because available data is often limited, incomplete, or cannot be obtained easily, and data privacy is becoming increasingly important. In this work, we present a generalized GAN framework for tabular synthesis, which combines the adversarial training of GANs and the negative log-density regularization of invertible neural networks. The proposed framework can be used for two distinct objectives. First, we can further improve the synthesis quality by decreasing the negative log-density of real records in the process of adversarial training. Second, by increasing the negative log-density of real records, realistic fake records can be synthesized in a way that they are not too close to real records, which reduces the chance of potential information leakage. We conduct experiments with real-world datasets for classification, regression, and privacy attacks. In general, the proposed method demonstrates the best synthesis quality (in terms of task-oriented evaluation metrics, e.g., F1) when decreasing the negative log-density during adversarial training. When increasing the negative log-density, our experimental results show that the distance between real and fake records increases, enhancing robustness against privacy attacks.
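The interplay between adversarial training and negative log-density regularization can be illustrated with a toy invertible generator whose exact log-density follows from the change-of-variables formula, log p(x) = log p_z(G^{-1}(x)) - log|det ∂G/∂z|. This is a minimal sketch, assuming an elementwise affine generator x = scale · z + shift with z ~ N(0, I) (far simpler than the invertible networks used in the paper), an adversarial loss computed elsewhere and passed in as a number, and a hypothetical weight `gamma` whose sign selects between the two objectives. None of these function or parameter names come from the paper.

```python
import numpy as np


def affine_flow_log_density(x, scale, shift):
    """Exact log-density of records x under the toy invertible generator
    x = scale * z + shift, z ~ N(0, I).

    x: (n, d) records; scale, shift: (d,) parameters (scale must be nonzero).
    """
    z = (x - shift) / scale                   # invert the generator
    d = x.shape[1]
    log_pz = -0.5 * np.sum(z ** 2, axis=1) - 0.5 * d * np.log(2 * np.pi)
    log_det = np.sum(np.log(np.abs(scale)))   # log|det dG/dz| for a diagonal Jacobian
    return log_pz - log_det


def regularized_generator_loss(adv_loss, x_real, scale, shift, gamma):
    """Hypothetical combined objective: adv_loss + gamma * mean NLL of real records.

    gamma > 0: minimizing the loss pushes the NLL of real records down.
    gamma < 0: minimizing the loss pushes the NLL of real records up.
    Returns (combined loss, NLL term).
    """
    nll = -np.mean(affine_flow_log_density(x_real, scale, shift))
    return adv_loss + gamma * nll, nll


# Usage: two 2-D "real records" and an adversarial loss value from elsewhere.
x_real = np.array([[0.5, -1.0], [1.5, 0.0]])
loss, nll = regularized_generator_loss(0.7, x_real, np.array([1.0, 2.0]), np.zeros(2), gamma=0.1)
```

With a positive `gamma`, minimizing the combined loss corresponds to the quality-oriented setting described above; with a negative `gamma`, it corresponds to the privacy-oriented setting that keeps fake records away from real ones.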