AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing

Suh, Namjoon, Lin, Xiaofeng, Hsieh, Din-Yin, Honarkhah, Merhdad, Cheng, Guang

Nov-16-2023–arXiv.org Machine Learning

Diffusion models have become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language modeling, and speech synthesis. In this paper, we leverage the power of diffusion models to generate synthetic tabular data. The heterogeneous features in tabular data have been a main obstacle in tabular data synthesis, and we tackle this problem by employing an auto-encoder architecture. Compared with state-of-the-art tabular synthesizers, the synthetic tables produced by our model show good statistical fidelity to the real data and perform well in downstream machine learning utility tasks. We conducted experiments over $15$ publicly available datasets. Notably, our model adeptly captures the correlations among features, which has been a long-standing challenge in tabular data synthesis. Our code is available at https://github.com/UCLA-Trustworthy-AI-Lab/AutoDiffusion.
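To make the idea in the abstract concrete, here is a minimal sketch of how heterogeneous columns can be mapped into one continuous latent space and then noised by a forward diffusion step. It is not the authors' implementation: the function names, the one-hot and z-score preprocessing, the PCA projection used as a stand-in for a trained auto-encoder, and the linear noise schedule are all assumptions made for illustration. The learned reverse (denoising) process and the decoder back to table rows are omitted.

```python
import numpy as np

def encode_mixed_table(numeric, categorical):
    """Assumed preprocessing: z-score numeric columns, one-hot categorical ones."""
    numeric = np.asarray(numeric, dtype=float)
    std = numeric.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    parts = [(numeric - numeric.mean(axis=0)) / std]
    for col in np.asarray(categorical).T:
        levels = np.unique(col)
        parts.append((col[:, None] == levels[None, :]).astype(float))
    return np.hstack(parts)

def fit_linear_autoencoder(x, latent_dim):
    """PCA stand-in (assumption) for a learned auto-encoder.

    Returns the column mean and a projection matrix with orthonormal columns.
    """
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:latent_dim].T

def forward_diffuse(z0, t, betas, rng):
    """Sample z_t from q(z_t | z_0) under a variance-preserving schedule."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    noise = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * noise
    return z_t, noise

# Toy usage on a hypothetical table: two numeric columns, one categorical column.
rng = np.random.default_rng(0)
numeric = rng.normal(size=(200, 2))
categorical = rng.choice(["a", "b", "c"], size=(200, 1))

x = encode_mixed_table(numeric, categorical)
mean, weights = fit_linear_autoencoder(x, latent_dim=3)
z0 = (x - mean) @ weights               # latent codes the diffusion model would see
betas = np.linspace(1e-4, 0.02, 1000)   # assumed linear noise schedule
z_t, eps = forward_diffuse(z0, 999, betas, rng)
```

The point of the design is that the diffusion model only ever sees the continuous latent codes `z0`, so it does not need separate noise processes for numeric and categorical columns.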

artificial intelligence, dataset, machine learning, (15 more...)

Country:
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report (0.50)

Industry:
- Health & Medicine (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Neural Networks (1.00)