High-Quality Tabular Data Generation using Post-Selected VAE

Jul-17-2024–arXiv.org Artificial Intelligence

Synthetic tabular data is becoming a necessity as concerns about data privacy intensify in the world. Tabular data can be useful for testing various systems, simulating real data, analyzing the data itself or building predictive models. Unfortunately, such data may not be available due to confidentiality issues. Previous techniques, such as TVAE (Xu et al., 2019) or OCTGAN (Kim et al., 2021), are either unable to handle particularly complex datasets, or are complex in themselves, resulting in inferior run time performance. This paper introduces PSVAE, a new simple model that is capable of producing high-quality synthetic data in less run time. PSVAE incorporates two key ideas: loss optimization and post-selection. Along with these ideas, the proposed model compensates for underrepresented categories and uses a modern activation function, Mish (Misra, 2019).

artificial intelligence, dataset, machine learning, (14 more...)

arXiv.org Artificial Intelligence

Jul-17-2024

arXiv.org PDF

Add feedback

Country:
- Europe > Ukraine > Cherkasy Oblast (0.15)

Genre:
- Research Report (0.50)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning
    - Neural Networks (1.00)
  - Data Science (1.00)
  - Security & Privacy (0.88)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found