Evaluating Inter-Column Logical Relationships in Synthetic Tabular Data Generation
Long, Yunbo, Xu, Liming, Brintrup, Alexandra
arXiv.org Artificial Intelligence
To evaluate the fidelity of synthetic tabular data, numerous metrics have been proposed to assess accuracy and diversity, including both low-order statistics (e.g., Density Estimation and Correlation Score (Zhang et al., 2023), Average Coverage Scores (Zein & Urvoy, 2022)) and high-order statistics (e.g., α-Precision and β-Recall (Alaa et al., 2022)). However, these metrics operate at a high level and fail to evaluate whether synthetic data preserves logical relationships, such as hierarchical or semantic dependencies between features. This highlights the need for a more fine-grained, context-aware evaluation of multivariate dependencies. To address this, we propose three evaluation metrics: Hierarchical Consistency Score (HCS), Multivariate Dependency Index (MDI), and Distributional Similarity Index (DSI). To assess the effectiveness of these metrics in quantifying inter-column relationships, we select five representative tabular data generation methods from different categories for evaluation. Their performance is measured using both existing and our proposed metrics on a real-world dataset rich in logical consistency and dependency constraints. Experimental results validate the effectiveness of our proposed metrics and reveal the limitations of existing approaches in preserving logical relationships in synthetic tabular data. Additionally, we discuss potential pathways to better capture logical constraints within joint distributions, paving the way for future advancements in synthetic tabular data generation.
Feb-6-2025
- Country:
- South America > Brazil
- São Paulo (0.04)
- Oceania
- Australia > Victoria (0.04)
- New Zealand > North Island
- Wellington Region > Wellington (0.04)
- Auckland Region > Auckland (0.04)
- North America
- Central America (0.06)
- Nicaragua (0.04)
- United States
- New Mexico (0.04)
- Washington > King County
- Seattle (0.04)
- Rhode Island > Providence County
- Providence (0.04)
- New Hampshire > Merrimack County
- Concord (0.04)
- California > Los Angeles County
- Lancaster (0.04)
- Mexico
- Honduras > Francisco Morazán
- Tegucigalpa (0.04)
- Cuba > Ciego de Ávila Province
- Ciego de Ávila (0.04)
- Europe
- Western Europe (0.05)
- Northern Europe (0.04)
- Italy > Piedmont (0.04)
- Netherlands (0.04)
- Norway (0.04)
- Eastern Europe (0.04)
- Finland > Uusimaa (0.04)
- Spain > Basque Country (0.04)
- Sweden > Stockholm
- Stockholm (0.04)
- Lithuania > Vilnius County
- Vilnius (0.04)
- Germany
- North Rhine-Westphalia > Cologne Region
- Aachen (0.04)
- Hesse > Darmstadt Region
- Wiesbaden (0.04)
- Bavaria > Upper Bavaria
- Munich (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.14)
- Greater London > London (0.04)
- France
- Grand Est (0.04)
- Auvergne-Rhône-Alpes (0.04)
- Belarus > Grodno Region
- Grodno (0.04)
- Asia
- Southeast Asia (0.06)
- Turkmenistan (0.04)
- Russia > Far Eastern Federal District
- Jewish Autonomous Oblast > Birobidzhan (0.04)
- Philippines > Luzon
- National Capital Region > City of Manila (0.04)
- Middle East
- Iraq (0.04)
- Republic of Türkiye
- Yalova Province > Yalova (0.04)
- Mersin Province > Mersin (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- India
- Punjab (0.04)
- Maharashtra (0.04)
- China
- Liaoning Province (0.04)
- Henan Province (0.04)
- Guangdong Province (0.04)
- Africa
- West Africa (0.04)
- North Africa (0.04)
- Middle East > Morocco (0.04)
- East Africa (0.04)
- Democratic Republic of the Congo > Kinshasa Province
- Kinshasa (0.04)
- Genre:
- Research Report (0.64)
- Technology: