Evaluating Inter-Column Logical Relationships in Synthetic Tabular Data Generation
Long, Yunbo; Xu, Liming; Brintrup, Alexandra
arXiv.org Artificial Intelligence
To evaluate the fidelity of synthetic tabular data, numerous metrics have been proposed to assess accuracy and diversity, including both low-order statistics (e.g., Density Estimation and Correlation Score (Zhang et al., 2023), Average Coverage Scores (Zein & Urvoy, 2022)) and high-order statistics (e.g., α-Precision and β-Recall (Alaa et al., 2022)). However, these metrics operate at a high level and fail to evaluate whether synthetic data preserves logical relationships, such as hierarchical or semantic dependencies between features. This highlights the need for a more fine-grained, context-aware evaluation of multivariate dependencies. To address this, we propose three evaluation metrics: Hierarchical Consistency Score (HCS), Multivariate Dependency Index (MDI), and Distributional Similarity Index (DSI). To assess how well these metrics quantify inter-column relationships, we select five representative tabular data generation methods from different categories for evaluation. Their performance is measured using both existing metrics and our proposed metrics on a real-world dataset rich in logical consistency and dependency constraints. Experimental results validate the effectiveness of our proposed metrics and reveal the limitations of existing approaches in preserving logical relationships in synthetic tabular data. Additionally, we discuss potential pathways to better capture logical constraints within joint distributions, paving the way for future advancements in synthetic tabular data generation.
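To make the notion of a hierarchical dependency between columns concrete, the following is a minimal sketch of one way such a check could look. It is not the paper's HCS definition, which the abstract does not give. The function name `hierarchy_violation_rate`, the column names, and the toy rows are all hypothetical, and the sketch assumes that a (child, parent) pair is valid only if it appears somewhere in the real data.

```python
def hierarchy_violation_rate(real_rows, synth_rows, child, parent):
    """Fraction of synthetic rows whose (child, parent) pair never occurs in the real data.

    Illustrative assumption: the real data enumerates every valid pair.
    """
    # Collect every (child, parent) combination observed in the real table.
    allowed = {(row[child], row[parent]) for row in real_rows}
    if not synth_rows:
        return 0.0
    # Count synthetic rows that pair a child value with a parent it never had.
    violations = sum((row[child], row[parent]) not in allowed for row in synth_rows)
    return violations / len(synth_rows)


# Hypothetical toy data: each city belongs to exactly one country.
real = [
    {"city": "Cambridge", "country": "United Kingdom"},
    {"city": "Tokyo", "country": "Japan"},
]
synth = [
    {"city": "Cambridge", "country": "United Kingdom"},
    {"city": "Tokyo", "country": "United Kingdom"},  # breaks the city -> country hierarchy
]

rate = hierarchy_violation_rate(real, synth, "city", "country")
print(rate)
```

A marginal or pairwise-correlation metric could score both synthetic rows as plausible, since each value is individually common; a row-level pair check like this one is the kind of fine-grained test the abstract argues for.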
Feb-6-2025