FairTabGen: Unifying Counterfactual and Causal Fairness in Synthetic Tabular Data Generation
Nagesh, Nitish; Shakibhamedan, Salar; Bagheri, Mahdi; Wang, Ziyu; TaheriNejad, Nima; Jantsch, Axel; Rahmani, Amir M.
arXiv.org Artificial Intelligence
Generating synthetic data is crucial in privacy-sensitive, data-scarce settings, especially for the tabular datasets widely used in real-world applications. A key challenge is improving counterfactual and causal fairness while preserving high utility. We present FairTabGen, a fairness-aware framework for synthetic tabular data generation based on large language models (LLMs). It integrates multiple fairness definitions, including counterfactual and causal fairness, into both its generation and evaluation pipelines, and uses in-context learning, prompt refinement, and fairness-aware data curation to balance fairness and utility. Across diverse datasets, our method outperforms state-of-the-art GAN-based and LLM-based methods, improving fairness metrics such as demographic parity and path-specific causal effects by up to 10% while retaining statistical utility. Remarkably, it achieves these gains using less than 20% of the original data, highlighting its efficiency in low-data regimes. These results demonstrate a principled and practical approach to generating fair and useful synthetic tabular data.
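To make one of the fairness metrics named above concrete, here is a minimal sketch of a demographic parity check on a table of records. It is not taken from FairTabGen: the function name, the column names `sex` and `approved`, and the toy rows are all hypothetical, and the abstract does not say how the authors compute the metric. The sketch uses the common definition, the gap in positive-outcome rate between sensitive groups.

```python
from collections import defaultdict


def demographic_parity_difference(rows, sensitive_key, outcome_key):
    """Return (gap, rates): the largest difference in positive-outcome
    rate between any two groups, and the per-group rates themselves."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        group = row[sensitive_key]
        totals[group] += 1
        positives[group] += int(row[outcome_key] == 1)
    rates = {group: positives[group] / totals[group] for group in totals}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical toy records standing in for a synthetic table.
synthetic_rows = [
    {"sex": "F", "approved": 1},
    {"sex": "F", "approved": 0},
    {"sex": "F", "approved": 1},
    {"sex": "F", "approved": 0},
    {"sex": "M", "approved": 1},
    {"sex": "M", "approved": 1},
    {"sex": "M", "approved": 1},
    {"sex": "M", "approved": 0},
]

gap, rates = demographic_parity_difference(synthetic_rows, "sex", "approved")
print(rates)  # per-group positive rates
print(gap)    # 0.25 for the toy rows above
```

A gap of 0 means every group receives the positive outcome at the same rate. Counterfactual and path-specific causal measures need a causal model of the data and are not covered by this sketch.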
Aug-19-2025
- Country:
- North America > United States > California (0.15)
- Genre:
- Research Report
- New Finding (0.66)
- Experimental Study (0.46)
- Industry:
- Health & Medicine (1.00)
- Technology: