FairTabGen: Unifying Counterfactual and Causal Fairness in Synthetic Tabular Data Generation

Nagesh, Nitish, Shakibhamedan, Salar, Bagheri, Mahdi, Wang, Ziyu, TaheriNejad, Nima, Jantsch, Axel, Rahmani, Amir M.

Aug-19-2025–arXiv.org Artificial Intelligence

Generating synthetic data is crucial in privacy-sensitive, data-scarce settings, especially for tabular datasets widely used in real-world applications. A key challenge is improving counterfactual and causal fairness, while preserving high utility. We present FairTabGen, a fairness-aware large language model-based framework for tabular synthetic data generation. We integrate multiple fairness definitions including counterfactual and causal fairness into both its generation and evaluation pipelines. We use in-context learning, prompt refinement, and fairness-aware data curation to balance fairness and utility. Across diverse datasets, our method outperforms state-of-the-art GAN-based and LLM-based methods, achieving up to 10% improvements on fairness metrics such as demographic parity and path-specific causal effects while retaining statistical utility. Remarkably, it achieves these gains using less than 20% of the original data, highlighting its efficiency in low-data regimes. These results demonstrate a principled and practical approach for generating fair and useful synthetic tabular data.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

Aug-19-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States > California (0.15)

Genre:
- Research Report
  - New Finding (0.66)
  - Experimental Study (0.46)

Industry:
- Health & Medicine (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (0.47)
    - Performance Analysis > Accuracy (0.46)