HARMONIC: Harnessing LLMs for Tabular Data Synthesis and Privacy Protection

Mar-17-2025, 16:07:12 GMT – Neural Information Processing Systems

Data serves as the fundamental basis for advancing deep learning. It is therefore urgent to explore how models such as LLMs can be used effectively to generate synthetic tabular data that preserves privacy while remaining similar to the original data. In this paper, we introduce HARMONIC, a new framework for tabular data generation and evaluation with LLMs. In the data generation part of our framework, we use fine-tuning to generate tabular data and enhance privacy, rather than the continued pre-training often used by previous small-scale LLM-based methods. In particular, we construct an instruction fine-tuning dataset based on the idea of the k-nearest neighbors algorithm to encourage LLMs to discover inter-row relationships. Through such fine-tuning, LLMs are trained to remember the format and connections of the data rather than the data itself, which reduces the risk of privacy leakage.
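The k-nearest-neighbor idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the prompt wording, the distance metric, the value of k, or the row serialization, so everything below is an assumption made for illustration: Euclidean distance over numeric columns, a "column is value" text format, and the hypothetical helper names `serialize_row`, `k_nearest`, and `build_instruction_examples`. It is not the authors' implementation.

```python
import math


def serialize_row(row, columns):
    """Render one table row as text (hypothetical 'column is value' format)."""
    return ", ".join(f"{c} is {v}" for c, v in zip(columns, row))


def k_nearest(rows, index, k):
    """Return indices of the k rows closest to rows[index].

    Assumes all columns are numeric and uses Euclidean distance;
    the source does not state which metric is used.
    """
    target = rows[index]
    others = [i for i in range(len(rows)) if i != index]
    others.sort(key=lambda i: math.dist(rows[i], target))
    return others[:k]


def build_instruction_examples(rows, columns, k=2):
    """Pair each row (as the target output) with its k nearest rows (as context)."""
    examples = []
    for i, row in enumerate(rows):
        neighbors = k_nearest(rows, i, k)
        context = "\n".join(serialize_row(rows[j], columns) for j in neighbors)
        examples.append({
            # Illustrative instruction text, not taken from the paper.
            "instruction": "Given these similar rows, write one more row in the same format.",
            "input": context,
            "output": serialize_row(row, columns),
        })
    return examples


# Toy table with two numeric columns (made-up values for illustration only).
columns = ["age", "income"]
rows = [(25, 30.0), (27, 32.0), (60, 90.0), (62, 95.0)]
examples = build_instruction_examples(rows, columns, k=1)
```

Each example shows the model a few neighboring rows and asks for another row consistent with them, which is one plausible way to expose inter-row relationships during fine-tuning. Real tables with categorical columns would need an encoding or a different similarity measure before this sketch applies.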

large language model, machine learning, natural language

Industry:
- Information Technology > Security & Privacy (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Statistical Learning
    - Nearest Neighbor Methods (0.60)
  - Natural Language > Large Language Model (1.00)