Democratizing Tabular Data Access with an Open$\unicode{x2013}$Source Synthetic$\unicode{x2013}$Data SDK

Krchova, Ivona, Vieyra, Mariana Vargas, Scriminaci, Mario, Sidorenko, Andrey

Nov-14-2025–arXiv.org Artificial Intelligence

Abstract--Machine learning development critically depends on access to high-quality data. However, increasing restrictions due to privacy, proprietary interests, and ethical concerns have created significant barriers to data accessibility. Synthetic data offers a viable solution by enabling safe, broad data usage without compromising sensitive information. This paper presents the MOSTL Y AI Synthetic Data Software Development Kit (SDK), an open-source toolkit designed specifically for synthesizing high-quality tabular data. The SDK integrates robust features such as differential privacy guarantees, fairness-aware data generation, and automated quality assurance into a flexible and accessible Python interface. Leveraging the T abularARGN autoregressive framework, the SDK supports diverse data types and complex multi-table and sequential datasets, delivering competitive performance with notable improvements in speed and usability. Currently deployed both as a cloud service and locally installable software, the SDK has seen rapid adoption, highlighting its practicality in addressing real-world data bottlenecks and promoting widespread data democratization. HE development of Machine Learning applications requires broad access to training data. This necessity has become more critical in recent years with the advent of Deep Learning, which requires large-scale datasets to effectively train models.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

Nov-14-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.83)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found