 findiff


Generative AI for Banks: Benchmarks and Algorithms for Synthetic Financial Transaction Data

arXiv.org Artificial Intelligence

The banking sector, as a data-driven industry, relies on the availability of high-quality data to create value and protect its customers. The synergy between recent deep learning (DL) advancements and the sector's data needs presents a growth potential of USD 4.6 trillion by 2035 (Accenture, 2017). However, deploying DL models is challenging because they need large, high-quality training data (Ryll et al., 2020), a difficulty made worse by the intricacy of financial transaction data (with complex data patterns and time-related characteristics) and by strict regulations that limit data sharing (EU Regulation 2016/679, PCI DSS v4.0). One possible solution is to use synthetic data, which is artificially generated rather than drawn from real-world events, to increase samples in the minority class (Jordon et al., 2022) and to allow safe data sharing between financial institutions while protecting privacy (Karst et al., 2024). This approach is essential for improving models used in assessing risks and detecting fraud.


FinDiff: Diffusion Models for Financial Tabular Data Generation

arXiv.org Artificial Intelligence

The sharing of microdata, such as fund holdings and derivative instruments, by regulatory institutions presents a unique challenge due to strict data confidentiality and privacy regulations. These challenges often hinder the ability of both academics and practitioners to conduct collaborative research effectively. The emergence of generative models, particularly diffusion models, capable of synthesizing data mimicking the underlying distributions of real-world data presents a compelling solution. This work introduces 'FinDiff', a diffusion model designed to generate real-world financial tabular data for a variety of regulatory downstream tasks, for example economic scenario modeling, stress tests, and fraud detection. The model uses embedding encodings to model mixed modality financial data, comprising both categorical and numeric attributes. The performance of FinDiff in generating synthetic tabular financial data is evaluated against state-of-the-art baseline models using three real-world financial datasets (including two publicly available datasets and one proprietary dataset). Empirical results demonstrate that FinDiff excels in generating synthetic tabular financial data with high fidelity, privacy, and utility.
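
The abstract's mention of embedding encodings for mixed categorical and numeric attributes can be illustrated with a minimal sketch. It is not FinDiff's implementation: the field names (`currency`, `channel`, `amount`), the random stand-in embeddings, and the use of the standard Gaussian forward-noising step common to denoising diffusion models are all assumptions made for illustration.

```python
import math
import random


def make_embedding_table(categories, dim, rng):
    """Give each category a random vector (a stand-in for a learned embedding)."""
    return {c: [rng.gauss(0.0, 1.0) for _ in range(dim)] for c in categories}


def encode_row(row, tables, numeric_stats):
    """Concatenate categorical embeddings with standardized numeric fields."""
    vec = []
    for name, table in tables.items():
        vec.extend(table[row[name]])
    for name, (mean, std) in numeric_stats.items():
        vec.append((row[name] - mean) / std)
    return vec


def forward_noise(x0, alpha_bar_t, rng):
    """Gaussian forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    scale_signal = math.sqrt(alpha_bar_t)
    scale_noise = math.sqrt(1.0 - alpha_bar_t)
    xt = [scale_signal * x + scale_noise * e for x, e in zip(x0, eps)]
    return xt, eps


rng = random.Random(0)

# Hypothetical schema: two categorical fields and one numeric field.
tables = {
    "currency": make_embedding_table(["USD", "EUR", "GBP"], dim=2, rng=rng),
    "channel": make_embedding_table(["online", "branch"], dim=2, rng=rng),
}
numeric_stats = {"amount": (100.0, 50.0)}  # assumed (mean, std) of the column

row = {"currency": "EUR", "channel": "online", "amount": 175.0}
x0 = encode_row(row, tables, numeric_stats)
xt, eps = forward_noise(x0, alpha_bar_t=0.5, rng=rng)

print(len(x0), len(xt))  # 5 5: two 2-d embeddings plus one numeric value
```

Mapping every attribute into one continuous vector is what lets a single Gaussian noising process apply to the whole row. In a trained model the embeddings would be learned, and a denoising network would be fit to predict `eps` from `xt`.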