Synthetic Health-related Longitudinal Data with Mixed-type Variables Generated using Diffusion Models
Kuo, Nicholas I-Hsien; Jorm, Louisa; Barbieri, Sebastiano
arXiv.org Artificial Intelligence
This paper presents a novel approach to simulating electronic health records (EHRs) using diffusion probabilistic models (DPMs). Specifically, we demonstrate the effectiveness of DPMs in synthesising longitudinal EHRs that capture mixed-type variables, including numeric, binary, and categorical variables. To our knowledge, this is the first use of DPMs for this purpose. We compared our DPM-simulated datasets with previous state-of-the-art results based on generative adversarial networks (GANs) for two clinical applications: acute hypotension and antiretroviral therapy for human immunodeficiency virus (ART for HIV). Given the lack of previous studies applying DPMs to this task, a core component of our work is an exploration of the advantages and caveats of employing DPMs across a wide range of aspects. In addition to assessing the realism of the synthetic datasets, we trained reinforcement learning (RL) agents on the synthetic data to evaluate its utility for supporting the development of downstream machine learning models. Finally, we estimated that our DPM-simulated datasets are secure and pose a low patient exposure risk for public access.
Mar-21-2023
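The abstract does not describe how mixed-type variables are presented to a diffusion model. One common recipe is sketched below purely as an illustration, not as the authors' method. Each time step is encoded as a continuous vector: standardised numerics, 0/1 binaries and one-hot categoricals. The standard DDPM forward noising q(x_t | x_0) = N(sqrt(ᾱ_t)·x_0, (1 − ᾱ_t)·I) is then applied with a linear β schedule, and the result is decoded back by thresholding binaries and taking the argmax over each one-hot group. All variable names, normalisation constants and schedule values here are hypothetical assumptions. The learned reverse (denoising) network is omitted.

```python
import math
import random

# Hypothetical per-time-step schema (illustrative only, not the paper's):
# one numeric variable, one binary flag, one categorical variable.
MEAN, STD = 75.0, 10.0                          # assumed normalisation constants
CATEGORIES = ["none", "fluid", "vasopressor"]   # hypothetical treatment levels
STEP_WIDTH = 1 + 1 + len(CATEGORIES)            # numeric + binary + one-hot


def encode_step(value, flag, category):
    """Map one mixed-type time step to a continuous vector."""
    numeric = [(value - MEAN) / STD]
    binary = [1.0 if flag else 0.0]
    one_hot = [1.0 if c == category else 0.0 for c in CATEGORIES]
    return numeric + binary + one_hot


def decode_step(vec):
    """Map a continuous vector back to mixed types (threshold + argmax)."""
    value = vec[0] * STD + MEAN
    flag = vec[1] > 0.5
    scores = vec[2:STEP_WIDTH]
    category = CATEGORIES[max(range(len(scores)), key=lambda i: scores[i])]
    return value, flag, category


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) over the first t steps of a linear schedule.

    t = 0 means no noise (alpha_bar = 1).
    """
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def forward_noise(x0, t, rng):
    """Sample x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) elementwise."""
    ab = alpha_bar(t)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for x in x0]


rng = random.Random(0)
# A hypothetical two-step longitudinal record: (value, flag, category) per step.
sequence = [(68.0, False, "none"), (61.0, True, "fluid")]
x0 = [v for step in sequence for v in encode_step(*step)]  # flatten time steps
xt = forward_noise(x0, t=500, rng=rng)
noisy_steps = [decode_step(xt[i:i + STEP_WIDTH]) for i in range(0, len(xt), STEP_WIDTH)]
```

Treating discrete variables as continuous one-hot vectors keeps a single Gaussian diffusion over the whole record. Alternatives, such as multinomial diffusion for the categorical columns, also exist.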
- Country:
  - South America > Chile
  - Oceania > Australia
    - New South Wales > Sydney (0.04)
  - North America
    - Canada (0.04)
    - United States > Iowa
      - Story County > Ames (0.04)
  - Europe > Italy
    - Calabria > Catanzaro Province > Catanzaro (0.04)
- Genre:
  - Research Report
    - New Finding (1.00)
    - Experimental Study (0.68)
- Industry:
  - Health & Medicine > Therapeutic Area
    - Infections and Infectious Diseases (1.00)
    - Immunology > HIV (0.36)