CTFlow: Video-Inspired Latent Flow Matching for 3D CT Synthesis
Wang, Jiayi; Reynaud, Hadrien; Erick, Franciskus Xaverius; Kainz, Bernhard
arXiv.org Artificial Intelligence
Generative modelling of entire CT volumes conditioned on clinical reports has the potential to accelerate research through data augmentation and privacy-preserving synthesis, and to reduce regulatory constraints on patient data while preserving diagnostic signals. With the recent release of CT-RATE, a large-scale collection of 3D CT volumes paired with their respective clinical reports, training large text-conditioned CT volume generation models has become achievable. In this work, we introduce CTFlow, a 0.5B-parameter latent flow matching transformer model, conditioned on clinical reports. We leverage the A-VAE from FLUX to define our latent space, and rely on the CT-Clip text encoder to encode the clinical reports. To generate consistent whole CT volumes while keeping the memory constraints tractable, we rely on a custom autoregressive approach, where the model predicts the first sequence of slices of the volume from text only, and then relies on the previously generated sequence of slices and the text to predict the following sequence. We evaluate our results against a state-of-the-art generative CT model, and demonstrate the superiority of our approach in terms of temporal coherence, image diversity and text-image alignment, as measured by FID, FVD, IS and CLIP scores.
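The autoregressive scheme described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: `toy_velocity` is a hypothetical stand-in for the trained flow matching transformer, the Euler integrator and step count are arbitrary choices, and the block shape and text embedding size are made up. The sketch only shows the control flow: the first block of slices is sampled from noise with text conditioning alone, and each later block is also conditioned on the previously generated block.

```python
import numpy as np


def toy_velocity(x, t, text_emb, prev_block):
    """Hypothetical stand-in for the trained model (assumption, not CTFlow).

    Returns a velocity that moves x along a straight path toward a target
    built from the text embedding and, if present, the previous block.
    """
    target = np.full_like(x, text_emb.mean())
    if prev_block is not None:
        # Blend in the last slice of the previous block to mimic continuity.
        target = 0.5 * (target + prev_block[-1])
    # Straight-line flow: the velocity reaches the target at t = 1.
    return (target - x) / (1.0 - t)


def sample_block(text_emb, prev_block, block_shape, steps, rng):
    """Integrate the flow from Gaussian noise (t = 0) to a latent block (t = 1)."""
    x = rng.standard_normal(block_shape)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so no division by zero
        x = x + dt * toy_velocity(x, t, text_emb, prev_block)
    return x


def generate_volume(text_emb, num_blocks, block_shape, steps=10, seed=0):
    """Generate latent slice blocks one after another and stack them."""
    rng = np.random.default_rng(seed)
    blocks, prev = [], None
    for _ in range(num_blocks):
        # First block: text only. Later blocks: text plus previous block.
        prev = sample_block(text_emb, prev, block_shape, steps, rng)
        blocks.append(prev)
    return np.concatenate(blocks, axis=0)  # stack along the slice axis


if __name__ == "__main__":
    volume = generate_volume(np.ones(8), num_blocks=3, block_shape=(4, 2, 2))
    print(volume.shape)
```

Only one block of slices is held by the sampler at a time, which is the property the abstract relies on to keep memory tractable. In a real system the blocks would be latents that a VAE decoder maps back to CT slices; that decoding step is omitted here.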
Aug-19-2025
- Country:
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.40)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > United Kingdom > North Sea > Southern North Sea (0.04)
- Genre:
- Research Report > New Finding (0.48)
- Industry:
- Health & Medicine > Diagnostic Medicine > Imaging (0.95)
- Technology: