CTFlow: Video-Inspired Latent Flow Matching for 3D CT Synthesis

Wang, Jiayi, Reynaud, Hadrien, Erick, Franciskus Xaverius, Kainz, Bernhard

Aug-19-2025–arXiv.org Artificial Intelligence

Generative modelling of entire CT volumes conditioned on clinical reports has the potential to accelerate research through data augmentation and privacy-preserving synthesis, and to reduce regulatory constraints on patient data while preserving diagnostic signals. With the recent release of CT-RATE, a large-scale collection of 3D CT volumes paired with their respective clinical reports, training large text-conditioned CT volume generation models has become achievable. In this work, we introduce CTFlow, a 0.5B-parameter latent flow matching transformer model conditioned on clinical reports. We leverage the A-VAE from FLUX to define our latent space, and rely on the CT-Clip text encoder to encode the clinical reports. To generate consistent whole CT volumes while keeping memory requirements tractable, we rely on a custom autoregressive approach: the model predicts the first sequence of slices of the volume from the text alone, and then uses the previously generated sequence of slices together with the text to predict the following sequence. We evaluate our results against a state-of-the-art generative CT model and demonstrate the superiority of our approach in terms of temporal coherence, image diversity and text-image alignment, as measured by FID, FVD, IS and CLIP scores.
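The autoregressive scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`sample_block`, `generate_volume`, `toy_velocity`), the fixed-step Euler solver, the step count and the block shape are all assumptions made for illustration, and the toy velocity field stands in for the trained transformer. The sketch also omits the VAE and the text encoder, treating each block of slices as a plain array.

```python
import numpy as np

def sample_block(velocity_fn, text_emb, prev_block, block_shape, steps, rng):
    """Integrate dx/dt = v(x, t, text, prev) from noise (t=0) to data (t=1) with Euler steps."""
    x = rng.standard_normal(block_shape)  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t, text_emb, prev_block)
    return x

def generate_volume(velocity_fn, text_emb, num_blocks, block_shape, steps=8, seed=0):
    """Generate blocks of slices one after another and stack them along the depth axis."""
    rng = np.random.default_rng(seed)
    blocks, prev = [], None  # the first block is conditioned on text only
    for _ in range(num_blocks):
        prev = sample_block(velocity_fn, text_emb, prev, block_shape, steps, rng)
        blocks.append(prev)  # later blocks see the previous block and the text
    return np.concatenate(blocks, axis=0)

def toy_velocity(x, t, text_emb, prev_block):
    """Hypothetical stand-in for the learned model: pulls x toward a constant target."""
    anchor = 0.0 if prev_block is None else prev_block.mean()
    target = np.full_like(x, anchor + text_emb.mean())
    return (target - x) / (1.0 - t)  # t < 1 for every Euler step, so no division by zero

text_emb = np.ones(4)  # placeholder for an encoded clinical report
volume = generate_volume(toy_velocity, text_emb, num_blocks=3, block_shape=(2, 4, 4))
print(volume.shape)
```

With the toy field, each block converges to the mean of the previous block plus the mean of the text embedding, which makes the dependence of every block on its predecessor easy to check. In a real system, `velocity_fn` would be the trained network and each block would be decoded from latent space back to CT slices.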

artificial intelligence, machine learning, natural language

Genre:
- Research Report > New Finding (0.48)

Industry:
- Health & Medicine > Diagnostic Medicine > Imaging (0.95)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (0.94)
  - Artificial Intelligence
    - Natural Language (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)