TRACE: Temporally Reliable Anatomically-Conditioned 3D CT Generation with Enhanced Efficiency

  • Conference paper
Medical Image Computing and Computer Assisted Intervention – MICCAI 2025 (MICCAI 2025)

Abstract

3D medical image generation is essential for data augmentation and patient privacy, calling for reliable and efficient models suited for clinical practice. However, current methods suffer from limited anatomical fidelity, restricted axial length, and substantial computational cost, placing them beyond reach for regions with limited resources and infrastructure. We introduce TRACE, a framework that generates 3D medical images with spatiotemporal alignment using a 2D multimodal-conditioned diffusion approach. TRACE models sequential 2D slices as video frame pairs, combining segmentation priors and radiology reports for anatomical alignment and incorporating optical flow to sustain temporal coherence. During inference, an overlapping-frame strategy links frame pairs into a flexible-length sequence, which is then reconstructed into a spatiotemporally and anatomically aligned 3D volume. Experimental results demonstrate that TRACE effectively balances computational efficiency with anatomical fidelity and spatiotemporal consistency. Code is available at: https://github.com/VinyehShaw/TRACE
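
The overlapping-frame idea in the abstract can be illustrated with a small sketch. This preview does not give the inference details, so the code below is only an assumption about the general pattern: each generated pair shares its first frame with the last frame of the previous pair, so pairs chain into a sequence of any requested length, which is then stacked into a volume. The names `chain_frame_pairs` and `toy_pair_generator` are hypothetical and are not taken from the TRACE code base; the toy generator merely stands in for a conditioned diffusion model.

```python
import numpy as np


def toy_pair_generator(prev_slice: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a pair-conditioned generator.

    A real model would synthesize the second frame of a pair given the
    first; here intensities are only shifted so the chaining can be checked.
    """
    return prev_slice + 1.0


def chain_frame_pairs(first_slice, generate_next, num_slices):
    """Link overlapping frame pairs into a stack of num_slices slices.

    Pair k is (slice k, slice k+1); consecutive pairs overlap in one frame,
    so the sequence length is not fixed by the generator itself.
    """
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    slices = [first_slice]
    while len(slices) < num_slices:
        # The last frame of the previous pair is the first frame of the next.
        slices.append(generate_next(slices[-1]))
    # Stack the 2D slices along a new leading (axial) axis to form a volume.
    return np.stack(slices, axis=0)


seed = np.zeros((4, 4), dtype=np.float32)
volume = chain_frame_pairs(seed, toy_pair_generator, num_slices=6)
print(volume.shape)  # (6, 4, 4)
```

Because only one shared frame is carried between steps in this sketch, memory use stays at the level of a single 2D pair regardless of how many slices are requested, which is the kind of trade-off between axial length and cost that the abstract points to.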

Author information

Corresponding author

Correspondence to Yang Long.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Copyright information

© 2026 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Shao, M. et al. (2026). TRACE: Temporally Reliable Anatomically-Conditioned 3D CT Generation with Enhanced Efficiency. In: Gee, J.C., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2025. MICCAI 2025. Lecture Notes in Computer Science, vol 15963. Springer, Cham. https://doi.org/10.1007/978-3-032-04965-0_59

Profiles

  1. Minye Shao
  2. Jingjing Deng