TRACE: Temporally Reliable Anatomically-Conditioned 3D CT Generation with Enhanced Efficiency

Shao, Minye; Miao, Xingyu; Duan, Haoran; Wang, Zeyu; Chen, Jingkun; Huang, Yawen; Wu, Xian; Deng, Jingjing; Long, Yang; Zheng, Yefeng

doi:10.1007/978-3-032-04965-0_59


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15963)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention


Abstract

3D medical image generation is essential for data augmentation and patient privacy, calling for reliable and efficient models suited to clinical practice. However, current methods suffer from limited anatomical fidelity, restricted axial length, and substantial computational cost, placing them beyond the reach of regions with limited resources and infrastructure. We introduce TRACE, a framework that generates 3D medical images with spatiotemporal alignment using a 2D multimodal-conditioned diffusion approach. TRACE models sequential 2D slices as video frame pairs, combines segmentation priors and radiology reports for anatomical alignment, and incorporates optical flow to sustain temporal coherence. During inference, an overlapping-frame strategy links frame pairs into a flexible-length sequence, which is then reconstructed into a spatiotemporally and anatomically aligned 3D volume. Experimental results demonstrate that TRACE effectively balances computational efficiency with the preservation of anatomical fidelity and spatiotemporal consistency. Code is available at: https://github.com/VinyehShaw/TRACE
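The overlapping-frame idea (frame pairs chained into a sequence of arbitrary length, then stacked into a volume) can be illustrated with a small sketch. This is not the authors' implementation; it assumes each generated pair covers two adjacent axial slices, that consecutive pairs share exactly one slice, and that the shared slice is merged by simple averaging. The helper names `volume_to_pairs` and `stitch_overlapping_pairs` are hypothetical.

```python
from typing import List, Sequence, Tuple

# A 2D axial slice as rows of intensities; a volume is a list of slices.
Slice = List[List[float]]


def volume_to_pairs(volume: Sequence[Slice]) -> List[Tuple[Slice, Slice]]:
    """Split a volume into consecutive (slice_i, slice_{i+1}) "video frame" pairs."""
    return [(volume[i], volume[i + 1]) for i in range(len(volume) - 1)]


def average_slices(a: Slice, b: Slice) -> Slice:
    """Element-wise mean of two equally sized 2D slices (returns a new slice)."""
    return [[(x + y) / 2.0 for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def stitch_overlapping_pairs(pairs: Sequence[Tuple[Slice, Slice]]) -> List[Slice]:
    """Chain overlapping frame pairs into one slice sequence.

    Assumption (hypothetical): the second frame of pair k and the first frame
    of pair k+1 depict the same axial position, so they are averaged.
    """
    if not pairs:
        return []
    sequence: List[Slice] = [pairs[0][0], pairs[0][1]]
    for first, second in pairs[1:]:
        # Merge the shared (overlapping) slice, then extend by the new slice.
        sequence[-1] = average_slices(sequence[-1], first)
        sequence.append(second)
    return sequence


if __name__ == "__main__":
    generated = [([[0.0]], [[1.0]]), ([[2.0]], [[3.0]])]
    print(stitch_overlapping_pairs(generated))  # [[[0.0]], [[1.5]], [[3.0]]]
```

With N pairs the stitched sequence has N + 1 slices, so the axial length grows with the number of generated pairs instead of being fixed by the model's input size. Averaging is only one plausible merge rule under these assumptions.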



Author information

Authors and Affiliations

Department of Computer Science, Durham University, Durham, UK
Minye Shao, Xingyu Miao & Yang Long
Department of Automation, Tsinghua University, Beijing, China
Haoran Duan
College of Computer Science and Engineering, Dalian Minzu University, Dalian, China
Zeyu Wang
Department of Engineering Science, University of Oxford, Oxford, UK
Jingkun Chen
Jarvis Research Center, Tencent YouTu Lab, Shenzhen, China
Yawen Huang & Xian Wu
School of Engineering Mathematics and Technology, University of Bristol, Bristol, UK
Jingjing Deng
Medical Artificial Intelligence Laboratory, Westlake University, Hangzhou, China
Yefeng Zheng


Corresponding author

Correspondence to Yang Long.

Editor information

Editors and Affiliations

University of Pennsylvania, Philadelphia, PA, USA
James C. Gee
University College London, London, UK
Daniel C. Alexander
DGIST, Daegu, Korea (Republic of)
Jaesung Hong
Massachusetts General Hospital and Harvard Medical School, Charlestown, MA, USA
Juan Eugenio Iglesias
University College London, London, UK
Carole H. Sudre
Boston University, Boston, MA, USA
Archana Venkataraman
MIT, Cambridge, MA, USA
Polina Golland
Seoul National University, Seoul, Korea (Republic of)
Jong Hyo Kim
KAIST, Daejeon, Korea (Republic of)
Jinah Park

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.


About this paper

Cite this paper

Shao, M. et al. (2026). TRACE: Temporally Reliable Anatomically-Conditioned 3D CT Generation with Enhanced Efficiency. In: Gee, J.C., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2025. MICCAI 2025. Lecture Notes in Computer Science, vol 15963. Springer, Cham. https://doi.org/10.1007/978-3-032-04965-0_59


DOI: https://doi.org/10.1007/978-3-032-04965-0_59
Published: 19 September 2025
Publisher Name: Springer, Cham
Print ISBN: 978-3-032-04964-3
Online ISBN: 978-3-032-04965-0
eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science
