Abstract
3D medical image generation is essential for data augmentation and patient privacy, calling for reliable and efficient models suited to clinical practice. However, current methods suffer from limited anatomical fidelity, restricted axial length, and substantial computational cost, placing them beyond the reach of regions with limited resources and infrastructure. We introduce TRACE, a framework that generates spatiotemporally aligned 3D medical images using a 2D multimodal-conditioned diffusion approach. TRACE models sequential 2D slices as video frame pairs, combining segmentation priors and radiology reports for anatomical alignment and incorporating optical flow to sustain temporal coherence. During inference, an overlapping-frame strategy links frame pairs into a flexible-length sequence, which is reconstructed into a spatiotemporally and anatomically aligned 3D volume. Experimental results demonstrate that TRACE balances computational efficiency with anatomical fidelity and spatiotemporal consistency. Code is available at: https://github.com/VinyehShaw/TRACE
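The overlapping-frame idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `generate_pair` and `chain_pairs` are hypothetical, the pair generator is a trivial stand-in for the conditioned diffusion model, and a one-slice overlap (the second frame of each pair seeding the next pair) is an assumption, since the abstract does not specify the overlap.

```python
import numpy as np


def generate_pair(first_slice, index):
    """Hypothetical stand-in for a frame-pair generator.

    A real model would synthesize the second slice conditioned on the first
    slice plus segmentation, report text, and optical flow. Here the second
    slice is just the first plus a constant, so the chaining is easy to check.
    """
    second_slice = first_slice + 1.0
    return first_slice, second_slice


def chain_pairs(initial_slice, num_pairs):
    """Link overlapping frame pairs into a (depth, height, width) volume."""
    slices = [initial_slice]
    for index in range(num_pairs):
        # Assumed one-slice overlap: the last slice generated so far is the
        # first frame of the next pair, so only the second frame is appended.
        _, second_slice = generate_pair(slices[-1], index)
        slices.append(second_slice)
    return np.stack(slices, axis=0)


# Chaining 5 pairs from one starting slice yields 6 slices along the axial axis.
volume = chain_pairs(np.zeros((4, 4), dtype=np.float32), num_pairs=5)
print(volume.shape)
```

Because each pair contributes only one new slice, the axial length in this sketch is set by `num_pairs` alone, which is one way to read the "flexible-length sequence" mentioned in the abstract.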
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2026 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Shao, M. et al. (2026). TRACE: Temporally Reliable Anatomically-Conditioned 3D CT Generation with Enhanced Efficiency. In: Gee, J.C., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2025. MICCAI 2025. Lecture Notes in Computer Science, vol 15963. Springer, Cham. https://doi.org/10.1007/978-3-032-04965-0_59
DOI: https://doi.org/10.1007/978-3-032-04965-0_59
Publisher Name: Springer, Cham
Print ISBN: 978-3-032-04964-3
Online ISBN: 978-3-032-04965-0
eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science