Sdreplay: diffusion model for continual semantic segmentation in traffic scenarios

Jiang, Jian; Tian, Yan; Xu, Yongchuan; Xu, Zhaocheng; Wang, Xun

doi:10.1007/s00530-025-02049-0

Regular Paper
Published: 31 October 2025

Volume 31, article number 463 (2025)

Multimedia Systems

Jian Jiang^1,2,5,
Yan Tian^3,5,
Yongchuan Xu^3,5,
Zhaocheng Xu^4,5 &
…
Xun Wang^3,5

Abstract

Traditional semantic segmentation performs pixel-level classification over a fixed set of classes, so fine-tuning the segmentation model on new data results in catastrophic forgetting. Continual semantic segmentation has been introduced to address this challenge; however, replay methods based on generative adversarial networks (GANs) can guarantee neither the semantic accuracy of the generated images nor the alignment between the distributions of the original training data and the generated images. Motivated by the diffusion model, which inherently considers the entire data distribution, we propose SDReplay, a replay module with a dual-generator architecture that generates images of old classes with accurate semantics and an aligned distribution. The Structure-Preserved Generator (SPG) synthesizes high-fidelity images with precise semantic consistency by leveraging structural priors, while the Distribution-Aligned Generator (DAG) ensures robust distributional fidelity for old classes through token embedding optimization. Results on multiple datasets show that our approach improves the mean intersection-over-union (mIoU) by approximately 1.0%.
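
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how mIoU is commonly computed from flattened per-pixel label maps. It is not the authors' evaluation code: the function name `mean_iou`, the `ignore_index` default of 255 (a convention borrowed from Cityscapes-style labels), and the choice to average only over classes that occur in the prediction or the ground truth are all assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean intersection-over-union over classes present in pred or target.

    `pred` and `target` are flattened label maps of equal length.
    Pixels whose ground-truth label equals `ignore_index` are skipped
    (assumed convention, not taken from the paper).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    intersection = [0] * num_classes   # pixels where pred == target == c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labelled as class c

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both maps
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with three classes and six pixels.
    print(mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3))
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. In a continual setting the same computation is often reported separately for old and new classes, which is where forgetting becomes visible.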

Data Availability

No datasets were generated or analysed during the current study.

Acknowledgements

The authors would like to thank AJE (www.aje.com) for its language editing assistance during the preparation of this manuscript.

Funding

This work was supported in part by the Key R&D Program of Zhejiang Province (No. 2023C01039), the Natural Science Foundation of Zhejiang Province (No. LZ24F020001), the Opening Foundation of the Tongxiang Institute of General Artificial Intelligence (No. TAGI2-B-2024-0009), and the State Key Laboratory of Advanced Medical Materials and Devices.

Author information

Authors and Affiliations

School of Computer and Cyber Sciences, Communication University of China, Beijing, China
Jian Jiang
Center of Big Data, China Digital Culture Group Co., Ltd, Beijing, China
Jian Jiang
School of Computer Science and Technology, Zhejiang Gongshang University, Hangzhou, China
Yan Tian, Yongchuan Xu & Xun Wang
Zhejiang Key Laboratory of Big Data and Future E-Commerce Technology, Hangzhou, China
Zhaocheng Xu
School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand
Jian Jiang, Yan Tian, Yongchuan Xu, Zhaocheng Xu & Xun Wang

Contributions

Jian Jiang: Formal analysis, Writing – original draft preparation. Yan Tian: Conceptualization, Methodology, Writing – review & editing. Yongchuan Xu: Software, Data curation, Writing – review & editing. Zhaocheng Xu: Writing – review & editing. Xun Wang: Writing – review & editing.

Corresponding author

Correspondence to Yan Tian.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

The research does not involve human participants and/or animals. Consent for data used has already been fully informed.

Additional information

Communicated by Bing-kun Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jiang, J., Tian, Y., Xu, Y. et al. Sdreplay: diffusion model for continual semantic segmentation in traffic scenarios. Multimedia Systems 31, 463 (2025). https://doi.org/10.1007/s00530-025-02049-0

Received: 29 May 2025
Accepted: 11 October 2025
Published: 31 October 2025
Version of record: 31 October 2025
DOI: https://doi.org/10.1007/s00530-025-02049-0
