SDReplay: diffusion model for continual semantic segmentation in traffic scenarios

  • Regular Paper
Multimedia Systems

Abstract

Traditional semantic segmentation performs pixel-level classification over a fixed set of classes, so fine-tuning the segmentation model on new data causes catastrophic forgetting of previously learned classes. Continual semantic segmentation has been introduced to address this challenge; however, replay methods based on generative adversarial networks (GANs) guarantee neither the semantic accuracy of the generated images nor the alignment between the distribution of the generated images and that of the original training data. Motivated by the diffusion model, which inherently considers the entire data distribution, we propose SDReplay, a replay module with a dual-generator architecture that generates images of old classes with accurate semantics and an aligned distribution. The Structure-Preserved Generator (SPG) leverages structural priors to synthesize high-fidelity images with precise semantic consistency, while the Distribution-Aligned Generator (DAG) uses token embedding optimization to keep the generated images of old classes faithful to their original distribution. Results on multiple datasets show that our approach improves the mean intersection-over-union (mIoU) by approximately 1.0%.
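The reported gain is measured in mean intersection-over-union (mIoU). As a minimal sketch, assuming the standard definition (per-class IoU = TP / (TP + FP + FN), averaged over the classes that occur), the metric can be computed from flattened label maps as below. The helper name `mean_iou` and the ignore label 255 are hypothetical choices for illustration, not the paper's evaluation code.

```python
from typing import List, Optional, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int,
             ignore_index: Optional[int] = 255) -> float:
    """Hypothetical helper: mIoU from flattened predicted and true label maps.

    Per-class IoU = TP / (TP + FP + FN); classes that appear in neither the
    prediction nor the ground truth are skipped when averaging.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    tp: List[int] = [0] * num_classes
    fp: List[int] = [0] * num_classes
    fn: List[int] = [0] * num_classes
    for p, t in zip(pred, target):
        if ignore_index is not None and t == ignore_index:
            continue  # assumed convention: pixels with the ignore label are not scored
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1  # class p predicted where it is not the true class
            fn[t] += 1  # true class t was missed
    ious = [tp[c] / (tp[c] + fp[c] + fn[c])
            for c in range(num_classes) if tp[c] + fp[c] + fn[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy 6-pixel example with 3 classes; the last pixel carries the ignore label.
    print(mean_iou([0, 0, 1, 1, 2, 255], [0, 1, 1, 1, 2, 255], num_classes=3))
```

Counts are accumulated over everything passed in before dividing, which corresponds to dataset-level mIoU rather than an average of per-image IoUs; which variant a given benchmark uses is an assumption to check against its evaluation protocol.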

Figs. 1 to 7 (images not reproduced in this text)

Data Availability

No datasets were generated or analysed during the current study.

Acknowledgements

The authors would like to thank AJE (www.aje.com) for its language editing assistance during the preparation of this manuscript.

Funding

This work was supported in part by the Key R&D Program of Zhejiang Province (No. 2023C01039), the Natural Science Foundation of Zhejiang Province (No. LZ24F020001), the Opening Foundation of the Tongxiang Institute of General Artificial Intelligence (No. TAGI2-B-2024-0009), and the State Key Laboratory of Advanced Medical Materials and Devices.

Author information

Contributions

Jian Jiang: Formal analysis, Writing – original draft preparation. Yan Tian: Conceptualization, Methodology, Writing – review & editing. Yongchuan Xu: Software, Data curation, Writing – review & editing. Zhaocheng Xu: Writing – review & editing. Xun Wang: Writing – review & editing.

Corresponding author

Correspondence to Yan Tian.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

The research does not involve human participants and/or animals. Fully informed consent has already been obtained for the data used.

Additional information

Communicated by Bing-kun Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jiang, J., Tian, Y., Xu, Y. et al. SDReplay: diffusion model for continual semantic segmentation in traffic scenarios. Multimedia Systems 31, 463 (2025). https://doi.org/10.1007/s00530-025-02049-0

  • DOI: https://doi.org/10.1007/s00530-025-02049-0
