3D-Aware Semantic-Guided Generative Model for Human Synthesis

Zhang, Jichao; Sangineto, Enver; Tang, Hao; Siarohin, Aliaksandr; Zhong, Zhun; Sebe, Nicu; Wang, Wei

doi:10.1007/978-3-031-19784-0_20

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13675)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Generative Neural Radiance Field (GNeRF) models, which extract implicit 3D representations from 2D images, have recently been shown to produce realistic images representing rigid or semi-rigid objects, such as human faces or cars. However, they usually struggle to generate high-quality images representing non-rigid objects, such as the human body, which is of great interest for many computer graphics applications. This paper proposes a 3D-aware Semantic-Guided Generative Model (3D-SGAN) for human image synthesis, which combines a GNeRF with a texture generator. The former learns an implicit 3D representation of the human body and outputs a set of 2D semantic segmentation masks. The latter transforms these semantic masks into a realistic image, adding a realistic texture to the human appearance. Without requiring additional 3D information, our model can learn 3D human representations with photo-realistic, controllable generation. Our experiments on the DeepFashion dataset show that 3D-SGAN significantly outperforms the most recent baselines. The code is available at https://github.com/zhangqianhui/3DSGAN.
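
The two-stage idea in the abstract (a radiance-field stage that renders semantic masks, followed by a texture stage that turns masks into an image) can be illustrated with a toy sketch for a single camera ray. This is not the authors' implementation: it uses the standard NeRF-style compositing weights, and the function names, the three classes, the sample values and the colour palette are all assumptions made for illustration. The real texture generator is a learned network, which is replaced here by a flat colour lookup.

```python
import math


def render_weights(densities, deltas):
    """Standard NeRF-style compositing weights for the samples along one ray."""
    weights = []
    transmittance = 1.0  # fraction of light still unabsorbed before this sample
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def render_semantic_pixel(densities, deltas, class_scores):
    """Composite per-sample class scores into one score vector for the pixel."""
    weights = render_weights(densities, deltas)
    pixel = [0.0] * len(class_scores[0])
    for w, scores in zip(weights, class_scores):
        for c, s in enumerate(scores):
            pixel[c] += w * s
    return pixel


def toy_texture_generator(label, palette):
    """Stand-in for a learned texture generator: one flat colour per label."""
    return palette[label]


# Hypothetical ray: three samples, three classes (background, skin, clothing).
densities = [0.0, 5.0, 50.0]   # assumed volume densities at each sample
deltas = [0.1, 0.1, 0.1]       # assumed spacing between samples
class_scores = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
palette = {0: (255, 255, 255), 1: (220, 180, 150), 2: (40, 90, 200)}

pixel_scores = render_semantic_pixel(densities, deltas, class_scores)
label = max(range(len(pixel_scores)), key=lambda c: pixel_scores[c])
rgb = toy_texture_generator(label, palette)
print(label, rgb)
```

Repeating this for every pixel would give a semantic mask and then a coloured image. The sketch only shows how the semantic stage and the texture stage are separated; it says nothing about how either stage is trained in 3D-SGAN.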

Acknowledgements

This work was supported by the EU H2020 projects AI4Media (No. 951911) and SPRING (No. 871245).

Author information

Authors and Affiliations

University of Trento, Trento, Italy
Jichao Zhang, Aliaksandr Siarohin, Zhun Zhong, Nicu Sebe & Wei Wang
University of Modena and Reggio Emilia, Modena, Italy
Enver Sangineto
ETH Zurich, Zurich, Switzerland
Hao Tang
Snap Research, Santa Monica, USA
Aliaksandr Siarohin

Corresponding author

Correspondence to Jichao Zhang.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Electronic supplementary material

Supplementary material 1 (PDF, 6729 KB)

About this paper

Cite this paper

Zhang, J. et al. (2022). 3D-Aware Semantic-Guided Generative Model for Human Synthesis. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13675. Springer, Cham. https://doi.org/10.1007/978-3-031-19784-0_20

DOI: https://doi.org/10.1007/978-3-031-19784-0_20
Published: 31 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19783-3
Online ISBN: 978-3-031-19784-0
eBook Packages: Computer Science; Computer Science (R0); Springer Nature Proceedings Computer Science
