DreamTexture: High-Fidelity Synthetic 3D Data Generation Through Decoupled Geometry and Texture Synthesis

Li, Jing; Luo, Yawei; Li, Ying; Li, Xueying; Li, Xiaoxue; Hao, Yuwen; Wang, Lijun; Li, Zhengping

doi:10.1007/978-3-031-91907-7_18

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15642)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Recent single-image-to-3D generation methods commonly adopt multi-view diffusion and large reconstruction models to achieve fast 3D content generation. Despite their impressive generation speed and geometric consistency, these methods suffer from several deficiencies, including texture distortion, color deviation, and insufficient resolution. To address these deficiencies, we present DreamTexture, a high-fidelity method that decouples geometry and texture synthesis into two stages. The main idea is to combine the powerful geometry generation capability of the large multi-view Gaussian model with the texture alignment ability of the synchronized multi-view diffusion strategy. To synthesize subject-driven personalized textures for decorating the 3D object, we train a personalized diffusion model to generate subject-driven multi-view images. These images are then mapped to the texture domain to synthesize high-resolution textures. To improve the quality of the multi-view images and further refine texture details, we introduce two scaling factors that re-balance the contributions of backbone features and skip features in the personalized diffusion model. Experimental results on public datasets demonstrate that DreamTexture significantly outperforms the latest state-of-the-art methods, both qualitatively and quantitatively. Notably, our framework can generate high-fidelity 3D assets with detailed textures from scratch, and it exhibits remarkable training scalability.
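The re-balancing of backbone and skip features mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulation, so everything here is an assumption: the helper name `rebalance`, the factor names `b` and `s`, the plain multiplicative scaling, and the channel-wise concatenation that a U-Net decoder block typically performs.

```python
from typing import List

Channel = List[float]  # one flattened feature channel


def rebalance(backbone: List[Channel], skip: List[Channel],
              b: float, s: float) -> List[Channel]:
    """Scale backbone channels by b and skip channels by s, then concatenate.

    Hypothetical helper: the names `b` and `s` and the plain multiplicative
    form are assumptions, not the formulation stated by the paper.
    """
    scaled_backbone = [[b * v for v in ch] for ch in backbone]
    scaled_skip = [[s * v for v in ch] for ch in skip]
    # Concatenate along the channel axis, backbone channels first.
    return scaled_backbone + scaled_skip


# Toy input: two backbone channels and one skip channel, two values each.
backbone = [[1.0, 2.0], [3.0, 4.0]]
skip = [[10.0, 20.0]]

# b > 1 amplifies the backbone contribution; s < 1 damps the skip contribution.
out = rebalance(backbone, skip, b=1.5, s=0.5)
print(out)  # [[1.5, 3.0], [4.5, 6.0], [5.0, 10.0]]
```

In this toy setting, setting both factors to 1.0 recovers the unmodified concatenation, which makes the two factors easy to tune relative to a known baseline.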

Acknowledgements

This work was supported by the National Natural Science Foundation of China (62293554, 62206249, U2336212) and the Natural Science Foundation of Zhejiang Province, China (LZ24F020002).

Author information

Authors and Affiliations

North China University of Technology, Beijing, 100144, China
Jing Li, Ying Li, Xueying Li, Lijun Wang & Zhengping Li
Zhejiang University, Hangzhou, China
Yawei Luo
Beijing Key Laboratory of Disaster Medicine, Beijing, China
Xiaoxue Li & Yuwen Hao

Corresponding authors

Correspondence to Ying Li or Zhengping Li.

Editor information

Editors and Affiliations

Istituto Italiano di Tecnologia, Genoa, Italy
Alessio Del Bue
Meta AI, Barcelona, Spain
Cristian Canton
Google DeepMind, Zürich, Switzerland
Jordi Pont-Tuset
Politecnico di Torino, Turin, Italy
Tatiana Tommasi

About this paper

Cite this paper

Li, J. et al. (2025). DreamTexture: High-Fidelity Synthetic 3D Data Generation Through Decoupled Geometry and Texture Synthesis. In: Del Bue, A., Canton, C., Pont-Tuset, J., Tommasi, T. (eds) Computer Vision – ECCV 2024 Workshops. ECCV 2024. Lecture Notes in Computer Science, vol 15642. Springer, Cham. https://doi.org/10.1007/978-3-031-91907-7_18

DOI: https://doi.org/10.1007/978-3-031-91907-7_18
Published: 12 May 2025
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-91906-0
Online ISBN: 978-3-031-91907-7
eBook Packages: Computer Science; Computer Science (R0); Springer Nature Proceedings Computer Science
