
DreamTexture: High-Fidelity Synthetic 3D Data Generation Through Decoupled Geometry and Texture Synthesis

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 Workshops (ECCV 2024)

Abstract

Recent single-image-to-3D generation methods commonly adopt multi-view diffusion and large reconstruction models to achieve fast 3D content generation. Despite their impressive generation speed and geometric consistency, these methods still suffer from several deficiencies, including texture distortion, color deviation, and insufficient resolution. To address these deficiencies, we present DreamTexture, a high-fidelity method that decouples geometry and texture synthesis into two stages. The main idea is to combine the strong geometry generation capability of a large multi-view Gaussian model with the texture alignment ability of a synchronized multi-view diffusion strategy. To synthesize subject-driven personalized textures for decorating the 3D object, we train a personalized diffusion model that generates subject-driven multi-view images. These images are then mapped to the texture domain to synthesize high-resolution textures. To improve the quality of the multi-view images and thereby refine texture details, we introduce two scaling factors that re-balance the contributions of backbone features and skip features in the personalized diffusion model. Experimental results on public datasets demonstrate that DreamTexture significantly outperforms the latest state-of-the-art methods, both qualitatively and quantitatively. Notably, our framework can generate high-fidelity 3D assets with detailed textures from scratch and exhibits remarkable training scalability.
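The abstract does not state exactly how the two scaling factors are applied. As a minimal sketch, assuming a FreeU-style scheme (Si et al., "FreeU: Free Lunch in Diffusion U-Net"), the snippet below multiplies a U-Net decoder's backbone feature map by a backbone factor `b` and the incoming skip-connection feature map by a skip factor `s` before concatenating them along the channel axis. The function name `rebalance_decoder_input`, the default values of `b` and `s`, and the use of NumPy arrays in place of framework tensors are illustrative assumptions, not the authors' implementation. FreeU itself applies `b` to only half of the backbone channels and applies `s` to the low-frequency components of the skip feature in the Fourier domain; this sketch scales both maps uniformly for clarity.

```python
import numpy as np


def rebalance_decoder_input(backbone: np.ndarray, skip: np.ndarray,
                            b: float = 1.2, s: float = 0.9) -> np.ndarray:
    """Hypothetical sketch: re-balance backbone and skip features in a U-Net decoder.

    backbone: decoder backbone features, shape (C_b, H, W).
    skip:     encoder skip-connection features, shape (C_s, H, W).
    b:        assumed backbone scaling factor (b > 1 amplifies backbone features).
    s:        assumed skip scaling factor (s < 1 damps skip features).
    Returns the concatenated decoder input, shape (C_b + C_s, H, W).
    """
    if backbone.shape[1:] != skip.shape[1:]:
        raise ValueError("backbone and skip features must share spatial size")
    # Uniform scaling of each branch (FreeU restricts b to half the channels
    # and applies s in the Fourier domain; both refinements are omitted here).
    scaled_backbone = backbone * b
    scaled_skip = skip * s
    # A U-Net decoder block typically concatenates the two along channels.
    return np.concatenate([scaled_backbone, scaled_skip], axis=0)


if __name__ == "__main__":
    backbone = np.ones((4, 8, 8))  # 4 backbone channels, 8x8 spatial grid
    skip = np.ones((2, 8, 8))      # 2 skip channels, same spatial grid
    fused = rebalance_decoder_input(backbone, skip, b=1.2, s=0.9)
    print(fused.shape)  # (6, 8, 8)
```

With `b = s = 1` the function reduces to a plain U-Net concatenation. In FreeU's analysis, the backbone mainly carries denoising capability while skip connections carry high-frequency detail, so choosing `b > 1` and `s < 1` shifts the decoder's emphasis toward the backbone.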



Acknowledgements

This work was supported by the National Natural Science Foundation of China (62293554, 62206249, U2336212), Natural Science Foundation of Zhejiang Province, China (LZ24F020002).

Author information

Corresponding authors

Correspondence to Ying Li or Zhengping Li.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, J. et al. (2025). DreamTexture: High-Fidelity Synthetic 3D Data Generation Through Decoupled Geometry and Texture Synthesis. In: Del Bue, A., Canton, C., Pont-Tuset, J., Tommasi, T. (eds) Computer Vision – ECCV 2024 Workshops. ECCV 2024. Lecture Notes in Computer Science, vol 15642. Springer, Cham. https://doi.org/10.1007/978-3-031-91907-7_18

Profiles

  1. Jing Li
  2. Zhengping Li