Abstract
Recent single-image-to-3D generation methods commonly combine multi-view diffusion with large reconstruction models to generate 3D content quickly. Despite their impressive speed and geometric consistency, these methods suffer from texture distortion, color deviation, and insufficient resolution. To address these deficiencies, we present DreamTexture, a high-fidelity method that decouples geometry and texture synthesis into two stages. The main idea is to combine the strong geometry generation capability of the large multi-view Gaussian model with the texture alignment ability of the synchronized multi-view diffusion strategy. To synthesize subject-driven, personalized textures for the 3D object, we train a personalized diffusion model that generates subject-driven multi-view images, which are then mapped to the texture domain to synthesize high-resolution textures. To improve the quality of the multi-view images and thereby refine texture details, we introduce two scaling factors that re-balance the contributions of backbone features and skip features in the personalized diffusion model. Experimental results on public datasets demonstrate that DreamTexture significantly outperforms the latest state-of-the-art methods, both qualitatively and quantitatively. Notably, our framework can generate high-fidelity 3D assets with detailed textures from scratch and exhibits remarkable training scalability.
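As a rough illustration of the feature re-balancing idea, the sketch below scales the backbone and skip features of one U-Net decoder stage by two scalars before merging them. The abstract does not give the exact formulation, so the plain scalar re-weighting, the function name `rebalance_and_merge`, the default values of `b` and `s`, and the `(batch, channels, height, width)` layout are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def rebalance_and_merge(backbone, skip, b=1.2, s=0.9):
    """Scale backbone and skip features, then concatenate along channels.

    backbone, skip: arrays shaped (batch, channels, height, width).
    b, s: hypothetical scaling factors for backbone and skip features.
    """
    same_batch = backbone.shape[0] == skip.shape[0]
    same_spatial = backbone.shape[2:] == skip.shape[2:]
    if not (same_batch and same_spatial):
        raise ValueError("backbone and skip must share batch and spatial sizes")
    # Re-weight each branch, then merge as a U-Net decoder stage would.
    return np.concatenate([b * backbone, s * skip], axis=1)


# Toy decoder-stage features: 4 backbone channels, 2 skip channels.
backbone = np.ones((1, 4, 8, 8))
skip = np.ones((1, 2, 8, 8))
merged = rebalance_and_merge(backbone, skip)
print(merged.shape)
```

With `b > 1` the merged tensor leans more on backbone features, and with `s < 1` it leans less on skip features. How the two factors are actually chosen and applied in DreamTexture is described in the full paper, not in the abstract.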
Acknowledgements
This work was supported by the National Natural Science Foundation of China (62293554, 62206249, U2336212), Natural Science Foundation of Zhejiang Province, China (LZ24F020002).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, J. et al. (2025). DreamTexture: High-Fidelity Synthetic 3D Data Generation Through Decoupled Geometry and Texture Synthesis. In: Del Bue, A., Canton, C., Pont-Tuset, J., Tommasi, T. (eds) Computer Vision – ECCV 2024 Workshops. ECCV 2024. Lecture Notes in Computer Science, vol 15642. Springer, Cham. https://doi.org/10.1007/978-3-031-91907-7_18
DOI: https://doi.org/10.1007/978-3-031-91907-7_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-91906-0
Online ISBN: 978-3-031-91907-7
eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science