Recently, significant advances have been made in 3D object generation. Building on the generated geometry, current pipelines typically employ image diffusion models to produce multi-view RGB images, followed by UV texture reconstruction through texture baking. While 3D geometry generation has matured considerably, supported by multiple open-source frameworks, 3D texture generation remains underexplored. In this work, we systematically investigate 3D texture generation through the lens of three core dimensions: reference-texture alignment, geometry-texture consistency, and local texture quality. To address these challenges, we propose MVPainter, which employs data filtering and augmentation strategies to enhance texture fidelity and detail, and introduces ControlNet-based geometric conditioning to improve texture-geometry alignment. Furthermore, we extract physically-based rendering (PBR) attributes from the generated views to produce PBR meshes suitable for real-world rendering applications. MVPainter achieves state-of-the-art results across all three dimensions, as demonstrated by human-aligned evaluations. To facilitate further research and reproducibility, we release our full pipeline as an open-source system, including data construction, model architecture, and evaluation tools.
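To make the texture-baking step mentioned above concrete, below is a minimal NumPy sketch of projection-based baking: colors from posed multi-view renders are back-projected into a UV texture. The camera convention, the uniform averaging across covering views, and the omission of occlusion tests are simplifying assumptions for illustration, not the exact implementation used in our pipeline.

```python
# Conceptual sketch: bake posed multi-view RGB renders into a UV texture.
# Assumes texel world positions were obtained by rasterizing the mesh in
# UV space; visibility/occlusion testing is omitted for brevity.
import numpy as np

def bake_uv_texture(texel_xyz, views, tex_size=512):
    """texel_xyz: (tex_size*tex_size, 3) world position of each UV texel.
    views: list of (K, R, t, rgb) with pinhole intrinsics K (3x3),
    rotation R (3x3), translation t (3,), and image rgb (h, w, 3)."""
    texture = np.zeros((tex_size * tex_size, 3), dtype=np.float32)
    weight = np.zeros(tex_size * tex_size, dtype=np.float32)
    for K, R, t, rgb in views:
        cam = texel_xyz @ R.T + t                 # world -> camera coordinates
        z = cam[:, 2]                             # depth along the optical axis
        uvw = cam @ K.T                           # homogeneous image coordinates
        uv = uvw[:, :2] / np.clip(uvw[:, 2:], 1e-6, None)
        h, w, _ = rgb.shape
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        texture[ok] += rgb[v[ok], u[ok]]          # accumulate colors per texel
        weight[ok] += 1.0
    texture /= np.clip(weight[:, None], 1.0, None)  # average over covering views
    return texture.reshape(tex_size, tex_size, 3)
```

A production baker would additionally weight views by viewing angle and resolve occlusions with a depth test; the simple average here only illustrates the projection mechanism.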
At present, mainstream texture generation methods typically use 2D image diffusion models to generate RGB images, which are then mapped onto 3D surfaces through a projection strategy. However, three core challenges remain unresolved: (1) Reference-texture alignment: it is difficult to ensure that the generated texture accurately reflects the visual characteristics of the reference image, especially under varying lighting and occlusions. (2) Geometry-texture consistency: aligning textures precisely with the 3D surface remains challenging. (3) Local texture quality: many methods struggle to produce textures with sufficient detail.

The MVPainter framework addresses these challenges in three stages. First, we apply data filtering and augmentation strategies to ensure that the training data contains sufficient detail and variation in lighting and viewpoint. Then, we leverage a ControlNet-based architecture to generate multi-view texture images that are structurally consistent with the 3D geometry, as sketched in the example below. Finally, a dedicated PBR extraction module estimates the basecolor, metallic, and roughness maps, which are projected back onto the 3D mesh to obtain a PBR-textured model.
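The sketch below illustrates the geometry-conditioned generation stage using the public diffusers ControlNet API as a stand-in. The checkpoint names, the depth-map condition, the file paths, and the single-view setup are assumptions for illustration; the released MVPainter model generates all views jointly and additionally conditions on the reference image.

```python
# Minimal sketch: geometry-conditioned image generation with a ControlNet,
# using publicly available checkpoints as placeholders for MVPainter's own.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Geometric condition rendered from the input mesh (hypothetical path);
# MVPainter renders such condition maps for each target viewpoint.
control_image = load_image("renders/view0_depth.png")

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Generate one texture view that follows the geometric condition; in the
# full pipeline this is repeated (or batched) across viewpoints before the
# PBR extraction and baking stages.
image = pipe(
    prompt="a wooden toy robot, studio lighting",
    image=control_image,
    num_inference_steps=30,
).images[0]
image.save("generated/view0_rgb.png")
```

Conditioning on rendered geometry maps rather than text alone is what ties the generated texture to the mesh surface; the same mechanism extends to normal or position maps as the control signal.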
@article{shao2025mvpainter,
  title={MVPainter: Accurate and Detailed 3D Texture Generation via Multi-View Diffusion with Geometric Control},
  author={Shao, Mingqi and Xiong, Feng and Sun, Zhaoxu and Xu, Mu},
  journal={arXiv preprint arXiv:2505.12635},
  year={2025},
  url={https://arxiv.org/abs/2505.12635}
}