Multi-stage consistent and controllable virtual try-on based on diffusion models

Abstract

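Abstract: Existing diffusion models struggle to keep garment texture and color consistent in virtual try-on. To address this, a new multi-stage virtual try-on method is proposed. The method has two stages. In the first stage, a clothing deformation network built on a multi-scale feature attention module suppresses the unnatural textures that appear when the reference garment is warped, so that garment texture and color are transferred accurately. In the second stage, ControlNet is combined with edge and corner region masks to build a local repair network that concentrates on repairing the boundary regions between the garment and the human body while preserving garment texture details. The results show that the clothing deformation network preserves the color and texture of the garment's main body, and the local repair network fills in texture along the boundary between body and garment, which makes the try-on results more realistic. Compared with existing virtual try-on methods, the proposed multi-stage method achieves better visual consistency and improves the virtual try-on experience.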
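The abstract does not say how the edge and corner region masks are built or how the repaired output is merged back. As a minimal sketch under stated assumptions, one way to confine repair to the garment/body boundary is to take the morphological gradient (dilation minus erosion) of a binary clothing mask and paste repaired pixels back only inside that band, leaving the warped garment trunk untouched. The helpers `boundary_band` and `composite`, the `radius` parameter, and the single-channel list-of-lists image format are all hypothetical; corner masks are not modeled.

```python
# Hypothetical sketch, not the authors' code: restrict local repair to a band
# around the clothing/body boundary so the warped garment trunk is untouched.
# Assumptions: single-channel images and binary masks stored as 2-D lists;
# the band is the morphological gradient (dilation minus erosion) of the
# clothing mask with a square window; corner masks are not modeled.
from typing import List

Mask = List[List[int]]
Image = List[List[float]]


def _morph(mask: Mask, radius: int, dilate: bool) -> Mask:
    """Binary dilation (any pixel in window set) or erosion (all pixels set).

    The window is clipped at the image border, so out-of-image pixels are ignored.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                mask[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = int(any(window)) if dilate else int(all(window))
    return out


def boundary_band(mask: Mask, radius: int = 1) -> Mask:
    """Hypothetical helper: 1 where a pixel lies within `radius` of the mask edge."""
    dilated = _morph(mask, radius, dilate=True)
    eroded = _morph(mask, radius, dilate=False)
    # Both windows contain the centre pixel, so erosion is a subset of
    # dilation and every difference is 0 or 1.
    return [[d - e for d, e in zip(row_d, row_e)] for row_d, row_e in zip(dilated, eroded)]


def composite(warped: Image, repaired: Image, band: Mask) -> Image:
    """Hypothetical helper: repaired pixels inside the band, warped pixels elsewhere."""
    return [
        [r if b else w for w, r, b in zip(row_w, row_r, row_b)]
        for row_w, row_r, row_b in zip(warped, repaired, band)
    ]


if __name__ == "__main__":
    # Toy 5x5 clothing mask: a 3x3 garment region in the centre.
    mask = [[0] * 5] + [[0, 1, 1, 1, 0] for _ in range(3)] + [[0] * 5]
    band = boundary_band(mask, radius=1)
    out = composite([[0.5] * 5 for _ in range(5)], [[0.9] * 5 for _ in range(5)], band)
    for row in out:
        print(row)
```

In the toy example only the centre pixel lies outside the band, so it alone keeps the warped value 0.5; a larger `radius` widens the region handed to the repair network at the cost of overwriting more of the warped texture.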

Abstract: With the rapid advancement of image generation technology, diffusion models have become a dominant approach in image synthesis and have shown significant potential across many application scenarios, with virtual try-on being one of their most important applications. Compared with traditional generative models, diffusion models produce higher-quality images. However, because of their inherent randomness, it is difficult to keep the generated try-on clothing consistent with the reference clothing. Although considerable effort has gone into improving consistency by adding conditional control to diffusion models, these methods still struggle to preserve clothing details accurately and to integrate the clothing well with the human body. To address these issues, this paper proposes a multi-stage virtual try-on framework aimed at improving clothing consistency and the integration of clothing with the human body. The method consists of two main stages. In the first stage, a clothing deformation network with a multi-scale feature attention module is designed to reduce texture distortion during clothing deformation, make the deformation more stable, and better preserve detailed textures. In the second stage, a ControlNet-based local refinement network refines the boundary regions between the clothing and the human body. Using dense pose maps and clothing reference images as additional control signals, this network reconstructs clothing details and integrates the clothing with the human body more naturally. Experimental results show that the proposed method has clear advantages in maintaining clothing consistency while improving the realism of the try-on images.
Ablation experiments and comparisons with current mainstream virtual try-on frameworks further confirm the advantages of this approach in preserving clothing details and improving clothing deformation. Although the method substantially improves virtual try-on quality, it still has limitations. To streamline the workflow and further improve try-on quality, future work will explore end-to-end training strategies and will train and test on larger, more diverse datasets to improve the model's generalization. By refining the method and addressing the remaining challenges, this study aims to provide more reliable technical support for intelligent clothing design and digital clothing applications. By addressing clothing deformation and detail preservation, the method gives users more realistic and reliable try-on results, opening new possibilities for intelligent clothing design, online shopping platforms, and digital clothing displays.
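The abstract does not give the refinement network's equations. For orientation, the original ControlNet formulation (Zhang, Rao and Agrawala, 2023) attaches a trainable copy of a frozen network block through zero-initialized 1×1 convolutions. In the formula below, the condition $c$ is assumed, hypothetically, to be the channel-wise concatenation of the dense pose map $P$ and the clothing reference image $G$; how the paper actually fuses these two control signals is not stated.

```latex
% \mathcal{F}(\cdot;\Theta): frozen block of the pretrained diffusion U-Net
% \Theta_c: trainable copy of \Theta
% \mathcal{Z}(\cdot;\Theta_{z1}), \mathcal{Z}(\cdot;\Theta_{z2}): zero-initialized 1x1 convolutions
% Assumption (not from the paper): c = [P; G], dense pose map P concatenated with clothing reference G
y_c = \mathcal{F}(x;\Theta)
    + \mathcal{Z}\big(\mathcal{F}(x + \mathcal{Z}(c;\Theta_{z1});\Theta_c);\Theta_{z2}\big),
\qquad c = [P;\,G]
```

Because both $\mathcal{Z}$ layers start at zero, $y_c = \mathcal{F}(x;\Theta)$ at the first training step, so the pretrained model's output is unchanged until the control branch learns to contribute. In Stable Diffusion the condition image is first encoded to latent resolution by a small convolutional encoder before entering this branch.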

Key words: virtual try-on, diffusion model, consistency, ControlNet

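KONG D S, LU J, SUN H Y, ZHANG X W, GAO H Y. Multi-stage consistent and controllable virtual try-on based on diffusion models[J]. Advanced Textile Technology.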
