Virtual fitting research based on the diffusion model and ControlNet network

Abstract

As image generation models have developed and iterated, diffusion-based models such as Stable Diffusion have become the mainstream, offering a new approach to clothing design and rendering. A diffusion model usually takes a text prompt as its generation condition, and the generated images are random, so it is difficult to generate the virtual fitting effect of a specific style accurately. ControlNet neural networks make image generation more controllable. A trained ControlNet can use image information such as a Canny edge map, a depth map, or an OpenPose map as an additional generation condition for the diffusion model, controlling the human pose, edge features, and front-to-back spatial relationships in the generated image. This paper briefly describes the development history and principles of the diffusion model and explores its feasibility for generating virtual fitting renderings. To visualize a clothing style diagram as a finished garment and to generate virtual fitting effects rapidly, this paper uses a ControlNet neural network to control the diffusion model so that it generates virtual fitting images of virtual models wearing specified clothing styles.
The virtual fitting of three dresses was taken as an experimental example. First, images of real clothing models in the expected pose were sampled, and the human keypoint maps and depth maps of the real models were extracted as generation conditions. Then, a Stable Diffusion model controlled by ControlNet was used to generate a virtual clothing model image matching the intended pose. Next, the edge image of the virtual model was generated with the Canny algorithm and edited in combination with the dress style diagram, producing an edge image of the virtual model wearing the specified dress style, which served as the edge generation condition. With this condition and a text prompt, the diffusion model generated the virtual fitting effect of a dress in the specified style, color, and fabric. Modifying the edge image changed the dress style in the virtual fitting effect in real time, giving fashion designers an intuitive reference for modifying and adjusting designs. In addition, two further experiments were carried out: one on controlling the detailed features of the virtual model, and one on how the weight of text prompt words affects the control of clothing fabric and color. Finally, the results generated by the proposed method were compared with and evaluated against virtual fitting produced by 3D modeling.
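To make the edge generation condition concrete, the following is a minimal sketch of a binary edge map computed from a grayscale image. It is a simplified stand-in, not the Canny algorithm: it applies only a Sobel gradient and a single threshold, whereas Canny also includes Gaussian smoothing, non-maximum suppression, and hysteresis thresholding. The function name `sobel_edge_map`, the threshold value, and the toy image are illustrative assumptions and do not come from the paper.

```python
# Simplified edge-map sketch: Sobel gradient magnitude plus one threshold.
# This is a stand-in for illustration only, not the Canny algorithm itself.

def sobel_edge_map(gray, threshold=100):
    """Return a binary edge map (0 or 255) for a 2D list of grayscale values."""
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    # Border pixels are left at 0 because the 3x3 kernel does not fit there.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient (responds to vertical edges).
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient (responds to horizontal edges).
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 255
    return edges


# Hypothetical 6x6 image: dark left half, bright right half.
image = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
edge_map = sobel_edge_map(image)
for row in edge_map:
    print(row)
```

In a workflow like the one described, an edge map of this kind is the image that would be edited (for example, by redrawing the dress outline) before being passed to the edge-conditioned ControlNet.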
The results show that the diffusion model combined with the ControlNet network can control the pose characteristics of the virtual model, and that editing the Canny edge image used as the control condition generates the virtual fitting effect of the expected clothing style. Compared with 3D modeling, the virtual fitting effect is more expressive and the operation is more intuitive and faster. The method is therefore better suited to giving designers an intuitive clothing display at the style design stage, helping them adjust the design style, color, fabric, and process, and improving the efficiency of clothing design.
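The prompt-weight experiment mentioned in the abstract can be illustrated with a small helper that assembles a weighted text prompt. This sketch assumes the `(term:weight)` emphasis syntax used by the AUTOMATIC1111 Stable Diffusion web UI; the abstract does not state which front end or syntax the authors used, and the helper names and example terms below are hypothetical.

```python
# Sketch of building a weighted prompt, assuming "(term:weight)" emphasis syntax.
# The terms and weights are hypothetical examples, not values from the paper.

def weighted_term(term, weight=1.0):
    """Format one prompt term; weight 1.0 is neutral and is left unwrapped."""
    if weight <= 0:
        raise ValueError("weight must be positive")
    if weight == 1.0:
        return term
    return f"({term}:{weight:g})"


def build_prompt(terms):
    """Join (term, weight) pairs into one comma-separated prompt string."""
    return ", ".join(weighted_term(term, weight) for term, weight in terms)


prompt = build_prompt([
    ("full-body photo of a fashion model", 1.0),
    ("red silk dress", 1.3),      # emphasize fabric and color
    ("studio lighting", 0.8),     # de-emphasize a secondary attribute
])
print(prompt)
# full-body photo of a fashion model, (red silk dress:1.3), (studio lighting:0.8)
```

Raising the weight on a fabric or color term is one way such an experiment could vary how strongly the prompt influences the garment's appearance while the edge condition fixes its shape.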

Keywords: virtual fitting, diffusion model, ControlNet, virtual model, human keypoint detection, clothing design

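Abstract: To rapidly generate finished-garment renderings of specific clothing styles, a diffusion model was combined with the ControlNet network to achieve virtual fitting. First, human keypoint detection maps and depth maps were used as control conditions for the diffusion model to generate a virtual model with a controllable pose; virtual fitting renderings were then generated from Canny edge maps. Virtual fitting experiments were carried out on three dresses, and the parameter settings of the diffusion model's control conditions were optimized. Finally, the generated results were compared with and evaluated against 3D-modeling virtual fitting results. The results show that the diffusion model combined with the ControlNet network can control the pose characteristics of the virtual model, and that the virtual fitting effect of a specific clothing style can be generated from the garment's Canny edge map. Compared with virtual fitting based on 3D modeling, the proposed method is more expressive and is more intuitive and faster to operate. It gives designers a visual reference for how a style diagram looks as a finished garment, thereby improving the efficiency of clothing design.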

Keywords: virtual fitting, diffusion model, ControlNet network, virtual model, human keypoint detection, clothing design

CLC Number: TS 941.26

GUO Yuxuan, SUN Lin. Virtual fitting research based on the diffusion model and ControlNet network[J]. Advanced Textile Technology (现代纺织技术), 2024, 32(3): 118-128.
