Research on Multimodal Generative Adversarial Networks in the Framework of Deep Learning


  • Ruilin Xu
  • Yutian Yang
  • Hongjie Qiu
  • Xiaoyi Liu
  • Jingbo Zhang



Image Recognition, Cross-modal, Generative Adversarial Network, Triplet Loss


This project aims to align facial and vocal features within a shared common space by constructing a multimodal generative adversarial network (GAN). It proposes a multimodal approach grounded in visual perception, in which the Graph Cut algorithm aligns feature components with the image features of their corresponding local contexts, so that the model adapts to multimodal information. A regional attention strategy is integrated to improve the speed and accuracy of modeling. Experimental results show that the proposed algorithm improves accuracy on image recognition tasks.
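Triplet loss appears among the keywords, but the abstract does not say how the authors formulate it. One common way to pull matching face and voice embeddings together in a shared space is a hinge triplet loss: a face embedding is the anchor, the voice of the same identity is the positive, and a voice of a different identity is the negative. The following is a minimal sketch of that generic technique, not the paper's method. The L2 normalization, squared Euclidean distance, margin of 0.2, and all function names are illustrative assumptions, and the embeddings would in practice come from encoders this abstract does not describe.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cross_modal_triplet_loss(face: Vector, voice_pos: Vector,
                             voice_neg: Vector, margin: float = 0.2) -> float:
    """Hinge triplet loss with a face embedding as the anchor (assumed formulation).

    The loss is zero once the matching voice is closer to the face than the
    non-matching voice by at least `margin` in the shared (normalized) space.
    """
    a = l2_normalize(face)
    p = l2_normalize(voice_pos)
    n = l2_normalize(voice_neg)
    return max(0.0, squared_distance(a, p) - squared_distance(a, n) + margin)


def batch_triplet_loss(faces: Sequence[Vector], voices_pos: Sequence[Vector],
                       voices_neg: Sequence[Vector], margin: float = 0.2) -> float:
    """Mean triplet loss over aligned lists of (face, positive voice, negative voice)."""
    losses = [cross_modal_triplet_loss(f, p, n, margin)
              for f, p, n in zip(faces, voices_pos, voices_neg)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Well-separated triplet: the matching voice already lies near the face.
    print(cross_modal_triplet_loss([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]))
    # Violated triplet: the non-matching voice coincides with the face.
    print(cross_modal_triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]))
```

With unit-normalized vectors the squared distance is bounded by 4, so the margin is on a fixed scale regardless of the raw embedding magnitudes; in a full model this term would typically be added to the GAN's adversarial losses rather than used alone.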


Xu, R., Yang, Y., Qiu, H., Liu, X., & Zhang, J. (2024). Research on Multimodal Generative Adversarial Networks in the Framework of Deep Learning. Journal of Computing and Electronic Information Management, 12(3), 84-88.

