Research on Multimodal Generative Adversarial Networks in the Framework of Deep Learning


  • Ruilin Xu
  • Yutian Yang
  • Hongjie Qiu
  • Xiaoyi Liu
  • Jingbo Zhang



Image Recognition, Cross-modal, Generative Adversarial Network, Triplet Loss


This project aims to align facial and vocal characteristics in a closely related common space by constructing multimodal generative adversarial networks (GANs). It proposes a multimodal approach grounded in visual perception that uses the Graph Cut algorithm to align feature components with the image features of their corresponding local contexts, so that the model can adapt to multimodal information. A regional attention strategy is integrated to improve the speed and accuracy of modeling. Experimental results show that the proposed algorithm improves accuracy on image recognition tasks.
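The abstract describes aligning face and voice features in a common space, and the keywords list triplet loss alongside a regional attention strategy. The following is a minimal sketch of how those two ideas can fit together. It is not the authors' implementation and does not cover the GAN or the Graph Cut step. It assumes that a face is represented by several region feature vectors, which are pooled with softmax attention against a query vector, and that a voice is represented by a single embedding vector of the same dimension. The function names, the `query` vector, and the margin of 0.2 are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def regional_attention_pool(regions: Sequence[Sequence[float]],
                            query: Sequence[float]) -> Vector:
    """Hypothetical regional attention: softmax over region-query dot products,
    then a weighted sum of the region features. Assumes a non-empty region list."""
    scores = [sum(r_i * q_i for r_i, q_i in zip(r, query)) for r in regions]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(regions[0])
    return [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """Standard triplet loss on L2-normalized embeddings: the anchor (face) should
    be closer to the positive (matching voice) than to the negative by `margin`."""
    a, p, n = l2_normalize(anchor), l2_normalize(positive), l2_normalize(negative)
    return max(0.0, squared_distance(a, p) - squared_distance(a, n) + margin)


if __name__ == "__main__":
    # Two toy face regions pooled with a neutral (all-zero) query: equal weights.
    face = regional_attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    matching_voice = [1.0, 1.0]
    other_voice = [1.0, -1.0]
    print(triplet_loss(face, matching_voice, other_voice))  # 0.0: margin already satisfied
    print(triplet_loss(face, other_voice, matching_voice))  # large: wrong pairing penalized
```

Minimizing this loss over many (face, matching voice, non-matching voice) triples pulls paired embeddings together and pushes unpaired ones apart. This is one common way to obtain a shared face-voice space. The attention step lets more informative face regions dominate the pooled face vector.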


How to Cite

Xu, R., Yang, Y., Qiu, H., Liu, X., & Zhang, J. (2024). Research on Multimodal Generative Adversarial Networks in the Framework of Deep Learning. Journal of Computing and Electronic Information Management, 12(3), 84-88.
