Research On Text Generated Images Based on GAN And Diffusion

Authors

  • Junduo Zheng School of mathematics and statistics, Northeastern University, Qinhuangdao, China

DOI:

https://doi.org/10.54097/ak96vm73

Keywords:

GAN, Diffusion, Text to Image, Image Multimodal.

Abstract

With the continuous development of deep learning, AI-generated content has become a hot research topic, and the field of text-to-image generation in particular has seen significant progress. This article comprehensively compares text-to-image generation methods based on Generative Adversarial Networks (GANs) and diffusion models, examining their applications in text-to-image generation tasks and demonstrating their respective advantages, limitations, and possible solutions. It also delves into the specific techniques by which diffusion models improve image quality, optimize model efficiency, and generate images from multilingual text prompts. Through experimental analysis on the Microsoft Common Objects in Context (COCO) dataset, the zero-shot generation capability of the diffusion model and the performance improvement of the GAN model were verified, highlighting the GAN model's advantages in lightweight design and small parameter count. In addition, this article introduces the broad application prospects of diffusion models in fields such as text-to-3D and video generation. Finally, it summarizes the challenges and future development trends facing diffusion models in text-to-image generation tasks, in the hope of facilitating further research in this field.


References

[1] Gao, X. Y., Du, F., Song, L. J. (2024). Comparative Review of Text-to-Image Generation Techniques Based on Diffusion Models. Computer Engineering and Applications, 60(24).

[2] Ho, J., Jain, A., Abbeel, P. (2020). Denoising Diffusion Probabilistic Models. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS 2020), 6840–6851.

[3] Li, L. Y., Tong, G. X., Zhao, Y. Z. et al. (2023). Survey of Text-to-Image Synthesis Based on Generative Adversarial Network. Electronic Science & Technology.

[4] Li, W. Y., Du, H. B., Zhang, Q. (2025). Text Generation Image Algorithm Based on Improved Stable Diffusion Model and Noise Concatenation. Journal of Nanjing University of Information Science and Technology, 1–14.

[5] Liu, Z. R., Yin, F. Y., Xue, W. H. et al. (2023). A Review of Conditional Image Generation Based on Diffusion Models. Journal of Zhejiang University (Science Edition), 50(6).

[6] Elasri, M., Elharrouss, O., Al-Maadeed, S. et al. (2022). Image Generation: A Review. Neural Processing Letters.

[7] Qiao, T. T., Zhang, J., Xu, D. Q. et al. (2019). MirrorGAN: Learning Text-to-Image Generation by Redescription. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/CVPR.2019.00160

[8] Bernardi, R., Cakici, R., Elliott, D. et al. (2016). Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures. Journal of Artificial Intelligence Research, 55, 409–442. https://doi.org/10.1613/jair.4900

[9] Sibi, M. (2024). An Overview of Text to Visual Generation Using GAN. Indian Journal of Image Processing and Recognition (IJIPR), 4(3). https://doi.org/10.54105/ijipr.A8041.04030424

[10] Xu, Y. W., Chen, G. (2025). Text Matching Image Generation Model Based on Improved GAN Algorithm. Journal of Jilin University (Information Science Edition), 43(2).

Published

27-03-2026

Issue

Section

Articles

How to Cite

Zheng, J. (2026). Research On Text Generated Images Based on GAN And Diffusion. Frontiers in Computing and Intelligent Systems, 16(1), 1-11. https://doi.org/10.54097/ak96vm73