How to Imagine the World with Text? From Text-to-image Generation View
DOI:
https://doi.org/10.54097/hset.v39i.6619
Keywords:
Text-to-image; Image Generation; GANs; cGAN; CLIP.
Abstract
Words are an effective and convenient way to describe the world, but readers sometimes misunderstand what a text conveys. Pictures are more vivid, easier to understand, and not limited by borders, but creating a painting often takes a long time. Text-to-image generation lets the two forms of expression complement each other: it makes every ordinary person a “painter” who can experience the world, express themselves, and create whimsical works through rich pictures. Toward this vision, researchers are improving image generation models so that computers can generate higher-quality images from text, and they are addressing technical defects such as generated images with strange content. In the future, text-to-image generation could be applied in AI applications such as computer-aided design and image editing, and in artistic fields such as film and artwork. It may even make a big difference in people's lives by enriching the public's spiritual world and conveying information through vivid images.
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







