How to Imagine the World with Text? From Text-to-image Generation View

Jingyi Liu

doi:10.54097/hset.v39i.6619

Keywords:

Text-to-image; Image Generation; GANs; cGAN; CLIP.

Abstract

Words are an effective and convenient way to describe the world, but readers sometimes misunderstand what a text conveys. Pictures are more vivid, easier to understand, and know no borders, but creating a painting often takes a long time. Text-to-image generation lets the two forms of expression complement each other: it makes every ordinary person a “painter”, so that they can feel the world, express themselves, and explore whimsical ideas through rich pictures. Toward this vision, researchers are working to improve image generation models so that computers can generate higher-quality images from text. They are also addressing technical defects, for instance generated images whose content is sometimes strange. In the future, text-to-image generation could be adapted to AI applications such as computer-aided design and image editing, and employed in artistic fields such as film and artwork. It may even make a big difference in people's lives, enriching the public's spiritual world and conveying information through vivid images.

