Text-to-Classic: A Diffusion Method for Classical Art Generation Based on Text

Authors

  • Yi Li

DOI:

https://doi.org/10.54097/fcis.v3i1.6030

Keywords:

Denoising Diffusion Probabilistic Model, Text-to-Image Generation, Art

Abstract

Text-to-image generation has recently become an active research topic, and diffusion models have achieved remarkable performance on this task. However, most previous work targets real-scene generation, and few studies focus on classical art paintings. In addition, diffusion models are typically heavyweight, with a large number of parameters and a correspondingly high computational cost. In this paper, we address the subtask of classical art painting synthesis. We propose a lightweight diffusion model, Text-to-Classic (T2C), that synthesizes classical art paintings from text descriptions. Experimental results show that our method achieves good performance with fewer parameters.

Published

17-03-2023

Section

Articles

How to Cite

Li, Y. (2023). Text-to-Classic: A Diffusion Method for Classical Art Generation Based on Text. Frontiers in Computing and Intelligent Systems, 3(1), 85-89. https://doi.org/10.54097/fcis.v3i1.6030