Text-to-Classic: A Diffusion Method for Classical Art Generation Based on Text
DOI:
https://doi.org/10.54097/fcis.v3i1.6030
Keywords:
Denoising Diffusion Probabilistic Model, Text-to-Image Generation, Art
Abstract
Text-to-image generation has recently become an active research topic, and diffusion models have achieved remarkable performance on this task. However, most previous research targets real-scene generation, and few studies focus on classical art paintings. In addition, diffusion models are commonly heavyweight, with a large number of parameters and a correspondingly high computational cost. In this paper, we address the subtask of classical art painting synthesis. We propose a lightweight diffusion model, Text-to-Classic (T2C), to synthesize classical art paintings from text descriptions. Experimental results show that our method achieves good performance with fewer parameters.


