Few-shot Font Generation based on SAE and Diffusion Model
DOI:
https://doi.org/10.54097/rp4sqj55
Keywords:
Chinese font, Image generation, Few-shot font generation, Diffusion model.
Abstract
Few-shot generation of Chinese characters has become an intriguing and important challenge in recent years, primarily because Chinese fonts are intricate and distinctive. Conventional GAN-based font generation models, however, suffer from unstable training and inaccurate generation. Meanwhile, diffusion models have demonstrated remarkable success in image generation and are already used in commercial AI painting applications. Some studies have endeavored to integrate diffusion models into Few-shot Font Generation (FFG). In this paper, we present a straightforward few-shot font generation framework built on a conditional diffusion model. We generate conditional embedding tokens using three encoders, which extract essential character information such as content and style. By feeding these conditions into the diffusion process, we can effectively model the information from all three encoders. Our model has three key features: i) It disentangles the encoders from one another and from the diffusion model. The content encoder focuses solely on extracting the content, that is, the relative position of strokes; the style-coding provides only style features; and the diffusion model is restricted to generating the target image, without obscuring any content or style information. This improves the model's interpretability and makes it simpler to add new functionality. ii) For a new font, our model requires fewer training steps because it relies on pre-training. We train only the small-scale style-coding and bypass extensive training of the large-scale diffusion model. iii) Our model achieves two types of "few-shot" training. The first involves the same style but different characters, requiring only a few characters for training. The second involves different styles, requiring only a few fonts of each style for training.
Experimental results show that our model outperforms previous few-shot font generation models in terms of quality, generation speed, and the scale of training data needed for a well-trained model.
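The conditioning and training scheme described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the linear noise schedule follows the standard DDPM formulation, and the function names, the concatenation used to fuse tokens, the module names, and the toy data are all assumptions made for illustration. Only two of the three encoders are represented, because the abstract names only content and style.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (the standard DDPM choice, assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """Running product of (1 - beta_t), usually written alpha-bar_t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def add_noise(x0, t, alpha_bars, rng):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

def build_condition(content_tokens, style_tokens):
    """Fuse encoder outputs into one token sequence (concatenation is an assumption)."""
    return list(content_tokens) + list(style_tokens)

def trainable_modules(modules, frozen):
    """Names of the modules that would still receive gradient updates."""
    return [name for name in modules if name not in frozen]

# Toy usage: a 4-pixel "glyph" scaled to [-1, 1] and hypothetical 2-d tokens.
rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x0 = [1.0, -1.0, 1.0, -1.0]
xt, eps = add_noise(x0, 500, alpha_bars, rng)
condition = build_condition([[0.1, 0.2]], [[0.3, 0.4], [0.5, 0.6]])

# Adapting to a new font: only the style module is updated, the rest stay frozen.
modules = ["content_encoder", "style_coding", "diffusion_model"]
to_train = trainable_modules(modules, frozen={"content_encoder", "diffusion_model"})
```

In a real system, a denoising network would take `xt`, the timestep, and `condition` as inputs and be trained to predict `eps`. Freezing everything except the style module mirrors feature ii) of the abstract, in which only the style-coding is trained for a new font.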
Copyright (c) 2024 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







