Comparison of Latest 3D Content Generation Models with Minimal Input Images

Authors

  • Haoxuan Xie

DOI:

https://doi.org/10.54097/w5m3fx19

Keywords:

Artificial intelligence; 3D content generation; diffusion model; NeRF.

Abstract

Three-dimensional (3D) content generation has become a popular topic in recent years. It is widely applicable to movie scene generation, 3D modeling for video games, industrial design, and even the characterization of 3D pharmaceutical structures. Before artificial intelligence (AI), producing such content was difficult: people had to be trained in various industrial 3D modeling applications and spend a great deal of time building and refining a single model. With the development of virtual reality and augmented reality, the demand for 3D content is rising rapidly, and traditional 3D production cycles cannot keep pace. Recently, text-to-image technology has achieved great success: with the help of AI, people can use a limited set of descriptive words to generate images, and the model typically produces multiple candidates of the same category for users to choose from. Fundamental techniques such as Neural Radiance Fields (NeRF) and diffusion models can now generate 3D scenes, avatars, and other 3D content from only a few images. This progress points toward the possibility of creating 3D content directly from text. Building on the technologies available today, more applications for generating 3D content will emerge, and selecting the right core technology will be a crucial issue. This paper discusses three of the most popular models or technologies that produce 3D content from a minimal number of input images. The goal is to identify the most suitable technology based on criteria such as quality, applicability, and other indicators.
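
The abstract's claim that NeRF can reconstruct 3D scenes from a handful of images rests on its differentiable volume-rendering step. The following minimal Python/NumPy sketch illustrates that step along a single camera ray. It is an illustration only: the toy analytic field (toy_field) stands in for NeRF's trained multilayer perceptron, and every name and parameter here (toy_field, render_ray, near, far, n_samples) is hypothetical rather than taken from any particular implementation.

    import numpy as np

    def toy_field(points):
        # Hypothetical stand-in for NeRF's learned MLP: map each 3D sample
        # point to an RGB color and a volume density (sigma).
        dist = np.linalg.norm(points, axis=-1)
        sigma = 50.0 * np.exp(-((dist - 1.0) ** 2) / 0.01)  # density concentrated on a unit sphere
        rgb = 0.5 + 0.5 * np.tanh(points)                   # smooth toy coloring
        return rgb, sigma

    def render_ray(origin, direction, near=0.5, far=3.5, n_samples=64):
        # Discrete volume rendering: composite colors along the ray,
        # weighting each sample by its opacity and accumulated transmittance.
        t = np.linspace(near, far, n_samples)               # depths along the ray
        points = origin + t[:, None] * direction            # 3D sample positions
        rgb, sigma = toy_field(points)
        delta = np.diff(t, append=t[-1] + (t[-1] - t[-2]))  # spacing between samples
        alpha = 1.0 - np.exp(-sigma * delta)                # per-sample opacity
        # Transmittance: probability the ray reaches each sample unoccluded.
        trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1] + 1e-10)))
        weights = alpha * trans                             # contribution of each sample
        return (weights[:, None] * rgb).sum(axis=0)         # composited ray color

    color = render_ray(np.array([0.0, 0.0, -2.5]), np.array([0.0, 0.0, 1.0]))
    print("rendered RGB:", color)

In a full NeRF pipeline this rendering is repeated for every pixel, the composited colors are compared against the input photographs, and gradients flow back through the rendering into the field being learned, which is what lets a few posed images constrain an entire scene.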
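
The diffusion models mentioned in the abstract work by gradually noising data and learning to reverse that process. The sketch below, again a hedged Python/NumPy illustration following the standard denoising diffusion probabilistic model (DDPM) formulation, shows the closed-form forward (noising) step and one ancestral reverse step; predict_noise is a hypothetical placeholder for the trained denoising network.

    import numpy as np

    T = 1000
    betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)         # cumulative product over timesteps

    def q_sample(x0, t, rng):
        # Forward process in closed form:
        # x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise
        eps = rng.standard_normal(x0.shape)
        return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps, eps

    def predict_noise(x_t, t):
        # Hypothetical placeholder for the learned network eps_theta(x_t, t).
        return np.zeros_like(x_t)

    def p_sample(x_t, t, rng):
        # One reverse (denoising) step of ancestral sampling.
        eps_hat = predict_noise(x_t, t)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x_t - coef * eps_hat) / np.sqrt(alphas[t])
        if t == 0:
            return mean
        return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

    rng = np.random.default_rng(0)
    x0 = rng.standard_normal((8, 8, 3))     # toy "image"
    x_t, _ = q_sample(x0, t=500, rng=rng)   # noise it to step 500
    x_prev = p_sample(x_t, t=500, rng=rng)  # take one denoising step back

Text-to-3D systems such as DreamFusion reuse a pretrained 2D diffusion model of this kind as a prior, optimizing a NeRF so that its rendered views look plausible to the diffusion model, which is how the two techniques sketched here combine into the pipelines this paper compares.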

Published

26-04-2024

How to Cite

Xie, H. (2024). Comparison of Latest 3D Content Generation Models with Minimal Input Images. Highlights in Science, Engineering and Technology, 94, 39-47. https://doi.org/10.54097/w5m3fx19