Analysis of the Designs and Applications of AI Chips
DOI: https://doi.org/10.54097/k1p7yk27
Keywords: AI chip, Design of AI chip, AI chip structure
Abstract
The rapid evolution of deep learning model architectures and the growing scale of model parameters have placed heightened demands on deep learning training, inference, and deployment, driving the rapid advancement of AI chips. This study therefore analyzes the designs and applications of AI chips by considering their requirements relative to conventional chips and by treating software and hardware together. The paper delineates the classification of common AI chips along with their distinct design strategies and optimization algorithms. It begins with the fundamental hardware design of AI chips, outlining the basic design flow and the specialized demands of AI computation, particularly data parallelism and memory optimization. Turning to the manufacturing process, it examines how current AI chips circumvent fabrication bottlenecks and achieve significant architectural and performance gains through chip stacking techniques. The paper then bridges hardware and software through the AI compiler, explaining model optimization approaches such as quantization and pruning, and thereby completes the path from AI chip design to deployment. It identifies current challenges in AI chip development and outlines future prospects. By spanning design, manufacturing, algorithms, and applications, the paper offers insights intended to guide upcoming innovations and practical implementations in artificial intelligence.
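As context for the quantization and pruning techniques mentioned in the abstract, the following minimal NumPy sketch (illustrative only, not code from the paper; all function names are assumptions) shows symmetric post-training int8 quantization of a weight tensor and magnitude-based pruning, the simplest forms of the two model optimizations an AI compiler may apply before deployment.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map float weights to [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so that a `sparsity` fraction is removed."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in for one layer's weights

    q, scale = quantize_int8(w)
    w_hat = dequantize(q, scale)
    print("quantization MSE:", float(np.mean((w - w_hat) ** 2)))

    w_sparse = magnitude_prune(w, sparsity=0.9)
    print("remaining nonzeros:", int(np.count_nonzero(w_sparse)))
```

In practice these transformations are applied per layer or per channel by frameworks such as PyTorch or TVM rather than written by hand, but the arithmetic is the same: a scale factor maps floats onto a narrow integer range suited to the chip's low-precision units, and a magnitude threshold zeroes out the weights that contribute least.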