FPGA Hardware Acceleration Design for Deep Learning

Authors

  • Haochen Shi

DOI:

https://doi.org/10.54097/hset.v39i.6543

Keywords:

Deep Learning; Convolutional Neural Networks; FPGA; Hardware Acceleration; Parallel Computing.

Abstract

A convolutional neural network (CNN) is a type of artificial neural network that can learn features from large amounts of data and performs very well in large-scale image processing; it loosely simulates the behavior of the biological visual system. In recent years, with the development of deep neural network algorithms and hardware technology, conventional "CPU+GPU" servers can no longer meet the computational demands of neural networks in various fields, so a large number of deep CNN accelerators based on the FPGA platform have gradually emerged. FPGAs are beginning to be used in image recognition and natural language processing because of their programmability, high performance, stability, security, and low power consumption. Although FPGAs have proven to deliver better performance, there is still room for optimization at the design level. YOLOv3, a classical detection algorithm, still consumes considerable time and computational resources in practice. To address this problem, this experiment partially optimizes the YOLOv3 algorithm by introducing the CBAM attention mechanism into the YOLOv3 model and pruning the network at different ratios using the Network Slimming method. The optimized models are then verified on an Nvidia TX2 embedded device using the COCO dataset. The experiment compares the precision, mAP, and number of parameters of the optimized YOLOv3 algorithm under the different optimization strategies, showing that YOLOv3 still admits optimization strategies that can reduce computation time and memory footprint more effectively without any degradation in accuracy.
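As a rough illustration of the pruning step mentioned in the abstract: Network Slimming treats each BatchNorm scaling factor gamma as a per-channel importance score and removes channels whose |gamma| falls below a global percentile threshold. The minimal NumPy sketch below, using made-up values, shows only this channel-selection step, not the full train-prune-finetune pipeline used in the paper.

```python
import numpy as np

def select_channels(gammas, prune_ratio):
    """Return a boolean keep-mask per layer given BN scale factors.

    gammas      -- list of 1-D arrays, one per layer (|gamma| = importance)
    prune_ratio -- fraction of channels to prune, applied globally
    """
    all_scores = np.concatenate([np.abs(g) for g in gammas])
    threshold = np.quantile(all_scores, prune_ratio)  # global cut-off
    return [np.abs(g) > threshold for g in gammas]

# Toy example: two layers with 4 and 3 channels, prune roughly half
gammas = [np.array([0.9, 0.05, 0.7, 0.01]), np.array([0.4, 0.02, 0.8])]
masks = select_channels(gammas, prune_ratio=0.5)
for i, m in enumerate(masks):
    print(f"layer {i}: keep {m.sum()} of {m.size} channels")
```

Because the threshold is computed over all layers jointly, unimportant channels are pruned wherever they occur rather than a fixed fraction per layer, which is what lets the method reshape the network's width layer by layer.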


References

J Y GUO, C M SHAO, J WANG, et al. Programming and development environment for FPGA graph computing: Overview and exploration [J]. Computer Research and Development, 2020, 57(6): 1164-1178. (In Chinese)

C C XU. Research on FPGA-based Graph Computing Accelerator System [D]. Hefei: University of Science and Technology of China, 2018. (In Chinese)

Q Q GAO, Y ZHAO, G LI, et al. A Knowledge Distillation Based Super-Resolution Convolutional Neural Network Compression Technique [J]. Computer Applications, 2019(10). (In Chinese)

S HAN, J POOL, et al. Learning Both Weights and Connections for Efficient Neural Networks [C]// 28th International Conference on Neural Information Processing Systems. Montreal: NIPS, 2015: 1137-1141.

A G HOWARD, M ZHU, B CHEN, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [J/OL]. [2021-05-10]. https://arxiv.org/abs/1704.04861.

F CHOLLET. Xception: Deep Learning with Depthwise Separable Convolutions [C]// IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 1251-1258.

L YU, R W LI, Y WANG, et al. Overview: Open-Source Processors for SoC-FPGAs [J]. Journal of Electronics, 2018, 46(4): 992-1004. (In Chinese)

Z CHEN, Z FANG, P ZHOU, et al. Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks [C]// International Conference on Computer-Aided Design. Austin: IEEE, 2016: 1-8.

Y SHEN, M FERDMAN, P MILDER. Maximizing CNN Accelerator Efficiency through Resource Partitioning [C]// 44th Annual International Symposium on Computer Architecture. Toronto: IEEE, 2017: 535-547.

H LI, X FAN, J LI, et al. A High-Performance FPGA-based Accelerator for Large-scale Convolutional Neural Networks [C]// 26th International Conference on Field Programmable Logic and Applications. Lausanne: IEEE, 2016: 1-9.

S WOO, J PARK, J Y LEE, et al. CBAM: Convolutional Block Attention Module [C]// European Conference on Computer Vision. Cham: Springer, 2018.

L ZHUANG, J LI, Z SHEN, et al. Learning Efficient Convolutional Networks through Network Slimming [C]// IEEE International Conference on Computer Vision (ICCV). IEEE, 2017.

Published

01-04-2023

How to Cite

Shi, H. (2023). FPGA Hardware Acceleration Design for Deep Learning. Highlights in Science, Engineering and Technology, 39, 299-304. https://doi.org/10.54097/hset.v39i.6543