FPGA Hardware Acceleration Research and Implementation of Deep Learning Algorithms

Yuxuan Hu

doi:10.54097/fcis.v5i3.14040

Keywords:

YOLOv3-tiny, Hardware Acceleration, FPGA, Convolutional Neural Network

Abstract

Convolutional neural networks are a core class of deep learning models, and YOLOv3-tiny, which is built on them, offers strong object detection capability. However, the model still demands substantial computing power, which makes it difficult to deploy in embedded applications. This paper proposes a hardware acceleration method for YOLOv3-tiny and implements it on an FPGA platform. First, the network is quantized to fixed point, with the fixed-point strategy chosen using data accuracy as the criterion. Second, the computation is parallelized and pipelined, and a FIFO structure is introduced to shorten the running time. Finally, experiments are carried out on the Xilinx PYNQ-Z2 platform, and the results are compared with previous related work.
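The fixed-point step can be illustrated with a minimal sketch. The abstract does not give the paper's actual word length or selection procedure, so everything here is an assumption for illustration: a signed word of `total_bits` bits, round-to-nearest with saturation, and worst-case absolute error as the accuracy criterion. The function names `quantize`, `max_abs_error`, and `best_frac_bits` are hypothetical.

```python
def quantize(values, total_bits, frac_bits):
    """Map floats to signed fixed point (assumed format) and back to floats."""
    scale = 1 << frac_bits                 # 2**frac_bits steps per unit
    lo = -(1 << (total_bits - 1))          # most negative integer code
    hi = (1 << (total_bits - 1)) - 1       # most positive integer code
    out = []
    for v in values:
        q = int(round(v * scale))          # round to the nearest code
        q = max(lo, min(hi, q))            # saturate instead of wrapping
        out.append(q / scale)              # dequantize for comparison
    return out


def max_abs_error(values, total_bits, frac_bits):
    """Worst-case absolute error introduced by quantization."""
    approx = quantize(values, total_bits, frac_bits)
    return max(abs(a - b) for a, b in zip(values, approx))


def best_frac_bits(values, total_bits):
    """Pick the fractional bit count with the smallest worst-case error.

    This selection rule is an assumption, not the paper's stated strategy.
    """
    return min(range(total_bits),
               key=lambda f: max_abs_error(values, total_bits, f))


if __name__ == "__main__":
    weights = [0.5, -1.25, 3.0]            # made-up sample values
    f = best_frac_bits(weights, 8)
    print(f, max_abs_error(weights, 8, f))
```

Too few fractional bits lose resolution, while too many shrink the representable range and cause saturation, so a search of this kind trades one against the other for a given layer's value distribution.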
