Systematic Analysis of FPGA-Based Acceleration for Deep Neural Networks

Authors

  • Mingyang Xu

DOI:

https://doi.org/10.54097/yqf8gq04

Keywords:

FPGA, AI acceleration, deep neural networks, low latency

Abstract

This paper systematically reviews and evaluates the architectural characteristics, application status, and platform comparisons of Field-Programmable Gate Arrays (FPGAs) for deep neural network (DNN) acceleration. It first outlines the architectural foundations of FPGAs as reconfigurable hardware built on dataflow and parallelism. Drawing on representative designs and case studies, it then summarizes the FPGA's advantages in low latency, energy efficiency, and operator customization, as well as its remaining bottlenecks. Compared with Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs), which are better suited to large-scale training and high-throughput inference, FPGAs are cost-effective for deterministic low-latency, small-batch, and real-time workloads and for customizing specific protocols and operators, making them well suited to cloud-edge collaboration and industrial embedded applications. Looking ahead, High Bandwidth Memory (HBM) with hierarchical caching, near-memory and in-memory computing, tensor- and dataflow-based reconfigurable overlays, quantization- and sparsity-driven model-hardware co-optimization, automated compilation, heterogeneous Systems on Chip (SoCs) combining Central Processing Units (CPUs), Artificial Intelligence (AI) engines, and FPGA fabric, and chiplet and three-dimensional (3D) packaging will further lower the design barrier and improve system efficiency. This article aims to serve as a reference for selecting heterogeneous computing platforms and designing FPGA accelerators across application scenarios.
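To make the abstract's quantization and parallelism points concrete, the minimal C sketch below (an illustration, not code from the reviewed designs) computes a symmetric per-tensor INT8 dot product, the multiply-accumulate pattern that FPGA accelerators map onto parallel DSP slices; the vector length and scale factors are assumed values.

#include <stdint.h>
#include <stdio.h>

/* Quantize a float to INT8 with a per-tensor scale (symmetric quantization). */
static int8_t quantize(float x, float scale) {
    int32_t q = (int32_t)(x / scale + (x >= 0.0f ? 0.5f : -0.5f));
    if (q > 127)  q = 127;    /* clamp to the INT8 range */
    if (q < -128) q = -128;
    return (int8_t)q;
}

int main(void) {
    enum { N = 16 };                         /* vector length (illustrative) */
    const float w_scale = 0.05f, a_scale = 0.10f;
    int8_t w[N], a[N];
    int i;
    for (i = 0; i < N; ++i) {                /* toy weights and activations */
        w[i] = quantize(0.05f * (float)i, w_scale);
        a[i] = quantize(0.10f * (float)i, a_scale);
    }
    /* On an FPGA, a high-level synthesis tool would fully unroll this loop so
       every multiply-accumulate maps to its own DSP slice and runs in the same
       clock cycle; on a CPU it executes sequentially. */
    int32_t acc = 0;
    for (i = 0; i < N; ++i)
        acc += (int32_t)w[i] * (int32_t)a[i];
    /* Dequantize the INT32 accumulator back to a real-valued result. */
    printf("dot = %f\n", (double)acc * w_scale * a_scale);
    return 0;
}

Fully unrolling the accumulation loop in hardware is what yields the deterministic, cycle-accurate latency the abstract attributes to FPGAs, in contrast to the batch-oriented throughput model of GPUs and TPUs.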

Published

30-03-2026

Issue

Vol. 20 No. 2 (2026)

Section

Articles

How to Cite

Xu, M. (2026). Systematic Analysis of FPGA-Based Acceleration for Deep Neural Networks. Academic Journal of Science and Technology, 20(2), 703-709. https://doi.org/10.54097/yqf8gq04