FPGA-Based Deep Learning Acceleration: Analysis of Dataflow and Tiling Strategies
DOI: https://doi.org/10.54097/41gx1442

Keywords: Performance Optimization; FPGA Accelerators; Tiling Strategies; CNNs

Abstract
With the rapid development of deep learning models, especially Convolutional Neural Networks (CNNs) and Large Language Models (LLMs), Central Processing Units (CPUs) and Graphics Processing Units (GPUs) struggle to meet the high-performance, low-power requirements of deep learning acceleration. As reconfigurable hardware, Field-Programmable Gate Arrays (FPGAs) have shown great potential for deep learning acceleration thanks to their efficient parallel computing and flexibility. This paper examines design and optimization strategies for FPGA-based Deep Learning (DL) accelerators, focusing on dataflow and tiling strategies. It reviews FPGA accelerators applied to deep learning architectures such as CNNs and Transformers, and analyzes how different dataflow strategies affect performance. The paper also discusses persistent bottlenecks, including memory bandwidth, computational resource allocation, and toolchain maturity, along with future research directions such as dynamic dataflow, partial reconfiguration, and integration with heterogeneous computing platforms. Finally, it offers an outlook on the prospects of FPGAs in deep learning acceleration and suggestions for further optimization.
Copyright (c) 2026 Academic Journal of Science and Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.