Evaluation of FPGA Based Speech Recognition Hardware for Edge Devices

Qiyue Tu

doi:10.54097/yxz93945

Keywords:

Speech Recognition; Natural Language Processing; FPGA.

Abstract

Current wearable devices are constrained by power consumption and size, so voice recognition mostly relies on processing on a cell phone. Local deployment offers the potential advantages of improved privacy and security and reduced latency. This paper synthesizes several studies in this field to analyze the low-power and low-latency advantages of Field Programmable Gate Array (FPGA) hardware over traditional Central Processing Unit (CPU) and CPU plus Graphics Processing Unit (GPU) solutions for speech recognition tasks. It also demonstrates the significant potential of FPGAs, in terms of energy efficiency and real-time performance, when combined with Spiking Neural Networks (SNNs) and their hardware optimization strategies. Experimental data from the surveyed studies show that a combined FPGA and SNN scheme can reach 11.5× and 40× the energy efficiency of CPU and GPU implementations, which provides a feasible path toward local speech processing in future wearable devices. Potential future optimization directions, such as further optimizing the on-chip memory layout and replacing the CPU controller, are also analyzed in light of the current state of the technology.
