Research and Implementation of a Human-Computer Interaction AI Intelligent Robot Based on Speech

Authors

  • Weihang Mu

DOI:

https://doi.org/10.54097/mym13c78

Keywords:

Human-computer Interaction, Speech Recognition, Natural Language Understanding, Dialogue Management, Robot Control

Abstract

With advances in deep learning, language model training, and edge computing, speech-based human-robot interaction (HRI) systems are finding wider application in manufacturing robots and medical escort service robots. This paper proposes and implements a modular speech-driven HRI system that integrates automatic speech recognition (ASR), natural language understanding (NLU), dialogue management, and action execution modules. To evaluate the engineering trade-offs among technology options, three sets of comparative trials were designed and carried out: ASR (cloud-based commercial services vs. local models), NLU (traditional statistical methods vs. Transformer fine-tuning), and dialogue management (rule-driven vs. reinforcement learning). Simulation experiments were conducted in the ROS/Gazebo environment, and subjective user tests were also performed. Evaluation metrics included word error rate (WER), accuracy, F1-score, response latency, task completion rate, and user satisfaction. The results show that Transformer-based NLU performs significantly better than the traditional methods in semantic parsing. Cloud-based ASR has a clear advantage in recognition quality, but the local model is better suited to real-time control scenarios in terms of latency and privacy. For dialogue management, a hybrid strategy of "rule first + reinforcement learning enhancement" is recommended. Finally, the paper discusses issues of engineering deployment, model compression, edge-cloud collaboration, and ethical compliance.
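
For readers unfamiliar with the ASR metric named above, the following is a minimal sketch of how word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not describe the paper's own evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch only: tokenization is a plain whitespace split,
    which is an assumption and not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("move the arm left", "move arm left"))
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.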

Published

30-04-2026

Section

Articles

How to Cite

Mu, W. (2026). Research and Implementation of a Human-Computer Interaction AI Intelligent Robot Based on Speech. Frontiers in Computing and Intelligent Systems, 16(2), 53-56. https://doi.org/10.54097/mym13c78