Research on Robot Visual Perception and Object Recognition Based on Deep Learning
DOI: https://doi.org/10.54097/rhwged92

Keywords: Robot Visual Perception, Object Recognition, Deep Learning, Small and Medium-Sized Scenarios, YOLOv8-nano, Embedded Deployment

Abstract
Robots operating in small and medium-sized scenarios (such as light-industrial sorting and desktop manipulation) face visual challenges including occlusion, lighting fluctuations, and demanding positioning-accuracy requirements, while traditional methods and general-purpose deep learning models struggle to balance robustness and performance. This study proposes a solution that integrates dataset optimization, model improvement, and embedded deployment. A hybrid dataset (8 categories, >5,000 samples) was constructed from curated COCO data (standardized in style and size) and self-collected images (annotation accuracy ≥98%). YOLOv8-nano was enhanced with a squeeze-and-excitation (SE) attention module and combined with gamma correction and few-shot fine-tuning. The results show an average mAP >78% (≥72% under occlusion and lighting fluctuations, an 8%–10% improvement over the baseline) and a positioning error ≤6 mm. Deployment on a Raspberry Pi 4B with INT8 quantization achieved ≥22 FPS. The study is limited by the small number of categories and the absence of dynamic testing; future work will expand the dataset and add tracking capabilities.
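The abstract mentions gamma correction as a preprocessing step for handling lighting fluctuations; the paper's own implementation is not shown here, but a minimal sketch of standard lookup-table gamma correction (the gamma value of 1.5 is an illustrative assumption, not a parameter reported by the study) might look like:

```python
import numpy as np

def gamma_correct(image: np.ndarray, gamma: float = 1.5) -> np.ndarray:
    """Apply gamma correction to an 8-bit image via a 256-entry lookup table.

    gamma > 1 brightens dark regions (useful under dim or uneven lighting);
    gamma < 1 darkens an over-exposed image. The table is precomputed once,
    so the per-pixel cost is a single array lookup.
    """
    # Map each of the 256 intensity levels through the power-law curve.
    table = np.array(
        [((i / 255.0) ** (1.0 / gamma)) * 255.0 for i in range(256)]
    ).astype(np.uint8)
    return table[image]
```

Precomputing the lookup table keeps the operation cheap enough for frame-rate preprocessing on an embedded board such as the Raspberry Pi 4B.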
License
Copyright (c) 2026 Frontiers in Computing and Intelligent Systems

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

