Research Progress on Technologies and Applications of Human-Computer Interaction Gestures Based on Vision and Sensors
DOI: https://doi.org/10.54097/0vg95866
Keywords: Human-Computer Interaction; Gesture Interaction; Vision-based Gesture Recognition
Abstract
Against the backdrop of rapid advances in Artificial Intelligence (AI) and the Internet of Things (IoT), traditional human-computer interaction (HCI) modalities such as keyboards and mice show their limitations, drawing growing attention to intuitive gesture interaction, a core component of natural user interfaces (NUIs). Gesture recognition technology follows two main routes. Vision-based systems rely on cameras, with deep learning (e.g., YOLOv5 for hand detection and CNNs for feature extraction) enabling keypoint and spatio-temporal feature analysis as well as real-time recognition via stereo image fusion. Sensor-based systems comprise wearable devices (electromyography (EMG) sensors, smart gloves) and non-wearable devices (Leap Motion, Kinect, WDHS), with multi-sensor fusion enhancing robustness; bioimpedance measurement is an emerging non-wearable approach. These technologies are applied in intelligent HCI, disability assistance, VR/AR, and healthcare, but still face challenges such as limited environmental robustness and high computational demands, while playing a key role in advancing HCI. Looking forward, gesture recognition is expected to evolve toward more natural, adaptive, and multimodal interaction by integrating speech, eye tracking, and emotion recognition for context-aware communication.
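To make the vision-based route concrete, the sketch below shows how a detector and a CNN classifier might be chained in practice: YOLOv5 localizes the hand in a camera frame, and a small CNN assigns a gesture label to the cropped region. It is an illustrative sketch rather than an implementation from the surveyed work; the hand-detection weights file "hand_yolov5s.pt" and the GestureCNN architecture are hypothetical placeholders that a real system would train, while the OpenCV capture and torch.hub calls use standard APIs.

# Minimal sketch of a vision-based gesture recognition pipeline (illustrative only).
# "hand_yolov5s.pt" and GestureCNN are hypothetical placeholders, not artifacts of the paper.
import cv2
import torch
import torch.nn as nn

class GestureCNN(nn.Module):
    """Toy classifier mapping a 64x64 RGB hand crop to one of n_classes gestures."""
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 16 * 16, n_classes)
    def forward(self, x):
        return self.head(self.features(x).flatten(1))

detector = torch.hub.load("ultralytics/yolov5", "custom", path="hand_yolov5s.pt")  # hand-detection weights (placeholder)
classifier = GestureCNN().eval()  # would be loaded from trained weights in practice

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()
if ok:
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        boxes = detector(rgb).xyxy[0]  # detections as (x1, y1, x2, y2, confidence, class)
        for x1, y1, x2, y2, conf, _cls in boxes.tolist():
            crop = cv2.resize(rgb[int(y1):int(y2), int(x1):int(x2)], (64, 64))
            tensor = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
            gesture_id = classifier(tensor).argmax(dim=1).item()  # predicted gesture index
            print(f"hand at ({x1:.0f}, {y1:.0f}) -> gesture class {gesture_id} (det conf {conf:.2f})")

For dynamic gestures, the same structure would apply with the per-frame CNN replaced by a spatio-temporal model (e.g., a 3D CNN or a recurrent network over a sequence of hand crops).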