An ECAPA-TDNN Based Network for Hand Gesture Recognition on Skeletal Data
DOI:
https://doi.org/10.54097/hset.v68i.12502
Keywords:
ECAPA-TDNN; hand gesture recognition; human-computer interaction.
Abstract
Because sign languages vary widely, a model that can reliably recognize hand gestures is essential. State-of-the-art models are mainly driven by convolutional neural networks (CNNs), and current research focuses on optimizing CNN architectures. However, these CNNs are large and take a long time to train. To address these challenges, we developed a more accurate and robust ECAPA-TDNN structure for recognition. The ECAPA-TDNN is built from multiple one-dimensional blocks, each combining one-dimensional convolution, an activation layer, and batch normalization. On the challenging SHREC 2017 3D Shape Retrieval Contest dataset, the ECAPA-TDNN achieved an accuracy of 92.9%, which is 2% higher than the state-of-the-art accuracy achieved by CNNs.
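The building block named in the abstract (one-dimensional convolution, an activation layer, and batch normalization) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the ReLU activation, the "same" zero padding, the layer order, and the per-sequence normalization without learned scale and shift are all assumptions made for illustration.

```python
import math


def conv1d(x, weight, bias, dilation=1):
    """1D convolution over time with 'same' zero padding (odd kernel size).

    x:      [C_in][T]         e.g. joint coordinates per frame (assumed layout)
    weight: [C_out][C_in][K]  hypothetical kernel values
    bias:   [C_out]
    """
    c_in, t_len = len(x), len(x[0])
    k = len(weight[0][0])
    pad = dilation * (k - 1) // 2
    out = []
    for o, w_o in enumerate(weight):
        row = []
        for t in range(t_len):
            acc = bias[o]
            for c in range(c_in):
                for j in range(k):
                    src = t + j * dilation - pad
                    if 0 <= src < t_len:  # positions outside the sequence count as zero
                        acc += w_o[c][j] * x[c][src]
            row.append(acc)
        out.append(row)
    return out


def relu(x):
    """Element-wise ReLU activation (the choice of ReLU is an assumption)."""
    return [[max(0.0, v) for v in row] for row in x]


def batch_norm(x, eps=1e-5):
    """Normalize each channel over time; no learned scale/shift in this sketch."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def tdnn_block(x, weight, bias, dilation=1):
    """Convolution, then activation, then normalization (assumed order)."""
    return batch_norm(relu(conv1d(x, weight, bias, dilation)))


if __name__ == "__main__":
    # Toy input: one channel (a single joint coordinate) over five frames.
    frames = [[1.0, 2.0, 3.0, 4.0, 5.0]]
    kernel = [[[1.0, 1.0, 1.0]]]  # one output channel, kernel size 3
    print(tdnn_block(frames, kernel, [0.0]))
```

Raising `dilation` widens the temporal context a block sees without adding weights, which is the usual reason for stacking dilated one-dimensional blocks in TDNN-style models.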
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







