Emotion Recognition in Lhasa Tibetan Speech based on Bi-LSTM Graph Convolutional Networks
DOI: https://doi.org/10.54097/y9p1va72
Keywords:
Tibetan Speech Emotion Recognition, Mel-frequency Cepstral Coefficients (MFCC), Bidirectional Long Short-Term Memory (Bi-LSTM), Graph Convolution Network (GCN)
Abstract
Speech Emotion Recognition (SER) is a crucial component of Human-Computer Interaction (HCI), with significant implications for both research and practical applications. However, research on Tibetan speech recognition remains limited. This is due to the complexity of the Tibetan language and to a scarcity of datasets, which stems from the difficulty of collecting speech across its various dialects. Building on a newly constructed dataset, TBLS1, containing 6,000 Tibetan-language speech samples, we devise an approach to Tibetan speech emotion recognition. The approach uses MFCC features and incorporates a Bidirectional Long Short-Term Memory (Bi-LSTM) network within a graph convolutional neural network. Finally, by comparing the performance of different models on this dataset, we demonstrate the feasibility of our model for Tibetan speech emotion recognition.
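The abstract does not specify how the Bi-LSTM and the graph convolution are wired together. Below is a minimal NumPy sketch of one plausible arrangement. It is not the authors' implementation, and it rests on the following assumptions:

- Each MFCC frame is treated as a graph node, and frames are linked in a cycle.
- A Bi-LSTM produces the node features.
- A single GCN layer in the Kipf and Welling style (symmetric-normalised adjacency with self-loops) mixes neighbouring frames.
- Mean pooling followed by a softmax layer yields emotion-class probabilities.

The 13 coefficients, the hidden sizes, the six emotion classes, and every helper name (`lstm_pass`, `bilstm`, `cycle_adjacency`, `gcn_layer`, `forward`) are hypothetical. The weights are random and untrained, so the output only illustrates the data flow and tensor shapes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(x, W, U, b):
    """One-direction LSTM over x of shape (T, d_in); returns hidden states (T, h)."""
    h_dim = U.shape[0]
    h, c = np.zeros(h_dim), np.zeros(h_dim)
    out = np.zeros((x.shape[0], h_dim))
    for t in range(x.shape[0]):
        z = x[t] @ W + h @ U + b                 # all four gate pre-activations
        i = sigmoid(z[:h_dim])                   # input gate
        f = sigmoid(z[h_dim:2 * h_dim])          # forget gate
        o = sigmoid(z[2 * h_dim:3 * h_dim])      # output gate
        g = np.tanh(z[3 * h_dim:])               # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
        out[t] = h
    return out

def bilstm(x, fwd, bwd):
    """Concatenate forward and time-reversed backward LSTM outputs -> (T, 2h)."""
    h_f = lstm_pass(x, *fwd)
    h_b = lstm_pass(x[::-1], *bwd)[::-1]         # re-align backward states to time order
    return np.concatenate([h_f, h_b], axis=1)

def cycle_adjacency(T):
    """Hypothetical frame graph: each frame linked to its neighbours, last frame to first."""
    A = np.zeros((T, T))
    idx = np.arange(T)
    A[idx, (idx + 1) % T] = 1.0
    A[(idx + 1) % T, idx] = 1.0
    return A

def gcn_layer(H, A, W):
    """GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def init_lstm(rng, d_in, h):
    return (rng.normal(0, 0.1, (d_in, 4 * h)), rng.normal(0, 0.1, (h, 4 * h)), np.zeros(4 * h))

def forward(mfcc, params):
    H = bilstm(mfcc, params["fwd"], params["bwd"])          # (T, 2h) node features
    H = gcn_layer(H, cycle_adjacency(mfcc.shape[0]), params["W_gcn"])
    pooled = H.mean(axis=0)                                 # graph-level readout
    return softmax(pooled @ params["W_out"])                # class probabilities

rng = np.random.default_rng(0)
n_mfcc, hidden, gcn_dim, n_classes = 13, 16, 32, 6          # assumed sizes
params = {
    "fwd": init_lstm(rng, n_mfcc, hidden),
    "bwd": init_lstm(rng, n_mfcc, hidden),
    "W_gcn": rng.normal(0, 0.1, (2 * hidden, gcn_dim)),
    "W_out": rng.normal(0, 0.1, (gcn_dim, n_classes)),
}
mfcc = rng.normal(size=(50, n_mfcc))   # stand-in for 50 frames of 13 MFCCs
probs = forward(mfcc, params)
```

In a real system the weights would be learned by backpropagation in a framework such as PyTorch. The loop-based LSTM here trades speed for readability.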
Copyright (c) 2024 Frontiers in Computing and Intelligent Systems
This work is licensed under a Creative Commons Attribution 4.0 International License.