Bimodal Emotion Recognition based on Sichuan Dialect
DOI: https://doi.org/10.54097/abq1kz93
Keywords: Dynamic Attention, Channel Spatial Attention, Multimodal Fusion, CBAM
Abstract
Dialect emotion recognition faces two main technical challenges: data scarcity and low recognition accuracy. To address them, this paper studies emotion recognition for the Sichuan dialect. It constructs a high-quality dataset covering multiple emotion categories and, on that basis, proposes a multimodal emotion recognition model. The model adopts a bimodal fusion strategy that combines Mel-frequency cepstral coefficients (MFCCs) with text features. It extracts features with omni-dimensional dynamic convolution (ODConv) and the convolutional block attention module (CBAM), and uses a Text-CNN model for text sentiment analysis. Experimental results show that, on the Sichuan dialect emotion recognition task, the proposed model achieves higher accuracy and robustness than a conventional CNN-based speech model.
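The abstract does not give the model's layer configuration. The following is a minimal NumPy sketch of a CBAM block in the form described by Woo et al. (2018): channel attention computed by a shared two-layer MLP over average-pooled and max-pooled channel descriptors, followed by spatial attention computed by a k×k convolution over the channel-wise average and max maps. The input is assumed to be a feature map derived from MFCCs with shape (channels, coefficients, frames). The channel count, reduction ratio, kernel size, random weights, and function names are illustrative assumptions, not the authors' settings. Biases are omitted, and the ODConv and Text-CNN branches are not shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(f, w1, w2):
    """Rescale each channel of f (C, H, W) by a weight in (0, 1).

    w1: (C // r, C) and w2: (C, C // r) form a shared MLP with a ReLU hidden
    layer (hypothetical weights; biases omitted for brevity).
    """
    avg = f.mean(axis=(1, 2))  # (C,) average-pooled descriptor
    mx = f.max(axis=(1, 2))    # (C,) max-pooled descriptor

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)

    m_c = sigmoid(mlp(avg) + mlp(mx))  # (C,) channel attention map
    return f * m_c[:, None, None]


def spatial_attention(f, kernel):
    """Rescale each (h, w) position of f (C, H, W) by a weight in (0, 1).

    kernel: (2, k, k) with k odd; zero "same" padding keeps H x W unchanged.
    """
    pooled = np.stack([f.mean(axis=0), f.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(pooled, ((0, 0), (p, p), (p, p)))  # zero padding
    H, W = f.shape[1:]
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            # cross-correlation, as in deep-learning "convolution" layers
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    m_s = sigmoid(out)  # (H, W) spatial attention map
    return f * m_s[None, :, :]


def cbam(f, w1, w2, kernel):
    """Channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(f, w1, w2), kernel)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, r = 8, 4                              # hypothetical channels / reduction ratio
    feat = rng.standard_normal((C, 13, 20))  # e.g. 13 MFCC coefficients x 20 frames
    w1 = rng.standard_normal((C // r, C)) * 0.1
    w2 = rng.standard_normal((C, C // r)) * 0.1
    kernel = rng.standard_normal((2, 7, 7)) * 0.1
    out = cbam(feat, w1, w2, kernel)
    print(out.shape)  # (8, 13, 20)
```

Both attention maps lie strictly between 0 and 1, so CBAM only reweights the input feature map and never changes its shape. This is what allows it to be inserted after any convolutional stage of the acoustic branch. The channel-then-spatial order follows the arrangement Woo et al. report as working best.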
License
Copyright (c) 2025 Frontiers in Computing and Intelligent Systems

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

