Bimodal Emotion Recognition based on Sichuan Dialect

Authors

  • Jia Wei
  • Xiangguo Sun

DOI:

https://doi.org/10.54097/abq1kz93

Keywords:

Dynamic Attention, Channel Spatial Attention, Multimodal Fusion, CBAM

Abstract

To address the main technical challenges in dialect emotion recognition, namely data scarcity and low recognition rates, this paper studies emotion recognition for the Sichuan dialect. It constructs a high-quality dataset covering multiple emotion categories and proposes a bimodal emotion recognition model. The model fuses two modalities: speech features, in the form of Mel-frequency cepstral coefficients (MFCCs), and text features. Omni-dimensional dynamic convolution (ODConv) and the convolutional block attention module (CBAM) are used to extract features, and a Text-CNN model performs text sentiment analysis. Experimental results show that the proposed model achieves higher accuracy and robustness than a traditional CNN-based speech model on the Sichuan dialect emotion recognition task.
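As a rough illustration of the CBAM step named in the abstract, the sketch below applies channel attention followed by spatial attention to a single feature map, following the general CBAM formulation rather than the authors' implementation. The feature-map size (8 channels over a 13 x 20 MFCC-like grid), the reduction ratio `r`, the 7 x 7 kernel, and the random weights are all assumptions chosen for illustration; a trained model would learn `w1`, `w2`, and `kernel`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """feat: (C, H, W); w1: (C//r, C); w2: (C, C//r). Returns (C,) weights."""
    avg = feat.mean(axis=(1, 2))   # global average pooling per channel
    mx = feat.max(axis=(1, 2))     # global max pooling per channel

    def mlp(v):
        # Shared two-layer MLP with ReLU, applied to both pooled vectors.
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))

def spatial_attention(feat, kernel):
    """feat: (C, H, W); kernel: (2, k, k) with odd k. Returns (H, W) weights."""
    # Pool across channels, then stack the two maps as a 2-channel input.
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = feat.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # Plain sliding-window convolution (cross-correlation), no bias.
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)

def cbam(feat, w1, w2, kernel):
    """Refine feat with channel attention, then spatial attention."""
    feat = feat * channel_attention(feat, w1, w2)[:, None, None]
    return feat * spatial_attention(feat, kernel)[None, :, :]

# Illustrative sizes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
C, H, W, r, k = 8, 13, 20, 4, 7
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, k, k)) * 0.1

refined = cbam(feat, w1, w2, kernel)
print(refined.shape)
```

Because both attention maps pass through a sigmoid, every element of the input is scaled by a factor between 0 and 1, so the module reweights features without changing the shape of the feature map.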

Published

21-01-2025

Issue

Vol. 11 No. 1 (2025)

Section

Articles

How to Cite

Wei, J., & Sun, X. (2025). Bimodal Emotion Recognition based on Sichuan Dialect. Frontiers in Computing and Intelligent Systems, 11(1), 59-63. https://doi.org/10.54097/abq1kz93