Multimodal Emotion Analysis Model based on Interactive Attention Mechanism
DOI:
https://doi.org/10.54097/fcis.v3i2.7512
Keywords:
Multimodal sentiment analysis, Attention mechanism, Multi-task joint training
Abstract
In traditional multimodal sentiment analysis, features are usually fused by simple concatenation, and the multimodal model is trained as a single task. This ignores both the contribution of inter-modal information interaction to sentiment analysis and the correlations and constraints between the multimodal task and the single-modal (text, video and audio) tasks. This paper therefore proposes a multi-task model based on an interactive attention mechanism. It combines an inter-modal attention mechanism with a single-modal self-attention mechanism and trains multimodal and single-modal sentiment analysis jointly, so that information shared across modalities and tasks is fully used, the modalities and tasks complement one another, and noise is reduced, which improves overall recognition performance. Experiments show that the proposed model performs well on MOSI and MOSEI, two commonly used datasets for multimodal sentiment analysis.
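The abstract names the components but gives no formulas, so the following is only a minimal sketch of how they might fit together. It assumes standard scaled dot-product attention without learned projections, mean pooling over time, and a weighted sum of per-task losses. The function names, the toy feature values and the loss weights are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention (assumed form, no learned projections).

    Each query row is replaced by a weighted average of the value rows,
    with weights given by its similarity to the key rows.
    """
    scale = math.sqrt(len(keys[0]))
    width = len(values[0])
    output = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        output.append([sum(w * v[j] for w, v in zip(weights, values))
                       for j in range(width)])
    return output


def mean_pool(sequence):
    """Average a sequence of feature vectors over the time axis."""
    return [sum(column) / len(sequence) for column in zip(*sequence)]


def joint_loss(losses, weights):
    """Weighted sum of the multimodal loss and the single-modal losses."""
    return sum(weights[name] * losses[name] for name in losses)


# Toy features (hypothetical): 3 text steps and 2 audio steps, 2 dims each.
text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio = [[0.5, 0.5], [1.0, 0.0]]

text_self = attend(text, text, text)        # single-modal self-attention
text_from_audio = attend(text, audio, audio)  # inter-modal attention

# Fuse by concatenating the pooled single-modal and inter-modal views.
fused = mean_pool(text_self) + mean_pool(text_from_audio)

# Hypothetical per-task losses and weights for joint training.
total = joint_loss({"multimodal": 1.0, "text": 0.5},
                   {"multimodal": 1.0, "text": 0.2})
```

In this sketch the same `attend` function serves both roles: passing one modality as query, key and value gives self-attention, while taking the query from one modality and the keys and values from another gives inter-modal attention. A full model would add learned projection matrices, the video modality, and a prediction head per task.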