An Image-Text Sentiment Analysis Method for Small Samples Based on Image Captioning

Authors

  • Shuailin Chen

DOI:

https://doi.org/10.54097/xsrhp326

Keywords:

Image captioning, BERT, pre-trained language model, sentiment analysis.

Abstract

With the widespread use of personal devices, people increasingly share their lives on social media, which provides a rich source of data for sentiment analysis. However, sentiment analysis with small samples remains challenging. This paper proposes a small-sample sentiment analysis method based on image captioning and BERT. Specifically, the model uses a pre-trained language model as the image-caption decoder and applies a cross-modal attention mechanism to suppress the effect of misaligned image regions, which strengthens the interaction from image to text. The generated captions are then combined with the original text in the dataset, and a BERT model extracts word vectors and outputs the sentiment prediction. The COCO dataset is used to train the image-captioning model, and the MVSA dataset is used to train and evaluate sentiment analysis. Small-sample splits are created by randomly selecting samples from the dataset. Model performance is evaluated by comparing accuracy and F1 score against baseline models. The results show that the Image Captioning-BERT model achieves some improvement in sentiment analysis of image-text pairs with small samples.
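The abstract describes pairing each generated caption with the original post text, drawing small-sample subsets at random, and scoring with accuracy and F1. The Python sketch below illustrates that evaluation protocol under stated assumptions: per-class (stratified) random sampling, a BERT-style sentence-pair layout for text and caption, and macro-averaged F1. The abstract does not confirm any of these choices, and the helper names (`small_sample_split`, `pair_text_and_caption`, `macro_f1`) are hypothetical. The captioning model and the BERT classifier themselves are not shown.

```python
import random
from collections import defaultdict


def small_sample_split(examples, k_per_class, seed=0):
    """Randomly draw up to k examples per sentiment label.

    Assumption: a stratified scheme; the paper only states that samples
    are selected at random.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex["label"]].append(ex)
    subset = []
    for label in sorted(by_label):
        pool = by_label[label]
        subset.extend(rng.sample(pool, min(k_per_class, len(pool))))
    rng.shuffle(subset)  # avoid label-ordered batches
    return subset


def pair_text_and_caption(text, caption):
    """Lay out post text and generated caption as a BERT sentence pair.

    Assumption: text first, caption second. A real BERT tokenizer inserts
    these special tokens itself when given two segments.
    """
    return f"[CLS] {text} [SEP] {caption} [SEP]"


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from MVSA.
    data = [
        {"text": f"post {i}", "caption": f"caption {i}", "label": lab}
        for i, lab in enumerate(["positive", "negative", "neutral"] * 5)
    ]
    train = small_sample_split(data, k_per_class=2, seed=42)
    for ex in train:
        print(pair_text_and_caption(ex["text"], ex["caption"]), ex["label"])
    gold = ["positive", "negative", "neutral", "positive"]
    pred = ["positive", "negative", "positive", "positive"]
    print(accuracy(gold, pred), macro_f1(gold, pred))
```

Fixing the seed lets every baseline be trained and tested on the same small-sample subset, so differences in accuracy and F1 reflect the models rather than the draw. If the original evaluation used weighted F1 instead, the per-class scores would be averaged by class frequency rather than equally.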




Published

11-12-2024

How to Cite

Chen, S. (2024). An Image-Text Sentiment Analysis Method for Small Samples Based on Image Captioning. Highlights in Science, Engineering and Technology, 119, 305-311. https://doi.org/10.54097/xsrhp326