Improved Multi-attention Neural Networks for Image Emotion Regression and the Initial Introduction of CAPS


  • Rending Wang
  • Dongmei Ma



Affective Image Content Analysis, CAPS, SimAM


Image sentiment analysis covers a broad class of tasks that classify or regress the emotions evoked by images containing emotional stimuli. Psychological research holds that different groups respond to the same stimulus with different emotions, so studying the influence of cultural background on image sentiment analysis requires a dataset of emotional image stimuli that represents a specific cultural group. In this paper, we introduce the Chinese Affective Picture System (CAPS), which represents Chinese culture, and revise and test this dataset. PDANet performs best among current image emotion regression models, but the attention module it uses has difficulty extracting cross-channel information and capturing long-distance correlations within an image, among other shortcomings. We therefore propose a multi-attention network for image emotion regression that introduces the SimAM attention mechanism and improves the loss function to make it more consistent with psychological theory, and we propose a 10-fold cross-validation protocol for CAPS. The network achieves MSE=0.0188 and R²=0.359 on IAPS, and MSE=0.0169 and R²=0.463 on NAPS, both better than PDANet. The best training result on CAPS is MSE=0.0083 and R²=0.625. A paired-sample t-test of the results shows that all three dimensions are significantly positively correlated, with correlation coefficients r=0.942, 0.895 and 0.943, respectively, indicating good internal consistency and excellent application prospects for CAPS.
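The abstract reports MSE and R² under 10-fold cross-validation. As a minimal sketch of how such figures are commonly computed for a multi-dimensional emotion regression target, the NumPy helpers below may be useful. The function names, the uniform averaging of R² over output dimensions, and the shuffled fold assignment are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error over all samples and output dimensions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r2(y_true, y_pred):
    """Coefficient of determination per dimension, then averaged.

    Assumption: dimensions are weighted uniformly; the paper may aggregate
    differently. Inputs are arrays of shape (n_samples, n_dims).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]
```

In use, a model would be trained on each `train` index set and scored with `mse` and `r2` on the matching held-out fold, with the ten scores then summarized; how the paper summarizes them is not stated in the abstract.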











How to Cite

Wang, R., & Ma, D. (2024). Improved Multi-attention Neural Networks for Image Emotion Regression and the Initial Introduction of CAPS. Frontiers in Computing and Intelligent Systems, 8(1), 22-34.