A Survey of Deep Learning-based Facial Expression Recognition Research

Authors

  • Chengxu Liang
  • Jianshe Dong

DOI:

https://doi.org/10.54097/fcis.v5i2.12445

Keywords:

Expression Recognition, Deep Learning, Multimodality

Abstract

Facial expression is one of the principal channels through which emotion is conveyed. Deep learning can be applied to analyze facial expressions and infer people's true feelings, and this capability is increasingly integrated into human-computer interaction. In natural, real-world environments, however, facial expression recognition faces many challenges from interference such as illumination, age, and ethnicity. In recent years, with the development of artificial intelligence, scholars have devoted growing attention to facial expression recognition under such interference, which has advanced not only the underlying theory but also its practical applications. Facial expression recognition performs emotion analysis by identifying facial expressions; since emotion analysis can draw on facial expressions, speech, text, video, and other signals, facial expression recognition can be regarded as one research direction within emotion analysis, and this paper surveys the field from that perspective. In practice, researchers often combine multiple modalities, such as voice, text, images, and video, for analysis. Given the differences between single-modal and multi-modal datasets, this paper analyzes static facial expression recognition, dynamic facial expression recognition, and multi-modal fusion. This research has a wide range of applications, including smart elderly care, medical research, and fatigue-driving detection.

References

Darwin C. The Expression of the Emotions in Man and Animals[M]. University of Chicago Press, 2015.

Ekman P, Friesen W V. Facial Action Coding System: A technique for the measurement of facial movement[M]. Palo Alto: Consulting Psychologists Press, 1978.

Han W, Chen H, Gelbukh A, et al. Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis[C]//Proc. of the 2021 International Conference on Multimodal Interaction. 2021: 6-15.

Cai L, Dong J, Wei M. Multi-modal emotion recognition from speech and facial expression based on deep learning[C]//Proc. of the Chinese Automation Congress (CAC). 2020: 5726-5729.

Li R, Zhao J, Hu J, et al. Multi-modal fusion for video sentiment analysis[C]//Proc. of the 1st International Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop. 2020: 27-34.

Mittal T, Bhattacharya U, Chandra R, et al. M3ER: Multiplicative multimodal emotion recognition using facial, textual, and speech cues[C]//Proc. of the AAAI Conference on Artificial Intelligence. 2020: 1359-1367.

Goodfellow I J, Erhan D, Carrier P L, et al. Challenges in representation learning: A report on three machine learning contests[J]. Neural Networks, 2015, 64: 59-63.

Lucey P, Cohn J F, Kanade T, et al. The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression[C]//Proc. of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. San Francisco: IEEE, 2010: 94-101.

Lyons M, Akamatsu S, Kamachi M, et al. Coding facial expressions with Gabor wavelets[C]//Proc. of the Third IEEE International Conference on Automatic Face and Gesture Recognition. Nara, Japan: IEEE, 1998: 200-205.

Levi G, Hassner T. Emotion recognition in the wild via convolutional neural networks and mapped binary patterns[C]//Proc. of the 2015 ACM International Conference on Multimodal Interaction. New York: ACM, 2015: 503-510.

Gross R, Matthews I, Cohn J, et al. Multi-PIE[J]. Image and Vision Computing, 2010, 28(5): 807-813.

Yin L, Wei X, Sun Y, et al. A 3D facial expression database for facial behavior research[C]//Proc. of the 7th International Conference on Automatic Face and Gesture Recognition (FGR 2006). Southampton, UK: IEEE, 2006: 211-216.

Zhao G, Huang X, Taini M, et al. Facial expression recognition from near-infrared videos[J]. Image and Vision Computing, 2011, 29(9): 607-619.

Langner O, Dotsch R, Bijlstra G, et al. Presentation and validation of the Radboud Faces Database[J]. Cognition and Emotion, 2010, 24(8): 1377-1388.

Lundqvist D, Flykt A, Öhman A. The Karolinska Directed Emotional Faces (KDEF)[M/CD]. Stockholm: Department of Clinical Neuroscience, Psychology Section, Karolinska Institutet, 1998.

Li S, Deng W, Du J. Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 2584-2593.

Li S, Deng W. Reliable crowdsourcing and deep locality-preserving learning for unconstrained facial expression recognition[J]. IEEE Transactions on Image Processing, 2019, 28(1): 356-370.

Wang Y J, Guan L, Venetsanopoulos A N. Kernel cross-modal factor analysis for information fusion with application to bimodal emotion recognition[J]. IEEE Transactions on Multimedia, 2012, 14(3): 597-607. DOI: 10.1109/TMM.2012.2189550.

Zhalehpour S, Onder O, Akhtar Z, et al. BAUM-1: A spontaneous audio-visual face database of affective and mental states[J]. IEEE Transactions on Affective Computing, 2017, 8(3): 300-313. DOI: 10.1109/TAFFC.2016.2553038.

Jiang X, Zong Y, Zheng W, et al. DFEW: A large-scale database for recognizing dynamic facial expressions in the wild[C]//Proc. of the 28th ACM International Conference on Multimedia. 2020: 2881-2889.

Chen G, Zhang S Q, Zhao X M. Video sequence-based human facial expression recognition using Transformer networks[J]. Journal of Image and Graphics, 2022, 27(10): 3022-3030. DOI: 10.11834/jig.210248.

Gan Y, Chen J, Yang Z, et al. Multiple attention network for facial expression recognition[J]. IEEE Access, 2020, 8: 7383-7393.

Guo J Y, Dong Y S, Liu X W, et al. Facial expression recognition improved by attention mechanism and involution operator[J/OL]. Computer Engineering and Applications.

Chen G G, Zhang F, Wang H, et al. Facial expression recognition with region-enhanced attention networks[J/OL]. Journal of Computer-Aided Design & Computer Graphics.

Guo X G, Cheng C, Shen Z Q. Facial expression recognition based on a convolutional network attention mechanism[J/OL]. Journal of Jilin University (Engineering and Technology Edition). https://doi.org/10.13229/j.cnki.Jdxbgxb20221345.

Liu C G, Wang S M, Liu Q S. Category balance modulation for facial expression recognition[J/OL]. Journal of Frontiers of Computer Science and Technology. https://kns.cnki.net/kcms/detail//11.5602.TP.20230203.1654.004.html.

Zhang J, Zheng Y, Qi D. Deep spatio-temporal residual networks for citywide crowd flows prediction[C]//Proc. of the Thirty-First AAAI Conference on Artificial Intelligence. San Francisco: AAAI, 2017: 1655-1661.

Gholami A, Kwon K, Wu B, et al. SqueezeNext: Hardware-aware neural network design[EB/OL]. arXiv:1803.10615, 2018.

Howard A, Sandler M, Chu G, et al. Searching for MobileNetV3[EB/OL]. arXiv:1905.02244, 2019.

Hara K, Kataoka H, Satoh Y. Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018: 6546-6555.

He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016: 770-778.

Koelstra S, Muhl C, Soleymani M, et al. DEAP: A database for emotion analysis using physiological signals[J]. IEEE Transactions on Affective Computing, 2011, 3(1): 18-31.

Li Y, Tao J H, Chao L L, et al. CHEAVD: A Chinese natural emotional audio-visual database[J]. Journal of Ambient Intelligence and Humanized Computing, 2017, 8(6): 913-924.

Busso C, Bulut M, Lee C, et al. IEMOCAP: Interactive emotional dyadic motion capture database[J]. Language Resources and Evaluation, 2008, 42(4): 335-359.

Wu M, Su W J, Chen L F, et al. Two-stage fuzzy fusion based convolution neural network for dynamic emotion recognition[J]. IEEE Transactions on Affective Computing, 2020.

Ellis J G, Jou B, Chang S F. Why we watch the news: A dataset for exploring sentiment in broadcast video news[C]//Proc. of the 16th International Conference on Multimodal Interaction. New York: ACM, 2014: 104-111.

Dhall A, Goecke R, Lucey S, et al. Collecting large, richly annotated facial-expression databases from movies[J]. IEEE MultiMedia, 2012, 19(3): 34-41.

Wei F G, Zhang S D, Fu X H. Audio-visual bimodal emotion recognition based on emotional tone[J]. Computer Applications and Software, 2018, 35(8): 238-242.

Song G J, Zhang S D, Wei F G. Research on an audio-visual dual-modal emotion recognition fusion framework[J]. Computer Engineering and Applications, 2020, 56(6): 140-146.

Zhang L. Multimodal emotion recognition based on face and speech and its application in reasoning about robot service tasks[D]. Shandong University, 2021.

Shen J. Bimodal emotion recognition system based on EEG and facial expression[D]. Nanjing: Nanjing University of Posts and Telecommunications, 2020.

Zhao Y F, Chen D Y. Expression-EEG multimodal emotion recognition method based on bidirectional LSTM and attention mechanism[J]. Computational and Mathematical Methods in Medicine, 2021, 2021: 9967592.

Li X W. Human sentiment analysis with multimodal information fusion[D]. Guangzhou: Guangdong University of Technology, 2022.

Published

01-09-2023

Issue

Vol. 5 No. 2 (2023)

Section

Articles

How to Cite

Liang, C., & Dong, J. (2023). A Survey of Deep Learning-based Facial Expression Recognition Research. Frontiers in Computing and Intelligent Systems, 5(2), 56-60. https://doi.org/10.54097/fcis.v5i2.12445