Facial Expression Recognition with Hybrid Features Leveraging DINO Prior Knowledge

Authors

  • Yuansha Xie
  • Cheng Ju
  • Yuxin Chang

DOI:

https://doi.org/10.54097/m7sc3z87

Keywords:

Facial Expression Recognition, Large-scale Visual Model DINOv2, Hybrid Feature Facial Expression Recognition Model

Abstract

Facial expression recognition plays a crucial role in smart education. Existing facial expression recognition methods often rely too heavily on a single type of prior image feature or fail to integrate multiple image features effectively, and they generalize poorly to images captured in natural environments. To address these problems, this study adopts the large-scale visual model DINOv2 as a pre-trained model with its weights frozen. It uses the knowledge DINOv2 has learned from natural image datasets to obtain more universal image features, thereby improving the generalization of feature extraction. Building on this, the study proposes a Hybrid Feature Facial Expression Recognition model (HFFER), which uses two different pre-trained models to extract distinct features and fuses them through cross-attention mechanisms and multiple convolutions. Experimental results show that the model achieves an accuracy of 92.18% on the RAF-DB dataset, surpassing or matching existing models. This study offers a new approach to facial expression recognition, and its application to real classroom images demonstrates its feasibility and potential in practical educational settings.
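The abstract describes fusing features from two pre-trained models with cross-attention. The sketch below illustrates only a generic scaled dot-product cross-attention step in plain Python, under several assumptions not stated in the abstract: the features are already extracted, learned query/key/value projections are replaced by the identity, both streams share one feature dimension, and the bidirectional residual fusion with mean pooling in the hypothetical `fuse` function is an illustrative arrangement, not HFFER's published design. The convolutional layers and the identity of the second pre-trained model are not specified here and are left out.

```python
import math
from typing import List

Matrix = List[List[float]]  # a sequence of token vectors, one row per token


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, context: Matrix) -> Matrix:
    """Scaled dot-product cross-attention: each query token attends to all context tokens.

    Assumption: learned Q/K/V projections are replaced by the identity, so
    queries and context must share the same feature dimension.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context]
        weights = softmax(scores)
        # Weighted sum of context tokens (values = keys = context here).
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(d)])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used as a residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mean_pool(tokens: Matrix) -> List[float]:
    """Average the token vectors into one feature vector."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def fuse(dino_tokens: Matrix, face_tokens: Matrix) -> List[float]:
    """Hypothetical bidirectional fusion of two feature streams.

    dino_tokens: features from a frozen DINOv2 backbone (assumed already extracted).
    face_tokens: features from a second pre-trained model (assumed).
    """
    dino_fused = add(dino_tokens, cross_attention(dino_tokens, face_tokens))
    face_fused = add(face_tokens, cross_attention(face_tokens, dino_tokens))
    # Concatenate the pooled vectors; a classifier head would follow.
    return mean_pool(dino_fused) + mean_pool(face_fused)


if __name__ == "__main__":
    dino = [[1.0, 0.0], [0.0, 1.0]]
    face = [[1.0, 1.0]]
    print(fuse(dino, face))  # → [1.5, 1.5, 1.5, 1.5]
```

Letting each stream attend to the other, rather than simply concatenating the two feature vectors, lets every token weight the other model's tokens by similarity before pooling; a real model would add learned projections, multiple heads, and the convolutional stages mentioned in the abstract.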




Published

29-12-2025

Issue

Frontiers in Computing and Intelligent Systems, Vol. 14, No. 3 (2025)
Section

Articles

How to Cite

Xie, Y., Ju, C., & Chang, Y. (2025). Facial Expression Recognition with Hybrid Features Leveraging DINO Prior Knowledge. Frontiers in Computing and Intelligent Systems, 14(3), 82-88. https://doi.org/10.54097/m7sc3z87