The Study of Performance for Face Detection Based on Multiple Representative Convolutional Neural Networks
DOI:
https://doi.org/10.54097/hset.v57i.9895
Keywords:
VGG16, ResNet50, MobileNetV2, Face Detection, Deep Learning.
Abstract
Due to the diversity of deep learning models, choosing a suitable model for a specific task can be rather onerous. In this paper, the performance of three deep convolutional neural networks, namely VGG16, ResNet50, and MobileNetV2, on face detection was compared. Each model was trained on a dataset of 11,900 images from the FDDB dataset that included various face sizes and orientations, with multiple augmentations including color alteration, blurring, and flipping. The final layers of each model were replaced with a binary classification output indicating whether a face was found and a regression output giving the coordinates of the facial bounding box. All models were trained under the same settings: 40 epochs, a batch size of 64, binary cross-entropy loss combined with DIoU loss, and a learning rate of 0.0001 with a learning rate decay of 0.8 per epoch. The experimental results demonstrated that VGG16 outperformed ResNet50 and MobileNetV2 in terms of accuracy, with VGG16 achieving the highest score of 0.9240, followed by ResNet50 with 0.8568 and MobileNetV2 with 0.6028. The results suggest that VGG16 is a more suitable choice for face detection applications than ResNet50 and MobileNetV2, while ResNet50 and MobileNetV2 may provide higher accuracy for other image recognition tasks or real-time face detection. The findings in this paper can contribute to the selection of appropriate deep learning models for face detection.
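The bounding-box loss and learning rate settings named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes boxes are given as corner coordinates (x1, y1, x2, y2) and that the decay of 0.8 is a multiplicative factor applied once per epoch starting from 0.0001; neither detail is stated in the abstract. The function names `diou_loss` and `learning_rate` are hypothetical.

```python
def diou_loss(pred, target, eps=1e-9):
    """DIoU loss for one box pair: 1 - IoU + d^2 / c^2.

    Boxes are (x1, y1, x2, y2) corner tuples (an assumed format).
    d is the distance between box centers; c is the diagonal of the
    smallest box enclosing both.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection area (zero when the boxes do not overlap).
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih

    # Union area and IoU.
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers.
    d2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
       + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2

    # Squared diagonal of the smallest enclosing box.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2

    return 1.0 - iou + d2 / (c2 + eps)


def learning_rate(epoch, base_lr=1e-4, decay=0.8):
    """Learning rate at a given epoch, assuming multiplicative decay per epoch."""
    return base_lr * decay ** epoch


if __name__ == "__main__":
    print(diou_loss((0, 0, 1, 1), (0, 0, 1, 1)))   # identical boxes
    print(diou_loss((0, 0, 1, 1), (2, 0, 3, 1)))   # disjoint boxes
    print([learning_rate(e) for e in range(3)])
```

Unlike plain IoU loss, the center-distance term still gives a useful signal when the predicted and ground-truth boxes do not overlap, which is the motivation given for DIoU loss in the literature.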
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







