Enhanced Small Target Detection Methodology via Optimized YOLOv8 Framework

Authors

  • Yingjun Zhao
  • Nelson C. Rodelas

DOI:

https://doi.org/10.54097/7terh631

Keywords:

YOLOv8 Algorithm; Small Target Detection; Attention Mechanism; HIoU.

Abstract

Object detection is a core task in computer vision, with research and practical value for automation, efficiency improvement, and technological progress. This paper takes the YOLOv8 model as its baseline and optimizes its loss function, network structure, and feature extraction mechanism, with the aim of improving the model's ability to detect small targets. The specific optimization measures are as follows: (1) A deformable convolution module is integrated into the YOLOv8 backbone network, allowing the model to adapt its receptive field to the characteristics of small targets and focus on them more precisely, thereby improving detection accuracy. (2) An attention mechanism is incorporated into the neck of the model, where it acts as a filter that extracts salient features from a large volume of information. (3) A new method for calculating the Intersection over Union (IoU) loss, termed HIoU, is introduced. It dynamically adjusts the weights of the loss function components during training, so that predicted boxes for small targets align more closely with their ground-truth bounding boxes, improving small-target detection performance.
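The abstract does not give the HIoU formula, so the following is only a minimal sketch of the two ideas it mentions: why IoU is harsh on small targets, and what a loss with training-dependent component weights might look like. The function names, the center-distance term, and the linear weighting schedule are all assumptions made for illustration; they are not the authors' definition of HIoU.

```python
def iou(box_a, box_b):
    """Standard Intersection over Union for two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted(box, dx, dy):
    """Return the box translated by (dx, dy)."""
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


def center_distance_term(pred, target):
    """Squared center distance, normalized by the squared diagonal of the
    smallest box enclosing both (a DIoU-style penalty, used here only as
    an example of a second loss component)."""
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    ex1, ey1 = min(pred[0], target[0]), min(pred[1], target[1])
    ex2, ey2 = max(pred[2], target[2]), max(pred[3], target[3])
    diag_sq = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    if diag_sq == 0:
        return 0.0
    return ((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / diag_sq


def dynamic_weighted_loss(pred, target, progress):
    """HYPOTHETICAL dynamically weighted box loss (not the paper's HIoU).

    progress is the fraction of training completed, in [0, 1]. Early in
    training the center-distance term dominates; later the IoU term does.
    The linear schedule is an assumption chosen for simplicity.
    """
    w = min(max(progress, 0.0), 1.0)
    return (1.0 - w) * center_distance_term(pred, target) + w * (1.0 - iou(pred, target))


small = (0, 0, 8, 8)     # 8x8 px target
large = (0, 0, 80, 80)   # 80x80 px target

# The same 2 px localization error costs a small box far more IoU.
print(iou(small, shifted(small, 2, 0)))   # 0.6
print(iou(large, shifted(large, 2, 0)))   # about 0.95

# The hypothetical loss at the start, middle, and end of training.
for progress in (0.0, 0.5, 1.0):
    print(progress, dynamic_weighted_loss(shifted(small, 2, 0), small, progress))
```

The first two printed values show the motivation for a small-target-aware box loss: an identical pixel offset yields a much lower IoU for the 8x8 box than for the 80x80 box. The last loop only demonstrates the mechanics of reweighting loss components as training proceeds.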

Published

12-07-2024

Section

Articles

How to Cite

Zhao, Y., & Rodelas, N. C. (2024). Enhanced Small Target Detection Methodology via Optimized YOLOv8 Framework. Academic Journal of Science and Technology, 11(3), 80-84. https://doi.org/10.54097/7terh631