Systematic Analysis of CNN and Its Optimization Algorithms Based on Object Detection
DOI: https://doi.org/10.54097/gem3mv87
Keywords: CNN; Object Detection; R-CNN series; Faster R-CNN; YOLO series.
Abstract
Object detection is a central research field in computer vision. In recent years, advances in Convolutional Neural Networks (CNNs) have driven steady progress in object detection methods. The Region-based Convolutional Neural Network (R-CNN) introduced the two-stage detection paradigm, and You Only Look Once (YOLO) later introduced single-stage detection, which greatly improved detection speed. Transformer-based models such as the Detection Transformer (DETR) remove the complex post-processing steps that earlier detectors required, making detection more straightforward. This paper traces the step-by-step development of these CNN-based detection methods and analyzes their advantages and disadvantages, including how multi-scale feature fusion yields significant gains in accuracy and efficiency, and what challenges remain in small-object detection and real-time detection. It also looks ahead to possible future research directions, with the aim of providing theoretical and practical guidance for further improving model performance and expanding real-world applications.
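To make the post-processing point concrete, the following is a minimal sketch of greedy non-maximum suppression (NMS), the duplicate-removal step commonly applied to the raw outputs of R-CNN-style and YOLO-style detectors. The abstract does not name the step it has in mind, so treating it as NMS is an assumption; the function names, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are likewise illustrative choices, not taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first.

    A box is kept only if it overlaps every already-kept box by at most
    iou_threshold (an assumed, illustrative default).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two near-duplicates and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.68, so it is suppressed and `kept` holds the indices of the first and third boxes. A set-prediction model such as DETR is trained to emit one prediction per object, which is why it can omit a step of this kind.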
License
Copyright (c) 2026 Academic Journal of Science and Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.








