YOLO V8 Network Lightweight Method Based on Pruning and Quantization

Authors

  • Yiyang Cheng

DOI:

https://doi.org/10.54097/4qsv1565

Keywords:

YOLO v8; Model pruning; Weight quantization; Object detection.

Abstract

With the development of deep learning, convolutional neural networks (CNNs) have become core algorithms in fields such as computer vision, natural language processing, and speech processing, surpassing traditional algorithms in performance. However, their complex structure and heavy computational load make many of these algorithms dependent on GPUs, which limits their deployment on mobile devices with limited resources and strict real-time requirements. This article explores the lightweighting of the YOLO v8 model in detail, aiming to optimize its performance in resource-constrained environments by combining BN-layer scale-factor pruning with int8 quantization under the TensorRT framework. First, the scale factors of the BN layers are used as an evaluation criterion to identify redundant channels in the network, effectively reducing the model's parameter count and computational cost. The model is then deployed in NVIDIA's TensorRT framework for int8 quantization, which compresses the model's floating-point operations into low-bit int8 operations, significantly reducing computational complexity and memory usage. The approach is validated with experiments on the COCO dataset. The results show that, after pruning and quantization, YOLO v8 achieves a compression ratio of 64.2% with a detection-accuracy loss of about 4%. This study provides a comprehensive solution for the efficient deployment of deep learning models, particularly in application scenarios that must balance performance and resource efficiency, and therefore has important practical significance. Future work will focus on further optimizing the pruning and quantization strategies to adapt them to a wider range of network architectures and application requirements.
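The two steps described in the abstract can be sketched in simplified pure-Python form. This is an illustrative sketch only: the function names, the prune ratio, and the toy values are assumptions, and the paper's actual pipeline uses TensorRT calibration rather than the naive symmetric rounding shown here.

```python
# Hypothetical sketch of the two lightweighting steps from the abstract:
# (1) rank channels by the absolute value of their BN scale factors (gamma)
#     and mark the smallest fraction for pruning;
# (2) symmetric per-tensor int8 post-training quantization.
# All names and numeric values are illustrative, not the paper's code.

def select_pruned_channels(gammas, prune_ratio=0.3):
    """Return the set of channel indices with the smallest |gamma|,
    covering prune_ratio of all channels."""
    n_prune = int(len(gammas) * prune_ratio)
    ranked = sorted(range(len(gammas)), key=lambda i: abs(gammas[i]))
    return set(ranked[:n_prune])

def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats onto [-127, 127]."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from int8 codes."""
    return [qi * scale for qi in q]

# Toy example: channels with the smallest |gamma| are pruned.
gammas = [0.91, 0.02, 0.45, 0.003, 0.78, 0.01]
print(sorted(select_pruned_channels(gammas, prune_ratio=0.5)))  # -> [1, 3, 5]

# Toy example: quantize a small weight tensor and inspect the codes.
weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
print(q, scale)
```

In practice the pruning threshold is usually chosen globally across all BN layers after sparsity-inducing training, and TensorRT determines activation scales from a calibration dataset instead of the per-tensor maximum used above.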

References

[1] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-9.

[2] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778.

[3] Huang, G., Liu, Z., & Weinberger, K.Q. (2017). Densely Connected Convolutional Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2261-2269.

[4] Yang, T., Chen, Y., & Sze, V. (2017). Designing Energy-Efficient Convolutional Neural Networks Using Energy-Aware Pruning. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6071-6079.

[5] Liu, Z., Sun, M., Zhou, T., Huang, G., & Darrell, T. (2018). Rethinking the Value of Network Pruning. ArXiv, abs/1810.05270.

[6] Lan, X., Zhu, X., & Gong, S. (2018). Knowledge Distillation by On-the-Fly Native Ensemble. ArXiv, abs/1806.04606.

[7] Esser, S. K., McKinstry, J. L., Bablani, D., Appuswamy, R., & Modha, D. S. (2020). Learned step size quantization. International Conference on Learning Representations.

[8] Choi, J., Wang, Z., Venkataramani, S., Chuang, P.I., Srinivasan, V., & Gopalakrishnan, K. (2018). PACT: Parameterized Clipping Activation for Quantized Neural Networks. ArXiv, abs/1805.06085.

[9] Banner, R., Nahshan, Y., & Soudry, D. (2019). Post training 4-bit quantization of convolutional networks for rapid-deployment. In Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc.

[10] Kummer, L., Sidak, K., Reichmann, T., & Gansterer, W. N. (2023). Adaptive precision training (AdaPT): A dynamic quantized training approach for DNNs. In 2023 SIAM International Conference on Data Mining (SDM), 559-567. https://doi.org/10.1137/1.9781611977653.ch63

[11] Wang, K., Liu, Z., Lin, Y., Lin, J., & Han, S. (2019). HAQ: Hardware-Aware Automated Quantization With Mixed Precision. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 8604–8612. doi:10.1109/CVPR.2019.00881.

Published

28-11-2024

Section

Articles

How to Cite

Cheng, Y. (2024). YOLO V8 Network Lightweight Method Based on Pruning and Quantization. Mathematical Modeling and Algorithm Application, 3(2), 44-51. https://doi.org/10.54097/4qsv1565