Multi-task panoramic driving perception algorithm based on improved YOLOv5

Authors

  • Xin Zeng
  • Yong Qi
  • Mingjun Liu

DOI:

https://doi.org/10.54097/hset.v34i.5489

Keywords:

Deep learning, YOLOv5 algorithm, traffic object detection, driveable area segmentation, lane line detection.

Abstract

Panoramic driving perception is becoming increasingly important in the field of autonomous driving. A multi-task panoramic driving perception algorithm can help vehicles make reasonable decisions quickly. To address the difficulty of achieving both real-time speed and high precision in such an algorithm, a multi-task panoramic driving perception algorithm based on the YOLOv5 architecture is proposed. The algorithm consists of a backbone network for feature extraction and two branch networks for specific tasks. The C3 module in the original YOLOv5 backbone is replaced with an inverted residual bottleneck module to reduce the network's computational cost and improve its recognition accuracy. A new branch network is also proposed that trains driveable area segmentation and lane line detection simultaneously, which improves the speed of feature extraction. On BDD100K, the public dataset of the Berkeley AI Laboratory, the proposed algorithm reaches a detection speed of 110 FPS, which is 19 FPS higher than that of the algorithm before improvement. The accuracy of traffic object detection reaches 77.2%, and the accuracy of lane line detection reaches 71.1%. The algorithm therefore meets the real-time requirements of panoramic driving perception well.
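The abstract attributes part of the reduced computation to swapping the C3 module for an inverted residual bottleneck. The abstract does not give the module's exact layout, so the following is only a rough sketch under stated assumptions: it counts convolution weights for a generic inverted residual block (1x1 expansion, 3x3 depthwise convolution, 1x1 projection) and compares them with a single dense 3x3 convolution of the same width. The function names, the channel width of 128, and the expansion factor of 2 are illustrative choices, not values from the paper.

```python
def dense_conv_weights(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a dense k x k convolution (bias and normalization ignored)."""
    return k * k * c_in * c_out


def inverted_residual_weights(c_in: int, c_out: int, expansion: int = 2, k: int = 3) -> int:
    """Weights in a generic inverted residual block (bias and normalization ignored).

    Assumed layout: 1x1 expansion -> k x k depthwise -> 1x1 projection.
    The expansion factor is a hypothetical setting, not taken from the paper.
    """
    hidden = c_in * expansion
    expand = c_in * hidden        # 1x1 pointwise convolution that widens the channels
    depthwise = k * k * hidden    # one k x k filter per hidden channel
    project = hidden * c_out      # 1x1 pointwise convolution back to c_out channels
    return expand + depthwise + project


if __name__ == "__main__":
    channels = 128  # illustrative width only
    dense = dense_conv_weights(channels, channels)
    inverted = inverted_residual_weights(channels, channels, expansion=2)
    print(f"dense 3x3 convolution: {dense} weights")
    print(f"inverted residual:     {inverted} weights")
    print(f"ratio:                 {inverted / dense:.2f}")
```

Under these assumptions the depthwise step costs only a few thousand weights, so most of the block's cost sits in the two 1x1 convolutions. Whether the block is cheaper than what it replaces depends on the expansion factor and on the layers being replaced, so the ratio printed here should not be read as the saving reported in the paper.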

Published

28-02-2023

How to Cite

Zeng, X., Qi, Y., & Liu, M. (2023). Multi-task panoramic driving perception algorithm based on improved YOLOv5. Highlights in Science, Engineering and Technology, 34, 314-325. https://doi.org/10.54097/hset.v34i.5489