Multimodal Road Traffic Detection Algorithm based on Improved YOLOv8
DOI: https://doi.org/10.54097/2m386f32
Keywords: Multimodal, Target Detection, Feature Fusion
Abstract
In road traffic detection, traditional unimodal object detection methods have limited ability to adapt to environmental variations. Moreover, in complex road conditions, mutual occlusion between targets and their confusion with the background make it difficult to extract features for multi-scale objects and to detect densely distributed small targets. To address these challenges, this paper proposes a multimodal object detection algorithm, CD-MMNet, based on YOLOv8. First, the backbone network adopts a dual-branch structure that performs intermediate fusion of features from two modalities, visible-light and infrared images, exploiting their complementary characteristics to select the most informative features from each. Second, the CBAM attention mechanism is introduced to dynamically reweight each channel and spatial position in the feature maps, enhancing key regional features while suppressing background noise and thereby improving the model's feature extraction capability. Finally, the DBB (Diverse Branch Block) module is incorporated, using a diversified branch structure to improve the model's adaptability to feature maps of varying scales. Experimental results show that the proposed algorithm outperforms the original YOLOv8 and other mainstream algorithms on the M3FD dataset, improving mAP@0.5:0.95 by 4.0% over the YOLOv8 baseline and effectively enhancing detection performance in challenging environments such as adverse weather and traffic congestion.
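As a minimal sketch of the CBAM idea described above, the following NumPy code applies channel attention (global average- and max-pooling fed through a shared two-layer MLP) followed by spatial attention to a single feature map. It is an illustration only, not the paper's implementation: the weights are random, and the spatial branch uses a simple 1x1 mixing of the pooled maps in place of the 7x7 convolution used in the original CBAM.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    # feat: (C, H, W). Global average- and max-pooling per channel,
    # passed through a shared two-layer MLP (w1: C -> C/r, w2: C/r -> C).
    avg = feat.mean(axis=(1, 2))                   # (C,)
    mx = feat.max(axis=(1, 2))                     # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # ReLU hidden layer
    scale = sigmoid(mlp(avg) + mlp(mx))            # per-channel weights in (0, 1)
    return feat * scale[:, None, None]

def spatial_attention(feat, k):
    # Channel-wise average and max maps, mixed here by a 1x1 weighting k
    # (a simplification of CBAM's 7x7 convolution), then a sigmoid mask.
    avg = feat.mean(axis=0)                        # (H, W)
    mx = feat.max(axis=0)                          # (H, W)
    mask = sigmoid(k[0] * avg + k[1] * mx)
    return feat * mask[None, :, :]

def cbam(feat, w1, w2, k):
    # CBAM applies channel attention first, then spatial attention;
    # the feature-map shape is preserved throughout.
    return spatial_attention(channel_attention(feat, w1, w2), k)

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                            # r is the MLP reduction ratio
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
k = np.array([0.5, 0.5])
y = cbam(x, w1, w2, k)
print(y.shape)
```

Because both branches only produce multiplicative masks in (0, 1), the module can be dropped between backbone stages without changing tensor shapes, which is what makes it easy to insert into a YOLOv8-style network.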
License
Copyright (c) 2025 Frontiers in Computing and Intelligent Systems

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

