Research Progress on Autonomous Driving of Automobiles Based on Deep Learning

Zibo Yuan

doi:10.54097/r2300b44

Keywords:

Deep Learning; Autonomous Driving; Perception and Decision-Making; Vehicle Control; End-to-End.

Abstract

With deep learning now widely applied in fields such as image processing and computer vision, deep-learning-based methods have become a research hotspot in autonomous driving technology. In environmental perception, Convolutional Neural Network models significantly enhance performance and, combined with multi-sensor fusion, considerably improve the accuracy of target detection and semantic segmentation. At the decision-making and control level, Recurrent Neural Network and Transformer models use the information output by the perception layer to optimize path decision-making and vehicle control strategies. Finally, end-to-end algorithmic frameworks integrate imitation learning and reinforcement learning to coordinate perception, decision-making, and control, thereby significantly enhancing the overall performance of the autonomous driving system. This article reviews the application of these models and, drawing on actual application scenarios, points out the challenges that deep learning faces. On this basis, it concludes that future research on deep-learning-based autonomous driving will develop toward model lightweighting, model refinement, and multi-technology integration.
