Research on Traffic Scene Element Recognition for Autonomous Driving Based on Deep Learning

Authors

  • Yangguoer Zhang

DOI:

https://doi.org/10.54097/jfsm0w96

Keywords:

deep learning; lane line segmentation; object detection; target tracking.

Abstract

This paper reviews the current state of research in autonomous driving, then surveys the multi-lane line recognition, 3D object detection, and multi-target tracking algorithms used in that field. Real road scenes, however, are complex and changeable, and existing traffic sign detection and recognition methods still leave room for improvement in both real-time performance and accuracy.

Published

23-11-2024

How to Cite

Zhang, Y. (2024). Research on Traffic Scene Element Recognition for Autonomous Driving Based on Deep Learning. Highlights in Science, Engineering and Technology, 118, 82-91. https://doi.org/10.54097/jfsm0w96