Development of Semantic Segmentation Based on Deep Learning

Yang Zhao

doi:10.54097/hset.v34i.5485


Keywords:

Semantic Segmentation, CNN, FCN, Transformer.

Abstract

In recent years, image semantic segmentation has attracted considerable interest in computer vision. As deep learning has grown in popularity, it has been combined with image segmentation, and both have improved as a result. These techniques are now widely used in autonomous vehicles, intelligent robots, and other devices. The earliest semantic segmentation techniques proposed were based on Fully Convolutional Networks (FCN) or U-Net; FCN enabled end-to-end training and effectively applied Convolutional Neural Networks (CNN) to semantic segmentation. To improve results, the encoder-decoder structure derived from the FCN approach was later adopted, and atrous convolution was also proposed. Beyond CNN-based networks, Transformer-based semantic segmentation is another recent trend. The Transformer model was first proposed in 2017, and subsequent Transformer-based semantic segmentation methods have also achieved good results. This paper compares and discusses these methods to provide guidance for the field.

