Comparative Analysis of Transformer Integration in U-net Networks for Enhanced Medical Image Segmentation

Authors

  • Zixin Hao

DOI:

https://doi.org/10.54097/z4b39y45

Keywords:

Transformer, U-net, Medical image segmentation, Computer vision.

Abstract

The Transformer is widely used in Natural Language Processing (NLP) and is a cornerstone of large models. Researchers have also adopted it to address the limitations of Convolutional Neural Networks (CNNs) in medical image segmentation. Through an extensive literature review and case studies, this paper compares the performance of different models in this field, summarizes the methods used to integrate Transformers into U-net, and points out existing gaps and challenges. The reviewed research indicates that Transformer-based models can significantly improve the accuracy and efficiency of medical image analysis. The paper discusses the advantages, disadvantages, innovations, performance, and complexity of the various models in detail, and shows how integrating the Transformer structure into the U-net network enhances performance. In particular, it analyzes the advantages of integrating Transformers into the encoder, where they are most suitable, and highlights the trade-off between improved performance and computational cost. The paper concludes that, although no model is perfect, the best balance of performance and efficiency can be achieved by choosing the combination of Transformer and U-net that fits the actual situation. The networks' performance suggests that combining a U-shaped convolutional network with Transformer modules has good development prospects and high research significance.
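
The trade-off between performance and computational cost mentioned in the abstract can be made concrete with a rough calculation. The sketch below is an illustration only and is not taken from the paper: the function name `attention_cost`, the 224 x 224 input size, and the fixed channel count are all assumptions. It relies on one well-known property, that global self-attention compares every token with every other token, so its cost grows with the square of the number of tokens. Treating each spatial position of a U-net encoder feature map as one token, it estimates how that cost changes after each 2x downsampling step.

```python
# Hypothetical illustration: the function name, input size, and channel
# count are assumptions for this sketch, not values from the paper.

def attention_cost(height, width, downsample_steps, channels):
    """Rough cost of one global self-attention layer applied to a U-net
    encoder feature map after `downsample_steps` 2x poolings.

    Each spatial position is treated as one token.
    Returns (num_tokens, attention_matrix_entries, approx_multiply_adds).
    """
    h = height // (2 ** downsample_steps)
    w = width // (2 ** downsample_steps)
    tokens = h * w
    # The attention matrix has one entry per pair of tokens.
    pairs = tokens * tokens
    # Q.K^T and (attention x V) each take about tokens^2 * channels
    # multiply-adds; projections and softmax are ignored here.
    macs = 2 * pairs * channels
    return tokens, pairs, macs


if __name__ == "__main__":
    for steps in range(5):
        tokens, pairs, macs = attention_cost(224, 224, steps, channels=64)
        print(f"after {steps} poolings: {tokens:>6} tokens, "
              f"{pairs:>13} attention entries, ~{macs:.2e} multiply-adds")
```

Under these assumptions, every 2x downsampling step cuts the number of tokens by 4 and the attention matrix by 16. This is one reason the placement of Transformer modules within the U-net affects computational cost as well as accuracy.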

Published

26-04-2024

How to Cite

Hao, Z. (2024). Comparative Analysis of Transformer Integration in U-net Networks for Enhanced Medical Image Segmentation. Highlights in Science, Engineering and Technology, 94, 333-340. https://doi.org/10.54097/z4b39y45