LMANet-Efficient Multi-Scale Attention Aggregation Network for Semantic Segmentation of UHR UAV Imagery

Authors

  • Dan Wang
  • Jianying Shen
  • Yiwen Li

DOI:

https://doi.org/10.54097/4p4yg336

Keywords:

UAV Remote Sensing Imagery, Semantic Segmentation, Lightweight Network, Multi-Scale Attention, Hybrid Dilated Convolution, Dense Upsampling

Abstract

Semantic segmentation of unmanned aerial vehicle (UAV) imagery is vital for applications such as urban management and agricultural monitoring. However, ultra-high-resolution (UHR) UAV imagery, characterized by dense small objects and complex textures, makes it difficult to balance accuracy, speed, and model size. To address the limitations of existing methods in small-object segmentation and computational efficiency, this paper proposes a Lightweight Multi-scale Attention Network (LMANet). The framework integrates a Multi-Scale Window Attention Aggregation (MS-WAA) encoder, a Depthwise Separable Hybrid Dilated Convolution (DHC) module, and a lightweight decoder (LADecoder) to achieve efficient and precise feature extraction and reconstruction. Extensive experiments on the Vaihingen and Potsdam datasets show that LMANet achieves superior segmentation performance, particularly for small objects, with significantly fewer parameters and higher inference speed than state-of-the-art models.
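
The abstract does not give the internal design of the DHC module, so the following is only a rough sketch of why combining depthwise separable convolution with hybrid dilation rates is commonly expected to reduce parameters while enlarging the receptive field. The channel width (64), the 3x3 kernel, and the dilation rates (1, 2, 5) are illustrative assumptions, not values taken from the paper, and the function names are hypothetical. Bias terms are ignored.

```python
# Illustrative arithmetic only: channel counts, kernel size, and dilation
# rates below are assumptions, not settings reported for LMANet.

def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def stacked_receptive_field(dilations, k=3):
    """Receptive field (in pixels) of stride-1 k x k convolutions stacked in sequence."""
    rf = 1
    for d in dilations:
        rf += (k - 1) * d      # each layer widens the field by (k - 1) * dilation
    return rf


if __name__ == "__main__":
    c = 64  # assumed channel width
    print("standard 3x3 conv weights:      ", standard_conv_params(c, c))
    print("depthwise separable 3x3 weights:", depthwise_separable_params(c, c))
    print("receptive field, rates (1, 1, 1):", stacked_receptive_field((1, 1, 1)))
    print("receptive field, rates (1, 2, 5):", stacked_receptive_field((1, 2, 5)))
```

Under these assumptions the separable layer needs roughly an eighth of the weights of the standard layer, and three stacked layers with mixed dilation rates cover a much wider context than three undilated layers at the same parameter cost. Rates without a shared common factor are the usual choice in hybrid dilated convolution because they avoid the gridding gaps that appear when the same rate is repeated.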

Published

30-12-2025

Section

Articles

How to Cite

Wang, D., Shen, J., & Li, Y. (2025). LMANet-Efficient Multi-Scale Attention Aggregation Network for Semantic Segmentation of UHR UAV Imagery. Academic Journal of Science and Technology, 18(3), 30-36. https://doi.org/10.54097/4p4yg336