Investigation into Few-Shot Instance Segmentation of Agricultural Landscapes Utilizing SAM-Driven Model

Authors

  • Wenbo Chen
  • Hongbo Zhu
  • Chengwei Zhao
  • Xuqing Li
  • Tingxuan Wang
  • Zekun Zhang
  • Zhihe Hu

DOI:

https://doi.org/10.54097/9nqvrz65

Keywords:

Few-shot Learning, Quantification, Semantic Segmentation in Farmland, SAM Model, Cost

Abstract

In complex agricultural landscapes, and especially under limited sample sizes, current instance segmentation techniques often struggle to delineate clear field boundaries, compromising segmentation accuracy. To mitigate this challenge, our research enhances instance segmentation networks through quantitative comparisons and the incorporation of the SAM model. Specifically, we formulate SAM+Mask2Former and SAM+Mask R-CNN by adopting SAM's feature encoder as the backbone; further refinement is accomplished by integrating the Efficient Channel Attention (ECA) module and optimizing prompts, yielding the SAM+ECA+Prompter model. These models exhibit robust generalization on high-resolution remote sensing imagery even when trained on small datasets. Empirically, with only 50 samples the models achieve an Average Precision (AP) and an Average Recall (AR) of 0.715; at 200 samples, AP rises to 0.875 and AR exceeds 0.903, figures comparable to traditional field segmentation paradigms. Our research thus improves the efficacy of segmentation models while reducing both labeling and computational costs, presenting a resource-efficient, high-precision solution for precision agriculture.
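To illustrate the channel-attention component named in the abstract, the following is a minimal NumPy sketch of the ECA mechanism as described by Wang et al. [12]: a global average pool produces one descriptor per channel, a small 1-D convolution mixes neighbouring channel descriptors, and a sigmoid gate rescales the feature map. The uniform averaging kernel here is a hypothetical stand-in for the weights that would be learned during training; this is not the authors' exact implementation.

```python
import numpy as np

def eca_attention(x, k_size=3):
    """Efficient Channel Attention (ECA) forward pass, NumPy sketch.

    x: feature map of shape (C, H, W).
    k_size: odd kernel size of the 1-D convolution across channels.
    Returns a recalibrated feature map of the same shape.
    """
    c, h, w = x.shape
    # 1. Global average pooling -> one descriptor per channel, shape (C,)
    y = x.mean(axis=(1, 2))
    # 2. 1-D convolution across neighbouring channel descriptors.
    #    A uniform kernel stands in for learned weights (illustration only).
    kernel = np.full(k_size, 1.0 / k_size)
    pad = k_size // 2
    y_padded = np.pad(y, pad, mode="edge")
    conv = np.convolve(y_padded, kernel, mode="valid")  # back to shape (C,)
    # 3. Sigmoid gate, then channel-wise rescaling of the input
    gate = 1.0 / (1.0 + np.exp(-conv))
    return x * gate[:, None, None]
```

Because the gate lies in (0, 1), each channel of the output is a scaled-down copy of the input channel, weighted by how strongly that channel (and its neighbours) responded on average.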

References

[1] Yuan X, Shi J, Gu L. A review of deep learning methods for semantic segmentation of remote sensing imagery[J]. Expert Systems with Applications, 2021, 169: 114417.

[2] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.

[3] Cheng B, Misra I, Schwing A G, et al. Masked-attention mask transformer for universal image segmentation[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022: 1290-1299.

[4] Kirillov A, Mintun E, Ravi N, et al. Segment anything[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 4015-4026.

[5] Chen K, Liu C, Chen H, et al. RSPrompter: Learning to prompt for remote sensing instance segmentation based on visual foundation model[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024.

[6] Ministry of Land and Resources of the People's Republic of China. (2012). High-standard Basic Farmland Construction Standards: TD/T 1033-2012 [S].

[7] Zhao Z, Liu Y, Zhang G, et al. The Winning Solution to the iFLYTEK Challenge 2021 Cultivated Land Extraction from High-Resolution Remote Sensing Images[C]//2022 14th International Conference on Advanced Computational Intelligence (ICACI). IEEE, 2022: 376-380.

[8] Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: Common objects in context[C]//Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer International Publishing, 2014: 740-755.

[9] He K, Chen X, Xie S, et al. Masked autoencoders are scalable vision learners[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 16000-16009.

[10] Chen Z, Zhou H, Lai J, et al. Contour-aware loss: Boundary-aware learning for salient object segmentation [J]. IEEE Transactions on Image Processing, 2020, 30: 431-443.

[11] Badrinarayanan V, Kendall A, Cipolla R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495.

[12] Wang Q, Wu B, Zhu P, et al. ECA-Net: Efficient channel attention for deep convolutional neural networks [C]// Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 11534-11542.

[13] Chen L C, Zhu Y, Papandreou G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 801-818.

Published

28-04-2025

Issue

Section

Articles

How to Cite

Chen, W., Zhu, H., Zhao, C., Li, X., Wang, T., Zhang, Z., & Hu, Z. (2025). Investigation into Few-Shot Instance Segmentation of Agricultural Landscapes Utilizing SAM-Driven Model. International Journal of Energy, 6(3), 47-55. https://doi.org/10.54097/9nqvrz65