DiffuPrompter: Pixel-Level Automatic Annotation for High-Resolution Remote Sensing Images with Foundation Models
Abstract
1. Introduction
- We present the insight that pixel-level mask annotations for real images can be obtained automatically with off-the-shelf foundation models.
- We propose DiffuPrompter, a training-free prompt-generation method that turns SAM from a class-agnostic segmenter into a class-aware one, so that remote sensing images (RSIs) can be labeled automatically; a schematic sketch of this pipeline follows the list.
- We evaluate several automatic annotation methods on remote sensing datasets. The extensive results confirm the advantage of the proposed DiffuPrompter and show that the generated pseudo-labels improve model generalization; these results may serve as a reference for future work.
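The bullets above summarize the three stages detailed in Section 2.2 (textual concept grounding, denoise by noise, and prompting SAM). Below is a minimal, illustrative sketch of how such a training-free prompt-generation pipeline could be wired together; it is not the authors' implementation. The callables `ground_concept` and `segment_with_prompts` are hypothetical wrappers around a text-conditioned diffusion model and SAM's promptable predictor, and the default parameter values simply mirror the best-performing settings in the ablation tables (mask threshold 0.4, t = 40, 50 sampling times).

```python
import numpy as np
from scipy import ndimage


def diffuprompter_sketch(ground_concept, segment_with_prompts, image, class_name,
                         num_samples=50, noise_t=40, threshold=0.4):
    """Illustrative sketch of a training-free prompt-generation pipeline.

    ground_concept(image, class_name, t, rng): hypothetical wrapper around a
        text-conditioned diffusion model; assumed to perturb the real image
        with noise at level t, run a denoising step conditioned on the class
        name, and return an HxW saliency map (e.g., a cross-attention map)
        normalized to [0, 1].
    segment_with_prompts(image, boxes): hypothetical wrapper around SAM's
        promptable predictor; assumed to return one binary mask per box prompt.
    """
    rng = np.random.default_rng(0)

    # 1) Textual concept grounding + "denoise by noise": average the saliency
    #    maps obtained under several independent noise draws so that random
    #    per-sample fluctuations cancel out (SmoothGrad-style averaging).
    maps = [ground_concept(image, class_name, noise_t, rng) for _ in range(num_samples)]
    saliency = np.mean(maps, axis=0)

    # 2) Binarize the averaged map and turn each connected component into a
    #    box prompt (x0, y0, x1, y1).
    labeled, _ = ndimage.label(saliency >= threshold)
    boxes = []
    for rows, cols in ndimage.find_objects(labeled):
        boxes.append((cols.start, rows.start, cols.stop, rows.stop))

    # 3) Class-aware segmentation: SAM converts the class-specific box prompts
    #    into pixel-level masks, and the class name provides the label.
    return segment_with_prompts(image, boxes) if boxes else []
```

Because the diffusion model localizes the class named in the text prompt, the otherwise class-agnostic SAM output inherits a semantic label, which is the sense in which the method is "class-aware".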
2. Theory and Methods
2.1. Preliminary Knowledge
2.1.1. Overview of SDM
2.1.2. Overview of SAM
2.2. Proposed Method
2.2.1. Textual Concept Grounding
2.2.2. Denoise by Noise
2.2.3. Prompt for SAM
3. Results
3.1. Datasets
3.2. Evaluation Metrics
3.3. Implementation Details
3.4. Qualitative Experiments
3.5. Ablation Study
3.5.1. Comparison with Attention Map under Different Thresholds
3.5.2. Sampling Times
3.6. Segmentation Performance Comparison
3.7. Domain Generalization
3.8. Comparison with the State of the Art
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Chen, K.; Zou, Z.; Shi, Z. Building extraction from remote sensing images with sparse token transformers. Remote Sens. 2021, 13, 4441. [Google Scholar] [CrossRef]
- Cheng, Q.; Zhang, Q.; Fu, P.; Tu, C.; Li, S. A survey and analysis on automatic image annotation. Pattern Recognit. 2018, 79, 242–259. [Google Scholar] [CrossRef]
- Wu, T.; Huang, J.; Gao, G.; Wei, X.; Wei, X.; Luo, X.; Liu, C.H. Embedded discriminative attention mechanism for weakly supervised semantic segmentation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 16765–16774. [Google Scholar] [CrossRef]
- Xu, L.; Ouyang, W.; Bennamoun, M.; Boussaid, F.; Sohel, F.; Xu, D. Leveraging auxiliary tasks with affinity learning for weakly supervised semantic segmentation. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 6984–6993. [Google Scholar] [CrossRef]
- Ru, L.; Zhan, Y.; Yu, B.; Du, B. Learning affinity from attention: End-to-end weakly-supervised semantic segmentation with transformers. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 16846–16855. [Google Scholar] [CrossRef]
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695. [Google Scholar] [CrossRef]
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment anything. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 4015–4026. [Google Scholar] [CrossRef]
- Chen, J.; Chen, H.; Chen, K.; Zhang, Y.; Zou, Z.; Shi, Z. Diffusion models for imperceptible and transferable adversarial attack. arXiv 2023, arXiv:2305.08192. [Google Scholar]
- Zhang, Y.; Ling, H.; Gao, J.; Yin, K.; Lafleche, J.F.; Barriuso, A.; Torralba, A.; Fidler, S. DatasetGAN: Efficient labeled data factory with minimal human effort. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 10145–10155. [Google Scholar]
- Li, D.; Ling, H.; Kim, S.W.; Kreis, K.; Fidler, S.; Torralba, A. BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 21330–21340. [Google Scholar] [CrossRef]
- Wu, W.; Zhao, Y.; Shou, M.Z.; Zhou, H.; Shen, C. DiffuMask: Synthesizing images with pixel-level annotations for semantic segmentation using diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 1206–1217. [Google Scholar] [CrossRef]
- Chen, K.; Liu, C.; Chen, H.; Zhang, H.; Li, W.; Zou, Z.; Shi, Z. RSPrompter: Learning to prompt for remote sensing instance segmentation based on visual foundation model. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4701117. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
- Jaegle, A.; Gimeno, F.; Brock, A.; Vinyals, O.; Zisserman, A.; Carreira, J. Perceiver: General perception with iterative attention. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 4651–4664. [Google Scholar] [CrossRef]
- Smilkov, D.; Thorat, N.; Kim, B.; Viégas, F.; Wattenberg, M. SmoothGrad: Removing noise by adding noise. arXiv 2017, arXiv:1706.03825. [Google Scholar]
- Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar] [CrossRef]
- Krähenbühl, P.; Koltun, V. Efficient inference in fully connected crfs with gaussian edge potentials. Adv. Neural Inf. Process. Syst. 2011, 24, 109–117. Available online: https://dl.acm.org/doi/10.5555/2986459.2986472 (accessed on 22 April 2024).
- Waqas Zamir, S.; Arora, A.; Gupta, A.; Khan, S.; Sun, G.; Shahbaz Khan, F.; Zhu, F.; Shao, L.; Xia, G.S.; Bai, X. iSAID: A large-scale dataset for instance segmentation in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019; pp. 28–37. Available online: https://api.semanticscholar.org/CorpusID:170079084 (accessed on 22 April 2024).
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983. [Google Scholar] [CrossRef]
- Cheng, G.; Han, J.; Zhou, P.; Guo, L. Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS J. Photogramm. Remote Sens. 2014, 98, 119–132. [Google Scholar] [CrossRef]
- Yang, Y.; Newsam, S. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; pp. 270–279. [Google Scholar] [CrossRef]
- Xia, G.S.; Yang, W.; Delon, J.; Gousseau, Y.; Sun, H.; Maître, H. Structural high-resolution satellite image indexing. In Proceedings of the ISPRS TC VII Symposium-100 Years ISPRS, Vienna, Austria, 5–7 July 2010; Volume 38, pp. 298–303. Available online: https://api.semanticscholar.org/CorpusID:18018842 (accessed on 22 April 2024).
- Dai, D.; Yang, W. Satellite Image Classification via Two-Layer Sparse Coding with Biased Image Representation. IEEE Geosci. Remote Sens. Lett. 2011, 8, 173–176. [Google Scholar] [CrossRef]
- Zou, Q.; Ni, L.; Zhang, T.; Wang, Q. Deep learning based feature selection for remote sensing scene classification. IEEE Geosci. Remote Sens. Lett. 2015, 12, 2321–2325. [Google Scholar] [CrossRef]
- Zhao, L.; Tang, P.; Huo, L. Feature significance-based multibag-of-visual-words model for remote sensing image scene classification. J. Appl. Remote Sens. 2016, 10, 035004. [Google Scholar] [CrossRef]
- Cheng, G.; Han, J.; Lu, X. Remote sensing image scene classification: Benchmark and state of the art. Proc. IEEE 2017, 105, 1865–1883. [Google Scholar] [CrossRef]
- Long, Y.; Gong, Y.; Xiao, Z.; Liu, Q. Accurate object localization in remote sensing images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2486–2498. [Google Scholar] [CrossRef]
- Xiao, Z.; Long, Y.; Li, D.; Wei, C.; Tang, G.; Liu, J. High-resolution remote sensing image retrieval based on CNNs from a dimensional perspective. Remote Sens. 2017, 9, 725. [Google Scholar] [CrossRef]
- Zhou, W.; Newsam, S.; Li, C.; Shao, Z. PatternNet: A benchmark dataset for performance evaluation of remote sensing image retrieval. ISPRS J. Photogramm. Remote Sens. 2018, 145, 197–209. [Google Scholar] [CrossRef]
- Wang, Q.; Liu, S.; Chanussot, J.; Li, X. Scene classification with recurrent attention of VHR remote sensing images. IEEE Trans. Geosci. Remote Sens. 2018, 57, 1155–1167. [Google Scholar] [CrossRef]
- Li, H.; Jiang, H.; Gu, X.; Peng, J.; Li, W.; Hong, L.; Tao, C. CLRS: Continual learning benchmark for remote sensing image scene classification. Sensors 2020, 20, 1226. [Google Scholar] [CrossRef]
- Liu, K.; Mattyus, G. Fast Multiclass Vehicle Detection on Aerial Images. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1938–1942. [Google Scholar] [CrossRef]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar] [CrossRef]
- Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-attention mask transformer for universal image segmentation. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 1290–1299. [Google Scholar] [CrossRef]
- Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929. [Google Scholar] [CrossRef]
- Kolesnikov, A.; Lampert, C.H. Seed, expand and constrain: Three principles for weakly-supervised image segmentation. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part IV 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 695–711. [Google Scholar]
- Ahn, J.; Kwak, S. Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4981–4990. [Google Scholar] [CrossRef]
- Kim, B.; Yoo, Y.; Rhee, C.E.; Kim, J. Beyond semantic to instance segmentation: Weakly-supervised instance segmentation via semantic knowledge transfer and self-refinement. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 4278–4287. [Google Scholar] [CrossRef]
- Dai, J.; He, K.; Sun, J. BoxSup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1635–1643. [Google Scholar] [CrossRef]
- Chen, M.; Zhang, Y.; Chen, E.; Hu, Y.; Xie, Y.; Pan, Z. Meta-Knowledge Guided Weakly Supervised Instance Segmentation for Optical and SAR Image Interpretation. Remote Sens. 2023, 15, 2357. [Google Scholar] [CrossRef]
- Kirillov, A.; Wu, Y.; He, K.; Girshick, R. PointRend: Image segmentation as rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9799–9808. [Google Scholar]
- Chen, H.; Sun, K.; Tian, Z.; Shen, C.; Huang, Y.; Yan, Y. BlendMask: Top-down meets bottom-up for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8573–8581. [Google Scholar]
- Liu, Y.; Li, H.; Hu, C.; Luo, S.; Luo, Y.; Chen, C.W. Learning to aggregate multi-scale context for instance segmentation in remote sensing images. IEEE Trans. Neural Netw. Learn. Syst. 2024, 1–15 (Early Access). [Google Scholar] [CrossRef] [PubMed]
- Lu, X.; Wang, B.; Zheng, X.; Li, X. Exploring Models and Data for Remote Sensing Image Caption Generation. IEEE Trans. Geosci. Remote Sens. 2017, 56, 2183–2195. [Google Scholar] [CrossRef]
- Qu, B.; Li, X.; Tao, D.; Lu, X. Deep semantic understanding of high resolution remote sensing image. In Proceedings of the 2016 International Conference on Computer, Information and Telecommunication Systems (Cits), Kunming, China, 6–8 July 2016; pp. 1–5. [Google Scholar] [CrossRef]
Results (AP) on NWPU and iSAID for pseudo-labels generated from raw cross-attention maps vs. DiffuPrompter under different mask thresholds (Section 3.5.1).

| Mask Source | Threshold | NWPU AP | AP50 | AP75 | iSAID AP | AP50 | AP75 |
|---|---|---|---|---|---|---|---|
| Cross-Attention | 0.3 | 10.3 | 17.7 | 13.2 | 3.1 | 7.4 | 5.5 |
| Cross-Attention | 0.4 | 17.2 | 25.1 | 13.5 | 9.2 | 15.2 | 10.6 |
| Cross-Attention | 0.5 | 15.6 | 22.3 | 19.7 | 5.3 | 9.1 | 7.4 |
| DiffuPrompter | 0.3 | 22.5 | 37.9 | 30.1 | 5.7 | 15.6 | 10.9 |
| DiffuPrompter | 0.4 | 27.3 | 50.2 | 36.1 | 15.4 | 31.2 | 19.4 |
| DiffuPrompter | 0.5 | 25.4 | 47.9 | 33.0 | 13.1 | 27.5 | 16.3 |
Effect of the number of sampling times on results (AP) for NWPU and iSAID (Section 3.5.2).

| Sampling Times | NWPU AP | AP50 | AP75 | iSAID AP | AP50 | AP75 |
|---|---|---|---|---|---|---|
| 1 | 4.3 | 6.7 | 4.2 | 2.4 | 3.7 | 2.1 |
| 5 | 7.6 | 10.3 | 8.6 | 4.5 | 7.3 | 6.5 |
| 10 | 10.3 | 12.5 | 10.5 | 5.4 | 9.6 | 7.8 |
| 20 | 20.5 | 48.6 | 33.3 | 12.5 | 27.8 | 16.5 |
| 50 | 27.3 | 50.2 | 36.1 | 15.4 | 31.2 | 19.4 |
| 100 | 27.2 | 50.4 | 36.2 | 15.6 | 31.1 | 19.5 |
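The saturation beyond roughly 50 sampling times is consistent with simple variance reduction: averaging N independent noisy estimates of the same map shrinks the error roughly as 1/sqrt(N), so extra samples yield diminishing returns. The toy snippet below illustrates this with purely synthetic data; it does not use the paper's model or numbers.

```python
import numpy as np

# Toy illustration: averaging N independent noisy copies of a map reduces the
# reconstruction error roughly as 1/sqrt(N), so the benefit eventually saturates.
rng = np.random.default_rng(0)
true_map = np.zeros((64, 64))
true_map[20:40, 20:40] = 1.0  # synthetic "object" region

for n_samples in (1, 5, 10, 20, 50, 100):
    samples = true_map + rng.normal(scale=0.5, size=(n_samples, 64, 64))
    averaged = samples.mean(axis=0)
    err = np.abs(averaged - true_map).mean()
    print(f"N={n_samples:3d}  mean abs. error of averaged map: {err:.3f}")
```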
Instance segmentation results on NWPU when training with real labels (R), DiffuPrompter pseudo-labels (P), or both (Section 3.6).

| Training Set | Method | Size | AP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|---|---|
| Training with Pure Real Label | | | | | | | | |
| NWPU | Mask R-CNN | R: 0.8 k (all) | 58.3 | 90.2 | 60.7 | 40.9 | 56.6 | 61.1 |
| NWPU | Cascade R-CNN | R: 0.8 k (all) | 59.8 | 91.9 | 66.6 | 45.3 | 60.0 | 67.3 |
| NWPU | Mask2Former | R: 0.8 k (all) | 61.3 | 92.5 | 68.6 | 46.3 | 62.7 | 69.5 |
| NWPU | Mask R-CNN | R: 0.6 k | 50.2 | 83.1 | 55.4 | 31.2 | 48.3 | 57.5 |
| NWPU | Cascade R-CNN | R: 0.6 k | 55.4 | 86.1 | 62.4 | 40.3 | 49.7 | 62.1 |
| NWPU | Mask2Former | R: 0.6 k | 56.2 | 87.1 | 64.1 | 42.5 | 53.8 | 64.2 |
| Training with Pure Pseudo-Label | | | | | | | | |
| DiffuPrompter | Mask R-CNN | P: 9.3 k | 27.3 | 50.2 | 36.1 | 17.2 | 25.1 | 35.4 |
| DiffuPrompter | Cascade R-CNN | P: 9.3 k | 29.9 | 52.3 | 38.1 | 19.4 | 28.7 | 35.2 |
| DiffuPrompter | Mask2Former | P: 9.3 k | 30.3 | 53.4 | 40.1 | 21.3 | 30.5 | 37.2 |
| Training with Pseudo and Real Label | | | | | | | | |
| DiffuPrompter & NWPU | Mask R-CNN | P: 9.3 k + R: 0.4 k | 50.6 | 85.3 | 54.2 | 32.4 | 53.3 | 58.6 |
| DiffuPrompter & NWPU | Cascade R-CNN | P: 9.3 k + R: 0.4 k | 52.1 | 86.5 | 59.1 | 33.4 | 55.3 | 64.0 |
| DiffuPrompter & NWPU | Mask2Former | P: 9.3 k + R: 0.4 k | 54.6 | 88.7 | 64.2 | 40.9 | 53.6 | 62.1 |
| DiffuPrompter & NWPU | Mask R-CNN | P: 9.3 k + R: 0.6 k | 55.6 | 89.3 | 60.2 | 37.4 | 56.3 | 61.6 |
| DiffuPrompter & NWPU | Cascade R-CNN | P: 9.3 k + R: 0.6 k | 59.1 | 90.5 | 65.1 | 45.3 | 58.9 | 67.0 |
| DiffuPrompter & NWPU | Mask2Former | P: 9.3 k + R: 0.6 k | 60.9 | 92.6 | 66.9 | 46.0 | 61.8 | 68.2 |
Instance segmentation results on iSAID when training with real labels (R), DiffuPrompter pseudo-labels (P), or both (Section 3.6).

| Training Set | Method | Size | AP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|---|---|
| Training with Pure Real Label | | | | | | | | |
| iSAID | Mask R-CNN | R: 2.8 k | 34.8 | 57.4 | 37.0 | 20.5 | 43.2 | 50.3 |
| iSAID | Cascade R-CNN | R: 2.8 k | 35.6 | 57.8 | 38.0 | 20.8 | 44.3 | 52.7 |
| iSAID | Mask2Former | R: 2.8 k | 37.1 | 59.4 | 40.2 | 20.5 | 45.1 | 55.6 |
| iSAID | Mask R-CNN | R: 2.2 k | 29.5 | 52.3 | 31.4 | 12.5 | 37.7 | 46.9 |
| iSAID | Cascade R-CNN | R: 2.2 k | 31.2 | 51.4 | 31.6 | 13.5 | 39.7 | 47.2 |
| iSAID | Mask2Former | R: 2.2 k | 33.8 | 53.1 | 33.6 | 14.1 | 39.9 | 47.8 |
| Training with Pure Pseudo-Label | | | | | | | | |
| DiffuPrompter | Mask R-CNN | P: 9.3 k | 15.4 | 31.2 | 19.4 | 9.3 | 20.6 | 33.7 |
| DiffuPrompter | Cascade R-CNN | P: 9.3 k | 16.1 | 32.5 | 21.5 | 11.1 | 22.7 | 36.2 |
| DiffuPrompter | Mask2Former | P: 9.3 k | 17.9 | 36.2 | 24.3 | 14.5 | 22.5 | 36.3 |
| Training with Pseudo and Real Label | | | | | | | | |
| DiffuPrompter & iSAID | Mask R-CNN | P: 9.3 k + R: 1.4 k | 31.6 | 54.3 | 32.4 | 15.3 | 41.7 | 48.4 |
| DiffuPrompter & iSAID | Cascade R-CNN | P: 9.3 k + R: 1.4 k | 32.7 | 55.1 | 34.3 | 15.8 | 43.3 | 51.1 |
| DiffuPrompter & iSAID | Mask2Former | P: 9.3 k + R: 1.4 k | 33.4 | 56.4 | 35.3 | 16.3 | 45.7 | 54.5 |
| DiffuPrompter & iSAID | Mask R-CNN | P: 9.3 k + R: 2.2 k | 35.1 | 57.3 | 36.4 | 17.3 | 42.7 | 50.4 |
| DiffuPrompter & iSAID | Cascade R-CNN | P: 9.3 k + R: 2.2 k | 35.2 | 57.1 | 38.3 | 17.8 | 43.9 | 52.5 |
| DiffuPrompter & iSAID | Mask2Former | P: 9.3 k + R: 2.2 k | 36.8 | 58.7 | 38.9 | 17.9 | 45.2 | 55.3 |
Cross-dataset (domain generalization) results (Section 3.7).

| Training Set | Data Size | Test Set | AP | AP50 | AP75 |
|---|---|---|---|---|---|
| NWPU | R: 0.8 k (all) | iSAID | 17.9 | 29.0 | 20.1 |
| iSAID | R: 2.8 k (all) | NWPU | 47.2 | 78.5 | 51.0 |
| DiffuPrompter & NWPU | P: 9.3 k + R: 0.6 k | iSAID | 22.5 | 40.3 | 23.0 |
| DiffuPrompter & iSAID | P: 9.3 k + R: 2.2 k | NWPU | 45.4 | 72.6 | 48.1 |
| DiffuPrompter & NWPU | P: 9.3 k + R: 0.8 k (all) | iSAID | 25.2 | 45.1 | 26.0 |
| DiffuPrompter & iSAID | P: 9.3 k + R: 2.8 k (all) | NWPU | 50.3 | 82.7 | 53.5 |
Pseudo-label generation cost (seconds per image) and downstream results for CAM, DiffuMask, and DiffuPrompter.

| Annotation Method | Segmenter | Seconds/Image | NWPU AP | AP50 | AP75 | iSAID AP | AP50 | AP75 |
|---|---|---|---|---|---|---|---|---|
| CAM | Mask R-CNN | 2.5 | 8.4 | 15.3 | 10.7 | 3.1 | 7.4 | 5.5 |
| CAM | Mask2Former | 2.5 | 7.9 | 16.1 | 11.2 | 3.5 | 6.9 | 5.7 |
| DiffuMask | Mask R-CNN | 35.3 | 12.3 | 17.9 | 13.1 | 9.5 | 13.6 | 11.2 |
| DiffuMask | Mask2Former | 35.3 | 12.9 | 18.1 | 14.5 | 10.2 | 14.2 | 10.9 |
| DiffuPrompter | Mask R-CNN | 297.3 | 27.3 | 50.2 | 36.1 | 15.4 | 31.2 | 19.4 |
| DiffuPrompter | Mask2Former | 297.3 | 30.3 | 53.4 | 40.1 | 17.9 | 36.2 | 24.3 |
Comparison with weakly supervised methods on NWPU and iSAID (Section 3.8).

| Method | Sup. | NWPU AP | AP50 | AP75 | iSAID AP | AP50 | AP75 |
|---|---|---|---|---|---|---|---|
| CAM [35] | | 8.4 | 15.3 | 10.7 | 3.1 | 7.4 | 5.5 |
| SEC [36] | | 21.4 | 30.0 | 23.1 | 4.2 | 12.7 | 10.3 |
| AffinityNet [37] | | 22.5 | 37.9 | 30.1 | 5.7 | 15.6 | 10.9 |
| BESTIE [38] | | 13.3 | 25.9 | 13.8 | 7.3 | 9.4 | 8.5 |
| BoxSup [39] | | 27.5 | 53.1 | 37.9 | 10.3 | 20.2 | 14.1 |
| MGWI-Net [40] | | 29.8 | 62.9 | 25.8 | - | - | - |
| DiffuPrompter (Mask R-CNN) | | 27.3 | 50.2 | 36.1 | 11.4 | 22.6 | 14.4 |
| DiffuPrompter (Cascade R-CNN) | | 29.9 | 52.3 | 38.1 | 13.1 | 25.5 | 16.3 |
Per-class AP comparison with fully supervised instance segmentation methods on NWPU.

| Method | AP | AI | SH | ST | BD | TC | BC | PL | HB | BR | VE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mask R-CNN | 58.3 | 28.4 | 52.8 | 69.6 | 81.4 | 59.6 | 69.6 | 84.3 | 60.7 | 25.8 | 50.6 |
| PANet [18] | 64.8 | 50.6 | 53.5 | 78.4 | 83.5 | 73.0 | 78.1 | 87.2 | 58.6 | 33.8 | 51.6 |
| PointRend [41] | 65.4 | 54.5 | 53.2 | 75.7 | 84.3 | 72.4 | 74.4 | 90.1 | 58.8 | 35.9 | 54.7 |
| BlendMask [42] | 65.7 | 48.1 | 51.1 | 79.8 | 84.0 | 72.4 | 76.7 | 91.5 | 58.9 | 39.6 | 54.6 |
| CATNet [43] | 73.3 | 51.9 | 64.4 | 87.1 | 89.4 | 75.8 | 79.7 | 95.0 | 65.0 | 53.2 | 72.0 |
| DiffuPrompter (Mask R-CNN) | 27.3 | 17.9 | 27.5 | 35.3 | 37.1 | 30.4 | 31.1 | 40.1 | 27.9 | 9.9 | 15.8 |
Per-class AP under different settings of t and the mask threshold.

| t | Threshold | AP | AI | SH | ST | BD | TC | BC | PL | HA | BR | VE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | 0.3 | 12.3 | 8.9 | 11.5 | 8.6 | 22.5 | 11.3 | 21.6 | 15.9 | 9.6 | 9.1 | 4.1 |
| 30 | 0.4 | 20.5 | 16.7 | 19.7 | 15.7 | 18.4 | 22.6 | 28.5 | 26.4 | 27.8 | 11.9 | 17.3 |
| 30 | 0.5 | 18.5 | 16.0 | 20.7 | 19.7 | 19.4 | 25.0 | 12.9 | 19.2 | 23.9 | 10.6 | 17.5 |
| 40 | 0.3 | 22.5 | 27.7 | 30.0 | 21.1 | 30.0 | 19.7 | 28.0 | 22.1 | 21.8 | 10.5 | 14.1 |
| 40 | 0.4 | 27.3 | 17.9 | 27.5 | 35.3 | 37.1 | 30.4 | 31.1 | 40.1 | 27.9 | 9.9 | 15.8 |
| 40 | 0.5 | 25.4 | 35.0 | 35.0 | 20.1 | 35.0 | 18.9 | 18.9 | 35.0 | 30.8 | 12.7 | 12.6 |
| 50 | 0.3 | 19.5 | 23.7 | 14.2 | 25.0 | 21.1 | 24.0 | 21.1 | 11.3 | 23.0 | 25.0 | 6.4 |
| 50 | 0.4 | 23.7 | 30.6 | 19.1 | 26.3 | 32.0 | 28.8 | 29.4 | 33.0 | 16.7 | 8.7 | 12.4 |
| 50 | 0.5 | 20.1 | 32.4 | 30.5 | 20.9 | 24.4 | 15.9 | 21.0 | 24.6 | 16.8 | 8.5 | 6.2 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, H.; Wei, Y.; Peng, H.; Zhang, W. DiffuPrompter: Pixel-Level Automatic Annotation for High-Resolution Remote Sensing Images with Foundation Models. Remote Sens. 2024, 16, 2004. https://doi.org/10.3390/rs16112004