Mean Inflection Point Distance: Artificial Intelligence Mapping Accuracy Evaluation Index—An Experimental Case Study of Building Extraction
1. Introduction
The main contributions of this paper are as follows:
- (1) An inflection point matching algorithm for vector polygons is proposed to match the inflection points of the extracted building contour with those of the reference contour.
- (2) We define and formalize the edge inflection point of a vector contour. The inflection points extracted by artificial intelligence and those of the ground truth do not always correspond one-to-one; if they are not distinguished, the evaluation error increases significantly. Experiments show that clearly distinguishing the inflection points on the contour is important.
- (3) The mean inflection point distance (MPD) is calculated and implemented, and its effectiveness is verified via experiments.
2. Methods
2.1. Data Preprocessing
- Mask one-to-one correspondence: The extracted building spots and the reference spots on each image are matched one-to-one. For each extracted spot, we computed its IoU with every reference spot on the image; the pair with the largest IoU, provided it exceeds a threshold (we chose 0.5), is taken as a match (a code sketch of this pipeline follows this list).
- Binarization: The extracted RGB masks are converted to grayscale and thresholded to obtain binary masks.
- Contour extraction: Extract contours from the binarized mask. For example, we used the findContours [39] algorithm to extract the contours.
- Regularization: The vector contour is regularized. For example, we used the Douglas–Peucker (DP) [40] algorithm to obtain the inflection points of the extracted contours. Inflection points of the reference contour were obtained using manual annotations.
- Inflection point matching: We used the dynamic programming [41] inflection point matching algorithm to obtain matched pairs of inflection points. The problem is stated as follows: the extracted contour A has m ordered inflection points {a_1, ..., a_m}, which divide A into m segments, and the reference contour B has n ordered inflection points {b_1, ..., b_n}, which divide B into n segments. Inflection point matching of the two contours is a multivalued mapping from the inflection point set of A to the inflection point set of B. There are three cases of inflection point matching between two polygon contours: one-to-one, one-to-many, and many-to-one. All inflection points are required to participate in matching; the mapping must therefore be surjective, which turns the inflection point matching of the two contours into a target optimization problem (see the dynamic programming sketch after this list).
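The preprocessing steps above can be illustrated with a minimal sketch, assuming OpenCV 4.x and NumPy. The helper names (`mask_iou`, `match_spots`, `contour_inflection_points`) and the Douglas–Peucker epsilon are our own illustrative choices, not settings from the paper; only the IoU threshold of 0.5 is taken from the text above.

```python
# Hedged sketch of the Section 2.1 preprocessing pipeline (steps 1-4).
import cv2
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two binary masks of equal shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / union if union else 0.0

def match_spots(extracted, references, iou_thresh=0.5):
    """Greedy one-to-one matching: each extracted spot is paired with the
    unused reference spot of highest IoU, kept only above the threshold."""
    pairs, used = [], set()
    for i, ext in enumerate(extracted):
        ious = [mask_iou(ext, ref) if j not in used else -1.0
                for j, ref in enumerate(references)]
        j = int(np.argmax(ious))
        if ious[j] >= iou_thresh:
            pairs.append((i, j))
            used.add(j)
    return pairs

def contour_inflection_points(mask_rgb, dp_epsilon=2.0):
    """Binarize, extract the outer contour with findContours [39], and
    simplify it with Douglas-Peucker [40] to obtain inflection points."""
    gray = cv2.cvtColor(mask_rgb, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)   # keep the largest spot
    poly = cv2.approxPolyDP(contour, dp_epsilon, closed=True)
    return poly.reshape(-1, 2)                     # (m, 2) ordered points
```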
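The dynamic programming matching itself can be sketched as a DTW-style alignment of the two ordered point sequences, which naturally yields one-to-one, one-to-many, and many-to-one correspondences while using every point (i.e., the mapping is surjective). This is a minimal sketch assuming a plain Euclidean cost; the exact cost function and start-point handling in [41] may differ.

```python
# Hedged sketch of dynamic-programming inflection point matching.
import numpy as np

def match_inflection_points(A: np.ndarray, B: np.ndarray):
    """A: (m, 2) ordered inflection points of the extracted contour;
    B: (n, 2) of the reference contour. Returns matched index pairs
    minimizing the total Euclidean distance."""
    m, n = len(A), len(B)
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # (m, n)
    D = np.full((m, n), np.inf)
    D[0, 0] = cost[0, 0]
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            best = min(D[i - 1, j] if i else np.inf,             # many-to-one
                       D[i, j - 1] if j else np.inf,             # one-to-many
                       D[i - 1, j - 1] if i and j else np.inf)   # one-to-one
            D[i, j] = cost[i, j] + best
    # Backtrack to recover the matched pairs.
    pairs, i, j = [], m - 1, n - 1
    while i > 0 or j > 0:
        pairs.append((i, j))
        moves = [(D[i - 1, j - 1] if i and j else np.inf, i - 1, j - 1),
                 (D[i - 1, j] if i else np.inf, i - 1, j),
                 (D[i, j - 1] if j else np.inf, i, j - 1)]
        _, i, j = min(moves, key=lambda t: t[0])
    pairs.append((0, 0))
    return pairs[::-1]
```

Because building contours are closed, a full implementation would additionally optimize over the cyclic starting point of one of the two sequences.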
2.2. Mean Inflection Point Distance
An inflection point is classified as an edge inflection point only if it meets all of the following conditions:
- (1) The inflection point must be matched 1-to-n or n-to-1; if the match is 1-to-1, it is not an edge inflection point;
- (2) The foot of the perpendicular from the inflection point to the corresponding edge lies on that edge;
- (3) The distance from the inflection point to its matched inflection point satisfies a distance-threshold condition.
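With matched pairs in hand, the metric itself is a mean over matched inflection point distances. The following is a plausible formalization consistent with the definitions above, not the paper's verbatim formula:

```latex
% A plausible formalization of MPD, where M is the set of matched
% inflection point pairs between the extracted contour A and the
% reference contour B produced by the Section 2.1 pipeline.
\[
\mathrm{MPD} \;=\; \frac{1}{\lvert M \rvert}
\sum_{(a_i,\, b_j) \in M} \lVert a_i - b_j \rVert_2
\]
```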
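A corresponding computation under the same assumption, using pairs produced by the matcher sketched in Section 2.1 (the helper name `mpd` is ours):

```python
import numpy as np

def mpd(A, B, pairs):
    """Mean Euclidean distance over matched inflection point pairs.
    Restricting `pairs` to the edge inflection points identified by the
    three conditions above would give an MPD_EP-style value instead."""
    return float(np.mean([np.linalg.norm(A[i] - B[j]) for i, j in pairs]))
```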
3. Experiment and Analysis
3.1. Experimental Setting
3.1.1. Experimental Hardware and Software Environment
3.1.2. Description of Experimental Data
3.1.3. Experiment Design
3.2. Evaluation Metrics
3.3. Accuracy Evaluation of Buildings with Different Complexities in Shape/Construction
3.4. Overall Accuracy Evaluation
3.5. Analysis of Experimental Results
4. Discussion
4.1. Comparison with the Networks That Predict Buildings
4.2. More Experiments with Different IoU and MPD Thresholds
4.3. Accuracy Analysis of HD Metrics
4.4. Accuracy Analysis of MPD Metrics
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Moser, G.; Serpico, S.B.; Benediktsson, J.A. Land-Cover Mapping by Markov Modeling of Spatial–Contextual Information in Very-High-Resolution Remote Sensing Images. Proc. IEEE 2013, 101, 631–651.
2. Friedl, M.A.; McIver, D.K.; Hodges, J.C.F.; Zhang, X.Y.; Muchoney, D.; Strahler, A.H.; Woodcock, C.E.; Gopal, S.; Schneider, A.; Cooper, A.; et al. Global Land Cover Mapping from MODIS: Algorithms and Early Results. Remote Sens. Environ. 2002, 83, 287–302.
3. Maus, V.; Camara, G.; Cartaxo, R.; Sanchez, A.; Ramos, F.M.; de Queiroz, G.R. A Time-Weighted Dynamic Time Warping Method for Land-Use and Land-Cover Mapping. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 3729–3739.
4. Longbotham, N.; Chaapel, C.; Bleiler, L.; Padwick, C.; Emery, W.J.; Pacifici, F. Very High Resolution Multiangle Urban Classification Analysis. IEEE Trans. Geosci. Remote Sens. 2012, 50, 1155–1170.
5. Li, X.; Xu, F.; Xia, R.; Li, T.; Chen, Z.; Wang, X.; Xu, Z.; Lyu, X. Encoding Contextual Information by Interlacing Transformer and Convolution for Remote Sensing Imagery Semantic Segmentation. Remote Sens. 2022, 14, 4065.
6. Fritsch, J.; Kuhnl, T.; Geiger, A. A New Performance Measure and Evaluation Benchmark for Road Detection Algorithms. In Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), The Hague, The Netherlands, 6–9 October 2013; pp. 1693–1700.
7. Zhang, H.; Liao, Y.; Yang, H.; Yang, G.; Zhang, L. A Local–Global Dual-Stream Network for Building Extraction from Very-High-Resolution Remote Sensing Images. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 1269–1283.
8. Cheng, G.; Wang, Y.; Xu, S.; Wang, H.; Xiang, S.; Pan, C. Automatic Road Detection and Centerline Extraction via Cascaded End-to-End Convolutional Neural Network. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3322–3337.
9. Li, W.; He, C.; Fang, J.; Zheng, J.; Fu, H.; Yu, L. Semantic Segmentation-Based Building Footprint Extraction Using Very High-Resolution Satellite Images and Multi-Source GIS Data. Remote Sens. 2019, 11, 403.
10. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
11. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
12. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2015; Volume 9351, pp. 234–241. ISBN 978-3-319-24573-7.
13. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Stoyanov, D., Taylor, Z., Carneiro, G., Syeda-Mahmood, T., Martel, A., Maier-Hein, L., Tavares, J.M.R.S., Bradley, A., Papa, J.P., Belagiannis, V., et al., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 11045, pp. 3–11. ISBN 978-3-030-00888-8.
14. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
15. Dai, J.; He, K.; Sun, J. Instance-Aware Semantic Segmentation via Multi-Task Network Cascades. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3150–3158.
16. Luo, M.; Ji, S.; Wei, S. A Diverse Large-Scale Building Dataset and a Novel Plug-and-Play Domain Generalization Method for Building Extraction. arXiv 2022, arXiv:2208.10004.
17. Ji, S.; Wei, S.; Lu, M. Fully Convolutional Networks for Multisource Building Extraction from an Open Aerial and Satellite Imagery Data Set. IEEE Trans. Geosci. Remote Sens. 2019, 57, 574–586.
18. Zhu, Q.; Liao, C.; Hu, H.; Mei, X.; Li, H. MAP-Net: Multiple Attending Path Neural Network for Building Footprint Extraction from Remote Sensed Imagery. IEEE Trans. Geosci. Remote Sens. 2021, 59, 6169–6181.
19. Wang, L.; Fang, S.; Meng, X.; Li, R. Building Extraction with Vision Transformer. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–11.
20. Jin, Y.; Xu, W.; Zhang, C.; Luo, X.; Jia, H. Boundary-Aware Refined Network for Automatic Building Extraction in Very High-Resolution Urban Aerial Images. Remote Sens. 2021, 13, 692.
21. Fang, F.; Wu, K.; Liu, Y.; Li, S.; Wan, B.; Chen, Y.; Zheng, D. A Coarse-to-Fine Contour Optimization Network for Extracting Building Instances from High-Resolution Remote Sensing Imagery. Remote Sens. 2021, 13, 3814.
22. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Zitnick, C.L. Microsoft COCO: Common Objects in Context; Springer International Publishing: Berlin/Heidelberg, Germany, 2014.
23. Rezatofighi, H.; Tsoi, N.; Gwak, J.Y.; Sadeghian, A.; Savarese, S. Generalized Intersection over Union: A Metric and a Loss for Bounding Box Regression. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
24. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. Proc. AAAI Conf. Artif. Intell. 2020, 34, 12993–13000.
25. Cheng, B.; Girshick, R.; Dollár, P.; Berg, A.C.; Kirillov, A. Boundary IoU: Improving Object-Centric Image Segmentation Evaluation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021.
26. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
27. Heimann, T.; Van Ginneken, B.; Styner, M.A.; Arzhaeva, Y.; Aurich, V.; Bauer, C.; Beck, A.; Becker, C.; Beichel, R.; Bekes, G.; et al. Comparison and Evaluation of Methods for Liver Segmentation from CT Datasets. IEEE Trans. Med. Imaging 2009, 28, 1251–1265.
28. Zhu, Y.; Huang, B.; Gao, J.; Huang, E.; Chen, H. Adaptive Polygon Generation Algorithm for Automatic Building Extraction. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14.
29. Wu, Y.; Xu, L.; Chen, Y.; Wong, A.; Clausi, D.A. TAL: Topography-Aware Multi-Resolution Fusion Learning for Enhanced Building Footprint Extraction. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
30. Kristan, M.; Leonardis, A.; Matas, J.; Felsberg, M.; Pflugfelder, R.; Čehovin, L.; Vojír, T.; Häger, G.; Lukežič, A.; Fernández, G.; et al. The Visual Object Tracking VOT2016 Challenge Results. In Lecture Notes in Computer Science, Proceedings of the Computer Vision—ECCV 2016 Workshops, Amsterdam, The Netherlands, 8–16 October 2016; Hua, G., Jégou, H., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 777–823.
31. Baltsavias, E.P. Object Extraction and Revision by Image Analysis Using Existing Geodata and Knowledge: Current Status and Steps towards Operational Systems. ISPRS J. Photogramm. Remote Sens. 2004, 58, 129–151.
32. Lowe, D.G. Object Recognition from Local Scale-Invariant Features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 2, pp. 1150–1157.
33. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
34. Di, Y.; Shao, Y.; Chen, L.K. Real-Time Wave Mitigation for Water-Air OWC Systems via Beam Tracking. IEEE Photonics Technol. Lett. 2021, 34, 47–50.
35. Leal-Taixé, L.; Milan, A.; Reid, I.; Roth, S.; Schindler, K. MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking. arXiv 2015, arXiv:1504.01942.
36. Automated Segmentation of Colorectal Tumor in 3D MRI Using 3D Multiscale Densely Connected Convolutional Neural Network. Available online: https://www.hindawi.com/journals/jhe/2019/1075434/ (accessed on 20 February 2023).
37. Hung, W.-L.; Yang, M.-S. Similarity Measures of Intuitionistic Fuzzy Sets Based on Hausdorff Distance. Pattern Recognit. Lett. 2004, 25, 1603–1611.
38. Rote, G. Computing the Minimum Hausdorff Distance between Two Point Sets on a Line under Translation. Inf. Process. Lett. 1991, 38, 123–127.
39. Suzuki, S.; Abe, K. Topological Structural Analysis of Digitized Binary Images by Border Following. Comput. Vis. Graph. Image Process. 1985, 30, 32–46.
40. Douglas, D.H.; Peucker, T.K. Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or Its Caricature. In Classics in Cartography; Dodge, M., Ed.; Wiley: Hoboken, NJ, USA, 2011; pp. 15–28. ISBN 978-0-470-68174-9.
41. Petrakis, E.; Diplaros, A.; Milios, E. Matching and Retrieval of Distorted and Occluded Shapes Using Dynamic Programming. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 1501–1516.
42. Kirillov, A.; Wu, Y.; He, K.; Girshick, R. PointRend: Image Segmentation as Rendering. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9796–9805.
43. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 9992–10002.
44. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
45. Rottensteiner, F.; Sohn, G.; Gerke, M.; Wegner, J.D.; Breitkopf, U.; Jung, J. Results of the ISPRS Benchmark on Urban Object Detection and 3D Building Reconstruction. ISPRS J. Photogramm. Remote Sens. 2014, 93, 256–271.
46. Jozdani, S.; Chen, D. On the Versatility of Popular and Recently Proposed Supervised Evaluation Metrics for Segmentation Quality of Remotely Sensed Images: An Experimental Case Study of Building Extraction. ISPRS J. Photogramm. Remote Sens. 2020, 160, 275–290.
47. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
| Building | Method | IoU | Boundary IoU | MSD | MPD_EP | MPD |
|---|---|---|---|---|---|---|
| a | PointRend | 97.38% | 94.11% | 2.83 | 20.66 | 3.15 |
| a | Swin Transformer | 94.82% | 87.65% | 4.58 | 27.71 | 6.40 |
| a | Mask R-CNN | 94.35% | 87.16% | 5.01 | 37.68 | 8.34 |
| b | PointRend | 95.59% | 95.07% | 1.89 | 28.75 | 4.41 |
| b | Swin Transformer | 93.94% | 92.81% | 2.58 | 24.51 | 3.97 |
| b | Mask R-CNN | 94.51% | 93.73% | 2.39 | 21.99 | 9.38 |
| c | PointRend | 94.61% | 90.84% | 2.68 | 5.39 | 5.39 |
| c | Swin Transformer | 92.73% | 86.69% | 3.68 | 6.34 | 4.59 |
| c | Mask R-CNN | 93.62% | 88.06% | 3.23 | 5.89 | 5.89 |
| Building | Method | IoU | Boundary IoU | MSD | MPD_EP | MPD |
|---|---|---|---|---|---|---|
| a | PointRend | 85.56% | 73.15% | 6.97 | 26.45 | 17.12 |
| a | Swin Transformer | 86.33% | 69.30% | 8.47 | 39.67 | 19.44 |
| a | Mask R-CNN | 85.61% | 69.08% | 8.83 | 47.69 | 25.23 |
| b | PointRend | 94.94% | 91.40% | 2.80 | 15.74 | 8.84 |
| b | Swin Transformer | 92.30% | 87.11% | 4.10 | 31.48 | 11.70 |
| b | Mask R-CNN | 92.38% | 87.12% | 3.99 | 64.80 | 12.93 |
| c | PointRend | 95.36% | 90.59% | 4.78 | 31.71 | 20.31 |
| c | Swin Transformer | 93.81% | 82.62% | 5.75 | 36.02 | 18.91 |
| c | Mask R-CNN | 91.05% | 76.47% | 8.50 | 39.16 | 32.97 |
| Building | Method | IoU | Boundary IoU | MSD | MPD_EP | MPD |
|---|---|---|---|---|---|---|
| a | PointRend | 72.07% | 68.82% | 16.48 | 95.58 | 47.75 |
| a | Swin Transformer | 80.81% | 77.80% | 12.56 | 32.41 | 27.04 |
| a | Mask R-CNN | 90.26% | 89.92% | 8.53 | 15.22 | 8.34 |
| b | PointRend | 63.86% | 57.61% | 18.26 | 34.79 | 30.11 |
| b | Swin Transformer | 69.87% | 59.40% | 15.76 | 37.10 | 34.61 |
| b | Mask R-CNN | 71.36% | 65.33% | 14.21 | 32.56 | 28.24 |
| c | PointRend | 52.52% | 47.78% | 53.16 | 109.91 | 109.91 |
| c | Swin Transformer | 53.58% | 48.60% | 51.59 | 109.23 | 109.23 |
| c | Mask R-CNN | 50.26% | 44.48% | 54.90 | 109.96 | 109.96 |
| Methods | IoU (average) | Boundary IoU (average) | MSD (average) | MPD_EP (average) | MPD (average) |
|---|---|---|---|---|---|
| PointRend | 90.75% | 89.08% | 3.34 | 19.57 | 9.09 |
| Swin Transformer | 89.75% | 87.79% | 3.58 | 21.76 | 10.13 |
| Mask R-CNN | 89.90% | 87.49% | 3.93 | 23.49 | 12.05 |
| Methods | AP_IoU@0.5:0.9 | AP_IoU@0.5 | AP_IoU@0.75 | MSD | AP_MPD@3 | AP_MPD@7 | AP_MPD@10 |
|---|---|---|---|---|---|---|---|
| PointRend | 0.584 | 0.808 | 0.660 | 3.34 | 0.328 | 0.500 | 0.747 |
| Swin Transformer | 0.625 | 0.874 | 0.709 | 3.58 | 0.277 | 0.405 | 0.642 |
| Mask R-CNN | 0.598 | 0.848 | 0.672 | 3.93 | 0.011 | 0.300 | 0.549 |
| Methods | Counts of Buildings | Maximum Error | Average Error | Mean Square Error |
|---|---|---|---|---|
| Mask R-CNN | 1721 | 410.83 | 21.89 | 24.21 |
| Swin Transformer | 1873 | 412.95 | 19.87 | 22.86 |
| PointRend | 1798 | 441.23 | 26.84 | 38.88 |
| Methods | Counts of Buildings | Maximum Error | Average Error | Mean Square Error |
|---|---|---|---|---|
| Mask R-CNN | 1721 | 287.321 | 11.87 | 17.02 |
| Swin Transformer | 1873 | 152.73 | 10.14 | 8.45 |
| PointRend | 1798 | 92.706 | 9.01 | 8.06 |
Citation: Yu, D.; Li, A.; Li, J.; Xu, Y.; Long, Y. Mean Inflection Point Distance: Artificial Intelligence Mapping Accuracy Evaluation Index—An Experimental Case Study of Building Extraction. Remote Sens. 2023, 15, 1848. https://doi.org/10.3390/rs15071848