A Siamese Network with a Multiscale Window-Based Transformer via an Adaptive Fusion Strategy for High-Resolution Remote Sensing Image Change Detection
Abstract
1. Introduction
2. Materials and Methods
2.1. Related Works
2.1.1. RS-CD Method Based on a Deep Convolutional Network
2.1.2. Transformer-Based RS-CD Method
2.2. Model Overview
2.3. Window-Based Transformer
2.3.1. Patch Embedding
2.3.2. Window-Based Transformer Block
- (a) W-Trans block: As shown in Figure 3I, the W-Trans block consists of window-based multihead self-attention (W-MSA), layer normalization (LN), a multilayer perceptron (MLP), and residual connections. To extract multiscale features of the changes of interest, the module computes self-attention inside multiscale windows, using the same calculation as MSA [31]. W-MSA also greatly reduces computational cost: standard MSA computes global self-attention among all image tokens, which incurs quadratic complexity in the number of tokens [29], whereas W-MSA computes self-attention inside a local window, with complexity linear in the input size. W-MSA is therefore better suited to dense prediction tasks and to tasks involving high-resolution remote sensing images.
- (b) SW-Trans block: The lack of information interaction across windows is the most serious limitation of the W-Trans block because it severely restricts the feature modeling capability of the model. We therefore adopt the shifted-window mechanism proposed by Liu et al. [28] to design the SW-Trans block, shown in Figure 3II; its structure is the same as that of the W-Trans block except that W-MSA is replaced by SW-MSA. The SW-Trans block shifts the window partition by (N/2, N/2) from the image vertex, where (N, N) denotes the window size, and resolves the self-attention computation over the resulting nonstandard windows with a cyclic shift mechanism. The window-based transformer block is expressed mathematically in the equations following this list.
- (c) Self-attention in the local window: W-MSA and SW-MSA compute multihead self-attention by the same method as MSA; the window-based variants compute it inside a local window, whereas standard MSA computes it at the global scale. The mathematical expression of self-attention is also given below.
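The equations referenced in (b) and (c) follow the shifted-window formulation of Liu et al. [28] on which this section is based; we reproduce that standard form here, with z^l denoting the output of block l and ẑ^l the intermediate attention output:

$$
\begin{aligned}
\hat{z}^{l} &= \text{W-MSA}\left(\text{LN}(z^{l-1})\right) + z^{l-1}, & z^{l} &= \text{MLP}\left(\text{LN}(\hat{z}^{l})\right) + \hat{z}^{l},\\
\hat{z}^{l+1} &= \text{SW-MSA}\left(\text{LN}(z^{l})\right) + z^{l}, & z^{l+1} &= \text{MLP}\left(\text{LN}(\hat{z}^{l+1})\right) + \hat{z}^{l+1},
\end{aligned}
$$

and self-attention inside each (shifted) window is computed as

$$
\text{Attention}(Q, K, V) = \text{SoftMax}\left(\frac{QK^{T}}{\sqrt{d}} + B\right)V,
$$

where Q, K, and V are the query, key, and value matrices of the tokens in one window, d is the query/key dimension, and B is the relative position bias.

The window partition and cyclic shift described above can also be sketched in a few lines of PyTorch. This is a minimal illustration, assuming a (B, H, W, C) feature layout with H and W divisible by the window size n; the function names are ours, not the authors':

```python
import torch

def window_partition(x, n):
    """Split a (B, H, W, C) feature map into non-overlapping n x n windows,
    returning (num_windows * B, n * n, C) token groups for W-MSA."""
    B, H, W, C = x.shape
    x = x.view(B, H // n, n, W // n, n, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, n * n, C)

def cyclic_shift(x, n):
    """Roll the feature map by (-n//2, -n//2) so that the shifted (SW-MSA)
    window partition straddles the boundaries of the original one; the
    inverse roll with shifts=(n // 2, n // 2) restores the layout."""
    return torch.roll(x, shifts=(-(n // 2), -(n // 2)), dims=(1, 2))
```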
2.3.3. Multiscale Window Design
2.4. Channel-Related Fusion Mechanism
2.5. Decoder
3. Results
3.1. Datasets
3.1.1. CDD
3.1.2. WHU-CD
3.2. Metrics
3.2.1. Evaluation Criteria
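Precision (P), recall (R), F1 score, overall accuracy (OA), and intersection over union (IoU), as reported in Tables 1–5, presumably follow their standard definitions in terms of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN):

$$
P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad F1 = \frac{2PR}{P + R}, \quad OA = \frac{TP + TN}{TP + TN + FP + FN}, \quad IoU = \frac{TP}{TP + FP + FN}.
$$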
3.2.2. Implementation Details
- (a) Parameters: We use the Adam optimizer, training for 150 epochs with a learning rate of 5 × 10−4, a weight decay of 5 × 10−4, a step size of 50, and a momentum of 0.9 (a configuration sketch follows this list). PyTorch is used as the deep learning framework, and training runs on an NVIDIA RTX A6000 graphics card.
- (b) Data splitting and data augmentation: We crop the original images of the CDD dataset into small blocks of 256 × 256 pixels and randomly apply simple augmentation operations such as rotation, flipping, and center cropping, obtaining a training set, a validation set, and a test set of 10,000, 3000, and 3000 images, respectively. We crop the original images of the WHU-CD dataset into 512 × 512 pixel blocks without overlap, randomly apply the same augmentation operations, and randomly divide the blocks into a training set, a validation set, and a test set at a ratio of 7:1:2.
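Read literally, the reported hyperparameters map onto a PyTorch setup along the following lines. This is a minimal sketch, not the authors' code: we interpret the "step size of 50" as a StepLR schedule (its decay factor gamma is our assumption), and the stated momentum of 0.9 corresponds to Adam's default first-moment coefficient beta1:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # stand-in module; the actual SWaF-Trans model is defined elsewhere

# Adam with the reported learning rate and weight decay; betas[0] = 0.9
# matches the stated momentum of 0.9 (it is also Adam's default).
optimizer = torch.optim.Adam(
    model.parameters(), lr=5e-4, betas=(0.9, 0.999), weight_decay=5e-4
)

# "Step size of 50" read as decaying the learning rate every 50 of the
# 150 epochs; gamma = 0.1 is an assumed decay factor.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

for epoch in range(150):
    # ... per-batch forward pass, loss computation, and optimizer.step() ...
    scheduler.step()
```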
3.3. Experimental Results on the Dataset
3.3.1. Comparison with Other Methods
- (a) FC-EF [16]: FC-EF is a U-shaped network that concatenates the dual-temporal images into a single image, which is input into the FCN.
- (b) FC-Siam-Di [16]: FC-Siam-Di is a Siamese network with a double U-shaped structure; change information is generated from the dual-temporal features by an absolute difference operation.
- (c) FC-Siam-Conc [16]: FC-Siam-Conc uses a Siamese FCN to extract multilevel change features, establishes skip connections to share same-level feature information, and finally generates change maps through multiple decoding layers.
- (d) DASNet [18]: DASNet uses VGG16 or ResNet50 as the backbone network to extract features, introduces spatial attention and channel attention modules to enhance the resistance of the network to pseudo changes, and finally uses a distance metric module to generate change maps.
- (e) IFNet [15]: IFNet is an end-to-end multiscale feature fusion network; it first extracts representative deep features of the dual-temporal images through an FCN-based Siamese structure, then feeds them into a deeply supervised discriminative network that combines channel attention and spatial attention.
- (f) BiT [25]: BiT is a network that combines a CNN and a transformer: it expresses the dual-temporal images as a set of image tokens, encodes global contextual relationships in space-time using a transformer encoder, then maps the tokens containing rich change information back to pixel space and refines the original features using the transformer decoder.
- (g) SwinSUNet [29]: SwinSUNet is a pure transformer Siamese change detection network with a double U-shaped structure. It uses Swin transformer blocks as the encoder to obtain dual-temporal features, then uses Swin transformer blocks combined with skip connections to decode the change map after feature fusion.
- (h) TransUNetCD [26]: TransUNetCD is an end-to-end network that combines the advantages of UNet and transformers. It first encodes the tokenized image patches from the feature maps of the convolutional network to obtain rich global contextual information, then introduces a differential enhancement module on this basis to generate a differential feature map containing rich change information, achieving precise object localization.
3.3.2. Experimental Results on the CDD Dataset
- (a) Table 1 shows the experimental results of the various methods on the CDD test set. The results show that SWaF-Trans outperforms the other methods with clear advantages; four of its accuracy indices are significantly higher than those of the other methods, and its F1 score reaches 97.1%, 2 percentage points higher than that of BiT.
- (b) Figure 6 shows the change maps and confusion matrices inferred by SWaF-Trans; rows 1 and 2 correspond to changes in cars, roads, and houses, and rows 3 and 4 correspond to changes in farmland and houses, respectively. As shown in the figure, the changes inferred by SWaF-Trans have accurate details: the complete car changes are detected in the box in row 2, and the detailed changes of small buildings are detected in the box in row 4. This shows that our method can accurately detect the changes of interest; can perceive small object changes under seasonal interference with few misses and false detections; and is not easily affected by illumination, color, or weather.
- (c) For large object changes, SWaF-Trans preserves the compactness and boundary integrity of the change regions, and for small object changes, it preserves the accuracy of the location and the plausibility of the morphology. When the scale difference among change samples is large, our method can still capture the feature information of small objects and model their contextual relationships, avoiding the negative effect of large object changes overwhelming the network. As shown in row 4 of Figure 6, only the small object change region inferred by SWaF-Trans conforms to the real situation, which shows that our method is more robust to variation in the scale of change samples. The other methods fail to detect small object changes, such as those of cars and small buildings, and even large objects such as houses and farmland have “jagged” change edges that do not correspond to the real situation. In contrast, the change boundaries of our method are softer and more rounded, and the change features are aggregated in the high-dimensional space, which indicates that SWaF-Trans learns a clear semantic boundary between change and non-change; the inferred changes are therefore more consistent with the real situation.
3.3.3. Experimental Results on the WHU-CD Dataset
- (a) As shown in Table 2, SWaF-Trans outperformed the other methods on the WHU-CD test set, achieving the best results on three indicators, including an F1 score of 93.9%, which is 0.4 percentage points higher than that of the SOTA method TransUNetCD.
- (b) To visualize the results, we generated change maps and confusion matrices for the test set. As shown in Figure 7, SWaF-Trans predicts more accurate change boundaries and more compact change interiors, with higher confidence in the change map patches. The box in row 1 shows that our method detects small building changes that are omitted from the labels in the ground truth, which indicates that our network learns the representational form of small object changes and is not misled by mislabeling. In addition, as on the CDD dataset, the boundaries of the change regions are rounded and unbroken, which implies a strong feature aggregation of the buildings. In contrast, the other methods produce change regions with unclear edges, less compact interiors, and incomplete details; in some cases, the confidence of a change region is even too low for it to be detected.
- (c) The WHU-CD dataset focuses on the detection of changes in buildings, so changes occurring in other land covers can be considered irrelevant. Rows 1 and 3 show that SWaF-Trans performs well in excluding irrelevant changes and is strongly resistant to changes occurring in roads and trees.
- (d) It is worth mentioning that SWaF-Trans does not require pretraining on large datasets, alleviating the transformer model’s over-reliance on data volume. In addition, our method detects the changes of interest using only a simple network, which we attribute to the ability of SWaF-Trans to model the spatiotemporal contextual relationships of multiscale changes and to enhance the representational form of multiscale features.
3.3.4. Learning Curve Comparison
4. Discussion
4.1. Ablation Study
4.1.1. Comparison of the Multiscale Fusion Window Effect
4.1.2. Comparison of Different Fusion Methods
4.2. Parameter Verification Experiment
4.2.1. Effect of the Patch Number
4.2.2. Effect of the Number of Block Layers
4.3. Analysis of Effectiveness
4.3.1. Effectiveness Experiment
- (a) On a 512 × 512 pixel image, a small object is an entity smaller than 32 × 32 pixels, and a large object is an entity larger than 96 × 96 pixels. To reflect more intuitively the effectiveness of SWaF-Trans in detecting small object changes when the change sample scales differ greatly, we designed effectiveness probing experiments. Specifically, we selected the images in the CDD test set with large differences in change scales and aggregated them into Test Set 1 (Test1), selected the images containing only small object changes and aggregated them into Test Set 2 (Test2), and used multiple networks to predict the change maps of CDD, Test1, and Test2 (a selection sketch follows this list).
- (b) Figure 9 and Table 5 show the predicted changes and the accuracy statistics, respectively. As shown in Figure 9, SWaF-Trans accurately detects the shape and location of the car changes in the box, while BiT produces more misses and false detections. Comparing the Test2 and CDD results shows that the accuracy of the inferred changes decreases when the large objects in CDD are eliminated to obtain Test2; however, the F1 score of our method decreases by only 4.63 percentage points, while those of BiT and SwinSUNet decrease by 17.54 and 16.67 points, respectively, which indicates that SWaF-Trans is much better at detecting small object changes than BiT and SwinSUNet.
- (c) As shown in the boxes in rows 1 and 2 of Figure 9, BiT has difficulty accurately detecting small object changes around large objects: the imbalance in the scale of the change samples causes the model to ignore small object samples. Experimentally, we find that false and missed detections occur more frequently when small objects are closer to large objects; we describe this phenomenon as the negative effect that nearby large object change samples exert on small object change detection (SoCD). For this reason, we model multiscale changes inside multiscale windows and use a small-scale window to capture more comprehensive information on small object changes, thereby mitigating the negative effects of scale differences among change samples. In addition, the channel-related fusion mechanism captures rich features of interest, minimizing the loss of small object features and the interference of irrelevant information; SWaF-Trans therefore shows superior RS-CD performance, with an F1 score of 98.24% on Test1, much better than those of BiT and SwinSUNet.
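The construction of Test1 and Test2 can be made concrete with a short sketch. The routing rule below is our reading of the description in (a), with region area used as a proxy for the 32 × 32 and 96 × 96 bounding sizes; the authors' exact selection criteria beyond those thresholds are not specified:

```python
import numpy as np
from scipy import ndimage

# Size thresholds from the paper, on 512 x 512 images: "small" objects are
# smaller than 32 x 32 pixels; "large" objects are larger than 96 x 96 pixels.
SMALL_AREA = 32 * 32
LARGE_AREA = 96 * 96

def region_areas(change_mask):
    """Pixel areas of the connected change regions in a binary ground-truth mask."""
    labels, num = ndimage.label(change_mask)
    return np.array([(labels == i).sum() for i in range(1, num + 1)])

def route_to_test_set(change_mask):
    """Hypothetical routing rule: Test1 collects images whose change samples
    differ greatly in scale (both small and large regions present); Test2
    collects images containing only small object changes."""
    areas = region_areas(np.asarray(change_mask, dtype=bool))
    if areas.size == 0:
        return None
    has_small = bool((areas < SMALL_AREA).any())
    has_large = bool((areas > LARGE_AREA).any())
    if has_small and has_large:
        return "Test1"
    if has_small and not has_large:
        return "Test2"
    return None
```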
4.3.2. Visualization of Effectiveness
5. Conclusions and Future Works
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Khelifi, L.; Mignotte, M. Deep learning for change detection in remote sensing images: Comprehensive review and meta-analysis. IEEE Access 2020, 8, 126385–126400. [Google Scholar] [CrossRef]
- Shi, W.; Zhang, M.; Zhang, R.; Chen, S.; Zhan, Z. Change detection based on artificial intelligence: State-of-the-art and challenges. Remote Sens. 2020, 12, 1688. [Google Scholar] [CrossRef]
- Kennedy, R.E.; Townsend, P.A.; Gross, J.E.; Cohen, W.B.; Bolstad, P.; Wang, Y.; Adams, P. Remote sensing change detection tools for natural resource managers: Understanding concepts and tradeoffs in the design of landscape monitoring projects. Remote Sens. Environ. 2009, 113, 1382–1396. [Google Scholar] [CrossRef]
- Willis, K.S. Remote sensing change detection for ecological monitoring in United States protected areas. Biol. Conserv. 2015, 182, 233–242. [Google Scholar] [CrossRef]
- Todd, W.J. Urban and regional land use change detected by using Landsat data. J. Res. US Geol. Surv. 1977, 5, 529–534. [Google Scholar]
- Zheng, Z.; Zhong, Y.; Wang, J.; Ma, A.; Zhang, L. Building damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural disasters to man-made disasters. Remote Sens. Environ. 2021, 265, 112636. [Google Scholar] [CrossRef]
- Daudt, R.C.; Le Saux, B.; Boulch, A.; Gousseau, Y. Urban change detection for multispectral earth observation using convolutional neural networks. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 2115–2118. [Google Scholar]
- Lv, Z.; Liu, T.; Benediktsson, J.A.; Falco, N. Land cover change detection techniques: Very-high-resolution optical images: A review. IEEE Geosci. Remote Sens. Mag. 2021, 10, 44–63. [Google Scholar] [CrossRef]
- Wen, D.; Huang, X.; Bovolo, F.; Li, J.; Ke, X.; Zhang, A.; Benediktsson, J.A. Change detection from very-high-spatial-resolution optical remote sensing images: Methods, applications, and future directions. IEEE Geosci. Remote Sens. Mag. 2021, 9, 68–101. [Google Scholar] [CrossRef]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Chen, H.; Shi, Z. A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sens. 2020, 12, 1662. [Google Scholar] [CrossRef]
- Liu, T.; Gong, M.; Lu, D.; Zhang, Q.; Zheng, H.; Jiang, F.; Zhang, M. Building change detection for VHR remote sensing images via local–global pyramid network and cross-task transfer learning strategy. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–17. [Google Scholar] [CrossRef]
- Wang, D.; Chen, X.; Jiang, M.; Du, S.; Xu, B.; Wang, J. ADS-Net: An Attention-Based deeply supervised network for remote sensing image change detection. Int. J. Appl. Earth Obs. Geoinf. 2021, 101, 102348. [Google Scholar]
- Wang, Z.; Peng, C.; Zhang, Y.; Wang, N.; Luo, L. Fully convolutional siamese networks based change detection for optical aerial images with focal contrastive loss. Neurocomputing 2021, 457, 155–167. [Google Scholar] [CrossRef]
- Zhang, C.; Yue, P.; Tapete, D.; Jiang, L.; Shangguan, B.; Huang, L.; Liu, G. A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. ISPRS J. Photogramm. Remote Sens. 2020, 166, 183–200. [Google Scholar] [CrossRef]
- Daudt, R.C.; Le Saux, B.; Boulch, A. Fully convolutional siamese networks for change detection. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 4063–4067. [Google Scholar]
- Zhang, M.; Xu, G.; Chen, K.; Yan, M.; Sun, X. Triplet-based semantic relation learning for aerial remote sensing image change detection. IEEE Geosci. Remote Sens. Lett. 2018, 16, 266–270. [Google Scholar] [CrossRef]
- Chen, J.; Yuan, Z.; Peng, J.; Chen, L.; Huang, H.; Zhu, J.; Liu, Y.; Li, H. DASNet: Dual attentive fully convolutional Siamese networks for change detection in high-resolution satellite images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 1194–1206. [Google Scholar] [CrossRef]
- Peng, X.; Zhong, R.; Li, Z.; Li, Q. Optical remote sensing image change detection based on attention mechanism and image difference. IEEE Trans. Geosci. Remote Sens. 2020, 59, 7296–7307. [Google Scholar] [CrossRef]
- Zhang, M.; Shi, W. A feature difference convolutional neural network-based change detection method. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7232–7246. [Google Scholar] [CrossRef]
- Zhang, X.; Yue, Y.; Gao, W.; Yun, S.; Su, Q.; Yin, H.; Zhang, Y. DifUnet++: A satellite images change detection network based on UNet++ and differential pyramid. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5. [Google Scholar] [CrossRef]
- Jiang, H.; Hu, X.; Li, K.; Zhang, J.; Gong, J.; Zhang, M. PGA-SiamNet: Pyramid feature-based attention-guided Siamese network for remote sensing orthoimagery building change detection. Remote Sens. 2020, 12, 484. [Google Scholar] [CrossRef]
- Yang, M.; Jiao, L.; Liu, F.; Hou, B.; Yang, S.; Jian, M. DPFL-Nets: Deep pyramid feature learning networks for multiscale change detection. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6402–6416. [Google Scholar] [CrossRef]
- Fang, S.; Li, K.; Shao, J.; Li, Z. SNUNet-CD: A densely connected Siamese network for change detection of VHR images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5. [Google Scholar] [CrossRef]
- Chen, H.; Qi, Z.; Shi, Z. Remote sensing image change detection with transformers. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–14. [Google Scholar] [CrossRef]
- Li, Q.; Zhong, R.; Du, X.; Du, Y. TransUNetCD: A hybrid transformer network for change detection in optical remote-sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–19. [Google Scholar] [CrossRef]
- Bandara, W.G.C.; Patel, V.M. A transformer-based siamese network for change detection. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 207–210. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
- Zhang, C.; Wang, L.; Cheng, S.; Li, Y. SwinSUNet: Pure transformer network for remote sensing image change detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13. [Google Scholar] [CrossRef]
- Lin, A.; Chen, B.; Xu, J.; Zhang, Z.; Lu, G.; Zhang, D. Ds-transunet: Dual swin transformer u-net for medical image segmentation. IEEE Trans. Instrum. Meas. 2022, 71, 1–15. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Lebedev, M.; Vizilter, Y.V.; Vygolov, O.; Knyaz, V.; Rubis, A.Y. Change detection in remote sensing images using conditional adversarial networks. In Proceedings of the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2018 ISPRS TC II Mid-term Symposium “Towards Photogrammetry 2020”, Riva del Garda, Italy, 4–7 June 2018; Volume 42. [Google Scholar]
- Ji, S.; Wei, S.; Lu, M. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans. Geosci. Remote Sens. 2018, 57, 574–586. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Guo, E.; Fu, X.; Zhu, J.; Deng, M.; Liu, Y.; Zhu, Q.; Li, H. Learning to measure change: Fully convolutional siamese metric networks for scene change detection. arXiv 2018, arXiv:1810.09111. [Google Scholar]
- Lei, T.; Zhang, Q.; Xue, D.; Chen, T.; Meng, H.; Nandi, A.K. End-to-end change detection using a symmetric fully convolutional network for landslide mapping. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 3027–3031. [Google Scholar]
- Zhan, Y.; Fu, K.; Yan, M.; Sun, X.; Wang, H.; Qiu, X. Change detection based on deep siamese convolutional network for optical aerial images. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1845–1849. [Google Scholar] [CrossRef]
- Liu, J.; Gong, M.; Qin, K.; Zhang, P. A deep convolutional coupling network for change detection based on heterogeneous optical and radar images. IEEE Trans. Neural Netw. Learn. Syst. 2016, 29, 545–559. [Google Scholar] [CrossRef] [PubMed]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 3146–3154. [Google Scholar]
- Zhang, D.; Zheng, Z.; Li, M.; Liu, R. CSART: Channel and spatial attention-guided residual learning for real-time object tracking. Neurocomputing 2021, 436, 260–272. [Google Scholar] [CrossRef]
- Shi, Q.; Liu, M.; Li, S.; Liu, X.; Wang, F.; Zhang, L. A deeply supervised attention metric-based network and an open aerial image dataset for remote sensing change detection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–16. [Google Scholar] [CrossRef]
- Strudel, R.; Garcia, R.; Laptev, I.; Schmid, C. Segmenter: Transformer for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 7262–7272. [Google Scholar]
- Meng, X.; Yang, Y.; Wang, L.; Wang, T.; Li, R.; Zhang, C. Class-Guided Swin Transformer for Semantic Segmentation of Remote Sensing Imagery. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part I 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229. [Google Scholar]
- Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning roi transformer for oriented object detection in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2849–2858. [Google Scholar]
- Jannat, F.E.; Willis, A.R. Improving Classification of Remotely Sensed Images with the Swin Transformer. In Proceedings of the SoutheastCon 2022, Mobile, AL, USA, 26 March–3 April 2022; pp. 611–618. [Google Scholar]
- Tong, S.; Qi, K.; Guan, Q.; Zhu, Q.; Yang, C.; Zheng, J. Remote Sensing Scene Classification Using Spatial Transformer Fusion Network. In Proceedings of the IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; pp. 549–552. [Google Scholar]
- Zhang, B.; Gu, S.; Zhang, B.; Bao, J.; Chen, D.; Wen, F.; Wang, Y.; Guo, B. Styleswin: Transformer-based gan for high-resolution image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11304–11314. [Google Scholar]
- Wang, Z.; Zhang, Y.; Luo, L.; Wang, N. TransCD: Scene change detection via transformer-based architecture. Opt. Express 2021, 29, 41409–41427. [Google Scholar] [CrossRef]
- Yan, T.; Wan, Z.; Zhang, P. Fully Transformer Network for Change Detection of Remote Sensing Images. In Proceedings of the Asian Conference on Computer Vision, Macau SAR, China, 4–8 December 2022; pp. 1691–1708. [Google Scholar]
- Ailimujiang, G.; Jiaermuhamaiti, Y.; Jumahong, H.; Wang, H.; Zhu, S.; Nurmamaiti, P. A Transformer-Based Network for Change Detection in Remote Sensing Using Multiscale Difference-Enhancement. Comput. Intell. Neurosci. 2022, 2022. [Google Scholar] [CrossRef] [PubMed]
- Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H. Fully-convolutional siamese networks for object tracking. In Proceedings of the Computer Vision—ECCV 2016 Workshops, Amsterdam, The Netherlands, 8–10 and 15–16 October 2016; Proceedings, Part II 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 850–865. [Google Scholar]
- Zhang, D.; Zheng, Z.; Wang, T.; He, Y. HROM: Learning high-resolution representation and object-aware masks for visual object tracking. Sensors 2020, 20, 4807. [Google Scholar] [CrossRef]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Li, Y.; Li, S.; Du, H.; Chen, L.; Zhang, D.; Li, Y. YOLO-ACN: Focusing on small target and occluded object detection. IEEE Access 2020, 8, 227288–227303. [Google Scholar] [CrossRef]
- Deng, C.; Wang, M.; Liu, L.; Liu, Y.; Jiang, Y. Extended feature pyramid network for small object detection. IEEE Trans. Multimed. 2021, 24, 1968–1979. [Google Scholar] [CrossRef]
- Hu, G.X.; Yang, Z.; Hu, L.; Huang, L.; Han, J.M. Small object detection with multiscale features. Int. J. Digit. Multimed. Broadcast. 2018, 2018. [Google Scholar] [CrossRef]
- Yang, J.; Yang, J.y. Generalized K–L transform based combined feature extraction. Pattern Recognit. 2002, 35, 295–297. [Google Scholar] [CrossRef]
- Yang, J.; Yang, J.y.; Zhang, D.; Lu, J.f. Feature fusion: Parallel strategy vs. serial strategy. Pattern Recognit. 2003, 36, 1369–1381. [Google Scholar] [CrossRef]
- Liu, C.; Wechsler, H. A shape-and texture-based enhanced fisher classifier for face recognition. IEEE Trans. Image Process. 2001, 10, 598–608. [Google Scholar] [PubMed]
- Huang, L.; Dai, S.; Huang, T.; Huang, X.; Wang, H. Infrared small target segmentation with multiscale feature representation. Infrared Phys. Technol. 2021, 116, 103755. [Google Scholar] [CrossRef]
- Chaib, S.; Liu, H.; Gu, Y.; Yao, H. Deep feature fusion for VHR remote sensing scene classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4775–4784. [Google Scholar] [CrossRef]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- He, X.; Zhou, Y.; Zhao, J.; Zhang, D.; Yao, R.; Xue, Y. Swin transformer embedding UNet for remote sensing image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15. [Google Scholar] [CrossRef]
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
Table 1. Experimental results of different methods on the CDD test set.

Method | Year | Precision | Recall | F1 | OA | IoU
---|---|---|---|---|---|---
FC-EF | 2018 | 0.905 | 0.421 | 0.574 | 0.919 | -
FC-Siam-Di | 2018 | 0.915 | 0.408 | 0.564 | 0.919 | -
FC-Siam-Conc | 2018 | 0.918 | 0.505 | 0.652 | 0.930 | -
DASNet | 2020 | 0.932 | 0.922 | 0.927 | 0.082 | -
IFNet | 2020 | 0.950 | 0.860 | 0.903 | 0.977 | -
BiT | 2021 | 0.962 | 0.940 | 0.951 | 0.988 | 0.906
SwinSUNet | 2022 | 0.957 | 0.923 | 0.940 | 0.985 | 0.892
TransUNetCD | 2022 | 0.969 | 0.974 | 0.972 | 0.989 | 0.945
Ours | - | 0.978 | 0.963 | 0.971 | 0.993 | 0.943
Table 2. Experimental results of different methods on the WHU-CD test set.

Method | Year | Precision | Recall | F1 | OA | IoU
---|---|---|---|---|---|---
FC-EF | 2018 | 0.716 | 0.673 | 0.694 | 0.976 | 0.531
FC-Siam-Di | 2018 | 0.473 | 0.777 | 0.588 | 0.956 | 0.417
FC-Siam-Conc | 2018 | 0.609 | 0.736 | 0.666 | 0.971 | 0.499
DASNet | 2020 | 0.900 | 0.905 | 0.910 | 0.991 | 0.833
IFNet | 2020 | 0.969 | 0.732 | 0.834 | 0.988 | 0.715
BiT | 2021 | 0.906 | 0.883 | 0.883 | 0.987 | 0.769
SwinSUNet | 2022 | 0.950 | 0.926 | 0.938 | 0.994 | 0.882
TransUNetCD | 2022 | 0.935 | 0.896 | 0.935 | 0.973 | 0.844
Ours | - | 0.940 | 0.928 | 0.939 | 0.975 | 0.886
Table 3. Ablation results for different window scale combinations and fusion strategies on the CDD and WHU-CD datasets (%).

Method | Window Scales | CDD F1 | CDD IoU | WHU-CD F1 | WHU-CD IoU
---|---|---|---|---|---
Win-4 | 4 | 95.99 | 92.30 | 92.67 | 86.35
Win-8 | 8 | 96.72 | 93.65 | 92.55 | 86.13
Win-16 | 16 | 95.08 | 90.62 | 92.18 | 85.50
NDF-2-16 | 2, 16 | 95.22 | 90.88 | 93.07 | 87.04
NDF-4-16 | 4, 16 | 95.38 | 91.16 | 92.98 | 86.88
NDF-8-16 | 8, 16 | 94.83 | 90.17 | 92.47 | 86.00
NDF-4-8 | 4, 8 | 96.04 | 92.39 | 93.41 | 87.64
NDF-2-8 | 2, 8 | 96.93 | 94.05 | 93.04 | 87.00
CRF-2-16 | 2, 16 | 96.93 | 94.03 | 93.35 | 87.53
CRF-4-16 | 4, 16 | 96.86 | 93.91 | 93.36 | 87.55
CRF-8-16 | 8, 16 | 96.89 | 93.97 | 93.18 | 87.16
CRF-4-8 | 4, 8 | 96.84 | 93.88 | 92.78 | 86.54
CRF-2-8 | 2, 8 | 97.10 | 94.27 | 93.07 | 87.04
Table 4. Parameter verification results on the CDD and WHU-CD datasets (%).

Parameter | Value | CDD F1 | CDD IoU | CDD P | WHU-CD F1 | WHU-CD IoU | WHU-CD P
---|---|---|---|---|---|---|---
Patch size | 8 × 8 | 84.79 | 73.59 | 93.69 | 87.05 | 77.06 | 88.78
Patch size | 16 × 16 | 92.03 | 85.24 | 95.30 | 91.59 | 84.50 | 92.62
Patch size | 32 × 32 | 96.84 | 93.88 | 97.43 | 93.35 | 87.53 | 94.38
Patch size | 64 × 64 | 97.18 | 94.39 | 97.56 | 94.02 | 88.72 | 93.44
Number of block layers | 1 | 96.80 | 93.79 | 97.40 | 93.08 | 87.05 | 94.45
Number of block layers | 2 | 96.93 | 94.03 | 97.57 | 93.36 | 87.53 | 94.38
Number of block layers | 3 | 95.98 | 92.89 | 96.93 | 93.27 | 87.38 | 94.45
Number of block layers | 4 | 95.01 | 92.14 | 96.39 | 92.98 | 86.88 | 94.49
Table 5. Effectiveness experiment results on Test1, Test2, and the full CDD test set (F1/IoU/P, %).

Method | Test1 F1/IoU/P | Test2 F1/IoU/P | CDD F1/IoU/P
---|---|---|---
BiT | 95.96/92.24/96.63 | 77.56/63.35/89.24 | 95.10/90.60/96.20
SwinSUNet | 95.56/91.98/96.09 | 77.33/63.04/85.23 | 94.00/89.20/95.70
Ours | 98.24/96.55/98.37 | 92.30/85.69/93.64 | 96.93/94.05/97.34