1. Introduction
Remote sensing images capture information about the Earth's surface and are collected by sensors mounted on platforms such as aircraft, satellites, or unmanned aerial vehicles (UAVs) [1]. These images offer a comprehensive view of the Earth's surface, enabling the extraction of valuable geographic and environmental data, including, but not limited to, land cover types, land use patterns, topography, atmospheric conditions, and oceanic conditions. They play an important role in environmental monitoring, agriculture, urban planning, resource management, and related fields [2]. Change detection in remote sensing images refers to the process of detecting changes in the same geographical area by analyzing a set of images captured at different time periods [3]. This task plays a crucial role in various applications, as it helps to track the evolution of landscapes, environmental conditions, and human activities over time. Current change detection tasks can be categorized into two main types: pixel-level change detection and semantic-level change detection. This paper focuses primarily on pixel-level change detection, which identifies changes at the individual pixel level, thereby providing a detailed, fine-grained view of the differences between images [4]. Pixel-level change detection is particularly important in a variety of fields, offering critical insights into both natural and human-induced changes in the environment [5].
However, current remote sensing image change detection still suffers from significant errors, which may originate from various factors, including differing cloud cover, lighting conditions, object occlusion and shadows, and noise caused by atmospheric and terrain changes at the different acquisition times [6]. Among these, images from different seasons may incur particularly large errors in change detection tasks due to seasonal variations in spectral characteristics, land cover, and vegetation. For example, when faced with images covered by a large amount of snow, a change detection model may be misled by the snow-covered areas, mistakenly identifying the transition from snow-free to snow-covered land as a real change. Since the type and usage of the land in this area have not changed, the change detection model is expected to exclude such misjudgments caused by seasonal changes. In change detection tasks, these seasonal errors can lead models to misjudge seasonal changes as real ones and to misidentify genuine changes, significantly impairing the performance of change detection tasks.
However, most existing change detection methods have not been optimized specifically for seasonal errors [7,8,9,10,11,12,13,14,15,16,17,18,19]. Existing methods for eliminating seasonal errors are mostly based on image-to-image translation techniques [20,21,22,23,24,25,26,27]. However, these methods lack guidance from the target image's visual fidelity information (visual fidelity refers to the authenticity of the synthesized image in both spatial and spectral dimensions, including the realistic representation of ground features such as color, texture, and semantics [28]), which may lead to incorrect transformation of the original image during conversion. That is, in an attempt to make a region resemble the corresponding season, the type, usage, texture, and color of the land may be changed incorrectly. For example, in snow-covered winter images, roads are typically black because vehicles have passed over the snow; during seasonal error elimination, such a road may be mistakenly transformed into an asphalt road or a rural dirt road, deviating from the road's original appearance. This phenomenon can seriously interfere with change detection tasks. Additionally, our experiments show that, due to the significant differences in surface coverage, color, and texture between snow-covered and snow-free winter images, using a single translator for seasonal error elimination can lead to a severe loss of visual fidelity. For example, when CycleGAN (Cycle-Consistent Generative Adversarial Networks) is used to eliminate the seasonal errors of both snow-covered and snow-free winter images, the overall lighter tone of snow-covered winter images causes a problem: after the generator is trained on a dataset containing snow-covered winter images, applying the same generator to snow-free winter images yields generated images with darker overall tones and blurred ground feature details. This leads to a decline in the accuracy of subsequent change detection tasks.
To address the performance degradation in change detection caused by seasonal errors in remote sensing images, a method for eliminating these errors is proposed. For remote sensing images of the same area at different times, the method first classifies them and uses two generators, trained on two different datasets, to perform seasonal error elimination on snow-covered and snow-free winter images. Each generator employs a hybrid attention module that integrates channel attention and spatial attention to extract features from the non-winter image. It then applies multiple down-sampling steps to the winter image to obtain its deep features, uses a cross-attention mechanism to fuse these features with the non-winter features, and, after several up-sampling steps, produces the final generated image. This forms a new structure: the Target Image Feature Fusion Generator (TIFFG). The TIFFG is embedded into the CycleGAN network as a generator, and two TIFFGs are trained on two different datasets, yielding two TIFFGs for transforming images with different land cover. Compared to traditional change detection models, this method specifically addresses the performance degradation caused by seasonal errors in winter remote sensing images. Compared to existing seasonal error elimination methods, it incorporates features of the non-winter image into TIFFG, reducing the incorrect transformation of the original image that other image-to-image translation models exhibit during seasonal error elimination. Our code is available at
https://github.com/zjp-zjp-zjp/TIFFG (accessed on 26 November 2024).
In addition, it should be clarified that the seasonal error elimination method proposed in this paper is aimed at eliminating errors between winter and non-winter images. Specifically, it uses TIFFG to convert the texture and color of certain regions in the winter image to match the corresponding regions in the non-winter image, thereby eliminating seasonal errors. The use of a dual-branch structure to separately process snow-covered and snow-free image pairs is primarily to improve the accuracy of the model’s transformation, as experiments show that snow cover has a significant impact on the spectral characteristics of the image and the seasonal error elimination task.
In summary, our innovations and contributions are as follows:
A hybrid attention module combining spatial attention and channel attention is used to extract information from non-winter remote sensing images, fuse it with the deep features of the original winter image obtained through down-sampling, and reconstruct the final generated image through up-sampling. By incorporating information from non-winter images, this approach reduces seasonal errors between winter and non-winter remote sensing images, prevents incorrect transformation of the original image, and thus improves change detection performance.
The Dual-Branch Seasonal Error Elimination Change Detection Framework using the Target Image Feature Fusion Generator (DBSEE-CDF) is proposed to handle snow-covered and snow-free winter images, which differ substantially in surface coverage, color, and texture. It performs seasonal error elimination separately for snow-covered and snow-free winter data pairs, solving the problem of a severe loss of visual fidelity.
Extensive experiments demonstrate that the proposed model improves the performance of change detection tasks compared to methods without seasonal error elimination, achieving an average increase of 7% in the F1-score for change detection tasks. Additionally, it outperforms other image-to-image methods used for seasonal error elimination.
The remainder of this paper is organized as follows.
Section 2 introduces the mainstream methods currently used for seasonal error elimination and change detection.
Section 3 first describes the overall process of the proposed seasonal error elimination method and provides the corresponding pseudocode. It then introduces the technical approaches of the dual-branch structure, the proposed TIFFG, the CycleGAN network used to train TIFFG, the seasonal classification model, and the change detection model used.
Section 4 discusses the methods for acquiring training datasets for each module in the model, model training methods, experimental setup, as well as the results and visualizations of comparative experiments and ablation studies.
Section 5 analyzes the model’s performance based on the comparative experiments and ablation studies, the necessity of key components in the model, and the model’s benefits and shortcomings.
Section 6 summarizes the findings, the overall approach of the proposed method, its advantages and disadvantages, and directions for future work.
3. Methods
To preserve the accuracy of land cover types and uses while aligning the texture and color of certain areas in the winter image as closely as possible with the corresponding areas in the non-winter image, the TIFFG is proposed. It captures the land cover and use information from those areas of the winter image, as well as the texture and color information from the corresponding areas of the non-winter image, and integrates both into the transformation process to prevent incorrect transformations of the original image.
To avoid the severe loss of visual fidelity caused by the significant differences in texture and color features between snow-covered and snow-free images, as well as the different requirements for style transformation, the proposed method first classifies pairs of remote sensing images and processes and transforms snow-covered and snow-free winter images through a dual-branch structure with different TIFFGs. This allows the seasonal error elimination task to use more suitable transformers for style transformation when dealing with images of different land cover, thereby improving the transformation effect.
In the proposed method, a pair of cross-season remote sensing images is first classified into winter and non-winter images. Winter images are further divided into snow-covered and snow-free winter images. Then, using a dual-branch structure, depending on the snow coverage of the winter image in the pair, different TIFFGs are applied to transform the textures and colors of certain regions in the winter image to match the corresponding regions in the non-winter image, thereby eliminating seasonal errors. Finally, the transformed image is input, together with the non-winter image from the pair, into a change detection model to complete the final change detection task. Algorithm 1 gives the pseudocode for the overall process of the proposed method.
Algorithm 1: Dual-Branch Seasonal Error Elimination Change Detection Framework using Target Image Feature Fusion Generator
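As a minimal illustration of the control flow in Algorithm 1, the following Python sketch restates the process described above; the names (classify_season, g_snow, g_free, detect_changes) are placeholders for the trained components, not identifiers from the released code.

```python
def dbsee_cdf(img_a, img_b, classify_season, g_snow, g_free, detect_changes):
    """Sketch of Algorithm 1: route the winter image of a cross-season pair
    through the matching TIFFG, then run change detection on the pair."""
    # Classify both images: 'winter_snow', 'winter_no_snow', or 'non_winter'.
    label_a, label_b = classify_season(img_a), classify_season(img_b)

    # Ensure img_a holds the winter image and img_b the non-winter reference.
    if label_a == 'non_winter':
        (img_a, label_a), (img_b, label_b) = (img_b, label_b), (img_a, label_a)

    # Dual branch: pick the TIFFG matching the winter image's snow coverage.
    # TIFFG also receives the non-winter image, since it fuses its features.
    if label_a == 'winter_snow':
        img_a = g_snow(img_a, img_b)
    elif label_a == 'winter_no_snow':
        img_a = g_free(img_a, img_b)
    # If neither image is a winter image, the pair is used unchanged.

    return detect_changes(img_a, img_b)
```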
3.1. Dual-Branch Seasonal Error Elimination Change Detection Framework Using Target Image Feature Fusion Generator
Through experiments, it was found that snow coverage has a significant impact on the seasonal error elimination task. Specifically, in the transformation from snow-covered winter images to non-winter images, the generator needs to address the strong reflection properties of snow and eliminate its texture. The reflection and color of snow are typically much more complex than the other elements in snow-free winter images (such as bare soil, dead grass, etc.), meaning the generator must adapt to complex texture mapping and lighting changes. On the other hand, the transformation from snow-free winter images to non-winter images is relatively simple because the background already resembles the non-winter scene. The generator only needs to handle seasonal changes (such as color variation and vegetation growth).
Using a dual-branch generator, with two separate TIFFGs trained on snow-covered images paired with their corresponding non-winter images, and on snow-free images paired with their corresponding non-winter images, can effectively meet the style transformation requirements of both and resolve the significant differences in surface coverage, color, and texture between snow-covered and snow-free images. In practical experiments, it was found that using a single generator led to overly dark generated images when eliminating seasonal errors from snow-free images to non-winter images. This is likely because the stronger reflection of snow-covered images results in higher overall brightness, so a model trained on them also severely reduces the brightness of snow-free images during transformation.
For this reason, DBSEE-CDF is proposed. This network handles change detection in remote sensing images for both snow-covered and snow-free scenes separately. The structure is illustrated in
Figure 1.
Specifically, when a pair of remote sensing images is input into the network, they undergo a structured processing flow to classify and transform the images based on their seasonal characteristics.
First, both images in the pair are classified using a seasonal classification model, with ResNet50 [34] serving as its architecture. Since ResNet already achieves sufficiently high image classification accuracy, we did not alter its network structure and instead fine-tuned the pre-trained ResNet50 model. This classification model outputs one of three categories for each image: snow-covered winter image, snow-free winter image, or non-winter image.
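As a concrete illustration, this three-class fine-tuning setup can be expressed with torchvision roughly as follows; the specific pretrained-weight tag is an assumption, and any ImageNet-pretrained ResNet50 checkpoint would serve equally well.

```python
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained ResNet50 and replace only the classifier
# head with a 3-way output: snow-covered winter, snow-free winter, non-winter.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 3)
```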
Once the classification is performed, the network examines the specific combination of image types in the pair. If the pair consists of one snow-covered winter image and one non-winter image, the snow-covered winter image is forwarded to the TIFFG specialized for snow-covered images, denoted as $G_{snow}$. This generator transforms the snow-covered winter image into an image that closely resembles non-winter conditions, thus eliminating the seasonal error and ensuring consistency across the pair. Similarly, if the pair consists of one snow-free winter image and one non-winter image, the snow-free winter image is passed to the TIFFG specialized for snow-free images, denoted as $G_{free}$, which transforms it into an image more representative of non-winter conditions.
After the transformation process, the transformed image is then combined with the corresponding image from the non-winter set. This combined pair is then input into the final change detection model, which is designed to analyze and detect changes between the two images.
By separating the processing of snow-covered and snow-free winter images, the network ensures that seasonal errors are effectively mitigated and change detection is performed more accurately in both snow-covered and snow-free scenes. This structured approach allows for a more robust handling of seasonal variations in the dataset and improves the overall performance of the change detection task.
The training processes for the seasonal classification model, the two generators, and the change detection model will be described in
Section 4. Particularly, for the training of the change detection model, to adapt it to the images generated by TIFFG, the transformed dataset will be used instead of the original dataset.
3.2. Target Image Feature Fusion Generator
Motivated by the hybrid attention module [35], TIFFG is proposed, which integrates spatial attention and channel attention with a UNet-based generator. These components are combined through a cross-attention module, as illustrated in
Figure 2.
For a pair of remote sensing images captured at different times, let the winter image be denoted as $I_w$ and the non-winter image as $I_n$. $I_w$ undergoes five down-sampling steps to obtain its deep features $F_w$. As for $I_n$, it is first encoded by a ResNet50 encoder, resulting in shallow features $F_s$. Subsequently, these shallow features are fed into a hybrid attention module to generate the deep features of $I_n$. Specifically, the shallow features $F_s$ are first processed by a channel attention module to extract channel features $M_c$, and then the channel features $M_c$ and the shallow features $F_s$ are multiplied element-wise to obtain intermediate features $F_m$. Introducing channel attention in seasonal error elimination helps the model focus more precisely on channels that are heavily influenced by seasonal changes. This enhancement allows the model to better adapt to variations in data distribution across different seasons and to capture essential feature patterns more effectively during specific seasons. As a result, it reduces the negative impact of seasonal errors on change detection results. The detailed process is as follows:

$$M_c = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F_s))\big) + \mathrm{MLP}(\mathrm{MaxPool}(F_s)), \qquad F_m = M_c \otimes F_s$$
where $\sigma$ represents the sigmoid function, $\mathrm{MLP}$ is a multilayer perceptron, and $\mathrm{AvgPool}$ and $\mathrm{MaxPool}$ are the average pooling and max pooling layers. Using a combination of max pooling and average pooling in channel attention allows the integrated use of different pooling methods to enhance the model's perception of channel importance. Average pooling, coupled with the sigmoid function, maps the average response of each channel to a normalized weight value, while max pooling directly selects the strongest feature response without needing an additional probability transformation; hence, no sigmoid is applied to that branch.
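A minimal PyTorch sketch of this channel attention step is given below. Per the description above, the sigmoid is applied only to the average-pooling branch; the reduction ratio and the 1 × 1-convolution MLP are assumptions borrowed from common CBAM-style implementations.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # Shared MLP implemented with 1x1 convolutions.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, f_s):
        # Sigmoid normalizes the average response; the max branch is used as-is.
        m_c = torch.sigmoid(self.mlp(self.avg_pool(f_s))) + self.mlp(self.max_pool(f_s))
        return m_c * f_s  # element-wise weighting -> intermediate features F_m
```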
For the intermediate features $F_m$, they are first processed by a spatial attention module to extract spatial features $M_s$, and then the spatial features $M_s$ and the intermediate features $F_m$ are multiplied element-wise to obtain the deep features $F_n$:

$$M_s = \sigma\big(\mathrm{Conv}\big([\mathrm{AvgPool}(F_m);\, \mathrm{MaxPool}(F_m)]\big)\big), \qquad F_n = M_s \otimes F_m$$
Introducing spatial attention in seasonal error elimination helps the model more effectively capture and model spatial relationships between different locations in images. It aids the model in focusing on significant spatial positions, such as corners of buildings or specific segments of roads, enhancing its ability to generalize across different geographical locations or shooting conditions. This improvement allows the model to better distinguish between seasonal variations and actual changes. The benefit of combining max pooling and average pooling in spatial attention is analogous to that in channel attention.
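A matching sketch of the spatial attention step, continuing the module above; the 7 × 7 kernel size is an assumption in the spirit of CBAM.

```python
class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, f_m):
        # Pool across the channel dimension, concatenate, convolve, normalize.
        avg_map = f_m.mean(dim=1, keepdim=True)
        max_map, _ = f_m.max(dim=1, keepdim=True)
        m_s = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return m_s * f_m  # element-wise weighting -> deep features F_n
```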
Finally, the deep features $F_w$ of $I_w$ and the deep features $F_n$ of $I_n$ are input into the cross-attention module to obtain the final blended features $F_b$:

$$F_b = \mathrm{softmax}\!\left(\frac{(F_w W^Q)(F_n W^K)^{\top}}{\sqrt{d_k}}\right)(F_n W^V)$$

where $W^Q$, $W^K$, and $W^V$ are the query, key, and value weight matrices in the cross-attention module, and $d_k$ is the key dimension. Using cross-attention to fuse the deep feature matrices allows each element in one feature matrix to attend to relevant elements in the other. This enhances the model's capability to integrate complementary information from both matrices, enabling selective focus on important features across different matrices. It promotes the effective communication and alignment of information between the two matrices, leveraging the strengths of each feature matrix while mitigating their weaknesses.
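The fusion step itself can be sketched as standard scaled dot-product cross-attention over spatially flattened feature maps; the single-head, equal-dimension projections are simplifying assumptions.

```python
class CrossAttentionFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w_q = nn.Linear(dim, dim, bias=False)  # queries from winter features
        self.w_k = nn.Linear(dim, dim, bias=False)  # keys from non-winter features
        self.w_v = nn.Linear(dim, dim, bias=False)  # values from non-winter features

    def forward(self, f_w, f_n):
        # f_w, f_n: (batch, tokens, dim) -- feature maps flattened spatially.
        q, k, v = self.w_q(f_w), self.w_k(f_n), self.w_v(f_n)
        attn = torch.softmax(q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5), dim=-1)
        return attn @ v  # blended features F_b
```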
After obtaining the blended features $F_b$, they are passed through five up-sampling layers to reconstruct the final generated image.
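Putting the pieces together, the TIFFG forward pass can be outlined as below; the callable arguments stand for the components sketched above, and the shape handling is illustrative.

```python
def tiffg_forward(winter_img, non_winter_img, down, resnet_encoder,
                  channel_attn, spatial_attn, cross_attn, up):
    # Five down-sampling stages on the winter image -> deep features F_w.
    f_w = down(winter_img)                      # (B, C, H', W')

    # ResNet50 encoder plus hybrid attention on the non-winter image -> F_n.
    f_n = spatial_attn(channel_attn(resnet_encoder(non_winter_img)))

    # Flatten spatial dimensions into token sequences for cross-attention.
    b, c, h, w = f_w.shape
    tokens_w = f_w.flatten(2).transpose(1, 2)   # (B, H'*W', C)
    tokens_n = f_n.flatten(2).transpose(1, 2)
    f_b = cross_attn(tokens_w, tokens_n)        # blended features F_b

    # Restore the spatial layout and decode with five up-sampling stages.
    return up(f_b.transpose(1, 2).reshape(b, c, h, w))
```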
3.3. TIFFG Training Method
TIFFG is trained using the CycleGAN training strategy, as shown in
Figure 3. In our experiments, the images generated by TIFFG are required to closely resemble non-winter images. In such scenarios, employing GANs is a conventional approach. Among these, by introducing a cycle consistency loss and enabling the transformation of unpaired data, CycleGAN effectively captures style features from images and prevents significant loss of essential image content during conversion. Given that the image pairs in this study were captured at different times, resulting in considerable variations in image content, CycleGAN's preservation of image content is of significant importance for the seasonal transformation and change detection tasks.
For a pair of input images $a$ (winter) and $b$ (non-winter), in one direction, CycleGAN first utilizes the TIFFG-AB ($G_{AB}$) to convert $a$ into $\hat{b} = G_{AB}(a)$, and then uses the TIFFG-BA ($G_{BA}$) to convert $\hat{b}$ into the reconstruction $\hat{a} = G_{BA}(\hat{b})$. In the opposite direction, it performs the same process to obtain $\tilde{a} = G_{BA}(b)$ and $\tilde{b} = G_{AB}(\tilde{a})$. To generate the identity loss, it feeds $b$ into $G_{AB}$, resulting in $b_{idt}$, and feeds $a$ into $G_{BA}$, resulting in $a_{idt}$. Then, the real and generated images ($b$ and $\hat{b}$, $a$ and $\tilde{a}$) are fed into the discriminators $D_B$ and $D_A$ to determine which is real and which is fake. In a Generative Adversarial Network (GAN), the discriminator plays a crucial role by providing a key gradient signal to the generator. This forces the generator to improve its generated images to deceive the discriminator as much as possible, making it unable to accurately distinguish generated images from real ones. Ultimately, this process enables the generator to produce more realistic images. The output of the GAN discriminator can be interpreted as the probability that the input image is real. Next, it calculates the adversarial loss:

$$\mathcal{L}_{adv} = \mathbb{E}_{b \sim B}\big[\log D_B(b)\big] + \mathbb{E}_{a \sim A}\big[\log\big(1 - D_B(G_{AB}(a))\big)\big] + \mathbb{E}_{a \sim A}\big[\log D_A(a)\big] + \mathbb{E}_{b \sim B}\big[\log\big(1 - D_A(G_{BA}(b))\big)\big]$$
After that, it computes the cycle consistency loss:

$$\mathcal{L}_{cyc} = \mathbb{E}_{a \sim A}\big[\lVert G_{BA}(G_{AB}(a)) - a \rVert_1\big] + \mathbb{E}_{b \sim B}\big[\lVert G_{AB}(G_{BA}(b)) - b \rVert_1\big]$$
Then, it is necessary to calculate the identity loss:

$$\mathcal{L}_{idt} = \mathbb{E}_{b \sim B}\big[\lVert G_{AB}(b) - b \rVert_1\big] + \mathbb{E}_{a \sim A}\big[\lVert G_{BA}(a) - a \rVert_1\big]$$
Identity loss encourages the generator to map the input image back to itself, thereby promoting the preservation of the input image’s structure and content. This addresses the issue where generated images might lose some important features of the original input.
Finally, CycleGAN calculates the total loss for the entire network and updates the weights accordingly:

$$\mathcal{L}_{total} = \lambda_{adv}\mathcal{L}_{adv} + \lambda_{cyc}\mathcal{L}_{cyc} + \lambda_{idt}\mathcal{L}_{idt}$$

where $\lambda_{adv}$, $\lambda_{cyc}$, and $\lambda_{idt}$ are hyperparameters that define the respective impact levels of $\mathcal{L}_{adv}$, $\mathcal{L}_{cyc}$, and $\mathcal{L}_{idt}$ on training.
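As a hedged sketch of one generator update, the losses above can be combined as follows, assuming least-squares adversarial losses and L1 cycle/identity losses as in the reference CycleGAN implementation; the lambda values are common defaults rather than the values used in our experiments, and the generators are written with a single input for readability, whereas TIFFG additionally receives the non-winter reference through its fusion branch.

```python
import torch
import torch.nn.functional as F

def cyclegan_generator_loss(g_ab, g_ba, d_a, d_b, real_a, real_b,
                            lam_adv=1.0, lam_cyc=10.0, lam_idt=5.0):
    fake_b = g_ab(real_a)          # A -> B
    fake_a = g_ba(real_b)          # B -> A
    rec_a = g_ba(fake_b)           # A -> B -> A
    rec_b = g_ab(fake_a)           # B -> A -> B

    # Adversarial terms: generators try to make the discriminators say "real".
    pred_b, pred_a = d_b(fake_b), d_a(fake_a)
    loss_adv = (F.mse_loss(pred_b, torch.ones_like(pred_b)) +
                F.mse_loss(pred_a, torch.ones_like(pred_a)))

    # Cycle-consistency terms: reconstructions should match the inputs.
    loss_cyc = F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b)

    # Identity terms: feeding a target-domain image should change nothing.
    loss_idt = F.l1_loss(g_ab(real_b), real_b) + F.l1_loss(g_ba(real_a), real_a)

    return lam_adv * loss_adv + lam_cyc * loss_cyc + lam_idt * loss_idt
```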
4. Experiment
4.1. Overall Training Process
The training process for this framework involves multiple stages, including the training of a seasonal classification model, two TIFFGs, and change detection models. Each stage is designed to systematically address seasonal errors and enhance the performance of change detection tasks.
The first step involves preparing a season-classified image dataset derived from the original high-resolution images. This dataset is labeled to categorize images into three distinct classes: snow-covered winter images, snow-free winter images, and images not in winter. The labeled dataset is then used to train the seasonal classification model, enabling it to accurately identify and classify images based on their seasonal characteristics.
Next, using the seasonal classification model, images from the original dataset classified as snow-covered winter images and their corresponding images not in winter are selected to form one dataset, while images classified as snow-free winter images and their corresponding images not in winter are selected to form another dataset. Each of these datasets is then used to train a separate TIFFG. The purpose of these TIFFGs is to transform winter images—either snow-covered or snow-free—into images representative of non-winter conditions. This targeted transformation process is essential for mitigating the impact of seasonal variations on subsequent change detection tasks.
The trained seasonal classification model and TIFFGs are combined to process the original dataset. By applying these models, seasonal errors are systematically eliminated, resulting in a seasonally corrected dataset where the effects of winter conditions are minimized. This corrected dataset serves as the foundation for training and testing the change detection models, ensuring that they are optimized to focus on genuine changes in the environment without being influenced by seasonal anomalies.
4.2. Dataset
The ChangeDetectionDataset [36] is utilized as the experimental dataset. This dataset consists of 11 pairs of high-resolution remote sensing images of varying sizes and is complemented by a substantial set of 10,000 image pairs for the training set, 2998 image pairs for the validation set, and 3000 image pairs for the test set. These image pairs are generated from the 11 high-resolution remote sensing images and represent different time periods and seasonal conditions, providing a diverse and rich foundation for change detection tasks.
Each image pair in the dataset corresponds to a before-and-after scenario from different seasons or time periods, with the corresponding ground truth clearly depicting the changes that occurred between the two images. Each image in the training set, validation set, and test set is sized at 256 × 256 pixels.
Given its diverse range of images from different seasons and time points, as well as its detailed ground truth annotations, the ChangeDetectionDataset is highly suitable for training and validating our experimental model. The combination of high-resolution imagery, large-scale data, and seasonal variation makes this dataset an ideal choice for evaluating the effectiveness of the proposed seasonal error elimination and change detection framework.
4.3. Seasonal Classification Dataset Acquisition and Model Training
Due to the mixed seasonal nature of the original dataset, which contains both summer and winter images in sets A and B, it is essential to train a seasonal classification model to properly organize the dataset. The original 11 pairs of high-resolution images are relatively easy to classify in terms of their seasonal characteristics, as the seasonal differences are quite distinct. By re-cropping these 11 pairs of high-resolution images, a large number of individual samples can be generated. Each cropped sample’s season is determined by referencing the corresponding high-resolution image.
These cropped samples are then classified according to their seasonal characteristics and stored in the appropriate categories. As a result, the dataset is divided into three distinct groups: 1119 samples of images not in winter, 495 samples of snow-covered images from winter, and 624 samples of snow-free images from winter. This carefully organized dataset is subsequently used to train the seasonal classification model, allowing the model to learn to differentiate between different seasonal conditions and to classify future images accordingly. This seasonal classification model forms the foundation for subsequent processing steps, such as seasonal error elimination and change detection tasks, ensuring that the images are accurately classified based on their seasonal context.
4.4. TIFFG Dataset Acquisition and TIFFG Models Training
Utilizing the seasonal classification model mentioned in
Section 4.3, the original dataset is organized and classified as follows: first, all winter images in the original dataset are placed in set A, while non-winter images are placed in set B. Then, the entire dataset is divided to ultimately form three datasets:
The comprehensive dataset $D_{all}$, which contains all images from the original dataset, with a basic guarantee that set A consists entirely of winter images and set B consists entirely of non-winter images.
The snow-covered dataset $D_{snow}$, a subset of $D_{all}$ in which set A consists entirely of snow-covered winter images and set B consists entirely of non-winter images.
The snow-free dataset $D_{free}$, a subset of $D_{all}$ in which set A consists entirely of snow-free winter images and set B consists entirely of non-winter images.
Using $D_{snow}$ and $D_{free}$ in combination with the CycleGAN training strategy, two TIFFGs, $G_{snow}$ and $G_{free}$, are trained specifically for transforming the images in set A. These TIFFGs are designed to handle the transformation of winter-related images in set A, with one tasked with transforming snow-covered images and the other with transforming snow-free images. These two trained TIFFGs serve as the two generators in the DBSEE-CDF. By leveraging the strengths of these generators, the framework aims to effectively address and eliminate seasonal errors that might otherwise interfere with change detection tasks.
In addition to the primary TIFFGs, for the purpose of ablation experiments, two more TIFFGs are trained without the target image feature extraction and fusion modules. These TIFFGs are trained using $D_{snow}$ and $D_{free}$, employing the same CycleGAN network architecture. These ablation experiments are crucial for evaluating the impact of the target image feature extraction and fusion module on the performance of the seasonal error elimination process.
To further benchmark and compare the seasonal error elimination method proposed in this paper with other existing approaches, several well-known image-to-image translation models are trained, including the original CycleGAN, Swapping Autoencoder, SuperstarGAN, EGSDE, and Pnp-diffusion. These models are trained using $D_{all}$ to assess their relative performance in seasonal error elimination and change detection tasks. This comparative analysis helps to validate the effectiveness of the proposed method and highlights any potential advantages it may offer over other state-of-the-art techniques.
4.5. Change Detection Models Training
First, for each sample in $D_{all}$, the classification model discussed in
Section 4.3 is applied to classify the images, and the corresponding TIFFG is used to perform the transformation. The transformed images are then stored in a new set, $A_{TIFFG}$. Afterward, $A_{TIFFG}$ is merged with the original set B and the ground truth to create the modified dataset $D_{TIFFG}$.
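Constructing $A_{TIFFG}$ and $D_{TIFFG}$ amounts to a single offline pass over the dataset, roughly as sketched below; all names are placeholders.

```python
def build_transformed_dataset(d_all, classify_season, g_snow, g_free):
    """Apply the matching TIFFG to each winter image and keep set B and the
    ground truth unchanged, yielding the modified dataset D_TIFFG."""
    d_tiffg = []
    for img_a, img_b, gt in d_all:
        label = classify_season(img_a)
        if label == 'winter_snow':
            img_a = g_snow(img_a, img_b)      # transformed image -> A_TIFFG
        elif label == 'winter_no_snow':
            img_a = g_free(img_a, img_b)
        d_tiffg.append((img_a, img_b, gt))
    return d_tiffg
```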
Next, for each sample in $D_{all}$, the same classification model is used to classify the images; however, this time, the TIFFGs that do not include the target image feature extraction and fusion module are employed for transformation. The resulting transformed images are stored in a new set, $A_{abl}$. This set is then merged with the original set B and the ground truth to form the dataset $D_{abl}$.
To compare the seasonal error elimination method proposed in this paper with other existing methods, several alternative models are applied. For each sample in $D_{all}$, the original versions of CycleGAN, Swapping Autoencoder, SuperstarGAN, EGSDE, and Pnp-diffusion, as described in
Section 4.4, are used for transformation. This results in the creation of the sets $A_{CycleGAN}$, $A_{SAE}$, $A_{SSG}$, $A_{EGSDE}$, and $A_{Pnp}$. Each of these sets is then merged with the original set B to form the datasets $D_{CycleGAN}$, $D_{SAE}$, $D_{SSG}$, $D_{EGSDE}$, and $D_{Pnp}$.
Different change detection models are then used, including CGNet-CD [14], ChangeFormer [15], and HANet-CD [16], to train on $D_{all}$, $D_{TIFFG}$, $D_{abl}$, $D_{CycleGAN}$, $D_{SAE}$, $D_{SSG}$, $D_{EGSDE}$, and $D_{Pnp}$, resulting in the corresponding change detection models $M_{orig}$, $M_{TIFFG}$, $M_{abl}$, $M_{CycleGAN}$, $M_{SAE}$, $M_{SSG}$, $M_{EGSDE}$, and $M_{Pnp}$.
The performance of $M_{orig}$ is evaluated on the three datasets mentioned in
Section 4.4 to establish the baseline performance of the change detection model. Next, the two TIFFGs and $M_{TIFFG}$ are integrated into the DBSEE-CDF and tested on the same three datasets to assess the performance improvements achieved by the seasonal error elimination techniques proposed in this work.
For the ablation experiments, the target image feature extraction and fusion modules are removed from the two TIFFGs, and the resulting generators are embedded, together with $M_{abl}$, into the DBSEE-CDF. These ablation tests are then conducted on the same datasets to compare performance with the full version of the framework.
Finally, the original CycleGAN, Swapping Autoencoder, SuperstarGAN, EGSDE, and Pnp-diffusion models, as trained in
Section 4.4, are connected to $M_{CycleGAN}$, $M_{SAE}$, $M_{SSG}$, $M_{EGSDE}$, and $M_{Pnp}$, respectively, for a comparative analysis of the seasonal error elimination techniques and their impact on change detection performance. This comprehensive comparison provides insights into the relative effectiveness of the proposed approach in handling seasonal errors compared to existing methods.
4.6. Hyperparameters and Runtime Environment
For the training of the seasonal classification model, we use 50 iterations, the SGD optimizer with a learning rate of 0.001, and momentum of 0.9 to accelerate the training process. The batch size for the data is set to 4.
For the training of the image-to-image seasonal error elimination model, we use 200 iterations, an initial learning rate of 0.0002, the Adam optimizer, a first-order momentum decay rate of 0.5, and a second-order momentum decay rate of 0.999. In the first 100 iterations, the learning rate remains constant, and from iteration 101 onward, the learning rate gradually decays to 0. The batch size for the data is set to 4.
For the training of the change detection model, we use 100 iterations, an initial learning rate of 0.0001, the Adam optimizer, and the learning rate remains constant for the first 50 iterations. From iteration 51 onward, the learning rate gradually decays with a decay factor of 0.1. The batch size for the data is set to 4.
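In PyTorch terms, the three optimization setups correspond roughly to the sketch below, reading the stated "iterations" as epochs; the model variables are placeholders, and the change detection schedule is interpreted as a single ×0.1 step after epoch 50.

```python
import torch

# Seasonal classification: 50 epochs, SGD with momentum.
opt_cls = torch.optim.SGD(classifier.parameters(), lr=0.001, momentum=0.9)

# Seasonal error elimination (TIFFG/CycleGAN): 200 epochs, Adam(0.5, 0.999);
# constant learning rate for 100 epochs, then linear decay to zero.
opt_gan = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
sched_gan = torch.optim.lr_scheduler.LambdaLR(
    opt_gan, lambda e: 1.0 if e < 100 else max(0.0, 1.0 - (e - 100) / 100))

# Change detection: 100 epochs, Adam; learning rate multiplied by 0.1 at epoch 50.
opt_cd = torch.optim.Adam(cd_model.parameters(), lr=1e-4)
sched_cd = torch.optim.lr_scheduler.StepLR(opt_cd, step_size=50, gamma=0.1)
```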
All training is conducted on NVIDIA GeForce RTX 3090 GPUs with 24GB of VRAM.
4.7. Experiment Setup
Extensive experimental validations are conducted to assess the effectiveness of the proposed framework and the necessity of its key components.
To validate the generalizability of the proposed framework (i.e., that using the proposed model for seasonal error elimination improves the accuracy of change detection regardless of the change detection method), several change detection models are trained and tested, including CGNet-CD, ChangeFormer, and HANet-CD, using a dataset that underwent seasonal error elimination with the proposed framework. The results are compared with those obtained without seasonal error elimination.
To demonstrate that the proposed model achieves better performance on the seasonal error elimination task compared to existing image-to-image translation models, and provides a greater improvement in the change detection task, several image-to-image translation models for seasonal error elimination (including CycleGAN, Swapping Autoencoder, SuperstarGAN, Pnp-diffusion, and EGSDE) are applied. Change detection models are then trained and tested on the dataset with seasonal errors eliminated, and the results are compared with those obtained using the seasonal error elimination framework proposed.
To verify the necessity of the target image feature extraction module and the dual-branch structure, and their effect on improving the seasonal error elimination task, we separately removed the target image feature extraction and fusion module (yielding $N_{abl}$) and the dual-branch structure (yielding $N_{single}$) and performed seasonal error elimination on the dataset. The change detection models are then trained and tested using the dataset with seasonal errors eliminated, and the results are compared with those obtained using the proposed seasonal error elimination framework.
4.8. Comparison
For each change detection model, the comparison experiment began by testing the baseline model $M_{orig}$ described in
Section 4.5 on three datasets: $D_{all}$, $D_{snow}$, and $D_{free}$. These tests are performed to establish baseline performance metrics for the model when applied to these datasets under standard conditions, without incorporating any seasonal error elimination techniques. This initial testing allowed us to gauge the model's effectiveness and identify areas for improvement.
Next, the modified $M_{TIFFG}$, along with the two TIFFGs discussed in
Section 4.5, is integrated into the DBSEE-CDF. The updated framework is then tested on the same three datasets—$D_{all}$, $D_{snow}$, and $D_{free}$—to evaluate the performance improvements brought about by the incorporation of seasonal error elimination. This allowed us to assess how well the proposed framework addressed seasonal errors and whether it led to better change detection accuracy.
Finally, the evaluation is extended by connecting the various image-to-image translation models described in
Section 4.4 to $M_{CycleGAN}$, $M_{SAE}$, $M_{SSG}$, $M_{EGSDE}$, and $M_{Pnp}$. These models are also tested on $D_{all}$, $D_{snow}$, and $D_{free}$ to provide a comparative analysis. This step allowed us to benchmark the performance of the proposed DBSEE-CDF against a variety of other image-to-image translation methods, further validating the effectiveness of the proposed approach.
The F1-score results of all the above experiments are shown in
Table 1.
Five pairs of images with corresponding change labels are sampled from $D_{snow}$ and $D_{free}$, respectively. Using CGNet-CD, ChangeFormer, and HANet-CD, we first applied the proposed dual-branch seasonal error elimination method before conducting change detection, and then directly performed change detection without it. The final change detection results for both approaches are shown in
Figure 4, illustrating that the proposed dual-branch seasonal error elimination method effectively enhances the performance of change detection tasks for these three change detection models.
4.9. Ablation Study
To validate the effectiveness and necessity of both the target image feature extraction and fusion module, as well as the dual-branch structure, a series of ablation experiments are conducted. These experiments aimed to isolate and evaluate the individual contributions of these components to the overall performance of the framework.
First, to assess the impact of the target image feature extraction and fusion module, the two full TIFFGs are integrated into the DBSEE-CDF, resulting in the network configuration $N_{full}$. This configuration includes the full functionality of both TIFFGs, which process the seasonal features of the images to enhance the accuracy of change detection.
Next, an experiment is conducted in which the target image feature extraction and fusion modules are removed from the two TIFFGs. The remaining TIFFGs are then embedded into the DBSEE-CDF, forming the network configuration $N_{abl}$. This experiment allowed us to observe how the removal of these modules impacted the framework's ability to address seasonal errors and perform change detection tasks effectively.
To evaluate the necessity of the dual-branch structure, a single TIFFG is connected to the change detection models, omitting the seasonal classification module. This forms a single-branch seasonal error elimination change detection network, denoted as $N_{single}$. By using a single TIFFG without the full dual-branch architecture, it could be assessed whether the additional branch structure provided significant improvements in eliminating seasonal errors and enhancing change detection performance.
These three network configurations—$N_{full}$, $N_{abl}$, and $N_{single}$—are trained according to the methods described in
Section 4.4 and
Section 4.5. This training process resulted in distinct generators and corresponding change detection models for each configuration. After training, each network is tested on $D_{all}$, $D_{snow}$, and $D_{free}$ using a variety of change detection models to evaluate their performance.
The results of these ablation experiments, which provide insights into the individual contributions of each module and structure, are presented in
Table 2.
6. Conclusions
In summary, we proposed the DBSEE-CDF method.
In our study, it was found that in seasonal error elimination, since the downstream task is change detection, it is crucial to preserve the land cover types and uses in certain areas of the winter image while aligning their texture and color as closely as possible with the corresponding areas of the non-winter image. Therefore, it is essential to capture the land cover and use information from those areas of the winter image and the texture and color information from the corresponding areas of the non-winter image, and to integrate both into the transformation process. To achieve this, TIFFG is proposed, which down-samples the original images to obtain deep features, generates deep features for the target images using a ResNet encoder and a hybrid attention module, and fuses them through a cross-attention module. The fused features are then up-sampled five times to generate the final images. The experimental results show that the proposed network effectively enhances the performance of various change detection models and mitigates the impact of seasonal errors on remote sensing image change detection tasks.
We also found that snow-covered areas are lighter in color and reflect more sunlight, leading to significant differences in texture and color features between snow-covered and snow-free regions in remote sensing images and, consequently, to a severe loss of visual fidelity when a single transformation must satisfy both sets of style requirements. To address this issue, the proposed method first classifies pairs of remote sensing images and then processes and transforms snow-covered and snow-free winter images through a dual-branch structure with different TIFFGs. The transformed images are then input, along with the non-winter images, into the change detection model to generate the change detection results.
However, the proposed method still has certain limitations, such as a complex training process, high usage difficulty, and significant training time overhead. In the future, we will simplify the model structure and training process to reduce both the difficulty of using the model and the training overhead. Additionally, to better capture the land cover types and uses in certain areas of the winter image and integrate them into the transformation process, we plan to incorporate semantic information from the winter images into the model to further enhance its performance.