1. Introduction
The surface of the Earth changes over time, mainly because of natural and human impacts. Natural forces, such as continental drift, glacier action, floods, and tsunamis, as well as human forces, such as the conversion of forest to agricultural land, urban expansion, and dynamic changes in forest planting, have changed the types of land cover. In recent decades, the rate of land-cover change caused by human factors has greatly accelerated compared with natural factors. This unprecedented rate of change has become a major global environmental problem, and almost all ecosystems in the world are affected by human beings. Human activities have a great impact on land-cover change, mainly due to the development of technology and population expansion [
1]. Changes in land cover and land use have both positive and negative impacts on human beings. The transformation of forest into arable land can provide food, vegetables, fruits, and fibers for clothing to meet the needs of more people. At the same time, deforestation also reduces biodiversity, aggravates soil erosion, and brings other consequences. Land-cover and land-use change bring us benefits and economic growth, but often at the cost of ecosystem degradation. Remote sensing images of the same area obtained at different times can be used to identify the types of surface changes and their spatial distribution. This process is known as change detection in remote sensing images. The core idea of change detection is to identify the changes between multitemporal images and to determine the location and type of the changed targets. You et al., on the basis of the output requirement, considered that there are three scales of remote sensing change detection, namely, scene-level, region-level, and target-level [
2]. This paper focuses on region-level change detection in remote sensing. Methods for change information extraction include mathematical analysis, feature space transformation, feature classification, feature clustering, and neural networks [
2]; conventionally they can be divided into pixel- and object-based change detection methods [
3]. For multispectral image change detection, the object-based approach was proposed because pixel-based change detection is prone to producing “salt and pepper” noise. The object-based change detection method takes the image patch as the basic unit; therefore, image segmentation is a necessary prerequisite for object-based image processing. Image segmentation is a main research focus in remote sensing image analysis. With the development of geographic object-based image analysis (GEOBIA) around the turn of the century, segmentation, as the first step of GEOBIA, plays a fundamental role in remote sensing applications, especially for high-spatial-resolution images. A number of methods have been proposed for remote sensing image segmentation and can be divided into many categories. Traditional segmentation methods include the following [
4]:
1. Thresholding. Thresholding is one of the simplest approaches, and the automatic determination of the best segmentation threshold is the main content of research work. Pare et al. [
5] conducted an extensive survey on various optimization algorithms in multilevel thresholding focusing on metaheuristics, swarm algorithms, and different entropy-based objective criteria.
2. Region growing or extraction. The popular remote sensing image processing software eCognition uses the multiscale segmentation algorithm, which is a region merging technique [
6]. The merging decision of adjacent image objects is based on local homogeneity criteria.
3. Edge- or gradient-based approaches. Watershed segmentation is a gradient-based method well known for its oversegmentation problem. To reduce the oversegmentation of watershed segmentation, Gaetano et al. [
7] automatically generated morphological and spectral markers that help in limiting oversegmentation. Ciecholewski [
8] merged regions by maximizing average contrast using gradient morphological reconstruction.
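The automatic threshold determination discussed in item 1 can be illustrated with Otsu's criterion, which selects the threshold maximizing the between-class variance of the intensity histogram. The sketch below is a minimal NumPy implementation on synthetic data; it is not code from any of the surveyed works.

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Return the threshold maximizing between-class variance (Otsu)."""
    hist, edges = np.histogram(values, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    p = hist / hist.sum()
    w0 = np.cumsum(p)                 # probability of class 0 (below threshold)
    w1 = 1.0 - w0                     # probability of class 1
    mu = np.cumsum(p * centers)       # cumulative mean
    mu_t = mu[-1]                     # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * w0 - mu) ** 2 / (w0 * w1)
    return centers[np.argmax(np.nan_to_num(sigma_b))]

# Demo on a synthetic bimodal "band" (values are illustrative)
rng = np.random.default_rng(0)
band = np.concatenate([rng.normal(60, 10, 5000), rng.normal(180, 10, 5000)])
t = otsu_threshold(band)              # lands between the two modes
mask = band > t                       # binary segmentation
```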
In addition to the above traditional segmentation methods, other methods combined with specific theoretical tools have also been developed for remote sensing image segmentation, including methods based on wavelet transform [
9], genetic algorithms [
10,
11], and active contours. Braga et al. [
12] developed a fast level set-based algorithm encompassing a nonparametric median filter for curvature regularization and morphological operations to perform front propagation to efficiently deal with speckle noise on synthetic aperture radar (SAR) images. Jin et al. [
13] modified the energy functional in the level set method and employed the
distribution for high-resolution PolSAR (polarimetric synthetic aperture radar) image segmentation, which is applicable to heterogeneous regions.
In image segmentation, the results are image objects for further classification and other procedures. In recent years, deep learning (DL) has been widely used in the field of computer vision. Image segmentation using DL predicts classes at the pixel level, which is regarded as semantic segmentation [
14]. Hoeser et al. [
15] reviewed 261 studies in the Earth observation field that used convolutional neural networks (CNNs) for image segmentation, among which 62% were encoder–decoder models and 54% were related to the U-Net design. This is related to the prevalence of tiny and fine-grained targets in remote sensing data.
The above methods are applied to the segmentation of single-temporal remote sensing images. For object-based change detection methods, multitemporal remote sensing images need to be segmented and compared. Multitemporal image objects are segmented independently, resulting in different boundaries [
16,
17], which brings difficulties to the subsequent change analysis. To solve the problem, three different methods are adopted:
1. Stacking all the multi-temporal images into a single layer, and then performing synchronous segmentation [
18]. This has been widely used with the emergence of commercial software represented by eCognition.
2. Segmenting one image and assigning the result to another image. Through analysis, the change can be detected [
19].
3. Segmenting bitemporal images and detecting straight lines separately, before overlapping the segmentation results to obtain changes using a refining stage [
20].
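The first strategy above, stacking before joint segmentation, can be sketched with NumPy arrays standing in for the bitemporal images (shapes and values here are purely illustrative):

```python
import numpy as np

# Hypothetical bitemporal images, each with shape (bands, height, width)
t1 = np.zeros((3, 4, 4))
t2 = np.ones((3, 4, 4))

# Stack along the band axis into one 6-band composite; segmenting this
# composite once gives both dates identical object boundaries
stack = np.concatenate([t1, t2], axis=0)
```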
Cosegmentation is a method used in computer vision to extract a common object of interest from a set of images. Rother [
21] first proposed the idea of cosegmentation. A Markov random field is used to construct an energy function, and graph theory is introduced to model the image as a weighted undirected graph, namely, the network flow diagram. The energy function supplies the edge weights in the graph and is optimized on the basis of graph cut theory to obtain the final segmentation result. Cosegmentation has attracted the attention of a wide range of scholars. Ma et al. [
22] roughly classified cosegmentation methods into three types in accordance with different basic units: pixel-based, region-based [
23,
24], and object-based cosegmentation. Pixel-based cosegmentation takes pixels as the basic processing unit, calculates the probability that the pixel belongs to foreground or background in accordance with different energy function models, and optimizes the energy function to conduct segmentation. The advantage of this method is its simple operation steps, whereas its disadvantage is its large computational load. Recently, Merdassi et al. [
25] presented an overview of existing cosegmentation methods and classified them into eight categories (Markov random field (MRF)-based, co-saliency-based, image decomposition-based, random walker-based, map-based, active contour-based, clustering-based, and deep learning-based cosegmentation) according to their strategies. Most existing methods are based on graphs, particularly on the MRF framework, because such methods are widely used to solve combinatorial optimization problems. Cosegmentation methods are now capable of processing multiple band images with different objects. The authors mentioned that cosegmentation can be applied to detect roads, bridges, and rivers in remote sensing images and pointed out that, outside the military field, the application of cosegmentation to aerial images is still limited.
Yuan et al. [
26] combined cosegmentation and remote sensing image change detection with a change intensity map as the guide. The change intensity map is based on the difference between bitemporal images. Change detection and image segmentation are linked together, and the image is constructed into a network flow diagram. The minimum cut/maximum flow method is used to optimize the energy function and simultaneously complete image segmentation and change detection. In bitemporal change detection, each image has its own feature term; therefore, two cosegmentation change detection results are obtained, one for each phase image. Xie [
27] and Zhu et al. [
28] changed the optimization method of the energy function and obtained the minimum cut with Dinic’s maximum flow/minimum cut algorithm. By changing the form of the image feature term in the energy function, they obtained a unified change image.
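The minimum cut/maximum flow optimization these methods rely on can be illustrated on a toy source–sink graph. The sketch below uses the simpler Edmonds–Karp algorithm rather than Dinic's, and the capacities are arbitrary stand-ins for energy-function weights, not values from the cited papers.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds–Karp maximum flow on a dense capacity matrix."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:          # no augmenting path left: flow is maximal
            break
        # Find the bottleneck capacity along the path, then augment
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v] - flow[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            flow[parent[v]][v] += bottleneck
            flow[v][parent[v]] -= bottleneck   # residual (backward) capacity
            v = parent[v]
        total += bottleneck
    return total

# Toy graph: node 0 = source, node 3 = sink, nodes 1–2 stand in for pixels
cap = [[0, 3, 2, 0],
       [0, 0, 1, 2],
       [0, 0, 0, 3],
       [0, 0, 0, 0]]
```

By the max-flow/min-cut theorem, the value returned equals the weight of the minimum cut separating source from sink, which is how the energy function is minimized in these methods.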
Compared with traditional change detection methods, cosegmentation change detection has the following advantages:
Cosegmentation change detection can adequately overcome the “salt and pepper” phenomenon compared with the pixel-based change detection method;
It can generate a multitemporal change object with a consistent boundary compared with the object-based detection method;
Cosegmentation considers the image information, such as spectrum and texture, and mines the spatial neighborhood information between pixels.
However, the network flow diagram constructed in the minimum cut/maximum flow method of cosegmentation change detection takes each pixel as a node in the graph. Thus, the number of algorithm iterations is closely related to the total number of pixels in the image. If the image contains more than 1000 × 1000 pixels, the graph has on the order of a million nodes and even more edges, and the number of iterations increases greatly, reducing the operational efficiency of the algorithm. In this paper, superpixel segmentation is introduced to make the method suitable for the change detection of large scenes and to enhance its practicability.
A superpixel is a homogeneous region composed of adjacent pixels with similar texture, brightness, spectral value, and other properties. Superpixel segmentation divides homogeneous adjacent pixels into larger units. This technique is usually used as a preprocessing step for segmentation and reduces redundancy in the image. The concept of a superpixel was first proposed by Ren and Malik [
29] in 2003. In accordance with different principles, Achanta et al. [
17] classified superpixel segmentation into two types according to their principles: those based on graph theory and those based on gradient descent. Superpixel segmentation based on graph theory regards the entire image as a network flow diagram, each pixel as a node, the spatial adjacency relationships between pixels as edges, and the characteristics of adjacent pixels as edge weights. Different segmentation criteria are used to segment the image after the network flow diagram is constructed [
30,
31]. Gradient descent-based superpixel segmentation iteratively refines an initial clustering of pixels by gradient descent until the iteration error converges or falls below a certain threshold. Several methods based on gradient descent have been developed. These methods include the mean shift method proposed by Comaniciu et al. [
19] in 2002, superpixels extracted via energy-driven sampling (SEEDS) segmentation proposed by Van et al. [
32] in 2012, and simple linear iterative clustering (SLIC) proposed by Achanta et al. [
17].
Different superpixel segmentation algorithms have their own advantages and disadvantages, and no optimal segmentation algorithm is suitable for all cases. Superpixel segmentation is developing continuously. Stutz et al. [
33] reviewed 28 state-of-the-art superpixel segmentation algorithms and used five different datasets and a set of metrics to evaluate them. Six algorithms were recommended for practical use based on their performance in boundary recall, undersegmentation error, explained variation, and stability. The six algorithms were extended topology preserving segmentation (ETPS) [
34], SEEDS, entropy rate superpixels (ERS) [
35], contour relaxed superpixels (CRS) [
36], eikonal region growing clustering (ERGC) [
37], and SLIC. In this study, pixels in the network flow diagram were replaced by superpixels; therefore, compact and highly uniform superpixels coherent with image boundaries were desirable to easily establish the neighborhood relationship among superpixels. The SLIC method transforms the red, green, and blue (RGB) color image into the LAB color model proposed by the Commission Internationale de l’Eclairage (CIE) and combines the result with the XY coordinates to form a five-dimensional feature vector. A color and spatial distance measure is then constructed from the LABXY five-dimensional feature vector, and the image is locally clustered by simple linear iteration to generate uniform superpixels. The superpixels generated are compact and roughly equally sized, with a controllable number. This approach is simple and easy to implement with few input parameters. Thus, SLIC was adopted as the superpixel segmentation algorithm in this study after analyzing the advantages and disadvantages of these methods.
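The LABXY clustering described above can be condensed into a short NumPy sketch. For clarity, this toy version works on a single band, compares every pixel to every center (real SLIC limits the search to a 2S × 2S window around each center), and omits the gradient-based center perturbation; it illustrates the SLIC distance measure D = sqrt(d_color^2 + (m/S)^2 * d_spatial^2) and is not the implementation used in this study.

```python
import numpy as np

def slic_toy(img, S, m, n_iter=5):
    """Toy SLIC-style clustering of a single-band image.
    The feature vector per pixel is (intensity, x, y); the distance follows
    SLIC: D = sqrt(d_color**2 + (m / S)**2 * d_spatial**2)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Initialize cluster centers on a regular grid with step S
    centers = np.array([[img[y, x], x, y]
                        for y in range(S // 2, h, S)
                        for x in range(S // 2, w, S)], dtype=float)
    for _ in range(n_iter):
        # Distance of every pixel to every center (full search; toy only)
        d_c = img[None, :, :] - centers[:, 0, None, None]
        d_s = np.hypot(xs[None] - centers[:, 1, None, None],
                       ys[None] - centers[:, 2, None, None])
        labels = np.argmin(np.sqrt(d_c ** 2 + (m / S) ** 2 * d_s ** 2), axis=0)
        # Move each center to the mean feature of its members
        for k in range(len(centers)):
            sel = labels == k
            if sel.any():
                centers[k] = [img[sel].mean(), xs[sel].mean(), ys[sel].mean()]
    return labels

# Demo: a 20 x 20 image with two flat halves separates cleanly
img = np.zeros((20, 20))
img[:, 10:] = 255.0
labels = slic_toy(img, S=10, m=10)
```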
This paper has three main contributions. The first is the proposal of a superpixel cosegmentation method for change detection in satellite remote sensing images, for the first time to the best of the authors’ knowledge. Superpixel segmentation and cosegmentation are advanced computer vision methods that can enrich the change detection algorithms used in the remote sensing community. The second is the use of superpixels as primitives for cosegmentation, thereby greatly improving the efficiency of the algorithm. To obtain a unified comprehensive superpixel segmentation boundary of the multitemporal images, superpixel images of different phases were superimposed to extract the inconsistent parts, and the boundaries of the superpixels were adjusted. The third is the use of Chinese GF-1 and Landsat TM images to carry out superpixel cosegmentation experiments, with the results compared to analyze the applicability of the algorithm.
5. Conclusions
In this study, a superpixel cosegmentation change detection method was developed to address the low efficiency of the cosegmentation change detection method. The introduction of superpixels greatly improved the speed of cosegmentation change detection and expanded the size of images that can be processed. The accuracy of the change detection results remained at approximately 0.8.
The SLIC superpixel segmentation algorithm was integrated into the cosegmentation change detection algorithm. As the first step of the algorithm, multitemporal images were segmented into superpixels. The optimal values of two important parameters in superpixel segmentation (compactness and segmentation step size) were determined by conducting experiments on two images with different spatial resolutions. The superpixel images of multiple phases were superimposed to extract the differing parts as new superpixels. A cosegmentation change detection energy function based on superpixels was constructed and divided into two terms: a change feature term and an image feature term. The two feature terms were calculated with superpixels as the basic unit. Each superpixel was taken as a node to construct the network flow diagram, and the change and image features in the energy function were taken as the weights of the corresponding edges in the network flow diagram. The neighborhood relationship between superpixel nodes in the network flow diagram was obtained by calculating the distances between superpixel centroids and then selecting the four shortest centroid distances as the neighborhood of the current superpixel. The accuracies of the experimental results based on pixels, objects, and cosegmentation were evaluated and compared with each other using a confusion matrix. The advantages and disadvantages of the four methods were summarized.
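The centroid-based neighborhood rule summarized above can be sketched as follows. The helper name is hypothetical, and the brute-force distance matrix is only practical for a few thousand superpixels; a k-d tree would scale better.

```python
import numpy as np

def four_nearest_neighbours(centroids):
    """Indices of the four closest other centroids for each superpixel,
    following the rule of keeping the four shortest centroid distances."""
    c = np.asarray(centroids, dtype=float)          # shape (n, 2): (x, y)
    d = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                     # a node is not its own neighbour
    return np.argsort(d, axis=1)[:, :4]

# Demo: centroids on a regular 3 x 3 grid spaced 10 pixels apart
grid = [(x, y) for y in (0, 10, 20) for x in (0, 10, 20)]
nbrs = four_nearest_neighbours(grid)
# The centre node (index 4) links to its four orthogonal neighbours
```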
The proposed superpixel cosegmentation change detection method overcame the shortcomings of the cosegmentation change detection method and obtained good results. However, this method has some shortcomings and deficiencies to be improved.
(1) At present, the execution time of the superpixel cosegmentation change detection algorithm is still long. The current algorithm was naïvely coded without any optimization. The two main factors restricting the execution speed are the algorithm itself and its serial execution mode. The algorithm efficiency can be improved by graphics processing unit (GPU) parallel computation. Analyzing each step of the algorithm is necessary. For example, in the process of cosegmentation, the calculations of the image features of the T1 and T2 images are independent. For this task, dual GPUs can be used to speed up the process, with each GPU responsible for the feature calculation of one image. Moreover, the calculation for each superpixel can be mapped to a thread and then assigned to a stream processor. In this way, a large number of GPU stream processors can be used to achieve large-scale thread parallelism and reduce runtime. In addition to parallel processing, appropriate data structures can be selected to reduce loop nesting in the code and thus reduce the time and space complexity of the cosegmentation method.
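The independence of the two per-image feature computations noted in (1) can be shown with a thread pool on the CPU; the feature function here is a hypothetical stand-in, and a real GPU port would map each computation to its own device instead.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def image_feature_term(img):
    """Hypothetical stand-in for the per-image feature computation."""
    return img.mean(axis=0)          # e.g., a per-pixel band average

# The T1 and T2 feature terms do not depend on each other, so they can
# be computed concurrently; illustrative arrays stand in for the images
t1 = np.ones((3, 8, 8))
t2 = np.zeros((3, 8, 8))
with ThreadPoolExecutor(max_workers=2) as pool:
    f1, f2 = pool.map(image_feature_term, [t1, t2])
```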
(2) The applicability of this method decreases with the decrease in the spatial resolution of the image. In this study, this method was less suitable for images with a spatial resolution of 30 m or lower, whereas it was more suitable for images with a spatial resolution of 16 m or higher. The specific scope of application needs to be further studied and determined. Improving the algorithm and expanding its application scope are necessary.
(3) The change feature term of the energy function is still imperfect. Superpixels are used as the basic processing unit, but only the spectral values are considered. Thus, the result is influenced by the “same objects with different spectra, different objects with the same spectrum” phenomenon. The change feature term of the energy function largely influences the results. A suitable and efficient algorithm needs to be developed.
(4) Much information can be mined from a superpixel image. Potential features include the NDVI and normalized difference water index, texture features, shape features, and superpixel spatial relationships. These features can be added to the image feature term of the energy function or used as new features in segmentation to improve the accuracy of the results.
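As one example of (4), a zonal mean of NDVI per superpixel could serve as an extra feature; the helper name and band layout below are assumptions, not part of the method as implemented.

```python
import numpy as np

def mean_ndvi_per_superpixel(red, nir, labels):
    """Mean NDVI inside each superpixel; NDVI = (NIR - Red) / (NIR + Red).
    `labels` holds superpixel ids 0..K-1 from the segmentation step."""
    ndvi = (nir - red) / np.clip(nir + red, 1e-6, None)
    k = int(labels.max()) + 1
    sums = np.bincount(labels.ravel(), weights=ndvi.ravel(), minlength=k)
    counts = np.bincount(labels.ravel(), minlength=k)
    return sums / np.maximum(counts, 1)

# Demo: two superpixels, one vegetated (high NIR) and one not
red = np.array([[0.1, 0.1], [0.5, 0.5]])
nir = np.array([[0.5, 0.5], [0.1, 0.1]])
labels = np.array([[0, 0], [1, 1]])
means = mean_ndvi_per_superpixel(red, nir, labels)
```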
(5) Remote sensing (RS) images reflect only the instantaneous state of the Earth’s surface at the time of sampling. The “same objects with different spectra, different objects with the same spectrum” phenomenon and seasonal phase differences lead to interference. Consequently, the outcome of change detection will include spurious changes. Several studies [
40] demonstrated that more accurate change detection results are obtained by integrating auxiliary information with RS images. The change information obtained from the images is only an initial result, and further processing combined with other multisource knowledge is needed to obtain satisfactory results.
This paper was concerned with the change detection of paired images from different time points. At present, remote sensing time series analysis has become an active research field. Cosegmentation, as a weakly supervised segmentation method, can simultaneously segment objects of common interest in multiple images. In the future, it can be used to extract changes from time series images and more efficiently analyze trends in land-cover change.