1. Introduction
Land cover information is fundamental to many earth studies, such as natural resources management, urban planning, and land degradation analyses. Remote sensing (RS) is widely recognized as effective input for land cover mapping and change detection. In traditional land cover mapping, each pixel is assigned to a single class. However, because of the spectral resolution of RS images, the size and regular shape of pixels, and the heterogeneity of the earth’s surface, it is inevitable that more than one land cover class appears in one pixel. The accuracy of land cover classification results estimated from RS imagery is confronted by challenges [
1]. Spectral unmixing has been proposed to overcome the challenge of mixed pixels, and the proportional abundance of each class in a pixel can be estimated based on the spectral signatures of endmembers [
2]. However, using these methods, the spatial position of each class within a pixel is still unknown. To obtain the spatial distribution of land cover in mixed pixels, super-resolution mapping (SRM) was proposed by Atkinson, and using this method a fine-scale spatial resolution land cover map can be achieved [
3,
4,
5,
6].
In the past two decades, a large number of SRM approaches and applications have been developed. Spatial dependence is often used as the basic assumption to derive the fine-scale land cover pattern. The within-pixel spatial location of land cover classes is determined by maximizing the spatial dependence between the classes in the subpixels and their neighboring pixels or subpixels [
7,
8,
9,
10]. The spatial dependence between a subpixel and pixels may be defined as the product of the inverse distances between each subpixel and its neighboring pixels of equal class, possibly using the class fraction values of the neighboring pixels as weights. Methods of this type include the subpixel/pixel spatial attraction model (SASPM) [
11], learning-based algorithms [
12], spatial interpolation methods (using radial basis functions), and vectorial boundary methods [
3]. Several finer land cover results, which contain waterline and burned area, can be obtained by this kind SRM methods [
13,
14]. The spatial dependence between subpixels is calculated using an optimization strategy that switches the positions of subpixels until the spatial dependence of subpixels between neighboring pixels of the same type is maximized. This type of SRM solution is achieved by using numerical optimization methods, such as the Hopfield neural network [
15,
16,
17,
18], pixel swapping algorithm [
19,
20,
21], maximizing posteriori method, genetic algorithm [
14], and particle swarm optimization [
4].
Although spatial dependence is a suitable strategy to simulate the distribution of the subpixel locations at fine-scale spatial resolution, spatial heterogeneity is common on the earth’s surface, which may cause the spatial dependence of subpixels to be affected by irregular neighbors of different orientations [
15,
22]. Several SRM methods have been proposed to manage spatial heterogeneity. These SRM methods can be grouped into three categories. The first category considers prior information about the specific land surface structure, and solution approaches convert this prior information to spatial dependence constraints. The prior orientation of buildings, farmland, and object boundaries are extracted and considered as input or constraints for these SRM methods [
22,
23]. Furthermore, SRM can be performed separately for different types of geo-objects, and then combined in appropriate order [
24]. The second category consists of geostatistical methods, which are used to manage spatial heterogeneity in different directions. The semi-variogram model between coarse fractional information and class probability of subpixel is derived and used to simulate the subpixel locations [
25,
26,
27]. The third category makes use of auxiliary data. The auxiliary data can be categorized into finer images, multi-temporal difference coarse images and historic fine-scale land cover maps, and these are incorporated with coarse fractional information to estimate the fine-scale proportions of subpixels or used to directly swap subpixel positions [
28,
29,
30,
31].
For a homogeneous area with a simple and areal geo-object, spatial dependence-based SRM methods can obtain an accurate finer land cover map, and they have good robustness. Moreover, more heterogeneous information has be considered in the SRM methods when managing complex land surface. As mentioned in the above paragraph, results by SRM when combined with auxiliary data or information are excellent. More consideration needs to be focused on how to extrapolate to other areas, which can accelerate robustness and applicability of SRM for different geo-objects.
With the emergence of high-volume labeled data, high-performance computing, and state-of-the-art network structures, deep learning has demonstrated great advantages for image recognition. Specifically, the convolution neural network (CNN), which can automatically extract spatial features from image, has been shown to be successful for RS image applications, such as image classification, pixel classification and enhancement. RS scene classification is an important application of high-resolution spatial RS, where patches of an RS image are classified as exclusive classes. With the advantage of intrinsic feature extraction from RS patches, a CNN is suitable for scene classification. To overcome the shortage of samples when performing scene classification using a CNN, several sample expansion strategies have been adopted, such as ImageNet pre-trained models and self-labeling techniques [
32]. Additionally, the learned features of different scales or nets can be combined to improve performance [
33,
34]. Another import application of a CNN for RS is pixel classification, which is commonly used for land cover or land use mapping from RS. The encoder-decoder CNN model is basic model, which is a down-sampled-then-up-sampled architecture [
35], and multi-scale features are extracted to maintain boundary information and reduce the categorical ambiguity [
36,
37,
38]. Mohammadimanesh et al. [
39] proposed a land cover mapping method from PolSAR data, where an Inception module and Residual module were adopted to extract more feature. Furthermore, several land cover mapping methods based on multi-model deep learning were proposed when using multi-temporal and multi-source data. Qiu et al. [
40] proposed a land use mapping method from Sentinel 2 image, where the CNN was used for modelling the relationship between a multi-temporal Sentinel 2 images and corresponding land use classes. Interdonato et al. [
41] proposed a dual view point deep learning model to map land cover from Sentinel 2 imagery, where a recurrent neural network and CNN were used to first extract temporal and spatial feature and combine these for classification. The third important application of the CNN in RS is image enhancement, where the CNN is used to construct a nonlinear relationship between the original and target image. Super-resolution construction is a typical example of this application, and several high spatial images have been derived from Landsat and Sentinel 2 images [
42].
As mentioned above, the CNN has the advantage of extracting the intrinsic spatial features of RS images, and several CNN-based RS applications have achieved good performance. The ability of spatial feature extraction is suitable for SRM. Inspired by this, a CNN-based SRM method (
) is proposed in this paper. The main objectives of this research are as follows: (1) To propose a CNN-based SRM method; (2) to use feature visualization to illustrate how a CNN captures spatial features; (3) to analyze the performance of the proposed method for different geo-objects; and (4) to demonstrate the advantage of the CNN-based SRM method when compared with traditional methods. The remainder of this paper is organized as follows: In
Section 2, the method is described. In
Section 3, the details of data used are presented. The results are presented in
Section 4. Several analyses are done in
Section 5. Finally, the conclusions are stated in
Section 6.
4. Results
In the prediction stage, the original high-resolution RS images from the Vaihingen and Potsdam datasets were first blurred using a Gaussian filter and then down-sampled by a factor of four. The down-sampled images (coarse spatial resolution) were then submitted to the baseline SRM methods and proposed method. For the baseline SRM methods, fraction images were first obtained using an SVM soft classification method with selected samples, and a fine-scale land cover map was created using VBSPM and SASPM. For the method, a class-affiliation probability at a fine-scale spatial resolution was created directly using the learned weights, and a fine-scale land cover map was obtained by selecting the class with the highest probability.
4.1. Results for Vaihingen Dataset
The original image of “area34” was selected as a test image for the Vaihingen dataset because it was sufficiently far from the training areas. The fine-scale land cover maps obtained by SASPM, VBSPM and
are shown in
Figure 3. An accuracy assessment for each map was performed by comparing entire pixels of resulting maps with the reference land cover map. The confusion matrices of all the methods are presented in
Table 4,
Table 5 and
Table 6.
Figure 3 clearly shows that the spatial distribution of the land cover maps from SASPM, VBSPM, and
were similar to those of the reference map, where
building locates along with
background and was surrounded by
low vegetation and
tree. Generally, the
result was more similar to the reference map when compared with SASPM and VBSPM. However, the detailed spatial distribution in several subareas was obviously different from that of the reference map, particularly for
low vegetation and
tree.
The accuracy assessments of the three simulated land cover maps confirmed the aforementioned findings. The OAs of the three results were all above 77%, which demonstrates the good fitness between the simulated results and reference map. The OA of surpassed that of the two baseline methods. It was 5% and 6% higher than that of VBSPM and SASPM, respectively.
Detailed information about the class-specific accuracy supports the above results. The PA and UA of low vegetation were relatively lower than those of other classes, the average class-specific accuracy of low vegetation was approximately 0.62. The main reason for the relatively low accuracy was that low vegetation was often misclassified as tree. The average misclassification was 23% for PA and 15% for UA. The background class had the best performance, with an average PA of 0.92. Average PA of building and tree class were 85% and 84% respectively, which showed a good performance.
It could be concluded from the accuracy assessment that
performed better than SASPM or VBSPM for most class-specific accuracies. A detailed comparison of the three SRM methods was shown in the two zoom-in areas in the second and third rows of
Figure 3. In the zoom-in areas, the
image had less “salt–pepper” noise than the SASPM and VBSPM images. Moreover, the spatial distribution of the
image agrees more with the reference map.
4.2. Result of Potsdam Dataset
The test image for the Potsdam dataset was “area_7_11”. The fine-scale land cover maps and accuracy assessment of SASPM, VBSPM, and
were obtained using the same approach used for the Vaihingen dataset. The simulated fine-scale land cover and accuracy assessment results were shown in
Figure 4 and
Table 7,
Table 8 and
Table 9.
Figure 4 shows that
building was mainly located in the left-upper part of the area and was separated from a main road (
background). Generally, all classes of the simulated land cover maps of SASPM, VBSPM and
had a similar structure to the reference map. Few differences occurred between the three simulated results.
Although the OA of the three simulated results was greater than 80%, the result of improved slightly compared with the other methods. It had a 2% and 3% higher OA than VBSPM and SASPM, respectively. Moreover, the class-specific accuracy of was always higher than those of SASPM and VBSPM, which demonstrates the ability of to better model the nonlinear relationship between the coarse image sensing image and the fine-scale land cover map.
Almost all class-specific accuracies of the three simulated results were higher than 80%, except for PA of building. The details of the confusion matrices of the three results showed that the pixels misclassified as background contributed to the low accuracy. However, PA of building was the highest among the three classes, and equaled 91%. PA of vegetation and background were almost equal, which was 82%.
Similar to the Vaihingen dataset, a detailed comparison of the three SRM methods was shown in two zoom-in areas in the second and third rows of
Figure 4. In the zoom-in area, the
image had less “salt–pepper” noise than the SASPM and VBSPM images. Moreover, the spatial distribution of the
result was more similar to that of the reference map.
6. Conclusions
Inspired by the success of the CNN, when dealing with image classification, segmentation, and super resolution construction, a CNN-based SRM method () was proposed in this paper. An encoder-decoder CNN network was used to simulate the nonlinear relationship between a coarse image and a fine-scale land cover map. Two experiments were conducted, using the Vaihingen and Potsdam datasets, to compare with baseline SRM methods. The results demonstrated that achieved an improvement of 5% or 6% for the OA on the Vaihingen dataset, and 2% or 3% on the Potsdam dataset. In addition to the accuracy improvement, visual checks showed that was more similar to the reference map when compared with SASPM and VBSPM.
Despite showing advantages, several deficiencies occurred and limited its applicability for real-world finer land cover mapping. In order to cope with these issues and boost the applicability, future work is needed on networks selection, training sample collections, and multi-scale integration.