3.1. Dataset Description and Design of Experiments
The proposed approach was validated over different areas located in the Trentino region in the north of Italy (
Figure 6). These areas show interesting properties from the point of view of orographic conformation and environmental variety. Over a relatively small region it is possible to find: (i) flat areas including valuable apple orchards and vineyards, as well as urban, sub-urban and industrial areas with different density and structure; and (ii) hill and mountain environments with a variety of tree species.
Three bi-temporal datasets, each made up of a different pair of images among one QuickBird (QB), two WorldView-2 (WV-2) and one GeoEye-1 (GE-1) acquisitions, were constructed over the sample areas (yellow squares in
Figure 6). The three datasets were selected such that different types of change are represented. In particular, dataset 1 shows the transition from forest area to several types of vegetation; dataset 2 shows transitions among different phenological states of crop areas; and dataset 3 shows transitions from vegetation to bare soil (and vice versa) and changes in roofs and roads around an urban area. These datasets allow us to evaluate the complexity of working with multisensor VHR optical images. Details about the three datasets are provided in
Table 1. The three satellites show some remarkable differences in view angle, spectral resolution, number of bands and spatial resolution.
The QB and GE-1 images have four multispectral bands, whereas WV-2 has eight. The spatial resolution of the QB image is 0.6 m for the panchromatic band and 2.4 m for multispectral bands, whereas WV-2 and GE-1 offer a higher spatial resolution in both panchromatic and multispectral bands with 0.5 m and 2 m, respectively.
Table 2 summarizes the spectral and spatial characteristics of the QB, WV-2 and GE-1 images. Because of the different spatial resolutions, the images have different sizes in pixels despite covering the same surface. The size of the QB image in
Figure 6 is 8674 × 6300 pixels, whereas the size of the WV-2 and GE-1 images is 10297 × 7139 pixels. Thus, pixel-by-pixel comparison cannot be directly applied, since the same pixel coordinates in the two images do not correspond to the same position on the ground. Concerning spectral resolution, we can observe that the four primary multispectral bands of QB, GE-1 and WV-2 are acquired over similar spectral ranges (e.g., red), but not fully identical ones (e.g., blue). Similar considerations hold for the green and NIR bands.
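Pixel-by-pixel comparison across the two grids therefore requires mapping through ground coordinates rather than matching indices directly. A minimal sketch of this mapping (the projection origin and the common origin for both grids are illustrative assumptions, not actual product metadata):

```python
# Minimal sketch of why pixel-by-pixel comparison fails across grids with
# different ground sample distances (GSDs). Origin and GSD values are
# illustrative assumptions only.

def pixel_to_ground(row, col, origin_x, origin_y, gsd):
    """Ground coordinates of a pixel's upper-left corner; y decreases with row."""
    return origin_x + col * gsd, origin_y - row * gsd

def ground_to_pixel(x, y, origin_x, origin_y, gsd):
    """Inverse mapping from ground coordinates to (row, col) indices."""
    return int(round((origin_y - y) / gsd)), int(round((x - origin_x) / gsd))

ORIGIN_X, ORIGIN_Y = 660000.0, 5100000.0   # hypothetical UTM upper-left corner
GSD_QB, GSD_WV2 = 0.6, 0.5                 # pansharpened resolutions in meters

# Pixel (1000, 1000) of the 0.6 m QB grid lies at ground offset (600 m, 600 m),
# which is pixel (1200, 1200) of the 0.5 m WV-2/GE-1 grid -- not (1000, 1000).
x, y = pixel_to_ground(1000, 1000, ORIGIN_X, ORIGIN_Y, GSD_QB)
print(ground_to_pixel(x, y, ORIGIN_X, ORIGIN_Y, GSD_WV2))  # -> (1200, 1200)
```

The same index thus drifts apart on the ground as row and column grow, which is why resampling to a common grid is needed before any per-pixel analysis.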
In order to apply the proposed approach, we define the tree of radiometric changes specific to the considered study areas (a single joint tree is provided for the three datasets), apply mitigation, extract suitable features and perform change detection. As a first step, we define the specific tree of radiometric changes for the considered problem by starting from the general tree given in
Figure 1 [
5]. The three datasets show changes due to the acquisition conditions and changes occurring on the ground. With regard to the changes occurring on the ground, there are changes in the phenological state of the vegetation (e.g., in the radiometry of some crop yards, trees and small roads between crop yards, and re-vegetation) and changes due to anthropogenic activities (e.g., changes in roads, deforestation, changes in roofs and crop planting). It is important to clarify that even though the types of changes can be visually separated by photointerpretation, we do not have enough information to assign a precise “from–to” label to them.
Concerning the changes due to the acquisition conditions, both atmospheric conditions and acquisition system effects are present. The atmospheric effects are mitigated by means of atmospheric corrections [
6]. The acquisition system effects are related to the type of sensor and to the sensor view angle. The view-angle differences generate small differences in the appearance of objects, leading to geometric distortions, and thus to residual misregistration even after proper alignment of the images, and to spectral differences when tall buildings are present. We can also see some differences in shadows, which become more critical when high buildings, structures or reliefs are present.
Both the sensor-type and the view-angle changes are non-relevant from the application viewpoint. Therefore, they are explicitly handled before detecting the changes occurring on the ground and tuning the final CD map. Changes such as the ones due to the sensor acquisition mode are not considered, since we are working with passive sensors only. Changes like the ones due to natural disasters or environmental conditions are also ignored, since such events did not occur in the considered study area. According to this analysis and to the general taxonomy in
Section 2, the tree of radiometric changes for the considered problem becomes the one in
Figure 7.
Once the radiometric tree was defined, we performed spectral and geometric difference mitigation. All images were provided by the DigitalGlobe Foundation in the context of the “MS-TS—Analysis of Multisensor VHR image Time Series” project [
54]. Conversion from DNs to At-Surface Reflectance (ASR) was conducted before delivery by means of the Atmospheric Compensation (AComp) algorithm [
6,
55,
56]. Given the orography of the study area and the possible distortions, we applied orthorectification by using a DEM obtained from LiDAR data [
57]. Further distortions appear in dataset 1, since it is located in a mountain area, and in dataset 3 because of the presence of buildings. Additional pixel-to-pixel problems are also observed due to the different view angles, and thus co-registration should be applied.
In order to achieve a better co-registration, PS was applied by means of the Gram–Schmidt method, as implemented in the ENVI software package [
58]. After PS, the spatial resolution of the multispectral bands is 0.6 m for QB and 0.5 m for GE-1 and WV-2. Co-registration was carried out over the QB–WV-2 and WV-2–GE-1 pairs, covering the whole study area in
Figure 6, by using a second-order polynomial function. For the QB 2006 and WV-2 2010 pair, 79 uniformly distributed Ground Control Points (GCPs) were selected, whereas 68 uniformly distributed GCPs were selected for the WV-2 2011 and GE-1 2011 pair. The WV-2 2010 image was resampled during co-registration by means of nearest neighbor interpolation.
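The GCP-based co-registration step can be sketched as a least-squares fit of a second-order polynomial followed by nearest neighbor resampling. This is a simplified stand-in for the actual processing chain: the GCP values below are synthetic, and `fit_poly2`/`warp_nearest` are hypothetical helper names.

```python
import numpy as np

def poly2_terms(x, y):
    """Second-order polynomial terms [1, x, y, xy, x^2, y^2]."""
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

def fit_poly2(dst_xy, src_xy):
    """Least-squares fit of the mapping from output (dst) to input (src) pixels."""
    A = poly2_terms(dst_xy[:, 0], dst_xy[:, 1])
    coeffs, *_ = np.linalg.lstsq(A, src_xy, rcond=None)
    return coeffs                                  # shape (6, 2): src x and y

def warp_nearest(img, coeffs, out_shape):
    """Backward warping with nearest neighbor resampling."""
    rows, cols = np.indices(out_shape)
    A = poly2_terms(cols.ravel().astype(float), rows.ravel().astype(float))
    src = np.rint(A @ coeffs).astype(int)          # predicted (x, y) in input
    sx, sy = src[:, 0], src[:, 1]
    valid = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros(out_shape, dtype=img.dtype)
    out_flat = out.reshape(-1)
    out_flat[valid] = img[sy[valid], sx[valid]]
    return out

# Synthetic check: with 79 GCPs related by a pure shift, the fit recovers it.
rng = np.random.default_rng(0)
dst = rng.uniform(0, 100, (79, 2))                 # 79 GCPs, as for QB-WV-2
src = dst + [3.0, -2.0]                            # hypothetical shift
coeffs = fit_poly2(dst, src)
```

The backward formulation (output pixel to input pixel) is what allows a simple nearest neighbor lookup, matching the resampling choice described above.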
Figure 8 shows the pansharpened multisensor VHR QB, GE-1 and WV-2 images after applying spectral and geometric mitigation (first and second columns, respectively).
Datasets 1 and 2 show a common spatial resolution of 0.6 m and a size of 640 × 640 pixels, whereas dataset 3 shows a common spatial resolution of 0.5 m and a size of 1800 × 1800 pixels. Datasets 1, 2 and 3 appear in the first, second and third rows of
Figure 8, respectively. In order to perform qualitative and quantitative analysis, a reference map for datasets 1 and 2 was defined by photointerpretation and by exploiting prior knowledge of the scene, as no ground truth was available (see
Figure 8c,f, showing 332,414 and 280,928 unchanged pixels (white color) and 77,186 and 128,672 changed pixels (different colors), respectively). For dataset 3, considering the extent of the area and the fact that we have no complete knowledge of the changes occurring on the ground, it was not possible to derive a complete reference map. Thus, quantitative analysis was based on 62,808 pixels marked as changed and 6263 marked as unchanged, selected by photointerpretation. For comparison purposes, a false color composition of the two acquisitions is provided, in which green and fuchsia areas represent changes (
Figure 8i). Changed pixels in the reference map include changes of interest only. For dataset 1, changes from (i) forest to bare soil; (ii) forest to grass; (iii) bare soil to grass and (iv) bare soil to a road are identified. For dataset 2, changes from (i) dense vegetation to sparse or light vegetation and vice versa; and (ii) bare soil to vegetation are present. For dataset 3, changes are from (i) bare soil to vegetation, both dense and sparse; (ii) one roof color to another; and (iii) old to new roads.
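As a quick consistency check, the unchanged and changed pixel counts of the two reference maps sum exactly to the 640 × 640 image size of datasets 1 and 2:

```python
# Sanity check: unchanged + changed pixels equal the 640 x 640 dataset size.
size = 640 * 640                                   # 409600 pixels
counts = {1: (332414, 77186), 2: (280928, 128672)}
for dataset, (unchanged, changed) in counts.items():
    assert unchanged + changed == size
print(size)  # -> 409600
```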
Once spectral and geometric mitigation was achieved, mitigation of the residual acquisition-related effects was performed at the level of feature extraction. The selection of the features is therefore designed to handle both these residual effects and the changes occurring on the ground. According to the tree of radiometric changes (
Figure 7), the residual acquisition-related effects might be due to the view-angle differences, resulting in possible shadows and/or registration noise, whereas the changes occurring on the ground include the three types of changes of interest identified above. Residual effects due to shadows, caused by vegetation or buildings, were detected by means of the method in [
59], whereas those due to registration noise are negligible. The changes of interest were detected by employing TC and OrE features.
Three main TC features (i.e., Brightness, Greenness and Wetness) and three OrE features (Crop Mark, Vegetation and Soil) were studied because of their sensitivity to soil content, to transitions from/to soil and vegetation, to canopy moisture, and to anthropogenic activities. Thus, we expect them to properly detect the different changes in the study area, with the exception of some transitions between green areas that do not show up in TC features. This is the case of datasets 1 and 3, where transitions from forest to grass and from crop to grass are misdetected, since the difference lies more in texture than in TC features. In order to evaluate and compare the proposed approach, experiments carried out directly on ASR features are also conducted. The detection of changes is obtained by applying CVA to a 3D TC and OrE feature space. However, other features such as IR-MAD [
24] could also be used, although no approaches for automatic detection with these features in multisensor VHR images exist yet. The drawback of IR-MAD is that it requires end-user interaction to select the components that best represent the specific change of interest and to separate changed from unchanged samples. This is time consuming and makes the approach not fully automatic.
Two experiments were designed: (i) experiment 1 (exp. 1) applies CVA to the ASR band features; and (ii) experiment 2 (exp. 2) applies CVA to TC features (for datasets 1 and 2) or OrE features (for dataset 3, which has a large presence of urban areas). In exp. 1, the first two selected bands are the Near-IR (NIR) and Red (R) bands, given their high spectral sensitivity in the analysis of vegetation and anthropogenic activities. The R bands of QB, GE-1 and WV-2 have quite similar spectral ranges, whereas the NIR ones do not (see
Table 2). Therefore, between the NIR1 and NIR2 WV-2 bands, NIR1 (770–995 nm) was selected, given that its spectral range better matches the QB NIR (760–900 nm) and GE-1 NIR (757–853 nm) band spectral ranges. The third ASR feature to be selected could be either the Green or the Blue band. Empirical experiments showed a slight improvement in the final CD accuracy when using the Green band instead of the Blue one; therefore, the Green band was selected as the third feature. In exp. 2, for datasets 1 and 2, TC features were selected based on: (i) the maximum number of TC features that can be derived for each specific sensor; (ii) the radiometric tree of changes; and (iii) the possibility of comparing multisensor TC features. Thus, four TC features were derived, bounded by the QB properties. In accordance with the radiometric tree, features should be selected that are able to highlight both the phenological and the anthropogenic changes of interest. Based on the level of comparison between the multisensor TC coefficients, and according to the state of the art, only the first three TC features of each sensor show a similar physical meaning. For dataset 3, only three OrE features exist in the literature; they are derived by means of the four main spectral bands of the sensors.
TC and OrE features were derived directly from the spectrally mitigated data by using the coefficients in
Table 3,
Table 4 and
Table 5. For TC, only the coefficients corresponding to the first three TC features are reported. The set of coefficients in [
60] was applied to the QB image. These coefficients are derived from the DN feature space (
Table 3); since no TC coefficients derived from TOA values exist for QB images in the literature, we applied the QB TC coefficients to the QB DN features and compared the derived TC features as higher-level primitives. For the WV-2 image, the coefficients are applied as given in [
61], but to the ASR features instead of the TOA ones, for which they were originally derived (
Table 4). The OrE features were derived by means of the coefficients shown in
Table 5 and as per Equation (1).
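Since both TC and OrE features are linear combinations of the spectral bands, deriving them amounts to one matrix product per pixel. A sketch with a placeholder coefficient matrix (illustrative values only, not the published coefficients of Tables 3–5):

```python
import numpy as np

def linear_features(image, coeffs):
    """Apply an (n_features x n_bands) coefficient matrix to an (H, W, n_bands)
    image, returning an (H, W, n_features) feature stack."""
    h, w, b = image.shape
    assert coeffs.shape[1] == b
    return (image.reshape(-1, b) @ coeffs.T).reshape(h, w, coeffs.shape[0])

# Placeholder 3 x 4 coefficients (Brightness/Greenness/Wetness-like rows for a
# 4-band Blue/Green/Red/NIR sensor) -- NOT the published values.
COEFFS = np.array([
    [0.3, 0.3, 0.3, 0.3],      # "brightness": positive weights on all bands
    [-0.2, -0.2, -0.3, 0.8],   # "greenness": NIR against the visible bands
    [0.1, 0.4, 0.3, -0.6],     # "wetness": visible against NIR
])

asr = np.random.default_rng(1).uniform(0.0, 1.0, (4, 4, 4))  # toy ASR image
tc = linear_features(asr, COEFFS)
print(tc.shape)  # -> (4, 4, 3)
```

The same helper covers the OrE case by swapping in a different coefficient matrix, which is why the per-sensor tables are the only sensor-specific ingredient of this step.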
3.2. Results and Discussion
In order to assess the effectiveness of the mitigation strategy and of the CD approach based on higher-level physical features, CVA was applied by considering the 3D feature space defined above and by means of Equations (2)–(5). We first extracted all the areas that correspond to radiometric changes in the image by thresholding the magnitude variable. The selection of a threshold T over the magnitude proved to be a simple and fast way to separate changed from unchanged pixels, and guaranteed, on average, a good separation among the three dimensions of CVA. T was automatically selected by applying a Bayesian decision rule [
35] and by following the implementation presented in [
62], in which the statistical distribution of the magnitude is approximated by a Gaussian mixture model representing the classes of unchanged and changed pixels. The parameters of the statistical distribution are derived by means of the EM algorithm, and the threshold is selected using a Bayesian minimum cost criterion [
46]. The T values for each of the datasets in the two experiments are shown in
Table 6.
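The magnitude thresholding described above can be sketched end to end: compute the CVA magnitude, fit a two-component Gaussian mixture to it with EM, and place T at the crossing of the weighted class densities. This is a simplified stand-in for the implementation in [62], run on synthetic data:

```python
import numpy as np

def cva_magnitude(f1, f2):
    """CVA magnitude: Euclidean norm of the spectral change vector per pixel."""
    return np.sqrt(((f2 - f1) ** 2).sum(axis=-1))

def em_two_gaussians(x, iters=100):
    """EM for a 1D two-component Gaussian mixture (unchanged vs. changed)."""
    w = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])            # crude but stable initialization
    var = np.array([x.var(), x.var()])
    for _ in range(iters):
        # E-step: posterior probability of each component for each sample
        p = w * np.exp(-((x[:, None] - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = p / np.maximum(p.sum(axis=1, keepdims=True), 1e-300)
        # M-step: re-estimate weights, means and variances
        n = r.sum(axis=0)
        w = n / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n
    return w, mu, var

def bayes_threshold(w, mu, var):
    """Threshold at the crossing of the two weighted class densities."""
    o = np.argsort(mu)
    w, mu, var = w[o], mu[o], var[o]
    t = np.linspace(mu[0], mu[1], 10001)
    d = w * np.exp(-((t[:, None] - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
    cross = np.nonzero(np.diff(np.sign(d[:, 0] - d[:, 1])))[0]
    return t[cross[0]] if cross.size else t[len(t) // 2]

# Synthetic bi-temporal 3D features: mostly unchanged pixels plus a changed patch.
rng = np.random.default_rng(2)
f1 = rng.normal(0.3, 0.02, (100, 100, 3))
f2 = f1 + rng.normal(0.0, 0.02, f1.shape)
f2[:20, :20] += 0.3                              # injected change of interest
mag = cva_magnitude(f1, f2)
w, mu, var = em_two_gaussians(mag.ravel())
T = bayes_threshold(w, mu, var)
change_mask = mag > T
```

The density crossing implements the equal-cost Bayesian decision; an asymmetric cost matrix would simply shift the crossing point.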
Figure 9 shows the 3D histogram of the multispectral difference image for dataset 1, where bigger circles represent higher densities (corresponding to the unchanged samples) and smaller circles represent lower densities (related to the different types of changes). In
Figure 9a,c, it is possible to see how the ASR and TC features are distributed in a coplanar and in a sparse way, respectively. This kind of distribution makes the visual interpretation of the TC feature space easier than that of the ASR one. In fact, from the 3D histograms in
Figure 9, we can see that the coplanar distribution of the ASR features does not allow a good separation between changes of interest and changes of no interest. A similar behavior is observed between the ASR and OrE features. The right column of
Figure 9 presents the changed samples after removing the unchanged ones. Different clusters fit different sectors of the feature space and are associated with different changes. As we move from the ASR to the TC features, or from the ASR to the OrE features, it becomes evident how the different clusters locate around preferred directions and how the number of clusters increases, making their detection and separation more effective. Separation among the different clusters can be performed by automatic or manual methods.
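Grouping changed vectors by their preferred directions can be sketched by expressing each 3D change vector in spherical coordinates and quantizing the angles into sectors (the sector count below is an arbitrary illustrative choice, not a value from the paper):

```python
import numpy as np

def cva_direction(diff):
    """Spherical direction angles of 3D change vectors: azimuth in the
    (f1, f2) plane and elevation toward the f3 axis."""
    x, y, z = diff[..., 0], diff[..., 1], diff[..., 2]
    azimuth = np.arctan2(y, x)                    # in [-pi, pi]
    elevation = np.arctan2(z, np.hypot(x, y))     # in [-pi/2, pi/2]
    return azimuth, elevation

def sector_label(azimuth, n_sectors=8):
    """Quantize the azimuth into n_sectors equal angular sectors."""
    return ((azimuth + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors

# Two toy clusters of change vectors with different preferred directions.
d = np.array([[0.5, 0.0, 0.1], [0.4, 0.05, 0.0],    # cluster A: along +f1
              [0.0, -0.6, 0.1], [0.05, -0.5, 0.0]])  # cluster B: along -f2
az, el = cva_direction(d)
labels = sector_label(az)
print(labels)
```

Vectors sharing a preferred direction fall into the same sector, which is the property the cluster-separation step exploits.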
The last step builds the multiple-change CD maps by means of the extraction-by-cancellation strategy. To this end, each of the clusters in the changed region was selected automatically by means of an adaptation of the TSMO method proposed in [
36]. The TSMO method yields the same set of thresholds as the conventional Otsu method, but greatly decreases the required computation time. This is achieved by introducing an intermediate step based on histogram valley estimations, which is used to automatically determine the appropriate number of thresholds for an image. The final multiple CD maps were built by cancelling the remaining acquisition-related effects and the changes on the ground that are of no interest (e.g., cars in road and parking areas,
Figure 10). As expected, some vegetation changes that affect texture more than TC features are misdetected in datasets 1 and 3. In other words, the selected higher-level physical features are not optimized for those changes, but different higher-level physical features could be selected to properly model most of the types of changes.
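The multilevel thresholding underlying the cluster selection can be illustrated with a brute-force multilevel Otsu criterion; TSMO [36] reaches the same thresholds much faster by first estimating histogram valleys, which this sketch omits:

```python
import numpy as np
from itertools import combinations

def multi_otsu(x, n_thresholds=2, bins=64):
    """Brute-force multilevel Otsu: choose the thresholds that maximize the
    between-class variance of the binned data."""
    hist, edges = np.histogram(x, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    p = hist / hist.sum()
    best, best_cuts = -1.0, None
    for cuts in combinations(range(1, bins), n_thresholds):
        bounds = (0, *cuts, bins)
        var, ok = 0.0, True
        for a, b in zip(bounds[:-1], bounds[1:]):
            weight = p[a:b].sum()
            if weight == 0:            # degenerate partition with an empty class
                ok = False
                break
            mean = (p[a:b] * centers[a:b]).sum() / weight
            var += weight * mean ** 2  # between-class variance up to a constant
        if ok and var > best:
            best, best_cuts = var, cuts
    return [centers[c] for c in best_cuts]

# Synthetic tri-modal magnitudes: one unchanged and two changed clusters.
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(m, 0.03, 300) for m in (0.1, 0.5, 0.9)])
t1, t2 = multi_otsu(x)
```

The exhaustive search over threshold pairs is what TSMO's valley-estimation step replaces, which is where its speed-up comes from.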
In order to perform a qualitative analysis, a comparison of the CD maps with their reference maps (for datasets 1 and 2) and with the set of points collected by photointerpretation (for dataset 3) was carried out. The comparison pointed out the improvement achieved when working with higher-level physical quantities, specifically for transitions from and to bare soil and between different types of crops. This is confirmed by the analysis of dataset 1, where changes are mainly between vegetation and bare soil: TC features outperform ASR features (
Table 7). As expected, both TC and ASR features have similar problems in identifying the change from forest to vegetation. An analogous situation occurs in dataset 2, with changes between different types of crops. In this case, TC outperforms ASR by being able to separate changes C2 and C3, which cannot be discriminated when using ASR features (see
Figure 10d–f).
For datasets 1 and 2, the major improvement is related to the decrease of False Alarms (FA) (see
Table 7 and
Table 8). In dataset 1, the FA correspond to the main road passing through the area and to some of the remaining shadows generated by the tree lines; even though an index was used to remove the shadows, a small percentage of them remained. In dataset 2, the FA correspond mainly to linear structures such as the roads between the different crops. Moreover, it is possible to observe improvements in terms of detection and separation of the different types of changes. The number of Missed Alarms (MA) decreased as well when working with TC features, leading to a better detection of changes, especially when a higher number of changes exists, as in the case of dataset 2. From the quantitative viewpoint, the reduction in both FA and MA is reflected in the Overall Accuracy (OA), which increases by about five percent and seven percent for datasets 1 and 2, respectively (see
Table 7 and
Table 8).
In the case of dataset 3, ASR features poorly separate the C3 and C4 classes, which correspond to transitions from bare soil to sparse and dense vegetation, respectively. The number of MA for class C3 with ASR features is clearly larger than with OrE ones (see
Table 9). Moreover, OrE features are able to detect classes C2 and C7 better than ASR features; C2 and C7 correspond to changes occurring on building roofs and to roads renewed in the studied period. Finally, ASR features do not detect class C8 (a change in the roof color of a building), whereas OrE features do. For the other transitions between vegetation and bare soil, the change classes show fewer MA and FA when the OrE features are used. It is worth noting that changes due to moving cars on roads or in parking areas were considered non-relevant for this study and thus neglected. From the quantitative perspective, the OA obtained with OrE features is about 2% higher than that obtained with ASR features. This improvement can be considered relevant given the complexity of the scene and the larger number of types of changes. Proper higher-level physical features, such as some radiometric indices or texture features, may provide even better results.
Table 7,
Table 8 and
Table 9 show the confusion matrices obtained for each dataset in the two experiments.
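The accuracy figures reported in the tables reduce to simple counts of agreement between the CD map and the reference. A sketch for a binary map (the maps below are synthetic; the actual tables further split the changed class into C1, C2, ...):

```python
import numpy as np

def cd_accuracy(cd_map, reference):
    """False alarms, missed alarms and overall accuracy of a binary CD map."""
    fa = int(((cd_map == 1) & (reference == 0)).sum())  # unchanged flagged as changed
    ma = int(((cd_map == 0) & (reference == 1)).sum())  # changed pixels missed
    oa = float((cd_map == reference).mean())
    return fa, ma, oa

reference = np.zeros((8, 8), dtype=int)
reference[:4, :4] = 1                     # 16 truly changed pixels
cd_map = reference.copy()
cd_map[0, 0] = 0                          # one missed alarm
cd_map[7, 7] = 1                          # one false alarm
print(cd_accuracy(cd_map, reference))     # -> (1, 1, 0.96875)
```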