Next Article in Journal
National Crop Mapping Using Sentinel-1 Time Series: A Knowledge-Based Descriptive Algorithm
Previous Article in Journal
CF2PN: A Cross-Scale Feature Fusion Pyramid Network Based Remote Sensing Target Detection
Previous Article in Special Issue
Probabilistic Mangrove Species Mapping with Multiple-Source Remote-Sensing Datasets Using Label Distribution Learning in Xuan Thuy National Park, Vietnam
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Multisensor and Multiresolution Remote Sensing Image Classification through a Causal Hierarchical Markov Framework and Decision Tree Ensembles

1
DITEN Deptartment, University of Genoa, 16145 Genoa, Italy
2
Inria, Université Côte d’Azur, 06902 Sophia-Antipolis, France
3
Institute on Intelligence and Data, Laval University, Quebec City, QC G1V0A6, Canada
*
Author to whom correspondence should be addressed.
Remote Sens. 2021, 13(5), 849; https://doi.org/10.3390/rs13050849
Submission received: 8 January 2021 / Revised: 18 February 2021 / Accepted: 19 February 2021 / Published: 25 February 2021
(This article belongs to the Special Issue Multi-Modality Data Classification: Algorithms and Applications)

Abstract

:
In this paper, a hierarchical probabilistic graphical model is proposed to tackle joint classification of multiresolution and multisensor remote sensing images of the same scene. This problem is crucial in the study of satellite imagery and jointly involves multiresolution and multisensor image fusion. The proposed framework consists of a hierarchical Markov model with a quadtree structure to model information contained in different spatial scales, a planar Markov model to account for contextual spatial information at each resolution, and decision tree ensembles for pixelwise modeling. This probabilistic graphical model and its topology are especially fit for application to very high resolution (VHR) image data. The theoretical properties of the proposed model are analyzed: the causality of the whole framework is mathematically proved, granting the use of time-efficient inference algorithms such as the marginal posterior mode criterion, which is non-iterative when applied to quadtree structures. This is mostly advantageous for classification methods linked to multiresolution tasks formulated on hierarchical Markov models. Within the proposed framework, two multimodal classification algorithms are developed, that incorporate Markov mesh and spatial Markov chain concepts. The results obtained in the experimental validation conducted with two datasets containing VHR multispectral, panchromatic, and radar satellite images, verify the effectiveness of the proposed framework. The proposed approach is also compared to previous methods that are based on alternate strategies for multimodal fusion.

1. Introduction

Due to the heterogeneity of high or very high resolution (HR/VHR) remote sensing imagery, including optical (e.g., Pléiades, WorldView-3, SPOT-6/7) and synthetic aperture radar (SAR; e.g., COSMO-SkyMed Second Generation, TerraSAR-X, RADARSAT Constellation) data, a challenging problem is the development of image classifiers that can exploit information from multimodal input observations. This multimodal framework allows to take advantage of data associated with different physical characteristics, frequencies, polarizations, and to exploit the advantages of information at different scales, which can be either HR or VHR images or even a combination of the two categories: fine-scale observations capture geometrical details but are noise-sensitive, while coarse-scale observations are characterized by large image regions with a strong robustness to outliers and noise [1,2,3].
A major issue in this framework is the heterogeneity of the data coming from different sensors and the characterization of spatial information associated with different resolutions [4]. Trivial well-known solutions mostly use resampling procedures, which are not computationally demanding, in general, but can produce artifacts (e.g., because of aliasing) [3], and do not manage to explicitly capture the multiresolution structure of the image data.
In this paper, an approach to address the semantic segmentation [5,6], or dense supervised classification, of multiresolution and multisensor images is proposed, based on hierarchical probabilistic graphical models (PGMs), in particular hierarchical latent Markov models [7]. On one hand, Markov models formulated on planar graphs or on multilayer graphs such as quadtrees are powerful and flexible stochastic models in the field of structured output learning [7]. However, a common drawback of many Markov random fields (MRFs) postulated on 2D grids of pixels is that they are generally non-causal. The resulting non-convex problems are addressed with time-consuming iterative inference algorithms such as stochastic optimizers (e.g., simulated annealing) [7] or graph-theoretic techniques [8]. On the other hand, an effective way to model multiresolution information is to adopt a hierarchical MRF on a quadtree, to take into account relations among sites at different scales with a Markov chain. These models are causal and it is possible to use efficient sequential techniques for inference (e.g., the Baum–Welch algorithm [9]), but they do not model explicitly contextual-spatial information at each resolution [10].
In the proposed approach, Markovianity is postulated within each scale according to a causal neighborhood system defined on the pixel lattice by an order relation. This choice makes it possible to jointly address spatial-contextual and multiresolution fusion within a unique joint causal framework. Operatively, within the pixel lattice at each scale, this is addressed either by a one-dimensional scan, combining both a zig-zag trajectory and a Hilbert space-filling curve [11,12], or by a Markov mesh random field, where a neighborhood system is defined on a planar lattice, allowing to formulate causality [11,13,14,15,16].
Multiresolution fusion is intrinsically supported by the topology of the proposed framework, while multisensor (optical and radar) fusion is addressed by the integration of non-parametric ensemble modeling, e.g., decision tree ensembles [17], into the proposed hierarchical Markov model. From this perspective, the developed framework generalizes and completes the preliminary formulations that were presented in the conference papers [18,19,20,21].
The marginal posterior mode (MPM) criterion for inference is analytically formulated and proved for the proposed model. This criterion, when applied to quadtree structures, is based on recursive equations leading to the direct computation of the target marginal posterior distribution. As compared to the aforementioned iterative stochastic or graph-theoretic minimization algorithms for non-causal planar Markov random fields, it allows reducing the computational burden and increasing the efficiency of the inference. It is especially favorable for classification tasks linked to multiresolution models [10].
The main contributions of this paper are twofold. Firstly, a novel framework is proposed to tackle multiresolution fusion and spatial-contextual classification of remote sensing images, based on the aforementioned PGM formulation with respect to a total order relation on each planar pixel lattice involved. Input multiresolution datasets including HR images, VHR images, or both are supported in the proposed approach. In particular, the causality of the whole hierarchical Markov framework is proved in the general case associated with an arbitrary total order relation. Furthermore, the analytical formulation of the MPM criterion for the proposed framework is also proved, and the related conditional independence assumptions are discussed. Secondly, two specific classification algorithms are developed within the proposed framework for input multiresolution and multisensor imagery. These algorithms are based on the definition, on each layer of the quadtree, of either a Markov mesh or a Markov chain with respect to a 1D scan of the pixel lattice. In both methods, the fusion of multisensor imagery associated with different spatial resolutions is naturally accomplished thanks to the quadtree topology. The fusion of multisensor imagery collected at the same spatial resolution is obtained by incorporating decision tree ensembles [22] in both methods. These ensemble approaches, such as random forest [23], rotation forest [24], ExtraTrees [25], and gradient boosted regression trees (GBRT) [22], are characterized by a strong flexibility and a non-parametric formulation, allowing them to be applied to highly heterogeneous input data, such as the ones deriving from multisensor applications [17,26]. In this manner, multisensor and multiresolution fusion are jointly addressed in a unique probabilistic graphical framework. The experimental validation was carried out with two VHR satellite datasets of urban and agricultural areas associated with land cover mapping and, in one case, to flood risk prevention [27]. We recall that, in the previous conference paper in [21], the experimentation on the hierarchical PGM with the Markov chain was conducted with the sole GBRT.
This paper is organized as follows: after a review of the state of the art in the field of multiresolution and multisensor classification in Section 2, the developed methodology is presented in Section 3. First, Section 3.1 presents the hierarchical framework proposed, with the required model assumptions in Section 3.1.1 and the theoretical properties in Section 3.1.2. Then, the two multimodal classification methods developed within the proposed framework are presented in Section 3.2. Section 4 and Section 5 report the experimental results obtained with the two VHR datasets considered and a discussion of the performances of the proposed methods. Conclusions are drawn in Section 6.

2. Previous Work

A major challenge regarding the efficient processing of data in remote sensing is given by the recent increase in availability of image data acquired by different sensors, with their corresponding heterogeneous natures. The aim of multisensor image fusion is to integrate complementary information from several images related to the same scene and collected by distinct sensors (e.g., optical and synthetic aperture radar—SAR, or hyperspectral and LiDAR). In the case of multiresolution fusion, multiple images associated with distinct spatial resolutions are jointly used to benefit from their complementary perspectives (more synoptic view vs. more precise spatial detail). Here, the focus is on multisensor and multiresolution fusion aimed at generating supervised image classification or semantic segmentation results from input multimodal imagery. In general terms, the literature on either multisensor or multiresolution fusion in remote sensing is vast and dates back a few decades [2,28], whereas the literature on joint multisensor and multiresolution classification is scarcer. In this framework, serious issues are both the heterogeneity of the data coming from different sensors and the need for a characterization of the spatial information associated with different resolutions [4]. Well-known solutions mostly use resampling procedures, in general computationally cheap but that may cause artifacts [3] and that do not explicitly aim to benefit from the information carried by multiresolution data. Here, we shall review the state of the art associated with multisensor and multiresolution fusion—jointly—and then we shall focus especially on such fusion approaches developed for classification purposes.
Popular approaches involve wavelet-based algorithms [29]. For example, in [30] a hybrid multiresolution method combining the stationary wavelet transform (SWT) with the nonsubsampled contourlet transform (NSCT) to perform image fusion on a visible satellite image and an infrared image is presented. Wavelet-based methods are usually employed to perform the fusion of images captured by different sensors with different resolutions. In general, optical images with fine spectral resolution tend to be coarser in spatial resolution as compared to images with fewer spectral bands, for reasons associated with the signal-to-noise ratio of the optical instrument [31]. In [32] a wavelet-based multiresolution pyramid applied to multitemporal or multisensor satellite data is combined with a stochastic gradient based on two similarity measures, correlation and mutual information. In [33], an image fusion based on a preregistration process is implemented using the scale-invariant feature transform (SIFT) with a reliable outlier removal procedure, a fine-tuning process, and the maximization of mutual information using a modified Marquardt—Levenberg search strategy, is then tested on various remote sensing optical and SAR images. In [34] several wavelet pyramids aimed at performing invariant feature extraction and accelerating image fusion through multiple spatial resolutions are evaluated. In these methods, in order to achieve computational efficiency, fusion is performed iteratively from the coarsest level of the image pyramid, where the number of pixels is lower and the computational cost is limited, to the finest level of the image pyramid [32,33,34].
In [35] the optimal number of decomposition levels required in the wavelet transform in wavelet-based fusion is investigated. Based on the results obtained it was possible to deduce that too few decomposition levels result in fused images with poor spatial quality, while too many levels reduce the spectral similarity between the original multispectral and the pan-sharpened images.
Pan-sharpening is a very common family of multiresolution fusion techniques. Component substitution (CS)-based pan-sharpening methods spectrally transform the multispectral data into another feature space and separate spatial and spectral information into different components [36]. Among these methods, well-known techniques involve intensity-hue-saturation (IHS) [37], principal component analysis (PCA) [38], and Gram-Schmidt [39] transformations. On the other hand, multiresolution analysis (MRA)-based algorithms consist in the injection of spatial details (or high-frequency components) from a panchromatic image to a multispectral image [36,39]. Popular MRA approaches are based on box filtering [40], Gaussian filtering [41], wavelet transform [42,43], such as decimated wavelet transform (DWT) [29], and curvelet transform [44]. Indeed, pansharpening addresses multiresolution fusion from a signal-processing perspective and not from a supervised learning perspective. Accordingly, the resulting fusion product is not designed to be optimal from the viewpoint of supervised labeling with respect to a set of thematic classes described by a training set.
Furthermore, with the advent of unmanned aerial vehicles (UAVs), images with extremely high spatial resolution have become available at a relatively low cost. UAVs are often equipped with simple, lightweight sensors, such as RGB cameras [45] that capture small portions of land—a scenario that may raise the need for multimodal fusion to benefit from the available lightweight instruments. In [45] classification of a high spatial resolution color image and a lower spatial resolution hyperspectral image of the same scene is addressed. Contextual information is obtained from the color image through color attribute profiles (CAPs) and spectral information is extracted from the hyperspectral image; a composite decision fusion (CDF) strategy exploiting kernel-based techniques is proposed.
Joint optical-SAR data classification has been addressed for a couple of decades [46] with approaches stemming from several methodological areas, such as statistical pattern recognition [47], neural networks [48,49,50], consensus theory and decision fusion [51], kernel-based learning [52], and MRFs [53]. From a signal-processing perspective, multisensor image fusion algorithms to integrate panchromatic and SAR features into multispectral images were proposed in [54,55], extracting SAR texture by ratioing the despeckled SAR image to its low-pass approximation with a wavelet decomposition, and using the modified Brovey transform (MBT) conjointly with a wavelet decomposition, respectively. In [56] an image fusion algorithm to merge SAR and optical images based on the discrete multiwavelet transform (DMWT) performed at pixel level and an area-based selective fusion of SAR information to inject in the optical image is presented.
Following signal-level fusion through pan-sharpening, to perform image classification, either single-resolution classifiers [57] or multiresolution methods, based, for example, on Bayesian modeling [58] or hierarchical MRFs [7] can be applied. So far, the framework of supervised classification of multisensor and multiresolution remote sensing data has been scarcely investigated. Two examples of joint multisensor and multiresolution image classification are presented in [4,59]. In [4], a method based on multiple hierarchical Markov random fields on distinct quadtrees associated with multisensor optical and SAR data is introduced. This framework is integrated with finite mixture models and the MPM criterion. In [59], a copula-theoretic multivariate model allows to perform multisensor fusion and a multiscale Markovian model preserves details and avoids the smoothing effects generated by the single-scale MRF-based method. Recent approaches based on deep learning and convolutional neural networks (CNNs) in the field of multimodal—specifically multitemporal and multisensor—classification of remote sensing images are [60,61], where multitemporal samples are collected from Sentinel-2A/B and Landsat-8 satellites and the focus is on input satellite image time series.

3. Methodology

In general terms, a semantic segmentation (or image classification) task can be considered as the problem of estimating latent information represented by class labels x from feature vectors y associated with one or more sites (i.e., pixels) s of an image data set. From a statistical perspective, x and y are usually regarded as occurrences of random vectors. The inference of x given y may be computationally expensive since, in many cases—and specifically in the case of non-causal PGMs on planar graphs associated with image lattices—iterative algorithms are necessary. On the contrary, probabilistic graphical models that involve the use of hierarchical MRFs on quadtrees use the concept of causality in order to develop efficient non-iterative inference algorithms. From a modeling viewpoint, this is accomplished through the factorization of the latent field distribution (i.e., the prior distribution of x) in terms of causal transition probabilities.
Here, this approach is adopted to address semantic segmentation with input multiresolution and multisensor images. The idea is, first, to make use of a quadtree (shown in Figure 1) to model the interactions across multiresolution images. Input image data are inserted in the layers of the quadtree as a function of their spatial resolutions. Then, we propose a hierarchical Markov model that takes into account both the relations between different scales and the spatial information at each scale, in order to favor applicability to HR and VHR imagery. The causality of the proposed model makes it possible to derive an efficient formulation of the MPM criterion that is used to predict the label of each pixel in each layer of the quadtree. The outcome is a multiresolution dense classification result generated from the input multiresolution images. For this purpose, decision tree ensembles are separately trained on all layers of the quadtree to incorporate image data coming from different sensors that operate at the same resolution. Details can be found in the next sections.

3.1. The Proposed Hierarchical Causal Markov Framework

Multiresolution models are fundamental among the current trends of probabilistic graphical models. From an image processing perspective, the idea is to develop analysis methods that can take advantage of all available resolutions and model the trade-off between the coarser ones, more robust to noise and outliers, and the geometrical details of the finer ones. Probabilistic graphical models on either planar or hierarchical multilayer graphs [62], such as quadrees [10], have been popular for long as stochastic models for multiscale and possibly multisensor information. Hierarchical MRFs based on this topology typically capture multiresolution relations across the different layers of the quadtree through a Markov chain. However, in general, they do not model the spatial dependencies within each layer [10]. In this paper, a causal hierarchical framework postulating Markovianity both inter-scale (multiscale information) and intra-scale (contextual spatial information) is defined to address multiresolution fusion while also capturing the spatial information at each scale—a desired property when HR and VHR imagery is considered.
In the following, we shall use P ( · ) and p ( · ) to indicate the probability mass function (PMF) of a discrete random variable or vector and the probability density function (PDF) of a continuous random variable or vector, respectively.

3.1.1. Model Assumptions

Consider { S 0 , S 1 , , S L } , an ensemble of pixel lattices structured as a quadtree, as illustrated in Figure 1. The quadtree is built with the width and the height of S 1 being half those of S ( = 1 , 2 , , L ). Accordingly, S 0 is the root of the quadtree and corresponds to the coarsest spatial resolution, while S L is the layer of the leaves of the tree and corresponds to the finest resolution. Following the usual terminology of hierarchical Markov models, the pixels in the various layers of the quadtree will be named sites. Each site s S in layer S except the root (i.e., with { 1 , 2 , , L } ) has a parent site s S 1 . Each site s S in layer S except the leaves (i.e., with { 0 , 1 , , L 1 } ) has four children sites in S + 1 , which are collected in the set s + S + 1 . These definitions determine a hierarchy on the quadtree S = S 0 S 1 S L starting from the root S 0 to the leaves S L [10].
Let us also consider a discrete random variable x s , representing a class label, associated with each site s S , and let Ω be the finite set of classes, such that x s Ω for each s S . The random field X = { x s } s S of all class labels in the quadtree is a hierarchical MRF if and only if the following property holds for all layers, with the exception of the root layer ( = 1 , 2 , , L ) [10]
P ( X | X 1 , X 2 , , X 0 ) = P ( X | X 1 ) = s S P ( x s | x s ) ,
where X = { x s } s S ( = 0 , 1 , , L ). According to the first equality, the Markovian property holds across the layers of the quadtree, while the second equality indicates that the pixelwise transition probability P ( X | X 1 ) from layer ( 1 ) to layer is supposed to factor out on a pixelwise basis. The former equality and the Markov chain structure it yields across the layers of the quadtree imply a well-known property of hierarchical MRFs on quadtrees, i.e., causality [10]. The latter factorization is also a customary condition in hierarchical MRFs for image classification [10] but has implications in terms of modeling properties. Indeed, as mentioned before, a hierarchical MRF on a quadtree accounts for the dependencies between pixels in different scales, but does not model the spatial-contextual dependencies associated with the random field X on each layer S (see Figure 2a). In the proposed framework, we overcome this limitation by also incorporating a Markov model for the spatial context on the pixel grid at each scale. Simultaneously, it is desired that this spatial model preserves causality, because of the aforementioned implications in terms of efficient inference.
For this purpose, let us denote as R Z 2 a planar rectangular pixel lattice where a total order relation ⪯ is defined. We recall that a total order relation satisfies the following properties ( r , s , t R ) [63]:
Antisymmetry : r s , s r r = s Transitivity : r s , s t r t Connexity : at   least   one   holds   between   r s   and   s r .
The relation r s indicates that r s and r s . The set { r R : r s } defines the “past” of a site s R . A neighborhood relation is also defined in order to have the “past neighbours” of each site, consistently with the order relation ≺. The relation r s denotes that r is a past neighbor of s. Formally, { r R : r s } { r R : r s } , which means that the past neighbors of s R are included in the past of s but constitute a strict subset of its entire past. If we consider again a discrete random variable x s , representing a class label attached to each site s R , and let X = { x s } s R be the corresponding random field, then the Markovianity property constrained to the past of each site holds for X if and only if [11,13,62]:
P ( x s | x r , r s ) = P ( x s | x r , r s ) s R .
Again, this Markov model associated with an order relation on a planar lattice is known to be causal [11,13]. In particular, the following factorization holds for various causal Markov models postulated on planar lattices, such as Markov chains and second and third order Markov meshes [11]:
P ( X ) = s R P ( x s | x r , r s ) ,
up to appropriately defining the behavior of X near the borders of the image.
Indeed, the proposed framework merges the key ideas of the two aforementioned models into a unique hierarchical Markov model combining a quadtree topology and an arbitrary total order relation on each layer of the quadtree for the joint fusion and supervised classification of multisensor and multiresolution images.
Specifically, we assume the same notations introduced above for the quadtree topology. To insert a set of images acquired by different HR or VHR sensors at distinct spatial resolutions on the same area in a quadtree, the requirement is that the pixel lattices of the input images should have a power-of-2 relation one with the other. This constraint can be easily satisfied with a minor resampling [64,65] and antialiasing filtering [3,29]. The fusion between images at different resolutions is naturally matched by hierarchical models, and the relations ≺,⪯, and ≾ are considered to be defined within each lattice in the quadtree separately. This suggests that every site s S of every layer S ( = 1 , 2 , , L ) is connected to its parent s S 1 in the upper layer and to its past neighbors { r S : r s } in the same layer, except for the root layer. A pixel in the root S 0 has no parent but only its past neighbors, if present. The multiresolution and multisensor input images are inserted in the quadtree according to their spatial resolutions, and each s S is linked to a continuous feature vector y s collecting the corresponding image data. Input sensors that output data with the same resolution contribute to the feature vector in the same layer. Y = { y s } s S is the random field of the observations. X = { x s } s S indicates again the random field of the class labels x s Ω associated with all sites s S in the quadtree. The following assumptions define the proposed framework.
Assumption 1.
X satisfies the Markovian property across the scales as in Equation (1). Furthermore, on every layer except the root ( = 1 , 2 , , L ), the following proportionality and factorization hold valid:
P ( X | X 1 , X 2 , , X 0 ) = P ( X | X 1 ) s S P ( x s | x r , r s ) P ( x s | x s ) .
Assumption 2.
On the root layer, X 0 satisfies Equation (3):
P ( X 0 ) = s S 0 P ( x s | x r , r s ) .
Assumption 3.
The observations are conditionally independent given the class labels:
p ( Y | X ) = s S p ( y s | x s ) = = 0 L s S p ( y s | x s ) .
Assumption 1 defines the rationale of the proposed framework involving a hierarchical MRF on a quadtree and a causal planar Markov model built on a total order relation ⪯, implying that the Markovian property holds across the scales and that the transition probabilities can, in turn, be factorized to model both spatial-contextual and multiscale dependencies. Assumption 2 ensures that the spatial factorization is also valid on the root layer. The conditional independence shown in Assumption 3 is normally accepted when using MRFs for image classification and it is used to favor tractability [7,10,14,66].

3.1.2. Methodological Properties

The proposed model is methodologically defined by Assumptions 1–3. Here, we analytically prove two properties of this model. First, we note that the hierarchical MRF in Equation (1) and the MRF on a planar lattice with respect to a total order relation in Equation (2) are well-known to be causal, but this property is not a-priori guaranteed for the proposed framework. Indeed, causality holds for the developed framework as well, as indicated by the following theorem.
Theoremtone 1.
According to Assumptions 1–3, the joint probability distribution of the class labels and the feature vectors in the whole quadtree, i.e., the joint distribution of the random vector ( X , Y ) , is completely determined by the parent–child transition probabilities { P ( x s | x s ) } s S \ S 0 , the past neighbor transition probabilities { P ( x s | x r , r s ) } s S , and the pixelwise data conditional likelihoods { p ( y s | x s ) } s S .
The causality is expressed through the factorization of the joint distribution of the feature vectors and the labels as causal transition probabilities. Thanks to Theorem 1, it is possible to conclude that ( X , Y ) is a Markov chain with respect to the causal topology introduced in this paper and associated with the quadtree and with the total order relation defined on each one of its layers. The proof of Theorem 1 can be found in Appendix A.1.
As anticipated in Section 1, causality allows for an efficient recursive implementation of the MPM criterion. To formulate this criterion, three further assumptions are introduced. For each s S , denote as C s the vector of the labels of all nodes linked to s (the “context” of s), i.e., C s collects the labels of: (i) the sites in { s } { r S : r s } if s is not in the root ( s S \ S 0 ) , and (ii) the sites in { r S 0 : r s } if s S 0 is in the root. Moreover, if a site s is not in the leaf layer ( s S \ S L ), then it is possible to build the vector D s of the feature vectors of all the sites that descend from s along all the lower layers of the quadtree. If s in the leaf layer ( s S L ), then we set D s = y s .
Assumption 4.
The label of each site, when conditioned to those of its connected sites, depends only on the feature vectors of its descendants and not on the feature vectors of the other sites in the quadtree:
P ( x s | C s , Y ) = P ( x s | C s , D s ) s S .
Assumption 5.
The parent and neighbor labels of each site, when conditioned to the field of the feature vectors, are conditionally independent:
P ( C s | Y ) = P ( x s | Y ) r s P ( x r | Y ) s S \ S 0 r s P ( x r | Y ) s S 0 .
Assumption 6.
The parent and neighbor labels of each site, when conditioned to its own label, are mutually independent and independent on the observations of its descendants:
P ( C s | x s , D s ) = P ( C s | x s ) = P ( x s | x s ) r s P ( x r | x s ) s S \ S 0 r s P ( x r | x s ) s S 0 .
Assumptions 4–6 are conditional independence hypotheses and, similar to numerous analogous assumptions that are often made when operating with either hierarchical [10] or planar MRFs [7], they are accepted here to make the inference based on the MPM criterion analytically tractable. This criterion is advantageous for hierarchical MRFs since it considers the errors based on the scale where they occur, avoiding the accumulation of errors along the layers [10]. According to MPM, a pixel s S is labelled as belonging to the class ω Ω that yields the maximum value of P { x s = ω | Y } [7].
Theorem 2.
According to Assumptions 1–6, for each site s S \ S 0 , i.e., not in the root layer:
P ( x s ) = x s Ω P ( x s | x s ) P ( x s ) ,
P ( x s | C s , D s ) P ( x s | D s ) P ( x s | x s ) P ( x s ) P ( x s ) | C s | · r s P ( x s | x r ) P ( x r ) ,
P ( x s | Y ) = C s P ( x s | C s , D s ) P ( x s | Y ) r s P ( x r | Y ) ,
where in Equation (10), the summation is over all possible class labels assigned to site s ; in Equation (11), | B | indicates the cardinality of a finite set B, and the labels x s and x r , r s , are fixed to the values they take in the context C s ; and in Equation (12), the summation is over all possible label configurations in the context C s of site s, i.e., over all possible labels of its parent and past neighbors.
For each root site s S 0 :
P ( x s | C s , D s ) P ( x s | D s ) P ( x s ) | C s | r s P ( x s | x r ) P ( x r ) ,
P ( x s | Y ) = C s P ( x s | C s , D s ) r s P ( x r | Y ) .
For each site s S \ S L , i.e., not in the leaf layer:
P ( x s | D s ) P ( x s | y s ) t s + x t Ω P ( x t | D s ) P ( x t | x s ) P ( x t ) .
Proof of Theorem 2 can be found in Appendix A.2. According to Theorem 2, the MPM criterion can be formulated with three recursive steps. First of all, after the initialization of the prior probabilities P ( x s ) on the root layer, Equation (10) is used to obtain the prior probabilities of all classes on all the other sites of the quadtree using a top-down pass from the root down to the leaves. Various initialization strategies are possible; here the prior probabilities P ( x s ) are assumed independent with respect to the site s considered and are estimated at the root level ( s S 0 ) using the relative frequencies of the classes in the training set considered. Secondly, Equations (15), (11), and (13) are used to compute P ( x s | C s , D s ) using a bottom-up pass from the leaves up to the root and within the root layer. Lastly, Equations (12) and (14) are used to obtain P ( x s | Y ) through a second top-down pass.
The feature vectors enter the recursions through the pixelwise posterior probabilities P ( x s | y s ) in Equation (15). As mentioned above, the fusion of multisensor observations coming at different spatial resolutions is ensured by the quadtree topology intrinsically. In the proposed framework, the pixelwise posteriors P ( x s | y s ) are estimated using decision tree ensembles, such as random forest [23], rotation forest [24], ExtraTrees [25], and GBRT [22]. Data in different layers of the quadtree are associated with the observations at the corresponding resolution: the information collected by two or more sensors at the same resolution contributes to the feature vector in the same layer. Accordingly, this stacked vector may be highly heterogeneous. Decision tree ensembles, with their fully non-parametric formulation, are exploited to perform the estimation of the pixelwise posteriors, because they can flexibly characterize data with arbitrary statistics and are applicable to modeling the class-wise behavior of heterogeneous features. The integration of this ensemble approach for the P ( x s | y s ) terms of the recursive inference of the proposed framework guarantees the fusion of multisensor data associated with the same resolution as well (further details on the role of the decision tree ensembles in the proposed approach can be found in Section 3.2.3).

3.2. The Proposed Multimodal Classification Methods

Two classification algorithms are developed within the proposed causal hierarchical Markov framework. They differ in the choice of the spatial model associated with the individual layers of the quadtree, which corresponds, in turn, to distinct definitions of the total order relation ⪯ used to formalize spatial Markovianity. This analytical difference in the PGM structure leads to different algorithmic formulations.

3.2.1. Markov Mesh-Based Classification Algorithm

Markov mesh random fields (MMRFs) are an important class of causal MRFs built upon an order relation ⪯ on a rectangular lattice R Z 2 to account for the spatial relationship between pixels in the lattice. In the case of an MMRF, r s if r precedes s while the image is raster scanned, i.e., the past of a site s is the set of all pixels crossed by a raster scan before getting to s (see Figure 2b). In particular, in the case of a second-order Markov mesh, r s if and only if r is adjacent to s in its previous row and column in the scan. Denoting as i and j the pixel coordinates in R, this means that r s = ( i , j ) if and only if either r = ( i 1 , j ) or r = ( i , j 1 ) . In the case of a third-order Markov mesh, the three pixels adjacent to s and located in its previous row and column are its past neighbors, i.e., r s = ( i , j ) if and only if r { ( i 1 , j ) , ( i , j 1 ) , ( i 1 , j 1 ) } . These definitions of the past neighborhood are obviously adapted to the image borders. The factorization in Equation (3) holds for Markov meshes of both orders [11].
In the first proposed algorithm, a Markov mesh model is assumed for the spatial information in each level (or layer) of the quadtree. Accordingly, the MPM equations are applied using the aforementioned definition of the ≾ relation. In this respect, the intrinsic directionality of this definition may generally favor anisotropic artifacts in the output classification map—a common drawback of causal Markov model applications [13]. To prevent these artifacts, the symmetric MMRF (SMMRF) formulation defined in [15] is used in the first proposed algorithm (see Figure 3). As far as a planar lattice is concerned, it is based on a pixel visiting trajectory that ensures corner independence [15]. In the scanning trajectory presented in Figure 3, four rows are scanned at each step, two from the bottom and two from the top of the image, until all the rows are scanned. Consequently, the dependencies of pixels on all the neighboring sites are taken into consideration, guaranteeing an isotropic pixel visiting scheme [15]. In the proposed method, this scan is used within each layer of the quadtree separately. Accordingly, the transition probabilities of the Markov chain in Equation (4) take into account the Markov mesh dependencies, so that the components of X convey both the parent–child and the spatial dependencies, as shown in Figure 2c.
Operatively, applying the MPM formulation proved for the proposed framework in the first developed algorithm requires specifying both the pixelwise posteriors P ( x s | y s ) and the transition probabilities P ( x s | x s ) and P ( x s | x r ) , r s , between the label of a site and the labels of its parent and past neighbors. A parametric modeling approach is used for these transition probabilities, based on the formulation introduced by Bouman and Shapiro in [67]. This means that the transition probability across consecutive scales P ( x s | x s ) is modeled so that: (i) the probability that a site and its parent share the same label is the same for all classes and all site locations, i.e., P { x s = ω | x s = ω } = θ for all ω Ω and all s S \ S 0 , where θ [ 0 , 1 ] is a parameter of the method; and (ii) the probability that a site and its parent have different labels does not depend on their class memberships and on their location, i.e., P { x s = ω | x s = ω } is constant over all ω ω ( ω , ω Ω ) and all s S \ S 0 . An analogous model is adopted for the causal neighborhood transition probability P ( x s | x r ) , r s , on the same pixel grid, using a second parameter ϕ [ 0 , 1 ] that represents the common value of all P { x s = ω | x r = ω } for ω Ω and r s . More details can be found in [67,68]. Regarding the parametric models used for the transition probabilities, we recall that, in the context of hierarchical MRF modeling for multitemporal image classification, as shown by the results obtained in [68], the corresponding parameters did not influence significantly the results.

3.2.2. Markov Chain-Based Classification Algorithm

In the second developed algorithm, the spatial-contextual model associated with each layer S of the quadtree ( = 0 , 1 , , L ) is a Markov chain (MC). Consider a scan trajectory on S , i.e., a sequence moving to one pixel from one of its neighbors, visiting every pixel once. Given a pixel s S , the previous pixel in the scan is indicated as s S . In the second developed algorithm, r s if and only if r = s . Therefore, the spatial Markovianity condition in Equation (4) implies that X is a first-order Markov chain for the aforementioned 1D scan trajectory. Accordingly, Equations (4) and (5) define the spatial behavior of the whole random field X as a Markov chain of first-order. Specifically, Equations (11) and (12) result in the following special cases ( s S \ S 0 ):
P ( x s | x s , x s , D s ) P ( x s | D s ) P ( x s ) 2 · P ( x s | x s ) P ( x s ) P ( x s | x s ) P ( x s ) ,
P ( x s | Y ) = x s , x s Ω P ( x s | x s , x s , D s ) P ( x s | Y ) P ( x s | Y ) ,
and Equations (13) and (14) simplify analogously.
The formulation of the second developed algorithm requires the definition of the scan trajectory to be used within each layer of the quadtree. Two popular pixel visiting schemes are the zig-zag path and the Hilbert space-filling curve [11]. A zig-zag scan over the lattice S can be performed as illustrated in Figure 4b. Apart from the sites at the border of the pixel grid, s is the pixel adjacent to s and located diagonally from s in the direction of a zig-zag. The visiting scheme follows the image borders on the perimeter. The other scan makes use of the Hilbert space-filling curve on a power-of-2 sized grid S (see Figure 4a). The Hilbert curve is especially interesting because, while it maps a 1D to a 2D spaces, it preserves locality, an important property for spatial-contextual labeling: two points that are near each other in the 1D path remain near each other after folding. The opposite may generally be violated. Furthermore, the power-of-2 relation among the layer sizes in the quadtree is matched by the recursive construction of the Hilbert curve.
In the second developed method, to avoid the presence of anisotropic “corner-dependent” artifacts, a symmetric scan strategy is employed again. In fact, when using asymmetric scanning, directional artifacts in the shapes and edges of the regions in the classification maps generally occur [69]. Specifically, the scan used in the proposed method combines two Hilbert and four zig-zag paths, as shown in Figure 4c. Each pixel is reached multiple times in a symmetric way through a visiting scheme that involves two different directions of the Hilbert curve and zig-zag path along the two diagonals, in both directions, thus preventing any geometrical artifacts. The scanning trajectory is divided in six steps: in steps 1 and 2 the pixels are visited along a zig-zag curve on the main diagonal from the two opposite ends to the other, subsequently. Step 3 consists of a clockwise scan along the Hilbert curve. Steps 4 and 5 are symmetrical to steps 1 and 2, following the direction of the antidiagonal. Finally, step 6 involves moving anticlockwise onto the Hilbert curve.
Regarding the modeling of the transition probabilities P ( x s | x s ) and P ( x s | x s ) , the same approach as in the case of the first developed method is used, i.e., parametric Bouman–Shapiro models are adopted. In the case of the second method as well, these models involve two parameters θ and ϕ that represent the common values of all same-label transition probabilities P { x s = ω | x s = ω } = θ and P { x s = ω | x s = ω } = ϕ ( ω Ω ;   θ , ϕ [ 0 , 1 ] ).

3.2.3. Role of Decision Tree Ensembles

The fusion of multiresolution data in the MPM scheme is performed by ensembles of decision trees [22]. In general, any classifier predicting pixelwise posterior probabilities P ( x s | y s ) can be used in conjunction with the proposed model. Here, random forest (RanF) [23], rotation forest (RotF) [24], ExtraTrees (ET) [25], and gradient boosted regression trees (GBRT) [22] are used for the estimation of these pixelwise posteriors, since their non-parametric formulation is particularly advantageous for the integration of multisensor data into the proposed hierarchical Markov model [17]. Furthermore, the formulation in terms of pixelwise posteriors does not require models for the class-conditional PDF of the feature vectors to be defined.
Decision tree ensembles are a collection of decision trees that make independent predictions for test samples, while their outputs are aggregated to estimate the pixelwise posterior distribution. GBRT uses boosting [70] to adaptively and iteratively manipulate the training samples for building decision trees [22]. Each training sample is assigned a weight, minimized at each iteration while a decision tree is constructed. The final predictor is determined by a weighted vote of the single trees, depending on their accuracy. For the random forest, instead, each individual tree is trained on a bootstrap sample drawn from the training set (resampling the input data with replacement), resulting in different trees [23]. This process is known as bagging, and the same training sample may be drawn multiple times, while other training samples may never be selected. Random forest combines bagging and random feature-subset selection to favor independence among the trees in the ensemble. An unknown sample is classified by all the trees of the random forest and each tree casts a vote for a class membership. The class with the highest score will be the one finally selected. The ExtraTrees classifier differs from random forest in the selection of the split points in the various nodes of the trees and in the use of the training samples [25]: while in random forest, the optimum split is chosen in a random subset of features, in ExtraTrees this split is selected randomly; while random forest is trained with bootstrap replicas, ExtraTrees is trained with the whole training set available. Rotation forest is based on the idea of encouraging individual accuracy and diversity within the decision trees of the ensemble, by creating N random subsets of the training data and applying principal component analysis (PCA) to each one of them [24]. All the principal components are retained in order to preserve the variance of the data and N axis rotations take place to form the new features for a base classifier. All these four ensemble methods also provide estimates of the pixelwise posterior probabilities. Further algorithm details can be found in [22,23,24,25].
Previous work showed that the use of the bagging and boosting techniques generally achieves higher accuracy and/or enhanced robustness to overfitting than using single decision tree classifiers [71,72], as well as more stability and improved robustness to noise in the training set [73].

4. Experimental Results

4.1. Dataset and Experimental Setup

The experimental validation was carried out with two VHR datasets. The first one consisted of images taken over Port-au-Prince, Haiti, right after the 2010 earthquake (see Figure 5). This dataset is made up of a GeoEye-1 image with the resolution of 1.25 m and RGB channels, a QuickBird image with the resolution of 2.5 m and RGB-near infrared (NIR) channels, and a COSMO-SkyMed stripmap image (X-band SAR) at the resolution of 5 m. For the COSMO-SkyMed stripmap modality, the original resolution is 5 m, while for the multispectral channels of QuickBird and for the multispectral and panchromatic bands of GeoEye-1, whose resolutions are 2.4 m, 1.84 m, and 0.46 m, respectively, a minor resampling was applied. Proper antialiasing filtering was used for this purpose. For the GeoEye-1 image, the resampling was applied after the pan-sharpening to obtain the 1.25 m resolution image with a pixel grid of 1040 × 1360 pixels.
The second dataset consisted of images collected by the IKONOS multiresolution optical sensor over the city of Alessandria, Italy, in 2004 (see Figure 6). It comprises a panchromatic image with the resolution of 1 m (1260 × 1400 pixels, single-channel) and a multispectral image at the resolution of 4 m (315 × 350 pixels, RGB-NIR channels). This dataset was collected in the framework of a case study of flood vulnerability assessment [27].
A quadtree with three layers was employed for both datasets. For the IKONOS dataset, it contained the panchromatic image on the leaf layer and the multispectral image on the root layer (the resolutions of these two images were natively in a power-of-2 relation). The intermediate layer was filled with the result of pansharpening given by the Gram–Schmidt technique [39] and resampling (again with antialiasing) to obtain a 2 m resolution image. For the Haiti dataset, the GeoEye-1, QuickBird, and COSMO-SkyMed images were inserted in the leaf, intermediate, and root layers, respectively, with the aforementioned minor resampling for the first two images.
Both datasets were manually annotated in homogeneous areas at the finest resolution of the quadtree by a photo-interpretation specialist. The resulting ground truth at the finest resolution was divided into two non-overlapping sets, the training set and the test set. As usual in land cover mapping through remote sensing image classification, no class borders were taken into account in this ground truth to avoid mixed pixels in the training and test sets. Training and test maps corresponding to coarser resolutions in the quadtree were obtained by downsampling the training and test maps associated with the finest resolution. On each layer of the quadtree, the training set was used to train the decision tree ensembles and estimate the pixelwise posteriors accordingly, and the test set was used to perform quantitative accuracy assessment.
The proposed SMMRF-based classification algorithm was applied using a second-order mesh. The results of both proposed algorithms were compared to previous approaches to multiresolution and multisensor fusion. First of all, a planar MRF classifier was used. In this method, the unary potentials were computed according to the pixelwise predictions given by random forest after a resampling of the dataset to the finest resolution, the pairwise potentials were expressed by the well-known Potts model [66], and the minimization of the MRF energy was accomplished using sequential tree-reweighted message passing (TRW-S) [74]. TRW-S is a well-known approach from the broader family of belief propagation methods and its sequential formulation exhibits regular convergence properties [74]. This first comparison is aimed at evaluating the performances of the developed methods as compared to those of a spatial-contextual but single-resolution method applied after resampling.
The second approach considered for comparison was the one proposed in [75] for multiresolution optical image classification, based on noncausal MRFs, graph cuts, and Gaussian random fields. In this case, multiresolution fusion is accomplished at the feature level through a linear mixture model rather than through the probabilistic structure of the classifier as in the proposed algorithms. When applied to the COSMO-SkyMed image of the Haiti dataset, it was adapted with a log-normal model for the class-conditional PDF, which is often accepted for the statistics of SAR data. We note that both benchmark techniques produce classification maps at the finest observed resolution. On the contrary, thanks to the MPM formulation on the quadtree, the proposed algorithms generate classification results at all observed resolutions, which is generally a desirable property when input multiresolution imagery is concerned.

4.2. Results and Performances

A quantitative and qualitative (visual) analysis of the results is possible based on the reported figures and tables. For each dataset, the accuracy parameters refer to the corresponding test set.
The results for the Haiti dataset are shown in Figure 7, including the maps generated by the algorithms developed here within the proposed framework (see Figure 7b,c for the SMMRF- and MC-based algorithms, respectively) and those obtained by the aforementioned benchmark techniques (see Figure 7d,e for the single-resolution MRF and the adaptation of the method in [75], respectively). Five land cover classes were present in the dataset: asphalt (yellow), containers (red), vegetation (green), buildings (magenta), and water (blue). The quantitative results of the experimentation conducted on this dataset are presented in two separate tables. Table 1 shows the general behavior of the two proposed algorithms in terms of overall accuracy (OA) and Cohen’s κ , when applied in conjunction with the different decision tree ensembles. Following these results, which highlight that the highest performances were attained using GBRT, Table 2 focuses in more detail on the class-wise accuracies obtained on the test set by the two proposed methods when paired with this decision tree ensemble. The class-wise accuracies in the cases of the other ensembles were analogous and are not shown for brevity.
Regarding the classification maps obtained from the IKONOS dataset, crops on spatial details are shown in Figure 8 and Figure 9. In particular, in Figure 9 it is possible to see the comparison between the results of the two proposed algorithms in the case of a spatially detailed test image. In this dataset there are seven classes: urban (yellow), bare soil (red), rangeland (green), forest (dark green), agricultural (magenta), water (blue), and wet soil (cyan). The classification accuracies, on the other hand, are reported in Table 3, where the overall behavior of the two algorithms is presented as a function of the tree ensembles involved, and in Table 4, where class-wise accuracies and comparisons with benchmark methods are shown while focusing on the use of a single ensemble method (again, the results obtained using the other ensemble methods are similar and omitted for brevity).
The proposed approach attained high accuracies on both datasets, suggesting the effectiveness of the developed methods. The two proposed techniques differ in the definition of spatial-contextual information found in each layer of the quadtree. However, it is noticeable from the quantitative results that the accuracies obtained with the two algorithms are similar, in some cases even identical, consistently with the fact that both techniques are special cases of the causal hierarchical PGM framework formulated in this paper. This is further demonstrated by Table 5, which focuses on the “Haiti” dataset and on the class-wise performances of the SMMRF-based algorithm, when paired with all of the considered decision tree ensembles. For the sake of brevity and to avoid redundancy, we focus in this table on the SMMRF-based method and on “Haiti” because the behavior in the case of the MC-based algorithm and of “Alessandria” is analogous.
A discussion of the experimental results and of their comparison with those of the benchmark techniques can be found in Section 5.

5. Discussion

A visual analysis of the classification maps generated by applying the developed methods to the considered datasets highlights a remarkable visual regularity, obtained thanks to both the related multiresolution graph and the modeling of the contextual spatial information through planar Markov models at each layer (see Figure 7, Figure 8 and Figure 9).
Regardless of the strong spectral overlapping present among several classes (e.g., the vegetated land covers), in particular for the IKONOS dataset, the proposed model attained high test-set accuracies for all the levels of the quadtree. Table 1 and Table 3 show the results with the images taken over Port-au-Prince, Haiti, and Alessandria, Italy, respectively, for the two proposed algorithms. These results suggest the flexibility of the proposed framework in integrating pixelwise predictions from distinct tree ensembles. In terms of class-wise accuracy, both proposed algorithms obtained high values for all classes in the Alessandria dataset and all classes except “vegetation” in the Haiti dataset. To improve the discrimination of “vegetation” in the latter dataset, additional training samples could be added to better represent this land cover as it appears in different areas of the imaged scene.
The highest accuracies were achieved using the proposed algorithms paired with GBRT (and, in some cases, with ExtraTrees or random forest). In the case of both datasets, rather similar results were obtained at the finest and the intermediate resolutions ( 1.25 m and 2.5 m for “Haiti”, and 1 m and 2 m for “Alessandria”). A remarkable improvement ( 6%) was attained using GBRT as compared to the other ensemble methods at resolution equal to 5 m in the case of the Haiti dataset. These results suggest that, while the framework is flexible to incorporating the predictions of arbitrary ensemble methods and accurate maps were obtained in the case of all four ensembles considered, GBRT happens to be an especially appropriate choice within the applications associated with the considered datasets.
The proposed method surpasses in accuracy the performances of the single-resolution MRF and the adaptation of the method in [75], as can be seen in Table 2 and Table 4. Both these benchmark approaches discriminated most classes in quite a satisfactory manner, however with remarkably lower class-wise accuracies as compared to the proposed framework, for example, for “forest” in the “Alessandria” dataset. Furthermore, while the previous techniques perform the classification exclusively at the finest scale, the developed methods map at all the resolutions considered, i.e., the three layers of the quadtree in the considered case studies. These results indicate the effectiveness of the developed approach, based on a causal hierarchical PGM on a quadtree and on ensemble modeling, to address multiresolution and multisensor fusion, as compared to a classical resampling strategy and to a multiresolution model using a linear mixture rationale, such as in [75].
The two algorithms developed within the proposed framework attained similar results, in some cases even identical, as mentioned in Section 4. This matches the fact that both techniques, that make use either of a Markov chain or of an SMMRF within each layer, are special cases of a general model defined in this paper, characterized by a multiresolution causal hierarchical structure postulated on a quadtree and an arbitrary total order relation. This is shown in Table 2 and Table 4 and in Figure 7, Figure 8 and Figure 9. Consistently with this perspective, in general, large difference in accuracy between these two spatial-contextual models based on either Markov meshes or Markov chains incorporated in the proposed hierarchical framework on a quadtree, are not expected. At least in the case of the considered datasets, these two proposed approaches exhibited similar capabilities from the viewpoint of class discrimination, as confirmed by the results obtained.
However, the two approaches differ in terms of computational burden. The MC-based method involves looping over the three labels x s , x s , and x s , according to Equations (11) and (12) and to their special case in Equation (16), because each site s is connected to its parent site and to the preceding pixel in the 1D scan. Therefore, the computational burden is O ( | Ω | 3 ) . With regard to the proposed method using the SMMRF, the formulation of the mesh includes more neighboring pixels, which can lead, for example, to O ( | Ω | 4 ) and O ( | Ω | 5 ) in the case of second- and third-order meshes, respectively. Clearly, the smaller the subset of neighbors, the lower the computational cost of the inference.

6. Conclusions

A general framework based on causal hierarchical Markov modeling has been proposed for the complex problem of the joint classification of multimodal data including multisensor and multiresolution input components. The developed framework extends approaches formalized previously, by introducing Markovianity at each resolution through an arbitrary total order relation on a pixel lattice. Two classification algorithms have been developed within the proposed framework, by making use of different total orders and of correspondingly different spatial Markov models (a Markov mesh and a Markov chain along a specific scan path). The two methods were experimentally validated using two VHR datasets involving images collected by multispectral, panchromatic, and SAR sensors.
Both the causality of the whole framework and the expression of the MPM criterion have been proven analytically. The proposed hierarchical approach allows to obtain classification maps at all the spatial resolutions considered (a frequently advantageous property when working with multiresolution datasets). On the contrary, previous techniques used for comparison perform classification at one resolution. Multiresolution fusion is addressed intrinsically by the quadtree topology, while multisensor fusion is obtained thanks to the flexibility of tree ensemble modeling. Quite accurate discrimination was attained on the test set in all the layers of the quadtree, in spite of the high spectral overlapping of several classes, corresponding to vegetated land covers sensed with only a few multispectral bands. The results indicated that the proposed approach was flexible to the incorporation of pixelwise predictions from different tree ensembles. Yet, the highest performances were attained using GBRT, making it a particularly appropriate choice within the proposed experimentation.
The classification maps obtained by the proposed methods showed visual smoothness, thus indicating the effectiveness of a framework that exploits the spatial information within each layer of the quadtree. While the previous algorithms considered for comparisons also generated accurate maps, the presented framework increased the overall accuracy, thus confirming the potential of the integration of hierarchical PGMs with tree ensembles in the field of multiresolution and multisensor fusion.
The two proposed algorithms show different computational complexities, lower for the MC-based method than for the SMMRF-based technique. As a matter of fact, the method integrating a 1D Markov chain along zig-zag and Hilbert curves for each layer of the quadtree is computationally advantageous, owing to its lower complexity that leads to lower execution times. This reduced computational complexity makes it possible to tackle more efficiently both a larger image size and more classes.
Future generalizations of the present work may consider the integration of this multiscale framework with convolutional neural networks, which intrinsically match the multiresolution data representation through their convolution and pooling layers [76], and with adaptive multiscale topologies associated with superpixel segmentation results at several scales [77,78].

Author Contributions

Conceptualization, I.H., G.M., S.B.S. and J.Z.; methodology, All; software, L.F., I.H. and A.M.; validation, All; formal analysis, I.H., A.M., G.M., M.P., S.B.S. and J.Z.; investigation, L.F. and A.M.; resources, G.M., S.B.S. and J.Z.; data curation, G.M., S.B.S. and J.Z.; writing—original draft preparation, M.P.; writing—review and editing, All; visualization, M.P.; supervision, G.M., S.B.S. and J.Z.; project administration, G.M., S.B.S. and J.Z.; funding acquisition, G.M., S.B.S. and J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received partial funding from the French Space Agency (CNES) within project CNES/INRIA no. 131024/00. The support is gratefully acknowledged.

Institutional Review Board Statement

Not applicable.

Acknowledgments

The authors wish to thank CNES for partial financial funding, DigitalGlobe Inc. for the QuickBird data, Google Crisis Response for the GeoEye-1 data, and the Italian Space Agency (ASI) for the COSMO-SkyMed data. They would also like to thank Gabriele Barabino for his assistance in conducting the experiments.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Proof of Theorem 1

We consider Assumptions 1–3 presented in Section 3.1.1 to hold valid.
Theoremtone A1.
According to Assumptions 1–3, the joint probability distribution of the class labels and the feature vectors in the whole quadtree, i.e., the joint distribution of the random vector ( X , Y ) , is completely determined by the parent–child transition probabilities { P ( x s | x s ) } s S \ S 0 , the past neighbor transition probabilities { P ( x s | x r , r s ) } s S , and the pixelwise data conditional likelihoods { p ( y s | x s ) } s S .
Proof. 
According to Equation (4), Assumption 1 implies that:
P ( X , Y ) = p ( Y | X ) P ( X ) = p ( Y | X ) P ( X L , X L 1 , , X 0 ) = p ( Y | X ) P ( X L | X L 1 ) P ( X 1 | X 0 ) P ( X 0 ) .
Incidentally, we note that P ( X , Y ) , which formalizes the joint distribution of ( X , Y ) , is neither a PMF nor a PDF because the random vector ( X , Y ) includes both discrete ( X ) and continuous components ( Y ). Yet, P ( X , Y ) does represent a proper “density function,” meant in the sense of a Radon–Nikodym derivative with respect to an appropriate product measure space [63].
By using the factorizations in Equation (4) for P ( X | X 1 ) ( = 1 , 2 , , L ), Equation (5) for P ( X 0 ) , and (6) for p ( Y | X ) , into Equation (A1) we obtain:
P ( X , Y ) = 1 L s S P ( x s | x r , r s ) P ( x s | x s ) p ( y s | x s ) · s S 0 P ( x s | x r , r s ) p ( y s | x s ) .
This implies that the joint probability distribution of the feature vectors and the labels P ( X , Y ) is completely defined by the class-conditional likelihoods p ( y s | x s ) , the past neighbor transition probabilities P ( x s | x r , r s ) , and the parent–child transition probabilities P ( x s | x s ) . □

Appendix A.2. Proof of Theorem 2

The Assumptions 1–6 introduced in Section 3.1.1 and Section 3.1.2 are considered to hold valid.
Theorem A2.
According to Assumptions 1–6, for each site s S \ S 0 , i.e., not in the root layer:
P ( x s ) = x s Ω P ( x s | x s ) P ( x s ) ,
P ( x s | C s , D s ) P ( x s | D s ) P ( x s | x s ) P ( x s ) P ( x s ) | C s | · r s P ( x s | x r ) P ( x r ) ,
P ( x s | Y ) = C s P ( x s | C s , D s ) P ( x s | Y ) r s P ( x r | Y ) ,
where in Equation (10), the summation is over all possible class labels assigned to site s ; in Equation (11), | B | indicates the cardinality of a finite set B, and the labels x s and x r , r s , are fixed to the values they take in the context C s ; and in Equation (12), the summation is over all possible label configurations in the context C s of site s, i.e., over all possible labels of its parent and past neighbors.
For each root site s S 0 :
P ( x s | C s , D s ) P ( x s | D s ) P ( x s ) | C s | r s P ( x s | x r ) P ( x r ) ,
P ( x s | Y ) = C s P ( x s | C s , D s ) r s P ( x r | Y ) .
For each site s S \ S L , i.e., not in the leaf layer:
P ( x s | D s ) P ( x s | y s ) t s + x t Ω P ( x t | D s ) P ( x t | x s ) P ( x t ) .
Proof. 
Equation (10) directly derives from the total probability theorem. The proof of Equation (15) for a hierarchical MRF is reported in [10]. According to the total probability theorem:
P ( x s | Y ) = C s P ( x s | C s , Y ) P ( C s | Y ) .
Equations (12) and (14) follow by inserting Equations (7) and (8) into (A3). Considering Bayes theorem:
P ( x s | C s , D s ) P ( C s | x s , D s ) P ( x s | D s ) .
The constant of proportionality does not include x s . If s S \ S 0 and if Equation (9) is inserted into Equation (A4), we derive:
P ( x s | C s , D s ) P ( x s | D s ) P ( x s | x s ) r s P ( x r | x s ) .
Equation (11) follows from this proportionality and from Bayes theorem. Equation (13) is proved analogously. □

References

  1. Moser, G.; Serpico, S.B.; Benediktsson, J.A. Land-Cover Mapping by Markov Modeling of Spatial–Contextual Information in Very-High-Resolution Remote Sensing Images. Proc. IEEE 2013, 101, 631–651. [Google Scholar] [CrossRef]
  2. Richards, J.A. Remote Sensing Digital Image Analysis: An Introduction, 5th ed.; Springer: Berlin, Germany, 2013. [Google Scholar]
  3. Alparone, L.; Aiazzi, B.; Baronti, S.; Garzelli, G. Remote Sensing Image Fusion; CRC Press: Boca Raton, FL, USA, 2015. [Google Scholar]
  4. Hedhli, I.; Moser, G.; Serpico, S.B.; Zerubia, J. Classification of multisensor and multiresolution remote sensing images through hierarchical Markov random fields. IEEE Geosci. Remote Sens. Lett. 2017, 14, 2448–2452. [Google Scholar] [CrossRef] [Green Version]
  5. Wang, X. Deep learning in object recognition, detection, and segmentation. Found. Trends Signal Process. 2016, 8, 217–382. [Google Scholar] [CrossRef]
  6. Arnab, A.; Zheng, S.; Jayasumana, S.; Romera-Paredes, B.; Larsson, M.; Kirillov, A.; Savchynskyy, B.; Rother, C.; Kahl, F.; Torr, P.H.S. Conditional random fields meet deep neural networks for semantic segmentation: Combining probabilistic graphical models with deep learning for structured prediction. IEEE Signal Process. Mag. 2018, 35, 37–52. [Google Scholar] [CrossRef]
  7. Li, S.Z. Markov Random Field Modeling in Image Analysis, 3rd ed.; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
  8. Boykov, Y.; Veksler, O.; Zabih, R. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 1222–1239. [Google Scholar] [CrossRef] [Green Version]
  9. Baggenstoss, P.M. A modified Baum-Welch algorithm for hidden Markov models with multiple observation spaces. IEEE Trans. Speech Audio Process. 2001, 9, 411–416. [Google Scholar] [CrossRef] [Green Version]
  10. Laferté, J.M.; Pérez, P.; Heitz, F. Discrete Markov image modeling and inference on the quadtree. IEEE Trans. Image Process. 2000, 9, 390–404. [Google Scholar] [CrossRef]
  11. Abend, K.; Harley, T.J.; Kanal, L.N. Classification of binary random patterns. IEEE Trans. Inf. Theory 1965, 11, 538–544. [Google Scholar] [CrossRef]
  12. Fjortoft, R.; Delignon, Y.; Pieczynski, W.; Sigelle, M.; Tupin, F. Unsupervised classification of radar images using hidden Markov chains and hidden Markov random fields. IEEE Trans. Geosci. Remote Sens. 2003, 41, 675–686. [Google Scholar] [CrossRef]
  13. Devijver, P.A. Hidden Markov mesh random field models in image analysis. J. Appl. Stat. 1993, 20, 187–227. [Google Scholar] [CrossRef]
  14. Dunmur, A.P.; Titterington, D.M. Computational Bayesian analysis of hidden Markov mesh models. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19, 1296–1300. [Google Scholar] [CrossRef]
  15. Yousefi, S.; Kehtarnavaz, N.; Cao, Y. Computationally tractable stochastic image modeling based on symmetric Markov mesh random fields. IEEE Trans. Image Process. 2013, 22, 2192–2206. [Google Scholar] [CrossRef] [PubMed]
  16. Razlighi, Q.R.; Kehtarnavaz, N.; Nosratinia, A. Computation of image spatial entropy using quadrilateral Markov random field. IEEE Trans. Image Process. 2009, 18, 2629–2639. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  17. Criminisi, A.; Shotton, J. Decision Forests for Computer Vision and Medical Image Analysis; Springer: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
  18. Hedhli, I.; Moser, G.; Serpico, S.B.; Zerubia, J. Multi-resolution classification of urban areas using hierarchical symmetric Markov mesh models. In Proceedings of the JURSE 2017, Dubai, United Arab Emirates, 6–8 March 2017; pp. 1–4. [Google Scholar]
  19. Montaldo, A.; Fronda, L.; Hedhli, I.; Moser, G.; Serpico, S.B.; Zerubia, J. Causal Markov mesh hierarchical modeling for the contextual classification of multiresolution satellite images. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 2716–2720. [Google Scholar]
  20. Montaldo, A.; Fronda, L.; Hedhli, I.; Moser, G.; Zerubia, J.; Serpico, S.B. Joint classification of multiresolution and multisensor data using a multiscale Markov mesh model. In Proceedings of the 2019 IEEE Geoscience and Remote Sensing Society (IGARSS), Yokohama, Japan, 28 July–2 August 2019; pp. 2810–2813. [Google Scholar]
  21. Montaldo, A.; Fronda, L.; Hedhli, I.; Moser, G.; Serpico, S.; Zerubia, J. A causal hierarchical Markov framework for the classification of multiresolution and multisensor remote sensing images. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, V-3-2020, 269–277. [Google Scholar] [CrossRef]
  22. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  23. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
  24. Rodriguez, J.J.; Kuncheva, L.I.; Alonso, C.J. Rotation forest: A New classifier ensemble method. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1619–1630. [Google Scholar] [CrossRef]
  25. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42. [Google Scholar] [CrossRef] [Green Version]
  26. Merentitis, A.; Debes, C. Many hands make light work—On ensemble learning techniques for data fusion in remote sensing. IEEE Geosci. Remote Sens. Mag. 2015, 3, 86–99. [Google Scholar] [CrossRef]
  27. Serpico, S.B.; Dellepiane, S.; Boni, G.; Moser, G.; Angiati, E.; Rudari, R. Information extraction from remote sensing images for flood monitoring and damage evaluation. Proc. IEEE 2012, 100, 2946–2970. [Google Scholar] [CrossRef]
  28. Pohl, C.; Van Genderen, J. Remote Sensing Image Fusion: A Practical Guide, 1st ed.; CRC Press: Baton Rouge, FL, USA, 2016. [Google Scholar]
  29. Mallat, S. A Wavelet Tour of Signal Processing—The Sparse Way, 3rd ed.; Academic Press: Cambridge, MA, USA, 2009. [Google Scholar]
  30. Li, S.; Yang, B. Hybrid multiresolution method for multisensor multimodal image fusion. IEEE Sensors J. 2010, 10, 1519–1526. [Google Scholar]
  31. Pajares, G.; Manuel de la Cruz, J. A wavelet-based image fusion tutorial. Pattern Recognit. 2004, 37, 1855–1872. [Google Scholar] [CrossRef]
  32. Cole-Rhodes, A.; Johnson, K.; LeMoigne, J.; Zavorin, I. Multiresolution registration of remote sensing imagery by optimization of mutual information using a stochastic gradient. IEEE Trans. Image Process. 2003, 12, 1495–1511. [Google Scholar] [CrossRef]
  33. Gong, M.; Zhao, S.; Jiao, L.; Tian, D.; Wang, S. A novel coarse-to-fine scheme for automatic image registration based on SIFT and mutual information. IEEE Trans. Geosci. Remote Sens. 2014, 52, 4328–4338. [Google Scholar] [CrossRef]
  34. Zavorin, I.; Moigne, J. Use of multiresolution wavelet feature pyramids for automatic registration of multisensor imagery. IEEE Trans. Image Process. 2005, 14, 770–782. [Google Scholar] [CrossRef] [PubMed]
  35. Pradhan, P.S.; King, R.L.; Younan, N.H.; Holcomb, D.W. Estimation of the mumber of decomposition levels for a wavelet-based multiresolution multisensor image fusion. IEEE Trans. Geosci. Remote Sens. 2006, 44, 3674–3686. [Google Scholar] [CrossRef]
  36. Ghamisi, P.; Rasti, B.; Yokoya, N.; Wang, Q.; Höfle, B.; Bruzzone, L.; Bovolo, F.; Chi, M.; Anders, K.; Gloaguen, R.; et al. Multisource and multitemporal data fusion in remote sensing: A comprehensive review of the state of the art. IEEE Geosci. Remote Sens. Mag. 2019, 7, 6–39. [Google Scholar] [CrossRef] [Green Version]
  37. Carper, W.; Lillesand, T.; Kiefer, P. The use of intensity-hue-saturation transformations for merging SPOT panchromatic and multispectral image data. Photogramm. Eng. Remote Sens. 1990, 56, 459–467. [Google Scholar]
  38. Chavez, P.; Kwarteng, A. Extracting spectral contrast in landsat thematic mapper image data using selective principal component analysis. Photogramm. Eng. Remote Sens. 1989, 55, 339–348. [Google Scholar]
  39. Baronti, S.; Aiazzi, B.; Selva, M.; Garzelli, A.; Alparone, L. A theoretical analysis of the effects of aliasing and misregistration on pansharpened imagery. IEEE J. Sel. Top. Signal Process. 2011, 5, 446–453. [Google Scholar] [CrossRef]
  40. Liu, J.G. Smoothing filter-based intensity modulation: A spectral preserve image fusion technique for improving spatial details. Int. J. Remote Sens. 2000, 21, 3461–3472. [Google Scholar] [CrossRef]
  41. Aiazzi, B.; Alparone, L.; Baronti, S.; Garzelli, A.; Selva, M. MTF-tailored multiscale fusion of high-resolution MS and pan imagery. Photogramm. Eng. Remote Sens. 2006, 72, 591–596. [Google Scholar] [CrossRef]
  42. Nunez, J.; Otazu, X.; Fors, O.; Prades, A.; Pala, V.; Arbiol, R. Multiresolution-based image fusion with additive wavelet decomposition. IEEE Trans. Geosci. Remote Sens. 1999, 37, 1204–1211. [Google Scholar] [CrossRef] [Green Version]
  43. Otazu, X.; Gonzalez-Audicana, M.; Fors, O.; Nunez, J. Introduction of sensor spectral response into image fusion methods. Application to wavelet-based methods. IEEE Trans. Geosci. Remote Sens. 2005, 43, 2376–2385. [Google Scholar] [CrossRef] [Green Version]
  44. Nencini, F.; Garzelli, A.; Baronti, S.; Alparone, L. Remote sensing image fusion using the curvelet transform. Inf. Fusion 2007, 8, 143–156. [Google Scholar] [CrossRef]
  45. Thoonen, G.; Mahmood, Z.; Peeters, S.; Scheunders, P. Multisource classification of color and hyperspectral images using color attribute profiles and composite decision fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 510–521. [Google Scholar] [CrossRef]
  46. Gómez-Chova, L.; Tuia, D.; Moser, G.; Camps-Valls, G. Multimodal classification of remote sensing images: A review and future directions. Proc. IEEE 2015, 103, 1560–1584. [Google Scholar] [CrossRef]
  47. Datcu, M.; Melgani, F.; Piardi, A.; Serpico, S.B. Multisource data classification with dependence trees. IEEE Trans. Geosci. Remote Sens. 2002, 40, 609–617. [Google Scholar] [CrossRef]
  48. Benediktsson, J.A.; Swain, P.H.; Ersoy, O.K. Neural network approaches versus statistical methods in classification of multisource remote sensing data. In Proceedings of the 12th Canadian Symposium on Remote Sensing Geoscience and Remote Sensing Symposium, Vancouver, BC, Canada, 10–14 July 1989; Volume 2, pp. 489–492. [Google Scholar]
  49. Serpico, S.B.; Roli, F. Classification of multi-sensor remote-sensing images by structured neural networks. IEEE Trans. Geosci. Remote Sens. 1995, 33, 562–578. [Google Scholar] [CrossRef]
  50. Simone, G.; Farina, A.; Morabito, F.; Serpico, S.B.; Bruzzone, L. Image fusion techniques for remote sensing applications. Int. J. Inf. Fusion 2002, 3, 3–15. [Google Scholar] [CrossRef] [Green Version]
  51. Waske, B.; Benediktsson, J.A. Fusion of support vector machines for classification of multisensor data. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3858–3866. [Google Scholar] [CrossRef]
  52. Camps-Valls, G.; Gomez-Chova, L.; Munoz-Mari, J.; Rojo-Alvarez, J.L.; Martinez-Ramon, M. Kernel-based framework for multitemporal and multisource remote sensing data classification and change detection. IEEE Trans. Geosci. Remote Sens. 2008, 46, 1822–1835. [Google Scholar] [CrossRef]
  53. Solberg, A.H.S.; Taxt, T.; Jain, A.K. A Markov random field model for classification of multisource satellite imagery. IEEE Trans. Geosci. Remote Sens. 1996, 34, 100–113. [Google Scholar] [CrossRef]
  54. Alparone, L.; Baronti, S.; Garzelli, A.; Nencini, F. Landsat ETM+ and SAR image fusion based on generalized intensity Modulation. IEEE Trans. Geosci. Remote Sens. 2004, 42, 2832–2839. [Google Scholar] [CrossRef]
  55. Chibani, Y. Integration of panchromatic and SAR features into multispectral SPOT images using the ‘à trous’ wavelet decomposition. Int. J. Remote Sens. 2007, 28, 2295–2307. [Google Scholar] [CrossRef]
  56. Byun, Y.; Choi, J.; Han, Y. An area-based image fusion scheme for the integration of SAR and optical satellite imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 6, 2212–2220. [Google Scholar] [CrossRef]
  57. Colditz, R.R.; Wehrmann, T.; Bachmann, M.; Steinnocher, K.; Schmidt, M.; Strunz, G.; Dech, S. Influence of image fusion approaches on classification accuracy: A case study. Int. J. Remote Sens. 2006, 27, 3311–3335. [Google Scholar] [CrossRef]
  58. Storvik, G.; Fjortoft, R.; Solberg, A.H.S. A Bayesian approach to classification of multiresolution remote sensing data. IEEE Trans. Geosci. Remote Sens. 2005, 43, 539–547. [Google Scholar] [CrossRef]
  59. Voisin, A.; Krylov, V.A.; Moser, G.; Serpico, S.B.; Zerubia, J. Supervised classification of multisensor and multiresolution remote sensing images with a hierarchical copula-based approach. IEEE Trans. Geosci. Remote Sens. 2014, 52, 3346–3358. [Google Scholar] [CrossRef] [Green Version]
  60. Li, Z.; Chen, G.; Zhang, T. Temporal attention networks for multitemporal multisensor crop classification. IEEE Access 2019, 7, 134677–134690. [Google Scholar] [CrossRef]
  61. Li, Z.; Chen, G.; Zhang, T. A CNN-transformer hybrid approach for crop classification using multitemporal multisensor images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 847–858. [Google Scholar] [CrossRef]
  62. Willsky, A.S. Multiresolution Markov models for signal and image processing. Proc. IEEE 2002, 90, 1396–1458. [Google Scholar] [CrossRef]
  63. Rudin, W. Real and Complex Analysis, 3rd ed.; McGraw-Hill, Inc.: New York, NY, USA USA, 1987. [Google Scholar]
  64. Dikshit, O.; Roy, D.P. An empirical investigation of image resampling effects upon the spectral and textural supervised classification of a high spatial resolution multispectral image. Photogramm. Eng. Remote Sens. 1996, 62, 1085–1092. [Google Scholar]
  65. Inglada, J.; Muron, V.; Pichard, D.; Feuvrier, T. Analysis of artifacts in subpixel remote sensing image registration. IEEE Trans. Geosci. Remote Sens. 2007, 45, 254–264. [Google Scholar] [CrossRef]
  66. Kato, Z.; Zerubia, J. Markov Random Fields in Image Segmentation. Found. Trends Signal Process. 2012, 5, 1–155. [Google Scholar] [CrossRef]
  67. Bouman, C.A.; Shapiro, M. A multiscale random field model for Bayesian image segmentation. IEEE Trans. Image Process. 1994, 3, 162–177. [Google Scholar] [CrossRef]
  68. Hedhli, I.; Moser, G.; Zerubia, J.; Serpico, S.B. A new cascade model for the hierarchical joint classification of multitemporal and multiresolution remote sensing data. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6333–6348. [Google Scholar] [CrossRef] [Green Version]
  69. Yousefi, S.; Kehtarnavaz, N. Generating symmetric causal Markov random fields. Electron. Lett. 2011, 47, 1224–1225. [Google Scholar] [CrossRef]
  70. Schapire, R. The strength of weak learnability. Mach. Learn. 1990, 5, 197–227. [Google Scholar] [CrossRef] [Green Version]
  71. Briem, G.J.; Benediktsson, J.A.; Sveinsson, J.R. Multiple classifiers applied to multisource remote sensing data. IEEE Trans. Geosci. Remote Sens. 2002, 40, 2291–2299. [Google Scholar] [CrossRef] [Green Version]
  72. Miao, X.; Heaton, J.; Zheng, S.; Charlet, D.; Liu, H. Applying tree-based ensemble algorithms to the classification of ecological zones using multi-temporal multi-source remote-sensing data. Int. J. Remote Sens. 2012, 33, 1823–1849. [Google Scholar] [CrossRef]
  73. DeFries, R.S.; Chan, J.C.W. Multiple criteria for evaluating machine learning algorithms for land cover classification from satellite data. Remote Sens. Environ. 2000, 74, 503–515. [Google Scholar] [CrossRef]
  74. Kolmogorov, V. Convergent tree-reweighted message passing for energy minimization. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1568–1583. [Google Scholar] [CrossRef] [PubMed]
  75. Moser, G.; De Giorgi, A.; Serpico, S.B. Multiresolution supervised classification of panchromatic and multispectral images by Markov random fields and graph cuts. IEEE Trans. Geosci. Remote Sens. 2016, 54, 5054–5070. [Google Scholar] [CrossRef]
  76. Goodfellow, I.; Bengio, Y.; Courville, A. Deep learning; MIT Press: Boston, MA, USA, 2016. [Google Scholar]
  77. Zhou, Y.; Li, J.; Feng, L.; Zhang, X.; Hu, X. Adaptive scale selection for multiscale segmentation of satellite images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 3641–3651. [Google Scholar] [CrossRef]
  78. Akbas, E.; Ahuja, N. Low-level multiscale image segmentation and a benchmark for its evaluation. Comput. Vis. Image Underst. 2020, 199, 103026. [Google Scholar] [CrossRef]
Figure 1. Structure of the quadtree and the corresponding parent–child relations.
Figure 1. Structure of the quadtree and the corresponding parent–child relations.
Remotesensing 13 00849 g001
Figure 2. Causal models of (a) a hierarchical Markov random field (MRF), (b) a Markov mesh, and (c) the proposed symmetric Markov mesh random field (SMMRF)-based algorithm.
Figure 2. Causal models of (a) a hierarchical Markov random field (MRF), (b) a Markov mesh, and (c) the proposed symmetric Markov mesh random field (SMMRF)-based algorithm.
Remotesensing 13 00849 g002
Figure 3. Visiting scheme of the SMMRF adopted in each layer of the first proposed method [18].
Figure 3. Visiting scheme of the SMMRF adopted in each layer of the first proposed method [18].
Remotesensing 13 00849 g003
Figure 4. (a)Hilbert space-filling curve, (b) zig-zag scan, (c) symmetrized scan introduced for the second proposed algorithm through the combination of two Hilbert curve scans and four zig-zag paths.
Figure 4. (a)Hilbert space-filling curve, (b) zig-zag scan, (c) symmetrized scan introduced for the second proposed algorithm through the combination of two Hilbert curve scans and four zig-zag paths.
Remotesensing 13 00849 g004
Figure 5. “Haiti” dataset: (a) 1.25 m resolution GeoEye-1 image (RGB true color composition), (b) 2.5 m resolution QuickBird image (RGB false color composition), (c) 5 m resolution synthetic aperture radar (SAR) COSMO-SkyMed stripmap image, (d) training map, and (e) test map. Color legend for the classes: containers, vegetation, asphalt, buildings, water.
Figure 5. “Haiti” dataset: (a) 1.25 m resolution GeoEye-1 image (RGB true color composition), (b) 2.5 m resolution QuickBird image (RGB false color composition), (c) 5 m resolution synthetic aperture radar (SAR) COSMO-SkyMed stripmap image, (d) training map, and (e) test map. Color legend for the classes: containers, vegetation, asphalt, buildings, water.
Remotesensing 13 00849 g005
Figure 6. “Alessandria dataset”: (a) RGB true-color composition of the IKONOS multispectral channels, (b) panchromatic channel, (c) training map, and (d) test map. Color legend for the classes: urban, agricultural, rangeland, bare soil, forest, water, wet soil.
Figure 6. “Alessandria dataset”: (a) RGB true-color composition of the IKONOS multispectral channels, (b) panchromatic channel, (c) training map, and (d) test map. Color legend for the classes: urban, agricultural, rangeland, bare soil, forest, water, wet soil.
Remotesensing 13 00849 g006
Figure 7. “Haiti” dataset: details of (a) the 1.25 m resolution image and of the classification maps generated by (b) the proposed SMMRF-based algorithm with GBRT, (c) the proposed MC-based algorithm with the zig-zag and Hilbert curve scan and GBRT, (d) the single-resolution MRF-based classifier after resampling, and (e) the adaptation of the technique in [75] to the case of optical-SAR data. Color legend for the classes: containers, vegetation, asphalt, buildings, water.
Figure 7. “Haiti” dataset: details of (a) the 1.25 m resolution image and of the classification maps generated by (b) the proposed SMMRF-based algorithm with GBRT, (c) the proposed MC-based algorithm with the zig-zag and Hilbert curve scan and GBRT, (d) the single-resolution MRF-based classifier after resampling, and (e) the adaptation of the technique in [75] to the case of optical-SAR data. Color legend for the classes: containers, vegetation, asphalt, buildings, water.
Remotesensing 13 00849 g007
Figure 8. “Alessandria” dataset: details of (a) the 1 m resolution panchromatic image and of the classification maps generated by (b) the single-resolution MRF-based classifier after resampling, (c) the algorithm in [75], and (d) the proposed SMMRF-based method with GBRT. Class legend: urban, agricultural, rangeland, bare soil, forest, water, wet soil. “Water” is not present in this detail.
Figure 8. “Alessandria” dataset: details of (a) the 1 m resolution panchromatic image and of the classification maps generated by (b) the single-resolution MRF-based classifier after resampling, (c) the algorithm in [75], and (d) the proposed SMMRF-based method with GBRT. Class legend: urban, agricultural, rangeland, bare soil, forest, water, wet soil. “Water” is not present in this detail.
Remotesensing 13 00849 g008
Figure 9. “Alessandria” dataset: details of (a) the 1 m resolution panchromatic image and the classification maps generated by (b) the proposed SMMRF-based algorithm with GBRT, and (c) the proposed MC-based algorithm with the zig-zag and Hilbert curve scan and GBRT. Class legend: urban, agricultural, rangeland, bare soil, forest, water, wet soil.
Figure 9. “Alessandria” dataset: details of (a) the 1 m resolution panchromatic image and the classification maps generated by (b) the proposed SMMRF-based algorithm with GBRT, and (c) the proposed MC-based algorithm with the zig-zag and Hilbert curve scan and GBRT. Class legend: urban, agricultural, rangeland, bare soil, forest, water, wet soil.
Remotesensing 13 00849 g009
Table 1. “Haiti” dataset: OA and Cohen’s κ with respect to the test set, for the proposed methods employing SMMRF and Markov chain (MC), when combined with random forest (RanF), rotation forest (RotF), ExtraTrees (ET), and gradient boosted regression trees (GBRT).
Table 1. “Haiti” dataset: OA and Cohen’s κ with respect to the test set, for the proposed methods employing SMMRF and Markov chain (MC), when combined with random forest (RanF), rotation forest (RotF), ExtraTrees (ET), and gradient boosted regression trees (GBRT).
Resolution 1.25 mRanFRotFETGBRT
Proposed-SMMRFOA %96.0694.4895.0796.36
Cohen’s κ coeff0.90470.86130.87860.9124
Proposed-MCOA %96.8096.3797.0496.90
Cohen’s κ coeff0.90260.88680.90980.9057
Resolution 2 . 5 mRanFRotFETGBRT
Proposed-SMMRFOA %96.2394.4796.1396.63
Cohen’s κ coeff0.91740.86910.91480.9276
Proposed-MCOA %96.7795.8596.8197.00
Cohen’s κ coeff0.91390.88000.91500.9218
Resolution 5 mRanFRotFETGBRT
Proposed-SMMRFOA %90.0085.8187.5896.01
Cohen’s κ coeff0.76800.65610.70350.9288
Proposed-MCOA %90.9787.9288.7896.01
Cohen’s κ coeff0.73470.62840.65880.9288
Table 2. “Haiti” dataset: class-wise accuracies of the proposed algorithms, combined with GBRT and compared with benchmark methods, with respect to the test set.
Table 2. “Haiti” dataset: class-wise accuracies of the proposed algorithms, combined with GBRT and compared with benchmark methods, with respect to the test set.
Resolution 1.25 mContainer %Vegetation %Asphalt %Buildings %Water %OA %Cohen’s κ Coeff
Proposed-SMMRF-GBRT87.0833.2795.1799.0497.1896.360.9124
Proposed-MC-GBRT86.9734.5797.8299.3610096.900.9057
Single-res. MRF after resampling63.9774.8198.6499.3076.2494.580.888
Adaptation of [75]75.0685.3295.8997.7599.3695.110.901
Resolution 2 . 5 mContainer %Vegetation %Asphalt %Buildings %Water %OA %Cohen’s κ Coeff
Proposed-SMMRF-GBRT86.3132.7194.8710096.0496.630.9276
Proposed-MC-GBRT86.9233.0897.2910010097.000.9218
Resolution 5 mContainer %Vegetation %Asphalt %Buildings %Water %OA %Cohen’s κ Coeff
Proposed-SMMRF-GBRT87.9825.3996.5510088.8896.010.9288
Proposed-MC-GBRT87.9825.3996.5510088.8896.010.9288
Table 3. “Alessandria” dataset: OA and Cohen’s κ with respect to the test set for the proposed methods employing SMMRF and MC, when combined with random forest (RanF), rotation forest (RotF), ExtraTrees (ET), and GBRT.
Table 3. “Alessandria” dataset: OA and Cohen’s κ with respect to the test set for the proposed methods employing SMMRF and MC, when combined with random forest (RanF), rotation forest (RotF), ExtraTrees (ET), and GBRT.
Resolution 1 mRanFRotFETGBRT
Proposed-SMMRFOA %97.8498.3498.7799.12
Cohen’s κ coeff0.97490.98180.98790.9976
Proposed-MCOA %97.5297.1098.4498.58
Cohen’s κ coeff0.97040.96470.98330.9853
Resolution 2 mRanFRotFETGBRT
Proposed-SMMRFOA %97.4596.6598.3198.01
Cohen’s κ coeff0.97450.96350.98670.9825
Proposed-MCOA %97.2396.6198.0998.03
Cohen’s κ coeff0.97150.96290.98360.9829
Resolution 4 mRanFRotFETGBRT
Proposed-SMMRFOA %96.2594.7996.9596.36
Cohen’s κ coeff0.96820.94770.97810.9697
Proposed-MCOA %96.2594.7596.9596.36
Cohen’s κ coeff0.96820.94770.97820.9697
Table 4. “Alessandria” dataset: class-wise accuracies of the proposed algorithms, applied with GBRT and compared with benchmark methods, with respect to the test set.
Table 4. “Alessandria” dataset: class-wise accuracies of the proposed algorithms, applied with GBRT and compared with benchmark methods, with respect to the test set.
Resolution 1 mUrban %Agriculture %Rangeland %Forest %Water %Wet Soil %Bare Soil %OA %Cohen’s κ Coeff
Proposed-SMMRF-GBRT10099.0799.6510099.6510010099.120.9976
Proposed-MC-GBRT99.5498.2699.9186.4099.8010010098.580.9853
Single-res. MRF after resampling98.5899.1292.2336.9810098.3096.8296.030.9453
Method in [75]99.7099.2197.3464.9210010099.6698.600.9804
Resolution 2 mUrban %Agriculture %Rangeland %Forest %Water %Wet Soil %Bare Soil %OA %Cohen’s κ Coeff
Proposed-SMMRF-GBRT99.5498.0699.9178.7499.5310010098.010.9825
Proposed-MC-GBRT99.4998.0899.9179.5299.8410010098.030.9829
Resolution 4 mUrban %Agriculture %Rangeland %Forest %Water %Wet Soil %Bare Soil %OA %Cohen’s κ Coeff
Proposed-SMMRF-GBRT99.0996.8199.8264.8698.7210010096.360.9697
Proposed-MC-GBRT99.0996.8199.8264.8698.7210010096.360.9697
Table 5. “Haiti” dataset: test-set class-wise accuracies of the proposed SMMRF-based method as a function of the adopted decision tree ensemble.
Table 5. “Haiti” dataset: test-set class-wise accuracies of the proposed SMMRF-based method as a function of the adopted decision tree ensemble.
Resolution 1.25 mContainer %Vegetation %Asphalt %Buildings %Water %OA %Cohen’s κ Coeff
Proposed-SMMRF-RanF85.7634.2995.0398.8695.3596.060.9047
Proposed-SMMRF-RotF83.3738.7581.5199.5267.7394.480.8613
Proposed-SMMRF-ET82.2937.6388.7598.7791.8995.070.8786
Proposed-SMMRF-GBRT87.0833.2795.1799.0497.1896.360.9124
Resolution 2 . 5 mContainer %Vegetation %Asphalt %Buildings %Water %OA %Cohen’s κ Coeff
Proposed-SMMRF-RanF85.9527.5195.3299.5895.8596.230.9174
Proposed-SMMRF-RotF76.8532.7181.1810092.2994.470.8691
Proposed-SMMRF-ET84.7632.7195.7799.4495.4596.130.9148
Proposed-SMMRF-GBRT86.3132.7194.8710096.0496.630.9276
Resolution 5 mContainer %Vegetation %Asphalt %Buildings %Water %OA %Cohen’s κ Coeff
Proposed-SMMRF-RanF73.3215.8784.4097.2832.5490.000.7680
Proposed-SMMRF-RotF45.9114.2865.6097.2628.5785.810.6561
Proposed-SMMRF-ET57.9320.6372.2497.2331.7487.580.7035
Proposed-SMMRF-GBRT87.9825.3996.5510088.8896.010.9288
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Pastorino, M.; Montaldo, A.; Fronda, L.; Hedhli, I.; Moser, G.; Serpico, S.B.; Zerubia, J. Multisensor and Multiresolution Remote Sensing Image Classification through a Causal Hierarchical Markov Framework and Decision Tree Ensembles. Remote Sens. 2021, 13, 849. https://doi.org/10.3390/rs13050849

AMA Style

Pastorino M, Montaldo A, Fronda L, Hedhli I, Moser G, Serpico SB, Zerubia J. Multisensor and Multiresolution Remote Sensing Image Classification through a Causal Hierarchical Markov Framework and Decision Tree Ensembles. Remote Sensing. 2021; 13(5):849. https://doi.org/10.3390/rs13050849

Chicago/Turabian Style

Pastorino, Martina, Alessandro Montaldo, Luca Fronda, Ihsen Hedhli, Gabriele Moser, Sebastiano B. Serpico, and Josiane Zerubia. 2021. "Multisensor and Multiresolution Remote Sensing Image Classification through a Causal Hierarchical Markov Framework and Decision Tree Ensembles" Remote Sensing 13, no. 5: 849. https://doi.org/10.3390/rs13050849

APA Style

Pastorino, M., Montaldo, A., Fronda, L., Hedhli, I., Moser, G., Serpico, S. B., & Zerubia, J. (2021). Multisensor and Multiresolution Remote Sensing Image Classification through a Causal Hierarchical Markov Framework and Decision Tree Ensembles. Remote Sensing, 13(5), 849. https://doi.org/10.3390/rs13050849

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop