Learning Rotation Domain Deep Mutual Information Using Convolutional LSTM for Unsupervised PolSAR Image Classification

Wang, Lei; Xu, Xin; Gui, Rong; Yang, Rui; Pu, Fangling

doi:10.3390/rs12244075

Open AccessArticle

Learning Rotation Domain Deep Mutual Information Using Convolutional LSTM for Unsupervised PolSAR Image Classification

by

Lei Wang

,

Xin Xu

^*,

Rong Gui

,

Rui Yang

and

Fangling Pu

School of Electronic Information, Wuhan University, Wuhan 430079, China

^*

Author to whom correspondence should be addressed.

Remote Sens. 2020, 12(24), 4075; https://doi.org/10.3390/rs12244075

Submission received: 31 October 2020 / Revised: 2 December 2020 / Accepted: 10 December 2020 / Published: 12 December 2020

(This article belongs to the Special Issue Remote Sensing Data and Classification Algorithms)

Download

Browse Figures

Versions Notes

Abstract

:

Deep learning can archive state-of-the-art performance in polarimetric synthetic aperture radar (PolSAR) image classification with plenty of labeled data. However, obtaining large number of accurately labeled samples of PolSAR data is very hard, which limits the practical use of deep learning. Therefore, unsupervised PolSAR image classification is worthy of further investigation that is based on deep learning. Inspired by the superior performance of deep mutual information in natural image feature learning and clustering, an end-to-end Convolutional Long Short Term Memory (ConvLSTM) network is used in order to learn the deep mutual information of polarimetric coherent matrices in the rotation domain with different polarimetric orientation angles (POAs) for unsupervised PolSAR image classification. First, for each pixel, paired “POA-spatio” samples are generated from the polarimetric coherent matrices with different POAs. Second, a special designed ConvLSTM network, along with deep mutual information losses, is used in order to learn the discriminative deep mutual information feature representation of the paired data. Finally, the classification results can be output directly from the trained network model. The proposed method is trained in an end-to-end manner and does not have cumbersome pipelines. Experiments on four real PolSAR datasets show that the performance of proposed method surpasses some state-of-the-art deep learning unsupervised classification methods.

Keywords:

PolSAR image; unsupervised classification; mutual information; convolutional LSTM; rotation domain

Graphical Abstract

1. Introduction

Polarimetric synthetic aperture radar (PolSAR) is a side-looking active imaging system and it has the advantages of working all day and night, working under all weather conditions, large scope, and certain penetration capacity. PolSAR has developed rapidly recent years, and it plays a significant role in Earth observation, such as land use planning, disaster prevention, environment monitoring, target detection, and so on [1,2,3].

PolSAR image classification is one of the fundamental applications in PolSAR image interpretation. Supervised PolSAR image classification has achieved excellent performance. Many traditional statistical model-based methods and non-neural machine learning [4] methods can achieve good results, such as the CoAS model [5], random forest (RF) [6], support vector machine (SVM) [7], and XGBoost [8]. In [9], two mixture models were proposed for modeling heterogeneous regions in single-look and multi-look polarimetric SAR images, along with their corresponding maximum likelihood classifiers for land cover classification. Feng et al. [10] proposed a classification scheme for forest growth stage types and other cover types while using a SVM that was based on the Polarimetric SAR Interferometric (PolInSAR) data. The interferometric polarimetric SAR multi-chromatic analysis (MCA-PolInSAR) signal processing method that was proposed in [11] can confirm the feasibility to resolve the volume-oriented indetermination problem. Deep learning is a branch of machine learning and it provides the state-of-the-art solutions to many problems in natural image processing field [12,13]. It also shows excellent performance in supervised PolSAR image classification [14,15,16]. Chen et al. [17] used the roll-invariant polarimetric features and hidden polarimetric features in the rotation domain in order to drive deep convolutional neural network and improved the classification performance. Liu et al. [18] proposed a polarimetric convolutional neural network that was based on a new polarimetric scattering coding method to classify PolSAR images by making full use of polarimetric information. Deep learning can automatically learn the discriminative feature representation of input data. With sufficient labeled training samples, the performance of deep learning based PolSAR image classification methods far surpass traditional machine learning methods [17,19].

Labeled PolSAR data are often insufficient and labeling PolSAR data accurately is expensive and time consuming [20]. Unsupervised image classification is one of the fundamental problems in information processing and it does not need labeled data. Central grouping approaches, such as k-means, were popular in early computer versions, since they could be computed efficiently. Lee et al. [21] proposed the iterative Wishart classifier, which is the most widely used classifier for PolSAR covariance matrix data. Spectral clustering, which is based on eigendecomposition of matrices, has good performance on arbitrary shape cluster and it is also often used for unsupervised SAR image classification [22,23]. Song et al. [24] designed a computationally tractable and memory-saving affinity matrix for spectral clustering and could be used for large size PolSAR image clustering. With the development of machine learning, increasing methods have been proposed based on machine learning for unsupervised PolSAR image classification. Hua et al. [25] presented an unsupervised classification algorithm with an adaptive number of classes for PolSAR data, which is capable of automatically estimating the class numbers. Zou et al. [26] proposed an unsupervised classification framework for PolSAR images by combining the superpixel segmentation, Gaussian kernels, consensus similarity network fusion, spectral clustering, and a new post-processing procedure. The non-neural machine learning based methods have achieved promising results for unsupervised PolSAR image classification [27,28], but the current methods still suffer some problems. First, some methods have cumbersome pipelines, such as pre-processing, feature extraction, clustering, post processing, and so on. For example, superpixel segmentation is usually used to take advantage of the spatial information of pixels [24,26,28,29]. Some methods over-cluster PolSAR images and manually merge the similar classes to improve the performance [24]. Second, the separation of feature extraction and clustering will make the solution sub-optimal [30]. Third, some methods require huge computing resources and they cannot classify large size PolSAR images [31].

Image clustering methods that are based on deep learning develop rapidly in natural image processing and they can be coarsely divided to three categories: (1) the combination of traditional clustering algorithms and deep learning. Deep Subspace Clustering (DSC) [32] introduced a novel self-expressive layer between the encoder and decoder of a deep auto-encoder to mimic the “self-expressiveness” property. Subsequently, from the parameters of the self-expressive layer, an affinity matrix was constructed in order to perform spectral clustering to obtain the final clusters. Zhou et al. [33] combined DSC with Generative Adversarial Networks (GAN) to faithfully evaluate the clustering quality. However, these methods often lead to degenerate solutions and they have cumbersome pipelines, such as pre-training, feature post-processing, and clustering mechanisms external to the network [34]. (2) Deep discriminative feature representation learning methods. Donahue et al. [35] added an encoder to GANs for better visual feature extraction. Hjelm et al. [36] performed unsupervised learning of representations by maximizing mutual information between an input and the output of a deep neural network encode. DeepCluster [37] used the pseudo-label that was computed by k-means as supervision to train the deep neural networks. These methods still require extra classifiers or clustering algorithms to output classification results. (3) Deep mutual information based methods. Invariant Information Clustering (IIC) [34] involved a simple mutual information objective function for paired data in a neural network, which was end-to-end and without any labels. An end-to-end classification method in deep learning can classify images by a single neural network model. The input of the model is the image or low level feature and the output is the classification result. For exploring and taking full advantage of various kinds of correlations behind the unlabeled data, Deep Comprehensive Correlation Mining (DCCM) [30] combined the Deep Adaptive Clustering (DAC) [38] architecture with pseudo-graph supervision, pseudo-label supervision, and triplet mutual information for unsupervised image clustering. Deep learning provides a new way for unsupervised PolSAR image classification. Bi et al. [39] proposed an unsupervised PolSAR image classification method that incorporated polarimetric image factorization and deep convolutional networks into a principled framework. At present, deep learning has great potential to further improve the performance of unsupervised PolSAR image classification. Deep learning based unsupervised PolSAR image classification methods are worthy of further investigation. Some methods [25,40] can obtain the optimal number of clusters in the cluster algorithms. Most unsupervised classification methods use a predefined number of classes in both natural image processing and remote sensing image interpretation fields [24,26,30,34,41]. In this paper, we also focus on the unsupervised PolSAR image classification method with a predefined number of classes.

The mutual information based methods IIC and DCCM are end-to-end deep clustering frameworks, which could be used for unsupervised PolSAR image classification. Different methods use different deep features in order to compute mutual information. IIC used the deep prediction features—the output feature of the softmax layer for classification task—of sample

x

and its randomly geometry transformed version

x^{'}

in order to compute mutual information. DCCM constructed the positive pairs and negative pairs based on the pseudo-graph and extracted the shallow layer and deep layer features to compute the triplet mutual information. The geometry transformations, which are suitable for natural image processing, are used in both DCCM and IIC. However, because of the image mechanism of SAR, it is hard to learn the deep mutual information representation of PolSAR data only via geometry transformations. For unsupervised PolSAR image classification, IIC is hard to converge with the use of mutual information alone. DCCM also cannot learn discriminative feature representation of PolSAR images and the performance is unsatisfactory.

This paper aims to propose an end-to-end unsupervised PolSAR image classification method, which does not have cumbersome pipelines and it is simple to apply to practical applications. In order to further improve the unsupervised classification performance, the state-of-the-art unsupervised deep learning algorithms are adopted to unsupervised PolSAR image classification. Therefore, an unsupervised PolSAR image classification method that is based on Convolution Long Short Term Memory (ConvLSTM) [42] network while using Rotation Domain Deep Mutual Information(RDDMI) of polarimetric coherent matrix is proposed in this paper. Two improvements are introduced in proposed method in order to better learn the deep mutual information of PolSAR data. First, the mutual information algorithms of IIC and DCCM are combined to better learn the PolSAR image feature representation and improve the unsupervised PolSAR image classification performance. Second, as a unique data transformation algorithm in PolSAR data interpretation field, the polarimetric matrix rotation is used in order to improve the deep mutual information learning. The polarimetric response of a target is related to the orientation of the target. The hide features in rotation domain can provide useful polarimetric information and be used to improve the classification performance [43,44,45]. Different Polarimetric Orientation Angles (POAs) are used in order to generate a sequence of polarimetric coherent matrices and then the ConvLSTM is used to learn the rotation domain features of the sequence. ConvLSTM can process long-term dependent sequential data with spatial-temporal information and it has been applied to remote sensing image interpretation [46,47,48]. The advantages of proposed method are summarized, as follows: (1) the deep mutual information in rotation domain is introduced for unsupervised PolSAR image classification. (2) Proposed method is an end-to-end model and do not have cumbersome pipelines. The input is the low level polarimetric features and the output is the class label of a pixel in a PolSAR image. Extra preprocessing or post-processing is not required. By introducing the mutual information algorithms of IIC and DCCM and the unique polarimetric matrix rotation, the proposed method can extract more discriminative feature representation and the performance of unsupervised PolSAR image classification is improved.

2. Methods

Figure 1 shows the proposed architecture and it consists of three modules: the first one is the input module, the second one is the network module, and the latter one is the loss function module.

The input data are the sequence of polarimetric coherent matrices in rotation domain, which is one of the most common used low level features for PolSAR image classification. For each pixel in a PolSAR image, the polarimetric matrix rotation along with geometry transformations is used in order to generate paired sequences of polarimetric coherent matrices,

x

and

x^{'}

, as the inputs of deep neural network.

The network used in proposed method is a convolutional LSTM network. It mainly contains two ConvLSTM layers (CL

_{1}

and CL

_{2}

), three convolutional layers (C

_{1}

, C

_{2}

, and C

_{3}

), three max-pooling layers (M

_{1}

, M

_{2}

and M

_{3}

), two fully connected layers (FC

_{1}

and FC

_{2}

), a softmax layer, and other auxiliary layers, such as ReLU and batch normalization [49].

The loss functions are pseudo-label loss, pseudo-graph loss, and two mutual information losses, which are used to guide the network training. The computation of the IIC mutual information is based on the prediction features of

x

and

x^{'}

. The pseudo-label supervision loss, pseudo-graph supervision loss, and triplet mutual information loss were first introduced in DCCM for image clustering. The prediction features of the network are used in order to compute the similarity matrix among samples. Subsequently, the similarity matrix is used to construct pseudo-graph and pseudo-label to guide the network training. Based on the pseudo-graph, the positive pairs and negative pairs are selected to construct triplet correlations. Finally, the shallow layer and deep layer features of samples, which have triplet correlations, are used in order to compute the triplet mutual information loss. The following sections introduce the detailed information.

2.1. Input Polarimetric Features

The deep mutual information learning of each pixel in a PolSAR image requires paired data

x

and

x^{'}

, which are both the low level features of the pixel. A simple way to generate paired features for a pixel is to transform the low level feature

x

to

x^{'}

by two transformation algorithms. One is the random geometry transformation algorithm, which is used in IIC and DCCM. The other one is the unique polarimetric matrix rotation algorithm of PolSAR data. It can further improve the deep mutual learning and improve the performance of unsupervised PolSAR image classification. In this paper,

x

and

x^{'}

are two sequences of polarimetric coherent matrices in rotation domain with different POAs.

Polarimetric information of PolSAR data can be expressed by polarimetric coherent matrix

T

. The polarimetric matrix rotation on

T

is defined, as follows [43]:

T (θ) = R_{3} (θ) T R_{3}^{H} (θ)

(1)

where

θ

denotes POA and rotation matrix

R_{3} (θ)

=

[\begin{matrix} 1 & 0 & 0 \\ 0 & cos (2 θ) & sin (2 θ) \\ 0 & - sin (2 θ) & cos (2 θ) \end{matrix}]

.

According to Equation (1), the elements

T_{i j} (θ), i, j = 1, 2, 3

in

T (θ)

are

\begin{matrix} T_{11} (θ) = T_{11} \\ T_{12} (θ) = T_{12} cos 2 θ + T_{13} sin 2 θ \\ T_{13} (θ) = - T_{12} sin 2 θ + T_{13} cos 2 θ \\ T_{23} (θ) = \frac{1}{2} (T_{33} - T_{22}) sin 4 θ + R e [T_{23}] cos 4 θ + j I m [T_{23}] \\ T_{22} (θ) = T_{22} {cos}^{2} 2 θ + T_{33} {sin}^{2} 2 θ + R e [T_{23}] sin 4 θ \\ T_{33} (θ) = T_{22} {sin}^{2} 2 θ + T_{33} {cos}^{2} 2 θ - R e [T_{23}] sin 4 θ \end{matrix}

(2)

where

R e [\cdot]

and

I m [\cdot]

denote the real part and imaginary part of a complex

T_{i j}

, respectively. We change the POA from 0 to

π / 2

with step

π / 18

and obtain nine POAs

θ_{x}

. Subsequently,

θ_{x}

are used to generate nine polarimetric coherent matrices. The polarimetric coherent matrices have different POAs and contain spatial information, so it is named “POA-spatio” sequences. For example, Figure 2 shows the Pauli pseudo-color images of the polarimetric coherent matrices of RADARSAT-2 Flevoland dataset. The polarimetric coherent matrices in the rotation domain have different polarimetric properties, which can be used to improve the PolSAR image classification performance.

The rotation domain polarimetric data of each pixel can be defined as a vector

t_{p}

,

\begin{matrix} t_{p} (θ) = [T_{11} (θ), T_{22} (θ), T_{33} (θ), R e [T_{12} (θ)], I m [T_{12} (θ)], \\ R e [T_{13} (θ)], I m [T_{13} (θ)], R e [T_{23} (θ)], I m [T_{23} (θ)]] \end{matrix}

(3)

For each pixel, the neighborhood window data are used to better reserve the spatial information. The vectors of all pixels in a neighborhood window are used in order to generate the sample

x

from the nine polarimetric coherent matrices. The size of each sample

x

is

9 \times 9 \times w \times w

, where w denotes the window size, the first nine is the number of POAs, and the second nine is number of channels, as shown in Figure 3. Each pseudo-color image patch presented in Figure 3 is generated from

T (θ)

, and it denotes the neighborhood window data of a pixel. The deep mutual information learning requires paired data, so the other sample

x^{'}

is also shown in Figure 3. There are two steps to generate sample

x^{'}

. First, sample

x^{'}

is also generated from polarimetric coherent matrices with nine POAs

θ_{x^{'}}

, which are different from

θ_{x}

. Sample

x^{'}

should be similar to

x

, so the nine POAs

θ_{x^{'}}

of

x^{'}

are close to

θ_{x}

. We make small changes to

θ_{x}

, and then the POAs

θ_{x^{'}} = θ_{x} + Δ θ

, where

Δ θ = [Δ θ_{0}, Δ θ_{1}, \dots, Δ θ_{8}]

. For each training epoch, the value of

Δ θ_{i}, i = 0, 1, \dots, 8

is randomly selected in the range

[- 5 π / 180, 4 π / 180]

. Second, the random geometry transformations are applied to

x^{'}

. The geometry transformations include rotation, skewing, scaling, flipping, channel shifting, and so on.

In other words, for each pixel, sample

x

is generated from the rotation domain polarimetric coherent matrices, and then

x

is converted to

x^{'}

by two procedures, which are the polarimetric matrix rotation and random geometry transformations. The network can better learn the deep mutual information of PolSAR data by introducing the unique polarimetric matrix rotation.

2.2. Network Architecture

Figure 4 shows the network used in our method, where N denotes the number of classes. The input sample

x

or

x^{'}

is sequence data and it has spatial information, so the first two ConvLSTM layers are used in order to capture the rotation domain features from the sequence of polarimetric coherent matrices. Subsequently, some convolution layers, as well as fully connected layers, are used in order to further learn deep features of the input sample. Finally, the softmax layer outputs a one-hot vector and the argmax function is used in order to compute the class of the sample. The shallow and deep layer features, which are used to compute the triplet mutual formation loss, are the outputs of the first max-pooling layer and second fully-connected layer, respectively.

LSTM can model long-range dependencies, but it contains too much redundancy for spatial data [42]. ConvLSTM is proposed in order to solve this problem. ConvLSTM replaces the fully connected gate layers of the LSTM with convolutional layers, so it is capable of encoding sequence data that have spatial information [47]. The main equations of ConvLSTM are shown below:

\begin{matrix} i_{t} = σ (W_{x i} * x_{t} + W_{h i} * h_{t - 1} + W_{c i} \circ C_{t - 1} + b_{i}) \\ f_{t} = σ (W_{x f} * x_{t} + W_{h f} * h_{t - 1} + W_{c f} \circ C_{t - 1} + b_{f}) \\ C_{t} = f_{t} \circ C_{t - 1} + i_{t} \circ tanh (W_{x c} * x_{t} + W_{h c} * h_{t - 1} + b_{c}) \\ o_{t} = σ (W_{x o} * x_{t} + W_{h o} * h_{t - 1} + W_{c o} \circ C_{t} + b_{o}) \\ h_{t} = o_{t} \circ tanh (C_{t}) \end{matrix}

(4)

where * denotes the convolution operation, ∘ denotes the Hadamard product, W denotes learnable weights, b denotes the bias,

x_{t}

denotes the input of time t,

f_{t}

and

C_{t}

denote the hidden state,

o_{t}

denotes the output gate, and

h_{t}

denotes the final output state of time t.

σ

and tanh are the activation functions.

The “POA-spatio” polarimetric coherent matrices can be regarded as the time steps of ConvLSTM, and sample

x

or

x^{'}

is “POA-spatio” sequence data. Therefore, ConvLSTM fits well with the deep feature representation learning of the PolSAR “POA-spatio” sequence data.

2.3. Loss Functions

2.3.1. IIC Mutual Information

Let

X = {x_{i}}_{i = 1}^{K}

and

X^{'} = {x_{i}^{'}}_{i = 1}^{K}

be paired samples in a PolSAR image, where

x_{i}

or

x_{i}^{'}

is the i-th sample and K is the number of total sample pairs. IIC aims to learn a representation

Φ

that preserves what is in common between paired data

x

and

x^{'}

, while discarding instance-specific details [34], which can be achieved by maximizing the mutual information (MI):

max_{Φ} I (Φ (x), Φ (x^{'})))

(5)

where

Φ

is a neural network. The output of the softmax layer

z = Φ (x)

, which is the prediction feature of sample

x

, can be interpreted as the distribution of a discrete random variable y over N classes, formally given by

P (y = n | x) = Φ_{n} (x)

[34]. Let y and

y^{'}

be the cluster assignment variables of

x

and

x^{'}

, respectively. Because y and

y^{'}

are not independent after marginalization over the dataset [34], the joint probability distribution of y and

y^{'}

is given by a

N \times N

matrix

P

. Each element at row n and column

n^{'}

of the matrix

P

is

P_{n n^{'}} = P (y = n, y^{'} = n^{'})

. Subsequently,

P

has the following form:

P = \frac{1}{K} \sum_{i = 1}^{K} Φ (x_{i}) \cdot Φ {(x_{i}^{'})}^{T}

(6)

The marginals

P_{n} = P (y = n)

and

P_{n}^{'} = P (y^{'} = n^{'})

can be computed by summing the elements of the rows and columns of the matrix

P

[34]. Plugging the matrix

P

into the mutual information expression [50] and Equation (5) can be computed equivalently:

I (y, y^{'}) = I (P) = \sum_{n = 1}^{N} \sum_{n^{'} = 1}^{N} P_{n n^{'}} l n \frac{P_{n n^{'}}}{P_{n} P_{n^{'}}}

(7)

Subsequently, the IIC mutual information loss can be formulated as:

min_{Φ} L_{I I C \underset{̲}{} M I} = - I (y, y^{'})

(8)

2.3.2. Pseudo Graph and Pseudo Label Supervision Loss

The pseudo graph and pseudo label are used in order to guide the network training and the computation of triplet mutual information. The pseudo-graph is used to explore the binary correlation between samples and the pseudo-label loss is used to make full use of category information behind the data [30].

The neural network

Φ (x)

can output the prediction feature of the input data

x

. Based on the prediction feature vector

z

of the softmax layer, the cosine similarity of two samples can be calculated by the following equation:

S_{i j} = \frac{z_{i} \cdot z_{j}}{| | z_{i} {| |}_{2} \cdot | | z_{j} {| |}_{2}}

(9)

In [30], a large

t h r e s h_{w}

is set to the similarity matrix

S

in order to construct pseudo-graph

W

:

W_{i j} = \{\begin{matrix} 1, & i f s_{i j} > t h r e s h_{w}, \\ 0, & o t h e r w i s e . \end{matrix}

(10)

If the cosine similarity of two samples is larger than

t h r e s h_{w}

, then the two samples are considered to be the same class. During the network training, the cosine similarity of these samples will be maximized. Otherwise, the samples are thought to be different classes and the cosine similarity will be minimized. The pseudo-graph supervision loss has the following form:

min_{Φ} L_{P G} (Φ) = \sum_{x_{i}, x_{j} \in X} l_{g} (Φ (x_{i}), Φ (x_{j}); W_{i j})

(11)

where

l_{g}

is the binary cross-entropy loss [38].

If

S_{i j}

is assumed to be distinctive to each other in similarity matrix

S

, then

X

can be divided into exactly N partitions

{P^{1}, P^{2}, ⋮, P^{N}}

by a threshold t [30]. The samples that have high cosine similarity will be in the same partition and partition n can be set as the pseudo-label of each

x

. The pseudo-label can be formulated as:

y_{i} = arg max_{n} {[Φ (x_{i})]}_{n}

(12)

where

{[\cdot]}_{n}

denotes the n-th component of the prediction vector. The probability of the predicted pseudo-label is

p_{i} = max {[Φ (x_{i})]}_{n}

. Because of the optimization problem, by setting a large threshold

t h r e s h_{l}

, only a highly confident pseudo-label is selected to train the network [30].

V_{i} = \{\begin{matrix} 1, & i f p_{i} > t h r e s h_{l}, \\ 0, & o t h e r w i s e . \end{matrix}

(13)

The pseudo-label supervision loss is formulated as:

min_{Φ} L_{P L} (Φ) = \sum_{x_{i} \in X} V_{i} \cdot l_{l} (Φ (x), y_{i})

(14)

where

l_{l}

is the cross-entropy loss.

The local robustness assumption is used in [30]. The feature representations between

x

and the transformed version

x^{'}

should be invariant, which means

Φ (x) \approx Φ (x^{'})

and the labels of

x

and

x^{'}

should be the same. Accordingly, the feature invariant loss has the following form:

min_{Φ} \sum_{i = 1}^{N} l_{r} (Φ (x), Φ (x^{'}))

(15)

where

l_{r}

denotes the

l_{2}

-norm, which measures the distance between the deep features of

x

and

x^{'}

. Subsequently, the pseudo-graph and pseudo-label information computed based on transformed samples should be consistent with the original samples. The loss function can be formulated as:

min_{Φ} L_{P G}^{'} (Φ) = \sum_{x_{i}^{'}, x_{j}^{'} \in X^{'}} l_{g} (Φ (x_{i}^{'}), Φ (x_{j}^{'}); W_{i j})

(16)

min_{Φ} L_{P L}^{'} (Φ) = \sum_{x_{i}^{'} \in X^{'}} V_{i} \cdot l_{l} (Φ (x^{'}), y_{i})

(17)

2.3.3. Triplet Mutual Information

The instance-level mutual information between the shallow layer and deep layer features of the same sample should be maximized. The instance-level mutual information of two random variables (D, S) is equal to the Jensen–Shannon divergence (JSD) between samples coming from the joint distribution

J

and their product of marginals

M

[36,51]. Different layer features of the same sample should follow the joint distribution. If the features are from different samples, then they should follow the marginal product distribution [36]. The JSD version mutual information is defined as:

{MI}^{(J S D)} (D, S) = E_{J} [- s p (- T (d, s))] - E_{M} [s p (T (d, s))]

(18)

where d denotes the deep layer features, s denotes the shallow layer features, and

s p (z)

is the softplus function

s p (z) = l o g (1 + e^{z})

. The discriminator T is used to distinguish whether d and s are sampled from the joint distribution or not. It is a convolutional neural network, which uses the deep layer and shallow layer features of samples as input. The output feature maps of T are the inputs of

s p (z)

. The detailed implementation of the discriminator T is introduced in [36].

For two different samples,

x_{1}

and

x_{2}

, which belong to the same class, the mutual information between

x_{1}

’s shallow-layer representation and

x_{2}

’s deep-layer representation should also be maximized. Therefore, in [30], the pseudo-graph

W

in Equation (10) is used to select positive pairs and negative pairs to construct triplet correlations. When two samples are the same class, their features are positive pairs; otherwise, the features are negative pairs. In this way, the deep neural network can learn triplet-level mutual information other than instance-level mutual information. The Equation (18) can be expanded in order to calculate the triplet mutual information. Let

d_{j}^{i}

and

s_{j}^{i}

denote the deep layer features and shallow layer features of sample j and the class of the sample is i. Subsequently,

D^{i} = {d_{1}^{i}, d_{2}^{i}, \dots, d_{n}^{i}}

and

S^{i} = {s_{1}^{i}, s_{2}^{i}, \dots, s_{n}^{i}}

are the features sets of class i. Variables

D

and

S

are defined by

D = {D^{1}, D^{2}, \dots, D^{N}}

and

S = {S^{1}, S^{2}, \dots, S^{N}}

, respectively. Triplet mutual information can be formulated as:

{MI}_{s e t}^{(J S D)} (D, S) = E_{(D, S) = J} [- s p (- T (d, s))] - E_{D \times S = M} [s p (T (d, s))]

(19)

Afterwards, the triplet mutual information loss has the following form:

min_{Φ} L_{T R I \underset{̲}{} M I} = - {MI}_{s e t}^{(J S D)} (D, S)

(20)

The triplet mutual information has the advantage that, for two different samples

x_{1}

and

x_{2}

, which are the same class, the mutual information between

x_{1}

’s shallow layer representation and

x_{2}

’s deep layer representation is also maximized. Subsequently, the deep neural network can learn more discriminative representations through the triplet mutual information.

2.4. Model Optimization

By combining the investigations of IIC and DCCM, the final objective of proposed method can be formulated as:

min_{Φ} L = L_{I I C \underset{̲}{} M I} + α \hat{L_{P G}} + β \hat{L_{P L}} + γ L_{T R I \underset{̲}{} M I}

(21)

where

α

,

β

, and

γ

control the importance of corresponding loss functions,

\hat{L_{P G}} = L_{P G} + L_{P G}^{'}

, and

\hat{L_{P L}} = L_{P L} + L_{P L}^{'}

.

The proposed method is trained in a minibatch based end-to-end manner. After the model is trained, the cluster label can be computed by Equation (12) while using the output one-hot vector of the softmax layer. The overall training steps are similar to [30], as shown below:

generate paired “POA-spatio” samples $X$ and $X^{'}$ from the polarimetric coherent matrices in rotation domain.
initialize the parameters of network $Φ$ randomly;
for each randomly selected minibatch $X_{B}$ and $X_{B}^{'}$ , compute the shadow layer, deep layer, and softmax layer features of $Φ (x_{i})$ and $Φ (x_{i}^{'})$ ;
compute the similarity matrix $S$ , pseudo-graph $W$ , and the pseudo-labels;
select positive and negative pairs based on $W$ ;
compute the final loss by Equation (21);
update the parameters of $Φ$ ; and,
calculate the unsupervised classification label by Equation (12) after the network is well trained.

3. Datasets

An airborne PolSAR image and three spaceborne PolSAR images are used as the experiment datasets in this section in order to fully demonstrate the performance of proposed method. The detailed data information is presented in Table 1. For each pixel in a PolSAR image, a paired data

x

and

x^{'}

with size of

9 \times 9 \times w \times w

are generated as the training samples. Except the pixels from the edge areas of a PolSAR image, all other pixels are used for training.

3.1. Gaofen-3 Wuhan Dataset

The Gaofen-3 (GF-3) Wuhan dataset is a C-band GF-3 spaceborne PolSAR data. It covers the scene of Hannan district, Wuhan, China. The spatial resolution is 5.20 m in range direction and 2.25 m in azimuth direction. The image size is

976 \times 603

pixels. There are five land cover classes identified in this dataset, which are water, building, farmland, vegetation 1, and vegetation 2. According to the backscattering responses and vegetation types, vegetation areas are divided into two classes. Figure 5 shows the Pauli pseudo-color image and ground truth map. The scattering power of vegetation 1 is stronger than that of vegetation 2. Vegetation 1 is shrubs or trees, and vegetation 2 is grasslands or reeds. The ground truth map was first used in [8].

3.2. RADARSAT-2 Flevoland Dataset

The other spaceborne dataset is a C-band RADARSAT-2 (RS-2) PolSAR dataset, which is famous and widely used for classification performance evaluation. It covers the scene of Flevoland in Netherlands, with an image size of

1400 \times 1200

pixels. The spatial resolution is 12 m in range direction and 8 m in azimuth direction. According to high resolution optical images, this dataset contains four land cover classes, which are water, farmland, building, and forest. Figure 6 shows the ground truth map and Pauli pseudo-color image.

3.3. AIRSAR Flevoland Dataset

The airborne dataset is acquired by the NASA/JPL AIRSAR L-band system and it is a four-look fully polarimetric image. This dataset also covers over the scene of Flevoland, Netherlands. The spatial resolution is 6m in range direction and 12 m in azimuth direction. The image size is

750 \times 1024

pixels. This dataset has a well-established ground truth map [52] and it is widely used for supervised PolSAR image classification. It contains 11 land cover types, which are forest, water, bare soil, and other eight crop types. Figure 7 shows the ground truth map and Pauli pseudo-color image.

3.4. RADARSAT-2 Wuhan Dataset

The last spaceborne dataset is acquired by RADARSAT-2 C-band PolSAR system at fine quad-pol mode and it covers over the scene of central area of Wuhan city, China. The image size is 5500 × 2400 and the spatial resolution is 12 m in range direction and 8 m in azimuth direction. A total of three main land cover types are identified in this scene, which are water, building, and forest. The Pauli pseudo-color image and ground truth map are shown in Figure 8a,b, respectively. Wuhan is a big city and it contains high density buildings with different orientations. For example, the two red boxes in the Pauli pseudo-color image are building areas and the corresponding optical images are shown in Figure 8c,d. The optical images show high density buildings with different orientations clearly. This dataset and ground truth map were first used in [20].

4. Experiments

The implementation of proposed method is based on the Pytorch implementation of DCCM (https://github.com/Cory-M/DCCM) and IIC (https://github.com/xu-ji/IIC), and most of the training parameters are not changed. The RMSprop optimizer with

l r = 1 e - 4

is used. The thresholds

t h r e s h_{l}

and

t h r e s h_{w}

are set to 0.9. The parameters that control the importance of different losses are

α = 1

,

β = 5

, and

γ = 0.05

. The probability outputs of softmax layer are used in order to compute IIC mutual information and the similarity matrix

S

. The shallow layer feature and deep layer feature that are used to compute triplet mutual information are the outputs of M

_{1}

and FC

_{2}

. The discriminator of triplet mutual information estimation is the same to DCCM. For all of the PolSAR dataset, the neighborhood window size w is 15.

The overall accuracy (OA), kappa coefficient, purity, and entropy [28,41] are used in order to evaluate the classification performances.

OA is one of the most common used measures for classification performance evaluation and it can be formulated as

$O A = \frac{\hat{n}}{K}$

(22)

where K denotes the number of total samples and $\hat{n}$ denotes the correctly classified samples.
Kappa is an indicator of consistency and it can be computed by the following equation

$k a p p a = \frac{p_{o} - p_{e}}{1 - p_{e}}$

(23)

where $p_{o}$ is OA, $p_{e} = \frac{\sum_{i} n_{i} * {\tilde{n}}_{i}}{K \times K}$ , $n_{i}$ is the number of samples that belong to class i, and ${\tilde{n}}_{i}$ is the number of samples that are classified to class i.
Purity and entropy are two commonly used measures for clustering performance evaluation. The purity is the higher the better and the entropy is the lower the better. The two measures are defined, as follows [53]

$P u r i t y = \sum_{r = 1}^{k} \frac{1}{n} max_{i} (n_{r}^{i})$

(24)

$E n t r o p y = \sum_{r = 1}^{k} \frac{n_{r}}{n} (- \frac{1}{log q} \sum_{i = 1}^{q} \frac{n_{r}^{i}}{n_{r}} log \frac{n_{r}^{i}}{n_{r}})$

(25)

where q is the number of classes, k the number of clusters, $n_{r}$ is the size of cluster r, and $n_{r}^{i}$ is the number of data points in class i clustered in cluster r.

DAC [38], DCCM, DCCM+IIC, and three traditional unsupervised PolSAR image classification methods, which are k-means, wishart cluster, and ASC-S [31], are used to compare the performance with proposed method. The original implementation of ASC-S over-clustered PolSAR images and merged similar classes manually. In this paper, ASC-S clusters the PolSAR image to the number of classes that are defined in the ground truth map directly. DCCM+IIC just combines the DCCM with IIC mutual information loss. The input data of DCCM+IIC are generated from polarimetric coherent matrix

T

, and only random geometry transformations are used to generate paired samples. No rotation domain data are used. In this way, we can more clearly show that the polarimetric matrix rotation in the proposed method can improve deep mutual information learning and the performance of unsupervised PolSAR image classification. The random forest (RF) is used as baseline and it is trained in a supervised manner. For each PolSAR data, 25,000 labeled samples are randomly selected in order to train RF. We run all of the methods multiple times and choose the best classification results.

4.1. Results of GF-3 Wuhan

Figure 9 shows the classification results of GF-3 Wuhan dataset and Table 2 shows the performance evaluation. Because the backscattering property of some pixels in the farmland area is similar to the building type, so the farmland and building areas are difficult to cluster in this dataset. The classification accuracies of k-means, wishart, and ASC-S are low. ASC-S classifies water area to two land cover types and cannot distinguish farmland and building types. The results of DAC and DCCM are comparable. The result of DCCM+IIC is much better. The proposed RDDMI achieves the best classification result, especially the classification accuracies of farmland and building types are both good, and the OA is 4.75% higher than DCCM+IIC. The kappa coefficient, purity, and entropy of the proposed method are also the best. Therefore, the polarimetric features in rotation domain can improve the unsupervised PolSAR data classification.

4.2. Results of RS-2 Flevoland

In the RS-2 Flevoland dataset, the farmland type has different backscattering properties. The backscattering property of some small farmland areas is similar to the forest type. Besides, some small forest areas are interspersed among the building areas, and the forest and building types are hard to cluster. Figure 10 and Table 3 illustrate that k-means, wishart, ASC-S, and DAC achieve bad results for farmland, forest, or building types. The classification accuracies of DCCM, DCCM+IIC, and proposed method are much better than the above four methods and the OA of proposed method is 1.07% high than DCCM+IIC. Furthermore, the performance of proposed method is very close to the supervised RF, which shows the superiority of proposed method.

4.3. Results of AIRSAR Flevoland

A total of 11 land cover types are identified in the AIRSAR Flevoland dataset and it is a challenge for unsupervised classification. This dataset contains too many unidentified areas, which may contain new land cover types; hence, only the labeled areas are used for training and evaluation. In this dataset, some land cover types have similar backscattering properties. Besides, the observed polarimetric matrices from the same land cover type may also be quite different. The seven methods all do not achieve impressive classification results, as shown in Figure 11 and Table 4. The classification accuracies of many land cover types in the results of k-means, wishart, ASC-S, DAC, and DCCM are zero or very low. The backscattering property of water type is similar to other types, so DCCM+IIC and proposed method also do not classify water types correctly. However, the OA, kappa, purity, and entropy of proposed method are still much better than the other six methods. The OA of RDDMI is 5.11% higher than DCCM+IIC. It again shows that the polarimetric matrix rotation is helpful for deep mutual information learning and it can improve the performance of unsupervised PolSAR image classification.

4.4. Results of RS-2 Wuhan

The RS-2 Wuhan dataset contains buildings with different orientations, and it is difficult to classify building type correctly. Figure 12 and Table 5 illustrate the classification results and the performance evaluation, respectively. The result of k-means is very bad. The results of Wishart, DAC, DCCM, and DCCM+IIC are better, but the accuracy of building type or forest type is not high. The OA of RDDMI is the best and the accuracies of building and forest types are both high. The kappa, purity, and entropy of RDDMI are also the best. The proposed method uses rotation domain polarimetric coherent matrices and it shows high robustness to buildings with different orientations. Besides, the OA of the proposed method is close to supervised RF, only 2.14% lower than RF, and it again shows that the proposed method has high performance for unsupervised PolSAR image classification.

5. Discussion

5.1. Semi-Supervised Classification

The proposed method can learn discriminative feature representation, which can be used for semi-supervised classification. The RF is used as the classifier of the semi-supervised classification. Three kinds of data are used as the input of RF. One is the polarimetric coherent matrix

T

, and the other two are the features from the two fully connected layers FC

_{1}

and FC

_{2}

on the top of the network. The output of FC

_{1}

is a 64 dimension vector and the output of FC

_{2}

is a N dimension vector, where N is the number of classes. The GF-3 Wuhan, RS-2 Flevoland, and AIRSAR Flevoland datasets are used for the semi-supervised classification performance evaluation. For each dataset, the classification results with different numbers of labeled training samples are illustrated. When the labeled training samples are less than 1000, the experiments are thought to be semi-supervised classification [54]. We run all of the experiments multiple times and choose the best results.

Table 6 shows the semi-supervised classification results of GF-3 Wuhan dataset. When the number of labeled training samples is 25,000, the OA of FC

_{1}

is the best and the OA of

T

is a little lower. When the number of samples reduces to 1000, the OA of

T

decreases a lot. The OA of FC

_{1}

is the best and the OA of FC

_{2}

is the lowest. When the number of samples is 100, the OAs of

T

and FC

_{2}

are closed to each other, and the OA of FC

_{1}

is the best.

Table 7 shows the classification results of the RS-2 Flevoland dataset. When the number of samples is 25,000, the OA of FC

_{1}

is high, and the OAs of

T

and FC

_{2}

are also good. When the number of samples reduces to 1000 and 100, the OA of

T

decrease a lot. Especially, when only 100 labeled training samples are used, the OA

T

is very low, only 66.98%. The OAs of FC

_{1}

and FC

_{2}

decrease a little and the performance of FC

_{1}

is the best.

For ARISAR dataset, the classification result of FC

_{1}

is very good when the number of samples is 25,000, and the OAs of

T

and FC

_{2}

are much lower, as shown in Table 8. When the number of samples reduces to 110, all of the classification accuracies are decreased. However, the OA of FC

_{1}

is still good enough, but the OA of

T

decreases to 49.44% and it is very low.

The above analysis shows that the feature representation of FC

_{1}

can achieve good results in semi-supervised classification for all three datasets with a commonly used RF classifier. The classification results of FC

_{2}

feature representation are also considerable. When the number of samples is very low, the performance of low level polarimetric feature

T

is low, and it is not suitable for semi-supervised PolSAR image classification. Therefore, the proposed method can learn discriminative feature representation in an unsupervised manner and improve the performance of semi-supervised PolSAR image classification.

5.2. Training Accuracy Trend

In this section, the classification accuracies of the four datasets during the training process are discussed, as shown in Figure 13. As the number of training steps increases, the cluster accuracy also increases, and it can achieve a relatively stable value. For the GF-3 Wuhan, AIRSAR Flevoland, and RS-2 Wuhan datasets, the accuracies of proposed methods are always higher than DAC, DCCM and DCCM+IIC during training. For the RS-2 Flevoland dataset, the accuracies of DCCM, DCCM+IIC, and the proposed method are closed to each other, but the accuracy of proposed method still a litter higher than DCCM+IIC. Besides, for the four datasets, the performance of DCCM+IIC is better than DCCM, and the performances of DCCM and DAC are comparable. The combination of the deep mutual information algorithms of IIC and DCCM can improve the classification accuracies. By learning the rotation domain deep mutual information while using IIC and DCCM mutual information algorithms, the proposed method is superior to DAC, DCCM, and DCCM+IIC, and it can achieve good performance for unsupervised PolSAR image classification.

As shown in Figure 13, the proposed method also has a disadvantage. As with other unsupervised methods, the cluster accuracies of the proposed method still change in a small range after long time training. However, the range of such accuracy variation is very small and rather smaller than the other methods. It has limited impact on the cluster performance. Therefore, the proposed method has promising value in practical applications.

6. Conclusions

In this paper, the convolutional LSTM and two mutual information losses are used in order to learn the rotation domain deep mutual information of “POA-spatio” polarimetric coherent matrices for unsupervised PolSAR image classification. The proposed method achieves good performance on four real PolSAR image datasets. When compared to six unsupervised classification methods, the performance of proposed method is the best. The proposed method is end-to-end, and no extra preprocessing or post-processing is needed. The deep mutual information of PolSAR image in rotation domain is very helpful for discriminative feature representation learning, and the proposed method also shows good performance in semi-supervised PolSAR image classification.

Author Contributions

Conceptualization, L.W.; methodology, L.W.; software, L.W.; validation, R.G.; formal analysis, R.Y.; investigation, L.W.; resources, X.X.; data curation, X.X.; writing—original draft preparation, L.W.; writing—review and editing, R.G., R.Y., X.X., and F.P.; supervision, X.X.; funding acquisition, X.X., F.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62071336, in part by the National Key Research and Development Program of China under Grant 2018YFB2100503, in part by the Thirteen-Five Civil Aerospace Planning Project—Integration of Communication, Navigation and Remote Sensing Comprehensive Application Technology, and in part by the Chinese Technology Research and Development of the Major Project of High-Resolution Earth Observation System under Grant 03-Y20A10-9001-15/16.

Conflicts of Interest

The authors declare no conflict of interest.

References

Gong, C.; Zhou, P.; Han, J. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415. [Google Scholar]
Hawryło, P.; Bednarz, B.; Wężyk, P.; Szostak, M. Estimating defoliation of Scots pine stands using machine-learning methods and vegetation indices of Sentinel-2. Eur. J. Remote Sens. 2018, 51, 194–204. [Google Scholar] [CrossRef] [Green Version]
Dumitru, C.O.; Cui, S.; Faur, D.; Datcu, M. Data analytics for rapid mapping: Case study of a flooding event in Germany and the tsunami in Japan using very high resolution SAR images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 114–129. [Google Scholar] [CrossRef]
Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Dong, H.; Xu, X.; Sui, H.; Xu, F.; Liu, J. Copula-Based Joint Statistical Model for Polarimetric Features and Its Application in PolSAR Image Classification. Trans. Geosci. Remote Sens. 2017, 55, 5777–5789. [Google Scholar] [CrossRef]
Du, P.J.; Samat, A.; Waske, B.; Liu, S.C.; Li, Z.H. Random Forest and Rotation Forest for fully polarized SAR image classification using polarimetric and spatial features. ISPRS J. Photogramm. Remote Sens. 2015, 105, 38–53. [Google Scholar] [CrossRef]
Waske, B.; Benediktsson, J.A. Fusion of support vector machines for classification of multisensor data. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3858–3866. [Google Scholar] [CrossRef]
Dong, H.; Xu, X.; Wang, L.; Pu, F. Gaofen-3 PolSAR Image Classification via XGBoost and Polarimetric Spatial Information. Sensors 2018, 18, 611. [Google Scholar] [CrossRef] [Green Version]
Gao, W.; Yang, J.; Ma, W. Land cover classification for polarimetric SAR images based on mixture models. Remote Sens. 2014, 6, 3770–3790. [Google Scholar] [CrossRef] [Green Version]
Feng, Q.; Zhou, L.; Chen, E.; Liang, X.; Zhao, L.; Zhou, Y. The Performance of Airborne C-Band PolInSAR Data on Forest Growth Stage Types Classification. Remote Sens. 2017, 9, 955. [Google Scholar] [CrossRef] [Green Version]
Biondi, F. Multi-chromatic Analysis Polarimetric Interferometric Synthetic Aperture Radar (MCA-PolInSAR) for Urban Classification. Int. J. Remote Sens. 2018, 40, 3721–3750. [Google Scholar] [CrossRef]
Tao, X.; Zhang, D.; Singh, A.K.; Prasad, M.; Lin, C.; Xu, D. Weak Scratch Detection of Optical Components Using Attention Fusion Network. In Proceedings of the IEEE 16th International Conference on Automation Science and Engineering, Online Zoom Meeting, Hong Kong, China, 20–21 August 2020. [Google Scholar]
Wang, X.; Han, X.; Huang, W.; Dong, D.; Scott, M.R. Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
Shang, R.; Wang, G.; Okoth, M.A.; Jiao, L. Complex-Valued Convolutional Autoencoder and Spatial Pixel-Squares Refinement for Polarimetric SAR Image Classification. Remote Sens. 2019, 11, 522. [Google Scholar] [CrossRef] [Green Version]
Wang, R.; Wang, Y. Classification of PolSAR Image Using Neural Nonlocal Stacked Sparse Autoencoders with Virtual Adversarial Regularization. Remote Sens. 2019, 11, 1038. [Google Scholar] [CrossRef] [Green Version]
Mullissa, A.G.; Persello, C.; Stein, A. PolSARNet: A Deep Fully Convolutional Network for Polarimetric SAR Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 5300–5309. [Google Scholar] [CrossRef]
Chen, S.; Tao, C. PolSAR Image Classification Using Polarimetric-Feature-Driven Deep Convolutional Neural Network. IEEE Geosci. Remote Sens. Lett. 2018, 15, 627–631. [Google Scholar] [CrossRef]
Liu, X.; Jiao, L.; Tang, X.; Sun, Q.; Zhang, D. Polarimetric Convolutional Network for PolSAR Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 3040–3054. [Google Scholar] [CrossRef] [Green Version]
Yang, R.; Hu, Z.; Liu, Y.; Xu, Z. A Novel Polarimetric SAR Classification Method Integrating Pixel-Based and Patch-Based Classification. IEEE Geosci. Remote Sens. Lett. 2020, 17, 431–435. [Google Scholar] [CrossRef]
Gui, R.; Xu, X.; Yang, R.; Wang, L.; Pu, F. Statistical Scattering Component-Based Subspace Alignment for Unsupervised Cross-Domain PolSAR Image Classification. Trans. Geosci. Remote Sens. 2020. [Google Scholar] [CrossRef]
Lee, J.S.; Pottier, E. Polarimetric Radar Imaging: From Basics to Applications; CRS Press: Boca Raton, FL, USA, 2009. [Google Scholar]
Anfinsen, S.N.; Jenssen, R.; Eltoft, T. Spectral clustering of polarimetric SAR data with Wishart-derived distance measures. In Proceedings of the 3rd International Workshop on Science and Applications of SAR Polarimetry and Polarimetric Interferometry, Esrin, Italy, 22–26 January 2007. [Google Scholar]
Ersahin, K.; Cumming, I.G.; Ward, R.K. Segmentation and classification of polarimetric SAR data using spectral graph partitioning. IEEE Trans. Geosci. Remote Sens. 2010, 48, 164–174. [Google Scholar] [CrossRef] [Green Version]
Song, H.; Yang, W.; Bai, Y.; Xu, X. Unsupervised classification of polarimetric SAR imagery using large-scale spectral clustering with spatial constraints. Int. J. Remote Sens. 2015, 36, 2816–2830. [Google Scholar] [CrossRef]
Hua, W.; Wang, S.; Liu, H.; Liu, Y.; Jiao, L. Adaptive unsupervised classification of polarimetric SAR images using the improved affinity propagation clustering algorithm. Int. J. Remote Sens. 2016, 37, 6023–6040. [Google Scholar] [CrossRef]
Zou, H.; Li, M.; Shao, N.; Qin, X. Superpixel-Oriented Unsupervised Classification for Polarimetric SAR Images Based on Consensus Similarity Network Fusion. IEEE Access 2019, 7, 78347–78366. [Google Scholar] [CrossRef]
Wang, S.; Liu, K.; Pei, J.; Gong, M. Unsupervised Classification of Fully Polarimetric SAR Images Based on Scattering Power Entropy and Copolarized Ratio. IEEE Geosci. Remote Sens. Lett. 2013, 10, 622–626. [Google Scholar] [CrossRef]
Zhong, N.; Yang, W.; Cherian, A.; Yang, X.; Xia, G.; Liao, M. Unsupervised Classification of Polarimetric SAR Images via Riemannian Sparse Coding. IEEE Trans. Geosci. Remote Sens. 2017, 55, 5381–5390. [Google Scholar] [CrossRef]
Zou, H.; Shao, N.; Li, M.; Chen, C.; Qin, X. Superpixel-Based Unsupervised Classification of Polsar Images with Adaptive Number of Terrain Classes. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 5 November 2018. [Google Scholar]
Wu, J.; Long, K.; Wang, F.; Qian, C.; Li, C.; Lin, Z.; Zha, H. Deep Comprehensive Correlation Mining for Image Clustering. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019. [Google Scholar]
Song, H.; Yang, W.; Zhong, N.; Xu, X. Unsupervised Classification of PolSAR Imagery via Kernel Sparse Subspace Clustering. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1487–1491. [Google Scholar] [CrossRef]
Ji, P.; Zhang, T.; Li, H.; Salzmann, M.; Reid, I. Deep Subspace Clustering Networks. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
Zhou, P.; Hou, Y.; Feng, J. Deep Adversarial Subspace Clustering. In Proceedings of the IEEE Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
Ji, X.; Henriques, J.F.; Vedaldi, A. Invariant Information Clustering for Unsupervised Image Classification and Segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019. [Google Scholar]
Donahue, J.; Krahenbuhl, P.; Darrell, T. Adversarial feature learning. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
Hjelm, R.D.; Fedorov, A.; Marchildon, S.L.; Grewal, K.; Trischler, A.; Bengio, Y. Learning deep representations by mutual information estimation and maximization. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
Caron, M.; Bojanowski, P.; Joulin, A.; Douze, M. Deep Clustering for Unsupervised Learning of Visual Features. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018. [Google Scholar]
Chang, J.; Wang, L.; Meng, G.; Xiang, S.; Pan, C. Deep Adaptive Image Clustering. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017. [Google Scholar]
Bi, H.; Xu, F.; Wei, Z.; Han, Y.; Cui, Y.; Xue, Y.; Xu, Z. Unsupervised PolSAR Image Factorization with Deep Convolutional Networks. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019. [Google Scholar]
Cao, F.; Hong, W.; Wu, Y.; Pottier, E. An Unsupervised Segmentation With an Adaptive Number of Clusters Using the SPAN/H/α/A Space And the Complex Wishart Clustering for Fully Polarimetric SAR Data Analysis. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3454–3467. [Google Scholar] [CrossRef]
Zhao, Y.; Yuan, Y.; Wang, Q. Fast Spectral Clustering for Unsupervised Hyperspectral Image Classification. Remote Sens. 2019, 11, 399. [Google Scholar] [CrossRef] [Green Version]
Shi, X.; Chen, Z.; Wang, H.; Yeung, D.; Wong, W.; WOO, W. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 7–12 December 2015. [Google Scholar]
Chen, S.; Wang, X.; Sato, M. Uniform Polarimetric Matrix Rotation Theory and Its Applications. IEEE Trans. Geosci. Remote Sens. 2014, 52, 4756–4770. [Google Scholar] [CrossRef]
Taon, C.; Chen, S.; Li, Y.; Xiao, S. PolSAR Land Cover Classification Based on Roll-Invariant and Selected Hidden Polarimetric Features in the Rotation Domain. Remote Sens. 2017, 9, 660. [Google Scholar]
Yang, R.; Xu, X.; Xu, Z.; Dong, H.; Gui, R.; Pu, F. Dynamic Fractal Texture Analysis for PolSAR Land Cover Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5991–6002. [Google Scholar] [CrossRef]
Liu, Q.; Zhou, F.; Hang, R.; Yuan, X. Bidirectional-Convolutional LSTM Based Spectral-Spatial Feature Learning for Hyperspectral Image Classification. Remote Sens. 2017, 9, 1330. [Google Scholar]
Wang, L.; Xu, X.; Dong, H.; Gui, R.; Yang, R.; Pu, F. Exploring Convolutional Lstm for Polsar Image Classification. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 8452–8455. [Google Scholar]
Chang, Y.; Luo, B. Bidirectional Convolutional LSTM Neural Network for Remote Sensing Image Super-Resolution. Remote Sens. 2019, 11, 2333. [Google Scholar] [CrossRef] [Green Version]
Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167. [Google Scholar]
Mitra, P.; Bokil, H. Entropy and Mutual Information; Oxford University Press: Oxford, UK, 2009; pp. 333–341. [Google Scholar]
Nowozin, S.; Cseke, B.; Tomioka, R. F-GAN: Training generative neural samplers using variational divergence minimization. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 271–279. [Google Scholar]
Lee, J.S.; Grunes, M.R.; Pottier, E. Quantitative comparison of classification capability: Fully polarimetric versus dual and single-polarization SAR. IEEE Trans. Geosci. Remote Sens. 2001, 39, 2343–2351. [Google Scholar]
Rosenberg, A.; Hirschberg, J. V-measure: A Conditional Entropy Based External Cluster Evaluation Measure. In Proceedings of the EMNLP-CoNLL 2007, 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague, Czech Republic, 28–30 June 2007; pp. 410–420. [Google Scholar]
Li, Y.; Xing, R.; Jiao, L.; Chen, Y.; Chai, Y.; Marturi, N.; Shang, R. Semi-Supervised PolSAR Image Classification Based on Self-Training and Superpixels. Remote Sens. 2019, 11, 1933. [Google Scholar] [CrossRef] [Green Version]

Figure 1. The architecture of proposed method. It consists of three modules: the first one is the input module, the second one is the network module and the latter one is the loss function module. The paired sequences of polarimetric coherent matrices of each pixel’s neighborhood window, for example the three red boxes in the input module, are the inputs of the network module. The network module is stacked by some deep learning layers, such as Convolutional Long Short Term Memory (ConvLSTM) (CL) layers, Convolutional (C) layers, and so on. The loss function module is used for network back propagation.

Figure 2. Pauli pseudo-color images of polarimetric coherent matrices of RADARSAT-2 Flevoland. (a) POA

= \frac{π}{18}

; (b) POA

= \frac{3 π}{18}

; (c) POA

= \frac{5 π}{18}

; (d) POA

= \frac{7 π}{18}

.

Figure 2. Pauli pseudo-color images of polarimetric coherent matrices of RADARSAT-2 Flevoland. (a) POA

= \frac{π}{18}

; (b) POA

= \frac{3 π}{18}

; (c) POA

= \frac{5 π}{18}

; (d) POA

= \frac{7 π}{18}

.

Figure 3. Input paired data

x

and

x^{'}

of a pixel in a polarimetric synthetic aperture radar (PolSAR) image.

x

denotes the sequence of polarimetric coherent matrices of the pixel’s neighborhood window.

x^{'}

is transformed from

x

by polarimetric matrix rotation and random geometry transformations. Each pseudo-color image patch denotes the neighborhood window data with different POAs. For every training epoch, the value of

Δ θ_{i}, i = 0, 1, \dots, 8

is randomly selected in range

[- 5 π / 180, 4 π / 180]

.

Figure 3. Input paired data

x

and

x^{'}

of a pixel in a polarimetric synthetic aperture radar (PolSAR) image.

x

denotes the sequence of polarimetric coherent matrices of the pixel’s neighborhood window.

x^{'}

is transformed from

x

by polarimetric matrix rotation and random geometry transformations. Each pseudo-color image patch denotes the neighborhood window data with different POAs. For every training epoch, the value of

Δ θ_{i}, i = 0, 1, \dots, 8

is randomly selected in range

[- 5 π / 180, 4 π / 180]

.

Figure 4. The network architecture of proposed method. Acronyms are, as follows: CL: ConvLSTM, C: Convolutional, R: ReLU, B: Batch Normalization, M: Max pooling, FC: fully connected, S: Softmax. The numbers in parentheses are filter kernels, kernel size, stride, and padding. The numbers in square brackets are the sizes of feature maps and N denotes the number of classes.

Figure 5. GF-3 Wuhan dataset and the ground truth map. (a) Pauli pseudo-color image of GF-3 Wuhan; (b) Ground truth map of GF-3 Wuhan.

Figure 6. RS-2 Flevoland dataset and the ground truth map. (a) Pauli pseudo-color image of RS-2 Flevoland; (b) Ground truth map of RS-2 Flevoland.

Figure 7. AIRSAR Flevoland dataset and the ground truth map. (a) Pauli pseudo-color image of AIRSAR Flevoland; (b) Ground truth map of AIRSAR Flevoland.

Figure 8. RS-2 Wuhan dataset and the ground truth map. (a) Pauli pseudo-color image of RS-2 Wuhan; (b) Ground truth map of RS-2 Wuhan; (c) The optical image of ROI_1; (d) The optical image of ROI_2.

Figure 9. Classification results of GF-3 Wuhan dataset. (a) Ground truth map; (b) K-means; (c) Wishart; (d) ASC-S; (e) DAC; (f) DCCM; (g) DCCM+IIC; (h) RDDMI.

Figure 10. Classification results of RS-2 Flevoland dataset. (a) Ground truth map; (b) K-means; (c) Wishart; (d) ASC-S; (e) DAC; (f) DCCM; (g) DCCM+IIC; (h) RDDMI.

Figure 11. Classification results of AIRSAR Flevoland dataset. (a) Ground truth map; (b) K-means; (c) Wishart; (d) ASC-S; (e) DAC; (f) DCCM; (g) DCCM+IIC; (h) RDDMI.

Figure 12. Classification results of RS-2 Wuhan dataset. (a) Ground truth map; (b) K-means; (c) Wishart; (d) ASC-S; (e) DAC; (f) DCCM; (g) DCCM+IIC; (h) RDDMI.

Figure 13. The training process of the four datasets. (a) GF-3 Wuhan; (b) RS-2 Flevoland; (c) AIRSAR Flevoland; (d) RS-2 Wuhan.

Table 1. The detailed information of four real PolSAR datasets.

Parameter	GF-3 Wuhan	RS-2 Flevoland	AIRSAR Flevoland	RS-2 Wuhan
Sensor	Gaofen-3	RADARSAT-2	NASA/JPL AIRSAR	RADARSAT-2
Band	C	C	L	C
Imaging Area	Wuhan	Flevoland	Flevoland	Wuhan
Imaging Mode	QPSI	Quad-pol	∖	Quad-pol
Imaging time	1-4-2017	2008	16-8-1989	12-2011
Spatial resolution [Range Azimuth] (m)	$5.20 \times 2.25$	$12 \times 8$	$6 \times 12$	$12 \times 8$
Image size [Range Azimuth] (pixel)	$976 \times 603$	$1400 \times 1200$	$750 \times 1024$	$5500 \times 2400$
Image number	Figure 5	Figure 6	Figure 7	Figure 8

Table 2. Performance evaluation of GF-3 Wuhan classification results. The numbers in bold denote the best unsupervised classification performance.

Land Cover	K-Means	Wishart	ASC-S	DAC	DCCM	DCCM+IIC	RDDMI	RF
Water	99.66	92.82	81.19	97.42	99.44	99.28	99.54	99.95
Building	62.20	44.35	0.00	65.37	65.76	71.51	83.57	98.67
Farmland	57.39	69.36	86.34	79.84	58.51	76.01	93.96	95.60
Vegetation1	82.50	80.64	96.17	51.92	99.32	98.50	95.20	99.42
Vegetation2	76.14	80.87	82.71	77.70	95.59	96.50	81.18	96.18
OA	72.19	75.60	74.53	78.59	79.11	86.93	91.68	97.16
Kappa	0.64	0.68	0.65	0.71	0.73	0.83	0.88	0.96
Purity	0.77	0.75	0.82	0.80	0.83	0.89	0.91	0.97
Entropy	0.91	0.96	0.69	0.84	0.63	0.49	0.38	0.18

Table 3. Performance evaluation of RS-2 Flevoland classification results. The numbers in bold denote the best unsupervised classification performance.

Land Cover	K-Means	Wishart	ASC-S	DAC	DCCM	DCCM+IIC	RDDMI	RF
Water	95.12	90.63	90.77	95.81	93.98	91.28	92.49	95.85
Forest	49.73	77.77	87.14	77.51	76.81	79.63	82.14	87.77
Farmland	38.93	65.65	54.41	80.60	83.74	84.47	86.42	85.60
Building	57.49	55.31	65.04	69.84	94.83	95.90	94.16	82.39
OA	54.70	71.35	72.35	80.15	85.87	86.79	87.86	88.10
Kappa	0.39	0.61	0.62	0.73	0.81	0.82	0.83	0.83
Purity	0.57	0.71	0.72	0.80	0.85	0.86	0.87	0.87
Entropy	1.29	1.12	1.02	0.93	0.72	0.70	0.67	0.68

Table 4. Performance evaluation of AIRSAR Flevoland classification results. The numbers in bold denote the best unsupervised classification performance.

Land Cover	K-Means	Wishart	ASC-S	DAC	DCCM	DCCM+IIC	RDDMI	RF
Stem bean	73.20	65.15	0.00	93.45	69.51	69.59	94.88	96.69
Forest	77.57	31.13	87.75	79.69	98.30	98.09	97.19	90.18
Potatoes	10.55	87.70	90.35	87.75	43.12	70.97	65.48	89.62
Lucerne	0.62	67.49	72.59	73.93	78.45	77.84	75.78	89.17
Wheat	45.55	57.71	50.40	54.19	34.67	39.27	82.18	77.94
Bare soil	94.08	82.38	75.82	98.52	92.44	94.37	82.88	95.57
Beet	75.12	71.59	82.78	67.45	2.35	93.37	53.32	89.00
Rape seed	59.98	79.90	22.09	59.17	56.78	60.84	35.07	90.38
Peas	67.59	57.95	65.79	66.79	74.30	65.28	78.80	73.50
Grass	0.00	50.36	18.98	1.53	17.55	89.99	91.58	93.56
Water	0.00	0.00	79.96	0.00	0.00	0.00	0.00	99.97
OA	46.73	59.34	58.90	64.91	60.62	67.73	72.84	86.81
Kappa	0.41	0.54	0.53	0.61	0.55	0.64	0.69	0.85
Purity	0.65	0.61	0.66	0.72	0.72	0.77	0.79	0.86
Entropy	1.21	1.58	1.35	1.12	0.96	0.86	0.83	0.73

Table 5. Performance evaluation of RS-2 Wuhan classification results. The numbers in bold denote the best unsupervised classification performance.

Land Cover	K-Means	Wishart	ASC-S	DAC	DCCM	DCCM+IIC	RDDMI	RF
Water	97.37	88.10	81.30	90.25	97.19	97.46	92.26	94.01
Building	52.32	67.89	70.30	74.25	58.15	70.16	88.49	88.34
Forest	19.63	74.02	80.90	80.37	75.01	64.26	75.83	81.44
OA	52.31	74.55	76.26	79.93	72.63	74.61	85.31	87.45
Kappa	0.30	0.60	0.63	0.69	0.59	0.62	0.77	0.80
Purity	0.63	0.75	0.76	0.80	0.73	0.75	0.85	0.87
Entropy	1.10	0.88	0.85	0.76	0.92	0.91	0.68	0.59

Table 6. The semi-supervised classification accuracies of GF-3 Wuhan (%).

Number of Samples	Input	Water	Building	Farmland	Vegetation1	Vegetation2	OA
100	$T$	99.79	80.12	90.35	92.70	90.12	91.58
	FC $_{1}$	99.32	79.22	94.96	95.27	90.81	93.82
	FC $_{2}$	99.68	75.17	93.56	95.44	85.15	91.81
1000	$T$	99.94	88.45	91.64	97.39	92.39	93.80
	FC $_{1}$	99.53	81.82	96.51	95.91	90.87	94.77
	FC $_{2}$	99.48	81.88	95.61	94.15	88.24	93.62
25,000	$T$	99.95	98.67	95.60	99.42	96.18	97.16
	FC $_{1}$	99.73	95.94	96.84	98.58	96.48	97.42
	FC $_{2}$	99.60	87.53	95.54	96.28	90.76	94.78

Table 7. The semi-supervised classification accuracies of RS-2 Flevoland (%).

Number of Samples	Input	Water	Forest	Farmland	Building	OA
100	$T$	97.12	67.32	59.69	52.74	66.98
	FC $_{1}$	90.08	88.13	89.58	88.81	89.08
	FC $_{2}$	92.18	83.16	87.18	90.45	87.57
1000	$T$	95.25	81.06	83.21	73.35	82.66
	FC $_{1}$	91.24	87.67	90.01	92.84	90.16
	FC $_{2}$	90.83	83.93	88.92	93.54	88.81
25,000	$T$	95.85	87.77	85.60	82.39	88.10
	FC $_{1}$	94.03	90.89	91.63	95.12	92.58
	FC $_{2}$	91.40	85.60	89.50	93.14	89.48

Table 8. The semi-supervised classification accuracies of AIRSAR Flevoland (%).

Land Cover	110			1100			25,000
Land Cover	T	FC $_{1}$	FC $_{2}$	T	FC $_{1}$	FC $_{2}$	T	FC $_{1}$	FC $_{2}$
Stem bean	97.95	95.53	94.75	91.79	99.09	96.52	96.69	99.79	99.42
Forest	48.13	94.51	96.36	80.64	97.10	95.50	90.18	99.15	98.01
Potatoes	43.45	86.95	77.01	77.12	91.78	84.54	89.62	97.00	91.77
Lucerne	67.13	90.82	62.98	74.90	96.35	90.61	89.17	99.07	96.70
Wheat	26.81	81.12	83.59	62.74	94.77	92.01	77.94	96.73	95.07
Bare soil	95.32	85.96	79.88	98.60	94.93	88.25	95.57	99.72	97.60
Beet	49.14	80.01	61.39	82.42	88.95	86.32	89.00	96.72	92.75
Rape seed	76.34	85.40	70.34	88.01	90.34	88.88	90.38	97.73	93.12
Peas	20.20	60.63	52.91	63.15	85.52	67.82	73.50	97.11	88.60
Grass	29.62	86.56	89.60	78.45	96.97	92.31	93.56	99.81	98.04
Water	93.32	95.37	86.36	97.88	99.82	95.58	99.97	99.97	99.59
OA	49.44	84.52	77.81	76.58	93.55	88.84	86.81	97.91	94.80

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, L.; Xu, X.; Gui, R.; Yang, R.; Pu, F. Learning Rotation Domain Deep Mutual Information Using Convolutional LSTM for Unsupervised PolSAR Image Classification. Remote Sens. 2020, 12, 4075. https://doi.org/10.3390/rs12244075

AMA Style

Wang L, Xu X, Gui R, Yang R, Pu F. Learning Rotation Domain Deep Mutual Information Using Convolutional LSTM for Unsupervised PolSAR Image Classification. Remote Sensing. 2020; 12(24):4075. https://doi.org/10.3390/rs12244075

Chicago/Turabian Style

Wang, Lei, Xin Xu, Rong Gui, Rui Yang, and Fangling Pu. 2020. "Learning Rotation Domain Deep Mutual Information Using Convolutional LSTM for Unsupervised PolSAR Image Classification" Remote Sensing 12, no. 24: 4075. https://doi.org/10.3390/rs12244075

APA Style

Wang, L., Xu, X., Gui, R., Yang, R., & Pu, F. (2020). Learning Rotation Domain Deep Mutual Information Using Convolutional LSTM for Unsupervised PolSAR Image Classification. Remote Sensing, 12(24), 4075. https://doi.org/10.3390/rs12244075

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Learning Rotation Domain Deep Mutual Information Using Convolutional LSTM for Unsupervised PolSAR Image Classification

Abstract

1. Introduction

2. Methods

2.1. Input Polarimetric Features

2.2. Network Architecture

2.3. Loss Functions

2.3.1. IIC Mutual Information

2.3.2. Pseudo Graph and Pseudo Label Supervision Loss

2.3.3. Triplet Mutual Information

2.4. Model Optimization

3. Datasets

3.1. Gaofen-3 Wuhan Dataset

3.2. RADARSAT-2 Flevoland Dataset

3.3. AIRSAR Flevoland Dataset

3.4. RADARSAT-2 Wuhan Dataset

4. Experiments

4.1. Results of GF-3 Wuhan

4.2. Results of RS-2 Flevoland

4.3. Results of AIRSAR Flevoland

4.4. Results of RS-2 Wuhan

5. Discussion

5.1. Semi-Supervised Classification

5.2. Training Accuracy Trend

6. Conclusions

Author Contributions

Funding

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI