2.1. Multisource Data Fusion via Polarization Extension
Multispectral data and SAR data provide complementary features. The fusion of SAR and multispectral data contributes to a better visual perception of objects and compensates for missing spatial information. Intensity-hue-saturation (IHS) and PCA methods are often used to merge multisensor data [19]. Chavez investigated the feasibility of three methods for using panchromatic data to substitute the spatial features of multispectral data, both statistically and visually [20]; Chandrakanth demonstrated the feasibility of fusing SAR and multispectral data [21]. A basic assumption underlying these methods is that the SAR amplitude is closely related to the intensity and the principal component of multispectral images, with high correlation coefficients. Therefore, SAR data can replace either of these two images when transforming the data back into the original image space. Inspired by this, we consider the converse assumption: the principal component of multispectral images can be regarded as PolSAR data from the point of view of intensity, which yields simulated SAR data in a certain polarization mode. The dual PolSAR data can then be extended to construct synthetic quad PolSAR data.
For quad PolSAR data, each pixel is represented by a 2 × 2 complex scattering matrix as follows:

$$
\mathbf{S} = \begin{bmatrix} S_{HH} & S_{HV} \\ S_{VH} & S_{VV} \end{bmatrix},
\tag{1}
$$

where $S_{HV}$ denotes the scattering factor of horizontal transmitting and vertical receiving polarization, and the others have similar definitions. Here we demonstrate the fusion of multispectral data and dual PolSAR data generated in the VH and VV polarization modes. Note that the reciprocity condition $S_{HV} = S_{VH}$ is commonly assumed for quad PolSAR data, so we only need to construct the scattering factor $S_{HH}$
to form the quad PolSAR data. First, we register the images collected by the different sensors in advance. The overall fusion process is illustrated in Figure 1. On the one hand, we extract the principal component of the multispectral data using PCA. The extracted principal component represents the spatial information, which is used to generate the amplitude of $S_{HH}$. On the other hand, the phase of $S_{HH}$ is simulated by using the phase of $S_{HV}$ instead, because the two channels share the same horizontal polarization transmitting antenna. Moreover, most of the distinguishable features for crop classification (see Table 1 below) are irrelevant to the phase of the scattering factor, and our experimental results show that the crop classification performance is insensitive to the phase of $S_{HH}$. Finally, the simulated $S_{HH}$ is incorporated into the dual polarization data $S_{VH}$ and $S_{VV}$ to form the synthetic quad PolSAR data, which enables us to extract higher-dimensional features of the target through various polarimetric decomposition schemes. The high-dimensional features extracted from the synthetic quad PolSAR data provide complementary information to the multispectral data for accurate classification applications.
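The polarization extension step can be summarized in a short NumPy sketch. The function below is illustrative only: the name `extend_to_quad_pol`, the rescaling of the principal component to the dynamic range of $|S_{VV}|$, and the use of scikit-learn's PCA are our assumptions, not the exact implementation behind Figure 1.

```python
import numpy as np
from sklearn.decomposition import PCA

def extend_to_quad_pol(ms_bands, s_vh, s_vv):
    """Build synthetic quad PolSAR data from multispectral bands and
    dual-pol (VH, VV) complex SAR channels, all co-registered.

    ms_bands   : real array, shape (H, W, B), multispectral bands
    s_vh, s_vv : complex arrays, shape (H, W), measured dual-pol channels
    Returns the synthetic scattering matrices, shape (H, W, 2, 2).
    """
    H, W, B = ms_bands.shape
    # Principal component of the multispectral data carries the spatial detail.
    pc1 = PCA(n_components=1).fit_transform(
        ms_bands.reshape(-1, B)).reshape(H, W)
    # Rescale the principal component to the dynamic range of |S_VV|
    # (an illustrative normalization choice, not specified in the paper).
    amp = np.interp(pc1, (pc1.min(), pc1.max()),
                    (np.abs(s_vv).min(), np.abs(s_vv).max()))
    # Simulated S_HH: amplitude from the PCA component, phase borrowed from
    # the cross-pol channel (same H transmitting antenna; S_HV = S_VH).
    s_hh = amp * np.exp(1j * np.angle(s_vh))
    S = np.empty((H, W, 2, 2), dtype=complex)
    S[..., 0, 0] = s_hh
    S[..., 0, 1] = s_vh   # S_HV = S_VH by reciprocity
    S[..., 1, 0] = s_vh
    S[..., 1, 1] = s_vv
    return S
```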
2.3. Full Tensor Decomposition Network
(1) Tucker decomposition-based feature extraction layer: When using a traditional neural network to classify high-dimensional crop data, it is often necessary to perform dimensionality reduction to avoid the curse of dimensionality. Moreover, the features extracted by such a data compressor may not be suitable for the classification model, because the compressor and the classification model are trained separately. We propose a Tucker decomposition-based feature extraction (TDFE) layer to extract hidden information from high-dimensional data. The TDFE layer can be used as a hidden layer of a tensor network, where it performs data compression or feature extraction for further processing.
The TDFE layer, different from the fully connected layer, uses tensor decomposition to realize the forward propagation. For a 3-way tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, a compressed feature tensor $\mathcal{Y} \in \mathbb{R}^{R_1 \times R_2 \times R_3}$ can be obtained by transforming $\mathcal{X}$ as follows:

$$
\mathcal{Y} = \sigma\left(\mathcal{X} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}\right),
\tag{2}
$$

where $U^{(n)} \in \mathbb{R}^{R_n \times I_n}$ ($n = 1, 2, 3$) are factor matrices, $\times_1$ performs the operation that multiplies each column fiber of $\mathcal{X}$ with $U^{(1)}$, $\times_2$ performs the operation that multiplies each row fiber of $\mathcal{X}$ with $U^{(2)}$, $\times_3$ performs the operation that multiplies each tube fiber of $\mathcal{X}$ with $U^{(3)}$, $\mathcal{Z} = \mathcal{X} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}$ is the multi-linear transformed tensor, and $\sigma(\cdot)$ is the activation function. Note that the outcome $\mathcal{Y}$ is independent of the calculation order of the mode products; each component $y_{pqr}$ in tensor $\mathcal{Y}$ is calculated by

$$
y_{pqr} = \sigma\left(\sum_{i=1}^{I_1}\sum_{j=1}^{I_2}\sum_{k=1}^{I_3} x_{ijk}\, u^{(1)}_{pi}\, u^{(2)}_{qj}\, u^{(3)}_{rk}\right),
\tag{3}
$$

where $u^{(1)}_{pi}$, $u^{(2)}_{qj}$ and $u^{(3)}_{rk}$ are the corresponding entries in matrices $U^{(1)}$, $U^{(2)}$ and $U^{(3)}$. The forward propagation of the TDFE layer can be regarded as the inverse process of Tucker decomposition. The TDFE layer allows us to decompose the input tensor without destroying the coupling structure between its modes.
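The mode products in Equations (2) and (3) reduce to a single tensor contraction. A minimal NumPy sketch follows; the shapes and the tanh activation are illustrative assumptions.

```python
import numpy as np

def tdfe_forward(X, U1, U2, U3, sigma=np.tanh):
    """TDFE layer: Y = sigma(X x1 U1 x2 U2 x3 U3), Eqs. (2)/(3).

    X  : (I1, I2, I3) input tensor
    U1 : (R1, I1), U2 : (R2, I2), U3 : (R3, I3) factor matrices
    Returns Z (pre-activation) and Y = sigma(Z), both (R1, R2, R3).
    """
    # One contraction realizes all three mode products; the result is
    # independent of the order in which the modes are contracted.
    Z = np.einsum('ijk,pi,qj,rk->pqr', X, U1, U2, U3)
    return Z, sigma(Z)

# Toy usage: compress a 12 x 12 x 12 patch to a 4 x 4 x 4 feature tensor.
rng = np.random.default_rng(0)
X = rng.standard_normal((12, 12, 12))
U1, U2, U3 = (rng.standard_normal((4, n)) * 0.1 for n in (12, 12, 12))
Z, Y = tdfe_forward(X, U1, U2, U3)
print(Y.shape)  # (4, 4, 4)
```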
The factor matrices $U^{(1)}$, $U^{(2)}$ and $U^{(3)}$ are learned end-to-end by error backpropagation. When the TDFE layer is regarded as a hidden layer of a tensor neural network, we define the neuron error tensor $\boldsymbol{\delta}$ of the TDFE layer as $\boldsymbol{\delta} = \partial L / \partial \mathcal{Z}$, where $L$ represents the loss function of the network. The update of each entry $u^{(1)}_{pi}$ is derived by finding the gradient

$$
\frac{\partial L}{\partial u^{(1)}_{pi}} = \sum_{q=1}^{R_2}\sum_{r=1}^{R_3} \delta_{pqr} \sum_{j=1}^{I_2}\sum_{k=1}^{I_3} x_{ijk}\, u^{(2)}_{qj}\, u^{(3)}_{rk}.
\tag{4}
$$

The updates of $U^{(2)}$ and $U^{(3)}$ are similar to that of $U^{(1)}$.
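Equation (4) likewise reduces to two contractions against the neuron error tensor. A sketch under the same assumed shapes:

```python
import numpy as np

def tdfe_grad_U1(X, U2, U3, delta):
    """Gradient of the loss w.r.t. the entries of U1, Eq. (4).

    X     : (I1, I2, I3) layer input
    U2    : (R2, I2), U3 : (R3, I3) remaining factor matrices
    delta : (R1, R2, R3) neuron error tensor dL/dZ
    Returns dL/dU1 with shape (R1, I1).
    """
    # Inner sums of Eq. (4): contract X with U2 and U3 over modes 2 and 3.
    T = np.einsum('ijk,qj,rk->iqr', X, U2, U3)   # shape (I1, R2, R3)
    # Outer sums of Eq. (4): contract the neuron errors over q and r.
    return np.einsum('pqr,iqr->pi', delta, T)    # shape (R1, I1)

# The gradients for U2 and U3 follow by permuting the roles of the modes.
```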
(2) Tucker decomposition-based classification layer: A flatten layer is usually necessary in classical networks to vectorize the features extracted by the hidden layers when classification or regression is performed. However, the structural information of the high-dimensional feature tensor is discarded by the flatten layer. Instead, we propose to use a higher-order weight tensor to project the feature tensor onto a class vector without discarding the multimodal structure. The higher-order weight tensor usually contains a large number of parameters, whose update would lead to extremely high computational complexity. Here we propose a Tucker decomposition-based classification (TDC) layer, which uses a low-rank representation to replace the original weight tensor.
Figure 2 illustrates the structure of the TDC layer. Assuming that the 3-way feature tensor $\mathcal{Y} \in \mathbb{R}^{R_1 \times R_2 \times R_3}$ is fed to the TDC layer, an output vector $\mathbf{o} \in \mathbb{R}^{J}$ can be generated by a 4-way weight tensor $\mathcal{W} \in \mathbb{R}^{R_1 \times R_2 \times R_3 \times J}$, where $J$ is the number of classes and each component $o_j$ is calculated by the inner product of $\mathcal{Y}$ and the slices of $\mathcal{W}$:

$$
o_j = \left\langle \mathcal{Y},\, \mathcal{W}_{:,:,:,j} \right\rangle,
\tag{5}
$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product of two tensors, and $\mathcal{W}_{:,:,:,j}$ is the $j$-th slice of $\mathcal{W}$ along the fourth mode. In order to reduce the parameters of the TDC layer, the weight tensor $\mathcal{W}$ is decomposed and represented by a core tensor $\mathcal{G} \in \mathbb{R}^{K_1 \times K_2 \times K_3 \times K_4}$ and factor matrices $V^{(1)} \in \mathbb{R}^{R_1 \times K_1}$, $V^{(2)} \in \mathbb{R}^{R_2 \times K_2}$, $V^{(3)} \in \mathbb{R}^{R_3 \times K_3}$ and $V^{(4)} \in \mathbb{R}^{J \times K_4}$:

$$
\mathcal{W} = \mathcal{G} \times_1 V^{(1)} \times_2 V^{(2)} \times_3 V^{(3)} \times_4 V^{(4)}.
\tag{6}
$$

Then substituting (6) into (5) leads to

$$
o_j = \left\langle \tilde{\mathcal{Y}},\, \tilde{\mathcal{G}}_j \right\rangle, \qquad
\tilde{\mathcal{Y}} = \mathcal{Y} \times_1 {V^{(1)}}^{\top} \times_2 {V^{(2)}}^{\top} \times_3 {V^{(3)}}^{\top}, \qquad
\tilde{\mathcal{G}}_j = \sum_{k_4=1}^{K_4} v^{(4)}_{j k_4}\, \mathcal{G}_{:,:,:,k_4},
\tag{7}
$$

where $\tilde{\mathcal{Y}}$ and $\tilde{\mathcal{G}}_j$ are smaller 3-way tensors of dimension $K_1 \times K_2 \times K_3$, since each $K_n$ can always be chosen far less than $R_n$. Note that the parameters of the TDC layer are converted to the smaller core tensor $\mathcal{G}$ and the factor matrices $V^{(1)}, \ldots, V^{(4)}$. Therefore, the number of parameters of the network can be greatly reduced.
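A sketch of the low-rank forward pass of Equation (7), together with an illustrative parameter count contrasting the full weight tensor with its Tucker factors (all sizes below are assumptions):

```python
import numpy as np

def tdc_forward(Y, G, V1, V2, V3, V4):
    """TDC layer: o_j = <Y, W[:,:,:,j]> with W in Tucker form, Eqs. (5)-(7).

    Y  : (R1, R2, R3) feature tensor
    G  : (K1, K2, K3, K4) core tensor
    V1 : (R1, K1), V2 : (R2, K2), V3 : (R3, K3), V4 : (J, K4)
    Returns the class score vector o of length J.
    """
    # Project the feature tensor into the small core space: Y~ of Eq. (7).
    Yt = np.einsum('pqr,pa,qb,rc->abc', Y, V1, V2, V3)
    # Contract with the core tensor and the class factor matrix.
    return np.einsum('abc,abcd,jd->j', Yt, G, V4)

# Parameter count: full weight tensor vs. its Tucker factors.
R, J, K = 16, 8, 4
full_params = R**3 * J                       # 32768 entries in W
tucker_params = K**4 + 3 * R * K + J * K     # 480 entries in G, V1..V4
print(full_params, tucker_params)
```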
When the TDC layer is used as the classification layer of a tensor neural network, the parameters of the TDC layer are learned end-to-end by error backpropagation. Let us define the neuron error vector $\boldsymbol{\delta}$ of the TDC layer as $\boldsymbol{\delta} = \partial L / \partial \mathbf{o}$; then the updates of $\mathcal{G}$ and $V^{(4)}$ are derived by finding the gradients

$$
\frac{\partial L}{\partial g_{k_1 k_2 k_3 k_4}} = \tilde{y}_{k_1 k_2 k_3} \sum_{j=1}^{J} \delta_j\, v^{(4)}_{j k_4},
\tag{8}
$$

$$
\frac{\partial L}{\partial v^{(4)}_{j k_4}} = \delta_j \left\langle \tilde{\mathcal{Y}},\, \mathcal{G}_{:,:,:,k_4} \right\rangle.
\tag{9}
$$

The update of each entry of $V^{(1)}$, $V^{(2)}$ and $V^{(3)}$ is similar to that of $V^{(4)}$.
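Equations (8) and (9) in the same contraction style (a sketch; `delta` denotes the neuron error vector of the TDC layer and `Yt` the projected tensor $\tilde{\mathcal{Y}}$ from Equation (7)):

```python
import numpy as np

def tdc_grads(Yt, G, V4, delta):
    """Gradients for the TDC parameters, Eqs. (8) and (9).

    Yt    : (K1, K2, K3) projected feature tensor from Eq. (7)
    G     : (K1, K2, K3, K4) core tensor
    V4    : (J, K4) class factor matrix
    delta : (J,) neuron error vector dL/do
    """
    # Eq. (8): dL/dG[a,b,c,d] = Yt[a,b,c] * sum_j delta[j] * V4[j,d]
    dG = np.einsum('abc,d->abcd', Yt, V4.T @ delta)
    # Eq. (9): dL/dV4[j,d] = delta[j] * <Yt, G[:,:,:,d]>
    dV4 = np.outer(delta, np.einsum('abc,abcd->d', Yt, G))
    return dG, dV4
```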
(3) FTDN architecture and learning: Based on the TDFE layer and the TDC layer, we propose a full tensor decomposition network (FTDN) for high-dimensional data classification. The architecture of the FTDN is shown in Figure 3, which contains two TDFE layers and one TDC layer. The input tensor sample $\mathcal{X} \in \mathbb{R}^{F \times w \times w}$ that corresponds to a certain pixel is formed by collecting all features of its local neighborhood pixels, where $F$ is the number of features and $w$ is the size of the neighborhood. For TDFE layer-1, the feedforward computation is straightforward, and its output feature tensor is denoted by $\mathcal{Y}^{(1)}$, where the superscript in the bracket denotes the layer index. For TDFE layer-2, the input tensor comes from the output of the previous layer, and the output feature tensor is denoted by $\mathcal{Y}^{(2)}$. For the TDC layer, the output vector $\mathbf{o}$ is transformed by a softmax layer to calculate the class posteriors $\mathbf{p} = \mathrm{softmax}(\mathbf{o})$, where $\mathrm{softmax}(\cdot)$ represents the softmax function.
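Putting the pieces together, the FTDN forward pass is the composition of the two TDFE layers, the TDC layer and a softmax. The sketch below reuses the `tdfe_forward` and `tdc_forward` functions from the earlier sketches; bundling the parameters in a dict is an illustrative choice.

```python
import numpy as np

def softmax(o):
    """Numerically stable softmax."""
    e = np.exp(o - o.max())
    return e / e.sum()

def ftdn_forward(X, params):
    """FTDN forward pass: TDFE layer-1 -> TDFE layer-2 -> TDC -> softmax.

    `params` is a dict bundling the factor matrices of the two TDFE layers
    and the (core tensor, factor matrices) of the TDC layer.
    """
    _, Y1 = tdfe_forward(X, *params['tdfe1'])   # Y^(1)
    _, Y2 = tdfe_forward(Y1, *params['tdfe2'])  # Y^(2)
    o = tdc_forward(Y2, *params['tdc'])         # class scores o
    return softmax(o)                           # class posteriors p
```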
We now derive the error backpropagation-based learning of the FTDN. Stochastic gradient descent learning for the FTDN is straightforward according to Equations (4), (8) and (9), where the remaining issue is to compute the neuron errors of each layer. For the TDC layer, the neuron error vector $\boldsymbol{\delta}^{(3)}$ depends on the specific loss function $L$, which can be chosen as the typical cross-entropy or mean square error between the network output and the target label. Taking the cross-entropy loss, for example, the neuron error vector reads $\boldsymbol{\delta}^{(3)} = \mathbf{p} - \mathbf{t}$, where $\mathbf{t}$ denotes the corresponding label vector. For TDFE layer-2, the entries $\delta^{(2)}_{pqr}$ of the neuron error tensor $\boldsymbol{\delta}^{(2)}$ can be computed from $\boldsymbol{\delta}^{(3)}$ via the error backpropagation technique:

$$
\delta^{(2)}_{pqr} = \sigma'\left(z^{(2)}_{pqr}\right) \sum_{j=1}^{J} \delta^{(3)}_{j}\, w_{pqrj},
\tag{10}
$$

where $w_{pqrj}$ are the entries of the weight tensor $\mathcal{W}$ in (6) and $\mathcal{Z}^{(2)}$ is the multi-linear transformed tensor of TDFE layer-2. Similarly, the error backpropagation from $\boldsymbol{\delta}^{(2)}$ to $\boldsymbol{\delta}^{(1)}$ is represented as

$$
\delta^{(1)}_{pqr} = \sigma'\left(z^{(1)}_{pqr}\right) \sum_{p'}\sum_{q'}\sum_{r'} \delta^{(2)}_{p'q'r'}\, u^{(1)}_{p'p}\, u^{(2)}_{q'q}\, u^{(3)}_{r'r},
\tag{11}
$$

where $U^{(1)}$, $U^{(2)}$ and $U^{(3)}$ here denote the factor matrices of TDFE layer-2.
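The error recursions of Equations (10) and (11) can be sketched as follows. For clarity, the full weight tensor $\mathcal{W}$ is assumed to be rebuilt from its Tucker factors via Equation (6), e.g. `W = np.einsum('abcd,pa,qb,rc,jd->pqrj', G, V1, V2, V3, V4)`, which trades the memory savings for readability; `sigma_prime` is the derivative of the activation function.

```python
import numpy as np

def backprop_errors(delta3, W, Z2, U1, U2, U3, Z1, sigma_prime):
    """Propagate neuron errors back through the TDC and TDFE layers.

    delta3 : (J,) neuron error vector of the TDC layer (e.g. p - t)
    W      : (R1, R2, R3, J) full weight tensor of the TDC layer
    Z2, Z1 : pre-activation tensors of TDFE layer-2 and layer-1
    U1, U2, U3 : factor matrices of TDFE layer-2
    """
    # Eq. (10): delta2_pqr = sigma'(z2_pqr) * sum_j delta3_j * W[p,q,r,j]
    delta2 = sigma_prime(Z2) * np.einsum('pqrj,j->pqr', W, delta3)
    # Eq. (11): contract delta2 with the factor matrices of TDFE layer-2.
    delta1 = sigma_prime(Z1) * np.einsum('PQR,Pp,Qq,Rr->pqr',
                                         delta2, U1, U2, U3)
    return delta2, delta1

# For the tanh activation used in the earlier sketches:
# sigma_prime = lambda z: 1.0 - np.tanh(z) ** 2
```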