1. Introduction
Pine wood nematode disease is one of the most dangerous biological threats to forests worldwide and is devastating to pine tree species. Because of its high infectivity and high fatality rate, it is also called the “cancer” of pine trees. Pine wood nematodes originated in North America and have spread to other areas through the timber trade. The disease is currently prevalent in the United States, Canada, and Mexico in North America; China, Japan, South Korea, and North Korea in East Asia; and Portugal in Europe, as well as in other countries. Japan has experienced the worst losses due to pine wood nematode disease. The disease was reported in Nagasaki Prefecture in Japan in 1905. In the following decades, it spread to most parts of the country and caused serious economic losses [1]. In 1982, pine wood nematode disease was first reported in China in Nanjing, Jiangsu Province, and in the following decades the disease spread to the surrounding areas. At present, the disease has spread to high-altitude areas, such as the Qinling Mountains, where it seriously threatens more than 300,000 km² of pine forests [2]. Pine wood nematode disease has caused great losses to China’s forestry ecology and economy. In the 35 years between 1982 and 2017, the disease killed more than 50 million pine trees, and clear and selective cutting of pine forests for epidemic control was conducted over an area of more than 4667 km². The related economic losses amount to tens of billions of dollars, and the epidemic has caused massive damage to China’s forest resources and ecological environment [3].
The use of remote sensing technology to monitor forest pests offers the advantages of real-time dynamic monitoring, coverage of large areas, limited susceptibility to environmental interference, and short monitoring cycles. Remote sensing monitoring of pine wood nematode disease has recently made some progress. Early work was mainly based on spectral histogram analysis of multispectral images. Kim et al. combined the normalized difference vegetation index (NDVI) with spectral histogram analysis of IKONOS images to identify areas affected by pine wood nematode disease [4]. Subsequently, based on on-site spectral observations, researchers identified the characteristics of typical bands (green, red, and near-infrared bands) and constructed spectral characteristic indicators (red-edge parameters, vegetation indices, and time series characteristics). Determining the relationship between observed spectral characteristics and plant physiological characteristics, such as chlorophyll content, transpiration rate, and water content, can help detect pine wood nematode disease [5,6,7]. Huang et al. analyzed the hyperspectral time series characteristics and sensitive characteristics of healthy and susceptible plants and reported that the time series of plants infected with pine wood nematode disease showed large spectral differences, including a decrease in red-edge spectral reflectance and a red-edge blueshift [6]. Multiple spectral feature values in the near-infrared band and at the red and blue edges are significant hyperspectral features that indicate the presence of pine wood nematode disease [6]. Xu et al. collected the spectral characteristics of lodgepole and Masson pines at different susceptibility levels and found that the reflectance spectrum curves and spectral characteristic parameters of different bands in hyperspectral images can be used to analyze the pathogenic mechanism at different stages, and that a model relating spectral characteristics to chlorophyll can provide a reference for remote sensing monitoring of pine wood nematode disease [7]. Existing studies generally rely only on the spectral characteristics of images to identify pine wood nematode disease, and few studies have attempted to apply newer analysis techniques to high-resolution satellite remote sensing images for this purpose.
Deep convolutional neural networks (D-CNNs), which have efficient and accurate image recognition capabilities, have been widely used in computer vision and other fields. In recent years, D-CNNs have been introduced into the field of remote sensing and used in remote sensing big data analysis [8,9]. However, current remote sensing processing methods based on D-CNNs have most often been applied to land use classification and feature target recognition [10,11,12], and only a few studies have addressed forest pest monitoring and control. Ha et al. used deep learning to process images captured by unmanned aerial vehicles (UAVs) at low altitudes and to identify infected radish plants; the CNN obtained an accuracy of 93.3% [13]. Rançon et al. obtained and labelled pictures of diseased and healthy vine plants, and an overall accuracy of 91% was obtained using deep features extracted from the MobileNet network trained on the ImageNet database [14]. These studies show that deep learning can achieve higher accuracy than traditional machine learning methods. In addition, deep learning for pest detection can rely on pretrained networks without the need to redesign the network structure. In the monitoring of forest diseases and insect pests based on aerial images, Sylvain et al. used a D-CNN to identify the health status of trees, and the accuracy reached 94% [15]. Safonova et al. used D-CNNs to detect Siberian fir trees at different susceptibility stages based on UAV images and achieved an accuracy of up to 98.77% [16]. Qiao et al. and Deng et al. used deep learning methods to classify and detect pine wood nematode disease based on UAV images and achieved high accuracy [17,18]. Most current studies of forest diseases and insect pests that use remote sensing technology are based on images obtained by UAVs or on aerial imagery. UAV images have higher resolution and richer detail than satellite images, which makes it easier to classify and detect ground objects accurately. Satellite remote sensing images, which offer large coverage and low cost but relatively coarse spatial resolution, are not fully utilized.
At present, research on deep learning for satellite remote sensing recognition of pine wood nematode disease is lacking, especially research that focuses on suitable deep learning networks and manually labelled samples. This study explores a deep learning model that is suitable for remote sensing image classification of pine wood nematode disease and uses China’s Gaofen-1 (GF-1) and Gaofen-2 (GF-2) images of pine wood nematode disease occurrence areas to construct a D-CNN sample dataset. Based on these samples, five widely used D-CNN models (AlexNet, GoogLeNet, SqueezeNet, ResNet-18, and VGG16) are selected for transfer learning, and the model with the best transfer learning effect is chosen for hyperparameter and structural optimization. The resulting D-CNN model can accurately identify pine wood nematode disease occurrence areas in satellite remote sensing images and provides technical support for the monitoring, prevention, and control of the disease.
2. Materials and Methods
2.1. Study Area
The spatial range of the remote sensing images used in this study is 41.6–42.2° N, 123.5–124.8° E (Figure 1), covering Shenyang City District, Tieling City District, Fushun City District, Kaiyuan City, Tieling County, Fushun County, Xinbin Manchu Autonomous County, and Qingyuan Manchu Autonomous County in Liaoning Province, China. The study area is rich in vegetation resources and is dominated by mountain forests, such as those of the Daxi, Tiebei, and Nantianmen Mountains. The genus Pinus is widely distributed and present in large numbers. Among the species in the area, Pinus densiflora Sieb. et Zucc., Pinus tabuliformis Carr., Pinus thunbergii Parl., and other pine tree species are hosts to pine wood nematodes. The study area has a northern temperate seasonal continental climate with cold, dry winters and warm, rainy summers, and its altitude ranges from 5.3 m to 1346.7 m. The average annual temperature is approximately 6–10 °C. The maximum temperature in August can reach 38 °C, and the minimum temperature in January can be below −35 °C. The average daily minimum temperature is above 0 °C beginning in April and below 0 °C beginning in November. An average of 600 to 850 mm of rainfall occurs yearly, and there are approximately 2500 h of sunshine annually, with longer sunshine hours in May and June and shorter sunshine hours in November and December. The annual average wind speed is 4.5 m per second. Relevant studies have shown that pine wood nematodes have a strong ability to adapt to temperatures above 0 °C [2]. The range of latitude within the study area is also suitable for pine wood nematodes, and the study area provides suitable breeding conditions for them [2]. The study area is a key area for the detection, prevention, and control of pine wood nematode disease. In recent years, the incidence of pine wood nematode disease in Fushun, Dandong, and other parts of Liaoning Province has been expanding, the degree of damage has been increasing, and massive economic losses have occurred [19].
2.2. High-Resolution Remote Sensing Image Data
This study used 99 GF-1 and 50 GF-2 images of product level 1A. The images were obtained from May to October of each year from 2013 to 2017. Pine wood nematode disease has been reported to occur in many cities and counties in Liaoning. The images from 2015–2017 provided information on the susceptible area, and the images from 2013–2014 provided reference information for normal forestland. High-spatial-resolution satellite images with a wide imaging range and short revisit period have advantages in forestry remote sensing applications. Such images have been widely used in forest resource monitoring and forest information extraction research and can be used effectively to detect dynamic changes in forestland and vegetation cover [20,21]. The GF-1 images were obtained using two panchromatic/multispectral (PMS) cameras with spatial resolutions of 2 m for the panchromatic band and 8 m for the multispectral bands and a swath width greater than 69 km. The multispectral imagery contains 4 bands (blue, green, red, and near-infrared), and the revisit period is only 4 days; the system thus combines high spatial resolution with high temporal resolution and can accurately reflect the spatial texture characteristics of the target. The GF-2 images were obtained using two PMS cameras with spatial resolutions of 1 m for the panchromatic band and 4 m for the multispectral bands and a swath width of 45 km. The multispectral bands are the same as those of the GF-1 system, and the revisit period is 5 days, which further improves the spatial resolution to the submeter level while maintaining excellent temporal resolution. The swath width reaches the highest level among international satellites with submeter resolution.
To build a deep learning model that is suitable for remote sensing image classification of pine wood nematode disease, we followed three main steps: dataset construction, transfer learning, and model optimization (Figure 2).
2.3. Construction of a Manually Annotated Sample Dataset
The sample dataset is the basis for building the D-CNN. The D-CNN iteratively learns a large number of samples and uses that information to adjust the weight parameters of each neuron to achieve the extraction and recognition of multidimensional image features. The sample dataset acts directly on the parameters of the D-CNN, which has a profound impact on the recognition performed by the model.
We constructed the sample dataset in seven steps: image selection, image fusion, band combination, visual interpretation, sample cutting, Jeffries–Matusita distance separability calculation, and sample balancing and augmentation. First, 76 remote sensing images with low cloud cover were selected from among the 149 remote sensing images. To obtain remote sensing images that have high spatial resolution and contain multispectral information as the basis for constructing the samples, the multispectral bands and the panchromatic band of each image were merged using the NNDiffuse pan sharpening method. A suitable combination of bands highlights the spectral characteristics of vegetation disturbed by pine wood nematode disease. A large number of studies have confirmed that the red and green bands are very sensitive to the color and physiological changes caused by pine wood nematode disease. The red–green ratio index (RGRI = R/G) was therefore calculated as one of the discriminant spectra. The near-infrared band is the most sensitive to changes caused by pine wood nematode disease, and the spectral difference between diseased and healthy plants is the largest in this range [22]. The blue band can be used to increase the dimensionality of the spectral features. Therefore, RGRI, the near-infrared band, and the blue band are used as the inputs of the R, G, and B channels, respectively, to synthesize the base image used to label the samples.
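The band combination described above can be sketched in a few lines of NumPy. This is only an illustration, not the processing chain used in the study: the function name `rgri_composite`, the per-band min-max scaling to [0, 1], and the small `eps` guard against division by zero are all assumptions introduced here.

```python
import numpy as np

def rgri_composite(blue, green, red, nir, eps=1e-6):
    """Stack RGRI, NIR and blue into an (H, W, 3) false-color array.

    Channel order follows the text: R <- RGRI, G <- NIR, B <- blue.
    The min-max scaling is an assumption made for display purposes.
    """
    blue, green, red, nir = (
        np.asarray(b, dtype=np.float64) for b in (blue, green, red, nir)
    )
    # Red-green ratio index, RGRI = R / G (eps avoids division by zero).
    rgri = red / np.maximum(green, eps)

    def rescale(band):
        lo, hi = band.min(), band.max()
        if hi == lo:
            return np.zeros_like(band)
        return (band - lo) / (hi - lo)

    return np.dstack([rescale(rgri), rescale(nir), rescale(blue)])
```

Given four pan-sharpened band arrays of equal shape, `rgri_composite(blue, green, red, nir)` returns the three-channel base image on which samples could then be labelled.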
The preprocessed remote sensing images were visually interpreted on the basis of field survey data provided by the Forest and Grassland Pest Control Station of the State Forestry and Grassland Administration of China (http://www.forestpest.org/ (accessed on 20 December 2018)). The spectral, textural, and other characteristics of the images obtained from each area in the corresponding periods were compared and evaluated for features characteristic of pine wood nematode disease, and the visual interpretation characteristics of the disease-affected area were determined (Figure 3). Plants in areas affected by pine wood nematode disease often present dark blue-green/blue-violet/dark green discolored areas, needles that show wilting and a clear granular texture without a large amount of shedding, and a clustered spatial distribution. Healthy forests are mostly characterized by green/light green areas with dense canopies and no obvious texture.
The samples are labelled according to their visual interpretation features. ENVI is used for precise positioning and cutting of the samples. The uniform sample size is 200 × 200 × 3 pixels, and all of the sample images are in TIFF/GeoTIFF format. After classification and labelling of each image, the labelled samples are divided into positive samples and negative samples. Positive samples represent woodlands infected with pine wood nematode disease and provide a direct reference for the identification and classification of research targets. They were generated from images obtained during the period of onset of pine wood nematode disease (2015–2017) in the study area. The negative samples are a collection of uninfected surface features, including healthy forestland, agricultural land, construction land, and water (Figure 3).
To avoid training errors caused by the phenomena of “different bodies with the same spectrum” and “the same bodies with different spectra” among susceptible forestland, healthy forestland, and agricultural land in high-resolution images, this study calculates the Jeffries–Matusita (JM) distance between these three types of samples for each image, using samples from the same image data source, and the result is used for separability testing and sample optimization [23]. To minimize the feature learning bias that may result from the use of imbalanced data, this research uses the label shuffle algorithm proposed by Hikvision as the sample category balancing strategy [24]. The labelled samples are flipped horizontally and vertically to increase the sample size 3-fold and thereby obtain the final sample size. During training and validation, random image flipping is performed to improve the generalizability of the model.
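As an illustration of the separability test, the following sketch computes the JM distance between two classes of pixel spectra. It assumes each class is approximately Gaussian and uses the common form JM = 2(1 − e^(−B)), where B is the Bhattacharyya distance, so that values range from 0 (inseparable) to 2 (fully separable). The function name `jm_distance` and the input layout (rows are pixels, columns are bands) are assumptions made here; the source does not state which implementation or threshold was used.

```python
import numpy as np

def jm_distance(x1, x2):
    """Jeffries-Matusita distance between two classes of spectra.

    x1, x2: arrays of shape (n_pixels, n_bands). Each class is modelled
    as a multivariate Gaussian (an assumption of this sketch).
    """
    x1 = np.asarray(x1, dtype=np.float64)
    x2 = np.asarray(x2, dtype=np.float64)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    c1 = np.atleast_2d(np.cov(x1, rowvar=False))
    c2 = np.atleast_2d(np.cov(x2, rowvar=False))
    c = (c1 + c2) / 2.0
    diff = m1 - m2

    # Bhattacharyya distance: mean-separation term plus covariance term.
    mean_term = diff @ np.linalg.solve(c, diff) / 8.0
    _, logdet = np.linalg.slogdet(c)
    _, logdet1 = np.linalg.slogdet(c1)
    _, logdet2 = np.linalg.slogdet(c2)
    cov_term = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))

    return 2.0 * (1.0 - np.exp(-(mean_term + cov_term)))
```

Class pairs whose JM distance is low are candidates for relabelling or removal during sample optimization.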
According to the network training requirements, the samples are further divided into a training/validation dataset and a test dataset. A total of 3570 samples were used for training and testing. Of these, 3030 samples were used for model training and validation, and 540 samples were used for model testing. Of the 3030 training/validation samples, 2424 samples were used for training, and 606 samples were used for validation.
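The sample counts above are consistent with holding out 540 test samples and then splitting the remaining 3030 samples 80/20 into training and validation sets. A minimal sketch of such a split follows; the random shuffling, the fixed seed, and the function name `split_samples` are assumptions, since the source does not say how samples were assigned to each subset.

```python
import random

def split_samples(sample_ids, n_test=540, val_fraction=0.2, seed=0):
    """Return (train, validation, test) lists of sample identifiers."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle (assumed)
    test, trainval = ids[:n_test], ids[n_test:]
    n_val = round(len(trainval) * val_fraction)
    return trainval[n_val:], trainval[:n_val], test

train, val, test = split_samples(range(3570))
print(len(train), len(val), len(test))
```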
2.4. Transfer Learning of D-CNN
This study implements the transfer learning of CNNs. Five commonly used models pretrained on ImageNet, namely AlexNet [25], GoogLeNet [26], SqueezeNet [27], ResNet-18 [28], and VGG16 [29], are selected for training. These five models exhibit high accuracy in many tasks. We carried out the experiment with MATLAB software in the following training environment: Windows 10 64-bit operating system, 8 GB RAM, and an i7-5500U quad-core processor, with an NVIDIA GeForce 920M used to accelerate the training of the models. The commonly used hyperparameters for transfer learning are set as follows: the batch size is 64, the learning rate is 0.001, and a total of 20 epochs of training are performed. To reduce memory usage during VGG16 model training, training is performed under the conditions of small sample size, small batch size (16), and a high learning rate (0.1). According to (i) the training time of the model, (ii) the classification accuracy on the validation data, (iii) the convergence speed (runtime) of the model, and (iv) the stability of the accuracy and loss after convergence, we compare the effects of these pretrained network models on the transfer learning of the sample dataset and determine the best model for subsequent research.
2.5. Training Parameter Optimization of the Deep Convolution Neural Network
Not only the structure of the D-CNN but also the hyperparameters set in the training model have a direct impact on the transfer learning effect of the network. The initial training parameters that have a decisive effect on the feature learning performance are the batch size and the learning rate.
Increasing the batch size within a reasonable range can improve the efficiency of hardware memory usage, reduce the number of parameter updates (iterations) during each epoch, speed up processing of the same amount of data, improve the accuracy of the stochastic gradient descent direction, and stabilize the model training process. If the batch size exceeds a reasonable range, the full batch learning strategy may, in extreme cases, lead to insufficient hardware memory capacity and slow changes in the direction of the stochastic gradient descent, resulting in slow model training.
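The effect of batch size on the number of parameter updates per epoch can be made concrete with a small calculation. The sketch below uses the 2424 training samples reported for this study and batch sizes commonly tried in practice; whether the last incomplete mini-batch is kept or discarded depends on the training framework, so both conventions are shown, and the `drop_last` flag is a name introduced here for illustration.

```python
import math

def iterations_per_epoch(n_train, batch_size, drop_last=False):
    """Number of parameter updates needed to pass once over n_train samples."""
    if drop_last:
        return n_train // batch_size  # discard the incomplete final batch
    return math.ceil(n_train / batch_size)

for batch_size in (32, 64, 128, 256):
    print(batch_size, iterations_per_epoch(2424, batch_size))
```

Each doubling of the batch size roughly halves the number of updates per epoch, which is the trade-off between update frequency and gradient stability discussed above.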
A learning rate that is too high makes each weight update too large, so the model may overshoot the minimum of the loss function and fail to converge, which limits or even reduces its classification accuracy. In contrast, a learning rate that is too low makes the correction of the weight parameters slow and may leave the model trapped in a local optimum of the loss function instead of reaching the global optimum; this not only reduces the network training speed but also prevents the model from achieving adequate accuracy.
In this study, the batch size is set to 32, 64, 128, and 256, values that are commonly used in previous research and applications, and the transfer training effects of suitable models under these 4 batch size conditions are compared. To compare the transfer training effects of suitable models under different learning rates, the learning rate parameter is set to a constant learning rate series and to a variable learning rate. The constant learning rate series includes 1e−4, 5e−4, 1e−3, 3e−3, and 5e−3, the initial value of the variable learning rate is 1e−3, the drop coefficient is 0.5, and the variable learning rate changes every 5 training epochs.
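The variable learning rate described above is a stepwise decay, which can be written as a one-line function. The sketch assumes 1-based epoch numbering, with the first drop applied at the start of epoch 6; the function and parameter names are illustrative and are not taken from the training software used in the study.

```python
def piecewise_learning_rate(epoch, initial_rate=1e-3, drop_factor=0.5, drop_period=5):
    """Learning rate for a 1-based epoch under stepwise decay."""
    if epoch < 1:
        raise ValueError("epoch numbering starts at 1")
    return initial_rate * drop_factor ** ((epoch - 1) // drop_period)

# Rate at the first epoch of each 5-epoch block in a 20-epoch run.
for epoch in (1, 6, 11, 16):
    print(epoch, piecewise_learning_rate(epoch))
```

Under these assumptions, a 20-epoch run uses rates of 1e−3, 5e−4, 2.5e−4, and 1.25e−4 in successive 5-epoch blocks.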
2.6. Structure Optimization of the Deep Convolution Neural Network
The diversity of the structural design of D-CNNs and the complex interactions between the various layers of the network provide room for improving the network model, making it possible for the existing model to achieve optimal training accuracy and efficiency through structural adjustments. This study adopts the strategy of “macroarchitecture combined with a micromodule for joint tuning and improvement” to improve the best model obtained by transfer learning. The macroarchitecture of the model is improved using two model structure optimization methods: one method is based on a simple bypass connection structure [27], and the other is based on a Slim module structure [30]. The micromodules are adjusted by replacing the activation functions, introducing a batch normalization (BN) layer and a dropout layer, and reducing the network structure. For model optimization, we compare the learning effect of each adjustment strategy using the sample dataset.
The improvement based on the simple bypass connection structure involves adding shortcut connections to the D-CNN that skip one or more layers or one or more modules [27,28]. As the network deepens, shortcut connections can partially solve the network degradation problem and alleviate the vanishing gradient problem during backpropagation.
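The idea of a simple bypass can be sketched as follows: the input of a module is added element-wise to its output, so the module only needs to learn a residual. This is a conceptual NumPy illustration, not the network code used in the study; `with_bypass` and the stand-in `module` callables are hypothetical names.

```python
import numpy as np

def with_bypass(module, x):
    """Return module(x) + x, i.e. an identity shortcut around `module`."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(module(x), dtype=np.float64)
    if y.shape != x.shape:
        # An identity shortcut only works when shapes match; otherwise a
        # projection on the shortcut path would be needed.
        raise ValueError("simple bypass requires matching input/output shapes")
    return y + x
```

Because the shortcut passes the input through unchanged, gradients can also flow back along it without being attenuated by the skipped layers.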
In the improvement based on the Slim module structure, one or more modules in the model are replaced with a Slim module. The Slim module introduces the ideas of group convolution and the singular bottleneck. Group convolution operates on channels: the input channels are divided into multiple groups, and each group is convolved separately, which reduces the number of parameters and calculations. In the singular bottleneck, a nonlinear transformation is retained only once within the bottleneck structure, thereby improving the classification accuracy [30].
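The parameter saving from group convolution follows directly from counting weights: with g groups, each output channel is connected to only 1/g of the input channels. The sketch below counts convolution weights (biases excluded); the layer sizes in the example are arbitrary and are not taken from the Slim module.

```python
def conv_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights in a (grouped) 2-D convolution, excluding biases."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (in_channels // groups) * out_channels * kernel_size * kernel_size

standard = conv_weight_count(64, 64, 3)           # ordinary convolution
grouped = conv_weight_count(64, 64, 3, groups=4)  # four channel groups
print(standard, grouped, standard // grouped)
```

For a fixed layer shape, the weight count and the number of multiply-accumulate operations both shrink by a factor equal to the number of groups.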
The input layer of the proposed network follows that of SqueezeNet, with an input size of 227 × 227 × 3 and zero-center normalization. The output layer is the softmax layer. It performs the classification by calculating, for each feature map, the probability of the corresponding one of the five categories (pine wood nematode disease-affected area, healthy forest, agricultural land, construction land, and water).
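A minimal sketch of the output stage follows, assuming the SqueezeNet convention in which the last convolution produces one feature map per class and global average pooling reduces each map to a single score before the softmax. The class names, the (H, W, 5) array layout, and the function names are assumptions made for illustration.

```python
import numpy as np

CLASSES = (
    "disease-affected area",
    "healthy forest",
    "agricultural land",
    "construction land",
    "water",
)

def softmax(scores):
    """Numerically stable softmax over the last axis."""
    scores = np.asarray(scores, dtype=np.float64)
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def classify(feature_maps):
    """feature_maps: array of shape (H, W, 5), one map per class."""
    scores = np.asarray(feature_maps, dtype=np.float64).mean(axis=(0, 1))
    probabilities = softmax(scores)  # global average pooling, then softmax
    return CLASSES[int(np.argmax(probabilities))], probabilities
```

The predicted label is the class with the largest probability, and the five probabilities sum to one.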
2.7. Evaluation of the Recognition Effect
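This study uses evaluation indicators that are widely used in existing studies [31], namely, overall accuracy (OA), recall (true positive rate, TPR), and false alarm rate (false positive rate, FPR), to evaluate the recognition effect of the improved D-CNN model on the test samples. In addition, considering that pine wood nematode disease-affected areas and healthy forestlands are easily confused, the inter-forestland TPR (TPRF) and the inter-forestland FPR (FPRF) indicators for the two classifications are also calculated. The formulas used to calculate the evaluation indices are as follows:
$$\mathrm{OA} = \frac{\text{number of correctly identified samples}}{\text{total number of samples}}$$
$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$
$$\mathrm{TPRF} = \frac{TP}{TP + FN_{\mathrm{forest}}}, \qquad \mathrm{FPRF} = \frac{FP_{\mathrm{forest}}}{FP_{\mathrm{forest}} + TN_{\mathrm{forest}}}$$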
In the formulas, OA represents the ratio of correctly identified samples to total samples, TP represents the number of correctly identified samples from susceptible areas, FN represents the number of samples from susceptible areas that were incorrectly identified as nonsusceptible areas, FP represents the number of samples from nonsusceptible areas that were incorrectly identified as susceptible areas, and TN represents the number of correctly identified samples from nonsusceptible areas. FNforest represents the number of samples from susceptible areas that were incorrectly identified as healthy forestlands, FPforest represents the number of samples from healthy forestlands that were incorrectly identified as susceptible areas, and TNforest represents the number of correctly identified samples from healthy forestlands. The TPR represents the proportion of samples from pine wood nematode disease-affected areas that are correctly identified by the network model; the FPR represents the proportion of samples from nonsusceptible areas that are incorrectly identified as susceptible. The TPRF and FPRF are analogous to the TPR and FPR, respectively, and mainly reflect the correct identification of positive samples and the misclassification of negative samples between forestlands. In general, the larger the OA, TPR, and TPRF and the smaller the FPR and FPRF, the better the model’s ability to recognize pine wood nematode disease-affected areas.
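The indices can be computed directly from the counts defined above. The sketch below encodes the ratios as they follow from those verbal definitions (for example, TPR = TP/(TP + FN) and FPRF = FPforest/(FPforest + TNforest)); this reading is an assumption, and the counts in the usage example are invented purely to exercise the function and are not results from this study.

```python
def recognition_metrics(tp, fn, fp, tn, fn_forest, fp_forest, tn_forest,
                        n_correct, n_total):
    """Evaluation indices derived from the counts defined in the text."""
    return {
        "OA": n_correct / n_total,                    # overall accuracy
        "TPR": tp / (tp + fn),                        # recall
        "FPR": fp / (fp + tn),                        # false alarm rate
        "TPRF": tp / (tp + fn_forest),                # inter-forestland TPR
        "FPRF": fp_forest / (fp_forest + tn_forest),  # inter-forestland FPR
    }

# Hypothetical counts for illustration only.
metrics = recognition_metrics(tp=90, fn=10, fp=5, tn=395,
                              fn_forest=8, fp_forest=4, tn_forest=96,
                              n_correct=510, n_total=540)
print(metrics)
```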
4. Discussion
This study mainly explored the method and effect of using D-CNN technology to identify pine wood nematode disease-affected areas in high-spatial-resolution satellite remote sensing images. A sample dataset was constructed based on GF remote sensing images of areas in which pine wood nematode disease is present. Among five commonly used CNN models, SqueezeNet was found to be the best model for transfer learning on the sample dataset. The training parameters of SqueezeNet were then optimized, and a batch size of 64 and a learning rate of 1e−4 were found to be suitable. Finally, the strategy of “macroarchitecture combined with micromodule for joint tuning and improvement” was used to optimize the SqueezeNet structure, and the improved model based on the Slim module structure was found to have the best accuracy for identifying pine wood nematode disease occurrence areas. The improved model can be used to identify areas susceptible to pine wood nematode disease and provides an important technical method for the monitoring and control of the disease.
Although some studies have shown that conventional image processing techniques can accurately identify trees infected with pine wood nematode disease [32,33], such identification has two basic requirements. The first is the need for a large amount of data from multiple sources, including ground survey data, forest cover data, satellite remote sensing data, airborne aerial photography data, and other types of data. Second, airborne images with a resolution of 20 cm or higher are needed, and even satellite images with a resolution of 0.5 m cannot meet the requirements for identification of areas affected by the disease. The cost of traditional technology is very high, and this limits its scope of application. A few previous studies [16,17,18,34,35] have applied deep learning techniques to remote sensing images to detect and identify forest pests. In these studies, deep learning technology is mostly applied to UAV remote sensing images to enable the identification, classification, and detection of damaged trees; for this purpose, airborne images of 20 cm or higher resolution are needed, and no studies based on high-resolution satellite remote sensing imagery have been performed. Based on UAV images, Deng et al. used an improved faster region-based convolutional neural network method to detect trees killed by pine wood nematode disease, and the detection accuracy reached approximately 90% [18]. Safonova et al. used a D-CNN based on UAV images to detect fir trees in different susceptible stages, and the detection accuracy of fir in some stages reached 98.77% [16]. UAV images have higher resolution and more richly detailed information than high-resolution satellite remote sensing images and can be used to classify and detect objects more accurately. However, high-resolution satellite remote sensing images have the advantages of large coverage, wide monitoring area, richness of time-series information, and low cost, and they can therefore be applied over large areas. This study, which is based on high-resolution satellite remote sensing images, uses the improved SqueezeNet model based on the Slim module structure to classify the test samples with an accuracy of 94.90%; thus, it can better identify and classify images of areas in which pine wood nematode disease is present than the comparison methods.
In many cases, when we use deep learning technology for classification and recognition tasks, we do not create a new D-CNN model but select existing network models that have strong feature extraction ability, high classification accuracy, and pretraining for transfer learning. Based on the powerful ability of the existing weighting parameters in the pretrained network to extract rich features from natural images and the basic features common among samples from different datasets, the network model can be adapted to new visual tasks with minimal weight readjustment. However, different network models employ different design philosophies, model structures, and weighting parameters, and these differences have different effects on the classification and recognition of new datasets. In this study, we use five popular pretrained models with strong feature extraction capabilities, namely, AlexNet, GoogLeNet, SqueezeNet, ResNet-18, and VGG16, for transfer learning on sample datasets to find the most appropriate network model for our task.
D-CNN models have much room for design optimization. A number of scholars have proposed excellent model optimization strategies, such as the use of a 1 × 1 convolution kernel in GoogLeNet to reduce the number of parameters and the use of a residual structure to solve the network degradation problem in ResNet. SqueezeNet uses fire modules and global average pooling to replace the fully connected layers and thus to compress the parameters significantly. At the same time, it retains large feature maps before global average pooling, thereby preserving more information and improving the classification accuracy of the model. This study optimizes SqueezeNet using improvements based on a simple bypass connection structure and improvements based on the Slim module structure. The training speed of the improved method based on the Slim module structure is faster than that of the model based on a simple bypass connection; this is directly related to the former’s use of the group convolution strategy to reduce the number of weighting parameters and operations, and the reduction in the number of weighting parameters does not have a significant impact on the accuracy of identification of areas in which pine wood nematode disease is present. In addition, the optimization strategies of replacing the activation function, introducing a BN layer, reducing the number of modules, and introducing a dropout layer were conducted in a step-by-step manner. The results show that these methods are very helpful in improving the performance of the network model in terms of both speed and accuracy. Introducing a BN layer can improve gradient dispersion and thus improve the training accuracy; reducing the number of modules can remove redundant layers and thus speed up network training, and introducing a dropout layer can prevent network overfitting in some structures, thereby improving generalizability.
In this study, deep learning technology was applied to the classification of satellite remote sensing images of pine wood nematode disease, and good results were achieved with an accuracy of 94.90%. However, some aspects of this work need to be improved and expanded. Only GF-1 and GF-2 images were used in the construction of the datasets, and the use of only a few data types limits the scope of application of the trained network model. In addition, this study used a D-CNN to identify and classify satellite remote sensing images of pine wood nematode disease occurrence areas, but it did not attempt to detect dead wood or to study how factors such as the age of the trees and the characteristics of the terrain affect the results. Other causes, such as drought, can also kill pine trees; when the classification is combined with ground investigation, the resulting error can be limited to an acceptable range. These topics are key directions for future research.