Deep Learning for Remote Sensing

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "Remote Sensing Image Processing".

Deadline for manuscript submissions: closed (30 November 2018) | Viewed by 268290

Special Issue Editors


Prof. Giovanni Poggi
Guest Editor
Department of Electrical Engineering and Information Technology (DIETI), University Federico II of Naples (I), Via Claudio 21, 80125 Napoli, Italy
Interests: image processing; remote sensing; digital multimedia forensics

Dr. Giuseppe Scarpa
Guest Editor
Department of Electrical Engineering and Information Technology, University Federico II, 80125 Naples, Italy
Interests: image segmentation and classification; despeckling; pansharpening; data fusion; deep learning

Dr. Luisa Verdoliva
Guest Editor
University Federico II of Naples, Via Claudio 21, 80125 Naples, Italy
Interests: image processing; remote sensing; deep learning; image forensics

Special Issue Information

Dear Colleagues,

In the last few years, space agencies have deployed a large number of Earth-observing satellites. End users are flooded with a huge quantity of images of diverse nature, e.g., optical vs. SAR, high-resolution vs. wide-coverage, mono- vs. multi-spectral, often in regular time series. The key to their full exploitation is fully automated analysis, which calls for new tools to extract reliable and expressive information.

Deep learning holds great promise to fulfil the challenging needs of remote sensing (RS) image processing. It leverages the huge computing power of modern GPUs to perform human-like reasoning and extract compact features which embody the semantics of input images. The interest of the RS community towards deep learning methods is growing fast, and many architectures have been proposed in the last few years to address RS problems, often with an outstanding performance.

This Special Issue aims to report the latest advances and trends concerning the application of deep learning to remote sensing problems. Papers of both theoretical and applicative nature are welcome, as well as contributions regarding new deep learning-oriented public datasets for the RS research community.

Major topics of interest include, but are not limited to:

  • Large-scale datasets for training and testing deep learning solutions to RS problems;
  • Deep learning for RS image processing (e.g., compression, denoising, segmentation, classification);
  • Deep learning for RS image understanding (e.g., semantic labeling, object detection, data mining, image retrieval);
  • Deep learning for RS data fusion (e.g., optical-SAR fusion, pan-sharpening);
  • Deep learning with scarce or low-quality RS data, transfer learning, cross-sensor learning;
  • Processing of RS time series through deep recurrent networks.
Prof. Giovanni Poggi
Dr. Giuseppe Scarpa
Dr. Luisa Verdoliva
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • remote sensing
  • deep learning
  • image processing
  • large-scale datasets
  • transfer learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (26 papers)


Research

16 pages, 13757 KiB  
Article
Clouds Classification from Sentinel-2 Imagery with Deep Residual Learning and Semantic Image Segmentation
by Cheng-Chien Liu, Yu-Cheng Zhang, Pei-Yin Chen, Chien-Chih Lai, Yi-Hsin Chen, Ji-Hong Cheng and Ming-Hsun Ko
Remote Sens. 2019, 11(2), 119; https://doi.org/10.3390/rs11020119 - 10 Jan 2019
Cited by 47 | Viewed by 10711
Abstract
Detecting changes in land use and land cover (LULC) from space has long been the main goal of satellite remote sensing (RS), yet the existing algorithms for cloud classification are not reliable enough to attain this goal in an automated fashion. Clouds are very strong optical signals that dominate the results of change detection if they are not removed completely from imagery. As various architectures of deep learning (DL) have been proposed and advanced quickly, their potential in perceptual tasks has been widely accepted and successfully applied to many fields, and comprehensive surveys of DL in RS have encouraged the RS community to take a leading role in this research. Based on deep residual learning, semantic image segmentation, and the concept of atrous convolution, we propose a new DL architecture, named CloudNet, with an enhanced capability of feature extraction for classifying cloud and haze from Sentinel-2 imagery, with the intention of supporting automatic change detection in LULC. To ensure the quality of the training dataset, scene classification maps of Taiwan processed by Sen2cor were visually examined and edited, resulting in a total of 12,769 sub-images with a standard size of 224 × 224 pixels, cut from the Sen2cor-corrected images and compiled into a training set. Data augmentation enabled CloudNet to achieve stable cirrus identification without extensive training data. Compared with the traditional method and other DL methods, CloudNet had higher accuracy in cloud and haze classification, as well as better performance in cirrus cloud recognition. CloudNet will be incorporated into the Open Access Satellite Image Service to facilitate change detection using Sentinel-2 imagery on a regular and automatic basis. Full article
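The abstract gives no implementation details, so the following PyTorch sketch only illustrates the two ingredients CloudNet combines, residual learning and atrous (dilated) convolution; the channel count and dilation rate are assumed values, not taken from the paper.

import torch
import torch.nn as nn

class AtrousResidualBlock(nn.Module):
    """Residual block with dilated (atrous) convolutions.

    Illustrative only: channels and dilation are assumptions, not CloudNet's values.
    """
    def __init__(self, channels: int = 64, dilation: int = 2):
        super().__init__()
        pad = dilation  # keeps spatial size for 3x3 kernels
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=pad, dilation=dilation)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=pad, dilation=dilation)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut (residual learning)

# Example: a 224 x 224 tile already mapped to 64 feature channels
x = torch.randn(1, 64, 224, 224)
print(AtrousResidualBlock()(x).shape)  # torch.Size([1, 64, 224, 224])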

26 pages, 26823 KiB  
Article
A Ship Rotation Detection Model in Remote Sensing Images Based on Feature Fusion Pyramid Network and Deep Reinforcement Learning
by Kun Fu, Yang Li, Hao Sun, Xue Yang, Guangluan Xu, Yuting Li and Xian Sun
Remote Sens. 2018, 10(12), 1922; https://doi.org/10.3390/rs10121922 - 30 Nov 2018
Cited by 42 | Viewed by 6368
Abstract
Ship detection plays an important role in automatic remote sensing image interpretation. Scale differences, the large aspect ratios of ships, complex remote sensing image backgrounds, and densely parked ships make the detection task difficult. To handle these challenging problems, we propose a ship rotation detection model based on a Feature Fusion Pyramid Network and deep reinforcement learning (FFPN-RL) in this paper. The detection network can efficiently generate inclined rectangular boxes for ships. First, we propose the Feature Fusion Pyramid Network (FFPN), which strengthens the reuse of features at different scales and extracts the low-level location and high-level semantic information that are important for multi-scale ship detection and the precise location of densely parked ships. Second, in order to obtain accurate ship angle information, we apply deep reinforcement learning to the inclined ship detection task for the first time. In addition, we put forward prior policy guidance and a long-term training method to train an angle prediction agent constructed through a dueling-structure Q network, which is able to iteratively and accurately obtain the ship angle. We also design soft rotation non-maximum suppression to reduce missed ship detections while suppressing redundant detection boxes. We carry out detailed experiments on a remote sensing ship image dataset, and the experiments validate that our FFPN-RL ship detection model has efficient detection performance. Full article
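The angle-prediction agent is described only as a dueling-structure Q network; a generic dueling head of that kind (with assumed feature and action dimensions, not the authors' network) can be written as:

import torch
import torch.nn as nn

class DuelingQHead(nn.Module):
    """Generic dueling Q-network head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).

    Feature and action dimensions are placeholders, not values from the paper.
    """
    def __init__(self, feat_dim: int = 256, n_actions: int = 36):  # e.g., 36 discrete angle steps (assumption)
        super().__init__()
        self.value = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))
        self.advantage = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))

    def forward(self, features):
        v = self.value(features)                    # (B, 1) state value
        a = self.advantage(features)                # (B, n_actions) action advantages
        return v + a - a.mean(dim=1, keepdim=True)  # (B, n_actions) Q-values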

16 pages, 4977 KiB  
Article
Neural Network Based Kalman Filters for the Spatio-Temporal Interpolation of Satellite-Derived Sea Surface Temperature
by Said Ouala, Ronan Fablet, Cédric Herzet, Bertrand Chapron, Ananda Pascual, Fabrice Collard and Lucile Gaultier
Remote Sens. 2018, 10(12), 1864; https://doi.org/10.3390/rs10121864 - 22 Nov 2018
Cited by 29 | Viewed by 6049
Abstract
The forecasting and reconstruction of oceanic dynamics is a crucial challenge. While model-driven strategies remain the state-of-the-art approaches for reconstructing spatio-temporal dynamics, the ever-increasing availability of data collections in oceanography has raised the relevance of data-driven approaches as computationally efficient representations for spatio-temporal field reconstruction. These tools have proven to outperform classical state-of-the-art interpolation techniques, such as optimal interpolation and DINEOF, in the retrieval of fine-scale structures, while remaining computationally efficient compared with model-based data assimilation schemes. However, coupling these data-driven priors with classical filtering schemes limits their potential representativity. From this point of view, recent advances in machine learning, especially neural networks and deep learning, can provide a new infrastructure for dynamical modeling and interpolation within a data-driven framework. In this work, we address this challenge and develop a novel Neural-Network-based (NN-based) Kalman filter for the spatio-temporal interpolation of sea surface dynamics. Based on a data-driven probabilistic representation of spatio-temporal fields, our approach can be regarded as an alternative to classical filtering schemes, such as the ensemble Kalman filter (EnKF), in data assimilation. Overall, the key features of the proposed approach are two-fold: (i) we propose a novel architecture for the stochastic representation of two-dimensional (2D) geophysical dynamics based on neural networks, and (ii) we derive the associated parametric Kalman-like filtering scheme for a computationally efficient spatio-temporal interpolation of Sea Surface Temperature (SST) fields. We illustrate the relevance of our contribution with an OSSE (Observing System Simulation Experiment) in a case-study region off South Africa. Our numerical experiments report significant improvements in reconstruction performance compared with operational and state-of-the-art schemes (e.g., optimal interpolation, Empirical Orthogonal Function (EOF)-based interpolation, and analog data assimilation). Full article
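The filtering scheme itself is not spelled out in the abstract; for orientation, the sketch below writes out a textbook Kalman analysis step in NumPy, with the forecast assumed to come from a learned (neural) dynamical operator. All symbols are standard Kalman-filter quantities, not the authors' exact formulation.

import numpy as np

def kalman_analysis(x_f, P_f, y, H, R):
    """Standard Kalman analysis (update) step, written out for reference.

    x_f : forecast state mean, shape (n,)
    P_f : forecast error covariance, shape (n, n)
    y   : observation vector, shape (m,) (e.g., the cloud-free SST pixels)
    H   : observation operator, shape (m, n)
    R   : observation-noise covariance, shape (m, m)
    """
    S = H @ P_f @ H.T + R                      # innovation covariance
    K = P_f @ H.T @ np.linalg.inv(S)           # Kalman gain
    x_a = x_f + K @ (y - H @ x_f)              # analysed (interpolated) state
    P_a = (np.eye(len(x_f)) - K @ H) @ P_f     # analysed covariance
    return x_a, P_a

# In an NN-based variant, the forecast (x_f, P_f) would come from a learned
# dynamical operator rather than a physical model, e.g. x_f = net(previous_x_a).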

24 pages, 2066 KiB  
Article
Fast Cloud Segmentation Using Convolutional Neural Networks
by Johannes Drönner, Nikolaus Korfhage, Sebastian Egli, Markus Mühling, Boris Thies, Jörg Bendix, Bernd Freisleben and Bernhard Seeger
Remote Sens. 2018, 10(11), 1782; https://doi.org/10.3390/rs10111782 - 10 Nov 2018
Cited by 88 | Viewed by 9310
Abstract
Information about clouds is important for observing and predicting weather and climate as well as for generating and distributing solar power. Most existing approaches extract cloud information from satellite data by classifying individual pixels instead of using closely integrated spatial information, ignoring the fact that clouds are highly dynamic, spatially continuous entities. This paper proposes a novel cloud classification method based on deep learning. Relying on a Convolutional Neural Network (CNN) architecture for image segmentation, the presented Cloud Segmentation CNN (CS-CNN), classifies all pixels of a scene simultaneously rather than individually. We show that CS-CNN can successfully process multispectral satellite data to classify continuous phenomena such as highly dynamic clouds. The proposed approach produces excellent results on Meteosat Second Generation (MSG) satellite data in terms of quality, robustness, and runtime compared to other machine learning methods such as random forests. In particular, comparing CS-CNN with the CLAAS-2 cloud mask derived from MSG data shows high accuracy (0.94) and Heidke Skill Score (0.90) values. In contrast to a random forest, CS-CNN produces robust results and is insensitive to challenges created by coast lines and bright (sand) surface areas. Using GPU acceleration, CS-CNN requires only 25 ms of computation time for classification of images of Europe with 508 × 508 pixels. Full article

24 pages, 18125 KiB  
Article
Dialectical GAN for SAR Image Translation: From Sentinel-1 to TerraSAR-X
by Dongyang Ao, Corneliu Octavian Dumitru, Gottfried Schwarz and Mihai Datcu
Remote Sens. 2018, 10(10), 1597; https://doi.org/10.3390/rs10101597 - 8 Oct 2018
Cited by 39 | Viewed by 8518
Abstract
With more and more SAR applications, the demand for enhanced high-quality SAR images has increased considerably. However, high-quality SAR images entail high costs, due to the limitations of current SAR devices and their image processing resources. To improve the quality of SAR images and to reduce the costs of their generation, we propose a Dialectical Generative Adversarial Network (Dialectical GAN) to generate high-quality SAR images. This method is based on the analysis of hierarchical SAR information and the “dialectical” structure of GAN frameworks. As a demonstration, a typical example will be shown, where a low-resolution SAR image (e.g., a Sentinel-1 image) with large ground coverage is translated into a high-resolution SAR image (e.g., a TerraSAR-X image). A new algorithm is proposed based on a network framework by combining conditional WGAN-GP (Wasserstein Generative Adversarial Network—Gradient Penalty) loss functions and Spatial Gram matrices under the rule of dialectics. Experimental results show that the SAR image translation works very well when we compare the results of our proposed method with the selected traditional methods. Full article
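The abstract mentions Spatial Gram matrices alongside the conditional WGAN-GP loss; the Gram matrix of a feature map, a common texture statistic, can be computed as in this short PyTorch sketch (illustrative only, not the paper's exact loss):

import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a feature map, a common texture/style statistic.

    features: (B, C, H, W) activations from some network layer.
    Returns:  (B, C, C) channel-correlation matrices, normalised by C*H*W.
    """
    b, c, h, w = features.shape
    f = features.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

# A style-like penalty between generated and target SAR features could then be
# torch.nn.functional.mse_loss(gram_matrix(feat_fake), gram_matrix(feat_real)).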

17 pages, 2529 KiB  
Article
Mining Hard Negative Samples for SAR-Optical Image Matching Using Generative Adversarial Networks
by Lloyd Haydn Hughes, Michael Schmitt and Xiao Xiang Zhu
Remote Sens. 2018, 10(10), 1552; https://doi.org/10.3390/rs10101552 - 27 Sep 2018
Cited by 48 | Viewed by 7282
Abstract
In this paper, we propose a generative framework to produce similar yet novel samples for a specified image. We then propose the use of these images as hard-negative samples, within the framework of hard-negative mining, in order to improve the performance of classification networks in applications which suffer from sparse labelled training data. Our approach makes use of a variational autoencoder (VAE) which is trained in an adversarial manner in order to learn a latent distribution of the training data, as well as to be able to generate realistic, high quality image patches. We evaluate our proposed generative approach to hard-negative mining on a synthetic aperture radar (SAR) and optical image matching task. Using an existing SAR-optical matching network as the basis for our investigation, we compare the performance of the matching network trained using our approach to the baseline method, as well as to two other hard-negative mining methods. Our proposed generative architecture is able to generate realistic, very high resolution (VHR) SAR image patches which are almost indistinguishable from real imagery. Furthermore, using the patches as hard-negative samples, we are able to improve the overall accuracy and significantly decrease the false positive rate of the SAR-optical matching task, thus validating the applicability of our generative hard-negative mining approach to improving training in data-sparse applications. Full article
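Independently of how the candidate patches are generated, hard-negative mining usually amounts to keeping the negatives that the current model scores as most "positive"; a generic sketch with hypothetical names (not the authors' code) is:

import torch

def mine_hard_negatives(model, negatives: torch.Tensor, k: int) -> torch.Tensor:
    """Return the k negative pairs the matching model scores as most match-like.

    negatives: (N, ...) batch of candidate non-matching SAR/optical pairs
               (real or GAN-generated); layout depends on the matching network.
    """
    model.eval()
    with torch.no_grad():
        scores = model(negatives)            # higher score = "looks like a match"
    hardest = torch.topk(scores.flatten(), k).indices
    return negatives[hardest]                # feed these back into training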

28 pages, 28181 KiB  
Article
3D Façade Labeling over Complex Scenarios: A Case Study Using Convolutional Neural Network and Structure-From-Motion
by Rodolfo Georjute Lotte, Norbert Haala, Mateusz Karpina, Luiz Eduardo Oliveira e Cruz de Aragão and Yosio Edemir Shimabukuro
Remote Sens. 2018, 10(9), 1435; https://doi.org/10.3390/rs10091435 - 8 Sep 2018
Cited by 19 | Viewed by 6427
Abstract
Urban environments are regions in which spectral variability and spatial variability are extremely high, with a huge range of shapes and sizes, and they also demand high resolution images for applications involving their study. Due to the fact that these environments can grow even more over time, applications related to their monitoring tend to turn to autonomous intelligent systems, which together with remote sensing data could help or even predict daily life situations. The task of mapping cities by autonomous operators was usually carried out by aerial optical images due to its scale and resolution; however new scientific questions have arisen, and this has led research into a new era of highly-detailed data extraction. For many years, using artificial neural models to solve complex problems such as automatic image classification was commonplace, owing much of their popularity to their ability to adapt to complex situations without needing human intervention. In spite of that, their popularity declined in the mid-2000s, mostly due to the complex and time-consuming nature of their methods and workflows. However, newer neural network architectures have brought back the interest in their application for autonomous classifiers, especially for image classification purposes. Convolutional Neural Networks (CNN) have been a trend for pixel-wise image segmentation, showing flexibility when detecting and classifying any kind of object, even in situations where humans failed to perceive differences, such as in city scenarios. In this paper, we aim to explore and experiment with state-of-the-art technologies to semantically label 3D urban models over complex scenarios. To achieve these goals, we split the problem into two main processing lines: first, how to correctly label the façade features in the 2D domain, where a supervised CNN is used to segment ground-based façade images into six feature classes, roof, window, wall, door, balcony and shop; second, a Structure-from-Motion (SfM) and Multi-View-Stereo (MVS) workflow is used to extract the geometry of the façade, wherein the segmented images in the previous stage are then used to label the generated mesh by a “reverse” ray-tracing technique. This paper demonstrates that the proposed methodology is robust in complex scenarios. The façade feature inferences have reached up to 93% accuracy over most of the datasets used. Although it still presents some deficiencies in unknown architectural styles and needs some improvements to be made regarding 3D-labeling, we present a consistent and simple methodology to handle the problem. Full article

25 pages, 11799 KiB  
Article
WeedMap: A Large-Scale Semantic Weed Mapping Framework Using Aerial Multispectral Imaging and Deep Neural Network for Precision Farming
by Inkyu Sa, Marija Popović, Raghav Khanna, Zetao Chen, Philipp Lottes, Frank Liebisch, Juan Nieto, Cyrill Stachniss, Achim Walter and Roland Siegwart
Remote Sens. 2018, 10(9), 1423; https://doi.org/10.3390/rs10091423 - 7 Sep 2018
Cited by 203 | Viewed by 17320
Abstract
The ability to automatically monitor agricultural fields is an important capability in precision farming, enabling steps towards more sustainable agriculture. Precise, high-resolution monitoring is a key prerequisite for targeted intervention and the selective application of agro-chemicals. The main goal of this paper is to develop a novel crop/weed segmentation and mapping framework that processes multispectral images obtained from an unmanned aerial vehicle (UAV) using a deep neural network (DNN). Most studies on crop/weed semantic segmentation only consider single images for processing and classification. Images taken by UAVs often cover only a few hundred square meters with either color only or color and near-infrared (NIR) channels. Although a map can be generated by processing single segmented images incrementally, this requires additional complex information fusion techniques which struggle to handle high fidelity maps due to their computational costs and problems in ensuring global consistency. Moreover, computing a single large and accurate vegetation map (e.g., crop/weed) using a DNN is non-trivial due to difficulties arising from: (1) limited ground sample distances (GSDs) in high-altitude datasets, (2) sacrificed resolution resulting from downsampling high-fidelity images, and (3) multispectral image alignment. To address these issues, we adopt a standard sliding window approach that operates on only small portions of multispectral orthomosaic maps (tiles), which are channel-wise aligned and calibrated radiometrically across the entire map. We define the tile size to be the same as that of the DNN input to avoid resolution loss. Compared to our baseline model (i.e., SegNet with 3 channel RGB (red, green, and blue) inputs) yielding an area under the curve (AUC) of [background=0.607, crop=0.681, weed=0.576], our proposed model with 9 input channels achieves [0.839, 0.863, 0.782]. Additionally, we provide an extensive analysis of 20 trained models, both qualitatively and quantitatively, in order to evaluate the effects of varying input channels and tunable network hyperparameters. Furthermore, we release a large sugar beet/weed aerial dataset with expertly guided annotations for further research in the fields of remote sensing, precision agriculture, and agricultural robotics. Full article
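The tiling idea, cutting the orthomosaic into tiles the size of the network input so that no resolution is lost, can be sketched roughly as follows (NumPy; the tile size is a placeholder, not the value used in WeedMap):

import numpy as np

def tile_orthomosaic(mosaic: np.ndarray, tile: int = 480):
    """Cut a (H, W, C) multispectral orthomosaic into non-overlapping tiles.

    The tile edge is set to the DNN input size so tiles need no resizing;
    480 here is an arbitrary placeholder, not the value used in the paper.
    Any remainder at the right/bottom border is simply dropped in this sketch.
    """
    h, w, _ = mosaic.shape
    tiles, coords = [], []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            tiles.append(mosaic[y:y + tile, x:x + tile])
            coords.append((y, x))            # keep positions to stitch predictions back
    return np.stack(tiles), coords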

19 pages, 2112 KiB  
Article
Aircraft Type Recognition in Remote Sensing Images Based on Feature Learning with Conditional Generative Adversarial Networks
by Yuhang Zhang, Hao Sun, Jiawei Zuo, Hongqi Wang, Guangluan Xu and Xian Sun
Remote Sens. 2018, 10(7), 1123; https://doi.org/10.3390/rs10071123 - 16 Jul 2018
Cited by 35 | Viewed by 7366
Abstract
Aircraft type recognition plays an important role in remote sensing image interpretation. Traditional methods suffer from bad generalization performance, while deep learning methods require large amounts of data with type labels, which are quite expensive and time-consuming to obtain. To overcome the aforementioned problems, in this paper, we propose an aircraft type recognition framework based on conditional generative adversarial networks (GANs). First, we design a new method to precisely detect aircrafts’ keypoints, which are used to generate aircraft masks and locate the positions of the aircrafts. Second, a conditional GAN with a region of interest (ROI)-weighted loss function is trained on unlabeled aircraft images and their corresponding masks. Third, an ROI feature extraction method is carefully designed to extract multi-scale features from the GAN in the regions of aircrafts. After that, a linear support vector machine (SVM) classifier is adopted to classify each sample using their features. Benefiting from the GAN, we can learn features which are strong enough to represent aircrafts based on a large unlabeled dataset. Additionally, the ROI-weighted loss function and the ROI feature extraction method make the features more related to the aircrafts rather than the background, which improves the quality of features and increases the recognition accuracy significantly. Thorough experiments were conducted on a challenging dataset, and the results prove the effectiveness of the proposed aircraft type recognition framework. Full article
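The ROI-weighted loss is not given explicitly in the abstract; one plausible reading, upweighting the reconstruction error inside the aircraft mask, could look like this PyTorch sketch (the weight value is an assumption):

import torch

def roi_weighted_l1(pred, target, roi_mask, roi_weight: float = 5.0):
    """L1 reconstruction loss with pixels inside the ROI mask upweighted.

    pred, target: (B, C, H, W) generated and real images
    roi_mask:     (B, 1, H, W) binary aircraft mask (1 inside the ROI)
    roi_weight:   arbitrary factor, not taken from the paper
    """
    weights = 1.0 + (roi_weight - 1.0) * roi_mask      # 1 outside, roi_weight inside
    return (weights * (pred - target).abs()).mean()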

19 pages, 6126 KiB  
Article
A Fast Dense Spectral–Spatial Convolution Network Framework for Hyperspectral Images Classification
by Wenju Wang, Shuguang Dou, Zhongmin Jiang and Liujie Sun
Remote Sens. 2018, 10(7), 1068; https://doi.org/10.3390/rs10071068 - 5 Jul 2018
Cited by 329 | Viewed by 10937
Abstract
Recent research shows that deep-learning-derived methods based on a deep convolutional neural network have high accuracy when applied to hyperspectral image (HSI) classification, but long training times. To reduce the training time and improve accuracy, in this paper we propose an end-to-end fast dense spectral–spatial convolution (FDSSC) framework for HSI classification. The FDSSC framework uses different convolutional kernel sizes to extract spectral and spatial features separately, and the “valid” convolution method to reduce the high dimensions. Densely-connected structures—the input of each convolution consisting of the output of all previous convolution layers—was used for deep learning of features, leading to extremely accurate classification. To increase speed and prevent overfitting, the FDSSC framework uses a dynamic learning rate, parametric rectified linear units, batch normalization, and dropout layers. These attributes enable the FDSSC framework to achieve accuracy within as few as 80 epochs. The experimental results show that with the Indian Pines, Kennedy Space Center, and University of Pavia datasets, the proposed FDSSC framework achieved state-of-the-art performance compared with existing deep-learning-based methods while significantly reducing the training time. Full article
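As a generic illustration of the densely connected pattern described in the abstract, where each convolution receives the concatenation of all previous outputs, here is a minimal dense block; the 3-D spectral kernels, growth rate, and layer count are assumptions rather than the exact FDSSC configuration.

import torch
import torch.nn as nn

class DenseBlock3D(nn.Module):
    """Minimal densely connected block of 3-D convolutions for an HSI cube.

    Input: (B, C0, bands, H, W). Every layer receives the concatenation of the
    block input and all previous layer outputs along the channel axis.
    """
    def __init__(self, in_ch: int = 1, growth: int = 12, n_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm3d(ch), nn.ReLU(inplace=True),
                nn.Conv3d(ch, growth, kernel_size=(7, 1, 1), padding=(3, 0, 0))))  # spectral conv (assumed size)
            ch += growth

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)   # all features are reused downstream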

25 pages, 6200 KiB  
Article
CraterIDNet: An End-to-End Fully Convolutional Neural Network for Crater Detection and Identification in Remotely Sensed Planetary Images
by Hao Wang, Jie Jiang and Guangjun Zhang
Remote Sens. 2018, 10(7), 1067; https://doi.org/10.3390/rs10071067 - 5 Jul 2018
Cited by 52 | Viewed by 7093
Abstract
The detection and identification of impact craters on a planetary surface are crucially important for planetary studies and autonomous navigation. Crater detection refers to finding craters in a given image, whereas identification means actually mapping them to particular reference craters. However, no method is available for simultaneously detecting and identifying craters with sufficient accuracy and robustness. Thus, this study proposes a novel end-to-end fully convolutional neural network (CNN), namely, CraterIDNet, which takes remotely sensed planetary images of any size as input and outputs detected crater positions, apparent diameters, and identification results. CraterIDNet comprises two pipelines, namely, a crater detection pipeline (CDP) and a crater identification pipeline (CIP). First, we propose a pre-trained model with high generalization performance for transfer learning. Then, anchor scale optimization and anchor density adjustment are proposed for the CDP. In addition, multi-scale impact craters are detected simultaneously by using different feature maps with multi-scale receptive fields. These strategies considerably improve the detection performance of small craters. Furthermore, a grid pattern layer is proposed to generate grid patterns with rotation and scale invariance for the CIP. The grid pattern integrates the distribution and scale information of nearby craters, which remarkably improves identification robustness when combined with the CNN framework. We comprehensively evaluate CraterIDNet and present state-of-the-art crater detection and identification performance with a small network architecture (4 MB). Full article

16 pages, 10992 KiB  
Article
Deriving High Spatiotemporal Remote Sensing Images Using Deep Convolutional Network
by Zhenyu Tan, Peng Yue, Liping Di and Junmei Tang
Remote Sens. 2018, 10(7), 1066; https://doi.org/10.3390/rs10071066 - 5 Jul 2018
Cited by 120 | Viewed by 7272
Abstract
Due to technical and budget limitations, there are inevitably some trade-offs in the design of remote sensing instruments, making it difficult to acquire high spatiotemporal resolution remote sensing images simultaneously. To address this problem, this paper proposes a new data fusion model named the deep convolutional spatiotemporal fusion network (DCSTFN), which makes full use of a convolutional neural network (CNN) to derive high spatiotemporal resolution images from remotely sensed images with high temporal but low spatial resolution (HTLS) and low temporal but high spatial resolution (LTHS). The DCSTFN model is composed of three major parts: the expansion of the HTLS images, the extraction of high frequency components from LTHS images, and the fusion of extracted features. The inputs of the proposed network include a pair of HTLS and LTHS reference images from a single day and another HTLS image on the prediction date. Convolution is used to extract key features from inputs, and deconvolution is employed to expand the size of HTLS images. The features extracted from HTLS and LTHS images are then fused with the aid of an equation that accounts for temporal ground coverage changes. The output image on the prediction day has the spatial resolution of LTHS and temporal resolution of HTLS. Overall, the DCSTFN model establishes a complex but direct non-linear mapping between the inputs and the output. Experiments with MODerate Resolution Imaging Spectroradiometer (MODIS) and Landsat Operational Land Imager (OLI) images show that the proposed CNN-based approach not only achieves state-of-the-art accuracy, but is also more robust than conventional spatiotemporal fusion algorithms. In addition, DCSTFN is a faster and less time-consuming method to perform the data fusion with the trained network, and can potentially be applied to the bulk processing of archived data. Full article

22 pages, 14068 KiB  
Article
Land Cover Segmentation of Airborne LiDAR Data Using Stochastic Atrous Network
by Hasan Asy’ari Arief, Geir-Harald Strand, Håvard Tveite and Ulf Geir Indahl
Remote Sens. 2018, 10(6), 973; https://doi.org/10.3390/rs10060973 - 19 Jun 2018
Cited by 30 | Viewed by 7314
Abstract
Inspired by the success of deep learning techniques in dense-label prediction and the increasing availability of high precision airborne light detection and ranging (LiDAR) data, we present a research process that compares a collection of well-proven semantic segmentation architectures based on the deep learning approach. Our investigation concludes with the proposition of some novel deep learning architectures for generating detailed land resource maps by employing a semantic segmentation approach. The contribution of our work is threefold. (1) First, we implement the multiclass version of the intersection-over-union (IoU) loss function, which contributes to handling highly imbalanced datasets and preventing overfitting. (2) Thereafter, we propose a novel deep learning architecture integrating the deep atrous network architecture with the stochastic depth approach for speeding up the learning process and imposing a regularization effect. (3) Finally, we introduce an early fusion deep layer that combines image-based and LiDAR-derived features. In a benchmark study carried out using the Follo 2014 LiDAR data and the NIBIO AR5 land resources dataset, we compare our proposals to other deep learning architectures. A quantitative comparison shows that our best proposal provides more than 5% relative improvement in terms of mean intersection-over-union over the atrous network, providing a basis for a more frequent and improved use of LiDAR data for automatic land cover segmentation. Full article
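The multiclass IoU loss mentioned in contribution (1) has a common "soft" (differentiable) formulation; a generic PyTorch version, not necessarily the authors' exact implementation, is:

import torch

def soft_multiclass_iou_loss(logits, target, eps: float = 1e-6):
    """Soft (differentiable) multiclass IoU loss, averaged over classes.

    logits: (B, C, H, W) raw network outputs
    target: (B, H, W) integer class labels in [0, C)
    """
    probs = torch.softmax(logits, dim=1)
    onehot = torch.nn.functional.one_hot(target, logits.shape[1])   # (B, H, W, C)
    onehot = onehot.permute(0, 3, 1, 2).float()                     # (B, C, H, W)
    inter = (probs * onehot).sum(dim=(0, 2, 3))
    union = (probs + onehot - probs * onehot).sum(dim=(0, 2, 3))
    iou = (inter + eps) / (union + eps)                             # per-class soft IoU
    return 1.0 - iou.mean()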

13 pages, 2289 KiB  
Article
Performance Evaluation of Single-Label and Multi-Label Remote Sensing Image Retrieval Using a Dense Labeling Dataset
by Zhenfeng Shao, Ke Yang and Weixun Zhou
Remote Sens. 2018, 10(6), 964; https://doi.org/10.3390/rs10060964 - 16 Jun 2018
Cited by 128 | Viewed by 10024 | Correction
Abstract
Benchmark datasets are essential for developing and evaluating remote sensing image retrieval (RSIR) approaches. However, most of the existing datasets are single-labeled, with each image in these datasets being annotated by a single label representing the most significant semantic content of the image. This is sufficient for simple problems, such as distinguishing between a building and a beach, but multiple labels and sometimes even dense (pixel) labels are required for more complex problems, such as RSIR and semantic segmentation. We therefore extended the existing multi-labeled dataset collected for multi-label RSIR and presented a dense labeling remote sensing dataset termed "DLRSD". DLRSD contains a total of 17 classes, and the pixels of each image are labeled with one of these 17 pre-defined classes. We used DLRSD to evaluate the performance of RSIR methods ranging from traditional handcrafted feature-based methods to deep learning-based ones. More specifically, we evaluated the performances of RSIR methods from both single-label and multi-label perspectives. These results demonstrated the advantages of multiple labels over single labels for interpreting complex remote sensing images. DLRSD provides the literature with a benchmark for RSIR and other pixel-based problems such as semantic segmentation. Full article

21 pages, 3856 KiB  
Article
Geospatial Object Detection in Remote Sensing Imagery Based on Multiscale Single-Shot Detector with Activated Semantics
by Shiqi Chen, Ronghui Zhan and Jun Zhang
Remote Sens. 2018, 10(6), 820; https://doi.org/10.3390/rs10060820 - 24 May 2018
Cited by 56 | Viewed by 8119
Abstract
Geospatial object detection from high spatial resolution (HSR) remote sensing imagery is a heated and challenging problem in the field of automatic image interpretation. Despite convolutional neural networks (CNNs) having facilitated the development in this domain, the computation efficiency under real-time application and the accurate positioning on relatively small objects in HSR images are two noticeable obstacles which have largely restricted the performance of detection methods. To tackle the above issues, we first introduce semantic segmentation-aware CNN features to activate the detection feature maps from the lowest level layer. In conjunction with this segmentation branch, another module which consists of several global activation blocks is proposed to enrich the semantic information of feature maps from higher level layers. Then, these two parts are integrated and deployed into the original single shot detection framework. Finally, we use the modified multi-scale feature maps with enriched semantics and multi-task training strategy to achieve end-to-end detection with high efficiency. Extensive experiments and comprehensive evaluations on a publicly available 10-class object detection dataset have demonstrated the superiority of the presented method. Full article

19 pages, 3185 KiB  
Article
Improving Remote Sensing Scene Classification by Integrating Global-Context and Local-Object Features
by Dan Zeng, Shuaijun Chen, Boyang Chen and Shuying Li
Remote Sens. 2018, 10(5), 734; https://doi.org/10.3390/rs10050734 - 9 May 2018
Cited by 83 | Viewed by 6722
Abstract
Recently, many researchers have been dedicated to using convolutional neural networks (CNNs) to extract global-context features (GCFs) for remote-sensing scene classification. Commonly, accurate classification of scenes requires knowledge about both the global context and local objects. However, unlike the natural images in which the objects cover most of the image, objects in remote-sensing images are generally small and decentralized. Thus, it is hard for vanilla CNNs to focus on both global context and small local objects. To address this issue, this paper proposes a novel end-to-end CNN by integrating the GCFs and local-object-level features (LOFs). The proposed network includes two branches, the local object branch (LOB) and global semantic branch (GSB), which are used to generate the LOFs and GCFs, respectively. Then, the concatenation of features extracted from the two branches allows our method to be more discriminative in scene classification. Three challenging benchmark remote-sensing datasets were extensively experimented on; the proposed approach outperformed the existing scene classification methods and achieved state-of-the-art results for all three datasets. Full article

22 pages, 52288 KiB  
Article
A Deep-Local-Global Feature Fusion Framework for High Spatial Resolution Imagery Scene Classification
by Qiqi Zhu, Yanfei Zhong, Yanfei Liu, Liangpei Zhang and Deren Li
Remote Sens. 2018, 10(4), 568; https://doi.org/10.3390/rs10040568 - 6 Apr 2018
Cited by 58 | Viewed by 9733
Abstract
High spatial resolution (HSR) imagery scene classification has recently attracted increased attention. The bag-of-visual-words (BoVW) model is an effective method for scene classification. However, it can only extract handcrafted features, and it disregards the spatial layout information, whereas deep learning can automatically mine the intrinsic features as well as preserve the spatial location, but it may lose the characteristic information of the HSR images. Although previous methods based on the combination of BoVW and deep learning have achieved comparatively high classification accuracies, they have not explored the combination of handcrafted and deep features, and they just used the BoVW model as a feature coding method to encode the deep features. This means that the intrinsic characteristics of these models were not combined in the previous works. In this paper, to discover more discriminative semantics for HSR imagery, the deep-local-global feature fusion (DLGFF) framework is proposed for HSR imagery scene classification. Differing from the conventional scene classification methods, which utilize only handcrafted features or deep features, DLGFF establishes a framework integrating multi-level semantics from the global texture feature–based method, the BoVW model, and a pre-trained convolutional neural network (CNN). In DLGFF, two different approaches are proposed, i.e., the local and global features fused with the pooling-stretched convolutional features (LGCF) and the local and global features fused with the fully connected features (LGFF), to exploit the multi-level semantics for complex scenes. The experimental results obtained with three HSR image classification datasets confirm the effectiveness of the proposed DLGFF framework. Compared with the published results of the previous scene classification methods, the classification accuracies of the DLGFF framework on the 21-class UC Merced dataset and 12-class Google dataset of SIRI-WHU can reach 99.76%, which is superior to the current state-of-the-art methods. The classification accuracy of the DLGFF framework on the 45-class NWPU-RESISC45 dataset, 96.37 ± 0.05%, is an increase of about 6% when compared with the current state-of-the-art methods. This indicates that the fusion of the global low-level feature, the local mid-level feature, and the deep high-level feature can provide a representative description for HSR imagery. Full article

23 pages, 4262 KiB  
Article
Long-Term Annual Mapping of Four Cities on Different Continents by Applying a Deep Information Learning Method to Landsat Data
by Haobo Lyu, Hui Lu, Lichao Mou, Wenyu Li, Jonathon Wright, Xuecao Li, Xinlu Li, Xiao Xiang Zhu, Jie Wang, Le Yu and Peng Gong
Remote Sens. 2018, 10(3), 471; https://doi.org/10.3390/rs10030471 - 17 Mar 2018
Cited by 62 | Viewed by 8716
Abstract
Urbanization is a substantial contributor to anthropogenic environmental change, and often occurs at a rapid pace that demands frequent and accurate monitoring. Time series of satellite imagery collected at fine spatial resolution using stable spectral bands over decades are most desirable for this purpose. In practice, however, temporal spectral variance arising from variations in atmospheric conditions, sensor calibration, cloud cover, and other factors complicates extraction of consistent information on changes in urban land cover. Moreover, the construction and application of effective training samples is time-consuming, especially at continental and global scales. Here, we propose a new framework for satellite-based mapping of urban areas based on transfer learning and deep learning techniques. We apply this method to Landsat observations collected during 1984–2016 and extract annual records of urban areas in four cities in the temperate zone (Beijing, New York, Melbourne, and Munich). The method is trained using observations of Beijing collected in 1999, and then used to map urban areas in all target cities for the entire 1984–2016 period. The method addresses two central challenges in long term detection of urban change: temporal spectral variance and a scarcity of training samples. First, we use a recurrent neural network to minimize seasonal urban spectral variance. Second, we introduce an automated transfer strategy to maximize information gain from limited training samples when applied to new target cities in similar climate zones. Compared with other state-of-the-art methods, our method achieved comparable or even better accuracy: the average change detection accuracy during 1984–2016 is 89% for Beijing, 94% for New York, 93% for Melbourne, and 89% for Munich, and the overall accuracy of single-year urban maps is approximately 96 ± 3% among the four target cities. The results demonstrate the practical potential and suitability of the proposed framework. The method is a promising tool for detecting urban change in massive remote sensing data sets with limited training data. Full article

23 pages, 7450 KiB  
Article
Scene Classification Based on a Deep Random-Scale Stretched Convolutional Neural Network
by Yanfei Liu, Yanfei Zhong, Feng Fei, Qiqi Zhu and Qianqing Qin
Remote Sens. 2018, 10(3), 444; https://doi.org/10.3390/rs10030444 - 12 Mar 2018
Cited by 82 | Viewed by 8499
Abstract
With the large number of high-resolution images now being acquired, high spatial resolution (HSR) remote sensing imagery scene classification has drawn great attention but is still a challenging task due to the complex arrangements of the ground objects in HSR imagery, which leads to the semantic gap between low-level features and high-level semantic concepts. As a feature representation method for automatically learning essential features from image data, convolutional neural networks (CNNs) have been introduced for HSR remote sensing image scene classification due to their excellent performance in natural image classification. However, some scene classes of remote sensing images are object-centered, i.e., the scene class of an image is decided by the objects it contains. Although previous methods based on CNNs have achieved comparatively high classification accuracies compared with the traditional methods with handcrafted features, they do not consider the scale variation of the objects in the scenes. This makes it difficult to directly utilize CNNs on those remote sensing images belonging to object-centered classes to extract features that are robust to scale variation, leading to wrongly classified scene images. To solve this problem, scene classification based on a deep random-scale stretched convolutional neural network (SRSCNN) for HSR remote sensing imagery is proposed in this paper. In the proposed method, patches with a random scale are cropped from the image and stretched to the specified scale as the input to train the CNN. This forces the CNN to extract features that are robust to the scale variation. Furthermore, to further improve the performance of the CNN, a robust scene classification strategy is adopted, i.e., multi-perspective fusion. The experimental results obtained using three datasets—the UC Merced dataset, the Google dataset of SIRI-WHU, and the Wuhan IKONOS dataset—confirm that the proposed method performs better than the traditional scene classification methods. Full article
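The core trick, cropping a patch at a random scale and stretching it back to a fixed input size, is a simple augmentation; a sketch with assumed scale bounds and output size (not the paper's exact parameters) is:

import random
import numpy as np
from PIL import Image

def random_scale_stretch(img: Image.Image, out_size: int = 224,
                         scale_range=(0.5, 1.0)) -> np.ndarray:
    """Crop a square patch at a random scale, then stretch it to out_size.

    scale_range and out_size are illustrative values, not those of SRSCNN.
    """
    w, h = img.size
    side = int(min(w, h) * random.uniform(*scale_range))   # random crop size
    x = random.randint(0, w - side)
    y = random.randint(0, h - side)
    patch = img.crop((x, y, x + side, y + side))
    return np.asarray(patch.resize((out_size, out_size), Image.BILINEAR))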

18 pages, 37551 KiB  
Article
Automatic Building Segmentation of Aerial Imagery Using Multi-Constraint Fully Convolutional Networks
by Guangming Wu, Xiaowei Shao, Zhiling Guo, Qi Chen, Wei Yuan, Xiaodan Shi, Yongwei Xu and Ryosuke Shibasaki
Remote Sens. 2018, 10(3), 407; https://doi.org/10.3390/rs10030407 - 6 Mar 2018
Cited by 177 | Viewed by 11525
Abstract
Automatic building segmentation from aerial imagery is an important and challenging task because of the variety of backgrounds, building textures and imaging conditions. Currently, research using variant types of fully convolutional networks (FCNs) has largely improved the performance of this task. However, pursuing more accurate segmentation results is still critical for further applications such as automatic mapping. In this study, a multi-constraint fully convolutional network (MC–FCN) model is proposed to perform end-to-end building segmentation. Our MC–FCN model consists of a bottom-up/top-down fully convolutional architecture and multi-constraints computed as the binary cross entropy between the predictions and the corresponding ground truth. Since more constraints are applied to optimize the parameters of the intermediate layers, the multi-scale feature representation of the model is further enhanced, and hence higher performance can be achieved. The experiments on a very-high-resolution aerial image dataset covering 18 km² and more than 17,000 buildings indicate that our method performs well in the building segmentation task. The proposed MC–FCN method significantly outperforms the classic FCN method and the adaptive boosting method using features extracted by the histogram of oriented gradients. Compared with the state-of-the-art U–Net model, MC–FCN gains relative improvements of 3.2% (0.833 vs. 0.807) in Jaccard index and 2.2% (0.893 vs. 0.874) in kappa coefficient, at the cost of only a 1.8% increase in model-training time. In addition, the sensitivity analysis demonstrates that constraints at different positions have inconsistent impacts on the performance of the MC–FCN. Full article
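The multi-constraint idea, adding a binary cross-entropy term on intermediate multi-scale predictions as well as on the final one, can be sketched as below; the equal weighting of the terms and the handling of side-output resolutions are assumptions, not the paper's exact recipe.

import torch
import torch.nn.functional as F

def multi_constraint_bce(side_outputs, final_output, target):
    """Sum of binary cross-entropy terms over intermediate and final predictions.

    side_outputs: list of (B, 1, h_i, w_i) logits from intermediate layers
    final_output: (B, 1, H, W) logits at full resolution
    target:       (B, 1, H, W) binary building mask (float tensor)
    """
    loss = F.binary_cross_entropy_with_logits(final_output, target)
    for s in side_outputs:
        t = F.interpolate(target, size=s.shape[-2:], mode="nearest")  # match side-output size
        loss = loss + F.binary_cross_entropy_with_logits(s, t)        # equal weights (assumption)
    return loss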

19 pages, 7990 KiB  
Article
Siamese-GAN: Learning Invariant Representations for Aerial Vehicle Image Categorization
by Laila Bashmal, Yakoub Bazi, Haikel AlHichri, Mohamad M. AlRahhal, Nassim Ammour and Naif Alajlan
Remote Sens. 2018, 10(2), 351; https://doi.org/10.3390/rs10020351 - 24 Feb 2018
Cited by 52 | Viewed by 12198
Abstract
In this paper, we present a new algorithm for cross-domain classification in aerial vehicle images based on generative adversarial networks (GANs). The proposed method, called Siamese-GAN, learns invariant feature representations for both labeled and unlabeled images coming from two different domains. To this end, we train in an adversarial manner a Siamese encoder–decoder architecture coupled with a discriminator network. The encoder–decoder network has the task of matching the distributions of both domains in a shared space regularized by the reconstruction ability, while the discriminator seeks to distinguish between them. After this phase, we feed the resulting encoded labeled and unlabeled features to another network composed of two fully-connected layers for training and classification, respectively. Experiments on several cross-domain datasets composed of extremely high resolution (EHR) images acquired by manned/unmanned aerial vehicles (MAV/UAV) over the cities of Vaihingen, Toronto, Potsdam, and Trento are reported and discussed. Full article
(This article belongs to the Special Issue Deep Learning for Remote Sensing)
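The adversarial alignment described above can be sketched as alternating updates: a discriminator tries to tell source-domain codes from target-domain codes, while the encoder–decoder tries to fool it and keep both domains reconstructable. The module sizes, optimizers and loss weights below are illustrative assumptions, not the published Siamese-GAN configuration.

```python
import torch
import torch.nn as nn

feat_dim, latent_dim = 128, 64                       # assumed feature/code sizes
encoder = nn.Sequential(nn.Linear(feat_dim, latent_dim), nn.ReLU())
decoder = nn.Sequential(nn.Linear(latent_dim, feat_dim))
discriminator = nn.Sequential(nn.Linear(latent_dim, 1))

bce = nn.BCEWithLogitsLoss()
mse = nn.MSELoss()
opt_g = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

def train_step(x_source, x_target, lambda_rec=1.0):
    # 1) Discriminator step: separate source-domain codes from target-domain codes.
    z_s, z_t = encoder(x_source).detach(), encoder(x_target).detach()
    d_loss = bce(discriminator(z_s), torch.ones(len(z_s), 1)) + \
             bce(discriminator(z_t), torch.zeros(len(z_t), 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Encoder-decoder step: fool the discriminator on target codes while the
    #    reconstruction terms regularize the shared space for both domains.
    z_s, z_t = encoder(x_source), encoder(x_target)
    g_loss = bce(discriminator(z_t), torch.ones(len(z_t), 1)) \
             + lambda_rec * (mse(decoder(z_s), x_source) + mse(decoder(z_t), x_target))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```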

24 pages, 13762 KiB  
Article
A Hierarchical Fully Convolutional Network Integrated with Sparse and Low-Rank Subspace Representations for PolSAR Imagery Classification
by Yan Wang, Chu He, Xinlong Liu and Mingsheng Liao
Remote Sens. 2018, 10(2), 342; https://doi.org/10.3390/rs10020342 - 23 Feb 2018
Cited by 36 | Viewed by 6378
Abstract
Inspired by the enormous success of the fully convolutional network (FCN) in semantic segmentation, as well as the similarity between semantic segmentation and pixel-by-pixel polarimetric synthetic aperture radar (PolSAR) image classification, effectively combining the unique polarimetric properties with an FCN is a promising direction for PolSAR image classification. Moreover, recent research shows that sparse and low-rank representations can convey valuable information for classification purposes. Therefore, this paper presents an effective PolSAR image classification scheme that integrates deep spatial patterns learned automatically by an FCN with sparse and low-rank subspace features: (1) shallow subspace learning based on sparse and low-rank graph embedding is first introduced to capture the local and global structures of the high-dimensional polarimetric data; (2) a pre-trained deep FCN-8s model is transferred to extract the nonlinear deep multi-scale spatial information of the PolSAR image; and (3) the shallow sparse and low-rank subspace features are integrated to boost the discrimination of the deep spatial features. The integrated hierarchical subspace features are then used for classification with a discriminative model. Extensive experiments on three real PolSAR datasets indicate that the proposed method achieves competitive performance, particularly when the available training samples are limited. Full article
(This article belongs to the Special Issue Deep Learning for Remote Sensing)
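The integration step of the scheme, stacking per-pixel deep FCN features with the shallow sparse and low-rank subspace features before a discriminative classifier, can be sketched as follows. Feature extraction itself is assumed to have been done elsewhere; the feature dimensions, the random placeholder data and the SVM classifier are illustrative assumptions only.

```python
import numpy as np
from sklearn.svm import SVC

H, W = 64, 64
deep_feats = np.random.rand(H, W, 32)      # placeholder: multi-scale spatial features from FCN-8s
subspace_feats = np.random.rand(H, W, 8)   # placeholder: sparse and low-rank subspace features

# Hierarchical feature: channel-wise concatenation per pixel.
hier_feats = np.concatenate([deep_feats, subspace_feats], axis=-1).reshape(-1, 40)

labels = np.random.randint(0, 4, size=H * W)              # placeholder ground-truth classes
train_idx = np.random.choice(H * W, 500, replace=False)    # limited training samples

# Discriminative model on the integrated hierarchical features.
clf = SVC(kernel="rbf").fit(hier_feats[train_idx], labels[train_idx])
pred_map = clf.predict(hier_feats).reshape(H, W)
```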

20 pages, 8427 KiB  
Article
A CNN-Based Fusion Method for Feature Extraction from Sentinel Data
by Giuseppe Scarpa, Massimiliano Gargiulo, Antonio Mazza and Raffaele Gaetano
Remote Sens. 2018, 10(2), 236; https://doi.org/10.3390/rs10020236 - 3 Feb 2018
Cited by 132 | Viewed by 15976
Abstract
Sensitivity to weather conditions, and especially to clouds, is a severe limiting factor for the use of optical remote sensing in Earth monitoring applications. A possible alternative is to rely on weather-insensitive synthetic aperture radar (SAR) images. In many real-world applications, critical decisions are made based on informative optical or radar features related to items such as water, vegetation or soil. Under cloudy conditions, however, optical-based features are not available, and they are commonly reconstructed through linear interpolation between data available at temporally close instants. In this work, we propose to estimate missing optical features through data fusion and deep learning. Several sources of information are taken into account (optical sequences, SAR sequences, a digital elevation model) so as to exploit both temporal and cross-sensor dependencies. Based on these data and a tiny cloud-free fraction of the target image, a compact convolutional neural network (CNN) is trained to perform the desired estimation. To validate the proposed approach, we focus on the estimation of the normalized difference vegetation index (NDVI), using coupled Sentinel-1 and Sentinel-2 time series acquired over an agricultural region of Burkina Faso from May to November 2016. Several fusion schemes are considered, causal and non-causal, single-sensor or joint-sensor, corresponding to different operating conditions. Experimental results are very promising, showing a significant gain over baseline methods according to all performance indicators. Full article
(This article belongs to the Special Issue Deep Learning for Remote Sensing)
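For reference, the target quantity and the kind of compact regressor the abstract describes can be sketched as follows: NDVI is the normalized difference between near-infrared and red reflectance, and a small fully convolutional network maps a stack of SAR, optical and elevation features to a per-pixel NDVI estimate. The channel counts and depth below are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

def ndvi(nir, red, eps=1e-6):
    """Normalized difference vegetation index, e.g. from Sentinel-2 bands B8 (NIR) and B4 (red)."""
    return (nir - red) / (nir + red + eps)

class CompactFusionCNN(nn.Module):
    """Compact fully convolutional regressor mapping a stack of input features
    (e.g., SAR backscatter at several dates, past optical features, DEM) to a
    per-pixel NDVI map. Channel counts are assumed, not taken from the paper."""
    def __init__(self, in_channels=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),   # one NDVI value per pixel
        )

    def forward(self, x):
        return self.net(x)

# Such a network would be trained (e.g., with an L1/L2 loss) only on the small
# cloud-free fraction of the target image, then applied to the cloudy part.
```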

18 pages, 11465 KiB  
Article
Building Extraction in Very High Resolution Remote Sensing Imagery Using Deep Learning and Guided Filters
by Yongyang Xu, Liang Wu, Zhong Xie and Zhanlong Chen
Remote Sens. 2018, 10(1), 144; https://doi.org/10.3390/rs10010144 - 19 Jan 2018
Cited by 397 | Viewed by 21511
Abstract
Very high resolution (VHR) remote sensing imagery has been widely used for land cover classification, and the focus is shifting from land-use classification to pixel-level semantic segmentation. Inspired by the recent success of deep learning and of filtering methods in computer vision, this work presents a segmentation model that builds an image segmentation neural network on deep residual networks and uses a guided filter to extract buildings from remote sensing imagery. The method comprises the following steps: first, the VHR remote sensing imagery is preprocessed and some hand-crafted features are calculated. Second, the designed deep network architecture is trained on urban-district remote sensing images to extract buildings at the pixel level. Third, a guided filter is employed to refine the classification map produced by deep learning, removing salt-and-pepper noise at the same time. Experimental results on the Vaihingen and Potsdam datasets demonstrate that the method, which benefits from neural networks and guided filtering, achieves a higher overall accuracy than other machine learning and deep learning methods, and performs particularly well at extracting buildings among the diverse objects found in urban districts. Full article
(This article belongs to the Special Issue Deep Learning for Remote Sensing)
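The guided-filter refinement step can be sketched with OpenCV's ximgproc module (shipped in opencv-contrib-python): the building probability map is filtered using the original VHR image as guidance, which sharpens boundaries and suppresses salt-and-pepper noise. The radius, regularization value and threshold below are illustrative assumptions, not the authors' settings.

```python
import numpy as np
import cv2  # guidedFilter lives in cv2.ximgproc (opencv-contrib-python)

def refine_with_guided_filter(probability_map, vhr_image, radius=8, eps=1e-3):
    """Refine a per-pixel building probability map with a guided filter that
    uses the original VHR image as guidance, then threshold to a binary mask."""
    guide = vhr_image.astype(np.float32) / 255.0     # guidance image in [0, 1]
    src = probability_map.astype(np.float32)         # network output in [0, 1]
    refined = cv2.ximgproc.guidedFilter(guide, src, radius, eps)
    return (refined > 0.5).astype(np.uint8)          # final binary building mask
```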

14 pages, 16161 KiB  
Article
Automatic Ship Detection in Remote Sensing Images from Google Earth of Complex Scenes Based on Multiscale Rotation Dense Feature Pyramid Networks
by Xue Yang, Hao Sun, Kun Fu, Jirui Yang, Xian Sun, Menglong Yan and Zhi Guo
Remote Sens. 2018, 10(1), 132; https://doi.org/10.3390/rs10010132 - 18 Jan 2018
Cited by 495 | Viewed by 19604
Abstract
Ship detection has long played a significant role in remote sensing, but it remains challenging. The main limitations of traditional ship detection methods usually lie in the complexity of application scenarios, the difficulty of detecting densely arranged objects, and the redundancy of the detection region. To address these problems, we propose a framework called Rotation Dense Feature Pyramid Networks (R-DFPN), which can effectively detect ships in different scenes, including ocean and port. Specifically, we put forward the Dense Feature Pyramid Network (DFPN), aimed at the problems caused by the narrow width of ships. Compared with previous multiscale detectors such as the Feature Pyramid Network (FPN), DFPN builds high-level semantic feature maps for all scales by means of dense connections, through which feature propagation is enhanced and feature reuse is encouraged. Additionally, to handle ship rotation and dense arrangement, we design a rotation anchor strategy that predicts the minimum circumscribed rectangle of the object, so as to reduce the redundant detection region and improve recall. Furthermore, we propose multiscale region of interest (ROI) Align to preserve the completeness of the semantic and spatial information. Experiments on remote sensing images from Google Earth show that our detection method based on the R-DFPN representation achieves state-of-the-art performance. Full article
(This article belongs to the Special Issue Deep Learning for Remote Sensing)
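The rotated regression target mentioned above, the minimum circumscribed rectangle of a ship, can be illustrated with OpenCV: the minimum-area rotated rectangle of a ship footprint yields the center, size and angle that a rotation-anchor detector regresses, and for long, narrow, densely packed ships it covers far less background than the axis-aligned box. The polygon below is an illustrative placeholder, not real data.

```python
import numpy as np
import cv2

# Placeholder footprint of a long, narrow ship (image coordinates).
ship_polygon = np.array([[120, 40], [200, 70], [190, 95], [110, 65]], dtype=np.float32)

# Minimum-area (circumscribed) rotated rectangle: center, size, angle.
(cx, cy), (w, h), angle = cv2.minAreaRect(ship_polygon)
print(f"rotated box: center=({cx:.1f}, {cy:.1f}), size=({w:.1f}, {h:.1f}), angle={angle:.1f} deg")

# Axis-aligned bounding box for comparison: much more background is enclosed.
x, y, bw, bh = cv2.boundingRect(ship_polygon.astype(np.int32))
print(f"axis-aligned box area: {bw * bh}, rotated box area: {w * h:.0f}")
```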

24 pages, 8706 KiB  
Article
Double Weight-Based SAR and Infrared Sensor Fusion for Automatic Ground Target Recognition with Deep Learning
by Sungho Kim, Woo-Jin Song and So-Hyun Kim
Remote Sens. 2018, 10(1), 72; https://doi.org/10.3390/rs10010072 - 11 Jan 2018
Cited by 26 | Viewed by 10062
Abstract
This paper presents a novel double weight-based synthetic aperture radar (SAR) and infrared (IR) sensor fusion method (DW-SIF) for automatic ground target recognition (ATR). IR-based ATR can provide accurate recognition because of its high image resolution, but it is affected by weather conditions. SAR-based ATR, on the other hand, shows a lower recognition rate due to its noisy low resolution, but provides consistent performance regardless of the weather. Fusing an active sensor (SAR) and a passive sensor (IR) can therefore lead to upgraded performance. This paper proposes a doubly weighted neural network fusion scheme at the decision level. The first weight (α) measures the offline sensor confidence per target category, based on the classification rate for an evaluation set. The second weight (β) measures the online sensor reliability, based on the score distribution for a test target image. A 14-layer deep convolutional network based on the LeNet architecture is used as the individual classifier. The doubly weighted sensor scores are fused by two schemes: a sum-based linear fusion scheme (αβ-sum) and a neural network-based nonlinear fusion scheme (αβ-NN). The experimental results confirm that the proposed linear fusion method (αβ-sum) has the best performance among the linear fusion schemes evaluated (SAR-CNN, IR-CNN, α-sum, β-sum, αβ-sum, and Bayesian fusion). In addition, the proposed nonlinear fusion method (αβ-NN) shows target recognition performance superior to linear fusion on the OKTAL-SE-based synthetic database. Full article
(This article belongs to the Special Issue Deep Learning for Remote Sensing)
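One plausible reading of the linear αβ-sum rule, with per-class SAR and IR scores scaled by the offline per-class confidence (α) and the online per-image reliability (β) before summation, is sketched below. The function name and all numerical values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def alpha_beta_sum(scores_sar, scores_ir, alpha_sar, alpha_ir, beta_sar, beta_ir):
    """Decision-level linear fusion (one possible interpretation of the alpha-beta-sum rule).

    scores_*: (num_classes,) softmax scores of each single-sensor CNN
    alpha_* : (num_classes,) offline per-class confidences from an evaluation set
    beta_*  : scalar online reliabilities estimated from the test image's score distribution
    """
    fused = alpha_sar * beta_sar * scores_sar + alpha_ir * beta_ir * scores_ir
    return int(np.argmax(fused))   # index of the recognized target category

# Illustrative values only (not from the paper):
scores_sar = np.array([0.2, 0.5, 0.3])
scores_ir = np.array([0.6, 0.3, 0.1])
alpha_sar = np.array([0.7, 0.9, 0.6])
alpha_ir = np.array([0.9, 0.8, 0.9])
print(alpha_beta_sum(scores_sar, scores_ir, alpha_sar, alpha_ir, beta_sar=0.8, beta_ir=0.5))
```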