Deep Learning and Computer Vision in Remote Sensing-III

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "AI Remote Sensing".

Deadline for manuscript submissions: closed (31 October 2024) | Viewed by 13,295

Special Issue Editors


Guest Editor Assistant
Department of Computing, University of Turku, Turku, Finland
Interests: machine learning; deep learning; computer vision; data analysis; pose estimation

Guest Editor Assistant
Department of Computing, University of Turku, Turku, Finland
Interests: artificial intelligence; machine learning; deep learning; human-computer interaction

Special Issue Information

Dear Colleagues,

Deep Learning (DL) has been successfully applied to a wide range of computer vision tasks, exhibiting state-of-the-art performance. For this reason, most data fusion architectures for computer vision tasks are built on DL. In addition, DL holds great potential for processing multi-sensor data, which usually contain rich information in their raw form but are demanding in terms of training time and model size.

We are pleased to announce this Part III Special Issue, which follows on from Parts I and II, focusing on deep learning and computer vision methods for remote sensing. This Special Issue will provide researchers with the opportunity to present recent advances in deep learning, with a specific focus on three main computer vision tasks: classification, detection, and segmentation. We seek collaborative contributions from experts in academia and industry working in the fields of deep learning, computer vision, data science, and remote sensing.

The scope of this Special Issue includes, but is not limited to, the following topics:

  • Satellite image processing and analysis based on deep learning;
  • Deep learning for object detection, image classification, and semantic and instance segmentation;
  • Deep learning for remote sensing scene understanding and classification;
  • Transfer learning and deep reinforcement learning for remote sensing;
  • Supervised and unsupervised representation learning for remote sensing environments;
  • Applications of deep learning and computer vision in remote sensing.

Dr. Fahimeh Farahnakian
Prof. Dr. Jukka Heikkonen
Guest Editors

Pouya Jafarzadeh
Farshad Farahnakian
Guest Editor Assistants

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Computer vision
  • Deep learning
  • Machine learning
  • Remote sensing
  • Sensor fusion
  • Autonomous systems

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (9 papers)

Research

22 pages, 5856 KiB  
Article
Automated Recognition of Snow-Covered and Icy Road Surfaces Based on T-Net of Mount Tianshan
by Jingqi Liu, Yaonan Zhang, Jie Liu, Zhaobin Wang and Zhixing Zhang
Remote Sens. 2024, 16(19), 3727; https://doi.org/10.3390/rs16193727 - 7 Oct 2024
Cited by 1 | Viewed by 1077
Abstract
The Tianshan Expressway plays a crucial role in China’s “Belt and Road” strategy, yet the extreme climate of the Tianshan Mountains poses significant traffic safety risks, hindering local economic development. Efficient detection of hazardous road surface conditions (RSCs) is vital to address these challenges. The complexity and variability of RSCs in the region, exacerbated by harsh weather, make traditional surveillance methods inadequate for real-time monitoring. To overcome these limitations, a vision-based artificial intelligence approach is urgently needed to ensure effective, real-time detection of dangerous RSCs in the Tianshan road network. This paper analyzes the primary structures and architectures of mainstream neural networks and explores their performance for RSC recognition through a comprehensive set of experiments, filling a research gap. Additionally, T-Net, specifically designed for the Tianshan Expressway engineering project, is built upon the optimal architecture identified in this study. Leveraging the split-transform-merge structure paradigm and asymmetric convolution, the model excels in capturing detailed information by learning features across multiple dimensions and perspectives. Furthermore, the integration of channel, spatial, and multi-head attention modules enhances the weighting of key features, making the T-Net particularly effective in recognizing the characteristics of snow-covered and icy road surfaces. All models presented in this paper were trained on a custom RSC dataset, compiled from various sources. Experimental results indicate that the T-Net outperforms fourteen once state-of-the-art (SOTA) models and three models specifically designed for RSC recognition, with 97.44% accuracy and 9.79% loss on the validation set. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-III)
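
For readers unfamiliar with the building blocks named in this abstract, the following is a minimal, illustrative PyTorch sketch of an asymmetric convolution used inside a split-transform-merge block. It is not the authors' T-Net code; the channel sizes, branch count, and layer ordering are assumptions made purely for illustration.

```python
# Illustrative sketch only: a k x k convolution factorized into k x 1 and 1 x k
# (asymmetric convolution), applied per channel group in a split-transform-merge
# block. Channel sizes and branch count are assumptions, not T-Net's values.
import torch
import torch.nn as nn


class AsymmetricBranch(nn.Module):
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0)),
            nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2)),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.conv(x)


class SplitTransformMerge(nn.Module):
    def __init__(self, channels: int, branches: int = 4):
        super().__init__()
        assert channels % branches == 0
        self.branches = branches
        self.transforms = nn.ModuleList(
            [AsymmetricBranch(channels // branches) for _ in range(branches)]
        )

    def forward(self, x):
        # Split channels into groups, transform each group independently,
        # merge the results, and keep a residual connection to the input.
        chunks = torch.chunk(x, self.branches, dim=1)
        merged = torch.cat([t(c) for t, c in zip(self.transforms, chunks)], dim=1)
        return merged + x


x = torch.randn(1, 64, 56, 56)
print(SplitTransformMerge(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```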

26 pages, 23127 KiB  
Article
MEFSR-GAN: A Multi-Exposure Feedback and Super-Resolution Multitask Network via Generative Adversarial Networks
by Sibo Yu, Kun Wu, Guang Zhang, Wanhong Yan, Xiaodong Wang and Chen Tao
Remote Sens. 2024, 16(18), 3501; https://doi.org/10.3390/rs16183501 - 21 Sep 2024
Viewed by 557
Abstract
In applications such as satellite remote sensing and aerial photography, imaging equipment must capture brightness information of different ground scenes within a restricted dynamic range. Due to camera sensor limitations, captured images can represent only a portion of such information, which results in lower resolution and lower dynamic range compared with real scenes. Image super resolution (SR) and multiple-exposure image fusion (MEF) are commonly employed technologies to address these issues. Nonetheless, these two problems are often researched in separate directions. In this paper, we propose MEFSR-GAN: an end-to-end framework based on generative adversarial networks that simultaneously combines super-resolution and multiple-exposure fusion. MEFSR-GAN includes a generator and two discriminators. The generator network consists of two parallel sub-networks for under-exposure and over-exposure, each containing a feature extraction block (FEB), a super-resolution block (SRB), and several multiple-exposure feedback blocks (MEFBs). It processes low-resolution under- and over-exposed images to produce high-resolution high dynamic range (HDR) images. These images are evaluated by two discriminator networks, driving the generator to generate realistic high-resolution HDR outputs through multi-goal training. Extensive qualitative and quantitative experiments were conducted on the SICE dataset, yielding a PSNR of 24.821 and an SSIM of 0.896 for 2× upscaling. These results demonstrate that MEFSR-GAN outperforms existing methods in terms of both visual effects and objective evaluation metrics, thereby establishing itself as a state-of-the-art technology. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-III)
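
The PSNR figure quoted above is a standard full-reference metric. For reference, here is a small NumPy sketch of how such a value is typically computed; this is generic practice rather than the authors' evaluation pipeline, and the image data below are synthetic placeholders.

```python
# Generic PSNR computation for an image pair scaled to [0, 1]; not MEFSR-GAN's
# evaluation code. The reference/reconstruction arrays here are synthetic.
import numpy as np


def psnr(reference: np.ndarray, reconstruction: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((reference.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)


ref = np.random.rand(256, 256, 3)
rec = np.clip(ref + np.random.normal(scale=0.02, size=ref.shape), 0.0, 1.0)
print(f"PSNR: {psnr(ref, rec):.2f} dB")
```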

27 pages, 59331 KiB  
Article
AerialFormer: Multi-Resolution Transformer for Aerial Image Segmentation
by Taisei Hanyu, Kashu Yamazaki, Minh Tran, Roy A. McCann, Haitao Liao, Chase Rainwater, Meredith Adkins, Jackson Cothren and Ngan Le
Remote Sens. 2024, 16(16), 2930; https://doi.org/10.3390/rs16162930 - 9 Aug 2024
Cited by 2 | Viewed by 1794
Abstract
When performing remote sensing image segmentation, practitioners often encounter various challenges, such as a strong imbalance in the foreground–background, the presence of tiny objects, high object density, intra-class heterogeneity, and inter-class homogeneity. To overcome these challenges, this paper introduces AerialFormer, a hybrid model that strategically combines the strengths of Transformers and Convolutional Neural Networks (CNNs). AerialFormer features a CNN Stem module integrated to preserve low-level and high-resolution features, enhancing the model’s capability to process details of aerial imagery. The proposed AerialFormer is designed with a hierarchical structure, in which a Transformer encoder generates multi-scale features and a multi-dilated CNN (MDC) decoder aggregates the information from the multi-scale inputs. As a result, information is taken into account in both local and global contexts, so that powerful representations and high-resolution segmentation can be achieved. The proposed AerialFormer was benchmarked on three benchmark datasets, including iSAID, LoveDA, and Potsdam. Comprehensive experiments and extensive ablation studies show that the proposed AerialFormer remarkably outperforms state-of-the-art methods. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-III)
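
As a rough illustration of the multi-dilated convolution idea named in the abstract, the sketch below fuses parallel dilated convolutions over a single decoder feature map. The dilation rates, channel sizes, and fusion layer are assumptions for illustration and are not taken from AerialFormer.

```python
# Illustrative multi-dilated convolution block: parallel 3x3 convolutions with
# different dilation rates, concatenated and fused by a 1x1 convolution.
# Dilation rates and channel sizes are assumptions, not the paper's values.
import torch
import torch.nn as nn


class MultiDilatedBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d) for d in dilations
        ])
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, 1)

    def forward(self, x):
        # Each branch sees a different receptive field; the 1x1 fusion mixes
        # local detail with wider context before the next decoder stage.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


feat = torch.randn(1, 256, 32, 32)
print(MultiDilatedBlock(256, 128)(feat).shape)  # torch.Size([1, 128, 32, 32])
```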

19 pages, 4475 KiB  
Article
A Multi-Level Cross-Attention Image Registration Method for Visible and Infrared Small Unmanned Aerial Vehicle Targets via Image Style Transfer
by Wen Jiang, Hanxin Pan, Yanping Wang, Yang Li, Yun Lin and Fukun Bi
Remote Sens. 2024, 16(16), 2880; https://doi.org/10.3390/rs16162880 - 7 Aug 2024
Cited by 2 | Viewed by 1370
Abstract
Small UAV target detection and tracking based on cross-modality image fusion have gained widespread attention. Due to the limited feature information available from small UAVs in images, where they occupy a minimal number of pixels, the precision required for detection and tracking algorithms is particularly high in complex backgrounds. Image fusion techniques can enrich the detailed information for small UAVs, showing significant advantages under extreme lighting conditions. Image registration is a fundamental step preceding image fusion. It is essential to achieve accurate image alignment before proceeding with image fusion to prevent severe ghosting and artifacts. This paper specifically focused on the alignment of small UAV targets within infrared and visible light imagery. To address this issue, this paper proposed a cross-modality image registration network based on deep learning, which includes a structure preservation and style transformation network (SPSTN) and a multi-level cross-attention residual registration network (MCARN). Firstly, the SPSTN is employed for modality transformation, transferring the cross-modality task into a single-modality task to reduce the information discrepancy between modalities. Then, the MCARN is utilized for single-modality image registration, capable of deeply extracting and fusing features from pseudo infrared and visible images to achieve efficient registration. To validate the effectiveness of the proposed method, comprehensive experimental evaluations were conducted on the Anti-UAV dataset. The extensive evaluation results validate the superiority and universality of the cross-modality image registration framework proposed in this paper, which plays a crucial role in subsequent image fusion tasks for more effective target detection. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-III)
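
The abstract above centres on cross-attention between features of the two (pseudo-)modalities. A minimal sketch of that general mechanism, using PyTorch's stock multi-head attention, is shown below; the token shapes and the residual/normalisation layout are assumptions rather than the paper's MCARN design.

```python
# Illustrative cross-attention between visible and (pseudo-)infrared feature
# tokens: queries from one modality attend to keys/values from the other.
# Dimensions and layer layout are assumptions, not the MCARN architecture.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visible_tokens, infrared_tokens):
        # Each visible-image token attends to the most relevant infrared
        # locations; a residual connection keeps the original features.
        fused, _ = self.attn(visible_tokens, infrared_tokens, infrared_tokens)
        return self.norm(visible_tokens + fused)


vis = torch.randn(2, 1024, 256)  # (batch, tokens, dim) from the visible image
ir = torch.randn(2, 1024, 256)   # tokens from the pseudo-infrared image
print(CrossAttentionFusion()(vis, ir).shape)  # torch.Size([2, 1024, 256])
```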

23 pages, 2574 KiB  
Article
Detection Based on Semantics and a Detail Infusion Feature Pyramid Network and a Coordinate Adaptive Spatial Feature Fusion Mechanism Remote Sensing Small Object Detector
by Shilong Zhou and Haijin Zhou
Remote Sens. 2024, 16(13), 2416; https://doi.org/10.3390/rs16132416 - 1 Jul 2024
Cited by 2 | Viewed by 1638
Abstract
In response to the challenges of remote sensing imagery, such as unmanned aerial vehicle (UAV) aerial imagery, including differences in target dimensions, the dominance of small targets, and dense clutter and occlusion in complex environments, this paper optimizes the YOLOv8n model and proposes an innovative small-object-detection model called DDSC-YOLO. First, a DualC2f structure is introduced to improve the feature-extraction capabilities of the model. This structure uses dual-convolutions and group convolution techniques to effectively address the issues of cross-channel communication and preserving information in the original input feature mappings. Next, a new attention mechanism, DCNv3LKA, was developed. This mechanism uses adaptive and fine-grained information-extraction methods to simulate receptive fields similar to self-attention, allowing adaptation to a wide range of target size variations. To address the problem of false and missed detection of small targets in aerial photography, we designed a Semantics and Detail Infusion Feature Pyramid Network (SDI-FPN) and added a dedicated detection scale specifically for small targets, effectively mitigating the loss of contextual information in the model. In addition, the coordinate adaptive spatial feature fusion (CASFF) mechanism is used to optimize the original detection head, effectively overcoming multi-scale information conflicts while significantly improving small target localization accuracy and long-range dependency perception. Testing on the VisDrone2019 dataset shows that the DDSC-YOLO model improves the mAP0.5 by 9.3% over YOLOv8n, and its performance on the SSDD and RSOD datasets also confirms its superior generalization capabilities. These results confirm the effectiveness and significant progress of our novel approach to small target detection. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-III)
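
As a toy illustration of the dual-convolution-plus-group-convolution idea mentioned for the DualC2f structure, the sketch below sums a grouped 3×3 branch with a pointwise 1×1 branch. The group count and channel sizes are illustrative assumptions, not the paper's configuration.

```python
# Toy dual-convolution sketch: a grouped 3x3 convolution (cheap, per-group
# spatial filtering) plus a pointwise 1x1 convolution (cross-channel mixing),
# summed. Group count and channels are assumptions, not DDSC-YOLO's values.
import torch
import torch.nn as nn


class DualConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, groups: int = 4):
        super().__init__()
        self.group_conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, groups=groups)
        self.point_conv = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        # The grouped branch keeps cost low while the pointwise branch preserves
        # cross-channel communication with the original input feature map.
        return self.group_conv(x) + self.point_conv(x)


x = torch.randn(1, 64, 80, 80)
print(DualConv(64, 64)(x).shape)  # torch.Size([1, 64, 80, 80])
```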

17 pages, 14926 KiB  
Article
No-Reference Hyperspectral Image Quality Assessment via Ranking Feature Learning
by Yuyan Li, Yubo Dong, Haoyong Li, Danhua Liu, Fang Xue and Dahua Gao
Remote Sens. 2024, 16(10), 1657; https://doi.org/10.3390/rs16101657 - 8 May 2024
Cited by 2 | Viewed by 1042
Abstract
In hyperspectral image (HSI) reconstruction tasks, due to the lack of ground truth in real imaging processes, models are usually trained and validated on simulation datasets and then tested on real measurements captured by real HSI imaging systems. However, due to the gap between the simulation imaging process and the real imaging process, the best model validated on the simulation dataset may fail on real measurements. To obtain the best model for the real-world task, it is crucial to design a suitable no-reference HSI quality assessment metric to reflect the reconstruction performance of different models. In this paper, we propose a novel no-reference HSI quality assessment metric via ranking feature learning (R-NHSIQA), which calculates the Wasserstein distance between the distribution of the deep features of the reconstructed HSIs and the benchmark distribution. Additionally, by introducing the spectral self-attention mechanism, we propose a Spectral Transformer (S-Transformer) to extract the spatial-spectral representative deep features of HSIs. Furthermore, to extract quality-sensitive deep features, we use quality ranking as a pre-training task to enhance the representation capability of the S-Transformer. Finally, we introduce the Wasserstein distance to measure the distance between the distribution of the deep features and the benchmark distribution, improving the assessment capacity of our method, even with non-overlapping distributions. The experimental results demonstrate that the proposed metric yields consistent results with multiple full-reference image quality assessment (FR-IQA) metrics, validating the idea that the proposed metric can serve as a substitute for FR-IQA metrics in real-world tasks. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-III)
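
The key scoring step described above is a Wasserstein distance between a reconstruction's deep-feature distribution and a benchmark distribution. The sketch below illustrates that idea with SciPy's 1-D Wasserstein distance averaged over feature dimensions; this per-dimension averaging and the synthetic features are assumptions for illustration, not the paper's exact formulation.

```python
# Toy illustration of distribution-based quality scoring: compare test features
# against benchmark features with a 1-D Wasserstein distance per feature
# dimension, then average. Not the R-NHSIQA formulation; features are synthetic.
import numpy as np
from scipy.stats import wasserstein_distance


def feature_distribution_score(bench_feats: np.ndarray, test_feats: np.ndarray) -> float:
    """Average 1-D Wasserstein distance across feature dimensions.

    bench_feats, test_feats: arrays of shape (num_samples, feature_dim).
    Lower scores mean the test features look more like the benchmark.
    """
    dims = bench_feats.shape[1]
    return float(np.mean([
        wasserstein_distance(bench_feats[:, d], test_feats[:, d]) for d in range(dims)
    ]))


rng = np.random.default_rng(0)
bench = rng.normal(size=(500, 64))
good = rng.normal(size=(500, 64))            # similar distribution -> low score
bad = rng.normal(loc=0.8, size=(500, 64))    # shifted distribution -> higher score
print(feature_distribution_score(bench, good), feature_distribution_score(bench, bad))
```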

35 pages, 18684 KiB  
Article
Deep Learning Test Platform for Maritime Applications: Development of the eM/S Salama Unmanned Surface Vessel and Its Remote Operations Center for Sensor Data Collection and Algorithm Development
by Juha Kalliovaara, Tero Jokela, Mehdi Asadi, Amin Majd, Juhani Hallio, Jani Auranen, Mika Seppänen, Ari Putkonen, Juho Koskinen, Tommi Tuomola, Reza Mohammadi Moghaddam and Jarkko Paavola
Remote Sens. 2024, 16(9), 1545; https://doi.org/10.3390/rs16091545 - 26 Apr 2024
Viewed by 1489
Abstract
In response to the global megatrends of digitalization and transportation automation, Turku University of Applied Sciences has developed a test platform to advance autonomous maritime operations. This platform includes the unmanned surface vessel eM/S Salama and a remote operations center, both of which are detailed in this article. The article highlights the importance of collecting and annotating multi-modal sensor data from the vessel. These data are vital for developing deep learning algorithms that enhance situational awareness and guide autonomous navigation. By securing relevant data from maritime environments, we aim to enhance the autonomous features of unmanned surface vessels using deep learning techniques. The annotated sensor data will be made available for further research through open access. An image dataset, which includes synthetically generated weather conditions, is published alongside this article. While existing maritime datasets predominantly rely on RGB cameras, our work underscores the need for multi-modal data to advance autonomous capabilities in maritime applications. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-III)

24 pages, 6324 KiB  
Article
A Bio-Inspired Visual Perception Transformer for Cross-Domain Semantic Segmentation of High-Resolution Remote Sensing Images
by Xinyao Wang, Haitao Wang, Yuqian Jing, Xianming Yang and Jianbo Chu
Remote Sens. 2024, 16(9), 1514; https://doi.org/10.3390/rs16091514 - 25 Apr 2024
Cited by 1 | Viewed by 915
Abstract
Pixel-level classification of very-high-resolution images is a crucial yet challenging task in remote sensing. While transformers have demonstrated effectiveness in capturing dependencies, their tendency to partition images into patches may restrict their applicability to highly detailed remote sensing images. To extract latent contextual semantic information from high-resolution remote sensing images, we proposed a gaze–saccade transformer (GSV-Trans) with visual perceptual attention. GSV-Trans incorporates a visual perceptual attention (VPA) mechanism that dynamically allocates computational resources based on the semantic complexity of the image. The VPA mechanism includes both gaze attention and eye movement attention, enabling the model to focus on the most critical parts of the image and acquire competitive semantic information. Additionally, to capture contextual semantic information across different levels in the image, we designed an inter-layer short-term visual memory module with bidirectional affinity propagation to guide attention allocation. Furthermore, we introduced a dual-branch pseudo-label module (DBPL) that imposes pixel-level and category-level semantic constraints on both gaze and saccade branches. DBPL encourages the model to extract domain-invariant features and align semantic information across different domains in the feature space. Extensive experiments on multiple pixel-level classification benchmarks confirm the effectiveness and superiority of our method over the state of the art. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-III)

23 pages, 40077 KiB  
Article
MineCam: Application of Combined Remote Sensing and Machine Learning for Segmentation and Change Detection of Mining Areas Enabling Multi-Purpose Monitoring
by Katarzyna Jabłońska, Marcin Maksymowicz, Dariusz Tanajewski, Wojciech Kaczan, Maciej Zięba and Marek Wilgucki
Remote Sens. 2024, 16(6), 955; https://doi.org/10.3390/rs16060955 - 8 Mar 2024
Viewed by 1593
Abstract
Our study addresses the need for universal monitoring solutions given the diverse environmental impacts of surface mining operations. We present a solution combining remote sensing and machine learning techniques, utilizing a dataset of over 2000 satellite images annotated with ten distinct labels indicating mining area components. We tested various approaches to develop comprehensive yet universal machine learning models for mining area segmentation. This involved considering different types of mines, raw materials, and geographical locations. We evaluated multiple satellite data set combinations to determine optimal outcomes. The results suggest that radar and multispectral data fusion did not significantly improve the models’ performance, and the addition of further channels led to the degradation of the metrics. Despite variations in mine type or extracted material, the models’ effectiveness remained within an Intersection over Union value range of 0.65–0.75. Further, in this research, we conducted a detailed visual analysis of the models’ outcomes to identify areas requiring additional attention, contributing to the discourse on effective mining area monitoring and management methodologies. The visual examination of models’ outputs provides insights for future model enhancement and highlights unique segmentation challenges within mining areas. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-III)
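
The 0.65–0.75 range quoted above refers to Intersection over Union. For reference, a minimal NumPy sketch of IoU for binary segmentation masks is given below; the mask shapes and values are synthetic placeholders, not outputs of the MineCam models.

```python
# Generic Intersection over Union (IoU) for binary segmentation masks; the
# masks below are synthetic placeholders used only to demonstrate the metric.
import numpy as np


def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """IoU for boolean masks of the same shape (1.0 when both are empty)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(intersection) / union if union else 1.0


pred = np.zeros((128, 128), dtype=bool)
pred[20:90, 20:90] = True
gt = np.zeros((128, 128), dtype=bool)
gt[30:100, 30:100] = True
print(f"IoU = {iou(pred, gt):.3f}")
```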