1. Introduction
Landslides represent critical geohazards influenced by climatic, seismic, and human-induced factors, resulting in significant loss of life, infrastructure, and economic resources [1,2,3]. With current weather patterns, urban extension, and population growth, landslides are expected to increase, particularly in mountainous regions worldwide [4,5]. Hence, effective mitigation strategies and risk reduction measures are essential. Extensive monitoring, forecasting, and detection form the foundation of landslide risk management [1,6]. Monitoring involves tracking the displacement of landslides over time, while forecasting, or prediction, involves timely estimation of potential events. Detection provides detailed and precise information about landslide occurrences. Continuous monitoring is crucial for predicting landslides, while detection facilitates the identification of key parameters, such as location and scale, essential for reducing their cascading effects. Furthermore, accurate detection of landslide events is vital for post-hazard recovery efforts and for preparing up-to-date inventories. Detecting landslide events also aids in mapping vulnerable areas, which is essential for sustainable planning. Consequently, there is a growing need to detect landslide events rapidly and precisely.
The traditional approach to landslide extraction relied upon ground-based measurements [7,8], which were time-consuming, labor-intensive, and less effective in emergency response situations [7]. However, with advancements in remote sensing technology, high-resolution images have become vital in landslide hazard assessment studies [1].
Four primary techniques [8] leverage remote sensing data for landslide detection: visual interpretation, pixel-based, object-oriented, and artificial intelligence (AI) methods. Although they draw on expert knowledge, visual interpretation methods are time-consuming and may not meet the requirements for rapid response actions [7]. Pixel-based approaches overcome the restrictions of visual analysis by employing binary classification algorithms [8] to assign pixels in an image to specific categories (e.g., landslide and background). However, accurately distinguishing objects with spectral appearances similar to landslides can be challenging. Object-oriented procedures rely on multiscale segmentations that consider image primitives such as texture, shape, and spectrum [9]. These techniques require empirical settings (threshold criteria) and may struggle with the rapid segmentation of large geographical areas using high-resolution satellite imagery [10]. Keyport et al. (2018) [11] conducted a comparative study to infer the capabilities and limitations of both methods for landslide detection. Furthermore, AI-based techniques have witnessed remarkable advancements across multiple fields, including hazard exploration. In parallel, remote sensing-centered big data [12,13] have provided auxiliary support to AI initiatives.
Additionally, machine learning (ML) algorithms have shown significant progress in landslide prevention and assessment studies [4]. Specifically, logistic regression [14], support vector machines [15], random forest [16], and decision trees [17] have been prominent. With advancements in computational tools, particularly graphical processing units (GPUs), deep learning (a subdiscipline of ML) algorithms have demonstrated exceptional performance across various fields, including object detection [18], image segmentation [19], and scene classification [20]. Moreover, deep learning has been successfully applied in geohazard analysis, including earthquakes, avalanches, and landslides [6].
Previously, various deep learning algorithms, particularly convolutional neural networks (CNNs) with diverse architectures, have been employed for landslide information recognition [21], specifically Mask-RCNN [9,21] and U-Net [22,23,24]. Currently, YOLO (you only look once) models have gained popularity in computer vision [25]. For landslide detection, a series of YOLO models, namely, YOLOv3 [8], YOLOv4 [26], YOLOv5 [27], YOLOv7 [28], and YOLOv8 [29], have been augmented and applied. In addition, attention modules [30], inspired by human visual mechanisms, have been integrated into CNN architectures to enhance detection accuracy, specifically 3D-SCAM [31], SW-MSA [32], YOLO-SA [2], and LA-YOLO-LLL [33]. Despite these notable contributions, identifying a suitable deep learning-based model for landslide detection remains a challenge. Moreover, the evaluation of attention models fused within one-stage detectors is still at a preliminary stage and therefore warrants investigation. To address these concerns, we propose an attention-driven generalized efficient layer aggregation network (GELAN) that leverages the strength and efficiency of this architecture for landslide detection from satellite images of complex environments. The major contributions of this research work are threefold:
Developing an intelligent and optimized GELAN for automated landslide mapping in the Himalayan terrain.
Incorporating attention-based cognitive models, mainly CBAM and ECA, within GELAN to augment landslide hazard analysis.
Providing a comprehensive model evaluation and comparison considering earth observational images of different geological and geomorphological settings.
This paper is structured as follows: Section 2 provides the details of the study area and dataset, while Section 3 outlines the overall methodology. Section 4 presents the results and Section 5 contains the discussion. Lastly, Section 6 includes the conclusion.
2. Study Area and Dataset
The study area is in Nepal, the Greater Himalayas (Figure 1), and comprises 275 8-bit GeoTIFF Landsat images. This region is prone to landslides due to various factors such as seismic activity, topographical conditions, seasonal rainfall, and human activities such as deforestation [34,35]. The Nepal region has experienced significant economic losses and numerous fatalities from natural disasters such as earthquakes and landslides. For instance, the Gorkha earthquake, with a magnitude of 7.8, occurred on 25 April 2015 and was followed by aftershocks on 12 May 2015, resulting in extensive damage and approximately 9000 fatalities [34,36]. The land cover pattern of the study site includes bare soil, vegetation, water bodies, rocks, and urban areas, making the extraction of landslide information from this complex environment particularly challenging. The dataset is available at https://zenodo.org/record/3675410#.ZBv3UZBxD8 (accessed on 15 March 2024). Figure 2 represents sample images of the Nepal landslide detection database.
3. Methodology
This section outlines the technique for automated landslide recognition, which encompasses four key stages: (1) data preparation, (2) model development, (3) experimental settings, and (4) evaluation measures. Algorithm 1 provides the pseudocode for the proposed method.
Algorithm 1 Pseudocode of the novel GELAN based on an attention mechanism.
1: Input: Labeled landslide-based satellite imagery
2: Output: Predicted bounding box coordinates
3: Annotation transformation: Convert annotations into YOLO format.
4: Data formation: Organize data into training, validation, and test sets.
5: Model training:
6:  GELANc and GELANc+attention
7:  GELANe and GELANe+attention
8:  YOLOv9c, YOLOv9e, and YOLOv9c+attention
9: for each epoch do
10:  Hyperparameter tuning: Fine-tune image size, batch size, etc.
11:  Calculate the loss function (classification, bounding box, and confidence), and update the network’s parameters.
12:  Model validation: Evaluate detection performance on the validation dataset.
13: end for
14: Save the model weights with the best performance (best_weight.pt).
15: Model testing: Apply the trained network to detect landslides in test images.
16: Predicted bounding box coordinates for detected landslides.
3.1. Data Preparation
The dataset initially contained annotations represented as masks in .png format. To meet the requirements of the YOLO format, we transformed the annotations into five elements: object class ID, X_center, Y_center, width, and height. Algorithm 2 represents the procedure of annotation conversion.
Algorithm 2 Procedure for annotation conversion.
1: Input: Satellite images with landslide events and ground truth images in binary format.
2: Output: Annotations in YOLO format.
3: Read ground truth images (masks).
4: Convert images to grayscale.
5: Calculate the height (h) and width (w) of the image.
6: Contour estimation: Identify the edges of the regions in the ground truth image.
7: for each identified contour do
8:  Get the minimum and maximum coordinates of the bounding box:
9:  minimum x-coordinate (x_min), minimum y-coordinate (y_min), maximum x-coordinate (x_max), maximum y-coordinate (y_max).
10: end for
11: Apply non-maximum suppression to remove overlapping bounding boxes.
12: for each remaining bounding box do
13:  Calculate normalized bounding box coordinates:
   x_center = (x_min + x_max) / (2w), y_center = (y_min + y_max) / (2h),
   box_width = (x_max − x_min) / w, box_height = (y_max − y_min) / h.
14:  Create a line in the YOLO annotation file in the format:
   &lt;class_id&gt; &lt;x_center&gt; &lt;y_center&gt; &lt;box_width&gt; &lt;box_height&gt;.
15: end for
16: Repeat step 7 for all remaining bounding boxes.
17: Save YOLO annotations.
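To make the conversion concrete, a minimal Python sketch of Algorithm 2 is given below. It assumes binary .png masks readable with OpenCV; the function name, paths, and threshold are illustrative, and the explicit non-maximum suppression step of Algorithm 2 is approximated here by extracting only external contours.

```python
import cv2

def mask_to_yolo(mask_path, out_path, class_id=0):
    """Convert a binary landslide mask into YOLO bounding-box annotations (illustrative sketch)."""
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
    h, w = mask.shape[:2]
    # Threshold to a clean binary image and find the outer contours of landslide regions.
    _, binary = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    lines = []
    for cnt in contours:
        x, y, bw, bh = cv2.boundingRect(cnt)  # x_min, y_min, box width, box height
        # Normalize the box centre and size to [0, 1], as required by the YOLO format.
        x_c = (x + bw / 2) / w
        y_c = (y + bh / 2) / h
        lines.append(f"{class_id} {x_c:.6f} {y_c:.6f} {bw / w:.6f} {bh / h:.6f}")
    with open(out_path, "w") as f:
        f.write("\n".join(lines))
```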
Moreover, we created a YAML file comprising essential information, specifically the file paths for the training, validation, and testing sets and the number and names of classes. For experimentation, we subdivided the annotated data into training (70%), validation (20%), and testing (10%) sets.
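For illustration, such a dataset configuration could be generated as follows; the directory paths, file name, and class name are placeholders rather than the exact values used in this study.

```python
import yaml  # PyYAML

# Hypothetical dataset configuration; directory paths and class name are placeholders.
dataset_cfg = {
    "train": "datasets/landslide/images/train",  # 70% of the annotated images
    "val": "datasets/landslide/images/val",      # 20% for validation
    "test": "datasets/landslide/images/test",    # 10% for testing
    "nc": 1,                                     # number of classes
    "names": ["landslide"],                      # class names
}

with open("landslide.yaml", "w") as f:
    yaml.safe_dump(dataset_cfg, f, sort_keys=False)
```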
3.2. Model Development
This section illustrates the background of the original YOLOv9 algorithm and describes the applied attention modules, particularly CBAM and ECA. Subsequently, a detailed explanation of the proposed GELAN+attention network including training details and evaluation measures will be given.
3.2.1. YOLOv9 Algorithm
Neural networks usually come across the challenge of information loss as the input data undergo numerous layers of feature extraction and spatial transformation, leading to the loss or degradation of the original information. Hence, to address this concern, the YOLOv9 model [37] uses PGI (programmable gradient information) and GELAN (generalized efficient layer aggregation network) for effective key feature extraction.
PGI is a supporting framework aimed at managing the propagation of gradient information across several semantic levels to enrich the detection performance of the network. PGI consists of three primary modules: (1) the main branch, (2) the auxiliary reversible branch, and (3) multilevel auxiliary information. During the inference stage, only the main branch is accountable for both forward- and backpropagation. As the network becomes deeper, an information bottleneck may occur, leading to loss functions that fail to produce valuable gradients. In such cases, reversible functions, employed by the auxiliary reversible branch, become active, preserving information integrity and minimizing information degradation in the main branch. Further, multilevel auxiliary information addresses the problem of errors commencing from the deep supervision process, improving the learning capacity of the model through the introduction of supplementary information at multiple levels.
During model training, to boost information integration and propagation efficiency, the YOLOv9 model presents an innovative lightweight network architecture called GELAN. It assimilates ELAN (efficient layer aggregation network) and CSPNet (cross-stage partial network) to effectively combine network information, thereby minimizing information loss during propagation and improving interlayer information communication. This architecture is well suited for hazard assessment with restricted computing resources because of its lesser parameter count and computational complexity. Additionally, based on the parameter count, four variants of YOLOv9 are released, namely, v9-S, v9-M, v9-C, and v9-E, pretrained on the MS-COCO dataset, providing a balance between speed and accuracy. However, considering the size of the database and the available computational resources, this study concentrates on GELANc, GELANe, YOLOv9c, and YOLOv9e only.
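As a rough conceptual sketch only (not the actual RepNCSPELAN implementation of YOLOv9), the GELAN idea of combining a CSP-style channel split with ELAN-style aggregation of successive computation blocks can be expressed in PyTorch as follows; the channel sizes and number of inner blocks are illustrative.

```python
import torch
import torch.nn as nn

class ConvBNAct(nn.Module):
    """Convolution followed by batch normalization and SiLU activation."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class GELANLikeBlock(nn.Module):
    """Conceptual GELAN-style block: CSP channel split plus ELAN-style aggregation."""
    def __init__(self, c_in, c_out, c_mid, n_blocks=2):
        super().__init__()
        self.split = ConvBNAct(c_in, 2 * c_mid, k=1)  # CSP-style split into two halves
        self.blocks = nn.ModuleList(
            ConvBNAct(c_mid, c_mid) for _ in range(n_blocks)  # successive computation blocks
        )
        # Aggregate both halves plus every intermediate output (ELAN-style concatenation).
        self.merge = ConvBNAct((2 + n_blocks) * c_mid, c_out, k=1)

    def forward(self, x):
        y1, y2 = self.split(x).chunk(2, dim=1)
        outs = [y1, y2]
        for block in self.blocks:
            outs.append(block(outs[-1]))
        return self.merge(torch.cat(outs, dim=1))
```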
3.2.2. CBAM
CBAM [38], an effective attention-based module, can be integrated into neural networks with a remarkably low parameter count. As illustrated in Figure 3, CBAM consists of a channel attention module (CAM) and a spatial attention module (SAM). To generate feature maps by multiplying weights for adaptive refinement, CBAM computes attention weights across two dimensions. CAM executes two vital operations, global average pooling (GAP) and global max pooling, while SAM applies max and average pooling. This dual-module framework aids the estimation of attention weights for the precise refinement of feature maps. In GAP, each feature map of the convolutional layer is spatially averaged to produce a single value. Unlike traditional pooling operations like max pooling, which select the maximum value within a region, GAP computes the average value across the entire feature map. This process encompasses three stages: (1) spatial average calculation, where the values of all the neurons are averaged together for each feature map in the convolutional layer; (2) the spatial average operation, which reduces the spatial dimensions of the feature maps to a single value per channel; and (3) channel-wise aggregation, where the averaged values from each channel are concatenated to form the output of the pooling operation. GAP is generally applied at the end of the convolutional layers in a CNN, before the fully connected layers. GAP also introduces translation invariance, making the network less sensitive to small spatial variations in the input and enhancing the network’s performance in challenging tasks like landslide detection.
Through convolution and pooling operations, CBAM derives the 1D channel attention map $M_c \in \mathbb{R}^{C \times 1 \times 1}$ and the 2D spatial attention map $M_s \in \mathbb{R}^{1 \times H \times W}$, given an input feature map $Z \in \mathbb{R}^{C \times H \times W}$. The overall attention process is depicted by Equations (1) and (2):

$$Z' = M_c(Z) \otimes Z \tag{1}$$
$$Z'' = M_s(Z') \otimes Z' \tag{2}$$

where $\otimes$ signifies element-wise multiplication and $Z''$ denotes the refined output. Moreover, Equations (3) and (4) define the weights of CAM and SAM, respectively:

$$M_c(Z) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(Z)) + \mathrm{MLP}(\mathrm{MaxPool}(Z))\big) \tag{3}$$
$$M_s(Z') = \sigma\big(f^{7 \times 7}([\mathrm{AvgPool}(Z'); \mathrm{MaxPool}(Z')])\big) \tag{4}$$

where $\sigma$ denotes the sigmoid function and $f^{7 \times 7}$ a convolution with a 7 × 7 kernel.
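A compact PyTorch sketch consistent with Equations (1)–(4) is shown below; the reduction ratio of 16 and the 7 × 7 spatial kernel are commonly used defaults and are assumptions here rather than values reported in this study.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CAM: shared MLP applied to globally average- and max-pooled descriptors (Equation (3))."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, z):
        return torch.sigmoid(self.mlp(self.avg_pool(z)) + self.mlp(self.max_pool(z)))

class SpatialAttention(nn.Module):
    """SAM: 7x7 convolution over channel-wise average and max maps (Equation (4))."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, z):
        avg = torch.mean(z, dim=1, keepdim=True)
        mx, _ = torch.max(z, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Sequential channel and spatial refinement, as in Equations (1) and (2)."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.cam = ChannelAttention(channels, reduction)
        self.sam = SpatialAttention(kernel_size)

    def forward(self, z):
        z = z * self.cam(z)     # Z' = Mc(Z) ⊗ Z
        return z * self.sam(z)  # Z'' = Ms(Z') ⊗ Z'
```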
3.2.3. ECA Network
Efficient channel attention (ECA) [39] is a mechanism designed to enhance the representation power of CNNs by focusing on important channels within feature maps. It aims to capture long-range dependencies between channels while maintaining computational efficiency, making it suitable for real-time applications such as landslide event extraction employing large-scale datasets. ECA, a lightweight framework related to SENet [39], readily integrates into CNNs to streamline the model’s complexity while preserving the original dimensionality.
Initially, the input feature map is aggregated by a GAP layer, which captures the global context of each channel. The resulting channel descriptor is then transformed using a 1D convolutional layer, enabling the network to learn channel-wise relationships. The feature map is subsequently recalibrated by applying a sigmoid activation function followed by element-wise multiplication with the original feature map. The sigmoid function scales the channel-wise importance scores between 0 and 1, effectively emphasizing informative channels while suppressing less relevant ones.
To ensure computational efficiency, ECA employs a kernel-wise operation, where each channel is processed independently. This reduces the computational overhead compared to traditional attention mechanisms that consider pairwise interactions between channels. The convolutional layers’ output forms a 4D tensor, which serves as input to the ECA network, encompassing four dimensions, namely, the number of channels (Ch), height (He), width (Wi), and batch size (Bs). Similarly, the output of ECA-Net remains a 4D tensor. Its architecture consists of three modules: (1) the global feature descriptor, (2) adaptive neighborhood interaction, and (3) broadcast scaling. The GAP (global average pooling) operation processes the input tensor by calculating the average of all pixels within each feature map, reducing it to a single value per channel. Afterward, the resulting tensor of size $1 \times 1 \times Ch$ undergoes a 1D convolution. Further, the adaptive kernel size $k$ is estimated by Equations (5) and (6):

$$Ch = \phi(k) = 2^{(\gamma \times k - b)} \tag{5}$$
$$k = \psi(Ch) = \left| \frac{\log_2(Ch)}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}} \tag{6}$$

where $\gamma$ and $b$ indicate the predefined hyperparameters and $|\cdot|_{\mathrm{odd}}$ denotes the nearest odd number. The architecture of the ECA model is given in Figure 4.
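A minimal PyTorch sketch of ECA following Equation (6) is given below; the hyperparameter values γ = 2 and b = 1 are the defaults of the original ECA formulation and are assumed here.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: GAP followed by a 1D convolution with adaptive kernel size."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # Adaptive kernel size from Equation (6); forced to the nearest odd value.
        k = int(abs(math.log2(channels) / gamma + b / gamma))
        k = k if k % 2 else k + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):
        # x: (Bs, Ch, He, Wi) -> channel descriptor of shape (Bs, Ch, 1, 1)
        y = torch.mean(x, dim=(2, 3), keepdim=True)
        # Treat channels as a 1D sequence so that neighbouring channels interact locally.
        y = self.conv(y.squeeze(-1).transpose(1, 2)).transpose(1, 2).unsqueeze(-1)
        return x * torch.sigmoid(y)  # broadcast scaling of the original feature map
```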
3.2.4. Proposed Architecture
The proposed architecture encompasses the backbone, neck, and head. The backbone includes a series of convolutional layers that progressively extract hierarchical features from the input data, allowing the model to capture complex patterns and relationships. These layers apply convolutional filters to the input, learning spatial patterns and extracting features. The first layer takes the preceding output (denoted by −1 in the configuration) as input and produces 64 channels (i.e., 64 feature maps), with a kernel size of 3 × 3 and a stride of 2 for downsampling. The output feature maps from the first convolutional layer (with 64 channels) serve as input to the second convolutional layer (with 128 channels). This process continues through the entire backbone. The backbone is designed to efficiently aggregate network information while minimizing information loss during propagation. This is achieved through the integration of specialized modules such as CSPNet and ELAN, resulting in RepNCSPELAN, which enhances interlayer information interaction and preserves information integrity. The output feature maps from the previous convolutional layer (with 128 channels) are fed into the RepNCSPELAN block. Average pooling (a down layer) downsamples the feature maps spatially, reducing their resolution while preserving some spatial information to create additional feature maps at different scales. Later, the pooled features are processed by convolution, resulting in further downsampled feature maps (with 256 channels). Hence, the backbone progressively reduces the spatial resolution of the feature maps while extracting increasingly complex features. The backbone follows a similar pattern for the remaining layers, and the number of output channels typically increases as the network progresses (256, 512, 512). This allows the model to capture both fine-grained and high-level information from the input image, which is crucial for accurate landslide detection.
Further, the neck receives the feature maps extracted by the backbone and refines them for landslide detection. The SPPELAN block performs spatial pyramid pooling (SPP), which partitions the input feature maps into grids of different sizes and extracts features from each region. This allows the network to capture features at different scales within the input. The upsampled features are concatenated with the features from the backbone, creating a richer representation at a higher resolution. Concatenation combines information from different scales, allowing the network to leverage both high-level semantic information and low-level spatial details for landslide detection.
The concatenated features are processed by the RepNCSPELAN blocks (similar to those in the backbone), potentially involving group normalization, channel splitting, and residual connections. Further, these features are enhanced by the attention module. Attention, a fundamental cognitive ability of human beings, enables them to prioritize relevant information while filtering out irrelevant details. Inspired by this aspect of human vision, attention modules show significant potential when integrated within a CNN to enhance its performance. Therefore, we deliberately integrated two substantial attention modules, CBAM and ECA, within GELAN. Particularly, an attention module is added in the neck after each RepNCSPELAN layer. The architecture of the head follows repeated upsampling, concatenation, RepNCSPELAN block, attention to the remaining layers, and processing features at different scales.
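Conceptually, this integration amounts to appending an attention block to the output of each neck aggregation stage. Using the illustrative CBAM, ECA, and GELAN-style sketches given above, a hypothetical neck stage could be wired as follows; the helper name and channel arithmetic are placeholders, not the actual model configuration.

```python
import torch.nn as nn

def neck_stage_with_attention(c_in, c_out, attention="cbam"):
    """Hypothetical neck stage: aggregation block followed by an attention module.

    GELANLikeBlock, CBAM, and ECA refer to the illustrative sketches defined earlier.
    """
    attn = CBAM(c_out) if attention == "cbam" else ECA(c_out)
    return nn.Sequential(
        GELANLikeBlock(c_in, c_out, c_mid=c_out // 2),  # stand-in for RepNCSPELAN
        attn,                                           # attention refines the aggregated features
    )
```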
Figure 5 represents the algorithmic framework of the proposed GELAN+attention model. The addition of the attention module enhances feature representation, as it allows the model to dynamically focus on the most relevant features within the input data. By assigning higher weights to important features and suppressing less relevant ones, the attention mechanism boosts the representation of key information, leading to more discriminative feature maps. Further, the attention layer facilitates efficient integration of information across different layers of the network. By selectively attending to relevant features from multiple layers, the model can effectively combine information from various levels of abstraction, leading to richer and more informative representations. The attention mechanism enables the model to adaptively learn which features to prioritize based on the input data. This characteristic allows the model to selectively consider different parts of the input, depending on their importance to the task, leading to more flexible and adaptive feature learning. The attention mechanism also highlights the regions of the input that are most influential for the predictions, enabling increased interpretability and supporting model debugging and refinement. Integrating attention models within GELAN empowers the network to become more robust towards the input data, consequently improving its ability to generalize across different conditions and environments.
Lastly, the detection layer performs the complex task of landslide detection. It takes the processed features and predicts the bounding boxes for potential objects in each grid cell of the feature maps and the class probabilities for each bounding box. The proposed architecture improves accuracy and facilitates real-time landslide information extraction from remotely sensed data.
3.3. Experimental Settings and Evaluation Measures
For experimentation, an image size of 416 × 416 pixels is chosen as the input to the network. The architecture is optimized to effectively process images of the selected size, capturing intricate details in the satellite image and ensuring efficient processing by the network. The batch size is set to 4 due to memory constraints, allowing for efficient training. Although the aim is to train the model for a maximum of 500 epochs, early stopping criteria are adopted to avoid inefficient utilization of computing resources and time. For optimization, the training units utilize a stochastic gradient descent (SGD) optimizer that uniformly scales gradients in all directions and facilitates faster convergence by accumulating gradients from previous data points. GELAN uses an SGD optimizer by default, which is therefore employed in this research work. The initial learning rate is set at 0.01, with an initial momentum factor of 0.937 and an initial weight decay of 0.0005. These parameters are chosen to ensure effective training and optimization of the model. The training environment comprises an Intel® Xeon E3-1231 v3 CPU @ 3.40 GHz and an NVIDIA GeForce RTX 3080 Ti GPU running on the Ubuntu 18.04 operating system. The deep learning frameworks used are PyTorch 1.7 and CUDA 11.4. The model’s performance evaluation encompasses both quantitative and qualitative assessments. For quantitative evaluation, four metrics, namely precision, recall, f-score, and mean average precision (mAP), are computed using Equations (7)–(10). Simultaneously, qualitative evaluation involves visual analysis of the outcomes to assess the model’s capability.
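For reference, the standard definitions of these metrics (the form Equations (7)–(10) are assumed to take) are reproduced below; TP, FP, and FN denote true positives, false positives, and false negatives, AP_i is the average precision of class i, and N is the number of classes.

```latex
\begin{align}
\text{Precision} &= \frac{TP}{TP + FP} \tag{7} \\
\text{Recall}    &= \frac{TP}{TP + FN} \tag{8} \\
\text{F-score}   &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{9} \\
\text{mAP}       &= \frac{1}{N} \sum_{i=1}^{N} AP_i \tag{10}
\end{align}
```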
5. Discussion
The proposed deep learning-based algorithm identifies the presence of landslides successfully. However, to assess its generalizability, we considered an additional benchmark dataset. Moreover, we examined the effectiveness of integrating a further attention-based model into GELAN to assess its capabilities. Subsequently, we compared our results with previously proposed studies and examined the performance of state-of-the-art YOLO variants for recognizing landslides. Additionally, we underscore the importance of the patience factor. Finally, the limitations, prospects, and applications of this study are outlined.
5.1. Model’s Performance on Benchmark Dataset
To assess the effectiveness of the suggested model, we included an additional dataset with geological and geomorphological characteristics distinct from those of the Himalayan terrain, specifically the popular benchmark Bijie landslide detection database. The quantitative outcomes (f-score, precision, and recall) are shown in Table 2. A competitive f-score is achieved for GELANc (98.0%) and GELANe (98.5%). By integrating ECA within GELANc (improvement = +1.5%) and GELANe (improvement = +1.0%), the computed f-score reached 99.5%. Similarly, GELANe + CBAM (99.0%) and GELANc + CBAM (98.0%) demonstrated competitive f-scores. Recently, Du et al., 2024 [29] and Yang et al., 2024 [33] employed a similar dataset for their experiments based on YOLO models. Comparing our results with theirs (considering the average f-score), we noted a substantial improvement of +5.15% compared to Du et al., 2024 [29] and remarkable progress of +10.25% in the case of Yang et al., 2024 [33]. Moreover, a noteworthy improvement of +19.75% is estimated when compared with the results of Tanatipuknon et al., 2021 [40]. This comparison underscores the novelty of the proposed algorithms.
5.2. Additional Attention Model
We further explored an additional attention model to identify the most suitable one for landslide identification. In particular, the global attention mechanism (GAM) [41], a widely recognized model, enhances the utility of global dimension-interactive features to reduce information dispersion. GAM innovatively transforms the channel and spatial components of CBAM. GELANc + GAM (f-score = 76.7%, mAP@0.5 = 73.2%) demonstrated better performance when compared with GELANe + GAM (f-score = 70.3%, mAP@0.5 = 67.4%). The recall computed by GELANc + GAM and GELANe + GAM is 71.8% and 60.2%, respectively. However, the precision of GELANe + GAM is estimated to be higher (+2.2 units) than that of GELANc + GAM (P = 82.4%). Further, the computational efficiency of GELANc + GAM (time = 1.301 h, memory = 2.15 GB, GFLOPS = 120.6, and Params = 31.06) is also noted to be better than that of GELANe + GAM (time = 2.095 h, memory = 4.21 GB, GFLOPS = 208.9, and Params = 63.65). In addition, YOLOv9c + GAM revealed a competitive f-score (77.5%) and mAP@0.5 (74.3%). The accuracy obtained with GAM is lower than that achieved with the CBAM and ECA modules.
5.3. Comparative Analysis
To exhibit the effectiveness of the attention mechanism integrated into GELAN, the results are compared with previously proposed methods for landslide detection in the Nepal region, particularly Bragagnolo et al., 2021 [34], Chen et al., 2018 [36], and Yu et al., 2020 [35]. Comparison and evaluation are based on the computed precision, recall, and f-score of our study’s best model. Table 3 indicates notable improvements in f-score (+14.5% to +55.5%), precision (+24.7% to +55.7%), and recall (+3.7% to +54.7%), thus affirming the novelty and efficacy of the proposed approach.
We also evaluated the performance of YOLOv9 against seven algorithms, specifically YOLOv5 small (s), YOLOv5 medium (m), YOLOv6 nano (n), YOLOv6s, YOLOv6m, YOLOv7 tiny (t), and YOLOv8n. The obtained mAP at an intersection over union threshold of 0.5 (mAP@0.5) for all models is depicted in Figure 9. Notably, GELANc achieved the highest mAP@0.5 at 78.3%, followed by YOLOv9c (76.3%), which is almost equivalent to YOLOv9e (76.0%). In addition, the mAP@0.5 of GELANe reached 75.2%. The computed mAP@0.5 of the YOLO variants considered for comparison is lower (by 2.6 to 4.5 percentage points) than that of YOLOv9. However, only a marginal difference (0.5 percentage points) is noted between YOLOv8n and GELANe. This comparison highlights the effectiveness of the YOLOv9 network.
The results of the proposed models (GELANe + ECA, GELANe + CBAM, GELANc + ECA, and GELANc + CBAM) are compared with state-of-the-art models, particularly YOLOv6n, YOLOv6t, YOLOv6s, and YOLOv7t, considering the mAP@0.5–0.95 metric (refer to Figure 10). The maximum improvement of +7.3% is noted in the case of GELANc + ECA when compared with YOLOv6n. In addition, progress of +6.6% and +6.1% is estimated when compared with YOLOv6s and YOLOv6t, respectively. Further, the mAP@0.5–0.95 improved by +3.4% in comparison with YOLOv7t. GELANc + CBAM shows an improvement of +4.9% over YOLOv6n, and the enhancements are +4.2%, +3.7%, and +1.0% when compared to YOLOv6s, YOLOv6t, and YOLOv7t, respectively. Additionally, GELANe + ECA improves the mAP@0.5–0.95 by +4.4% over YOLOv6n; the improvements are +3.7%, +3.2%, and +0.5% compared to YOLOv6s, YOLOv6t, and YOLOv7t, respectively. Furthermore, gains of +3.1%, +2.4%, and +1.9% are noted when the mAP@0.5–0.95 of GELANe + CBAM is compared with YOLOv6n, YOLOv6s, and YOLOv6t, respectively, while a marginal difference (0.8%) is observed between the mAP@0.5–0.95 of GELANe + CBAM and YOLOv7t. These comparisons highlight the significant enhancements in mAP@0.5–0.95 achieved by the proposed models, demonstrating their superior performance over popular object detection models.
5.4. Significance of Early Stopping Factor
During the execution and implementation of YOLO algorithms, early stopping is a technique utilized to improve the efficiency of the network. It involves continuously monitoring evaluation metrics during training and halting the training process if no improvement in accuracy is observed after a certain number of epochs, known as the patience parameter. Patience is a user-defined integer that specifies the number of iterations or epochs the training process can endure without any advancement in evaluation metrics before terminating early. This supports maintaining a balance between training time and network capability. Choosing a suitable patience value is crucial for optimizing outcomes. Setting the patience value too low may prematurely end model training, potentially missing out on performance improvements. On the other hand, setting it too high can unnecessarily prolong training, resulting in ineffective use of computational resources/hardware and time. There is no standard formula for determining the optimal patience value; it relies on factors such as dataset, model complexity, speed, and machine configuration. Finding the optimal patience value requires trial experimentation and fine-tuning. The default patience value for the YOLOv9 network is 100, as utilized in this research work (patience = 100). Patience parameters are flexible and can be disabled (set patience = 0) and enabled easily. In our experiments, only GELANe terminated early (at 468 epochs), saving the best results at 367 epochs, and the other five models (GELANc, GELANe + ECA, GELANc + ECA, GELANe + CBAM, and GELANc + CBAM) were trained for the defined number of epochs.
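A minimal sketch of the patience logic described above is given below; it is modeled on the early-stopping pattern common to YOLO-family training loops, and the class and variable names are illustrative rather than taken from the actual implementation.

```python
class EarlyStopping:
    """Stop training when the validation fitness has not improved for `patience` epochs."""
    def __init__(self, patience=100):
        self.best_fitness = 0.0
        self.best_epoch = 0
        self.patience = patience or float("inf")  # patience = 0 disables early stopping

    def step(self, epoch, fitness):
        """Return True if training should stop at this epoch."""
        if fitness >= self.best_fitness:  # track the epoch with the best validation score
            self.best_fitness = fitness
            self.best_epoch = epoch
        return (epoch - self.best_epoch) >= self.patience

# Usage sketch: with patience = 100, a model whose best results occur at epoch 367
# (as observed for GELANe in this study) stops roughly 100 epochs later.
stopper = EarlyStopping(patience=100)
for epoch, fitness in enumerate([0.60, 0.65, 0.64]):  # placeholder fitness values
    if stopper.step(epoch, fitness):
        break
```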
5.5. Limitations, Prospects, and Applications
The development of intelligent and fully automated landslide event recognition models by deep learning algorithms is still ongoing, presenting opportunities for advancement. Hence, we suggest an attention-based GELAN for detecting landslide hazards of varying sizes against complex backgrounds. We explore the implications of different attention modules and compare their performance with state-of-the-art YOLO models to showcase their detection capabilities. However, our study is constrained by the dataset and computational resources. The experimentation database comprises a limited number of satellite images, which impacts the accuracy of the suggested network. Additionally, the applied deep learning models are executed on a machine with limited GPU access. For future research work, we aim to enhance the accuracy of the network by increasing the database size. Furthermore, the model can be evaluated using unmanned aerial vehicle images. The proposed approach holds the potential for quick and accurate mapping of landslide scenes, supporting effective rescue and recovery actions. Additionally, the findings of this study can be valuable in time-constrained landslide mapping for up-to-date inventory generation. Moreover, accurately identifying the spatial characteristics of landslides is essential for developing a landslide forecasting system.