A Real-Time Negative Obstacle Detection Method for Autonomous Trucks in Open-Pit Mines

Ruan, Shunling; Li, Shaobo; Lu, Caiwu; Gu, Qinghua

doi:10.3390/su15010120


by Shunling Ruan ^1,2, Shaobo Li ^1,2,*, Caiwu Lu ^1,2 and Qinghua Gu ^1,2

¹ School of Resource Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China

² Xi’an Key Laboratory of Intelligent Industry Perception Computing and Decision Making, Xi’an University of Architecture and Technology, Xi’an 710055, China

* Author to whom correspondence should be addressed.

Sustainability 2023, 15(1), 120; https://doi.org/10.3390/su15010120

Submission received: 21 November 2022 / Revised: 16 December 2022 / Accepted: 16 December 2022 / Published: 21 December 2022

(This article belongs to the Special Issue Unmanned Ground Vehicle and Flying Cars Motion Planning and Control in Complex Environment)


Abstract:

Negative obstacles such as potholes and road collapses on unstructured roads in open-pit mining areas seriously affect the safe transportation of autonomous trucks. In this paper, we propose a real-time negative obstacle detection method for self-driving trucks in open-pit mines. By analyzing the characteristics of road negative obstacles in open-pit mines, a real-time target detection model based on the Yolov4 network was built. It uses RepVGG as the backbone feature extraction network and applies the SimAM spatial and channel attention mechanism to the multiscale feature fusion of negative obstacles. In addition, the classification and prediction modules of the network are optimized to improve the accuracy with which it detects negative obstacle targets. A non-maximum suppression optimization algorithm (CIoU Soft Non-Maximum Suppression, CS-NMS) is proposed for the post-processing stage of negative obstacle detection. CS-NMS calculates the confidence of each detection box with weighted optimization to address obscured negative obstacles and poor localization accuracy of the detection boxes. The experimental results show that the proposed method achieves 96.35% mAP for detecting negative obstacles on mining roads with a real-time detection speed of 69.3 fps, and that it can effectively identify negative obstacles on unstructured roads in open-pit mines with complex backgrounds.

Keywords:

open-pit mines; negative obstacles; autonomous trucks; object detection

1. Introduction

In recent years, with the rapid development of smart mine construction and unmanned driving technology, driverless open-pit mines have gradually been established. However, open-pit roads contain negative obstacles such as potholes and road collapses. These negative obstacles blend with the road to a large extent, resulting in inconspicuous feature information. This poses a significant risk to the safe driving of driverless mine trucks in open-pit mines. Therefore, there is an urgent need to research rapid and accurate negative obstacle detection methods for the roads in open-pit mining areas [1].

Most existing detection methods rely on infrared or radar sensors to analyze and detect negative obstacles based on their local characteristics. Matthies et al. [2], for example, proposed a negative obstacle detection method based on thermal infrared images, which determines negative obstacles through local intensity analysis of the infrared images. Cheng et al. [3] proposed a negative obstacle feature detection method based on radial distance and local dense features using multiple LIDAR sensors and synthetic features, achieving good negative obstacle detection but poor detection of small targets. Kang et al. [4] proposed a single-line LIDAR and vision fusion algorithm for environment perception to address the difficulty of detecting small targets. However, this method has a poor detection effect and cannot adapt to the harsh and complex environment of open-pit mines.

Vision sensors have higher reliability in harsh environments, such as those with high temperatures, magnetic fields, or dust, and have good adaptability to open-pit mining. Machine vision-based negative obstacle detection methods can mainly be divided into image analysis algorithms and deep learning algorithms. Pothole detection algorithms based on 2D image analysis usually have four main steps: (1) image pre-processing; (2) image segmentation; (3) shape extraction; and (4) object recognition [5]. For example, Li et al. [6] used morphological filters [7] to reduce image noise and enhance pothole contours. The preprocessed road images are then segmented using histogram-based thresholding, such as Otsu’s method, used by Buza et al. [8], or the triangle method, used by Koch et al. [9]. Otsu’s method minimizes the intra-class variance and is better for separating damaged and normal road areas. Finally, the extracted area is modeled using an ellipse, and the image texture inside the ellipse is compared with the texture of the undamaged road area. If the former is coarser than the latter, the ellipse is considered a pothole [6]. However, all the above-mentioned image processing techniques are severely affected by various factors, especially insufficient light conditions, which hinder the implementation of obstacle recognition systems [10]. Therefore, some authors, such as Tsai et al. [11,12], have proposed segmentation on depth maps, which has been shown to perform better when segmenting damaged road areas. However, the fact that pits are always irregular in shape and lack sufficient texture information sometimes makes geometry- and texture-based assumptions unreliable.

Deep learning methods have made great progress in recent years [13]. Silvister et al. [14] and Thiruppathiraj et al. [15] used convolutional neural networks for detecting damage to pavements, and they achieved high accuracy, though the detection speed was slow. Suong et al. [16] proposed an improved Yolo-v2 target detection network for intelligent pothole detection with an average detection accuracy reaching 82.43% and a detection speed of 21 frames per second. Chen et al. [17] proposed a location-aware convolutional neural network-based pothole detection method that could achieve both an accuracy of 95.2% and a recall of 92.0%, which is better than most existing methods. However, the real-time detection of negative obstacles for unstructured roads in open-pit mines has rarely been studied.

Therefore, this paper provides an in-depth analysis of the characteristics of unstructured roads in open-pit mines and proposes a real-time detection method for negative obstacles on open-pit roads that can adapt to the complex and harsh environment of open-pit mines. The main contributions of this paper are as follows:

(1): In this paper, we construct a dataset of negative obstacles on roads in open-pit mines and propose a real-time detection method for negative obstacles on roads in open-pit mines. This method provides the real-time, efficient, and highly reliable detection of negative obstacles for the early warning of unmanned vehicles in open-pit mining areas.
(2): We propose an improved Yolov4 negative obstacle detection convolutional neural network, using RepVGG as the backbone feature extraction network in the feature extraction phase and the SimAM attention mechanism in the feature fusion phase to solve the feature information loss problem. Finally, we use dynamic convolution to further improve the feature representation capability.
(3): We propose a new non-maximum suppression algorithm that has better detection accuracy compared with the traditional NMS algorithm and effectively solves the problem of difficult detection when the targets overlap.

2. System Model and Definitions

The proposed method for the real-time detection of negative obstacles in open-pit mines is shown in Figure 1.

In the training phase of the network: (1) the input image is preprocessed, and the backbone network is used to extract the features of the image; (2) the feature maps of different sizes are feature fused using the feature pyramid structure to achieve better multi-scale detection capability; (3) the features of different sizes are predicted in the classification and bounding box prediction module to obtain the class and bounding box location of the target; (4) finally, the results of comparing the prediction results and the correct labels are used as the basis for the reverse update of the network neurons, and the weights between the network neurons are updated; (5) steps (1)–(4) of the training process are repeated until the network is fully fitted.

In the inference process of the network, the trained network model is used to predict the images to get the obstacle classification and bounding box prediction results, and the non-maximum suppression algorithm is then used to remove the redundant prediction boxes to ensure that there is only one detection result for an obstacle.

Because the road texture of the open-pit mines is basically the same, the negative obstacles are often blurred due to complex light and stagnant water. Furthermore, the possible size of the negative obstacle may be anywhere within an enormous range according to the angle of the picture, the distance, and the varying size of road potholes. To ensure accuracy, the automated detection of negative obstacles should be a multitasking process that integrates feature extraction, target recognition, and localization. Specifically, the detection algorithm should have good capability in these three aspects: (1) extracting valid feature information about negative obstacles; (2) accurate target localization; (3) good multiscale target detection capabilities.

Mainstream target detection frameworks usually consist of a backbone network pre-trained on a target detection dataset as a feature extraction network and a head used to identify and locate the target [18]. The backbone networks for feature extraction can be divided into two categories according to the platform on which they run. Backbone networks running on GPU platforms include VGG [19], ResNet [20], DenseNet [21], etc., while backbone networks running on CPU platforms include SqueezeNet [22], MobileNet [23], and ShuffleNet [24]. Since negative obstacle detection is used in driverless vehicles, real-time target recognition is a very important metric, and GPU platforms have a natural speed advantage for image processing. The heads for target identification and localization are usually divided into two categories, ‘one-stage’ and ‘two-stage’ target detectors. Two-stage detectors include Fast R-CNN [25], Faster R-CNN [26], R-FCN [27], and Libra R-CNN [28], while for one-stage detectors, the most representative models are Yolo [29,30,31,32], SSD [33], and RetinaNet [34]. Two-stage detectors have high target recognition precision due to the inclusion of candidate box selection in the target recognition process, though this also leads to unsatisfactory detection when the target is obscured or occupies a large area of the image. A good way to solve the problem of multi-scale target recognition is to extract feature images from different stages and fuse the features after multiple upsampling and subsampling. Networks that use this feature fusion mechanism include Feature Pyramid Network (FPN) [35], Path Aggregation Network (PANet) [36], BiFPN [37], and NAS-FPN [38]. They usually place the feature fusion module between the backbone network for feature extraction and the head for target detection and localization to further extract the feature information of multi-scale targets.

Negative obstacles on the roads in open-pit mines have a large fusion range with the road, and the obstacle boundary feature may not be obvious, as shown in Figure 2a. This paper presents an in-depth analysis of Yolov4 (Yolov4: optimal speed and accuracy of object detection) [32] target detection theory and proposes a negative obstacle target detection model for one-stage mining roads based on Yolov4. The principle is shown in Figure 2b–d. The problem of obstacle detection at different scales is solved by designing three different sizes (13 × 13, 26 × 26, and 52 × 52) of a priori boxes on the feature maps.

The bounding box coordinate prediction method follows the practice of Yolov3 (as shown in Figure 3), where $t_x$, $t_y$, $t_w$, and $t_h$ represent the predicted output of the model, $c_x$ and $c_y$ represent the coordinates of the grid cell, $p_w$ and $p_h$ represent the size of the bounding box before prediction, and $b_x$, $b_y$, $b_w$, and $b_h$ are the center coordinates and size of the predicted bounding box. The coordinate loss adopts the squared error loss. $\delta(t_x)$ and $\delta(t_y)$ represent the distance between the center coordinate of the predicted bounding box and the coordinate of the grid cell on the $x$ and $y$ axes.
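As a minimal sketch of the decoding step described above, the following Python function maps the raw network outputs to a box. It assumes the standard YOLOv3 formulation, which the text references through Figure 3 but does not write out: δ is read as the logistic sigmoid, and the width and height scale the prior box exponentially. The function name `decode_box` and the sample values are illustrative only.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid; assumed to be the function written as delta in the text."""
    return 1.0 / (1.0 + math.exp(-z))

def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Map raw outputs (t_*) to a box center and size, in grid-cell units.

    Assumption: b_x = delta(t_x) + c_x, b_y = delta(t_y) + c_y,
                b_w = p_w * exp(t_w),   b_h = p_h * exp(t_h).
    """
    b_x = c_x + sigmoid(t_x)   # offset stays inside the grid cell (0..1)
    b_y = c_y + sigmoid(t_y)
    b_w = p_w * math.exp(t_w)  # prior width scaled by the prediction
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h

# Zero outputs place the center in the middle of cell (6, 6) and keep the prior size.
box = decode_box(0.0, 0.0, 0.0, 0.0, c_x=6, c_y=6, p_w=3.0, p_h=2.0)
print(box)  # (6.5, 6.5, 3.0, 2.0)
```

Because the sigmoid bounds the offsets to the interval (0, 1), the predicted center cannot leave the grid cell responsible for it.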

3. The Proposed Optimization Method

The negative obstacle detection model proposed in this paper is composed of the backbone extraction network RepVGG, a spatial pyramid pooling SPP module, a multi-scale feature bilateral fusion PANet module, and classification and prediction branches. The network model is shown in Figure 4.

3.1. Feature Extraction Network Based on RepVGG

Because the inference speed of the CSPDarkNet [39] backbone network used by Yolov4 is slow, this paper adopts RepVGG [40] as the feature extraction network to improve the Yolov4 model. The network structure of RepVGG is similar to that of ResNet: building on the VGG network, a residual structure is used to solve the vanishing gradient problem of deep networks. In addition, a 1 × 1 convolution branch and an identity residual branch are used to make the network simpler and more efficient. The base block structure of the training and inference phases is shown in Figure 5. In the model inference stage, the network first merges the convolutional layer and the BN layer in each branch of the residual block, then converts every convolution with a different kernel size into a convolution with a 3 × 3 kernel, and finally merges the resulting 3 × 3 convolutions of the residual branches. That is, the weights and biases of all the branches are summed to obtain a single fused Conv3 × 3 layer. Since only 3 × 3 convolutions and ReLU activation functions are stacked in the model inference stage, the residual structure is discarded. Because most current inference engines specifically accelerate 3 × 3 convolution, this backbone is easier to deploy and accelerate than Yolov4’s CSPDarkNet backbone network, which uses the Mish activation function and many different convolutional kernel sizes.
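The branch merging described above can be illustrated with a small single-channel sketch in plain Python. It is not the authors' implementation: the helper names, the kernel values, and the BN statistics are all hypothetical, and ReLU is omitted because it is applied identically after both forms of the block. The sketch folds each BN layer into its convolution, pads the 1 × 1 and identity branches to 3 × 3, sums the kernels and biases, and checks that the fused layer reproduces the three-branch output.

```python
import math

def fuse_conv_bn(kernel, gamma, beta, mean, var, eps=1e-5):
    """Fold BN (gamma, beta, running mean/var) into a bias-free 3x3 kernel."""
    scale = gamma / math.sqrt(var + eps)
    return [[w * scale for w in row] for row in kernel], beta - mean * scale

def pad_1x1(weight):
    """Embed a 1x1 kernel at the center of a 3x3 kernel."""
    return [[0.0, 0.0, 0.0], [0.0, weight, 0.0], [0.0, 0.0, 0.0]]

def conv3x3(image, kernel, bias=0.0):
    """Stride-1, zero-padded 3x3 cross-correlation on one channel."""
    h, w = len(image), len(image[0])
    out = [[bias] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if 0 <= r + dr < h and 0 <= c + dc < w:
                        out[r][c] += kernel[dr + 1][dc + 1] * image[r + dr][c + dc]
    return out

def batch_norm(x, gamma, beta, mean, var, eps=1e-5):
    return gamma * (x - mean) / math.sqrt(var + eps) + beta

def training_block(image, k3, k1, bn3, bn1, bn_id):
    """Training-time block: 3x3+BN, 1x1+BN and identity+BN branches, summed."""
    y3 = conv3x3(image, k3)
    return [[batch_norm(y3[r][c], *bn3)
             + batch_norm(k1 * image[r][c], *bn1)
             + batch_norm(image[r][c], *bn_id)
             for c in range(len(image[0]))] for r in range(len(image))]

def reparameterize(k3, k1, bn3, bn1, bn_id):
    """Merge the three branches into one 3x3 kernel and one bias."""
    fused = [fuse_conv_bn(k3, *bn3),
             fuse_conv_bn(pad_1x1(k1), *bn1),
             fuse_conv_bn(pad_1x1(1.0), *bn_id)]  # identity == 1x1 kernel of weight 1
    kernel = [[sum(f[0][r][c] for f in fused) for c in range(3)] for r in range(3)]
    return kernel, sum(f[1] for f in fused)

# Hypothetical values, chosen only to exercise the equivalence check.
image = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0], [2.0, 1.0, -0.5]]
k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
k1 = 0.7
bn3, bn1, bn_id = (1.2, 0.1, 0.3, 0.8), (0.9, -0.2, 0.1, 1.5), (1.1, 0.05, 0.4, 0.6)

train_out = training_block(image, k3, k1, bn3, bn1, bn_id)
kernel, bias = reparameterize(k3, k1, bn3, bn1, bn_id)
infer_out = conv3x3(image, kernel, bias)
max_diff = max(abs(a - b) for ra, rb in zip(train_out, infer_out) for a, b in zip(ra, rb))
print("largest difference between the two forms:", max_diff)
```

The equivalence holds because convolution and BN at inference time are both affine, so their composition and sum collapse into a single kernel and bias.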

The spatial pyramid pooling (SPP) module is connected to the convolution of the last feature layer of RepVGG to increase the receptive field of the network, particularly for small targets, and to separate significant contextual features. After three convolutions of the last feature layer of RepVGG, the number of channels is halved and the features are processed using max pooling at four different scales: 13 × 13, 9 × 9, 5 × 5, and 1 × 1. The pooled feature maps are then concatenated, and the number of channels is halved again using the three-convolution module to obtain feature maps of the original input dimension. Because SPP uses different pooling kernels for feature extraction at multiple scales and re-aggregation, the robustness of the network is stronger, and the detection performance of the network is improved.
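The pooling step can be sketched on a single channel as follows. This is only an illustration under stated assumptions: the pooling uses stride 1 with the window clipped at the borders so that every pooled map keeps the input size (the text does not state the stride or padding), and the function names are hypothetical. In a network, the four maps would be concatenated along the channel axis.

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling with a k x k window clipped at the borders.

    Clipping is equivalent to padding with -infinity, so the output keeps
    the spatial size of the input (assumed; not stated in the text).
    """
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    return [[max(fmap[i][j]
                 for i in range(max(0, y - r), min(h, y + r + 1))
                 for j in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]

def spp(fmap, kernel_sizes=(13, 9, 5, 1)):
    """Return one pooled map per kernel size, in the order given."""
    return [max_pool_same(fmap, k) for k in kernel_sizes]

# A 13 x 13 map with a single activated neuron at the center.
peak = [[0.0] * 13 for _ in range(13)]
peak[6][6] = 1.0

# Number of positions that "see" the peak at each scale.
coverage = [sum(map(sum, pooled)) for pooled in spp(peak)]
print(coverage)  # [169.0, 81.0, 25.0, 1.0]
```

The larger the pooling kernel, the wider the area over which a single strong response is propagated, which is how the module enlarges the receptive field.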

3.2. Multiscale Feature Fusion Based on Channel-Wise and Spatial-Wise Attention

For unstructured negative road obstacles in open-pit mines with large size spans, irregular shapes, and multi-scale features, the PANet module is used in the detection network model for multi-layer feature fusion. The PANet structure performs multiple upsampling and downsampling processes for feature fusion to strengthen the convergence effect. However, in the process of upsampling, the nearest-neighbor interpolation method loses a certain amount of local feature map information. Therefore, the SimAM [41] spatial-wise and channel-wise attention mechanism is introduced to mine the important features of each neuron between channels.

Existing attention modules in computer vision focus on the channel domain or spatial domain corresponding to feature-based attention and spatial-based attention in the human brain. However, in humans, these two mechanisms coexist and together facilitate information selection during visual processing. SimAM defines the following energy function for each neuron:

$$e_t(\omega_t, b_t, \gamma, x_i) = \frac{1}{M-1} \sum_{i=1}^{M-1} \left(-1 - (\omega_t x_i + b_t)\right)^2 + \left(1 - (\omega_t t + b_t)\right)^2 + \lambda \omega_t^2 \tag{1}$$

Here, $\hat{t} = \omega_t t + b_t$ and $\hat{x}_i = \omega_t x_i + b_t$ are the linear transformations of $t$ and $x_i$, where $t$ and $x_i$ are the target neuron and the other neurons in a single channel of the input feature $X \in \mathbb{R}^{C \times H \times W}$, $i$ is the index over the spatial dimension, $M = H \times W$ is the number of neurons on the channel, and $\omega_t$ and $b_t$ are the weight and bias of the transform. By solving the above equation, fast closed-form solutions for $\omega_t$ and $b_t$ can be obtained:

$$\omega_t = -\frac{2(t - \mu_t)}{(t - \mu_t)^2 + 2\sigma^2 + 2\lambda} \tag{2}$$

$$b_t = -\frac{1}{2}(t + \mu_t)\,\omega_t \tag{3}$$

Here, $\mu_t = \frac{1}{M-1} \sum_{i=1}^{M-1} x_i$ and $\sigma^2 = \frac{1}{M-1} \sum_{i=1}^{M-1} (x_i - \mu_t)^2$ are the mean and variance calculated over all neurons except $t$ in that channel. Since the existing solutions shown in the above equations were obtained on a single channel, it is reasonable to assume that all pixels in a single channel follow the same distribution. Given this assumption, the mean and variance can be computed over all neurons and reused for all neurons on that channel. This can significantly reduce the computational cost and avoid iteratively calculating $\mu$ and $\sigma$ for each position. Thus, the following minimum energy equation is obtained:

$$e_t^{*} = \frac{4(\hat{\sigma}^2 + \lambda)}{(t - \hat{\mu})^2 + 2\hat{\sigma}^2 + 2\lambda} \tag{4}$$

Here, $\hat{\mu} = \frac{1}{M} \sum_{i=1}^{M} x_i$ and $\hat{\sigma}^2 = \frac{1}{M-1} \sum_{i=1}^{M} (x_i - \hat{\mu})^2$. The above equation implies that the lower the energy $e_t^{*}$, the more the neuron $t$ is distinguished from the surrounding neurons, and the higher its importance. Therefore, the importance of a neuron can be obtained by $1/e_t^{*}$. Finally, the features are augmented using the sigmoid function as follows:

$$\tilde{X} = \mathrm{sigmoid}\left(\frac{1}{E}\right) \odot X \tag{5}$$

Here, $E$ groups all $e_t^{*}$ across channels and spatial dimensions. Therefore, this paper uses the SimAM attention mechanism to calculate the importance of neurons in each channel during upsampling and enhance the corresponding neuron features so as to mitigate the loss of feature information in the process of upsampling interpolation.
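Equations (4) and (5) can be sketched for a single channel in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function name `simam_channel` is hypothetical, and the regularization value λ = 1e-4 is an assumption, since the text does not state the value used. The channel statistics are computed once and reused for every neuron, as described above.

```python
import math

def simam_channel(x, lam=1e-4):
    """Re-weight one channel (a 2-D list) by sigmoid(1 / e_t*).

    lam is the regularization coefficient lambda (value assumed here).
    """
    flat = [v for row in x for v in row]
    m = len(flat)
    mu = sum(flat) / m                                 # channel mean
    var = sum((v - mu) ** 2 for v in flat) / (m - 1)   # channel variance
    enhanced = []
    for row in x:
        out_row = []
        for t in row:
            # Inverse of the minimum energy in Equation (4).
            inv_energy = ((t - mu) ** 2 + 2 * var + 2 * lam) / (4 * (var + lam))
            gate = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid(1 / e_t*)
            out_row.append(gate * t)                    # Equation (5)
        enhanced.append(out_row)
    return enhanced

# A flat background with one distinctive neuron in the middle.
channel = [[1.0, 1.0, 1.0], [1.0, 5.0, 1.0], [1.0, 1.0, 1.0]]
enhanced = simam_channel(channel)
print("gate for the distinctive neuron:", enhanced[1][1] / channel[1][1])
print("gate for a background neuron:  ", enhanced[0][0] / channel[0][0])
```

The neuron that differs most from the channel mean has the lowest energy and therefore receives the largest gate, which is the behavior the attention mechanism relies on during upsampling.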

3.3. Optimization of Classification and Prediction Module

The negative obstacle recognition and localization module is responsible for the interaction of feature maps and local features using 3 × 3 regular convolution and 1 × 1 dynamic convolution for three different size feature images to complete the classification regression operation. However, the conventional 1 × 1 convolution does not permit strong feature characterization, and this limits the detection performance of the model. Therefore, dynamic convolution [42] is introduced in the classification regression layer of the model to find a balance between network structure and computational consumption and to increase the expressive ability of the model without increasing the depth or width of the network. That is, the convolution parameters are adaptively adjusted according to the input image. Instead of using one convolutional kernel in each layer, it adjusts the weight of each convolutional kernel and makes targeted choices based on the dynamic aggregation of multiple parallel convolutional kernels. Appropriate parameters are used to extract features. Firstly, a general dynamic perceptron is defined, as is shown in Figure 6.

The output of a static perceptron is generally expressed as $y = g(W^T x + b)$, where $W$, $b$, and $g$ represent the weight, bias, and activation function, respectively. The dynamic perceptron is therefore defined as follows:

$$y = g(\tilde{W}^T x + \tilde{b}), \quad \tilde{W} = \sum_{k=1}^{K} \pi_k(x)\,\tilde{W}_k, \quad \tilde{b} = \sum_{k=1}^{K} \pi_k(x)\,\tilde{b}_k, \quad \text{s.t.}\ 0 \leq \pi_k(x) \leq 1,\ \sum_{k=1}^{K} \pi_k(x) = 1.$$

Here, $\pi_k$ represents the attention weight, which is not fixed but changes as the input changes. Compared with a static perceptron, the dynamic perceptron adds two computations, the attention weight calculation and the dynamic weight fusion, whose cost should be much smaller than that of the perceptron itself, as shown in Equation (6):

$$O(\tilde{W}^T x + \tilde{b}) \gg O\left(\sum \pi_k \tilde{b}_k\right) + O(\pi(x)) \tag{6}$$

Like the dynamic perceptron, the dynamic convolution also has K convolution kernels, as shown in Figure 7.

After the dynamic convolution, the BN layer and the ReLU layer are connected and the K kernels are set with the same scale and number of channels for a certain layer and merged through their respective attention weights to obtain the convolution kernel parameters of the layer. At the same time, the global average pooling is first performed in the attention layer to obtain global spatial features and then mapped to K dimensions by two fully connected layers. Finally, softmax normalization is performed, so that the attention weights obtained can be assigned to the K kernels of the layer. The original fixed convolutional kernels are now dynamically selected according to the input, which significantly improves the feature representation capability.
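The attention-weighted aggregation described above can be sketched with a dynamic perceptron on a small input vector. This is a simplified illustration, not the paper's layer: the attention branch follows the description (global average pooling, two fully connected layers, softmax), but all parameter values and function names are hypothetical, and ReLU is assumed as the activation $g$.

```python
import math

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attention(x, fc1, fc2):
    """Global average pooling, FC + ReLU, FC, then softmax over the K kernels."""
    pooled = sum(x) / len(x)
    hidden = [max(0.0, w * pooled + b) for w, b in fc1]
    logits = [sum(wi * hi for wi, hi in zip(w, hidden)) + b for w, b in fc2]
    return softmax(logits)

def dynamic_perceptron(x, kernels, biases, fc1, fc2):
    """y = g(W~^T x + b~), with W~ and b~ aggregated from K kernels by pi_k(x)."""
    pi = attention(x, fc1, fc2)
    w_agg = [sum(p * k[i] for p, k in zip(pi, kernels)) for i in range(len(x))]
    b_agg = sum(p * b for p, b in zip(pi, biases))
    y = sum(w * xi for w, xi in zip(w_agg, x)) + b_agg
    return max(0.0, y), pi  # g is assumed to be ReLU

# Hypothetical parameters: K = 2 kernels, one hidden unit in the attention branch.
kernels = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
biases = [0.0, 0.0]
fc1 = [(1.0, 0.0)]                      # (weight, bias) per hidden unit
fc2 = [([1.0], 0.0), ([-1.0], 0.0)]     # (weights, bias) per kernel logit

y_a, pi_a = dynamic_perceptron([1.0, 2.0, 3.0], kernels, biases, fc1, fc2)
y_b, pi_b = dynamic_perceptron([-1.0, -2.0, -3.0], kernels, biases, fc1, fc2)
print(pi_a, y_a)
print(pi_b, y_b)
```

The two inputs receive different attention weights, so the effective kernel differs per input even though the K stored kernels are fixed.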

3.4. Optimization of Target Positioning

When the negative obstacle detection model is locating the target, the same obstacle will often yield multiple candidate detection boxes with high confidence. The repeated false detection boxes must be removed so that each object has only one detection result, as shown in Figure 8. The non-maximum suppression (NMS) [43] algorithm, which retains the local maximum, is commonly used for this purpose in computer vision. However, the traditional NMS is not accurate for bounding box localization, and it may easily cause false suppression of similar negative obstacles. In this paper, we propose a new non-maximum suppression method (CIoU Soft Non-Maximum Suppression, CS-NMS) that calculates the confidence of each detection box with weighted optimization so as to achieve the accurate localization of negative obstacle targets.

Intersection over union (IoU) [44], which measures the overlap between the predicted and true boxes, is the most popular assessment method used in target detection benchmarks, but it provides no useful measure when the predicted box and the real box do not intersect. In order to solve this problem, three important geometric elements of the bounding box need to be considered: (1) overlapping area; (2) center point distance; and (3) aspect ratio. Complete intersection over union (CIoU) [45] adds a normalized center-point distance and an influence factor $\alpha v$, which accounts for the aspect ratios of the predicted and true boxes, so that all three factors are considered.

$$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|} \tag{7}$$

$$\mathrm{CIoU} = \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v, \quad v = \frac{4}{\pi^2}\left(\arctan\frac{\omega^{gt}}{h^{gt}} - \arctan\frac{\omega}{h}\right)^2, \quad \alpha = \frac{v}{(1 - \mathrm{IoU}) + v} \tag{8}$$

Here, $A$ is the real box and $B$ is the prediction box, $b$ and $b^{gt}$ denote the center points of the prediction box and the real box, $\rho$ is the Euclidean distance between the two center points, $c$ represents the diagonal length of the smallest enclosing region that can contain both the predicted and real boxes, and $(\omega, h)$ and $(\omega^{gt}, h^{gt})$ represent the width and height of the predicted and real boxes, respectively. The traditional non-maximum suppression algorithm directly sets the score of the current detection box to zero when its IoU with the highest-scoring detection box is greater than the threshold, which leads to target boxes with large overlapping areas being overlooked.
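Equations (7) and (8) can be sketched as follows for axis-aligned boxes given as `(x1, y1, x2, y2)`. The box format and the function names are assumptions made for this illustration. The right-hand side of Equation (8) is computed here as a penalty term, so that two boxes without any intersection can still be ranked by how far apart they are, which plain IoU cannot do.

```python
import math

def iou(a, b):
    """Equation (7): intersection area over union area."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def ciou_penalty(pred, gt):
    """Right-hand side of Equation (8): rho^2 / c^2 + alpha * v."""
    # Squared distance between the two box centers.
    rho2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) / 2.0) ** 2 \
         + (((pred[1] + pred[3]) - (gt[1] + gt[3])) / 2.0) ** 2
    # Squared diagonal of the smallest region enclosing both boxes.
    c2 = (max(pred[2], gt[2]) - min(pred[0], gt[0])) ** 2 \
       + (max(pred[3], gt[3]) - min(pred[1], gt[1])) ** 2
    # Aspect-ratio consistency term (boxes are assumed to have positive height).
    v = 4.0 / math.pi ** 2 * (
        math.atan((gt[2] - gt[0]) / (gt[3] - gt[1]))
        - math.atan((pred[2] - pred[0]) / (pred[3] - pred[1]))) ** 2
    denom = (1.0 - iou(pred, gt)) + v
    alpha = v / denom if denom > 0 else 0.0
    return rho2 / c2 + alpha * v

reference = (0.0, 0.0, 1.0, 1.0)
near = (2.0, 0.0, 3.0, 1.0)   # disjoint, close to the reference
far = (5.0, 0.0, 6.0, 1.0)    # disjoint, farther away

print(iou(reference, near), iou(reference, far))
print(ciou_penalty(reference, near), ciou_penalty(reference, far))
```

Both disjoint boxes have an IoU of zero, while the penalty grows with the center distance, so the combined measure still separates them.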

$$S_i = \begin{cases} S_i, & \mathrm{IoU}(M, B_i) < \varepsilon \\ 0, & \mathrm{IoU}(M, B_i) \geq \varepsilon \end{cases} \tag{9}$$

Here, $S_i$ indicates the score of the current detection box $B_i$, $\varepsilon$ is the IoU threshold, and $M$ is the highest-scoring detection box. In Soft-NMS [46], the current detection box score is instead multiplied by a weight function that attenuates the scores of neighboring detection boxes that overlap with $M$: the more a detection box overlaps with $M$, the more strongly its score is attenuated. Soft-NMS chooses the Gaussian function as the weighting function, decaying the score of the prediction box rather than directly zeroing it, and thus modifies the rule for removing detection boxes. The Gaussian weight function is as follows:

$$S_i = S_i\, e^{-\frac{\mathrm{IoU}(M, B_i)^2}{\sigma}} \tag{10}$$
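The difference between the hard rule of Equation (9) and the Gaussian decay of Equation (10) can be seen by applying both to the same score at several overlap values. The threshold ε = 0.5 and σ = 0.5 used below are illustrative assumptions, not values reported in the text, and the function names are hypothetical.

```python
import math

def hard_nms_score(score, overlap, eps=0.5):
    """Equation (9): the score is zeroed once the overlap reaches the threshold."""
    return score if overlap < eps else 0.0

def soft_nms_score(score, overlap, sigma=0.5):
    """Equation (10): Gaussian decay that grows smoothly with the overlap."""
    return score * math.exp(-(overlap ** 2) / sigma)

for overlap in (0.0, 0.3, 0.6, 0.9):
    hard = hard_nms_score(0.9, overlap)
    soft = soft_nms_score(0.9, overlap)
    print(f"overlap={overlap:.1f}  hard={hard:.3f}  soft={soft:.3f}")
```

Under the hard rule a box at overlap 0.6 disappears entirely, whereas the Gaussian rule keeps it with a reduced score, which is what allows heavily overlapping true targets to survive.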

In the Soft-NMS, only IoU values are used to suppress redundant detection boxes. Because the overlap area between boxes is the only factor considered, false suppression occurs when the detected target is occluded. This paper considers the use of CIoU, a more accurate measure of detection boxes, as the Soft-NMS criterion. This is because, in the suppression criterion, not only the overlapping area but also the distance between the center points of the two boxes and the aspect ratio of the current detection box should be considered. However, the Soft-NMS reduces the confidence of all prediction boxes using the Gaussian function, which has a negative impact on those prediction boxes with CIoU scores below the threshold $\varepsilon$. As a result, simply combining the Soft-NMS with CIoU makes the detection effect worse. This paper therefore defines the new non-maximum suppression method CS-NMS as:

$$S_i = \begin{cases} S_i, & \mathrm{IoU} - R_{CIoU}(M, B_i) < \varepsilon \\ S_i\, e^{-\frac{R_{CIoU}(M, B_i)^2}{\sigma}}, & \mathrm{IoU} - R_{CIoU}(M, B_i) \geq \varepsilon \end{cases} \tag{11}$$

where $S_i$ denotes the score of the current detection box $B_i$, $\varepsilon$ is the CS-NMS threshold, and $M$ is the detection box with the highest score. The Gaussian function is used to reduce the scores of prediction boxes whose overlap measure is at or above the threshold, while the scores of boxes below the threshold are kept unchanged. The algorithm flow is as follows (Algorithm 1):

Algorithm 1: CS-NMS. The Soft-NMS update in step (1) is replaced with the CS-NMS update in step (2).

Input: B = {b_1, …, b_N}, S = {s_1, …, s_N}, ε
(B is the list of initial detection boxes, S contains the corresponding detection scores, and ε is the NMS threshold.)

Begin:
    D ← {}
    While B ≠ empty do
        m ← argmax S
        M ← b_m
        D ← D ∪ M; B ← B − M
        For b_i in B do
            S_i ← S_i f(iou(M, b_i))            Soft-NMS (1)
            if ciou(M, b_i) ≥ ε then
                S_i ← S_i f(ciou(M, b_i))       CS-NMS (2)
            end
        end
    end
    return D, S
end
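Algorithm 1 can be turned into a short runnable sketch. Several points are assumptions rather than details confirmed by the text: boxes are `(x1, y1, x2, y2)` tuples, `ciou(M, b_i)` is read as IoU minus the penalty term of Equation (8), the decay `f` is the Gaussian applied to that same measure (as in step (2) of Algorithm 1, whereas Equation (11) prints the penalty term alone in the exponent), and ε = σ = 0.5 are illustrative values.

```python
import math

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def ciou(a, b):
    """IoU minus the penalty of Equation (8) (assumed reading of IoU - R_CIoU)."""
    overlap = iou(a, b)
    rho2 = ((a[0] + a[2] - b[0] - b[2]) ** 2 + (a[1] + a[3] - b[1] - b[3]) ** 2) / 4.0
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
       + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    v = 4.0 / math.pi ** 2 * (math.atan((a[2] - a[0]) / (a[3] - a[1]))
                              - math.atan((b[2] - b[0]) / (b[3] - b[1]))) ** 2
    denom = 1.0 - overlap + v
    alpha = v / denom if denom > 0 else 0.0
    return overlap - (rho2 / c2 + alpha * v)

def cs_nms(boxes, scores, eps=0.5, sigma=0.5):
    """Return (box, score) pairs in selection order, with decayed scores."""
    boxes, scores = list(boxes), list(scores)
    selected = []
    while boxes:
        m = max(range(len(scores)), key=scores.__getitem__)  # m <- argmax S
        best = boxes.pop(m)                                   # B <- B - M
        selected.append((best, scores.pop(m)))                # D <- D U M
        for i, box in enumerate(boxes):
            overlap = ciou(best, box)
            if overlap >= eps:  # boxes below the threshold keep their score
                scores[i] *= math.exp(-(overlap ** 2) / sigma)
    return selected

# Hypothetical detections: two heavily overlapping boxes and one distant box.
box_a, box_b, box_c = (0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)
result = cs_nms([box_a, box_b, box_c], [0.9, 0.8, 0.7])
for box, score in result:
    print(box, round(score, 3))
```

In this example only the box that overlaps the top-scoring detection is decayed, so it drops behind the distant box; a final confidence threshold, not shown here, would decide whether it is kept.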

4. Performance Analysis

4.1. Dataset Construction and Experimental Setup

Roads in the open-pit mines consist of unstructured roads and some temporary semi-structured roads. The datasets used in this paper were collected from a metallic open-pit mine in Henan Province and a non-metallic open-pit mine in Hubei Province. There are 4150 images in the dataset, including 895 images that contain only background texture information and do not contain negative obstacles. The image resolution is 1080 × 1080, and 3200 images are used for the training set and 950 images are used for the test set. Because stagnant water is characteristically different from road potholes, this dataset classifies negative obstacles into two categories, potholes and stagnant water. The background is involved as a separate category during training, so this model is actually a three-category target detection model.

The training of deep learning models requires a large number of samples, and the original training set included only 3200 images. As shown in Figure 9, in order to achieve the ideal training effect, we carried out data augmentation on the original images, including horizontal and vertical flips, mirroring, brightness changes, added Gaussian noise, and rotation by an angle between −15° and 15°. After data augmentation, the training set was four times the size of the original, and the distribution of samples in the dataset before and after augmentation is shown in Table 1.

The computer used in this experiment is configured with an Intel i7-7800X CPU and an NVIDIA GeForce RTX 2080 Ti (11 GB) GPU, and its operating system is Windows 10 Professional. The network model is implemented in the PyTorch 1.2 framework and adopts transfer learning, initializing from weights pre-trained on the VOC dataset. The initial learning rate was set to 0.0016 and the momentum to 0.9, and training was carried out in two stages. First, part of the feature extraction layers of the network were frozen to speed up training, with the batch size set to 16, the number of epochs to 20, and the attenuation factor to 0.001. All layers were then unfrozen for training, with the batch size set to 8, the number of epochs to 80, and the attenuation factor to 0.0001.

In order to validate the effectiveness of the negative obstacle risk detection model, the experiment used precision (P), recall (R), average precision (AP), mean average precision (mAP), and miss rate (MR) as criteria for quantitative evaluation:

$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad AP = \frac{\sum_{1}^{n} \mathrm{Precision} \times \mathrm{Recall}}{N}, \quad mAP = \frac{\sum_{1}^{n} AP}{n}, \quad MR = \frac{FN}{TP + FN}$$

Here, TP is the number of correctly detected targets, FP is the number of background regions incorrectly detected as targets, FN is the number of targets that failed to be detected, n is the number of target classes, and N stands for the number of obstacles detected. As the formulas suggest, precision and recall trade off against each other: increasing one of them will usually reduce the other, and the most common way to achieve a balance between these two metrics is to use AP, which measures the accuracy of the model for a single target class. mAP is the average of all AP values and measures the accuracy of the overall model. Better comprehensive performance of the model indicates that it can better detect negative obstacles in extreme situations and control the driving risks more effectively. Therefore, each performance index of the model should achieve better scores in order to adapt the risk detection to the harsh environment of the open-pit mines.
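The count-based metrics above can be sketched directly from their definitions. The counts and per-class AP values in the example are hypothetical and are not results from this paper; AP itself is taken as an input because its computation depends on the full precision-recall curve.

```python
def precision(tp, fp):
    """P = TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """R = TP / (TP + FN)."""
    return tp / (tp + fn)

def miss_rate(tp, fn):
    """MR = FN / (TP + FN), i.e. the complement of recall."""
    return fn / (tp + fn)

def mean_average_precision(ap_per_class):
    """mAP = mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)

# Hypothetical counts for one class: 90 hits, 10 false alarms, 5 misses.
tp, fp, fn = 90, 10, 5
print("precision:", precision(tp, fp))
print("recall:   ", recall(tp, fn))
print("miss rate:", miss_rate(tp, fn))

# Hypothetical AP values for two classes (for example potholes and stagnant water).
print("mAP:      ", mean_average_precision([0.95, 0.97]))
```

Since MR equals 1 − R, lowering the miss rate and raising the recall are the same objective, which is why both are reported alongside precision.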

4.2. Performance Validation of Models

In order to verify the performance of our model, the negative obstacle detection optimization model of this paper was compared with the current mainstream target detection network model for experiments, and the experimental results are shown in Table 2.

The negative obstacle detection model proposed in this paper has good comprehensive detection performance, with an accuracy of 94.15%, and it achieves a mAP of 96.35%. This means that our model can more accurately detect negative obstacles and avoid potential risks while the truck is in motion. In open-pit mining areas, trucks travel at 20–30 km/h. In such a motion scenario, higher demands are placed on the timeliness of the detector. The camera frame rate of driverless vehicles is 60 fps, but the detection speed needs to be higher than 60 fps to meet the requirements of real-time detection. Compared with Yolov4, the RepVGG backbone network used in this paper, due to its inter-layer fusion and discarding of residual branches in the inference stage, greatly improves the detection speed while ensuring accuracy, reaching 69.3 fps, which is enough to meet the requirements of safe driving in motion scenes. We chose the D0 version of EfficientDet [35] with the least number of network layers because it’s simpler for the feature extraction layer and it has good timeliness. RatinaNet, one of the latest single-stage networks, has good feature extraction efficiency and high detection accuracy. RatinaNet incorporates focal loss to solve the imbalance of the number of difficult and easy samples, enabling more accurate target detection and localization, but it also suffers from the shortcomings of low recall and poor timeliness. In addition, both EfficientDet and RatinaNet have high miss rates, which can lead to a large number of undetected targets and threaten driving safety. An improved version of Yolov3, Yolov4 has made significant progress in detection accuracy, recall rate, false detection rate, and timeliness. The CSPNet structure used enhances the feature extraction efficiency and achieves a good balance of detection accuracy and speed. 
Experiments show that our model has good applicability for negative obstacle detection in open-pit mines, with high precision and high real-time characteristics, and can meet the driving safety requirements of unmanned vehicles in complex environments.
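The real-time argument above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the frame rates are copied from Table 2, the 60 fps camera rate and the 20–30 km/h speed range come from the text, and the helper names (`frame_budget_ms`, `travel_per_frame_m`, `keeps_up`) are hypothetical.

```python
# Frame rates (fps) copied from Table 2; names of helpers are illustrative only.
CAMERA_FPS = 60.0
DETECTOR_FPS = {
    "EfficientDet-D0": 94.0,
    "Faster-RCNN": 17.2,
    "SSD": 56.1,
    "RetinaNet": 30.5,
    "YOLOv3": 45.6,
    "YOLOv4": 48.1,
    "Our Model": 69.3,
}


def frame_budget_ms(fps: float) -> float:
    """Time available (or needed) per frame, in milliseconds."""
    return 1000.0 / fps


def travel_per_frame_m(speed_kmh: float, fps: float) -> float:
    """Distance the truck covers while one frame is processed, in metres."""
    return speed_kmh / 3.6 / fps


def keeps_up(fps: float, camera_fps: float = CAMERA_FPS) -> bool:
    """True if the detector is faster than the camera delivers frames."""
    return fps > camera_fps


if __name__ == "__main__":
    for name, fps in DETECTOR_FPS.items():
        print(
            f"{name:16s} {frame_budget_ms(fps):6.1f} ms/frame, "
            f"{travel_per_frame_m(30.0, fps):.3f} m/frame at 30 km/h, "
            f"real-time: {keeps_up(fps)}"
        )
```

Under these assumptions, only EfficientDet-D0 and the proposed model exceed the 60 fps camera rate. At 69.3 fps, each frame takes about 14.4 ms, during which a truck at 30 km/h moves roughly 0.12 m.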

Figure 10 shows the negative obstacle recognition results, which cover a variety of situations, such as potholes, stagnant water, complex environments, multiple targets, and complex lighting. The red boxes are the pothole targets, the blue boxes are the stagnant water targets, and the upper left corner of each detection box shows the target category and confidence level.

In Figure 10a, YOLO v3 makes a classification error, our model and EfficientDet do not locate the negative obstacles precisely enough, and RetinaNet performs best. In Figure 10b,e, where feature information was obscured and detection was difficult, our model and YOLO v3 performed best. Because EfficientDet-D0 has the fewest layers, it struggled to extract enough feature information, which resulted in poor detection performance in complex situations. In general, our model performs well in complex and multi-target situations and has strong multi-scale target detection ability, enabling it to reliably meet the requirements of harsh open-pit mine environments.

4.3. Effectiveness of Model Training Methods

To train the negative obstacle detection model, this experiment used the VOC2007 + 2012 dataset as the pre-training data for transfer learning. The model was trained for 96,850 iterations, and the loss convergence is shown in Figure 11.

It can be seen from Figure 11a that the model converged after about 60,000 iterations. Figure 11b shows that, thanks to transfer learning, the model has already learned a large number of common low-level image features and can simultaneously reuse these low-level features and fine-tune high-level abstract features in the downstream task. This results in smaller initial loss values and faster convergence: the loss stabilized within a small range of values after 5000 iterations. The model in this paper converges better than Yolov4.

As mentioned above, to achieve better accuracy and speed, the more efficient RepVGG network is used in the feature extraction stage, the SimAM attention mechanism is introduced in the feature fusion module, dynamic convolution is used in the classification and regression module, and a new CS-NMS algorithm is proposed for the post-processing stage. Because the dataset in this paper is too small to accurately reflect the detection accuracy of the network, the public VOC2007 test set is also used for testing during transfer learning. The results of the ablation experiment are shown in Table 3. Data augmentation effectively improves the detection accuracy and reduces overfitting on the negative obstacle dataset. Furthermore, adding dynamic convolution, SimAM attention, and CS-NMS further improves accuracy, raising mAP by a little over 2 percentage points on both datasets (from 94.25% to 96.35% on the negative obstacle dataset and from 79.81% to 82.16% on VOC).
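The contribution of each component can be read off Table 3 by differencing consecutive rows. The following sketch does only that arithmetic; the mAP values are copied from the augmented rows of Table 3, and the row labels and the function name `incremental_gains` are illustrative choices rather than anything defined in the paper.

```python
# (label, mAP on negative obstacle dataset / %, mAP on VOC / %), from Table 3.
# All rows use data augmentation; each row adds one component to the previous one.
ABLATION = [
    ("RepVGG baseline", 94.25, 79.81),
    ("+ dynamic convolution (CondConv column)", 95.51, 80.48),
    ("+ SimAM", 96.10, 82.03),
    ("+ CS-NMS", 96.35, 82.16),
]


def incremental_gains(rows):
    """Return (label, gain on obstacle dataset, gain on VOC) for each added component."""
    gains = []
    for (_, prev_obst, prev_voc), (label, obst, voc) in zip(rows, rows[1:]):
        gains.append((label, round(obst - prev_obst, 2), round(voc - prev_voc, 2)))
    return gains


if __name__ == "__main__":
    for label, d_obst, d_voc in incremental_gains(ABLATION):
        print(f"{label:42s} {d_obst:+.2f} (obstacles)  {d_voc:+.2f} (VOC)")
    total_obst = round(ABLATION[-1][1] - ABLATION[0][1], 2)
    total_voc = round(ABLATION[-1][2] - ABLATION[0][2], 2)
    print(f"total: {total_obst:+.2f} (obstacles)  {total_voc:+.2f} (VOC)")
```

Read this way, dynamic convolution gives the largest single gain on the negative obstacle dataset, while SimAM gives the largest gain on VOC.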

4.4. Performance Analysis of CS-NMS Algorithm

Since most potholes on the road do not have clear boundaries, target positioning is difficult. As shown in Figure 12, the CS-NMS improves the accuracy of bounding box positioning. In addition, in multi-target situations it reduces the erroneous suppression of obscured targets, and the mAP improves by 0.25% compared with the network without the CS-NMS. However, the test set in this paper is small and its samples are limited in variety, so it cannot accurately reflect the effect of the CS-NMS.

Therefore, we used Yolov4 and RetinaNet to verify the CS-NMS on the VOC2007 test set. The test results are shown in Figure 13. With the CS-NMS, the mAP of Yolov4 increases by 0.85%. The standard test uses a confidence threshold of 0.01, which generates a large number of low-confidence false targets and has little engineering value, so this paper uses a confidence threshold of 0.5 and an input image size of 416 × 416.

As shown in Figure 13, in most cases, the CS-NMS proposed in this paper performs better than the Soft-NMS. When the threshold ε is equal to 0.46, the performance of the two is similar, and when the threshold ε is equal to 0.49, the three kinds of NMS achieve their best performance, with the CS-NMS performing the best. Under each threshold ε, the CS-NMS is better than or equal to the original Soft-NMS. Even the worst-performing CS-NMS achieves a performance close to the best performance of the Soft-NMS, which means that even if the threshold ε is not adjusted in other target detection networks, the CS-NMS proposed in this paper will perform better.

The CS-NMS code proposed in this article is very concise and can be easily ported to other one-stage networks. RetinaNet, for example, also obtains a 0.72% mAP improvement by using the CS-NMS on the VOC2007 test set. The test shows that the CS-NMS proposed in this paper improves the accuracy of target detection and has good portability.
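To illustrate why such a post-processing step is easy to port, the sketch below shows a generic Soft-NMS loop in which plain IoU is replaced by a CIoU overlap term. It is not the authors' CS-NMS: the exact confidence weighting is defined in Section 3.4 and is not reproduced here. The linear decay rule, the clamping of negative CIoU values to zero, and the names `ciou` and `soft_nms_ciou` are all assumptions made for illustration. The defaults reuse the threshold ε = 0.49 and the confidence of 0.5 mentioned above.

```python
import math


def ciou(a, b):
    """Complete-IoU of two boxes (x1, y1, x2, y2) with positive width and height."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    wa, ha, wb, hb = ax2 - ax1, ay2 - ay1, bx2 - bx1, by2 - by1
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    iou = inter / (wa * ha + wb * hb - inter)
    # Squared distance between box centres.
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0
    # Squared diagonal of the smallest box enclosing both.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (math.atan(wa / ha) - math.atan(wb / hb)) ** 2
    denom = (1.0 - iou) + v
    alpha = v / denom if denom > 0 else 0.0
    return iou - rho2 / c2 - alpha * v


def soft_nms_ciou(boxes, scores, eps=0.49, score_min=0.5):
    """Linear Soft-NMS using clamped CIoU as the overlap measure (illustrative only).

    Returns a list of (index, final_score) for the boxes that are kept.
    """
    scores = list(scores)
    remaining = list(range(len(boxes)))
    kept = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_min:
            break  # every remaining box scores even lower
        kept.append((best, scores[best]))
        for i in remaining:
            overlap = max(0.0, ciou(boxes[best], boxes[i]))
            if overlap >= eps:
                scores[i] *= 1.0 - overlap  # decay instead of hard removal
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    print(soft_nms_ciou(boxes, [0.9, 0.8, 0.7]))
```

In this toy input, the second box overlaps the first heavily, so its score is decayed below the confidence threshold and only the first and third boxes are kept. Because the function only needs boxes and scores, it can sit behind any detector that outputs them.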

5. Industrial Applications

As mentioned above, the negative obstacle detection model proposed in this paper offers a better combination of accuracy and speed than the other methods, so we propose applying the algorithm to driverless trucks. Figure 14 shows a driverless truck and the proposed sensor location.

At present, driverless trucks mostly use radar for environment sensing, which cannot solve the problem of negative obstacle detection on unstructured roads. In the application shown in Figure 15, the video data collected by the camera are processed by the IPC, which detects and recognizes negative obstacles and makes decisions. After receiving the control information, the vehicle chassis adjusts the vehicle path in time.
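The camera, IPC, and chassis data flow described above can be sketched as a simple per-frame loop. Everything in this sketch is hypothetical: the `Detection` fields, the class labels, the 0.5 confidence cut-off, and the `"adjust_path"` / `"keep_path"` commands are placeholders for whatever interfaces the actual vehicle uses, and the detector is passed in as a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Detection:
    label: str                      # e.g. "pothole" or "stagnant_water" (assumed names)
    confidence: float               # detector confidence in [0, 1]
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def decide(detections: List[Detection], min_conf: float = 0.5) -> str:
    """Hypothetical decision rule: react to any sufficiently confident obstacle."""
    hazards = [d for d in detections if d.confidence >= min_conf]
    return "adjust_path" if hazards else "keep_path"


def process_frame(
    frame,
    detector: Callable[[object], List[Detection]],
    send_to_chassis: Callable[[str], None],
) -> str:
    """One IPC cycle: detect on the camera frame, decide, notify the chassis."""
    command = decide(detector(frame))
    send_to_chassis(command)
    return command


if __name__ == "__main__":
    # Stub detector standing in for the trained network.
    fake_detector = lambda frame: [Detection("pothole", 0.91, (120, 200, 260, 310))]
    process_frame(frame=None, detector=fake_detector, send_to_chassis=print)
```

Keeping the detector and the chassis interface as injected callables means the decision logic can be tested without a camera or a vehicle.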

6. Conclusions and Future Work

In this paper, we propose a Yolov4-based target detection network model for negative obstacle recognition on unstructured roads in open-pit mining areas, which detects negative obstacles more accurately and quickly. This method meets the demand for the fast and accurate recognition of negative obstacle targets in front of unmanned vehicles in complex road conditions such as open-pit mining areas. The experimental analysis shows that the proposed model recognizes negative obstacles well in a variety of road scenarios in open-pit mining areas. The detection model achieves 96.35% mAP, 94.15% precision, and 92.18% recall (Table 2) at 69.3 fps, and compares favorably with mainstream target detection networks. The CS-NMS proposed in the paper helps improve target detection accuracy and has good portability. This paper applies a deep learning target detection framework to negative obstacle detection in open-pit mines, demonstrating the feasibility of target detection in complex environments (e.g., mines) with obscured obstacle features, and provides a new solution for future unmanned mining truck obstacle warnings.

The difficulty and high risk involved in data collection in open-pit mines have resulted in small datasets. The next step should be to increase the number of samples and further improve the accuracy of the detection network. In addition, the improved model proposed in this paper is expected to be applied to more complex scenes and solve the problems associated with positive and negative obstacles in open-pit mining areas.

Author Contributions

S.L.: original draft preparation. S.R.: conceptualization, project, supervision. C.L. and Q.G.: validation, review, and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (grant numbers 51208282 and 52074205).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The article has been written by the stated authors who are all aware of its content and approve its submission. The authors declare no conflict of interest.

References

Zhang, Q.; Zhou, C.; Tian, Y.C.; Xiong, N.; Qin, Y.; Hu, B. A fuzzy probability Bayesian network approach for dynamic cybersecurity risk assessment in industrial control systems. IEEE Trans. Ind. Inform. 2017, 14, 2497–2506.
Matthies, L.; Rankin, A. Negative obstacle detection by thermal signature. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems, Las Vegas, NV, USA, 27–31 October 2003; Volume 1, pp. 906–913.
Chen, L.; Yang, J.; Kong, H. Lidar-histogram for fast road and obstacle detection. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation, Singapore, 29 May–3 June 2017; pp. 1343–1348.
Kang, B.; Choi, S. Pothole detection system using 2D LiDAR and camera. In Proceedings of the 2017 Ninth International Conference on Ubiquitous and Future Networks, Milan, Italy, 4–7 July 2017; pp. 744–746.
Dhiman, A.; Klette, R. Pothole detection using computer vision and learning. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3536–3550.
Li, S.; Yuan, C.; Liu, D.; Cai, H. Integrated processing of image and GPR data for automated pothole detection. J. Comput. Civ. Eng. 2016, 30, e04016015.
Pitas, I. Digital Image Processing Algorithms and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2000.
Buza, E.; Omanovic, S.; Huseinovic, A. Pothole detection with image processing and spectral clustering. In Proceedings of the 2nd International Conference on Information Technology and Computer Networks, Antalya, Turkey, 8–10 October 2013; Volume 810, p. 4853.
Koch, C.; Brilakis, I. Pothole detection in asphalt pavement images. Adv. Eng. Inform. 2011, 25, 507–515.
Hussein, A.; Marín-Plaza, P.; Martín, D.; de la Escalera, A.; Armingol, J.M. Autonomous off-road navigation using stereo-vision and laser-rangefinder fusion for outdoor obstacles detection. In Proceedings of the 2016 IEEE Intelligent Vehicles Symposium, Gothenburg, Sweden, 19–22 June 2016; pp. 104–109.
Tsai, Y.C.; Chatterjee, A. Pothole detection and classification using 3D technology and watershed method. J. Comput. Civ. Eng. 2018, 32, 04017078.
Ahmed, A.; Ashfaque, M.; Ulhaq, M.U.; Mathavan, S.; Kamal, K.; Rahman, M. Pothole 3D Reconstruction With a Novel Imaging System and Structure From Motion Techniques. IEEE Trans. Intell. Transp. Syst. 2022, 23, 4685–4694.
He, R.; Xiong, N.; Yang, L.T.; Park, J.H. Using multi-modal semantic association rules to fuse keywords and visual features automatically for web image retrieval. Inf. Fusion 2011, 12, 223–230.
Silvister, S.; Komandur, D.; Kokate, S.; Khochare, A.; More, U.; Musale, V.; Joshi, A. Deep learning approach to detect potholes in real-time using smartphones. In Proceedings of the 2019 IEEE Pune Section International Conference, Pune, India, 18–20 December 2019; pp. 1–4.
Thiruppathiraj, S.; Kumar, U.; Buchke, S. Automatic pothole classification and segmentation using android smartphone sensors and camera images with machine learning techniques. In Proceedings of the 2020 IEEE REGION 10 CONFERENCE, Osaka, Japan, 16–19 November 2020; pp. 1386–1391.
Suong, L.K.; Kwon, J. Detection of Potholes Using a Deep Convolutional Neural Network. J. Univers. Comput. Sci. 2018, 24, 1244–1257.
Chen, H.; Yao, M.; Gu, Q. Pothole detection using location-aware convolutional neural networks. Int. J. Mach. Learn. Cybern. 2020, 11, 899–911.
Bhagya, C.; Shyna, A. An overview of deep learning based object detection techniques. In Proceedings of the 2019 1st International Conference on Innovations in Information and Communication Technology, Chennai, India, 25–26 April 2019; pp. 1–6.
Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
Huang, G.; Liu, Z. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. arXiv 2017, arXiv:1707.01083.
Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv 2016, arXiv:1506.01497.
Dai, J.; Li, Y.; He, K.; Sun, J. R-fcn: Object detection via region-based fully convolutional networks. arXiv 2016, arXiv:1605.06409.
Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra r-cnn: Towards balanced learning for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 821–830.
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Cham, Switzerland, 2016; pp. 21–37.
Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
Ghiasi, G.; Lin, T.Y.; Le, Q.V. Nas-fpn: Learning scalable feature pyramid architecture for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7036–7045.
Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.-Y.; Hsieh, J.-W. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 390–391.
Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. Repvgg: Making vgg-style convnets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13733–13742.
Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. Simam: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Shenzhen, China, 26 February–1 March 2021; pp. 11863–11874.
Chen, Y.; Dai, X.; Liu, M.; Chen, D.; Yuan, L.; Liu, Z. Dynamic convolution: Attention over convolution kernels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11030–11039.
Neubeck, A.; Van Gool, L. Efficient non-maximum suppression. In Proceedings of the 18th International Conference on Pattern Recognition, Hong Kong, China, 20–24 August 2006; Volume 3, pp. 850–855.
Rahman, M.A.; Wang, Y. Optimizing intersection-over-union in deep neural networks for image segmentation. In Proceedings of the International Symposium on Visual Computing, Las Vegas, NV, USA, 12–14 December 2016; Springer: Cham, Switzerland, 2016; pp. 234–244.
Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS--improving object detection with one line of code. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5561–5569.

Figure 1. Negative obstacle detection method in open-pit mines.

Figure 2. Negative obstacle characteristic and Yolov4 detection principle.

Figure 3. Bounding box for size and position prediction.

Figure 4. Network infrastructure.

Figure 5. RepVGG base block of training and identity.

Figure 6. Dynamic perceptron.

Figure 7. Dynamic convolution.

Figure 8. (a) Not using NMS, (b) Using NMS.

Figure 9. Data augmentation.

Figure 10. Detection results for four algorithms in different scenarios. From left to right: our model, YOLO v3, EfficientDet-D0, and RetinaNet. Red represents potholes, blue represents stagnant water. (a,b) Potholes; (c) stagnant water; (d,e) multiple pothole targets; (f,g) multiple stagnant water targets; (h) both stagnant water and potholes.

Figure 11. (a) Loss convergence curve of VOC dataset training; (b) loss convergence curve of negative obstacle dataset.

Figure 12. Enhancement of multi-target effects using CS-NMS. (a) Traditional NMS; (b) CS-NMS.

Figure 13. Performance comparison of NMS algorithm under different thresholds.

Figure 14. Schematic of driverless truck and sensor locations.

Figure 15. Application of negative obstacle detection method in open-pit mines.

Table 1. Sample distribution in the dataset before and after augmentation.

Class	Pothole	Stagnant Water	Regular	Total Images
Training Set	5900	5330	755	3200
Augmented Training Set	23,720	21,320	3200	12,800
Test Set	1368	867	140	950

Table 2. Experimental results for each model for the negative obstacle detection dataset.

Model	mAP/%	MR/%	P/%	R/%	Speed/fps
EfficientDet-D0 512 × 512	81.52	23.45	91.37	76.55	94
Faster-RCNN 600 × 600	91.54	9.25	97.79	90.75	17.2
SSD 500 × 500	90.51	10.47	92.23	89.53	56.1
RetinaNet 600 × 600	94.17	21.01	98.94	96.43	30.5
YOLOv3 416 × 416	91.71	9.72	88.8	90.28	45.6
YOLOv4 416 × 416	96.37	3.36	94.26	96.64	48.1
Our Model 416 × 416	96.35	7.82	94.15	92.18	69.3
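If the miss rate (MR) is taken to be the complement of recall (R), which is an assumption because the definition is not restated alongside Table 2, the two columns can be cross-checked against each other. The sketch below copies the MR and R values from Table 2. The function name `inconsistent_rows` and the tolerance are illustrative.

```python
# (MR / %, R / %) copied from Table 2.
TABLE_2 = {
    "EfficientDet-D0": (23.45, 76.55),
    "Faster-RCNN": (9.25, 90.75),
    "SSD": (10.47, 89.53),
    "RetinaNet": (21.01, 96.43),
    "YOLOv3": (9.72, 90.28),
    "YOLOv4": (3.36, 96.64),
    "Our Model": (7.82, 92.18),
}


def inconsistent_rows(table, tol=0.01):
    """Rows where MR + R differs from 100% by more than `tol` (assumes MR = 100 - R)."""
    return [name for name, (mr, r) in table.items() if abs(mr + r - 100.0) > tol]


if __name__ == "__main__":
    print(inconsistent_rows(TABLE_2))
```

Under that assumption, every row sums to 100% except RetinaNet, so the MR and R values reported for RetinaNet should be read with some care.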

Table 3. Performance comparison of model training methods.

Data Augmentation	RepVGG	CondConv	SimAM	CS-NMS	Pothole (AP)/%	Stagnant Water (AP)/%	mAP/%	VOC mAP/%
✔	✔	-	-	-	97.90	90.60	94.25	79.81
✔	✔	✔	-	-	98.00	93.76	95.51	80.48
✔	✔	✔	✔	-	98.06	94.13	96.10	82.03
-	✔	✔	✔	✔	88.06	92.08	90.34	-
✔	✔	✔	✔	✔	98.00	94.70	96.35	82.16

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

