1. Introduction
Rock lithology classification has always been an indispensable part of engineering fields. In-situ rock exhibits different physical and mechanical properties because of varying mineral compositions, geological mineralization conditions, and internal structures. It is therefore important for engineers to identify in-situ rock lithology accurately and efficiently before engineering design, construction, excavation, and support scheduling. In the past, lithology identification mainly depended on physical or chemical analysis methods. Technicians classified in-situ rock types by observing mineral composition and crystalline structure through magnifying glasses. They also brought rock samples back to the laboratory, prepared thin sections, and analyzed the internal structure under a microscope to determine the rock type. Chemical analysis has also been employed for rock lithology classification. The final accuracy of all these traditional methods depends directly on the experience and expertise of the technicians, so the results are highly subjective. Moreover, sample preparation is extremely tedious, time-consuming, and inefficient.
Recently, computer vision and artificial intelligence (AI) technologies have developed rapidly and are widely applied in daily life. There are two mainstream families of AI algorithms: classical machine learning and neural networks. Machine learning algorithms such as support vector machine (SVM), random forest (RF), decision tree (DT), and logistic regression (LR) are used to solve continuous variable prediction (regression) and classification problems. However, they are generally suited to relatively small and simple datasets. In contrast, neural networks imitate biological neurons: each neuron becomes active or inactive in response to its input, transmits this signal to the adjacent neurons in the next layer, and the network finally produces a response. As the amount of data grows, the uncertainties, nonlinearities, and interrelationships within it become more complex, and compared with classical machine learning, neural networks have powerful self-learning and automatic feature extraction abilities on such big-data problems. Therefore, neural networks have been rapidly adopted over recent decades.
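The neuron behaviour described above can be sketched in a few lines of Python. The ReLU activation used here, in which a zero output plays the role of an "inactive" neuron, is an assumption chosen for illustration, and the weights and biases in the example are arbitrary.

```python
def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs plus a bias, followed by
    a ReLU activation. An output of 0.0 means the neuron is inactive; a positive
    output is the signal passed on to neurons in the next layer."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, z)


def layer(inputs, weight_rows, biases):
    """A fully connected layer: several neurons that all receive the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


if __name__ == "__main__":
    # Two neurons, two inputs: the first fires, the second stays inactive.
    print(layer([1.0, 2.0], [[0.5, 0.25], [-1.0, 0.1]], [0.0, 0.5]))
```

Stacking such layers, with the outputs of one layer used as the inputs of the next, gives the feedforward structure on which the networks discussed below are built.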
Feedforward neural networks (FNN), convolutional neural networks (CNN), recurrent neural networks (RNN), and generative adversarial networks (GAN) are the basic types of neural networks. Among them, CNN is one of the most successful and widely used. Its applications include the early MNIST handwritten digit recognition [1], CIFAR-10 image classification [2], object detection [3,4,5,6,7] on open-source datasets such as the PASCAL Visual Object Classes (Pascal VOC; ox.ac.uk) and Common Objects in Context (COCO; cocodataset.org, accessed 13 December 2021), face recognition [8], natural language processing [9,10], remote sensing [11,12], autonomous driving perception [13], industrial equipment fault detection, and medical CT analysis [14,15,16].
CNN performs excellently in image processing. Many researchers combine CNN with empirical or numerical simulation methods to address rock mechanics engineering problems, and the combined approaches have produced more scientific and better-optimized results. Karimpouli et al. [17,18,19,20] used CNN to estimate rock physical properties. Chen et al. [21,22,23] studied automatic landslide recognition from satellite imagery based on CNN. Dong [24] diagnosed structural damage in tunnels with CNN, and Yang [25] optimized TBM parameters by using CNN to analyze rock fragment size. Kovačević et al. [26,27] proposed CNNs to predict tunnel deformation and slope susceptibility. In addition, automatic rock lithology classification has attracted researchers' attention in recent years, and existing experimental results show that CNN-based classification is more robust overall than traditional methods. Research on automatic rock lithology classification has gone through three development stages.
In the first stage, researchers used thin-section or microscopic images, or features extracted from them, as the input of the convolutional neural network. Cheng [28] applied CNN to recognize three types of sandstone of different granularities from thin-section images with 98.5% precision. Singh [29] used thin-section texture features to identify different basalt rock samples; the neural network input was a 27-dimensional numerical feature vector extracted from RGB or grayscale thin-section images, and the accuracy reached 92.22%. Anjos [30] identified three types of carbonate rock from micro-CT images, with the best performance exceeding 81.33% precision. Li [31] used a transfer learning method, TrAdaBoost, to solve four interregional sandstone microscopic image classification tasks. Marmo [32] trained a multilayer perceptron neural network to identify four carbonate types of the Dunham classification; 23-dimensional features were extracted numerically from thin-section images, and the method achieved 93% precision. Su [33] trained three neural network models on thin-section images, assigned a different weight to each model, and used the combined result of the three models as the final output label.
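The weighted combination of several models described for [33] can be illustrated with weighted soft voting: each model's class probabilities are scaled by the model's weight, the scaled scores are summed, and the class with the highest combined score becomes the final label. This is a minimal sketch assuming probability outputs per model; the exact combination rule and weights used in [33] are not given here, and the numbers in the example are made up.

```python
def weighted_ensemble(prob_lists, weights):
    """Weighted soft voting over several classifiers.

    prob_lists: one list of per-class probabilities per model (same class order).
    weights: one non-negative weight per model; they are normalized to sum to 1.
    Returns (index of the winning class, combined per-class scores).
    """
    if len(prob_lists) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(prob_lists[0])
    combined = [0.0] * n_classes
    for probs, w in zip(prob_lists, weights):
        for k, p in enumerate(probs):
            combined[k] += (w / total) * p
    best = max(range(n_classes), key=lambda k: combined[k])
    return best, combined


if __name__ == "__main__":
    # Three hypothetical models scoring three rock classes.
    outputs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.7, 0.2]]
    label, scores = weighted_ensemble(outputs, [0.5, 0.3, 0.2])
    print(label, [round(s, 2) for s in scores])
```

In this example the most heavily weighted model prefers class 0, but the other two models agree strongly enough on class 1 that the weighted vote selects class 1.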
In the second stage, as image processing technology became increasingly reliable in recent years, CNN was applied directly to classify single-type rock images. Wang [34] implemented a lightweight neural network for identifying rock images based on MobileNets [35], which accurately classifies 25 single-type rocks with 93.45% precision. Ran [36] proposed RTCNNs to identify six typical rock types based on CNN; each original high-resolution image was cut into several patches with an input size of 128 × 128 × 3, and the final classification accuracy was 97.96%. ShuffleNet [37], a commonly used lightweight convolutional neural network, has been transferred to rock lithology recognition [38]. Wang [39] used CNN to identify four types of slope rock, with 90% accuracy on the test dataset. CNN has also been adopted in Mars exploration: Li [40] used VGG16 as a backbone network to classify four groups of Martian rocks, achieving approximately 100% accuracy on the test dataset. Pham [41] combined a deep residual neural network (ResNet) with data augmentation to identify ten typical rock types, with an overall accuracy of 84% on the test dataset. Fan [42] compared two standard convolutional neural networks, SqueezeNet and MobileNet, which achieved classification accuracies of 94.55% and 93.27%, respectively, on 28 kinds of single-class rock.
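The patch-cutting step reported for [36] can be sketched as tiling a large image into fixed-size windows. The cropping details (overlap, handling of leftover border pixels) are not specified in the text, so this sketch assumes non-overlapping 128 × 128 tiles by default and simply drops partial tiles at the right and bottom edges; images are represented as nested Python lists of pixel rows for self-containment.

```python
def patch_boxes(height, width, patch=128, stride=None):
    """Return (top, left, bottom, right) boxes of patch x patch tiles.

    stride defaults to patch (non-overlapping tiles). Border regions narrower
    than a full patch are dropped, which is an assumption of this sketch.
    """
    stride = stride or patch
    boxes = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            boxes.append((top, left, top + patch, left + patch))
    return boxes


def crop(image, box):
    """Crop a nested-list image (list of pixel rows) to the given box."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


if __name__ == "__main__":
    # A 512 x 384 image yields 4 rows x 3 columns of 128 x 128 tiles.
    print(len(patch_boxes(512, 384)))
    # With stride 64 the tiles overlap by half a patch.
    print(len(patch_boxes(256, 256, stride=64)))
```

Each crop of an RGB image stored this way would have the 128 × 128 × 3 shape quoted above; a smaller stride produces more, overlapping training samples from the same photograph.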
In the third stage, different from the previous two approaches, Liu [43] used object detection to achieve precise and intelligent identification of rock types. Object detection must not only locate all objects in an image but also classify each of them. Liu [43] used Faster R-CNN [7], a deep learning detection method, and achieved 96% precision in single-type rock recognition, whereas for mixed multitype rock detection the accuracy was only just over 80%. Xu [44] also adopted the Faster R-CNN architecture with a ResNet structure to classify 30 rock lithologies, with an accuracy of over 93.916%, but again only on single-type images.
In this paper, a novel convolutional neural network named RDNet is developed for the automatic detection of multiple types of mixed rock lithologies. The proposed method is based on YOLO-V3 [4], with a spatial pyramid pooling (SPP) [45] structure added to improve the detection of objects at multiple scales. Furthermore, neural network model compression is used to improve detection efficiency, and a new data augmentation method is adopted to increase dataset diversity. For comparison, the proposed model and three other algorithms, Faster R-CNN, YOLO-V3, and SSD [5], were trained on the same hybrid rock dataset containing four lithologies: sandstone, shale, monzogranite, and tuff. The experimental results show that our method (RDNet) performs excellently and runs in real time, requiring only 11 ms to detect a single image. Consequently, it is feasible to port the algorithm to embedded hardware devices or the Android platform for productization.
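The original SPP [45] pools a feature map into fixed grids of bins to obtain a fixed-length vector, whereas YOLO-V3 variants use stride-1 max pooling at several kernel sizes and concatenate the results with the input, which keeps the spatial size while mixing receptive fields of different scales. The pure-Python sketch below illustrates the latter on single-channel 2-D lists. The kernel sizes 5, 9, and 13 follow the public YOLO-V3-SPP configuration and are an assumption here; RDNet's exact SPP settings are not given in this section.

```python
def max_pool_same(grid, k):
    """k x k max pooling with stride 1 and k // 2 padding.

    Padding behaves like -inf, so only real cells contribute; the output
    has the same height and width as the input.
    """
    h, w = len(grid), len(grid[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            best = float("-inf")
            for y in range(max(0, i - r), min(h, i + r + 1)):
                for x in range(max(0, j - r), min(w, j + r + 1)):
                    if grid[y][x] > best:
                        best = grid[y][x]
            row.append(best)
        out.append(row)
    return out


def spp_block(channels, kernels=(5, 9, 13)):
    """YOLO-style SPP: keep every input channel and append its max-pooled
    version for each kernel size (concatenation along the channel axis).
    C input channels become C * (1 + len(kernels)) output channels."""
    pooled = [max_pool_same(c, k) for k in kernels for c in channels]
    return list(channels) + pooled


if __name__ == "__main__":
    fmap = [[6 * i + j for j in range(6)] for i in range(6)]  # toy 6 x 6 map
    out = spp_block([fmap])
    print(len(out), out[1][0][0], out[3][0][0])
```

Because every branch preserves the spatial size, the concatenated output can feed the following detection layers directly; the large kernels let each cell summarize context over a wider area, which is why SPP helps with objects of different sizes.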
3. Experimental Results and Discussion
All six models in this paper (YOLO-V3, Simplified-Net, RDNet, RDNet + Aug, Faster R-CNN, and SSD) were trained and validated on the same dataset. YOLO-V3, Simplified-Net, RDNet, and RDNet + Aug used the same parameter settings, whereas Faster R-CNN and SSD retained the default parameters of their algorithms.
3.1. Training
The YOLO-V3, Simplified-Net, RDNet, and RDNet + Aug models were trained in the PyTorch framework with the stochastic gradient descent (SGD) optimizer. The initial learning rate (lr) is 1 × 10⁻⁴, and the learning rate is then reduced by a decreasing factor that depends on the iteration number i.
An NVIDIA RTX 3090 GPU was used for training; the input size is 512 × 512 × 3, the batch size is 32, and the number of training iterations is 60,000.
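The iteration-dependent learning-rate decay can be illustrated with one common form, exponential decay per iteration, lr_i = lr_0 · γ^i. This form and the value γ = 0.9999 are assumptions for illustration, not the paper's formula; only the initial rate of 1 × 10⁻⁴ and the 60,000 iterations come from the text. In PyTorch, a schedule of this shape would correspond to attaching `torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=...)` to the SGD optimizer and calling `scheduler.step()` once per iteration.

```python
def lr_at(iteration, base_lr=1e-4, gamma=0.9999):
    """Exponentially decayed learning rate: base_lr * gamma ** iteration.

    base_lr matches the initial rate stated in the text; gamma is a
    hypothetical decreasing factor chosen only for this illustration.
    """
    return base_lr * gamma ** iteration


if __name__ == "__main__":
    # Print the schedule every 10,000 of the 60,000 training iterations.
    for it in range(0, 60_001, 10_000):
        print(f"iteration {it:>6}: lr = {lr_at(it):.3e}")
```

With this hypothetical γ the rate shrinks by a factor of about 400 over the full run; a factor closer to 1 decays more gently, so γ effectively controls how much of the training is spent at small step sizes.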
3.2. Results Analysis
The following subsections analyze and compare the results of YOLO-V3, Simplified-Net, RDNet, and RDNet + Aug, as well as two other algorithms, Faster R-CNN and SSD.
3.2.1. Evaluation Metrics
Precision (P) and recall (R) are the major indicators for evaluating the performance of an object detection model. The intersection over union (IOU) is the area overlap ratio between two rectangular boxes, a predicted box $B_p$ and a ground-truth (GT) box $B_{gt}$:
$$\mathrm{IOU} = \frac{\mathrm{Area}(B_p \cap B_{gt})}{\mathrm{Area}(B_p \cup B_{gt})}$$
A prediction is marked as correctly detected only when its IOU with a GT box is greater than the threshold; otherwise, it is counted as an incorrect detection. The IOU threshold in this paper is set to 0.5. Each rock lithology has its own precision P and recall R. Precision equals the number of correctly detected objects of a given rock type divided by the total number of detected objects labeled as that type, and recall equals the number of correctly detected objects of that type divided by the number of GT objects of that type:
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$
where true positive (TP) is the number of predictions of this type that are marked as correctly detected, false positive (FP) is the number of predictions labeled as this type that are incorrect (for example, objects that actually belong to another type), and false negative (FN) is the number of GT objects of this type that are not correctly detected, either because they are missed or because they are assigned to another type. Therefore, precision (P) and recall (R) are used in this paper to measure the overall performance of the model.
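The per-class evaluation above can be sketched as follows. Boxes are assumed to be (x1, y1, x2, y2) corner coordinates, and predictions are matched greedily in descending score order to unmatched GT boxes of the same class, as in PASCAL VOC-style evaluation; the paper does not state its matching procedure, so this is an assumption. A match requires IOU strictly greater than 0.5, following the text.

```python
def iou(a, b):
    """IOU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truths, label, thr=0.5):
    """Per-class precision and recall.

    detections: list of (label, score, box); ground_truths: list of (label, box).
    Each GT box can be matched at most once; unmatched detections are FP and
    unmatched GT boxes of this class are FN. Returns 0.0 when undefined.
    """
    dets = sorted((d for d in detections if d[0] == label), key=lambda d: -d[1])
    gts = [g[1] for g in ground_truths if g[0] == label]
    used = [False] * len(gts)
    tp = 0
    for _, _, box in dets:
        best_j, best_iou = -1, thr  # must beat thr to count as a match
        for j, gt in enumerate(gts):
            v = iou(box, gt)
            if not used[j] and v > best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            used[best_j] = True
            tp += 1
    fp = len(dets) - tp
    fn = len(gts) - tp
    precision = tp / (tp + fp) if dets else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall


if __name__ == "__main__":
    gts = [("shale", (0, 0, 10, 10)), ("shale", (20, 20, 30, 30)), ("tuff", (40, 40, 50, 50))]
    dets = [("shale", 0.9, (0, 0, 10, 10)),   # correct detection
            ("shale", 0.8, (1, 1, 11, 11)),   # duplicate of an already-matched GT
            ("tuff", 0.7, (20, 20, 30, 30))]  # a shale object labeled as tuff
    for cls in ("shale", "tuff"):
        print(cls, precision_recall(dets, gts, cls))
```

In the example, the duplicate box overlaps the first shale object well but that GT is already matched, so it counts as a false positive; the mislabeled box is a false positive for tuff and leaves the second shale object as a false negative.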
3.2.2. YOLO-V3 and Simplified-Net
First, Figure 11 shows the validation results of YOLO-V3 and Simplified-Net on the four rock lithologies during training. The x-axis indicates training iterations (60,000 in total), and the y-axis indicates P and R; the solid curves are the YOLO-V3 results, and the dotted curves are the Simplified-Net results.
Table 4 lists the best performance of YOLO-V3 and Simplified-Net on all rock types, and Table 5 compares their parameters, computational cost, and inference speed.
The experimental results show that the gap between Simplified-Net and YOLO-V3 on the four rock lithologies is small: the average precision and recall differences are 2.3% and 1.25%, respectively, indicating that this simplification approach is feasible.
3.2.3. Simplified-Net and RDNet
Second, Figure 12 shows the evaluation results of Simplified-Net and RDNet on the four rock lithologies during training, where the solid and dotted curves are the results of Simplified-Net and RDNet, respectively. Table 6 compares the best performance of Simplified-Net and RDNet on all rock types, and Table 7 lists the model parameters, computational cost, and inference speed.
According to the experimental results in Figure 12 and Table 7, there is a noticeable gap between Simplified-Net and RDNet: the average precision and recall differences are 4.775% and 5.45%, respectively. However, RDNet was obtained by further simplifying and pruning Simplified-Net, as described in Section 2.3.1, so its structure is much simpler.
Table 8 shows that RDNet reduces the parameters and computations by a factor of almost 20 and halves the inference time, requiring only 11 ms for single-image detection. It is therefore reasonable for a heavily pruned network to lose some overall performance.
In other words, structured pruning is acceptable for our task: it substantially reduces the parameters and computations without greatly affecting precision.
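Why pruning whole channels cuts cost so sharply can be seen from the cost of a standard convolution: a k × k layer has k·k·C_in·C_out weights and performs that many multiply-accumulates per output pixel. Removing output channels from one layer also removes input channels from the next, so for interior layers the cost falls roughly with the square of the fraction of channels kept. The layer widths below are hypothetical and do not describe RDNet, the feature-map size is held fixed for simplicity, and the specific pruning criterion of Section 2.3.1 is not reproduced.

```python
def conv_cost(k, c_in, c_out, h, w):
    """Weights (bias ignored) and multiply-accumulates (MACs) of a k x k
    convolution that produces an h x w output map."""
    params = k * k * c_in * c_out
    macs = params * h * w
    return params, macs


def pruned_cost(layers, keep_ratio, h, w):
    """Total cost of a plain stack of convolutions after keeping only a
    fraction of the output channels of every layer.

    layers: list of (k, c_in, c_out). The first layer's input channels
    (e.g. 3 for RGB) are not pruned; later layers inherit the pruned width.
    """
    total_params = total_macs = 0
    prev_out = None
    for k, c_in, c_out in layers:
        c_in_kept = c_in if prev_out is None else prev_out
        c_out_kept = max(1, int(c_out * keep_ratio))
        p, m = conv_cost(k, c_in_kept, c_out_kept, h, w)
        total_params += p
        total_macs += m
        prev_out = c_out_kept
    return total_params, total_macs


if __name__ == "__main__":
    layers = [(3, 3, 64), (3, 64, 128), (3, 128, 256)]  # hypothetical widths
    full = pruned_cost(layers, 1.0, 64, 64)
    for keep in (0.5, 0.25):
        p, m = pruned_cost(layers, keep, 64, 64)
        print(f"keep {keep:.0%} of channels: {full[0] / p:.1f}x fewer params, "
              f"{full[1] / m:.1f}x fewer MACs")
```

Keeping half of the channels in this toy stack already gives just under a 4× reduction, which is how a reduction of almost 20× can arise without removing any layers.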
3.2.4. RDNet and RDNet + Aug
Third, data augmentation was integrated into the preprocessing module, and experiments were conducted with RDNet. The results are summarized in Figure 13 and Table 8, where the solid curves are RDNet + Aug and the dotted curves are RDNet without data augmentation. With data augmentation, both precision and recall improve significantly: the average precision over the four rock types increases by more than 10%, and recall improves almost 2-fold. Because data augmentation is applied only in the data preprocessing phase, it adds no parameters or computations to the network, as shown in Table 9.
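The specific augmentation method used for RDNet + Aug is not detailed in this section. As a generic illustration only, the sketch below shows one standard geometric augmentation, random horizontal flipping, and the point that matters for detection: the labeled boxes must be transformed together with the image. Boxes are assumed to be (x1, y1, x2, y2) in continuous pixel coordinates, so a flip maps x to width − x.

```python
import random


def hflip_image(image):
    """Horizontally flip a nested-list image (list of pixel rows)."""
    return [list(reversed(row)) for row in image]


def hflip_boxes(boxes, image_width):
    """Mirror (x1, y1, x2, y2) boxes for a horizontally flipped image;
    x1 and x2 swap roles so that x1 < x2 still holds."""
    return [(image_width - x2, y1, image_width - x1, y2) for x1, y1, x2, y2 in boxes]


def random_hflip(image, boxes, rng, p=0.5):
    """Flip image and boxes together with probability p; otherwise return them unchanged."""
    if rng.random() < p:
        width = len(image[0])
        return hflip_image(image), hflip_boxes(boxes, width)
    return image, boxes


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[1, 2, 3], [4, 5, 6]]  # toy 2 x 3 image
    print(random_hflip(img, [(0, 0, 1, 2)], rng, p=1.0))
```

Since the transformation runs only while loading training data, it changes what the network sees but not the network itself, which is consistent with the unchanged parameter and computation counts reported in Table 9.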
3.2.5. RDNet + Aug and Other Models
The two-stage detector Faster R-CNN and the one-stage detector SSD were also trained on the same dataset for the same number of iterations, and their results were evaluated and compared with ours. In Figure 14 and Figure 15, the dotted lines represent our method and the solid lines represent SSD and Faster R-CNN, respectively; the x-axis indicates training iterations, and the y-axis indicates three evaluation indicators. Our method is clearly more stable on all evaluation indicators, whereas SSD and Faster R-CNN show larger fluctuations and variance on the four rock types. The best results of our method, SSD, and Faster R-CNN on the four rock types are listed in Table 10, and the detection results are compared in Figure 16.
Table 10 shows that, among the three algorithms, SSD has the lowest recall on the test dataset and misses many objects, whereas Faster R-CNN is more sensitive to apparent rock characteristics and therefore has lower precision than SSD, especially on the monzogranite, shale, and tuff classes.
Meanwhile, our method achieves the best stability on the four rock datasets, with precision 10% to 30% higher than that of Faster R-CNN and SSD and better recall. Furthermore, its inference is twice as fast as that of Faster R-CNN and SSD, requiring only 11 ms for single-image detection.
3.3. Discussion
The overall experimental results of YOLO-V3, Simplified-Net, RDNet, RDNet + Aug, Faster R-CNN, and SSD are summarized in Table 11, which shows the best performance of each algorithm on the four types of rock lithologies. YOLO-V3, Simplified-Net, and RDNet achieve similar evaluation results: their average precision on the four rock types is 75.325%, 77.325%, and 73.45%, respectively, and their average recall is 34.425%, 34.025%, and 52.63%, which indicates that the initial simplification method and the compression technology used in this paper are practical. On the other hand, the low recall shows that detecting mixed multitype rock lithologies is challenging.
In addition, the average precision of RDNet (73.45%) is higher than that of Faster R-CNN (65.6%) but lower than that of SSD (80.05%), and the average recall of RDNet (52.63%) is lower than that of both Faster R-CNN (75.92%) and SSD (56.5%), which indicates that RDNet alone has no clear advantage over Faster R-CNN and SSD.
Combined with data augmentation, RDNet + Aug achieves 82.1% average precision (compared with 73.45% for RDNet) and 78% average recall (compared with 52.63% for RDNet).
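Because this section mixes absolute and relative comparisons, a short calculation with the averages reported above makes the distinction explicit: the gain in percentage points versus the gain relative to RDNet's value. Only the four averages come from the text; the helper function is hypothetical.

```python
def gains(before, after):
    """Absolute gain (percentage points) and relative gain (%), rounded for display."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 2), round(relative, 1)


rdnet = {"precision": 73.45, "recall": 52.63}      # averages reported for RDNet
rdnet_aug = {"precision": 82.1, "recall": 78.0}    # averages reported for RDNet + Aug

if __name__ == "__main__":
    for metric in ("precision", "recall"):
        pp, rel = gains(rdnet[metric], rdnet_aug[metric])
        print(f"{metric}: +{pp} percentage points ({rel}% relative)")
```

Reporting both numbers avoids ambiguity when a result is described as "over 10%" or as a multiple of a baseline.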
Figure 16 compares the GT annotations (yellow boxes, with the corresponding label at the top left) with the detection results (blue boxes) of RDNet + Aug, Faster R-CNN, and SSD.
Figure 17 compares the validation results of RDNet + Aug, Faster R-CNN, SSD, and YOLO-V3, indicating that a suitable data augmentation method is of great importance for training convolutional neural network models.
It is also worth noting that, in addition to its more stable detection performance, RDNet + Aug has clear advantages in parameters and computations. As shown in Table 12, the parameters and computations of RDNet + Aug are reduced by a factor of almost 20 compared with YOLO-V3 and are far lower than those of Faster R-CNN and SSD. Its inference time is only 11 ms for single-image detection, half that of the other models.
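An inference time of 11 ms per image corresponds to roughly 91 images per second. A minimal, hypothetical way to measure such per-image latency is sketched below: a few untimed warm-up calls, then the median of many timed calls, so that one-off setup costs and occasional outliers do not distort the figure. The `infer` callable is a stand-in for a real detector. When timing a PyTorch model on a GPU, the call would typically run under `torch.no_grad()`, and `torch.cuda.synchronize()` is needed before each clock read because CUDA kernels run asynchronously.

```python
import statistics
import time


def measure_latency_ms(infer, inputs, warmup=10, runs=100):
    """Return the median latency of infer(x) in milliseconds.

    infer: any callable that processes one image (hypothetical stand-in for a model).
    The first `warmup` inputs are processed without timing.
    """
    for x in inputs[:warmup]:
        infer(x)
    samples = []
    for i in range(runs):
        x = inputs[i % len(inputs)]
        start = time.perf_counter()
        infer(x)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def fps_from_latency(ms):
    """Throughput (images per second) implied by a per-image latency in ms."""
    return 1000.0 / ms


if __name__ == "__main__":
    dummy_model = lambda image: sum(image)  # placeholder for a real detector
    images = [list(range(10_000)) for _ in range(8)]
    ms = measure_latency_ms(dummy_model, images, warmup=3, runs=50)
    print(f"median latency: {ms:.3f} ms, about {fps_from_latency(ms):.0f} images/s")
```

Comparisons between detectors are only meaningful when the same hardware, input size (here 512 × 512 × 3), batch size, and timing procedure are used for every model.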
In future studies, detecting multiple types of mixed rock lithologies in complex environments remains the key issue. A high-quality lithology database is needed to further expand both the number of lithology types and the number of samples of each type. Combined with the optimized network, such a database is expected to substantially improve the stability and generalization of the rock lithology detection model.