Article

The Lightweight Fracture Segmentation Algorithm for Logging Images Based on Fully 3D Attention Mechanism and Deformable Convolution

College of Petroleum, China University of Petroleum-Beijing at Karamay, Karamay 834000, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(22), 10662; https://doi.org/10.3390/app142210662
Submission received: 16 October 2024 / Revised: 15 November 2024 / Accepted: 16 November 2024 / Published: 18 November 2024

Abstract

The challenge of fracture segmentation remains a significant obstacle in imaging logging interpretation within current oil and gas exploration and development. Existing image segmentation algorithms still suffer from issues of accuracy, speed, and robustness, and tend to misdetect or overlook small fractures when applied to logging-image fracture segmentation tasks. To address these challenges, this paper proposes an end-to-end fracture segmentation algorithm named SWSDS-Net. The algorithm is built upon the UNet architecture and incorporates the SimAM with slicing (SWS) attention mechanism along with the deformable strip convolution (DSCN) module. SWS introduces a fully 3D attention mechanism that effectively learns the weight of each neuron in the feature map, enabling better capture of fracture features while ensuring fair attention and enhancement for both large and small objects. Additionally, the deformable properties of DSCN allow for adaptive sampling based on fracture shapes, effectively tackling the challenges posed by varying fracture shapes and enhancing segmentation robustness. Experimental results demonstrate that SWSDS-Net achieves the best overall performance across the evaluation metrics in this task, delivering superior visual results in fracture segmentation and overcoming limitations of existing algorithms under complex shapes, noise interference, and low-quality images. Moreover, as a lightweight network, SWSDS-Net can be deployed on mobile devices at remote sites, an advancement that lays a solid foundation for interpreting logging data and promotes the application of deep learning technology in traditional industrial scenarios.

1. Introduction

Logging technology plays a pivotal role in the field of oil and gas exploration and development by unveiling lithological characteristics, as well as the distribution and structure of fractures surrounding the wellbore. Fractures, being micro-geological structures, exert a significant influence on the distribution and seepage properties of oil, gas, and water. Consequently, they provide invaluable insights for oil and gas field exploration and development activities. In recent years, semantic segmentation has emerged as a prominent topic in computer vision research, with widespread application in fracture segmentation within logging images. Among various approaches used for this purpose, the UNet network stands out as the most commonly employed method. However, existing fracture segmentation methods possess certain limitations that need to be addressed. The current challenges revolve around designing a more lightweight version of the UNet network suitable for mobile devices and on-site deployment while also developing more efficient variants to enhance accuracy, speed, robustness, and interpretability in fracture segmentation.
With the rapid advancement of deep learning technology, segmentation methods based on deep learning have demonstrated significant potential in fracture segmentation tasks. Among these methods, the UNet architecture has gained widespread application due to its simplicity, ease of implementation, and excellent segmentation performance. Originally proposed by Ronneberger et al. (2015) [1], the UNet architecture employs an encoder–decoder structure that effectively separates the feature extraction and decoding processes of images. By incorporating skip connections, it successfully integrates low-level and high-level features to achieve pixel-level segmentation [2].
In recent years, UNet has been widely utilized for fracture segmentation tasks in imaging logging [3] and has achieved remarkable outcomes. Researchers commonly combine UNet with various feature extraction modules and loss functions to enhance segmentation accuracy and robustness. Feature extraction modules, such as ResNet [4], VGG [5], and EfficientNet [6], are employed to extract more comprehensive image features. Loss functions like Dice Loss [7], Jaccard Loss [8], and Focal Loss [9] are applied to improve segmentation precision. Fatimah Alzubaid et al. (2022) [10] proposed a fracture detection and characterization method based on Mask R-CNN that achieves a 95% accuracy rate in identifying and analyzing fractures from unwrapped core images. Nevertheless, this method may not fully encompass the complexity and diversity of authentic fractures. Luciana Olivia Dias et al. (2020) [11] proposed a novel method for automatic detection of fractures and breakout patterns in acoustic borehole image logs using fast-region convolutional neural networks (fast-RCNNs). In real images, a fast-RCNN can also effectively detect fractures and parts of breakouts. However, the method has limitations in detecting real breakouts, such as failing to recognize a pair of breakouts as a single event. Chao Li et al. (2024) [12] proposed a deep learning method based on transfer learning for intelligent identification and segmentation of fractures in ultrasonic logging images. Hongda Yu et al. (2024) [13] proposed a deep learning-based method for fracture identification from logging images, effectively overcoming the limitations of low efficiency in manual detection and poor accuracy of previous automated methods. Bing Xiong et al. (2024) [14] proposed a U-shaped network architecture known as the fusion–channel–Transformer network (FCT-Net), which comprises the channel atrous spatial pyramid pooling (CASPP) module, Transformer module, and attention weight cross fusion (AWCF) module. This network integrates local and global context features through a dual-branch encoding structure that effectively extracts multi-scale information from fracture images while progressively recovering image details during the decoding process. However, Transformer models generally require significant computational resources, limiting their real-time fracture detection capabilities and posing challenges for deployment on portable devices. Furthermore, Bing Xiong et al. (2024) [15] introduced a multi-scale dual-encoding fusion network (DefNet) based on the UNet architecture that extracts global and local features of fracture images using the stacked convolutional distillation module (CDM) and dual-branch attention Transformer (DBAT) modules. They also introduced a multi-scale information distillation mechanism and a large separable kernel attention (LSKA) module to construct the CDM encoder, surpassing the limitations of the traditional convolutional neural network (CNN) receptive field. Jianming Zhang et al. (2024) [16] proposed a dual-encoder crack segmentation network (DECS-Net), which combines the advantages of CNN and Transformer to address the issue of complex background interference in fracture segmentation. DECS-Net utilizes a high–low-frequency attention mechanism and a locally enhanced feedforward network to extract local and global features from images, respectively, subsequently fusing them through a feature fusion module and effectively enhancing the accuracy and recall rate of crack segmentation. Jia Liang et al. (2024) [17] proposed an innovative CNN-based pavement crack segmentation network (CSNet), integrating multi-scale context features with an attention mechanism for improved detection accuracy. CSNet demonstrates exceptional performance on various datasets, outperforming other segmentation models. However, it still faces challenges in micro-crack detection as well as requiring substantial labeled data for training purposes. Despite these advancements, there is still room for improvement when dealing with low-quality imaging, significant noise interference, and complex shapes in logging images. Additionally, there are significant challenges remaining regarding segmentation speed, model lightweighting, and interpretability.
To address these issues, this paper integrates SimAM with slicing (SWS) attention [18] and deformable strip convolution (DSCN) [19] into the UNet architecture, effectively enhancing the model’s performance in fracture segmentation tasks for imaging logging. Traditional UNet attention mechanisms primarily focus on channel and spatial dimensions. However, SWS introduces a fully 3D attention mechanism that more accurately learns the weights of each neuron in the feature map, thereby capturing fracture features with greater precision. Additionally, we have introduced a slicing operation to ensure equitable attention and enhancement for both large and small objects. Through SWS, the model can better concentrate on key regions of fractures while suppressing background noise, thus improving segmentation accuracy. Furthermore, the original UNet convolution has a fixed receptive field, which limits its adaptability to diverse fracture shapes. DSCN possesses deformable properties that enable adaptive sampling based on fracture shapes, resulting in more effective extraction of fracture features and addressing challenges posed by varying fracture shapes while enhancing segmentation robustness. By utilizing the improved UNet model, fractures in imaging logging can be segmented more accurately, providing reliable data for oil and gas exploration and development. This promotes the application of deep learning technology in industrial fields while serving as a reference for other domains.

2. Theoretical Foundations

2.1. SWSDS-Net Network Architecture

The UNet model effectively preserves multi-level feature information by utilizing skip connections and concatenating feature maps at different levels [20], thereby enhancing its representational capacity. Hence, for imaging logging fracture segmentation tasks with limited datasets and high costs involving single-class target semantic segmentation, employing UNet as a baseline model is a favorable solution. UNet comprises an encoder part and a decoder part. The encoder consists of multiple convolutional layers and pooling layers that progressively reduce the image’s spatial resolution while extracting features [21]. Typically, each convolutional layer is followed by a batch normalization layer and an activation function to augment the model’s learning capability. The final convolutional layer in the encoder generates a low-resolution feature map, which serves as input for the decoder. The decoder comprises multiple convolutional layers and upsampling layers that gradually restore the image’s spatial resolution while reconstructing features. Similar to the encoder, each convolutional layer in the decoder is usually accompanied by a batch normalization layer and an activation function to enhance learning ability. To guide feature reconstruction, each convolutional layer in the decoder is concatenated with its corresponding counterpart in the encoder through skip connections, effectively preserving multi-level feature information and improving segmentation accuracy. Finally, the last convolutional layer of the decoder produces a high-resolution feature map, which serves as the final segmentation result.
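To make this structure concrete, the following is a minimal PyTorch sketch of such a UNet baseline. The layer widths, depth, and two-class head are illustrative assumptions for exposition, not the exact configuration trained in this paper.

```python
# A compact UNet baseline in the spirit described above (illustrative only).
import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    """Two 3x3 convolutions, each followed by batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    def __init__(self, n_classes: int = 2, widths=(64, 128, 256, 512, 1024)):
        super().__init__()
        self.encoders = nn.ModuleList()
        c_prev = 3
        for c in widths:                       # encoder: widen channels stage by stage
            self.encoders.append(conv_block(c_prev, c))
            c_prev = c
        self.pool = nn.MaxPool2d(2)
        self.ups, self.decoders = nn.ModuleList(), nn.ModuleList()
        for c_hi, c_lo in zip(widths[::-1][:-1], widths[::-1][1:]):
            self.ups.append(nn.ConvTranspose2d(c_hi, c_lo, 2, stride=2))
            self.decoders.append(conv_block(c_hi, c_lo))  # c_lo (up) + c_lo (skip) = c_hi
        self.head = nn.Conv2d(widths[0], n_classes, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        skips = []
        for enc in self.encoders[:-1]:
            x = enc(x)
            skips.append(x)                    # keep for the skip connection
            x = self.pool(x)                   # halve the spatial size
        x = self.encoders[-1](x)               # bottleneck feature map
        for up, dec, skip in zip(self.ups, self.decoders, reversed(skips)):
            x = dec(torch.cat([up(x), skip], dim=1))  # upsample, fuse skip, refine
        return self.head(x)
```

For a 3 × 256 × 256 input, the bottleneck of this sketch is 1024 × 16 × 16, matching the encoding size described below for SWSDS-Net.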
The network architecture, named SimAM with slicing deformable strip (SWSDS-Net), is constructed based on UNet. It incorporates SWS into both the encoder and the decoder, replacing the original convolution with DSCN. This integration effectively addresses the challenges posed by variations in fracture shapes, enhancing the model’s robustness in complex scenarios while reducing parameters and computational cost. Consequently, it accelerates the training process.
The overall architecture of SWSDS-Net, as depicted in Figure 1, adopts a symmetrical U-shaped structure. The left pathway is responsible for feature extraction and encoding, while the right pathway handles feature reconstruction and decoding. The left pathway comprises convolutional layers and pooling layers that progressively increase channel numbers and reduce feature map sizes to extract shallow target features. Conversely, the right pathway consists of convolutional layers and upsampling layers that gradually decrease channel numbers and enlarge feature map sizes to extract deep target features. Between these pathways, skip connections are employed to integrate shallow features from the left pathway with deep features from the right pathway, thereby achieving comprehensive recognition and segmentation of the target [22].
In the feature extraction process of the UNet network, redundant features, such as background noise [23], may be present. These features contribute minimally to the final segmentation task but significantly increase the computational burden of the model. The SimAM with slicing (SWS) module effectively mitigates these redundant features by learning weights for each feature map, thereby enhancing feature extraction efficiency. Moreover, when processing objects with complex shapes in UNet, challenges like blurred edges or inaccurate shapes can arise. Deformable strip convolution (DSCN), through learning offsets, adaptively adjusts the receptive field of convolutional kernels to better suit irregularly shaped objects like fractures and enhance model adaptability to complex shapes. Furthermore, compared to traditional convolution operations, DSCN has fewer parameters and lower computational costs, which effectively reduces the computational burden while optimizing resource utilization. The incorporation of SWS and DSCN also deepens feature learning by facilitating a better fusion of shallow and deep features, ultimately leading to improved segmentation accuracy [24].
After preprocessing, the logging images are fed into the encoding section, which sequentially undergoes two convolutional layers and a pooling layer in a repetitive manner. This progressive process gradually increases the number of channels while reducing the size, resulting in an output feature size of 1024 × 16 × 16 . In the decoding section, through transposed convolution and concatenation operations, the channel count is initially doubled and subsequently reduced to ultimately generate a feature map with identical dimensions as the input image. In Figure 1, both the encoder’s and the decoder’s conventional convolutional layers are substituted with DSCN for enhanced performance, while SWS is incorporated after upsampling in both sections.

2.2. DSCN Principle and Design

The convolution operation in UNet is standardized, featuring a fixed receptive field that lacks adaptability to the shape variations of the target object. This inherent limitation hampers UNet’s effectiveness in extracting features from complex object shapes. Deformable convolutional networks v3 (DCNv3) introduces deformable convolution [25] operations that enhance adaptability to the target object’s shape. Moreover, DCNv3 employs a convolutional kernel with a global receptive field [26], enabling better capture of relationships between target objects. By learning sampling position offsets and modulation masks, the convolutional kernel can dynamically adjust to different shapes, thereby facilitating more effective feature extraction from the target object. However, when employing large kernel convolutions in DCNv3, there is a significant increase in computational cost—particularly during training—as it necessitates storing numerous offsets and modulation masks, resulting in high memory consumption and slower training speeds.
The paper thus proposes a simplification of DCNv3, resulting in the development of DSCN, a lightweight deformable convolution operation. The specific steps involved in this simplification process are as follows:
The computational cost of DCNv3 increases quadratically as the kernel size grows due to its sampling in both the horizontal and the vertical directions. For instance, a 3 × 3 DCNv3 convolutional kernel necessitates 9 sampling points, while a 5 × 5 DCNv3 convolutional kernel requires 25 sampling points.
Sampling first along the $x$-axis and then along the $y$-axis gives

$$y_1(i_0, j_0) = \sum_{g=1}^{G}\sum_{j=0}^{K_w-1} w_1\, m_{0,j}^{1}\, x\big(i_0,\ j_0 + j + p_j^{g}\big) \qquad (1)$$

$$y_2(i_0, j_0) = \sum_{g=1}^{G}\sum_{i=0}^{K_h-1} w_2\, m_{i,0}^{2}\, y_1\big(i_0 + i + p_i^{g},\ j_0\big) \qquad (2)$$

Here, $p_k$ represents the spatial coordinate offset of the $k$-th sampling point of the convolution kernel, where $k$ corresponds to $(i, j)$ in spatial coordinates. $G$ is the number of deformable groups, and $\sum_{g=1}^{G}$ denotes independent sampling based on the learned offsets of each deformable group. $y_1(i_0, j_0)$ and $y_2(i_0, j_0)$ denote the values of the output feature map at position $(i_0, j_0)$ after convolution along the $x$-axis and the $y$-axis, respectively. $m_{0,j}^{1}$ and $m_{i,0}^{2}$ are the modulation-mask weights, $w_1$ and $w_2$ are the linear-mapping weights, and $p_j^{g}$ and $p_i^{g}$ denote the offsets of the $j$-th and $i$-th sampling points of the $g$-th deformable group. $K_w$ and $K_h$ represent the sampling sizes along the $x$-axis and the $y$-axis.

Substituting Equation (1) into Equation (2) yields the final output:

$$y_2(i_0, j_0) = \sum_{g=1}^{G}\sum_{i=0}^{K_h-1}\sum_{j=0}^{K_w-1} w\, m_{i,j}\, x\big(i_0 + i + p_i^{g},\ j_0 + j + p_j^{g}\big) \qquad (3)$$
As shown in Figure 2, the DSCN technique restricts sampling to a single direction, such as the $x$-axis or the $y$-axis, so that a 3 × 3 DSCN convolutional kernel utilizes only three sampling points and a 5 × 5 DSCN convolutional kernel requires just five. Constraining the sampling direction in this way effectively reduces the computational burden of DSCN, rendering it more suitable for lightweight CNN architectures.
$$y_1(i_0, j_0) = \sum_{g=1}^{G}\sum_{j=0}^{K_w-1} w\, x\big(i_0,\ j_0 + j + p_j^{g}\big) \qquad (4)$$

$$y_2(i_0, j_0) = \sum_{g=1}^{G}\sum_{i=0}^{K_h-1} y_1\big(i_0 + i + p_i^{g},\ j_0\big) \qquad (5)$$
Secondly, DCNv3 employs bilinear interpolation for sampling, a relatively intricate technique with high computational overhead: it requires two linear interpolations and one weighted summation. Assuming a sampling point $(x_0, y_0)$ lies within the unit-spaced grid cell whose corners are defined by $(x_1, y_1)$ and $(x_2, y_2)$, $f(x_0, y_0)$ is computed from the four corner values:

$$f(x_0, y_0) = f(x_1, y_1)(x_2 - x_0)(y_2 - y_0) + f(x_2, y_1)(x_0 - x_1)(y_2 - y_0) + f(x_1, y_2)(x_2 - x_0)(y_0 - y_1) + f(x_2, y_2)(x_0 - x_1)(y_0 - y_1) \qquad (6)$$
The DSCN, in contrast, employs linear interpolation for sampling, which is a simpler interpolation method with reduced computational complexity. It only necessitates two linear interpolation operations.
$$f(x_0, y_1) = f(x_1, y_1)(x_2 - x_0) + f(x_2, y_1)(x_0 - x_1) \qquad (7)$$

$$f(x_0, y_0) = f(x_0, y_1)(y_2 - y_0) + f(x_0, y_2)(y_0 - y_1) \qquad (8)$$
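As a quick numeric check of Equations (6)–(8) on a unit-spaced grid (so $x_2 - x_1 = y_2 - y_1 = 1$), the short sketch below verifies that the two-step linear decomposition reproduces full bilinear interpolation; the corner values are made up for the example.

```python
def bilinear(f11, f21, f12, f22, x0, y0):
    """Full bilinear interpolation from four corner values, cf. Eq. (6)."""
    return (f11 * (1 - x0) * (1 - y0) + f21 * x0 * (1 - y0)
            + f12 * (1 - x0) * y0 + f22 * x0 * y0)

def two_step_linear(f11, f21, f12, f22, x0, y0):
    """Two 1D linear interpolations, cf. Eqs. (7) and (8)."""
    fx_y1 = f11 * (1 - x0) + f21 * x0   # along x at y = y1
    fx_y2 = f12 * (1 - x0) + f22 * x0   # along x at y = y2
    return fx_y1 * (1 - y0) + fx_y2 * y0

# The decomposition reproduces bilinear exactly; a single 1D step suffices
# when, as in DSCN, sampling positions vary along one axis only.
assert abs(bilinear(1, 2, 3, 4, 0.25, 0.5) - two_step_linear(1, 2, 3, 4, 0.25, 0.5)) < 1e-12
```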
The DCNv3 model employs a modulation mask to regulate the sampling weights, thereby increasing both the parameter count and the computational overhead. In contrast, DSCN eliminates the need for a modulation mask and instead utilizes a straightforward weight matrix to control the sampling weights, resulting in a significant reduction in complexity as well as a decreased parameter count and computational cost.
In summary, the input feature map undergoes a depthwise convolution (DWConv) initially to reduce dimensionality in the channel dimension. This is followed by a linear projection layer that generates offsets through a linear transformation. These offsets are utilized to control subsequent sampling positions, allowing the convolutional kernel to adaptively adjust the receptive field according to features of varying shapes and sizes. Subsequently, the input feature map is interpolated linearly based on these generated offsets using a linear function for estimating values between sampling points. The sampled feature map encompasses feature information surrounding the target object, thereby aiding in improved recognition and localization by the network. Finally, this sampled feature map passes through another DWConv operation to increase dimensionality in the channel dimension and restore its original number of channels within the feature map, ensuring effective connectivity with subsequent layers. Consequently, the output feature map contains DSCN-processed features specific to the target object.
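For illustration, the pipeline above might be sketched in PyTorch as follows, here for a horizontal strip kernel. The module name, the per-point weight vector standing in for the modulation mask, and all hyperparameters are our assumptions for exposition, not the authors' released implementation.

```python
# A minimal sketch of the DSCN idea: DWConv -> offset projection ->
# 1D linear-interpolation sampling -> DWConv (illustrative only).
import torch
import torch.nn as nn

class DeformableStripConv(nn.Module):
    """Horizontal strip convolution with learned per-point x-axis offsets."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.k = kernel_size
        # depthwise conv to gather spatial context before offset prediction
        self.dw_in = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # linear projection (1x1 conv): one x-offset per sampling point
        self.offset_proj = nn.Conv2d(channels, kernel_size, 1)
        # plain per-point weights replace DCNv3's modulation mask
        self.point_weight = nn.Parameter(torch.full((kernel_size,), 1.0 / kernel_size))
        # depthwise conv to mix channels after sampling
        self.dw_out = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        offsets = self.offset_proj(self.dw_in(x))                 # (B, K, H, W)
        base = torch.arange(w, device=x.device, dtype=x.dtype).view(1, 1, 1, w)
        out = torch.zeros_like(x)
        for k in range(self.k):
            # deformed x-position of the k-th strip sample at every pixel
            pos = (base + (k - self.k // 2) + offsets[:, k:k + 1]).clamp(0, w - 1)
            lo = pos.floor().long()                               # left neighbour
            hi = (lo + 1).clamp(max=w - 1)                        # right neighbour
            frac = pos - lo.to(pos.dtype)
            # 1D linear interpolation between the two neighbours (cf. Eq. (7))
            x_lo = torch.gather(x, 3, lo.expand(b, c, h, w))
            x_hi = torch.gather(x, 3, hi.expand(b, c, h, w))
            out = out + self.point_weight[k] * ((1 - frac) * x_lo + frac * x_hi)
        return self.dw_out(out)
```

A vertical counterpart swaps the roles of height and width, and stacking the two recovers the separable sampling of Equations (4) and (5).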

2.3. SWS Principle and Design

Based on the simple attention mechanism (SimAM), we have introduced a slicing operation and proposed SWS, aiming to address the limitations of traditional attention mechanisms in small target segmentation in imaging logging. Unlike most existing attention modules, SimAM can infer 3D attention weights by simultaneously considering both channel and spatial dimensions, thereby enhancing the network’s feature representation capabilities while maintaining its lightweight nature without increasing the parameter count. Drawing inspiration from the spatial inhibition theory in neuroscience, which suggests that active neurons inhibit surrounding neuron activity to highlight important information, we calculate 3D attention weights through the following three steps [27]:
The SimAM algorithm first establishes an energy function to quantify the degree of linear separability between a single neuron and its neighboring neurons within the same channel. A lower energy value indicates a greater distinction between this particular neuron and its surrounding counterparts, thereby highlighting its significance in visual processing.
$$e_t(w_t, b_t, \mathbf{y}, x_i) = (y_t - \hat{t})^2 + \frac{1}{M-1}\sum_{i=1}^{M-1}(y_o - \hat{x}_i)^2 \qquad (9)$$
Here, $\hat{t} = w_t t + b_t$ and $\hat{x}_i = w_t x_i + b_t$ are linear transforms of $t$ and $x_i$, where $t$ and $x_i$ are the target neuron and the other neurons in a single channel of the input feature $X \in \mathbb{R}^{C \times H \times W}$. $i$ indexes the spatial dimension, $M = H \times W$ is the number of neurons in that channel, and $w_t$ and $b_t$ are the weight and bias of the transform. Adopting binary labels ($y_t = 1$ for the target neuron and $y_o = -1$ for the others) and adding a regularizer, we obtain the final energy function:
$$e_t(w_t, b_t, \mathbf{y}, x_i) = \frac{1}{M-1}\sum_{i=1}^{M-1}\big({-1} - (w_t x_i + b_t)\big)^2 + \big(1 - (w_t t + b_t)\big)^2 + \lambda w_t^2 \qquad (10)$$
The next step involves deriving a closed-form solution for the energy function, which allows SimAM to efficiently calculate the significance of each neuron. Here, $\hat{\mu} = \frac{1}{M}\sum_{i=1}^{M} x_i$ and $\hat{\sigma}^2 = \frac{1}{M}\sum_{i=1}^{M}(x_i - \hat{\mu})^2$.
$$\mu_t = \frac{1}{M-1}\sum_{i=1}^{M-1} x_i \qquad (11)$$

$$\sigma_t^2 = \frac{1}{M-1}\sum_{i=1}^{M-1}(x_i - \mu_t)^2 \qquad (12)$$

$$w_t = -\frac{2(t - \mu_t)}{(t - \mu_t)^2 + 2\sigma_t^2 + 2\lambda} \qquad (13)$$

$$b_t = -\frac{1}{2}(t + \mu_t)\, w_t \qquad (14)$$

$$e_t^{*} = \frac{4(\hat{\sigma}^2 + \lambda)}{(t - \hat{\mu})^2 + 2\hat{\sigma}^2 + 2\lambda} \qquad (15)$$
The sigmoid function is employed to confine the value of the energy function $E$ within the range of 0 to 1, thereby obtaining the weight of each neuron. SimAM ultimately uses the significance of each individual neuron as its attention weight, performing element-wise multiplication with the feature map to obtain a weighted feature map:

$$\tilde{X} = \mathrm{sigmoid}\!\left(\frac{1}{E}\right) \odot X \qquad (16)$$
In Figure 3, the SimAM model employs a multi-head attention mechanism [28], facilitated by multi-layer stacking, to effectively capture long-range dependencies within the input sequence. By incorporating residual connections, the output of the multi-head attention layer is integrated with the input sequence, preserving all pertinent information. Furthermore, a normalization layer applies layer normalization to the feedforward layer’s output, expediting training and enhancing model performance while facilitating learning and generation of the corresponding output sequence [29].
Building on this, the SWS module introduces a slicing operation that partitions the feature map into multiple non-overlapping blocks. This slicing operation enables the module to compute local statistics instead of global statistics, which is particularly advantageous for emphasizing small objects that may be overlooked within these blocks. Each block independently calculates its average pixel variation, effectively capturing the distinctiveness of small targets within that specific block. Small targets exhibiting deviations from the local average are assigned higher weights, thereby enhancing their features. Subsequently, after computing the weight for each block, the feature map is reassembled by applying these weights to ensure enhanced representation of small targets while maintaining or further amplifying features of larger targets as required.
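A minimal sketch of this sliced weighting is given below, assuming non-overlapping square blocks and the inverse-energy form of Equation (15); the function name, block size, and $\lambda$ are illustrative choices, not the authors' exact implementation.

```python
# SimAM-with-slicing (SWS) weighting computed per local block (illustrative).
import torch

def sws_attention(x: torch.Tensor, block: int = 8, lam: float = 1e-4) -> torch.Tensor:
    """SimAM-style 3D attention with statistics taken per non-overlapping block."""
    b, c, h, w = x.shape
    assert h % block == 0 and w % block == 0, "feature map must tile evenly"
    # group pixels into (block x block) tiles so the statistics stay local
    xb = x.view(b, c, h // block, block, w // block, block)
    n = block * block - 1
    mu = xb.mean(dim=(3, 5), keepdim=True)          # local mean per tile
    d = (xb - mu).pow(2)                            # squared deviation of each neuron
    v = d.sum(dim=(3, 5), keepdim=True) / n         # local variance estimate
    e_inv = d / (4 * (v + lam)) + 0.5               # inverse energy, cf. Eq. (15)
    return (xb * torch.sigmoid(e_inv)).view(b, c, h, w)
```

Because the mean and variance are computed inside each tile, a small target that deviates from its local surroundings receives a high weight even when it would vanish against global statistics.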

3. Results and Discussion

3.1. Dataset and Experimental Setup

The dataset utilized in this study comprises 34 imaging logging images, each with dimensions of 280 × 280 . These images were annotated using Labelme, with two target classes: cracks and background. Corresponding JSON label files were generated and subsequently converted into the PNG format required for UNet training. Offline data augmentation techniques, such as rotation and scaling, were employed to augment the dataset size by a factor of 10, resulting in a total of 340 images. The final images were resized to dimensions of 256 × 256 and underwent online data augmentation before being used for training.
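For illustration, one step of such offline augmentation might look like the torchvision sketch below; the transform set and parameter ranges are assumptions, and in a real segmentation pipeline the label mask must undergo identical geometric transforms.

```python
from PIL import Image
from torchvision import transforms

# stand-in for one 280 x 280 logging image; real code loads from disk, and the
# corresponding label mask must receive the same geometric transforms
img = Image.new("RGB", (280, 280))

augment = transforms.Compose([
    transforms.RandomRotation(degrees=30),                # random rotation
    transforms.RandomResizedCrop(256, scale=(0.7, 1.0)),  # random scale, crop to 256
])
variants = [augment(img) for _ in range(9)]               # 9 variants + original = 10x
```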
The experimental hardware setup comprises a Windows 10 system equipped with an Intel(R) Xeon(R) CPU @ 2.20 GHz (6 cores) and an NVIDIA GeForce RTX 3060 GPU with 8 GB of memory. The software environment encompasses Python 3.10, PyTorch 2.0.1, and CUDA 11.8.
Considering the relatively limited size of the dataset utilized in this paper, it is crucial to obtain a more robust and reliable evaluation of model performance. Therefore, we have employed k-fold cross-validation with k set to 7. The specific steps are as follows: Firstly, the entire dataset is divided into seven equally sized subsets. Subsequently, seven rounds of training and validation are conducted. In each round, the i-th subset serves as the validation set, while the remaining six subsets are combined to form the training set. The model undergoes training on this set and is then tested on the validation set, with performance metrics being recorded accordingly. Finally, an average of these performance metric results from all seven rounds ( S 1 ~ S 7 ) is calculated to derive a comprehensive evaluation of the model’s performance [30].
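A compact sketch of this protocol, assuming scikit-learn's KFold, is shown below; the training-and-validation routine is a placeholder for fitting SWSDS-Net and returning a metric.

```python
import numpy as np
from sklearn.model_selection import KFold

def train_and_validate(train_idx: np.ndarray, val_idx: np.ndarray) -> float:
    """Placeholder: fit SWSDS-Net on train_idx, return a metric (e.g. D) on val_idx."""
    return 0.0

indices = np.arange(340)                              # the 340 augmented images
kf = KFold(n_splits=7, shuffle=True, random_state=0)  # the seven folds S1..S7
fold_scores = [train_and_validate(tr, va) for tr, va in kf.split(indices)]
print("average over S1-S7:", np.mean(fold_scores))    # final reported score
```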

3.2. Evaluation Metrics

The evaluation metrics employed in this paper encompass intersection over union (I), precision (P), recall (R), and the dice coefficient (D). True positive (TP) denotes the accurate prediction of positive samples as positive, false positive (FP) signifies the erroneous prediction of negative samples as positive, and false negative (FN) indicates the incorrect prediction of positive samples as negative [31].
$$I = \frac{TP}{TP + FP + FN} \qquad (17)$$

$$P = \frac{TP}{TP + FP} \qquad (18)$$

$$R = \frac{TP}{TP + FN} \qquad (19)$$

$$D = \frac{2\,TP}{2\,TP + FP + FN} \qquad (20)$$
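These four metrics follow directly from pixel counts over binary masks; the helper below is a small sketch assuming boolean prediction and ground-truth arrays.

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> dict:
    """pred, gt: boolean masks of equal shape (True = fracture pixel)."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return {
        "I": tp / (tp + fp + fn + eps),          # intersection over union, Eq. (17)
        "P": tp / (tp + fp + eps),               # precision, Eq. (18)
        "R": tp / (tp + fn + eps),               # recall, Eq. (19)
        "D": 2 * tp / (2 * tp + fp + fn + eps),  # dice coefficient, Eq. (20)
    }
```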

3.3. Attention Module Comparison Experiments

In the comparative experiments, various attention mechanisms were incorporated into the UNet baseline model. SE [32] employs 1D attention weights, Coordinate and CBAM employ 2D attention weights [33], and Biformer [34] employs 3D attention weights. Comparatively speaking, the Coordinate structure is more intricate, as it necessitates constructing an attention module and integrating it into the network, along with introducing additional parameters such as convolutional kernels and fully connected layers. Similar to Coordinate, both SE and CBAM introduce extra parameters and possess relatively complex structures. Although Biformer can capture more intricate feature interactions through self-attention mechanisms, it requires a substantial number of parameters, has a complex structure, and incurs high computational costs. Overall, SWS introduces far fewer additional parameters than the alternatives while achieving comparable or better inference speed and accuracy [35].
As shown in Table 1, the UNet baseline has a P of 68.4%, an R of 51.0%, and a D of 58.4%. SE and CBAM offer comparable performance, with P of 69.3% and 68.2%, R around 51.6–51.9%, and D around 59.0%, each adding approximately 590 K parameters; they add many parameters for a rather limited accuracy gain and are therefore not favorable choices for this task. Biformer has a P of 65.0%, an R of 48.3%, and a D of merely 55.4%, while adding the largest number of parameters (4.67 M); introducing the most parameters yet suffering a notable decline in accuracy, it is the poorest choice. Coordinate achieves a P of 69.9%, a recall similar to that of the baseline, and a D of 59.4% with 560 K additional parameters. SWS shows the smallest parameter increase (460 K) among all the attention modules while improving accuracy significantly: P and R reach 70.8% and 53.4%, respectively, and D increases to 60.9%. On the whole, SWS proves to be an outstanding choice for this task.

3.4. Model Comparison Experiments

The effectiveness of the proposed SWSDS-Net for imaging logging fracture segmentation was verified through comparison experiments conducted with four well-established models, namely, UNet, UNetPlus, UNet++, and 3D-UNet, all under identical conditions.
In Table 2, in terms of P, the average of the UNet baseline is 0.694, that of 3D-UNet is 0.636, that of UNetPlus is 0.639, that of UNet++ is 0.664, and that of SWSDS-Net is 0.712; SWSDS-Net performs best in P, followed by UNet, UNet++, UNetPlus, and 3D-UNet. Regarding R, UNet averages 0.510, 3D-UNet 0.570, UNetPlus 0.334, UNet++ 0.653, and SWSDS-Net 0.608; here UNet++ performs best, followed by SWSDS-Net, 3D-UNet, UNet, and UNetPlus. Concerning I, UNet averages 0.663, 3D-UNet 0.670, UNetPlus 0.584, UNet++ 0.672, and SWSDS-Net 0.676; SWSDS-Net again performs best, followed by UNet++, 3D-UNet, UNet, and UNetPlus. Finally, with respect to D, UNet averages 0.578, 3D-UNet 0.594, UNetPlus 0.411, UNet++ 0.650, and SWSDS-Net 0.649; UNet++ edges out SWSDS-Net by only 0.001, followed by 3D-UNet, UNet, and UNetPlus. Overall, SWSDS-Net leads in P and I and is essentially tied for the best D, giving it the strongest overall performance among the evaluated models.
The experimental results demonstrate that UNet++ also delivers strong performance, leading in R and effectively tying SWSDS-Net in D. However, due to its complex network structure, large number of parameters, and high computational cost, it is not suitable for on-site deployment on mobile devices for imaging logging fracture segmentation. In contrast, the proposed SWSDS-Net achieves high accuracy while remaining lightweight, without introducing substantial additional parameters or significantly increasing computational costs, and it effectively improves small-target detection accuracy. Although 3D-UNet has the capability to handle 3D data and capture 3D features, it does not perform well in this specific scenario. UNetPlus reduces the number of channels to enhance model efficiency but sacrifices some accuracy.
In summary, SWSDS-Net effectively enhances the performance of fracture segmentation tasks in imaging logging by introducing the SWS and DSCN modules. These modules establish a closed loop between the encoder and decoder of UNet, where the feature maps from the encoder undergo upsampling and are concatenated with the feature maps from the decoder. Subsequently, the SWS module is reintegrated into the encoder to further enhance its representational capacity. This design effectively exploits SWS’s fully 3D attention mechanism, thereby improving segmentation accuracy. The figures depicting segmentation results demonstrate that SWSDS-Net yields minimal noise and interference in segmented targets. Through objective network performance comparisons, segmentation result analysis, and subjective visual evaluation, SWSDS-Net’s effectiveness is convincingly demonstrated.
As Figure 4 shows, all of these networks perform admirably in simpler scenes with minimal noise and interference. In more intricate scenarios, however, UNetPlus exhibits a higher susceptibility to noise interference, resulting in notable false positives and misdetections. This vulnerability can be attributed to the complex structure of UNetPlus, which incorporates nested UNet structures and deep supervision mechanisms; the model tends to overfit noise and struggles to differentiate fracture features from noise features, compromising segmentation accuracy. Clearly outperforming the other models considered here, SWSDS-Net demonstrates superior segmentation performance. The SWS attention mechanism ensures equitable attention allocation and enhancement for both large and small objects, maximizing the detection of small targets without omissions. Moreover, DSCN’s deformable properties enable adaptive sampling based on fracture shapes, facilitating improved extraction of fracture features. Additionally, SWSDS-Net adeptly captures intricate details, leading to near-perfect segmentation of fracture shapes while avoiding issues such as blurred edges or inaccurate shapes.

4. Conclusions

This paper presents an end-to-end fracture segmentation algorithm for logging images, aiming to address the low segmentation accuracy of current algorithms when dealing with poor image quality, high noise interference, and complex shapes, as well as the lack of lightweight models suitable for on-site deployment. The success of SWSDS-Net can be attributed to replacing standard convolution in UNet with the lightweight DSCN. By utilizing deformable convolution operations, DSCN enhances adaptability to target object shapes and achieves a global receptive field, enabling better capture of relationships between target objects. Additionally, the introduction of the SWS module after upsampling improves the network’s ability to capture global fracture information while maintaining a lightweight structure. This enhancement significantly boosts detection performance for small targets [36].
The experimental results demonstrate that, in comparison to currently prevalent fracture segmentation algorithms, the proposed SWSDS-Net exhibits superior overall performance and is better suited for fracture segmentation scenarios in imaging logging. Moreover, the lightweight model effectively addresses the challenge of on-site deployment on mobile devices. This paper holds significant practical value, as it provides a more dependable foundation for oil and gas exploration and development while promoting the application of deep learning technology in industrial domains. Furthermore, it enables deep learning to seamlessly facilitate traditional industrial scenarios, thereby reducing labor costs [37].
However, the proposed algorithm for logging fracture segmentation still has limitations in processing multimodal logging data. Multimodal logging data encompass diverse types of logging curves that reflect distinct rock physical properties and exhibit different feature representations. Furthermore, the fusion of multimodal data significantly amplifies the complexity of the model, resulting in a substantial increase in training time and computational costs. Therefore, future work will focus on incorporating multimodal feature fusion techniques to facilitate better integration of features from different modalities and extract more comprehensive feature information. Additionally, a modality-adaptive learning module will be devised to enable the model to learn appropriate strategies for feature extraction and fusion based on the characteristics of each modality. Building upon this foundation, lightweight model design will be considered by employing lighter feature extraction modules or attention mechanisms to reduce model complexity.

Author Contributions

Conceptualization, Q.Y., A.L. and L.Z.; software, A.L. and L.Z.; validation, Z.X. and Y.Q.; writing—original draft preparation, Q.Y.; writing—review and editing, Q.Y. and L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This article was supported by the Xinjiang Natural Science Foundation Outstanding Youth Science Foundation Project (2024D01E08) “Research on Ultra-Deep Seismic Imaging Mechanism and All-round Velocity Modeling Method in Central Tarim Basin”, the Research Foundation of China University of Petroleum-Beijing at Karamay “Study on analysis and correction method of wide azimuth seismic azimuth anisotropy in shale oil exploration” (No. XQZX20240029), “Evaluation of shale formation compressibility based on seismic petrophysics” (No. XQZX20240015), and the “Tianchi Talent” Introduction Plan Project of Xinjiang Uygur Autonomous Region, China.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to thank the anonymous reviewers and members of the editorial team for their comments and contributions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  2. Zhao, L.; Zhao, X.; Liu, J.; Wang, S.; Ren, W. Characteristics of Paleogene stratigraphic and lithologic reservoirs and its exploration direction in Jizhong Depression. Acta Pet. Sin. 2009, 30, 492. [Google Scholar]
  3. Zhang, W.; Wu, T.; Li, Z.P.; Liu, S.Y.; Qiu, A.O.; Li, Y.J.; Shi, Y.B. Fracture recognition in ultrasonic logging images via unsupervised segmentation network. Earth Sci. Inform. 2021, 14, 955–964. [Google Scholar] [CrossRef]
  4. He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  5. Conneau, A.; Schwenk, H.; Le Cun, Y.; Barrault, L.; Assoc Computat, L. Very Deep Convolutional Networks for Text Classification. In Proceedings of the 15th Conference of the European-Chapter of the Association-for-Computational-Linguistics (EACL), Valencia, Spain, 3–7 April 2017; pp. 1107–1116. [Google Scholar]
  6. Tan, M.X.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019. [Google Scholar]
  7. Sudre, C.H.; Li, W.Q.; Vercauteren, T.; Ourselin, S.; Cardoso, M.J. Generalised Dice Overlap as a Deep Learning Loss Function for Highly Unbalanced Segmentations. In Proceedings of the 3rd MICCAI International Workshop on Deep Learning in Medical Image Analysis (DLMIA)/7th International Workshop on Multimodal Learning for Clinical Decision Support (ML-CDS), Quebec, QC, Canada, 14 September 2017; pp. 240–248. [Google Scholar]
  8. Mohajerani, S.; Saeedi, P. Cloud and Cloud Shadow Segmentation for Remote Sensing Imagery Via Filtered Jaccard Loss Function and Parametric Augmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 4254–4266. [Google Scholar] [CrossRef]
  9. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.M.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the 16th IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007. [Google Scholar]
  10. Alzubaidi, F.; Makuluni, P.; Clark, S.R.; Lie, J.E.; Mostaghimi, P.; Armstrong, R.T. Automatic fracture detection and characterization from unwrapped drill-core images using mask R-CNN. J. Pet. Sci. Eng. 2022, 208, 109471. [Google Scholar] [CrossRef]
  11. Dias, L.O.; Bom, C.R.; Faria, E.L.; Valentín, M.B.; Correia, M.D.; de Albuquerque, M.P.; de Albuquerque, M.P.; Coelho, J.M. Automatic detection of fractures and breakouts patterns in acoustic borehole image logs using fast-region convolutional neural networks. J. Pet. Sci. Eng. 2020, 191, 107099. [Google Scholar] [CrossRef]
  12. Li, C.; Zou, C.C.; Peng, C.; Lan, X.X.; Zhang, Y.Y. Intelligent identification and segmentation of fractures in images of ultrasonic image logging based on transfer learning. Fuel 2024, 369, 131694. [Google Scholar] [CrossRef]
  13. Yu, H.; Pan, B.; Guo, Y.; Li, Y.; Han, R.; Wang, Y.; Zhang, P.; Wang, X. Automatic fracture identification from logging images using the TSCODE-SIMAM-YOLOv5 algorithm. Geoenergy Sci. Eng. 2024, 243, 213319. [Google Scholar] [CrossRef]
  14. Xiong, B.; Hong, R.; Liu, R.; Wang, J.; Zhang, J.; Li, W.; Lv, S.; Ge, D. FCT-Net: A dual-encoding-path network fusing atrous spatial pyramid pooling and transformer for pavement crack detection. Eng. Appl. Artif. Intell. 2024, 137, 109190. [Google Scholar] [CrossRef]
  15. Xiong, B.; Hong, R.; Wang, J.; Li, W.; Zhang, J.; Lv, S.; Ge, D. DefNet: A multi-scale dual-encoding fusion network aggregating Transformer and CNN for crack segmentation. Constr. Build. Mater. 2024, 448, 138206. [Google Scholar] [CrossRef]
  16. Zhang, J.; Zeng, Z.; Sharma, P.K.; Alfarraj, O.; Tolba, A.; Wang, J. A dual encoder crack segmentation network with Haar wavelet-based high–low frequency attention. Expert Syst. Appl. 2024, 256, 124950. [Google Scholar] [CrossRef]
  17. Liang, J.; Gu, X.; Jiang, D.; Zhang, Q. CNN-based network with multi-scale context feature and attention mechanism for automatic pavement crack segmentation. Autom. Constr. 2024, 164, 105482. [Google Scholar] [CrossRef]
  18. Yang, L.X.; Zhang, R.Y.; Li, L.D.; Xie, X.H. SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning (ICML), Virtual, 18–24 July 2021. [Google Scholar]
  19. Zhang, J.; Liu, B.W.; Zhang, H.Y.; Zhang, L.; Wang, F.X.; Chen, Y.B. A small object detection network for remote sensing based on CS-PANet and DSAN. Multimed. Tools Appl. 2024, 83, 72079–72096. [Google Scholar] [CrossRef]
  20. Xie, S.N.; Tu, Z.W. Holistically-Nested Edge Detection. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 1395–1403. [Google Scholar]
  21. Song, M.; Guo, P. A Combinatorial Optimization Method for Remote Sensing Image Fusion with Contourlet and HSI Transform. J. Comput.-Aided Des. Graph. 2012, 24, 83–88. [Google Scholar]
  22. Du, G.G.; Wang, K.; Lian, S.G.; Zhao, K.Y. Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: A review. Artif. Intell. Rev. 2021, 54, 1677–1734. [Google Scholar] [CrossRef]
  23. Wang, X.R.; Zhu, Y.J.; Liu, H.Z. Surface defect detection for intricate pattern fabrics based on deep learning. Meas. Sci. Technol. 2024, 35, 105412. [Google Scholar] [CrossRef]
  24. Guan, Y.; Cui, Z.; Zhou, W.J. Fast autofocusing in off-axis digital holography based on search region segmentation and dichotomy. Opt. Laser Technol. 2025, 181, 111876. [Google Scholar] [CrossRef]
  25. Zhang, D.D.; Zhang, Z.Q.; Chen, N.G.; Wang, Y. Dynamic convolutional time series forecasting based on adaptive temporal bilateral filtering. Pattern Recognit. 2025, 158, 110985. [Google Scholar] [CrossRef]
  26. Xie, T.; Sun, Q.H.; Sun, T.; Zhang, J.H.; Dai, K.; Zhao, L.J.; Wang, K.; Li, R.F. DVDS: A deep visual dynamic slam system. Expert Syst. Appl. 2025, 260, 125438. [Google Scholar] [CrossRef]
  27. Chen, X.Y.; Luo, J.H.; Ren, Y.; Cui, T.; Zhang, M. MAFNet: A two-stage multiple attention fusion network for partial-to-partial point cloud registration. Meas. Sci. Technol. 2024, 35, 125113. [Google Scholar] [CrossRef]
  28. Li, N.H.; Liu, S.J.; Liu, Y.Q.; Zhao, S.; Liu, M. Neural Speech Synthesis with Transformer Network. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence/31st Innovative Applications of Artificial Intelligence Conference/9th AAAI Symposium on Educational Advances in Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 6706–6713. [Google Scholar]
  29. Wang, X.L.; Girshick, R.; Gupta, A.; He, K.M. Non-local Neural Networks. In Proceedings of the 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803. [Google Scholar]
  30. Vehtari, A.; Gelman, A.; Gabry, J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat. Comput. 2017, 27, 1413–1432. [Google Scholar] [CrossRef]
  31. Fu, J.; Liu, J.; Tian, H.J.; Li, Y.; Bao, Y.J.; Fang, Z.W.; Lu, H.Q.; Soc, I.C. Dual Attention Network for Scene Segmentation. In Proceedings of the 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 3141–3149. [Google Scholar]
  32. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  33. Woo, S.H.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  34. Zhu, L.; Wang, X.J.; Ke, Z.H.; Zhang, W.; Lau, R. BiFormer: Vision Transformer with Bi-Level Routing Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 10323–10333. [Google Scholar]
  35. Ao, D.; Zhou, L.; Luo, M.; Wang, W. A novel method of fracture segmentation for image log interpretation based on attention mechanisms and convolutional neural networks. J. Geophys. Prospect. Pet. 2023, 62, 236–244. [Google Scholar]
  36. Tong, W.; Chen, W.T.; Han, W.; Li, X.J.; Wang, L.Z. Channel-Attention-Based DenseNet Network for Remote Sensing Image Scene Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 4121–4132. [Google Scholar] [CrossRef]
  37. Huang, Z.P.; Liu, J.W.; Li, L.; Zheng, K.C.; Zha, Z.J. The Association for the Advancement of Artificial Intelligence. Modality-Adaptive Mixup and Invariant Decomposition for RGB-Infrared Person Re-identification. In Proceedings of the 36th AAAI Conference on Artificial Intelligence/34th Conference on Innovative Applications of Artificial Intelligence/12th Symposium on Educational Advances in Artificial Intelligence, Virtual, 22 February–1 March 2022; pp. 1034–1042. [Google Scholar]
Figure 1. SWSDS-Net network architecture.
Figure 2. Comparison of bilinear interpolation and linear interpolation.
Figure 3. Comparison of different attentions: (a) channel-wise attention, (b) spatial-wise attention, and (c) full 3-D weights for attention.
Figure 4. The segmentation result of logging fracture images: (a) original picture, (b) the result of UNet, (c) the result of UNetPlus, (d) the result of UNet++, and (e) the result of SWSDS-Net.
Table 1. Comparisons of different attention performance.

| Model      | P (%) | R (%) | D (%) | Params  |
|------------|-------|-------|-------|---------|
| Base       | 68.4  | 51.0  | 58.4  |         |
| SWS        | 70.8  | 53.4  | 60.9  | +460 K  |
| Coordinate | 69.9  | 51.7  | 59.4  | +560 K  |
| SE         | 69.3  | 51.9  | 59.3  | +590 K  |
| CBAM       | 68.2  | 51.6  | 58.7  | +590 K  |
| Biformer   | 65.0  | 48.3  | 55.4  | +4.67 M |
Table 2. Test results of each model.

| Parameter | Model     | S1    | S2    | S3    | S4    | S5    | S6    | S7    | Average |
|-----------|-----------|-------|-------|-------|-------|-------|-------|-------|---------|
| P         | UNet      | 0.741 | 0.660 | 0.635 | 0.763 | 0.732 | 0.685 | 0.645 | 0.694   |
|           | 3D-UNet   | 0.710 | 0.535 | 0.687 | 0.734 | 0.617 | 0.617 | 0.544 | 0.636   |
|           | UNetPlus  | 0.674 | 0.690 | 0.561 | 0.681 | 0.509 | 0.667 | 0.599 | 0.639   |
|           | UNet++    | 0.718 | 0.716 | 0.655 | 0.756 | 0.545 | 0.635 | 0.628 | 0.664   |
|           | SWSDS-Net | 0.789 | 0.775 | 0.702 | 0.811 | 0.564 | 0.692 | 0.647 | 0.712   |
| R         | UNet      | 0.387 | 0.667 | 0.653 | 0.391 | 0.495 | 0.577 | 0.403 | 0.510   |
|           | 3D-UNet   | 0.453 | 0.613 | 0.593 | 0.512 | 0.599 | 0.675 | 0.538 | 0.570   |
|           | UNetPlus  | 0.210 | 0.471 | 0.535 | 0.095 | 0.497 | 0.356 | 0.178 | 0.334   |
|           | UNet++    | 0.521 | 0.662 | 0.702 | 0.552 | 0.703 | 0.727 | 0.678 | 0.653   |
|           | SWSDS-Net | 0.534 | 0.642 | 0.701 | 0.512 | 0.658 | 0.714 | 0.503 | 0.608   |
| I         | UNet      | 0.615 | 0.705 | 0.682 | 0.644 | 0.670 | 0.693 | 0.634 | 0.663   |
|           | 3D-UNet   | 0.636 | 0.650 | 0.687 | 0.681 | 0.684 | 0.703 | 0.652 | 0.670   |
|           | UNetPlus  | 0.530 | 0.647 | 0.633 | 0.503 | 0.626 | 0.618 | 0.537 | 0.584   |
|           | UNet++    | 0.645 | 0.710 | 0.694 | 0.663 | 0.633 | 0.717 | 0.643 | 0.672   |
|           | SWSDS-Net | 0.675 | 0.696 | 0.692 | 0.661 | 0.667 | 0.707 | 0.640 | 0.676   |
| D         | UNet      | 0.509 | 0.664 | 0.644 | 0.517 | 0.591 | 0.626 | 0.496 | 0.578   |
|           | 3D-UNet   | 0.553 | 0.571 | 0.637 | 0.603 | 0.608 | 0.645 | 0.541 | 0.594   |
|           | UNetPlus  | 0.320 | 0.599 | 0.548 | 0.167 | 0.503 | 0.464 | 0.274 | 0.411   |
|           | UNet++    | 0.604 | 0.688 | 0.678 | 0.638 | 0.614 | 0.678 | 0.652 | 0.650   |
|           | SWSDS-Net | 0.637 | 0.702 | 0.702 | 0.628 | 0.607 | 0.703 | 0.566 | 0.649   |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
