1. Introduction
Transmission lines serve as critical infrastructure for the transmission of electricity, playing a crucial role in modern industrial and urban social life [
1,
2]. However, these lines, exposed to natural environments over long periods, are prone to defects such as broken and loose strands, posing threats to the safe operation of electrical systems [
3,
4]. To ensure the reliable and safe operation of the power system, regular inspections of transmission lines are necessary. Initially, these inspections heavily relied on manual methods, which were both dangerous and inefficient [
5,
6,
7]. With the advancement of drone technology, UAVs are now employed for transmission line inspections. While UAVs significantly reduce the need for hazardous manual aerial work, the analysis of transmission line images captured under visible light by UAVs requires costly manual rechecking and often leads to high misdiagnosis rates. Given these challenges, the effective and precise detection of faults in transmission lines has become an urgent problem that needs to be addressed.
Before deep learning was applied to defect detection in power transmission lines, research on detecting broken and loose strands relied on conventional techniques. Some researchers employed non-destructive testing theories to inspect the lines, which offered higher sensitivity than several common methods at the time. However, because the structure of power transmission lines differs from that of other metal components, the effectiveness of this approach for defect recognition is often reduced. Komoda et al. [
8] used visual inspection methods to detect defects in the lines. This method collects line images and identifies defects by extracting line contours or comparing images. Although it is more direct than non-destructive testing, it is sensitive to the angle of the acquisition device and to background interference, and it cannot automatically identify defects in transmission line images. Subsequently, researchers used traditional methods and signal characteristics to inspect transmission lines. Cheng et al. [
9] proposed a method based on image space features to detect defects in the insulators of transmission lines. Yuan et al. [
10] used non-destructive ultrasonic phased array technology to detect composite insulators. Xiao et al. [
11] designed a new overhead ground line detection technology based on the effect of magnetic leakage signal on gap, elevation distance, defect width, and section loss rate.
With the rapid advancement of artificial intelligence [
12,
13], methods for detecting defects in transmission line images based on deep learning have been widely adopted for automatically modeling collected transmission line images, thereby facilitating the detection of defects such as broken and loose strands in transmission lines. Ni et al. [
14] adapted the traditional Faster R-CNN model and utilized the Inception-ResNet-v2 network as a foundational feature extractor to detect defects in critical components of transmission lines. Chen et al. [
15] proposed an enhanced Faster R-CNN network incorporating deformable convolutions and feature pyramid modules for the intelligent detection of transmission line defects. Fu et al. [
16] employed a three-channel feature fusion network to enhance feature extraction capabilities while preserving spatial and semantic information, achieving high-precision detection of transmission line defects.
In recent years, an increasing number of scholars have dedicated their efforts to researching efficient denoising algorithms to enhance the clarity of images. Yu et al. [
17] proposed a noise-reduction algorithm, the adaptive neighborhood weighted median filtering (NW-AMF) algorithm, to accurately identify insulator defects. The algorithm utilizes a weighted summation technique to calculate the median value of the neighborhood of a pixel point, effectively filtering out noise in the captured aerial images. Bhadra et al. [
18] presented a novel architecture for anomaly detection and classification of high-voltage transmission lines. The architecture utilizes a self-attentive convolutional neural network augmented with wavelet transform (WSAT-CNN). The WSAT-CNN model is designed to improve noise immunity and prioritize fault characteristics. Shen et al. [
19] designed a transmission line safety warning technology based on multi-source data sensing to address the issue of poor timeliness in traditional transmission line safety warnings. The multi-source data for the transmission line are acquired through preprocessing the transmission line video image, which includes histogram equalization, denoising, sharpening, edge detection, and segmentation.
The aforementioned studies provide various new perspectives on detecting faults in transmission lines. However, in practical situations, instances of transmission line faults are relatively rare compared with normal conditions, resulting in an imbalance between normal samples and samples with broken and loose strands. This significantly impacts the accuracy of line condition recognition. Additionally, directly applying photos captured by drones for image detection may lead to lower detection accuracy. Therefore, this paper proposes a method for detecting broken and loose strands in transmission lines based on multi-strategy image processing and an improved deep network. Firstly, to address the influence of lighting or other background factors on the recognition of captured images of broken and loose strands, multi-strategy image processing, including wavelet denoising-based image enhancement [
20,
21], HSV color space-based multi-threshold segmentation [
22], and morphological analysis, is proposed to process the images. Subsequently, GAN [
23] is employed to generate images of transmission lines to enhance the morphological diversity of the samples and reduce the effect of data imbalance. Finally, the deep network GoogLeNet is improved by replacing the original cross-entropy loss function with the focal loss function [
24] to enhance the defect recognition accuracy. The contributions of this paper are as follows:
- (1)
A multi-strategy image processing method is proposed to extract the transmission line area to reduce the interference of the environmental background in UAV photos;
- (2)
GAN is used to generate transmission line images, which enhance the diversity of sample morphology and reduce the impact of data imbalance;
- (3)
The focal loss function is introduced into the GoogLeNet feature extraction network so that the network can achieve higher fault detection accuracy in the case of an imbalance between class samples.
The rest of this paper is organized as follows:
Section 2 describes the process of multi-strategy processing of the transmission line image to extract the valid region.
Section 3 presents the transmission line defect detection method based on the improved deep network. Subsequently, in
Section 4, a real transmission line image dataset is used to test the performance of the proposed method. Finally, the conclusions are provided in
Section 5.
2. Multi-Strategy Image Processing
Extracting the line area is a prerequisite for the accurate detection of broken strand defects in transmission lines. In this paper, a multi-strategy image processing method, including wavelet denoising-based image enhancement, HSV color space-based multi-threshold segmentation, and morphological analysis, is used to extract the line regions. This method effectively reduces the disturbance of the transmission line by background noise, which lays the groundwork for the subsequent detection of transmission line defects.
2.1. Image Enhancement Based on Wavelet Denoising
The influence of the image acquisition device or its surrounding environment can negatively affect the image quality, resulting in noise in the image. These noisy signals will degrade the quality of the image and may confuse useful information in the image, thus reducing the stability and accuracy of image processing.
To reduce noise interference in image processing, this paper uses wavelet hard thresholding to denoise the collected transmission line images. The basic idea of wavelet hard threshold denoising is to separate the image signal from the noise by using the wavelet transform and then process the wavelet coefficients according to a set threshold. The hard threshold function sets the decomposition coefficients smaller than the threshold in the different-scale spaces to zero, while preserving the decomposition coefficients larger than the threshold. Wavelet hard thresholding effectively separates image information from noise, removes noise, and retains useful details. It is a simple, easy-to-implement method that is widely used in image denoising.
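The hard threshold rule applied to the wavelet coefficients is as follows:
$$\hat{w}_{j,k} = \begin{cases} w_{j,k}, & |w_{j,k}| \geq \lambda \\ 0, & |w_{j,k}| < \lambda \end{cases}$$
where $w_{j,k}$ is the wavelet coefficient, $\hat{w}_{j,k}$ is the coefficient retained after thresholding, and $\lambda$ is the critical threshold.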
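The hard threshold rule can be illustrated with a minimal sketch. It assumes a single-level, one-dimensional orthonormal Haar transform on an even-length signal, which is chosen here only for brevity; the wavelet basis, the decomposition depth, and the threshold value used in this paper are not specified, and an image pipeline would apply a two-dimensional, multi-level transform instead. The function names are hypothetical.

```python
import math

def haar_forward(signal):
    """One level of the orthonormal Haar transform (even-length input assumed)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward exactly."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out

def hard_threshold(coeffs, lam):
    """Keep coefficients with |w| >= lam and set the others to zero."""
    return [w if abs(w) >= lam else 0.0 for w in coeffs]

def denoise(signal, lam):
    """Threshold only the detail coefficients, then reconstruct."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, hard_threshold(detail, lam))

# Illustrative values: a small fluctuation (4.0 vs 4.2) and a strong edge (8.0 vs 2.0).
noisy = [4.0, 4.2, 8.0, 2.0]
clean = denoise(noisy, lam=1.0)  # lam is an assumed threshold
```

With this assumed threshold, the small fluctuation is flattened to its mean while the strong edge survives, which is the behavior that hard thresholding is meant to provide.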
2.2. Multi-Threshold Segmentation Based on HSV Color Space
Image segmentation is one of the most critical steps in image processing. It refers to the process of dividing the pixels of an image into several sets of regions, each of which represents an entity or the background in the image. The threshold method converts a gray image to a binary image by dividing it into two regions based on pixel values and a threshold. It is essentially a transformation from an input image $F$ to an output image $G$. The transformation formula is as follows:
$$G(i,j) = \begin{cases} 1, & F(i,j) \geq T \\ 0, & F(i,j) < T \end{cases}$$
where $T$ is the threshold value, $G(i,j) = 1$ for elements in the target region, and $G(i,j) = 0$ for elements in the background.
HSV multi-threshold segmentation is an image processing technique that uses the characteristics of the HSV color space to segment images by setting multiple thresholds. This method considers both color and brightness information, improving the accuracy of extracting target objects and features. By setting different thresholds, multiple target regions in an image can be segmented and extracted for subsequent image processing and analysis tasks. Since multi-threshold segmentation in the HSV color space reflects salient color features well, it is widely used in the field of image processing. Using the standard RGB-to-HSV conversion, the three components of the HSV color space corresponding to the image are:
$$V = \max$$
$$S = \begin{cases} \dfrac{\max - \min}{\max}, & \max \neq 0 \\ 0, & \max = 0 \end{cases}$$
$$H = \begin{cases} 0, & \max = \min \\ 60^\circ \times \dfrac{G - B}{\max - \min}, & \max = R \\ 60^\circ \times \left(2 + \dfrac{B - R}{\max - \min}\right), & \max = G \\ 60^\circ \times \left(4 + \dfrac{R - G}{\max - \min}\right), & \max = B \end{cases}$$
where $R$ represents the red channel in the original image, $G$ represents the green channel, $B$ represents the blue channel, max and min are the maximum and minimum of the three channel values of a pixel, $H$ represents the hue (with 360° added when the result is negative), $S$ represents the saturation, and $V$ represents the value (brightness) of the image.
Thresholding is one of the most commonly used segmentation methods that classifies pixels in an image according to their gray value with a preset threshold. In transmission line segmentation, it is difficult to extract the transmission line region with single thresholding. To address this, we use multi-threshold segmentation with threshold intervals to divide the image into regions and then apply morphological processing to remove background noise and isolate the transmission line region.
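A minimal sketch of interval-based multi-threshold segmentation in HSV space is given below. It uses the `colorsys` module from the Python standard library, which returns H, S, and V scaled to [0, 1] rather than degrees. The function name, the tiny test image, and the threshold intervals are all hypothetical; the intervals used in this paper are not stated.

```python
import colorsys

def hsv_mask(pixels, h_range, s_range, v_range):
    """Binary mask: 1 where a pixel's H, S and V all fall inside the given intervals.

    pixels is a list of rows of (R, G, B) tuples with values in 0..255.
    """
    mask = []
    for row in pixels:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            inside = (h_range[0] <= h <= h_range[1]
                      and s_range[0] <= s <= s_range[1]
                      and v_range[0] <= v <= v_range[1])
            mask_row.append(1 if inside else 0)
        mask.append(mask_row)
    return mask

# Illustrative 2 x 2 image: grayish cable pixels on the left,
# saturated vegetation and sky pixels on the right.
image = [
    [(200, 200, 205), (30, 120, 40)],
    [(190, 195, 200), (90, 160, 230)],
]

# Assumed intervals: any hue, low saturation, medium-to-high brightness.
mask = hsv_mask(image, h_range=(0.0, 1.0), s_range=(0.0, 0.2), v_range=(0.5, 1.0))
# mask == [[1, 0], [1, 0]]
```

Combining intervals on several channels is what distinguishes this from single thresholding: a pixel is kept only if it satisfies every interval, so a low-saturation constraint can reject colorful background that a brightness threshold alone would keep.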
2.3. Extraction of Transmission Line Regions Based on Morphological Processing
Through the multi-threshold segmentation method, multiple connected regions can be obtained, including the line region, the insulator region, the transmission line region, and many small areas of interference. Firstly, noise reduction and the closing operation are applied to the segmented image to eliminate small noise interference and form large connected target regions. Then, the areas of all connected regions are calculated, and the transmission line region is selected by screening on area and extracted. The formula for calculating the area of a connected region is as follows:
$$A = \sum_{i=1}^{n} \sum_{j=1}^{m} I(i,j)$$
where $A$ represents the area of the connected region, $I(i,j)$ represents the value of the pixel $(i,j)$, and $n$ and $m$ represent the width and height of the image, respectively.
The principle of this formula is that the values of all pixels in the connected region are summed, and the result is the area of the connected region. In the transmission line images analyzed in this study, the transmission lines occupy the largest area, so the corresponding connected region is the largest. The transmission line region can therefore be extracted by selecting the largest connected region.
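The area-based selection can be sketched as follows. The sketch assumes a binary mask stored as nested lists and 4-connectivity; the connectivity used in this paper is not stated, and the function names are hypothetical. For a binary mask, the area of a region equals its number of pixels, which matches summing the pixel values over the region.

```python
from collections import deque

def label_regions(binary):
    """Return the connected regions (4-connectivity) as lists of (row, col) pixels."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for i in range(rows):
        for j in range(cols):
            if binary[i][j] != 1 or seen[i][j]:
                continue
            # Breadth-first search collects every pixel reachable from (i, j).
            queue = deque([(i, j)])
            seen[i][j] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions

def largest_region_mask(binary):
    """Keep only the connected region with the largest area."""
    regions = label_regions(binary)
    keep = set(max(regions, key=len)) if regions else set()
    return [[1 if (i, j) in keep else 0 for j in range(len(binary[0]))]
            for i in range(len(binary))]

# Illustrative mask: a long line in the top row plus two small interference blobs.
segmented = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
areas = [len(region) for region in label_regions(segmented)]  # [5, 2, 1]
line_only = largest_region_mask(segmented)
```

Only the five-pixel region in the top row remains in `line_only`; the two smaller blobs are discarded as interference.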
3. Detection of Transmission Line Defects
Multi-strategy image processing can effectively remove the noise and extract the effective region of the transmission line. The following work will focus on defect diagnosis of transmission lines based on improved deep network. This innovative approach will help to improve classification accuracy for a small number of categories, making the model more focused on difficult-to-classify samples and thereby enhancing the overall performance of the detection system.
3.1. GAN
GAN was proposed by Goodfellow et al. [
25] and consists of two sub-networks, a generator and a discriminator, as illustrated in
Figure 1. The algorithm has demonstrated strong capabilities in learning data representations through the mutual competition of these two networks. The training strategy is defined by a minimax game in which both components are trained simultaneously. The generator (
G) extracts samples from a simple noise distribution, such as Gaussian or uniform distribution, maps them to the data space similar to the input real data, and aims to generate data that appear as realistic as possible through training. On the other hand, the discriminator (
D) is trained to maximize the probability of correctly identifying the source of input data. As a result of this adversarial training process, the distribution of generated fake data tends to approximate that of real data.
However, the original GAN uses the Jensen–Shannon divergence to measure the difference between the real and generated distributions, and this measure fails to reflect the disparity accurately when the two distributions barely overlap. This makes it hard for the generator to use gradients to optimize its parameters, often leading to poor sample quality in practice. To address this issue, Arjovsky et al. [
26] proposed the Wasserstein GAN (WGAN) algorithm. However, the weight clipping used in WGAN can lead to optimization difficulties. The parameters often converge to the clipping boundaries, so the discriminator tends to learn an overly simple mapping function and the fitting capability of WGAN is not fully realized. Weight clipping also easily causes vanishing or exploding gradients.
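To further enhance training stability, a gradient penalty (GP) [
27] was introduced on top of the original WGAN, leading to the development of WGAN with gradient penalty (WGAN-GP). The gradient penalty ensures that the discriminator meets the Lipschitz constraint. The gradient penalty term is defined as:
$$GP = \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\left[\left(\left\|\nabla_{\hat{x}} D(\hat{x})\right\|_2 - 1\right)^2\right]$$
where $\hat{x}$ represents the linear interpolation between real samples and generated samples:
$$\hat{x} = \varepsilon x + (1 - \varepsilon)\tilde{x}$$
where $x$ is a real sample, $\tilde{x}$ is a generated sample, and $\varepsilon$ is obtained by random sampling from the uniform distribution on [0, 1].
Building on the original WGAN optimization objective, the objective function of WGAN-GP is as follows:
$$L = \mathbb{E}_{\tilde{x} \sim P_g}\left[D(\tilde{x})\right] - \mathbb{E}_{x \sim P_r}\left[D(x)\right] + \lambda\, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\left[\left(\left\|\nabla_{\hat{x}} D(\hat{x})\right\|_2 - 1\right)^2\right]$$
where $P_r$ and $P_g$ are the distributions of the real and generated data, $\|\cdot\|_2$ represents the L2 norm, $\nabla$ is the gradient operator, and $\lambda$ is the coefficient of the gradient penalty term, which is set to 10.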
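The gradient penalty can be illustrated with a small framework-free sketch. It assumes a critic given as a plain Python function on lists of floats and estimates its gradient by central finite differences; this is for illustration only, since an actual training loop would obtain the gradient through the automatic differentiation of a deep learning framework. All function names and the toy critic are hypothetical, and only the penalty coefficient of 10 comes from the text.

```python
import math
import random

def numerical_gradient(critic, x, eps=1e-6):
    """Central finite-difference estimate of the critic's gradient at x."""
    grad = []
    for k in range(len(x)):
        hi, lo = list(x), list(x)
        hi[k] += eps
        lo[k] -= eps
        grad.append((critic(hi) - critic(lo)) / (2 * eps))
    return grad

def gradient_penalty(critic, real_batch, fake_batch, rng):
    """Mean of (||grad D(x_hat)||_2 - 1)^2 over random real/fake interpolations."""
    total = 0.0
    for x_real, x_fake in zip(real_batch, fake_batch):
        e = rng.random()  # interpolation coefficient drawn from U[0, 1)
        x_hat = [e * r + (1 - e) * f for r, f in zip(x_real, x_fake)]
        grad = numerical_gradient(critic, x_hat)
        norm = math.sqrt(sum(g * g for g in grad))
        total += (norm - 1.0) ** 2
    return total / len(real_batch)

def critic_loss(critic, real_batch, fake_batch, rng, lam=10.0):
    """WGAN-GP critic objective: E[D(fake)] - E[D(real)] + lam * GP."""
    mean_fake = sum(critic(x) for x in fake_batch) / len(fake_batch)
    mean_real = sum(critic(x) for x in real_batch) / len(real_batch)
    return mean_fake - mean_real + lam * gradient_penalty(critic, real_batch, fake_batch, rng)

# Toy linear critics: the gradient norm is 5 for the first and 1 for the second.
steep = lambda x: 3.0 * x[0] + 4.0 * x[1]
unit = lambda x: 0.6 * x[0] + 0.8 * x[1]

rng = random.Random(0)
real = [[1.0, 0.0], [0.0, 1.0]]
fake = [[0.0, 0.0], [1.0, 1.0]]
gp_steep = gradient_penalty(steep, real, fake, rng)  # close to (5 - 1)^2 = 16
gp_unit = gradient_penalty(unit, real, fake, rng)    # close to 0
```

The two toy critics show what the penalty rewards: a critic whose gradient norm is 1 at the interpolated points incurs almost no penalty, while a steeper critic is penalized quadratically.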
3.2. GoogLeNet
GoogLeNet [
28] is a deep convolutional neural network developed by the Google research team in 2014. It won first place in the ImageNet competition. The network structure is shown in
Figure 2. The network structure made some improvements on the basis of LeNet [
29] and AlexNet [
30], and introduced new design ideas and techniques so that the network could better deal with complex image classification tasks. Compared with traditional convolutional neural networks, GoogLeNet uses a parallel architecture called the “Inception” module, which is able to simultaneously extract features at different scales and merge them together. This design enhances the network’s ability to capture details and global information without increasing its depth or number of parameters.
The network structure of GoogLeNet contains 22 network layers, but the number of parameters is only 1/36 of that of VGGNet. This is mainly due to the design of the “Inception” module, which reduces the number of parameters by using convolution kernels of different sizes together with pooling operations and concatenating the feature maps at the end. This parameter efficiency enables GoogLeNet to perform training and inference with relatively few computational resources. GoogLeNet also uses a parallel structure, letting different convolution and pooling operations run simultaneously in separate branches. This design can accelerate the training process of the network, shorten the training period, and improve the convergence and generalization ability of the network. The success of GoogLeNet not only promoted the development of deep learning in the field of image classification but also provided important guidance for subsequent network design. Its innovation and efficiency helped deep learning research to enter a new phase, laying the foundation for more complex computer vision tasks. In this paper, the powerful feature extraction capability of GoogLeNet is used to extract features of transmission lines.
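One mechanism that contributes to this parameter efficiency in Inception-style modules is placing a 1 × 1 convolution before a larger kernel to reduce the channel count. The arithmetic below is a sketch of that effect; the channel sizes (192 input channels, a reduction to 16 channels, and 32 output channels) are illustrative assumptions and are not taken from this paper.

```python
def conv_params(kernel, c_in, c_out):
    """Number of weights plus biases in a kernel x kernel convolution layer."""
    return kernel * kernel * c_in * c_out + c_out

# A 5 x 5 convolution applied directly to 192 input channels.
direct = conv_params(5, 192, 32)

# The same 5 x 5 convolution preceded by a 1 x 1 reduction to 16 channels.
reduced = conv_params(1, 192, 16) + conv_params(5, 16, 32)

saving = direct / reduced  # roughly a tenfold reduction for these sizes
```

For these assumed sizes, the direct branch needs 153,632 parameters and the reduced branch 15,920, while producing an output of the same shape.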
3.3. Focal Loss
The focal loss function [
31] was proposed by Lin et al. as a solution to class imbalance in object detection. It weights the contribution of each sample to the loss according to its classification error: when the model classifies a sample correctly with high confidence, the loss contributed by that sample is reduced. This addresses class imbalance by indirectly focusing the loss on the challenging classes. To introduce the focal loss function, first consider the common binary cross-entropy loss function for classification, which is expressed as follows:
$$CE(p, y) = \begin{cases} -\log(p), & y = 1 \\ -\log(1 - p), & y = -1 \end{cases}$$
where $p \in [0, 1]$ is the probability estimated by the model for the class $y = 1$, and $y \in \{\pm 1\}$ represents the true class. To simplify the notation, and so that the focal loss function can be extended to multi-class classification, the parameter $p_t$ is defined as follows:
$$p_t = \begin{cases} p, & y = 1 \\ 1 - p, & y = -1 \end{cases} \tag{7}$$
According to Equation (7), the binary cross-entropy loss function is expressed as follows:
$$CE(p, y) = CE(p_t) = -\log(p_t) \tag{8}$$
The balanced binary cross-entropy loss function is introduced as follows:
$$CE(p_t) = -\alpha_t \log(p_t) \tag{9}$$
Equation (9) addresses class imbalance by applying a weight factor $\alpha$ to class 1 and $1 - \alpha$ to class −1, so that $\alpha_t$ is defined analogously to $p_t$. This formulation is a simple extension of the binary cross-entropy loss function, where $\alpha$ can be set to the inverse class frequency or treated as a hyperparameter chosen by cross-validation. The focal loss function is an extension of the cross-entropy loss function that includes a modulating weighting term. The formula is given in Equation (10):
$$FL(p_t) = -(1 - p_t)^{\gamma} \log(p_t) \tag{10}$$
where $\gamma$ is an adjustable non-negative parameter that controls how quickly the weight of well-classified samples is reduced. When $\gamma = 0$, the focal loss function is equivalent to the cross-entropy loss function, and as $\gamma$ increases, the effect of the modulating factor $(1 - p_t)^{\gamma}$ also increases. The $\alpha$-balanced form of the focal loss function is:
$$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$
where $\alpha$ is a fixed value between 0 and 1 used to balance the positive and negative labeled samples. Both $\alpha$ and $\gamma$ are adjustable parameters. The $\alpha$-balanced form is a general remedy for class imbalance, and it yields better classification accuracy than the non-$\alpha$-balanced form.
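The behavior of the focal loss can be checked numerically with a minimal sketch of the binary case. The function names are hypothetical, and the values γ = 2 and α = 0.25 are assumed defaults for illustration; the values used in this paper are not stated.

```python
import math

def p_t(p, y):
    """Probability assigned to the true class, with y in {+1, -1}."""
    return p if y == 1 else 1.0 - p

def cross_entropy(p, y):
    """Binary cross-entropy written in terms of p_t."""
    return -math.log(p_t(p, y))

def focal_loss(p, y, gamma=2.0, alpha=None):
    """Focal loss; passing alpha switches to the alpha-balanced form."""
    pt = p_t(p, y)
    loss = -((1.0 - pt) ** gamma) * math.log(pt)
    if alpha is not None:
        alpha_t = alpha if y == 1 else 1.0 - alpha
        loss *= alpha_t
    return loss

# An easy sample (p_t = 0.9) and a hard sample (p_t = 0.1) of the positive class.
easy_ratio = focal_loss(0.9, 1) / cross_entropy(0.9, 1)  # (1 - 0.9)^2, about 0.01
hard_ratio = focal_loss(0.1, 1) / cross_entropy(0.1, 1)  # (1 - 0.1)^2, about 0.81
```

With γ = 2, the loss of the easy sample is scaled down to about 1% of its cross-entropy value, whereas the hard sample keeps about 81%, so training is dominated by the samples the model still gets wrong.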
3.4. Proposed Method
In practice, there are relatively few defects in transmission lines, resulting in an imbalance between normal samples and fault samples, which adversely affects the accuracy of line state recognition. Therefore, this paper proposes a novel method for defect diagnosis of transmission lines. Firstly, considering potential illumination and background interference in captured images, multi-strategy image processing, including wavelet denoising-based image enhancement, HSV color space-based multi-threshold segmentation, and morphological analysis, is proposed to extract the transmission line area. Subsequently, GAN is employed to generate images of transmission lines to enhance the morphological diversity of the samples and reduce the effect of data imbalance. Finally, the deep network GoogLeNet is improved by replacing the original cross-entropy loss function with the focal loss function to achieve deep feature extraction of images and defect diagnosis of transmission lines. The specific implementation steps are outlined below, and the flowchart is depicted in
Figure 3.
Step 1: Utilize drones and other equipment to acquire original images of transmission lines.
Step 2: Preprocess the original image, including wavelet denoising, multi-threshold segmentation, and morphological processing.
Step 3: Perform sample enhancement on the processed transmission line image through GAN to reduce the impact caused by sample imbalance.
Step 4: Divide the transmission line image data into a training set and test set.
Step 5: Train the GoogLeNet feature-extraction model with the focal loss function on the imbalanced training set.
Step 6: Save the best-performing trained model and evaluate it using the test set.
Step 7: Obtain detection results for broken strands and loose strands of the transmission lines.
4. Experimental Results and Analysis
The experimental part aims to verify the validity of the proposed method on the actual transmission line image dataset. In this study, three classical deep learning models, including AlexNet, MobileNet-V2 [
32], and DenseNet [
33], were selected as baselines for comparison; they perform well in fault diagnosis and are widely used in image classification. By comparing the classification accuracy and computational efficiency of the different models, the advantages and disadvantages of the proposed method are evaluated. The experimental results are used to verify the practicability and effectiveness of the proposed method in practical applications and to provide a basis and reference for further engineering applications.
4.1. Dataset Introduction
A total of 1660 transmission line images were collected in this experiment, covering three health states: healthy, loose strand, and broken strand. Among them, there were 660 healthy samples, 646 loose strand fault samples, and 354 broken strand fault samples, and the resolution of each image was 3024 × 4032. The dataset was processed using the proposed HSV color space-based multi-threshold segmentation and morphological method, and the dataset shown in
Figure 4 was obtained. The processed dataset was then divided: 60 samples of each class were used as the test set, and the remaining samples of each class were used as the training set. The division of the dataset is shown in
Table 1.
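The split described above can be reproduced with simple arithmetic, which also makes the class imbalance explicit. The sketch below derives the training counts from the totals given in the text and then computes normalized inverse-frequency class weights, one of the options mentioned for setting α. The variable names are hypothetical, the weights are an illustration rather than the values used in this paper, and the counts do not include GAN-generated images because their number is not stated.

```python
totals = {"healthy": 660, "loose_strand": 646, "broken_strand": 354}
TEST_PER_CLASS = 60

# Training samples per class after removing 60 test samples from each class.
train = {name: count - TEST_PER_CLASS for name, count in totals.items()}
# {'healthy': 600, 'loose_strand': 586, 'broken_strand': 294}

n_train = sum(train.values())                        # 1480
n_test = TEST_PER_CLASS * len(totals)                # 180

# Inverse-frequency weights, normalized to sum to 1 (an assumed weighting scheme).
inverse = {name: n_train / count for name, count in train.items()}
norm = sum(inverse.values())
weights = {name: value / norm for name, value in inverse.items()}
```

The broken strand class has roughly half as many training samples as either of the other two classes, so it receives the largest weight under this scheme.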
4.2. Results and Analysis
The experiments were implemented in Python 3.8 with PyTorch 1.13.1 on a computer running Windows 10 with an Intel(R) Core(TM) i5-13490F CPU and an NVIDIA GeForce RTX 3060 Ti graphics card with 8 GB of video memory; the CUDA version was 11.6.1. To alleviate the excessive number of training parameters caused by the large size of the input images, the collected transmission line images were first resized from a resolution of 3024 × 4032 to 224 × 224. The Adam optimizer was used with the focal loss function, and the learning rate was set to 0.0002. The batch size of GoogLeNet was set to 16, and the number of training epochs was set to 200.
To verify the effectiveness of the improved model proposed in this paper, AlexNet with the cross-entropy loss function, MobileNet-V2, and DenseNet were selected for comparison. The other conditions of the three models were consistent with those of the model in this paper, except for the loss function, and all models were trained for 200 rounds. All experiments were repeated 10 times and the average metrics of 10 tests were employed for evaluation, including accuracy, recall, F1-score, and minimum loss. The metrics of all models are shown in
Table 2, while the training curve and loss curve of the training set are shown in
Figure 5, and the confusion matrices with the highest results among the 10 tests for all models are plotted in
Figure 6.
As shown in
Table 2, AlexNet exhibited the lowest classification accuracy of 86.34%, indicating its inferiority compared with the other three models in terms of overall classification capability. Although AlexNet achieved a recall of 90.11% and an F1-score of 89.10%, demonstrating strong performance in correctly identifying samples, its overall classification performance was limited by its lower accuracy, suggesting challenges in recognizing certain categories relative to the other models. In contrast, DenseNet excelled, with a classification accuracy of 94.01%, a recall of 97.20%, and an F1-score of 97.37%. This suggests that DenseNet provides a more comprehensive and balanced recognition ability across all categories, resulting in superior overall performance.
It is evident that the proposed model outperformed other methods across all metrics. With a classification accuracy of 97.83%, it demonstrated exceptional recognition capabilities. The recall rate of 97.81% and F1-score of 97.78% further highlighted the model’s excellence in both accurate identification and comprehensive classification. Additionally, achieving the lowest loss value of 0.04 substantiated its superior performance. Collectively, these metrics indicate that the proposed model has a significant performance advantage and exhibits outstanding overall effectiveness.
In addition, according to
Figure 5, the loss of the proposed model decreased more markedly and smoothly after the cross-entropy loss function was replaced. The classification accuracy reached a peak of 97.83% and stabilized after 150 epochs of training. The fluctuations in the accuracy curve before 150 epochs can be attributed to the model’s insufficient extraction of transmission line image features and the incomplete mapping of these features to their corresponding labels. After 150 epochs, the accuracy curve of the proposed model became almost smooth and reached the highest accuracy among all models. In contrast, the average test accuracy of DenseNet reached 94.01%, but its training accuracy curve fluctuated strongly, its loss declined unstably, and its convergence was poor. The average test accuracy of AlexNet was only 86.34%, and its training was unstable. The improved model proposed in this paper therefore had the best test accuracy and the smoothest loss decrease, with the loss converging to a minimum of 0.04 during training. The comparison of the four methods shows that GoogLeNet achieved a better feature extraction effect than AlexNet, MobileNet-V2, and DenseNet. At the same time, for the imbalanced training samples in the dataset, focal loss gave a better classification effect. Compared with the original cross-entropy loss function, the curve obtained with the focal loss function is smoother and more stable, which shows the effectiveness of the proposed method.
As shown in
Figure 6, AlexNet classified the samples poorly, with a highest accuracy of only 87.78%. Although DenseNet and MobileNet-V2 had better classification results than AlexNet, with highest classification accuracies of 94.44% and 92.22%, respectively, they still misclassified more samples than the proposed method. In contrast, the highest accuracy of the proposed method reached 98.89%. Only two loose strand samples were misclassified as broken strand, which can be attributed to a certain similarity in features between loose and broken faults in some images of transmission lines. The other comparative models had poorer classification ability because of their limited feature extraction capabilities.
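The relationship between a confusion matrix and the reported metrics can be sketched as follows. The matrix below is built from the description of the best run of the proposed method (60 test samples per class, two loose strand samples predicted as broken strand). Macro averaging over the three classes is an assumption, since the averaging scheme is not stated, and the function name is hypothetical.

```python
def classification_metrics(cm):
    """Accuracy, macro recall and macro F1 from a confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    recalls, f1_scores = [], []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp
        fp = sum(cm[r][i] for r in range(n)) - tp
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        recalls.append(recall)
        f1_scores.append(f1)
    return accuracy, sum(recalls) / n, sum(f1_scores) / n

# Rows and columns: healthy, loose strand, broken strand.
best_run = [
    [60, 0, 0],
    [0, 58, 2],
    [0, 0, 60],
]
accuracy, macro_recall, macro_f1 = classification_metrics(best_run)
accuracy_percent = round(accuracy * 100, 2)  # 98.89
```

The accuracy of 178/180 reproduces the 98.89% quoted for the best run, which confirms that two misclassified samples out of 180 account for the entire error.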
To further validate the effectiveness of the proposed improved GoogLeNet, the classification results of the four methods on the validation set were visualized by t-SNE (t-distributed stochastic neighbor embedding), as shown in
Figure 7. As can be seen from the figure, the proposed method had the fewest misclassifications among the four methods. In addition, the DenseNet model had good classification performance compared with MobileNet-V2 and AlexNet. The AlexNet model achieved the worst classification performance on this dataset, with more overlap between the three categories.
5. Conclusions
The traditional manual inspection of transmission line images captured by UAVs has been increasingly replaced by deep learning because of its capability to extract features automatically in defect detection. However, under practical conditions, the fault images captured by UAVs are often imbalanced and limited, as the probability of physical faults in transmission lines is much lower than that of normal states. Additionally, the captured images are usually affected by varying shooting backgrounds, which hinders the effectiveness of feature extraction. Existing research rarely considers these issues simultaneously. To overcome the defects of traditional inspection methods and the shortcomings of existing deep-learning-based inspection methods, this paper proposes a novel method based on multi-strategy image processing and an improved deep network to conduct defect diagnosis of transmission lines.
Firstly, multi-strategy image processing, including image enhancement based on wavelet denoising, multi-threshold segmentation based on HSV color space, and morphological analysis, was proposed to reduce background interference and extract the effective area of transmission lines. Then, GAN was used to generate transmission line images, which enhanced the diversity of sample morphology by augmenting samples and reduced the impact of data imbalance. Finally, the original cross-entropy loss function was replaced with the focal loss function to improve GoogLeNet, realizing deep feature extraction of the images and defect diagnosis of the transmission lines. An actual transmission line dataset was used to test the performance of the proposed method. Experiments showed that, compared with the comparison models, the proposed method had the best accuracy in the classification and detection of line defects.
Although the proposed method demonstrated excellent performance in detecting defects in transmission lines, it focused primarily on surface defects and was limited in detecting potential internal faults caused by overheating. In future research, multi-modal models will be a potential research direction; such models can achieve more comprehensive fault detection for transmission lines by using multidimensional information, including images, sensor data, and infrared imagery. However, when studying multi-modal models in complex transmission line environments, challenges related to equipment installation and data collection may arise.