1. Introduction
With the continuous growth in electricity demand, the construction of overhead transmission lines has rapidly expanded [1,2,3]. The power industry is a crucial part of the national economy, and the stable operation of transmission lines in particular is vital for the quality and safety of electricity transmission within the power grid [4,5]. The environment around transmission lines is complex, often affected by foreign objects such as bird nests, plastic trash, kites, and balloons. These objects can cause short circuits [6,7], leading to faults in the power system, and they pose economic and safety risks [8,9]. According to data from the State Grid of China, incidents caused by foreign objects rank second only to lightning and external force damage among all causes of power outages [10]. Therefore, the timely detection and handling of these foreign objects are crucial for ensuring the safety of transmission lines.
With the accelerated development of China’s power grid infrastructure, the construction of transmission lines has become increasingly complex and dense [11]. Traditional manual inspections, limited by high operational risks, heavy workloads, and low efficiency, can no longer meet current demands. Although unmanned aerial vehicles (UAVs) and similar equipment have improved the safety and efficiency of inspections to some extent, the onboard detection algorithms lack real-time processing capability and unbiased training data, which significantly hinders their adaptability to complex scenarios [12]. Therefore, there is an urgent need for more intelligent detection methods that efficiently integrate inspection and detection tasks to meet the practical needs of modern power grid construction [13].
In recent years, artificial intelligence (AI)-based technologies have been widely applied to foreign object detection tasks. However, these tasks face multiple challenges: on the one hand, the targets to be detected on transmission lines are irregular in size, with sparse and unevenly distributed samples; on the other hand, since transmission lines are typically installed across diverse geographical areas, the acquired images often have complex and variable backgrounds. Moreover, existing methods are hindered by high computational costs. To address these challenges, this paper proposes the GCP-YOLO algorithm (an improved version of YOLOv8 incorporating GSCDown, CSPBlock, and PAM) for detecting foreign objects on transmission lines. The main contributions of this article are summarized as follows.
This study constructs the Transmission Line Foreign Object Detection (TL-FOD) dataset, which contains 2817 images, through techniques such as hue, saturation, and value (HSV) enhancement, random blur and noise addition, and simulated weather conditions. Additionally, an AI-generated content (AIGC) platform is used to effectively address the issue of sample imbalance. This dataset provides comprehensive and authentic data support for foreign object detection on power transmission lines.
In this study, the algorithm is improved with targeted modifications to address the challenges of transmission line detection. Specifically, the GSCDown module replaces standard convolution by utilizing dimensionality reduction and expansion techniques, which not only reduce parameters and computational costs but also integrate multi-scale features, thereby improving computational efficiency and addressing the challenge of varying target sizes. The CSPBlock, designed with cross-stage partial connections and grouped convolutions, captures richer semantic information while maintaining relatively stable computational complexity. To improve the recognition of different objects, the introduced PAM highlights key information without increasing parameters, effectively distinguishing between background and target and ensuring high accuracy even in the presence of background noise. As a result, compared to the original model, the improved architecture achieves an 18% reduction in parameters and a 4% reduction in computational cost while improving detection accuracy (mAP@0.5) from 84.1% to 88.5%.
Employing a distillation strategy that integrates feature, response, and relational knowledge further enhances detection accuracy, enabling the student network to extract more information from the teacher network and ultimately improving mAP@0.5 to 89.6%.
In foreign object detection for transmission lines, image sensors are the key components for obtaining high-quality image data. The effectiveness of the proposed improved YOLOv8 algorithm relies on high-quality datasets, which are sourced from high-resolution image sensors. Commonly used sensors include CMOS (complementary metal-oxide semiconductor) and CCD (charge-coupled device) sensors, which are capable of capturing foreign object information in the complex environments surrounding transmission lines at high resolutions and frame rates. These sensors are typically mounted on drones or ground-based monitoring equipment, allowing for 24/7 image acquisition around the transmission lines, thereby providing stable and reliable data support for detection tasks.
The paper is structured as follows. Section 2 reviews the current literature on object detection. Section 3 describes the proposed GCP-YOLO network. Section 4 provides detailed information about the proposed TL-FOD dataset and analyzes the experimental results. Finally, Section 5 concludes the work.
2. Related Work
Currently, image-based foreign object detection methods for transmission lines can be categorized into three main types: methods based on traditional hand-crafted features, methods based on machine learning, and methods based on deep learning.
2.1. Traditional Foreign Object Detection Methods
Researchers have employed conventional algorithms for the detection of foreign object intrusions on power transmission lines. C. Chen et al. utilize adaptive filtering techniques to eliminate terrain interference and employ Euclidean clustering algorithms to segment the detection results, thereby constructing a recognition model for foreign object intrusions on power lines [14]. L. Cheng et al. introduce an algorithm based on distance estimation focused on accurately calculating the coordinates of foreign objects [15]. Concurrently, S. Jiao et al. integrate the Euclidean distance method with a predictive region drift strategy to enhance the accuracy of foreign object detection [16]. Despite these advancements, traditional algorithms are limited in the types of foreign objects they can recognize, in their robustness to noise, and in their ability to segment diverse targets, which increases the complexity of detection.
2.2. Classical Machine Learning-Based Detection Methods
Consequently, numerous scholars have explored the use of machine learning methods such as multilayer perceptrons (MLPs) and support vector machines (SVMs) for the detection of foreign objects. S. Z. Wu et al. propose a cascaded structure based on MLPs [17], while F. Mahdi Elsiddig Haroun et al. enhance the feature set of SVMs using satellite imagery to improve detection efficiency [18]. X. Ye et al. employ a particle swarm optimization-enhanced SVM for detection [19]. However, machine learning models require extensive feature engineering, and their data mining capability is generally inferior to that of deep learning models. The superior high-level feature extraction and end-to-end solutions provided by deep learning therefore present new possibilities for the detection of foreign objects on power transmission lines.
2.3. Deep Learning-Based Foreign Object Detection Methods
With the rapid development of high-performance computing and artificial intelligence, intelligent and automated detection of foreign objects on transmission lines has become an important research area. These methods can automatically learn features and identify the location of foreign objects. Depending on the number of stages in the detection pipeline, they can be classified into multi-stage detectors and end-to-end detectors.
Multi-Stage Detectors. Liang et al. develop a method for detecting foreign objects on power transmission lines based on the Faster R-CNN framework [20]. Guo et al. [21] first construct a dataset of transmission line images and then employ a region-based convolutional neural network (Faster R-CNN) to detect foreign objects such as fallen items, kites, and balloons. This method can handle foreign objects of various shapes, demonstrating excellent generalization ability. Despite the high accuracy of these two-stage networks, their processing speed is slow and unsuitable for real-time detection.
End-to-End Detectors. In contrast, single-stage networks such as SSD and the YOLO series offer higher detection speeds and efficiency. Li et al. replace the backbone network of YOLOv3 with MobileNetV2 to reduce the number of parameters [22], albeit at the expense of some detection accuracy. Song et al. enhance performance by integrating k-means clustering and DIoU NMS with YOLOv4 [23,24]. Huang et al. utilize YOLOv5s in conjunction with Ghost convolution and KL divergence loss [25,26]. Liu et al. incorporate attention mechanisms and ASPP modules into YOLOX to improve detection accuracy [27]. Meanwhile, Yu et al. combine hyperparameter optimization and SPD convolution with YOLOv7 to enhance the accuracy of detecting small targets [28], although this results in slower detection speeds. Yang et al. [29] develop a foreign object detection algorithm based on a denoising convolutional neural network (DnCNN) and YOLOv8. They consider various types of foreign objects, such as bird nests, kites, and balloons, achieving a mean average precision (mAP) of 82.9%. However, there remains significant room for improvement in detection performance, and the diversity of categories in the dataset requires further enhancement.
Both end-to-end and multi-stage algorithms have achieved favorable results on the given dataset. Multi-stage detection algorithms exhibit a clear advantage in terms of detection accuracy, although they also require considerable computational resources and processing time. In contrast, end-to-end detectors perform localization and classification within a single unified network, aiming to strike a balance between detection accuracy and speed. Our work falls under the category of end-to-end detectors.
Given the constraints on computational resources and the need for rapid processing in practical deployments, YOLOv8n [30] is chosen. This model balances minimal parameter and computational demands with high accuracy and speed. Deep learning approaches rely heavily on large quantities of high-quality data. However, the challenges of class imbalance and data scarcity in the detection of foreign objects on power transmission lines limit both model training and accuracy. Consequently, research has shifted towards refining algorithms and enhancing data. Algorithm refinement includes techniques such as zero-shot learning and transfer learning [31,32], while data augmentation methods encompass image enhancement, GAN-based techniques [33], and generative model approaches [34].
3. Proposed Method
The improved network framework, GCP-YOLO, is illustrated in Figure 1. Specifically, in the feature extraction module, we introduce the GSCDown module (as shown in Figure 2), which decouples spatial reduction and channel augmentation operations, significantly reducing both the number of parameters and computational complexity. This approach not only enhances detection efficiency but also improves detection accuracy for foreign objects. In the feature fusion module, we utilize the CSPBlock to replace the C2f structure in YOLOv8 (as shown in Figure 3). Cross-stage connections ensure the transmission and fusion of semantic information between feature maps at different scales, thereby enhancing feature representation. To supervise the recognition of targets with different receptive fields, we place PAM-M and PAM-A modules in front of the small-scale and large-scale detection heads, respectively (as shown in Figure 4). These modules emphasize key information without introducing additional parameters, allowing the model to accurately distinguish between background and target entities, which is particularly advantageous for the complex background scenarios commonly found in transmission line images. Finally, a knowledge distillation strategy is employed using YOLOv8x as the teacher model to further improve the model’s accuracy.
3.1. Ghost Shuffle Channel Downsampling
Detecting foreign objects on power transmission lines presents significant challenges due to the diversity and complexity of these objects. Foreign objects can vary greatly in shape, size, and material, including bird nests, ice layers, plastic bags, kites, and balloons. The variations of these objects under different environmental conditions further complicate detection. Additionally, the widespread distribution of transmission lines necessitates a detection system that can rapidly process large volumes of data with limited computational resources to achieve real-time and efficient detection.
YOLOv8 employs standard convolutions for spatial downsampling and channel transformation, but this approach significantly increases computational cost and the number of parameters. Specifically, for a standard 3 × 3 convolution with stride 2 that doubles the channel count, the computational cost is approximately (9/2)HWC² and the parameter count is 18C², where H, W, and C denote the height, width, and number of channels of the input feature map, respectively. To address this issue, we introduce the GSCDown module, which enhances downsampling efficiency by separating spatial reduction and channel expansion operations. As shown in Figure 2, the GSCDown module first uses a 1 × 1 convolution layer to halve the number of input feature map channels, reducing computational complexity and the number of model parameters, thereby improving system computational efficiency. It then employs the GSConv [35] structure to perform grouped convolutions, lowering computational complexity while maintaining feature representation ability. GSCDown integrates global context pooling with standard convolutional downsampling, reducing feature map resolution (and computational complexity) while ensuring that critical contextual information is retained when the model processes global features. This helps in recognizing complex foreign objects while reducing computational cost and preserving information during downsampling. Consequently, it maintains high performance while reducing latency.
The design of the GSCDown module substantially decreases the parameter count compared with the standard downsampling convolutions in YOLOv8, ensuring efficient system operation in resource-constrained environments and meeting the demands of real-time detection. The GSCDown module is particularly suited to the challenging task of detecting foreign objects on power transmission lines. Through its convolution design, it enhances object recognition ability while maintaining high precision, significantly improving computational efficiency and achieving more efficient downsampling.
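To make the decoupled downsampling concrete, the following is a minimal PyTorch sketch of a GSCDown-style block, assuming a 1 × 1 channel-halving convolution followed by a stride-2 GSConv (standard branch plus depthwise branch with channel shuffle); the kernel sizes, SiLU activations, and the omission of the global-context pooling branch are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    # Interleave channels from the two branches so information mixes cheaply.
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

class GSConv(nn.Module):
    """GSConv-style block: a standard conv produces half the output channels,
    a depthwise conv produces the other half, then concat + channel shuffle."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        c_half = c_out // 2
        self.dense = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.cheap = nn.Sequential(  # depthwise: one group per channel
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        y1 = self.dense(x)
        y2 = self.cheap(y1)
        return channel_shuffle(torch.cat((y1, y2), dim=1))

class GSCDown(nn.Module):
    """Decoupled downsampling: a 1x1 conv halves the channels, then a
    stride-2 GSConv halves the resolution while expanding back to c_out."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(c_in, c_in // 2, 1, 1, bias=False),
            nn.BatchNorm2d(c_in // 2), nn.SiLU())
        self.down = GSConv(c_in // 2, c_out, k=3, s=2)

    def forward(self, x):
        return self.down(self.reduce(x))

# Example: GSCDown(128, 256)(torch.randn(1, 128, 80, 80)).shape -> (1, 256, 40, 40)
```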
3.2. Cross-Stage Partial Block
The feature pyramid network (FPN) is a key technology in the field of object detection, enhancing detection performance by integrating features at different resolutions [36,37,38]. The conventional FPN [36] introduces a top-down pathway to fuse multi-scale features. However, given the limitations of unidirectional information flow, PAFPN [38] adds a bottom-up path aggregation network, which improves feature fusion effectiveness.
In the process of model fusion for foreign object detection on power transmission lines, the C2f structure, which relies primarily on 3 × 3 convolutional layers and a simple feature segmentation and fusion scheme, is inadequate for effectively capturing and representing small and complex foreign object features. Although the C2f structure demonstrates good computational efficiency, this efficiency comes at the cost of reduced feature representation ability. Additionally, C2f performs feature segmentation and fusion at a single scale and therefore lacks the multi-scale feature fusion that is crucial for detecting foreign objects of varying sizes and shapes. In contrast, the CSPBlock (Cross-Stage Partial Block), derived from CSPNet (Cross-Stage Partial Network) [39], enhances feature fusion through a design involving cross-stage partial connections and grouped convolutions, thereby improving feature representation while maintaining relatively stable computational costs. The CSPBlock structure is depicted in Figure 3. Specifically, CSPBlock divides features into two parts: one part is transmitted directly through cross-stage connections, while the other part undergoes convolutional operations before being fused. This design not only reduces the computational burden but also preserves rich feature information. Grouped convolutions reduce the number of parameters and computational load, while dense connections ensure efficient information transmission.
In the context of foreign object detection on power transmission lines, the introduction of CSPBlock significantly enhances the model’s feature representation abilities. By employing a cross-stage partial connection mechanism, CSPBlock ensures efficient transmission and fusion of features across different stages, thereby improving detection accuracy and robustness. Furthermore, CSPBlock’s spatial awareness is markedly improved. The use of grouped convolutions and dense connections maintains the integrity and continuity of spatial information during feature fusion, enabling the model to perform better in handling complex backgrounds and multi-scale features. This is particularly beneficial in the complex environments of power transmission lines, where it effectively distinguishes between background and foreign objects. Additionally, the design of CSPBlock effectively mitigates the vanishing gradient problem, ensuring efficient gradient transmission in deep networks, which enhances the stability and convergence speed of model training. Finally, CSPBlock strikes a balance between model lightweightness and real-time performance. It enhances feature representation capabilities without significantly increasing computational load, making it suitable for scenarios requiring long-term monitoring and efficient foreign object detection on power transmission lines.
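As an illustration of the cross-stage partial idea described above, here is a minimal PyTorch sketch of a CSPBlock-style module; the bottleneck depth, group count, and activation choices are assumptions for illustration, not the exact configuration used in GCP-YOLO.

```python
import torch
import torch.nn as nn

class GroupedBottleneck(nn.Module):
    """3x3 grouped conv + 1x1 conv with a residual connection."""
    def __init__(self, c, groups=4):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c, c, 3, 1, 1, groups=groups, bias=False),
            nn.BatchNorm2d(c), nn.SiLU(),
            nn.Conv2d(c, c, 1, bias=False),
            nn.BatchNorm2d(c), nn.SiLU())

    def forward(self, x):
        return x + self.block(x)

class CSPBlock(nn.Module):
    """Cross-stage partial block: half the channels bypass the stacked
    bottlenecks via an identity path and are fused back with a 1x1 conv."""
    def __init__(self, c_in, c_out, n=2, groups=4):
        super().__init__()
        c_mid = c_out // 2
        self.project = nn.Conv2d(c_in, c_out, 1, bias=False)  # project, then split in half
        self.blocks = nn.Sequential(*[GroupedBottleneck(c_mid, groups) for _ in range(n)])
        self.fuse = nn.Sequential(
            nn.Conv2d(c_out, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out), nn.SiLU())

    def forward(self, x):
        y = self.project(x)
        y1, y2 = y.chunk(2, dim=1)   # cross-stage partial split
        y2 = self.blocks(y2)         # only this half passes through the convolutions
        return self.fuse(torch.cat((y1, y2), dim=1))
```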
3.3. Pooling Attention Mechanism
Power transmission line images often face complex background clutter, including elements such as sky and vegetation that are irrelevant to the subject, as well as large structures like towers and wires that may serve as backgrounds for small targets like bird nests. In such visual environments, accurately identifying and separating key target objects places higher demands on the neural network’s representation capabilities. However, increasing network depth is typically accompanied by a significant rise in parameter count, conflicting with the goal of a lightweight model. Therefore, this section introduces the PAM to enhance the model’s accuracy without increasing the computational burden, thereby better addressing the complex backgrounds in images of foreign objects on power transmission lines. The pooling attention mechanism is shown in Figure 4.
Pooling layers, as parameter-free fixed operations, can maintain certain feature invariance. Specifically, average pooling helps reduce the increased estimation variance due to limited neighborhood size, while max pooling mitigates the estimation bias caused by parameter errors in convolutional layers [40]. During the decoding prediction process, the network outputs feature maps at three scales (e.g., 80 × 80, 40 × 40, and 20 × 20 for a 640 × 640 input). If the receptive field is too small, excessive local information retrieval may lead to the loss of large targets. Conversely, if the receptive field is too large, small targets may be overlooked. Therefore, appropriately designing the size of the receptive field is crucial to ensuring accurate target detection.
Channel attention mechanisms compress spatial dimensions of feature maps while retaining channel dimensions, focusing on solving object recognition problems. Conversely, spatial attention mechanisms compress channel dimensions while preserving spatial dimensions of feature maps, focusing on solving object localization problems. In shallower networks, due to limited receptive fields, the network may overly emphasize local information, making small targets easily captured by the network and leading to misidentification of some large targets as background. To improve this, this study introduces a channel attention mechanism that enhances the representation of local details by focusing on information interaction between different channels. It utilizes average pooling to extract more global features from local information, thereby better distinguishing foreground from background.
As the number of layers in a neural network increases, the receptive field covered by the feature maps gradually enlarges, potentially leading to the neglect of smaller targets by the network. To address this issue, a spatial attention mechanism is introduced to enhance the detection ability for small targets. In this section, max pooling is employed to simulate the spatial attention mechanism. Max pooling can retain key features and extract finer-textured information, thereby effectively capturing small targets and reducing missed detections.
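A minimal, parameter-free sketch of the two pooling attention variants described above is given below, assuming that the average-pooling branch (PAM-A) re-weights channels and the max-pooling branch (PAM-M) re-weights spatial locations; the sigmoid gating is an illustrative assumption, since the paper only states that pooling is used to emphasize key information without adding parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PAM_A(nn.Module):
    """Average-pooling channel attention (parameter-free): global average
    pooling summarizes each channel, and the resulting statistic re-weights
    the channels to better separate foreground from background."""
    def forward(self, x):
        w = F.adaptive_avg_pool2d(x, 1)        # (B, C, 1, 1) channel statistics
        return x * torch.sigmoid(w)

class PAM_M(nn.Module):
    """Max-pooling spatial attention (parameter-free): the per-pixel maximum
    over channels highlights salient locations, preserving fine texture cues
    that help retain small targets in deeper layers."""
    def forward(self, x):
        w, _ = x.max(dim=1, keepdim=True)      # (B, 1, H, W) spatial saliency
        return x * torch.sigmoid(w)
```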
3.4. Feature Distillation
Knowledge distillation is a model compression technique that allows a smaller student model to learn from and achieve improved performance by mimicking a pre-trained, larger teacher model, as highlighted by Hinton et al. [41]. In the knowledge distillation process, the teacher model’s outputs (typically the softmax layer’s output, i.e., the probability distribution) serve as the target for training the student model, a process referred to as “distillation”. Through this method, the student model can learn the decision-making process of the teacher model, even without direct access to the original training data. Knowledge distillation can be based on different types of knowledge, including response-based knowledge (the teacher model’s predicted outputs), feature-based knowledge (activations from the intermediate layers of the teacher model), and relation-based knowledge (the relationships between different layers or samples).
The GCP-YOLO network is trained using a knowledge distillation algorithm, as depicted in Figure 5. The transmission line training set is simultaneously input into both the GCP-YOLO network and the YOLOv8x model for forward propagation. During this process, both the teacher and student networks obtain corresponding feature maps from the feature extraction layers. To ensure that the lightweight student model, GCP-YOLO, extracts feature information equivalent to that of the teacher network, a feature map loss function is constructed using the feature maps of both networks. Through back-propagation, the feature loss is gradually minimized, thereby enhancing the feature extraction ability of the lightweight network. Similarly, the Kullback–Leibler (KL) divergence loss function and mean square error function are utilized to construct a prediction probability loss function and a regression result loss function between the teacher and student networks. This guides the student network to learn from the teacher network, allowing it to match the teacher network’s classification and regression performance. As seen in the structure diagram, the recognition results of the lightweight GCP-YOLO model incur classification loss, regression loss, and confidence loss, collectively represented by the loss L. Through back-propagation, the gap between the recognition results of the lightweight GCP-YOLO network and the ground truth is minimized. Since the teacher network, YOLOv8x, is pre-trained, back-propagation only occurs within the student network, i.e., the GCP-YOLO network.
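The combined distillation objective can be sketched as follows in PyTorch, with a mean-squared feature term standing in for the feature-map loss, a temperature-scaled KL term for the prediction probabilities, and an MSE term for the regression outputs; the weighting factors, the use of plain MSE for features (the final model uses CWD), and the assumption that student and teacher feature maps have already been projected to matching shapes are all illustrative simplifications.

```python
import torch
import torch.nn.functional as F

def distillation_loss(s_feats, t_feats, s_cls, t_cls, s_reg, t_reg,
                      t_temp: float = 2.0, feat_w: float = 1.0,
                      kl_w: float = 1.0, reg_w: float = 1.0):
    """Combine feature-, response-, and regression-based distillation terms.
    s_* / t_* are student / teacher feature maps and detection-head outputs."""
    # Feature loss: align intermediate feature maps (MSE stand-in for CWD).
    feat_loss = sum(F.mse_loss(s, t.detach()) for s, t in zip(s_feats, t_feats))

    # Response loss: temperature-scaled KL divergence on class predictions.
    kl_loss = F.kl_div(
        F.log_softmax(s_cls / t_temp, dim=-1),
        F.softmax(t_cls.detach() / t_temp, dim=-1),
        reduction="batchmean") * (t_temp ** 2)

    # Regression loss: mean squared error between predicted box outputs.
    reg_loss = F.mse_loss(s_reg, t_reg.detach())

    return feat_w * feat_loss + kl_w * kl_loss + reg_w * reg_loss

# The total training objective adds the student's own detection loss L
# (classification, regression, confidence against the ground truth).
```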
We conducted a quick validation experiment to select the most suitable distillation method for our GCP-YOLO model. The results are shown in Table 1. Our conclusion is that CWD (channel-wise knowledge distillation) is better suited to our model, while MGD (masked generative distillation) underperforms compared with mimicking due to its complex hyperparameters, which reduce its generalizability. During the feature distillation process, the loss ratio is set to 1.0 to fully leverage the feature distillation loss in optimizing the student model. Experimental results show a significant improvement in model performance through feature distillation, with the mean average precision (mAP) increasing by 1.5%. These substantial improvements confirm the effectiveness of feature distillation in the YOLO algorithm and present a new approach for optimizing model performance. Notably, the use of a constant decay factor combined with CWD’s feature loss type provides an efficient feature distillation strategy.
4. Results and Analysis
4.1. Dataset Preparation
Using high-precision localization and intelligent analysis techniques, this research meticulously designs and constructs a dataset for detecting foreign objects on power transmission lines. The composition of the TL-FOD dataset is presented in Table 2. The dataset encompasses four common types of foreign objects: bird nests, kites, balloons, and plastic bags. Additionally, the insulator category is included to enhance the algorithm’s recognition abilities, for a total of 1401 original images. To augment the dataset’s diversity and robustness, HSV enhancement techniques are employed to simulate various lighting conditions. In practical applications, object detection systems may encounter extreme weather conditions such as rain, snow, and fog, and incorporating these factors during model training can enhance the system’s real-world performance. To simulate raindrop shapes and dynamics, we apply noise generation and motion blur, adjusting raindrop density and length to represent varying intensities of rainfall. Snowfall is simulated using random noise and Gaussian blur to mimic the distribution and depth of snowflakes, with varying degrees of blur depicting snowflakes at different distances. Fog is simulated by adjusting transparency and contrast using an exponential decay formula, gradually blurring distant objects and reducing overall contrast to reflect diminished visibility.
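These augmentations can be sketched with OpenCV and NumPy as below, covering HSV jitter, rain streaks generated from motion-blurred noise, and fog built from an exponential transmission map; the gain ranges, raindrop density, streak length, decay rate, and airlight value are illustrative assumptions (snow can be produced analogously by applying a Gaussian blur to random noise).

```python
import cv2
import numpy as np

def hsv_jitter(img, h_gain=0.015, s_gain=0.7, v_gain=0.4):
    """HSV enhancement: randomly scale hue, saturation, and value."""
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    gains = 1 + np.random.uniform(-1, 1, 3) * np.array([h_gain, s_gain, v_gain])
    hsv *= gains
    hsv[..., 0] %= 180
    hsv = np.clip(hsv, 0, [179, 255, 255])
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

def add_rain(img, density=0.002, length=15, angle=15):
    """Rain: sparse noise elongated into streaks by a tilted motion-blur kernel."""
    h, w = img.shape[:2]
    streaks = np.zeros((h, w), np.float32)
    n = int(density * h * w)
    ys, xs = np.random.randint(0, h, n), np.random.randint(0, w, n)
    streaks[ys, xs] = 255
    kernel = np.zeros((length, length), np.float32)
    kernel[length // 2, :] = 1.0 / length                  # horizontal line kernel
    M = cv2.getRotationMatrix2D((length / 2, length / 2), angle, 1.0)
    kernel = cv2.warpAffine(kernel, M, (length, length))   # tilt the streak direction
    streaks = cv2.filter2D(streaks, -1, kernel)
    blended = cv2.addWeighted(img.astype(np.float32), 1.0,
                              cv2.merge([streaks] * 3), 0.6, 0)
    return blended.clip(0, 255).astype(np.uint8)

def add_fog(img, decay=0.08, airlight=230.0):
    """Fog: exponential decay of transmission with a row-index depth proxy,
    blending toward a bright airlight to reduce contrast for distant regions."""
    h, w = img.shape[:2]
    rows = np.arange(h, dtype=np.float32)[:, None]
    t = np.exp(-decay * (h - rows) / h * 10)               # transmission map
    t = np.repeat(t, w, axis=1)[..., None]
    fogged = img.astype(np.float32) * t + airlight * (1 - t)
    return fogged.clip(0, 255).astype(np.uint8)
```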
To address the issue of sample imbalance, we employed Photoshop’s generative fill feature, utilizing Adobe’s AI-generated content (AIGC) technology to effectively augment the dataset. AIGC leverages artificial intelligence to automatically generate various forms of content, including text, images, audio, and video. Photoshop’s generative fill, an intelligent image editing tool developed by Adobe based on AIGC, allows users to select specific areas of an image and automatically generate content that seamlessly blends with the surrounding environment using AI algorithms. This feature enables image restoration, expansion, or object removal. Adobe has enhanced this function by integrating generative adversarial networks (GANs) and deep learning models, making the generative fill more intelligent and natural, thereby simplifying complex image editing workflows. The specific operational steps are as follows.
Area Selection: Using Photoshop’s selection tools (such as the rectangular marquee or lasso tool), the user marks the area of the image where content needs to be modified or generated.
Text Description Input: After selecting the area, the user can input a natural language description of the desired content in the provided text box. For example, inputting “White plastic bags were wrapped around the iron frame” allows the AI to analyze both the text description and the context of the selected area, generating content that meets the user’s expectations.
AI-Generated Image: Upon confirming the text input, the user clicks the “Generate” button, and Photoshop produces the final image (as shown in Figure 6).
Additionally, to create class labels, we utilized a visual image annotation tool called “labelImg”. The annotation results are saved in XML format according to the PASCAL VOC standard, covering five categories: bird nests, kites, balloons, plastic bags, and insulators.
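A small sketch of how such PASCAL VOC annotations can be read back for training is shown below; the literal label strings in CLASSES are assumptions, since the paper does not list the exact class names stored in the XML files.

```python
import xml.etree.ElementTree as ET

CLASSES = ["nest", "kite", "balloon", "plastic bag", "insulator"]  # assumed label names

def parse_voc(xml_path):
    """Read one PASCAL VOC annotation produced by labelImg and return
    (class_id, xmin, ymin, xmax, ymax) tuples for every labeled object."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        bbox = obj.find("bndbox")
        coords = [int(float(bbox.find(k).text)) for k in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((CLASSES.index(name), *coords))
    return boxes
```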
Figure 7 illustrates the data generated using AIGC technology. Through meticulous construction and augmentation, the TL-FOD dataset has been successfully created. It is an image collection focused on foreign object detection on power transmission lines. This dataset comprises a total of 2817 images, including 1401 original images and 1426 augmented images. It has been systematically divided into a training set of 2502 images, a validation set of 157 images, and a test set of 158 images. This division ensures the efficiency of model training and provides a solid foundation for the initial evaluation of model performance and the testing of its generalization ability.
4.2. Experimental Environment and Hyperparameter Settings
The experiments in this study are conducted using the PyTorch deep learning framework. In terms of hardware configuration, the GPU used is an RTX 3090, and the CPU is a 13th Gen Intel® Core™ i9-13900HX. The experimental setup is detailed in Table 3.
4.3. Evaluation Metrics
To comprehensively evaluate the detection performance of the proposed improved model, a range of evaluation metrics is employed. These metrics encompass precision, recall, mAP@0.5, mAP@0.5:0.95, model parameter count, model size, and detection speed. Some of the evaluation metrics use the following quantities in their formulas: TP (true positives: samples predicted as positive that are indeed positive), FP (false positives: samples predicted as positive that are actually negative), and FN (false negatives: samples predicted as negative that are actually positive).
Precision: Precision measures the proportion of correctly predicted positive samples to the total number of predicted positive samples, indicating the precision of the model.
Recall: Also known as the true positive rate, recall denotes the proportion of actual positive samples that are correctly predicted as positive by the model, reflecting the model’s ability to recognize positive samples.
mAP: Average precision (AP) summarizes the precision–recall curve of a single class, and mean average precision (mAP) is the average of AP over all classes. It combines information from both precision and recall to measure the model’s overall performance.
mAP@0.5 represents the mAP value at an IoU threshold of 0.5, while mAP@0.5:0.95 represents the mAP averaged over IoU thresholds ranging from 0.5 to 0.95 (with a step size of 0.05). mAP is a comprehensive metric for evaluating the performance of detection algorithms; in this study, it refers to the average detection accuracy over all categories of transmission line anomalies. A higher mAP value indicates better overall performance of the algorithm.
FPS: Frames per second is used in object detection to measure the speed at which a system processes images, specifically referring to the number of image frames processed per second. In practical applications, the processing time per frame may include image preprocessing time, model inference time, and post-processing time, among others. The calculation formula for FPS is based on the time taken to process each frame.
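For reference, these metrics follow the standard definitions below, where AP_i is the average precision of class i, N is the number of classes, and t̄ is the average processing time per frame:

```latex
\mathrm{Precision}=\frac{TP}{TP+FP},\qquad
\mathrm{Recall}=\frac{TP}{TP+FN},\qquad
\mathrm{mAP}=\frac{1}{N}\sum_{i=1}^{N}AP_i,\qquad
\mathrm{FPS}=\frac{1}{\bar{t}}
```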
4.4. Experimental Results
4.4.1. Ablation Experiment
To comprehensively evaluate the impact of different components on model performance and to validate the effectiveness of the improvement strategies adopted in this study, detailed ablation experiments are conducted. Ablation experiments systematically remove or add model components to determine the contribution of each component to the final model performance.
The detection results of the ablation study are shown in Table 4. The focus of the experiments is to examine the impact of three technical modules—GSCDown, CSPBlock, and PAM—on model performance. By ablating one or more of these modules, different models are constructed, and their performance in detection tasks is observed, specifically reflected in key metrics such as mAP@0.5, mAP@0.5:0.95, model size, parameter count, and GFLOPs. The experimental results indicate that the inclusion of different components significantly affects model performance. For instance, when only the GSCDown module is introduced, the model’s mAP@0.5 improves from the baseline of 84.1 to 87.2, while the parameter count is reduced by 0.5 M, demonstrating that this module not only enhances detection accuracy but also optimizes the model’s parameter efficiency. Similarly, adding the PAM module also notably improves both mAP@0.5 and mAP@0.5:0.95 scores. When combining multiple modules, particularly GSCDown, CSPBlock, and PAM together, the model achieves the highest performance with 88.5 mAP@0.5 and 66.9 mAP@0.5:0.95 while maintaining a small model size and low computational complexity.
4.4.2. Distillation Experiment
In this study, distillation experiments are conducted to evaluate the impact of various temperature coefficients (t) on model performance. The results, summarized in Table 5, clearly demonstrate the positive effect of the distillation process on mean average precision (mAP@0.5). Specifically, the model exhibits optimal performance when the temperature coefficient is set to 2, achieving a 4.1% increase in mAP@0.5.
Table 5 details the experimental outcomes at different temperature coefficients, utilizing the channel-wise distillation (CWD) feature loss with a fixed loss ratio of 1.0. The table indicates that, as the temperature coefficient decreases, the mAP@0.5 value of the model first increases and then decreases. At a temperature coefficient of 1, the model reaches the highest mAP@0.5 value of 89.6%, representing a significant enhancement compared to the baseline model without distillation (88.5%).
These findings underscore the importance of hyperparameter selection during model fine-tuning and provide valuable insights for future research. The optimal performance at a temperature coefficient of 2 validates the efficacy of this coefficient in the current experimental configuration. Subsequent studies can further explore the effects of different distillation strategies and hyperparameter settings on model performance, aiming to achieve higher accuracy and generalization abilities.
4.4.3. Comparative Experiment
The comparative experimental results in Table 6 demonstrate that the proposed network architecture outperforms other existing YOLO object detectors in the task of foreign object detection on power lines. A detailed comparison between the YOLOv8n model and the proposed method is presented in Table 7. Compared to YOLOv8n, the proposed method achieves an improvement of 3 to 5 percentage points in mAP across the various detection categories. In particular, precision for the plastic bag category increases to 95.5%, significantly improving detection efficiency. This enhancement provides a robust technical foundation for deployment in real-world application scenarios.
4.5. PAM
Grad-CAM is an effective visualization tool used to generate heatmaps [45]. Through the back-propagation mechanism of Grad-CAM, the model’s output class confidence can be converted into gradient values, visualizing the gradient intensity of feature maps as a heatmap. Deeper red regions indicate areas the model focuses on more, while deeper blue regions indicate areas of lower attention. As depicted in Figure 8, the experimental outcomes distinctly reveal the disparities in feature attention among various object detection models. The results indicate that the YOLOv8n model allocates insufficient attention to small objects and exhibits relatively low sensitivity to distant objects. In stark contrast, the model introduced in this study performs exceptionally well in suppressing background noise, with attention concentrated on the central regions of objects. This refined attention allocation significantly enhances the accuracy of bounding box prediction, thereby substantially improving overall detection performance. Furthermore, this improvement not only bolsters the model’s robustness but also provides a solid technical foundation for performance optimization in practical applications.
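For readers who want to reproduce such heatmaps, a minimal hook-based Grad-CAM sketch in PyTorch is given below; it assumes a classification-style score output for simplicity (for a detector, the class confidence of a selected prediction would be used instead), so it should be read as an illustration rather than the exact visualization pipeline used here.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer, class_idx=None):
    """Minimal Grad-CAM: weight the target layer's activations by the
    spatially averaged gradients of the chosen class score."""
    feats, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    try:
        scores = model(x)                                   # assumed shape (B, num_classes)
        if class_idx is None:
            idx = scores.argmax(dim=1)
        else:
            idx = torch.full((scores.shape[0],), class_idx,
                             device=scores.device, dtype=torch.long)
        score = scores.gather(1, idx.unsqueeze(1)).sum()
        model.zero_grad()
        score.backward()
        a, g = feats["a"], grads["g"]
        weights = g.mean(dim=(2, 3), keepdim=True)          # global-average-pooled gradients
        cam = F.relu((weights * a).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
        cam = (cam - cam.amin()) / (cam.amax() - cam.amin() + 1e-6)
        return cam                                          # (B, 1, H, W) heatmap in [0, 1]
    finally:
        h1.remove()
        h2.remove()
```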
5. Discussion
Compared to the original YOLOv8 model, the enhanced GCP-YOLO model significantly improves the accuracy of foreign object detection in power transmission lines. However, some foreign objects remain undetected, potentially due to the following factors.
Firstly, environmental complexity is a key factor. Power transmission lines located in outdoor environments are susceptible to weather conditions such as fog, rain, snow, and variations in lighting, which may hinder the detection efficiency of the model. Secondly, the diversity in size and shape of foreign objects poses a challenge, particularly when the sample data are limited. Smaller or irregularly shaped objects may be difficult for the model to identify, as their features may not be salient enough. Thirdly, the limitation of the dataset is an issue. The training dataset includes only common types of foreign objects, leading to the model’s inability to recognize rare or unknown objects.
In this study, we construct a Transmission Line Foreign Object Detection (TL-FOD) dataset, which encompasses images captured under varying resolutions and weather conditions. The richness of the dataset is largely attributed to advancements in sensor technology, particularly the application of high-resolution CMOS sensors. During model training, we simulate different lighting conditions, blur, and noise environments to ensure the model’s robustness in real-world applications. This approach effectively addresses common image quality issues encountered by sensors when capturing images in outdoor settings.
In future work, we will explore how more advanced sensor technologies can further enhance the accuracy of detection algorithms. For instance, integrating infrared sensors or multispectral imaging technology could improve the detection of foreign objects under extreme environmental conditions. Moreover, employing higher-resolution sensors and more intelligent image processing techniques will further enhance the practical application of the YOLOv8 algorithm in transmission line inspections.
Author Contributions
P.D. conceptualized and designed the experiments, executed them, analyzed the findings, contributed significantly to the paper’s composition, and participated actively in all stages of manuscript revision. P.D. also supervised the project and facilitated collaboration among team members. X.L. supplied crucial experimental apparatus, offered valuable feedback, and provided expert guidance, particularly in the methodology and analysis sections. X.L. played a central and influential role in enhancing the overall quality and coherence of the manuscript, especially during the revision phase. All authors have read and agreed to the published version of the manuscript.
Funding
The authors would like to acknowledge financial support from the Fundamental Research Funds for the Central University under Grant 24ZLQN40 and from the Beijing Science and Technology Planning Project under Grant Z231100001723002. (Corresponding authors: Xiao Liang. Co-first authors: Pingting Duan, Xiao Liang).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Conflicts of Interest
The authors declare no competing interests.
References
- Sharma, P.; Saurav, S.; Singh, S. Object detection in power line infrastructure: A review of the challenges and solutions. Eng. Appl. Artif. Intell. 2024, 130, 107781. [Google Scholar] [CrossRef]
- Feng, L.; Zhang, L.; Gao, Z.; Zhou, R.; Li, L. Gabor-YOLONet: A lightweight and efficient detection network for low-voltage power lines from unmanned aerial vehicle images. Front. Energy Res. 2023, 10, 960842. [Google Scholar] [CrossRef]
- Liao, J.; Xu, H.; Fang, X.; Miao, Q.; Zhu, G. Quantitative assessment framework for non-structural bird’s nest risk information of transmission tower in high-resolution UAV images. IEEE Trans. Instrum. Meas. 2023, 72, 5013712. [Google Scholar] [CrossRef]
- Rong, S.; He, L.; Du, L.; Li, Z.; Yu, S. Intelligent detection of vegetation encroachment of power lines with advanced stereovision. IEEE Trans. Power Deliv. 2020, 36, 3477–3485. [Google Scholar] [CrossRef]
- Wang, L.; He, Y.; Li, L. A single-terminal fault location method for HVDC transmission lines based on a hybrid deep network. Electronics 2021, 10, 255. [Google Scholar] [CrossRef]
- Kovács, B.; Vörös, F.; Vas, T.; Károly, K.; Gajdos, M.; Varga, Z. Safety and Security-Specific Application of Multiple Drone Sensors at Movement Areas of an Aerodrome. Drones 2024, 8, 231. [Google Scholar] [CrossRef]
- Yu, H.; Zhang, K.; Zhao, X.; Zhang, Y.; Cui, B.; Sun, S.; Liu, G.; Yu, B.; Ma, C.; Liu, Y.; et al. Research on Data Link Channel Decoding Optimization Scheme for Drone Power Inspection Scenarios. Drones 2023, 7, 662. [Google Scholar] [CrossRef]
- Butt, O.M.; Zulqarnain, M.; Butt, T.M. Recent advancement in smart grid technology: Future prospects in the electrical power network. Ain Shams Eng. J. 2021, 12, 687–695. [Google Scholar] [CrossRef]
- Jing, Z.; Yu, C.; Xi, F.; Wu, F.; Tao, Z.; Yang, P. Reliability analysis of distribution network operation based on short-term future big data technology. J. Phys. Conf. Ser. 2020, 1584, 012027. [Google Scholar] [CrossRef]
- Wen, X.; Wu, Q.; Wang, Y.; Liu, S.; Hao, J.; Lan, L.; Deng, Y.; Gao, L. High-risk region of bird streamer flashover in 110 kV composite insulators and design for bird-preventing shield. Int. J. Electr. Power Energy Syst. 2021, 131, 107010. [Google Scholar] [CrossRef]
- Tang, X.; Shen, W.; Zhu, M.; Bao, W. The foreign object detecting algorithm for transmission lines based on the improved YOLOv4. J. Anhui Univ. (Nat. Sci. Ed.) 2021, 45, 58–63. [Google Scholar]
- Zhu, J.; Guo, Y.; Yue, F.; Yuan, H.; Yang, A.; Wang, X.; Rong, M. A deep learning method to detect foreign objects for inspecting power transmission lines. IEEE Access 2020, 8, 94065–94075. [Google Scholar] [CrossRef]
- Zhang, H.; Zhou, H.; Li, S.; Li, P. Improved YOLOv3 foreign body detection method in transmission line. Laser J. 2022, 43, 82–87. [Google Scholar]
- Chen, C.; Yang, B.; Song, S.; Peng, X.; Huang, R. Automatic clearance anomaly detection for transmission line corridors utilizing UAV-Borne LIDAR data. Remote Sens. 2018, 10, 613. [Google Scholar] [CrossRef]
- Cheng, L.; Wu, G. Obstacles detection and depth estimation from monocular vision for inspection robot of high voltage transmission line. Clust. Comput. 2019, 22, 2611–2627. [Google Scholar] [CrossRef]
- Jiao, S.; Wang, H. The Research of Transmission Line Foreign Body Detection Based on Motion Compensation. In Proceedings of the 2016 First International Conference on Multimedia and Image Processing (ICMIP), Bandar Seri Begawan, Brunei, 1–3 June 2016; pp. 10–14. [Google Scholar] [CrossRef]
- Wu, S.; Kan, M.; He, Z.; Shan, S.; Chen, X. Funnel-structured cascade for multi-view face detection with alignment-awareness. Neurocomputing 2017, 221, 138–145. [Google Scholar] [CrossRef]
- Mahdi Elsiddig Haroun, F.; Mohamed Deros, S.N.; Bin Baharuddin, M.Z.; Md Din, N. Detection of Vegetation Encroachment in Power Transmission Line Corridor from Satellite Imagery Using Support Vector Machine: A Features Analysis Approach. Energies 2021, 14, 3393. [Google Scholar] [CrossRef]
- Ye, X.; Wang, D.; Zhang, D.; Hu, X. Transmission Line Obstacle Detection Based on Structural Constraint and Feature Fusion. Symmetry 2020, 12, 452. [Google Scholar] [CrossRef]
- Liang, H.; Zuo, C.; Wei, W. Detection and Evaluation Method of Transmission Line Defects Based on Deep Learning. IEEE Access 2020, 8, 38448–38458. [Google Scholar] [CrossRef]
- Guo, S.; Bai, Q.; Zhou, X. Foreign object detection of transmission lines based on faster R-CNN. In Proceedings of the Information Science and Applications: ICISA 2019, Seoul, Republic of Korea, 16–18 December 2019; Springer: Singapore, 2020; pp. 269–275. [Google Scholar]
- Li, H.; Liu, L.; Du, J.; Jiang, F.; Guo, F.; Hu, Q.; Fan, L. An Improved YOLOv3 for Foreign Objects Detection of Transmission Lines. IEEE Access 2022, 10, 45620–45628. [Google Scholar] [CrossRef]
- Song, Y.; Zhou, Z.; Li, Q.; Chen, Y.; Xiang, P.; Yu, Q.; Zhang, L.; Lu, Y. Intrusion detection of foreign objects in high-voltage lines based on YOLOv4. In Proceedings of the 2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, China, 9–11 April 2021; pp. 1295–1300. [Google Scholar] [CrossRef]
- Hui, Z.; Jian, Z.; Yuran, C.; Su, J.; Di, W.; Hao, D. Intelligent bird’s nest hazard detection of transmission line based on RetinaNet model. J. Phys. Conf. Ser. 2021, 2005, 012235. [Google Scholar] [CrossRef]
- Huang, Y.; Chen, Z.; Chen, Q. Real-time detection method for transmission line faults applying edge computing and improved YOLOv5s algorithm. Electr. Power Constr. 2023, 44, 91–99. [Google Scholar]
- Li, H.; Dong, Y.; Liu, Y.; Ai, J. Design and implementation of uavs for bird’s nest inspection on transmission lines based on deep learning. Drones 2022, 6, 252. [Google Scholar] [CrossRef]
- Liu, B.; Huang, J.; Lin, S.; Yang, Y.; Qi, Y. Improved YOLOX-S abnormal condition detection for power transmission line corridors. In Proceedings of the 2021 IEEE 3rd International Conference on Power Data Science (ICPDS), Harbin, China, 26 December 2021; IEEE: New York, NY, USA, 2021; pp. 13–16. [Google Scholar]
- Yu, C.; Liu, Y.; Zhang, W.; Zhang, X.; Zhang, Y.; Jiang, X. Foreign objects identification of transmission line based on improved YOLOv7. IEEE Access 2023, 11, 51997–52008. [Google Scholar] [CrossRef]
- Yang, S.; Zhou, Y. Abnormal Object Detection with an Improved YOLOv8 in the Transmission Lines. In Proceedings of the 2023 China Automation Congress (CAC), Chongqing, China, 17–19 November 2023; IEEE: New York, NY, USA, 2023; pp. 9269–9273. [Google Scholar]
- Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO; Software; Ultralytics: Los Angeles, CA, USA, 2023. [Google Scholar]
- Zhao, Z.; Zhang, Q.; Yu, X.; Sun, C.; Wang, S.; Yan, R.; Chen, X. Applications of unsupervised deep transfer learning to intelligent fault diagnosis: A survey and comparative study. IEEE Trans. Instrum. Meas. 2021, 70, 3525828. [Google Scholar] [CrossRef]
- Su, H.; Xiang, L.; Hu, A.; Xu, Y.; Yang, X. A novel method based on meta-learning for bearing fault diagnosis with small sample learning under different working conditions. Mech. Syst. Signal Process. 2022, 169, 108765. [Google Scholar] [CrossRef]
- Sudharsan, R.; Ganesh, E. A Swish RNN based customer churn prediction for the telecom industry with a novel feature selection strategy. Connect. Sci. 2022, 34, 1855–1876. [Google Scholar] [CrossRef]
- Chen, Z.; Yang, J.; Feng, Z.; Zhu, H. RailFOD23: A dataset for foreign object detection on railroad transmission lines. Sci. Data 2024, 11, 72. [Google Scholar] [CrossRef]
- Li, H.; Li, J.; Wei, H.; Liu, Z.; Zhan, Z.; Ren, Q. Slim-neck by GSConv: A better design paradigm of detector architectures for autonomous vehicles. arXiv 2022, arXiv:2206.02424. [Google Scholar]
- Ghiasi, G.; Lin, T.Y.; Le, Q.V. Nas-fpn: Learning scalable feature pyramid architecture for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7036–7045. [Google Scholar]
- Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790. [Google Scholar]
- Wang, K.; Liew, J.H.; Zou, Y.; Zhou, D.; Feng, J. Panet: Few-shot image semantic segmentation with prototype alignment. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9197–9206. [Google Scholar]
- Xu, X.; Jiang, Y.; Chen, W.; Huang, Y.; Zhang, Y.; Sun, X. Damo-yolo: A report on real-time object detection design. arXiv 2022, arXiv:2211.15444. [Google Scholar]
- Gholamalinezhad, H.; Khosravi, H. Pooling methods in deep neural networks, a review. arXiv 2020, arXiv:2009.07485. [Google Scholar]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
- Li, Q.; Jin, S.; Yan, J. Mimicking very efficient network for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6356–6364. [Google Scholar]
- Yang, Z.; Li, Z.; Shao, M.; Shi, D.; Yuan, Z.; Yuan, C. Masked generative distillation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 53–69. [Google Scholar]
- Shu, C.; Liu, Y.; Gao, J.; Yan, Z.; Shen, C. Channel-wise knowledge distillation for dense prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 5311–5320. [Google Scholar]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar] [CrossRef]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).