1. Introduction
With the mounting electricity demands of society, the inspection and maintenance of transmission lines have become increasingly crucial [1,2]. Traditional manual inspection methods are relatively slow, costly, and carry inherent risks. The utilization of UAVs for electrical inspections offers many advantages, including safety, efficiency, flexibility, cost-effectiveness, and minimal operational constraints [3,4,5,6,7]. Consequently, UAVs have been widely adopted in power line inspections and have become the principal tool of the electric utility sector [8,9,10].
UAV inspections require the collection of a large volume of image data, and manually reviewing this vast dataset is time-consuming. Furthermore, the quality of the inspection depends on the subjective judgment and skill level of the personnel, and the varying quality of image data can lead to erroneous or missed detections [5]. In recent years, machine vision technology has significantly enhanced inspection efficiency [11]. Jenssen et al. [12] proposed a computer vision-based power line inspection method that uses UAV optical images as the data source, combined with deep learning algorithms for data analysis and detection. This approach allows for the automatic detection of safety risks such as missing pole top pads, incorrectly installed insulators, cracked poles, and damage from woodpeckers. Han et al. [13] introduced a computer vision algorithm that can detect damaged insulator discs and foreign objects lodged between two discs. Ma et al. [14] proposed a method for detecting transmission line insulators that combines UAV imagery with binocular vision perception technologies. This method can quickly and intelligently detect insulator damage and absences, while also using the global positioning system (GPS) and UAV flight parameters to calculate the spatial coordinates of the insulators.
These studies demonstrate that machine vision technology can significantly enhance the efficiency of inspection operations and is a crucial strategy in the advancement toward AI-driven power line inspections. However, these models require substantial computational resources, have large model sizes, and have slow processing speeds, creating significant challenges for practical deployment. Therefore, the industry is increasingly demanding lightweight and compact electrical inspection models with high efficiency.
Since 2012, computer vision technology based on deep convolutional neural networks has developed rapidly, and object detection has become a research hotspot in the field of computer vision. Object detection algorithms can be categorized into two main classes: two-stage and one-stage. Two-stage algorithms generate candidate boxes in the first stage and perform accurate target localization in the second stage; representative algorithms include R-CNN [15], Faster R-CNN [16], and Mask R-CNN [17]. In contrast, one-stage algorithms predict the target category and location simultaneously, making them suitable for real-time detection tasks; notable examples include YOLO [18] and SSD [19]. Thanks to its speed and accuracy, YOLO quickly gained significant attention in various fields, including transportation [20,21,22], agriculture [23,24,25,26], epidemic prevention [27,28], geological monitoring [29], urban management [30], and medical diagnosis [31,32]. The power industry is also exploring the application of YOLO algorithms in power line inspection work.
For instance, Chen et al. [33] proposed an electrical component recognition framework based on SRCNN and YOLOv3, in which the SRCNN network performs super-resolution reconstruction of blurry images to expand the dataset, while YOLOv3 recognizes the electrical components. Chen et al. [34] proposed a YOLOv3-based solution for pole detection and counting from UAV patrol videos, enabling rapid post-disaster assessment of fallen poles. Tu et al. [35] proposed a model for recognizing towers and insulators based on an improved YOLOv3 algorithm; the authors removed the 52 × 52 scale feature extraction layer, pruning the three-scale feature extraction down to two scales to enhance computational speed, and employed the K-means++ clustering method to calculate anchor box dimensions, thereby improving detection accuracy. Zhan [36] created a dedicated dataset for electrical equipment and compared the detection performance of Faster R-CNN, Mask R-CNN, YOLO, and SSD. Bao et al. [37] improved the performance of the YOLOv5x model for detecting defects in insulators and vibration dampers in UAV remote sensing images by incorporating a coordinate attention (CA) module and replacing the original PANet feature fusion framework with a bidirectional feature pyramid network (BiFPN).
However, these existing studies mainly employed early versions of the YOLO networks, which offer inferior overall performance. Even with improvements, these algorithms are hardly comparable to newer versions of the YOLO network. Moreover, the datasets used in these studies usually have lower image resolution, larger object sizes, lower background complexity, and only a few types of objects, so they cannot adequately reflect the needs of real, complex inspection scenarios.
To address the limitations of previous works and better meet the demands of real-world inspection, we constructed a tailored, high-quality dataset whose characteristics align with practical scenarios. On this basis, we propose TLI-YOLOv5, an advanced lightweight object detection framework specifically designed for transmission line inspection. The main contributions and innovations of this paper are summarized as follows:
We constructed a UAV transmission line inspection dataset. The dataset includes 1231 high-resolution images covering eight types of labeled targets, with a total of 39,151 annotated boxes. The images were captured across various provinces and cities in China, resulting in a dataset with a rich variety of scenarios, a large volume, and high image quality. This dataset provides a robust foundation for training high-quality models.
We introduced YOLOv5n to the field of transmission line inspection, a novel application that has not been explored before. YOLOv5n, the latest model released in YOLOv5 v6.0, is faster and smaller than its predecessors such as YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. These characteristics make it particularly well suited for large-scale, real-time transmission line inspection tasks.
We constructed TLI-YOLOv5, a lightweight object detection framework for transmission line inspection built upon YOLOv5. First, we incorporated the parameter-free attention module SimAM into the backbone of the YOLOv5n network, which strengthens feature extraction without increasing the number of network parameters. Second, the loss function was replaced with Wise-IoU (WIoU), enhancing the model’s accuracy, robustness, and generalization ability. Furthermore, we employed transfer learning to expedite convergence during training and improve learning performance, and adopted a cosine learning rate decay strategy to stabilize training, hasten convergence, and further improve the generalization ability of the model. A minimal sketch of the SimAM operation and the cosine schedule is given after this list.
We validated the proposed TLI-YOLOv5 on our transmission line inspection dataset. The experimental evaluations revealed that, in comparison to the original YOLOv5n model, the proposed TLI-YOLOv5 model exhibited measurable improvements. Specifically, precision improved by 0.40%, recall increased by 4.01%, the F1 score rose by 1.69%, the mean average precision at 50% IoU (mAP50) increased by 2.91%, and the mean average precision from 50% to 95% IoU (mAP50-95) also increased by 0.74%. Moreover, the model maintained a recognition speed of 76.1 frames per second (FPS) and a compact size of only 4.15 MB.
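To make these components concrete, the following is a minimal PyTorch sketch of the parameter-free SimAM operation and of a cosine learning rate factor. It is an illustrative sketch rather than our exact implementation: the default e_lambda and lrf values and the commented integration points are assumptions shown only for exposition.

import math
import torch
import torch.nn as nn

class SimAM(nn.Module):
    # Parameter-free attention: each activation is re-weighted by an
    # energy-based importance score, adding no learnable parameters.
    def __init__(self, e_lambda: float = 1e-4):  # illustrative default
        super().__init__()
        self.e_lambda = e_lambda
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        n = h * w - 1
        # squared deviation of each activation from its channel's spatial mean
        d = (x - x.mean(dim=[2, 3], keepdim=True)).pow(2)
        # inverse energy: larger for activations that stand out within their channel
        v = d.sum(dim=[2, 3], keepdim=True) / n
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * self.sigmoid(e_inv)

def cosine_lr_factor(epoch: int, epochs: int, lrf: float = 0.01) -> float:
    # Cosine decay of the learning rate factor from 1.0 down to lrf.
    return lrf + 0.5 * (1.0 - lrf) * (1.0 + math.cos(math.pi * epoch / epochs))

# Illustrative usage: apply SimAM to a backbone feature map and build the scheduler.
# feats = SimAM()(feats)
# scheduler = torch.optim.lr_scheduler.LambdaLR(
#     optimizer, lr_lambda=lambda e: cosine_lr_factor(e, epochs=300))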
The remaining sections of this paper are structured as follows. In Section 2, we describe the construction of our UAV transmission line inspection dataset and elaborate on the architecture and principles of TLI-YOLOv5, including the principles and integration of YOLOv5n, the SimAM attention module, and the WIoU loss function, as well as the training methods involving transfer learning and cosine learning rate decay. In Section 3, we describe the experimental procedures in detail. Section 4 discusses the strengths and limitations of the proposed framework. Finally, Section 5 summarizes the main findings and contributions of our study.
4. Discussion
Table 8 displays the detection performance of TLI-YOLOv5 for the eight object categories. The confusion matrix of TLI-YOLOv5’s detection results is illustrated in Figure 11, where the x-axis represents the actual categories and the y-axis indicates the predicted categories. Each cell gives the proportion of instances of a given actual category that the model assigned to the corresponding predicted category, with all proportions normalized to the range of 0 to 1.
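As a concrete illustration of this normalization, the short sketch below column-normalizes a matrix of raw prediction counts so that each actual-category column sums to 1; the count values and the three-category size are hypothetical and serve only to show the arithmetic.

import numpy as np

# Hypothetical raw counts: rows = predicted category, columns = actual category.
counts = np.array([
    [50.0,  2.0,  1.0],
    [ 3.0, 40.0,  5.0],
    [ 2.0,  8.0, 30.0],
])

# Divide each column by its total so that each actual category's
# proportions sum to 1, matching the 0-1 scale used in the figure.
col_totals = counts.sum(axis=0, keepdims=True)
proportions = counts / np.where(col_totals == 0.0, 1.0, col_totals)
print(proportions.round(2))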
Among all eight categories, the model performs best in detecting nest, followed by vibration damper and tower. The detection results for insulator, yoke plate, and line clamp are at an average level. In contrast, the worst-performing categories are corona ring and tower sign, whose performance is significantly lower than that of the other six categories.
By examining the images of these two problematic categories, we can identify the factors contributing to these discrepancies.
The corona ring is the smallest object among all eight categories, and its small size inherently hampers detection effectiveness: smaller objects occupy fewer pixels in an image, making it more challenging for the model to learn and extract meaningful features.
As for the tower sign, in some images captured at mid to long distances, tower signs remain highly recognizable due to their distinctive features despite their small size, so we annotated them in the dataset. However, as with other small objects, the model has difficulty detecting them accurately. As illustrated in Figure 12, among the seven tower signs annotated in the original image, only one was successfully detected by the model.
Figure 13 provides a clear visual comparison of the performance differences between the original YOLOv5n and TLI-YOLOv5. The confidence scores of the detection bounding boxes produced by TLI-YOLOv5 are generally higher than those of the original YOLOv5n. TLI-YOLOv5 also successfully detects the largest object in the image, the tower, which YOLOv5n fails to detect.
Figure 14 presents detailed sections of Figure 13. As demonstrated in Figure 14a, TLI-YOLOv5 successfully detected the object partially hidden behind the tower body, which the original YOLOv5n was unable to detect. Furthermore, as illustrated in Figure 14b, TLI-YOLOv5 generates more accurate detection bounding boxes than the original YOLOv5n for densely arranged objects.
As a lightweight object detection framework designed for real-time and video detection, TLI-YOLOv5, like other lightweight models, trades a degree of precision for higher detection speed. In UAV transmission line inspection, the UAV’s visuals constitute a continuous video stream, so the detection speed not only determines the fluidity of the UAV’s operation but also significantly affects the efficiency of the inspection work. Our tests show that TLI-YOLOv5’s performance satisfies the demands of such tasks, striking a well-managed balance between precision and speed.
To further analyze the effectiveness of our proposed TLI-YOLOv5, we compare it with some other existing object detection methods designed for UAV-based transmission line inspection.
Previous works [33,34,35] adopted YOLOv3 as the detection algorithm. However, YOLOv3 is widely recognized as being surpassed by YOLOv5 in overall performance, which is also validated by our experiments in Section 3.3.5. Other works [36,37] employed earlier versions of YOLOv5, whose capabilities fall far behind the v7.0 release used in our study. Moreover, they did not utilize the most lightweight YOLOv5n model, so their detection speed is much lower than that of our approach.
Most importantly, our dataset contains eight types of transmission line objects and is annotated from real-world UAV inspection images. Therefore, our proposed method covers a wider range of application scenarios and is better suited to the needs of modern UAV transmission line inspection, demonstrating a noticeable advantage over existing approaches.
Despite its strengths, our work has certain limitations. First, transmission line inspections involve a wide array of components, but due to constraints of time and manpower, our research annotated and trained on only eight types of components. In future work, we plan to incorporate more components into our detection targets. Additionally, we aim to augment our dataset with a diverse range of medium- and large-sized objects, enabling the model to learn and capture more intricate feature details. Second, with the continuous introduction of more sophisticated algorithms and improvement strategies in the field of object detection, it is worth exploring their application to UAV transmission line inspection tasks. For example, recent work on network pruning [47,48] could help reduce model complexity, and weight quantization techniques [49,50] may be adopted for significant memory reduction. We hope our work can provide a baseline for future explorations to further advance UAV-based transmission line inspection capabilities.
5. Conclusions
This study proposes TLI-YOLOv5, a lightweight object detection framework developed specifically for transmission line inspection. By integrating the parameter-free SimAM attention module and the WIoU loss function and incorporating advanced training strategies, TLI-YOLOv5 improves on the original YOLOv5n across various metrics while maintaining high recognition speed and a compact size. This work contributes a practical and efficient solution for large-scale, real-time transmission line inspection tasks.
However, there remain some limitations in our current research. First, the number of object categories is still limited; in future work, more types of components could be incorporated as detection targets to expand the model’s capabilities. Second, while TLI-YOLOv5 achieves a balance between accuracy and efficiency, more advanced networks could potentially further boost detection performance, and we aim to explore state-of-the-art architectures to continue improving detection accuracy. Third, creating larger datasets covering more environments would also be beneficial. Overall, this study offers a robust baseline model, and future research will focus on expanding category diversity, maximizing accuracy, and improving generalizability.
Although object detection has made rapid progress, challenges remain in integrating and implementing it within real-world transmission line inspection applications. Factors such as dataset limitations, model generalization, efficiency constraints, and transition costs can hinder adoption. Our future work will concentrate on tackling these challenges to facilitate the broader deployment of object detection algorithms in transmission line inspection systems. We believe that continued research and technical development will progressively bridge the gap between state-of-the-art techniques and practical usage in the field of transmission line inspection.