Author Contributions
Conceptualization, N.L., X.X., S.W. and N.D.; methodology, J.Y., X.Z., S.S., X.H. and N.L.; software, J.Y., X.Z., S.S. and X.H.; validation, X.X. and N.L.; formal analysis, X.X. and N.L.; investigation, Y.Y. and S.W.; resources, N.D. and N.L.; data curation, S.W., Y.Y., J.Y. and X.Z.; writing—original draft preparation, J.Y., X.Z., S.S. and N.L.; writing—review and editing, J.Y., X.X. and N.L.; supervision, N.D.; project administration, N.D. and N.L.; funding acquisition, N.D. and N.L. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Overview of the proposed method’s pipeline. The proposed method consists of two key stages: power line local image generator (Stage 1) and defect recognition (Stage 2). Stage 1 takes the entire power line image as input and utilizes a sliding window approach to capture local images of the power lines, based on power line segmentation results. Stage 2 takes the power line local images generated by Stage 1 as input and employs a patch classification network to classify them into normal and defect regions. For the defect regions, Stage 2 visualizes them on the original image.
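As a minimal sketch of how the two stages chain together (the function names `segment_power_lines`, `crop_line_patches`, and `classify_patch` are hypothetical stand-ins for the BA-NetV2 segmenter, the sliding-window cropper of Figures 7–9, and the patch classification network of Figure 10):

```python
import cv2

# Hypothetical end-to-end driver for the two-stage pipeline of Figure 1.
# segment_power_lines, crop_line_patches, and classify_patch stand in for
# the components described later in the paper.
def detect_strand_breakage(image):
    # Stage 1: segment the power lines, then slide a window along each
    # segmented line to crop local image patches.
    line_mask = segment_power_lines(image)          # BA-NetV2 inference
    patches = crop_line_patches(image, line_mask)   # sliding-window crops

    # Stage 2: classify each patch as normal (0) or defective (1) and
    # draw the defective regions back onto the original image.
    for patch, (x1, y1, x2, y2) in patches:
        if classify_patch(patch) == 1:
            cv2.rectangle(image, (x1, y1), (x2, y2), (0, 0, 255), 2)
    return image
```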
Figure 2.
Brief illustration of the structure of BA-Net. h × w × c means the image or the feature map has a resolution of h × w and a channel number of c. For a detailed illustration of the structure, please refer to [28].
Figure 3.
Illustration of the feature aggregation module (FAM). F_i and F_(i+1) denote the feature maps in the ith and (i+1)th branches, respectively. h × w × c means the feature map has a resolution of h × w and a channel number of c.
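The precise FAM design is given in [28]; purely as an orientation, a branch-aggregation step of this kind can be sketched in PyTorch as below. The 1×1 channel projection, bilinear upsampling, element-wise addition, and 3×3 refinement are our assumptions, not the published operations.

```python
import torch.nn as nn
import torch.nn.functional as F

class FAM(nn.Module):
    """Hypothetical feature aggregation: fuses the feature map of branch i
    with the coarser feature map of branch i+1. See [28] for the actual
    BA-Net module; this sketch assumes a 1x1 projection, bilinear
    upsampling, and element-wise addition."""

    def __init__(self, in_channels_next, channels):
        super().__init__()
        self.proj = nn.Conv2d(in_channels_next, channels, kernel_size=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, f_i, f_next):
        # Project the channels of branch i+1, upsample it to the
        # resolution of branch i, add, and refine with a 3x3 conv.
        f_next = self.proj(f_next)
        f_next = F.interpolate(f_next, size=f_i.shape[-2:],
                               mode="bilinear", align_corners=False)
        return self.fuse(f_i + f_next)
```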
Figure 4.
Visualization of the segmentation results from different branches of BA-Net in various scenarios.
Figure 5.
Visualization of the prediction results of BA-Net with or without dilated convolution.
Figure 6.
Illustration of the network architecture of BA-NetV2. h × w × c means the image or the feature map has a resolution of h × w and a channel number of c.
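To make the role of the dilated convolution branches concrete (cf. Figures 5 and 6): dilation widens the receptive field of a 3×3 kernel without lowering resolution, which suits thin, elongated power lines. The channel counts and dilation rates in this PyTorch fragment are placeholders rather than BA-NetV2's actual values.

```python
import torch.nn as nn

# Illustrative dilated-convolution branch: with padding equal to the
# dilation rate, each 3x3 conv preserves spatial resolution while
# sampling a progressively wider neighborhood.
dilated_branch = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=4, dilation=4),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
```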
Figure 7.
Output of each key step of the postprocessing procedure. (a) Segmented image. (b) Minimum bounding rectangles of the connected components after filtering, indicated by green boxes. (c) Minimum bounding rectangles of connected components belonging to the same power line after hierarchical clustering, indicated by red boxes. (d) Straight lines fitted with the least squares method to connected components belonging to the same power line, indicated by blue line segments.
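The four steps of Figure 7 map onto standard OpenCV/SciPy primitives; the sketch below is one possible rendition, with illustrative (not the paper's) area and clustering thresholds.

```python
import cv2
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def postprocess(mask, min_area=50, dist_thresh=20.0):
    """Sketch of the Figure 7 postprocessing; thresholds are illustrative."""
    # (b) Connected components and their minimum bounding rectangles,
    # filtered by area to suppress small spurious responses.
    num, labels = cv2.connectedComponents(mask.astype(np.uint8))
    rects, comps = [], []
    for i in range(1, num):
        ys, xs = np.nonzero(labels == i)
        if len(xs) < min_area:
            continue
        pts = np.column_stack([xs, ys]).astype(np.float32)
        rects.append(cv2.minAreaRect(pts))
        comps.append(pts)
    if not rects:
        return []

    # (c) Hierarchical clustering of rectangle centers, so fragments of
    # the same power line end up in one cluster.
    if len(rects) == 1:
        cluster_ids = np.array([1])
    else:
        centers = np.array([r[0] for r in rects])
        cluster_ids = fcluster(linkage(centers, method="single"),
                               t=dist_thresh, criterion="distance")

    # (d) Least squares line fit (y = k*x + b) per cluster.
    lines = []
    for cid in np.unique(cluster_ids):
        pts = np.vstack([comps[j] for j in range(len(comps))
                         if cluster_ids[j] == cid])
        k, b = np.polyfit(pts[:, 0], pts[:, 1], deg=1)
        lines.append((k, b))
    return lines
```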
Figure 8.
Illustration of the power line image patch cropping process. The left part shows the flow path of the cropping process. The right part visualizes the information corresponding to each block in the left part.
Figure 9.
Illustration of the key information involved in determining the coordinates of a sliding window. (x_s, y_s) and (x_e, y_e) are the coordinates of the starting point and end point of the current sliding window, respectively; k and d are the slope and width of the power line, respectively. w and h are adjustable parameters that control the width and height of the image patch. Empirically, we set w and h to 6 and 24, respectively.
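Given a fitted center line y = kx + b and a measured line width d, the window coordinates of Figure 9 can be generated as below. That the patch size scales with d through w and h, and that consecutive windows do not overlap, are our assumptions about details the caption leaves open.

```python
import numpy as np

def sliding_windows(x_start, x_end, k, b, d, w=6, h=24):
    """Sketch of the sliding-window coordinate computation of Figure 9.
    The patch dimensions are assumed to scale with the line width d via
    the adjustable parameters w and h (6 and 24 in the paper)."""
    patch_w = w * d
    patch_h = h * d
    step = patch_h  # assumed stride: one patch height, i.e., no overlap

    # Walk along the fitted center line y = k*x + b from the start point
    # to the end point, emitting one axis-aligned crop box per step.
    length = np.hypot(x_end - x_start, k * (x_end - x_start))
    n = max(1, int(np.ceil(length / step)))
    boxes = []
    for t in np.linspace(0.0, 1.0, n):
        cx = x_start + t * (x_end - x_start)
        cy = k * cx + b
        boxes.append((int(cx - patch_w / 2), int(cy - patch_h / 2),
                      int(cx + patch_w / 2), int(cy + patch_h / 2)))
    return boxes
```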
Figure 10.
The architecture of the patch classification network. It consists of a modified MobileNetV2 backbone network, along with a segmentation head and a classification head. W, H, and C indicate the width, height, and channel number of the feature maps. “×2” denotes stacking two identical blocks. At the output of the classification head, “0” and “1” correspond to the predictions “normal” and “defective”, respectively.
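A compact PyTorch sketch of the two-head design of Figure 10, built on torchvision's MobileNetV2. The paper's backbone modifications and exact head layouts (the "×2" block stacking) are simplified away here, so treat this as a structural sketch rather than the published network.

```python
import torch.nn as nn
from torchvision.models import mobilenet_v2

class PatchClassifier(nn.Module):
    """Structural sketch of the Figure 10 network: a MobileNetV2 backbone
    feeding an auxiliary segmentation head and a classification head."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.backbone = mobilenet_v2(weights=None).features  # 1280 channels
        # Auxiliary segmentation head: a coarse per-pixel defect mask.
        self.seg_head = nn.Sequential(
            nn.Conv2d(1280, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, kernel_size=1),
        )
        # Classification head: 0 = normal, 1 = defective.
        self.cls_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(1280, num_classes),
        )

    def forward(self, x):
        feats = self.backbone(x)
        return self.seg_head(feats), self.cls_head(feats)
```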
Figure 11.
Samples of the Mixed-PLS power line segmentation dataset. This dataset is a combination of three sub-datasets, i.e., the relabeled PLDU and PLDM datasets [12], and the dataset collected by ourselves.
Figure 12.
Sample images and segmentation results of BA-NetV2 and the compared networks.
Figure 13.
Samples of the AIRS-PLSB dataset and the AIRS-PLIP dataset. The source images of power lines originate from the AIRS-PLSB dataset. The AIRS-PLIP dataset was cropped from the AIRS-PLSB dataset. Each sample of the AIRS-PLIP dataset consists of an RGB image patch, along with its corresponding image-level classification label and pixel-level classification label.
Figure 14.
Samples of visualized prediction results of the proposed method. Best viewed enlarged in the electronic edition.
Figure 15.
Examples of missed detections and false detections caused by an extremely thin strand breakage (upper left), an extremely complex background (upper right), and the fittings connected to the power lines and towers (bottom). Best viewed enlarged in the electronic edition.
Figure 16.
A power line with a thin, long broken strand and the processing results of power line segmentation, center line extraction, and final strand breakage detection. (a) A sample image with a thin, long broken strand snipped from a power line inspection image. (b) Power line segmentation result. (c) Center line extraction result (blue lines). (d) Final strand breakage detection result.
Table 1.
The hyperparameter setting for training the BA-NetV2 model.
Hyperparameters | Setting |
---|---|
Initial learning rate | 0.001 |
Minimum learning rate | 0.00001 |
Momentum | 0.9 |
Weight decay | 0.0005 |
Input image size | 512 × 512 |
Batch size | 32 |
Training steps | 20,000 |
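In PyTorch terms, Table 1 translates roughly to the setup below; the optimizer family (SGD) and the cosine decay from the initial to the minimum learning rate are our assumptions, as the table only lists values.

```python
import torch

model = torch.nn.Conv2d(3, 1, 3)  # stand-in for the BA-NetV2 model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,       # initial LR
                            momentum=0.9, weight_decay=5e-4)
# Decay the learning rate to the 1e-5 minimum over the 20,000 steps.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=20_000, eta_min=1e-5)
```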
Table 2.
Results of the ablation study experiment for BA-NetV2. A check mark (✓) indicates the modification is applied, while a cross mark (✗) indicates it is not.
Reduction of Parallel Branches | Expansion of Neural Network Base Channels | Dilated Convolution Branches | IoU (%) | mIoU (%) | Speed (Images/s) |
---|---|---|---|---|---|
✗ | ✗ | ✗ | 63.6 | 81.4 | 30.5 |
✓ | ✗ | ✗ | 64.0 | 81.6 | 58.3 |
✓ | ✓ | ✗ | 64.6 | 81.9 | 48.7 |
✓ | ✗ | ✓ | 64.4 | 81.8 | 57.8 |
✓ | ✓ | ✓ | 66.8 | 83.0 | 47.9 |
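The IoU and mIoU columns follow the standard per-class definitions over {background, power line}; a minimal NumPy reference implementation:

```python
import numpy as np

def iou_and_miou(pred, gt):
    """IoU of the power line (foreground) class and the mean IoU over
    {background, power line}, as reported in Tables 2 and 3. `pred` and
    `gt` are binary masks of the same shape."""
    ious = []
    for cls in (0, 1):  # background, power line
        inter = np.logical_and(pred == cls, gt == cls).sum()
        union = np.logical_or(pred == cls, gt == cls).sum()
        ious.append(inter / union if union else 1.0)
    return ious[1], float(np.mean(ious))  # (IoU, mIoU)
```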
Table 3.
Performances of different methods on the power line segmentation dataset.
Method | IoU (%) | mIoU (%) | Speed (Images/s) |
---|---|---|---|
BA-NetV2 | 66.8 | 83.0 | 47.9 |
BA-Net | 63.6 | 81.4 | 30.5 |
Fast SCNN | 54.9 | 76.9 | 124.0 |
HEDNet | 59.6 | 78.7 | 23.0 |
U-Net | 66.0 | 77.7 | 9.7 |
FastFCN | 53.7 | 76.3 | 11.7 |
DeepLabV3+ | 61.0 | 80.0 | 44.9 |
DDRNet | 59.3 | 79.1 | 51.4 |
Table 4.
The hyperparameter setting for training the proposed patch classification network and the compared networks.
Hyperparameters | Setting |
---|---|
Initial learning rate | 0.001 |
Minimum learning rate weight decay | 0.0001 |
Input image size | 3 × 224 × 224 |
Batch size | 32 |
Training steps | 9000 |
Table 5.
Performance of different classification methods on the AIRS-PLIP dataset.
Method | Precision | Recall | Average Precision | Speed (Images/s) |
---|---|---|---|---|
Ours | 0.96 | 0.90 | 0.95 | 257.9 |
SegDec | 0.97 | 0.87 | 0.86 | 45.5 |
ResNet-50 | 0.893 | 0.838 | 0.914 | 143.7 |
VGG16 | 0.811 | 0.792 | 0.863 | 118.1 |
MobileNetV2 | 0.857 | 0.785 | 0.855 | 287.3 |
Swin-Tiny | 0.857 | 0.877 | 0.899 | 125.5 |
Table 6.
Performance of different detection methods on the test set of the AIRS-PLSB dataset.
Method | Precision | Recall | F1-Score | Speed (Images/s) |
---|---|---|---|---|
Ours | 0.833 | 0.769 | 0.800 | 11.5 |
YOLOv5m | 0.769 | 0.577 | 0.659 | 15.4 |
YOLOv5s | 0.775 | 0.596 | 0.674 | 23.4 |
YOLOv7 | 0.783 | 0.692 | 0.735 | 9.2 |
YOLOv7-tiny | 0.660 | 0.596 | 0.626 | 24.3 |
ATSS | 0.638 | 0.712 | 0.673 | 9.2 |
EfficientDet-D3 | 0.756 | 0.569 | 0.667 | 9.6 |
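The detection metrics in Tables 6 and 7 are the usual ones; for completeness, computed from true positive, false positive, and false negative counts:

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1-score from TP/FP/FN counts, as used in
    Tables 6 and 7."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```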
Table 7.
Performance of different detection methods trained with half of the training set of the AIRS-PLSB dataset, and the performance drop compared with full-data training. “Full” indicates the models were trained with the full training set of the AIRS-PLSB dataset; “Half” indicates the models were trained with half of the training set; “Drop” indicates the performance decrease from “Full” to “Half”. The best performance in each column is highlighted in bold.
Method | Precision (Full) | Precision (Half) | Precision (Drop) | Recall (Full) | Recall (Half) | Recall (Drop) | F1-Score (Full) | F1-Score (Half) | F1-Score (Drop) |
---|---|---|---|---|---|---|---|---|---|
Ours | 0.833 | 0.784 | 0.049 | 0.769 | 0.769 | 0 | 0.800 | 0.777 | 0.023 |
YOLOv5m | 0.769 | 0.722 | 0.047 | 0.577 | 0.500 | 0.077 | 0.659 | 0.591 | 0.068 |
YOLOv5s | 0.775 | 0.630 | 0.145 | 0.596 | 0.558 | 0.038 | 0.674 | 0.592 | 0.082 |
YOLOv7 | 0.783 | 0.730 | 0.053 | 0.692 | 0.519 | 0.173 | 0.735 | 0.607 | 0.128 |
YOLOv7-tiny | 0.660 | 0.542 | 0.118 | 0.596 | 0.500 | 0.096 | 0.626 | 0.520 | 0.106 |
ATSS | 0.638 | 0.579 | 0.059 | 0.712 | 0.423 | 0.289 | 0.673 | 0.489 | 0.184 |
EfficientDet-D3 | 0.756 | 0.700 | 0.056 | 0.596 | 0.538 | 0.058 | 0.667 | 0.609 | 0.058 |