1. Introduction
Insulators provide electrical insulation and mechanical support for overhead transmission lines. Insulator detection refers to the process of locating the position of the insulator in inspection images. This process serves as the foundation for other tasks such as insulator defect detection and power line extraction. Recently, insulator detection has attracted significant interest from researchers in many fields such as smart grid and computer vision, and some progress has been made [1,2].
At present, insulator inspection mainly depends on visual observation by staff, which can easily lead to omissions. Unmanned aerial vehicles (UAVs), with their advantages of low cost, small size, and multiple cruise modes [3], combined with other systems such as wireless sensor networks (WSNs) [4], have become important auxiliary equipment in many fields, such as crop identification [5] and yield estimation [6]. On the other hand, UAVs bring safety hazards and privacy issues, such as threats to the flight activities of birds [7]. In addition, the communication between the drone and the Ground Control Station (GCS) is vulnerable to attacks, which can cause data leakage [8]. In general, however, the convenience offered by drones outweighs these drawbacks. Therefore, we believe that it is feasible to use drones for power line inspections. Researchers have already used drones for power line identification and monitoring tasks [9]. This paper focuses on insulator detection algorithms using images acquired by UAVs.
In past studies, researchers have generally used feature extraction combined with classifiers for insulator detection [10,11]. This type of method usually uses hand-crafted operators to traverse the image and obtain the edges and texture of the insulator. Insulator detection is then performed based on this feature information. However, such methods rely heavily on the completeness of the features. As the features used cannot cover all situations, it is difficult to obtain good detection results. In addition, the feature classifier requires significant manual work, and the level of automation is low, making it difficult to apply on a large scale in practice.
In recent years, some researchers have proposed the use of convolutional neural networks (CNNs) for insulator detection. A power line insulator detection model was proposed in [12] for different environmental backgrounds based on YOLO. In [13], a framework based on Faster R-CNN was proposed for different types of insulators. Currently, significant redundant information remains in the detection results from the state-of-the-art models used for object detection. An improvement in precision comes at the cost of an increase in the quantity of data and parameters. For situations with high real-time requirements and small amounts of data, it is difficult for CNNs to achieve excellent detection results. In addition, overhead transmission lines are mostly located in mountains, lakes, and suburbs. Those environments are complex, which increases the difficulty of model convergence.
One way to solve the above problems is to use conditional Generative Adversarial Nets (CGANs) [14]. A CGAN uses aligned images to enable the framework to learn the relevant mapping from input to output. In this way, the original image and ground truth can be used to generate an insulator-detection image. Some researchers have achieved good results using CGANs in image conversion tasks. For example, a network model named pix2pix [15], proposed by Isola et al., was used to complete the conversion between images. The pix2pix model achieved excellent performance, but the resolution of the generated image was only 256 × 256. The researchers in [16] noted that image generation using an adversarial network was not reliable enough; instead, the authors used perceptual loss to obtain the image. The resolution of the generated image was improved, but the semantic details were poorly rendered.
The researchers in [17] sought to introduce semantic guidance in image generation and proposed two multi-view image-generation models, X-Fork and X-Seq. The authors fed an image of a known perspective, together with the semantic segmentation map of the target perspective, into the model to provide semantic guidance and make the generated image more realistic. Inspired by the work in [17], the researchers in [18] proposed a SelectionGAN model based on a multi-channel attention mechanism. The SelectionGAN model further expanded the semantic generation space. This model improved the semantic details in the image by referring to the intermediate results generated and achieved advanced results in the translation of satellite images and ground images. However, the above techniques have poor compatibility with transmission line scenarios because not all scenarios have sufficient semantic segmentation maps for model training.
To solve the above problems, this paper proposes an insulator-detection image-generation model called InsulatorGAN, based on an improved conditional generative adversarial network. Our model can be flexibly adapted to a variety of power component inspection tasks and other application scenarios. InsulatorGAN contains a generator and multiple discriminators. As shown in Figure 1, our model generates images in two steps: first, a coarse image module generates a low-resolution image (LR image); next, a fine image module improves the semantic information in the LR image and generates a high-resolution image (HR image).
The differences between InsulatorGAN and previous models are as follows. First, the generator of InsulatorGAN uses multiple stages, which ensures that InsulatorGAN can not only generate high-resolution and realistic images but can also be applied to other fields, such as image translation and style conversion. Second, to strengthen the semantic constraints in the image-generation process, we use the Monte Carlo search (MCS) [19] to sample low-resolution target images multiple times and calculate the corresponding penalty value according to the sampling results. The penalty mechanism forces the generator to produce images with richer semantics and helps avoid mode collapse [20]. Third, to improve the ability of the discriminator, we propose a discriminator framework with a multi-scale structure, based on the multi-task learning strategy of parameter sharing [21]. Although all the discriminators use the same network structure, inputs of different resolutions allow the discriminators to cooperate with each other, extract feature maps at different abstraction levels, and accelerate the training of the model. In addition, because the public dataset CPLID [1] contains little data and only simple backgrounds, we used images obtained via UAV to build an insulator image-generation dataset, InsuGenSet. Moreover, the insulator detection results output by the InsulatorGAN model can be used for data expansion. We compared InsulatorGAN with several mainstream image-generation models and achieved the best results on CPLID and InsuGenSet, which demonstrates that our model can generate high-resolution and high-quality images. The main contributions of this paper are as follows:
This paper proposes an insulator-detection image-generation model, InsulatorGAN, based on an improved conditional generative adversarial network. This model includes a generator and multiple discriminators. Moreover, we use a two-stage, coarse-to-fine method to generate high-resolution insulator inspection images, and the method can be flexibly adapted to other scenes;
To improve the constraints on the insulator image-generation process, a penalty mechanism based on the Monte Carlo search was introduced into the generator. This mechanism enables the generator to obtain sufficient semantic guidance and add more semantic details to the generated image;
Based on the parameter sharing mechanism, we propose a multi-scale discriminator structure that enables the entire discriminator network to use feature information at different levels of abstraction to determine whether the input image is true or false;
To address the small scale of the public insulator dataset CPLID, we established a dataset called InsuGenSet for insulator-detection image generation based on real images. We conducted many comparative experiments between InsulatorGAN and state-of-the-art models on InsuGenSet, and the results demonstrated the effectiveness and flexibility of InsulatorGAN.
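As a rough illustration of the multi-scale discriminator input described above, the following sketch builds several resolutions of one image so that the same discriminator architecture could be applied at each scale. The 2 × 2 average pooling, the number of scales, and the names `downsample2x` and `image_pyramid` are assumptions made for illustration only; they are not taken from the InsulatorGAN implementation.

```python
def downsample2x(image):
    """Halve height and width by averaging each 2x2 block.

    `image` is a list of rows of floats; even dimensions are assumed.
    """
    height, width = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


def image_pyramid(image, num_scales=3):
    """Return [full resolution, 1/2 resolution, 1/4 resolution, ...]."""
    scales = [image]
    for _ in range(num_scales - 1):
        scales.append(downsample2x(scales[-1]))
    return scales


# Hypothetical 4x4 single-channel image with pixel values 0..15.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = image_pyramid(image, num_scales=3)
sizes = [(len(level), len(level[0])) for level in pyramid]
```

Each entry of `pyramid` would be passed to a discriminator with the same structure, so that coarse scales capture global layout while fine scales capture local detail.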
The rest of this article is arranged as follows. Section 2 introduces the related work on insulator detection and image generation. Section 3 introduces the basic knowledge of GANs. The architecture of InsulatorGAN is illustrated in Section 4. In Section 5, we present several sets of experiments that demonstrate the effectiveness of our framework. Finally, the conclusions and future work are outlined in Section 6.
3. Basic Knowledge of GAN
The Generative Adversarial Network (GAN) [27] consists of two adversarially trained sub-networks, a generator and a discriminator, which are trained as a minimax game. The generator $G$ takes a d-dimensional noise vector as input and produces a generated image that is as close as possible to the real image. The discriminator $D$, on the other hand, determines whether its input is a fake image from the generator or a real image from the real dataset. The loss function of the entire generative adversarial network is as follows:
$$\min_G \max_D \mathcal{L}_{GAN}(G, D) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{1}$$
where $x$ represents the real image sampled from the real data distribution $p_{data}(x)$, and $z$ represents the d-dimensional noise vector sampled from the Gaussian distribution $p_z(z)$.
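To make the standard GAN value function, E[log D(x)] + E[log(1 − D(G(z)))], more concrete, the sketch below estimates it from a batch of discriminator outputs. The function name `gan_value` and the example probabilities are hypothetical; a real model would obtain these scores from the networks D and G.

```python
import math


def gan_value(d_real, d_fake):
    """Batch estimate of E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real images, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated images, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Hypothetical scores: a discriminator that separates real from fake well ...
confident = gan_value(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
# ... and one that cannot tell them apart (outputs 0.5 everywhere).
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

The discriminator tries to push this value up, as in `confident`, while the generator tries to push it down toward the `fooled` case, where the value equals −2 log 2.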
CGANs [14] control the results of model generation by introducing auxiliary variables. In a CGAN, the generator generates images based on auxiliary conditions, and the discriminator makes judgments based on the auxiliary conditions and the images (fake or real). The loss function is as follows:
$$\min_G \max_D \mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x, c}\left[\log D(x, c)\right] + \mathbb{E}_{x', c}\left[\log\left(1 - D(x', c)\right)\right] \tag{2}$$
where $c$ represents the auxiliary variable, and $x' = G(z, c)$ represents the image generated by the generator.
In addition to the adversarial loss, previous works [34,36] have also sought to minimize the L1 or L2 distance between the real and fake images to help the generator synthesize images with greater similarity to the real images. Previous research has shown that, compared with the L2 distance, the L1 distance helps the model reduce blur and distortion in the image. Therefore, the L1 distance was also introduced into InsulatorGAN. The formula for minimizing the L1 distance is as follows:
$$\min_G \mathcal{L}_{L1}(G) = \mathbb{E}_{x, x'}\left[\lVert x - x' \rVert_1\right] \tag{3}$$
where $x$ is the real image and $x'$ is the generated image. The loss function for this type of CGAN is the sum of Equations (2) and (3).
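The sum of the conditional adversarial term (Equation (2)) and the L1 term (Equation (3)) can be sketched as follows. This is a minimal illustration on flattened images and precomputed discriminator scores. The weight `lam` is an assumption added for generality (with `lam=1.0` the result is the plain sum stated above), the L1 distance is averaged over pixels for readability, and all function names and numbers are hypothetical.

```python
import math


def l1_distance(real, fake):
    """Mean absolute difference between two equally sized flattened images."""
    return sum(abs(r - f) for r, f in zip(real, fake)) / len(real)


def cgan_value(d_real, d_fake):
    """Batch estimate of E[log D(x, c)] + E[log(1 - D(x', c))].

    Scores are discriminator outputs for (image, condition) pairs.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def total_loss(d_real, d_fake, real, fake, lam=1.0):
    """Adversarial term plus (optionally weighted) L1 term."""
    return cgan_value(d_real, d_fake) + lam * l1_distance(real, fake)


# Hypothetical two-pixel images and discriminator scores.
real_image = [0.0, 1.0]
fake_image = [0.5, 0.5]
distance = l1_distance(real_image, fake_image)
loss = total_loss([0.5], [0.5], real_image, fake_image)
```

Because the L1 term depends only on the generator output, it penalizes pixel-level deviation from the real image regardless of how the discriminator scores the pair.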