Article

LEM-Detector: An Efficient Detector for Photovoltaic Panel Defect Detection

1 School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213164, China
2 School of Medical and Health Engineering, Changzhou University, Changzhou 213164, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(22), 10290; https://doi.org/10.3390/app142210290
Submission received: 6 October 2024 / Revised: 29 October 2024 / Accepted: 7 November 2024 / Published: 8 November 2024

Abstract

Photovoltaic panel defect detection presents significant challenges due to the wide range of defect scales, diverse defect types, and severe background interference, often leading to a high rate of false positives and missed detections. To address these challenges, this paper proposes the LEM-Detector, an efficient end-to-end photovoltaic panel defect detector based on the transformer architecture. To address the low detection accuracy for Crack and Star crack defects and the imbalanced dataset, a novel data augmentation method, the Linear Feature Augmentation (LFA) module, specifically designed for linear features, is introduced. LFA effectively improves model training performance and robustness. Furthermore, the Efficient Feature Enhancement Module (EFEM) is presented to enhance the receptive field, suppress redundant information, and emphasize meaningful features. To handle defects of varying scales, complementary semantic information from different feature layers is leveraged for enhanced feature fusion. A Multi-Scale Multi-Feature Fusion Pyramid Network (MMFPN) is employed to selectively aggregate boundary and category information, thereby improving the accuracy of multi-scale target recognition. Experimental results on a large-scale photovoltaic panel dataset demonstrate that the LEM-Detector achieves a detection accuracy of 94.7% for multi-scale defects, outperforming several state-of-the-art methods. This approach effectively addresses the challenges of photovoltaic panel defect detection, paving the way for more reliable and accurate defect identification systems. This research will contribute to the automatic detection of surface defects in industrial production, ultimately enhancing production efficiency.

1. Introduction

Defects in photovoltaic panels pose a significant challenge, not only reducing power generation efficiency but also presenting a fire hazard. Early detection of these defects during the manufacturing process is crucial to prevent defective panels from advancing to subsequent production stages [1]. Photovoltaic panel defects appear as non-luminous dark areas in electroluminescence (EL) imaging, so they can be detected from EL images of the panels.
These defects exhibit wide variation in scale and diversity, ranging from minute cracks to extensive structural damage. Each type of defect presents unique characteristics, further complicating the detection process. Moreover, manufacturing processes and raw materials often introduce a significant number of randomly distributed, variably shaped grain patterns in photovoltaic panel images. These patterns closely resemble defects, creating complex backgrounds that interfere with defect detection. This interference increases the likelihood of false positives and missed detections, posing substantial challenges to the detection task. Figure 1 illustrates common defects found in photovoltaic panels.
Traditional computer vision-based methods have been widely employed for automated defect identification in photovoltaic panel images. These approaches typically extract features from texture, color, shape, and spectral cues using handcrafted feature descriptors, followed by utilizing classifiers for defect detection [2,3]. However, the feature extraction process heavily relies on handcrafted descriptors, which often involves complex parameter configurations and requires significant domain expertise. Moreover, these techniques are typically developed for specific applications, leading to limited generalization and robustness across diverse defect types and scenarios.
Object detection models are broadly categorized into one-stage and two-stage approaches. One-stage models, such as SSD [4] and YOLO [5], prioritize speed and real-time detection. Two-stage models, like R-CNN [6] and Faster R-CNN [7], emphasize accuracy but often come at the cost of computational efficiency. While R-CNN variants excel in precision, they are resource-intensive, requiring significant memory and processing power. SSD, although faster, struggles with detecting smaller objects and lags behind YOLO in real-time performance.
The YOLO family of models demonstrates a compelling balance between accuracy and speed. However, their reliance on Non-Maximum Suppression [8] for eliminating redundant detections can negatively impact both speed and accuracy. The DETR series [9], on the other hand, eliminates Non-Maximum Suppression altogether, improving both efficiency and accuracy. However, DETR models necessitate substantial computational resources.
RT-DETR [10] addresses these challenges by efficiently processing multi-scale features and introducing IoU-aware query selection, which provides the decoder with more accurate initial object queries. This approach surpasses existing real-time detectors in both accuracy and speed, while eliminating the need for post-processing, leading to a stable and delay-free inference process.
This paper presents an efficient end-to-end detector for photovoltaic panel defect detection, the LEM-Detector, drawing inspiration from the advancements of RT-DETR. The proposed approach addresses the challenges of imbalanced training data and effective feature extraction for multi-scale defect detection.
To tackle the issue of imbalanced training samples, a novel Linear Feature Augmentation (LFA) module is introduced. LFA utilizes the cosine similarity function as a triggering threshold to augment linear defect samples, thereby enhancing the model’s accuracy and robustness.
Furthermore, an Efficient Feature Enhancement Module (EFEM) is proposed to effectively capture subtle texture information associated with defects. The EFEM utilizes dynamic convolutional kernels based on feature weight analysis to improve extraction efficiency. Long-range spatial dependencies are captured by the module, enabling the selection of relevant edge information while suppressing redundant features. Group Normalization and one-dimensional large kernel convolutions further enhance the generalization capability of the EFEM, enabling accurate feature capture.
To address the issue of information loss during feature fusion, a Multi-Scale Multi-Feature Fusion Pyramid Network (MMFPN) is employed. The MMFPN leverages a Complementary Information Aggregation module (CIA), focusing on clearer boundary information in shallow features and richer category information in deep features. This bidirectional fusion mechanism between high- and low-resolution features improves information transfer, enhancing multi-scale feature fusion and boosting the accuracy of multi-scale object recognition.
The contributions of this paper are summarized as follows:
(1) Novel LFA module: Addresses imbalanced training data through cosine similarity-based augmentation, improving detection accuracy and robustness.
(2) Efficient EFEM module: Captures subtle texture information and enhances feature representation through dynamic convolutions, long-range dependency analysis, Group Normalization, and large-kernel convolutions.
(3) MMFPN architecture: Enhances multi-scale feature fusion by combining complementary information from different feature levels, improving information transfer and accuracy.
The remainder of this paper is structured as follows. Section 2 provides a review of existing works related to photovoltaic panel defect detection. Section 3 delves into the methodology employed in this study, outlining the proposed model and its components in detail. Section 4 presents a comprehensive analysis of the proposed model through comparative and ablation experiments, demonstrating its effectiveness and performance. Section 5 concludes the paper, summarizing the key findings and contributions of the research.

2. Related Work

2.1. Data Augmentation

Data augmentation is a widely used technique in computer vision [11], employed to generate synthetic training samples from existing data. By applying various transformations and processing, data augmentation increases the diversity and quantity of datasets, ultimately enhancing the model’s generalization ability and robustness [12,13].
Data augmentation methods encompass a broad spectrum, ranging from traditional image processing techniques to model-based approaches. Traditional methods include transformations such as flipping, rotation, scaling, translation, cropping, color manipulation, noise addition, blurring, and affine transformations. Model-based approaches, on the other hand, leverage techniques like Generative Adversarial Networks [14], Variational Autoencoders (VAEs) [15], and adversarial sample enhancement [16].
Prior research has explored the application of data augmentation in photovoltaic panel defect detection. Tang et al. [17] proposed an automatic classification method for monocrystalline silicon solar cell defects that utilizes geometric morphological features for data augmentation, requiring only a limited number of training samples. Jain et al. [18] employed Generative Adversarial Networks to generate synthetic images of surface defects, training the generator to synthesize new defect images based on random noise. Wei et al. [19] utilized a linear weighted fusion method for fabric defect image reconstruction, producing a residual map that highlights defects and facilitates pixel-level detection of various defect types. Wang et al. [20] addressed the challenges of limited training data and data imbalance in photovoltaic panel detection models through data augmentation and weighted classification methods, demonstrating significant performance improvements.
These studies collectively highlight the significant impact of data augmentation methods in enhancing the detection performance of photovoltaic panels, emphasizing the importance of diversifying and expanding training data for robust and accurate defect detection.

2.2. Feature Fusion Methods

Deep learning approaches for photovoltaic panel defect detection face challenges stemming from the wide variation in defect scales. Single-layer features often lack sufficient multi-scale information, hindering accurate detection of defects across different sizes. Consequently, enhancing model robustness to scale variations and accurately detecting multi-scale defects is crucial.
Feature Pyramid Networks (FPNs) [21] have gained widespread adoption in object detection networks due to their multi-layer feature fusion mechanism, which improves multi-scale representation capabilities [22,23]. However, the reduction in feature channels and layer-by-layer transmission methods within FPNs can lead to information loss, particularly affecting the detection of small targets and limiting accuracy for multi-scale object detection [24].
Various approaches have been proposed to address these challenges. Liu et al. [25] combined CNNs with a feature pyramid structure in a Faster R-CNN-based solar cell defect detection method, enhancing robustness to defect scales by capturing and fusing multi-scale feature information. Su et al. [26] integrated attention mechanisms with bidirectional feature pyramids to improve detection accuracy and model robustness. Tang et al. [27] focused on local and edge information during feature fusion, improving accuracy in medical image segmentation. Zhou et al. [28] proposed a dual pyramid network that further integrates multi-scale features, allowing different feature layers to share similar semantic characteristics, enabling precise localization and classification. Zhao et al. [10] focused on fusing higher-level features with richer semantic concepts.
RT-DETR, specifically, utilizes an Attention-based Intra-scale Feature Interaction (AIFI) module for intra-scale interactions in high-level features, reducing computational redundancy. Cross-scale fusion is then performed by the CNN-based Cross-scale Feature Fusion (CCFF) module. While this approach reduces computational complexity and memory usage, it can result in some information loss. This is particularly detrimental for photovoltaic panel defect detection, as these images often contain many small-scale defects that occupy few pixels and carry limited information. This fusion method leads to poor feature representation for small-scale defects, negatively impacting model accuracy.
To address this limitation, we propose a multi-scale multi-feature fusion approach during the feature fusion process. This approach maximizes the retention of small-scale information, effectively enhancing the multi-scale detection performance of photovoltaic panels.

2.3. Detection Methods Incorporating Attention Mechanisms

Attention mechanisms are widely employed in computer vision as a means of adaptively learning weight coefficients that emphasize regions of interest while suppressing irrelevant background information. This selective focus enhances model performance and efficiency. In computer vision, common attention mechanisms include Channel Attention [29], Spatial Attention [30], and Self-Attention [31]. These mechanisms selectively focus on specific areas across different dimensions, effectively highlighting target regions and improving the model’s ability to extract relevant features.
Attention mechanisms operate by adjusting weights across different dimensions to highlight relevant information. Channel Attention mechanisms focus on feature dimensions, emphasizing channels that contribute most to object classification by adjusting their weights. Spatial Attention mechanisms focus on spatial dimensions, learning weights for each position to highlight areas crucial for object classification. Self-Attention mechanisms, designed for sequential data, consider the relationships between elements and adjust their weights to emphasize those of significant importance. While these mechanisms differ in their implementation, their common goal is to enhance model performance.
Hou et al. [32] proposed a novel attention mechanism called Coordinate Attention (CA). CA integrates both channel information and directional positional information, establishing global connections to facilitate improved defect detection. Xu et al. [33], recognizing limitations in CA, identified the detrimental effects of dimensionality reduction on channel attention and the complexity of the attention generation process. To address these challenges, they proposed a combination of one-dimensional convolution and Group Normalization feature enhancement techniques, demonstrating enhanced performance and generalization capabilities.
Photovoltaic panel images often contain a large number of grains, a consequence of manufacturing processes and raw materials. These grains are randomly distributed and exhibit variations in scale and shape, presenting significant challenges for defect detection. Su et al. [34] introduced a Region Proposal Attention Network to address this issue, employing a novel approach that combines channel and spatial attention. This network refines the feature maps extracted by Convolutional Neural Networks (CNNs), enhancing features within defect areas while suppressing background noise. This strategy significantly improves classification and detection performance, demonstrating the effectiveness of attention-based feature processing.
However, the close resemblance between some grains and defects can lead to misclassification and a high false detection rate. Additionally, information related to small-scale defects may be lost during feature extraction, impacting detection effectiveness. Precise feature extraction amidst complex background interference remains a crucial challenge.
Zhang et al. [35] addressed this challenge by proposing receptive field attention. This approach not only focuses on spatial features within the receptive field but also provides effective attention weights for convolutional kernels, effectively addressing the issue of parameter sharing in standard convolutions. By selectively enhancing key features during the feature extraction phase, rather than optimizing features after extraction, this method improves model performance and accuracy.
In summary, the primary challenges faced in photovoltaic panel defect detection include the following: (1) limited and imbalanced defect samples in the dataset; (2) strong background interference, which makes it easy to overlook small-scale features during feature extraction and leads to false positives and missed detections; and (3) poor defect recognition capability with insufficient localization accuracy. These issues undermine the stability of automated defect detection in industrial products and merit further research.

3. Methods

The LEM-Detector comprises four key modules: an LFA, a Backbone with an integrated EFEM, an MMFPN, and a transformer decoder with auxiliary prediction heads. Figure 2 provides a visual representation of the LEM-Detector framework.
The input image first passes through the LFA module before entering the Backbone, which incorporates the EFEM. The Backbone sequentially processes the image through four blocks, extracting features denoted as $S_2$, $S_3$, $S_4$, and $S_5$. These extracted features are then input into the encoder.
The encoder begins by processing $S_5$ through an AIFI module. Subsequently, $S_5$ is fused with the other feature layers ($S_2$, $S_3$, and $S_4$) in a multi-scale multi-feature fusion process. A minimum uncertainty query selection method is then applied to identify a fixed number of encoder features as the initial object queries for the decoder.
Finally, the decoder iteratively refines these object queries using auxiliary prediction heads to generate class labels and bounding box predictions. This iterative optimization process enables the model to accurately identify and locate objects within the image.
This section presents a detailed exploration of the LEM-Detector’s key components, beginning with the LFA module. We then delve into the EFEM, followed by a comprehensive discussion of the MMFPN architecture. This discussion highlights how these modules work together to facilitate effective multi-scale detection.

3.1. Linear Feature Augmentation (LFA)

The photovoltaic panel dataset encompasses both large-scale global defects and small-scale local defects with linear texture features. Large-scale defects, such as Black core, Horizontal dislocation, Vertical dislocation, and Short circuit, exhibit distinct and easily recognizable features, leading to higher detection accuracy. In contrast, small-scale defects, including Crack, Star crack, and Finger, pose significant challenges due to their high similarity to the background, complex and variable shapes, and scattered nature with strong background interference. These factors contribute to lower detection accuracy and hinder model performance improvement.
Furthermore, the dataset exhibits an imbalance in the distribution of samples, with a relative scarcity of small-scale linear texture defects. While data augmentation can modestly enhance the detection accuracy of Crack and Star crack defects, random processing of darker regions may negatively impact the accuracy of other easily recognizable defect types. Additionally, improper data augmentation can lead to overfitting, reducing model accuracy and consuming substantial computational resources, thereby lowering efficiency.
To address these challenges, a targeted approach is employed, leveraging the distribution characteristics of defect samples. A cosine similarity function is utilized to filter linear texture training samples with similar features. Subsequently, targeted data augmentation is applied specifically to these similar linear texture features, effectively addressing the issue of data imbalance and enhancing the model’s ability to detect small-scale defects while minimizing negative impacts on the detection of other defect types.
The cosine similarity function is a commonly used measure of the similarity between two vectors. It is the cosine of the angle between the two vectors, which can be expressed mathematically as follows:
$$C = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}} \tag{1}$$
Here, $C$ is the cosine similarity, $A \cdot B$ denotes the dot product of the vectors $A$ and $B$, and $\|A\|$ and $\|B\|$ are their magnitudes. The advantage of cosine similarity is that it considers only the direction of the vectors, without being affected by vector length.
During data augmentation, cosine similarity is calculated between training samples to select suitable candidates for augmentation. Samples with a cosine similarity exceeding a threshold δ, indicating similar features, are selected for augmentation. This approach preserves the crucial characteristics of linear defects while increasing the diversity of the training set, without significantly altering the original features. As depicted in Figure 3, if a training sample has at least one remaining sample with a cosine similarity exceeding δ, it proceeds to the next step of data augmentation; otherwise, no augmentation is performed.
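The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the toy feature vectors, and the threshold value 0.9 are assumptions made for the example, and the paper does not specify how the per-sample feature vectors are obtained.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors (assumption)
    return dot / (norm_a * norm_b)

def select_for_augmentation(features, delta):
    """Return indices of samples that have at least one *other* sample
    whose cosine similarity exceeds the threshold delta."""
    selected = []
    for i, fi in enumerate(features):
        if any(cosine_similarity(fi, fj) > delta
               for j, fj in enumerate(features) if j != i):
            selected.append(i)
    return selected

# Hypothetical feature vectors: samples 0 and 1 point in nearly the same
# direction, sample 2 is orthogonal to both.
features = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
chosen = select_for_augmentation(features, delta=0.9)
print(chosen)
```

Only the samples returned by `select_for_augmentation` would be passed on to the augmentation step; the remaining samples are left unchanged.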
The LFA incorporates a range of contemporary data augmentation techniques, including Affine, BboxSafeRandomCrop, Mosaic, Erase, and Mixup. These methods enhance the diversity of linear features in the photovoltaic panel dataset, simulating the impact of dark spot regions and increasing the complexity of the dataset. This enhanced data diversity improves the robustness of the detection model.
The LFA was applied to the PVEL-AD [36] dataset, ensuring that the generated samples remain sufficiently close to the real samples within the vector space. This approach enhances the diversity of training samples while preserving the actual data distribution, enabling the selection of appropriate linear feature training samples for targeted data augmentation. By focusing augmentation on specific, relevant samples, computational resources are utilized more efficiently, leading to improved training performance and model robustness. The impact of this data augmentation strategy is visually illustrated in Figure 4.

3.2. Efficient Feature Enhancement Module (EFEM)

The EFEM comprises four structurally similar blocks. Each block employs Group Convolution with a 7 × 7 kernel to process the primary feature information extracted from the image samples, optimizing computational efficiency and reducing memory overhead. Normalization and a ReLU activation function are then applied, enabling the model to capture more complex features and improve convergence speed.
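The saving from Group Convolution can be made concrete with a small weight count. The sketch below uses the standard formula for a grouped 2D convolution (bias terms excluded); the channel count of 64 and the group count of 4 are hypothetical values chosen for illustration and are not taken from the paper.

```python
def conv2d_weight_count(c_in, c_out, kernel_size, groups=1):
    """Number of weights in a 2D convolution with square kernels.
    Each output channel only sees c_in / groups input channels."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * kernel_size * kernel_size

# Hypothetical layer: 64 -> 64 channels with a 7 x 7 kernel.
standard = conv2d_weight_count(64, 64, 7)            # ordinary convolution
grouped = conv2d_weight_count(64, 64, 7, groups=4)   # same layer split into 4 groups
print(standard, grouped)
```

With these assumed sizes, splitting the layer into four groups divides the number of weights by four, which is the kind of reduction in computation and memory that motivates grouping when the kernel is as large as 7 × 7.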
To capture global information and establish long-range dependencies, the dimensions of the sample images are adjusted, and global average pooling is applied separately in both the height and width directions. To avoid information mixing, the height and width features are not directly concatenated. Instead, 1D convolutions with a kernel size of 5 are used to extract richer feature information in both directions.
Given the limited training samples and potential memory access issues, Group Normalization is employed to normalize feature maps. Group Normalization divides channels into multiple groups, computing the mean and variance within each group. This approach mitigates high false detection rates when using small mini-batches, enhancing model stability.
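The per-group statistics described above can be sketched as follows. This is a simplified illustration for a single sample, without the learnable scale and shift that practical implementations usually include; the function name and the toy input are assumptions.

```python
import math

def group_norm(channels, num_groups, eps=1e-5):
    """Normalize one sample. `channels` is a list of C channels, each a flat
    list of values. Mean and variance are computed per group of channels,
    so the result does not depend on the mini-batch size."""
    c = len(channels)
    if c % num_groups:
        raise ValueError("number of channels must be divisible by num_groups")
    size = c // num_groups
    out = []
    for g in range(num_groups):
        group = channels[g * size:(g + 1) * size]
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * scale for v in ch] for ch in group])
    return out

# Toy input: 4 channels of 2 values each, normalized in 2 groups.
x = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
y = group_norm(x, num_groups=2)
```

Because the statistics come from the channels of one sample, the two groups in the toy input are normalized independently even though their value ranges differ by an order of magnitude.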
Finally, the processed features are passed through a sigmoid function to select and amplify important features, allowing the model to focus on critical areas and assign appropriate weights to the feature maps. The specific content is shown in Figure 5.
The input feature information is represented as $X \in \mathbb{R}^{C \times H \times W}$, where $C$, $H$, and $W$ denote the channel, height, and width dimensions. The input $X$ undergoes Group Convolution, which groups the different feature maps from the input layer. Different convolution kernels are then applied to each group, thereby reducing the computational load of the convolution process. After processing, $K$ groups of convolution information are obtained as $X_1 \in \mathbb{R}^{C K^2 \times H \times W}$, where $K$ represents the number of grouped convolution kernels. Group Normalization and ReLU activation are then applied: normalization alleviates the variation in inputs between layers in deep neural networks, resulting in more stable training, while the activation function enables the network to fit and express more complex features.
Next, we adjust the size of the features to $X_2 \in \mathbb{R}^{C \times KH \times KW}$ for subsequent feature extraction. $z_c^h$ and $z_c^w$ represent the global average pooling in the vertical and horizontal directions, respectively. Two pooling kernels, sized $KH \times 1$ and $1 \times KW$, are used to perform one-dimensional feature encoding on the height and width of the input feature map, yielding feature outputs in both directions. The specifics are as follows:
$$z_c^h(h) = \frac{1}{H} \sum_{0 \le i < H} x_c(h, i) \tag{2}$$
$$z_c^w(w) = \frac{1}{W} \sum_{0 \le j < W} x_c(j, w) \tag{3}$$
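The two pooling equations above amount to averaging each row and each column of a channel. A minimal sketch for one channel follows; the function name and the toy input are assumptions. Each mean divides by the number of entries actually averaged, which coincides with the constants in the equations whenever the feature map is square.

```python
def directional_average_pool(x):
    """x: one channel as a list of rows, each row a list of values.
    Returns (z_h, z_w): one mean per row (vertical encoding) and
    one mean per column (horizontal encoding)."""
    n_rows, n_cols = len(x), len(x[0])
    z_h = [sum(row) / n_cols for row in x]
    z_w = [sum(x[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    return z_h, z_w

# Toy 2 x 3 channel.
z_h, z_w = directional_average_pool([[1.0, 2.0, 3.0],
                                     [4.0, 5.0, 6.0]])
print(z_h, z_w)
```

Keeping the two vectors separate, rather than concatenating them, preserves one positional encoding per direction for the later one-dimensional convolutions.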
To better capture the global information in the images and enhance the model’s ability to focus on critical regions, we designed a more efficient processing method. This approach preserves more small-scale defect information, preventing the mixing of irrelevant information. We processed the height and width positional information extracted from $z_c^h$ and $z_c^w$ separately. We employed a one-dimensional convolution with a kernel size of 5 to enhance the positional information in both the vertical and horizontal directions. This large kernel convolution effectively processes the height and width directions, allowing for a more detailed capture of spatial information and enhancing feature expression capabilities, while also capturing long-distance spatial dependencies. This approach achieves a balance between detection accuracy and parameter count. Subsequently, we utilized Group Normalization, which is less affected by mini-batch size and is therefore more suitable for small sample training, to process the enhanced positional information. The representations of position attention in the height and width directions were obtained as follows:
$$y^h = \sigma\left(G_n\left(F_h\left(z^h\right)\right)\right) \tag{4}$$
$$y^w = \sigma\left(G_n\left(F_w\left(z^w\right)\right)\right) \tag{5}$$
Here, $y^h$ and $y^w$ represent the feature weight maps in the vertical and horizontal directions, respectively, $\sigma$ denotes the sigmoid activation function, $G_n$ denotes Group Normalization, and $F_h$ and $F_w$ are the one-dimensional convolutions, each with a kernel size of 5. Finally, we multiply the original input feature map $X$ by the two weight maps to obtain the final output, denoted as $Y$:
$$Y = x_c \times y^h \times y^w \tag{6}$$
The output Y is capable of more accurately capturing the positional relationships in both vertical and horizontal directions, thereby enhancing meaningful features while suppressing redundant or irrelevant ones. This allows the network to effectively capture precise local information within the data, improving its ability to represent complex patterns and making it more conducive to the detection of small-scale defects.
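The reweighting step can be illustrated for a single channel in plain Python. This is a simplified sketch under stated assumptions: Group Normalization is omitted for brevity, the one-dimensional convolution uses zero padding, and the kernel values passed in are placeholders rather than learned weights. All function names are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding; output length equals input length."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(k * padded[i + t] for t, k in enumerate(kernel))
            for i in range(len(signal))]

def directional_reweight(x, kernel_h, kernel_w):
    """Pool along each direction, apply a 1D convolution and a sigmoid,
    then scale every position by its row weight and its column weight."""
    n_rows, n_cols = len(x), len(x[0])
    z_h = [sum(row) / n_cols for row in x]
    z_w = [sum(x[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    y_h = [sigmoid(v) for v in conv1d_same(z_h, kernel_h)]
    y_w = [sigmoid(v) for v in conv1d_same(z_w, kernel_w)]
    return [[x[i][j] * y_h[i] * y_w[j] for j in range(n_cols)]
            for i in range(n_rows)]

# With all-zero kernels of length 5 every weight is sigmoid(0) = 0.5,
# so each value is scaled by 0.25.
out = directional_reweight([[4.0, 8.0], [12.0, 16.0]], [0.0] * 5, [0.0] * 5)
print(out)
```

In a trained module the kernels would be learned, so rows and columns that carry defect evidence receive weights closer to 1 while uninformative positions are suppressed.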

3.3. Multi-Scale Multi-Feature Fusion Pyramid Network (MMFPN)

As the network depth increases, the features corresponding to small-scale defects often become diminished due to the limited number of pixels they occupy in the image, leading to a loss of information and potential neglect during detection, which adversely affects detection accuracy. To address this issue and re-correct the selection of easily overlooked small-scale information, we propose the MMFPN to integrate the feature information $S_2$, $S_3$, $S_4$, and $S_5$ extracted by the Backbone.
Initially, the high-level feature $S_5$ undergoes intra-scale feature interaction via the AIFI module and is then combined with the features $S_2$, $S_3$, and $S_4$ in the MMFPN. The process begins with top-down feature fusion, wherein the high-level feature maps, which contain rich global semantic information but have low spatial resolution, are upsampled and fused with the low-level feature maps. This process enriches the spatial information of high-level feature maps, significantly improving the detection performance of multi-scale targets, especially small objects. The fused features $P_1$, $P_2$, and $P_3$ are obtained as described in Equation (7), where $P_i = \mathrm{CIA}(X_i, X_j)$ denotes the fused feature produced by the Complementary Information Aggregation (CIA) module from the input features $X_i$ and $X_j$. Subsequently, three additional bottom-up feature fusions occur, wherein the high spatial resolution but semantically sparse information from the low-level feature maps is downsampled and fused with the high-level feature maps. This enhances the semantic information of the low-level feature maps, contributing to more accurate classification and generating the fused features $P_4$, $P_5$, and $P_6$. The fusion process is described in Equation (8). During feature fusion, the CIA module captures easily overlooked detailed information by receiving adjacent high-level semantic and low-level semantic information. It introduces a bidirectional fusion mechanism between high-resolution and low-resolution features, ensuring more effective information transfer between features, which further enhances the results of multi-scale feature fusion. Finally, the features $P_3$, $P_4$, $P_5$, and $P_6$ are input into the Decoder for precise classification and localization, as shown in Equation (9).
$$P_1 = \mathrm{CIA}(S_5, S_4), \quad P_2 = \mathrm{CIA}(P_1, S_3), \quad P_3 = \mathrm{CIA}(P_2, S_2) \tag{7}$$
$$P_4 = \mathrm{CIA}(P_3, P_2), \quad P_5 = \mathrm{CIA}(P_4, P_1), \quad P_6 = \mathrm{CIA}(P_5, S_5) \tag{8}$$
$$P_{out} = (P_3, P_4, P_5, P_6) \tag{9}$$
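The order of the six fusions in Equations (7) to (9) can be traced with a short sketch. The fusion operator is passed in as a parameter, and a hypothetical string-building stand-in named `trace` is used here so that the wiring is visible without any tensor code; it is not the CIA module itself.

```python
def mmfpn(s2, s3, s4, s5, cia):
    """Wire the top-down and bottom-up fusions; `cia` is any two-input fusion function.
    s5 is assumed to have already passed through the AIFI module."""
    # Top-down path, Equation (7)
    p1 = cia(s5, s4)
    p2 = cia(p1, s3)
    p3 = cia(p2, s2)
    # Bottom-up path, Equation (8)
    p4 = cia(p3, p2)
    p5 = cia(p4, p1)
    p6 = cia(p5, s5)
    # Features handed to the decoder, Equation (9)
    return p3, p4, p5, p6

def trace(a, b):
    """Stand-in for the CIA module that records which inputs were fused."""
    return f"CIA({a},{b})"

p3, p4, p5, p6 = mmfpn("S2", "S3", "S4", "S5", cia=trace)
print(p3)
```

Printing `p3` shows that the finest output already contains information from all four backbone levels, and each later output reuses an intermediate result from the top-down path.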
In terms of the fusion method, our proposed CIA module focuses on the complementary semantic information within feature layers. By facilitating cross-layer interactions, it learns the semantic dependencies between multi-scale channel feature mappings, effectively integrating clearer boundary information from shallow features with richer categorical information from deep features. This approach preserves the complementary information from different levels of semantic data, thereby achieving more accurate classification and localization. The specific structure of the CIA is illustrated in Figure 6.
To avoid redundancy and inconsistencies that may arise from directly merging semantic information from different feature levels, the CIA selectively aggregates boundary and semantic information. Shallow and deep feature representations, denoted as f L and f H , are input into the module and are adjusted to a unified size using 1 × 1 convolution.
F 1 = f L + f L σ ( f L ) + ( 1 σ ( f L ) ) U p s a m p l e ( σ ( f H ) f H )
F 2 = f H + f H σ ( f H ) + ( 1 σ ( f H ) ) U p s a m p l e ( σ ( f L ) f L )
F 1 and F 2 represent the complementary feature information after aggregating shallow and deep features, respectively, while σ denotes the sigmoid function. The processed feature information is then fused to compensate for the loss of spatial boundary information in the high-level features and the lack of global category information in the low-level features. Finally, after a 3 × 3 convolution, the two recalibrated feature representations are concatenated for output, as shown in Equation (12):
$$F(F_1, F_2) = \mathrm{ReLU}(\mathrm{GN}(\mathrm{cat}(F_1, F_2))) \tag{12}$$
Here, $\mathrm{cat}(\cdot)$ denotes the concatenation operation along the channel dimension, and $F(F_1, F_2)$ represents the fused feature information after applying Group Normalization and the ReLU activation function. This process achieves a robust combination of different features while refining coarse features, ultimately enabling accurate and efficient identification of defects across varying scales.
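As a rough numerical illustration of the recalibration in Equations (10) and (11), the following sketch works on plain 1-D lists instead of feature maps. It assumes that the products in those equations are element-wise and that the complementary gate is 1 − σ(·). The helper names (`resize_nearest`, `recalibrate`) are hypothetical, nearest-neighbour resizing stands in for the Upsample operation, and the convolutions, Group Normalization, and concatenation of Equation (12) are omitted.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def resize_nearest(values: List[float], size: int) -> List[float]:
    """Nearest-neighbour resize of a 1-D signal (stand-in for Upsample)."""
    n = len(values)
    return [values[(i * n) // size] for i in range(size)]


def recalibrate(own: List[float], other: List[float]) -> List[float]:
    """own + own * sigma(own) + (1 - sigma(own)) * Resize(sigma(other) * other)."""
    gated_other = resize_nearest([sigmoid(v) * v for v in other], len(own))
    return [
        x + x * sigmoid(x) + (1.0 - sigmoid(x)) * g
        for x, g in zip(own, gated_other)
    ]


# Toy inputs: a flat shallow feature and a strongly activated deep feature.
f_low = [0.0, 0.0, 0.0, 0.0]
f_high = [2.0, -2.0]

f1 = recalibrate(f_low, f_high)   # shallow branch, Equation (10)
f2 = recalibrate(f_high, f_low)   # deep branch, Equation (11)
print(f1)
print(f2)
```

In this toy case the shallow feature carries no signal of its own, so its gate 1 − σ(0) = 0.5 lets half of the gated deep feature through, which is the complementary behaviour the two equations describe.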

4. Experiments

This section details the experimental procedures employed in this study, encompassing dataset construction, evaluation metrics, implementation details, experimental assessment, interpretability analysis, and ablation studies. To enhance the clarity and intuitive understanding of our findings, feature map visualization is employed as a supplementary tool to support the experimental results.

4.1. Dataset

The dataset used in this study is the PVEL-AD dataset, collected and created by Hebei University of Technology. This dataset comprises 4500 images representing twelve common defect types in photovoltaic panels: Crack, Finger, Black core, Thick line, Star crack, Corner, Fragment, Scratch, Horizontal dislocation, Vertical dislocation, Printing error, and Short circuit. Each image is annotated with bounding boxes and class labels, specifying the class of the defect and the coordinates of the bounding box’s top-left and bottom-right corners. All images have a resolution of 1024 × 1024 pixels.
Due to the insufficient number of defect instances for the classes Corner, Fragment, Scratch, and Printing error (each with fewer than 32 instances), these four classes were deemed insignificant for experimentation and were removed from the dataset. The remaining eight defect types were retained for further analysis. The dataset was then split into training and testing sets at a ratio of 7:3, resulting in 3155 images for training and 1345 images for testing. Table 1 provides a detailed breakdown of the number and distribution of defect instances for the eight retained defect types in both the training and testing sets.
Following the application of the LFA, an additional 8761 training samples were generated. This augmentation strategy aimed to enhance the representation of linear texture features within the training data, further improving the model’s ability to detect small-scale defects.

4.2. Evaluation Metrics

In region-level defect detection tasks, an objective evaluation of model performance requires considering both classification and localization accuracy. Typically, Precision, Recall, and mean Average Precision (mAP) are employed to assess the accuracy of object detection. Precision is the proportion of true defect samples among all detected defect samples, Recall is the proportion of true defect samples that are detected among all actual defect samples, and mAP is the mean of the Average Precision (AP) across all categories. The formulas for each metric are provided as Equations (13)–(16):
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{13}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{14}$$
$$AP = \int_0^1 P \, dR \tag{15}$$
$$mAP = \frac{\sum_{i=1}^{N} AP_i}{N} \tag{16}$$
In Equations (13)–(16), TP, FP, and FN denote the number of correctly identified targets (true positives), the number of incorrectly identified targets (false positives), and the number of actual targets that were not detected (false negatives), respectively. Additionally, Frames Per Second (FPS), giga floating-point operations (GFLOPs), and the number of parameters are used to evaluate the speed and complexity of the model. Average Precision (AP) refers to the mean precision of the detection algorithm at different recall levels. It is typically computed from the P-R curve, where $S_{P\text{-}R}$ denotes the area enclosed by the P-R curve and the axes, as given in Equation (17):
$$AP = S_{P\text{-}R} \tag{17}$$
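The metrics in Equations (13)–(17) can be sketched as follows. The text does not state how the area under the P-R curve is computed, so this example assumes all-point interpolation (a common convention in object detection). The function names and the toy detections are hypothetical. The per-class AP values in the last step are those reported for the LEM-Detector in Table 3.

```python
from typing import List, Sequence, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Equations (13) and (14), with empty denominators mapped to 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scores: Sequence[float], is_tp: Sequence[bool],
                      num_gt: int) -> float:
    """Area under the P-R curve (Equations (15) and (17)).

    Assumption: all-point interpolation, i.e. precision is replaced by its
    running maximum from the right before integrating over recall.
    """
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    recalls: List[float] = []
    precisions: List[float] = []
    tp = fp = 0
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Make the precision envelope non-increasing in recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(aps: Sequence[float]) -> float:
    """Equation (16)."""
    return sum(aps) / len(aps)


# Toy example: four detections for one class, four ground-truth boxes.
toy_ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, True], 4)

# Per-class AP values of the LEM-Detector from Table 3 (Ck ... St).
lem_ap = [84.7, 94.3, 98.8, 91.1, 92.0, 99.5, 97.7, 99.5]
lem_map = mean_average_precision(lem_ap)
print(round(toy_ap, 3), round(lem_map, 1))
```

Averaging the eight per-class values from Table 3 reproduces the reported overall mAP of 94.7%.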
In addition, we introduce the TIDE [37] metric as an additional means of analyzing model performance. In object detection, the primary metric used to assess performance is the mean Average Precision (mAP). However, since a false positive (FP) can arise from various issues such as duplicate detection, misclassification, incorrect localization, confusion with the background, or a combination of these factors, it is not straightforward to analyze detection errors solely through mAP. Similarly, a false negative (FN) may represent a completely missed ground truth (GT), or a potentially correct prediction that is merely misclassified or mislocalized. These error types can significantly impact mAP in different ways. Therefore, we utilize the TIDE evaluation metric to provide insight into our model’s superior performance. TIDE categorizes error types into six categories: classification errors (Cls), localization errors (Loc), both classification and localization errors (Both), duplicate detection errors (Dupe), background errors (Bkg), and missed GT errors (Miss), along with the counts of FPs and FNs.
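To make the six categories concrete, the sketch below assigns a single detection to an error type from its best IoU with any ground-truth box. It is a simplified reading of the TIDE definitions rather than the toolbox's actual implementation. It assumes foreground and background IoU thresholds of 0.5 and 0.1, and the function name, argument names, and handling of values exactly at a threshold are choices made for this illustration. Missed GT errors (Miss) are counted per ground-truth box rather than per detection, so they do not appear here.

```python
def tide_error_type(iou_max: float, same_class: bool,
                    gt_already_matched: bool = False,
                    t_f: float = 0.5, t_b: float = 0.1) -> str:
    """Assign one detection to a TIDE-style category.

    iou_max            -- highest IoU between the detection and any GT box
    same_class         -- whether that GT box has the predicted class
    gt_already_matched -- whether a higher-scoring detection already
                          claimed that GT box
    t_f, t_b           -- assumed foreground / background IoU thresholds
    """
    if iou_max >= t_f:
        if not same_class:
            return "Cls"    # well localized, wrong class
        return "Dupe" if gt_already_matched else "TP"
    if iou_max >= t_b:
        return "Loc" if same_class else "Both"
    return "Bkg"            # barely overlaps any ground truth


examples = [
    tide_error_type(0.70, same_class=False),
    tide_error_type(0.30, same_class=True),
    tide_error_type(0.30, same_class=False),
    tide_error_type(0.80, same_class=True, gt_already_matched=True),
    tide_error_type(0.05, same_class=True),
]
print(examples)
```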

4.3. Implementation Details

The experimental platform runs the Ubuntu 20.04.6 LTS operating system, with an Intel Xeon CPU (Intel, Santa Clara, CA, USA) and four NVIDIA Tesla V100 GPUs (NVIDIA, Santa Clara, CA, USA), each with 32 GB of memory. The software environment includes CUDA 11.4 and cuDNN 8.0.5. All code is developed using the PyTorch 1.10.0 deep learning framework. Both training and testing are conducted on a single GPU. Images are uniformly resized to 640 × 640 pixels. During training, a stochastic gradient descent optimizer is used, with a fixed weight decay of 0.0001 and a momentum of 0.9. The initial learning rate is set to 0.0001. The number of epochs and the batch size are set to 150 and 8, respectively. The trigger threshold δ for LFA is set to 0.7. The models are trained without pre-trained weights.
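For reproduction purposes, the stated settings can be collected in one place. The following is a minimal sketch: the class and field names are hypothetical and do not come from the authors' code, and only the values are taken from the description above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Training settings as stated in the text (field names are illustrative)."""
    image_size: int = 640          # images resized to 640 x 640
    optimizer: str = "SGD"         # stochastic gradient descent
    learning_rate: float = 1e-4    # initial learning rate
    momentum: float = 0.9
    weight_decay: float = 1e-4
    epochs: int = 150
    batch_size: int = 8
    lfa_threshold: float = 0.7     # trigger threshold (delta) for LFA
    pretrained: bool = False       # no pre-trained weights are used


cfg = TrainConfig()
for key, value in asdict(cfg).items():
    print(f"{key}: {value}")
```

In PyTorch, the optimizer-related fields would typically be passed to `torch.optim.SGD` through its `lr`, `momentum`, and `weight_decay` arguments.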

4.4. The Result of Defect Detection

The performance of the proposed LEM-Detector is compared to related methods in Figure 7. The LEM-Detector achieves a mean Average Precision (mAP) of 94.7%, a 3.4 percentage point improvement over RT-DETR. Compared to Faster R-CNN and RetinaNet [38], our model improves accuracy by more than 8 percentage points while using fewer parameters. Compared to the recent YOLOv10 [39], which has a similar number of parameters, our model is 4.8 percentage points higher in mAP (94.7% versus 89.9% in Table 2). These results highlight the significant performance gains achieved by the LEM-Detector, demonstrating its effectiveness in addressing the challenges of multi-scale defect detection in photovoltaic panels.
Table 2 provides a detailed view of the superior performance of the LEM-Detector. Compared to existing methods, the LEM-Detector, utilizing LFA, EFEM, and MMFPN, achieves an optimal balance among model size, detection speed, and GFLOPs while ensuring top detection accuracy, making it especially suitable for industrial inspection applications.
Table 3 further reveals significant performance gains for the Crack and Star crack defect types, which typically exhibit lower detection accuracy. This improvement is attributed to the integration of the LFA specifically designed for linear crack data. The LFA creates a more realistic dataset with more evenly distributed training samples, enhancing the model’s ability to accurately identify these challenging defects.
The combination of the EFEM and the Multi-Scale Multi-Feature Pyramid Network (MMFPN) ensures that each feature layer retains rich scale information while preserving defect feature information outside the candidate boxes. This approach reduces information coupling and enhances edge information, leading to improved detection accuracy. The LEM-Detector’s high detection accuracy further validates the effectiveness of the LFA, EFEM, and MMFPN in multi-scale defect detection of photovoltaic panel images. Notably, the LEM-Detector surpasses Faster R-CNN by 8.9 percentage points in terms of mAP, the largest margin over any of the compared methods.
The P-R curve in Figure 8 provides a visual representation of the LEM-Detector’s detection performance for each defect type. The area enclosed by each curve and the coordinate axes represents the Average Precision (AP) value for the corresponding defect type.
The P-R curves for Crack and Star crack defects demonstrate a significant performance gap compared to other defect types, highlighting these defects as challenging samples that hinder model performance improvement. This challenge stems from the high similarity of these defects to the background, their complex and variable shapes, and their dispersed nature with substantial background interference, all contributing to increased detection difficulty.
In contrast, the detection performance for Black core and Horizontal dislocation defects is superior. These defects are often large-scale global defects with distinct features, making them relatively easier to detect. This observation further underscores the challenges presented by small-scale, linear defects like Crack and Star crack, and the need for specialized approaches to enhance their detection accuracy.
Figure 9 presents several visual examples of the LEM-Detector’s performance in detecting defects. The figure showcases the model’s effectiveness in identifying defects across various scales. The high threshold for detected defects demonstrates the method’s sensitivity to defect presence. Even in images containing multiple defect types, the LEM-Detector maintains its effectiveness. For example, in the second column of the first row, the model successfully identifies overlapping defects, demonstrating its ability to distinguish individual defects within complex scenarios. These results provide strong evidence of the LEM-Detector’s exceptional performance in detecting multi-scale defects in photovoltaic panels.

4.5. Ablation Studies

We perform ablation experiments on the individual components of the proposed LEM-Detector (LFA, EFEM, and MMFPN) to verify the effectiveness of each.
From Table 4, it is evident that the use of LFA for enhancing linear features, the EFEM for accurately extracting multi-scale feature information, and the MMFPN module for capturing often-overlooked edge information significantly improves detection accuracy. The overall mAP reaches 94.7%, with each module contributing to enhanced experimental precision. Notably, the detection results for Crack and Star crack types show substantial improvements, with accuracy increases of 12.1 and 9.3 percentage points, respectively, compared to RT-DETR, effectively demonstrating the detection efficacy of the LEM-Detector.
Using the TIDE evaluation metric, as shown in Table 5, it is evident that our detection model exhibits significant enhancements in various aspects, particularly in localization errors (Loc), duplicate detection errors (Dupe), background errors (Bkg), and missed ground truths (Miss). The model’s performance in terms of false positives (FPs) and false negatives (FNs) is also markedly superior to that of the RT-DETR model, with only minor deficiencies in classification errors (Cls) and errors involving both classification and localization (Both). Notably, substantial improvements are observed in localization (Loc) and background (Bkg) errors, attributed to our precise retention of defect boundary information during feature fusion. The MMFPN effectively reduces the risk of overfitting by integrating complementary information from different feature layers, enhancing the model’s generalization capabilities. Importantly, our model did not exhibit any missed GT errors (Miss), indicating that the EFEM effectively captures small-scale defect information without omissions, thereby improving the model’s performance in multi-scale defect detection.

4.5.1. The Effect of LFA

Table 6 presents the results of the ablation experiments, which clearly support our analysis of the dataset. From the second row of Table 6, it can be seen that using the same data augmentation techniques without the LFA, the mean Average Precision (mAP) is 91.7%, showing a slight improvement over RT-DETR’s mAP of 91.3%. However, there is no significant enhancement in detection accuracy for the following defect types: Finger, Black core, Horizontal dislocation, Vertical dislocation, and Short circuit. In fact, the detection precision for Finger and Vertical dislocation decreases. This is primarily because, apart from the Crack and Star crack defect types, the other six types have easily recognizable features and larger scales, achieving accuracy levels of 90% or higher. Moreover, due to the inherent challenges in achieving uniformity in manual annotations during dataset creation, the potential for improvement is quite limited. Our proposed LFA effectively addresses this issue by focusing on targeted data augmentation for linear features, resulting in an improved mAP of 92.5%. Additionally, compared to the traditional data augmentation approach that generated 12,470 training images, our LFA method utilizes only 8761 training images, thereby enhancing training speed and minimizing resource wastage.

4.5.2. The Effect of EFEM

In contrast to the BasicBlock module used for feature extraction in ResNet, which can be insensitive to information variations due to shared parameters in convolutional operations, the proposed EFEM effectively addresses this limitation. By emphasizing the importance of different features within the receptive field and prioritizing spatial characteristics, EFEM enriches the diversity of feature expression. It selectively enhances meaningful features while suppressing redundant or irrelevant ones, enabling the network to capture precise local information and represent complex features more effectively. This approach incurs negligible computational costs and parameter increments, significantly enhancing network performance. The improved results are presented in Table 7.
To visualize the effectiveness of the proposed EFEM compared to the BasicBlock module used in RT-DETR, GradCAM++ was employed to extract heatmaps from the outputs of the feature extraction modules at the same layer. Figure 10 presents these heatmaps, with the first column displaying the original images, the second column showing the features extracted using the BasicBlock module, and the third column illustrating the features extracted by the EFEM.
The first and second rows of Figure 10 demonstrate that for the challenging cases of Crack and Star crack defects (which typically exhibit lower detection accuracy), the EFEM is more effective at capturing precise information regarding the dispersion of cracks compared to the features extracted by BasicBlock. Furthermore, in the third and fourth rows, even under challenging conditions such as the low brightness of defect samples and defects crossing over busbars, the EFEM demonstrates accurate identification capabilities. These visual comparisons highlight the ability of the EFEM to capture more meaningful and precise features, contributing to the enhanced performance of the LEM-Detector.

4.5.3. The Effect of MMFPN

To visualize the effectiveness of the proposed MMFPN compared to the CCFF used in RT-DETR, GradCAM++ was employed to extract heatmaps from the outputs of the feature fusion modules at the same layer. Figure 11 presents these heatmaps, with the first column displaying the original images, the second column showing the features extracted using CCFF, the third column illustrating the feature maps extracted using EFEM (demonstrating the effectiveness of MMFPN in merging edge information), and the fourth column showcasing our proposed MMFPN.
The first row of Figure 11 shows a defect that RT-DETR fails to recognize, and the second row shows a defect that crosses a busbar and is therefore difficult to identify. In both cases, the MMFPN selectively filters and integrates the precise but discontinuous and noisy information extracted by the EFEM. This merges more edge information, yielding accurate defect features for classification and regression.
In the third row, RT-DETR fails to identify all defect types in the case of stacked defects. In contrast, the MMFPN demonstrates a superior ability to focus on these easily overlooked edge details, allowing for accurate identification and localization of the two defect types. These visual comparisons highlight the ability of MMFPN to effectively fuse multi-scale features and enhance edge information, contributing to the improved performance of the LEM-Detector.

5. Conclusions

This paper presents the LEM-Detector, an efficient end-to-end detector for photovoltaic panel defect detection. The proposed method addresses several challenges in this domain, including the wide range of defect scales, diverse defect types, and severe background interference. Our approach aims to enhance detection accuracy and robustness by introducing several novel components: the LFA, the EFEM, and the MMFPN.
The LFA module addresses the issue of imbalanced training data by focusing on augmenting linear feature samples using a cosine similarity function. This targeted augmentation strategy effectively improves the model’s ability to detect small-scale defects like cracks and star cracks. The EFEM module is designed to enhance the receptive field and capture subtle texture information, which is crucial for identifying defects amidst complex backgrounds. The MMFPN facilitates the aggregation of complementary information from different feature layers, ensuring that both high-level category features and edge information are effectively utilized.
The experimental results demonstrate the effectiveness of the proposed LEM-Detector. On the PVEL-AD dataset, our model achieves an mAP of 94.7%, outperforming several state-of-the-art methods. The improvements are particularly significant for the detection of Crack and Star crack defects, highlighting the effectiveness of the LFA module. Additionally, the EFEM and MMFPN contribute to the overall enhancement of multi-scale defect detection by preserving detailed information and improving feature fusion.
The aims of this study have been met through the introduction of these novel components, which together address the challenges in photovoltaic panel defect detection. The results of this research are expected to be useful for industrial applications in the photovoltaic sector, enabling more reliable and accurate defect identification systems.
For future research directions, we plan to explore methods for automating the threshold generation in the LFA module to further streamline the training process. Additionally, we aim to develop a more lightweight model to enhance the practicality and deployability of the LEM-Detector in real-world scenarios. Overall, this work provides valuable technical references for visual inspection tasks in similar industrial applications.

Author Contributions

Conceptualization, X.Z., X.L. and R.W.; Methodology, X.Z. and X.L.; Project administration, R.W.; Validation, X.Z., X.L. and W.H.; Writing—original draft, X.Z., X.L. and R.W.; Writing—review and editing, X.Z., X.L. and R.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the University-Industry Collaborative Education Program, grant number 220605181012316, Jiangsu Industry-University-Research Cooperation Program, grant number BY20230265, Jiangsu Province Graduate Student Research and Practice Innovation Program, grant number YPC22020142 and YPC23020155, Jiangsu Provincial Double-Innovation Doctor Program, grant number JSSCBS20210908 and JSSCBS20210896, Changzhou Science and Technology Program, grant number CJ20235041, Changzhou University Extracurricular Innovation and Entrepreneurship Fund Program for College Students, grant number QZX22020187 and Changzhou University Outstanding Undergraduate Graduation Design (Thesis) Cultivation Program, grant number QZX23020036 and QZX24020101.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in: http://aihebut.com/col.jsp?id=118 (accessed on 6 November 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

LFA: Linear Feature Augmentation
EFEM: Efficient Feature Enhancement Module
MMFPN: Multi-Scale Multi-Feature Pyramid Network
AIFI: Attention-based Intra-scale Feature Interaction
CCFF: Cross-scale Feature Fusion
CA: Coordinate Attention
CIA: Complementary Information Aggregation

References

1. Herraiz, Á.H.; Marugán, A.P.; Márquez, F.P.G. Photovoltaic plant condition monitoring using thermal images analysis by convolutional neural network-based structure. Renew. Energy 2020, 153, 334–348.
2. Luo, Q.; Sun, Y.; Li, P.; Simpson, O.; Tian, L.; He, Y. Generalized completed local binary patterns for time-efficient steel surface defect classification. IEEE Trans. Instrum. Meas. 2018, 68, 667–679.
3. Firuzi, K.; Vakilian, M.; Phung, B.T.; Blackburn, T.R. Partial discharges pattern recognition of transformer defect model by LBP & HOG features. IEEE Trans. Power Deliv. 2018, 34, 542–550.
4. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. pp. 21–37.
5. Jiang, P.; Ergu, D.; Liu, F.; Cai, Y.; Ma, B. A Review of Yolo algorithm developments. Procedia Comput. Sci. 2022, 199, 1066–1073.
6. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
7. Ren, S. Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv 2015, arXiv:1506.01497.
8. Neubeck, A.; Van Gool, L. Efficient non-maximum suppression. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; pp. 850–855.
9. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229.
10. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. Detrs beat yolos on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 16965–16974.
11. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
12. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 1–48.
13. Taylor, L.; Nitschke, G. Improving deep learning with generic data augmentation. In Proceedings of the 2018 IEEE symposium series on Computational Intelligence (SSCI), Bangalore, India, 18–21 November 2018; pp. 1542–1547.
14. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems 27; NeurIPS: La Jolla, CA, USA, 2014.
15. Kingma, D.P. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114.
16. Wang, X.; He, K. Enhancing the transferability of adversarial attacks through variance tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1924–1933.
17. Tang, W.; Yang, Q.; Yan, W. Deep learning based model for Defect Detection of Mono-Crystalline-Si Solar PV Module Cells in Electroluminescence Images Using Data Augmentation. In Proceedings of the 2019 IEEE PES Asia-Pacific Power and Energy Engineering Conference (APPEEC), Macao, China, 1–4 December 2019; pp. 1–5.
18. Jain, S.; Seth, G.; Paruthi, A.; Soni, U.; Kumar, G. Synthetic data augmentation for surface defect detection and classification using deep learning. J. Intell. Manuf. 2022, 33, 1007–1020.
19. Wei, C.; Liang, J.; Liu, H.; Hou, Z.; Huan, Z. Multi-stage unsupervised fabric defect detection based on DCGAN. Vis. Comput. 2023, 39, 6655–6671.
20. Wang, J.; Bi, L.; Sun, P.; Jiao, X.; Ma, X.; Lei, X.; Luo, Y. Deep-learning-based automatic detection of photovoltaic cell defects in electroluminescence images. Sensors 2022, 23, 297.
21. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
22. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
23. Ross, T.-Y.; Dollár, G. Focal loss for dense object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2980–2988.
24. Luo, J.-q.; Fang, H.-s.; Shao, F.-m.; Zhong, Y.; Hua, X. Multi-scale traffic vehicle detection based on faster R–CNN with NAS optimization and feature enrichment. Def. Technol. 2021, 17, 1542–1554.
25. Liu, L.; Zhu, Y.; Rahman, M.R.U.; Zhao, P.; Chen, H. Surface defect detection of solar cells based on feature pyramid network and GA-faster-RCNN. In Proceedings of the 2019 2nd China Symposium on Cognitive Computing and Hybrid Intelligence (CCHI), Xi’an, China, 21–22 September 2019; pp. 292–297.
26. Su, B.; Chen, H.; Zhou, Z. BAF-detector: An efficient CNN-based detector for photovoltaic cell defect detection. IEEE Trans. Ind. Electron. 2021, 69, 3161–3171.
27. Tang, F.; Xu, Z.; Huang, Q.; Wang, J.; Hou, X.; Su, J.; Liu, J. DuAT: Dual-aggregation transformer network for medical image segmentation. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Xiamen, China, 13–15 October 2023; pp. 343–356.
28. Zhou, X.; Wei, M.; Li, Q.; Fu, Y.; Gan, Y.; Liu, H.; Ruan, J.; Liang, J. Surface defect detection of steel strip with double pyramid network. Appl. Sci. 2023, 13, 1054.
29. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
30. Jaderberg, M.; Simonyan, K.; Zisserman, A. Spatial transformer networks. In Advances in Neural Information Processing Systems 28; NeurIPS: La Jolla, CA, USA, 2015.
31. Lyu, H.; Sha, N.; Qin, S.; Yan, M.; Xie, Y.; Wang, R. Advances in neural information processing systems. In Advances in Neural Information Processing Systems 32; NeurIPS: La Jolla, CA, USA, 2019.
32. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722.
33. Xu, W.; Wan, Y. ELA: Efficient Local Attention for Deep Convolutional Neural Networks. arXiv 2024, arXiv:2403.01123.
34. Su, B.; Chen, H.; Chen, P.; Bian, G.; Liu, K.; Liu, W. Deep learning-based solar-cell manufacturing defect detection with complementary attention network. IEEE Trans. Ind. Inform. 2020, 17, 4084–4095.
35. Zhang, X.; Liu, C.; Yang, D.; Song, T.; Ye, Y.; Li, K.; Song, Y. RFAConv: Innovating spatial attention and standard convolutional operation. arXiv 2023, arXiv:2304.03198.
36. Su, B.; Zhou, Z.; Chen, H. PVEL-AD: A large-scale open-world dataset for photovoltaic cell anomaly detection. IEEE Trans. Ind. Inform. 2022, 19, 404–413.
37. Bolya, D.; Foley, S.; Hays, J.; Hoffman, J. Tide: A general toolbox for identifying object detection errors. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part III 16. pp. 558–573.
38. Lin, T. Focal Loss for Dense Object Detection. arXiv 2017, arXiv:1708.02002.
39. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. Yolov10: Real-time end-to-end object detection. arXiv 2024, arXiv:2405.14458.
Figure 1. Common defect types in photovoltaic panels.
Figure 2. Overall framework of the LEM-Detector.
Figure 3. The architecture of the LFA.
Figure 4. Effects of data augmentation ((A): defect overlap; (B): small-scale Star crack; (C): small-scale Finger; (D): crack crossing the busbar).
Figure 5. The architecture of the EFEM.
Figure 6. The architecture of the CIA.
Figure 7. The proposed LEM-Detector achieves state-of-the-art performance when compared to existing prominent object detectors.
Figure 8. P-R curve of LEM-Detector.
Figure 9. Detection results of LEM-Detector.
Figure 10. Heatmaps of the feature extraction stage.
Figure 11. Heatmaps of the feature fusion stage.
Table 1. Number and distribution of defect instances.

| | Ck | Fr | Bc | Tl | Sc | Hd | Vd | St |
|---|---|---|---|---|---|---|---|---|
| Training | 876 | 2041 | 731 | 695 | 95 | 555 | 97 | 350 |
| Testing | 384 | 917 | 297 | 286 | 40 | 243 | 40 | 142 |
| Total | 1260 | 2958 | 1028 | 981 | 135 | 795 | 137 | 492 |

Note: Crack (Ck), Finger (Fr), Black core (Bc), Thick line (Tl), Star crack (Sc), Horizontal dislocation (Hd), Vertical dislocation (Vd), and Short circuit (St).
Table 2. Comparison with advanced object detection detectors (all object detectors were trained starting from the smallest model until they were able to detect all defect types).

| Method | mAP@0.5 (%) | Parameter (M) | GFLOPs | FPS |
|---|---|---|---|---|
| Faster-RCNN | 85.8 | 261.4 | 137.0 | 28 |
| RetinaNet | 86.5 | 190.9 | 98.0 | 32 |
| Deformable-DETR | 90.1 | 40.1 | 126.0 | 37 |
| DAB-DETR | 90.6 | 44.0 | 94.0 | 31 |
| YOLOv8n | 89.6 | 2.9 | 8.1 | 88 |
| RT-DETR | 91.3 | 18.9 | 57.0 | 260 |
| YOLOv9c | 88.7 | 24.7 | 102.1 | 61 |
| YOLOv10x | 89.9 | 28.0 | 160.0 | 54 |
| LEM-Detector (ours) | 94.7 | 33.8 | 67.2 | 60 |
Table 3. mAP@0.5 for different defect types.
Method | Ck | Fr | Bc | Tl | Sc | Hd | Vd | St | mAP@0.5
Faster-RCNN | 59.8 | 87.1 | 95.5 | 78.7 | 71.0 | 99.5 | 95.5 | 99.5 | 85.8
RetinaNet | 59.3 | 88.8 | 97.7 | 81.2 | 68.8 | 99.5 | 97.5 | 99.5 | 86.5
Deformable-DETR | 73.2 | 93.6 | 99.1 | 89.8 | 81.1 | 96.9 | 97.7 | 99.5 | 90.1
DAB-DETR | 70.7 | 87.3 | 98.5 | 89.6 | 82.7 | 99.5 | 97.6 | 99.5 | 90.6
YOLOv8n | 72.8 | 93.8 | 99.2 | 84.1 | 83.0 | 98.0 | 87.2 | 99.2 | 89.6
RT-DETR | 72.6 | 94.5 | 98.9 | 90.0 | 82.7 | 99.5 | 92.7 | 99.5 | 91.3
YOLOv9c | 68.0 | 93.1 | 98.4 | 83.1 | 76.3 | 98.4 | 92.7 | 99.5 | 88.7
YOLOv10x | 72.0 | 91.5 | 98.5 | 84.4 | 79.1 | 99.5 | 95.8 | 98.6 | 89.9
LEM-Detector (ours) | 84.7 | 94.3 | 98.8 | 91.1 | 92.0 | 99.5 | 97.7 | 99.5 | 94.7
Note: Crack (Ck), Finger (Fr), Black core (Bc), Thick line (Tl), Star crack (Sc), Horizontal dislocation (Hd), Vertical dislocation (Vd), and Short circuit (St).
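The last column of Table 3 appears to be the unweighted mean of the eight per-class values in each row. The following is a minimal Python sketch that checks this for three of the rows; the names `per_class_ap` and `mean_ap` are illustrative and do not come from the paper, and the unweighted-mean reading is an assumption that the numbers happen to support.

```python
# Per-class AP values (%) copied from Table 3, in the order
# Ck, Fr, Bc, Tl, Sc, Hd, Vd, St. Names are illustrative only.
per_class_ap = {
    "Faster-RCNN": [59.8, 87.1, 95.5, 78.7, 71.0, 99.5, 95.5, 99.5],
    "RT-DETR": [72.6, 94.5, 98.9, 90.0, 82.7, 99.5, 92.7, 99.5],
    "LEM-Detector": [84.7, 94.3, 98.8, 91.1, 92.0, 99.5, 97.7, 99.5],
}


def mean_ap(values):
    """Unweighted mean over classes, rounded to one decimal as in the table."""
    return round(sum(values) / len(values), 1)


for method, values in per_class_ap.items():
    print(f"{method}: {mean_ap(values)}")
```

Because every class counts equally in this mean, the gains on the two hardest classes (Ck and Sc) account for most of the overall difference between LEM-Detector and RT-DETR.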
Table 4. The results of the LEM-Detector with different components on the PVEL-AD dataset.
LFA | EFEM | MMFPN | mAP@0.5 (%) | Ck | Fr | Bc | Tl | Sc | Hd | Vd | St
- | - | - | 91.3 | 72.6 | 94.5 | 98.9 | 90.0 | 82.7 | 99.5 | 92.7 | 99.5
✓ | - | - | 92.5 | 78.2 | 95.0 | 99.3 | 91.5 | 86.4 | 99.5 | 90.8 | 99.5
- | ✓ | - | 92.2 | 77.9 | 94.2 | 98.8 | 89.3 | 85.2 | 99.4 | 93.0 | 99.5
  |   |   | 92.6 | 76.4 | 93.9 | 99.1 | 90.5 | 84.7 | 99.5 | 98.4 | 98.6
  |   |   | 93.1 | 79.6 | 94.5 | 99.4 | 90.3 | 86.2 | 99.5 | 96.1 | 99.5
  |   |   | 92.9 | 78.7 | 94.4 | 99.3 | 90.9 | 86.9 | 99.4 | 94.5 | 99.5
  |   |   | 92.8 | 76.9 | 94.3 | 98.3 | 88.9 | 88.5 | 99.5 | 97.1 | 99.5
✓ | ✓ | ✓ | 94.7 | 84.7 | 94.3 | 98.8 | 91.1 | 92.0 | 99.5 | 97.7 | 99.5
Note: Crack (Ck), Finger (Fr), Black core (Bc), Thick line (Tl), Star crack (Sc), Horizontal dislocation (Hd), Vertical dislocation (Vd), and Short circuit (St).
Table 5. Distribution of error types.
Method | mAP@0.5 | Cls | Loc | Both | Dupe | Bkg | Miss | FP | FN
RT-DETR | 91.30 | 0.48 | 1.67 | 0.15 | 0.43 | 2.56 | 0.14 | 6.84 | 1.29
LEM-Detector | 94.70 | 0.55 | 1.02 | 0.18 | 0.13 | 1.99 | 0.00 | 4.40 | 0.66
Improvement | +3.40 | −0.07 | +0.65 | −0.03 | +0.30 | +0.57 | +0.14 | +2.44 | +0.63
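The sign convention of the Improvement row in Table 5 is easy to misread: the mAP column is higher-is-better, while the remaining columns are error contributions, where lower is better. The sketch below recomputes the row under that assumption (LEM-Detector minus RT-DETR for mAP, RT-DETR minus LEM-Detector for the error columns). The variable and function names are illustrative and not taken from the paper.

```python
# Values copied from Table 5; column order follows the table header.
columns = ["mAP@0.5", "Cls", "Loc", "Both", "Dupe", "Bkg", "Miss", "FP", "FN"]
rt_detr = [91.30, 0.48, 1.67, 0.15, 0.43, 2.56, 0.14, 6.84, 1.29]
lem = [94.70, 0.55, 1.02, 0.18, 0.13, 1.99, 0.00, 4.40, 0.66]


def improvement(name, baseline, ours):
    """Positive means better: a gain in mAP or a reduction in an error type."""
    delta = ours - baseline if name == "mAP@0.5" else baseline - ours
    return round(delta, 2)


row = {c: improvement(c, b, o) for c, b, o in zip(columns, rt_detr, lem)}
for name, delta in row.items():
    print(f"{name}: {delta:+.2f}")
```

Under this reading, the two negative entries (Cls and Both) are the only error types where LEM-Detector is slightly worse than the RT-DETR baseline.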
Table 6. Ablation experiment of LFA (DA denotes the same data augmentation methods as in LFA, but without filtering by the cosine similarity threshold).
Method | mAP@0.5 | Sample Size | Ck | Fr | Bc | Tl | Sc | Hd | Vd | St
RT-DETR | 91.3 | 3151 | 72.6 | 94.5 | 98.9 | 90.0 | 82.7 | 99.5 | 92.7 | 99.5
RT-DETR + DA | 91.7 | 12,470 | 77.6 | 93.6 | 98.4 | 91.3 | 86.5 | 99.5 | 87.4 | 99.5
RT-DETR + LFA | 92.5 | 8761 | 78.2 | 95.0 | 99.3 | 91.5 | 86.4 | 99.5 | 90.8 | 99.5
Note: Crack (Ck), Finger (Fr), Black core (Bc), Thick line (Tl), Star crack (Sc), Horizontal dislocation (Hd), Vertical dislocation (Vd), and Short circuit (St).
Table 7. Ablation experimental results of EFEM.
Backbone | mAP@0.5 (%) | Parameters (M) | GFLOPs | FPS
BasicBlock | 91.3 | 10.6 | 20.1 | 271
EFEM | 92.2 | 10.9 | 20.9 | 264
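To put the cost of EFEM in Table 7 into relative terms, the short sketch below computes the percentage change in parameters, GFLOPs, and FPS against the BasicBlock backbone, together with the absolute mAP gain. The dictionary keys and the helper `relative_change` are illustrative names, not part of the paper.

```python
# Values copied from Table 7; key names are illustrative.
basic = {"mAP@0.5": 91.3, "params_M": 10.6, "GFLOPs": 20.1, "FPS": 271}
efem = {"mAP@0.5": 92.2, "params_M": 10.9, "GFLOPs": 20.9, "FPS": 264}


def relative_change(old, new):
    """Percentage change from old to new, rounded to one decimal."""
    return round(100.0 * (new - old) / old, 1)


overhead = {
    key: relative_change(basic[key], efem[key])
    for key in ("params_M", "GFLOPs", "FPS")
}
map_gain = round(efem["mAP@0.5"] - basic["mAP@0.5"], 1)

print("relative change (%):", overhead)
print("mAP gain (points):", map_gain)
```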

Zhou, X.; Li, X.; Huang, W.; Wei, R. LEM-Detector: An Efficient Detector for Photovoltaic Panel Defect Detection. Appl. Sci. 2024, 14, 10290. https://doi.org/10.3390/app142210290
