1. Introduction
Worldwide growth in the installation of renewable energy systems has been remarkable during the last decade. The most popular and widespread renewable energy technologies in Greece are wind, photovoltaic (PV), biomass/biogas and hydropower. According to the Renewable Energy Sources Operator and Provider of Guarantees of Origin of Greece, the installed power is 4426.23 MWp for wind, 4198.59 MWp for photovoltaics, 112.31 MWp for biomass/biogas and 257.92 MWp for hydropower [
1]. The design, operation and maintenance of renewable energy systems are challenging tasks that require measurement accuracy and scientific soundness in the processing of monitoring data. Monitoring data from these energy systems can give important information for a system’s design and performance evaluation. Furthermore, data can be used in energy management either for fault detection or for power forecasting. The following are considered the most critical problems related to energy optimization and maintenance in photovoltaic systems: maximum power point tracking, output power forecasting, parameter estimation and defect detection [
2]. A fault detection method for a PV energy system can provide an accurate estimation of production under normal operating conditions as well as hints for the detection of the system’s faults [
3]. Faulty operating conditions result in a reduction in energy production and compromise the reliability of the system, while creating financial losses for investors [
4].
Many researchers have studied the fault detection of photovoltaic systems by combining measurements, experiments and computational methods. Each approach is characterized by the specific type of fault it detects, the methodology used to address this task, and the specific application category. Faults concern both the AC side and the DC side, as commonly observed in stand-alone and grid-connected PV systems [
5]. Garoudja et al. categorized faults as: (i) short-circuit faults, which can affect cells, bypass diodes or modules, and the aging of PV modules; (ii) open-circuit faults, which involve a break in the wiring between PV modules or solar cells; and (iii) partial shading faults, in which some modules are shaded while the rest of the array remains normally exposed to solar irradiance, caused by the passage of clouds, dirt on PV modules, snow, and other light barriers and obstacles [
6]. Other faults related to PV panels or panel covers involve: the fracture of the glass protective surface; bubbles and/or tears in the polymer cover of the backsheet; the corrosion of metallic frames; damage to the panel’s insulation; failed soldering joints of the PV cells; hot spots; and the potential-induced degradation (PID) effect [
7].
A large variety of techniques are applied to monitoring and measurement data management and processing. The multiple forms of machine learning (ML) are prominent in these techniques. ML may be broadly defined as the practice of using algorithms to parse data, learn from them, and then make a determination or prediction thereof [
8]. Neural networks (NN) and statistical methods are the most frequent tool categories applied to fault detection in photovoltaic energy systems. Silvestre et al. proposed an algorithmic procedure based on the comparison of simulated and measured yields. The main types of faults examined and traced are inverter disconnection, partial shadowing and the disconnection of a string. The method was applied in a 9 kWp PV test installation in Algeria [
9]. Dhimish et al. proposed a procedure for automatic fault detection and diagnosis based on the statistical comparison of the means of measured and theoretical power using the t-test. The application and validation of this method were conducted in a 1.98 kWp system in the United Kingdom [
10]. Garoudja et al. propose a method for fault detection comparing measured values with simulated values by use of a one-diode model. The processing of the difference between the measurements and model predictions, used as fault indicators, was carried out through the application of an exponentially weighted moving average (EWMA) monitoring chart [
6]. Another statistical approach is based on labeling faults at over 100 PV sites in the United States using machine learning (ML) algorithms and hidden Markov modeling (HMM), which allow the labeling of historical behaviors without the manual setting of thresholds [
11]. Chine et al. proposed a model for the detection of faulty modules in a string, faulty string, faulty inverter and false alarm, a group of faults which include partial shading, the ageing of PVs and inverter MPPT errors. The idea behind this method is the comparison of absolute error on the performance ratio (PR) with a threshold, that enables generating a diagnostic signal [
12]. Research on the health monitoring of PV installations is profiting from the quite significant, parallel research efforts applied to the vibration-based health monitoring of civil structures, affected by environmental factors (traffic, ambient temperature, noise), which change their structural dynamic characteristics, covering up those coming from structural damage. For example, Huang et al. employed the support vector machine (SVM) and moth-flame optimization (MFO) [
13], as well as an autoregressive (AR) time series model with two-step artificial neural networks (ANNs) [
14] to identify damage under temperature variations. Suresh et al. compared the behavior of a multiple linear regression, ARMA model, CNN-simple, and CNN-LSTM and observed that the CNN-simple and the CNN-LSTM methods perform best for all 1-h, 1-day and 1-week predictions, with the CNN-LSTM providing better results on certain occasions [
15].
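The residual-monitoring idea described above for the EWMA approach can be illustrated with a minimal sketch. It is not the implementation used in the cited work: the function name `ewma_chart`, the smoothing factor `lam`, the limit width `width`, and the assumption that the fault-free residual mean `mu0` and standard deviation `sigma` are known in advance are all hypothetical choices made for illustration. The chart follows the standard EWMA recursion with time-varying control limits.

```python
import math


def ewma_chart(residuals, mu0, sigma, lam=0.2, width=3.0):
    """Apply an EWMA control chart to a sequence of residuals.

    residuals: measured minus model-predicted values (hypothetical input).
    mu0, sigma: mean and standard deviation of residuals in fault-free
                operation (assumed known from a healthy reference period).
    lam:        smoothing factor in (0, 1]; 0.2 is an illustrative choice.
    width:      control-limit width in standard deviations.
    Returns the EWMA statistics and a list of alarm flags.
    """
    z = mu0  # the chart starts at the in-control mean
    stats, alarms = [], []
    for t, r in enumerate(residuals, start=1):
        # EWMA recursion: blend the new residual with the previous statistic
        z = lam * r + (1.0 - lam) * z
        # Time-varying half-width of the control band
        half = width * sigma * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        )
        stats.append(z)
        alarms.append(abs(z - mu0) > half)
    return stats, alarms


if __name__ == "__main__":
    # Ten fault-free samples followed by ten samples with a sustained shift
    demo = [0.0] * 10 + [5.0] * 10
    _, flags = ewma_chart(demo, mu0=0.0, sigma=1.0)
    print(flags)
```

A smaller `lam` makes the chart more sensitive to small, persistent shifts at the cost of a slower reaction, which is why the parameter would need tuning against fault-free data from the specific installation.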
Many researchers address the degradation problem with methods applied to specific experimental setups in off-grid operation. Fuster-Palop et al. proposed the use of simple machine learning tools to predict the global PR based on two variables: global irradiance in the plane of the array and ambient temperature. Two regression models were applied: multiple linear regression (MLR) and a random forest (RF) algorithm, which is simpler than an ANN [
16]. Hichri et al. propose a machine learning method for fault diagnosis and classification, which is based on the principal component analysis (PCA) technique. The proposed approach is applied in a PV system comprising three 4 kWp PV arrays, each driven by individual MPPT trackers [
17]. Ammiche et al. proposed a method based on a fuzzy logic filter (FLF), which relies on PCA and moving window principal component analysis (MWPCA). The method was applied to three inverters, connected to polycrystalline, monocrystalline and thin-film PV panels, respectively, at the University of Malaya. The results demonstrate the method’s effectiveness in detecting different types of faults with high accuracy [
18]. Cui et al. proposed a PV-fault identification method based on improved deep residual shrinkage networks (DRSN). The method was able to identify short-circuit faults, partial-shading, abnormal aging and hybrid faults in a 6.48 kWp experimental PV field [
19]. Voutsinas et al. proposed a multi-output ANN for fault detection on the DC side of a photovoltaic system. A comparison of Impp and Vmpp values, obtained from the models based on data from the manufacturer’s data sheets, proved the effectiveness of the proposed method [
20]. Guejia Burbano et al. propose an ANN for faults and degradation phenomena occurring in PV panels. The method has two stages: one for predicting the single diode model parameters under normal operation, and one for the degraded condition. Comparison between the two stages is used for the identification of the degradation type. The method was tested using experimental data of I-V curves from the NREL database [
21]. Hopwood et al. propose an approach that utilizes physics-based simulations of string-level I-V curves. Comparisons between a baseline curve (no fault) and cases with partial soiling and cell crack system modes are presented [
22]. Wang et al. proposed a hybrid algorithm combining the symmetrized dot pattern (SDP) with a convolutional neural network (CNN) for common faults such as poor welding, breakage and bypass diode failure. A comparison with a fault-free module was conducted, and the study successfully combined SDP with CNN to develop a PV module fault recognition method, with a reported recognition accuracy of 99.88%.
On the other hand, methods to be applied in a grid-connected operation were also presented. Hichri et al. developed a genetic-algorithm (GA)-based ANN. The proposed method is applied in GCPV systems under normal and faulty conditions [
23]. Aljafari et al. created a detection technique based on the comparison of calculated and measured DC power, with a predefined threshold value. The method uses a simulation tool for calculated values and generates a diagnostic signal to show normal or abnormal operation in a 2 kWp GTPV plant [
24]. Hussain et al. propose a simple technique for the detection and classification of faults occurring in PV systems based on a fuzzy logic controller. The method introduced a ‘fault index’, which measures the degree of deviation from the normal operating conditions and is applied in a 5 kWp grid-connected PV system [
25]. Another line of fault detection research targets special types of faults, such as faulty inverter operation and arc discharge on the DC side. Wang et al. proposed a fault detection method for DC series arcs based on the combination of adaptive local mean decomposition (ALMD), multiscale fuzzy entropy (MFE), and support vector machine (SVM) algorithms for PV systems [
26]. Omana et al. propose a model for the detection of inverters’ faults, especially those observed at the inverter’s power MOSFETs. The method is based on periodically monitoring the variation in the harmonics of the inverter’s currents [
27]. Improved fault detection algorithms are also comparatively tested by Hussain et al. [
28] in small pilot PV installations using a non-iterative feed-forward (FF) ANN and a radial basis function (RBF) ANN. Further developments in topology and training algorithms include non-iterative neural-like structures based on the successive geometric transformations model (SGTM), which has been successfully applied to big data in regression and classification tasks in related sectors [
29].
Finally, IR thermography is a method that has attracted a fair number of researchers. Álvarez-Tey et al. proposed an IR thermography inspection strategy for PV plants based on two stages: an aerial inspection and an inspection carried out on the ground. The method was applied to a 100 kW PV plant. The types of faults addressed were broken glass in a PV module, partial shading, problems involving PV module technology and open-circuit PV modules, along with other incidents [
30]. Kim et al. proposed an algorithm in order to analyze the infrared images of solar panels. The proposed method uses a convolutional neural network with SVM (CNN-SVM) to classify enhanced images in order to conduct non-physical visual recognition and fault detection analysis [
31].
The current work focuses on the application of ANNs to the fault detection of grid-connected PV plants. The main contribution is the analysis and evaluation of already known fault types in grid-connected systems with the aid of ANNs, aiming at formulating a procedure to enable fault diagnosis through the quantification of statistical metrics characteristic of each type of fault. To this end, the evaluation of the same type of ANN with different inputs is carried out. The novelty of the proposed methodology lies in the correlation of already known faults in grid-connected PV systems with the statistical deviation of their performance metrics when they are observed with the aid of an ANN.
The remainder of this paper is organized as follows: Section 2 presents the testing setup and the fault detection methodology. Section 3 presents and discusses the results of its application to normal operation, as well as to operation with various known faults. A discussion follows in Section 4, which defines the error tolerances to be associated with each fault category. Finally, Section 5 summarizes the main findings of the methodology developed, its limitations and prospects for future research.
4. Discussion
In the previous section, the different types of faults were presented and the behavior of nRMSE values as indicators of already known faults was discussed. It is also important to evaluate the behavior of the ANNs when they are fed with inputs from other PV systems. To this end, we present the behavior of all three networks in 2014, when they are fed with data from PVS1 (Figure 10).
The specific behavior of all three networks during 2014 is statistically evaluated by the variation of nRMSE in this period. Figure 11 shows a similar behavior for all three models. However, apart from the periods correlated with the existence of a fault, certain other periods also show an increasing trend. This trend is pronounced at the end of March, when the nRMSE increases markedly. During this period, the n3 network, which uses input from a different type of temperature sensor, demonstrates smaller deviations. This behavior is also observable in Figure 12. All three networks underestimate the simulated power, as seen in Figure 12. This may indicate an error in measurement equipment in network n3.
Figure 13 shows the trend of percent bias error during 2014. The general trend shows that simulated values are higher than the actual values; however, there are periods at the end of March where the result is the opposite. This fact may be related to a systematic error of measurement equipment in this period. The fact that n3 has a more consistent behavior in this aspect may indicate that this is an error in the backsheet temperature sensor.
An observation of the nRMSE and PBIAS error values suggests that these metrics could act as indicators of possible faults. The behavior of the daily values in normal operation sets a clearly observable threshold. A fault in a PV string produces the largest statistical deviation between the ANN simulation and the actual behavior, over 30%; a faulty panel produces a deviation of 10–20%, and near-shading faults one of 6–10%, compared to 5% during normal operation. These deviations depend on the specific system type, layout, location and type of measurement equipment. However, the proposed procedure is valid, mutatis mutandis, in all these cases.
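The way such daily metrics could be turned into a fault hint is sketched below. This is an illustration under stated assumptions, not the authors' implementation: the normalization of the RMSE by the mean measured power, the sign convention of PBIAS (positive when the simulation exceeds the measurement), the reading of the quoted deviation bands as daily nRMSE values, the handling of the band boundaries, and all function names are hypothetical. The bands themselves (about 5% in normal operation, 6–10% near shading, 10–20% faulty panel, over 30% string fault) are the ones quoted in the text and, as noted there, are specific to the studied system.

```python
import math


def nrmse_percent(measured, simulated):
    """RMSE between simulated and measured power, as a percentage of the
    mean measured power (the normalization is an assumption)."""
    n = len(measured)
    rmse = math.sqrt(
        sum((s - m) ** 2 for m, s in zip(measured, simulated)) / n
    )
    return 100.0 * rmse / (sum(measured) / n)


def pbias_percent(measured, simulated):
    """Percent bias; positive when the simulation overestimates
    (the sign convention is an assumption)."""
    return 100.0 * sum(s - m for m, s in zip(measured, simulated)) / sum(measured)


def classify_day(nrmse):
    """Map a daily nRMSE (in percent) to a fault hint using the bands
    quoted in the text; values between bands are left inconclusive."""
    if nrmse > 30.0:
        return "possible string fault"
    if 10.0 <= nrmse <= 20.0:
        return "possible faulty panel"
    if 6.0 <= nrmse < 10.0:
        return "possible near shading"
    if nrmse <= 5.0:
        return "normal operation"
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical daily power samples (W): measured vs. ANN-simulated
    measured = [100.0, 200.0, 300.0]
    simulated = [110.0, 220.0, 330.0]
    day_nrmse = nrmse_percent(measured, simulated)
    print(round(day_nrmse, 2), round(pbias_percent(measured, simulated), 2))
    print(classify_day(day_nrmse))
```

Leaving the gaps between the quoted bands as "inconclusive" avoids forcing a label onto days for which the text gives no threshold; for a different installation the bands would have to be re-derived from that system's fault-free behavior.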