1. Introduction
Clean energy generation is a key issue in the development of any community that aspires to become sustainable. In this context, we are witnessing an exponential growth of the photovoltaic (PV) energy sector [1]. The feasibility study as well as the technical design of a PV system are based on a detailed, site-specific simulation of the system’s operation. Running such simulations requires accurate in-plane estimates of solar irradiance. Even today, measurements of solar irradiance in various spatial directions are very scarce. Thus, in-plane solar irradiance is commonly estimated from solar irradiance measured on the horizontal surface, by means of a transposition model [2]. To apply a transposition model, both the direct-normal (DNI) and the diffuse horizontal (DHI) solar irradiance components are required [3]. However, the component of solar radiation measured with the highest spatial density worldwide is global horizontal solar irradiance (GHI). This fact gives separation models a privileged place in modeling solar resources at the Earth’s surface. In a broad sense, a separation model estimates DNI and DHI from GHI measurements (see, e.g., [4] for a recent treatment of the topic).
The term irradiance denotes the solar energy flux, expressed in W/m². GHI denotes the incoming solar energy flux incident on a unit horizontal surface from the entire celestial vault. GHI is the sum of two components in the horizontal plane: the beam and the diffuse solar irradiance. The beam solar irradiance is the solar energy flux coming directly from the Sun’s disk to the Earth, measured on a horizontal plane. If θz denotes the zenith angle, then the beam solar irradiance on the horizontal plane is given by the cosine law, DNI cos θz. Diffuse solar irradiance, or DHI, is the solar radiation that has been scattered by the atmospheric constituents but still reaches the surface of the Earth. The three components are related by the closure equation:
$$GHI = DNI \cos\theta_z + DHI \qquad (1)$$
The problem of separating global irradiance into its direct and diffuse components is a current topic of interest in solar energy research (see, e.g., [5]), with the stated aim of increasing the performance of the separation process.
Generally, the output of a separation model is the diffuse fraction, defined by the equation [6]:
$$k_d = \frac{DHI}{GHI} \qquad (2)$$
Technically speaking, GHI is directly measured, then DHI is straightforwardly obtained from the estimated diffuse fraction, and finally DNI is estimated through Equation (1):
$$DNI = \frac{GHI - DHI}{\cos\theta_z} = \frac{GHI\,(1 - k_d)}{\cos\theta_z} \qquad (3)$$
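The chain from an estimated diffuse fraction to DHI and DNI can be sketched in a few lines of Python. This is only an illustration of the closure relation and the definition of the diffuse fraction; the function name `split_ghi` and the numerical inputs are hypothetical and are not taken from the paper.

```python
import math


def split_ghi(ghi, kd, zenith_deg):
    """Return (DHI, DNI) in W/m^2 from GHI, a diffuse fraction kd, and the zenith angle."""
    if not 0.0 <= kd <= 1.0:
        raise ValueError("kd must lie in [0, 1]")
    cos_z = math.cos(math.radians(zenith_deg))
    if cos_z <= 0.0:
        raise ValueError("the Sun is at or below the horizon")
    dhi = kd * ghi               # definition of the diffuse fraction, solved for DHI
    dni = (ghi - dhi) / cos_z    # closure equation, solved for DNI
    return dhi, dni


# Illustrative (made-up) values: GHI = 600 W/m^2, kd = 0.25, zenith angle = 60 deg
dhi, dni = split_ghi(ghi=600.0, kd=0.25, zenith_deg=60.0)
print(f"DHI = {dhi:.1f} W/m^2, DNI = {dni:.1f} W/m^2")
```

Because DNI is obtained by dividing by cos θz, any error in the estimated diffuse fraction is amplified at large zenith angles.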
Beyond this basic application, i.e., the evaluation of DHI and DNI on the basis of GHI, the diffuse fraction is acquiring new uses, such as a quantifier of the uncertainty induced by aerosols in estimating clear-sky DHI [7] or a proxy for classifying the sky conditions into clear, intermediate, and overcast [8].
The key stage in DHI estimation is the choice and application of a separation model, i.e., the estimation of the diffuse fraction from one or more atmospheric parameters, frequently called predictors. The accuracy of the diffuse fraction estimate determines the overall accuracy of a separation model. The diffuse fraction is typically estimated as a function of the clearness index [6]:
$$k_t = \frac{GHI}{G_{ext}} \qquad (4)$$
where Gext represents the solar irradiance deterministically computed in a horizontal plane at the top of the atmosphere. The clearness index encapsulates information about atmospheric transparency, and its measured time series captures the stochastic part of solar irradiance [9].
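As a rough illustration of how the clearness index can be computed from a GHI reading, the sketch below evaluates the extraterrestrial horizontal irradiance and then the ratio GHI/Gext. The solar constant of 1361 W/m² and the first-order eccentricity correction are assumptions made for this example; the paper does not state which formulation it uses, and the function names are hypothetical.

```python
import math

SOLAR_CONSTANT = 1361.0  # W/m^2; assumed value, not stated in the paper


def extraterrestrial_horizontal(day_of_year, zenith_deg, solar_constant=SOLAR_CONSTANT):
    """Irradiance on a horizontal plane at the top of the atmosphere, in W/m^2."""
    # First-order correction for the Sun-Earth distance (a common approximation, assumed here)
    e0 = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    # The cosine is clipped at zero so that night-time values are 0 rather than negative
    cos_z = max(math.cos(math.radians(zenith_deg)), 0.0)
    return solar_constant * e0 * cos_z


def clearness_index(ghi, g_ext):
    """Clearness index kt = GHI / Gext."""
    if g_ext <= 0.0:
        raise ValueError("Gext must be positive (the Sun must be above the horizon)")
    return ghi / g_ext


g_ext = extraterrestrial_horizontal(day_of_year=172, zenith_deg=30.0)
kt = clearness_index(ghi=800.0, g_ext=g_ext)
print(f"Gext = {g_ext:.1f} W/m^2, kt = {kt:.3f}")
```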
In the last few decades, separation models have experienced continuous development [6]. This development was driven mainly by the needs of the solar industry, with the research focusing on the practical modeling of the diffuse fraction based on the atmospheric clearness index. At the same time, there was also a constant race to develop increasingly accurate separation models, even if their practical applicability proved to be extremely limited. In that line of work, developing a better-performing model is a goal in itself. In this study, both aspects of the research were addressed. On the one hand, we propose a new entry in the race for the best-performing separation model that maintains a large degree of accessibility in practice. On the other hand, we propose a practical separation model from the class of the simplest models, whose performance is comparable to that of the best-performing current models.
In summary, this paper reports two new minute-scale separation models, which are rational functions of the clearness index and other predictors. The first model operates with eight predictors and consistently outperforms already established models. The second model is highly accessible because it uses only three predictors, all determined from GHI values. The models’ accessibility also resides in their structure, with each model being defined by a single equation and a unique set of coefficients. The predictors are easily computable in any location where GHI is measured. The two proposed models are meant to bridge the gap between high accuracy and accessibility.
The paper is organized as follows: Section 2 surveys the current state-of-the-art separation models. The relevant datasets used in this study are described in Section 3. The proposed separation models are introduced in Section 4. The models’ performance is evaluated in Section 5. Section 6 summarizes the main conclusions.
2. A Survey of Separation Models’ Performance
A general perspective on the separation models’ equations is given by [10]. Almost all the separation models have an empirical basis, ranging from purely empirical models [11] to the very rare models incorporating physical features [12]. Many empirical models are fitted to data collected from a single station (e.g., [13]), but there are also models fitted to data collected from stations spread all over the world (e.g., [14]). The practical separation models consist of linear or non-linear equations (see, e.g., the large variety of separation models gathered by [4]). The most populated classes of separation model equations are polynomial and logistic.
The first separation model, proposed by Liu and Jordan in 1960 [15], linearly relates the daily diffuse solar irradiation to the daily global solar irradiation. After this pioneering work, hundreds of single-predictor polynomial models have been proposed, e.g., [16] in 1977, [17] in 1982, and [18] in 2006. Most of the polynomial separation models contain a single equation defined over the entire range of the clearness index. Some separation models comprise more than one equation, each defined over a sub-domain of the clearness index [11]. The first logistic model was proposed in 2001 [19]. It was motivated by the attempt to abandon the branched structure of many previous models. Some of the latest and most popular separation models belong to the class of logistic models [5,20,21].
The separation models are not significantly distinctive in their formal nature. What differentiates them mainly is the number and nature of the predictors. The search for an ideal combination of predictors can be easily understood by looking at the scattering of the diffuse fraction with respect to the clearness index (e.g., Figure 1, Figure 2a). Because single-predictor models cannot explain the dispersion of the diffuse fraction at a fixed value of the clearness index, the proposed polynomial models started to operate with more predictors [21,22,23].
From a timescale perspective, there is a large variety of separation models that are applicable to GHI time series sampled at intervals from 1 min [11] to monthly [24].
The separation models are comprehensively reviewed in [6]. The paper compares the performance of 140 separation models using data from 54 stations spread across all climate zones. The authors claim an exhaustive study, involving all separation models published prior to 2016. The paper concluded that no separation model can outclass all the others in every location. However, the model Engerer2 [20] (further denoted E2) was indicated to be quasi-universal based on the statistical results in different climate zones. The current state-of-the-art in separation models is highlighted by recent work [5]. It tests, against a huge dataset collected from all climate zones, the 10 separation models that claimed to perform better than E2. The study identifies four minute-scale separation models, developed after 2016, that outperform E2: Yang4 (Y4) [5], Starke1 (S1) [21], Starke3 (S3) [25], and Paulescu (PB) [11]. The abbreviation used in this study is indicated in parentheses.
On the basis of the outcomes from [5,6], in the present study we considered only the top five models from [5]. The ensemble of the five models represents the state-of-the-art in separating diffuse from global solar irradiance. The performance of these models is considered a benchmark in the evaluation of the performance of the separation models proposed in this study. The five models are briefly introduced in Appendix A.
3. Dataset
The radiometric data used for developing and testing the proposed models were collected from the Baseline Surface Radiation Network (BSRN) [26]. Most of the stations in this network provide GHI, DHI, and DNI measured at 1 min resolution. The general database developed for this study includes two datasets. The first dataset, denoted D_FIT, was used for developing the proposed models. The second dataset, denoted D_TEST, was designed for comparing the performance of the proposed models with the performance of the reference models introduced in Section 2.
D_FIT contains 193,294 lines of data, recorded over one year (2020) at 1 min resolution at the station of Payerne, Switzerland (46.815° N, 6.944° E, BSRN station code PAY). This location is situated in a temperate climate region classified as Cfb according to the Köppen–Geiger system [27]. D_TEST contains nearly 7 million lines. Data were collected from 36 locations spread across four climatic zones: tropical (A), arid (B), temperate (C), and continental (D). Only data measured at a Sun elevation angle h > 5° were included in D_FIT and D_TEST. This is a common practice in research conducted with radiometric data because the pyranometer’s accuracy is questionable close to sunrise and sunset.
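The elevation screening described above amounts to a one-line filter. The sketch below assumes that each record is a dictionary carrying a precomputed Sun elevation angle under the hypothetical key `elevation_deg`; the paper does not describe its data layout.

```python
MIN_ELEVATION_DEG = 5.0  # threshold h > 5 deg used for both D_FIT and D_TEST


def keep_daylight_records(records, min_elevation_deg=MIN_ELEVATION_DEG):
    """Keep only the records measured with the Sun strictly above the threshold."""
    return [r for r in records if r["elevation_deg"] > min_elevation_deg]


# Made-up records, for illustration only
records = [
    {"elevation_deg": 2.0, "ghi": 15.0},
    {"elevation_deg": 5.0, "ghi": 40.0},
    {"elevation_deg": 30.0, "ghi": 480.0},
]
print(keep_daylight_records(records))
```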
The data in D_TEST are characterized by diversity; they were recorded in locations with latitudes between −45° and +58°, and the time span extends from 2002 to 2021. Table 1 summarizes the stations’ metadata.
There is some inherent overlap between the locations included in D_TEST and the origin locations of the models. Thus, Payerne (PAY), the origin location of the proposed models, is included in D_TEST with a share of data of 2.7%. There is no data overlap, i.e., the fit and the test were performed on data collected in different years. Testing an empirical model on data collected from the origin location in a different period is common practice (e.g., [22]). The data used to develop the PB model were recorded in Palaiseau (PAL) during 2014–2016. Palaiseau is present in D_TEST with the same share of 2.7%, with data collected in 2017. E2 was developed with data from Australia, with none of the stations included in D_TEST. S1 was fitted with data recorded at stations in Australia. Only the Darwin (DAR) station is part of both the fitting dataset and D_TEST, with the data being collected in different years. S3 proposes different empirical coefficients for each of the five climate zones. S3 was developed based on data collected from 51 BSRN stations, i.e., almost all the stations in the network. Thirty-two of these stations are also present in D_TEST, which gives S3 some advantage. Five of the seven stations used to fit the coefficients of Y4 are also included in D_TEST. It is worth noting that even if there is some spatial overlap, there is no temporal overlap.
4. A Proposal for Minute-Scale GHI Separation Models
In this section, two innovative separation models are introduced. The models were built on the basis of the D_FIT dataset. Most of the existing separation models are linear, polynomial, logistic, and/or piecewise defined [4,5,6]. In contrast, the proposed models are defined by rational equations. Basically, the two models estimate the diffuse fraction based on common predictors included in the equations of the best-performing separation models (see, e.g., Appendix A).
The first proposed model, further denoted M1, is defined by an elaborate equation involving eight predictors. M1 is the result of intensive research on the effect of various predictors on the diffuse fraction. It gathers together almost all the predictors successfully used by the previous separation models: (1) the clearness index kt, (2) the deviation of the clearness index from its estimated value under clear skies (defined by Equation (A3)), (3) the part of the diffuse fraction that is attributable to cloud enhancement (defined by Equation (A4)), (4) the daily average of the clearness index kday (defined by Equation (A1)), (5) the hourly average of the clearness index khour, (6) the persistence factor, defined as the average of a lag and a lead of the clearness index values, (7) the clear-sky global solar irradiance in MJ/(h·m²), and (8) the ratio of the measured GHI to the estimated global clear-sky irradiance
. M1 is defined by the empirical equation (
and
nMBE = 0.011):
The second proposed model, further denoted M2, is simpler and more accessible than M1. It uses only three predictors, all related to the clearness index: (1) the clearness index itself kt, (2) the daily average of the clearness index kday, and (3) the hourly average of the clearness index khour
. M2 is defined by the equation: (
and
nMBE = 0.007).
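Since all three M2 predictors derive from the minute-scale clearness index, they can be prepared with a simple grouping step. The sketch below is a minimal illustration that assumes plain arithmetic means over the clock hour and the calendar day of each sample; the paper defines the daily average in its Equation (A1), which is not reproduced here and may differ (for example, a ratio of sums). The function name `averaged_clearness` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean


def averaged_clearness(timestamps, kt_values):
    """Return (k_hour, k_day): per-sample hourly and daily means of kt."""
    by_hour = defaultdict(list)
    by_day = defaultdict(list)
    for ts, kt in zip(timestamps, kt_values):
        by_hour[(ts.date(), ts.hour)].append(kt)
        by_day[ts.date()].append(kt)
    # Each minute inherits the mean of the hour and of the day it belongs to
    k_hour = [fmean(by_hour[(ts.date(), ts.hour)]) for ts in timestamps]
    k_day = [fmean(by_day[ts.date()]) for ts in timestamps]
    return k_hour, k_day


# Made-up series: one overcast hour followed by one brighter hour
start = datetime(2020, 6, 1, 10, 0)
timestamps = [start + timedelta(minutes=i) for i in range(120)]
kt_values = [0.2] * 60 + [0.6] * 60
k_hour, k_day = averaged_clearness(timestamps, kt_values)
print(round(k_hour[0], 3), round(k_hour[-1], 3), round(k_day[0], 3))
```

Both averages use the whole hour and the whole day, so in this form they can only be evaluated once the period has been completely recorded.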
The simplicity of M2 does not reduce its ability to capture the essential behavior of the diffuse fraction with respect to the clearness index. This is well illustrated by the sensitivity analysis presented in Figure 1, which displays kd with respect to kt for the boundary values of the hourly average of the clearness index: overcast during the whole hour, khour = 0.3 (Figure 1a), and clear sky, khour = 1.0 (Figure 1b). The curve parameter is kday. In a broad sense, khour and kday measure the cloud cover at two different time scales, hourly and daily, respectively. Figure 1 shows the two parameters acting as a locator for the position of the kd(kt) curve in the plane kt − kd. As the daily/hourly average cloudiness increases (measured by a decrease in kday and khour), the share of DHI in GHI increases as well. In other words, as the day and the hourly interval within which the diffuse fraction is estimated become cloudier, Equation (6) increases the likelihood for the diffuse fraction to take higher values. This discussion is valid for the atmospheric column content as well (e.g., atmospheric aerosol loading). Generally, khour and kday capture the whole atmospheric transmittance (atmospheric column content and clouds), with the cloud cover being the most influential parameter. Looking again at Figure 2a, it can be seen that the two parameters, khour and kday, give M2 the ability to cover the region occupied by the measurements in the plane kt − kd. This perspective will be discussed next in more detail.
Figure 1.
Diffuse fraction kd estimated with M2 (Equation (6)) with respect to the clearness index kt for two values of the hourly average of the clearness index khour (a) 0.3 and (b) 1.0. The curve parameter is the daily average of the clearness index kday.
Figure 2.
Diffuse fraction kd vs. the clearness index kt at the station in Payerne, Switzerland, in 2020. Measured (in gray) and estimated (in red) values, with the tested separation models, are displayed.
Figure 2 shows how the estimates of the seven models cover the measurements in the plane kt − kd. Figure 2a, already referred to, displays the measured diffuse fraction vs. the clearness index for D_FIT. This is a typical picture, whose merit is to illustrate the wide dispersion of kd with respect to the classical predictor kt. The estimates issued by a simple equation kd = f(kt), irrespective of its complexity, will always be accompanied by a substantial amount of uncertainty. Figure 2b–h displays the estimates issued by the reference models PB (Equation (A2)), E2 (Equation (A5)), S1 (Equation (A6)), S3 (Equation (A7)), and Y4 (Equation (A8)) and by the two proposed models M1 (Equation (5)) and M2 (Equation (6)), superimposed over the measurements. This allows a visual intercomparison of the models’ flexibility with the variation of the atmospheric conditions captured by the input parameters. PB appears to overlap the data reasonably well (Figure 2b) but shows a relative underestimation of the diffuse fraction. The overlap of E2 with the measurements is smaller, but the shape and position of the cluster of estimates indicate an accurate capture of the data mean (Figure 2c). S1 captures almost all the areas occupied by the data (Figure 2d). S3 best covers the space occupied by the measured data (Figure 2e). Y4 also appears to overlap the data reasonably well while displaying a lower dispersion than S3 (Figure 2f). As expected, the proposed models M1 and M2 cover the measurements to a large extent (Figure 2g,h), since the two models are fitted on D_FIT. The flexibilities of M1 and S3 are comparable, which is explained by their relatively high complexity: M1 operates with eight predictors, while S3 operates with seven predictors on two branches. Looking at Figure 2 as a whole, it can be concluded that the proposed M1 and M2 models achieve, with rational polynomial equations, a behavior similar to that of the reference logistic models.
Figure 3 displays the density plot of kd vs. kt at the station in Payerne. It highlights the areas of the plane kt − kd where the measurements (Figure 3a) and the estimates issued by each model (Figure 3b–h) are clustered. Thus, Figure 3 gives a better perspective on how the models locate the estimated pairs (kt, kd) compared to the measured ones. In general, the models demonstrate the ability to concentrate the estimates in the regions of the kt − kd space where the measurements show the highest density. It is useful to remember that the reference models are evaluated as high performing by independent studies [5]. A visual comparison between Figure 3a and Figure 3g indicates that the M1 estimates follow almost the same distribution as the measured joint probability kt − kd, especially in areas with high probability density. Visible differences appear in the area with intermediate values of the clearness index, but the probability density is much lower there.
5. Performance Assessment
To validate the performance of the proposed separation models, the accuracy of M1 and M2 was compared in detail with that of the reference models (PB, E2, S1, S3, and Y4). The estimates were evaluated against data from D_TEST (described in Section 3). The tests were focused on the final product, i.e., DNI and DHI, with the diffuse fraction regarded just as a proxy.
5.1. Statistical Indicators
The models’ accuracy was evaluated in terms of three statistical indicators: the determination coefficient R², the normalized root mean square error nRMSE, and the normalized mean bias error nMBE. The indicators are defined in Appendix B.
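For readers who want to reproduce this kind of evaluation, the three indicators can be computed as sketched below. The formulas are the usual textbook forms and are an assumption on our part: errors are normalized by the mean of the measurements, and R² is taken as 1 − SS_res/SS_tot. The definitions actually used in the paper are those in its Appendix B, which may differ in detail.

```python
import math


def indicators(measured, estimated):
    """Return (R2, nRMSE, nMBE) for paired measured and estimated values."""
    n = len(measured)
    mean_m = sum(measured) / n
    errors = [e - m for m, e in zip(measured, estimated)]
    ss_res = sum(d * d for d in errors)
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    r2 = 1.0 - ss_res / ss_tot               # assumed definition of R2
    nrmse = math.sqrt(ss_res / n) / mean_m   # RMSE normalized by the measured mean
    nmbe = (sum(errors) / n) / mean_m        # positive values indicate overestimation
    return r2, nrmse, nmbe


# Made-up values in W/m^2, for illustration only
r2, nrmse, nmbe = indicators([100.0, 200.0, 300.0], [110.0, 210.0, 310.0])
print(f"R2 = {r2:.3f}, nRMSE = {100 * nrmse:.1f}%, nMBE = {100 * nmbe:.1f}%")
```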
First, we look more closely at the performance of the M1 and M2 models in estimating DNI. For M1, the determination coefficient R² falls between 0.745 and 0.949. The minimum value of R² is obtained at the arid climate station in Tamanrasset, Algeria. The maximum value of R² is achieved at Payerne, Switzerland, with a temperate continental climate. This was expected, with Payerne being the origin location of M1. Similarly, for M2, the determination coefficient R² falls between 0.720 and 0.937. The minimum and maximum values were also achieved at Tamanrasset and Payerne, respectively. This means that reducing the number of predictors from eight (M1) to three (M2) roughly preserves the fraction of variance in D_TEST explained by the model.
The models’ performance in estimating DNI from the diffuse fraction estimates through Equations (2) and (3) is displayed in Figure 4a in terms of nRMSE. M1 achieves the best performance at the arid station in Gobabeb, Namib Desert, Namibia (nRMSE = 14.0%), while the worst performance is reached at Lindenberg, Germany, a station with a temperate climate (nRMSE = 41.2%). Across the stations in D_TEST, M1 exhibits both positive and negative biases. The lowest absolute bias was achieved at the tropical station on Cocos Island (nMBE = 0.19%). The highest positive bias occurs at the continental station in Regina, Canada (nMBE = 15.3%), while the highest negative bias occurs at the temperate continental station in Boulder, USA (nMBE = −13.9%). M2 achieves the best performance at Gobabeb (nRMSE = 14.8%) and the worst at Lindenberg (nRMSE = 48.1%). In terms of nMBE, M2 achieves the best result at the temperate station in Tateno, Japan (nMBE = 0.40%). Similarly to M1, M2 experiences the highest positive bias at Regina (nMBE = 19.0%) and the highest negative bias at the station in Boulder (nMBE = −12.0%).
At first glance, the accuracy of M1 and M2 seems to be relatively low. However, it is in line with the state of the art: despite all the progress made by the separation models, there is still plenty of room for improvement. Visual inspection of Figure 4 reveals high values of nRMSE when the reference models are applied to estimate DNI: nRMSE ranges between 15.2% and 49.4%, both values being reached by S3. A large bias is also present, with nMBE taking values between −19.0% (reached by E2) and +18.2% (reached by PB).
Secondly, the models’ performance in estimating DHI from kd estimates through Equation (2) was tested. The results are summarized in Figure 4b in terms of nRMSE. Overall, the nRMSE range is wider than in the case of DNI, making the distinction between the models more visible. The determination coefficient R² of M1 falls between 0.428 and 0.904. The minimum value is reached at the arid climate station in Solar Village, Saudi Arabia. The maximum value is achieved at the continental climate station in Toravere, Estonia. The determination coefficient R² of M2 falls between 0.427 and 0.892. The minimum value is reached at the arid climate station in Tamanrasset, Algeria. The maximum value is also achieved at Toravere.
In terms of nRMSE, M1 achieves the best result at the temperate station in Cabauw, Netherlands (nRMSE = 23.1%), while the worst is obtained at Tamanrasset (nRMSE = 51.9%). M1 experiences the minimum bias at the continental station in Budapest, Hungary (nMBE = −0.18%), while the worst result is obtained at the arid station in Alice Springs, Australia (nMBE = 26.79%). M2 achieves the best nRMSE at Toravere (nRMSE = 25.4%), while the worst is obtained at Tamanrasset (nRMSE = 52.9%). The lowest bias is achieved by M2 at Toravere (nMBE = 0.25%), while the maximum is obtained at Tamanrasset (nMBE = 29.2%). Although this last value denotes a very large bias, we underline that nMBE for M2 exceeds 10% at only six of the thirty-six stations.
Overall, the performance of the reference models in estimating DHI is modest, with nRMSE falling between 23.7% (reached by S1) and 82.5% (reached by E2). At some stations, DHI estimates are significantly biased, with nMBE falling between −26.0% (reached by PB) and +54.5% (reached by E2). At most stations, the reference models experience a reasonable bias, with the 1st and 3rd quartiles of nMBE taking the following values, respectively: −5.8% and +3.8% for PB, −10.7% and +0.7% for E2, −5.2% and +4.6% for S1, −3.7% and +4.7% for S3, and −8.9% and +0.8% for Y4.
5.2. Model Ranking
We conclude the models’ evaluation with a statistical ranking. The overall performance of a model was calculated on the basis of the linear ranking method [5]. Thus, the mean rank of a model is calculated as a weighted average:
$$\bar{r}_i = \frac{\sum_{k=1}^{7} k\, n_{i,k}}{\sum_{k=1}^{7} n_{i,k}} \qquad (7)$$
where $\bar{r}_i$ is the mean rank of the i-th model (i = PB, E2, S1, S3, Y4, M1, and M2), with $n_{i,k}$ being the number of stations where model i is at position k in the nRMSE hierarchy. For each station, the best model is ranked 1st, and the worst-performing model is ranked 7th.
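The ranking procedure can be sketched as follows: the models are ordered by nRMSE at each station, the number of times each model occupies each position is counted, and the rank-weighted average of those counts gives the mean rank. The nested-dictionary input layout, the function name `mean_ranks`, and the three-station example are hypothetical; the nRMSE values shown are made up and are not results from the paper.

```python
def mean_ranks(nrmse_by_station):
    """Mean rank of each model, given {station: {model: nRMSE}}."""
    models = sorted(next(iter(nrmse_by_station.values())))
    counts = {m: [0] * len(models) for m in models}  # counts[m][k-1]: stations with m at rank k
    for scores in nrmse_by_station.values():
        # Lower nRMSE is better; ties keep alphabetical model order (stable sort)
        ordered = sorted(models, key=lambda m: scores[m])
        for position, model in enumerate(ordered):
            counts[model][position] += 1
    # Weighted average of the rank positions, weighted by the station counts
    return {
        m: sum((k + 1) * n for k, n in enumerate(c)) / sum(c)
        for m, c in counts.items()
    }


# Made-up nRMSE values at three fictitious stations
toy = {
    "STA1": {"M1": 0.20, "S1": 0.30, "E2": 0.40},
    "STA2": {"M1": 0.30, "S1": 0.20, "E2": 0.50},
    "STA3": {"M1": 0.25, "S1": 0.35, "E2": 0.30},
}
print(mean_ranks(toy))
```

A mean rank close to 1 indicates a model that is the best, or nearly the best, at most stations, regardless of how large its margin is at any single station.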
Table 2 presents the ranking of the new and reference models based on the estimation of
DNI and
DHI. Each column in the table corresponds to the mean ranking (Equation (7)) of a particular model. The proposed model M1 is ranked first in estimating both
DNI and
DHI. For
DNI estimation, M1 achieves the first position with a mean rank of
, and is closely followed by S1 (
). For
DHI estimation, M1 also achieves the first position with a mean rank of
, closely followed by S3 (
). The detailed ranking of the models at the stations from D_TEST, on which the mean ranking (
Table 2) was built, is displayed in
Figure 5. Visual inspection shows that M1 performs best at the largest number of stations, taking first place at 16 stations for DNI estimation and at 17 stations for DHI estimation. The high accessibility of M2 does not compromise its accuracy.
Figure 5 emphasizes M2 as a real competitor among the best-performing current models, taking second place at three stations for DNI estimation and first place at five stations for DHI estimation.
Figure 5 also displays the models’ sensitivity to climate. The stations are clustered by climate zones, according to the Köppen–Geiger climate classification [27]. Except for the tropical climate A, where S3 is the most accurate model (with a mean rank (Equation (7))
for DNI estimation and
for DHI estimation), in all the other climates, the proposed model M1 performs with the highest accuracy. Thus, for DNI estimation, M1 achieves the following mean rank:
in arid climate B,
in both temperate climate C and in continental climate D. For DHI estimation, M1 achieves the following mean rank:
in arid climate B,
in temperate climate C and
in continental climate D.
6. Conclusions
This paper introduced two new separation models, which split global solar irradiance GHI into its direct-normal DNI and diffuse horizontal DHI components. The models effectively estimate the diffuse fraction at minute-scale resolution. Unlike the already established minute-scale models, the new models are defined by rational polynomial equations. The first model, M1, operates with eight predictors. The M1 equation is far from elegant, but it passes the performance test. By contrast, the second model, M2, is defined by a compact equation. It operates with only three predictors derived from GHI measurements, which gives it high accessibility.
Validation of the new models was carried out against data collected from 36 stations covering the four major climatic zones. Five current top minute-scale separation models were used as references. The tests were performed on the final product estimations, i.e., DNI and DHI. Based on a statistical linear ranking method according to the models’ performance at every station, M1 leads the hierarchy, ranking first in both DNI and DHI estimation. The high accessibility of M2 does not compromise its accuracy. The results place M2 among the best-performing current models.
Finally, we can conclude that all the models, both the proposed and the reference ones, seem to evolve in tandem (with more or less similar accuracy), which could suggest a limit that cannot be crossed within the traditional estimation process. In order to further increase the accuracy of separating the global solar irradiance into its primary components, a new approach to the development of algorithms is necessary. Future work will pursue this direction.