1. Introduction
The Sun is considered a fundamental resource for most physical and chemical processes on Earth. Thus, processes related to the Sun are important for researchers [
1,
2]. The application of solar energy for different aims as a renewable energy source is an important priority for researchers [
3]. Renewable energy technologies are fast becoming more effective and cheaper and their application is widely growing. Solar radiation is one of the most important resources of renewable energy [
4]. In fact, the wide application of solar energy as renewable energy can decrease greenhouse emissions. Renewable energy, such as solar radiation, has a low effect on the environment. In this context, decision makers should know the available solar radiation in order to be able to identify the expected generation of renewable energy. Therefore, the development of an accurate forecasting model and generation of zone mapping for solar radiation is of vital importance for decision-makers. In addition, with the world becoming warmer, renewable energies can balance the ecological conditions and the production of domestic power can be carried out based on the use of solar energy [
5,
6].
In fact, Solar Radiation (SR) affects climate processes, and agricultural production is governed by solar energy [
7]. SR is necessary for plant growth, photosynthesis, and regulation of the growth duration. The estimation of SR has many applications in different fields, including agricultural engineering, building engineering, the energy and hydrology fields, and power and heat production [
8]. For example, the accurate estimation of irrigation is considered an important issue for irrigation planning and design [
9]. Additionally, SR is considered an important issue for evaporation computation [
10]. Thus, access to solar data and an accurate model for the prediction of SR data are important for the leverage of the solar energy potential in particular locations. There are different empirical models and equations for SR estimation. These models are highly important because of the economic limitation and measurement complexity of SR estimation in some locations, rendering empirical models suitable for radiation estimation [
11]. Additionally, remote sensing and satellite images can be considered effective tools for solar energy estimation. However, these empirical models require various parameters that might be too complex to be accurately estimated in some locations. In addition, some models cannot provide a good estimation of SR under changing climate conditions and other conditions [
12]. The station height from sea level, longitude, latitude, relative humidity, and pollution accumulation in the atmosphere affect radiation estimations. Thus, the pollution content is an important and influential issue affecting SR, and the application of accurate tools for radiation estimation under the effect of different parameters is very important for decision makers [
13]. Soft computing methods can be effective as powerful tools based on large data sets because these methods can accurately simulate hydrological or other variables. These methods can effectively adapt to climate and hydrological boundaries, decrease the computational time, and ensure high accuracy in hydrological predictions [
14]. In addition, soft computing methods can be effective tools for estimating SR when different parameters, such as particle pollution, temperature, humidity, etc., have a significant effect on SR [
15]. The present study attempts to simulate the SR in East Azarbayejan in Iran in some stations in the presence of particle pollution. The current article presents self-organizing maps (SOMs) with a multi-objective shark algorithm (MOSA), and the fuzzy method is applied to select the optimal input combinations, identify the best value of the ANFIS parameters, and generate the spatial distribution of SR. The SOM is used as a clustering method to identify impactful spatial SR values, and the results are compared with those obtained using the multi-objective genetic algorithm (MOGA) and multi-objective particle swarm optimization (MPSO).
2. Background
An ANN model based on different numbers of inputs in different cities in Turkey was previously used to simulate SR [
16]. The results indicated that using a pre-selection procedure is necessary for the determination of the inputs because the elimination of some input parameters could significantly decrease the accuracy of the models. The ANN used for different climate change conditions had acceptable performance for SR based on the accuracy of the parameter selection.
Another study simulated the monthly and daily total global SR based on an artificial neural network (ANN) [
17]. The soil temperature, wind speed, temperature, and mean monthly rainfall were used to estimate SR. The results indicated that the mean absolute percentage error (MAPE) based on the ANN was approximately 5.34% and that the ANN method decreased the MAPE compared with the output of the regression method with a high correlation.
The auto regressive moving average mode (ARMA) and multi-layer perceptron (MLP) have been used as hybrid models to simulate hourly SR [
18]. The results indicated that the hybrid model had a lower root mean square error (RMSE) than the simple MLP model, and the necessary data for this research were obtained based on a numerical weather simulation model.
A regression model was used to estimate hourly SR values [
19]. The relative humidity, atmospheric pressure, air pollution index, and mean rainfall were used as inputs in the models. The results indicated that the models incorporating the air pollution index could produce a relatively accurate estimate of SR with a high Nash–Sutcliffe efficiency value and a small RMSE value.
The daily SR has also been predicted by a support vector machine (SVM) [
20]. The SVM method was used based on the sunshine ratio as an effective input, and the results were compared with those obtained using empirical models. Compared to the empirical models, the SVM models could significantly reduce the relative error and provide more accurate predictions of the winter season.
Vakili et al. [
21] simulated the daily SR based on an ANN while considering the suspended particulate matter. The temperature, rainfall, humidity, and suspended particle characteristics were used to simulate SR in Tehran Province, Iran, and the results indicated that compared to the other applied methods’ input structure, the ANN considering the suspended particles could simulate SR with the lowest error in terms of the RMSE and mean absolute error.
The hourly global horizontal irradiance (HGI) was estimated based on Meteosat imagery and an ANN [
22]. The data were obtained from a radiometric station. The Heliosat-2 model was used to compare the results with those obtained using an ANN. The results obtained by the ANN based on different sky conditions had a significantly lower RMSE value than those obtained by the Heliosat-2 model.
Celik et al. [
23] applied an optimized ANN for SR estimation over the Eastern Mediterranean. The results indicated that the daily SR could be predicted based on the optimized ANN with high accuracy such that the optimized model could accurately determine the number of hidden neurons and weights in the ANN, and the RMSE was decreased by approximately 10% to 12% compared to that obtained using regression methods.
A moderate-resolution imaging spectroradiometer (MODIS) and an ANN have also been used to obtain SR estimates [
24]. Land surface temperature (LST) data were used as input data to the ANN, and the results indicated that the relative error of the ANN was 5.35%, while the error of the regression models was 10.23%.
The new daily SR model (NDSRM) using the air quality index (AQI) was applied in multiple cities [
25]. The results differed according to whether the AQI was added or removed from the inputs such that the predicted SR based on the elimination of the AQI in some cities had high accuracy, whereas the model accuracy depended on the AQI value as input in other cities.
The ANN model and inverse distance weight (IDW)-based model were used to simulate SR at distances greater than 50 km [
26]. The results indicated that the IDW model had an RMSE of approximately 6.4%, while the RMSE of the ANN model was 5.11%, and the IDW model was simpler and more accurate than the ANN model.
Yoe et al. [
25] applied the SVM to SR estimation based on the air quality index. The results indicated that the Nash–Sutcliffe efficiency varied from 0.682 to 0.740, and the models with the AQI input provided a higher accuracy in solar estimation than those without this input.
The daily SR considering the air quality index was simulated by a support vector machine (SVM) in a large region with different climates [
2]. The results indicated that the elimination of the AQI input among the other inputs could significantly decrease the accuracy of the estimation model; the SVM featured a high accuracy, and simple structures were found to have a high ability to absorb SR.
Fan et al. [
27] applied an SVM for SR estimation while considering atmospheric particulate matter (PM). Daily metrological and air pollution data were used to simulate SR. The inclusion of PM with a diameter of 2.5 micrometres (PM 2.5), PM 10, and O
3 in the input combinations were considered the best combination of inputs and improved the results of the SVM.
Furthermore, many research efforts have attempted to simulate SR based on soft computing methods under different conditions, such as using air pollution data. Different air quality indexes have been used to evaluate the air quality, such as the air pollution index (API) and air quality index (AQI) [
28]. The AQI is used to provide a daily evaluation of the air quality. This index is used to present the air quality to the population while focusing on the respiratory effects that can be observed some hours or days after exposure [
29]. The AQI is computed based on models or air monitors considering nitrogen dioxide, ozone, carbon monoxide, and sulfur dioxide [
30]. This index ranges from 0 to 500, which is divided into different classifications, and each classification is related to different levels of human health. The country of Iran experiences considerable air pollution in different cities; this air pollution is usually measured and evaluated with the index, and the classification changes accordingly [
31]. For example, when the index value is within the range of 0 to 50, the air quality is good, and when the index value is greater than 300, the air quality is very dangerous. A literature review of the pollutant particle effect on SR shows that several studies have considered the effect of air quality on SR [
31].
Thus, the main purpose of the current paper is to evaluate the effect of air quality on SR using soft computing models to provide a comprehensive discussion concerning the influences of the air quality on the accuracy of SR prediction. In this study, the ANFIS model was selected as the soft computing method because ANFIS is suitable for predicting stochastic nature variables, such as SR. However, in the ANFIS model procedure, there is a need to initialize a few internal parameters that are usually selected using the trial-and-error process. The selection of the optimal values of these parameters significantly affects the accuracy of the model performance. In this context, there is a need to optimize the value of the ANFIS’s internal parameter to ensure an acceptable level of prediction accuracy. In fact, the shark algorithm is widely accepted and has been successfully applied in the fields of water resources and power generation, mathematical simulations, the design of trusses, and other fields [
32,
33]. Therefore, the shark algorithm is used as an effective optimization tool to obtain the optimal parameters for the ANFIS. The rotational movement of the shark in this algorithm improves the ability to search for the global optima of the ANFIS’s internal parameters. The proposed integrated adaptive neuro-fuzzy inference system with multi-objective shark algorithm (ANFIS-MOSA) model was examined in SR prediction in three different case studies. In addition, comprehensive analyses were carried out to compare the proposed model to other models.
4. Case Study
The current study considers SR over Ardebil, Gilan, and East Azarbayejan. East Azarbayejan has an area of approximately 47,830 km
2 (
Figure 1). The northern extent of East Azarbayejan is a part of the Republic of Azarbayejan and Armenia. Zanjan Province is located in the south of this province, and the Sahand Mountain, which has a height of 3707 m, is one of the highest points in the province. The annual temperature from 2008 to 2018 ranged between 25 and −15 °C. This province is located between the longitudes of 45°0′ E and 47°50′ E and latitudes of 36°50′ to 39°50′. This province is mountainous as follows: 40% of the province area has mountainous conditions. The climate in this province is cold and dry, but the different topographies cause variations in the climates. The maximum temperature, precipitation, and number of sunny hours obtained from three stations, i.e., Ahar (longitude: 47°48′ and latitude: 38°28′), Bonab (longitude: 45°70′ and latitude: 37°20′), and Sarab (longitude: 47°23′ and latitude: 37°51′), were considered. Additionally, the AQI values at these 3 stations or cities were obtained based on air quality monitoring. Ardebil Province occupies an area of approximately 3.17 km
2 between the longitudes of 47°00′ E and 48°50′ E and latitudes of 37°00′ N to 40°00′ N. Different climates, such as Mediterranean, moderate Mediterranean, and mountainous climates, can be observed in this province. The average annual precipitation in this province is 462.5 mm, and the annual temperature varied between −3 °C (minimum temperature) and 34 °C (maximum temperature) from 2008 to 2018. The following three stations in this province were used to characterize the temperature, precipitation, sunny hours, and AQI data: Ardebil station (longitude: 47°29′ and latitude: 38°00′), Ebrahimabad (longitude: 48°29′ and latitude: 38°22′), and Phyroozabad (longitude: 48°20′ and latitude: 37°59′). Gilan Province is located in the north of Iran within the boundary demarcated by these stations.
A moderate climate is observed in this basin. The area of this province is 14,044 km
2, and the average annual precipitation varied from 1200 to 1800 mm during the period from 2008 to 2018. The climatology-oriented Rasht Institute (longitude: 49°39′ and latitude: 37°12′), Anzali (longitude: 49°39′ and latitude: 38°20′) and Astara (longitude: 48°51′ and latitude: 38°21′) are used to record the different data. The AQI at the different stations is computed based on the following equation:
where
I is the air quality index,
C is the pollutant concentration,
is the concentration extreme equal to or lower than
C,
is the concentration extreme equal to or greater than
C,
Ihigh is the index breakpoint related to
, and
Ilow is the index breakpoint related to
.
Table 1 shows the different classifications of AQI (Air Quality Index).
Figure 2 shows the average AQI during different months in Ardebil, East Azarbayejan, and Gilan Provinces. Clearly, the greatest variation in East Azarbayejan can be observed in January as follows: The minimum AQI in January is 46, and the maximum value is 74. The lowest variation in East Azarbayejan is observed in May (minimum AQI = 60 and maximum AQI = 62). The highest AQI value is observed in March (AQI = 79.5) in East Azarbayejan, and the lowest AQI value in East Azarbayejan Province (EAZP) occurs in April. This index in EAZP shows that the first quartile and third quartile in most months exhibit an AQI > 50, and the conditions appear to differ only in April. The other values of the AQI index can be observed in the other provinces.
Table 2 shows the variation in the average temperature during the different months during the period of 2008–2018. July exhibits the greatest variation based on the high value of the variation coefficient in Azarbayejan (EAZP), and the maximum temperature in this province occurs during this month. The minimum temperature in Ardebil Province (AP) occurs in January, and the greatest variation in this province can be observed in May. The other details are listed in
Table 2.
Figure 3 shows the precipitation values in the different provinces. For example, the highest precipitation value in AP occurs in March, and the greatest variation in precipitation in AP is observed in this month. The lowest precipitation value in AP occurs in December. The other details of the other provinces are shown in
Figure 3. Additionally, the number of sunny hours at different stations is shown. For example, the number of sunny hours (NSN) in June in Gilan Province (GP) is as follows: The most variation in the NSN occurs in June, and the lowest variation can be observed in July. The other details are shown in
Figure 3.
The inverse distance weight (IDW) method was used to obtain the SR in the different zones. This method has an effective feature allowing the optimal value to be obtained based on a multi-objective optimization framework as follows:
where
is the estimated precipitation at each point,
Di is the difference between the predicted and observed data, and
q is the power parameter.
Three objective functions are considered in the ANFISmulti-objective optimization algorithm [
40]. The
obg function is considered to select the best input parameters for the ANFIS, and the RMSE is used as an objective function to obtain the best value of the ANFIS parameters. The general standard deviation (GSD) is used to obtain the optimal power parameter for the IDW:
where
is the training data number, RMSE
train is the root mean square error of the training data,
is the root mean square error of the test data,
is the mean absolute error of the training data,
is the mean absolute error of the test data,
is the correlation coefficient of the training data,
is the correlation coefficient of the test data, T
i represents the simulated data,
is the average value of the simulated data at different points, and O
i represents the observed data. Lower RMSE, MAE, and GSD values are considered better. First, the ANFIS model is considered based on the initial estimates of the linear and nonlinear parameters, and different components (N
s: Number of sunny hours, T
max(t−3): Maximum temperature with a three-month lag, T
max(t−6): Maximum temperature with a 6-month lag, T
min(t−3): Minimum temperature with a three-month lag, T
min(t−6): Minimum temperature with a six-month lag, Rainfall
(t−3): Precipitation value with a three-month lag, Rainfall
(t−6): Precipitation value with a six-month lag, and the AQI indexes) are used as inputs. The ANFIS simulates the results. The IDW is used to simulate SR in the different zones, and then, the Multi-Objective Shark Algorithm (MOSA) is used based on the initial population used for the selection of inputs, the optimal determination of the adaptive neuro-fuzzy inference system (ANFIS) parameters, and the selection of the power parameters for the IDW. As shown in
Figure 4, the different operators apply the candidate solutions, and then, these solutions are returned to the ANFIS subroutine for another simulation iteration. If the stopping criteria are satisfied, the process finishes with the optimal results. The modified technique for order preference by similarity by the ideal solution (M-TOPSIS) is used to select the best solution from the Pareto form based on the following equations:
where
xj+ is the ideal solution (largest maximization criterion value or smallest minimization criterion value),
xj− is the negative ideal solution (largest minimization criterion value or smallest maximization criterion value),
is the distance from the ideal solution,
is the distance from the least ideal solution,
xij represents the results of alternative
i considering criterion
j, and
is the similarity ratio, and this index of the solution is sorted by descending values to show the rank of each solution.
Additionally, the following indexes are used to select the best multi-objective algorithm:
● Cover Surface (CS)
This index presents the relative score of the solutions in set B that are weakly dominated by set A as follows:
If the index value equals 1, all solutions in set B are weakly dominated by those in set A, and if the index value equals −1, any solution in set B is dominated by the solutions in set A. However, the index value can have values other than 1 and −1, which could indicate that the number of solutions in set A is covered by those in set B.
● General Distance (GD)
This index shows the closeness value of the computed Pareto solutions to the true Pareto solution. If
Q is considered a set obtained by the MOSA, the GD is computed based on the following equation:
where
P* is a reference solution (a set of all possible true Pareto solutions),
is the distance of the solution obtained by the algorithm to the best solution, and
is the m-objective value of the
kth member of
P*. A lower value of this index is more favorable for decision makers.
● Spread Index (SI)
The SI presents the diversity value of the obtained and archived solutions among the non-dominated solutions.
where d
i is the Euclidean distance between successive solutions among the obtained non-dominated solutions,
is the average of all distances d
i,
N is the number of solutions in the best non-dominated front, and
is the computed distance of the extreme solution between the obtained Pareto of the m
oth objective and the true optimal Pareto.
● Spacing Metric (SM)
This index is computed by measuring the distances of successive solutions in a non-dominated front and shows an evaluation of the spread of vectors in the total set of non-dominated solutions.
where
,
and
fik is the value of the
ith objective function of the
kth member.
6. Conclusions
The current paper aimed to simulate SR based on the ANFIS-MOSA, and the IDW was used to obtain zone maps of three provinces. Pareto solutions were obtained using different algorithms and the ANFIS model. The different indexes showed that ANFIS-MOSA performs better than the other models, and the low value of the RDI index showed that the Pareto obtained using the MOSA and ANFIS matched well with the ideal solution. The MTOPSIS model was used to select the optimal solutions, and different indexes, such as the RMSE, MAE, and Taylor diagram, showed that the obtained ideal solution performance was the highest for the Pareto solution with a significant difference. Then, the effect of the AQI parameter on the results was analyzed. The results showed that the elimination of the AQI parameter decreased the accuracy of the zone map. Additionally, different models with and without the AQI parameter were considered, and the results showed that the error index without the AQI parameter was significantly higher. Finally, the uncertainty of the obtained data was considered to determine its effect on the results, and the high p factor value and low d factor value illustrated the adequate performance of the proposed model. The proposed model can not only simulate SR with an acceptable level of accuracy but also add a new direction to include multi-objective functions to evaluate the performance of the prediction model. For example, to improve the performance of the proposed model, another objective function could be considered to represent the risk performance, such as experiencing ± maximum errors. In this context, an objective function that represents the risk performance could be added to address the probability occurrence of the ±maximum errors at any time during the span of the prediction time.
In fact, the current research focused on studying the performance of the proposed model considering the time period dimension. However, there is an important dimension that could be considered for further analysis, which is the importance of a certain parameter at a specific location. In this context, it is essential to recommend this direction of research to be carried out in future research.