1. Introduction
Because of the growth of the world’s population, the agricultural sector has focused primarily on increasing the quantity of food produced without degrading its quality. These requirements have highlighted the need for advanced technologies. Today, the agricultural sector is growing rapidly, with computers, artificial intelligence, robotics, and the Internet of Things (IoT) transforming traditional agriculture into a smart version of itself. In the era of smart agriculture, large amounts of data received from various sources, at various times and locations, help to better monitor and manage crops. This saves natural resources, reduces energy consumption, and allows people to be more prudent and efficient in avoiding products (e.g., pesticides) that could cause serious problems not only for the ecosystem but also for human health. With the help of innovative techniques and technologies, smart agriculture can benefit every stage of the production process, from planting to the final harvest [1].
Greenhouses, which can produce food out of season in quantities that meet the needs of the ever-growing population, offer significant advantages in agriculture. High greenhouse efficiency requires control of the microclimate inside the greenhouse, mainly the temperature and relative humidity. Deviation of these parameters from the desired levels can cause plant stress and makes the growth of pathogenic microorganisms inside the greenhouse likely, which is usually destructive to the crop [2,3]. More specifically, exposure of plants to extreme temperatures can negatively affect both productivity (from the reproduction stage to fruit formation) and the phenological growth of plants [4,5]. An increase in relative humidity beyond the desired range is considered not only to favor the growth of pathogenic microorganisms but also to delay plant growth by impeding nutrient absorption [4]. Conversely, proper control and regulation of the greenhouse microclimate results in a greater quantitative and qualitative food yield and also brings significant economic benefits. The largest share of the energy consumption of a greenhouse (~90%) is due to the so-called basic energy consumption, which includes, among other things, heating, cooling, and dehumidifying, the latter corresponding to almost 50% of the total energy consumption of the greenhouse [6]. Finally, environmental benefits can also be achieved through better management of energy consumption and water use [5].
In Greece, the total arable area amounts to about 132 million acres. Of these, only 61,000 acres are covered by greenhouse installations. These greenhouses are mainly used to produce vegetables (about 92%), whereas the remaining 8% corresponds to ornamental crops. Most greenhouse facilities are in the region of Crete (~45%), followed by the region of Peloponnese (~15%). The rest are scattered throughout the country. The main greenhouse cover material in Greece is plastic (polyethylene): 95.18% of greenhouse vegetable production takes place in such greenhouses, and the remainder in glass-covered greenhouses [7,8].
Greenhouses are complex, nonlinear, dynamic systems whose internal microclimatic conditions depend strongly on the constantly changing external conditions. Parameters such as the temperature and relative humidity inside the greenhouse are influenced not only by the external conditions but also by processes that take place inside the greenhouse. For example, evapotranspiration, the release of water vapor from both plants and soil, is a main driver of the relative humidity inside the greenhouse. The temperature depends strongly on the incident solar radiation, not only because thermal energy is transferred directly to the indoor air but also because energy is stored in the ground, which greatly modulates the indoor temperature during the nighttime. The interaction between the indoor climatic parameters is intense: the relationship between temperature and relative humidity is direct, and other parameters, such as the concentration of carbon dioxide (CO2), also contribute significantly to the system.
Within the atmospheric boundary layer, which extends from the surface up to a few hundred meters or as much as 1–2 km, the meteorological conditions (e.g., temperature, humidity, wind speed, etc.) are strongly affected by the ground. This leads to strong diurnal variability and makes it imperative that the various processes be investigated at time steps as short as possible [9]. This intense variability of the external environment of a greenhouse, combined with the strong interaction between the external and indoor conditions, makes the control of the greenhouse microclimate particularly difficult. Therefore, to better manage the yield and the energy consumption of a greenhouse, the desired greenhouse climate should be maintained with the help of a computational decision support system (DSS) that adapts flexibly to the continuous changes in the external environment and takes decisions on many different time scales. The models on which these systems are based should be as accurate as possible, so that the operation of the various actuators (e.g., for heating, cooling, ventilation, etc.) is targeted and as immediate as possible, whether the actuators are passive or mechanical ones that require electricity [10,11,12].
The approach to modeling the microclimate of a greenhouse depends on the purpose. There are two main categories of models: physical models, which are used mainly when the purpose is to study and understand the natural processes that take place within the greenhouse, and black-box models, which are used mainly when the objective is the application and design of systems for greenhouse management. Developing a physical model is difficult, especially because the greenhouse is a complex, nonlinear system. Such a model is based mainly on the laws of thermodynamics and of heat and mass transfer. Therefore, parameters such as solar irradiance and the factors related to heat and mass transfer need to be calculated with high accuracy [13,14,15]. Black-box models, on the other hand, are based on the system identification (SI) process, a methodology that depends mainly on experimental input and output data. These models provide an effective and accurate description of the behavior of various parameters without the need to model the internal system processes [16]. Developing such a model involves measuring the signals at specific times, choosing a model structure, selecting how the model parameters will be estimated, and validating and evaluating the model using a dataset separate from the one used for the rest of the process [17].
As mentioned above, greenhouses are dynamic, nonlinear, and highly complex systems with constant interactions between the microclimate variables. These properties, combined with the need for models that can be applied directly to greenhouse control, make black-box models, and artificial intelligence techniques in general, particularly valuable. One of these techniques is the modeling and control of the greenhouse microclimate through neural networks. Neural networks (NN) are based on the logic of the biological neural system, in which signals received by a cell body through a network of dendrites are transferred to other neurons through a fiber called an axon. The connection of an axon with a dendrite of a different cell body is called a synapse. The state of such a system depends largely on the arrangement of these elements, and the strengths of the synapses increase or decrease through the process of learning [18]. In an artificial neural network (ANN), the signals are transferred from the nodes (neurons) of the input layer to the nodes of a “hidden” layer, which are activated according to the strength of the signals they receive, and finally to the nodes of a final layer called the output layer. To produce good results, this process requires a careful choice of how the neurons are connected (topology), of the learning algorithm, of the number of hidden layers and nodes, and of the variables that introduce the information into the network [19].
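The signal flow described above can be illustrated with a minimal sketch of a single forward pass. The layer sizes (10 inputs, 7 hidden nodes, 2 outputs) and the logistic sigmoid and linear activations are the ones this paper reports for its final model; the random weights, the zero biases, and the helper name `random_layer` are placeholders introduced here for illustration only and are not the trained parameters.

```python
import math
import random


def logistic(x: float) -> float:
    """Logistic sigmoid activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One forward pass through a perceptron with a single hidden layer.

    w_hidden holds one weight row per hidden node; w_out holds one row
    per output node.
    """
    # Hidden layer: weighted sum of the inputs plus bias, then sigmoid.
    hidden = [
        logistic(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: linear, i.e. weighted sum of hidden activations plus bias.
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]


def random_layer(n_out, n_in, rng):
    """Hypothetical helper: small random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
w_h, b_h = random_layer(7, 10, rng)  # 10 inputs -> 7 hidden nodes
w_o, b_o = random_layer(2, 7, rng)   # 7 hidden nodes -> 2 outputs
outputs = forward([0.5] * 10, w_h, b_h, w_o, b_o)  # two untrained outputs
```

Training consists of adjusting the entries of the weight and bias lists so that the two outputs match the observed values; that step is not shown here.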
Neural networks are widely used in greenhouses not only for the control but also for the prediction of microclimatic parameters, with very good results. Research in this field concerns the use of different network architectures, such as feedforward or recurrent neural networks, and of different training algorithms. More specifically, Singh & Tiwari [2] tested a feedforward NN with a single hidden layer and a varying number of nodes to predict the indoor temperature and relative humidity one day ahead. After several experiments with three to ten hidden nodes, the structure with four hidden nodes was found to give the best results, with coefficients of determination of 0.980 and 0.967 for temperature and relative humidity, respectively. Castañeda & Castaño [20] created a multilayer perceptron (MLP) neural network, trained with the Levenberg–Marquardt algorithm, that predicts the indoor temperature in order to control frost inside the greenhouse; the calculated coefficients of determination were 0.9549 and 0.9590 for the winter and summer seasons, respectively. Choi et al. [21] also trained an MLP neural network, using both external and internal greenhouse conditions as input variables, to predict the indoor temperature and relative humidity 10 to 120 min ahead. The neural network consisted of four hidden layers, with a different number of nodes for temperature and relative humidity, and gave coefficients of determination of 0.988 and 0.990, respectively.
Recurrent NNs are also widely used, with the Elman structure being among the best known. Hongkang et al. [22] created an Elman network based on a dynamic backpropagation training algorithm. As input variables, they used parameters of the internal environment, such as air and substrate temperature, relative humidity, CO2 concentration, and illumination. This model gave coefficients of determination greater than 0.9; more specifically, the values for temperature and relative humidity were 0.925 and 0.937, respectively. Moreover, Salah & Fourati [23] combined an Elman network with a deep multilayer feedforward (FF) neural network for greenhouse control. The first network was used to simulate the direct dynamics of the internal processes, whereas the second one was used for the inverse dynamics. Finally, Taki et al. [24] compared three different models for the estimation of three temperatures inside the greenhouse: the air temperature, the soil temperature, and the plant temperature. Of the radial basis function (RBF) model, the MLP, and the support vector machine (SVM) model compared, the RBF model gave the best results.
The present study aims to create a model using artificial neural networks (ANN) that can predict the temperature (Tin) and relative humidity (RHin) inside the greenhouse, based on the outside temperature (Tout) and relative humidity (RHout), the wind speed (WS), the solar irradiance (SR), and the internal temperature and relative humidity up to half an hour earlier. An extensive search of the recent literature found no similar study. The goal is for the model to show as low a maximum error as possible between the predicted and observed data so that it can adequately serve a decision support system (DSS).
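As an illustration of how such an input vector could be assembled, the sketch below builds ten inputs (four outdoor variables plus three lagged values each of Tin and RHin) and two targets from a time-ordered record list. Several points are assumptions made here rather than details stated in this section: the function name `build_samples`, the dictionary-based record layout, the use of outdoor values from the current timestep, and the ordering of the inputs. Three lags correspond to half an hour only if the logging interval is 10 min, which is inferred from the "three timesteps" mentioned in the Conclusions.

```python
def build_samples(records, n_lags=3):
    """Build (inputs, targets) pairs from a time-ordered list of dicts.

    Each record is assumed to hold the keys Tout, RHout, WS, SR, Tin, RHin.
    The first n_lags records only serve as history and yield no sample.
    """
    samples = []
    for t in range(n_lags, len(records)):
        now = records[t]
        # Outdoor conditions at the current timestep (assumption).
        x = [now["Tout"], now["RHout"], now["WS"], now["SR"]]
        # Indoor temperature and humidity at t-1, t-2, ..., t-n_lags.
        for lag in range(1, n_lags + 1):
            past = records[t - lag]
            x.extend([past["Tin"], past["RHin"]])
        # Targets: indoor temperature and humidity at the current timestep.
        y = [now["Tin"], now["RHin"]]
        samples.append((x, y))
    return samples
```

With `n_lags=3`, each input vector has 4 + 2 × 3 = 10 elements, which matches the ten input nodes of the 10-7-2 structure reported in the Results.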
3. Results and Discussion
In the present study, an MLP neural network was used to predict the temperature and relative humidity inside the greenhouse. To find the best architecture, and more specifically the number of nodes in the hidden layer, the model was trained and tested for the corresponding periods mentioned in Section 2. After repeating the procedure for a different number of nodes each time (from 1 to 20), and based on the statistical indices MAE, RMSE, R², and the maximum error, the best structure of the neural network was found to be 10-7-2, which gave the most reliable results for the testing period. The model structure extracted from MATLAB is presented in Figure 9, whereas the values of the aforementioned indices are presented in Table 3, for both temperature and relative humidity. The graphs that follow compare the observed and predicted values of the two variables for this structure.
The model training process was completed after 89 epochs (Figure 10), when the validation mean square error (MSE) had increased six times in a row. The network from epoch 83 was selected, where the validation error reached its minimum value of approximately 5.447 × 10⁻⁵. This value was computed on the normalized input and output variables and represents the best performance of the model.
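The stopping behavior described above can be sketched as a patience-style rule: training halts once the validation error has failed to improve on its best value for six consecutive epochs, and the epoch with the lowest validation error is kept. This is a generic illustration; the function name `early_stopping`, the parameter name `max_fail`, and the comparison against the best value so far are assumptions, and the exact criterion applied by the authors' toolbox may differ.

```python
def early_stopping(val_errors, max_fail=6):
    """Return (best_epoch, stop_epoch), both counted from 1.

    Training stops once the validation error has failed to improve on its
    best value for max_fail consecutive epochs. If that never happens,
    stop_epoch is the last epoch in the sequence.
    """
    best_err = float("inf")
    best_epoch = 0
    fails = 0
    for epoch, err in enumerate(val_errors, start=1):
        if err < best_err:
            # New minimum: remember it and reset the failure counter.
            best_err, best_epoch, fails = err, epoch, 0
        else:
            fails += 1
            if fails >= max_fail:
                return best_epoch, epoch
    return best_epoch, len(val_errors)


# Synthetic validation curve (not the paper's data): the error falls for
# 83 epochs and then rises.
curve = [1.0 / e for e in range(1, 84)] + [0.02 + 0.001 * k for k in range(1, 11)]
best_epoch, stop_epoch = early_stopping(curve)
```

On this synthetic curve the rule keeps epoch 83 and stops at epoch 89, the same relationship between the two epoch numbers as in the text.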
Regarding the greenhouse temperature, the MAE and RMSE, calculated at 0.218 K and 0.271 K, respectively, show that the model predicts the temperature satisfactorily. At the same time, the values of MAE and RMSE differ only slightly from each other, indicating the absence of large errors. The same holds for the relative humidity, with the MAE and RMSE being very small and equal to 0.339% and 0.481%, respectively. The coefficient of determination is the same for both parameters and equal to 0.999, which indicates the very good performance of the model. Finally, the maximum errors are equal to 0.877 K and 2.838% for temperature and relative humidity, respectively.
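For reference, the four indices can be computed as in the sketch below. The formulas are the standard textbook definitions; in particular, R² is taken here as 1 minus the ratio of the residual to the total sum of squares, which is an assumption, since the paper's exact formula is not given in this section. The function name `error_metrics` is hypothetical.

```python
import math


def error_metrics(observed, predicted):
    """Return MAE, RMSE, R2 and maximum absolute error as a dict."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                   # assumed definition of R2
    max_err = max(abs(e) for e in errors)
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MaxError": max_err}
```

Because squaring gives large errors more weight, the RMSE is never smaller than the MAE, and the two coincide only when all absolute errors are equal. A small gap between them, as reported here, is therefore consistent with the absence of large outlying errors.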
Figure 11 and Figure 12 present the time series of the predicted and observed temperature values for the 3 days used for testing the model, as well as the errors between the respective values. As shown in Figure 11, the model reproduces the observed temperature closely, and the largest errors occur during the daytime hours, when the temperature is high and changes abruptly (Figure 12). These errors could potentially be reduced by training the model on more samples. For relative humidity, Figure 13 again shows very good predictions, with the two curves being almost identical. However, according to Figure 14 and unlike the temperature case, the errors do not seem to follow a fixed pattern.
In this study, to derive the optimal values of the model parameters, three different algorithms were initially tested: the Levenberg–Marquardt backpropagation (LMBP) algorithm, the Bayesian regularization backpropagation algorithm, and the BFGS quasi-Newton backpropagation algorithm. The first of these proved to be the best. Several other training algorithms could also be used (e.g., scaled conjugate gradient, conjugate gradient with Powell/Beale restarts, etc.). Relevant research comparing a large number of algorithms has already been conducted by Taki et al. [24], with LMBP giving the best results.
Three different types of error measures were used to evaluate the results and compare the predicted and observed values of the two parameters under study (indoor temperature and relative humidity). This made it possible to compare the results with a wide range of literature studies, as each study uses a different formula to evaluate the error. The values calculated for the MAE and RMSE were found to be quite small, which lends weight to the model. At the same time, compared with the values found in the literature, the coefficients of determination, R², for both temperature and relative humidity proved to be very good, with values very close to 1. Finally, the maximum error is not a widely used comparison measure; however, for the needs of this work, it was a very useful parameter, and its value was quite small [29].
4. Conclusions
Modeling the greenhouse microclimate has proven to be a very difficult process because a greenhouse is a dynamic system that depends directly on external environmental conditions, whereas the interactions between the parameters that constitute its internal conditions are both strong and complex. Although physical models have been developed to estimate parameters such as indoor temperature and relative humidity, their complexity limits their usefulness, especially for decision support systems (DSS), which require models that, in addition to being highly accurate, can manage huge amounts of data with as little computing power as possible. These requirements are largely met by modeling greenhouse conditions with artificial neural networks (ANNs), for which knowledge of the physical processes is not necessary and which provide a quick and easy way to estimate the desired parameters through system identification.
The purpose of this study was to create a multilayer perceptron artificial neural network model to estimate indoor temperature and relative humidity, taking as input variables the outside temperature and relative humidity, the wind speed, the solar irradiance, and the indoor temperature and relative humidity with a time delay of up to three timesteps. Model predictions can be used as a basis for a DSS. The main goal was for the maximum errors of the predictions to be as small as possible, as this value plays a key role in a DSS. Even a single incorrect prediction is likely to either delay the operation of an actuator (e.g., a heating or cooling system) or fail to put it into operation at all, with crop destruction being a very likely outcome. At the same time, an accurate assessment of the indoor environmental parameters will allow the actuators to be switched on only when really needed, keeping the energy consumption low.
Before the implementation of the model, the data were preprocessed by removing the missing values and were normalized to the range from 0.01 to 0.99, so that the different orders of magnitude of the input variables would not weight their contributions unequally. For the training and validation of the model, 59 days of data were used, which were randomly divided into two subsets, with 80% used for training and 20% for validation. The three remaining days were used to test the model. The model consists of three layers: the input, the hidden, and the output layer. The Levenberg–Marquardt backpropagation algorithm was selected by trial and error over two other algorithms, the Bayesian regularization backpropagation and the BFGS quasi-Newton backpropagation. The logistic sigmoid and linear functions were used as the activation functions for the hidden layer and the output layer, respectively. The logistic sigmoid was chosen from among the logistic sigmoid, the hyperbolic tangent sigmoid, the radial basis, and the positive linear functions. The model was implemented for a different number of nodes in the hidden layer (1 to 20), with the best results for the testing period being obtained for the 10-7-2 structure. The maximum error was equal to 0.877 K and 2.838% for temperature and relative humidity, respectively, and the MAE, RMSE, and R² were calculated to equal 0.218 K, 0.271 K, and 0.999 for temperature, and 0.339%, 0.481%, and 0.999 for relative humidity. These values, both for the maximum error and for the rest of the statistics, indicate that the model can satisfactorily meet the requirements of a decision support system.
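The normalization and the random 80/20 division summarized above can be sketched as follows. The paper states only the target range (0.01 to 0.99) and the split proportions; the linear min-max formula, the fixed random seed, and the function names `fit_minmax` and `split_train_val` are assumptions introduced for illustration.

```python
import random


def fit_minmax(values, lo=0.01, hi=0.99):
    """Return (scale, inverse) functions mapping [min, max] onto [lo, hi]."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        raise ValueError("cannot normalize a constant variable")

    def scale(v):
        # Linear map: v_min -> lo, v_max -> hi.
        return lo + (hi - lo) * (v - v_min) / span

    def inverse(s):
        # Map a normalized value back to physical units.
        return v_min + (s - lo) * span / (hi - lo)

    return scale, inverse


def split_train_val(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and split it into training and validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

In use, one scaler would be fitted per variable, and the `inverse` function of the output variables would convert the network predictions back to kelvin and percent before the error indices are computed in physical units.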
The proposed model can serve a DSS, making it a very important tool. However, the presence of a crop inside the greenhouse could make it harder to obtain reliable results from the model. Biological processes such as evapotranspiration would add one more factor influencing the relative humidity, and consequently the temperature, besides the conditions of the external environment. At the same time, as found in the results, the largest errors of the model occur at times when both the temperature and the relative humidity show abrupt changes. Therefore, the ability of the model to predict these parameters should be studied for cases in which actuators (e.g., heating systems, dynamic ventilation) cause relatively abrupt changes in the microclimate of the greenhouse. Finally, adding data that describe extreme conditions, such as a summer heatwave or a very cold winter day, would train the model for extreme cases that could cause significant crop problems. Future work could include studying the ability of the model to predict the temperature and the relative humidity under the influence of a crop on the internal greenhouse conditions, adding parameters that describe the greenhouse actuators, and adding more data so that the model is trained on further cases and can respond to extreme conditions and rapid dynamic changes.