Article

Short-Term Prediction Method for Gas Concentration in Poultry Houses Under Different Feeding Patterns

1 College of Water Resources and Civil Engineering, China Agricultural University, Beijing 100083, China
2 Key Laboratory of Agricultural Engineering in Structure and Environment, Ministry of Agriculture and Rural Affairs, Beijing 100083, China
* Author to whom correspondence should be addressed.
Agriculture 2024, 14(11), 1891; https://doi.org/10.3390/agriculture14111891
Submission received: 21 September 2024 / Revised: 18 October 2024 / Accepted: 22 October 2024 / Published: 25 October 2024
(This article belongs to the Section Ecosystem, Environment and Climate Change in Agriculture)

Abstract
Ammonia (NH3) and carbon dioxide (CO2) are the main gases that affect indoor air quality and the health of the chicken flock. Currently, the environmental control strategy for poultry houses relies mainly on real-time temperature, resulting in lag and singleness. Indoor air quality can be improved by predicting changes in CO2 concentration and proposing an optimal control strategy. Combining the advantages of seasonal-trend decomposition using loess (STL), Granger causality (GC), long short-term memory (LSTM), and extreme gradient boosting (XGBoost), an ensemble method called the STL-GC-LSTM-XGBoost model is proposed. The model delivers fast-response predictions at low computational cost and has strong generalization ability. The comparative analysis shows that the proposed STL-GC-LSTM-XGBoost model achieved high prediction accuracy, performance, and confidence in predicting CO2 levels under different environmental regulation modes and data volumes. However, its prediction accuracy for NH3 was slightly lower than that of the STL-GC-LSTM model. This may be due to the limited variability and regularity of the NH3 dataset, for which the introduction of XGBoost likely increased model complexity and decreased predictive ability. Nevertheless, the proposed integrated model still provides a feasible approach for gas concentration prediction and health-related risk control in poultry houses.

Graphical Abstract

1. Introduction

The intensive poultry production system is designed to improve energy efficiency, making it a relatively closed environment [1]. The temperature, relative humidity, harmful gas concentration, and other environmental factors in poultry houses significantly affect the growth and health of the poultry. In particular, the concentration of harmful gases can have negative effects on local environmental management and the health of workers. Therefore, it is crucial to control the air quality in poultry houses in a timely manner [2]. Ammonia (NH3), hydrogen sulfide (H2S), and carbon dioxide (CO2) are the primary pollutants affecting indoor air quality and the main gaseous emissions. NH3 in poultry facilities is released from feces, posing both toxic and odor hazards. High indoor NH3 levels can reduce feed conversion rates and poultry production performance, resulting in economic losses [3]. In addition, elevated NH3 contributes to soil acidification and nitrogen deposition in the ecosystem, causing serious environmental issues [4]. CO2 mainly comes from the respiration of poultry and the decomposition of their feces. High concentrations of CO2 can negatively affect poultry health. Moreover, CO2 is a greenhouse gas linked to poultry production systems, contributing to global warming [5].
Ventilation is crucial for improving air quality in poultry houses. However, increasing the number of air changes also raises energy consumption and costs. In cold weather, ventilation can result in significant heat loss, making a well-designed and effective ventilation scheme essential. Monitoring, processing, and analyzing the indoor environment using an Internet of Things (IoT) system [6] can effectively integrate changes and interactions among environmental factors and provide a data-driven basis for the precise regulation of poultry houses. Current control strategies focus mainly on indoor air temperature and relative humidity [7]. However, ventilation is reduced and air quality deteriorates during winter. The effect of air pollution on chickens is determined by the concentration and duration of exposure, yet a delay is often inevitable in real-time monitoring and regulation. Therefore, it is crucial to develop models that can accurately predict NH3 and CO2 concentrations at the next time step.
The current prediction models for harmful gas concentration in livestock houses fall into two categories. The first category is based on machine learning, such as random forest regression (RFR) [8], generalized regression neural network (GRNN) [9], long short-term memory (LSTM) [10,11], Wavelet Neural Network (WNN) [12], or hybrid deep learning models [13]. The second category predicts ammonia concentration using computational fluid dynamics (CFD) simulation methods [14,15]. However, RFR and GRNN lack the capability to automatically capture temporal dependencies, necessitating the creation of lagged features, which increases the complexity of data processing and prediction time. Data quality significantly impacts predictive performance, making the reduction in data noise particularly important. Current feature selection relies primarily on correlation analysis, but the conclusions drawn are often bidirectional, which can obscure valuable information. To address these challenges, LSTM is utilized as the foundational model, and STL is employed to decompose the target time series, thereby mitigating the impact of data noise. Causal features are used to extract useful information for the target series, and XGBoost is applied to adjust the residual components, yielding the final predicted values. Additionally, most previous studies used only a single dataset. However, the degree of automation and the environmental control strategy in different poultry houses are not consistent; therefore, it is necessary to consider the generalization ability and practicality of the method.
This study proposes a new hybrid machine learning algorithm that predicts short-term gas concentration in a poultry house under two different environmental control modes. This algorithm offers fast and accurate prediction capabilities, making it suitable for low-cost equipment. It aims to provide timely early warnings about indoor air quality and serves as a reference for cost-effective, efficient optimal control strategies, and environmental management in poultry houses.

2. Materials and Methods

2.1. Data Source

In this study, the proposed prediction method was used to predict two types of gas concentrations in different broiler farms to verify its generalization ability. Two datasets were obtained from two intensive broiler breeder farms that use different farming modes in Shandong Province, China. One farm had its environment controlled by an automatic environmental controller, whereas the other used artificial environmental intervention. Note that the time lengths of the two datasets are not the same. The first dataset covers nearly 5 months of data, whereas the second dataset covers nearly 3 consecutive months. The frequency of data acquisition is 1 min. Figure 1 shows the interior of the broiler breeder houses.

2.1.1. Dataset 1 Source: Automated Environmental Control Broiler Breeder House

The dimensions of the house in Dataset 1 are 126 m × 13 m × 2.8 m. The house has a brick wall enclosure and a colored steel frame roof. It also contains five rows of three-layer A-type cage frames, with half of the rows located on each side of the wall. In total, there are four corridors. Approximately 11,520 broiler breeders are raised in the house in a restricted feeding mode, feeding once a day in the morning and once in the afternoon. The house is equipped with mechanical ventilation and has eight fans measuring 1.4 m × 1.4 m installed on one wall. An automatic environmental monitoring and controller system based on IoT technology is also installed in the house. The environmental controller can automatically regulate the fan and wet curtain according to set temperature environment thresholds. The environmental monitoring system comprises five temperature sensors, one humidity sensor, two wind speed sensors, one CO2 sensor, one ammonia sensor, one static pressure sensor, and one temperature sensor outside the house. The sampling interval is 1 min, and the collected environmental data are uploaded to the IoT cloud platform. Figure 2 shows the planar structure and location of environmental sensor deployment points. Table 1 shows the parameters and models of the sensors.
Because of inaccuracies in the data acquisition of the ammonia sensor in this house, the NH3 data from this dataset were not included in this study. Instead, the IoT system was used to collect indoor temperature, relative humidity, wind speed, static pressure, CO2 levels, ventilation volume, and outdoor temperature data in the house for the period spanning 5 March to 1 August 2024.

2.1.2. Dataset 2 Source: Artificial Environment Control Broiler Breeder House

The poultry house in Dataset 2 measures 72 m × 12 m × 3.5 m. It has five columns and three layers of H-shaped cage frames, with a total of six corridors. The poultry house has approximately 10,000 broiler breeders, including approximately 9600 hens. The broiler breeders are fed once a day in the morning using an automatic feeding system. A sensor acquisition system is installed in the house. It comprises five temperature and humidity sensors, two wind speed sensors, one CO2 sensor, one ammonia sensor, two dust sensors, and one air pressure sensor. The time interval of environmental data collection is 1 min, and the data are uploaded to the cloud platform through the IoT and saved in the database. The house uses mechanical ventilation, and seven fans measuring 1.3 m × 1.3 m are installed on one wall. The environmental control of the house is manually monitored by checking the value displayed by the current temperature sensor and adjusting the fan and wet curtain accordingly based on experience. Figure 3 shows the planar structure and location of environmental sensor deployment points. Table 2 shows the parameters and models of the sensors.
The environmental data (temperature, relative humidity, wind speed, CO2, ammonia gas, and dust) in the poultry house from 6 March to 8 June 2024 were collected by the IoT system. In addition, the hourly historical temperature and relative humidity data outside the poultry house were obtained from the website of the China Meteorological Administration. Figure 4 shows the IoT devices of the two poultry houses.

2.2. Gas Concentration Prediction Model Development

2.2.1. Main Framework

A new method for predicting CO2 and NH3 levels in a poultry house was developed. This method combines seasonal-trend decomposition using loess (STL), Granger causality (GC), LSTM, and extreme gradient boosting (XGBoost) models and runs in a low-cost setting. The specific prediction method flowchart is shown in Figure 5. First, STL was used to decompose the CO2 and NH3 sequences into trend, seasonal, and residual components. Then, GC was used for feature selection by analyzing the causal relationship between each candidate feature and CO2 or NH3. In time series prediction, LSTM has demonstrated good performance owing to its flexible memory and generalization ability. However, because LSTM predicts the residual component of the target variable poorly, XGBoost was selected to combine and correct the trend, seasonal, and residual component results and obtain the final CO2 and NH3 predictions.

2.2.2. Data Preprocessing

The linear interpolation method was used to fill in missing values in the sensor data. Subsequently, the PauTa criterion (3σ rule) was employed to remove outliers, which were replaced by the average value of the adjacent data. However, long consecutive gaps caused by power outages or equipment failure were not filled. To enable advance regulation of the operating state of the environmental control equipment in the poultry house and avoid environmental control lag, this study selected a 10-min horizon for short-term prediction. This choice allows more timely adjustment of the environmental equipment, keeping air quality stable at the best level, reducing energy consumption, and improving energy efficiency.
Both datasets were split into 80% for the training set and 20% for the test set. To eliminate the influence of different data scales on training performance and improve the convergence speed of the model, it is necessary to standardize these data, as shown in Equation (1).
Z = (X − μ) / σ
where X is the actual value of the data point, μ and σ are the mean and standard deviation of the data, respectively, and Z is the standard score.
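The preprocessing steps above (linear interpolation of missing values, the PauTa/3σ criterion for outliers, and z-score standardization per Equation (1)) can be sketched as follows. This is an illustrative NumPy sketch; the function names are ours, not from the paper.

```python
import numpy as np

def interpolate_missing(x: np.ndarray) -> np.ndarray:
    """Fill NaN gaps by linear interpolation between valid neighbours."""
    x = x.astype(float).copy()
    idx = np.arange(len(x))
    valid = ~np.isnan(x)
    x[~valid] = np.interp(idx[~valid], idx[valid], x[valid])
    return x

def replace_outliers_pauta(x: np.ndarray) -> np.ndarray:
    """PauTa (3-sigma) criterion: points farther than 3 standard deviations
    from the mean are replaced by the average of the adjacent data
    (edge points fall back on the nearest available neighbour)."""
    x = x.copy()
    mu, sigma = x.mean(), x.std()
    for i in np.where(np.abs(x - mu) > 3 * sigma)[0]:
        lo, hi = max(i - 1, 0), min(i + 1, len(x) - 1)
        x[i] = 0.5 * (x[lo] + x[hi])
    return x

def zscore(x: np.ndarray) -> np.ndarray:
    """Equation (1): Z = (X - mu) / sigma."""
    return (x - x.mean()) / x.std()
```

In practice the three steps are applied in this order, since the outlier test and the standardization statistics are only meaningful on a gap-free series.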

2.3. Structure of the Proposed Hybrid STL-GC-LSTM-XGBoost Model

2.3.1. Target Decomposition

The levels of CO2 and NH3 in poultry houses are easily affected by ventilation. In the summer, ventilation is increased to provide a cooling effect, but in the winter, it is often minimized to improve insulation. These irregular fluctuations pose challenges for a single model. However, using STL models can decompose the series and improve prediction accuracy. Table 3 describes the symbols in this section.
STL is a time series decomposition algorithm based on locally weighted regression. It has a fast computation speed and is highly adaptive to outliers in the data, making it suitable for large non-stationary time series [16]. STL decomposes the CO2 and NH3 time series CO2_t and NH3_t at time step t into three components, namely, the seasonal component S_t, the trend component T_t, and the residual component R_t, as follows:
CO2_t = R_t + T_t + S_t
Specifically, the trend component represents the evolution pattern of CO2 and NH3, the seasonal component represents periodic changes, and the residual component captures the randomness and uncertainty of fluctuations.
STL contains two loop procedures: an inner loop nested within an outer loop. The seasonal and trend terms are updated in each iteration of the inner loop. Let S_t^(k) and T_t^(k) denote the seasonal and trend components at the k-th iteration, respectively, with the initial trend value T_t^(0) = 0. At iteration k + 1, S_t^(k+1) and T_t^(k+1) are calculated using the following steps:
Step 1: Calculate the detrended sequence CO2_t − T_t^(k). If CO2_t is missing at this point, the detrended sequence is also missing at this point.
Step 2: Use locally weighted regression (LOESS) to smooth all periodic subsequences, including missing values, one value before the first point of the time series, and one value after the last point of the time series. The combined smoothed values of all periodic subsequences form a temporary seasonal sequence C_t^(k+1).
Step 3: Smooth C_t^(k+1) using a filter consisting of a moving average of length n_p, a moving average of length 3, and a LOESS smoother with parameters d = 1 and q = n_l to obtain L_t^(k+1).
Step 4: Detrend the smoothed periodic subsequences. In Equation (3), L_t^(k+1) is subtracted to prevent low-frequency information from entering the seasonal term, as follows:
S_t^(k+1) = C_t^(k+1) − L_t^(k+1)
Step 5: Calculate the de-seasonalized sequence CO2_t − S_t^(k+1).
Step 6: Smooth the de-seasonalized sequence using LOESS to obtain T_t^(k+1) after k + 1 cycles.
Given the trend term T_t^(k+1) and the seasonal term S_t^(k+1), the residual R_t^(k+1) is obtained using Equation (4), as follows:
R_t^(k+1) = CO2_t − S_t^(k+1) − T_t^(k+1)
Then, the robustness weight ρ_t at time point t can be obtained using Equations (5) and (6):
h = 6 · median(|R_t|)
ρ_t = B(|R_t| / h)
where median(|R_t|) is the median of the absolute residuals, so h is a dynamic threshold based on the residual median, used to identify anomalies. B is the bisquare weighting function, calculated as follows:
B(u) = (1 − u²)² for 0 ≤ u < 1, and B(u) = 0 for u ≥ 1
The CO2 series decomposed using STL is shown in Figure 6. The trend component T_t closely follows the original data CO2_t, but with some data noise removed, making it smoother. The seasonal component S_t shows a certain regularity over a specific period. The residual component R_t, however, exhibits obvious fluctuation and uncertainty, mainly because the residual can be regarded as data noise, which is often the main factor limiting prediction accuracy.
The NH3 series decomposed using STL is shown in Figure 7. The decomposition result is similar to that of CO2, and the trend is smoother and more obvious, which is beneficial for making future predictions.
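In practice, STL is available off the shelf (e.g. `statsmodels.tsa.seasonal.STL`, which implements the inner/outer LOESS loops above). As an illustration of the additive split in Equation (2) only, the sketch below uses a simplified classical decomposition (centred moving-average trend, periodic-mean seasonal); it is a stand-in, not the STL algorithm itself.

```python
import numpy as np

def decompose_additive(y: np.ndarray, period: int):
    """Split y into trend + seasonal + residual (simplified STL stand-in).

    Trend: centred moving average of roughly one period.
    Seasonal: zero-centred mean of the detrended values at each phase.
    Residual: whatever is left over, so the three parts sum to y exactly.
    """
    n = len(y)
    k = period if period % 2 == 1 else period + 1   # odd window for centring
    pad = k // 2
    ypad = np.pad(y, pad, mode="edge")
    trend = np.convolve(ypad, np.ones(k) / k, mode="valid")
    detr = y - trend
    phase_means = np.array([detr[p::period].mean() for p in range(period)])
    phase_means -= phase_means.mean()               # centre the seasonal term
    seasonal = np.resize(phase_means, n)            # tile over the series
    resid = y - trend - seasonal
    return trend, seasonal, resid
```

The residual absorbs everything the smooth trend and the periodic pattern miss, which is exactly why it is the hardest component to predict, as noted above.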

2.3.2. Auxiliary Variable Selection

Because of the characteristics of indoor environmental factors (e.g., coupling with each other, being affected by other factors, nonlinearity, time variation, and dynamics), some factors were first selected as auxiliary input variables for the hybrid model. Most studies on livestock house environment prediction rely on correlation analysis, CNNs, or other machine learning methods to select important features. However, although correlation can test the relationship between two variables, the fact that they change together does not mean that one variable causes the change in the other. Therefore, correlation does not indicate causality.
To improve the effectiveness of the input information of the model and reduce the input dimension of CO2 and NH3 prediction, the GC was used to analyze the causal relationship between CO2 and other meteorological factors in the poultry house. In addition, the cycle characteristics of the influence of other meteorological factors on CO2 concentration in 1–15 different lag periods were revealed.
GC is a statistical hypothesis testing method that can effectively avoid spurious correlation between variables. It is used to determine whether one time series provides predictive information about another. The main idea is that for time series u_t and CO2_t, one can infer that u_t contributes to CO2_t if using the lags of both u_t and CO2_t gives better predictions than using only the lags of CO2_t. That is, u_t helps explain the future variation of CO2_t, so u_t is said to be the Granger cause of CO2_t. However, GC does not imply causality as we usually understand it; rather, it indicates a statistically significant predictive relationship between the two series [17]. The GC test involves two autoregressive (AR) models of the series CO2_t and u_t to be tested, as follows:
CO2_t = α_0 + Σ_{i=1}^{p} α_i CO2_{t−i} + Σ_{i=1}^{q} β_i u_{t−i} + ε_t
CO2_t = α_0 + Σ_{i=1}^{p} α_i CO2_{t−i} + ε_t
Usually, the unrestricted AR model that includes the lag terms of u_t is compared with the restricted AR model without u_t to test for causality. Figure 8 shows the causal test results of CO2 and other meteorological factors for the two datasets. In general, the F-statistic is used to test the significance of a causal relationship, and the significance level of the F-statistic helps determine whether to reject the null hypothesis.
F = [(RSS_R − RSS_U) / q] / [RSS_U / (n − p − q − 1)] ~ F(q, n − p − q − 1)
where RSS_R is the residual sum of squares of the restricted model excluding the independent variable u_t, and RSS_U is the residual sum of squares of the unrestricted model including u_t. p and q are the numbers of lag terms of the two models, respectively, and n is the total number of samples.
If the calculated F-statistic is greater than the critical value of the corresponding F-distribution, the null hypothesis can be rejected and the variable u Granger causes the variable CO2. The larger the F-statistic, the more strongly the lag terms of u explain CO2 and therefore the more likely there is a Granger-causal relationship between u and CO2. To summarize, the null hypothesis (H0) of the Granger causality test posits that no causal relationship exists, whereas the alternative hypothesis (H1) posits that there is a causal relationship [18]. Table 4 shows the description of the environmental parameter symbols in the dataset.
Figure 8a shows the relationship between CO2 of Dataset 1 and other meteorological factors, including indoor temperature, relative humidity, ventilation rate, wind speed, static pressure, and outdoor temperature. The color in the heatmap represents the magnitude of the p-value, where a smaller p-value indicates a more significant causal relationship between two variables. The results show that when the number of lag steps was 1, the p-value of the relative humidity in the house was greater than 0.05. Therefore, the null hypothesis (H0) cannot be rejected, meaning that hum_t does not Granger cause CO2_t. However, the p-value of all meteorological factors was less than 0.05 for lag numbers 2–15. Thus, the null hypothesis is rejected, indicating that all meteorological factors in Dataset 1 were Granger causes of CO2.
Figure 8b shows that for CO2 in Dataset 2, the p-values of the static pressure inside the house were all greater than 0.05 in the lag steps from 1 to 15. Therefore, the original hypothesis cannot be rejected; that is, the static pressure inside the house was not the Granger cause of CO2. When the number of lag steps was 1, the p-values of wind speed, PM2.5, PM10, relative humidity, and ammonia concentration in the house were all greater than 0.05. Therefore, the original hypothesis cannot be rejected. However, the p-values of these meteorological factors were all less than 0.05 when the number of lag steps was 2–15.
Figure 9 shows that the p-values of PM10, PM2.5, and wind speed inside the house and relative humidity outside the house were all greater than 0.05 in the lag steps from 1 to 15. Therefore, the original hypothesis cannot be rejected, indicating that PM10, PM2.5, and wind speed inside the house and relative humidity outside the house were not Granger causes of NH3. When the number of lag steps was 1, the p-value of the relative humidity inside the house was greater than 0.05. Therefore, the original hypothesis cannot be rejected, which indicates that the in-house relative humidity was not the Granger cause of NH3 when the number of lag steps was 1. Table 5 shows each predictor and auxiliary variable that passed the causality test.
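The nested-model F-test above can be written out directly with least squares (statsmodels provides the same test as `grangercausalitytests`). The sketch below is illustrative: `granger_f` and its helper are our names, and the lag alignment assumes equal sample counts for both fitted models.

```python
import numpy as np

def lagmat(x: np.ndarray, lags: int, n_rows: int) -> np.ndarray:
    """Columns x_{t-1} ... x_{t-lags} for the last n_rows time steps."""
    T = len(x)
    return np.column_stack([x[T - n_rows - L: T - L] for L in range(1, lags + 1)])

def granger_f(y: np.ndarray, u: np.ndarray, p: int, q: int):
    """F-statistic of the restricted (y lags only) vs unrestricted
    (y lags + u lags) autoregression, per Equations (8)-(10)."""
    m = max(p, q)
    n = len(y) - m                         # usable samples after lagging
    target = y[m:]
    Xr = np.column_stack([np.ones(n), lagmat(y, p, n)])      # restricted
    Xu = np.column_stack([Xr, lagmat(u, q, n)])              # unrestricted
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        e = target - X @ beta
        return float(e @ e)
    rss_r, rss_u = rss(Xr), rss(Xu)
    F = ((rss_r - rss_u) / q) / (rss_u / (n - p - q - 1))
    return F, rss_r, rss_u
```

A large F (beyond the F(q, n − p − q − 1) critical value) means the lags of u add significant explanatory power, i.e. u Granger-causes y.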

2.3.3. Component Prediction

The sequence input layer was responsible for receiving measurement data. The data included the forecast targets and the historical data of the corresponding auxiliary variables listed in Table 5.
The historical sequence data of T_CO2, R_CO2, and S_CO2 from the two datasets were transformed into input time series named T1_CO2, R1_CO2, S1_CO2, T2_CO2, R2_CO2, S2_CO2, T_NH3, R_NH3, and S_NH3.
The LSTM network is a special type of recurrent neural network (RNN) with a gate mechanism for capturing long-term dependencies, which effectively alleviates the gradient vanishing problem prevalent in traditional RNNs. The basic composition of an LSTM network includes a memory block consisting of a forget gate, an input gate, and an output gate.
The input gate determines which information in the current input is important, computing the admission weight i_t with the sigmoid function, and provides the new candidate state C̃_t through the tanh function. The forget gate determines which information from the previous moment should be discarded: it passes the current input x_t and the previous hidden state h_{t−1} through a fully connected layer with sigmoid activation to obtain the forget-gate activation f_t. A value close to 1 indicates "retain," whereas a value close to 0 indicates "discard". The cell state acts as the memory of the LSTM network, extending over the entire cell chain, and allows information to be added or removed under the control of the input and forget gates. The output gate determines which information is passed to the hidden state at the next time step or output at the current time. Its activation value o_t is computed in the same way as that of the forget gate, and it is multiplied element-wise with the cell state after the tanh function to obtain the final hidden state h_t. It is precisely this design that allows LSTM to capture long-term dependencies while avoiding vanishing and exploding gradients. The detailed calculation is shown in Equation (11).
f_t = σ(W_f · [x_t, h_{t−1}] + b_f)
i_t = σ(W_i · [x_t, h_{t−1}] + b_i)
C̃_t = tanh(W_C · [x_t, h_{t−1}] + b_C)
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t
o_t = σ(W_o · [x_t, h_{t−1}] + b_o)
h_t = o_t ⊙ tanh(C_t)
where σ is the sigmoid activation function; W_f, W_i, W_o, and W_C are the weight matrices of the forget gate, input gate, output gate, and memory cell, respectively; and b_f, b_i, b_C, and b_o are the biases of the forget gate, input gate, candidate cell state, and output gate, respectively [19].
The output size of the output layer is 3, corresponding to the trend, seasonal, and residual components.
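A single LSTM step from Equation (11) can be written out directly in NumPy. This is an illustrative sketch (frameworks such as Keras implement this internally); the weight layout and the name `lstm_step` are ours.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell update following Equation (11).

    W maps gate name -> weight matrix of shape (hidden, input + hidden);
    b maps gate name -> bias vector of shape (hidden,).
    """
    z = np.concatenate([x_t, h_prev])        # concatenated [x_t, h_{t-1}]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    c_hat = np.tanh(W["c"] @ z + b["c"])     # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat         # new cell state (memory trace)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                 # new hidden state
    return h_t, c_t
```

Because o_t lies in (0, 1) and tanh(C_t) in (−1, 1), the hidden state is always bounded, while the additive cell-state update is what lets gradients flow over long horizons.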

2.3.4. Correction of Results

Because of the instability of the residual results, it is necessary to correct the discrepancies between the residual and predicted results. XGBoost is a classical ensemble boosting algorithm based on gradient boosting that splits the data based on features, similar to a tree model. Trees are added one at a time, each fitted to correct the prediction errors made by prior models. Each leaf in a tree carries a numerical weight, and each sample is assigned to one leaf per tree based on the values of its input variables. The model's estimated output for a sample is the sum of the weights of the leaves that sample is assigned to across all regression trees. Compared with other gradient boosting algorithms, XGBoost can construct a strong learner from a set of weak learners and exhibits the following advantages: (1) it effectively handles missing values; (2) it prevents overfitting; and (3) it performs parallel and distributed computing to reduce running time. XGBoost adopts gradient descent optimization with an arbitrary differentiable loss function, minimizing the loss by successively adding weak learners, that is, by defining and optimizing an objective function. XGBoost attempts to minimize the regularized objective shown in Equation (12).
obj(θ) = Σ_i L(ŷ_i, y_i) + Σ_k Ω(f_k), f_k ∈ F
where L is the training loss function that measures the deviation between the value ŷ_i predicted by the proposed model and the actual value y_i, and Ω is the regularization term that measures the complexity of the model and tends to prevent overfitting. f_k is a function in the functional space F, where F is the set of all possible regression trees [20].
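The residual-correction idea — trees added sequentially, each fitting what the previous ones missed — can be illustrated with a minimal gradient-boosting loop over decision stumps under squared loss. In the paper this role is played by the XGBoost library (with the regularized objective above); the stump learner and the names here are deliberate simplifications.

```python
import numpy as np

def fit_stump(X, r):
    """Best single split (feature, threshold, left/right means) for residual r."""
    best = (np.inf, 0, 0.0, r.mean(), r.mean())
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            left = X[:, j] <= thr
            if left.all() or (~left).all():
                continue
            lm, rm = r[left].mean(), r[~left].mean()
            sse = float(((r - np.where(left, lm, rm)) ** 2).sum())
            if sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]

def boost(X, y, n_trees=30, lr=0.3):
    """Squared-loss gradient boosting: each stump fits the current residuals,
    so later trees correct the errors left by earlier ones."""
    pred = np.full(len(y), y.mean())
    stumps = []
    for _ in range(n_trees):
        j, thr, lm, rm = fit_stump(X, y - pred)
        pred += lr * np.where(X[:, j] <= thr, lm, rm)
        stumps.append((j, thr, lm, rm))
    return pred, stumps
```

XGBoost adds to this basic loop a second-order Taylor expansion of the loss and the Ω penalty on tree complexity, which is what curbs overfitting.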

2.3.5. Evaluation Indicators

In this study, five evaluation metrics were used to evaluate the prediction methods: mean square error (MSE), mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R2). These metrics can be calculated using Equations (13) to (17).
MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²
MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|
RMSE = √[(1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²]
MAPE = (100%/n) Σ_{i=1}^{n} |(y_i − ŷ_i) / y_i|
R² = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)² / Σ_{i=1}^{n} (y_i − ȳ)²
where y_i is the actual value (the label of the data), ŷ_i is the predicted value, ȳ is the mean of the actual values, and n is the number of predicted values. Except for R², a smaller value of an indicator means a smaller prediction error and higher prediction accuracy.
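Equations (13)–(17) translate directly into NumPy; the function name below is ours. Note that MAPE assumes no actual value is zero, which is why it can blow up on near-zero seasonal components, as discussed later in Section 3.

```python
import numpy as np

def metrics(y, yhat):
    """MSE, MAE, RMSE, MAPE (%), and R^2 per Equations (13)-(17)."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    err = y - yhat
    mse = float((err ** 2).mean())
    mae = float(np.abs(err).mean())
    rmse = float(np.sqrt(mse))
    mape = float(100.0 * np.abs(err / y).mean())   # undefined if any y_i == 0
    r2 = float(1.0 - (err ** 2).sum() / ((y - y.mean()) ** 2).sum())
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}
```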

3. Results and Discussion

3.1. Dataset and Setting

The two time series datasets were divided into training and testing sets, used for the training and prediction testing of the STL-GC-LSTM-XGBoost model, respectively. The first dataset contains 21,393 10-min data groups, including indoor temperature, humidity, wind speed, and carbon dioxide, mainly covering spring and summer from March to August 2024. The second dataset contains 10,749 10-min data groups, including indoor temperature, humidity, wind speed, carbon dioxide, and ammonia, mainly covering spring and early summer from March to June 2024. The first 80% and last 20% of each dataset were used as training and testing data, respectively. Tensorflow-CPU-2.16.1 was used to train and test the STL-GC-LSTM-XGBoost model. All experiments involving LSTM used the following parameters: each LSTM layer had 50 neurons, the activation function was tanh, sigmoid was used as the recurrent activation function, and the loss function was MSE. The widely used Adaptive Moment Estimation (Adam) method was used for optimization. The number of neurons in the output layer was 1. The batch size and the number of training epochs were set to 64 and 30, respectively.

3.2. Short-Term Prediction of CO2 Concentration

3.2.1. Verification of the Predicted CO2 Results of the STL-GC-LSTM-XGBoost Model

Through data processing steps, time series decomposition, and the causality test, the trend, seasonal, and residual components of CO2 concentration and the input information of each dataset were obtained. LSTM was used to predict the three components separately, and then, the prediction results of the three components were used as input features to predict the final CO2 using XGBoost.
Because of the large amount of data, the overall prediction chart cannot show the prediction effect clearly. Therefore, Figure 10 shows the trend component T1_CO2, seasonal component S1_CO2, and residual component R1_CO2 of the test set from Dataset 1 for 16–17 July, as predicted by LSTM. Table 6 presents the evaluation results for Datasets 1 and 2.
The evaluation index results for Dataset 1, as shown in Figure 10 and Table 6, indicate that, in general, LSTM can accurately predict T1_CO2 and S1_CO2. In addition, the change trend of the predicted values was very close to that of the observed values. Although the predicted MAPE for S1_CO2 was higher, this is likely because the actual value was close to 0 in some cases, making the percentage error particularly large. However, the R1_CO2 prediction was not accurate, mainly because the residual is essentially noise, making this component complex and lacking obvious regularity.
Figure 11 shows part of the prediction plots for T2_CO2, S2_CO2, and R2_CO2 from Dataset 2. As with the results for Dataset 1, T2_CO2 and S2_CO2 exhibited better prediction effects than R2_CO2. Table 6 shows that T2_CO2 had smaller MSE, MAE, RMSE, and MAPE, demonstrating better predictive power than T1_CO2 and stronger explanatory power. The main reason may be that although Dataset 2 uses artificial experience to control the internal environment, it forms a stable control pattern, which might be easier to identify and predict. In addition, the frequency and magnitude of manual adjustment might be relatively small, resulting in less frequent data changes, more uniform data, and lower noise levels. These make it easier for the proposed model to capture long-term trends.
As shown in Table 6, the MSE, MAE, and RMSE of S1_CO2 and S2_CO2 were very small, indicating that the two environmental control methods performed very well in predicting seasonal changes, and the R2 value was close to 1, indicating that the seasonal component was well captured. However, MAPE was relatively large for both components. This index measures the relative error, and the high MAPE may be mainly due to the small actual values of the seasonal component. Thus, even if the absolute prediction error is small, the relative error becomes amplified, making MAPE easily affected.
Unlike the other components, the predictions of R1_CO2 and R2_CO2 were not ideal. These components represent the random fluctuations in the time series, usually the hardest part to predict, because the residual contains random noise and uncaptured nonlinear changes. The MSE and RMSE of the two components were close, whereas the MAE of R2_CO2 was significantly lower than that of R1_CO2, possibly because manual control reduced the random noise and irregular fluctuations in Dataset 2. The MAPE values of both R1_CO2 and R2_CO2 were very high, indicating that the residual predictions carried great uncertainty and volatility.
Therefore, to correct these problems without greatly increasing model complexity, XGBoost was used to combine the trend, seasonal, and residual components as features to predict CO2, with the aim of increasing prediction accuracy. The test set was used to assess the prediction accuracy of the proposed model. As shown in Figure 12, the trend of the CO2 mass concentrations predicted by STL-GC-LSTM-XGBoost closely followed that of the observed values. The measured CO2 values of the Dataset 1 test set ranged from 582 to 762 mg/m3, and the STL-GC-LSTM-XGBoost predictions ranged from 594.49 to 756.42 mg/m3; the minimum error between predicted and measured values was 0 mg/m3, the maximum error was 34.18 mg/m3, and the MSE was 35.38. The measured CO2 concentrations of the Dataset 2 test set ranged from 424.47 to 1072.41 mg/m3, and the predictions ranged from 423.73 to 1049.31 mg/m3; the minimum error was 0 mg/m3, the maximum error was 70.2 mg/m3, and the MSE was 37.59.
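The recombination step above can be sketched as follows. An ordinary least-squares combiner stands in here for the XGBoost regressor used in the paper, and all series are synthetic illustrations rather than measured data: the point is only that the component predictions become the feature columns of the final model.

```python
import numpy as np

# Sketch of the final stacking step: LSTM-predicted trend, seasonal, and
# residual components are used as features for one combining regressor.
# (Least squares stands in for XGBoost; the series below are synthetic.)
rng = np.random.default_rng(0)
n = 200
trend = np.linspace(600.0, 700.0, n)                       # T component prediction
seasonal = 5.0 * np.sin(np.arange(n) * 2.0 * np.pi / 24.0) # S component prediction
residual = rng.normal(0.0, 2.0, n)                         # R component prediction
y = trend + seasonal + residual                            # observed CO2 (additive STL)

X = np.column_stack([trend, seasonal, residual])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # learn how to combine T, S, R
y_hat = X @ coef
rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))

print(coef.round(2))  # ≈ [1, 1, 1]: an exactly additive decomposition recombines directly
```

In practice the combiner earns its keep when the decomposition is not exactly additive or when component errors are correlated; a boosted-tree model such as XGBoost can then learn a nonlinear correction that a fixed sum cannot.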
The two prediction graphs and the prediction data show that, overall, neither dataset exceeded the maximum CO2 standard for the poultry house during its respective period. This is primarily due to the large ventilation volume in the house during spring and summer, which maintains good air quality. However, the CO2 levels in the house with the automatic environmental controller (Dataset 1) were more stable than those in the manually controlled house (Dataset 2), mainly because operators of a manually controlled house cannot watch the sensor data at every moment of the day and react, so CO2 fluctuates more at some moments.

3.2.2. Comparison of the Predicted CO2 Results of Different Models

To further verify the prediction performance of the proposed model, five models (LSTM, GRU, STL-LSTM, STL-GC-LSTM, and STL-GC-LSTM-XGBoost) were compared on the test sets of the two datasets. The generalization ability and practicability of the proposed model were also assessed using the datasets from the two different production modes. The CO2 prediction results are shown in Table 4 and Table 5.
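For reference, the five evaluation metrics used throughout this comparison (MSE, MAE, RMSE, MAPE, and R2) can be computed as follows; the inputs are illustrative values, not the paper's data.

```python
import math

# Minimal reference implementation of the evaluation metrics used in the
# model comparison. Inputs below are illustrative, not measured values.
def metrics(y_true, y_pred):
    n = len(y_true)
    errs = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errs) / n
    mae = sum(abs(e) for e in errs) / n
    rmse = math.sqrt(mse)
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errs, y_true)) / n
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1.0 - sum(e * e for e in errs) / ss_tot
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}

obs = [600.0, 620.0, 640.0, 660.0]
pred = [605.0, 615.0, 645.0, 655.0]
m = metrics(obs, pred)
print({k: round(v, 3) for k, v in m.items()})
```

Note that MSE, MAE, and RMSE carry the units of the target (here ppm or mg/m3), MAPE is a percentage, and R2 is dimensionless, which is why the tables report them on very different scales.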
The results show that the proposed STL-GC-LSTM-XGBoost model achieved the best prediction accuracy in both the automatically controlled house and the manually controlled house. As shown in Figure 13 and Figure 14, although the traditional LSTM and GRU models have some predictive ability, they had large errors and low explanatory power, with R2 values of only 0.56 and 0.58, respectively. The predictive ability of LSTM is slightly better than that of GRU, which is why LSTM was selected in this study. After adding STL decomposition, the prediction performance improved significantly: the error indices fell and R2 increased to 0.92, mainly because STL decomposition removed part of the noise. Because Dataset 1 contains few other meteorological factors, all of them exhibited a causal relationship with CO2 under GC analysis, so no features were removed. In theory, STL-LSTM and STL-GC-LSTM should therefore produce identical predictions; in practice, randomness in the training process led to small differences between them. After further introducing XGBoost, the proposed model exhibited the best prediction performance, with all error indicators at their lowest and R2 at its highest (0.95), indicating the best predictive and explanatory ability under the automatic environmental control system.
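The GC screening mentioned above can be illustrated with a minimal one-lag Granger test: an F-test comparing an autoregression of the target with and without a lag of the candidate factor. This is a pure-NumPy sketch of the standard test (the paper uses a full implementation over multiple lags), and the data are synthetic, with x constructed to lead y by one step.

```python
import numpy as np

def sse(X, y):
    """Sum of squared residuals of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def granger_f_lag1(y, x):
    """One-lag Granger F-statistic: does x[t-1] help predict y[t]?"""
    yt, y1, x1 = y[1:], y[:-1], x[:-1]
    ones = np.ones_like(yt)
    sse_r = sse(np.column_stack([y1, ones]), yt)      # restricted: y lag only
    sse_u = sse(np.column_stack([y1, x1, ones]), yt)  # unrestricted: + x lag
    n, q, k = len(yt), 1, 3  # q = restrictions tested, k = unrestricted params
    return ((sse_r - sse_u) / q) / (sse_u / (n - k))

# Synthetic data: y is driven by the previous value of x.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

print(granger_f_lag1(y, x) > 10.0)  # large F: x "Granger-causes" y
```

A large F-statistic (small p-value) keeps the factor as an auxiliary variable; factors that fail the test for all components would be dropped, which is exactly what did not happen for Dataset 1.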
In addition, a comparison of the running times of all models shows that the proposed ensemble model is relatively complex and therefore slower, primarily because the trend, seasonal, and residual components of the target series must be predicted separately. However, compared with the 10-min data interval, the 86-s running time in this study is still short and fully meets the response requirements of the prediction and control strategy.
Figure 15 and Figure 16 show the prediction results of the five models on Dataset 2, the environmental control dataset with manual intervention. The conclusions are similar to those drawn from Dataset 1, but the data are more stable, indicating less noise; the trend and seasonal components were therefore easier to capture and explain, and all models had strong explanatory ability. A possible reason is that the automatic environmental control system does not always respond to environmental changes promptly or sensitively, producing unpredictable fluctuations and noise in the data, whereas manual intervention responded quickly to emergencies and abnormal data points, which helped stabilize the data. In addition, Dataset 2 covers two months less than Dataset 1. Overall, regardless of the production mode, the proposed STL-GC-LSTM-XGBoost model exhibits excellent CO2 prediction ability. As Dataset 2 has 10,644 fewer rows than Dataset 1, it also ran 34 s faster. Although this method runs slightly longer than the others, it still fully meets the requirements for subsequent prediction and regulation.

3.3. Short-Term Forecast of Ammonia Gas

3.3.1. Verification of the Predicted NH3 Results of the STL-GC-LSTM-XGBoost Model

Figure 17 shows part of the T2_NH3, S2_NH3, and R2_NH3 prediction plots for Dataset 2. As with the CO2 predictions, T2_NH3 and S2_NH3 were predicted significantly better than R2_NH3. Table 6 shows that T2_NH3 had almost no error, whereas S2_NH3, despite MSE, MAE, and RMSE values close to 0, had a higher MAPE (56.79%). This is mainly because the range of NH3 values is very small, so even small absolute errors appear large in relative terms.
As with CO2, XGBoost was also used to correct the residuals; the prediction plot is shown in Figure 18. The measured NH3 values ranged from 0.67 to 2.15 ppm, and the STL-GC-LSTM-XGBoost predictions ranged from 0.68 to 1.5 ppm. The minimum error between predicted and measured values was 0.00 ppm, the maximum error was 0.3 ppm, and the MSE was 0.00.

3.3.2. Comparison of the Predicted NH3 Results of Different Models

Figure 19 and Figure 20 show the results of the five models for NH3 prediction on Dataset 2, the environmental control dataset with manual intervention. The prediction abilities of the basic LSTM and GRU models were similar to those for CO2 prediction; the MAPE of LSTM was slightly lower, and its prediction ability remained better. Prediction performance improved significantly after STL decomposition: the error dropped markedly and the R2 value approached 1, indicating an excellent model fit. Adding GC improved performance further, with a slight decrease in MAPE, reflecting its effectiveness. However, unlike the CO2 results, the proposed STL-GC-LSTM-XGBoost method did not perform as well as the STL-GC-LSTM model in NH3 prediction. Because the range of NH3 variation is relatively small, the additional model complexity introduced by combining in XGBoost may have caused some overfitting or related problems.

4. Conclusions

The following conclusions can be drawn from this study:
(1) All LSTM networks used in this study had only 50 neurons per layer, a batch size of 64, and 30 training epochs. The computational complexity was therefore low, and training ran on an ordinary CPU, ensuring that the model can be deployed on low-performance computing devices. The experimental results showed running times of 86 and 52 s for the proposed model, indicating that these parameter settings achieved good prediction performance at low cost;
(2) STL was used to decompose the target series (CO2 and NH3) into trend, seasonal, and residual components, which effectively improved the prediction ability of LSTM. On Datasets 1 and 2, the MSE of CO2 prediction decreased from 289.49 to 58.31 and from 314.06 to 49.66, and the MAE decreased from 113.69 to 6.27 and from 7.56 to 5.4, respectively; MAPE decreased from 2.11% to 0.97% and from 1.31% to 1.03%. The NH3 prediction ability also improved. This demonstrates the effectiveness of introducing STL decomposition: it separates the random noise in the data, allowing the model to learn the underlying trend from de-noised data;
(3) GC can determine the causal relationship between other meteorological factors and the target sequence, rather than relying only on the correlation measures commonly used at present. Comparing the CO2 prediction results of the STL-GC-LSTM and STL-LSTM models on Dataset 2 shows that introducing causal features significantly improved prediction performance: MSE decreased from 49.66 to 43.5, MAE from 5.4 to 4.85, and RMSE from 7.05 to 6.6. Although the other evaluation metrics did not improve for NH3 prediction, MAPE decreased from 1.13% to 1.11%;
(4) The proposed STL-GC-LSTM-XGBoost ensemble model performed slightly worse than the STL-GC-LSTM model in NH3 concentration prediction, possibly because the variation in NH3 concentration was not pronounced and the introduction of XGBoost increased model complexity. However, its CO2 prediction ability in the poultry house was strong under both environmental control modes. Therefore, the model can still be used to predict gas concentrations in poultry houses.
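As a rough sanity check on point (1), the parameter count of a single 50-unit LSTM layer can be worked out directly; the input dimension of 8 features below is an assumption for illustration, not a value from the paper.

```python
def lstm_params(units, input_dim):
    # An LSTM layer has 4 gates, each with input weights (units x input_dim),
    # recurrent weights (units x units), and a bias vector (units).
    return 4 * (units * (input_dim + units) + units)

# 50 units with an assumed 8 input features: a few thousand parameters,
# small enough to train quickly on a CPU.
print(lstm_params(50, 8))  # → 11800
```

At this scale, even stacked layers stay well below the size where GPU acceleration becomes necessary, which supports the low-cost deployment claim.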
The method proposed in this paper provides timely warning of environmental conditions in the poultry house and will be combined with an environmental control algorithm in future work. It is hoped that it can contribute to regulating the environmental quality of poultry houses in winter and provide a basis for prediction-based control.

Author Contributions

Conceptualization, Y.X. and G.T.; methodology, Y.X.; formal analysis, Y.X.; investigation, Y.X. and Z.Z.; resources, Y.X.; data curation, Z.Z.; writing—original draft preparation, Y.X.; writing—review and editing, Y.X. and G.T.; visualization, Y.X.; supervision, G.T.; project administration, G.T.; funding acquisition, G.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China, grant number 2023YFD2000805, and the Key R&D Program of Shandong Province of China, grant number 2022TZXD0015.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The datasets are not publicly available because they are part of an ongoing study that has not yet concluded.

Acknowledgments

The authors would like to thank the reviewers for their hard work.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Interior of the broiler breeder houses. (a) House with automatic environmental controller (Dataset 1); (b) house with manually controlled environment (Dataset 2).
Figure 2. House plan structure and sensor layout diagram of Dataset 1. (a) Top view; (b) side view.
Figure 3. House plan structure and sensor layout diagram of Dataset 2. (a) Top view; (b) side view.
Figure 4. IoT devices in the poultry houses. (a) Broiler breeder house with automatic environmental controller (Dataset 1); (b) broiler breeder house with manually controlled environment (Dataset 2).
Figure 5. Structure of the STL-GC-LSTM-XGBoost model.
Figure 6. STL decomposition of 1000 consecutive CO2 data points in the sample. (a) Original data of Dataset 1 decomposed into trend, seasonal, and residual components; (b) original data of Dataset 2 decomposed into trend, seasonal, and residual components.
Figure 7. STL decomposition of 1000 consecutive NH3 data points in the sample (from top to bottom: original data, trend component, seasonal component, and residual component).
Figure 8. Causality test of other meteorological factors and CO2 in the two datasets. (a) Dataset 1; (b) Dataset 2.
Figure 9. Causality test of other meteorological factors and NH3 in Dataset 2.
Figure 10. Trend, seasonal, and residual component prediction plots for part of the test set of Dataset 1.
Figure 11. Trend, seasonal, and residual component prediction plots for part of the test set of Dataset 2.
Figure 12. Comparison of STL-GC-LSTM-XGBoost prediction values for the test sets of the two datasets. (a) Dataset 1; (b) Dataset 2.
Figure 13. Prediction results of the five models for Dataset 1.
Figure 14. Comparison plots of partial data predictions of the different models for CO2 of Dataset 1.
Figure 15. Prediction results of the five models for Dataset 2.
Figure 16. Comparison plots of partial data predictions of the different models for CO2 of Dataset 2.
Figure 17. Trend, seasonal, and residual component prediction plots for part of the test set.
Figure 18. STL-GC-LSTM-XGBoost prediction plot for the test set.
Figure 19. Prediction results of the five models.
Figure 20. Comparison plots of partial data predictions of the different models for NH3 of Dataset 2.
Table 1. Environmental sensor models and parameters of Dataset 1.

Detection Index | Range | Precision | Model Number
Temperature | −40–80 °C | ±0.2 °C | KCNTC-1M
Humidity | 0–100% | ±1.5% | KCTHT-102
Wind speed | 0–60 m/s | ±(0.2 m/s ± 0.02 × v) | RS-CFSFX-I20-2
Static pressure | −100–100 Pa | 3% | KCPT-106
Carbon dioxide | 0–5000 ppm | ±(50 ppm + 3% F.S.) | KCCOT-103
Ammonia | 0–50 ppm | ±8% | RS-NH3-I20-2-50p
Table 2. Environmental sensor models and parameters of Dataset 2.

Detection Index | Range | Precision | Model Number
Temperature | −40–80 °C | ±0.5 °C | RS-WS-SMG-4
Humidity | 0–100% | ±3% | RS-WS-SMG-4
PM2.5 and PM10 | 0–1000 μg/m3 | ±3% F.S. | RS-PM-I20-2
Wind speed | 0–60 m/s | ±(0.2 m/s ± 0.02 × v) | RS-CFSFX-I20-2
Carbon dioxide | 0–5000 ppm | ±(50 ppm + 3% F.S.) | RS-CO2-I20-2-5000p
Ammonia | 0–50 ppm | ±8% | RS-NH3-I20-2-50p
Air pressure | −120–120 Pa | ±3% | RS-YC-I20-2
Table 3. Parameter description.

Symbol | Description
S_t | Seasonal component at time t
T_t | Trend component at time t
R_t | Residual component at time t
S_t^(k) | Seasonal component at the end of pass k − 1 of the inner loop
T_t^(k) | Trend component at the end of pass k − 1 of the inner loop
n_(p) | Period of the time series
n_1 | LOESS smoothing parameter
h | Dynamic threshold based on the median of the residual, used to identify anomalies
Table 4. Description of dataset parameters.

Parameter Abbreviation | Explanation
out_tem (°C) | Temperature outside the house
out_hum (%) | Humidity outside the house
in_hum (%) | Humidity inside the house
in_tem (°C) | Temperature inside the house
in_wind (m/s) | Wind speed inside the house
in_ven (m3/h) | Ventilation rate inside the house
in_NP, in_kpa (Pa) | Static pressure inside the house
in_pm2.5 (mg/m3) | PM2.5 inside the house
in_pm10 (mg/m3) | PM10 inside the house
in_co2 (ppm) | Carbon dioxide concentration inside the house
in_ammonia (ppm) | Ammonia concentration inside the house
Table 5. Auxiliary variables obtained by Granger causality for different predictive variables.

Predictive Variable | Auxiliary Variables
T_CO2, R_CO2, and S_CO2 of Dataset 1 | out_tem, in_hum, in_tem, in_wind, in_ven, in_NP
T_CO2, R_CO2, and S_CO2 of Dataset 2 | out_tem, out_hum, in_tem, in_wind, in_pm2.5, in_pm10, in_kpa, in_ammonia
T_NH3, R_NH3, and S_NH3 of Dataset 2 | out_tem, in_co2, in_hum, in_tem, in_kpa
Table 6. Evaluation metrics for trend, seasonal, and residual component prediction for both datasets.

Object of Prediction | Decomposed Component | MSE (ppm) | MAE (ppm) | RMSE (ppm) | MAPE (%) | R2
CO2 sequence of Dataset 1 | Trend (T1_CO2) | 27.89 | 4.48 | 5.28 | 0.69 | 0.95
CO2 sequence of Dataset 1 | Seasonal (S1_CO2) | 0.08 | 0.22 | 0.28 | 30.72 | 0.99
CO2 sequence of Dataset 1 | Residual (R1_CO2) | 51.81 | 5.69 | 7.22 | 28.6 | 0.44
CO2 sequence of Dataset 2 | Trend (T2_CO2) | 13.64 | 2.7 | 3.69 | 0.48 | 0.99
CO2 sequence of Dataset 2 | Seasonal (S2_CO2) | 0.06 | 0.17 | 0.24 | 48.01 | 0.99
CO2 sequence of Dataset 2 | Residual (R2_CO2) | 53.93 | 3.97 | 7.34 | 254.71 | 0.20
NH3 sequence of Dataset 2 | Trend (T2_NH3) | 0.00 | 0.01 | 0.01 | 0.6 | 0.99
NH3 sequence of Dataset 2 | Seasonal (S2_NH3) | 0.00 | 0.00 | 0.00 | 56.79 | 0.99
NH3 sequence of Dataset 2 | Residual (R2_NH3) | 0.00 | 0.16 | 0.02 | 290.73 | 0.32
Xu, Y.; Teng, G.; Zhou, Z. Short-Term Prediction Method for Gas Concentration in Poultry Houses Under Different Feeding Patterns. Agriculture 2024, 14, 1891. https://doi.org/10.3390/agriculture14111891