1. Introduction
In recent years, interest in environmental protection has grown rapidly, becoming an important criterion for public policy in social and political contexts [1,2]. The most widely pursued objective is to reduce emissions of greenhouse gases (e.g., carbon dioxide, nitrous oxide and methane), which are responsible for the greenhouse effect [3]. In particular, the cement industry contributes about 5% of global anthropogenic CO2 emissions, excluding land-use change [4,5], as the production of the binder is a highly energy-intensive and emission-intensive process. The calcination of the raw materials for cement production (e.g., limestone, clay, calcareous marl and other clay-like materials) and the burning of (fossil) fuels to maintain a high temperature in the kiln are the processes with the highest environmental impact. The former generates chemical emissions, the latter physical emissions. Indeed, raw materials are heated inside large rotating furnaces at 1400 °C to form a solid substance called clinker [6]. During this process, chemical emissions mainly come from the calcination of calcium carbonate (CaCO3) and magnesium carbonate (MgCO3) according to Equations (1) and (2) [7,8]:
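For reference, the two calcination reactions take the standard stoichiometric form sketched below. The mass ratios that follow are an added back-of-the-envelope check based on standard molar masses (CaCO3 ≈ 100.09 g/mol, MgCO3 ≈ 84.31 g/mol, CO2 ≈ 44.01 g/mol); they are not figures taken from the cited sources.

```latex
\begin{align}
\mathrm{CaCO_3} &\longrightarrow \mathrm{CaO} + \mathrm{CO_2} \tag{1}\\
\mathrm{MgCO_3} &\longrightarrow \mathrm{MgO} + \mathrm{CO_2} \tag{2}
\end{align}
% Mass of CO2 released per unit mass of carbonate decomposed
% (assumed molar masses, see the note above):
\[
\frac{M_{\mathrm{CO_2}}}{M_{\mathrm{CaCO_3}}} \approx \frac{44.01}{100.09} \approx 0.44,
\qquad
\frac{M_{\mathrm{CO_2}}}{M_{\mathrm{MgCO_3}}} \approx \frac{44.01}{84.31} \approx 0.52
\]
```

In other words, roughly 0.44 kg of CO2 is released for every kg of calcium carbonate that is fully calcined, whatever fuel is used to heat the kiln.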
Clinker is then ground or milled with gypsum and other constituents (e.g., products, raw materials, additives, recycled waste) to produce cement [9]. According to [9], 5 main types of cement (CEM I to CEM V) and 27 products in the family of common cements are defined. They differ in the composition (proportion by mass) of the main constituents, but all contain clinker. Therefore, clinker production affects the environmental performance of the final product and cannot be overlooked in this study. The chemical process described in Equations (1) and (2) accounts for more than 60% of the total CO2 emissions due to clinker production, as confirmed by the mass balance published in the latest Environmental Product Declaration (EPD) of the Italian cement production [10]. Chemical emissions cannot be reduced, but physical emissions resulting from fuel combustion for kiln firing can be managed and reduced by using alternative/waste fuels [11] and/or by adopting energy-saving technologies [12]. However, these changes in the production processes must remain compatible with the required quality of the products, which depends on their construction or industrial uses [13], or with any special performance requirements [14,15,16]. Moreover, in order to reduce the impacts of cement production, in the last decade the clinker content of cement has decreased [17] through the use of supplementary cementitious materials (e.g., gypsum, ground limestone, coal fly ash or blast furnace slag) [18,19,20]. The cement industry is therefore making efforts to reduce its greenhouse gas emissions, and these efforts also affect its other environmental performances. The standard EN 15804 “Sustainability of construction works, environmental product declarations, core rules for the product category of construction products” [21] defines the rules to assess the life cycle impact and develop a Type III environmental declaration (i.e., EPD) for any construction product and construction service [22].
According to [21], different parameters describe the environmental performance of a product, such as environmental impacts, resource use, waste categories, and output flows [23,24,25,26,27]. In the literature, several studies have assessed the impacts of cement production considering its upstream processes (i.e., production of commodities, raw materials, transport to the factory plant, production process) [28]. The obtained results allow the cement industry to identify the best strategies to reduce its environmental impact. More specifically, the authors have access to Life Cycle Analysis (LCA) results of the Italian cement production from 2014 to 2019; these data refer to the environmental performance of one of the most important cement industries in Europe, with over 19 million Mg produced in 2019 [29]. The impacts of more than 190 cement powders produced according to [9] have been assessed with a “from cradle to gate” boundary approach using the database Ecoinvent 2.2 (Version 2.2, Ecoinvent, Zurich, Switzerland, 2007) and the software package SimaPro 8.0.5.13 (Version 8.0.5.13, Pré Consultants: Amersfoort, The Netherlands, 2016) [30]. Fifteen different impact categories were assessed to describe the characteristics of each cement powder recipe.
The goal of this work is to identify among these features the most relevant variables to predict the behavior of Global Warming Potential (GWP). To obtain these results, different models have been implemented, by means of linear regression and variable selection procedures, more specifically, the Akaike Information Criterion. Analogous models are developed also for the other impact categories that quantify emissions to air. A comparative analysis shows that the most important impact category to control GWP and other emissions is represented by the Abiotic depletion-fossil fuels (ADPf).
2. Data and Methods
After a quick introduction of the available dataset, this section provides the reader with the key concepts to investigate the behavior of the Global Warming Potential (GWP) and its connections with the production parameters, as well as a short summary of the statistical methods here employed.
The examined impact categories (ICs) comply with the standard EN 15804 [21]; Table 1 lists their names and units of measure.
The ICs in Table 1 are available for a sample of 193 different recipes to produce 1 kg of gray cement, which can be classified into four types of cement (from CEM I to CEM IV according to [9]):
CEM I (i.e., Portland cement). It is the most impacting type because it contains at least 95% by mass of clinker, with gypsum as a minor additional constituent to control the “setting of cement”;
CEM II (i.e., Portland composite cement). It is composed of clinker in different proportions by mass (65–94%), other main constituents (e.g., blastfurnace slag, silica fume, pozzolana, fly ash, burnt shale or limestone), and gypsum as a minor additional constituent. In the examined panel, CEM II/A-LL and CEM II/B-LL (i.e., Portland limestone cements) are taken into account. The former has 80–94% clinker, 6–20% limestone and 0–5% gypsum by mass; the latter has 65–79% clinker, 21–35% limestone and 0–5% gypsum by mass;
CEM III (i.e., blastfurnace cement). It is composed of 35–64% clinker, 36–65% blastfurnace slag and 0–5% gypsum by mass; and
CEM IV (i.e., pozzolanic cement). It is composed of clinker and pozzolanic constituents (i.e., blastfurnace slag, silica fume, pozzolana, and fly ash). Two types of CEM IV are in the panel: CEM IV/A (65–89% clinker, 11–35% pozzolanic materials, 0–5% gypsum) and CEM IV/B (45–64% clinker, 36–55% pozzolanic materials, 0–5% gypsum).
A preliminary analysis of the correlation matrix, performed to study the relations between variables, is given in Figure 1. Positive and negative correlations are displayed in blue and red, respectively, while the color intensity and the size of each circle are proportional to the absolute value of the corresponding correlation coefficient.
Figure 1 shows that ODP, ADPf and TNRPE are pairwise perfectly correlated, that is, the corresponding correlation coefficients are equal to 1. This implies that each of these variables is an exact increasing linear function of the others, so they carry the same information. For this reason, ODP and TNRPE are discarded in the further analysis. The remaining ICs have been retained and split into two groups, “Emissions” and “Consumption” (Table 2).
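The screening step described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the helper names `pearson` and `drop_perfectly_correlated`, the tolerance, and all numeric values are hypothetical, and only the column labels are borrowed from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def drop_perfectly_correlated(columns, tol=1e-9):
    """Keep the first column of every group whose |r| equals 1 (up to tol)."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < 1 - tol for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy values, NOT the paper's data.
adpf = [3.1, 4.0, 2.5, 5.2, 3.8]
data = {
    "ADPf": adpf,
    "ODP": [2.0e-8 * v for v in adpf],        # pure rescaling of ADPf
    "TNRPE": [1.05 * v + 0.3 for v in adpf],  # rescaling plus an offset
    "El": [0.11, 0.09, 0.12, 0.10, 0.13],     # unrelated toy column
}
kept = drop_perfectly_correlated(data)
print(kept)
```

Keeping the first column of each perfectly correlated group is an arbitrary choice made for the sketch; it happens to mirror the paper's decision to keep ADPf and discard ODP and TNRPE.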
This work will focus on the following questions:
Which ones are the most relevant variables among the Consumption ICs in order to explain the behavior of GWP?
Are different variables important to predict GWP for the four types of cement?
Are the relevant variables to predict GWP also useful to predict other Emission ICs?
The answer to the first question is given by fitting and evaluating a linear regression model linking GWP to the Consumption ICs (Equation (3)):
$$\mathrm{GWP} = \beta_{\mathrm{ADPe}}\,\mathrm{ADPe} + \beta_{\mathrm{ADPf}}\,\mathrm{ADPf} + \beta_{\mathrm{TRPE}}\,\mathrm{TRPE} + \beta_{\mathrm{SRM}}\,\mathrm{SRM} + \beta_{\mathrm{nonRSF}}\,\mathrm{nonRSF} + \beta_{\mathrm{RSF}}\,\mathrm{RSF} + \beta_{\mathrm{Water}}\,\mathrm{Water} + \beta_{\mathrm{El}}\,\mathrm{El} + \varepsilon \qquad (3)$$
where $\varepsilon$ is the so-called noise, a vector of size $n$ = 193 of independent and identically distributed random variables, while $\beta = (\beta_{\mathrm{ADPe}}, \beta_{\mathrm{ADPf}}, \beta_{\mathrm{TRPE}}, \beta_{\mathrm{SRM}}, \beta_{\mathrm{nonRSF}}, \beta_{\mathrm{RSF}}, \beta_{\mathrm{Water}}, \beta_{\mathrm{El}})$ is the vector of parameters to be estimated [31]. In statistics, linear regression is a widely used approach to establish the relationship between the so-called response (in our case GWP in Section 3.2 and Section 3.3, the other Emission ICs in Section 3.4) and a set of explanatory variables (here the Consumption ICs) [32]. The relationship between the response and the explanatory variables is modeled by means of a linear predictor function whose unknown model parameters are estimated from the data. In this work, all the linear models are fit by means of the so-called least squares approach, which minimizes the sum of the squared residuals (the differences between the observed values and the ones predicted by the model). In our case, solving the regression problem produces a vector of estimators $\hat{\beta}$. The most relevant explanatory variables can then be selected by means of the so-called Akaike Information Criterion (AIC) [32], which balances the goodness of fit of a model against the number of its parameters. This procedure results in a model where only the variables whose presence lowers the AIC are retained, while the others are iteratively discarded. A model validation is then performed by means of a 10-fold cross validation procedure [31] to assess and compare the accuracy of the two models. Cross validation evaluates the accuracy of a predictive model by estimating its ability to predict new data. In k-fold cross validation, the original dataset is randomly partitioned into k subsamples of (approximately) equal size. Then, one subsample (the validation data) is used to test the model obtained by using the remaining k − 1 subsamples (the so-called training data). This procedure is repeated k times, each time with a different validation subsample, and the results are averaged, so that all the observations are used for both training and validation. The accuracy of each model is measured by the root mean square error (RMSE), the risk function given by the square root of the average squared difference between the observations and the estimated values [31].
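The k-fold procedure described above can be sketched with the standard library only. This is an illustrative sketch under stated assumptions, not the R code used in the study: the data are synthetic, the models have no intercept (mirroring the parameter vector listed for Equation (3)), and the helper names `solve`, `ols`, `predict` and `kfold_rmse` are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(X, beta):
    """Linear predictor for every row of X."""
    return [sum(b * v for b, v in zip(beta, row)) for row in X]

def kfold_rmse(X, y, k=10, seed=0):
    """Average validation RMSE over k randomly assigned folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    fold_rmse = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        beta = ols([X[i] for i in train], [y[i] for i in train])
        pred = predict([X[i] for i in fold], beta)
        mse = sum((y[i] - p) ** 2 for i, p in zip(fold, pred)) / len(fold)
        fold_rmse.append(math.sqrt(mse))
    return sum(fold_rmse) / k

# Synthetic data (assumption): two informative predictors, one pure-noise one.
rng = random.Random(42)
n = 193
X = [[rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
y = [0.8 * r[0] + 0.3 * r[1] + rng.gauss(0, 0.05) for r in X]

rmse_full = kfold_rmse(X, y)                      # all three predictors
rmse_reduced = kfold_rmse([r[:2] for r in X], y)  # noise predictor removed
print(f"full: {rmse_full:.4f}  reduced: {rmse_reduced:.4f}")
```

Both models are scored on the same folds because the same seed is used, which keeps the comparison between the full and the reduced model fair.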
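As far as the second question is concerned, it is now investigated whether different types of cement influence GWP in different ways and, consequently, whether the accuracy of the statistical model can be improved by fitting a separate regression model for each class. These regression models can be expressed as Equation (4):
$$\mathrm{GWP}_i = \beta_{\mathrm{ADPe},i}\,\mathrm{ADPe}_i + \beta_{\mathrm{ADPf},i}\,\mathrm{ADPf}_i + \beta_{\mathrm{TRPE},i}\,\mathrm{TRPE}_i + \beta_{\mathrm{SRM},i}\,\mathrm{SRM}_i + \beta_{\mathrm{nonRSF},i}\,\mathrm{nonRSF}_i + \beta_{\mathrm{RSF},i}\,\mathrm{RSF}_i + \beta_{\mathrm{Water},i}\,\mathrm{Water}_i + \beta_{\mathrm{El},i}\,\mathrm{El}_i + \varepsilon_i \qquad (4)$$
where $i$ = I, II, III, IV, and solving the regression model produces estimates $\hat{\beta}_i$ of the set of parameters for each class, $i$ = I, …, IV. It is also relevant to check whether some types of cement behave differently in terms of the regression models and relevant variables. Again, 10-fold cross validation procedures [31] are used to compare the results and to verify how accurate each predictive model is.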
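The idea of fitting one regression per cement class can be illustrated with a deliberately simplified sketch: a single explanatory variable with an intercept, and noiseless toy values that are assumptions made for the example rather than measurements. The helper names `fit_line` and `sse` are hypothetical.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def sse(x, y, intercept, slope):
    """Sum of squared residuals of a fitted line."""
    return sum((g - (intercept + slope * a)) ** 2 for a, g in zip(x, y))

# Toy data (assumption): each class follows its own exact line.
adpf_cem1 = [3.0, 3.5, 4.0, 4.5]
adpf_cem2 = [2.0, 2.5, 3.0, 3.5]
data = {
    "CEM I": (adpf_cem1, [0.50 + 0.10 * a for a in adpf_cem1]),
    "CEM II": (adpf_cem2, [0.30 + 0.12 * a for a in adpf_cem2]),
}

# One pooled model for all observations.
all_x = [a for x, _ in data.values() for a in x]
all_y = [g for _, y in data.values() for g in y]
n = len(all_x)
pooled = fit_line(all_x, all_y)
pooled_rmse = math.sqrt(sse(all_x, all_y, *pooled) / n)

# One model per class.
per_class = {name: fit_line(x, y) for name, (x, y) in data.items()}
split_rmse = math.sqrt(
    sum(sse(x, y, *per_class[name]) for name, (x, y) in data.items()) / n
)
print(per_class)
print(pooled_rmse, split_rmse)
```

An in-sample comparison like this one can never favor the pooled model, because the per-class fits contain it as a special case; that is why out-of-sample checks such as the 10-fold cross validation are needed before concluding that the split is worthwhile.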
Regarding the third question, a multiple linear regression using the full set of Consumption ICs is performed for each Emission IC and compared with two alternative models. The first alternative model uses as input variables only the ones selected for GWP. The second alternative model selects a separate set of variables for each Emission IC by means of the AIC criterion. If sufficiently accurate, the first model would allow the producer to focus on the same subset of variables to jointly control all the emissions. Otherwise, the second model establishes which Consumption ICs are relevant to predict emissions other than GWP. In this case too, a 10-fold cross validation procedure has been applied to compare the accuracy of the models. The statistical analysis has been performed within the R environment [33] with the support of additional packages [34,35].
3. Results
In this section, details concerning the performed data analysis are presented and discussed to answer the questions introduced in Section 2. In particular, Section 3.1 includes some preliminary exploratory analysis. Section 3.2 concentrates on Question 1 by studying and comparing two models to predict GWP: the first model is obtained by fitting a linear regression, the second by selecting the most relevant variables by means of the AIC criterion. Section 3.3 is concerned with Question 2. For each type of cement, a linear regression is fit and then the most important Consumption ICs are selected by the AIC criterion. Analogies and differences between the models developed there and the ones in Section 3.2 are then investigated. Section 3.4 is focused on the other Emission ICs and, hence, on Question 3. For each type of cement and for each Emission IC, three different models are studied and then compared. The first model is a linear regression which uses all the available Consumption ICs to predict each emission. The second model is a linear regression where only the variables established in Section 3.2 as relevant to GWP are used. The third model selects the relevant variables for each Emission IC by the AIC criterion. The three models are then examined and compared to establish whether or not the same Consumption ICs can be used to accurately predict all the Emission ICs.
3.1. Exploratory Analysis
First, it is investigated whether GWP is related to the other Emission ICs. All the Emission variables are positively correlated with GWP (i.e., green lines), as shown in Table 3 and Figure 2. Table 4 presents the correlation coefficients of GWP with the Consumption variables, which are displayed in Figure 4.
Having regard to the correlation coefficients, Figure 2 shows that all Emission ICs are positively correlated. Figure 3 shows the correlation between GWP and the Consumption ICs. GWP is strongly positively correlated with ADPf and El and mildly negatively correlated with TRPE and SRM. The results about GWP, El and ADPf comply with the physical meaning of these ICs. Indeed, the more fossil fuels and electricity are consumed, the more greenhouse gases are emitted. On the other hand, the use of secondary fuels (both renewable and non-renewable) reduces the consumption of fossil fuels. The correlation between ADPe and Water is due to the upstream processes (i.e., quarry extraction) of the cement production.
Therefore, the most relevant variables among the Consumption ICs that could explain the behavior of GWP are electricity and fossil fuel consumption. This complies with the Italian energy mix, whose energy consumption is mainly driven by petroleum and other liquids and by natural gas [36]. On the other hand, the correlations between GWP and the other Emission ICs (Table 3) and between GWP and the Consumption ICs (Table 4) justify the international approach of protecting the environment by reducing greenhouse gas emissions. To this end, in 2003 the European Parliament and the Council established the Emissions Trading Scheme [37] to limit or reduce greenhouse gas emissions.
Figure 4 provides the reader with explicit correlation coefficients (in the top-right cells with respect to the main diagonal), an estimation of the density function by a histogram and a kernel density estimation (KDE) (in the main diagonal) and, finally, scatterplots with fitted nonparametric regression lines to stress the relationship between pairs of different variables (in the bottom-left cells with respect to the main diagonal). In the first column, each plot displays values for GWP paired with all the Consumption ICs, while in the first row the correlation coefficients between GWP and the Consumption ICs are listed.
In Figure 4, both x- and y-axis labels refer to the corresponding ICs listed in the main diagonal; their units comply with those listed in Table 1. Therefore, GWP values obtained in the LCA range between 0.6 and 1.0 kg CO2 eq./1 kg of produced cement; ADPe ranges between 1.0 × 10^−7 and 5.0 × 10^−7 kg Sb eq./1 kg of produced cement.
3.2. Linear Regression and Variable Selection for GWP
In this section, two predictive models for GWP are fit. The first model is a linear regression where GWP is the scalar response and the Consumption ICs are the input variables. All variables have been preliminarily normalized to simplify the interpretation. The estimated coefficients are listed in Table 5, together with the related standard deviations (St. dev.) and the p-values of the associated significance tests with their significance levels.
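The normalization step can be illustrated as follows. The text does not say which normalization was applied; the sketch assumes a z-score (subtract the mean, divide by the sample standard deviation), and the helper name `zscore` and the toy values are hypothetical.

```python
import math
import statistics

def zscore(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

# The same hypothetical quantity expressed in two different units.
energy_mj = [3.1, 4.0, 2.5, 5.2, 3.8]
energy_kj = [1000.0 * v for v in energy_mj]

z_mj = zscore(energy_mj)
z_kj = zscore(energy_kj)

# After normalization the unit of measure no longer matters.
same = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
           for a, b in zip(z_mj, z_kj))
print(same)
```

Because normalized variables are unit-free, the sizes of the fitted regression coefficients become directly comparable across ICs that are measured in MJ, kg or m3.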
To develop the second model, an AIC backward selection procedure is then performed on the linear regression to find the best subset of Consumption ICs to accurately predict GWP, leading to the model described in Table 6.
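A backward selection of this kind can be sketched as follows. The exact AIC variant used in the study is not stated; the sketch assumes the usual Gaussian form for least-squares fits, AIC = n ln(RSS/n) + 2k up to an additive constant, where RSS is the residual sum of squares and k the number of coefficients. The data are synthetic and the names `solve`, `rss`, `aic`, `backward_aic` and the column labels are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit on the chosen columns."""
    Xs = [[row[c] for c in cols] for row in X]
    p = len(cols)
    xtx = [[sum(r[i] * r[j] for r in Xs) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(Xs, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(Xs, y))

def aic(X, y, cols):
    """Gaussian AIC up to an additive constant (assumed form, see lead-in)."""
    n = len(y)
    return n * math.log(rss(X, y, cols) / n) + 2 * len(cols)

def backward_aic(X, y, names):
    """Drop one variable at a time for as long as doing so lowers the AIC."""
    cols = list(range(len(names)))
    best = aic(X, y, cols)
    while len(cols) > 1:
        score, drop = min(
            (aic(X, y, [c for c in cols if c != d]), d) for d in cols
        )
        if score >= best:
            break  # no single removal improves the criterion
        best = score
        cols.remove(drop)
    return [names[c] for c in cols]

# Synthetic data (assumption): two informative predictors and one noise column.
rng = random.Random(7)
names = ["x_strong", "x_medium", "x_noise"]
X = [[rng.gauss(0, 1) for _ in names] for _ in range(193)]
y = [0.8 * r[0] + 0.3 * r[1] + rng.gauss(0, 0.05) for r in X]

selected = backward_aic(X, y, names)
print(selected)
```

The informative predictors are always retained here because removing either one inflates the residual sum of squares far more than the penalty of 2 per coefficient can offset; whether the noise column is dropped depends on the random draw.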
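Then, a 10-fold cross-validation procedure is performed to compare the two models. The root mean square errors (RMSE) are computed by Equations (5) and (6):
$$\mathrm{RMSE}_{\mathrm{lin}} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(\mathrm{GWP}_j - \widehat{\mathrm{GWP}}_j(\hat{\beta}_{\mathrm{lin}})\right)^2} \qquad (5)$$
$$\mathrm{RMSE}_{\mathrm{AIC}} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(\mathrm{GWP}_j - \widehat{\mathrm{GWP}}_j(\hat{\beta}_{\mathrm{AIC}})\right)^2} \qquad (6)$$
for the linear and AIC model, respectively. The sample size $n$ = 193 is the number of observations, while $\hat{\beta}_{\mathrm{lin}}$ and $\hat{\beta}_{\mathrm{AIC}}$ are the parameters estimated with the linear model and with the model selected by the AIC criterion, respectively. Recall that the RMSE is a risk function that measures the discrepancy between the observations and the corresponding estimated values. In Table 7, $\mathrm{RMSE}_{\mathrm{lin}}$ is higher than $\mathrm{RMSE}_{\mathrm{AIC}}$; thus, the variable reduction produces a more accurate model.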
Finally, Figure 5 describes the size of each regression slope coefficient after the variable selection. The highest contribution to GWP is given by ADPf. This result is consistent with the release of carbon dioxide into the atmosphere by the burning of fossil fuels [38,39,40].
In answer to Question 1, the most relevant variables among the Consumption ICs to predict GWP are ADPf, NRSF and RSF. The energy-intensive nature of cement manufacturing explains this variable selection: all these ICs quantify the energy, mainly fossil but also alternative, spent in the process. This result is in line with the efforts to implement in the cement sector different management systems, process-integrated techniques and end-of-pipe measures identified as Best Available Techniques (BAT) to obtain environmental benefits (e.g., thermal energy optimization techniques in the kiln system; reduction of electrical energy use; recovery of excess heat from the process and cogeneration of steam and electrical power) [41].
3.3. Linear Regression and Variable Selection for Each Type of Cement
The different types of cement are now studied separately to evaluate their impact on GWP. Figure 6, which contains the scatterplots related to GWP and the Consumption ICs, shows that the points associated with the class CEM I (in blue) are isolated in the GWP scatterplots with respect to the data belonging to the other types. Moreover, the environmental impacts of CEM I are higher than those of the other investigated cement types: both the qualitative and the quantitative observed trends suggest investigating whether predictive models built separately for each class (type of cement) could achieve more accurate predictions for GWP. Furthermore, it is of great interest to check whether the variables that turn out to be important for each separate class differ from the ones selected for the whole dataset.
Figure 6 contains a matrix of scatterplots used to visualize the relationship between pairs of variables, all listed in the main diagonal. For each scatterplot, the variable on the x-axis (y-axis, respectively) can be found in the entry of the main diagonal in the same column (row, respectively). The units of each axis label are listed in Table 1.
The size of each class is given in Table 8.
Due to the small number of observations, CEM III is filtered out.
For each class, a linear regression is fit, where GWP corresponds to the scalar response and the ICs to the explanatory variables. The estimated regression coefficients for CEM I, II, and IV are listed in Table 9, Table 10 and Table 11, respectively.
Then, AIC backward selection procedures yield the models for the classes CEM I, II and IV described in Table 12, Table 13 and Table 14, respectively.
A 10-fold cross-validation procedure shows that the root mean square error (RMSE) related to the AIC selection model is lower than the one computed for the full linear regression model (Table 15).
To answer Question 2, in this case too a variable selection procedure leads to a more accurate model. Furthermore, different variables are selected as relevant depending on the type of cement. For the class CEM I, the ICs TRPE and NRSF are discarded, whereas for the classes CEM II and CEM IV the ICs rejected by the AIC criterion are Water and El, so that the same ICs are selected for both classes. These ICs are also consistent with the ones chosen for the whole dataset (except for SRM). The model for the class CEM I is characterized by a different set of variables and associated sizes, including Water and El (Figure 7a–d shows the size of the coefficients for the aggregate data and for each class, respectively). This also confirms the qualitative interpretation of Figure 6, where the data belonging to CEM I were mostly isolated.
The results for CEM I (Figure 7b) differ from those for CEM II and CEM IV (Figure 7c,d) due to its composition (i.e., at least 95% by mass of clinker and not more than 5% by mass of gypsum according to [9]), while the other cement types contain less clinker and other main constituents.
3.4. Other Emission Variables
This section focuses on producing accurate models for the other Emission ICs. The first model fit here is a multiple linear regression model containing all the emissions as target variables. The results are collected in Table 16, Table 17 and Table 18 for CEM I, II and IV, respectively. Since in Section 3.3 subsampling the data with respect to the type of cement has led to a more accurate model, in this section too each type of cement is examined separately.
For each class, the second model is a linear model which exploits only the variables established in Section 3.3 as relevant to predict GWP. The models are summarized in Table 19, Table 20 and Table 21 for CEM I, II, and IV, respectively.
A third model is finally produced by applying the AIC criterion separately to each Emission IC for each class. Each of these models thus selects the most important variables for the corresponding IC. The results are summarized in Table 22, Table 23 and Table 24 for CEM I, II, and IV, respectively.
A 10-fold cross-validation procedure shows that the root mean square error (RMSE) computed for the AIC selection model built for each Emission IC is lower than both the one related to the full linear regression model and the one associated with the variables selected to predict GWP. The last two RMSEs are comparable with each other (Table 25).
In answer to Question 3, the results in Table 25 highlight that, while in general using the same subset of Consumption ICs relevant to predict GWP does not improve the accuracy of the linear model, performing a variable selection procedure separately for each Emission IC leads to a meaningful improvement in the predictive accuracy of the model.
Figure 8 confirms that ADPf plays a pivotal role for the majority of the Emission ICs, that is, AP, EP, POCP and P (CEM I and CEM IV). CEM II differs from CEM I and CEM IV due to its limestone-based composition; in particular, POCP for CEM II has its highest correlation with the Water consumption IC. This is consistent with the upstream processes necessary to quarry natural raw materials.
4. Discussion
Due to its dependence on fossil fuels and the calcination of raw materials, the cement industry generates about 5% of global greenhouse gas emissions. Within this framework, several efforts are ongoing to protect the environment and increase energy efficiency using renewable resources or alternative fuels. In order to analyze comparable environmental performances, cement companies are conducting life cycle assessments of their “from cradle to gate” processes to identify the best strategies to meet the need for sustainable development.
In this study, having regard to the European approach compliant with the standard EN 15804, the environmental impacts of 193 different recipes of gray cement produced in Italy from 2014 to 2019 have been assessed. Fifteen different impact categories have been considered and split into two classes, “Emissions” and “Consumption”. One of the main results of this work concerns the identification of the significant Consumption ICs to predict the behavior of the Emission ICs. In particular, the aim of this paper is to answer the following questions:
Which ones are the most relevant variables among the Consumption ICs in order to explain the behavior of GWP?
Are different variables important to predict GWP for the four types of cement?
Are the relevant variables to predict GWP also useful to predict other Emission ICs?
As far as Question 1 is concerned, it is shown that the most important variable to predict the behavior of GWP is ADPf (Figure 8), while NRSF and RSF are the two other most relevant consumption variables. To answer Question 2, a more accurate model is produced by fitting a linear regression and applying the AIC criterion separately for the different types of cement (i.e., CEM I, CEM II, CEM IV). In this case too, ADPf proves to be the most important Consumption IC. However, the scatterplots related to GWP and the Consumption ICs show that the environmental performances of CEM I differ from those of the other types, and their values are higher. Predictive models built separately for each type of cement yielded more accurate predictions for GWP. Finally, concerning Question 3, the authors investigated whether the relevant variables to predict GWP could also predict the other Emission ICs. In this case, it is shown that fitting separate regression models and selecting the most important variables for each of them leads to more accurate predictions for all the other Emission ICs (Table 25) in comparison to the standard linear model or the one which uses the same Consumption ICs as for GWP. Here as well, ADPf is confirmed to be a strong predictor in the models related to the emission variables AP, EP, POCP (for CEM I and CEM II), and P (for CEM I and CEM IV). Therefore, the obtained results underline the need for policies and strategies that could reduce the consumption of energy, from both fossil and secondary fuels, and justify the European policies on emission trading and Best Available Techniques to be implemented in the cement industry.