Extreme Gradient Boosting-Based Machine Learning Approach for Green Building Cost Prediction

Alshboul, Odey; Shehadeh, Ali; Almasabha, Ghassan; Almuflih, Ali Saeed

doi:10.3390/su14116651


¹ Department of Civil Engineering, Faculty of Engineering, The Hashemite University, P.O. Box 330127, Zarqa 13133, Jordan

² Department of Civil Engineering, Hijjawi Faculty for Engineering Technology, Yarmouk University, Shafiq Irshidat St., Irbid 21163, Jordan

³ Department of Industrial Engineering, King Khalid University, King Fahad St., Guraiger, Abha 62529, Saudi Arabia

* Author to whom correspondence should be addressed.

Sustainability 2022, 14(11), 6651; https://doi.org/10.3390/su14116651

Submission received: 30 April 2022 / Revised: 25 May 2022 / Accepted: 25 May 2022 / Published: 29 May 2022


Abstract

Accurate building construction cost prediction is critical, especially for sustainable projects (i.e., green buildings). Green building construction contracts are relatively new to the construction industry, where stakeholders have limited experience in contract cost estimation. Unlike conventional building construction, green buildings are designed to utilize new technologies to reduce their operations’ environmental and societal impacts. Consequently, green buildings’ construction bidding and awarding processes have become more complicated due to difficulties forecasting the initial construction costs and setting integrated selection criteria for the winning bidders. Thus, robust green building cost prediction modeling is essential to provide stakeholders with an initial construction cost benchmark to enhance decision-making. The current study presents machine learning-based algorithms, including extreme gradient boosting (XGBOOST), deep neural network (DNN), and random forest (RF), to predict green building costs. The proposed models are designed to consider the influence of soft and hard cost-related attributes. Evaluation metrics (i.e., MAE, MSE, MAPE, and $R^{2}$) are applied to evaluate and compare the developed algorithms’ accuracy. XGBOOST provided the highest accuracy of 0.96 compared to 0.91 for the DNN, followed by RF with an accuracy of 0.87. The proposed machine learning models can be utilized as a decision support tool for construction project managers and practitioners to advance automation as a coherent field of research within the green construction industry.

Keywords:

green buildings; cost prediction; machine learning; extreme gradient boosting (XGBOOST); deep neural network (DNN); random forest (RF)

1. Introduction

With the growing demand for green buildings worldwide, it has become necessary to develop a new adequate research field to create practical evaluation approaches for green building bidders to guarantee that the selected bid winner has practical experience and knowledge of all the required stages, which are vital to finish such projects with the required time, cost, quality, safety, and environmental aspects. In addition, influential research ensures the ability to successfully leave space for the green construction industry to cope with emerging technologies [1]. Thus, accurate construction cost prediction models need to consider all influential attributes. Considering that evaluating the cost of traditional buildings is utterly different from that of green buildings designed to be environmentally friendly through the production of zero greenhouse gases, it is necessary to design cost forecasting models differently and innovatively [2,3,4]. Although the current literature contains various cost prediction models for traditional buildings, minimal efforts have been directed toward green building cost estimation modeling. Green construction cost biddings contain more detailed requirements when compared to traditional ones. Having such detailed comprehensiveness makes it complicated for stakeholders to use the rule of thumb for cost estimation [5]. Consequently, various indicators should be considered when selecting the bid winner, including green building management, environmental aspects, material use, water use and water protection, land and construction site protection, and energy use.

The commonly adopted procedure in bid winner selection is the lowest price. Such an approach may be adopted with minimal adverse impact on traditional building construction, yet it is hazardous to employ such an approach for green building construction bidding awards. Thus, improving the policy of the public and private sectors for selecting bid winners is crucial for maximizing construction quality and the added value [6]. Furthermore, adopting robust and accurate cost prediction methodologies is expected to strengthen the competition between the bidders and enhance the awarding process to ensure successful green construction. Therefore, effective construction cost forecasting has positive practical implications on economic, social, and environmental levels.

The inability to appropriately manage the selection of the bid winner results in significant delays in project delivery; hence, the bid winner must be carefully picked [7]. The current study opens up new possibilities for developing innovative and integrated models for green building cost prediction that consider various influencing factors and may be used in the bidding awarding procedures to reduce the financial and legal conflicts among contractual parties for such projects. Construction project delivery techniques are critical in forecasting bidding costs and, accordingly, in updating the bid winner selection policy by adhering to regulations, assizes, and guidelines to avoid patronage. Consequently, in forecasting the total cost of green building construction projects, it is challenging to establish a suitable balance in the bid cost in anticipation of the actual cost. As a result, a new method incorporating the primary aspects influencing the cost of green building construction should be developed. In this study, a machine learning-based model for predicting green building construction costs was developed, which is critical for paving the way to selecting the best bidder to fulfill the required conditions, expanding the benefits of green construction, and accommodating rapid changes along with future change orders.

2. Literature Review

Green buildings are designed to meet current and future generations’ needs in protecting planet Earth. Since the nineteenth century, the demand for such a green construction approach has become necessary for efficient pollution reduction, dynamic use of materials, and more social inclusiveness by reducing the ripple effect of the construction process and maintaining environmentally friendly building operations [8]. Several green building design approaches have been proposed to simplify green building deliveries (e.g., LEED and the Building Research Establishment Environmental Assessment Method). For instance, in 2021, there were more than 100,000 LEED-certified green buildings in the United States and 69,800 in Canada [9,10]. Public construction biddings are part of the public procurement structure, which accounts for a significant portion of public expenses every year, estimated to be about ten percent of total incomes in most countries in North America [11].

In most traditional bids, the winner is determined based only on the bid price, following the law; this is called the lowest-priced bid. If such a winner cannot perform the bid, the bid is awarded directly to the second-lowest price [12]. This is the second type of bidding adopted in the market. This approach increases the bid price compared to the lowest-bid approach [13]. These approaches are impractical for the owner, as all bidders try to make their estimates lower than those of other bidders and lower than the actual cost to win the bid [14]. Most private and public firms adopt such an approach in bidding in the United States and European countries [15,16]. However, applying the lowest price approach in bidding creates various issues regarding work quality, construction delays, disputes, and claims [17,18].

Though the lowest price approach is the best for cost reduction purposes and is widely used in the construction industry, it is a recipe for conflict, primarily when intense competition exists [19,20,21]. For instance, in Turkey, most contractors who have won construction bids via a lower-price approach have faced trouble in the construction project delivery. In one study, about 430 questionnaires were analyzed after being sent to contractors in Turkey. The results indicated that such consequences have occurred because contractors are trying to continue in the market regardless of inaccurate bidding and little experience in contract pricing [22].

Change order costs also increase when the lowest-price approach is adopted, especially in green building construction. In addition, the “multiattribute” is used at the best price in the construction industry to achieve the best value [23,24]. Another approach utilized is the average bid, where the winner is selected by applying a single criterion. First, the price is compared with the average of all the bidders’ prices and then the one that is closest to the average bid is adopted [25,26,27]. The Peruvian approach has also been applied to construction project bids by removing outliers’ bid values that have a price increase or decrease of 10% of the actual average bid price from the submitted list. The new average is determined in the next step to choose the bidder that meets both conditions closest to and below the new average price [28]. Bidders are classified based on several aspects, such as quality, profitability, leverage, and expertise, to determine who is eligible to win the bid. A neural network technique was employed in the same vein to decide on such a concept [29]. In addition, several experimental methods have been used for bidding selection. For instance, the non-competitive method was implemented in construction bids, and various identification strategies were evaluated experimentally [30]. Additionally, an analytical model was developed based on game theory to address construction project claims and opportunistic bidding [26].

Moreover, risk possibility and competition for projects were measured using a case-based reasoning method [31]. Using general regression and classification networks, choosing the fittest bid closest to the actual construction project price was studied [15]. The bid scoring formula is a realistic approach to selecting the winner and is still valid for providing a promotional result that meets the needs of the owner and contractor [32].

Green building bids are converting from traditional contracts in choosing a winner to smarter contracts that meet all the sustainability requirement changes in public bidding [33,34]. The design-build (DB) approach has been used in green and traditional buildings. It maintains a high integration of the design stages for sustainability in green buildings [5]. DB is a practical approach to incentivize bidders and make them more familiar with maintaining sustainable goals during the construction and design processes [35,36,37]. The dynamic and practical connection between the contractor and the design group is a significant feature of improving the concept of all aspects of green buildings, such as quality, cost, and time [33,38,39]. Compared to the traditional approach, such as the low-bid approach to choosing a winner, DB seems to have significant advantages for sustainable green building objectives [40,41]. Many approaches have been improved since the 1990s to find a rating approach to handle green buildings properly [42]. To achieve a comprehensive vision of all aspects of the bid details and to meet the required conditions written in the contract to satisfy all parties, when determining the bid winner, it is vital to consider the quality, environment, technical aspects, reputation, and price. This approach ensures that more work and effort is put into enforcing adequate and acceptable criteria for choosing the winner to attain the optimal advantage [16,41,42]. Many factors must be considered for the conceptual phase in the design and planning stages of the green building to acquire practical work [43]. In addition, effective green building design must employ efficient natural and ecologic resource management processes [44]. To fulfill such a need, selecting the best bidder is required.

3. Methodology

The main aim of this study was to develop a forecast model for providing accurate green building project costs. Data collection, feature selection, machine learning algorithm formulation, and model evaluation are the four major stages of the suggested methodology. The methodology flowchart is illustrated in Figure 1.

3.1. Data Collection

General information on the initial costs of green building construction was gathered from various sources, including journals, the green building council website, and other related websites. The original cost data for the 283 LEED-certified green buildings under consideration (placed at various locations across North America) were gathered from various sources and fed into the constructed models. Some of the primary data sources for this study were the websites of green building councils in the United States and Canada [9,10]. Such data were captured and exported into MS Excel sheets to prepare for the analytical procedure. The information was gathered between 2010 and 2020. However, about two years were required to ensure that the obtained data were enough and usable for the ML approach. Emphasis was placed on quantifying and comparing the economic performance of each building in the dataset. The data were modified depending on location to create consistent results for comparison. Furthermore, the designs of the erected structures were compared to make future cost forecasts and comparisons.

3.2. Factor Selection

Hard and soft costs are associated with construction projects [44]. Hard costs are expenses associated with land ownership, such as the transfer of ownership, land purchase, and site clearance. In addition, hard costs include the actual and direct costs of building implementation, such as civil and structural works, architectural work, and other physical construction activities [45]. Soft costs are indirect expenses associated with the non-physical parts of a building project, such as administration, planning, documentation, and marketing [46]. Soft expenses are linked with activities outside the scope of building costs and typically vary from 1 to 5% of the total construction costs [47]. In summary, hard costs are associated with construction, while soft costs are associated with design and certification services. Thus, soft expenses are any costs that are not directly related to the building cost.

Various studies have been conducted to determine if the cost of green projects will increase or decrease when green elements are introduced to fulfill green building standards compared to regular structures [48,49]. Green buildings are estimated to cost 5 to 10% more than regular structures. According to certain studies, the costs of green buildings increase by 1 to 2%. However, according to another study, the cost of green buildings is raised by less than 2% compared to typical structures [50,51]. The inconsistent findings complicate the investigation into green building costs, and therefore, this matter is a concerning issue. In conclusion, green building costs can be separated into the main types (i.e., hard and soft costs). These types have been further divided into categories, as shown in Table 1.

The factors that influence the cost of green building projects are listed as follows: people, technical, technological, and specific requirements, and external support. For more detail, Table 2 presents a description of these features. All features impact the total building project cost in terms of hard and soft costs. In addition, Figure 2 is included to provide a better view of these features that affect the overall cost of green building projects.

3.3. Data Preprocessing

The use of ML algorithms requires data preparation. Variable selection selects variables that will be relevant for predicting green building performance. Outlier cleansing, data noise reduction, normalization, and standardization are part of pre-processing. The information gathered includes numerical values for all variables. Once the proper variables have been chosen, pre-processing is necessary, including outlier removal and dataset normalization. Outliers are removed from the gathered data in this initial data preparation phase. This study used interquartile ranges to identify extreme and outlier results. Appropriate graphical methods, such as boxplots for outlier elimination, were also used. Null indicators were also used to represent and remove missing values from the obtained data. When data points were missing from the original database, reliability difficulties arose. As a result, missing values in the dataset (values represented by the “Null” or “-” indicators) were considered. Pre-processing was necessary for a small amount of missing data (i.e., 9 missing values representing about 3.2% of the original database). These missing data points were replaced with the average and median values of relevant properties. The statistical analysis of the utilized datasets is listed in Table 3. In addition, Figure 3 depicts the data normality presentation for the main features of the green building cost prediction, which is vital for determining how well the distribution curve fits the collected datasets.
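The interquartile-range screening and missing-value replacement described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the 1.5 × IQR fence multiplier is the common Tukey convention and is an assumption here, only median replacement is shown, and the `costs` values are hypothetical.

```python
import statistics


def impute_median(column):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in column]


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.
    k = 1.5 is an assumed convention, not a value stated in the study."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


# Hypothetical cost figures with one missing value and one extreme value.
costs = [210.0, 225.0, None, 240.0, 232.0, 218.0, 950.0, 228.0]
filled = impute_median(costs)
cleaned = drop_outliers(filled)
```

In this toy column the missing entry is filled with the median of the observed values, and the extreme value falls outside the upper fence and is dropped.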

3.4. Machine Learning Algorithm

Three ML techniques were employed in this study to estimate the green building costs. Training and testing processes were also conducted to check the suggested ML algorithm’s efficiency. The training applied 70% of the database to train the proposed model, and the testing part applied 30% of the database to carry out the test process. We utilized 5-fold cross-validation to ensure the robustness and effectiveness of the suggested prediction models.

This study used three prediction algorithms (i.e., XGBOOST, DNN, and RF). These are the most contemporary and efficient machine learning-based prediction algorithms available. A review was conducted of the many ML prediction algorithms published in the literature. These three methods were chosen for various reasons, including the fact that they are scalable, accurate, relatively quick, versatile, and give regularized model formalization to control overfitting [56,57,58].

3.4.1. Extreme Gradient Boosting (XGBOOST)

XGBOOST is a scalable tree optimization machine learning technology that has recently been widely used in data analysis disciplines. The XGBOOST technique was proposed as a one-of-a-kind applied gradient boosting machine, particularly in regression and classification trees. The “boosting” concept is the root of XGBOOST, which merges the forecasting of weak learners with additive training methods to develop a strong learner. In addition, this process helps to avoid overfitting and improves mathematical ability. The XGBOOST architecture is shown in Figure 4, where the objective functions are simplified by allowing the prediction and regularization terms to be combined while preserving the fastest possible processing speed. The general function of the forecasting is set up at step $p$, as shown in Equation (1):

$$f_{i}^{(p)} = \sum_{k = 1}^{p} f_{k} (x_{i}) = f_{i}^{(p - 1)} + f_{p} (x_{i}) \qquad (1)$$

where $f_{p} (x_{i})$ denotes the learner at step $p$, $f_{i}^{(p)}$ denotes the prediction at step $p$, $f_{i}^{(p - 1)}$ denotes the prediction at step $p - 1$, and $x_{i}$ denotes the input features.
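The additive update in Equation (1) can be illustrated with a small sketch in which each new learner is fitted to the residuals of the running prediction and then added to it. This is plain gradient boosting with squared-error loss and one-split stumps, which is an assumption made for brevity; it is not the XGBOOST implementation, which also uses second-order information and regularization. The function names and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares.
    Assumes xs contains at least two distinct values."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, steps=20):
    """Build the ensemble additively: f^(p) = f^(p-1) + f_p(x)."""
    learners = []
    preds = [0.0] * len(xs)  # f^(0): start from a zero prediction
    for _ in range(steps):
        residuals = [y - f for y, f in zip(ys, preds)]
        stump = fit_stump(xs, residuals)  # weak learner f_p
        learners.append(stump)
        preds = [f + stump(x) for f, x in zip(preds, xs)]
    return learners, preds


def predict(learners, x):
    """Prediction at step p is the sum of all learners up to p."""
    return sum(f(x) for f in learners)


# Toy one-feature data (hypothetical values).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [10.0, 12.0, 15.0, 30.0, 33.0, 35.0]
learners, fitted = boost(xs, ys)
```

Each pass through the loop corresponds to one step $p$ in Equation (1): the previous prediction is kept and only the new learner's output is added.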

To keep overfitting under control without reducing the model’s computational speed, XGBOOST uses the analytical formula shown in Equation (2) to estimate the model’s “goodness” for the original function:

$$Objective^{(p)} = \sum_{i = 1}^{n} l ({\bar{y}}_{i}, y_{i}) + \sum_{k = 1}^{p} \sigma (f_{k}) \qquad (2)$$

where $l$ denotes the loss function, $n$ denotes the number of observations utilized, and $\sigma$ denotes the regularization term, as represented in Equation (3):

$$\sigma (f) = \gamma T + 0.5 \lambda \lVert \omega \rVert^{2} \qquad (3)$$

where $\omega$ is the vector of scores in the leaves, $T$ is the number of leaves in the tree, $\gamma$ is the minimal loss reduction necessary to divide a leaf node further, and $\lambda$ is the regularization parameter.
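As a worked illustration of Equations (2) and (3), the sketch below evaluates the penalty for a tree described only by its leaf scores and adds it to a loss term. It reads T as the number of leaves and ω² as the sum of squared leaf scores, following the standard XGBoost formulation. The squared-error loss and all numeric values are assumptions for illustration, since the study does not state the loss function used.

```python
def regularization(leaf_scores, gamma, lam):
    """Penalty for one tree: gamma * T + 0.5 * lam * sum(w^2),
    where T is the number of leaves and w are the leaf scores."""
    n_leaves = len(leaf_scores)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_scores)


def objective(y_true, y_pred, trees, gamma=1.0, lam=1.0):
    """Loss over all observations plus the penalty of every tree.
    Squared error is an assumed choice of loss l."""
    loss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    penalty = sum(regularization(scores, gamma, lam) for scores in trees)
    return loss + penalty


# Hypothetical example: one tree with two leaves.
penalty = regularization([0.5, -0.5], gamma=1.0, lam=2.0)
total = objective([1.0, 2.0], [1.0, 1.0], [[0.5, -0.5]], gamma=1.0, lam=2.0)
```

Larger values of γ penalize trees with many leaves, and larger values of λ shrink the leaf scores, which is how the regularization term discourages overly complex trees.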

3.4.2. Deep Neural Network (DNN)

DNN is a subset of deep learning algorithms that employ multi-layered neural network training and testing to learn complex structures and achieve appropriate abstraction levels. The dataset is directed in one direction within the network, passing the four hidden layers [57]. Such movement improves the memory of the neural network to process sequential data naturally. The DNN technique has two stages: The first stage (training) is applied to optimize the network parameters to complete the expected goal (prediction). The second stage (testing) assesses whether the trained model can process a new dataset. The six layers in this study (i.e., the input layer, four hidden layers, and the output layer) were employed to retain the best prediction model accuracy, as presented in Figure 5.

Unique weights were assigned to the inputs by the activation function (rectified linear unit (ReLU)) [59]. Then, these weights were associated with the model’s variables to reduce the error between the observed and predicted values. As a result, the hidden layers were employed after the prediction was run to extract a new raw descriptor representation, as shown in Equation (4):

$$x_{i + 1} = f (w_{i} x_{i} + b_{i}), \quad i = 1, 2, \dots, I \qquad (4)$$

where $f$ denotes the activation function, $w_{i}$ denotes the weight matrix, and $b_{i}$ denotes the bias of the $i$-th hidden layer. The parameters were chosen based on ReLU, as presented in Figure 6.
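The layer-by-layer propagation in Equation (4) can be sketched as a small forward pass. The weights, the single hidden layer in the example, and the linear output layer are illustrative assumptions; the network in the study has four hidden layers whose trained parameters are not given here.

```python
def relu(vector):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in vector]


def dense(weights, bias, x):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def forward(layers, x):
    """Apply x_{i+1} = f(W_i x_i + b_i) through the hidden layers.
    The last layer is left linear, an assumed choice for regression."""
    for weights, bias in layers[:-1]:
        x = relu(dense(weights, bias, x))
    weights, bias = layers[-1]
    return dense(weights, bias, x)


# Hypothetical network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),  # hidden layer (W_1, b_1)
    ([[2.0, 1.0]], [0.5]),                     # output layer
]
output = forward(layers, [3.0, 1.0])
```

For the input `[3.0, 1.0]`, the hidden activations are `[2.0, 1.0]` and the output is `[5.5]`. Swapping the inputs makes the first hidden unit negative before ReLU, so it is clipped to zero.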

3.4.3. Random Forest (RF)

This algorithm has been frequently utilized in data mining applications to cope with classification and regression issues [60]. RF is a classification and regression technique that uses a set of trees, each of which is built from a bootstrap sample of the data. At each split, the candidate set of variables for tree building is chosen at random. Bagging is the second mechanism, which is used to integrate unstable learners successfully. RF is a robust algorithm with several merits, such as forecasting accuracy, dealing with many features, fast simulation speed, high performance, and free applications [58]. In this study, two primary indicators (i.e., the number of grown trees and leaf size) determined the accuracy and proficiency of the RF. The number of grown trees and leaf size ranged from 0 to 3000 and 1 to 20, respectively. The model became more stable with reliable results in the case of 400 grown trees and a leaf size of two. In addition, in this study, “bootstrap samples” ($S_1, S_2, \dots, S_n$) were created from the original dataset, and then these random samples were used to construct trees ($R_1, R_2, \dots, R_n$). Finally, these trees were combined, as shown in Figure 7.
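The bootstrap-and-combine step can be sketched as plain bootstrap aggregation (bagging). To keep the example short, the base learner is a placeholder that predicts the mean target of its sample. This is an assumption standing in for a regression tree, so the sketch shows only the sampling and averaging, not the random choice of variables at each split. The data and function names are hypothetical.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) rows with replacement from the original dataset."""
    return [rng.choice(data) for _ in data]


def mean_learner(sample):
    """Placeholder base learner (stand-in for a tree): predicts the
    mean target of its bootstrap sample for any input."""
    mean_target = sum(y for _, y in sample) / len(sample)
    return lambda x: mean_target


def bagged_prediction(data, fit, x, n_learners=400, seed=0):
    """Fit one learner per bootstrap sample and average their outputs."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_learners):
        sample = bootstrap_sample(data, rng)   # S_1, S_2, ..., S_n
        model = fit(sample)                    # R_1, R_2, ..., R_n
        outputs.append(model(x))
    return sum(outputs) / len(outputs)


# Hypothetical (feature, cost) rows.
data = [(1.0, 200.0), (2.0, 220.0), (3.0, 260.0), (4.0, 310.0)]
estimate = bagged_prediction(data, mean_learner, x=2.5)
```

The default of 400 learners mirrors the number of grown trees reported above. Averaging many learners trained on different resamples reduces the variance of any single unstable learner.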

4. ML Model Results

4.1. Experimental Setup

The K-fold cross-validation process was applied to assess the model’s accuracy by examining the ML algorithm performance on various subsets of the data. In addition, the model hyperparameters were tuned using a K-fold cross-validation technique. First, the database must be separated into subsets for training and testing the ML modeling process. During this process, the training dataset is partitioned into k smaller portions (folds), hence the term ‘K-fold’. Then, testing is done with one fold, while training is done with the remaining k-1 folds. Both are drawn from a random partition of the dataset. The prediction model is then fitted to the training set using the best possible hyperparameter configuration. Consequently, each fold is only utilized as a validation set once. Finally, the accuracy measures for each fold may be compared, and if they are similar, the model is likely to generalize well, as shown in Figure 8.
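The fold construction described above can be sketched as follows: indices are shuffled once, split into k folds, and each fold serves as the validation set exactly once while the remaining k-1 folds form the training set. The function name, the seed, and the sample count of 20 are illustrative assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train_indices, validation_indices) pairs.
    Each sample index lands in exactly one validation fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    splits = []
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        splits.append((train, validation))
    return splits


splits = k_fold_indices(n_samples=20, k=5)
for train, validation in splits:
    # A model would be fitted on `train` and scored on `validation` here.
    pass
```

Comparing the validation scores across the k splits gives the fold-to-fold consistency check mentioned in the text.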

4.1.1. Hyperparameter Optimization

The hyperparameters of the proposed ML algorithms used in this study had to be tuned, as shown in Table 4. These hyperparameters were modified depending on the actual dataset rather than the manual determinations. Thus, the investigation was carried out with k = 1 to k = 10 for K-fold cross-validation. Each k represents the grid search for the optimum ML model selection and hyperparameter tuning. As a result, five-fold cross-validation had the best prediction accuracy, as discussed in the performance evaluation.

4.1.2. Feature Importance Analysis

Pearson’s correlation process was conducted between the selected features and the cost values of green building projects to assess the influence of these features on each other and the observed values, as presented in Figure 9.
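For reference, the Pearson correlation coefficient between one feature and the cost values can be computed as in the sketch below. The function name and the sample values are hypothetical; the coefficients actually obtained in the study are those reported in Figure 9.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the
    standard deviations (the 1/n factors cancel out)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical feature scores and cost values.
feature = [1.0, 2.0, 3.0, 4.0]
cost = [2.0, 4.0, 6.0, 8.0]
r = pearson(feature, cost)
```

A value near +1 or -1 indicates a strong linear relationship between the feature and cost, while a value near 0 indicates little linear association.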

Better insight and understanding of the model’s features help decision-makers plan and formulate policies effectively. As a result, a feature importance process was carried out using the XGBOOST, DNN, and RF techniques to identify the importance degree of each feature included in forecasting green building costs. As illustrated in Figure 10, the feature scale plot was implemented to calculate a relative score for each variable. In addition, as presented in Figure 10, the features were ranked in descending order of importance: people, technological, technical, and specific requirements.

4.2. Performance Evaluation

After testing the primary model assumptions, it was vital to evaluate the suggested models’ usefulness and predictive capability. Thus, the assessment measurements were utilized to evaluate the proposed models’ proficiency. Four statistical measures (i.e., RMSE, MAE, MAPE, and $R^{2}$) were employed to investigate the efficiency of the suggested ML models, as presented in Equations (5)–(9).

$$MAE = \frac{1}{m} \sum_{i = 1}^{m} \left| Y_{i} - \bar{Y_{i}} \right| \qquad (5)$$

$$RMSE = \sqrt{\frac{1}{m} \sum_{i = 1}^{m} {(Y_{i} - \bar{Y_{i}})}^{2}} \qquad (6)$$

$$MAPE = \frac{1}{m} \sum_{i = 1}^{m} \left| \frac{Y_{i} - \bar{Y_{i}}}{Y_{i}} \right| \times 100 \qquad (7)$$

$$R^{2} = 1 - \frac{\sum_{i = 1}^{m} {(Y_{i} - \bar{Y_{i}})}^{2}}{\sum_{i = 1}^{m} {(Y_{i} - \bar{Y})}^{2}} \qquad (8)$$

$$R_{Adjusted}^{2} = 1 - \left[ \frac{\sum_{i = 1}^{m} {(Y_{i} - \bar{Y_{i}})}^{2}}{\sum_{i = 1}^{m} {(Y_{i} - \bar{Y})}^{2}} \right] \times \left[ \frac{(m - 1)}{(m - d - 1)} \right] \qquad (9)$$

where $Y_{i}$ symbolizes the actual (measured) green building cost values, $\bar{Y_{i}}$ symbolizes the forecasted outcome, $\bar{Y}$ symbolizes the mean of the $Y_{i}$, $m$ symbolizes the number of data points utilized, and $d$ is the number of independent variables. The model accuracy is increased if the $R^{2}$ value approaches 1 and the RMSE, MAE, and MAPE values approach 0. A set of random, nonoverlapping partitioned folds were used as training and test datasets for k = 3, k = 5, and k = 7, together with their corresponding performance measures. Therefore, the effectiveness of the suggested ML models was assessed utilizing a stratified five-fold cross-validation technique, as shown in Table 5.
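The evaluation measures in Equations (5)–(9) translate directly into code. The sketch below is a plain-Python illustration with hypothetical actual and predicted values; the adjusted measure takes the denominator of Equation (8), i.e., the total sum of squares about the mean, and treats d as the number of independent variables.

```python
import math


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be non-zero."""
    return 100 * sum(abs((a - p) / a)
                     for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(actual, predicted, d):
    """d is the number of independent variables; requires m > d + 1."""
    m = len(actual)
    return 1 - (1 - r_squared(actual, predicted)) * (m - 1) / (m - d - 1)


# Hypothetical actual and predicted costs.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
scores = {
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
    "R2": r_squared(actual, predicted),
    "R2_adj": adjusted_r_squared(actual, predicted, d=1),
}
```

For these toy values every prediction is off by 10, so MAE and RMSE coincide, while MAPE weights the same absolute error more heavily for the smaller actual values.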

The comparison of the efficiency of ML algorithms (i.e., XGBOOST, DNN, and RF) to predict the cost of the green building projects was implemented for k = 5. Accordingly, the results of the evaluation measures were computed, as shown in Table 6.

The comparison outcomes show that XGBOOST and DNN had higher $R^{2}$ values (more than 0.90) and lower MAE, RMSE, and MAPE values than the RF model in predicting green building costs. All assessment metrics results also revealed that XGBOOST had excellent prediction capability and had the highest $R^{2}$ value. Furthermore, XGBOOST had lower values for the rest of the evaluation metrics (i.e., MAE, RMSE, and MAPE) compared to the DNN model, as shown in Figure 11. Moreover, the forecasted outputs of the XGBOOST model illustrate that its prediction values were very close to the values of green building project costs. It is worth noting the progress of the $R_{Adjusted}^{2}$ estimate for each developed model. The XGBOOST model’s $R^{2}$ value was 0.96, implying that the XGBOOST model fitted the datasets well since it was close to 1. Given that the forecast $R^{2}$ was considerably superior to the regular $R_{Adjusted}^{2}$, this means that the XGBOOST model did forecast new observations and fit the existing dataset. Therefore, the XGBOOST model had a better fit and slight deviation from the actual values of the green building costs. Consequently, XGBOOST was the most effective and competent model for predicting green building costs.

4.3. Experimental Results

The current study was designed to expand and augment the literature in forecasting green building costs. First, the four main features that affect the cost of green buildings were thoroughly investigated and disaggregated into their primary sub-attributes. Then these factors were evaluated according to their acquired data record by developing machine learning-based prediction models in combination with accuracy evaluation metrics to focus on the uncertainty coupled with the cost forecasting. The current research findings reveal that people primarily affect green building cost features, followed by technological, technical, and special requirements, which implies that spreading the “green” culture amongst involved personnel is critical to minimizing the construction cost. In addition, green building contractors need to utilize cutting-edge technologies that can facilitate the deployment of efficient technical approaches necessary for cost objective optimization purposes, reflecting the importance of applying prediction models to produce accurate cost predictions. In line with the adopted methodology, the results reveal the significance of examined attributes and their sub-categories, such as people, technological, technical, and other specific requirements, which demonstrates the consequence of changes in the cost objective functions of alternatives. For example, the cost function was an average of 88% compared to the people, technological, technical, and special requirements at 93%, 90%, 96%, and 82%, respectively. Furthermore, variation in the cost objective function was analyzed through various sub-attributes, implying a high dependency and influencing the proficiency of the decision-making process.

Consequently, the cost prediction models utilize the cost-effective frontline approach to affect this matter and streamline the proper decision-making process. The established cost prediction models showed that XGBOOST outperformed the DNN and RF by 5% and 9.4%, respectively. Thus, the XGBOOST prediction model represents the most attractive alternative for decision-makers from both an economical and sustainable point of view for the most accurate prediction with the lowest cost objective function and lowest correlated risk.

5. Discussion

This study’s cost prediction models offer an insightful perception of the correlation between influential features and green buildings’ green cost premium. The cost of each building was forecasted via up-to-date machine learning approaches to reflect the cost function variation based on datasets recovered from 283 green building projects that were examined in North America. The proposed models can be used by green building vendors, designers, stakeholders, and decision-makers to predict the green cost objective function of their new green buildings based on the characteristics of the main influential factors. It should be noted that the scope of the current research is limited to green buildings in the United States. It is also limited in the quantity of the collected data and the number of the main attributes considered. Thus, it is critical to consider the impact of governmental and non-governmental external support.

It should also be explained that the current research was restricted to economic and sustainable assessments and did not consider social and environmental dimensions. Additionally, the emphasis of the current research was on comparing the predicted cost objective function against the actual construction cost without considering the construction life cycle cost analysis elements or the reimbursement cycle. Therefore, further research is required to combine the current research with additional data, more attributes, various green buildings, and recently LEED-accredited green buildings.

Decision-makers are increasingly relying on technological findings to upgrade and build policies. The current study develops a robust machine learning framework for predicting green building construction costs from recorded datasets, which can be utilized as a general model to simulate all associated characteristics. This provides a good foundation for investigating how different feature interconnectivity might share insight on green building construction cost forecasting. Furthermore, as contemporary machine learning algorithms grow, increasingly sophisticated forecast models provide ways to develop more valuable and exact modeling for green building construction cost prediction, which many construction business practitioners may then use. Finally, consistent with what is currently emerging in the construction engineering and management research fields, the authors are confident in the proposed models’ ability to provide stakeholders with more precise forecasts to fit accessible datasets as advantageous preceding information to feed machine learning-based models.

The current research aims to minimize the knowledge gap in predicting green building costs. Thus, the proposed models were designed after an intensive investigation of the currently available related models. For example, one of the main gaps is the lack of an integrated representation of how the main attributes affect the green building cost interconnectedly. One of the main issues with the already available green building cost prediction models is that they lack integrity. Not considering all of the influential features of green building costs has ripple effects on the developed model’s dependability. For example, some studies have focused on the features of green building technologies while relaxing the people, technical, and specific requirements [61]. This has a detrimental impact on the accuracy of green building cost forecasts since a high level of uncertainty frequently accompanies the major qualities. In addition, little effort has been put into establishing analytical or machine learning-based models for green construction cost prediction. Many existing models use survey and questionnaire methodologies to describe the practice case for green building costs. There is always a need for more quantitative and objective techniques for projecting green construction costs [3,61].

Some studies have provided models that partially forecast the cost of only green building-certified residences [2]. In addition, other research papers have focused on the green certification of office buildings and the cost of equity capital of green buildings [4,62]. Consequently, a holistic model for green building cost estimation is required to provide reliable forecasting tools for practitioners.

Furthermore, historical green construction cost data are scarce. The established data gathering processes and dataset comprehensiveness have also been a significant impediment for researchers in developing reliable and general prediction models, especially for large-budget building contracts. As a result, building a prediction model that can be used effectively and independently of the construction cost value is critical. Several academics have attempted to anticipate green construction costs; however, their conclusions were limited to a single location since only a few relevant features were evaluated [2,3,63]. Such problems prove that decision-making tools are in great demand in the construction industry [64,65,66,67,68,69].

The contradictory findings hampered the assessment of green construction costs, making this a worrying problem. Furthermore, to the best of the authors’ knowledge, no appropriate model for predicting green construction costs is available in the present literature. As a result, a general and worldwide model for green cost prediction is required. The created green building cost prediction models have an advantage over comparable modeling techniques available in the literature because of their different processing chronological sequence, where forecasts are less impacted by the number of classes and can be analyzed consistently. The created models produce fewer discriminant nodes, lowering the number of class dimensions to be evaluated progressively. The generated prediction models have been discovered to be expert short-running models with high forecast precision and low memory consumption with superior performance.

Furthermore, it was discovered that generated models might be classified as realistic decision support tools in several sectors of the construction business when compared to other accessible models. The suggested models may be an integrated, general, practical, and accurate prediction tool. The current study covers several significant traits employed in bidding and awarding procedures to reduce financial and legal concerns among contractual parties. As a result, the proposed models are expected to play a critical role in reducing potential conflict among stakeholders in the green building construction industry, particularly when decision-makers face significant challenges and difficulties in estimating acceptable green building costs that all contractual parties can agree on.

6. Conclusions

Will a green building cost more than a traditional building? Are the costs of the people, technological, technical, and other specific requirements quantifiable and predictable? Is this objective cost function affected by sub-attributes? Do the developed prediction models consider ambiguity in the cost function? Do the developed cost forecasting models empower practitioners to make effective cost-related decisions? These research questions can be addressed by developing accurate and robust machine learning-based models for cost prediction to reduce the cost-related risk. The proposed models have been demonstrated to provide decision-makers with a decision support tool to forecast the construction costs of new green buildings and pave the road towards having green buildings LEED-certified based on economic and sustainable aspects. Four primary green building cost attributes and twenty sub-features were considered, and different feasible green construction approaches were investigated using cutting-edge forecasting models for cost prediction and associated risk minimization. ML-based cost prediction modeling approaches were utilized to improve decision-making quality amongst the best practices.

XGBOOST, DNN, and RF prediction models were designed, and they were evaluated using MAE, RMSE, MAPE, and $R^{2}$. The evaluation results indicate that the XGBOOST and DNN prediction performance was superior, with low values for the error measures and high $R^{2}$ values, indicating excellent performance. The RF had a lower forecast accuracy, but it still had an acceptable level of precision. The most accurate green building costs can be predicted based on the embraced machine learning models.
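For reference, the four evaluation measures named above can be computed as in the following minimal sketch. It uses the standard textbook definitions of MAE, RMSE, MAPE, and $R^{2}$; the function names and the sample values are illustrative and are not taken from the study's dataset.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted costs, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

print(mae(actual, predicted))   # 10.0
print(rmse(actual, predicted))  # 10.0
print(mape(actual, predicted))  # about 5.21
print(r2(actual, predicted))    # about 0.992
```

Lower MAE, RMSE, and MAPE indicate a better fit, whereas a higher $R^{2}$ does, which is why the three error measures and $R^{2}$ move in opposite directions in Tables 5 and 6.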

The current study revealed that green building costs could be accurately predicted via machine learning approaches and smoothly compared with conventional building costs. In addition, the key attributes that influence green building costs were considered. Moreover, the developed cost prediction models are expected to pave the road toward a smoother LEED certification process. Additionally, decision-makers are provided with decision support tools that can predict green buildings’ total operational and life cycle costs. Future research efforts should include more datasets, more accurate data collection, pre-processing and post-processing, and different types of buildings in various locations. The external support attribute needs to be considered, as it is expected to significantly influence green building costs. Additionally, in future work, the economic assessment of the LEED certification can be expanded beyond construction costs to incorporate the influence on the overall life cycle cost analysis.

Author Contributions

Formal analysis, O.A.; Funding acquisition, A.S.A.; Investigation, O.A. and A.S.; Methodology, A.S., G.A. and A.S.A.; Software, A.S. and G.A.; Supervision, A.S.A.; Writing—original draft, O.A. and G.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through the Large Groups Project under grant number (RGP. 2/178/43).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Molenaar, K.R.; Sobin, N.; Antillón, E.I. A Synthesis of Best-Value Procurement Practices for Sustainable Design-Build Projects in the Public Sector. J. Green Build. 2010, 5, 148–157.
2. Sun, C.-Y.; Chen, Y.-G.; Wang, R.-J.; Lo, S.-C.; Yau, J.-T.; Wu, Y.-W. Construction Cost of Green Building Certified Residence: A Case Study in Taiwan. Sustainability 2019, 11, 2195.
3. Fan, K.; Chan, E.H.W.; Chau, C.K. Costs and Benefits of Implementing Green Building Economic Incentives: Case Study of a Gross Floor Area Concession Scheme in Hong Kong. Sustainability 2018, 10, 2814.
4. Plebankiewicz, E.; Juszczyk, M.; Kozik, R. Trends, Costs, and Benefits of Green Certification of Office Buildings: A Polish Perspective. Sustainability 2019, 11, 2359.
5. Xia, B.; Chen, Q.; Xu, Y.; Li, M.; Jin, X. Design-Build Contractor Selection for Public Sustainable Buildings. J. Manag. Eng. 2015, 31, 04014070.
6. Montalbán-Domingo, L.; García-Segura, T.; Sanz, M.A.; Pellicer, E. Social Sustainability in Delivery and Procurement of Public Construction Contracts. J. Manag. Eng. 2019, 35, 04018065.
7. Alhazmi, T.; McCaffer, R. Project Procurement System Selection Model. J. Constr. Eng. Manag. 2000, 126, 176–184.
8. Myers, D. A review of construction companies’ attitudes to sustainability. Constr. Manag. Econ. 2005, 23, 781–785.
9. U.S. Green Building Council (USGBC). 2021. Available online: https://www.usgbc.org/ (accessed on 1 January 2022).
10. Canada Green Building Council (CAGBC). 2021. Available online: https://www.cagbc.org/ (accessed on 1 January 2022).
11. Zhu, Q.; Geng, Y.; Sarkis, J. Motivating green public procurement in China: An individual level perspective. J. Environ. Manag. 2013, 126, 85–95.
12. Ioannou, P.G.; Leu, S.S. Average-Bid Method—Competitive Bidding Strategy. J. Constr. Eng. Manag. 1993, 119, 131–147.
13. Drew, D.S.; Skitmore, M. Testing Vickery’s Revenue Equivalence Theory in Construction Auctions. J. Constr. Eng. Manag. 2006, 132, 425–428.
14. Chaovalitwongse, W.A.; Wang, W.; Williams, T.; Chaovalitwongse, P. Data Mining Framework to Optimize the Bid Selection Policy for Competitively Bid Highway Construction Projects. J. Constr. Eng. Manag. 2012, 138, 277–286.
15. Alshboul, O.; Shehadeh, A.; Hamedat, O. Governmental Investment Impacts on the Construction Sector Considering the Liquidity Trap. J. Manag. Eng. 2022, 38, 04021099.
16. Bergman, M.A.; Lundberg, S. Tender evaluation and supplier selection methods in public procurement. J. Purch. Supply Manag. 2013, 19, 73–83.
17. Lambropoulos, S. The use of time and cost utility for construction contract award under European Union Legislation. Build. Environ. 2007, 42, 452–463.
18. Molenaar, K.R.; Johnson, D.E. Engineering the Procurement Phase to Achieve Best Value. Leadersh. Manag. Eng. 2003, 3, 137–141.
19. Holt, G.D.; Olomolaiye, P.O.; Harris, F.C. Factors influencing U.K. construction clients’ choice of contractor. Build. Environ. 1994, 29, 241–248.
20. Oviedo-Haito, R.J.; Jiménez, J.; Cardoso, F.F.; Pellicer, E. Survival Factors for Subcontractors in Economic Downturns. J. Constr. Eng. Manag. 2014, 140, 04013056.
21. Williams, T.P. Predicting final cost for competitively bid construction projects using regression models. Int. J. Proj. Manag. 2003, 21, 593–599.
22. Gunduz, M.; Karacan, V. Damage to Treasury: Abnormally Low Tenders in Public Construction Works. J. Leg. Aff. Disput. Resolut. Eng. Constr. 2009, 1, 130–136.
23. David, E.; Azoulay-Schwartz, R.; Kraus, S. Bidding in sealed-bid and English multi-attribute auctions. Decis. Support Syst. 2006, 42, 527–556.
24. Karakaya, G.; Köksalan, M. An interactive approach for multi-attribute auctions. Decis. Support Syst. 2011, 51, 299–306.
25. Chang, W.-S.; Chen, B.; Salmon, T.C. An Investigation of the Average Bid Mechanism for Procurement Auctions. Manag. Sci. 2015, 61, 1237–1254.
26. Ho, S.P.; Liu, L.Y. Analytical Model for Analyzing Construction Claims and Opportunistic Bidding. J. Constr. Eng. Manag. 2004, 130, 94–104.
27. Liu, S.L.; Lai, K.K.; Wang, S. Multiple criteria models for evaluation of competitive bids. IMA J. Manag. Math. 2000, 11, 151–160.
28. Henriod, E.E.; Lantran, J.-M. Trends in contracting practice for civil works. In Site Resources; World Bank: New York, NY, USA, 2000; p. 1.
29. Elazouni, A.M. Classifying Construction Contractors Using Unsupervised-Learning Neural Networks. J. Constr. Eng. Manag. 2006, 132, 1242–1253.
30. Skitmore, M. Identifying non-competitive bids in construction contract auctions. Omega 2002, 30, 443–449.
31. Chua, D.; Li, D.; Chan, W.T. Case-Based Reasoning Approach in Bid Decision Making. J. Constr. Eng. Manag. 2001, 127, 35–45.
32. Ballesteros-Pérez, P.; Skitmore, M.; Pellicer, E.; Zhang, X. Scoring Rules and Competitive Behavior in Best-Value Construction Auctions. J. Constr. Eng. Manag. 2016, 142, 04016035.
33. Shehadeh, A.; Alshboul, O.; Hamedat, O. Risk Assessment Model for Optimal Gain-Pain Share Ratio in Target Cost Contract for Construction Projects. J. Constr. Eng. Manag. 2022, 148, 04021197.
34. Mollaoglu-Korkmaz, S.; Swarup, L.; Riley, D. Delivering Sustainable, High-Performance Buildings: Influence of Project Delivery Methods on Integration and Project Outcomes. J. Manag. Eng. 2013, 29, 71–78.
35. Lapinski, A.R.; Horman, M.J.; Riley, D.R. Lean Processes for Sustainable Project Delivery. J. Constr. Eng. Manag. 2006, 132, 1083–1091.
36. Riley, D.R., II; Pexton, K.; Drilling, J. Procurement of sustainable construction services in the United States: The contractor’s role in green buildings. Ind. Environ. 2003, 26, 66–69.
37. Korkmaz, S.; Swarup, L.; Horman, M.; Riley, D.; Molenaar, K.R.; Sobin, N.; Gransberg, D.D. Influence of Project Delivery Methods on Achieving Sustainable High Performance Buildings: Report on Case Studies Draft for Panel Review; Charles Pankow Foundation: McLean, VA, USA, 2009; Available online: https://www.researchgate.net/publication/327976578_Influence_of_Project_Delivery_Methods_on_Achieving_Sustainable_High_Performance_Buildings_Report_on_Case_Studies_Draft_for_Panel_Review (accessed on 12 February 2022).
38. Riley, D.; Sanvido, V.; Horman, M.; McLaughlin, M.; Kerr, D. Lean and Green: The Role of Design-Build Mechanical Competencies in the Design and Construction of Green Buildings. In Proceedings of the Construction Research Congress 2005, San Diego, CA, USA, 5–7 April 2005; pp. 1–10.
39. Alshboul, O.; Alzubaidi, M.A.; Mamlook, R.E.A.; Almasabha, G.; Almuflih, A.S.; Shehadeh, A. Forecasting Liquidated Damages via Machine Learning-Based Modified Regression Models for Highway Construction Projects. Sustainability 2022, 14, 5835.
40. Korkmaz, S.; Riley, D.; Horman, M. Piloting Evaluation Metrics for Sustainable High-Performance Building Project Delivery. J. Constr. Eng. Manag. 2010, 136, 877–885.
41. Abdelrahman, M.; Zayed, T.; Elyamany, A. Best-Value Model Based on Project Specific Characteristics. J. Constr. Eng. Manag. 2008, 134, 179–188.
42. Jaśkowski, P.; Czarnigowska, A. Contractor’s bid pricing strategy: A model with correlation among competitors’ prices. Open Eng. 2019, 9, 159.
43. Wang, W.; Zmeureanu, R.; Rivard, H. Applying multi-objective genetic algorithms in green building design optimization. Build. Environ. 2005, 40, 1512–1525.
44. Kubba, S. Handbook of Green Building Design and Construction: LEED, BREEAM, and Green Globes; Butterworth-Heinemann: Herndon, VA, USA, 2012.
45. Zahirah, N.; Abidin, N.Z.; Nuruddin, A.R. Soft Cost Elements That Affect Developers Decision to Build Green. J. Civ. Environ. Eng. 2013, 7, 768–772.
46. Klinger, M.; Susong, M. The construction project: Phases, People, Terms, Paperwork, Processes; American Bar Association: Chicago, IL, USA, 2006.
47. Consultants, N.E.M. Analyzing the Cost of Obtaining LEED Certification; The American Chemistry Council: Arlington, VA, USA, 2003.
48. Zhang, X.; Platten, A.; Shen, L. Green property development practice in China: Costs and barriers. Build. Environ. 2011, 46, 2153–2160.
49. Tatari, O.; Kucukvar, M. Cost premium prediction of certified green buildings: A neural network approach. Build. Environ. 2011, 46, 1081–1086.
50. Issa, M.; Rankin, J.; Christian, A. Canadian practitioners’ perception of research work investigating the cost premiums, long-term costs and health and productivity benefits of green buildings. Build. Environ. 2010, 45, 1698–1711.
51. Kats, G. Greening America’s Schools. Costs and Benefits. A Capital-E Report. 2006. Available online: https://www.usgbc.org/sites/default/files/Greening_Americas_Schools.pdf (accessed on 10 February 2022).
52. Mathur, V.N.; Price, A.D.F.; Austin, S.; Moobela, C. Defining, identifying and mapping stakeholders in the assessment of urban sustainability. In Proceedings of the SUE-MoT Conference 2007: International Conference on Whole Life Sustainability and its Assessment, Glasgow, UK, 27–29 June 2007.
53. Häkkinen, T.; Belloni, K. Barriers and drivers for sustainable building. Build. Res. Inf. 2011, 39, 239–255.
54. Du Plessis, C. A strategic framework for sustainable construction in developing countries. Constr. Manag. Econ. 2007, 25, 67–76.
55. Azizi, N.Z.M.; Abidin, N.Z.; Raofuddin, A. Identification of Soft Cost Elements in Green Projects: Exploring Experts’ Experience. Procedia—Soc. Behav. Sci. 2015, 170, 18–26.
56. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: San Francisco, CA, USA; pp. 785–794.
57. Vieira, S.; Pinaya, W.; Mechelli, A. Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: Methods and applications. Neurosci. Biobehav. Rev. 2017, 74 Pt A, 58–75.
58. Janitza, S.; Tutz, G.; Boulesteix, A.-L. Random forest for ordinal responses. Comput. Stat. Data Anal. 2016, 96, 57–73.
59. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; Volume 1.
60. Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22.
61. Tran, Q.; Nazir, S.; Nguyen, T.-H.; Ho, N.-K.; Dinh, T.-H.; Nguyen, V.-P.; Nguyen, M.-H.; Phan, Q.-K.; Kieu, T.-S. Empirical Examination of Factors Influencing the Adoption of Green Building Technologies: The Perspective of Construction Developers in Developing Economies. Sustainability 2020, 12, 8067.
62. Hsieh, H.-C.; Claresta, V.; Bui, T. Green Building, Cost of Equity Capital and Corporate Governance: Evidence from US Real Estate Investment Trusts. Sustainability 2020, 12, 3680.
63. Najini, H.; Nour, M.; Al-Zuhair, S.; Ghaith, F. Techno-Economic Analysis of Green Building Codes in United Arab Emirates Based on a Case Study Office Building. Sustainability 2020, 12, 8773.
64. Alshboul, O.; Shehadeh, A.; Tatari, O.; Almasabha, G.; Saleh, E. Multiobjective and multivariable optimization for earthmoving equipment. J. Facil. Manag. 2022; ahead-of-print.
65. Shehadeh, A.; Alshboul, O.; Tatari, O.; Alzubaidi, M.A.; Hamed El-Sayed Salama, A. Selection of heavy machinery for earthwork activities: A multi-objective optimization approach using a genetic algorithm. Alex. Eng. J. 2022, 61, 7555–7569.
66. Shehadeh, A.; Alshboul, O.; Al Mamlook, R.E.; Hamedat, O. Machine learning models for predicting the residual value of heavy construction equipment: An evaluation of modified decision tree, LightGBM, and XGBoost regression. Autom. Constr. 2021, 129, 103827.
67. Alshboul, O.; Shehadeh, A.; Al-Kasasbeh, M.; Al Mamlook, R.E.; Halalsheh, N.; Alkasasbeh, M. Deep and machine learning approaches for forecasting the residual value of heavy construction equipment: A management decision support model. Eng. Constr. Archit. Manag. 2021.
68. Alshboul, O.; Shehadeh, A.; Hamedat, O. Development of integrated asset management model for highway facilities based on risk evaluation. Int. J. Constr. Manag. 2021, 1–10.
69. Shehadeh, A.; Alshboul, O.; Hamedat, O. A Gaussian mixture model evaluation of construction companies’ business acceptance capabilities in performing construction and maintenance activities during COVID-19 pandemic. Int. J. Manag. Sci. Eng. Manag. 2021, 1–11.

Figure 1. Methodology flowchart for predicting green building project costs.

Figure 2. Description of green building cost influential factors.

Figure 3. Normality presentation of the features.

Figure 4. XGBOOST structure of the proposed model.

Figure 5. Deep neural network structure.

Figure 6. Activation function for DNN.

Figure 7. Flowchart of the RF’s architecture.

Figure 8. Five-fold cross-validation technique flowchart.

Figure 9. Heated correlation matrix analyses.

Figure 10. Representation of feature importance.

Figure 11. Evaluation metric comparison between XGBOOST and DNN.

Table 1. Categories of green building costs.

Green Building Costs
Categories	Hard	Soft
	Architectural design	Professional engagement
	Material and labor	Professional engagement
	Building services	Procedures
	Civil and structural	Procedures
	Plants and equipment	Legal requirements
	Building requirements	Legal requirements

Table 2. Feature descriptions.

Feature Symbol	Definition
People	People directly impact the project as they are engaged in its delivery and set its context, where their primary responsibilities play a crucial role in planning, design, delivery, and maintenance [52].
Technical aspects	Technical aspects are related to the methodological aspects of green building construction. Technical aspects include process and procurement issues, regulations and rules, and scarcity of green building materials and expertise [53].
Technology	Technology indicates the utilization of a product during or after its execution. For example, technology might be used throughout the execution process or be included as part of the final product. Equipment, materials, and industrial operations are examples of technology [54].
Specific requirement	As there is a need to focus on the green features of the projects, additional construction specialists, such as green building facilitators and green building certifiers, are expected to be involved in green building projects. For example, a regular consultant group will be augmented by one or more green building consultants [55].

Table 3. Statistical analysis of collected data.

Features	Statistical Methods
Features	Mean	Standard Deviation	Minimum	Maximum
People	1,563,257	1,334,043	43,280	4,404,400
Technical	416,644	381,155	11,080	1,258,400
Technology	1,686,579	1,524,620	47,320	5,033,600
Specific requirement	669,967	571,732	19,120	1,887,600
Green building cost	4,466,449	3,811,552	120,800	12,584,000

Table 4. Results of hyperparameter optimization for the ML models.

ML Models	Hyperparameters	Optimal Values
XGBOOST	Number of trees	1000
	Learning rate	0.08
	Maximum depth	12
	Number of needed leaves	16
RF	Number of trees	800
	Learning rate	0.11
	Maximum depth	17
	Number of needed leaves	20
DNN	Number of neurons	4
	Learning rate	0.13
	Batch size	10
	Epochs	300
	Number of hidden layers	4
	Activation function	ReLU
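To make the DNN row of Table 4 concrete, the following is a minimal forward-pass sketch of a network with four hidden layers of four ReLU neurons each. It is not the authors' implementation. It assumes that "Number of neurons" means neurons per hidden layer, that the network takes one input per main attribute (four inputs), and that the output layer is a single linear unit. The weights are random and untrained, so the output carries no predictive meaning.

```python
import random

HIDDEN_LAYERS = 4  # Table 4: number of hidden layers
NEURONS = 4        # Table 4: number of neurons (assumed to be per hidden layer)
N_INPUTS = 4       # assumption: one input per main cost attribute


def relu(x: float) -> float:
    """Rectified linear unit, the activation listed in Table 4."""
    return x if x > 0.0 else 0.0


def init_layer(n_in: int, n_out: int, rng: random.Random):
    """Return (weights, biases) for one dense layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def build_network(rng: random.Random):
    """Stack the hidden layers and a single linear output unit."""
    sizes = [N_INPUTS] + [NEURONS] * HIDDEN_LAYERS + [1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(network, x):
    """Propagate one sample; ReLU on hidden layers, identity on the output."""
    for i, (weights, biases) in enumerate(network):
        z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        is_output = i == len(network) - 1
        x = z if is_output else [relu(v) for v in z]
    return x[0]


network = build_network(random.Random(0))
# Hypothetical scaled inputs: people, technical, technology, specific requirement.
prediction = forward(network, [0.2, 0.1, 0.4, 0.3])
print(prediction)
```

Training such a network with the listed learning rate, batch size, and number of epochs would normally be done with a deep learning library; the sketch only illustrates the layer structure.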

Table 5. Performance evaluation for different K-folds.

K-Fold Cross-Validation	Regression Model	Performance Evaluation Metrics
K-Fold Cross-Validation	Regression Model	MAE	MSE	MAPE (%)	$R^{2}$ (%)
$k = 3$	XGBOOST	132.0	152.5	27.9	94.0
	DNN	238.0	316.0	51.1	89.0
	RF	408.0	527.9	56.9	86.0
$k = 5$	XGBOOST	92.0	132.5	19.9	96.0
	DNN	196.5	284.0	32.4	91.0
	RF	378.0	507.9	40.4	87.0
$k = 7$	XGBOOST	118.0	141.0	23.3	95.0
	DNN	212.5	301.1	43.8	90.0
	RF	389.4	516.6	50.7	86.0
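The k-fold procedure behind Table 5 partitions the records into k folds and holds each fold out once for testing. The sketch below shows one simple way to generate such a partition for the 283 projects mentioned in the Discussion. It assumes that all 283 records enter the cross-validation and that folds are contiguous without shuffling; the function name is illustrative, and the authors' actual splitting procedure is not specified in this section.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_idx, test_idx) pairs for k contiguous folds (no shuffling)."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(283, 5))
fold_sizes = [len(test) for _, test in folds]
print(fold_sizes)  # [57, 57, 57, 56, 56]
```

Each model would be fitted on the training indices and scored on the held-out indices, and the k scores averaged to give one row of Table 5.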

Table 6. Performance measure comparison of ML models at k = 5.

Performance Metrics	Prediction Models
Performance Metrics	XGBOOST	DNN	RF
MAE	92.0	196.5	378.0
RMSE	132.5	284.0	507.9
MAPE	19.9	32.4	40.4
R²	96.0	91.0	87.0
Adjusted R²	95.9	90.9	86.8
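The adjusted $R^{2}$ row can be checked against the $R^{2}$ row with the standard formula $R_{\text{adj}}^{2} = 1 - (1 - R^{2})(n - 1)/(n - p - 1)$. The sketch below assumes n = 283 (the number of projects mentioned in the Discussion) and p = 4 (the four main attributes); neither value is stated alongside Table 6, so both are assumptions that happen to reproduce the tabulated figures.

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Standard adjusted R^2 for a model with n_features predictors."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


N_PROJECTS = 283  # assumption: all projects counted
N_FEATURES = 4    # assumption: the four main attributes as predictors

R2_PERCENT = {"XGBOOST": 96.0, "DNN": 91.0, "RF": 87.0}  # Table 6, R² row

adjusted = {
    name: 100.0 * adjusted_r2(value / 100.0, N_PROJECTS, N_FEATURES)
    for name, value in R2_PERCENT.items()
}
for name, value in adjusted.items():
    print(f"{name}: {value:.1f}")  # XGBOOST: 95.9, DNN: 90.9, RF: 86.8
```

Because n is large relative to p, the adjustment is small, which is consistent with the two rows differing by only 0.1 to 0.2 percentage points.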

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

