1. Introduction
The manufacturing industry plays a crucial role in the economy of every country [1]. Developed countries generally have stronger industries and, accordingly, invest more in potential opportunities in those industries (manufacturing projects). For example, Aбдикeeв et al. [2] reported that in 2017 the manufacturing industry accounted for 30%, 28%, 20%, and 13.5% of the annual gross domestic product (GDP) of China, South Korea, Germany, and Russia, respectively. A significant part of this value belongs to construction projects for developing new industries or expanding existing manufacturing firms; see, for instance, the value and number of net cross-border mergers and acquisitions (M&As) and announced greenfield foreign direct investment (FDI) projects for 2008–2017 [3].
Given today's competition, making appropriate decisions plays a crucial role in the success of a manufacturing firm. In most cases, a poor choice will have detrimental effects on a company or cause project failure. Risks are an inseparable part of a project and thus should not be ignored. Each year, many projects fail because of harm that originates within the projects themselves. Risks attributed to projects can have various sources, but all can lead to the same outcome: project failure.
Figure 1 depicts the correlations between risks associated with a project and the amount at stake throughout the lifecycle of a project.
Figure 1 indicates the level of risks associated with each stage of the lifecycle of a project [4]. The level of risk in the earlier phases of a project is significantly higher than in the ending phases. This fact reveals the importance of risk management in project selection. Carvalho and Rabechini Junior [5] also reported a significant correlation between the level of risk management applied by a project team and project success.
In the project management body of knowledge (PMBOK), which is considered the essential guideline in the project management field, risk management is considered one of the nine areas of project management.
Therefore, minimizing the risks of a project is a vital need. Subsequently, the more attention paid to risk identification, the less risk will be faced during the lifecycle of a project. For this purpose, and as will be shown in the literature review, efforts have been made during the last two decades to propose various decision-making methods to investigate different risk management problems. Of these, a noticeable share belongs to the project selection problem. In addition, the shortcomings of current research methods will be investigated. In the literature review, the problem will be explained in more detail.
The main question is whether deterministic decision-making methods can satisfy all risk management needs in project selection problems. This question will be answered later.
Considering uncertainty in the decision science discipline challenges the use of classic methods. Many references have used evidence theory as a powerful method to consider the beliefs of different experts. Tacnet et al. [6] stated that experts have different opinions about factors because the available information is imperfect and comes from more or less heterogeneous, reliable, and conflicting sources, which affects the decision-making process. Kazimieras Zavadskas et al. also argued that people might hold different opinions based on their knowledge, experience, and preference. Evidence theory is an effective way to consider such issues in the decision-making process when it is combined with multi-criteria decision-making methods such as the Analytical Hierarchy Process (AHP) and the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) [7]. Within the literature, the hybrid of AHP and evidence theory has been used many times to provide an adequate base for choosing the best alternative when the beliefs of experts are essential [8,9,10,11]. When uncertainty is considered, each cell in the AHP will not necessarily have a specific value; instead, a range of values (a confidence interval) must be estimated for a given confidence level (1–α). Therefore, a single optimum solution will not necessarily exist, and the solution can change with the confidence interval considered.
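The idea of replacing a single AHP cell value with a range at confidence level (1–α) can be sketched as follows. This is only an illustration: the function name `judgment_interval`, the sample judgments, and the use of a normal approximation are assumptions made for the example and are not taken from the method proposed in this paper.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def judgment_interval(judgments, alpha=0.05):
    """Return a (lower, upper) range for one AHP cell at confidence 1 - alpha.

    Assumption: the expert judgments are treated as a sample and a normal
    approximation is used; this is an illustrative choice only.
    """
    n = len(judgments)
    centre = mean(judgments)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    half_width = z * stdev(judgments) / sqrt(n)
    return centre - half_width, centre + half_width


# Hypothetical judgments of five experts for one pairwise comparison.
expert_judgments = [3, 4, 3, 5, 4]

low_95, high_95 = judgment_interval(expert_judgments, alpha=0.05)
low_80, high_80 = judgment_interval(expert_judgments, alpha=0.20)
print(f"95% interval: [{low_95:.2f}, {high_95:.2f}]")
print(f"80% interval: [{low_80:.2f}, {high_80:.2f}]")
```

A higher confidence level gives a wider range for the same cell, which is why the preferred alternative can change when a different confidence interval is considered.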
Using the facts mentioned above, there is ample evidence to consider uncertainty in selecting the best project among the available alternatives to minimize the risks of the project in the executing phase.
Therefore, in this research, a new decision-making model will be proposed to select the best alternative while uncertainties exist for the level of risk factors. The aim will be to minimize the risks associated with the capabilities of the company, such as available money, human resource skills, machinery, documentation systems, and quality control level.
This study can help industrial business owners to select appropriate projects according to their capabilities and strengths. Choosing the wrong project for construction can have detrimental effects on a business and impose irreparable financial harm. In each county, many failed projects can be found that remain unconstructed for years or even decades. The same phenomenon happens in the industry. A quick search in each industrial zone, a series of unsuccessful projects are left alone simply because the managers believe that the projects will not be successful and make a profit, even if they are successfully working in the future. This may also simply be because they made a wrong decision about choosing projects and calculating the required budget for completing them, due to not paying enough attention to selecting appropriate projects according to the money and resource availability.
In this research, a new method will be proposed to involve uncertainty of the available evidence in the process of selecting the best alternative among a list of available proposals of manufacturing projects, which will minimize the level of the risk factors associated with a project by utilizing scheduling and line balancing risk assessment.
2. Literature Review
Multi-attribute decision-making (MADM) methods in project scheduling usually consider more than one attribute. The most prominent advantage of MADM methods, compared to multi-objective decision-making (MODM) methods, is that they are easier for managers to understand in actual practice. Moreover, their outcomes are more tangible and can be applied directly in real manufacturing systems. Methods like the AHP and TOPSIS are among the most important MADM methods. MADM methods can usually be hybridized with other methods to enhance their functionality. Several MADM references are investigated in this section, and essential questions regarding MADM in risk management will be answered. MADM methods are regularly used in project management problems. Kuo and Lu [12] proposed a fuzzy multi-criteria decision-making (MCDM) method for assessing risk factors associated with metropolitan construction projects. The proposed method used a multiple fuzzy attributes direct rating scheme to measure the occurrence probability of risk factors. Marcelino-Sádaba et al. [13] proposed a risk management method for project management in small manufacturing systems, which was applied to 72 companies located in Spain. Their method was based on checklists and other simple tools to measure indicators and outline the corrective actions.
Leu and Chang [14] argued that although many papers used classic methods when safety factors are considered, most of those methods could not effectively address the correlation between dependencies of safety factors and occupational accidents. Hence, they proposed a new safety risk-assessment model based on Bayesian networks and fault tree concepts. Their outcomes show that the proposed method can effectively address the safety management of a project. Hossen et al. [15] used a construction schedule delay risk assessment method for nuclear power plants in which a hybrid of AHP, the relative importance index, and schedule delay risk is considered.
Kim [16] developed a new model for risk management based on Bayesian rules. The model used the pre-project cost risk assessment and actual performance data to portray a range of possible project costs at a pre-defined confidence level. Esmaeili et al. (2015) [17] presented an attribute-based risk identification and analysis method to improve available construction safety management methods, which helps schedulers and designers realize, identify, and then model safety risk independently of specific areas, activities, or building components. Their method showed high performance when it was used to identify and measure several injuries and fatalities in construction resulting from a finite number of hazardous attributes of the work environment. In the same year, Esmaeili et al. (2015) [18] claimed that identifying and measuring risks associated with virtual construction work environments (which usually cause work injuries) is vital for pre-construction safety management. They proposed a two-stage model based on a principal component analysis of safety attributes, in which the leading principal components were used as potential predictors in a generalized linear model. The outcomes of applying their model indicated that, given identifiable characteristics of planned work, the probability of a safety incident could be successfully forecast.
Islam et al. [19] focused on using an analytical network process (ANP) in project risk management and argued that fuzzy ANP has limited functionality when new information is incorporated into the risk structure. Therefore, they proposed a new fuzzy Bayesian belief network (FBBN) and showed its superiority over the existing fuzzy ANPs. Valipour et al. [20] focused on occupational accidents reported in metropolitan excavations in major cities. To investigate this case, they proposed new criteria for risk assessment by using adapted step-wise weight assessment ratio analysis (SWARA) and complex proportional assessment (COPRAS) methods. Then, using a field study, they found that construction safety, unfavorable geological conditions, shortage of managerial experience, preliminary emergency plan, and subsidence of ground are the most significant risks in excavation projects. Williams [21] declared that systemicity causes difficulties in evaluating risk levels of complex projects. They also mentioned that one important activity after identifying a risk is to pursue its causal chain. They outlined the steps of analyzing the systemic nature of risk and how owners and constructors can fully understand the consequences of their actions.
Fabricius and Büttgen [22] argued that the overconfidence of managers leads to project failure in many cases. Therefore, they used risk assessment to measure overall anticipated project success and how overconfidence influences such assessments. Using data from 204 project managers, they conducted a standardized, case-based survey and showed that overconfidence reduces risk awareness among project managers, leading them to assess risks more optimistically and to draw more positive conclusions about anticipated project success. C. M. Wang et al. [23] addressed a new method for examining how construction project managers perceive risk. They investigated how the behavioral factors of project managers, such as extraversion, agreeableness, and conscientiousness, influence risk propensity, and whether managers differ in perceiving risk as a result. They found that extraversion, agreeableness, and conscientiousness impose detrimental effects on risk perception.
Chemweno et al. [24] focused on implementing appropriate risk management on the health level of assets. They showed that choosing the proper risk management approach would positively affect maintenance decision-making by identifying, analyzing, evaluating, and mitigating equipment failures. For this purpose, they offered a new risk assessment using generic selection criteria for the failure mode and effects analysis (FMEA), fault tree analysis (FTA), and Bayesian networks (BN). In their method, the available criteria were prioritized using ANP.
Kokangül et al. [25] dealt with solving the health and safety problem in workspaces. For this purpose, they proposed a new risk assessment method, which relies on the Fine–Kinney method and AHP, for a large-scale manufacturing company. Then the correlation between the Fine–Kinney risk assessment method and AHP was examined.
Identifying risks and evaluating them is a vital step in the early stage of a project. Yet et al. [26] addressed a hybrid dynamic Bayesian network modeling framework for analyzing risk scenarios and budget policies in agriculture projects, where both uncertainty and variability of risk and economic factors were taken into account. Yang et al. [27] argued that the many risks associated with research and development (R&D) projects make them too complicated for standard methods to examine the performance of a project. Therefore, they proposed a predictive evaluation framework in which a belief rule-based system with random subspaces was used to assess risks in R&D projects. They applied their model to a number of projects in China and showed that it achieved notable prediction accuracy.
When more than one objective function must be considered, MODM models offer promising ways to overcome the difficulties of highly complex project management problems. Many project management problems are NP-hard by nature, which means that general-purpose optimizers like LINGO and GAMS cannot solve them quickly. The complexity of such problems increases when more than one objective function is taken into consideration. This matches the real circumstances of projects, in which managers usually consider more than one objective at the same time while selecting a project. For example, they may want to determine which project will yield more profit, cost less, and impose fewer risks during the execution phase. Some of the most important references that used MODM techniques in project management problems are reviewed below.
Ansarifar et al. [28] focused on rapid response (time) and cost of services as objectives to find the optimum location for ambulance stations and helicopter ambulances. For this purpose, they proposed a new heuristic method to solve the developed multi-objective model.
2.1. Risk Management and Its Importance in Project Management
The Cambridge Dictionary defines the term risk as “the possibility of something bad happening” (https://dictionary.cambridge.org/dictionary/english/risk, accessed on 30 October 2020). This means that projects (similar to other industrial sectors) can suffer from risks if risk factors are neglected.
Risk management is an essential part of PMBOK, which is an important reference in project management worldwide. In this reference, risk assessment is divided into five main parts: Risk Identification, Risk Evaluation, Risk Analysis, Risk Planning, and Risk Response.
Y. Zhang and Fan [29] proposed a model that integrates project cost, schedule, and quality to address the risk response strategy selection problem. They declared that by finding the optimum solution, the most appropriate risk response strategies could be taken.
Fabricius and Büttgen [22] state that failure to pay enough attention to the risk factors of a project will impose negative effects on the time, cost, and quality of the project. Risk identification also has a significant impact on the future strategies of a company. Kliestik et al. [30] believed that the beliefs of top management and the economic environment are two major points that play significant roles in shaping the organizational social norm. Abd El-Karim et al. [31] focused on the added value of risk assessment, risk strategy, and plan analysis to the construction industry in Egypt. They aimed to identify and measure the effect of the factors that negatively influence time and cost contingency. Liu et al. [32] argued that incorrect investment decisions are the root of many losses for the investors of a project. Although quantitative risk assessment, which project owners frequently apply, can ameliorate such problems, classic risk assessment methods usually ignore the effects of risk events, such as product sales falling short of expectations. Therefore, they proposed a modified version of the quantitative risk assessment model, enabling managers to outline the direct correlations between risk events and other decision variables in investing in a project. In this research, we will use the risks that are classified by [33].
However, the appropriate level of risk identification depends on the nature of the project. While for some projects it is essential to use complex techniques to understand the risks and the correlations between them, for others, more straightforward methods are more useful. For example, Bowers and Khorakian [34] stated that innovative projects are riskier by nature. Current risk management methods might be too strict for innovative projects and may therefore damage their creativity. To overcome this barrier, they offered a new framework that applies risk management to the generic innovation process, using a stage-gate innovation process model as an effective interface for incorporating project risk concepts.
A critical point about risk management is the relationship between the size of a company and its risks and risk assessment methods. Brustbauer [35] investigated risk management in small and medium-sized enterprises using a field study. Their outcomes indicated that using an active or a passive risk management approach influenced the choice of an offensive or a defensive strategy, respectively, for the studied cases. Besides, when the size of the firm was taken into consideration, the sector affiliation and the ownership structure also influenced the implementation of risk management.
Fang et al. [36] stated that complexity usually causes barriers in identifying and assessing risks associated with a project. To overcome this difficulty, they applied an important measuring technique in project risk management to a model of the complex project risk network, which provides complementary analysis results that are used to measure the interactions of risks. Tao et al. [37] stated that the location and congestion of activities must be considered during project scheduling.
2.2. Risk Assessment Methods in Project Management
An in-depth review of the selected research studies showed that the most important methods are mathematical modeling, MADM, MODM, field study and statistical analysis, heuristics and meta-heuristics, and reviewing case studies (listed in no particular order). In addition, several critical research studies are presented in which risk management is the main aim.
Gutjahr [38] used a branch and bound search algorithm for a multi-objective scheduling method in which minimizing project time and costs are the main objectives of the model. Wu et al. [39] provided an in-depth review of tools and methods used by researchers for business intelligence risk management.
Dziadosz and Rejment [40] discuss that risk can be defined as a measurable part of uncertainty by considering the occurrence and severity of the damage. Uncertainty, however, is defined as “a situation, in which something is not known, or something that is not known or certain” (https://dictionary.cambridge.org/dictionary/english/uncertainty, accessed on 11 August 2020). Uncertainty can increase the harm caused by a risk or the likelihood of its occurrence.
2.3. Uncertainty and Evidence Theory in Project Management
When risk management (including risk identification and risk assessment) is considered, one major shortcoming is that researchers use constant values in their calculations in most cases, while in the real world the risk factors and their indicators can change for many reasons. For example, the chance of a lack of money (occurrence rate) in one period could be entirely different from that in another period due to economic conditions. Davari and Demeulemeester [41] dealt with the proactive and reactive resource-constrained project scheduling problem with stochastic activity durations. Grabovy and Orlov [42] developed a risk management method for considering uncertainty factors at all stages of implementing an investment construction project in Russia using a cross-border index for calculating an investment construction project. Besides, the intensity of a machine breakdown (severity) may vary. While oil leakage is a minor failure in most cases, it may cause serious damage to an engine at another time. Nasrabadi and Mirzazadeh [43] focused on uncertain conditions and the time value of money. In such cases, using constant values for risk identification and assessment is a flawed strategy that can render the results useless and, in some cases, even mislead the decision makers. Such a drawback is even more serious for projects in industry.
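The difference between a constant risk value and a risk range can be illustrated with a short sketch. It assumes, purely for illustration, that a risk score is the product of an occurrence probability and a severity score; the function `risk_interval` and all numbers below are hypothetical and do not come from the data of this study.

```python
def risk_interval(occurrence, severity):
    """Multiply two non-negative (low, high) ranges to get a risk-score range."""
    (o_lo, o_hi), (s_lo, s_hi) = occurrence, severity
    if not (0 <= o_lo <= o_hi <= 1) or not (0 <= s_lo <= s_hi):
        raise ValueError("ranges must be ordered and non-negative")
    # Both factors are non-negative, so the extremes of the product
    # are reached at the extremes of the two ranges.
    return o_lo * s_lo, o_hi * s_hi


# Hypothetical oil-leakage risk: usually minor, occasionally damaging.
point_score = 0.2 * 3                       # constant-value assessment
leak_range = risk_interval((0.1, 0.4), (2, 9))

print(f"point estimate: {point_score:.2f}")
print(f"risk range: [{leak_range[0]:.2f}, {leak_range[1]:.2f}]")
```

The point estimate hides how far the risk can move within the range, which is the information a decision maker loses when only constant values are used.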
In most cases, it is assumed that a risk level will remain the same until the end of the project. In fact, risks can emerge and become exacerbated within a short period of a project, and they can then be ameliorated by risk response programs (or worsen if they are left alone). For this purpose, we focus on the MADM and MODM techniques in project management that consider uncertainty.
Dempster introduced the theory of considering uncertainty in the probability of the decision-making process in the 1960s. Then, in 1976, Shafer published A Mathematical Theory of Evidence (Shafer, 1976). Their theory was developed to consider the uncertainty of mathematics arrays and followed fundamental but functional mathematics principles. Other scientists frequently apply their methods in various fields, including engineering, management, and the humanities. Several important studies that adopted the Dempster–Shafer theory of evidence are reviewed in this section.
Using the Dempster–Shafer theory of evidence can overcome the dilemma between exact and probabilistic methods in expert systems. Zadeh (1986) [44] stated that the Dempster–Shafer theory of evidence has been widely used in AI for considering uncertainty in expert systems.
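To make the combination of expert beliefs concrete, the following minimal sketch implements Dempster's rule of combination for two mass functions, together with the belief and plausibility measures. The two experts, the hypotheses "low" and "high", and the mass values are hypothetical and serve only as an illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to its mass,
    and the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


def belief(m, hypothesis):
    """Total mass of focal sets contained in the hypothesis."""
    return sum(mass for focal, mass in m.items() if focal <= hypothesis)


def plausibility(m, hypothesis):
    """Total mass of focal sets that intersect the hypothesis."""
    return sum(mass for focal, mass in m.items() if focal & hypothesis)


# Hypothetical opinions of two experts about the risk level of one project.
LOW, HIGH = frozenset({"low"}), frozenset({"high"})
EITHER = LOW | HIGH  # mass on EITHER expresses ignorance
expert_1 = {LOW: 0.6, HIGH: 0.1, EITHER: 0.3}
expert_2 = {LOW: 0.5, HIGH: 0.2, EITHER: 0.3}

joint = combine(expert_1, expert_2)
print(f"Bel(low) = {belief(joint, LOW):.3f}, Pl(low) = {plausibility(joint, LOW):.3f}")
```

The pair of belief and plausibility forms a range for each hypothesis, so the result of the combination is an interval rather than a single probability.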
Tang [45] addressed a fuzzy soft set approach based on grey relational analysis and the Dempster–Shafer theory of evidence. The Dempster–Shafer theory of evidence was used to integrate the available alternatives into one collective alternative in order to choose the best alternative. Hatefi et al. [46] used the Dempster–Shafer theory of evidence to develop a new model for assessing environmental risk factors in a project. Their method was applied to an oil company in Iran, and the outcomes were compared with those achieved by conventional risk assessment and fuzzy inference system methods, which showed the superiority of the proposed model under the uncertain conditions of a project. Li et al. [47] discussed that most of the previously proposed methods based on fuzzy soft sets relied on different kinds of level soft sets, making them too complicated for decision makers to use. Therefore, they proposed a new fuzzy soft sets approach that combines grey relational analysis with the Dempster–Shafer theory of evidence in medical diagnosis problems. In their method, the Dempster–Shafer rule of evidence was used to aggregate the available alternatives into a collective alternative to select the best alternative.
J. Wang et al. [48] enhanced the functionality of the fuzzy soft set-based decision-making method by combining an ambiguity measure with the Dempster–Shafer theory of evidence, which yielded less uncertainty and increased the choice decision level accordingly.
Ballent et al. [49] believed that the Dempster–Shafer theory of evidence could provide a basis for considering various expert beliefs where structural vulnerability and damage are examined, which results in subjective assessments. Muriana and Vizzini [50] stated that quantitative risk assessment is an efficient tool for fast decision-making. At the same time, deviations of progress from the planned targets have adverse effects on a project's risk profile. Thus, corrective and preventive actions must be defined based on the risk index to balance the risks. Niazi et al. [51] discussed that many software organizations do not pay enough attention to project management and risk assessment before starting global software development. For this purpose, they proposed a two-step approach to identify and analyze the 19 risks associated with global software development from the client and vendor points of view. Pan et al. [52] proposed a new hybrid of interval-valued fuzzy sets, an improved Dempster–Shafer evidence theory, and fuzzy Bayesian networks for risk assessment and risk analysis under complex uncertain conditions. They showed that the proposed method could help reduce the likelihood of potential failure occurrence and ameliorate the risk magnitudes when a failure happened. Qazi et al. [53] addressed a new method for assessing risks while considering project complexity at the same time. They found interdependency between complexity drivers, risks, and objectives, and their method was also able to prioritize complexity drivers, risks, and strategies.
Sangaiah et al. [54] proposed a hybrid approach for the risk assessment of software projects, including fuzzy Decision-Making Trial and Evaluation Laboratory, fuzzy MCDM, and MADM. Their method could provide more effective results compared to classic methods. Suresh and Dillibabu [55] focused on the risk assessment of software projects using a hybrid fuzzy-based machine learning mechanism based on an adaptive neuro-fuzzy inference system-based multi-criteria decision-making approach and an intuitionistic fuzzy-based TODIM (an acronym in Portuguese for interactive multi-criteria decision-making) approach. Tonmoy et al. [56] dealt with the identification and evaluation of coastal risks in Australia. They found that informing and consulting stakeholders has positive impacts on planning for risk management. Zou et al. [57] stated that multi-disciplinary collaboration in risk management is necessary to achieve more success.
In most of the classic risk assessment methods, risks were usually analyzed separately. However, Y. Zhang [58] stated that correlations between the risk factors of a project can influence project performance. Therefore, they proposed a new method for measuring risks interdependently, followed by an optimization model for selecting the best risk response strategies. Zavadskas et al. [59] proposed an MADM method for risk evaluation based on the TOPSIS grey and COPRAS methods. Their main aim was to consider the goals of stakeholders along with other construction process efficiency and real estate value factors. After reviewing these papers, the following findings were achieved:
Considering risk factors in project management is vital, and during the last two decades scientists have focused on minimizing the risk factors associated with a project.
Uncertainty in occurrence probability and intensity should not be ignored and will impose detrimental effects on a project. Scientists have considered various aspects of uncertainty in their research.
When uncertainty in risk identification and assessment must be dealt with, the Dempster–Shafer evidence theory provides a promising way to express and model the uncertainty.
In order to consider multiple attributes in evidence theory (when more than one attribute has to be addressed), evidence theory shows flexibility in combining with other decision-making methods. The hybrid methods have superiority compared to the standard decision-making methods.
In this research, considering the suitability of AHP for choosing the best project and the outstanding features of evidence theory in addressing uncertainty, a hybrid AHP evidence theory will be proposed to address the problem of selecting the best industrial project among the available alternatives in order to minimize the production risks associated with a project.
According to the comprehensive review completed in this section, using a hybrid AHP evidence theory for selecting industrial projects to minimize production risks has not been addressed before.
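As background for the AHP part of the hybrid approach, the sketch below derives priority weights from a pairwise comparison matrix and checks its consistency. It assumes the row geometric-mean approximation of the priorities and Saaty's random consistency index values; the comparison matrix for three candidate projects is hypothetical and is not taken from this study.

```python
from math import prod

# Saaty's random consistency index for matrices of order 3 to 5.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix):
    """Approximate AHP priorities with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Return CR = CI / RI, with lambda_max estimated from (A w)_i / w_i."""
    n = len(matrix)
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical pairwise comparison of three candidate projects
# with respect to a single risk criterion (reciprocal matrix).
comparison = [
    [1,     3,     5],
    [1 / 3, 1,     2],
    [1 / 5, 1 / 2, 1],
]

weights = ahp_weights(comparison)
cr = consistency_ratio(comparison, weights)
print("weights:", [round(w, 3) for w in weights])
print("consistency ratio:", round(cr, 4))
```

A consistency ratio below 0.1 is the threshold commonly used to accept the judgments. In an uncertain setting, each cell of `comparison` would be a range rather than a single number.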
3. Research Methodology
3.1. Flowchart of the Proposed Framework
Figure 2 shows the flowchart of the research methodology in more detail.
According to the research flowchart, the effective risk factors that can influence project selection in an uncertain environment will be identified first by quantitative research (Phase 1). Then, an unsupervised machine learning method will be applied (Phase 2) to filter the alternatives before entering the next phase. This phase aims to classify the alternatives into different clusters so that the top managers of the company can focus on the most promising options more effectively. In the next phase (Phase 3), a hybrid AHP and Dempster–Shafer theory of evidence is presented to select the best alternative with the lowest level of overall risks. The method is designed so that the project with a lower total risk factor range will have a greater chance of being selected. The performance of the method will then be evaluated by using several metrics.
The advantages and novelties of the proposed method are provided in Section 3.5 after explaining the method in detail.
Methodology for Each Phase of the Proposed Method
The proposed method consists of three steps. In the first step, quantitative research is conducted to identify the risk factors that can influence a project. In the second step, a hybrid PCA-agglomerative unsupervised machine learning algorithm is proposed to classify the projects in terms of Properties, Operational and Technological, Financial, and Strategic risk factors. In the third step, a hybrid AHP and Dempster–Shafer theory of evidence is presented to select the best alternative with the lowest level of overall risks.
Figure 3 indicates the steps of the proposed method for choosing the best project.
3.2. Identify the Effective Risk Factors (Phase 1)
Kral et al. [60] used a questionnaire to capture the experience of managers regarding the criteria that can influence the optimization of the project portfolio. A similar approach will be carried out in this research by finding the significant risk factors that can influence the success of a project. For this purpose, “Project Success” is defined as the dependent variable. According to the questions of the research, the risk factors that can impose detrimental effects on the success of a project are categorized into four main sections:
Properties Risk Factors (Infrastructure, Machinery, Human Resource)
Technology and Operational Risk Factors (Scheduling, Technology, Operational Risk, Management Systems)
Financial Risk Factors (Evaluating Projects, Profit and Costs, Money Value)
Strategic Risk Factors (Competition, Market Share, Marketing, Customer Satisfaction)
The main aim of this research is to find out if the above risk categories can influence the success of a project. If so, to what extent?
Therefore, the following variables are to be addressed in this research:
Since in this research one aim is to track the influence of a variable throughout the life cycle of a project, each of the above variables will be asked in three phases:
Using this strategy, finding the correlations between the independent variables and dependent variables can show us if a project is selected correctly or not. Moreover, and more importantly, do companies pay enough attention to such risk factors?
Table 2 shows the statistical analysis for the data that will be used in the next section:
3.3. Classify the Project Candidates Using a Hybrid PCA-Agglomerative Method (Phase 2)
3.3.1. Input Data
Table 3 shows the summary of the data gathered from the statistical population:
3.3.2. Libraries
In this research, Python is used to code the machine learning algorithm. For this purpose, the following libraries were used:
NumPy: NumPy is a Python library for creating and working with homogeneous multi-dimensional arrays and for applying basic mathematical operations to them. These arrays are tables of elements (usually numbers) of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes, and the number of axes is the number of dimensions of the array (sometimes called its rank).
Pandas: Pandas is the second library used in this research. Its main purpose here is to build data frames. With Pandas, it is possible to import data from different file types, such as CSV and XLSX. Pandas is also well suited to working with tabular data and performing operations such as adding a row, deleting a column, or multiplying two matrices.
Matplotlib: Matplotlib is used for drawing various plots, including histograms, box plots, bar charts, and scatter plots.
Seaborn: Seaborn is a statistical data visualization library built on top of Matplotlib.
SciPy: SciPy is widely used for various purposes. In this research, it is applied for calculating the correlations between factors and for optimization.
scikit-learn: scikit-learn is an essential library for this research. It is used for applying supervised and unsupervised machine learning methods.
3.3.3. Selecting Features
In this section, the Ward algorithm is used to cluster the dataset agglomeratively. Before applying it, however, PCA is used to reduce the number of features so that they are ready for the Ward method. The features are first selected from the dataset:
X = ConsumersData.iloc[:,[5, 6, 7, 8]].values
Note that 5, 6, 7, and 8 are the zero-based column positions of the dataset that are used as features.
The outcomes are shown as follows:
Afterward, the Ward method is applied as the clustering algorithm for different numbers of clusters (k = 2, 3, and 4):
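The PCA step followed by Ward clustering for k = 2, 3, and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this research: the feature matrix is synthetic (a stand-in for the four selected columns), and keeping two principal components is an assumed choice.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA

# Hypothetical stand-in for the four feature columns selected into X.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(30, 4))
    for center in (1.0, 3.0, 5.0)
])

# Reduce the four features to two principal components (assumed count).
X_reduced = PCA(n_components=2).fit_transform(X)

# Ward-linkage agglomerative clustering for each candidate number of clusters.
labels_by_k = {
    k: AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X_reduced)
    for k in (2, 3, 4)
}

for k, labels in labels_by_k.items():
    # Cluster sizes for each k.
    print(k, np.bincount(labels))
```

Running the clustering on the reduced matrix rather than on the raw features keeps the Ward distances in the same low-dimensional space that is later plotted.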
3.3.4. Determining the Appropriate Number of Clusters
One way to help estimate the appropriate number of clusters (K) in unsupervised machine learning methods is to use a dendrogram. A dendrogram is a tree diagram that records the order in which an agglomerative method merges the data into clusters. In a dendrogram, the appropriate number of clusters can be estimated by looking for long vertical lines. However, a dendrogram should be treated only as a guideline; the number of clusters should be confirmed using the scores obtained after running the unsupervised machine learning algorithm (such as Silhouette and Calinski–Harabasz).
A horizontal cut in Figure 4 through the longest vertical lines suggests that the appropriate number of clusters is two or three. The final value is determined using the Silhouette and Calinski–Harabasz metrics.
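As a hedged illustration of this check, the sketch below builds a Ward linkage, computes the dendrogram structure without plotting it, and scores k = 2, 3, and 4 with both metrics. The data are synthetic (three well-separated groups) and serve only to show the mechanics; the variable names are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.metrics import calinski_harabasz_score, silhouette_score

# Synthetic 2-D points forming three well-separated groups (assumption).
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(25, 2))
    for center in ((0.0, 0.0), (4.0, 0.0), (2.0, 4.0))
])

# Ward linkage matrix; each row records one merge and its distance.
Z = linkage(points, method="ward")

# Dendrogram structure only; set no_plot=False to draw it with Matplotlib.
tree = dendrogram(Z, no_plot=True)

# Score each candidate number of clusters obtained by cutting the tree.
scores = {}
for k in (2, 3, 4):
    labels = fcluster(Z, t=k, criterion="maxclust")
    scores[k] = (
        silhouette_score(points, labels),
        calinski_harabasz_score(points, labels),
    )

# Pick the k with the highest Silhouette score.
best_k = max(scores, key=lambda k: scores[k][0])
print(best_k, scores[best_k])
```

For both metrics, higher values indicate better-separated clusters, so the dendrogram narrows the candidates and the scores make the final choice.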
After clustering the dataset using the proposed methods, the outcomes are outlined in Figure 5, which shows the clustering scatter plots based on the Ward method for each predefined number of clusters. As shown, when the number of clusters is set to three, the machine learning algorithm separates the clusters more precisely. In contrast, when the number of clusters is set to four, the algorithm cannot delimit the cluster borders precisely.
Therefore, according to the findings of the proposed PCA-Ward method, projects can be clustered into two main groups when the Properties, Technology and Operational, Financial, and Strategic risk factors are taken into consideration. Such a clustering approach facilitates the pre-selection of alternatives for the next stage.
3.4. Choosing the Best Alternative Using Multi-Attribute Decision-Making Method (Phase 3)
AHP divides a decision problem into levels of objectives, criteria, and sub-criteria in order to choose the best alternative amongst those available. In this process, different options are involved in decision-making, and it is possible to analyze the sensitivity of the criteria and sub-criteria. A sensitivity analysis based on the AHP method is a way to rank alternatives in terms of the pre-defined criteria. The decision maker can also weight the criteria. However, one major shortcoming of classic AHP is that the values are treated as constants, and therefore it cannot reflect the uncertainty in the responses of the experts in a selection problem. In contrast, the Dempster-Shafer theory of evidence is a robust method for considering the point of view of experts when uncertainty must be taken into consideration.
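The criteria-weighting step of classic AHP can be sketched as follows. This is an illustrative example only: the pairwise comparison values are hypothetical (they are not taken from this research), and the weights are derived with the standard principal-eigenvector approach together with Saaty's consistency ratio.

```python
import numpy as np

# Hypothetical pairwise comparisons on Saaty's 1-9 scale.
# Row/column order: financial, operational, property.
A = np.array([
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
])

# The priority vector is the normalized principal eigenvector of A.
eigenvalues, eigenvectors = np.linalg.eig(A)
principal = np.argmax(eigenvalues.real)
weights = np.abs(eigenvectors[:, principal].real)
weights /= weights.sum()

# Consistency check: CI = (lambda_max - n) / (n - 1), CR = CI / RI.
n = A.shape[0]
lambda_max = eigenvalues[principal].real
consistency_index = (lambda_max - n) / (n - 1)
random_index = 0.58  # Saaty's random index for n = 3
consistency_ratio = consistency_index / random_index

print(weights.round(3), round(consistency_ratio, 3))
```

A consistency ratio below 0.1 is conventionally taken to mean that the pairwise judgments are acceptably consistent. Note that the resulting weights are fixed numbers, which is the limitation of classic AHP described above.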
In the Dempster-Shafer theory of evidence, the level of belief of individuals in expressing their opinions is used. For example, not all survey participants necessarily answer questions with 100% certainty. In the real world, it is normal to answer a question with a level of uncertainty (α%). As a result, the belief rate of an answer is (100 − α)%. Therefore, in this method, the degree of belief of individuals in answering each question plays a key role and is expressed as a belief function. The belief function can be defined as a mathematical function, a range of values (for example, between 0 and 100), or even a quantitative or qualitative table.
For example, an expert can give a range for a risk (e.g., a score of 1–5) with a confidence rate of 30%; the system then automatically assigns the remaining 70% to the full range (1–9). An expert can also give a single score. For example, if they give 5 with a confidence rate of 0.6, the algorithm automatically assigns the remaining 0.4 to the full range (1–9).
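The automatic assignment of the remaining confidence to the full score range can be sketched as a small mass function. The function name, the interval representation, and the default 1–9 frame are illustrative assumptions, not the implementation used in this research.

```python
def mass_from_answer(low, high, confidence, frame=(1, 9)):
    """Build a mass function {interval: mass} from one expert answer.

    The stated confidence goes to the answered interval; the remainder
    goes to the full frame, which represents "not sure".
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if not frame[0] <= low <= high <= frame[1]:
        raise ValueError("answer must lie inside the frame")

    answer = (low, high)
    if answer == frame:
        # Answering with the whole frame carries no specific information.
        return {frame: 1.0}
    return {answer: confidence, frame: 1.0 - confidence}


# A range answer (1-5) given with 30% confidence.
range_answer = mass_from_answer(1, 5, 0.3)

# A single score (7) given with 80% confidence (hypothetical values).
single_answer = mass_from_answer(7, 7, 0.8)

print(range_answer)
print(single_answer)
```

In both cases the masses sum to one, so each answer is a valid basic belief assignment that can later be combined with the answers of other experts.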
Thus, the Dempster-Shafer theory of evidence, which is often used as a method based on the degree of belief of individuals, rests on two principles: first, obtaining the degrees of belief of participants for the possible answers to each question, and second, combining such degrees of belief with the Dempster rule when they are based on independent evidence.
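The second principle, Dempster's rule of combination, can be illustrated with a short sketch. The two expert mass functions below are hypothetical; focal elements are represented as sets of scores on a 1–9 frame, and mass that falls on empty intersections (conflict) is removed by normalization.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of scores to its mass.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory (disjoint) statements.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Normalize by the non-conflicting mass.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


FRAME = frozenset(range(1, 10))  # scores 1..9

# Hypothetical answers: expert 1 says 1-5 (30%), expert 2 says 7-8 (60%).
expert_1 = {frozenset(range(1, 6)): 0.3, FRAME: 0.7}
expert_2 = {frozenset({7, 8}): 0.6, FRAME: 0.4}

joint = combine(expert_1, expert_2)
for focal, mass in joint.items():
    print(sorted(focal), round(mass, 4))
```

Here the two specific answers do not overlap, so their product (0.18) is treated as conflict and the remaining masses are rescaled by 1 − 0.18 so that they again sum to one.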
Therefore, since the aim of this research is to choose, amongst the available alternatives, the project that minimizes the risks, AHP is combined with evidence theory (which uses plausibility, belief, and uncertainty functions). In the new method, a machine learning step filters the alternatives before they enter the hybrid AHP and Dempster-Shafer theory of evidence method. Then, the best alternative is selected considering the beliefs of the experts in terms of the mentioned risk factor groups.
The solutions of the proposed hybrid AHP and Dempster-Shafer theory of evidence can be represented as follows:
Here, the first index of the above matrix denotes the available contract options, the second index denotes the upper and lower level of risk of each alternative (upper risk level, lower risk level), and the third index denotes the alternatives.
The above matrix indicates the risk levels of each alternative. Therefore, using the statistical probability method (as shown below), the best alternative, which has the lowest risk domain, can be selected using the following formula:
The Alternative Index (AI) matrix holds the value 1/μ_i for each alternative, where μ_i is the mean of the total risk factor values of alternative i. Greater values of AI are preferred. Afterwards, the best alternatives can be identified and reported.
3.5. Choosing an Alternative with the Lowest Risk Domain
After calculating the total value for risk factors as a domain (with upper and lower limits) for each alternative, it is time to select the alternative with the lowest total risk factor. However, choosing the best alternative in this research is not as easy as selecting the project with the lowest value, because here the total risk values are not exact numbers and it is not possible to easily select the minimum value.
To overcome such a problem, two main factors must be taken into account:
It is evident that an alternative with the lowest mean total risk (μ) is more desirable because the related project achieves the lowest risk values (Figure 12). However, when μ is the same for two alternatives, the project with the smaller σ (a shorter risk domain) is preferred because, generally, it has a lower risk than the other option (Figure 13).
To develop the statistical formulas (Equations (8)–(10)) at a high confidence level (99.7%), the normal distribution is used. This is in line with the central limit theorem: the mean of the variables approximately follows the normal distribution when the number of gathered observations exceeds 30. As a result, the following formulas, based on the statistical quality control of the normal distribution, are developed:
The reason for considering 6σ as the distance between the upper limit and the lower limit is that control rules take advantage of the normal curve, in which 99.73% of the data lie within plus or minus three standard deviations of the mean.
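The 99.73% figure can be checked directly from the normal distribution, and the same 6σ span gives a way to read σ off a risk domain. The sketch below is illustrative; the helper names and the example limits are assumptions.

```python
import math


def normal_coverage(k):
    """Probability that a normal variable lies within k standard deviations."""
    return math.erf(k / math.sqrt(2))


def sigma_from_limits(lower, upper):
    """Standard deviation implied by treating the domain as a 6-sigma span."""
    return (upper - lower) / 6


coverage = normal_coverage(3)
print(round(coverage * 100, 2))  # share of data within +/- 3 sigma, in percent

# Hypothetical risk domain [0.2, 1.4].
sigma = sigma_from_limits(0.2, 1.4)
print(round(sigma, 4))
```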
Therefore, the CV index (σ/μ) can be considered an appropriate index for comparing the total risk domains when two alternatives have equal means (μ_1 = μ_2).
In statistics, is usually used instead of the .
Therefore, all possible conditions must be taken into account.
- (1) When two alternatives have different total risk factor means (μ_1 ≠ μ_2) and different total risk factor domain lengths (σ_1 ≠ σ_2), as shown in Figure 14.
Result: the project with the lower μ will be selected (greater AI).
- (2) When two alternatives have different total risk factor means (μ_1 ≠ μ_2) but equal total risk factor domain lengths (σ_1 = σ_2), as shown in Figure 15.
Result: the project with the lower μ will be selected (greater AI).
- (3) When two alternatives have equal total risk factor means (μ_1 = μ_2) but different total risk factor domains (σ_1 ≠ σ_2), as shown in Figure 16.
Result: the projects have the same AI; therefore, the project with the lower CV is preferred.
- (4) When two alternatives have equal total risk factor means (μ_1 = μ_2) and equal total risk factor domains (σ_1 = σ_2), as shown in Figure 17.
Result: the projects have the same AI and CV. Both alternatives can be chosen.
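The four cases above can be sketched as a small decision rule. This is a hedged illustration that assumes a risk domain [L, U] has mean μ = (L + U)/2 and σ = (U − L)/6 (the 6σ span), with AI = 1/μ and CV = σ/μ; the class and function names and the example domains are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskDomain:
    name: str
    lower: float
    upper: float

    @property
    def mean(self):
        # Assumed: the mean is the midpoint of the domain.
        return (self.lower + self.upper) / 2

    @property
    def sigma(self):
        # Assumed: the domain spans six standard deviations.
        return (self.upper - self.lower) / 6

    @property
    def ai(self):
        return 1 / self.mean

    @property
    def cv(self):
        return self.sigma / self.mean


def select(a, b, tol=1e-9):
    """Return the names of the preferred alternative(s)."""
    if not math.isclose(a.ai, b.ai, abs_tol=tol):
        # Cases 1 and 2: different means, so the greater AI wins.
        return [max((a, b), key=lambda d: d.ai).name]
    if not math.isclose(a.cv, b.cv, abs_tol=tol):
        # Case 3: equal means, so the lower CV wins.
        return [min((a, b), key=lambda d: d.cv).name]
    # Case 4: equal AI and CV, so either alternative can be chosen.
    return [a.name, b.name]


different_means = select(RiskDomain("A", 0.2, 1.0), RiskDomain("B", 0.4, 1.4))
equal_means = select(RiskDomain("C", 0.4, 0.8), RiskDomain("D", 0.2, 1.0))
identical = select(RiskDomain("E", 0.2, 1.0), RiskDomain("F", 0.2, 1.0))
print(different_means, equal_means, identical)
```

Comparing AI first and falling back to CV only on ties mirrors the order in which the two factors are applied in the four cases.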
3.6. Advantages and Novelties of Using the Proposed Hybrid Evidence Theory and AHP
The proposed hybrid evidence theory and AHP is a promising method for minimizing the uncertainty in project selection problems when negative factors such as job tardiness, work in process, bottleneck machines, and over-allocated machines are taken into consideration. In addition, compared to mathematical modeling, the outcomes are more understandable for project managers in real industries.
Compared to classic AHP, the proposed hybrid AHP and Dempster-Shafer theory of evidence has many prominent features. Table 4 outlines the features of both methods:
3.7. Novelties and Innovation of the Proposed Method
To clarify the novelties of the proposed method, the most relevant and recent methods that have used evidence theory and AHP are presented and compared with the method proposed in this research (Table 5).
Although evidence theory and AHP have been applied to other problems before, this research proposes a new version in which an unsupervised machine learning algorithm filters out the unsuitable alternatives before the best alternative is selected. Moreover, the proposed algorithm is designed to track and compare the level of each risk factor group in different phases of a project, including before execution, during execution, and after completion. Such an approach provides a basis for the project selection portfolio of a company.
4. Results and Discussion
4.1. Verifying the Proposed Algorithm (Solving Experiments Gathered from the Literature)
In this section, several case studies are solved to verify the functionality of the proposed algorithm under different conditions. For this purpose, an L2^4 Taguchi design is used to design the experiments (DOE) in Minitab 18.0. This design is chosen because each factor is considered at two levels, a lower limit and an upper limit (2^4).
The experiments are designed to cover the various conditions that may surround a company while it chooses the best project. To design the experiments, the following levels are taken into account for each DOE factor (Table 6).
The case studies are designed so that various ranges of parameters are taken into account. Therefore, the case studies are divided into three main categories used by many researchers in the literature [62].
Table 7 shows the case studies that the proposed algorithm must solve. As shown, the domains of each of the case studies have been selected according to Table 7 to cover each scale (small, medium, large, and very large) entirely.
In this section, each of the case studies is solved by the proposed algorithm in Matlab. The outcomes of the case studies are shown in Table 8.
However, in order to show the steps of the proposed algorithm in practice, case study number two in Table 8 will be explained in detail.
4.2. Solving a Case Study and Explaining the Outcomes in Detail
In this section, the third case study in Table 8 will be explained in detail to show how the algorithm works.
Suppose a company has to select the best option between the available two alternatives. One is to set up a new production line, and the other is to set up a new laboratory, which can also provide outdoor services.
There are two managers in this company who must determine which alternative is to be carried out. However, the share of the first manager in the company is twice that of the other manager, and therefore his vote carries twice the weight.
In order to choose the best option, the managers decided to consider three risk factors: financial, operational, and property. From the point of view of the managers at this time, the financial risk factor is more important than the other factors, and the operational risk factors are more important than the property risk factors. Therefore, they decided to set the following values between the risk factors:
Regardless of the project title, the company has two options for financing it. One is to pay the expenditures directly and the other is to obtain a bank loan. However, each contract option will influence the level of the risks.
Afterward, the managers are asked to fill out a questionnaire to score each risk factor and to state their uncertainty about their answers. The following matrix shows the opinions of the first three experts (Table 9):
The following results are obtained after solving the case study using the proposed hybrid AHP and theory of evidence.
Step (1) Calculating the “expected_value_for_risk” matrix using the opinions of the experts (Table 10):
The calculation of the first element of the above matrix is explained below:
In order to calculate the above matrix, the lower and upper values for each risk factor must be calculated. Therefore, using a “for” loop, the opinion of each expert is read in turn. For example, for the first risk factor, the results are as follows (Table 11):
Using the same strategy, the rest of the elements of the “expected value for risk” matrix are calculated (Table 12).
Step (2) Next, the mean is calculated for each of the risks in each contract option (Table 13):
Step (3) Normalize the upper and lower risk factor values (Table 14):
Step (4) Calculating the average of the normalized risk matrix:
Then, using the following formulas, the average of the normalized lower and upper risk factors is calculated (Table 15).
Step (5) Calculating the total risk matrix:
The total lower and upper risk matrix is calculated (Table 16).
At this point, the total lower and upper values for each alternative (using a specific contract option) are calculated. For example, while the first alternative is assumed to be carried out by the second contract option, the total risk domain will be [0.0366, 1.8211].
Step (6): Calculating the AI matrix and choosing the best alternative:
Now, in the last step, the alternative with the lowest risk point must be selected. However, since the risk point is not an exact value but a domain, this means selecting the project with the lowest risk domain. To solve this problem, two factors must be taken into consideration:
The mean of a risk factor domain (μ), for which the AI index is used, as described in Section 3.
If two or more projects have the same μ, then the length of the risk factor domain, for which the CV index is used (using the σ/μ formula), as described in Section 3.
Therefore, using the following formulas, the AI index will be calculated for each alternative.
Then, the best option will be the alternative with the highest AI index value, which in this case study is the second project when the first contract option is selected (0.7371).
Best_alternative = 2
Best_contract option = 1
4.3. Measuring the Performance of the Proposed Algorithm
In order to assess the performance of the proposed method, several indicators are defined as shown below:
The ability to solve all problem types
The ability to choose projects with the lowest uncertainty
The solving time
Comparing the hybrid AHP and Dempster-Shafer theory of evidence with classic AHP
In addition, in the second part of this section, the outcomes of problems solved using the hybrid AHP and Dempster-Shafer theory of evidence will be compared with classic AHP to show the superiority of the proposed method in solving problems when uncertainty exists.
4.3.1. The Ability to Solve All Problem Types
The results of the 24 experiments solved by the proposed hybrid method showed that the algorithm could solve all experiments (100%) and report the alternative with the lowest average risk factors.
Therefore, the algorithm can be used by industries for project selection in practice.
4.3.2. The Ability to Choose Projects with the Lowest Uncertainty
The outcomes of all solved case studies were reviewed again. In each case, the AI matrix is presented in Table 17, and the lowest risk factor reported by the proposed algorithm is double-checked. In all studied cases, the solving algorithm can find and report the project with the lowest uncertainty (highest AI).
The reduced risk indicator (RRI) shows, as a percentage, how much using the proposed algorithm helps in selecting the alternative with the lowest risk factor.
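One plausible reading of this indicator is the relative gap between the highest and the lowest total risk among the available alternatives. The formula below, RRI = 100 × (max − min) / max, is an assumption made for illustration, and the risk values are hypothetical.

```python
def reduced_risk_indicator(total_risks):
    """Percentage gap between the highest and lowest total risk values.

    Assumed definition: 100 * (max - min) / max.
    """
    highest = max(total_risks)
    lowest = min(total_risks)
    if highest <= 0:
        raise ValueError("total risk values must be positive")
    return 100.0 * (highest - lowest) / highest


# Hypothetical total risk values of three alternatives in one case study.
rri = reduced_risk_indicator([0.8, 0.5, 0.2])
print(round(rri, 2))
```

Under this reading, a value near 100% means that picking the best alternative instead of the worst removes almost all of the total risk, while a value near 0% means the alternatives are nearly equivalent.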
As shown in Figure 18, the algorithm can choose the alternative with the lowest risk value in all cases.
As shown, the solving algorithm solved the small-scale case studies in less than 0.055 s, the medium-scale cases in less than 0.09 s, and the large-scale cases in less than 0.56 s (Figure 19). These results are notable and indicate that the algorithm is fast enough to be used in actual practice.
5. Conclusions and Recommendations
This research focused on the uncertainty in the industrial project selection problem. In the real environment, several risk factors threaten the success of the project. However, such risk factors are not constant and may take various values depending on the environment of the project. Therefore, classic decision-making methods may fail to correctly report the actual risk factors value and select the best project among the alternatives. In this research, many risk factors that influence project success are extracted using the Delphi method. The findings showed that the risks could be divided into four main risk clusters: Properties risk factors; Technology and Operational risk factors; Financial risk factors; and Strategic risk factors. In each of the risk factor clusters, several variables are defined. Each variable is asked in three phases of a project: before selecting a project, during execution of the project, and after completing the project.
The aim was to track the status of a variable in the life cycle of a project. After asking the opinion of the responder for each question, their belief rate was also asked to clarify the uncertainty of the risk factors. The statistical analysis is then carried out to specify the statistical description of the variable, find out the correlations between the variables, and determine their values in project success (as the dependent variable).
A new hybrid AHP and Dempster-Shafer theory of evidence is proposed, based on the uncertainty level of the risk factors. The proposed method could determine the total risk level range of each alternative and then report the alternative with the lowest total risk level range. Next, a Taguchi method (L2^4) is used to design the experiments. The proposed method is used to solve 24 experiments, where the conditions differed from one experiment to another.
The performance of the proposed algorithm is then evaluated using four indicators. The proposed method could solve all small, medium, and large-scale experiments (validating index). Moreover, it could find and report the project with the lowest total risk range in all cases. In order to check the performance of the proposed method in choosing projects with the lowest total risk factor, the maximum and minimum risk factors for the available alternatives of each case study are compared (reduced risk indicator). The outcomes showed that the proposed hybrid method selected the projects with the lowest total risk factor, with a reduced risk indicator of up to 90.53% for the small-scale cases, up to 94.45% for the medium-scale cases, and up to 19.61% for the large-scale cases. The proposed method solved the small-scale problems in [0.036, 0.054] s, the medium-scale problems in [0.033, 0.088] s, and the large-scale problems in [0.062, 0.557] s, depending on the nature of the project (processing time).
It is recommended to develop a Java application for the method proposed in this research, which could be completed by computer science researchers or manufacturing engineering researchers familiar with programming languages. It is also suggested to apply other MADM methods, such as VIKOR and TOPSIS, and to compare their functionality with that of the proposed method.