2. Literature Review/Background
Issues related to the well-being of countries with a high level of economic development have constantly attracted the attention of economists around the world, starting with the famous work of Adam Smith, “An Inquiry into the Nature and Causes of the Wealth of Nations” [1]. We refer to this famous investigation to show that, ever since this work, macroeconomists in many countries have been trying to answer the question “Why are some countries so rich and others so poor?”. Our study contributes to some extent to answering this question. Its relevance is increasing due to the challenges that countries face today, including the economic recession caused by the pandemic, the need to reduce the unemployment rate, the volume of external debt, and others.
The author of [2] illustrates the steps to construct a measure that accounts for multi-dimensional inequality across European regions in human development. A multi-dimensional index to explore the inequalities across regions has been produced. Results show a generally increased level of human development achievements despite a widespread, persistent level of inequality in its distribution.
The author of [3] conducts a multi-dimensional comparative analysis of E.U. countries’ implementation of a circular economy. Based on the analyses, it was concluded that the old E.U. countries are the most advanced in terms of a circular economy among all E.U. countries.
The scope of [4] is to investigate the performance of listed companies in the construction sector in the southern European member states of the E.U., namely Greece, Italy, Portugal, and Spain, taking into consideration their economic conditions as shaped by the recent economic crisis. The proposed methodology is based on the non-parametric method of Data Envelopment Analysis. Statistical regression methods are then used to look for a possible correlation between each country’s efficiency scores and a group of key macroeconomic variables that can show how the effects of the crisis differ between the countries under examination.
Jacek Batóg and Barbara Jacek [5], in their study based on discriminant analysis, identify the key determinants affecting economic growth. The results obtained allowed them to conclude that the gross domestic product growth rates in E.U. countries were determined by consumption, investment, exports, and labour productivity and, during periods of economic downturn, by public debt.
The purpose of [6] is to verify the long-range correlation between the stock markets of the largest economies in the world and the individual exchange rate with the USD. The authors analysed the situation according to different time scales using detrended cross-correlation and detrended moving average cross-correlation analyses and the respective correlation coefficients. Their main results showed that the exchange rate does not significantly affect European markets.
Medium-term forecasts serve as a basis for compiling a multiple-budget of public finance and preparing or updating the Convergence Programme of the Slovak Republic. The subject of forecasting [7] includes the following indicators: generation of GDP in current and fixed prices and the structure of its use; foreign trade and a current account of the balance of payments; the inflation rate, growth of manufacturers’ prices and growth of deflators; the labour market (average nominal and real wage in the national economy, rate of employment and that of unemployment); and other derived indicators.
The work in [8] considers how national governments, the private sector, and international institutions respond to the challenges of poverty, inequality, climate change, and others as we emerge from a crisis that has affected us all.
The main question that the authors explore in [9] is whether it is possible in today’s globalising world to talk about the uniformity of the competitiveness of economies. Such questions are crucial in the case of such political and economic structures as the European Union. The strategic development goals of the E.U. include the desire for the harmonious development of all its members. The aim of the work is a multi-dimensional comparative analysis of the competitiveness of the economies of the European Union countries. In studying the spatial differentiation of the economies of the E.U. countries in the context of their competitiveness, a taxonomic measure of development based on the Weber median vector was used.
The subject of [10] is a comparative analysis of the level of development of the European Union. The research is based on data collected in the Eurostat database, which the European Commission uses to implement the E.U.’s sustainable development goals. Multi-dimensional scaling was used to determine the similarities and differences at the level of development of the E.U. countries. According to the variable analyses, the results obtained made it possible to identify similar and different countries.
In [11], a composite index constructed on the basis of a dynamic relative taxonomy was used to present the varying distance of the individual E.U. countries from the E.U.-level targets and the national-level targets of the Europe 2020 Strategy.
The authors of [12] propose a new multi-dimensional, intertemporal measure of economic insecurity that accounts for its multiplicity and dynamism.
In [13], the authors claim that studies of economic inequality almost always separately examine income, consumption, and wealth inequality, and hence, miss the critical synergy amongst the three measures explicit in the life-cycle budget constraint. However, these joint distributions are essential in evaluating the macroeconomic impacts of changes in income because the response may differ across the wealth distribution. This heterogeneity in response to income changes can significantly impact the effectiveness of government fiscal policy.
In [14], Ruiz Estrada applies different multi-dimensional graphical models on macroeconomics and shows various applications from different multi-dimensional coordinate spaces offered by Econographicology.
The work in [15] shows the change in prices during the pandemic. The analysis is carried out by predicting prices and their impact on product development. The structure of the price change forecast model was based on a time series analysis using the ARIMA procedure. The construction of the predictive model was based on extrapolation.
In [16], using the theory of approximations, some problems of the modern economy associated with making decisions under uncertainty are analysed.
The authors of [17] developed a macroeconomic SIR model that considers herd immunity, behaviour-dependent transmission rates, remote workers, and the indirect externalities of lockdowns. It is formulated as an exit time control problem where a social planner can prescribe different levels of lockdown for the low-risk and high-risk portions of the adult population. By considering the possibility of reaching herd immunity, the model predicts that high-risk individuals can leave lockdown sooner than in models where herd immunity is not considered. Overall, the model-determined optimal lockdown strategy, combined with individual actions to slow virus transmission, can reduce total mortality to one-third of the model-predicted no-lockdown level of mortality.
One scientific paper [18] seeks to address the relationship between corporate social responsibility, intellectual capital, and performance environment. The questionnaire method was used for the targeted research objectives. The findings support the idea of a strong relationship between corporate social responsibility, intellectual capital, and performance in the business environment.
Undoubtedly, when conducting macroeconomic analysis, it is impossible not to note the generally recognised publications of such authors as Robert J. Barro [19, 20, 21], Paul M. Sommers [22, 23, 24], Danny Quah [25, 26, 27], and other authors.
In our study, we consider the economies of European countries in 2020. In a subsequent study, we plan to examine the economies of European countries in dynamics. From this perspective, the works of such well-known economists as James H. Stock [28, 29], Lucrezia Reichlin [30, 31], Massimiliano Marcellino [32, 33], and some other scientists should be mentioned.
4. Main Part/Results
4.1. A Visual Representation of the Relative Position of the Economies of European Countries
Graphical presentation of information has a significant advantage: clarity. If two (or three) variables describe the objects of study, the graphical representation of these objects does not cause any difficulties. In our case, we are in a 20-dimensional space. Nevertheless, it is still possible to see all the initial information visually, not for individual variables or their sets in the form of slices but by taking into account all multi-dimensional information as a whole. Multi-dimensional scaling provides this opportunity; its main idea is a visual representation of all multi-dimensional information in the aggregate. In this case, of course, there is some distortion of the initial information, but the mathematical methods underlying multi-dimensional scaling make it possible to reduce this distortion to a minimum [34].
Multi-dimensional scaling methods take as input a matrix of dissimilarities between objects and output a coordinate matrix whose configuration minimises a loss function. The pairwise distance $d_{ij}$ between objects $i$ and $j$ is defined as the Euclidean distance in the original multi-dimensional space.
These distances are the entries of the dissimilarity matrix
$$D = \begin{pmatrix} d_{1,1} & d_{1,2} & \cdots & d_{1,M} \\ d_{2,1} & d_{2,2} & \cdots & d_{2,M} \\ \vdots & \vdots & & \vdots \\ d_{M,1} & d_{M,2} & \cdots & d_{M,M} \end{pmatrix}.$$
Multi-dimensional scaling minimises the cost function named Stress:
$$\mathrm{Stress}_D(x_1, \ldots, x_M) = \sqrt{\sum_{i \neq j} \left( d_{ij} - \lVert x_i - x_j \rVert \right)^2}.$$
The goal of multi-dimensional scaling is, using the matrix $D$, to find $M$ vectors $x_1, \ldots, x_M \in \mathbb{R}^N$, where $N$ is the dimension of the map, such that $\lVert x_i - x_j \rVert \approx d_{ij}$ for all $i, j \in \{1, \ldots, M\}$. In other words, the ratio of the distances between pairs of objects in the space depicted visually should be as close as possible to the ratio of the distances between the corresponding pairs of objects in the original multi-dimensional space.
The steps in the classical MDS algorithm are as follows:
From $D$ calculate $A = \{-\tfrac{1}{2} d_{ij}^2\}$.
From $A$ calculate $B = \{a_{ij} - a_{i\cdot} - a_{\cdot j} + a_{\cdot\cdot}\}$, where $a_{i\cdot}$ is the average of all $a_{ij}$ across $j$.
Find the $p$ largest eigenvalues $\lambda_1 > \lambda_2 > \cdots > \lambda_p$ of $B$ and corresponding eigenvectors $L = (L_{(1)}, L_{(2)}, \ldots, L_{(p)})$, which are normalised so that $L_{(i)}' L_{(i)} = \lambda_i$. (Here, it is assumed that $p$ is selected so that the eigenvalues are all relatively large and positive.)
The coordinates of the objects are the rows of $L$.
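The classical MDS steps listed above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `classical_mds` and `raw_stress` and the four toy points are hypothetical, and the sketch is not the ALSCAL procedure used in the study, which optimises a different (iterative) criterion.

```python
import numpy as np

def classical_mds(D, p=2):
    """Classical MDS: p-dimensional coordinates from a distance matrix D."""
    D = np.asarray(D, dtype=float)
    A = -0.5 * D ** 2                            # a_ij = -d_ij^2 / 2
    row = A.mean(axis=1, keepdims=True)          # a_i. (row averages)
    col = A.mean(axis=0, keepdims=True)          # a_.j (column averages)
    B = A - row - col + A.mean()                 # double-centred matrix B
    eigvals, eigvecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:p]          # indices of the p largest
    lam = np.clip(eigvals[idx], 0.0, None)       # guard against tiny negatives
    return eigvecs[:, idx] * np.sqrt(lam)        # columns scaled so L(i)'L(i) = lambda_i

def raw_stress(D, X):
    """Square root of the summed squared gaps between given and map distances."""
    diff = X[:, None, :] - X[None, :, :]
    d_map = np.sqrt((diff ** 2).sum(axis=-1))
    return float(np.sqrt(((D - d_map) ** 2).sum()))

# Toy data (hypothetical): four points that already live in a plane.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
X = classical_mds(D, p=2)
stress = raw_stress(D, X)   # close to zero: the distances are reproduced
```

For data that are exactly two-dimensional, as in the toy example, the map reproduces every pairwise distance, so the stress is numerically zero; for 16 or 20 variables some distortion remains and the stress measures it.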
We use the most commonly accepted multi-dimensional scaling procedure, ALSCAL. The result of applying multi-dimensional scaling is shown in Figure 2.
The small value of the Stress parameter, 0.00012, shows that the obtained diagram reproduces the actual multi-dimensional relations between the economies of European countries very well.
The initial analysis of the resulting diagram indicates the distinctive position of Germany, the United Kingdom, France, Italy, Russia, Spain, Luxembourg, and Ireland relative to other countries. The noted countries may differ significantly among themselves in individual macroeconomic indicators. Importantly, the ratios of the distances between countries on the diagram allow their economic proximity or difference to be judged both qualitatively and quantitatively, namely how many times the differences between pairs of countries differ from each other. A more detailed analysis of the diagram is also possible, but we do not carry it out in this study for the following reasons.
The initial variables include some measured in absolute terms, such as Population and GDP (in total). Therefore, even without a sufficiently high level of economic development, some countries can take a place among countries with high economic indicators. Examples include Russia and Turkey. The relatively large population of these countries leads to rather large values of the GDP variable (Figure 3), while the relative variable GDP per capita PPP has a low value for them.
Therefore, we remove such variables as Population and GDP from further analysis, focusing on relative variables. Moreover, we remove such variables as Female Life Expectancy at birth and Male Life Expectancy at birth since we cannot influence them in a directive way, meaning by command. Unlike these variables, the values of, for example, Personal Income Tax Rate, %, Sales Tax Rate, %, Interest Rate, %, Government Debt to GDP, %, and others can be changed in a command manner. Consequently, there are opportunities to quickly influence the change in the economic level of a country. At the same time, we leave for further analysis the variables of Retirement Age Men and Retirement Age Women, since we can set the values of these variables ourselves, and thus possibly influence the country’s economy.
Leaving the remaining 16 initial variables in the analysis and repeating the multi-dimensional scaling procedure, we get the map shown in Figure 4. The small value of the Stress parameter, 0.00003, shows a good reflection of the actual economic situation.
The analysis of the chart reveals the special position of Luxembourg. This country is characterised by high values of the variables GDP per capita PPP = USD 110,261, Trade Balance = 38.72% of GDP, low value of the variable Net External Debt = −2611.3% of GDP. Ireland, Switzerland, Norway, and Cyprus also stand out. When conducting cluster analysis, we will get a more detailed description of the countries in the next section. The directionality of variables is determined by comparing the values of variables for countries occupying opposite positions in the diagram.
4.2. The Division of Countries into Clusters. Prediction Based on Discriminant Analysis
We use the hierarchical cluster analysis [35] procedure to perform clustering. As a method of cluster organisation, we use the Ward method, which is based on minimising the variance in each cluster. As a result, multi-dimensional clusters with a compact spherical shape are obtained. Note that we do not pre-assign any boundaries among clusters when conducting cluster analysis. The program finds the most significant gaps in the values of variables and draws boundaries between clusters using them.
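To make the Ward criterion concrete, the following is a naive sketch: at every step it merges the two clusters whose union increases the within-cluster sum of squares the least. The function name `ward_clusters` and the toy data are hypothetical, and the cubic-time loop is only meant for a few dozen rows; it is not the statistical package procedure used in the study, and any scaling of the variables before clustering is left to the reader as an assumption.

```python
import numpy as np

def ward_clusters(X, n_clusters):
    """Agglomerative clustering with Ward's merge cost (naive O(n^3) sketch)."""
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]          # start: one object per cluster
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = X[clusters[a]], X[clusters[b]]
                na, nb = len(ca), len(cb)
                gap = ca.mean(axis=0) - cb.mean(axis=0)
                # Increase in within-cluster sum of squares if a and b are merged.
                cost = na * nb / (na + nb) * float(gap @ gap)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]      # merge b into a
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy data (hypothetical): two well-separated groups of three objects each.
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
groups = ward_clusters(toy, n_clusters=2)
```

Recording the merge cost at each step would give the heights from which a dendrogram is drawn; the sketch omits that bookkeeping for brevity.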
We construct the dendrogram to determine the optimal number of clusters (Figure 5).
As follows from the view of the dendrogram, at the global level, all countries can be divided into two clusters. The first cluster includes 20 countries, the second includes 22:
Cluster 1: Austria, Belgium, Germany, Denmark, Ireland, Iceland, Spain, Luxembourg, the Netherlands, Norway, the United Kingdom, Malta, Italy, Slovenia, Portugal, Greece, Finland, Switzerland, Sweden, and France.
Cluster 2: Czech Republic; Cyprus; Lithuania; Estonia; Poland; Hungary; Slovakia; Latvia; Romania; Turkey; Croatia; Russia; Bulgaria; Belarus; Montenegro; Serbia; Macedonia; Bosnia and Herzegovina; Albania; Ukraine; Moldova; and Kosovo.
The economy of the countries of the first cluster is characterised by high values of the variables GDP per capita PPP, Corporate Tax Rate, Personal Income Tax Rate, Government Debt to GDP, Current Account to GDP, Saving Rate (National savings/GNDI), and Trade Balance, and low values of the variables Unemployment Rate, Interest Rate, Inflation Rate, and Net External Debt, compared to the countries of the second cluster (Table 2).
The significance of the differences will be determined in the next section based on the analysis of variance. It is possible to characterise the countries of the first cluster as countries with thriving economies, while the countries of the second cluster are characterised by an insufficiently high level of economic development.
It is interesting to note that, judging by the variable GDP Annual Growth Rate, Previous, the negative impact of the pandemic was more significant for the countries of the first cluster. For the first cluster, this variable has a value of −0.86, compared with −0.02 for the countries of the second cluster.
The division of countries into two clusters shows the fundamental difference between economically successful countries and countries with a relatively low level of economic development. For a more detailed analysis, based on the obtained dendrogram, we will divide the countries into seven clusters:
Cluster 1: Austria; Finland; Norway; Sweden; France; Malta; Slovenia;
Cluster 2: Belgium; United Kingdom; Italy; Spain; Portugal; Greece;
Cluster 3: Denmark; Germany; Iceland; Netherlands; Switzerland;
Cluster 4: Ireland; Luxembourg;
Cluster 5: Czech Republic; Cyprus; Lithuania; Estonia; Poland; Hungary; Slovakia; Latvia; Romania; Croatia; Bulgaria; Serbia;
Cluster 6: Turkey; Russia; Belarus; Ukraine;
Cluster 7: Montenegro; Macedonia; Bosnia and Herzegovina; Albania; Moldova; Kosovo.
The mean values of the variables for each cluster are shown in Table 3.
In order to visualise the characteristics of each cluster, we present graphs of the average values of the analysis variables by clusters (Figure 6).
Each cluster has its most significant characteristics, by which we can judge the distinctive economic characteristics of the countries included in this cluster. For example, the countries of the first cluster are distinguished by high values of taxes determined by the variables Corporate Tax Rate, Personal Income Tax Rate, and Sales Tax Rate and low values of the variables Interest Rate and Inflation Rate.
Unlike cluster analysis, in which clustering is carried out on the basis of the initial data, discriminant analysis makes it possible to predict which of the existing clusters a country will fall into if the values of its macroeconomic indicators change in a certain way. Such predictions rely on the cluster analysis results. For example, for a country with GDP per capita PPP = USD 32,000, GDP Annual Growth Rate, Last = 12%, GDP Annual Growth Rate, Previous = −10%, Unemployment Rate = 3%, Corporate Tax Rate = 20%, Personal Income Tax Rate = 27%, Sales Tax Rate = 12%, Interest Rate = 2.7%, Inflation Rate = 3.1%, Government Debt to GDP = 8%, Saving Rate (National savings/GNDI) = 13%, Net External Debt = −18% of GDP, Trade Balance = 6% of GDP, Retirement Age Men = 66 years, Retirement Age Women = 62 years, and Current Account to GDP = 5%, discriminant analysis shows that this country will belong to the third cluster with a probability of 0.897, to the fifth cluster with a probability of 0.096, and to the second cluster with a probability of 0.006. The probabilities of this country belonging to other clusters are practically zero. Thus, this country will belong to the third cluster with a high degree of probability.
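The way such membership probabilities arise can be illustrated with a minimal linear discriminant sketch. It assumes Gaussian clusters with a common (pooled) covariance matrix and priors proportional to cluster size, which is one standard formulation and not necessarily the exact settings used in the study; the function name `lda_posteriors` and the two-variable toy data are hypothetical.

```python
import numpy as np

def lda_posteriors(X, y, x_new):
    """Posterior cluster probabilities for x_new under a pooled-covariance model."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    x_new = np.asarray(x_new, dtype=float)
    labels = np.unique(y)                                    # sorted cluster labels
    means = np.array([X[y == k].mean(axis=0) for k in labels])
    priors = np.array([(y == k).mean() for k in labels])     # share of each cluster
    centred = X - means[np.searchsorted(labels, y)]          # subtract own-cluster mean
    cov = centred.T @ centred / (len(X) - len(labels))       # pooled covariance
    cov_inv = np.linalg.pinv(cov)
    # Linear discriminant score of x_new for every cluster.
    scores = np.array([
        x_new @ cov_inv @ m - 0.5 * m @ cov_inv @ m + np.log(p)
        for m, p in zip(means, priors)
    ])
    scores -= scores.max()                                   # numerical stability
    post = np.exp(scores)
    return dict(zip(labels.tolist(), (post / post.sum()).tolist()))

# Toy data (hypothetical): two clusters described by two variables.
X_toy = [[1, 2], [2, 1], [1.5, 1.8], [8, 9], [9, 8], [8.5, 9.2]]
y_toy = [1, 1, 1, 2, 2, 2]
posterior = lda_posteriors(X_toy, y_toy, [1.2, 1.9])
```

The returned dictionary maps each cluster label to a probability, and the probabilities sum to one, in the same way as the values 0.897, 0.096, and 0.006 quoted in the text account for almost all of the probability mass.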
The cluster analysis results are in good agreement with the results of multi-dimensional scaling. The slight differences are explained by the different approaches underlying these methods.
We will carry out a cluster analysis using the K-means method to identify more general characteristics of the original multi-dimensional data. Unlike the hierarchical cluster analysis, this method predetermines the number of clusters. In our case, we will set three clusters, allowing us to identify more general features of the classification of European countries based on initial multi-dimensional information and find the high-frequency cluster.
The analysis showed that the first cluster includes two countries, Luxembourg and Ireland, which lie at equal distances from the centre of this cluster. The second cluster includes 12 countries: Austria, Belgium, Denmark, Finland, Germany, Iceland, the Netherlands, Norway, Sweden, Switzerland, France, and the United Kingdom. Iceland lies closest to the centre of this cluster. The remaining 28 countries make up the third cluster, in which Croatia lies closest to the centre.
Figure 7 shows a box plot diagram for GDP per capita PPP, USD. The diagram clearly shows the differences in the values of this variable depending on the clusters.
4.3. Analysis of Variance
We use analysis of variance to determine the significance of differences in the values of variables across clusters, specifically the one-factor (one-way) ANOVA [36].
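As a reminder of what the one-way procedure computes, the F statistic compares the variation between cluster means with the variation inside clusters. The sketch below uses only the standard library; the function name `one_way_anova_f` and the toy samples are hypothetical, and converting F into a significance level additionally requires the F distribution, which is omitted here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per cluster."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of cluster means around the grand mean, weighted by cluster size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own cluster mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy data (hypothetical): one variable observed in two clusters.
f_stat, df_b, df_w = one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom corresponds to a small significance level, which is then compared with the 0.05 threshold.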
If countries are divided into two clusters (see the previous section), the procedure results are shown in Table 4.
Table 4 shows that the countries of the first and second clusters significantly differ in the values of the variables GDP per capita PPP, Retirement Age Men, Retirement Age Women, Corporate Tax Rate, Personal Income Tax Rate, Interest Rate, Inflation Rate, Government Debt to GDP, Current Account to GDP, Saving Rate (National savings / GNDI), and Trade Balance. For these variables, the significance level does not exceed the threshold value of 0.05. Close to the threshold is the significance of the differences for the Unemployment Rate variable. As for the variables GDP Annual Growth Rate, Last; GDP Annual Growth Rate, Previous; Sales Tax Rate; and Net External Debt, we cannot say that the countries of the first and second clusters differ significantly in the values of these variables. The significance level for the differences in these variables for the first and second clusters exceeds the threshold value of 0.05.
In the case of dividing countries into seven clusters, analysis of variance based on the Bonferroni post hoc test gives the following results.
According to the values of the variable GDP per capita PPP, the first cluster significantly differs from the fourth, fifth, sixth, and seventh, the second cluster from the third, fourth, sixth, and seventh, the third cluster from all the others except for the first, the fourth cluster from all the others, the fifth from the first, third, fourth, and seventh, the sixth from first, second, third, and seventh, and the seventh from all except the sixth.
For the variable GDP Annual Growth Rate, Last, the program did not find any significant differences.
According to the GDP Annual Growth Rate variable, Previous, the first cluster differs significantly from the fourth, the second cluster from the fourth, the third cluster from the fourth, the fourth cluster from all the others except for the sixth, the fifth cluster from the fourth, the sixth cluster does not differ significantly from anyone else, and the seventh cluster from the fourth.
According to the Unemployment Rate variable, all clusters are significantly different from the seventh, and the seventh cluster is significantly different from all the others.
According to the Retirement Age Men variable, the first cluster differs significantly from the second, third, fourth, and fifth, the second cluster from the first, fifth, sixth, and seventh, the third cluster from the first, fifth, sixth, and seventh, the fourth cluster from the first and sixth, the fifth cluster from the first, second, third, and sixth, the sixth cluster from the second, third, fourth, and fifth, and the seventh cluster from the second and third.
According to the Retirement Age Women variable, the first cluster significantly differs from the second, third, fourth, and sixth, the second cluster from the first, fifth, sixth, and seventh, the third cluster from the first, fifth, sixth, and seventh, the fourth cluster from the first, sixth, and seventh, fifth cluster from the second, third, sixth, and seventh, the sixth cluster from all the others, and the seventh cluster from all the others except the first.
According to the Corporate Tax Rate variable, the first cluster differs significantly from the fifth and seventh, the second cluster from the fifth and seventh, the third cluster from the seventh, the fourth cluster does not differ significantly from anyone else, the fifth cluster from the first, the sixth cluster from the seventh, and the seventh cluster from the first, second, third, and sixth.
According to the Personal Income Tax Rate variable, the first, second, third, and fourth clusters differ significantly from the fifth, sixth, and seventh. The fifth, sixth, and seventh clusters significantly differ from the first, second, third, and fourth clusters.
The program did not find any significant difference in the Sales Tax Rate variable.
According to the Interest Rate variable, all clusters differ significantly from the sixth. The sixth cluster is significantly different from all the others.
According to the Inflation Rate variable, all clusters significantly differ from the sixth, and the sixth cluster significantly differs from all the others.
According to the Government Debt to GDP variable, all clusters differ significantly from the second, and the second cluster is significantly different from all the others.
According to the variable Current Account to GDP, the first cluster significantly differs from the seventh, and the second cluster does not differ significantly from anyone else. The third, fourth, and fifth clusters differ significantly from the seventh, the fifth and third cluster from the seventh, the fourth cluster does not significantly differ from any cluster, the fifth cluster from the first, the sixth cluster does not differ significantly from anyone else, and the seventh cluster from the first, third, fourth, and fifth.
According to the Saving Rate variable, the first cluster significantly differs from the second and seventh, the second cluster from the first and third, the third cluster from the second and seventh, and the fourth and fifth clusters significantly differ from the seventh, the sixth cluster does not differ significantly from anyone else, and the seventh cluster from the first, third, fourth, and fifth.
According to the Net External Debt variable, all clusters differ significantly from the fourth; the fourth cluster differs significantly from all the others.
According to the Trade Balance variable, the first, second, third, fifth, and sixth clusters differ significantly from the fourth and seventh; the fourth cluster differs significantly from all the others, and the seventh cluster also differs from all the others.
Among other possible combinations of clusters, analysis of variance did not reveal significant differences in the values of the studied variables.
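The number of pairwise comparisons behind a Bonferroni post hoc test grows quickly with the number of clusters. The short sketch below counts the pairs for seven clusters and shows the per-comparison threshold that keeps the overall level at 0.05; the function name `bonferroni_pairs` is hypothetical, and statistical packages often report the equivalent adjustment by multiplying each p-value by the number of pairs instead.

```python
from itertools import combinations

def bonferroni_pairs(n_clusters, alpha=0.05):
    """List all cluster pairs and the Bonferroni per-comparison threshold."""
    pairs = list(combinations(range(1, n_clusters + 1), 2))
    return pairs, alpha / len(pairs)

pairs, per_test_alpha = bonferroni_pairs(7)
# Seven clusters give 21 pairs, so each comparison is judged at 0.05 / 21.
```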
4.4. Multiple Regression and Forecasting
Multi-dimensional scaling and cluster analysis do not require a dependent variable, since all variables are treated as equal, whereas regression analysis does. The choice of this variable is somewhat controversial. Generally, GDP per capita PPP is the dependent variable in determining countries’ welfare. Some researchers suggest taking Average Life Expectancy as a dependent variable when conducting such an analysis. Another approach to assessing the wealth of a society is expressed by the phrase “The wealth of the state is determined by the availability of free time for its citizens” (Karl Marx) [37]. Our study follows the traditional approach, choosing GDP per capita PPP as the dependent variable. The rest of the variables are taken as independent. Multiple regression is usually performed using a linear model.
The Durbin–Watson criterion detects autocorrelation of the regression residuals, an undesirable phenomenon. This criterion takes values in the range [0; 4], and a value of 2 indicates the absence of autocorrelation. In our case, the Durbin–Watson test takes a value of 2.208 (Table 5), which is close enough to 2 and indicates no adverse effect of autocorrelation on the regression analysis results. The residual analysis also indicates a favourable situation for performing multiple regression analysis.
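The Durbin–Watson statistic itself is simple to compute from the ordered regression residuals: it is the sum of squared successive differences divided by the sum of squared residuals. The sketch below uses hypothetical toy residuals to show the two ends of the behaviour; the function name `durbin_watson` is an illustrative choice.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Toy residuals (hypothetical): alternating signs push the value towards 4,
# identical residuals push it towards 0; uncorrelated residuals give about 2.
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
dw_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])
```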
The value of the R-squared coefficient, 0.913 (Table 5), is close to one, and the significance of 0.000… (Table 6) is much lower than the threshold value of 0.05; both indicate high adequacy of the model to the initial data.
The results of the regression analysis are shown in Table 7. Of particular interest are the values of the standardised coefficients since they are reduced to the same scale. In this case, the strength of the influence of independent variables on the dependent variable GDP per capita PPP can be compared.
First of all, we note the variables whose standardised coefficients are greater than 0.1 in absolute value. Thus, we highlight the most significant independent variables that affect the dependent variable. Let us write them in order of decreasing strength of influence on the dependent variable:
Personal Income Tax Rate (positive impact);
Net External Debt (negative impact);
Saving Rate (positive impact);
Trade Balance (positive impact);
Corporate Tax Rate (negative impact);
Current Account to GDP (negative impact);
Interest Rate (negative impact);
Retirement Age Women (positive impact);
Unemployment Rate (negative impact).
The results of the study show that an increase in the value of the dependent variable in a certain interval is possible due to an increase in the Personal Income Tax Rate, Saving Rate, Trade Balance, a decrease in Net External Debt, Corporate Tax Rate, Current Account to GDP, Interest Rate, and so on.
It is important to note that the results of the performed regression analysis show not only the order in which independent variables follow in terms of the strength of their influence on the dependent variable, but also how many times the strength of the influence of one variable is greater than the strength of the influence of the other. For example, Table 7 shows that Net External Debt has a 1.6 times greater effect on the dependent variable than Trade Balance.
Thus, the regression analysis results allow us to identify the essential variables to focus on in the first place in solving the problem of increasing GDP per capita PPP.
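Standardised coefficients of the kind compared above can be obtained from an ordinary least-squares fit by rescaling each slope with the ratio of standard deviations. The following sketch assumes a plain linear model with an intercept; the function name `standardised_betas` and the toy data are hypothetical and unrelated to the values in Table 7.

```python
import numpy as np

def standardised_betas(X, y):
    """OLS slopes rescaled to standard-deviation units: b_j * sd(x_j) / sd(y)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])        # intercept + predictors
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)     # least-squares fit
    slopes = coef[1:]                                     # drop the intercept
    return slopes * X.std(axis=0, ddof=1) / y.std(ddof=1)

# Toy data (hypothetical): y = 3*x1 - x2 + 5 exactly.
X_toy = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]], dtype=float)
y_toy = 3 * X_toy[:, 0] - X_toy[:, 1] + 5
betas = standardised_betas(X_toy, y_toy)
# How many times stronger the first predictor is than the second.
ratio = abs(betas[0]) / abs(betas[1])
```

The ratio of absolute standardised coefficients is the quantity behind statements such as "1.6 times greater effect"; it is meaningful only because both coefficients are expressed on the same scale.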
It is interesting to note that the results obtained show a positive effect of the transfer of taxes from companies to individuals. At the same time, of course, social priorities come to the fore, but these issues are beyond the scope of this study.
The regression analysis results show that the convergence of the retirement ages for men and women has a positive effect on strengthening the economy. Note also that in many countries, the retirement age for men is higher than that for women. In some countries, such as Luxembourg, the values of these variables are the same.
We can also note the relatively low influence of the Inflation Rate and Government Debt to GDP variables on the value of the dependent variable. The reasons for this can be identified in subsequent studies.
The constructed regression model allows us to make predictions. For the same values of the independent variables as in Section 4.2 (GDP per capita PPP is excluded because it is the dependent variable here), the predicted value of GDP per capita PPP is USD 38,053. For Slovakia, multiple regression gives USD 31,101, while the real value of the dependent variable for this country is USD 30,330, a relative error of about 2.5%. This indicates good agreement between the multiple regression results and the data.
4.5. Factor Analysis
Factor analysis allows compressing variables to reduce their number without significant loss of initial information [38]. Factor analysis is based on identifying groups of variables. Variables in each group are correlated; therefore, they express some common essence. At the same time, the variables in different groups are weakly correlated with each other. Factor analysis makes it possible to express each group as a single latent variable, thereby dramatically reducing the number of variables for further analysis. In our case, we use a procedure called principal component analysis.
The applicability of factor analysis to our data is confirmed by the Kaiser–Meyer–Olkin measure, which equals 0.616 and exceeds the threshold value of 0.5, and by Bartlett's test of sphericity, whose significance is below the threshold value of 0.05 (
Table 8).
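For readers who wish to reproduce these two checks outside SPSS, the following is a minimal sketch based on the textbook definitions of the Kaiser–Meyer–Olkin measure and Bartlett's test of sphericity. It assumes NumPy and SciPy are available. The function name kmo_and_bartlett and the synthetic data are illustrative assumptions, not the data set of this study.

```python
import numpy as np
from scipy.stats import chi2


def kmo_and_bartlett(data):
    """Return (KMO measure, Bartlett chi-square statistic, p-value).

    data: array of shape (n_observations, n_variables).
    """
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)

    # Partial correlations follow from the inverse correlation matrix.
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale

    # KMO compares squared correlations with squared partial correlations
    # over the off-diagonal elements only.
    off_diagonal = ~np.eye(p, dtype=bool)
    r2 = np.sum(corr[off_diagonal] ** 2)
    q2 = np.sum(partial[off_diagonal] ** 2)
    kmo = r2 / (r2 + q2)

    # Bartlett's test: H0 states that the correlation matrix is the identity.
    _, log_det = np.linalg.slogdet(corr)
    statistic = -(n - 1 - (2 * p + 5) / 6) * log_det
    dof = p * (p - 1) / 2
    p_value = chi2.sf(statistic, dof)
    return kmo, statistic, p_value


# Synthetic example (NOT the study data): five indicators sharing one factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 1))
data = latent + 0.5 * rng.normal(size=(30, 5))

kmo, statistic, p_value = kmo_and_bartlett(data)
print(f"KMO = {kmo:.3f}, Bartlett chi2 = {statistic:.1f}, p = {p_value:.3g}")
```

Factor analysis is usually considered admissible when the KMO measure exceeds 0.5 and the Bartlett p-value is below 0.05, which are the thresholds used in the text.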
The optimal number of principal components (latent variables) can be determined from the scree plot (
Figure 8). The plot shows that three principal components can be distinguished, the corresponding points of which lie on its steep slope. We use the Varimax rotation method to make the selected components easier to interpret.
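The two steps described above, reading the eigenvalues for a scree plot and rotating the retained loadings, can be sketched as follows. This is an illustration under stated assumptions rather than the SPSS procedure used in the study: the data are synthetic, the helper names are hypothetical, and the rotation is the commonly used SVD-based Varimax iteration.

```python
import numpy as np


def principal_components(data, n_components):
    """Return eigenvalues (descending) and unrotated loadings."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Loadings are eigenvectors scaled by the square root of the eigenvalue.
    loadings = eigenvectors[:, :n_components] * np.sqrt(eigenvalues[:n_components])
    return eigenvalues, loadings


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal Varimax rotation of a (variables x components) matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation


# Synthetic example (NOT the study data): 40 observations of 6 indicators.
rng = np.random.default_rng(0)
factors = rng.normal(size=(40, 3))
mixing = rng.normal(size=(3, 6))
data = factors @ mixing + 0.5 * rng.normal(size=(40, 6))

eigenvalues, loadings = principal_components(data, n_components=3)
rotated = varimax(loadings)

# The eigenvalues, plotted against their index, form the scree plot.
print("Eigenvalues:", np.round(eigenvalues, 3))
print("Explained variance share:", np.round(eigenvalues / eigenvalues.sum(), 3))
```

Because the rotation is orthogonal, it redistributes variance among the retained components without changing the communality of any variable.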
The variable GDP per capita PPP is excluded from the factor analysis. The results are shown in
Table 9.
As for the economic meaning of the identified principal components, the first component is associated with the financial position of the country and the balance of its economic relations with other countries. The second component is characterised mainly by the inflation and bank deposit rates, and the third mainly by domestic liabilities and ratios. When determining the economic essence of the selected components, it is essential to consider the signs of the variables.
With the help of SPSS software and an understanding of the economic meaning of each selected component, it is possible to find the values of the components for each country and, in further analysis, to reduce the number of independent variables sharply, from 15 to only 3. For example, multiple regression will take the form:
The standardised coefficients for the explanatory variables Factor1, Factor2, and Factor3 are 0.771, −0.261, and 0.124, respectively. Whereas the components Factor1 and Factor3 positively affect the dependent variable and therefore play a positive role in strengthening the country’s economy, the Factor2 component has a negative effect. In this case, R-squared is 0.678.
A general diagram of the positions of European countries relative to each other in terms of their level of economic development can also be constructed using factor analysis. To do this with the principal component method, we select two latent integrated variables, Factor12 and Factor22 (
Table 10), save the normalised values of these variables and use them as coordinates for a two-dimensional scatter plot (
Figure 9). The signs of the factor loadings (
Table 10) indicate the direction of action of the initial macroeconomic indicators.
A comparison of the results of multi-dimensional scaling, cluster analysis and the diagram based on factor analysis shows good agreement between them. A slight difference in results is inevitable since the methods are based on different approaches and provide approximations to reality. Multi-dimensional scaling is based on calculating the distances between objects in multivariate space, while factor analysis is based on the study of correlations between variables.
If we select the two principal components, Factor12 and Factor22, the regression equation becomes:
In this case, R-squared = 0.676, and the values of the standardised coefficients for the Factor12 and Factor22 variables are 0.762 and 0.308, respectively. Both Factor12 and Factor22 positively affect the dependent variable GDP per capita PPP.
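The two factor regressions reported in this section can be explored in standardised form with the sketch below. Only the standardised coefficients quoted in the text are used; the function names and the factor scores of the example country are hypothetical. The consistency check assumes that the factor scores are mutually uncorrelated, which holds for principal component scores under an orthogonal rotation; in that case R-squared equals the sum of the squared standardised coefficients.

```python
# Standardised coefficients quoted in the text.
MODELS = {
    "three factors": {"Factor1": 0.771, "Factor2": -0.261, "Factor3": 0.124},
    "two factors": {"Factor12": 0.762, "Factor22": 0.308},
}


def standardised_prediction(coefficients, scores):
    """Predicted z-score of GDP per capita PPP from standardised factor scores."""
    return sum(beta * scores[name] for name, beta in coefficients.items())


def r_squared_if_uncorrelated(coefficients):
    """R-squared implied by the coefficients when regressors are uncorrelated."""
    return sum(beta ** 2 for beta in coefficients.values())


r2_three = r_squared_if_uncorrelated(MODELS["three factors"])
r2_two = r_squared_if_uncorrelated(MODELS["two factors"])
print(f"Implied R-squared, three factors: {r2_three:.3f}")
print(f"Implied R-squared, two factors:   {r2_two:.3f}")

# Hypothetical factor scores for an illustrative country (NOT study data).
example_scores = {"Factor1": 1.0, "Factor2": 0.5, "Factor3": -0.2}
z_gdp = standardised_prediction(MODELS["three factors"], example_scores)
print(f"Predicted standardised GDP per capita PPP: {z_gdp:.4f}")
```

The implied values, 0.678 and 0.676 after rounding, coincide with the R-squared values reported for the two models.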
In the next section, the factor analysis results are used to construct an economic model using fuzzy modelling methods.
In 2020, the macroeconomic indicators were undoubtedly affected by the pandemic. Its impact on GDP per capita PPP is shown in
Figure 10. In addition, we took the pandemic into account in our multivariate analysis by introducing two variables, GDP Annual Growth Rate Previous and GDP Annual Growth Rate Last, because the values of the variables changed during the pandemic. A detailed study of the impact of the pandemic requires a separate investigation, which we plan to carry out in a subsequent paper.
4.6. Fuzzy Method
Fuzzy modelling [
39] is performed with the Fuzzy TECH 5.81 d Professional software package. As in the regression analysis, the output variable out is GDP per capita PPP in USD. We divide the values of this variable within the range (10,000; 120,000) into five gradations: very low, low, medium, high, and very high.
As input variables, we take the three principal components obtained in the previous section by factor analysis: Factor1 (var1), Factor2 (var2), and Factor3 (var3). The values of these variables range from −3 to 3 since they are standardised to a distribution with parameters (0; 1). We divide each of these variables into three gradations: low, medium, and high.
The block diagram of the fuzzy logic model is shown in
Figure 11.
The distribution law graph for the variable var1 is shown in
Figure 12. The variables var2 and var3 have the same distribution laws. The variable out has five gradations.
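To illustrate how a standardised input is split into the three gradations, the sketch below fuzzifies a value on the interval from −3 to 3. The linear membership shapes and the function name fuzzify are our assumptions for illustration only; the actual shapes used in the model are those shown in the figure.

```python
def fuzzify(x: float) -> dict:
    """Degrees of membership of x in the gradations low, medium and high.

    Assumed shapes (hypothetical): "low" falls linearly from 1 at -3 to 0 at 0,
    "medium" is a triangle peaking at 0, and "high" rises from 0 at 0 to 1 at 3.
    """
    x = max(-3.0, min(3.0, x))  # clip to the range of the input variables
    return {
        "low": max(0.0, -x / 3),
        "medium": 1 - abs(x) / 3,
        "high": max(0.0, x / 3),
    }


memberships = fuzzify(1.5)
print(memberships)  # {'low': 0.0, 'medium': 0.5, 'high': 0.5}
```

With these assumed shapes, the three degrees of membership always sum to one, so every input value is fully covered by the gradations.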
The relationships among the dependent and independent variables are shown in
Table 11.
In the case of two independent variables, var1 and var2, which are the principal components Factor12 and Factor22, the relationship between the dependent and independent variables is as follows (
Table 12).
Figure 16 shows 3D graphs of the function out (var1, var2) in the case of only two independent variables.
The created fuzzy logic models make it possible to predict the value of the output variable for a particular country.
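How such a prediction is obtained can be sketched for the two-input case. Everything specific in the code below is an assumption made for illustration: the linear membership functions, the rule base (the output gradation rises with both inputs, in line with the positive coefficients of Factor12 and Factor22), the evenly spaced centres of the five output gradations within (10,000; 120,000), and the weighted-average defuzzification. The rule tables and settings of the actual model are those given in the tables above and are not reproduced here.

```python
GRADES = ["low", "medium", "high"]
OUT_GRADES = ["very low", "low", "medium", "high", "very high"]

# Assumed centres of the output gradations, evenly spaced in USD.
OUT_CENTRES = dict(zip(OUT_GRADES, [10_000, 37_500, 65_000, 92_500, 120_000]))


def fuzzify(x: float) -> dict:
    """Assumed linear membership functions on [-3, 3] (hypothetical)."""
    x = max(-3.0, min(3.0, x))
    return {"low": max(0.0, -x / 3), "medium": 1 - abs(x) / 3, "high": max(0.0, x / 3)}


# Hypothetical rule base: the output gradation index is the sum of the
# input gradation indices, e.g. (medium, high) -> "high".
RULES = {
    (g1, g2): OUT_GRADES[i + j]
    for i, g1 in enumerate(GRADES)
    for j, g2 in enumerate(GRADES)
}


def predict(var1: float, var2: float) -> float:
    """Predicted GDP per capita PPP (USD) for two standardised inputs."""
    mu1, mu2 = fuzzify(var1), fuzzify(var2)
    strength = {grade: 0.0 for grade in OUT_GRADES}
    for (g1, g2), out_grade in RULES.items():
        firing = min(mu1[g1], mu2[g2])  # AND of the two conditions
        strength[out_grade] = max(strength[out_grade], firing)
    total = sum(strength.values())
    # Defuzzify as the average of the centres weighted by rule strength.
    return sum(OUT_CENTRES[g] * s for g, s in strength.items()) / total


print(predict(0.0, 0.0))
print(predict(1.0, 1.0))
```

A country whose two factor scores are both zero falls exactly into the medium gradation, while higher scores move the prediction towards the high and very high gradations.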
Some other aspects of the investigations are given in the papers [
40,
41].