1. Introduction
Tubular steel joints (TSJs), as seen in Figure 1, are common and usually the main components of steel structures used offshore, in bridges, in buildings, and in other applications [1,2,3,4]. Whether applied offshore or onshore, these TSJs are connected by welding, which is sensitive to the type of load, especially cyclic loads such as wind and waves [5,6,7], which can cause fatigue damage. Thus, to ensure the structural integrity of these structures, the joints must be designed considering fatigue. The stress method [4,7,8,9] is most commonly applied to calculate the fatigue resistance of TSJs. For this purpose, the stress concentration factor (SCF) [10,11,12,13,14,15] is a key parameter. Joint fatigue is commonly predicted through empirical and numerical studies [16,17,18]. As an example, Bao et al. [19] studied three-planar Y-joints under axial loading and proposed equations to calculate SCFs. Rahmanli and Becque [20] studied two-planar KT-joints under balanced axial loading with the help of SolidWorks. Their results revealed that, among the different non-dimensional parameters, γ and τ are critical for SCFs: increasing them increases the SCFs. In addition, the maximum SCF on the chord member occurred in the sectors around 180° and 225° < φ < 270°. Ahmadi et al. and Zavvar et al. [21,22] modeled multiplanar KT-joints in ANSYS APDL 2024 to derive equations for calculating SCF values. They proposed a set of equations with high R² values, which indicates their accuracy. Kuang et al. [23] developed formulae to calculate SCFs in several types of joints, including T/Y-, K-, and KT-joints. Because shell elements were used in their approach, the authors did not model the weld profile.
Several methods are used to reinforce steel structures and joints, such as internal and external steel ring plates [15,24,25,26,27,28,29,30,31,32], fiber-reinforced polymer (FRP) [33,34,35,36,37,38,39], doubler plates [40,41,42,43,44,45,46,47], and concrete and grout [48,49,50]. Regarding FRP, Zavvar et al. and Hosseini et al. [51,52] investigated KT-joints under axial loads and bending moments. More than 3000 finite element (FE) models were created and analyzed in ABAQUS to derive formulae for calculating SCFs. Zavvar et al. [53] investigated uniplanar DKT-joints to calculate the maximum SCF using FE modeling. Solid elements in ANSYS were used to create and analyze the models, and SPSS was used to derive the formulae. The proposed formulae (with a high R², i.e., >0.9) were considered accurate enough to calculate the maximum SCF. Zhao et al. [54] studied rectangular-section joints reinforced with carbon-fiber-reinforced polymer (CFRP). The authors concluded that CFRP can increase the web crippling capacity.
In this research, finite element (FE) models of uniplanar KT-joints reinforced with FRP (previously verified in [16,55,56,57]) are used to extract data for calculating the maximum SCF. ANSYS v24 was used to generate the FE models and analyze them under 16 axial loading conditions with different non-dimensional parameters (Table 1). Then, among all loading conditions, the four leading to the maximum SCFs were selected (details in Section 2). In the models under these four loading conditions, FRP with five types of material was used. A collection of sample databases was generated from the results of the FE investigation. SPSS and MATLAB were used to apply the statistical methods to the obtained data.
2. Models Characterization
In this section, the properties of the models and the modeling process are described. The models were created with different dimensions (e.g., diameter, length, and thickness). Combinations of those dimensions define the well-known dimensionless parameters listed in Table 1. The selected specifications cover a diverse range of tubular connectors that are often used in marine structures [58]. SOLID186 and SHELL281 elements were used to model the DKT-joints in ANSYS: SOLID186 for the weld profile and the members, and SHELL281 for the FRP. The main purpose of the FE modeling in ANSYS is to extract the hot-spot stress (HSS) at the weld toe. Therefore, a fine mesh is not needed in all parts of the members, since it would increase the analysis time significantly. A fine mesh is sufficient in the HSS region (following the International Institute of Welding (IIW) [59] recommendations) and in the weld profile. To this end, the sub-zone method is used to control the mesh quality (Figure 2).
In total, 16 possible loading conditions were applied to the uniplanar KT-joints. The four loading conditions that produce the maximum SCFs are presented in Table 2. A mesh convergence study was performed for the number of elements through the chord thickness, and two elements were selected. Additional information is available in the references [12,13,53,60].
To model the weld profile, the AWS [61] recommendations are used. An important parameter for the weld design is the dihedral angle (Ψ), which can be calculated as indicated in Figure 3 [22,50,62].
FRP behaves like a thin sheet, so shell elements are the best option to model it. The members are therefore modeled with solid elements and the FRP with shell elements. To simulate the interaction between the solid members and the FRP, the contact capability of ANSYS provides an accurate method: contact elements (CONTA174) are applied to the outer surfaces of the joint, and target elements (TARGE170) are applied to the FRP. Regarding the length of the selected FRP, previous research [
63] indicated that having an FRP longer than
for the chord members and
for the brace members is not necessary (
Figure 4).
Based on previous studies [51,52], fiber orientations of 90° and 0° were used. Table 3 provides information on the FRP materials. The third scheme was selected [51,52]. Additional information is provided in the references [12,13,53].
SCFs are calculated as follows [9,12,64]:

$$\mathrm{SCF} = \frac{\sigma_{\mathrm{HSS}}}{\sigma_{n}}$$

where $\sigma_{n}$ is the nominal stress. The hot-spot stress (HSS), $\sigma_{\mathrm{HSS}}$, is defined as the largest value of stress around the weld toe. The HSS is determined according to IIW [59]. Hence, the HSS is obtained by extrapolation from two points located at 0.4 T and 1.4 T from the weld toe (Figure 5). The nominal stress for these types of loading patterns is calculated as follows [12]:

$$\sigma_{n} = \frac{F}{A}$$

where $\sigma_{n}$ represents the nominal stress, $F$ is the axial load, and $A$ is the cross-section area of the loaded brace. The stress $\sigma$ is determined as follows:

$$\sigma = 1.4\,\sigma_{1} - 0.4\,\sigma_{2}$$

where the 1st point (0.4 T) is represented by $\sigma_{1}$, and the 2nd point (1.4 T) is indicated by $\sigma_{2}$.
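As a minimal sketch of how these quantities combine, the following Python snippet computes an SCF from two read-out stresses. It assumes that the HSS is obtained by linear extrapolation to the weld toe from the stresses at 0.4 T and 1.4 T; the function names and all numerical values are hypothetical and only illustrate the arithmetic.

```python
def hot_spot_stress(sigma_1: float, sigma_2: float) -> float:
    """Linear extrapolation to the weld toe (distance 0) from the
    stresses read at 0.4T (sigma_1) and 1.4T (sigma_2). Assumed form."""
    return 1.4 * sigma_1 - 0.4 * sigma_2


def nominal_stress(force: float, area: float) -> float:
    """Nominal stress in the axially loaded brace: F / A."""
    return force / area


def scf(sigma_1: float, sigma_2: float, force: float, area: float) -> float:
    """Stress concentration factor: hot-spot stress over nominal stress."""
    return hot_spot_stress(sigma_1, sigma_2) / nominal_stress(force, area)


# Hypothetical values (consistent units assumed, e.g. N and mm^2 -> MPa).
example_scf = scf(sigma_1=50.0, sigma_2=40.0, force=1000.0, area=100.0)
```

With these made-up inputs the extrapolated stress is 54 and the nominal stress is 10, so the resulting SCF is about 5.4.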
3. Statistical Procedures
This section describes the statistical analysis of the FRP-retrofitted two-planar KT-joint under axial (AX) loading. The models were generated considering seven variables, namely the traditional non-dimensional joint parameters and the FRP parameters (β, γ, τ, θ, η, and ξ) and the resulting SCFs. The processing of the sample data involves a quantitative statistical analysis aimed at determining global indicators across six dimensions. This procedure allowed for the creation of five sample databases containing the SCF values for this type of joint. The statistical analysis was then conducted using SPSS v29 and MATLAB v24. Finally, the samples were subjected to the detailed statistical procedures and methods described in this section.
3.1. Applied Methods
The statistical methods applied to the created datasets focus on six quantitative indicators. In the first step, univariate analysis, each parameter is described individually using graphical representations such as box plots. In the second step, bivariate analysis, Pearson’s correlation coefficient is used to analyze the linear relationships between the variables. Finally, in the multivariate analysis, two approaches, multiple regression [65,66,67] and hierarchical classification (HC) [68,69], are used to create classes of data and classes of indicators.
HC, often referred to as Agglomerative Hierarchical Clustering (AHC), is a method used to classify a set of elements by grouping them into a smaller number of classes. This classification is based on the principle that elements within the same class share similarities with each other and exhibit differences from elements in other classes. Unlike some other clustering methods, the number of classes in AHC is not predetermined but is determined as part of the clustering process itself. This iterative approach builds the classes gradually, merging similar objects into clusters until a stopping criterion is met, resulting in a hierarchical structure of clusters.
The HC method involves two critical decisions. The first is the choice of the measure used to compare pairs of the elements to be classified. This choice depends on whether the elements being classified are variables (similarity measures) or individuals (dissimilarity measures). The second is the choice of the criterion used to measure the proximity between two classes. Both choices deserve careful consideration because they can significantly influence the final clustering results.
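The two decisions can be illustrated with a small, naive agglomerative procedure. The sketch below is not the SPSS or MATLAB implementation used in the study; it assumes the Euclidean distance as the comparison measure and the average distance between groups as the proximity criterion, and it runs on four hypothetical two-dimensional points. All function names are illustrative.

```python
import math


def euclidean(a, b):
    """Decision 1: dissimilarity between two individuals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Decision 2: proximity between two classes, taken here as the
    mean pairwise distance between their members."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points, linkage=average_linkage):
    """Merge the two closest classes until one remains.
    Returns the agglomeration schedule: (class_a, class_b, coefficient)."""
    clusters = [[i] for i in range(len(points))]
    schedule = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        schedule.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return schedule


# Hypothetical data: two tight pairs that are far from each other.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
schedule = agglomerate(points)
coefficients = [stage[2] for stage in schedule]
```

The coefficients grow from stage to stage, which is the behavior read off an agglomeration schedule. Swapping `average_linkage` for another criterion changes the merge order without touching the distance function, which shows that the two decisions are independent.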
3.2. Univariate Analysis
The models were analyzed to obtain SCFs, which were then grouped into five samples for further analysis. Samples 1 to 4 comprise the SCFs from the 1st, 2nd, 3rd, and 4th loading conditions, respectively. Sample 5 contains the maximum SCFs over all 16 loading conditions.
Table 4 displays the descriptive measures of the SCFs for each sample. All samples have positive skewness, meaning that the distribution tails extend towards higher values (positive asymmetry). Moreover, the kurtosis of sample 3 surpasses that of the other samples, suggesting a distribution with a more pronounced peak, whereas the kurtosis of sample 4 is the lowest, indicating a less pronounced peak. In this research, the number of considered parameters is 6 (Table 1).
Thus, looking at the 5 samples in
Table 4, sample 1 and sample 5 have higher values: sample 1 has an average of 20.51, a variance of 110.1, a mode of 13.42, a standard deviation of 10.50, and a maximum of 56.76; sample 5 has a median of 18.02. Sample 2 (the second loading condition) has a skewness of 0.85 and the maximum is 34.49.
Figure 6 displays box plots of the distributions of samples 1 to 5. Each box plot shows the minimum, first quartile, median, third quartile, and maximum, which divide the distribution into four intervals, each containing 25 percent of the observations. This set of box plots enables comparisons among the samples within each dimension, as well as comparisons of indicators across dimensions. The samples are concentrated in the lower range of the distribution (positive asymmetry). The box plots of samples 1 and 5 reveal an accentuated positive asymmetry, with a significant concentration between the first quartile and the median. The other samples (2, 3, and 4) are approximately symmetrical. Comparing the samples, samples 1 and 5 exhibit the widest ranges of values, while sample 4 has the narrowest. The sample 4 box plot displays a few lower–moderate outliers.
Figure 6 also shows that sample 5 has the widest range of values overall and that samples 5 and 1 reach higher values than the other samples.
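The descriptive measures and the five-number summary behind a box plot can be sketched as follows. This is an illustration, not the SPSS procedure: it assumes simple moment-based estimators of skewness and kurtosis (non-excess kurtosis, so a normal distribution gives 3), which may differ slightly from bias-corrected values reported by statistical packages. The sample values are hypothetical.

```python
import statistics


def describe(values):
    """Descriptive measures and box-plot summary of one sample."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments (population form) used for skewness and kurtosis.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": mean,
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std": statistics.stdev(values),
        "skewness": m3 / m2 ** 1.5,   # > 0: longer right tail
        "kurtosis": m4 / m2 ** 2,     # non-excess; normal distribution = 3
    }


# Hypothetical right-skewed sample of SCF-like values.
sample = [2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 6.0, 9.0, 14.0]
stats = describe(sample)
```

For this made-up sample the mean exceeds the median and the skewness is positive, which is the pattern described for the SCF samples.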
3.3. The Histograms
To construct a density histogram, the range (R) must be partitioned into several classes. Several rules exist for choosing the number of classes, such as the Sturges [70] and Freedman–Diaconis (FD) [71,72,73,74] rules.
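In this study, the FD [71] rule was used to calculate the number of classes. It is determined as follows:

$$h = \frac{2\,\mathrm{IQR}}{\sqrt[3]{n}}, \qquad k = \frac{R}{h}$$

where $h$ is the class width, $\mathrm{IQR}$ is the interquartile range of the sample, $n$ is the number of observations, and $k$ is the resulting number of classes.
Table 5 indicates the FD values for each sample.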
Figure 7 shows the histograms together with the normal distribution curves. According to the skewness and kurtosis values (Table 5), the histograms exhibit a longer right tail than left tail. The kurtosis values are higher than 3 for four samples, namely 1, 2, 3, and 5 (Table 4), suggesting a leptokurtic distribution, more sharply peaked than the normal distribution. Only for sample 4 is the kurtosis value smaller than 3, indicating a platykurtic distribution.
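A minimal sketch of the Freedman–Diaconis rule is given below. It assumes the usual form of the rule (class width equal to twice the interquartile range divided by the cube root of the sample size) and rounds the number of classes up to an integer; the quartile convention and the sample values are assumptions made for illustration.

```python
import math
import statistics


def freedman_diaconis(values):
    """Return (class width, number of classes) for a histogram."""
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    width = 2.0 * iqr / n ** (1.0 / 3.0)
    data_range = max(values) - min(values)
    classes = math.ceil(data_range / width)
    return width, classes


# Hypothetical sample with one large value stretching the range.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
width, classes = freedman_diaconis(sample)
```

Because the rule is based on the interquartile range rather than the standard deviation, the single large value widens the range (and so adds classes) but does not inflate the class width.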
3.4. Bivariate Analysis
This section addresses the bivariate analysis, which examines the relationship between two variables. Measures of association are very common for this analysis, and one of the most popular is the Pearson correlation coefficient (PCC). Correlation simply measures the linear association between variables without any implication of cause and effect. The sign of the correlation coefficient indicates the direction of the relationship: a positive correlation signifies that the variables vary in the same direction, while a negative correlation signifies that they vary in opposite directions. The Pearson coefficient is defined as follows:

$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

where $x_i$ and $y_i$ are the paired observations of the two variables, $\bar{x}$ and $\bar{y}$ are their means, and $n$ is the number of observations.
Therefore, the Pearson correlation coefficient is a standardized covariance that lies between −1 and +1. Values close to +1 indicate a strong positive linear correlation. In this section, for the sake of brevity, only the results of sample 5 are presented.
In the bivariate analysis, Pearson’s correlation coefficients were determined for the samples and parameters (Table 6 and Table 7). This analysis indicates that the γ values are correlated with the η values in each dimension and also between dimensions. The γ-SCF and τ-SCF correlations are higher than the others. The β-SCF, θ-SCF, and ξ-SCF correlations are weak, while the correlations of η with the other indicators are the lowest.
Table 6 and Table 7 indicate that the parameters γ and τ have the strongest correlation with the SCF and that the parameters γ and ξ are highly correlated with each other. Among all parameters, these three therefore play the most significant roles in predicting the SCF. The lowest correlation is related to θ, suggesting that this parameter has no relevant effect on the structure’s fatigue behavior. Therefore, the thicknesses of the braces and the chord are much more important than the other variables in protecting tubular joints against fatigue.
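The Pearson coefficient used in this bivariate analysis can be computed directly from its definition as a standardized covariance. The sketch below uses small hypothetical vectors, not values from Table 6 or Table 7: one parameter that varies linearly with the SCF and one that shows no linear association with it.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data for illustration only.
tau = [0.4, 0.6, 0.8, 1.0]
theta = [30.0, 60.0, 60.0, 30.0]
scf_values = [5.0, 9.0, 13.0, 17.0]

r_tau = pearson(tau, scf_values)      # perfectly linear, close to +1
r_theta = pearson(theta, scf_values)  # no linear association, close to 0
```

A coefficient near zero only rules out a linear association; a parameter could still influence the SCF nonlinearly, which is worth keeping in mind when a weak correlation is interpreted as a weak effect.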
4. Scatter and P-P Plots
Figure 8 shows the P-P and scatter plots of the samples. P-P plots (Figure 8, left) are typically used in regression analysis to assess the assumption of normality of the residuals. The X-axis represents the cumulative probability of the observed residuals, and the Y-axis represents the cumulative probability expected under a normal distribution. Deviations from the diagonal line indicate departures from normality; large deviations can suggest problems with the model or the need to transform the dependent variable. The scatter plots (Figure 8, right) show the relationship between the regression standardized residuals (X-axis) and the SCF values (Y-axis) for all samples. The joint parameters (β, γ, τ, θ, η, and ξ) are the independent variables, whereas the SCF is the dependent variable. In all samples, the points in the P-P plots (Figure 8, left) closely follow the diagonal line, which indicates that the residuals are approximately normally distributed and that the normality assumption is acceptably met.
The scatter plots (Figure 8, right) show that, for example, in sample 5 the residuals form a fan-shaped pattern, suggesting heteroscedasticity (the variance of the residuals increases with the predicted values). In the lower range of the SCF (up to about 20), the residuals are tightly clustered around zero, but as the SCF increases, their spread increases, forming a distinct pattern. There are clusters of points, particularly in the mid-range of SCF values, indicating that certain ranges of SCF values have more residual variation than others. Outliers can be seen at both ends of the SCF range, with some points far from the main cluster of data.
5. Multiple Regression
In this section, the multiple regression analysis relating the joint geometry parameters, the FRP parameters, and the SCF is explained. Several key quantities are important in multiple regression analysis, such as R² and the adjusted R², which can be calculated as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$

$$R^2_{\mathrm{adj}} = 1 - \frac{(1-R^2)(n-1)}{n-p-1}$$

where $y_i$ represents the actual values of the dependent variable, $\hat{y}_i$ represents the predicted values from the model, and $\bar{y}$ represents the mean of the actual values; $n$ indicates the sample size, and $p$ is the number of predictors.
Table 8 summarizes the regression models of the samples. Samples 1 and 5 both have the highest R² value of 0.903, indicating that these models explain 90.3% of the variance in the SCF. Sample 4 has the lowest standard error of the estimate (2.03349), suggesting that the predictions for this sample are the closest to the actual values. All samples show high R and R² values, suggesting that the models fit the data well. Sample 5 is the most important sample, as it presents the SCFmax in FRP-reinforced two-planar KT-joints subjected to AX loads. Overall, the models perform well across all samples, although the accuracy of the predictions (as measured by the standard error) varies somewhat between samples.
Table 9 provides the analysis of variance (ANOVA) for the five regression models (sample 1 to sample 5). The regression sum of squares (SS) represents the variability explained by the regression model; a higher value indicates that the model explains a larger portion of the total variability. The residual sum of squares represents the variability not explained by the model; a lower value indicates that the model fits the data well. The degrees of freedom (DoFs) are 6 for the regression and 1289 for the residuals. The mean square (MS) is the sum of squares divided by the respective degrees of freedom and is used to calculate the F-statistic. A higher F-value indicates that the model predicts the outcome significantly better than a model with no predictors, and a p-value of <0.001 indicates that the regression model is significant.
Table 10 shows that all models have a p-value (sig.) less than 0.001 and that the predictors collectively have a considerable influence on the SCF. The F-statistics are very high for all samples, further confirming the models’ overall significance. For example, samples 1, 2, and 3 explain a substantial proportion of the variability (F = 1993.656, p < 0.001 and F = 1573.234, p < 0.001). Samples 3 and 4 are significant with F-values of 1080.158 (p < 0.001) and 752.596 (p < 0.001). The consistently high F-values and low p-values across all samples indicate that the regression models are effective in explaining the variability in the dependent variable for each sample.
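The link between the sums of squares, the degrees of freedom, R², the adjusted R², and the F-statistic can be sketched in a few lines. The helper below is illustrative and assumes a least-squares model with an intercept, so that the regression sum of squares equals the total minus the residual sum of squares. The small data set is hypothetical; only the check of the residual degrees of freedom uses numbers quoted in the text (n = 1296, p = 6).

```python
def regression_summary(y, y_hat, p):
    """Goodness-of-fit measures from observed and predicted values.
    p is the number of predictors (the intercept is not counted)."""
    n = len(y)
    y_bar = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    ss_reg = ss_tot - ss_res  # holds for least squares with an intercept
    df_reg, df_res = p, n - p - 1
    r2 = 1.0 - ss_res / ss_tot
    return {
        "r2": r2,
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / df_res,
        "df_reg": df_reg,
        "df_res": df_res,
        "f": (ss_reg / df_reg) / (ss_res / df_res),
    }


# Hypothetical observations and their least-squares fit on x = 1..5
# (fitted line: y_hat = 0.09 + 1.97 * x).
y = [2.2, 3.8, 6.1, 7.9, 10.0]
y_hat = [2.06, 4.03, 6.00, 7.97, 9.94]
summary = regression_summary(y, y_hat, p=1)

# Residual degrees of freedom for the sizes quoted in the text.
residual_dof = 1296 - 6 - 1
```

The last line reproduces the 1289 residual degrees of freedom reported in the ANOVA description, which follows from 1296 observations and six predictors.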
The standardized coefficients are shown in Table 10. Standardized coefficients help to evaluate the relative effect of each predictor on the SCF. All the p-values are less than 0.001, indicating that all predictors are statistically significant at the 0.05 level. The variable τ has the highest standardized coefficient, suggesting that it has the strongest effect on the dependent variable, whereas β has a negative influence. For example, in sample 1, the coefficients of β, η, and ξ are negative, indicating a negative relationship with the dependent variable; τ has a high positive value, indicating a strong positive relationship; and γ and θ are positive. Based on the Beta values in Table 10, the two parameters γ and τ have considerable effects on the SCF. The lowest effect belongs to the inclination angle (θ), whose coefficient is close to zero, especially in samples 2, 3, and 4.
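A standardized (Beta) coefficient rescales an unstandardized regression coefficient by the ratio of the standard deviations of the predictor and the dependent variable, which makes predictors measured in different units comparable. The sketch below shows this conversion on hypothetical data with a single predictor, where the standardized coefficient coincides with the Pearson correlation; the function name and values are illustrative.

```python
import statistics


def standardized_coefficient(b, x, y):
    """Convert an unstandardized slope b into a standardized (Beta)
    coefficient: b * s_x / s_y."""
    return b * statistics.stdev(x) / statistics.stdev(y)


# Hypothetical predictor, response, and least-squares slope.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 3.8, 6.1, 7.9, 10.0]
slope = 1.97  # least-squares slope of y on x for these values

beta = standardized_coefficient(slope, x, y)
```

With several predictors the same rescaling applies to each coefficient separately, so a Beta close to zero (as reported for θ) indicates a small change in the SCF, in standard-deviation units, per standard deviation of that predictor.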
6. Hierarchical and Non-Hierarchical Classification
Clustering is important in multivariate statistical analysis because it allows patterns to be identified that may not be readily apparent from the raw data and similar data points to be grouped together, which reduces the dimensionality of the data and makes them easier to interpret and visualize. In this case, given the large dataset available, with 1296 observations, this classification is essential for the multivariate analysis of these data.
The classification of observations into clusters is, in general, more rigorous in non-hierarchical methods; however, in a cluster analysis it is advisable to start with hierarchical methods for exploration and then proceed with a non-hierarchical method, in this case k-means, to refine and interpret the cluster solution. In this work, the hierarchical method was not fully successful because the size of the dataset made it impossible to apply the method to the entire set; it was applied only to a small part, to illustrate the method. The k-means method could be applied to the entire dataset, resulting in a very useful analysis.
A good distance measure is one that results in a clear separation of groups in the dendrogram, with similar samples forming a single branch and different samples forming separate branches. The Euclidean distance was used (the comparison criterion is known to have more effect on the classification than the distance measure), and the results are presented in the dendrograms (Figure 9 and Figure A1). This type of graph makes it easy to understand the connections made between individuals and to identify those that are most similar, which constitutes the final result of this hierarchical classification.
In this case, the first observations show a peculiar behavior: the clusters contain pairs with indices very close to each other. For the sake of brevity, only data from sample 5 are represented. When comparing the dendrograms resulting from the four comparison criteria, only the two formed via Ward and single linkage present a tree that is quite different from the others. Therefore, the average linkage between groups was used with the Euclidean distance measure. The proximity matrix (Table 11) reveals the dissimilarity between the cases, while the agglomeration scheme indicates the order in which individuals are aggregated into clusters. The first cluster to be formed contains the individuals with indices 1153 and 1156, which have the smallest average distance between them. In the second step, a cluster is formed with individuals 901 and 904, and so on.
The dendrogram was omitted due to its length. The “Coefficients” column contains the proximities shown in Table 11. In the initial stages, clusters are combined at very low coefficients (0.1), indicating that they are very similar. As the process continues, the coefficients increase, reflecting the growing dissimilarity between the clusters being combined. For example, by stage 1294, the coefficient is 17.291, and by stage 1295, it increases significantly to 29.821. The last stages (1293–1295) combine larger clusters with significantly higher coefficients, which suggests that the clusters merged in the final stages are quite distinct from one another.
Table 11 thus provides a detailed view of the agglomerative clustering process, from initial stages with low dissimilarity to final stages in which the merged clusters are significantly different.
It can be seen in
Table 11 that most of the combined clusters were the result of close index pairs, with a few exceptions. This is due to the previously mentioned fact that data are very similar from one time to the next, and therefore tend to be grouped together.
(a) Non-hierarchical cluster (NHC) grouping
NHC grouping approaches are intended to group individuals, or observations, exclusively into a set of clusters whose number is defined by the analyst. This is very useful when working with very large datasets. The method used for this classification is k-means, which starts from a previously defined number of classes. In this case, the number of classes is that resulting from the hierarchical classification presented previously and the respective cutoff of the dendrogram in k classes, that is, 16 classes. Each class is assigned an individual that functions as the center of the respective class.
Table A1 presents the average of each variable in each of the 10 clusters.
The analysis carried out with SPSS, using 10 clusters and 10 iterations, proved to be reasonably appropriate for the purpose of this work. However, for comparison, a second analysis was carried out in MATLAB, which allows for more iterations without compromising the computer’s memory, as happens with SPSS. This analysis used the same clusters but 300 iterations, as seen in
Table 12.
Table 13 shows the variation in the center of the clusters at each step of the iteration. It would be possible to increase the number of iterations, but this requires a high-performance computer. The parameters like
β,
τ, and
γ have more uniform values across clusters, suggesting these are more consistent features. Other parameters like
η,
ξ, and SCF show more variability, indicating these are more distinctive features that differentiate the initial clusters. The values show the initial conditions or centers from which the clustering process starts. Some variables have a wide range (e.g., SCF from 5.12 to 56.76), indicating heterogeneity among initial clusters. Others are more constrained (e.g.,
β between 0.4 and 0.7), suggesting some homogeneity.
Table 14 shows the number of cases assigned to each of the 16 clusters after the initial clustering process. All 1296 cases are valid and none are missing, which indicates the completeness of the dataset used for clustering. The number of cases per cluster varies considerably, from 14 cases (Cluster 5, the least populous) to 147 cases (Cluster 10, the most populous), indicating a diverse spread of data across clusters. Clusters with a large number of cases (e.g., Cluster 10 and Cluster 9) may represent more common patterns or groupings within the data. Clusters with a small number of cases (e.g., Cluster 5 and Cluster 6) might represent outliers or less common groupings, which could be significant depending on the context. The presence of clusters of moderate size (e.g., Clusters 1, 2, 3, 4, 7, 8, 11, 12, 13, 14, and 15) suggests a balance between common and less common patterns.
It can be seen from Table 14 that θ, τ, γ, β, η, and ξ represent the averages of the respective features within each cluster. For example, for cluster 0, the average θ is 0.5236, τ is 0.4, etc. The SCF column shows the average SCF for each cluster; for instance, cluster 0 has an average SCF of 7.9351. The SCF values vary significantly across clusters, indicating that this feature plays a crucial role in differentiating the clusters: cluster 4 has the highest SCF value (52.8629), while cluster 0 has the lowest (7.9351). The centroids summarize each cluster’s central tendency and help to understand the characteristics of, and differences between, the clusters.
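The k-means procedure described in this section (fixed number of classes, initial centers, iterative reassignment, final centroids and cluster sizes) can be sketched as follows. This is a plain illustration and not the SPSS or MATLAB routine used in the study: the initial centers are passed in explicitly, the iteration limit of 300 simply mirrors the number quoted for the MATLAB run, and the two-dimensional points are hypothetical.

```python
import math


def kmeans(points, initial_centers, max_iter=300):
    """Basic k-means: returns (centers, labels)."""
    centers = [tuple(c) for c in initial_centers]

    def nearest(p):
        # Index of the center closest to point p (Euclidean distance).
        return min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))

    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p)].append(p)
        # New center = mean of the members; an empty class keeps its center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    labels = [nearest(p) for p in points]
    return centers, labels


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, labels = kmeans(points, initial_centers=[(1.0, 1.0), (8.0, 8.0)])
sizes = [labels.count(k) for k in range(len(centers))]
```

The returned `centers` play the role of the final cluster centroids and `sizes` the number of cases per cluster. In practice the variables would usually be standardized first, since a variable with a wide range (such as the SCF) otherwise dominates the Euclidean distance.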
7. Conclusions
A total of 5184 analyses were carried out on two-planar KT-joints under 16 AX loadings, and 1296 models were selected and analyzed under the four loading conditions that created the maximum stress. In the first stage, an FE analysis was conducted. A univariate analysis was then carried out on the descriptive statistics of each variable, providing measures of location and dispersion such as the mean, median, and standard deviation, as well as histograms to verify the distribution of the values and box plots to identify possible outliers and the distribution of quartiles. A bivariate analysis followed, in which the correlations between the variables were identified using the Pearson coefficient, and multiple regression was used to verify the degree of linearity between the variables.
Among three angles, including 0°, ±45°, and 90°, for FRP orientation, the effective fiber orientations are 90° and 0° in the chord, while FRP orientations on the brace have no effects on chord SCFs. The highest and lowest SCFs were 56.76 and 0.012, respectively. The SCFmax was located at the saddle point of the central brace under the 1st loading condition.
In the multivariate analysis stage, the analysis of the main components was first carried out, which made it possible to reduce the dataset to two variables containing more than 80% of the initial information. This proved to be an efficient approach to reducing size in this case: one variable housed the direction variables, and another housed the height and period variables.
Bivariate analysis shows that θ has the lowest correlation with the SCF, while τ and γ have the highest. This suggests that θ has little effect on the structure’s fatigue behavior, whereas the thicknesses of the braces and the chord are much more important for the fatigue of tubular joints.
Hierarchical and non-hierarchical classification analyses of the observations were carried out. The first proved to be incapable of being carried out on such an extensive set of data, compromising the computer’s memory, but based on knowledge of the methodology, dendrograms and tables were created using a smaller subset of data. The non-hierarchical approach, using the k-means clustering technique, proved capable of being applied to a large dataset and to be quite efficient in grouping representative SCF samples.
Multivariate data analysis provides a theoretical and practical framework for studying complex data by examining relationships between multiple variables simultaneously, by reducing the number of variables