1. Introduction
Loads from the superstructure are transferred to the soil, and soil is also used as a construction material; however, soils are not always suitable for these functions. For example, a soil may have inadequate bearing capacity, high compressibility, low workability, or high swelling potential.
Soils that increase in volume as their water content increases and shrink as it decreases are called swelling soils [1]. Wet, swollen soils may even lose their shear strength, while dry soils shrink and leave large voids. In soils with high plasticity and high swelling potential, fluctuations in water content cause large volume changes, and structures built on these soils are damaged by the resulting swelling and shrinkage. Swelling soils can be improved with lime (L), fly ash (FA), and similar additives. L and FA are frequently used as chemical improvement additives; however, their effect on extremely or very highly plastic clays is not clear.
In such cases, compacted fill can be used, or the soil can be improved. Soil improvement can be achieved by mechanical or chemical methods and is more economical than compacted filling. Compaction is one of the most commonly used methods, but it is not used for improving fine-grained cohesive soils; in this case, the soil can be improved by chemical methods. Soils can be improved in situ by mixing them with hydrated L or quicklime of specified physical and chemical properties, and they can be stabilized using building materials such as L and cement [2]. Many studies have investigated the effect of chemical additives on the swelling properties of plastic clays [3,4]. The effect of L is especially pronounced in swelling cohesive soils with a high plasticity index and high water-holding capacity. The reaction of L with clay decreases plasticity, water-holding capacity, swelling pressure, and swelling potential, and increases workability.
Swelling and shrinkage of the soil adversely affect the superstructure [5]. Volume changes caused by variations in water content can lead to deterioration, buckling, and cracking of structures, roads, and ground floor slabs, and repairing this damage is costly. There are many alternatives for dealing with swelling soils, and advances in equipment, materials, and design methods have made soil improvement more effective and economical [6]. The most preferred options are soil replacement and chemical treatment; chemical treatment is more advantageous than soil replacement because the soil is treated in situ.
Figure 1 shows the mechanical and chemical stabilization methods applied to expansive soils.
Soils are classified according to particle size. From coarse to fine, they are gravel, sand, silt, and clay: particles larger than 5 mm are gravel, particles between 5 mm and 0.074 mm are sand, particles between 0.074 mm and 0.002 mm are silt, and particles smaller than 0.002 mm (2 μm) are clay. The engineering properties of soils depend on many parameters, such as unit weight, void ratio, and clay mineralogy. The physical and mechanical properties of clays and clayey soils, in particular, are directly affected by changes in the microstructure of the clay. Clay is a natural secondary mineral composed of hydrated aluminum and magnesium silicates; a distinctive feature of clay is that it forms mud when mixed with water [8].
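The particle-size limits quoted above can be expressed as a small classification function. The sketch below is a minimal Python illustration, assuming that a diameter lying exactly on a boundary (5 mm, 0.074 mm, or 0.002 mm) belongs to the finer class, a convention the text does not specify; the function names and the mass-weighted fraction helper are hypothetical.

```python
# Size limits (mm) as quoted in the text; boundary values are assigned to the
# finer class, which is an assumption of this sketch.
GRAVEL_MIN_MM = 5.0
SAND_MIN_MM = 0.074
SILT_MIN_MM = 0.002


def classify_particle(diameter_mm: float) -> str:
    """Return the size class of a single particle diameter in millimetres."""
    if diameter_mm <= 0:
        raise ValueError("diameter must be positive")
    if diameter_mm > GRAVEL_MIN_MM:
        return "gravel"
    if diameter_mm > SAND_MIN_MM:
        return "sand"
    if diameter_mm > SILT_MIN_MM:
        return "silt"
    return "clay"


def size_fractions(particles):
    """Mass-weighted fraction of each class from (diameter_mm, mass) pairs."""
    totals = {"gravel": 0.0, "sand": 0.0, "silt": 0.0, "clay": 0.0}
    for diameter_mm, mass in particles:
        totals[classify_particle(diameter_mm)] += mass
    total_mass = sum(totals.values())
    if total_mass == 0:
        raise ValueError("total mass must be positive")
    return {name: mass / total_mass for name, mass in totals.items()}


# Hypothetical sample: 20 g gravel, 30 g sand, 25 g silt, 25 g clay.
fractions = size_fractions([(10.0, 20.0), (1.0, 30.0), (0.01, 25.0), (0.001, 25.0)])
print(fractions)
```

Weighting by mass rather than by particle count mirrors how sieve and hydrometer results are reported as percentages of the sample mass.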
Clay soils behave differently depending on their water content. These soils, which are quite hard and durable when dry, soften and lose their strength when water is added [9]. Depending on their clay mineral content, such soils swell, and their volume may increase significantly when water is added. These changes in the soil mass also change its geotechnical properties and have negative consequences for the structures built on it. To control swelling, the mechanisms that cause clay swelling must be understood. The swelling potential of a clay soil is influenced by soil properties, environmental conditions, and the stress state of the soil [9]. For these soils to support the structures built on them, a suitable improvement method should be determined and applied.
Chemical stabilization comprises processes that change the engineering properties of soils through chemical reactions with various additives (FA, L, etc.) [10]. It is used to stabilize expansive soils. Because L reacts with clay minerals, it is a good option for improving cohesive soils with a high clay content, and there are various methods that use L [11]. L stabilization increases the strength and modulus of elasticity of the soil, reduces the swelling potential and swelling pressure, and increases durability. It also improves field working conditions because it reduces the plasticity of the soil [12]. FA is a waste material and therefore inexpensive [13], and it can be used in construction. For shallow soil improvement, FA is mixed with the soil, and the mixture is then compacted.
Figure 2 shows the treatment process before and after L stabilization.
Traditional stabilization methods may not always achieve the desired results. In soils with high plasticity and high swelling potential, such as those in the study [14] from which this dataset was taken, mechanical stabilization methods in particular may be insufficient.
Static properties of the soils include the index test results (i.e., sieve analyses, Atterberg Limits, void ratio (e0), and specific gravity (Gs)), free swell and swelling pressure characteristics, consolidation properties, and unconfined compression test results. Static properties are used to evaluate the strength (bearing capacity) and compressibility (settlement) characteristics of the subsoils beneath engineering structures. Cyclic properties of the soils comprise the resilient modulus test results for natural soils and for lime-treated soils after several curing periods. The resilient modulus test is essentially a cyclic version of the triaxial compression test. Because the resilient modulus is the ratio of stress to strain under rapidly applied loads similar to those experienced by pavements, using this modulus in design simulates actual traffic loading more accurately.
In this study, the dataset from the PhD thesis [14] of one of the authors, titled “Static and cyclic properties of expansive clays treated with L and FA with special reference to swelling and resilient moduli,” was used. When historical data are available, predictions can be made using machine learning (ML). ML learns from data and is encountered in many aspects of daily life, such as facial recognition systems in phones or recommending music similar to what a user has listened to before.
With recent technological advances, ML is now used in almost all fields, and it can make predictions without human intervention. For example, it is used for cardiac arrest monitoring [15] and dementia prediction [16] in the health sector, for crop yield prediction in the agricultural sector [17], and in many other applications.
Numerous tests are performed to determine the strength, swelling, and compressibility properties of soils stabilized with chemical additives. These tests are time-consuming, costly, and labor-intensive and can also cause environmental pollution; therefore, the aim of this study is to propose a set of ML models that predict the strength, swelling, and compressibility properties of the soil from its engineering index properties: specific gravity, void ratio, sieve analysis (percent retained on the No. 4 sieve (+No.4), percent passing the No. 200 sieve (−No.200), and clay size), and Atterberg Limits (liquid limit (LL), plastic limit (PL), plasticity index (PI), linear shrinkage (Ls), and shrinkage limit (SL)).
Many studies have applied ML in civil engineering, including the prediction of flexible pavement structural capacity [18], prediction of the risk of delay in construction projects [19], prediction of the compressive strength of geopolymer concrete [20], prediction of the axial compression capacity of concrete-filled steel tubular columns [21], optimum design of reinforced concrete (RC) columns [22,23], prediction of the splitting tensile strength of basalt fiber-reinforced concrete [24], optimum dimensioning of post-tensioned concrete cylindrical walls [25], prediction of the carbon emissions of alkali-activated concrete [26], soil classification [27,28], prediction of the cooling load of buildings [29,30], prediction of optimum tuned mass damper (TMD) parameters [31], modeling based on diffusion maps [32], reliability analysis of shallow foundation failure [33], and confinement of concrete cylinders using carbon fabric-reinforced cement mortar [34]. Studies that combine ML with soil improvement using additives are summarized below.
Gajurel et al. [35] used logistic regression (LR), discriminant analysis (DA), k-nearest neighbors (KNN), and support vector machines (SVM) to predict the optimal additive content and to determine the best-performing model. Eyo et al. [36] estimated the unconfined compressive strength (UCS) of stabilized soil using gradient boosting (GB) and achieved a high coefficient of determination (R² = 0.90). Taffese and Abegaz [37] predicted the optimum moisture content (OMC), maximum dry density (MDD), and UCS of modified soil using an optimizable ensemble technique (OEM) and artificial neural networks (ANNs); both models performed similarly. Zeini et al. [38] used a random forest (RF) algorithm to estimate the UCS of geopolymer-stabilized clay soil, and RF predicted UCS with high accuracy (R² = 0.97). Zhang et al. [39] used an extreme gradient boosting model (XGBoost) to predict the compressive strength of cement soil and achieved R² = 0.93. Onyelowe et al. [40] predicted the UCS of cohesive soil stabilized with cement and L using an ensemble-based ML classification technique; GB and KNN achieved the highest accuracy (95%) on a dataset whose inputs were cement (C), lime (Li), LL, PI, OMC, and MDD. Ahmad et al. [41] estimated the UCS of expansive clay soil treated with hydrated L-activated rice husk ash using Gaussian process regression (GPR) and SVM, with hydrated L-activated rice husk ash, LL, PL, PI, OMC, clay activity, and MDD as inputs; both SVM and GPR showed high predictive performance (R² > 0.99). Wan [42] combined the naive Bayes (NB) ML model with metaheuristic algorithms to predict the UCS of soil stabilized with L and cement, and the hybrid model achieved high prediction accuracy (R² = 0.99).
As mentioned above, stabilization methods are either chemical or mechanical: chemical stabilization involves the use of additives, while mechanical stabilization involves physical processes. Determining the value of each parameter under different conditions through separate experiments is costly in terms of time and money. Obtaining the UCS non-soaked, UCS soaked, swelling pressure, swelling percentage, mean resilient modulus (MRM), UCS after the resilient modulus test (RMT), compression index (Cc), and expansion index (Ce) values by conventional methods requires equipment and labor, is costly, and has an environmental impact. In this study, these critical engineering parameters of the stabilization process were instead estimated quickly and efficiently with ML models, which were tested on eight datasets, each designed for a different output, and performed effectively in predicting the relevant output of each dataset.
Purpose of the Research
In this study, a series of ML algorithms was used to predict the strength, swelling, and compressibility properties of the stabilized soil in order to reveal the actual effect of the improvement. The inputs are specific gravity, void ratio, sieve analysis (+No.4, −No.200, clay size), and Atterberg Limits (LL, PL, PI, Ls, and SL), and the outputs are UCS non-soaked, UCS soaked, swelling pressure, swelling percentage, MRM, UCS after RMT, Cc, and Ce.
The PhD thesis [14] from which these data were obtained focused on determining the stabilization effect of L and FA on highly and extremely plastic expansive soils, especially in terms of swelling behavior and resilient modulus properties. The thesis addressed two open questions: whether these additives are sufficiently effective in extremely plastic clays, and how soils stabilized with L and FA swell and shrink. It also investigated the effect of the mineralogical composition of the clay soils and the chemical constituents of the additives on swelling; for this purpose, both the treated soils and the L and FA agents were characterized using different tests. Laboratory tests were conducted to examine the index, swelling, strength, compressibility, and resilient modulus properties of all natural soils and of soil–agent mixtures in different proportions. Strength tests were carried out under both soaked and non-soaked conditions, and the curing effect was studied for durations of 7, 28, 56, and 90 days. Most of the experiments were conducted in the accredited soil mechanics laboratories of Middle East Technical University (METU) in accordance with ASTM standards [43]. RMTs were performed at the technical research department of the General Directorate of Highways, and the equipment was calibrated before the experiments. Unlike other studies, this study predicted the resilient modulus, a dynamic soil property under repeated loading, and obtained successful results.
Figure 3 shows the methodology of this study.
With advancing technology, data-driven methods such as ML can overcome the limitations of traditional approaches. ML is particularly useful here because traditional methods for estimating engineering parameters are time-consuming and costly. Using ML models trained on different datasets, UCS non-soaked, UCS soaked, swelling pressure, swelling percentage, MRM, UCS after RMT, Cc, and Ce were predicted with high performance.
2. Materials and Methods
2.1. Data Collection
The dataset in this study was obtained from soil samples taken from two different locations in Ankara, Turkey (Figure 4): the Esenboğa and Bilkent Clays. The main objective of the study [14] from which the dataset was taken was to test the behavior of highly plastic clays and to evaluate the effectiveness of L and FA additives on such soils. The Esenboğa and Bilkent regions of Ankara were selected as study areas because very/extremely plastic clays occur there naturally and the sampling conditions required for the study were available. Esenboğa Clay, from clay deposits near Ankara Esenboğa Airport, is an extremely plastic clay (plasticity index, PI = 101%). Bilkent Clay, from clay deposits near the Ankara Eskisehir State Highway, is a highly plastic clay (PI = 76%). Both disturbed and undisturbed soil samples were taken from both sites for laboratory tests; the undisturbed samples were collected with metal tube samplers without loss of moisture content. Hydrated L and FA were used as chemical stabilizers. Because very low percentages of hydrated L were used, the effect of the water content of the L on the treatment was considered negligible; only the water content of the soil affected the results. The chemical composition of the hydrated L used in the treatment is given in Table 1.
The FA used in this research was obtained from the combustion of Soma coal (Manisa, Türkiye) and is classified as Class C according to ASTM C 618 [45]. For chemical analysis, the samples were dried at 65 °C overnight and then at 105 °C for 2 h, prepared as pellets, and analyzed over the boron–uranium (BU) and fluorine–uranium (FU) element ranges; the results are reported as oxides. The major chemical constituents of Soma FA are given in Table 2.
As Table 1 shows, the L used in the research is a high-calcium hydrated L. A high calcium content increases the effectiveness of clay stabilization because it leads to flocculation–agglomeration in the short term. In the long term, the high calcium content also promotes pozzolanic reactions, which provide effective long-term stabilization.
The FA used in this research, on the other hand, is a Class C material with a moderate calcium oxide (CaO) content of 12%. Because of this CaO content, FA may be less effective than L, depending on the properties of the clay to be stabilized.
The laboratory tests in this thesis were conducted to determine the index properties of the soil (i.e., Atterberg Limits, sieve analysis), strength, compressibility, and swelling parameters. Resilient modulus tests were also carried out at the soil laboratory of the General Directorate of Highways.
Initially, Atterberg Limits, sieve analyses, unconfined compression tests, and oedometer tests (free swell, swelling pressure, and consolidation tests) were performed on the untreated Esenboğa and Bilkent Clays. Then, both clays were mixed with different proportions of L and FA; L was considered a chemically active additive, and FA was considered a filler material. The natural soils were mixed with hydrated L at 1%, 3%, 5%, 7%, and 9% and with FA at 5%, 10%, 15%, 20%, and 25% of the dry weight of the soil. The Atterberg Limits, sieve analysis, strength, compressibility, and swelling properties of the treated samples were investigated at each increasing L and FA content. All tests were performed according to ASTM standards [43].
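Because the mix proportions above are defined as percentages of the dry weight of the soil, the additive mass for a batch follows from simple arithmetic. The Python sketch below enumerates the clay, agent, and percentage combinations listed in the text; the 1000 g batch of dry soil and the function names are hypothetical, chosen only for illustration.

```python
LIME_PERCENTAGES = (1, 3, 5, 7, 9)         # % of dry soil weight (from the text)
FLY_ASH_PERCENTAGES = (5, 10, 15, 20, 25)  # % of dry soil weight (from the text)
CLAYS = ("Esenboğa", "Bilkent")


def additive_mass(dry_soil_mass_g: float, percent: float) -> float:
    """Mass of additive for a given percentage of the dry soil mass."""
    return dry_soil_mass_g * percent / 100.0


def mix_plan(dry_soil_mass_g: float):
    """List (clay, agent, percent, additive_mass_g) for every mix in the text."""
    plan = []
    for clay in CLAYS:
        for agent, percentages in (("L", LIME_PERCENTAGES), ("FA", FLY_ASH_PERCENTAGES)):
            for percent in percentages:
                plan.append((clay, agent, percent, additive_mass(dry_soil_mass_g, percent)))
    return plan


# Hypothetical 1000 g batch of oven-dried soil.
plan = mix_plan(1000.0)
for clay, agent, percent, mass in plan:
    print(f"{clay:9s} {agent:2s} {percent:2d}% -> {mass:6.1f} g")
```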
In addition, different tests were performed on L-treated samples of Esenboğa Clay (1%, 3%, 5%, 7%, and 9% L) after curing times of 7, 28, 56, and 90 days. The sampling depth was 3–4 m for both Esenboğa and Bilkent Clays. The numbers of samples are 86 for unconfined compressive strength (UCS) non-soaked, 86 for UCS soaked, 86 for swelling pressure, 86 for swelling percentage, 122 for MRM, 182 for UCS after RMT, 26 for Cc, and 26 for Ce.
Resilient modulus tests with L alone were performed on both Esenboğa and Bilkent Clays for each L percentage (1%, 3%, 5%, 7%, and 9%), both without curing and after curing times of 7, 28, 56, and 90 days. The sample tubes are shown in Figure 5.
Disturbed soil samples were collected at the site in large, heavy-duty bags (about 30 kg each). In accordance with FHWA reports on treating samples with L and FA, they were oven-dried at 60 °C and then pulverized with a wooden hammer. The soils were then sieved through a No. 40 sieve to ensure uniformity and mixed with the agent until thoroughly blended, which achieved a high degree of homogeneity. According to ASTM D3551 [46], 1 h should elapse between mixing and compaction. This period is called the “mellowing time” and can be regarded as a resting time for the samples; its purpose is to obtain a homogeneous, moisture-equalized, and pliable mixture.
The other treated-soil samples were compacted to maximum dry density at optimum water content using a Harvard miniature compaction apparatus. This apparatus was developed by Wilson [47] for small amounts of fine-grained soil. Because the specimens are small, many can be produced in a short time, and the strength and compressibility characteristics of the samples can be determined in addition to their index properties. In the Harvard miniature compaction method, the soils to be tested are first sieved through the No. 4 sieve. The compaction mold is 0.0715 m high and 0.033 m in diameter, and the cylindrical tamping foot is 0.0127 m in diameter. The soil is compacted in three layers with 25 tamps per layer. The Harvard miniature compaction apparatus is shown in Figure 6 below.
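With the mold dimensions given above, the mold volume and the dry density of a compacted specimen follow from the standard relations V = πd²h/4 and ρd = ρ/(1 + w), where ρ is the bulk density and w the water content as a fraction. The Python sketch below applies them; the specimen mass and water content in the usage lines are hypothetical.

```python
import math

MOLD_DIAMETER_M = 0.033  # Harvard miniature mold diameter (from the text)
MOLD_HEIGHT_M = 0.0715   # Harvard miniature mold height (from the text)


def mold_volume_m3(diameter_m: float = MOLD_DIAMETER_M,
                   height_m: float = MOLD_HEIGHT_M) -> float:
    """Volume of a cylindrical mold: pi * d^2 / 4 * h."""
    return math.pi * diameter_m ** 2 / 4.0 * height_m


def dry_density(wet_mass_kg, water_content, volume_m3=None):
    """Dry density (kg/m^3) of a compacted specimen.

    water_content is a fraction (e.g., 0.30 for 30%); the default volume is
    the Harvard miniature mold volume.
    """
    if volume_m3 is None:
        volume_m3 = mold_volume_m3()
    bulk_density = wet_mass_kg / volume_m3
    return bulk_density / (1.0 + water_content)


print(f"mold volume = {mold_volume_m3() * 1e6:.1f} cm^3")  # about 61.2 cm^3
# Hypothetical specimen: 0.110 kg wet mass at 30% water content.
print(f"dry density = {dry_density(0.110, 0.30):.0f} kg/m^3")
```

Repeating this calculation for specimens compacted at several water contents gives the points of the compaction curve, whose peak defines the maximum dry density and optimum water content.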
2.2. Dataset Description
In the dataset from the PhD thesis [14], all outputs were contained in a single table. In this study, rows with missing data were first deleted, and outliers were checked. The dataset was then divided into eight datasets that share the same inputs, so that the performance of the ML models on each output variable could be evaluated separately.
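The splitting step described above can be sketched as follows: for each output, keep only the rows in which that output and all shared inputs are present. This is a minimal Python illustration, not the authors' code; the column names are hypothetical labels for the inputs and outputs named in the text.

```python
# Hypothetical column labels for the 14 shared inputs and 8 outputs.
INPUTS = ["cure_day", "agent", "clay_type", "agent_pct", "Gs", "e0",
          "plus_no4", "minus_no200", "clay_size", "LL", "PL", "PI", "Ls", "SL"]
OUTPUTS = ["ucs_non_soaked", "ucs_soaked", "swelling_pressure", "swelling_pct",
           "mrm", "ucs_after_rmt", "Cc", "Ce"]


def split_by_output(rows):
    """Build one list of (features, target) pairs per output.

    Each row is a dict; a missing value is represented by None or an absent key.
    """
    datasets = {}
    for output in OUTPUTS:
        samples = []
        for row in rows:
            if row.get(output) is None:
                continue  # this row has no measurement for the current output
            if any(row.get(col) is None for col in INPUTS):
                continue  # drop rows with missing inputs
            features = [row[col] for col in INPUTS]
            samples.append((features, row[output]))
        datasets[output] = samples
    return datasets
```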
The dataset consists of 393 rows. The data are split into 80% training data and 20% test data: the training data are used to fit the models, and the test data are used to measure their performance. The data comprise engineering parameters of the soil obtained through experiments on samples from the Esenboğa and Bilkent sites.
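A random 80/20 split can be sketched in a few lines of Python. The fixed seed and the convention of rounding the test size up are assumptions of this sketch, since the text does not state how the split was drawn.

```python
import math
import random


def train_test_split(samples, test_fraction=0.2, seed=42):
    """Shuffle samples and return (train, test); the test size is rounded up."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed makes the split reproducible
    n_test = math.ceil(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


train, test = train_test_split(range(393))
print(len(train), len(test))  # 314 79
```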
In this study, eight different datasets were used. Their outputs are the swelling characteristics (swelling pressure and swelling percentage), the compressive strengths (soaked, non-soaked, and after RMT), the resilient modulus, and the compressibility characteristics.
Some of the inputs are described below. Gs is the ratio of the density of the soil solids to the density of water. e0 is the ratio of the volume of voids to the volume of solids. Sieve analysis (+No.4, −No.200) is a granulometric measurement performed to determine the grain-size distribution, and clay size is the fraction of particles finer than 0.002 mm. Among the Atterberg Limits, LL is the water content at the boundary between the liquid and plastic states of the soil; PL is the lowest water content at which the soil can be rolled into 3 mm diameter threads without crumbling; PI is the difference between the liquid limit and the plastic limit; Ls is the reduction in the length of a soil sample when it is dried; and SL is the water content below which the soil begins to behave like a solid. The other inputs are cure day, agent, clay type, and agent percentage.
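Two of the definitions above are simple arithmetic relations: PI = LL − PL and e0 = Vv/Vs. For a fully saturated soil, the void ratio can also be obtained from the water content w and Gs through Se = wGs with S = 1. The Python sketch below encodes these relations; the numeric values in the usage lines are illustrative and not taken from the dataset.

```python
def plasticity_index(liquid_limit: float, plastic_limit: float) -> float:
    """PI = LL - PL (both as fractions or both as percentages)."""
    if plastic_limit > liquid_limit:
        raise ValueError("PL cannot exceed LL")
    return liquid_limit - plastic_limit


def void_ratio(volume_voids: float, volume_solids: float) -> float:
    """e = Vv / Vs."""
    if volume_solids <= 0:
        raise ValueError("volume of solids must be positive")
    return volume_voids / volume_solids


def void_ratio_saturated(water_content: float, specific_gravity: float) -> float:
    """e = w * Gs, valid only for a fully saturated soil (S = 1)."""
    return water_content * specific_gravity


print(plasticity_index(1.20, 0.40))
print(void_ratio(0.6, 0.5))
print(void_ratio_saturated(0.5, 2.7))
```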
The eight datasets are introduced separately below. UCS non-soaked: Unconfined compression tests in the PhD thesis were performed under both non-soaked and soaked conditions.
Figure 7 shows a stacked column diagram of the UCS non-soaked dataset.
In the UCS non-soaked dataset, cure day takes values between 0 and 56. The agent input is encoded as Natural (0), FA (1), and L (2), and the clay types are Esenboğa (0) and Bilkent (1). The agent percentage is between 0 and 0.25. Gs takes values between 2.5 and 2.74, and the void ratio takes values between 0.92 and 1.56. +No.4 takes the values 0 and 0.5, −No.200 takes values between 0.792 and 0.998, and the clay size takes values between 0.14 and 0.675. LL takes values between 0.72 and 1.66, PL between 0.26 and 0.68, PI between 0.24 and 1.24, Ls between 0.16 and 0.33, and SL between 0.09 and 0.29. UCS non-soaked (output) takes values between 193 and 1316.
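The integer codes above (Natural = 0, FA = 1, L = 2; Esenboğa = 0, Bilkent = 1) can be applied with lookup tables. The Python sketch below builds the first four feature values of a row, using hypothetical function names. Note the design trade-off: integer codes impose an order (Natural < FA < L) that linear models treat as numeric, so a one-hot encoding, also sketched, is a common alternative; the text does not say that it was used.

```python
AGENT_CODES = {"Natural": 0, "FA": 1, "L": 2}
CLAY_CODES = {"Esenboğa": 0, "Bilkent": 1}


def encode_categorical(cure_day: int, agent: str, clay: str, agent_fraction: float):
    """Return [cure_day, agent_code, clay_code, agent_fraction] as in the dataset."""
    if agent == "Natural" and agent_fraction != 0:
        raise ValueError("natural (untreated) soil must have a zero agent percentage")
    return [cure_day, AGENT_CODES[agent], CLAY_CODES[clay], agent_fraction]


def one_hot_agent(agent: str):
    """Alternative encoding: one indicator column per agent."""
    return [1 if agent == name else 0 for name in AGENT_CODES]


print(encode_categorical(28, "L", "Bilkent", 0.05))  # [28, 2, 1, 0.05]
print(one_hot_agent("FA"))                           # [0, 1, 0]
```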
UCS soaked: Unconfined compression tests were also performed on soaked samples.
Figure 8 shows a stacked column diagram of the UCS soaked dataset.
In the UCS soaked dataset, cure day takes values between 0 and 56. The agent input is encoded as Natural (0), FA (1), and L (2), and the clay types are Esenboğa (0) and Bilkent (1). The agent percentage is between 0 and 0.25. Gs takes values between 2.5 and 2.74, and the void ratio takes values between 0.92 and 1.56. +No.4 takes the values 0 and 0.5, −No.200 takes values between 0.792 and 0.998, and the clay size takes values between 0.14 and 0.675. LL takes values between 0.72 and 1.66, PL between 0.26 and 0.68, PI between 0.24 and 1.24, Ls between 0.16 and 0.33, and SL between 0.09 and 0.29. UCS soaked (output) takes values between 37 and 1293.
Oedometer tests were used to determine the swelling pressure values.
Figure 9 shows a stacked column diagram of the swelling pressure dataset.
In the swelling pressure dataset, cure day takes values between 0 and 56. The agent input is encoded as Natural (0), FA (1), and L (2), and the clay types are Esenboğa (0) and Bilkent (1). Unlike the other datasets, this dataset expresses the percentage-type inputs in percent rather than as fractions: the agent percentage is between 0 and 25, −No.200 takes values between 79.2 and 99.8, the clay size takes values between 14 and 67.5, LL between 72 and 166, PL between 26 and 68, PI between 24 and 124, Ls between 16 and 33, and SL between 9 and 29. Gs takes values between 2.5 and 2.74, the void ratio takes values between 0.92 and 1.56, and +No.4 takes the values 0 and 0.5. Swelling pressure (output) takes values between 3.5 and 408.9.
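Because the swelling pressure dataset stores the percentage-type inputs on a 0–100 scale while the other datasets use fractions, the inputs should be brought to a common scale before the datasets are compared or combined. The following is a minimal Python sketch, assuming hypothetical column names and leaving +No.4, Gs, and e0 unchanged because their ranges are the same in all datasets.

```python
# Hypothetical names for the inputs that appear on a 0-100 scale in the
# swelling pressure dataset but as fractions in the other datasets.
PERCENT_COLUMNS = ("agent_pct", "minus_no200", "clay_size", "LL", "PL", "PI", "Ls", "SL")


def percent_to_fraction(row: dict) -> dict:
    """Return a copy of row with the percentage columns divided by 100."""
    converted = dict(row)
    for column in PERCENT_COLUMNS:
        value = converted.get(column)
        if value is not None:
            converted[column] = value / 100.0
    return converted


row = {"agent_pct": 25.0, "LL": 166.0, "PL": 68.0, "Gs": 2.74, "plus_no4": 0.5}
print(percent_to_fraction(row))
```

Returning a copy rather than modifying the row in place keeps the original data available for checking the conversion.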
Oedometer tests were used to determine the swelling percentage values.
Figure 10 shows a stacked column diagram of the swelling percentage dataset.
In the swelling percentage dataset, cure day takes values between 0 and 56. The agent input is encoded as Natural (0), FA (1), and L (2), and the clay types are Esenboğa (0) and Bilkent (1). The agent percentage is between 0 and 0.25. Gs takes values between 2.5 and 2.74, and the void ratio takes values between 0.92 and 1.56. +No.4 takes the values 0 and 0.5, −No.200 takes values between 0.792 and 0.998, and the clay size takes values between 0.14 and 0.675. LL takes values between 0.72 and 1.66, PL between 0.26 and 0.68, PI between 0.24 and 1.24, Ls between 0.16 and 0.33, and SL between 0.09 and 0.29. Swelling percentage (output) takes values between 0 and 35.5.
MRM: Resilient modulus (Mr) is a measure of the stiffness of the subgrade material. The RMT is a cyclic version of the triaxial compression test; the resilient modulus therefore represents the unloading modulus after many repeated loading cycles, which allows road traffic to be simulated. The resilient modulus describes the basic relationship between stress and strain in the material, and it is a dynamic soil property.
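Numerically, the resilient modulus is the cyclic (deviator) stress divided by the recoverable axial strain, Mr = σd/εr. The Python sketch below computes Mr for each loading sequence and averages the results. Treating MRM as the arithmetic mean over the loading sequences is an assumption, since the text does not define how the mean was taken, and the stress–strain pairs are hypothetical.

```python
from statistics import mean


def resilient_modulus(cyclic_stress: float, recoverable_strain: float) -> float:
    """Mr = cyclic deviator stress / recoverable (resilient) axial strain."""
    if recoverable_strain <= 0:
        raise ValueError("recoverable strain must be positive")
    return cyclic_stress / recoverable_strain


def mean_resilient_modulus(sequences):
    """Assumed definition: arithmetic mean of Mr over (stress, strain) sequences."""
    return mean(resilient_modulus(stress, strain) for stress, strain in sequences)


# Hypothetical (cyclic stress in kPa, recoverable strain) pairs.
sequences = [(50.0, 0.001), (60.0, 0.002)]
print(mean_resilient_modulus(sequences))  # about 40000 kPa
```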
Figure 11 shows a stacked column diagram of the mean resilient modulus dataset.
In the mean resilient modulus dataset, cure day takes values between 0 and 90. The agent input is encoded as Natural (0), FA (1), and L (2), and the clay types are Esenboğa (0) and Bilkent (1). The agent percentage is between 0 and 0.25. Gs takes values between 2.5 and 2.74, and the void ratio takes values between 0.92 and 1.56. +No.4 takes the values 0 and 0.5, −No.200 takes values between 0.792 and 0.998, and the clay size takes values between 0.14 and 0.675. LL takes values between 0.72 and 1.66, PL between 0.26 and 0.68, PI between 0.24 and 1.24, Ls between 0.16 and 0.33, and SL between 0.09 and 0.29. The mean resilient modulus (output) takes values between 27,874 and 340,780.
Unconfined compressive strength after the resilient modulus test (UCS after RMT): Resilient modulus tests with L as the only agent were performed on both Esenboğa and Bilkent Clays for each L percentage (1%, 3%, 5%, 7%, and 9%), both without curing and after curing periods of 7, 28, 56, and 90 days. Resilient modulus tests were also conducted on the natural clays. The tests were conducted in accordance with American Association of State Highway and Transportation Officials (AASHTO) T 307 [48].
Figure 12 shows a stacked column diagram of the UCS after RMT dataset.
In the UCS after RMT dataset, cure day takes values between 0 and 90. The agent input is encoded as Natural (0), FA (1), and L (2), and the clay types are Esenboğa (0) and Bilkent (1). The agent percentage is between 0 and 0.25. Gs takes values between 2.5 and 2.74, and the void ratio takes values between 0.92 and 1.56. +No.4 takes the values 0 and 0.5, −No.200 takes values between 0.792 and 0.998, and the clay size takes values between 0.14 and 0.675. LL takes values between 0.72 and 1.66, PL between 0.26 and 0.68, PI between 0.24 and 1.24, Ls between 0.16 and 0.33, and SL between 0.09 and 0.29. UCS after RMT (output) takes values between 141 and 1254.
Cc: From the oedometer test results, the compression and expansion indices are obtained as the slopes of the straight-line portions of the plot of void ratio (e) against the logarithm of effective stress (σ′), traced over both the compression and expansion parts of the test. Cc is the slope of the virgin compression line; a high Cc value means high compressibility. As stated before, the compressibility and expansion potential of highly plastic clays cause problems in foundation engineering applications. The effect of FA stabilization in this context is evaluated below for Bilkent Clay.
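In equation form, each index is the magnitude of the slope of the e–log σ′ line, C = −Δe/Δ(log10 σ′), taken over the virgin compression branch for Cc and over the unloading branch for Ce. The Python sketch below fits that slope by least squares to points read from one branch; the readings in the usage lines are hypothetical.

```python
import math
from statistics import mean


def log_slope_index(points):
    """Return -d(e)/d(log10 sigma') fitted by least squares.

    points: (effective_stress, void_ratio) pairs from a single straight branch
    of the e-log(sigma') curve (loading branch for Cc, unloading branch for Ce).
    """
    if len(points) < 2:
        raise ValueError("at least two points are needed")
    xs = [math.log10(stress) for stress, _ in points]
    ys = [e for _, e in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("stresses must differ")
    return -sxy / sxx


# Hypothetical readings (kPa, void ratio).
virgin_branch = [(100.0, 1.20), (1000.0, 0.90), (10000.0, 0.60)]
unloading_branch = [(10000.0, 0.60), (1000.0, 0.65), (100.0, 0.70)]
print(f"Cc = {log_slope_index(virgin_branch):.3f}")     # Cc = 0.300
print(f"Ce = {log_slope_index(unloading_branch):.3f}")  # Ce = 0.050
```

With only two points, the fit reduces to the familiar two-point formula (e1 − e2)/log10(σ2′/σ1′).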
Figure 13 shows the clustered column diagram of the compression index dataset.
In the Cc dataset, cure day is 0. The agent input is encoded as Natural (0) and L (2), and the clay types are Esenboğa (0) and Bilkent (1). The agent percentage is between 0 and 0.25. Gs takes values between 2.5 and 2.74, and the void ratio takes values between 0.92 and 1.56. +No.4 takes the values 0 and 0.5, −No.200 takes values between 0.792 and 0.998, and the clay size takes values between 0.14 and 0.675. LL takes values between 0.72 and 1.66, PL between 0.26 and 0.68, PI between 0.24 and 1.24, Ls between 0.16 and 0.33, and SL between 0.09 and 0.29. Cc (output) takes values between 0.035 and 0.452.
Ce: Ce is the slope of the unloading part of the test plot. The higher the Ce value, the higher the expansion capacity of the clay.
Figure 14 shows the clustered column diagram of the expansion index dataset.
In the Ce dataset, cure day is 0. The agent input is encoded as Natural (0) and FA (1), and the clay types are Esenboğa (0) and Bilkent (1). The agent percentage is between 0 and 0.25. Gs takes values between 2.5 and 2.74, and the void ratio takes values between 0.92 and 1.56. +No.4 takes the values 0 and 0.5, −No.200 takes values between 0.792 and 0.998, and the clay size takes values between 0.14 and 0.675. LL takes values between 0.72 and 1.66, PL between 0.26 and 0.68, PI between 0.24 and 1.24, Ls between 0.16 and 0.33, and SL between 0.09 and 0.29. Ce (output) takes values between 0.024 and 0.136.
2.3. Machine Learning (ML)
ML algorithms are generally divided into four categories: supervised, unsupervised, semi-supervised, and reinforcement learning. In supervised learning, the machine learns the mapping from inputs to outputs using a dataset of given input–output pairs. In unsupervised learning, the machine tries to find structure in a dataset without labels. Semi-supervised learning combines the two, working with both labeled and unlabeled data. In reinforcement learning, behavior is evaluated automatically, through rewards, to improve the learning efficiency of the target machine [49]. Supervised learning is divided into two subgroups, classification and regression: if the output is discrete, the problem is classification, and if the output is continuous and numerical, the problem is regression. In this study, regression was used.
Figure 15 shows a schematic drawing of supervised learning.
Rows with missing values were deleted because there were few of them. Each dataset was split into 80% training and 20% testing data. The inputs are the same for all datasets: Gs, eo, sieve analysis (+No.4, −No.200), clay size, LL, PL, PI, Ls, SL, curing time (days), agent, clay type, and agent percentage. The algorithms used include Bayesian Ridge, ElasticNet, Ridge, HuberRegressor, PassiveAggressiveRegressor, SGDRegressor, Lasso, KNeighborsRegressor, OrthogonalMatchingPursuit, RandomForestRegressor, BaggingRegressor, DecisionTreeRegressor, AdaBoostRegressor, and StackingRegressor, as well as ExtraTreesRegressor, GradientBoostingRegressor, and NuSVR, which appear among the top-performing models in Section 3. The algorithms were used with their default parameters. Adjusted R2, R2, and RMSE were used to measure model performance for each dataset.
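The comparison procedure described above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions: the data are synthetic (generated with `make_regression`, 14 inputs to mirror the input list), only a subset of the listed algorithms is shown, the elapsed time covers fitting plus prediction (the study does not define its "processing time"), and the two base learners given to `StackingRegressor` are an assumption, because that estimator has no default base learners.

```python
import time

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, BaggingRegressor,
                              ExtraTreesRegressor, GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import BayesianRidge, HuberRegressor, Lasso, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for one dataset with 14 inputs (not the study's data).
X, y = make_regression(n_samples=200, n_features=14, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Default parameters, as in the study.
models = {
    "BayesianRidge": BayesianRidge(),
    "Ridge": Ridge(),
    "Lasso": Lasso(),
    "HuberRegressor": HuberRegressor(),
    "KNeighborsRegressor": KNeighborsRegressor(),
    "DecisionTreeRegressor": DecisionTreeRegressor(),
    "RandomForestRegressor": RandomForestRegressor(),
    "BaggingRegressor": BaggingRegressor(),
    "AdaBoostRegressor": AdaBoostRegressor(),
    "ExtraTreesRegressor": ExtraTreesRegressor(),
    "GradientBoostingRegressor": GradientBoostingRegressor(),
    # StackingRegressor needs explicit base learners; this pair is an assumption.
    "StackingRegressor": StackingRegressor(
        estimators=[("ridge", Ridge()), ("tree", DecisionTreeRegressor())]),
}


def adjusted_r2(r2, n, p):
    """Adjusted R2 for n test samples and p inputs."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


results = []  # (name, adjusted R2, R2, RMSE, seconds)
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    elapsed = time.perf_counter() - start
    r2 = r2_score(y_test, y_pred)
    rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
    adj = adjusted_r2(r2, len(y_test), X_test.shape[1])
    results.append((name, adj, r2, rmse, elapsed))

# Sort by R2 (ascending) and print the top five, best last.
results.sort(key=lambda row: row[2])
for name, adj, r2, rmse, secs in results[-5:]:
    print(f"{name:26s} adj R2={adj:.3f}  R2={r2:.3f}  "
          f"RMSE={rmse:.2f}  time={secs:.2f} s")
```

Listing the top five in ascending order of R2 mirrors the layout of Tables 4 to 11; which models reach the top depends entirely on the dataset.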
In this study, different supervised ML algorithms were used to predict the strength, swelling, and compressibility properties that reflect the actual effect of soil improvement, using a separate dataset for each property. The predictions of the ML models were evaluated using adjusted R2, R2, RMSE, and processing time.
2.3.1. ML Models
Regression analysis is one of the most frequently used methods for analyzing the relationship between dependent and independent variables. Regression models can be used to predict continuous quantitative data; they are fitted to obtain the function that best describes the relationship between the independent variables and the dependent variable. In regression analysis, the outcome variable y is predicted from one or more predictor variables x. Several different regression models were used in this study; these models and their explanations are given in Table 3.
The soil properties were predicted using these models (Table 3), and the predictive performance of the models was compared.
2.3.2. Model Training and Testing
In this study, the train-test split method is applied to the datasets. This method is frequently used to evaluate and validate the performance of ML models: the dataset is divided into training and test subsets, so that the model is fitted on the training data and then evaluated on the test data. The training set contains the data used in the learning process of the model, while the test set contains the data used to evaluate its performance. In this way, it can be determined whether the model is prone to overfitting and how well it generalizes to unseen data [67]. Ratios such as 7:3 and 8:2 are commonly used in ML studies; here, each dataset is divided into a training set and a test set in a ratio of 8:2 because this ratio is widely used in the literature [68,69].
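A minimal sketch of the preprocessing and 8:2 split, assuming pandas and scikit-learn: rows with missing values are dropped, the data are split with `train_test_split`, and training and test R2 are compared to expose overfitting. The table, its two input columns, and the target are synthetic and hypothetical (the column ranges only echo the LL and PL ranges quoted for the datasets).

```python
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Hypothetical table: two inputs and one output (not the study's data).
df = pd.DataFrame({
    "LL": rng.uniform(0.72, 1.66, 100),
    "PL": rng.uniform(0.26, 0.68, 100),
})
df["target"] = 2.0 * df["LL"] - 1.5 * df["PL"] + rng.normal(0, 0.05, 100)
df.loc[[3, 17, 42], "PL"] = np.nan  # a few missing entries

# Rows with missing values are deleted, as in the study.
clean = df.dropna()
X = clean[["LL", "PL"]]
y = clean["target"]

# 8:2 split; random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# An unpruned decision tree memorizes the training set.
model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
train_r2 = r2_score(y_train, model.predict(X_train))
test_r2 = r2_score(y_test, model.predict(X_test))
print(len(X_train), len(X_test))
print(f"train R2 = {train_r2:.3f}, test R2 = {test_r2:.3f}")
```

With 97 complete rows, scikit-learn rounds the test share up, giving 77 training and 20 test rows. A near-perfect training score paired with a lower test score is the overfitting signal that the held-out test set is meant to reveal.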
2.4. Performance Criteria
Measuring the prediction success of the models is an important step. By using error metrics, the trained model predictions and the actual data of the test set can be compared. The performance of the trained models was evaluated using Coefficient of Determination (R2), adjusted R2, and Root Mean Square Error (RMSE) metrics.
2.4.1. Coefficient of Determination (R2)
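R2 measures the proportion of the variance in the actual values that is explained by the model predictions. It is calculated with Equation (1):
$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left(x_{actual,i} - x_{predicted,i}\right)^2}{\sum_{i=1}^{N} \left(x_{actual,i} - x_{actual,mean}\right)^2} \qquad (1)$$
In Equation (1), N is the number of samples, $x_{actual}$ is the actual value, $x_{predicted}$ is the predicted value, and $x_{actual,mean}$ is the average of the actual observed values.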
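A minimal NumPy sketch of R2, assuming Equation (1) takes the standard form 1 − SS_res/SS_tot built from the symbols defined above (the residual sum of squares divided by the total sum of squares about $x_{actual,mean}$). The values are hypothetical, and the result is checked against scikit-learn's `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2(x_actual, x_predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    x_actual = np.asarray(x_actual, dtype=float)
    x_predicted = np.asarray(x_predicted, dtype=float)
    ss_res = np.sum((x_actual - x_predicted) ** 2)       # residual sum of squares
    ss_tot = np.sum((x_actual - x_actual.mean()) ** 2)   # spread about the mean
    return 1.0 - ss_res / ss_tot


actual = [100.0, 150.0, 200.0, 250.0]      # hypothetical values
predicted = [110.0, 140.0, 205.0, 245.0]
print(r2(actual, predicted), r2_score(actual, predicted))
```

Here SS_res = 250 and SS_tot = 12,500, so R2 = 0.98. Unlike a squared correlation, this R2 can become negative when predictions are worse than simply using the mean.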
2.4.2. Adjusted R2
Adjusted R2 corrects R2 for the number of independent variables in the model, so that adding inputs that do not improve the fit lowers the score. Adjusted R2 is calculated by Equation (2) [70]:
$$R^2_{adj} = 1 - \frac{\left(1 - R^2\right)\left(N - 1\right)}{N - p - 1} \qquad (2)$$
In Equation (2), N is the total sample size, and p is the number of independent variables.
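A minimal sketch of adjusted R2, assuming the standard form of Equation (2), 1 − (1 − R2)(N − 1)/(N − p − 1). Because (N − 1)/(N − p − 1) is greater than 1, adjusted R2 is never larger than R2 (for R2 ≤ 1), and the penalty grows when the test set is small relative to the number of inputs. The sample size used below is hypothetical.

```python
def adjusted_r2(r2, n_samples, n_inputs):
    """Penalize R2 for the number of independent variables p."""
    if n_samples <= n_inputs + 1:
        raise ValueError("adjusted R2 needs N > p + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_inputs - 1)


# Hypothetical case: R2 = 0.95 on a 20-sample test set with the 14 inputs
# listed in Section 2.3.
print(round(adjusted_r2(0.95, 20, 14), 4))   # prints 0.81
# The same R2 on a larger test set is penalized much less.
print(round(adjusted_r2(0.95, 200, 14), 4))
```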
2.4.3. Root Mean Square Error (RMSE)
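RMSE is the square root of the mean of the squared prediction errors, which equals the standard deviation of the residuals when their mean is zero. It is calculated with Equation (3):
$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(x_{actual,i} - x_{predicted,i}\right)^2} \qquad (3)$$
In Equation (3), N is the number of samples, $x_{actual}$ is the actual value, and $x_{predicted}$ is the predicted value. RMSE has the same unit as the output, so RMSE values can be compared only within the same dataset.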
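A minimal NumPy sketch of Equation (3), using the same hypothetical values as a worked case and checking the result against scikit-learn's `mean_squared_error`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error


def rmse(x_actual, x_predicted):
    """Square root of the mean squared prediction error."""
    residuals = (np.asarray(x_actual, dtype=float)
                 - np.asarray(x_predicted, dtype=float))
    return float(np.sqrt(np.mean(residuals ** 2)))


actual = [100.0, 150.0, 200.0, 250.0]      # hypothetical values
predicted = [110.0, 140.0, 205.0, 245.0]
print(rmse(actual, predicted))                          # sqrt(250 / 4)
print(np.sqrt(mean_squared_error(actual, predicted)))   # same value
```

The squared residuals are 100, 100, 25, and 25, so RMSE = sqrt(62.5), about 7.91, in the unit of the output.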
3. Results
The objective of this study is to evaluate the performance of different ML models in predicting soil behavior using eight datasets. This section presents the results obtained for each dataset; they are interpreted in the Discussion.
Figure 16 shows the results of the ML algorithms for the UCS non-soaked dataset in terms of RMSE, adjusted R2, R2, and processing time.
Table 4 shows the metrics of the top five models for UCS non-soaked prediction.
The results in Table 4 compare the performance of different regression models for UCS non-soaked prediction. AdaBoostRegressor has an adjusted R2 of 0.80, an R2 of 0.87, an RMSE of 69.28, and a processing time of 10.64 s. StackingRegressor has an adjusted R2 of 0.85, an R2 of 0.90, an RMSE of 95.22, and a processing time of 142.13 s. BaggingRegressor has an adjusted R2 of 0.86, an R2 of 0.91, an RMSE of 59.90, and a processing time of 2.33 s. ExtraTreesRegressor has an adjusted R2 of 0.89, an R2 of 0.93, an RMSE of 44.66, and a processing time of 13.68 s. GradientBoostingRegressor has an adjusted R2 of 0.91, an R2 of 0.95, an RMSE of 63.54, and a processing time of 4.25 s.
Figure 17 shows the scatter plot for UCS non-soaked prediction using GradientBoostingRegressor.
Figure 18 shows the results of the ML algorithms for the UCS soaked dataset in terms of RMSE, adjusted R2, R2, and processing time.
Table 5 shows the metrics of the top five models for UCS soaked prediction.
The results in Table 5 compare the performance of different regression models for UCS soaked prediction. DecisionTreeRegressor has an adjusted R2 of 0.88, an R2 of 0.92, an RMSE of 65.94, and a processing time of 0.14 s. AdaBoostRegressor has an adjusted R2 of 0.89, an R2 of 0.93, an RMSE of 75.49, and a processing time of 8.84 s. StackingRegressor has an adjusted R2 of 0.93, an R2 of 0.95, an RMSE of 74.67, and a processing time of 149.46 s. ExtraTreesRegressor has an adjusted R2 of 0.96, an R2 of 0.97, an RMSE of 42.10, and a processing time of 15.10 s. GradientBoostingRegressor has an adjusted R2 of 0.97, an R2 of 0.98, an RMSE of 40.91, and a processing time of 3.72 s.
Figure 19 shows the scatter plot for UCS soaked prediction using GradientBoostingRegressor.
Figure 20 shows the results of the ML algorithms for the swelling pressure dataset in terms of RMSE, adjusted R2, R2, and processing time.
Table 6 shows the metrics of the top five models for swelling pressure prediction.
The results in Table 6 compare the performance of different regression models for swelling pressure prediction. BaggingRegressor has an adjusted R2 of 0.89, an R2 of 0.93, an RMSE of 31.25, and a processing time of 2.52 s. DecisionTreeRegressor has an adjusted R2 of 0.90, an R2 of 0.94, an RMSE of 31.07, and a processing time of 0.16 s. GradientBoostingRegressor has an adjusted R2 of 0.92, an R2 of 0.948, an RMSE of 34.92, and a processing time of 7.48 s. AdaBoostRegressor has an adjusted R2 of 0.95, an R2 of 0.968, an RMSE of 24, and a processing time of 11.31 s. ExtraTreesRegressor has an adjusted R2 of 0.96, an R2 of 0.97, an RMSE of 24.23, and a processing time of 18.15 s.
Figure 21 shows the scatter plot for swelling pressure prediction using ExtraTreesRegressor.
Swelling Percentage Prediction:
Figure 22 shows the results of the ML algorithms for the swelling percentage dataset in terms of RMSE, adjusted R2, R2, and processing time.
Table 7 shows the metrics of the top five models for swelling percentage prediction.
The results in Table 7 compare the performance of different regression models for swelling percentage prediction. RandomForestRegressor has an adjusted R2 of 0.90, an R2 of 0.94, an RMSE of 2.89, and a processing time of 21.27 s. HuberRegressor has an adjusted R2 of 0.92, an R2 of 0.949, an RMSE of 3.62, and a processing time of 1.71 s. GradientBoostingRegressor has an adjusted R2 of 0.93, an R2 of 0.95, an RMSE of 3.10, and a processing time of 3.65 s. DecisionTreeRegressor has an adjusted R2 of 0.94, an R2 of 0.963, an RMSE of 2.58, and a processing time of 0.20 s. ExtraTreesRegressor has an adjusted R2 of 0.95, an R2 of 0.965, an RMSE of 2.32, and a processing time of 29.40 s.
Figure 23 shows the scatter plot for swelling percentage prediction using ExtraTreesRegressor.
MRM Prediction:
Figure 24 shows the results of the ML algorithms for the mean resilient modulus dataset in terms of RMSE, adjusted R2, R2, and processing time.
Table 8 shows the metrics of the top five models for mean resilient modulus prediction.
The results in Table 8 compare the performance of different regression models for mean resilient modulus prediction. RandomForestRegressor has an adjusted R2 of 0.93, an R2 of 0.95, an RMSE of 12,433.75, and a processing time of 20.01 s. ExtraTreesRegressor has an adjusted R2 of 0.95, an R2 of 0.970, an RMSE of 10,812.27, and a processing time of 13.91 s. DecisionTreeRegressor has an adjusted R2 of 0.960, an R2 of 0.971, an RMSE of 10,552.66, and a processing time of 0.19 s. StackingRegressor has an adjusted R2 of 0.962, an R2 of 0.972, an RMSE of 13,733.78, and a processing time of 170.84 s. GradientBoostingRegressor has an adjusted R2 of 0.98, an R2 of 0.98, an RMSE of 10,827.86, and a processing time of 3.64 s.
Figure 25 shows the scatter plot for mean resilient modulus prediction using GradientBoostingRegressor.
Figure 26 shows the results of the ML algorithms for the UCS after RMT dataset in terms of RMSE, adjusted R2, R2, and processing time.
Table 9 shows the metrics of the top five models for UCS after RMT prediction.
The results in Table 9 compare the performance of different regression models for UCS after RMT prediction. GradientBoostingRegressor has an adjusted R2 of 0.962, an R2 of 0.96, an RMSE of 31.04, and a processing time of 4.86 s. RandomForestRegressor has an adjusted R2 of 0.968, an R2 of 0.973, an RMSE of 29.08, and a processing time of 27.17 s. BaggingRegressor has an adjusted R2 of 0.969, an R2 of 0.974, an RMSE of 27.56, and a processing time of 4.17 s. ExtraTreesRegressor has an adjusted R2 of 0.97, an R2 of 0.978, an RMSE of 29.78, and a processing time of 16.25 s. DecisionTreeRegressor has an adjusted R2 of 0.99, an R2 of 0.99, an RMSE of 6.04, and a processing time of 0.15 s.
Figure 27 shows the scatter plot for UCS after RMT prediction using DecisionTreeRegressor.
Figure 28 shows the results of the ML algorithms for the Cc dataset in terms of RMSE, adjusted R2, R2, and processing time.
Table 10 shows the metrics of the top five models for Cc prediction.
The results in Table 10 compare the performance of different regression models for Cc prediction. HuberRegressor has an adjusted R2 of 0.948, an R2 of 0.968, an RMSE of 0.03, and a processing time of 1.29 s. StackingRegressor has an adjusted R2 of 0.949, an R2 of 0.969, an RMSE of 0.02, and a processing time of 134.77 s. Ridge has an adjusted R2 of 0.96, an R2 of 0.977, an RMSE of 0.01, and a processing time of 0.19 s. DecisionTreeRegressor has an adjusted R2 of 0.97, an R2 of 0.978, an RMSE of 0.02, and a processing time of 0.17 s. AdaBoostRegressor has an adjusted R2 of 0.99, an R2 of 0.99, an RMSE of 0.008, and a processing time of 9.62 s.
Figure 29 shows the scatter plot for Cc prediction using AdaBoostRegressor.
Figure 30 shows the results of the ML algorithms for the Ce dataset in terms of RMSE, adjusted R2, R2, and processing time.
Table 11 shows the metrics of the top five models for Ce prediction.
The results in Table 11 compare the performance of different regression models for Ce prediction. BaggingRegressor has an adjusted R2 of 0.6, an R2 of 0.96, an RMSE of 0.0046, and a processing time of 2.24 s. NuSVR has an adjusted R2 of 0.79, an R2 of 0.974, an RMSE of 0.0044, and a processing time of 0.16 s. ExtraTreesRegressor has an adjusted R2 of 0.86, an R2 of 0.975, an RMSE of 0.0049, and a processing time of 12.79 s. Bayesian Ridge has an adjusted R2 of 0.90, an R2 of 0.976, an RMSE of 0.005, and a processing time of 0.23 s. GradientBoostingRegressor has an adjusted R2 of 0.95, an R2 of 0.98, an RMSE of 0.0046, and a processing time of 4.19 s.
Figure 31 shows the scatter plot for Ce prediction using GradientBoostingRegressor.
SHapley Additive Explanations (SHAP)
The difficulty of interpreting ML models can limit their practical use, so explainable approaches are needed. One of these approaches is SHapley Additive Explanations (SHAP), proposed by Lundberg and Lee [71] to improve the explainability of models. SHAP explains the prediction for any observation by calculating the contribution of each variable to that prediction, which makes models explainable by showing how each input contributes to the model output [71]. In this study, SHAP bar plots were created using the XGBoost model trained on the datasets. The SHAP bar plot quantifies the contribution of each attribute to the predicted output: an input with a higher mean absolute SHAP value has more influence on the prediction. Although the Esenboğa and Bilkent data have the same inputs, the values of these inputs differ, so the inputs can affect the model predictions differently and the SHAP values may also differ.
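The quantity behind a SHAP bar plot can be illustrated without the shap package. The study used an XGBoost model (the shap package provides tree explainers for such models); the sketch below instead swaps in a linear regression on synthetic data, because for a linear model with independent inputs the SHAP value of input j has the closed form w_j (x_j − mean_j). The feature names are hypothetical labels, not the study's fitted inputs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
features = ["PL", "LL", "PI", "SL"]  # hypothetical labels for synthetic inputs
X = rng.normal(size=(500, 4))
y = (3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2] + 0.1 * X[:, 3]
     + rng.normal(0, 0.1, 500))
model = LinearRegression().fit(X, y)

# Exact SHAP values for a linear model with independent inputs,
# using X itself as the background data: phi_ij = w_j * (x_ij - mean_j).
phi = model.coef_ * (X - X.mean(axis=0))

# Local accuracy: each row of contributions sums to f(x) minus the mean prediction.
pred = model.predict(X)
print(np.allclose(phi.sum(axis=1), pred - pred.mean()))

# Bar-plot importance = mean absolute SHAP value per input, largest first.
importance = np.abs(phi).mean(axis=0)
order = np.argsort(importance)[::-1]
ranking = [features[j] for j in order]
for j in order:
    print(f"{features[j]:4s} {importance[j]:.3f}")
```

The input with the largest effect on the synthetic target ranks first, which is how the bar plots in Figures 32 to 39 should be read.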
Figure 32 shows the SHAP bar plot for UCS non-soaked prediction.
As can be seen in Figure 32, PL has the greatest impact on predicting the output. After that, LL, clay type, and agent percentage are more effective than the other inputs for the prediction of the UCS non-soaked output. In Figure 32, +No.4 is the percentage retained on the No.4 sieve, and −No.200 is the percentage passing the No.200 sieve. A fine-grained soil is one in which more than half of the material passes the No.200 sieve (−No.200).
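The two sieve inputs and the fine-grained criterion can be computed from a sieve analysis as sketched below. The specimen masses are hypothetical, and the fractions are expressed on a 0 to 1 scale like the No.200 values quoted for the datasets (0.792 to 0.998), all of which satisfy the "more than half passes" criterion.

```python
def sieve_fractions(total_dry_mass_g, retained_on_no4_g, retained_down_to_no200_g):
    """Return (+No.4, -No.200) as fractions of the total dry mass.

    retained_down_to_no200_g is the cumulative mass retained on all sieves
    down to and including the No.200 sieve.
    """
    plus_no4 = retained_on_no4_g / total_dry_mass_g
    minus_no200 = (total_dry_mass_g - retained_down_to_no200_g) / total_dry_mass_g
    return plus_no4, minus_no200


def is_fine_grained(minus_no200):
    """True when more than half of the soil passes the No.200 sieve."""
    return minus_no200 > 0.5


# Hypothetical 500 g specimen: nothing on No.4, 40 g retained down to No.200.
plus_no4, minus_no200 = sieve_fractions(500.0, 0.0, 40.0)
print(plus_no4, minus_no200, is_fine_grained(minus_no200))
```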
Figure 33 shows the SHAP bar plot for the UCS soaked prediction.
As can be seen in Figure 33, PL has the greatest impact on predicting the output. After that, LL, clay type, and agent percentage are more effective than the other inputs for the prediction of the UCS soaked output.
Figure 34 shows a SHAP bar plot for swelling pressure prediction.
As can be seen in Figure 34, PL has the greatest impact on predicting the output. After that, LL, agent percentage, and clay type are more effective than the other inputs for the prediction of the swelling pressure output.
Figure 35 shows the SHAP bar plot for the swelling percentage prediction.
As can be seen in Figure 35, PL has the greatest impact on predicting the output. After that, LL, clay type, and agent percentage are more effective than the other inputs for the prediction of the swelling percentage output.
Figure 36 shows the SHAP bar plot for the mean resilient modulus prediction.
As can be seen in Figure 36, the clay type has the greatest impact on predicting the output. After that, PL, agent percentage, and LL are more effective than the other inputs for the prediction of the mean resilient modulus output.
Figure 37 shows the SHAP bar plot for the UCS after RMT prediction.
As can be seen in Figure 37, the clay type has the greatest impact on predicting the output. After that, agent percentage, PL, and liquid limit (LL) are more effective than the other inputs for the prediction of the UCS after RMT output.
Figure 38 shows the SHAP bar plot for the Cc prediction.
As can be seen in Figure 38, the percentage passing the No.200 sieve (−No.200) has the greatest impact on predicting the output. After that, clay size, LL, and PL are more effective than the other inputs for the prediction of the Cc output.
Figure 39 shows the SHAP bar plot for the Ce prediction.
As can be seen in Figure 39, the percentage passing the No.200 sieve (−No.200) has the greatest impact on predicting the output. After that, the plasticity index (PI), PL, and LL are more effective than the other inputs for the prediction of the Ce output.
4. Discussion
In this study, a series of ML algorithms was used to predict the strength, swelling, and compressibility properties of stabilized soil from its engineering properties. The inputs are specific gravity, void ratio, sieve analysis (+No.4, −No.200, clay size), Atterberg limits and shrinkage properties (liquid limit (LL), plastic limit (PL), plasticity index (PI), linear shrinkage (Ls), shrinkage limit (SL)), curing time, agent, clay type, and agent percentage. The outputs are unconfined compressive strength (UCS) non-soaked, UCS soaked, swelling pressure, swelling percentage, MRM, UCS after RMT, Cc, and Ce. Eight different datasets were used in this study.
As shown in Table 4 and Figure 16, the highest R2 values for UCS non-soaked prediction are obtained by the AdaBoostRegressor (R2 = 0.87), StackingRegressor (R2 = 0.90), BaggingRegressor (R2 = 0.91), ExtraTreesRegressor (R2 = 0.93), and GradientBoostingRegressor (R2 = 0.95) models. The RMSE values of these models were calculated as 69.28, 95.22, 59.90, 44.66, and 63.54, respectively. The RMSE does not decrease steadily as R2 increases: ExtraTreesRegressor has the lowest error, while GradientBoostingRegressor has the highest R2 and adjusted R2 and is therefore the most successful model in terms of accuracy.
The highest adjusted R2 values for UCS soaked prediction belong to the DecisionTreeRegressor (adjusted R2 = 0.88), AdaBoostRegressor (adjusted R2 = 0.89), StackingRegressor (adjusted R2 = 0.93), ExtraTreesRegressor (adjusted R2 = 0.96), and GradientBoostingRegressor (adjusted R2 = 0.97) models. The RMSE values of these models were calculated as 65.94, 75.49, 74.67, 42.10, and 40.91, respectively. GradientBoostingRegressor, which gives the lowest error, is the most successful model for UCS soaked prediction.
The highest adjusted R2 values for swelling pressure prediction were obtained by the BaggingRegressor (adjusted R2 = 0.89), DecisionTreeRegressor (adjusted R2 = 0.90), GradientBoostingRegressor (adjusted R2 = 0.92), AdaBoostRegressor (adjusted R2 = 0.95), and ExtraTreesRegressor (adjusted R2 = 0.96) models. The RMSE values of these models are 31.25, 31.07, 34.92, 24, and 24.23, respectively. ExtraTreesRegressor gave the best results for swelling pressure prediction, combining the highest accuracy with a low error rate.
The most successful models for swelling percentage prediction were RandomForestRegressor (R2 = 0.94), HuberRegressor (R2 = 0.949), GradientBoostingRegressor (R2 = 0.95), DecisionTreeRegressor (R2 = 0.963) and ExtraTreesRegressor (R2 = 0.965). RMSE values were 2.89, 3.62, 3.10, 2.58, and 2.32, respectively. ExtraTreesRegressor performed the best for this prediction.
The highest R2 values for mean resilient modulus prediction belong to the RandomForestRegressor (R2 = 0.95), ExtraTreesRegressor (R2 = 0.970), DecisionTreeRegressor (R2 = 0.971), StackingRegressor (R2 = 0.972), and GradientBoostingRegressor (R2 = 0.98) models. The RMSE values of these models are 12,433.75, 10,812.27, 10,552.66, 13,733.78, and 10,827.86, respectively. GradientBoostingRegressor is the most successful model for mean resilient modulus prediction.
The best models for UCS after RMT prediction were GradientBoostingRegressor (R2 = 0.96), RandomForestRegressor (R2 = 0.973), BaggingRegressor (R2 = 0.974), ExtraTreesRegressor (R2 = 0.978) and DecisionTreeRegressor (R2 = 0.99). RMSE values were 31.04, 29.08, 27.56, 29.78, and 6.04, respectively. DecisionTreeRegressor gave the best results for UCS after RMT prediction.
The highest R2 values for Cc prediction belong to the HuberRegressor (R2 = 0.968), StackingRegressor (R2 = 0.969), Ridge (R2 = 0.977), DecisionTreeRegressor (R2 = 0.978), and AdaBoostRegressor (R2 = 0.99) models. The RMSE values are 0.03, 0.02, 0.01, 0.02, and 0.008, respectively. AdaBoostRegressor provided the best results for this prediction, with the lowest error and the highest accuracy at a processing time of 9.62 s.
Finally, the best models for the prediction of Ce are BaggingRegressor (R2 = 0.96), NuSVR (R2 = 0.974), ExtraTreesRegressor (R2 = 0.975), Bayesian Ridge (R2 = 0.976), and GradientBoostingRegressor (R2 = 0.98). The RMSE values of all these models were 0.005 or less. GradientBoostingRegressor stood out as the best model.
StackingRegressor has the longest processing time among the models. No single algorithm performed best for every output: a model that performed very well for one output could perform poorly for another, which is likely related to differences in the content and complexity of the datasets. The boosting algorithms used in this study (GradientBoostingRegressor and AdaBoostRegressor) generally showed high prediction success across the outputs, which can be attributed to the way they sequentially strengthen weak learners. Overall, ensemble methods generally achieved the highest prediction success, indicating that they give very good results in predicting the static and cyclic properties of the soil.
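One reason stacking is slow can be shown by counting fits. With scikit-learn's `StackingRegressor` and its default 5-fold `cv`, each base estimator is fitted once on the full training data and five more times to produce the out-of-fold predictions used by the final estimator. The counting subclass `CountingRidge` is a hypothetical helper for illustration, the data are synthetic, and the study's stacking configuration is not stated, so this only sketches the general cost.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor


class CountingRidge(Ridge):
    """Ridge that counts how often fit() is called (illustrative helper)."""
    n_fits = 0

    def fit(self, X, y, sample_weight=None):
        CountingRidge.n_fits += 1
        return super().fit(X, y, sample_weight=sample_weight)


X, y = make_regression(n_samples=100, n_features=5, noise=5.0, random_state=0)
stack = StackingRegressor(
    estimators=[("ridge", CountingRidge()),
                ("tree", DecisionTreeRegressor(random_state=0))],
    cv=5,  # the scikit-learn default, written out for clarity
)
stack.fit(X, y)
# 1 fit on the full data + 5 cross-validation fits for the meta-features.
print(CountingRidge.n_fits)
```

Every base learner pays this sixfold cost, plus the final estimator's own fit, which is consistent with StackingRegressor having the longest processing times in Tables 4 to 11.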
Table 12 compares this study with existing models for predicting the properties of stabilized soil. As can be seen in Table 12, unlike other studies, this study predicted both the static and dynamic properties of the soil using several ML models.
5. Conclusions
Soils may not always be suitable to fulfill their functions. In such cases, compacted fill can be used, or the soil can be improved. Soil improvement can be accomplished by mechanical or chemical methods. One of the commonly used methods is compaction, but this method is generally not used for the improvement of fine-grained cohesive soils, which can instead be improved by chemical methods. In this study, two natural clay samples with extremely high and very high plasticity were improved with L and FA admixtures, and their properties under static and repeated loads were investigated by ML methods. The data come from the PhD thesis of one of the authors, titled “Static and cyclic properties of expansive clays treated with L and FA with special reference to swelling and resilient moduli”. Two soil samples from two different sites in Ankara, Esenboğa Clay (an extremely high plasticity clay, PI = 101%) and Bilkent Clay (a very high plasticity clay, PI = 76%), were analyzed in detail and subjected to many tests. The properties reflecting the actual effect of the improvement (UCS under non-soaked and soaked conditions, swelling percentage, swelling pressure, mean resilient modulus, UCS after RMT, Cc, and Ce), obtained from the doctoral thesis, were used as outputs, and the engineering index properties of the material were used as inputs for prediction with different ML regression techniques. SHapley Additive Explanations (SHAP) were used to explain the ML models, and R2, adjusted R2, RMSE, and processing time were used for evaluation.
The performance evaluation of the models used in this study showed very high accuracy rates in the prediction of UCS and other outputs. In UCS non-soaked prediction, GradientBoostingRegressor was the most successful model with an adjusted R2 of 0.91, R2 of 0.95, and RMSE of 63.54. Similarly, for UCS soaked prediction, GradientBoostingRegressor provided the best results with an adjusted R2 of 0.97, R2 of 0.98, and RMSE of 40.91.
For swelling pressure prediction, ExtraTreesRegressor was the most successful model with an adjusted R2 of 0.96, R2 of 0.97, and RMSE of 24.23. For the swelling percentage prediction, ExtraTreesRegressor again performed the best with an adjusted R2 of 0.95, R2 of 0.965, and RMSE of 2.32.
For the mean resilient modulus prediction, the most successful model was GradientBoostingRegressor, with an adjusted R2 of 0.98, R2 of 0.98, and RMSE of 10,827.86. For the UCS after RMT prediction, DecisionTreeRegressor stood out with an adjusted R2 of 0.99, R2 of 0.99, and RMSE of 6.04.
For the Cc estimation, AdaBoostRegressor gave the best results with an adjusted R2 of 0.99, R2 of 0.99, and RMSE of 0.008. Finally, GradientBoostingRegressor was found to be the most successful model for predicting Ce, with an adjusted R2 of 0.95, R2 of 0.98, and RMSE of 0.0046.
These results indicate that the ML models used can predict both the static and cyclic properties of stabilized clays with high accuracy, with R2 values of the best models ranging from 0.95 to 0.99. The models stand out as effective tools for modeling soil behavior, with fast processing times and low error rates.
Successful prediction of the UCS non-soaked, UCS soaked, swelling pressure, swelling percentage, MRM, UCS after RMT, Cc, and Ce outputs can contribute to the design of structures to be built on these soils, stability analysis, and the modeling of soil behavior, and can save time and cost in projects.
In future studies, the size of the dataset can be increased, and the models, which were used here with their default parameters, can be tuned with different hyperparameter optimization techniques to obtain better results.
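As one example of such a technique, the sketch below runs an exhaustive grid search with scikit-learn's `GridSearchCV` over a few `GradientBoostingRegressor` hyperparameters, scoring by R2 with 5-fold cross-validation on the training split only. The data are synthetic and the grid values are arbitrary assumptions, not settings recommended for these soils; randomized or Bayesian search are alternatives when the grid grows large.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in with 14 inputs (not the study's data).
X, y = make_regression(n_samples=200, n_features=14, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Hypothetical grid: 2 x 2 x 2 = 8 candidate settings.
param_grid = {
    "n_estimators": [100, 300],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}
search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)  # tuning uses the training data only

print(search.best_params_)
print(f"mean CV R2 = {search.best_score_:.3f}")
print(f"test R2 = {search.score(X_test, y_test):.3f}")
```

Keeping the test set out of the search preserves it as an unbiased check of the tuned model.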