1. Introduction
The tea plant (Camellia sinensis (L.) O. Kuntze) is the source of one of the world’s most popular and well-known non-alcoholic beverages. This plant, grown in more than 30 countries, including China, India, Kenya, Sri Lanka, Vietnam, Japan, and Türkiye, is native to Southeast Asia and is a member of the Theaceae family of the order Ericales [1]. C. sinensis is an evergreen perennial that can naturally reach a height of up to 15 m. It produces white flowers and is cross-pollinated [2]. The plant’s immature shoot tips, rich in chemical compounds, are used in tea processing. Tea has been documented as early as the 6th and 7th centuries BCE. It was reportedly first used medicinally, especially to treat stomach problems. However, the literature offers no conclusive account of how tea came to be consumed as a drink [3]. Depending on the degree of oxidation and fermentation, consumers are offered various teas, including oolong, black, green, and white. Oolong is semi-fermented, black tea is fermented, and white and green teas are unfermented. While oolong and green tea are the most popular teas in Asia, black tea is more commonly drunk in India and in European nations [4,5].
Around 4000 bioactive compounds have been found in tea, with polyphenols accounting for one-third of the total. Aside from polyphenols, tea also contains small quantities of minerals, amino acids, carbohydrates, and caffeine. Notably, the polyphenol content differs according to the degree of fermentation [6,7]. Furthermore, tea has been reported to contain mg/g levels of Na⁺, Mg²⁺, Ca²⁺, Mn²⁺, and K⁺ and µg/g levels of Zn²⁺, Ni²⁺, Cr³⁺, Cd²⁺, Fe²⁺, Cu²⁺, and Co²⁺ [8]. In addition to its antioxidant and anticancer qualities, tea has been shown to have numerous other health advantages, including the ability to repair DNA damage. Tea has been reported to lower the incidence of cancers of the mouth, stomach, esophagus, bladder, breast, small intestine, pancreas, and lung [9,10]. Interest in this plant has grown because of its many health benefits and its antifungal and antibacterial activity [4,11]. Because tea is both commercially valuable and health promoting, its production has been rising annually in parallel with consumption. China leads the world in tea leaf production, accounting for 48.87% of global production in 2022 according to the 2023 FAO data. India (20%), Kenya (7.8%), Sri Lanka (4.7%), and Türkiye (4.3%) have the next-highest shares [12].
Traditionally, the tea plant is propagated by cuttings and seeds. Until the early 1800s, seeds were the only mode of propagation. However, genetic variation arising during seed propagation leads to notable differences among plants, and cuttings came to be used as an alternative. Cutting propagation has documented problems of its own. The most notable is the time a single plant needs (approximately 12 months) to grow from a cutting and be ready for planting [13,14]. As a result, increasing the number of plants for the commercial market and propagating tea clonally have become complex tasks [15]. In vitro tissue culture technologies promise to address these issues and to facilitate the rapid, high-quality commercial propagation of tea plants.
Machine learning (ML) modeling has gained popularity in plant biotechnology. However, there is a paucity of supportive data, particularly on in vitro plant tissue culture. Various machine learning algorithms have been applied to optimize diverse aspects such as in vitro sterilization, in vitro germination, in vitro cell culture, in vitro micropropagation and rooting, in vitro drought stress, and in vitro somatic embryogenesis [16,17,18,19,20,21,22].
Artificial neural networks (ANNs), like the human brain, are constructed from interconnected processing neurons that can decipher information through weighted communication links [23]. These networks are used to statistically establish relationships between independent (input) and dependent (output) variables, facilitating the analysis of information exchange between them.
Conventional statistical approaches struggle to interpret the large volume of information embedded in extensive datasets of biological interactions, such as micropropagation, because these datasets are intricate, noisy, and potentially misleading and arise from multifactorial processes [24]. Nevertheless, recent work has achieved accurate prediction and optimization of plant tissue culture procedures using various machine learning models. These approaches employ neural networks [25,26] and machine learning models based on decision trees [27].
ML has emerged as a pivotal asset in plant science research, notably within in vitro plant studies, enhancing the breadth and depth of investigations in plant breeding, genomics, and biotechnology [28]. These techniques have facilitated the analysis of bioelectrical patterns in plants, revealing their acute sensitivity to environmental changes [29]. Furthermore, ML has significantly advanced plant molecular research by capitalizing on the vast amount of available sequencing data [30]. The application of ML algorithms in plant omics and the enhancement of agronomic traits has shown promise in amplifying research outcomes [31]. Additionally, modeling plant behavior through data from plant sensors has yielded a more nuanced comprehension of plant performance over time [32]. Deep learning, a subset of machine learning, has been instrumental in plant disease detection, phenotyping, and image analysis, marking a significant advancement in the field [33]. A focused area of ML application includes plant disease detection, classification, and prediction, underscoring its transformative potential in plant health management strategies [34,35,36,37]. ML algorithms have also been used to differentiate between plant stress phenotypes and to identify disease indicators through hyperspectral imaging [38,39].
Furthermore, ML has played a pivotal role in predicting the effects of electromagnetic radiation on plants and in developing models to assess plant sensitivity to radiofrequency electromagnetic fields [40]. The synergy of machine learning with technologies such as hyperspectral imaging, LIDAR sensors, and uncrewed aerial vehicles (UAVs) has opened new avenues for plant species classification, canopy density mapping, and health assessment [41]. Moreover, ML’s application in automating weed identification and enhancing precision agriculture showcases its utility in generating novel insights [42,43]. In vitro plant research, particularly when integrated with machine learning techniques, enables the detailed analysis and modeling of plant responses under controlled conditions that simulate various stress factors, including pathogenic and environmental stresses. Large datasets from in vitro experiments can be analyzed with machine learning algorithms to identify patterns and biomarkers indicative of disease susceptibility or resilience to environmental stressors. This can be extrapolated to real-world applications by developing predictive models that aid in early disease detection and in assessing the impact of environmental stresses on plant health. For instance, machine learning models can be trained using data from in vitro studies where plants are exposed to specific pathogens or stress conditions. These models can then predict disease progression or stress responses based on early indicators detected in plant tissues, which are often subtle and complex [44]. By leveraging ML algorithms, researchers can gain deeper insights into plant physiology, behavior, and health, significantly advancing agricultural and plant science endeavors.
This study aims to establish a micropropagation protocol for the commercially important tea plant, C. sinensis, a crop whose global production is rising. It integrates machine learning (ML) and artificial neural networks (ANNs) into plant tissue culture in order to model and improve plant growth under in vitro conditions. The study uses the predictive power of ML and ANNs to handle the complex dynamics of micropropagation and represents the first application of these computational models in tea plant research. By combining biotechnological experiments with ML models, we aim to go beyond traditional analytical methods and to predict the intricate interactions that govern plant tissue culture. This integration is intended to improve the precision and efficiency of tea plant micropropagation and to benefit commercial cultivation practices. By producing genetically uniform and vigorous plantlets faster, the approach promises to support high-quality crop production and to help meet the increasing demands of global markets.
2. Materials and Methods
2.1. Plant Material
Fresh shoots from 2–3-year-old tea plants (Camellia sinensis) were sourced from a commercial tea producer in Ardeşen, Rize, Turkey. These shoots served as explants for subsequent in vitro culture.
2.2. Sterilization and Culture Conditions
Explants were rinsed with tap water for 3–5 min to remove surface debris. Subsequently, they were sterilized by immersion in a 70% ethanol solution for two minutes within a sterile laminar airflow cabinet. Three washes with sterile distilled water followed this. Explants were further treated with a 3% sodium hypochlorite solution for three minutes and washed three times with sterile distilled water to ensure complete sterilization. Following this, explants were transferred to culture media.
2.3. In Vitro Micropropagation
Plantlets were micropropagated on two culture media containing different PGR concentrations, with a PGR-free medium as the control. The first medium (SM1) consisted of MS medium (Duchefa Biochemie, Haarlem, The Netherlands) supplemented with 30 g/L sucrose, 0.1 g/L myo-inositol, 1.0 mg/L BAP (6-benzylaminopurine) (Duchefa Biochemie, Haarlem, The Netherlands), 0.5 mg/L GA3 (gibberellic acid) (Duchefa Biochemie, Haarlem, The Netherlands), 0.1 mg/L thiamine HCl, and 2 mg/L glycine. The second medium (SM2) had the same composition except that the BAP concentration was 2.0 mg/L. Agar (7.5 g/L) was added to solidify both media, and their pH was adjusted to 5.8 using 1 N HCl and 1 N NaOH. The media were then sterilized by autoclaving at 121 °C and 103.4 kPa for 20 min before being dispensed into sterilized disposable plastic containers and glass tubes. The sterilized plantlets were subsequently transferred to these media for micropropagation. The plantlets were subcultured into fresh media every 4 weeks, for a total of three subcultures. Cultures were incubated in growth chambers under a 16/8 h photoperiod at 25 ± 2 °C. The photosynthetically active radiation (PAR) was set to 100 µmol/m²/s to provide adequate light for photosynthesis while preventing photoinhibition. The relative humidity within the culture environment was maintained at 75% to facilitate efficient physiological processes without risking tissue desiccation. The multiplication coefficient, which measures the propagation efficiency over the subculture period, was calculated by subtracting the initial count of plantlets at the beginning of the subculture cycle from the total number of plantlets obtained at the end and then dividing the result by the initial count.
This calculation provides a quantitative measure of the proliferative capacity of the plantlets under the given culture conditions.
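The calculation described above can be written as a short function. This is a minimal sketch rather than the authors' code: the function name and the example counts are hypothetical and are not data from the study.

```python
def multiplication_coefficient(initial_count: int, final_count: int) -> float:
    """Return (final - initial) / initial for one subculture cycle."""
    if initial_count <= 0:
        raise ValueError("initial_count must be positive")
    return (final_count - initial_count) / initial_count


# Hypothetical counts for illustration only (not data from the study):
# 30 plantlets at the start of a cycle and 81 at the end.
example = multiplication_coefficient(30, 81)
print(round(example, 2))  # 1.7
```

A value of 0 means the number of plantlets did not change during the cycle, and a value of 1 means it doubled.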
2.4. In Vitro Rooting
Root induction was performed using modified MS media. Medium 1 (RM1) contained 30 g/L sucrose, 0.1 g/L myo-inositol, 0.5 mg/L indole-3-butyric acid (IBA) (Duchefa Biochemie, Haarlem, The Netherlands), 0.5 mg/L GA3, 0.1 mg/L thiamine HCl, and 2 mg/L glycine. Medium 2 (RM2) had a higher IBA concentration of 2 mg/L, and a PGR-free medium was used as the control. Media were prepared and sterilized as described for micropropagation. Rooting was carried out in growth chambers under the same conditions as micropropagation.
2.5. Statistical Analysis
In this study, analysis of variance (ANOVA) was performed in R (version 4.3.1) to test for differences among the shoot and root media. ANOVA was used to compare the mean values of each measured trait across the media, which allowed the variation attributable to the experimental conditions to be examined. Traits showing significant differences were then subjected to the least significant difference (LSD) test. The analysis incorporated three replications, each containing 30 plantlets.
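The ANOVA and LSD steps can be sketched as follows. The study ran these analyses in R; this Python version is an illustrative assumption, not the authors' script. The replicate values are hypothetical, and the critical t value (2.447, the two-sided 5% value for 6 error degrees of freedom from standard tables) is passed in by hand rather than computed.

```python
from math import sqrt
from statistics import mean


def one_way_anova(groups):
    """Return (F statistic, error mean square, error degrees of freedom)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation between treatment means and variation within treatments.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, ms_within, df_within


def lsd(ms_within, n_per_group, t_crit):
    """Least significant difference for treatments with equal replication."""
    return t_crit * sqrt(2 * ms_within / n_per_group)


# Hypothetical replicate means for three treatments (not data from the study).
control = [1.0, 1.1, 0.9]
sm1 = [1.4, 1.5, 1.6]
sm2 = [1.7, 1.8, 1.7]

f_stat, mse, df_error = one_way_anova([control, sm1, sm2])
threshold = lsd(mse, n_per_group=3, t_crit=2.447)

# Two treatment means differ at the 5% level when their gap exceeds the LSD.
sm2_differs_from_sm1 = abs(mean(sm2) - mean(sm1)) > threshold
```

Treatments whose means differ by more than the LSD receive different letters in the mean-separation notation.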
2.6. Modeling Procedure
This study used two decision-tree-based machine learning (ML) algorithms, random forest (RF) and extreme gradient boosting (XGBoost), along with an artificial neural network (ANN) based multilayer perceptron (MLP), to model and predict the efficiency of tea micropropagation and rooting through various plant growth regulators. Using a 10-fold cross-validation method, the dataset was split into training and testing subsets to thoroughly evaluate the prediction capabilities of ANN and decision-tree-based machine learning algorithms.
The PGR media were the input variables, and the micropropagation rate, plant height (cm), leaf number, number of roots, and length of roots (cm) were the target (output) variables. Coding was implemented in R (version 4.3.1) using the caret and kernlab packages.
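The 10-fold cross-validation split can be illustrated with a short sketch. The study used R with the caret package; the Python function below is a hypothetical stand-in that only shows how the fold indices are formed, and the sample count of 90 is an assumed value for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, like dealing cards.
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


# Assumed sample count for illustration only.
splits = list(k_fold_indices(n_samples=90, k=10))
```

Each observation appears in exactly one test fold, so every model is evaluated on data it did not see during training, and the reported metrics are averaged over the ten folds.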
Three metrics were used to evaluate and compare the effectiveness and accuracy of the ANN-based MLP and the decision-tree-based RF and XGBoost models in simulating and forecasting the micropropagation and rooting of tea under plant growth regulators: the coefficient of determination (R2), which evaluates the correlation between the model and the dependent variable; the root mean square error (RMSE), which shows the alignment of the regression line with the observed data points; and the mean absolute error (MAE), which gives the average error between observed and predicted values (Equations (1)–(3)).

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{2}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{3}$$

where $y_i$ = actual value, $\hat{y}_i$ = predicted value, $\bar{y}$ = mean of the actual values, and n = sample count.
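The three evaluation metrics can be computed directly from paired actual and predicted values. The sketch below uses the standard definitions of R2, RMSE, and MAE; the function names and the example values are hypothetical and are not results from the study.

```python
from math import sqrt


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / n


# Hypothetical values for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

scores = (r_squared(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two can rank models differently even when R2 values are close.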
An MLP is organized into several layers: an input layer, one or more hidden layers, and an output layer. It is trained in a supervised manner, using both the input and output variables of the training set. Training continues until the error defined in Equation (4) is minimized [20].
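To make the layered structure concrete, the sketch below runs a forward pass through an MLP with one hidden layer and measures its error on a few training pairs. It is an illustration under stated assumptions, not the network used in the study: the tanh activation, the mean squared error loss, the one-hot encoding of the media, and all weights and targets are hypothetical, and the loss is not claimed to be the exact form of Equation (4).

```python
from math import tanh


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Input layer -> one tanh hidden layer -> single linear output."""
    hidden = [
        tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


def mean_squared_error(inputs, targets, params):
    """Average squared gap between network outputs and targets (assumed loss)."""
    errors = [(mlp_forward(x, *params) - y) ** 2 for x, y in zip(inputs, targets)]
    return sum(errors) / len(errors)


# Hypothetical one-hot encoding of the media: control, SM1, SM2.
inputs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# Hypothetical target values (not data from the study).
targets = [1.0, 1.5, 1.7]
# Hypothetical parameters: hidden weights, hidden biases, output weights, output bias.
params = ([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.0], [1.0, 1.0], 1.0)

loss = mean_squared_error(inputs, targets, params)
```

Supervised training would repeatedly adjust the weights and biases to reduce this error, for example by gradient descent with backpropagation.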
As a member of the gradient-boosted decision tree family, XGBoost is well known for its speed and performance. Within the gradient-boosting framework, it learns from the errors of earlier trees and progressively reduces the error rate over multiple iterations.
Equation (5) shows the XGBoost objective function, and Equation (6) shows the XGBoost iterative model.
In Equation (5), F is the function space of the tree model and f_d represents an independent tree structure that assigns each individual i to one leaf. In Equation (6), l is the convex loss function; it measures the difference between the observed value y and the predicted value ŷ.
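For orientation, the formulation commonly given for XGBoost in the general literature is reproduced below. It is offered as a reference sketch only: the numbering and notation may differ from the article's Equations (5) and (6), and the regularization term Ω with its parameters γ and λ, the leaf count T, and the leaf weights w are symbols taken from that general formulation rather than from this text.

```latex
% Additive tree model: the prediction is the sum of D trees from the function space F
\hat{y}_i = \sum_{d=1}^{D} f_d(x_i), \qquad f_d \in F

% Regularized objective: convex loss l plus a complexity penalty on each tree
\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{d=1}^{D} \Omega(f_d),
\qquad \Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^{2}

% Iterative update: each round t adds one tree fitted to the remaining error
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)
```

The iterative update is what allows the model to learn from its earlier errors: each new tree is chosen to reduce the objective given the predictions accumulated so far.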
Breiman [45] created the RF approach, which is effectively an ensemble of unpruned trees. The RF approach is well known for its effectiveness and ease of use, and it has proven successful in both regression and classification problems. It uses bagging, also referred to as bootstrap aggregation: each tree is trained on a bootstrap sample of the data, and the final decision is obtained by aggregating the outputs of the trained trees. The basic idea of the RF model is shown in Equation (7).
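The bagging idea can be sketched in a few lines. This is a deliberately simplified illustration, not the RF implementation used in the study: with a single categorical input (the medium), a fully grown regression tree reduces to a lookup of the mean response per level, so that lookup is used as the base learner, and the random feature selection that a full random forest adds at each split is omitted. All names and data are hypothetical.

```python
import random
from statistics import mean


def fit_level_means(rows):
    """Base learner: mean response per treatment level.

    With one categorical input, a fully grown regression tree
    reduces to this lookup table.
    """
    by_level = {}
    for level, y in rows:
        by_level.setdefault(level, []).append(y)
    return {level: mean(ys) for level, ys in by_level.items()}


def bagged_predict(rows, level, n_trees=50, seed=0):
    """Average the predictions of learners trained on bootstrap samples."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_trees):
        # Bootstrap: draw len(rows) observations with replacement.
        sample = [rng.choice(rows) for _ in rows]
        model = fit_level_means(sample)
        if level in model:  # a bootstrap sample can miss a level entirely
            predictions.append(model[level])
    return mean(predictions)


# Hypothetical (medium, response) pairs, not data from the study.
rows = [
    ("control", 1.0), ("control", 1.1), ("control", 0.9),
    ("SM1", 1.4), ("SM1", 1.5), ("SM1", 1.6),
    ("SM2", 1.7), ("SM2", 1.8), ("SM2", 1.7),
]

sm2_prediction = bagged_predict(rows, "SM2")
control_prediction = bagged_predict(rows, "control")
```

For regression the aggregation step is an average of the individual predictions, as here; for classification it is a majority vote.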
3. Results
In this study, we investigate how different shooting and rooting conditions affect important growth parameters during the micropropagation of tea. The multiplication coefficient, number of leaves, plant height (cm), number of roots, and root length (cm) are important metrics that provide information on the effectiveness of various media compositions. Using a careful examination, we reveal the complex relationship between growth results and media compositions, giving insight into the elements that propel productive tea plant propagation.
Table 1 presents the findings from a study examining the impact of various plant growth regulators (PGRs) on micropropagation traits. The experiment was structured into two segments, shooting traits and rooting traits, assessing the efficacy of different concentrations and combinations of PGRs.
For the shooting traits, the study evaluated three treatments: a control group without PGRs, a group receiving 1 mg/L BAP combined with 0.5 mg/L GA3, and a group receiving 2 mg/L BAP combined with 0.5 mg/L GA3. The results demonstrated a clear dose–response relationship, with increasing concentrations of BAP significantly enhancing both the multiplication coefficient and the number of leaves. Specifically, the highest concentration of BAP (2 mg/L) combined with 0.5 mg/L GA3 yielded the greatest multiplication coefficient (1.74) and the highest number of leaves (12.68). A similar trend was evident for plant height, with the tallest plants, measuring 9.96 cm, observed in the same treatment. These findings were supported by an LSD value of 0.1, with a p-value less than 0.0001 confirming significant differences across the treatment groups. The differences are denoted by letters, with ‘a’ representing the highest means and ‘c’ the lowest, illustrating the effectiveness of the PGR treatments in enhancing plant growth metrics.
For the rooting traits, the focus shifted to the effects of IBA combined with GA3. As with the shooting traits, increasing the concentration of IBA from 0 mg/L in the control to 2 mg/L significantly improved rooting: the average number of roots rose from 0.18 to 1.64, and the average root length rose from 0.13 to 1.56 cm. This positive relationship suggests that IBA is particularly effective in promoting root development, a critical aspect of plant establishment in tissue culture.
Machine Learning Analysis
Table 2 summarizes the predictive performance of three machine learning models, RF, XGBoost, and MLP, for the micropropagation and rooting parameters. The evaluation measures used, namely R2, RMSE, and MAE, are essential for determining how well each model captures the nuances of the examined plant growth processes.
The MLP model demonstrates excellent predictive accuracy, as indicated by high R2 values, particularly for the number of leaves and roots, which are both at 0.93. This suggests that MLP is particularly effective in capturing the underlying patterns for these traits. The parameters’ low RMSE and MAE values further affirm the model’s precision, indicating minimal deviations from the actual values.
While the RF model shows slightly lower R2 values compared to MLP, it still maintains robust predictive capabilities, especially for rooting parameters, where it matches the high R2 values noted in XGBoost. The RF model exhibits slightly higher RMSE values, yet it achieves very low MAE values for rooting parameters, highlighting its efficiency in average error minimization.
XGBoost shows a strong performance, particularly in rooting parameters, with R2 values peaking at 0.93 for the number of roots and 0.92 for root length. These high values reflect XGBoost’s capability to handle complex data interactions, likely prevalent in root development. The RMSE and MAE values are competitive and consistent, suggesting that XGBoost effectively balances accuracy and error minimization.
Overall, all three models exhibit high consistency in predictive accuracy across the measured parameters, with particular strengths noted in specific areas. The results underscore the potential of these models in enhancing predictive modeling applications in plant science, providing valuable insights that could help optimize micropropagation practices and improve understanding of plant growth dynamics under controlled conditions. Each model has shown specific strengths that make them suitable for different aspects of predictive modeling in plant science, highlighting the importance of model selection based on the nature of the data and the specific traits being analyzed. However, overall, the XGBoost model showed the best performance for several reasons: consistently high R2 values, balanced error metrics, and performance in complex data patterns. While the MLP and RF models also performed well, particularly MLP with very high R2 values in some parameters, XGBoost’s consistent performance across all metrics and parameters slightly edges out the others, making it the best model in this analysis for predicting micropropagation and rooting parameters. This makes XGBoost particularly valuable for complex and varied datasets commonly encountered in plant science research.
4. Discussion
In our study, we explored the impact of different shooting and rooting conditions on key growth parameters during the micropropagation of tea: the multiplication coefficient, number of leaves, plant height, number of roots, and root length, all of which indicate the effectiveness of the media compositions. Our results demonstrate significant differences in growth metrics across the two media conditions, SM1&RM1 and SM2&RM2. These differences were evidenced by the mean values and corresponding p-values derived from an ANOVA, followed by post hoc comparisons using the LSD test. The findings indicate a pronounced influence of the media environment on the growth metrics, with media composition playing a crucial role in determining plant growth and development. As with many other plants, the tissue culture of tea has a long history. This research aims to identify a more practical process for the commercial production of tea and to establish a protocol that facilitates the production of many plants per unit of area and time. Despite potential variations due to the plant material studied, the source of the explant, or laboratory conditions, reporting on micropropagation is invaluable for such a significant plant. Our study preferred chemicals that are easily accessible and commonly found in many countries. Growth regulators such as BAP for micropropagation and IBA for in vitro rooting were selected based on their widespread availability and effectiveness. Similar studies corroborate our findings, highlighting the responsiveness of tea plant nodal explants to BAP and IAA in specific media compositions, leading to an increase in multiple shoots per explant. For instance, one study reported that nodal explants of a tea clone responded better and produced 12.6 ± 1.9 multiple shoots per explant when supplied with BAP (4 mg/L) and IAA (0.2 mg/L) in ½ MS medium [46]. Bag et al. [47] tested practices to enhance the acclimatization success of tea, noting the effectiveness of waiting 12 weeks for the root induction of micropropagated plantlets and the positive influence of using larger bottles (500 mL) instead of 250 mL containers on growth, stem thickness, leaf number, and leaf area.
Further, Sun et al. [48] utilized plant growth regulators for callus induction and subsequent plant regeneration in different explants of an elite tea (C. sinensis) clone. A combination of thidiazuron (TDZ), BAP, and IBA was used, with varying degrees of success depending on the explant and the growth regulator concentrations. Similarly, in another study, TDZ was found to induce in vitro shoot proliferation in tea seedlings at lower concentrations and to induce callus formation at higher concentrations [49]. Sarathchandra et al. [50] investigated somatic embryogenesis in callus cultures derived from young leaves and nodal explants of tea plants, observing callus formation and embryo-like structures under specific conditions. Samarina et al. [51] examined genetic stability and variability among micropropagated tea plantlets over a long-term period. The morphological variability and instances of aneuploidy in their findings complement our results, where different media compositions led to variable growth outcomes. These results underline the importance of selecting and stabilizing media conditions to minimize genetic and morphological variability in micropropagation. Additionally, the impact of plant growth regulators on embryogenesis in tea reported by Wachira and Ogada [52] is consistent with our observations on how these regulators influence different growth metrics in micropropagation. Their finding that embryogenic competence was reduced by increasing concentrations of growth regulators supports our recommendation to optimize regulator concentrations to enhance growth outcomes. Furthermore, the work of Molina et al. [53] on the regeneration effectiveness of different media for tea clones provides a practical foundation that aligns with our findings. Their identification of the optimal media conditions for specific clones complements our study’s emphasis on media optimization, illustrating the broader applicability of these findings in tailoring micropropagation practices to different genetic backgrounds. Also, the study by Gonbad et al. [54] on protocol optimization using nodal segments highlights the critical role of precise growth regulator combinations and concentrations, as we also noted in our study. The comparison between their findings and ours further emphasizes the significance of detailed, condition-specific studies for improving micropropagation efficiency and success.
In summary, the comparative discussion of our results within the context of the existing literature, including studies [46,55], underscores the significance of media composition in the micropropagation of tea. The substantial differences observed between Medium 1 and Medium 2 across all metrics underline the crucial role of media environments in optimizing growth conditions. These insights contribute to the broader body of knowledge, enabling researchers and practitioners to refine agricultural and horticultural practices for improved plant propagation.
In our study on tea micropropagation, we investigated the performance of three machine learning models, RF, XGBoost, and MLP, across various growth parameters, including the multiplication coefficient, leaf count, plant height, number of roots, and root length. Our analysis, summarized in Table 2, demonstrates that the MLP model, with the highest R2 scores across multiple metrics, is superior in capturing the nuances of tea plant growth under varied conditions, outperforming both RF and XGBoost in consistency across the evaluated parameters.
Comparing our findings with those from similar studies reveals a broader landscape of machine learning applications in plant science. For instance, the study by Şimşek [20] on strawberry cultivars under drought stress highlighted the RF model’s exceptional accuracy in predicting water stress effects, demonstrating the model’s adaptability across strawberries with different genetic backgrounds. This echoes our findings, where RF also showed strong performance, especially in predicting the number of roots with an R2 of 0.90, underscoring the effectiveness of RF in plant sciences when dealing with specific growth traits. Further, the application of machine learning in the lavender study by Şimşek et al. [20] revealed varying degrees of effectiveness across different models, including MLP, RBF, XGBoost, and GP, in predicting plant characteristics. The study noted XGBoost’s impressive performance in predicting root length, which is similar to our observations, where XGBoost displayed intermediate performance across the micropropagation parameters of tea. This similarity underscores the potential of XGBoost in handling complex datasets across different plant species.
On the other hand, the exploration by Pepe et al. [24] of Cannabis sativa L. in vitro culture optimization using MLP, GRNN, and ANFIS models indicated GRNN’s superior predictive performance over the others. This contrasts with our findings, where MLP outperformed RF and XGBoost, suggesting that the choice of model may depend on the specific characteristics of the plant species and the growth parameters under investigation. The study by Aasim et al. [55] on the in vitro germination of hemp further illustrates the potential of machine learning, particularly emphasizing the standout performance of the random forest model. This aligns with our results, highlighting RF’s strong predictive capability, especially in rooting parameters. It underscores the utility of RF in scenarios requiring high precision in predictions, such as in vitro germination and seedling growth traits.
These comparative insights across studies illuminate the machine learning models’ diverse utility and performance in plant science research. While MLP emerged as the most consistent performer in our tea micropropagation study, the superior performance of RF in predicting specific growth traits, as seen in both our study and others, showcases the model’s robustness. XGBoost’s intermediate performance, alongside the varying effectiveness of models like GRNN in cannabis research, highlights the importance of model selection based on the specific dataset and research objectives. The comparison validates the significant role of machine learning in enhancing agricultural and horticultural practices and emphasizes the necessity for tailored approaches in choosing the most appropriate model for specific plant studies.