Article

Identification of Key Genes Involved in Seed Germination of Astragalus mongholicus

1 Industrial Crop Institute, Shanxi Agricultural University, Fenyang 032200, China
2 School of Pharmacy, Shanxi Medical University, Taiyuan 030001, China
3 College of Agriculture, Shanxi Agricultural University, Jinzhong 030801, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Int. J. Mol. Sci. 2024, 25(22), 12342; https://doi.org/10.3390/ijms252212342
Submission received: 14 October 2024 / Revised: 12 November 2024 / Accepted: 14 November 2024 / Published: 17 November 2024
(This article belongs to the Section Molecular Informatics)

Abstract

Seed germination is a fundamental process in plant reproduction, and it involves a series of complex physiological mechanisms. The germination rate of Astragalus mongholicus (AM) seeds is low under natural conditions. To investigate the key genes associated with AM seed germination, AM seeds were collected at 0, 12, 24, and 48 h of germination for a transcriptomic analysis, weighted gene co-expression network analysis (WGCNA), and machine learning (ML) analysis. The primary pathways involved in AM seed germination include plant-pathogen interactions and plant hormone signaling. Four key genes were identified through the WGCNA and ML: Cluster-28,554.0, FAS4, T10O24.10, and EPSIN2. Their expression was validated using real-time quantitative reverse transcription PCR (qRT-PCR), and the results showed a high degree of concordance with the RNA sequencing data. This study reveals, for the first time, the key genes related to AM seed germination, providing potential gene targets for further research. The discovery of N4-acetylcytidine (ac4C) modification during seed germination not only enhances our understanding of plant ac4C but also offers valuable insights for future functional research and application exploration.

1. Background

Seed germination is a critical stage in the plant-growth process [1,2]. Its primary purpose is to produce offspring and maintain species competitiveness in the natural environment. Additionally, it serves as a vector for the transmission of microorganisms and pathogens [3]. The appropriate temporal and spatial distribution of seed germination is essential for the survival and reproduction of seed plants [4]. The germination process begins with the uptake of water by mature, dry seeds, which leads to seed expansion and culminates in the formation of the radicle. This process encompasses a series of physiological and morphological events, including energy conversion, nutrient consumption, and changes in metabolic products [5], as well as alterations at the genetic level.
Astragalus membranaceus var. mongholicus (Bunge) P.K. Hsiao is a significant medicinal herb belonging to the Leguminosae family (Astragalus L.). This perennial plant is primarily found in regions such as Shanxi, Gansu, Nei Mongol, and other areas of China. Due to its remarkable pharmacological properties, the roots of Astragalus mongholicus (AM) are widely utilized in traditional medicine. However, wild populations of AM are becoming increasingly scarce due to overharvesting, resulting in its classification as a vulnerable species in The Red List of Endangered Plants in China. AM mainly propagates through seeds, but its hard seed coat poses challenges for natural breeding in the wild. Even in artificial cultivation, methods such as scarification are required to overcome the impermeability of the seed coat and increase germination rates [6]. This indicates that the seeds of Astragalus exhibit dormancy characteristics, and a low germination rate is a key internal factor affecting its cultivation success [7]. Previous studies on improving the germination rate of AM seeds have provided some insights. Conditions of both full light and complete darkness have no significant effect on the germination of AM seeds [8]. Treatment with gibberellins and hydrogen peroxide can increase the germination rate of hard Astragalus seeds [9]. Low concentrations of polyethylene glycol and NaCl also do not promote the germination of AM seeds [10]. Nevertheless, methods such as sanding, soaking in warm water, using 75% alcohol [11], and immersing in 98% concentrated sulfuric acid [12] can effectively break seed coat dormancy and improve the germination rate of AM seeds.
Machine learning (ML) is a crucial branch of artificial intelligence that focuses on enabling computers to learn from data and make predictions or decisions using algorithms and statistical models. These methods can be categorized into supervised and unsupervised techniques, both of which hold significant potential for analyzing relationships in high-dimensional data [13]. Furthermore, ML is particularly effective in evaluating high-dimensional transcriptomic data and identifying biologically significant features [14,15]. In clinical settings, a weighted gene co-expression network analysis (WGCNA) is frequently combined with ML to identify diagnostic biomarkers for diseases [16,17]. However, this approach is seldom utilized to screen for genes and pathways associated with plant physiological activities.
In this study, we used a transcriptomic analysis, a WGCNA, and ML to identify key genes involved in the germination of AM seeds. By examining the regulation of these key genes and pathways throughout the germination process, we aimed to elucidate the genes and pathways implicated in AM seed germination. This research provides valuable insights into the physiological and metabolic changes that occur during the germination of AM seeds.

2. Results

2.1. Functional Annotation of the Transcriptome During AM Seed Germination

To investigate the mechanism of seed germination in AM, we performed RNA-Seq analysis to assess transcript-level changes across four germination stages, with each stage comprising three replicates. The raw data were derived from 12 cDNA libraries, resulting in a total of 78.87 GB of clean data, with each sample’s clean data exceeding 6 GB. The Q30 value for all samples was greater than 94% (Table S1), indicating that the high-quality transcriptome sequencing data were appropriate for further analysis. A total of 175,831 transcripts were assembled, yielding an N50 length of 2105 bp. The completeness of the transcripts was evaluated using BUSCO software (v4.0.6, Bioinformatics Group, VIB, Ghent University, Ghent, Belgium) (Figure S1). Hierarchical clustering was conducted based on the read count and expression patterns of the transcripts aligned with Corset. The longest cluster sequence obtained from the hierarchical clustering was designated as a unigene for subsequent analysis, resulting in a total of 103,152 unigenes with an N50 of 2341 bp (Tables S2 and S3). To acquire comprehensive information on the assembled transcriptome, similarity searches were performed against the Kyoto Encyclopedia of Genes and Genomes (KEGG), Non-redundant Protein Sequence Database (Nr), SwissProt, Gene Ontology (GO), Eukaryotic Orthologous Groups of Proteins (KOG), Translated EMBL Nucleotide Sequence Data Library (TrEMBL), and Protein Families Database (Pfam) databases, using a significance threshold of E ≤ 10⁻⁵ and the BLAST algorithm. Among these, 60,024 sequences (58.19% of the total) were annotated in KEGG, 77,698 (75.32%) in Nr, 56,918 (55.18%) in SwissProt, 67,948 (65.87%) in GO, 48,784 (47.29%) in KOG, 77,633 (75.26%) in TrEMBL, and 52,232 (50.64%) in Pfam (Table S4).
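The N50 values reported above summarize assembly contiguity. As a minimal sketch of how such a statistic is commonly computed (the function name and the toy lengths are illustrative assumptions, not the authors' data or tooling):

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical transcript lengths in bp (not taken from the study).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

Because the statistic is weighted by length, a few long transcripts can raise the N50 well above the median length, which is why it is usually reported alongside completeness checks such as BUSCO.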

2.2. Differential Gene Expression and Enrichment Analysis

The DESeq2 package (v1.22.2, Bioconductor, Seattle, WA, USA) was employed for differential gene expression analysis. Using the criteria of |log2 Fold Change| ≥ 1, FDR < 0.05, and p-value < 0.01, we identified the upregulated and downregulated genes in each comparison. As the duration of seed germination increased, the number of differentially expressed genes (DEGs) gradually rose (Figure 1A). Venn diagrams comparing the time points of 12 h-vs-0 h, 24 h-vs-0 h, and 48 h-vs-0 h indicated that 4950 genes exhibited significant changes throughout the germination process (Figure 1B). GO enrichment analysis of DEGs across the four germination stages revealed that these genes were broadly distributed among three functional categories: biological process, molecular function, and cellular component (Figure 1C). In the biological process category, the majority of genes were involved in the generation of precursor metabolites and energy, indicating that the transition from dormancy to an active growth state necessitates extensive breakdown of seed-storage materials for energy and precursor metabolites. This process supports cell growth and division, ultimately driving seedling emergence and growth. In the cellular component category, most genes were localized to the ribosome, underscoring the critical role of protein synthesis during seed germination, as newly synthesized proteins are essential for reestablishing cellular functions and promoting seedling growth. In the molecular function category, many genes were associated with the mitochondrial envelope, highlighting the importance of mitochondrial activation for energy production during seed germination. This underscores the active involvement of mitochondria in regulating intracellular energy balance and metabolic activities, which ensures a continuous energy supply for seed germination. Together, these findings emphasize the crucial role of mitochondria in supporting the energy needs necessary for successful germination.
KEGG enrichment analysis of the DEGs revealed significant enrichment in metabolic pathways across the three time points, indicating that these pathways were highly active during seed germination. This finding demonstrated that metabolic regulation, which encompasses the breakdown of storage materials, energy generation, and the synthesis of new cellular components, plays a crucial role in the germination of AM seeds (Figure 1D).

2.3. Identification of Coexpressed Gene Modules Associated with Seed Germination in AM

To understand the biological processes involved in the germination of AM seeds from a holistic network perspective, we conducted a WGCNA (Figure 2). The pickSoftThreshold function in the WGCNA package (version 1.71, Peter Langfelder and Steve Horvath, University of California, Los Angeles, CA, USA) in R was used to calculate the optimal power value with an R-squared cut-off of 0.85, which yielded a soft threshold of 20 (Figure 2A). The WGCNA grouped the transcripts obtained from sequencing into 15 distinct modules. The majority of transcripts were assigned to the blue and turquoise modules, with fewer transcripts in the tan and salmon modules (Figure 2B,C). Gene expression in different modules was significantly correlated with specific germination stages. In the dry seed stage (0 h-1, 0 h-2, and 0 h-3), the pink, cyan, and greenyellow modules were significantly upregulated, indicating a high expression of genes within these modules during this phase. In the seed imbibition stage (12 h-1, 12 h-2, and 12 h-3), the green, blue, and salmon modules were significantly upregulated, suggesting active gene expression in these modules during imbibition. During the seed-germination stage (24 h-1, 24 h-2, and 24 h-3), the red, purple, and tan modules were significantly upregulated. In the seed swelling stage (48 h-1, 48 h-2, and 48 h-3), the turquoise module was significantly upregulated, indicating substantial gene expression in this module during the swelling phase (Figure 2C,D). We then performed a correlation analysis between module membership (MM) and gene significance (GS) for the 15 modules and selected modules with cor > 0.7 and p-value < 0.05 for further analysis. This resulted in the selection of the grey module (cor = 0.93, p = 3 × 10⁻¹⁰³), purple module (cor = 0.93, p = 3 × 10⁻⁵²), tan module (cor = 0.94, p = 4.1 × 10⁻³⁹), magenta module (cor = 0.74, p = 5 × 10⁻³⁸), and green module (cor = 0.74, p = 6.4 × 10⁻¹⁹⁹), totaling 3771 genes (Figure S2).

2.4. Functional Enrichment Analysis of Key Module Genes

The WGCNA method aims to identify co-expressed gene modules and explore the associations between gene networks and target traits of interest. By conducting enrichment analysis on the genes within key modules, detailed insights can be obtained [18]. During seed germination, a total of 138 pathways were identified across five key modules. Among these, the plant-pathogen interaction (n = 92) and plant hormone signal transduction (n = 84) pathways exhibited the highest levels of gene enrichment. Additionally, protein processing in the endoplasmic reticulum (n = 77) and ubiquitin-mediated proteolysis (n = 76) were also significantly enriched (Figure 3A). GO enrichment analysis revealed that, in the biological process category, key module genes were primarily involved in protein ubiquitination (n = 123), response to alcohol (n = 110), and response to abscisic acid (n = 107). In the cellular component category, these genes were predominantly located in the endoplasmic reticulum membrane (n = 82) and the nuclear outer membrane-endoplasmic reticulum membrane network (n = 82). In the molecular function category, the primary activities involved transcription cis-regulatory region binding (n = 123) and transcription regulatory region nucleic acid binding (n = 123) (Figure 3B). The key module genes were predominantly enriched in the plant-pathogen interaction and plant hormone signal transduction pathways, indicating that the germination stage of AM seeds was particularly sensitive to external environmental factors. In contrast, processes such as ubiquitin-mediated proteolysis and ribosome biogenesis underscored the complexity and critical nature of the internal regulatory mechanisms governing seed germination.

2.5. Screening of Feature Genes

A Venn diagram analysis (Figure 4A) was performed to compare the five module genes (n = 3771) with the differentially expressed genes (DEGs, n = 4950), identifying 312 overlapping genes, of which 293 were upregulated and 19 were downregulated. To further select feature genes related to AM seed germination, gradient boosting machine (GBM, Figure 4B) and random forest (RF, Figure 4C) algorithms were employed. In GBM, importance scores for each gene were calculated based on its contribution to all decision trees. By analyzing these scores, we identified the most influential genes that significantly affect prediction outcomes [19]. In addition, RF is a widely used machine learning method for feature selection, which assesses the importance of each gene by calculating the reduction in the Gini index (for classification) or the mean squared error reduction (for regression) at tree nodes. This method is effective at identifying features that contribute most to the prediction results, and it is particularly suitable for high-dimensional data. RF also handles multicollinearity well, which is crucial for analyzing gene expression data. Furthermore, by introducing randomness, RF reduces overfitting and enhances model generalizability [20,21]. Using these two machine learning methods, we selected nine feature genes (Table S5, Figure 4D) from the 312 overlapping genes. These feature genes, which performed well in both the GBM and RF models, represent genes with high predictive power and biological relevance, providing important candidates for further research.
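The screening idea in this section, ranking genes by importance in two tree ensembles and keeping those that score highly in both, can be sketched as follows. This is a minimal illustration on synthetic data and not the authors' pipeline: the use of scikit-learn classifiers, the top-5 cutoff, the binary label, and the gene names are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Synthetic expression matrix (samples x genes); values are random, not study data.
rng = np.random.default_rng(0)
n_samples, n_genes = 120, 30
X = rng.normal(size=(n_samples, n_genes))
# Hypothetical binary label driven by the first two genes only.
y = (X[:, 0] + X[:, 1] > 0).astype(int)
gene_ids = [f"gene_{i}" for i in range(n_genes)]


def top_genes(model, k):
    """Fit the model and return the k genes with the largest importance scores."""
    model.fit(X, y)
    ranked = np.argsort(model.feature_importances_)[::-1][:k]
    return {gene_ids[i] for i in ranked}


gbm_top = top_genes(GradientBoostingClassifier(random_state=0), k=5)
rf_top = top_genes(RandomForestClassifier(n_estimators=200, random_state=0), k=5)

# Feature genes are those ranked highly by both models.
feature_genes = gbm_top & rf_top
print(sorted(feature_genes))
```

Taking the intersection is a conservative choice: a gene must be informative under two different ensemble strategies (boosting and bagging) before it is carried forward.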

2.6. Analysis of Key Genes

After identifying the nine feature genes from the intersection of GBM and RF (Figure 4D), least absolute shrinkage and selection operator (LASSO) was employed to pinpoint four key genes (Table S6; Figure 5A,B): Cluster-28,554.0, Cluster-31,140.3, Cluster-44,625.0, and Cluster-40,267.9. Among these genes, Cluster-31,140.3, Cluster-44,625.0, and Cluster-40,267.9 were upregulated during AM seed germination, while Cluster-28,554.0 was downregulated. In this experiment, 0 h was used as the control group, and 12, 24, and 48 h were used as the experimental groups. The changes of these four key genes were as follows: the expression level of Cluster-44,625.0 showed a significant upward trend, while the expression levels of Cluster-31,140.3 and Cluster-40,267.9 began to decrease significantly after 24 h, decreasing by 38.5% and 45.0%, respectively. Cluster-28,554.0 first decreased, then increased during seed germination, and then decreased again (Figure 5C).
The identification of overlapping genes and the application of multiple ML algorithms in this study resulted in the selection of four key genes from the feature set. According to the WGCNA module classification, Cluster-31,140.3 was part of the purple module, which was primarily associated with the biosynthesis of cofactors, pantothenate, and CoA biosynthesis (Figure 5D). Both Cluster-28,554.0 and Cluster-44,625.0 belonged to the green module, which was mainly related to plant-pathogen interaction, ubiquitin-mediated proteolysis, and zeatin biosynthesis (Figure 5E). Cluster-40,267.9 was categorized within the tan module, which was primarily involved in RNA degradation, protein export, and ubiquitin-mediated proteolysis (Figure 5F). Through the use of LASSO and the WGCNA, this study identified four key genes (Cluster-28,554.0, Cluster-31,140.3, Cluster-44,625.0, and Cluster-40,267.9) that play a role in AM seed germination. A detailed annotation of the expression changes and functions of these genes highlighted their important roles in various biological processes, providing valuable insights and a foundation for further research into the mechanisms underlying AM seed germination.
To further elucidate the functions of the four key genes, we analyzed their annotation information. Cluster-28,554.0 encodes a protein of unknown function, and our understanding of its role remains limited. However, since both Cluster-28,554.0 and Cluster-44,625.0 are components of the green module, it is hypothesized that these two genes may exhibit similar functions. Cluster-31,140.3 encodes the ATP-dependent RNA helicase DEAH13, also known as FAS4 (At1g33390). This gene encodes a protein that contains an HA2 domain associated with RNA helicases and is a member of the DEAH-box RNA helicase family. Cluster-40,267.9 encodes RNA cytidine acetyltransferase 1, with the coding gene identified as T10O24.10 (At1g10490). This protein is generally associated with the Cyclin N-terminal domain, which is linked to cell cycle regulation. It plays a crucial role in ribosome biogenesis by specifically catalyzing the formation of N4-acetylcytidine (ac4C) in 18S rRNA and tRNA. The synthesis of N4-acetylcytidine is essential for plant growth and development [22]. Cluster-44,625.0 encodes the protein EPSIN2/EPN2 (At2g43160/At2g43170), also known as the balanced nucleotide transporter, which contains an ENTH (epsin N-terminal homology) domain, plays a significant role in protein trafficking, particularly in endocytosis, and is involved in ribosome biogenesis in eukaryotes according to KEGG metabolic pathway annotations. Seed germination is a complex and dynamic process, and the four key genes identified may act together to ensure successful seed germination and early growth.

2.7. qRT-PCR Validation of Key Genes

The expression levels of key genes were validated using qRT-PCR (Figure 6). The results indicated that the expression patterns of the four selected key genes were generally consistent with the RNA-Seq data, confirming a strong correlation between the RNA-Seq and qRT-PCR results.

3. Discussion

Seed germination is a complex process governed by the interaction of numerous intrinsic factors [23]. Understanding the entire germination process, from initial water uptake to radicle emergence, is essential for gaining comprehensive insights into this developmental stage. Accurately determining the transition time from water uptake to radicle emergence during seed germination is crucial [24]. Previous studies have shown that seed imbibition occurs in three distinct phases: Phase I, which lasts for 6 h from the onset of water absorption; Phase II, which consists of a plateau lasting until 24 h; and Phase III, which culminates in the initiation of germination [25]. AM seeds exhibit rapid water absorption within the first 1 to 12 h, followed by a gradual approach to saturation from 12 to 24 h [26]. For AM, the germination phase lasts from 0 to 2 days, while the post-germination growth phase extends from 3 to 8 days [27]. In this study, transcriptomic analyses of AM seeds at 0 h, 12 h, 24 h, and 48 h after germination revealed significant KEGG/GO enrichment, indicating that seed germination requires substantial energy for growth and development. GO enrichment analysis highlighted the generation of precursor metabolites and energy as a key functional category, underscoring the fact that energy metabolism is a central process in seed germination. Additionally, KEGG pathway analysis indicated enrichment in metabolic pathways, demonstrating that metabolic activities are continuously adjusted and activated during the transition from seed dormancy to active growth. Correlation analysis between MM and GS across 15 modules demonstrated five key modules with cor > 0.7 and p-value < 0.05. Subsequent KEGG and GO enrichment analyses revealed that these five key modules were predominantly associated with the plant–pathogen interaction and plant hormone signal transduction pathways. 
These findings indicate that these two pathways exhibit heightened activity throughout the seed-germination process in AM.
The plant–pathogen interaction pathway includes genes associated with plant defense mechanisms and immune responses [28]. This indicates that, during the seed germination phase, seeds are particularly susceptible to pathogen attacks, and the genes enriched in this pathway play crucial roles in pathogen recognition, the initiation of defense responses, and signaling processes that bolster seed resistance to diseases. Additionally, the majority of these genes are also enriched in the plant hormone signal transduction pathway, indicating that precise regulation of plant hormone synthesis, secretion, and signaling is vital for seed germination, influencing cellular division, expansion, and developmental growth [29]. A WGCNA provides a visual representation of the complex co-expression patterns among genes, facilitating the identification of module genes [30]. Furthermore, ML algorithms can enhance the identification of key genes within these modules, improving the accuracy and reliability of the gene selection process [29,31]. Seed germination is a complex and highly coordinated process that requires precise gene regulation and an adequate energy supply. In this study, we identified four key genes (Cluster-28,554.0, FAS4, EPN2, and T10O24.10) through a WGCNA and three ML models. Among these, the functional understanding of Cluster-28,554.0 is limited due to a lack of relevant studies. RNA helicases represent a large and complex gene family involved in various aspects of RNA metabolism [32]. RNA helicases are frequently associated with ribonucleoprotein complexes, which are essential for ribosome assembly, degradation, and the regulation of translation. Notably, FAS4, an ATP-dependent RNA helicase, regulates reproductive development through subfunctionalization, which is critical for plant reproduction [33]. Studies have shown that FAS4 may be expressed at specific stages of plant development or under certain developmental and environmental conditions [34]. 
The expression of FAS4 during the germination of AM indicates its specific role during this critical growth phase. Ac4C is a conserved modification found in rRNA and tRNA [35,36,37]. Among various mRNA modifications, ac4C is unique because it is an acetylation. In human cell lines, ac4C enhanced mRNA stability and translation initiation [38]. However, the existence, distribution patterns, and potential functions of ac4C modifications in plants remain largely unexplored. Studies have shown that ac4C was enriched at translation initiation sites in rice mRNA and at both initiation and termination sites in Arabidopsis mRNA [39]. It is hypothesized that ac4C modification may promote mRNA stability in plants, although extensive research is needed to elucidate its underlying mechanisms [40]. In the transcriptome of AM seeds, the enzyme RNA cytidine acetyltransferase 1, which catalyzes the formation of ac4C in 18S rRNA, was identified [22]. Furthermore, its gene expression levels gradually increase during germination, suggesting significant production of ac4C during seed germination and a regulatory role in this process. Specifically, the increase in ac4C modification may stabilize key mRNAs, facilitate their translation, and ensure the efficient synthesis of proteins required for seed germination. This dynamic modification further emphasizes the function of ac4C in regulating gene expression and facilitating seed germination and early growth. EPSIN proteins are an evolutionarily conserved family of membrane proteins that play crucial roles in endocytosis and signal transduction [41]. This protein family has been primarily studied in humans and animals [42]. Evidence from mouse models showed that knockout of EPSIN1 and EPSIN2 leads to embryonic lethality, highlighting their critical functions in embryonic development [43]. 
Similarly, during the process of seed germination in AM, EPSIN2 expression may play an important role in early developmental stages to ensure proper embryonic development and successful seed germination.

4. Materials and Methods

4.1. Materials

The AM seeds used in this study were obtained from the Economic Crop Research Institute of Shanxi Agricultural University (37°24′05″ N, 111°78′65″ E). The seeds were identified as AM by researcher Tian H from Shanxi Agricultural University. The HQ-233 seeds demonstrated excellent field-germination performance. All experimental seeds were harvested from the same AM plant in 2023 and stored in a seed bank at 4 °C. The seeds were disinfected with a 5% sodium hypochlorite solution for 5 min, rinsed 6–7 times with running water (with each rinse lasting 1–2 min), soaked in boiling water for 60 s, and then removed and placed in Petri dishes for cultivation. The treated seeds were incubated in a greenhouse at a temperature of 20–25 °C until germination.
This study focused on four stages of seed germination: 0 h (seeds without imbibition), 12 h (seeds fully imbibed with water), 24 h (seed coat cracking stage), and 48 h (radicle emergence stage) (Figure 7). The seeds were immediately frozen in liquid nitrogen and stored at −80 °C for subsequent analysis.

4.2. Sequencing Results

The RNA from the AM seeds was extracted using the CTAB-PBIOZOL method. The extracted RNA was dissolved in 50 µL of DEPC-treated water. Total RNA was subsequently identified and quantified using a Qubit 4.0 (Thermo Fisher Scientific, Waltham, MA, USA) fluorometer and a Qsep400 (BiOptic, New Taipei City, Taiwan) high-throughput bio-fragment analyzer. Most eukaryotic mRNAs possess a poly(A) tail, which was utilized to enrich poly(A)-tailed mRNAs with oligo(dT) magnetic beads for mRNA library construction. The libraries were sequenced on the Illumina NovaSeq 6000 platform at Wuhan MetWare Biotechnology Co., Ltd., Wuhan, China. Following sequencing, the raw data underwent several quality control steps, including data filtering, assessment of sequencing error rates, and examination of GC content distribution. Reads containing adapters, reads with more than 10% N content, and reads with over 50% of bases having a quality score (Q) ≤ 20 were removed to obtain clean reads for subsequent analysis (Table S1). The clean reads were assembled into transcripts using Trinity (v2.13.2, Trinity Software, Washington, DC, USA) [44]. The assembled transcripts were then clustered and deduplicated using Corset (https://github.com/Oshlack/Corset) (accessed on 25 July 2024) to refine the dataset for further analysis.
After filtering and assembly, the clean reads were aligned to the deduplicated unigene set to calculate gene expression levels. Transcript expression levels were determined using RSEM software (v1.3.1, RSEM Software, University of California, Berkeley, CA, USA), and the FPKM of each transcript was calculated by normalizing read counts for transcript length and sequencing depth (Figure S3). To predict the potential functions and biological pathways of the genes, DIAMOND [45] BLASTX software (v2.13.0, NCBI, Bethesda, MD, USA) was employed to align unigene sequences against the KEGG, NR, Swiss-Prot, GO, KOG, and TrEMBL databases. After predicting the amino acid sequences of the unigenes, HMMER software (v2.13.0+, NCBI, Bethesda, MD, USA) was utilized for alignment with the Pfam database to obtain unigene annotation information (Tables S4 and S7).
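For reference, the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments) can be written as a one-line function. This is a simplified sketch: the function name and the numbers are hypothetical, and RSEM itself estimates expression with an effective transcript length rather than the raw length used here.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    # 1e9 = 1e3 (bp -> kb) * 1e6 (fragments -> millions of fragments)
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2000 bp transcript,
# in a library with 20 million mapped fragments.
example_fpkm = fpkm(500, 2000, 20_000_000)
print(example_fpkm)
```

Dividing by both length and library size is what makes values comparable across transcripts of different lengths and across samples sequenced to different depths.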

4.3. Screening and Analysis of DEGs

Differential expression analysis between sample groups was conducted using DESeq2 (version 1.22.2, Bioconductor, Seattle, WA, USA) [46,47], which identified differentially expressed gene sets between two biological conditions. The Benjamini–Hochberg method was employed to adjust p-values for multiple hypothesis testing, with the corrected p-value and |log2 fold change| serving as thresholds for significant differential expression. The criteria for identifying differentially expressed genes were |log2 Fold Change| ≥ 1 and false discovery rate (FDR) < 0.05. Differentially expressed genes were annotated for categories, functions, and pathways using the Gene Ontology (GO) database (http://geneontology.org/) (accessed on 25 July 2024) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http://www.kegg.jp/ or http://www.genome.jp/kegg/) (accessed on 25 July 2024).
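The two thresholds stated above can be illustrated with a small, self-contained sketch of the Benjamini–Hochberg adjustment followed by the |log2 fold change| and FDR filter. The p-values and fold changes are invented for the example, and the helper names are assumptions; DESeq2 performs additional steps (such as independent filtering) that are not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def is_deg(log2_fold_change, fdr, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Apply the |log2 fold change| >= 1 and FDR < 0.05 criteria."""
    return abs(log2_fold_change) >= lfc_cutoff and fdr < fdr_cutoff


# Hypothetical raw p-values and log2 fold changes for four genes.
raw_p = [0.001, 0.02, 0.03, 0.5]
log2_fc = [2.3, -1.5, 0.4, 3.0]

fdr = benjamini_hochberg(raw_p)
deg_flags = [is_deg(lfc, q) for lfc, q in zip(log2_fc, fdr)]
print(deg_flags)
```

In this toy case the third gene fails the fold-change cutoff and the fourth fails the FDR cutoff, so only the first two would be reported as DEGs.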

4.4. WGCNA and ML

A WGCNA is advantageous for studying gene set expression. The WGCNA R package was employed in subsequent stages to construct and modularize various gene networks. The samples were first clustered to identify any significant outliers. Following this, a co-expression network was established using the automatic network construction function. The “WGCNA” package in R software (v4.3.3, R Foundation for Statistical Computing, Vienna, Austria) was utilized for constructing and visualizing the network [48], with the following parameters: mergeCutHeight = 0.25, RsquaredCut = 0.85, TOMType = “signed” and minModuleSize = 50. Module membership (MM) was correlated with gene significance (GS) within each module, and modules were selected for further analysis based on a correlation coefficient (cor) > 0.7 and a p-value < 0.05.
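The MM–GS criterion can be sketched in a few lines. In WGCNA terms, MM is the correlation of a gene's expression profile with its module eigengene, and GS is its correlation with a sample trait. The sketch below assumes the eigengene and trait vectors are already available, uses invented toy values, and checks only the correlation cutoff (the p-value test is omitted); it is an illustration and not the WGCNA implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mm_gs_correlation(expression, eigengene, trait):
    """Correlate |MM| with |GS| across the genes of one module."""
    mm = [abs(pearson(profile, eigengene)) for profile in expression.values()]
    gs = [abs(pearson(profile, trait)) for profile in expression.values()]
    return pearson(mm, gs)


# Toy module: three genes measured in four samples (hypothetical values).
toy_expression = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 1.0, 4.0, 3.0],
    "gene_c": [1.0, 3.0, 2.0, 2.0],
}
toy_eigengene = [1.0, 2.0, 3.0, 4.0]   # assumed module eigengene
toy_trait = [0.0, 12.0, 24.0, 48.0]    # assumed trait: hours of germination

module_cor = mm_gs_correlation(toy_expression, toy_eigengene, toy_trait)
keep_module = module_cor > 0.7
print(round(module_cor, 3), keep_module)
```

A high MM–GS correlation means that the genes most central to a module are also the ones most strongly associated with the trait, which is the rationale for carrying such modules forward.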

4.5. Key Genes

RF is a machine learning method that utilizes decision trees to assess variable importance by scoring the significance of each variable [49]. GBM is an ensemble learning algorithm that evaluates the contribution of each input feature to the prediction outcome, with more important features exerting a greater influence on the predictive results [19]. LASSO regression is another machine learning technique that performs variable selection and complexity regularization while fitting a generalized linear model. It was implemented with the scikit-learn package (v0.24.2, scikit-learn developers, USA) in Python (v3.12, Python Software Foundation, Beaverton, OR, USA). The degree of complexity adjustment in LASSO is governed by the parameter lambda. A larger lambda value imposes a greater penalty on linear models with more variables, resulting in fewer selected variables and more representative genes [50]. This study employed these three machine learning models. Initially, feature genes were screened using the GBM and RF algorithms, followed by the selection of key genes using LASSO.
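The effect of the penalty strength on the number of selected variables can be seen in a small synthetic example. This is a sketch under stated assumptions: the data are simulated, nine candidate features stand in for the nine feature genes, and scikit-learn's `Lasso` is used, where the penalty called lambda in the text is exposed as the `alpha` argument. The authors' actual response variable and settings are not specified in the text.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Simulated data: 100 samples, 9 candidate features (not study data).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 9))
# Only features 0 and 3 truly drive the hypothetical response.
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=100)


def selected_features(alpha):
    """Return indices of features with non-zero LASSO coefficients."""
    model = Lasso(alpha=alpha).fit(X, y)
    return {int(i) for i in np.flatnonzero(model.coef_)}


weak_penalty = selected_features(0.001)   # small lambda: many features kept
strong_penalty = selected_features(0.5)   # large lambda: few features kept
print(len(weak_penalty), len(strong_penalty))
```

In practice the penalty is usually chosen by cross-validation (for example with `LassoCV`) rather than fixed by hand as in this sketch.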

4.6. Real-Time Quantitative Reverse Transcription PCR (qRT-PCR)

RNA was extracted using the Plant RNA Extraction Kit (DP432) following the manufacturer’s instructions. Transcript-specific primers were designed with Primer 3 (Table S8) and synthesized by Shanghai Yuying Biotechnology Co. (Shanghai, China). The synthesized primers were subsequently optimized (Figures S4 and S5). qRT-PCR was conducted using Power qPCR PreMix (GENEray, GK8020, Guangzhou, Guangdong, China) on the CFX384 Touch™ Real-Time PCR Detection System (Bio-Rad Laboratories, Hercules, CA, USA) under the following conditions: 95 °C for 10 min, followed by 40 cycles of 95 °C for 10 s and 60 °C for 34 s. The 18S rRNA served as an internal reference, and relative expression levels were calculated using the 2^(−ΔΔCt) method [51]. Each biological sample was analyzed in triplicate.
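For reference, the comparative Ct calculation can be written as a short Python function. This sketch assumes that replicate Ct values are averaged before subtraction and that the 0 h sample acts as the calibrator; neither choice is specified above, and the Ct values shown are hypothetical.

```python
from statistics import fmean


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the 2^(-ddCt) method.

    Each argument is a list of replicate Ct values: target and reference gene
    in the sample of interest, then target and reference gene in the calibrator.
    """
    delta_sample = fmean(ct_target) - fmean(ct_reference)
    delta_calibrator = fmean(ct_target_cal) - fmean(ct_reference_cal)
    delta_delta = delta_sample - delta_calibrator
    return 2 ** (-delta_delta)


# Hypothetical Ct values: the target gene amplifies two cycles earlier at 24 h
# than at 0 h, while the 18S reference is unchanged.
fold = relative_expression(
    ct_target=[24.0, 24.0, 24.0],
    ct_reference=[12.0, 12.0, 12.0],
    ct_target_cal=[26.0, 26.0, 26.0],
    ct_reference_cal=[12.0, 12.0, 12.0],
)
print(fold)  # 4.0
```

A ΔΔCt of −2 corresponds to a fourfold increase relative to the calibrator.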

4.7. Statistical Analysis

Data analysis was conducted using Microsoft Excel 2019 (Microsoft Corporation, Redmond, WA, USA) for initial data processing and organization. A WGCNA was performed using the “WGCNA” package in R software (v4.3.3, R Foundation for Statistical Computing, Vienna, Austria). Data preprocessing and analysis were executed using the Python programming language. The Pandas library facilitated data cleaning and manipulation, including handling missing values with the “dropna” and “fillna” functions, performing data formatting and calculations with the “apply” and “map” functions, and filtering data based on specific conditions using the “query” function. Additionally, the NumPy library was employed for data analysis and computation. Statistical analysis and regression model construction were carried out using the scikit-learn library. Data visualization was performed in a Python environment utilizing the matplotlib and seaborn libraries. Machine learning model development and evaluation were conducted in Python, also using the scikit-learn library. Feature selection was executed with the “SelectKBest” function, and classification models were developed using the random forest classifier. Model performance was assessed through 5-fold cross-validation.
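The last three steps (feature selection with “SelectKBest”, a random forest classifier, and 5-fold cross-validation) might be combined as in the following scikit-learn sketch. The simulated expression matrix, the two-class labels, the scoring function (f_classif), and all parameter values such as k = 10 are assumptions made for illustration and are not taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Simulated data: 30 samples x 40 genes, two hypothetical classes.
rng = np.random.default_rng(0)
n_per_class, n_genes = 15, 40
labels = np.repeat([0, 1], n_per_class)
expr = rng.normal(size=(2 * n_per_class, n_genes))
expr[labels == 1, :5] += 2.0  # the first five genes carry the class signal

# Feature selection sits inside the pipeline so that it is refitted on the
# training part of every fold and never sees the held-out samples.
model = make_pipeline(
    SelectKBest(score_func=f_classif, k=10),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, expr, labels, cv=cv)
print(scores.mean())
```

Keeping the selection step inside the cross-validated pipeline is a design choice that avoids optimistic accuracy estimates caused by selecting features on the full data set.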

5. Conclusions

Seed germination is a complex and highly coordinated process that necessitates precise gene regulation and an adequate energy supply. Transcriptomic analyses have demonstrated the reliance of seeds on energy metabolism during germination and highlighted the central roles of several key genes in various biological processes. Our study identified four key genes—Cluster_28554.0, FAS4, EPN2, and T10O24.10—as crucial regulators of the germination process. These findings not only enhance our understanding of the mechanisms underlying AM seed germination but also offer valuable insights for further functional research and for the regulation of plant growth.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms252212342/s1.

Author Contributions

Conceptualization, J.L. and S.G.; Methodology, J.L. and S.G.; Software, J.L. and S.G.; Validation, J.L., X.Z. and Y.H.; Formal Analysis, J.L.; Data Curation, J.L.; Writing—Original Draft Preparation, J.L.; Writing—Review & Editing, J.L.; Visualization, J.L.; Supervision, H.T., Q.Z. and Y.W.; Project Administration, H.T., Q.Z. and Y.W.; Funding Acquisition, S.G. All authors have read and agreed to the published version of the manuscript.

Funding

We gratefully acknowledge the financial support of the National Key Research and Development Project (2019YFC1710800); National Chinese Herbal Medicine Industry Technology System Hunyuan Comprehensive Test Station (CARS-21-03); Breeding Engineering of Shanxi Agricultural University (YZGC056); Germplasm Resources and Breeding of Astragalus in Hengshan (XDHZHQY2022-01).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data and figures are provided in the Supplementary Materials (https://zenodo.org/records/13927134, DOI: 10.5281/zenodo.13927134), and the raw transcriptome data are available in the NCBI database (https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA1165939).

Acknowledgments

We thank all the authors for their contributions to this article, with special thanks to Wuhan MetWare Biotechnology Co., Ltd. for providing technical support for transcriptome sequencing.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

WGCNA: Weighted Gene Co-expression Network Analysis
ML: Machine Learning
FPKM: Fragments Per Kilobase of Transcript Per Million Mapped Reads
GO: Gene Ontology
KOG: Eukaryotic Orthologous Groups of Proteins
TrEMBL: Translated EMBL Nucleotide Sequence Data Library
KEGG: Kyoto Encyclopedia of Genes and Genomes
Pfam: Protein Families Database
Nr: Non-redundant Protein Database
MM: Module Membership
GS: Gene Significance
RF: Random Forest
GBM: Gradient Boosting Machine
LASSO: Least Absolute Shrinkage and Selection Operator
SDBG: Stilbenoid, diarylheptanoid and gingerol biosynthesis
UQ/TQ biosynthesis: Ubiquinone and other terpenoid-quinone biosynthesis
TPPAB: Tropane, piperidine and pyridine alkaloid biosynthesis

References

  1. Rajjou, L.; Duval, M.; Gallardo, K.; Catusse, J.; Bally, J.; Job, C.; Job, D. Seed germination and vigor. Annu. Rev. Plant Biol. 2012, 63, 507–533.
  2. Zaynab, M.; Kanwal, S.; Furqan, M.; Islam, W.; Noman, A.; Ali, G.M.; Rehman, N.; Zafar, S.; Sughra, K.; Jahanzab, M. Proteomic approach to address low seed germination in Cyclobalnopsis gilva. Biotechnol. Lett. 2017, 39, 1441–1451.
  3. Wang, X.; He, S.; Hou, J.; Wei, H.; Zhang, X. Advances in seed endophytic bacteriome. Acta Micro-Biol. Sin. 2023, 63, 1365–1378.
  4. Nonogaki, H. Seed dormancy and germination-emerging mechanisms and new hypotheses. Front. Plant Sci. 2014, 5, 233.
  5. Bewley, J.D.; Black, M. Dormancy and the control of germination. In Seeds; Springer: Boston, MA, USA, 1985; pp. 199–271.
  6. Guo, W.; Li, M.; Yi, L.; Hou, X.; Wei, Z. Planting Techniques of Astragalus membranaceus (Fisch.) Bge. var. mongholicus (Bge.) Hsiao: A Review. J. Agric. 2019, 9, 36–43.
  7. Okyere, A.S. Study of Seed Germination Promoting Factors with Inhibiting Damage and Bud Transcriptome for Astragalus membranaceus var. mongholicus; Gansu Agricultural University: Lanzhou, China, 2022.
  8. Xu, T.; Guo, S.; Tian, H.; Wu, C.; Hao, Y.; Pei, S. Effects of Different Pretreatment and Illumination Condition on Germination of Astragalus membranaceus Seeds. J. Shanxi Agric. Sci. 2018, 46, 196–198.
  9. Zheng, T.; Chen, Y. Study on the Method of Breaking Hard Seed of Astrangalus. Seed 2016, 35, 90–93.
  10. Wang, N.; Gao, J.; Huang, W.J.; Li, B.; He, Y.H.; Tang, Z.S.; Song, Z.X. Variations in seed germination and salicylic acid protective effect between two cultivars of Astragalus membranaceus under drought and salt stress. Pratacultural Sci. 2018, 35, 106–114.
  11. Shi, L.; Ou, Q.; Cui, W.; Chen, Y. Study on Method and Its Optimization of Improving Seed Germination of Astragalus membranaceus as Gansu Traditional Medicinal Herb. J. Chin. Med. Mater. 2014, 37, 548–552.
  12. Ma, Y.; Zhuang, Y.; Li, Y. Influence of Different Treatments and Sowing Pattern on Germination Percentage of Astragalus membranceus Seeds. Seed 2007, 26, 58–59.
  13. Uddin, S.; Khan, A.; Hossain, M.E.; Moni, M.A. Comparing different supervised machine learning algorithms for disease prediction. BMC Med. Inf. Decis. Mak. 2019, 19, 281.
  14. Handelman, G.S.; Kok, H.K.; Chandra, R.V.; Razavi, A.H.; Lee, M.J.; Asadi, H. eDoctor: Machine learning and the future of medicine. J. Intern. Med. 2018, 284, 603–619.
  15. Greener, J.G.; Kandathil, S.M.; Moffat, L.; Jones, D.T. A guide to machine learning for biologists. Nat. Rev. Mol. Cell Biol. 2022, 23, 40–55.
  16. Li, Q.; Wei, X.; Wu, F.; Qin, C.; Dong, J.; Chen, C.; Lin, Y. Development and validation of preeclampsia predictive models using key genes from bioinformatics and machine learning approaches. Front. Immunol. 2024, 15, 1416297.
  17. Xu, M.; Zhou, H.; Hu, P.; Pan, Y.; Wang, S.; Liu, L.; Liu, X. Identification and validation of immune and oxidative stress-related diagnostic markers for diabetic nephropathy by WGCNA and machine learning. Front. Immunol. 2023, 14, 1084531.
  18. Zhang, B.; Horvath, S. A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 2005, 4–5.
  19. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
  20. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  21. Díaz-Uriarte, R.; Alvarez de Andrés, S. Gene selection and classification of microarray data using random forest. BMC Bioinform. 2006, 7, 3–4.
  22. Theologis, A.; Ecker, J.R.; Palm, C.J.; Federspiel, N.A.; Kaul, S.; White, O.; Alonso, J.; Altafi, H.; Araujo, R.; Bowman, C.L.; et al. Sequence and analysis of chromosome 1 of the plant Arabidopsis thaliana. Nature 2000, 408, 816–820.
  23. Kucera, B.; Cohn, M.A.; Leubner-Metzger, G. Plant hormone interactions during seed dormancy release and germination. Seed Sci. Res. 2005, 15, 281–307.
  24. Li, H.; Li, X.; Wang, G.; Zhang, J.; Wang, G. Analysis of gene expression in early seed germination of rice: Landscape and genetic regulation. BMC Plant Biol. 2022, 22, 70.
  25. Ye, N.; Zhu, G.; Liu, Y.; Zhang, A.; Li, Y.; Liu, R.; Shi, L.; Jia, L.; Zhang, J. Ascorbic acid and reactive oxygen species are involved in the inhibition of seed germination by abscisic acid in rice seeds. J. Exp. Bot. 2012, 63, 1809–1822.
  26. Cao, Y.; Zhang, A.; Zang, C.L.; Jia, X.; Xue, Y.; Wang, X.Q. Responses of Seed Germination and Seedling Growth of AM Bunge to Saline-sodic Stress. Seed 2023, 42, 101–105+111.
  27. Yang, N.; Wang, X.; Guo, X.R.; Liu, Y.; Tang, Z.H.; Wang, H.Z. Variation in Flavonoids Biosynthesis during Seed Germination and Post germination Growth in Astragalus membranaceus. Bull. Bot. Res. 2018, 38, 298–305.
  28. Hassan, M.Z.; Rahim, M.A.; Jung, H.J.; Park, J.I.; Kim, H.T.; Nou, I.S. Genome-Wide Characterization of NBS-Encoding Genes in Watermelon and Their Potential Association with Gummy Stem Blight Resistance. Int. J. Mol. Sci. 2019, 20, 902.
  29. Liu, Y.; Ma, M.; Li, G.; Yuan, L.; Xie, Y.; Wei, H.; Ma, X.; Li, Q.; Devlin, P.F.; Xu, X.; et al. Transcription Factors FHY3 and FAR1 Regulate Light-Induced CIRCADIAN CLOCK ASSOCIATED1 Gene Expression in Arabidopsis. Plant Cell 2020, 32, 1464–1478.
  30. Aoki, K.; Ogata, Y.; Shibata, D. Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol. 2007, 48, 381–390.
  31. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288.
  32. Jankowsky, E.; Fairman, M.E. RNA helicases—One fold for many functions. Curr. Opin. Struct. Biol. 2007, 17, 316–324.
  33. Binmöller, L.; Volkert, C.; Kiefer, C.; Zühl, L.; Slawinska, M.W.; Loreth, A.; Nauerth, B.H.; Ibberson, D.; Martinez, R.; Mandakova, T.M.; et al. Differential expression and evolutionary diversification of RNA helicases in Boechera sexual and apomictic reproduction. J. Exp. Bot. 2024, 75, 2451–2469.
  34. Pogorelko, G.; Fursova, O.; Klimov, E. Identification and Analysis of the Arabidopsis Thaliana Atfas4 Gene Whose Overexpression Results in the Development of a Fasciated Stem. J. Proteom. Bioinform. 2008, 1, 329–335.
  35. Stern, L.; Schulman, L.H. The role of the minor base N4-acetylcytidine in the function of the Escherichia coli noninitiator methionine transfer RNA. J. Biol. Chem. 1978, 253, 6132–6139. Available online: https://pubmed.ncbi.nlm.nih.gov/355249/ (accessed on 20 July 2024).
  36. Ito, S.; Akamatsu, Y.; Noma, A.; Kimura, S.; Miyauchi, K.; Ikeuchi, Y.; Suzuki, T.; Suzuki, T. A single acetylation of 18S rRNA is essential for biogenesis of the small ribosomal subunit in Saccharomyces cerevisiae. J. Biol. Chem. 2014, 289, 26201–26212.
  37. Taniguchi, T.; Miyauchi, K.; Sakaguchi, Y.; Yamashita, S.; Soma, A.; Tomita, K.; Suzuki, T. Acetate-dependent tRNA acetylation required for decoding fidelity in protein synthesis. Nat. Chem. Biol. 2018, 14, 1010–1020.
  38. Arango, D.; Sturgill, D.; Alhusaini, N.; Dillman, A.A.; Sweet, T.J.; Hanson, G.; Hosogane, M.; Sinclair, W.R.; Nanan, K.K.; Mandler, M.D.; et al. Acetylation of cytidine in mRNA promotes translation efficiency. Cell 2018, 175, 1872–1886.e24.
  39. Li, B.; Li, D.; Cai, L.; Zhou, Q.; Liu, C.; Lin, J.; Li, Y.; Zhao, X.; Li, L.; Liu, X.; et al. Transcriptome-wide profiling of RNA N4-cytidine acetylation in Arabidopsis thaliana and Oryza sativa. Mol. Plant 2023, 16, 1082–1098.
  40. Wang, W.; Liu, H.; Wang, F.; Liu, X.; Sun, Y.; Zhao, J.; Zhu, C.; Gan, L.; Yu, J.; Witte, C.P.; et al. N4-acetylation of cytidine in mRNA plays essential roles in plants. Plant Cell 2023, 35, 3739–3756.
  41. Holkar, S.S.; Kamerkar, S.C.; Pucadyil, T.J. Spatial Control of Epsin-induced Clathrin Assembly by Membrane Curvature. J. Biol. Chem. 2015, 290, 14267–14276.
  42. Wang, Y.; Huang, Z.; Xiao, Y.; Wan, W.; Yang, X. The shared biomarkers and pathways of systemic lupus erythematosus and metabolic syndrome analyzed by bioinformatics combining machine learning algorithm and single-cell sequencing analysis. Front. Immunol. 2022, 13, 1015882.
  43. Chen, H.; Ko, G.; Zatti, A.; Di Giacomo, G.; Liu, L.; Raiteri, E.; Perucco, E.; Collesi, C.; Min, W.; Zeiss, C.; et al. Embryonic arrest at midgestation and disruption of Notch signaling produced by the absence of both epsin 1 and epsin 2 in mice. Proc. Natl. Acad. Sci. USA 2009, 106, 13838–13843.
  44. Grabherr, M.G.; Haas, B.J.; Yassour, M.; Levin, J.Z.; Thompson, D.A.; Amit, I.; Adiconis, X.; Fan, L.; Raychowdhury, R.; Zeng, Q.; et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 2011, 29, 644–652.
  45. Huson, D.H.; Buchfink, B. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 2015, 12, 59–60.
  46. Love, M.I.; Huber, W.; Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014, 15, 550.
  47. Varet, H.; Brillet-Guéguen, L.; Coppée, J.-Y.; Dillies, M.-A. SARTools: A DESeq2- and EdgeR-based R pipeline for comprehensive differential analysis of RNA-seq data. PLoS ONE 2016, 11, e0157022.
  48. Langfelder, P.; Horvath, S. WGCNA: An R package for weighted correlation network analysis. BMC Bioinform. 2008, 9, 559.
  49. Alakwaa, F.M.; Chaudhary, K.; Garmire, L.X. Deep Learning Accurately Predicts Estrogen Receptor Status in Breast Cancer Metabolomics Data. J. Proteome Res. 2018, 17, 337–347.
  50. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Softw. 2010, 33, 1–22. Available online: https://pubmed.ncbi.nlm.nih.gov/20808728/ (accessed on 20 July 2024).
  51. Schmittgen, T.; Livak, K. Analyzing real-time PCR data by the comparative C(T) method. Nat. Protoc. 2008, 3, 1101–1108.
Figure 1. DEGs during germination of AM seeds. (A) Upregulation and downregulation of DEGs at 12 h, 24 h, and 48 h, respectively, compared to 0 h. (B) Venn diagram illustrating DEGs. (C) GO enrichment analysis of DEGs. (D) KEGG enrichment analysis.
Figure 2. Expression network analysis of genes related to AM seed germination. (A) Appropriate soft thresholds were established to construct the scale-free network. (B) The cluster dendrogram illustrates the results of hierarchical clustering among genes, with different modules indicated by distinct colors at the bottom of the figure. Each module represents a set of highly co-expressed genes. (C) The module–sample relationship illustrates the correlation between various gene modules and sample features. (D) The eigengene adjacency heatmap displays the similarity between genes characterized by their respective modules.
Figure 3. KEGG and GO analysis of module genes related to AM seed germination. (A) Key module KEGG pathway analysis: The horizontal axis represents the number of genes enriched in the top 20 pathways, while the vertical axis indicates the names of the KEGG pathways. (B) Key module GO function analysis: The horizontal axis displays the names of the GO entries, and the vertical axis represents the number of genes enriched in the top 10 GO functions.
Figure 4. Feature gene selection for AM seed germination. (A) DEGs and WGCNA of the overlapping screened genes. (B) GBM screening of the characterized genes, with the horizontal axis representing the character importance score and the vertical axis representing the gene name. (C) RF algorithm screening of the characterized genes, with the horizontal axis indicating the average Gini index decline value and the vertical axis indicating the gene name. (D) Feature genes identified from RF and GBM screening.
Figure 5. Key genes involved in AM seed germination. (A) LASSO coefficient path diagram. The horizontal axis represents the log λ value, while the vertical axis displays the regression coefficient of the gene. At the optimal λ value (indicated by the vertical dashed line in the figure), the LASSO method identifies the key genes. (B) Cross-validation error plot of LASSO. The horizontal axis shows the log λ value, and the vertical axis represents the mean deviation. The red solid line indicates the mean deviation, while the gray shading represents the standard error. The vertical dashed line marks the optimal λ value, which corresponds to the smallest error. (C) Plot of expression changes of the four key genes at different time points. The horizontal axis denotes the time points, the vertical axis indicates gene expression on the left side, and gene counts on the right side. The bar graph represents gene counts, and the line graph illustrates gene expression. (D) KEGG pathway enrichment analysis graph for the purple module. (E) KEGG pathway enrichment analysis graph for the green module. (F) KEGG pathway enrichment analysis for the tan module.
Figure 6. qRT-PCR validation of four key genes. The black lines represent the qRT-PCR results for the key genes, while the blue bars indicate the RNA-seq values. Each sample was analyzed with three biological replicates for qRT-PCR. Error bars represent the standard deviation of the relative expression levels from the three biological replicates. (A) Expression level of Cluster-28,554.0 in RNA-Seq and qRT-PCR validation results; (B) Expression level of FAS4 in RNA-Seq and qRT-PCR validation results; (C) Expression level of T10O24.10 in RNA-Seq and qRT-PCR validation results; (D) Expression level of EPSIN2/EPN2 in RNA-Seq and qRT-PCR validation results.
Figure 7. Morphological characteristics of AM seed germination at four stages: (A) Seed dormancy (0 h). (B) Seed water absorption and swelling (12 h). (C) Seed coat dehiscence (24 h). (D) Radicle breakthrough (48 h).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, J.; Guo, S.; Zhang, X.; He, Y.; Wang, Y.; Tian, H.; Zhang, Q. Identification of Key Genes Involved in Seed Germination of Astragalus mongholicus. Int. J. Mol. Sci. 2024, 25, 12342. https://doi.org/10.3390/ijms252212342
