1. Introduction
Genes are the keys to deciphering the molecular mechanism underpinning a biological trait and designing approaches efficient for plant and animal genetic improvement. Therefore, several methods have been developed to clone the genes controlling a biological trait, such as map-based cloning [
1] and mutagenesis [
2,
3]. These methods include two parts: candidate gene identification and gene function validation. QTL mapping, eQTL mapping, gene differential expression (DE) analysis, and genome-wide association study (GWAS) have been widely used for the genome-wide identification of candidate genes controlling a biological trait, while genetic transformation, RNA interference (RNAi), gene editing, and gene regulation have been widely used to validate the functions of the candidate genes. Therefore, candidate gene identification has been the prerequisite for gene cloning because most genes are known only by phenotypes. These methods have greatly contributed to the cloning of genes controlling qualitative and quantitative traits [
1]. However, only a few of the genes have been cloned for each of the quantitative traits to date [
4]. Since most traits of agricultural or biological importance are quantitative traits controlled by numerous genes, the limited number of genes cloned to date for each of the traits is insufficient not only for comprehensively deciphering the molecular mechanisms of the traits but also for designing approaches efficient for the manipulation of the traits in agriculture. Therefore, it is necessary to develop a procedure efficient for genome-wide identification of candidate genes controlling a complex trait to accelerate the process of gene cloning and promote functional genomics research.
The aim of the present study was to develop a method that is accurate, efficient, readily usable, and widely applicable for the genome-wide identification of the candidate genes controlling a complex trait using ginseng ginsenoside Rb1 as the target trait. Ginseng (
Panax ginseng Mey.), abundant in numerous bioactive components, especially ginsenosides, is grown in Asia and North America for health food, health product, and medicine. Ginseng is a tetraploid perennial that has a large complex genome, with a genome size of approximately 3.3 Gb. Although three genomes of ginseng have been sequenced [
5,
6,
7], and a massive number of transcriptomes have been developed from different genotypes of ginseng [
8], different developmental stages [
9,
10], different plant parts [
10,
11], and different treatment conditions [
12], no artificial mapping population has been developed, and no genetic map has been available, as has been for annual crops such as rice, maize, wheat, soybean, and cotton. Ginseng genome research remains largely under-studied compared with those of annual crops. If a procedure that is efficient for the genome-wide identification of the candidate genes controlling a complex trait is developed in ginseng, it will be applicable to other crops whose genome research has been far more advanced. Importantly, a desirable method has been developed and widely used in ginseng to validate the functions of candidate genes involved in ginsenoside biosynthesis [
8,
13,
14,
15,
16,
17,
18], which will facilitate the validation of the candidate genes controlling the target trait. Ginsenoside Rb1 is one of the most important ginsenosides [
19], and the biosynthesis of Rb1 has been shown to be a complex biological process controlled by numerous genes.
The present study analyzed the transcriptomes of four-year-old plant roots of 42 cultivars selected from a ginseng GWAS panel against their Rb1 contents. The analysis was performed from four aspects, including gene transcript differential expressions between cultivars with highest- and lowest-Rb1 contents, the correlation between gene transcript expression and Rb1 content among the cultivars, the biological impact of gene SNP mutation on Rb1 content, and the co-expression network of genes with the previously cloned genes that were involved in ginsenoside biosynthesis [
18,
20,
21,
22,
23,
24,
25,
26,
27]. The major advances of this analysis procedure over the existing integrated analysis of transcriptomes with the target trait for candidate identification included three aspects: (1) the use of transcript expressions for gene differential expression analysis, correlation analysis, and gene co-expression network analysis; (2) the use of gene transcript expression–target trait correlation in a population; and (3) the use of the impact of SNP or InDel mutation contained in the gene on the target trait. Given that different transcripts spliced from the same gene may have different biological functions [
4], the candidate genes identified are more accurate than those identified based on genes when the gene transcripts are used for candidate gene identification. We identified 22 candidate genes for Rb1 biosynthesis, the functions of 21 of which in Rb1 biosynthesis were validated. Using the Rb1 biosynthesis genes, we, for the first time, developed 96 biomarkers for Rb1 biosynthesis and studied the gene interactions underlying Rb1 biosynthesis at the genome level. Therefore, this study has identified 21 new genes and 96 biomarkers for Rb1 biosynthesis and substantially advanced our understanding of the molecular mechanisms underlying Rb1 biosynthesis.
3. Discussion
The present study has developed and demonstrated a method for the rapid, accurate, efficient, and genome-wide identification of candidate genes controlling a biological trait or process using the ginsenoside Rb1 content in ginseng as a target trait. Genes are the central determinants of all traits and the core molecular basis of plant biology and genetic improvement. Therefore, extensive research has been conducted to develop methods that can identify candidate genes that likely control a biological trait or process. These methods include QTL mapping, eQTL mapping, GWAS, and gene DE analysis and have greatly contributed to the genome-wide identification of candidate genes for a biological trait or process. Nevertheless, as QTL mapping, eQTL mapping, and GWAS are linkage- or linkage disequilibrium (LD)-based, these methods cannot accurately identify candidate genes for a trait because a QTL interval or a genomic region often contains multiple genes, probably dozens to hundreds of genes. DE analysis is based on the expression of a gene that is essential for the gene to control a trait, but the candidate genes identified with the DE method often contain many false positive candidate genes due to the hypothesized relationship between gene expression and target trait. The analysis performed in this study is based on both the inherent relationship between gene transcript expression and trait performance and the impacts of the SNP/InDel mutations contained in the genes on the target trait and identifies candidate genes for a trait in a stepwise manner from four aspects of genes. These include transcript DE analysis between genotypes contrasting in the target trait, gene transcript expression–trait performance correlation analysis, and the impacts of gene mutations on the target trait, followed by gene transcript co-expression network analysis. Since a
p-value = 0.001 for DE analysis, a
p-value = 0.05 for gene transcript expression–trait phenotype variation correlation analysis, a
p-value = 0.05 for impact analysis of gene mutation on the target trait, and a
p-value = 0.05 for gene transcript co-expression network analysis were applied for the candidate gene identification for the trait, the collective
p-value of these analyses of identifying a candidate gene by chance was 1.25 × 10
−7, being close to zero. In addition, the expressions of individual transcripts were used for the DE, correlation, and co-expression network analyses because different transcripts spliced from a gene are shown to have different biological functions [
4], which further increases the sensitivity and, thus, accuracy of candidate gene identification with these analyses. Furthermore, given that it is based on integrated and stepwise analysis of only transcriptomic and phenotypic data, the analysis is applicable to genome-wide and accurate identification of candidate genes controlling a biological trait or process in any species, including plants, animals, and microbes.
The utility and efficiency of the method developed in this study for genome-wide identification of candidate genes controlling a biological trait or process have been demonstrated by cloning 21
PgRb1 genes involved in Rb1 biosynthesis. We have identified 22 candidate genes for Rb1 biosynthesis through the stepwise integrated analysis of transcriptomes with Rb1 contents in this study. The functions of 21 (95%) of them have been validated through MeJA regulation that has widely been used for functional validation of genes involved in ginsenoside biosynthesis in ginseng [
8,
13,
14,
15,
16,
17,
18]. For instance, the functions of most of the key enzyme genes cloned thus far in ginsenoside biosynthesis were validated by gene regulation with MeJA, such as
PgSS [
21],
PgSE [
15],
PgMVD [
30],
PgFPS [
31],
PgDDS [
32],
PgUGT [
33], and
PgCYP450 [
34]. Although the function of the one remaining candidate gene for Rb1 biosynthesis,
PgRb1-20-01, has not been confirmed by the MeJA-mediated gene regulation method, it is only less than 5% out of the 22 candidate genes. Furthermore, we randomly selected one of the 21
PgRb1 genes identified by MeJA regulation,
PgRb1-11-01, and validated its function in Rb1 biosynthesis by the genetic transformation in another study (submitted). In addition, 96 genic SNPs/InDels were identified from these 21
PgRb1 genes that had significant impacts on Rb1 content, providing an additional line of evidence on their function in Rb1 biosynthesis. These SNPs/InDels provide biomarkers that are the most desirable for the manipulation of Rb1 biosynthesis in ginseng and for enhanced ginseng breeding. Together, the results of all these experiments, including the MeJA regulation, genetic transformation, and mutation analysis, consistently conclude that the 21
PgRb1 genes of the 22 candidate genes are involved in Rb1 biosynthesis.
Ginsenoside Rb1 biosynthesis has been shown to be a complicated process in which at least 32 genes, including the 21
PgRb1 genes cloned in this study and 11 published Rb1 biosynthesis genes cloned in previous studies, are involved. First, the
PgRb1 genes encode a diverse class of proteins, enzymes, and transcription factors and are categorized into 15 secondary GO categories. The expressions of these
PgRb1 genes vary dramatically not only across tissues, developmental stages, and cultivars but also within a tissue, at a development stage, and in a cultivar. These results indicate that the genes involved in Rb1 biosynthesis have greatly diverged in nucleotide sequence, expression activity, and biochemical functionality, even though they are all involved in Rb1 biosynthesis. Second, the 21
PgRb1 genes identified in this study with the 11 published Rb1 biosynthesis genes analyzed form two sub-pathways for Rb1 biosynthesis. One of the sub-pathways positively contributes to, and the other negatively contributes to, Rb1 biosynthesis. Third, network analysis reveals that the 32 Rb1 biosynthesis genes form a co-expression network several times stronger than that constructed from randomly selected unknown genes, suggesting that the process of Rb1 biosynthesis is completed by correlation of the genes controlling Rb1 biosynthesis in functionality. Moreover, this study reveals that the variation in the network has significantly influenced Rb1 biosynthesis. These findings are supported by previous studies of the molecular mechanisms underlying complex traits in ginseng [
4,
8,
35], maize [
4], and cotton [
4]. Finally, the 32 genes do not seem to be all the genes involved in Rb1 biosynthesis because only the
PgRb1 genes that contain SNPs/InDels significantly impacting Rb1 biosynthesis have been identified in this study. The genes that do not contain SNPs/InDels but are involved in Rb1 biosynthesis cannot be identified through the analysis performed in this study. Therefore, additional research remains to identify the remaining genes involved in Rb1 biosynthesis and to comprehensively decipher the molecular mechanism underlying Rb1 biosynthesis.
4. Materials and Methods
4.1. Plant Materials
Four types of plant materials were used for this study. The first type of plant material was the four-year-old plant roots of 42 ginseng cultivars. Roots are the major organ where the ginsenosides, including Rb1, are synthesized and stored, thus the major products of ginseng. Although the older a ginseng plant, the higher the value of its root, farmers usually harvest roots at the four-year-old stage. These 42 cultivars were representative of the genetic diversity of ginseng in Jilin Province, China—the origin and the diversity center of
P. ginseng. The Rb1 contents in the roots of the 42 cultivars were determined in our previous studies [
35]. The second type of plant material was 14 tissues of a four-year-old ginseng plant (for details of the 14 tissues, see refs. [
10,
36]), and their Rb1 contents were previously determined [
36]. The third type of plant material was the roots of 5-, 12-, 18-, and 25-year-old ginseng plants [
10]. All these three types of plant materials were sampled from our ginseng field experimental station, Baishan, Jilin (41°56′38″ N 126°24′53″ E). The fourth type of plant material was the cultured adventitious roots of ginseng treated with methyl jasmonate (MeJA) (see below).
4.2. Database
Five databases, hereafter defined as I, II, III, IV, and V, were used in this study. Database I was derived from the four-year-old plant roots of the above 42 cultivars (NCBI/GEO 369 SRR13131364–SRR13131405). The database consists of the sequences of all gene transcripts expressed in the roots, the expressions of the transcripts and the genes, the single nucleotide polymorphism (SNP) and nucleotide insertion/deletion (InDel) mutations, and the contents of their ginsenosides, including Rb1 [
8,
35]. Database II was derived from the 14 tissues of a four-year-old ginseng plant. The database contains the sequences of all gene transcripts expressed in the 14 tissues (BioProject PRJNA302556) and the expressions of all transcripts spliced from the expressed genes, and the overall expressions of the genes (NCBI/GEO SRP066368) [
10]. Database III was derived from the roots of the above 5-, 12-, 18-, and 25-year-old ginseng plants. It also consists of the sequences of all gene transcripts expressed in the roots and the expressions of all transcripts spliced from the expressed genes, and the overall expressions of the genes [
10]. Database IV was the draft genome assembly of
P. ginseng line IR826 [
5]. Database V was the draft genome assembly of
P. ginseng cv. ChP [
6].
4.3. Genome-Wide Identification of Candidate Genes for Rb1 Biosynthesis
The candidate genes for Rb1 biosynthesis were identified by stepwise and integrated analysis of the database of the 42 cultivars (Database I) with four sequential steps. The analysis was designed based on two genetic bases: the central dogma of molecular biology (a gene controls a trait through its expression) and the impacts of gene mutations on the target trait (if a gene controls a trait, the mutation of the gene will influence its phenotype). Therefore, the two methods complement each other, and their combination allows much more accurate and efficient identification of candidate genes than either method alone for a trait or biological process.
Step 1: Transcript differential expression analysis between cultivars with highest- and lowest-Rb1 contents in roots. This is the first step of the integrated analysis of transcriptomes with Rb1 contents for genome-wide identification of candidate genes for Rb1 biosynthesis. Zhang et al. [
29] and Han et al. [
36] showed that most genes are subjected to RNA alternative splicing, forming multiple transcripts from a single gene while they express in a tissue at a developmental stage in a genotype. Zhang et al. [
4] discovered that different transcripts spliced from a gene have different biological functions. Therefore, the expressions of transcripts were used for this analysis so that the sensitivity of identifying the candidate genes controlling a biological trait could be substantially increased. The 42 cultivars were sorted in ascending according to the Rb1 contents of their roots. The 14 highest- and 14 lowest-Rb1 content cultivars were selected and subjected to Student’s
t-test. Statistical analysis showed the Rb1 contents between the 14 highest and 14 lowest cultivars were significantly different (
p < 0.01). Therefore, we sampled cultivars from each of the two groups, respectively, five times with a bootstrap replacement sampling method in which some of the 14 cultivars in each group might be sampled more than once while others might not be sampled at all. Consequently, we achieved five bootstrap replications, with each consisting of 14 cultivar samples. The expressions of the transcripts in the roots of the 14 highest- and 14 lowest-Rb1 content cultivars were extracted from Database I. The mean expression of each transcript in the 14 cultivar samples was calculated and used as a bootstrap replicate. Therefore, we had five bootstrap replicates for each of the highest- and lowest-Rb1 content cultivar groups for transcript differential expression analysis. The analysis was conducted with DESeq2 v1.32.0 [
37]. The transcript with differential expression by log
2 (foldchange) ≥ 2 and adjusted
p-value ≤ 0.001 between the two groups was counted as a candidate gene I for Rb1 biosynthesis, defined as candidate gene DET-I.
Step 2: Correlation analysis between the expressions of the candidate gene DET-Is and Rb1 contents. Zhang et al. [
4] showed that the expressions of nearly 100% of the gene transcripts controlling a quantitative trait were correlated with the phenotype of the trait, no matter what genetic resources they were cloned from; thus, correlation analysis between gene transcript expression and phenotype variation of a trait provides a line of evidence on the candidate genes controlling the trait. All 42 cultivars selected above were used for this experiment. The expression of each DET-I in the 42 cultivars was subjected to correlation analysis with their Rb1 contents using the SPSS package (IBM SPSS Statistics 23). The candidate gene DET-I whose expression was significantly correlated with the Rb1 content among the 42 cultivars was selected as candidate gene II for Rb1 biosynthesis, defined as candidate gene DET-II.
Step 3: Biological impacts of the SNPs/InDels of the DET-IIs on Rb1 contents. We hypothesized that if a gene controls a trait, the mutation of the gene will influence its phenotype. All 42 cultivars were used for this experiment. The genic SNPs/InDels of the candidate gene DET-IIs were extracted from the genic SNP/InDel dataset of Database I, which were called by the SAMtools software using the transcript sequences of Database II (developed from 14 tissues of a four-year-old cv Damaya plant) [
38,
39] as the reference. The impact of each DET-II genic SNP/InDel on Rb1 content was determined statistically. Briefly, the 42 cultivars were grouped using the genotypes of the SNP under study. Since every SNP is bi-allelic, the 42 cultivars were grouped into two groups if no heterozygote existed among the 42 cultivars or three groups if heterozygotes were identified among the 42 cultivars. If the 42 cultivars were grouped into two groups, Student’s
t-test was used to determine whether the SNP or InDel under study had an impact on Rb1 contents. If the 42 cultivars were grouped into three groups, one-way analysis of variance (ANOVA) was used to determine whether the SNP or InDel under study had an impact on Rb1 contents. The candidate gene DET-II with an SNP or InDel that had a significant impact on Rb1 content (
p ≤ 0.05) was selected and defined as candidate gene DET-III for Rb1 biosynthesis. Moreover, the degree of impact of the SNP or InDel mutation on Rb1 content was estimated using the following formula:
where Y
i is the biological impact of mutation i on the Rb1 content, G
ih is the mean Rb1 content of the homozygous group with higher Rb1 content, and G
il is the mean Rb1 content of the homozygous group with lower Rb1 content. Finally, the type of the SNP or InDel, such as synonymous SNP (nucleotide substitution) and non-synonymous SNP, was classified by the ORF (open-reading frame) Finder.
Step 4: Co-expressions of the candidate gene DET-IIIs with previously published Rb1 biosynthesis genes. Zhang et al. [
4] showed that the gene transcripts controlling a quantitative trait were several times more likely to form a co-expression network than randomly selected unknown gene transcripts. Therefore, we further conducted the co-expression network analysis of the candidate gene DET-IIIs with the previously published Rb1 biosynthesis genes (
Table S3). Zhao et al. [
35] previously conducted a correlation analysis between expressions of 16 transcripts spliced from 10 published ginsenoside biosynthesis genes and Rb1 contents. They showed that 11 of the transcripts spliced from nine of the published ginsenoside biosynthesis genes were significantly correlated in expressions with Rb1 contents. Therefore, the expressions of the candidate gene DET-IIIs and the published Rb1 biosynthesis gene transcripts were both extracted from the transcript expression dataset of Database I that were counted by RNA-seq, followed by transcript quantification with the RSEM software [
40]. The co-expression network was constructed using the Biolayout
Express3D (Version 3.3) [
41] and characterized as previously described [
4,
8,
35]. The candidate gene DET-IIIs that formed a strong co-expression network with the published Rb1 biosynthesis gene transcripts were selected and defined as
PgRb1 candidate genes for Rb1 biosynthesis.
4.4. Functional Validation of PgRb1 Candidate Genes for Rb1 Biosynthesis
Because roots are the major organ in which ginsenosides are synthesized and stored, gene regulation with MeJA in cultured adventitious roots has widely been used to functionally validate genes that are involved in ginsenoside biosynthesis in ginseng [
8,
13,
14,
15,
16,
17,
18]. The functions of most of the key enzyme genes cloned thus far in ginsenoside biosynthesis were validated by this method, including
PgMVD [
30],
PgFPS [
31],
PgDDS [
32],
PgSS [
21],
PgSE [
15],
PgUGT [
33], and
PgCYP450 [
34]. Therefore, we validated the roles of the
PgRb1 candidate genes in Rb1 biosynthesis using this method. Ginseng seeds were used as the sources of plant materials for adventitious root induction. The coats of the ginseng seeds were removed, washed, and sterilized. The embryos of the seeds were carefully excised and then cultured on 1/2 MS (Murashige and Skoog) agar basal medium containing 1.0 mg/L gibberellins at 22–25 °C under 14 h light /10 h dark for one week to generate sterile seedlings. The healthy sterile seedlings were selected, and their leaflet pedicels were cut into approximately 0.5 cm segments under sterile conditions and cultured on the MS agar medium containing 2 mg/L 2,4-D (2,4-dichlorophenoxyacetic acid) and 0.2 mg/L 6-BA (6-benzylaminopurine) at 22–25 °C in the dark for 4–5 weeks to induce calli. The vigorous calli were selected, inoculated onto B5 (Gamborg’s B-5) agar basal medium containing 3.0 mg/L IBA (indole-3-butyric acid), and cultured at 22–25 °C in the dark to induce adventitious roots. Vigorous adventitious roots were individually selected and cultured in a B5 liquid medium at 22–25 °C, 110 RPM in the dark for 30 days to reproduce adventitious roots.
The 30-old-day vigorous adventitious roots were selected, inoculated into 250 mL fresh B5 liquid medium with an amount of 1.0 g adventitious roots per 250 mL medium, and cultured at 22–25 °C, 110 RPM in the dark for 25 days. Since the experiment contained 12 treatment time points with MeJA, with each time point having three biological replicates, a total of 36 flasks containing 250 mL B5 liquid medium were inoculated and cultured. On the 25th day of the culture, MeJA was added to 33 of the 36 adventitious root culture flasks, respectively, at a final concentration of 200 µM MeJA. The adventitious roots were then harvested at 0 h (the culture with no MeJA added), 6 h, 12 h, 24 h, 36 h, 48 h, 60 h, 72 h, 84 h, 96 h, 108 h, and 120 h. One gram of the adventitious roots harvested from each of the three replicates at each time point was immediately frozen in liquid nitrogen and stored at −80 °C for RNA analysis. The remaining adventitious roots were dried and stored at 4 °C for ginsenoside analysis.
Ginsenosides were extracted from all three biological replicates of the dried adventitious roots sampled at each time point by the Soxhlet extraction method, as described in previous studies [
42]. One gram of each dried root replicate sample was used for ginsenoside extraction. Mono-ginsenosides were separated using the Waters Alliance HPLC (high-performance liquid chromatography) with an e2695 Separation Module. The contents of individual mono-ginsenosides were determined using the Waters 2489 Ultraviolet Spectrophotometric Detector (Waters, Milford, MA, USA), according to Li et al. [
42]. The contents of Rb1 were extracted from the measurement results of the mono-ginsenosides. To validate the effects of the MeJA treatment on Rb1 biosynthesis, the contents of Rb1 that had three biological replicates for each of 6 h through 120 h time points were compared with the contents of Rb1 for the 0 h time point (non-MeJA treatment control that also had three biological replicates) by Student’s
t-test.
The total RNAs were isolated from all three biological replicates of the frozen adventitious roots sampled at each time point using the TRIpure Reagent Total RNA Extraction Reagent (Bioteke, Beijing, China). RNA-seq libraries were constructed as described by Zhang et al. [
29]. The libraries were qualified, quantified, multiplexed, and sequenced using the HiSeq X Ten (Illumina, Inc., San Diego, USA), with 150 PE (paired-end) and >30 million clean reads per sample. The sequence reads were quality filtered and trimmed as described by Zhang et al. [
29] before further analysis. Individual gene transcripts were assembled from the clean reads using the Trinity v2.14.0 software [
43] with the transcriptome assembly of Database II as the reference. The expressions of individual transcripts and the overall expressions of genes were quantified using the RSEM v1.3.3 software [
40]. The expressions of the transcripts in transcripts per million (TPM) were used for further analysis. The expressions of the
PgRb1 candidate gene transcripts were extracted from each biological replicate of the adventitious roots sampled at each time point. The effects of the MeJA treatment on the expressions of the
PgRb1 candidate genes were confirmed by comparing the expressions of the genes in the MeJA-treated samples at each time point with those in the control samples at the 0 h time point by Student’s
t-test.
To further confirm the roles of the
PgRb1 candidate genes in Rb1 biosynthesis, the expressions of their individual transcripts at each time point, including the 0 h time point, were subjected to Pearson’s correlation analysis with the Rb1 contents at each time point. The expressions of the individual transcripts spliced from the
PgRb1 candidate genes were used for the correlation analysis because different transcripts spliced from a single gene may have different functions [
4]. If a transcript expression of a
PgRb1 candidate gene was correlated with the Rb1 contents at a two-tailed significance level of
p ≤ 0.05, the gene spliced into the transcript was considered as the gene involved in Rb1 biosynthesis.
4.5. Annotation and Gene Ontology (GO) Categorization of the 21 PgRb1 Genes
The
PgRb1 genes were annotated and categorized in GO using the Blast2go 5.2 software [
44].
4.6. Expression Mode of the 21 PgRb1 Genes with the 11 Published Rb1 Biosynthesis Genes
To have a first insight into the functional relationships of the Rb1 biosynthesis genes in Rb1 biosynthesis, we constructed their expression heatmaps in different tissues of a four-year-old plant, at different developmental stages in plant roots, and in the four-year-old plant roots among cultivars. The expressions of the
PgRb1 genes in different tissues, at different developmental stages, and across cultivars were extracted from the expression datasets of Databases II, III, and I, respectively. The expression heatmaps in different tissues, at different developmental stages, and across cultivars were constructed using the R programming language and software, according to Wang et al. [
10].
4.7. The Putative Pathways of the Rb1 Biosynthesis Genes in Rb1 Biosynthesis
Moreover, we inferred the pathway of the Rb1 biosynthesis genes in Rb1 biosynthesis. Pearson’s correlation coefficients of the expressions between the genes and Rb1 content were calculated using the SPSS package (IBM SPSS Statistics 23). We hypothesized that the genes that directly interacted in Rb1 biosynthesis tended to have higher correlation coefficients of expressions than the genes that indirectly interacted in Rb1 biosynthesis in a group of genes that interacted. Therefore, the interaction groups of the genes were constructed first and then connected based on their expression correlation coefficients. The gene that is in the central position of the pathway, assuming that it was playing a central role, was defined as a hub gene. The action direction of a gene in a pathway was assigned toward ginsenoside Rb1.
4.8. Co-Expression Network Variation of the Rb1 Biosynthesis Genes and its Impacts on Rb1 Biosynthesis
Finally, we examined their co-expression network with the previously published Rb1 biosynthesis genes and the impact of the network variation on Rb1 contents to determine the relationship of the
PgRb1 genes in Rb1 biosynthesis and the impacts of their relationship variation on Rb1 biosynthesis. The 42 cultivars were grouped into three groups that differed in Rb1 content (
p ≤ 0.01), with low-, mid-, and high-Rb1 contents, and each group containing 14 cultivars. The co-expression networks of the
PgRb1 genes were constructed for each group of cultivars separately, using the BioLayout
Express3D software [
41] as described above. The resultant co-expression networks of the genes were compared among the three groups of cultivars in major characteristics of the networks, including the number of gene nodes, the number of gene interactions or co-expression edges, and network robustness presented by connectivity.