2.2. QTL-Allele System of Drought Tolerance in the CCSP
Using the 24,694 SNPLDBs in the RTM-GWAS procedure, at the first stage, 7795 and 7382 SNPLDBs were preselected for the second stage analysis, and then 75 and 64 QTL were detected for MPW and MPH, with −Log
10P values ranging in 2.2~33.0 and 2.2~25.9, respectively (
Table 2 and
Table 3,
Figure 1). The genetic contribution (
R2) of individual QTL ranged from 0.1% to 3.5%, with Gm13_BLOCK338 and Gm07_BLOCK25 having the highest significance values (−Log
10P) for respective indicators (
Figure 1a,
Table 2 and
Table 3). Among these loci, 16 and 14 ones were the large-contribution major QTL (LC-major QTL) with
R2 values more than 1.0% (
Table 2 and
Table 3).
On the 75 and 64 loci for MPW and MPH, 261 and 207 alleles were detected with 2~12 ones per locus, and among these alleles, 127 and 106 had positive effects, and 134 and 101 had negative effects, respectively (
Figure 1b,
Table 4).
Based on the RTM-GWAS results, the composition of DT QTL system in CCSP was explored (
Table 4). For MPW, 81.3% (heritability value) of the phenotypic variation was explained by genetic variation, in which the total
R2 of 16 LC-major QTL and 59 SC-major QTL were 25.9% and 28.8%, respectively, in a total of 75 detected QTL explaining 54.7% of phenotypic variation, and the remained 26.6% phenotypic variation might be explained by the collective unmapped minor QTL. The genetic structure of MPH was similar to that of MPW. In a total, 135 QTL/markers were detected for the two DT indicators, among which only 4 QTL/markers were shared between the two indicators (
Figure 1a,c). The shared QTL only explained a small part of phenotypic variation with values of 2.8% and 3.4%, respectively, not very much, but contained two LC-major QTL/SNPLDBs, Gm06_BLOCK576 and Gm08_BLOCK466, which might be the most important QTL for DT in soybean. The above results make us understand that DT is a complex trait, different indicators have their own genetic systems, and all the detected 135 QTL/markers are members of the DT genetic system. Therefore, in the following text, they will be considered as a joint genetic system conferring a super-DT-trait.
2.4. Population Genetic Differentiation from Landraces to Released Cultivars
The CCSP QTL-allele matrix was separated into its components, LRS and RCS. The independence of the allele frequency distribution of detected QTL between LRS and RCS was tested with Chi-square criterion, and 87 of the 135 QTL showed significant differentiation at
p = 0.05 with an average of 4.4 QTL per chromosome, ranging from 1 on Gm05 and Gm17 to 8 on Gm08 (
Table S3). The genetic differentiation performed mainly as different frequency distribution between LRS and RCS, especially on the 43 loci listed in
Table 5. For MPW, 27 out of 75 loci (36.0%) were with allele changes, and out of 261 (134 negative plus 127 positive) alleles, 34 (19 negative, 15 positive) LRS alleles were excluded but 12 (7 negative, 5 positive) alleles were newly emerged in RCS, and in a total, 46 (26 negative, 20 positive) alleles were changed on the 27 loci (
Figure 2d,
Table 5). For MPH, 19 out of 64 loci (29.7%) were with allele changes, and out of 207 (101, 106) alleles, 26 (14, 12) LRS alleles were excluded but 6 (4, 2) alleles emerged in RCS, and in a total, 32 (18, 14) alleles were changed. Among the 43 loci with allele changes, Gm06_BLOCK576, Gm08_BLOCK466 and GM20_39658089 were joint ones shared between MPW and MPH (
Table 5). Altogether, for the supper-DT-trait composed of MPW and MPH, there were 436 (217 negative, 219 positive) alleles on 135 DT QTL in the LRS, from which 378 (186, 192) alleles on 135 DT loci passed to RCS, but 58 (31 negative, 27 positive) alleles on 36 loci did not appear in RCS or excluded during the breeding processes (
Figure 2d,
Table 5). However, 16 (10 negative, 6 positive) new alleles on 13 loci were emerged during the breeding processes in the RCS. Among the 58 disappeared alleles and the 16 emerged alleles, both positive and negative effect alleles were involved, with 27 negative alleles vs. 31 positive alleles in excluded ones and 10 negative alleles vs. 6 positive alleles in emerged ones, in a total of 41 negative vs. 33 positive in a total of 74 changed alleles. Thus, in the excluded and emerged alleles, the number of negative alleles and number of positive alleles were roughly about similar, the allele changes from LRS to RCS was not obviously orientation-directed. The significant genetic differentiation between the LRS and RCS at the subpopulation and individual locus level caused the DT reduction from LRS to RCS, from 0.434 to 0.401 for MPW and from 0.150 to 0.082 for MPH, which suggested that the QTL-allele structure changes from LRS to RCS caused the subpopulation mean DT values changed. However, both the facts of the small phenotypic DT reduction and non-orientation-directed genetic differentiation from LRS to RCS implied that DT breeding did not receive enough attention in previous cultivar development, therefore, should be enhanced in the future in China.
In addition, among the 43 loci with allele changing, there appeared very active loci, Gm06_ BLOCK491for MPH with five alleles excluded in RCS; Gm15_BLOCK409 for MPW with three alleles emerged and one allele excluded in RCS; and Gm17_BLOCK344 for MPW with 4 alleles excluded in RCS. Among the newly emerged alleles in RCS, the allele (a3) on Gm06_BLOCK576 was associated with high positive effects for both MPW and MPH, while among the specific alleles in LRS (absented in RCS), the alleles a1 and a2 on Gm08_BLOCK466 were with negative effects for both MPW and MPH (
Figure 2d,
Table 5). These specific loci-alleles should be potential in their gene functions.
2.5. Prediction of Optimal Cross for Drought Tolerance Improvement
Based on the QTL-allele matrices, the optimal crosses of DT were predicted.
Figure 2e showed the distributions of predicted MPW and MPH values for the simulated progenies. There were 3319 and 3214 crosses with the predicted 95th percentile values exceeding the maximum phenotypic value in the CCSP for the respective indicators, among them, 745 crosses were jointly superior for the two indicators. The best top 10 predicted crosses were listed in
Table 6, among which the parental phenotypic values of MPW and MPH ranged in 0.645~1.411 and 0.059~1.712, respectively, while the predicted 95th percentile values of progenies ranged as 2.107~2.392 and 2.244~3.135, indicating that a great transgression might be obtained from these crosses. As shown in
Table 6, the two parents of Cross 1 (N25340 × N25258) both had high values for the two indicators, and the two parents of Cross 8 (N24359 × N25340) had medium and high values for the two indicators. However, both crosses could produce elite progenies with 95th percentile values up to 2.392 and 2.140, 2.552 and 2.417, 2.460 and 2.274 for MPW, MPH and WAV (weighted average value of the two indicators), respectively. The high × medium crosses (Cross 2 and 3) even can provide better segregants than the high × high cross (Cross 1), because more elite alleles could be converged in the former cases (
Figure 2f and
Table 6).
In conventional breeding, breeders usually used high × high strategy for designing crosses, while the present results implied that in marker-assisted breeding, the parental selection may extend to a broader range, which makes the breeders have more freedom in breeding by design. In summary, the marker-assisted cross design based on the QTL-allele matrix can take the advantage of converging the best alleles and therefore provide a way to create innovative materials with the desired genetic structure.
2.6. The Candidate Gene System of Drought Tolerance Inferred from Detected QTL
Based on the soybean reference genome of Glyma.Wm82.a1.v1.1 (
http://www.soybase.org), a total of 354 candidate genes within or neighboring to the 135 SNPLDBs were annotated for MPW and MPH (
Table 7). To verify the candidate genes, qRT-PCR was carried out by using two genotypes from the CCSP, drought tolerant N23644 (T) and drought-sensitive N00710 (S). A total of 177 annotated genes displayed differential expressions at more than five-folds in at least one of the four pairs of comparisons, which were the combinations of leaf (L) and root (R) of N23644 (T) and N00710 (S), i.e., TL, TR, SL and SR categories (
Table S4). There showed 6, 5, 4 and 2 down-regulated genes (with expression ranging in 0.11~0.20, 0.02~0.20, 0.09~0.16 and 0.06~0.11, respectively) and 19, 75, 44 and 121 up-regulated genes (with expression ranging in 5.05~32.79, 5.06~108.38, 5.17~96.00 and 5.19~211.57) in TL, TR, SL and SR categories, respectively, with some shared among the categories (
Figure 3A,
Table S4). In a total, 177 candidate genes were validated, in which, 92 and 92 (with overlapped ones) drought-responsive candidate genes were located in or close to the 52 and 44 (92 in total) SNPLDBs that were associated with MPW and MPH, respectively (
Table 7). Among the 177 candidate genes, 69 ones were from 24 LC-major QTL, 108 from 68 SC-major QTL and 7 from 4 shared QTL (
Table 7,
Figure 3B,C).
Among the 177 candidate genes, there were 30 most likely confident candidate genes that should be particularly studied further, including 22 highly expressed candidate genes and 10 candidate genes with their allele phenotypes significantly different at
p = 0.05 (
Table 8, with two shared). According to the results of qRT-PCR, 1, 6, 4 and 11 (22 in total) supper candidates were identified in TL, TR, SL and SR categories (
Figure 3A,
Table 8,
Table S4), respectively, with relative expression values more than 1.5 times of the inter-quartile range based on boxplot. Among them, the most sensitive gene was
Glyma07g18280, which expressed similar patterns in the leaf of T and in the leaf and root of S, with relative expression values of 16.62 and 1.46, 75.58 and 211.57, respectively.
Glyma07g18280 belonging to iron/ascorbate family oxidoreductases, was involved in multiple biological processes including jasmonic acid stimulus, oxidation-reduction, response to karrikin and so on.
Glyma02g08115 coding
Pip1 protein, is a drought-induced water channel protein, which was predicted to be responsible for water deprivation, salt stress and ABA stimulus (
https://www.ncbi.nlm.nih.gov/nucleotide/U27347.1).
Glyma02g26160, coding lipoxygenase, and
Glyma11g16750, in aldehyde dehydrogenase family, were both predicted involving in the biological processes of response to water deprivation (
Table 8). As for the 10 candidate genes with their allele phenotypes significantly different at
p = 0.05 (included in the 177 candidate gens), these should be also the confident candidate genes, in which,
Glyma02g08115 and
Glyma03g01262 were also identified from high expression level of qRT-PCR (
Table 8) and
Glyma16g27350 was predicted to be a
Sucrose transport protein (
Table 8), whose homologous genes were required for abiotic stress tolerance in an ABA-dependent pathway in
Arabidopsis thaliana [
31]. However, the above potential major candidate genes-alleles are only a small part of the 177 ones, the others are to be explored further. In addition, among the 177 candidate genes, 45 ones contain SNP(s) in the CCSP, including 25 ones with single SNP and 20 ones with multiple SNPs (
Table S5). On the 45 candidate genes each with 2–6 alleles, 117 alleles were recognized totally, where 24 alleles from the 10 genes with different allele phenotypes were significantly associated with DT.
In gene ontology enrichment analysis, all the above 177 predicted candidate genes were grouped into nine categories, i.e., ABA responders (51), stress responders (41), transports (41), development factors (38), protein metabolism (26), transcription factors (21), protein kinases (15), unknown function (35) and others (22) (
Figure 3D,
Table S6). The proportions of the candidate genes over the nine categories were similar for MPW, MPH and shared ones (
Figure 3D and
Table S6), which indicated that each indicator included all the nine gene categories or a similar set of functional genes. Furthermore, the genes related to the 58 excluded and 16 emerged QTL-alleles changed from LRS to RCS were located on 37 DT QTL, in which 95 verified candidate genes were included, which indicated that more candidate genes were related to the evolutionarily changed loci. Among the 95 verified candidate genes, 25, 25, 18, 27, 14, 12, 16, 14 and 10 ones were involved in the nine GO groups, where ABA responders, stress responders and development factors were also the major categories (
Tables S4 and S6). Thus, the five sets of gene ontology enrichment analysis in
Table S6 showed a similar functional classification results, indicating that DT in fact is the resulted performance contributed from a series of functional genes, and that the DT gene network composed of the nine category functions determines the DT performance. As we understand, to know the DT genetic mechanism we have to know the whole picture of the genes, and therefore the whole set of the QTL-alleles, rather than the individual QTL-allele or gene-allele.