PNGSeqR: An R Package for Rapid Candidate Gene Selection through Pooled Next-Generation Sequencing
Abstract
1. Introduction
2. Results
2.1. Example 1—Prioritizing the Candidate Gene Controlling Quality Trait by Using RNA-Seq Data
2.2. Example 2—Mapping QTLs by Using DNA-Seq Data
2.3. Example 3—Mapping QTLs Using a RIL Population
3. Discussion
4. Materials and Methods
4.1. Data Import and Filtering
4.2. Bulk Segregant Analyses
4.3. DEG and GO Analyses
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
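Section 4.1 (Data Import and Filtering) starts from variant calls for the two bulks. Pooled-sequencing analyses of this kind commonly begin with a VCF that stores per-sample allele depths in the `AD` FORMAT field. The Python sketch below is illustrative only and is not PNGSeqR's importer, which is written in R. It assumes that sample column 10 is the high bulk and column 11 the low bulk, keeps only biallelic SNPs, and applies a hypothetical `min_depth` filter of 10 reads per bulk.

```python
def parse_bulk_depths(vcf_line, min_depth=10):
    """Return (chrom, pos, (ref_hi, alt_hi), (ref_lo, alt_lo)), or None if the site is filtered.

    Assumption: sample column 10 is the high bulk, column 11 the low bulk.
    min_depth is a hypothetical per-bulk read-depth cut-off.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return None  # need FORMAT plus two sample columns
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
    # Keep biallelic SNPs only: one-base REF and exactly one one-base ALT.
    if len(ref) != 1 or len(alt) != 1:
        return None
    fmt_keys = fields[8].split(":")
    if "AD" not in fmt_keys:
        return None
    ad_index = fmt_keys.index("AD")
    depths = []
    for sample in fields[9:11]:
        values = sample.split(":")
        if len(values) <= ad_index:
            return None  # AD missing for this sample
        ad = values[ad_index].split(",")
        if len(ad) != 2 or "." in ad:
            return None  # missing or malformed allele depths
        ref_depth, alt_depth = int(ad[0]), int(ad[1])
        if ref_depth + alt_depth < min_depth:
            return None  # hypothetical minimum-depth filter
        depths.append((ref_depth, alt_depth))
    return chrom, pos, depths[0], depths[1]
```

For a made-up record such as `8\t1983100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:12,18:30\t0/1:25,5:30`, the function returns `("8", 1983100, (12, 18), (25, 5))`. Indels and low-depth sites return `None`.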
Criterionᵃ | Threshold | Methodᵇ | Chromᶜ | Startᵈ (Mb) | Endᵉ (Mb) | Lengthᶠ (Mb)
---|---|---|---|---|---|---
p-value | 0.01 | Bayes | 8 | 0.08 | 11.28 | 11.20
p-value | 0.01 | ED | 8 | 0.08 | 10.65 | 10.57
p-value | 0.01 | ΔSNP | 8 | 0.08 | 10.00 | 9.92
p-value | 0.01 | G-test | 8 | 0.08 | 10.65 | 10.57
p-value | 0.005 | Bayes | 8 | 0.08 | 10.49 | 10.41
p-value | 0.005 | ED | 8 | 0.08 | 10.20 | 10.12
p-value | 0.005 | ΔSNP | 8 | 0.08 | 9.36 | 9.28
p-value | 0.005 | G-test | 8 | 0.08 | 8.05 | 7.97
p-value | 0.001 | Bayes | 8 | 1.46 | 3.42 | 1.96
p-value | 0.001 | ED | 8 | 0.08 | 3.37 | 3.30
p-value | 0.001 | ΔSNP | 8 | 0.08 | 2.21 | 2.13
p-value | 0.001 | G-test | 8 | 0.08 | 1.67 | 1.59
Quantile | 99% | Bayes | 8 | 0.08 | 11.28 | 11.20
Quantile | 99% | ED | 8 | 0.08 | 10.65 | 10.57
Quantile | 99% | ΔSNP | 8 | 0.08 | 10.00 | 9.92
Quantile | 99% | G-test | 8 | 0.08 | 10.65 | 10.57
Quantile | 99.5% | Bayes | 8 | 0.08 | 11.03 | 10.95
Quantile | 99.5% | ED | 8 | 0.08 | 10.65 | 10.57
Quantile | 99.5% | ΔSNP | 8 | 0.08 | 9.36 | 9.28
Quantile | 99.5% | G-test | 8 | 0.08 | 10.45 | 10.37
Quantile | 99.9% | Bayes | 8 | 1.46 | 3.42 | 1.96
Quantile | 99.9% | ED | 8 | 0.08 | 3.29 | 3.21
Quantile | 99.9% | ΔSNP | 8 | 0.08 | 3.29 | 3.21
Quantile | 99.9% | G-test | 8 | 0.08 | 3.29 | 3.21
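The table above compares intervals from three of the allele-based statistics (ΔSNP, ED, G-test) and a Bayes method. The per-SNP sketch below uses textbook forms of the three allele-based statistics, computed from (REF, ALT) read depths in the high and low bulks. These are standard definitions and are not taken from PNGSeqR's source. The p-value uses the usual chi-square approximation with 1 degree of freedom for G. The sketch does not cover the Bayes method, or any windowing or smoothing that a full pipeline would apply before calling intervals.

```python
import math

def snp_index(ref_depth, alt_depth):
    """Fraction of reads in one bulk that carry the ALT allele."""
    return alt_depth / (ref_depth + alt_depth)

def delta_snp_index(hi, lo):
    """ΔSNP-index: SNP-index of the high bulk minus SNP-index of the low bulk."""
    return snp_index(*hi) - snp_index(*lo)

def euclidean_distance(hi, lo):
    """Euclidean distance between the two bulks' (REF, ALT) allele-frequency vectors.

    For a biallelic site this equals sqrt(2) * |ΔSNP-index|.
    """
    f_hi = [d / sum(hi) for d in hi]
    f_lo = [d / sum(lo) for d in lo]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_hi, f_lo)))

def g_statistic(hi, lo):
    """Log-likelihood ratio G = 2 * sum(obs * ln(obs / exp)) for the 2 x 2 bulk-by-allele table."""
    table = (hi, lo)
    total = sum(hi) + sum(lo)
    row_sums = (sum(hi), sum(lo))
    col_sums = (hi[0] + lo[0], hi[1] + lo[1])
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # a zero cell contributes 0 (limit of x * ln x)
                expected = row_sums[i] * col_sums[j] / total
                g += observed * math.log(observed / expected)
    return 2.0 * g

def g_test_p_value(g):
    """Upper-tail chi-square p-value with 1 df: P(X > g) = erfc(sqrt(g / 2))."""
    return math.erfc(math.sqrt(max(g, 0.0) / 2.0))
```

For example, with high-bulk depths `(12, 18)` and low-bulk depths `(25, 5)`, the ΔSNP-index is 18/30 − 5/30 ≈ 0.433. The ED is √2 times that value, and G is positive. Identical allele frequencies in both bulks give G = 0.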
Chromᵃ | Startᵇ (bp) | Endᶜ (bp) | Geneᵈ | log2(fold change)ᵉ | FDRᶠ
---|---|---|---|---|---
8 | 563764 | 569489 | Zm00001d008179 | −1.6552 | 4.96 × 10⁻³
8 | 909115 | 910111 | Zm00001d008196 | −7.1323 | 1.86 × 10⁻³
8 | 957005 | 966468 | Zm00001d008198 | 2.4286 | 3.83 × 10⁻²
8 | 1238180 | 1242266 | Zm00001d008209 | 2.6675 | 4.95 × 10⁻⁵
8 | 1299092 | 1300668 | Zm00001d008211 | −3.7562 | 6.48 × 10⁻⁴
8 | 1983066 | 1984497 | Zm00001d008229 | −2.6535 | 3.86 × 10⁻²
8 | 3193712 | 3222186 | Zm00001d008256ᵍ | −1.5896 | 6.63 × 10⁻¹³
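The gene table above lists differentially expressed genes on chromosome 8. A natural follow-up is to keep only the genes that overlap a mapped interval and pass expression cut-offs. The sketch below does this for the rows transcribed from the table. The cut-offs (FDR < 0.05, |log2 fold change| ≥ 1) are assumptions for illustration and are not necessarily those used in the paper. The interval in the usage line is the chromosome 8 Bayes interval at p < 0.001 (1.46–3.42 Mb) from the interval table.

```python
# Rows transcribed from the DEG table: (chrom, start_bp, end_bp, gene, log2FC, FDR)
DEGS = [
    ("8", 563764, 569489, "Zm00001d008179", -1.6552, 4.96e-3),
    ("8", 909115, 910111, "Zm00001d008196", -7.1323, 1.86e-3),
    ("8", 957005, 966468, "Zm00001d008198", 2.4286, 3.83e-2),
    ("8", 1238180, 1242266, "Zm00001d008209", 2.6675, 4.95e-5),
    ("8", 1299092, 1300668, "Zm00001d008211", -3.7562, 6.48e-4),
    ("8", 1983066, 1984497, "Zm00001d008229", -2.6535, 3.86e-2),
    ("8", 3193712, 3222186, "Zm00001d008256", -1.5896, 6.63e-13),
]

def genes_in_interval(rows, chrom, start_bp, end_bp, fdr_max=0.05, min_abs_lfc=1.0):
    """Return genes overlapping [start_bp, end_bp] that pass the (assumed) FDR and fold-change cut-offs."""
    return [
        gene
        for c, s, e, gene, lfc, fdr in rows
        if c == chrom
        and s <= end_bp and e >= start_bp  # any overlap with the interval
        and fdr < fdr_max
        and abs(lfc) >= min_abs_lfc
    ]

if __name__ == "__main__":
    print(genes_in_interval(DEGS, "8", 1_460_000, 3_420_000))
```

Run against the 1.46–3.42 Mb interval, this keeps `Zm00001d008229` and `Zm00001d008256`. The other five genes lie below 1.46 Mb.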
Chromᵃ | ΔSNPᵇ (Mb), quantile > 85% | ΔSNPᵇ (Mb), p-value < 0.05 | G-testᶜ (Mb), quantile > 85% | G-testᶜ (Mb), p-value < 0.05 | EDᵈ (Mb), quantile > 85% | EDᵈ (Mb), p-value < 0.05
---|---|---|---|---|---|---
Chr1 | 25.61–34.62 | 26.20–33.34 | 25.56–34.90 | 26.22–33.63 | 25.45–34.49 | 26.12–33.20
Chr2 | 9.58–19.44 | 10.06–10.60 | 9.58–19.58 | 10.01–10.51 | 9.28–19.64 | 9.73–10.40
Chr5 | 27.09–28.75 | - | 27.35–28.68 | - | 27.14–29.96 | -
Chr8 | 17.04–27.30 | 21.27–25.24 | 16.92–27.37 | 21.40–25.17 | 16.90–27.54 | 21.34–25.43
Chr10 | 17.50–20.25 | - | 17.67–20.26 | - | 17.66–20.33 | -
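The table above contrasts two ways of declaring a region significant. One is a quantile cut-off, which keeps SNPs whose statistic falls in the top 15%; the other is a p-value cut-off (p < 0.05). The quantile route can be sketched in three steps:

1. Smooth the per-SNP statistic along the chromosome.
2. Take the 85th percentile of the values as the threshold.
3. Merge consecutive SNPs above it into intervals.

The Gaussian-kernel Nadaraya–Watson smoother, its bandwidth, and the single-chromosome, position-sorted input are assumptions of this sketch, and PNGSeqR's exact procedure may differ. The quantile uses linear interpolation, the same rule as R's default `quantile(type = 7)` and NumPy's default `linear` method.

```python
import math

def nw_smooth(positions, values, bandwidth):
    """Nadaraya–Watson kernel regression (Gaussian kernel) evaluated at every SNP position.

    O(n^2): fine for a sketch, too slow for genome-wide data without windowing.
    """
    smoothed = []
    for x0 in positions:
        weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in positions]
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return smoothed

def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(values)
    h = (len(s) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])

def call_intervals(positions, scores, threshold):
    """Merge runs of consecutive SNPs whose score exceeds threshold into (start, end) intervals."""
    intervals = []
    start = prev = None
    for pos, score in zip(positions, scores):
        if score > threshold:
            if start is None:
                start = pos
            prev = pos
        elif start is not None:
            intervals.append((start, prev))
            start = None
    if start is not None:  # close a run that reaches the last SNP
        intervals.append((start, prev))
    return intervals
```

A typical call would be `call_intervals(pos, sm, quantile(sm, 0.85))` with `sm = nw_smooth(pos, stat, bandwidth)`. A p-value cut-off would instead compare each smoothed value with a null distribution, which is why the two columns per statistic in the table differ.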
Chromᵃ | EDᵇ start (Mb) | EDᵇ end (Mb) | EDᵇ size (Mb) | ΔSNPᶜ start (Mb) | ΔSNPᶜ end (Mb) | ΔSNPᶜ size (Mb) | G-testᵈ start (Mb) | G-testᵈ end (Mb) | G-testᵈ size (Mb)
---|---|---|---|---|---|---|---|---|---
4 | 5.53 | 6.30 | 0.77 | 5.57 | 6.32 | 0.75 | 5.98 | 6.19 | 0.21
4 | 21.31 | 23.69 | 2.38 | 21.03 | 23.69 | 2.66 | 20.81 | 23.69 | 2.88
5 | - | - | - | - | - | - | 15.31 | 16.07 | 0.77
5 | 18.97 | 19.57 | 0.60 | 19.18 | 19.57 | 0.39 | 18.07 | 19.21 | 1.14
6 | 5.37 | 6.99 | 1.62 | 5.32 | 7.00 | 1.68 | 5.23 | 7.00 | 1.77
6 | 21.79 | 24.68 | 2.89 | 21.79 | 24.68 | 2.89 | 23.31 | 24.80 | 1.49
6 | - | - | - | - | - | - | 27.21 | 27.55 | 0.34
11 | - | - | - | - | - | - | 21.71 | 21.98 | 0.27
11 | 23.89 | 24.44 | 0.55 | 23.92 | 24.44 | 0.52 | 23.37 | 24.30 | 0.93
Total | - | - | 8.81 | - | - | 8.89 | - | - | 9.81
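The "Total" row above is the sum of interval sizes (end − start) for each method. The sketch below recomputes the ED and ΔSNP totals from the transcribed intervals. It also forms the union of the two methods' intervals per chromosome, which gives the genomic span flagged by at least one of them. The union step is an illustrative addition, not an analysis reported in the paper. Because the endpoints are rounded to 0.01 Mb, recomputed totals can differ slightly from the reported ones: the G-test intervals sum to about 9.79 Mb against the reported 9.81 Mb.

```python
# (chrom, start_Mb, end_Mb) transcribed from the ED and ΔSNP columns of the table
ED = [(4, 5.53, 6.30), (4, 21.31, 23.69), (5, 18.97, 19.57),
      (6, 5.37, 6.99), (6, 21.79, 24.68), (11, 23.89, 24.44)]
DELTA_SNP = [(4, 5.57, 6.32), (4, 21.03, 23.69), (5, 19.18, 19.57),
             (6, 5.32, 7.00), (6, 21.79, 24.68), (11, 23.92, 24.44)]

def total_size(intervals):
    """Sum of (end - start) in Mb; assumes intervals do not overlap."""
    return sum(end - start for _, start, end in intervals)

def merge_intervals(intervals):
    """Union of possibly overlapping (chrom, start, end) intervals, merged per chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged

if __name__ == "__main__":
    print(round(total_size(ED), 2))         # 8.81
    print(round(total_size(DELTA_SNP), 2))  # 8.89
    union = merge_intervals(ED + DELTA_SNP)
    print(round(total_size(union), 2))      # 9.17
```

The ED ∪ ΔSNP union contains six intervals (for example 5.53–6.32 Mb on chromosome 4) spanning about 9.17 Mb.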
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhen, S.; Zhang, H.; Xie, Y.; Zhang, S.; Chen, Y.; Gu, R.; Liu, S.; Du, X.; Fu, J. PNGSeqR: An R Package for Rapid Candidate Gene Selection through Pooled Next-Generation Sequencing. Plants 2022, 11, 1821. https://doi.org/10.3390/plants11141821
Chicago/Turabian style: Zhen, Sihan, Hongwei Zhang, Yuxin Xie, Song Zhang, Yan Chen, Riliang Gu, Sanzhen Liu, Xuemei Du, and Junjie Fu. 2022. "PNGSeqR: An R Package for Rapid Candidate Gene Selection through Pooled Next-Generation Sequencing." Plants 11, no. 14: 1821. https://doi.org/10.3390/plants11141821