History of Biological Databases, Their Importance, and Existence in Modern Scientific and Policy Context
Abstract
1. Introduction
2. Databases and Bioinformatic Tools
3. Databases, Open Data Sharing and Ethical Problems
4. DSI and COP16
- (a) The term ‘DSI’ will remain in use during the talks despite its lack of a clear definition;
- (b) In the context of benefit-sharing, DSI will be addressed from the perspective of the Nagoya Protocol, that is, as genetic resources;
- (c) A solution for equitable benefit-sharing needs to be developed, and the benefits from the use of DSI should support the conservation and sustainable use of biological diversity;
- (d) Equitable benefit-sharing will take the form of a global fund.
- (a) Total assets worth more than USD 20 million;
- (b) Sales greater than USD 50 million;
- (c) Profit greater than USD 5 million.
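As an illustration only, the three financial thresholds listed above can be expressed as a simple check. The sketch below is not part of any official text: the `Financials` container, the function names, and the `min_exceeded` parameter are hypothetical. The list does not say how many thresholds must be exceeded, nor over what period the figures are measured, so the number is left as an explicit argument rather than assumed.

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds taken from the list above, in USD.
ASSET_THRESHOLD = 20_000_000
SALES_THRESHOLD = 50_000_000
PROFIT_THRESHOLD = 5_000_000


@dataclass
class Financials:
    """Hypothetical container for one company's figures, in USD."""
    total_assets: float
    sales: float
    profit: float


def exceeded_thresholds(f: Financials) -> list[str]:
    """Return the names of the thresholds that the figures strictly exceed."""
    checks = {
        "total_assets": f.total_assets > ASSET_THRESHOLD,
        "sales": f.sales > SALES_THRESHOLD,
        "profit": f.profit > PROFIT_THRESHOLD,
    }
    return [name for name, hit in checks.items() if hit]


def meets_criteria(f: Financials, min_exceeded: int) -> bool:
    """Check whether at least `min_exceeded` thresholds are exceeded.

    ASSUMPTION: the required count is supplied by the caller; the list
    above does not state it.
    """
    return len(exceeded_thresholds(f)) >= min_exceeded
```

For example, `exceeded_thresholds(Financials(25_000_000, 10_000_000, 6_000_000))` reports the asset and profit thresholds as exceeded but not the sales threshold. Figures exactly equal to a threshold do not count, since the list says "more than" and "greater than".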
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Danielewski, M.; Szalata, M.; Nowak, J.K.; Walkowiak, J.; Słomski, R.; Wielgus, K. History of Biological Databases, Their Importance, and Existence in Modern Scientific and Policy Context. Genes 2025, 16, 100. https://doi.org/10.3390/genes16010100