A Tool to Encourage Minimum Reporting Guideline Uptake for Data Analysis in Metabolomics
Abstract
1. Introduction
1.1. Data Analysis Reporting Using R Markdown
1.2. Objectives
- To present a set of previously proposed minimum reporting guidelines as a checklist specifically for the data analysis step of metabolomics biomarker discovery studies. This data analysis pipeline typically has four phases; only pre-treatment is essential, although the other three are commonly used:
- Data pre-treatment
- Univariate data analysis to identify significant features that differ between groups
- Multivariate data analysis
  - Unsupervised data analysis to discover correlated features, identify hidden subgroups, or visualise separation and identify outliers
  - Supervised data analysis, specifically for developing prediction models and/or biomarker identification
- Biomarker candidate performance analysis
  - Receiver operating characteristic (ROC) curve analysis
- To provide an authoring tool to promote standardised comprehensive reporting on data analysis, which will also generate workflow diagrams.
2. Methods
2.1. Checklist of Minimum Information For Reporting Data Analysis In Metabolomics
2.2. An Authoring Tool For Reporting Statistical Analysis Of Predictive Omics
3. Results
3.1. Minimum Information About A Data Analysis (MIDAS) Checklist (Guidelines Checklist Specifically for the Data Analysis Step)
- What are the dimensions of the dataset entering this phase of analysis?
- What percentage of the data is missing values?
- Is imputation (I) performed?
- If yes, describe the method.
- Is normalisation (N) performed?
- If yes, describe the method.
- Is transformation (T) performed?
- If yes, describe the method.
- Is scaling (S) performed?
- If yes, describe the method.
- Is filtering (F) applied to the dataset at this point?
- If yes, describe the method.
- Is a Quality Control/Quality Assessment (QC/QA) method employed on the dataset?
- Please describe.
- Outline the order of the pre-treatment steps performed on the dataset, e.g., I -> T -> S -> N -> F -> QC.
- Have the dimensions of the dataset changed from the outset of pre-treatment to the end of pre-treatment?
- Provide details on the package or program used for this phase of the analysis.
- If in-house code is used, provide it or a link to it and also the language the code is written in.
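The pre-treatment items above ask for dataset dimensions, the percentage of missing values and the order of the steps. A minimal Python sketch of how those three facts might be collected for a report is given here; the toy matrix, the function names and the `report` dictionary are illustrative assumptions and are not part of the MIDAS template.

```python
import math


def dimensions(data):
    """Return (samples, features) for a rectangular list-of-rows matrix."""
    return (len(data), len(data[0]) if data else 0)


def missing_percentage(data):
    """Return the percentage of cells that are missing (None or NaN)."""
    total = sum(len(row) for row in data)
    missing = sum(
        1
        for row in data
        for value in row
        if value is None or (isinstance(value, float) and math.isnan(value))
    )
    return 100.0 * missing / total if total else 0.0


# Hypothetical toy matrix: 3 samples x 4 features, with two missing cells.
raw = [
    [1.2, None, 3.1, 0.7],
    [0.9, 2.2, float("nan"), 0.5],
    [1.1, 2.0, 2.9, 0.6],
]

# Answers to three checklist items, gathered in one place for the report.
report = {
    "dimensions_in": dimensions(raw),
    "percent_missing": round(missing_percentage(raw), 1),
    "pretreatment_order": ["I", "T", "S", "N", "F", "QC"],
}
print(report)
```

Recording the dimensions again after the last pre-treatment step, with the same `dimensions` helper, would answer the question about whether the dataset changed size.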
- What are the dimensions of the dataset entering this phase of analysis?
- Is univariate testing performed?
- If yes, describe the method.
- Is a multiple testing correction employed with this method?
- If yes, describe the method.
- Are other methods of univariate testing performed?
- If yes, describe the methods.
- Is a multiple testing correction employed with these methods?
- If yes, describe the method.
- Please report p-values and adjusted p-values.
- Please report test statistics and confidence intervals.
- Have the dimensions of the dataset changed from the outset of univariate analysis to the end of univariate analysis? If yes, provide the dimensions of the dataset at the end of univariate analysis and make it clear how the dimensions have changed.
- Provide details on the package or program used for this phase of the analysis.
- If in-house code is used, provide it or a link to it and also the language the code is written in.
- If a list of potential biomarkers is produced at this point, please state this explicitly.
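The univariate items above ask for both raw and adjusted p-values. As one possible illustration, the sketch below applies the Benjamini-Hochberg false discovery rate adjustment in plain Python. The choice of Benjamini-Hochberg, the feature names and the p-values are assumptions made for the example; the checklist does not prescribe a particular correction.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical features and raw p-values from a univariate test.
features = ["metabolite_A", "metabolite_B", "metabolite_C", "metabolite_D"]
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)

# Report raw and adjusted p-values side by side, as the checklist requests.
for name, p, q in zip(features, raw_p, adj_p):
    print(f"{name}\tp={p:.4f}\tadjusted p={q:.4f}")
```

Stating the name of the correction next to the table lets a reader recompute the adjusted values from the raw ones.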
- What are the dimensions of the dataset entering this phase of the analysis?
- Are unsupervised methods employed for visualisation and/or data reduction and/or correlation analysis?
- If yes, describe the algorithm used.
- Is outlier detection and removal addressed at this point? If yes, please describe and specify the outliers removed.
- Are unsupervised analysis methods used for clustering?
- If yes, describe the method and provide the distance metric.
- Have the dimensions of the dataset changed? If yes, how and why?
- Provide the dimensions of the dataset at the end of unsupervised analysis.
- Provide details on the package or program used for this phase of the analysis.
- If in-house code is used, provide it or a link to it and also the language the code is written in.
- If a list of potential biomarkers is produced at this point, please state this explicitly.
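The unsupervised items above ask for the distance metric used in clustering. The sketch below shows one way to keep the metric explicit, by passing it as a named function and recording its name alongside the output dimensions. Euclidean distance, the sample values and the `clustering_report` dictionary are assumptions chosen for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(samples, metric=euclidean):
    """Pairwise distances between samples under the given metric."""
    return [[metric(a, b) for b in samples] for a in samples]


# Hypothetical pre-treated data: 3 samples x 2 features.
samples = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dist = distance_matrix(samples)

# Record the metric by name so that it ends up in the written report.
clustering_report = {
    "distance_metric": euclidean.__name__,
    "dimensions_out": (len(samples), len(samples[0])),
}
print(dist)
print(clustering_report)
```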
- What are the dimensions of the dataset at this point?
- Are supervised methods employed?
- If yes, describe the supervised analysis fully enough to allow the exact procedure to be replicated. This requires reporting all of the following information: all parameters; details of how the data are split; details of how internal validation is conducted; details of how meta-parameter optimization is performed; details of the metric chosen for evaluating the performance of the classifier; and, finally, an overall description of the workflow.
- Is more than one supervised method employed?
- If yes, describe the implementation of the other algorithm(s) fully enough to allow the exact procedure to be replicated. This requires reporting all of the following information: all parameters; details of how the data are split; details of how internal validation is conducted; details of how meta-parameter optimization is performed; details of the metric chosen for evaluating the performance of the classifier; and, finally, an overall description of the workflow.
- Is external validation employed?
- If yes, describe the source of external data. Is the data from the same location/lab/timeline or a hold-out set from the original data?
- Provide a confusion matrix of results.
- Provide results as an average of n leave-multiple-out and external predictions.
- Are potential biomarkers identified? If yes, list them.
- Have the dimensions of the dataset changed? If yes, how and why?
- Provide the dimensions of the dataset at the end of supervised analysis.
- Provide details on the package or program used for this phase of the analysis.
- If in-house code is used, provide it or a link to it and also the language the code is written in.
- If a list of potential biomarkers is produced at this point, please state this explicitly.
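The supervised items above ask how the data are split and request a confusion matrix. A minimal sketch of both follows, assuming a seeded k-fold split and a two-class problem. The function names, the seed, the class labels and the predictions are hypothetical; no classifier is fitted here, the predictions are simply listed by hand.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed):
    """Split sample indices into k folds; the seed makes the split reportable."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes and columns are predicted classes, in label order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical split: 10 samples, 5 folds, fixed seed stated in the report.
folds = k_fold_indices(n_samples=10, k=5, seed=42)

# Hypothetical true and predicted classes for five held-out samples.
labels = ["case", "control"]
y_true = ["case", "case", "control", "control", "case"]
y_pred = ["case", "control", "control", "control", "case"]
matrix = confusion_matrix(y_true, y_pred, labels)

print("folds:", folds)
print("confusion matrix (rows = true, columns = predicted):", matrix)
```

Reporting `k`, the seed and the label order together with the matrix gives a reader what is needed to repeat the split and read the table unambiguously.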
- Is ROC analysis performed on the identified putative biomarkers?
- If yes, please report on AUC, sensitivity and specificity.
- Provide details on the package or program used for this phase of the analysis.
- If in-house code is used, provide it or a link to it and also the language the code is written in.
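The ROC items above ask for AUC, sensitivity and specificity. The sketch below computes AUC as the proportion of case/control pairs in which the case scores higher (ties count as half), and computes sensitivity and specificity at a single threshold. The scores, the labels and the threshold of 0.5 are assumptions for illustration; a real report would state how the threshold was chosen.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical biomarker scores; label 1 = case, label 0 = control.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]

area = auc(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
print(f"AUC={area:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```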
3.2. Link to GitHub Repository Containing Markdown Template
4. Discussion
- Go to the GitHub repository: https://github.com/MSI-Metabolomics-Standards-Initiative/MIDAS.
- Click the “clone or download” button on the right-hand side of the page and download the folder as a zip file.
- Download the latest version of RStudio if you do not have it.
- Open the folder and open the MIDAS.rmd file in RStudio.
- Start editing and writing the report of your data analysis guided by the questions in green directly inside the MIDAS.rmd file.
- Once the pre-treatment, univariate analysis, multivariate analysis and error analysis sections have been completed, the next step is to produce workflow diagrams.
- Follow the instructions in green to produce a workflow diagram of pre-treatment steps.
- Click on the knit button and knit to HTML to see how the generated report looks.
- Knit to PDF or Word to render the report to a PDF or Word document as you wish.
- PDF and Word reports will not contain the diagrams, so these need to be saved from the viewer pane as an image (JPG, BMP, etc.) to your local folder.
- Insert the workflow diagrams that you saved to your local folder into your Word or PDF report.
- Alternatively, render the document to HTML, in which case the workflow diagrams are included automatically.
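The workflow diagram of pre-treatment steps mentioned above can be thought of as a chain of nodes in the order the steps were applied. The Python sketch below turns an order string such as the checklist's `I -> T -> S -> N -> F -> QC` example into Graphviz DOT text. It illustrates the idea only and is not the code inside MIDAS.rmd, which is an R Markdown template; the function names and the `STEP_NAMES` mapping are assumptions.

```python
# Step codes as used in the checklist, mapped to readable node labels.
STEP_NAMES = {
    "I": "Imputation",
    "N": "Normalisation",
    "T": "Transformation",
    "S": "Scaling",
    "F": "Filtering",
    "QC": "QC/QA",
}


def parse_order(order):
    """Split an order string such as 'I -> T -> S' into step codes."""
    return [step.strip() for step in order.split("->") if step.strip()]


def to_dot(steps):
    """Build Graphviz DOT text for a left-to-right chain of steps."""
    lines = ["digraph pretreatment {", "  rankdir=LR;"]
    for code in steps:
        lines.append(f'  {code} [label="{STEP_NAMES.get(code, code)}"];')
    for src, dst in zip(steps, steps[1:]):
        lines.append(f"  {src} -> {dst};")
    lines.append("}")
    return "\n".join(lines)


steps = parse_order("I -> T -> S -> N -> F -> QC")
dot_text = to_dot(steps)
print(dot_text)
```

Because the diagram is generated from the same order string that is written in the report, the text and the figure cannot drift apart.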
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Considine, E.C.; Salek, R.M. A Tool to Encourage Minimum Reporting Guideline Uptake for Data Analysis in Metabolomics. Metabolites 2019, 9, 43. https://doi.org/10.3390/metabo9030043