Article
Peer-Review Record

A Comparative Cross-Platform Meta-Analysis to Identify Potential Biomarker Genes Common to Endometriosis and Recurrent Pregnancy Loss

Appl. Sci. 2021, 11(8), 3349; https://doi.org/10.3390/app11083349
by Pokhraj Guha 1, Shubhadeep Roychoudhury 2,*, Sobita Singha 2, Jogen C. Kalita 3, Adriana Kolesarova 4, Qazi Mohammad Sajid Jamal 5, Niraj Kumar Jha 6, Dhruv Kumar 7, Janne Ruokolainen 8 and Kavindra Kumar Kesari 8,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 20 February 2021 / Revised: 25 March 2021 / Accepted: 6 April 2021 / Published: 8 April 2021
(This article belongs to the Special Issue Towards a Systems Biology Approach)

Round 1

Reviewer 1 Report

The manuscript entitled “A comparative cross-platform meta-analysis to identify potential biomarkers common to endometriosis and recurrent pregnancy loss” focuses on an interesting meta-analysis of biomarkers linked to endometriosis via an in silico approach. The authors have identified biomarker genes through a systems biology approach, which could be used to find treatments for endometriosis and recurrent pregnancy loss. This is an interesting and promising approach, the results of which would be of interest to the readers of Applied Sciences. I have a few minor comments for the improvement of the manuscript:

 

1. The manuscript contains grammatical and syntactical errors.

2. In what way is this study different from earlier published reports on endometrial genetics?

3. Have the authors run data queries in other databases such as ArrayExpress?

4. In figure legend 2, please state what you mean by “effect value”.

5. Highlight some common pathways, if any, in which the target genes act together.

6. Lines 88-89: Why was data searched for only 3 months (July to September 2020)? The specific reason for this selection needs to be clarified.

7. The authors need to provide suitable references for NIA array analysis, ANOVA, False Discovery Rate (FDR), Limma Package, and Heat package of R or any additional software used in the study.

8. Line 100: “Normalization of the data was carried out…” The authors should provide a separate, more detailed section on the methodology adopted for pre-processing of the data or for standardizing the quality of the selected microarray data.

9. Section 2.4 needs to be divided into separate sections with detailed information, e.g. parameter selection for protein-protein interaction and pathway enrichment analysis.

10. Figure 5. A high-resolution figure would be more suitable for the readers. Although the authors have described the meaning of the node colors, a description of the colored connecting lines is needed within the caption.

11. Figure 6. A high-resolution figure is needed. The labeling is very small; please provide a better-quality figure with the important nodes labeled.

Author Response

Comment 1: The manuscript contains grammatical and syntactical errors.

Response: Thank you for your suggestion. We have corrected the grammatical and syntactical errors throughout the manuscript. We also had a native English speaker assist with the corrections.

 

Comment 2: In what way is this study different from earlier published reports on endometrial genetics?

Response: Most of the earlier studies, whether cited here or reported elsewhere, focus on endometrial genetics, as you have correctly mentioned. In this study, however, we have tried to explore the underlying connection between Endometriosis and Recurrent Pregnancy Loss. Such a pathophysiological association, in terms of potential diagnostic biomarkers, has not been explored earlier. We believe this work will contribute to finding the link, if any, between Endometriosis and Recurrent Pregnancy Loss, and will aid the future diagnosis of these two closely associated diseases.

 

Comment 3: Have the authors run data queries in other databases such as ArrayExpress?

Response: Yes, we have run data queries in other databases, including ArrayExpress, but none of the datasets there met the inclusion criteria that we set for our study (Page 3, Line 135).

 

Comment 4: In figure legend 2, please state what you mean by “effect value”.

Response: Thank you for the correction. The ‘Effect Value’ was already described in the text of the manuscript. Following your suggestion, we have now also explained it in the legend of Figure 3 (the heatmap) of the revised manuscript (in “Track Changes” mode).

 

Comment 5: Highlight some common pathways, if any, in which the target genes act together.

Response: Thank you for this interesting remark. The Enrichment Network (Figure 6) already shows that a large number of biological pathways are associated with the genes identified in this study. Unfortunately, no single pathway in the enrichment network involves all four marker genes. However, as you have noted, there are pathways in which more than one of these potential markers is involved, and these data are provided in Supplementary Table 1. We kindly ask you to consult Supplementary Table 1, where those pathways are clearly marked.

In addition, based on your suggestion, the following paragraph has been added to the manuscript on Page 14, Line 436:

“When we tried to explore the involvement of our potential biomarkers in the biological pathways, it was seen that SNRPF protein is involved in the Cellular component organization pathway. Interestingly, CTNNB1 is involved in all the 20 pathways. HNRNPAB is involved in 15 pathways and TWIST2 in 13 of the pathways. CTNNB1, HNRNPAB and TWIST2 are commonly involved in 11 out of 20 major pathways that are represented in Supplementary Table 1, while SNRPF and CTNNB1 share only one pathway in common.”

 

Comment 6: Lines 88-89: Why was data searched for only 3 months (July to September 2020)? The specific reason for this selection needs to be clarified.

Response: With this line we meant that the query process was run over three consecutive months. Our datasets were not restricted to that period; in other words, we did not limit our selection to datasets that were reported during those three months.

 

Comment 7: The authors need to provide suitable references for NIA array analysis, ANOVA, False Discovery Rate (FDR), Limma Package, and Heat package of R or any additional software used in the study.

Response: Thank you for the suggestion. As per your advice, we have cited references for the above-mentioned software and statistical methods in the revised manuscript (Page 4, Lines 155-176).

 

Comment 8: Line 100: “Normalization of the data was carried out…” The authors should provide a separate, more detailed section on the methodology adopted for pre-processing of the data or for standardizing the quality of the selected microarray data.

Response: Based on your suggestion, a paragraph on normalization has been included in the manuscript (Page 4, Line 150).

 

Comment 9: Section 2.4 needs to be divided into separate sections with detailed information, e.g. parameter selection for protein-protein interaction and pathway enrichment analysis.

Response: As suggested, Section 2.4 has been divided into separate subsections, each described accordingly, in the revised manuscript (Page 4, Line 186 to Page 5, Line 206).

 

Comment 10: Figure 5. A high-resolution figure would be more suitable for the readers. Although the authors have described the meaning of the node colors, a description of the colored connecting lines is needed within the caption.

Response: Thank you. A high-resolution version of the figure can be exported from Cytoscape as a PDF. We are therefore submitting this figure in PDF format, which should be easier for readers to interpret.

 

Comment 11: Figure 6. A high-resolution figure is needed. The labeling is very small; please provide a better-quality figure with the important nodes labeled.

Response: Thank you. A high-resolution version of the figure can be exported from Cytoscape as a PDF. We are therefore submitting this figure in PDF format, which should be easier for readers to interpret.

Reviewer 2 Report

Major concerns:

  1.  A meta-analysis of endometriosis was published in January of last year (Poli-Neto et al. Scientific Reports, January 2020).  Results of the analysis of this manuscript should be compared to the Poli-Neto paper, and differences explained in the discussion.
  2. The methods are ambiguous as to why other endometriosis microarray datasets were excluded from this analysis.  "Precautions were adopted to exclude datasets containing overlapping sample sets or datasets that have been generated from the same research laboratory" is not sufficiently descriptive for why the 5 were specifically chosen, or why the others were excluded (including those in the paper mentioned above).
  3. In the description of the analysis pertaining to Network Analyst, the authors mention using a 'Random effect model', a p-value cutoff, a FDR cutoff, and then the use of Limma.  More description is needed in this section, as it doesn't make sense what each of these components is being used for.  Why is there both a p-value and a FDR threshold being used?  Does Network Analyst identify DEGs, or was Limma used?
  4. The authors fail to provide any details about their use of Geo2R, specifically what method was used to calculate differential expression and what thresholds were used to determine a DEG.  
  5. No details are provided about the requirements of directionality when comparing the different methods.  For the Venn diagrams in figure 4, are the genes required to be significantly changing in the same direction relative to controls?  If not, can the authors explain the discrepancies?  
  6. In between two statements about Figure 3 (the heat map), the authors state ' expression profile of the genes are clearly distinguishable between the patient and the control samples of each dataset'.  However, the heat map does not display any such information.  
  7. In the conclusion, the authors state that they identified 120 DEGs, however, that 120 number is based exclusively on the ExAtlas and Network Analyst analyses, and doesn't include the Geo2R results.  The authors should describe why the results of Geo2R analysis are not included.  And, if they're not important, why is the analysis included in this manuscript?   
  8. For the heat map figure, the authors discuss how the clustering shows the similarity of the RPL samples relative to the endometriosis samples, however no information is given as to how the heat map was generated.   What type of clustering was used?  What was the distance metric?   What type of linkage was used?

 

Overall, the authors perform a meta-analysis of endometriosis microarray experiments.  However, the authors fail to acknowledge previous studies that already attempt to do this.  In addition, the authors fail to provide nearly enough detail in the methods section to explain how they arrived at their results or to allow for someone to reproduce their findings.   Finally, the conclusion of the paper is focused on only two of the 3 methods for identifying DEGs.   If the DEGs from the Geo2R are not a part of the conclusion of the paper, the authors should justify why they are including that analysis.  

Author Response

Comment 1: A meta-analysis of endometriosis was published in January of last year (Poli-Neto et al. Scientific Reports, January 2020).  Results of the analysis of this manuscript should be compared to the Poli-Neto paper, and differences explained in the discussion.

Response: We agree with the reviewer's concern that our study should be compared with the study by Poli-Neto et al. We have added a couple of lines summarizing the main outcomes of both studies in the Discussion section (Lines 501-510).

 

Comment 2: The methods are ambiguous as to why other endometriosis microarray datasets were excluded from this analysis.  "Precautions were adopted to exclude datasets containing overlapping sample sets or datasets that have been generated from the same research laboratory" is not sufficiently descriptive for why the 5 were specifically chosen, or why the others were excluded (including those in the paper mentioned above).

Response: Following your suggestion, the specific inclusion criteria used to select the samples for our meta-analysis (which we had previously assumed to be obvious for such a study) are now stated in the revised manuscript (Page 3, Line 135). Only datasets that met these inclusion criteria were chosen for our study.

 

Comment 3: In the description of the analysis pertaining to Network Analyst, the authors mention using a 'Random effect model', a p-value cutoff, a FDR cutoff, and then the use of Limma.  More description is needed in this section, as it doesn't make sense what each of these components is being used for.  Why is there both a p-value and a FDR threshold being used?  Does Network Analyst identify DEGs, or was Limma used?

Response: Thank you for this interesting question. Network Analyst is a software tool that uses Limma as the statistical method for microarray differential expression analysis.

The FDR is a multiple-testing adjustment of the p-value. The p-value for a single gene (in its standard presentation) is computed independently of the other p-values in the profile, while the FDR takes those other tests into account. Both are useful: the FDR can be a good indicator of the strength of a study, while the p-value can be more useful for statistical power analyses in future studies. Therefore both values have been reported and, following your suggestion, the information has been updated in the revised manuscript (Page 3, Lines 125-127).
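To illustrate how a raw p-value and its FDR-adjusted counterpart can differ, the following is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. The function name and the five example p-values are hypothetical and chosen only for illustration; this is not the code used by Network Analyst or Limma.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the tests, sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes (illustration only).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]
adj_p = benjamini_hochberg(raw_p)

# Number of genes passing a 0.05 cutoff before and after adjustment.
n_raw = sum(p < 0.05 for p in raw_p)
n_adj = sum(p < 0.05 for p in adj_p)
```

In this toy example four genes pass a raw p-value cutoff of 0.05 but only two pass the same cutoff after adjustment, which is why the two thresholds are not interchangeable.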

 

Comment 4: The authors fail to provide any details about their use of Geo2R, specifically what method was used to calculate differential expression and what thresholds were used to determine a DEG.  

Response: Quantile normalization was performed, and the Benjamini & Hochberg false discovery rate method, which is selected by default, was used for the Geo2R analyses because it is the most commonly used adjustment for microarray data and provides a good balance between discovery of statistically significant genes and limitation of false positives (Haynes, 2013) [Benjamini–Hochberg Method. In: Dubitzky W., Wolkenhauer O., Cho KH., Yokota H. (eds) Encyclopedia of Systems Biology. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9863-7_1215].
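For readers unfamiliar with quantile normalization, the idea can be sketched in a few lines of plain Python. This is an illustrative sketch only, assuming a small genes-by-samples matrix with no tied values within a sample; the function name and the numbers are hypothetical, and it is not the implementation used by Geo2R.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix (list of rows).

    Assumes no tied values within a sample (column).
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the r-th smallest value across all samples.
    rank_means = [
        sum(col[r] for col in sorted_cols) / n_samples for r in range(n_genes)
    ]
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        # Gene indices ordered by their value in this sample.
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, i in enumerate(order):
            result[i][j] = rank_means[rank]
    return result


# Hypothetical expression values: 4 genes (rows) x 3 samples (columns).
expr = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 3.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(expr)
```

After normalization every sample contains the same set of values, so differences in overall intensity distribution between arrays are removed while the rank of each gene within its sample is preserved.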

 

Comment 5: No details are provided about the requirements of directionality when comparing the different methods.  For the Venn diagrams in figure 4, are the genes required to be significantly changing in the same direction relative to controls?  If not, can the authors explain the discrepancies?  

Response: Thank you. This is a crucial question regarding the approach we have adopted in this manuscript. Our explanation is given below.

Figure 4a is a Venn diagram showing the overlap of the lists of differentially expressed genes obtained from the individual datasets using Geo2R. Because the datasets are analyzed individually, they are not normalized against each other (cross-platform normalization). With this approach we detected 19 genes common to all 5 datasets.

The other two approaches, i.e. ExAtlas and Network Analyst, are based on Late Integration and Early Integration of microarray data. The differences between these two approaches are discussed in detail by Walsh et al., 2015 (Microarrays 4, 389-406, doi:10.3390/microarrays4030389).

Thus, the differences in the analytical principles of the three approaches explain the differences found among their outputs.

Following your suggestion, we have clearly stated in the revised manuscript that we considered all differentially expressed genes (both up-regulated and down-regulated) for the comparative analyses across the three approaches, because in this study we primarily aimed to identify genes that are differentially expressed (Page 8, Lines 268-270).
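The distinction the reviewer raises, overlap regardless of direction versus overlap restricted to genes changing in the same direction, can be made concrete with a short sketch. The gene names, log fold-change values and function name below are hypothetical placeholders, not data from the study.

```python
def common_degs(deg_lists, require_same_direction=False):
    """Intersect DEG lists given as dicts of gene -> log fold change.

    With require_same_direction=True, a shared gene is kept only if it
    is up-regulated in every dataset or down-regulated in every dataset.
    """
    shared = set(deg_lists[0])
    for degs in deg_lists[1:]:
        shared &= set(degs)
    if not require_same_direction:
        return shared
    return {
        gene
        for gene in shared
        if all(d[gene] > 0 for d in deg_lists)
        or all(d[gene] < 0 for d in deg_lists)
    }


# Hypothetical DEG lists from three datasets (illustration only).
dataset_1 = {"GENE_A": 1.2, "GENE_B": -0.8, "GENE_C": 2.1}
dataset_2 = {"GENE_A": 0.9, "GENE_B": 1.1, "GENE_D": -1.5}
dataset_3 = {"GENE_A": 1.7, "GENE_B": -0.6, "GENE_C": 0.4}

any_direction = common_degs([dataset_1, dataset_2, dataset_3])
same_direction = common_degs(
    [dataset_1, dataset_2, dataset_3], require_same_direction=True
)
```

Here GENE_B is shared by all three lists but changes sign in one of them, so it appears in the direction-agnostic overlap and drops out when the same direction is required.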

 

Comment 6: In between two statements about Figure 3 (the heat map), the authors state 'expression profile of the genes are clearly distinguishable between the patient and the control samples of each dataset'.  However, the heat map does not display any such information.  

Response: Thank you for this observation. In Figure 3, the legend shows the effect value ranging from -4 to +4. The effect value refers to the log ratio of the gene expression change/difference in the patients compared to the control. We stated that the ‘expression profile of the genes are clearly distinguishable between the patient and the control samples of each dataset’ on the basis that the effect value of each gene in each dataset (as indicated by the colour scale in the legend) denotes the difference between the patients and the controls for that gene in that dataset. However, if this statement is not clearly supported by the figure, we would prefer to omit it from the manuscript to avoid any misinterpretation of the results.
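As a worked illustration of a log-ratio effect value, the sketch below computes the base-2 log of the ratio of mean patient expression to mean control expression. The choice of base 2, the use of group means, the function name and the intensities are all assumptions made for illustration; the response does not specify how the tool computes the value.

```python
import math
from statistics import mean


def log2_ratio(patient_values, control_values):
    """Log2 of mean patient expression over mean control expression."""
    return math.log2(mean(patient_values) / mean(control_values))


# Hypothetical intensities for one gene in one dataset.
patients = [400, 420, 380]
controls = [100, 110, 90]

effect_up = log2_ratio(patients, controls)    # positive: higher in patients
effect_down = log2_ratio(controls, patients)  # negative: lower in "patients"
```

On this scale a value of +2 corresponds to a four-fold increase relative to controls and -2 to a four-fold decrease, so a legend running from -4 to +4 spans sixteen-fold changes in either direction.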

 

Comment 7: In the conclusion, the authors state that they identified 120 DEGs, however, that 120 number is based exclusively on the ExAtlas and Network Analyst analyses, and doesn't include the Geo2R results.  The authors should describe why the results of Geo2R analysis are not included.  And, if they're not important, why is the analysis included in this manuscript?   

Response: We kindly ask you to refer to the Venn diagram in Figure 4B, which shows the 120 genes that are common to the ExAtlas and Network Analyst approaches. Among those 120 genes, only one gene (TWIST2) is also shared with the Geo2R analyses (Fig 4A), and it has accordingly been included in all the downstream analyses (Page 8, Lines 260-280). Furthermore, TWIST2 was found to be involved in 13 of the first 20 pathways (based on node size), as evident from the pathway enrichment analyses (Page 14, Lines 450-453).

 

Comment 8: For the heat map figure, the authors discuss how the clustering shows the similarity of the RPL samples relative to the endometriosis samples, however no information is given as to how the heat map was generated. What type of clustering was used? What was the distance metric? What type of linkage was used?

Response: Indeed, this is a vital question and it had escaped our attention. The heatmap was generated using the ComplexHeatmap package of Bioconductor version 3.12 with R version 4.0. For this heatmap, hierarchical clustering was used with the predefined option for the distance metric based on the Pearson coefficient. Complete linkage was used by default for the hierarchical clustering. In our case all options were left at their defaults, and the protocol is described in detail in the study of Gu et al., 2016 (Page 4, Line 182) [Gu, Z.; Eils, R.; Schlesner, M. Complex Heatmaps Reveal Patterns and Correlations in Multidimensional Genomic Data. Bioinformatics 2016, 32, 2847–2849, doi:10.1093/bioinformatics/btw313].
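The clustering settings named in this response (Pearson-based distance with complete linkage) can be sketched in plain Python as follows. This is an illustrative sketch, not the ComplexHeatmap implementation: the distance is assumed to be 1 minus the Pearson correlation, profiles are assumed to be non-constant, and the sample names and values are hypothetical.

```python
import math
from statistics import mean


def pearson_distance(x, y):
    """1 - Pearson correlation; assumes neither profile is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def complete_linkage(profiles):
    """Agglomerative clustering; returns a list of (left, right, distance)."""
    clusters = [(name,) for name in profiles]
    merges = []

    def dist(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(
            pearson_distance(profiles[a], profiles[b]) for a in c1 for b in c2
        )

    while len(clusters) > 1:
        pairs = [
            (dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        d, i, j = min(pairs)  # closest pair of clusters
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical profiles: S1/S2 rise together, S3/S4 fall together.
profiles = {
    "S1": [1.0, 2.0, 3.0, 4.0],
    "S2": [2.0, 4.0, 6.0, 8.5],
    "S3": [4.0, 3.0, 2.0, 1.0],
    "S4": [8.0, 6.0, 4.5, 2.0],
}
merges = complete_linkage(profiles)
```

With these toy profiles the two correlated pairs are merged first, and the final merge joins the two anti-correlated groups at a distance close to 2, the maximum possible under this metric.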
