Disease Prediction Using Graph Machine Learning Based on Electronic Health Data: A Review of Approaches and Trends
Round 1
Reviewer 1 Report
The work looks fine, but the similarity report shows a high percentage of similarity with other work.
Author Response
This is our own paper and an original submission that has not previously been peer reviewed. We obtained a similarity report, and in that report we did not find a single sentence copied and pasted in its entirety. We nevertheless went through the manuscript and took steps to reduce the similarity percentage of the revised manuscript.
Reviewer 2 Report
The authors presented a review article on recently published research studies about disease prediction using graph machine learning. The paper is nicely structured, with sufficient background information and terminology. The graphical illustration of shallow embedding and GNN is clear. The advantages and disadvantages of the discussed GNN models are clearly presented. Overall, the paper has merit to be published in Healthcare. There are a few points that I would like to ask the authors to consider before it can be published:
1) There are some typos/formatting issues in the paper. For example, Line 109 (three keywords are mentioned, but there are more than three keywords in the example provided), Line 134 (correct the section index), Tables 1 and 2 (if you are to provide the authors' names, please make it consistent, since some references do not have the authors' names listed).
2) Figures 1 and 2 need to be colored. It is hard to tell the grayscale difference in parts (a) and (b) when the nodes and data are presented.
3) Explain all terms in the equations used in this paper. Definitions of some parameters are missing. For example, N(u) in equation 1, the theta term in equation 3, and H(l) in equation 5. Also, when acronyms are first used, spell them out in full (e.g., Line 262, Long short-term memory?).
4) Another suggestion is to provide a few graphical examples for node classification and link prediction from the reviewed literature so that the real data can be visualized.
Author Response
(1) Thank you for this valuable comment. We fixed the typos and formatting issues in the paper. Please see the highlighted text on page 3, line 125, and on page 13.
(2) We changed the figures to color; please see Figures 1 and 2 in the revised manuscript.
(3) We added the definitions; please check lines 212, 242, and 259, and the abbreviation table in the appendix.
(4) Thank you for this comment. We added the graphical examples; please see Figure 7.
Reviewer 3 Report
Overall, the manuscript is very well written. Please find below my remarks regarding the paper:
1) The manuscript has done an excellent job in reviewing the use of graph machine learning in the prediction of diseases from electronic health records. The quality of writing is also very good.
2) It would be very interesting if the authors could elaborate on the kinds of diseases that have been predicted using graph machine learning algorithms. A description of the health data that have been used (such as the type of data, the size of data, the quality of data, and any other details) would also be helpful.
3) Please ensure consistency in Table 2. In the ‘reference’ column, the third row is written as [67], but it is different in other rows.
4) Also in Table 2, for the work by Santamaria [65], the area under the ROC curve is given as 0.74. However, multiple diseases have been predicted. Therefore, it needs to be made clear which disease the area under the ROC curve of 0.74 refers to. What about the other diseases? The same question is valid for the work shown in the next row [67].
5) Future directions (Section 4) require more in-depth discussion. It would be great if the authors could go into more detail about what kind of prediction is important and how to go about it.
Author Response
(1) Thank you for your comment.
(2) Thank you for this feedback. We added a description of the health data in Table 1; please see the revised paper.
(3) We fixed this issue; please check Table 1 in the revised manuscript. Thank you.
(4) There was only one area under the ROC curve because they were making a multi-class prediction. For example, there are five labels in the dataset, and they used one model to predict these diseases.
(5) Thanks for the comment. We added “Moreover, in addition to the diseases mentioned above. Other diseases, such as COVID-19 and Thyroid, are current diseases of concern. It is also worth investigating how to use graph machine learning techniques to predict these diseases.” in Section 6.3. Please see lines 542-545 of the revised manuscript.
Reviewer 4 Report
Dear authors,
It has been a pleasure to review this manuscript. The paper reflects the hard work you have put into it. However, I have some suggestions to improve this paper:
- There are studies performed during the COVID pandemic period which deserve our attention (e.g., https://www.mdpi.com/2075-1729/11/11/1281).
- What about papers regarding the pediatric population? (e.g., https://www.mdpi.com/1648-9144/57/4/395)
- It would be helpful to add an abbreviation section.
Author Response
Thanks for the comment. We reviewed these papers and added the abbreviation table to the revised manuscript. Please see lines 542–545 of the revised manuscript and the appendix.
Round 2
Reviewer 2 Report
The authors have addressed the points listed from the first round of review. Therefore, I would like to recommend the publication of this paper.
Reviewer 3 Report
I am happy with the changes made by the authors.