A Hybrid Deep Learning Model for Protein–Protein Interactions Extraction from Biomedical Literature
Round 1
Reviewer 1 Report
The paper presents a deep learning model for extracting protein-protein interactions from the literature. My main concern is the presentation of the method details, some of which should, in my opinion, be improved:
- Could you write more about the embedding (lines 138-142)? The statement that "Two embedding layeres are two (...) tensors" is confusing.
- The meaning of equations (1) to (3) is not clear. Could you write something more? Perhaps an additional figure illustrating the LSTM network would be useful.
- Figure 1 suggests that the CNN uses 2D convolutions, while formula (9) seems to use 1D convolutions.
- The last paragraph of Section 3 suggests that there is only one convolutional layer, while from Table 2 I would guess that there are three layers.
It would also be a good idea to make the mathematical notation more consistent. The same symbols are used twice in different contexts (e.g. h in lines 158 and 176). Moreover, in line 175 the output of the Bidirectional LSTM is denoted as x, but earlier y was used.
Other comments:
- It could be a good idea to move URLs from the text to the reference section (e.g. lines 137, 209).
- In your text you use the term "shortest dependency path" and its abbreviation "sdp" with inconsistent capitalization (e.g. lines 190 and 17).
- In line 50 there should be "are the features" instead of "is the features".
- In line 48, isn't something missing before the = sign?
- In line 197 there is a doubled "are".
- In Table 1, why is the number of positive and negative sentences larger than the number of sentences?
- Lines 261-262: how can one observe classification performance in Figure 3?
- In Figure 4 the LSTM model (orange line) seems to get better and better results. Is it possible that after longer training it would outperform your model? Why are such close results not visible in Table 3? In line 271 you write that there is a problem with longer training. Why? Training is performed only once. Could you comment on this in your text?
- The sentence in line 295 is not clear. Does it mean that other models with their original preprocessing, tokenization, parsing tools, etc. would be better?
- Could you explain in the concluding sections why, in your opinion, this CNN block after the Bidirectional LSTM yields better results?
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
The development of models for extracting protein-protein interactions from biomedical literature is a currently unsolved problem. A search of the “Science Direct” database using the keywords "protein-protein Interactions Extraction from Biomedical Literature" returned a substantial number of papers, including approaches based on DL models, like the authors' proposed work. Therefore, the manuscript presented here can be considered one more contribution on this topic. Concerning the manuscript format, the authors have followed the current rules for scientific writing, the paper is well organized, and no major English corrections are needed. The figures are of sufficient quality and the mathematical formulas are well used. From the content point of view, the authors have taken care to compare the proposed method with other previously published methods. The proposed method appears to be feasible and practical to implement. However, before accepting this manuscript for publication, I ask the authors to consider the following issues and questions:
- The authors cite 42 references, which demonstrates that you have taken care to review the literature in this area. However, some updates to the literature would be welcome: only 26% of the cited works are from the last 5 years (2015-2020), and only two are from the last 2 years (2018-2019).
- As is well known, DL models need large amounts of training data to be robust and accurate. Do you consider these two datasets sufficient to consider your method innovative? Could you please explain or justify this?
- Section 3, "The Model Description", which for me is a critical part because it corresponds to your contributions, is very limited. I was unable to determine what the novelty of your work is compared with the current state-of-the-art methods in this area. Could you please describe the novelty of your work more clearly? What are your actual contributions beyond previously developed methods, specifically with regard to DL approaches?
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
I have observed that the authors have answered and taken into account all the issues and questions I raised in the newly submitted version of the paper.
While I consider that the authors can still improve the writing of the manuscript, in particular the aspects related to novelty (their own contributions to the state of the art), it is my opinion that the manuscript is now in better condition. Therefore, I give my approval to publish this new version of the work.