MRlogP: Transfer Learning Enables Accurate logP Prediction Using Small Experimental Training Datasets
Round 1
Reviewer 1 Report
1- The authors designed and evaluated their method very well, but the results looked very strange to me when I tried the method on the following reference compounds:
2,2',4,4',5-pentachlorobiphenyl: your result is 4.237, whereas the reported value is 6.410.
Acetamide: your result is -0.297, whereas the reported value is -1.155.
So I think you need to redesign your algorithm to make it more reliable.
Author Response
Reviewer comment:
The authors designed and evaluated their method very well, but the results looked very strange to me when I tried the method on the following reference compounds: 2,2',4,4',5-pentachlorobiphenyl, for which your result is 4.237 and the reported value is 6.410, and acetamide, for which your result is -0.297 and the reported value is -1.155. So I think you need to redesign your algorithm to make it more reliable.
Author response:
We thank the reviewer for their favourable evaluation of the experimental design and methodology used in the creation of MRlogP. We are also encouraged that they tested the predictor on compounds of their own choosing (presumably using the website developed to showcase the algorithm). It is unfortunate that the reviewer was unhappy with the logPs returned for 2,2',4,4',5-pentachlorobiphenyl (PCB) and acetamide, which correspond to prediction errors of 2.173 and 0.858 log units, respectively. We would like to highlight, however, that MRlogP is tuned for logP prediction on druglike small molecules, and its intended users are drug discovery scientists and medicinal chemists. Although included in some literature studies, polychlorinated biphenyls are typically classed as environmental contaminants from industrial processes and are highly non-druglike (QED score of 0.52). Additionally, the manuscript notes that MRlogP performs particularly well in logP ranges where abundant training data were available. Supporting Information Figure S3 shows that the expected error within the high (non-druglike) logP bin of 6-7, which contains PCB's true logP value, is around 3 log units. The MRlogP prediction error of 2.173 is below this expected error, and we are therefore encouraged that MRlogP performed better than expected on this molecule, which lies outside its intended application domain. The second molecule, acetamide, is a very small (4 heavy atoms, molecular weight 59 Da), non-druglike compound (QED score of 0.40). Its prediction error of 0.858 is well within our reported logP prediction accuracy, despite the molecule also lying outside our intended application domain.
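For transparency, the two error figures quoted in this response follow directly from the values supplied by the reviewer. A minimal Python sketch of the arithmetic is given below; the `absolute_error` helper and the dictionary layout are illustrative only and are not part of the MRlogP code base, and the 3.0 log unit threshold is simply the approximate expected error for the 6-7 logP bin as read from Supporting Information Figure S3.

```python
# Reported (experimental) and MRlogP-predicted logP values quoted by Reviewer 1.
PCB = "2,2',4,4',5-pentachlorobiphenyl"
compounds = {
    PCB: {"reported": 6.410, "predicted": 4.237},
    "acetamide": {"reported": -1.155, "predicted": -0.297},
}

# Approximate expected error in the 6-7 logP bin, as read from Figure S3.
EXPECTED_ERROR_6_7_BIN = 3.0


def absolute_error(reported: float, predicted: float) -> float:
    """Absolute prediction error in log units, rounded to 3 decimal places."""
    return round(abs(reported - predicted), 3)


errors = {name: absolute_error(v["reported"], v["predicted"]) for name, v in compounds.items()}

for name, err in errors.items():
    print(f"{name}: {err:.3f} log units")

# Compare the PCB error against the approximate bin-level expectation.
print("PCB error below expected bin error:", errors[PCB] < EXPECTED_ERROR_6_7_BIN)
```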
Although we are encouraged by the performance of MRlogP on the non-druglike molecules tested by Reviewer 1, we have added a warning to the MRlogP web predictor emphasising that MRlogP is optimised for druglike small molecule prediction, together with a link to a popular web service that implements multiple druglikeness-related evaluations.
Reviewer 2 Report
In Manuscript ID processes-1337108, the authors present an interesting study on the prediction of the octanol/water partition coefficient of druglike compounds using an artificial neural network technique. This study is comprehensive and well documented, and it includes accurate results, with the creation of an open, freely available, druglike small-molecule logP predictor. The subject of this study is interesting and brings valuable new information to the application of small-molecule lipophilicity prediction in drug design, and I recommend this paper for publication. My only suggestion is to provide the structures (perhaps in SMILES notation) of the 224 druglike compounds selected for prediction and used to calculate the MRlogP values, by including them in the Supporting Information together with their experimental logP values.
Author Response
Reviewer comment:
In Manuscript ID processes-1337108, the authors present an interesting study on the prediction of the octanol/water partition coefficient of druglike compounds using an artificial neural network technique. This study is comprehensive and well documented, and it includes accurate results, with the creation of an open, freely available, druglike small-molecule logP predictor. The subject of this study is interesting and brings valuable new information to the application of small-molecule lipophilicity prediction in drug design, and I recommend this paper for publication.
Author response
We thank the reviewer for their extremely positive review and are pleased to see that they recommend the paper for publication.
Reviewer comment:
My only suggestion is to provide the structures (perhaps in SMILES notation) of the 224 druglike compounds selected for prediction and used to calculate the MRlogP values, by including them in the Supporting Information together with their experimental logP values.
Author response
We were pleased to make this addition, which strengthens our manuscript, and we thank the reviewer for pointing out this opportunity. We have now expanded the Supporting Information to include Table S5, which lists the SMILES representations, experimentally determined logP values, and QED scores of the 224 Martel druglike compounds used in the transfer learning part of the study.
Reviewer 3 Report
The authors developed a consensus predictive model for predicting the logP of drug-like compounds and then used a very small dataset to fine-tune the model. The authors claim that their models achieve better performance than JPlogP, an existing consensus model. However, I do not think the comparison is fair, since the authors applied a series of filtering steps and transformations to the training and test datasets used in the study. At best, the presented research is a minor incremental improvement, even if their claim is true. Thus, I cannot recommend it for publication.
Author Response
Reviewer comment:
The authors developed a consensus predictive model for predicting the logP of drug-like compounds and then used a very small dataset to fine-tune the model. The authors claim that their models achieve better performance than JPlogP, an existing consensus model. However, I do not think the comparison is fair, since the authors applied a series of filtering steps and transformations to the training and test datasets used in the study. At best, the presented research is a minor incremental improvement, even if their claim is true. Thus, I cannot recommend it for publication.
Author response
We were very disappointed to read that reviewer 3 could not recommend our manuscript for publication. We would like to address a few points which we believe are inaccurate or misleading in their review.
From the outset, MRlogP aimed to be the most accurate predictor for druglike compounds. The filtering steps employed are clearly described and carried out with the intention of supplying our algorithms with druglike compounds. We evaluated MRlogP on more than 25,000 compounds, significantly more than JPlogP’s 974 test compounds.
MRlogP is also freely available for use by all, via Python source code or the web portal we constructed, allowing drug discovery scientists and medicinal chemists to perform predictions on compounds rapidly. We believe this is in stark contrast to the public availability of JPlogP, which has been released in two forms. 1) A cut-down version, different from that described in the paper, was released and integrated into the CDK toolkit; it requires knowledge of the Java programming language to compile and run. The version of JPlogP described in the paper is presumably proprietary, as the paper's conclusions describe the released version as being “available for release without any licensing agreements”. 2) The paper notes that “JPlogP will be available as a KNIME node from Lhasa Limited in the trusted community contributions in due course”. However, since its publication in 2018, no release of the Lhasa nodes has included JPlogP functionality, and the release is not mentioned on the “Lhasa Nodes for KNIME” website.
MRlogP strives to compete with open-source, publicly accessible logP predictors. MRlogP is immediately usable and freely available, and it was in fact released to the public before JPlogP. MRlogP has also been tested by drug discovery scientists not connected to the Auer Lab and has received strongly positive comments and recommendations.
Reviewer 4 Report
Chen et al. describe a new method to predict the logP (partitioning between nonpolar and polar solvents) of drug-like compounds. Accurate prediction of logP is very important for the design of new drug molecules. The authors use an ANN method trained on large sets of compounds and demonstrate superior performance compared to existing methods. I have only a few minor concerns:
- The authors report a feature vector with dimension 316. Although some features are mentioned in the manuscript, I could not find an explanation for this number of molecular descriptors. The choice of this number is critical for the evaluation of the study.
- The authors report improved performance compared to existing prediction methods. However, an explanation of how existing approaches work (based on physical descriptors, or data-driven statistical approaches) and how they differ from the ANN approach is not given.
- It would also be interesting to learn which molecular descriptors are the most informative (or most important) for determining logP, information that could be extracted from the ANN the authors designed.
Author Response
Reviewer comment:
Chen et al. describe a new method to predict the logP (partitioning between nonpolar and polar solvents) of drug-like compounds. Accurate prediction of logP is very important for the design of new drug molecules. The authors use an ANN method trained on large sets of compounds and demonstrate superior performance compared to existing methods. I have only a few minor concerns:
The authors report a feature vector with dimension 316. Although some features are mentioned in the manuscript, I could not find an explanation for this number of molecular descriptors. The choice of this number is critical for the evaluation of the study.
The authors report improved performance compared to existing prediction methods. However, an explanation of how existing approaches work (based on physical descriptors, or data-driven statistical approaches) and how they differ from the ANN approach is not given.
It would also be interesting to learn which molecular descriptors are the most informative (or most important) for determining logP, information that could be extracted from the ANN the authors designed.
Author response
We thank the reviewer for highlighting where more clarity could be added to the feature vector definition. We have now expanded the text around line 165 to indicate that the 316 features comprise 128 molecular descriptors from the FP4 fingerprint, 128 from the ECFP4 fingerprint, and 60 from USRCAT. Our rationale for choosing these three diverse descriptor sets is given in the Materials and Methods section, where we envisage ECFP4 capturing graph connectivity within the molecule, FP4 capturing larger moieties, and USRCAT capturing the overall 3D shape and electrostatics of the molecule.
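The layout of the 316-element input can be illustrated with a short Python sketch. It assumes, hypothetically, that the three blocks are concatenated in the order FP4, ECFP4, USRCAT and that each fingerprint has already been reduced to 128 bits; the `assemble_features` function and the placeholder inputs are illustrative only, and real descriptor values would be computed with a cheminformatics toolkit.

```python
from typing import Sequence

# Block sizes stated in the response: 128 (FP4) + 128 (ECFP4) + 60 (USRCAT) = 316.
FP4_BITS = 128
ECFP4_BITS = 128
USRCAT_LEN = 60


def assemble_features(
    fp4: Sequence[int], ecfp4: Sequence[int], usrcat: Sequence[float]
) -> list[float]:
    """Concatenate the three descriptor blocks into a single 316-element vector.

    The block order (FP4, ECFP4, USRCAT) is an assumption for illustration.
    """
    for name, block, size in (
        ("FP4", fp4, FP4_BITS),
        ("ECFP4", ecfp4, ECFP4_BITS),
        ("USRCAT", usrcat, USRCAT_LEN),
    ):
        if len(block) != size:
            raise ValueError(f"{name} block has {len(block)} elements, expected {size}")
    return [float(x) for x in (*fp4, *ecfp4, *usrcat)]


# Placeholder inputs; real values would come from a cheminformatics toolkit.
vector = assemble_features([0] * FP4_BITS, [1] * ECFP4_BITS, [0.5] * USRCAT_LEN)
print(len(vector))  # 316
```

The length check makes a mis-sized descriptor block fail loudly rather than silently shifting every downstream feature position.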
We have also taken up the reviewer’s request to enhance the explanation of approaches taken by other techniques. We have now extended this section with additions beginning around line 53.
Although understanding the contribution of individual molecular descriptors is a worthwhile pursuit, folding the binary fingerprints down to 128 bits makes this task difficult. Many overall molecular properties will certainly make large contributions to logP, such as polar surface area, which is closely related to the number of hydrogen bond donors and acceptors present and is encoded across multiple input features. Detecting larger molecular moieties, such as solubilising groups, likewise requires representation using multiple input descriptors. We therefore expect higher-order features, rather than individual members of the 316 descriptors, to be the main drivers of logP prediction.
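The interpretability problem caused by folding can be illustrated with a minimal Python sketch of modulo-based OR-folding, one common way of shortening a bit-vector fingerprint. Whether MRlogP's fingerprints were folded exactly this way is an assumption here, and the "on" bit indices below are invented for illustration.

```python
def fold_bits(on_bits: set[int], folded_size: int = 128) -> set[int]:
    """OR-fold a sparse fingerprint (the set of 'on' bit indices) to folded_size bits.

    Each original bit index i maps to i % folded_size, so several unrelated
    substructure bits can end up sharing the same folded bit.
    """
    return {i % folded_size for i in on_bits}


def collisions(on_bits: set[int], folded_size: int = 128) -> dict[int, list[int]]:
    """Group original bits by their folded position, keeping only shared positions."""
    groups: dict[int, list[int]] = {}
    for i in sorted(on_bits):
        groups.setdefault(i % folded_size, []).append(i)
    return {k: v for k, v in groups.items() if len(v) > 1}


# Hypothetical 'on' bits from a longer (e.g. 2048-bit) fingerprint.
original = {5, 133, 700, 1029, 1900}
print(sorted(fold_bits(original)))  # [5, 60, 108]
print(collisions(original))         # {5: [5, 133, 1029]}
```

Here three unrelated original bits (5, 133 and 1029) all map to folded bit 5, so any weight the network assigns to that bit cannot be attributed to a single substructure.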
Reviewer 5 Report
This is a good paper that constructs a new logP predictor using modern machine learning techniques together with transfer learning.
The Martel_DL dataset, comprising high-quality data for 224 druglike compounds, was employed for transfer learning to improve the accuracy of the predictor.
The performance of the proposed method is shown in Figure 2 and Table 1.
The present work clearly provides the best performing freely available druglike small molecule logP predictor.
My only concern relates to Figure 3: whether an RMSE of 1-2 log units is satisfactory for practical pharmaceutical applications, which may be addressed further.
Author Response
Reviewer comment:
This is a good paper that constructs a new logP predictor using modern machine learning techniques together with transfer learning.
The Martel_DL dataset, comprising high-quality data for 224 druglike compounds, was employed for transfer learning to improve the accuracy of the predictor.
The performance of the proposed method is shown in Figure 2 and Table 1.
The present work clearly provides the best performing freely available druglike small molecule logP predictor.
My only concern relates to Figure 3: whether an RMSE of 1-2 log units is satisfactory for practical pharmaceutical applications, which may be addressed further.
Author response
We thank the reviewer for their positive comments and are greatly encouraged that they agree the work presents the best-performing freely available druglike small molecule logP predictor. Although root mean squared errors of more than one log unit appear high, small-molecule lipophilicity prediction is a difficult task in which our freely available, open predictor competes well. LogP prediction is certainly used in medicinal chemistry programmes, cheminformatics and early-stage drug discovery, and improvements in accuracy benefit all of these areas. The historical use of logP predictors points to their usefulness in pharmaceutical and drug discovery settings. We hope that the open-source nature and availability of MRlogP will contribute towards future reductions in these prediction errors of 1-2 log units RMSE.
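For reference, the RMSE values discussed here can be computed as in the Python sketch below, shown alongside the mean absolute error (MAE). The three observed/predicted pairs are hypothetical and were chosen only to show that, because RMSE squares each error before averaging, a single large miss pushes it noticeably above the MAE.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between experimental and predicted logP values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, which weights all errors linearly."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical example: three compounds with errors of 0.5, 1.0 and 2.0 log units.
obs = [1.0, 2.0, 3.0]
pred = [1.5, 3.0, 1.0]
print(round(rmse(obs, pred), 3))  # 1.323
print(round(mae(obs, pred), 3))   # 1.167
```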
We aim to publish MRlogP as quickly as possible, while exploring further applications of small, accurate datasets exploited through transfer learning.
Round 2
Reviewer 1 Report
1- The compounds I used are reference compounds for other methods. The authors have to define the requirements that compounds must meet for their method, for example number of atoms, number of rings, molecular weight, and so on.
Author Response
Reviewer comment:
The compounds I used are reference compounds for other methods. The authors have to define the requirements that compounds must meet for their method, for example number of atoms, number of rings, molecular weight, and so on.
Author response:
We thank the reviewer for their comments and for trying our logP predictor. We understand that the compounds the reviewer chose to test our predictor are used as standards for analytical techniques. We have taken on board the reviewer's advice to strengthen the description of the compounds for which MRlogP is tailored, expanding on the term 'druglike' and noting the advantage that the QED method brings by quantifying this otherwise subjective term (additions at line 88). We would like to reiterate and emphasize that, in predicting logPs for the reviewer's compounds, our predictor performed within the estimated error for the logP range of 2,2',4,4',5-pentachlorobiphenyl and within the overall reported error for acetamide, even though both are non-druglike (QED scores of 0.52 and 0.40, respectively). We hope that expanding on the druglike criteria in the manuscript, together with predictions that fall within the expected errors, has addressed the reviewer's concerns.
Reviewer 3 Report
I thank the authors for their responses. However, I am still not convinced that the manuscript should be accepted for publication. The authors claimed that their model "greatly outperforms JPlogP on the Reaxys_DL", but in reality Figure 3 clearly shows that it is only slightly better in the range of 0.5-4.5, and this was based on a dataset the authors collected. While it is true that the logP values of many drugs fall within this range, certain drugs may not (e.g. sublingually absorbed drugs). Also, is this improvement statistically significant at all?
Accessibility is certainly important, but I would like to see a true improvement first.
Author Response
Reviewer comment:
I thank the authors for their responses. However, I am still not convinced that the manuscript should be accepted for publication. The authors claimed that their model "greatly outperforms JPlogP on the Reaxys_DL", but in reality Figure 3 clearly shows that it is only slightly better in the range of 0.5-4.5, and this was based on a dataset the authors collected. While it is true that the logP values of many drugs fall within this range, certain drugs may not (e.g. sublingually absorbed drugs). Also, is this improvement statistically significant at all?
Accessibility is certainly important, but I would like to see a true improvement first.
Author response
We thank the reviewer for highlighting their concerns about the strength of our statements regarding MRlogP versus JPlogP performance on the Reaxys_DL dataset, and we have removed the word “greatly” from this discussion. As to significance, a two-tailed t-test (unequal variances) on the absolute MRlogP and JPlogP prediction errors within the Reaxys_DL dataset gives a p-value of less than 0.0001, indicating that the performance improvement is highly statistically significant. Taken together with our previous explanation that the version of JPlogP reported in the literature is not publicly available, this means that MRlogP represents a valuable, citable, open, and expandable resource for logP prediction. We hope that we have answered all of the reviewer's concerns and demonstrated a significant improvement over JPlogP for druglike small molecules.
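The significance test described above, a two-tailed t-test with unequal variances, is Welch's t-test. A minimal, dependency-free Python sketch is given below; it assumes the inputs are per-compound absolute errors for the two predictors, and for simplicity it approximates the p-value with the standard normal distribution, which is only adequate when the Welch-Satterthwaite degrees of freedom are large. The example data are synthetic and are not MRlogP or JPlogP results. Where SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test with an exact t-distribution p-value.

```python
import math
import statistics
from typing import Sequence


def welch_t_test(a: Sequence[float], b: Sequence[float]) -> tuple[float, float, float]:
    """Two-tailed Welch's t-test (unequal variances).

    Returns (t statistic, Welch-Satterthwaite degrees of freedom, approximate p-value).
    The p-value uses a standard normal approximation to the t distribution,
    which is only reasonable when the degrees of freedom are large.
    """
    n_a, n_b = len(a), len(b)
    # Squared standard errors of each mean, from the sample variances.
    se2_a = statistics.variance(a) / n_a
    se2_b = statistics.variance(b) / n_b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    # Two-tailed normal p-value: 2 * (1 - Phi(|t|)) == erfc(|t| / sqrt(2)).
    p_two_tailed = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p_two_tailed


# Synthetic absolute errors (NOT MRlogP/JPlogP data): errors_b is errors_a shifted by 0.2.
errors_a = [0.1 * (i % 10) for i in range(1000)]
errors_b = [e + 0.2 for e in errors_a]
t, df, p = welch_t_test(errors_a, errors_b)
print(f"t = {t:.2f}, df = {df:.0f}, p = {p:.1e}")
```

A negative t here means the first set of errors has the lower mean; the direction of the comparison therefore depends on the order in which the two error arrays are passed.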