Multi-Task Neural Networks and Molecular Fingerprints to Enhance Compound Identification from LC-MS/MS Data
Round 1
Reviewer 1 Report
The authors developed a novel multi-task model to predict molecular fingerprints from LC-MS/MS data, with the aim of providing fast, automated tools to support the identification of metabolite structures. The model uses multi-task ANNs, with Sparse Principal Component Analysis for dimensionality reduction of the MS data. The similarity between predicted and actual fingerprints was measured by the Jaccard-Tanimoto binary similarity index. Although the average similarity between the predicted and true fingerprints was between 0.43 and 0.47 on the test set, much better results were achieved on the calibration and validation sets. The authors also found that the accuracy of the prediction depends on the complexity of the chemical structure: the higher the complexity, the better the prediction. This is in accordance with the learning abilities of ANNs, which are better with more detailed information than with sparse MS fingerprints and a sparse structural data space.
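For readers unfamiliar with the Jaccard-Tanimoto index mentioned above, a minimal sketch follows. It is only an illustration of the standard definition (bits set in both fingerprints divided by bits set in either); the function name `tanimoto`, the toy 8-bit fingerprints, and the convention for two all-zero vectors are assumptions made here and are not taken from the manuscript.

```python
def tanimoto(a, b):
    """Jaccard-Tanimoto similarity between two equal-length binary vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    # Bits set in both fingerprints (intersection)
    both = sum(1 for x, y in zip(a, b) if x and y)
    # Bits set in at least one fingerprint (union)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumed convention: two all-zero fingerprints are treated as identical
    return 1.0 if either == 0 else both / either


# Hypothetical toy fingerprints, for illustration only
predicted = [1, 0, 1, 1, 0, 0, 1, 0]
actual = [1, 1, 1, 0, 0, 0, 1, 0]
print(tanimoto(predicted, actual))
```

In this toy case three bits are shared and five bits are set in at least one vector, so the index is 3/5. A value of 1 means identical fingerprints and 0 means no shared bits, which gives a scale for reading the reported test-set averages of 0.43 to 0.47.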
The article is excellent! It is well written. The authors are renowned experts in the field of Chemometrics. I can wholeheartedly recommend it for publication in your journal.
Reviewer 2 Report
The manuscript describes the use of artificial neural networks to predict the fingerprints that identify the structure of a new, unknown metabolite. The network takes the mass spectrometry (MS) spectra of a metabolite data set as input and produces the fingerprint of the metabolite structure as output. The work was performed using two MS data sets, one with 40K samples and another with 12K samples. The authors show that the predictions are almost independent of the size of the data set. Furthermore, once trained, the system can predict the fingerprints of an unknown metabolite from its MS spectrum.
The authors performed detailed data pre-processing to exclude anomalous spectra that could lead to faulty training and, consequently, wrong predictions of the metabolite structure. They describe the model architecture and the optimization of the parameters and hyperparameters needed to obtain a learning system that produces reliable predictions. Therefore, the manuscript is suitable for publication in the journal Molecules, in the Cross-Field Chemistry section.