Automated Recommendation of Research Keywords from PubMed That Suggest the Molecular Mechanism Associated with Biomarker Metabolites
Abstract
1. Introduction
2. Results
2.1. Preparation of Example MeSH Terms
2.2. Development of the MeSH Term Search Method
2.3. Efficient Literature Survey Guided by the Obtained MeSH Terms
2.4. Summarization of the MeSH Terms by Over-Represented Analysis
2.5. Considerable Variations in Number of Obtained MeSH Terms among Metabolites and Keywords
3. Discussion
4. Materials and Methods
4.1. Computational Resources and Code Availability
4.2. Acquisition of PubMed and MeSH Term Data
4.3. Calculation of Connectivity Score S Using Co-Occurrence Information Derived from PubMed
4.4. Construction of a Randomized DB and Estimation of FDR
4.5. Procedure for Finding MeSH Terms That Associate with Two MeSH Terms
- (1) Two MeSH terms were prepared: the metabolite c and the researcher’s known keyword k. The list of the 13,985 available MeSH terms is provided in Data S1.
- (2) For each candidate MeSH term k′, the connectivity score S(c, k′, k) was calculated using the confidence (LR) method and Equation (1).
- (3) The p-value of the connectivity score S(c, k′, k) was determined from a null distribution using Equation (2).
- (4) The FDR was obtained from the p-value using the Benjamini–Hochberg method [18].
- (5) All MeSH terms k′ whose FDR was below the threshold were returned as answer keywords.
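
Steps (3) to (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Equation (2) is not reproduced in this section, so the empirical p-value below (the smoothed fraction of null scores at least as large as the observed score) is a common stand-in and not necessarily the authors' formula. The function names, the `fdr_threshold` default of 0.05, and the input lists are hypothetical.

```python
from bisect import bisect_left


def empirical_p_values(scores, null_scores):
    """Assumed stand-in for Equation (2): smoothed fraction of null scores >= score."""
    null_sorted = sorted(null_scores)
    n = len(null_sorted)
    p_values = []
    for s in scores:
        n_ge = n - bisect_left(null_sorted, s)  # number of null scores >= s
        p_values.append((n_ge + 1) / (n + 1))
    return p_values


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_terms(terms, scores, null_scores, fdr_threshold=0.05):
    """Keep candidate terms whose FDR is below the (hypothetical) threshold."""
    p_values = empirical_p_values(scores, null_scores)
    fdr = benjamini_hochberg(p_values)
    hits = [(t, s, q) for t, s, q in zip(terms, scores, fdr) if q < fdr_threshold]
    return sorted(hits, key=lambda hit: hit[2])  # most significant first
```

Calling `select_terms(candidate_terms, scores, null_scores)` would return the candidate terms k′ ordered by FDR; in practice the null scores would come from the randomized DB described in Section 4.4.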
4.6. Over-Representation Analysis
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Wishart, D.S. Emerging applications of metabolomics in drug discovery and precision medicine. Nat. Rev. Drug Discov. 2016, 15, 473–484.
- Wheelock, C.E.; Goss, V.M.; Balgoma, D.; Nicholas, B.; Brandsma, J.; Skipp, P.J.; Snowden, S.; Burg, D.; D’Amico, A.; Horvath, I.; et al. Application of ‘omics technologies to biomarker discovery in inflammatory lung diseases. Eur. Respir. J. 2013, 42, 802–825.
- Johnson, C.H.; Ivanisevic, J.; Benton, H.P.; Siuzdak, G. Bioinformatics: The next frontier of metabolomics. Anal. Chem. 2015, 87, 147–156.
- Matsuda, F.; Toya, Y.; Shimizu, H. Learning from quantitative data to understand central carbon metabolism. Biotechnol. Adv. 2017, 35, 971–980.
- Barupal, D.K.; Fan, S.; Fiehn, O. Integrating bioinformatics approaches for a comprehensive interpretation of metabolomics datasets. Curr. Opin. Biotechnol. 2018, 54, 1–9.
- Johnson, C.H.; Ivanisevic, J.; Siuzdak, G. Metabolomics: Beyond biomarkers and towards mechanisms. Nat. Rev. Mol. Cell Biol. 2016, 17, 451–459.
- Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-Based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International Conference on World Wide Web, WWW 2001, Hong Kong, China, 1–5 May 2001; pp. 285–295.
- Abdeddaim, S.; Vimard, S.; Soualmia, L.F. The MeSH-Gram neural network model: Extending word embedding vectors with MeSH concepts for semantic similarity. Stud. Health Technol. Inform. 2019, 264, 5–9.
- Yang, H.; Lee, H.J. Research trend visualization by MeSH terms from PubMed. Int. J. Environ. Res. Public Health 2018, 15, 1113.
- Zhou, J.; Shui, Y.; Peng, S.; Li, X.; Mamitsuka, H.; Zhu, S. MeSHSim: An R/Bioconductor package for measuring semantic similarity over MeSH headings and MEDLINE documents. J. Bioinform. Comput. Biol. 2015, 13, 1542002.
- Ono, T.; Kuhara, S. A novel method for gathering and prioritizing disease candidate genes based on construction of a set of disease-related MeSH(R) terms. BMC Bioinform. 2014, 15, 179.
- Lim, C.G.; Jeong, B.S.; Choi, H.J. Suggesting biomedical topics for unseen research articles based on MeSH descriptors. In Proceedings of the 2015 International Conference on Big Data and Smart Computing (BIGCOMP), Jeju, Korea, 9–11 February 2015; pp. 51–54.
- Ishida, Y.; Shimizu, T.; Yoshikawa, M. An analysis and comparison of keyword recommendation methods for scientific data. Int. J. Digit. Libr. 2020, 21, 307–327.
- Sreekumar, A.; Poisson, L.M.; Rajendiran, T.M.; Khan, A.P.; Cao, Q.; Yu, J.; Laxman, B.; Mehra, R.; Lonigro, R.J.; Li, Y.; et al. Metabolomic profiles delineate potential role for sarcosine in prostate cancer progression. Nature 2009, 457, 910–914.
- Luka, Z.; Mudd, S.H.; Wagner, C. Glycine N-methyltransferase and regulation of S-adenosylmethionine levels. J. Biol. Chem. 2009, 284, 22507–22511.
- Klein, M.S.; Shearer, J. Metabolomics and type 2 diabetes: Translating basic research into clinical application. J. Diabetes Res. 2016, 2016, 3898502.
- Agrawal, R.; Imieliński, T.; Swami, A. Mining association rules between sets of items in large databases. ACM SIGMOD Rec. 1993, 22, 207–216.
- Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. B 1995, 57, 289–300.
- Green, T.; Chen, X.; Ryan, S.; Asch, A.S.; Ruiz-Echevarria, M.J. TMEFF2 and SARDH cooperate to modulate one-carbon metabolism and invasion of prostate cancer cells. Prostate 2013, 73, 1561–1575.
- Urhammer, S.A.; Moller, A.M.; Nyholm, B.; Ekstrom, C.T.; Eiberg, H.; Clausen, J.O.; Hansen, T.; Pedersen, O.; Schmitz, O. The effect of two frequent amino acid variants of the hepatocyte nuclear factor-1alpha gene on estimates of the pancreatic beta-cell function in Caucasian glucose-tolerant first-degree relatives of type 2 diabetic patients. J. Clin. Endocrinol. Metab. 1998, 83, 3992–3995.
- Fang, L.; Zhou, X.B.; Cui, L. Biclustering high-frequency MeSH terms based on the co-occurrence of distinct semantic types in a MeSH tree. Scientometrics 2020, 124, 1179–1190.
- Yoshii, K.; Ogasawara, M.; Wada, J.; Yamamoto, Y.; Inouye, K. Exploration of dipeptidyl-peptidase IV (DPP IV) inhibitors in a low-molecular mass extract of the earthworm Eisenia fetida and identification of the inhibitors as amino acids like methionine, leucine, histidine, and isoleucine. Enzyme Microb. Technol. 2020, 137, 109534.
- Deacon, C.F. Dipeptidyl peptidase 4 inhibitors in the treatment of type 2 diabetes mellitus. Nat. Rev. Endocrinol. 2020, 16, 642–653.
- Izumi, Y.; Matsuda, F.; Hirayama, A.; Ikeda, K.; Kita, Y.; Horie, K.; Saigusa, D.; Saito, K.; Sawada, Y.; Nakanishi, H.; et al. Inter-Laboratory Comparison of Metabolite Measurements for Metabolomics Data Integration. Metabolites 2019, 9, 257.
- Sung, H.; Ferlay, J.; Siegel, R.L.; Laversanne, M.; Soerjomataram, I.; Jemal, A.; Bray, F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J. Clin. 2021, 71, 209–249.
- Matsuda, F.; Shinbo, Y.; Oikawa, A.; Hira, M.Y.; Fiehn, O.; Kanaya, S.; Saito, K. Assessment of metabolome annotation quality: A method for evaluating the false discovery rate of elemental composition searches. PLoS ONE 2009, 4, e7490.
- Seabold, S.; Perktold, J. Statsmodels: Econometric and statistical modeling with Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010.
Example 1: Sarcosine and Prostate Neoplasm (1). Example 2: Leucine and Diabetes Mellitus, Type 2 (2).

Methods for Association Scoring | Example 1: Number of Obtained MeSH Terms | Example 1: Ranking of Dimethylglycine Dehydrogenase | Example 1: Ranking of One-Carbon Group Transferases | Example 2: Number of Obtained MeSH Terms | Example 2: Ranking of Insulin Resistance | Example 2: Ranking of Mechanistic Target of Rapamycin Complex 1 |
---|---|---|---|---|---|---|
Simpson | 4 | 4th | No hit | 2 | No hit | No hit |
Lift | 0 | No hit | No hit | 0 | No hit | No hit |
Cosine | 1 | No hit | No hit | 54 | No hit | No hit |
Confidence (RR) | 0 | No hit | No hit | 0 | No hit | No hit |
Confidence (RL) | 6 | No hit | No hit | 4 | No hit | No hit |
Confidence (LR) | 5 | 3rd | 5th | 291 | 53rd | 77th |
Confidence (LL) | 1 | No hit | No hit | 0 | No hit | No hit |
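
For readers unfamiliar with the scoring methods compared in this table, the sketch below computes the usual textbook forms of the Simpson coefficient, lift, cosine similarity, and confidence from article counts. These are the standard definitions and are assumed here; the section does not give the authors' exact formulas. The RR, RL, LR, and LL variants presumably differ in which direction the confidence is taken on each of the two links, but that reading is also an assumption. The function name and the counts are hypothetical.

```python
import math


def association_measures(n_a, n_b, n_ab, n_total):
    """Textbook co-occurrence measures for two terms a and b.

    n_a, n_b : number of articles annotated with a and with b
    n_ab     : number of articles annotated with both
    n_total  : number of articles in the database
    """
    return {
        "simpson": n_ab / min(n_a, n_b),
        "cosine": n_ab / math.sqrt(n_a * n_b),
        "lift": (n_ab * n_total) / (n_a * n_b),
        "confidence_a_to_b": n_ab / n_a,  # estimate of P(b | a)
        "confidence_b_to_a": n_ab / n_b,  # estimate of P(a | b)
    }


# Illustrative counts only, not taken from PubMed.
measures = association_measures(n_a=200, n_b=50, n_ab=20, n_total=10_000)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Confidence is the only asymmetric measure in this set: swapping a and b changes its value, which is why a method built on it has to choose a direction for each link.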
Ranking | Obtained MeSH Terms, k′ | Co-Occurrence (c, k′) (n) | A(c, k′) | Co-Occurrence (k′, k) (n) | A(k, k′) | p-Value | FDR | PubMed Search Hit (1) |
---|---|---|---|---|---|---|---|---|
1 | Sarcosine Dehydrogenase | 25 | 0.431 | 5 | 0.086 | 1.00 × 10⁻⁸ | 1.4 × 10⁻⁴ | 5 |
2 | Sarcosine Oxidase | 38 | 0.245 | 7 | 0.045 | 8.00 × 10⁻⁸ | 5.6 × 10⁻⁴ | 7 |
3 | Dimethylglycine Dehydrogenase | 15 | 0.326 | 1 | 0.022 | 1.70 × 10⁻⁷ | 7.9 × 10⁻⁴ | 1 |
4 | Glycine N-Methyltransferase | 19 | 0.075 | 14 | 0.055 | 3.00 × 10⁻⁷ | 1.0 × 10⁻³ | 6 |
5 | One-Carbon Group Transferases | 1 | 0.019 | 3 | 0.056 | 3.38 × 10⁻⁶ | 9.4 × 10⁻³ | 7 |
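
The association values in this table can be sanity-checked with simple arithmetic. Dividing each co-occurrence count by its A value gives nearly the same number for both columns of a row, which is what one would expect if both A values were confidence-type ratios sharing one denominator, plausibly the number of articles annotated with k′. That interpretation is inferred from the rounded table values and is not stated in this section; the helper name below is hypothetical.

```python
# (term, co-occurrence(c, k'), A(c, k'), co-occurrence(k', k), A(k, k'))
rows = [
    ("Sarcosine Dehydrogenase", 25, 0.431, 5, 0.086),
    ("Sarcosine Oxidase", 38, 0.245, 7, 0.045),
    ("Dimethylglycine Dehydrogenase", 15, 0.326, 1, 0.022),
    ("Glycine N-Methyltransferase", 19, 0.075, 14, 0.055),
    ("One-Carbon Group Transferases", 1, 0.019, 3, 0.056),
]


def implied_denominators(n_ck, a_ck, n_kk, a_kk):
    """Back-calculate the denominator implied by each (count, A) pair."""
    return n_ck / a_ck, n_kk / a_kk


for term, n_ck, a_ck, n_kk, a_kk in rows:
    d_c, d_k = implied_denominators(n_ck, a_ck, n_kk, a_kk)
    print(f"{term}: {d_c:.1f} vs {d_k:.1f}")
```

The two implied denominators in each row agree to within a few percent, with the remaining gap attributable to the three-decimal rounding of A.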
MeSH Tree ID | MeSH ID | MeSH Term | Number of Obtained MeSH Terms in the Lower Hierarchy | Total Number of MeSH Terms in the Lower Hierarchy | p | FDR |
---|---|---|---|---|---|---|
D08.811.277.656 | D010447 | Peptide Hydrolases | 28 | 358 | 5.32 × 10⁻⁶ | 2.05 × 10⁻⁵ |
D08.811.277.656.350 | D020689 | Exopeptidases | 10 | 35 | 4.44 × 10⁻¹⁶ | 5.53 × 10⁻¹⁵ |
D08.811.277.656.350.100 | D000626 | Aminopeptidases | 2 | 6 | 5.96 × 10⁻⁵ | 1.96 × 10⁻⁴ |
D08.811.277.656.350.350 | D004152 | Dipeptidyl-Peptidases and Tripeptidyl-Peptidases | 2 | 3 | 1.92 × 10⁻⁹ | 1.09 × 10⁻⁸ |
D08.811.277.656.350.555 | D045727 | Metalloexopeptidases | 3 | 10 | 4.13 × 10⁻⁶ | 1.63 × 10⁻⁵ |
D08.811.277.656.675.555 | D045727 | Metalloexopeptidases | 3 | 10 | 4.13 × 10⁻⁶ | 1.63 × 10⁻⁵ |
D08.811.277.656.837 | D043484 | Proprotein Convertases | 4 | 9 | 1.53 × 10⁻¹¹ | 9.91 × 10⁻¹¹ |
D08.811.913.696.620.682.700.931 | D058570 | TOR Serine-Threonine Kinases | 3 | 5 | 4.08 × 10⁻¹² | 2.80 × 10⁻¹¹ |
D08.811.913.696.620.682.700.931.500 | D000076222 | Mechanistic Target of Rapamycin Complex 1 | 2 | 2 | 7.02 × 10⁻¹⁴ | 5.48 × 10⁻¹³ |
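
An over-representation p-value of the kind tabulated here is often computed as the upper tail of a hypergeometric distribution, and the sketch below shows that calculation. The choice of test is an assumption, since the statistic used in Section 4.6 is not reproduced here. The example inputs borrow the Exopeptidases row (10 of 35 lower-hierarchy terms obtained), and additionally assume a background of 13,985 MeSH terms and 291 obtained terms, so the result should not be expected to reproduce the tabulated p-value. The function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population : all MeSH terms considered (assumed background)
    successes  : terms located under the tree node being tested
    draws      : terms returned by the search
    observed   : returned terms that fall under the node
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(observed, upper + 1)
    )
    return tail / total  # exact integer arithmetic until this final division


# Assumed inputs: background and number of obtained terms are guesses.
p = hypergeom_upper_tail(population=13_985, successes=35, draws=291, observed=10)
print(f"p = {p:.3e}")
```

Because every node of the MeSH tree is tested, the resulting p-values would then need a multiple-testing correction such as the Benjamini–Hochberg method [18], consistent with the FDR column of the table.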
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kanazawa, S.; Shimizu, S.; Kajihara, S.; Mukai, N.; Iida, J.; Matsuda, F. Automated Recommendation of Research Keywords from PubMed That Suggest the Molecular Mechanism Associated with Biomarker Metabolites. Metabolites 2022, 12, 133. https://doi.org/10.3390/metabo12020133