AAindexNC: Estimating the Physicochemical Properties of Non-Canonical Amino Acids, Including Those Derived from the PDB and PDBeChem Databank
Abstract
1. Introduction
- Sequence alignment and homology modeling: amino acid substitution matrices that reflect the physicochemical differences between amino acids can be incorporated into alignment algorithms, which can improve the accuracy of sequence homology models that compare proteins by their functional or structural similarity [32,33,34,35].
2. Results
3. Discussion
4. Materials and Methods
4.1. Formulation of the Problem
4.2. SMILES for Canonical Amino Acids
4.3. SMILES for Non-Canonical Amino Acids
4.4. The Selection of Statistically Significant Predictors and Prediction Quality Statistical Assessment
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Kawashima, S.; Pokarowski, P.; Pokarowska, M.; Kolinski, A.; Katayama, T.; Kanehisa, M. AAindex: Amino acid index database, progress report 2008. Nucleic Acids Res. 2008, 36, D202–D205.
- Rodrigues, C.H.M.; Myung, Y.; Pires, D.E.V.; Ascher, D.B. mCSM-PPI2: Predicting the effects of mutations on protein-protein interactions. Nucleic Acids Res. 2019, 47, W338–W344.
- Wang, H.; Liu, C.; Deng, L. Enhanced Prediction of Hot Spots at Protein-Protein Interfaces Using Extreme Gradient Boosting. Sci. Rep. 2018, 8, 14285.
- Rozhonova, H.; Marti-Gomez, C.; McCandlish, D.M.; Payne, J.L. Robust genetic codes enhance protein evolvability. PLoS Biol. 2024, 22, e3002594.
- Schmitt, A.; Schuchhardt, J.; Brockmann, G.A. The action of key factors in protein evolution at high temporal resolution. PLoS ONE 2009, 4, e4821.
- Yampolsky, L.Y.; Bouzinier, M.A. Evolutionary patterns of amino acid substitutions in 12 Drosophila genomes. BMC Genom. 2010, 11 (Suppl. S4), S10.
- Bohorquez, H.J.; Suarez, C.F.; Patarroyo, M.E. Mass & secondary structure propensity of amino acids explain their mutability and evolutionary replacements. Sci. Rep. 2017, 7, 7717.
- Rimal, P.; Panday, S.K.; Xu, W.; Peng, Y.; Alexov, E. SAAMBE-MEM: A sequence-based method for predicting binding free energy change upon mutation in membrane protein-protein complexes. Bioinformatics 2024, 40, btae544.
- Kuang, J.; Zhao, Z.; Yang, Y.; Yan, W. PON-Tm: A Sequence-Based Method for Prediction of Missense Mutation Effects on Protein Thermal Stability Changes. Int. J. Mol. Sci. 2024, 25, 8379.
- Aljarf, R.; Shen, M.; Pires, D.E.V.; Ascher, D.B. Understanding and predicting the functional consequences of missense mutations in BRCA1 and BRCA2. Sci. Rep. 2022, 12, 10458.
- Nishi, H.; Tyagi, M.; Teng, S.; Shoemaker, B.A.; Hashimoto, K.; Alexov, E.; Wuchty, S.; Panchenko, A.R. Cancer missense mutations alter binding properties of proteins and their interaction networks. PLoS ONE 2013, 8, e66273.
- Livesey, B.J.; Marsh, J.A. The properties of human disease mutations at protein interfaces. PLoS Comput. Biol. 2022, 18, e1009858.
- Sekiyama, N.; Takaba, K.; Maki-Yonekura, S.; Akagi, K.I.; Ohtani, Y.; Imamura, K.; Terakawa, T.; Yamashita, K.; Inaoka, D.; Yonekura, K.; et al. ALS mutations in the TIA-1 prion-like domain trigger highly condensed pathogenic structures. Proc. Natl. Acad. Sci. USA 2022, 119, e2122523119.
- Ruiz-Blanco, Y.B.; Aguero-Chapin, G.; Garcia-Hernandez, E.; Alvarez, O.; Antunes, A.; Green, J. Exploring general-purpose protein features for distinguishing enzymes and non-enzymes within the twilight zone. BMC Bioinform. 2017, 18, 349.
- Vanella, R.; Kovacevic, G.; Doffini, V.; Fernandez de Santaella, J.; Nash, M.A. High-throughput screening, next generation sequencing and machine learning: Advanced methods in enzyme engineering. Chem. Commun. 2022, 58, 2455–2467.
- Ramirez-Palacios, C.; Marrink, S.J. Super High-Throughput Screening of Enzyme Variants by Spectral Graph Convolutional Neural Networks. J. Chem. Theory Comput. 2023, 19, 4668–4677.
- Li, G.; Jia, L.; Wang, K.; Sun, T.; Huang, J. Prediction of Thermostability of Enzymes Based on the Amino Acid Index (AAindex) Database and Machine Learning. Molecules 2023, 28, 8097.
- Kim, H.; Kihara, D. Protein structure prediction using residue- and fragment-environment potentials in CASP11. Proteins 2016, 84 (Suppl. S1), 105–117.
- Kloczkowski, A.; Jernigan, R.L.; Wu, Z.; Song, G.; Yang, L.; Kolinski, A.; Pokarowski, P. Distance matrix-based approach to protein structure prediction. J. Struct. Funct. Genom. 2009, 10, 67–81.
- Ren, J.; Liu, Q.; Ellis, J.; Li, J. Tertiary structure-based prediction of conformational B-cell epitopes through B factors. Bioinformatics 2014, 30, i264–i273.
- Milchevskiy, Y.V.; Milchevskaya, V.Y.; Nikitin, A.M.; Kravatsky, Y.V. Effective Local and Secondary Protein Structure Prediction by Combining a Neural Network-Based Approach with Extensive Feature Design and Selection without Reliance on Evolutionary Information. Int. J. Mol. Sci. 2023, 24, 15656.
- Dong, B.; Liu, Z.; Xu, D.; Hou, C.; Dong, G.; Zhang, T.; Wang, G. SERT-StructNet: Protein secondary structure prediction method based on multi-factor hybrid deep model. Comput. Struct. Biotechnol. J. 2024, 23, 1364–1375.
- Dong, B.; Liu, Z.; Xu, D.; Hou, C.; Niu, N.; Wang, G. Impact of Multi-Factor Features on Protein Secondary Structure Prediction. Biomolecules 2024, 14, 1155.
- Vishnepolsky, B.; Grigolava, M.; Gabrielian, A.; Rosenthal, A.; Hurt, D.; Tartakovsky, M.; Pirtskhalava, M. Analysis, Modeling, and Target-Specific Predictions of Linear Peptides Inhibiting Virus Entry. ACS Omega 2023, 8, 46218–46226.
- Nath, A. Physicochemical and sequence determinants of antiviral peptides. Biol. Futur. 2023, 74, 489–506.
- Codina, J.R.; Mascini, M.; Dikici, E.; Deo, S.K.; Daunert, S. Accelerating the Screening of Small Peptide Ligands by Combining Peptide-Protein Docking and Machine Learning. Int. J. Mol. Sci. 2023, 24, 12144.
- Han, J.; Kong, T.; Liu, J. PepNet: An interpretable neural network for anti-inflammatory and antimicrobial peptides prediction using a pre-trained protein language model. Commun. Biol. 2024, 7, 1198.
- Ong, S.A.; Lin, H.H.; Chen, Y.Z.; Li, Z.R.; Cao, Z. Efficacy of different protein descriptors in predicting protein functional families. BMC Bioinform. 2007, 8, 300.
- Hecht, M.; Bromberg, Y.; Rost, B. Better prediction of functional effects for sequence variants. BMC Genom. 2015, 16 (Suppl. S8), S1.
- Xu, J.; Li, F.; Li, C.; Guo, X.; Landersdorfer, C.; Shen, H.H.; Peleg, A.Y.; Li, J.; Imoto, S.; Yao, J.; et al. iAMPCN: A deep-learning approach for identifying antimicrobial peptides and their functional activities. Brief. Bioinform. 2023, 24, bbad240.
- Nordquist, E.; Zhang, G.; Barethiya, S.; Ji, N.; White, K.M.; Han, L.; Jia, Z.; Shi, J.; Cui, J.; Chen, J. Incorporating physics to overcome data scarcity in predictive modeling of protein function: A case study of BK channels. PLoS Comput. Biol. 2023, 19, e1011460.
- Collingridge, P.W.; Kelly, S. MergeAlign: Improving multiple sequence alignment performance by dynamic reconstruction of consensus multiple sequence alignments. BMC Bioinform. 2012, 13, 117.
- Liu, B.; Xu, J.; Zou, Q.; Xu, R.; Wang, X.; Chen, Q. Using distances between Top-n-gram and residue pairs for protein remote homology detection. BMC Bioinform. 2014, 15 (Suppl. S2), S3.
- Koehl, P.; Orland, H.; Delarue, M. Numerical Encodings of Amino Acids in Multivariate Gaussian Modeling of Protein Multiple Sequence Alignments. Molecules 2018, 24, 104.
- Hollebrands, B.; Hageman, J.A.; van de Sande, J.W.; Albada, B.; Janssen, H.G. Improved LC-MS identification of short homologous peptides using sequence-specific retention time predictors. Anal. Bioanal. Chem. 2023, 415, 2715–2726.
- Sultan, M.F.; Shaon, M.S.H.; Karim, T.; Ali, M.M.; Hasan, M.Z.; Ahmed, K.; Bui, F.M.; Chen, L.; Dhasarathan, V.; Moni, M.A. MLAFP-XN: Leveraging neural network model for development of antifungal peptide identification tool. Heliyon 2024, 10, e37820.
- Yao, L.; Xie, P.; Guan, J.; Chung, C.R.; Zhang, W.; Deng, J.; Huang, Y.; Chiang, Y.C.; Lee, T.Y. ACP-CapsPred: An explainable computational framework for identification and functional prediction of anticancer peptides based on capsule network. Brief. Bioinform. 2024, 25, bbae460.
- Liang, X.; Zhao, H.; Wang, J. MA-PEP: A novel anticancer peptide prediction framework with multimodal feature fusion based on attention mechanism. Protein Sci. 2024, 33, e4966.
- Sun, S.; Yang, X.; Wang, Y.; Shen, X. In Vivo Analysis of Protein-Protein Interactions with Bioluminescence Resonance Energy Transfer (BRET): Progress and Prospects. Int. J. Mol. Sci. 2016, 17, 1704.
- Vickers, T.A.; Crooke, S.T. Development of a Quantitative BRET Affinity Assay for Nucleic Acid-Protein Interactions. PLoS ONE 2016, 11, e0161930.
- Lostao, A.; Lim, K.; Pallares, M.C.; Ptak, A.; Marcuello, C. Recent advances in sensing the inter-biomolecular interactions at the nanoscale—A comprehensive review of AFM-based force spectroscopy. Int. J. Biol. Macromol. 2023, 238, 124089.
- Katoh, T.; Sengoku, T.; Hirata, K.; Ogata, K.; Suga, H. Ribosomal synthesis and de novo discovery of bioactive foldamer peptides containing cyclic β-amino acids. Nat. Chem. 2020, 12, 1081–1088.
- Adaligil, E.; Song, A.; Cunningham, C.N.; Fairbrother, W.J. Ribosomal Synthesis of Macrocyclic Peptides with Linear γ4- and β-Hydroxy-γ4-amino Acids. ACS Chem. Biol. 2021, 16, 1325–1331.
- Goettig, P.; Koch, N.G.; Budisa, N. Non-Canonical Amino Acids in Analyses of Protease Structure and Function. Int. J. Mol. Sci. 2023, 24, 14035.
- Fuertes, G.; Sakamoto, K.; Budisa, N. Editorial: Exploring and expanding the protein universe with non-canonical amino acids. Front. Mol. Biosci. 2023, 10, 1303286.
- Castro, T.G.; Melle-Franco, M.; Sousa, C.E.A.; Cavaco-Paulo, A.; Marcos, J.C. Non-Canonical Amino Acids as Building Blocks for Peptidomimetics: Structure, Function, and Applications. Biomolecules 2023, 13, 981.
- Lugtenburg, T.; Gran-Scheuch, A.; Drienovska, I. Non-canonical amino acids as a tool for the thermal stabilization of enzymes. Protein Eng. Des. Sel. 2023, 36, gzad003.
- Pham, P.N.; Zahradnik, J.; Kolarova, L.; Schneider, B.; Fuertes, G. Regulation of IL-24/IL-20R2 complex formation using photocaged tyrosines and UV light. Front. Mol. Biosci. 2023, 10, 1214235.
- Khoury, G.A.; Smadbeck, J.; Tamamis, P.; Vandris, A.C.; Kieslich, C.A.; Floudas, C.A. Forcefield_NCAA: Ab initio charge parameters to aid in the discovery and design of therapeutic proteins and peptides with unnatural amino acids and their application to complement inhibitors of the compstatin family. ACS Synth. Biol. 2014, 3, 855–869.
- Croitoru, A.; Park, S.J.; Kumar, A.; Lee, J.; Im, W.; MacKerell, A.D., Jr.; Aleksandrov, A. Additive CHARMM36 Force Field for Nonstandard Amino Acids. J. Chem. Theory Comput. 2021, 17, 3554–3570.
- Renfrew, P.D.; Choi, E.J.; Bonneau, R.; Kuhlman, B. Incorporation of noncanonical amino acids into Rosetta and use in computational protein-peptide interface design. PLoS ONE 2012, 7, e32637.
- Hickey, J.L.; Sindhikara, D.; Zultanski, S.L.; Schultz, D.M. Beyond 20 in the 21st Century: Prospects and Challenges of Non-canonical Amino Acids in Peptide Drug Discovery. ACS Med. Chem. Lett. 2023, 14, 557–565.
- Zhang, H.; Zheng, Z.; Dong, L.; Shi, N.; Yang, Y.; Chen, H.; Shen, Y.; Xia, Q. Rational incorporation of any unnatural amino acid into proteins by machine learning on existing experimental proofs. Comput. Struct. Biotechnol. J. 2022, 20, 4930–4941.
- Eisenberg, D. Three-dimensional structure of membrane and surface proteins. Annu. Rev. Biochem. 1984, 53, 595–623.
- Fasman, G.D. Handbook of Biochemistry and Molecular Biology, 3rd ed.; Fasman, G.D., Ed.; CRC Press: Cleveland, OH, USA, 1976; Volume 1.
- Charton, M.; Charton, B.I. The dependence of the Chou-Fasman parameters on amino acid side chain structure. J. Theor. Biol. 1983, 102, 121–134.
- Dayhoff, M.O. Atlas of protein sequence and structure. In National Biomedical Research Foundation; National Geodetic Survey, NOAA: Silver Spring, MD, USA, 1972; Volume 5, p. 5.
- Chou, P.Y.; Fasman, G.D. Empirical predictions of protein conformation. Annu. Rev. Biochem. 1978, 47, 251–276.
- Bella, J.; Eaton, M.; Brodsky, B.; Berman, H.M. Crystal and molecular structure of a collagen-like peptide at 1.9 Å resolution. Science 1994, 266, 75–81.
- Burjanadze, T.V. Hydroxyproline content and location in relation to collagen thermal stability. Biopolymers 1979, 18, 931–938.
- O’Brien, K.T.; Mooney, C.; Lopez, C.; Pollastri, G.; Shields, D.C. Prediction of polyproline II secondary structure propensity in proteins. R. Soc. Open Sci. 2020, 7, 191239.
- Roseman, M.A. Hydrophilicity of polar amino acid side-chains is markedly reduced by flanking peptide bonds. J. Mol. Biol. 1988, 200, 513–522.
- Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36.
- Weininger, D.; Weininger, A.; Weininger, J.L. SMILES. 2. Algorithm for generation of unique SMILES notation. J. Chem. Inf. Comput. Sci. 1989, 29, 97–101.
- O’Boyle, N.M. Towards a Universal SMILES representation—A standard method to generate canonical SMILES based on the InChI. J. Cheminform. 2012, 4, 22.
- Landrum, G. Open-Source Cheminformatics. Available online: https://www.rdkit.org (accessed on 17 November 2024).
- Mauri, A.; Consonni, V.; Pavan, M.; Todeschini, R. Dragon software: An easy approach to molecular descriptor calculations. MATCH Commun. Math. Comput. Chem. 2006, 56, 237–248.
- Willighagen, E.L.; Mayfield, J.W.; Alvarsson, J.; Berg, A.; Carlsson, L.; Jeliazkova, N.; Kuhn, S.; Pluskal, T.; Rojas-Chertó, M.; Spjuth, O. The Chemistry Development Kit (CDK) v2.0: Atom typing, depiction, molecular formulas, and substructure searching. J. Cheminform. 2017, 9, 33.
- Masand, V.H.; Rastija, V. PyDescriptor: A new PyMOL plugin for calculating thousands of easily understandable molecular descriptors. Chemometr. Intell. Lab. Syst. 2017, 169, 12–18.
- Bjerrum, E.J. SMILES enumeration as data augmentation for neural network modeling of molecules. arXiv 2017, arXiv:1703.07076.
- Li, X.; Fourches, D. Inductive transfer learning for molecular activity prediction: Next-Gen QSAR Models with MolPMoFiT. J. Cheminform. 2020, 12, 27.
- Kimber, T.B.; Engelke, S.; Tetko, I.V.; Bruno, E.; Godin, G. Synergy effect between convolutional neural networks and the multiplicity of SMILES for improvement of molecular prediction. arXiv 2018, arXiv:1812.04439.
- Tetko, I.V.; Karpov, P.; Bruno, E.; Kimber, T.B.; Godin, G. Augmentation is what you need! In Proceedings of the International Conference on Artificial Neural Networks, Munich, Germany, 17–19 September 2019; Springer: New York, NY, USA, 2019; pp. 831–835.
- Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 60.
- Berman, H.M.; Kleywegt, G.J.; Nakamura, H.; Markley, J.L. The Protein Data Bank at 40: Reflecting on the past to prepare for the future. Structure 2012, 20, 391–396.
- Dimitropoulos, D.; Ionides, J.; Henrick, K. Using MSDchem to search the PDB ligand dictionary. Curr. Protoc. Bioinform. 2006, 15, 14.3.1–14.3.21.
- Zhang, S.; Krieger, J.M.; Zhang, Y.; Kaya, C.; Kaynak, B.; Mikulska-Ruminska, K.; Doruker, P.; Li, H.; Bahar, I. ProDy 2.0: Increased scale and scope after 10 years of protein dynamics modelling with Python. Bioinformatics 2021, 37, 3657–3659.
- Fauchere, J.L.; Charton, M.; Kier, L.B.; Verloop, A.; Pliska, V. Amino acid side chain parameters for correlation studies in biology and pharmacology. Int. J. Pept. Protein Res. 1988, 32, 269–278.
- Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: New York, NY, USA, 2009; Volume 2, p. 745.

AAindex Accession | rj-n | RMSE | F-Value of rj-n | Pnum |
---|---|---|---|---|
CHAM820101 | 0.999 | 0.005 | 1.2 | 10 |
KARS160117 | 0.994 | 1.820 | 2.0 | 8 |
FAUJ880103 | 0.989 | 0.287 | 1.1 | 10 |
LEVM760105 | 0.989 | 0.070 | 2.1 | 6 |
BIGC670101 | 0.986 | 4.580 | 1.0 | 9 |
GOLD730102 | 0.985 | 6.860 | 6.9 | 5 |
KARS160107 | 0.982 | 0.748 | 4.0 | 6 |
CHOC750101 | 0.976 | 9.540 | 1.5 | 9 |
FASG760101 | 0.975 | 6.920 | 1.4 | 10 |
KARS160101 | 0.975 | 0.526 | 3.4 | 6 |
ROSM880101 | 0.973 | 1.230 | 1.7 | 9 |
TSAJ990101 | 0.970 | 9.920 | 2.3 | 7 |
LEVM760102 | 0.966 | 0.255 | 3.3 | 6 |
TSAJ990102 | 0.966 | 10.600 | 3.2 | 7 |
KRIW790103 | 0.963 | 9.900 | 4.6 | 5 |
KUHL950101 | 0.958 | 0.095 | 3.3 | 3 |
ZIMJ680104 | 0.958 | 0.850 | 1.0 | 7 |
CHOC760101 | 0.957 | 12.500 | 1.0 | 7 |
KARS160114 | 0.954 | 2.210 | 1.5 | 6 |
KARS160102 | 0.952 | 0.740 | 3.4 | 4 |
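
Prediction quality in the table above is summarized by rj-n and RMSE. Assuming that rj-n is a jackknife-type (leave-one-out) correlation, i.e., the Pearson correlation between each amino acid's tabulated value and the value predicted by a model refitted without that amino acid, it could be computed roughly as in this sketch. The function name `loo_r_and_rmse` is hypothetical, an ordinary least-squares regression with a constant term is assumed, and the authors' exact resampling scheme (and whether their RMSE is computed in or out of sample) may differ.

```python
import numpy as np


def loo_r_and_rmse(X, y):
    """Leave-one-out Pearson r and RMSE for an ordinary least-squares model.

    X: (n_samples, n_predictors) predictor values, without an intercept column.
    y: (n_samples,) tabulated property values.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Add a column of ones so every refit includes a constant term.
    A = np.column_stack([np.ones(n), X])
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # Refit without sample i, then predict the held-out sample.
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ coef
    r = np.corrcoef(y, preds)[0, 1]
    rmse = np.sqrt(np.mean((y - preds) ** 2))
    return float(r), float(rmse)


if __name__ == "__main__":
    # Toy data (not from the paper): one predictor, six samples.
    r, rmse = loo_r_and_rmse([[0], [1], [2], [3], [4], [5]], [1, 3, 2, 5, 4, 6])
    print(f"r = {r:.3f}, RMSE = {rmse:.3f}")
```

Because every value is predicted by a model that never saw it, a leave-one-out r is usually lower than the in-sample r, which is the kind of gap the r versus rj-n columns are meant to expose.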

AAindex Accession | rj-n | Property Explanation |
---|---|---|
KUHL950101 | 0.958 | Hydrophilicity scale |
WOLR790101 | 0.954 | Hydrophobicity index |
EISD840101 | 0.924 | Consensus normalized hydrophobicity scale |
KIDA850101 | 0.903 | Hydrophobicity-related index |
PRAM900101 | 0.888 | Hydrophobicity |
ENGD860101 | 0.887 | Hydrophobicity index |
BLAS910101 | 0.873 | Scaled side chain hydrophobicity values |
GOLD730101 | 0.828 | Hydrophobicity factor |
COWR900101 | 0.808 | Hydrophobicity index, 3.0 pH |
CIDH920102 | 0.802 | Normalized hydrophobicity scales for beta-proteins |
JURD980101 | 0.774 | Modified Kyte–Doolittle hydrophobicity scale |
WILM950101 | 0.714 | Hydrophobicity coefficient in RP-HPLC, C18 with 0.1%TFA/MeCN/H2O |
CIDH920105 | 0.697 | Normalized average hydrophobicity scales |
ARGP820101 | 0.659 | Hydrophobicity index |
JOND750101 | 0.646 | Hydrophobicity |
CIDH920104 | 0.625 | Normalized hydrophobicity scales for alpha-/beta-proteins |
CASG920101 | 0.600 | Hydrophobicity scale from native protein structures |
PONP800106 | 0.589 | Surrounding hydrophobicity in turn |
CIDH920103 | 0.563 | Normalized hydrophobicity scales for alpha-proteins |
SWER830101 | 0.563 | Optimal matching hydrophobicity |
PONP800103 | 0.545 | Average gain ratio in surrounding hydrophobicity |
CIDH920101 | 0.537 | Normalized hydrophobicity scales for alpha-proteins |
MANP780101 | 0.512 | Average surrounding hydrophobicity |
PONP930101 | 0.501 | Hydrophobicity scales |
FASG890101 | 0.479 | Hydrophobicity index |
PONP800105 | 0.476 | Surrounding hydrophobicity in beta-sheet |
PONP800102 | 0.458 | Average gain in surrounding hydrophobicity |
ZIMJ680101 | 0.455 | Hydrophobicity |
PONP800101 | 0.445 | Surrounding hydrophobicity in folded form |
WILM950102 | 0.427 | Hydrophobicity coefficient in RP-HPLC, C8 with 0.1%TFA/MeCN/H2O |
WILM950103 | 0.368 | Hydrophobicity coefficient in RP-HPLC, C4 with 0.1%TFA/MeCN/H2O |
PONP800104 | 0.242 | Surrounding hydrophobicity in alpha-helix |
WILM950104 | −0.193 | Hydrophobicity coefficient in RP-HPLC, C18 with 0.1%TFA/2-PrOH/MeCN/H2O |

F-Value | Regression Coefficient | SD of Regression Coefficient | Predictor |
---|---|---|---|
507.614 | −0.563688 | 0.025 | O |
120.263 | −0.556289 | 0.050 | N |
9.014 | 0.163592 | 0.054 | n |
11.118 | 0.452579 | 0.135 | N= |
56.734 | 0.113070 | 0.015 | C |
111.808 | 0.148830 | 0.014 | c |
22.228 | −0.248443 | 0.052 | S |
68.744 | −0.290756 | 0.035 | c1 |
44.738 | −0.688098 | 0.102 | c2 |
189.686 | −1.168294 | 0.084 | + |
| | 1.630 | | Constant term |
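
In the regression table above, each reported F-value is close to the squared ratio of the coefficient to its SD, (b/SD)², which is the partial F-statistic (the squared t-statistic) for a single coefficient in ordinary least squares. The short check below recomputes that ratio from the tabulated numbers. The relationship is inferred from the numbers rather than stated by the authors; the remaining gaps (about 3% at most) are plausibly due to the SDs being rounded to three decimals. The helper names are hypothetical.

```python
# (reported F, regression coefficient, SD of coefficient, predictor),
# copied from the regression table.
ROWS = [
    (507.614, -0.563688, 0.025, "O"),
    (120.263, -0.556289, 0.050, "N"),
    (9.014, 0.163592, 0.054, "n"),
    (11.118, 0.452579, 0.135, "N="),
    (56.734, 0.113070, 0.015, "C"),
    (111.808, 0.148830, 0.014, "c"),
    (22.228, -0.248443, 0.052, "S"),
    (68.744, -0.290756, 0.035, "c1"),
    (44.738, -0.688098, 0.102, "c2"),
    (189.686, -1.168294, 0.084, "+"),
]


def partial_f(coef, sd):
    """Squared t-statistic, (b / SD(b))**2: the partial F for one coefficient."""
    return (coef / sd) ** 2


def relative_gaps(rows):
    """Relative difference between each reported F and (b / SD)**2."""
    return {name: abs(partial_f(b, sd) - f) / f for f, b, sd, name in rows}


if __name__ == "__main__":
    for f, b, sd, name in ROWS:
        print(f"{name:>3}: reported F = {f:8.3f}   (b/SD)^2 = {partial_f(b, sd):8.3f}")
```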

Component | Predictor’s Value (Count) | Regression Coefficient |
---|---|---|
=O | 1 | 0.000000 |
O | 3 | −0.563688 |
N | 1 | −0.556289 |
n | 0 | 0.163592 |
=N | 0 | 0.452579 |
C | 5 | 0.113070 |
c | 0 | 0.148830 |
[C@ | 2 | 0.000000 |
S | 0 | −0.248443 |
C1 | 1 | −0.290756 |
C2 | 0 | −0.688098 |
= | 1 | 0.000000 |
+ | 0 | −1.168294 |
Constant term | | 1.629829 |
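
The worked example above applies the linear model: the estimate is the constant term plus the sum of each component count multiplied by its regression coefficient. The counts listed (three O, one N, five C, two [C@, one ring label, one =O and one =) match the HYP (4-hydroxyproline) SMILES in the table of frequent PDB non-canonical amino acids; that identification is an inference from the counts, and the property being predicted is not named in the worked-example table. A minimal sketch of the arithmetic, with hypothetical names `COEFFICIENTS` and `estimate`:

```python
# Regression coefficients and constant term from the worked-example table.
# Components with a zero coefficient (=O, [C@, =) contribute nothing.
COEFFICIENTS = {
    "=O": 0.000000,
    "O": -0.563688,
    "N": -0.556289,
    "n": 0.163592,
    "=N": 0.452579,
    "C": 0.113070,
    "c": 0.148830,
    "[C@": 0.000000,
    "S": -0.248443,
    "C1": -0.290756,
    "C2": -0.688098,
    "=": 0.000000,
    "+": -1.168294,
}
CONSTANT = 1.629829


def estimate(counts):
    """Constant term plus sum(count * coefficient) over all components."""
    return CONSTANT + sum(COEFFICIENTS[name] * n for name, n in counts.items())


# Component counts ("Predictor's Value") from the worked example.
example_counts = {"=O": 1, "O": 3, "N": 1, "n": 0, "=N": 0, "C": 5, "c": 0,
                  "[C@": 2, "S": 0, "C1": 1, "C2": 0, "=": 1, "+": 0}

if __name__ == "__main__":
    print(round(estimate(example_counts), 6))  # -0.34293
```

Only O, N, C and the ring contribute here: 1.629829 − 3(0.563688) − 0.556289 + 5(0.113070) − 0.290756 ≈ −0.343.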

Code | Amino Acid | SMILES | Number | Percent |
---|---|---|---|---|
MLY | N-DIMETHYL-LYSINE | CN(C)CCCC[C@H](N)C(O)=O | 5324 | 20.151 |
HYP | 4-HYDROXYPROLINE | O[C@H]1CN[C@@H](C1)C(O)=O | 2264 | 8.569 |
NAG | 2-ACETAMIDO-2-DEOXY-BETA-D-GLUCOPYRANOSE | CC(=O)N[C@H]1[C@H](O)O[C@H](CO)[C@@H](O)[C@@H]1O | 983 | 3.721 |
CSO | S-HYDROXYCYSTEINE | N[C@@H](CSO)C(O)=O | 890 | 3.369 |
CRO | {2-[(1R,2R)-1-AMINO-2-HYDROXYPROPYL]-4-(4-HYDROXYBENZYLIDENE)-5-OXO-4,5-DIHYDRO-1H-IMIDAZOL-1-YL}ACETIC ACID | C[C@@H](O)[C@H](N)C1=N\C(=C/c2ccc(O)cc2)C(=O)N1CC(O)=O | 886 | 3.354 |
KCX | LYSINE NZ-CARBOXYLIC ACID | N[C@@H](CCCCNC(O)=O)C(O)=O | 877 | 3.319 |
PCA | PYROGLUTAMIC ACID | OC(=O)[C@@H]1CCC(=O)N1 | 783 | 2.964 |
CME | S,S-(2-HYDROXYETHYL)THIOCYSTEINE | N[C@@H](CSSCCO)C(O)=O | 679 | 2.570 |
CSD | 3-SULFINOALANINE | N[C@@H](C[S](O)=O)C(O)=O | 617 | 2.335 |
NRQ | {(4Z)-4-(4-HYDROXYBENZYLIDENE)-2-[3-(METHYLTHIO)PROPANIMIDOYL]-5-OXO-4,5-DIHYDRO-1H-IMIDAZOL-1-YL}ACETIC ACID | CSCCC(=N)C1=N\C(=C/c2ccc(O)cc2)C(=O)N1CC(O)=O | 604 | 2.286 |
CR2 | {(4Z)-2-(AMINOMETHYL)-4-[(4-HYDROXYPHENYL)METHYLIDENE]-5-OXO-4,5-DIHYDRO-1H-IMIDAZOL-1-YL}ACETIC ACID | NCC1=N\C(=C/c2ccc(O)cc2)C(=O)N1CC(O)=O | 521 | 1.972 |
CGU | GAMMA-CARBOXY-GLUTAMIC ACID | N[C@@H](CC(C(O)=O)C(O)=O)C(O)=O | 507 | 1.919 |
GYC | [(4Z)-2-[(1R)-1-AMINO-2-MERCAPTOETHYL]-4-(4-HYDROXYBENZYLIDENE)-5-OXO-4,5-DIHYDRO-1H-IMIDAZOL-1-YL]ACETIC ACID | N[C@@H](CS)C1=N\C(=C/c2ccc(O)cc2)C(=O)N1CC(O)=O | 396 | 1.499 |
CRQ | [2-(3-CARBAMOYL-1-IMINO-PROPYL)-4-(4-HYDROXY-BENZYLIDENE)-5-OXO-4,5-DIHYDRO-IMIDAZOL-1-YL]-ACETIC ACID | NC(=O)CCC(=N)C1=N\C(=C/c2ccc(O)cc2)C(=O)N1CC(O)=O | 381 | 1.442 |
MDO | {2-[(1S)-1-AMINOETHYL]-4-METHYLIDENE-5-OXO-4,5-DIHYDRO-1H-IMIDAZOL-1-YL}ACETIC ACID | C[C@H](N)C1=NC(=C)C(=O)N1CC(O)=O | 356 | 1.347 |
OCS | CYSTEINESULFONIC ACID | N[C@@H](C[S](O)(=O)=O)C(O)=O | 352 | 1.332 |
FME | N-FORMYLMETHIONINE | CSCC[C@H](NC=O)C(O)=O | 300 | 1.136 |
ORN | L-ORNITHINE | NCCC[C@H](N)C(O)=O | 299 | 1.132 |
ABA | ALPHA-AMINOBUTYRIC ACID | CC[C@H](N)C(O)=O | 284 | 1.075 |
CR8 | 2-[1-AMINO-2-(1H-IMIDAZOL-5-YL)ETHYL]-1-(CARBOXYMETHYL)-4-[(4-OXOCYCLOHEXA-2,5-DIEN-1-YLIDENE)METHYL]-1H-IMIDAZOL-5-OLATE | N[C@@H](Cc1[nH]cnc1)c2nc(C=C3C=CC(=O)C=C3)c([O-])n2CC(O)=O | 279 | 1.056 |
TYS | O-SULFO-L-TYROSINE | N[C@@H](Cc1ccc(O[S](O)(=O)=O)cc1)C(O)=O | 276 | 1.045 |
SMC | S-METHYLCYSTEINE | CSC[C@H](N)C(O)=O | 258 | 0.977 |
M3L | N-TRIMETHYLLYSINE | C[N+](C)(C)CCCC[C@H](N)C(O)=O | 254 | 0.961 |
ALY | N(6)-ACETYLLYSINE | CC(=O)NCCCC[C@H](N)C(O)=O | 239 | 0.905 |
GYS | [(4Z)-2-(1-AMINO-2-HYDROXYETHYL)-4-(4-HYDROXYBENZYLIDENE)-5-OXO-4,5-DIHYDRO-1H-IMIDAZOL-1-YL]ACETIC ACID | N[C@@H](CO)C1=N\C(=C/c2ccc(O)cc2)C(=O)N1CC(O)=O | 225 | 0.852 |

Component | Description |
---|---|
=O O= | oxygen, forming a double bond |
O | any oxygen |
N | nitrogen, except in an aromatic ring |
n | nitrogen in an aromatic ring |
=N N= =[N | nitrogen, forming a double bond |
C | carbon, except in an aromatic ring |
c | carbon in an aromatic ring |
[C@ | carbon as a chiral center |
S | sulfur |
c1 C1 n1 N1 S1 | any ring (aromatic or any other cycle) |
c2 C2 n2 N2 S2 | second ring (aromatic or any other cycle) |
= | any double bond |
+ | positive charge |
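
The components in the table above are substrings of a SMILES string. The sketch below counts them by plain, non-overlapping substring matching, using the aliases listed in the table (for example, =N, N= and =[N all count as a double-bonded nitrogen). This naive scheme is an assumption, not the authors' parser: it also matches two-letter element symbols such as Cl or Se, and it counts a ring label only where the digit directly follows an atom letter. For the HYP SMILES in the PDB table it reproduces the counts of the worked example. `COMPONENT_PATTERNS` and `count_components` are hypothetical names.

```python
# Aliases for each component, taken from the component description table.
COMPONENT_PATTERNS = {
    "=O": ["=O", "O="],
    "O": ["O"],
    "N": ["N"],
    "n": ["n"],
    "=N": ["=N", "N=", "=[N"],
    "C": ["C"],
    "c": ["c"],
    "[C@": ["[C@"],
    "S": ["S"],
    "C1": ["c1", "C1", "n1", "N1", "S1"],
    "C2": ["c2", "C2", "n2", "N2", "S2"],
    "=": ["="],
    "+": ["+"],
}


def count_components(smiles):
    """Count each component as the total number of non-overlapping
    occurrences of its aliases in the SMILES string (naive sketch)."""
    return {name: sum(smiles.count(p) for p in patterns)
            for name, patterns in COMPONENT_PATTERNS.items()}


if __name__ == "__main__":
    hyp = "O[C@H]1CN[C@@H](C1)C(O)=O"  # HYP, 4-hydroxyproline (PDB table)
    print(count_components(hyp))
```

With this scheme a ring written as c1ccccc1 contributes 2 to the ring component, whereas the HYP ring, opened after the bracket atom [C@H], contributes 1; how the authors treat such cases is not stated here.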

F-Value | r | rj-n | Pnum |
---|---|---|---|
1 | 0.997 | 0.850 | 11 |
2 | 0.997 | 0.848 | 10 |
2.4 | 0.953 | 0.924 | 10 |
3 | 0.953 | 0.890 | 3 |
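
The F-value table shows that changing the F-value threshold changes the number of retained predictors (Pnum) as well as r and rj-n. One common way to apply such a threshold is greedy forward selection with an F-to-enter criterion, sketched below with ordinary least squares. Whether the authors used forward, backward or stepwise selection, and exactly how their F is computed, is not specified in this section; `rss` and `forward_select` are hypothetical names and the data in the usage example are synthetic.

```python
import numpy as np


def rss(A, y):
    """Residual sum of squares of an ordinary least-squares fit of y on A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_select(X, y, f_in):
    """Greedy forward selection: repeatedly add the predictor with the largest
    partial F, stopping when no candidate's partial F exceeds f_in."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    selected = []
    while True:
        # Current model: constant term plus the predictors chosen so far.
        base = np.column_stack([np.ones(n)] + [X[:, j] for j in selected])
        rss_old = rss(base, y)
        if rss_old <= 1e-12:  # already a perfect fit; nothing left to explain
            return selected
        best_j, best_f = None, f_in
        for j in range(p):
            if j in selected:
                continue
            cand = np.column_stack([base, X[:, j]])
            df_resid = n - cand.shape[1]
            if df_resid <= 0:
                continue
            rss_new = rss(cand, y)
            # Partial F for one added predictor: RSS drop over residual variance.
            f = np.inf if rss_new <= 1e-12 else (rss_old - rss_new) / (rss_new / df_resid)
            if f > best_f:
                best_j, best_f = j, f
        if best_j is None:
            return selected
        selected.append(best_j)


if __name__ == "__main__":
    # Synthetic data (not from the paper): y depends on columns 0 and 1 only.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 4))
    y = 5.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=30)
    for f_in in (1.0, 2.0, 2.4, 3.0):
        print(f_in, forward_select(X, y, f_in))
```

A lower threshold admits weaker predictors, which tends to raise the in-sample r while risking overfitting; a cross-validated statistic such as rj-n is what reveals whether the extra predictors actually help.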

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Milchevskiy, Y.V.; Kravatskaya, G.I.; Kravatsky, Y.V. AAindexNC: Estimating the Physicochemical Properties of Non-Canonical Amino Acids, Including Those Derived from the PDB and PDBeChem Databank. Int. J. Mol. Sci. 2024, 25, 12555. https://doi.org/10.3390/ijms252312555