Accurate Prediction of Transcriptional Activity of Single Missense Variants in HIV Tat with Deep Learning
Abstract
1. Introduction
2. Results
2.1. Overview of the Proposed Deep Learning Framework Called Rep2Mut
2.2. Evaluation of the Proposed Deep Learning Framework Rep2Mut
2.3. Effect of Amino Acid Position on Activity Prediction
2.4. Rep2Mut Sensitivity Analysis for the Fraction of Training Data
3. Discussion
3.1. Visualization of Predicted Vectors
3.2. Association of Amino Acid Types with Tat Activity Predictions
3.3. Outliers in Rep2Mut Prediction
4. Materials and Methods
4.1. Dataset
4.2. Rep2Mut Framework to Estimate Tat Variants’ Effect on Transcriptional Activities
4.3. Training and Testing Rep2Mut
4.4. Evaluation Measurements
4.5. How to Use ESM to Predict Tat Variants’ Activities
4.6. How to Test DeepSequence on Tat Variants
4.7. The Framework of a Baseline Method
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Prediction Method | Pearson | Spearman |
---|---|---|
ESM_pred | 0.51 | 0.56 |
ESM_pred_avg | 0.54 | 0.59 |
DeepSequence | 0.57 | 0.41 |
The baseline method | 0.76 | 0.72 |
Rep2Mut (wo_p¹) | 0.91 | 0.87 |
Rep2Mut | 0.94 | 0.89 |
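The table above scores each method by its Pearson and Spearman correlation with the GigaAssay measurements. Pearson captures linear agreement between predicted and measured activities. Spearman is the Pearson correlation of their ranks, so it rewards any monotonic agreement. The two can diverge, as in the DeepSequence row (0.57 vs. 0.41). Below is a minimal pure-Python sketch of both measures. It assumes the textbook definitions, with tied values given average ranks. The authors' evaluation code is not reproduced here and may use a library such as SciPy instead.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical measured vs. predicted activities with a curved but monotonic relation.
measured = [0.1, 0.2, 0.3, 0.4, 0.5]
predicted = [v * v for v in measured]
print(f"Pearson  = {pearson(measured, predicted):.3f}")
print(f"Spearman = {spearman(measured, predicted):.3f}")
```

On a perfectly monotonic but curved relationship like this, Spearman reaches 1 while Pearson stays just below 1. Reporting both columns separates ranking ability from linear calibration.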
Var | GA | Pred | Position | GigaAssay Avg | GigaAssay Min | GigaAssay Max | Predicted Avg | Predicted Min | Predicted Max |
---|---|---|---|---|---|---|---|---|---|
Overestimation | | | | | | | | | |
E2A | 0.19 | 0.57 | E2 | 0.44 | P = 0.16; A = 0.19; C = 0.24 | D = 0.77; T = 0.72; Q = 0.72 | 0.47 | P = 0.28; I = 0.36; R = 0.36 | S = 0.58; T = 0.57; Q = 0.57 |
P3R | 0.21 | 0.56 | P3 | 0.56 | R = 0.20; K = 0.22; G = 0.34 | L = 0.77; V = 0.75; I = 0.75 | 0.56 | K = 0.41; D = 0.43; Y = 0.48 | V = 0.64; S = 0.62; L = 0.62 |
P6K | 0.16 | 0.55 | P6 | 0.62 | K = 0.16; R = 0.36; L = 0.45 | W = 0.85; Y = 0.79; F = 0.77 | 0.61 | R = 0.48; D = 0.51; E = 0.54 | S = 0.67; H = 0.66; A = 0.66 |
R7P | 0.25 | 0.64 | R7 | 0.78 | P = 0.24; K = 0.7; I = 0.75 | E = 0.86; D = 0.85; S = 0.83 | 0.75 | P = 0.63; W = 0.66; T = 0.69 | E = 0.83; Q = 0.8; Y = 0.8 |
K12P | 0.12 | 0.56 | K12 | 0.70 | P = 0.12; G = 0.50; T = 0.62 | L = 0.83; Q = 0.82; N = 0.82 | 0.69 | F = 0.55; P = 0.55; W = 0.59 | Q = 0.8; A = 0.79; T = 0.78 |
Q17V | 0.32 | 0.62 | Q17 | 0.51 | P = 0.11; W = 0.24; I = 0.24 | K = 0.8; R = 0.78; A = 0.77 | 0.51 | P = 0.32; F = 0.33; Y = 0.41 | V = 0.62; M = 0.61; C = 0.59 |
F32H | 0.16 | 0.47 | F32 | 0.24 | D = 0.09; K = 0.1; N = 0.1 | Y = 0.84; W = 0.76; L = 0.55 | 0.27 | G = 0.07; P = 0.11; E = 0.13 | Y = 0.53; H = 0.47; M = 0.42 |
Q35R | 0.11 | 0.44 | Q35 | 0.42 | K = 0.08; D = 0.1; R = 0.11 | H = 0.79; M = 0.74; Y = 0.71 | 0.43 | P = 0.26; G = 0.33; E = 0.33 | H = 0.55; A = 0.55; M = 0.51 |
M39K | 0.11 | 0.45 | M39 | 0.45 | W = 0.1; K = 0.1; R = 0.1 | L = 0.84; I = 0.78; V = 0.78 | 0.45 | P = 0.22; R = 0.28; D = 0.28 | V = 0.77; I = 0.61; S = 0.6 |
K85E | 0.17 | 0.76 | K85 | 0.76 | E = 0.16; W = 0.72; F = 0.75 | V = 0.83; D = 0.81; Q = 0.81 | 0.78 | P = 0.65; M = 0.73; W = 0.73 | S = 0.87; T = 0.83; H = 0.82 |
Underestimation | | | | | | | | | |
D5E | 0.64 | 0.32 | D5 | 0.28 | F = 0.15; I = 0.16; R = 0.17 | E = 0.64; S = 0.51; C = 0.4 | 0.32 | I = 0.18; L = 0.19; M = 0.22 | N = 0.46; S = 0.46; H = 0.41 |
E9P | 0.84 | 0.37 | E9 | 0.45 | W = 0.13; F = 0.15; Y = 0.17 | P = 0.84; A = 0.81; D = 0.76 | 0.45 | F = 0.25; W = 0.32; R = 0.32 | D = 0.66; A = 0.61; Q = 0.58 |
P10N | 0.77 | 0.46 | P10 | 0.43 | W = 0.12; F = 0.14; M = 0.17 | N = 0.76; A = 0.74; S = 0.74 | 0.43 | W = 0.29; D = 0.32; Y = 0.33 | A = 0.62; S = 0.57; C = 0.53 |
G15T | 0.59 | 0.21 | G15 | 0.21 | E = 0.09; F = 0.11; I = 0.11 | S = 0.73; T = 0.59; P = 0.35 | 0.23 | Y = 0.11; I = 0.14; F = 0.16 | Q = 0.36; M = 0.33; V = 0.33 |
G15S | 0.74 | 0.28 | G15 | 0.21 | E = 0.09; F = 0.11; I = 0.11 | S = 0.73; T = 0.59; P = 0.35 | 0.23 | Y = 0.11; I = 0.14; F = 0.16 | Q = 0.36; M = 0.33; V = 0.33 |
Q17K | 0.81 | 0.50 | Q17 | 0.51 | P = 0.11; W = 0.24; I = 0.24 | K = 0.8; R = 0.78; A = 0.77 | 0.51 | P = 0.32; F = 0.33; Y = 0.41 | V = 0.62; M = 0.61; C = 0.59 |
C31A | 0.76 | 0.36 | C31 | 0.24 | P = 0.09; Y = 0.1; E = 0.1 | A = 0.76; S = 0.73; V = 0.42 | 0.25 | R = 0.11; K = 0.14; D = 0.14 | S = 0.53; V = 0.51; T = 0.49 |
F32Y | 0.85 | 0.53 | F32 | 0.24 | D = 0.09; K = 0.1; N = 0.1 | Y = 0.84; W = 0.76; L = 0.55 | 0.27 | G = 0.07; P = 0.11; E = 0.13 | Y = 0.53; H = 0.47; M = 0.42 |
F32W | 0.76 | 0.29 | F32 | 0.24 | D = 0.09; K = 0.1; N = 0.1 | Y = 0.84; W = 0.76; L = 0.55 | 0.27 | G = 0.07; P = 0.11; E = 0.13 | Y = 0.53; H = 0.47; M = 0.42 |
F38W | 0.52 | 0.19 | F38 | 0.15 | D = 0.08; Q = 0.09; R = 0.09 | W = 0.51; Y = 0.39; L = 0.15 | 0.14 | R = 0.06; K = 0.07; S = 0.07 | I = 0.22; V = 0.22; W = 0.19 |
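In the outlier table above, each row pairs a variant's GigaAssay activity (GA) with its predicted activity (Pred). Every row under Overestimation has Pred above GA, and every row under Underestimation has Pred below GA. The position columns summarize all substitutions at the same residue. The sketch below reproduces that bookkeeping under two stated assumptions. First, Avg is the arithmetic mean over the substitutions measured at a position. Second, Min and Max list the three lowest (ascending) and three highest (descending) substitutions. The cutoff the authors used to call a variant an outlier is not given here, so the code only ranks variants by absolute error. The helper names `position_summary`, `error_direction` and `rank_by_abs_error` are hypothetical.

```python
from statistics import mean


def position_summary(activities: dict[str, float], k: int = 3):
    """Mean activity at one position, plus the k lowest (ascending) and k highest (descending)."""
    ranked = sorted(activities.items(), key=lambda item: item[1])
    lowest = ranked[:k]
    highest = ranked[::-1][:k]
    return mean(activities.values()), lowest, highest


def error_direction(ga: float, pred: float) -> str:
    """Label a variant by the sign of (predicted - measured) activity."""
    if pred > ga:
        return "Overestimation"
    if pred < ga:
        return "Underestimation"
    return "Exact"


def rank_by_abs_error(variants: dict[str, tuple[float, float]]) -> list[tuple[str, float, str]]:
    """Sort variants by |Pred - GA|, largest first, keeping the signed error and its direction."""
    rows = [(name, pred - ga, error_direction(ga, pred)) for name, (ga, pred) in variants.items()]
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


# (GA, Pred) pairs copied from four rows of the outlier table.
table_rows = {"E2A": (0.19, 0.57), "K85E": (0.17, 0.76), "D5E": (0.64, 0.32), "E9P": (0.84, 0.37)}
for name, err, direction in rank_by_abs_error(table_rows):
    print(f"{name}: {err:+.2f} ({direction})")
```

Ranking by absolute error rather than applying a fixed threshold keeps the sketch neutral about how the published outlier list was chosen.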
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Derbel, H.; Giacoletto, C.J.; Benjamin, R.; Chen, G.; Schiller, M.R.; Liu, Q. Accurate Prediction of Transcriptional Activity of Single Missense Variants in HIV Tat with Deep Learning. Int. J. Mol. Sci. 2023, 24, 6138. https://doi.org/10.3390/ijms24076138