Cheminformatics Modeling of Gene Silencing for Both Natural and Chemically Modified siRNAs
Abstract
1. Introduction
2. Materials and Methods
2.1. The Huesken Dataset
2.2. The Bramsen Dataset
2.3. Overall Workflow for the Predictive Modeling of siRNA Potency
2.4. Generation of the BCUT Descriptors for Each siRNA Molecule
1. Correlation strength and predictive power. The correlation coefficient r (i.e., Pearson r) was computed to measure the strength of correlation, and the predictive r² was used to measure predictive power. The model-development r (or r²) was calculated from the actual and model-predicted potencies of the training-set siRNAs; it serves as a necessary requirement for a reliable quantitative model. The testing r (or r²) was calculated from the actual and model-predicted potencies of the test-set molecules; it is a further necessary requirement for a suitable predictive model [46]. Equation (1) was used to determine r, and Equation (2) was used to compute r², for both the training and test sets. In the equations, yi and pi are the actual and predicted potencies, respectively; ȳ and p̄ are the means of the yi and pi, respectively; and N is the number of siRNA molecules.
2. The effect of the number of principal components on the predictive models. The principal component analysis technique was used to extract a set of orthogonal factors that afford the best predictive power. The appropriate number of principal components depends on the size of the training set and on the relationships among the descriptors. To establish the best model for a given dataset, we scanned the number of principal components to find the optimal number for use in the model.
3. The effect of data partitioning on the predictive models. To ensure the predictive power of the models, we rationally partitioned the dataset into training and test sets. The training set was used to build the models, and the corresponding test set was used to validate them. Because the test-set molecules were not involved in model building, the predictive r² calculated on the test set gives a more objective measure of true predictive ability. Different partitions into training and test sets can yield models with different predictive powers, especially when the dataset is small; we therefore performed a series of computational experiments to find the best models.
4. The effect of training set size on the predictive models. Predictive power depends strongly on the size of the training set, so different percentages of the original dataset were used as the training set. Ideally, the fewest siRNAs are used to develop models that then predict the potency of the greatest number of siRNAs. We demonstrated this with the Huesken dataset, using 1%, 2%, 3%, …, and 90% of the whole dataset as the training set and using the resulting models to predict the potency of the remaining 99%, 98%, 97%, …, and 10% of the siRNAs, respectively.
5. The effect of random shuffling on model development. The models should faithfully reflect the intrinsic relationship between the descriptors and the gene-silencing potency of a given dataset, so a randomized dataset should not yield a predictive model. To test this, we randomly shuffled the potencies across the whole dataset and then built models from these scrambled datasets, using different numbers of principal components in the PLS (partial least squares) modeling. A dramatic decrease in predictive power is expected for models built on the scrambled dataset.
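Items 1, 2 and 5 above can be illustrated together. The following is a minimal pure-Python sketch, not the authors' code, and every function name in it is ours. `pearson_r` follows the standard Pearson formula that Equation (1) names. `predictive_r2` *assumes* the common form r² = 1 − Σ(yi − pi)² / Σ(yi − ȳ)² for Equation (2), which is not reproduced in this text. `pls1_fit` and `pls1_predict` are a hypothetical single-response NIPALS PLS, in which the "principal components" of item 2 are treated as PLS latent components. `scan_components` mirrors the component scan, and `scrambled_test_r2` mirrors the potency-shuffling test.

```python
import math
import random

def pearson_r(y, p):
    """Equation (1): Pearson r between actual potencies y and predictions p."""
    n = len(y)
    y_bar, p_bar = sum(y) / n, sum(p) / n
    cov = sum((yi - y_bar) * (pi - p_bar) for yi, pi in zip(y, p))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y))
    s_p = math.sqrt(sum((pi - p_bar) ** 2 for pi in p))
    return cov / (s_y * s_p)

def predictive_r2(y, p):
    """ASSUMED form of Equation (2): 1 - SS_residual / SS_total."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, p))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def pls1_fit(X, y, n_comp):
    """Hypothetical minimal single-response PLS (NIPALS); X is a list of rows.
    n_comp must not exceed the rank of X (the weight vector would vanish)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xr = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X
    yr = [v - y_mean for v in y]                                # centred y
    comps = []
    for _ in range(n_comp):
        w = [sum(Xr[i][j] * yr[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(_dot(w, w))
        w = [wj / norm for wj in w]                  # unit weight vector
        t = [_dot(row, w) for row in Xr]             # latent scores
        tt = _dot(t, t)
        p = [sum(Xr[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = _dot(yr, t) / tt                         # y loading
        # Deflate X and y by the part explained by this component.
        Xr = [[Xr[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        yr = [yr[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps

def pls1_predict(model, X):
    """Predict by replaying the same centring and deflation on new rows."""
    x_mean, y_mean, comps = model
    preds = []
    for row in X:
        x = [xj - mj for xj, mj in zip(row, x_mean)]
        y_hat = y_mean
        for w, p, q in comps:
            t = _dot(x, w)
            y_hat += q * t
            x = [xj - t * pj for xj, pj in zip(x, p)]
        preds.append(y_hat)
    return preds

def scan_components(X_tr, y_tr, X_te, y_te, max_comp):
    """Item 2: test-set r2 for 1..max_comp PLS components."""
    return {a: predictive_r2(y_te, pls1_predict(pls1_fit(X_tr, y_tr, a), X_te))
            for a in range(1, max_comp + 1)}

def scrambled_test_r2(X, y, train_idx, test_idx, n_comp, seed=0):
    """Item 5: shuffle potencies over the WHOLE dataset, then refit and re-test."""
    y_shuf = list(y)
    random.Random(seed).shuffle(y_shuf)
    model = pls1_fit([X[i] for i in train_idx], [y_shuf[i] for i in train_idx], n_comp)
    preds = pls1_predict(model, [X[i] for i in test_idx])
    return predictive_r2([y_shuf[i] for i in test_idx], preds)
```

In use, one would call `scan_components` on a training/test partition, keep the component count with the highest test r², and then check that `scrambled_test_r2` falls far below the unscrambled value. A library implementation such as scikit-learn's `PLSRegression` could replace the hand-written PLS.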
3. Results and Discussion
3.1. Modeling of the Huesken Dataset
3.2. Modeling of the Bramsen Dataset
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Morrissey, D.V.; Lockridge, J.A.; Shaw, L.; Blanchard, K.; Jensen, K.; Breen, W.; Hartsough, K.; Machemer, L.; Radka, S.; Jadhav, V.; et al. Potent and persistent In Vivo anti-HBV activity of chemically modified siRNAs. Nat. Biotechnol. 2005, 23, 1002–1007.
- Soutschek, J.; Akinc, A.; Bramlage, B.; Charisse, K.; Constien, R.; Donoghue, M.; Elbashir, S.; Geick, A.; Hadwiger, P.; Harborth, J.; et al. Therapeutic silencing of an endogenous gene by systemic administration of modified siRNAs. Nature 2004, 432, 173–178.
- Durcan, N.; Murphy, C.; Cryan, S.A. Inhalable siRNA: Potential as a therapeutic agent in the lungs. Mol. Pharm. 2008, 5, 559–566.
- Grimm, D. Small silencing RNAs: State-of-the-art. Adv. Drug Deliv. Rev. 2009, 61, 672–703.
- Jackson, A.L.; Linsley, P.S. Recognizing and avoiding siRNA off-target effects for target identification and therapeutic application. Nat. Rev. Drug Discov. 2010, 9, 57–67.
- Rana, T.M. Illuminating the silence: Understanding the structure and function of small RNAs. Nat. Rev. Mol. Cell Biol. 2007, 8, 23–36.
- Mello, C.C.; Conte, D., Jr. Revealing the world of RNA interference. Nature 2004, 431, 338–342.
- Hutvagner, G.; McLachlan, J.; Pasquinelli, A.E.; Balint, E.; Tuschl, T.; Zamore, P.D. A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 2001, 293, 834–838.
- Hannon, G.J.; Rossi, J.J. Unlocking the potential of the human genome with RNA interference. Nature 2004, 431, 371–378.
- Meister, G.; Tuschl, T. Mechanisms of gene silencing by double-stranded RNA. Nature 2004, 431, 343–349.
- Chowdhury, U.F.; Sharif, S.M.U.; Hoque, K.I.; Beg, M.A.; Sharif, S.M.K.; Moni, M.A. A computational approach to design potential siRNA molecules as a prospective tool for silencing nucleocapsid phosphoprotein and surface glycoprotein gene of SARS-CoV-2. Genomics 2021, 113, 331–343.
- Zhang, M.; Shao, W.; Yang, T.; Liu, H.; Guo, S.; Zhao, D.; Weng, Y.; Liang, X.J.; Huang, Y. Conscription of Immune Cells by Light-Activatable Silencing NK-Derived Exosome (LASNEO) for Synergetic Tumor Eradication. Adv. Sci. 2022, 9, e2201135.
- Guo, S.; Li, K.; Hu, B.; Li, C.; Zhang, M.; Hussain, A.; Wang, X.; Cheng, Q.; Yang, F.; Ge, K.; et al. Membrane-destabilizing ionizable lipid empowered imaging-guided siRNA delivery and cancer treatment. Exploration 2021, 1, 20210008.
- Weng, Y.; Xiao, H.; Zhang, J.; Liang, X.-J.; Huang, Y. RNAi therapeutic and its innovative biotechnological evolution. Biotechnol. Adv. 2019, 37, 801–825.
- Holen, T.; Amarzguioui, M.; Wiiger, M.T.; Babaie, E.; Prydz, H. Positional effects of short interfering RNAs targeting the human coagulation trigger Tissue Factor. Nucleic Acids Res. 2002, 30, 1757–1766.
- Reynolds, A.; Leake, D.; Boese, Q.; Scaringe, S.; Marshall, W.S.; Khvorova, A. Rational siRNA design for RNA interference. Nat. Biotechnol. 2004, 22, 326–330.
- Ladunga, I. More complete gene silencing by fewer siRNAs: Transparent optimized design and biophysical signature. Nucleic Acids Res. 2006, 35, 433–440.
- Khvorova, A.; Reynolds, A.; Jayasena, S.D. Functional siRNAs and miRNAs exhibit strand bias. Cell 2003, 115, 209–216.
- Schwarz, D.S.; Hutvágner, G.; Du, T.; Xu, Z.; Aronin, N.; Zamore, P.D. Asymmetry in the Assembly of the RNAi Enzyme Complex. Cell 2003, 115, 199–208.
- Chalk, A.M.; Wahlestedt, C.; Sonnhammer, E.L. Improved and automated prediction of effective siRNA. Biochem. Biophys. Res. Commun. 2004, 319, 264–274.
- Ui-Tei, K.; Naito, Y.; Takahashi, F.; Haraguchi, T.; Ohki-Hamazaki, H.; Juni, A.; Ueda, R.; Saigo, K. Guidelines for the selection of highly effective siRNA sequences for mammalian and chick RNA interference. Nucleic Acids Res. 2004, 32, 936–948.
- Saetrom, P.; Snove, O., Jr. A comparison of siRNA efficacy predictors. Biochem. Biophys. Res. Commun. 2004, 321, 247–253.
- Heale, B.S.E.; Soifer, H.S.; Bowers, C.; Rossi, J.J. siRNA target site secondary structure predictions using local stable substructures. Nucleic Acids Res. 2005, 33, e30.
- Yiu, S.M.; Wong, P.W.H.; Lam, T.; Mui, Y.; Kung, H.F.; Lin, M.; Cheung, Y.T. Filtering of Ineffective siRNAs and Improved siRNA Design Tool. Bioinformatics 2004, 21, 144–151.
- Patzel, V.; Rutz, S.; Dietrich, I.; Köberle, C.; Scheffold, A.; Kaufmann, S.H. Design of siRNAs producing unstructured guide-RNAs results in improved RNA interference efficiency. Nat. Biotechnol. 2005, 23, 1440–1444.
- Saetrom, P. Predicting the efficacy of short oligonucleotides in antisense and RNAi experiments with boosted genetic programming. Bioinformatics 2004, 20, 3055–3063.
- Pancoska, P.; Moravek, Z.; Moll, U.M. Efficient RNA interference depends on global context of the target sequence: Quantitative analysis of silencing efficiency using Eulerian graph representation of siRNA. Nucleic Acids Res. 2004, 32, 1469–1479.
- Mittal, V. Improving the efficiency of RNA interference in mammals. Nat. Rev. Genet. 2004, 5, 355–365.
- Sandy, P.; Ventura, A.; Jacks, T. Mammalian RNAi: A practical guide. Biotechniques 2005, 39, 215–224.
- Gong, D.; Ferrell, J.E., Jr. Picking a winner: New mechanistic insights into the design of effective siRNAs. Trends Biotechnol. 2004, 22, 451–454.
- Gong, W.; Ren, Y.; Zhou, H.; Wang, Y.; Kang, S.; Li, T. siDRM: An effective and generally applicable online siRNA design tool. Bioinformatics 2008, 24, 2405–2406.
- Ren, Y.; Gong, W.; Xu, Q.; Zheng, X.; Lin, D.; Wang, Y.; Li, T. siRecords: An extensive database of mammalian siRNAs with efficacy ratings. Bioinformatics 2006, 22, 1027–1028.
- Huesken, D.; Lange, J.; Mickanin, C.; Weiler, J.; Asselbergs, F.; Warner, J.; Meloon, B.; Engel, S.; Rosenberg, A.; Cohen, D.; et al. Design of a genome-wide siRNA library using an artificial neural network. Nat. Biotechnol. 2005, 23, 995–1001.
- Shabalina, S.A.; Spiridonov, A.N.; Ogurtsov, A.Y. Computational models with thermodynamic and composition features improve siRNA design. BMC Bioinform. 2006, 7, 65.
- Matveeva, O.; Nechipurenko, Y.; Rossi, L.; Moore, B.; Saetrom, P.; Ogurtsov, A.Y.; Atkins, J.F.; Shabalina, S.A. Comparison of approaches for rational siRNA design leading to a new efficient and transparent method. Nucleic Acids Res. 2007, 35, e63.
- He, F.; Han, Y.; Gong, J.; Song, J.; Wang, H.; Li, Y. Predicting siRNA efficacy based on multiple selective siRNA representations and their combination at score level. Sci. Rep. 2017, 7, 44836.
- Jia, X.; Han, Q.; Lu, Z. Constructing the boundary between potent and ineffective siRNAs by MG-algorithm with C-features. BMC Bioinform. 2022, 23, 337.
- Ayyagari, V.S. Design of siRNA molecules for silencing of membrane glycoprotein, nucleocapsid phosphoprotein, and surface glycoprotein genes of SARS-CoV2. J. Genet. Eng. Biotechnol. 2022, 20, 65.
- Watts, J.K.; Deleavey, G.F.; Damha, M.J. Chemically modified siRNA: Tools and applications. Drug Discov. Today 2008, 13, 842–855.
- Takabatake, Y.; Isaka, Y.; Mizui, M.; Kawachi, H.; Takahara, S.; Imai, E. Chemically modified siRNA prolonged RNA interference in renal disease. Biochem. Biophys. Res. Commun. 2007, 363, 432–437.
- Koller, E.; Propp, S.; Murray, H.; Lima, W.; Bhat, B.; Prakash, T.P.; Allerson, C.R.; Swayze, E.E.; Marcusson, E.G.; Dean, N.M. Competition for RISC binding predicts In Vitro potency of siRNA. Nucleic Acids Res. 2006, 34, 4467–4476.
- Ui-Tei, K.; Naito, Y.; Zenno, S.; Nishi, K.; Yamato, K.; Takahashi, F.; Juni, A.; Saigo, K. Functional dissection of siRNA sequence by systematic DNA substitution: Modified siRNA with a DNA seed arm is a powerful tool for mammalian gene silencing with significantly reduced off-target effect. Nucleic Acids Res. 2008, 36, 2136–2151.
- Jackson, A.L.; Burchard, J.; Leake, D.; Reynolds, A.; Schelter, J.; Guo, J.; Johnson, J.M.; Lim, L.; Karpilow, J.; Nichols, K.; et al. Position-specific chemical modification of siRNAs reduces “off-target” transcript silencing. RNA 2006, 12, 1197–1205.
- Höskuldsson, A. PLS regression methods. J. Chemometr. 1988, 2, 211–228.
- Bramsen, J.B.; Laursen, M.B.; Nielsen, A.F.; Hansen, T.B.; Bus, C.; Langkjaer, N.; Babu, B.R.; Hojland, T.; Abramov, M.; van Aerschot, A.; et al. A large-scale chemical modification screen identifies design rules to generate siRNAs with high activity, high stability and low toxicity. Nucleic Acids Res. 2009, 37, 2867–2881.
- Golbraikh, A.; Tropsha, A. Beware of q2. J. Mol. Graph. Model. 2002, 20, 269–276.
- Burden, F.R. Molecular Identification Number for Substructure Searches. J. Chem. Inf. Comput. Sci. 1989, 29, 225–227.
- Pearlman, R.S.; Smith, K.M. Metric Validation and the Receptor-Relevant Subspace Concept. J. Chem. Inf. Comput. Sci. 1999, 39, 28–35.
- Sandberg, M.; Eriksson, L.; Jonsson, J.; Sjostrom, M.; Wold, S. New chemical descriptors relevant for the design of biologically active peptides. A multivariate characterization of 87 amino acids. J. Med. Chem. 1998, 41, 2481–2491.
- Jonsson, J.; Eriksson, L.; Hellberg, S.; Lindgren, F.; Sjostrom, M.; Wold, S. A multivariate representation and analysis of DNA sequence data. Acta Chem. Scand. 1991, 45, 186–192.
- Jonsson, J.; Norberg, T.; Carlsson, L.; Gustafsson, C.; Wold, S. Quantitative sequence-activity models (QSAM)—Tools for sequence design. Nucleic Acids Res. 1993, 21, 733–739.
- Golbraikh, A.; Shen, M.; Xiao, Z.; Xiao, Y.D.; Lee, K.H.; Tropsha, A. Rational selection of training and test sets for the development of validated QSAR models. J. Comput. Aided Mol. Des. 2003, 17, 241–253.
- Golbraikh, A.; Tropsha, A. Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection. Mol. Divers. 2002, 5, 231–243.
- Carpenter, G.; Grossberg, S.; Rosen, D. ART 2-A: An adaptive resonance algorithm for rapid category learning and recognition. Neural Netw. 1991, 4, 493–504.
- Tropsha, A.; Golbraikh, A. Predictive QSAR modeling workflow, model applicability domains, and virtual screening. Curr. Pharm. Des. 2007, 13, 3494–3504.
- Lučić, B.; Batista, J.; Bojović, V.; Lovrić, M.; Sović Kržić, A.; Bešlo, D.; Vikić-Topić, D. Estimation of Random Accuracy and its Use in Validation of Predictive Quality of Classification Models within Predictive Challenges. Croat. Chem. Acta 2019, 92, 379–391.
- Ebalunode, J.O.; Zheng, W. Cheminformatics Approach to Gene Silencing: Z Descriptors of Nucleotides and SVM Regression Afford Predictive Models for siRNA Potency. Mol. Inform. 2010, 29, 871–881.
- Dar, S.A.; Gupta, A.K.; Thakur, A.; Kumar, M. SMEpred workbench: A web server for predicting efficacy of chemically modified siRNAs. RNA Biol. 2016, 13, 1144–1151.
NT | PEOE_0 | PEOE_1 | PEOE_2 | PEOE_3 | SLOGP_0 | SLOGP_1 | SLOGP_2 | SLOGP_3 | SMR_0 | SMR_1 | SMR_2 | SMR_3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
A | −2.08 | −0.62 | 0.72 | 2.13 | −2.32 | −0.60 | 0.76 | 2.19 | −1.75 | −0.40 | 0.96 | 2.45 |
C | −2.26 | −0.49 | 0.58 | 2.21 | −2.52 | −0.44 | 0.63 | 2.10 | −1.91 | −0.38 | 0.68 | 2.54 |
G | −2.26 | −0.66 | 0.73 | 2.30 | −2.63 | −0.60 | 0.77 | 2.23 | −1.92 | −0.49 | 0.92 | 2.60 |
U | −2.30 | −0.56 | 0.55 | 2.29 | −2.61 | −0.51 | 0.66 | 2.10 | −1.97 | −0.38 | 0.69 | 2.60 |
T | −2.36 | −0.49 | 0.49 | 2.37 | −2.62 | −0.34 | 0.60 | 2.34 | −2.06 | −0.31 | 0.59 | 2.66 |
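The table above gives 12 BCUT values per nucleotide (PEOE, SLOGP and SMR, each with indices 0 to 3) for A, C, G, U and T. The text describing how Section 2.4 turns these values into per-siRNA descriptors is not reproduced here. The sketch below therefore rests on an assumption: each strand is encoded by concatenating the 12 descriptors of its nucleotides, position by position, so a 21-nt strand gives 21 × 12 = 252 features. `BCUT` and `encode_sirna` are hypothetical names, and modifications other than those in the table are not covered.

```python
# Per-nucleotide BCUT descriptors copied from the table above.
# Column order: PEOE_0..PEOE_3, SLOGP_0..SLOGP_3, SMR_0..SMR_3.
BCUT = {
    "A": [-2.08, -0.62, 0.72, 2.13, -2.32, -0.60, 0.76, 2.19, -1.75, -0.40, 0.96, 2.45],
    "C": [-2.26, -0.49, 0.58, 2.21, -2.52, -0.44, 0.63, 2.10, -1.91, -0.38, 0.68, 2.54],
    "G": [-2.26, -0.66, 0.73, 2.30, -2.63, -0.60, 0.77, 2.23, -1.92, -0.49, 0.92, 2.60],
    "U": [-2.30, -0.56, 0.55, 2.29, -2.61, -0.51, 0.66, 2.10, -1.97, -0.38, 0.69, 2.60],
    "T": [-2.36, -0.49, 0.49, 2.37, -2.62, -0.34, 0.60, 2.34, -2.06, -0.31, 0.59, 2.66],
}

def encode_sirna(seq):
    """HYPOTHETICAL scheme: concatenate the descriptor row of each position."""
    seq = seq.upper()
    unknown = set(seq) - set(BCUT)
    if unknown:
        raise ValueError(f"no descriptors for: {sorted(unknown)}")
    features = []
    for nt in seq:
        features.extend(BCUT[nt])  # 12 values per nucleotide, in sequence order
    return features
```

Because the encoding is position-specific, each of the 252 columns maps to one descriptor at one strand position, which is the form a PLS model needs.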
Dataset | Mean | Standard Deviation | Maximum | Minimum |
---|---|---|---|---|
Training | 0.66 | 0.003 | 0.67 | 0.65 |
Test | 0.63 | 0.02 | 0.68 | 0.58 |
Training Sets (21-NT) b | Test Set: All (249) | Test Set: All-human (198) | Test Set: hE2 (139) | Test Set: Rodent (51) |
---|---|---|---|---|
All (2182) | 0.65 (0.66) a | 0.62 (0.63) a | 0.62 (0.63) a | 0.76 (0.77) a |
All-human (1744) | 0.64 (0.65) | 0.60 (0.61) | 0.60 (0.62) | 0.75 (0.76) |
Humans-E2s (1229) | 0.64 (0.65) | 0.6 (0.62) | 0.6 (0.62) | 0.75 (0.76) |
Rodent (438) | 0.61 (0.55) | 0.6 (0.54) | 0.59 (0.53) | 0.63 (0.57) |
Random-all (1091) | 0.64 (0.65) | 0.60 (0.62) | 0.60 (0.61) | 0.76 (0.75) |
Random-all (727) | 0.64 (0.65) | 0.62 (0.63) | 0.63 (0.63) | 0.72 (0.76) |
Random-all (545) | 0.60 (0.62) | 0.59 (0.60) | 0.59 (0.60) | 0.66 (0.70) |
Random-all (218) | 0.52 (0.47) | 0.48 (0.47) | 0.46 (0.46) | 0.68 (0.46) |
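The sizes of the Random-all training sets in the table above (1091, 727, 545 and 218) are consistent with taking 1/2, 1/3, 1/4 and 1/10 of the 2182-siRNA "All" training pool, rounded down. That reading of the fractions is our inference, not something stated in the source. The training-set-size experiment of the Methods likewise sweeps training fractions from 1% to 90% of the whole dataset (2182 + 249 = 2431 siRNAs here). A minimal sketch of both kinds of random partitioning follows; `split_by_fraction` and `random_training_subsets` are hypothetical helpers, and the seeds are arbitrary.

```python
import random

def split_by_fraction(ids, train_frac, seed=0):
    """Randomly split ids into (train, test); train gets int(n * train_frac) items."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]

def random_training_subsets(pool_ids, fractions, seed=0):
    """One random subset of the training pool per fraction (sizes rounded down)."""
    return {f: split_by_fraction(pool_ids, f, seed)[0] for f in fractions}

# Hypothetical usage: sweep 1%..90% of 2431 siRNA indices as the training set.
all_ids = list(range(2431))
sweep = {pct: split_by_fraction(all_ids, pct / 100) for pct in (1, 2, 3, 50, 90)}
```

Using a fixed seed per split keeps each partition reproducible. Repeating the sweep with several seeds would show how much the predictive r² depends on which siRNAs happen to land in the training set.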
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Dong, X.; Zheng, W. Cheminformatics Modeling of Gene Silencing for Both Natural and Chemically Modified siRNAs. Molecules 2022, 27, 6412. https://doi.org/10.3390/molecules27196412