Predicting X-ray Diffraction Quality of Protein Crystals Using a Deep-Learning Method
Abstract
1. Introduction
2. Image Preprocessing
2.1. Data Collection
2.2. Quantitative Analysis of Diffraction Results
2.2.1. Statistics of Diffraction Spots
2.2.2. Analysis of Diffraction Spots with Resolution
2.2.3. Establishing a Scoring Mechanism
2.2.4. Establishing a Classification Dataset
2.3. Data Augmentation
3. Algorithm Design
3.1. Network Introduction
3.2. CBAM
4. Results and Discussion
4.1. Experiment Platform
4.2. Evaluation Indicators
- Accuracy: the ratio of correctly classified samples to the total number of samples; it is one of the most commonly used evaluation indicators. The model’s accuracy is calculated as shown in Equation (3).
- Precision: for a given class, the ratio of samples correctly assigned to that class to all samples the classifier assigned to it. Precision measures the classifier’s ability to avoid false positives. The model’s precision is calculated as shown in Equation (4).
- Recall: for a given class, the ratio of samples correctly identified as that class to all samples that actually belong to it. Recall measures the classifier’s coverage of positive examples. The model’s recall is calculated as shown in Equation (5).
- F1 score: the harmonic mean of precision and recall. A higher F1 score indicates that the classifier balances precision and recall well. The model’s F1 score is calculated as shown in Equation (6).
- Confusion matrix: a visual tool that displays the classifier’s results per class, showing both the correctly classified samples and how the misclassified samples are distributed across the other classes.
- ROC curve and AUC: the ROC curve plots the true positive rate (vertical axis) against the false positive rate (horizontal axis). The AUC (area under the curve) summarizes the curve in a single number; values closer to 1 indicate better discrimination.
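Equations (3)–(6) are not reproduced in this excerpt, but the standard per-class definitions they correspond to can be sketched directly from confusion counts. A minimal sketch, assuming macro-averaging over the three quality levels (the function name and equal class weighting are illustrative assumptions, not taken from the paper):

```python
def classification_metrics(y_true, y_pred, labels):
    """Accuracy plus macro-averaged precision, recall, and F1 score."""
    # Accuracy: correctly classified samples / total samples.
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

    precisions, recalls, f1s = [], [], []
    for c in labels:
        # Per-class confusion counts for class c.
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # Precision: correct assignments to c / all assignments to c.
        prec = tp / (tp + fp) if tp + fp else 0.0
        # Recall: correct assignments to c / all true members of c.
        rec = tp / (tp + fn) if tp + fn else 0.0
        # F1: harmonic mean of precision and recall.
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

With a balanced test set such as the one used here (50 images per level), accuracy and macro-averaged recall coincide, which is consistent with the single "Accuracy/Recall" column reported in the results table.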
4.3. Experiment Results and Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Abola, E.; Kuhn, P.; Earnest, T.; Stevens, R.C. Automation of X-ray crystallography. Nat. Struct. Biol. 2000, 7, 973–977.
- Maveyraud, L.; Mourey, L. Protein X-ray Crystallography and Drug Discovery. Molecules 2020, 25, 1030.
- McCarthy, A.A.; Barrett, R.; Beteva, A.; Caserotto, H.; Dobias, F.; Felisaz, F.; Giraud, T.; Guijarro, M.; Janocha, R.; Khadrouche, A.; et al. ID30B—A versatile beamline for macromolecular crystallography experiments at the ESRF. J. Synchrotron Radiat. 2018, 25, 1249–1260.
- Qin, J.; Zhang, Y.; Zhou, H.; Yu, F.; Sun, B.; Wang, Q. Protein crystal instance segmentation based on mask R-CNN. Crystals 2021, 11, 157.
- Elez, K.; Bonvin, A.M.J.J.; Vangone, A. Distinguishing crystallographic from biological interfaces in protein complexes: Role of intermolecular contacts and energetics for classification. BMC Bioinf. 2018, 19, 19–28.
- Wang, S.; Zhao, H. SADeepcry: A deep learning framework for protein crystallization propensity prediction using self-attention and auto-encoder networks. Briefings Bioinf. 2022, 23, bbac352.
- Bruno, A.E.; Charbonneau, P.; Newman, J.; Snell, E.H.; So, D.R.; Vanhoucke, V.; Watkins, C.J.; Williams, S.; Wilson, J. Classification of crystallization outcomes using deep convolutional neural networks. PLoS ONE 2018, 13, e0198883.
- Elbasir, A.; Moovarkumudalvan, B.; Kunji, K.; Kolatkar, P.R.; Mall, R.; Bensmail, H. DeepCrystal: A deep learning framework for sequence-based protein crystallization prediction. Bioinformatics 2019, 35, 2216–2225.
- Luft, J.R.; Collins, R.J.; Fehrman, N.A.; Lauricella, A.M.; Veatch, C.K.; DeTitta, G.T. A deliberate approach to screening for initial crystallization conditions of biological macromolecules. J. Struct. Biol. 2003, 142, 170–179.
- Leslie, A.G.W.; Powell, H.R. Processing diffraction data with Mosflm. In Evolving Methods for Macromolecular Crystallography; NATO Science Series; Springer: Dordrecht, The Netherlands, 2007; Volume 245.
- Kabsch, W. XDS. Acta Crystallogr. Sect. D Biol. Crystallogr. 2010, 66, 125–132.
- Waterman, D.G.; Winter, G.; Gildea, R.J.; Parkhurst, J.M.; Brewster, A.S.; Sauter, N.K.; Evans, G. Diffraction-geometry refinement in the DIALS framework. Acta Crystallogr. Sect. D Struct. Biol. 2016, 72, 558–575.
- White, T.A. Processing serial crystallography data with CrystFEL: A step-by-step guide. Acta Crystallogr. Sect. D Struct. Biol. 2019, 75, 219–233.
- Melnikov, I.; Svensson, O.; Bourenkov, G.; Leonard, G.; Popov, A. The complex analysis of X-ray mesh scans for macromolecular crystallography. Acta Crystallogr. Sect. D Struct. Biol. 2018, 74, 355–365.
- McPherson, A.; Gavira, J.A. Introduction to protein crystallization. Acta Crystallogr. Sect. F Struct. Biol. Commun. 2014, 70, 2–20.
- Wu, K.; Otoo, E.; Suzuki, K. Optimizing two-pass connected-component labeling algorithms. Pattern Anal. Applic. 2009, 12, 117–135.
- Rondeau, J.M.; Schreuder, H. Protein crystallography and drug discovery. In The Practice of Medicinal Chemistry; Wermuth, C.G., Ed.; Elsevier: Amsterdam, The Netherlands, 2008; pp. 605–634.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
- LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551.
- Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
- Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 11976–11986.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856.
- Zhu, X.; Cheng, D.; Zhang, Z.; Lin, S.; Dai, J. An empirical study of spatial attention mechanisms in deep networks. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6688–6697.
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 14–19 June 2020; pp. 11534–11542.
| Data Category | Original Train | Original Val | Original Test | Original Total | Enhanced Train | Enhanced Val | Enhanced Test | Enhanced Total |
|---|---|---|---|---|---|---|---|---|
| Level 1 (good) | 72 | 18 | 10 | 100 | 360 | 90 | 50 | 500 |
| Level 2 (normal) | 64 | 16 | 10 | 90 | 320 | 80 | 50 | 450 |
| Level 3 (bad) | 72 | 18 | 10 | 100 | 360 | 90 | 50 | 500 |
| Total | 208 | 52 | 30 | 290 | 1040 | 260 | 150 | 1450 |
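The enhanced totals are exactly five times the originals (e.g., 100 Level 1 images become 500), consistent with generating four transformed copies per image. The specific transforms are described in Section 2.3 and not shown in this excerpt; a minimal sketch, assuming simple geometric augmentation (three rotations plus a horizontal flip, operating on an image represented as a list of rows), could look like this:

```python
def rot90(img):
    """Rotate a 2D image (list of rows) 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def hflip(img):
    """Mirror a 2D image horizontally."""
    return [row[::-1] for row in img]

def augment(img):
    """Return the original plus four transformed copies.

    Yields 5 images per input, matching the 5x expansion in the
    table above (the exact transforms are an assumption here)."""
    r90 = rot90(img)
    r180 = rot90(r90)
    r270 = rot90(r180)
    return [img, r90, r180, r270, hflip(img)]
```

Label-preserving geometric transforms like these are a common choice for diffraction-style images, since rotating or mirroring a pattern does not change its quality class.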
| Network | Accuracy/Recall (%) | Precision (%) | F1 Score (%) | Inference Time (ms) |
|---|---|---|---|---|
| Vision Transformer [18] | 45.33 | 48.50 | 41.93 | 129 |
| DenseNet [26] | 53.33 | 50.57 | 49.47 | 59 |
| ShuffleNet [27] | 57.33 | 54.03 | 54.90 | 10 |
| ResNet50 [24] | 65.33 | 68.25 | 62.11 | 54 |
| ConvNeXt [23] | 67.33 | 71.30 | 66.19 | 7 |
| ConvNeXt + SA [28] | 63.33 | 67.42 | 62.20 | 9 |
| ConvNeXt + ECA [29] | 69.33 | 69.05 | 68.91 | 9 |
| ConvNeXt + CBAM [25] | 75.33 | 76.11 | 75.31 | 11 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Shen, Y.; Zhu, Z.; Xiao, Q.; Ye, K.; Wang, Q.; Wang, Y.; Sun, B. Predicting X-ray Diffraction Quality of Protein Crystals Using a Deep-Learning Method. Crystals 2024, 14, 771. https://doi.org/10.3390/cryst14090771