Predictive Modeling of Critical Temperatures in Superconducting Materials
Abstract
1. Introduction
2. Results and Discussion
2.1. Data Pre-Processing
2.2. Model Development
2.3. Optimization of the Best Models
2.4. Interpretation of Optimized Model and Potential Real-World Applications
3. Materials and Methods
3.1. Dataset
3.2. Duplicates Removal
3.3. Attribute Selection
3.4. QSPR Modeling
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Conflicts of Interest
Sample Availability
Abbreviations
| Abbreviation | Definition |
|---|---|
| AE | absolute error |
| MLR | multiple linear regression |
| NN | neural network |
| PCA | principal component analysis |
| RF | random forests |
| PLS | projections to latent structures |
| RMSE | root mean squared error |
| XGBoost | gradient boosted decision trees |
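The results tables report R², RMSE and AE for each model. For reference, the three criteria can be computed as in the minimal Python sketch below, under two assumptions:

- AE denotes the mean absolute error.
- R² is the coefficient of determination, 1 − SS_res/SS_tot.

Some tools instead report the squared Pearson correlation between observed and predicted values. That quantity coincides with the definition above only for least-squares fits with an intercept, evaluated on their own training data. The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, AE) for paired observed and predicted values.

    Assumes AE is the mean absolute error and R2 the coefficient of
    determination, 1 - SS_res / SS_tot.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_obs = sum(y_true) / n
    residuals = [obs - pred for obs, pred in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    ss_tot = sum((obs - mean_obs) ** 2 for obs in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    ae = sum(abs(r) for r in residuals) / n
    return r2, rmse, ae


# Toy critical temperatures (K) and predictions, for illustration only.
print(regression_metrics([10, 20, 30, 40], [12, 18, 33, 39]))
```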
References
- Hamidieh, K. A data-driven statistical model for predicting the critical temperature of a superconductor. Comput. Mater. Sci. 2018, 154, 346–354.
- Mousavi, T.; Grovenor, C.R.M.; Speller, S.C. Structural parameters affecting superconductivity in iron chalcogenides: A review. Mater. Sci. Technol. 2014.
- Bardeen, J.; Rickayzen, G.; Tewordt, L. Theory of the Thermal Conductivity of Superconductors. Phys. Rev. 1959, 113, 982–994.
- Gallop, J.C. Introduction to Superconductivity, in: SQUIDs. Josephson Eff. Supercond. Electron. 2018.
- Schafroth, M.R. Theory of superconductivity. Phys. Rev. 1954.
- Stanev, V.; Oses, C.; Kusne, A.G.; Rodriguez, E.; Paglione, J.; Curtarolo, S.; Takeuchi, I. Machine learning modeling of superconducting critical temperature. Npj Comput. Mater. 2018.
- Kononenko, O.; Adolphsen, C.; Li, Z.; Ng, C.-K.; Rivetta, C. 3D multiphysics modeling of superconducting cavities with a massively parallel simulation suite. Phys. Rev. Accel. Beams. 2017, 20, 102001.
- Tanaka, I.; Rajan, K.; Wolverton, C. Data-centric science for materials innovation. MRS Bull. 2018, 43, 659–663.
- Liu, Y.; Zhao, T.; Ju, W.; Shi, S. Materials discovery and design using machine learning. J. Mater. 2017.
- Smith, J.S.; Isayev, O.; Roitberg, A.E. ANI-1: An extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 2017, 8, 3192–3203.
- Jha, D.; Ward, L.; Paul, A.; Liao, W.; Choudhary, A.; Wolverton, C.; Agrawal, A. ElemNet: Deep Learning the Chemistry of Materials From Only Elemental Composition. Sci. Rep. 2018, 8, 17593.
- Sizochenko, N.; Mikolajczyk, A.; Jagiello, K.; Puzyn, T.; Leszczynski, J.; Rasulev, B. How toxicity of nanomaterials towards different species could be simultaneously evaluated: Novel multi-nano-read-across approach. Nanoscale 2018, 10, 582–591.
- Halder, A.K.; Moura, A.S.; Cordeiro, M.N.D.S. QSAR modelling: A therapeutic patent review 2010-present. Expert Opin. Ther. Pat. 2018.
- Goh, G.B.; Hodas, N.O.; Vishnu, A. Deep learning for computational chemistry. J. Comput. Chem. 2017.
- Tropsha, A. Best practices for QSAR model development, validation, and exploitation. Mol. Inform. 2010, 29, 476–488.
- Cherkasov, A.; Muratov, E.N.; Fourches, D.; Varnek, A.; Baskin, I.I.; Cronin, M.; Dearden, J.; Gramatica, P.; Martin, Y.C.; Todeschini, R.; et al. QSAR Modeling: Where Have You Been? Where Are You Going To? J. Med. Chem. 2014, 57, 4977–5010.
- Correa-Baena, J.-P.; Hippalgaonkar, K.; van Duren, J.; Jaffer, S.; Chandrasekhar, V.R.; Stevanovic, V.; Wadia, C.; Guha, S.; Buonassisi, T. Accelerating Materials Development via Automation, Machine Learning, and High-Performance Computing. Joule 2018, 2, 1410–1420.
- Ghiringhelli, L.M.; Vybiral, J.; Levchenko, S.V.; Draxl, C.; Scheffler, M. Big Data of Materials Science: Critical Role of the Descriptor. Phys. Rev. Lett. 2015, 114, 105503.
- De Jong, M.; Chen, W.; Notestine, R.; Persson, K.; Ceder, G.; Jain, A.; Asta, M.; Gamst, A. A Statistical Learning Framework for Materials Science: Application to Elastic Moduli of k-nary Inorganic Polycrystalline Compounds. Sci. Rep. 2016, 6, 34256.
- Lehmus, K.; Karppinen, M. Application of Multivariate Data Analysis Techniques in Modeling Structure–Property Relationships of Some Superconductive Cuprates. J. Solid State Chem. 2001, 162, 1–9.
- Villars, P.; Phillips, J. Quantum structural diagrams and high-Tc superconductivity. Phys. Rev. B. 1988, 37, 2345–2348.
- OECD. Guidance Document on the Validation of (Quantitative) Structure-Activity Relationship [(Q)Sar] Models. Transport 2007.
- Sizochenko, N.; Jagiello, K.; Leszczynski, J.; Puzyn, T. How the “Liquid Drop” Approach Could Be Efficiently Applied for Quantitative Structure–Property Relationship Modeling of Nanofluids. J. Phys. Chem. C. 2015, 119, 25542–25547.
- Mejía-Salazar, J.R.; Perea, J.D.; Castillo, R.; Diosa, J.E.; Baca, E. Hybrid superconducting-ferromagnetic [Bi2Sr2(Ca,Y)2Cu3O10]0.99(La2/3Ba1/3MnO3)0.01 composite thick films. Materials 2019, 12, 861.
- Zhang, G.; Samuely, T.; Xu, Z.; Jochum, J.K.; Volodin, A.; Zhou, S.; May, P.W.; Onufriienko, O.; Kačmarčík, J.; Steele, J.A.; et al. Superconducting Ferromagnetic Nanodiamond. ACS Nano. 2017.
- Bache, K.; Lichman, M. UCI Machine Learning Repository. Univ. Calif. Irvine Sch. Inf. 2013.
- Xu, Y.; Hosoya, J.; Sakairi, Y.; Yamasato, H. Superconducting Material Database (SuperCon), n.d. Available online: https://supercon.nims.go.jp/index_en.html (accessed on 18 August 2020).
- Jurs, P.C. Mathematica. J. Chem. Inf. Comput. Sci. 1992.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Liu, P.; Long, W. Current mathematical methods used in QSAR/QSPR studies. Int. J. Mol. Sci. 2009, 10, 1978.
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- RapidMiner Studio, version (9.3); (n.d.); RapidMiner Inc.: Boston, MA, USA, 2019.
| Preprocessing | Attribute Selection | R² | RMSE | AE |
|---|---|---|---|---|
| Cleaned Dataset | n/a | 0.726 ± 0.012 | 17.664 ± 0.279 | 13.317 ± 0.194 |
| | weight by relief | 0.611 ± 0.017 | 21.038 ± 0.490 | 16.286 ± 0.374 |
| | weight by PCA | 0.606 ± 0.016 | 21.170 ± 0.453 | 16.131 ± 0.326 |
| | weight by correlation | 0.618 ± 0.011 | 20.860 ± 0.372 | 16.060 ± 0.239 |
| Correlations Removed | n/a | 0.699 ± 0.009 | 18.505 ± 0.348 | 14.185 ± 0.265 |
| | weight by relief | 0.657 ± 0.021 | 19.771 ± 0.521 | 14.957 ± 0.391 |
| | weight by PCA | 0.576 ± 0.011 | 21.957 ± 0.243 | 17.339 ± 0.243 |
| | weight by correlation | 0.610 ± 0.006 | 21.063 ± 0.236 | 16.760 ± 0.165 |
| No Outliers | n/a * | 0.734 ± 0.007 | 17.414 ± 0.251 | 13.124 ± 0.241 |
| | weight by relief | 0.607 ± 0.013 | 21.199 ± 0.349 | 16.351 ± 0.342 |
| | weight by PCA | 0.616 ± 0.012 | 20.936 ± 0.289 | 15.927 ± 0.262 |
| | weight by correlation | 0.626 ± 0.014 | 20.682 ± 0.347 | 15.882 ± 0.239 |
| Correlations Removed, No Outliers | n/a | 0.708 ± 0.016 | 18.244 ± 0.435 | 13.983 ± 0.411 |
| | weight by relief | 0.603 ± 0.017 | 21.310 ± 0.378 | 16.631 ± 0.347 |
| | weight by PCA | 0.585 ± 0.010 | 21.761 ± 0.367 | 17.163 ± 0.270 |
| | weight by correlation | 0.619 ± 0.016 | 20.867 ± 0.323 | 16.578 ± 0.293 |
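Several rows in the table above apply an attribute-selection step before modeling. "Weight by correlation" ranks attributes by the strength of their linear association with the critical temperature. The plain-Python sketch below illustrates the idea under these assumptions:

- Each weight is the absolute Pearson correlation with the target, divided by the largest weight.
- The `top_k` highest-weighted attributes are kept.

This is not RapidMiner's operator. The normalization, the cut-off and the helper names (`pearson`, `weight_by_correlation`) are hypothetical choices for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return sxy / math.sqrt(sxx * syy)


def weight_by_correlation(columns, target, top_k):
    """Weight attributes by |r| with the target (largest weight = 1) and keep the top_k."""
    raw = {name: abs(pearson(values, target)) for name, values in columns.items()}
    largest = max(raw.values()) or 1.0  # avoid dividing by zero if every weight is 0
    weights = {name: w / largest for name, w in raw.items()}
    ranked = sorted(weights, key=weights.get, reverse=True)
    return weights, ranked[:top_k]


columns = {"a": [2, 4, 6, 8, 10], "b": [3, 1, 4, 1, 5], "c": [1, 3, 2, 5, 4]}
weights, selected = weight_by_correlation(columns, [1, 2, 3, 4, 5], top_k=2)
print(selected)  # ['a', 'c']
```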
| Preprocessing | Attribute Selection | R² | RMSE | AE |
|---|---|---|---|---|
| Cleaned Dataset | n/a | 0.840 ± 0.011 | 14.376 ± 0.346 | 10.515 ± 0.269 |
| | weight by relief | 0.801 ± 0.015 | 15.774 ± 0.467 | 11.489 ± 0.366 |
| | weight by PCA | 0.808 ± 0.007 | 15.576 ± 0.319 | 11.354 ± 0.143 |
| | weight by correlation | 0.803 ± 0.009 | 15.715 ± 0.315 | 11.442 ± 0.231 |
| Correlations Removed | n/a | 0.831 ± 0.012 | 14.718 ± 0.441 | 10.704 ± 0.309 |
| | weight by relief | 0.810 ± 0.011 | 15.486 ± 0.406 | 11.356 ± 0.193 |
| | weight by PCA | 0.799 ± 0.006 | 15.864 ± 0.247 | 11.441 ± 0.220 |
| | weight by correlation | 0.814 ± 0.006 | 15.337 ± 0.273 | 11.143 ± 0.173 |
| No Outliers | n/a * | 0.847 ± 0.009 | 14.132 ± 0.347 | 10.314 ± 0.260 |
| | weight by relief | 0.810 ± 0.014 | 15.473 ± 0.344 | 11.250 ± 0.226 |
| | weight by PCA | 0.812 ± 0.007 | 15.424 ± 0.291 | 11.238 ± 0.191 |
| | weight by correlation | 0.810 ± 0.012 | 15.494 ± 0.250 | 11.222 ± 0.181 |
| Correlations Removed, No Outliers | n/a | 0.839 ± 0.012 | 14.428 ± 0.428 | 10.472 ± 0.301 |
| | weight by relief | 0.817 ± 0.014 | 15.237 ± 0.349 | 11.113 ± 0.245 |
| | weight by PCA | 0.803 ± 0.015 | 15.756 ± 0.428 | 11.337 ± 0.266 |
| | weight by correlation | 0.820 ± 0.016 | 15.114 ± 0.463 | 10.969 ± 0.280 |
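The "Correlations Removed" rows refer to dropping attributes that are strongly inter-correlated. For instance, the mean and weighted-mean variants of one elemental property listed in the importance table may carry overlapping information. A minimal greedy version is sketched below, under two assumptions that are not the settings used in this study:

- A pair counts as strongly correlated when |r| > 0.95.
- The attribute seen first in each such pair is the one retained.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.95):
    """Keep an attribute only if |r| <= threshold with every attribute already kept.

    Columns are visited in insertion order, so the first member of a
    strongly correlated pair is retained (an arbitrary, assumed rule).
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


columns = {"x": [1, 2, 3, 4], "x_doubled": [2, 4, 6, 8], "z": [4, 1, 3, 2]}
print(drop_correlated(columns))  # ['x', 'z']
```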
| Preprocessing | Attribute Selection | R² | RMSE | AE |
|---|---|---|---|---|
| Cleaned Dataset | n/a | 0.863 ± 0.010 | 12.614 ± 0.466 | 8.351 ± 0.300 |
| | weight by relief | 0.836 ± 0.005 | 13.745 ± 0.239 | 9.105 ± 0.171 |
| | weight by PCA | 0.844 ± 0.007 | 13.410 ± 0.315 | 8.815 ± 0.150 |
| | weight by correlation | 0.851 ± 0.007 | 13.119 ± 0.194 | 8.643 ± 0.166 |
| Correlations Removed | n/a | 0.855 ± 0.011 | 12.965 ± 0.490 | 8.591 ± 0.315 |
| | weight by relief | 0.830 ± 0.014 | 13.987 ± 0.470 | 9.308 ± 0.249 |
| | weight by PCA | 0.837 ± 0.011 | 13.715 ± 0.354 | 9.010 ± 0.203 |
| | weight by correlation | 0.846 ± 0.009 | 13.331 ± 0.391 | 8.788 ± 0.202 |
| No Outliers | n/a * | 0.868 ± 0.007 | 12.399 ± 0.247 | 8.180 ± 0.165 |
| | weight by relief | 0.848 ± 0.011 | 13.278 ± 0.439 | 8.748 ± 0.276 |
| | weight by PCA | 0.849 ± 0.010 | 13.224 ± 0.496 | 8.670 ± 0.313 |
| | weight by correlation | 0.856 ± 0.007 | 12.893 ± 0.251 | 8.431 ± 0.134 |
| Correlations Removed, No Outliers | n/a | 0.859 ± 0.014 | 12.790 ± 0.371 | 8.426 ± 0.177 |
| | weight by relief | 0.848 ± 0.017 | 13.266 ± 0.558 | 8.789 ± 0.277 |
| | weight by PCA | 0.843 ± 0.010 | 13.497 ± 0.415 | 8.827 ± 0.229 |
| | weight by correlation | 0.853 ± 0.015 | 13.063 ± 0.474 | 8.579 ± 0.230 |
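This section does not state the outlier-removal criterion behind the "No Outliers" rows, nor the footnote marked *. Purely as an illustration, the sketch below applies Tukey's interquartile-range fences to the critical temperature values: a record is kept only if its target lies within [Q1 − 1.5·IQR, Q3 + 1.5·IQR]. Two choices here are assumptions: the multiplier 1.5, and screening the target rather than the attributes. If the target distribution has a long upper tail, a rule of this kind can also discard genuine high-Tc compounds, so flagged records deserve inspection before they are dropped.

```python
import statistics


def iqr_inlier_indices(values, k=1.5):
    """Indices of values lying inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if low <= v <= high]


tc = [10.0, 12.0, 11.0, 13.0, 12.0, 95.0, 11.0, 10.0]  # toy critical temperatures (K)
keep = iqr_inlier_indices(tc)
print([tc[i] for i in keep])
```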
| Preprocessing | Attribute Selection | R² | RMSE | AE |
|---|---|---|---|---|
| Cleaned Dataset | n/a | 0.837 ± 0.012 | 14.194 ± 0.696 | 9.619 ± 0.426 |
| | weight by relief | 0.746 ± 0.013 | 17.685 ± 0.603 | 12.667 ± 0.755 |
| | weight by PCA | 0.763 ± 0.012 | 16.902 ± 0.866 | 11.906 ± 1.058 |
| | weight by correlation | 0.769 ± 0.011 | 16.857 ± 1.009 | 12.028 ± 1.167 |
| Correlations Removed | n/a | 0.831 ± 0.009 | 14.637 ± 0.848 | 10.379 ± 0.999 |
| | weight by relief | 0.783 ± 0.019 | 16.496 ± 1.023 | 11.700 ± 1.117 |
| | weight by PCA | 0.766 ± 0.016 | 17.086 ± 0.942 | 12.249 ± 0.987 |
| | weight by correlation | 0.780 ± 0.012 | 16.746 ± 1.231 | 12.054 ± 1.343 |
| No Outliers | n/a * | 0.842 ± 0.007 | 14.186 ± 0.794 | 10.021 ± 1.137 |
| | weight by relief | 0.755 ± 0.013 | 17.460 ± 0.773 | 12.497 ± 1.069 |
| | weight by PCA | 0.773 ± 0.013 | 16.888 ± 0.942 | 12.287 ± 1.019 |
| | weight by correlation | 0.774 ± 0.013 | 16.805 ± 0.937 | 12.004 ± 1.007 |
| Correlations Removed, No Outliers | n/a | 0.834 ± 0.010 | 13.996 ± 0.332 | 9.369 ± 0.305 |
| | weight by relief | 0.777 ± 0.010 | 16.541 ± 0.599 | 11.817 ± 0.726 |
| | weight by PCA | 0.775 ± 0.012 | 16.858 ± 1.206 | 12.016 ± 1.566 |
| | weight by correlation | 0.793 ± 0.012 | 16.394 ± 1.532 | 11.916 ± 2.028 |
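The ± values in these tables are consistent with a mean and standard deviation taken over repeated validation runs, although this section does not describe the validation scheme. The sketch below shows one common way to produce such figures:

1. Shuffle the record indices.
2. Split them into k folds.
3. Summarize the per-fold scores as mean ± sample standard deviation.

The number of folds, the seed and the function names are hypothetical.

```python
import random
import statistics


def kfold_indices(n_samples, k, seed=0):
    """Shuffle range(n_samples) reproducibly and deal it into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def summarize(scores):
    """Format per-fold scores as 'mean ± sample standard deviation'."""
    return f"{statistics.mean(scores):.3f} ± {statistics.stdev(scores):.3f}"


folds = kfold_indices(10, k=3)
for i, test_fold in enumerate(folds):
    # Every fold except the i-th one forms the training set for this round.
    train = [j for f, fold in enumerate(folds) if f != i for j in fold]
    print(f"fold {i}: {len(train)} training / {len(test_fold)} test records")

print(summarize([0.84, 0.85, 0.83]))  # toy per-fold R² values: 0.840 ± 0.010
```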
| Preprocessing | Performance | MLR | XGBoost | RF | NN |
|---|---|---|---|---|---|
| Aggregation Only ¹ | R² | 0.542 ± 0.014 | 0.768 ± 0.014 | 0.825 ± 0.008 | 0.688 ± 0.013 |
| | RMSE | 0.677 ± 0.012 | 0.501 ± 0.013 | 0.421 ± 0.012 | 0.566 ± 0.018 |
| | AE | 0.535 ± 0.008 | 0.364 ± 0.011 | 0.278 ± 0.007 | 0.408 ± 0.023 |
| Aggregation Only ¹, No outliers | R² | 0.530 ± 0.013 | 0.780 ± 0.012 | 0.834 ± 0.012 | 0.691 ± 0.021 |
| | RMSE | 0.673 ± 0.017 | 0.492 ± 0.012 | 0.412 ± 0.015 | 0.574 ± 0.023 |
| | AE | 0.551 ± 0.016 | 0.356 ± 0.009 | 0.270 ± 0.010 | 0.419 ± 0.024 |
| Aggregation, Merged Attributes | R² | 0.726 ± 0.011 | 0.840 ± 0.012 | 0.863 ± 0.011 | 0.836 ± 0.009 |
| | RMSE | 17.657 ± 0.421 | 14.376 ± 0.433 | 12.615 ± 0.433 | 14.224 ± 0.591 |
| | AE | 13.312 ± 0.263 | 10.490 ± 0.293 | 8.339 ± 0.261 | 9.932 ± 0.836 |
| Aggregation, Merged Attributes, No outliers | R² | 0.735 ± 0.006 | 0.846 ± 0.012 | 0.867 ± 0.012 | 0.844 ± 0.011 |
| | RMSE | 17.409 ± 0.300 | 14.126 ± 0.378 | 12.405 ± 0.524 | 13.624 ± 0.391 |
| | AE | 13.121 ± 0.293 | 10.279 ± 0.318 | 8.186 ± 0.377 | 9.469 ± 0.504 |
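The "Aggregation" rows in the table above suggest that records describing the same material were combined before modeling. One simple way to do this is to group records by chemical formula and replace each group's critical temperatures with their mean. This is shown only as an assumption about the procedure. The (formula, Tc) pair layout and the function name are hypothetical, and the "Merged Attributes" step is not illustrated.

```python
from collections import defaultdict
from statistics import mean


def aggregate_duplicates(records):
    """Collapse records that share a formula, averaging their critical temperatures.

    records: iterable of (formula, tc) pairs (a hypothetical layout).
    Returns {formula: mean tc} in first-seen order.
    """
    groups = defaultdict(list)
    for formula, tc in records:
        groups[formula].append(tc)
    return {formula: mean(tcs) for formula, tcs in groups.items()}


records = [("MgB2", 39.0), ("Nb3Sn", 18.0), ("MgB2", 38.0)]  # toy values
print(aggregate_duplicates(records))  # {'MgB2': 38.5, 'Nb3Sn': 18.0}
```

Averaging is only one option. A median would be less sensitive to a single mis-reported duplicate.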
| Preprocessing | Attribute Selection | R² | RMSE | AE |
|---|---|---|---|---|
| Original Dataset (with Duplicates) | n/a (XGBoost model from [1]) | 0.92 | 9.5 | - |
| | n/a | 0.926 ± 0.004 | 9.344 ± 0.289 | 5.142 ± 0.147 |
| | weight by relief | 0.922 ± 0.005 | 9.544 ± 0.372 | 5.313 ± 0.160 |
| | weight by PCA | 0.922 ± 0.007 | 9.551 ± 0.357 | 5.346 ± 0.107 |
| | weight by correlation | 0.923 ± 0.007 | 9.494 ± 0.504 | 5.297 ± 0.168 |
| Cleaned Dataset | n/a | 0.923 ± 0.005 | 9.365 ± 0.329 | 5.168 ± 0.110 |
| | weight by relief | 0.914 ± 0.009 | 9.882 ± 0.518 | 5.504 ± 0.221 |
| | weight by PCA | 0.917 ± 0.009 | 9.737 ± 0.476 | 5.513 ± 0.248 |
| | weight by correlation | 0.917 ± 0.009 | 9.683 ± 0.492 | 5.510 ± 0.141 |
| Correlations Removed | n/a | 0.925 ± 0.005 | 9.265 ± 0.244 | 5.170 ± 0.190 |
| | weight by relief | 0.920 ± 0.009 | 9.557 ± 0.511 | 5.377 ± 0.256 |
| | weight by PCA | 0.918 ± 0.008 | 9.665 ± 0.442 | 5.463 ± 0.189 |
| | weight by correlation | 0.919 ± 0.009 | 9.613 ± 0.544 | 5.424 ± 0.235 |
| No Outliers | n/a * | 0.930 ± 0.012 | 8.927 ± 0.689 | 4.975 ± 0.259 |
| | weight by relief | 0.921 ± 0.007 | 9.497 ± 0.417 | 5.334 ± 0.169 |
| | weight by PCA | 0.920 ± 0.007 | 9.557 ± 0.388 | 5.408 ± 0.211 |
| | weight by correlation | 0.922 ± 0.010 | 9.444 ± 0.593 | 5.354 ± 0.285 |
| Correlations Removed, No Outliers | n/a | 0.929 ± 0.005 | 9.012 ± 0.319 | 5.030 ± 0.121 |
| | weight by relief | 0.924 ± 0.004 | 9.336 ± 0.242 | 5.296 ± 0.121 |
| | weight by PCA | 0.922 ± 0.006 | 9.413 ± 0.379 | 5.332 ± 0.196 |
| | weight by correlation | 0.921 ± 0.011 | 9.477 ± 0.659 | 5.334 ± 0.279 |
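The "weight by PCA" rows weight attributes through a principal component analysis. The plain-Python illustration below makes these assumptions:

- The weights are the absolute loadings of the first principal component of the standardized attributes, scaled so that the largest equals 1.
- The loadings are obtained by power iteration on the correlation matrix, starting from an all-ones vector.
- No attribute is constant, since a zero standard deviation cannot be standardized.

This section does not specify which component or scaling the operator used here relies on. Treat the code as a sketch of the general technique, not a reproduction of it.

```python
import math


def first_pc_weights(columns, iterations=200):
    """Absolute first-principal-component loadings of standardized attributes.

    columns: dict name -> equal-length list of floats (no constant columns).
    Returns weights scaled so the largest equals 1.0.
    """
    names = list(columns)
    n = len(columns[names[0]])
    z = []
    for name in names:  # z-score each attribute
        col = columns[name]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        z.append([(v - mean) / sd for v in col])
    p = len(names)
    # Correlation matrix of the attributes (covariance of the z-scores).
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]
    # Power iteration: converges to the leading eigenvector, provided the
    # start vector is not orthogonal to it.
    vec = [1.0] * p
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    largest = max(abs(v) for v in vec)
    return {name: abs(v) / largest for name, v in zip(names, vec)}


columns = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [3, 1, 4, 1, 5]}
print(first_pc_weights(columns))
```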
Attribute ¹ | Relative Importance | Scaled Importance
---|---|---|
range_ThermalConductivity | 47,722,904.0 | 1.000 |
wtd_gmean_ThermalConductivity | 10,336,861.0 | 0.217 |
range_atomic_radius | 3,051,781.3 | 0.064 |
range_atomic_mass | 2,503,977.0 | 0.052 |
range_fie | 2,469,144.3 | 0.052 |
wtd_range_fie | 1,768,628.4 | 0.037 |
wtd_mean_atomic_mass | 1,551,901.4 | 0.033 |
mean_Density | 1,533,498.8 | 0.032 |
gmean_atomic_radius | 1,522,213.5 | 0.032 |
wtd_range_atomic_radius | 1,455,983.8 | 0.031 |
wtd_mean_Density | 890,073.5 | 0.019 |
wtd_std_fie | 832,274.6 | 0.017 |
wtd_mean_atomic_radius | 832,100.6 | 0.017 |
mean_fie | 792,180.4 | 0.017 |
range_Density | 744,477.6 | 0.016 |
range_ElectronAffinity | 720,590.1 | 0.015 |
gmean_ThermalConductivity | 670,280.3 | 0.014 |
mean_atomic_mass | 664,245.5 | 0.014 |
gmean_atomic_mass | 412,535.3 | 0.009 |
wtd_gmean_Density | 344,775.4 | 0.007 |
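The Scaled Importance column is consistent with dividing each Relative Importance by the largest value in the table. For example, 10,336,861.0 / 47,722,904.0 ≈ 0.217 and 3,051,781.3 / 47,722,904.0 ≈ 0.064. The sketch below reproduces that normalization for the first three attributes. The function name is hypothetical, and this section does not describe how the relative importances themselves were computed.

```python
def scale_importances(relative):
    """Divide every relative importance by the largest one."""
    largest = max(relative.values())
    return {name: value / largest for name, value in relative.items()}


relative = {
    "range_ThermalConductivity": 47_722_904.0,
    "wtd_gmean_ThermalConductivity": 10_336_861.0,
    "range_atomic_radius": 3_051_781.3,
}
for name, value in scale_importances(relative).items():
    print(f"{name}: {value:.3f}")
```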
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Sizochenko, N.; Hofmann, M. Predictive Modeling of Critical Temperatures in Superconducting Materials. Molecules 2021, 26, 8. https://doi.org/10.3390/molecules26010008