Machine Learning Classification Workflow and Datasets for Ionospheric VLF Data Exclusion
Abstract
1. Summary
2. Data Description
3. Methods (Workflow Description)
| # | Model Name | PyCaret Abbreviation | Reference |
|---|---|---|---|
| 1 | Logistic Regression | lr | [14] |
| 2 | Ridge Classifier | ridge | [14] |
| 3 | Linear Discriminant Analysis | lda | [15] |
| 4 | Random Forest Classifier | rf | [3] |
| 5 | Naive Bayes | nb | [14] |
| 6 | Gradient Boosting Classifier | gbc | [16] |
| 7 | AdaBoost Classifier | ada | [17] |
| 8 | Extra Trees Classifier | et | [18] |
| 9 | Quadratic Discriminant Analysis | qda | [19] |
| 10 | Light Gradient Boosting Machine | lightgbm | [20] |
| 11 | K Neighbors Classifier | knn | [21] |
| 12 | Decision Tree Classifier | dt | [22] |
| 13 | Extreme Gradient Boosting | xgboost | [23] |
| 14 | Dummy Classifier | dummy | [15] |
| 15 | SVM (Linear Kernel) | svm | [14] |
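The abbreviations in the table are the model identifiers used by the PyCaret library [8]. As a minimal sketch of how the 15 classifiers could be compared in one pass, the snippet below sets up a PyCaret classification experiment and ranks the listed models by cross-validated F1 score. The file name `vlf_features.csv`, the target column `label`, and the `fix_imbalance` flag are illustrative assumptions, not the exact names or balancing procedure of the published datasets.

```python
# Sketch of a PyCaret model comparison over the classifiers listed above.
# Assumed placeholders: "vlf_features.csv" (feature table) and "label" (target column).
import pandas as pd
from pycaret.classification import setup, compare_models

# Load the tabular VLF feature data (placeholder path).
data = pd.read_csv("vlf_features.csv")

# Initialize the experiment. fix_imbalance=True resamples the minority class;
# it stands in here for whatever class-balancing step the workflow applies.
setup(data=data, target="label", session_id=42, fix_imbalance=True)

# PyCaret abbreviations from the table above.
model_ids = [
    "lr", "ridge", "lda", "rf", "nb", "gbc", "ada", "et",
    "qda", "lightgbm", "knn", "dt", "xgboost", "dummy", "svm",
]

# Cross-validated comparison of all 15 classifiers, ranked by F1 score.
best_model = compare_models(include=model_ids, sort="F1")
print(best_model)
```

Note that `xgboost` and `lightgbm` require the corresponding packages to be installed alongside PyCaret; the remaining models rely on scikit-learn [15].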
4. User Notes
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. McRae, W.M.; Thomson, N.R. VLF Phase and Amplitude: Daytime Ionospheric Parameters. J. Atmos. Sol.-Terr. Phys. 2000, 62, 609–618.
2. Šulić, D.M.; Srećković, V.A.; Mihajlov, A.A. A Study of VLF Signals Variations Associated with the Changes of Ionization Level in the D-Region in Consequence of Solar Conditions. Adv. Space Res. 2016, 57, 1029–1043.
3. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
4. Arnaut, F.; Kolarski, A.; Srećković, V.A. Random Forest Classification and Ionospheric Response to Solar Flares: Analysis and Validation. Universe 2023, 9, 436.
5. Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J. Random Forests for Classification in Ecology. Ecology 2007, 88, 2783–2792.
6. Hatwell, J.; Gaber, M.M.; Azad, R.M.A. CHIRPS: Explaining Random Forest Classification. Artif. Intell. Rev. 2020, 53, 5747–5788.
7. Bartz-Beielstein, T.; Chandrasekaran, S.; Rehbach, F.; Zaefferer, M. Case Study I: Tuning Random Forest (Ranger). In Hyperparameter Tuning for Machine and Deep Learning with R; Springer Nature: Berlin/Heidelberg, Germany, 2023; pp. 187–220.
8. Ali, M. PyCaret: An Open Source, Low-Code Machine Learning Library in Python. PyCaret Version 1.0.0. 2020. Available online: https://www.pycaret.org (accessed on 1 October 2023).
9. Batista, G.E.A.P.A.; Prati, R.C.; Monard, M.C. A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data. ACM SIGKDD Explor. Newsl. 2004, 6, 20–29.
10. Hasanin, T.; Khoshgoftaar, T. The Effects of Random Undersampling with Simulated Class Imbalance for Big Data. In Proceedings of the 2018 IEEE International Conference on Information Reuse and Integration (IRI), Salt Lake City, UT, USA, 6–9 July 2018.
11. Saripuddin, M.; Suliman, A.; Syarmila Sameon, S.; Jorgensen, B.N. Random Undersampling on Imbalance Time Series Data for Anomaly Detection. In Proceedings of the 4th International Conference on Machine Learning and Machine Intelligence, Hangzhou, China, 17–19 September 2021.
12. Hossin, M.; Sulaiman, M.N. A Review on Evaluation Metrics for Data Classification Evaluations. Int. J. Data Min. Knowl. Manag. Process 2015, 5, 1–11.
13. Joshi, M.V. On Evaluating Performance of Classifiers for Rare Classes. In Proceedings of the 2002 IEEE International Conference on Data Mining, Maebashi City, Japan, 9–12 December 2002.
14. Bonaccorso, G. Machine Learning Algorithms; Packt Publishing Ltd.: Birmingham, UK, 2017; pp. 1–368.
15. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
16. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
17. Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
18. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely Randomized Trees. Mach. Learn. 2006, 63, 3–42.
19. Friedman, J.H. Regularized Discriminant Analysis. J. Am. Stat. Assoc. 1989, 84, 165–175.
20. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2017; Volume 30.
21. Cover, T.; Hart, P. Nearest Neighbor Pattern Classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
22. Quinlan, J.R. Induction of Decision Trees. Mach. Learn. 1986, 1, 81–106.
23. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. Available online: https://arxiv.org/abs/1603.02754v3 (accessed on 14 October 2023).
24. Hapgood, M. Societal and Economic Importance of Space Weather. In Machine Learning Techniques for Space Weather; Elsevier: Amsterdam, The Netherlands, 2018; pp. 3–26.
25. Kolarski, A.; Srećković, V.A.; Arnaut, F. Low Intensity Solar Flares' Impact: Numerical Modeling. Contrib. Astron. Obs. Skaln. Pleso 2023, 53, 176–187.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).