A Comparison of Feature Selection and Forecasting Machine Learning Algorithms for Predicting Glycaemia in Type 1 Diabetes Mellitus
Abstract
1. Introduction
- A brief literature review on variable selection and prediction methods for glycaemia values in diabetics.
- To use an innovative database in the field of DM1, both in terms of the number of patients/variables considered and the monitoring time covered.
- To test different variable selection techniques.
- To combine these feature selection techniques with different predictive algorithms.
- To discuss the influence of the variable selection techniques on the performance of the predictive algorithms, as well as to study the accuracy achieved.
2. Related Works
3. Feature Selection and Forecasting Time Series
3.1. Feature Selection Techniques
3.2. Forecasting
4. Database, Available Features and Target to Be Forecasted
4.1. Description of the Experiment
4.2. Available Features and Targets to Be Forecasted
- Glycaemia: A collection of previous measurements.
- Insulin injections: Previous values of fast-acting insulin doses. For diabetic patients, this exogenously administered hormone is the primary controller of how far blood glucose levels will fall.
- Meals: Previous values, as with insulin. It is noteworthy that all patients in the cohort were experienced in counting their carbohydrates. Food consumed by humans is converted and absorbed in the form of glucose, which is then released into the bloodstream, causing a rapid rise in glycaemia.
- Exercise: Relevant historical data, measured in steps taken. The muscles demand more glucose during physical activity; exercise also enhances blood circulation and makes insulin more effective, since the cellular barriers become more permeable and glucose has easier access to the cells.
- Heart rate: Contemporary and past values. The heart rate can increase for a wide variety of reasons. It clearly rises during physical activity, but stress can be a contributor, as can hypo- or hyperglycaemia.
- Sleep: We collected data that only indicated whether the subject was awake or asleep. It seems logical to relate this variable to the length of sleep previously obtained. Poor-quality nighttime sleep may cause insulin resistance and imbalances in glucose dynamics.
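These per-variable histories enter the models as lagged inputs (cf. Section 5.3). As a minimal sketch of that step (the variable, its values, and the lag count below are invented for illustration and not taken from the study's data):

```python
# Sketch: turning a univariate diary (here, glycaemia readings) into
# lagged training rows. Real rows would also carry lagged insulin,
# carbohydrate, step-count, heart-rate and sleep values.

def make_lagged(series, n_lags):
    """Return rows [x[t-n_lags], ..., x[t-1]] paired with target x[t]."""
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        rows.append(series[t - n_lags:t])
        targets.append(series[t])
    return rows, targets

glucose = [110, 118, 130, 142, 150, 147, 139]  # mg/dL, one value per sampling step
X, y = make_lagged(glucose, n_lags=3)
# The first training row uses the three previous readings to predict the next:
# X[0] == [110, 118, 130], y[0] == 142
```

The same transformation, applied per variable and concatenated across variables, yields the multivariate lagged feature set the selection techniques operate on.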
5. Methodology
5.1. The Waikato Environment for Knowledge Analysis (WEKA)
5.2. Computer Hardware
5.3. Data Cleaning, Regularization and Lagged Variables
5.4. Feature Selection
5.4.1. Search Method
5.4.2. Attribute Evaluators
- Wrapper methods: Employing the ClassifierAttributeEval routine within WEKA permits the evaluation of certain approaches. The predictors below will be executed.
  - Linear Regression: This allows for swift computation, with coefficients being fixed for all features.
  - Random Forest: As previously stated, this is a tree-based algorithm frequently employed for classification.
  - Multilayer Perceptron (MLP): This algorithm estimates the relative contributions of the input units (which represent the attributes) to the output neurons (those which correspond to the problem classes) and uses this information to identify a subset of pertinent attributes to be employed in supervised pattern classification [63].
  - Instance-Based k-nearest neighbour algorithm (IBk) [64]: This is a k-nearest neighbour classifier that selects a suitable value of k by cross-validation; it can also perform distance weighting.
- Filter methods: For univariate methods, we will employ the evaluators listed below.
  - Relief Attribute (Rlf) [65]: Relief feature selection creates a score for each feature by identifying feature value differences between nearest-neighbour instance pairs.
  - Principal Component Analysis (PCA) [66]: This method introduces a new set of orthogonal coordinate axes that maximize the variance of the sample data. Directions with smaller variance thus carry less significance and can be removed from the dataset. PCA is extremely effective at transforming data into lower dimensions and can also reveal simplified underlying data patterns.
5.4.3. Generated Subsets
5.5. Data Modeling and Forecasting
- Linear Regression (LR)
- Support Vector Machines (SVM)
- Random Forest (RF)
- Gaussian Process (GP)
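Of these four predictors, linear regression is the simplest. A minimal ordinary-least-squares sketch on a single lagged-glucose input (the numbers are invented, perfectly linear toy data; the study fits richer multivariate models in WEKA) shows the idea:

```python
# Ordinary least squares on one feature: previous glycaemia reading
# -> glycaemia a fixed interval later (mg/dL, toy values).

def fit_ols(xs, ys):
    """Return (slope, intercept) minimizing squared error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

prev = [100, 120, 140, 160]
later = [105, 125, 145, 165]
slope, intercept = fit_ols(prev, later)

def predict(x):
    return slope * x + intercept
# On this perfectly linear toy data: predict(130) == 135.0
```

SVM, RF, and GP replace this closed-form fit with, respectively, a kernelized margin-based regression, an ensemble of decision trees, and a Bayesian kernel model that also yields predictive uncertainty.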
6. Results and Discussion: Forecasting Performance
7. Conclusions and Future Works
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
References
- Fowler, M.J. Diabetes: Magnitude and Mechanisms. Clin. Diabetes 2007, 25, 25–28. [Google Scholar] [CrossRef] [Green Version]
- DeWitt, D.E.; Hirsch, I.B. Outpatient insulin therapy in type 1 and type 2 diabetes mellitus: Scientific review. JAMA 2003, 289, 2254–2264. [Google Scholar] [CrossRef]
- Davidson, M.B.; Davidson, M.B. Diabetes Mellitus: Diagnosis and Treatment; Saunders: Philadelphia, PA, USA, 1998. [Google Scholar]
- Sherr, J.L.; Tauschmann, M.; Battelino, T.; De Bock, M.; Forlenza, G.; Roman, R.; Hood, K.; Maahs, D.M. ISPAD Clinical Practice Consensus Guidelines 2018: Diabetes technologies. Pediatr. Diabetes 2018, 19, 302–325. [Google Scholar] [CrossRef]
- Westman, E.C.; Tondt, J.; Maguire, E.; Yancy, W.S., Jr. Implementing a low-carbohydrate, ketogenic diet to manage type 2 diabetes mellitus. Expert Rev. Endocrinol. Metab. 2018, 13, 263–272. [Google Scholar] [CrossRef] [PubMed]
- Kowalski, A. Can We Really Close the Loop and How Soon? Accelerating the Availability of an Artificial Pancreas: A Roadmap to Better Diabetes Outcomes. Diabetes Technol. Ther. 2009, 11, S113. [Google Scholar] [CrossRef]
- Nguyen, B.P.; Ho, Y.; Wu, Z.; Chui, C.-K. Implementation of model predictive control with modified minimal model on low-power RISC microcontrollers. In Proceedings of the Third Symposium on Virtual Reality Modeling Language-VRML, Monterey, CA, USA, 16–19 February 2012. [Google Scholar] [CrossRef]
- Chui, C.-K.; Nguyen, B.P.; Ho, Y.; Wu, Z.; Nguyen, M.; Hong, G.S.; Mok, D.; Sun, S.; Chang, S. Embedded Real-Time Model Predictive Control for Glucose Regulation. In XXVI Brazilian Congress on Biomedical Engineering; Springer Nature: Berlin, Germany, 2013; Volume 39, pp. 1437–1440. [Google Scholar]
- Eskaf, E.K.; Badawi, O.; Ritchings, T. Predicting blood glucose levels in diabetics using feature extraction and Artificial Neural Networks. In Proceedings of the 2008 3rd International Conference on Information and Communication Technologies: From Theory to Applications, Damascus, Syria, 7–11 April 2008; pp. 1–6. [Google Scholar]
- Cai, J.; Luo, J.; Wang, S.; Yang, S. Feature selection in machine learning: A new perspective. Neurocomputing 2018, 300, 70–79. [Google Scholar] [CrossRef]
- Kohavi, R.; John, G.H. Wrappers for feature subset selection. Artif. Intell. 1997, 97, 273–324. [Google Scholar] [CrossRef] [Green Version]
- Balakrishnan, S.; Narayanaswamy, R.; Savarimuthu, N.; Samikannu, R. SVM ranking with backward search for feature selection in type II diabetes databases. In Proceedings of the 2008 IEEE International Conference on Systems, Man and Cybernetics, Singapore, 12–15 October 2008; pp. 2628–2633. [Google Scholar]
- Tomar, D.; Agarwal, S. Hybrid Feature Selection Based Weighted Least Squares Twin Support Vector Machine Approach for Diagnosing Breast Cancer, Hepatitis, and Diabetes. Adv. Artif. Neural Syst. 2015, 2015, 1–10. [Google Scholar] [CrossRef] [Green Version]
- Rodríguez-Rodríguez, I.; Rodríguez, J.-V.; Zamora-Izquierdo, M. Variables to Be Monitored via Biomedical Sensors for Complete Type 1 Diabetes Mellitus Management: An Extension of the “On-Board” Concept. J. Diabetes Res. 2018, 2018, 1–14. [Google Scholar] [CrossRef] [PubMed]
- Rodríguez-Rodríguez, I.; Rodríguez, J.-V.; González-Vidal, A.; Zamora, M.; Rodríguez, R.; Vidal, G. Feature Selection for Blood Glucose Level Prediction in Type 1 Diabetes Mellitus by Using the Sequential Input Selection Algorithm (SISAL). Symmetry 2019, 11, 1164. [Google Scholar] [CrossRef] [Green Version]
- Rodríguez-Rodríguez, I.; Chatzigiannakis, I.; Rodríguez, J.-V.; Maranghi, M.; Gentili, M.; Zamora-Izquierdo, M. Utility of Big Data in Predicting Short-Term Blood Glucose Levels in Type 1 Diabetes Mellitus Through Machine Learning Techniques. Sensors 2019, 19, 4482. [Google Scholar] [CrossRef] [Green Version]
- Rodríguez-Rodríguez, I.; Rodríguez, J.V.; Molina-García-Pardo, J.M.; Zamora-Izquierdo, M.Á.; Martínez-Inglés, M.T. A Comparison of Different Models of Glycemia Dynamics for Improved Type 1 Diabetes Mellitus Management with Advanced Intelligent Analysis in an Internet of Things Context. Appl. Sci. 2020, 10, 4381. [Google Scholar] [CrossRef]
- Xie, J.; Wang, Q. Benchmarking Machine Learning Algorithms on Blood Glucose Prediction for Type I Diabetes in Comparison with Classical Time-Series Models. IEEE Trans. Biomed. Eng. 2020, 67, 3101–3124. [Google Scholar] [CrossRef]
- Sun, S.; Zhang, G.; Wang, C.; Zeng, W.; Li, J.; Grosse, R. Differentiable compositional kernel learning for Gaussian processes. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 4828–4837. [Google Scholar]
- Ortmann, L.; Shi, D.; Dassau, E.; Doyle, F.J.; Leonhardt, S.; Misgeld, B.J. Gaussian process-based model predictive control of blood glucose for patients with type 1 diabetes mellitus. In Proceedings of the 2017 11th Asian Control Conference (ASCC), Gold Coast, QLD, Australia, 17–20 December 2017. [Google Scholar]
- Ortmann, L.; Shi, D.; Dassau, E.; Doyle, F.J.; Misgeld, B.J.; Leonhardt, S. Automated Insulin Delivery for Type 1 Diabetes Mellitus Patients using Gaussian Process-based Model Predictive Control. In Proceedings of the 2019 American Control Conference (ACC), Philadelphia, PA, USA, 10–12 July 2019. [Google Scholar]
- Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning, 1st ed.; The MIT Press: Cambridge, MA, USA, 2006; pp. 33–77. [Google Scholar]
- Sage, A.J.; Genschel, U.; Nettleton, D. Tree aggregation for random forest class probability estimation. Stat. Anal. Data Min. 2020, 13, 134–150. [Google Scholar] [CrossRef]
- Xu, W.; Zhang, J.; Zhang, Q.; Wei, X. Risk prediction of type II diabetes based on random forest model. In Proceedings of the 2017 Third International Conference on Advances in Electrical, Electronics, Information, Communication and Bio-Informatics (AEEICB), Chennai, India, 27–28 February 2017; pp. 382–386. [Google Scholar]
- Marling, C.; Xia, L.; Bunescu, R.; Schwartz, F. Machine Learning Experiments with Noninvasive Sensors for Hypoglycemia Detection. In Proceedings of the IJCAI Workshop on Knowledge Discovery in Healthcare Data, New York, NY, USA, 19–24 June 2016. [Google Scholar]
- Rodríguez-Rodríguez, I.; Zamora, M.Á.; Rodríguez, J.V. On predicting glycaemia in type 1 diabetes mellitus patients by using support vector machines. In Proceedings of the 1st International Conference on Internet of Things and Machine Learning, Liverpool, UK, 17–18 October 2017; pp. 1–2. [Google Scholar]
- Izonin, I.; Tkachenko, R.; Verhun, V.; Zub, K. An approach towards missing data management using improved GRNN-SGTM ensemble method. Eng. Sci. Technol. Int. J. 2020, in press. [Google Scholar] [CrossRef]
- Tkachenko, R.; Izonin, I.; Kryvinska, N.; Dronyuk, I.; Zub, K. An Approach towards Increasing Prediction Accuracy for the Recovery of Missing IoT Data based on the GRNN-SGTM Ensemble. Sensors 2020, 20, 2625. [Google Scholar] [CrossRef]
- Izonin, I.; Tkachenko, R.; Vitynskyi, P.; Zub, K.; Tkachenko, P.; Dronyuk, I. Stacking-based GRNN-SGTM Ensemble Model for Prediction Tasks. In Proceedings of the 2020 International Conference on Decision Aid Sciences and Application (DASA), Zallaq, Bahrain, 8–9 November 2020; pp. 326–330. [Google Scholar]
- Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182. [Google Scholar]
- Sheikhpour, R.; Sarram, M.A.; Gharaghani, S.; Chahooki, M.A.Z. A Survey on semi-supervised feature selection methods. Pattern Recognit. 2017, 64, 141–158. [Google Scholar] [CrossRef]
- Hastie, T.; Tibshirani, R.; Tibshirani, R.J. Extended comparisons of best subset selection, forward stepwise selection, and the lasso. arXiv 2017, arXiv:1707.08692. [Google Scholar]
- Rodríguez-Rodríguez, I.; Rodríguez, J.V.; Pardo-Quiles, D.J.; Heras-González, P.; Chatzigiannakis, I. Modeling and Forecasting Gender-Based Violence through Machine Learning Techniques. Appl. Sci. 2020, 10, 8244. [Google Scholar]
- Karegowda, A.G.; Manjunath, A.S.; Jayaram, M.A. Feature Subset Selection Problem using Wrapper Approach in Supervised Learning. Int. J. Comput. Appl. 2010, 1, 13–17. [Google Scholar] [CrossRef]
- Yang, K.; Yoon, H.; Shahabi, C. A supervised feature subset selection technique for multivariate time series. In Proceedings of the Workshop on Feature Selection for Data Mining: Interfacing Machine Learning with Statistics, New Port Beach, CA, USA, 23 April 2005; pp. 92–101. [Google Scholar]
- Crone, S.F.; Kourentzes, N. Feature selection for time series prediction—A combined filter and wrapper approach for neural networks. Neurocomputing 2010, 73, 1923–1936. [Google Scholar] [CrossRef] [Green Version]
- Sánchez-Maroño, N.; Alonso-Betanzos, A.; Tombilla-Sanromán, M. Filter Methods for Feature Selection—A Comparative Study. In Proceedings of the International Conference on Intelligent Data Engineering and Automated Learning, Guilin, China, 30 October–1 November 2017; pp. 178–187. [Google Scholar]
- Fonti, V.; Belitser, E. Feature Selection Using Lasso. VU Amst. Res. Pap. Bus. Anal. 2017, 30, 1–25. [Google Scholar]
- Zhang, H.; Zhang, R.; Nie, F.; Li, X. A Generalized Uncorrelated Ridge Regression with Nonnegative Labels for Unsupervised Feature Selection. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 2781–2785. [Google Scholar]
- Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. A review of feature selection methods on synthetic data. Knowl. Inf. Syst. 2012, 34, 483–519. [Google Scholar] [CrossRef]
- Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. Distributed feature selection: An application to microarray data classification. Appl. Soft Comput. 2015, 30, 136–150. [Google Scholar] [CrossRef]
- Wolpert, D.; Macready, W. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1997, 1, 67–82. [Google Scholar] [CrossRef] [Green Version]
- Shmueli, G.; Lichtendahl, K.C., Jr. Practical Time Series Forecasting with r: A Hands-on Guide; Axelrod Schnall Publishers: Green Cove Springs, FL, USA, 2016. [Google Scholar]
- Faloutsos, C.; Gasthaus, J.; Januschowski, T.; Wang, Y. Forecasting big time series: Old and new. Proc. VLDB Endow. 2018, 11, 2102–2105. [Google Scholar] [CrossRef]
- Kalekar, P.S. Time Series Forecasting Using Holt-Winters Exponential Smoothing; Kanwal Rekhi School of Information Technology: Powai, Mumbai, 2004; pp. 1–13. [Google Scholar]
- Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
- Schölkopf, B.; Smola, A.J. A short introduction to learning with kernels. In Advanced Lectures on Machine Learning; Springer: Berlin/Heidelberg, Germany, 2003; pp. 41–64. [Google Scholar]
- Kuhn, M.; Johnson, K. Applied Predictive Modeling, 1st ed.; Springer: New York, NY, USA, 2013; ISBN 978-1-4614-6848-6. [Google Scholar]
- Fierrez, J.; Morales, A.; Vera-Rodriguez, R.; Camacho, D. Multiple classifiers in biometrics. part 1: Fundamentals and review. Inf. Fusion 2018, 44, 57–64. [Google Scholar] [CrossRef]
- Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22. [Google Scholar]
- Oshiro, T.M.; Perez, P.S.; Baranauskas, J.A. How Many Trees in A Random Forest? In International Workshop on Machine Learning and Data Mining in Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2012; pp. 154–168. [Google Scholar]
- Blomqvist, K.; Kaski, S.; Heinonen, M. Deep Convolutional Gaussian Processes. In Proceedings of the Mining Data for Financial Applications, Ghent, Belgium, 14–18 September 2020; pp. 582–597. [Google Scholar]
- Rodríguez-Rodríguez, I.; Rodríguez, J.V.; Chatzigiannakis, I.; Zamora Izquierdo, M.Á. On the Possibility of Predicting Glycaemia ‘On the Fly’ with Constrained IoT Devices in Type 1 Diabetes Mellitus Patients. Sensors 2019, 19, 4538. [Google Scholar] [CrossRef] [Green Version]
- Seeger, M. Gaussian processes for machine learning. Int. J. Neural Syst. 2004, 14, 69–106. [Google Scholar] [PubMed] [Green Version]
- Whelan, M.E.; Orme, M.; Kingsnorth, A.P.; Sherar, L.B.; Denton, F.L.; Esliger, D.W. Examining the Use of Glucose and Physical Activity Self-Monitoring Technologies in Individuals at Moderate to High Risk of Developing Type 2 Diabetes: Randomized Trial. JMIR Mhealth Uhealth 2019, 7, e14195. [Google Scholar] [CrossRef] [Green Version]
- Bondia, J.; Vehi, J. Physiology-Based Interval Models: A Framework for Glucose Prediction Under Intra-patient Variability. In Advances in Bioprocess Engineering and Technology; Springer Nature: Berlin, Germany, 2015; pp. 159–181. [Google Scholar]
- Garg, S.K.; Weinzimer, S.A.; Tamborlane, W.V.; Buckingham, B.A.; Bode, B.W.; Bailey, T.S.; Brazg, R.L.; Ilany, J.; Slover, R.H.; Anderson, S.M.; et al. Glucose Outcomes with the In-Home Use of a Hybrid Closed-Loop Insulin Delivery System in Adolescents and Adults with Type 1 Diabetes. Diabetes Technol. Ther. 2017, 19, 155–163. [Google Scholar] [CrossRef]
- Hussain, S.; Dahan, N.A.; Ba-Alwi, F.M.; Ribata, N. Educational Data Mining and Analysis of Students’ Academic Performance Using WEKA. Indones. J. Electr. Eng. Comput. Sci. 2018, 9, 447–459. [Google Scholar] [CrossRef]
- Kiranmai, S.A.; Laxmi, A.J. Data mining for classification of power quality problems using WEKA and the effect of attributes on classification accuracy. Prot. Control. Mod. Power Syst. 2018, 3, 29. [Google Scholar] [CrossRef]
- Lang, S.; Bravo-Marquez, F.; Beckham, C.; Hall, M.; Frank, E. WekaDeeplearning4j: A deep learning package for Weka based on Deeplearning4j. Knowl.-Based Syst. 2019, 178, 48–50. [Google Scholar] [CrossRef]
- Kotthoff, L.; Thornton, C.; Hoos, H.H.; Hutter, F.; Leyton-Brown, K. Auto-WEKA: Automatic Model Selection and Hyperparameter Optimization in WEKA. In Automated Machine Learning: Methods, Systems, Challenges; Hutter, F., Kotthoff, L., Vanschoren, J., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 81–95. ISBN 978-3-030-05318-5. [Google Scholar]
- Novakovic, J.; Strbac, P.; Bulatovic, D. Toward optimal feature selection using ranking methods and classification algorithms. Yugosl. J. Oper. Res. 2011, 21, 119–135. [Google Scholar] [CrossRef]
- Gasca, E.; Sánchez, J.; Alonso, R. Eliminating redundancy and irrelevance using a new MLP-based feature selection method. Pattern Recognit. 2006, 39, 313–315. [Google Scholar] [CrossRef]
- Aha, D.W.; Kibler, D.; Albert, M.K. Instance-based learning algorithms. Mach. Learn. 1991, 6, 37–66. [Google Scholar]
- Kononenko, I. Estimating Attributes: Analysis and Extensions of RELIEF. In Proceedings of the European Conference on Machine Learning, Catania, Italy, 6–8 April 1994; pp. 171–182. [Google Scholar]
- Abdi, H.; Williams, L.J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 433–459. [Google Scholar] [CrossRef]
- Bergmeir, C.; Benítez, J.M. On the use of cross-validation for time series predictor evaluation. Inf. Sci. 2012, 191, 192–213. [Google Scholar] [CrossRef]
- Snijders, T.A.B. On Cross-Validation for Predictor Evaluation in Time Series. In Lecture Notes in Economics and Mathematical Systems; Springer Nature: Berlin, Germany, 1988; Volume 307, pp. 56–69. [Google Scholar]
- Frank, E.; Hall, M.A.; Holmes, G.; Kirkby, R.B.; Pfahringer, B.; Witten, I.H.; Trigg, L. Weka-A Machine Learning Workbench for Data Mining. In Data Mining and Knowledge Discovery Handbook; Springer: Boston, MA, USA, 2009; pp. 1269–1277. [Google Scholar]
- Nguyen, B.P.; Tay, W.-L.; Chui, C.-K. Robust Biometric Recognition from Palm Depth Images for Gloved Hands. IEEE Trans. Hum.-Mach. Syst. 2015, 45, 799–804. [Google Scholar] [CrossRef]
- Dubosson, F.; Ranvier, J.-E.; Bromuri, S.; Calbimonte, J.-P.; Ruiz, J.; Schumacher, M. The open D1NAMO dataset: A multi-modal dataset for research on non-invasive type 1 diabetes management. Inform. Med. Unlocked 2018, 13, 92–100. [Google Scholar] [CrossRef]
- Woo, W.L.; Koh, B.H.; Gao, B.; Nwoye, E.O.; Wei, B.; Dlay, S.S. Early Warning of Health Condition and Visual Analytics for Multivariable Vital Signs. In Proceedings of the 2020 International Conference on Computing, Networks and Internet of Things, Sanya, China, 24–26 April 2020; pp. 206–211. [Google Scholar]
Population Feature | Value
---|---
Subjects (number) | 25
Sex | 14 men, 11 women
Occupation | 16 students, 9 office workers

Population Feature | Median | Min | Max
---|---|---|---
Age (years) | 24.51 | 18 | 56
Body Mass Index (BMI, kg/m²) | 22.20 | 19.42 | 24.80
Duration of diabetes (years) | 9 | 5 | 29
Fingersticks per day | 7 | 5 | 12
Insulin units per day (fast insulin + slow insulin) | 47 | 36 | 59
HbA1c (%) | 6.8 | 6.3 | 7.8
Search Method | Attribute Evaluator | Predictor | Acronym
---|---|---|---
Ranker | Wrapper (Classifier) | LR | Rnk-LR
Ranker | Wrapper (Classifier) | RF | Rnk-RF
Ranker | Wrapper (Classifier) | MLP | Rnk-MLP
Ranker | Wrapper (Classifier) | IBk | Rnk-IBk
Ranker | Filter | Relief | Rnk-Rlf
Ranker | Filter | PCA | Rnk-PCA
Technique | Command
---|---
Ranker | weka.attributeSelection.Ranker -T -1.8E308 -N -1
Classifier LR | "weka.attributeSelection.ClassifierAttributeEval -execution-slots 100 -B weka.classifiers.functions.LinearRegression -F 5 -T 0.01 -R 1 -E RMSE -- -S 0 -R 1.0E-8 -num-decimal-places 4" -S "weka.attributeSelection.Ranker -T -1.8E308 -N 100"
Classifier RF | "weka.attributeSelection.ClassifierAttributeEval -execution-slots 100 -B weka.classifiers.trees.RandomForest -F 5 -T 0.01 -R 1 -E RMSE -- -P 100 -I 100 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 1" -S "weka.attributeSelection.Ranker -T -1.8E308 -N 100"
Classifier MLP | "weka.attributeSelection.ClassifierAttributeEval -execution-slots 1 -B weka.classifiers.functions.MultilayerPerceptron -F 5 -T 0.01 -R 1 -E RMSE -- -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a" -S "weka.attributeSelection.Ranker -T -1.8E308 -N 100"
Classifier IBk | "weka.attributeSelection.ClassifierAttributeEval -execution-slots 1 -B weka.classifiers.lazy.IBk -F 5 -T 0.01 -R 1 -E RMSE -- -K 1 -W 0 -A \"weka.core.neighboursearch.LinearNNSearch -A \\\"weka.core.EuclideanDistance -R first-last\\\"\"" -S "weka.attributeSelection.Ranker -T -1.8E308 -N 100"
Rlf | "weka.attributeSelection.ReliefFAttributeEval -M -1 -D 1 -K 10" -S "weka.attributeSelection.Ranker -T -1.8E308 -N 100"
PCA | "weka.attributeSelection.PrincipalComponents -R 0.95 -A 5" -S "weka.attributeSelection.Ranker -T -1.8E308 -N -1"
Technique | Command
---|---
LR | weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8 -num-decimal-places 4
RF | weka.classifiers.trees.RandomForest -P 100 -I 100 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 1
SVM | weka.classifiers.functions.SMOreg -C 1.0 -N 0 -I "weka.classifiers.functions.supportVector.RegSMOImproved -T 0.001 -V -P 1.0E-12 -L 0.001 -W 1" -K "weka.classifiers.functions.supportVector.PolyKernel -E 1.0 -C 250007"
GP | weka.classifiers.functions.GaussianProcesses -L 1.0 -N 0 -K "weka.classifiers.functions.supportVector.PolyKernel -E 1.0 -C 250007" -S 1
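Both the SMOreg and GaussianProcesses commands specify WEKA's PolyKernel with exponent -E 1.0. Assuming the standard polynomial-kernel form K(x, y) = (x · y)^E without lower-order terms, that exponent reduces the kernel to a plain dot product, i.e. a linear kernel; a quick sketch:

```python
# PolyKernel as K(x, y) = (x . y)^E; with E = 1.0 this is just the
# dot product, so both kernelized models above are effectively linear.

def poly_kernel(x, y, exponent=1.0):
    dot = sum(a * b for a, b in zip(x, y))
    return dot ** exponent

x, y = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
# dot product = 1*4 + 2*5 + 3*6 = 32, and E = 1.0 leaves it unchanged
assert poly_kernel(x, y) == 32.0
```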
RMSE (mg/dL)

Forecasting technique: LR

Subset FS | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60 | Mean | Stnd Dev.
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
No F.S. | 9.32 | 15.15 | 18.94 | 22.72 | 26.23 | 29.30 | 31.26 | 33.33 | 35.48 | 37.69 | 40.03 | 42.55 | 28.50 | 10.32
LR | 9.00 | 14.17 | 16.84 | 20.07 | 21.79 | 23.52 | 25.41 | 27.36 | 29.29 | 31.24 | 33.25 | 35.33 | 23.94 | 7.98
RF | 9.29 | 14.53 | 17.16 | 20.26 | 21.82 | 23.38 | 25.08 | 26.84 | 28.58 | 30.32 | 32.09 | 33.91 | 23.60 | 7.41
MLP | 9.22 | 14.44 | 17.09 | 20.25 | 21.89 | 23.51 | 25.26 | 27.06 | 28.83 | 30.61 | 32.42 | 34.29 | 23.74 | 7.57
IBk | 9.51 | 14.88 | 17.57 | 20.69 | 22.26 | 23.83 | 25.54 | 27.31 | 29.07 | 30.84 | 32.63 | 34.48 | 24.05 | 7.50
Rlf | 9.75 | 15.68 | 18.91 | 22.47 | 24.31 | 26.05 | 27.91 | 29.81 | 31.64 | 33.39 | 35.07 | 36.71 | 25.98 | 8.19
PCA | 9.53 | 14.97 | 17.77 | 21.04 | 22.77 | 24.42 | 26.20 | 28.11 | 29.92 | 31.81 | 33.75 | 35.74 | 24.67 | 7.89

Average over all subsets (LR): 24.93

Forecasting technique: RF

Subset FS | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60 | Mean | Stnd Dev.
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
No F.S. | 13.17 | 20.87 | 24.78 | 28.65 | 31.03 | 31.78 | 32.40 | 32.92 | 33.33 | 33.66 | 33.92 | 34.15 | 29.22 | 6.50
LR | 9.75 | 14.96 | 17.89 | 20.71 | 22.37 | 22.84 | 23.20 | 23.43 | 23.61 | 23.70 | 23.76 | 23.80 | 20.84 | 4.45
RF | 7.91 | 13.21 | 16.22 | 19.08 | 19.74 | 20.21 | 20.57 | 20.83 | 21.01 | 21.16 | 21.26 | 21.33 | 18.54 | 4.14
MLP | 8.88 | 14.18 | 17.14 | 19.97 | 21.68 | 22.18 | 22.53 | 22.78 | 22.94 | 23.04 | 23.10 | 23.10 | 20.13 | 4.52
IBk | 7.95 | 13.29 | 16.27 | 19.12 | 20.81 | 21.30 | 21.87 | 22.15 | 22.35 | 22.50 | 22.58 | 22.59 | 19.40 | 4.63
Rlf | 12.39 | 17.42 | 20.38 | 23.25 | 25.06 | 25.74 | 26.27 | 26.72 | 27.10 | 27.40 | 27.64 | 27.82 | 23.93 | 4.84
PCA | 11.93 | 17.10 | 20.08 | 21.35 | 22.80 | 23.80 | 24.67 | 25.03 | 25.65 | 26.17 | 26.56 | 26.85 | 22.67 | 4.47

Average over all subsets (RF): 22.10

Forecasting technique: SVM

Subset FS | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60 | Mean | Stnd Dev.
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
No F.S. | 2.38 | 9.16 | 16.35 | 20.10 | 22.62 | 25.13 | 27.69 | 29.84 | 32.11 | 34.25 | 36.43 | 38.63 | 24.56 | 11.07
LR | 1.99 | 7.64 | 13.53 | 16.45 | 18.37 | 20.39 | 22.51 | 24.27 | 26.11 | 27.85 | 29.65 | 31.48 | 20.02 | 8.96
RF | 2.33 | 5.70 | 11.64 | 14.57 | 16.50 | 18.52 | 20.63 | 22.37 | 24.19 | 25.92 | 27.71 | 29.52 | 18.30 | 8.55
MLP | 0.99 | 6.65 | 12.56 | 15.51 | 17.48 | 19.54 | 21.69 | 23.47 | 25.32 | 27.09 | 28.93 | 30.78 | 19.17 | 9.06
IBk | 3.56 | 7.73 | 13.66 | 16.57 | 18.48 | 20.50 | 22.61 | 24.36 | 26.18 | 27.92 | 29.73 | 31.57 | 20.24 | 8.69
Rlf | 4.02 | 8.44 | 14.82 | 18.02 | 20.14 | 22.30 | 24.40 | 25.98 | 27.62 | 29.12 | 30.65 | 32.16 | 21.47 | 8.82
PCA | 3.26 | 7.77 | 13.74 | 16.67 | 18.62 | 20.64 | 22.76 | 24.52 | 26.35 | 28.09 | 29.89 | 31.71 | 20.33 | 8.79

Average over all subsets (SVM): 20.58

Forecasting technique: GP

Subset FS | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60 | Mean | Stnd Dev.
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
No F.S. | 15.96 | 26.16 | 37.01 | 43.79 | 45.80 | 47.29 | 47.62 | 47.98 | 48.31 | 48.66 | 49.03 | 49.43 | 42.25 | 10.68
LR | 12.08 | 25.82 | 31.17 | 33.37 | 34.34 | 34.79 | 35.02 | 35.15 | 35.23 | 35.28 | 35.32 | 35.35 | 31.91 | 6.83
RF | 5.32 | 17.40 | 22.37 | 24.49 | 25.43 | 25.86 | 26.07 | 26.18 | 26.24 | 26.28 | 26.30 | 26.32 | 23.19 | 6.20
MLP | 7.11 | 19.96 | 25.26 | 27.51 | 28.51 | 28.97 | 29.19 | 29.31 | 29.38 | 29.42 | 29.45 | 29.47 | 26.13 | 6.60
IBk | 10.30 | 23.40 | 28.52 | 30.64 | 31.57 | 32.01 | 32.24 | 32.36 | 32.43 | 32.47 | 32.50 | 32.52 | 29.25 | 6.53
Rlf | 7.84 | 24.73 | 32.56 | 36.34 | 38.26 | 39.28 | 39.84 | 40.17 | 40.38 | 40.51 | 40.60 | 40.66 | 35.10 | 9.78
PCA | 15.62 | 24.24 | 27.85 | 29.37 | 30.02 | 30.31 | 30.44 | 30.50 | 30.53 | 30.54 | 30.55 | 30.55 | 28.38 | 4.42

Average over all subsets (GP): 30.89
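The Mean and Stnd Dev. columns can be reproduced directly from the twelve RMSE values of each row (the standard deviation is the sample standard deviation); for example, the LR / No F.S. row:

```python
# Recomputing the summary columns for the LR table's "No F.S." row:
# Mean is the arithmetic mean of the twelve values, Stnd Dev. the
# sample standard deviation.
from statistics import mean, stdev

rmse = [9.32, 15.15, 18.94, 22.72, 26.23, 29.30,
        31.26, 33.33, 35.48, 37.69, 40.03, 42.55]

print(round(mean(rmse), 2))   # 28.5  (table: 28.50)
print(round(stdev(rmse), 2))  # 10.32
```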
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Citation: Rodríguez-Rodríguez, I.; Rodríguez, J.-V.; Woo, W.L.; Wei, B.; Pardo-Quiles, D.-J. A Comparison of Feature Selection and Forecasting Machine Learning Algorithms for Predicting Glycaemia in Type 1 Diabetes Mellitus. Appl. Sci. 2021, 11, 1742. https://doi.org/10.3390/app11041742