Machine Learning-Based Boosted Regression Ensemble Combined with Hyperparameter Tuning for Optimal Adaptive Learning
Abstract
1. Introduction
- We first present a detailed statistical analysis of the acquired throughput data, based on performance status reports at the different user equipment terminal locations across the tested communication distances from the transmitter.
- We benchmark the adaptive learning capacities of several key machine-learning-based regression models against the chosen regression model, the random forest.
- We propose an RF-LS-BPT regression model for improved dataset predictive modeling and learning.
- We apply the proposed RF-LS-BPT regression model to detailed, accurate throughput data modeling and learning, evaluated using different performance indicators.
2. Theoretical Background
2.1. Random Forest (RF)
2.2. Exploratory Data Analysis Procedure
2.3. The RF Regression Model and Least-Squares Boosting (LS-Boost)
3. The Proposed Machine-Learning-Based Boosted Regression Ensemble Combined with Hyperparameter Tuning
3.1. 5G Throughput Measurement Campaign
3.2. The Proposed RF-LS-BPT Process
- Load the throughput datasets into MATLAB.
- Examine the datasets to obtain relevant insights.
- Presence of correlated features.
- Missing values and outliers.
- Preprocess the datasets to handle the identified missing values and outliers.
- Transform the datasets into the RF-LS-BPT modeling format.
- Split each dataset into two portions: 0.7 for training and 0.3 for testing.
- Engage the default RF ensemble fitting tool in MATLAB for the data training and testing.
- Evaluate the default RF ensemble fitting through data training and testing.
- Choose an appropriate RF aggregation technique. LS-Boost was chosen here.
- Identify the most relevant RF hyperparameters.
- Determine optimal values of the RF hyperparameters using the optimization option in MATLAB (‘OptimizeHyperparameters’, ‘auto’), which is based on the Bayesian optimization process.
- Optimize the RF Regression ensemble results using the cross-validation process.
- Build the final RF-LS-BPT model, combining the LS-Boost algorithm with the tuned optimal RF hyperparameter values.
- Engage the resultant RF-LS-BPT model on the entire throughput quality datasets.
- Test the resultant RF-LS-BPT model using the 0.3 testing portion of the data and new data.
- Assess and report the predictive performance of the resulting RF-LS-BPT model.
Algorithm 1: RF regression with least-squares boosting (LS-Boost). |
Input: Training set; learning rate v and tree number A, obtained through bayesopt; loss function. Output: Regression model. Training process: End |
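A minimal from-scratch sketch of the LS-Boost training loop in Algorithm 1, assuming squared loss: each stage fits a regression tree to the residuals of the current ensemble and adds it scaled by the learning rate v. The defaults echo the Bayesian-search values reported later (v ≈ 0.29, A = 23 trees); the tree depth and toy data are assumptions:

```python
# From-scratch LS-Boost sketch: stage-wise fitting of trees to residuals.
# Defaults v=0.29, A=23 echo the tuned values reported in the results tables;
# max_depth and the toy data are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def ls_boost_fit(X, y, v=0.29, A=23, max_depth=3):
    """Least-squares boosting: v = learning rate, A = number of trees."""
    f0 = float(y.mean())               # initial constant model
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(A):
        residual = y - pred            # negative gradient of squared loss
        t = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        trees.append(t)
        pred = pred + v * t.predict(X)  # stage-wise additive update
    return f0, v, trees

def ls_boost_predict(model, X):
    f0, v, trees = model
    return f0 + v * sum(t.predict(X) for t in trees)

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 300)
model = ls_boost_fit(X, y)
mse = np.mean((ls_boost_predict(model, X) - y) ** 2)
print("training MSE: %.4f" % mse)
```

Storing v inside the returned model keeps fitting and prediction consistent if a different learning rate is chosen.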
3.3. Key Evaluation Metrics
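The results tables report MAE, NRMSE, and R². Plausible Python implementations are sketched below; the NRMSE normalization (here, by the observed range) is an assumption, as the paper may normalize by the mean or maximum instead:

```python
# Plausible implementations of the evaluation metrics used in the results
# tables. The NRMSE normalization (by the observed range) is an assumption.
import numpy as np

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

def nrmse(y, yhat):
    return rmse(y, yhat) / (y.max() - y.min())

def r_squared(y, yhat):
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

y = np.array([10.0, 20.0, 30.0, 40.0])
yhat = np.array([12.0, 18.0, 33.0, 41.0])
print(mae(y, yhat), rmse(y, yhat), nrmse(y, yhat), r_squared(y, yhat))
```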
4. Results and Discussion
4.1. Throughput Quality Status Analysis
4.2. Throughput Data Training and Testing Using Different Machine Learning Models with Their Default Parameters
4.3. Throughput Data Training and Testing Using Proposed RF-LS-BPT Model versus Standard RF Modeling Approach
4.4. Throughput Data Training and Testing Using LS-Boosting and Bagging
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
4/5/6G | Fourth/fifth/sixth generation |
AI | Artificial intelligence |
BOA | Bayesian optimization algorithm |
Bps | Bits per second |
COTS | Commercial off-the-shelf |
DT | Decision tree |
EDA | Exploratory data analysis |
GLS | Generalized least squares |
GN | Gauss-Newton |
GP | Gaussian process |
GPR | Gaussian process regression |
GS | Grid search |
KNN | K-nearest neighbor |
LM | Levenberg–Marquardt |
LTE | Long-term evolution |
LS | Least squares |
LS-Boost | Least-squares boosting |
MAE | Mean absolute error |
ML | Machine learning |
MLP-NN | Multilayer perceptron neural network |
NN | Neural network |
NRMSE | Normalized root mean squared error |
PE | Percentage error |
RF | Random forest |
RMSE | Root mean square error |
RS | Random search |
SVM | Support vector machine |
UET | User equipment terminal |
References
- Isabona, J. Joint Statistical and Machine Learning Approach for Practical Data-Driven Assessment of User Throughput Quality in Microcellular Radio Networks. Wirel. Pers. Commun. 2021, 119, 1661–1680.
- Imoize, A.L.; Orolu, K.; Atayero, A.A.-A. Analysis of key performance indicators of a 4G LTE network based on experimental data obtained from a densely populated smart city. Data Brief 2020, 29, 105304.
- Singh, S.K.; Cha, J.; Kim, T.W.; Park, J.H. Machine learning based distributed big data analysis framework for next generation web in IoT. Comput. Sci. Inf. Syst. 2021, 18, 597–618.
- Singh, S.K.; Salim, M.M.; Cha, J.; Pan, Y.; Park, J.H. Machine learning-based network sub-slicing framework in a sustainable 5G environment. Sustainability 2020, 12, 6250.
- Shin, Z.; Moon, J.; Rho, S. A Comparative Analysis of Ensemble Learning-Based Classification Models for Explainable Term Deposit Subscription Forecasting. J. Soc. e-Bus. Stud. 2021, 26, 97–117.
- Oshiro, T.M.; Perez, P.S.; Baranauskas, J.A. How many trees in a random forest. In International Workshop on Machine Learning and Data Mining in Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2012; pp. 154–168.
- Probst, P.; Boulesteix, A.-L. To tune or not to tune the number of trees in random forest. J. Mach. Learn. Res. 2017, 18, 6673–6690.
- Han, S.; Kim, H. Optimal feature set size in random forest regression. Appl. Sci. 2021, 11, 3428.
- Han, S.; Kim, H.; Lee, Y.-S. Double random forest. Mach. Learn. 2020, 109, 1569–1586.
- Gao, X.; Wen, J.; Zhang, C. An improved random forest algorithm for predicting employee turnover. Math. Probl. Eng. 2019, 2019, 4140707.
- Malek, S.; Gunalan, R.; Kedija, S.; Lau, C.; Mosleh, M.A.; Milow, P.; Lee, S.; Saw, A. Random forest and Self Organizing Maps application for analysis of pediatric fracture healing time of the lower limb. Neurocomputing 2018, 272, 55–62.
- Gomes, H.M.; Bifet, A.; Read, J.; Barddal, J.P.; Enembreck, F.; Pfahringer, B.; Holmes, G.; Abdessalem, T. Adaptive random forests for evolving data stream classification. Mach. Learn. 2017, 106, 1469–1495.
- Bernard, S.; Heutte, L.; Adam, S. Influence of hyperparameters on random forest accuracy. In International Workshop on Multiple Classifier Systems; Springer: Berlin/Heidelberg, Germany, 2009; pp. 171–180.
- Goldstein, B.A.; Polley, E.C.; Briggs, F.B.S. Random forests for genetic association studies. Stat. Appl. Genet. Mol. Biol. 2011, 10, 32.
- Han, S.; Kim, H. On the optimal size of candidate feature set in random forest. Appl. Sci. 2019, 9, 898.
- Ajani, T.S.; Imoize, A.L.; Atayero, A.A. An Overview of Machine Learning within Embedded and Mobile Devices—Optimizations and Applications. Sensors 2021, 21, 4412.
- Kumar, L.J.S.; Krishnan, P.; Shreya, B.; Sudhakar, M.S. Performance enhancement of FSO communication system using machine learning for 5G/6G and IoT applications. Optik 2022, 252, 168430.
- Tanveer, J.; Haider, A.; Ali, R.; Kim, A. An Overview of Reinforcement Learning Algorithms for Handover Management in 5G Ultra-Dense Small Cell Networks. Appl. Sci. 2022, 12, 426.
- Mehlhose, M.; Schäufele, D.; Awan, D.A.; Marcus, G.; Binder, N.; Kasparick, M.; Cavalcante, R.L.G.; Stańczak, S.; Keller, A. Real-Time GPU-Accelerated Machine Learning Based Multiuser Detection for 5G and Beyond. arXiv 2022, arXiv:2201.05024.
- Kavitha, K.N.; Ashok, S.; Imoize, A.L.; Ojo, S.; Selvan, K.S.; Ahanger, T.A.; Alhassan, M. On the Use of Wavelet Domain and Machine Learning for the Analysis of Epileptic Seizure Detection from EEG Signals. J. Healthc. Eng. 2022, 2022, 8928021.
- Rehman, E.; Haseeb-Ud-Din, M.; Malik, A.J.; Khan, T.K.; Abbasi, A.A.; Kadry, S.; Khan, M.A.; Rho, S. Intrusion detection based on machine learning in the internet of things, attacks and counter measures. J. Supercomput. 2022, 78, 8890–8924.
- Talebi, H.; Peeters, L.J.M.; Otto, A.; Tolosana-Delgado, R. A truly spatial Random Forests algorithm for geoscience data analysis and modelling. Math. Geosci. 2022, 54, 1–22.
- Peng, W.; Coleman, T.; Mentch, L. Rates of convergence for random forests via generalized U-statistics. Electron. J. Stat. 2022, 16, 232–292.
- Kabudi, T.; Pappas, I.; Olsen, D.H. AI-enabled adaptive learning systems: A systematic mapping of the literature. Comput. Educ. Artif. Intell. 2021, 2, 100017.
- Yedida, R.; Saha, S.; Prashanth, T. Lipschitzlr: Using theoretically computed adaptive learning rates for fast convergence. Appl. Intell. 2021, 51, 1460–1478.
- Battiti, R. Accelerated backpropagation learning: Two optimization methods. Complex Syst. 1989, 3, 331–342.
- Castillo, G. Adaptive learning algorithms for Bayesian network classifiers. AI Commun. 2008, 21, 87–88.
- Khan, M.A.; Tembine, H.; Vasilakos, A.V. Game dynamics and cost of learning in heterogeneous 4G networks. IEEE J. Sel. Areas Commun. 2011, 30, 198–213.
- Pandey, B.; Janhunen, D.T. Adaptive Learning For Mobile Network Management. Master’s Thesis, Aalto University School of Science, Espoo, Finland, 2016.
- Li, X.; Cao, R.; Hao, J. An adaptive learning based network selection approach for 5G dynamic environments. Entropy 2018, 20, 236.
- Narayanan, A.; Ramadan, E.; Carpenter, J.; Liu, Q.; Liu, Y.; Qian, F.; Zhang, Z.-L. A first look at commercial 5G performance on smartphones. In Proceedings of the Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 894–905.
- Moodi, M.; Ghazvini, M.; Moodi, H. A hybrid intelligent approach to detect android botnet using smart self-adaptive learning-based PSO-SVM. Knowl.-Based Syst. 2021, 222, 106988.
- Santana, Y.H.; Alonso, R.M.; Nieto, G.G.; Martens, L.; Joseph, W.; Plets, D. Indoor Genetic Algorithm-Based 5G Network Planning Using a Machine Learning Model for Path Loss Estimation. Appl. Sci. 2022, 12, 3923.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Ojo, S.; Imoize, A.; Alienyi, D. Radial basis function neural network path loss prediction model for LTE networks in multitransmitter signal propagation environments. Int. J. Commun. Syst. 2021, 34, e4680.
- Kouhalvandi, L.; Matekovits, L. Multi-objective Optimization Methods for Passive and Active Devices in mm-Wave 5G Networks. In Printed Antennas for 5G Networks; Springer: New York, NY, USA, 2022; pp. 337–371.
- Du, L.; Gao, R.; Suganthan, P.N.; Wang, D.Z.W. Bayesian optimization based dynamic ensemble for time series forecasting. Inf. Sci. 2022, 591, 155–175.
- Andrienko, N.; Andrienko, G. Exploratory Analysis of Spatial and Temporal Data: A Systematic Approach; Springer Science & Business Media: New York, NY, USA, 2006.
- Isabona, J.; Imoize, A.L. Terrain-based adaption of propagation model loss parameters using non-linear square regression. J. Eng. Appl. Sci. 2021, 68, 33.
- Imoize, A.L.; Ibhaze, A.E.; Atayero, A.A.; Kavitha, K.V.N. Standard Propagation Channel Models for MIMO Communication Systems. Wirel. Commun. Mob. Comput. 2021, 2021, 36.
- Bartlett, P.; Freund, Y.; Lee, W.S.; Schapire, R.E. Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Stat. 1998, 26, 1651–1686.
- Isabona, J.; Ojuh, D. Adaptation of Propagation Model Parameters toward Efficient Cellular Network Planning using Robust LAD Algorithm. Int. J. Wirel. Microw. Technol. 2020, 10, 3–24.
- Lan, Z.-C.; Huang, G.-Y.; Li, Y.-P.; Rho, S.; Vimal, S.; Chen, B.-W. Conquering insufficient/imbalanced data learning for the Internet of Medical Things. Neural Comput. Appl. 2022, 2022, 1–10.
- Brain, D.; Webb, G.I. The need for low bias algorithms in classification learning from large data sets. In Principles of Data Mining and Knowledge Discovery; Springer: Berlin/Heidelberg, Germany, 2002; pp. 62–73.
- Brain, D.; Webb, G.I. On the effect of data set size on bias and variance in classification learning. In Proceedings of the Fourth Australian Knowledge Acquisition Workshop, University of New South Wales, Sydney, Australia, 16–22 October 1999; pp. 117–128.
Year | Reference | Focus and Coverage | Limitations | Comparison with This Paper |
---|---|---|---|---|
1989 | Battiti [26] | The work focuses on accelerated backpropagation learning, considering two optimization techniques. | There is a need to assess the performance of the models for networks with a large number of weights. | This paper presents a detailed statistical analysis of the acquired throughput data through performance status quality reporting at the different user equipment terminal locations. |
2008 | Castillo [27] | Adaptive learning algorithms for Bayesian network classifiers were projected. The work aims to handle the cost–performance trade-off and deals with concept drift. | The work did not provide adequate information on how to resolve the bottleneck challenges in a prequential learning framework as the training data increase over time. | The current work examined the performance of the projected learning-based models for 5G wireless networks using large-scale throughput data acquired from several network operators in the United States. |
2011 | Khan, Tembine, and Vasilakos [28] | The work presents game dynamics and the cost of learning in heterogeneous 4G networks. | The work provides numerical examples and OPNET simulations concerning network selection in WLAN and LTE. However, experimental validation of the numerical results is missing. | Our work presents performance benchmarking of adaptive learning capabilities of different machine-learning-based regression models based on the experimental 5G throughput data. |
2016 | Pandey and Janhunen [29] | The work presents a method based on reinforcement learning for automating parts of the management of mobile networks. | The work did not cover the concept of learning with partial observability and cooperative learning that considers the neighboring base stations. | Our work addresses the problem of learning with partial observability and cooperative learning by integrating the neighboring base stations based on the 5G data analyzed. |
2018 | Li, Cao and Hao [30] | The work presents an adaptive-learning-based network selection approach for 5G dynamic environments. The system enables users to adaptively adjust their selections in response to the gradually or abruptly changing environment. | Though the proposed approach enables a population of terminal users to adapt effectively to the network dynamics, experimental validation of the proposed approach is missing. | Our work proposed an RF-LS-BPT regression model for improved dataset predictive modeling and learning based on 5G experimental datasets. |
2020 | Narayanan et al. [31] | The work focuses on commercial 5G performance on smartphones using 5G networks of three carriers in three US cities. Additionally, the work explored the feasibility of using location and other environmental data to predict network performance. | The work developed practical and sound measurement methodologies for 5G networks on COTS smartphones but did not provide the learning-based models for the 5G performance measurements. | The current work projected learning-based models for improved dataset predictive modeling and learning based on the 5G throughput data. |
2021 | Moodi, Ghazvini, and Moodi [32] | The work considers a hybrid intelligent approach to detect android botnets using a smart self-adaptive-learning-based PSO-SVM. | The authors observed that one of the factors influencing the selection of important features of a dataset is the approach and the parameters used on that dataset. However, practical deployment of the projected hybrid intelligent approach was not considered. | An optimized RF-LS-BPT regression model was proposed for accurate throughput data modeling and learning using different performance indicators based on experimental datasets. |
2022 | Hervis Santana et al. [33] | The work examines the application of a machine-learning-based algorithm to approximate a complex 5G path loss prediction model. Specifically, the decision tree ensembles (bagging) algorithm was employed to build a generic model which was used to estimate the pathloss. | Time optimization for the feature (input) calculation process was not considered in this work. Experimental validation of the proposed model is also missing. Lastly, practical testing of the model for accurate wireless network planning is required. | The current work captured optimization for the features (inputs) variables and experimentally validated the proposed model using practical 5G throughput data. |
Distance (m) | Max. | Min. | Mean | Median | STD |
---|---|---|---|---|---|
25 | 2.35 × 10³ | 31.43 | 947.61 | 450.56 | 625.27 |
50 | 2.08 × 10³ | 335.54 | 925.27 | 734.02 | 604.43 |
75 | 2.07 × 10³ | 906.13 | 807.38 | 807.38 | 614.36 |
100 | 1.97 × 10³ | 10.49 | 855.26 | 718.17 | 540.10 |
160 | 1.99 × 10³ | 146.79 | 808.43 | 655.34 | 482.53 |
Hyperparameters | Best Grid Search Hyperparameter Values | Best Bayesian Search Hyperparameter Values |
---|---|---|
Learning Rate | 0.25 | 0.29025 |
Num. Trees | 52 | 23 |
MaxNumSplits | 32 | 195 |
Model | Accuracy Metric | 25 m | 50 m | 75 m | 100 m | 160 m |
---|---|---|---|---|---|---|
Optimized RF | MAE 1 | 2.40 | 0.42 | 0.86 | 2.95 | 4.24 |
Standard RF | MAE 2 | 9.24 | 5.47 | 6.58 | 7.84 | 12.56 |
Optimized RF | NRMSE 1 | 0.0007 | 0.0001 | 0.0027 | 0.0111 | 0.081 |
Standard RF | NRMSE 2 | 0.009 | 0.0045 | 0.0049 | 0.0117 | 0.02 |
Optimized RF | R² 1 | 0.9999 | 0.9999 | 0.9999 | 0.9986 | 0.9890 |
Standard RF | R² 2 | 0.9998 | 0.9998 | 0.9997 | 0.9983 | 0.9488 |
Model | Accuracy Metric | 25 m | 50 m | 75 m | 100 m | 160 m |
---|---|---|---|---|---|---|
Optimized RF | MAE 1 | 1.33 | 0.88 | 1.12 | 11.37 | 11.92 |
Standard RF | MAE 2 | 9.27 | 2.41 | 3.84 | 12.88 | 13.82 |
Optimized RF | NRMSE 1 | 0.0041 | 0.0025 | 0.0029 | 0.2700 | 0.0490 |
Standard RF | NRMSE 2 | 0.0043 | 0.0029 | 0.0035 | 0.2720 | 0.0494 |
Optimized RF | R² 1 | 0.9998 | 0.9999 | 0.9999 | 0.9926 | 0.9881 |
Standard RF | R² 2 | 0.9990 | 0.9977 | 0.9997 | 0.9920 | 0.9800 |
Model | Accuracy Metric | 25 m | 50 m | 75 m | 100 m | 160 m |
---|---|---|---|---|---|---|
Training (LS-Boosting) | MAE 1 | 1.71 | 0.66 | 0.72 | 2.73 | 5.21 |
Training (Bagging) | MAE 2 | 63.03 | 42.37 | 37.77 | 33.57 | 49.97 |
Training (LS-Boosting) | NRMSE 1 | 0.0052 | 0.0012 | 0.0104 | 0.0102 | 0.0210 |
Training (Bagging) | NRMSE 2 | 0.0684 | 0.0500 | 0.0342 | 0.0353 | 0.0560 |
Training (LS-Boosting) | R² 1 | 0.9996 | 0.9999 | 0.9999 | 0.9986 | 0.9935 |
Training (Bagging) | R² 2 | 0.9835 | 0.9984 | 0.9984 | 0.9883 | 0.9719 |
Model | Accuracy Metric | 25 m | 50 m | 75 m | 100 m | 160 m |
---|---|---|---|---|---|---|
Testing (LS-Boosting) | MAE 1 | 4.22 | 0.71 | 1.79 | 8.65 | 8.07 |
Testing (Bagging) | MAE 2 | 77.39 | 27.91 | 47.58 | 43.76 | 50.08 |
Testing (LS-Boosting) | NRMSE 1 | 0.012 | 0.0024 | 0.0047 | 0.024 | 0.0374 |
Testing (Bagging) | NRMSE 2 | 0.090 | 0.0032 | 0.0466 | 0.047 | 0.0696 |
Testing (LS-Boosting) | R² 1 | 0.9983 | 0.9999 | 0.9998 | 0.9947 | 0.9860 |
Testing (Bagging) | R² 2 | 0.9935 | 0.9935 | 0.9935 | 0.9818 | 0.9654 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Isabona, J.; Imoize, A.L.; Kim, Y. Machine Learning-Based Boosted Regression Ensemble Combined with Hyperparameter Tuning for Optimal Adaptive Learning. Sensors 2022, 22, 3776. https://doi.org/10.3390/s22103776