Using ARIMA to Predict the Growth in the Subscriber Data Usage
Abstract
1. Introduction
1.1. Research Question
- Which forecasting model, ARIMA or CNN, is more effective in predicting subscriber data usage?
1.2. Research Contribution
2. Related Works
2.1. Stationarity
- The mean and standard deviation remain constant over time.
- The dataset exhibits no seasonal patterns. Seasonality refers to any predictable pattern or variation that repeats over a fixed period, such as a year [7]. A formal statement of these conditions follows this list.
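For reference, these two conditions correspond to the standard textbook definition of (weak) stationarity, added here as a minimal formal statement:

$$\mathbb{E}[y_t] = \mu, \qquad \operatorname{Var}(y_t) = \sigma^2, \qquad \operatorname{Cov}(y_t, y_{t+h}) = \gamma(h) \quad \text{for all } t,$$

i.e., the mean and variance do not depend on $t$, and the autocovariance depends only on the lag $h$.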
2.2. Background
2.3. Time Series Data Limitation
3. Materials and Methods
3.1. Mathematical Formulation of ARIMA
- p denotes the number of auto-regressive terms,
- d denotes the number of nonseasonal differences,
- and q denotes the number of lagged forecast errors (moving-average terms).
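For reference, the ARIMA(p, d, q) model that combines these three terms is conventionally written with the backshift operator $B$ (this is the textbook form, not a new result of the paper):

$$\left(1 - \sum_{i=1}^{p} \phi_i B^i\right)(1 - B)^d\, y_t = c + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right)\varepsilon_t,$$

where $\phi_i$ are the auto-regressive coefficients, $\theta_j$ the moving-average coefficients, and $\varepsilon_t$ white noise.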
3.2. Experimental Datasets
3.3. Feature Engineering and Data Cleaning
1. Data normalization. Normalization is frequently used to prepare data for ML. Its objective is to convert numerical columns to a common scale without losing information or distorting the ranges of values, which reduces bias and supports accurate prediction [40]. We used Python Sklearn's MinMaxScaler function to normalize the throughput column to values between 0 and 1.
2. Feature engineering. Also known as feature extraction, this is the process of selecting and transforming the most important features in the data for building predictive models with statistical or ML techniques [39]. For the subscriber dataset, only the throughput and timestamp columns are used to build the training and testing sets.
3. Train-test split. This is a model-validation method that simulates how a model behaves when presented with fresh, unseen data [47]. In our analysis, the training data is validated with k-fold cross-validation to avoid under-/over-fitting; however, cross-validation is not necessary when the dataset is small. A minimal sketch of these three steps appears after this list.
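The sketch below covers the three preprocessing steps, assuming a pandas DataFrame `df` with illustrative column names `timestamp` and `throughput`:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import TimeSeriesSplit

# Keep only the two modeling columns and index by time.
series = (df[["timestamp", "throughput"]]
          .set_index("timestamp")
          .sort_index())

# Scale throughput down to [0, 1] as described above.
scaler = MinMaxScaler()
series["throughput"] = scaler.fit_transform(series[["throughput"]])

# Chronological k-fold splits (no shuffling for time-series data).
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(series):
    train, test = series.iloc[train_idx], series.iloc[test_idx]
```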
3.4. Stationarity of Data
1. Visual and graphical inspection. The time-series dataset is plotted and inspected visually to judge whether it is stationary; the method is simple but prone to inaccuracy.
2. Statistical Augmented Dickey-Fuller test. Named after the statisticians David Dickey and Wayne Fuller, the Dickey-Fuller test is a more reliable stationarity check that determines whether a time-series dataset is stationary by calculating the p-value for the null hypothesis. The Dickey-Fuller null hypothesis is that the data is not stationary. If the p-value is greater than 0.05, there is strong support for the null hypothesis, and the time series is deemed non-stationary. Python's statsmodels library was used to perform this task by importing the adfuller function, as sketched below.
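A minimal sketch of the ADF check with statsmodels, assuming the `series` DataFrame prepared above:

```python
from statsmodels.tsa.stattools import adfuller

# adfuller returns (test statistic, p-value, lags used, n_obs, critical values, ...)
result = adfuller(series["throughput"].dropna())
statistic, p_value = result[0], result[1]

if p_value > 0.05:
    print(f"p = {p_value:.4f}: fail to reject H0, series looks non-stationary")
else:
    print(f"p = {p_value:.4f}: reject H0, series looks stationary")
```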
3.5. The UGRansome Characteristics
3.6. Exploratory Techniques
- Standardized residual ($\epsilon_t$). It measures the strength of agreement between actual and predicted values and indicates the significance of features [50] ($\epsilon_t$ facilitates the recognition of the patterns that contribute the most to the predictive values):

  $$\epsilon_t = \frac{y_t - \hat{y}_t}{\hat{\sigma}},$$

  where $y_t$ is the observed value, $\hat{y}_t$ the prediction, and $\hat{\sigma}$ the standard deviation of the residuals.
- Normal Q-Q. The normal Q-Q means normal Quantile-Quantile. It is a plot that compares actual and theoretical quantiles [50]. The metric considers the range of random variables to plot the normal Q-Q using a probabilistic computation. The x-axis represents the Z-score of the standardized normal distribution, but different formulations of the plotting positions have been proposed in the literature; a common family is

  $$p_i = \frac{i - a}{n + 1 - 2a}, \qquad 0 \le a \le 1,$$

  where $i$ is the rank of the ordered observation and $n$ the sample size.
- Correlogram. It is a correlational and statistical chart used in TSA to plot the sample auto-correlations versus the timestamp lags h to check for randomness [50]. The correlation is zero when randomness is detected. Equation (10) denotes the auto-correlation parameter at lag h:

  $$\rho_h = \frac{\sum_{t=1}^{n-h} (y_t - \bar{y})(y_{t+h} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}. \qquad (10)$$
- Augmented Dickey-Fuller (ADF) test. This statistical metric tests the stationarity of time-series data [50] by checking for a unit root, which exists in a series of observations when $y_t = y_{t-1} + \epsilon_t$, as per Equation (11):

  $$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \epsilon_t. \qquad (11)$$

  Here $y_t$ represents the time-series value at time t, and $\epsilon_t$ is the error term.
- Theoretical quantile. The theoretical Q-Q explores a variable's deviation from theoretical distributions to visually evaluate, for EDA purposes, whether the observed-to-theoretical ratio differs significantly [50].
- Likelihood. The likelihood parameter maps $x \mapsto y$ and is given by $L(\theta \mid x)$ or, equivalently, $P(x \mid \theta)$. This metric computes the most probable value assigned to a specific feature using $\theta$ as the hypothesis in the $X$ and $Y$ spaces. Inputs $x$ compute the predictive values $y$ using the predefined parameter $\theta$. With this, the likelihood represents the quantile probability (Prob (Q)) of correlated features used for forecasting.
- Kurtosis. This metric evaluates the probability of the predicted variables by describing the proportion of probability mass in the tails. There are various techniques to compute the theoretical distribution of Kurtosis, and there are subjective manners of approximating it with relevant samples [50]. Higher Kurtosis values indicate the presence of outliers. The sample Kurtosis is as follows:

  $$K = \frac{\tfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^4}{\left(\tfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^{2}}$$
- Jarque-Bera (JB) test. This metric uses a Lagrange multiplier to test for data normality. The JB value tests whether a distribution is normal by examining its skewness and Kurtosis. A normally distributed sample has symmetrical skew and a Kurtosis indicating the expected peakedness of the distribution. We formulate the JB test as follows:

  $$JB = \frac{n}{6}\left(S^2 + \frac{(K - 3)^2}{4}\right),$$

  where $n$ is the sample size, $S$ the sample skewness, and $K$ the sample Kurtosis.
- Heteroscedasticity. It checks the alternative hypothesis ($H_1$) versus the null hypothesis ($H_0$) [50]. Under the alternative hypothesis, the empirical error variance is a multiplicative function of one or more variables:

  $$H_1:\ \sigma_t^2 = \sigma^2 f(x_t).$$

  However, the null hypothesis has equal error variances (homoscedasticity) [50]:

  $$H_0:\ \sigma_t^2 = \sigma^2 \quad \text{for all } t.$$
- Accuracy. The balanced accuracy of the ARIMA model is calculated with the following mathematical formulation [47] (the standard balanced-accuracy form over positive and negative classes); a diagnostic sketch covering several of the metrics above follows this list:

  $$\text{Balanced Accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right)$$
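Several of the metrics above (standardized residuals, normal Q-Q, correlogram, JB, Kurtosis, heteroscedasticity) are reported together by statsmodels when an ARIMA model is fitted; a minimal sketch, assuming the `series` DataFrame from the preprocessing sketch and an illustrative order:

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

# Fit a simple ARIMA(p, d, q); the order here is illustrative.
model = ARIMA(series["throughput"], order=(1, 1, 1))
results = model.fit()

# The summary reports Jarque-Bera, skew, Kurtosis, and heteroskedasticity tests.
print(results.summary())

# Standardized residuals, histogram, normal Q-Q, and correlogram in one figure.
results.plot_diagnostics(figsize=(10, 8))
plt.show()
```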
3.7. Feature Extraction
3.8. Model Training and Testing
3.9. Model Tuning
3.10. ARIMA Predictor Model
- Phase 1: Preprocessing. Subdivide the time-series data into K sub-series.
- Phase 2: Modeling. Train the algorithm on each sub-series, assuming that the IDS of the sub-series remains constant.
- Phase 3: Linear transformation. Translate the algorithms trained in phase 2 into k linear representations.
- Phase 4: Estimator combination. Combine the local estimators obtained in phase 3 so as to minimize the global loss parameters described in Section 3.1. A sketch of the phased procedure follows this list.
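A minimal sketch of the phased procedure, assuming K chronological sub-series and a plain average as the estimator combination (both assumptions are illustrative; the paper's combination minimizes the global loss of Section 3.1):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def phased_arima_forecast(y, k=4, order=(1, 1, 1), horizon=24):
    # Phase 1: preprocessing - subdivide the series into K sub-series.
    sub_series = np.array_split(np.asarray(y), k)

    # Phase 2: modeling - train one ARIMA per sub-series.
    fitted = [ARIMA(chunk, order=order).fit() for chunk in sub_series]

    # Phase 3: linear transformation - each fit yields a linear forecast rule.
    local_forecasts = [f.forecast(steps=horizon) for f in fitted]

    # Phase 4: estimator combination - here, a simple average of local estimators.
    return np.mean(local_forecasts, axis=0)
```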
3.11. Computational Environment
3.12. Feature Extraction
4. Results
4.1. Dickey Fuller Test
The forecasting routine below reproduces the paper's listing, cleaned up into runnable Python (the model object is assumed to follow pmdarima's `predict(n_periods=..., return_conf_int=True)` API, which matches the shape of the original call):

```python
import pandas as pd
import matplotlib.pyplot as plt

def forecast(arima_model, periods=0):
    n_periods = periods
    # Predict n_periods ahead together with confidence intervals.
    fitted, interval = arima_model.predict(n_periods=n_periods,
                                           return_conf_int=True)
    # Hourly forecast index starting from the last observed timestamp
    # (the original listing had df.index[1], which overlaps the history).
    index = pd.date_range(df.index[-1], periods=n_periods, freq="H")
    fitted_series = pd.Series(fitted, index=index)
    lower_series = pd.Series(interval[:, 0], index=index)
    upper_series = pd.Series(interval[:, 1], index=index)
    plt.plot(fitted_series)
    plt.fill_between(lower_series.index, lower_series, upper_series, alpha=0.15)
    plt.show()

forecast(arima_model, 730)
arima_model.summary()
```
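The `summary()` call prints the fitted coefficients alongside the diagnostics of Section 3.6. For completeness, a hedged sketch of how the `arima_model` object could be produced with pmdarima's `auto_arima` (the paper does not name the fitting call, so this choice and its settings are assumptions):

```python
import pmdarima as pm

# Fit ARIMA orders automatically on the scaled throughput series;
# seasonal=False matches the nonseasonal ARIMA(p, d, q) of Section 3.1.
arima_model = pm.auto_arima(series["throughput"],
                            seasonal=False,
                            stepwise=True,
                            suppress_warnings=True)
print(arima_model.order)  # selected (p, d, q)
```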
4.2. The Comparative Results of ARIMA and CNN
- A small p-value (typically <0.05) indicates strong evidence against the null hypothesis.
- A large p-value (>0.05) indicates weak evidence against the null hypothesis.
- p-values very close to the cutoff (0.05) are considered marginal.
4.3. Execution Speed Test
4.4. ARIMA, CNN, BATS, and TBATS Comparison
4.5. Recommendation
1. Python is unfortunately not the fastest language; it is therefore recommended to build ML models in low-level programming languages such as C and C++ for faster processing times.
2. Running ML models locally on a Central Processing Unit (CPU) or Graphics Processing Unit (GPU) can slow down the learning process due to memory limits; it is therefore recommended to run models on cloud technologies such as Google Colab or Amazon Web Services (AWS) SageMaker, where memory can be defined before the model is run.
3. Decreasing the number of neurons that make up the model can reduce the processing time. However, this comes at the expense of model accuracy, since fewer neurons result in underfitting and poor performance when new data is introduced.
4. Similar to decreasing the number of neurons, decreasing the number of epochs will also reduce the final model run time, again at the expense of accuracy, since fewer epochs likewise result in underfitting.
5. Conclusions and Discussion
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
ADFT | Augmented Dickey-Fuller Test |
AWS | Amazon Web Services |
ARIMA | Auto-Regressive Integrated Moving Average |
AR | Auto-Regressive |
CCA | Canonical Correlation Analysis |
CPU | Central Processing Unit |
CSV | Comma-Separated Values |
CNN | Convolutional Neural Network |
DL | Deep Learning |
DPI | Deep Packet Inspection |
DT | Decision Trees |
DTW | Dynamic Time Warping |
EDA | Exploratory Data Analysis |
GRU | Gated Recurrent Unit |
GD | Gradient Descent |
GPU | Graphics Processing Unit |
Tpt in | Incoming Throughput |
IP | Internet Protocol |
IDS | Insights Data Storage |
KPI | Key Performance Indicator |
KNN | K-Nearest Neighbor |
LDA | Linear Discriminant Analysis |
LSTM | Long Short Term Memory |
ML | Machine Learning |
MSE | Mean Squared Error |
MA | Moving Average |
MAPE | Mean Absolute Percentage Error |
Ms | Milliseconds |
NN | Neural Networks |
NSDM | Network Subscriber Data Management |
OS | Operating System |
Tpt out | Outgoing Throughput |
PCA | Principal Components Analysis |
QoE | Quality of Experience |
Prob (Q) | Quantile Probability |
RF | Random Forest |
ReLU | Rectified Linear Unit |
RNN | Recurrent Neural Networks |
RMSE | Root Mean Square Error |
SARIMA | Seasonal ARIMA |
SQL | Structured Query Language |
SVM | Support Vector Machine |
TSA | Time Series Analysis |
Ts | Timestamps |
UDP | User Datagram Protocol |
References
- Nkongolo, M.; van Deventer, J.P.; Kasongo, S.M.; van der Walt, W. Classifying Social Media Using Deep Packet Inspection Data. In Inventive Communication and Computational Technologies; Ranganathan, G., Fernando, X., Rocha, Á., Eds.; Springer: Singapore, 2023; pp. 543–557.
- Theodoridis, G.; Tsadiras, A. Applying machine learning techniques to predict and explain subscriber churn of an online drug information platform. Neural Comput. Appl. 2022, 34, 19501–19514.
- Kumar, R.; Kumar, P.; Kumar, Y. Multi-step time series analysis and forecasting strategy using ARIMA and evolutionary algorithms. Int. J. Inf. Technol. 2022, 14, 359–373.
- Li, X.; Petropoulos, F.; Kang, Y. Improving forecasting by subsampling seasonal time series. Int. J. Prod. Res. 2022, 1–17.
- Jin, X.B.; Gong, W.T.; Kong, J.L.; Bai, Y.T.; Su, T.L. A variational Bayesian deep network with data self-screening layer for massive time-series data forecasting. Entropy 2022, 24, 335.
- Box, G.E.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015.
- Adhikari, R.; Agrawal, R.K. An introductory study on time series modeling and forecasting. arXiv 2013, arXiv:1302.6613.
- Khashei, M.; Bijari, M. A novel hybridization of artificial neural networks and ARIMA models for time series forecasting. Appl. Soft Comput. 2011, 11, 2664–2675.
- Azaria, B.; Gottlieb, L.A. Predicting Subscriber Usage: Analyzing Multidimensional Time-Series Using Convolutional Neural Networks. In Cyber Security, Cryptology, and Machine Learning; Dolev, S., Katz, J., Meisels, A., Eds.; Springer: Cham, Switzerland, 2022; pp. 259–269.
- Salman, A.G.; Heryadi, Y.; Abdurahman, E.; Suparta, W. Weather forecasting using merged long short-term memory model. Bull. Electr. Eng. Inform. 2018, 7, 377–385.
- Masum, S.; Liu, Y.; Chiverton, J. Multi-step time series forecasting of electric load using machine learning models. In Proceedings of the International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 3–7 June 2018; pp. 148–159.
- Siami-Namini, S.; Tavakoli, N.; Namin, A.S. A comparison of ARIMA and LSTM in forecasting time series. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; pp. 1394–1401.
- Muhammad, U.L.; Musa, M.Y.; Usman, Y.; Nasir, A.B. Limestone as solid mineral to develop national economy. Am. J. Phys. Chem. 2018, 7, 23–28.
- Mbah, T.J.; Ye, H.; Zhang, J.; Long, M. Using LSTM and ARIMA to simulate and predict limestone price variations. Min. Metall. Explor. 2021, 38, 913–926.
- Tan, C.W.; Bergmeir, C.; Petitjean, F.; Webb, G.I. Time series extrinsic regression. arXiv 2020, arXiv:2006.12672.
- Goldsmith, J.; Scheipl, F. Estimator selection and combination in scalar-on-function regression. Comput. Stat. Data Anal. 2014, 70, 362–372.
- Pimentel, M.A.; Charlton, P.H.; Clifton, D.A. Probabilistic estimation of respiratory rate from wearable sensors. In Wearable Electronics Sensors; Springer: Cham, Switzerland, 2015; pp. 241–262.
- Zheng, Y.; Liu, Q.; Chen, E.; Ge, Y.; Zhao, J.L. Time series classification using multi-channels deep convolutional neural networks. In Web-Age Information Management; Springer: Cham, Switzerland, 2014; pp. 298–310.
- Yang, J.; Nguyen, M.N.; San, P.P.; Li, X.L.; Krishnaswamy, S. Deep convolutional neural networks on multichannel time series for human activity recognition. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015.
- Okita, T.; Inoue, S. Recognition of multiple overlapping activities using compositional CNN-LSTM model. In Proceedings of the 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing and the 2017 ACM International Symposium on Wearable Computers, Maui, HI, USA, 11–15 September 2017; pp. 165–168.
- Wang, J.; Long, Q.; Liu, K.; Xie, Y. Human action recognition on cellphone using compositional bidir-LSTM-CNN networks. In Proceedings of the 2019 International Conference on Computer, Network, Communication and Information Systems (CNCI 2019), Qingdao, China, 27–29 March 2019; pp. 687–692.
- Snow, D. AtsPy: Automated Time Series Forecasting in Python. 2020. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3580631 (accessed on 27 December 2022).
- Mode, G.R.; Hoque, K.A. Adversarial examples in deep learning for multivariate time series regression. In Proceedings of the 2020 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 13–15 October 2020; pp. 1–10.
- Antsfeld, L.; Chidlovskii, B.; Borisov, D. Magnetic sensor based indoor positioning by multi-channel deep regression. In Proceedings of the 18th Conference on Embedded Networked Sensor Systems, Virtual, 16–19 November 2020; pp. 707–708.
- Mehtab, S.; Sen, J.; Dasgupta, S. Robust analysis of stock price time series using CNN and LSTM-based deep learning models. In Proceedings of the 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 5–7 November 2020; pp. 1481–1486.
- Mirko, K.; Kantelhardt, J.W. Hadoop.TS: Large-scale time-series processing. Int. J. Comput. Appl. 2013, 74, 1–8.
- Li, L.; Noorian, F.; Moss, D.J.; Leong, P.H. Rolling window time series prediction using MapReduce. In Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014), Redwood City, CA, USA, 13–15 August 2014; pp. 757–764.
- Talavera-Llames, R.; Pérez-Chacón, R.; Troncoso, A.; Martínez-Álvarez, F. Big data time series forecasting based on nearest neighbours distributed computing with Spark. Knowl.-Based Syst. 2018, 161, 12–25.
- Galicia, A.; Torres, J.F.; Martínez-Álvarez, F.; Troncoso, A. A novel Spark-based multi-step forecasting algorithm for big data time series. Inf. Sci. 2018, 467, 800–818.
- Petropoulos, F.; Apiletti, D.; Assimakopoulos, V.; Babai, M.Z.; Barrow, D.K.; Taieb, S.B.; Bergmeir, C.; Bessa, R.J.; Bijak, J.; Boylan, J.E.; et al. Forecasting: Theory and practice. Int. J. Forecast. 2022, 38, 705–871.
- Shamir, O.; Srebro, N.; Zhang, T. Communication-efficient distributed optimization using an approximate Newton-type method. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 1000–1008.
- Wang, J.; Kolar, M.; Srebro, N.; Zhang, T. Efficient distributed learning with sparsity. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 3636–3645.
- Jordan, M.I.; Lee, J.D.; Yang, Y. Communication-efficient distributed statistical inference. J. Am. Stat. Assoc. 2018, 114, 668–681.
- Chen, X.; Liu, W.; Zhang, Y. Quantile regression under memory constraint. Ann. Stat. 2019, 47, 3244–3273.
- Ryu, E.K.; Yin, W. Large-Scale Convex Optimization; Cambridge University Press: Cambridge, UK, 2022.
- Challu, C.; Olivares, K.G.; Oreshkin, B.N.; Garza, F.; Mergenthaler, M.; Dubrawski, A. N-HiTS: Neural hierarchical interpolation for time series forecasting. arXiv 2022, arXiv:2201.12886.
- Fernández, J.D.; Menci, S.P.; Lee, C.M.; Rieger, A.; Fridgen, G. Privacy-preserving federated learning for residential short-term load forecasting. Appl. Energy 2022, 326, 119915.
- Bennett, S.; Clarkson, J. Time series prediction under distribution shift using differentiable forgetting. arXiv 2022, arXiv:2207.11486.
- Nkongolo, M.; van Deventer, J.P.; Kasongo, S.M. The Application of Cyclostationary Malware Detection Using Boruta and PCA. In Computer Networks and Inventive Communication Technologies; Smys, S., Lafata, P., Palanisamy, R., Kamel, K.A., Eds.; Springer: Singapore, 2023; pp. 547–562.
- Nkongolo, M.; Van Deventer, J.P.; Kasongo, S.M.; Zahra, S.R.; Kipongo, J. A Cloud Based Optimization Method for Zero-Day Threats Detection Using Genetic Algorithm and Ensemble Learning. Electronics 2022, 11, 1749.
- Nkongolo, M.; van Deventer, J.P.; Kasongo, S.M. UGRansome1819: A Novel Dataset for Anomaly Detection and Zero-Day Threats. Information 2021, 12, 405.
- Ghaderi, A.; Movahedi, Z. Joint Latency and Energy-aware Data Management Layer for Industrial IoT. In Proceedings of the 2022 8th International Conference on Web Research (ICWR), Tehran, Iran, 11–12 May 2022; pp. 70–75.
- Mehdi, H.; Pooranian, Z.; Vinueza Naranjo, P.G. Cloud traffic prediction based on fuzzy ARIMA model with low dependence on historical data. Trans. Emerg. Telecommun. Technol. 2022, 33, e3731.
- Xiao, R.; Feng, Y.; Yan, L.; Ma, Y. Predict stock prices with ARIMA and LSTM. arXiv 2022, arXiv:2209.02407.
- Wang, X.; Kang, Y.; Hyndman, R.J.; Li, F. Distributed ARIMA models for ultra-long time series. Int. J. Forecast. 2022, in press.
- Chao, H.L.; Liao, W. Fair scheduling in mobile ad hoc networks with channel errors. IEEE Trans. Wirel. Commun. 2005, 4, 1254–1263.
- Nkongolo, M. Classifying search results using neural networks and anomaly detection. Educor Multidiscip. J. 2018, 2, 102–127.
- Suthar, F.; Patel, N.; Khanna, S. A Signature-Based Botnet (Emotet) Detection Mechanism. Int. J. Eng. Trends Technol. 2022, 70, 185–193.
- Kotu, V.; Deshpande, B. Chapter 3—Data Exploration. In Data Science, 2nd ed.; Kotu, V., Deshpande, B., Eds.; Morgan Kaufmann: Burlington, MA, USA, 2019; pp. 39–64.
- Ij, H. Statistics versus machine learning. Nat. Methods 2018, 15, 233.
- Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
Source | Model | Limitation |
---|---|---|
[15] | RF & GD | Data understanding |
[17] | Auto Regressive | Seasonality |
[22] | CNN | Seasonality |
[18] | DTW & KNN | Feature engineering |
[19] | CNN | Back propagation |
[20] | RNN & LSTM | Classification |
[21] | LSTM & CNN | Classification |
[24] | LSTM | Feature engineering |
[25] | LSTM & CNN | Execution time |
[23] | CNN | Biases |
[10] | LSTM & ARIMA | Weather forecasting |
[11] | LSTM & ARIMA | Electricity load forecasting |
[12] | LSTM & ARIMA | Stock market prediction |
[14] | LSTM & ARIMA | Limestone price prediction |
[3] | ABC & ARIMA | Refineries |
[9] | CNN | Subscriber usage |
Attack | Feature | Total |
---|---|---|
Blacklist | Timestamp | 2761 |
Spam | IP address | 7425 |
Scan | Flag | 1559 |
SSH | Prediction | 7293 |
Botnet | Threats | 4765 |
Total | - | 23,803 |
Hyper-Parameter | Value |
---|---|
Number of neurons | 50 |
Activation function | Tanh |
Number of dense layers | 6 |
Optimizer | Adam |
Batch size | 2 |
Loss function | Mean Squared Error (MSE) |
Number of epochs | 50 |
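The table above lists the CNN hyper-parameters. A minimal Keras sketch wiring them together follows; Keras itself and the layer arrangement (a Conv1D front end ahead of the dense stack) are assumptions, since the paper only specifies the hyper-parameter values:

```python
import tensorflow as tf

def build_cnn(window_size=24):
    # Hypothetical arrangement: a Conv1D front end followed by the
    # six dense layers of 50 tanh neurons listed in the table.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(window_size, 1)),
        tf.keras.layers.Conv1D(filters=50, kernel_size=3, activation="tanh"),
        tf.keras.layers.Flatten(),
    ])
    for _ in range(6):                       # six dense layers
        model.add(tf.keras.layers.Dense(50, activation="tanh"))
    model.add(tf.keras.layers.Dense(1))      # one-step throughput forecast
    model.compile(optimizer="adam", loss="mse")  # Adam + MSE per the table
    return model

# Batch size 2 and 50 epochs, as per the table:
# model.fit(X_train, y_train, batch_size=2, epochs=50)
```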
Node | Specification |
---|---|
RAM | 39 GB |
Service | Jupyter & DBeaver |
ML algorithm | ARIMA & CNN |
System | 64-bit |
Processor | 2.60 GHz |
Dataset | Subscriber data & UGRansome |
Operating System (OS) | Windows & Linux |
CPU | Intel i7-10 |
Language | Python & SQL |
Number | Attribute | Description | Type |
---|---|---|---|
1 | Timestamp | Traffic duration | Numeric |
2 | Protocol | Communication rule | Categorical |
3 | Flag | Network state | Categorical |
4 | IP address | Unique address | Categorical |
5 | Network traffic | Periodic network flow | Numeric |
6 | Threat | Novel malware | Categorical |
7 | Port | Communication port | Numeric |
8 | Expended address | Malware address | Categorical |
9 | Seed address | Malware address | Categorical |
10 | Cluster | Group assigned | Numeric |
11 | Ransomware | Novel malware | Categorical |
12 | Prediction | Novel malware class | Categorical |
Dataset | Test Statistic | p-Value | Iteration | Accuracy |
---|---|---|---|---|
Subscriber data | −3.537879 | 0.007066 | 20 | 90.567% |
UGRansome data | −9.876982 | 0.0008044 | 342 | 90.456% |

Dataset | Correlogram | ADF | Q-Q | Accuracy |
---|---|---|---|---|
Subscriber training set | 0.9 | 0.8 | 0.9 | 90.398% |
UGRansome training set | 0.8 | 0.9 | 0.7 | 89.453% |
Subscriber testing set | 0.8 | 0.9 | 0.9 | 91.348% |
UGRansome testing set | 0.8 | 0.8 | 0.9 | 88.298% |

Dataset | Features Total | Mean | Deviation | Accuracy |
---|---|---|---|---|
Subscriber data | 700 | 54.23 | 22.45 | 92.351% |
UGRansome data | 8932 | 75.32 | 46.3 | 88.527% |
Subscriber testing set | 400 | 12.6 | 6.7 | 94% |
UGRansome testing set | 4765 | 26.87 | 39.65 | 88% |
Balanced Accuracy | - | - | - | 81% |
Balanced Features | 3699 | - | - | - |
Balanced Mean | - | 41.75 | - | - |
Balanced Deviation | - | - | 28.25 | - |
Dataset | Features | p-Value | CNN | ARIMA |
---|---|---|---|---|
Subscriber data | 450 | 0.006055 | 85.8% | 92.67%
UGRansome data | 120,000 | 0.0006043 | 88.9% | 91.65%
Subscriber testing set | 300 | 0.008 | 86.3% | 94.8% |
UGRansome testing set | 60,500 | 0.007 | 88% | 95.3% |
Balance | 45,312 | 0.005 | 87.25% | 93.605% |

Rows | ARIMA (s) | CNN (s) |
---|---|---|
0 rows | 0 | 0 |
10 rows | 0.44 | 4.39 |
100 rows | 0.64 | 10.43 |
1000 rows | 2.64 | 75.60 |
10,000 rows | 18.24 | 685.76 |
100,000 rows | 159.87 | 6951.91 |
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).