1. Introduction
Recently, the growth potential of the cryptocurrency market has attracted wide attention [1]. Cryptocurrency has emerged as a new financial product and is traded by numerous investors [2]. In 2021, the market capitalization of the cryptocurrency market exceeded USD 3 trillion, and the transaction volume has continuously increased. The cryptocurrency market is now an important research topic in the field of digital finance. Cryptocurrency was introduced along with blockchain, a decentralized data storage technology [3]. Blockchain stores data in blocks and manages the datasets, connected in the form of a chain, in a distributed manner [4]. Each data block has a certain capacity, and whenever a new transaction occurs, it is recorded in a data block. The recorded data are connected in a chain and replicated across many data nodes [4]. Because every data node stores and manages the data blocks, even if a specific data block is tampered with, the authenticity of the data can be verified against the data blocks held by other data nodes [5]. To tamper with data stored in the blockchain, an attacker would have to access numerous connected data nodes and change the contents of all data blocks within a short time, so data forgery is very difficult. Therefore, blockchain is attracting attention as a next-generation data storage technology in which all participants store and share common information. Cryptocurrency can be regarded as a reward issued in the process of recording and verifying data blocks on the blockchain. Verifying and recording data blocks requires encryption and decryption, which consume a great deal of computing resources. Users who verify and record data blocks must therefore provide their own computing resources, and cryptocurrency is paid as a reward for doing so [3]. As the monetary value of cryptocurrency increases, the number of users who provide computing resources increases, making the blockchain ecosystem more robust [6].
As expectations for blockchain technology rise, the cryptocurrency market is growing along with it, and many related studies are being conducted. For example, many studies aim to predict the price of Bitcoin, one of the representative cryptocurrencies [7,8,9]. However, since the cryptocurrency market is a high-risk investment market with considerable volatility, it is difficult to build an accurate forecasting model [10]. Therefore, the purpose of this study was to predict the log-return price of cryptocurrency using volatility features of cryptocurrency. Data for the eleven most frequently traded cryptocurrencies were used as input. Among the collected cryptocurrency data, we predict the log returns of three major cryptocurrencies with high market capitalization, namely Bitcoin, Ethereum, and Binance Coin.
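As a point of reference for the prediction target, the following minimal sketch computes log returns from a price series. It assumes the standard definition r_t = ln(P_t / P_{t-1}); the function name `log_returns` and the sample prices are hypothetical and are not taken from the study's data.

```python
import numpy as np


def log_returns(prices):
    """Return r_t = ln(P_t / P_{t-1}) for a 1-D sequence of positive prices."""
    prices = np.asarray(prices, dtype=float)
    if prices.ndim != 1 or prices.size < 2:
        raise ValueError("need a 1-D series with at least two prices")
    if np.any(prices <= 0):
        raise ValueError("prices must be strictly positive")
    # ln(P_t) - ln(P_{t-1}) equals ln(P_t / P_{t-1})
    return np.diff(np.log(prices))


# Hypothetical daily closing prices, for illustration only.
closes = [100.0, 110.0, 99.0, 99.0]
print(log_returns(closes))
```

A useful property of this definition is that log returns add up over time: the sum of the daily values equals the log return over the whole period.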
There are various mathematical models that can help predict cryptocurrency performance [11]. Examples include simple exponential smoothing (SES) and various linear models for simple trend analysis [12,13]. In addition, several studies apply artificial neural networks to price prediction for cryptocurrencies and related derivatives [14,15]. However, in this study, mathematical models were selected to extend the cryptocurrency log-return price prediction research described in previous studies [11]. This study expanded the scope of earlier log-return price prediction research by adding volatility characteristics [16]. Volatility is a widely used variable when predicting log returns [17]. In this study, the volatility prediction models autoregressive conditional heteroskedasticity (ARCH) and generalized autoregressive conditional heteroskedasticity (GARCH) were used to predict the volatility of cryptocurrency, and the predicted volatility was then used to predict the log-return price [18]. Major features were then selected according to their importance for the log-return price of each selected cryptocurrency. To predict the log-return price, we used a traditional time-series method, the autoregressive integrated moving average (ARIMA) model, and artificial neural network-based models: recurrent neural networks (RNN), long short-term memory (LSTM) models, and gated recurrent units (GRU) [19,20,21]. By comparing the performance of these prediction models, we identify the most suitable method for predicting the log-return price. Unlike previous studies, this study adds a volatility feature to the predictors and proposes a more accurate log-return price prediction algorithm.
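To make the role of the GARCH volatility feature more concrete, the sketch below evaluates the textbook GARCH(1,1) conditional-variance recursion, sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}. Setting beta = 0 gives ARCH(1). The parameter values, the constant-mean assumption, and the function name `garch11_variance` are illustrative assumptions; in practice the parameters would be estimated from the data, and the source does not state which model orders were used.

```python
import numpy as np


def garch11_variance(returns, omega, alpha, beta):
    """Conditional variance path of a GARCH(1,1) model for given parameters.

    Assumption: a constant-mean model, so the shock eps_t is the demeaned return.
    """
    r = np.asarray(returns, dtype=float)
    eps = r - r.mean()
    sigma2 = np.empty(r.size)
    # Common initialization choice: the sample variance of the shocks.
    sigma2[0] = eps.var()
    for t in range(1, r.size):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2


# Hypothetical log returns and parameters, for illustration only.
sample_returns = [0.01, -0.02, 0.03, 0.0, -0.01]
volatility_feature = np.sqrt(garch11_variance(sample_returns, 1e-5, 0.1, 0.8))
print(volatility_feature)
```

The square root of the conditional variance is the volatility series that could then be supplied as an additional input feature alongside the log returns.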
The paper is organized as follows: Section 2 describes the methods used in the study. Section 3 describes the data collected and how it was processed. Section 4 describes the experimental results. Section 5 describes the significance and limitations of the study. Finally, Section 6 presents the conclusions.
4. Results
The MAE, MSE, and RMSE were used to evaluate the performance of the log-return price prediction models. The mean absolute error (MAE) is the average of the absolute differences between the actual and predicted values. Because MAE takes the absolute value of each error, it reflects the magnitude of the error directly. The mean squared error (MSE) is the average of the squared differences between the actual and predicted values. Because each error is squared, the MSE grows quadratically with the error, which makes it sensitive to outliers. The root mean squared error (RMSE) is the square root of the MSE; it is easy to interpret because it is expressed in the same unit as the actual values.
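The three evaluation metrics described above can be sketched as follows. The function names and the sample values are hypothetical; the example only illustrates how a single large error affects the MSE and RMSE more strongly than the MAE.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))


def mse(y_true, y_pred):
    """Mean squared error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(err ** 2))


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target."""
    return float(np.sqrt(mse(y_true, y_pred)))


# Hypothetical values: three small errors and one larger one.
actual = [0.0, 0.0, 0.0, 0.0]
predicted = [0.01, 0.01, 0.01, 0.05]
print(mae(actual, predicted), mse(actual, predicted), rmse(actual, predicted))
```

In this example the RMSE exceeds the MAE because the single larger error dominates once the errors are squared.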
The log-return price of each cryptocurrency was predicted using ARIMA, a traditional time-series prediction technique, and deep learning techniques. Since the series for Bitcoin, Ethereum, and Binance Coin, the analysis targets of this paper, do not satisfy the stationarity requirement, differencing is required before ARIMA can be applied. We therefore set the differencing hyperparameter to d = 1. Next, we chose the autoregressive order using the autocorrelation function (ACF). Figure 7 shows the ACF for the cryptocurrencies. Based on the ACF, two time lags were judged to be suitable. Therefore, the hyperparameter p was set to 2, and the ARIMA model of this study was set to p = 2, d = 1, q = 0. When the ARIMA model was fitted to each of the three cryptocurrencies, all coefficients were significant (p-value < 0.001). Therefore, we applied the ARIMA model with the same hyperparameters to the three cryptocurrencies.
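The order selection and the ARIMA (2, 1, 0) structure described above can be illustrated with a small NumPy sketch: a sample ACF, first-order differencing, and an AR(2) fit on the differenced series. This is a simplified conditional least-squares fit without a constant term, shown only to make the model structure explicit; it is not necessarily the estimation procedure used in the study, and a statistics library would normally be used instead. All function names are hypothetical.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: x.size - k], x[k:]) / denom
                     for k in range(max_lag + 1)])


def fit_arima_p10(y, p=2):
    """Least-squares AR(p) coefficients on the first differences of y (d = 1, q = 0)."""
    dy = np.diff(np.asarray(y, dtype=float))
    # Column k holds the lag-k values dy[t-k] for t = p, ..., len(dy) - 1.
    lagged = np.column_stack([dy[p - k: dy.size - k] for k in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(lagged, dy[p:], rcond=None)
    return phi


def forecast_next(y, phi):
    """One-step-ahead forecast: last level plus the predicted next difference."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    p = len(phi)
    recent = dy[-1: -p - 1: -1]  # dy[-1], dy[-2], ..., dy[-p]
    return float(y[-1] + np.dot(phi, recent))
```

Given a series `y`, `sample_acf(np.diff(y), 10)` would give the kind of values plotted in an ACF figure, and `forecast_next(y, fit_arima_p10(y))` would give the next-step prediction.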
For the artificial neural network-based time-series methods, various architectures were built based on previous studies [35]. A total of six architectures were built; they are shown in Table 4 below.
The dense layers were adjusted on top of a time-series neural network layer (RNN, LSTM, or GRU) consisting of 32 nodes, and the results were compared. Architecture 1 is the most complex architecture in this study: it derives results through multiple dense layers after the time-series neural network layer and involves the largest amount of computation. Architecture 6 uses a time-series neural network layer followed by a simple dense layer and has the lowest computational load. All architectures were set up under the same conditions. Each neural network layer used a linear activation function. The loss function was the mean squared error, which was minimized using the Adam optimizer [38]. We used the Python language and the Keras library to build the artificial neural network models.
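The simplest configuration described above can be sketched in Keras as follows. This is a hedged illustration rather than the study's exact model: Table 4 is not reproduced here, so the single one-unit output layer, the default input width of four features, and the function name `build_architecture_6` are assumptions. The 32-node recurrent layer, the linear activations, the Adam optimizer, and the MSE loss follow the description in the text.

```python
from tensorflow import keras


def build_architecture_6(cell="gru", window=7, n_features=4, units=32):
    """Recurrent layer followed by one dense output layer (assumed layout)."""
    layer_types = {
        "rnn": keras.layers.SimpleRNN,
        "lstm": keras.layers.LSTM,
        "gru": keras.layers.GRU,
    }
    recurrent = layer_types[cell]
    model = keras.Sequential([
        keras.Input(shape=(window, n_features)),
        recurrent(units, activation="linear"),       # time-series layer, 32 nodes
        keras.layers.Dense(1, activation="linear"),  # assumed single output node
    ])
    model.compile(optimizer=keras.optimizers.Adam(), loss="mse")
    return model


model = build_architecture_6("gru")
model.summary()
```

Swapping the `cell` argument between `"rnn"`, `"lstm"`, and `"gru"` gives the three variants that are compared for each architecture.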
The model uses the data from the seven days before the prediction time to predict the value one day ahead. Many previous studies on cryptocurrency prediction use one week as the horizon for short-term prediction [39,40]. In addition, because a seven-day window reflects the movement of the past week, we set the input window for forecasting to seven days. To predict the value at the next time point, the model learns from the values predicted up to the current time point. In other words, the model iteratively uses the values predicted up to the current point in time to create a new learning model for predicting the next point in time. All training models use a linear activation function. The batch size was set to 1 and the number of epochs was set to 100.
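The seven-day input window described above can be built with a short helper such as the one below. It is a minimal sketch: the function name `make_windows` and the array layout (samples, time steps, features) are assumptions chosen to match what recurrent layers usually expect, and the toy data are not from the study.

```python
import numpy as np


def make_windows(features, target, window=7):
    """Pair each block of `window` consecutive days with the next day's target."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    if features.ndim == 1:
        features = features[:, None]  # treat a single series as one feature
    n = features.shape[0]
    if target.shape[0] != n:
        raise ValueError("features and target must have the same length")
    if n <= window:
        raise ValueError("series is shorter than the window")
    # Sample i covers days i .. i+window-1 and is labeled with day i+window.
    inputs = np.stack([features[t - window:t] for t in range(window, n)])
    labels = target[window:]
    return inputs, labels


# Toy series of ten days, for illustration only.
series = np.arange(10.0)
X, y = make_windows(series, series, window=7)
print(X.shape, y)
```

With arrays shaped this way, training with the settings stated in the text would correspond to a batch size of 1 and 100 epochs.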
Table 5, Table 6 and Table 7 show the results of predicting the log-return price of cryptocurrencies on the validation dataset based on the selected main features.
Table 5 shows the model’s prediction of the log-return price of Bitcoin. The RNN with Architecture 5 showed the smallest error between the actual value and the predicted value. The ARIMA (2, 1, 0) model, which is a traditional time-series prediction method, showed an error of 0.0422 based on MAE. This represents a relatively large error when compared to the artificial neural network. Therefore, as a model for predicting the log-return price of Bitcoin, the artificial neural network model is more suitable than the ARIMA model.
When comparing the artificial neural network techniques, RNN and GRU were more predictive than LSTM. LSTM tended to produce a large error compared to other artificial neural network methods. On the other hand, GRU had the smallest error among all the models, and it was confirmed through RMSE that the error for outliers was also low. The error rates of the configured architectures were mostly similar. However, Architecture 6 was the most efficient because the number of parameters used in the analysis was significantly smaller. Therefore, we utilized Architecture 6 to predict the Bitcoin log-return price for our test data.
Table 6 shows the prediction results for the log-return price of Ethereum. The GRU with Architecture 6 showed the smallest error between the actual and predicted values. The ARIMA (2, 1, 0) model, the traditional time-series prediction method, produced an MAE of 0.0442, a relatively large error compared to the artificial neural networks; its MSE and RMSE were also larger than those of the artificial neural network techniques. Therefore, the artificial neural network model is more suitable than the traditional time-series prediction model for predicting the log-return price of Ethereum.
When comparing the artificial neural network techniques, LSTM performed better for some architectures; in general, however, RNN showed higher predictive power than GRU and LSTM. As with the Bitcoin log-return prediction model, LSTM generally performed poorly. Among the performance indicators presented in Table 6, the RMSE values indicate an outlier effect when predicting the Ethereum log-return price. Therefore, the RNN models, which suppress the influence of outliers, showed good results. Similar to the Bitcoin log-return prediction results, all artificial neural network-based log-return prediction models presented in Table 6 showed lower error rates than ARIMA. In addition, Architecture 6 showed good performance. Therefore, we selected Architecture 6 as the best way to predict the log-return price of Ethereum.
Table 7 shows the models' predictions of the log-return price of Binance Coin. The prediction results for Binance Coin were superior to those for Bitcoin and Ethereum. The ARIMA (2, 1, 0) model produced an MAE of 0.0293 and, as in the other experiments, its error was relatively large compared to the artificial neural networks. Therefore, the artificial neural network model is more suitable than ARIMA for predicting the log-return price of Binance Coin. In addition, when comparing the artificial neural network techniques, RNN produced less error than LSTM and GRU, the same result as for the Ethereum log-return price prediction model. There was no significant difference in performance between architectures. This is similar to the result obtained when predicting the log-return price of Bitcoin, and Architecture 6 was selected based on its efficiency.
The model-specific prediction results for the test data are shown in Table 8. The test data were used to directly compare ARIMA with Architecture 6, the architecture selected based on the experimental results on the validation dataset. Architecture 6 consists of a combination of a time-series neural network layer and a simply structured dense layer. Structurally, this architecture entails a small amount of computation and exhibits a relatively low error rate. When ARIMA and Architecture 6 were compared, the artificial neural network performed well on all evaluation metrics. There were no significant differences from the results on the validation dataset shown in Table 5, Table 6 and Table 7, which suggests that the model was not overfitted. Therefore, the artificial neural network model provides excellent predictions of the log-return price of cryptocurrencies.
5. Discussion
The cryptocurrency market is growing very rapidly with expectations for high returns. However, since it is a highly volatile market, risk management is essential. It is therefore necessary to predict the volatility of the cryptocurrency market, which is associated with a high investment risk, to help investors manage their portfolios and better prepare for future risks. In this study, a more accurate log-return price prediction was attempted by adding a volatility feature to the existing log-return price prediction model. These results may be helpful to some investors.
In the study, we analyzed data for the 11 most heavily traded cryptocurrencies to predict the log-return price of cryptocurrencies. It is very important to analyze the cryptocurrencies jointly because the transaction volume of each cryptocurrency is not determined independently but is influenced by the transaction volume of other cryptocurrencies. For example, when the price of Bitcoin falls sharply, the price of other cryptocurrencies also falls. An attempt was made to increase the predictive power of the model by extending the existing research and using the results of the ARCH and GARCH models for volatility prediction as additional features [16]. Using these methods, a total of 33 features were constructed. Based on the newly constructed 33 features, important features affecting the log-return price of each cryptocurrency were selected using the Gini impurity technique. Feature selection is very important because it can greatly reduce the complexity of the model in the process of deriving the model's predicted values. Therefore, unlike previous studies, a predictive model was constructed from the major features that have a large impact on each cryptocurrency rather than from all the features. When the error rates of several artificial neural network-based models were verified, the artificial neural network-based time-series prediction models were found to be superior to the ARIMA model. For the log-return price of all three major cryptocurrencies, the error rate of the artificial neural network method was consistently lower than that of the ARIMA method.
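The idea behind Gini-impurity-based feature ranking can be sketched as follows. The source does not say how the impurity was computed for a continuous target, so this example makes a loud assumption: the target is reduced to a binary up/down label, and each feature is scored by the largest impurity decrease that a single threshold split on it achieves. Tree-ensemble libraries compute a related importance by averaging such decreases over many splits. All names and data here are hypothetical.

```python
import numpy as np


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a label array."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return float(1.0 - np.sum(p ** 2))


def best_gini_decrease(feature, labels):
    """Largest impurity decrease from one threshold split on a feature."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    parent = gini(labels)
    values = np.unique(feature)
    best = 0.0
    # Candidate thresholds: midpoints between consecutive distinct values.
    for thr in (values[:-1] + values[1:]) / 2:
        left = labels[feature <= thr]
        right = labels[feature > thr]
        child = (left.size * gini(left) + right.size * gini(right)) / labels.size
        best = max(best, parent - child)
    return best


def rank_features(X, labels, names):
    """Sort feature names by impurity decrease, largest first."""
    X = np.asarray(X, dtype=float)
    scores = {name: best_gini_decrease(X[:, j], labels)
              for j, name in enumerate(names)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Under this assumption, the top-ranked entries of `rank_features` would play the role of the selected main features for one cryptocurrency.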
Hundreds of cryptocurrencies are traded on Binance, the world's largest cryptocurrency exchange. Therefore, the interrelationships between cryptocurrencies are becoming more complex every day. However, only 11 cryptocurrencies with high trading volume were selected and analyzed. A limitation of the study is therefore that it did not reflect all the interrelationships between cryptocurrencies. In addition, since the volatility of cryptocurrency is very high, there is a limit to how accurately features related to cryptocurrencies can be selected. The investment value of cryptocurrencies is determined by many investors, and cryptocurrencies are affected by the stock market, bonds, and exchange rates; however, in this study, these features were not reflected in the prediction model. In future research, it will be important to use a sufficient quantity of cryptocurrency data to enable generalization of the correlations between cryptocurrencies and to present a log-return price prediction model that more accurately reflects volatility by also using macroeconomic features.
6. Conclusions
A traditional time-series prediction method and artificial neural network-based time-series prediction methods were presented using various cryptocurrency data, and the volatility of major cryptocurrencies was predicted using the models. The log-return price of cryptocurrency and ARCH- and GARCH-based volatility data were combined and processed into new data for future log-return prediction. In addition, by using the Gini impurity, major features affecting the log-return price of representative cryptocurrencies (i.e., Bitcoin, Ethereum, Binance Coin) were selected. As a result of the selection process, three important features were selected for each cryptocurrency. Using this feature selection method, the amount of calculation required can be greatly reduced, and, because only the main features are used for prediction, the proposed method represents a significant contribution to the prediction of the log-return price of cryptocurrencies.
The analysis was conducted using three important features and the log-return price data of cryptocurrencies. Based on the selected features, a traditional time-series prediction ARIMA model and artificial neural network-based RNN, LSTM, and GRU models were constructed. Since the prediction result of an artificial neural network-based model varies depending on the layer configuration and hyperparameter setting, various architectures were constructed with reference to previous studies [38]. When comparing various types of artificial neural network architectures, it was confirmed that a relatively simple model architecture was suitable. Although different cryptocurrencies have different architectures for optimal performance, the results confirmed that performance is degraded when the neural network becomes too deep.
The error rate of almost all the artificial neural network architectures presented was lower than that of the existing time-series model. When the log-return price of each cryptocurrency was predicted using artificial neural network models built in various architectural forms, the artificial neural network-based time-series prediction models showed less error than the ARIMA model.
It is widely accepted that cryptocurrencies are affected by macroeconomic features. Therefore, it is anticipated that volatility can be predicted more accurately if research is conducted that includes features of the stock market, bonds, and exchange rates. If additional cryptocurrency data and volatility features are utilized, it will be possible to present a more accurate and generalized prediction model. In addition, many time-series prediction methodologies other than those used in this study are being investigated. In future research, we will develop the approach using new macroeconomic features and various cryptocurrency features.