4.3.2. Hybrid LSTM Neural Network Modeling Process
To transform the first-order difference taxi trajectory into a supervised learning problem and to evaluate the prediction accuracy of the model, we divided its trajectory points into a training set and a testing set, where the first 80% (i.e., 5182 trajectory points) form the training set and the last 20% (i.e., 1296 trajectory points) form the testing set. The hybrid LSTM neural network modeling process is as follows:
(1) Network Structure Tuning
When tuning the network structure in Keras, the hybrid LSTM neural network structure shown in Figure 1 selects one or more LSTM combination layers, zero or more Dense combination layers, and one output combination layer, where each LSTM combination layer only includes an LSTM layer, each Dense combination layer only includes a Dense layer, and the output combination layer only includes a Dense layer. The parameters of the LSTM and Dense layers are set to the default values of Keras [46]. Additionally, the batch_size is set to 1, the number of neurons is set to 5, the number of epochs is set to 300, and the optimizer is set to Adam with a learning rate of 0.0008. The number of neurons, the number of epochs, the optimizer, and its learning rate are further tuned in the subsequent experiments.
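A minimal Keras sketch of this setup is given below. It is an illustration under our own assumptions rather than the authors' implementation: the input file name, the input shape of one timestep with two features (the latitude and longitude differences), the mean squared error loss, the tensorflow.keras argument names, and the helper name build_model are all assumptions.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam

# --- Data preparation for the first-order difference trajectory ---
coords = np.loadtxt("taxi_trajectory.csv", delimiter=",")  # hypothetical file: one (lat, lon) pair per row
diff = np.diff(coords, axis=0)                              # first-order difference trajectory

X, y = diff[:-1], diff[1:]               # supervised pairs: current difference -> next difference
split = int(len(X) * 0.8)                # first 80% for training, last 20% for testing
X_train, y_train = X[:split], y[:split]
X_test,  y_test  = X[split:], y[split:]
X_train = X_train.reshape((-1, 1, 2))    # Keras LSTM layers expect (samples, timesteps, features)
X_test  = X_test.reshape((-1, 1, 2))

# --- Configurable structure used for the tuning experiments ---
def build_model(n_lstm, n_dense, neurons=5, n_features=2):
    """n_lstm LSTM combination layers, n_dense Dense combination layers, one output Dense layer."""
    model = Sequential()
    for i in range(n_lstm):
        kwargs = {"input_shape": (1, n_features)} if i == 0 else {}
        # Every LSTM layer except the last returns sequences so it can feed the next LSTM layer.
        model.add(LSTM(neurons, return_sequences=(i < n_lstm - 1), **kwargs))
    for _ in range(n_dense):
        model.add(Dense(neurons))        # default Keras parameters, i.e., linear activation
    model.add(Dense(n_features))         # output combination layer
    model.compile(optimizer=Adam(learning_rate=0.0008), loss="mean_squared_error")
    return model

# One candidate structure, trained with the baseline parameters stated above.
model = build_model(n_lstm=2, n_dense=4)
model.fit(X_train, y_train, epochs=300, batch_size=1, verbose=0)
```

Varying n_lstm and n_dense in build_model corresponds to the structure-tuning experiments summarized in Figures 9 and 10.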
Figure 9 shows the distance percentage for different numbers of LSTM layers, and Figure 10 shows the distance percentage for different numbers of Dense layers.
From Figure 9 and Figure 10, the distance percentage is the largest (i.e., 0.6203) when the number of LSTM layers is 2, and the largest (i.e., 0.6211) when the number of Dense layers is 5. Therefore, in the subsequent experiments, the hybrid LSTM neural network structure shown in Figure 1 only selects two LSTM combination layers, four Dense combination layers, and one output combination layer, i.e., five Dense layers in total when the Dense layer of the output combination layer is counted.
(2) Neuron Number Tuning
The number of neurons in a neural network has a certain impact on the output of the model, and an appropriate number of neurons can improve its prediction accuracy. When tuning the number of neurons in Keras, the hybrid LSTM neural network structure shown in Figure 1 only selects two LSTM combination layers, four Dense combination layers, and one output combination layer, where each LSTM combination layer only includes an LSTM layer, each Dense combination layer only includes a Dense layer, and the output combination layer only includes a Dense layer. The above parameter values, except the number of neurons, are applied here.
Figure 11 shows the distance percentage for different numbers of neurons.
From Figure 11, the distance percentage is the largest (i.e., 0.6312) when the number of neurons is 6. Therefore, the number of neurons is set to 6 in the subsequent experiments.
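This sweep can be sketched as a simple loop, reusing the hypothetical build_model helper and data arrays from the sketch in step (1); the candidate range of neuron counts is an assumption, and the Keras test loss is only a stand-in for the distance percentage metric used in the paper.

```python
# Sweep over candidate neuron counts with the structure fixed to two LSTM
# and four Dense combination layers.
scores = {}
for neurons in range(2, 11):                       # assumed candidate range
    model = build_model(n_lstm=2, n_dense=4, neurons=neurons)
    model.fit(X_train, y_train, epochs=300, batch_size=1, verbose=0)
    # Stand-in score; the paper ranks candidates by the distance percentage instead.
    scores[neurons] = model.evaluate(X_test, y_test, verbose=0)
best_neurons = min(scores, key=scores.get)         # lower loss is better; the paper selects 6
```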
(3) Optimizer Tuning
In deep learning, an optimizer determines how gradient descent is performed to minimize the given objective function, and different optimizers can further improve performance on the target problem. Commonly used optimizers include Adam, RMSProp, Adagrad, Adamax, Nadam, Adadelta, Stochastic Gradient Descent (SGD), etc. [46]. When tuning the optimizers in Keras, the hybrid LSTM neural network structure shown in Figure 1 only selects two LSTM combination layers, four Dense combination layers, and one output combination layer, where each LSTM combination layer only includes an LSTM layer, each Dense combination layer only includes a Dense layer, and the output combination layer only includes a Dense layer. The above parameter values, except the optimizers and their learning rates, are applied here.
Figure 12 shows the distance percentage for different optimizers and learning rates.
According to Figure 12, when the optimizer is Adamax and its learning rate is 0.000032, the distance percentage is the largest (i.e., 0.6636). Therefore, the optimizer is set to Adamax with a learning rate of 0.000032 in the subsequent experiments.
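In Keras, each candidate is passed to model.compile as an optimizer object with an explicit learning rate. The sketch below, reusing the hypothetical build_model helper from step (1), instantiates the candidates at the best learning rates reported for Figure 12; the loss function remains an assumption.

```python
from tensorflow.keras.optimizers import (Adam, RMSprop, Adagrad, Adamax,
                                         Nadam, Adadelta, SGD)

# Candidate optimizers, each at the best learning rate reported for it in Figure 12.
candidates = {
    "Adam":     Adam(learning_rate=0.0000064),
    "RMSprop":  RMSprop(learning_rate=0.0000064),
    "Adagrad":  Adagrad(learning_rate=0.004),
    "Adamax":   Adamax(learning_rate=0.000032),
    "Nadam":    Nadam(learning_rate=0.000032),
    "Adadelta": Adadelta(learning_rate=0.02),
    "SGD":      SGD(learning_rate=0.0008),
}

for name, optimizer in candidates.items():
    model = build_model(n_lstm=2, n_dense=4, neurons=6)                 # structure and neurons fixed above
    model.compile(optimizer=optimizer, loss="mean_squared_error")       # override the builder's default Adam
    model.fit(X_train, y_train, epochs=300, batch_size=1, verbose=0)
```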
(4) Activation Function Tuning
When tuning the activation functions in Keras, the hybrid LSTM neural network structure shown in Figure 1 only selects two LSTM combination layers, four Dense combination layers, and one output combination layer. To further improve the prediction accuracy of the model, we added an Activation layer to each of the above combination layers, i.e., each LSTM combination layer includes an LSTM layer and an Activation layer, each Dense combination layer includes a Dense layer and an Activation layer, and the output combination layer includes a Dense layer and an Activation layer. The activation function of the Activation layer of each LSTM combination layer is the same as that of the Activation layer of each Dense combination layer, and is called the intermediate activation function. The activation function of the Activation layer of the output combination layer is called the output activation function. When tuning the intermediate activation function in Keras, the output activation function is set to the linear function shown in Equation (16), and the above parameter values are applied here.
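A hedged sketch of this structure with explicit Activation layers is given below. The layer counts, neuron number, and the Adamax setting follow the choices stated above, while the builder name, input shape, and loss function are our assumptions.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Activation
from tensorflow.keras.optimizers import Adamax

def build_model_with_activations(intermediate, output, neurons=6, n_features=2):
    """Two LSTM and four Dense combination layers, each followed by an Activation layer,
    plus an output combination layer (Dense + Activation)."""
    model = Sequential()
    # Two LSTM combination layers: LSTM + Activation each.
    model.add(LSTM(neurons, return_sequences=True, input_shape=(1, n_features)))
    model.add(Activation(intermediate))
    model.add(LSTM(neurons))
    model.add(Activation(intermediate))
    # Four Dense combination layers: Dense + Activation each.
    for _ in range(4):
        model.add(Dense(neurons))
        model.add(Activation(intermediate))
    # Output combination layer: Dense + Activation.
    model.add(Dense(n_features))
    model.add(Activation(output))
    model.compile(optimizer=Adamax(learning_rate=0.000032), loss="mean_squared_error")
    return model

# Candidate intermediate activations (e.g., "softsign", "tanh", "relu") are compared
# while the output activation stays fixed to the linear function of Equation (16).
model = build_model_with_activations(intermediate="softsign", output="linear")
```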
Table 4 shows the distance percentage for different intermediate activation functions.
From Table 4, when the intermediate activation function is Softsign, the distance percentage is the largest (i.e., 0.6644). Therefore, when tuning the output activation function in Keras, the intermediate activation function is set to Softsign, and the above parameter values are applied here.
Table 5 shows the distance percentage for different output activation functions.
From Table 5, when the output activation function is the linear function shown in Equation (16), the distance percentage is the largest (i.e., 0.6651). Therefore, the output activation function is set to the linear function shown in Equation (16) in the subsequent experiments.
(5) Dropout Tuning
When tuning the dropout rate in Keras, the hybrid LSTM neural network structure shown in Figure 1 only selects two LSTM combination layers, four Dense combination layers, and one output combination layer. An appropriate dropout rate can sometimes improve the prediction accuracy of the model; therefore, we added a Dropout layer to each LSTM combination layer and each Dense combination layer, i.e., each LSTM combination layer includes an LSTM layer, an Activation layer, and a Dropout layer, each Dense combination layer includes a Dense layer, an Activation layer, and a Dropout layer, and the output combination layer includes a Dense layer and an Activation layer. The intermediate activation function is set to Softsign, the output activation function is set to the linear function shown in Equation (16), and the above parameter values are applied here.
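The Dropout layers slot into the same combination-layer pattern; a minimal sketch of the two layer types with dropout is given below (the helper names are ours, the Softsign activation follows the setting above, and rate is the dropout rate being tuned). The output combination layer keeps only a Dense layer and an Activation layer, as described.

```python
from tensorflow.keras.layers import LSTM, Dense, Activation, Dropout

def add_lstm_combination_layer(model, neurons, rate, **kwargs):
    # LSTM combination layer with dropout: LSTM + Activation + Dropout.
    model.add(LSTM(neurons, **kwargs))
    model.add(Activation("softsign"))
    model.add(Dropout(rate))

def add_dense_combination_layer(model, neurons, rate):
    # Dense combination layer with dropout: Dense + Activation + Dropout.
    model.add(Dense(neurons))
    model.add(Activation("softsign"))
    model.add(Dropout(rate))
```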
Figure 13 shows the distance percentage for different dropout rates.
According to Figure 13, when the dropout rate is 0.1, the distance percentage is the largest (i.e., 0.6358), which is less than the 0.6651 shown in Figure 14. Hence, the dropout rate in each Dropout layer should be set to 0, i.e., no neurons need to be dropped for the first-order difference taxi trajectory.
(6) Epoch Number Tuning
In Figure 12, the distance percentage reaches 0.6551 when the optimizer is Adam with a learning rate of 0.0000064, 0.6543 for RMSprop with a learning rate of 0.0000064, 0.6551 for Adagrad with a learning rate of 0.004, 0.6636 for Adamax with a learning rate of 0.000032, 0.6481 for Nadam with a learning rate of 0.000032, 0.6543 for Adadelta with a learning rate of 0.02, and 0.6049 for SGD with a learning rate of 0.0008.
Generally, a low learning rate slows down learning. For Adagrad with a learning rate of 0.004, Adamax with a learning rate of 0.000032, and Adadelta with a learning rate of 0.02, the distance percentage rises to a high value more quickly, so we further tuned the number of epochs for these three settings in order to improve the training performance of the model. When tuning the number of epochs in Keras, the hybrid LSTM neural network structure shown in Figure 1 only selects two LSTM combination layers, four Dense combination layers, and one output combination layer, where each LSTM combination layer includes an LSTM layer, an Activation layer, and a Dropout layer, each Dense combination layer includes a Dense layer, an Activation layer, and a Dropout layer, and the output combination layer includes a Dense layer and an Activation layer. The intermediate activation function is set to Softsign, the output activation function is set to the linear function shown in Equation (16), and the above parameter values, except the number of epochs, are applied here.
Figure 14 shows the distance percentage for different numbers of epochs.
According to Figure 14, when the number of epochs is 120, the distance percentage for Adadelta with a learning rate of 0.02 reaches 0.6505, which is higher than that for Adagrad with a learning rate of 0.004 and that for Adamax with a learning rate of 0.000032. This is very close to the maximum distance percentage of 0.6636 obtained above, while the number of epochs is reduced from 300 to 120. Therefore, the optimizer is finally set to Adadelta with a learning rate of 0.02, and the number of epochs is set to 120.
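Putting the tuned choices together, a hedged sketch of the final configuration, reusing the hypothetical build_model_with_activations helper and data arrays from the earlier sketches, could look as follows; the loss function is still an assumption.

```python
from tensorflow.keras.optimizers import Adadelta

# Final tuned configuration: two LSTM and four Dense combination layers, six neurons,
# Softsign intermediate activation, linear output activation, no dropout,
# Adadelta with a learning rate of 0.02, 120 epochs, batch_size 1.
model = build_model_with_activations(intermediate="softsign", output="linear", neurons=6)
model.compile(optimizer=Adadelta(learning_rate=0.02), loss="mean_squared_error")
model.fit(X_train, y_train, epochs=120, batch_size=1, verbose=0)

# Predicted first-order differences on the test set.
predictions = model.predict(X_test, batch_size=1)
```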