4.3.2. Performance Indicators
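In the process of testing, the mean absolute percentage error (MAPE) and root mean square error (RMSE) are adopted as indicators to measure the performance of the prediction model [42]. MAPE and RMSE are calculated as shown in Equations (13) and (14), respectively:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{13}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{14}$$

In these equations, $y_i$ denotes the real bus driving time inferred from the GPS trajectory, $\hat{y}_i$ represents the bus running time predicted by the proposed T-LSTM model, and $n$ is the number of predicted samples.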
MAPE and RMSE are both commonly used to measure the difference between predicted and real values. MAPE reflects the percentage difference between predicted and real values, and a smaller MAPE indicates higher prediction accuracy. However, MAPE alone is not sufficient to judge the difference: a small percentage error can still correspond to a large absolute error when the real values are large. Therefore, RMSE, which measures the absolute difference in the same unit as the data, is introduced as a complementary indicator.
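To make the two indicators (mean absolute percentage error and root mean square error) concrete, here is a minimal Python sketch, assuming the real and predicted times are passed as two equal-length lists. The function names `mape` and `rmse` are our own and are not taken from the paper. The example at the bottom also illustrates the argument for adding RMSE: two predictions with the same MAPE can have very different absolute errors.

```python
import math
from typing import Sequence


def mape(real: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(real) != len(pred) or len(real) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is |y_i - y_hat_i| / |y_i|; a zero real value raises ZeroDivisionError.
    return 100.0 * sum(abs(y - p) / abs(y) for y, p in zip(real, pred)) / len(real)


def rmse(real: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error, in the same unit as the inputs."""
    if len(real) != len(pred) or len(real) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(real, pred)) / len(real))


if __name__ == "__main__":
    real = [10.0, 100.0]
    # Both predictions have a 10% MAPE, but the absolute error is 2 vs 20 units,
    # so RMSE separates them (about 1.41 vs about 14.14).
    print(mape(real, [12.0, 100.0]), rmse(real, [12.0, 100.0]))
    print(mape(real, [10.0, 120.0]), rmse(real, [10.0, 120.0]))
```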
4.3.3. Prediction of Congestion Time
Prediction results of congestion times for the six road sections during the morning and evening peak periods are shown in Table 4. During the morning peak period, the lowest and highest MAPE are 8.0% and 12.7%, respectively, which indicates that prediction accuracy is highest in section 3 and lowest in section 1. Meanwhile, the lowest and highest RMSE are 3.05 and 35, respectively, which indicates that the difference between prediction and reality is smallest in section 6 and largest in section 3. Section 3 thus has the lowest MAPE but the highest RMSE in the morning peak, which illustrates why the two indicators are used together. The average MAPE and RMSE are 11.25% and 14.91, respectively. During the evening peak period, the lowest and highest MAPE are 9.7% and 15%, respectively, which indicates that the prediction result of section 5 is the best and that of section 3 is the worst. The lowest and highest RMSE are 2.9 and 44.5, respectively, with the smallest difference in section 2 and the largest in section 4. In the evening peak, the average MAPE is 12.3% and the average RMSE is 14.57.
To better illustrate the experimental results, we extracted 90 congestion times for each road section. Figure 7 depicts the predicted and real congestion times of the six road sections during the morning and evening peak periods. The red curves depict the real congestion times of buses in the six road sections, and the blue curves represent the predicted congestion times. The figure shows that the trend of the predicted curve closely follows that of the real curve, which indicates that the predicted values reflect the changes in the real values well.
In summary, the T-LSTM model can accurately and stably predict the congestion times of the morning and evening peak periods, providing information on road status in advance and laying a foundation for calculating the congestion index and classifying congestion levels.
4.3.4. Classification of Congestion Levels
There are three steps for classifying congestion. First, the congestion index is calculated using the congestion times predicted by the T-LSTM model. Second, the average daily congestion indices of the morning and evening peaks are calculated. Third, the congestion levels of the morning and evening peaks are classified into five grades (better smooth, normal smooth, mild congestion, moderate congestion, and severe congestion) by three classification methods: equal interval, natural breakpoint, and geometric interval. To better present the distribution of congestion levels in the six sections, the proportion of each grade over the 13 predicted days is obtained, as shown in Figure 8, Figure 9 and Figure 10.
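Steps two and three can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the congestion index of step one is taken as already computed, records are hypothetical `(day, peak, index)` tuples, a larger index is assumed to mean heavier congestion, and the five grades are assumed to run from better smooth to severe congestion. The `breaks` argument stands for whatever boundaries one of the three classification methods produces.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean

# Hypothetical grade order, least to most congested (assumes a larger
# congestion index means heavier congestion).
GRADES = ["better smooth", "normal smooth", "mild congestion",
          "moderate congestion", "severe congestion"]


def daily_peak_averages(records):
    """Step 2: average the congestion indices of each (day, peak) pair.

    `records` is an iterable of hypothetical (day, peak, index) tuples,
    e.g. ("day-01", "morning", 1.42).
    """
    groups = defaultdict(list)
    for day, peak, index in records:
        groups[(day, peak)].append(index)
    return {key: mean(values) for key, values in groups.items()}


def assign_grade(value, breaks):
    """Step 3: map one daily average index to a grade.

    `breaks` holds the four ascending inner class boundaries produced by a
    classification method; a value equal to a boundary goes to the upper class.
    """
    if len(breaks) != len(GRADES) - 1:
        raise ValueError("five grades need exactly four inner breaks")
    return GRADES[bisect_right(breaks, value)]
```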
Figure 8 shows the proportion of the five congestion grades for the six sections during peak periods under equal interval classification. During the morning peak period, better smooth is the largest grade in road sections 1, 2, 4, 5, and 6, accounting for 34%, 35%, 27%, 42%, and 38%, respectively, while moderate congestion is the largest grade in road section 3, at 35%. The congestion proportions of the six road sections are 39%, 39%, 73%, 58%, 35%, and 47%. During the evening peak period, as in the morning peak, better smooth is the largest grade in sections 1, 2, and 3 (50%, 42%, and 38%), whereas severe congestion is the largest grade in the other sections (31%, 46%, and 41%). The congestion proportions are 31%, 50%, 19%, 54%, 69%, and 61%.
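Equal interval classification splits the range of the daily indices into five classes of the same width. The sketch below computes those boundaries and the resulting grade proportions. It assumes, as our own reading of the figures, that grades are ordered from least to most congested and that the congestion proportion is the combined share of mild, moderate, and severe congestion; neither assumption is stated in this section.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical grade order, least to most congested.
GRADES = ["better smooth", "normal smooth", "mild congestion",
          "moderate congestion", "severe congestion"]


def equal_interval_breaks(values, k=len(GRADES)):
    """The k - 1 inner boundaries that split [min, max] into k equal-width classes."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal, so the classes are undefined")
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def grade_proportions(values, breaks):
    """Share of days in each grade, plus the assumed congestion share.

    The congestion share is taken to be mild + moderate + severe congestion
    (an assumption about how the paper aggregates grades).
    """
    counts = Counter(bisect_right(breaks, v) for v in values)
    n = len(values)
    shares = {grade: counts.get(i, 0) / n for i, grade in enumerate(GRADES)}
    congestion = sum(shares[grade] for grade in GRADES[2:])
    return shares, congestion
```

Passing the 13 daily averages of one section and one peak period to `grade_proportions` yields per-grade shares of the kind plotted in Figure 8.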
Figure 9 illustrates the proportions obtained by the natural breakpoint classification method. During the morning peak period, the largest grade is normal smooth in road sections 1, 2, and 4 (27%, 32%, and 31%), moderate congestion in section 3 (31%), and better smooth in section 5 (31%), while mild and moderate congestion share the largest proportion (27% each) in section 6. The congestion proportions of the six sections are 58%, 53%, 66%, 61%, 46%, and 62%. During the evening peak period, the largest grade is normal smooth in sections 2 and 4 (27% each) and severe congestion in sections 1 and 5 (27% each), while mild and moderate congestion both account for 28% in sections 3 and 6. The congestion proportions in the evening peak are 58%, 58%, 60%, 54%, 69%, and 62%.
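This section does not spell out the natural breakpoint algorithm. Assuming it is the Jenks natural breaks method, which chooses class boundaries that minimise the total within-class sum of squared deviations, a small exact dynamic-programming version is sufficient for 13 daily values. This is a sketch of that objective, not necessarily the implementation used to produce Figure 9.

```python
def jenks_breaks(values, k=5):
    """Inner class boundaries minimising the total within-class squared deviation.

    Returns the first (smallest) value of every class except the lowest, so the
    result can be used with bisect_right: a value equal to a boundary falls into
    the upper class.
    """
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")

    # Prefix sums give the squared deviation of any slice xs[i:j] in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: smallest total deviation for splitting xs[:j] into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # the last class is xs[i:j]
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i

    # Walk back from the full range to recover where each class starts.
    starts, j = [], n
    for c in range(k, 0, -1):
        j = start[c][j]
        starts.append(j)
    starts.reverse()  # starts[0] is always 0
    return [xs[s] for s in starts[1:]]
```

For example, `jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], k=3)` returns `[10, 20]`, separating the three clusters. The triple loop costs O(k * n^2), which is negligible for 13 days.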
Figure 10 shows the results of geometric interval classification. During the morning peak period, the largest proportions are 24%, 24%, 24%, 31%, and 24% for sections 1, 2, 4, 5, and 6, corresponding to severe congestion, normal smooth, mild congestion, better smooth, and severe congestion, respectively, while better smooth, mild congestion, and moderate congestion each account for 23% in section 3. The congestion proportions are 58%, 53%, 58%, 62%, 49%, and 62%. During the evening peak period, the largest proportions of the six sections are 24%, 23%, 24%, 27%, 23%, and 24%, and the congestion proportions are 62%, 58%, 62%, 54%, 58%, and 61%.
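Geometric interval classification uses class boundaries that grow by a constant ratio rather than a constant step. The sketch below is a simplified geometric progression between the minimum and maximum index, assuming all indices are strictly positive. Dedicated GIS tools may use a more elaborate variant, so these boundaries may differ from those behind Figure 10.

```python
def geometric_interval_breaks(values, k=5):
    """Inner boundaries lo * r**i (i = 1..k-1) with r = (hi / lo) ** (1 / k).

    Each boundary is r times the previous one, so class widths also grow by
    the factor r. Requires strictly positive values.
    """
    lo, hi = min(values), max(values)
    if lo <= 0:
        raise ValueError("geometric intervals need strictly positive values")
    if hi == lo:
        raise ValueError("all values are equal, so the classes are undefined")
    ratio = (hi / lo) ** (1.0 / k)
    return [lo * ratio ** i for i in range(1, k)]
```

For indices spanning 1 to 32, the boundaries are approximately 2, 4, 8, and 16, whereas equal intervals over the same range would give 7.2, 13.4, 19.6, and 25.8. The geometric classes are therefore narrow at the low end and wide at the high end.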
The congestion proportions of the six road sections under the three classification methods are summarized in Table 5.
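In summary, comparing the three classification methods, geometric interval classification produces the most uniform distribution of grades and equal interval classification the least uniform. With five grades, a perfectly uniform distribution would assign 20% to each grade; the largest single-grade proportion ranges from 23% to 31% under geometric interval classification, from 27% to 32% under natural breakpoint classification, and from 27% to 50% under equal interval classification.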
4.3.5. Calculating Information Entropy
The results of the three classification methods in the previous section may not fully reflect the amount of information each provides. Therefore, the information entropy of the six sections is calculated separately for each of the three classification methods, as shown in Table 6. For every section, the information entropy under geometric interval classification is larger than under the other two methods; the gap is large relative to the equal interval method and small relative to the natural breakpoint method.
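Information entropy here is presumably Shannon entropy over the five grade proportions of a section. A minimal sketch follows, assuming base-2 logarithms (the base is not stated in this section) and proportions given either as fractions or as percentages.

```python
import math


def information_entropy(proportions, base=2.0):
    """Shannon entropy H = -sum(p * log(p)) over the grade proportions.

    Proportions may be fractions or percentages; they are renormalised to sum
    to 1, and empty grades (p = 0) contribute nothing.
    """
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must have a positive sum")
    h = 0.0
    for p in proportions:
        if p > 0:
            q = p / total
            h -= q * math.log(q, base)
    return h
```

With five grades, H is at most log2 5 (about 2.32 bits), reached only when every grade holds 20%, and it is 0 when all days fall into a single grade. This is why the more uniform distributions produced by geometric interval classification also carry the largest entropy.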
Table 7 shows the total information entropy of the three classification methods during the morning and evening peaks. Geometric interval classification yields the largest information entropy in both the morning and evening peaks, equal interval classification yields the smallest, and natural breakpoint classification lies in between.
To sum up, comparing the information entropy of the three classification methods shows that the same data can be classified very differently, especially by the equal interval and geometric interval methods, and that geometric interval classification yields the largest information entropy in all sections. Therefore, geometric interval classification performs better than the others in terms of information entropy.