A two-stage data-driven method was designed to predict the lateral motion of the preceding vehicle based on the dataset processed in the previous section. The steps are shown in Figure 3. The vehicle mobility data processed in the previous section are taken as the input to the method. Two ensembled decision trees were trained to provide the initial lateral motion prediction results. Based on the driver behavior (driving manner) classification, the initial results were given different weights, and the combination of the weighted results was taken as the final prediction.
To be more specific, in this paper, we used one day of data from a two-month dataset as the training dataset. Our training set contained mobility data of 51 unique vehicles over 587,812 time instances (16.33 h of driving) [5]. One bagging decision tree [22] and one random under-sampling (RUS) boosted decision tree [23] were trained on the dataset to predict lateral motion. A GMM was used to cluster drivers based on the erraticness of their driving [24]. The dataset for the clustering combined the training data and the testing instance. In general, the model created a voting scheme between the two ensembles, weighted by the driver-type probabilities. In this way, the model skewed the classification away from false negatives for more erratic drivers.
4.1. Ensembled Decision Tree Classification
Considering the skewed nature of the dataset, we conducted multiple experiments based on different classification methods (three runs for each method), including the support vector machine (SVM), the basic decision tree (DT), and several ensemble decision trees (EDTs). The bagging decision trees and the RUS boosted decision trees demonstrated better performance in cross-validation.
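For illustration, a minimal sketch of such a classifier comparison could look as follows, assuming scikit-learn and the imbalanced-learn package. The feature matrix X and labels y below are random placeholders standing in for the processed mobility features and the {−1, 0, 1} lateral-motion labels, and the hyperparameters are illustrative rather than the settings used in this work.

```python
# Minimal sketch of the classifier comparison; X and y are random placeholders
# for the processed mobility features and the {-1, 0, 1} lateral-motion labels.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from imblearn.ensemble import RUSBoostClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))
y = rng.choice([-1, 0, 1], size=1000, p=[0.1, 0.8, 0.1])   # skewed towards Won't Turn

candidates = {
    "SVM": SVC(),
    "Decision tree": DecisionTreeClassifier(),
    "Bagged trees": BaggingClassifier(n_estimators=50),      # decision-tree base learners by default
    "RUS boosted trees": RUSBoostClassifier(n_estimators=50),
}

for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=10)               # ten-fold cross-validation
    print(f"{name}: mean accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```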
4.1.1. Bagging Decision Trees
The term “bagging” is an acronym of “bootstrap aggregating” [22]. The learning scheme was created to improve the classification accuracy of traditional algorithms by using a bootstrap [25] method to randomly sample the training dataset when training multiple weak learners (decision trees in this instance) and to make a prediction based on the vote or average of the results of the decision trees trained on the sampled datasets. Take $D = \{(x_i, y_i)\}_{i=1}^{N}$ as the training dataset, where $y_i$ is the label or class of the $i$-th training data instance $x_i$. A normal DT is trained on all the training data to make future predictions of the testing data $x_{\mathrm{test}}$, as shown in Equation (7):
$$\hat{y}_{\mathrm{test}} = h_{\mathrm{DT}}(x_{\mathrm{test}}), \tag{7}$$
where $h_{\mathrm{DT}}$ denotes the decision tree trained on $D$.
Within the bagging decision trees algorithm, bootstrap samples are repeatedly and randomly drawn, with replacement, from the original training dataset when training each tree in the ensemble. The final prediction is then obtained as a vote among the DTs in the ensemble. For this work, the labels include {Turn Left, Won’t Turn, Turn Right}, i.e., $y_i \in \{-1, 0, 1\}$.
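The bootstrap-and-vote scheme described above can be sketched as follows; this is an illustrative implementation under assumed placeholder data (X_train, y_train), not the exact configuration used in this work.

```python
# Illustrative bagging: each tree is fit on a bootstrap resample drawn with
# replacement from the training set, and the ensemble predicts by majority vote.
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 6))                 # placeholder mobility features
y_train = rng.choice([-1, 0, 1], size=500)          # -1 Turn Left, 0 Won't Turn, 1 Turn Right

trees = []
for _ in range(25):
    # Bootstrap: draw n indices with replacement from the original training set.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx]))

def bagged_predict(x):
    """Majority vote over the individual decision trees."""
    votes = [int(t.predict(x.reshape(1, -1))[0]) for t in trees]
    return Counter(votes).most_common(1)[0][0]

print(bagged_predict(X_train[0]))
```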
The adaptive synthetic (ADASYN) sampling approach [26] was utilized to deal with the imbalanced training data. The algorithm generated more synthetic data for the minority classes, which were Turn Left and Turn Right in this case. We used ADASYN to generate synthetic data for the Turn Left class against the Won’t Turn class examples and for the Turn Right class against the Won’t Turn class examples, respectively. The combination of these two new datasets formed the actual training dataset. Ten-fold cross-validation results of the bagging ensemble based on the training dataset processed by ADASYN are shown in Table 2.
In this table, the false negative rates are more than 43% for Turn Left and around 59% for Turn Right, which was due to the high disparity between classes. In other words, the bagging trees ensemble tended to be biased conservatively towards a no-turn prediction.
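A hedged sketch of the pairwise ADASYN step could look as follows, assuming imbalanced-learn's ADASYN implementation; the pairwise_adasyn helper and the placeholder data are illustrative assumptions rather than the exact procedure used here.

```python
# Minimal sketch of the pairwise ADASYN step: each turning class is oversampled
# against the majority Won't Turn class, and the two results are merged.
import numpy as np
from imblearn.over_sampling import ADASYN

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 6))
y_train = rng.choice([-1, 0, 1], size=1000, p=[0.1, 0.8, 0.1])   # skewed towards Won't Turn

def pairwise_adasyn(X, y, minority_label, majority_label=0):
    """Oversample one turning class against the Won't Turn class only."""
    mask = np.isin(y, [minority_label, majority_label])
    X_res, y_res = ADASYN(random_state=0).fit_resample(X[mask], y[mask])
    keep = y_res == minority_label              # keep the enlarged minority class
    return X_res[keep], y_res[keep]

X_left, y_left = pairwise_adasyn(X_train, y_train, minority_label=-1)    # Turn Left vs Won't Turn
X_right, y_right = pairwise_adasyn(X_train, y_train, minority_label=1)   # Turn Right vs Won't Turn

# The combined dataset keeps the original Won't Turn examples plus both
# oversampled turning classes, forming the actual training set.
X_bal = np.vstack([X_train[y_train == 0], X_left, X_right])
y_bal = np.concatenate([y_train[y_train == 0], y_left, y_right])
```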
4.1.2. Random Under-Sampling Boosted Decision Trees
The RUS boosted decision trees were utilized to address the skewness of the training data. The idea was to down-sample the training dataset using RUS so that the amount of minority class data was comparable to that of the majority class data. The ratio between the classes was predefined so that the numbers of observations of the three classes were equal to each other. The weak learners we used were normal decision trees. In each iteration, the under-sampling was conducted based on the predefined ratio, and the weight for each instance of the sampled training dataset was calculated. The sampled dataset and the corresponding weights were used to train one weak learner. According to its error, the weight for the current weak learner was computed, and the weights of the individual instances were adjusted based on it. After a predefined number of iterations, multiple weak learners had been generated from training datasets with more class-balanced data. For each testing instance, the prediction was the weighted vote of the weak learners’ classification results. The RUS boosted trees ten-fold cross-validation results are shown in
Table 3. In this table, the false negative rates are greatly improved, and the RUS boosted trees provided effective prediction of lateral motion. It is to be noted that the false positive cases matter here: this method tends to be more aggressive with its positive (−1, 1) labels, at the expense of the false positive rate increasing from 2.33% to 13.81% compared with the bagging method, which lowers its positive predictive value.
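For reference, a minimal sketch of the RUS boosting stage using imbalanced-learn's RUSBoostClassifier is shown below; its default strategy under-samples every class down to the minority class size at each boosting round, which matches the equal-class-count ratio described above. The data and hyperparameters are placeholders.

```python
# Minimal sketch of the RUS boosted trees stage; X_train/y_train are placeholder
# stand-ins for the processed mobility features and lateral-motion labels.
import numpy as np
from imblearn.ensemble import RUSBoostClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 6))
y_train = rng.choice([-1, 0, 1], size=1000, p=[0.1, 0.8, 0.1])

rusboost = RUSBoostClassifier(n_estimators=100, learning_rate=0.1, random_state=0)
rusboost.fit(X_train, y_train)            # each round: under-sample, fit a tree, reweight instances
y_rus = rusboost.predict(X_train[:5])     # prediction = weighted vote of the weak learners
print(y_rus)
```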
4.2. Gaussian Mixture Models Clustering
The two ensemble decision tree algorithms exhibited two opposite biases in their prediction results, and we observed such biases in several different learning schemes. To secure safer trips, we exploited these biases when making predictions for more erratic drivers, given that they have a greater tendency to change their current driving status. Since the bagged trees tended to provide more conservative predictions, i.e., to predict a lack of lateral motion when it did occur (false negatives), it reduced risk to place less trust in their results when facing more erratic drivers. At the same time, since the RUS boosted trees tended to provide more aggressive predictions, i.e., to falsely predict turns (false positives), it reduced risk to place more trust in their results when the front driver drives in a more erratic manner.
Multiple well-developed studies have been conducted to cluster the behaviors of drivers based on their aggressiveness. Kanarachos et al., in 2018, summarized the features used to implement such clustering, including acceleration and smoothness (variance of acceleration) [27]. The harsh acceleration and harsh cornering behaviors of drivers mentioned in [28] are widely used for driver behavior classification, and their connections with vehicle lane changes have also been revealed. Therefore, the relevant data in the current dataset were assessed and corresponding metrics were selected for the clustering.
As observed in the training dataset, the mobility data showed a fairly clear boundary between two groups of drivers. We took two features, the mean of the longitudinal acceleration and the mean of the jerk, as an example and plotted them in two dimensions in Figure 4. It can be seen that there is a group of vehicles that have comparatively small or even negative mean values of jerk. On the other hand, the other group of vehicles, or more specifically their drivers, tend to use more variable and aggressive acceleration actions, illustrated by the increase in acceleration (positive and high jerk). Given that the mean acceleration can be determined by different driving scenarios and environments, which can be encountered by any driver, it is more reasonable to regard driving manners with more frequent actions and a tendency to increase the acceleration as more erratic. Therefore, after analyzing the physical meaning of the data, we clustered the drivers into two categories, consistent (the former group) and erratic (the latter group), and this provided a theoretical basis for the further processing of the ensemble trees’ predictions.
In order to combine the advantages of the two algorithms, we utilized a GMM to assign the front driver to one of the two clusters. The clusters of the GMM are represented by two different Gaussian distributions characterized by distinct expectations and variances. Here, we use $\mathcal{N}(\mu_1, \sigma_1^2)$ as the distribution for the consistent drivers’ cluster and $\mathcal{N}(\mu_2, \sigma_2^2)$ as the distribution for the erratic drivers’ cluster. The clustering problem is thus transformed into finding the distributions of the observations in the dataset, which means finding $\mu_1$, $\sigma_1$, $\mu_2$, and $\sigma_2$.
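A minimal sketch of this two-component GMM clustering, using scikit-learn's GaussianMixture, is given below; the per-driver feature values are illustrative placeholders rather than values from the dataset, and identifying the erratic component by its larger mean jerk is an assumption consistent with the discussion above.

```python
# Hedged sketch of a two-component GMM over per-driver features
# (mean longitudinal acceleration, mean longitudinal jerk).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
driver_features = np.vstack([
    rng.normal([0.2, -0.01], [0.05, 0.02], size=(30, 2)),   # stand-in "consistent" drivers
    rng.normal([0.6, 0.08], [0.10, 0.04], size=(20, 2)),    # stand-in "erratic" drivers
])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(driver_features)

# Which fitted component is "erratic" is not fixed by index order; here it is
# identified as the component with the larger mean jerk (an assumption).
erratic_idx = int(np.argmax(gmm.means_[:, 1]))

# Posterior probabilities for a new front driver: they sum to one and become
# the weights p1 (consistent) and p2 (erratic) in the final vote.
front_driver = np.array([[0.5, 0.05]])
posteriors = gmm.predict_proba(front_driver)[0]
p_erratic = posteriors[erratic_idx]
p_consistent = 1.0 - p_erratic
```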
The clustering results based on the training samples are shown in Figure 5a,b. In Figure 5a, the GMM is applied to the mean of the longitudinal acceleration versus the mean of the longitudinal jerk within the original training dataset. We also used principal component analysis (PCA) to analyze the training data; the GMM applied to the PCA-processed data is shown in Figure 5b. Using the clustering model based on the original data, we also plotted the grouping characteristics of multiple kinematic variables, as shown in Figure 6. According to these plots, there exist two groups of driving behaviors, and based on the physical meaning of these kinematic variables, we categorize them into two classes: more consistent drivers and less consistent drivers.
The algorithm provides two probabilities, i.e., $p_1$ and $p_2$, that indicate the chances that the driver belongs to each of the two clusters or distributions. Since the two posteriors indicate the probabilities of the driver belonging to the respective clusters, and they sum to one, they are naturally suitable for weighting the two predictions made by the ensembled methods to form a new average. Take $p_1$ to be the probability that the front driver belongs to the consistent drivers’ cluster and $p_2$ to be the probability that the front driver belongs to the erratic drivers’ cluster. The final prediction is shown in Equation (8):
$$\hat{y} = Q\left(p_1\,\hat{y}_{\mathrm{bag}} + p_2\,\hat{y}_{\mathrm{RUS}}\right), \tag{8}$$
where $\hat{y}$ is the final prediction; $\hat{y}_{\mathrm{bag}}$ and $\hat{y}_{\mathrm{RUS}}$ are the predictions made by the bagging DTs and the RUS boosted DTs, respectively; and $Q(\cdot)$ is the quantize function that maps the real-valued weighted vote onto the set $\{-1, 0, 1\}$, which means turn left, won’t turn, and turn right.
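A compact sketch of this combination step is given below; the quantization threshold of 0.5 is an illustrative assumption, as the exact form of the quantize function is not reproduced here.

```python
# Sketch of the final combination: the bagged-tree and RUS-boosted predictions
# are weighted by the GMM posteriors and quantized back onto {-1, 0, 1}.
def quantize(value, threshold=0.5):
    """Map a real-valued weighted vote onto {-1, 0, 1} (assumed threshold)."""
    if value <= -threshold:
        return -1
    if value >= threshold:
        return 1
    return 0

def final_prediction(y_bag, y_rus, p_consistent, p_erratic):
    # Trust the conservative bagged trees more for consistent drivers (p1) and
    # the more aggressive RUS boosted trees more for erratic drivers (p2).
    return quantize(p_consistent * y_bag + p_erratic * y_rus)

# Example: an erratic front driver (p2 = 0.7) whose RUS boosted prediction is Turn Right.
print(final_prediction(y_bag=0, y_rus=1, p_consistent=0.3, p_erratic=0.7))   # -> 1
```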
The flow chart of the two-stage method is shown in
Figure 7.