1. Introduction
Over the first two decades of the twenty-first century, the use of mobile devices, such as smartphones with global positioning system (GPS) functions, has spread rapidly. Consequently, location data obtained from these devices via GPS, together with the user attribute data attached to them, are now being collected on a large scale, with the expectation that these data will be utilized for urban planning and the analysis of human activity, among other applications [
1]. Location data and their associated data are regarded as the trajectories of mobile device users. There has been much research on classification and prediction models for user trajectories [
2,
3,
4,
5]. This study mainly focuses on classifying user groups based on location data and their associated data, as obtained from mobile devices.
Many studies have aimed at improving the accuracy of trajectory classification using location information consisting of spatiotemporal dimensions obtained from mobile terminals and associated semantic information [
6,
7,
8,
9,
10,
In particular, Ferrero et al. proposed MasterMovelets, which characterizes trajectories with semantic dimensions via a uniform series of procedures [
12]. However, whereas these studies assume that the semantic dimension is attached to the trajectory, the semantic information associated with the location data acquired from mobile devices is often missing. In this context, methods for trajectory similarity computation that do not require semantic dimensionality information other than spatial–temporal dimensionality have been proposed by Han et al. and Zhou et al. [
13,
14]. In addition, Endo et al. proposed a stacked denoising autoencoder (SDA) [
15] based method for extracting trajectory features for transportation mode classification [
16]. However, these methods do not assume that trajectories are sparse in the spatial–temporal dimensions. Because mobile terminal-based location information is acquired only through irregular events, the resulting trajectories can be very sparse in spatial–temporal space.
Many studies have focused on trajectory classification using dense or semantically rich data. However, location data generated by mobile devices often lack semantic information and exhibit spatial–temporal sparsity. This raises the challenge of effectively utilizing sparse trajectories for applications such as user group classification and mobility analysis. Addressing this issue is crucial for harnessing the potential of trajectory data in fields such as urban planning, marketing analysis, and public policy. In addition, increasing data integrity can significantly enhance the usability of location data for these applications.
For these reasons, in our past work, we developed a sparse trajectory feature based on spatial–temporal information for classifying user age groups [
17]; however, that work presented only a superficial analysis limited to binary classification and did not examine the applicability of the features in depth. Furthermore, it focused solely on analyses based on the support vector classifier (SVC) [
18,
19] model and did not include comparisons with other machine learning (ML) models and trajectory features based on existing methods. Therefore, this study provides a detailed analysis of whether user groups can be estimated using only a sparse spatial–temporal trajectory without considering prior knowledge and past historical data associated with mobile device users. In particular, we analyze the feature based on [
17], which is extracted from users’ location data using point-type dynamic population (PTDP) data [
20] obtained from the GPS function of mobile devices. Moreover, we adopt multiple types of ML models that learn the features for six-class classification and conduct a detailed analysis. Additionally, by evaluating the user age group classification performance of these ML models, we verify the effectiveness of features based on sparse spatial–temporal trajectories for user age group estimation.
Each PTDP data point has an ID for identifying a mobile device by day; linked to this ID are the attributes registered by the user of the device and the location data from the device. Although location data can be acquired in one-minute units, most of the data for a single day are missing because the acquisition timing is determined by events, such as the user opening a specific application on the device. Thus, we use a simple method to impute the missing location data and regard the imputed data as the user trajectory, making the data length uniform. Furthermore, for this study, we select age groups as the specific user groups subjected to classification and assume that the age groups can be categorized into six classes: 0 to 19, 20s, 30s, 40s, 50s, and 60s and over.
Typically, mobile users’ location data, which consist of longitude and latitude, are expressed in the geographic coordinate system. In other words, the location data that are part of a trajectory are represented as points in the region [180° W–180° E, 90° S–90° N]. Because using such trajectories as features is not realistic, we need to summarize a trajectory into feature vectors consisting of a finite number of dimensions. Hence, we generate
n-dimensional feature vectors by assigning
n representative points (RPs) to a target region and aggregating the user locations into RPs. A RP is generated using the Gaussian mixture model (GMM), which assumes a Gaussian mixture distribution (GMD) for the population distribution in a target region. In particular, the GMD is generated from the raw location data of all users in a target period, and we regard the centroids of each Gaussian distribution as the RPs for that distribution. Thus, the RPs are selected in locations where the population is concentrated. In this way, each user trajectory is aggregated into RPs, and
n-dimensional feature vectors for classifying user age groups are generated. We employed ML models that use the
n-dimensional feature vector as input and output an age group linked to the mobile users. For the ML models, we employed SVC, random forest (RF) [
21], and deep neural network (DNN) [
22] models, and for the hyperparameter tuning of these ML models, we used fivefold grid search cross-validation (GS+CV). With feature vectors generated using GMD in this manner, it is possible to construct a classification model that uses only spatial–temporal information without complicated preprocessing, which differs for each target region. In addition, this study aims to verify the effectiveness of features based on the GMM for classifying sparse trajectories without semantic information. To achieve this goal, a comparison is conducted with existing feature extraction methods for trajectory data. In particular, the comparison focuses on features extracted using the improved DNN (IDNN), developed by Endo et al. [
16], which is based on an SDA [
15]. The IDNN does not assume that the trajectory is sparse, but part of this feature extraction method can be applied to our problem settings. This comparison allows the relative performance and applicability of the GMM-based features for sparse trajectories, proposed in our past work, to be evaluated.
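To make the feature construction concrete, the following is a minimal sketch of the GMM-based procedure described above, assuming that each trajectory is an array of (longitude, latitude) points of equal length; the aggregation rule used here (the normalized share of a trajectory's points assigned to each RP) and all function names are illustrative assumptions rather than the exact implementation evaluated in this study.

```python
# Illustrative sketch of GMM-based RP generation and trajectory aggregation (not the authors' code).
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_representative_points(all_points, n_components, seed=0):
    """Fit a GMD to the pooled raw location data of all users; the component means act as RPs."""
    return GaussianMixture(n_components=n_components, random_state=seed).fit(all_points)

def trajectory_to_feature(trajectory, gmm):
    """Aggregate one trajectory (shape (T, 2)) into an n-dimensional vector:
    the fraction of its points assigned to each RP (GMD component)."""
    labels = gmm.predict(trajectory)
    counts = np.bincount(labels, minlength=gmm.n_components)
    return counts / len(trajectory)

# Usage: pool all raw points, fit the GMD, then convert each user trajectory to a feature vector.
# gmm = fit_representative_points(np.vstack(trajectories), n_components=30)  # 30 is a placeholder
# X = np.array([trajectory_to_feature(t, gmm) for t in trajectories])
```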
In our experiments, we demonstrate the performance of three ML models trained with GMM-based features using PTDP data acquired between January and June 2021 in Narashino, Chiba, Japan. Because the number of GMD components, which determines the feature vector's dimension, can significantly influence the classification accuracies of the ML models, we conduct a sensitivity analysis on the number of components. In addition, we evaluate the accuracy and generalization performance of the combinations of the three ML models with GMM- and IDNN-based features with respect to the missing value ratio, which quantifies the sparsity of trajectories. The ability of GMM-based features computed from sparse trajectories to classify user groups, and the effect of the spatial–temporal sparsity of user trajectories on classification accuracy, are examined via a comparison with IDNN-based features. As a result, this study demonstrates that GMM-based features exhibit greater robustness to trajectory sparsity and higher generalization performance than the features extracted by the IDNN. These findings highlight the superior capability of GMM-based features in classifying user age groups from sparse trajectories compared with existing trajectory-based features, making them a promising approach for scenarios where only spatial–temporal information is available.
2. Literature Review
This section reviews relevant research studies to highlight the originality and practicality of our GMM-based feature extraction method. In this work, we treat age group classification as an example of user group classification using a sparse trajectory.
Traditionally, age group classification tasks have relied heavily on human biometric data. For instance, various age estimation techniques based on facial images have been developed over the years [
23,
24,
25]. Additionally, Saxena et al. introduced an age group estimation method leveraging fingerprint data, while Nabila et al. proposed using gait patterns for the same purpose [
26,
27]. Beyond biometric-based approaches, alternative methods have also emerged. For example, Guimaraes et al. developed a strategy for classifying age attributes based on user-generated text on social media platforms [
28].
Despite this wealth of research on age group classification, direct applications of trajectory data to age group classification are rare. Thus, we focus on research studies that, although not explicitly designed for user age group classification, can be adapted for this purpose. This section is organized into two key aspects: independence from semantic information and applicability for sparse trajectories, with potential applications and limitations of each method being discussed in detail.
2.1. Trajectory Analysis Using Semantic Information
Many studies have proposed trajectory classification methods that utilize data with both spatial–temporal and other semantic dimensions to increase accuracy. Wang et al. proposed an estimation method for the demographics of user groups based on trajectories obtained by analyzing wireless access points and prior knowledge of mobile users [
29]. Montasser et al. also proposed a method for predicting the demographics of geographic units by analyzing geotagged tweets [
30]. Many other methods have also been proposed, corresponding to classification or prediction models for user groups based on additional geographic and social network information [
6,
7,
8,
9,
10,
11]. However, these methods must process a considerable amount of prior knowledge data linked to mobile users and perform text analyses on posts in social network services.
In this context, Ferrero et al. proposed MasterMovelets, which designs features for trajectories with semantic dimensions, excluding spatial–temporal dimensions [
12]. MasterMovelets can be used to extract features from spatial–temporal trajectories enriched with other semantic dimensions via a uniform series of procedures. This method increases classification accuracy by focusing on local subtrajectories within a target trajectory. Most of the research introduced so far assumes that trajectories have semantic dimensions in addition to spatial–temporal dimensions and aims to increase the classification accuracy for such trajectories. However, the user information linked to mobile device location data tends to have many missing values because of the nonuniformity of users’ registration statuses. Therefore, given that homogeneous semantic information is not always attached to all user trajectories, these methods cannot be applied to such trajectories.
2.2. Trajectory Analysis Using Only Spatial–Temporal Information
2.2.1. Similarity-Based Methods
Dynamic time warping (DTW) [
31], a traditional similarity measurement method for sequential data, can measure the similarity between trajectories on the basis of spatiotemporal information in the context of trajectory classification. The similarity of trajectories can also be utilized for classifying user attributes associated with trajectories. Although a variety of extended DTWs have been proposed [
32,
33], these methods have a high computational cost. By contrast, Han et al. proposed a highly accurate prediction method for the similarity of trajectory patterns composed of only spatial–temporal information based on a graph-based approach [
13]. Zhou et al. developed a noise-robust method for calculating the similarity of trajectories and solved data sparsity for the search space for the same problem setting [
14]. However, these methods do not assume the spatial–temporal sparsity of trajectories based on mobile device location data. In practice, location data from mobile devices are obtained almost entirely through irregular events, such as the user opening an application. In addition, to preserve anonymity, location data often omit microscopic movement patterns near users’ residences. Trajectories consisting of such location data are therefore very sparse.
2.2.2. Trajectory Prediction and Recovery for Sparse Trajectory
Wang et al. proposed a model for predicting the future mobility of mobile users using such sparse trajectories based only on spatial–temporal dimensions [
34]. Their model learns the trajectories of heterogeneous users accumulated in the past and predicts the future movements of users on the basis of their similarity with past users’ data. However, that study focused only on predicting spatial–temporal information about how users move from one point to another; it did not aim to estimate user groups, i.e., information beyond the spatial–temporal dimensions.
If it is possible to recover a complete trajectory for a sparse trajectory that lacks information, conventional methods could be applied to user-group classification. Various methods for such trajectory recovery have been proposed [
35,
36,
37,
38]. Xia et al. developed an attentional neural network-based model that recovers sparse trajectories by learning past accumulated historical trajectory data of users [
35]. Sun et al. also proposed a trajectory recovery model for similar problem settings using a graph neural network, and both models demonstrate high performance; however, these models do not assume a new user who does not store past historical data [
36]. For this problem, Si et al. proposed a new method that mitigates the cold start problem of such new users by using a transformer [
37]. However, these methods require large amounts of historical data linked to specific users. Because the unique IDs in the mobile location data covered by this study are reset on a daily basis, such historical data are unavailable, and these methods cannot be applied to our problem settings.
2.3. Feature Extraction Applicable to Sparse Trajectories Without Semantic Information: A Comparative Baseline
Endo et al. proposed a feature extraction method for classifying the transportation modes of trajectories using an SDA [
15,
16]. In this method, trajectories are converted into images and flattened into vectors, which are then used as inputs for a DNN constructed via SDA. Fine-tuning is performed using annotated labels, and the resulting improved DNN (IDNN) is employed to extract features from the trajectories. The semantic information associated with the trajectories is subsequently added, enabling high-accuracy transportation mode classification. In this study, we target trajectories that lack semantic information; however, the feature extraction method using the IDNN can still be applied. On the other hand, because IDNN-based feature extraction relies heavily on the shape of trajectories, it assumes dense trajectory data. Therefore, its performance on sparse trajectories remains uncertain. In our experiments, we compare the performance on sparse trajectories using features extracted by our proposed method and those extracted by the IDNN.
3. Trajectory Data
3.1. PTDP Data
The PTDP data used in this study are obtained from the GPS function of mobile devices and are provided by Agoop Corp. (Tokyo, Japan) [
20]. Agoop Corp., the provider of these data, is a company that offers location-based services and is 100% owned by Softbank, one of the major telecommunications carriers in Japan. PTDP data are used for various purposes, such as urban planning, marketing analysis, and public policy development [
39,
40,
41,
42]. Accurately estimating user groups, e.g., age groups, can significantly enhance the usability of PTDP data in these research areas, enabling more precise analyses and evidence-based decision making. The columns of the PTDP data table, which are used in this research study, consist of the following attributes:
Daily ID: User ID associated with the mobile device. This ID is reset on a daily basis.
Timestamp: time at which the location information was obtained, recorded as yyyy-mm-dd HH:MM.
Longitude: longitude of the location data.
Latitude: latitude of the location data.
Age group: registered age group of the user, consisting of under 15 (0–14), five-year intervals from 15 to 69 (15–19, 20–24, …, 65–69), and 70 and over.
Each row in the PTDP data table records the above attributes, which represent a user’s location information, demographic attributes, and time. For each user, daily location information is recorded across multiple rows. These point data are acquired through specific events, such as when a user opens an application on a smartphone. Each row of data is logged at a minimum interval of one minute. Therefore, by arranging these location data in a time series order, trajectories associated with individual users can be constructed.
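As a minimal illustration of this construction, the following sketch arranges PTDP rows into per-ID, time-ordered point sequences; the column names are assumptions for illustration and do not reflect the vendor's actual schema.

```python
# Illustrative sketch: build per-ID daily point sequences from PTDP rows (column names assumed).
import pandas as pd

def build_point_sequences(ptdp: pd.DataFrame) -> dict:
    """Return {daily_id: DataFrame of (timestamp, longitude, latitude) sorted by time}."""
    ordered = ptdp.sort_values(["daily_id", "timestamp"])
    return {daily_id: group[["timestamp", "longitude", "latitude"]].reset_index(drop=True)
            for daily_id, group in ordered.groupby("daily_id")}
```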
Notably, there are missing values for the age group attribute; however, in this study, we excluded data with missing age group information from the experiments. Using the age groups in their original form could result in an excessive number of classes, which might lead to an extremely small number of data points in each class. Moreover, the original classification does not clearly indicate the attributes of individuals in each class, such as whether they are students or workers. For these reasons, this study reclassifies the age groups into six categories: 0–19, 20s, 30s, 40s, 50s, and 60s and over. These groups can generally be associated with standard attributes, such as students, early-career workers, young workers, middle-career workers, senior workers, and retirees. Notably, the behavior patterns of the 0–19 age group may overlap with those of their caregivers.
3.2. Data Assumption
As mentioned in Section 3.1, this study assumes that PTDP data are the location data that constitute the trajectories. The set of IDs linked to the mobile devices (where each ID is linked to exactly one mobile device) is denoted by $K$, and the set of location data corresponding to ID $k \in K$ is denoted by $P_k$. Note that all the IDs are reset on a daily basis. If the maximum length of the location data in a day is denoted by $T$, the location data size for a day is $|P_k| \le T$, i.e., $P_k = \{\, p^k_t \,\}$, where $t$ is a natural number that is less than or equal to $T$. Note that $p^k_t$ is a point of ID $k$ on the geographic coordinate system at time $t$, with longitude in [180° W–180° E] and latitude in [90° S–90° N].
Although the acquisition timing of location data is at a unit time interval, if a mobile device with ID $k$ does not trigger any event, location data are not acquired. Hence, for almost all sets $P_k$ with $k \in K$, a large portion of the location data is missing, i.e., $|P_k| \ll T$.
3.3. Imputation for Missing Location Data
As mentioned in Section 3.2, a large portion of the location data in any $P_k$ is missing. Accordingly, generating features is difficult because the data length $|P_k|$ differs from ID to ID. Thus, we must impute each location dataset $P_k$ to obtain a trajectory $\tau_k = (q^k_1, \dots, q^k_T)$ with an aligned data length, where $q^k_t$ is a point of ID $k$ on the geographic coordinate system at time $t$.
As we mentioned in
Section 2.2, although various trajectory recovery methods have been proposed, these models require large amounts of historical trajectory data linked to unique user IDs. However, because the mobile location data covered by this study do not retain historical data, in order to preserve user anonymity, we cannot use such historical data. Thus, in this study, we use simple linear interpolation for trajectory recovery.
Let us consider a scenario in which an ID is obtained at Site A at time $t_A$. Then, later, the same ID is acquired at Site B at time $t_B$, where $t_A < t_B$. When $t_B - t_A > 1$, we assume that the ID contains missing location data between $t_A$ and $t_B$. An assumption needs to be made regarding the missing data to be imputed because there is no information about the ID's location in this interval. Hence, we assume that a mobile user departs from Site A at $t_A$ and then arrives at Site B at $t_B$. In other words, on the basis of the unit time and the acquisition times $t_A$ and $t_B$, the imputed path is obtained by dividing the segment between Sites A and B into $t_B - t_A$ equal intervals. By this assumption, we obtain trajectories $\tau_k$.
The missing value ratio of $\tau_k$, i.e., the ratio of imputed data in the entire $\tau_k$, is defined by the function $r(\cdot)$, which takes the location dataset $P_k$ that is the source of $\tau_k$ as input, as follows:
$$ r(P_k) = 1 - \frac{|P_k|}{T}. \qquad (1) $$
Note that $0 \le r(P_k) < 1$. Hereafter, the missing value ratio obtained by applying $r(\cdot)$ to $P_k$ is denoted by $\rho$. The missing value ratio $\rho$ reflects the sparsity of trajectory $\tau_k$: the closer the value of $\rho$ is to 1, the sparser the trajectory $\tau_k$.
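The following is a minimal numpy sketch of the linear interpolation and the missing value ratio in Equation (1), assuming that a day is divided into $T$ one-minute slots and that observations are given as slot indices with (longitude, latitude) pairs; the handling of slots before the first and after the last observation (held at the boundary points) is an assumption, and all names are illustrative.

```python
# Illustrative sketch of the linear interpolation and missing value ratio (not the authors' code).
import numpy as np

def impute_trajectory(obs_slots, obs_points, T):
    """Linearly interpolate observed points onto all T one-minute slots.
    Slots before the first / after the last observation are held at the boundary points (assumption)."""
    obs_slots = np.asarray(obs_slots)                  # shape (m,), strictly increasing slot indices
    obs_points = np.asarray(obs_points, dtype=float)   # shape (m, 2): (longitude, latitude)
    t = np.arange(T)
    lon = np.interp(t, obs_slots, obs_points[:, 0])
    lat = np.interp(t, obs_slots, obs_points[:, 1])
    return np.stack([lon, lat], axis=1)                # imputed trajectory, shape (T, 2)

def missing_value_ratio(num_observed, T):
    """Share of slots that had to be imputed: r = 1 - |P_k| / T (cf. Equation (1))."""
    return 1.0 - num_observed / T
```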
Figure 1 shows the distribution of the missing value ratios of the trajectories based on PTDP data obtained at Narashino, Chiba, from January to June 2021.
Figure 1 indicates that almost all the location data are
. Notably, the data with
comprise 63% of the data.
Figure 1 thus implies that almost all the trajectories based on PTDP data are sparse.
Figure 2 visualizes the imputation.
6. Numerical Experiment
6.1. Methods
6.1.1. Tuning GMD Components n for Model Performance
As shown in Section 4.1.1, the dimension of the feature vector generated from trajectory $\tau_k$ is equal to the number $n$ of GMD components. In general, a feature vector dimension that is too large relative to the sample size has a negative effect on the robustness and regularization of the model [
47,
48]. By contrast, for a feature vector obtained by aggregating a trajectory to RPs, the dimension determines the expressiveness of the trajectory representation; thus, a larger dimension $n$ might increase the classification accuracy. Therefore, the trends of model performance with respect to the GMD components $n$ are verified, and a search for the components $n$ that provide the highest accuracy is performed. The search range of components $n$ consists of 21 candidate values, with a minimum of $n = 2$ to ensure distinctiveness; setting $n = 1$ would result in all feature vectors being identical, eliminating any distinctiveness of trajectories. Note that the GMD is computed via the expectation–maximization algorithm [
50].
The metric employed for fitting the GMD is often either Akaike’s information criterion or the Bayesian information criterion, both of which are based on the log-likelihood of the GMD. However, in this study, the performance of the classification model trained with features based on the GMD is essential, whereas the likelihood of the GMD with respect to the actual population distribution is not. Thus, we directly use the model performance, i.e., the metrics in Section 5.4, as evaluation indices for the GMD components $n$. More specifically, only the accuracy in Section 5.4 is used in this experiment because the objective is to determine the trends of model performance with respect to the components $n$ rather than the detailed behavior of the models. In particular, the average and standard deviation of the accuracy are calculated over the results of 20 seeds.
Because the numbers of datasets and component candidates are 20 and 21, respectively, the number of constructed models is 20 × 21 = 420 for each ML method. The hyperparameter tuning described in Section 5.3 is conducted for all 420 models for each ML method.
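As an illustration of this procedure, the sketch below shows the fivefold GS+CV step for one dataset, assuming the GMM-based feature matrices have already been computed; the hyperparameter grids are placeholders and are not the search space of Section 5.3.

```python
# Illustrative sketch of the fivefold grid-search cross-validation step (placeholder grids).
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tuned_test_accuracy(X_train, y_train, X_test, y_test):
    """Tune each ML model with fivefold GS+CV on the training data, then report test accuracy."""
    searches = {
        "SVC": GridSearchCV(SVC(kernel="rbf"),
                            {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}, cv=5),
        "RF": GridSearchCV(RandomForestClassifier(),
                           {"n_estimators": [100, 300], "max_depth": [None, 10]}, cv=5),
    }
    results = {}
    for name, search in searches.items():
        search.fit(X_train, y_train)                   # fivefold GS+CV
        results[name] = search.score(X_test, y_test)   # accuracy on held-out test data
    return results
```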
6.1.2. Comparison of GMM and IDNN
The performances of the ML models that use the GMM-based and IDNN-based feature vectors are compared. The IDNN is trained using the same training and test datasets as the ML models in Section 6.1.1.
In [
16], a clipping process is applied to center the trajectory centroid within the image, as their objective is to classify transportation modes, focusing primarily on the shape of the trajectory. By contrast, this study aims at attribute classification, where the relative position of the trajectory within the region, in addition to its shape, is crucial. Therefore, no clipping process is applied.
The IDNN has several hyperparameters that need to be tuned. Tuning all of them is not realistic in terms of computational cost; thus, we focus on the number of hidden layers and the number of neurons in the l-th layer, which are tuned via GS+CV.
To simplify the process, we assume that all hidden layers have the same number of neurons, and we define a search range for each of these two hyperparameters. The other parameters and settings for the IDNN are as follows:
Image size: see the discussion below
Noise distribution: masking noise (rate = 0.2)
Activation functions: ReLU for the hidden layers; Softmax for the output layer
Loss function: mean squared error
Optimizer for each denoising autoencoder (DAE) and for fine-tuning of the DNN: Adam
The image size determines the expressiveness of the trajectory representation. A larger image size is anticipated to enhance the representational capacity of the trajectory; however, it may also increase susceptibility to noise. Conversely, a smaller image size is more robust to noise but may compromise the expressive power of the trajectory representation. In [
16], an image size of
was reported to achieve the best accuracy and demonstrate good generalization performance. In this study, the task involves a higher degree of trajectory sparsity and noise compared to [
16]. To address these challenges while avoiding significant loss in representational capability due to an overly small image size, the image size was set to
.
Masking noise is used during the pre-training phase of SDA to deliberately introduce noise, with the goal of improving the robustness of extracted features. In [
16], it was reported that a masking noise rate of 0.4 achieved the highest accuracy and improved generalization performance. However, because the trajectories in this study inherently contain noise, using a high masking noise rate would significantly reduce accuracy. Consequently, the masking noise rate was set to 0.2 in this study.
In [
16], sigmoid functions were used as activation functions. However, sigmoid functions are known to cause the vanishing gradient problem. In recent years, ReLU functions have gained popularity in many tasks [
51]. Furthermore, since the output layer during fine-tuning is compared with binary vectors representing the classes, the Softmax function is more suitable than the sigmoid function. For these reasons, this study adopted ReLU for the hidden layers and the Softmax function for the output layer. For the loss function, although cross-entropy is commonly used for classification tasks, some studies report that mean squared error (MSE) performs better for certain problems [
52]. Since there is no definitive conclusion about the superiority of one over the other, this study follows [
15,
16] and adopts MSE as the loss function.
Regarding the optimizer, stochastic gradient descent (SGD) was utilized in [15,16]. However, SGD requires careful tuning of the learning rate and is prone to becoming stuck in local optima if not configured correctly. By contrast, Adam mitigates these issues by adapting the learning rate for each parameter. Considering the balance between computational cost and accuracy, this study selected Adam as the optimizer.
Based on these parameter settings, the average and standard deviation of the accuracy are calculated for the results of 20 seeds and compared with the results obtained in
Section 6.1.1.
6.1.3. Performance by the Missing Value Rate of the User Trajectory
As in Section 3.3, the missing value rate $\rho$ of trajectory $\tau_k$ indicates its sparsity. Hence, the classification performance may increase if only trajectories with low sparsity are learned, which is achieved by imposing a threshold $\rho_{\mathrm{th}}$ on the missing value ratio, i.e., $\rho \le \rho_{\mathrm{th}}$. Let $\tau_k$ be a trajectory generated from an element $P_k$ of the location datasets satisfying $r(P_k) \le \rho_{\mathrm{th}}$, where the function $r(\cdot)$ defined in Equation (1) is substituted into Equation (5). The dataset for month $m$ then consists of pairs of the feature generated from $\tau_k$ and the corresponding class $c$. By conducting a sensitivity analysis of the threshold $\rho_{\mathrm{th}}$, we can verify the effect of trajectory sparsity on the classification accuracy.
In this experiment, a sensitivity analysis is conducted over a range of thresholds $\rho_{\mathrm{th}}$, where $\rho_{\mathrm{th}} = 1$ indicates that all trajectories are used for training and testing. When sampling data with any random seed, we ensure that the datasets remain balanced across the age-group classes. The sizes of the training and test data in the datasets for each $\rho_{\mathrm{th}}$ value are listed in Table 1. Additionally, for the models for each month, we adopt the GMD components and the IDNN hyperparameters that yielded the maximum average accuracy in Section 6.1.1 and Section 6.1.2.
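A minimal sketch of the threshold-based filtering is shown below; the tuple layout of `samples` and the helper `train_and_evaluate` are hypothetical, and the threshold grid is a placeholder rather than the range used in this experiment.

```python
# Illustrative sketch of filtering samples by the missing value threshold (hypothetical names).
def filter_by_threshold(samples, rho_th):
    """Keep only (feature_vector, age_class) pairs whose missing value ratio is at most rho_th."""
    return [(x, c) for x, rho, c in samples if rho <= rho_th]

# for rho_th in (0.6, 0.7, 0.8, 0.9, 1.0):            # placeholder threshold grid
#     train_and_evaluate(filter_by_threshold(train_samples, rho_th),
#                        filter_by_threshold(test_samples, rho_th))
```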
Figure 8 summarizes the flow of all the experimental procedures.
6.2. Results and Discussion: Tuning GMD Components n for Model Performance
Based on
Section 6.1.1, the trends in the classification performance of each ML model for GMD components
n are verified.
Figure 9 shows the test accuracy values obtained for each number of GMD components $n$ and each month from January to June. The results are shown for the SVC with an RBF kernel, RF, and DNN, which are represented by red, blue, and green lines, respectively. In particular, the averages and standard deviations (1SD) of each model over the 20 seeds are indicated for each monthly dataset. The vertical and horizontal axes represent the accuracy and the GMD components $n$, respectively. The title of each panel presents the model that achieved the highest average accuracy, along with the GMD component number $n$ that yielded it. In addition, we conducted the Shapiro–Wilk test on the distribution of the 20 accuracy values at the component number where the average was maximal to confirm the normality of the distribution; the test results are indicated as ‘n.s.’, ‘*’, and ‘**’ according to the resulting $p$-values. The Shapiro–Wilk test is used so that confidence intervals for the classification accuracy of the ML models can be defined and the significance of each ML model relative to random six-class classification can be assessed.
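A compact sketch of this check is given below, assuming the 20 per-seed test accuracies are available as an array; the significance-level handling is simplified and illustrative.

```python
# Illustrative sketch: normality check of per-seed accuracies and comparison with the chance level.
import numpy as np
from scipy.stats import shapiro

def significance_vs_chance(accuracies, n_classes=6, k_sd=3.0):
    """Shapiro-Wilk test on the accuracy distribution, then check whether the chance level
    (1 / n_classes) lies below mean - k_sd * SD (a 3SD-style confidence bound)."""
    stat, p_value = shapiro(accuracies)                # small p-value: normality rejected
    mean, sd = np.mean(accuracies), np.std(accuracies, ddof=1)
    above_chance = (mean - k_sd * sd) > 1.0 / n_classes
    return p_value, above_chance
```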
Figure 10 shows the training accuracy values, with the vertical and horizontal axes providing the same information as in
Figure 9.
Figure 9 shows that for all months, the accuracy tends to increase as $n$ increases, particularly for the SVC and RF. This is because if the number of GMD components $n$ is very small, the RPs represent the target area only coarsely, and differences between the features of individual IDs are less likely to arise. In particular, models with minimal components significantly reduce the classification performance. Conversely, components $n$ that are too large cause the expressiveness to saturate and contribute little additional classification performance. The classification performance varies with the period of data acquisition; focusing on the maximal average for each month, the highest performance is achieved in January by the SVC, with an accuracy of 0.336 (0.013), and the lowest in June by the RF, with an accuracy of 0.238 (0.040), against a chance level of approximately $1/6 \approx 0.167$. Nonetheless, in every month, the normality of the accuracy distribution is confirmed, and the classification performance, excluding the results from the DNN, is higher than that of random classification ($1/6$) over a range of 3SD, i.e., a confidence interval of 99.7%. As a result, features computed only from sparse trajectories appear to include sufficient information to classify the user age groups.
Here, we focus on the learning trends of each model. The SVC achieves the highest performance in January and February, whereas the RF achieves superior accuracy from March to June. Notably, during the period from March to June, the RF consistently outperforms the other two models across all the components n. In contrast, the DNN achieves significantly lower accuracy than both the SVC and the RF in all months. This may be because the sample size of the training dataset is too small for the DNN to learn effectively. Therefore, the following discussion focuses primarily on the SVC and RF.
Focusing on the transitions in accuracy with respect to the GMD components
n, RF tends to reach accuracy saturation earlier than SVC across all months for the test data. This trend is also observed in the transitions in accuracy for each model on the training data, as shown in
Figure 10, although the training accuracy of SVC tends to increase more gradually than that of the test data. From this observation, it can be seen that although SVC tends to have lower accuracy than RF does, it appropriately handles the complexity associated with the increase in the dimensionality of feature vectors, effectively suppressing overfitting. This is particularly evident during the period from March to June. On the other hand, RF demonstrates high accuracy even with small values of
n, that is, low-dimensional feature vectors, but it also tends to overfit. From these results, it can be concluded that SVC exhibits higher generalization performance than RF does. Moreover, the RF achieves high accuracy even with low-dimensional feature vectors. In the computation of the GMD, an increase in the number of components
n leads to an increase in the computational cost; thus, the RF is appropriate for scenarios where the computational cost is a priority. Conversely, when the generalization performance is emphasized, SVC is preferred. These findings are based on the accuracy obtained within the hyperparameter space described in
Section 5.3. Hence, different results may emerge when a broader hyperparameter space is considered.
Note that because the GMD components n obtained in this experiment are results for only one area, i.e., Narashino, Chiba, Japan, the optimal number of components is not necessarily applicable to other areas and could depend on the scale and shape of the target area.
6.3. Results and Discussion: Comparison of the GMM and IDNN
The performance of the ML models that use the GMM-based and IDNN-based feature vectors is compared on the basis of Section 6.1.2.
Table 2 shows the average and standard deviation of the 20 accuracies on the training and test data by month. The header of the table indicates the combination of the feature extraction method and the ML model; e.g., GMM-SVC indicates an SVC trained with GMM-based features. In addition, the parameters providing the highest average accuracies for each feature extraction method on the test data are indicated in parentheses: the number of GMD components $n$ for the GMM, and the number of hidden layers and the number of neurons in each layer for the IDNN. These accuracies are obtained via the SVC, RF, and DNN with hyperparameters optimized in the space described in Section 5.3. The highest average accuracy for each month is indicated in bold.
The difference in accuracy between the training and test data serves as a critical indicator of the model’s generalization performance. Specifically, when training accuracy is substantially higher than test accuracy, it indicates that the model is overfitting. Conversely, a small difference in accuracy implies that the model exhibits high generalization performance. Therefore, evaluating the accuracy difference between training and test data provides a more detailed assessment of each model’s generalization capabilities.
The results show that the accuracy of GMM-SVC is the highest in January and February, whereas GMM-RF achieves the highest accuracy from March to June. GMM-SVC and GMM-RF consistently show higher accuracy than IDNN-SVC and IDNN-RF across all months. The scores of IDNN-SVC fall within the 1–3 SD range of the scores for GMM-SVC and GMM-RF, indicating relatively small differences in accuracy. On the other hand, focusing on the accuracy differences between the training and test data, the range for GMM-SVC and GMM-RF is 0.019–0.392, whereas IDNN-SVC and IDNN-RF exhibit a wider range of 0.358–0.679. This suggests that IDNN-SVC and IDNN-RF are prone to overfitting, whereas feature extraction via the GMM demonstrates better generalization performance. Note that, except for IDNN-SVC in March and May and IDNN-DNN in March and June, the Shapiro–Wilk results confirm that the accuracy distributions on the test data follow a normal distribution.
This experiment was conducted without restricting the missing value ratio, resulting in highly sparse trajectories. Consequently, the results indicate that the GMM-based features achieve higher accuracy and generalization performance than the IDNN-based features for sparse trajectories. The reliance of the IDNN on image-based trajectory transformations likely makes it more susceptible to noise, reducing its generalization performance. In other words, imputed trajectory information, when directly converted into pixel values, can lead to irregular spatial patterns in the image, causing noise accumulation. For sparse trajectories, the lack of structural information in the image representation may therefore limit the IDNN’s effectiveness. By contrast, GMM-based feature generation identifies RPs on the basis of the distribution of human flows and aggregates trajectories around those RPs. This approach may minimize the accumulation of noise caused by sparsity. As a result, GMM-based features demonstrate superior generalization performance for sparse trajectories.
6.4. Results and Discussion: Performance by the Missing Value Rate of the User Trajectory
On the basis of Section 6.1.3, we analyze the performance trends of the ML models using the GMM-based and IDNN-based feature vectors with respect to trajectory sparsity. First, to consider the effect of trajectory sparsity on classification performance, we visualize the trends of the accuracy values for the test and training data against the missing value threshold $\rho_{\mathrm{th}}$ in Figure 11 and Figure 12, respectively. For both figures, the vertical and horizontal axes represent the accuracy and the threshold $\rho_{\mathrm{th}}$, respectively. In addition, the red, blue, and green symbols represent the SVC, RF, and DNN, respectively, and the circle and cross symbols indicate the GMM- and IDNN-based features, respectively. Smaller differences between the training and test accuracies indicate higher generalization performance. In Figure 11, the horizontal and vertical dashed lines represent the best accuracy and the corresponding $\rho_{\mathrm{th}}$ among the combinations of feature extraction methods and ML models, respectively.
For the SVC and RF with both GMM and IDNN features, a lower $\rho_{\mathrm{th}}$ leads to higher accuracy between the threshold of the best accuracy, i.e., the dashed line in Figure 11, and $\rho_{\mathrm{th}} = 1$; thus, the sparsity of the trajectory degrades the model performance. Conversely, if $\rho_{\mathrm{th}}$ is too small, low performance can also occur, because an extremely small $\rho_{\mathrm{th}}$ yields few samples relative to the size of the feature space. For both the GMM and IDNN features, the DNN instead tends to decrease in accuracy as $\rho_{\mathrm{th}}$ decreases over part of the range, which contrasts with the trend observed for the SVC and RF, whose accuracy increases as $\rho_{\mathrm{th}}$ decreases. This suggests that, in this range of thresholds, the amount of training data may be insufficient to adequately train the DNN models. This result is supported by the findings shown in Figure 12 for the DNN on the training data. Because the trends in accuracy for the training and test data are similar, it can be inferred that insufficient data availability prevents the DNN from learning effectively, thereby hindering its performance improvement.
Focusing on the best accuracy values in
Figure 11, GMM-RF achieves the highest accuracy for all months. In particular, for April to June, the test accuracy of GMM-RF is higher than that of the other combinations for all the missing value ratios. In addition, the test accuracy of GMM-SVC is not as high as that of GMM-RF, but it is confirmed that GMM-SVC has high generalization performance because of the smaller differences between the accuracies of the training and test data. This is implied from
Figure 11 and
Figure 12. On the other hand, while the test accuracies of the GMM and IDNN features are close, the training accuracy of the IDNN features tends to be greater than that of the GMM features in part of the threshold range. This implies that the IDNN features are prone to overfitting in this experimental setting. This trend was also confirmed in Section 6.3, and its reasons are the same as those mentioned before; that is, the conversion of a trajectory into an image cannot eliminate noise. These results also suggest that the GMM-based features are more effective for handling sparse trajectories than IDNN-based features. Overall, the results indicate that GMM-based features provide a more reliable and effective approach for extracting information from sparse trajectories, making them a better fit for scenarios where semantic information is unavailable and data sparsity is a significant challenge. For small thresholds, overfitting is confirmed in GMM-SVC, GMM-RF, IDNN-SVC, and IDNN-RF. This may be because the number of samples is small relative to the feature space, so the models overfit the training data.
Here, we analyze the classification results via detailed metrics, such as precision, recall, and F-measure, in addition to accuracy. Note that we focus on only the GMM-RF results, which showed the highest accuracy in all months.
Table 3 shows each metric for the classification results on the test data by threshold $\rho_{\mathrm{th}}$ and month. The averages and standard deviations over the datasets of 20 seeds are indicated for each metric, with standard deviations in parentheses. Moreover, with respect to the confidence interval of classification accuracy, as in Section 6.2, Table 3 shows the $p$-values of the Shapiro–Wilk test for the distribution of accuracies over the 20 seeds. Because the $p$-values do not reject normality except for two threshold settings in May and one in June, we assume that the distributions of accuracies follow a normal distribution.
Focusing on the precision, recall, and F-measure under any parameter setting, we can conclude that the levels of classification performance across the classes are similar because these metric values are almost equivalent to the accuracy. This is a natural result because we handle a balanced dataset. For the unrestricted missing value rate, i.e., $\rho_{\mathrm{th}} = 1$, the average accuracy ranges from its maximum of 0.331 (0.011) in January to its minimum of 0.238 (0.004) in June, confirming that the classification performance is significantly higher than random classification in every month. Furthermore, through adjustment of the threshold $\rho_{\mathrm{th}}$, the classification performance increases such that the average accuracy reaches a maximum of 0.545 (0.064) in January and a minimum of 0.316 (0.011) in June. The normality of the accuracy distribution over the 20 seeds is confirmed, except for two threshold settings in May and one in June. These results show that with appropriate parameter settings, the constructed GMM-RF models achieve significantly higher classification accuracy than random classification. Note that the classification performance of GMM-SVC is not as high as that of GMM-RF, but GMM-SVC shows the same tendency and high generalization performance, as shown in Figure 11 and Figure 12. As a result, we infer that sparse spatial–temporal information, i.e., a sparse trajectory, contains sufficient information to classify the corresponding user age group.
Finally, the classification performance with respect to the data acquisition period is discussed. Focusing on the maximum accuracy of GMM-RF, the accuracy values from January to June are 0.545 (0.064), 0.476 (0.038), 0.342 (0.017), 0.335 (0.023), 0.331 (0.015), and 0.316 (0.011), respectively. The values for January and February are particularly high, whereas those from March to June are lower. This is because a state of emergency and a semi-state of emergency were declared in the target area from 8 January to 21 March and from 20 April to 1 August 2021, respectively. The state of emergency involved strong behavioral regulation, including orders to close restaurants and commercial facilities, whereas the semi-state of emergency involved weaker regulation limited to orders regarding hours of operation [
45]. Thus, in January and February, in which states of emergency were enforced, the restriction of irregular behavior (e.g., amusement trips to commercial and recreational facilities) among people’s activities clearly differentiated the behavior patterns of people in different user age groups. However, in March, when a state of emergency was also enforced, the classification performance was significantly lower than that in January and February. This could be because the states of emergency migrated to semistates of emergency in the middle of March, and people’s behavior changed remarkably. The classification performance also declined from April to June; this may be because people’s behavior became more diverse than that during the state of emergency. The monthly sample sizes shown in
Figure 5 and
Figure 6 indicate that the sample size from March to June is larger than that from January to February. This suggests that people’s behavior patterns gradually returned to normal during the period from March to June. Nonetheless, the average accuracy values are significantly higher than those of random classification in the range of 3SD for many parameter settings. These results also reveal that the sparse trajectory contains sufficient information for classifying user age groups without the use of any semantic information other than spatial–temporal information.
7. Conclusions
In this study, we conducted a performance evaluation and analysis of GMM-based features for sparse trajectories without semantic information. The experiments utilized three machine learning models, namely, SVC, RF, and DNN, to perform a detailed examination of the features. We also compared them with the IDNN, an existing feature extraction method for trajectories. The results showed that GMM-based features achieved higher classification accuracy and better generalization performance than IDNN-based features. In particular, the RF attained high accuracy, whereas the SVC maintained stable generalization performance. Moreover, a sensitivity analysis of the missing value rate, which indicates trajectory sparsity, revealed that the IDNN becomes more prone to overfitting as the sparsity increases, whereas the GMM-based approach sustains both accuracy and generalization performance. These contributions demonstrate that GMM-based features offer clear advantages over existing methods in terms of both accuracy and generalizability when dealing with sparse trajectories lacking semantic information. Consequently, this study shows that sparse trajectory data can be effectively utilized even without semantic information, expanding the potential for data utilization in various fields, including urban planning, transportation analysis, and public policy.
The limitations of this study are as follows:
Although simple linear interpolation was adopted for trajectory imputation, its impact on classification performance was not verified.
The proposed method was verified only for a single region. In addition, owing to constraints in the data acquisition period, the influence of behavioral changes from prepandemic norms during the COVID-19 era could not be taken into account.
The features did not contain the connection relationships of RPs.
In light of these points, our future research will verify the generality of the proposed model in terms of the area and period of location data and further consider the utilization of sparse spatial–temporal information.