1. Introduction
Over the first two decades of the twenty-first century, the use of mobile devices, such as smartphones with global positioning system (GPS) functions, has spread rapidly. Consequently, location data obtained from these devices via GPS, together with the user attribute data attached to them, are now being collected on a large scale, with the expectation that these data will be utilized for urban planning and the analysis of human activity, among other applications [
1]. Location data and their associated data are regarded as the trajectories of mobile device users. There has been much research on classification and prediction models for user trajectories [
2,
3,
4,
5]. This study mainly focuses on classifying user groups based on location data and their associated data, as obtained from mobile devices.
Many studies have aimed at improving the accuracy of trajectory classification using location information consisting of spatiotemporal dimensions obtained from mobile terminals and associated semantic information [
6,
7,
8,
9,
10,
In particular, Ferrero et al. proposed MasterMovelets, which characterizes trajectories with semantic dimensions via a uniform series of procedures [
12]. However, whereas these studies assume that the semantic dimension is attached to the trajectory, the semantic information associated with the location data acquired from mobile devices is often missing. In this context, methods for trajectory similarity computation that do not require semantic dimensionality information other than spatial–temporal dimensionality have been proposed by Han et al. and Zhou et al. [
13,
14]. In addition, Endo et al. proposed a stacked denoising autoencoder (SDA) [
15] based method for extracting trajectory features for transportation mode classification [
16]. However, these methods do not assume that trajectories are sparse in the spatial–temporal dimensions. Because mobile terminal-based location information is acquired only through irregular events, the resulting trajectories can be very sparse in spatial–temporal space.
Many studies have focused on trajectory classification using dense or semantically rich data. However, location data generated by mobile devices often lack semantic information and exhibit spatial–temporal sparsity. This raises the challenge of effectively utilizing sparse trajectories for applications such as user group classification and mobility analysis. Addressing this issue is crucial for harnessing the potential of trajectory data in fields such as urban planning, marketing analysis, and public policy. In addition, increasing data integrity can significantly enhance the usability of location data for these applications.
For these reasons, in our past work, we developed a sparse trajectory feature based on spatial–temporal information for classifying user age groups [
17]; however, that work presented only a superficial analysis limited to binary classification and did not examine the applicability of the features in depth. Furthermore, it focused solely on analyses based on the support vector classifier (SVC) [
18,
19] model and did not include comparisons with other machine learning (ML) models and trajectory features based on existing methods. Therefore, this study provides a detailed analysis of whether user groups can be estimated using only a sparse spatial–temporal trajectory without considering prior knowledge and past historical data associated with mobile device users. In particular, we analyze the feature based on [
17], which is extracted from users’ location data using point-type dynamic population (PTDP) data [
20] obtained from the GPS function of mobile devices. Moreover, we adopt multiple types of ML models that learn the features for six-class classification and conduct a detailed analysis. Additionally, by evaluating the user age group classification performance of these ML models, we verify the effectiveness of features based on sparse spatial–temporal trajectories for user age group estimation.
Each PTDP data point has an ID for identifying a mobile device by day; linked to this ID are the attributes registered by the user of the device and the location data from the device. Although location data can be acquired in one-minute units, most of the data for a single day are missing because the acquisition timing is determined by events, such as the user opening a specific application on the device. Thus, we use a simple method to impute the missing location data and regard the imputed data as the user trajectory, making the data length uniform. Furthermore, for this study, we select age groups as the specific user groups subjected to classification and assume that the age groups can be categorized into six classes: 0 to 19, 20s, 30s, 40s, 50s, and 60s and over.
Typically, mobile users’ location data, which consist of longitude and latitude, are expressed in the geographic coordinate system. In other words, the location data that are part of a trajectory are represented as points in the region [180° W–180° E, 90° S–90° N]. Because using such trajectories as features is not realistic, we need to summarize a trajectory into feature vectors consisting of a finite number of dimensions. Hence, we generate
n-dimensional feature vectors by assigning
n representative points (RPs) to a target region and aggregating the user locations into RPs. A RP is generated using the Gaussian mixture model (GMM), which assumes a Gaussian mixture distribution (GMD) for the population distribution in a target region. In particular, the GMD is generated from the raw location data of all users in a target period, and we regard the centroids of each Gaussian distribution as the RPs for that distribution. Thus, the RPs are selected in locations where the population is concentrated. In this way, each user trajectory is aggregated into RPs, and
n-dimensional feature vectors for classifying user age groups are generated. We employed ML models that use the
n-dimensional feature vector as input and output an age group linked to the mobile users. For the ML models, we employed SVC, random forest (RF) [
21], and deep neural network (DNN) [
22] models, and for the hyperparameter tuning of these ML models, we used fivefold grid search cross-validation (GS+CV). With feature vectors generated using GMD in this manner, it is possible to construct a classification model that uses only spatial–temporal information without complicated preprocessing, which differs for each target region. In addition, this study aims to verify the effectiveness of features based on the GMM for classifying sparse trajectories without semantic information. To achieve this goal, a comparison is conducted with existing feature extraction methods for trajectory data. In particular, the comparison focuses on features extracted using the improved DNN (IDNN), developed by Endo et al. [
16], which is based on an SDA [
15]. The IDNN does not assume that the trajectory is sparse, but part of this feature extraction method can be applied to our problem settings. This comparison allows the relative performance and applicability of the GMM-based features for sparse trajectories, proposed in our past work, to be evaluated.
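To make the feature construction concrete, the following is a minimal sketch of the GMM-based procedure described above, assuming that each trajectory is an array of (longitude, latitude) points of equal length; the aggregation rule used here (the normalized share of a trajectory's points assigned to each RP) and all function names are illustrative assumptions rather than the exact implementation evaluated in this study.

```python
# Illustrative sketch of GMM-based RP generation and trajectory aggregation (not the authors' code).
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_representative_points(all_points, n_components, seed=0):
    """Fit a GMD to the pooled raw location data of all users; the component means act as RPs."""
    return GaussianMixture(n_components=n_components, random_state=seed).fit(all_points)

def trajectory_to_feature(trajectory, gmm):
    """Aggregate one trajectory (shape (T, 2)) into an n-dimensional vector:
    the fraction of its points assigned to each RP (GMD component)."""
    labels = gmm.predict(trajectory)
    counts = np.bincount(labels, minlength=gmm.n_components)
    return counts / len(trajectory)

# Usage: pool all raw points, fit the GMD, then convert each user trajectory to a feature vector.
# gmm = fit_representative_points(np.vstack(trajectories), n_components=30)  # 30 is a placeholder
# X = np.array([trajectory_to_feature(t, gmm) for t in trajectories])
```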
In our experiments, we demonstrate the performance of three ML models trained with GMM-based features using PTDP data acquired between January and June 2021 in Narashino, Chiba, Japan. Because the number of GMD components, which determines the feature vector's dimension, can significantly influence the classification accuracies of the ML models, we conduct a sensitivity analysis on the number of components. In addition, we evaluate the accuracy and generalization performance of the combinations of the three ML models with GMM- and IDNN-based features with respect to the missing value ratio, which quantifies the sparsity of trajectories. The ability of GMM-based features computed from sparse trajectories to classify user groups, and the effect of the spatial–temporal sparsity of user trajectories on classification accuracy, are examined via a comparison with IDNN-based features. As a result, this study demonstrates that GMM-based features exhibit greater robustness to trajectory sparsity and higher generalization performance than the features extracted by the IDNN. These findings highlight the superior capability of GMM-based features in classifying user age groups from sparse trajectories compared with existing trajectory-based features, making them a promising approach for scenarios where only spatial–temporal information is available.
2. Literature Review
This section reviews relevant research studies to highlight the originality and practicality of our GMM-based feature extraction method. In this work, we treat age group classification as an example of user group classification using a sparse trajectory.
Traditionally, age group classification tasks have relied heavily on human biometric data. For instance, various age estimation techniques based on facial images have been developed over the years [
23,
24,
25]. Additionally, Saxena et al. introduced an age group estimation method leveraging fingerprint data, while Nabila et al. proposed using gait patterns for the same purpose [
26,
27]. Beyond biometric-based approaches, alternative methods have also emerged. For example, Guimaraes et al. developed a strategy for classifying age attributes based on user-generated text on social media platforms [
28].
Despite this wealth of research on age group classification, direct applications of trajectory data to age group classification are rare. Thus, we focus on research studies that, although not explicitly designed for user age group classification, can be adapted for this purpose. This section is organized into two key aspects: independence from semantic information and applicability for sparse trajectories, with potential applications and limitations of each method being discussed in detail.
2.1. Trajectory Analysis Using Semantic Information
Many studies have proposed trajectory classification methods that utilize data with both spatial–temporal and other semantic dimensions to increase accuracy. Wang et al. proposed an estimation method for the demographics of user groups based on trajectories obtained by analyzing wireless access points and prior knowledge of mobile users [
29]. Montasser et al. also proposed a method for predicting the demographics of geographic units by analyzing geotagged tweets [
30]. Many other methods have also been proposed, corresponding to classification or prediction models for user groups based on additional geographic and social network information [
6,
7,
8,
9,
10,
11]. However, these methods must process a considerable amount of prior knowledge data linked to mobile users and perform text analyses on posts in social network services.
In this context, Ferrero et al. proposed MasterMovelets, which designs features for trajectories with semantic dimensions, excluding spatial–temporal dimensions [
12]. MasterMovelets can be used to extract features from spatial–temporal trajectories enriched with other semantic dimensions via a uniform series of procedures. This method increases classification accuracy by focusing on local subtrajectories within a target trajectory. Most of the research introduced so far assumes that trajectories have semantic dimensions in addition to spatial–temporal dimensions and aims to increase the classification accuracy for such trajectories. However, the user information linked to mobile device location data tends to have many missing values because of the nonuniformity of users’ registration statuses. Therefore, given that homogeneous semantic information is not always attached to all user trajectories, these methods cannot be applied to such trajectories.
2.2. Trajectory Analysis Using Only Spatial–Temporal Information
2.2.1. Similarity-Based Methods
Dynamic time warping (DTW) [
31], a traditional similarity measurement method for sequential data, can measure the similarity between trajectories on the basis of spatiotemporal information in the context of trajectory classification. The similarity of trajectories can also be utilized for classifying user attributes associated with trajectories. Although a variety of extended DTWs have been proposed [
32,
33], these methods have a high computational cost. By contrast, Han et al. proposed a highly accurate prediction method for the similarity of trajectory patterns composed of only spatial–temporal information based on a graph-based approach [
13]. Zhou et al. developed a noise-robust method for calculating the similarity of trajectories and solved data sparsity for the search space for the same problem setting [
14]. However, these methods do not assume the spatial–temporal sparsity of trajectories based on mobile device location data. In practice, location data from mobile devices are obtained almost entirely through irregular events, such as the user opening an application. In addition, to preserve anonymity, location data often omit microscopic movement patterns near users’ residences. Trajectories consisting of such location data are therefore very sparse.
2.2.2. Trajectory Prediction and Recovery for Sparse Trajectory
Wang et al. proposed a model for predicting the future mobility of mobile users using such sparse trajectories based only on spatial–temporal dimensions [
34]. Their model learns the trajectories of heterogeneous users accumulated in the past and predicts the future movements of users on the basis of their similarity with past users’ data. However, that study focused only on predicting spatial–temporal information about how users move from one point to another; it did not aim to estimate user groups, i.e., information beyond the spatial–temporal dimensions.
If it is possible to recover a complete trajectory for a sparse trajectory that lacks information, conventional methods could be applied to user-group classification. Various methods for such trajectory recovery have been proposed [
35,
36,
37,
38]. Xia et al. developed an attentional neural network-based model that recovers sparse trajectories by learning past accumulated historical trajectory data of users [
35]. Sun et al. also proposed a trajectory recovery model for similar problem settings using a graph neural network, and both models demonstrate high performance; however, these models do not assume a new user who does not store past historical data [
36]. For this problem, Si et al. proposed a new method that mitigates the cold start problem of such new users by using a transformer [
37]. However, these methods require large amounts of historical data linked to specific users. Because the unique IDs in the mobile location data covered by this study are reset on a daily basis, such historical data are unavailable, and these methods cannot be applied to our problem settings.
2.3. Feature Extraction Applicable to Sparse Trajectories Without Semantic Information: A Comparative Baseline
Endo et al. proposed a feature extraction method for classifying the transportation modes of trajectories using an SDA [
15,
16]. In this method, trajectories are converted into images and flattened into vectors, which are then used as inputs for a DNN constructed via SDA. Fine-tuning is performed using annotated labels, and the resulting improved DNN (IDNN) is employed to extract features from the trajectories. The semantic information associated with the trajectories is subsequently added, enabling high-accuracy transportation mode classification. In this study, we target trajectories that lack semantic information; however, the feature extraction method using the IDNN can still be applied. On the other hand, because IDNN-based feature extraction relies heavily on the shape of trajectories, it assumes dense trajectory data. Therefore, its performance on sparse trajectories remains uncertain. In our experiments, we compare the performance on sparse trajectories using features extracted by our proposed method and those extracted by the IDNN.
3. Trajectory Data
3.1. PTDP Data
The PTDP data used in this study are obtained from the GPS function of mobile devices and are provided by Agoop Corp. (Tokyo, Japan) [
20]. Agoop Corp., the provider of these data, is a company that offers location-based services and is 100% owned by Softbank, one of the major telecommunications carriers in Japan. PTDP data are used for various purposes, such as urban planning, marketing analysis, and public policy development [
39,
40,
41,
42]. Accurately estimating user groups, e.g., age groups, can significantly enhance the usability of PTDP data in these research areas, enabling more precise analyses and evidence-based decision making. The columns of the PTDP data table, which are used in this research study, consist of the following attributes:
Daily ID: User ID associated with the mobile device. This ID is reset on a daily basis.
Timestamp: time at which the location information was obtained, recorded as yyyy-mm-dd HH:MM.
Longitude: longitude of the location data.
Latitude: latitude of the location data.
Age group: registered age group of the user, consisting of under 15 (0–14), five-year intervals from 15 to 69 (15–19, 20–24, …, 65–69), and 70 and over.
Each row in the PTDP data table records the above attributes, which represent a user’s location information, demographic attributes, and time. For each user, daily location information is recorded across multiple rows. These point data are acquired through specific events, such as when a user opens an application on a smartphone. Each row of data is logged at a minimum interval of one minute. Therefore, by arranging these location data in a time series order, trajectories associated with individual users can be constructed.
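As a minimal illustration of this construction, the following sketch arranges PTDP rows into per-ID, time-ordered point sequences; the column names are assumptions for illustration and do not reflect the vendor's actual schema.

```python
# Illustrative sketch: build per-ID daily point sequences from PTDP rows (column names assumed).
import pandas as pd

def build_point_sequences(ptdp: pd.DataFrame) -> dict:
    """Return {daily_id: DataFrame of (timestamp, longitude, latitude) sorted by time}."""
    ordered = ptdp.sort_values(["daily_id", "timestamp"])
    return {daily_id: group[["timestamp", "longitude", "latitude"]].reset_index(drop=True)
            for daily_id, group in ordered.groupby("daily_id")}
```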
Notably, there are missing values for the age group attribute; however, in this study, we excluded data with missing age group information from the experiments. Using the age groups in their original form could result in an excessive number of classes, which might lead to an extremely small number of data points in each class. Moreover, the original classification does not clearly indicate the attributes of individuals in each class, such as whether they are students or workers. For these reasons, this study reclassifies the age groups into six categories: 0–19, 20s, 30s, 40s, 50s, and 60s and over. These groups can generally be associated with standard attributes, such as students, early-career workers, young workers, middle-career workers, senior workers, and retirees. Notably, the behavior patterns of the 0–19 age group may overlap with those of their caregivers.
3.2. Data Assumption
As mentioned in Section 3.1, this study assumes that PTDP data are the location data that constitute the trajectories. The set of IDs linked to the mobile devices (where each ID is linked to exactly one mobile device) is denoted by $K$, and the set of location data corresponding to ID $k \in K$ is denoted by $P_k$. Note that all the IDs are reset on a daily basis. If the maximum length of the location data in a day is denoted by $T$, the location data size for a day is $|P_k| \le T$, i.e., $P_k = \{\, p^k_t \,\}$, where $t$ is a natural number that is less than or equal to $T$. Note that $p^k_t$ is a point of ID $k$ on the geographic coordinate system at time $t$, with longitude in [180° W–180° E] and latitude in [90° S–90° N].
Although the acquisition timing of location data is at a unit time interval, if a mobile device with ID $k$ does not trigger any event, location data are not acquired. Hence, for almost all sets $P_k$ with $k \in K$, a large portion of the location data is missing, i.e., $|P_k| \ll T$.
3.3. Imputation for Missing Location Data
As mentioned in Section 3.2, a large portion of the location data in any $P_k$ is missing. Accordingly, generating features is difficult because the data length $|P_k|$ differs from ID to ID. Thus, we must impute each location dataset $P_k$ to obtain a trajectory $\tau_k = (q^k_1, \dots, q^k_T)$ with an aligned data length, where $q^k_t$ is a point of ID $k$ on the geographic coordinate system at time $t$.
As we mentioned in
Section 2.2, although various trajectory recovery methods have been proposed, these models require large amounts of historical trajectory data linked to unique user IDs. However, because the mobile location data covered by this study do not retain historical data, in order to preserve user anonymity, we cannot use such historical data. Thus, in this study, we use simple linear interpolation for trajectory recovery.
Let us consider a scenario in which an ID is obtained at Site A at time $t_A$. Then, later, the same ID is acquired at Site B at time $t_B$, where $t_A < t_B$. When $t_B - t_A > 1$, we assume that the ID contains missing location data between $t_A$ and $t_B$. An assumption needs to be made regarding the missing data to be imputed because there is no information about the ID's location in this interval. Hence, we assume that a mobile user departs from Site A at $t_A$ and then arrives at Site B at $t_B$. In other words, on the basis of the unit time and the acquisition times $t_A$ and $t_B$, the imputed path is obtained by dividing the segment between Sites A and B into $t_B - t_A$ equal intervals. By this assumption, we obtain trajectories $\tau_k$.
The missing value ratio of $\tau_k$, i.e., the ratio of imputed data in the entire $\tau_k$, is defined by the function $r(\cdot)$, which takes the location dataset $P_k$ that is the source of $\tau_k$ as input, as follows:
$$ r(P_k) = 1 - \frac{|P_k|}{T}. \qquad (1) $$
Note that $0 \le r(P_k) < 1$. Hereafter, the missing value ratio obtained by applying $r(\cdot)$ to $P_k$ is denoted by $\rho$. The missing value ratio $\rho$ reflects the sparsity of trajectory $\tau_k$: the closer the value of $\rho$ is to 1, the sparser the trajectory $\tau_k$.
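The following is a minimal numpy sketch of the linear interpolation and the missing value ratio in Equation (1), assuming that a day is divided into $T$ one-minute slots and that observations are given as slot indices with (longitude, latitude) pairs; the handling of slots before the first and after the last observation (held at the boundary points) is an assumption, and all names are illustrative.

```python
# Illustrative sketch of the linear interpolation and missing value ratio (not the authors' code).
import numpy as np

def impute_trajectory(obs_slots, obs_points, T):
    """Linearly interpolate observed points onto all T one-minute slots.
    Slots before the first / after the last observation are held at the boundary points (assumption)."""
    obs_slots = np.asarray(obs_slots)                  # shape (m,), strictly increasing slot indices
    obs_points = np.asarray(obs_points, dtype=float)   # shape (m, 2): (longitude, latitude)
    t = np.arange(T)
    lon = np.interp(t, obs_slots, obs_points[:, 0])
    lat = np.interp(t, obs_slots, obs_points[:, 1])
    return np.stack([lon, lat], axis=1)                # imputed trajectory, shape (T, 2)

def missing_value_ratio(num_observed, T):
    """Share of slots that had to be imputed: r = 1 - |P_k| / T (cf. Equation (1))."""
    return 1.0 - num_observed / T
```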
Figure 1 shows the distribution of the missing value ratios of the trajectories based on PTDP data obtained at Narashino, Chiba, from January to June 2021.
Figure 1 indicates that almost all the location data are
. Notably, the data with
comprise 63% of the data.
Figure 1 thus implies that almost all the trajectories based on PTDP data are sparse.
Figure 2 visualizes the imputation.
6. Numerical Experiment
6.1. Methods
6.1.1. Tuning GMD Components n for Model Performance
As shown in Section 4.1.1, the dimension of the feature vector generated from trajectory $\tau_k$ is equal to the number $n$ of GMD components. In general, a feature vector dimension that is too large relative to the sample size has a negative effect on the robustness and regularization of the model [
47,
48]. By contrast, for a feature vector obtained by aggregating a trajectory to RPs, the dimension determines the expressiveness of the trajectory representation; thus, a larger dimension $n$ might increase the classification accuracy. Therefore, the trends of model performance with respect to the GMD components $n$ are verified, and a search for the components $n$ that provide the highest accuracy is performed. The search range of components $n$ consists of 21 candidate values, with a minimum of $n = 2$ to ensure distinctiveness; setting $n = 1$ would result in all feature vectors being identical, eliminating any distinctiveness of trajectories. Note that the GMD is computed via the expectation–maximization algorithm [
50].
The metric employed for fitting the GMD is often either Akaike’s information criterion or the Bayesian information criterion, both of which are based on the log-likelihood of the GMD. However, in this study, the performance of the classification model trained with features based on the GMD is essential, whereas the likelihood of the GMD with respect to the actual population distribution is not. Thus, we directly use the model performance, i.e., the metrics in Section 5.4, as evaluation indices for the GMD components $n$. More specifically, only the accuracy in Section 5.4 is used in this experiment because the objective is to determine the trends of model performance with respect to the components $n$ rather than the detailed behavior of the models. In particular, the average and standard deviation of the accuracy are calculated over the results of 20 seeds.
Because the numbers of datasets and component candidates are 20 and 21, respectively, the number of constructed models is 20 × 21 = 420 for each ML method. The hyperparameter tuning described in Section 5.3 is conducted for all 420 models for each ML method.
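As an illustration of this procedure, the sketch below shows the fivefold GS+CV step for one dataset, assuming the GMM-based feature matrices have already been computed; the hyperparameter grids are placeholders and are not the search space of Section 5.3.

```python
# Illustrative sketch of the fivefold grid-search cross-validation step (placeholder grids).
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tuned_test_accuracy(X_train, y_train, X_test, y_test):
    """Tune each ML model with fivefold GS+CV on the training data, then report test accuracy."""
    searches = {
        "SVC": GridSearchCV(SVC(kernel="rbf"),
                            {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}, cv=5),
        "RF": GridSearchCV(RandomForestClassifier(),
                           {"n_estimators": [100, 300], "max_depth": [None, 10]}, cv=5),
    }
    results = {}
    for name, search in searches.items():
        search.fit(X_train, y_train)                   # fivefold GS+CV
        results[name] = search.score(X_test, y_test)   # accuracy on held-out test data
    return results
```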
6.1.2. Comparison of GMM and IDNN
The performances of the ML models that use the GMM-based and IDNN-based feature vectors are compared. The IDNN is trained using the same training and test datasets as the ML models in Section 6.1.1.
In [
16], a clipping process is applied to center the trajectory centroid within the image, as their objective is to classify transportation modes, focusing primarily on the shape of the trajectory. By contrast, this study aims at attribute classification, where the relative position of the trajectory within the region, in addition to its shape, is crucial. Therefore, no clipping process is applied.
The IDNN has several hyperparameters that need to be tuned. Tuning all of them is not realistic in terms of computational cost; thus, we focus on the number of hidden layers and the number of neurons in the l-th layer, which are tuned via GS+CV.
To simplify the process, we assume that all hidden layers have the same number of neurons, and we define a search range for each of these two hyperparameters. The other parameters and settings for the IDNN are as follows:
Image size: see the discussion below
Noise distribution: masking noise (rate = 0.2)
Activation functions: ReLU for the hidden layers; Softmax for the output layer
Loss function: mean squared error
Optimizer for each denoising autoencoder (DAE) and for fine-tuning of the DNN: Adam
The image size determines the expressiveness of the trajectory representation. A larger image size is anticipated to enhance the representational capacity of the trajectory; however, it may also increase susceptibility to noise. Conversely, a smaller image size is more robust to noise but may compromise the expressive power of the trajectory representation. In [
16], an image size of
was reported to achieve the best accuracy and demonstrate good generalization performance. In this study, the task involves a higher degree of trajectory sparsity and noise compared to [
16]. To address these challenges while avoiding significant loss in representational capability due to an overly small image size, the image size was set to
.
Masking noise is used during the pre-training phase of SDA to deliberately introduce noise, with the goal of improving the robustness of extracted features. In [
16], it was reported that a masking noise rate of 0.4 achieved the highest accuracy and improved generalization performance. However, because the trajectories in this study inherently contain noise, using a high masking noise rate would significantly reduce accuracy. Consequently, the masking noise rate was set to 0.2 in this study.
In [
16], sigmoid functions were used as activation functions. However, sigmoid functions are known to cause the vanishing gradient problem. In recent years, ReLU functions have gained popularity in many tasks [
51]. Furthermore, since the output layer during fine-tuning is compared with binary vectors representing the classes, the Softmax function is more suitable than the sigmoid function. For these reasons, this study adopted ReLU for the hidden layers and the Softmax function for the output layer. For the loss function, although cross-entropy is commonly used for classification tasks, some studies report that mean squared error (MSE) performs better for certain problems [
52]. Since there is no definitive conclusion about the superiority of one over the other, this study follows [
15,
16] and adopts MSE as the loss function.
Regarding the optimizer, stochastic gradient descent (SGD) was utilized in [15,16]. However, SGD requires careful tuning of the learning rate and is prone to becoming stuck in local optima if not configured correctly. By contrast, Adam mitigates these issues by adapting the learning rate for each parameter. Considering the balance between computational cost and accuracy, this study selected Adam as the optimizer.
Based on these parameter settings, the average and standard deviation of the accuracy are calculated for the results of 20 seeds and compared with the results obtained in
Section 6.1.1.
6.1.3. Performance by the Missing Value Rate of the User Trajectory
As in Section 3.3, the missing value rate $\rho$ of trajectory $\tau_k$ indicates its sparsity. Hence, the classification performance may increase if only trajectories with low sparsity are learned, which is achieved by imposing a threshold $\rho_{\mathrm{th}}$ on the missing value ratio, i.e., $\rho \le \rho_{\mathrm{th}}$. Let $\tau_k$ be a trajectory generated from an element $P_k$ of the location datasets satisfying $r(P_k) \le \rho_{\mathrm{th}}$, where the function $r(\cdot)$ defined in Equation (1) is substituted into Equation (5). The dataset for month $m$ then consists of pairs of the feature generated from $\tau_k$ and the corresponding class $c$. By conducting a sensitivity analysis of the threshold $\rho_{\mathrm{th}}$, we can verify the effect of trajectory sparsity on the classification accuracy.
In this experiment, a sensitivity analysis is conducted over a range of thresholds $\rho_{\mathrm{th}}$, where $\rho_{\mathrm{th}} = 1$ indicates that all trajectories are used for training and testing. When sampling data with any random seed, we ensure that the datasets remain balanced across the age-group classes. The sizes of the training and test data in the datasets for each $\rho_{\mathrm{th}}$ value are listed in Table 1. Additionally, for the models for each month, we adopt the GMD components and the IDNN hyperparameters that yielded the maximum average accuracy in Section 6.1.1 and Section 6.1.2.
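A minimal sketch of the threshold-based filtering is shown below; the tuple layout of `samples` and the helper `train_and_evaluate` are hypothetical, and the threshold grid is a placeholder rather than the range used in this experiment.

```python
# Illustrative sketch of filtering samples by the missing value threshold (hypothetical names).
def filter_by_threshold(samples, rho_th):
    """Keep only (feature_vector, age_class) pairs whose missing value ratio is at most rho_th."""
    return [(x, c) for x, rho, c in samples if rho <= rho_th]

# for rho_th in (0.6, 0.7, 0.8, 0.9, 1.0):            # placeholder threshold grid
#     train_and_evaluate(filter_by_threshold(train_samples, rho_th),
#                        filter_by_threshold(test_samples, rho_th))
```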
Figure 8 summarizes the flow of all the experimental procedures.
6.2. Results and Discussion: Tuning GMD Components n for Model Performance
Based on
Section 6.1.1, the trends in the classification performance of each ML model for GMD components
n are verified.
Figure 9 shows the test accuracy values obtained for each number of GMD components $n$ and each month from January to June. The results are shown for the SVC with an RBF kernel, RF, and DNN, which are represented by red, blue, and green lines, respectively. In particular, the averages and standard deviations (1SD) of each model over the 20 seeds are indicated for each monthly dataset. The vertical and horizontal axes represent the accuracy and the GMD components $n$, respectively. The title of each panel presents the model that achieved the highest average accuracy, along with the GMD component number $n$ that yielded it. In addition, we conducted the Shapiro–Wilk test on the distribution of the 20 accuracy values at the component number where the average was maximal to confirm the normality of the distribution; the test results are indicated as ‘n.s.’, ‘*’, and ‘**’ according to the resulting $p$-values. The Shapiro–Wilk test is used so that confidence intervals for the classification accuracy of the ML models can be defined and the significance of each ML model relative to random six-class classification can be assessed.
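A compact sketch of this check is given below, assuming the 20 per-seed test accuracies are available as an array; the significance-level handling is simplified and illustrative.

```python
# Illustrative sketch: normality check of per-seed accuracies and comparison with the chance level.
import numpy as np
from scipy.stats import shapiro

def significance_vs_chance(accuracies, n_classes=6, k_sd=3.0):
    """Shapiro-Wilk test on the accuracy distribution, then check whether the chance level
    (1 / n_classes) lies below mean - k_sd * SD (a 3SD-style confidence bound)."""
    stat, p_value = shapiro(accuracies)                # small p-value: normality rejected
    mean, sd = np.mean(accuracies), np.std(accuracies, ddof=1)
    above_chance = (mean - k_sd * sd) > 1.0 / n_classes
    return p_value, above_chance
```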
Figure 10 shows the training accuracy values, with the vertical and horizontal axes providing the same information as in
Figure 9.
Figure 9 shows that for all months, the accuracy tends to increase as $n$ increases, particularly for the SVC and RF. This is because if the number of GMD components $n$ is very small, the RPs represent the target area only coarsely, and differences between the features of individual IDs are less likely to arise. In particular, models with minimal components significantly reduce the classification performance. Conversely, components $n$ that are too large cause the expressiveness to saturate and contribute little additional classification performance. The classification performance varies with the period of data acquisition; focusing on the maximal average for each month, the highest performance is achieved in January by the SVC, with an accuracy of 0.336 (0.013), and the lowest in June by the RF, with an accuracy of 0.238 (0.040), against a chance level of approximately $1/6 \approx 0.167$. Nonetheless, in every month, the normality of the accuracy distribution is confirmed, and the classification performance, excluding the results from the DNN, is higher than that of random classification ($1/6$) over a range of 3SD, i.e., a confidence interval of 99.7%. As a result, features computed only from sparse trajectories appear to include sufficient information to classify the user age groups.
Here, we focus on the learning trends of each model. The SVC achieves the highest performance in January and February, whereas the RF achieves superior accuracy from March to June. Notably, during the period from March to June, the RF consistently outperforms the other two models across all the components n. In contrast, the DNN achieves significantly lower accuracy than both the SVC and the RF in all months. This may be because the sample size of the training dataset is too small for the DNN to learn effectively. Therefore, the following discussion focuses primarily on the SVC and RF.
Focusing on the transitions in accuracy with respect to the GMD components
n, RF tends to reach accuracy saturation earlier than SVC across all months for the test data. This trend is also observed in the transitions in accuracy for each model on the training data, as shown in
Figure 10, although the training accuracy of SVC tends to increase more gradually than that of the test data. From this observation, it can be seen that although SVC tends to have lower accuracy than RF does, it appropriately handles the complexity associated with the increase in the dimensionality of feature vectors, effectively suppressing overfitting. This is particularly evident during the period from March to June. On the other hand, RF demonstrates high accuracy even with small values of
n, that is, low-dimensional feature vectors, but it also tends to overfit. From these results, it can be concluded that SVC exhibits higher generalization performance than RF does. Moreover, the RF achieves high accuracy even with low-dimensional feature vectors. In the computation of the GMD, an increase in the number of components
n leads to an increase in the computational cost; thus, the RF is appropriate for scenarios where the computational cost is a priority. Conversely, when the generalization performance is emphasized, SVC is preferred. These findings are based on the accuracy obtained within the hyperparameter space described in
Section 5.3. Hence, different results may emerge when a broader hyperparameter space is considered.
Note that because the GMD components n obtained in this experiment are results for only one area, i.e., Narashino, Chiba, Japan, the optimal number of components is not necessarily applicable to other areas and could depend on the scale and shape of the target area.
6.3. Results and Discussion: Comparison of the GMM and IDNN
The performance of the ML models that use the GMM-based and IDNN-based feature vectors is compared on the basis of Section 6.1.2.
Table 2 shows the average and standard deviation of the 20 accuracies on the training and test data by month. The header of the table indicates the combination of the feature extraction method and the ML model; e.g., GMM-SVC indicates an SVC trained with GMM-based features. In addition, the parameters providing the highest average accuracies for each feature extraction method on the test data are indicated in parentheses: the number of GMD components $n$ for the GMM, and the number of hidden layers and the number of neurons in each layer for the IDNN. These accuracies are obtained via the SVC, RF, and DNN with hyperparameters optimized in the space described in Section 5.3. The highest average accuracy for each month is indicated in bold.
The difference in accuracy between the training and test data serves as a critical indicator of the model’s generalization performance. Specifically, when training accuracy is substantially higher than test accuracy, it indicates that the model is overfitting. Conversely, a small difference in accuracy implies that the model exhibits high generalization performance. Therefore, evaluating the accuracy difference between training and test data provides a more detailed assessment of each model’s generalization capabilities.
The results show that the accuracy of GMM-SVC is the highest in January and February, whereas GMM-RF achieves the highest accuracy from March to June. GMM-SVC and GMM-RF consistently show higher accuracy than IDNN-SVC and IDNN-RF across all months. The scores of IDNN-SVC fall within the 1–3 SD range of the scores for GMM-SVC and GMM-RF, indicating relatively small differences in accuracy. On the other hand, focusing on the accuracy differences between the training and test data, the range for GMM-SVC and GMM-RF is 0.019–0.392, whereas IDNN-SVC and IDNN-RF exhibit a wider range of 0.358–0.679. This suggests that IDNN-SVC and IDNN-RF are prone to overfitting, whereas feature extraction via the GMM demonstrates better generalization performance. Note that, except for IDNN-SVC in March and May and IDNN-DNN in March and June, the Shapiro–Wilk results confirm that the accuracy distributions on the test data follow a normal distribution.
This experiment was conducted without restricting the missing value ratio, resulting in highly sparse trajectories. Consequently, the results indicate that the GMM-based features achieve higher accuracy and generalization performance than the IDNN-based features for sparse trajectories. The reliance of the IDNN on image-based trajectory transformations likely makes it more susceptible to noise, reducing its generalization performance. In other words, imputed trajectory information, when directly converted into pixel values, can lead to irregular spatial patterns in the image, causing noise accumulation. For sparse trajectories, the lack of structural information in the image representation may therefore limit the IDNN’s effectiveness. By contrast, GMM-based feature generation identifies RPs on the basis of the distribution of human flows and aggregates trajectories around those RPs. This approach may minimize the accumulation of noise caused by sparsity. As a result, GMM-based features demonstrate superior generalization performance for sparse trajectories.
6.4. Results and Discussion: Performance by the Missing Value Rate of the User Trajectory
On the basis of Section 6.1.3, we analyze the performance trends of the ML models using the GMM-based and IDNN-based feature vectors with respect to trajectory sparsity. First, to consider the effect of trajectory sparsity on classification performance, we visualize the trends of the accuracy values for the test and training data against the missing value threshold $\rho_{\mathrm{th}}$ in Figure 11 and Figure 12, respectively. For both figures, the vertical and horizontal axes represent the accuracy and the threshold $\rho_{\mathrm{th}}$, respectively. In addition, the red, blue, and green symbols represent the SVC, RF, and DNN, respectively, and the circle and cross symbols indicate the GMM- and IDNN-based features, respectively. Smaller differences between the training and test accuracies indicate higher generalization performance. In Figure 11, the horizontal and vertical dashed lines represent the best accuracy and the corresponding $\rho_{\mathrm{th}}$ among the combinations of feature extraction methods and ML models, respectively.
For the SVC and RF with both GMM and IDNN features, a lower $\rho_{\mathrm{th}}$ leads to higher accuracy between the threshold of the best accuracy, i.e., the dashed line in Figure 11, and $\rho_{\mathrm{th}} = 1$; thus, the sparsity of the trajectory degrades the model performance. Conversely, if $\rho_{\mathrm{th}}$ is too small, low performance can also occur, because an extremely small $\rho_{\mathrm{th}}$ yields few samples relative to the size of the feature space. For both the GMM and IDNN features, the DNN instead tends to decrease in accuracy as $\rho_{\mathrm{th}}$ decreases over part of the range, which contrasts with the trend observed for the SVC and RF, whose accuracy increases as $\rho_{\mathrm{th}}$ decreases. This suggests that, in this range of thresholds, the amount of training data may be insufficient to adequately train the DNN models. This result is supported by the findings shown in Figure 12 for the DNN on the training data. Because the trends in accuracy for the training and test data are similar, it can be inferred that insufficient data availability prevents the DNN from learning effectively, thereby hindering its performance improvement.
Focusing on the best accuracy values in
Figure 11, GMM-RF achieves the highest accuracy for all months. In particular, for April to June, the test accuracy of GMM-RF is higher than that of the other combinations for all the missing value ratios. In addition, the test accuracy of GMM-SVC is not as high as that of GMM-RF, but it is confirmed that GMM-SVC has high generalization performance because of the smaller differences between the accuracies of the training and test data. This is implied from
Figure 11 and
Figure 12. On the other hand, while the test accuracies of the GMM and IDNN features are close, the training accuracy of the IDNN features tends to be greater than that of the GMM features in part of the threshold range. This implies that the IDNN features are prone to overfitting in this experimental setting. This trend was also confirmed in Section 6.3, and its reasons are the same as those mentioned before; that is, the conversion of a trajectory into an image cannot eliminate noise. These results also suggest that the GMM-based features are more effective for handling sparse trajectories than IDNN-based features. Overall, the results indicate that GMM-based features provide a more reliable and effective approach for extracting information from sparse trajectories, making them a better fit for scenarios where semantic information is unavailable and data sparsity is a significant challenge. For small thresholds, overfitting is confirmed in GMM-SVC, GMM-RF, IDNN-SVC, and IDNN-RF. This may be because the number of samples is small relative to the feature space, so the models overfit the training data.
Here, we analyze the classification results via detailed metrics, such as precision, recall, and F-measure, in addition to accuracy. Note that we focus on only the GMM-RF results, which showed the highest accuracy in all months.
Table 3 shows each metric for the classification results on the test data by threshold $\rho_{\mathrm{th}}$ and month. The averages and standard deviations over the datasets of 20 seeds are indicated for each metric, with standard deviations in parentheses. Moreover, with respect to the confidence interval of classification accuracy, as in Section 6.2, Table 3 shows the $p$-values of the Shapiro–Wilk test for the distribution of accuracies over the 20 seeds. Because the $p$-values do not reject normality except for two threshold settings in May and one in June, we assume that the distributions of accuracies follow a normal distribution.
Focusing on the precision, recall, and F-measure under any parameter setting, we can conclude that the levels of classification performance across the classes are similar because these metric values are almost equivalent to the accuracy. This is a natural result because we handle a balanced dataset. For the unrestricted missing value rate, i.e., $\rho_{\mathrm{th}} = 1$, the average accuracy ranges from its maximum of 0.331 (0.011) in January to its minimum of 0.238 (0.004) in June, confirming that the classification performance is significantly higher than random classification in every month. Furthermore, through adjustment of the threshold $\rho_{\mathrm{th}}$, the classification performance increases such that the average accuracy reaches a maximum of 0.545 (0.064) in January and a minimum of 0.316 (0.011) in June. The normality of the accuracy distribution over the 20 seeds is confirmed, except for two threshold settings in May and one in June. These results show that with appropriate parameter settings, the constructed GMM-RF models achieve significantly higher classification accuracy than random classification. Note that the classification performance of GMM-SVC is not as high as that of GMM-RF, but GMM-SVC shows the same tendency and high generalization performance, as shown in Figure 11 and Figure 12. As a result, we infer that sparse spatial–temporal information, i.e., a sparse trajectory, contains sufficient information to classify the corresponding user age group.
Finally, the classification performance with respect to the data acquisition period is discussed. Focusing on the maximum accuracy of GMM-RF, the accuracy values from January to June are 0.545 (0.064), 0.476 (0.038), 0.342 (0.017), 0.335 (0.023), 0.331 (0.015), and 0.316 (0.011), respectively. The values for January and February are particularly high, whereas those from March to June are lower. This is because a state of emergency and a semi-state of emergency were declared in the target area from 8 January to 21 March and from 20 April to 1 August 2021, respectively. The state of emergency involved strong behavioral regulation, including orders to close restaurants and commercial facilities, whereas the semi-state of emergency involved weaker regulation limited to orders regarding hours of operation [
45]. Thus, in January and February, in which states of emergency were enforced, the restriction of irregular behavior (e.g., amusement trips to commercial and recreational facilities) among people’s activities clearly differentiated the behavior patterns of people in different user age groups. However, in March, when a state of emergency was also enforced, the classification performance was significantly lower than that in January and February. This could be because the states of emergency migrated to semistates of emergency in the middle of March, and people’s behavior changed remarkably. The classification performance also declined from April to June; this may be because people’s behavior became more diverse than that during the state of emergency. The monthly sample sizes shown in
Figure 5 and
Figure 6 indicate that the sample size from March to June is larger than that from January to February. This suggests that people’s behavior patterns gradually returned to normal during the period from March to June. Nonetheless, the average accuracy values are significantly higher than those of random classification in the range of 3SD for many parameter settings. These results also reveal that the sparse trajectory contains sufficient information for classifying user age groups without the use of any semantic information other than spatial–temporal information.
7. Conclusions
In this study, we conducted a performance evaluation and analysis of GMM-based features for sparse trajectories without semantic information. The experiments utilized three machine learning models, namely, SVC, RF, and DNN, to perform a detailed examination of the features. We also compared them with the IDNN, an existing feature extraction method for trajectories. The results showed that GMM-based features achieved higher classification accuracy and better generalization performance than IDNN-based features. In particular, the RF attained high accuracy, whereas the SVC maintained stable generalization performance. Moreover, a sensitivity analysis of the missing value rate, which indicates trajectory sparsity, revealed that the IDNN becomes more prone to overfitting as the sparsity increases, whereas the GMM-based approach sustains both accuracy and generalization performance. These contributions demonstrate that GMM-based features offer clear advantages over existing methods in terms of both accuracy and generalizability when dealing with sparse trajectories lacking semantic information. Consequently, this study shows that sparse trajectory data can be effectively utilized even without semantic information, expanding the potential for data utilization in various fields, including urban planning, transportation analysis, and public policy.
The limitations of this study are as follows:
Although simple linear interpolation was adopted for trajectory imputation, its impact on classification performance was not verified.
The proposed method was verified only for a single region. In addition, owing to constraints in the data acquisition period, the influence of behavioral changes from prepandemic norms during the COVID-19 era could not be taken into account.
The features did not contain the connection relationships of RPs.
In light of these points, our future research will verify the generality of the proposed model in terms of the area and period of location data and further consider the utilization of sparse spatial–temporal information.