Hierarchical Clustering Algorithm for Multi-Camera Vehicle Trajectories Based on Spatio-Temporal Grouping under Intelligent Transportation and Smart City

Wang, Wei; Xie, Yujia; Tang, Luliang

doi:10.3390/s23156909

Open AccessArticle

Hierarchical Clustering Algorithm for Multi-Camera Vehicle Trajectories Based on Spatio-Temporal Grouping under Intelligent Transportation and Smart City

by

Wei Wang

¹,

Yujia Xie

^1,* and

Luliang Tang

²

¹

College of Information Engineering, Nanjing University of Finance & Economics, Nanjing 210023, China

²

State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430072, China

^*

Author to whom correspondence should be addressed.

Sensors 2023, 23(15), 6909; https://doi.org/10.3390/s23156909

Submission received: 19 June 2023 / Revised: 28 July 2023 / Accepted: 1 August 2023 / Published: 3 August 2023

(This article belongs to the Section Sensor Networks)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

With the emergence of intelligent transportation and smart city system, the issue of how to perform an efficient and reasonable clustering analysis of the mass vehicle trajectories on multi-camera monitoring videos through computer vision has become a significant area of research. The traditional trajectory clustering algorithm does not consider camera position and field of view and neglects the hierarchical relation of the video object motion between the camera and the scenario, leading to poor multi-camera video object trajectory clustering. To address this challenge, this paper proposed a hierarchical clustering algorithm for multi-camera vehicle trajectories based on spatio-temporal grouping. First, we supervised clustered vehicle trajectories in the camera group according to the optimal point correspondence rule for unequal-length trajectories. Then, we extracted the starting and ending points of the video object under each group, hierarchized the trajectory according to the number of cross-camera groups, and supervised clustered the subsegment sets of different hierarchies. This method takes into account the spatial relationship between the camera and video scenario, which is not considered by traditional algorithms. The effectiveness of this approach has been proved through experiments comparing silhouette coefficient and CPU time.

Keywords:

intelligent transportation; smart city; computer vision; video GIS; multi-camera; vehicle trajectory; hierarchical clustering

1. Introduction

Vehicle trajectory data analysis is a research hotspot in intelligent transportation and smart city [1]. People can deeply understand their life trajectory, social behavior, environmental change, and urban evolution using video GIS, video object recognition, and visual analysis of video object trajectory [2,3]. In recent years, the urban video monitoring system has gradually developed from single-camera processing to multi-camera equipment joint analysis. The system generates a large number of trajectory data under the multi-camera joint monitoring system and records the movement of people, vehicles, and animals in various scenarios [4], with such characteristics as easy deployment, intuitive information, and rich media expression. Dividing each vehicle trajectory into a suitable cluster by measuring its similarity is a crucial and challenging aspect of trajectory analysis [5]. However, there are some areas for improvement in current clustering methods for video object trajectory research. Specifically, the analysis object is limited to image trajectory [6], ignoring the actual trajectory of the video object in geographical space. Additionally, the traditional trajectory clustering algorithm does not consider the geographic space relation of multi-cameras and only clusters datasets in a single camera and a single scenario, resulting in poor real-time performance and accuracy in large-scale cross-camera or even cross-camera group trajectory data.

Clustering large-scale trajectory data is challenging. First, the trajectory is complex, so the computational cost for clustering analysis is high. Second, the trajectories sampled by the multi-camera joint monitoring system are not equidistant, making it difficult to unify the dimensions between different video object trajectories, so reasonable point correspondence rules should be given to measure the difference in distance calculation. Finally, there is a vision-blind area and a long distance between camera groups, so the direct calculation without simplifying trajectories causes the local differences between trajectories to be overlooked due to the large global scope. The traditional algorithms only cluster the equidistant trajectories of sampling points under a single camera for small-scale datasets without considering the hierarchical relation between camera position, the field of view, and the motion of video objects between cameras and scenarios and without reasonable analysis of cross-group trajectory data. Therefore, this paper proposed a hierarchical clustering algorithm for multi-camera vehicle trajectories based on spatio-temporal grouping. Based on our previous work, we have done further research in this paper [7]. In this paper, the camera groups are divided according to the spatial distribution law of camera equipment in reality, which effectively makes up for the deficiency of traditional methods that do not take into account the spatial relationship between camera equipment. The point correspondence rule of optimal unequal length trajectories and the overlapping scale factor of trajectory distance under camera-joint system are proposed to effectively calculate the distance between unequal multi-camera trajectories and effectively reduce the computational cost compared with the traditional methods. In addition, this paper also proposes a method to automatically obtain labels of video object trajectories in the camera group and across camera groups, which can achieve better clustering results through additional useful supervision information.

This paper is mainly structured as follows: Section 2 introduces the relevant work in the research field involved; Section 3 presents the method description, including the spatialization of video object trajectory, scenarios and division rule of camera groups, point correspondence rule of optimal unequal length trajectory, overlapping scale factor of trajectory distance under camera-joint system, and the details of the supervised video object clustering algorithm within and between camera groups; Section 4 demonstrates the visualization effect of the algorithm on the dataset and evaluates the accuracy and efficiency of the method through experiments; Section 5 discusses the implications of this work, as well as the limitations of this work and how it will be addressed in future works; Section 6 presents the conclusions and prospects.

2. Related Work

The research fields involved in this paper include video-geographic scenario data fusion organization and trajectory clustering. The research status of relevant fields is as follows.

2.1. Video-Geographic Scenario Data Fusion Organization

The data fusion organization of video-geographic scenarios is the basis of video object analysis combining geospatial information. Based on the concepts of Multi Media GIS [8], Geo Video [9], and Video GIS [10], the metadata description method [11] and GPS correlation method [12] were constructed in the earlier research, and geographic retrieval and play of video images were realized by describing the geographic location of video frames.

In recent years, research has focused more on the fusion of video content and geographic scenarios [13]. The fusion of video content and geographic scene can be subdivided into the method of strengthening video through geographic scene and the method of strengthening geographic scene by video [14]. The output result of the former is enhanced video, and the video content is visually enhanced by using the spatial information in the geographic scene [15], while the latter is constructed and presented through the geographic scene [16]. According to the different mapping objects, it can be divided into video frame picture projection and video object projection. The video frame picture projection maps the video frame image to the geographic scene according to the camera parameters. The video object projection realizes the separation of the front and rear scenes through the related theory of computer vision and projects into the geographical scene. The picture projection of video frame includes the methods of correlation [17], fixed plane play [18], global mapping [19], and so on. The video object projection can be divided into the methods of foreground and background independent projection [20], foreground projection [21], and foreground abstraction [22].

2.2. Trajectory Clustering

Common in pattern recognition, data analysis, and machine learning, trajectory clustering is an efficient method for analyzing trajectory data, aiming to obtain space, time, and even potential information in trajectory data. It is widely applied in such fields as object motion prediction [23], traffic monitoring [24], behavior understanding [25], anomaly detection [26], and 3D reconstruction [27]. In addition, data representation, feature extraction, and distance measurement selection are the key preliminary work of trajectory clustering. For example, a trajectory can be represented as a vector and downsampled to a uniform length, so the Euclidean distance [28] can be used. Trajectories can also be considered as samples of the probability distribution, so Bhattacharyya distance [29] can be used to measure the distance between two trajectories.

According to the availability of labeled data, there are three trajectory clustering methods: unsupervised, supervised, and semi-supervised models. The unsupervised model aims to cluster the data without human experts supervising or labeling the data and obtain the reasoning function by analyzing the unlabeled dataset [30]. The supervised model is learned before trajectory clustering. The labeled data are usually used to learn the function that maps data to its label, the function that predicts unlabeled data clustering [31]. The labeled data often require heavy manual work by experts. The semi-supervised algorithm is trained by labeled data and adjusted by unlabeled data [32].

The unsupervised trajectory clustering algorithm does not rely on other prior knowledge but uses constructors to describe the implicit relationship between trajectory samples. The representative methods are densely clustering models [33], hierarchical clustering models [34], and spectral clustering models [35]. By directly defining the density-reachable rule, densely clustering models can obtain the maximum set of density connected samples as cluster clusters. Hierarchical clustering models are divided into “top-down [36]” type and “bottom-up [37]” type. The former regards each trajectory sample as a cluster and gradually merges into larger clusters by defining the similarity between clusters; the latter gradually divides the trajectory data set into smaller groups by using the idea of divide and conquer until it meets the requirements of clustering. Supervised trajectory clustering algorithm is represented by the K-NN algorithm [38] and statistical model method [39]. The K-NN algorithm uses the average nearest distance as the evaluation criterion to realize trajectory clustering. The statistical model method uses the Gaussian mixture model [40] and other statistical models to construct the trajectory category probability function to obtain the trajectory sample category. In addition to the above methods, in recent years, with the continuous extension of deep learning, there are a large number of deep learning algorithms [41,42]. Semi-supervised trajectory clustering algorithm [43,44] fully combines supervised clustering algorithm with unsupervised clustering algorithm to reduce the label labor cost and minimize the over-fitting problem in the clustering process. The basic idea is to update the classifier by classifying the trajectory data and clustering the new data.

In order to understand the three kinds of algorithms more clearly, we give the tabular overview of different trajectory clustering algorithms in Table 1.

3. Method Description

3.1. Spatialization of Video Object Trajectory

Building the mapping relation between each camera image square plane and geographical space object square plane is necessary to transform from image space coordinates to geographical space coordinates. This paper used the contact point between the video object subgraph and the ground as the locating point and sampled at a certain time interval [45]. The video object trajectory in the geographic scenario was obtained from a building mapping relation between image space and geographic space [46], as shown in Figure 1.

This study uses the homography matrix method to construct a mapping model. If the image coordinate of a point is

q

, and the geographic space coordinate is

Q

, then the homogeneous coordinate of

q

and

Q

can be represented as

q = {[x y 1]}^{T},

(1)

Q = {[X Y Z 1]}^{T} .

(2)

If the mapping matrix is

M

, the relational expression between

q

and

Q

is

q = M Q .

(3)

The image plane is scaled, translated, and rotated to a geospatial plane, so the mapping matrix

M

can be decomposed into

M = s \cdot W \cdot R,

(4)

where

s

is the scale factor;

W

is the camera translation transformation matrix;

R

is a 3 × 4 dimensional rotation transformation matrix.

W = [\begin{matrix} f_{u} & 0 & u \\ 0 & f_{v} & v \\ 0 & 0 & 1 \end{matrix}]

(5)

R = [r_{1} r_{2} r_{3} e]

(6)

where

f_{u}

and

f_{v}

represent the product of the physical focal length of the lens and the sensor size in the horizontal and vertical axis directions of each unit, respectively;

u

and

v

represent the offset of the imaging center relative to the principal optic axis on the horizontal and vertical axis, respectively;

r_{1}

,

r_{2}

, and

r_{3}

represent the rotation relation of the coordinate system in the X-axis, Y-axis, and Z-axis in the physical space, respectively;

e

is the translation relation between coordinate systems.

When using the homography matrix method, it is assumed that the camera field of view plane in geographic space is horizontal—that is, Z = 0 at the plane. Thus, the mapping relation between image and geographic space can be regarded as the mapping from one plane to another. To simplify the calculation, the Z in

Q

and the

r_{3}

rotated about the Z-axis in

R

are removed. Hence, the homography matrix

M

is simplified as

M = s \cdot [\begin{matrix} f_{u} & 0 & u \\ 0 & f_{v} & v \\ 0 & 0 & 1 \end{matrix}] \cdot [r_{1} r_{2} r_{3} e] .

(7)

The geospatial coordinates of the video object trajectory can be obtained according to the solution of the matrix

M

.

3.2. Scenarios and Division Rule of Camera Groups

The closely adjacent location cameras within the group and the overlapped camera field of view are preferred when cameras are divided into groups. The weigh options rules between the two can be specified as

S e l_{r u l e} = \frac{1}{ε} \underset{cam \in C}{argmax} P (D| h) P (h),

(8)

where

C

represents all cameras to be divided, and

P (h)

represents the nearest neighbor in a position, which can be stated as the reciprocal of the distance between positions:

P (h) = \frac{1}{\sum ‖l o c ({c a m}_{u}) - l o c ({c a m}_{v})‖} .

(9)

P (D | h)

is the degree of view overlap, expressed by the area of view overlap:

P (D| h) = \sum a r e a ({c a m}_{u}) \cup a r e a ({c a m}_{v}) .

(10)

ε

represents the normalized constant parameter.

3.3. Point Correspondence Rule of Optimal Unequal Length Trajectories

Inspired by the dynamic time warping [47], to better compare the distances between two trajectories with different lengths, it is necessary to establish one-to-many and many-to-one matching so that the two trajectories have the same pattern of wave troughs and peaks perfectly matched. Two unequal-length trajectory sequences are known:

{T r a}_{x} = \{p_{x, j} (j = 1, \dots, {l e n}_{x})\}, {T r a}_{y} = \{p_{y, j} (j = 1, \dots, {l e n}_{y})\}, {l e n}_{x} \neq {l e n}_{y} .

(11)

The trajectory sequence may not have equidistant time points. A feature space represented by

F

is fixed. Its local distance measure is defined as a function:

c : F \times F \to R \geq 0 .

(12)

The defined sequence is the point correspondence sequence of optimal unequal-length trajectory:

S e q (S) = (s_{1}, s_{2}, \dots, s_{e}, \dots, s_{l e n (S e q (S))}) .

(13)

s_{e} = (n_{e}, m_{e}) \in [1 : N] \times [1 : M]

(14)

Then, the sequence

S e q (S)

satisfies

\{\begin{matrix} s_{1} = (1,1) \\ s_{l e n (S e q (S))} = (N, M) \\ \begin{matrix} n_{1} \leq n_{2} \leq . . . \leq n_{l e n (S e q (S))}, m_{1} \leq m_{2} \leq . . . \leq m_{l e n (S e q (S))} \\ s_{e + 1} - s_{e} \in {(1,0), (0,1), (1,1)} \end{matrix} \end{matrix} .

(15)

The distance cost between

{T r a}_{x}

and

{T r a}_{y}

is written as

c_{p} ({T r a}_{x}, {T r a}_{y}) = \sum_{l = 1}^{L} c ({T r a}_{x}_{n_{l}}, {T r a}_{y}_{m_{l}}) .

(16)

The optimal total cost between

{T r a}_{x}

and

{T r a}_{y}

is shown as the minimum value of

c_{p} ({T r a}_{x}, {T r a}_{y})

:

\min \{c_{p} ({T r a}_{x}, {T r a}_{y})\} .

(17)

3.4. Overlapping Scale Factor of Trajectory Distance under Camera-Joint System

This part explains the increase in distance calculation due to the time overlap of multiple cameras in the camera-joint system. Then it gives a method to obtain the scale factor of trajectory distance overlap to offset the increase in distance.

Assuming

G

as a camera group,

G = {{C a m}_{1}, {C a m}_{2}, {C a m}_{3}, \dots \dots, {C a m}_{n}} .

(18)

For

\forall

video object

{O b j}_{i} \in G

, it is assumed that

{O b j}_{i}

is captured by

G_{i} = \{{C a m}^{(1)}, {C a m}^{(2)}, \dots, {C a m}^{(s)}, \dots, {C a m}^{(t)}\}

, where

G_{i} \subseteq G .

(19)

According to the capture sequence, the trajectory corresponding to the video object can be expressed as a series of nodes:

{T r a}_{i} = \{p_{i, j}, (j = 1, \dots, {l e n}_{i})\},

(20)

p_{i, j} = ({C a m}_{i, j}, x_{i, j}, y_{i, j}, t_{i, j}),

(21)

where

{C a m}_{i, j} \in G_{i},

(22)

which represents that the video object is captured by

{C a m}_{i, j}

at this node.

Assuming the two trajectories

{T r a}_{1}

and

{T r a}_{2}

under the camera group formed by cameras

{C a m}_{1}

and

{C a m}_{2}

are shown in Figure 2.

In

{T r a}_{1}

, the total duration under two cameras is

t_{1}

, which can be divided into

t_{2}, t_{3}, t_{4}

, in which

t_{3}, t_{4}

are captured by only one camera while

t_{2}

is captured by two cameras at the same time;

{T r a}_{1}

is only captured by

{C a m}_{1}

during time

t_{5}

. If the distance between the two is calculated directly, the density of trajectory nodes will increase as cameras overlap in time

t_{2}

of

{T r a}_{1}

, leading to an increase in the distance value. Therefore,

{T r a}_{1}

is obtained first to eliminate the distance of the added part. The total global duration of

{T r a}_{2}

is

α_{1}

=

t_{1}

,

α_{2}

=

t_{5}

, respectively, while the actual calculation duration is

β_{1}

=

t_{1}

+

t_{2}

,

β_{2}

=

t_{5}

. Therefore, the scale factor of overlapping trajectories of

{T r a}_{1}

and

{T r a}_{2}

can be obtained as follows:

k_{i} = \frac{α_{i 1} + α_{i 2}}{β_{i 1} + β_{i 2}} .

(23)

This factor should be multiplied in the distance calculation to eliminate the increase in distance due to camera overlap.

This part details the trajectory clustering algorithm within the camera group (SCAIMG) and between camera groups (SCAIBG). The process of the algorithm is shown in Figure 3.

3.4.1. Supervised Clustering Label Acquisition

Supervised clustering results can be obtained with more additional useful supervised information than unsupervised clustering. For the video object trajectory under the camera joint system in this chapter, the “main camera” of video objects is defined as the label for the supervised clustering of trajectory samples. For the trajectory between camera groups, the number sequence of the camera group passed through is used as the label of the supervised clustering of the trajectory sample.

In the camera group of the camera-joint system, the camera number with the most capture times should be the same among video objects with the same characteristics. Therefore, for accurate trajectory data clustering, we use the camera number with the highest capture times corresponding to each video object as the label of supervised clustering:

l a b e l ({T r a}_{i}) = \max_{{Cam}_{i, j} correspond to {Tra}_{i}} {N U M (C a m}_{i, j}),

(24)

Furthermore, we used the camera group number as the label of supervised clustering between camera groups.

3.4.2. Trajectory Clustering Algorithm within the Camera Group

The trajectory clustering algorithm within the camera group aims to obtain a group of clustering centers. First, the video objects in the camera group are spatialized, followed by clustering their trajectory. The pseudo-code of trajectory clustering within the group (SCAIMG) is shown in Algorithm 1.

Algorithm 1: Supervised trajectory clustering algorithm considering camera information in the multi-camera collaborative monitoring group (SCAIMG)

1 : Input : A camera group G = {{C a m}_{1}, {C a m}_{2}, {C a m}_{3}, \dots \dots, {C a m}_{n}}

, the trajectory set : T_{G} = \{{T r a}_{i}\}, {T r a}_{i} = \{p_{i, j}, (j = 1, \dots, {l e n}_{i})\}

.

2 : Output : a group of trajectory clustering centers C = {c_{1}, c_{2}, \dots c_{λ}}

.

3 :

Obtain the capture time of each video object under each camera according to Section 4.1

and use the camera number with the most capture times as the label : l a b e l ({T r a}_{i}) = \max_{{Cam}_{i, j} \in {Tra}_{i}} {N U M (C a m}_{i, j})

. Initialize the new set Φ = T_{G}

, set clusters number K

, initialize set a l l_v e c t o r s_u p d a t e d = [0,0, \dots, 0] (s i z e = K)

; Randomly select K

samples from Φ

as initial cluster centers C \leftarrow {c_{1}, c_{2}, \dots c_{λ}}

; Set the number of iterations parameter e p o c h s

; Set the learning rate η \in (0,1)

4 : According to Equation (23), obtain the overlap scale factor of each trajectory : k_{i} \leftarrow \frac{α_{i 1} + α_{i 2}}{β_{i 1} + β_{i 2}}

.

5 : s t e p = 0

6 : While not each vector in C

is updated and step < e p o c h s

:

7 : For each sample s from Φ

:

8 : calculate the distance between s

to each center c_{τ}

: d (s, c_{τ}) \leftarrow m i n \{c_{p} (s, c_{τ})\}

9 : c_{τ^{*}} \leftarrow c_{τ}, w h e r e τ^{*} = \arg \min_{τ \in \{1,2, \dots, K\}} d (s, c_{τ})

10 : If l a b e l (s) = = l a b e l (c_{τ}) :

11 : s^{'} \leftarrow s / (1 - η) \cdot d (s, c_{τ})

12 : Else :

13 : s^{'} \leftarrow s / (1 + η) \cdot d (s, c_{τ})

14 : a l l_v e c t o r s_u p d a t e d [s] = 1

15 : Return Φ

In Algorithm 1, line 11 and line 13 represent the operations of “approaching” and “moving away”, respectively. The schematic diagram of the approaching operation is shown in Figure 4. It is assumed that

l a b e l (s) = = l a b e l (c_{τ})

is known, and every node in the trajectory

s

moves towards

c_{τ}

.

The trajectory of the “moving away” operation moves in the opposite direction in Figure 4.

3.4.3. Trajectory Clustering Algorithm between Groups

We can combine the entry and departure points in each camera field of view of video object trajectories to reflect the dynamic characteristics of group trajectories [48,49] because the range of trajectories between groups is larger than that within a group. The entry point and departure point of the vehicle video object under each camera group are taken as the trajectory sampling points to form multiple trajectory subsegments, which are then hierarchically clustered.

The sampled trajectory is supposed as:

f_{i} = \{p_{i, j}, (j = 1, \dots, {l e n}_{f_{i}})\},

(25)

p_{i, j} = ({G r o u p}_{i, j}, x_{i, j}, y_{i, j}, t_{i, j}),

(26)

where

p_{i, j}

represents the entry point or departure point information, and

{G r o u p}_{i, j}

represents the

j

-th camera group number that video

f_{i}

passes through.

x_{i, j}, y_{i, j}

represent the coordinates, and

t_{i, j}

represents the timestamp.

Figure 5 shows the trajectory

f_{1}, f_{2}, f_{3}

in camera

G r o u p s 1 - 3

.

f_{1}

passes through

G r o u p 1

,

G r o u p 2

, and

G r o u p 3

, with five hierarchies;

f_{2}

passes through

G r o u p 1

and

G r o u p 2

, with three hierarchies;

f_{3}

passes through

G r o u p 3

, with one hierarchy.

f_{1}, f_{2}

contain trajectory subsegments of three hierarchies starting at

G r o u p 1

and ending at

G r o u p 2

;

f_{1}, f_{3}

contain trajectory subsegments in

G r o u p 3

, with one hierarcy.

We proposed a trajectory clustering algorithm between camera groups with multi-hierarchies to reasonably analyze the cross-camera group trajectories. The algorithm aims to obtain the

λ

sets of cluster centers:

\{C (1), C (3), \dots C (2 h - 1), \dots C (λ)\}, h \in N,

(27)

where

C (2 h - 1) = \{c_{2 h - 1,1}, c_{2 h - 1,2} \dots, c_{2 h - 1, k}, {\dots, c}_{2 h - 1, l}\}, h, k \in N,

(28)

c_{2 h - 1, k} = a v e r a g e (\{f_{2 h - 1, m}, f_{2 h - 1, m + 1}, \dots, f_{2 h - 1, m + 2 (λ - 1)}\}),

(29)

\{f_{2 h - 1, m}, f_{2 h - 1, m + 1}, \dots, f_{2 h - 1, m + 2 (λ - 1)}\} \in c_{2 h - 1, k},

(30)

where

λ

refers to the number of cross-camera groups of trajectories within a group,

C (2 h - 1)

refers to the cluster center set when the number of cross-camera groups of trajectories within groups is

2 h - 1

, and

k

refers to the

k

-th cluster center.

The pseudo-code of Supervised trajectory clustering algorithm considering camera information between groups (SCAIBG) is shown in Algorithm 2.

Algorithm 2: Supervised trajectory clustering algorithm considering camera information between groups (SCAIBG)

1 : Input : a group cross - camera groups video object trajectories : f_{i} = \{p_{i, j}, (j = 1, \dots, l e n_{f_{i}})\}

2 : Output : λ groups of trajectory clustering centers {C (1), C (2), \dots, C (λ)

}.

3 : For t \leftarrow 1 to λ :

4 : Initialize a new set Φ

5 : For u \leftarrow 1 to o b j_n u m

:

6 : If C n u m (f_{o b j (u)}) \geq λ

: / / C n u m (f_{o b j (u)})

indicates the total number of camera groups passed by o b j (u)

7 : For v \leftarrow 0 to u - λ - 1

:

8 : Add [f_{u}^{(v)}, f_{u}^{(v + 1)}, \dots, f_{u}^{(v + λ)}] to

Φ / / f_{u}^{(v)}

represents the v - th trajectory subsegment

of the u - th video object

9 : Set clusters number K_{λ}

10 : Randomly select K_{λ} samples from

Φ as Initial cluster centers : \{C (λ)\} = \{c_{t, 1}, c_{t, 2}, \dots, c_{t, K_{λ}}\}

11 : Set e p o c h s

as the maximum number of iterations

12 : Repeat :

13 : For each sample s from Φ

:

14 : c_{t, w^{*}} \leftarrow c_{t, w}, w h e r e w^{*} = a r g \min_{w \in \{1, 2, \dots, K_{λ}\}} d (s, c_{t, w})

15 : If label (s) = label (c_{t, w^{*}})

:

16 : s^{'} = s / (1 - η) \cdot d (s, c_{t, w^{*}})

17 : Else

18 : s^{'} = s / (1 + η) \cdot d (s, c_{t, w^{*}})

19 : Until iterations exceed e p o c h s

or there is no change in cluster centers

20 : t \leftarrow t + 1

21 : Return Φ

4. Experimental Analysis

This part briefly introduces the experimental conditions and data and the trajectory clustering results within and between camera groups. Compared with traditional clustering algorithms, the advantages of the proposed algorithm are demonstrated by silhouette coefficient evaluation, and the time complexity is analyzed to prove the algorithm’s effectiveness.

4.1. Experimental Conditions and Data

The experimental data in this paper come from the CityFlowV2 dataset [50] newly launched by NVIDIA, the first large-scale dataset in the world to support cross-camera vehicle tracking, accommodating more than ten intersections and 40 cameras at the same time, with a spatial span greater than 3 km². This dataset contains high-definition synchronous videos collected in Dubuque, the United States, including scenarios such as residential areas and expressways.

The experimental data in this paper are all the original dataset’s video sequences. The experimental environment comprises software (Windows10, python 3.6 + sklearn0.0) and hardware (Intel (R) Core (TM) i7-10510U CPU @ 1.80 GHz 2.30 GHz, RAM 12.0 GB, and NVIDIA GeForce MX250). The algorithm to obtain the video object trajectory by preprocessing is as follows: the video dynamic object detection algorithm is Mask-RCNN [51], the tracking algorithm is TNT [52], and the cross-camera re-recognition algorithm is Deep SORT [53].

4.2. Camera Grouping of the CityFlowV2 Dataset

The CityFlowV2 dataset is divided into groups according to Equations (9) and (10). Partial division results are shown in Figure 6. In addition to the groups shown in Figure 6, Cam 1 to Cam 5 are a group, and Cam 6 to Cam 9 are a group.

4.3. Determination of the Number of Cluster Centers

The number of cluster centers is determined using the elbow method [54]. Due to the continuous optimization of the cluster center and smaller quadratic sum, the first inflection point appearing in the change of the quadratic sum is chosen as the best K value.

m i n (S S E) .

(31)

S S E = \sum_{k}^{i = 1} \sum_{p \in C_{i}} {|p - m_{i}|}^{2} .

(32)

For SCAIMG and SCAIBG, the cluster center value of each camera group is determined using the elbow method, as shown in Figure 7 and Figure 8. The number of cluster centers of each camera is the abscissa K corresponding to the blue box.

According to Figure 7 and Figure 8, we can respectively determine the reasonable number of cluster centers K according to the number of cluster centers in each group—the inflection point in the curve of average distortion degree.

4.4. Determination of the Initial Cluster Center

If the initial cluster centers for SCAIMG and SCAIBG are randomly selected, this usually results in slightly different clusters at the end when the clustering algorithm is rerun-ed. In order to solve the above problem, we refer to the work of the other literature [55], make improvements, and propose the initial cluster center acquisition method.

Assume that there are

L

. trajectories for clustering, and the number of supervised clustering label categories is

ζ

. First, a trajectory sample is selected from each category—that is, a total of

ζ

trajectory samples are selected as the initial cluster centers, and then the sample with the largest average distance from

ζ

. centers is selected from the remaining

L - ζ

samples as the

ζ + 1

-th initial cluster center, and so on until K initial cluster centers are selected.

4.5. Algorithm Results and Effect Evaluation

4.5.1. Trajectory Clustering Results within a Group

First, we demonstrate the visualization results of clustering ten camera groups with overlapping cameras in each group. For example, the visualization effect of the trajectories in group 1 is shown in Figure 9:

The trajectories under each camera group were obtained in chronological order. The clustering results of SCAIMG in each group are shown in Figure 10. The parameters are set to

e p o c h s = 500, η = 0.01

. The red line represents the vehicle trajectory, while the green line is the cluster center.

Through the visualization effect, a small number of cluster centers represent the overall trend of trajectory data of 10 camera groups.

4.5.2. Trajectory Clustering Results between Camera Groups

In this paper, we performed clustering analysis on trajectory subsegments with three and five hierarchies. There are trajectory subsegments with seven, nine, and eleven or even more hierarchies. However, in these cases, there is a small number of trajectory subsegments, making the clustering analysis lose its practical significance because clustering is to extract a large number of trajectories of the overall trend.

The number of the trajectory subsegments at different hierarchies is shown in Table 2:

The results of vehicle trajectory clustering between groups with three hierarchies (upper) and five hierarchies (lower) are shown in Figure 11.

The trajectories between groups have a larger geographical range than those within a group. It can be seen from the visualization effect that the SCAIBG has achieved the overall trend of vehicles moving between groups. The green line represents all vehicle trajectories between groups, while the arrows in other colors represent the overall trend of vehicles moving at different hierarchies.

4.5.3. Cluster Effect Evaluation

The silhouette coefficient [56] is used to compare and analyze the traditional DBSCAN-based method [57] with the proposed algorithm to verify the effectiveness of the proposed method. The average distance between the sample and other samples in the same cluster and the average distance between the sample and the next nearest cluster are combined for evaluation:

S i l h o u e t t e C o e f f i c i e n t (ζ) = |1 - a (ζ) / b (ζ)| \in [- 1, 1] .

(33)

Figure 12 shows the

S i l h o u e t t e C o e f f i c i e n t

comparison of trajectory clustering within a group.

Figure 13 shows the silhouette coefficient comparison of trajectory clustering between groups.

For SCAIMG, the point correspondence of trajectory is considered when trajectory clustering within a group is so that the unequal-length trajectories can move reasonably according to the corresponding relation. The distance overlapping scale factor is used to offset the unreasonable distance increase between trajectories caused by camera overlap to obtain accurate trajectory centers. However, the traditional algorithm does not consider the corresponding relation between trajectories. It can only obtain the approximate distance between trajectories when measuring the trajectory similarity, so the cluster center obtained from the clustering results can not accurately represent the overall trend of trajectories. In addition, the camera number with the most capture times is used as the clustering label for supervised clustering, considering the spatial relationship between the camera and the video scenario. Figure 12 reveals that the SCAIMG is superior to the traditional trajectory clustering algorithm in clustering each camera group in large-scale vehicle trajectory data such as CityFlowV2.

For the trajectories between groups, similar to SCAIMG, SCAIBG takes group number as the cluster label for supervised clustering and considers the spatial position of camera groups and the spatial relation of video scenario, so the clustering effect is also better than that of the traditional algorithm, as shown in Figure 13.

4.5.4. Algorithm Time Analysis

For SCAIMG, it is assumed that there are

n

video objects. It takes

O (n)

. to obtain the overlap scale factor for each trajectory. The corresponding relation and distance between each pair of trajectories can be calculated before the iteration. The corresponding relation between each pair of trajectories can be accelerated to

O (n)

through coarse-graining, projection, and fine-graining [58], so it takes time

\leq n \cdot O (n) = O (n^{2})

, in general,

T_{M} (n) \leq O (n^{2}) .

(34)

Therefore, the algorithm maintains linear time complexity, proving that the algorithm is effective.

Under the hardware given, the CPU time required by SCAIBG on the CityFlowV2 dataset is shown in Table 3.

The algorithm can obtain clustering results in a few seconds or even one second in groups other than groups 2 and 3. The longer time in groups 2 and 3 is because some trajectories within the group are long, taking it extra time to calculate the distance. Overall, SCAIMG can quickly extract the cluster center of large-scale vehicle trajectories.

For the SCAIBG, the clustering results in

λ

group of prototype vectors, with

o b j_n u m

video objects. For each video object

o b j (u)

, it passes through

C n u m (f_{o b j (u)})

camera groups. Therefore, there should be

u - λ - 1

subsegment of the trajectory under hierarchy

u

. It takes

O (K_{t})

to randomly select

K_{t}

samples. In general, if the iterations in the learning phase reach

e p o c h s

or the cluster center does not change, it will take a time of

T_{B} (n) \leq O (I t e r_{m a x} \cdot \sum_{u = 1}^{o b j_{n u m}} C n u m (f_{o b j (t)}) \cdot K_{t}) = O (n) .

(35)

I t e r_{m a x}

,

o b j_{n u m}

,

C n u m (f_{o b j (t)})

, and

K_{t}

represent the maximum number of iterations, the number of video objects, the number of cross-camera trajectories corresponding to video objects, and the number of cluster centers, respectively. The linear time complexity of the algorithm proves that the algorithm is effective.

Under the hardware given, the CPU time required for different hierarchies of trajectories is shown in Table 4.

SCAIBG can effectively simplify the calculation of regular distance by extracting the start and end points of the trajectory. Therefore, it can be seen from Table 4 that the SCAIBG performs quite fast in extracting cluster centers between groups in large-scale vehicle datasets.

5. Discussion

The purpose of “Hierarchical Clustering Algorithm for Multi-camera Vehicle Trajectories” proposed in this paper is to represent vehicle video objects with similar motion patterns by obtaining different levels of video object clustering centers as the representative of vehicle video object motion patterns. The method in this paper has and is not limited to the following practical applications:

(1): Multi-scale vehicle path prediction. According to priori theories, we can combine time information to analyze the multi-scale prediction of vehicle moving path.
(2): Urban Planning and Road Planning. Different levels of clustering results, combined with road constraints, provide effective experience and technical support for urban planning departments and traffic management departments.
(3): Trajectory outliers detection. The clustering results can provide a prerequisite for vehicle trajectory outliers detection. Trajectory outliers are a small number of trajectories that are obviously different from other trajectories data in the trajectory data set, so vehicle trajectory outliers can be analyzed through the distance between the vehicle trajectories and clustering centers.

In the above applications, we think that the trajectory clustering algorithm should ensure high accuracy and efficiency. High accuracy is the premise of the accuracy of the follow-up work, and efficiency is the rigid requirement of some practical applications. For example, path prediction analysis is applied to navigation planning and trajectory anomaly detection to security departments. Rapid cluster analysis enables these applications to be implemented safely.

In this paper, the elbow method is used to determine the optimal number of clusters in clustering analysis, but this method may have some shortcomings in practical application, because the vehicle trajectory has road constraints, and there is a deviation simply according to the elbow method to determine the clustering center without taking into account the actual situation.

Therefore, in the future scientific research work, the final number of clustering centers should be determined by combining the algorithm and road constraints.

6. Conclusion and Prospects

This paper proposed a multi-camera vehicle trajectory hierarchical clustering algorithm based on spatio-temporal grouping, aiming at the problem that the traditional trajectory clustering algorithm did not consider the camera position information, the field of view, and the hierarchical relation of the video object movement between the camera and the scenario. This method used camera number and group number as labels and was a supervised clustering algorithm considering camera information applied to trajectories within and between groups, respectively, enabling trajectories to be classified into reasonable clustering centers.

The experiment showed the clustering results and the comparison and analysis of silhouette coefficients between the proposed method and other clustering methods, and the time complexity analysis proved the proposed algorithm’s accuracy, effectiveness, and high efficiency. The limitation of this paper lies in the lack of trajectory clustering analysis combined with spatial semantic features, which will be improved in future research.

Author Contributions

Conceptualization, W.W., Y.X. and L.T.; methodology, W.W.; software, W.W.; formal analysis, W.W., Y.X. and L.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (41801305) and the Open Research Fund of State Key Laboratory of Surveying, Mapping and Remote Sensing Information Engineering, Wuhan University (21S03).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

CityFlowV2 dataset at https://www.aicitychallenge.org (accessed on 2 July 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

Srivastava, S.; Delp, E.J., III. Video-based real-time surveillance of vehicles. J. Electron. Imaging 2013, 22, 041103. [Google Scholar] [CrossRef]
Zhang, G.; Jia, S.; Zhang, X.; Li, X. Saliency-based foreground trajectory extraction using multiscale hybrid masks for action recognition. J. Electron. Imaging 2018, 27, 053049. [Google Scholar] [CrossRef]
Yan, X.; Yang, J.; Liu, Y.; Song, L. Multimodal based attention-pyramid for predicting pedestrian trajectory. J. Electron. Imaging 2022, 31, 053008. [Google Scholar] [CrossRef]
Wang, Z.; Yuan, X. Visual analysis of trajectory data. J. Comput.-Aided Des. Comput. Graph. 2015, 27, 9–25. [Google Scholar]
You, W.; Dong, C.; Wu, Q.; Qu, Y.; Wu, Y.; He, R. Joint task scheduling, resource allocation, and UAV trajectory under clustering for FANETs. China Commun. Engl. 2022, 19, 15. [Google Scholar] [CrossRef]
Su, J.; He, X.; Qing, L.; Niu, T.; Cheng, Y.; Peng, Y. A novel social distancing analysis in urban public space: A new online spatio-temporal trajectory approach. Sustain. Cities Soc. 2021, 68, 102765. [Google Scholar] [CrossRef]
Wang, W.; Xie, Y. Multi-Level Clustering Algorithm for Pedestrian Trajectory Flow Considering Multi-Camera Information. In Proceedings of the 2022 2nd International Conference on Computer Science, Electronic Information Engineering and Intelligent Control Technology (CEI), Virtual, 23–25 September 2022; pp. 691–698. [Google Scholar]
Charou, E.; Kabassi, K.; Martinis, A.; Stefouli, M. Integrating multimedia GIS technologies in a recommendation system for geotourism. In Multimedia Services in Intelligent Environments; Springer: Berlin/Heidelberg, Germany, 2010; pp. 63–74. [Google Scholar]
McDermid, G.J.; Franklin, S.E.; LeDrew, E.F. Remote sensing for large-area habitat mapping. Prog. Phys. Geogr. 2005, 29, 449–474. [Google Scholar] [CrossRef]
Navarrete, T.; Blat, J. VideoGIS: Segmenting and indexing video based on geographic information. In Proceedings of the 5th AGILE Conference on Geographic Information Science, Palma, Spain, 25–27 April 2002; p. 9. [Google Scholar]
Han, Z.; Kong, Y.; Qin, Q.; Wang, W. Geographic stereo video data analysis and model design. Geogr. Geo-Inf. Sci. 2013, 29, 1–7. [Google Scholar]
Feng, J.; Song, H. Analytical method for mobile elements in geo-video using random graph grammar. Geomat. Inf. Sci. Wuhan Univ. 2014, 39, 206–209. [Google Scholar]
Xie, Y.; Wang, M.; Liu, X.; Mao, B.; Wang, F. Integration of multi-camera video moving objects and GIS. Int. J. Geo-Inf. 2019, 8, 561. [Google Scholar] [CrossRef] [Green Version]
Milosavljević, A.; Rančić, D.; Dimitrijević, A.; Predić, B.; Mihajlović, V. A Method for Estimating Surveillance Video Georeferences. ISPRS Int. J. Geo-Inf. 2017, 6, 211. [Google Scholar] [CrossRef] [Green Version]
Lewis, P.; Fotheringham, S.; Winstanley, A. Spatial video and GIS. Int. J. Geogr. Inf. Sci. 2011, 25, 697–716. [Google Scholar] [CrossRef]
Walton, S.; Berger, K.; Ebert, D.; Chen, M. Vehicle object retargeting from dynamic traffic videos for real-time visualization. Vis. Comput. 2014, 30, 493–505. [Google Scholar] [CrossRef]
Du, R.; Bista, S.; Varshney, A. Video fields: Fusing multiple surveillance videos into a dynamic virtual environment. In Proceedings of the 21st International Conference on Web3D Technology, Anaheim, CA, USA, 22–24 July 2016; pp. 165–172. [Google Scholar]
Wu, C.; Zhu, Q.; Zhang, Y.; Du, Z.; Zhou, Y.; Xie, X. An adaptive organization method of geovideo data for spatio-temporal association analysis. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2015, 29. [Google Scholar] [CrossRef] [Green Version]
Cho, Y.; Park, J.; Kim, S.; Le, K.; Yoon, K. Unified framework for automated person re-identification and camera network topology inference in camera networks. arXiv 2017, arXiv:1704.07085. [Google Scholar]
Jian, H.; Liao, J.; Fan, X.; Xue, Z. Augmented virtual environment: Fusion of real-time video and 3D models in the digital earth system. Int. J. Digit. Earth 2017, 10, 1177–1196. [Google Scholar] [CrossRef]
Loy, C.; Xiang, T.; Gong, S. Time-delayed correlation analysis for multi-camera activity understanding. Int. J. Comput. Vis. 2010, 90, 106–129. [Google Scholar] [CrossRef]
Mehboob, F.; Abbas, M.; Rehman, S.; Khan, S.A.; Jiang, R.; Bouridane, A. Glyph-based video visualization on Google Map for surveillance in smart cities. EURASIP J. Image Video Process. 2017, 2017, 28. [Google Scholar] [CrossRef] [Green Version]
Chen, Z.; Shen, H.T.; Zhou, X.; Zheng, Y.; Xie, X. Searching trajectories by locations: An efficiency study. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, Indianapolis, IL, USA, 6–10 June 2010; pp. 255–266. [Google Scholar]
Gurung, S.; Lin, D.; Jiang, W.; Hurson, A.; Zhang, R. Traffic information publication with privacy preservation. ACM Trans. Intell. Syst. Technol. (TIST) 2014, 5, 44. [Google Scholar] [CrossRef]
Yao, T.; Wang, Z.; Xie, Z.; Gao, J.; Feng, D.D. Learning universal multiview dictionary for human action recognition. Pattern Recognit. 2017, 64, 236–244. [Google Scholar] [CrossRef]
Zhao, W.; Zhang, Z.; Huang, K. Gestalt laws based tracklets analysis for human crowd understanding. Pattern Recognit. 2018, 75, 112–127. [Google Scholar] [CrossRef]
Kumar, S.; Dai, Y.; Li, H. Spatio-temporal union of subspaces for multibody non-rigid structure-from-motion. Pattern Recognit. 2017, 71, 428–443. [Google Scholar] [CrossRef]
Nanni, M.; Pedreschi, D. Time-focused clustering of trajectories of moving objects. J. Intell. Inf. Syst. 2006, 27, 267–289. [Google Scholar] [CrossRef]
Li, X.; Hu, W.; Hu, W. A coarse-to-fine strategy for vehicle motion trajectory clustering. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; Volume 1, pp. 591–594. [Google Scholar]
Ferreira, N.; Klosowski, J.T.; Scheidegger, C.E.; Silva, C.T. Vector field k-means: Clustering trajectories by fitting multiple vector fields. In Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2013; Volume 32, pp. 201–210. [Google Scholar]
Yuan, Y.; Feng, Y.; Lu, X. Statistical hypothesis detector for abnormal event detection in crowded scenes. IEEE Trans. Cybern. 2017, 47, 3597–3608. [Google Scholar] [CrossRef]
Wang, L.; Dong, M. Detection of abnormal human behavior using a matrix approximation-based approach. In Proceedings of the 2014 13th International Conference on Machine Learning and Applications, Detroit, MI, USA, 3–5 December 2014; pp. 324–329. [Google Scholar]
Wang, R.; Zheng, W.; Huang, M.; Li, G. Driving Behavior Evaluation Based on DBSCAN and Kmeans++ Clustering. In Proceedings of the 2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), Wuhan, China, 22–24 April 2022; pp. 188–193. [Google Scholar]
Yao, D.; Hu, H.; Du, L.; Cong, G.; Han, S.; Bi, J. Trajgat: A graph-based long-term dependency modeling approach for trajectory similarity computation. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 2275–2285. [Google Scholar]
Park, J.; Jeong, J.; Park, Y. Ship trajectory prediction based on bi-LSTM using spectral-clustered AIS data. J. Mar. Sci. Eng. 2021, 9, 1037. [Google Scholar] [CrossRef]
Zheng, Y.; Zhang, L.; Xie, X.; Ma, W.Y. Mining interesting locations and travel sequences from GPS trajectories. In Proceedings of the 18th International Conference on World Wide Web, Madrid, Spain, 20–24 April 2009; pp. 791–800. [Google Scholar]
Guha, S.; Rastogi, R.; Shim, K. CURE: An efficient clustering algorithm for large databases. ACM Sigmod Rec. 1998, 27, 73–84. [Google Scholar] [CrossRef]
Zhang, L.; Zhu, Y.; Su, J.; Lu, W.; Li, J.; Yao, Y. A Hybrid Prediction Model Based on KNN-LSTM for Vessel Trajectory. Mathematics 2022, 10, 4493. [Google Scholar] [CrossRef]
Wu, J.; Cai, S.; Jin, H.; Liu, L. Vehicular delay tolerant network routing algorithm based on trajectory clustering and dynamic Bayesian network. Wirel. Netw. 2023, 29, 1873–1889. [Google Scholar] [CrossRef]
Zeng, W.; Xu, Z.; Cai, Z.; Chu, X.; Lu, X. Aircraft trajectory clustering in terminal airspace based on deep autoencoder and gaussian mixture model. Aerospace 2021, 8, 266. [Google Scholar] [CrossRef]
Zhong, G.; Zhang, H.; Zhou, J.; Zhou, J.; Liu, H. Short-Term 4D Trajectory Prediction for UAV Based on Spatio-Temporal Trajectory Clustering. IEEE Access 2022, 10, 93362–93380. [Google Scholar] [CrossRef]
Aparna, R.; Idicula, S.M. Spatio-temporal data clustering using deep learning: A review. In Proceedings of the 2022 IEEE International Conference on Evolving and Adaptive Intelligent Systems (EAIS), Larnaca, Cyprus, 25–26 May 2022; pp. 1–10. [Google Scholar]
Li, Q.; He, X.; Chen, K.; Ouyang, Q. A Two-Stage Semi-Supervised High Maneuvering Target Trajectory Data Classification Algorithm. Appl. Sci. 2022, 12, 10979. [Google Scholar] [CrossRef]
Ferreira, M.D.; Spadon, G.; Soares, A.; Matwin, S. A semi-supervised methodology for fishing activity detection using the geometry behind the trajectory of multiple vessels. Sensors 2022, 22, 6063. [Google Scholar] [CrossRef]
Ristani, E.; Solera, F.; Zou, R.; Cucchiara, R.; Tomasi, C. Performance measures and a data set for multi-target, Multi-Camera Tracking. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016. [Google Scholar]
Kim, K.; Oh, S.; Lee, J.; Essa, I. Augmenting aerial earth maps with dynamic information from videos. Virtual Real. 2011, 15, 185–200. [Google Scholar] [CrossRef]
Kumawat, M.; Khaparde, A. Development of adaptive time-weighted dynamic time warping for time series vegetation classification using satellite images in Solapur district. Comput. J. 2022, bxac057. [Google Scholar] [CrossRef]
Cao, Y.; Tang, K.; Sun, J.; Ji, Y. Day-to-day dynamic origin–destination flow estimation using connected vehicle trajectories and automatic vehicle identification data. Transp. Res. Part C 2021, 129, 103241. [Google Scholar] [CrossRef]
Xi, J.; Jia, F.; Feng, J. An online estimation method for passenger flow OD of urban rail transit network by using AFC data. J. Transp. Syst. Eng. Inf. Technol. 2019, 18, 129–135. [Google Scholar]
Tang, Z.; Naphade, M.; Liu, M.Y.; Yang, X.; Birchfield, S.; Wang, S.; Kumar, R.; Anastasiu, D.; Hwang, J.N. Cityflow: A city-scale benchmark for multi-target multi-camera vehicle tracking and re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8797–8806. [Google Scholar]
He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
Zhao, H.; Gao, J.; Lan, T.; Sun, C.; Sapp, B.; Varadarajan, B.; Shen, Y.; Shen, Y.; Chai, Y.; Anguelov, D. Tnt: Target-driven trajectory prediction. In Proceedings of the Conference on Robot Learning, London, UK, 8–11 November 2021; pp. 895–904. [Google Scholar]
Hou, X.; Wang, Y.; Chau, L.P. Vehicle tracking using deep sort with low confidence track filtering. In Proceedings of the 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, China, 18–21 September 2019; pp. 1–6. [Google Scholar]
Zheng, Z.; Zheng, L.; Yang, Y. Unlabeled samples generated by gan improve the person re-identification baseline in vitro. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3754–3762. [Google Scholar]
Tekler, Z.D.; Low, R.; Gunay, B.; Andersen, R.K.; Blessing, L. A scalable Bluetooth Low Energy approach to identify occupancy patterns and profiles in office spaces. Build. Environ. 2020, 171, 106681. [Google Scholar] [CrossRef]
Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65. [Google Scholar] [CrossRef] [Green Version]
Yu, X.; Long, W.; Li, Y.; Gao, L.; Shi, X. Trajectory dimensionality reduction and hyperparameter settings of DBSCAN for trajectory clustering. IET Intell. Transp. Syst. 2022, 16, 691–710. [Google Scholar] [CrossRef]
Salvador, S.; Chan, P. Toward accurate dynamic time warping in linear time and space. Intell. Data Anal. 2007, 11, 561–580. [Google Scholar] [CrossRef]

Figure 1. Schematic diagram of video object trajectory spatialization (blue lines represent the image space trajectories and the geospatial trajectories of the vehicle video object).

Figure 2. Schematic diagram of trajectory capture camera overlay.

Figure 3. Flow chart of hierarchical clustering algorithm for vehicle trajectory considering camera information under the multi-camera joint monitoring system.

Figure 4. Schematic diagram of trajectory “approaching” the cluster center.

Figure 5. Trajectory subsegments.

Figure 6. Camera Grouping Results of the CityFlowV2 Dataset (numbers represent camera numbers, red arrows represent the main optical axis direction of the camera lens, and ellipses of different colors represent different camera groups).

Figure 7. Schematic diagram for determining the number of cluster centers of camera groups in SCAIMG (blue boxes represents the optimal K value).

Figure 8. Schematic diagram for determining the number of clustering centers of three and five hierarchies in SCAIBG (blue boxes represents the optimal K value).

Figure 9. Schematic diagram of trajectory acquisition of each camera in Group 1 (numbers represent camera numbers, red arrows represent the main optical axis direction of the camera lens, and different color lines represent trajectories under different cameras).

Figure 10. Schematic diagram of 10 groups of vehicle trajectory clustering in the CityflowV2 dataset (red lines represent the trajectory, and green lines represent the cluster center).

Figure 11. Schematic diagram of vehicle trajectory clustering between groups (three and five hierarchies; arrows of different colors represent cluster centers).

Figure 12. Silhouette coefficient comparison of trajectory clustering within a camera group.

Figure 13. Silhouette coefficient comparison of trajectory clustering between camera groups.

Table 1. Tabular overview of different trajectory clustering algorithms.

Category	Classification	Time Complexity	Antinoise Ability	Labor Cost	Measurement	Representative Algorithm
Unsupervised	Densely Models	O(n*logn)	General	low	Density	DBCSAN [33]
	Hierarchical Models	O(n²logn)	Strong	low	Distance	HITS [36]
	Spectral Models	O(n³)	Strong	low	Distance	CURE [37]
Supervised	Nearest Neighbor	O(n)	weak	high	Distance	K-NN [39]
	Statistical Models	O(n²)	weak	high	Pattern mining	GMM [41]
	Neural Network	Depends on network	General	high	Deep network characteristics	CNN [42]
Semi-supervised	Invented from unsupervised or supervised algorithms	Related to the invented algorithm	Related to the invented algorithm	General	Related to the invented algorithm	Modified Hierarchical Clustering Models [43], modified Statistical Models [44], etc.

Table 2. Number of trajectory subsegments (three and five hierarchies).

Hierarchies	Number of Trajectory Subsegments
3	460
5	391

Table 3. The CPU time required by SCAIMG for each group in the CityFlowV2 dataset.

Hierarchies	Number of Trajectory Subsegments
1	5.693
2	10.165
3	14.770
4	6.558
5	0.388
6	1.343
7	0.359
8	0.559
9	0.465
10	2.642

Table 4. The CPU time required by SCAIBG for each group in the CityFlowV2 dataset.

Hierarchies	Number of Trajectories	CPU Time
3	14.770	0.007
5	0.388	0.036

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, W.; Xie, Y.; Tang, L. Hierarchical Clustering Algorithm for Multi-Camera Vehicle Trajectories Based on Spatio-Temporal Grouping under Intelligent Transportation and Smart City. Sensors 2023, 23, 6909. https://doi.org/10.3390/s23156909

AMA Style

Wang W, Xie Y, Tang L. Hierarchical Clustering Algorithm for Multi-Camera Vehicle Trajectories Based on Spatio-Temporal Grouping under Intelligent Transportation and Smart City. Sensors. 2023; 23(15):6909. https://doi.org/10.3390/s23156909

Chicago/Turabian Style

Wang, Wei, Yujia Xie, and Luliang Tang. 2023. "Hierarchical Clustering Algorithm for Multi-Camera Vehicle Trajectories Based on Spatio-Temporal Grouping under Intelligent Transportation and Smart City" Sensors 23, no. 15: 6909. https://doi.org/10.3390/s23156909

APA Style

Wang, W., Xie, Y., & Tang, L. (2023). Hierarchical Clustering Algorithm for Multi-Camera Vehicle Trajectories Based on Spatio-Temporal Grouping under Intelligent Transportation and Smart City. Sensors, 23(15), 6909. https://doi.org/10.3390/s23156909

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Hierarchical Clustering Algorithm for Multi-Camera Vehicle Trajectories Based on Spatio-Temporal Grouping under Intelligent Transportation and Smart City

Abstract

1. Introduction

2. Related Work

2.1. Video-Geographic Scenario Data Fusion Organization

2.2. Trajectory Clustering

3. Method Description

3.1. Spatialization of Video Object Trajectory

3.2. Scenarios and Division Rule of Camera Groups

3.3. Point Correspondence Rule of Optimal Unequal Length Trajectories

3.4. Overlapping Scale Factor of Trajectory Distance under Camera-Joint System

3.4.1. Supervised Clustering Label Acquisition

3.4.2. Trajectory Clustering Algorithm within the Camera Group

3.4.3. Trajectory Clustering Algorithm between Groups

4. Experimental Analysis

4.1. Experimental Conditions and Data

4.2. Camera Grouping of the CityFlowV2 Dataset

4.3. Determination of the Number of Cluster Centers

4.4. Determination of the Initial Cluster Center

4.5. Algorithm Results and Effect Evaluation

4.5.1. Trajectory Clustering Results within a Group

4.5.2. Trajectory Clustering Results between Camera Groups

4.5.3. Cluster Effect Evaluation

4.5.4. Algorithm Time Analysis

5. Discussion

6. Conclusion and Prospects

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI