1. Introduction
With the advent of sophisticated data acquisition technologies, there has been an upsurge in the acquisition and application of curve data across such diverse domains as signal processing, finance, medicine, and meteorology [1,2]. Curve data are geometric forms of sequence data that fundamentally represent a one-dimensional real-valued function of time $t$. These curves are predominantly derived from observations at discrete time points $t_1, t_2, \ldots, t_n$, with $y_i$ denoting the observed value at the time point $t_i$. To facilitate further mining of curve data, it is essential to employ smoothing techniques that transform these discrete observations into functional curves [3,4].
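To make this preprocessing concrete, the following minimal Python sketch (our own illustration, assuming NumPy and SciPy; the function name and smoothing factor are chosen here for demonstration) turns discrete observations into a differentiable functional curve via first-order exponential smoothing followed by cubic spline fitting, mirroring the preprocessing used later in Section 4:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def smooth_and_fit(t, y, alpha=0.3):
    """Turn discrete observations (t_i, y_i) into a functional curve:
    first-order exponential smoothing followed by cubic spline fitting.
    `alpha` is an illustrative smoothing factor (the paper tunes it
    per dataset; see Table 2)."""
    s = np.empty(len(y), dtype=float)
    s[0] = y[0]
    for i in range(1, len(y)):
        # s_i = alpha * y_i + (1 - alpha) * s_{i-1}
        s[i] = alpha * y[i] + (1 - alpha) * s[i - 1]
    return CubicSpline(t, s)  # callable curve; curve(x, nu) gives the nu-th derivative

# toy usage: a noisy sine observed at 50 time points
t = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * t) + 0.1 * np.random.default_rng(0).standard_normal(50)
curve = smooth_and_fit(t, y)
value, slope = curve(0.5), curve(0.5, 1)  # y(0.5) and y'(0.5)
```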
Curve matching is an essential task in contexts such as classification, clustering, and pattern recognition. Many dissimilarity or similarity methods have been extensively studied for curve matching [5,6,7]. However, most of these are based primarily on the Euclidean distance (ED for short), which is not accurate enough to capture shape-based dissimilarity.
Figure 1 shows two curves belonging to different classes. From the subplot of Figure 1, it can be seen that the Euclidean distance between these two curves is small; the reason they belong to different classes stems mainly from the differences in their shapes. This means that calculating the dissimilarity between curves solely on the basis of the Euclidean distance is inappropriate. Instead, information reflecting shape differences should be incorporated into the dissimilarity measure. Higher-order derivative features such as slope, curvature, concavity, and convexity are intrinsic geometric characteristics of curves, and as such are important factors in the variation of curve shapes. Therefore, a reasonable dissimilarity measure should comprehensively take into consideration both higher-order derivative features and location features.
To address the above issue, in this paper we propose a new weighted composite dissimilarity metric (WCDM) based on higher-order derivative information and location information. First, the shape discrepancy is calculated from the point-to-point curvature difference, while the location difference is computed from the ED between pairs of points. Second, to allocate weights to the curvature difference and the location difference, we define a new adaptive weighting function utilizing the relationship between the trends of the two curves. This weighting function can dynamically capture the effect of curve trends on shape discrepancy and location separation. Finally, the WCDM is defined by integrating the weighted curvature difference and the ED. The new metric can effectively recognize the shape-based dissimilarity between curves. Furthermore, to avoid the negative impact of randomly selected initial cluster centers on the clustering task, an anchoring strategy is used in our experiments to obtain representative and reasonable initial cluster centers. The contributions of this paper can be summarized as follows:
- We define a new weighted composite dissimilarity metric (WCDM) between two curves. The WCDM is more reasonable and explainable, and can effectively measure the dissimilarity between two curves.
- We define a new adaptive weighting function to reasonably assign the weights based on the relationship between the trends of the curves.
- We design a new strategy for selecting specific initial cluster centers, effectively bolstering the performance of curve K-medoids clustering and reducing the number of iterations.
The remainder of this paper is organized as follows: Section 2 revisits the relevant literature on metrics for determining curve dissimilarity and on curve clustering methodologies; Section 3 describes the proposed WCDM in detail and proves the metric properties satisfied by the WCDM; in Section 4, the improved K-medoids algorithm is explicated and the experimental results of the proposed techniques are presented and dissected; finally, the paper ends with conclusions in Section 5.
2. Related Works
2.1. Dissimilarity Measures between Curves
Dissimilarity measures serve as the cornerstone of curve data analysis tasks, and can be broadly classified into two categories: lock-step measures and elastic measures [8,9].
Functional theory [10], which has advanced rapidly since the 1930s, has extensive applications, including the norm-induced $L_p$-norm metric (known as the Minkowski distance). In particular, the $L_1$-norm (Manhattan distance), $L_2$-norm (ED), and $L_\infty$-norm (Chebyshev distance) are favored in functional analysis due to their ease of implementation and robust theoretical underpinnings. As a metric, the $L_p$-norm belongs to the class of lock-step measures.
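As a minimal sketch of how these lock-step norms are computed on curves sampled at common time points (an illustration, not code from the paper):

```python
import numpy as np

def lp_distance(y1, y2, p=2.0):
    """Lock-step L_p distance between two curves sampled at the same time
    points: p=1 Manhattan, p=2 Euclidean (ED), p=inf Chebyshev."""
    d = np.abs(np.asarray(y1, float) - np.asarray(y2, float))
    if np.isinf(p):
        return d.max()          # Chebyshev: largest pointwise gap
    return (d ** p).sum() ** (1.0 / p)
```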
Elastic measures allow one-to-many or one-to-none point alignments between curves for comparison. The Hausdorff distance, introduced in [5,11], measures the matching degree between two curves. The Fréchet distance, accounting for both the location and the ordering of points on curves, was presented from a computational perspective in [6,12,13].
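A sketch of both elastic measures, assuming SciPy for the directed Hausdorff distance (which expects 2-D point arrays) and a straightforward dynamic program for the discrete Fréchet distance in the spirit of [12]:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(P, Q):
    """Symmetric Hausdorff distance between two point sequences (n x 2, m x 2)."""
    return max(directed_hausdorff(P, Q)[0], directed_hausdorff(Q, P)[0])

def discrete_frechet(P, Q):
    """Discrete Fréchet distance via the classic O(n*m) dynamic program."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    n, m = len(P), len(Q)
    D = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # pairwise distances
    F = np.full((n, m), np.inf)
    F[0, 0] = D[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(F[i - 1, j] if i > 0 else np.inf,
                       F[i, j - 1] if j > 0 else np.inf,
                       F[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            # the coupling must "pay" the largest leash length along the way
            F[i, j] = max(prev, D[i, j])
    return F[-1, -1]
```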
Time series, as a type of sequence data, have their own unique set of dissimilarity measures [14]. A classical dissimilarity measure in this domain is dynamic time warping (DTW), introduced in [15]. Recognizing that DTW is prone to pathological alignments, subsequent refinements have produced several variants. DDTW [16] aligns the estimated derivative sequences of time series. In [17], a penalty-based DTW method was proposed, known as weighted dynamic time warping (WDTW). In [18], the authors introduced ShapeDTW, which aligns shape descriptors using DTW. LSDTW [19] is an alignment method based on local slope information. Additionally, SSDTW [7] leverages the maximal overlap discrete wavelet transform to incorporate the structural information of time series and improve DTW.
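For reference, a compact dynamic-programming sketch of classical DTW, together with a DDTW-style variant that simply warps derivative estimates (our simplification; [16] uses a specific three-point derivative estimator):

```python
import numpy as np

def dtw(x, y):
    """Classical DTW distance via dynamic programming (O(n*m) time)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of the three admissible warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def ddtw(x, y):
    """DDTW-style variant: warp derivative estimates instead of raw values."""
    return dtw(np.gradient(np.asarray(x, float)), np.gradient(np.asarray(y, float)))
```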
Another series of dissimilarity measures applies to trajectory data, which are another form of curve data [20]. Commonly used measures include extracting the longest common subsequence (LCSS) [21], seeking the minimum number of edit operations as in the edit distance on real sequences (EDR) [22], and using the one-way distance (OWD) [23] or locality in-between polylines (LIP) [24].
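A hedged sketch of an LCSS-style dissimilarity with a matching threshold ε and warping window δ (the parameter names follow common usage; exact formulations vary across papers):

```python
import numpy as np

def lcss_dissimilarity(x, y, eps=0.2, delta=6):
    """LCSS-based dissimilarity: x_i and y_j match when |x_i - y_j| < eps and
    |i - j| <= delta; returns 1 - LCSS/min(n, m), so 0 means fully matched."""
    n, m = len(x), len(y)
    L = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(x[i - 1] - y[j - 1]) < eps and abs(i - j) <= delta:
                L[i, j] = L[i - 1, j - 1] + 1   # extend the common subsequence
            else:
                L[i, j] = max(L[i - 1, j], L[i, j - 1])
    return 1.0 - L[n, m] / min(n, m)
```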
Table 1 lists the properties of some dissimilarity measures. It is clear that the $L_p$-norm is sensitive to noise, neglects higher-order information, and cannot handle curves with different lengths or local shifts; however, it is easy to implement. As elastic measures, the Hausdorff and Fréchet distances can be used to calculate the dissimilarity between curves with different lengths and do not need parameter adjustment. The variants of DTW utilize the higher-order derivative information of curves to some extent and can measure the dissimilarity between curves with local shifts. Both LCSS and EDR can measure the dissimilarity between curves with different lengths and local shifts, and both have strong anti-noise capabilities. In summary, although elastic measures provide a more precise assessment of curve dissimilarity than lock-step measures, they come with the tradeoff of higher computational complexity.
2.2. Clustering Methods for Curves
Clustering techniques for curve data can be broadly classified into three categories: two-stage methods, model-based methods, and non-parametric clustering methods [25].
Two-stage clustering approaches are executed in two phases [26,27]. Initially, curve data are transformed into a set of coefficients or principal component scores with respect to a finite-dimensional function basis. Subsequently, these dimensionally reduced data undergo clustering [28,29,30,31].
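A minimal two-stage sketch, assuming SciPy and scikit-learn: each curve is reduced to least-squares cubic B-spline coefficients on a fixed knot vector (so every coefficient vector has the same length), and the coefficient vectors are then clustered with K-means. The basis size and knot placement below are illustrative choices, not those of the cited works:

```python
import numpy as np
from scipy.interpolate import splrep
from sklearn.cluster import KMeans

def two_stage_cluster(t, curves, k, n_knots=8):
    """Stage 1: least-squares B-spline coefficients on shared interior knots.
    Stage 2: K-means on the coefficient vectors."""
    interior = np.linspace(t[2], t[-3], n_knots)  # shared interior knots
    coeffs = np.array([splrep(t, y, t=interior, k=3)[1] for y in curves])
    return KMeans(n_clusters=k, n_init=10).fit_predict(coeffs)
```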
Model-based clustering methods likewise commence by converting curve data into a set of coefficients of finite-dimensional basis functions. Clustering models are then devised for these coefficients by treating them as random variables with specific probability distributions [32], fitted with, for example, the expectation-maximization (EM) algorithm [33] or the majorization-minimization (MM) algorithm [34].
Non-parametric clustering techniques group curves by introducing distinct dissimilarity measures for curve data. These include the functional K-means clustering algorithm based on a distance between curve functions [35], the hierarchical clustering algorithm in combination with the semi-metric proposed in [36], the algorithm presented in [37], which integrates the K-means clustering method with distances based on the curves and on their derivatives, and the shape-based approach grounded on dynamic time warping (DTW) for household load curve clustering and prediction elucidated in [38]. Moreover, due to their brevity and ease of implementation, the most widely used non-parametric clustering methods are the K-means and K-medoids algorithms. Despite their advantages, these algorithms share a common limitation: sensitivity to the initial cluster centers [39,40]. Various modifications have been proposed to address this; for instance, [41] introduced the K-means++ algorithm, which determines the initial cluster centers in a specified manner instead of selecting them randomly, while [42] proposed a near-optimal large-scale K-medoids clustering algorithm to reduce the computational burden and memory load of the K-medoids algorithm on large-scale high-dimensional datasets.
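For illustration, a sketch of the k-means++ seeding rule from [41], written here for vector data; in this paper's setting the squared Euclidean distances would be replaced by curve dissimilarities:

```python
import numpy as np

def kmeanspp_seed(X, k, seed=0):
    """k-means++ seeding: first center uniform at random; each subsequent
    center sampled with probability proportional to the squared distance
    to its nearest already-chosen center."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```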
3. Theories and Methods
In this section, we describe the technical details of our proposed metric for assessing the dissimilarity between curves.
3.1. Proposed Dissimilarity Metric: WCDM
In Figure 2, the two curve segments in subplot (b) are more similar than those in subplot (a); however, the ED between the curves in subplot (a) is smaller than that in subplot (b), a result that defies common human perception. In addition, as shown in Figure 3, the curves in a dataset are usually compactly distributed. Therefore, merely considering the ED between curves cannot accurately measure their dissimilarity. To properly measure the dissimilarity between curves, higher-order derivatives that reflect shape properties should be introduced into the dissimilarity measurement.
Curvature indicates the extent to which a curve deviates from a straight line. As a geometric quantity delineating the shape of a curve, the curvature is inherently determined by the curve itself; thus, the shape discrepancy between curves is directly influenced by their curvatures.
Let $y(t)$ be a one-dimensional curve, and let $y'(t)$ and $y''(t)$ denote the first-order and second-order derivatives of curve $y$, respectively. The curvature of curve $y$ is provided by $\kappa(t) = |y''(t)| / \left(1 + y'(t)^2\right)^{3/2}$.
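On sampled curves, the curvature can be estimated with finite differences; a minimal sketch, assuming NumPy:

```python
import numpy as np

def curvature(t, y):
    """kappa(t) = |y''(t)| / (1 + y'(t)^2)^(3/2) for a sampled planar curve,
    with derivatives estimated by finite differences."""
    dy = np.gradient(y, t)       # first derivative y'
    d2y = np.gradient(dy, t)     # second derivative y''
    return np.abs(d2y) / (1.0 + dy ** 2) ** 1.5
```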
The new dissimilarity measure between two curves can now be defined in the following way.
Definition 1 (Weighted Composite Curve Dissimilarity Metric, WCDM). Let $C_1$ and $C_2$ be two planar curves on a common interval. The dissimilarity metric $\mathrm{WCDM}(C_1, C_2)$ between $C_1$ and $C_2$ is defined in Equation (1), where $\kappa_1$ and $\kappa_2$ are the curvatures of $C_1$ and $C_2$ and $\omega$ is a weight value. The WCDM is obtained by integrating the weighted sum of the curvature difference and the location distance. It discerns curves better and is easier to interpret because it inherits the merits of both differences. However, the weight values of the curvature difference and the location distance need to be properly adjusted; the method for determining the weights is discussed in the next section.
3.2. The Weighting Function in the WCDM
The magnitudes of the effects of the curvature difference and the location difference on the dissimilarity between two curves are not the same. In this subsection, we define a new weighting function that dynamically allocates the weight values by making full use of the relationship between the trends of the curves.
As presented in the second column of Figure 4, when two points on the curves have analogous slopes (highlighted by green dashed lines), the curvature information (presented in the third column of Figure 4) does not exhibit significant variation; yet, based on the original curves shown in the first column of Figure 4, the location disparity is considerable. Hence, location differences predominantly influence the dissimilarity in this case, warranting a larger weight assignment. Conversely, when the slopes of the curves diverge (marked by black dashed lines), curvature differences play a crucial role in determining dissimilarity and should be ascribed a greater weight. To address this, we introduce a weighting function based on the trends of the curves.
Let $T_1(t)$ and $T_2(t)$ be the tangent vectors of $C_1$ and $C_2$, respectively, at an arbitrary time $t$, and let $\cos\theta(t)$ be the cosine of the angle $\theta(t)$ between the tangent vectors $T_1(t)$ and $T_2(t)$. Then, the weighting function $\omega(t)$ at time $t$ is defined in Equation (2) by applying the logistic function to $\cos\theta(t)$.
Because $\cos\theta(t) \in [-1, 1]$ and the logistic function is monotonically increasing, $\omega(t)$ is bounded within a positive interval. In addition, as the trends of the two curves become more similar, $\omega(t)$ becomes larger; conversely, as the trends of the two curves become more different, $\omega(t)$ becomes smaller.
The workflow for the WCDM is provided in Figure 5. First, we calculate the ED between each pair of points on the two curves. Second, the first and second derivatives are computed at every point on each curve. Third, we calculate the curvature and the curvature difference at each pair of points on the curves. Fourth, the weight value at each point is determined using Equation (2). Finally, we calculate the WCDM dissimilarity value according to Equation (1). It is clear that the main differences between the ED and WCDM dissimilarity measurements lie in the weight calculation and the shape discrepancy. Let $n$ denote the length of the curves; the time complexity of the ED for two curves is $O(n)$, and the time complexity of calculating the weights and curvatures is also $O(n)$. Therefore, the time complexity of the WCDM is $O(n)$.
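The following Python sketch follows the five steps of Figure 5. Note that the exact combination in Equations (1) and (2) is not reproduced here; the pointwise form below (ω weighting the location difference and 1−ω the curvature difference, with a logistic weight on the cosine between tangent vectors) is our assumed reading of the text and should be checked against the equations:

```python
import numpy as np

def wcdm_sketch(t, y1, y2):
    """Follows the workflow of Figure 5; the pointwise combination below
    (ASSUMED: omega weighting the location term, 1-omega the curvature term)
    stands in for Equations (1)-(2)."""
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    d1, d2 = np.gradient(y1, t), np.gradient(y2, t)            # step 2: derivatives
    k1 = np.abs(np.gradient(d1, t)) / (1.0 + d1 ** 2) ** 1.5   # step 3: curvatures
    k2 = np.abs(np.gradient(d2, t)) / (1.0 + d2 ** 2) ** 1.5
    # step 4: cosine of the angle between tangent vectors (1, y1') and (1, y2'),
    # pushed through the logistic function -- similar trends => larger weight
    cos_theta = (1.0 + d1 * d2) / np.sqrt((1.0 + d1 ** 2) * (1.0 + d2 ** 2))
    omega = 1.0 / (1.0 + np.exp(-cos_theta))
    # steps 1 and 5: location differences and the weighted composite sum
    return np.sum(omega * np.abs(y1 - y2) + (1.0 - omega) * np.abs(k1 - k2))
```

Every step operates pointwise over the $n$ samples, which is consistent with the overall $O(n)$ complexity stated above.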
3.3. Metric Property of the WCDM
In this subsection, we prove the metric properties of the WCDM, which underpin its reasonableness.
Property 1. The WCDM defined in Equation (1) satisfies the following conditions: - 1.
$\mathrm{WCDM}(C_1, C_2) \ge 0$ (Non-negativity).
- 2.
$\mathrm{WCDM}(C_1, C_2) = 0$ if and only if $C_1$ coincides with $C_2$ (Identity).
- 3.
$\mathrm{WCDM}(C_1, C_2) = \mathrm{WCDM}(C_2, C_1)$ (Symmetry).
- 4.
There exists a constant $c \ge 1$ such that $\mathrm{WCDM}(C_1, C_3) \le c\,\bigl(\mathrm{WCDM}(C_1, C_2) + \mathrm{WCDM}(C_2, C_3)\bigr)$ (Weak triangular inequality).
Therefore, the WCDM is a semi-metric.
Proof. (1) For arbitrary curves $C_1$ and $C_2$, because the weight, the curvature difference, and the location distance are all non-negative, we have $\mathrm{WCDM}(C_1, C_2) \ge 0$.
(2) If $\mathrm{WCDM}(C_1, C_2) = 0$, then the location distance is zero, which means that $C_1$ and $C_2$ coincide pointwise, and the curvature difference is zero, which indicates that $C_1$ and $C_2$ have the same curvature. Thus, $C_1$ overlaps with $C_2$. Conversely, when $C_1$ coincides with $C_2$, the location distance and the curvature difference both vanish; hence, $\mathrm{WCDM}(C_1, C_2) = 0$.
(3) Apparently, the WCDM satisfies symmetry in light of its definition.
(4) For curves $C_1$, $C_2$, and $C_3$, let $\omega_{12}$, $\omega_{13}$, and $\omega_{23}$ be the weights in $\mathrm{WCDM}(C_1, C_2)$, $\mathrm{WCDM}(C_1, C_3)$, and $\mathrm{WCDM}(C_2, C_3)$, respectively. Because the weighting function is bounded above and below by positive constants, each weighted term in $\mathrm{WCDM}(C_1, C_3)$ can be bounded by the corresponding terms of $\mathrm{WCDM}(C_1, C_2)$ and $\mathrm{WCDM}(C_2, C_3)$ up to a constant factor. Hence, there exists a constant $c$ (and the inequality continues to hold for any larger constant) for which the WCDM satisfies the weak triangular inequality.
In conclusion, the proposed WCDM dissimilarity measure is a semi-metric. □
4. Experimental Scheme and Analysis
In this section, we evaluate the effectiveness of the WCDM through clustering and classification tasks. The experimental datasets were sourced from the UCR Time Series Data Mining Archive [43]. These datasets underwent a preprocessing stage in which the data were smoothed using the first-order exponential smoothing technique [44] and fitted with the cubic spline interpolation method [45]. Table 2 provides the details of the datasets.
4.1. Strategy for Selecting the Initial Cluster Centers
To mitigate the negative impact of random selection of the initial cluster centers, we introduce a new selection strategy for obtaining initial cluster centers in our subsequent clustering experiments. For a curve set Y, the proposed selection process works as follows:
1. Calculate the sum of the dissimilarity metric between each curve and all other curves in the dataset Y.
2. Calculate the mean ($\mu$) and standard deviation ($\sigma$) of the sums computed in step 1 according to Equations (3) and (4), where $m$ is the size of the curve dataset Y.
3. Select those curves whose dissimilarity sums fall within the interval determined by $\mu$, $\sigma$, and the interval parameter, obtaining the compact subset $Y^{*}$ of Y.
4. Select the two curves in $Y^{*}$ with the lowest similarity as the first two initial cluster centers and place them in the center set $V$.
5. Select the curve in $Y^{*}$ that is most dissimilar to $V$, then add it to $V$ as the next initial cluster center.
6. Repeat step 5 until all $k$ initial centers are acquired.
Based on this anchoring strategy for the initial cluster centers, detailed pseudocode for the improved K-medoids clustering algorithm (DCK-medoids for short) is provided in Algorithm 1, where $a_i$ denotes the cluster assignment of curve $y_i$, $s$ is the iteration number, $a_i^{(s)}$ refers to the $i$-th cluster assignment at the $s$-th iteration, and $v_j^{(s)}$ denotes the new center of cluster $j$ at the $s$-th iteration.
Algorithm 1 Curve DCK-medoids clustering algorithm
- Input: Curve dataset Y; cluster number k; maximum number of allowed iterations; interval parameter.
- Output: The cluster assignment of the curves in Y.
- 1: Calculate the initial cluster centers by the strategy described above;
- 2: Obtain the clusters by using Equation (5);
- 3: repeat
- 4: Update the cluster centers by using Equation (6);
- 5: Obtain the clusters by using Equation (5);
- 6: until the cluster assignments no longer change or the maximum number of iterations is reached.
- 7: return the cluster assignments.
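A sketch of the anchoring strategy on a precomputed dissimilarity matrix; the interval form $\mu \pm \lambda\sigma$, the subset symbol, and the max-min reading of step 5 are our assumptions about the elided equations:

```python
import numpy as np

def anchor_initial_centers(D, k, lam=0.5):
    """Anchoring strategy on a precomputed m x m dissimilarity matrix D.
    Assumes k is at most the size of the compact subset; the interval
    [mu - lam*sigma, mu + lam*sigma] and the max-min rule in step 5 are
    assumed readings of Equations (3)-(4) and the strategy text."""
    sums = D.sum(axis=1)                              # step 1: dissimilarity sums
    mu, sigma = sums.mean(), sums.std()               # step 2: Equations (3)-(4)
    compact = np.where(np.abs(sums - mu) <= lam * sigma)[0]   # step 3
    sub = D[np.ix_(compact, compact)]
    i, j = np.unravel_index(np.argmax(sub), sub.shape)        # step 4: least similar pair
    centers = [compact[i], compact[j]]
    while len(centers) < k:                           # steps 5-6
        rest = [c for c in compact if c not in centers]
        centers.append(max(rest, key=lambda c: D[c, centers].min()))
    return centers
```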
We conducted a series of experiments with DCK-medoids clustering to properly set the interval parameter. Too high a value may yield a subset that is no longer compact for some datasets, while too low a value may overscreen the dataset, causing the curves in the compact subset to fall into the same cluster. Thus, experiments were conducted with parameter values ranging from 0.1 to 2.
The experimental results are listed in Table 3. The optimal results for each dataset are highlighted in bold, while suboptimal results are underlined. From Table 3, it can be seen that the most appropriate parameter value is 0.5, as it produced the best results on most datasets.
To examine the effects of the initial cluster centers on clustering performance, we compared the DCK-medoids algorithm with K-means++ [41,46] and K-medoids [47] based on the WCDM. In the K-means++ and K-medoids algorithms, the cluster centers at the $s$-th iteration are selected using Equation (6). Both the K-means++ and K-medoids algorithms were executed ten times on each dataset and the average results were recorded. Additionally, we recorded the number of iterations (abbreviated as NI) at which the algorithms converged. The experimental results are shown in Table 4, with the best results for each dataset emphasized in bold.
From Table 4, it can be seen that the DCK-medoids algorithm outperforms the other methods on a majority of the datasets when evaluated by Purity, ARI [48], and NI. When considering the DI index [49], DCK-medoids is slightly inferior to K-means++ on some datasets. Overall, the experimental results demonstrate that the proposed initial cluster center selection method improves the effectiveness of the DCK-medoids clustering algorithm.
4.2. Evaluation of the Proposed WCDM
4.2.1. Comparison of Dissimilarity Measures for Clustering
Based on the improved K-medoids algorithm (DCK-medoids) introduced in Section 4.1, this subsection evaluates the effectiveness of the proposed WCDM dissimilarity metric by comparing it with classical dissimilarity measures: the ED [10], DTW [15], LSDTW [19], Hausdorff distance [11], Fréchet distance [6], and LCSS [21]. According to [17] and to the experimental results, the parameters (ε and δ) in LCSS were set to the best-performing values. The parameters (including $k$) in LSDTW were set to the same values provided in [19]. The clustering results based on these comparative measures are presented in Table 5, Table 6 and Table 7, where the last row indicates the number of wins in pairwise comparisons of the WCDM with the other methods. The optimal results for each dataset are highlighted in bold, while suboptimal results are underlined.
The observations derived from Table 5, Table 6 and Table 7 lead to the following conclusions: (1) in terms of Purity, the WCDM outperforms the ED on eight of the fourteen datasets, DTW on eight, LSDTW on nine, Hausdorff on ten, Fréchet on nine, and LCSS on eleven; (2) according to Purity and ARI, the DCK-medoids clustering algorithm performs best with LSDTW among the compared measures, consistent with the fact that elastic measures considering higher-order derivatives are generally superior to other measures [19]; (3) under the DI metric, the DCK-medoids algorithm performs better with the ED than with other dissimilarity measures, possibly because the DI values are calculated based on the ED, which favors it; (4) compared to the best-performing LSDTW method, the Purity and ARI of the WCDM are higher by 7.5% and 11.4%, respectively, on the Beetlefly dataset; (5) as a lock-step measure, the WCDM can be used effectively for curve clustering tasks.
4.2.2. Comparison of Dissimilarity Measures for Classification
To verify that our proposed dissimilarity metric is also effective on curve classification tasks, we used the 1-nearest neighbor classifier to compare the curves in the test set against the training set. Accuracy and $F_1$-score [50] were utilized to comprehensively evaluate the performance of curve classification; Accuracy is the ratio of the number of correctly classified samples to the total number of samples, while the $F_1$-score is the harmonic mean of precision and recall, taking values in the range $[0, 1]$. The experimental results showing the classification validity are displayed in Figure 6 and Figure 7.
An examination of Figure 6 indicates that the classification Accuracy using the WCDM is higher than that using the ED, DTW, Hausdorff distance, Fréchet distance, and LCSS on nine, ten, eight, nine, and nine of the tested datasets, respectively. In particular, the Accuracy of the WCDM on the Beef, Chinatown, and OliveOil datasets is higher by 13.32%, 10.08%, and 12.83%, respectively. From Figure 7, it can be seen that the classification performance employing the WCDM is superior to that of the ED, DTW, Hausdorff distance, Fréchet distance, and LCSS on more than half of the tested datasets. Although classification performance based on the WCDM falls short of that based on LSDTW across most datasets, this discrepancy can be attributed to the spiky and bumpy nature of the majority of curves in these datasets. As an elastic measure, LSDTW has a natural advantage in dealing with this case, as it can match points on the curves one-to-many or one-to-none, whereas the precise computation of higher-order derivatives in the WCDM is highly sensitive to such local perturbations, compromising its ability to discern differences between these curves. Nonetheless, despite being a lock-step measure, the WCDM remains competitive with the majority of dissimilarity measures for classification tasks.
4.2.3. Time Efficiency of the WCDM
Concerning time complexity, it is noteworthy that the computational complexity of the ED is $O(n)$, where $n$ denotes the length of the data. Analogously, based on Equation (1), the computational complexity of the WCDM is also $O(n)$. As delineated in [19], both DTW and LSDTW possess computational complexities of $O(n^2)$. In [6,11], it was indicated that the time complexities of the Hausdorff and Fréchet distances are both quadratic in $n$. Additionally, employing a dynamic programming algorithm, LCSS can be computed with complexity $O(n^2)$ [21].
We performed experiments to provide a visual representation of the time efficiency of the different dissimilarity measures, comparing the time required by all measures to compute the dissimilarity between the curves in the Beef, OliveOil, and GunPoint datasets. The experimental results are depicted in Figure 8. Clearly, the proposed WCDM excels over all other methods except the ED in terms of time efficiency.
4.2.4. Application of the WCDM to Spectral Data
According to the definition of the WCDM in Section 3.1, it is applicable to general planar curves. To further verify its applicability, we employed the WCDM as the dissimilarity measure for clustering and classification analysis of spectral data.
The spectra we used were selected from the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) DR8 (http://www.lamost.org/, accessed on 3 August 2024). To avoid the influence of the red and blue ends, we limited the wavelength range to 3800–9000 Å. To eliminate the impact of scale differences between spectra on the results, we normalized the spectral data. Experiments were conducted on A-type, F-type, G-type, and K-type stars with different signal-to-noise ratios (S/N) to evaluate the robustness of the WCDM in clustering and classification tasks. The true labels were the spectral classes released by LAMOST. Table 8 lists the details of the spectral data.
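The normalization step can be as simple as min-max scaling per spectrum (one common choice; the paper does not spell out the exact scheme):

```python
import numpy as np

def normalize_spectrum(flux):
    """Min-max scale a spectrum to [0, 1] to remove amplitude differences."""
    f = np.asarray(flux, dtype=float)
    return (f - f.min()) / (f.max() - f.min())
```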
The experimental results are listed in Table 9, with the best results highlighted in bold. According to Table 9, when using the WCDM as the dissimilarity measure, the Purity and ARI for spectral data with high S/N (>30) are higher than those for spectral data with low S/N (<10); similarly, the Accuracy and $F_1$-score for spectral data with high S/N are higher than those with low S/N. When comparing the clustering and classification results of the WCDM and the ED, the WCDM shows better evaluation indices on the high-S/N dataset, while its indices are lower on the low-S/N dataset. These results indicate that the WCDM performs better when dealing with high-quality curve data, while it is prone to being affected by noise when dealing with low-quality data. The reason is that the WCDM relies on precise computation of curve shape information, which is sensitive to noise.
5. Conclusions
In this paper, we introduced a novel dynamic weighted composite dissimilarity metric, termed the WCDM, which takes into account both higher-order derivative information and location information. A new weighting function is defined by employing the relationship between the trends of curves; it dynamically assigns reasonable weight values to the curvature difference and the location difference. In this way, the WCDM can accurately measure the shape-based dissimilarity between curves, and by its definition it is applicable to general planar open curves. For comparison experiments on the curve clustering task, we used an improved K-medoids clustering algorithm in which a new anchoring strategy is introduced to select the initial cluster centers. Comparative experiments were implemented on curve sets fitted from UCR datasets. The clustering results indicate that the DCK-medoids algorithm using the WCDM outperforms all other methods on more than eight datasets in terms of Purity and ARI; in terms of DI, the WCDM also performs better than most methods on more than half of the datasets. Additionally, the classification results show that the WCDM is superior to the ED, DTW, Hausdorff distance, Fréchet distance, and LCSS on more than half of the tested datasets in terms of Accuracy and $F_1$-score. Therefore, the WCDM is a good choice for a measurement that is time-efficient and captures shape-based dissimilarity. In a practical experiment, we applied the WCDM to spectral data; the results show its effectiveness on high-quality curve data, although it is sensitive to noise.
Because the WCDM measures morphological differences between curves through curvature and slope computations, it is sensitive to noise and local perturbations in the curves, which can lead to undesirable results when clustering or classifying curves affected by them. To enhance the robustness of the dissimilarity metric against local perturbations, a future direction of this work is the approximate calculation of higher-order derivative characteristics.
Author Contributions
Conceptualization, Y.W.; data curation, J.C. and H.Y.; funding acquisition, J.C., H.Y. and J.W.; methodology, Y.W., J.C. and H.Y.; software, Y.W.; validation, J.C. and H.Y.; writing—original draft, Y.W.; writing—review and editing, J.W., B.L. and X.Z. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Natural Science Foundation of China (Grant Nos. 12473105, 12473106, 62306205), Projects of Science and Technology Cooperation and Exchange of Shanxi Province (Grant Nos. 202204041101037, 202204041101033), and the Fundamental Research Program of Shanxi Province (Grant No. 202203021222189).
Data Availability Statement
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Fontes, C.H.; Budman, H. A hybrid clustering approach for multivariate time series—A case study applied to failure analysis in a gas turbine. ISA Trans. 2017, 71, 513–529. [Google Scholar] [CrossRef] [PubMed]
- Izakian, H.; Pedrycz, W.; Jamal, I. Fuzzy clustering of time series data using dynamic time warping distance. Eng. Appl. Artif. Intell. 2015, 39, 235–244. [Google Scholar] [CrossRef]
- Guruswami, V.; Zuckerman, D. Robust Fourier and polynomial curve fitting. In Proceedings of the 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), New Brunswick, NJ, USA, 9–11 October 2016; pp. 751–759. [Google Scholar]
- Boullé, M. Functional data clustering via piecewise constant nonparametric density estimation. Pattern Recognit. 2012, 45, 4389–4401. [Google Scholar] [CrossRef]
- Alt, H.; Scharf, L. Computing the Hausdorff distance between curved objects. Int. J. Comput. Geom. Appl. 2008, 18, 307–320. [Google Scholar] [CrossRef]
- Alt, H.; Godau, M. Computing the Fréchet distance between two polygonal curves. Int. J. Comput. Geom. Appl. 1995, 5, 75–91. [Google Scholar] [CrossRef]
- Hong, J.Y.; Park, S.H.; Baek, J.G. SSDTW: Shape segment dynamic time warping. Expert Syst. Appl. 2020, 150, 113291. [Google Scholar] [CrossRef]
- Wang, X.; Mueen, A.; Ding, H.; Trajcevski, G.; Scheuermann, P.; Keogh, E. Experimental comparison of representation methods and distance measures for time series data. Data Min. Knowl. Discov. 2013, 26, 275–309. [Google Scholar] [CrossRef]
- Shifaz, A.; Pelletier, C.; Petitjean, F.; Webb, G.I. Elastic similarity and distance measures for multivariate time series. Knowl. Inf. Syst. 2023, 65, 2665–2698. [Google Scholar] [CrossRef]
- Yosida, K. Functional Analysis; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1995. [Google Scholar]
- Bai, Y.B.; Yong, J.H.; Liu, C.Y.; Liu, X.M.; Meng, Y. Polyline approach for approximating hausdorff distance between planar free-form curves. Comput.-Aided Des. 2011, 43, 687–698. [Google Scholar] [CrossRef]
- Eiter, T.; Mannila, H. Computing Discrete Fréchet Distance; Technical Report CD–TR 94/64; Vienna University of Technology: Vienna, Austria, 1994. [Google Scholar]
- Filtser, A.; Filtser, O.; Katz, M.J. Approximate nearest neighbor for curves: Simple, efficient, and deterministic. Algorithmica 2023, 85, 1490–1519. [Google Scholar] [CrossRef]
- Holder, C.; Middlehurst, M.; Bagnall, A. A review and evaluation of elastic distance functions for time series clustering. Knowl. Inf. Syst. 2024, 66, 765–809. [Google Scholar] [CrossRef]
- Berndt, D.J.; Clifford, J. Using dynamic time warping to find patterns in time series. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, Newport Beach, CA, USA, 14–17 August 1994; pp. 359–370. [Google Scholar]
- Keogh, E.J.; Pazzani, M.J. Derivative Dynamic Time Warping. In Proceedings of the 2001 SIAM International Conference on Data Mining (SDM), Chicago, IL, USA, 5–7 April 2001; Volume 10, pp. 1–11. [Google Scholar]
- Jeong, Y.S.; Myong, K.J.; Olufemi, A.O. Weighted dynamic time warping for time series classification. Pattern Recognit. 2011, 44, 2231–2240. [Google Scholar] [CrossRef]
- Zhao, J.; Itti, L. shapeDTW: Shape Dynamic Time Warping. Pattern Recognit. 2018, 74, 171–184. [Google Scholar] [CrossRef]
- Yuan, J.; Lin, Q.; Zhang, W.; Wang, Z. Locally slope-based dynamic time warping for time series classification. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 1713–1722. [Google Scholar]
- Chang, Y.; Tanin, E.; Cong, G.; Jensen, C.S.; Qi, J. Trajectory similarity measurement: An efficiency perspective. In Proceedings of the VLDB Endowment, Guangzhou, China, 26–30 August 2024; Volume 17, pp. 2293–2306. [Google Scholar]
- Vlachos, M.; Kollios, G.; Gunopulos, D. Discovering similar multidimensional trajectories. In Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA, 26 February–1 March 2002; pp. 673–684. [Google Scholar]
- Chen, L.; Özsu, M.T.; Oria, V. Robust and fast similarity search for moving object trajectories. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, Baltimore, MD, USA, 14–16 June 2005; pp. 491–502. [Google Scholar]
- Lin, B.; Su, J. Shapes based trajectory queries for moving objects. In Proceedings of the 13th Annual ACM International Workshop on Geographic Information Systems, Bremen, Germany, 4–5 November 2005; pp. 21–30. [Google Scholar]
- Pelekis, N.; Kopanakis, I.; Marketos, G.; Ntoutsi, I.; Andrienko, G.; Theodoridis, Y. Similarity search in trajectory databases. In Proceedings of the 14th International Symposium on Temporal Representation and Reasoning (TIME’07), Alicante, Spain, 28–30 June 2007; pp. 129–140. [Google Scholar]
- Meng, Y.; Liang, J.; Cao, F.; He, Y. A new distance with derivative information for functional k-means clustering algorithm. Inf. Sci. 2018, 463, 166–185. [Google Scholar] [CrossRef]
- Jacques, J.; Preda, C. Functional data clustering: A survey. Adv. Data Anal. Classif. 2014, 8, 231–255. [Google Scholar] [CrossRef]
- Wang, J.L.; Chiou, J.M.; Müller, H.G. Functional data analysis. Annu. Rev. Stat. Its Appl. 2016, 3, 257–295. [Google Scholar] [CrossRef]
- Peng, J.; Müller, H.G. Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions. Ann. Appl. Stat. 2008, 2, 1056–1077. [Google Scholar] [CrossRef]
- Kayano, M.; Dozono, K.; Konishi, S. Functional cluster analysis via orthonormalized Gaussian basis expansions and its application. J. Classif. 2010, 27, 211–230. [Google Scholar] [CrossRef]
- Giacofci, M.; Lambert-Lacroix, S.; Marot, G.; Picard, F. Wavelet-based clustering for mixed-effects functional models in high dimension. Biometrics 2013, 69, 31–40. [Google Scholar] [CrossRef]
- Coffey, N.; Hinde, J.; Holian, E. Clustering longitudinal profiles using P-splines and mixed effects models applied to time-course gene expression data. Comput. Stat. Data Anal. 2014, 71, 14–29. [Google Scholar] [CrossRef]
- Chamroukhi, F.; Nguyen, H.D. Model-based clustering and classification of functional data. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1298. [Google Scholar] [CrossRef]
- McLachlan, G.J.; Krishnan, T. The EM Algorithm and Extensions; John Wiley & Sons: Hoboken, NJ, USA, 2007. [Google Scholar]
- Nguyen, H.D. An introduction to Majorization-Minimization algorithms for machine learning and statistical estimation. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2017, 7, e1198. [Google Scholar] [CrossRef]
- Tokushige, S.; Yadohisa, H.; Inada, K. Crisp and fuzzy k-means clustering algorithms for multivariate functional data. Comput. Stat. 2007, 22, 1–16. [Google Scholar] [CrossRef]
- Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
- Ieva, F.; Paganoni, A.M.; Pigoli, D.; Vitelli, V. Multivariate functional clustering for the analysis of ECG curves morphology. J. R. Stat. Soc. Ser. C Appl. Stat. 2011; in press. [Google Scholar]
- Teeraratkul, T.; O’Neill, D.; Lall, S. Shape-based approach to household electric load curve clustering and prediction. IEEE Trans. Smart Grid 2017, 9, 5196–5206. [Google Scholar] [CrossRef]
- Yu, D.; Liu, G.; Guo, M.; Liu, X. An improved K-medoids algorithm based on step increasing and optimizing medoids. Expert Syst. Appl. 2018, 92, 464–473. [Google Scholar] [CrossRef]
- Ikotun, A.M.; Ezugwu, A.E.; Abualigah, L.; Abuhaija, B.; Heming, J. K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Inf. Sci. 2022, 622, 178–210. [Google Scholar] [CrossRef]
- Arthur, D.; Vassilvitskii, S. K-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–9 January 2007; pp. 1027–1035. [Google Scholar]
- Ushakov, A.V.; Vasilyev, I. Near-optimal large-scale k-medoids clustering. Inf. Sci. 2021, 545, 344–362. [Google Scholar] [CrossRef]
- Dau, H.A.; Bagnall, A.; Kamgar, K.; Yeh, C.C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.A.; Keogh, E. The UCR time series archive. IEEE/CAA J. Autom. Sin. 2019, 6, 1293–1305. [Google Scholar] [CrossRef]
- Gardner, E.S., Jr. Exponential smoothing: The state of the art—Part II. Int. J. Forecast. 2006, 22, 637–666. [Google Scholar]
- McKinley, S.; Levine, M. Cubic spline interpolation. Coll. Redwoods 1998, 45, 1049–1060. [Google Scholar]
- Ay, M.; Özbakır, L.; Kulluk, S.; Gülmez, B.; Öztürk, G.; Özer, S. FC-Kmeans: Fixed-centered K-means algorithm. Expert Syst. Appl. 2023, 211, 118656. [Google Scholar] [CrossRef]
- Park, H.S.; Jun, C.H. A simple and fast algorithm for K-medoids clustering. Expert Syst. Appl. 2009, 36, 3336–3341. [Google Scholar] [CrossRef]
- Luna-Romera, J.M.; Martínez-Ballesteros, M.; García-Gutiérrez, J.; Riquelme, J.C. External clustering validity index based on chi-squared statistical test. Inf. Sci. 2019, 487, 1–17. [Google Scholar] [CrossRef]
- Xu, Q.; Zhang, Q.; Liu, J. Efficient synthetical clustering validity indexes for hierarchical clustering. Expert Syst. Appl. 2020, 151, 113367. [Google Scholar] [CrossRef]
- Tharwat, A. Classification assessment methods. Appl. Comput. Inform. 2020, 17, 168–192. [Google Scholar] [CrossRef]
Figure 1.
Two curves belonging to different classes; the bold parts of the two curves have different curvatures, while the subplot shows their real states.
Figure 2.
The blue curve segments in (a) and (b) are the same curve segment defined on a common interval, while the orange curve segments in (a) and (b) are two different curve segments on that interval.
Figure 3.
The curve sets fitted by the Beef, Car, and GunPoint datasets.
Figure 4.
The first column shows curves randomly selected from the ArrowHead and Beef datasets. The second and the third columns respectively present the slope information and curvature information of the corresponding curves. Points exhibiting divergent trends on curves are highlighted with black dashed lines, while those with similar trends are indicated by green dashed lines.
Figure 5.
Calculation process of the proposed WCDM.
Figure 6.
Classification performance in terms of Accuracy. Each point represents a dataset, and a point that falls in the lower triangle area indicates that WCDM works better.
Figure 7.
Classification performance in terms of $F_1$-score. Each point represents a dataset, and a point that falls in the lower triangle area indicates that WCDM works better.
Figure 8.
Time required for calculating the dissimilarity between curves from the Beef, OliveOil, and GunPoint datasets.
Table 1.
Summary of previous works on dissimilarity measures for curves.
Category | Measure | Param Free | Anti Noise | Higher-Order Info | Diff Length | Local Shift | Time Complexity |
---|---|---|---|---|---|---|---|
Lock-step measure | $L_1$-norm [10] | ✓ | × | × | × | × | O(n) |
 | $L_2$-norm [10] | ✓ | × | × | × | × | O(n) |
 | $L_\infty$-norm [10] | ✓ | × | × | × | × | O(n) |
Elastic measure | Hausdorff [11] | ✓ | × | × | ✓ | × | O(n²) |
 | Fréchet [6] | ✓ | × | × | ✓ | × | O(n²) |
 | DTW [15] | ✓ | × | × | ✓ | ✓ | O(n²) |
 | DDTW [16] | ✓ | × | ✓ | ✓ | ✓ | O(n²) |
 | ShapeDTW [18] | × | × | ✓ | ✓ | ✓ | O(n²) |
 | LSDTW [19] | × | ✓ | ✓ | ✓ | ✓ | O(n²) |
 | SSDTW [7] | ✓ | × | × | ✓ | ✓ | O(n²) |
 | LCSS [21] | × | ✓ | × | ✓ | ✓ | O(n²) |
 | EDR [22] | × | ✓ | × | ✓ | ✓ | O(n²) |
 | OWD [23] | × | × | × | ✓ | × | O(n²) |
 | LIP [24] | × | × | × | ✓ | × | O(n²) |
Table 2.
Details of the datasets and smoothing factors.
Datasets | Instances | Train Sets | Test Sets | Features | Classes | Type | Smoothing Factors |
---|---|---|---|---|---|---|---|
ArrowHead | 211 | 36 | 175 | 251 | 3 | IMAGE | 0.3 |
Beef | 60 | 30 | 30 | 470 | 5 | SPECTRO | 0.1 |
BeetleFly | 40 | 20 | 20 | 512 | 2 | IMAGE | 0.5 |
BirdChicken | 40 | 20 | 20 | 512 | 2 | IMAGE | 0.1 |
Car | 120 | 60 | 60 | 577 | 4 | SENSOR | 0.03 |
Chinatown | 363 | 20 | 343 | 24 | 2 | TRAFFIC | 0.5 |
Coffee | 56 | 28 | 28 | 286 | 2 | SPECTRO | 0.3 |
Earthquakes | 461 | 322 | 139 | 512 | 2 | SENSOR | 0.4 |
FiftyWords | 905 | 450 | 455 | 270 | 50 | IMAGE | 0.1 |
Fish | 350 | 175 | 175 | 463 | 7 | IMAGE | 0.3 |
Fungi | 204 | 18 | 186 | 201 | 18 | HRM | 0.1 |
GunPoint | 200 | 50 | 150 | 150 | 2 | MOTION | 0.5 |
Mallat | 2400 | 55 | 2345 | 1024 | 8 | SIMULATED | 0.3 |
MoteStrain | 1272 | 20 | 1252 | 84 | 2 | SENSOR | 0.3 |
OliveOil | 60 | 30 | 30 | 570 | 4 | SPECTRO | 0.1 |
OSULeaf | 442 | 200 | 242 | 427 | 6 | IMAGE | 0.1 |
PowerCons | 360 | 180 | 180 | 144 | 2 | POWER | 0.2 |
Symbols | 1020 | 25 | 995 | 398 | 6 | IMAGE | 0.3 |
Table 3.
Effect of parameter adjustment on the DCK-medoids clustering results.
Datasets | Purity: 0.1 | 0.3 | 0.5 | 0.8 | 1 | 2 | ARI: 0.1 | 0.3 | 0.5 | 0.8 | 1 | 2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ArrowHead | 0.550 | 0.573 | 0.607 | 0.592 | 0.588 | 0.521 | 0.195 | 0.162 | 0.220 | 0.143 | 0.156 | 0.132 |
BeetleFly | 0.725 | 0.575 | 0.725 | 0.575 | 0.575 | 0.525 | 0.182 | −0.002 | 0.182 | 0.005 | −0.002 | −0.024 |
Car | 0.550 | 0.542 | 0.542 | 0.542 | 0.500 | 0.450 | 0.179 | 0.157 | 0.157 | 0.157 | 0.081 | 0.117 |
Earthquakes | 0.798 | 0.798 | 0.798 | 0.798 | 0.798 | 0.798 | 0.001 | 0.003 | 0.014 | 0.009 | 0.009 | 0.001 |
PowerCons | 0.906 | 0.906 | 0.906 | 0.889 | 0.597 | 0.806 | 0.657 | 0.657 | 0.657 | 0.604 | 0.359 | 0.372 |
Symbols | 0.657 | 0.655 | 0.738 | 0.747 | 0.746 | 0.611 | 0.557 | 0.552 | 0.608 | 0.642 | 0.642 | 0.419 |
Table 4.
Comparison of curve clustering algorithms.
Datasets | K-means++: Purity | ARI | DI | NI | K-medoids: Purity | ARI | DI | NI | DCK-medoids (0.5): Purity | ARI | DI | NI |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ArrowHead | 0.415 | 0.019 | 0.325 | 2 | 0.605 | 0.227 | 0.041 | 3 | 0.607 | 0.220 | 0.020 | 3 |
Beef | 0.390 | 0.077 | 0.106 | 3 | 0.440 | 0.102 | 0.038 | 4 | 0.450 | 0.133 | 0.065 | 3 |
Beetlefly | 0.545 | −0.001 | 0.456 | 2 | 0.555 | 0.005 | 0.421 | 3 | 0.725 | 0.182 | 0.475 | 2 |
Birdchicken | 0.543 | −0.003 | 0.238 | 2 | 0.580 | 0.004 | 0.261 | 3 | 0.550 | −0.012 | 0.265 | 2 |
Car | 0.472 | 0.091 | 0.086 | 5 | 0.530 | 0.145 | 0.088 | 4 | 0.542 | 0.157 | 0.081 | 3 |
Chinatown | 0.793 | 0.336 | 0.058 | 3 | 0.793 | 0.336 | 0.058 | 4 | 0.793 | 0.336 | 0.058 | 3 |
Coffee | 0.536 | 0.003 | 0.406 | 1 | 0.780 | 0.405 | 0.152 | 3 | 0.947 | 0.793 | 0.182 | 3 |
Earthquakes | 0.798 | 0.004 | 0.515 | 2 | 0.798 | 0.002 | 0.599 | 3 | 0.798 | 0.014 | 0.570 | 3 |
FiftyWords | 0.451 | 0.233 | 0.088 | 6 | 0.520 | 0.221 | 0.090 | 6 | 0.527 | 0.266 | 0.065 | 5 |
Fish | 0.378 | 0.096 | 0.054 | 5 | 0.465 | 0.177 | 0.058 | 4 | 0.540 | 0.254 | 0.060 | 3 |
Fungi | 0.867 | 0.793 | 0.053 | 4 | 0.795 | 0.662 | 0.035 | 4 | 0.662 | 0.443 | 0.037 | 3 |
Gunpoint | 0.533 | 0.001 | 0.066 | 5 | 0.522 | −0.002 | 0.070 | 3 | 0.510 | −0.005 | 0.074 | 2 |
Mallat | 0.452 | 0.321 | 0.084 | 4 | 0.564 | 0.417 | 0.038 | 4 | 0.633 | 0.534 | 0.060 | 4 |
MoteStrain | 0.539 | 0.000 | 0.298 | 1 | 0.689 | 0.180 | 0.061 | 3 | 0.802 | 0.364 | 0.044 | 4 |
OliveOil | 0.690 | 0.410 | 0.141 | 4 | 0.715 | 0.403 | 0.086 | 4 | 0.767 | 0.611 | 0.087 | 3 |
OSULeaf | 0.299 | 0.026 | 0.126 | 3 | 0.313 | 0.038 | 0.134 | 4 | 0.330 | 0.051 | 0.181 | 3 |
PowerCons | 0.563 | 0.015 | 0.186 | 4 | 0.787 | 0.378 | 0.132 | 3 | 0.906 | 0.657 | 0.124 | 2 |
Symbols | 0.540 | 0.337 | 0.063 | 5 | 0.610 | 0.420 | 0.028 | 4 | 0.738 | 0.608 | 0.051 | 3 |
Table 5.
Clustering performance in terms of Purity.
Datasets | ED | DTW | LSDTW | Hausdorff | Fréchet | LCSS (ε, δ) | WCDM |
---|---|---|---|---|---|---|---|
Beef | 0.467 | 0.450 | 0.417 | 0.433 | 0.433 | 0.450 (0.1, 5) | 0.450 |
Beetlefly | 0.575 | 0.625 | 0.650 | 0.625 | 0.525 | 0.600 (0.2, 6) | 0.725 |
Birdchicken | 0.550 | 0.525 | 0.575 | 0.550 | 0.575 | 0.600 (0.2, 6) | 0.575 |
Car | 0.558 | 0.583 | 0.542 | 0.592 | 0.592 | 0.408 (0.1, 6) | 0.542 |
Chinatown | 0.769 | 0.799 | 0.755 | 0.713 | 0.713 | 0.716 (0.4, 4) | 0.793 |
Earthquakes | 0.798 | 0.798 | 0.798 | 0.798 | 0.798 | 0.798 (0.1,6) | 0.798 |
FiftyWords | 0.540 | 0.578 | 0.669 | 0.552 | 0.559 | 0.410 (0.1, 6) | 0.527 |
Fish | 0.506 | 0.529 | 0.754 | 0.431 | 0.406 | 0.451 (0.1, 6) | 0.540 |
Fungi | 0.809 | 0.564 | 0.828 | 0.583 | 0.583 | 0.676 (0.2, 10) | 0.662 |
GunPoint | 0.505 | 0.505 | 0.505 | 0.570 | 0.570 | 0.510 (0.1, 6) | 0.510 |
MoteStrain | 0.827 | 0.803 | 0.789 | 0.770 | 0.782 | 0.539 (0.4, 8) | 0.802 |
OliveOil | 0.717 | 0.817 | 0.817 | 0.700 | 0.700 | 0.650 (0.05, 6) | 0.767 |
OSULeaf | 0.355 | 0.416 | 0.416 | 0.391 | 0.344 | 0.339 (0.2, 11) | 0.330 |
PowerCons | 0.894 | 0.897 | 0.906 | 0.881 | 0.883 | 0.828 (0.5, 11) | 0.906 |
wins number | 7/8 | 8/8 | 9/9 | 5/10 | 6/10 | 6/11 | - |
Table 6.
Clustering performance in terms of ARI.
Datasets | ED | DTW | LSDTW | Hausdorff | Fréchet | LCSS (ε, δ) | WCDM |
---|---|---|---|---|---|---|---|
Beef | 0.160 | 0.113 | 0.112 | 0.120 | 0.120 | 0.113 (0.1, 5) | 0.133 |
Beetlefly | −0.002 | 0.038 | 0.068 | 0.039 | −0.024 | 0.015 (0.2, 6) | 0.182 |
Birdchicken | −0.015 | −0.022 | −0.003 | −0.014 | 0.000 | 0.015 (0.2, 6) | 0.002 |
Car | 0.178 | 0.190 | 0.178 | 0.225 | 0.225 | 0.072 (0.1, 6) | 0.157 |
Chinatown | 0.281 | 0.351 | 0.252 | 0.157 | 0.150 | 0.073 (0.4, 4) | 0.336 |
Earthquakes | 0.003 | 0.089 | 0.010 | 0.000 | 0.044 | −0.068 (0.1, 6) | 0.014 |
FiftyWords | 0.293 | 0.387 | 0.523 | 0.372 | 0.321 | 0.172 (0.1, 6) | 0.266 |
Fish | 0.216 | 0.251 | 0.621 | 0.156 | 0.161 | 0.169 (0.1, 6) | 0.254 |
Fungi | 0.751 | 0.352 | 0.695 | 0.445 | 0.453 | 0.487 (0.2, 10) | 0.443 |
GunPoint | −0.005 | −0.005 | −0.005 | 0.014 | 0.014 | −0.005 (0.1, 6) | −0.005 |
MoteStrain | 0.427 | 0.366 | 0.334 | 0.292 | 0.318 | 0.000 (0.4, 8) | 0.364 |
OliveOil | 0.478 | 0.609 | 0.557 | 0.450 | 0.450 | 0.381 (0.05, 6) | 0.611 |
OSULeaf | 0.095 | 0.146 | 0.160 | 0.123 | 0.103 | 0.070 (0.2, 11) | 0.051 |
PowerCons | 0.622 | 0.630 | 0.657 | 0.578 | 0.587 | 0.428 (0.5, 11) | 0.657 |
wins number | 7/8 | 7/8 | 7/9 | 5/9 | 6/8 | 4/11 | - |
Table 7.
Clustering performance in terms of DI.
Datasets | ED | DTW | LSDTW | Hausdorff | Fréchet | LCSS (ε, δ) | WCDM |
---|---|---|---|---|---|---|---|
Beef | 0.065 | 0.043 | 0.090 | 0.065 | 0.065 | 0.036 (0.1, 5) | 0.065 |
Beetlefly | 0.465 | 0.563 | 0.594 | 0.467 | 0.372 | 0.462 (0.2, 6) | 0.475 |
Birdchicken | 0.292 | 0.415 | 0.254 | 0.246 | 0.305 | 0.090 (0.2, 6) | 0.281 |
Car | 0.088 | 0.032 | 0.031 | 0.094 | 0.094 | 0.061 (0.1, 6) | 0.081 |
Chinatown | 0.078 | 0.067 | 0.050 | 0.059 | 0.046 | 0.034 (0.4, 4) | 0.058 |
Earthquakes | 0.496 | 0.489 | 0.487 | 0.487 | 0.507 | 0.489 (0.1, 6) | 0.570 |
FiftyWords | 0.099 | 0.066 | 0.099 | 0.081 | 0.088 | 0.067 (0.1, 6) | 0.065 |
Fish | 0.080 | 0.046 | 0.046 | 0.063 | 0.063 | 0.040 (0.1, 6) | 0.060 |
Fungi | 0.055 | 0.008 | 0.018 | 0.014 | 0.016 | 0.016 (0.2, 10) | 0.037 |
GunPoint | 0.139 | 0.091 | 0.139 | 0.084 | 0.084 | 0.066 (0.1, 6) | 0.074 |
MoteStrain | 0.045 | 0.057 | 0.044 | 0.045 | 0.042 | 0.020 (0.4, 8) | 0.044 |
OliveOil | 0.106 | 0.054 | 0.048 | 0.132 | 0.132 | 0.164 (0.05, 6) | 0.087 |
OSULeaf | 0.167 | 0.154 | 0.110 | 0.181 | 0.154 | 0.096 (0.2, 11) | 0.181 |
PowerCons | 0.108 | 0.113 | 0.138 | 0.106 | 0.108 | 0.109 (0.5, 11) | 0.124 |
wins number | 9/5 | 6/8 | 6/9 | 6/8 | 7/8 | 2/12 | - |
Table 8.
Spectral data used in the experiment.
Type | Data Volume | S/N | Test:Train | Classes | Dimensionality | Smoothing Factors |
---|---|---|---|---|---|---|
A/F/G/K | 50/50/50/50 | <10 | 2:8 | 4 | 3121 | 0.5 |
A/F/G/K | 50/50/50/50 | >30 | 2:8 | 4 | 3121 | 0.5 |
Table 9.
Clustering and classification results on the spectral data.
Measure | Type | S/N | Purity | ARI | DI | Accuracy | $F_1$-score |
---|---|---|---|---|---|---|---|
WCDM | A/F/G/K | <10 | 0.265 | 0.000 | 0.007 | 0.400 | 0.197 |
WCDM | A/F/G/K | >30 | 0.515 | 0.189 | 0.000 | 0.675 | 0.698 |
ED | A/F/G/K | <10 | 0.340 | 0.007 | 0.005 | 0.650 | 0.664 |
ED | A/F/G/K | >30 | 0.425 | 0.139 | 0.000 | 0.650 | 0.678 |