1. Introduction
With the advent of sophisticated data acquisition technologies, there has been an upsurge in the acquisition and application of curve data across such diverse domains as signal processing, finance, medicine, and meteorology [1,2]. Curve data are geometric forms of sequence data that fundamentally represent a one-dimensional real-valued function of time $t$. These curves are predominantly derived from observations at discrete time points $t_1, t_2, \ldots, t_n$, with $y_i$ denoting the observed value at the time point $t_i$. To facilitate further mining of curve data, it is essential to employ smoothing techniques that transform these discrete observations into functional curves [3,4].
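To make this preprocessing concrete, the following minimal Python sketch (our own illustration, assuming NumPy and SciPy; the function name and smoothing factor are chosen here for demonstration) turns discrete observations into a differentiable functional curve via first-order exponential smoothing followed by cubic spline fitting, mirroring the preprocessing used later in Section 4:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def smooth_and_fit(t, y, alpha=0.3):
    """Turn discrete observations (t_i, y_i) into a functional curve:
    first-order exponential smoothing followed by cubic spline fitting.
    `alpha` is an illustrative smoothing factor (the paper tunes it
    per dataset; see Table 2)."""
    s = np.empty(len(y), dtype=float)
    s[0] = y[0]
    for i in range(1, len(y)):
        # s_i = alpha * y_i + (1 - alpha) * s_{i-1}
        s[i] = alpha * y[i] + (1 - alpha) * s[i - 1]
    return CubicSpline(t, s)  # callable curve; curve(x, nu) gives the nu-th derivative

# toy usage: a noisy sine observed at 50 time points
t = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * t) + 0.1 * np.random.default_rng(0).standard_normal(50)
curve = smooth_and_fit(t, y)
value, slope = curve(0.5), curve(0.5, 1)  # y(0.5) and y'(0.5)
```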
Curve matching is an essential task in contexts such as classification, clustering, and pattern recognition. Many dissimilarity or similarity methods have been extensively studied for curve matching [5,6,7]. However, most of these are based primarily on the Euclidean distance (ED for short), which is not accurate enough to capture shape-based dissimilarity.
Figure 1 shows two curves belonging to different classes. From the subplot of Figure 1, it can be seen that the Euclidean distance between these two curves is small; the reason they belong to different classes stems mainly from the differences in their shapes. This means that calculating the dissimilarity between curves solely on the basis of the Euclidean distance is inappropriate. Instead, information reflecting shape differences should be incorporated into the dissimilarity measure. Higher-order derivative features such as slope, curvature, concavity, and convexity are intrinsic geometric characteristics of curves, and as such are important factors in the variation of curve shapes. Therefore, a reasonable dissimilarity measure should comprehensively take into consideration both higher-order derivative features and location features.
To address the above issue, in this paper we propose a new weighted composite dissimilarity metric (WCDM) based on higher-order derivative information and location information. First, the shape discrepancy is calculated from the point-to-point curvature difference, while the location difference is computed from the ED between pairs of points. Second, to allocate weights to the curvature difference and the location difference, we define a new adaptive weighting function utilizing the relationship between the trends of the two curves. This weighting function can dynamically capture the effect of curve trends on shape discrepancy and location separation. Finally, the WCDM is defined by integrating the weighted curvature difference and the ED. The new metric can effectively recognize the shape-based dissimilarity between curves. Furthermore, to avoid the negative impact of randomly selected initial cluster centers on the clustering task, an anchoring strategy is used in our experiments to obtain representative and reasonable initial cluster centers. The contributions of this paper can be summarized as follows:
- We define a new weighted composite dissimilarity metric (WCDM) between two curves. The WCDM is more reasonable and explainable, and can effectively measure the dissimilarity between two curves.
- We define a new adaptive weighting function to reasonably assign the weights based on the relationship between the trends of the curves.
- We design a new strategy for selecting specific initial cluster centers, effectively bolstering the performance of curve K-medoids clustering and reducing the number of iterations.
The remainder of this paper is organized as follows: Section 2 revisits the relevant literature on metrics for determining curve dissimilarity and on curve clustering methodologies; Section 3 describes the proposed WCDM in detail and proves the metric properties satisfied by the WCDM; in Section 4, the improved K-medoids algorithm is explicated and the experimental results of the proposed techniques are presented and dissected; finally, the paper ends with conclusions in Section 5.
2. Related Works
2.1. Dissimilarity Measures between Curves
Dissimilarity measures serve as the cornerstone of curve data analysis tasks, and can be broadly classified into two categories: lock-step measures and elastic measures [8,9].
Functional theory [10], which has advanced rapidly since the 1930s, has extensive applications, including the norm-induced $L_p$-norm metric (known as the Minkowski distance). In particular, the $L_1$-norm (Manhattan distance), $L_2$-norm (ED), and $L_\infty$-norm (Chebyshev distance) are favored in functional analysis due to their ease of implementation and robust theoretical underpinnings. As a metric, the $L_p$-norm belongs to the class of lock-step measures.
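As a minimal sketch of how these lock-step norms are computed on curves sampled at common time points (an illustration, not code from the paper):

```python
import numpy as np

def lp_distance(y1, y2, p=2.0):
    """Lock-step L_p distance between two curves sampled at the same time
    points: p=1 Manhattan, p=2 Euclidean (ED), p=inf Chebyshev."""
    d = np.abs(np.asarray(y1, float) - np.asarray(y2, float))
    if np.isinf(p):
        return d.max()          # Chebyshev: largest pointwise gap
    return (d ** p).sum() ** (1.0 / p)
```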
Elastic measures allow one-to-many or one-to-none point alignments between curves for comparison. The Hausdorff distance, introduced in [5,11], measures the matching degree between two curves. The Fréchet distance, accounting for both the location and the ordering of points on curves, was presented from a computational perspective in [6,12,13].
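A sketch of both elastic measures, assuming SciPy for the directed Hausdorff distance (which expects 2-D point arrays) and a straightforward dynamic program for the discrete Fréchet distance in the spirit of [12]:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(P, Q):
    """Symmetric Hausdorff distance between two point sequences (n x 2, m x 2)."""
    return max(directed_hausdorff(P, Q)[0], directed_hausdorff(Q, P)[0])

def discrete_frechet(P, Q):
    """Discrete Fréchet distance via the classic O(n*m) dynamic program."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    n, m = len(P), len(Q)
    D = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # pairwise distances
    F = np.full((n, m), np.inf)
    F[0, 0] = D[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(F[i - 1, j] if i > 0 else np.inf,
                       F[i, j - 1] if j > 0 else np.inf,
                       F[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            # the coupling must "pay" the largest leash length along the way
            F[i, j] = max(prev, D[i, j])
    return F[-1, -1]
```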
Time series, as a type of sequence data, have their own unique set of dissimilarity measures [14]. A classical dissimilarity measure in this domain is dynamic time warping (DTW), introduced in [15]. Recognizing that DTW is prone to pathological alignments, subsequent refinements have produced several variants. DDTW [16] aligns the estimated derivative sequences of time series. In [17], a penalty-based DTW method was proposed, known as weighted dynamic time warping (WDTW). In [18], the authors introduced ShapeDTW, which aligns shape descriptors using DTW. LSDTW [19] is an alignment method based on local slope information. Additionally, SSDTW [7] leverages the maximal overlap discrete wavelet transform to incorporate the structural information of time series and improve DTW.
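For reference, a compact dynamic-programming sketch of classical DTW, together with a DDTW-style variant that simply warps derivative estimates (our simplification; [16] uses a specific three-point derivative estimator):

```python
import numpy as np

def dtw(x, y):
    """Classical DTW distance via dynamic programming (O(n*m) time)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of the three admissible warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def ddtw(x, y):
    """DDTW-style variant: warp derivative estimates instead of raw values."""
    return dtw(np.gradient(np.asarray(x, float)), np.gradient(np.asarray(y, float)))
```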
Another series of dissimilarity measures applies to trajectory data, which are another form of curve data [20]. Commonly used measures include extracting the longest common subsequence (LCSS) [21], seeking the minimum number of edit operations as in the edit distance on real sequences (EDR) [22], and using the one-way distance (OWD) [23] or locality in-between polylines (LIP) [24].
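A hedged sketch of an LCSS-style dissimilarity with a matching threshold ε and warping window δ (the parameter names follow common usage; exact formulations vary across papers):

```python
import numpy as np

def lcss_dissimilarity(x, y, eps=0.2, delta=6):
    """LCSS-based dissimilarity: x_i and y_j match when |x_i - y_j| < eps and
    |i - j| <= delta; returns 1 - LCSS/min(n, m), so 0 means fully matched."""
    n, m = len(x), len(y)
    L = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(x[i - 1] - y[j - 1]) < eps and abs(i - j) <= delta:
                L[i, j] = L[i - 1, j - 1] + 1   # extend the common subsequence
            else:
                L[i, j] = max(L[i - 1, j], L[i, j - 1])
    return 1.0 - L[n, m] / min(n, m)
```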
Table 1 lists the properties of some dissimilarity measures. It is clear that the $L_p$-norm is sensitive to noise, neglects higher-order information, and cannot handle curves with different lengths or local shifts; however, it is easy to implement. As elastic measures, the Hausdorff and Fréchet distances can be used to calculate the dissimilarity between curves with different lengths and do not need parameter adjustment. The variants of DTW utilize the higher-order derivative information of curves to some extent and can measure the dissimilarity between curves with local shifts. Both LCSS and EDR can measure the dissimilarity between curves with different lengths and local shifts, and both have strong anti-noise capabilities. In summary, although elastic measures provide a more precise assessment of curve dissimilarity than lock-step measures, they come with the tradeoff of higher computational complexity.
2.2. Clustering Methods for Curves
Clustering techniques for curve data can be broadly classified into three categories: two-stage methods, model-based methods, and non-parametric clustering methods [25].
Two-stage clustering approaches are executed in two phases [26,27]. Initially, curve data are transformed into a set of coefficients or principal component scores with respect to a finite-dimensional function basis. Subsequently, these dimensionally reduced data undergo clustering [28,29,30,31].
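A minimal two-stage sketch, assuming SciPy and scikit-learn: each curve is reduced to least-squares cubic B-spline coefficients on a fixed knot vector (so every coefficient vector has the same length), and the coefficient vectors are then clustered with K-means. The basis size and knot placement below are illustrative choices, not those of the cited works:

```python
import numpy as np
from scipy.interpolate import splrep
from sklearn.cluster import KMeans

def two_stage_cluster(t, curves, k, n_knots=8):
    """Stage 1: least-squares B-spline coefficients on shared interior knots.
    Stage 2: K-means on the coefficient vectors."""
    interior = np.linspace(t[2], t[-3], n_knots)  # shared interior knots
    coeffs = np.array([splrep(t, y, t=interior, k=3)[1] for y in curves])
    return KMeans(n_clusters=k, n_init=10).fit_predict(coeffs)
```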
Model-based clustering methods likewise commence by converting curve data into a set of coefficients of finite-dimensional basis functions. Clustering models are then devised for these coefficients by treating them as random variables with specific probability distributions [32], fitted with, for example, the expectation-maximization (EM) algorithm [33] or the majorization-minimization (MM) algorithm [34].
Non-parametric clustering techniques group curves by introducing distinct dissimilarity measures for curve data. These include the functional K-means clustering algorithm based on a distance between curve functions [35], the hierarchical clustering algorithm in combination with the semi-metric proposed in [36], the algorithm presented in [37], which integrates the K-means clustering method with distances based on the curves and on their derivatives, and the shape-based approach grounded on dynamic time warping (DTW) for household load curve clustering and prediction elucidated in [38]. Moreover, due to their brevity and ease of implementation, the most widely used non-parametric clustering methods are the K-means and K-medoids algorithms. Despite their advantages, these algorithms share a common limitation: sensitivity to the initial cluster centers [39,40]. Various modifications have been proposed to address this; for instance, [41] introduced the K-means++ algorithm, which determines the initial cluster centers in a specified manner instead of selecting them randomly, while [42] proposed a near-optimal large-scale K-medoids clustering algorithm to reduce the computational burden and memory load of the K-medoids algorithm on large-scale high-dimensional datasets.
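For illustration, a sketch of the k-means++ seeding rule from [41], written here for vector data; in this paper's setting the squared Euclidean distances would be replaced by curve dissimilarities:

```python
import numpy as np

def kmeanspp_seed(X, k, seed=0):
    """k-means++ seeding: first center uniform at random; each subsequent
    center sampled with probability proportional to the squared distance
    to its nearest already-chosen center."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```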
3. Theories and Methods
In this section, we describe the technical details of our proposed metric for assessing the dissimilarity between curves.
3.1. Proposed Dissimilarity Metric: WCDM
In Figure 2, the two curve segments in subplot (b) are more similar than those in subplot (a); however, the ED between the curves in subplot (a) is smaller than that in subplot (b), a result that defies common human perception. In addition, as shown in Figure 3, the curves in a dataset are usually compactly distributed. Therefore, merely considering the ED between curves cannot accurately measure their dissimilarity. To properly measure the dissimilarity between curves, higher-order derivatives that reflect shape properties should be introduced into the dissimilarity measurement.
Curvature indicates the extent to which a curve deviates from a straight line. As a geometric quantity delineating the shape of a curve, the curvature is inherently determined by the curve itself; thus, the shape discrepancy between curves is directly influenced by their curvatures.
Let $y(t)$ be a one-dimensional curve, and let $y'(t)$ and $y''(t)$ denote the first-order and second-order derivatives of curve $y$, respectively. The curvature of curve $y$ is provided by $\kappa(t) = |y''(t)| / \left(1 + y'(t)^2\right)^{3/2}$.
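On sampled curves, the curvature can be estimated with finite differences; a minimal sketch, assuming NumPy:

```python
import numpy as np

def curvature(t, y):
    """kappa(t) = |y''(t)| / (1 + y'(t)^2)^(3/2) for a sampled planar curve,
    with derivatives estimated by finite differences."""
    dy = np.gradient(y, t)       # first derivative y'
    d2y = np.gradient(dy, t)     # second derivative y''
    return np.abs(d2y) / (1.0 + dy ** 2) ** 1.5
```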
The new dissimilarity measure between two curves can now be defined in the following way.
Definition 1 (Weighted Composite Curve Dissimilarity Metric, WCDM). Let $C_1$ and $C_2$ be two planar curves on a common interval. The dissimilarity metric $\mathrm{WCDM}(C_1, C_2)$ between $C_1$ and $C_2$ is defined in Equation (1), where $\kappa_1$ and $\kappa_2$ are the curvatures of $C_1$ and $C_2$ and $\omega$ is a weight value. The WCDM is obtained by integrating the weighted sum of the curvature difference and the location distance. It discerns curves better and is easier to interpret because it inherits the merits of both differences. However, the weight values of the curvature difference and the location distance need to be properly adjusted; the method for determining the weights is discussed in the next section.
3.2. The Weighting Function in the WCDM
The magnitudes of the effects of the curvature difference and the location difference on the dissimilarity between two curves are not the same. In this subsection, we define a new weighting function that dynamically allocates the weight values by making full use of the relationship between the trends of the curves.
As presented in the second column of Figure 4, when two points on the curves have analogous slopes (highlighted by green dashed lines), the curvature information (presented in the third column of Figure 4) does not exhibit significant variation; yet, based on the original curves shown in the first column of Figure 4, the location disparity is considerable. Hence, location differences predominantly influence the dissimilarity in this case, warranting a larger weight assignment. Conversely, when the slopes of the curves diverge (marked by black dashed lines), curvature differences play a crucial role in determining dissimilarity and should be ascribed a greater weight. To address this, we introduce a weighting function based on the trends of the curves.
Let $T_1(t)$ and $T_2(t)$ be the tangent vectors of $C_1$ and $C_2$, respectively, at an arbitrary time $t$, and let $\cos\theta(t)$ be the cosine of the angle $\theta(t)$ between the tangent vectors $T_1(t)$ and $T_2(t)$. Then, the weighting function $\omega(t)$ at time $t$ is defined in Equation (2) by applying the logistic function to $\cos\theta(t)$.
Because $\cos\theta(t) \in [-1, 1]$ and the logistic function is monotonically increasing, $\omega(t)$ is bounded within a positive interval. In addition, as the trends of the two curves become more similar, $\omega(t)$ becomes larger; conversely, as the trends of the two curves become more different, $\omega(t)$ becomes smaller.
The workflow for the WCDM is provided in Figure 5. First, we calculate the ED between each pair of points on the two curves. Second, the first and second derivatives are computed at every point on each curve. Third, we calculate the curvature and the curvature difference at each pair of points on the curves. Fourth, the weight value at each point is determined using Equation (2). Finally, we calculate the WCDM dissimilarity value according to Equation (1). It is clear that the main differences between the ED and WCDM dissimilarity measurements lie in the weight calculation and the shape discrepancy. Let $n$ denote the length of the curves; the time complexity of the ED for two curves is $O(n)$, and the time complexity of calculating the weights and curvatures is also $O(n)$. Therefore, the time complexity of the WCDM is $O(n)$.
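The following Python sketch follows the five steps of Figure 5. Note that the exact combination in Equations (1) and (2) is not reproduced here; the pointwise form below (ω weighting the location difference and 1−ω the curvature difference, with a logistic weight on the cosine between tangent vectors) is our assumed reading of the text and should be checked against the equations:

```python
import numpy as np

def wcdm_sketch(t, y1, y2):
    """Follows the workflow of Figure 5; the pointwise combination below
    (ASSUMED: omega weighting the location term, 1-omega the curvature term)
    stands in for Equations (1)-(2)."""
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    d1, d2 = np.gradient(y1, t), np.gradient(y2, t)            # step 2: derivatives
    k1 = np.abs(np.gradient(d1, t)) / (1.0 + d1 ** 2) ** 1.5   # step 3: curvatures
    k2 = np.abs(np.gradient(d2, t)) / (1.0 + d2 ** 2) ** 1.5
    # step 4: cosine of the angle between tangent vectors (1, y1') and (1, y2'),
    # pushed through the logistic function -- similar trends => larger weight
    cos_theta = (1.0 + d1 * d2) / np.sqrt((1.0 + d1 ** 2) * (1.0 + d2 ** 2))
    omega = 1.0 / (1.0 + np.exp(-cos_theta))
    # steps 1 and 5: location differences and the weighted composite sum
    return np.sum(omega * np.abs(y1 - y2) + (1.0 - omega) * np.abs(k1 - k2))
```

Every step operates pointwise over the $n$ samples, which is consistent with the overall $O(n)$ complexity stated above.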
3.3. Metric Property of the WCDM
In this subsection, we prove the metric properties of the WCDM, which underpin its reasonableness.
Property 1. The WCDM defined in Equation (1) satisfies the following conditions: - 1.
$\mathrm{WCDM}(C_1, C_2) \ge 0$ (Non-negativity).
- 2.
$\mathrm{WCDM}(C_1, C_2) = 0$ if and only if $C_1$ coincides with $C_2$ (Identity).
- 3.
$\mathrm{WCDM}(C_1, C_2) = \mathrm{WCDM}(C_2, C_1)$ (Symmetry).
- 4.
There exists a constant $c \ge 1$ such that $\mathrm{WCDM}(C_1, C_3) \le c\,\bigl(\mathrm{WCDM}(C_1, C_2) + \mathrm{WCDM}(C_2, C_3)\bigr)$ (Weak triangular inequality).
Therefore, the WCDM is a semi-metric.
Proof. (1) For arbitrary curves $C_1$ and $C_2$, because the weight, the curvature difference, and the location distance are all non-negative, we have $\mathrm{WCDM}(C_1, C_2) \ge 0$.
(2) If $\mathrm{WCDM}(C_1, C_2) = 0$, then the location distance is zero, which means that $C_1$ and $C_2$ coincide pointwise, and the curvature difference is zero, which indicates that $C_1$ and $C_2$ have the same curvature. Thus, $C_1$ overlaps with $C_2$. Conversely, when $C_1$ coincides with $C_2$, the location distance and the curvature difference both vanish; hence, $\mathrm{WCDM}(C_1, C_2) = 0$.
(3) Apparently, the WCDM satisfies symmetry in light of its definition.
(4) For curves $C_1$, $C_2$, and $C_3$, let $\omega_{12}$, $\omega_{13}$, and $\omega_{23}$ be the weights in $\mathrm{WCDM}(C_1, C_2)$, $\mathrm{WCDM}(C_1, C_3)$, and $\mathrm{WCDM}(C_2, C_3)$, respectively. Because the weighting function is bounded above and below by positive constants, each weighted term in $\mathrm{WCDM}(C_1, C_3)$ can be bounded by the corresponding terms of $\mathrm{WCDM}(C_1, C_2)$ and $\mathrm{WCDM}(C_2, C_3)$ up to a constant factor. Hence, there exists a constant $c$ (and the inequality continues to hold for any larger constant) for which the WCDM satisfies the weak triangular inequality.
In conclusion, the proposed WCDM dissimilarity measure is a semi-metric. □
4. Experimental Scheme and Analysis
In this section, we evaluate the effectiveness of the WCDM through clustering and classification tasks. The experimental datasets were sourced from the UCR Time Series Data Mining Archive [43]. These datasets underwent a preprocessing stage in which the data were smoothed using the first-order exponential smoothing technique [44] and fitted with the cubic spline interpolation method [45]. Table 2 provides the details of the datasets.
4.1. Strategy for Selecting the Initial Cluster Centers
To mitigate the negative impact of random selection of the initial cluster centers, we introduce a new selection strategy for obtaining initial cluster centers in our subsequent clustering experiments. For a curve set Y, the proposed selection process works as follows:
1. Calculate the sum of the dissimilarity metric between each curve and all other curves in the dataset Y.
2. Calculate the mean ($\mu$) and standard deviation ($\sigma$) of the sums computed in step 1 according to Equations (3) and (4), where $m$ is the size of the curve dataset Y.
3. Select those curves whose dissimilarity sums fall within the interval determined by $\mu$, $\sigma$, and the interval parameter, obtaining the compact subset $Y^{*}$ of Y.
4. Select the two curves in $Y^{*}$ with the lowest similarity as the first two initial cluster centers and place them in the center set $V$.
5. Select the curve in $Y^{*}$ that is most dissimilar to $V$, then add it to $V$ as the next initial cluster center.
6. Repeat step 5 until all $k$ initial centers are acquired.
Based on this anchoring strategy for the initial cluster centers, detailed pseudocode for the improved K-medoids clustering algorithm (DCK-medoids for short) is provided in Algorithm 1, where $a_i$ denotes the cluster assignment of curve $y_i$, $s$ is the iteration number, $a_i^{(s)}$ refers to the $i$-th cluster assignment at the $s$-th iteration, and $v_j^{(s)}$ denotes the new center of cluster $j$ at the $s$-th iteration.
Algorithm 1 Curve DCK-medoids clustering algorithm
- Input: Curve dataset Y; cluster number k; maximum number of allowed iterations; interval parameter.
- Output: The cluster assignment of the curves in Y.
- 1: Calculate the initial cluster centers by the strategy described above;
- 2: Obtain the clusters by using Equation (5);
- 3: repeat
- 4: Update the cluster centers by using Equation (6);
- 5: Obtain the clusters by using Equation (5);
- 6: until the cluster assignments no longer change or the maximum number of iterations is reached.
- 7: return the cluster assignments.
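A sketch of the anchoring strategy on a precomputed dissimilarity matrix; the interval form $\mu \pm \lambda\sigma$, the subset symbol, and the max-min reading of step 5 are our assumptions about the elided equations:

```python
import numpy as np

def anchor_initial_centers(D, k, lam=0.5):
    """Anchoring strategy on a precomputed m x m dissimilarity matrix D.
    Assumes k is at most the size of the compact subset; the interval
    [mu - lam*sigma, mu + lam*sigma] and the max-min rule in step 5 are
    assumed readings of Equations (3)-(4) and the strategy text."""
    sums = D.sum(axis=1)                              # step 1: dissimilarity sums
    mu, sigma = sums.mean(), sums.std()               # step 2: Equations (3)-(4)
    compact = np.where(np.abs(sums - mu) <= lam * sigma)[0]   # step 3
    sub = D[np.ix_(compact, compact)]
    i, j = np.unravel_index(np.argmax(sub), sub.shape)        # step 4: least similar pair
    centers = [compact[i], compact[j]]
    while len(centers) < k:                           # steps 5-6
        rest = [c for c in compact if c not in centers]
        centers.append(max(rest, key=lambda c: D[c, centers].min()))
    return centers
```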
We conducted a series of experiments with DCK-medoids clustering to properly set the interval parameter. Too high a value may yield a subset that is no longer compact for some datasets, while too low a value may overscreen the dataset, causing the curves in the compact subset to fall into the same cluster. Thus, experiments were conducted with parameter values ranging from 0.1 to 2.
The experimental results are listed in Table 3. The optimal results for each dataset are highlighted in bold, while suboptimal results are underlined. From Table 3, it can be seen that the most appropriate parameter value is 0.5, as it produced the best results on most datasets.
To examine the effects of the initial cluster centers on clustering performance, we compared the DCK-medoids algorithm with K-means++ [41,46] and K-medoids [47] based on the WCDM. In the K-means++ and K-medoids algorithms, the cluster centers at the $s$-th iteration are selected using Equation (6). Both the K-means++ and K-medoids algorithms were executed ten times on each dataset and the average results were recorded. Additionally, we recorded the number of iterations (abbreviated as NI) at which the algorithms converged. The experimental results are shown in Table 4, with the best results for each dataset emphasized in bold.
From Table 4, it can be seen that the DCK-medoids algorithm outperforms the other methods on a majority of the datasets when evaluated by Purity, ARI [48], and NI. When considering the DI index [49], DCK-medoids is slightly inferior to K-means++ on some datasets. Overall, the experimental results demonstrate that the proposed initial cluster center selection method improves the effectiveness of the DCK-medoids clustering algorithm.
4.2. Evaluation of the Proposed WCDM
4.2.1. Comparison of Dissimilarity Measures for Clustering
Based on the improved K-medoids algorithm (DCK-medoids) introduced in Section 4.1, this subsection evaluates the effectiveness of the proposed WCDM dissimilarity metric by comparing it with classical dissimilarity measures: the ED [10], DTW [15], LSDTW [19], Hausdorff distance [11], Fréchet distance [6], and LCSS [21]. According to [17] and to the experimental results, the parameters (ε and δ) in LCSS were set to the best-performing values. The parameters (including $k$) in LSDTW were set to the same values provided in [19]. The clustering results based on these comparative measures are presented in Table 5, Table 6 and Table 7, where the last row indicates the number of wins in pairwise comparisons of the WCDM with the other methods. The optimal results for each dataset are highlighted in bold, while suboptimal results are underlined.
The observations derived from Table 5, Table 6 and Table 7 lead to the following conclusions: (1) in terms of Purity, the WCDM outperforms the ED on eight of the fourteen datasets, DTW on eight, LSDTW on nine, Hausdorff on ten, Fréchet on nine, and LCSS on eleven; (2) according to Purity and ARI, the DCK-medoids clustering algorithm performs best with LSDTW among the compared measures, consistent with the fact that elastic measures considering higher-order derivatives are generally superior to other measures [19]; (3) under the DI metric, the DCK-medoids algorithm performs better with the ED than with other dissimilarity measures, possibly because the DI values are calculated based on the ED, which favors it; (4) compared to the best-performing LSDTW method, the Purity and ARI of the WCDM are higher by 7.5% and 11.4%, respectively, on the Beetlefly dataset; (5) as a lock-step measure, the WCDM can be used effectively for curve clustering tasks.
4.2.2. Comparison of Dissimilarity Measures for Classification
To verify that our proposed dissimilarity metric is also effective on curve classification tasks, we used the 1-nearest neighbor classifier to compare the curves in the test set against the training set. Accuracy and $F_1$-score [50] were utilized to comprehensively evaluate the performance of curve classification; Accuracy is the ratio of the number of correctly classified samples to the total number of samples, while the $F_1$-score is the harmonic mean of precision and recall, taking values in the range $[0, 1]$. The experimental results showing the classification validity are displayed in Figure 6 and Figure 7.
An examination of Figure 6 indicates that the classification Accuracy using the WCDM is higher than that using the ED, DTW, Hausdorff distance, Fréchet distance, and LCSS on nine, ten, eight, nine, and nine of the tested datasets, respectively. In particular, the Accuracy of the WCDM on the Beef, Chinatown, and OliveOil datasets is higher by 13.32%, 10.08%, and 12.83%, respectively. From Figure 7, it can be seen that the classification performance employing the WCDM is superior to that of the ED, DTW, Hausdorff distance, Fréchet distance, and LCSS on more than half of the tested datasets. Although classification performance based on the WCDM falls short of that based on LSDTW across most datasets, this discrepancy can be attributed to the spiky and bumpy nature of the majority of curves in these datasets. As an elastic measure, LSDTW has a natural advantage in dealing with this case, as it can match points on the curves one-to-many or one-to-none, whereas the precise computation of higher-order derivatives in the WCDM is highly sensitive to such local perturbations, compromising its ability to discern differences between these curves. Nonetheless, despite being a lock-step measure, the WCDM remains competitive with the majority of dissimilarity measures for classification tasks.
4.2.3. Time Efficiency of the WCDM
Concerning time complexity, it is noteworthy that the computational complexity of the ED is $O(n)$, where $n$ denotes the length of the data. Analogously, based on Equation (1), the computational complexity of the WCDM is also $O(n)$. As delineated in [19], both DTW and LSDTW possess computational complexities of $O(n^2)$. In [6,11], it was indicated that the time complexities of the Hausdorff and Fréchet distances are both quadratic in $n$. Additionally, employing a dynamic programming algorithm, LCSS can be computed with complexity $O(n^2)$ [21].
We performed experiments to provide a visual representation of the time efficiency of the different dissimilarity measures, comparing the time required by all measures to compute the dissimilarity between the curves in the Beef, OliveOil, and GunPoint datasets. The experimental results are depicted in Figure 8. Clearly, the proposed WCDM excels over all other methods except the ED in terms of time efficiency.
4.2.4. Application of the WCDM to Spectral Data
According to the definition of the WCDM in Section 3.1, it is applicable to general planar curves. To further verify its applicability, we employed the WCDM as the dissimilarity measure for clustering and classification analysis of spectral data.
The spectra we used were selected from the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) DR8 (http://www.lamost.org/, accessed on 3 August 2024). To avoid the influence of the red and blue ends, we limited the wavelength range to 3800–9000 Å. To eliminate the impact of scale differences between spectra on the results, we normalized the spectral data. Experiments were conducted on A-type, F-type, G-type, and K-type stars with different signal-to-noise ratios (S/N) to evaluate the robustness of the WCDM in clustering and classification tasks. The true labels were the spectral classes released by LAMOST. Table 8 lists the details of the spectral data.
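The normalization step can be as simple as min-max scaling per spectrum (one common choice; the paper does not spell out the exact scheme):

```python
import numpy as np

def normalize_spectrum(flux):
    """Min-max scale a spectrum to [0, 1] to remove amplitude differences."""
    f = np.asarray(flux, dtype=float)
    return (f - f.min()) / (f.max() - f.min())
```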
The experimental results are listed in Table 9, with the best results highlighted in bold. According to Table 9, when using the WCDM as the dissimilarity measure, the Purity and ARI for spectral data with high S/N (>30) are higher than those for spectral data with low S/N (<10); similarly, the Accuracy and $F_1$-score for spectral data with high S/N are higher than those with low S/N. When comparing the clustering and classification results of the WCDM and the ED, the WCDM shows better evaluation indices on the high-S/N dataset, while its indices are lower on the low-S/N dataset. These results indicate that the WCDM performs better when dealing with high-quality curve data, while it is prone to being affected by noise when dealing with low-quality data. The reason is that the WCDM relies on precise computation of curve shape information, which is sensitive to noise.
5. Conclusions
In this paper, we introduced a novel dynamic weighted composite dissimilarity metric, termed the WCDM, which takes into account both higher-order derivative information and location information. A new weighting function is defined by employing the relationship between the trends of curves; it dynamically assigns reasonable weight values to the curvature difference and the location difference. In this way, the WCDM can accurately measure the shape-based dissimilarity between curves, and by its definition it is applicable to general planar open curves. For comparison experiments on the curve clustering task, we used an improved K-medoids clustering algorithm in which a new anchoring strategy is introduced to select the initial cluster centers. Comparative experiments were implemented on curve sets fitted from UCR datasets. The clustering results indicate that the DCK-medoids algorithm using the WCDM outperforms all other methods on more than eight datasets in terms of Purity and ARI; in terms of DI, the WCDM also performs better than most methods on more than half of the datasets. Additionally, the classification results show that the WCDM is superior to the ED, DTW, Hausdorff distance, Fréchet distance, and LCSS on more than half of the tested datasets in terms of Accuracy and $F_1$-score. Therefore, the WCDM is a good choice for a measurement that is time-efficient and captures shape-based dissimilarity. In a practical experiment, we applied the WCDM to spectral data; the results show its effectiveness on high-quality curve data, although it is sensitive to noise.
Because the WCDM measures morphological differences between curves through curvature and slope computations, it is sensitive to noise and local perturbations in the curves, which can lead to undesirable results when clustering or classifying curves affected by them. To enhance the robustness of the dissimilarity metric against local perturbations, a future direction of this work is the approximate calculation of higher-order derivative characteristics.
Author Contributions
Conceptualization, Y.W.; data curation, J.C. and H.Y.; funding acquisition, J.C., H.Y. and J.W.; methodology, Y.W., J.C. and H.Y.; software, Y.W.; validation, J.C. and H.Y.; writing—original draft, Y.W.; writing—review and editing, J.W., B.L. and X.Z. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Natural Science Foundation of China (Grant Nos. 12473105, 12473106, 62306205), Projects of Science and Technology Cooperation and Exchange of Shanxi Province (Grant Nos. 202204041101037, 202204041101033), and the Fundamental Research Program of Shanxi Province (Grant No. 202203021222189).
Data Availability Statement
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Fontes, C.H.; Budman, H. A hybrid clustering approach for multivariate time series—A case study applied to failure analysis in a gas turbine. ISA Trans. 2017, 71, 513–529. [Google Scholar] [CrossRef] [PubMed]
- Izakian, H.; Pedrycz, W.; Jamal, I. Fuzzy clustering of time series data using dynamic time warping distance. Eng. Appl. Artif. Intell. 2015, 39, 235–244. [Google Scholar] [CrossRef]
- Guruswami, V.; Zuckerman, D. Robust Fourier and polynomial curve fitting. In Proceedings of the 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), New Brunswick, NJ, USA, 9–11 October 2016; pp. 751–759. [Google Scholar]
- Boullé, M. Functional data clustering via piecewise constant nonparametric density estimation. Pattern Recognit. 2012, 45, 4389–4401. [Google Scholar] [CrossRef]
- Alt, H.; Scharf, L. Computing the Hausdorff distance between curved objects. Int. J. Comput. Geom. Appl. 2008, 18, 307–320. [Google Scholar] [CrossRef]
- Alt, H.; Godau, M. Computing the Fréchet distance between two polygonal curves. Int. J. Comput. Geom. Appl. 1995, 5, 75–91. [Google Scholar] [CrossRef]
- Hong, J.Y.; Park, S.H.; Baek, J.G. SSDTW: Shape segment dynamic time warping. Expert Syst. Appl. 2020, 150, 113291. [Google Scholar] [CrossRef]
- Wang, X.; Mueen, A.; Ding, H.; Trajcevski, G.; Scheuermann, P.; Keogh, E. Experimental comparison of representation methods and distance measures for time series data. Data Min. Knowl. Discov. 2013, 26, 275–309. [Google Scholar] [CrossRef]
- Shifaz, A.; Pelletier, C.; Petitjean, F.; Webb, G.I. Elastic similarity and distance measures for multivariate time series. Knowl. Inf. Syst. 2023, 65, 2665–2698. [Google Scholar] [CrossRef]
- Yosida, K. Functional Analysis; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1995. [Google Scholar]
- Bai, Y.B.; Yong, J.H.; Liu, C.Y.; Liu, X.M.; Meng, Y. Polyline approach for approximating hausdorff distance between planar free-form curves. Comput.-Aided Des. 2011, 43, 687–698. [Google Scholar] [CrossRef]
- Eiter, T.; Mannila, H. Computing Discrete Fréchet Distance; Technical Report CD–TR 94/64; Vienna University of Technology: Vienna, Austria, 1994. [Google Scholar]
- Filtser, A.; Filtser, O.; Katz, M.J. Approximate nearest neighbor for curves: Simple, efficient, and deterministic. Algorithmica 2023, 85, 1490–1519. [Google Scholar] [CrossRef]
- Holder, C.; Middlehurst, M.; Bagnall, A. A review and evaluation of elastic distance functions for time series clustering. Knowl. Inf. Syst. 2024, 66, 765–809. [Google Scholar] [CrossRef]
- Berndt, D.J.; Clifford, J. Using dynamic time warping to find patterns in time series. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, Newport Beach, CA, USA, 14–17 August 1994; pp. 359–370. [Google Scholar]
- Keogh, E.J.; Pazzani, M.J. Derivative Dynamic Time Warping. In Proceedings of the 2001 SIAM International Conference on Data Mining (SDM), Chicago, IL, USA, 5–7 April 2001; Volume 10, pp. 1–11. [Google Scholar]
- Jeong, Y.S.; Myong, K.J.; Olufemi, A.O. Weighted dynamic time warping for time series classification. Pattern Recognit. 2011, 44, 2231–2240. [Google Scholar] [CrossRef]
- Zhao, J.; Itti, L. shapeDTW: Shape Dynamic Time Warping. Pattern Recognit. 2018, 74, 171–184. [Google Scholar] [CrossRef]
- Yuan, J.; Lin, Q.; Zhang, W.; Wang, Z. Locally slope-based dynamic time warping for time series classification. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 1713–1722. [Google Scholar]
- Chang, Y.; Tanin, E.; Cong, G.; Jensen, C.S.; Qi, J. Trajectory similarity measurement: An efficiency perspective. In Proceedings of the VLDB Endowment, Guangzhou, China, 26–30 August 2024; Volume 17, pp. 2293–2306. [Google Scholar]
- Vlachos, M.; Kollios, G.; Gunopulos, D. Discovering similar multidimensional trajectories. In Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA, 26 February–1 March 2002; pp. 673–684. [Google Scholar]
- Chen, L.; Özsu, M.T.; Oria, V. Robust and fast similarity search for moving object trajectories. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, Baltimore, MD, USA, 14–16 June 2005; pp. 491–502. [Google Scholar]
- Lin, B.; Su, J. Shapes based trajectory queries for moving objects. In Proceedings of the 13th Annual ACM International Workshop on Geographic Information Systems, Bremen, Germany, 4–5 November 2005; pp. 21–30. [Google Scholar]
- Pelekis, N.; Kopanakis, I.; Marketos, G.; Ntoutsi, I.; Andrienko, G.; Theodoridis, Y. Similarity search in trajectory databases. In Proceedings of the 14th International Symposium on Temporal Representation and Reasoning (TIME’07), Alicante, Spain, 28–30 June 2007; pp. 129–140. [Google Scholar]
- Meng, Y.; Liang, J.; Cao, F.; He, Y. A new distance with derivative information for functional k-means clustering algorithm. Inf. Sci. 2018, 463, 166–185. [Google Scholar] [CrossRef]
- Jacques, J.; Preda, C. Functional data clustering: A survey. Adv. Data Anal. Classif. 2014, 8, 231–255. [Google Scholar] [CrossRef]
- Wang, J.L.; Chiou, J.M.; Müller, H.G. Functional data analysis. Annu. Rev. Stat. Its Appl. 2016, 3, 257–295. [Google Scholar] [CrossRef]
- Peng, J.; Müller, H.G. Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions. Ann. Appl. Stat. 2008, 2, 1056–1077. [Google Scholar] [CrossRef]
- Kayano, M.; Dozono, K.; Konishi, S. Functional cluster analysis via orthonormalized Gaussian basis expansions and its application. J. Classif. 2010, 27, 211–230. [Google Scholar] [CrossRef]
- Giacofci, M.; Lambert-Lacroix, S.; Marot, G.; Picard, F. Wavelet-based clustering for mixed-effects functional models in high dimension. Biometrics 2013, 69, 31–40. [Google Scholar] [CrossRef]
- Coffey, N.; Hinde, J.; Holian, E. Clustering longitudinal profiles using P-splines and mixed effects models applied to time-course gene expression data. Comput. Stat. Data Anal. 2014, 71, 14–29. [Google Scholar] [CrossRef]
- Chamroukhi, F.; Nguyen, H.D. Model-based clustering and classification of functional data. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1298. [Google Scholar] [CrossRef]
- McLachlan, G.J.; Krishnan, T. The EM Algorithm and Extensions; John Wiley & Sons: Hoboken, NJ, USA, 2007. [Google Scholar]
- Nguyen, H.D. An introduction to Majorization-Minimization algorithms for machine learning and statistical estimation. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2017, 7, e1198. [Google Scholar] [CrossRef]
- Tokushige, S.; Yadohisa, H.; Inada, K. Crisp and fuzzy k-means clustering algorithms for multivariate functional data. Comput. Stat. 2007, 22, 1–16. [Google Scholar] [CrossRef]
- Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
- Ieva, F.; Paganoni, A.M.; Pigoli, D.; Vitelli, V. Multivariate functional clustering for the analysis of ECG curves morphology. J. R. Stat. Soc. Ser. C Appl. Stat. 2011; in press. [Google Scholar]
- Teeraratkul, T.; O’Neill, D.; Lall, S. Shape-based approach to household electric load curve clustering and prediction. IEEE Trans. Smart Grid 2017, 9, 5196–5206. [Google Scholar] [CrossRef]
- Yu, D.; Liu, G.; Guo, M.; Liu, X. An improved K-medoids algorithm based on step increasing and optimizing medoids. Expert Syst. Appl. 2018, 92, 464–473. [Google Scholar] [CrossRef]
- Ikotun, A.M.; Ezugwu, A.E.; Abualigah, L.; Abuhaija, B.; Heming, J. K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Inf. Sci. 2022, 622, 178–210. [Google Scholar] [CrossRef]
- Arthur, D.; Vassilvitskii, S. K-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–9 January 2007; pp. 1027–1035. [Google Scholar]
- Ushakov, A.V.; Vasilyev, I. Near-optimal large-scale k-medoids clustering. Inf. Sci. 2021, 545, 344–362. [Google Scholar] [CrossRef]
- Dau, H.A.; Bagnall, A.; Kamgar, K.; Yeh, C.C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.A.; Keogh, E. The UCR time series archive. IEEE/CAA J. Autom. Sin. 2019, 6, 1293–1305. [Google Scholar] [CrossRef]
- Gardner, E.S., Jr. Exponential smoothing: The state of the art—Part II. Int. J. Forecast. 2006, 22, 637–666. [Google Scholar]
- McKinley, S.; Levine, M. Cubic spline interpolation. Coll. Redwoods 1998, 45, 1049–1060. [Google Scholar]
- Ay, M.; Özbakır, L.; Kulluk, S.; Gülmez, B.; Öztürk, G.; Özer, S. FC-Kmeans: Fixed-centered K-means algorithm. Expert Syst. Appl. 2023, 211, 118656. [Google Scholar] [CrossRef]
- Park, H.S.; Jun, C.H. A simple and fast algorithm for K-medoids clustering. Expert Syst. Appl. 2009, 36, 3336–3341. [Google Scholar] [CrossRef]
- Luna-Romera, J.M.; Martínez-Ballesteros, M.; García-Gutiérrez, J.; Riquelme, J.C. External clustering validity index based on chi-squared statistical test. Inf. Sci. 2019, 487, 1–17. [Google Scholar] [CrossRef]
- Xu, Q.; Zhang, Q.; Liu, J. Efficient synthetical clustering validity indexes for hierarchical clustering. Expert Syst. Appl. 2020, 151, 113367. [Google Scholar] [CrossRef]
- Tharwat, A. Classification assessment methods. Appl. Comput. Inform. 2020, 17, 168–192. [Google Scholar] [CrossRef]
Figure 1.
Two curves belonging to different classes; the bold parts of the two curves have different curvatures, while the subplot shows their real states.
Figure 2.
The blue curve segments in (a) and (b) are the same curve segment defined on a common interval, while the orange curve segments in (a) and (b) are two different curve segments on that interval.
Figure 3.
The curve sets fitted by the Beef, Car, and GunPoint datasets.
Figure 4.
The first column shows curves randomly selected from the ArrowHead and Beef datasets. The second and the third columns respectively present the slope information and curvature information of the corresponding curves. Points exhibiting divergent trends on curves are highlighted with black dashed lines, while those with similar trends are indicated by green dashed lines.
Figure 5.
Calculation process of the proposed WCDM.
Figure 6.
Classification performance in terms of Accuracy. Each point represents a dataset, and a point that falls in the lower triangle area indicates that WCDM works better.
Figure 7.
Classification performance in terms of $F_1$-score. Each point represents a dataset, and a point that falls in the lower triangle area indicates that WCDM works better.
Figure 8.
Time required for calculating the dissimilarity between curves from the Beef, OliveOil, and GunPoint datasets.
Table 1.
Summary of previous works on dissimilarity measures for curves.
Category | Measure | Param Free | Anti Noise | Higher-Order Info | Diff Length | Local Shift | Time Complexity |
---|---|---|---|---|---|---|---|
Lock-step measure | $L_1$-norm [10] | ✓ | × | × | × | × | O(n) |
 | $L_2$-norm [10] | ✓ | × | × | × | × | O(n) |
 | $L_\infty$-norm [10] | ✓ | × | × | × | × | O(n) |
Elastic measure | Hausdorff [11] | ✓ | × | × | ✓ | × | O(n²) |
 | Fréchet [6] | ✓ | × | × | ✓ | × | O(n²) |
 | DTW [15] | ✓ | × | × | ✓ | ✓ | O(n²) |
 | DDTW [16] | ✓ | × | ✓ | ✓ | ✓ | O(n²) |
 | ShapeDTW [18] | × | × | ✓ | ✓ | ✓ | O(n²) |
 | LSDTW [19] | × | ✓ | ✓ | ✓ | ✓ | O(n²) |
 | SSDTW [7] | ✓ | × | × | ✓ | ✓ | O(n²) |
 | LCSS [21] | × | ✓ | × | ✓ | ✓ | O(n²) |
 | EDR [22] | × | ✓ | × | ✓ | ✓ | O(n²) |
 | OWD [23] | × | × | × | ✓ | × | O(n²) |
 | LIP [24] | × | × | × | ✓ | × | O(n²) |
Table 2.
Details of the datasets and smoothing factors.
Datasets | Instances | Train Sets | Test Sets | Features | Classes | Type | Smoothing Factors |
---|---|---|---|---|---|---|---|
ArrowHead | 211 | 36 | 175 | 251 | 3 | IMAGE | 0.3 |
Beef | 60 | 30 | 30 | 470 | 5 | SPECTRO | 0.1 |
BeetleFly | 40 | 20 | 20 | 512 | 2 | IMAGE | 0.5 |
BirdChicken | 40 | 20 | 20 | 512 | 2 | IMAGE | 0.1 |
Car | 120 | 60 | 60 | 577 | 4 | SENSOR | 0.03 |
Chinatown | 363 | 20 | 343 | 24 | 2 | TRAFFIC | 0.5 |
Coffee | 56 | 28 | 28 | 286 | 2 | SPECTRO | 0.3 |
Earthquakes | 461 | 322 | 139 | 512 | 2 | SENSOR | 0.4 |
FiftyWords | 905 | 450 | 455 | 270 | 50 | IMAGE | 0.1 |
Fish | 350 | 175 | 175 | 463 | 7 | IMAGE | 0.3 |
Fungi | 204 | 18 | 186 | 201 | 18 | HRM | 0.1 |
GunPoint | 200 | 50 | 150 | 150 | 2 | MOTION | 0.5 |
Mallat | 2400 | 55 | 2345 | 1024 | 8 | SIMULATED | 0.3 |
MoteStrain | 1272 | 20 | 1252 | 84 | 2 | SENSOR | 0.3 |
OliveOil | 60 | 30 | 30 | 570 | 4 | SPECTRO | 0.1 |
OSULeaf | 442 | 200 | 242 | 427 | 6 | IMAGE | 0.1 |
PowerCons | 360 | 180 | 180 | 144 | 2 | POWER | 0.2 |
Symbols | 1020 | 25 | 995 | 398 | 6 | IMAGE | 0.3 |
Table 3.
Effect of parameter adjustment on the DCK-medoids clustering results.
Datasets | Purity: 0.1 | 0.3 | 0.5 | 0.8 | 1 | 2 | ARI: 0.1 | 0.3 | 0.5 | 0.8 | 1 | 2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ArrowHead | 0.550 | 0.573 | 0.607 | 0.592 | 0.588 | 0.521 | 0.195 | 0.162 | 0.220 | 0.143 | 0.156 | 0.132 |
BeetleFly | 0.725 | 0.575 | 0.725 | 0.575 | 0.575 | 0.525 | 0.182 | −0.002 | 0.182 | 0.005 | −0.002 | −0.024 |
Car | 0.550 | 0.542 | 0.542 | 0.542 | 0.500 | 0.450 | 0.179 | 0.157 | 0.157 | 0.157 | 0.081 | 0.117 |
Earthquakes | 0.798 | 0.798 | 0.798 | 0.798 | 0.798 | 0.798 | 0.001 | 0.003 | 0.014 | 0.009 | 0.009 | 0.001 |
PowerCons | 0.906 | 0.906 | 0.906 | 0.889 | 0.597 | 0.806 | 0.657 | 0.657 | 0.657 | 0.604 | 0.359 | 0.372 |
Symbols | 0.657 | 0.655 | 0.738 | 0.747 | 0.746 | 0.611 | 0.557 | 0.552 | 0.608 | 0.642 | 0.642 | 0.419 |
Table 4.
Comparison of curve clustering algorithms.
Datasets | K-means++: Purity | ARI | DI | NI | K-medoids: Purity | ARI | DI | NI | DCK-medoids (0.5): Purity | ARI | DI | NI |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ArrowHead | 0.415 | 0.019 | 0.325 | 2 | 0.605 | 0.227 | 0.041 | 3 | 0.607 | 0.220 | 0.020 | 3 |
Beef | 0.390 | 0.077 | 0.106 | 3 | 0.440 | 0.102 | 0.038 | 4 | 0.450 | 0.133 | 0.065 | 3 |
Beetlefly | 0.545 | −0.001 | 0.456 | 2 | 0.555 | 0.005 | 0.421 | 3 | 0.725 | 0.182 | 0.475 | 2 |
Birdchicken | 0.543 | −0.003 | 0.238 | 2 | 0.580 | 0.004 | 0.261 | 3 | 0.550 | −0.012 | 0.265 | 2 |
Car | 0.472 | 0.091 | 0.086 | 5 | 0.530 | 0.145 | 0.088 | 4 | 0.542 | 0.157 | 0.081 | 3 |
Chinatown | 0.793 | 0.336 | 0.058 | 3 | 0.793 | 0.336 | 0.058 | 4 | 0.793 | 0.336 | 0.058 | 3 |
Coffee | 0.536 | 0.003 | 0.406 | 1 | 0.780 | 0.405 | 0.152 | 3 | 0.947 | 0.793 | 0.182 | 3 |
Earthquakes | 0.798 | 0.004 | 0.515 | 2 | 0.798 | 0.002 | 0.599 | 3 | 0.798 | 0.014 | 0.570 | 3 |
FiftyWords | 0.451 | 0.233 | 0.088 | 6 | 0.520 | 0.221 | 0.090 | 6 | 0.527 | 0.266 | 0.065 | 5 |
Fish | 0.378 | 0.096 | 0.054 | 5 | 0.465 | 0.177 | 0.058 | 4 | 0.540 | 0.254 | 0.060 | 3 |
Fungi | 0.867 | 0.793 | 0.053 | 4 | 0.795 | 0.662 | 0.035 | 4 | 0.662 | 0.443 | 0.037 | 3 |
Gunpoint | 0.533 | 0.001 | 0.066 | 5 | 0.522 | −0.002 | 0.070 | 3 | 0.510 | −0.005 | 0.074 | 2 |
Mallat | 0.452 | 0.321 | 0.084 | 4 | 0.564 | 0.417 | 0.038 | 4 | 0.633 | 0.534 | 0.060 | 4 |
MoteStrain | 0.539 | 0.000 | 0.298 | 1 | 0.689 | 0.180 | 0.061 | 3 | 0.802 | 0.364 | 0.044 | 4 |
OliveOil | 0.690 | 0.410 | 0.141 | 4 | 0.715 | 0.403 | 0.086 | 4 | 0.767 | 0.611 | 0.087 | 3 |
OSULeaf | 0.299 | 0.026 | 0.126 | 3 | 0.313 | 0.038 | 0.134 | 4 | 0.330 | 0.051 | 0.181 | 3 |
PowerCons | 0.563 | 0.015 | 0.186 | 4 | 0.787 | 0.378 | 0.132 | 3 | 0.906 | 0.657 | 0.124 | 2 |
Symbols | 0.540 | 0.337 | 0.063 | 5 | 0.610 | 0.420 | 0.028 | 4 | 0.738 | 0.608 | 0.051 | 3 |
Table 5.
Clustering performance in terms of Purity.
Datasets | ED | DTW | LSDTW | Hausdorff | Fréchet | LCSS (ε, δ) | WCDM |
---|---|---|---|---|---|---|---|
Beef | 0.467 | 0.450 | 0.417 | 0.433 | 0.433 | 0.450 (0.1, 5) | 0.450 |
Beetlefly | 0.575 | 0.625 | 0.650 | 0.625 | 0.525 | 0.600 (0.2, 6) | 0.725 |
Birdchicken | 0.550 | 0.525 | 0.575 | 0.550 | 0.575 | 0.600 (0.2, 6) | 0.575 |
Car | 0.558 | 0.583 | 0.542 | 0.592 | 0.592 | 0.408 (0.1, 6) | 0.542 |
Chinatown | 0.769 | 0.799 | 0.755 | 0.713 | 0.713 | 0.716 (0.4, 4) | 0.793 |
Earthquakes | 0.798 | 0.798 | 0.798 | 0.798 | 0.798 | 0.798 (0.1,6) | 0.798 |
FiftyWords | 0.540 | 0.578 | 0.669 | 0.552 | 0.559 | 0.410 (0.1, 6) | 0.527 |
Fish | 0.506 | 0.529 | 0.754 | 0.431 | 0.406 | 0.451 (0.1, 6) | 0.540 |
Fungi | 0.809 | 0.564 | 0.828 | 0.583 | 0.583 | 0.676 (0.2, 10) | 0.662 |
GunPoint | 0.505 | 0.505 | 0.505 | 0.570 | 0.570 | 0.510 (0.1, 6) | 0.510 |
MoteStrain | 0.827 | 0.803 | 0.789 | 0.770 | 0.782 | 0.539 (0.4, 8) | 0.802 |
OliveOil | 0.717 | 0.817 | 0.817 | 0.700 | 0.700 | 0.650 (0.05, 6) | 0.767 |
OSULeaf | 0.355 | 0.416 | 0.416 | 0.391 | 0.344 | 0.339 (0.2, 11) | 0.330 |
PowerCons | 0.894 | 0.897 | 0.906 | 0.881 | 0.883 | 0.828 (0.5, 11) | 0.906 |
wins number | 7/8 | 8/8 | 9/9 | 5/10 | 6/10 | 6/11 | - |
Table 6.
Clustering performance in terms of ARI.
Datasets | ED | DTW | LSDTW | Hausdorff | Fréchet | LCSS (ε, δ) | WCDM |
---|---|---|---|---|---|---|---|
Beef | 0.160 | 0.113 | 0.112 | 0.120 | 0.120 | 0.113 (0.1, 5) | 0.133 |
Beetlefly | −0.002 | 0.038 | 0.068 | 0.039 | −0.024 | 0.015 (0.2, 6) | 0.182 |
Birdchicken | −0.015 | −0.022 | −0.003 | −0.014 | 0.000 | 0.015 (0.2, 6) | 0.002 |
Car | 0.178 | 0.190 | 0.178 | 0.225 | 0.225 | 0.072 (0.1, 6) | 0.157 |
Chinatown | 0.281 | 0.351 | 0.252 | 0.157 | 0.150 | 0.073 (0.4, 4) | 0.336 |
Earthquakes | 0.003 | 0.089 | 0.010 | 0.000 | 0.044 | −0.068 (0.1, 6) | 0.014 |
FiftyWords | 0.293 | 0.387 | 0.523 | 0.372 | 0.321 | 0.172 (0.1, 6) | 0.266 |
Fish | 0.216 | 0.251 | 0.621 | 0.156 | 0.161 | 0.169 (0.1, 6) | 0.254 |
Fungi | 0.751 | 0.352 | 0.695 | 0.445 | 0.453 | 0.487 (0.2, 10) | 0.443 |
GunPoint | −0.005 | −0.005 | −0.005 | 0.014 | 0.014 | −0.005 (0.1, 6) | −0.005 |
MoteStrain | 0.427 | 0.366 | 0.334 | 0.292 | 0.318 | 0.000 (0.4, 8) | 0.364 |
OliveOil | 0.478 | 0.609 | 0.557 | 0.450 | 0.450 | 0.381 (0.05, 6) | 0.611 |
OSULeaf | 0.095 | 0.146 | 0.160 | 0.123 | 0.103 | 0.070 (0.2, 11) | 0.051 |
PowerCons | 0.622 | 0.630 | 0.657 | 0.578 | 0.587 | 0.428 (0.5, 11) | 0.657 |
wins number | 7/8 | 7/8 | 7/9 | 5/9 | 6/8 | 4/11 | - |
Table 7.
Clustering performance in terms of DI.
Datasets | ED | DTW | LSDTW | Hausdorff | Fréchet | LCSS (ε, δ) | WCDM |
---|---|---|---|---|---|---|---|
Beef | 0.065 | 0.043 | 0.090 | 0.065 | 0.065 | 0.036 (0.1, 5) | 0.065 |
Beetlefly | 0.465 | 0.563 | 0.594 | 0.467 | 0.372 | 0.462 (0.2, 6) | 0.475 |
Birdchicken | 0.292 | 0.415 | 0.254 | 0.246 | 0.305 | 0.090 (0.2, 6) | 0.281 |
Car | 0.088 | 0.032 | 0.031 | 0.094 | 0.094 | 0.061 (0.1, 6) | 0.081 |
Chinatown | 0.078 | 0.067 | 0.050 | 0.059 | 0.046 | 0.034 (0.4, 4) | 0.058 |
Earthquakes | 0.496 | 0.489 | 0.487 | 0.487 | 0.507 | 0.489 (0.1, 6) | 0.570 |
FiftyWords | 0.099 | 0.066 | 0.099 | 0.081 | 0.088 | 0.067 (0.1, 6) | 0.065 |
Fish | 0.080 | 0.046 | 0.046 | 0.063 | 0.063 | 0.040 (0.1, 6) | 0.060 |
Fungi | 0.055 | 0.008 | 0.018 | 0.014 | 0.016 | 0.016 (0.2, 10) | 0.037 |
GunPoint | 0.139 | 0.091 | 0.139 | 0.084 | 0.084 | 0.066 (0.1, 6) | 0.074 |
MoteStrain | 0.045 | 0.057 | 0.044 | 0.045 | 0.042 | 0.020 (0.4, 8) | 0.044 |
OliveOil | 0.106 | 0.054 | 0.048 | 0.132 | 0.132 | 0.164 (0.05, 6) | 0.087 |
OSULeaf | 0.167 | 0.154 | 0.110 | 0.181 | 0.154 | 0.096 (0.2, 11) | 0.181 |
PowerCons | 0.108 | 0.113 | 0.138 | 0.106 | 0.108 | 0.109 (0.5, 11) | 0.124 |
wins number | 9/5 | 6/8 | 6/9 | 6/8 | 7/8 | 2/12 | - |
Table 8.
Spectral data used in the experiment.
Type | Data Volume | S/N | Test:Train | Classes | Dimensionality | Smoothing Factors |
---|---|---|---|---|---|---|
A/F/G/K | 50/50/50/50 | <10 | 2:8 | 4 | 3121 | 0.5 |
A/F/G/K | 50/50/50/50 | >30 | 2:8 | 4 | 3121 | 0.5 |
Table 9.
Clustering and classification results on the spectral data.
Measure | Type | S/N | Purity | ARI | DI | Accuracy | $F_1$-score |
---|---|---|---|---|---|---|---|
WCDM | A/F/G/K | <10 | 0.265 | 0.000 | 0.007 | 0.400 | 0.197 |
WCDM | A/F/G/K | >30 | 0.515 | 0.189 | 0.000 | 0.675 | 0.698 |
ED | A/F/G/K | <10 | 0.340 | 0.007 | 0.005 | 0.650 | 0.664 |
ED | A/F/G/K | >30 | 0.425 | 0.139 | 0.000 | 0.650 | 0.678 |