In this section, we will first provide the formalisation for our current work. Subsequently, we will introduce our proposed novel working definition for ordinal classification tasks that is independent from the meaning of the corresponding class labels.
2.1. Formalisation
Let
be a
c-class classification task, which is defined by the
d-dimensional data set
,
, and the corresponding set of class labels
, with
. We denote the resulting index set as
I, i.e.,
. Each element of
is a task-related object, which is a pair consisting of a data sample and its true class label, i.e.,
,
, whereby
denotes the number of elements in the set
. Specifically, it holds,
. Moreover, by
, we denote the binary subtask that is restricted to the classes
and
, i.e., for all
, with
, we define
Therefore, it holds that . For the definition of ordinal classification tasks, in this work, we introduce the term level of separability measures, which we define as follows, in Definition 1.
Definition 1 (Level of Separability Measures).
Let , , , and constitute a binary classification task in a d-dimensional feature space. Furthermore, let be the corresponding binary classification task, where we interchange the class labels of all samples from the task , i.e., . Additionally, for each , we define the set as . We denote the non-constant
and non-random
mapping μ as a level of separability measure (LSM)
, if μ fulfils the following properties: Note that a higher value for
implies a higher level of separability. The properties defined in Equation (
2) are further discussed in the following remark, i.e., in Remark 1.
Remark 1 (Properties of LSM mappings).
Let μ be an LSM mapping by Definition 1. Furthermore, let constitute a binary classification task. Property (P1) of Equation (2) implies that if we set , then the value of is equal to the minimum value of μ, across all . Therefore, we say that μ is point-separating
. More precisely, the set is defined as , i.e., . In such a set, each data point is assigned to both of the class labels, leading to the lowest level of separability, in combination with any LSM mapping μ. Property (P2) of Equation (2) simply implies that interchanging the labels of all data points does not change the value of μ, evaluated in combination with the set X. Therefore, we say that μ is label invariant
, or simply symmetric.
By
, we denote the set of all mappings that measure the level of separability of any binary classification task from the
d-dimensional feature space, i.e.,
Note that by Definition 1, each element of the set
, not only fulfils the properties (P0), (P1), and (P2) of Equation (
2), but is also a
non-constant and
non-random mapping. Moreover, note that the set
is non-empty, for all
, as we briefly discuss in the following example, i.e., Example 1.
Example 1 (Existence of LSM mappings).
Note that the set is non-empty, for all . Let CM be a deterministic
classification model, e.g., a support vector machine, which is trained based on the set . Let be the resubstitution error (training error) of classifier CM, i.e., is the fraction of data points in X that are misclassified by classifier CM. Then, fulfils the properties of Equation (2), for each absolute term , and any deterministic classification model CM. Let
be an LSM mapping. For all
, we define
as follows:
which measures the level of separability between the samples from class
and the samples from class
. Therefore, for
, the statement,
, implies that it is
easier to separate the classes
and
from each other, than to separate the classes
and
from each other. Thus, if it holds,
, we simply say that the binary classification task
has a
higher level of separability than the binary classification task
. Note that from Equations (
1) and (
4), it directly follows that
. In addition, note that setting
to zero, for all
, is a logical consequence, as it is not possible to separate two
identical data sets from each other.
Furthermore, let be the set of all permutations of the set I. More precisely, each , is a bijective function. In addition, by , we denote the reversed permutation of . For instance, for the identity permutation, , it holds, .
By
, we denote the
pairwise separability matrix (PSM), consisting of the elements
, i.e.,
. Moreover, by
, we define the PSM whose rows and columns are rearranged specific to permutation
, i.e.,
. For reasons of simplicity, we will denote
simply by
, with
, whereby
denotes the identity permutation. More precisely,
can be depicted as follows:
Note that by definition, matrix
is symmetric for all
. Finally, we summarise all of the required ingredients for our proposed working definition of ordinal classification tasks in
Table 1.
As briefly introduced in
Section 1, in general, an ordinal class structure is denoted by the ≺-sign, e.g.,
. In an OC task, we denote the
first and the
last class of such an ordered class structure as
edge classes, or simply
edges. In that particular example, i.e.,
, the classes
and
would be the corresponding edges. Moreover, in the common understanding of ordinal (class) structures, the reversed order of an ordinal arrangement also constitutes an ordinal structure. More precisely, the relations
and
are equivalent. Therefore, the
uniqueness of an ordinal class structure is defined by
exactly two class orders (or permutations), originating from the two edges. Based on this observation and the definitions introduced above, in the next subsection, we propose a
working definition for OC tasks that is independent from the
meaning of the corresponding class labels.
2.2. Feature Space-Based Working Definition for Ordinal Classification Tasks
As a prior step to our proposed working definition for ordinal classification tasks, we first introduce the term ordinal arrangement, which we present in Definition 2.
Definition 2 (Ordinal Arrangements).
Let be a symmetric matrix with elements , with . Matrix S represents an ordinal arrangement
, if and only if, , it holds that Note that the properties of Equation (
6) can be summarised as follows. Let
,
, constitute an ordinal arrangement by Definition 2. Then, the relations between the elements of
S can be symbolically depicted as
whereby
denotes the transpose of
S. Note that for each symmetric matrix
,
, property
is equivalent to
, and statement
is equivalent to
(which is implied in Equation (
7) by including
). Based on the concept of ordinal arrangements introduced above, we provide the working definition for ordinal classification tasks as follows.
Definition 3 (Working Definition for Ordinal Classification Tasks).
Let , , , constitute a c-class classification task, with . Let be an LSM mapping. Furthermore, let be the set of all permutations of the set and . We denote as feature space-based ordinal (FS-ordinal), with respect to the order
and mapping
μ, if and only if, , for the corresponding PSMs, , it holds that Note that, as defined in
Section 2.1,
denotes the reversed permutation of
, e.g.,
and
.
If the task constitutes an FS-ordinal classification task, with respect to the order and mapping , we simply say that the task is FS-ordinal specific to .
Figure 2 illustrates the properties of an ordinal classification task, based on a two-dimensional 5-class toy data set. Note that the term
closer in the caption of
Figure 2 does not refer to the distances between different class pairs, but to the order of the classes, i.e., the
arrangement of the columns of the corresponding PSM. More precisely, e.g., for the arrangement
, we say that class
is
closer to class
than to class
. However, for instance, the (averaged) Euclidean distance between the samples from the classes
and
could be greater than the (averaged) Euclidean distance between the samples from the classes
and
.