2.1. Dataset
The Occupancy dataset is a five-dimensional time series dataset that records temperature (Celsius), relative humidity (percentage), light (Lux), CO$_2$ (parts per million), and humidity ratio (kilograms of water per kilogram of air) in an office room. Each data point has one of two classes, indicating whether the room is occupied or not. Introduced by Luis M. Candanedo and Véronique Feldheim [3] (accessible at the UCI ML repository as of 16 January 2022), the original paper explores the use of supervised machine learning models to detect building occupancy and improve the energy efficiency of smart buildings. The dataset contains 20,560 data points recorded over 16 days. In total, 15,843 (77.1%) of the points correspond to the not-occupied class and 4717 (22.9%) to the occupied class. The bias towards the not-occupied class has led to the dataset being used to evaluate unsupervised classification techniques. This dataset has been used in a large number of works. For comparison, consideration is limited to English-language primary research that tests an unsupervised algorithm to classify occupancy on the unmodified Occupancy dataset. Screening occurred in two phases: the first was based only on abstracts, and the second considered full papers. From an initial pool of 226 papers that cited this dataset, 12 met the inclusion criteria; see Tables 1 and 2.
PCA was used to plot the first three principal components of the dataset. The timestamps for each point were removed, and their sequential nature was ignored (Figure 1).
The figure shows a clear separation between the two classes of the Occupancy dataset, indicating that a hyperplane may suitably classify it. As this separation is seen when time is disregarded, it would appear that the temporal component of the dataset is not a particularly relevant feature for the classification problem and should be disregarded by a potential classification algorithm. It is also helpful to note that a small number of points from both classes deviate from the main body of the data. This appears to be due to one of the sensors breaking during data collection.
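For readers who wish to reproduce a projection of this kind, a minimal sketch follows, assuming the dataset has been loaded into a NumPy array `X` of shape `(n, 5)` (timestamps dropped) with a boolean label vector `y`; the variable names are illustrative rather than taken from the original analysis:

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Assumed inputs: X is an (n, 5) array of sensor readings with the
# timestamp column already removed; y holds the binary occupancy labels.
pca = PCA(n_components=3)
X3 = pca.fit_transform(X)  # project onto the first three principal components

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
for label, colour in [(False, "tab:blue"), (True, "tab:orange")]:
    mask = y == label
    ax.scatter(X3[mask, 0], X3[mask, 1], X3[mask, 2], s=2, c=colour,
               label="occupied" if label else "not occupied")
ax.legend()
plt.show()
```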
2.2. Category Theory and Kan Extensions for Classification Algorithms
Previous works have suggested that Kan extensions, a tool from category theory, might be used to generalise the notion of extrapolating new data points from previous observations, providing an interesting construction for a supervised classification algorithm [2]. This construction can be modified to help motivate the unsupervised C-SVM algorithm in the following section. There are three core concepts which are necessary to introduce the definition of a Kan extension: categories, functors, and natural transforms.
2.2.1. Categories
A category is similar to a graph. Where a graph would have nodes and edges, a category has objects and arrows, called morphisms. A morphism starts at an object (its domain) and finishes at an object (its codomain). A morphism $f$ from a domain $A$ to a codomain $B$ can be written as $f : A \to B$.
The morphisms in category theory are inspired by functions. As a result, morphisms can be composed like functions. These compositions can be thought of as paths through a graph. If one morphism ends where another starts, they can be composed. The morphisms $f : A \to B$ and $g : B \to C$ can be composed as $g \circ f : A \to C$. The composition can be represented by a commutative diagram. A diagram is said to commute when all paths with the same domain and codomain (start and finish) are equal, i.e., a direct morphism $h : A \to C$ in the diagram must satisfy $h = g \circ f$.
There can be multiple morphisms between two objects, as shown in the non-commutative diagram below.
Finally, it is required that every object in a category has a morphism to itself which does nothing, called the identity morphism. It can be written (for an object $A$) as $\mathrm{id}_A$ or $1_A$. To say that a morphism does nothing is to say that composing a morphism with an identity morphism yields the same morphism: $f \circ \mathrm{id}_A = f = \mathrm{id}_B \circ f$ for any $f : A \to B$.
These requirements produce the following definition of a category [4,5,6].
Definition 1 (Category). A category $C$ consists of a class of objects $\mathrm{ob}(C)$, and between any two objects $A, B \in \mathrm{ob}(C)$, a class of morphisms $\mathrm{Hom}(A, B)$, such that
Any pair $f \in \mathrm{Hom}(A, B)$ and $g \in \mathrm{Hom}(B, C)$ can be composed to form $g \circ f \in \mathrm{Hom}(A, C)$.
Composition is associative: $h \circ (g \circ f) = (h \circ g) \circ f$.
Every object $A$ has an identity morphism $\mathrm{id}_A \in \mathrm{Hom}(A, A)$.
For any $f \in \mathrm{Hom}(A, B)$, $f \circ \mathrm{id}_A = f = \mathrm{id}_B \circ f$.
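As an informal illustration, the ordered real numbers that reappear in Section 2.2.5 already form a category of this kind; a minimal Python sketch (with illustrative names) makes the composition and identity requirements concrete:

```python
# The preorder (R, <=) viewed as a category: objects are floats, and a
# unique morphism a -> b exists precisely when a <= b.
def hom(a: float, b: float) -> bool:
    """Is there a morphism a -> b in the category (R, <=)?"""
    return a <= b

a, b, c = 1.0, 2.0, 3.0
# Composition: morphisms a -> b and b -> c compose to a morphism a -> c,
# which is the transitivity of <=.
assert hom(a, b) and hom(b, c) and hom(a, c)
# Identity: every object has a morphism to itself (reflexivity of <=).
assert hom(a, a)
```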
2.2.2. Functors
A functor is a morphism between categories. A functor $F : C \to D$ maps objects and morphisms in a category $C$ to objects and morphisms in a category $D$. The structure of a category comes from how its morphisms can be composed together. For the functor to preserve the structure of the category, it must preserve the composition of morphisms. Firstly, this requires that identity morphisms in $C$ are mapped to identity morphisms in $D$. Secondly, if two morphisms in $C$ combine to make a third, then the image of these three morphisms in $D$ should be a valid composition, i.e., if $h = g \circ f$ in $C$, then $F(h) = F(g) \circ F(f)$ in $D$.
The following defines a functor [4,5,6].
Definition 2 (Functor). A functor $F : C \to D$, between categories $C$ and $D$, sends every object $A \in \mathrm{ob}(C)$ to $F(A) \in \mathrm{ob}(D)$, and every morphism $f : A \to B$ in $C$ to $F(f) : F(A) \to F(B)$ in $D$, such that
$F(\mathrm{id}_A) = \mathrm{id}_{F(A)}$ for every object $A \in \mathrm{ob}(C)$.
$F(g \circ f) = F(g) \circ F(f)$ for every composable pair $f$ and $g$.
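In the preorder picture sketched above, a functor between preorders is exactly a monotone map: each morphism ($a \leq b$) must be sent to a morphism ($F(a) \leq F(b)$), and in a preorder the identity and composition laws then hold automatically. A brief, illustrative check:

```python
import random

# A monotone map is a functor (R, <=) -> (R, <=): it sends each morphism
# a -> b (i.e. a <= b) to a morphism F(a) -> F(b) (i.e. F(a) <= F(b)).
def F(x: float) -> float:
    return 2.0 * x + 1.0

for _ in range(1000):
    a, b = sorted(random.uniform(-10.0, 10.0) for _ in range(2))
    assert F(a) <= F(b)  # morphisms are preserved
```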
2.2.3. Natural Transforms
A natural transform is a morphism between functors. Given two functors $F, G : C \to D$, a natural transform slides the outputs of $F$ to the outputs of $G$ along morphisms in $D$.
The following defines a natural transform [4,5,6].
Definition 3 (Natural Transform). Given functors $F, G : C \to D$, between categories $C$ and $D$, a natural transformation $\alpha : F \Rightarrow G$ is a family of morphisms $\alpha_A : F(A) \to G(A)$ in $D$ for each object $A \in \mathrm{ob}(C)$, such that $\alpha_B \circ F(f) = G(f) \circ \alpha_A$ for any $f : A \to B$, i.e., the following diagram commutes.
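Continuing the preorder illustration, for monotone maps $F, G : (\mathbb{R}, \leq) \to (\mathbb{R}, \leq)$ a natural transform amounts to a component morphism $F(x) \to G(x)$ for every object $x$, i.e., $F(x) \leq G(x)$ pointwise; the naturality squares commute automatically because a preorder admits at most one morphism between any two objects. A sketch:

```python
import random

# A natural transform between monotone maps F and G exists exactly when
# F(x) <= G(x) for every object x, giving the component F(x) -> G(x).
def F(x: float) -> float:
    return x

def G(x: float) -> float:
    return x + 1.0

assert all(F(x) <= G(x)
           for x in (random.uniform(-10.0, 10.0) for _ in range(1000)))
```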
2.2.4. Kan Extensions
Kan extensions ask how one might extend one functor to produce another. Given two functors $G : C \to D$ and $K : C \to E$, a Kan extension attempts to find a functor $F : D \to E$, such that $F \circ G$ is approximately equal to $K$. It is overly restrictive (and often less helpful) to ask for $F \circ G$ to be exactly equal to $K$. So, Kan extensions weaken the equality to the requirement for some universal natural transformation between the two functors. The left Kan extension asks for a natural transform $\eta : K \Rightarrow F \circ G$, and the right asks for $\epsilon : F \circ G \Rightarrow K$. The Kan extensions require that, for any pair of a functor $H : D \to E$ and a natural transform $\gamma : K \Rightarrow H \circ G$, the natural transform can be factored uniquely as a composition of the “best” natural transform, as well as some other natural transform, e.g., $\gamma = \alpha G \circ \eta$ for a unique $\alpha : F \Rightarrow H$. The notation for the functors which satisfy the requirements of the left and right Kan extensions is $\mathrm{Lan}_G K$ and $\mathrm{Ran}_G K$, respectively.
The following defines the left and right Kan extensions [4].
Definition 4 (Left Kan Extension). Given functors $K : C \to E$ and $G : C \to D$, a left Kan extension of $K$ along $G$ is a functor $\mathrm{Lan}_G K : D \to E$ together with a natural transformation $\eta : K \Rightarrow \mathrm{Lan}_G K \circ G$, such that for any other such pair $(H : D \to E,\ \gamma : K \Rightarrow H \circ G)$, γ factors uniquely through η.
Definition 5 (Right Kan Extension). Given functors $K : C \to E$ and $G : C \to D$, a right Kan extension of $K$ along $G$ is a functor $\mathrm{Ran}_G K : D \to E$ together with a natural transformation $\epsilon : \mathrm{Ran}_G K \circ G \Rightarrow K$, such that for any other such pair $(H : D \to E,\ \delta : H \circ G \Rightarrow K)$, δ factors uniquely through ϵ.
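When the categories involved are finite preorders and an ordered truth-value category, as in the next subsection, the two Kan extensions can be computed pointwise: the left extension takes a join (OR) over the points mapped at or below $x$, and the right extension takes a meet (AND) over the points mapped at or above $x$. The following sketch assumes this setting; the data and names are illustrative:

```python
# Pointwise Kan extensions for G : A -> (R, <=) and K : A -> Bool,
# with each data point represented as a (position, label) pair.
def lan(x: float, points) -> bool:
    """Lan_G K at x: OR of K(i) over all points i with G(i) <= x."""
    return any(label for pos, label in points if pos <= x)

def ran(x: float, points) -> bool:
    """Ran_G K at x: AND of K(i) over all points i with x <= G(i)."""
    return all(label for pos, label in points if x <= pos)

data = [(-2.0, False), (-0.5, False), (0.7, True), (1.5, True)]
assert lan(1.0, data)       # a true-labelled point lies at or below 1.0
assert not ran(-1.0, data)  # a false-labelled point lies at or above -1.0
```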
2.2.5. A Supervised Classification Algorithm from Kan Extensions
This subsection summarises the supervised Kan extension classification algorithm described by Dan Shiebler [2], which forms the basis for the unsupervised algorithm developed in this paper.
The unique IDs of a dataset can be represented by a category whose only morphisms are the identity morphisms for each object. This is called a discrete category. For a given discrete category $A$, functors may assign values to each data point. The input data can be described by a functor $G : A \to I$ (Figure 2). In order to encode some of the geometric information present within the dataset, rather than a discrete category, $I$ is allowed to be a preorder. This is a category with at most one morphism between any two objects. An example would be the ordered real numbers $(\mathbb{R}, \leq)$, whose objects are the real numbers and for which a unique morphism $x \to y$ exists if and only if $x \leq y$.
For a dataset with binary classification labels, the target data can be represented by a functor $K : A \to \mathbf{Bool}$. In this instance, $\mathbf{Bool}$ represents an ordered two-object category whose only non-identity morphism is $\texttt{false} \to \texttt{true}$.
For each of the points in $I$ selected by $G$, there is information about their classification labels given by $K$. The general principle of a supervised classification algorithm is to extend information from a subset to the whole space. This can be described in this context as finding a suitable functor $F : I \to \mathbf{Bool}$.
An initial attempt may be to assign $F$ to be either the left ($\mathrm{Lan}_G K$) or right ($\mathrm{Ran}_G K$) Kan extension, as given in Equations (1)–(3) from [2].
An example visualisation of these equations can be seen in Figure 3, in which $I$ was set to $(\mathbb{R}, \leq)$. The figure presents the resulting Kan extensions for datasets with overlapping and non-overlapping classes.
In the case where $I$ is $(\mathbb{R}, \leq)$, $F$ is forced to become a step function due to the ordering induced on the two categories by their morphisms. This creates a decision boundary at some point in $\mathbb{R}$.
For two functors $F, F' : I \to \mathbf{Bool}$, a natural transform $\alpha : F \Rightarrow F'$ must select, for each object $x$ in $I$, a morphism $\alpha_x : F(x) \to F'(x)$ in $\mathbf{Bool}$. This, at most, may alter the output of $F$ from false to true while retaining its monotonicity. Considering the decision boundary in $F$, the effect of $\alpha$ can only be to increase $F(x)$. This means that a natural transform can only exist between $F$ and $F'$ if $F(x) \leq F'(x)$ for every $x$. When composed with $G$, the objects of $A$ and their images under $G$ restrict the components of $\eta$ and $\epsilon$. Consequently, the left and right Kan extensions produce classifying functions with no false negatives and no false positives, respectively.
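These properties can be made concrete with a small numerical sketch (the scores and labels below are illustrative): with $I = (\mathbb{R}, \leq)$, each Kan extension reduces to a single threshold, and the gap between the two thresholds is exactly the disagreement region:

```python
# Scores of labelled points on the real line; the classes overlap.
positives = [0.2, 0.9, 1.4]    # points labelled true
negatives = [-1.0, -0.3, 0.5]  # points labelled false

lan_threshold = min(positives)  # Lan flips to true at the lowest positive
ran_threshold = max(negatives)  # Ran is true only above the highest negative

def lan(x: float) -> bool:
    return x >= lan_threshold

def ran(x: float) -> bool:
    return x > ran_threshold

assert all(lan(x) for x in positives)      # Lan: no false negatives
assert not any(ran(x) for x in negatives)  # Ran: no false positives
# The disagreement region is [0.2, 0.5], where lan and ran differ.
```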
This approach is not yet sufficient for more complex systems. To extend the utility of this representation, an additional, trainable functor $f$ can be added.
For the case of overlapping classification regions, it is a reasonable assumption that the less these regions overlap, i.e., the smaller the disagreement region, the better the resulting classification is likely to be. For this purpose, by assuming that $I$ is $(\mathbb{R}, \leq)$, a function known as the ordering loss can be introduced [2], with the guarantee that minimising the ordering loss will also minimise the disagreement region. The following equation uses $v_i$ to represent the $i$-th component of the vector $v$.
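The exact expression is given in [2]; as a hedged illustration of the property that matters here (the loss is zero exactly when the classification regions do not overlap and grows with the disagreement region), a loss of this kind can be computed along the following lines, with all names being this sketch's own:

```python
import numpy as np

def ordering_loss(scores: np.ndarray, labels: np.ndarray) -> float:
    """Penalise overlap between the scores of true- and false-labelled points."""
    worst_true = scores[labels].min()    # lowest-scoring true point
    worst_false = scores[~labels].max()  # highest-scoring false point
    return max(0.0, worst_false - worst_true)
```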
2.3. Motivating C-SVM through Kan Extensions
Two steps are required to motivate the C-SVM from the construction shown in Section 2.2.5. Firstly, details about the dataset, which were introduced in Section 2.1, can be used to populate the construction with information specific to this task. Secondly, the construction of a classification algorithm through Kan extensions needs to be modified for it to define an unsupervised algorithm.
The morphism $G : A \to \mathbb{R}^5$ assigns information regarding the input data of the dataset to each of the data points in the discrete category $A$. In this case, the measured values in the time series can be represented as a five-dimensional real number vector in $\mathbb{R}^5$. The time series information was determined to be irrelevant in the data analysis. By selecting $A$ as the discrete category with $n$ objects, which act analogously as unique IDs to the $n$ data points in the dataset, $G$ disregards any time series information. For the sake of this formulation, rather than using $G$ directly, the data can be shifted to have a mean of $\mu$, giving $G_\mu$. The choice of $\mu$ is arbitrary, making it a hyper-parameter of this algorithm. By inspection, the data analysis indicates that normalising the datasets to have a mean of zero is a reasonable choice.
The data analysis identified that the data could be suitably separated with a hyperplane. This can be represented by constraining the trainable portion of the construction to be a linear map into the real numbers, $f : \mathbb{R}^5 \to \mathbb{R}$. For this construction, $A$ is the discrete category, with $(\mathbb{R}, \leq)$ being the category which represents the ordered set of real numbers. The choice of $(\mathbb{R}, \leq)$ as the codomain of $f$ is equivalent to stating the belief that the points of $\mathbb{R}^5$ can be ordered based on how likely they are to be classified as either “occupied” or “not-occupied”. This means that $f$ uses the ordering of $(\mathbb{R}, \leq)$ to induce a partial ordering on the points of $\mathbb{R}^5$ based on their presumed classification. The role of the Kan extensions in this construction is to decide the cutoff, where points greater than a certain value must be classified as “occupied” and less than that value as “not-occupied”. These choices result in the diagram in Figure 4.
The second problem is modifying the construction to be applied to an unsupervised learning problem. The definition of the Kan extension requires that the morphism $K : A \to \mathbf{Bool}$ is known. By introducing an additional morphism $s : (\mathbb{R}, \leq) \to \mathbf{Bool}$ (Equation (8)), which sends positive reals to true and non-positive reals to false, it is possible to define $K$ through the composition $K = s \circ f \circ G_\mu$. Interpreting this composition, introducing $s$ converts the hyperplane $f$ into a binary classifier. The points’ classification depends on which side of the hyperplane they lie on. Due to the definition of $K$ by composition, it is guaranteed that there is no overlap in the classification boundaries. As the resulting Kan extensions are from the function space $(\mathbb{R}, \leq) \to \mathbf{Bool}$, the resulting left and right Kan extensions correspond with the functions shown in Figure 3. The decision boundaries formed by $K$ and the left and right Kan extensions can be visualised as in Figure 5.
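A brief sketch of this composition, with `w` standing in for the learned hyperplane normal (the names are illustrative):

```python
import numpy as np

def classify(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """K = s . f . G_mu: centre the data, score it linearly, take the sign."""
    X_centred = X - X.mean(axis=0)  # G_mu, normalising the mean to zero
    scores = X_centred @ w          # f, a linear map into (R, <=)
    return scores > 0               # s, sending positive scores to true
```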
Any choice of $f$ now produces a preliminary classification of the points in the Occupancy dataset. The left and right Kan extensions identify the points closest to the decision boundary. The classification quality produced by a given $f$ can be judged by the distance, $d$, between the left and right Kan extensions.
The ordering loss, as previously defined, cannot be used directly for this version of the problem, as it is zero when there is no overlap between classification regions. However, given that it can be guaranteed that there will never be an overlap of the classification regions, the outer max function of the ordering loss function can be removed, allowing the modified ordering loss function to become negative (Equations (10) and (11)). Minimisation of the modified ordering loss function maximises the separation region between the left and right Kan extensions. In the particular case where $f$ has codomain $(\mathbb{R}, \leq)$, the modified ordering loss reduces to $-d$, where $d$ can be seen as the difference between the closest points on each side of the hyperplane $f$ when projected down onto the real numbers (Figure 5). Ultimately, the choices and modifications applied to the construction produce an algorithm which appears to be a linear, unsupervised SVM whose hyperplane is constrained to pass through the mean of the data.
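As a sketch, the modified ordering loss of a given scoring can be computed directly (assuming, as the construction guarantees for non-degenerate $f$, that points fall on both sides of the hyperplane):

```python
import numpy as np

def modified_ordering_loss(scores: np.ndarray) -> float:
    """Return -d, where d separates the closest points on either side."""
    pos, neg = scores[scores > 0], scores[scores < 0]
    if pos.size == 0 or neg.size == 0:
        return np.inf  # degenerate: all points on one side of the hyperplane
    return -(pos.min() - neg.max())
```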
2.4. Implementation of C-SVM
From the construction presented in the previous section, the remaining task is to produce an algorithm which finds a suitable linear transform, $f$, that maximises the distance, $d$, between the decision boundaries produced by the left and right Kan extensions. As $f$ is a linear transform, it can be defined as $f(x) = w \cdot x$, where $w$ is a five-dimensional real number vector. Applying $f$ to every point of the normalised dataset, $d$ can be computed as the difference between the smallest positive value and the largest negative value attained by the data points under $f$. For the purposes of classification, the process of normalising the dataset and then transforming with $f$ can be understood as assigning a score to each point based on its signed distance from the hyperplane. This corresponds with the image of the point in $(\mathbb{R}, \leq)$ (Equation (12)):

$$\mathrm{score}(x) = w \cdot (x - \mu).$$
The unsupervised fitting algorithm (Algorithm 1), which this paper introduces, determines the value for each dimension of the normal vector by rotating the vector around the axes of a hyper-sphere centred at the constraining point. For each iteration, there are set values, the current value, and the unset values. For each loop, the fitting algorithm checks an integer number of uniformly spaced angles $\theta_a$, which are given by the following formula:

$$\theta_a = \frac{2 \pi a}{res}.$$

To determine the value which maximises the plane’s distance to the closest point, the algorithm checks integer values of $a$, where $0 \leq a < res$. The vector of values this induces can be represented by the notation $w(\theta_a)$. The set values remain unchanged, and only the current value and unset values are varied. At the end of the loop, the maximising value is assigned to the current value. The logic of conserving previously determined values, selecting the active dimension based on the angle, and modifying the following dimensions to preserve the unit magnitude of the vector is encoded in the piece-wise function of Equation (13).
Algorithm 1: Fitting C-SVM
Input: $data$, $res$
Output: $w$
1: $\mu \leftarrow$ mean of the $n$ points in $data$
2: $data \leftarrow data - \mu$
3: for $i \leftarrow 1$ to $m - 1$ do
4:  $\Theta \leftarrow (2 \pi a / res : a \in \{0, \ldots, res - 1\})$
5:  $W \leftarrow$ the candidate unit vectors induced by $\Theta$ (Equation (13))
6:  $S \leftarrow$ the scores of every point in $data$ under each candidate in $W$
7:  $P \leftarrow$ the smallest positive score in $S$ for each candidate
8:  $N \leftarrow$ the largest negative score in $S$ for each candidate
9:  $D \leftarrow P - N$
10: $\theta_i \leftarrow$ the angle in $\Theta$ maximising $D$
11: $w \leftarrow$ the unit vector induced by $\theta_1, \ldots, \theta_i$ (Equation (13))
12: end for
13: return $w$
Fitting each of the values sequentially, the fitting algorithm achieves a time complexity of $O(n \cdot m \cdot res)$, with $n$ being the number of data points, $m$ being the number of dimensions, and $res$ being the resolution of the angle search space.
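To make the procedure concrete, the following Python sketch implements the fitting loop described above. It is illustrative rather than definitive: the standard hyper-spherical parameterisation below stands in for the piece-wise function of Equation (13), whose exact form may differ, and the function names are this sketch's own:

```python
import numpy as np

def angles_to_vector(theta: np.ndarray) -> np.ndarray:
    """Standard hyper-spherical parameterisation of a unit vector."""
    m = theta.size + 1
    w = np.empty(m)
    sin_prod = 1.0
    for k in range(m - 1):
        w[k] = sin_prod * np.cos(theta[k])
        sin_prod *= np.sin(theta[k])
    w[m - 1] = sin_prod
    return w

def margin(X: np.ndarray, w: np.ndarray) -> float:
    """Distance d between the decision boundaries of the two Kan extensions."""
    scores = X @ w
    pos, neg = scores[scores > 0], scores[scores < 0]
    if pos.size == 0 or neg.size == 0:
        return -np.inf  # all points on one side: no valid separation
    return pos.min() - neg.max()

def fit_csvm(data: np.ndarray, res: int) -> np.ndarray:
    """Fit the hyperplane normal; the plane passes through the data mean."""
    X = data - data.mean(axis=0)       # centre so the hyperplane contains mu
    theta = np.zeros(data.shape[1] - 1)
    for k in range(theta.size):        # fit each angle sequentially
        candidates = 2.0 * np.pi * np.arange(res) / res
        theta[k] = max(candidates,
                       key=lambda a: margin(X, angles_to_vector(
                           np.concatenate([theta[:k], [a], theta[k + 1:]]))))
    return angles_to_vector(theta)
```

A call such as `w = fit_csvm(data, res=360)` then yields the hyperplane normal, with points classified by the sign of `(data - data.mean(axis=0)) @ w`.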