1. Introduction
Three-way proximity data are relationship data collected or measured between pairs of objects on multiple occasions, e.g., sources, settings, times, or experimental conditions. They extend standard two-way proximity data, where the same objects are the entities in relation, to a three-dimensional framework: the additional dimension is given by the occasion on which each single pairwise proximity (similarity or dissimilarity) was collected. Three-way proximity data are typically arranged as a set of square data matrices (one for each occasion), each containing the proximities between all pairs of objects, and can be either symmetric or asymmetric.
In the latter case, where the relationship between the objects A and B is different from that between B and A, the asymmetry denotes a certain disequilibrium (imbalance) in the pairwise relationship. As a result, not only the presence of such relationships but also their directionality and magnitude, both of which are significant and cannot be ignored, define the true pattern of the data.
This phenomenon occurs in a variety of disciplines, including psychology, sociology, marketing research, and the social, behavioral, and environmental sciences, among others. Asymmetric proximity data can stem from diverse sources, such as judgments concerning the similarity between stimuli (confusion data), interactions among social actors (social networks), patterns of mobility, transitions between employment sectors, brand switching behaviors, or commercial transactions and trades between countries.
When dealing with asymmetric data, it may be of interest to unveil common behaviors of exchange in order to identify groups of objects that are primarily either origins or destinations of the exchanges, and in the presence of multiple occasions, such common patterns can also be analyzed and compared across them. For instance, the international student mobility between countries over several years yields a three-way asymmetric proximity array, where, in each annual matrix, the rows and columns correspond to the origins and destinations of the mobile students, respectively. The identification of possible pathways of student mobility over time can be an important source of information for policy makers.
Specially designed multiway models and methods are required to analyze such data. Within this framework, clustering three-way data is a complex and challenging task since asymmetric proximity matrices may subsume different classifications of the objects due to the heterogeneity across occasions, and the asymmetry may contain some information relevant to clustering efforts.
When analyzing asymmetric data, the asymmetry is often ignored by symmetrizing the proximities by averaging the two different values for each pair of objects. Nevertheless, asymmetry can be substantial and prominent, and deserves attention because it contains important information about the real patterns in the data, in terms of both the direction and the magnitude of the relationships.
A review of the literature on cluster analysis reveals a notable disparity: methods for asymmetric proximity data have received less attention compared with the large number of models and methods that have been proposed for symmetric proximity data, and even less for three-way data.
Clustering methods for asymmetric data have been developed for a single data matrix mainly following two approaches (see [1,2,3] for extended reviews of clustering methods for asymmetric data).
In the first approach, two different classification structures are estimated, one for the rows and one for the columns of the asymmetric matrix, since rows and columns are assumed to refer to two different sets of objects. Within this approach, the majority of the proposed clustering algorithms estimate hierarchical trees and identify non-overlapping clusters, i.e., each object belongs to only one cluster [4,5,6,7]. A non-hierarchical clustering algorithm, called GENNCLUS, which provides either overlapping or non-overlapping clusters for asymmetric or symmetric data, has been proposed as a generalization of ADCLUS [8].
In the second approach, a single classification structure is estimated from both rows and columns of the asymmetric matrix because, in accordance with their true nature, they refer to the same set of objects. Within this approach, most of the methodologies extend the classical agglomerative hierarchical methods [9,10,11,12], while few proposals concern non-hierarchical methods. Some of the latter are essentially extensions of k-means clustering [13,14,15]. In the same vein but with a different modeling, more recent proposals [16,17] concern non-hierarchical clustering methods that simultaneously fit both the symmetric and the skew-symmetric parts of the data resulting from the algebraic decomposition of the asymmetric matrix (see Section 1.1 for the definition). Specific clustering models for skew-symmetric data have been proposed by Vicari [18] and Vicari and Di Nuzzo [19]. The first model can include external covariates for the objects, while the second relies on between-cluster effects modeled by Singular Value Decompositions that exploit the peculiar properties of skew-symmetric matrices.
Regarding three-way asymmetric proximities, several models have been proposed within the framework of Multidimensional Scaling (MDS) (see [3] for a review of MDS methods for multiway asymmetric data), but to our knowledge, only one clustering model has been proposed to analyze three-way asymmetric proximity data. The proposal of Chaturvedi and Carroll [20] generalizes the INDCLUS model [21] to the asymmetric case by identifying two different sets of (overlapping) clusters of the objects (for the rows and the columns of the data matrices, respectively) common to all occasions, while the three-way heterogeneity is accounted for by occasion-specific weights for the clusters.
We can observe that, since Generalized INDCLUS [20] aims at finding two different sets of (possibly overlapping) clusters of objects, the identification of origins and destinations of exchange patterns is rather complicated from an interpretative point of view: the clusters are only partially matched, which often leads to non-parsimonious solutions. Conversely, in many scenarios involving this type of data, the goal is to analyze the exchange between the same groups of objects serving as either origins or destinations, because they contain objects with common exchange behavior toward the other groups across occasions, both in magnitude and in direction.
Accordingly, given the lack of proposals for clustering three-way asymmetric proximities, our research objectives are (1) to identify the same groups of objects (for the rows and columns of the data matrices) that are primarily origins or destinations of the exchanges and, jointly, (2) to measure the extent to which these clusters differ across occasions.
To fill this research gap, we start from the clustering model proposed by Vicari [17], which addresses the first issue and accounts for between-cluster effects when only a single dissimilarity matrix is available (two-way case). In the present paper, we generalize the model [17] to the three-way case with the aim of also accounting for heterogeneity across occasions.
The model is based on the decomposition of each asymmetric matrix into the sum of its symmetric and skew-symmetric components, which are modeled jointly. The asymmetric dissimilarities are assumed to subsume two clustering structures common to all occasions: the first defines a standard partitioning of all objects that fits the symmetric component of the exchanges; the second one, which fits the imbalances, defines an incomplete partition of objects, some of which are allowed to remain unallocated to any cluster. In both clustering structures, objects within the same cluster share the same behavior with respect to exchanges that are directed to objects in different clusters so that “origin” and “destination” clusters are identified. Objects possibly unassigned to any cluster of the incomplete partition represent “nearly” symmetric objects, characterized by small imbalances. As a novel contribution, this paper accounts for heterogeneity across occasions by estimating occasion-specific sets of weights that can capture both the average magnitude and the direction of exchange between clusters. This makes it possible to analyze the role of the common clusters as either origin or destination, which may differ across occasions.
This paper is organized as follows. After an illustrative example to introduce the problem (Section 1.1), the model is formalized in a general framework in Section 2, and an appropriate algorithm is proposed in Section 3. Applications to the artificial data of Section 1.1 and to real mobility data are presented in Section 4 and Section 5, respectively, to illustrate the usefulness and effectiveness of the proposal, also in comparison with Generalized INDCLUS [20]. Finally, Section 6 summarizes the findings and concludes with directions for future developments.
1.1. Illustrative Example: Artificial Data
Before discussing the model in detail, an illustrative example of the heterogeneous three-way asymmetric data we are dealing with is presented.
Let us consider a three-way array of asymmetric dissimilarity data pertaining to pairwise exchanges between N objects measured at H = 3 occasions. The three matrices $\mathbf{D}_h$ (h = 1, 2, 3) in Figure 1, containing the exchanges of each of the three occasions, have been artificially generated from model (4), which will be fully formalized in Section 2, by assuming three clusters of objects.
Exchanges are generally asymmetric: for example, the exchange from a to e is different from the exchange from e to a in each occasion. Furthermore, the exchanges from a to all other objects are greater than the exchanges from all other objects to a in occasions 1 and 3, while the opposite is true for occasion 2.
In addition, using the Gower decomposition [22], each matrix $\mathbf{D}_h$ (h = 1, 2, 3) can be decomposed into its symmetric and skew-symmetric components $\mathbf{S}_h$ and $\mathbf{K}_h$, respectively, as shown in Figure 2.
Let us recall the Gower decomposition [22] of any square matrix $\mathbf{D}_h$ (h = 1, \dots, H), which can be uniquely decomposed into the sum of a symmetric matrix $\mathbf{S}_h$ and a skew-symmetric matrix $\mathbf{K}_h$, both of size (N × N) and orthogonal to each other (i.e., $\operatorname{trace}(\mathbf{S}_h^{\prime}\mathbf{K}_h) = 0$):

$$\mathbf{D}_h = \mathbf{S}_h + \mathbf{K}_h = \frac{\mathbf{D}_h + \mathbf{D}_h^{\prime}}{2} + \frac{\mathbf{D}_h - \mathbf{D}_h^{\prime}}{2}, \qquad h = 1, \dots, H, \quad (1)$$

as elementary linear algebra easily proves. The entry $s_{ijh} = (d_{ijh} + d_{jih})/2$ represents the average amount of the exchange between objects i and j at occasion h, while the entry $k_{ijh} = (d_{ijh} - d_{jih})/2$ represents the imbalance between i and j, i.e., the amount by which $d_{ijh}$ differs from the mean $s_{ijh}$. Thus, each element of the skew-symmetric matrix $\mathbf{K}_h$ is, by definition, such that $k_{ijh} = -k_{jih}$, and conveys information about the direction of the exchange.
According to this decomposition, every exchange between a pair of objects in matrix $\mathbf{D}_h$, for example, from the object a to the object e in the first occasion, can be decomposed into the sum of two entries: (1) the average amount of the exchange between a and e, in $\mathbf{S}_1$, and (2) the imbalance between the exchanges from a to e and from e to a, in $\mathbf{K}_1$, i.e., the amount by which the exchange from a to e differs from its mean.
Based on the Gower decomposition, it is also possible to measure the percentage of asymmetry of each matrix $\mathbf{D}_h$ as the squared ratio between the Frobenius norm of its skew-symmetric component $\mathbf{K}_h$ and the Frobenius norm of $\mathbf{D}_h$, i.e., $\|\mathbf{K}_h\|^2 / \|\mathbf{D}_h\|^2$, which is a measure in $[0, 1]$.
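As a minimal numerical sketch (in Python with NumPy, on hypothetical toy values rather than the matrices of Figure 1), the decomposition and the asymmetry measure can be computed as follows:

```python
import numpy as np

def gower_decomposition(D):
    """Split a square matrix D into its symmetric part S = (D + D') / 2
    and its skew-symmetric part K = (D - D') / 2, so that D = S + K."""
    S = (D + D.T) / 2.0
    K = (D - D.T) / 2.0
    return S, K

def asymmetry_percentage(D):
    """Squared ratio of Frobenius norms, ||K||^2 / ||D||^2, a value in [0, 1]."""
    _, K = gower_decomposition(D)
    return np.linalg.norm(K, "fro") ** 2 / np.linalg.norm(D, "fro") ** 2

# Toy asymmetric dissimilarity matrix (hypothetical values)
D = np.array([[0.0, 5.0, 9.0],
              [3.0, 0.0, 4.0],
              [1.0, 8.0, 0.0]])

S, K = gower_decomposition(D)
# By the orthogonality of S and K, ||D||^2 = ||S||^2 + ||K||^2,
# which is why the asymmetry measure lies in [0, 1].
```

Here each entry of S is the average of the two opposite exchanges, and each entry of K is the corresponding imbalance, with K[i, j] = -K[j, i].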
The percentage of asymmetry of each of the matrices in Figure 1 is not negligible, which confirms that the structure of the data is characterized by the direction and magnitude of the exchanges, both of which are relevant and cannot be ignored.
The underlying pattern in the data is evident from Figure 1. In each heatmap, different shades of blue represent different magnitudes of exchange between pairs of objects in different clusters, with darker shades representing higher magnitudes. With a view to searching for origin/destination clusters, we are interested in modeling the exchange between clusters, while the exchange within clusters is not of interest here; thus, the diagonal blocks are left white for better clarity. The heatmap of each data matrix shows three “blue” off-diagonal blocks, corresponding to the three clusters, each with a common pattern across matrices.
Furthermore, the same clustering structure can be identified in both the symmetric and skew-symmetric components $\mathbf{S}_h$ and $\mathbf{K}_h$ across all occasions (Figure 2).
By analyzing the symmetric and skew-symmetric components of each data matrix, it becomes easier to disclose the pattern of the data and determine which groups of objects serve primarily as the origin or destination of the exchanges.
Objects in the same cluster have a common behavior towards objects in different clusters, in terms of outgoing and incoming exchanges, but the magnitude and direction of the exchanges change across occasions. As an example, if we look at the first and third occasions, it is clear that the outgoing exchanges from all objects in one of the clusters directed to objects in the other two clusters are smaller than the incoming exchanges. This results in negative imbalances for the exchanges from all objects in that cluster to objects in the other two clusters, which can therefore be considered the origin cluster for these two occasions. Conversely, in the second occasion, the same cluster serves as a destination cluster, as its outgoing amounts towards the other two clusters are greater than its incoming ones, resulting in positive imbalances.
By inspecting all the symmetric and skew-symmetric components, we may also note that the objects b and c present a common behavior across occasions: they have almost symmetrical exchanges with the objects g and h and small imbalances with all other objects belonging to the other clusters.
In order to gain a better understanding of the underlying patterns and identify the common behaviors of the objects across the different occasions, the model formalized in Section 2 is fitted to the artificial data, and the results are analyzed in Section 4, together with the results from the Generalized INDCLUS model [20] for comparison.
2. The Model
Let us assume that $\mathbf{D}_h = [d_{ilh}]$ (h = 1, \dots, H) is a square asymmetric matrix where the element $d_{ilh}$ represents the pairwise dissimilarity between the objects i and l (i, l = 1, \dots, N) observed at the occasion h and is generally different from $d_{lih}$.
The model proposed here aims at clustering the N objects by accounting for both the symmetric and the skew-symmetric effects from the decomposition (1) of the observed asymmetries $\mathbf{D}_h$ (h = 1, \dots, H).
In particular, the data are assumed to subsume two clustering structures that are common to all occasions: the first one defines a standard partitioning of all objects fitting the average amount of the exchanges; the second one, which fits the imbalances, defines an “incomplete” partitioning of the objects, where some of them are allowed to remain unassigned.
Specifically, the clustering structure consists of two nested partitions into J clusters, a complete one and an incomplete one, such that
Every object belongs to one and only one non-empty cluster of the complete partition;
If object i belongs to cluster j of the complete partition, it can either belong to cluster j of the incomplete partition or remain unassigned to any cluster of the incomplete partition.
The first partition is referred to as a complete partition because every object must be assigned to some cluster, while the second is called an incomplete partition because some of the N objects are allowed to remain unassigned to any cluster. Note that the complete and the incomplete partitions are common to all occasions and linked together, the latter being constrained to be nested within the former.
The complete partition is uniquely identified by an (N × J) binary membership matrix $\mathbf{U} = [u_{ij}]$, with $\sum_{j=1}^{J} u_{ij} = 1$ for i = 1, \dots, N, where $u_{ij} = 1$ if object i belongs to cluster j, and $u_{ij} = 0$ otherwise.
The incomplete partition is identified by an (N × J) binary membership matrix $\mathbf{V} = [v_{ij}]$, with $v_{ij} \leq u_{ij}$ for i = 1, \dots, N and j = 1, \dots, J; i.e., any object i can either remain unassigned to any cluster or belong to cluster j of the incomplete partition only if it belongs to cluster j in the complete partition.
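To make the two nested structures concrete, the following sketch (Python, with hypothetical memberships, denoting the complete and incomplete membership matrices U and V) builds one admissible pair and checks the constraints described above:

```python
import numpy as np

# Hypothetical example with N = 5 objects and J = 2 clusters.
# Complete partition U: every row has exactly one 1.
U = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1],
              [0, 1]])

# Incomplete partition V, nested in U: each row either repeats the
# corresponding row of U or is all zeros (object left unassigned).
V = np.array([[1, 0],
              [0, 0],   # object 2 unassigned: (almost) symmetric exchanges
              [0, 1],
              [0, 1],
              [0, 0]])  # object 5 unassigned

def is_valid_pair(U, V):
    """Check the membership constraints of the two nested partitions."""
    complete = np.all(U.sum(axis=1) == 1) and np.all(U.sum(axis=0) >= 1)
    at_most_one = np.all(V.sum(axis=1) <= 1)  # rows of V: one cluster or none
    nested = np.all(V <= U)                   # v_ij <= u_ij element-wise
    return bool(complete and at_most_one and nested)
```

A row of V that places an object in a cluster different from its cluster in U violates the nesting constraint and is rejected by the check.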
Hereafter, $\mathbf{I}_N$ denotes the identity matrix of size N; $\mathbf{1}_{A \times B}$ and $\mathbf{1}_A$ denote the matrix of size (A × B) of all ones and the column vector with A ones, respectively; $\mathbf{E}_N = \mathbf{1}_{N \times N} - \mathbf{I}_N$ is the (N × N) matrix of ones except for the zeros on the main diagonal; and $\mathbf{D} = [\mathbf{D}_1, \dots, \mathbf{D}_H]$ is the (N × NH) augmented matrix obtained by collecting the H matrices of size (N × N) next to each other.
Let us consider the Gower decomposition (1) of each matrix $\mathbf{D}_h$ (h = 1, \dots, H) into the sum of its symmetric and skew-symmetric components $\mathbf{S}_h$ and $\mathbf{K}_h$, respectively. Both of these components can be modeled by defining two clustering structures that depend on the membership matrices $\mathbf{U}$ and $\mathbf{V}$, respectively, as introduced in Vicari [17] for a two-way asymmetric dissimilarity matrix.
Specifically, the symmetric component $\mathbf{S}_h$ and the skew-symmetric component $\mathbf{K}_h$ for the occasion h (h = 1, \dots, H) are modeled in (2) and (3) by the two clustering structures introduced in Vicari [16,18], which depend on the common complete and incomplete membership matrices $\mathbf{U}$ and $\mathbf{V}$, respectively, together with occasion-specific weight vectors of size J associated with the clusters of the complete and incomplete partition, respectively; the error terms represent the parts of $\mathbf{S}_h$ and $\mathbf{K}_h$ not accounted for by the model.
For identifiability reasons, each fitted skew-symmetric matrix is constrained to sum to zero (h = 1, \dots, H).
Models (2) and (3) can be combined and plugged into (1) to specify model (4), which accounts for the asymmetric dissimilarities between clusters at the occasion h, where $c_h$ is an additive constant term and the general error term represents the part of $\mathbf{D}_h$ not accounted for by the model.
Models (2) and (3) can be expressed in compact notation in terms of $\mathbf{S} = [\mathbf{S}_1, \dots, \mathbf{S}_H]$ and $\mathbf{K} = [\mathbf{K}_1, \dots, \mathbf{K}_H]$, which denote the (N × NH) augmented matrices obtained by collecting the H matrices $\mathbf{S}_h$ and $\mathbf{K}_h$ next to each other, respectively, as models (5) and (6), where ⊗ denotes the Kronecker product and the occasion-specific weight vectors enter through two diagonal matrices having the stacked weight vectors of all occasions as main diagonals.
Recall that, given any two matrices $\mathbf{X}$ and $\mathbf{Y}$ of sizes (m × n) and (p × q), respectively, the Kronecker product $\mathbf{X} \otimes \mathbf{Y}$ is the (mp × nq) block matrix whose (i, j)-th block is $x_{ij}\mathbf{Y}$ (i = 1, \dots, m; j = 1, \dots, n).
Finally, model (4) can be expressed in compact notation in terms of the augmented matrix $\mathbf{D} = [\mathbf{D}_1, \dots, \mathbf{D}_H]$ by combining models (5) and (6), as in model (7).
It is important to note that, in the model, in addition to a common clustering structure, occasion-specific weight vectors are assumed to account for the heterogeneity of the occasions. These weights allow for measuring the extent to which exchanges vary across occasions, providing quantifications of the exchanges between clusters at the occasion h in terms of magnitude and direction.
3. The Algorithm
In model (4), the complete and the incomplete membership matrices $\mathbf{U}$ and $\mathbf{V}$, the occasion-specific weight vectors, and the constants $c_h$ (h = 1, \dots, H) can be estimated by solving the least-squares fitting problem (8), subject to constraints (9)–(11).
Problem (8), subject to constraints (9)–(11), can be reformulated in compact form in terms of model (7) as problem (12), where the equivalence is due to the orthogonality of $\mathbf{S}$ and $\mathbf{K}$.
The equivalent constrained optimization problems (8) and (12) can be solved by using an Alternating Least-Squares (ALS) algorithm, which alternates the estimation of one set of parameters while keeping all the others fixed, as detailed below.
After an initialization step, in which starting values of all parameters satisfying the constraints are chosen, the algorithm alternates between two main steps.
In the first step, in order to ensure that the relative loss function (12) is non-increasing, the two membership matrices $\mathbf{U}$ and $\mathbf{V}$ are jointly updated row by row in N substeps by solving assignment problems for the different rows of $\mathbf{U}$ and $\mathbf{V}$ satisfying the constraints (9) and (10). Specifically, for a given row i, the nesting constraint implies that object i in matrix $\mathbf{V}$ can either be assigned to the same cluster j as in matrix $\mathbf{U}$ or remain unassigned. For the row i and given the remaining rows of $\mathbf{U}$ and $\mathbf{V}$, all 2J possible assignments of the object i are considered: either to the cluster j in both partitions or to the cluster j of the complete partition only (j = 1, \dots, J). For each of them, the occasion-specific weight vectors (h = 1, \dots, H) are also estimated as optimal solutions of constrained regression problems. Finally, by evaluating the relative loss function (12) for all potential assignments and selecting the one corresponding to the minimum loss value, the assignment of the object i in $\mathbf{U}$ and $\mathbf{V}$ is chosen. Note that the relative loss function cannot increase at each substep, since the whole space of feasible solutions for both $\mathbf{U}$ and $\mathbf{V}$ is explored for each object.
In the same step, the occasion-specific weight vectors are estimated as solutions of constrained regression problems for each possible choice of the different rows of $\mathbf{U}$ and $\mathbf{V}$, respectively.
In the second step, the constants $c_h$ are then estimated by successive residualizations of the three-way data matrix.
The two main steps are alternated and iterated until convergence. The relative loss function (12) does not increase at each step, and the algorithm stops when the loss decreases by less than a fixed, arbitrarily small positive threshold.
In order to increase the chance of finding the global minimum, the best solution over different random starting parameters is retained.
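The row-wise update can be illustrated with a small sketch (Python; this enumeration of the 2J admissible assignments per object is our illustration, not the paper's MATLAB implementation). For each object, every candidate pair of rows is scored with the current loss, and the best one is kept:

```python
import numpy as np

def candidate_rows(J):
    """Enumerate the 2J admissible (u_i, v_i) row pairs for one object:
    for each cluster j, assign the object to j in both partitions,
    or to j in the complete partition only (unassigned in V)."""
    cands = []
    for j in range(J):
        u = np.zeros(J, dtype=int)
        u[j] = 1
        cands.append((u, u.copy()))                 # in U and in V
        cands.append((u, np.zeros(J, dtype=int)))   # in U only
    return cands

def update_row(i, U, V, loss):
    """Greedy update of row i of U and V: try all 2J candidates and
    keep the pair that minimizes the supplied loss function."""
    best = None
    for u, v in candidate_rows(U.shape[1]):
        U[i], V[i] = u, v
        value = loss(U, V)
        if best is None or value < best[0]:
            best = (value, u.copy(), v.copy())
    U[i], V[i] = best[1], best[2]
    return best[0]
```

In the actual algorithm, the loss evaluation for each candidate also re-estimates the weight vectors by constrained regression; here `loss` is an arbitrary callable, so the sketch only shows the enumeration and selection logic that makes the loss non-increasing at each substep.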
Moreover, in order to estimate the weight vectors, model (7) is reformulated as regression problem (13) with respect to the unknown weight vectors, where:
- the response is the column vector of size $N^2H$ obtained by vectorizing the augmented matrix $\mathbf{D}$;
- the unknowns are the column vectors obtained by stacking the occasion-specific weight vectors of the complete and incomplete partitions, respectively;
- the design matrices are built from the membership matrices $\mathbf{U}$ and $\mathbf{V}$ through the Khatri–Rao product [23,24];
- the remaining terms are the vectorized constant term and the vectorized error term.
Recall that, given any two matrices $\mathbf{X}$ and $\mathbf{Y}$ with the same number J of columns, the Khatri–Rao product of $\mathbf{X}$ and $\mathbf{Y}$ is the column-wise Kronecker product, i.e., $\mathbf{X} \odot \mathbf{Y} = [\mathbf{x}_1 \otimes \mathbf{y}_1, \dots, \mathbf{x}_J \otimes \mathbf{y}_J]$, where $\mathbf{x}_j$ and $\mathbf{y}_j$ are the j-th (j = 1, \dots, J) columns of $\mathbf{X}$ and $\mathbf{Y}$, respectively.
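A minimal NumPy sketch of this column-wise Kronecker product (the function name is ours; SciPy also ships `scipy.linalg.khatri_rao`):

```python
import numpy as np

def khatri_rao(X, Y):
    """Column-wise Kronecker product of X (m x J) and Y (p x J):
    an (m*p x J) matrix whose j-th column is kron(x_j, y_j)."""
    if X.shape[1] != Y.shape[1]:
        raise ValueError("X and Y must have the same number of columns")
    return np.column_stack(
        [np.kron(X[:, j], Y[:, j]) for j in range(X.shape[1])]
    )

# Small hypothetical example
X = np.array([[1, 2],
              [3, 4]])
Y = np.array([[0, 1],
              [1, 0],
              [2, 5]])
Z = khatri_rao(X, Y)  # shape (2*3, 2) = (6, 2)
```

Each column of the result is the Kronecker product of the matching columns, which is exactly how the membership matrices enter the design matrix of regression problem (13).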
Therefore, by taking (13) into account, the relative loss function (12) becomes (14).
A detailed description of the steps of the algorithm, implemented in MATLAB R2023a, is given below.
Initial estimates of the parameters $\mathbf{U}$, $\mathbf{V}$, the weight vectors, and the constants are chosen randomly or in a rational way, but they are required to satisfy the set of constraints (9)–(11).
Given the current estimates of $\mathbf{U}$, $\mathbf{V}$, and the constants, the weight vectors are estimated by minimizing (14) subject to constraints (9)–(11).
The loss function (14) is minimized sequentially for the different rows of $\mathbf{U}$ and $\mathbf{V}$ by solving N assignment problems. Finally, the column sums of the estimated $\mathbf{U}$ are checked to avoid empty clusters.
Furthermore, the weight vectors are estimated in two nested substeps as follows. Given the current estimates, for every possible binary choice for the different rows of $\mathbf{U}$, the weight vector of the complete partition is estimated by solving the regression problem (15). Similarly, given the current estimates, for every admissible choice for the different rows of $\mathbf{V}$, the weight vector of the incomplete partition is obtained as the solution of the constrained regression problem (16), subject to constraints (11).
The pseudocode for updating the i-th row of $\mathbf{U}$ and $\mathbf{V}$, holding all other rows constant, and for updating the weight vectors is given in Algorithm 1. In the following, let 0 be the J-column vector of zeros; $\mathbf{e}_j$ be the J-dimensional vector with all entries equal to 0 except for the j-th one, which is 1; and $\mathbf{u}_i$ and $\mathbf{v}_i$ denote the vectors corresponding to the i-th rows of $\mathbf{U}$ and $\mathbf{V}$, respectively.

Algorithm 1
begin
  for i := 1 to N do
    for j := 1 to J do
      comment: solve the regression problem (15) corresponding to the possible assignment of object i to cluster j of the complete partition ($\mathbf{u}_i = \mathbf{e}_j$);
      for w := 1 to 2 do
        if w = 1 then $\mathbf{v}_i := \mathbf{e}_j$ else if w = 2 then $\mathbf{v}_i := \mathbf{0}$;
        comment: count the number of assigned objects in the incomplete partition;
        comment: solve the regression problem (16) corresponding to the possible assignment of object i to cluster j of the incomplete partition;
        for h := 1 to H do
          comment: constraint (11) is imposed via the Moore–Penrose inverse;
        end
      end
    end
    comment: assign the rows $\mathbf{u}_i$ and $\mathbf{v}_i$ corresponding to the minimum loss value;
  end
end
Given the current estimates of $\mathbf{U}$, $\mathbf{V}$, and the weight vectors, the estimate of the constants is given by (17), where the inverse always exists, since it involves a full-rank diagonal matrix of size H.
The relative loss function value is computed for the current values of all parameter estimates and, since it is bounded from below, it converges to a point that is expected to be at least a local minimum. When the loss function (14) has not decreased by more than a tolerance value, the process is assumed to have converged; otherwise, steps 1 and 2 are repeated in turn.
Meaning of the Parameter Estimates
In order to evaluate the meaning of the estimated weights of the complete partition, let us consider the membership matrix $\mathbf{U}$ that uniquely identifies the clusters of the complete partition.
From Equation (17), the estimated weight of cluster j in the h-th occasion is the average amount of the exchanges between objects in cluster j and objects in clusters different from j, corrected for the mean of the average amounts between all clusters different from j. Hence, large (small) values of the weight indicate clusters characterized by large (small) amounts of exchanges on average.
Similarly, let us consider the incomplete partition identified by the matrix $\mathbf{V}$. Then, from (18), imposing the constraint (11), the weight of cluster j in the occasion h is given by (19), in terms of the number of objects assigned to each cluster of the incomplete partition.
Note that this weight is the average imbalance from all objects in the cluster towards all objects in the other clusters, at the occasion h, corrected for the average imbalance originating from all the other clusters. Therefore, a positive (negative) weight qualifies the cluster as a “destination” (“origin”) cluster of the exchanges at the occasion h, and the objects belonging to such a cluster have a similar pattern in terms of exchanges directed towards the other clusters.
Given the occasion h and due to (3), all objects belonging to the same cluster have the same weight, and the weighted sum of the weights over all clusters is zero due to constraint (11). Constraint (11) is necessary to guarantee the identifiability of the model and the uniqueness of the solution because the weights are defined by difference. In a general and widely applicable framework, such a choice implies that the model defines a closed exchange system, in the sense that the total imbalance between clusters within each occasion is zero. Note that, in specific contexts, a value other than zero could be chosen to handle appropriate assumptions; only (19) in the algorithm would need to be modified accordingly.
Furthermore, the objects that remain unassigned to any cluster of the incomplete partition are actually objects that generate (almost) zero mutual imbalances and, equivalently, almost symmetric exchanges in all occasions.
The constant term $c_h$ is assumed to be added to all dissimilarities in the h-th occasion and, thus, only affects the average amounts of exchanges between objects (symmetric component $\mathbf{S}_h$). Therefore, the additive constant in (20) represents the baseline average dissimilarity, independent of any clustering and direction, and plays the same role as the intercept of a linear regression model.
Finally, it is worth noting that, for each occasion h, model (7) accounts for the between-cluster effects, while the exchanges within clusters are fitted only by the constant term $c_h$, which actually also represents the average exchange between objects within clusters.
4. Illustrative Example: Results
In order to detect the underlying clustering structure common to all occasions and identify “origin” and “destination” clusters, the model proposed here was fitted to the artificial data of Figure 1, together with the Generalized INDCLUS model [20] for comparison.
Let us briefly recall the generalization of the INDCLUS model [20] to fit three-way asymmetric similarity data. The model assumes that there exist two sets of J overlapping clusters (i.e., two coverings) of the same set of N objects in the rows and in the columns, common to all occasions, with binary memberships; i.e., each object is allowed to belong to more than one cluster or to none at all. In addition, a single set of weights is assumed for both sets of clusters, but different for each occasion. The model can be written as in (21), where the weights form a non-negative diagonal weight matrix of order J for the occasion h, $c_h$ is a real-valued additive constant for the occasion h, and the remaining term is the error.
Since model (21) fits similarities, the original artificial dissimilarities of Figure 1 were simply converted into similarities. The best solution in three clusters was retained over 100 random starts of the algorithm.
Neither of the two best coverings in three overlapping clusters obtained by the Generalized INDCLUS model (relative loss function equal to 0.0228) identifies the true generated partition of the objects. The common coverings for the sets of objects in the rows and in the columns consist of three row clusters and three column clusters, respectively: the first covering identifies row clusters of objects with similar outgoing exchanges to the column clusters, which contain groups of objects with similar incoming exchanges.
The estimated weights of the three clusters of both coverings in each occasion and the constants are reported in Table 1. The constants represent the average exchange between all objects within each occasion, regardless of the direction; the second occasion has the highest average level of exchange and the first occasion the lowest.
Table 1 displays the weights of the outgoing row clusters and incoming column clusters, where a large positive weight for any two corresponding clusters qualifies the row cluster as the origin of the exchanges directed to the destination column cluster. In addition, it can be noted that, in occasion 2, Generalized INDCLUS fails to identify the true behavior of some objects as destinations from one cluster and origins to another (Figure 1). Instead, those objects are estimated to have mutual exchanges with all other objects due to their membership in several column clusters.
Moreover, in the first and third occasions, two row clusters are origins toward their corresponding column clusters, one of which contains all objects.
We can observe that, in the special case when the clusters from Generalized INDCLUS form a partition (they do not overlap), model (21) reduces to estimating only the exchanges from any row cluster to its corresponding column cluster. Conversely, in the presence of overlapping clusters, objects common to different clusters also contribute to the estimation of the exchange between different clusters. That is why, in Table 1, due to the large cluster overlap, it is very cumbersome to extract the directions of the exchange from any row cluster to any other column cluster: it becomes necessary to look at the whole full-size (N × N) estimated matrices and analyze the estimated exchanges between pairs of objects.
For the sake of clarity, we observe that, in this well-structured artificial situation, the lowest weights are exactly zero, which is generally not the case.
Model (4) proposed here was fitted to the artificial data of Figure 1, and the best solution in three clusters was retained over 100 random starts of the algorithm.
The best resulting partition (relative loss equal to 0.0026) correctly identifies the complete partition from the symmetric components, consisting of three clusters, and the incomplete partition, whose clusters are nested in the corresponding clusters of the former. The objects b, c, g, and h remain unassigned to any cluster of the incomplete partition from the skew-symmetric components due to their almost zero mutual imbalances and almost symmetric exchanges at each occasion.
Table 2 reports the estimates of the weight vectors and and of the constant . In addition, for each occasion h, the estimated between-cluster components, both the symmetric one from the complete partition and the skew-symmetric one from the incomplete partition, are shown in Figure 3.
The estimated constants (Table 2) represent the baseline average exchange level at each occasion, regardless of any clustering, and take on different values across the occasions: the first occasion has the highest value and the third the lowest.
Given the occasion h, the weight of a cluster represents the average exchange between that cluster and every other cluster, in addition to the additive constant. Thus, from the occasion-specific weights in Table 2, we observe that, in occasion 1, subsumes the highest average exchange with the other two clusters, while in occasion 2 the highest average exchange concerns . In contrast, in occasion 3, cluster has the lowest weight, while has the highest. This is consistent with what can be seen from the data in Figure 2.
In addition, the different role of each cluster can be identified from the occasion-specific weights of the clusters of the incomplete partition, which allow origins and destinations to be qualified. Since each such weight represents the weighted average imbalance at occasion h from the corresponding cluster to all other clusters, in Table 2 the negative weight of in the second occasion qualifies it as an origin cluster, while it acts as a destination cluster in occasions 1 and 3, where its weights are both positive. The reverse is true for both clusters and .
As an example, we can see from Figure 3 that the estimated symmetric part of the exchange between the clusters and in occasion 1 is 34.6, which is the sum of the estimated weights of the two clusters plus the constant.
Moreover, the corresponding estimated imbalance between the two clusters is given by the difference of their estimated weights. Finally, the estimated between-cluster dissimilarity between them in the first occasion is the sum of these symmetric and skew-symmetric parts.
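The arithmetic behind these between-cluster estimates can be sketched as follows; the numbers are hypothetical stand-ins, not the estimates reported in Table 2, and the final sum reflects the additive symmetric-plus-skew structure described in the text:

```python
# Hypothetical occasion-specific estimates (NOT the values in Table 2)
c = 3.0                       # additive constant for the occasion
b = {"A": 5.0, "B": 7.0}      # symmetric weights of two clusters
v = {"A": -2.0, "B": 1.5}     # skew-symmetric weights of the same clusters

# Symmetric part of the estimated between-cluster exchange:
sym = b["A"] + b["B"] + c     # sum of the two cluster weights plus the constant
# Estimated imbalance (skew-symmetric part) between the clusters:
skew = v["A"] - v["B"]        # negative: cluster A acts as an origin toward B
# Estimated between-cluster dissimilarity:
total = sym + skew
```

The sign of the skew-symmetric part is what qualifies one cluster as an origin and the other as a destination at that occasion.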
Remark 1. From Figure 3, we can easily derive a summary of what emerges from the detailed results, since the full-size exchange matrices between objects (Figure 2) are synthesized into reduced-size exchange matrices between clusters with a very limited loss of information. This makes it possible to show the role played by each cluster of objects as an origin or destination of exchange, exactly reflecting the original data in Figure 1.

5. Application to Student Mobility Data
The data analyzed in this application have been taken from the OECD (Organization for Economic Co-operation and Development) Education Statistics Database, which collects data annually on international student exchanges in tertiary education.
The three-way data array consists of asymmetric matrices of exchanges of international student mobility observed for twenty countries (objects) from 2016 to 2020 (five occasions). The data report the number of international tertiary students who received their prior education in the origin country (the rows of each year's matrix) and are enrolled in the host country (the columns of each year's matrix). The countries are the twenty founding members of the OECD: Austria (AT), Belgium (BE), Canada (CA), Denmark (DK), France (FR), Germany (DE), Greece (EL), Iceland (IS), Ireland (IE), Italy (IT), Luxembourg (LU), Netherlands (NL), Norway (NO), Portugal (PT), Spain (ES), Sweden (SE), Switzerland (CH), Turkey (TR), United Kingdom (UK), and United States of America (US).
Between 2016 and 2020, the total population of mobile international students in the twenty founding members of the OECD experienced a remarkable growth of 16.8%. Specifically, the number jumped from 555,920 students in 2016 to 649,376 students in 2020, reflecting a significant increase in the number of tertiary students enrolled abroad. As highlighted in [
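The quoted growth rate follows directly from the two enrollment totals:

```python
# Totals of mobile international students in the twenty founding OECD members
start, end = 555_920, 649_376
growth_pct = 100 * (end - start) / start
# growth_pct rounds to 16.8 (percent)
```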
25], the COVID-19 pandemic had a very uneven impact on international student exchanges across countries during 2019–2020. Nevertheless, the percentage of tertiary students enrolled abroad increased in several of the twenty OECD countries and remained unchanged in many others. Overall, there were about three international students for every national student studying abroad in OECD countries in 2016–2020, but this ratio is equal to or greater than ten in the United Kingdom and the United States [
25,
26]. In contrast, Luxembourg is among the twenty OECD founder countries with the lowest ratio of international students to national students abroad [
25].
The aim here is to explore both the magnitude and the direction of the international mobility of the tertiary students to identify clusters of countries that are the main origins or destinations of the student exchanges across 5 years (2016–2020).
First, the data were transformed by calculating for each country the percentage share of outgoing mobile students moving to each of the other countries for tertiary education.
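This transformation is a row-wise normalization of the mobility counts; a generic sketch is given below (the count matrix is illustrative, not the OECD data):

```python
import numpy as np

def outgoing_shares(X):
    """Convert mobility counts X (rows = origin, columns = destination,
    zero diagonal) into row percentages: entry (i, j) is the share of
    country i's outgoing students enrolled in country j."""
    totals = X.sum(axis=1, keepdims=True)   # total outgoing students per origin
    return 100.0 * X / totals

counts = np.array([[0.0, 30.0, 70.0],
                   [20.0, 0.0, 80.0],
                   [50.0, 50.0, 0.0]])
P = outgoing_shares(counts)
# Each row of P sums to 100
```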
5.1. Mobility Data: Results from Generalized INDCLUS
The Generalized INDCLUS model (21) was first fitted directly to the original similarity data.
The algorithm was run by varying the number of clusters J from 2 to 7, and the best solution over 100 runs from different random starting partitions was retained to avoid falling into poor local minima.
The choice of the optimal number of clusters was determined by looking at the decrease in the relative loss function as J increases. From the scree plot (Figure 4a), the partition in clusters was chosen as the best solution for Generalized INDCLUS.
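The multi-start strategy and scree-based choice of J can be sketched generically; `fit_once` below is a hypothetical stand-in for a single run of the actual fitting algorithm, not the authors' implementation:

```python
import random

def best_of_runs(fit_once, data, J, n_runs=100, seed=0):
    """Retain the lowest-loss solution over n_runs random starting
    partitions, reducing the risk of keeping a poor local minimum."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_runs):
        sol = fit_once(data, J, rng)   # one run from a random start
        if best is None or sol["loss"] < best["loss"]:
            best = sol
    return best

# Scree inspection: collect the best relative loss for J = 2, ..., 7 and
# look for the elbow, e.g.
# losses = {J: best_of_runs(fit_once, data, J)["loss"] for J in range(2, 8)}
```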
Table 3 shows the row and column clusters, while the occasion-specific weights are reported in Table 4.
All countries belong to at least one cluster of the row covering, but while Iceland and the United Kingdom belong to only one cluster ( and , respectively), Austria, Canada, Ireland, Switzerland, and the United States belong to three clusters. The strong overlap of the row clusters indicates a very heterogeneous situation of student mobility with, in principle, many countries from which students leave.
On the other hand, only nine countries belong to at least one column cluster, while the remaining countries are not assigned to any cluster; i.e., they correspond to zero row profiles in the membership matrix . This reveals a greater concentration of incoming flows in a few countries than that of outgoing flows from the countries of origin.
The cluster weights of the two coverings (Table 4) are very similar over time, indicating that the mobility pattern is quite stable across the years considered.
Every year, student mobility is directed from almost all European countries ( and ) to two main European destinations (Germany and UK), as well as to the US from Canada and Northern Europe.
From the solution, we can say, for example, that Norway and Sweden, together in cluster and separately in clusters and , respectively, have outgoing flows mainly to Denmark, the UK, and the US. Swedish students also move to France and the Netherlands, while Norwegian students move to Germany. Swedish and Norwegian mobility to all other countries turns out to be reciprocal to the same extent.
5.2. Mobility Data: Results from the Proposed Model
Since our proposed model is designed for dissimilarity data, the proportions of outgoing students in each year were converted to pairwise dissimilarities by taking their complements to 100.
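The complement transformation can be sketched as follows; the diagonal handling is an assumption for illustration, since self-exchanges are not modeled:

```python
import numpy as np

def shares_to_dissimilarities(P):
    """Turn percentage shares (higher = stronger exchange) into
    dissimilarities by taking complements to 100."""
    D = 100.0 - P
    np.fill_diagonal(D, 0.0)   # assumption: self-dissimilarity kept at zero
    return D

P = np.array([[0.0, 30.0, 70.0],
              [20.0, 0.0, 80.0],
              [50.0, 50.0, 0.0]])
D = shares_to_dissimilarities(P)
# The largest share (80) becomes the smallest off-diagonal dissimilarity (20)
```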
As before, the algorithm was run with J varying from 2 to 7, and the best solution over 100 runs from different random starting partitions was retained. The scree plot of the loss values in Figure 4b shows an elbow at the partition with clusters, which was selected for the analysis.
The complete partition and the incomplete partition, estimated from the symmetric and skew-symmetric components of the exchanges, respectively, are reported in Table 5, where countries not assigned to the corresponding clusters of the incomplete partition are indicated in italics.
The six clusters are shown in Figure 5, where different colors indicate different clusters and the unassigned countries of the incomplete partition are shown in light colors on the color scale.
The clusters , , and coincide with , , and , respectively, while six countries (Austria, Belgium, Denmark, France, Netherlands, and Spain) do not belong to any cluster of the incomplete partition because their mutual mobility is basically symmetrical.
The two nested partitions highlight the strength of proximity factors, such as language, historical ties, geographical distance, bilateral relationships, and political framework conditions (e.g., the European Higher Education Area) as key determinants for international student mobility. As an example, the Scandinavian countries in cluster share the same patterns of international student mobility directed to the other OECD countries.
Austria and Belgium, which belong to different clusters ( and , respectively) but remain unassigned in the incomplete partition, show small mutual mobility exchanges over time (0.7195% in 2016 and 0.8485% in 2020) with minimal imbalances (0.3077% in 2016 and 0.5077% in 2020).
The constants increase slightly over the years (Table 6 and Figure 6). Since the model has been fitted to the complements to 100 of the shares of outgoing students in each year, this means that the baseline average mutual exchange between countries shows a slight decrease over the years. This also implies that the average mutual mobility within clusters tends to decrease slightly over time, as the baseline fits the within-cluster mobility in each year.
In addition to the baseline exchange, the estimated average annual mobility between clusters of countries depends on the weights (Table 6). The average mobility remained relatively stable over time, with very small increases (i.e., very small decreases in the weights, taking into account the transformation of the percentages into dissimilarities) in almost all clusters until 2019. Fluctuations can be observed for some clusters: in 2019, a slight increase in the average mobility was observed for the United Kingdom, in contrast with a slight decrease for the cluster with Germany and the United States. The situation is reversed in 2020, probably due to the impact of Brexit.
Each year, the ranking of the clusters based on the increasing weights consistently places the United Kingdom first, with the highest average share of reciprocal mobility (regardless of direction), followed by Germany and the United States, while the Scandinavian countries have the lowest, as can be seen in Figure 7.
As for the directions of the mobility, the analysis of the cluster weights of the incomplete partition (Table 7) reveals a consistent pattern throughout the 2016–2020 period: the mobility originates from the countries in four clusters, whose weights are always negative over time, and is mainly directed to Germany and the United States, as well as to the United Kingdom.
In all years, the cluster ranking based on the increasing weights places Greece, Ireland, and Turkey, followed by Luxembourg and Portugal, as the main countries of origin of the international students. As can be seen from the direction of the arrows in Figure 8, the cluster that includes Canada, Italy, and Switzerland is an origin cluster of mobile students toward and and, to a greater extent, a destination cluster from , , and . Similarly, the Scandinavian students move to Germany, the US, and the UK, as well as to , while the Scandinavian countries have incoming mobility from . This differs in part from the results of Generalized INDCLUS, which does not capture the imbalance of the international mobility from Greece, Ireland, and Turkey directed to the Scandinavian countries.
The countries in the two clusters and have comparable average mobility over time (the green and red lines overlap in Figure 7) but different levels of imbalance: in fact, the student outflows from Greece, Ireland, and Turkey are always larger than those from Luxembourg and Portugal, as can be seen in Figure 8.
English is the most widely spoken language in the globalized world, with one in four people worldwide using it [27]. Not surprisingly, English-speaking countries are always the most attractive study destinations overall: the UK is the top destination in Europe, while the United States and Germany are destinations for international students from Europe and Canada and, in turn, are sources of student mobility to the United Kingdom. Among the top three destinations, Germany is the major recipient country in the European Union.
A few countries (clusters and ) are net “importers” of students; that is, they have more students coming in to study than those leaving to study abroad. In contrast, some clusters of countries, such as and , are net “exporters” of students.
6. Results and Discussion
In a complex situation where the available data represent asymmetric proximities between pairs of objects measured or observed on different occasions, the proposed model aims to unveil a common clustering structure while accounting for the systematic differences between occasions.
The model proposed here proved effective in identifying clusters of objects that share a common pattern of exchange across different occasions, even when their roles as origins or destinations differ. The artificial data and the real application have shown, as the main results, (1) the capability of the model to identify clusters of objects with similar exchange behavior in magnitude and direction (origin/destination clusters) and (2) its flexibility in capturing possible differences in the directions of such cluster exchanges across occasions, compared with the Generalized INDCLUS model.
Actually, the goal of Generalized INDCLUS is not to summarize the average exchanges of each cluster and, compared with our model, it provides much more complicated solutions from which to derive the behavior of each cluster, because of the overlap between clusters, as can also be seen from the simple and small illustrative example. We can observe that, in the special case where the clusters from Generalized INDCLUS form a partition (they do not overlap), as in our model, it actually accounts for the variability within clusters and is not able to capture the exchanges between clusters, which is the very goal of our proposal. However, as shown in Section 4 and Section 5, we found that, in the general case, the clusters resulting from Generalized INDCLUS have (often many) objects in common, which, on the one hand, guarantees flexibility and the possibility of estimating the exchange between pairs of objects belonging to different clusters. On the other hand, this comes at the expense of ease of interpretation and synthesis, since the behavior of objects belonging to the same cluster is not the same regardless of the destination cluster.
Thus, given the scarcity of methods for clustering three-way asymmetric proximity data, our proposal is effective, more parsimonious, and promising when the aim is to obtain a synthesis of the average exchange behavior between clusters across occasions (which could be useful for policy makers, for example). If, instead, the estimation of every pairwise exchange is the main concern, Generalized INDCLUS may be more flexible in general due to the possible overlap of clusters.
Further developments may consider a fuzzy clustering, where, instead of a crisp membership of an object to each cluster (a value in {0, 1}), a membership degree in [0, 1] is allowed. In addition, the inclusion of possible covariates in the model, if available, could also be considered to better investigate the determinants of the exchanges.