In this section, the Table Invitation Prior (TIP) is presented in the context of a Gibbs sampler in iteration . An analogy is now provided to illustrate the prior’s mechanics. Suppose that n subjects (i.e., vectors, matrices, tensors, etc.) are sitting in a restaurant with tables (clusters). A randomly selected subject with index r is chosen and then the subjects that are most similar to the subject with index r are “invited” to a new table (cluster) (in this paper, all the subjects accept the invitation with probability 1). The posterior probability of every subject belonging to a table (cluster) is computed for tables (clusters) and, using the probabilities, the subjects are randomly assigned to a table (i.e., sample the posterior cluster assignment for every subject). The variable t is incremented by 1 and the this process continues; here t is the number of times the above process has occurred so far.
A more formal description of the Table Invitation Prior now follows. For the iteration
t in a Gibbs sampler, let the random variable
r be a randomly selected subject index (e.g., a randomly selected index corresponding to an individual gene expression) from a discrete uniform distribution
so that
. Suppose a random subject
is selected (i.e.,
can be a vector, matrix, higher-order tensor, etc.). The set of similarity values between subject
, itself, and the other
subjects is
where
is the similarity between the
rth subject and the
ith subject. Let the
jth ordered similarity value in the set
be
for
and let
The set of indices of the
subjects that are most similar to subject
is given by
where
is a hyperparameter. The estimation of hyperparameter
proceeds in the following manner. First, recall that
r is a randomly selected subject index so that
. The pairwise distances with respect to subject
r are extracted and the distance from subject
r to itself is removed:
The distances are then sorted in ascending order:
Next, a univariate multiple change point detection algorithm is applied to the sorted distances. The change point detection algorithm used in this paper is binary segmentation from the changepoint library in R [
12]. The binary segmentation function in R takes a hyperparameter, denoted by
Q, that is the maximum number of changepoints (
13). In this paper, the value is set to
since the changepoint library will throw an error if
; also this allows the change point method to have the maximum amount of flexibility in detecting changes in the subject distances. Let the set of change points be given by
then the number of subjects that are similar to the subject with index
r is taken to follow a Poisson distribution:
Consequently, the set
is given by:
Let the vector containing the cluster assignments be denoted by
so that the
ith element contains the cluster assignment for the
ith subject and
where
is the total number of clusters after posterior sampling in iteration
t. The Table Invitation Prior is based on selecting a random subject index
r and forming a new cluster with
subjects that are most similar to subject
. That is, the new cluster is formed using the subjects whose indices are in the set
. Consider a modified cluster vector
then the Table Invitation Prior (TIP) is given by