1. Introduction
Object retrieval is an important issue in computer vision and pattern recognition and is widely used in object recognition [1,2,3], tracking [4,5], bioinformatics [6], medical imaging [7], etc. In general, given one or more queries, the most similar objects in a dataset are obtained based on a pairwise measure, and the retrieval results are presented in the form of ranked similarities or distances between the queries and the other objects. Conventionally, the goal of retrieval is achieved through pairwise comparisons alone, with the basic premise that the more similar two elements are, the smaller the measured distance. Unfortunately, this idea suffers from a crucial limitation: it ignores the structure of the underlying data manifold, so objects from the same category as the query element cannot always be retrieved correctly [8,9].
To address this problem, a feasible idea is to analyze the manifold structure by considering all the distances between each pair of elements in the database. In the past two decades, considerable effort has been devoted to learning contextual similarities, and various methods have been developed to capture the geodesic paths on the manifold, either explicitly or implicitly [10,11,12,13]. Among these studies, a widely accepted approach is to diffuse the baseline similarity on a graph whose vertices represent the data points and whose edges are weighted by pairwise similarity; the contextual similarity can then be learned in an iterative manner. In the retrieval domain, this procedure is called the diffusion process [14,15]. Recently, a standard framework named the regularized diffusion process (RDP) was proposed in [16], which consists of two constraints: the smoothness constraint (SC) and the fitting constraint (FC). From the perspective of this framework, many previous methods can be considered variants of the RDP, and the contextual (dis)similarities are obtained by solving an unconstrained quadratic optimization problem.
Since the RDP was proposed, many improvements have been made to this framework from different perspectives [17,18,19]. With the tensor product graph as a replacement for the original graph, it has been verified that the high-order information of the graph greatly improves retrieval results [9,20]. However, several issues still deserve attention: (1) how to construct a triplet-based relationship with high-order information within the RDP framework; (2) how to introduce "one-to-many" relationships that contain high-order information into the RDP; and (3) how to solve the resulting optimization problems to obtain closed-form or iterative solutions. In light of these issues, a novel variant of the RDP is proposed in this paper, named the triple-cosine hybrid regularized diffusion process (TH-RDP). The proposed model differs from previous works in two aspects.
Firstly, a novel smoothness constraint named the triple-cosine smoothness constraint is proposed, which takes the high-order information of the original graph into account. We set up a group with three elements: the queries viewed as a whole and two other data points. A coordinate system is built based on this group, and the triple-cosine smoothness constraint is generated using the cosine of the angle between the vectors in this coordinate system. A hybrid fitting constraint is also introduced into the model, which consists of two types of terms: the squared $\ell_2$ norm and the $\ell_1$ norm. These two terms and the corresponding predefined values are adaptively determined by whether there exists at least one edge between the queries and the data points in the graph. Both the closed-form solution and the iterative solution are then derived for the proposed model.
Secondly, the proposed model can be used to describe "one-to-many" relationships, whereas many other variants can only learn "one-to-one" contextual (dis)similarities. This means that our model can be applied to problems with multiple queries, whereas many other models cannot. By taking advantage of "one-to-many" similarities, we further present an iterative re-ranking process based on the proposed model. In this iterative re-ranking process, the relevant objects along the paths are gradually searched from near to far, and the data manifold structure can be explicitly captured.
The rest of this paper is organized as follows. In the next section, we present the related works. In Section 3, the framework of the regularized diffusion process is briefly reviewed. The details of our method are provided and analyzed in Section 4. In Section 5, experiments on several well-known databases are carried out to illustrate the effectiveness of our method. Finally, we summarize the conclusions and future work.
2. Related Works
Compared with object-matching problems, which have been studied for a long time [21,22], contextual similarities have only attracted significant attention in recent years. The PageRank (PR) system of Google [23], derived from the random walk model, is one of the most well-known and successful retrieval methods and is used to rank webpages according to the interests of visitors. Using modified random walk models, several methods have been further proposed based on the idea of PageRank, e.g., global PageRank and personalized PageRank. Zhou et al. [24] presented a ranking method that considers the data manifold.
Label propagation (LP) was introduced by Bai et al. [25] for learning contextual similarities, and the contextual information on the manifold was utilized to improve retrieval results. This strategy originates from the field of semi-supervised learning and was first employed to solve the problem of image retrieval in [26]. With the label of the query object fixed, the information from the query element propagates to the unlabeled elements iteratively. Without explicitly selecting the shortest paths, LP suffers from drawbacks such as redundant contextual information and noisy elements. For this reason, Wang et al. [27] developed the shortest path propagation (SSP) method, which captures relevant objects on the shortest paths explicitly. A co-transduction approach for retrieval was proposed by Bai et al. [28], and retrieval performance was significantly improved by fusing different distance measures through a semi-supervised learning framework.
To improve retrieval scores, a method based on the locally constrained diffusion process (LCDP) on a kNN graph was proposed in [29]. Kontschieder et al. [30] used a modified mutual kNN graph to process pairwise similarities and further described an efficient clustering method based on the proposed approach. A meta-shape similarity focusing on local graph structures was proposed by Egozi et al. [31]. In this approach, each object is represented by a meta-descriptor obtained by building the kNN similarity graph, and the meta-similarity between two objects is efficiently computed as the norm of the difference between their meta-descriptors. Instead of propagating the similarity information on the original graph, Yang et al. [9] introduced the tensor product of the original graph, named the tensor product graph (TPG), and demonstrated that the proposed framework can be applied not only to shape recognition but also to image retrieval and image segmentation. Donoser and Bischof [14] revisited diffusion processes on affinity graphs and introduced a generic framework for diffusion processes, within which the approaches in this field can be summarized as specific instances.
Recently, a uniform framework named the regularized diffusion process was proposed in [16,20], and the essence of the TPG was explained theoretically. In [19], another variant, named the hybrid regularization diffusion process, was presented, revealing that the proposed model is closely related to the generalized mean first-passage time. In [32], the graph construction step was treated as a regularized function estimation problem, and an alternating diffusion process was proposed to learn the graph and the unknown labels alternately. Using diffusion-based affinity learning and the absorbing Markov chain (AMC), a salient object segmentation framework was proposed in [33], which leverages an iterative diffusion process with the same computational complexity as diffusion on the original graph. In [34], a novel manifold learning algorithm, named Rank Flow Embedding (RFE), was proposed based on recently developed manifold learning approaches, and the obtained context-sensitive embeddings were refined following a rank-based processing flow. In [35], with the contextual information encoded in the unlabeled data points, a rank-based model was presented for the task of weakly supervised classification.
3. Framework of Regularized Diffusion Process
Given $S=\{x_1,x_2,\ldots,x_n\}$ as the collection of all samples, we have $W=[w_{ij}]_{n\times n}$ and $w_{ij}=w_{ji}$, where $w_{ij}$ represents the connection weight between $x_i$ and $x_j$. One can find that $W$ is a symmetric matrix. With all the elements in $S$ as vertices, $w_{ij}$ can be treated as the weight of the edge between $x_i$ and $x_j$, and $w_{ij}=0$ means that there is no edge connection between $x_i$ and $x_j$. Then, an undirected graph $G$ can be constructed with $S$ and $W$. Moreover, unless otherwise noted, we assume that $G$ is a strongly connected graph. A diagonal matrix $D$ can be further defined, whose diagonal elements are $d_{ii}=\sum_{j=1}^{n}w_{ij}$. In the following, $I$ represents the identity matrix, and $\mathbf{1}_k$ denotes a $k$-dimensional vector with each element being 1. For convenience, an abbreviated notation is used in place of the full notation where no ambiguity arises.
Let $f_{ij}$ denote the contextual similarity between $x_i$ and $x_j$; the vector of contextual similarities associated with $x_i$ can be expressed as $f_i=[f_{i1},f_{i2},\ldots,f_{in}]^{T}$. The matrix $F$ can be generated with the $n$ vectors as $F=[f_1,f_2,\ldots,f_n]$. The objective function of the regularized diffusion process derived from $G$ can be defined as follows:
$$Q(F)=\mathcal{S}(F)+\mu\,\mathcal{C}(F), \qquad (1)$$
where $\mathcal{S}(F)$ is a "smoothing constraint" (SC), $\mathcal{C}(F)$ is a "fitting constraint" (FC), and $\mu>0$ is used to balance $\mathcal{S}(F)$ and $\mathcal{C}(F)$. It is worth noting that $f_{ij}$ can denote not only a contextual similarity but also a contextual dissimilarity, which is determined by the predefined values of the fitting constraint.
In Equation (1), the smoothing constraint is used to limit the relationships between different contextual similarities, and the fitting constraint is used to describe the relationship between each contextual similarity and a predetermined value. Considering the impacts of the smoothing constraint and the fitting constraint, solving the optimization problem amounts to implicitly capturing the manifold structure of the sample space.
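To make the framework concrete, the following minimal sketch shows how a contextual similarity vector could be obtained under one common instantiation of Equation (1), in which the smoothness term is built from a graph Laplacian and the fitting term is a squared penalty toward predefined values. The function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def regularized_diffusion(W, y, mu=0.1):
    """Illustrative RDP-style solver: minimize f^T L f + mu * ||f - y||^2.

    W  : (n, n) symmetric affinity matrix of the graph
    y  : (n,) vector of predefined (target) values for the fitting term
    mu : balance between the smoothness and fitting constraints
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian
    # Setting the gradient 2*L*f + 2*mu*(f - y) to zero gives a linear system.
    return np.linalg.solve(L + mu * np.eye(n), mu * y)

# Toy usage: four samples, the first one is the query.
W = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.8, 0.1],
              [0.1, 0.8, 0.0, 0.7],
              [0.0, 0.1, 0.7, 0.0]])
y = np.array([1.0, 0.0, 0.0, 0.0])          # BFC-style targets
print(regularized_diffusion(W, y))          # contextual similarities to the query
```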
To achieve desirable retrieval performance, various types of smoothness constraints and fitting constraints have been investigated in recent decades. For the smoothness constraint, the "dual smoothness constraint" (DSC) and the "quaternary smoothness constraint" (QSC) are two typical forms. For the fitting constraint, two typical forms are the "binary fitting constraint" (BFC) and the "weighted fitting constraint" (WFC). Different types of smoothness constraints and fitting constraints are reviewed in Section 3.1 and Section 3.2, respectively.
3.1. Smoothness Constraint
The dual smoothness constraint $\mathcal{S}_{\mathrm{DSC}}(F)$ can be obtained as follows:
$$\mathcal{S}_{\mathrm{DSC}}(F)=\frac{1}{2}\sum_{k=1}^{n}\sum_{i,j=1}^{n}w_{ij}\left(f_{ki}-f_{kj}\right)^{2}=\operatorname{tr}\!\left(F^{T}LF\right), \qquad (2)$$
where $L=D-W$ is the Laplace matrix obtained from $W$. One can find that the connection weight $w_{ij}$ between two samples is used to restrict the difference between the contextual similarities $f_{ki}$ and $f_{kj}$. This means that the larger $w_{ij}$ is, the closer the contextual similarities $f_{ki}$ and $f_{kj}$ are to each other. Considering the normalization, the normalized dual smoothness constraint can be obtained as follows:
$$\tilde{\mathcal{S}}_{\mathrm{DSC}}(F)=\operatorname{tr}\!\left(F^{T}\tilde{L}F\right), \qquad (3)$$
where $\tilde{L}=I-D^{-1/2}WD^{-1/2}$ is the Laplace matrix generated by the normalized matrix $D^{-1/2}WD^{-1/2}$. From Equations (2) and (3), it is easy to find that the relationship between $f_{ki}$ and $f_{kj}$ is highly dependent on $w_{ij}$.
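As a small numerical illustration of the DSC, the snippet below evaluates the constraint with both the unnormalized and the symmetrically normalized Laplacian; it is a sketch under the assumption that the constraint takes the quadratic form $f^{T}Lf$ given in Equation (2), and the variable names are ours.

```python
import numpy as np

def dsc(W, f, normalized=False):
    """Evaluate a DSC-style smoothness term f^T L f for a similarity vector f."""
    d = W.sum(axis=1)
    if normalized:
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt   # normalized Laplacian
    else:
        L = np.diag(d) - W                                 # L = D - W
    return float(f @ L @ f)

W = np.array([[0.0, 0.9, 0.2],
              [0.9, 0.0, 0.8],
              [0.2, 0.8, 0.0]])
f_smooth = np.array([1.0, 0.9, 0.8])   # similar values on strongly connected nodes
f_rough = np.array([1.0, 0.0, 1.0])    # large jumps across heavily weighted edges
print(dsc(W, f_smooth), dsc(W, f_rough))
print(dsc(W, f_smooth, normalized=True), dsc(W, f_rough, normalized=True))
```

As expected, a vector that changes sharply across heavily weighted edges receives a much larger penalty than a vector that varies smoothly along them.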
The quaternary smoothness constraint $\mathcal{S}_{\mathrm{QSC}}(F)$ can be expressed as follows:
$$\mathcal{S}_{\mathrm{QSC}}(F)=\frac{1}{2}\sum_{i,j,k,l=1}^{n}w_{ij}w_{kl}\left(f_{ik}-f_{jl}\right)^{2}=\operatorname{vec}(F)^{T}L_{\otimes}\operatorname{vec}(F), \qquad (4)$$
where $\operatorname{vec}(F)$ is obtained by stacking the columns of $F$ one after the other into a column vector, $L_{\otimes}$ is the Laplace matrix obtained from $W_{\otimes}$, $W_{\otimes}=W\otimes W$, and $\otimes$ denotes the tensor product. According to Equation (4), one can find that if $w_{ij}$ and $w_{kl}$ are both large, the learned $f_{ik}$ and $f_{jl}$ should be similar. Compared with the DSC, the QSC provides a more relaxed condition, which largely avoids the influence of noise edges in the graph. After normalization, the normalized quaternary smoothness constraint can be obtained as follows:
$$\tilde{\mathcal{S}}_{\mathrm{QSC}}(F)=\operatorname{vec}(F)^{T}\tilde{L}_{\otimes}\operatorname{vec}(F), \qquad (5)$$
where $\tilde{L}_{\otimes}$ is the Laplace matrix derived from the normalized tensor product matrix.
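The QSC operates on the tensor product graph. The sketch below is our own illustration, using the standard Kronecker-product construction assumed in Equation (4) rather than code from the paper; note that the product graph has $n^{2}$ vertices, so this direct construction is only practical for small toy examples.

```python
import numpy as np

def qsc(W, F):
    """Illustrative QSC-style term vec(F)^T L_x vec(F) on the tensor product graph."""
    W_tp = np.kron(W, W)                     # tensor (Kronecker) product graph
    L_tp = np.diag(W_tp.sum(axis=1)) - W_tp  # Laplacian of the product graph
    vec_F = F.flatten(order="F")             # stack the columns of F
    return float(vec_F @ L_tp @ vec_F)

W = np.array([[0.0, 0.9, 0.2],
              [0.9, 0.0, 0.8],
              [0.2, 0.8, 0.0]])
F = np.eye(3)                                # e.g., a BFC-style initialization of F
print(qsc(W, F))
```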
3.2. Fitting Constraint
For the binary fitting constraint, the predefined similarity between an element $x_i$ and itself is 1, while the similarities with all other elements are 0. Then, the binary fitting constraint $\mathcal{C}_{\mathrm{BFC}}(F)$ can be constructed as follows:
$$\mathcal{C}_{\mathrm{BFC}}(F)=\left\|F-Y\right\|_{F}^{2}, \qquad (6)$$
where $Y=[y_{ij}]_{n\times n}$, $y_{ii}=1$ for $i=1,2,\ldots,n$, and $y_{ij}=0$ for $i\neq j$. It can be seen that there are only two predefined values, which cannot accurately reflect the different relationships between samples.
For the weighted fitting constraint, the connection weights in $W$ are selected as the predefined values, and the constraint can be obtained based on the squared norm of the difference between $F$ and these predefined values (Equation (7)). After normalization, the normalized weighted fitting constraint can be obtained with the normalized connection weights as the predefined values (Equation (8)). As can be seen, the predefined values in Equations (7) and (8) are much more reasonable than those in the BFC; therefore, the underlying manifold structure can be effectively captured by the learned contextual similarities.
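To contrast the two fitting constraints, the following sketch builds a BFC target, in which only the self-similarity is 1, and a WFC-style target taken from the normalized affinities, and then evaluates a squared-error fitting term against each. The squared-error form and the use of the normalized affinities as WFC targets are assumptions for illustration, not the exact Equations (6)-(8).

```python
import numpy as np

def fitting_targets(W):
    """Return BFC and WFC-style target matrices for the fitting constraint."""
    n = W.shape[0]
    Y_bfc = np.eye(n)                                 # 1 for the element itself, 0 otherwise
    d_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    Y_wfc = d_inv_sqrt @ W @ d_inv_sqrt               # graded, normalized predefined values
    return Y_bfc, Y_wfc

def fitting_term(F, Y):
    """Squared-error fitting term ||F - Y||_F^2."""
    return float(np.sum((F - Y) ** 2))

W = np.array([[0.0, 0.9, 0.2],
              [0.9, 0.0, 0.8],
              [0.2, 0.8, 0.0]])
Y_bfc, Y_wfc = fitting_targets(W)
F = np.eye(3)
print(fitting_term(F, Y_bfc), fitting_term(F, Y_wfc))
```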
4. The Proposed Method
In this section, the relevant notations are summarized in Section 4.1. In Section 4.2 and Section 4.3, the improvements are introduced and analyzed from the perspectives of the smoothness constraint and the fitting constraint, respectively. Finally, the solution of the proposed model is provided in Section 4.4.
4.1. Preliminary
It is assumed that $S$ consists of two subsets, $L$ and $U$, where $L$ and $U$ contain $l$ labeled objects and $u$ unlabeled ones, respectively. We further divide $U$ into two subsets: $A$ with $a$ objects and $B$ with $b$ objects, where $A\cup B=U$ and $A\cap B=\varnothing$. According to the above definitions, we have $l+u=n$ and $a+b=u$. For the convenience of discussion, all the samples are rearranged so that the unlabeled ones are placed before the labeled ones, and those in $A$ are listed before those in $B$. Then, we have $A=\{x_1,\ldots,x_a\}$, $B=\{x_{a+1},\ldots,x_u\}$, $U=\{x_1,\ldots,x_u\}$, and $L=\{x_{u+1},\ldots,x_n\}$.
Then, $W$ can be partitioned into blocks according to the unlabeled and labeled samples (Equation (9)), where the four blocks contain the connections within $U$, from $U$ to $L$, from $L$ to $U$, and within $L$, respectively. The diagonal matrix $D$ can be partitioned in the same way (Equation (10)), where the two diagonal blocks are a $u\times u$ diagonal matrix and an $l\times l$ diagonal matrix, respectively. The $u\times u$ block can be partitioned further (Equation (11)), where the two resulting diagonal blocks are an $a\times a$ diagonal matrix and a $b\times b$ diagonal matrix, respectively. We further define two $u\times u$ diagonal matrices whose diagonal elements are determined by the connections of the unlabeled samples, and they can likewise be partitioned into blocks (Equation (12)), where the leading block is a diagonal matrix of the corresponding size.
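As a bookkeeping aid, the partitioning above can be carried out with simple index slicing, assuming the two-level ordering (A, then B, then L) described in this subsection; the block names below are illustrative rather than the paper's notation.

```python
import numpy as np

def partition(W, a, b):
    """Split W (samples ordered as A, then B, then L) into blocks plus degree blocks."""
    u = a + b                                        # number of unlabeled samples
    blocks = {
        "W_UU": W[:u, :u], "W_UL": W[:u, u:],
        "W_LU": W[u:, :u], "W_LL": W[u:, u:],
        "W_AA": W[:a, :a], "W_AB": W[:a, a:u],
    }
    d = W.sum(axis=1)
    D_U, D_L = np.diag(d[:u]), np.diag(d[u:])        # diagonal degree blocks
    return blocks, D_U, D_L

rng = np.random.default_rng(0)
W = rng.random((6, 6)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
blocks, D_U, D_L = partition(W, a=2, b=2)
print(blocks["W_UL"].shape, D_U.shape, D_L.shape)    # (4, 2) (4, 4) (2, 2)
```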
As discussed in Section 3, $f_{ij}$ denotes the contextual similarity between two samples, which can be treated as a "one-to-one" contextual similarity. Referring to the definition of $f_{ij}$, we introduce $f_{iL}$ as the "one-to-many" contextual similarity between $x_i$ and the labeled set $L$ viewed as a whole, and collect these values into a vector $f=[f_{1L},f_{2L},\ldots,f_{nL}]^{T}$, where $i=1,2,\ldots,n$. For $x_i\in L$, $f_{iL}$ is fixed to the maximum value of the range for the contextual similarity, and to the minimum value of the range for the contextual dissimilarity.
Then, only the "one-to-many" contextual similarities of the unlabeled samples need to be learned, and the objective function based on them can be obtained as follows (Equation (13)), where the two terms are the smoothness constraint and the fitting constraint, and a hyperparameter is used to balance them.
One can find that the "one-to-one" contextual similarity is actually a special case of the "one-to-many" contextual similarity. With $L$ containing only one element, the "one-to-many" contextual similarity degrades into the "one-to-one" contextual similarity.
4.2. Triple-Cosine Smoothness Constraint
Given two unlabeled samples $x_i$ and $x_j$, together with the labeled set $L$ viewed as a whole, a coordinate system can be generated by taking the pairwise similarities as the cosine values of the angles between the coordinate axes. Specifically, as shown in Figure 1, with the origin $O$, the coordinate system includes the $I$ axis, the $J$ axis, and the $L$ axis, which correspond to $x_i$, $x_j$, and the labeled set, respectively, and the angle between the $I$ axis and the $J$ axis is determined by the similarity between $x_i$ and $x_j$. Two unit vectors lie on the planes $IOL$ and $JOL$, respectively; the first forms an angle with the $L$ axis determined by the relationship between $x_i$ and $L$, and the second forms an angle with the $L$ axis determined by the relationship between $x_j$ and $L$.
According to the above definitions, the cosine value of the angle between the $I$ axis and the $J$ axis is determined by the connection between $x_i$ and $x_j$. The cosine value of the angle between the first vector and the $L$ axis is calculated as the mean of all the connections between $x_i$ and the elements of $L$, and the cosine value of the angle between the second vector and the $L$ axis is calculated as the mean of all the connections between $x_j$ and the elements of $L$. The angle between the two vectors is denoted as $\theta$, and the cosine value of $\theta$ is referred to as the triple-cosine. As shown in Equation (A1), it can be expressed in terms of these quantities (Equation (14)).
According to Equation (14), one can find that the high-order relationships between data points are implicitly considered during the calculation of the triple-cosine. Then, the triple-cosine smoothness constraint can be defined based on it (Equation (15)). As shown in Equation (A3), the triple-cosine is used to limit how close the "one-to-many" contextual similarities of $x_i$ and $x_j$ are. Unlike the DSC, which uses a single connection weight $w_{ij}$ to restrict the contextual similarities, the triple-cosine is the result of multiple factors, including $w_{ij}$ and the mean connections between $x_i$ and $L$ and between $x_j$ and $L$. So, $w_{ij}$ has an impact on the triple-cosine, but the value of the triple-cosine is not entirely determined by $w_{ij}$. For example, if $x_i$ and $x_j$ are very close but $w_{ij}$ is incorrectly set to a small value, $w_{ij}$ alone cannot correctly reflect the relationship between the contextual similarities. However, in Equation (14), even if the value of $w_{ij}$ is small, the triple-cosine can still be large, because, owing to the closeness of $x_i$ and $x_j$, their mean connections to $L$ may both be very large. Therefore, one can find that the proposed smoothness constraint is insensitive to incorrect connections and noise edges and can be used to improve the process of learning contextual similarities.
According to Equations (15) and (A3), the proposed smoothness constraint can be written in matrix form (Equation (16)), where a diagonal matrix is generated from the mean connections between the samples and $L$, and the Laplace matrix is obtained from the triple-cosine weight matrix. The triple-cosine weight matrix itself can be obtained with the Hadamard product, denoted by $\circ$, of the relevant connection matrices (Equation (17)). After normalization, we obtain the final smoothness constraint (Equation (18)), where the Laplace matrix is obtained from the normalized triple-cosine weight matrix.
4.3. Hybrid Fitting Constraint Based on the Squared $\ell_2$ Norm and the $\ell_1$ Norm
Unlike the previous fitting constraints, which treat all the predefined values equally, two different types of restrictions are created according to the confidence of the predefined values. Specifically, based on whether $x_i$ is directly connected to the elements in $L$, the elements of $U$ can be classified into two categories. Firstly, for each $x_i$ with at least one edge to $L$, the predefined value can be clearly and precisely generated, and a squared $\ell_2$-norm term is constructed. On the other hand, for each $x_i$ with no edge to $L$, we treat it equally with a rough predefined value, and an $\ell_1$-norm term is selected. Then, we have the hybrid fitting constraint (Equation (19)), where the weights are the diagonal elements of the diagonal matrices defined in Section 4.1 and a constant denotes the rough predefined value.
It is important to note that the quantity learned in Equation (19) is a contextual dissimilarity rather than a contextual similarity. Specifically, if $x_i$ is close to $L$, the predefined value is set to a small value, whereas if $x_i$ is far from $L$, the predefined value is set to a very large value. With these predefined values substituted in, the hybrid fitting constraint can be written more compactly (Equation (20)), where the remaining term involves only a constant.
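The idea of the hybrid fitting constraint can be illustrated with the rough sketch below, which follows our reading of the text: a squared penalty toward precise targets for unlabeled samples with at least one edge to $L$, and an absolute-value penalty toward a single rough constant for the rest. The weights and targets of the actual Equation (19) are not reproduced here, and the helper names are hypothetical.

```python
import numpy as np

def hybrid_fitting(f_u, W_UL, y_precise, y_rough):
    """Sketch of a hybrid fitting term for the unlabeled contextual dissimilarities f_u.

    Samples with at least one edge to the labeled set get a squared penalty toward
    precise targets; unconnected samples get an l1-style penalty toward a rough constant.
    """
    connected = W_UL.sum(axis=1) > 0                 # at least one edge to L?
    sq_term = np.sum((f_u[connected] - y_precise[connected]) ** 2)
    l1_term = np.sum(np.abs(f_u[~connected] - y_rough))
    return float(sq_term + l1_term)

W_UL = np.array([[0.8], [0.0], [0.3]])               # connections of 3 unlabeled samples to L
f_u = np.array([0.2, 0.9, 0.4])                      # learned contextual dissimilarities
y_precise = 1.0 - W_UL[:, 0]                         # illustrative precise targets
print(hybrid_fitting(f_u, W_UL, y_precise, y_rough=1.0))
```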
4.4. Solution of Proposed Model
For convenience, we first combine one of the smoothness constraints reviewed in Section 3 with the hybrid fitting constraint to construct the objective function and solve the optimization problem.
According to the previous analysis, the smoothness term can be defined with the corresponding Laplace matrix (Equation (21)), and the resulting objective function follows (Equation (22)). By taking the partial derivative of the objective with respect to the unknown vector of contextual dissimilarities, we obtain Equation (23). Then, by setting Equation (23) to zero, the solution satisfies Equation (24).
We further introduce a diagonal matrix so that Equation (24) can be transformed into Equation (25). Since the graph $G$ is strongly connected, similar to the proof in [19], it can be proven that the spectral radius of the relevant iteration matrix is smaller than 1. So, the matrix in Equation (25) is invertible, and the solution can be written in closed form (Equation (26)). For the problem of object retrieval and re-ranking, instead of the exact values of the learned contextual similarities, we actually care more about their ranking, which can be directly calculated using Equation (26).
The closed-form solution in Equation (26) requires a matrix inverse, which is time-consuming and complicated to compute. Since the spectral radius of the iteration matrix is smaller than 1, the inverse can be expanded as a convergent series, and an iterative solution can be obtained by directly replacing the inverse in Equation (26) with a truncated series expansion.
As discussed previously, the triple-cosine smoothness constraint and the hybrid fitting constraint are selected as the smoothness constraint and the fitting constraint, and the proposed model is referred to as TH-RDP. The corresponding objective function follows (Equation (27)). Similar to the solving process of Equation (22), the closed-form solution can be written as in Equation (28), where the diagonal matrix collects the corresponding diagonal elements. The iterative solution can be obtained by replacing the inverse matrix with its series expansion. Then, it is easy to find that the time complexity of the closed-form solution is $O(n^{3})$, and the time complexity of the iterative solution is $O(Tn^{2})$, where $T$ is the number of iterative steps.
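The trade-off between the two solutions can be seen in the following sketch, which is our own illustration of a generic system of the form $f=(I-\alpha P)^{-1}q$ with the spectral radius of $\alpha P$ below 1, as discussed above: the closed form uses a direct inverse ($O(n^{3})$), while the iterative solution truncates the corresponding series after $T$ steps ($O(Tn^{2})$).

```python
import numpy as np

def closed_form(P, q, alpha=0.9):
    """Direct solution f = (I - alpha * P)^{-1} q, which costs O(n^3)."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * P, q)

def iterative(P, q, alpha=0.9, T=50):
    """Truncated series f_T = sum_{t=0}^{T} (alpha * P)^t q, which costs O(T * n^2)."""
    f = q.copy()
    term = q.copy()
    for _ in range(T):
        term = alpha * (P @ term)   # next term of the series
        f += term
    return f

W = np.array([[0.0, 0.9, 0.2],
              [0.9, 0.0, 0.8],
              [0.2, 0.8, 0.0]])
P = W / W.sum(axis=1, keepdims=True)      # row-stochastic, so rho(alpha * P) < 1
q = np.array([1.0, 0.0, 0.0])
print(np.allclose(closed_form(P, q), iterative(P, q, T=500)))   # True
```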
4.5. Iterative Re-Ranking Based on Proposed Model
As discussed in [19], the most difficult problem facing the diffusion process is that, for a relevant object very far away from the query, it takes a long time to complete the information propagation. A feasible approach to address this problem is to embed the RDP in an iterative re-ranking process. Specifically, in each iteration of this process, several top-ranked samples are selected and treated as queries in the next iteration, together with those from the current iteration. In this way, the relevant objects can be selected and identified incrementally, and the underlying manifold structure can be gradually captured.
Moreover, in each iteration, only a few top-ranked objects are filtered out and added to the set of queries, so the samples with long paths to the queries do not need to be taken into account. In iteration $t$, given the current set of labeled samples, the set of unlabeled samples to be ranked in this iteration is generated. During the iteration, the top-ranked objects are selected by the proposed model, and the labeled set for the next iteration is then generated. The iterative re-ranking process based on TH-RDP is referred to as I-TH-RDP.
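The re-ranking loop can be sketched as follows. This is our own illustration of the procedure described above: `mean_dissimilarity` is a hypothetical stand-in for the TH-RDP ranking step, and the expansion size `k` and number of iterations are free parameters.

```python
import numpy as np

def iterative_rerank(W, query_ids, rank_fn, k=3, iters=3):
    """Sketch of an I-TH-RDP-style loop: grow the query set from near to far."""
    labeled = list(query_ids)
    ranking = []                                     # re-ranked unlabeled objects
    for _ in range(iters):
        unlabeled = [i for i in range(W.shape[0]) if i not in labeled]
        if not unlabeled:
            break
        scores = rank_fn(W, labeled, unlabeled)      # smaller score = more relevant
        ranked = [unlabeled[i] for i in np.argsort(scores)]
        accepted = ranked[:k]                        # top-k objects of this round
        ranking.extend(accepted)
        labeled.extend(accepted)                     # they become queries next round
    seen = set(query_ids) | set(ranking)
    ranking.extend(i for i in range(W.shape[0]) if i not in seen)
    return ranking

def mean_dissimilarity(W, labeled, unlabeled):
    """Hypothetical stand-in for the TH-RDP ranking step."""
    return [1.0 - W[labeled, j].mean() for j in unlabeled]

rng = np.random.default_rng(0)
W = rng.random((10, 10)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
print(iterative_rerank(W, query_ids=[0], rank_fn=mean_dissimilarity))
```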
As shown in Section 3, the regularized diffusion process is carried out on the basis of the graph $G$, which is generated by $W$. In practice, $W$ is obtained from the pairwise measure, which can be either a distance or a similarity. If the baseline is a distance measure, the corresponding similarity can be obtained with a kernel function (Equation (29)), where the kernel width can be determined by kNN [25]. To eliminate the impact of noisy connections, $W$ is further constructed on the basis of reciprocal kNN. In Section 3, it is assumed that $G$ is a strongly connected graph, which does not always hold in reality. On the contrary, $G$ may be composed of multiple connected components. So, after all the elements from the connected components containing queries have been re-ranked, one can create temporary connections by using the filtered similarities to ensure that the proposed method remains applicable.
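The graph construction step can be sketched as follows, assuming the commonly used Gaussian kernel with a kNN-based width and a reciprocal-kNN filter; the exact form of Equation (29) and the filtering rule used in the paper may differ.

```python
import numpy as np

def build_affinity(dist, k=5):
    """Turn a pairwise distance matrix into a sparse, reciprocal-kNN affinity matrix."""
    n = dist.shape[0]
    knn_idx = np.argsort(dist, axis=1)[:, 1:k + 1]   # k nearest neighbors (skip self)
    sigma = dist[np.arange(n)[:, None], knn_idx].mean(axis=1)   # kNN-based kernel width
    W = np.exp(-dist ** 2 / (sigma[:, None] * sigma[None, :]))  # Gaussian kernel
    np.fill_diagonal(W, 0)
    # Keep an edge only if i and j appear in each other's kNN lists (reciprocal kNN).
    knn_mask = np.zeros((n, n), dtype=bool)
    knn_mask[np.arange(n)[:, None], knn_idx] = True
    return W * (knn_mask & knn_mask.T)

rng = np.random.default_rng(0)
X = rng.random((20, 2))
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
W = build_affinity(dist, k=4)
print(W.shape, (W > 0).sum())   # graph size and number of surviving edges
```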
5. Experimental Results
To demonstrate the advantages of our methods, experiments are conducted in this section, and the performance of our methods is evaluated. It is worth noting that in the following experiments, each category of every selected database contains a substantial number of samples, ranging from 20 to 100. This choice is based on two considerations. On the one hand, the underlying manifold structure needs to be supported by sufficient samples from each category so that the contextual information can be effectively reflected. On the other hand, with many samples per class, the underlying manifold structure is likely to be complicated. In this way, the designed experimental tasks are challenging, and the results can reasonably verify the effectiveness of the different methods.
On the basis of an identical baseline, the proposed methods, TH-RDP and I-TH-RDP, are compared with the following state-of-the-art algorithms: LP [25], mkNN [30], TPG [9], GDP [14], and SCA [36]. To provide fair comparisons between our algorithms and the competitors, we run the implementations based on the code released by the authors.
In the following, the P–R curve is employed to analyze the performance of the different methods, with the horizontal axis and the vertical axis representing precision and recall, respectively. The precision and recall values are defined as follows:
$$\text{precision}=\frac{|TP|}{|TP|+|FP|}, \qquad (30)$$
and
$$\text{recall}=\frac{|TP|}{|TP|+|FN|}, \qquad (31)$$
where $TP$ denotes the data points of the same category as the query in the top $K$ ranks, $FP$ denotes those from different categories than the query in the top $K$ ranks, and $FN$ denotes those of the same category as the query but outside the top $K$ ranks. In the P–R curve, we vary $K$ up to $N$, where $N$ is the number of data points from the same category. Significantly, when $K=N$, the value of precision equals that of recall, and this value is the retrieval score, which is also employed to evaluate performance.
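For completeness, the evaluation protocol described above can be computed as in the following sketch (our own code); `retrieval_score` corresponds to the case in which $K$ equals the class size.

```python
def precision_recall_at_k(ranked_labels, query_label, k, class_size):
    """Precision and recall over the top-k ranked labels for a single query."""
    tp = sum(1 for lab in ranked_labels[:k] if lab == query_label)
    return tp / k, tp / class_size

def retrieval_score(ranked_labels, query_label, class_size):
    """Precision (= recall) when k equals the number of same-class data points."""
    precision, _ = precision_recall_at_k(ranked_labels, query_label, class_size, class_size)
    return precision

ranked = ["cat", "cat", "dog", "cat", "bird", "dog"]    # labels sorted by similarity
print(precision_recall_at_k(ranked, "cat", k=4, class_size=3))   # (0.75, 1.0)
print(retrieval_score(ranked, "cat", class_size=3))              # ~0.667
```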
5.1. Tari-1000 Shape Dataset
The proposed methods are tested on the Tari-1000 dataset [37] in this subsection. In this database, 1000 shapes are divided into 50 categories, and each category contains 20 silhouettes. Figure 2 shows examples from each class in the dataset. The retrieval scores of the baselines, the shape context (SC) [38] and the inner-distance shape context (IDSC) [39], were 88.01% and 90.43%, respectively. IDSC performed better than SC on this database because of the many articulation changes within each class.
Table 1 shows the retrieval scores obtained by the proposed methods and the other approaches, while the P–R curves for these methods are plotted in Figure 3. With IDSC as the baseline, the proposed methods achieved near-perfect retrieval scores: 99.73% for TH-RDP and 99.69% for I-TH-RDP. With SC as the baseline, our approaches achieved 97.55% for TH-RDP and 97.48% for I-TH-RDP, improving the original scores by over 9%. Specifically, as observed in Figure 3a,b, previous methods, including SCA [36], TPG [9], and LP [25], exhibited performance similar to that of the proposed approaches but demonstrated lower precision at high ranges of recall. The retrieval scores in Table 1 and the P–R curves in Figure 3 clearly demonstrate that the proposed methods perform better than the other methods based on the same baseline results.
5.2. Animal Shape Dataset
The Animal dataset, which contains 2000 manually extracted shapes across 20 categories, was first introduced in [40]. In this database, each class provides a larger number of images, resulting in greater variation within a single category [40,41]. This dataset was previously used for classification experiments [41]; however, in this subsection, it is selected for testing retrieval performance. The baseline retrieval scores on the Animal dataset were 38.38% for SC and 40.50% for IDSC.
Table 2 lists the re-ranking results of the different methods, while the P–R curves of our approaches and the competitors are plotted in Figure 4. With IDSC as the baseline, the proposed methods clearly provided the best curves across the ranges of recall, as well as the highest retrieval scores: 57.52% for TH-RDP and 57.84% for I-TH-RDP. Meanwhile, with SC as the baseline measure, TH-RDP and I-TH-RDP outperformed the other methods across most ranges of the P–R curves, achieving retrieval scores of 54.88% and 55.53%, respectively. Overall, the P–R curves and retrieval scores clearly demonstrate that the proposed methods deliver better results than the other methods and that the baselines are improved significantly.
5.3. LEAF Dataset from UCI
Another real-world leaf (LEAF) dataset from the UCI repository is introduced in this subsection. This dataset, as reported in [42], comprises 40 different plant species. A subset of 30 species, numbered from 1 to 15 and from 22 to 36, has been constructed; according to the description of the dataset, these species exhibit simple leaves. Binary images of the leaves from the selected species have been released together with the original images, and Figure 5 shows some examples of the color images from this database. For the previous databases in this paper, each class provided an identical number of samples. However, this dataset is imbalanced, and the number of leaves per class in the subset ranges from 8 to 16. So, given query objects from various species, we set different values of $K$, equal to the corresponding class sizes, to evaluate the retrieval scores. We obtained retrieval scores of 66.10% for SC and 65.51% for IDSC.
The retrieval performances of the different methods are listed in Table 3. As shown in Table 3, with IDSC as the baseline, we achieved retrieval scores of 74.06% for TH-RDP and 74.19% for I-TH-RDP. On the other hand, with SC as the baseline, the proposed approaches yielded retrieval scores of 74.44% for TH-RDP and 74.92% for I-TH-RDP. The retrieval results in Table 3 demonstrate that the proposed methods achieved significant improvements even though the dataset is imbalanced, and our approaches performed slightly better than the state-of-the-art methods. Moreover, as can be seen in Table 3, our methods boosted the retrieval scores of SC and IDSC by over 8%, so the baseline results were significantly improved.
5.4. ORL Face Dataset
In this subsection, an experiment on the ORL face database, downloaded from the website of AT&T Laboratories Cambridge, is carried out. The ORL face database is a well-known dataset widely used for face recognition that contains 400 grayscale face images across 40 categories. The images in this database were obtained at different times, with varying lighting, facial expressions, and facial details. Similar to [14], with each image downsampled and normalized to zero mean and unit variance, the vectorized representation (VR) was taken as the feature, and the norm of the difference between feature vectors was selected as the input pairwise distance. Instead of using a retrieval window ranging in size from 11 to 20 [14,16], we used the retrieval score corresponding to a retrieval window of size 10 as the evaluation metric, which is a more rigorous measure. The retrieval score of this baseline method was 57.13%.
The results of the different methods are listed in Table 4, and the corresponding P–R curves are plotted in Figure 6. As shown in Table 4, with the VR as the baseline method, TH-RDP and I-TH-RDP achieved retrieval scores of 74.42% and 74.08%, respectively. Therefore, the baseline result was greatly and effectively improved by our approaches, and the original retrieval score was improved by around 14%. As can be seen in Figure 6, the P–R curves of our methods achieved the second- and third-best re-ranking results on this dataset.
6. Conclusions
In this paper, with high-order information taken into account, we proposed a novel variant of the regularized diffusion process with the triple-cosine smoothness constraint and the hybrid fitting constraint. The triple-cosine smoothness constraint is constructed using the cosine of the angle in a coordinate system created from a group with three elements. For the hybrid fitting constraint, the squared $\ell_2$-norm and $\ell_1$-norm terms are adaptively generated according to the predefined values. The closed-form solution and the iterative solution of the proposed model are given. Moreover, since the learned contextual dissimilarities can be used to represent "one-to-many" relationships, an iterative re-ranking process based on the proposed model is further presented. The experimental results on different databases demonstrate that the proposed methods can significantly improve the baselines.
Some issues still need to be addressed in the future. The effectiveness of the proposed method largely depends on the weighted graph, which is generated by filtering noise from the baseline measures, so how to eliminate noise edges effectively will be investigated further. Moreover, the time complexity of learning the contextual similarities is about $O(Tn^{2})$ even with the iterative solution, and reducing it to meet the requirements of real-time applications is worth studying. Finally, with the rapid development of deep learning, deep features have been increasingly used to analyze pairwise (dis)similarities, and desirable results can often be achieved [2]. In this case, improving the retrieval results obtained with deep features is a task that should be continuously considered and studied in the future.
Author Contributions
Conceptualization, M.D. and J.C.; Methodology, M.D.; Software, M.D.; Validation, M.D.; Formal analysis, M.D.; Writing—original draft, M.D.; Writing—review and editing, J.C.; Supervision, J.C.; Project administration, J.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
Data is contained within the article.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A
Appendix A.1. Derivation Process of Equation (A3)
As can be seen in Figure A1, two auxiliary points are obtained as the projections of the two unit vectors defined in Section 4.2 onto the corresponding coordinate plane. Each unit vector, its projection, and the origin form right-triangle relations, from which the lengths of the projected and perpendicular components can be expressed through the sines and cosines of the angles between the vectors and the $L$ axis, and analogous relations hold for the second vector. For the cosine of the angle between the two vectors, we derive the following:
Figure A1. Calculation of the cosine of the angle in the coordinate system.
Starting from the endpoint of the first projected component, a vector with the same magnitude and direction as the second projected component is drawn, which allows the difference between the two unit vectors to be decomposed. Using the right-triangle relations above, the lengths of the resulting segments can be expressed through the angles defined in Section 4.2, and the inner product of the two unit vectors follows (Equation (A2)). The triple-cosine can finally be obtained as follows:
References
- Croitoru, F.A.; Hondru, V.; Ionescu, R.T.; Shah, M. Diffusion Models in Vision: A Survey. IEEE Trans. Patt. Anal. Mach. Intell. 2023, 45, 10850–10869. [Google Scholar] [CrossRef] [PubMed]
- Chen, W.; Liu, Y.; Wang, W.; Bakker, E.M.; Georgiou, T.; Fieguth, P.; Liu, L.; Lew, M.S. Deep Learning for Instance Retrieval: A Survey. IEEE Trans. Patt. Anal. Mach. Intell. 2023, 45, 7270–7292. [Google Scholar] [CrossRef] [PubMed]
- Leticio, G.R.; Kawai, V.S.; Valem, L.P.; Pedronette, D.C.G.; da S. Torres, R. Manifold information through neighbor embedding projection for image retrieval. Pattern Recognit. Lett. 2024, 183, 17–25. [Google Scholar] [CrossRef]
- Guo, Q.; Feng, W.; Gao, R.; Liu, Y.; Wang, S. Exploring the Effects of Blur and Deblurring to Visual Object Tracking. IEEE Trans. Image Process. 2021, 30, 1812–1824. [Google Scholar] [CrossRef]
- Chi, R.; Zhang, H.; Huang, B.; Hou, Z. Quantitative Data-Driven Adaptive Iterative Learning Control: From Trajectory Tracking to Point-to-Point Tracking. IEEE Trans. Cybern. 2022, 52, 4859–4873. [Google Scholar] [CrossRef] [PubMed]
- Bicego, M.; Lovato, P. A bioinformatics approach to 2D shape classification. Comput. Vis. Image Underst. 2015, 145, 59–69. [Google Scholar] [CrossRef]
- Schmidt, A.; Mohareri, O.; DiMaio, S.; Yip, M.C.; Salcudean, S.E. Tracking and mapping in medical computer vision: A review. Med. Image Anal. 2024, 94, 103131. [Google Scholar] [CrossRef]
- Pereira-Ferrero, V.; Lewis, T.; Valem, L.; Ferrero, L.; Pedronette, D.; Latecki, L. Unsupervised affinity learning based on manifold analysis for image retrieval: A survey. Comput. Sci. Rev. 2024, 53, 100657. [Google Scholar] [CrossRef]
- Yang, X.; Prasad, L.; Latecki, L.J. Affinity learning with diffusion on tensor product graph. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 28–38. [Google Scholar] [CrossRef]
- Yang, X.; Bai, X.; Latecki, L.J. Improving Shape Retrieval by Learning Graph Transduction. In Proceedings of the European Conference on Computer Vision, Marseille, France, 12–18 October 2008; Volume 5305, pp. 788–801. [Google Scholar]
- Pedronette, D.C.G.; Penatti, O.A.B.; Torres, R.d.S. Unsupervised manifold learning using Reciprocal kNN Graphs in image re-ranking and rank aggregation tasks. Image Vis. Comput. 2014, 32, 120–130. [Google Scholar] [CrossRef]
- Bai, X.; Bai, S.; Wang, X. Beyond diffusion process: Neighbor set similarity for fast re-ranking. Inf. Sci. 2015, 325, 342–354. [Google Scholar] [CrossRef]
- Bai, S.; Sun, S.; Bai, X.; Zhang, Z.; Tian, Q. Smooth Neighborhood Structure Mining on Multiple Affinity Graphs with Applications to Context-Sensitive Similarity. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 592–608. [Google Scholar]
- Donoser, M.; Bischof, H. Diffusion Processes for Retrieval Revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 1320–1327. [Google Scholar]
- Luo, L.; Shen, C.; Zhang, C.; Hengel, A.v.d. Shape Similarity Analysis by Self-Tuning Locally Constrained Mixed-Diffusion. IEEE Trans. Multimed. 2013, 15, 1174–1183. [Google Scholar] [CrossRef]
- Bai, S.; Bai, X.; Tian, Q.; Latecki, L.J. Regularized Diffusion Process for Visual Retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 3967–3973. [Google Scholar]
- Zheng, D.; Liu, W.; Han, M. Learning contextual dissimilarity on tensor product graph for visual re-ranking. Image Vis. Comput. 2018, 79, 1–10. [Google Scholar] [CrossRef]
- Bai, S.; Zhou, Z.; Wang, J.; Bai, X.; Latecki, L.J.; Tian, Q. Automatic Ensemble Diffusion for 3D Shape and Image Retrieval. IEEE Trans. Image Process. 2018, 28, 88–101. [Google Scholar] [CrossRef]
- Zheng, D.; Fan, J.; Han, M. Hybrid Regularization of Diffusion Process for Visual Re-Ranking. IEEE Trans. Image Process. 2021, 30, 3705–3719. [Google Scholar] [CrossRef] [PubMed]
- Bai, S.; Bai, X.; Tian, Q.; Latecki, L.J. Regularized Diffusion Process on Bidirectional Context for Object Retrieval. IEEE Trans. Patt. Anal. Mach. Intell. 2019, 41, 1213–1226. [Google Scholar] [CrossRef]
- Ma, J.; Xu, F.; Rong, X. Discriminative multi-label feature selection with adaptive graph diffusion. Pattern Recognit. 2024, 148, 110154. [Google Scholar] [CrossRef]
- Zhang, Z.; Wang, J.; Zhu, L.; Luo, Y.; Lu, G. Deep collaborative graph hashing for discriminative image retrieval. Pattern Recognit. 2023, 139, 109462. [Google Scholar] [CrossRef]
- Page, L.; Brin, S.; Motwani, R.; Winograd, T. The PageRank citation ranking: Bringing order to the web. In Technical Report; Stanford InfoLab Publication Server: Brooklyn, NY, USA, 1999. [Google Scholar]
- Zhou, D.; Bousquet, O.; Lal, T.N.; Weston, J.; Olkopf, B.S. Learning with Local and Global Consistency. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 13–18 December 2004; pp. 321–328. [Google Scholar]
- Bai, X.; Yang, X.; Latecki, L.J.; Liu, W.; Tu, Z. Learning Context-Sensitive Shape Similarity by Graph Transduction. IEEE Trans. Patt. Anal. Mach. Intell. 2010, 32, 861–874. [Google Scholar]
- Zhu, X. Semi-Supervised Learning with Graphs. Ph.D. Thesis, Carnegie Mellon University, Language Technologies Institute, School of Computer Science, Pittsburgh, PA, USA, 2005. [Google Scholar]
- Wang, J.; Li, Y.; Bai, X.; Zhang, Y.; Wang, C.; Tang, N. Learning context-sensitive similarity by shortest path propagation. Pattern Recognit. 2011, 44, 2367–2374. [Google Scholar] [CrossRef]
- Bai, X.; Wang, B.; Yao, C.; Liu, W.; Tu, Z. Co-Transduction for Shape Retrieval. IEEE Trans. Image Process. 2012, 21, 2747–2757. [Google Scholar] [PubMed]
- Yang, X.; Koknar-Tezel, S.; Latecki, L.J. Locally constrained diffusion process on locally densified distance spaces with applications to shape retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 357–364. [Google Scholar]
- Kontschieder, P.; Donoser, M.; Bischof, H. Beyond Pairwise Shape Similarity Analysis. In Proceedings of the Asian Conference on Computer Vision, Xi’an, China, 23–27 September 2009; Volume 5996, pp. 655–666. [Google Scholar]
- Egozi, A.; Keller, Y.; Guterman, H. Improving Shape Retrieval by Spectral Matching and Meta Similarity. IEEE Trans. Image Process. 2010, 19, 1319–1327. [Google Scholar] [CrossRef] [PubMed]
- Li, Q.; An, S.; Liu, W.; Li, L. Semisupervised Learning on Graphs with an Alternating Diffusion Process. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 2862–2874. [Google Scholar] [CrossRef] [PubMed]
- Moradi, M.; Bayat, F. A salient object segmentation framework using diffusion-based affinity learning. Expert Syst. Appl. 2021, 168, 114428. [Google Scholar] [CrossRef]
- Valem, L.P.; Pedronette, D.C.G.; Latecki, L.J. Rank Flow Embedding for Unsupervised and Semi-Supervised Manifold Learning. IEEE Trans. Image Process. 2023, 32, 2811–2826. [Google Scholar] [CrossRef]
- Presotto, J.G.C.; Valem, L.P.; de Sa, N.G.; Pedronette, D.C.G.; Papa, J.P. Weakly supervised classification through manifold learning and rank-based contextual measures. Neurocomputing 2024, 589, 127717. [Google Scholar] [CrossRef]
- Bai, S.; Bai, X. Sparse Contextual Activation for Efficient Visual Re-ranking. IEEE Trans. Image Process. 2016, 25, 1056–1069. [Google Scholar] [CrossRef]
- Baseski, E.; Erdem, A.; Tari, S. Dissimilarity between two skeletal trees in a context. Pattern Recognit. 2009, 42, 370–385. [Google Scholar] [CrossRef]
- Belongie, S.; Malik, J.; Puzicha, J. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 509–522. [Google Scholar] [CrossRef]
- Ling, H.; Jacobs, D.W. Shape Classification Using the Inner-Distance. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 286–299. [Google Scholar] [CrossRef]
- Bai, X.; Liu, W.; Tu, Z. Integrating contour and skeleton for shape classification. In Proceedings of the IEEE International Conference on Computer Vision, Kyoto, Japan, 27 September–4 October 2009; pp. 360–367. [Google Scholar]
- Xiang, B.; Cong, R.; Xinggang, W. Shape Vocabulary: A Robust and Efficient Shape Representation for Shape Matching. IEEE Trans. Image Process. 2014, 23, 3935–3949. [Google Scholar]
- Silva, P.F.B.; Marçal, A.R.S.; Silva, R.M.A.D. Evaluation of Features for Leaf Discrimination. In Springer Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; Volume 7950, pp. 197–204. [Google Scholar]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).