1. Introduction
Object retrieval is an important issue in computer vision and pattern recognition and is widely used in object recognition [1,2,3], tracking [4,5], bioinformatics [6], medical imaging [7], etc. In general, given one or more queries, the most similar objects in a dataset are obtained based on a pairwise measure, and the retrieval results are presented in the form of ranked similarities or distances between the queries and the other objects. Conventionally, the goal of retrieval is achieved through pairwise comparisons alone, with the basic premise that the more similar two elements are, the smaller the measured distance. Unfortunately, this idea suffers from a crucial limitation: it ignores the structure of the underlying data manifold, so objects from the same category as the query element cannot always be retrieved correctly [8,9].
To address this problem, a feasible idea is to analyze the manifold structure by considering all the distances between each pair of elements in the database. In the past two decades, considerable effort has been devoted to learning contextual similarities, and various methods have been developed to capture the geodesic paths on the manifold, either explicitly or implicitly [10,11,12,13]. Among these studies, a widely accepted approach is to diffuse the baseline similarity on a graph whose vertices represent the data points and whose edges are weighted by pairwise similarity; the contextual similarity can then be learned in an iterative manner. In the retrieval domain, this procedure is called the diffusion process [14,15]. Recently, a standard framework named the regularized diffusion process (RDP) was proposed in [16], which consists of two constraints: the smoothness constraint (SC) and the fitting constraint (FC). From the perspective of this framework, many previous methods can be considered variants of the RDP, and the contextual (dis)similarities are obtained by solving an unconstrained quadratic optimization problem.
Since the RDP was proposed, many improvements have been made to this framework from different perspectives [17,18,19]. With the tensor product graph as a replacement for the original graph, it has been verified that the high-order information of the graph greatly improves retrieval results [9,20]. However, several issues still deserve attention: (1) how to construct a triplet-based relationship with high-order information within the RDP framework; (2) how to introduce "one-to-many" relationships that contain high-order information into the RDP; and (3) how to solve the resulting optimization problems to obtain closed-form or iterative solutions. In light of these issues, a novel variant of the RDP is proposed in this paper, named the triple-cosine hybrid regularized diffusion process (TH-RDP). The proposed model differs from previous works in two aspects.
Firstly, a novel smoothness constraint named the triple-cosine smoothness constraint is proposed, which takes the high-order information of the original graph into account. We set up a group with three elements: the queries viewed as a whole and two other data points. A coordinate system is built based on this group, and the triple-cosine smoothness constraint is generated using the cosine of the angle between the vectors in this coordinate system. A hybrid fitting constraint is also introduced into the model, which consists of two types of terms: the squared $\ell_2$ norm and the $\ell_1$ norm. These two terms and the corresponding predefined values are adaptively determined by whether there exists at least one edge between the queries and the data points in the graph. Both the closed-form solution and the iterative solution are then derived for the proposed model.
Secondly, the proposed model can be used to describe "one-to-many" relationships, whereas many other variants can only learn "one-to-one" contextual (dis)similarities. This means that our model can be applied to problems with multiple queries, whereas many other models cannot. By taking advantage of "one-to-many" similarities, we further present an iterative re-ranking process based on the proposed model. In this iterative re-ranking process, the relevant objects along the paths are gradually searched from near to far, and the data manifold structure can be explicitly captured.
The rest of this paper is organized as follows. In the next section, we present the related works. In Section 3, the framework of the regularized diffusion process is briefly reviewed. The details of our method are provided and analyzed in Section 4. In Section 5, experiments on several well-known databases are carried out to illustrate the effectiveness of our method. Finally, we summarize the conclusions and future work.
2. Related Works
Compared with object-matching problems, which have been studied for a long time [21,22], contextual similarities have only attracted significant attention in recent years. The PageRank (PR) system of Google [23], derived from the random walk model, is one of the most well-known and successful retrieval methods and is used to rank webpages according to the interests of visitors. Using modified random walk models, several methods have been further proposed based on the idea of PageRank, e.g., global PageRank and personalized PageRank. Zhou et al. [24] presented a ranking method that considers the data manifold.
Label propagation (LP) was introduced by Bai et al. [25] for learning contextual similarities, and the contextual information on the manifold was utilized to improve retrieval results. This strategy originates from the field of semi-supervised learning and was first employed to solve the problem of image retrieval in [26]. With the label of the query object fixed, the information from the query element propagates to the unlabeled elements iteratively. Without explicitly selecting the shortest paths, LP suffers from drawbacks such as redundant contextual information and noisy elements. For this reason, Wang et al. [27] developed the shortest path propagation (SSP) method, which captures relevant objects on the shortest paths explicitly. A co-transduction approach for retrieval was proposed by Bai et al. [28], and retrieval performance was significantly improved by fusing different distance measures through a semi-supervised learning framework.
To improve retrieval scores, a method based on the locally constrained diffusion process (LCDP) on a kNN graph was proposed in [29]. Kontschieder et al. [30] used a modified mutual kNN graph to process pairwise similarities and further described an efficient clustering method based on the proposed approach. A meta-shape similarity focusing on local graph structures was proposed by Egozi et al. [31]. In this approach, each object is represented by a meta-descriptor obtained by building the kNN similarity graph, and the meta-similarity between two objects is efficiently computed as the norm of the difference between their meta-descriptors. Instead of propagating the similarity information on the original graph, Yang et al. [9] introduced the tensor product of the original graph, named the tensor product graph (TPG), and demonstrated that the proposed framework can be applied not only to shape recognition but also to image retrieval and image segmentation. Donoser and Bischof [14] revisited diffusion processes on affinity graphs and introduced a generic framework for diffusion processes, within which the approaches in this field can be summarized as specific instances.
Recently, a uniform framework named the regularized diffusion process was proposed in [16,20], and the essence of the TPG was explained theoretically. In [19], another variant, named the hybrid regularization diffusion process, was presented, revealing that the proposed model is closely related to the generalized mean first-passage time. In [32], the graph construction step was treated as a regularized function estimation problem, and an alternating diffusion process was proposed to learn the graph and the unknown labels alternately. Using diffusion-based affinity learning and the absorbing Markov chain (AMC), a salient object segmentation framework was proposed in [33], which leverages an iterative diffusion process with the same computational complexity as diffusion on the original graph. In [34], a novel manifold learning algorithm, named Rank Flow Embedding (RFE), was proposed based on recently developed manifold learning approaches, and the obtained context-sensitive embeddings were refined following a rank-based processing flow. In [35], with the contextual information encoded in the unlabeled data points, a rank-based model was presented for the task of weakly supervised classification.
3. Framework of Regularized Diffusion Process
Given $S=\{x_1,x_2,\ldots,x_n\}$ as the collection of all samples, we have $W=[w_{ij}]_{n\times n}$ and $w_{ij}=w_{ji}$, where $w_{ij}$ represents the connection weight between $x_i$ and $x_j$. One can find that $W$ is a symmetric matrix. With all the elements in $S$ as vertices, $w_{ij}$ can be treated as the weight of the edge between $x_i$ and $x_j$, and $w_{ij}=0$ means that there is no edge connection between $x_i$ and $x_j$. Then, an undirected graph $G$ can be constructed with $S$ and $W$. Moreover, unless otherwise noted, we assume that $G$ is a strongly connected graph. A diagonal matrix $D$ can be further defined, whose diagonal elements are $d_{ii}=\sum_{j=1}^{n}w_{ij}$. In the following, $I$ represents the identity matrix, and $\mathbf{1}_k$ denotes a $k$-dimensional vector with each element being 1. For convenience, an abbreviated notation is used in place of the full notation where no ambiguity arises.
Let $f_{ij}$ denote the contextual similarity between $x_i$ and $x_j$; the vector of contextual similarities associated with $x_i$ can be expressed as $f_i=[f_{i1},f_{i2},\ldots,f_{in}]^{T}$. The matrix $F$ can be generated with the $n$ vectors as $F=[f_1,f_2,\ldots,f_n]$. The objective function of the regularized diffusion process derived from $G$ can be defined as follows:
$$Q(F)=\mathcal{S}(F)+\mu\,\mathcal{C}(F), \qquad (1)$$
where $\mathcal{S}(F)$ is a "smoothing constraint" (SC), $\mathcal{C}(F)$ is a "fitting constraint" (FC), and $\mu>0$ is used to balance $\mathcal{S}(F)$ and $\mathcal{C}(F)$. It is worth noting that $f_{ij}$ can denote not only a contextual similarity but also a contextual dissimilarity, which is determined by the predefined values of the fitting constraint.
In Equation (1), the smoothing constraint is used to limit the relationships between different contextual similarities, and the fitting constraint is used to describe the relationship between each contextual similarity and a predetermined value. Considering the impacts of the smoothing constraint and the fitting constraint, solving the optimization problem amounts to implicitly capturing the manifold structure of the sample space.
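To make the framework concrete, the following minimal sketch shows how a contextual similarity vector could be obtained under one common instantiation of Equation (1), in which the smoothness term is built from a graph Laplacian and the fitting term is a squared penalty toward predefined values. The function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def regularized_diffusion(W, y, mu=0.1):
    """Illustrative RDP-style solver: minimize f^T L f + mu * ||f - y||^2.

    W  : (n, n) symmetric affinity matrix of the graph
    y  : (n,) vector of predefined (target) values for the fitting term
    mu : balance between the smoothness and fitting constraints
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian
    # Setting the gradient 2*L*f + 2*mu*(f - y) to zero gives a linear system.
    return np.linalg.solve(L + mu * np.eye(n), mu * y)

# Toy usage: four samples, the first one is the query.
W = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.8, 0.1],
              [0.1, 0.8, 0.0, 0.7],
              [0.0, 0.1, 0.7, 0.0]])
y = np.array([1.0, 0.0, 0.0, 0.0])          # BFC-style targets
print(regularized_diffusion(W, y))          # contextual similarities to the query
```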
To achieve desirable retrieval performance, various types of smoothness constraints and fitting constraints have been investigated in recent decades. For the smoothness constraint, the "dual smoothness constraint" (DSC) and the "quaternary smoothness constraint" (QSC) are two typical forms. For the fitting constraint, two typical forms are the "binary fitting constraint" (BFC) and the "weighted fitting constraint" (WFC). Different types of smoothness constraints and fitting constraints are reviewed in Section 3.1 and Section 3.2, respectively.
3.1. Smoothness Constraint
The dual smoothness constraint $\mathcal{S}_{\mathrm{DSC}}(F)$ can be obtained as follows:
$$\mathcal{S}_{\mathrm{DSC}}(F)=\frac{1}{2}\sum_{k=1}^{n}\sum_{i,j=1}^{n}w_{ij}\left(f_{ki}-f_{kj}\right)^{2}=\operatorname{tr}\!\left(F^{T}LF\right), \qquad (2)$$
where $L=D-W$ is the Laplace matrix obtained from $W$. One can find that the connection weight $w_{ij}$ between two samples is used to restrict the difference between the contextual similarities $f_{ki}$ and $f_{kj}$. This means that the larger $w_{ij}$ is, the closer the contextual similarities $f_{ki}$ and $f_{kj}$ are to each other. Considering the normalization, the normalized dual smoothness constraint can be obtained as follows:
$$\tilde{\mathcal{S}}_{\mathrm{DSC}}(F)=\operatorname{tr}\!\left(F^{T}\tilde{L}F\right), \qquad (3)$$
where $\tilde{L}=I-D^{-1/2}WD^{-1/2}$ is the Laplace matrix generated by the normalized matrix $D^{-1/2}WD^{-1/2}$. From Equations (2) and (3), it is easy to find that the relationship between $f_{ki}$ and $f_{kj}$ is highly dependent on $w_{ij}$.
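As a small numerical illustration of the DSC, the snippet below evaluates the constraint with both the unnormalized and the symmetrically normalized Laplacian; it is a sketch under the assumption that the constraint takes the quadratic form $f^{T}Lf$ given in Equation (2), and the variable names are ours.

```python
import numpy as np

def dsc(W, f, normalized=False):
    """Evaluate a DSC-style smoothness term f^T L f for a similarity vector f."""
    d = W.sum(axis=1)
    if normalized:
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt   # normalized Laplacian
    else:
        L = np.diag(d) - W                                 # L = D - W
    return float(f @ L @ f)

W = np.array([[0.0, 0.9, 0.2],
              [0.9, 0.0, 0.8],
              [0.2, 0.8, 0.0]])
f_smooth = np.array([1.0, 0.9, 0.8])   # similar values on strongly connected nodes
f_rough = np.array([1.0, 0.0, 1.0])    # large jumps across heavily weighted edges
print(dsc(W, f_smooth), dsc(W, f_rough))
print(dsc(W, f_smooth, normalized=True), dsc(W, f_rough, normalized=True))
```

As expected, a vector that changes sharply across heavily weighted edges receives a much larger penalty than a vector that varies smoothly along them.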
The quaternary smoothness constraint $\mathcal{S}_{\mathrm{QSC}}(F)$ can be expressed as follows:
$$\mathcal{S}_{\mathrm{QSC}}(F)=\frac{1}{2}\sum_{i,j,k,l=1}^{n}w_{ij}w_{kl}\left(f_{ik}-f_{jl}\right)^{2}=\operatorname{vec}(F)^{T}L_{\otimes}\operatorname{vec}(F), \qquad (4)$$
where $\operatorname{vec}(F)$ is obtained by stacking the columns of $F$ one after the other into a column vector, $L_{\otimes}$ is the Laplace matrix obtained from $W_{\otimes}$, $W_{\otimes}=W\otimes W$, and $\otimes$ denotes the tensor product. According to Equation (4), one can find that if $w_{ij}$ and $w_{kl}$ are both large, the learned $f_{ik}$ and $f_{jl}$ should be similar. Compared with the DSC, the QSC provides a more relaxed condition, which largely avoids the influence of noise edges in the graph. After normalization, the normalized quaternary smoothness constraint can be obtained as follows:
$$\tilde{\mathcal{S}}_{\mathrm{QSC}}(F)=\operatorname{vec}(F)^{T}\tilde{L}_{\otimes}\operatorname{vec}(F), \qquad (5)$$
where $\tilde{L}_{\otimes}$ is the Laplace matrix derived from the normalized tensor product matrix.
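The QSC operates on the tensor product graph. The sketch below is our own illustration, using the standard Kronecker-product construction assumed in Equation (4) rather than code from the paper; note that the product graph has $n^{2}$ vertices, so this direct construction is only practical for small toy examples.

```python
import numpy as np

def qsc(W, F):
    """Illustrative QSC-style term vec(F)^T L_x vec(F) on the tensor product graph."""
    W_tp = np.kron(W, W)                     # tensor (Kronecker) product graph
    L_tp = np.diag(W_tp.sum(axis=1)) - W_tp  # Laplacian of the product graph
    vec_F = F.flatten(order="F")             # stack the columns of F
    return float(vec_F @ L_tp @ vec_F)

W = np.array([[0.0, 0.9, 0.2],
              [0.9, 0.0, 0.8],
              [0.2, 0.8, 0.0]])
F = np.eye(3)                                # e.g., a BFC-style initialization of F
print(qsc(W, F))
```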
3.2. Fitting Constraint
For the binary fitting constraint, the predefined similarity between an element $x_i$ and itself is 1, while the similarities with all other elements are 0. Then, the binary fitting constraint $\mathcal{C}_{\mathrm{BFC}}(F)$ can be constructed as follows:
$$\mathcal{C}_{\mathrm{BFC}}(F)=\left\|F-Y\right\|_{F}^{2}, \qquad (6)$$
where $Y=[y_{ij}]_{n\times n}$, $y_{ii}=1$ for $i=1,2,\ldots,n$, and $y_{ij}=0$ for $i\neq j$. It can be seen that there are only two predefined values, which cannot accurately reflect the different relationships between samples.
For the weighted fitting constraint, the connection weights in $W$ are selected as the predefined values, and the constraint can be obtained based on the squared norm of the difference between $F$ and these predefined values (Equation (7)). After normalization, the normalized weighted fitting constraint can be obtained with the normalized connection weights as the predefined values (Equation (8)). As can be seen, the predefined values in Equations (7) and (8) are much more reasonable than those in the BFC; therefore, the underlying manifold structure can be effectively captured by the learned contextual similarities.
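To contrast the two fitting constraints, the following sketch builds a BFC target, in which only the self-similarity is 1, and a WFC-style target taken from the normalized affinities, and then evaluates a squared-error fitting term against each. The squared-error form and the use of the normalized affinities as WFC targets are assumptions for illustration, not the exact Equations (6)-(8).

```python
import numpy as np

def fitting_targets(W):
    """Return BFC and WFC-style target matrices for the fitting constraint."""
    n = W.shape[0]
    Y_bfc = np.eye(n)                                 # 1 for the element itself, 0 otherwise
    d_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    Y_wfc = d_inv_sqrt @ W @ d_inv_sqrt               # graded, normalized predefined values
    return Y_bfc, Y_wfc

def fitting_term(F, Y):
    """Squared-error fitting term ||F - Y||_F^2."""
    return float(np.sum((F - Y) ** 2))

W = np.array([[0.0, 0.9, 0.2],
              [0.9, 0.0, 0.8],
              [0.2, 0.8, 0.0]])
Y_bfc, Y_wfc = fitting_targets(W)
F = np.eye(3)
print(fitting_term(F, Y_bfc), fitting_term(F, Y_wfc))
```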
4. The Proposed Method
In this section, the relevant notations are summarized in Section 4.1. In Section 4.2 and Section 4.3, the improvements are introduced and analyzed from the perspectives of the smoothness constraint and the fitting constraint, respectively. Finally, the solution of the proposed model is provided in Section 4.4.
4.1. Preliminary
It is assumed that $S$ consists of two subsets, $L$ and $U$, where $L$ and $U$ contain $l$ labeled objects and $u$ unlabeled ones, respectively. We further divide $U$ into two subsets: $A$ with $a$ objects and $B$ with $b$ objects, where $A\cup B=U$ and $A\cap B=\varnothing$. According to the above definitions, we have $l+u=n$ and $a+b=u$. For the convenience of discussion, all the samples are rearranged so that the unlabeled ones are placed before the labeled ones, and those in $A$ are listed before those in $B$. Then, we have $A=\{x_1,\ldots,x_a\}$, $B=\{x_{a+1},\ldots,x_u\}$, $U=\{x_1,\ldots,x_u\}$, and $L=\{x_{u+1},\ldots,x_n\}$.
Then, $W$ can be partitioned into blocks according to the unlabeled and labeled samples (Equation (9)), where the four blocks contain the connections within $U$, from $U$ to $L$, from $L$ to $U$, and within $L$, respectively. The diagonal matrix $D$ can be partitioned in the same way (Equation (10)), where the two diagonal blocks are a $u\times u$ diagonal matrix and an $l\times l$ diagonal matrix, respectively. The $u\times u$ block can be partitioned further (Equation (11)), where the two resulting diagonal blocks are an $a\times a$ diagonal matrix and a $b\times b$ diagonal matrix, respectively. We further define two $u\times u$ diagonal matrices whose diagonal elements are determined by the connections of the unlabeled samples, and they can likewise be partitioned into blocks (Equation (12)), where the leading block is a diagonal matrix of the corresponding size.
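As a bookkeeping aid, the partitioning above can be carried out with simple index slicing, assuming the two-level ordering (A, then B, then L) described in this subsection; the block names below are illustrative rather than the paper's notation.

```python
import numpy as np

def partition(W, a, b):
    """Split W (samples ordered as A, then B, then L) into blocks plus degree blocks."""
    u = a + b                                        # number of unlabeled samples
    blocks = {
        "W_UU": W[:u, :u], "W_UL": W[:u, u:],
        "W_LU": W[u:, :u], "W_LL": W[u:, u:],
        "W_AA": W[:a, :a], "W_AB": W[:a, a:u],
    }
    d = W.sum(axis=1)
    D_U, D_L = np.diag(d[:u]), np.diag(d[u:])        # diagonal degree blocks
    return blocks, D_U, D_L

rng = np.random.default_rng(0)
W = rng.random((6, 6)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
blocks, D_U, D_L = partition(W, a=2, b=2)
print(blocks["W_UL"].shape, D_U.shape, D_L.shape)    # (4, 2) (4, 4) (2, 2)
```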
As discussed in Section 3, $f_{ij}$ denotes the contextual similarity between two samples, which can be treated as a "one-to-one" contextual similarity. Referring to the definition of $f_{ij}$, we introduce $f_{iL}$ as the "one-to-many" contextual similarity between $x_i$ and the labeled set $L$ viewed as a whole, and collect these values into a vector $f=[f_{1L},f_{2L},\ldots,f_{nL}]^{T}$, where $i=1,2,\ldots,n$. For $x_i\in L$, $f_{iL}$ is fixed to the maximum value of the range for the contextual similarity, and to the minimum value of the range for the contextual dissimilarity.
Then, only the "one-to-many" contextual similarities of the unlabeled samples need to be learned, and the objective function based on them can be obtained as follows (Equation (13)), where the two terms are the smoothness constraint and the fitting constraint, and a hyperparameter is used to balance them.
One can find that the "one-to-one" contextual similarity is actually a special case of the "one-to-many" contextual similarity. With $L$ containing only one element, the "one-to-many" contextual similarity degrades into the "one-to-one" contextual similarity.
4.2. Triple-Cosine Smoothness Constraint
Given two unlabeled samples $x_i$ and $x_j$, together with the labeled set $L$ viewed as a whole, a coordinate system can be generated by taking the pairwise similarities as the cosine values of the angles between the coordinate axes. Specifically, as shown in Figure 1, with the origin $O$, the coordinate system includes the $I$ axis, the $J$ axis, and the $L$ axis, which correspond to $x_i$, $x_j$, and the labeled set, respectively, and the angle between the $I$ axis and the $J$ axis is determined by the similarity between $x_i$ and $x_j$. Two unit vectors lie on the planes $IOL$ and $JOL$, respectively; the first forms an angle with the $L$ axis determined by the relationship between $x_i$ and $L$, and the second forms an angle with the $L$ axis determined by the relationship between $x_j$ and $L$.
According to the above definitions, the cosine value of the angle between the $I$ axis and the $J$ axis is determined by the connection between $x_i$ and $x_j$. The cosine value of the angle between the first vector and the $L$ axis is calculated as the mean of all the connections between $x_i$ and the elements of $L$, and the cosine value of the angle between the second vector and the $L$ axis is calculated as the mean of all the connections between $x_j$ and the elements of $L$. The angle between the two vectors is denoted as $\theta$, and the cosine value of $\theta$ is referred to as the triple-cosine. As shown in Equation (A1), it can be expressed in terms of these quantities (Equation (14)).
According to Equation (14), one can find that the high-order relationships between data points are implicitly considered during the calculation of the triple-cosine. Then, the triple-cosine smoothness constraint can be defined based on it (Equation (15)). As shown in Equation (A3), the triple-cosine is used to limit how close the "one-to-many" contextual similarities of $x_i$ and $x_j$ are. Unlike the DSC, which uses a single connection weight $w_{ij}$ to restrict the contextual similarities, the triple-cosine is the result of multiple factors, including $w_{ij}$ and the mean connections between $x_i$ and $L$ and between $x_j$ and $L$. So, $w_{ij}$ has an impact on the triple-cosine, but the value of the triple-cosine is not entirely determined by $w_{ij}$. For example, if $x_i$ and $x_j$ are very close but $w_{ij}$ is incorrectly set to a small value, $w_{ij}$ alone cannot correctly reflect the relationship between the contextual similarities. However, in Equation (14), even if the value of $w_{ij}$ is small, the triple-cosine can still be large, because, owing to the closeness of $x_i$ and $x_j$, their mean connections to $L$ may both be very large. Therefore, one can find that the proposed smoothness constraint is insensitive to incorrect connections and noise edges and can be used to improve the process of learning contextual similarities.
According to Equations (15) and (A3), the proposed smoothness constraint can be written in matrix form (Equation (16)), where a diagonal matrix is generated from the mean connections between the samples and $L$, and the Laplace matrix is obtained from the triple-cosine weight matrix. The triple-cosine weight matrix itself can be obtained with the Hadamard product, denoted by $\circ$, of the relevant connection matrices (Equation (17)). After normalization, we obtain the final smoothness constraint (Equation (18)), where the Laplace matrix is obtained from the normalized triple-cosine weight matrix.
4.3. Hybrid Fitting Constraint Based on the Squared $\ell_2$ Norm and the $\ell_1$ Norm
Unlike the previous fitting constraints, which treat all the predefined values equally, two different types of restrictions are created according to the confidence of the predefined values. Specifically, based on whether $x_i$ is directly connected to the elements in $L$, the elements of $U$ can be classified into two categories. Firstly, for each $x_i$ with at least one edge to $L$, the predefined value can be clearly and precisely generated, and a squared $\ell_2$-norm term is constructed. On the other hand, for each $x_i$ with no edge to $L$, we treat it equally with a rough predefined value, and an $\ell_1$-norm term is selected. Then, we have the hybrid fitting constraint (Equation (19)), where the weights are the diagonal elements of the diagonal matrices defined in Section 4.1 and a constant denotes the rough predefined value.
It is important to note that the quantity learned in Equation (19) is a contextual dissimilarity rather than a contextual similarity. Specifically, if $x_i$ is close to $L$, the predefined value is set to a small value, whereas if $x_i$ is far from $L$, the predefined value is set to a very large value. With these predefined values substituted in, the hybrid fitting constraint can be written more compactly (Equation (20)), where the remaining term involves only a constant.
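The idea of the hybrid fitting constraint can be illustrated with the rough sketch below, which follows our reading of the text: a squared penalty toward precise targets for unlabeled samples with at least one edge to $L$, and an absolute-value penalty toward a single rough constant for the rest. The weights and targets of the actual Equation (19) are not reproduced here, and the helper names are hypothetical.

```python
import numpy as np

def hybrid_fitting(f_u, W_UL, y_precise, y_rough):
    """Sketch of a hybrid fitting term for the unlabeled contextual dissimilarities f_u.

    Samples with at least one edge to the labeled set get a squared penalty toward
    precise targets; unconnected samples get an l1-style penalty toward a rough constant.
    """
    connected = W_UL.sum(axis=1) > 0                 # at least one edge to L?
    sq_term = np.sum((f_u[connected] - y_precise[connected]) ** 2)
    l1_term = np.sum(np.abs(f_u[~connected] - y_rough))
    return float(sq_term + l1_term)

W_UL = np.array([[0.8], [0.0], [0.3]])               # connections of 3 unlabeled samples to L
f_u = np.array([0.2, 0.9, 0.4])                      # learned contextual dissimilarities
y_precise = 1.0 - W_UL[:, 0]                         # illustrative precise targets
print(hybrid_fitting(f_u, W_UL, y_precise, y_rough=1.0))
```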
4.4. Solution of Proposed Model
For convenience, we first combine one of the smoothness constraints reviewed in Section 3 with the hybrid fitting constraint to construct the objective function and solve the optimization problem.
According to the previous analysis, the smoothness term can be defined with the corresponding Laplace matrix (Equation (21)), and the resulting objective function follows (Equation (22)). By taking the partial derivative of the objective with respect to the unknown vector of contextual dissimilarities, we obtain Equation (23). Then, by setting Equation (23) to zero, the solution satisfies Equation (24).
We further introduce a diagonal matrix so that Equation (24) can be transformed into Equation (25). Since the graph $G$ is strongly connected, similar to the proof in [19], it can be proven that the spectral radius of the relevant iteration matrix is smaller than 1. So, the matrix in Equation (25) is invertible, and the solution can be written in closed form (Equation (26)). For the problem of object retrieval and re-ranking, instead of the exact values of the learned contextual similarities, we actually care more about their ranking, which can be directly calculated using Equation (26).
The closed-form solution in Equation (26) requires a matrix inverse, which is time-consuming and complicated to compute. Since the spectral radius of the iteration matrix is smaller than 1, the inverse can be expanded as a convergent series, and an iterative solution can be obtained by directly replacing the inverse in Equation (26) with a truncated series expansion.
As discussed previously, the triple-cosine smoothness constraint and the hybrid fitting constraint are selected as the smoothness constraint and the fitting constraint, and the proposed model is referred to as TH-RDP. The corresponding objective function follows (Equation (27)). Similar to the solving process of Equation (22), the closed-form solution can be written as in Equation (28), where the diagonal matrix collects the corresponding diagonal elements. The iterative solution can be obtained by replacing the inverse matrix with its series expansion. Then, it is easy to find that the time complexity of the closed-form solution is $O(n^{3})$, and the time complexity of the iterative solution is $O(Tn^{2})$, where $T$ is the number of iterative steps.
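The trade-off between the two solutions can be seen in the following sketch, which is our own illustration of a generic system of the form $f=(I-\alpha P)^{-1}q$ with the spectral radius of $\alpha P$ below 1, as discussed above: the closed form uses a direct inverse ($O(n^{3})$), while the iterative solution truncates the corresponding series after $T$ steps ($O(Tn^{2})$).

```python
import numpy as np

def closed_form(P, q, alpha=0.9):
    """Direct solution f = (I - alpha * P)^{-1} q, which costs O(n^3)."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * P, q)

def iterative(P, q, alpha=0.9, T=50):
    """Truncated series f_T = sum_{t=0}^{T} (alpha * P)^t q, which costs O(T * n^2)."""
    f = q.copy()
    term = q.copy()
    for _ in range(T):
        term = alpha * (P @ term)   # next term of the series
        f += term
    return f

W = np.array([[0.0, 0.9, 0.2],
              [0.9, 0.0, 0.8],
              [0.2, 0.8, 0.0]])
P = W / W.sum(axis=1, keepdims=True)      # row-stochastic, so rho(alpha * P) < 1
q = np.array([1.0, 0.0, 0.0])
print(np.allclose(closed_form(P, q), iterative(P, q, T=500)))   # True
```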
4.5. Iterative Re-Ranking Based on Proposed Model
As discussed in [19], the most difficult problem facing the diffusion process is that, for a relevant object very far away from the query, it takes a long time to complete the information propagation. A feasible approach to address this problem is to embed the RDP in an iterative re-ranking process. Specifically, in each iteration of this process, several top-ranked samples are selected and treated as queries in the next iteration, together with those from the current iteration. In this way, the relevant objects can be selected and identified incrementally, and the underlying manifold structure can be gradually captured.
Moreover, in each iteration, only a few top-ranked objects are filtered out and added to the set of queries, so the samples with long paths to the queries do not need to be taken into account. In iteration $t$, given the current set of labeled samples, the set of unlabeled samples to be ranked in this iteration is generated. During the iteration, the top-ranked objects are selected by the proposed model, and the labeled set for the next iteration is then generated. The iterative re-ranking process based on TH-RDP is referred to as I-TH-RDP.
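The re-ranking loop can be sketched as follows. This is our own illustration of the procedure described above: `mean_dissimilarity` is a hypothetical stand-in for the TH-RDP ranking step, and the expansion size `k` and number of iterations are free parameters.

```python
import numpy as np

def iterative_rerank(W, query_ids, rank_fn, k=3, iters=3):
    """Sketch of an I-TH-RDP-style loop: grow the query set from near to far."""
    labeled = list(query_ids)
    ranking = []                                     # re-ranked unlabeled objects
    for _ in range(iters):
        unlabeled = [i for i in range(W.shape[0]) if i not in labeled]
        if not unlabeled:
            break
        scores = rank_fn(W, labeled, unlabeled)      # smaller score = more relevant
        ranked = [unlabeled[i] for i in np.argsort(scores)]
        accepted = ranked[:k]                        # top-k objects of this round
        ranking.extend(accepted)
        labeled.extend(accepted)                     # they become queries next round
    seen = set(query_ids) | set(ranking)
    ranking.extend(i for i in range(W.shape[0]) if i not in seen)
    return ranking

def mean_dissimilarity(W, labeled, unlabeled):
    """Hypothetical stand-in for the TH-RDP ranking step."""
    return [1.0 - W[labeled, j].mean() for j in unlabeled]

rng = np.random.default_rng(0)
W = rng.random((10, 10)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
print(iterative_rerank(W, query_ids=[0], rank_fn=mean_dissimilarity))
```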
As shown in Section 3, the regularized diffusion process is carried out on the basis of the graph $G$, which is generated by $W$. In practice, $W$ is obtained from the pairwise measure, which can be either a distance or a similarity. If the baseline is a distance measure, the corresponding similarity can be obtained with a kernel function (Equation (29)), where the kernel width can be determined by kNN [25]. To eliminate the impact of noisy connections, $W$ is further constructed on the basis of reciprocal kNN. In Section 3, it is assumed that $G$ is a strongly connected graph, which does not always hold in reality. On the contrary, $G$ may be composed of multiple connected components. So, after all the elements from the connected components containing queries have been re-ranked, one can create temporary connections by using the filtered similarities to ensure that the proposed method remains applicable.
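The graph construction step can be sketched as follows, assuming the commonly used Gaussian kernel with a kNN-based width and a reciprocal-kNN filter; the exact form of Equation (29) and the filtering rule used in the paper may differ.

```python
import numpy as np

def build_affinity(dist, k=5):
    """Turn a pairwise distance matrix into a sparse, reciprocal-kNN affinity matrix."""
    n = dist.shape[0]
    knn_idx = np.argsort(dist, axis=1)[:, 1:k + 1]   # k nearest neighbors (skip self)
    sigma = dist[np.arange(n)[:, None], knn_idx].mean(axis=1)   # kNN-based kernel width
    W = np.exp(-dist ** 2 / (sigma[:, None] * sigma[None, :]))  # Gaussian kernel
    np.fill_diagonal(W, 0)
    # Keep an edge only if i and j appear in each other's kNN lists (reciprocal kNN).
    knn_mask = np.zeros((n, n), dtype=bool)
    knn_mask[np.arange(n)[:, None], knn_idx] = True
    return W * (knn_mask & knn_mask.T)

rng = np.random.default_rng(0)
X = rng.random((20, 2))
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
W = build_affinity(dist, k=4)
print(W.shape, (W > 0).sum())   # graph size and number of surviving edges
```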
5. Experimental Results
To demonstrate the advantages of our methods, experiments are conducted in this section, and the performance of our methods is evaluated. It is worth noting that in the following experiments, each category of every selected database contains a substantial number of samples, ranging from 20 to 100. This choice is based on two considerations. On the one hand, the underlying manifold structure needs to be supported by sufficient samples from each category so that the contextual information can be effectively reflected. On the other hand, with many samples per class, the underlying manifold structure is likely to be complicated. In this way, the designed experimental tasks are challenging, and the results can reasonably verify the effectiveness of the different methods.
On the basis of an identical baseline, the proposed methods, TH-RDP and I-TH-RDP, are compared with the following state-of-the-art algorithms: LP [25], mkNN [30], TPG [9], GDP [14], and SCA [36]. To provide fair comparisons between our algorithms and the competitors, we run the implementations based on the code released by the authors.
In the following, the P–R curve is employed to analyze the performance of the different methods, with the horizontal axis and the vertical axis representing precision and recall, respectively. The precision and recall values are defined as follows:
$$\text{precision}=\frac{|TP|}{|TP|+|FP|}, \qquad (30)$$
and
$$\text{recall}=\frac{|TP|}{|TP|+|FN|}, \qquad (31)$$
where $TP$ denotes the data points of the same category as the query in the top $K$ ranks, $FP$ denotes those from different categories than the query in the top $K$ ranks, and $FN$ denotes those of the same category as the query but outside the top $K$ ranks. In the P–R curve, we vary $K$ up to $N$, where $N$ is the number of data points from the same category. Significantly, when $K=N$, the value of precision equals that of recall, and this value is the retrieval score, which is also employed to evaluate performance.
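For completeness, the evaluation protocol described above can be computed as in the following sketch (our own code); `retrieval_score` corresponds to the case in which $K$ equals the class size.

```python
def precision_recall_at_k(ranked_labels, query_label, k, class_size):
    """Precision and recall over the top-k ranked labels for a single query."""
    tp = sum(1 for lab in ranked_labels[:k] if lab == query_label)
    return tp / k, tp / class_size

def retrieval_score(ranked_labels, query_label, class_size):
    """Precision (= recall) when k equals the number of same-class data points."""
    precision, _ = precision_recall_at_k(ranked_labels, query_label, class_size, class_size)
    return precision

ranked = ["cat", "cat", "dog", "cat", "bird", "dog"]    # labels sorted by similarity
print(precision_recall_at_k(ranked, "cat", k=4, class_size=3))   # (0.75, 1.0)
print(retrieval_score(ranked, "cat", class_size=3))              # ~0.667
```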
5.1. Tari-1000 Shape Dataset
The proposed methods are tested on the Tari-1000 dataset [37] in this subsection. In this database, 1000 shapes are divided into 50 categories, and each category contains 20 silhouettes. Figure 2 shows examples from each class in the dataset. The retrieval scores of the baselines, the shape context (SC) [38] and the inner-distance shape context (IDSC) [39], were 88.01% and 90.43%, respectively. IDSC performed better than SC on this database because of the many articulation changes within each class.
Table 1 shows the retrieval scores obtained by the proposed methods and the other approaches, while the P–R curves for these methods are plotted in Figure 3. With IDSC as the baseline, the proposed methods achieved near-perfect retrieval scores: 99.73% for TH-RDP and 99.69% for I-TH-RDP. With SC as the baseline, our approaches achieved 97.55% for TH-RDP and 97.48% for I-TH-RDP, improving the original scores by over 9%. Specifically, as observed in Figure 3a,b, previous methods, including SCA [36], TPG [9], and LP [25], exhibited performance similar to that of the proposed approaches but demonstrated lower precision at high ranges of recall. The retrieval scores in Table 1 and the P–R curves in Figure 3 clearly demonstrate that the proposed methods perform better than the other methods based on the same baseline results.
5.2. Animal Shape Dataset
The Animal dataset, which contains 2000 manually extracted shapes across 20 categories, was first introduced in [40]. In this database, each class provides a larger number of images, resulting in greater variation within a single category [40,41]. This dataset was previously used for classification experiments [41]; however, in this subsection, it is selected for testing retrieval performance. The baseline retrieval scores on the Animal dataset were 38.38% for SC and 40.50% for IDSC.
Table 2 lists the re-ranking results of the different methods, while the P–R curves of our approaches and the competitors are plotted in Figure 4. With IDSC as the baseline, the proposed methods clearly provided the best curves across the ranges of recall, as well as the highest retrieval scores: 57.52% for TH-RDP and 57.84% for I-TH-RDP. Meanwhile, with SC as the baseline measure, TH-RDP and I-TH-RDP outperformed the other methods across most ranges of the P–R curves, achieving retrieval scores of 54.88% and 55.53%, respectively. Overall, the P–R curves and retrieval scores clearly demonstrate that the proposed methods deliver better results than the other methods and that the baselines are improved significantly.
5.3. LEAF Dataset from UCI
Another real-world leaf (LEAF) dataset from the UCI repository is introduced in this subsection. This dataset, as reported in [42], comprises 40 different plant species. A subset of 30 species, numbered from 1 to 15 and from 22 to 36, has been constructed; according to the description of the dataset, these species exhibit simple leaves. Binary images of the leaves from the selected species have been released together with the original images, and Figure 5 shows some examples of the color images from this database. For the previous databases in this paper, each class provided an identical number of samples. However, this dataset is imbalanced, and the number of leaves per class in the subset ranges from 8 to 16. So, given query objects from various species, we set different values of $K$, equal to the corresponding class sizes, to evaluate the retrieval scores. We obtained retrieval scores of 66.10% for SC and 65.51% for IDSC.
The retrieval performances of the different methods are listed in Table 3. As shown in Table 3, with IDSC as the baseline, we achieved retrieval scores of 74.06% for TH-RDP and 74.19% for I-TH-RDP. On the other hand, with SC as the baseline, the proposed approaches yielded retrieval scores of 74.44% for TH-RDP and 74.92% for I-TH-RDP. The retrieval results in Table 3 demonstrate that the proposed methods achieved significant improvements even though the dataset is imbalanced, and our approaches performed slightly better than the state-of-the-art methods. Moreover, as can be seen in Table 3, our methods boosted the retrieval scores of SC and IDSC by over 8%, so the baseline results were significantly improved.
5.4. ORL Face Dataset
In this subsection, an experiment on the ORL face database, downloaded from the website of AT&T Laboratories Cambridge, is carried out. The ORL face database is a well-known dataset widely used for face recognition that contains 400 grayscale face images across 40 categories. The images in this database were obtained at different times, with varying lighting, facial expressions, and facial details. Similar to [14], with each image downsampled and normalized to zero mean and unit variance, the vectorized representation (VR) was taken as the feature, and the norm of the difference between feature vectors was selected as the input pairwise distance. Instead of using a retrieval window ranging in size from 11 to 20 [14,16], we used the retrieval score corresponding to a retrieval window of size 10 as the evaluation metric, which is a more rigorous measure. The retrieval score of this baseline method was 57.13%.
The results of the different methods are listed in Table 4, and the corresponding P–R curves are plotted in Figure 6. As shown in Table 4, with the VR as the baseline method, TH-RDP and I-TH-RDP achieved retrieval scores of 74.42% and 74.08%, respectively. Therefore, the baseline result was greatly and effectively improved by our approaches, and the original retrieval score was improved by around 14%. As can be seen in Figure 6, the P–R curves of our methods achieved the second- and third-best re-ranking results on this dataset.
6. Conclusions
In this paper, with high-order information taken into account, we proposed a novel variant of the regularized diffusion process with the triple-cosine smoothness constraint and the hybrid fitting constraint. The triple-cosine smoothness constraint is constructed using the cosine of the angle in a coordinate system created from a group with three elements. For the hybrid fitting constraint, the squared $\ell_2$-norm and $\ell_1$-norm terms are adaptively generated according to the predefined values. The closed-form solution and the iterative solution of the proposed model are given. Moreover, since the learned contextual dissimilarities can be used to represent "one-to-many" relationships, an iterative re-ranking process based on the proposed model is further presented. The experimental results on different databases demonstrate that the proposed methods can significantly improve the baselines.
Some issues still need to be addressed in the future. The effectiveness of the proposed method largely depends on the weighted graph, which is generated by filtering noise from the baseline measures, so how to eliminate noise edges effectively will be investigated further. Moreover, the time complexity of learning the contextual similarities is about $O(Tn^{2})$ even with the iterative solution, and reducing it to meet the requirements of real-time applications is worth studying. Finally, with the rapid development of deep learning, deep features have been increasingly used to analyze pairwise (dis)similarities, and desirable results can often be achieved [2]. In this case, improving the retrieval results obtained with deep features is a task that should be continuously considered and studied in the future.
Author Contributions
Conceptualization, M.D. and J.C.; Methodology, M.D.; Software, M.D.; Validation, M.D.; Formal analysis, M.D.; Writing—original draft, M.D.; Writing—review and editing, J.C.; Supervision, J.C.; Project administration, J.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
Data is contained within the article.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A
Appendix A.1. Derivation Process of Equation (A3)
As can be seen in Figure A1, two auxiliary points are obtained as the projections of the two unit vectors defined in Section 4.2 onto the corresponding coordinate plane. Each unit vector, its projection, and the origin form right-triangle relations, from which the lengths of the projected and perpendicular components can be expressed through the sines and cosines of the angles between the vectors and the $L$ axis, and analogous relations hold for the second vector. For the cosine of the angle between the two vectors, we derive the following:
Figure A1. Calculation of the cosine of the angle in the coordinate system.
Starting from the endpoint of the first projected component, a vector with the same magnitude and direction as the second projected component is drawn, which allows the difference between the two unit vectors to be decomposed. Using the right-triangle relations above, the lengths of the resulting segments can be expressed through the angles defined in Section 4.2, and the inner product of the two unit vectors follows (Equation (A2)). The triple-cosine can finally be obtained as follows:
References
- Croitoru, F.A.; Hondru, V.; Ionescu, R.T.; Shah, M. Diffusion Models in Vision: A Survey. IEEE Trans. Patt. Anal. Mach. Intell. 2023, 45, 10850–10869. [Google Scholar] [CrossRef] [PubMed]
- Chen, W.; Liu, Y.; Wang, W.; Bakker, E.M.; Georgiou, T.; Fieguth, P.; Liu, L.; Lew, M.S. Deep Learning for Instance Retrieval: A Survey. IEEE Trans. Patt. Anal. Mach. Intell. 2023, 45, 7270–7292. [Google Scholar] [CrossRef] [PubMed]
- Leticio, G.R.; Kawai, V.S.; Valem, L.P.; Pedronette, D.C.G.; da S. Torres, R. Manifold information through neighbor embedding projection for image retrieval. Pattern Recognit. Lett. 2024, 183, 17–25. [Google Scholar] [CrossRef]
- Guo, Q.; Feng, W.; Gao, R.; Liu, Y.; Wang, S. Exploring the Effects of Blur and Deblurring to Visual Object Tracking. IEEE Trans. Image Process. 2021, 30, 1812–1824. [Google Scholar] [CrossRef]
- Chi, R.; Zhang, H.; Huang, B.; Hou, Z. Quantitative Data-Driven Adaptive Iterative Learning Control: From Trajectory Tracking to Point-to-Point Tracking. IEEE Trans. Cybern. 2022, 52, 4859–4873. [Google Scholar] [CrossRef] [PubMed]
- Bicego, M.; Lovato, P. A bioinformatics approach to 2D shape classification. Comput. Vis. Image Underst. 2015, 145, 59–69. [Google Scholar] [CrossRef]
- Schmidt, A.; Mohareri, O.; DiMaio, S.; Yip, M.C.; Salcudean, S.E. Tracking and mapping in medical computer vision: A review. Med. Image Anal. 2024, 94, 103131. [Google Scholar] [CrossRef]
- Pereira-Ferrero, V.; Lewis, T.; Valem, L.; Ferrero, L.; Pedronette, D.; Latecki, L. Unsupervised affinity learning based on manifold analysis for image retrieval: A survey. Comput. Sci. Rev. 2024, 53, 100657. [Google Scholar] [CrossRef]
- Yang, X.; Prasad, L.; Latecki, L.J. Affinity learning with diffusion on tensor product graph. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 28–38. [Google Scholar] [CrossRef]
- Yang, X.; Bai, X.; Latecki, L.J. Improving Shape Retrieval by Learning Graph Transduction. In Proceedings of the European Conference on Computer Vision, Marseille, France, 12–18 October 2008; Volume 5305, pp. 788–801. [Google Scholar]
- Pedronette, D.C.G.; Penatti, O.A.B.; Torres, R.d.S. Unsupervised manifold learning using Reciprocal kNN Graphs in image re-ranking and rank aggregation tasks. Image Vis. Comput. 2014, 32, 120–130. [Google Scholar] [CrossRef]
- Bai, X.; Bai, S.; Wang, X. Beyond diffusion process: Neighbor set similarity for fast re-ranking. Inf. Sci. 2015, 325, 342–354. [Google Scholar] [CrossRef]
- Bai, S.; Sun, S.; Bai, X.; Zhang, Z.; Tian, Q. Smooth Neighborhood Structure Mining on Multiple Affinity Graphs with Applications to Context-Sensitive Similarity. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 592–608. [Google Scholar]
- Donoser, M.; Bischof, H. Diffusion Processes for Retrieval Revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 1320–1327. [Google Scholar]
- Luo, L.; Shen, C.; Zhang, C.; Hengel, A.v.d. Shape Similarity Analysis by Self-Tuning Locally Constrained Mixed-Diffusion. IEEE Trans. Multimed. 2013, 15, 1174–1183. [Google Scholar] [CrossRef]
- Bai, S.; Bai, X.; Tian, Q.; Latecki, L.J. Regularized Diffusion Process for Visual Retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 3967–3973. [Google Scholar]
- Zheng, D.; Liu, W.; Han, M. Learning contextual dissimilarity on tensor product graph for visual re-ranking. Image Vis. Comput. 2018, 79, 1–10. [Google Scholar] [CrossRef]
- Bai, S.; Zhou, Z.; Wang, J.; Bai, X.; Latecki, L.J.; Tian, Q. Automatic Ensemble Diffusion for 3D Shape and Image Retrieval. IEEE Trans. Image Process. 2018, 28, 88–101. [Google Scholar] [CrossRef]
- Zheng, D.; Fan, J.; Han, M. Hybrid Regularization of Diffusion Process for Visual Re-Ranking. IEEE Trans. Image Process. 2021, 30, 3705–3719. [Google Scholar] [CrossRef] [PubMed]
- Bai, S.; Bai, X.; Tian, Q.; Latecki, L.J. Regularized Diffusion Process on Bidirectional Context for Object Retrieval. IEEE Trans. Patt. Anal. Mach. Intell. 2019, 41, 1213–1226. [Google Scholar] [CrossRef]
- Ma, J.; Xu, F.; Rong, X. Discriminative multi-label feature selection with adaptive graph diffusion. Pattern Recognit. 2024, 148, 110154. [Google Scholar] [CrossRef]
- Zhang, Z.; Wang, J.; Zhu, L.; Luo, Y.; Lu, G. Deep collaborative graph hashing for discriminative image retrieval. Pattern Recognit. 2023, 139, 109462. [Google Scholar] [CrossRef]
- Page, L.; Brin, S.; Motwani, R.; Winograd, T. The PageRank citation ranking: Bringing order to the web. In Technical Report; Stanford InfoLab Publication Server: Brooklyn, NY, USA, 1999. [Google Scholar]
- Zhou, D.; Bousquet, O.; Lal, T.N.; Weston, J.; Olkopf, B.S. Learning with Local and Global Consistency. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 13–18 December 2004; pp. 321–328. [Google Scholar]
- Bai, X.; Yang, X.; Latecki, L.J.; Liu, W.; Tu, Z. Learning Context-Sensitive Shape Similarity by Graph Transduction. IEEE Trans. Patt. Anal. Mach. Intell. 2010, 32, 861–874. [Google Scholar]
- Zhu, X. Semi-Supervised Learning with Graphs. Ph.D. Thesis, Carnegie Mellon University, Language Technologies Institute, School of Computer Science, Pittsburgh, PA, USA, 2005. [Google Scholar]
- Wang, J.; Li, Y.; Bai, X.; Zhang, Y.; Wang, C.; Tang, N. Learning context-sensitive similarity by shortest path propagation. Pattern Recognit. 2011, 44, 2367–2374. [Google Scholar] [CrossRef]
- Bai, X.; Wang, B.; Yao, C.; Liu, W.; Tu, Z. Co-Transduction for Shape Retrieval. IEEE Trans. Image Process. 2012, 21, 2747–2757. [Google Scholar] [PubMed]
- Yang, X.; Koknar-Tezel, S.; Latecki, L.J. Locally constrained diffusion process on locally densified distance spaces with applications to shape retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 357–364. [Google Scholar]
- Kontschieder, P.; Donoser, M.; Bischof, H. Beyond Pairwise Shape Similarity Analysis. In Proceedings of the Asian Conference on Computer Vision, Xi’an, China, 23–27 September 2009; Volume 5996, pp. 655–666. [Google Scholar]
- Egozi, A.; Keller, Y.; Guterman, H. Improving Shape Retrieval by Spectral Matching and Meta Similarity. IEEE Trans. Image Process. 2010, 19, 1319–1327. [Google Scholar] [CrossRef] [PubMed]
- Li, Q.; An, S.; Liu, W.; Li, L. Semisupervised Learning on Graphs with an Alternating Diffusion Process. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 2862–2874. [Google Scholar] [CrossRef] [PubMed]
- Moradi, M.; Bayat, F. A salient object segmentation framework using diffusion-based affinity learning. Expert Syst. Appl. 2021, 168, 114428. [Google Scholar] [CrossRef]
- Valem, L.P.; Pedronette, D.C.G.; Latecki, L.J. Rank Flow Embedding for Unsupervised and Semi-Supervised Manifold Learning. IEEE Trans. Image Process. 2023, 32, 2811–2826. [Google Scholar] [CrossRef]
- Presotto, J.G.C.; Valem, L.P.; de Sa, N.G.; Pedronette, D.C.G.; Papa, J.P. Weakly supervised classification through manifold learning and rank-based contextual measures. Neurocomputing 2024, 589, 127717. [Google Scholar] [CrossRef]
- Bai, S.; Bai, X. Sparse Contextual Activation for Efficient Visual Re-ranking. IEEE Trans. Image Process. 2016, 25, 1056–1069. [Google Scholar] [CrossRef]
- Baseski, E.; Erdem, A.; Tari, S. Dissimilarity between two skeletal trees in a context. Pattern Recognit. 2009, 42, 370–385. [Google Scholar] [CrossRef]
- Belongie, S.; Malik, J.; Puzicha, J. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 509–522. [Google Scholar] [CrossRef]
- Ling, H.; Jacobs, D.W. Shape Classification Using the Inner-Distance. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 286–299. [Google Scholar] [CrossRef]
- Bai, X.; Liu, W.; Tu, Z. Integrating contour and skeleton for shape classification. In Proceedings of the IEEE International Conference on Computer Vision, Kyoto, Japan, 27 September–4 October 2009; pp. 360–367. [Google Scholar]
- Xiang, B.; Cong, R.; Xinggang, W. Shape Vocabulary: A Robust and Efficient Shape Representation for Shape Matching. IEEE Trans. Image Process. 2014, 23, 3935–3949. [Google Scholar]
- Silva, P.F.B.; Marçal, A.R.S.; Silva, R.M.A.D. Evaluation of Features for Leaf Discrimination. In Springer Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; Volume 7950, pp. 197–204. [Google Scholar]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).