1. Introduction
Computer vision is an important research direction in computer science, occupying a pivotal position in the simulation of human perception. Automatically finding correspondences between object features in images is of interest for several applications, such as target tracking [1], 3D object retrieval [2], pattern recognition [3], image stitching [4], and many other fields.
Image matching is used to determine the geometric alignment of two or more images of the same scene taken by the same or different sensors, from different viewpoints, at the same or different times. We can distinguish dense correspondence, which determines correspondences at the pixel level, from sparse correspondence, which determines correspondences between a sparse set of higher-level features first extracted from the images. Most of the time such features represent invariant information at some location in the image, such as corners, edges, or gradients. Since we are interested in sparse correspondence, we study the standard methods for automatic extraction of feature point sets from images. This is a critical step, since it should avoid the presence of outliers and should allow objects to be discriminated easily. We propose a feature extraction method suited to first-order matching.
Local feature descriptors, that is, the detailed feature detection and feature description information, play a fundamental role in the process of feature correspondence, directly affecting the accuracy and objective score of graph matching (GM). High-quality local feature descriptors describe key points with uniqueness, repeatability, accuracy, compactness, and effective representation. Such key points remain robust and stable under scaling, rotation, affine transformation, illumination change, and occlusion [5]. Here we focus on the theoretical and mathematical descriptions of various local feature descriptors. In feature point detection algorithms, local descriptors are typically used to describe the image regions near feature points. Current methods for extracting feature points include the Harris descriptor [6], the Gilles descriptor [7], the LoG descriptor [8], the corner detector (CD) [9], the Harris-Laplace descriptor [10], the SIFT descriptor [11], and so on. Among the feature extraction algorithms proposed in the literature, the Marr wavelet approach originally used in [12] is favored for several properties: robustness against distortion, rotational invariance, and insensitivity to noise [13]. We choose to focus on the latter, whereas the other methods will serve as a basis for comparative evaluation.
Given a pair of images, detecting and extracting feature points is the first step of image matching. Automatic feature extraction is the key point, and the related image matching can then be performed based on the feature-to-feature correspondence. Given some standard nearest-neighbor matching strategies, the problem to be solved here is how to improve the reliability of the feature set. To this end, we combine and enhance standard, easy-to-implement feature detection methods so that the final overall method (including feature detection and matching) is competitive in terms of computation time and matching quality.
To obtain better matching results, the proposed method inserts a Laplace filter-based image preprocessing step before detecting feature points, in order to increase the size of the candidate feature set. Since the Marr wavelet within scale-interaction method is more inclined to extract the edge information of objects, the Laplacian can be used to enhance the edge details of the image. Then, a sparse feature point method based on entropy selection is proposed as a new filtering step. Filtering (also known as convolution) is a very popular operation in the field of image processing, which can even be applied to image encryption by changing pixel values [14]. This step combines a local entropy evaluation with the brightness deviation response as a new process for feature selection.
Entropy is a key concept in thermodynamics and statistical mechanics. It not only plays a special role among physical quantities, but also relates to the macro and micro aspects of nature and determines the behavior of macroscopic systems. Entropy is a well-defined quantity regardless of the type or size of the system under consideration, and it has many general properties, for instance invariance, continuity, additivity, and concavity [15]. The probability distribution underlying entropy can be interpreted not only as a measure of uncertainty, but also as a measure of information [16]. Local entropy represents structured information: it accounts for the probability of occurrence of the gray levels in a sub-image, independently of any single pixel. We claim that it is not sensitive to the influence of noise and can improve the accuracy of the image description. Based on a mesh division, the feature points in the sub-regions can be sorted according to their local entropy and selected through deviation values. Entropy selection can then not only effectively reduce the number of useless feature points to save time, but also ensure the uniformity of the feature point distribution. The main advantage of the proposed entropy and response algorithm is to realize a good compromise (trade-off) between accuracy and computation time when compared to other standard approaches from the literature. It is worth noting that both accuracy and computation time are essential criteria to compare and evaluate heuristic methods [17,18].
A graph matching procedure based on the normalized cross-correlation similarity measure is applied to measure the effectiveness of the method for image matching. Quality assessment relies on the Random Sample Consensus (RANSAC) algorithm, which eliminates mismatched pairs and allows computing the recall of true matches. The experiments were performed on standard image processing benchmarks. They show how to increase the size of the feature set and the matching accuracy while saving computation time.
The contribution of this paper is two optimized algorithms for graph matching, one first-order and one second-order. Both are based on the normalized cross-correlation (NCC) algorithm, which allows us to address graph matching and derived sub-problems, with experiments on integer quadratic programming (IQP) models in the Matlab platform. For first-order graph matching in particular, we propose a new combination of feature point detection algorithms based on the Laplace filter, Marr wavelets, and the entropy-response based selection method.
This paper is organized as follows. Section 2 introduces the motivations and taxonomy of graph matching. The formulation of standard graph matching is explained in Section 3. In Section 4, the different steps of the proposed feature extraction and first-order graph matching procedures are presented. A preliminary version of this section appeared as part of our previous conference paper [19]. We then extend the NCC based second-order graph matching in Section 5. The evaluation of the two proposed algorithms and their comparison with other algorithms are reported in Section 6. Section 7 presents the conclusion.
3. General Formulation of Graph Matching Problem
This section introduces the general representation of traditional graph matching and the definition of the high-order graph matching problem. The main purpose is to find a one-to-one correspondence mapping between two feature sets extracted from two image sources, so as to maximize a score function over the set of correspondence pairs. In first-order matching, only local attribute descriptors are considered and evaluated, whereas, in the general case of graph matching, second-order potentials between pairs (edges) of features must also be maximized to establish the similarity between edges of features. The high-order GM methods, in turn, take invariant geometric information into account by considering the relationship between tuples of feature points. The input feature graph becomes a hyper-graph, where edges are replaced by hyper-edges, that is, subsets of $k$ points with order $k \geq 3$, rather than only considering couples of points.
Suppose we are given a pair of graphs $G^P = (P, E^P)$ with $n^P$ feature points for the reference graph, and $G^Q = (Q, E^Q)$ with $n^Q$ feature points for the query graph. $P$ and $Q$ are the two sets of feature points, and $E^P$ and $E^Q$ denote the edge sets. We note $p_i \in P$ and $q_a \in Q$ as representing feature points. Therefore, the main problem is to find a suitable one-to-one mapping from one feature set to the other, as illustrated in Figure 1. The pictures in Figure 1 are from the PF-WILLOW dataset (https://www.di.ens.fr/willow/research/proposalflow/).
Finding a mapping from $P$ to $Q$ is equivalent to finding an $n^P \times n^Q$ assignment matrix $X$, such that $X_{ia} = 1$ when point $i$ is assigned to point $a$, and $X_{ia} = 0$ otherwise. Therefore, an admissible one-to-one solution must verify the following constraints: (1), which requires a binary solution, and (2) and (3), which express the two-way constraints of a one-to-one mapping.
Then, the problem of graph matching can be formulated as the maximization of a general objective score function (4), where $H_{ia,jb}$ is the similarity or affinity measurement corresponding to the tuple of feature points $(i, j)$ and $(a, b)$. The higher the score $H_{ia,jb}$, the higher the similarity between the two corresponding edges $(i, j)$ and $(a, b)$. The product $X_{ia} X_{jb}$ is equal to 1 if and only if points $i, j$ are respectively mapped to points $a, b$.
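For reference, a LaTeX sketch of constraints (1)–(3) and the objective (4), consistent with the assignment matrix $X$ and affinity $H$ introduced above (the exact typesetting in the original equations may differ slightly):

$$ X_{ia} \in \{0, 1\}, \quad \forall i \in P, \ \forall a \in Q, \qquad (1) $$
$$ \sum_{a \in Q} X_{ia} \le 1, \quad \forall i \in P, \qquad (2) $$
$$ \sum_{i \in P} X_{ia} \le 1, \quad \forall a \in Q, \qquad (3) $$
$$ X^{*} = \arg\max_{X} \sum_{i, j \in P} \sum_{a, b \in Q} H_{ia,jb} \, X_{ia} X_{jb}. \qquad (4) $$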
Then, we need to know how to compute such a positive and symmetric similarity matrix $H$. Many cost functions may be used to compute affinity matrices for first-order and second-order GM. Note that $H_{ia,ia}$ represents the first-order similarity term between the local attributes of points $i$ and $a$. For example, the authors in [37,38] use the normalized cross-correlation (NCC) cost function, as we do to validate our feature point extraction method in this paper. Nevertheless, any point-to-point distance function can be used, such as the Euclidean distance between SIFT descriptors, or a sum of squared error data terms.
Duchenne et al. [33] propose a general formula to compute the second-order affinity term $H_{ia,jb}$, as shown in (5), where $f$ is a feature vector associated with each edge. Leordeanu et al. [29], as most often encountered, compute the Euclidean distance between the corresponding candidate point pairs $(i, j)$ and $(a, b)$ to build the affinity term $H_{ia,jb}$, as shown in Equation (6). Here, $\sigma$ is the parameter controlling the sensitivity to deformation.
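As an illustration, typical forms of these affinities found in the literature, in the spirit of (5) and (6) (the exact expressions appear in the cited works), are:

$$ H_{ia,jb} = \exp\!\big( -\gamma \, \lVert f_{ij} - f_{ab} \rVert^{2} \big), $$
$$ H_{ia,jb} = \exp\!\left( -\frac{\big( \lVert p_i - p_j \rVert - \lVert q_a - q_b \rVert \big)^{2}}{\sigma^{2}} \right), $$

where $f_{ij}$ and $f_{ab}$ are the feature vectors of edges $(i, j)$ and $(a, b)$, and $\gamma$ is a scaling parameter.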
4. First-Order NCC Based GM
In this section, we introduce the implementation of the image matching process, including feature point detection and extraction, feature point matching, and the removal of mismatched pairs. The schematic diagram of the whole process is shown in Figure 2. Regarding the interest point matching strategy, the proposed pipeline illustrated in Figure 2 can be summarized as follows: (1) a Laplacian filter is used for edge detection; (2) Marr wavelets are used to identify salient points in the image; (3) an entropy based metric selects the most distinctive points of the image, denoted as feature points; (4) feature points are matched; (5) outliers are removed through RANSAC.
4.1. Image Pre-Processing
The Laplacian is a second-order derivative operator that detects zero-crossings in the image intensity and usually produces accurate edge detection results [39]. The Laplace filter is a discrete approximation of the mathematical Laplace operator. Its second-order partial derivatives in the orthogonal directions of continuous space, and the sharpening operation built on their discrete approximation, are defined below [40] in (7) and (8), where $f(x, y)$ is the original image, $g(x, y)$ is the processed image, and $c$ is a constant.
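For completeness, a sketch of the standard definitions referred to as (7) and (8), consistent with the symbols just introduced (the constant $c$ is typically $\pm 1$ depending on the sign of the kernel center):

$$ \nabla^{2} f(x, y) = \frac{\partial^{2} f}{\partial x^{2}} + \frac{\partial^{2} f}{\partial y^{2}} \approx f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1) - 4 f(x, y), \qquad (7) $$
$$ g(x, y) = f(x, y) + c \, \nabla^{2} f(x, y). \qquad (8) $$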
From formula (8), the digital mask filter $w$ can be viewed as the set of filter coefficients shown in Figure 3a. The Laplacian sharpening process is essentially a convolution. Suppose the origin pixel of $f$ is located in the upper left corner of the image, and set the middle value of the mask $w$ as the center of the kernel. Let $w$ move over all possible positions so that the center of $w$ coincides in turn with each pixel of $f$. The convolution operation is the sum of the products of the corresponding positions of the two functions. The convolution between $f$ and its corresponding mask filter $w$ is illustrated in Figure 3b.
The preprocessing step simply applies the standard two-dimensional discrete convolution between the image $f$ and the mask $w$.
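For illustration, here is a sketch of the discrete convolution sum and of the Laplacian sharpening step. The 4-neighbour kernel and the helper name `laplacian_sharpen` are assumptions used for the example, not the authors' exact implementation:

$$ (f * w)(x, y) = \sum_{s} \sum_{t} f(x - s, \, y - t) \, w(s, t). $$

```python
import numpy as np
from scipy.signal import convolve2d

# 4-neighbour Laplacian mask (one common choice for the coefficients of Figure 3a).
LAPLACIAN_MASK = np.array([[0,  1, 0],
                           [1, -4, 1],
                           [0,  1, 0]], dtype=np.float64)

def laplacian_sharpen(image, c=-1.0):
    """g = f + c * (f * w): sharpened image used as preprocessing before Marr wavelets."""
    f = image.astype(np.float64)
    lap = convolve2d(f, LAPLACIAN_MASK, mode="same", boundary="symm")
    return f + c * lap
```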
4.2. Marr Wavelets within Scale-Interaction
The receptive field [41] is used to describe the stimulation pattern of the retina. The receptive field of high-level neuron cells in the visual pathway is synthesized from the receptive fields of low-level neuron cells; therefore, as the level increases, the range of the receptive field becomes larger. Ultra-complex neuron cell models [42] can respond to complex object features through powerful non-linear processing, and most ultra-complex neuron cells are sensitive to the ends of line segments, corner points, and line segments with high curvature. In other words, the response of ultra-complex neuron cells to light can be simulated by the difference of the responses of spatial filters with different bandwidths. The response of the receptive field to light can be represented by a spatial filter function, such as a difference of Gaussians or a Gabor wavelet function.
The scale-interaction model for feature detection is based on filtering with a class of self-similar Gabor functions, or Gabor wavelets [43,44], which achieve the minimum joint resolution in the spatial and frequency domains. This choice is made because Gabor functions are unique in reaching the smallest possible value of the joint uncertainty [45]. The feature detection function combines the responses of two such filters: $\gamma$ is a normalizing factor, and $W_i$ and $W_j$ are the spatial filter responses, which go through the transformation of a non-linear function $f$ at location $\vec{x} = (x, y)$ with preferred orientation $\theta$, at the two scales $i$ and $j$ respectively. If the resulting feature detection function $F_{ij}(\vec{x}, \theta)$ reaches a local maximum at a location $\vec{x}$, this location is considered a potential feature point position.
For further optimization, Marr wavelets [46] are used instead of Gabor wavelets within the scale-interaction model, because of their isotropy [47,48]. The two-dimensional Marr wavelet and its corresponding feature detection function are then defined analogously, where $\vec{X} = (x, y)$ is the image location and $i$ represents the scaling value of the Marr wavelet. The wavelet at scale $i$ is convolved with a grayscale image $I$; if the resulting scale-interaction response reaches a local maximum at $\vec{X}$, then $\vec{X}$ is considered a potential feature point. Algorithm 1 gives the pseudo-code of Marr wavelets within scale-interaction. Figure 4 illustrates the process of extracting the response values using Marr wavelets within scale-interaction: (b) and (c) are the convolution results between the original picture and the mask filter, and the local maximum points are then extracted on this basis.
Algorithm 1 Marr wavelets within scale-interaction. Input: a grayscale image and the two scales $i$ and $j$. Output: the set of candidate feature points. The image is convolved with the Marr wavelet at each of the two scales, the two responses are combined by the scale-interaction function, and the locations where the combined response reaches a local maximum are returned as candidate feature points.
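A minimal Python sketch of the scale-interaction detection just described, assuming the Mexican-hat form of the Marr wavelet and an absolute-difference interaction $|W_i - \gamma W_j|$; the kernel sizes, default scales, and the 5×5 non-maximum window are illustrative choices, not the authors' exact settings:

```python
import numpy as np
from scipy.signal import convolve2d
from scipy.ndimage import maximum_filter

def marr_kernel(scale, size=None):
    """Mexican-hat (Marr) wavelet sampled on an odd square grid."""
    if size is None:
        size = int(8 * scale) | 1            # odd kernel size covering about +/- 4 scales
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    s2 = (x ** 2 + y ** 2) / float(scale ** 2)
    return (2.0 - s2) * np.exp(-s2 / 2.0)

def scale_interaction_points(image, scale_i=2.0, scale_j=4.0, gamma=1.0, n_max=500):
    """Candidate feature points: local maxima of the interaction |W_i - gamma * W_j|."""
    img = image.astype(np.float64)
    w_i = convolve2d(img, marr_kernel(scale_i), mode="same", boundary="symm")
    w_j = convolve2d(img, marr_kernel(scale_j), mode="same", boundary="symm")
    response = np.abs(w_i - gamma * w_j)     # scale-interaction response
    local_max = response == maximum_filter(response, size=5)
    ys, xs = np.nonzero(local_max)
    order = np.argsort(response[ys, xs])[::-1][:n_max]
    return list(zip(xs[order], ys[order])), response
```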
4.3. Entropy and Response
To address the problems of uneven feature distribution, an excessive number of feature points, and long matching times, a feature point selection method based on local entropy and feature point response is proposed, called the entropy and response (ER) algorithm. Its three main parts are explained in this section.
4.3.1. Entropy Algorithm
In actual images, feature points often appear as sharp changes of gray values or as inhomogeneities in the grayscale distribution; that is, the local region around a feature point carries a large amount of information. Entropy is a measure of the information in an image, and local entropy is a measure of the information in a local area of the image. The local entropy of a feature-rich region is much higher than that of a feature-poor region. Therefore, it is possible to determine which regions contain more features by calculating the local information entropy of the image, and then to extract the feature points in these regions.
Information entropy [49,50] characterizes a source as a whole: it is defined from the statistical properties of the entire source and measures the expected information content of a random variable. An image is essentially a source of information that can be described by information entropy. Let the gray image $G$ have $m$ gray levels, and let a mesh division be performed to obtain $n \times n$ sub-regions. The whole information entropy $E$ and the average entropy $\bar{E}$ of the image are calculated from the gray-level distribution, where $p_i$ is the probability that the $i$th gray level appears, that is, the ratio of the number of pixels whose gray value is $i$ to the total number of pixels of the image. The local entropy therefore accounts for the probability of occurrence of the gray levels in a sub-image. Since the value of the information entropy is only related to the distribution of the local gray-scale pixels, and not to any single pixel, it is not sensitive to the influence of noise and can improve the accuracy of the image description. Here, we use the local information entropy of the image to extract feature points. Under the meshing strategy, the image is divided into $n \times n$ sub-regions, and the entropy value of each sub-grid can be calculated. Figure 5 is a schematic diagram of the mesh division, with $n$ set to 40.
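For concreteness, the standard Shannon form of the image entropy consistent with the definitions above is

$$ E = -\sum_{i=0}^{m-1} p_i \log_2 p_i, \qquad \bar{E} = \frac{1}{n^2} \sum_{k=1}^{n^2} E_k, $$

where $E_k$ is the entropy of the $k$th sub-region (taking $\bar{E}$ as the mean of the sub-region entropies is an assumption of this sketch). A short Python sketch of the per-region computation follows; the helper name `local_entropies` is illustrative:

```python
import numpy as np

def local_entropies(gray, n=40, m=256):
    """Shannon entropy of each of the n x n mesh sub-regions of a uint8 image."""
    h, w = gray.shape
    ent = np.zeros((n, n))
    for r in range(n):
        for c in range(n):
            block = gray[r * h // n:(r + 1) * h // n, c * w // n:(c + 1) * w // n]
            hist = np.bincount(block.ravel(), minlength=m).astype(np.float64)
            p = hist / hist.sum()
            p = p[p > 0]                       # ignore empty gray-level bins
            ent[r, c] = -(p * np.log2(p)).sum()
    return ent, ent.mean()                     # per-region entropies and average entropy
```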
4.3.2. Response Algorithm
After the mesh division and the computation of each local entropy, we obtain $n \times n$ sub-regions for the whole image, and the detected feature points are mapped into their respective sub-regions. At this point, if we computed and sorted the entropy values of all extracted feature points, the first $N$ feature points with the largest entropy values could be selected to describe the whole image. However, this would only use the entropy values of the feature points, without considering their distribution in the image: feature points with high entropy values may mostly appear in the same local area, which causes aggregation. A block division and response algorithm is therefore proposed to deal with this problem.
As mentioned before, if a pixel presents a sharp change within its neighborhood, this pixel has a strong deviation from the mean value. Based on the Bresenham discrete circle [51] with the pixel $p$ as the center and three pixels as the radius, the 16 pixels on the discrete circumference are compared with the central pixel $p$, as shown in Figure 6. These 16 pixels are assigned to dark and bright areas. The dividing criteria and the deviation [52] are respectively defined as sketched below, where $S_{bright}$ denotes the bright area, $S_{dark}$ denotes the dark area, $I_p$ is the gray value of the center point $p$, $I_x$ is the gray value of a pixel labeled $x$ on the discrete circumference centered at $p$, and $t$ is a set threshold. The deviation $V$ is the sum of the differences between the gray value of pixel $p$ and those of its neighboring pixels located in the bright or dark area.
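A sketch of these criteria in the form popularized by the FAST detector, which matches the description above (the exact thresholding used by the authors may differ slightly):

$$ S_{bright} = \{ x : I_x \ge I_p + t \}, \qquad S_{dark} = \{ x : I_x \le I_p - t \}, $$
$$ V = \max\!\Big( \sum_{x \in S_{bright}} |I_x - I_p|, \; \sum_{x \in S_{dark}} |I_p - I_x| \Big). $$

```python
# Offsets of the 16 pixels on a Bresenham circle of radius 3 (FAST-style ring).
CIRCLE16 = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
            (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

def deviation(gray, x, y, t=10):
    """Response (deviation) V of pixel (x, y) against its bright/dark ring pixels."""
    ip = int(gray[y, x])
    ring = [int(gray[y + dy, x + dx]) for dx, dy in CIRCLE16]
    bright = sum(v - ip for v in ring if v >= ip + t)   # contribution of S_bright
    dark = sum(ip - v for v in ring if v <= ip - t)     # contribution of S_dark
    return max(bright, dark)
```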
4.3.3. Distribution Criterion
After the calculation of the entropy and the response, a distribution criterion is proposed to extract the corresponding points that meet the requirements. In the regions where the local entropy is larger than the average entropy $\bar{E}$, the feature points are extracted with a ratio $r$; only the single strongest-response point is extracted in each remaining region whose local entropy is smaller than $\bar{E}$. Here, we choose a unified ratio method for the selection of $r$: assuming that there are $m$ regions whose local entropy is greater than $\bar{E}$, $r$ is the ratio of feature points extracted in each of these regions. For example, with $r = 0.3$, the feature points with the top 30% of responses are extracted in each region with large local entropy. The appropriate value of $r$ is set empirically.
The entire ER method is outlined in Algorithm 2. Based on the mesh division strategy, we first compute the local entropy value of each sub-region and their average. The feature points are then sorted according to their deviation values within the sub-region they belong to. Finally, the distribution criterion not only effectively reduces the number of useless feature points, but also ensures the uniformity of the feature point distribution. Note that the entropy step and the response step could each be developed into a custom feature point selection on its own. However, the entropy only reflects the gray-level statistics around individual pixels, whereas the response is based on the deviation between a center point and its adjacent points on the Bresenham discrete circle, so sharp local contrast is better reflected in the response and points with reliable contrast are found. Therefore, the deviation is more capable than the entropy of extracting qualified feature points.
Algorithm 2 Entropy and response algorithms. Input: the candidate feature points and the gray image divided into mesh sub-regions. Output: the selected feature points. For each sub-region, the local entropy is computed and compared with the average entropy; in the sub-regions with above-average entropy, the feature points with the top ratio $r$ of responses (deviations) are kept, while in the remaining sub-regions only the single strongest-response point is retained.
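A compact Python sketch of this selection rule, as described above; `region_of`, `responses`, and `entropies` are assumed inputs (the mesh-cell index of a point, the deviation of each point, and the per-region entropies), and the default ratio is illustrative:

```python
def er_select(points, responses, region_of, entropies, avg_entropy, r=0.3):
    """Entropy-and-response selection: per region, keep the top-r responses in
    high-entropy regions, or only the single strongest point elsewhere."""
    regions = {}
    for idx, pt in enumerate(points):
        regions.setdefault(region_of(pt), []).append(idx)
    selected = []
    for reg, idxs in regions.items():
        idxs.sort(key=lambda i: responses[i], reverse=True)
        if entropies[reg] > avg_entropy:
            keep = max(1, int(round(r * len(idxs))))   # ratio r in feature-rich regions
        else:
            keep = 1                                   # strongest point only elsewhere
        selected.extend(points[i] for i in idxs[:keep])
    return selected
```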
4.4. Definition of First-Order GM Problem Based on NCC
The purpose of graph matching [53] is to determine the correct attribute correspondences, between vertices ($V$) and between edges ($E$), of two graphs $P$ and $Q$. The corresponding mapped edges are denoted $(i, j) \in E^P$ and $(a, b) \in E^Q$.
The objective of graph matching is to find the correct corresponding point pairs between two graphs P and Q among the feature points extracted. A unidirectional ’one-to-one’ constraint is assumed, which requires one node in P to match at most one node in Q.
Cross-correlation is a standard method for estimating the similarity between two sets of data [54]. Normalized cross-correlation (NCC) is an essential variant that has been widely used in many signal processing applications, thanks to its effective and direct representation in the frequency domain and its low sensitivity to linear variations in the amplitude of the two compared signals [55]. We use the NCC algorithm to measure the similarity between a feature point $p$ in graph $P$ and a feature point $q$ in graph $Q$. The computed correlation ratios represent the degree of matching between the two corresponding images. The NCC similarity between a window around feature point $p$ and a window around feature point $q$ is defined as follows, where the summations are over all window coordinates, $I_P$ and $I_Q$ are the pixel intensities in the windows around $p$ and $q$ respectively, both windows having the same size, and $\bar{I}_P$ and $\bar{I}_Q$ are the corresponding means of the window pixels. The coordinate of the maximum value of this normalized cross-correlation gives the position of the best match with respect to the reference image.
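The standard form of this measure, consistent with the symbols just introduced, together with a direct Python sketch of it (window extraction is left to the caller):

$$ NCC(p, q) = \frac{\sum_{x} \big( I_P(x) - \bar{I}_P \big) \big( I_Q(x) - \bar{I}_Q \big)}{\sqrt{\sum_{x} \big( I_P(x) - \bar{I}_P \big)^{2} \sum_{x} \big( I_Q(x) - \bar{I}_Q \big)^{2}}}. $$

```python
import numpy as np

def ncc(win_p, win_q):
    """Normalized cross-correlation between two equally sized image windows."""
    a = win_p.astype(np.float64) - win_p.mean()
    b = win_q.astype(np.float64) - win_q.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)
```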
Based on the NCC similarity measure, we use the nearest neighbor ratio (NNR) method to perform a rough match of the feature point sets. The selection and matching process of the NNR algorithm under the one-way 'one-to-one' constraint is as follows. For the candidate feature point pairs of the two images, the NCC algorithm is first used to extract the most significant corresponding pairs. These ratios are then compared with a fixed threshold: if the NCC ratio is higher than the threshold, the corresponding point pair is considered a match; otherwise, the pair of points is discarded. The fixed threshold is usually a constant not greater than 0.9. Since correct matches have a stronger similarity than incorrect ones, this is a practical decision rule for graph matching according to the NNR concept.
Figure 7 shows a flowchart of the data processing. The detected feature points of the two images are put into two corresponding buffers. Each point $p$ in graph $P$ is used to calculate the NCC ratio with all points $q$ in graph $Q$, and all ratios are then sorted in descending order. Under the NNR principle, some pairs are retained as best matches, and the others are discarded.
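A simplified Python sketch of this matching loop, reusing the `ncc` function above; `extract_window` is an assumed helper that returns the patch around a point, and the threshold value is illustrative:

```python
def nnr_match(points_p, points_q, img_p, img_q, extract_window, threshold=0.8):
    """One-way NCC matching: keep, for each p, its best NCC partner q if it passes the threshold."""
    matches = []
    for p in points_p:
        scores = [(ncc(extract_window(img_p, p), extract_window(img_q, q)), q) for q in points_q]
        best_score, best_q = max(scores, key=lambda s: s[0])
        if best_score > threshold:
            matches.append((p, best_q, best_score))
    return matches
```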
4.5. Outlier Elimination
In this application, before convolving with the Marr filter, we convolve with the Laplacian as a preprocessing step. Due to the linearity of these two operations, this corresponds to a single convolution that acts as a smoothed fourth-order derivative filter. Since each derivative amplifies noise, the number of detected feature points increases. On the other hand, although the NNR method is easy to implement and often gives good matches, some of the extracted point pairs are still mismatched, so a mismatch cleanup operation is required. It is therefore particularly important to reduce the mismatches caused by such interference.
The RANSAC algorithm [56,57], the Random Sample Consensus algorithm, can effectively eliminate mismatches. It is robust and able to correct data sets. The basic idea of RANSAC is to iteratively extract sample sets from the data, estimate an optimized parametric model that includes as many points of the data set as possible, and then test the remaining samples against this model. The points that fit the model are called inliers; the others are called outliers. RANSAC can therefore find the best parametric model in a data set containing outliers through an iterative procedure. The detailed implementation of RANSAC is as follows:
- (a) Randomly extract $a$ non-collinear pairs of feature points from the data set (the value of $a$ is fixed in the experiments), calculate their transformation homography matrix $H$, and record it as model $M$.
- (b) Calculate the projection error of each point in the data set with respect to model $M$. If the error is less than a predefined threshold $t$, add the point to the inlier set $I$.
- (c) If the current number of elements in the inlier set $I$ is greater than the number in the best inlier set found so far, update the best inlier set and re-estimate the model $M$.
- (d) If the number of iterations exceeds $k$, exit; otherwise, increase the number of iterations by 1 and repeat the above steps.
The threshold $t$ is selected in accordance with the $n$-dimensional chi-square distribution, through the cumulative chi-square distribution, under the assumption that the outliers follow white Gaussian noise with a mean of 0 and a variance of $\sigma^2$. The number of iterations $k$ is updated rather than fixed until it exceeds the maximum number of iterations; in this update, $z$ is the confidence level, generally taken between 0.95 and 0.99, $w$ is the ratio of inlier points, and $a$ is the minimum number of samples required to estimate the model (see the bound sketched below). The pseudo-code of RANSAC is outlined in Algorithm 3.
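The standard adaptive bound on the number of iterations, consistent with these definitions of $z$, $w$, and $a$, is

$$ k = \frac{\log(1 - z)}{\log\!\big( 1 - w^{a} \big)}, $$

which is re-evaluated whenever a larger inlier set updates the estimate of $w$.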
Algorithm 3 Random Sample Consensus (RANSAC) algorithm. Input: the set of roughly matched point pairs and the error threshold. Output: the best inlier set and the corresponding homography model. At each iteration, a minimal sample of $a$ pairs is drawn at random, a homography is estimated from it, the points whose projection error is below the threshold are collected as inliers, and the best model found so far is kept; the iteration bound $k$ is updated from the current inlier ratio, and the loop stops when it is reached.
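A generic Python sketch of this loop; `fit_model` and `point_error` are assumed callbacks (e.g., a minimal-sample homography estimator and the reprojection error), and the constants are illustrative:

```python
import numpy as np

def ransac(data, fit_model, point_error, a, t, z=0.99, k_max=2000):
    """Generic RANSAC: draw minimal samples of size a, keep the model with the most inliers."""
    rng = np.random.default_rng(0)
    best_model, best_inliers = None, []
    k, it = k_max, 0
    while it < k:
        sample = rng.choice(len(data), size=a, replace=False)
        model = fit_model([data[i] for i in sample])
        inliers = [d for d in data if point_error(model, d) < t]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
            w = len(inliers) / len(data)              # current inlier ratio
            if 0.0 < w < 1.0:                         # adaptive iteration bound k
                k = min(k_max, int(np.log(1 - z) / np.log(1 - w ** a)) + 1)
        it += 1
    return best_model, best_inliers
```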
4.6. Parameters of the Proposed Algorithm
This section details the specific parameters used in the experiments. In the feature point detection part, the Marr wavelet algorithm defines the feature points as the local maxima of the scale-interaction image computed from two fixed scales of the Marr wavelet. Then the mesh division and feature point extraction are processed: the image is meshed into $n \times n$ sub-regions, with a fixed value of $n$. The detected feature points are mapped into the sub-regions and sorted based on their deviation value within the sub-region they belong to. At the same time, each local information entropy and the average information entropy are calculated. Assuming that there are $k$ sub-regions with a local information entropy greater than the average, the feature points with the top 30% of responses are extracted from these $k$ regions. In the feature point matching part, the NCC similarity measure and the NNR method are used to match the feature point sets roughly. For the match filtering part, RANSAC removes the remaining mismatched points and finally yields more accurate matching results. The computational complexity of the matching step is $O(n^P \times n^Q)$, where $n^P$ is the number of features in the reference image and $n^Q$ is the number of features in the query image. The algorithm is suitable for real-time global methods and can also achieve an efficient parallel implementation on GPU systems. Algorithm 4 outlines the entire process of the proposed algorithm.
Algorithm 4 The proposed algorithm. Input: the two images to be matched. The two images are first processed with the Laplace filter; candidate feature points are then detected with Marr wavelets within scale-interaction; the entropy and response selection retains the most informative points; the NCC/NNR step produces the candidate matches; and RANSAC removes the remaining mismatches.
5. Second-Order NCC Based GM
The first-order GM provides convolution-based algorithms, whereas the second-order GM emphasizes geometric inter-feature relationships, transforming the correspondence problem to a purely geometric problem stated in a high dimensional space, generally modeled as an integer quadratic programming. This section presents our second application. We introduce in this section a new contribution with an application to second-order graph matching in the Matlab framework. The framework is based on the original Matlab application provided by Cho et al. [
18]. This application constitutes a useful framework for graph matching as an IQP problem. It offers useful mathematical abstractions, and it allows us to develop and compare many algorithms based on a common evaluation platform, sharing input data, but also customizing affinity matrices and a matching list of candidate solution pairs as input data. This allows us to reuse these common data and context to start elaborate NCC algorithms for second-order graph matching application (As we discussed in detail in
Section 4.4). This approach uses NCC algorithm to search for the indicator vector, then the matching score will be computed under IQP based formulation. By considering the second-order term, the algorithm determines the mapping between two graphs that should reflect the geometric similarity relationship between the pairwise matching features. All the algorithms are executed and compared based on the same experimental framework with common data from standard benchmarks in the domain.
We set $P$ and $Q$ as the two sets of features of the query graph and of the reference graph, respectively. We note $p_i, p_j \in P$ and $q_a, q_b \in Q$ as feature points, and $(i, j)$ and $(a, b)$ as edges. Also, $ia$ and $jb$ represent, when needed, candidate assignments. The main task is to find a suitable one-to-one mapping between $P$ and $Q$. The feature point correspondence mapping is shown in Figure 1, where the yellow lines are correct matches.
The affinity matrix $M$, also known as the affinity tensor, is used to organize the mutual similarities between the sets of feature points. The measurement of affinity can be interpreted as a product of a solution vector $x$, which represents the set of candidate correspondences, by the matrix. The solution variable is an indicator vector $x \in \{0, 1\}^{n^P n^Q}$ such that $x_{ia} = 1$ means that feature $p_i$ matches with feature $q_a$, and $x_{ia} = 0$ means no correspondence, where $n^P$ and $n^Q$ are the respective set sizes of $P$ and $Q$.
A graph matching score $S$ between edges can be defined as a sum of pairwise affinities over the active assignments, where $x_{ia}$ and $x_{jb}$ indicate that $ia$ and $jb$ are correspondence pairs, and $x$ is the indicator vector. The purpose of the graph matching IQP problem is then to compute the solution $x^{*}$ that maximizes this matching score. The binary constraint is expressed by Equation (25), while (26) and (27) express the two-way constraints, which specify that the solution is a one-to-one mapping from $P$ to $Q$. Note that by removing constraint (27), we obtain a many-to-one mapping, that is, a (partial) function from $P$ to $Q$. In this section, both constraints must be verified.
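In compact form, a sketch of this score and of the IQP with its constraints, consistent with the indicator-vector notation above (the labels follow the text; the exact typesetting in the original may differ):

$$ S(x) = \sum_{ia, \, jb} M_{ia,jb} \, x_{ia} \, x_{jb} = x^{\top} M x, \qquad x^{*} = \arg\max_{x} \; x^{\top} M x, $$
$$ \text{s.t.} \quad x_{ia} \in \{0, 1\} \;\; (25), \qquad \sum_{a} x_{ia} \le 1 \;\; (26), \qquad \sum_{i} x_{ia} \le 1 \;\; (27). $$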
The affinity matrix $M$, which consists of the relational similarity values between edges and nodes, is considered as an input of the problem. Its size is determined by the total number of candidate assignment pairs considered. The affinity matrix size may therefore vary from $(n^P n^Q) \times (n^P n^Q)$, when all possible pairs are considered, down to a much smaller size of the order of $K$ times the number of features, where $K$ is some constant, when a restricted list of candidate pairs is used. Note that this list of candidate pairs must be added as part of the input, in order to relate the entries of the affinity matrix to the feature points. The size of the indicator variable $x$ varies accordingly with the symmetric affinity matrix size: its length corresponds to the column, or row, size of the matrix, and may vary from $n^P n^Q$ down to the length of the restricted candidate list, depending on the application.
Here, the matching score is retained as purely pairwise geometric. The individual affinity $M_{ia,ia}$, which represents the first-order affinity, is set to zero, since there is no information about individual affinities; that is to say, all the diagonal values of the affinity matrix are zero. The pairwise affinity $M_{ia,jb}$ between edges is computed from the mutual projection error function used in [58] between two candidate assignments $ia$ and $jb$, which includes a Euclidean distance evaluation between the locations of the features. Table 1 summarizes the notations and definitions used in this paper.
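To make the construction concrete, here is a small Python sketch that fills such a pairwise affinity matrix over a candidate assignment list. It uses a simple length-difference affinity $\exp(-(d_{ij} - d_{ab})^2 / \sigma^2)$ as a stand-in for the mutual projection error of [58], so it illustrates the data layout rather than the authors' exact affinity:

```python
import numpy as np

def pairwise_affinity(points_p, points_q, candidates, sigma=0.5):
    """Sparse second-order affinity matrix over a list of candidate assignments (i, a)."""
    n = len(candidates)
    M = np.zeros((n, n))
    for u, (i, a) in enumerate(candidates):
        for v, (j, b) in enumerate(candidates):
            if u == v or i == j or a == b:
                continue                      # diagonal (first-order) entries stay zero
            d_ij = np.linalg.norm(points_p[i] - points_p[j])
            d_ab = np.linalg.norm(points_q[a] - points_q[b])
            M[u, v] = np.exp(-(d_ij - d_ab) ** 2 / sigma ** 2)
    return M
```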