1. Introduction
Geometric model fitting has attracted considerable attention in computer vision, for example in image registration and stitching [1,2,3], structure from motion [4,5,6], visual positioning [7,8,9], object recognition [10,11,12] and 3D reconstruction [13,14]. In image registration and stitching, geometric model fitting is used to remove the outliers (wrong matches) and find a subset of reliable matches—the inliers—which conform to a given geometric model; these correct matches are then used to estimate an accurate homography matrix. Structure from motion uses such reliable matches to recover the camera pose. In visual positioning, object recognition and 3D reconstruction, by contrast, geometric model fitting is used to estimate the model parameters. For example, in 3D reconstruction, robust model fitting is used to estimate multiple rigid moving objects in order to initialize multi-body structure from motion, or, in the processing of 3D point clouds, planar patches are fitted to produce intermediate geometric interpretations.
Geometric transformations are needed when two-view or multi-view images are considered, but these images often contain noise and outliers, so the geometric transformation must be estimated robustly. Random sample consensus (RANSAC) [15] and its variants [16] are commonly used to exclude the outliers; they are single-model fitting methods and are suitable for simple outlier configurations and small noise levels. On the other hand, the multi-model fitting methods extended from these single-model approaches, such as sequential RANSAC [17,18] and multi-RANSAC [19], usually fail when dealing with large amounts of outliers and noise, because of the mutual interference between the multiple models. Hence, a robust multi-model fitting method should extract multiple model instances simultaneously.
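To make the single-model baseline concrete, the sketch below shows a minimal sequential-RANSAC-style loop for line fitting (a hypothetical example, not any of the cited implementations): RANSAC is run repeatedly and the inliers of each recovered model are removed before the next run, which is exactly where the mutual interference between models arises once the outlier level is high.

```python
import numpy as np

def fit_line(p, q):
    # Line a*x + b*y + c = 0 through two points, normalized so (a, b) is unit length.
    a, b = p[1] - q[1], q[0] - p[0]
    c = -(a * p[0] + b * p[1])
    n = np.hypot(a, b)
    return np.array([a, b, c]) / n

def ransac_line(points, n_iters=500, threshold=0.05):
    # Single-model RANSAC: keep the hypothesis with the largest consensus set.
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        i, j = np.random.choice(len(points), 2, replace=False)
        model = fit_line(points[i], points[j])
        residuals = np.abs(points @ model[:2] + model[2])
        inliers = residuals < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

def sequential_ransac(points, n_models=2, **kwargs):
    # Fit models one at a time, removing the inliers of each fitted model;
    # with many outliers the consensus sets of different models interfere.
    remaining = np.arange(len(points))
    labels = -np.ones(len(points), dtype=int)   # -1 = outlier / unassigned
    for m in range(n_models):
        if len(remaining) < 2:
            break
        inliers = ransac_line(points[remaining], **kwargs)
        labels[remaining[inliers]] = m
        remaining = remaining[~inliers]
    return labels
```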
So far, the most popular strategy for fitting multiple geometric models is preference analysis [20,21,22,23,24,25,26,27,28,29,30,31,32]. The multi-model fitting problem can be considered a typical chicken-and-egg problem [29]: both the data-to-model assignments and the model parameters are unknown, but given the solution of one sub-problem, the solution of the other can easily be derived. Hence, preference analysis-based methods first generate a large number of hypotheses by sampling minimal sample sets (MSSs), and then perform preference analysis on the hypothesis residuals. The most classical method, J-linkage [20,21,33], adopts a binarized conceptual preference of the points, obtained by thresholding the residuals with an inlier threshold, and introduces the Jaccard distance to conduct linkage clustering of the point preferences, so that the inliers are segmented into different clusters. Similar to J-linkage, T-linkage [22,24,25] uses a relaxation of the binary preference function together with the soft Tanimoto distance to improve the conceptual preference of J-linkage for better clustering. Unlike the common “linkage clustering with preference” strategy, robust preference analysis (RPA) [23,25] first represents the data points in a conceptual space similar to that of J-linkage, and then performs robust principal component analysis (PCA) and symmetric non-negative matrix factorization (NMF) to reduce the multi-model fitting problem to many single-model fitting problems, which in turn are solved with a strategy that resembles m-estimator sample consensus (MSAC) [34]. Because the outliers greatly affect the hypotheses and therefore also the preferences, both J-linkage and T-linkage get rid of outliers by means of an inlier threshold. On the other hand, kernel fitting (KF) [26] uses permutations as preferences by sorting the residuals of the hypotheses, and builds a Mercer kernel to elicit potential points belonging to a common structure and to remove the outliers. Since then, permutation preferences have been widely used to represent the hypotheses or the data points for model fitting [27,28,29,30,31,32]. Furthermore, combining simulated annealing with random cluster models, SA-RCM [32] organizes the point preferences by permutation in a weighted graph, and the multi-model fitting task is stated as an optimization problem based on graph cuts, which have been widely used for model segmentation [35,36,37,38,39] by combining the model error with the smoothness of the points.
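To fix notation for the preference analysis discussed in Section 2, the following sketch (a hypothetical line-fitting example, not the authors' code) generates hypotheses from random minimal sample sets and assembles the N × M residual matrix from which every preference below is derived.

```python
import numpy as np

def generate_hypotheses(points, n_hypotheses):
    # Each hypothesis is a line fitted to a minimal sample set of two points.
    hypotheses = []
    for _ in range(n_hypotheses):
        i, j = np.random.choice(len(points), 2, replace=False)
        a, b = points[i][1] - points[j][1], points[j][0] - points[i][0]
        c = -(a * points[i][0] + b * points[i][1])
        n = np.hypot(a, b)
        hypotheses.append(np.array([a, b, c]) / n)
    return np.array(hypotheses)                                   # M x 3

def residual_matrix(points, hypotheses):
    # R[i, j] is the point-to-line distance of point i to hypothesis j (N x M).
    homogeneous = np.hstack([points, np.ones((len(points), 1))])  # N x 3
    return np.abs(homogeneous @ hypotheses.T)
```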
Because the fitting accuracy is seriously affected by the outliers, a series of scale estimation techniques [40,41,42,43,44] has been proposed to estimate the inlier scale and automatically eliminate the impact of outliers. Most of these scale estimators depend on a Gaussian distribution model: the k-th order scale estimator (KOSE) [40] assumes that the data do not include pseudo-outliers [45] and that the residuals of the inliers are Gaussian distributed; the adaptive least k-th order scale estimator (ALKS) [40] and the modified selective statistical estimator (MSSE) [41] try to find the parameter k of KOSE automatically; the iterative k-th order scale estimator (IKOSE) [42,43] applies KOSE iteratively to handle scale estimation in the presence of pseudo-outliers, but remains quite sensitive to the parameter k. In geometric model fitting, however, the noise distribution is usually very complicated, and assuming a Gaussian distribution introduces some deviation, so these scale estimators perform poorly in geometric model fitting.
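For illustration, a minimal KOSE-style sketch following the usual formulation of the k-th order scale estimator is given below; it is not taken from the cited implementations and simply scales the k-th smallest absolute residual by the corresponding Gaussian quantile, which is exactly where the Gaussian assumption enters.

```python
import numpy as np
from scipy.stats import norm

def kose_scale(residuals, k):
    # k-th order scale estimate: the k-th smallest absolute residual divided by the
    # Gaussian quantile at (1 + k/n)/2, assuming the inlier residuals are Gaussian.
    abs_res = np.sort(np.abs(residuals))
    n = len(abs_res)
    r_k = abs_res[k - 1]
    return r_k / norm.ppf(0.5 * (1.0 + k / n))
```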
In geometric model fitting, the residuals of outliers with respect to all the true models are generally large, which suggests a consensus among the outliers that becomes apparent when the proportion of good hypotheses is large enough. However, few works have studied this outlier consensus as a means of excluding outliers for robust fitting. Preference analysis methods such as J-linkage and T-linkage eliminate the impact of outliers by using an inlier threshold. The permutation preference does not need an inlier threshold, but it is very sensitive to outliers. The conceptual space in J-linkage is the binarization of the residuals by the inlier threshold, which severely compresses the residual information and reduces the differences between points belonging to different models, thus leading to under-segmentation. T-linkage instead proposes a continuous relaxation of the binary representation to construct the preference set for linkage clustering; this keeps the continuous residual information and retains the differences between inliers belonging to the same model, and thus tends to result in over-segmentation.
The residual histogram of each data point has peaks corresponding to the true models, because hypotheses generated with random sampling tend to cluster around the true models (as presented in Figure 1); this property is used by residual histogram analysis (RHA) [46] to find modes in the data set. When quantizing the residuals, the peaks corresponding to the true models in the histogram will likely be merged into a single quantized value, while the flat areas, which correspond to widely differing residuals, will be quantized to different values. As a result, quantizing the residual histogram reduces the differences between inliers belonging to the same model while maintaining the differences between points belonging to different models. In particular, since the residuals of the outliers are larger with respect to all the true models, the quantized values of the outliers will tend toward the larger quantization values. In this way, if the proportion of good hypotheses close to the true models is large enough, the quantized residuals of the outliers will share more common large values, which makes the outliers cluster away from the inliers in the residual histogram preference space.
Therefore, in this paper we first introduce the residual histogram preference and compare its ability to expose the outlier consensus with that of other preferences, including the binarized conceptual preference in J-linkage, the soft conceptual preference in T-linkage and the permutation preference. The residual histogram preference is then applied in the proposed multi-model fitting method to detect the outliers. After outlier detection, inlier segmentation is carried out by linkage clustering, but with the permutation preference [26,30,31,47] applied to the data set with the outliers removed. In order to reduce the instability of the sampling process and make the clustering process robust, an alternating sampling and clustering framework is integrated into both the outlier detection and the inlier segmentation processes, which alternately conducts random sampling within each cluster set and linkage clustering with the preferences obtained from the sampled hypotheses.
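The alternating framework can be summarized by the schematic loop below, in which the helper functions sample_within_clusters, compute_preferences and linkage_clustering are hypothetical placeholders rather than the actual components of the proposed method: each round samples hypotheses inside the current clusters, recomputes preferences from their residuals, and re-clusters the points before the next sampling round.

```python
def alternating_sampling_and_clustering(points, n_rounds, sample_within_clusters,
                                        compute_preferences, linkage_clustering):
    # Schematic loop: each round samples hypotheses inside the current clusters and
    # re-clusters the points with the preferences derived from those hypotheses.
    clusters = [list(range(len(points)))]                       # start with one cluster
    for _ in range(n_rounds):
        hypotheses = sample_within_clusters(points, clusters)   # hypothetical helper
        preferences = compute_preferences(points, hypotheses)   # e.g. a Section 2 preference
        clusters = linkage_clustering(preferences)              # hypothetical helper
    return clusters
```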
Hence, the contributions of this paper are threefold: (1) We propose a new preference analysis method for geometric model fitting and compare the proposed residual histogram preference with existing preferences, namely the binarized conceptual preference, the soft conceptual preference and the permutation preference. (2) Based on the residual histogram preference, we propose an outlier detection strategy for geometric model fitting. (3) We introduce an alternating sampling and clustering framework used in both the outlier detection and the inlier segmentation to improve stability and accuracy.
The rest of this paper is organized as follows: In Section 2, we analyze the outlier consensus in different preference spaces. In Section 3, we introduce the proposed multi-model fitting method in detail. The experiments in geometric multi-model fitting, including multi-homography matrix estimation and multi-fundamental matrix estimation, are presented in Section 4. Finally, we draw our conclusions in Section 5.
2. Preference Analysis on Outlier Consensus
In multi-model fitting, outliers greatly affect the fitting accuracy, and inliers polluted by a few outliers can drive the fitted parameters far from the true result. As illustrated in Figure 2, the least squares line fitting result is greatly influenced by outliers: as the proportion of outliers increases, the bias between the fitted result and the real model also increases. It can be seen from Figure 2a that the fitted line model is very accurate when the data contain no outliers. Therefore, in order to fit correct model instances in the presence of outliers, the outliers in the data must be removed; indeed, many geometric model fitting algorithms include some procedure for outlier removal.
Most existing geometric multi-model fitting methods depend on an inlier threshold to exclude the outliers, which makes the fitting accuracy heavily reliant on that threshold. Because the scales of the different models vary, a single inlier threshold cannot exclude the outliers for all the models, and thus a single inlier threshold often works poorly in geometric multi-model fitting.
Some scale estimators (KOSE, ALKS, MSSE and IKOSE) model the distribution of inliers as Gaussian and try to find the inlier scale of each model in the data automatically, which makes the fitting process more adaptive and robust. However, the distribution of the inliers is extremely complicated in geometric multi-model fitting, and a Gaussian model cannot fit the inlier distribution well. Therefore, Gaussian-based scale estimators perform poorly in geometric multi-model fitting, where the inlier distribution is typically non-Gaussian.
Because outliers are not assigned to any of the true models, their residuals with respect to all the true models are large, which suggests a consensus among the outliers. When the proportion of good hypotheses after sampling is large enough, the residuals of the outliers will share more common large values. However, this consensus is not obvious in the raw residual space: because the scales of the different models differ, what counts as a “large” residual also differs from model to model, and the pseudo-outliers, which also have many large residuals, tend to lie close to the outliers in residual space. Preferences designed from the residuals, such as the conceptual space in J-linkage or the soft conceptual space in T-linkage, can highlight the consensus of the outliers against the inliers and pseudo-outliers with the help of an inlier threshold. The permutation preference reduces the impact of inlier scale differences by sorting the residuals and taking the resulting ranks as the preference of each data point, which also clearly exposes the consensus of the outliers. Similarly, the residual histogram is a good way to reduce the influence of inlier scale differences by quantizing the residuals with respect to each model. Moreover, the residual histogram compresses the residuals according to the histogram bins, so when it is used to represent the consensus of the outliers, the differences among the outliers are greatly compressed, which makes the outliers gather away from the inliers. The detailed preference analysis of the outlier consensus in each preference space, including the residual histogram preference, is presented below.
2.1. Binarized Conceptual Preference in J-Linkage
The preference analysis-based method was first used in J-linkage [20], in which the consensus set of each hypothesis is computed by collecting the points whose residuals are lower than the inlier threshold. In this way, each point can be represented by the preference set of the hypotheses whose residuals are less than the inlier threshold. Given the data point set $X = \{x_1, \ldots, x_N\}$, the hypothesis set $H = \{h_1, \ldots, h_M\}$ obtained after hypothesis generation, and the residual matrix $R = (r_{ij}) \in \mathbb{R}^{N \times M}$, where the $j$-th column of $R$ contains the residuals of hypothesis $h_j$ to all the data points in $X$, $N$ is the number of data points, and $M$ is the number of hypotheses, the conceptual space $C$ in J-linkage is the binarization of the residual matrix $R$ by an inlier threshold $\epsilon$:

$$ c_{ij} = \begin{cases} 1, & r_{ij} \le \epsilon \\ 0, & r_{ij} > \epsilon \end{cases} \qquad (1) $$

where $\epsilon$ is the inlier threshold used to determine whether a point is an inlier. The conceptual representation $C_i$ (the $i$-th row of $C$) of point $x_i$ indicates which hypotheses point $x_i$ prefers, thus forming the conceptual preference of point $x_i$. The preference set can be defined as $P_i = \{h_j \in H \mid c_{ij} = 1\}$, i.e., the set of hypotheses that point $x_i$ prefers. The Jaccard distance is then used to measure the distance between two preference sets $P_a$ and $P_b$:

$$ d_J(P_a, P_b) = \frac{|P_a \cup P_b| - |P_a \cap P_b|}{|P_a \cup P_b|} \qquad (2) $$

The conceptual space in J-linkage can thus be considered as $\{0,1\}^M$, and points belonging to the same structure have similar conceptual representations, which makes linkage clustering effective for segmenting the inliers. For the outliers, whose residuals are large with respect to all the true models, when the proportion of good hypotheses is high enough their conceptual preferences will share more common zero entries, which makes the outliers distribute compactly and away from the inliers in the conceptual preference space.
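For illustration, the binarized preference of Equation (1) and the Jaccard distance of Equation (2) can be sketched in a few lines (an illustrative sketch assuming a residual matrix R of shape N × M, not the original J-linkage implementation):

```python
import numpy as np

def binarized_preference(R, epsilon):
    # Equation (1): 1 where the residual is within the inlier threshold, 0 otherwise.
    return (R <= epsilon).astype(np.uint8)

def jaccard_distance(p, q):
    # Equation (2): Jaccard distance between two binary preference vectors.
    union = np.logical_or(p, q).sum()
    if union == 0:
        return 1.0            # two empty preference sets are treated as maximally distant
    inter = np.logical_and(p, q).sum()
    return (union - inter) / union
```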
2.2. Soft Conceptual Preference in T-Linkage
Unlike the binarized conceptual space in J-linkage, T-linkage [22] introduces a relaxation of the binary representation through a preference function, which allows the preference of a point to integrate more specific information about the residuals. The preference function of point $x_i$ with respect to hypothesis $h_j$ is defined by:

$$ \phi_j(x_i) = \begin{cases} e^{-r_{ij}/\tau}, & r_{ij} \le 5\tau \\ 0, & r_{ij} > 5\tau \end{cases} \qquad (3) $$

where the time constant $\tau$ plays the same role as the inlier threshold $\epsilon$ in Equation (1). The soft conceptual preference of point $x_i$ is then defined by $\phi(x_i) = (\phi_1(x_i), \ldots, \phi_M(x_i))$; each element of $\phi(x_i)$ represents the conversion of the residual $r_{ij}$ through Equation (3), where $r_{ij}$ is the residual of point $x_i$ to hypothesis $h_j$. The distance between two soft conceptual preferences $p$ and $q$ is defined as the Tanimoto distance:

$$ d_T(p, q) = 1 - \frac{\langle p, q \rangle}{\|p\|^2 + \|q\|^2 - \langle p, q \rangle} \qquad (4) $$

where $\langle \cdot, \cdot \rangle$ indicates the standard inner product and $\|\cdot\|$ is the induced norm.
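A minimal sketch of the soft conceptual preference (Equation (3)) and the Tanimoto distance (Equation (4)) follows; the 5τ cutoff follows the standard T-linkage formulation and is stated here as an assumption.

```python
import numpy as np

def soft_preference(R, tau):
    # Equation (3): exponentially decaying preference, cut off beyond 5*tau.
    P = np.exp(-R / tau)
    P[R > 5 * tau] = 0.0
    return P

def tanimoto_distance(p, q):
    # Equation (4): 1 - <p,q> / (||p||^2 + ||q||^2 - <p,q>).
    pq = np.dot(p, q)
    denom = np.dot(p, p) + np.dot(q, q) - pq
    if denom == 0:
        return 1.0            # both preferences are all-zero
    return 1.0 - pq / denom
```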
2.3. Permutation Preference
The permutation preferences are also extracted from the hypothesis residuals [29,47]. When calculating the permutation preference of point $x_i$, we first sort the residuals of $x_i$ in ascending order, and the permutation preference of point $x_i$ is the resulting sequence of hypothesis indices $\theta_i = (\theta_i^{(1)}, \theta_i^{(2)}, \ldots, \theta_i^{(M)})$, ordered from the smallest residual to the largest. In practice, the permutation preference is often used as a top-$k$ list rather than a full ranking list, i.e., $\theta_i = (\theta_i^{(1)}, \ldots, \theta_i^{(k)})$ with $k \ll M$.

When comparing two points $x_a$ and $x_b$ with the corresponding permutation preferences $\theta_a$ and $\theta_b$, the Spearman footrule distance $d_s$ [29,47] between the preferences $\theta_a$ and $\theta_b$ is computed as follows:

$$ d_s(\theta_a, \theta_b) = \sum_{h \in U} \left| p_a(h) - p_b(h) \right| \qquad (5) $$

where $U$ is the union of $\theta_a$ and $\theta_b$. $p_a(h)$ equals the position of element $h$ in $\theta_a$ when $h$ exists in $\theta_a$; conversely, $p_a(h)$ equals $k+1$ when $h$ does not exist in $\theta_a$, where $k$ is the number of elements in $\theta_a$. The preferences $\theta_a$ and $\theta_b$ are identical when $d_s(\theta_a, \theta_b) = 0$, and they are not identical when $d_s(\theta_a, \theta_b) > 0$.
2.4. Residual Histogram Preference
The residual histogram of each data point has quite distinctive features: its peaks correspond to the true models, because hypotheses generated by random sampling tend to cluster around the true models; this property was used by residual histogram analysis (RHA) [46] to find modes in the data set. Usually the residual histograms of the inliers are skewed right, which is not the case for the outliers. Equivalent to RHA, we introduce a more convenient residual histogram representation of the data points. The residual histogram preference quantizes the residuals $R$ and takes the quantized value $q_{ij}$ as the representation of point $x_i$ with respect to hypothesis $h_j$ (Equation (6)), where the quantization level determines how finely the residuals are binned. When the quantized residuals are used to represent the hypotheses or the data points, a valid quantization length is also needed to reduce the complexity of the residual histogram preferences, and the quantized values are transformed accordingly (Equation (7)). We introduce the selection principle for the quantization length in detail in Section 4. In this way, we obtain the residual histogram matrix $Q$. The residual histogram preference of data point $x_i$ is the $i$th row of $Q$, i.e., $Q_i$. When comparing two residual histogram preferences $Q_a$ and $Q_b$, the distance measurement $d(Q_a, Q_b)$ (Equation (8)) is built from per-hypothesis indicators $\delta_j$, where $\delta_j = 1$ when the quantized values of points $x_a$ and $x_b$ for hypothesis $h_j$ are equal, and $\delta_j = 0$ otherwise.
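Since the exact forms of Equations (6)–(8) are not reproduced above, the following sketch only illustrates one plausible reading of them—residuals binned by a quantization level, bins beyond the quantization length suppressed, and a match-counting distance—and should not be taken as the authors' exact formulation.

```python
import numpy as np

def residual_histogram_preference(R, level, length):
    # One plausible reading of Equation (6): 'level' acts as the bin width
    # of the residual histogram, giving integer quantized residuals.
    Q = np.ceil(R / level).astype(int)
    # Plausible reading of Equation (7): bins beyond the valid quantization length
    # are suppressed, so inliers keep small non-zero values and outliers become mostly zero.
    Q[Q > length] = 0
    return Q

def histogram_preference_distance(qa, qb):
    # Distance built from the per-hypothesis agreement indicators delta_j of Equation (8):
    # the more quantized values two points share, the smaller their distance.
    delta = (qa == qb).astype(float)
    return 1.0 - delta.mean()
```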
2.5. Comparison of the Four Preferences on Outlier Consensus
In the binarized conceptual preference of J-linkage, residuals larger than the inlier threshold are set to 0 and smaller residuals are set to 1. Since the residuals of the outliers are large for all the true models, when the proportion of good hypotheses is high enough the binarized conceptual preferences of the outliers will share more common zero entries, given a proper inlier threshold. When projected into the binarized conceptual preference space with the Jaccard distance, the outliers therefore gather together.
Similar to the binarized conceptual preference in J-linkage, the soft conceptual preference in T-linkage cuts the residuals larger than the inlier threshold off to 0, so that, with a proper inlier threshold, the soft conceptual preferences of the outliers share more common zero entries, and the outliers gather together in the soft conceptual preference space.
As for the permutation preference, since the residuals of the outliers are large for all the true models while the residuals of the inliers are small for the hypotheses close to the true models, the top-k items of the outliers’ permutation preferences share very few entries with those of the inliers, whereas the outliers’ permutation preferences are likely to share more items with each other when a proper permutation length is used; thus, the outliers gather away from the inliers in the permutation preference space.
Because the residuals of the outliers are large for all the models in the data, the larger residuals tend to be mapped to the extreme quantized values, close to the largest quantization bin or to 0, after quantization (Equation (7)). In this way, the residual histogram preferences of the outliers tend to contain mostly these extreme values, and when the proportion of good hypotheses is high enough, most of the entries in the residual histogram preferences of the outliers will take such values, whereas the corresponding entries for the inliers will be quite small. When the data points are projected into the residual histogram preference space with the distance measurement in Equation (8), most of the outliers therefore form a concentrated distribution far away from the inliers, and the outliers are easily separated from the inliers.
As shown in Figure 3, the matched points between “Image1” and “Image2” of “johnsona” for homography estimation are presented in Figure 3a, in which the outliers (usually mismatched points) are labeled by red circles. After sampling from the matched points, a large number of homographies are calculated as hypothesis models, the Sampson distance is used to calculate the residuals, and finally the preferences of the matched points are obtained to transform the matched points into the preference spaces. The distributions of the matched points in the binarized conceptual preference space, the soft conceptual preference space, the permutation preference space and the residual histogram preference space are presented in Figure 3b–e, respectively, and the mismatched points (the outliers) are labeled in red, corresponding to Figure 3a.
As shown in Figure 3b–e, the outliers gather away from the inliers in all four preference spaces. Although the outliers in the binarized conceptual preference space (Figure 3b) and the soft conceptual preference space (Figure 3c) are even more concentrated, both of these preferences need a proper inlier threshold, and the outlier cluster is seriously mixed with inliers. The outlier cluster in the permutation preference space (Figure 3d) appears pure, but it is more dispersed and the boundary with the inliers is not obvious. The outliers in the residual histogram preference space (Figure 3e) are very clearly separated from the inliers, with only a few outliers mixed into the inlier cluster, which makes outlier detection in the residual histogram preference space quite successful and consistently better than in the other three preference spaces.
4. Experiments
In this section, we describe the experiments undertaken in geometric multi-model fitting, including multi-homography estimation and multi-fundamental matrix estimation, which are fundamental issues in image stitching [48,49] and visual localization [50]. Comparisons of the segmentation accuracy on real data with several state-of-the-art methods (PEARL [35], SA-RCM [32], J-linkage [20], and T-linkage [22]) are made to present the characteristics of the proposed method, and the corresponding outlier detection results are also given. PEARL, J-linkage, SA-RCM and T-linkage were chosen for comparison not only because they are all state-of-the-art methods, but also because J-linkage and T-linkage are based on preference analysis, which makes them particularly suitable for comparison with the proposed method. Both the multi-homography estimation and the multi-fundamental matrix estimation experiments were based on the AdelaideRMF [28] data set. In addition to the fitting preference images, we also use the overall misclassification percentage (the number of misclassified points divided by the number of points in the data set) [51] to present the fitting performance for the multi-homography and multi-fundamental matrix estimation.
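For reference, the overall misclassification percentage reduces to a one-line computation once the estimated labels have been matched to the ground-truth models (a minimal sketch; the label matching itself is assumed to have been done beforehand):

```python
import numpy as np

def misclassification_percentage(estimated, ground_truth):
    # Fraction of points whose estimated label differs from the ground-truth label,
    # assuming the estimated labels have already been matched to the true models.
    estimated = np.asarray(estimated)
    ground_truth = np.asarray(ground_truth)
    return 100.0 * np.mean(estimated != ground_truth)
```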
4.1. Multi-Homography Estimation
Table 1 shows the misclassification results of the state-of-the-art methods and the proposed method for multi-homography estimation, where the results of the other four methods were obtained from [22]. It can be seen that the proposed method obtains the lowest misclassification rate on most of the data sets and the third lowest rate on “oldclassicswing”, which is actually very close to the lowest rate. The corresponding inlier segmentation preference results are presented in Figure 7, in which the black dots represent the outliers detected in Section 3.1; most of the inliers of the different homography models are segmented quite accurately. For the “johnsonb” data set, the inliers of four of the models, which account for the majority of the points, are segmented accurately, but the inliers of the models with smaller inlier scales are easily mixed up and poorly segmented.
In fact, most multi-model fitting methods do not work well on a data set such as “johnsonb”, which contains many models with highly varied inlier scales. This makes the sampling uneven, and the sampling rate for models with a smaller inlier scale is too low for the preferences to present a consensus on their inliers. Although sub-region random sampling can overcome the uneven sampling problem, the improvement is quite limited without considering the distribution of the model inliers, which is difficult to obtain without sampling and preference analysis or prior knowledge. The alternating sampling and clustering framework makes it possible to sample within the inlier clusters, but its performance depends greatly on the initial inlier segmentation result.
4.2. Multi-Fundamental Matrix Estimation
The misclassification results of the state-of-the-art methods and the proposed method for multi-fundamental matrix estimation are presented in Table 2, where the results of the other four methods were obtained from [22]. The proposed method obtains the lowest misclassification error on all six experimental data sets, and even achieves zero misclassification error on three of them. The corresponding inlier segmentation preference results are presented in Figure 8, in which the black dots represent the outliers detected in Section 3.1; most of the inliers of the different fundamental matrix models are segmented quite accurately.
Overall, the proposed method works quite well in the multi-fundamental matrix estimation experiment. For the “dinobooks” data set, the proportion of outliers (43%) is very high, and eight points need to be randomly selected each time to generate a fundamental matrix hypothesis, which results in a very low proportion of good hypotheses and seriously impacts the performance of the preferences.
4.3. Outliers Detection
The outlier detection results are presented in Figure 9, which shows the outliers detected in all 12 data sets for the multi-homography estimation and the multi-fundamental matrix estimation. The ratio of the number of detected outliers to the total number of outliers is also listed in Figure 9. As can be seen, most of the outliers are detected, with no inliers mixed in, over all the data sets, and all the detected-outlier ratios are higher than 87%.
During the outlier detection process, the key parameters are the quantization level and the quantization length, which have a great impact on the residual histogram preference used for detecting the outliers. In our experiments, these two parameters were kept the same across all six data sets within the multi-homography matrix estimation and within the multi-fundamental matrix estimation, respectively. In order to better distinguish inliers and outliers, the quantization length should be set to a small value, which makes the residual histogram preferences of the inliers contain more non-zero elements, while those of the outliers contain few non-zero elements. In the multi-homography matrix estimation, a single setting of the quantization level and quantization length was used for all six data sets; in the multi-fundamental matrix estimation, the model becomes more complicated, and a different setting gave a good result in our experiments. In this paper, we define an outlier-inlier distance (OID) to measure the separation between the inliers and the outliers, and use the OID to analyze the parameter settings. We calculated the outlier-inlier distance (Equation (12)) for every combination of the quantization level and the quantization length, and the results for all the data sets are presented in Figure 10.
Figure 10 shows the impact of the quantization level and the quantization length on the outlier-inlier distance (OID, Equation (12)). The left column shows the outlier points in the images. The middle column shows the MDS plots of the data points in the quantized residual preference space, using the chosen quantization level and quantization length for the multi-homography matrix estimation data (Figure 10a–c) and for the multi-fundamental matrix estimation data (Figure 10d–f). The right column presents the change of the OID with the quantization level and the quantization length, in which areas with a color closer to red correspond to larger outlier-inlier distances, i.e., to outliers that are more easily separated from the inliers.
It is clear that the trends of the OID are very similar across the three multi-homography matrix data sets, and likewise across the three multi-fundamental matrix data sets. For the multi-homography matrix data sets, the largest OIDs lie near one side of the parameter plane, and the corresponding setting of the quantization level and quantization length, which gave a good detection result, was applied to all the data sets in our experiments. For the multi-fundamental matrix data sets, the largest OIDs arise along the x-axis, which means that the corresponding parameter needs to be small. However, in the multi-fundamental matrix estimation experiments, when this parameter is too small the outliers become too dispersed to cluster into one group; in practice, a slightly larger value gives a good result while still ensuring the separability of the outliers. We define the outlier-inlier distance (OID) as follows:
$$ \mathrm{OID} = \frac{1}{|O|} \sum_{o \in O} d\!\left(m(o), m\!\left(i_k(o)\right)\right) \qquad (12) $$

where $O$ and $I$ refer to the outlier set and the inlier set, and $m(\cdot)$ denotes the three-dimensional coordinates of a point when projected into the MDS space. $d(m(o), m(x))$ is the distance between points $o$ and $x$ in the MDS space, and $i_k(o)$ refers to the $k$-nearest inlier to point $o$ in the MDS space, obtained by sorting the distances from outlier $o$ to all the inliers in $I$ in ascending order. The OID is therefore the average, over all outliers, of the distance to the $k$-th nearest inlier. In our experiments, the value of $k$ was set to 20.
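For completeness, a minimal sketch of the OID computation described above is given below (assuming the 3-D MDS coordinates of all points and index lists of the outliers and inliers are available; not the original implementation):

```python
import numpy as np

def outlier_inlier_distance(mds_coords, outlier_idx, inlier_idx, k=20):
    # OID (cf. Equation (12)): for every outlier, take its distance to the k-th
    # nearest inlier in the 3-D MDS embedding, then average over all outliers.
    O = mds_coords[outlier_idx]          # |O| x 3
    I = mds_coords[inlier_idx]           # |I| x 3
    dists = np.linalg.norm(O[:, None, :] - I[None, :, :], axis=2)   # |O| x |I|
    kth = np.sort(dists, axis=1)[:, k - 1]                           # k-th nearest inlier
    return kth.mean()
```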