1. Introduction
Modern methods of modeling, digitization, and visualization of 3D objects brought explosive growth to the number of models accessible in different databases and on the Internet. This increase has allowed the development of new methods, fast progress in 3D object classification and identification, and new approaches in 3D object search and retrieval systems. The quick, accurate, and efficient identification of 3D objects is a practical problem in robotics, self-driving cars, medical image analysis, computer vision, 3D printing, protein matching, etc.
Two standard methods are usually used to represent and visualize 3D shapes: either a 3D model is represented as a surface (for example, as polygonal/triangular meshes) or a voxel representation. Each approach has its advantages and drawbacks.
In reviews [
1,
2], a plethora of methods for 3D search for both surface and voxel representations are considered. The approaches are assessed with respect to several content-based 3D shape retrieval criteria, such as properties of dissimilarity measures, discrimination abilities, ability to perform partial matching, robustness and efficiency, shape representation requirements, and the necessity of pose normalization.
This work is directed towards solving the optimization problem of identifying two 3D objects, namely, deciding if they are the same or distinctive. In case objects and are identified as the same, it is necessary to find a rigid transformation to map them together. The problem of object identification is different from the problem of classification when it is necessary to find if the objects have the same properties intrinsic to the same class.
The rigid transformation should minimize some distance metrics between the transformed original object and the sample . If this distance is less than some predefined threshold, consider the objects similar and indistinct.
Let us assume that a 3D object is defined by its triangular mesh. Different ways can be found to map objects to each other rigidly. One state-of-the-art method is the ICP (iterative closest point for point cloud registration) algorithm [
3,
4]. It is based on iterative gradient calculation and improvement of the solution.
In [
5], the problem of shape matching is reduced to comparing probability distributions, and a method for calculating shape signatures for arbitrary (possibly degenerate) 3D polygonal models is proposed. The key idea is to represent the signature of an object as a distribution of some geometric properties (features) of the object (for example, the distance between two random points on the surface). This signature should allow distinguishing between classes of objects (for example, cars and airplanes) in a medium-sized database. Of course, the method based on such an idea can be used as a preliminary classifier, but it does not solve the problem of identifying objects. In [
6], the mapping transformation is calculated by maximizing the overlap of objects’ features.
In [
7], query methods were developed that are simple enough for novice users, and relatively reliable matching algorithms were used to work with arbitrary polygonal models based on 3D sketches, 2D sketches, 3D models, and text keywords.
The neural network approach is usually used to solve the problem of object classification and check if the object under analysis owns the common properties of the objects of some class. Ref. [
8] attempts to improve both volumetric convolutional neural networks (CNNs) based on volumetric representations of the surface and multi-view CNNs, which are based on representations with multiple representations. The authors introduce two different network architectures of volumetric CNNs and investigate multi-view CNNs with filtering having multiple resolutions in 3D. In [
9], a patch convolutional neural network (PCNN) is proposed to search for 3D models based on representations.
Another direction to solve the mapping problem by applying neural networks is deep learning-based point cloud registration. A review of the recent progress and overview of methods can be found in [
10].
2. Relevant Methods: State of the Art
Earlier works on shape representation analysis for object identification can be divided into two broad categories. The first category uses methods such as PCA to align the model in a canonical coordinate system. Then, it defines the representation of the shape by relating it to this orientation. Such methods include, among others, moments for object surfaces [
11] and Extended Gaussian Images [
12]. Kazhdan [
13] (Appendix B) mentions that PCA-based methods are unstable due to the multiplicity of eigenvalues and sensitivity to outliers. The second category of methods defines representation invariants with respect to rotation and includes methods such as Shape Histograms [
14] and Shape Distributions [
5].
In this work, we adhere to ideas based on decomposing some function given on a sphere into the sum of spherical harmonics. The works of [
7,
13,
15,
16] gave the impetus for developing such an approach to the problems of classifying 3D objects and retrieving their information from databases.
In [
17,
18], tools for searching for 3D objects were presented. The model is represented as a polygonal grid and serves as a query, and similar objects are extracted from a collection of 3D objects. The first stage of the algorithm is normalization (estimation of the pose) when the models are transformed into a canonical coordinate system. Then, feature vectors are extracted and compared with vectors obtained from normalized models in the search space. Using metrics in the feature vector space, the nearest neighbors are calculated and ranked. The objects extracted in this way are displayed for inspection, selection, and processing. Feature vectors are based on rays cast from the object’s center of mass. The distance from the center of mass to the surface in the ray’s direction sets the function value on the sphere. To evaluate the pose, a modified Karhunen–Loeve transformation is introduced, taking into account not only vertices or polygonal centroids from 3D models but also all points in polygons of objects.
In [
19], the 3D model’s symmetry descriptors are presented as a set of spherical functions that describe the measure of the model’s rotational and reflective symmetry with respect to each axis passing through the center of mass.
Papadakis et al. [
20] present a methodology for reconstructing a 3D shape based on spherical harmonics, which ensures invariance to scaling and rotation. Rotation normalization is performed using continuous principal component analysis and its application to the model surface normals. The 3D model is represented as a set of spherical functions. This representation comprises the intersections of the surface with rays originating from the origin as well as points in the direction of each ray that are closer to the origin than the farthest intersection point.
Ref. [
21] presents a 3D shape descriptor that uses a set of panoramic views of a 3D object. The views describe the position and orientation of the object’s surface in 3D space. A panoramic view of the 3D object is obtained by projecting it onto the surface of the cylinder parallel to one of its three main coordinate axes and is centered on the object’s centroid. The object is projected onto three perpendicular cylinders aligned axes to capture the global shape of the object. The corresponding 2D discrete Fourier transformation and a 2D discrete wavelet transformation are found for each projection.
Ref. [
22] presents the decomposition of spherical harmonics for spherical functions to represent 3D triangulated star-shaped objects. After splitting the surface into star-shaped sections, this result can be extended to any triangular object. The evaluation of the spherical harmonic coefficients is performed by edge integration using the Monte Carlo method, which distinguishes this work from voxel methods.
In [
23], after translating and scaling the 3D model, the concentric spheres are used to analyze the object, and a new set of functionals is defined and applied for each sphere. This results in a vector of feature descriptors. This vector is invariant to rotation and, hence, suitable for matching 3D models. Weight coefficients are assigned to each descriptor to improve the mapping.
In [
24], descriptors of 3D shapes are generated using functions on a sphere. A set of vector descriptors invariant to rotation is extracted using spherical harmonics. First, the models’ size is normalized, and then a sample of data is taken from the surface of the 3D model and converted into point cloud data. Next, spherical harmonics are applied to this point cloud to obtain invariant rotation descriptor vectors. The method proved resistant to noise on the model’s surface and allowed us to compare similarities of 3D models.
The essential part of identifying the similarity of two objects is to find a rotation that best combines two sets of vectors (or two clouds of points). Algebraic approaches to solving the problem of optimal rotation were proposed in [
25,
26,
27,
28]. These results are exact but require knowing the correspondence among the points in these two sets.
In [
29], the spherical harmonics of a pair of images are related to each other by a shift theorem, which uses an irreducible representation of the rotation group. Using this theorem, Euler angles are extracted. Regarding the required number of spherical coefficients, it is assumed that they are global image encoders and that rotations can be estimated using a simple reference table of combinations of coefficients and angles.
Esteves et al. [
30] solve the equivariance problem of 3D rotation using convolutional neural networks. The authors model 3D data with spherical functions and propose a spherical convolutional network that implements point convolutions on a sphere. The paper demonstrates that networks with low bandwidth without increasing the dataset size can demonstrate performance comparable to state-of-the-art networks.
In [
31], the fundamental problem of finding the shape and registering a point cloud for the point clouds with zero point-to-point correspondences is considered. Using the trained optimization process, the authors propose a method to represent 3D point clouds by spherical Gaussians and a rotation-invariant convolution.
In [
32], spherical harmonic functions are directly used to simulate irregular 3D particles. Discrete surface points of irregularly shaped 3D particles are represented by spherical harmonic functions with a limited number of harmonic coefficients to restore the particle’s morphology. These spherical harmonic functions are used to detect overlap between particles and calculate the interparticle contact forces, moments, and particle motions in various engineering and industrial processes.
The analysis of spherical harmonics was initially used to solve problems in geophysics, potential theory, and mathematical physics [
33]. Later, it was found to apply to the classification and identification of 3D objects.
An increasing number of 3D models of buildings are hosted on web platforms for model sharing. Therefore, based on data reuse, in [
34,
35], an approach to coding and searching 3D models of buildings using point clouds obtained by airborne light detection and ranging (LiDAR) systems is proposed. To encode LiDAR point clouds with sparse, noisy, and incomplete sampling, the authors introduce an encoding scheme based on a set of low-frequency spherical harmonic basis functions. Descriptors are extracted from spatial histograms and used to search for 3D models.
The state-of-the-art approach to solving the optimization problem of mapping objects to each other is the ICP algorithm. The solution is found by step-by-step optimizing the loss function defined at the sets of points of both objects. Based on gradient search, the ICP iteratively finds the best correspondence among the subset of points of the objects. The latest modification of the ICP is Go-ICP [
36], where Local ICP is based on a branch-and-bound (BnB) scheme.
In [
37,
38], global optimization methods based on implicit enumeration were mentioned. As per their taxonomy, these methods belong to the space-covering group, which aims at implicitly exploring the whole feasible region. Such approaches could be iterative and, at the same time, non-gradient methods. These methods started with the simplex method [
39] and the algorithms in [
40,
41] and do not require calculating the gradient; they are still iterative methods. There are several modifications of the Powell method, such as COBYLA [
42], which employs linear polynomial approximations to the loss and constraint functions by interpolation at the vertices of simplices.
Another one-pass approach to calculating the mapping of two 3D objects is based on training a deep learning network. The trained network calculates the mapping in a non-gradient and non-iterative way. Such approaches could be found in [
43,
44,
45,
46,
47], where the neural network is initially trained to extract local spatial features from each of the 3D objects under comparison. Then, the mapping is calculated based on these features.
If the number of possible extrema at the object’s surface is relatively small, it is not prohibitively expensive to find the matching of the extrema for both objects and check how good these mappings of the objects are. As the possible number of extrema combinations is assumed not high, as an option, the mapping can be found even by exhaustive search without the calculation of gradients in a non-iterative way.
This is the idea of the proposed method for solving the identification problem and finding the mapping of the objects. Let us use a spherical transformation to map the 3D object in the feature space (encoding), and for partial reconstruction (decoding), let us define a 3D shortest distance function. The values of this function are treated as random values. The summation of random values gives us several extrema (see
Section 3.9 and
Section 3.10). Calculating the mappings for the corresponding extrema for two 3D objects under comparison allows us to find the required transformation or tells us that the objects are different.
In the “Results and Discussion” section, a comparison of the proposed method with a deep learning approach (using [
43] as an example) is given.
3. Method
Let us assume that there are two 3D objects: and . Each of these objects is represented by its triangular mesh. It is necessary to define whether these two objects are identical (up to some calculation error). If they are similar (this analysis considers the surface and the internal architecture), find the transformation matrix (rotation and translation) to map them.
Let us reiterate that the problem to be solved is not to identify if the class of the objects is the same (for example, for two gears with different numbers of teeth) but to check whether either two objects could be transformed to each other by a rigid transformation.
Initially, let us describe an overview of an identification algorithm, and every step will be explained in detail in the following sections.
It is necessary to compare two 3D objects, and , for identity. Each object’s model mesh is watertight and has consistent winding and outward-facing normal vectors (hereafter, let us define this set of features as “correct geometry”). A corresponding 3D shortest distance function (SDF) is calculated for each object. The SDF value at the 3D point is the shortest distance from this point to the object’s surface. Spherical harmonic coefficients represent the calculated SDF function. Using expansions of degree , we partially reconstruct the SDF functions related to objects and .
First, the energies of the corresponding degrees of spherical harmonic coefficients are compared to analyze the objects’ similarity. Then, extrema amplification is utilized, and next, an attempt to align the most distinguishable extrema of both partial reconstructions is conducted. If the objects match with the acceptable quality, the matrix of the object mapping is calculated.
Preprocessing. For object , let us denote its surface as . The surface (the object’s mesh) is assumed to have the correct geometry.
- P1.
Shift the origin of the system of coordinates to the centroid (the center of mass of the object’s surface).
- P2.
Scale so that none of its vertices are outside the sphere of radius sixteen, and at least one vertex is on the sphere surface. Let the coefficient of scaling be .
- P3.
Place concentric spheres (shells) centered at the origin with radii (we use and ). These spheres cut .
- P4.
On the
-th sphere,
, a continuous function
is defined, and
are spherical coordinates. SDF
is the shortest distance from the point
to the surface
(see the
Section 3.1 and [
48]).
- P5.
At each sphere, a grid of points
,
, is defined. For this grid, we can use, for example, a
t-design [
49], Fliege–Maier [
50], or an icosphere [
51] set of points (see
Section 3.2). These points are employed for low-error integration of spherical functions.
- P6.
At each sphere, using , let us calculate the spherical harmonic coefficients (or Fourier–Legendre coefficients) , , (see the definition (3) below). Let us calculate spherical coefficients up to degree .
- P7.
At each
i-th sphere, let us partially reconstruct the function
using these coefficients
for
degrees and the icosphere grid
(see the
Section 3.6).
- P8.
Next, on the unit sphere, let us add all the functions
over the entire set of concentric spheres as follows:
and define a surface
. The surface points are connected the same way as the corresponding points of the icosphere.
- P9.
Find the extrema points and saddle points of the surface of the previous item.
- P10.
Build the convex hull of the points found in the previous item.
- P11.
Order these points by not increasing their radius vector lengths. Denote this sequence as .
We repeat the same preprocessing for object and denote the corresponding calculated function with replacement to , obtaining, for example, , etc.
Steps of the algorithm. Let us find a transformation matrix that maps to , if it exists.
- A1.
Let us compare the scaling coefficient and . If the objects are not equal (up to some tolerance), they are different. If we want to check the similarity of two objects while ignoring the scale, go to the next step.
- A2.
For each sphere
, calculate and compare energy for each degree
for both objects. If
for at least one degree
, then the objects are different; otherwise, go to the next step.
- A3.
Let us choose vector . This is the longest vector from sequence that has not yet been used by the algorithm. If there is no such vector, go to step A8.
- A4.
Look for subsequent vectors from , which forms an angle of at least 10 degrees with . For each such vector, pair and and go to the next step; otherwise, go to step A3.
- A5.
Find the furthest point of the convex hull from the plane , which contains the origin and vectors and . Find, if any, the corresponding points, , , and , in sequence . These points in should have the same length of vectors and angles among vectors (up to some tolerance) as the vectors in . The scalar triple product of and should have the same sign.
If there are no such vectors, go to step A3 and look for the next vector .
- A6.
Calculate the rotation matrix using the Kabsch algorithm [
25,
26] to map the triplet of non-coplanar vectors
to
. If this (or similar, up to some tolerance) rotation matrix was calculated and analyzed before, go to step A3, and look for the next vector
.
- A7.
Using the rotation matrix found, rotate
to
, and calculate the vector of spherical harmonic coefficients
for
. In the ideal case, objects
and
should coincide. For each sphere
, we calculate the cosine similarity
(i.e., find the angle between two vectors),
, and
. To evaluate the quality of the mapping, we introduce two metrics (see
Section 3.3) as follows:
The rotation matrix found is considered an acceptable solution if is more than some tolerance (we set it as , with the perfect match being 1.0). This means that the cosine similarity of each sphere is greater than this value.
If there are several acceptable solutions, the best of them is defined as the one having the minimal (the minimal sum of deviations from the perfect match of coefficients for all the spheres). If is small (we chose ), we assume that the best acceptable solution has been found; stop the search. If step A7 was executed more than 30 times, go to step A8; otherwise, go to step A3.
- A8.
Suppose none of the acceptable solutions has been found. In that case, the objects are different; otherwise, the objects are considered the same, and the best rotation matrix found at step A7 is the solution. Return the best acceptable solution found and stop the algorithm.
Thirty repetitions are, in some sense, overkill. The rationale for choosing this number and its sufficiency will be discussed in
Section 3.8,
Section 3.9 and
Section 3.10.
Let us describe the steps of the algorithm in detail.
3.1. Spherical Harmonics
Spherical harmonics are the basis functions for irreducible representations of a group of rotations in . These orthonormal functions form a basis of the Hilbert space of square-integrable function .
Let us denote
a unit sphere. Any square-integrable function
can be represented as a sum [
33] as follows:
where spherical harmonics coefficients (or Fourier–Legendre’s coefficients) are
and spherical harmonics
are defined by
Here,
are associated Legendre’s polynomials as follows:
and
are Legendre’s polynomials, which can be represented by the Rodrigues differential form as follows:
Under rotation operation, a spherical harmonic of degree and order transforms into a linear combination of spherical harmonics of the same degree. For each degree , the basic functions , form a set of irreducible representations of the group of dimensions . This means that any rotation will be a self-homeomorphism in this subspace.
Let us denote the result of a rotation around the center of the sphere (around the origin of the system of coordinates) applied to sphere as and the function = calculated on a rotated sphere as . The rotation of sphere corresponds to the rotation of the object.
First, let us translate object
the way that the centroid of
is in the center of the coordinates. For now, we assume that objects
and
are the same, and
is the rotated version of
. We would like to find the matrix of rotation
. Next, we calculate
and
, the coefficients of spherical harmonics for degrees
up to some
for objects
and
. The useful property of energies for any degree
is that the energy is not changed by rotation. As basic functions
for each
form a subspace of dimension
, the total power of function
in the subspace corresponding to the degree
does not depend on the rotation
.
This directly follows from Parseval’s theorem [
33] and the properties of the group
. The coefficients for every degree
for the rotated object could be found using, e.g., the Wigner
-matrix [
52].
Functions
and
could be expanded on a unit sphere
as
where
If we only consider a specific degree
in Formula (4), we will obtain a partial reconstruction of the functions
and
for this degree. This reconstruction will use
coefficients
and
,
for the whole expansion.
Expansions
and
are given by arrays of their spherical harmonic coefficients, {
} and {
}. Let us denote the partial expansion of the degree
of the function
as
and
for
as follows:
Function is defined by coefficients {} with . The norm of deviation of functions and could be defined as a cosine similarity of arrays {} and {} for .
As we postulated, function is a result of rotation of function , and the results of the partial reconstruction will have the same relationship—they are the rotated versions of each other. For every degree (as well as for any combinations of such degrees), the partial reconstructions of and in could be mapped by the same rotations .
3.2. Numerical Calculation of Spherical Coefficients
Let us introduce several different grids for numerical calculations for the method proposed.
The numerical integration (step P5) of the square-integrable function defined on the two-dimensional sphere is performed using quadrature formulae. This function is evaluated at some nodes in the sphere. For example, this set of points could be Fliege–Maier’s set of points on the sphere for low-error integration of spherical functions [
50]. The mentioned grid can be used for integration by directly summating the function evaluated at grid points and weighted with the respective weights. Another option could be to use
t-designs [
49], which constitute uniform arrangements on the sphere for which spherical polynomials up to degree
can be integrated precisely by summating their values at the points defined by the t-design. One more way could be to use a grid defined by the icosphere [
51].
The Gauss’ quadrature formulae with the mentioned designs provide fast and accurate calculations of spherical harmonic coefficients. In [
50], it was shown that for pre-computed nodes and weights, the integration error drops to
for many functions.
Another approach could be performing the spherical harmonic transform (SHT) in the least-squares sense. Using values
(see Formula (4)) at nodes
on the grid, the spherical harmonic coefficients
could be computed using the Moore–Penrose pseudo-inverse of a matrix [
53]. The following least-squares problem is solved with respect to variables
as follows:
In step P7, it is necessary to reconstruct a function at the sphere by its spherical harmonic coefficients. This reconstruction requires calculating the function at the preferably uniform grid. We used a tessellation [
51] built from an icosahedron. To obtain a tessellation, the following steps are performed: (i) divide each triangular face of the icosahedron into four smaller triangles by connecting the midpoints of the three sides; (ii) normalize the new triangle vertices on the unit sphere; and (iii) repeat these steps until the desired solution is reached.
After applying five such iterations to the original icosahedron, we obtained an icosphere containing 10,242 vertices. Each vertex of this triangular tessellation has six adjacent triangles, except for the original twelve icosahedral vertices with five adjacent triangles.
3.3. Comparison of Two Objects: Optimization Function
We have two 3D objects, and , to be compared for identity. Let us mathematically define the meaning of the word “identity” used in this work.
We assume that a 3D object has the surface with a correct geometry. The object is scaled the way that the centroid of its surface is at the origin of the system of coordinates, and the object itself fits into a sphere of radius 16 and touches this sphere. The number 16 was arbitrarily chosen as a parameter of the method.
Let us place concentric spheres (shells) centered at this origin. We use a set of nine spheres with radii of 1, 3, 5, …, 17. On each -th sphere, , and a continuous function is defined. This function is called the shortest distance function (SDF), and its value is the shortest distance from the point of this -th sphere to the surface . This distance could have a positive or negative sign. The distance is considered positive if the point , for which the distance to the surface is measured, is outside this surface, and negative otherwise. The SDF is a continuous function on .
Let us assume that we found the function where corresponds to object spherical coefficients of -th sphere, and the function corresponds to .
Due to the Parseval equation [
33], if the objects are the rotated version of each other, the absolute value of the difference of energies for expansions
and
for degree
should be equal to zero as follows:
Due to inevitable errors of numerical calculations, we assume that for the rotated versions of objects, the difference should be less than or equal to some threshold
for all the degrees
used for partial reconstruction. In the calculations, the difference
is considered negligible if it is either below 5% of the first object’s energy or below 0.01 (the approach is the same as the Python function
isclose with relative and absolute tolerances). Condition (6) is used to check the energy similarity of both objects in step A2.
It is also possible to use a more general form as follows:
where
As a condition of similarity, we request that
should be less than the cosine of 10 degrees. This condition is necessary but not sufficient, as shown in [
13] (Figure 5 with an airplane).
One more additional loss function used is
This is the sum of overall deviations of the spherical coefficients at each sphere. We consider the smaller the better the solution.
The loss functions, and , are used in step A7.
If we assume that function
is the result of the application of function
to sphere
(sphere
rotated by transformation
), then
. We would like to find the transformation
to map object
(corresponding to function
) to
(corresponding to function
). This transformation
can be defined as
In case the best found gives the cosine similarity measure above some predefined threshold , we consider objects and as nonidentical. We chose corresponding to 10 degrees.
One more measure of mapping accuracy could be the Root-Mean-Square Deviation between two reconstructions (5).
3.4. Behind the Comparison: Usage of Spherical Harmonics as Structural Patterns
Let us denote the partial reconstruction of function for degree as . The same denotation will be used for .
Let us start with a toy example and assume that at degree
for function
has only one coefficient with order
, say,
and
; see
Figure 1.
The partial reconstruction for degree will be the spherical harmonic multiplied by the corresponding coefficient . Let us assume that this coefficient is equal to 1. Let us take a look at the visualization of a spherical harmonic . We can see that the number of extrema points on the harmonic’s surface is not large. The reconstruction will look geometrically (spatially) the same; hence, there is the same number of maxima for the original and rotated spherical harmonic. If the number of maxima is relatively small, we can find the mapping of the maxima of to the maxima of using an exhaustive search. We can preliminarily order these maxima by their values (length of vectors) and angles between different surface extrema points to speed up the process. This ordering would allow us to find the correct mapping faster with fewer combinations to be checked.
If, in the toy example, the harmonic under consideration has a continuous rotation symmetry, like the spherical harmonic in order (3,0) (
Figure 2, see a shape in the center of the last row), then the extrema and the saddle points of the harmonic surface could be sampled at the grid with the chosen grid step. The middle part (the “equatorial’’ line) will obtain limited points. Matching of respective extrema points at both objects could still be performed using an exhaustive search. The accuracy of matching depends on the size of the grid step.
Suppose several non-zero coefficients exist in the expansion for some degree (Formula (4)). In that case, the number of surface extrema for each harmonic is limited. For the partial reconstruction (4), the number of extrema in the sum of weighted spherical harmonics participating will also be bounded. The same consideration is valid for the sum of all the degrees participating in the partial reconstruction (5).
We will show that under some assumptions, the sum of all the partial reconstructions at all the shells mapped to the unit sphere will have the expected number of top-value extrema to be relatively small (on average, below thirty). We define top-value extrema as those that are among the largest by absolute value. A more formal definition will be given in
Section 3.8.
3.5. Non-Gradient Methods
The non-gradient method could be applied if either the direction of function descent is calculated without taking a gradient, like in [
41] the simplex or Powell methods [
40,
42], or if the number of points to be checked for being the extremum is relatively small and we can examine them one by one. We will follow the second approach for the problem of mapping two 3D objects.
If we look at spherical harmonics inside correspondent degrees (see
Figure 2), we can notice that when the band has a low number, the spherical harmonics have a relatively limited number of local extrema (the local peaks on the 3D surface of the harmonic). When we find a weighted sum of these harmonics, the number of local extrema at the resulting surface will also be limited and relatively small.
Considering that function is a weighted sum of spherical harmonics, and the spherical harmonics of different degrees are independent, we can say that function itself is the sum of partial reconstructions corresponding to every degree. These partial reconstructions for degree for 3D objects and will also be rotated at the same angle as the original objects, and . This observation is accurate for any specific degree or combination of degrees. Also, it is accurate for whatever square-integrable function on the surface of a sphere is analyzed. This means that exactly the same angle of rotation will be calculated (up to the error of calculation) for every partial reconstruction of degree on every concentric sphere.
Having a moderate number of local extrema at the partial reconstruction for level for and and considering that all the rotations are around the center of coordinates, the angle of rotation to rotationally match the peaks of reconstructed surfaces to each other could be performed, for example, by an exhaustive search of all possible configurations in a low-dimensional space. This calculation provides a rotation matrix to match and . Instead of the exhaustive search (due to a limited number of top-valued extrema generated), the solution could be found by analyses of only several possible combinations. The measure of the accuracy of mapping could be, in addition to (Formula (7)) and (Formula (8)), the Root-Mean-Square Deviation between two reconstructions.
3.6. Efficient Calculation of the Shortest Distance Function (SDF)
The SDF function calculation we used is based on library [
48], where instead of the SDF, which computes the distance to all the faces of the surface, this value is approximated by the distance to the faces adjacent to the nearest neighbors of the grid point
on the
i-th sphere with radius
.
3.7. Building the Surface at Icosphere Nodes
As mentioned in
Section 3.3, we cut the object by concentric spheres. Let us consider an SDF function
—the shortest distance from the point
of the
-th sphere to the surface
. Function
is continuous by parameters
and
, as well as by the radius of the sphere and the vector
, which is defined in the spherical coordinate system. If the value of
, the corresponding vector of length
is directed opposite vector
.
At each sphere, a grid of points,
,
, is defined. We use an icosphere [
51] set of points out of 10,242 vertices after five subdivisions. This icosphere has a dense and rather uniformly distributed set of vertices. Using values
, we compute (3) the spherical harmonic coefficients
; then, we partially reconstruct function
on the unit sphere at the icosphere grid
for degrees
using (1).
As was mentioned in steps P8–P11, on the unit sphere, we add functions over the entire set of concentric spheres, see Formula (2), and connect the points to make a surface with the adjacency matrix for the icosphere. We find this surface’s extrema and saddle points and build the convex hull using the points found. The number of vertices in the convex hull built on these points is usually much smaller than the number of points in the original surface . These vertices are ordered in sequence by the length of the radius vector. We obtain a similar sequence for function .
Let us form the sequence of lengths of vectors corresponding to the vertices of the convex hull . The global maxima of will have the same values as the global maxima for and will be reached at the same icosphere grid nodes. The values corresponding to the global maxima will be concentrated in the right tail of the distribution . The number of extrema on a convex hull cannot be more than the number of extrema for the original surface on which it is built.
If objects and are similar, then to match them, we can only use the top extrema from and and later check the quality of the mapping.
The process described here will be referred to as extrema amplification hereafter.
3.8. Values of Function as Dependent Random Variables
Let us find a set of values of the function
(1) at the icosphere grid
, and consider these values as a random variable
distributed on the segment
from global minima to global maxima. The variable
at each sphere
has finite mathematical expectations and variances as follows:
Let us discuss the rationale for the summation of functions in Formula (2) and how it ensures that the number of variants to be checked to find the solution is below two on average. In the proposed method, to obtain either a satisfactory result for object rotation (if it exists) or a negative answer otherwise (objects are not identified as similar), it is sufficient to apply the matching algorithm no more than 30 times. This number is confirmed experimentally using the algorithm.
To explain how the idea appeared and what is the logic behind it, we will first carry out the justification under the unrealistic assumption that the random variables corresponding to the distributions of the values of the distance functions on each sphere are independent. Then, the possibility of using the results found for the classes of dependent random variables will be discussed.
We know nothing about the 3D object and its SDF, as no assumptions are made about the object’s shape and topology. So, first, let us consider a toy example, then how the addition of constraints affects the result, and finally, how the experimental result is compared with the expected theoretical result.
3.9. Method 1: Central Limit Theorem
Consider the sum of random variables
. Then, in particular
One of the classical formulations of the central limit theorem is contained, for example, in [
54] (p. 383, Theorem 27.2).
Lindenberg’s Central Limit Theorem. Suppose that sequence ,
is independent and satisfies (9). If Lindenberg’s conditionholds for all positive , then at .
Here
and
denotes the standard Gaussian random variable, i.e., having zero mathematical expectation and variance equal to one. The density of the standard normal distribution is
This theorem follows from the more general Lyapunov theorem [
54] (p. 385, Theorem 27.3). While the classical central limit theorem requires finite moments of order
, Lyapunov’s central limit theorem requires finite moments of order
for some
.
Lyapunov’s Central Limit Theorem. Suppose that sequence , is independent and satisfies (9). If Lyapunov’ conditionholds for some , then at .
As shown in [
54] (p. 385), the Lindenberg condition (10) follows from the Lyapunov condition (11), but not vice versa.
Note that fulfilling the conditions of the Lindenberg and Lyapunov theorems is only sufficient to satisfy the requirement of independence and not the identical distribution of random variables.
A special case of Lyapunov’s theorem is the Bates theorems [
55] on the continuous probability distribution of the mean of independent uniformly distributed random variables on the unit interval and the Irwin–Hall theorems [
56,
57] on the continuous probability distribution for the sum of independent and identically distributed random variables. These theorems also tend to have a normal distribution.
Applying Lyapunov’s theorem to the sum , we find that will approach the normal distribution. The summation is performed per each element of , and will also have 10,242 elements. Due to the trend to the normal distribution, the maximum values at the right tail of the distribution (as well as the minimum values in the left tail) will have a relatively low probability. For example, if the extremum is greater than , where is the variance of this normal distribution, then the probability of encountering these values will be 0.27%. In other words, if we have a large sample from the population of all sums , then the number of extremes in the sample above will be approximately 0.27%. There are 10,242 points in our set. This means that 99.73% of values will be expected to be less than in both directions, and approximately no more than 30 values are greater than . The matching algorithm will use these 30 longest vectors (and the corresponding pyramids—triplets of non-coplanar vectors in steps A5 and A6).
Note that the longest vectors among the extrema can be located not only on the right tail of the
distribution but also on the left tail since the terms
can also take negative values and the local minima of the sum
can turn out to be negative in value but rather large by the absolute value and, therefore, enter the constructed convex hull based on the known functional property
In the case of a continuous function on a sphere, the extrema of function built for object will correspond to the extrema of function built for object as rotation of . The first vector from the list of the extrema of must have the same length as the first vector from . However, when the function is sampled on a grid, the extrema of continuous functions are not necessarily located precisely at the grid nodes. Therefore, all grid calculations are performed with a certain error, and extrema for and may differ from each other due to the choice of sampling nodes. This is the reason behind checking several vectors’ triplets for the solution and choosing the one with the best correspondence.
If a grid on the sphere surfaces is not dense (for example, containing only 100 nodes), the possible error of calculation of the extrema is significant. In such a case, even an exhaustive search on all possible triplets of vectors does not ensure obtaining the correct answer with an acceptable error. For this reason, we use a relatively dense grid.
We realize that in the discussion above, our assumption of the independence of is hardly realistic, but at this stage of the presentation, it is only performed to explain the basis of actions, such as why the idea to sum partial reconstruction functions on all the shells has appeared and where the limited number of steps to finalize the analysis could potentially come from.
The mentioned results about the trend to normal distribution will not only be valid under the assumption of independence of the random variables ( corresponds to the spherical functions ), as in Lyapunov’s theorem formulation. In recent decades, Lyapunov’s theorem has also been proven for some classes of dependent random variables, so the trend to normal distribution will also be held for more general cases. Examples of such classes might be the following:
Martingale differences [
58,
59,
60];
-dependent and
-mixing sequences, such as [
54], pp. 387–388, and [
61], pp. 773–774, [
62,
63,
64];
The common feature for all these results is the following. The random variables “far away” from each other are almost independent in a certain sense [
70]. This happens for the parts of the object under the assumption that these parts weakly depend on each other. There are also classical results by Bruns, Markoff, S. Bernstein, P. Lévy, and Loéve (see [
71,
72,
73,
74] (chapter VIII)) for dependent random variables. The formulated conditions of these theorems are either very strict or include conditional distributions, which makes their application difficult.
These results are also aligned with the statistical extreme value theory (EVT) [
75] (chapter 3) [
76]. Both the central limit theorem and the theory of extreme values allow us to estimate the probability of a rare event. This event can be the result of the accumulation of many events or be only one event exceeding some critical threshold [
76]. Unfortunately, for these theories, the essential requirement is the identical and independent distribution (i.i.d.) of random variables.
The SDF functions corresponding to the spheres cutting the surface are the distance functions. The SDF is continuous by polar angles and the radius. This means that the neighboring spheres at the points with the same angular coordinates cannot differ by an arbitrarily large value. Therefore, the respective values of SDF functions on spheres are dependent.
In [
77], the case of extreme value theory for dependent random variables was considered. It was shown that if the variables are not identically distributed or are dependent, the result is similar to the case of independent identically distributed random variables. However, as usual, the result relies on very restrictive conditions. Checking the feasibility of these conditions is also a non-obvious and sometimes non-formalized process.
3.10. Method 2: Multiplication Rule of Probability and Chebyshev’s Inequality
Let us assume that we have a random value distributed at the segment and also a random sample of this distribution. Let us call the significant extrema those that correspond to the leftmost (rightmost) bin in the histogram, and the length of each bin is ½ * 0.27% of the length of the segment . This arrangement arises from the following considerations. For the normal distribution, the part of the distribution outside the interval should contain approximately 0.27% of the whole distribution. If we take the uniform distribution at segment , these last bins are expected to receive the same 0.27% of the overall samples. Later, to map the 3D objects, we will only consider the extrema from these last bins.
Let us explore another way to show that the number of significant extrema of summation of two spherical functions will not increase compared to the number of significant extrema of each function. This consideration is based on the probability multiplication theorem and does not even require a Gaussian assumption.
Since the same number of nodes (namely, 10,242) is used on each of spheres, we divide all intervals, and , by the same constant number of bins. Let be the probability of the random variable to be a significant maximum, i.e., to get into the rightmost bin or to the leftmost one. Let us reiterate that the length of both bins is 0.27% of the length of segment .
According to the multiplication rule of probability, we have
, where
is the probability of occurrence of both events, i.e., the probability that the sum
is the significant maximum, and it includes the significant maxima for
and
.
is the conditional probability of
occurring given that
occurs, i.e., the probability of a point on the sphere to be a significant maximum of
, when
is a also significant maximum. In the case of independent random variables
and
, the conditional probability is
. As we see, in any case, it turns out to be
since
. On the other hand,
and
, thus
By induction, a similar conclusion can be drawn for
random variables as follows:
Due to the multiplication rule of probabilities
Formula (12) only shows a non-increase of the number of significant extrema in the sum and does not provide a quantitative assessment.
For significant minima, the same considerations are valid.
Chebyshev’s inequality allows us to quantify the probabilities , …, as being significant extrema in (12).
Chebyshev’s Inequality. If a random variable has expectation and variance ,
then the inequality holds for any If we consider the sum from Method 1 as a random variable (without any assumptions about the dependence or independence of its terms), then, we estimate the probability that the deviation of this quantity from its expectation will be more than three standard deviations in absolute value.
We have
and
; then, according to Chebyshev’s inequality
This estimate is not really helpful in practice as it corresponds to the worst-case scenario.
From the two approaches described above, we see that the weaker the restrictions imposed on the class of random variables, the rougher (less accurate) the theoretical estimates are.
For this reason, we decided to analyze the actual data statistics and compare how close they are to the theoretical results (see
Section 4).
4. Results and Discussion
We use a dataset out of 1309 STL files collected from ShapeNet (The Princeton Shape Benchmark (PSB), Version 1) [
78,
79], Princeton ModelNet 40-Class Subset [
80], Purdue Engineering Shape Benchmark (ESB) [
81], and internet websites. Every model used has a correct geometry.
The extrema amplification process described in
Section 3.10 (Method 2) generates the most prominent extrema. The amplification includes summation, which is performed per node for all the shells, looking for the extrema and saddle points, and building the convex hull using those points. We expect that the sum assumes a certain amount of independence of the multiple components (as discussed in
Section 3.9 (Method 1)).
We ran the method described on the database of 3D objects for 1309 objects. It was found that in 94% of cases, the first attempt (performed by steps A5–A7 of the algorithm) to align the objects using the amplified extrema was immediately successful. Five attempts were enough to achieve 99% success. The maximum number of attempts was set to 30, and it allowed us to find the solution in all the cases.
Let us find the rationale for this small number of attempts to find the solution.
For each object of the dataset mentioned, we calculate the values
The first one is the sum of the spreads of functions defined by (1), and the second one is the spread of the final sum ; see Formula (2).
Let us consider the ratio for the objects of the dataset. It turned out that the expectation and variance of this ratio are 0.90 and respectively. It can be interpreted that in 90 percent of cases, there is at least one extremum on each sphere with the same angular coordinates.
The total number of extrema points in the last bins near and appeared following two trends. If the object does not have symmetrical (repeatable) parts, the number of points in the last bins is less than 30 on average. It corresponds to approximately 0.27% of all the points (recall that we use an icosphere with 10,242 vertices).
The second trend manifests itself when the number of points in the last bins is relatively high, and this number can come to above two hundred. This situation appears when some parts of the object are similar or the object has some symmetry (for example, rotational symmetry). These symmetrical/similar parts may contribute a large number of extrema points of the convex hull to the last bins. This trend appears when all right-part probabilities are high in inequality (12). If there is some rotational self-similarity of the object, this number of extrema could be rather large, but all these extrema will define the same transformation of the 3D objects.
From this reasoning, in both cases, we have to try matching only a limited number of the extrema to check the object mapping. The results of running the method on a dataset demonstrate that a few attempts to match the objects (steps A5–A7) are enough to conclude whether the object is the same or different.
Due to grid sampling of the square-integrable functions and from (4), the matching could have some errors, and the best matching using metrics and (step A7) will define the best acceptable solution (if found).
We propose an algorithm to identify similar objects. It is a non-iterative and non-gradient method of optimization. The current state-of-the-art mapping algorithms are the ICP and Go-ICP.
For complex shapes with many local extrema and tens or hundreds of thousands of vertices, the ICP might be much more time consuming (say, 3 min vs. 30 s of our algorithm) and sometimes stuck in the local quasi-optimal solution (yellow and blue shapes do not match each other’s after rotation found by the ICP,
Figure 3).
Two algorithms (the ICP and ours) use different approaches to search for optimal mapping. Our approach is based on structural correspondences of spherical harmonics. It will not catch and, hence, not align the elements that are described by spherical harmonics of high degrees (greater than ). The idea to stop at is based on the following rationale. We work with a grid, and the errors in calculating coefficients for higher degrees are affected by the errors made for coefficients of lower degrees.
The accuracy of our algorithm and the ICP is analyzed in the following way.
Initially, we mapped both objects into the sphere of radius 16. The center of the sphere is the centroid of the object’s surface, and the farthest from the centroid point of the object mesh lies on the sphere’s surface.
When we look for the mapping of two objects as input data for the ICP algorithm, we use the point cloud, which contains all the mesh vertices. The number of vertices in the point cloud may reach millions. The mapping error is the average of all the distances from the mesh vertices of the first object to the surface of the second object after mapping. Suppose the ICP is converged to the global extremum. In that case, the average mapping error is usually small, around 10
−9 and less, and running time, on average, is around 0.18 s (we used Open3D implementation of the ICP). If the ICP converged to the incorrect optimum, the error could be arbitrarily huge, and the time to complete the run could reach up to 30 min for huge STL files (
Figure 3, top and middle row examples).
For our algorithm implementation, the computation time varies from 8 s for relatively not very complex meshes (<50,000 vertices) to below a minute for a huge mesh with more than 1M vertices. This time includes reading the STL file of the model, which could reach a hundred megabytes and even gigabytes in size.
To compare the ICP and our method, we chose the Princeton dataset [
78] and a threshold of 0.14 for the maximum mapping error ME (
Table 1). This threshold is an average value for ME for the dataset [
78] when using our approach. It divides the dataset approximately by half. For 53% of the objects in the dataset, the ICP has the maximum ME below this threshold, and other cases demonstrate relatively high errors. Suppose we do not consider these two datasets separately. In that case, the error received in the second case can make the average error of the ICP method extremely high despite great accuracy for the other half of the data samples.
Table 1 shows that our algorithm gave an average error (average distance to the surface) below 0.2 with a standard deviation of this value of less than 0.1. The maximum point-wise deviation for the whole dataset used was below 3. We can see that if we compare our approach with the ICP for the entire dataset, our method looks more accurate. Unfortunately, this commonly accepted type of analysis does not consider that the ICP has two clusters with different behaviors, as
Table 1 shows.
Methods with the same setting should be considered to compare the accuracy and efficiency of different approaches correctly. That is why we checked the validity of our algorithm by comparing it with the ICP (which performs a mapping of point clouds and not subsampled data, as neural networks do; see
Section 4.1). The ICP appeared to be the best method for comparison, but unfortunately, the settings of both methods are not perfectly the same. The ICP does the mapping of the point clouds to find the best mapping. Our method task is either to find a rigid transformation of two objects or to reject the identity of these objects. Let us see how different the solutions are received. Assume we have two objects, as shown in
Figure 4a,b. The ICP method provides the mapping shown in
Figure 4c. It maps the animal bodies and ignores the failure to map the absent fangs of cat (b) to cat (a). Our algorithm finds that both objects are not identical and stops processing.
4.1. Deep Learning Approach vs. Statistical Analytical Approach
The review of the current deep learning approaches shows they have definite constraints. As an example of a deep learning approach, let us consider the ICCV 2019 paper by Wang and Solomon [
43]. It describes a non-iterative and non-gradient method based on applying deep learning. The deep learning model should be trained on a dataset of 3D objects, and later, the transformation to map the objects will be produced.
In the mentioned manuscript, the following are pointed out:
Current deep learning networks can only handle object-level point clouds (each usually has around 500 to 5000 points); this is a common limitation of recent point cloud learning methods.
The deep learning model can be improved by iteration or “polishing” via the classical ICP (iterative closest point algorithm).
The deep learning model is trained on some input data. The application of the method to unseen categories of objects shows a sharp drop in accuracy.
The extraction of the rigid motion is performed by solving 3 × 3 SVD eigenproblems. This solution might be unstable due to the multiplicity of eigenvalues and sensitivity to outliers, as was shown by Kazhdan [
13] (Appendix B).
Usually, the proposed method is compared with the state-of-the-art ICP method and its modification, the Go-ICP method [
36]. The ICP method may converge to the global or local extremum. The convergence to the local extremum could show a huge discrepancy with the ground truth solution. Calculation of the average accuracy of converged and non-converged (local extremum convergence) cases could show the inferiority of the ICP, even if it has higher or comparable accuracy in an overwhelming majority of all the cases in comparison with the deep learning method. We think the accuracy should be calculated separately for the category of globally converged data, with the percentage of locally converged cases in the dataset mentioned.
We would like to mention the following:
Our algorithm does not depend on the end-to-end trained model, and we propose a solution based on a statistical analytical approach.
It does not matter to our algorithm which class of objects it uses; its accuracy does not depend on whether it has seen this class before.
Our algorithm was able to process 3D models with tens of millions of vertices and find global extrema, even when the ICP cannot, and the memory requirements for a deep learning approach are excessive.
It considers the internal architecture of the object, not only the tiny amount of surface points.
Our algorithm does not use a voxel representation of the object; this representation may result in the removal of the fine details of the object and would make the surface rough.
The proposed approach to finding extrema statistically by summating random values is not among classical techniques or modification of the known approaches; to the best of our knowledge, it has not been used before by anyone.
Summarizing the abovementioned, we can conclude the following.
The data preparation for neural network processing requires either voxelization of the object inside the box with a limited number of overall voxels or choosing a limited subset of points to represent the object (usually around 1 K points only). We work directly with the mesh of thousands, and even millions of vertices, and subsampling of this mesh would affect the accuracy of mapping objects with fine details and complex internal structures. Additionally, the comparison result would significantly depend on the choice of these 1 K points. That is why our approach cannot be directly compared to the neural network approaches.
4.2. Cases When Our Approach Does Not Work
It would be naïve to assume that this method would work in all the cases. We found that when the object’s surface has many superficial details, which mainly look like a texture, our method may generate the transformation corresponding to substandard results; see
Figure 5, top and bottom rows. The issue, likely, is in a large number of components at the surface of the object. The number of degrees
used for reconstruction is probably not enough to catch the correct alignment of a large number of similar components. In support of this claim, we can refer to
Figure 5, a bottom left cell with a stress ball, where the alignment is acceptable, likely due to the not-so-big number of repeated components.
4.3. Advantages of Our Method
In our work, we followed up and further developed the ideas described in [
7,
13,
15,
16]. In these works, the voxelization of the surface is considered, and the approach is based on the expansion of the indicator function defined on each concentric sphere crossing the surface into a sum of spherical harmonics. The value of the indicator function on the corresponding sphere is 1 if there is an intersection with the surface and 0 otherwise. For the partial reconstruction of discontinuous function by its spherical coefficients, in addition to computational errors associated with surface voxelization and sampling the function on a grid, the Gibbs effect takes place in [
82] (chapter 4). This effect manifested near the discontinuity point of a function [
83] as a jump in the sum of its Fourier series, which exceeds the jump of discontinuity of the function itself by 17.9%. In contrast, in our work, we consider the triangular surface of the object, not voxels, and only continuous functions
defined on the concentric spheres are used. Their sum
will also be a continuous function as the sum of a finite number of continuous functions.
Modifications made in our approach allow us to improve the accuracy of the identification. The summation of partial reconstructions at the grid nodes over the concentric spheres allows us to obtain the final grid. After summation over all spheres, the extrema on each sphere become even more prominent among the other sum values on the grid. We find the extrema and saddle points on this grid and build a convex hull on the found points. When we look for the mapping to match both 3D objects, we try to map the essential extrema of the respective objects. The amplification of the extrema causes a reduction in the number of mappings to check, so the final result is obtained in a limited (fixed) number of steps without calculation of the gradient.