Firstly, we perform initial segmentation of the objects based on the superpixel clustering algorithm. After obtaining the complete object segmentation results, the composition of each object’s super-voxels and facets, as well as the connection relationships between each component, can be synchronously obtained. Based on this premise, we use the simplest logic to analyze the geometric types of objects, extract the geometric parameters of objects, and model them as prior knowledge. Firstly, according to the covariance eigenvectors, the probabilities of panels belonging to planes and surfaces are calculated, and the types of panels constituting the object surface are initially determined. Then, based on the combination of different planes and surfaces, the geometric models of basic geometric bodies are constructed, and the geometric bodies are divided into two categories: planar geometric bodies and curved surface geometric bodies. In each category, the specific geometric body type is determined based on the relationship between the main matching surface and its adjacent facets. The parameters of the specified type of surface are extracted using the random sample consensus algorithm, and the size of the object is obtained by combining the directed bounding box for modeling.
3.1. Determining the Type of Planar or Curved Surface
Common surfaces can be simply classified into two main categories: plane and curved surfaces. Curved surfaces include cylindrical, conical, and spherical surfaces. Different combinations of plane and curved surfaces can form simple geometric shapes. Therefore, before analyzing the specific geometric type of an object, it is helpful to roughly classify its constituent surfaces, which facilitates quick determination of their respective types. This paper focuses on the judgment of planar and curved surfaces, including the selection of neighboring areas and feature calculation based on the covariance matrix.
3.1.1. Selection of Search Neighborhood
Currently, the neighborhood of a given three-dimensional point can be divided into a spherical neighborhood, a cylindrical neighborhood, and a fixed-point neighborhood. Spherical neighborhood and cylindrical neighborhood refer to searching for points within the corresponding shape around the given point as neighbors. The neighborhood shape is simple and symmetric. On the other hand, fixed-point neighborhood refers to finding the specified number of points closest to the given point as neighbors, so the obtained neighborhood shape is not fixed. For different point cloud data, neighborhood selection is generally carried out through empirical or heuristic methods. In addition, considering the three-dimensional structure of the point cloud and the local point density, some single-scale and multi-scale neighborhood search methods have been proposed to meet the demand for accurately extracting geometric features. For single-scale neighborhood search, geometric features calculated using points within a smaller radius range lack stability and are susceptible to noise and outliers. Points within a larger radius range lead to over-smoothing of calculated geometric features, making them unable to reflect the true shape [
16].
In this study, in order to quickly and simply determine the type of the current face, while considering point cloud noise and uneven density, the stable characteristics of the central region of the segmented face are fully utilized, and a spherical neighborhood is defined with the center of the face as the center of the sphere. Based on this, a multi-scale spherical neighborhood search is performed to calculate subsequent covariance features, which can accurately and stably determine the type of the face.
The schematic diagram of the multi-scale spherical neighborhood search region for a single face is shown in
Figure 2. The red dots in the figure represent the centers of the face, which are used as the centers of the spherical neighborhoods. Three radius values are uniformly selected as search radii, with the minimum width of the face as the upper limit of the radius. The arrows of different colors in the figure indicate the selected radii. By organizing and managing the points within the face using KD trees, the spherical neighborhoods of the current face under different search radii can be quickly obtained for the subsequent calculation of covariance features.
3.1.2. Feature Calculation Based on Covariance Matrix
To construct the covariance matrix using the points within the centroid and its search neighborhood, the eigenvalues
of the covariance matrix are calculated through principal component analysis. Based on the different quantity relationships between the eigenvalues, the corresponding dimensional features can be derived. The specific calculation formula is as follows:
where
represents the one-dimensional linear degree,
represents the two-dimensional planar degree, and
represents the three-dimensional scattering degree, satisfying
.
can be used to estimate the similarity between the local shape of the face and a plane. A higher value indicates a smoother face, thus, it can be used to distinguish between planes and curved surfaces [
17].
For curved surfaces, taking a sphere as an example, the degree of curvature of the sphere varies with different radius values. A sphere with a larger radius corresponds to a smaller curvature, indicating that the local shape is closer to a plane. The calculation formula for estimating surface curvature using eigenvalues is as follows:
When the
value is larger, it indicates that the shape of the face is more curved. In order to achieve a more uniform expression, the curvature of the local shape is defined based on
as follows:
When the value of
b is larger, the current face is closer to being a plane. The probability of the current face belonging to a plane or curved surface is described by combining the planarity
and curvature
b. The calculation formula is as follows:
For a face, the corresponding
values are calculated based on the points within the spherical neighborhood at different scales as discussed in the previous section, and then they are fused using a weighted approach. The weighted formula is as follows:
where
represents the calculation results at different scales and
represents the corresponding weight values. The results of the spherical neighborhood calculation at three scales are weighted using the average noise of the point cloud. The definition formula for
is as follows:
Among them,
t represents the average noise amplitude of the point cloud, which is generally set based on the average density of the point cloud. In the case of high point cloud noise, the value of
is larger, which can balance the final value obtained by the larger radius. Conversely, when the point cloud noise is low, the value of
is larger, which can use more locally stable features to ensure the accuracy of the results. For a given threshold
for judging planar and curved surfaces, when
, it indicates that the current face is classified as a plane; otherwise, it is classified as a curved surface. The proposed method of combining multi-scale neighborhoods with covariance matrix eigenvalues can effectively make preliminary judgments on face types in the presence of noise and outliers in point cloud data [
18].
3.2. Recognition and Modeling of Regular Geometric Shapes
The surfaces of objects in real indoor scenes are mostly composed of geometric primitives such as planes, cylinders, cones, and spheres. The Random Sampling Consistency Algorithm (RANSAC) can be used to search for the basic geometric primitives mentioned above in 3D point clouds, as well as to extract parameters from specified types of geometric primitives. RANSAC is a hypothesis- and validation-based method that generates hypothesis model parameters based on the minimum number of sample points, and uses all data points to validate and update model parameters. Compared to the process of fitting model parameters using all data points using the least squares method, the model parameters estimated by RANSAC using the minimum subset and local points method are more robust, especially suitable for processing point cloud data with more outliers.
For known geometric primitive types, first the minimum subset required for fitting is determined, such as determining at least three non collinear points on a plane in space. Then, the minimum subset is randomly selected to estimate the parameters of geometric primitives. By determining whether all other data points comply with the current model, the data points are divided into local and external points. Update the model parameters using local points and continue to iterate the above process for the remaining points until the local points are no longer amplified and meet the set threshold requirements [
19]. At this point, the optimal parameters of the geometric primitive model are obtained. If the optimal model parameters cannot be obtained in the end, it indicates that the current patch does not match the specified type, thus achieving verification of geometric primitive type judgment.
This paper mainly focuses on the study of simple geometric objects, including cuboids, cylinders, cones, and spheres. By determining the geometric type and extracting parameters of segmented objects, the modeling of regular objects in the original scene is achieved. The surface of a geometric object is composed of geometric primitives, and the Random Sample Consensus algorithm can be used to verify and extract parameters of known geometric primitives, providing a theoretical basis for subsequent geometric modeling. Based on the judgment results of internal faces of various objects mentioned earlier, a basic geometric model graph is established, categorizing geometric objects into planar and curved ones. For different types of geometric objects, their specific types are determined based on the combination information of internal face patches and the relationship of surface mean curvatures, combined with geometric primitive parameters and geometric shape parameters to perform the modeling.
3.2.1. Basic Geometric Model Graph
Several geometric primitives can be combined to form some basic geometric shapes, such as rectangular cuboids, cylinders, cones, and spheres, etc. After the initial type judgment of the surfaces composing an object, i.e., determining whether they are planar or curved, the following standard geometric models are defined based on the normal vector relationship of the planar and curved surfaces within the basic geometric shapes, as shown in
Figure 3. In the figure,
P represents a plane,
C represents a curved surface, ⊥ represents the perpendicularity between the normal vectors of adjacent surfaces, and the absence of notation indicates that there is no clear relationship between the normal vectors of adjacent surfaces.
Considering the phenomenon of excessive segmentation on object surfaces during the region-growing process, as well as various factors such as single-viewpoint acquisition and varying degrees of occlusion among objects, in order to quickly determine the geometric type of an object, the object can be divided into two main categories: planar geometry and curved geometry based on the combination of surface types within the object. Next, specific situations that appear in different categories will be discussed and processed separately in order to complete the reconstruction of regular objects in the scene.
3.2.2. Recognition and Modeling of Planar Geometric Objects
The most common indoor scenes are dominated by flat structures such as desktops, floors, and boxes. Using flat surfaces as the main matching surfaces allows for quick recognition of geometric objects that contain flat structures. For objects with only flat surfaces, the largest face is first identified by finding the largest visible face in the current perspective. This face is considered as the main matching surface. The relationship between the face normal vector and the adjacent face normal vectors is then determined. If the normal vectors are perpendicular to each other, the object is considered as a cuboid, corresponding to packaging boxes and other similar structures in the scene. If the normal vectors are parallel to each other, further fusion of the faces is needed, using a larger flat surface as a whole, corresponding to desktops, walls, or floors in the scene.
The initial parameters of a flat surface can be calculated from the center and normal vector of the face. Based on this, the RANSAC algorithm is utilized to fit the parameters of the plane, which speeds up the fitting process of optimal parameters. After determining the specific type of the flat geometric object and the parameters of each face, direct modeling of the geometric object is not feasible without knowing the object’s dimensions. To address this problem, the minimum oriented bounding box is computed for the segmented object, according to the construction method of the bounding box. For a cuboid, two perpendicular plane normal vectors can be used as the first and second principal axes, and the third principal axis can be obtained by utilizing the property of mutually orthogonal coordinate axes. Then, the point cloud of the object is projected onto the three directions to obtain the maximum and minimum values of coordinates in each direction. This can determine the length, width, and height of the cuboid, and also obtain the center of the bounding box to determine the object’s position in the scene.
Modeling of segmented objects is performed by combining the geometric parameters of the internal planes of flat geometric objects with the dimensions of the oriented bounding boxes. To visually display the modeling results, point cloud data corresponding to the generated geometric objects are obtained using known parameters, representing the reconstruction results. The reconstruction results of flat surfaces and cuboids in the scene are shown in
Figure 4.
Figure 4a shows the original point cloud data with certain missing parts. The green-bordered box in
Figure 4b represents the oriented bounding box for the object.
Figure 4c shows the overlaid result of the reconstruction and the original point cloud data, from which we can observe that the reconstruction effectively fills in the missing original data on the flat objects [
20].
3.2.3. Recognition and Modeling of Curved Geometric Objects
Common curved geometric objects include cylinders, cones, and spheres. Due to factors such as the capturing angle and object occlusion, cylinders and cones exist in two forms in actual capturing scenes, namely planar-surfaces combination and curved-surfaces combination. Due to their different manifestations, the logical processing of recognition and parameter extraction for each curved geometric object is also different.
When the segmented object belongs to the planar-surfaces combination type, the relationship between the plane and the adjacent curved surfaces’ normal vectors is determined by using the plane as the auxiliary matching surface. If the normal vectors of the two are perpendicular to each other, it is a cylinder, corresponding to objects like cups or other cylindrical objects. If there is no perpendicular relationship between the two normal vectors, and the angle with the normal vector of the adjacent surface remains unchanged, it is a cone.
Due to occlusion or capturing angle, cylinders or cones may also exist with only the curvature of surfaces being captured, in which case, all faces of the object are curved surfaces. In addition to this, spheres are also considered, as they are natural geometric objects consisting only of curved faces. For objects with only curved surfaces, the identification of cylindrical surfaces, conical surfaces, and spherical surfaces is done by analyzing the relative magnitude of the principal curvatures at each point in the object’s point cloud data. The values of the principal curvatures can be obtained using the calculation formula, assuming the maximum principal curvature is denoted by and the minimum principal curvature is denoted by . The geometric primitive type of the curved surface object can be determined based on the relationship between the extrema of the principal curvatures using the following criteria: (1) Cylinder: , and remains unchanged. (2) Cone: , and varies. (3) Sphere: , and both and are positive constants.
After determining the initial geometric primitive type of the segmented object using the aforementioned method, as prior knowledge for the RANSAC algorithm, the corresponding geometric primitives are used for parameter extraction. For spheres, the modeling can be directly accomplished by using the sphere center and radius. However, for cylinders and cones, after obtaining the parameter equations of the cylindrical surface and conical surface, the point cloud of the objects needs to be projected onto the corresponding axis, and the distance between the two farthest projected points is taken as the height of the cylinder and cone.
To visually demonstrate the modeling effectiveness of curved geometric objects, point cloud data of the corresponding surfaces of known geometric parameters and dimensions are generated. The reconstruction result of a single curved surface object in the scene is shown in
Figure 5, where
Figure 5b represents the oriented bounding box established for the original point cloud. As can be seen from
Figure 5c, for different types of curved surface objects, the reconstructed geometric point cloud fits well with the original point cloud and effectively fills in the missing data based on the obtained 3D model.