1. Introduction
A point cloud is a collection of points measured from the surface of an object, typically with a laser scanner. Such data is used in various fields, including autonomous driving, space reconstruction, 3D printing, drones, robotics, and digital twins. Point clouds are also used to measure and inspect manufactured parts and to generate 3D CAD models, a process called reverse engineering. Reverse engineering is a design process that analyzes or reproduces a product from a physical part, and it is useful for experimenting with, evaluating, and verifying conceptual designs for new products. Among its applications, reverse engineering of plant facilities is in increasing demand.
Reverse engineering of plant facilities is used for factory relocation, modernization, capacity expansion, and efficiency improvements of facilities that are already built and operating. It also makes drawings easier to maintain, for example by supporting part-interference checks. Simulations in virtual space, such as virtual maintenance and simulation-based training built on reverse-engineered models, are utilized in various areas [1].
To reverse engineer a plant facility, workers first install laser scanners at various locations to acquire point data. The measured data is integrated into a common coordinate system through the registration process and then manually converted into building information modeling (BIM) data. Plant facilities are generally so large that the measured point cloud data is also very large; consequently, manually constructing BIM data requires a great deal of manpower and time. To reduce this burden, many studies on automating modeling from point clouds have been conducted. Since pipes make up a large portion of plant facilities, an automatic pipe estimation algorithm can substantially reduce the opportunity cost of manual labor.
Because pipes account for a large proportion of plant-facility reverse engineering and are relatively easy to automate, much research has focused on them. In general, automatic pipe estimation proceeds through normal estimation, filtering, segmentation, and model estimation, as shown in Figure 1.
Normal estimation computes, for each point, the normal of the local surface formed by the point and its neighbors; filtering removes noisy points based on the computed normals. Segmentation groups the points that belong to the same object, using the normals of the denoised data, and model estimation fits a shape to each segmented group.
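As a rough illustration of the normal-estimation step, the following Python sketch (a minimal example of our own, assuming a numpy point array and a k-nearest-neighbor query via scipy's cKDTree; it is not the implementation of any cited work) estimates each point's normal as the eigenvector of the smallest eigenvalue of its neighborhood covariance:

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_normals(points: np.ndarray, k: int = 16) -> np.ndarray:
    """Estimate a unit normal per point via PCA of its k nearest neighbors."""
    tree = cKDTree(points)
    normals = np.empty_like(points)
    for i, p in enumerate(points):
        _, idx = tree.query(p, k=k)          # indices of the k nearest neighbors
        nbrs = points[idx] - points[idx].mean(axis=0)
        cov = nbrs.T @ nbrs                  # 3x3 neighborhood covariance
        eigvals, eigvecs = np.linalg.eigh(cov)
        normals[i] = eigvecs[:, 0]           # eigenvector of the smallest eigenvalue
    return normals
```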
When preprocessing is complete, the pipe is selected from the recognized objects, its length and diameter are determined, and the pipe modeling is completed. The Hough transform is generally used for this: the normals of the points on the pipe are mapped onto the Gaussian sphere, and the pipe direction is determined by voting. Figure 2a illustrates this process.
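A minimal sketch of this voting step (our own illustration, not the authors' implementation; random candidate axes stand in for a regular tessellation of the Gaussian sphere, and the tolerance is an arbitrary assumption): since cylinder surface normals are orthogonal to the cylinder axis, each normal votes for every candidate axis it is nearly perpendicular to.

```python
import numpy as np

def pipe_axis_hough(normals: np.ndarray, n_candidates: int = 1000,
                    tol: float = 0.05) -> np.ndarray:
    """Pick the axis direction to which the most normals are perpendicular."""
    rng = np.random.default_rng(0)
    cand = rng.normal(size=(n_candidates, 3))
    cand /= np.linalg.norm(cand, axis=1, keepdims=True)   # random unit axes
    votes = (np.abs(normals @ cand.T) < tol).sum(axis=0)  # |n . a| ~ 0 => vote
    return cand[votes.argmax()]
```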
Once the direction of the pipe is determined, each point of the pipe is projected orthogonally onto a plane whose normal coincides with the pipe direction; the circle formed by the projected points gives the diameter of the pipe, completing the pipe modeling, as shown in Figure 2b [2,3,4,5,6,7,8].
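The projection-and-circle-fit step might look like the following sketch (again our own, under our own conventions; the algebraic Kåsa least-squares circle fit stands in for whatever fitting procedure the cited works actually use):

```python
import numpy as np

def fit_pipe_circle(points: np.ndarray, axis: np.ndarray):
    """Project points onto the plane orthogonal to `axis` and fit a circle."""
    axis = axis / np.linalg.norm(axis)
    # Build an orthonormal basis (u, v) of the projection plane.
    helper = np.array([1.0, 0.0, 0.0]) if abs(axis[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(axis, helper); u /= np.linalg.norm(u)
    v = np.cross(axis, u)
    xy = points @ np.stack([u, v], axis=1)          # 2D coordinates in the plane
    # Kasa fit: x^2 + y^2 + a*x + b*y + c = 0, solved by linear least squares.
    A = np.column_stack([xy[:, 0], xy[:, 1], np.ones(len(xy))])
    b = -(xy[:, 0] ** 2 + xy[:, 1] ** 2)
    (a_, b_, c_), *_ = np.linalg.lstsq(A, b, rcond=None)
    center2d = np.array([-a_ / 2.0, -b_ / 2.0])
    radius = np.sqrt(center2d @ center2d - c_)
    return center2d, radius                         # diameter = 2 * radius
```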
Existing methods perform segmentation after normal estimation, so accurate normal estimation is very important. The accuracy of laser sensors has improved greatly in recent years, and the error in their point data is very small. Mobile augmented reality devices (including stand-alone devices such as Facebook Smartglasses and the Microsoft HoloLens), however, operate at low power and produce comparatively inaccurate data. We therefore need to compensate for these shortcomings with a fast pipe estimation algorithm that does not require normal estimation.
Among the methods that estimate pipes from a point cloud without normal estimation are those based on sphere fitting and principal component analysis. These methods compute parameters through mathematical models of spheres and straight lines, with random sample consensus (RANSAC) used for noise reduction. By fitting the parameters of the sphere model, we can construct a group of central-axis candidates (center points and radii). This candidate group is itself a point cloud, and the central axis of the pipe is estimated from it using the lengths and average radii of the regions that fit a linear model. The approach is relatively fast and requires no normal estimation, making it suitable for mobile augmented reality devices. This paper therefore demonstrates the possibility of real-time pipe estimation on mobile augmented reality devices based on this method.
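A minimal RANSAC sphere-fitting sketch in this spirit (our own illustration; the iteration count, inlier threshold, and sampling scheme are assumptions, not the paper's exact parameters): each accepted sphere contributes its center and radius as one central-axis candidate.

```python
import numpy as np

def ransac_sphere(points: np.ndarray, iters: int = 200, tol: float = 0.005):
    """Fit a sphere with RANSAC; returns (center, radius, inlier mask).

    A sphere |p|^2 + a*x + b*y + c*z + d = 0 is determined by 4 points via
    a linear solve; inliers are points whose distance to the sphere < tol.
    """
    rng = np.random.default_rng()
    best = (None, 0.0, np.zeros(len(points), dtype=bool))
    for _ in range(iters):
        sample = points[rng.choice(len(points), 4, replace=False)]
        A = np.column_stack([sample, np.ones(4)])
        b = -(sample ** 2).sum(axis=1)
        try:
            a_, b_, c_, d_ = np.linalg.solve(A, b)
        except np.linalg.LinAlgError:
            continue                                   # degenerate sample
        center = -0.5 * np.array([a_, b_, c_])
        r2 = center @ center - d_
        if r2 <= 0:
            continue
        radius = np.sqrt(r2)
        dist = np.abs(np.linalg.norm(points - center, axis=1) - radius)
        inliers = dist < tol
        if inliers.sum() > best[2].sum():
            best = (center, radius, inliers)
    return best
```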
Augmented reality (AR) is a field of virtual reality (VR): a computer graphics technique that synthesizes virtual objects into the real environment so that the real and the virtual appear together. With the recent development of augmented reality devices, low-power wearable devices have been released and applied in various fields, for example, distance information overlaid via a smartphone's camera, supplementary explanations in education, additional information needed for flight, and model visualization during building project meetings. The application of AR/VR is expanding rapidly [9,10,11,12].
Various studies using point cloud data and augmented reality devices have been conducted. Gupta developed a method using LiDAR data to solve the problem of augmenting dimensional information in a mobile augmented reality system [13]. Placitelli developed a face alignment tool that can be applied to real-time video streams using RGB and depth images [14]. Gao developed an AR framework for accurate and stable virtual object matching for realistic tracking and recognition [15]. Mancera-Taboada built augmented- and virtual-reality-based platforms that provide digital access for the disabled using 2D technical drawings (ground plan, front elevation, and sectional view), laser scanners, and digital cameras [16]. Schwarz proposed a real-time augmented reality technology for mobile devices through compression based on MPEG video-based point cloud coding [17].
Mahmood proposed a RANSAC-based false-correspondence rejection method to increase the accuracy of the registration and correspondence estimation required for interaction on augmented reality devices [18]. Kung extracted planar and curved regions using supervoxel segmentation for AR devices [19]. Pavez studied the compression and representation of the polygon cloud, an intermediate between the polygon mesh and the point cloud, on AR devices; since a dynamic polygon cloud offers temporal redundancy for compression, they proposed compressing static and dynamic polygon clouds [20]. Cazamias used depth images for the real-time creation of virtual spaces based on real spaces [21].
Jimeno-Morenilla proposed real-time footwear matching using a glove system that can control a 3D model and a magic mirror; in particular, an infrared emission sensor was attached to the foot to measure its position and orientation, and the 3D shoe was augmented to complete the magic mirror [22]. Building on the above research, our augmented reality visualization method for real-time pipe measurement using a point cloud first acquires a point cloud from a depth image and downscales it, as shown in Figure 3. Central-axis estimation with a variable scale is then performed for the pipe estimation. After the exact pipe is estimated by accumulating the estimated results, the augmented reality view is completed by integrating the actual image with the estimated object.
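The acquisition-and-downscaling step might look like the following sketch (our own illustration, assuming a pinhole model derived from the sensor's field of view; the 15% sampling ratio echoes the experiments in Section 3 but is otherwise an assumption):

```python
import numpy as np

def depth_to_points(depth: np.ndarray, hfov_deg: float, vfov_deg: float,
                    keep: float = 0.15) -> np.ndarray:
    """Back-project a depth image (meters) to 3D and randomly keep ~15%."""
    h, w = depth.shape
    fx = (w / 2.0) / np.tan(np.radians(hfov_deg) / 2.0)   # focal length in px
    fy = (h / 2.0) / np.tan(np.radians(vfov_deg) / 2.0)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - w / 2.0) * z / fx
    y = (v - h / 2.0) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    pts = pts[pts[:, 2] > 0]                              # drop invalid depths
    idx = np.random.default_rng().choice(len(pts), int(keep * len(pts)),
                                         replace=False)
    return pts[idx]
```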
In the remainder of the article, Section 2 presents the pipe estimation methodology, Section 3 contains the experimental results and discussion, and Section 4 presents our conclusions.
3. Results and Discussion
3.1. Cylinder Estimation Using Kinect V1
The real-time estimation of depth images described in Section 2.2 used the Kinect V1 and the RealSense SR300. The data set shown in Figure 13b was constructed by sampling about 15% of the points from a depth image with 640 × 480 resolution.
As shown in Figure 13a, the cylinders used for the experiment were three cans, each 15 cm in diameter and 16 cm in length; the point data constructed from them is shown in Figure 13b, and the single-frame cylinder estimation on this data is shown in Figure 13c. The experiments used Unity3D on a laptop equipped with an Intel 8250U CPU operating at 3.40 GHz and 16 GB of memory.
Figure 13c shows the result of estimating the central axis in every frame. Although an averaging filter is applied, the result is noisy, the matching is inaccurate, and the throughput was 2.08 frames per second. In addition, estimation fails in some frames, and the results are inconsistent. Across these results, the estimated diameter deviated from the true diameter (15 cm) by about 15%. The result corrected using this ratio is shown in Figure 13d. Because data obtained from a depth image has an error rate that depends on distance, the error must be corrected as a function of distance.
Figure 14 shows the error as a function of distance. At a distance of 60 cm from the camera, the diameter error was close to zero, and the error was greatest at 95 cm. The error increased linearly with distance, so the diameter error can be corrected easily.
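A minimal sketch of such a distance-based correction (our own illustration; the calibration pairs below are hypothetical stand-ins for measurements like those in Figure 14): fit a line to relative diameter error versus distance and divide it out.

```python
import numpy as np

# Hypothetical calibration data: camera distance (cm) vs. relative diameter
# error; e.g., +0.15 means the estimate is 15% too large at that distance.
distances = np.array([60.0, 70.0, 80.0, 90.0, 95.0])
rel_error = np.array([0.00, 0.04, 0.08, 0.13, 0.15])

slope, intercept = np.polyfit(distances, rel_error, 1)  # linear error model

def correct_diameter(measured: float, distance_cm: float) -> float:
    """Remove the distance-dependent bias from a measured diameter."""
    return measured / (1.0 + slope * distance_cm + intercept)
```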
A single frame produces inconsistent estimates. Increasing the value of ϵ (the sampling rate) improves the estimate, but integrating multiple frames yields still better estimates. Figure 15 shows that the multi-frame central-axis estimate is more accurate and more consistent than the single-frame estimate. However, as the number of integrated frames increases, the frame rate decreases.
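The multi-frame integration can be sketched as a sliding window of axis candidates (our own illustration; the window size is an assumption, and the downstream line fit on the merged candidates is omitted):

```python
import numpy as np
from collections import deque

class MultiFrameAxisEstimator:
    """Accumulate central-axis candidates over the last N frames."""

    def __init__(self, n_frames: int = 5):
        self.window = deque(maxlen=n_frames)

    def update(self, frame_candidates: np.ndarray) -> np.ndarray:
        """Add one frame's (k, 3) axis-candidate points, return the merged set."""
        self.window.append(frame_candidates)
        # The line fit / PCA then runs on the merged candidates; more frames
        # give a smoother axis at the cost of a lower frame rate.
        return np.vstack(self.window)
```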
The real-time cylinder estimation method generates uniform data from a single surface in the depth image, so its accuracy is inevitably lower than that of registered data, since the straight-line region is estimated after the sphere fitting and the integration of the central-axis candidate data. Distance correction and multi-frame integration improve consistency and accuracy, but they take a long time to execute; this processing-speed problem can be addressed through parallel processing. Figure 16 shows the performance improvement obtained through parallel processing in real-time cylinder estimation. Parallel processing improved the frame rate of the real-time estimation, with no significant difference in accuracy. Matching consistency, however, improved, which we attribute to overlapping calculations in the regions into which the data is divided for sphere fitting.
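A minimal sketch of such region-wise parallelism (our own illustration, not the paper's implementation; `ransac_sphere` is the sphere-fitting routine sketched earlier, and the axis-aligned split is an arbitrary choice):

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def split_regions(points: np.ndarray, n_regions: int = 4) -> list:
    """Split the cloud into slabs along x; overlap would reduce seam effects."""
    order = points[:, 0].argsort()
    return np.array_split(points[order], n_regions)

def parallel_sphere_fits(points: np.ndarray, n_regions: int = 4) -> list:
    """Run the RANSAC sphere fit on each region in a separate process."""
    regions = split_regions(points, n_regions)
    with ProcessPoolExecutor(max_workers=n_regions) as pool:
        results = list(pool.map(ransac_sphere, regions))  # one fit per region
    return results  # each result: (center, radius, inlier mask)
```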
Finally, the real-time cylinder estimation is complete when the estimated cylinder is made to correspond to the RGB image. Figure 17 shows the result of mapping the estimated cylinder onto an RGB image. The Kinect's actual RGB field of view (FOV) is 62° × 48.6°, so Unity3D's camera FOV can be set to 48.6° to correspond to the actual video. However, due to problems such as camera distortion, an exact 1:1 correspondence is not achieved, which prevents fully accurate pipe estimation.
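The FOV-based correspondence amounts to a simple pinhole projection, sketched below (our own illustration; it deliberately ignores the lens distortion mentioned above, which is exactly why the 1:1 correspondence breaks down in practice):

```python
import numpy as np

def project_to_pixels(points: np.ndarray, width: int = 640, height: int = 480,
                      hfov_deg: float = 62.0, vfov_deg: float = 48.6) -> np.ndarray:
    """Project camera-frame 3D points to pixel coordinates via a pinhole model."""
    fx = (width / 2.0) / np.tan(np.radians(hfov_deg) / 2.0)
    fy = (height / 2.0) / np.tan(np.radians(vfov_deg) / 2.0)
    u = points[:, 0] * fx / points[:, 2] + width / 2.0
    v = points[:, 1] * fy / points[:, 2] + height / 2.0
    return np.stack([u, v], axis=1)  # real lenses add distortion on top of this
```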
In Figure 17, the two pipes also match; even when the camera is tilted to measure a pipe installed diagonally, the pipe can be estimated. However, because the matching result was not good, error correction was applied.
The Kinect V1 is a structured-light device that has been in use for years. Because its measurement data is inaccurate and rather slow to acquire, we also measured the same models using Intel's RealSense SR300 sensor for comparison.
3.2. Cylinder Estimation Using SR300
The SR300 (Intel, Santa Clara, CA, USA), launched in 2016, is a coded-light sensor with a measurement range of 0.3 m to 2 m and VGA depth output at 30 FPS. Figure 18 shows the internal structure of the SR300, which estimates depth using infrared (IR) and RGB cameras together with a dedicated integrated circuit for image processing. Its FOV is 68° × 41.5°.
Figure 19a shows the characteristics of the SR300 sensor: distance is calculated by measuring the pattern formed when the light emitted by the IR projector is reflected. In Figure 19a, only part of the surface is captured because of diffuse reflections. To solve this problem, the surface was treated to prevent diffuse reflections, as shown in Figure 19b.
Figure 20 shows the measurement results using the SR300 coded-light sensor. Compared with the structured-light Kinect, the frame rate was higher by two frames per second or more and the response speed was relatively good; however, the accuracy was insufficient, presumably due to insufficient surface treatment.
In general, coded-light sensors have advantages in miniaturization and accurate operation and can be mass-produced using semiconductor processes. Such a sensor can also generate dense, high-resolution codes using a continuous pattern. Its disadvantage is that the temporal consistency of the constructed code can break when an object moves in the scene. The SR300 overcomes this shortcoming at short range by using a fast video graphics array (VGA) infrared camera and a 2-megapixel RGB color camera with an integrated image signal processor [28,29].
3.3. Real-Time Pipe Estimation with Kinect V1 and SR300
Having identified the accuracy characteristics and error correction methods, we needed to verify the effectiveness of the proposed method through measurements on actual PVC pipes. Figure 21 shows a 100 mm PVC pipe; measurement gives an inner diameter of 106 mm and an outer diameter of 114 mm. Using these PVC pipes, three types of composite pipelines were constructed: 90°, T, and diagonal shapes.
As shown in Figure 22, the Kinect produced poor results compared with the SR300, unlike the measurements with the 15 cm cans. This is because the PVC pipes are smaller in diameter and span a wider area. In the Kinect results, the left regions of the T-shaped and composite pipelines contained no point data. This problem can be mitigated by placing the Kinect farther from the PVC pipes; however, the accuracy then drops to the point where cylinder estimation becomes impossible. The SR300, on the other hand, generates data relatively accurately over a relatively large area, resulting in better pipe estimation than the Kinect. Yet at greater distances its data also becomes inaccurate, so it fails to measure relatively long pipe runs such as the third pipeline.
Figure 23 shows the pipelines of Figure 22 after correcting the errors in the manner of Figure 13d. Both sensors produced results close to the PVC outside diameter. However, the overall accuracy of the point data was not sufficient to measure the bent areas.
Based on the above results, we conclude that the Kinect covers a narrow measurement area and the SR300 has a short measurement range. The Kinect V1 and SR300 are inexpensive devices that can easily be used at home; each product therefore has clear advantages and disadvantages, and its accuracy varies greatly with the usage environment. However, if the matching result is corrected using the error ratio as a function of distance, the pipe can be estimated accurately against standardized pipe sizes.
Based on the above results, the proposed approach can be compared with the existing method, as shown in Figure 24. The comparison used the data from Figure 10, and the existing method was implemented using the Point Cloud Library (PCL) [30].
Figure 24b shows that the normal estimation was performed well, and Figure 24c shows the segmentation based on the estimated normals, which was also performed well. However, when each segmented group was checked against shape types, no cylinder shape was detected; since this check must succeed before the Hough transform can be applied, the existing method could proceed no further with cylinder estimation. In contrast, the fast cylinder matching method using RANSAC was able to estimate the central-axis candidates (Figure 24d), filter them (Figure 24e), and, as a result, predict the cylinder relatively accurately (Figure 24f).
As Table 1 shows, the existing method performed normal estimation and segmentation in 201 ms and then tested whether each group was a cylinder. However, the numbers of cylinder-inlier points in the groups were 1, 1, 0, 1, and 1, so it could not determine whether any group was a cylinder. Moreover, according to Figure 24c, the true cylinder was group 4, whose fitted parameters were 12.51, −151.34, 837.76, 0.31, 0.94, −0.06, and 0.24: the position (x, y, z), the direction (x, y, z), and the radius, respectively. The final parameter, the radius of 0.24, is incorrect. In contrast, the fast cylinder matching method using RANSAC estimated the cylinder in 181 ms total: sampling at an 80% ratio to find the candidate cylinder from 340 points took 179 ms, and estimating the single cylinder took 2 ms.
The estimated cylinder was 16.2 cm in diameter and 20.7 cm in length. Both the execution time and the results were thus better than those of the existing method, which could not produce a cylinder candidate and therefore could not perform cylinder estimation via the Hough transform at all. The Hough transform is generally known to consume considerable execution time and resources. We presume the existing method failed either because the amount of point data was small or because data obtained from only one side of the actual cylinder is unsuitable for it. The fast cylinder matching method using RANSAC, and the proposed method built on it, were therefore suitable for real-time estimation for augmented reality.
4. Conclusions
This paper aimed to obtain point data from depth images, to estimate cylinders as quickly as possible, and then to visualize them. Against these goals, fast pipe matching was shown to be feasible for real-time cylinder estimation. Although the Kinect V1 lacked point-data accuracy and acquisition speed, real-time estimation using multiple cores reached nearly 2.5 frames per second. The coded-light SR300 sensor estimated the cylinder at a slightly faster rate. The matching results were admittedly rather poor, since the pipe estimation used a point cloud constructed from one side of the object rather than exact data obtained through laser scanning and registration; however, accuracy could be increased by empirically checking the mean error and correcting for it. Moreover, since pipes are manufactured in standard sizes, the diameters of installed pipes can be predicted relatively accurately.
Although errors remain, the matching results can be confirmed quickly through real-time processing, and these methods can be applied in various areas on future AR devices. If the proposed method is compiled and executed with parallel processing on a graphics processing unit (GPU), with cloud computing, and with source-code optimization, it will perform better than in our experiments. The proposed method uses the RANSAC and PCA methods repeatedly. RANSAC determines model parameters by finding the optimal hypothesis through sampling; PCA analyzes the directionality of data through eigenvalue/eigenvector decomposition, using singular value decomposition (SVD) after constructing a covariance matrix. As these are simple computations, they parallelize well. The authors in [31] proposed a fast and robust RANSAC framework on a GPU, and the researchers in [32] proposed a singular value decomposition using CUDA cores; these methods run faster on a GPU because their structure is simple and many partial calculations can be processed simultaneously.
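The PCA step just described can be sketched in a few lines (our own illustration): the direction of the central-axis candidate points is the dominant right singular vector of the centered data.

```python
import numpy as np

def axis_direction_pca(candidates: np.ndarray) -> np.ndarray:
    """Dominant direction of (N, 3) central-axis candidate points via SVD."""
    centered = candidates - candidates.mean(axis=0)
    # SVD of the centered data is equivalent to eigendecomposition of the
    # covariance matrix; the first right singular vector is the axis direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]
```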
The fast cylinder matching method using RANSAC enabled fast estimation of cylinder-like pipes through local segmentation sampling of the point cloud. Sphere fitting proved suitable for estimating the central axis of a cylinder, with data sampled randomly for fast processing, and the direction of the constructed central-axis data could be confirmed through principal component analysis. By estimating straight and curved cylinders via central-axis estimation without constraints or segmentation, we improved the efficiency of manual modeling work. Real-time estimation also makes it possible to confirm cylinder information quickly through an AR camera in the field.
It is unfortunate that we could only perform tests on easily available devices (Kinect, SR300); however, we expect that our proposed method will work on most devices that can acquire depth images.
In the future, research will be conducted to smoothly integrate the estimated straight lines and curves into a single model, and to increase performance through parallel processing or cloud computing with efficient memory partitioning, in order to derive fast, accurate, and consistent results.