In this section, the implementation details of the proposed system in Figure 1 are explained. The 3D point cloud video frames are captured live through the Kinect and fed directly into the system. The server, laptop, and Kinect were located in an office within Martin Luther University's Informatik Institute. The system uses two filters, an outlier filter and a CHC filter, followed by octree compression; the coordinate precision for the octree compression is set to 1 mm. Starting with the outlier filter parameters (the number of neighbors to search and the standard deviation multiplier), different values are tried to select the final output with the best quality. The same is done for the standard deviation multiplier of the CHC filter. Using the parameters selected for both filters, test cases are then constructed for the whole system, and the system is evaluated in terms of compression ratio, total bytes per point, PSNR, and SSIM. The network bandwidth requirement is captured before and after processing. Finally, the system is compared to the streaming system of [17].
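The 1 mm coordinate precision means the octree encoder effectively snaps every coordinate to a 1 mm grid before encoding. A minimal sketch of this quantization step (the function name and parameter are illustrative, not the compression library's actual API):

```python
import numpy as np

def quantize_coordinates(points, precision=0.001):
    """Snap each coordinate (in metres) to a grid of the given edge
    length; precision=0.001 corresponds to the 1 mm coordinate
    precision used for the octree compression."""
    return np.round(points / precision) * precision

# a point whose coordinates are finer than 1 mm
pts = np.array([[0.1234, 0.5678, 1.0006]])
q = quantize_coordinates(pts)
```

Coordinates finer than the grid are rounded away, which is the lossy part of the octree stage; the 1 mm setting keeps that loss below the Kinect's own depth noise.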
5.1. Statistical Outlier Filter Test Cases
Starting with the first filter: the statistical outlier removal filter works by comparing each point's mean distance to its neighbors against a threshold derived from the mean and the standard deviation of those distances. To demonstrate the effect of its two parameters, several test cases were designed to select the best values for this filter.
Figure 3a shows a table captured as a 3D point cloud of 460,400 data points; this frame is used to illustrate the effect of the test cases. The reduction is the removal of outliers, which are clearly visible around the table legs.
Table 1 lists several test cases with different neighborhood search sizes and standard deviation thresholds, together with the corresponding data point reduction.
Figure 3b–d show the filter results for Cases 1, 8, and 9. The outlier filter successfully removed the outliers in all cases, but in some cases it also removed valid data points, as shown for Case 1 in Figure 3b; selecting the best parameters is therefore not a straightforward task.
The selection of the best case depends on both the reduction ratio and the quality of the filtered frame: the best case should not remove legitimate (valid) points from the point cloud, while at the same time removing as many outliers as possible.
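The trade-off above can be made concrete with a sketch of the statistical outlier removal itself, following the formulation used here: each point's mean distance to its k nearest neighbors is thresholded at the global mean plus a multiple of the standard deviation. This brute-force version is for illustration only; the real implementation uses a kd-tree.

```python
import numpy as np

def statistical_outlier_filter(points, k=60, std_mult=3.0):
    """Remove points whose mean k-nearest-neighbour distance exceeds
    (global mean + std_mult * global std) of those distances.
    Brute-force O(n^2) distances, fine for a demo-sized cloud."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d.sort(axis=1)                         # row i: distances from point i, ascending
    mean_knn = d[:, 1:k + 1].mean(axis=1)  # skip column 0 (distance to itself)
    threshold = mean_knn.mean() + std_mult * mean_knn.std()
    return points[mean_knn <= threshold]

# synthetic example: a dense cluster plus one isolated outlier
rng = np.random.default_rng(0)
cloud = rng.normal(scale=0.05, size=(200, 3))
cloud = np.vstack([cloud, [[5.0, 5.0, 5.0]]])
filtered = statistical_outlier_filter(cloud, k=20, std_mult=2.0)
```

A smaller std_mult tightens the threshold and risks discarding valid points, which is exactly the behavior observed for the multiplier values 1 and 2 in the test cases.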
Figure 4a plots the test cases for the outlier filter against the corresponding SSIM, and Figure 4b the PSNR values for each test case. Both SSIM and PSNR are calculated between the depth image of the raw point cloud scene and the depth image of each case; depth images are compared because the outlier filter removes noise according to depth values.
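The PSNR used in this comparison is the standard peak signal-to-noise ratio between two equal-size depth images; a minimal sketch, assuming 8-bit images with a peak value of 255:

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher is better; identical depth images give infinite PSNR, while a maximally different pair (all 0 vs. all 255) gives 0 dB.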
Figure 5a–c show the depth images for the raw point cloud frame, test Case 9, and test Case 11, respectively.
From Figure 4a, it can be seen that the best four cases are Cases 9, 10, 11, and 12, where the standard deviation threshold multiplier is set to 3. This means the filter produces better quality when the condition is less restrictive than with a multiplier of 1 or 2. The best quality among all cases is achieved by Case 9, where the filter searches within 60 points.
5.3. Streaming System Test Cases
Using the best parameters selected from Figure 4a and Figure 7a, Table 3 is constructed with the new test cases. The outlier filter, the CHC filter, and octree compression are applied to the streamed live 3D point cloud video.
The proposed streaming system was implemented on a server–client network infrastructure; see Table 4 for the network type details and the server/client specifications.
The actual network bandwidth was captured before streaming using IPERF, a command-line tool for measuring network link bandwidth and quality [35]. The typical maximum bandwidth of an 802.11g WLAN connection is 54 Mbit/s [36], but as shown in Figure 8 the measured rate was about 21 Mbit/s, due to the link adaptation mechanism implemented in most 802.11 devices [37].
Several frames were selected at random, and the compression ratio and total bytes per point were captured for the test cases in Table 3; see Figure 9 and Figure 10.
Case 1 has the best values in terms of compression ratio and total bytes per point. The strong reduction in Case 1 is expected, since its CHC parameter is set to one, a restrictive condition that only a few data points pass. In Case 4, by contrast, the CHC parameter is more permissive and many more points satisfy the condition, so its compression ratio was the lowest.
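The two reduction metrics can be computed as follows; the exact definitions are assumed here (compression ratio as raw size over compressed size, bytes per point as compressed size over point count), since the text does not spell them out:

```python
def compression_metrics(raw_bytes, compressed_bytes, n_points):
    """Compression ratio (raw/compressed) and total bytes per point
    (compressed/points) -- assumed definitions, for illustration."""
    return raw_bytes / compressed_bytes, compressed_bytes / n_points

# illustrative numbers, not measurements from the paper
ratio, bpp = compression_metrics(raw_bytes=1_000_000,
                                 compressed_bytes=100_000,
                                 n_points=50_000)
```

Under these definitions a higher ratio and a lower bytes-per-point both indicate stronger reduction, which is why Case 1 leads on both.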
Since the quality of the streamed video depends on its visual appearance, compression ratio and total bytes per point are not adequate measures on their own; SSIM and PSNR are therefore also used.
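SSIM jointly compares luminance, contrast, and structure. A simplified single-window version using global image statistics illustrates the formula; note that standard SSIM averages the same expression over an 11×11 sliding window rather than computing it once globally:

```python
import numpy as np

def global_ssim(x, y, max_val=255.0):
    """Single-window SSIM over whole-image statistics (a simplification
    of the usual sliding-window SSIM)."""
    c1 = (0.01 * max_val) ** 2          # stabilising constants from the
    c2 = (0.03 * max_val) ** 2          # standard SSIM definition
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))
```

Identical images score 1.0; any structural difference pulls the score below 1, which makes SSIM a better proxy for visual appearance than byte counts.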
From Figure 11 and Figure 12, it can be seen that Case 1 has the worst video quality, with an SSIM of 0.5042 and a PSNR of 8.6 dB, even though it has the best compression ratio. Case 4 has the best SSIM among all test cases, about 0.8594, although its PSNR of 26.57 dB ranks second best.
Selecting a specific case as the best depends on the application requirements. If quality in terms of SSIM and PSNR is the only consideration, then Case 4 is the best case; if a compromise is required between reduction (compression ratio and total bytes per point) and quality (SSIM and PSNR), then Case 2 is the best case. For the requirements of the proposed streaming system, Case 2 is considered the best case.
Figure 13a shows the processed (filtered and compressed) live 3D point cloud video at the server, and Figure 13b the streamed video decoded and rendered at the client. Visually, the two are nearly identical to the viewer's eye, even though the SSIM value is around 0.86.
Figure 14a,b display the captured RGB frame from the original raw Kinect video and the RGB frame from the decoded video at the receiver.
The decoded frame captured at the client shows a few black areas, which represent unknown depth values. Comparing Figure 14a,b closely, the black area at the ceiling is the room light; because there is also sunlight in the room, the Kinect fails to read the depth values there.
The network bandwidth requirements show how much bandwidth the system consumes and whether the processing stages reduce the bandwidth needed for streaming the live 3D point cloud video.
Figure 15a,b show that the 3D video streaming consumes much less than the actually available bandwidth; the proposed streaming system is therefore not limited by the WLAN bandwidth. Furthermore, according to Figure 15b, the required bandwidth is reduced by around 80% after applying filtering and compression to the captured 3D point cloud video stream.
Starting with the whole system as in Figure 1, two filters and octree compression are used to process the live 3D point cloud video frames.
Table 5 gives the processing time at the server (PS) and the client (PC) for different frames.
Table 6 gives the processing time for the streaming system when only the outlier filter and octree compression are used to process the 3D point cloud video frame, and Table 7 the processing time at the server and the client when only the CHC filter and octree compression are used.
From Table 5, Table 6 and Table 7, it is clear that the whole system requires the most time to process a 3D point cloud frame, since two filters and octree compression are used. However, using only the CHC filter with octree compression requires less time than using the outlier filter with octree compression; the outlier filter is therefore responsible for most of the processing time.
As mentioned in the related work, the streaming system in [17] uses only octree compression; it is compared here to the streaming system designed in this paper. Figure 16a,b capture its compression ratio and total bytes per point, respectively, while its quality was an SSIM of 0.6843 and a PSNR of 18.02 dB. The SSIM stays around 0.70 because the noise is not removed from the point cloud frame.
Table 8 gives the processing time for the streaming system in [17].