Advanced Feature Learning on Point Clouds Using Multi-Resolution Features and Learnable Pooling
Abstract
1. Introduction
- In the proposed PointStack, we employ a multi-resolution feature learning framework for point clouds. Leveraging point features of multiple resolutions provides both high-semantic and high-resolution point features to the task-specific heads. Therefore, the task-specific heads can obtain high-semantic information without substantially losing granularity.
- We propose a permutation-invariant learnable pooling (LP) for point clouds as an advancement over the widely used max pooling. LP generalizes max pooling: rather than preserving only the highest-valued features, it combines information from multi-resolution point features through a multi-head attention mechanism.
- We demonstrate that PointStack outperforms various existing feature learning networks for point clouds on two popular tasks: shape classification on the ScanObjectNN dataset and part segmentation on the ShapeNetPart dataset.
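The attention-based pooling idea behind LP can be sketched as follows. This is a minimal, single-head NumPy illustration under our own naming (`LearnablePooling`, `w_k`, `w_v` are hypothetical), not the paper's implementation: PointStack uses multi-head attention over stacked multi-resolution features, but the single-head case already shows the two properties claimed above, namely that every point contributes to the pooled vector and that the result is permutation invariant.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class LearnablePooling:
    """Attention-style pooling: a learnable query attends over all point
    features, so every point can contribute to the pooled vector, instead
    of only the per-channel maxima surviving as in max pooling."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.query = rng.normal(size=(1, dim))                # learnable query token
        self.w_k = rng.normal(size=(dim, dim)) / np.sqrt(dim)  # key projection
        self.w_v = rng.normal(size=(dim, dim)) / np.sqrt(dim)  # value projection

    def __call__(self, points):                                # points: (N, dim)
        k = points @ self.w_k                                  # keys   (N, dim)
        v = points @ self.w_v                                  # values (N, dim)
        attn = softmax(self.query @ k.T / np.sqrt(k.shape[1]))  # weights (1, N)
        return (attn @ v)[0]                                   # pooled (dim,)

# Permuting the points permutes keys and values identically, so the
# attention-weighted sum is unchanged: the pooling is permutation invariant.
pool = LearnablePooling(dim=8)
x = np.random.default_rng(1).normal(size=(64, 8))   # 64 stacked point features
perm = np.random.default_rng(2).permutation(64)
assert np.allclose(pool(x), pool(x[perm]))
```

Because the softmax weights are strictly positive, the pooled vector is a weighted combination of all point features, which is the sense in which LP is a generalization of max pooling.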
2. Related Work
2.1. Feature Learning on Point Clouds
2.2. Deep Learning with Multi-Resolution Features
3. PointStack: Multi-Resolution Feature Learning with Learnable Pooling
3.1. Multi-Resolution Feature Learning
3.2. Learnable Pooling
4. Experiment and Discussion
4.1. Implementation Details
4.1.1. Dataset
4.1.2. Network
4.1.3. Training Setup
4.2. Shape Classification
4.3. Part Segmentation
4.4. Ablation Study
4.5. Permutation Invariant Property of Learnable Pooling
4.6. Limitations on the Number of Training Samples
4.7. Runtime Performance Analysis
5. Limitations and Conclusions
5.1. Limitations
5.2. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Proof for Property 1
Appendix B. Semantic Scene Segmentation on the S3DIS Dataset
References
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660. [Google Scholar]
- Yang, Z.; Sun, Y.; Liu, S.; Jia, J. 3dssd: Point-based 3d single stage object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11040–11048. [Google Scholar]
- Yu, X.; Rao, Y.; Wang, Z.; Liu, Z.; Lu, J.; Zhou, J. Pointr: Diverse point cloud completion with geometry-aware transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 12498–12507. [Google Scholar]
- Graham, B.; Engelcke, M.; Van Der Maaten, L. 3d semantic segmentation with submanifold sparse convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 9224–9232. [Google Scholar]
- Yan, Y.; Mao, Y.; Li, B. Second: Sparsely embedded convolutional detection. Sensors 2018, 18, 3337. [Google Scholar] [CrossRef] [PubMed]
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
- Ma, X.; Qin, C.; You, H.; Ran, H.; Fu, Y. Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework. arXiv 2022, arXiv:2202.07123. [Google Scholar]
- Yu, D.; Wang, H.; Chen, P.; Wei, Z. Mixed Pooling for Convolutional Neural Networks. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2014; pp. 364–375. [Google Scholar] [CrossRef]
- Zhang, X.; Sun, X.; Lian, Z. BoW Pooling: A Plug-and-Play Unit for Feature Aggregation of Point Clouds. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event, 2–9 February 2021; Volume 35, pp. 3403–3411. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
- Li, Y.; Bu, R.; Sun, M.; Wu, W.; Di, X.; Chen, B. Pointcnn: Convolution on x-transformed points. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31. [Google Scholar]
- Xu, Y.; Fan, T.; Xu, M.; Zeng, L.; Qiao, Y. Spidercnn: Deep learning on point sets with parameterized convolutional filters. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 87–102. [Google Scholar]
- Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph cnn for learning on point clouds. ACM Trans. Graph. (ToG) 2019, 38, 1–12. [Google Scholar] [CrossRef]
- Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6411–6420. [Google Scholar]
- Hamdi, A.; Giancola, S.; Ghanem, B. Mvtn: Multi-view transformation network for 3d shape recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 1–11. [Google Scholar]
- Qiu, S.; Anwar, S.; Barnes, N. Dense-resolution network for point cloud classification and segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 3813–3822. [Google Scholar]
- Qiu, S.; Anwar, S.; Barnes, N. Geometric back-projection network for point cloud classification. IEEE Trans. Multimed. 2021, 24, 1943–1955. [Google Scholar] [CrossRef]
- Goyal, A.; Law, H.; Liu, B.; Newell, A.; Deng, J. Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline. In Proceedings of the International Conference on Machine Learning, Virtual Event, 18–24 July 2021. [Google Scholar]
- Xiang, T.; Zhang, C.; Song, Y.; Yu, J.; Cai, W. Walk in the cloud: Learning curves for point clouds shape analysis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 915–924. [Google Scholar]
- Yu, X.; Tang, L.; Rao, Y.; Huang, T.; Zhou, J.; Lu, J. Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling. arXiv 2021, arXiv:2111.14819. [Google Scholar]
- Cheng, S.; Chen, X.; He, X.; Liu, Z.; Bai, X. Pra-net: Point relation-aware network for 3d point cloud analysis. IEEE Trans. Image Process. 2021, 30, 4436–4448. [Google Scholar] [CrossRef] [PubMed]
- Pang, Y.; Wang, W.; Tay, F.E.; Liu, W.; Tian, Y.; Yuan, L. Masked Autoencoders for Point Cloud Self-supervised Learning. arXiv 2022, arXiv:2203.06604. [Google Scholar]
- Berg, A.; Oskarsson, M.; O’Connor, M. Points to Patches: Enabling the Use of Self-Attention for 3D Shape Recognition. arXiv 2022, arXiv:2204.03957. [Google Scholar]
- Paul, S.; Patterson, Z.; Bouguila, N. DualMLP: A two-stream fusion model for 3D point cloud classification. In The Visual Computer; Springer Nature: Berlin/Heidelberg, Germany, 2023. [Google Scholar] [CrossRef]
- Li, Z.; Gao, P.; Yuan, H.; Wei, R.; Paul, M. Exploiting Inductive Bias in Transformer for Point Cloud Classification and Segmentation. In Proceedings of the 2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), Los Alamitos, CA, USA, 10–14 July 2023; pp. 140–145. [Google Scholar] [CrossRef]
- Wu, C.; Zheng, J.; Pfrommer, J.; Beyerer, J. Attention-based point cloud edge sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 5333–5343. [Google Scholar]
- Tang, Y.; Li, X.; Xu, J.; Yu, Q.; Hu, L.; Hao, Y.; Chen, M. Point-LGMask: Local and Global Contexts Embedding for Point Cloud Pre-training with Multi-Ratio Masking. IEEE Trans. Multimed. 2023; Early Access. [Google Scholar] [CrossRef]
- Wu, W.; Qi, Z.; Fuxin, L. Pointconv: Deep convolutional networks on 3d point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9621–9630. [Google Scholar]
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893. [Google Scholar]
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122. [Google Scholar]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
- Wang, H.; Zhu, Y.; Green, B.; Adam, H.; Yuille, A.; Chen, L.C. Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 108–126. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
- Hui, L.; Yang, H.; Cheng, M.; Xie, J.; Yang, J. Pyramid Point Cloud Transformer for Large-Scale Place Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 6098–6107. [Google Scholar]
- Ghiasi, G.; Lin, T.Y.; Le, Q.V. Nas-fpn: Learning scalable feature pyramid architecture for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7036–7045. [Google Scholar]
- Kirillov, A.; Girshick, R.; He, K.; Dollár, P. Panoptic feature pyramid networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6399–6408. [Google Scholar]
- Zhiheng, K.; Ning, L. PyramNet: Point cloud pyramid attention network and graph embedding module for classification and segmentation. arXiv 2019, arXiv:1906.03299. [Google Scholar]
- Lee, J.; Lee, Y.; Kim, J.; Kosiorek, A.; Choi, S.; Teh, Y.W. Set transformer: A framework for attention-based permutation-invariant neural networks. In Proceedings of the International Conference on Machine Learning. PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 3744–3753. [Google Scholar]
- Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1912–1920. [Google Scholar]
- Uy, M.A.; Pham, Q.H.; Hua, B.S.; Nguyen, T.; Yeung, S.K. Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1588–1597. [Google Scholar]
- Yi, L.; Kim, V.G.; Ceylan, D.; Shen, I.C.; Yan, M.; Su, H.; Lu, C.; Huang, Q.; Sheffer, A.; Guibas, L. A scalable active framework for region annotation in 3d shape collections. ACM Trans. Graph. (ToG) 2016, 35, 1–12. [Google Scholar] [CrossRef]
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning. PMLR, Lille, France, 7–9 July 2015; pp. 448–456. [Google Scholar]
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32. [Google Scholar]
- Loshchilov, I.; Hutter, F. Sgdr: Stochastic gradient descent with warm restarts. arXiv 2016, arXiv:1608.03983. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Armeni, I.; Sener, O.; Zamir, A.R.; Jiang, H.; Brilakis, I.; Fischer, M.; Savarese, S. 3D Semantic Parsing of Large-Scale Indoor Spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1534–1543. [Google Scholar] [CrossRef]
- Tesla V100 and RTX3090 Performance Comparison. Available online: https://www.techpowerup.com/gpu-specs/tesla-v100-pcie-16-gb.c2957 (accessed on 27 April 2023).
| Model | Year | ModelNet40 OA (%) | ModelNet40 mAcc (%) | ScanObjectNN OA (%) | ScanObjectNN mAcc (%) | ShapeNetPart Inst. mIoU (%) |
|---|---|---|---|---|---|---|
| PointNet [1] | 2017 | 89.2 | 86.0 | 68.2 | 63.4 | 83.7 |
| PointNet++ [6] | 2017 | 90.7 | - | 77.9 | 75.4 | 85.1 |
| PointCNN [11] | 2018 | 92.5 | 88.1 | 78.5 | 75.1 | 86.1 |
| SpiderCNN [12] | 2018 | 92.4 | - | - | - | 85.3 |
| DGCNN [13] | 2019 | 92.9 | 90.2 | 78.1 | 73.6 | 85.2 |
| KPConv [14] | 2019 | 92.9 | - | - | - | 86.4 |
| MVTN [15] | 2021 | 93.8 | 92.2 | 82.8 | - | - |
| DRNet [16] | 2021 | 93.1 | - | 80.3 | 78.0 | 86.4 |
| GBNet [17] | 2021 | 93.8 | 91.0 | 80.5 | 77.8 | 85.9 |
| SimpleView [18] | 2021 | 93.9 | 91.8 | 80.5 | - | - |
| CurveNet [19] | 2021 | 93.8 | - | - | - | 86.8 |
| PointBERT [20] | 2021 | 93.8 | - | 83.1 | - | 85.6 |
| PRA-Net [21] | 2021 | 93.7 | 91.2 | 82.1 | 79.1 | 86.3 |
| PointMLP [7] | 2022 | 94.1 | 91.5 | 85.4 ± 0.3 | 83.9 ± 0.5 | 86.1 |
| Point-MAE [22] | 2022 | 94.0 | - | 85.2 | - | 86.1 |
| Point-TnT [23] | 2022 | 92.6 | - | 83.5 | 81.0 | - |
| DualMLP [24] | 2023 | 93.7 | - | 86.4 | - | - |
| IBT [25] | 2023 | 93.6 | 91.0 | 82.8 | 80.0 | 86.2 |
| APES [26] | 2023 | 93.8 | - | - | - | 86.6 |
| Point-LGMask [27] | 2023 | - | - | 85.3 | - | 86.1 |
| Model | ModelNet40 OA (%) | ModelNet40 mAcc (%) | ScanObjectNN OA (%) | ScanObjectNN mAcc (%) | ShapeNetPart Inst. mIoU (%) |
|---|---|---|---|---|---|
| PointMLP [7] | 94.1 | 91.5 | 85.4 ± 0.3 | 83.9 ± 0.5 | 86.1 |
| MVTN [15] | 93.8 | 92.2 | 82.8 | - | - |
| Point-TnT [23] | 92.6 | - | 83.5 | 81.0 | - |
| CurveNet [19] | 93.8 | - | - | - | 86.8 |
| PointStack | 93.3 | 89.6 | 86.9 ± 0.3 | 85.8 ± 0.3 | 87.2 |
Multi-Resolution Features | Single-Resolution LPs | Multi-Resolution LP | OA (%) | mAcc (%) |
---|---|---|---|---|
- | - | - | 85.4 ± 0.3 | 83.9 ± 0.5 |
✓ | - | - | 85.8 ± 0.1 | 84.4 ± 0.1 |
✓ | ✓ | - | 86.5 ± 0.4 | 85.2 ± 0.2 |
✓ | ✓ | ✓ | 86.9 ± 0.3 | 85.8 ± 0.3 |
✓ | - | ✓ | 86.0 ± 0.7 | 84.9 ± 0.4 |
Pooling Function | OA (%) |
---|---|
Max Pooling | 0.22 |
Learnable Pooling | 0.26 |
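The qualitative difference between the two pooling functions can be illustrated with a toy NumPy comparison (an illustration of the general max-vs-attention pooling contrast under assumed toy data, not the paper's measurement or architecture): with max pooling, each output channel is determined by exactly one point, so at most C of the N points influence the global feature; with an attention-weighted sum, every point receives a strictly positive weight.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(128, 16))      # toy per-point features (N=128, C=16)

# Max pooling: each output channel is taken from a single point, so at
# most C = 16 of the 128 points influence the pooled vector at all.
max_pooled = feats.max(axis=0)
points_used = np.unique(feats.argmax(axis=0)).size
print(f"max pooling: {points_used} of 128 points contribute")

# Attention-weighted pooling: a softmax over per-point scores yields
# strictly positive weights, so every point contributes to every channel.
scores = feats @ rng.normal(size=16)    # toy per-point attention scores
weights = np.exp(scores - scores.max())
weights /= weights.sum()
attn_pooled = weights @ feats           # (16,) weighted combination of all points
```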
Model | OAF (%) → OAS (%) |
---|---|
PointMLP | 85.4 ± 0.3 → 73.9 ± 0.3 |
PointStack | 86.9 ± 0.3 → 71.7 ± 0.3 |
Share and Cite
Wijaya, K.T.; Paek, D.-H.; Kong, S.-H. Advanced Feature Learning on Point Clouds Using Multi-Resolution Features and Learnable Pooling. Remote Sens. 2024, 16, 1835. https://doi.org/10.3390/rs16111835