PointStaClu: A Deep Point Cloud Clustering Method Based on Stable Cluster Discrimination
Abstract
1. Introduction
- We introduce PointStaClu, a point cloud clustering method based on stable cluster discrimination that requires no additional pretext tasks. The stable cluster discrimination (StaClu) task improves the stability of single-stage deep clustering by omitting the gradients of the negative instances from the cross-entropy loss used to update the cluster centers. It also adaptively assigns greater weight to hard instances during the update.
- We incorporate an entropy-constrained strategy to refine the distribution of clusters within the dataset.
- Our framework for deep point cloud clustering is streamlined, employing a single loss function and a single encoder.
2. Related Work
2.1. Unsupervised Representation Learning
2.2. Deep Clustering
3. Methods
3.1. Cluster Discrimination
3.2. Steady Loss for Small Batch Optimization
- (a) When a mini-batch of b samples is drawn for K clusters, at least K − b clusters have no positive instances at all. Under the cross-entropy loss, many cluster centers are therefore updated by negative instances only.
- (b) Consider the variances of the positive and the negative instances sampled for a cluster. If each instance has unit norm and the norm of the cluster mean is α, the relation between the two variances depends on α: when each cluster is compact, i.e., α approaches 1, the variance of the sampled negative instances is much larger than that of the positive instances. Because mini-batches are small when training deep neural networks, this variance cannot be sufficiently reduced.
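Observation (a) can be checked with a small counting sketch. The sizes K = 100 and b = 16 and the helper name `clusters_without_positives` are illustrative assumptions, not values or code from the paper; the point is only that a batch of b samples can touch at most b distinct clusters.

```python
import random


def clusters_without_positives(labels, num_clusters):
    """Count clusters that have no positive instance in the mini-batch."""
    return num_clusters - len(set(labels))


random.seed(0)
K, b = 100, 16  # illustrative sizes (assumptions, not from the paper)
batch_labels = [random.randrange(K) for _ in range(b)]
empty = clusters_without_positives(batch_labels, K)

# A batch of b samples covers at most b clusters, so at least K - b are empty.
print(empty, ">=", K - b)
```

Under a plain cross-entropy loss, each of these empty clusters still receives a gradient in this step, and that gradient comes from negative instances only.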
3.3. Deep Clustering Based on Stable Clustering Discrimination
Algorithm 1. Pseudo-code of single-stage deep clustering based on stable cluster discrimination.
Input: F: encoder network; c: cluster centers; the cluster centers from the last epoch; y: list of pseudo one-hot labels; the weight for the labels from the last epoch; λ: temperature.
Initialization: before each epoch, keep the last cluster centers as a detached copy, c.detach().
Train one epoch: for each minibatch P of b samples in the loader:
- Compute two randomly augmented views, F(aug(P)) and F(aug(P)).
- Retrieve the pseudo-labels of the previous epoch for the samples in P.
- Calculate the prediction for each view from the view and the last-epoch cluster centers, scaled by 1/λ.
- Obtain the soft labels for discrimination from the previous-epoch labels and the predictions, using the label weight.
- Loss of representation learning: the average of the two StaClu losses, one per view, computed between the predictions and the soft labels.
- Update the predictions for clustering: recompute the prediction for each view from the detached view (.detach()) and the current cluster centers, @ c/λ.
- Update the cluster assignment using the entropy constraint.
- Loss of the cluster centers: the average of the two StaClu losses, one per view, computed between the updated predictions and the updated assignment.
- Update the encoder and the cluster centers: loss.backward().
Output: the loss of stable cluster discrimination, loss.
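The effect of omitting the negative-instance gradients when updating the cluster centers can be illustrated with a minimal sketch. It assumes that the prediction is a softmax over dot products divided by the temperature, and it uses the standard cross-entropy gradient with respect to a center, (p_k − 1[k = label]) · x / λ. The function names, the toy centers, and the temperature of 0.5 are hypothetical; this is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def center_gradients(x, centers, label, temperature, drop_negatives):
    """Gradient of -log p[label] w.r.t. every cluster center for one instance x.

    With drop_negatives=True, centers other than `label` receive no gradient,
    which mimics detaching the negative terms of the cross-entropy loss.
    """
    logits = [sum(a * b for a, b in zip(x, c)) / temperature for c in centers]
    p = softmax(logits)
    grads = []
    for k in range(len(centers)):
        coeff = p[k] - (1.0 if k == label else 0.0)
        if drop_negatives and k != label:
            coeff = 0.0
        grads.append([coeff * xi / temperature for xi in x])
    return p, grads


def norm(v):
    return math.sqrt(sum(t * t for t in v))


# Toy unit-norm centers and instances (illustrative assumptions).
centers = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
easy = [1.0, 0.0]                          # lies on center 0
hard = [math.sqrt(0.5), math.sqrt(0.5)]    # halfway between centers 0 and 1

p_easy, g_easy = center_gradients(easy, centers, 0, 0.5, drop_negatives=True)
p_hard, g_hard = center_gradients(hard, centers, 0, 0.5, drop_negatives=True)
_, g_ce = center_gradients(easy, centers, 0, 0.5, drop_negatives=False)

print("negatives dropped, center 1 gradient:", g_easy[1])
print("plain cross-entropy, center 1 gradient:", g_ce[1])
print("positive-center gradient norm, easy vs hard:", norm(g_easy[0]), norm(g_hard[0]))
```

In this sketch the surviving gradient on the labelled center is scaled by 1 − p[label], so an instance that is predicted with low confidence moves its center more than an easy one, which matches the adaptive weighting of hard instances described in the contributions.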
4. Experiments and Results
4.1. Dataset and Evaluation Metrics
4.2. Implementation Details
4.2.1. Architecture
4.2.2. Optimization
4.2.3. Enhancement
4.3. Comparative Experiment and Analysis of Clustering Performance
4.4. Visual Experiment and Analysis
4.4.1. Visualization of Semantic Clustering
4.4.2. Visualization of the Presentation Features
4.5. Sensitivity Analysis
4.5.1. Effect of Negatives
4.5.2. Effect of MLP Heads
4.5.3. Effect of the α Parameter in Entropy Constraint
4.5.4. Effect of Data Enhancement Methods
- (a) Translate, as shown in Figure 5b: compute the coordinate range of the input point cloud along the X, Y, and Z axes, and randomly shift the whole point cloud object along each axis by less than 10% of that range;
- (b) Scale, as shown in Figure 5c: scale the entire point cloud sample to between 80% and 125% of its original size;
- (c) Rotate, as shown in Figure 5d: randomly rotate the point cloud object about the X, Y, and Z axes, with a rotation range of 15 degrees;
- (d) Random jitter, as shown in Figure 5e: perturb the three-dimensional position of each point with a uniform random offset within the range of [0, 0.05];
- (e) Crop, as shown in Figure 5f: crop out a random 3D cube patch whose size is sampled uniformly between 60% and 100% of the original 3D point cloud, with the aspect ratio kept within the range of [0.75, 1.33];
- (f) Cutout, as shown in Figure 5g: randomly cut out a three-dimensional cube whose extent in each dimension is within the range of [0.1, 0.4] of the original dimension;
- (g) Drop out, as shown in Figure 5h: randomly drop three-dimensional points, with a drop ratio within the range of [0, 0.7];
- (h) Subsampling, as shown in Figure 5i: randomly select points from the three-dimensional point cloud, with the number of points determined by the input dimension of the encoder.
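Four of the augmentations listed above (translate, scale, random jitter, and drop out) can be sketched in plain Python using the stated ranges. The function names, the signed translation direction, the per-coordinate jitter, and the order in which the operations are chained are assumptions made for illustration; the paper's own pipeline may differ.

```python
import random


def translate(points, rng, max_frac=0.10):
    """Shift the whole cloud along each axis by at most max_frac of its extent."""
    shifts = []
    for axis in range(3):
        values = [p[axis] for p in points]
        extent = max(values) - min(values)
        shifts.append(rng.uniform(-max_frac, max_frac) * extent)
    return [tuple(p[a] + shifts[a] for a in range(3)) for p in points]


def scale(points, rng, low=0.8, high=1.25):
    """Scale the whole cloud by one factor drawn from [low, high]."""
    s = rng.uniform(low, high)
    return [tuple(s * v for v in p) for p in points]


def jitter(points, rng, max_offset=0.05):
    """Add an independent uniform offset in [0, max_offset] to every coordinate."""
    return [tuple(v + rng.uniform(0.0, max_offset) for v in p) for p in points]


def drop_out(points, rng, max_ratio=0.7):
    """Drop a random fraction of points, with the ratio drawn from [0, max_ratio]."""
    ratio = rng.uniform(0.0, max_ratio)
    keep = max(1, round(len(points) * (1.0 - ratio)))
    return rng.sample(points, keep)


rng = random.Random(0)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]

# One augmented view; the chaining order here is an arbitrary choice.
view = drop_out(jitter(scale(translate(cloud, rng), rng), rng), rng)
print(len(cloud), "->", len(view), "points")
```

Calling the chain twice on the same cloud yields two different random views of the same object.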
4.5.5. Performance on an Imbalanced Dataset
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Dataset | Sample | Class |
---|---|---|
ShapeNet | 14,890 | 10 |
ModelNet40 | 4350 | 10 |
Method | Multi-Stage | ShapeNet ACC | ShapeNet NMI | ShapeNet ARI | ModelNet40 ACC | ModelNet40 NMI | ModelNet40 ARI |
---|---|---|---|---|---|---|---|
Supervised (DGCNN) |  | 0.9514 | 0.8865 | 0.9053 | 0.9854 | 0.9782 | 0.9839 |
K-means++ |  | 0.5039 | 0.4704 | 0.3099 | 0.1324 | 0.019 | 0.003 |
SC |  | 0.2975 | 0.3170 | 0.1001 | 0.1391 | 0.0316 | 0.0061 |
AC |  | 0.5776 | 0.5144 | 0.3698 | 0.1007 | 0.0267 | 0.0033 |
STRL | √ | 0.7133 | 0.6755 | 0.5483 | 0.8856 | 0.8406 | 0.8025 |
Point-MAE | √ | 0.7222 | 0.6352 | 0.5292 | 0.7713 | 0.7874 | 0.6843 |
Point-M2AE | √ | 0.7793 | 0.7054 | 0.6038 | 0.8179 | 0.8059 | 0.7356 |
I2P-MAE | √ | 0.7952 | 0.7328 | 0.6779 | 0.8528 | 0.8343 | 0.8367 |
PointStaClu |  | 0.9236 | 0.8558 | 0.8440 | 0.9660 | 0.9410 | 0.9297 |
Loss Function | #Max | #Min | ACC | NMI | ARI |
---|---|---|---|---|---|
CE | 1989 | 1326 | 0.5533 | 0.5328 | 0.4125 |
StaClu | 1563 | 1408 | 0.9236 | 0.8558 | 0.8440 |
#Proj | #Pred | ACC | NMI | ARI |
---|---|---|---|---|
0 | 0 | 0.8665 | 0.7952 | 0.7611 |
1 | 0 | 0.9006 | 0.8278 | 0.8037 |
2 | 0 | 0.9236 | 0.8558 | 0.8440 |
3 | 0 | 0.9172 | 0.8440 | 0.8257 |
3 | 2 | 0.9147 | 0.8360 | 0.8315 |
α | #MAX | #MIN | ACC | NMI | ARI |
---|---|---|---|---|---|
10,000 | 1489 | 1478 | 0.8893 | 0.8063 | 0.7792 |
4000 | 1638 | 1310 | 0.9165 | 0.8416 | 0.8290 |
2200 | 1563 | 1408 | 0.9236 | 0.8558 | 0.8440 |
1000 | 1660 | 1192 | 0.8890 | 0.8070 | 0.7801 |
100 | 6500 | 0 | 0.3294 | 0.3375 | 0.2505 |
0 | 14,890 | 0 | 0.1 | 0.0 | 0.0 |
Model | Crop | Cutout | Rotate | Translation | Scale | Jitter | Drop | ACC | NMI | ARI |
---|---|---|---|---|---|---|---|---|---|---|
PointStaClu | √ | √ | √ | √ | √ | √ | √ | 0.9236 | 0.8558 | 0.8440 |
PointStaClu | × | × | × | × | × | × | × | 0.8550 | 0.7890 | 0.7595 |
PointStaClu | × | × | √ | √ | √ | √ | √ | 0.8737 | 0.8028 | 0.7764 |
PointStaClu | √ | × | √ | √ | √ | √ | √ | 0.9156 | 0.8418 | 0.8297 |
PointStaClu | × | √ | √ | √ | √ | √ | √ | 0.8792 | 0.8117 | 0.7737 |
Metric | ACC | NMI | ARI | #MAX | #MIN |
---|---|---|---|---|---|
ζ = 1.0 | 0.9236 | 0.8558 | 0.8440 | 1563 | 1408 |
ζ = 0.8 | 0.9104 | 0.8332 | 0.8233 | 2081 | 1335 |
ζ = 0.6 | 0.9028 | 0.8243 | 0.8117 | 2109 | 1392 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cao, X.; Wang, H.; Zhu, Q.; Wang, Y.; Liu, X.; Li, K.; Su, L. PointStaClu: A Deep Point Cloud Clustering Method Based on Stable Cluster Discrimination. Remote Sens. 2024, 16, 2423. https://doi.org/10.3390/rs16132423