SS-TMNet: Spatial–Spectral Transformer Network with Multi-Scale Convolution for Hyperspectral Image Classification
Abstract
1. Introduction
- We design a new Transformer-based HSI classification method (SS-TMNet), which uses multi-scale convolution and spatial–spectral attention to extract local and global information efficiently.
- We design an MSCP module that extracts fused spatial–spectral features as the initial feature projection, combining multi-scale 3D convolutions with feature fusion to capture spatial–spectral features at multiple scales efficiently.
- We propose an SSAM module that encodes the input features along the height, width, and spectral dimensions, using multi-dimensional convolution and self-attention to extract more effective local and global spatial–spectral features.
- We conduct extensive experiments on three benchmark datasets. The results show that the proposed SS-TMNet outperforms state-of-the-art CNN-based and Transformer-based hyperspectral image classifiers.
2. Related Work
2.1. Traditional Classification Methods
2.2. CNN-Based Methods
2.3. Transformer-Based Methods
3. The Proposed SS-TMNet Method
3.1. The Framework of the Proposed SS-TMNet
3.2. MSCP Module
3.2.1. Multi-Scale 3D Convolution
3.2.2. Module Composition
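To make the module composition concrete, below is a minimal PyTorch sketch of a multi-scale 3D convolutional projection in the spirit of MSCP as described in Section 1: parallel Conv3d branches with different kernel sizes run over a hyperspectral patch, and their outputs are fused by a 1 × 1 × 1 convolution. The kernel sizes (3, 5, 7), channel width, and fusion rule here are illustrative assumptions, not the exact SS-TMNet configuration.

```python
import torch
import torch.nn as nn

class MultiScale3DProjection(nn.Module):
    """Hypothetical MSCP-style projection: parallel 3D convolutions at
    several kernel sizes, fused into a single spatial-spectral embedding."""

    def __init__(self, out_channels=64, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # padding k // 2 keeps the spectral/spatial sizes unchanged
                nn.Conv3d(1, out_channels, kernel_size=k, padding=k // 2),
                nn.BatchNorm3d(out_channels),
                nn.ReLU(inplace=True),
            )
            for k in kernel_sizes
        ])
        # a 1x1x1 convolution fuses the concatenated multi-scale features
        self.fuse = nn.Conv3d(len(kernel_sizes) * out_channels, out_channels, 1)

    def forward(self, x):
        # x: (batch, 1, bands, height, width), an HSI patch with a
        # singleton channel axis so Conv3d slides over all three dimensions
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.fuse(feats)

# Example: a 9 x 9 patch with 103 bands (as in Pavia University)
patch = torch.randn(2, 1, 103, 9, 9)
tokens = MultiScale3DProjection()(patch)   # -> (2, 64, 103, 9, 9)
```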
3.3. Encoder Sequence
3.3.1. Encoder
3.3.2. SSAM Module
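Per the description in Section 1, SSAM attends to the input features along the height, width, and spectral dimensions. The following is a minimal sketch of such a block, assuming the feature cube keeps separate spectral, height, and width axes plus an embedding axis; the three-branch layout, the `SSAMSketch` name, and the elementwise-sum fusion are assumptions for illustration, not the paper's exact layer.

```python
import torch
import torch.nn as nn

class SSAMSketch(nn.Module):
    """Hypothetical SSAM-style block: self-attention runs separately along
    the spectral, height, and width axes; branch outputs are summed."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.ModuleDict({
            axis: nn.MultiheadAttention(dim, heads, batch_first=True)
            for axis in ("spectral", "height", "width")
        })

    def _along(self, axis, seq):
        out, _ = self.attn[axis](seq, seq, seq)   # self-attention: q = k = v
        return out

    def forward(self, x):
        # x: (batch, spectral, height, width, dim)
        b, s, h, w, d = x.shape
        # spectral branch: one length-s sequence per spatial position
        xs = x.permute(0, 2, 3, 1, 4).reshape(b * h * w, s, d)
        xs = self._along("spectral", xs).reshape(b, h, w, s, d).permute(0, 3, 1, 2, 4)
        # height branch: one length-h sequence per (spectral, width) pair
        xh = x.permute(0, 1, 3, 2, 4).reshape(b * s * w, h, d)
        xh = self._along("height", xh).reshape(b, s, w, h, d).permute(0, 1, 3, 2, 4)
        # width branch: one length-w sequence per (spectral, height) pair
        xw = self._along("width", x.reshape(b * s * h, w, d)).reshape(b, s, h, w, d)
        return xs + xh + xw   # assumed fusion: elementwise sum

# Example: 8 spectral tokens over a 9 x 9 patch with a 64-dim embedding
y = SSAMSketch(dim=64)(torch.randn(2, 8, 9, 9, 64))   # -> (2, 8, 9, 9, 64)
```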
4. Experiments
4.1. Datasets
4.1.1. Pavia University Dataset
4.1.2. Indian Pines Dataset
4.1.3. Houston2013 Dataset
4.2. Experimental Setup
4.2.1. Parameter Settings
4.2.2. Evaluation Metrics
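The experiments report overall accuracy (OA) and the kappa coefficient (K), the two quantities listed in every result table below alongside per-class accuracies. Both follow their standard definitions and can be computed from a confusion matrix, as in this short NumPy sketch (standard formulas, not code from the paper):

```python
import numpy as np

def overall_accuracy(confusion: np.ndarray) -> float:
    """OA: fraction of correctly classified samples (diagonal mass)."""
    return np.trace(confusion) / confusion.sum()

def cohens_kappa(confusion: np.ndarray) -> float:
    """Kappa: agreement corrected for chance, from row/column marginals."""
    n = confusion.sum()
    p_observed = np.trace(confusion) / n
    p_chance = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / n**2
    return (p_observed - p_chance) / (1 - p_chance)

# Toy 2-class example: 90 + 85 correct out of 200 samples
cm = np.array([[90, 10],
               [15, 85]])
print(overall_accuracy(cm))   # 0.875
print(cohens_kappa(cm))       # 0.75
```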
4.2.3. Baselines
- Mou [16]: An RNN-based method that uses a recurrent layer containing multiple gated recurrent units, followed by a fully connected layer and a softmax layer.
- He [21]: A 3D-CNN-based method composed of 3D convolution layers and multi-scale 3D convolution layers, where each multi-scale 3D convolution layer consists of four sublayers.
- 3D-CNN [55]: Another 3D-CNN method, with three convolution blocks and two fully connected layers; each convolution block includes a 3D convolution layer, a BatchNorm layer, and an average pooling layer.
- HybridSN [56]: A method integrating 2D and 3D convolution, with three 3D convolution layers, one 2D convolution layer, and two fully connected layers.
- ViT [18]: A classic Transformer-based method that first splits the input image into 16 × 16 patches and then feeds them into the Transformer encoder to learn the image representation.
- CrossViT [51]: A dual-branch ViT architecture in which each branch contains a linear projection layer and a different number of Transformer encoders for processing image patches of different sizes.
- LeViT [50]: Another Transformer-based method, comprising four convolution layers and three encoding stages, each of which contains four attention layers. We replicated this architecture for HSI classification.
- RvT [57]: Built on ViT, the RvT method uses a pooling layer to downsample the feature maps and reduce their size. We follow this architecture to design the network for the HSI classification tasks.
- HiT [53]: A method that embeds convolution into the Transformer: two proposed 3D-convolution-based SACP layers process the input image, and feature extraction is performed by a three-branch convolution layer within the Transformer architecture.
4.3. Results and Analysis
4.3.1. Experimental Analysis on Pavia University Dataset
4.3.2. Experimental Analysis on Indian Pines Dataset
4.3.3. Experimental Analysis on Houston2013 Dataset
4.3.4. Student’s t-Test
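The p-values reported in the corresponding table compare SS-TMNet's accuracy against each baseline over repeated runs; all values fall below 0.05 on every dataset, indicating the improvements are statistically significant. A minimal SciPy sketch of such a test follows; the scores are illustrative placeholders, and whether the paper used a paired or an independent-samples variant is an assumption here:

```python
from scipy import stats

# OA scores from repeated runs (illustrative numbers, not recorded results)
ss_tmnet = [91.74, 91.80, 91.62, 91.85, 91.69]
baseline = [91.28, 91.05, 91.44, 91.31, 91.12]

# Two-sample Student's t-test; p < 0.05 suggests the accuracy gap
# is unlikely to be due to run-to-run randomness alone.
t_stat, p_value = stats.ttest_ind(ss_tmnet, baseline)
print(f"t = {t_stat:.3f}, p = {p_value:.2e}")
```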
4.4. Ablation Studies
4.4.1. The Effectiveness of the MSCP Module
Ablation of the initial feature projection on the Pavia University dataset; Δ columns give each variant's gap to the full MSCP + SSAM model:

Methods | OA(%) | Δ Mean | Δ Std | Kappa(%) | Δ Mean | Δ Std
---|---|---|---|---|---|---
ViT | 88.92 ± 0.31 | −2.82% | +0.19% | 85.81 ± 0.40 | −3.63% | +0.24%
Linear + SSAM | 90.58 ± 0.24 | −1.16% | +0.12% | 87.95 ± 0.30 | −1.49% | +0.14%
Conv2D + SSAM | 91.53 ± 0.12 | −0.21% | +0.00% | 89.18 ± 0.16 | −0.26% | +0.00%
SACP [53] + SSAM | 91.59 ± 0.19 | −0.15% | +0.03% | 89.25 ± 0.24 | −0.19% | +0.08%
MSCP + SSAM | 91.74 ± 0.12 | 0% | 0% | 89.44 ± 0.16 | 0% | 0%
4.4.2. The Effectiveness of the SSAM Module
Ablation of the attention module on the Pavia University dataset; Δ columns give each variant's gap to the full MSCP + SSAM model:

Methods | OA(%) | Δ Mean | Δ Std | Kappa(%) | Δ Mean | Δ Std
---|---|---|---|---|---|---
ViT | 88.92 ± 0.31 | −2.82% | +0.19% | 85.81 ± 0.40 | −3.63% | +0.24%
MSCP + Linear | 90.15 ± 0.30 | −1.59% | +0.18% | 87.40 ± 0.38 | −2.04% | +0.22%
MSCP + ConvPermute [53] | 91.29 ± 0.17 | −0.45% | +0.05% | 88.86 ± 0.22 | −0.58% | +0.06%
MSCP + ViP [58] | 91.37 ± 0.40 | −0.37% | +0.28% | 88.97 ± 0.52 | −0.47% | +0.36%
MSCP + SSAM | 91.74 ± 0.12 | 0% | 0% | 89.44 ± 0.16 | 0% | 0%
4.5. Scalability
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Bioucas-Dias, J.M.; Plaza, A.; Camps-Valls, G.; Scheunders, P.; Nasrabadi, N.; Chanussot, J. Hyperspectral remote sensing data analysis and future challenges. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–36.
2. Zhan, T.; Song, B.; Sun, L.; Jia, X.; Wan, M.; Yang, G.; Wu, Z. TDSSC: A three-directions spectral–spatial convolution neural network for hyperspectral image change detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 377–388.
3. Ahmad, M.; Shabbir, S.; Roy, S.K.; Hong, D.; Wu, X.; Yao, J.; Khan, A.M.; Mazzara, M.; Distefano, S.; Chanussot, J. Hyperspectral image classification—Traditional to deep models: A survey for future prospects. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 15, 968–999.
4. Melgani, F.; Bruzzone, L. Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1778–1790.
5. Samaniego, L.; Bárdossy, A.; Schulz, K. Supervised classification of remotely sensed imagery using a modified k-NN technique. IEEE Trans. Geosci. Remote Sens. 2008, 46, 2112–2125.
6. Li, J.; Bioucas-Dias, J.M.; Plaza, A. Semisupervised hyperspectral image segmentation using multinomial logistic regression with active learning. IEEE Trans. Geosci. Remote Sens. 2010, 48, 4085–4098.
7. Benediktsson, J.A.; Palmason, J.A.; Sveinsson, J.R. Classification of hyperspectral data from urban areas based on extended morphological profiles. IEEE Trans. Geosci. Remote Sens. 2005, 43, 480–491.
8. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
9. Algan, G.; Ulusoy, I. Image classification with deep learning in the presence of noisy labels: A survey. Knowl.-Based Syst. 2021, 215, 106771.
10. Touvron, H.; Bojanowski, P.; Caron, M.; Cord, M.; El-Nouby, A.; Grave, E.; Izacard, G.; Joulin, A.; Synnaeve, G.; Verbeek, J.; et al. ResMLP: Feedforward networks for image classification with data-efficient training. IEEE Trans. Pattern Anal. Mach. Intell. 2022.
11. Zhao, Z.Q.; Zheng, P.; Xu, S.T.; Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3212–3232.
12. Minaee, S.; Boykov, Y.Y.; Porikli, F.; Plaza, A.J.; Kehtarnavaz, N.; Terzopoulos, D. Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3523–3542.
13. Song, W.; Li, S.; Fang, L.; Lu, T. Hyperspectral image classification with deep feature fusion network. IEEE Trans. Geosci. Remote Sens. 2018, 56, 3173–3184.
14. Chen, Y.; Jiang, H.; Li, C.; Jia, X.; Ghamisi, P. Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6232–6251.
15. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2094–2107.
16. Mou, L.; Ghamisi, P.; Zhu, X.X. Deep recurrent neural networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3639–3655.
17. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
18. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. In Proceedings of the 9th International Conference on Learning Representations, Virtual, 3–7 May 2021.
19. He, J.; Zhao, L.; Yang, H.; Zhang, M.; Li, W. HSI-BERT: Hyperspectral image classification using the bidirectional encoder representation from transformers. IEEE Trans. Geosci. Remote Sens. 2019, 58, 165–178.
20. Hao, J.; Dong, F.; Li, Y.; Wang, S.; Cui, J.; Zhang, Z.; Wu, K. Investigation of the data fusion of spectral and textural data from hyperspectral imaging for the near geographical origin discrimination of wolfberries using 2D-CNN algorithms. Infrared Phys. Technol. 2022, 125, 104286.
21. He, M.; Li, B.; Chen, H. Multi-Scale 3D Deep Convolutional Neural Network for Hyperspectral Image Classification. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3904–3908.
22. Fang, B.; Liu, Y.; Zhang, H.; He, J. Hyperspectral Image Classification Based on 3D Asymmetric Inception Network with Data Fusion Transfer Learning. Remote Sens. 2022, 14, 1711.
23. Chang, Y.L.; Tan, T.H.; Lee, W.H.; Chang, L.; Chen, Y.N.; Fan, K.C.; Alkhaleefah, M. Consolidated Convolutional Neural Network for Hyperspectral Image Classification. Remote Sens. 2022, 14, 1571.
24. Zhou, D.; Kang, B.; Jin, X.; Yang, L.; Lian, X.; Jiang, Z.; Hou, Q.; Feng, J. DeepViT: Towards Deeper Vision Transformer. arXiv 2021, arXiv:2103.11886.
25. He, X.; Chen, Y.; Lin, Z. Spatial-Spectral Transformer for Hyperspectral Image Classification. Remote Sens. 2021, 13, 498.
26. Yu, D.; Li, Q.; Wang, X.; Zhang, Z.; Qian, Y.; Xu, C. DSTrans: Dual-Stream Transformer for Hyperspectral Image Restoration. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–7 January 2023; pp. 3739–3749.
27. Li, J.; Xing, H.; Ao, Z.; Wang, H.; Liu, W.; Zhang, A. Convolution-Transformer Adaptive Fusion Network for Hyperspectral Image Classification. Appl. Sci. 2023, 13, 492.
28. Sun, L.; Zhao, G.; Zheng, Y.; Wu, Z. Spectral-Spatial Feature Tokenization Transformer for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5522214.
29. Wang, Y.; Jiang, S.; Xu, M.; Zhang, S.; Jia, S. A Center-Masked Convolutional Transformer for Hyperspectral Image Classification. In Proceedings of the 31st International Joint Conference on Artificial Intelligence, Vienna, Austria, 23–29 July 2022; Volume 3207, pp. 1–6.
30. Zhang, Y.; Wang, X.; Jiang, X.; Zhou, Y. Marginalized graph self-representation for unsupervised hyperspectral band selection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5516712.
31. Ding, Y.; Zhang, Z.; Zhao, X.; Hong, D.; Cai, W.; Yu, C.; Yang, N.; Cai, W. Multi-feature fusion: Graph neural network and CNN combining for hyperspectral image classification. Neurocomputing 2022, 501, 246–257.
32. Zhang, Z.; Ding, Y.; Zhao, X.; Siye, L.; Yang, N.; Cai, Y.; Zhan, Y. Multireceptive field: An adaptive path aggregation graph neural framework for hyperspectral image classification. Expert Syst. Appl. 2023, 217, 119508.
33. Zhang, Y.; Wang, Y.; Chen, X.; Jiang, X.; Zhou, Y. Spectral–Spatial Feature Extraction With Dual Graph Autoencoder for Hyperspectral Image Clustering. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 8500–8511.
34. Ding, Y.; Zhang, Z.; Zhao, X.; Hong, D.; Li, W.; Cai, W.; Zhan, Y. AF2GNN: Graph convolution with adaptive filters and aggregator fusion for hyperspectral image classification. Inf. Sci. 2022, 602, 201–219.
35. Ding, Y.; Zhang, Z.; Zhao, X.; Cai, W.; Yang, N.; Hu, H.; Huang, X.; Cao, Y.; Cai, W. Unsupervised self-correlated learning smoothy enhanced locality preserving graph convolution embedding clustering for hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5536716.
36. Hong, D.; Han, Z.; Yao, J.; Gao, L.; Zhang, B.; Plaza, A.; Chanussot, J. SpectralFormer: Rethinking hyperspectral image classification with transformers. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5518615.
37. He, X.; Chen, Y.; Li, Q. Two-Branch Pure Transformer for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6015005.
38. Feng, J.; Luo, X.; Li, S.; Wang, Q.; Yin, J. Spectral Transformer with Dynamic Spatial Sampling and Gaussian Positional Embedding for Hyperspectral Image Classification. In Proceedings of the International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 3556–3559.
39. Ding, Y.; Zhang, Z.; Zhao, X.; Cai, Y.; Li, S.; Deng, B.; Cai, W. Self-supervised locality preserving low-pass graph convolutional embedding for large-scale hyperspectral image clustering. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5536016.
40. Rakotomamonjy, A.; Bach, F.; Canu, S.; Grandvalet, Y. SimpleMKL. J. Mach. Learn. Res. 2008, 9, 2491–2521.
41. Dalla Mura, M.; Atli Benediktsson, J.; Waske, B.; Bruzzone, L. Extended profiles with morphological attribute filters for the analysis of hyperspectral data. Int. J. Remote Sens. 2010, 31, 5975–5991.
42. Li, J.; Marpu, P.R.; Plaza, A.; Bioucas-Dias, J.M.; Benediktsson, J.A. Generalized Composite Kernel Framework for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2013, 51, 4816–4829.
43. Bandos, T.V.; Bruzzone, L.; Camps-Valls, G. Classification of Hyperspectral Images With Regularized Linear Discriminant Analysis. IEEE Trans. Geosci. Remote Sens. 2009, 47, 862–873.
44. Villa, A.; Benediktsson, J.A.; Chanussot, J.; Jutten, C. Hyperspectral Image Classification With Independent Component Discriminant Analysis. IEEE Trans. Geosci. Remote Sens. 2011, 49, 4865–4876.
45. Licciardi, G.; Marpu, P.R.; Chanussot, J.; Benediktsson, J.A. Linear Versus Nonlinear PCA for the Classification of Hyperspectral Data Based on the Extended Morphological Profiles. IEEE Geosci. Remote Sens. Lett. 2011, 9, 447–451.
46. Zhang, Q.; Tian, Y.; Yang, Y.; Pan, C. Automatic spatial–spectral feature selection for hyperspectral image via discriminative sparse multimodal learning. IEEE Trans. Geosci. Remote Sens. 2014, 53, 261–279.
47. Jouni, M.; Dalla Mura, M.; Comon, P. Hyperspectral image classification based on mathematical morphology and tensor decomposition. Math.-Morphol.-Theory Appl. 2020, 4, 1–30.
48. Luo, F.; Huang, H.; Duan, Y.; Liu, J.; Liao, Y. Local geometric structure feature for dimensionality reduction of hyperspectral imagery. Remote Sens. 2017, 9, 790.
49. Hu, W.; Huang, Y.; Wei, L.; Zhang, F.; Li, H. Deep Convolutional Neural Networks for Hyperspectral Image Classification. J. Sens. 2015, 2015, 258619.
50. Graham, B.; El-Nouby, A.; Touvron, H.; Stock, P.; Joulin, A.; Jégou, H.; Douze, M. LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 12259–12269.
51. Chen, C.F.R.; Fan, Q.; Panda, R. CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 357–366.
52. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015; pp. 1–14.
53. Yang, X.; Cao, W.; Lu, Y.; Zhou, Y. Hyperspectral Image Transformer Classification Networks. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5528715.
54. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, Vancouver, BC, Canada, 8–14 December 2019; pp. 8024–8035.
55. Sharma, V.; Diba, A.; Tuytelaars, T.; Van Gool, L. Hyperspectral CNN for Image Classification & Band Selection, with Application to Face Recognition; Technical Report KUL/ESAT/PSI/1604; KU Leuven, ESAT: Leuven, Belgium, 2016.
56. Roy, S.K.; Krishna, G.; Dubey, S.R.; Chaudhuri, B.B. HybridSN: Exploring 3-D–2-D CNN feature hierarchy for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2019, 17, 277–281.
57. Heo, B.; Yun, S.; Han, D.; Chun, S.; Choe, J.; Oh, S.J. Rethinking spatial dimensions of vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 11936–11945.
58. Hou, Q.; Jiang, Z.; Yuan, L.; Cheng, M.M.; Yan, S.; Feng, J. Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 1328–1334.
Classification accuracy (%) per class, OA, and kappa (K) on the Pavia University dataset:

Class | Mou | He | 3D-CNN | HybridSN | ViT | CrossViT | LeViT | RvT | HiT | SS-TMNet
---|---|---|---|---|---|---|---|---|---|---
1 | 90.32 ± 0.41 | 93.34 ± 0.80 | 93.97 ± 0.74 | 95.30 ± 0.57 | 92.92 ± 0.58 | 94.67 ± 0.22 | 95.70 ± 0.37 | 94.63 ± 0.43 | 95.14 ± 0.28 | 96.11 ± 0.24 |
2 | 95.77 ± 0.14 | 92.11 ± 0.20 | 92.56 ± 0.10 | 92.65 ± 0.06 | 91.13 ± 0.21 | 92.13 ± 0.13 | 92.61 ± 0.10 | 91.95 ± 0.19 | 92.53 ± 0.08 | 92.67 ± 0.08 |
3 | 75.34 ± 0.69 | 84.99 ± 2.20 | 88.73 ± 1.39 | 90.68 ± 1.44 | 82.35 ± 1.41 | 87.63 ± 0.87 | 91.48 ± 1.04 | 87.41 ± 1.00 | 89.91 ± 1.29 | 92.35 ± 0.66 |
4 | 94.63 ± 0.46 | 97.08 ± 0.29 | 96.31 ± 0.40 | 97.30 ± 0.24 | 95.80 ± 0.47 | 96.86 ± 0.31 | 96.80 ± 0.26 | 96.92 ± 0.39 | 97.15 ± 0.17 | 96.46 ± 0.49 |
5 | 99.80 ± 0.15 | 99.77 ± 0.11 | 99.79 ± 0.16 | 99.93 ± 0.08 | 99.69 ± 0.21 | 99.89 ± 0.07 | 99.51 ± 0.77 | 99.94 ± 0.06 | 99.91 ± 0.07 | 99.66 ± 0.16 |
6 | 85.96 ± 0.53 | 97.66 ± 0.79 | 99.52 ± 0.29 | 99.77 ± 0.18 | 94.74 ± 0.86 | 98.26 ± 0.39 | 99.54 ± 0.14 | 97.61 ± 0.61 | 99.38 ± 0.23 | 99.91 ± 0.09 |
7 | 71.43 ± 2.52 | 91.48 ± 1.65 | 92.22 ± 1.68 | 96.51 ± 1.55 | 90.72 ± 1.34 | 95.05 ± 0.95 | 97.90 ± 1.08 | 95.92 ± 0.85 | 95.79 ± 1.50 | 99.05 ± 0.57 |
8 | 82.87 ± 0.67 | 94.39 ± 1.05 | 95.95 ± 1.26 | 97.25 ± 1.22 | 94.43 ± 0.58 | 96.69 ± 0.38 | 98.84 ± 0.29 | 96.44 ± 0.53 | 97.39 ± 0.57 | 98.31 ± 0.38 |
9 | 99.44 ± 0.20 | 98.97 ± 1.00 | 97.50 ± 1.63 | 99.61 ± 0.37 | 97.79 ± 0.96 | 99.74 ± 0.19 | 97.83 ± 2.12 | 99.83 ± 0.19 | 99.47 ± 0.25 | 98.02 ± 0.76 |
OA(%) | 91.14 ± 0.21 | 89.97 ± 0.36 | 90.72 ± 0.37 | 91.44 ± 0.28 | 88.92 ± 0.31 | 90.70 ± 0.13 | 91.58 ± 0.17 | 90.55 ± 0.20 | 91.28 ± 0.21 | 91.74 ± 0.12 |
K(%) | 88.19 ± 0.27 | 87.17 ± 0.47 | 88.13 ± 0.47 | 89.06 ± 0.35 | 85.81 ± 0.40 | 88.11 ± 0.17 | 89.24 ± 0.22 | 87.92 ± 0.25 | 88.85 ± 0.27 | 89.44 ± 0.16 |
Classification accuracy (%) per class, OA, and kappa (K) on the Indian Pines dataset:

Class | Mou | He | 3D-CNN | HybridSN | ViT | CrossViT | LeViT | RvT | HiT | SS-TMNet
---|---|---|---|---|---|---|---|---|---|---
1 | 31.28 ± 11.84 | 70.82 ± 13.59 | 49.33 ± 19.93 | 34.68 ± 26.05 | 50.11 ± 10.28 | 62.57 ± 11.25 | 65.77 ± 11.61 | 51.24 ± 14.29 | 80.64 ± 8.22 | 87.48 ± 8.15 |
2 | 72.76 ± 1.68 | 62.45 ± 10.18 | 68.47 ± 4.06 | 67.37 ± 20.83 | 65.46 ± 2.57 | 62.14 ± 8.14 | 88.59 ± 4.60 | 76.48 ± 3.17 | 86.18 ± 4.40 | 88.56 ± 2.34 |
3 | 55.39 ± 2.30 | 48.96 ± 12.23 | 51.89 ± 7.60 | 44.89 ± 22.79 | 52.57 ± 2.58 | 41.82 ± 9.88 | 72.43 ± 3.22 | 65.42 ± 5.50 | 69.94 ± 4.99 | 76.50 ± 3.23 |
4 | 47.20 ± 6.37 | 48.86 ± 14.14 | 40.95 ± 10.33 | 34.47 ± 25.45 | 57.92 ± 7.50 | 65.34 ± 11.69 | 77.73 ± 3.51 | 77.96 ± 5.31 | 75.63 ± 5.39 | 82.19 ± 3.92 |
5 | 85.59 ± 2.77 | 62.62 ± 19.22 | 75.29 ± 4.92 | 55.75 ± 24.99 | 52.76 ± 5.08 | 55.28 ± 5.24 | 79.78 ± 2.11 | 50.33 ± 3.87 | 75.26 ± 2.71 | 81.71 ± 3.49 |
6 | 93.19 ± 0.92 | 91.35 ± 4.93 | 93.40 ± 3.30 | 81.17 ± 21.38 | 79.49 ± 2.52 | 88.64 ± 1.35 | 95.69 ± 1.53 | 86.43 ± 2.16 | 94.79 ± 1.60 | 97.76 ± 0.92 |
7 | 50.16 ± 17.71 | 46.70 ± 15.68 | 22.49 ± 17.72 | 13.61 ± 16.56 | 43.72 ± 16.49 | 38.62 ± 33.31 | 21.61 ± 25.55 | 62.93 ± 21.35 | 73.03 ± 18.03 | 72.09 ± 16.30 |
8 | 93.37 ± 0.81 | 92.52 ± 2.77 | 91.76 ± 1.40 | 75.92 ± 26.44 | 89.41 ± 2.32 | 89.45 ± 2.41 | 91.48 ± 1.89 | 92.02 ± 1.13 | 93.09 ± 0.78 | 94.39 ± 0.47 |
9 | 33.62 ± 14.20 | 63.57 ± 18.28 | 35.74 ± 23.68 | 32.78 ± 29.91 | 31.35 ± 13.48 | 13.99 ± 19.46 | 23.43 ± 26.35 | 47.50 ± 13.45 | 59.99 ± 16.58 | 68.66 ± 17.26 |
10 | 66.05 ± 1.98 | 62.03 ± 17.77 | 72.40 ± 4.64 | 52.51 ± 32.21 | 61.48 ± 2.95 | 59.01 ± 6.94 | 83.09 ± 3.11 | 73.70 ± 4.17 | 85.34 ± 3.43 | 87.19 ± 1.98 |
11 | 72.82 ± 1.14 | 75.50 ± 6.83 | 79.24 ± 2.16 | 80.16 ± 9.91 | 72.26 ± 1.33 | 70.54 ± 4.43 | 92.85 ± 1.22 | 79.74 ± 3.35 | 89.73 ± 2.47 | 90.70 ± 1.63 |
12 | 60.66 ± 2.77 | 50.03 ± 15.02 | 58.43 ± 8.37 | 49.05 ± 25.87 | 51.64 ± 2.99 | 40.73 ± 14.18 | 83.30 ± 5.45 | 66.83 ± 6.57 | 76.38 ± 7.70 | 81.85 ± 3.97 |
13 | 94.23 ± 2.25 | 92.96 ± 4.79 | 96.80 ± 2.83 | 70.88 ± 25.35 | 86.61 ± 3.46 | 87.18 ± 3.73 | 92.75 ± 5.96 | 88.69 ± 5.38 | 95.57 ± 1.90 | 97.18 ± 3.02 |
14 | 92.56 ± 0.75 | 92.44 ± 2.79 | 93.60 ± 1.34 | 89.57 ± 10.91 | 88.50 ± 1.37 | 90.59 ± 0.57 | 97.09 ± 0.66 | 89.64 ± 1.10 | 94.53 ± 1.25 | 96.21 ± 0.89 |
15 | 61.43 ± 3.35 | 48.79 ± 5.36 | 44.69 ± 9.52 | 29.38 ± 13.56 | 44.94 ± 3.84 | 47.55 ± 4.32 | 58.74 ± 6.96 | 48.06 ± 7.94 | 58.84 ± 7.65 | 63.92 ± 3.78 |
16 | 84.57 ± 2.65 | 55.79 ± 16.94 | 55.15 ± 15.24 | 30.91 ± 30.87 | 48.29 ± 12.06 | 27.14 ± 28.65 | 87.47 ± 10.02 | 94.67 ± 3.59 | 86.10 ± 6.24 | 87.73 ± 3.15 |
OA(%) | 75.27 ± 0.77 | 69.25 ± 6.60 | 72.59 ± 2.80 | 67.26 ± 13.98 | 66.21 ± 0.89 | 65.71 ± 4.03 | 83.63 ± 1.13 | 73.98 ± 2.35 | 82.13 ± 2.65 | 84.67 ± 1.25 |
K(%) | 71.57 ± 0.87 | 64.72 ± 8.00 | 68.69 ± 3.27 | 62.21 ± 16.96 | 61.65 ± 0.98 | 60.79 ± 4.76 | 81.55 ± 1.27 | 70.55 ± 2.65 | 79.77 ± 3.02 | 82.66 ± 1.41 |
Classification accuracy (%) per class, OA, and kappa (K) on the Houston2013 dataset:

Class | Mou | He | 3D-CNN | HybridSN | ViT | CrossViT | LeViT | RvT | HiT | SS-TMNet
---|---|---|---|---|---|---|---|---|---|---
1 | 95.49 ± 0.92 | 95.45 ± 1.45 | 96.31 ± 1.91 | 97.74 ± 0.72 | 95.82 ± 0.84 | 94.36 ± 2.16 | 94.66 ± 1.51 | 97.50 ± 0.54 | 96.99 ± 0.87 | 97.60 ± 0.64 |
2 | 96.28 ± 0.68 | 97.04 ± 1.46 | 96.40 ± 1.51 | 97.54 ± 0.91 | 96.03 ± 0.98 | 94.91 ± 2.60 | 95.37 ± 2.58 | 98.42 ± 0.24 | 97.69 ± 0.52 | 98.44 ± 0.56 |
3 | 99.97 ± 0.05 | 99.03 ± 0.31 | 98.91 ± 0.94 | 99.21 ± 1.00 | 98.15 ± 0.65 | 98.59 ± 1.07 | 92.46 ± 8.68 | 99.70 ± 0.29 | 99.28 ± 0.69 | 99.50 ± 0.23 |
4 | 96.50 ± 0.97 | 95.59 ± 1.18 | 96.54 ± 1.52 | 98.35 ± 0.84 | 95.25 ± 0.79 | 97.10 ± 0.37 | 94.50 ± 1.50 | 98.32 ± 0.54 | 97.45 ± 0.64 | 97.26 ± 0.96 |
5 | 97.76 ± 0.71 | 95.07 ± 1.55 | 96.38 ± 0.63 | 96.72 ± 1.11 | 96.10 ± 0.87 | 96.74 ± 0.64 | 96.56 ± 1.46 | 97.63 ± 0.47 | 97.49 ± 0.61 | 98.19 ± 0.33 |
6 | 97.19 ± 2.86 | 74.68 ± 5.34 | 83.14 ± 5.14 | 93.85 ± 2.61 | 73.81 ± 5.75 | 88.76 ± 2.57 | 87.45 ± 3.28 | 93.30 ± 2.21 | 88.74 ± 3.62 | 93.67 ± 2.33 |
7 | 83.06 ± 0.99 | 90.96 ± 1.45 | 91.60 ± 1.81 | 93.78 ± 1.73 | 91.16 ± 1.34 | 94.86 ± 0.80 | 91.19 ± 4.61 | 95.99 ± 1.41 | 93.05 ± 1.08 | 94.54 ± 1.03 |
8 | 67.91 ± 1.94 | 82.25 ± 3.29 | 86.07 ± 2.16 | 90.60 ± 2.20 | 88.58 ± 1.38 | 89.61 ± 1.99 | 81.54 ± 8.63 | 94.48 ± 1.71 | 91.24 ± 1.97 | 95.74 ± 1.35 |
9 | 78.28 ± 1.83 | 83.92 ± 2.64 | 89.48 ± 1.81 | 89.02 ± 4.05 | 88.71 ± 1.77 | 92.63 ± 0.82 | 83.87 ± 8.50 | 92.34 ± 1.55 | 90.64 ± 1.84 | 94.29 ± 1.32 |
10 | 72.09 ± 2.47 | 86.58 ± 2.87 | 90.36 ± 1.50 | 92.31 ± 3.74 | 90.39 ± 1.23 | 89.33 ± 2.54 | 76.11 ± 12.47 | 94.44 ± 1.44 | 92.39 ± 1.92 | 96.91 ± 0.81 |
11 | 76.74 ± 1.04 | 85.83 ± 2.80 | 90.29 ± 2.17 | 91.84 ± 3.23 | 91.15 ± 1.53 | 91.29 ± 2.27 | 78.69 ± 12.96 | 93.75 ± 1.06 | 93.28 ± 1.55 | 94.94 ± 0.72 |
12 | 71.20 ± 2.04 | 82.39 ± 4.06 | 89.76 ± 2.29 | 91.47 ± 3.03 | 87.13 ± 1.52 | 88.22 ± 3.34 | 84.79 ± 7.95 | 93.37 ± 1.77 | 90.72 ± 2.25 | 96.50 ± 1.00 |
13 | 54.00 ± 5.40 | 83.31 ± 3.68 | 90.21 ± 4.11 | 92.38 ± 1.39 | 74.81 ± 4.09 | 80.82 ± 2.66 | 57.02 ± 31.62 | 85.68 ± 4.58 | 88.52 ± 2.33 | 93.42 ± 1.61 |
14 | 95.64 ± 1.02 | 95.41 ± 1.85 | 96.94 ± 2.92 | 96.06 ± 2.52 | 95.13 ± 1.43 | 95.03 ± 1.53 | 90.17 ± 7.81 | 99.12 ± 0.43 | 97.13 ± 1.62 | 99.88 ± 0.19 |
15 | 98.25 ± 0.40 | 96.28 ± 1.59 | 98.13 ± 0.83 | 96.02 ± 2.05 | 94.69 ± 2.24 | 97.65 ± 1.18 | 94.10 ± 4.05 | 98.20 ± 0.70 | 98.40 ± 1.06 | 98.98 ± 0.58 |
OA(%) | 84.91 ± 0.51 | 89.61 ± 1.82 | 92.40 ± 1.30 | 93.90 ± 1.70 | 91.28 ± 0.69 | 92.61 ± 1.01 | 87.36 ± 5.97 | 95.28 ± 0.72 | 93.94 ± 1.02 | 96.22 ± 0.35 |
K(%) | 83.68 ± 0.55 | 88.77 ± 1.97 | 91.79 ± 1.40 | 93.41 ± 1.84 | 90.58 ± 0.74 | 92.02 ± 1.09 | 86.34 ± 6.47 | 94.91 ± 0.78 | 93.45 ± 1.10 | 95.92 ± 0.38 |
p-values of the Student's t-test between SS-TMNet and each baseline (one value per dataset and method):

Dataset | Mou | He | 3D-CNN | HybridSN | ViT | CrossViT | LeViT | RvT | HiT
---|---|---|---|---|---|---|---|---|---
Pavia University | 7.12 × 10^−7 | 5.20 × 10^−11 | 7.45 × 10^−6 | 1.40 × 10^−2 | 1.02 × 10^−11 | 1.22 × 10^−12 | 3.68 × 10^−2 | 1.08 × 10^−11 | 2.26 × 10^−5
Indian Pines | 1.77 × 10^−13 | 1.94 × 10^−6 | 4.05 × 10^−8 | 5.01 × 10^−3 | 2.80 × 10^−18 | 4.59 × 10^−8 | 4.48 × 10^−2 | 4.76 × 10^−10 | 1.78 × 10^−2
Houston2013 | 1.78 × 10^−21 | 1.16 × 10^−6 | 5.61 × 10^−6 | 2.57 × 10^−3 | 1.86 × 10^−13 | 7.50 × 10^−9 | 1.59 × 10^−3 | 2.50 × 10^−3 | 5.08 × 10^−5
OA(%) on the Houston2013 dataset under different training sample ratios:

Training Sample | Mou | He | 3D-CNN | HybridSN | ViT | CrossViT | LeViT | RvT | HiT | SS-TMNet
---|---|---|---|---|---|---|---|---|---|---
10% | 84.91 ± 0.51 | 89.61 ± 1.82 | 92.40 ± 1.30 | 93.90 ± 1.70 | 91.28 ± 0.69 | 92.61 ± 1.01 | 87.36 ± 5.97 | 95.28 ± 0.72 | 93.94 ± 1.02 | 96.22 ± 0.35
20% | 87.77 ± 0.36 | 94.71 ± 1.05 | 95.84 ± 0.64 | 97.82 ± 0.28 | 95.59 ± 0.37 | 97.19 ± 0.15 | 97.70 ± 0.33 | 97.55 ± 0.22 | 96.96 ± 0.96 | 97.98 ± 0.19 |
30% | 89.42 ± 0.40 | 96.38 ± 0.93 | 97.32 ± 0.28 | 97.92 ± 0.67 | 97.15 ± 0.25 | 98.19 ± 0.13 | 98.46 ± 0.17 | 98.27 ± 0.22 | 98.04 ± 0.26 | 98.49 ± 0.15 |
40% | 90.53 ± 0.42 | 96.88 ± 0.90 | 97.88 ± 0.23 | 98.65 ± 0.41 | 97.78 ± 0.25 | 98.61 ± 0.16 | 98.85 ± 0.11 | 98.63 ± 0.11 | 98.43 ± 0.30 | 98.79 ± 0.11 |
50% | 91.48 ± 0.35 | 97.59 ± 0.38 | 98.40 ± 0.15 | 98.76 ± 0.18 | 98.24 ± 0.26 | 98.84 ± 0.13 | 98.98 ± 0.07 | 98.82 ± 0.11 | 98.54 ± 0.29 | 98.88 ± 0.13 |