Attention Fusion of Transformer-Based and Scale-Based Method for Hyperspectral and LiDAR Joint Classification
Abstract
1. Introduction
1.1. Multisource Remote Sensing Classification
1.2. Transformer
1.3. Multi-Scale Method
1.4. Multi-Output Method
1.5. Contribution
- The TRMSF network is proposed for the multimodal fusion classification task;
- To address the inadequate extraction of multi-scale information from multi-source remote sensing data, we construct features at multiple scales and fuse them into a single representation during multimodal fusion. A multi-scale attention enhancement module (MSAE) is proposed for feature fusion across different scales and different modalities, enhancing the representation of multi-scale semantic information;
- To address the inadequate fusion of multimodal features, we introduce an attention mechanism that refines the multimodal features before fusion and uses the refined features to reduce contradiction and redundancy between modalities. We propose a fusion transformer module (FUTR) that fuses the modalities via cross-attention, and experiments show that it significantly enhances the representation ability of the fused features (a minimal sketch of cross-attention fusion follows this list);
- To address the incomplete feature extraction caused by a single output, we design a multi-output module and construct a multi-level loss function to avoid the optimization degradation caused by back-propagating the gradient from a single output.
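To make the cross-attention fusion idea concrete, the following is a minimal PyTorch sketch of fusing HSI and LiDAR token features with mutual cross-attention. All names (`CrossAttentionFusion`, `embed_dim`, the pooling and projection choices) are our own illustrative assumptions rather than the authors' FUTR implementation, which additionally operates on the MSAE-enhanced multi-scale features.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Illustrative cross-attention fusion of two modality token sets.

    Queries come from one modality, keys/values from the other, so each
    modality attends to complementary information in its counterpart.
    Hypothetical layer names and sizes; not the authors' code.
    """
    def __init__(self, embed_dim=64, num_heads=4):
        super().__init__()
        self.hsi_to_lidar = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.lidar_to_hsi = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm_h = nn.LayerNorm(embed_dim)
        self.norm_l = nn.LayerNorm(embed_dim)
        self.proj = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, hsi_tokens, lidar_tokens):
        # HSI queries attend over LiDAR keys/values, and vice versa.
        h, _ = self.hsi_to_lidar(hsi_tokens, lidar_tokens, lidar_tokens)
        l, _ = self.lidar_to_hsi(lidar_tokens, hsi_tokens, hsi_tokens)
        # Residual connections preserve each modality's own evidence.
        h = self.norm_h(h + hsi_tokens)
        l = self.norm_l(l + lidar_tokens)
        # Pool over tokens, concatenate, and project to one fused feature.
        fused = self.proj(torch.cat([h.mean(dim=1), l.mean(dim=1)], dim=-1))
        return fused

# Example: a batch of 8 samples, 49 spatial tokens per modality, 64-dim embeddings.
hsi = torch.randn(8, 49, 64)
lidar = torch.randn(8, 49, 64)
fused = CrossAttentionFusion()(hsi, lidar)
print(fused.shape)  # torch.Size([8, 64])
```

The key design choice is that each modality's queries attend over the other modality's keys and values, so the fused feature emphasizes complementary evidence across sources while the residual paths retain modality-specific information.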
2. Materials and Methods
2.1. MSAE
2.2. MFM
2.3. Multi-Output Modules
3. Experimental Results and Analysis
3.1. Dataset Description
3.1.1. Houston Dataset
3.1.2. Trento Dataset
3.2. Evaluation Indicators
3.2.1. Overall Accuracy
3.2.2. Average Accuracy
3.2.3. Kappa Coefficient
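All three indicators are standard and can be computed from a single confusion matrix: OA is the proportion of correctly classified samples, AA is the mean of the per-class accuracies, and the Kappa coefficient corrects observed agreement for the agreement expected by chance. A minimal NumPy sketch using these standard definitions (the function name and toy matrix are our own):

```python
import numpy as np

def classification_metrics(conf):
    """Compute OA, AA and the Kappa coefficient from a confusion matrix.

    conf[i, j] = number of samples of true class i predicted as class j.
    """
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    # Overall accuracy: fraction of all samples classified correctly.
    oa = np.trace(conf) / total
    # Average accuracy: mean of the per-class accuracies (recalls).
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))
    # Kappa: observed agreement corrected for chance agreement p_e.
    pe = np.sum(conf.sum(axis=0) * conf.sum(axis=1)) / total**2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

# Toy 3-class confusion matrix.
conf = np.array([[50, 2, 3],
                 [4, 45, 1],
                 [2, 3, 40]])
oa, aa, kappa = classification_metrics(conf)
print(f"OA={oa:.4f}, AA={aa:.4f}, Kappa={kappa:.4f}")
```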
3.3. Comparative Experiment
3.3.1. Houston Dataset
3.3.2. Trento Dataset
3.4. Ablation Study
3.4.1. Houston Dataset
3.4.2. Trento Dataset
3.5. Visualization Experiment
3.5.1. Houston Dataset
3.5.2. Trento Dataset
3.6. Convergence Experiment
3.7. Loss Ratio Selection
4. Discussion and Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Li, H.C.; Hu, W.S.; Li, W.; Li, J.; Du, Q.; Plaza, A. A³CLNN: Spatial, spectral and multiscale attention ConvLSTM neural network for multisource remote sensing data classification. IEEE Trans. Neural Netw. Learn. Syst. 2020, 33, 1–15.
- Hang, R.; Li, Z.; Ghamisi, P.; Hong, D.; Xia, G.; Liu, Q. Classification of hyperspectral and LiDAR data using coupled CNNs. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4939–4950.
- Zhang, M.; Li, W.; Tao, R.; Li, H.; Du, Q. Information fusion for classification of hyperspectral and LiDAR data using IP-CNN. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–12.
- Zhang, M.; Li, W.; Du, Q.; Gao, L.; Zhang, B. Feature extraction for classification of hyperspectral and LiDAR data using patch-to-patch CNN. IEEE Trans. Cybern. 2018, 50, 100–111.
- Ding, Y.; Zhang, Z.; Zhao, X.; Cai, W.; Yang, N.; Hu, H.; Cai, W. Unsupervised self-correlated learning smoothy enhanced locality preserving graph convolution embedding clustering for hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–16.
- Ding, Y.; Zhao, X.; Zhang, Z.; Cai, W.; Yang, N.; Zhan, Y. Semi-supervised locality preserving dense graph neural network with ARMA filters and context-aware learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–12.
- Ding, Y.; Zhang, Z.; Zhao, X.; Cai, Y.; Li, S.; Deng, B.; Cai, W. Self-supervised locality preserving low-pass graph convolutional embedding for large-scale hyperspectral image clustering. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–16.
- Ding, Y.; Zhao, X.; Zhang, Z.; Cai, W.; Yang, N. Graph sample and aggregate-attention network for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5.
- Ding, Y.; Zhang, Z.; Zhao, X.; Hong, D.; Li, W.; Cai, W.; Zhan, Y. AF2GNN: Graph convolution with adaptive filters and aggregator fusion for hyperspectral image classification. Inf. Sci. 2022, 602, 201–219.
- Yao, D.; Zhi-li, Z.; Xiao-feng, Z.; Wei, C.; Fang, H.; Yao-ming, C.; Cai, W.W. Deep hybrid: Multi-graph neural network collaboration for hyperspectral image classification. Def. Technol. 2022, in press.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Houlsby, N. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
- Xue, Z.; Tan, X.; Yu, X.; Liu, B.; Yu, A.; Zhang, P. Deep hierarchical vision transformer for hyperspectral and LiDAR data classification. IEEE Trans. Image Process. 2022, 31, 3095–3110.
- Li, L.H.; Yatskar, M.; Yin, D.; Hsieh, C.J.; Chang, K.W. VisualBERT: A simple and performant baseline for vision and language. arXiv 2019, arXiv:1908.03557.
- Lu, J.; Batra, D.; Parikh, D.; Lee, S. ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. Adv. Neural Inf. Process. Syst. 2019, 32.
- Zhao, X.; Zhang, M.; Tao, R.; Li, W.; Liao, W.; Tian, L.; Philips, W. Fractional Fourier image transformer for multimodal remote sensing data classification. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–13.
- Yuxuan, H.; He, H.; Weng, L. Hyperspectral and LiDAR data land-use classification using parallel transformers. In Proceedings of the IGARSS 2022–2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022.
- Li, G.; Duan, N.; Fang, Y.; Gong, M.; Jiang, D. Unicoder-VL: A universal encoder for vision and language by cross-modal pre-training. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, No. 7.
- Su, W.; Zhu, X.; Cao, Y.; Li, B.; Lu, L.; Wei, F.; Dai, J. VL-BERT: Pre-training of generic visual-linguistic representations. arXiv 2019, arXiv:1908.08530.
- Chen, Y.C.; Li, L.; Yu, L.; El Kholy, A.; Ahmed, F.; Gan, Z.; Liu, J. UNITER: Universal image-text representation learning. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020.
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sutskever, I. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021.
- Huang, Z.; Zeng, Z.; Liu, B.; Fu, D.; Fu, J. Pixel-BERT: Aligning image pixels with text by deep multi-modal transformers. arXiv 2020, arXiv:2004.00849.
- Zhen, L.; Hu, P.; Wang, X.; Peng, D. Deep supervised cross-modal retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10394–10403.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020.
- Wang, H.; Zhu, Y.; Adam, H.; Yuille, A.; Chen, L.C. MaX-DeepLab: End-to-end panoptic segmentation with mask transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5463–5474.
- Ren, P.; Li, C.; Wang, G.; Xiao, Y.; Du, Q.; Liang, X.; Chang, X. Beyond fixation: Dynamic window visual transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022.
- Gong, Z.; Zhong, P.; Yu, Y.; Hu, W.; Li, S. A CNN with multiscale convolution and diversified metric for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 3599–3618.
- Roy, S.K.; Krishna, G.; Dubey, S.R.; Chaudhuri, B.B. HybridSN: Exploring 3-D–2-D CNN feature hierarchy for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2019, 17, 277–281.
- Hu, R.; Singh, A. UniT: Multimodal multitask learning with a unified transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021.
- Cao, X.; Zhou, F.; Xu, L.; Meng, D.; Xu, Z.; Paisley, J. Hyperspectral image classification with Markov random fields and a convolutional neural network. IEEE Trans. Image Process. 2018, 27, 2354–2367.
- Li, W.; Wu, G.; Zhang, F.; Du, Q. Hyperspectral image classification using deep pixel-pair features. IEEE Trans. Geosci. Remote Sens. 2016, 55, 844–853.
- Lee, H.; Kwon, H. Going deeper with contextual CNN for hyperspectral image classification. IEEE Trans. Image Process. 2017, 26, 4843–4855.
- Wu, H.; Prasad, S. Convolutional recurrent neural networks for hyperspectral data classification. Remote Sens. 2017, 9, 298.
- Xu, X.; Li, W.; Ran, Q.; Du, Q.; Gao, L.; Zhang, B. Multisource remote sensing data classification based on convolutional neural network. IEEE Trans. Geosci. Remote Sens. 2017, 56, 937–949.
- Zhao, X.; Tao, R.; Li, W.; Li, H.C.; Du, Q.; Liao, W.; Philips, W. Joint classification of hyperspectral and LiDAR data using hierarchical random walk and deep CNN architecture. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7355–7370.
- Hong, D.; Gao, L.; Hang, R.; Zhang, B.; Chanussot, J. Deep encoder-decoder networks for classification of hyperspectral and LiDAR data. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
- Fang, S.; Li, K.; Li, Z. S2ENet: Spatial–spectral cross-modal enhancement network for classification of hyperspectral and LiDAR data. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
Per-class classification accuracy (%) of the compared methods on the Houston dataset.

Class | SVM | CNN-PPF | CXC | TBC | CRNN | CC | CNN-MRF | EndNet | IP-CNN | PToP CNN | S2ENet | Ours
---|---|---|---|---|---|---|---|---|---|---|---|---|
Healthy grass | 82.43 | 83.57 | 84.89 | 83.10 | 83.00 | 98.51 | 85.77 | 78.54 | 85.77 | 85.77 | 82.72 | 98.29 |
Stressed grass | 82.05 | 98.21 | 87.40 | 84.10 | 79.41 | 97.83 | 86.28 | 96.33 | 87.34 | 87.08 | 100.00 | 89.77 |
Synthetic grass | 99.80 | 98.42 | 99.86 | 100.00 | 99.80 | 70.60 | 99.00 | 100.00 | 100.00 | 99.57 | 99.60 | 100.00 |
Trees | 92.80 | 97.73 | 93.49 | 93.09 | 90.15 | 99.06 | 92.85 | 88.26 | 94.26 | 94.13 | 95.74 | 94.08 |
Soil | 98.48 | 96.50 | 100.00 | 100.00 | 99.71 | 100.00 | 100.00 | 100.00 | 98.42 | 100.00 | 99.81 | 98.05 |
Water | 95.10 | 97.20 | 98.77 | 99.30 | 83.21 | 41.11 | 98.15 | 100.00 | 99.91 | 99.38 | 97.20 | 99.14 |
Residential | 75.47 | 85.82 | 82.81 | 92.82 | 88.06 | 83.14 | 91.64 | 83.02 | 94.59 | 87.38 | 91.23 | 95.78 |
Commercial | 46.91 | 56.51 | 78.78 | 82.34 | 88.61 | 98.39 | 80.79 | 79.96 | 91.81 | 97.35 | 91.55 | 94.45 |
Road | 77.53 | 71.20 | 82.51 | 84.70 | 66.01 | 94.81 | 91.37 | 93.30 | 89.35 | 90.81 | 95.94 | 93.89 |
Highway | 60.04 | 57.12 | 59.41 | 65.44 | 52.22 | 92.98 | 73.35 | 92.28 | 72.43 | 72.21 | 84.75 | 87.18 |
Railway | 81.02 | 80.55 | 83.24 | 88.24 | 81.97 | 90.88 | 98.87 | 85.86 | 96.57 | 100.00 | 94.31 | 94.84 |
Parking Lot 1 | 85.49 | 62.82 | 92.13 | 89.53 | 69.83 | 91.02 | 89.38 | 99.81 | 95.60 | 98.13 | 97.79 | 95.73 |
Parking Lot 2 | 75.09 | 63.86 | 94.88 | 92.28 | 79.64 | 97.09 | 92.75 | 83.16 | 94.37 | 92.11 | 89.47 | 98.55 |
Tennis Court | 100.00 | 100.00 | 99.77 | 96.76 | 100.00 | 100.00 | 100.00 | 100.00 | 99.86 | 99.30 | 100.00 | 99.92 |
Running Track | 98.31 | 98.10 | 98.79 | 99.79 | 100.00 | 97.85 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 99.54 |
OA (%) | 80.49 | 83.33 | 86.90 | 87.98 | 88.55 | 90.43 | 90.61 | 90.71 | 92.06 | 92.48 | 93.99 | 94.62
AA (%) | 83.37 | 83.21 | 89.11 | 90.11 | 90.30 | 90.22 | 92.01 | 92.03 | 93.35 | 93.55 | 94.67 | 95.95
Kappa (%) | 78.98 | 81.88 | 85.89 | 86.98 | 87.56 | 89.68 | 89.87 | 89.92 | 91.42 | 91.87 | 93.48 | 94.16
Per-class classification accuracy (%) of the compared methods on the Trento dataset.

Class | SVM | CNN-PPF | CXC | TBC | CRNN | CC | CNN-MRF | EndNet | IP-CNN | PToP CNN | S2ENet | Ours
---|---|---|---|---|---|---|---|---|---|---|---|---|
Apple trees | 88.62 | 90.11 | 99.26 | 98.07 | 98.39 | 99.87 | 99.95 | 99.90 | 99.00 | 99.60 | 99.90 | 96.88 |
Buildings | 94.04 | 83.34 | 86.81 | 95.21 | 90.46 | 83.84 | 89.97 | 99.03 | 99.40 | 93.90 | 98.88 | 96.38 |
Ground | 93.53 | 71.13 | 97.91 | 93.32 | 99.79 | 87.09 | 98.33 | 85.83 | 99.10 | 100.00 | 86.36 | 88.09 |
Woods | 98.90 | 99.04 | 97.31 | 99.93 | 96.96 | 99.98 | 100.00 | 100.00 | 99.92 | 99.27 | 100.00 | 99.89 |
Vineyard | 88.96 | 99.37 | 99.82 | 98.78 | 100.00 | 99.61 | 100.00 | 99.31 | 99.66 | 100.00 | 99.21 | 99.84 |
Roads | 91.75 | 89.73 | 84.63 | 89.98 | 81.63 | 98.75 | 97.86 | 90.83 | 90.21 | 97.28 | 91.32 | 96.69 |
OA (%) | 92.77 | 94.76 | 96.11 | 97.92 | 97.30 | 97.69 | 98.40 | 98.52 | 98.58 | 98.34 | 98.53 | 98.63
AA (%) | 92.63 | 88.97 | 94.29 | 96.19 | 94.54 | 94.86 | 97.04 | 95.81 | 97.88 | 97.53 | 95.94 | 96.30
Kappa (%) | 95.85 | 93.04 | 94.81 | 96.81 | 96.39 | 96.91 | 97.86 | 98.01 | 98.17 | 97.79 | 98.03 | 98.17
Ablation results (%) on the Houston dataset.

Metric | Base | Msnet | Trnet | TRMSF
---|---|---|---|---
OA (%) | 92.31 | 94.09 | 93.17 | 94.62
AA (%) | 94.72 | 95.67 | 94.86 | 95.95
Kappa (%) | 91.65 | 93.58 | 92.58 | 94.16
Ablation results (%) on the Trento dataset.

Metric | Base | Msnet | Trnet | TRMSF
---|---|---|---|---
OA (%) | 96.43 | 97.63 | 97.71 | 98.63
AA (%) | 94.01 | 95.82 | 95.41 | 96.30
Kappa (%) | 95.25 | 96.83 | 97.41 | 98.17
Results (%) on the Houston dataset under different loss-weight ratios; "Aver" is the mean over the four ratio settings.

Metric | Base | 0.1, 0.1, 0.1 | 0.1, 0.6, 0.3 | 0.6, 0.1, 0.3 | 0.1, 0.3, 0.6 | Aver
---|---|---|---|---|---|---
OA (%) | 92.31 | 93.86 | 92.96 | 93.91 | 94.62 | 93.84
AA (%) | 94.72 | 95.38 | 94.74 | 95.70 | 95.95 | 95.44
Kappa (%) | 91.65 | 93.33 | 92.36 | 93.39 | 94.16 | 93.31
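For context on the weight triples compared above: a multi-output network is typically trained with a weighted sum of per-output classification losses, one weight per auxiliary output. The sketch below illustrates this pattern under our own assumptions (three auxiliary heads plus a main head, cross-entropy per head); the actual heads and weight placement in TRMSF may differ.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def multi_output_loss(main_logits, aux_logits, labels, ratios=(0.1, 0.3, 0.6)):
    """Weighted multi-level loss: main output plus weighted auxiliary outputs.

    ratios correspond to the loss-weight triples compared in the table above;
    the pairing of weights to outputs here is illustrative.
    """
    loss = criterion(main_logits, labels)
    for w, logits in zip(ratios, aux_logits):
        loss = loss + w * criterion(logits, labels)
    return loss

# Example with a batch of 8 samples and 15 classes (as in the Houston dataset).
labels = torch.randint(0, 15, (8,))
main = torch.randn(8, 15)
aux = [torch.randn(8, 15) for _ in range(3)]
print(multi_output_loss(main, aux, labels))
```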