A Deep Ensemble Neural Network with Attention Mechanisms for Lung Abnormality Classification Using Audio Inputs
Abstract
1. Introduction
- We propose a PSO-based evolving ensemble model for lung abnormality classification that integrates four types of deep networks, i.e., A-CRNN, A-BiLSTM, A-BiGRU, and 1D CNN, to generate diverse discriminative acoustic representations and enhance classification performance. Specifically, the A-CRNN model combines 1D convolutional and BiLSTM layers to diversify feature learning mechanisms, while A-BiLSTM exploits bidirectional LSTM layers to learn feature representations in both the forward and backward directions. A-BiGRU adopts similar bidirectional RNN layers (i.e., BiGRU layers) but with different gating mechanisms, to explore the networks' feature learning capabilities for disease diagnosis. A 1D CNN model is also proposed, which stacks a set of 1D convolutional layers whose scalar multiplication and addition operations extract sequential temporal cues. On top of this, attention mechanisms are exploited in A-CRNN, A-BiLSTM, and A-BiGRU to extract more discriminative signal dynamics.
- To maximize network performance and diversify model learning behaviors, a PSO model is used to optimize the learning rate, batch size, and number of training epochs for A-CRNN, A-BiLSTM, A-BiGRU, and the 1D CNN. The resulting networks, each with a distinctive learning configuration, exhibit more diversified learning behaviors, which enhances ensemble robustness. The final ensemble model uses an average-probability method to integrate the outputs of these optimized base networks.
- Evaluated on several challenging medical sound datasets, the proposed ensemble model outperforms existing methods for abnormal respiratory, coughing, and speech sound classification across diverse lung disease and COVID-19 cases. In particular, the proposed base and ensemble models prove highly effective at distinguishing common respiratory diseases from COVID-19 using respiratory audio clips. To the best of our knowledge, this is also the first study to explore the classification of COVID-19 against a diverse set of other common lung conditions.
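As a concrete illustration of the average-probability fusion, the sketch below averages per-class probability vectors from the four base networks and picks the top class; the probability values are hypothetical placeholders, not outputs of the trained models.

```python
def average_probability_fusion(prob_vectors):
    """Average class-probability vectors from several base models and
    return (mean probabilities, index of the winning class)."""
    n_models = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    mean_probs = [sum(p[c] for p in prob_vectors) / n_models
                  for c in range(n_classes)]
    return mean_probs, mean_probs.index(max(mean_probs))

# Hypothetical 3-class scores from the four base networks for one clip.
predictions = [
    [0.60, 0.30, 0.10],  # A-CRNN
    [0.50, 0.40, 0.10],  # A-BiLSTM
    [0.20, 0.70, 0.10],  # A-BiGRU
    [0.55, 0.35, 0.10],  # 1D CNN
]
mean_probs, label = average_probability_fusion(predictions)
```

Here the ensemble sides with the three networks that favour class 0, even though A-BiGRU alone prefers class 1.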
2. Related Work
2.1. General Audio Classification
2.2. Related Studies Using the ICBHI Dataset
2.3. Related Studies Using the Coswara Dataset
2.4. Attention Mechanisms
2.5. Particle Swarm Optimization
3. Methodology
3.1. Dataset Preprocessing
3.1.1. Pre-Processing for the ICBHI Dataset (D1)
3.1.2. Pre-Processing for the Coswara Cough, Speech, and Breathing Datasets
3.1.3. Pre-Processing for the Combined Dataset (D5) Based on ICBHI and Coswara Breathing Databases
3.2. Feature Extraction
3.3. The Proposed Models
3.3.1. A-CRNN
3.3.2. A-BiLSTM
3.3.3. A-BiGRU
3.3.4. CNN
3.3.5. PSO-Based Hyper-Parameter Selection
3.3.6. The Ensemble Model
Ensemble Model Training
4. Evaluation
4.1. Evaluation Results for D1 (ICBHI) Using a Subject-Independent Split
4.2. Evaluation Results for Coswara Cough (D2), Speech (D3), and Breathing (D4) Datasets Using Random Splits
4.3. Evaluation Results for the Combined Dataset D5 (ICBHI + Coswara Breathing) Using a Subject-Independent Split
4.4. Discussions
4.4.1. Result Analysis
4.4.2. Advantages and Disadvantages of the Proposed Base and Ensemble Networks
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Wall, C.; Young, F.; Zhang, L.; Phillips, E.J.; Jiang, R.; Yu, Y. Deep learning based melanoma diagnosis using dermoscopic images. In Developments of Artificial Intelligence Technologies in Computation and Robotics, Proceedings of the 14th International FLINS Conference (FLINS 2020), Cologne, Germany, 18–21 August 2020; World Scientific: Singapore, 2020; pp. 907–914.
- Yamashita, R.; Nishio, M.; Do, R.K.G.; Togashi, K. Convolutional neural networks: An overview and application in radiology. Insights Imaging 2018, 9, 611–629.
- Nogueira, D.M.; Ferreira, C.A.; Gomes, E.F.; Jorge, A.M. Classifying heart sounds using images of motifs, MFCC and temporal features. J. Med. Syst. 2019, 43, 1–13.
- Kochetov, K.; Putin, E.; Balashov, M.; Filchenkov, A.; Shalyto, A. Noise masking recurrent neural network for respiratory sound classification. In International Conference on Artificial Neural Networks, Proceedings of the ICANN 2018: Artificial Neural Networks and Machine Learning—ICANN 2018, Rhodes, Greece, 4–7 October 2018; Springer: Cham, Switzerland, 2018; pp. 208–217.
- Rana, R. Gated recurrent unit (GRU) for emotion classification from noisy speech. arXiv 2016, arXiv:1612.07778.
- Rocha, B.M.; Filos, D.; Mendes, L.; Vogiatzis, I.; Perantoni, E.; Kaimakamis, E.; Natsiavas, P.; Oliveira, A.; Jácome, C.; Marques, A.; et al. A respiratory sound database for the development of automated classification. In International Conference on Biomedical and Health Informatics, Proceedings of the ICBHI 2017: Precision Medicine Powered by pHealth and Connected Health, Thessaloniki, Greece, 18–21 November 2017; Springer: Singapore, 2017; pp. 33–37.
- Phan, H.; Koch, P.; Katzberg, F.; Maass, M.; Mazur, R.; Mertins, A. Audio scene classification with deep recurrent neural networks. arXiv 2017, arXiv:1703.04770.
- Sharma, N.; Krishnan, P.; Kumar, R.; Ramoji, S.; Chetupalli, S.R.; Ghosh, P.K.; Ganapathy, S. Coswara: A database of breathing, cough, and voice sounds for COVID-19 diagnosis. arXiv 2020, arXiv:2005.10548.
- Choi, K.; Fazekas, G.; Sandler, M.; Cho, K. Convolutional recurrent neural networks for music classification. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 2392–2396.
- Bertin-Mahieux, T.; Ellis, D.P.; Whitman, B.; Lamere, P. The million song dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), Miami, FL, USA, 24–28 October 2011.
- McFee, B.; Raffel, C.; Liang, D.; Ellis, D.P.; McVicar, M.; Battenberg, E.; Nieto, O. librosa: Audio and music signal analysis in python. In Proceedings of the 14th Python in Science Conference, Austin, TX, USA, 6–12 July 2015; Volume 8, pp. 18–25.
- Chen, C.; Li, Q. A multimodal music emotion classification method based on multifeature combined network classifier. Math. Probl. Eng. 2020, 2020, 4606027.
- Perna, D.; Tagarelli, A. Deep auscultation: Predicting respiratory anomalies and diseases via recurrent neural networks. In Proceedings of the 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), Cordoba, Spain, 5–7 June 2019; pp. 50–55.
- Pahar, M.; Klopper, M.; Warren, R.; Niesler, T. COVID-19 cough classification using machine learning and global smartphone recordings. Comput. Biol. Med. 2021, 135, 104572.
- Marcano-Cedeño, A.; Quintanilla-Domínguez, J.; Cortina-Januchs, M.G.; Andina, D. Feature selection using sequential forward selection and classification applying artificial metaplasticity neural network. In Proceedings of the IECON 2010-36th Annual Conference on IEEE Industrial Electronics Society, Glendale, AZ, USA, 7–10 November 2010; pp. 2845–2850.
- Muguli, A.; Pinto, L.; Sharma, N.; Krishnan, P.; Ghosh, P.K.; Kumar, R.; Bhat, S.; Chetupalli, S.R.; Ganapathy, S.; Ramoji, S.; et al. DiCOVA Challenge: Dataset, task, and baseline system for COVID-19 diagnosis using acoustics. arXiv 2021, arXiv:2103.09148.
- Li, J.; Zhang, X.; Sun, M.; Zou, X.; Zheng, C. Attention-based LSTM algorithm for audio replay detection in noisy environments. Appl. Sci. 2019, 9, 1539.
- Zhang, Z.; Xu, S.; Zhang, S.; Qiao, T.; Cao, S. Attention based convolutional recurrent neural network for environmental sound classification. Neurocomputing 2020, 453, 896–903.
- Piczak, K.J. ESC: Dataset for environmental sound classification. In Proceedings of the 23rd ACM International Conference on Multimedia, Brisbane, Australia, 26–30 October 2015; pp. 1015–1018.
- Wall, C.; Zhang, L.; Yu, Y.; Mistry, K. Deep recurrent neural networks with attention mechanisms for respiratory anomaly classification. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–8.
- Sait, U.; KV, G.L.; Shivakumar, S.; Kumar, T.; Bhaumik, R.; Prajapati, S.; Bhalla, K.; Chakrapani, A. A deep-learning based multimodal system for COVID-19 diagnosis using breathing sounds and chest X-ray images. Appl. Soft Comput. 2021, 109, 107522.
- Wall, C.; Liu, C.; Zhang, L. Deep learning based respiratory anomaly and COVID diagnosis using audio and CT scan imagery. Recent Adv. AI-Enabled Autom. Med. Diagnosis 2022. Available online: https://www.routledge.com/Recent-Advances-in-AI-enabled-Automated-Medical-Diagnosis/Jiang-Crookes-Wei-Zhang-Chazot/p/book/9781032008431 (accessed on 11 June 2022).
- Perna, D. Convolutional neural networks learning from respiratory data. In Proceedings of the 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Madrid, Spain, 3–6 December 2018; pp. 2109–2113.
- García-Ordás, M.T.; Benítez-Andrades, J.A.; García-Rodríguez, I.; Benavides, C.; Alaiz-Moretón, H. Detecting respiratory pathologies using convolutional neural networks and variational autoencoders for unbalancing data. Sensors 2020, 20, 1214.
- Boddapati, V.; Petef, A.; Rasmusson, J.; Lundberg, L. Classifying environmental sounds using image recognition networks. Procedia Comput. Sci. 2017, 112, 2048–2056.
- Zhang, L.; Lim, C.P.; Yu, Y.; Jiang, M. Sound classification using evolving ensemble models and particle swarm optimization. Appl. Soft Comput. 2022, 116, 108322.
- Li, F.; Tang, H.; Shang, S.; Mathiak, K.; Cong, F. Classification of heart sounds using convolutional neural network. Appl. Sci. 2020, 10, 3956.
- Xiao, B.; Xu, Y.; Bi, X.; Zhang, J.; Ma, X. Heart sounds classification using a novel 1-D convolutional neural network with extremely low parameter consumption. Neurocomputing 2020, 392, 153–159.
- Zhang, Z.; Xu, S.; Cao, S.; Zhang, S. Deep convolutional neural network with mixup for environmental sound classification. In Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Proceedings of the PRCV 2018: Pattern Recognition and Computer Vision, Guangzhou, China, 23–26 November 2018; Springer: Cham, Switzerland, 2018; pp. 356–367.
- Mistry, K.; Zhang, L.; Neoh, S.C.; Lim, C.P.; Fielding, B. A micro-GA embedded PSO feature selection approach to intelligent facial emotion recognition. IEEE Trans. Cybern. 2017, 47, 1496–1509.
- Tan, T.Y.; Zhang, L.; Lim, C.P. Intelligent skin cancer diagnosis using improved particle swarm optimization and deep learning models. Appl. Soft Comput. 2019, 84, 105725.
- Fielding, B.; Zhang, L. Evolving image classification architectures with enhanced particle swarm optimisation. IEEE Access 2018, 6, 68560–68575.
- Tan, T.Y.; Zhang, L.; Lim, C.P. Adaptive melanoma diagnosis using evolving clustering, ensemble and deep neural networks. Knowl. Based Syst. 2020, 187, 104807.
- Wu, J.M.T.; Tsai, M.H.; Huang, Y.Z.; Islam, S.H.; Hassan, M.M.; Alelaiwi, A.; Fortino, G. Applying an ensemble convolutional neural network with Savitzky–Golay filter to construct a phonocardiogram prediction model. Appl. Soft Comput. 2019, 78, 29–40.
- Davis, S.; Mermelstein, P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Process. 1980, 28, 357–366.
- Minh-Tuan, N.; Kim, Y.-H. Bidirectional long short-term memory neural networks for linear sum assignment problems. Appl. Sci. 2019, 9, 3470.
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
- Zhang, L.; Srisukkham, W.; Neoh, S.C.; Lim, C.P.; Pandit, D. Classifier ensemble reduction using a modified firefly algorithm: An empirical evaluation. Expert Syst. Appl. 2018, 93, 395–422.
- Srisukkham, W.; Zhang, L.; Neoh, S.C.; Todryk, S.; Lim, C.P. Intelligent Leukaemia diagnosis with bare-bones PSO based feature optimization. Appl. Soft Comput. 2017, 56, 405–419.
- Lawrence, T.; Zhang, L.; Rogage, K.; Lim, C.P. Evolving deep architecture generation with residual connections for image classification using particle swarm optimization. Sensors 2021, 21, 7936.
- Zhang, L.; Lim, C.P.; Yu, Y. Intelligent human action recognition using an ensemble model of evolving deep networks with swarm-based optimization. Knowl. Based Syst. 2021, 220, 106918.
- Tan, C.J.; Neoh, S.C.; Lim, C.P.; Hanoun, S.; Wong, W.P.; Loo, C.K.; Zhang, L.; Nahavandi, S. Application of an evolutionary algorithm-based ensemble model to job-shop scheduling. J. Intell. Manuf. 2019, 30, 879–890.
- Zhang, Y.; Zhang, L.; Hossain, M.A. Adaptive 3D facial action intensity estimation and emotion recognition. Expert Syst. Appl. 2015, 42, 1446–1464.
- Zahid, S.; Hussain, F.; Rashid, M.; Yousaf, M.H.; Habib, H.A. Optimized audio classification and segmentation algorithm by using ensemble methods. Math. Probl. Eng. 2015, 2015, 209814.
- Neoh, S.C.; Zhang, L.; Mistry, K.; Hossain, M.A.; Lim, C.P.; Aslam, N.; Kinghorn, P. Intelligent facial emotion recognition using a layered encoding cascade optimization model. Appl. Soft Comput. 2015, 34, 72–93.
- Sagi, O.; Rokach, L. Ensemble learning: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1249.
- Prechelt, L. Early stopping-but when? In Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 1998; pp. 55–69.
- Parikh, R.; Mathai, A.; Parikh, S.; Sekhar, G.C.; Thomas, R. Understanding and using sensitivity, specificity and predictive values. Indian J. Ophthalmol. 2008, 56, 45.
- Liu, K.; Chen, Y.; Lin, R.; Han, K. Clinical features of COVID-19 in elderly patients: A comparison with young and middle-aged patients. J. Infect. 2020, 80, e14–e18.
- Kinghorn, P.; Zhang, L.; Shao, L. A region-based image caption generator with refined descriptions. Neurocomputing 2018, 272, 416–424.
- Kinghorn, P.; Zhang, L.; Shao, L. A hierarchical and regional deep learning architecture for image description generation. Pattern Recognit. Lett. 2019, 119, 77–85.
- Kiranyaz, S.; Avci, O.; Abdeljaber, O.; Ince, T.; Gabbouj, M.; Inman, D.J. 1D convolutional neural networks and applications: A survey. Mech. Syst. Signal Process. 2021, 151, 107398.
- Lawrence, T.; Zhang, L. IoTNet: An efficient and accurate convolutional neural network for IoT devices. Sensors 2019, 19, 5541.
Related Studies | Methodologies | Novel Strategies |
---|---|---|
Choi et al. [9] | CRNN with four conv2d layers and two GRU layers for music classification using the Million Song dataset | - |
Chen and Li [12] | (1) CNN-BiLSTM and 1D DNN for audio emotion classification, and (2) 1D DNN for lyrics emotion classification, using the Million Song dataset. (3) A stacking ensemble used to combine emotion classification results from both audio and text inputs. | A stacking ensemble combining emotion classification results from audio and text inputs |
Perna [23] | 2D CNN | - |
Perna and Tagarelli [13] | LSTM to train and classify the respiratory ICBHI dataset on both pathology and anomaly levels. The pathology-driven classification includes two tasks, i.e., binary (healthy/unhealthy) and 3-class (healthy/chronic/non-chronic) classification. On the other hand, for anomaly-driven diagnosis, a 4-class prediction is performed to detect normal/wheeze/crackle/both crackle and wheeze conditions. | Using different sliding window settings for data preparation |
Pahar et al. [14] | Resnet50, LSTM, CNN, MLP, SVM, and LR for the classification of different lung abnormalities using the Coswara dataset and the SARS-CoV-2 South Africa (Sarcos) dataset. | - |
Zhang et al. [18] | CRNN with attention mechanisms for environmental sound classification using ESC-10 and ESC-50 datasets. | CRNN with attention mechanisms |
Wall et al. [20] | BiLSTM and BiGRU with attention mechanisms for respiratory and coughing sound classification | BiLSTM and BiGRU with attention mechanisms |
Wall et al. [22] | BiLSTM for 2-class (health/unhealthy) respiratory sound classification | - |
Zhang et al. [26] | An evolving ensemble of CRNNs for respiratory abnormality (healthy/chronic/non-chronic) classification, as well as heart sound and environmental sound classification. | Hyper-parameter fine-tuning using PSO (but for 3-class respiratory abnormality detection) |
García-Ordás et al. [24] | 2D CNN with two convolutional layers in combination with different data augmentation and oversampling techniques for respiratory abnormality classification | Adopting different oversampling techniques |
Li et al. [27] | 1D CNN with three convolutional layers for heart sound classification | - |
Xiao et al. [28] | 1D CNN with clique and transition blocks for heart sound classification | 1D CNN with clique and transition blocks |
Boddapati et al. [25] | AlexNet and GoogLeNet for environmental sound classification | - |
Sait et al. [21] | Transfer learning based on Inception-v3 combined with MLP for COVID-19 diagnosis using breathing and chest X-ray image inputs | Transfer learning based on Inception-v3 combined with MLP for multimodal COVID-19 diagnosis |
Zhang et al. [29] | 2D CNN combined with sound mix-up | Sound mix-up for model training |
This research | An evolving ensemble of A-CRNN, A-BiLSTM, A-BiGRU, and 1D CNN, with PSO-based hyper-parameter optimization | (1) CRNN, BiLSTM, and BiGRU with attention mechanisms (i.e., A-CRNN, A-BiLSTM, and A-BiGRU), as well as 1D CNN for audio classification. (2) PSO-based hyper-parameter tuning, and (3) an ensemble model combining the devised A-CRNN, A-BiLSTM, A-BiGRU, and 1D CNN. |
Dataset | Dataset Name | Class | No. of Files
---|---|---|---
D1 | ICBHI | COPD | 793
 | | Healthy | 35
 | | Bronchiectasis | 16
 | | Bronchiolitis | 13
 | | URTI | 23
 | | Pneumonia | 37
 | | Asthma | 1
 | | LRTI | 2
D2 | Coswara Cough | COVID-19 Positive | 110
 | | COVID-19 Negative | 107
D3 | Coswara Speech | COVID-19 Positive | 103
 | | COVID-19 Negative | 104
D4 | Coswara Breathing | COVID-19 Positive | 101
 | | COVID-19 Negative | 103
D5 | ICBHI + Coswara Breathing | COPD | 793
 | | Healthy | 35
 | | Bronchiectasis | 16
 | | Bronchiolitis | 13
 | | URTI | 23
 | | Pneumonia | 37
 | | COVID-19 | 101
Class | Training Set | Augmented Training Set | Test Set
---|---|---|---
Bronchiectasis | 14 | 672 | 2 |
Bronchiolitis | 7 | 672 | 6 |
URTI | 16 | 672 | 7 |
Healthy | 18 | 672 | 17 |
Pneumonia | 30 | 660 | 7 |
COPD | 648 | 648 | 145 |
Total | 733 | 3996 | 184 |
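The balanced "Augmented Training Set" counts above come from augmenting the minority classes. The paper's exact augmentation pipeline is not reproduced here; as a rough illustration of the class-balancing step only, the sketch below tops up each minority class by re-sampling with replacement (a stand-in for real audio augmentation), using hypothetical file names.

```python
import random

def balance_by_oversampling(files_by_class, target, seed=42):
    """Top up each minority class to at least `target` files by randomly
    re-sampling its existing files; larger classes are left untouched."""
    rng = random.Random(seed)
    balanced = {}
    for cls, files in files_by_class.items():
        extra = [rng.choice(files) for _ in range(max(0, target - len(files)))]
        balanced[cls] = list(files) + extra
    return balanced

# Hypothetical file lists mirroring two of the D1 training counts.
train = {"Bronchiolitis": [f"bronchiolitis_{i}.wav" for i in range(7)],
         "COPD": [f"copd_{i}.wav" for i in range(648)]}
balanced = balance_by_oversampling(train, 648)
```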
Dataset | Class | Training Set | Augmented Training Set | Test Set
---|---|---|---|---
ICBHI (D1) | Bronchiectasis | 14 | 672 | 2
 | Bronchiolitis | 7 | 672 | 6
 | URTI | 16 | 672 | 7
 | Healthy | 18 | 672 | 17
 | Pneumonia | 30 | 660 | 7
 | COPD | 648 | 648 | 145
Coswara Breathing (D4) | COVID-19 | 81 | 648 | 20
 | Total | 814 | 4644 | 204
Layer# | Layer Description | Unit Setting | Kernel Size |
---|---|---|---|
L1 | Conv1D | 512 | 3 |
L2 | Conv1D | 256 | 3 |
L3 | MaxPooling1D | N/A | N/A |
L4 | BiLSTM | 512 | N/A |
L5 | Attention Mechanism | N/A | N/A |
L6 | LSTM | 256 | N/A |
L7 | Dense | 128 | N/A |
L8 | FC Dense (Softmax) | Number of classes | N/A |
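The "Attention Mechanism" layer above (also used in A-BiLSTM and A-BiGRU) weights the recurrent hidden states over time so that the most discriminative frames dominate the pooled representation. A minimal soft-attention sketch follows, assuming a simple dot-product scoring vector; the paper's exact scoring function may differ.

```python
import math

def temporal_attention(hidden_states, score_vector):
    """Soft attention over time steps: score each hidden state, softmax
    the scores into weights, and return the weighted-sum context vector."""
    # Alignment score per time step (dot product with a learned vector).
    scores = [sum(h * w for h, w in zip(state, score_vector))
              for state in hidden_states]
    # Numerically stable softmax over the time axis.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Context vector: attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    context = [sum(a * state[d] for a, state in zip(alphas, hidden_states))
               for d in range(dim)]
    return alphas, context

# Two toy time steps with 2-dimensional hidden states.
alphas, context = temporal_attention([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

The attention weights sum to one, and the context vector leans toward the higher-scoring time step.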
Layer# | Layer Description | Unit Setting |
---|---|---|
L1 | BiLSTM | 512 |
L2 | LSTM | 256 |
L3 | Attention Mechanism | N/A |
L4 | Dense | 128 |
L5 | Dropout | 0.6 |
L6 | Dense | 64 |
L7 | FC Dense (Softmax) | Number of classes |
Layer# | Layer Description | Unit Setting |
---|---|---|
L1 | BiGRU | 512 |
L2 | GRU | 256 |
L3 | Attention Mechanism | N/A |
L4 | Dense | 128 |
L5 | Dropout | 0.6 |
L6 | Dense | 64 |
L7 | FC Dense (Softmax) | Number of classes |
Layer# | Layer Description | Unit Setting | Kernel Size |
---|---|---|---|
L1 | Conv1D | 128 | 3 |
L2 | Conv1D | 128 | 3 |
L3 | Conv1D | 128 | 3 |
L4 | MaxPooling1D | N/A | N/A |
L5 | Conv1D | 256 | 3 |
L6 | Conv1D | 256 | 3 |
L7 | Conv1D | 256 | 3 |
L8 | MaxPooling1D | N/A | N/A |
L9 | Conv1D | 512 | 3 |
L10 | Conv1D | 512 | 1 |
L11 | Conv1D | 2 | 1 |
L12 | GlobalAveragePooling1D | N/A | N/A |
L13 | Activation | N/A | N/A |
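As described in the Introduction, each Conv1D layer above reduces to sliding scalar multiply-add operations along the input sequence. A minimal single-channel "valid" convolution (in the cross-correlation form used by deep learning frameworks) looks like this:

```python
def conv1d(signal, kernel):
    """Single-channel 'valid' 1D convolution (cross-correlation form):
    slide the kernel along the signal, accumulating multiply-adds."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A size-3 difference kernel applied to a short ramp signal.
out = conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0])  # [-2.0, -2.0]
```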
Model | Hyper-Parameter | Setting
---|---|---
A-CRNN | Learning Rate | 0.00159
 | Batch Size | 128
 | Epoch | 37
A-BiLSTM | Learning Rate | 0.00095
 | Batch Size | 128
 | Epoch | 105
A-BiGRU | Learning Rate | 0.00193
 | Batch Size | 128
 | Epoch | 45
CNN | Learning Rate | 0.00019
 | Batch Size | 128
 | Epoch | 53
Model | Hyper-Parameter | Setting
---|---|---
A-CRNN | Learning Rate | 0.000106
 | Batch Size | 64
 | Epoch | 26
A-BiLSTM | Learning Rate | 0.000101
 | Batch Size | 64
 | Epoch | 15
A-BiGRU | Learning Rate | 0.00909
 | Batch Size | 64
 | Epoch | 16
CNN | Learning Rate | 0.00013
 | Batch Size | 64
 | Epoch | 23
Model | Hyper-Parameter | Setting
---|---|---
A-CRNN | Learning Rate | 0.000143
 | Batch Size | 512
 | Epoch | 48
A-BiLSTM | Learning Rate | 0.000099
 | Batch Size | 512
 | Epoch | 96
A-BiGRU | Learning Rate | 0.00187
 | Batch Size | 512
 | Epoch | 33
CNN | Learning Rate | 0.000122
 | Batch Size | 512
 | Epoch | 130
Model | Hyper-Parameter | Setting
---|---|---
A-CRNN | Learning Rate | 0.000163
 | Batch Size | 512
 | Epoch | 48
A-BiLSTM | Learning Rate | 0.000098
 | Batch Size | 512
 | Epoch | 96
A-BiGRU | Learning Rate | 0.00083
 | Batch Size | 512
 | Epoch | 33
CNN | Learning Rate | 0.000103
 | Batch Size | 512
 | Epoch | 130
Model | Hyper-Parameter | Setting
---|---|---
A-CRNN | Learning Rate | 0.000157
 | Batch Size | 128
 | Epoch | 42
A-BiLSTM | Learning Rate | 0.000197
 | Batch Size | 128
 | Epoch | 30
A-BiGRU | Learning Rate | 0.00192
 | Batch Size | 128
 | Epoch | 38
CNN | Learning Rate | 0.000083
 | Batch Size | 128
 | Epoch | 43
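The PSO search behind the settings above can be illustrated with a minimal sketch. The fitness function here is a hypothetical surrogate for validation accuracy (actually training the four networks per particle, as in Section 3.3.5, is far too costly for an illustration), and the swarm constants are assumed defaults rather than the paper's settings.

```python
import random

random.seed(0)

# Hyper-parameter search space mirroring Section 3.3.5:
# learning rate, batch-size index, and training epochs.
LR_RANGE = (1e-5, 1e-2)
BATCH_CHOICES = [64, 128, 256, 512]
EPOCH_RANGE = (10.0, 150.0)
DIMS = [LR_RANGE, (0.0, len(BATCH_CHOICES) - 1.0), EPOCH_RANGE]

def fitness(p):
    """Hypothetical surrogate for validation accuracy (higher is better)."""
    lr, b_idx, epochs = p
    return -((lr - 1e-3) ** 2 * 1e5 + (b_idx - 1.0) ** 2 * 0.01
             + ((epochs - 60.0) / 100.0) ** 2)

def pso(n_particles=10, n_iter=30, w=0.7, c1=1.4, c2=1.4):
    pos = [[random.uniform(lo, hi) for lo, hi in DIMS] for _ in range(n_particles)]
    vel = [[0.0] * len(DIMS) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    gbest = max(pbest, key=fitness)[:]
    gbest_fit = fitness(gbest)
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(DIMS):
                # Standard velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    lr, b_idx, epochs = gbest
    return lr, BATCH_CHOICES[round(b_idx)], round(epochs)

best_lr, best_batch, best_epochs = pso()
```

Each base network gets its own search, which is why the tables above show distinct learning rates and epoch counts per model and dataset.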
Models | Sensitivity | Specificity | ICBHI Score | Accuracy |
---|---|---|---|---|
A-CRNN | 0.8947 | 1 | 0.9474 | 0.8989 |
A-BiLSTM | 0.8947 | 0.8571 | 0.8759 | 0.8933 |
A-BiGRU | 0.8655 | 0.8571 | 0.8613 | 0.8652 |
CNN | 0.883 | 0.8571 | 0.8701 | 0.882 |
Ensemble | 0.9532 | 1 | 0.9766 | 0.9551 |
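In these tables, the ICBHI score is the unweighted mean of sensitivity and specificity, as the rows confirm (e.g., A-CRNN: (0.8947 + 1)/2 ≈ 0.9474):

```python
def icbhi_score(sensitivity, specificity):
    """ICBHI score: the average of sensitivity and specificity."""
    return (sensitivity + specificity) / 2.0

# Reproduce the A-CRNN and Ensemble rows of the D1 results.
a_crnn = icbhi_score(0.8947, 1.0)    # ~0.9474
ensemble = icbhi_score(0.9532, 1.0)  # ~0.9766
```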
True Class | Bronchiectasis | Bronchiolitis | COPD | Healthy | Pneumonia | URTI
---|---|---|---|---|---|---
Bronchiectasis | 1 | 0 | 0 | 0 | 0 | 0 |
Bronchiolitis | 0 | 1 | 0 | 0 | 0 | 0 |
COPD | 0.0261 | 0 | 0.9673 | 0.0065 | 0 | 0 |
Healthy | 0 | 0 | 0 | 1 | 0 | 0 |
Pneumonia | 0 | 0 | 0.2 | 0 | 0.8 | 0 |
URTI | 0 | 0 | 0 | 0.4 | 0 | 0.6 |
Models | Sensitivity | Specificity | ICBHI Score | Accuracy |
---|---|---|---|---|
A-CRNN | 0.9231 | 0.8846 | 0.9038 | 0.9060 |
A-BiLSTM | 1 | 0.9391 | 0.9700 | 0.9754 |
A-BiGRU | 1 | 0.9600 | 0.9800 | 0.9825 |
CNN | 0.9524 | 0.9077 | 0.9300 | 0.9297 |
Ensemble | 1 | 0.9420 | 0.9710 | 0.9750 |
True Class | Positive | Negative
---|---|---
Positive | 1 | 0 |
Negative | 0.058 | 0.942 |
Models | Sensitivity | Specificity | ICBHI Score | Accuracy |
---|---|---|---|---|
A-CRNN | 0.9289 | 0.8747 | 0.9018 | 0.9023 |
A-BiLSTM | 0.9422 | 0.8410 | 0.8916 | 0.8894 |
A-BiGRU | 0.8344 | 0.8258 | 0.8300 | 0.8304 |
CNN | 0.8965 | 0.8795 | 0.8880 | 0.8881 |
Ensemble | 0.9480 | 0.8920 | 0.9200 | 0.9240 |
True Class | Positive | Negative
---|---|---
Positive | 0.9480 | 0.0520 |
Negative | 0.1080 | 0.8920 |
Models | Sensitivity | Specificity | ICBHI Score | Accuracy |
---|---|---|---|---|
A-CRNN | 0.9073 | 0.8746 | 0.8909 | 0.8936 |
A-BiLSTM | 0.9909 | 0.9530 | 0.9720 | 0.9724 |
A-BiGRU | 0.8654 | 0.8069 | 0.8362 | 0.8320 |
CNN | 0.9497 | 0.8562 | 0.9030 | 0.9066 |
Ensemble | 0.9810 | 0.8770 | 0.9290 | 0.9300 |
True Class | Positive | Negative
---|---|---
Positive | 0.9810 | 0.019 |
Negative | 0.1230 | 0.8770 |
Models | Sensitivity | Specificity | ICBHI Score | Accuracy |
---|---|---|---|---|
A-CRNN | 0.911 | 1 | 0.9555 | 0.9141 |
A-BiLSTM | 0.9215 | 0.8571 | 0.8893 | 0.9192 |
A-BiGRU | 0.8743 | 0.8571 | 0.8657 | 0.8737 |
CNN | 0.9162 | 0.8571 | 0.8867 | 0.9141 |
Ensemble | 0.9424 | 1 | 0.9712 | 0.9444 |
True Class | Bronchiectasis | Bronchiolitis | COPD | Healthy | Pneumonia | URTI | COVID-19
---|---|---|---|---|---|---|---
Bronchiectasis | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Bronchiolitis | 0 | 0.8333 | 0 | 0.1667 | 0 | 0 | 0 |
COPD | 0.0261 | 0 | 0.9542 | 0.0196 | 0 | 0 | 0 |
Healthy | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
Pneumonia | 0 | 0 | 0 | 0 | 0.8 | 0.2 | 0 |
URTI | 0 | 0 | 0 | 0.4 | 0 | 0.6 | 0 |
COVID-19 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Dataset | Sensitivity | Specificity | ICBHI Score | Accuracy |
---|---|---|---|---|
D1 | 0.9532 | 1 | 0.9766 | 0.9551 |
D2 | 1 | 0.942 | 0.971 | 0.975 |
D3 | 0.948 | 0.892 | 0.920 | 0.924 |
D4 | 0.981 | 0.877 | 0.929 | 0.930 |
D5 | 0.9424 | 1 | 0.9712 | 0.9444 |
Dataset | Existing Studies | Methodology | No. of Classes | Evaluation Strategies | Results
---|---|---|---|---|---
ICBHI | Wall et al. [20] | BiLSTM with attention mechanisms | 6 | 90–10 (random) | Accuracy rate—0.962
 | Zhang et al. [26] | An evolving ensemble of CRNNs | 3 (healthy, chronic, and non-chronic) | 80–20 (subject-independent) | ICBHI score—0.9803
 | Wall et al. [22] | BiLSTM | 2 (healthy and unhealthy) | 80–20 (random) | ICBHI score—0.957
 | Perna [23] | 2D CNN | 3 (healthy, chronic, and non-chronic) | 80–20 (random) | ICBHI score—0.83
 | Perna and Tagarelli [13] | LSTM with 50% overlapping between windows | 3 (healthy, chronic, and non-chronic) | 80–20 (random) | ICBHI score—0.9
 | Perna and Tagarelli [13] | LSTM without overlapping | 3 (healthy, chronic, and non-chronic) | 80–20 (random) | ICBHI score—0.89
 | García-Ordás et al. [24] | 2D CNN with Synthetic Minority Oversampling Technique | 3 (healthy, chronic, and non-chronic) | 10-fold (random) | ICBHI score—0.558
 | García-Ordás et al. [24] | 2D CNN with Adaptive Synthetic Sampling Method | 3 (healthy, chronic, and non-chronic) | 10-fold (random) | ICBHI score—0.911
 | García-Ordás et al. [24] | 2D CNN with weighted dataset | 3 (healthy, chronic, and non-chronic) | 10-fold (random) | ICBHI score—0.476
 | This research | Ensemble of optimized A-CRNN, A-BiLSTM, A-BiGRU, and 1D CNN | 6 | 80–20 (subject-independent) | ICBHI score—0.9766; Accuracy rate—0.9551
Coswara (cough) | Wall et al. [20] | BiLSTM with attention mechanisms | 2 | 90–10 (random) | Accuracy rate—0.968
 | This research | Ensemble of optimized A-CRNN, A-BiLSTM, A-BiGRU, and 1D CNN | 2 | 80–20 (random) | ICBHI score—0.971; Accuracy rate—0.975
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wall, C.; Zhang, L.; Yu, Y.; Kumar, A.; Gao, R. A Deep Ensemble Neural Network with Attention Mechanisms for Lung Abnormality Classification Using Audio Inputs. Sensors 2022, 22, 5566. https://doi.org/10.3390/s22155566