Pneumonia Disease Detection Using Chest X-Rays and Machine Learning

Usman, Cathryn; Rehman, Saeed Ur; Ali, Anwar; Khan, Adil Mehmood; Ahmad, Baseer

doi:10.3390/a18020082

by Cathryn Usman ¹, Saeed Ur Rehman ¹, Anwar Ali ²,*, Adil Mehmood Khan ¹ and Baseer Ahmad ¹

¹ Faculty of Science and Engineering, University of Hull, Hull HU6 7RX, UK

² Department of Electronic and Electrical Engineering, Swansea University Bay Campus, Swansea SA1 8EN, UK

* Author to whom correspondence should be addressed.

Algorithms 2025, 18(2), 82; https://doi.org/10.3390/a18020082 (registering DOI)

Submission received: 30 October 2024 / Revised: 10 December 2024 / Accepted: 17 December 2024 / Published: 3 February 2025

(This article belongs to the Section Algorithms and Mathematical Models for Computer-Assisted Diagnostic Systems)

Abstract:

Pneumonia is a deadly disease affecting millions worldwide, caused by microorganisms and environmental factors. It leads to lung fluid build-up, making breathing difficult, and is a leading cause of death. Early detection and treatment are crucial for preventing severe outcomes. Chest X-rays are commonly used for diagnosis due to their accessibility and low cost; however, detecting pneumonia through X-rays is challenging. Automated methods are needed, and machine learning can solve complex computer vision problems in medical imaging. This research develops a robust machine learning model for the early detection of pneumonia using chest X-rays, leveraging advanced image processing techniques and deep learning algorithms that identify pneumonia patterns, enabling prompt diagnosis and treatment. The research develops a CNN model from the ground up and a pretrained ResNet-50 model. This study uses the original RSNA pneumonia detection challenge dataset, comprising 26,684 chest X-ray images collected from unique patients (56% male, 44% female). The data are made up of pneumonia (31.6%) and non-pneumonia (68.4%) cases, providing an effective foundation for model training and evaluation. A reduced version of the dataset was used to examine the impact of data size, and both versions were tested with and without augmentation. The models were compared with existing works and with one another in their effectiveness at detecting pneumonia, and the impact of augmentation and dataset size on model performance was examined. The best overall accuracy was achieved by the CNN model built from scratch, with no augmentation: an accuracy of 0.79, a precision of 0.76, a recall of 0.73, and an F1 score of 0.74. However, the pretrained model, with lower overall accuracy, was found to be more generalizable.

Keywords:

machine learning; CNN; RESNET; pneumonia infections

1. Introduction and Background

Pneumonia is a respiratory infection caused by bacteria, viruses, and fungi [1], leading to inflammation of the lungs, with symptoms including fever, muscle aches, coughing and breathing difficulties [2]. If not treated in time, pneumonia can be fatal. Each year, more than four million people die from pneumonia [3]. Antibiotics and antivirals can treat pneumonia caused by bacteria or viruses; even so, the ability to diagnose and treat pneumonia early is necessary for the best possible patient care [4].

Chest X-ray is a frequently used imaging modality for detecting pneumonia, due to its low cost and accessibility [5]. Despite the benefits that X-rays provide, their interpretation requires expert radiologists, who are scarce in several areas of the world, according to ref. [6]. As a result, the use of chest X-rays to detect pneumonia can presently lead to delayed detection and treatment; computer-aided detection (CAD) systems can serve as a solution to these problems [7]. Deep learning (DL), a subset of machine learning, is widely assessed as a CAD technology in pneumonia detection [8].

Machine learning is being utilized in medical imaging tasks, mapping, and diagnosing images [9]. Machine learning can solve complex computer vision problems in medical imaging [10]. However, deep learning has taken giant steps in these models because of its ability to learn hierarchical features from raw image data [11].

Deep learning techniques achieve classification such that the key features are automatically extracted and classified [12]. Compared to other deep learning techniques, the CNN is one of the most effective approaches for pattern recognition because of its layered topology [13].

CNNs are a type of deep neural network that concentrates on image analysis and consists of convolutional layers and filters that assist in extracting the spatial and temporal features within an image [14]. The principal benefit of CNNs is their ability to extract meaningful features from full images [15]. However, CNN models require large labeled datasets, considerable cost, and powerful software. Transfer learning, as an approach to machine learning, addresses this issue [16]. Transfer learning is characterized by using a pretrained CNN model to solve a new problem, i.e., using the expertise acquired on one task to solve other related tasks [16]. In the next section, the background is presented, followed by the methodology section and then the results. The article closes with a discussion section, followed by the conclusion of the work.

Background

Other studies have been conducted using deep learning techniques and chest X-rays in pneumonia detection. Ref. [17] used the deep learning models ResNet-101 and ResNet-50 to diagnose pneumonia, applying deep learning models to the detection of fourteen diseases, including pneumonia, and achieved 96% precision, but experienced limitations that affected precision, due to the intricate [2].

Ref. [18] used AlexNet, ResNet 18, DenseNet201, and SqueezeNet to detect bacterial and viral pneumonia in digital X-ray images. The accuracy of the study was up to 98%, 95%, and 93.3%, making the study useful for screening pneumonia. Ref. [19] built a deep learning architecture with VGG16 architecture to detect pneumonia using chest X-ray images. The model was trained on 5856 chest X-ray images. The model’s accuracy was 96.6%, with a sensitivity of 98.1%, specificity of 92.4%, precision of 97.2% and an F1 score of 97.6% [20].

Ref. [21] worked on pneumonia detection in chest X-rays using a convolutional neural network and transfer learning. There were six models, including VGG16, VGG19, ResNet50, and Inception v3, trained to classify chest X-ray images into pneumonia and non-pneumonia categories. The models were able to achieve validation accuracy of 85.26% and 92.31%, respectively [7].

The highest accuracy reported in the above-mentioned literature in classifying pneumonia using chest X-rays is 98% [18]; as such, there is room for improvement.

The primary aim of this research is to detect pneumonia using chest X-rays and machine learning. This research develops a robust machine learning model for the early detection of pneumonia using chest X-rays. By leveraging advanced image processing techniques and deep learning algorithms, the system is intended to identify pneumonia patterns accurately, enabling prompt diagnosis and treatment. This research develops a convolutional neural network (CNN) model from scratch and also uses a pretrained ResNet50 as the base model.

This research addresses the following key questions:

How correctly will the models identify pneumonia using chest X-rays compared to already existing work?
How effective is the CNN model developed from the ground up compared to the pretrained ResNet-50 model for accurately detecting pneumonia?
To what extent do advanced image processing techniques, in particular augmentation, affect the detection of pneumonia using chest X-rays and deep learning algorithms?
What is the impact of dataset size on the performance of the models?

2. Methodology

The deep learning algorithms chosen for the project are a CNN and the pretrained model ResNet-50. The CNN was considered due to its ability to recognize spatial hierarchies, capture details and patterns in chest X-rays, and effectively differentiate between normal and lung-infected chest X-rays. ResNet-50 has an architecture with 50 layers, trained on ImageNet. It was chosen for the research due to its deep architecture and ability to learn complex features in the images.

Development Environment

The experiments were conducted in the Python programming language, using TensorFlow and Keras because they provide excellent tools for deep learning models, Scikit-learn for evaluation and data splitting, Pandas for data manipulation and analysis, NumPy (np) for numerical operations, Matplotlib and Seaborn for visualization, and Pydicom for handling DICOM (medical imaging) data. The hardware consisted of an integrated entry-level consumer GPU and a specialized high-performance computing (HPC) system, “VIPER”, which is an advanced GPU-enabled computing resource designed for demanding computational tasks.

The CNN models were developed with the original dataset and a reduced dataset of 5000 samples, with and without augmentation, and then hyper-tuned using a tuner search to obtain the best results. The ResNet pretrained model was developed with the original data with augmentation and hyper-tuned for best results.

Dataset

The RSNA pneumonia detection challenge dataset consists of X-ray images and patient meta-data, publicly provided by the US National Institutes of Health Clinical Centre for a challenge [22]. The dataset was chosen because it has been successfully used for similar studies.

The dataset comprises chest X-ray images (DICOM) from 26,684 unique patients. The classification file is made up of three categories built on the following image conditions:

No Lung Opacity/Not Normal: abnormalities unrelated to pneumonia;
Normal: no lung abnormalities detected;
Lung Opacity: opacities that are visible, indicating pneumonia presence.

The label file includes binary targets of “1” for pneumonia and “0” for non-pneumonia. The three categories were mapped to these binary targets as follows:

Pneumonia (1): from the Lung Opacity category;
Non-Pneumonia (0): from the No Lung Opacity/Not Normal and Normal categories.

The distribution (as shown in Table 1) is as follows:

No Lung Opacity/Not Normal—39.1%;
Normal—29.3%;
Lung Opacity—31.6%.
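
The mapping from the three categories to the binary target, together with the reported class shares, can be sketched as follows. This is a minimal illustration only: the dictionary and function names are hypothetical and are not taken from the study's code.

```python
# Hypothetical lookup: collapse the three RSNA categories into the binary target.
CLASS_TO_TARGET = {
    "Lung Opacity": 1,                # pneumonia
    "No Lung Opacity/Not Normal": 0,  # non-pneumonia
    "Normal": 0,                      # non-pneumonia
}

# Class shares (percent of images) as reported in the text.
CLASS_SHARE = {
    "No Lung Opacity/Not Normal": 39.1,
    "Normal": 29.3,
    "Lung Opacity": 31.6,
}


def to_binary_target(class_name: str) -> int:
    """Return 1 for pneumonia and 0 for non-pneumonia."""
    return CLASS_TO_TARGET[class_name]


def binary_share(shares: dict) -> dict:
    """Sum the category percentages per binary target."""
    totals = {0: 0.0, 1: 0.0}
    for name, pct in shares.items():
        totals[to_binary_target(name)] += pct
    return {target: round(pct, 1) for target, pct in totals.items()}


print(binary_share(CLASS_SHARE))  # {0: 68.4, 1: 31.6}
```

Summing the two non-pneumonia categories gives 68.4% against 31.6% for pneumonia, which is the binary split the models are trained on.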

The datasets were merged based on patient ID. The data are clean: duplicates and null values were checked, and the class distribution is relatively balanced. The dataset is made up of mostly high-quality chest X-ray images, with some variability in quality. However, no de-noising or artifact removal techniques were applied in order to maintain the real-world integrity of the dataset.

Flow Chart of Image Processing

As shown in the flow chart in Figure 1, augmentation was applied in models that required it.

Image meta-data were extracted, and attributes such as patient sex, patient age, view position, and body part examined were integrated into the training data, along with modality and pixel spacing, to improve pneumonia detection. A few sample images were viewed. Figure 2 displays string values converted to numeric values, which is essential for machine learning.

The distribution of patient sex is 17,216 for ‘1’ and 13,011 for ‘0’, so there was no need to balance the data (see Table 2). The age distribution, which provides demographic characteristics for assessing possible effects on training, was even. Correlation heat maps of the chosen meta-data were examined for their potential to enhance the detection of pneumonia.

2.1. Data Augmentation

Data Splitting: After the data were generated, the dataset was split by patient ID so that each patient appeared in only one of the training, validation, or test sets. Initially, 20% of the patients were allocated to the test set (see Table 3). The remaining 80% of patients were then divided into a training set (80% of the remaining patients) and a validation set (20% of the remaining patients). The correlation heat map of the numerical features in the dataset is shown in Figure 3.
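
A patient-level split of this kind might look like the sketch below. It uses only the Python standard library; the function name, the seed, and the synthetic patient IDs are assumptions for illustration and do not reproduce the study's actual code.

```python
import random


def split_patients(patient_ids, test_frac=0.2, val_frac=0.2, seed=42):
    """Split unique patient IDs into disjoint train/validation/test sets.

    test_frac is taken from all patients; val_frac is taken from the
    patients that remain after the test set has been removed.
    """
    unique_ids = sorted(set(patient_ids))   # one entry per patient
    rng = random.Random(seed)               # seed is an arbitrary assumption
    rng.shuffle(unique_ids)

    n_test = round(len(unique_ids) * test_frac)
    test_ids = set(unique_ids[:n_test])

    remaining = unique_ids[n_test:]
    n_val = round(len(remaining) * val_frac)
    val_ids = set(remaining[:n_val])
    train_ids = set(remaining[n_val:])
    return train_ids, val_ids, test_ids


# Synthetic IDs; two patients appear twice to mimic multiple rows per patient.
ids = [f"p{i:04d}" for i in range(1000)] + ["p0000", "p0001"]
train_ids, val_ids, test_ids = split_patients(ids)
print(len(train_ids), len(val_ids), len(test_ids))  # 640 160 200
```

Because the split is made on patient IDs rather than on image rows, all rows belonging to one patient land in the same subset, which avoids leakage between training and evaluation. Overall this yields roughly 64% training, 16% validation, and 20% test patients.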

2.2. CNN Model Development and Training

Initial CNN design: Figure 4 shows the architecture of the CNN model and the hyper-tuned model. The convolutional filters extract features, max pooling reduces dimensions, the flattening layer converts the features to a 1D vector, and the dense layer is combined with dropout to prevent over-fitting. The sigmoid activation outputs the binary classification probability. The model was compiled with the Adam optimizer, a learning rate of 0.0001 to adjust the weights, and a binary cross-entropy loss function, and then trained with early stopping.
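
To make the roles of the convolution, pooling, and flattening layers concrete, the following sketch traces how feature-map sizes shrink through a small stack of layers. The number of blocks, the 3 × 3 kernels, the filter counts, and the 224-pixel input are assumptions chosen for illustration; the actual architecture is the one shown in Figure 4.

```python
def conv2d_out(size, kernel=3, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def maxpool_out(size, pool=2):
    """Spatial output size of non-overlapping max pooling."""
    return size // pool


def trace_shapes(input_size=224, channels=1, filters=(32, 64, 128)):
    """Follow (height, width, channels) through conv + max-pool blocks."""
    shapes = [("input", input_size, input_size, channels)]
    size = input_size
    for n_filters in filters:
        size = conv2d_out(size)
        shapes.append(("conv3x3", size, size, n_filters))
        size = maxpool_out(size)
        shapes.append(("maxpool2x2", size, size, n_filters))
    flattened = size * size * filters[-1]  # length of the 1D vector after flattening
    return shapes, flattened


shapes, flattened = trace_shapes()
for name, h, w, c in shapes:
    print(f"{name:<11} {h} x {w} x {c}")
print("flattened length:", flattened)  # flattened length: 86528
```

Under these assumed settings, each pooling step roughly halves the height and width, so the dense layer receives an 86,528-element vector rather than the full-resolution image.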

2.3. ResNet-50 Development and Training

This study used the ResNet50 architecture with transfer learning, leveraging a model pretrained on the ImageNet dataset as the base. The top classification layer was removed to allow customization for binary classification. A custom classification head was then added, comprising a global average pooling layer to reduce the feature maps, a dense layer with ReLU activation to provide non-linearity, a dropout layer with a rate of 0.5 to prevent over-fitting, and a final layer with the sigmoid activation needed for binary classification. Only the custom layers were trained, with the ResNet50 base kept frozen. The model was compiled with the Adam optimizer and binary cross-entropy loss, and the evaluation metrics used included precision, accuracy, AUC, F1 score, and a confusion matrix. A model based on ResNet50 was developed on the original dataset with augmentation and then fine-tuned for the best results.
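
The classification head described above could be assembled in Keras roughly as follows. This is a hedged sketch rather than the study's code: the function name, the 128 dense units, and the default optimizer settings are assumptions, and `weights` defaults to `None` so that nothing is downloaded (pass `weights="imagenet"` to start from the pretrained base used in the study).

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_resnet50_classifier(weights=None, input_shape=(224, 224, 3), dense_units=128):
    """Frozen ResNet50 base with a small trainable binary-classification head."""
    # include_top=False drops the original 1000-class ImageNet classifier.
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # only the custom head is trained

    inputs = layers.Input(shape=input_shape)
    x = base(inputs, training=False)                     # keep batch-norm layers in inference mode
    x = layers.GlobalAveragePooling2D()(x)               # reduce feature maps to one vector
    x = layers.Dense(dense_units, activation="relu")(x)  # assumed width of the dense layer
    x = layers.Dropout(0.5)(x)                           # dropout rate stated in the text
    outputs = layers.Dense(1, activation="sigmoid")(x)   # pneumonia probability

    model = models.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss="binary_crossentropy",
        metrics=[
            "accuracy",
            tf.keras.metrics.Precision(name="precision"),
            tf.keras.metrics.AUC(name="auc"),
        ],
    )
    return model
```

Calling `build_resnet50_classifier(weights="imagenet")` returns a model in which only the two dense layers carry trainable weights, matching the frozen-base setup described in the text.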

2.4. Evaluation Metrics

The evaluation metrics used in the research to determine the effectiveness of the models are accuracy, precision, recall, and F1 score, expressed in terms of True Positives (TPs), True Negatives (TNs), False Positives (FPs), and False Negatives (FNs). The metrics are defined as follows:

Accuracy = (TP + TN)/(TP + TN + FP + FN)

Recall = TP/(TP + FN)

Precision = TP/(TP + FP)

F1 score = 2/((1/Precision) + (1/Recall))

AUC: TPR = TP/(TP + FN)
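
As a worked example of these definitions, the sketch below computes the metrics from confusion-matrix counts. The function name and the counts are made up for illustration and are not results from the study; the function also assumes that none of the denominators is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts.

    Assumes tp + fp > 0 and tp + fn > 0 so that no division by zero occurs.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also the true positive rate (TPR)
    f1 = 2 / ((1 / precision) + (1 / recall))  # harmonic mean of precision and recall
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only (not taken from the paper).
metrics = classification_metrics(tp=70, tn=120, fp=20, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.792
# precision: 0.778
# recall: 0.700
# f1: 0.737
```

Because the F1 score is a harmonic mean, it is pulled toward the lower of precision and recall, which is why it can sit below accuracy even when accuracy looks reasonable.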

3. Results

3.1. Quantitative Presentation and Evaluation of CNN Models and ResNet50 Model

The quantitative evaluation of the CNN models and the ResNet50 model is shown in Table 4.

3.2. CNN Model Trained on the Original Dataset Without Augmentation

With an accuracy of 0.79, the model correctly classifies 79% of cases. Precision is 0.76, meaning that 76% of its positive predictions are correct, while recall is 0.73, indicating that it detects 73% of all actual pneumonia cases. The F1 score of 0.74 reflects a good balance between precision and recall, although, being slightly lower than the accuracy, it confirms that the model could benefit from further refinement. The AUC of 0.85 shows that the model is not only accurate but also has a strong ability to differentiate between pneumonia and non-pneumonia cases (also see Table 5).

3.3. Comparison of CNN Models and ResNet50 Model for Sensitivity and Specificity

The models developed on the original dataset with no augmentation achieved the best specificity, which shows that they are less likely to wrongly flag healthy patients as having pneumonia. The models trained on the augmented datasets, on the other hand, performed slightly better in sensitivity, showing an improvement in the detection of pneumonia cases. The reduced dataset shows lower performance in both sensitivity and specificity, which suggests that a model trained on a smaller dataset may not generalize well, especially with augmentation (Table 6).
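
Sensitivity and specificity can be derived directly from true and predicted labels, as in the short sketch below. The function name and the label lists are invented for illustration and do not come from the study's test set.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (recall on pneumonia) and specificity (recall on non-pneumonia)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of pneumonia cases that are detected
    specificity = tn / (tn + fp)  # share of healthy cases that are correctly cleared
    return sensitivity, specificity


# Illustrative labels only: 1 = pneumonia, 0 = non-pneumonia.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # sensitivity=0.75, specificity=0.83
```

A model with high specificity rarely raises false alarms for healthy patients, whereas a model with high sensitivity rarely misses a pneumonia case; the comparison in Table 6 weighs these two error types against each other.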

The results are trustworthy because they were validated and then evaluated on the test dataset. The results are good and self-contained. However, they may have been influenced by over-fitting, and they could be extended, as using different parameters to prevent over-fitting could lead to an improved outcome.

3.4. CNN Model Trained with Original Data with Augmentation

The CNN model achieved a precision of 0.76, suggesting accuracy in the positive predictions it makes, and recall is unchanged at 0.73, showing the ability to identify true positives. The F1 score of 0.74 shows a balance between recall and precision. The AUC falls from 0.85 to 0.82, suggesting a reduction in the ability to differentiate between classes.

This result signifies that the addition of augmentation made no large difference to the model’s accuracy or precision and only slightly affected its discriminative ability. Overall, the results signify that the model is robust.

The model’s results can be trusted due to their consistency with those on the non-augmented dataset, and the performance metrics suggest an ability to make reliable predictions. Further investigation of the various augmentation strategies and their impact on the model could offer improvements.

3.5. CNN Trained on a Reduced Dataset with Augmentation Without Augmentation

The model achieved an accuracy of 0.75, a precision of 0.72, a recall of 0.67, an F1 score of 0.68, and an AUC of 0.78. Accuracy fell from 0.79 to 0.75, precision fell to 0.72, and recall fell to 0.67, showing a decreased ability to identify true positive pneumonia cases. The F1 score of 0.68 shows an imbalance between precision and recall, and the AUC fell from 0.85 to 0.78, showing a reduced level of discriminative ability.

The model’s decreased performance metrics signify that using a smaller dataset with augmentation has an impact, indicating that the model struggles with a smaller volume of data, which affects its generalization ability. The lower AUC shows a reduced ability to differentiate between pneumonia and non-pneumonia cases. The decrease in the metrics also shows that augmentation alone is not enough to compensate for a reduced dataset volume. Despite the decline, the results can be said to be trustworthy, as they reflect the challenges of using a smaller dataset.

3.6. CNN Model Performance with Reduced Dataset with Augmentation

The CNN model trained on a reduced dataset with augmentation shows strong performance, with accuracy (0.77), precision (0.74), recall (0.71), F1 score (0.72), and AUC (0.80). These results show that the model is effective and that it maintains a good balance between these measures even with limited data.

Regarding the quantification of the results, the accuracy of 0.77 shows that the model classifies 77% of cases correctly, a reasonable result considering the reduced data, and the precision of 0.74 shows that its positive predictions are correct 74% of the time. The recall of 0.71 shows that the model identifies 71% of true cases of pneumonia. The F1 score shows that the model has a balanced performance. The AUC of 0.80 highlights an ability to distinguish pneumonia cases.

The result signifies that, despite the reduced dataset, the model performed well, although with lower metrics than those achieved using the original dataset. The AUC of 0.80 indicates an ability to distinguish between classes.

The results were evaluated from the test set and are trustworthy, though associated with the challenges of using a reduced dataset; on the other hand, the AUC of 0.80 supports its ability to make correct predictions.

While the results are self-contained, there is room for improvement. Improving augmentation, increasing the dataset size, or applying transfer learning could boost performance.

3.7. ResNet Model Performance with Original Data with Augmentation

The result here shows moderate performance, with an accuracy of 0.74 and an AUC of 0.75; however, a lower recall value of 0.63 shows the difficulty the model has in identifying all true cases. Quantitatively, the model correctly classifies 74% of cases, with a precision of 0.73 and a recall of 0.63, showing that there is room for improvement, especially in recall. An F1 score of 0.63 shows a balance between precision and recall, while the AUC of 0.75 suggests the model’s ability to differentiate between classes.

The results signify little effect of data augmentation on ResNet-50 and are trustworthy; however, the lower recall and F1 score emphasize the need for caution in application.

The results are self-contained but show that there is a need for improvement. The model’s moderate performance suggests improvement is needed to enable generalization.

3.8. Qualitative Evaluation of CNN and ResNet50 Models

Figure 5 shows a visual display of true and predicted labels for both CNN and ResNet 50 models, indicating that the models are generally accurate in prediction.

4. Discussion

Comparing the test results of the CNN and ResNet-50 models, the CNN model achieved the better overall accuracy of 0.79 on the original dataset, both with and without augmentation, against 0.74 for the ResNet-50 model with augmentation. This is a fairly good performance at identifying pneumonia and suggests that the models could be effective for pneumonia detection. However, existing studies have performed better. Ref. [23], for instance, used customized CNNs to detect pneumonia and achieved an accuracy of 92%. Refs. [24,25] built a simple CNN architecture for the classification of pneumonia chest X-ray images and achieved 90.68% and 93.73%, respectively [20].

The CNN model with augmentation was compared to the ResNet50 model with augmentation. The CNN model achieved 0.79 accuracy while the ResNet50 model achieved 0.74; the CNN model achieved a precision of 0.76 compared to ResNet50’s 0.73, and a recall of 0.73 compared to ResNet50’s 0.63. The CNN model had an F1 score of 0.74 compared to ResNet50’s 0.63. The CNN achieved an AUC of 0.84. Although the results show that the CNN model developed from scratch is more effective across all performance metrics than the pretrained ResNet50 in training and test set results, this may have been due to over-fitting. A higher result was achieved by [26], who used chest X-ray images to detect the presence of pneumonia with ResNet50, Inception v3, and InceptionResNetV2, later compared the results with a CNN built from scratch, and then proposed a ResNet50 model based on the results. They achieved an accuracy of 93.06%, a precision of 88.87%, a recall of 96.78%, and an F1 score of 92.71%.

The CNN model accuracy, using the original data with and without augmentation, remained the same. However, the AUC decreased from 0.85 to 0.82, showing a slight reduction in discriminative ability, even though the accuracy is consistent. For the CNN model trained on a reduced dataset, the accuracy dropped to 0.75 and the AUC was 0.78. This suggests that augmentation could be less effective when the dataset is small. The AUC figure of 0.79, on the other hand, shows that augmentation helps but could be less effective with pretrained models. Ref. [27] developed the CheXNet algorithm using chest X-rays and applied augmentations to the images, achieving a score of 0.435, higher than the average of 0.387 for radiologists. Also, ref. [28], in a classification task, trained on images changed through different pre-processing steps and successfully classified chest infection in chest X-rays, achieving an overall accuracy of 98.46% [29,30].

The CNN trained on the original dataset achieved an accuracy of 0.79 with and without augmentation. On the reduced dataset, the CNN accuracy was 0.75 with augmentation and 0.77 without augmentation. The AUC for the original dataset was 0.82–0.85, while for the reduced dataset it was 0.78–0.80. This implies that data size has a notable impact on model performance, suggesting that a larger dataset generally gives better generalization and detection accuracy. The performance drop on the reduced dataset highlights the importance of a considerable quantity of data in training effective deep learning models for image analysis.

Due to computational constraints during the training and time limitations, only the CNN model from scratch was trained using different data sizes and no augmentation. Using exactly the same training parameters for the ResNet 50 model would provide more room for comparison and not limit the ability to fully evaluate the pretrained model’s potential and adaptability to various scenarios.

Clinical Implications of the Visual Quality Improvements

In this study, the enhanced resolution and pre-processing methods have important clinical implications. The images were resized to 224 × 224 pixels and normalized to a standard range, which helps in detecting small, faint pathologies such as lung opacities, as well as early-stage pneumonia. Converting the grayscale images to RGB allowed for compatibility with ResNet50, which enables the use of transfer learning, thereby supporting accurate diagnosis. By avoiding artificial de-noising and, in some cases, augmentation, this study preserved real-world variability, which enhanced the model’s ability to generalize and improved the precision of diagnosis. Future research should investigate the effect of visual resolution on detecting small pathologies to further validate these improvements.
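
The pre-processing steps named above (resizing to 224 × 224, normalizing, and converting grayscale to RGB) can be illustrated with NumPy as follows. The nearest-neighbour resizing and the min-max scaling to the range 0 to 1 are assumptions made to keep the sketch dependency-free; the study does not state which interpolation or normalization it used, and the input here is random data rather than a real X-ray.

```python
import numpy as np


def preprocess(image, size=224):
    """Resize a 2D grayscale image, scale it to [0, 1], and replicate it to 3 channels."""
    image = np.asarray(image, dtype=np.float32)

    # Nearest-neighbour resize: pick evenly spaced source rows and columns (assumed method).
    rows = np.arange(size) * image.shape[0] // size
    cols = np.arange(size) * image.shape[1] // size
    resized = image[rows[:, None], cols[None, :]]

    # Min-max normalization (assumed); a constant image maps to all zeros.
    lo, hi = resized.min(), resized.max()
    scaled = (resized - lo) / (hi - lo) if hi > lo else np.zeros_like(resized)

    # Copy the single channel three times so the shape matches an RGB input.
    return np.stack([scaled, scaled, scaled], axis=-1)


# Random stand-in for the pixel array of a 1024 x 1024 DICOM image.
xray = np.random.default_rng(0).integers(0, 256, size=(1024, 1024))
batch_item = preprocess(xray)
print(batch_item.shape)  # (224, 224, 3)
```

Replicating the grayscale channel adds no new information; it only produces the three-channel input shape that an ImageNet-pretrained network such as ResNet50 expects.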

5. Conclusions

This research has been able to develop machine learning models for the detection of pneumonia with the use of chest X-ray and deep learning algorithms, demonstrating that a CNN model built from scratch outperforms a pretrained model. The study shows the importance of data augmentation and data size, uncovering that larger datasets significantly improve model performance. Despite the very good results, there is a need for further exploration to overcome the over-fitting of some of the models and enhance the ability to generalize. Future work should explore the potential of different datasets on pretrained models, with the aim of improving diagnostics in medical imaging.

Author Contributions

Conceptualization, S.U.R. and C.U.; Methodology, S.U.R. and C.U.; Software, S.U.R. and C.U.; Validation, S.U.R. and C.U.; Formal analysis, S.U.R., A.A., A.M.K. and C.U.; Investigation, S.U.R., A.A. and A.M.K.; Resources, S.U.R., A.A., A.M.K., B.A. and C.U.; Data curation, S.U.R. and C.U.; Writing—original draft preparation, S.U.R., A.A., A.M.K., B.A. and C.U.; Writing—review and editing, S.U.R., A.A., B.A. and A.M.K.; Visualization, S.U.R., A.A., A.M.K., B.A. and C.U.; Supervision, S.U.R.; Project administration, S.U.R. and A.A.; Funding acquisition, A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Rehman, M.U.; Shafique, A.; Khan, K.H.; Khalid, S.; Alotaibi, A.A.; Althobaiti, T.; Ramzan, N.; Ahmad, J.; Shah, S.A.; Abbasi, Q.H. Novel privacy preserving non-invasive sensing-based diagnoses of pneumonia disease leveraging deep network model. Sensors 2022, 22, 461.
2. Kareem, A.; Liu, H.; Sant, P. Review on Pneumonia Image Detection: A Machine Learning Approach. Hum. Centric Intell. Syst. 2022, 2, 31–43.
3. Harsono, I.W.; Liawatimena, S.; Cenggoro, T.W. Lung nodule detection and classification from Thorax CT-scan using RetinaNet with transfer learning. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 567–577.
4. Aydoğdu, M.; Ozyilmaz, E.; Aksoy, H.; Gürsel, G.; Ekim, N. Mortality prediction in community acquired pneumonia requiring mechanical ventilation, values of pneumonia and intensive care unit severity scores. Tuberk. Toraks. 2010, 58, 25–34.
5. Li, Y.; Zhang, Z.; Dai, C.; Dong, Q.; Badrigilan, S. Accuracy of deep learning for automated detection of pneumonia using chest X-ray images: A systematic review and meta-analysis. Comput. Biol. Med. 2020, 123, 103898.
6. Tahir, A.M.; Chowdhury, M.E.; Khandakar, A.; Al-Hamouz, S.; Abdalla, M.; Awadallah, S.; Reaz, M.B.I.; Al-Emadi, N. A systematic approach to the design and characterization of a smart insole for detecting vertical ground reaction force (vGRF) in gait analysis. Sensors 2020, 20, 957.
7. Zhang, D.; Ren, F.; Li, Y.; Na, L.; Ma, Y. Pneumonia Detection from Chest X-ray Images Based on Convolutional Neural Network. Electronics 2021, 10, 1512.
8. Yaseliani, M.; Hamadani, A.Z.; Maghsoodi, A.I.; Mosavi, A. Pneumonia Detection Proposing a Hybrid Deep Convolutional Neural Network Based on Two Parallel Visual Geometry Group Architectures and Machine Learning Classifiers. IEEE Access 2022, 10, 62110–62128.
9. Alharbi, A.H.; Hosni Mahmoud, H.A. Pneumonia Transfer Learning Deep Learning Model from Segmented X-rays. Healthcare 2022, 10, 987.
10. Al Mamlook, R.E.; Chen, S.; Bzizi, H.F. Investigation of the Performance of Machine Learning Classifiers for Pneumonia Detection in Chest X-ray Images. In Proceedings of the 2020 IEEE International Conference on Electro Information Technology (EIT), Chicago, IL, USA, 31 July–1 August 2020; pp. 098–104.
11. Chouhan, V.; Singh, S.K.; Khamparia, A.; Gupta, D.; Tiwari, P.; Moreira, C.; Damaševičius, R.; de Albuquerque, V.H.C. A novel transfer learning-based approach for pneumonia detection in chest X-ray images. Appl. Sci. 2020, 10, 559.
12. Kadry, S.; Nam, Y.; Rauf, H.; Rajinikanth, V.; Lawal, I. Automated Detection of Brain Abnormality using Deep-Learning-Scheme: A Study. In Proceedings of the 2021 Seventh International Conference on Bio Signals, Images, and Instrumentation (ICBSII), Chennai, India, 25–27 March 2021; pp. 1–5.
13. Artemi, M.; Liu, H. Image Optimization using Improved Gray-Scale Quantization for Content-Based Image Retrieval. In Proceedings of the 2020 IEEE 6th International Conference on Optimization and Applications (ICOA), Beni Mellal, Morocco, 20–21 April 2020; pp. 1–6.
14. Albawi, S.; Mohammed, T.A.; Al-Azawi, S. Understanding of a Convolutional Neural Network. In Proceedings of the 2017 International Conference on Engineering and Technology (ICET), Antalya, Turkey, 21–23 August 2017; pp. 1–6. Available online: https://www.researchgate.net/publication/319253577_Understanding_of_a_Convolutional_Neural_Network (accessed on 16 December 2024).
15. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
16. Weiss, K.; Khoshgoftaar, T.M.; Wang, D. A survey of transfer learning. J. Big Data 2016, 3, 9.
17. Yang, Z.; Zhao, Q. A Multiple Deep Learner Approach for X-ray Image-Based Pneumonia Detection. In Proceedings of the 2020 International Conference on Machine Learning and Cybernetics (ICMLC), Adelaide, Australia, 2 December 2020; pp. 70–75.
18. Rahman, T.; Chowdhury, M.; Khandakar, A.; Islam, K.; Islam, K.; Mahbub, Z.; Kadir, M.A.; Kashem, S. Transfer learning with deep convolutional neural network (CNN) for pneumonia detection using chest X-ray. Appl. Sci. 2020, 10, 3233.
19. Zubair, S.; Shah, U.; Abd-Alrazeq, A.; Alam, T.; Househ, M. An Efficient Method to Predict Pneumonia from Chest X-Rays Using Deep Learning Approach. In The Importance of Health Informatics in Public Health During a Pandemic; Mantas, J., Hasman, A., Househ, M.S., Gallos, P., Zoulias, E., Eds.; IOS Press: Amsterdam, The Netherlands, 2020; Volume 272, pp. 457–460.
20. Kundu, R.; Das, R.; Geem, Z.W.; Han, G.-T.; Sarkar, R. Pneumonia detection in chest Xray images using an ensemble of deep learning models. PLoS ONE 2021, 16, e0256630. Available online: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0256630 (accessed on 16 December 2024).
21. Rajain, R.; Nagrath, P.; Kataria, G.; Kaushik, V.S.; Hemanth, D.J. Pneumonia detection in chest X-ray images using convolutional neural networks and transfer learning. Measurement 2020, 165, 108046.
22. Wang, X.; Peng, Y.; Lu, L.; Lu, Z.; Bagheri, M.; Summers, R.M. ChestX-ray8: Hospital-Scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
23. Rajaraman, S.; Candemir, S.; Kim, I.; Thoma, G.; Antani, S. Visualization and interpretation of convolutional neural network predictions in detecting pneumonia in pediatric chest radiographs. Appl. Sci. 2018, 8, 1715.
24. Sharma, H.; Jain, J.; Bansal, P.; Gupta, S. Feature Extraction and Classification of Chest X-ray Images Using CNN to Detect Pneumonia. In Proceedings of the 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 29–31 January 2020; pp. 227–231.
25. Stephen, O.; Sain, M.; Maduh, U.J.; Jeong, D.U. An efficient deep learning approach to pneumonia classification in healthcare. J. Healthc. Eng. 2019, 2019, 4180949.
26. Manickam, A.; Jiang, J.; Zhou, Y.; Sagar, A.; Soundrapandiyan, R.; Samuel, R.D.J. Automated pneumonia detection on chest X-ray images: A deep learning approach with different optimizers and transfer learning architectures. Measurement 2021, 184, 109953. Available online: https://www.sciencedirect.com/science/article/pii/S0263224121008885 (accessed on 10 June 2024).
Rajpurkar, P.; Irvin, J.; Zhu, K.; Yang, B.; Mehta, H.; Duan, T.; Ding, D.; Bagul, A.; Langlotz, C.; Shpanskaya, K.; et al. Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv 2017, arXiv:1711.05225. [Google Scholar]
Mabrouk, A.; Redondo, R.P.D.; Dahou, A.; Elaziz, M.A.; Kayed, M. Pneumonia Detection on chest X-ray images Using Ensemble of Deep Con-volutional Neural Networks. Appl. Sci. 2022, 12, 6448. [Google Scholar] [CrossRef]
Arun Kumar, V.; Nikitha, B.; Anjali, B.; Sirichandana, A.; Harshitha, D. Machine Learning Model for Pneumonia Detection from Chest X-Rays. J. Cardiovasc. Dis. Res. JCDR 2023, 14, 271–282. [Google Scholar]
Razavian, A.S.; Azizpour, H.; Sullivan, J.; Carlsson, S. CNN features off-the-shelf: An astounding baseline for recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 23–28 June 2014; pp. 512–519. Available online: https://ieeexplore.ieee.org/document/6910029 (accessed on 10 June 2024).

Figure 1. Flow chart.

Figure 2. A selection of chest X-ray samples used for the classification task. The upper rows show the class with lung opacity and the bottom row shows the class with no lung opacity/not normal.

Figure 3. Correlation heat map of the numerical features in the dataset.

Figure 4. Flow diagram of the CNN model, CNN model with hyper-parameter tuning, and ResNet-50 model architectures.

Figure 5. Sample images with their true and predicted labels. The first and second rows show predictions from the CNN trained on the original data without augmentation and with augmentation, respectively; the third row shows sample images from the ResNet-50 model.

Table 1. Distribution of the classes.

Class	Percentage in Dataset
Normal	29.3%
No Lung Opacity/Not Normal	39.1%
Lung Opacity	31.6%

Table 2. Distribution of Patient Sex.

Patient Sex	Count
1	17,216
0	13,011

Table 3. Augmentation parameters applied to the models to improve their robustness.

Augmentation	Value	Function
Rotation range	20	Randomly rotates the image within the specified range
Width shift range	0.2	Shifts the image in a horizontal direction by a fraction of the total width
Height shift range	0.2	Shifts the image in a vertical direction by a fraction of the total height
Zoom range	0.2	Zooms in and out of the image randomly
Horizontal flip	True	Flips the image randomly
Fill mode	Nearest	Sets how newly created pixels are filled when the image is transformed
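
To make two of the Table 3 settings concrete, the following is a minimal pure-Python sketch of a width shift with "nearest" filling, plus a random horizontal flip. It is an illustration only, not the authors' pipeline: the function names (`shift_width_nearest`, `horizontal_flip`, `random_augment`) are hypothetical, images are represented as plain lists of pixel rows, and rotation and zoom are omitted for brevity.

```python
import random


def shift_width_nearest(image, fraction):
    """Shift a 2-D image (list of rows) horizontally by `fraction` of its width.

    Pixels exposed by the shift are filled with the nearest edge pixel,
    which is what a "nearest" fill mode means.
    """
    width = len(image[0])
    offset = int(round(fraction * width))  # positive = shift to the right
    shifted = []
    for row in image:
        new_row = []
        for x in range(width):
            src = min(max(x - offset, 0), width - 1)  # clamp to the nearest valid column
            new_row.append(row[src])
        shifted.append(new_row)
    return shifted


def horizontal_flip(image):
    """Mirror every row left to right."""
    return [row[::-1] for row in image]


def random_augment(image, width_shift_range=0.2, flip=True, rng=random):
    """Apply a random width shift within +/- width_shift_range and an optional random flip."""
    fraction = rng.uniform(-width_shift_range, width_shift_range)
    out = shift_width_nearest(image, fraction)
    if flip and rng.random() < 0.5:
        out = horizontal_flip(out)
    return out
```

With a one-row image `[[1, 2, 3, 4, 5]]`, a shift of 0.2 moves every pixel one column to the right and repeats the left edge value in the exposed column; a shift of -0.2 does the mirror-image operation on the right edge.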

Table 4. Results of the CNN models.

Model	Accuracy	Precision	Recall	F1 Score	AUC
CNN original dataset with no augmentation	0.79	0.76	0.73	0.74	0.85
CNN original data with augmentation	0.79	0.76	0.73	0.74	0.82
CNN reduced dataset with no augmentation	0.75	0.72	0.67	0.68	0.78
CNN reduced dataset with augmentation	0.77	0.74	0.71	0.72	0.80
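
For readers who want to see how the accuracy, precision, recall, and F1 columns relate to one another, the sketch below computes them from the four cells of a binary confusion matrix. The function name `classification_metrics` and the counts in the usage example are hypothetical; they are not taken from the study, and the paper's tables may use a different averaging scheme across classes.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from binary confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. Undefined ratios (zero denominators) are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=70, fp=20, fn=30, tn=80)
```

F1 is the harmonic mean of precision and recall, so it sits between the two and is pulled toward the lower value.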

Table 5. Results of the ResNet-50 model.

Model	Accuracy	Precision	Recall	F1 Score	AUC
ResNet-50 original data with augmentation	0.74	0.73	0.63	0.63	0.75
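
The AUC column in Tables 4 and 5 can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition. This is one standard formulation, not necessarily the routine used in the study; the function name `pairwise_auc` and the example scores are hypothetical.

```python
def pairwise_auc(scores, labels):
    """Compute ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: predicted probabilities or scores, higher meaning "more likely positive".
    labels: 1 for positive cases, 0 for negative cases.
    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: three of the four positive/negative pairs are ordered correctly.
example_auc = pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The double loop is quadratic in the number of cases, which is acceptable for illustration; rank-based formulas give the same value more efficiently on large test sets.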

Table 6. Sensitivity and specificity of the CNN models and the ResNet-50 model.

Model Type	Sensitivity (%)	Specificity (%)
Original Dataset (No Augmentation)	70.6%	84.4%
Original Dataset (With Augmentation)	71%	81.7%
Reduced Dataset (No Augmentation)	67.6%	80.1%
Reduced Dataset (With Augmentation)	67.2%	77.2%
ResNet-50 Model (With Augmentation)	71.7%	74.7%
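
Sensitivity and specificity can be tied back to overall accuracy through class prevalence: accuracy = prevalence × sensitivity + (1 − prevalence) × specificity. The sketch below defines the two metrics from confusion-matrix counts and applies this identity. The function names are hypothetical, and the usage example assumes that the evaluation split has the same 31.6% lung opacity share as the full dataset (Table 1), which is not stated in the tables.

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives that are detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives that are correctly cleared."""
    return tn / (tn + fp)


def implied_accuracy(sens, spec, prevalence):
    """Accuracy implied by sensitivity, specificity and the positive-class prevalence."""
    return prevalence * sens + (1.0 - prevalence) * spec


# Assumption: test-set prevalence equals the dataset-level 31.6% from Table 1.
# Sensitivity and specificity are the first row of Table 6, as fractions.
acc = implied_accuracy(sens=0.706, spec=0.844, prevalence=0.316)
```

Under that assumption the implied accuracy is about 0.80, which is close to the 0.79 reported in Table 4 for the CNN trained on the original dataset without augmentation; the small gap would be consistent with rounding or a slightly different class balance in the evaluation split.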

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Usman, C.; Rehman, S.U.; Ali, A.; Khan, A.M.; Ahmad, B. Pneumonia Disease Detection Using Chest X-Rays and Machine Learning. Algorithms 2025, 18, 82. https://doi.org/10.3390/a18020082

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.
