1. Introduction
Heart failure (HF) occurs when the heart becomes too weak or stiff to pump enough blood to meet the body’s needs [1], resulting in a variety of health problems and high medical expenses. HF has a significant global impact, affecting millions of people worldwide despite medical advancements [2,3]. HF data are scarce for certain patient populations [3]: many studies focus on HF patients in the United States, but these studies might not be representative of HF patients in the rest of the world [4,5,6]. In Africa, HF is still a significant clinical and health concern, often manifesting as an urgent medical condition requiring prolonged hospital stays [7]. Comprehensive data on HF are lacking in Sub-Saharan Africa (SSA), and the little information available comes primarily from urban hospitals [8,9,10]. HF cases in SSA have high hospitalization rates and significant associated healthcare costs. As in other countries, HF is a serious public health emergency in Rwanda [11]. Non-communicable illnesses such as HF accounted for 34.7% of deaths in Rwanda in 2020. The need to address HF’s socio-economic effects is highlighted by the fact that it accounts for 5% to 10% of adult hospital admissions in SSA, with a similar trend in Rwanda [12].
Several methods for lowering HF-related hospital readmissions have been explored over the past three decades [13,14]. However, machine learning (ML) and artificial intelligence (AI) techniques are still not widely applied to anticipating readmissions due to HF, particularly in low- and middle-income countries [15]. ML classifiers enable computers to learn from data autonomously, recognize patterns, and predict outcomes from various inputs without being explicitly programmed [14]. The popularity of ML has grown across a variety of industries due to its ability to quickly analyze large datasets and reach complex conclusions, improving operations, data-driven decision-making, and innovation [16].
ML classifiers have been used in previous research to predict outcomes such as hospital readmission for HF [17,18]. However, limited electronic health data integration and class imbalance in medical datasets remain problems in low- and middle-income countries [19]. Because country-specific data, such as data on health knowledge, cultural norms, and medical facilities, are lacking, Rwanda has no reliable and accurate predictive models for use in clinical practice [17,20,21]. In this study, we use locally gathered comprehensive data on HF in Rwanda, and we explore a variety of ML classifiers to predict HF hospital readmission. We compare the prediction performance of multi-layer perceptrons (MLP), logistic regression (LR), decision trees (DT), K-nearest neighbors (KNN), random forests (RF), and support vector machines (SVM). This study also seeks to pinpoint crucial factors influencing hospital readmissions for HF in Rwanda, providing knowledge and skills to enhance the management of HF. Efforts to accurately predict HF hospital readmission, and to identify high-risk HF patients in Rwanda, may improve HF management and result in better patient outcomes and cost savings [22,23].
2. Materials and Methods
This retrospective study collected data from medical records of HF patients who were hospitalized in Rwanda between 1 January 2008 and 31 December 2019. The records were obtained from seven hospitals that were able to treat HF in Rwanda. These include Rwandan Military Hospital (RMH), King Faisal Hospital (KFH), University Teaching Hospital of Butare (CHUB), University Teaching Hospital of Kigali (CHUK), Rwinkwavu Hospital (RWH), Kirehe Hospital (KIH), and Butaro Hospital (BUH). We extracted various features of interest from the patients’ medical records, including age, sex, district of residence, marital status, occupation, resting heart rate, blood pressure, history of hypertension and smoking, heart ultrasound results, risk factors for HF, number of hospitalization days, respiratory rate upon admission, slope, chest pain, cholesterol status, blood sugar, results of electrocardiography at rest, reason for discharge, presence of feces on admission, and past medical and family history.
We utilized Jupyter Notebook as the primary tool for building ML models. We installed Python 3.11.1 and essential packages such as pandas, seaborn, matplotlib, and scikit-learn, which come with built-in libraries and functions, to facilitate data manipulation, visualization, analysis, and construction of machine learning models. Jupyter Notebook was our preferred tool due to its user-friendly interface, advanced data cleaning capabilities, and fast implementation of modeling processes using the Python programming language.
In this study, we compared six ML classifiers: multi-layer perceptron (MLP), K-nearest neighbors (KNN), logistic regression (LR), decision trees (DT), random forests (RF), and support vector machines (SVM). All classifiers were trained, tested, and compared using performance metrics including the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. The classifiers were configured as follows. The MLP classifier had 3 hidden layers with 128, 64, and 32 neurons, and it used a sigmoid activation function. The KNN classifier used k = 5. The DT classifier used a maximum depth of 8 and a minimum of 20 samples per leaf, with Gini impurity as the splitting criterion. The RF classifier was an ensemble of 100 trees, with a maximum depth of 10 and a minimum of 10 samples per leaf. The SVM classifier with a radial basis function kernel used a regularization parameter of 10, and the SVM with a linear kernel used a regularization parameter of 0.1. All other hyperparameters were left at their default values for simplicity.
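The configuration described above can be sketched in scikit-learn roughly as follows. This is a minimal illustration rather than the study's actual code: the function name `build_classifiers`, the `random_state` seed, `max_iter=1000` for logistic regression, and `probability=True` for the SVMs are assumptions added for reproducibility and for computing AUC. The sigmoid activation is mapped to scikit-learn's `activation="logistic"`, and the two SVM kernels mentioned in the text are listed as separate entries.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_classifiers(random_state=0):
    """Return the classifiers with the hyperparameters stated in the text.

    Anything not stated in the text (seed, max_iter, probability) is an
    assumption of this sketch; all other settings are scikit-learn defaults.
    """
    return {
        # Three hidden layers (128, 64, 32) with a logistic sigmoid activation.
        "MLP": MLPClassifier(hidden_layer_sizes=(128, 64, 32),
                             activation="logistic",
                             random_state=random_state),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        # max_iter is raised only to reduce convergence warnings (assumption).
        "LR": LogisticRegression(max_iter=1000),
        "DT": DecisionTreeClassifier(criterion="gini", max_depth=8,
                                     min_samples_leaf=20,
                                     random_state=random_state),
        "RF": RandomForestClassifier(n_estimators=100, max_depth=10,
                                     min_samples_leaf=10,
                                     random_state=random_state),
        # probability=True enables predict_proba for ROC/AUC (assumption).
        "SVM-RBF": SVC(kernel="rbf", C=10, probability=True,
                       random_state=random_state),
        "SVM-linear": SVC(kernel="linear", C=0.1, probability=True,
                          random_state=random_state),
    }
```

Each entry exposes the usual `fit` and `predict` methods, so the same training and evaluation loop can be applied to every classifier.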
First, we performed data collection, exploratory data analysis, and preprocessing to clean and prepare the data for further analysis. Here, we dropped all variables with 50% or more null values from the data frame. For most variables with less than 50% null entries, we filled in missing values using the KNN imputer algorithm. For the age variable, we filled in the missing values using the median since this variable appeared to have a uniform distribution. Second, the class imbalance in the dataset was handled so that the two classes of HF patients (0 = no hospital readmission; 1 = at least one hospital readmission within 20 days of hospital discharge) were balanced. To address this imbalance, we used the Synthetic Minority Oversampling Technique (SMOTE). SMOTE is an oversampling method that synthesizes new instances of the underrepresented class from the existing data [24,25]. Third, we extracted important features to reduce the dimensionality of the dataset, as it contained over 60 features. This step ensured that only relevant features were included in the model development process, leading to improved model accuracy. We then split the dataset into training and testing sets to evaluate the performance of the models. Following the methodology of Dobbin and colleagues, we used 80% of the data for training and 20% for testing [26,27]. Fourth, we standardized the features for training and testing by shifting and scaling the data to have zero mean and unit standard deviation. Lastly, we trained the classifiers on the training set and evaluated them on the testing set to determine the precision and accuracy of the models in predicting HF hospital readmission in Rwanda.
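The preprocessing steps above can be sketched as follows, under several stated assumptions. The function names (`clean`, `smote_like`, `prepare`), the column name `age`, the seed, the stratified split, and fitting the scaler on the training split only are all hypothetical choices of this sketch. Features are assumed to be numeric already. The feature-selection step is omitted because the text does not name the method used. To keep the example dependency-light, SMOTE is replaced by a simplified SMOTE-style interpolation for a binary target; the imbalanced-learn package offers a full implementation.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def clean(df, age_col="age", max_null_frac=0.5):
    """Drop columns with >= 50% nulls, median-fill age, KNN-impute the rest."""
    df = df.loc[:, df.isna().mean() < max_null_frac].copy()
    if age_col in df.columns:
        df[age_col] = df[age_col].fillna(df[age_col].median())
    imputed = KNNImputer(n_neighbors=5).fit_transform(df)
    return pd.DataFrame(imputed, columns=df.columns, index=df.index)


def smote_like(X, y, k=5, seed=0):
    """Simplified SMOTE-style oversampling for a binary target (sketch only).

    New minority rows are drawn on the line segment between a minority row
    and one of its k nearest minority neighbours until classes are balanced.
    """
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    X_min = X[y == minority]
    if n_new == 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    # Column 0 is the query row itself (or an exact duplicate), so skip it.
    neigh = (NearestNeighbors(n_neighbors=k + 1).fit(X_min)
             .kneighbors(X_min, return_distance=False)[:, 1:])
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neigh[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    X_new = X_min[base] + gap * (X_min[partner] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


def prepare(features, target, seed=0):
    """Clean, balance, split 80/20, and standardize, in the order of the text."""
    X = clean(features)
    X_bal, y_bal = smote_like(X.to_numpy(), target.to_numpy(), seed=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_bal, y_bal, test_size=0.2, stratify=y_bal, random_state=seed)
    scaler = StandardScaler().fit(X_tr)  # statistics from training data only
    return scaler.transform(X_tr), scaler.transform(X_te), y_tr, y_te
```

The sketch keeps the order described in the text, with oversampling before the split. A common variant oversamples only the training split so that no synthetic rows reach the test set.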
We used a confusion matrix as the evaluation metric to measure the performance of the algorithms on both the training and testing datasets. Furthermore, the receiver operating characteristic curve (ROC), the area under the ROC curve (AUC), accuracy, precision, recall, and F1-score metrics were utilized to plot, compare, and identify the best-performing model. The results of this evaluation provided the necessary information for drawing conclusions in line with the research objectives.
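As a rough illustration of how these metrics relate to the confusion matrix, the sketch below computes them with scikit-learn for a binary target coded 0/1. The function name `evaluate` and its arguments are hypothetical; `y_score` is assumed to hold the predicted probability of class 1. Sensitivity is the same quantity as recall, and specificity is derived directly from the true-negative and false-positive counts.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)


def evaluate(y_true, y_pred, y_score):
    """Summarize binary classification performance (1 = readmitted)."""
    # With labels=[0, 1], ravel() yields counts in the order tn, fp, fn, tp.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "confusion_matrix": [[int(tn), int(fp)], [int(fn), int(tp)]],
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        # Recall is the sensitivity: tp / (tp + fn).
        "recall": recall_score(y_true, y_pred, zero_division=0),
        # Specificity: tn / (tn + fp), guarded against an empty negative class.
        "specificity": float(tn / (tn + fp)) if (tn + fp) else 0.0,
        "f1": f1_score(y_true, y_pred, zero_division=0),
        # AUC uses continuous scores, not the thresholded predictions.
        "auc": roc_auc_score(y_true, y_score),
    }
```

For a fitted scikit-learn classifier `clf`, one plausible call is `evaluate(y_test, clf.predict(X_test), clf.predict_proba(X_test)[:, 1])`, repeated on the training set when comparing training and testing performance.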
Although data collection involved only the patients’ medical files and no direct contact with the HF patients, ethical approval was obtained from the competent authorities, including the Ministry of Health of Rwanda, the Rwandan Institutional Review Board, and all seven hospitals concerned.
4. Discussion
The purpose of this study was to investigate the applicability of multiple ML classifiers for predicting HF hospital readmission in Rwanda using locally gathered data. Among the models evaluated, the random forest classifier emerged as the most promising option for predicting HF hospital readmission in Rwanda. In addition, the support vector machine, K-nearest neighbors, and multi-layer perceptron approaches all performed well. These results align well with the study conducted in 2022 by Michailidis and colleagues [28], which supports their consistency and reliability. This implies that the effectiveness of these classifiers can be, at least in part, generalized and is not limited to a particular dataset or context. On the other hand, the decision tree classifier achieved an area under the curve of only 57%, which is below average. This outcome differs significantly from the performance of the same model reported previously in the literature [29,30,31]. This might indicate that the decision tree classifier is not a good fit for the specific characteristics of the healthcare dataset from Rwanda.
Overall, the findings of this study highlight the potential of ML techniques in accurately predicting hospital readmission for HF patients in Rwanda. Nevertheless, the decision tree classifier’s poor performance calls its suitability into question, and system-specific challenges will need to be carefully considered in future studies. The use of these models can lead to better long-term health outcomes, reduced readmissions to hospitals, and enhanced patient care. These predictive capabilities might help healthcare professionals in Rwanda allocate resources more effectively and customize interventions to patients’ unique needs, thereby improving the standard of care. However, it is important to recognize the limitations and challenges that arise when applying ML techniques to the Rwandan healthcare system. These classifiers’ performance and generalizability may be affected by the particularities and complexities of the Rwandan healthcare system, including data availability, quality, and cultural considerations.
5. Conclusions and Recommendations
In order to improve the standard of care and health outcomes of HF in Rwanda, more research is still required. This can result in more accurate predictions, personalized treatment plans, and better utilization of healthcare resources. Future research could improve classifiers to better fit the local context, address data issues, and account for the unique constraints of the healthcare system. Addressing the identified knowledge gaps can aid in the development of more precise and relevant predictive models for HF readmission in Rwanda, which can have a positive impact on the country’s ability to manage and prevent HF hospital readmission.
While random forest classification shows the best performance, it is important to base actions on principles. The SVM approach also works well for this purpose. The study’s results indicate that ML techniques can accurately predict hospital readmission for HF patients in Rwanda, which can lead to improved care, fewer hospital readmissions, and better long-term health outcomes.
To help healthcare practitioners anticipate the possibility of readmission for specific patients, the classifiers can be used as decision support tools. By supporting early interventions and personalized treatment plans, this technology can assist healthcare professionals in making the best use of their resources and improving patient outcomes. RF is a useful tool for anticipating HF readmissions, offering a strategy that can help patients obtain better results and make the most of healthcare resources. To minimize potential negative effects, such as over-reliance on the model’s predictions, which would impair clinical judgement and patient-centered treatment, healthcare practitioners must understand the model’s strengths and limitations and ensure that it is used effectively.
To conclude, this study contributes to the growing body of literature on the application of ML algorithms in medicine and suggests that ML has the potential to enhance HF management in Rwanda.