Diagnostic Tool for Early Detection of Rheumatic Disorders Using Machine Learning Algorithm and Predictive Models

Mills, Godfrey A.; Dey, Dzifa; Kassim, Mohammed; Yiwere, Aminu; Broni, Kenneth

doi:10.3390/biomedinformatics4020065

by Godfrey A. Mills ¹,*, Dzifa Dey ², Mohammed Kassim ³, Aminu Yiwere ¹ and Kenneth Broni ¹

¹ Department of Computer Engineering, University of Ghana, Accra P.O. Box LG 77, Ghana

² Department of Medicine and Therapeutics, University of Ghana Medical School, Accra P.O. Box GP 4236, Ghana

³ Department of Electrical Engineering and Computer Science, University of Ottawa, 75 Laurier Ave E, Ottawa, ON K1N 6N5, Canada

* Author to whom correspondence should be addressed.

BioMedInformatics 2024, 4(2), 1174-1201; https://doi.org/10.3390/biomedinformatics4020065

Submission received: 8 March 2024 / Revised: 6 April 2024 / Accepted: 4 May 2024 / Published: 8 May 2024

(This article belongs to the Special Issue Editor's Choice Series for the Applied Biomedical Data Science Section)

Abstract

Background: Rheumatic diseases are chronic diseases that affect joints, tendons, ligaments, bones, muscles, and other vital organs. Detection of rheumatic diseases is a complex process that requires careful analysis of heterogeneous content from clinical examinations, patient history, and laboratory investigations. Advances in machine learning have made it possible to integrate such techniques into the complex diagnostic process to identify inherent features that lead to disease formation, development, and progression for remedial measures. Methods: An automated diagnostic tool using a multilayer neural network computational engine is presented to detect rheumatic disorders and the type of underlying disorder for therapeutic strategies. The rheumatic disorders considered are rheumatoid arthritis, osteoarthritis, and systemic lupus erythematosus. The detection system was trained and tested using 70% and 30%, respectively, of a labelled synthetic dataset of 100,000 records containing both single and multiple disorders. Results: The detection system was able to detect and predict underlying disorders with an accuracy of 97.48%, a sensitivity of 96.80%, and a specificity of 97.50%. Conclusion: The good performance suggests that this solution is robust enough and can be implemented for screening patients for intervention measures. This is a much-needed solution in environments with limited specialists, as the solution promotes task-shifting from the specialist level to the primary healthcare physicians.

Keywords: rheumatic disorder; diagnostic tool; early detection; multilayer neural network; rheumatoid arthritis; osteoarthritis; systemic lupus erythematosus; machine learning

1. Introduction

Rheumatic diseases are complex and chronic diseases that affect the joints, muscles, bones, and vital organs of individuals, especially the aging group in society [1]. These diseases are multicausal and may be caused by external factors such as the environment, lifestyle, and genetic factors [2]. Rheumatic disorders comprise over 200 diseases, the majority of which are autoimmune [3]. These disorders can be broadly classified into two categories, inflammatory and non-inflammatory [4]. Inflammatory autoimmune rheumatic disorders occur when the body’s immune system attacks the healthy cells, while non-inflammatory disorders are usually associated with wear and tear in joint cartilage [5]. When rheumatic diseases become severe, affected individuals may find it difficult to perform basic activities of daily living, such as bathing or dressing, leading to deterioration in the quality of life, which consequently affects social life and financial costs to the affected individuals. In some cases, the disease may render the individual incapacitated and may cause permanent organ damage. Developing a mechanism to find early signs of the rheumatic disease formation, development, and progression in patients through screening to predict the existence or otherwise of the disease, and localizing the disease to the exact type of rheumatic disorder responsible can help rheumatologists to provide timely diagnosis and treatment strategies. This, therefore, motivated our study to leverage intelligent techniques to navigate and analyze the large data set with unstructured heterogeneous content of patient clinical information for characteristic features of rheumatic diseases to detect and predict the underlying type of disorder for intervention measures.

The screening of patients to establish the existence or absence of rheumatic diseases and localization of the condition is a complex process that involves the critical evaluation of large unstructured data from sources such as patient symptoms, patient history, history of previous episodes, clinical examinations, specific laboratory findings, and radiographic imaging content. The detection process can be organized into three functional processes: (a) acquisition of physiological signals of patients, which are mostly acquired through standard medical procedures, (b) evaluation of information from medical history and related data, which are mainly used to consolidate clinical decisions, and (c) evaluation of laboratory and imaging investigations, which are mostly used to validate the clinical decisions and also to monitor the extent and impact of the diseases. Laboratory examinations typically involve analysis of acquired test data from biochemical, serological, and radiographic imaging investigations to establish the impact of the disease on vital organs such as the liver, kidneys, etc. Some laboratory investigations that are usually performed include full blood count (FBC), erythrocyte sedimentation rate (ESR), C-reactive protein (CRP), antinuclear antibodies (ANA), liver function test (LFT), etc. [6,7]. Radiographic imaging usually provides important information to rheumatologists on disease activity and damage such as erosion, loss of joint space, formation of cysts, subluxation, and ankylosis. Typical imaging modalities include X-ray, ultrasound, computed tomography (CT), and magnetic resonance imaging (MRI).

One of the challenges of human efforts in accurately evaluating and predicting any underlying rheumatic disorders is the processing and analyzing of large, complex, and heterogeneous information content. The large amount of data that must be processed makes the detection process time-consuming and costly regarding the level of expertise and medical resources. Another challenge that can affect the quality of the clinical decision is the similarity of symptoms between the different rheumatic disorder types. This problem makes the detection tasks by human effort quite cumbersome, complicated, and time-consuming, leading to subjective decisions based on the level of expertise. This makes the detection task quite difficult for a primary healthcare physician who is not a specialist in that area due to the considerable amount of knowledge and experience required to make a valid and accurate diagnosis [8,9]. However, with the ability of machine learning techniques to navigate through large data and extract hidden trends and patterns [10], it is possible to create predictive models that could help in the diagnostic process of detecting underlying rheumatic conditions with good precision. This will not only improve processing time but also improve the diagnostic process for rapid evaluation.

One of the challenges for developing countries in bridging the rheumatic service delivery gap is the limited number of specialists available compared to the number of patients seeking rheumatic care. Moreover, most available specialists and resources are not evenly distributed across the countries but are mostly concentrated in the capital cities and the large urban centres. Due to the high patient-to-specialist ratio, delays are likely to occur in the rate at which diagnoses are carried out on patients. Furthermore, subjecting the limited specialists to overwork may also potentially affect their ability to effectively diagnose patients, which could impact the quality of service delivery. In this regard, if some of the diagnostic tasks of the specialists could be shifted down to the primary healthcare physicians, through the use of an intelligent automated solution, this would not only help reduce the workload of the specialists but also help provide speedy and precise diagnosis of patients for early intervention measures. In such a case, the specialists would have the benefit of receiving patients who have been thoroughly screened and clinically established to have a form of rheumatic disorder for further diagnosis and treatment strategies.

Machine learning algorithms are potential tools that could be integrated into the diagnostic process for the evaluation and detection of rheumatic diseases [3,11,12,13,14,15,16,17,18,19,20,21,22]. These algorithms have become important components in the diagnosis of conditions due to their inherent ability to analyze large datasets to identify trends and patterns that are not easily visible in the datasets and use the information to make predictions with high precision. Machine learning solutions have been used to analyze electronic health records (EHR) and imaging information to predict different types of rheumatic diseases. Through machine learning solutions, it is also now possible to map out diagnoses of patients with corresponding treatment strategies [23,24].

In this study, we present a machine learning-based software solution using a multilayer neural network (MLNN) as a computational engine for analysis and accurate detection of rheumatic disorders and prediction of the type of disorder. We focus on three of the most common rheumatic disorders found in most countries: rheumatoid arthritis (RA), osteoarthritis (OA), and systemic lupus erythematosus (SLE). The implemented MLNN model was developed using an input layer with 45 neurons for the extracted features, two hidden layers with 10 and 15 neurons, respectively, and an output layer with 5 neurons for the multiclass classification task of the features. The characteristic features for the disorders were extracted from a combination of information sources that included clinical examination, patient history (medical and family), and other demographic-related information. The features defining the comorbidity conditions were also taken into account. The features were presented as logical data and encoded in binary format to represent the presence or absence of disorders. Patient medical information from laboratory examination and radiographic imaging was not considered in this study. The MLNN model was trained and tested on a labelled synthetic dataset of 100,000 records of patients with varied age ranges, gender, and rheumatic disorders for cases of single and multiple disorder conditions as well as non-diseased conditions. Historical records of diagnosed cases of rheumatic disorders from the Rheumatology Unit of the Korle Bu Teaching Hospital were used as the basis for the data distribution. Synthetic datasets were generated using a combination of the 2010 American College of Rheumatology (ACR)/European League Against Rheumatism (EULAR) classification criteria [25] and expert knowledge of rheumatologists. The model was trained using 70% of the dataset while 30% of the dataset was used for testing and validation of the model. Clinically diagnosed records of patients were also used in experiments to validate the detection model. The functional ability of the detection system to detect and predict the prevailing conditions of rheumatic disorders was evaluated using performance metrics such as accuracy, sensitivity, specificity, recall, precision and the F1 score.
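The layer sizes and the 70/30 split described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the ReLU hidden activation, the softmax output, the random weight initialisation, the ordering of the five output classes, and the helper names (`init_params`, `dense`, `forward`, `split_70_30`) are all assumptions that the text does not specify.

```python
import math
import random

# Layer widths stated in the text: 45 inputs, hidden layers of 10 and 15, 5 outputs.
LAYER_SIZES = [45, 10, 15, 5]
# Output categories named in the paper; their ordering here is an assumption.
CLASSES = ["RA", "OA", "SLE", "unknown", "no disorder"]

def init_params(rng, sizes=LAYER_SIZES):
    """Small random weights and zero biases for each pair of consecutive layers."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
        params.append((weights, [0.0] * n_out))
    return params

def dense(x, weights, biases):
    """Affine map: one output per column of `weights`."""
    return [sum(xi * row[j] for xi, row in zip(x, weights)) + biases[j]
            for j in range(len(biases))]

def forward(params, x):
    """Forward pass of an untrained network; returns one probability per class."""
    h = list(x)
    for weights, biases in params[:-1]:
        h = [max(0.0, v) for v in dense(h, weights, biases)]  # ReLU (assumed)
    logits = dense(h, *params[-1])
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]  # softmax (assumed)

def split_70_30(n_records, rng):
    """Shuffle record indices and cut them 70% / 30%."""
    indices = list(range(n_records))
    rng.shuffle(indices)
    cut = n_records * 7 // 10
    return indices[:cut], indices[cut:]

rng = random.Random(0)
params = init_params(rng)
record = [rng.randint(0, 1) for _ in range(45)]  # one made-up binary feature vector
probs = forward(params, record)
train_idx, test_idx = split_70_30(100_000, rng)
```

With untrained weights the output probabilities carry no clinical meaning; the sketch only shows how a 45-element binary record maps to five class probabilities and how 100,000 records divide into 70,000 training and 30,000 test indices.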

The rest of the paper is organized as follows: Section 2 examines the initiatives in machine learning techniques for the detection of rheumatic disorders. Section 3 presents the development of the model for the rheumatic detection system, including the selection of the features and processing, the development of the machine learning model, training and testing, and the performance evaluation of the model. Section 4 focuses on the user application software development required for the detection system along with data management and security. Section 5 presents the experimental setup to validate the operational capability of the system and discussion of the results, while Section 6 presents the conclusion and further research directions.

2. Related Works

Machine learning techniques have emerged as tools that could be used to analyze large data and extract hidden information to assist decision-making. This capability has made such techniques suitable for implementation in the diagnostic process for various conditions such as rheumatic disorders, neurological disorders, etc., to assist physicians in their efforts to provide effective healthcare delivery [26,27]. Two popular machine learning techniques that have been used extensively in applications are the supervised learning and unsupervised learning techniques. While the supervised machine learning models are trained using labelled data that consists of inputs and their corresponding target outputs, the unsupervised learning models are trained using unlabelled data, and the models seek to find unknown underlying trends and patterns in the data to create clusters based on similarity of the features [10]. Various supervised and unsupervised machine learning techniques have been used to analyze and learn the characteristics of various rheumatic disorders to predict the prevailing conditions of patients for intervention measures.

In Jamian et al. [28], a random forest-based detection system was implemented for the detection of systemic sclerosis (SS) disorders from electronic health records (EHRs). Data from the EHR was analyzed and features were extracted for the learning and classification task. The detection model was trained and tested using a dataset with more than 3 million subjects with 1899 of the subjects having the systemic sclerosis with at least one count of the condition. The performance evaluation of the detection system showed that the model was able to adequately detect conditions of the SS disorders in the EHRs with a sensitivity of 92%, F1 score of 88%, and PPV rate of 84%. The good detection performance demonstrated the potential of the random forest algorithm as a capable tool to learn the characteristics of SS disorders from EHRs and predict patients with such underlying conditions for clinical decisions.
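As a quick consistency check on these figures, the F1 score is the harmonic mean of precision (PPV) and sensitivity (recall). Using the rounded values quoted above, and assuming they refer to the same evaluation set:

```latex
F_1 = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{Sensitivity}}{\mathrm{PPV} + \mathrm{Sensitivity}}
    = \frac{2 \times 0.84 \times 0.92}{0.84 + 0.92}
    = \frac{1.5456}{1.76}
    \approx 0.878
```

This rounds to the reported F1 score of 88%.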

Jorge et al. [29] also demonstrated that a machine learning algorithm can be more effective than rule-based learning in the detection of patients with SLE disorders from EHR. The authors used datasets from the Partners Healthcare Research Patient Data Repository (RPDR) and ICD-10 codes for the study. A training set of 200 randomly selected patient charts was used for classification of the SLE disorders into three categories, namely definite SLE, probable SLE, and no SLE. The performance evaluation results showed that the machine learning detection method was able to predict the SLE disorders in the EHR with a positive predictive value (PPV) of 90% and a sensitivity of 64% for the definite class of SLE disorders. The rule-based learning detection method achieved lower performance with a PPV range between 45% and 70% for probable SLE disorders.

In the study by Norgeot et al. [30], a detection system that uses deep learning models was implemented for the prediction of patients with RA disorders from EHR data. The focus of the study was mainly to predict future outcomes of patients with RA. The data sources used for the study include demographics, laboratory findings, and prior measures of disease activity from the EHR data to predict the RA activity in patients before the next clinic visit. A total of 578 patient records were used for development. The performance of the deep learning model was compared with results from the Clinical Disease Activity Index (CDAI) score at the next predicted visit for remission, low, moderate, or high disease activity. Performance evaluation of the detection system showed that the deep learning model was able to effectively predict disease activity at the next clinic visit of patients, with an AUROC of 0.91 on testing with 116 patients. The authors were able to demonstrate that the prediction task of RA activity in patients could be learned by machine learning models with outcomes that are comparable to the performance of a highly skilled human.

In Yoo et al. [18], the k-means algorithm, which is an unsupervised learning method, was implemented as a tool to predict the early risk of RA diseases in patients. The authors used a clinical dataset with 60 records of anonymous RA patients for the study. The dataset was randomly organized into 4 clusters: Anti-CCP, rheumatoid factor (RF), SJC, and ESR, and the k-means learning model was used to predict the existence of RA disorder in a patient. The RA disorder was predicted using four assessment parameters: RF > 7, ACCP > 18, SJC > 4, and ESR > 25. The evaluation results revealed that the k-means learning model was able to correctly predict patients who could develop RA disease with a ratio of BSS (between-cluster sum of squares) to TSS (total sum of squares) of 84.1%.
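For readers unfamiliar with the BSS/TSS ratio, the sketch below shows how it is computed for a given clustering. The four two-dimensional points and their cluster labels are made up for illustration and are unrelated to the data of Yoo et al.; the function names are hypothetical.

```python
def mean(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bss_tss_ratio(points, labels):
    """Between-cluster sum of squares divided by total sum of squares."""
    grand = mean(points)
    tss = sum(sq_dist(p, grand) for p in points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    # Each cluster contributes its size times the squared distance
    # between its centroid and the grand mean.
    bss = sum(len(members) * sq_dist(mean(members), grand)
              for members in clusters.values())
    return bss / tss

# Toy data: two tight, well-separated clusters.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
labels = [0, 0, 1, 1]
ratio = bss_tss_ratio(points, labels)  # 100 / 104
```

A ratio close to 1 means most of the variance lies between clusters rather than within them, which is why the ratio is commonly read as a measure of cluster separation.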

In Elkin et al. [31], a decision support system that employs the Bayesian learning concept was implemented for the evaluation of knee disorders in patients. The dataset used to build the model of possible knee disorders was based on responses to questions. The results of the performance evaluation confirmed that Bayesian learning was able to predict conditions of knee disorders better than the heuristic learning detection method.

Dang et al. [32] also implemented a deep learning model for automatic assignment of joint scores to RA patients using X-ray image information and SvH RA image scoring. The authors used a 13-layer MLP architecture and varied the dimensions of the X-ray images of the hands and feet of the patients. The performance evaluation results revealed that the deep learning-based model was able to adequately classify the RA patients with an accuracy of 90.8%. The study by Vodencarevic et al. [33] also focused on RA detection using an ensemble machine learning technique that consists of four base learners and a stacking-based meta-learner. The focus of the study was to predict individual flare risks in patients with RA who tapered anti-rheumatic treatment when they reached remission. The authors used a dataset from 41 patients and 135 visits. The performance evaluation results showed that the stacking-based model was able to predict the RA conditions with a reasonable AUROC of 81%. Similarly, Zhou et al. [34] also used an ensemble machine learning method to identify RA disorders using the most informative predictors of patients in EHRs. The authors used two datasets for the model development: a dataset from the SAIL databank with more than 2,238,360 records from 1999 to 2013, and a secondary dataset from the Cardiff-Cellma. The Cardiff-Cellma dataset used for model validation had a 27% prevalence of RA conditions. The evaluation results showed that the ensemble detection method was able to effectively detect prevailing RA conditions with a PPV of 85.6%, a specificity of 94.6%, a sensitivity of 86.2%, and an overall detection accuracy of 92.29%.

In Shamir et al. [8], a detection system that uses weighted nearest-neighbour learning rule was implemented for the prediction of OA disorders and classifying the different severity stages of the OA conditions based on the Kellgren-Lawrence (KL) classification grades. A training dataset of 350 knee X-ray images was used and each knee image was assigned a KL grade of between 0 and 4 based on the Atlas of Standard Radiographs. The test results showed that the detection method was able to predict the conditions of moderate OA (KL grade 3) and minimal OA (KL grade 2) with precisions of 91.5% and 80.4%, respectively. A lower accuracy of 57% was, however, predicted for doubtful OA cases (KL grade of 1). In the study by Gornale et al. [35], an ensemble of decision trees was implemented for the detection of OA disorders using knee X-ray images. Features from 200 knee X-ray images were extracted and processed for detection. The performance evaluation results revealed that the detection model was able to effectively detect prevailing OA conditions in the images with an accuracy of 87.92%.

In the study by Li et al. [19], three learning algorithms including logistic regression model, support vector machine, and adaptive boost learning model were investigated for the detection of rheumatic diseases using imaging data. The training dataset used for the study had 10,058 CT images with 5000 samples for sick conditions and 5058 samples for normal conditions. The performance evaluation results for each of the models revealed that the adaptive boost learning model performed better in detecting cases of rheumatic diseases with an accuracy of 90%. Support vector machine learning achieved a detection accuracy of 89%, while the logistic regression model achieved a detection performance of 79.6% of instances of RA diseases.

In Walsh et al. [36], a machine learning model was implemented for the detection of ankylosing spondylitis disorders from an EHR dataset. The performance evaluation results confirmed that the detection model was able to adequately identify patients in the EHR with axial spondyloarthritis (axSpA) conditions with an accuracy range between 82.6% and 91.8%. The results show that machine learning techniques could be used for the early diagnosis of axSpA and ankylosing spondylitis in patients.

In the study by Liu et al. [37], a medical vision-language pre-training and regularization model (M-FLAG) was implemented to extract information from medical images and textual data from radiography reports for classification, segmentation, and object detection tasks. The model was tested on five public datasets and the results were compared with the baseline medical vision-language models. The performance evaluation results revealed that the M-FLAG model outperformed the baseline medical vision-language pre-training models while reducing the number of parameters by 78%.

So far, almost all supervised and unsupervised machine learning-based models that have been deployed for the detection of various rheumatic disorders have relied on inputs from laboratory findings and radiographic imaging data, such as MRI and X-rays. In [37], medical data from both text and images were used. However, the input sources did not include clinical examination information. Similarly, in [30], demographics, laboratory findings, and activity of RA disease were used. Only isolated studies have used medical data from patient history and clinical examination data as input sources for learning models for the detection of rheumatic disorders. In most situations, the diagnosis of patients with rheumatic diseases requires the physician to follow standard medical procedures, which allows the acquisition of structured data in real time from clinical examinations. Some of these data can come from the patient’s medical history, family history, and medications, which are critical components that may be required in the screening process to establish rheumatic disorders. Furthermore, while many of the studies investigated have focused mainly on the detection of individual rheumatic disorders, which are often required in decisions for confirmation of the prevailing conditions, no documented diagnostic solution that allows for the detection and prediction of characteristic features of rheumatic disorders has been provided for the screening of patients for clinical decisions.

This study, therefore, focuses on using information from clinical examinations, family history, medical history, and demographic-related information, which are acquired directly through interaction with the patient, to detect and classify features of underlying conditions into categories such as RA, OA, SLE, unknown, and no disorder condition. The unknown rheumatic disorder condition defines a condition of comorbidity or other rheumatic disorders outside the three primary disorders. In the case of comorbidity condition, the solution provides the percentage probability contribution of the different disorders for informed decision making. This machine learning-based software solution is quite significant, as it presents primary healthcare professionals or service providers with the opportunity to rapidly screen patients for prevailing rheumatic disorders for further diagnosis and intervention measures. The solution is useful in resource-limited environments with limited available experts and resources. As an automated solution, it can also serve as a useful resource for training healthcare personnel.

3. Detection Model Design and Development

The operational flow diagram of the rheumatic disorder detection system shown in Figure 1 has three functional modules consisting of the data acquisition system, the detection and classification system, and the user application system. The data acquisition system has a user interface (UI) system that allows for the acquisition of the patient’s raw data, such as demographic information, medical and family history, and physical examination information. The detection and classification system performs three main functions: the selection and processing of rheumatic features, the machine learning model, and the clinical decision and reporting. In the feature processing operation, the characteristic features of the rheumatic disorders are extracted from the captured patient’s data, which are used to train the machine learning model to learn the characteristics of the disorders.

The machine learning model, which is the pivot of the detection system, is trained and calibrated by fine-tuning the hyperparameters using a large sample of records for the detection and prediction task of rheumatic disorders. Once the model has been adequately trained, when the input features of a patient are presented to the trained model, the decision and reporting system generates a clinical decision on the patient by specifying the prevailing disorder and the probability distribution of the various disorders according to the presented features. The user application system contains the user interface system and the database of the records of patients. The database, which is on a server, processes the requests received from users (physicians and related healthcare personnel) for tasks such as updating, deleting, creating, and retrieving the patient records. The application system also interfaces with the machine learning model for inference requests and processing for the detection of conditions of patients.
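The decision and reporting step described above can be illustrated with a small sketch that turns a vector of class probabilities into a report naming the prevailing disorder and the percentage contribution of each category. The function name `build_report`, the class ordering, the report layout, and the example probabilities are all hypothetical; the paper does not describe the actual report format.

```python
# Categories named in the paper; the ordering is an assumption.
CLASSES = ("RA", "OA", "SLE", "unknown", "no disorder")

def build_report(probabilities, classes=CLASSES):
    """Rank the classes by probability and express each as a percentage."""
    if len(probabilities) != len(classes):
        raise ValueError("expected one probability per class")
    ranked = sorted(zip(classes, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return {
        "prevailing": ranked[0][0],
        "distribution_percent": {name: round(100.0 * p, 1) for name, p in ranked},
    }

# Made-up model output for one patient.
report = build_report([0.62, 0.05, 0.25, 0.06, 0.02])
```

Keeping the full distribution in the report, rather than only the top class, reflects the stated goal of showing the contribution of each disorder in comorbidity cases.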

3.1. Dataset for Model Development

The first step in the development of the machine learning model for the rheumatic detection system is the acquisition of a relevant and adequate dataset of diagnosed conditions of patients for the model training. Since the performance of machine learning models is highly dependent on the quality, amount, depth, and appropriateness of the data used [38], the rheumatic dataset used must be large enough with features that can provide adequate learning for accurate prediction of the disorders. Since the available dataset at the Rheumatology Unit for the study was not large enough to achieve a good model, we used a synthetic dataset for the development, while the dataset available from clinically diagnosed patients was used for model validation. The dataset was generated using a combination of the 2010 American College of Rheumatology/European League Against Rheumatism classification criteria [39] and the knowledge of specialists.

A labelled dataset with 100,000 records of patients containing varying conditions of single and multiple cases of rheumatic disorders was generated for the development. Data generation and distribution for the study were based on historical records of diagnosed rheumatic cases from the Rheumatology Centre for 2022. Table 1 and Table 2 show the reported cases and the diagnosed rheumatic disorders from January to July 2022.

The records in Table 2 show that RA and SLE were the main rheumatic cases, accounting for 72.65% of the total number of cases diagnosed during the period, with OA cases accounting for only 2.75%. Other rheumatic disorders, such as Sarcoidosis, Mixed connective tissue diseases (MCTDx), Gouty arthritis, Scleroderma, Spondylosis, Juvenile idiopathic arthritis (JIA), Inflammatory arthritis, Sjogren syndrome, Dermatomyositis, Antiphospholipid syndrome, Granulomatosis, Psoriatic arthritis, Vasculitis, Chronic pain syndrome, and Muscle injury, together accounted for 24.58%. The diagnosed records in Table 1 also reveal that male patients accounted for only 11.22% of the total cases. The data distribution for the model development was therefore structured to reflect these statistical records. Table 3 and Table 4 show the data distribution by age groups and categories of disorders used for the development.

To generate the data, we considered the diagnosis process as a binary classification problem where instances of clinical examinations were encoded in binary format as a logical ‘0’ or ‘1’, where ‘0’ represents the absence of a rheumatic characteristic or ‘NO’, and ‘1’ represents the presence of a characteristic or ‘YES’. For example, in the clinical examination of a patient, the response to ‘feet swelling’ can be captured as ‘1’ or ‘0’, depending on whether this characteristic is present or absent in the patient. To generate a patient record of a diagnosed case, we first identified all features associated with each of the disorders and assigned each feature in a disorder a score of ‘1’, ‘0’, or ‘−1’, depending on the degree of influence of the feature on the particular disorder. A score of ‘1’ defines the feature as a primary feature, while scores of ‘0’ and ‘−1’ define the feature as a secondary positive feature and a secondary negative feature, respectively. For example, in a particular record of a patient, the feature ‘pain in elbows’ for RA disorder would be assigned a score of ‘1’ as a primary feature, while the feature ‘weight changes’ would be assigned a score of ‘0’ as a secondary positive feature, and the feature ‘functional difficulty’ would be assigned a score of ‘−1’ as a secondary negative or counter-diagnostic feature. Similarly, the ‘time of day pain worsens’ feature for SLE disorder would be assigned a score of ‘1’ as a primary feature for SLE, while the ‘fever’ feature would be assigned a score of ‘0’ and the ‘joint locking’ feature would be assigned a score of ‘−1’.
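The two encodings described above, the binary presence/absence vector and the per-disorder feature scores, can be illustrated as follows. The feature names and scores are the examples given in the text; the feature ordering, the helper names (`encode_record`, `role`), and the use of dictionaries are illustrative assumptions, not the authors' data format.

```python
# Example features named in the text; the column order is an assumption.
FEATURES = ["feet swelling", "pain in elbows", "weight changes",
            "functional difficulty", "fever", "joint locking"]

# Per-disorder feature scores, using only the examples given in the text.
RA_SCORES = {"pain in elbows": 1, "weight changes": 0, "functional difficulty": -1}
SLE_SCORES = {"time of day pain worsens": 1, "fever": 0, "joint locking": -1}

ROLES = {1: "primary", 0: "secondary positive", -1: "secondary negative"}

def encode_record(findings, features=FEATURES):
    """Map the findings answered 'YES' to a 0/1 vector in a fixed feature order."""
    unexpected = set(findings) - set(features)
    if unexpected:
        raise ValueError(f"unexpected feature names: {sorted(unexpected)}")
    return [1 if name in findings else 0 for name in features]

def role(score):
    """Translate a feature score into the role it plays for a disorder."""
    return ROLES[score]

# A made-up patient with feet swelling and fever only.
vector = encode_record({"feet swelling", "fever"})  # [1, 0, 0, 0, 1, 0]
```

For instance, `role(RA_SCORES["functional difficulty"])` gives `"secondary negative"`, matching the counter-diagnostic role described above.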

Based on the defined feature score coding system, the dataset for the different disorders was generated using the piecewise functions defined in Equations (1)–(3) where x, y, and z represent the scores for the RA, OA, and SLE disorders, respectively.

f(x) = \begin{cases} 1, & x > 12 \\ 0, & \text{otherwise} \end{cases} \qquad (1)

f(y) = \begin{cases} 1, & y > 8 \\ 0, & \text{otherwise} \end{cases} \qquad (2)

f(z) = \begin{cases} 1, & 12 \leq z \leq 18 \\ 0, & \text{otherwise} \end{cases} \qquad (3)

For RA disorders, a threshold score T_RA greater than 12 (i.e., T_RA > 12) was established for all the rheumatic features associated with the RA disorder. Thus, for a given diagnosed record of a patient to be considered an RA case, the features contained in that record must satisfy the T_RA score criterion. Similarly, a threshold score T_SLE greater than 18 (i.e., T_SLE > 18) was established for the SLE disorder, while for the OA disorder, a threshold score T_OA greater than 8 (i.e., T_OA > 8) was established. No threshold score was established for the unknown disorder category (i.e., rheumatic disorder types outside the scope of this study); instead, an unknown disorder case is determined when more than one of the three disorders is detected. A sample data distribution for RA disorder is shown in Table 5 for 10 patient records with 7 features. In Table 5, if the score of the features of a given patient record is greater than 12, as in Equation (1), the condition for that patient is determined as positive for RA; otherwise, the condition is considered negative for RA. Based on this approach, four different datasets were generated: three for the binary classification of the individual rheumatic disease types and one for multi-label classification.
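The labelling rule can be sketched directly from Equations (1)–(3). Two points are assumptions rather than statements from the paper: the scores x, y, and z are taken as already-computed inputs, since the aggregation of feature scores is not spelled out here, and the SLE rule follows Equation (3) as printed (12 ≤ z ≤ 18), whereas the prose states T_SLE > 18, so the bounds are exposed as parameters. The function names are hypothetical.

```python
def f_ra(x):
    """Equation (1): positive for RA when the RA score exceeds 12."""
    return 1 if x > 12 else 0

def f_oa(y):
    """Equation (2): positive for OA when the OA score exceeds 8."""
    return 1 if y > 8 else 0

def f_sle(z, low=12, high=18):
    """Equation (3) as printed; the bounds are parameters because the
    prose gives a different SLE threshold."""
    return 1 if low <= z <= high else 0

def label_record(x, y, z):
    """Combine the three binary rules into one of the five categories."""
    flags = {"RA": f_ra(x), "OA": f_oa(y), "SLE": f_sle(z)}
    positives = [name for name, flag in flags.items() if flag]
    if not positives:
        return "no disorder"
    if len(positives) > 1:
        return "unknown"  # more than one of the three disorders detected
    return positives[0]

# Made-up scores for illustration.
example = label_record(13, 5, 0)  # only the RA rule fires
```

With these rules, `label_record(13, 9, 0)` falls into the unknown category because both the RA and OA rules fire, which mirrors the text's definition of the unknown class.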

3.2. Feature Selection and Processing

In the feature selection process, we first identified all possible features of each of the disorders [25,40] and ensured that the features were clearly defined to avoid misdiagnosis. All the markers associated with each of the disorders were identified first, followed by the features that characterize each of the markers. After analysis and evaluation of the features, 6 key markers that define the disorders were established: (1) pain, (2) swelling, (3) stiffness, (4) skin rashes and lesions, (5) systemic symptoms, and (6) patient history. Following evaluation of the features for the markers, 5 characteristic features were identified for the pain marker, 8 features for the swelling marker, 3 features for the stiffness marker, 3 features for the systemic symptoms marker, and 6 features for the history marker. Figure 2 presents the hierarchy of features for the key markers. For example, the primary features that were identified for the pain marker include pain area, onset, character, radiation, association, time course, ER factors, and level of severity. Sub-features that characterize the primary features were also identified. For example, sub-features that were identified for the pain area primary feature include knuckle joints, wrist joints, feet joints, shoulder joints (left and right), elbows, knees, and ankles. In the case of the time course feature, sub-features such as worsening, improving, and fluctuating were identified.

An important consideration in the feature identification process is the issue of feature overlap, which has the potential to lead to misdiagnosis of the disorders. For example, the pain marker has features that fall under the categories of both inflammatory and non-inflammatory disorders. Generally, the effects of pain marker features of inflammatory disorders such as RA [2,41] are usually noticed at rest and tend to improve with movement [1,25]. In contrast, the effects of pain marker features associated with non-inflammatory disorders such as OA and other degenerative disorders tend to worsen with movement and are usually better at rest [25]. Such subtle differences in the features of the pain marker for these two disorder categories can lead to misdiagnosis if the features are not properly identified. Another important consideration in the feature identification process is the distribution of joint pain and swelling, which is mainly used to determine the level of disease activity and the global assessment of the level of pain (measured on a scale of 0 to 10). All these feature characteristics and others, such as the patient’s skin, eyes, lungs, heart, etc., were considered discriminants for the detection and prediction of the different disorders.

Feature selection and processing is one of the critical steps in developing efficient and effective machine learning models. Although the raw data used for the learning model may have many features that define the characteristics of the data, not all the features will play a significant role in the learning process. Moreover, using too many features to represent the data can degrade the performance of the model and increase computational time. Using the feature hierarchy in Figure 2, we proceeded to select the set of features that are relevant and adequately represent the data, in order to reduce the computational time of the learning process. For example, features such as years of smoking, nature of pain, etc., were considered less significant to the learning model and were subsequently removed from the set of features identified for the disorders. The feature space for the rheumatic disorders was thus reduced to a total of 45 relevant features, which were denoted as F1, F2, …, F45, for the model development. Table 6 shows the list of features that were identified from the disorders and selected for the learning model.

The selected features defining the disorders were further organized into three disorder classes comprising inflammatory disorders, non-inflammatory disorders, and hybrid disorders, for the mapping of the features for unique identification. All features that are related to RA, SLE, and similar conditions were classified under inflammatory disorder, while all features related to OA and other degenerative disorders were classified as non-inflammatory disorders. The hybrid class contains combinations of features such as RA + OA, SLE + OA, RA + SLE + OA, and other disorders. This category accounts for conditions where the features come from both inflammatory and non-inflammatory disorders, as well as other rheumatic disorders that are not part of the three primary disorders.
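The three-way grouping can be expressed as a small lookup. The sketch below is only an illustration: the function name and the string labels are hypothetical, and treating RA + SLE as inflammatory (rather than hybrid) is an assumption, since the text does not say how that combination is classed.

```python
INFLAMMATORY = {"RA", "SLE"}
NON_INFLAMMATORY = {"OA"}

def disorder_class(detected):
    """Map a collection of detected disorder names to a disorder class."""
    detected = set(detected)
    if not detected:
        return "none"
    if detected <= INFLAMMATORY:
        return "inflammatory"
    if detected <= NON_INFLAMMATORY:
        return "non-inflammatory"
    # Mixed inflammatory/non-inflammatory findings, or disorders outside
    # the three primary ones, fall into the hybrid class.
    return "hybrid"
```

For instance, `disorder_class(["RA", "OA"])` falls through both subset checks and is classed as hybrid.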

3.3. Detection Model Development and Training

An artificial neural network is one of the machine learning techniques that can be used to automate decision processes. The suitability of a neural network for a problem-solving task largely depends on the hyper-parameters of the network, which are the settings that govern the behaviour of the algorithm. Figure 3 shows the architectural diagram of the multilayer neural network (MLNN) used for the detection model. The MLNN has an input layer, multiple hidden layers, and an output layer. The number of neurons at the input layer was designed to correspond to the total number of extracted features, in this case, 45 features, while the number of neurons in the output layer was limited to the number of categories into which the features are to be classified, i.e., 5 categories comprising RA, OA, SLE, Unknown, and No condition. The number of hidden layers and the number of neurons in each hidden layer were determined through the network design and training process.

To establish the most suitable MLNN architecture for adoption, the parameters of the network such as the number of hidden layers and neurons in each layer, learning rate, optimization method, and length of training data or number of epochs for training, were each varied and their effects were observed by repeating the training samples for each parameter variation. We thus started with the basic network architecture with fixed input and output layer neurons and one hidden layer whose number of neurons was varied one step at a time (and the number of hidden layers), and at each step, the network was trained and tested as the training error was minimized. The adaptive moment estimation (ADAM) optimization technique, which is a stochastic-based gradient descent learning algorithm, was used for the learning. This algorithm, which dynamically tunes and updates the learning rates iteratively for stable learning, is suitable for handling sparse gradients on noisy problems. The algorithm computes the adaptive learning rates of the network and stores an exponential decay average of the past squared gradients [42]. The weights of the neuron in the network are adjusted in the training process using Equation (4):

w_{ij}(k + 1) = w_{ij}(k) - \frac{α(k)}{δ + \sqrt{v^{*}(k + 1)}} \times m^{*}(k + 1)

(4)

where w_ij(.) is the weight matrix of the network between layers i and j, k is the time index, α(k) is the learning rate at the k-th time step, δ is a tolerance value that prevents division by zero in the implementation, and m*(k + 1) and v*(k + 1) are, respectively, the bias-corrected exponential moving averages of the gradient and the squared gradient, which are defined as:

m^{*}(k + 1) = \frac{1}{1 - β_{1}(k)} \times m(k + 1)

(5)

v^{*}(k + 1) = \frac{1}{1 - β_{2}(k)} \times v(k + 1)

(6)

The parameter β₁ controls the exponential decay rate for the first-moment estimate, m(k + 1), while β₂ controls the decay rate for the second-moment estimate, v(k + 1). During training, the optimization algorithm derives the learning rates for the network from the first- and second-moment estimates, m(k + 1) and v(k + 1), of the gradients g(w_ij), which are updated according to the equations:

m (k + 1) = β_{1} m (k) + (1 - β_{1}) g (w (k))

(7)

v (k + 1) = β_{2} v (k) + (1 - β_{2}) g (w (k)) \times g (w (k))

(8)
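Equations (4)–(8) can be traced step by step with a scalar example. The sketch below is not the authors' implementation (TensorFlow provides the optimizer used in the study); it assumes the standard Adam bias correction, in which the terms written β₁(k) and β₂(k) are read as β raised to the number of steps taken. The default values of `beta1`, `beta2`, and `delta`, and the quadratic test function, are illustrative assumptions.

```python
import math

def adam_step(w, grad, m, v, k, lr=0.05, beta1=0.9, beta2=0.999, delta=1e-8):
    """One Adam update for a scalar weight; k is the 0-based step index."""
    m = beta1 * m + (1.0 - beta1) * grad              # Equation (7)
    v = beta2 * v + (1.0 - beta2) * grad * grad       # Equation (8)
    m_hat = m / (1.0 - beta1 ** (k + 1))              # Equation (5), assumed power form
    v_hat = v / (1.0 - beta2 ** (k + 1))              # Equation (6), assumed power form
    w = w - lr / (delta + math.sqrt(v_hat)) * m_hat   # Equation (4)
    return w, m, v

def minimise_quadratic(w0=1.0, steps=200, lr=0.1):
    """Minimise f(w) = w**2 with Adam, starting from w0."""
    w, m, v = w0, 0.0, 0.0
    for k in range(steps):
        grad = 2.0 * w  # derivative of w**2
        w, m, v = adam_step(w, grad, m, v, k, lr=lr)
    return w
```

On the first step the bias-corrected ratio m*/√v* equals the sign of the gradient, so the weight moves by approximately one learning rate regardless of the gradient's magnitude.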

The MLNN model was trained and tested using the extracted features of the rheumatic disorders in the dataset. TensorFlow version 2 was used for the development. Two separate datasets were used for the development: a training dataset containing 70,000 records (70% of the dataset) and a testing and validation dataset with 30,000 records (30% of the dataset). The advantage of using two separate datasets instead of a single dataset for training and testing of the model is the possibility of having datasets with different feature distributions and characteristics. This is useful for testing the robustness of detection models, as some of the features may not be known to the model during the training process. The rectified linear unit (ReLU) activation function was used for the neurons in the hidden layers, while the logistic sigmoid activation function was used for the output layer neurons.

The first step in the training process is to ensure that the dataset has a balanced number of samples to avoid skewing the predictions towards the majority class. Since the dataset was generated synthetically rather than drawn from a natural distribution, a fairly balanced class distribution was maintained at generation based on the number of samples for each disorder. We therefore did not have to employ resampling techniques to oversample the minority class or undersample the majority class.

In training the model, we started with an MLNN architecture with a single hidden layer of 5 neurons and a learning rate of 0.01. The model was trained, the hyper-parameters were tuned, and the performance and model computational time were observed. The number of neurons in the layer was increased gradually while the model was trained at a learning rate of 0.01, and at each step, the performance was evaluated. The learning rate was subsequently varied incrementally from 0.01 to 0.60 for the single-hidden-layer MLNN architecture, and for each learning rate, the performance was observed. The number of hidden layers was also increased one step at a time with varied numbers of neurons in the layers, and for each training process, the learning rates were also varied and the performance observed. Table 7 shows the parameter settings that were used for the MLNN model development, while Table 8 presents summary results for the different MLNN architectures tested. Figure 4 also shows the results of the MLNN model performance and learning rates for the different MLNN architectures and parameters.

It is evident from the development results in Table 8 that an MLNN model architecture with 45 neurons at the input layer, two hidden layers with 10 neurons and 15 neurons in the first layer and second layer, respectively, and an output layer with 5 neurons produced the best performance at a learning rate of 0.05, accuracy of 97.48%, and a model computational time of 5.60 s. This MLNN architecture was subsequently adopted and implemented for the rheumatic disorder detection and classification task.
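To make the adopted 45–10–15–5 architecture concrete, the forward pass can be sketched in plain Python. This is an illustration only: the study used TensorFlow, the weights here are random rather than trained, and the order of the output classes, the initialisation range, and the function names are assumptions.

```python
import math
import random

LAYER_SIZES = [45, 10, 15, 5]  # input, two hidden layers, output
CLASSES = ["RA", "OA", "SLE", "Unknown", "No condition"]  # assumed order

def init_params(sizes, rng):
    """Random (untrained) weights and zero biases for each layer."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(features, params):
    """ReLU on hidden layers, logistic sigmoid on the output layer."""
    activations = [float(f) for f in features]
    last = len(params) - 1
    for index, (weights, biases) in enumerate(params):
        act = sigmoid if index == last else relu
        activations = [
            act(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return dict(zip(CLASSES, activations))

params = init_params(LAYER_SIZES, random.Random(0))
outputs = forward([1] * 45, params)
```

Because each output neuron has its own sigmoid, the five outputs are independent scores in (0, 1) and need not sum to 1.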

3.4. Detection Model Performance Evaluation

Detection performance metrics are often used to evaluate the capability of machine learning detection models in terms of the correctness of the classification of the features for detection. Having good training data with a balanced class distribution is one of the most challenging aspects of developing machine learning classifier models. When the class distribution is not balanced, one class becomes much larger than the others, a situation that results in the model giving predictions based on the majority class with high classification accuracy but low predictive power [43]. In this case, the classifier will be skewed towards the majority class and achieve poor classification rates on the minority classes. Because accuracy alone can be misleading when the class distribution is imbalanced, we evaluated the performance using the metrics of accuracy (ACC), precision (PRE), recall (REC), specificity (SPE), and F1 score (F1). These metrics were determined by computing the number of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) using the test dataset. TP denotes the number of test instances that belong to the positive class and have been predicted correctly, whereas TN represents the number of test instances that belong to the negative class and have been predicted correctly. FP denotes the number of test instances that have been incorrectly classified as positive although their actual class is negative, whereas FN denotes the number of test instances that have been incorrectly classified as negative although their actual class is positive. The performance indicators were calculated as follows:

ACC = \frac{TP + TN}{TP + FP + TN + FN}

(9)

PRE = \frac{TP}{TP + FP}

(10)

REC = \frac{TP}{TP + FN}

(11)

SPE = \frac{TN}{TN + FP}

(12)

F1 = \frac{2 \times PRE \times REC}{PRE + REC}

(13)
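The five metrics can be computed directly from the confusion counts. The sketch below uses the standard definitions, with recall computed as TP/(TP + FN); the function name and the example counts are hypothetical and are not taken from Table 9. It assumes every denominator is non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, specificity and F1 from confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    spe = tn / (tn + fp)
    f1 = 2 * pre * rec / (pre + rec)
    return {"ACC": acc, "PRE": pre, "REC": rec, "SPE": spe, "F1": f1}

# Hypothetical counts for one binary task (e.g., RA versus not RA).
example = classification_metrics(tp=90, tn=95, fp=5, fn=10)
```

For a multi-class model, these counts would be taken per class in a one-versus-rest fashion and the resulting metrics averaged.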

Table 9 shows the summary performance evaluation results of the detection system. When features in the test dataset were applied to the detection model after training, the model correctly classified 97.45% of the instances of the RA conditions, with 2.55% classified incorrectly. Similarly, the model detected 97.46% of the instances of the OA conditions, with 2.54% detected incorrectly. The SLE conditions also showed a detection accuracy of 97.46% of the instances, with 2.54% detected incorrectly. Overall, the model achieved an average detection accuracy of 97.48% for the four disorder classes, namely, RA, OA, SLE, and unknown disorder. The reasonably high average sensitivity (recall) of 96.80% reflects the ability of the detection system to assign a high proportion of the instances of patients who have rheumatic disorders to their correct classes. Similarly, the high average specificity of 97.50% confirms the low false positive rate of the detection model.

4. User Application Software Development

To make the rheumatic detection system usable in practice, application software (designed for both web and mobile) was developed that allows users of the system to carry out assessments on patients, make diagnoses, predict prevailing disorders affecting patients, and store and update patient records for management. The application software has two major functional components: the user application system and the data management system. These two systems are integrated to provide the required application service and to ensure the security of patient records. Figure 5 presents the architectural diagram of the application software system that was implemented for the rheumatic disorder detection system.

4.1. User Application System

The user application system, which is the front end of the application software, provides an interface for data acquisition during the clinical examination process. The user application was built with JavaScript and optimized for running in a mobile environment using Apache Cordova. The application system allows the user to perform functions such as authentication, route authentication, diagnosing patients for prevailing rheumatic disorders, creating new patients, viewing patient results and records, updating patient records after examination, and deleting patient records or diagnosis details. The back-end component of the application was built with Node.js, which is a JavaScript runtime built on Chrome’s V8 engine. The back-end receives requests from the front-end application, processes the requests, and returns the response to the front-end in JSON format. Figure 6 shows the functional flow diagram of the application system. The user first registers for the service by providing personal information and a unique medical identification number, which is a key requirement for authentication before access is granted to use the diagnostic tool for medical assessments of patients.

Figure 7 presents a sample user dashboard that shows historical information on the cases of rheumatic disorders that have been diagnosed and a summary report that gives the total cases reported with the distribution of disorders diagnosed. Figure 8 shows a sample user interface to perform clinical examinations of patients. The standard procedures for diagnosing rheumatic disorders are followed for the diagnosis where the user selects the appropriate features that correspond to the response of a patient. The features are then passed on as input to the detection model for determination of the existence or otherwise of a rheumatic disorder. Following analysis of the features by the detection model, a prediction of the prevailing disorder is made, and the decision is presented to the user in the form of the probability of occurrence of the disorder. Based on the diagnosed condition, a report is generated and recommendations made on the patient are provided to the user. The results of the diagnosis on each patient are also added to the patient records of the patient together with the activities that were carried out on the patient and stored in the data management system.
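The step in which selected features are passed to the detection model can be sketched as a binary encoding followed by JSON serialisation. The application itself is written in JavaScript; Python is used here only for illustration. The feature names, their index positions, and the request field names (`patientId`, `features`) are hypothetical and are not taken from the application.

```python
import json

N_FEATURES = 45  # F1 ... F45

# Hypothetical mapping from feature names to positions in the input vector.
FEATURE_INDEX = {
    "pain_wrist_joints": 0,
    "pain_knees": 1,
    "swelling_feet": 2,
    "fatigue": 3,
}

def encode_features(selected):
    """Return a 45-element vector with 1 for each selected feature, else 0."""
    vector = [0] * N_FEATURES
    for name in selected:
        vector[FEATURE_INDEX[name]] = 1
    return vector

def build_request(patient_id, selected):
    """Serialise the encoded features as a JSON request body."""
    return json.dumps({"patientId": patient_id,
                       "features": encode_features(selected)})

body = build_request("P-001", ["pain_knees", "fatigue"])
```

A fixed-length vector keeps the request aligned with the 45 input neurons of the model, whichever features the clinician selects.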

4.2. User Data Management

The data management system consists mainly of the database required to meet the functional requirements of the user application. Based on the user requirements analysis for the design, six entities were identified that included system administrator, request, clinician, records, patient, and reports. The entities and their relationships were modeled as shown in Figure 9. Object Document Mapping (ODM) was used to implement the database system in MongoDB, due to its flexibility in providing straightforward schema-based solutions for data modeling. Since MongoDB is a non-relational database, most of the relationships were implemented by embedding one entity (a document in this case) into another. This approach prevents join-operations during the querying of data, which results in less computational cost.

One of the challenges of managing patient electronic records in online services is how to protect the records effectively to avoid leakages. A popular approach is to employ encryption techniques to make the data in the database unintelligible to unauthorized users. To guarantee secure data exchange between the user application and the back-end application on the server, we implemented blockchain technology [41,44], which is one of the emerging solutions for managing complex data and providing secure data exchange. The blockchain was used as a mechanism for securing patient data primarily because of two of its features: immutability and decentralization. The blockchain is an append-only structure that keeps records of every activity or transaction that takes place on the network. In this case, all activities that take place on the network, such as creating patients, updating patient records, deleting patients, initiating new diagnoses, etc., are recorded and stored across multiple nodes, which eliminates a single point of failure. When a request for diagnosis is sent to the detection system and an inference is made, the results of the process are stored and synchronized across all the nodes. Due to its flexibility, efficiency, and ease of use, blockchain technology has now become an important tool in EHR management [31,45,46].
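The append-only property can be illustrated with a minimal hash-chained log. This sketch is not the blockchain platform used in the study, and it omits distribution across nodes and consensus entirely; the block fields and function names are hypothetical. It only shows why altering a stored activity becomes detectable.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, prev_hash, payload):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    body = json.dumps({"index": index, "prev": prev_hash, "payload": payload},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_block(chain, payload):
    """Append an activity record that is linked to the previous block's hash."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"index": index, "prev": prev_hash, "payload": payload,
                  "hash": block_hash(index, prev_hash, payload)})
    return chain

def is_valid(chain):
    """Recompute every hash and check each link to the preceding block."""
    prev_hash = GENESIS
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True

log = []
append_block(log, {"action": "create_patient", "patient": "P-001"})
append_block(log, {"action": "diagnosis", "patient": "P-001", "result": "RA"})
```

Editing the payload of any earlier block changes its recomputed hash, so `is_valid` fails unless every later block is rewritten as well.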

5. Experiments and Results

To demonstrate the functional capability of the detection system, two forms of tests were carried out in experiments: numerical simulation using blind tests, which mimic features obtained from patients, and retrospective tests using clinically diagnosed data from patients. The user application software was implemented as a web application and optimized to run in a mobile environment with Apache Cordova. The mobile device used for the experiment was a Samsung Galaxy S6, while the PC used was an HP ProBook computer with an Intel Core i5 CPU, 2.70 GHz processor speed, and 8 GB RAM. The machine learning model was trained and tested using the Google Colab hosted runtime, which provided 12 GB RAM and 108 GB disk space on the Google Cloud platform.

5.1. Blind Tests Simulation Experiment

In the blind tests experiment, rheumatologists from the Department of Medicine and Therapeutics (University of Ghana Medical School) conducted a series of trials to validate the ability of the detection system to accurately detect and predict the prevailing rheumatic conditions. The scenarios for the experiments were created by the rheumatologists based on patients’ presentations for the different diagnoses. The rheumatologists used the user application on the Samsung Galaxy S6 smartphone to work through the clinical scenarios by blindly selecting features that correspond to different target conditions. Blind test data were generated for 10 patients, comprising 5 males and 5 females, with varying rheumatic conditions and cases of healthy conditions (no disorder). Table 10 shows sample examination records for 3 blind test cases of patients (Test 1, Test 2, and Test 3) that were presented to the detection model for detection and prediction, while Table 11 presents the results of the detection model. Data for the blind tests included data from clinical examination, patient history, and demographic information. The instances from the diagnosis for each blind test of a patient were encoded as logical ‘1’ for the existence of a rheumatic feature and logical ‘0’ for the absence of a feature, as discussed in Section 3.1. Each blind test record was presented to the trained detection model for the detection of the underlying condition and prediction of the type of disorder for each patient. The results in Table 11 show that, for the features of Test 1, the model correctly detected the underlying condition and predicted the type of rheumatic disorder as unknown, as expected. Similarly, for the features of Test 2 and Test 3, the model correctly detected the conditions and predicted them accurately as RA and SLE disorders, respectively.

5.2. Clinical Testing Experiment

To further validate the performance of the detection model, rheumatologists performed tests using retrospective clinically diagnosed medical data of five patients from the Rheumatology Center for the study. The diagnosed data provided on the patients were not sufficiently detailed to allow direct or indirect identification of the patient. Moreover, since the records of patients are protected under the Privacy and Ethical Act, the identities of all the patients whose medical records were used for the experiments were concealed and have not been revealed in this article. Furthermore, since the steps described for patient data acquisition do not fulfill the criteria for therapeutic or research experiments, consent was therefore not required for clinical data as direct contact with the patients and their data was not required.

The clinical experiments were carried out by the rheumatologists by capturing the diagnosed clinical findings of each patient, from clinical examination data, through the patient history (both medical and family) to the demographic data, using the user application system as presented in Figure 10. The first clinical experimental data were for a patient with sarcoidosis. The features of the patient’s medical data from the clinical examinations were: pain in wrist joints, pain in knees, pain in ankles, swelling around the wrist, swelling in the feet, duration of pain greater than six weeks, pain in the chest, symmetric pain, stiffness for more than an hour, skin lesions, fatigue, age as 44 years, and gender as male. Although the rheumatic disorder tested was not part of the disorders considered for the study, due to the existence of features that overlap in some of the disorders, the model was able to diagnose the patient using the feature content and classify the condition as an unknown disorder. Figure 11 shows the prediction results of the diagnosis. Based on the features presented, the detection model detected a disorder condition as expected and predicted the prevailing condition of the patient as an unknown disorder with a probability of 97.01%. Although the actual condition of the patient was clinically diagnosed as sarcoidosis, which was correctly detected as a rheumatic disorder, the detection model was not able to predict that exact disorder type, mainly because it was not part of the scope of disorders considered for the study. However, since some of the features associated with sarcoidosis are similar to the symptoms of SLE, the detection model predicted a 4.87% probability of the detected condition being SLE, which is too low to signify the existence of an SLE disorder. The detection results showed a 0% chance of the tested patient suffering from RA or OA conditions.

In subsequent experiments, clinically diagnosed data from patients suffering from RA, OA, and SLE disorders were used, and the detection model was able to accurately detect and predict each of the prevailing conditions with probabilities above 97%. The clinical experimental results obtained from all the cases confirm that the detection system was able to detect and predict the prevailing rheumatic disorders of the individual patients with a high degree of accuracy. This supports the capability of the detection system to serve as a diagnostic tool for screening patients in hospitals for the detection of rheumatic disorders.

5.3. Comparative Analysis of Results

The performance of the proposed rheumatic detection system was compared with the results of five supervised machine learning algorithms that have been documented for the detection of rheumatic disorders. These algorithms include the decision tree (DT), random forest (RF), support vector machine (SVM), Naive Bayes (NB), and k-nearest neighbour (KNN) techniques. Each of the five machine learning algorithms was trained and tested on the same dataset that was used for the detection model (shown in Table 4). 70% of the dataset was used for training while 30% of the dataset was used for testing the model. Each disorder was classified separately in a binary classification task, and the average performance accuracy and computation times were measured. The test results for the Decision Tree algorithm revealed that 77.04% of the test instances of the RA conditions were correctly classified while 78.13% of the SLE disorders were correctly classified and 100% of the instances of the OA were all correctly classified. In the case of Random Forest, the algorithm was able to correctly classify 92.25% of the instances of the RA disorders while 99.32% of the instances of the OA disorders were correctly classified and 85.63% of the SLE disorders correctly classified. Figure 12, Figure 13 and Figure 14, respectively, show sample performance evaluation results for detection of RA disorders from the DT, KNN, and the NB learning algorithms.

The summary performance evaluation results in Table 12 show that the MLNN performed much better in correct detection of the instances of the different rheumatic disorders compared to the other machine learning classifiers, with an average overall detection accuracy of 99.71% of the instances. Although the DT classifier had the best average computational time of 0.12 s, its low overall detection accuracy of 85.06% did not make it competitive enough for adoption. The results also show that the KNN classifier performs poorly, with a low detection accuracy of 79.75% and extremely high computational time. The high computational time is expected, given that the algorithm searches through the entire dataset to find the k elements that are most similar to each instance being tested. We can infer from the various performance evaluation results that the three competitive supervised machine learning algorithms that show promise for implementation of such detection systems are the RF, SVM, and the MLNN. These algorithms are able to adequately learn the features of the rheumatic disorders for reasonably high detection accuracy of the disorders. All the evaluated machine learning algorithms showed reasonably good performance, which could be attributed to the type of data used, which is essentially logical data (represented in binary format) rather than other data types such as numerical, categorical, or textual data. The good detection performance of rheumatic detection systems that are based on the RF, SVM, and the MLNN has, however, been confirmed by experimental results reported in the literature [19,35].
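The cost of the KNN search can be seen in a brute-force sketch: every prediction computes a distance to every training record. The Hamming distance is an assumption made here because the features are binary; the distance metric and the value of k used in the comparison are not stated, and the toy records below are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions at which two binary feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, query, k=3):
    """Majority vote among the k training records closest to the query.

    train is a list of (features, label) pairs. The whole list is scanned
    and sorted for every query, which is what makes prediction slow on
    large datasets.
    """
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [
    ([1, 1, 1, 0], 1),
    ([1, 1, 0, 0], 1),
    ([0, 0, 0, 1], 0),
    ([0, 0, 1, 1], 0),
    ([1, 0, 1, 0], 1),
]
prediction = knn_predict(train, [1, 1, 1, 1], k=3)
```

With 70,000 training records and 45 features, each query in this brute-force form requires 70,000 distance computations, whereas a trained MLNN needs only a fixed number of multiply-add operations per query.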

We compared the results of our MLNN-based detection system with the results of related rheumatic detection algorithms that have been reported in the literature. Many of the reported results used only part of the patient’s medical data, mostly medical images, as input for the detection of rheumatic disorders. Very few studies have combined examination data with medical images and laboratory findings for the detection of the disorders. So far, little work has used clinical examination data and patient history as the primary input sources for the detection system. These input sources provide the foundation on which physicians establish their clinical decisions.

Saleem et al. [38] reported a detection accuracy of 97% for OA using computer vision techniques to analyze and classify knee radiographs. Similarly, in the work of Ureten et al. [47], the results of a rheumatic detection algorithm based on YOLO (You Only Look Once) and normal hand radiographs showed an average detection accuracy of 80.6% for RA and OA disorders. The algorithm was able to classify the RA with accuracy, sensitivity, specificity, precision, and AUC results of 90.7%, 92.6%, 88.7%, 89.3%, and 0.97, respectively. In the case of OA, accuracy, sensitivity, specificity, precision, and AUC values of 90.8%, 91.4%, 90.2%, 91.4%, and 0.96, respectively, were achieved. In the study by Olatunji et al. [48], the results for the classification of RA disorders using 30 clinical features and an ensemble voting technique achieved performance accuracy, recall, and precision rates of 94.03%, 96.00%, and 93.51%, respectively. In a related ensemble detection technique for RA by Ho et al. [39], detection accuracy rates of 97.50% and 94.84%, respectively were reported. Zhou et al. [49] also reported an AUC performance accuracy of 95% and sensitivity and specificity rates of 90% and 89%, respectively, for the detection of SLE and NPSLE disorders using machine learning models.

So far, the performance accuracies that have been noted for the rheumatic detection models range from 75% to 97%, which are lower than our performance rate of 97.48%. Although the input data sources used for our rheumatic detection model are not the same as those of most of the studies reported, the detection accuracy of 97.48% achieved by our MLNN-based detection model shows comparable detection performance. Furthermore, in most of the rheumatic detection models, the role of the primary healthcare physicians is limited in the diagnostic process in terms of physical interaction with patients and the acquisition of data for the diagnosis.

6. Conclusions

In this work, we have demonstrated the potential of a supervised machine learning algorithm that is based on MLNN for the detection and prediction of rheumatic disorders. The rheumatic disorders for the study were limited to RA, OA, and SLE disorders. Unlike many rheumatic detection systems that make use of radiographic imaging data and laboratory findings as input sources to the machine learning models, the proposed MLNN-based detection system uses information sources from the medical and family history of the patient and data from the clinical examinations using standard medical procedures. The detection system was implemented and tested in experiments using data from blind tests and clinically diagnosed records of patients. The experimental results showed that the MLNN-based detection model is capable of accurately detecting and predicting prevailing RA, OA, and SLE disorder types in diagnosis with probability rates greater than 97%. The rheumatic detection system achieved a precision rate of 97.48% of the instances with accuracy, sensitivity, and specificity of 99.71%, 96.80%, and 97.50%, respectively.

Unlike many proposed detection models, which mostly focus on only parts of the medical data of patients for the detection of rheumatic conditions, this work provides a platform for most of the data a healthcare physician would require for easy and fast screening of patients before further investigations are conducted to confirm the findings from laboratory and radiographic imaging information. This is a much-needed solution in environments that have a limited number of specialists or a very high patient-to-specialist ratio and that therefore depend largely on primary care physicians in communities. Through the task-shifting concept being promoted in this work, the knowledge from the specialists is shared without compromising on the clinical decisions. This would help reduce the workload that specialists face as a result of the high patient-to-specialist ratio. Furthermore, the proposed MLNN-based rheumatic detection solution could also serve as a training tool for medical students and related personnel in the diagnosis of rheumatic conditions.

Although the proposed rheumatic detection system focused on three key disorders, the scope is being expanded to include other rheumatic disorders such as sarcoidosis, JIA, MCTDx, gouty arthritis, spondylosis, scleroderma, Sjögren syndrome, and psoriatic arthritis, to provide a holistic model to support clinicians. Furthermore, integrating other medical information from laboratory findings and imaging investigations into the current detection model could further improve the clinical decisions supported by the detection system. Additionally, since real patient data are important for increasing confidence in the predictions, a usability study will help refine and improve the performance of the algorithm. To empower patients and support their engagement and adherence to treatment plans, the solution will be integrated with an interface that allows patients to monitor and track changes in their symptoms over time, together with event triggers that automate alerts to the patients.

Author Contributions

Conceptualization, G.A.M. and D.D.; methodology, G.A.M., D.D., M.K. and A.Y.; software, M.K., A.Y. and K.B.; validation, G.A.M., D.D., M.K. and A.Y.; formal analysis, G.A.M., D.D., M.K. and A.Y.; investigation, M.K., A.Y. and K.B.; resources, G.A.M., D.D., M.K., A.Y. and K.B.; data curation, D.D., M.K. and A.Y.; writing—original draft preparation, M.K. and A.Y.; writing—review and editing, G.A.M. and D.D.; visualization, G.A.M., D.D. and K.B.; supervision, G.A.M. and D.D.; project administration, G.A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. It was financed by the authors.

Institutional Review Board Statement

Ethical review and approval were waived at this stage of the study since the clinical scenarios were created by rheumatologists based on patient presentations for different diagnoses. The steps described in the work do not fulfill the criteria of a therapeutic experiment. Therefore, the work was not subjected to the assessment of the ethics committee of the Teaching Hospital.

Informed Consent Statement

Patient consent was not needed for the clinical data, as the study required no direct contact with patients or their data, and the patients’ personal information was anonymized in the article. The data provided were not sufficiently detailed to allow direct or indirect identification of the patients.

Data Availability Statement

The raw data supporting the conclusions of this study will be made available by the authors on request.

Acknowledgments

The authors gratefully acknowledge the support from the Rheumatology Unit at the Korlebu Teaching Hospital for this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Majithia, V.; Geraci, S.A. Rheumatoid arthritis: Diagnosis and management. Am. J. Med. 2007, 120, 936–939.
2. Deane, K.D.; Demoruelle, M.K.; Kelmenson, L.B.; Kuhn, K.A.; Norris, J.M.; Holers, V.M. Genetic and environmental risk factors for rheumatoid arthritis. Clin. Rheumatol. 2017, 31, 3.
3. Hügle, M.; Omoumi, P.; Van Laar, J.M.; Boedecker, J.; Hügle, T. Applied machine learning and artificial intelligence in rheumatology. Rheumatol. Adv. Pract. 2020, 4, rkaa005.
4. Kumar, P.; Alok, R.; Das, S.K.; Srivastava, R.; Agarwal, G.G. Distribution of rheumatological diseases in rural and urban areas: An adapted COPCORD Stage I Phase III survey of Lucknow district in north India. Int. J. Rheum. Dis. 2018, 21, 1894–1899.
5. Hazes, J.M.W.; Luime, J.J. The epidemiology of early inflammatory arthritis. Nat. Rev. Rheumatol. 2011, 7, 381–390.
6. da Mota, L.M.H.; dos Santos Neto, L.L.; de Carvalho, J.F.; Pereira, I.A.; Burlingame, R.; Ménard, H.A.; Laurindo, I.M.M. The presence of anti-citrullinated protein antibodies (ACPA) and rheumatoid factor on patients with rheumatoid arthritis (RA) does not interfere with the chance of clinical remission in a follow-up of 3 years. Rheumatol. Int. 2012, 32, 3807–3812.
7. Agrawal, S.; Misra, R.; Aggarwal, A. Autoantibodies in rheumatoid arthritis: Association with severity of disease in established RA. Clin. Rheumatol. 2007, 26, 201–204.
8. Shamir, L.; Ling, S.M.; Scott, W.W., Jr.; Bos, A.; Orlov, N. Knee X-ray image analysis method for automated detection of osteoarthritis. IEEE Trans. Biomed. Eng. 2009, 56, 407–415.
9. Kourilovitch, M.; Galarza-Maldonado, C.; Ortiz-Prado, E. Diagnosis and classification of rheumatoid arthritis. J. Autoimmun. 2014, 48–49, 26–30.
10. Mills, G.A.; Pomary, P.; Togo, E.; Sowah, R.A. Detection and management of P2P traffic in networks using artificial neural networks. J. Netw. Syst. Manag. 2022, 30, 26.
11. Alam, A.; Ahamad, M.K.; Mohammed Aarif, K.O.; Anwar, T. Detection of rheumatoid arthritis using CNN by transfer learning. In Artificial Intelligence and Autoimmune Diseases. Studies in Computational Intelligence; Raza, K., Singh, S., Eds.; Springer: Berlin/Heidelberg, Germany, 2024; Volume 1133.
12. Hassanzadeh, T.; Shamonin, D.P.; Li, Y.; Krijbolder, D.I.; Reijnierse, M.; van der Helm-van Mil, A.H.M.; Stoel, B.C. A deep learning-based comparative MRI model to detect inflammatory changes in rheumatoid arthritis. Biomed. Signal Process. Control 2024, 88, 105612.
13. Khatoon, M.M.; Singh, B.R.N.; Harshita, M.S.; Sreeja, K.; Reddy, S.S.; Latha, J.S. Automated diagnosis of rheumatoid arthritis based on CNN. In Proceedings of the International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI), Chennai, India, 25–26 May 2023; pp. 1–5.
14. Sakaria, S.; Jain, S.; Rana, M.K. Rheumatoid arthritis predictor using ML techniques and explainable AI. In Proceedings of the 2023 International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE), Ballar, India, 29–30 April 2023; pp. 1–7.
15. Sundaramurthy, S.C.; Kshirsagar, P. Prediction and classification of rheumatoid arthritis using ensemble machine learning approaches. In Proceedings of the 2020 International Conference on Decision Aid Sciences Application (DASA), Sakheer, Bahrain, 8–9 November 2020; pp. 17–21.
16. Khan, A.; Usman, M. Early diagnosis of Alzheimer’s disease using machine learning techniques: A review paper. In Proceedings of the 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), IEEE, Lisbon, Portugal, 12–14 November 2015; Volume 1, pp. 380–387.
17. Li, Y.; Hassanzadeh, T.; Shamonin, D.P.; Reijnierse, M.; van der Helm-van Mil, A.H.M.; Stoel, B.C. Rheumatoid arthritis classification and prediction by consistency-based deep learning using extremity MRI scans. Biomed. Signal Process. Control 2024, 91, 105990.
18. Yoo, J.; Lim, M.K.; Ihm, C.; Choi, E.S.; Kang, M.S. A study on prediction of rheumatoid arthritis using machine learning. Int. J. Appl. Eng. Res. 2017, 12, 9858–9862.
19. Li, Y.; Zhao, L. Application of machine learning in rheumatic immune diseases. J. Healthc. Eng. 2022, 2022, 9.
20. Jiang, M.; Li, Y.; Jiang, C.; Zhao, L.; Zhang, X.; Lipsky, P.E. Machine learning in rheumatic diseases. Clin. Rev. Allergy Immunol. 2021, 60, 96–110.
21. Kim, K.J.; Tagkopoulos, I. Application of machine learning in rheumatic disease research. Korean J. Intern. Med. 2019, 34, 708–722.
22. Ceccarelli, F.; Lapucci, M.; Olivieri, G.; Sortino, A.; Natalucci, F.; Spinelli, F.R.; Alessandri, C.; Sciandrone, M.; Conti, F. Can machine learning models support physicians in systemic lupus erythematosus diagnosis? Results from a monocentric cohort. Jt. Bone Spine 2022, 89, 105292.
23. Bas, S.; Genevay, S.; Meyer, O.; Gabay, C. Anti-cyclic citrullinated peptide antibodies, IgM and IgA rheumatoid factors in the diagnosis and prognosis of rheumatoid arthritis. Rheumatology 2003, 42, 677–680.
24. Deo, R.C. Machine learning in medicine. Circulation 2015, 132, 1920–1930.
25. Kay, J.; Upchurch, K.S. ACR/EULAR 2010 Rheumatoid Arthritis classification criteria. Rheumatology 2012, 51, vi5–vi9.
26. Momtazmanesh, S.; Nowroozi, A.; Rezaei, N. Artificial intelligence in rheumatoid arthritis: Current status and future perspectives: A state-of-the-art review. Rheumatol. Ther. 2022, 9, 1249–1304.
27. Orange, D.E.; Agius, P.; DiCarlo, E.F.; Robine, N.; Geiger, H.; Szymonifka, J.; McNamara, M.; Cummings, R.; Andersen, K.M.; Mirza, S.; et al. Identification of three rheumatoid arthritis subtypes by machine learning integration of synovial histologic features and RNA sequencing data. Arthritis Rheumatol. 2018, 70, 690–701.
28. Jamian, L.; Wheless, L.; Crofford, L.J.; Barnado, A. Rule-based and machine learning algorithms identify patients with systemic sclerosis accurately in electronic health record. Arthritis Res. Ther. 2019, 21, 305.
29. Jorge, A.; Castro, V.M.; Barnado, A.; Gainer, V.; Hong, C.; Cai, T.; Carroll, R.; Denny, J.C.; Crofford, L.; Constenbader, K.H.; et al. Identifying lupus patients in electronic health records: Development and validation of machine learning algorithms and application of rule-based algorithms. Semin. Arthritis Rheum. 2019, 49, 84–90.
30. Norgeot, B.; Glicksberg, B.S.; Trupin, L.; Lituiev, D.; Gianfrancesco, M.; Oskotsky, B.; Schmajuk, G.; Yazdany, J.; Butte, A.J. Assessment of a deep learning model based on electronic health record data to forecast clinical outcomes in patients with rheumatoid arthritis. JAMA Netw. Open 2019, 2, e190606.
31. Elkin, P.L.; Schlegel, D.R.; Anderson, M.; Komm, J.; Ficheur, G.; Bisson, L. Artificial Intelligence: Bayesian versus Heuristic method for diagnostic decision support. Appl. Clin. Inform. 2018, 9, 432–439.
32. Dang, S.D.H.; Allison, L. Using deep learning to assign rheumatoid arthritis scores. In Proceedings of the 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science, Las Vegas, NV, USA, 11–13 August 2020; pp. 399–402.
33. Vodencarevic, A.; Tascilar, K.; Hartmann, F.; Reiser, M.; Hueber, A.J.; Haschka, J.; Bayat, S.; Meinderink, T.; Knitza, J.; Mendez, L.; et al. Advanced machine learning for predicting individual risk of flares in rheumatoid arthritis patients tapering biologic drugs. Arthritis Res. Ther. 2021, 23, 67.
34. Zhou, S.M.; Fernandez-Gutierrez, F.; Kennedy, J.; Cooksey, R.; Atkinson, M.; Denaxas, S.; Siebert, S.; Dixon, W.G.; O’Neill, T.W.; Choy, E.; et al. Defining disease phenotypes in primary care electronic health records by a machine learning approach: A case study in identifying rheumatoid arthritis. PLoS ONE 2016, 11, e0154515.
35. Gornale, S.S.; Patravali, P.U.; Manza, R.R. Detection of osteoarthritis using knee x-ray image analyses: A machine vision based approach. Int. J. Comput. Appl. 2016, 145, 20–26.
36. Walsh, J.A.; Rozycki, M.; Yi, E.; Park, Y. Application of machine learning in the diagnosis of axial spondyloarthritis. Curr. Opin. Rheumatol. 2019, 31, 362–367.
37. Liu, C.; Cheng, S.; Chen, C.; Qiao, M.; Zhang, W.; Shah, A.; Bai, W.; Arcucci, R. M-FLAG: Medical vision-language pre-training with frozen language models and latent space geometry optimization. arXiv 2023, arXiv:2307.08347.
38. Saleem, M.; Farid, M.S.; Saleem, S.; Khan, M.H. X-ray image analysis for automated knee osteoarthritis detection. SIViP 2020, 14, 1079–1087.
39. Ho, S.; Elamvazuthi, I.; Lu, C.K. Classification of rheumatoid arthritis using machine learning algorithms. In Proceedings of the 2018 IEEE 4th International Symposium in Robotics and Manufacturing Automation (ROMA), Perambalur, India, 10–12 December 2018; pp. 345–350.
40. Hassan, R.; Faruqui, H.; Alquraa, R.; Eissa, A.; Aishaiki, F.; Cheikh, M. Classification criteria and clinical guidelines for rheumatic diseases. In Skills in Rheumatology [Internet]; Springer: Singapore, 2021; Chapter 25.
41. Korte, S.M.; Straub, R.H. Fatigue in inflammatory rheumatic disorders: Pathophysiological mechanisms. Rheumatology 2019, 58 (Suppl. 5), v35–v50.
42. Kingma, D.P.; Ba, J.L. ADAM: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representation (ICLR), San Diego, CA, USA, 7–9 May 2015.
43. Sowah, R.A.; Agebure, M.A.; Mills, G.A.; Koumadi, K.M.; Fiawoo, S.Y. New cluster undersampling technique for class imbalance learning. Int. J. Mach. Learn. Comput. 2016, 9, 205–214.
44. Kaur, S.; White, S.; Bartold, M. Periodontal disease as a risk factor for rheumatoid arthritis: A systematic review. JBI Libr. Syst. Rev. 2012, 10, 1–12.
45. Zhang, P.; Walker, M.A.; White, J.; Schmidt, D.C.; Lenz, G. Metrics for assessing blockchain-based healthcare decentralized apps. In Proceedings of the 2017 IEEE 19th International Conference on e-Health Networking, Applications, and Services, Healthcom, Dalian, China, 12–15 October 2017; pp. 1–4.
46. Da Conceição, A.F.; Da Silva, F.S.C.; Rocha, V.; Locoro, A.; Barguil, J.M. Electronic Health Records using Blockchain Technology. Cornell University. April 2018. Available online: https://arxiv.org/abs/1804.10078v1 (accessed on 24 April 2024).
47. Üreten, K.; Maraş, H.H. Automated classification of rheumatoid arthritis, osteoarthritis, and normal hand radiographs with deep learning methods. J. Digit. Imaging 2022, 35, 193–199.
48. Olatunji, S.O.; Alansari, A.; Alkhorasani, H.; Alsubaii, M.; Sakloua, R.; Alzahrani, R.; Alsaleem, Y.; Almutairi, M.; Alhamad, N.; Alyami, A.; et al. A novel ensemble-based technique for the preemptive diagnosis of rheumatoid arthritis disease in the eastern province of Saudi Arabia using clinical data. Comput. Math. Methods Med. 2022, 2022, 2339546.
49. Zhou, Y.; Wang, M.; Zhao, S.; Yan, Y. Machine learning for diagnosis of systemic lupus erythematosus: A systematic review and meta-analysis. Comput. Intell. Neurosci. 2022, 2022, 7167066.

Figure 1. Implementation block diagram for the rheumatic disorder detection system.

Figure 2. Markers and features of rheumatic disorders for classification.

Figure 3. MLNN architectural model for the detection system.

Figure 4. Performance results of MLNN architectural design.

Figure 5. Architecture of the detection system application software.

Figure 6. Operational flow diagram of the user application system.

Figure 7. User dashboard system showing historical cases and categories of rheumatic disorders.

Figure 8. Sample user interface for carrying out clinical examination on patients.

Figure 9. Relational diagram for data management.

Figure 10. User application interface system for diagnosing patients.

Figure 11. Sample prediction report of clinically diagnosed patients.

Figure 12. Sample performance evaluation results for RA disorder using the DT algorithm.

Figure 13. Sample performance evaluation results for RA disorder using the KNN algorithm.

Figure 14. Sample performance evaluation results for RA disorder using the NB algorithm.

Table 1. Monthly historical records of diagnosed rheumatic cases from January to July 2022.

Month	No of Cases	% of Males	% of Females	Age Range
January	138	11.59	88.41	20–80
February	162	16.67	83.33	3–72
March	213	10.80	89.20	20–64
April	183	11.48	88.52	9–80
May	182	9.89	90.11	22–68
June	254	6.69	93.31	21–72
July	178	14.04	85.96	19–72

Table 2. Records of diagnosed rheumatic disorders from January to July 2022.

Disorders	No of Cases	% Distribution	Age Range
RA disorder	452	34.50	18–70
OA disorder	36	2.75	45–80
SLE disorder	500	38.17	18–49
Other disorders	322	24.58	9–80

Table 3. Dataset Distribution of Rheumatic disorders by age range.

Age Range	No of Records	% Distribution
<20	7000	7.00
20–40	43,000	43.00
41–60	30,000	30.00
61–70	15,000	15.00
71–80	5000	5.00

Table 4. Dataset Distribution by Rheumatic Disorders.

Disorders	No of Records	% Distribution
RA disorder	34,500	34.50
OA disorder	2750	2.75
SLE disorder	38,170	38.17
Unknown disorder	24,580	24.58
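The class shares in Table 4 match the clinical shares in Table 2 (34.50%, 2.75%, 38.17%, and 24.58%), which suggests that the 100,000 synthetic records were allocated in proportion to the observed case mix. The following is a minimal sketch under that assumption; the function name `allocate` is hypothetical, and the shares are rounded to two decimals before scaling because that is how the tables present them.

```python
# Clinical case counts copied from Table 2 (January to July 2022).
clinical_cases = {
    "RA disorder": 452,
    "OA disorder": 36,
    "SLE disorder": 500,
    "Other disorders": 322,
}

SYNTHETIC_TOTAL = 100_000  # size of the synthetic dataset


def allocate(cases, total):
    """Scale each class to `total` records using its percentage share.

    Assumption: shares are rounded to two decimals first, mirroring the
    '% Distribution' column of the tables.
    """
    n_cases = sum(cases.values())
    shares = {name: round(100 * count / n_cases, 2) for name, count in cases.items()}
    return {name: round(total * pct / 100) for name, pct in shares.items()}


allocation = allocate(clinical_cases, SYNTHETIC_TOTAL)
for name, n_records in allocation.items():
    print(f"{name}: {n_records}")
```

Under this allocation the OA class receives only 2750 records, so the synthetic dataset inherits the class imbalance of the clinical records.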

Table 5. Sample data generation for RA rheumatic disorder.

Record No.	Gender	Wrist Swelling	Elbow Swelling	Joint Locking	History of RA	Fever	Knee Swelling
1	1	1	1	0	0	1	0
2	1	0	0	0	0	0	1
3	1	0	0	0	1	1	1
4	1	1	1	0	0	0	0
5	1	1	1	1	0	1	0
6	0	1	1	0	1	0	0
7	0	1	1	0	0	1	0
8	0	0	0	1	1	0	0
9	1	0	0	1	1	0	1
10	1	1	1	1	1	0	0

Table 6. Extracted rheumatic features for learning model of disorders.

Code	Feature Name	Code	Feature Name	Code	Feature Name
F1	Pain in knuckle joint	F2	Swelling around elbows	F3	Pain in wrist joints
F4	Swelling around the knees	F5	Pain in feet joints	F6	Facial swelling
F7	Pain in shoulder joints	F8	Redness of the skin around swelling	F9	Pain in elbows
F10	Symmetrical swelling	F11	Pain in knees	F12	Reduced range of movement
F13	Pain in ankles	F14	Joint locking	F15	Pain in the hips
F16	Functional difficulty	F17	Pain in the chest	F18	Stiffness for more than an hour
F19	Pain symmetrical	F20	Rashes or physical skin changes	F21	Duration of pain more than 6 weeks
F22	Mouth sores	F23	Pain spreads to other parts of the body	F24	Hair loss
F25	Time of day worsens or improves	F26	Skin lesions worsen with sun exposure	F27	Knuckle joint swelling
F28	History of trauma to joints	F29	Swelling around the wrist	F30	Bony outgrowth in fingers
F31	Swelling in feet	F32	OA in the medical records	F33	Swelling around shoulder joints
F34	Family history of OA	F35	SLE in the medical records	F36	Autoimmune condition in records
F37	Family history of SLE	F38	Family history of RA	F39	Fatigue
F40	Smoking	F41	Fever	F42	Gender
F43	Weight loss	F44	Age	F45	RA in the medical records

Table 7. General parameters for the MLNN model development.

Parameter Description	Parameter Range
Number of input layer neurons	45
Number of output layer neurons	5
Number of hidden layers	Varied
Number of neurons in hidden layers	Varied
Learning rate	0.01 to 0.9
Momentum factor	0.1 to 0.9
Batch size	32
Beta parameters (β₁, β₂)	0.90–0.999
Tolerance parameter (δ)	10⁻⁸
Loss function	Binary/Categorical cross entropy
Number of epochs	2000
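The beta and tolerance parameters in Table 7 correspond to the usual hyperparameters of the Adam optimizer [42]. The following NumPy sketch shows a single Adam update with those values, reading the listed range as β₁ = 0.90 and β₂ = 0.999 and taking the learning rate of 0.05 from Table 8. The function name `adam_step` and the toy quadratic objective are assumptions made for illustration and do not reproduce the training code of the study.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update at iteration t (t >= 1); return (param, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)              # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)  # eps plays the role of δ
    return param, m, v


# Toy objective (hypothetical): f(w) = ||w||^2, whose gradient is 2w.
w0 = np.array([1.0, -2.0])
w1, m1, v1 = adam_step(w0, 2.0 * w0, np.zeros(2), np.zeros(2), t=1)
print("after one step:", w1)

# Continue for a few hundred iterations to watch the iterate approach zero.
w, m, v = w1, m1, v1
for t in range(2, 501):
    w, m, v = adam_step(w, 2.0 * w, m, v, t)
print("after 500 steps:", w)
```

On the first iteration the bias-corrected ratio reduces to the sign of the gradient, so each coordinate moves by almost exactly the learning rate regardless of the gradient magnitude.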

Table 8. Summary MLNN architecture design results at learning rate of 0.05.

Layers	Neurons (in Hidden Layers)	Accuracy (%)	Execution Time (s)
1	L1 = 5	95.45	5.07
1	L1 = 10	97.38	5.36
1	L1 = 15	97.25	5.23
1	L1 = 20	97.41	5.25
2	L1 = 10, L2 = 5	96.01	5.68
2	L1 = 10, L2 = 10	96.12	5.57
2	L1 = 10, L2 = 15	97.48	5.60
2	L1 = 10, L2 = 20	95.48	5.65
3	L1 = 10, L2 = 5, L3 = 10	93.30	6.05
3	L1 = 10, L2 = 10, L3 = 15	94.40	6.23
3	L1 = 10, L2 = 15, L3 = 20	96.25	6.10
3	L1 = 10, L2 = 20, L3 = 20	96.43	5.61
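One way to compare the layouts in Table 8 is by the number of trainable parameters each implies. The following sketch counts weights and biases for each layout, assuming fully connected layers with one bias per neuron, 45 inputs, and 5 outputs as listed in Table 7. The helper name `dense_param_count` is hypothetical.

```python
def dense_param_count(layer_sizes):
    """Count weights plus biases for a fully connected stack of layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


N_INPUTS, N_OUTPUTS = 45, 5  # input and output layer sizes from Table 7

# Hidden-layer layouts in the order they appear in Table 8.
hidden_layouts = [
    (5,), (10,), (15,), (20,),
    (10, 5), (10, 10), (10, 15), (10, 20),
    (10, 5, 10), (10, 10, 15), (10, 15, 20), (10, 20, 20),
]

param_counts = {
    hidden: dense_param_count([N_INPUTS, *hidden, N_OUTPUTS])
    for hidden in hidden_layouts
}
for hidden, count in param_counts.items():
    print(hidden, count)
```

Under these assumptions, the layout with L1 = 10 and L2 = 15 has 705 trainable parameters, which is small relative to the 70,000 training records.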

Table 9. Summary results of performance evaluation of MLNN detection model.

Tests	ACC (%)	PRE (%)	REC (%)	F1-Score (%)
Test 1	97.452	97.452	96.933	97.193
Test 2	97.458	97.458	98.567	98.008
Test 3	97.460	97.460	98.559	98.006
Test 4	97.636	97.636	93.121	95.325
Average	97.502	97.502	96.795	97.133
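The F1-score column and the Average row of Table 9 can be rechecked from the other columns, since F1 is the harmonic mean of precision and recall. The following is a minimal sketch using the values copied from the table; the variable names are illustrative, and small differences in the last digit are expected because the table entries are themselves rounded.

```python
from statistics import mean

# (accuracy, precision, recall) in percent, copied from Table 9.
tests = {
    "Test 1": (97.452, 97.452, 96.933),
    "Test 2": (97.458, 97.458, 98.567),
    "Test 3": (97.460, 97.460, 98.559),
    "Test 4": (97.636, 97.636, 93.121),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


f1 = {name: f1_score(p, r) for name, (_, p, r) in tests.items()}
avg_acc = mean(a for a, _, _ in tests.values())
avg_rec = mean(r for _, _, r in tests.values())
avg_f1 = mean(f1.values())

for name, value in f1.items():
    print(f"{name}: F1 = {value:.3f}")
print(f"Average: ACC = {avg_acc:.3f}, REC = {avg_rec:.3f}, F1 = {avg_f1:.3f}")
```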

Table 10. Sample Blind test data of patients for detection model testing.

Features	Test 1	Test 2	Test 3	Features	Test 1	Test 2	Test 3
F1	0	1	0	F24	0	0	0
F2	0	0	1	F25	0	1	0
F3	1	1	0	F26	0	0	1
F4	0	0	1	F27	0	1	0
F5	1	1	1	F28	0	0	0
F6	0	0	0	F29	1	0	1
F7	0	1	0	F30	0	0	0
F8	0	0	0	F31	1	1	1
F9	0	1	1	F32	0	0	0
F10	0	0	0	F33	0	0	0
F11	1	0	0	F34	0	0	0
F12	0	0	1	F35	0	0	1
F13	1	0	1	F36	0	0	1
F14	0	0	0	F37	0	0	0
F15	0	0	0	F38	0	1	0
F16	0	0	0	F39	1	0	0
F17	1	0	0	F40	0	0	1
F18	1	1	0	F41	0	0	0
F19	1	1	0	F42	1	1	0
F20	1	0	0	F43	0	0	0
F21	1	1	0	F44	2	3	4
F22	0	0	0	F45	0	0	0
F23	0	1	0
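Each column of Table 10 is a 45-element input vector ordered by the feature codes F1 to F45 of Table 6. The following sketch builds the Test 1 vector from only its non-zero entries. The helper name `encode_record` is hypothetical, and the value 2 for F44 (Age) is copied as listed, without assuming how age is coded.

```python
N_FEATURES = 45  # feature codes F1..F45 from Table 6


def encode_record(values):
    """Build a dense feature vector from a {code: value} mapping.

    Codes that are not supplied default to 0 (feature absent).
    """
    vector = [0] * N_FEATURES
    for code, value in values.items():
        index = int(code[1:]) - 1  # "F1" -> position 0
        if not 0 <= index < N_FEATURES:
            raise ValueError(f"unknown feature code: {code}")
        vector[index] = value
    return vector


# Non-zero entries of the Test 1 column in Table 10.
positive_codes = ("F3", "F5", "F11", "F13", "F17", "F18", "F19",
                  "F20", "F21", "F29", "F31", "F39", "F42")
test_1 = {code: 1 for code in positive_codes}
test_1["F44"] = 2  # age value as listed in Table 10

x = encode_record(test_1)
print(x)
```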

Table 11. Prediction results of the Blind Test experiment.

Tests	RA Disorder	SLE Disorder	Unknown Disorder
Test 1	0	0	1
Test 2	1	0	0
Test 3	0	1	0

Table 12. Comparison of detection performance evaluation with other algorithms.

Machine Learning Algorithms	RA (%)	OA (%)	SLE (%)	Average (%)
Decision Tree	77.04	100.0	78.13	85.06
Random Forest	92.25	99.33	85.63	92.40
Support Vector Machine	99.16	99.60	99.43	99.40
Naïve Bayes	97.45	89.93	88.29	91.89
K-Nearest Neighbour	78.59	79.47	81.12	79.73
MLNN	99.44	100.0	99.68	99.71

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
