We evaluated the proposed methodology using several techniques for clinical validation, testing it on a new dataset collected in a clinical hospital setting. The Department of Internal Medicine, Rangpur Medical College and Hospital, Rangpur, Bangladesh, collaborated with us on the clinical validation steps.
9.3. Group Division [20]
Patients were divided into two groups based on the presence or absence of cardiovascular disease (CVD), as determined by a combination of clinical assessments: medical history, physical examination, and additional diagnostic tests as required. The total sample comprised 852 patients, divided as follows: Normal (n = 368), CVD (n = 484). The group sizes follow

$n_{\text{CVD}} = p \times N, \qquad n_{\text{Normal}} = (1 - p) \times N,$

where N is the total number of patients and p represents the prevalence rate of CVD derived from the dataset or assumed from similar populations in prior studies. This ensures that the group division reflects realistic clinical scenarios, enhancing the reliability of our findings.
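The prevalence-based division can be checked with a few lines of Python. This is a minimal sketch, assuming group sizes are obtained as n_CVD = round(p × N) with the remainder assigned to the Normal group; the function name `split_by_prevalence` is hypothetical, and the only inputs taken from the text are the reported group sizes.

```python
def split_by_prevalence(total: int, prevalence: float) -> tuple[int, int]:
    """Return (n_normal, n_cvd) for `total` patients, assuming n_cvd = round(p * N)."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    n_cvd = round(prevalence * total)
    return total - n_cvd, n_cvd


# Prevalence implied by the reported validation set-1 groups (484 CVD of 852 patients).
p_set1 = 484 / (368 + 484)
print(round(p_set1, 3))                  # 0.568
print(split_by_prevalence(852, p_set1))  # (368, 484)
```

The same helper reproduces the set-2 groups when p is taken as 578/1023.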
For the derivation set and clinical validation set-1, the groups comprised Normal (n = 368) and CVD (n = 484). The patient characteristics for these sets are summarized in Table 15 and discussed below.
Age: The mean age in the derivation set was 54.37 years, and the validation set spans younger to elderly adults (29 to 77 years). This wide range is important because it allows the prediction model to be tested across a diverse age group, which is vital for cardiovascular disease modeling, where age is a strong risk factor. The p-value of 0.471 indicates no statistically significant difference in age between the two sets, so the sets are comparable in age; this comparison does not by itself measure how well age discriminates CVD from non-CVD cases.
Sex: Sex is coded 0/1, so the derivation-set mean of 0.68 indicates that 68% of patients carry the code 1, i.e., one sex predominates, while the validation set contains both codes (range 0 to 1), so both sexes are represented. The p-value of 0.493 indicates a similar sex distribution across both sets, which reduces the risk that the model's validation is biased by differences in sex composition and supports building a non-biased predictive model.
Chest pain (cp): The standard deviation and range for chest pain indicate variability in presentation, which exposes the model to a range of symptom patterns. The p-value of 0.845 shows no significant difference between the sets, indicating consistent chest pain patterns, which supports the model's reliability.
Resting blood pressure (trestbps): The mean value of 131.62 mmHg, with a wide range in the validation set, indicates diverse cardiovascular profiles. The non-significant p-value (0.841) suggests that, in the sampled population, resting blood pressure varies widely but similarly in the two sets. While important, trestbps therefore needs to be considered alongside other variables for effective prediction.
Cholesterol (chol): The validation set's mean of 246.26 mg/dL and wide range signify the inclusion of patients with various cholesterol levels. The non-significant p-value (0.594) indicates that cholesterol levels are similarly distributed across the two sets; whether cholesterol is a useful stand-alone predictor of cardiovascular disease is assessed in the multivariate analysis rather than by this comparison.
Fasting blood sugar (fbs): The low mean and range indicate the presence of both diabetic and non-diabetic individuals, yet the p-value (0.863) suggests that fasting blood sugar levels are not significantly different between the sets. For the predictive model, fbs may need to be combined with other risk factors to increase predictive accuracy.
Resting ECG (restecg): A mid-range mean value (0.53), covering the full potential range in the validation set, indicates a variety of cardiac electrical patterns in the study population rather than a single part of the ECG waveform. The two components analysed in this population are ST-T abnormalities and signs of left ventricular hypertrophy. The p-value (0.223) indicates a similar distribution of ECG results in both sets, which is essential for validating the model across typical ECG variations.
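Maximum heart rate achieved (thalach): With a mean of 149.65, thalach is the only characteristic discussed here that differs significantly between the two sets (p = 0.003), and this difference should be taken into account when comparing model performance across the sets. A between-set difference does not in itself establish predictive value, but thalach does emerge as a statistically significant predictor in the multivariate logistic regression reported below, emphasizing its importance in cardiovascular health assessment.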
Exercise-induced angina (exang): A mean of 0.33 indicates that about one-third of patients in the derivation set had exercise-induced angina, and the p-value of 0.856 shows no significant difference from the validation set. This consistency is useful for the model, which can consider the presence of angina in conjunction with other factors.
BMI: A mean BMI of 26.18 falls within the overweight category, and the non-significant p-value (0.732) suggests that BMI distributions are consistent across the derivation and validation sets. BMI is a recognized cardiovascular risk factor, but this comparison shows only that the two sets are similar in BMI, not how strongly BMI differentiates cardiovascular disease status. BMI thresholds for defining overweight and obesity can vary among populations because of differences in body composition and fat distribution. Research has shown that, at the same BMI, Asian and Black populations may have different levels of body fat and associated health risks compared with Caucasian populations. BMI values may therefore need to be stratified according to ethnic origin [21,22].
Asian Populations [21]: Asians generally have a higher percentage of body fat than Caucasians at the same BMI. Consequently, health organizations such as the World Health Organization (WHO) and the International Diabetes Federation (IDF) recommend lower BMI cut-off points for defining overweight and obesity in Asian populations, for instance overweight at BMI ≥ 23 kg/m² and obesity at BMI ≥ 25 kg/m².
Black Populations: Although the evidence on adjusting BMI cut-off points for Black populations is mixed, some studies suggest using additional measures of body composition and fat distribution. One study indicated that the BMI cut-off for obesity in Black populations could be slightly higher than in White populations because of these differences [22].
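To make the effect of population-specific cut-offs concrete, the sketch below classifies a BMI value under the general WHO thresholds (overweight ≥ 25 kg/m², obese ≥ 30 kg/m²) and under lower Asian thresholds. The Asian cut-offs of 23 and 25 kg/m² are an assumption based on the commonly cited WHO Asia-Pacific recommendations, and the helper names `bmi` and `bmi_category` are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_category(value: float, asian_cutoffs: bool = False) -> str:
    """Classify a BMI value; the Asian thresholds (23 and 25 kg/m^2) are an assumed example."""
    overweight, obese = (23.0, 25.0) if asian_cutoffs else (25.0, 30.0)
    if value >= obese:
        return "obese"
    if value >= overweight:
        return "overweight"
    return "below overweight threshold"


print(bmi_category(26.18))                      # overweight
print(bmi_category(26.18, asian_cutoffs=True))  # obese
print(round(bmi(70.0, 1.75), 1))                # 22.9
```

A patient whose BMI equals the reported mean of 26.18 would be classed as overweight under the general thresholds but as obese under the assumed Asian thresholds, which illustrates why stratification by ethnic origin can change risk categorization.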
In addition, thalach, which differed significantly between the sets, was leveraged to predict cardiovascular events more specifically. The consistency across most other indicators suggests that the model could apply to a broader patient population, in line with the research aim of advancing cardiovascular disease prediction by integrating dynamic simulation and ML for enhanced early detection.
For the derivation set and validation set-2, the groups comprised Normal (n = 445) and CVD (n = 578). The characteristics of patients in these sets, which bear on the external validity of CVD prediction, are detailed below.
Age: The mean ages in the derivation set (54.06 ± 8.74 years) and validation set (53.95 ± 8.84 years) are very close (p = 0.932), indicating that the two sets are comparable in age. This negligible difference supports applying the model to predict CVD in adults across a broad age range.
Sex: The derivation and validation sets have mean values of 0.72 ± 0.45 and 0.69 ± 0.46, respectively (p = 0.611). Although one sex predominates in both sets, the proportions are comparable, so differences in sex composition are unlikely to bias the validation of the CVD risk model.
Chest pain (cp): The slight variation in mean values (Dev. set: 1.00 ± 1.04, Val. set: 1.07 ± 1.03) and the p-value of 0.652 imply that chest pain as a symptom is similarly distributed in both sets, supporting the inclusion of this variable in CVD prediction without bias towards either dataset.
Resting blood pressure (trestbps): Mean values are 130.41 ± 16.01 for the derivation set and 131.52 ± 17.18 for the validation set, with a p-value of 0.658. This close similarity suggests that resting blood pressure, despite its critical role in CVD risk assessment, does not differ significantly between the sets, supporting its use in the predictive model.
Cholesterol (chol): Cholesterol levels in the derivation set (239.15 ± 50.07) and the validation set (238.47 ± 48.04) are nearly identical (p = 0.926), indicating that cholesterol is consistently distributed across both sets. This uniformity suggests that cholesterol can serve as a stable predictor of cardiovascular disease (CVD) across datasets, so predictive models developed using cholesterol data from one set are likely to transfer to the other, supporting the relevance of cholesterol in CVD predictive modeling.
Fasting blood sugar (fbs): The derivation and validation sets show similar fasting blood sugar levels (Dev. set: 0.11 ± 0.32, Val. set: 0.12 ± 0.33) with a p-value of 0.877. The model can, therefore, reliably use fbs as a predictor without concern for set-specific bias.
Resting ECG (restecg), maximum heart rate achieved (thalach), Exercise-induced angina (exang): These indicators also demonstrate no significant differences (restecg p = 0.611, thalach p = 0.804, exang p = 0.572), further supporting their incorporation in CVD risk models for a broad patient population.
Oldpeak, slope, number of major vessels (ca), thalassemia (thal): The non-significant p-values across these more technical cardiovascular indicators (oldpeak p = 0.926, slope p = 0.852, ca p = 0.878, thal p = 0.681) highlight a consistency in disease severity markers, suggesting the potential for these factors to contribute reliably to risk prediction models.
Target (CVD presence): The similar distribution of CVD presence (target) in both sets (Dev. set: 0.53 ± 0.50, Val. set: 0.57 ± 0.50, p = 0.549) emphasizes the balanced representation of disease status. This balance is pivotal for validating the model’s ability to distinguish between CVD presence and absence accurately.
Taken together, the non-significant p-values across these indicators support the external validity of the developed model, suggesting that it can be applied across different sets of patients without significant bias. In a clinical setting, this can aid early detection, targeted intervention strategies for cardiovascular disease, personalized patient care, and optimized treatment pathways.
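The text does not state which test produced the between-set p-values. As a hedged illustration, assuming a two-sample comparison of means, the sketch below uses a large-sample normal approximation to Welch's t-test computed from summary statistics (mean ± SD). The per-set sample sizes are not reported here, so n = 500 per set is a hypothetical value and the result is not expected to reproduce the reported p = 0.932. For small samples, `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` gives the exact Welch test, and for binary variables such as sex or fbs a test of proportions (for example, chi-square) would be more usual.

```python
import math


def welch_z_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided large-sample test of a difference in means from summary statistics.

    Uses the Welch standard error with a normal reference distribution, which is a
    good approximation to Welch's t-test when both samples are large.
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Age, derivation vs validation set-2 (mean and SD from the text);
# n = 500 per set is a HYPOTHETICAL sample size, not a reported one.
z, p = welch_z_test(54.06, 8.74, 500, 53.95, 8.84, 500)
print(f"z = {z:.3f}, p = {p:.3f}")
```

With these assumed sample sizes the p-value is well above 0.05, consistent with the text's conclusion that the two sets are comparable in age.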
The results of the multivariate logistic regression analysis presented in Table 17 are critical for the clinical validation of independent risk factors in predicting CVD, and this analysis is an essential part of the research on enhanced early detection. By examining the effect of each indicator on CVD risk and its statistical significance, this research contributes to more accurate predictive models and better-informed clinical decision making.
Age (Coef. = 8.087988, p < 0.0001): Age is a significant predictor of CVD risk, with a relatively large coefficient suggesting a strong influence. The confidence interval (CI) from 4.289382 to 11.886593 excludes zero, reinforcing the robustness of this effect. This emphasizes the need to include age as a core variable in dynamic simulation and ML models for CVD prediction, reflecting its critical role in clinical validation and early detection efforts.
Chest pain (cp, Coef. = −1.772114, p < 0.00001): The negative coefficient indicates that as the severity of chest pain increases, the likelihood of CVD decreases, which may seem counterintuitive. This may suggest that specific types of chest pain are inversely related to certain CVD outcomes or that this variable interacts with others in complex ways. The significant p-value highlights the importance of chest pain as a variable in the predictive model and calls for further investigation into its role.
Resting blood pressure (trestbps, Coef. = 0.950022, p < 0.00001): This factor’s positive coefficient and its statistical significance suggest it is an important predictor of CVD risk. The CI indicates a high level of certainty about its effect size, reinforcing the value of including trestbps in predictive algorithms for clinical applications.
Cholesterol (chol, Coef. = −0.016875, p = 0.0111): The negative coefficient for cholesterol, while small, is statistically significant, suggesting that higher cholesterol levels slightly decrease the log-odds of CVD after adjustment for the other factors. This counterintuitive finding warrants further exploration within the model's framework, particularly of how cholesterol interacts with other risk factors. It may also indicate that stratifying cholesterol into its HDL and LDL subtypes is necessary [23].
Exercise-induced angina (exang, Coef. = 0.019805, p = 0.0039): Exang’s positive coefficient and statistical significance indicate its relevance as a predictor. It suggests that the presence of exercise-induced angina increases CVD risk, which aligns with clinical understanding and underscores its utility in predictive modeling.
Thalassemia (thal, Coef. = −0.727265, p < 0.00001): The significant negative coefficient for thalassemia suggests different types or severities of this condition might be inversely related to CVD risk in the analyzed population. This finding is particularly impactful for clinical validation, highlighting the need to consider genetic or hereditary factors in CVD risk models.
BMI (Coef. = −0.157092, p = 0.000016): The significance of BMI, with a negative coefficient, implies that higher BMI values might, in the context of this model, slightly reduce the log-odds of CVD, which could reflect the complex relationship between obesity, metabolic health, and cardiovascular outcomes.
These results provide a quantitative foundation for validating the predictive accuracy of models developed under the research aim. The significant p-values for most indicators confirm their relevance in predicting CVD, which is essential for the clinical validation of the model. By integrating these statistically significant risk factors into dynamic simulation and ML models, the research advances the capability for early detection of CVD. The models can offer personalized risk assessments, aiding in the identification of at-risk individuals based on a comprehensive analysis of their clinical and physiological data. The validated model, incorporating these key indicators, can be deployed through web applications, providing clinicians and patients with accessible tools for assessing CVD risk. This aligns with the research’s aim to leverage technology for improving cardiovascular health outcomes in clinical practice.
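To illustrate how the quantities in Table 17 (coefficients, p-values, confidence intervals) and the derived odds ratios arise from a multivariate logistic regression, here is a self-contained NumPy sketch that fits the model by Newton-Raphson and computes Wald statistics. It is not the authors' code: the function name `fit_logistic` and the synthetic data are hypothetical, and in practice a package such as statsmodels (`Logit`) reports coefficients, standard errors, p-values and confidence intervals directly. The returned `mean_nll` is the average negative log-likelihood at convergence.

```python
import math

import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson; X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    iterations = 0
    for iterations in range(1, max_iter + 1):
        p = 1.0 / (1.0 + np.exp(-X @ beta))               # predicted probabilities
        info = X.T @ (X * (p * (1.0 - p))[:, None])        # Fisher information X^T W X
        step = np.linalg.solve(info, X.T @ (y - p))        # Newton step from the score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break

    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    se = np.sqrt(np.diag(cov))                             # Wald standard errors
    z = beta / se
    p_values = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])  # two-sided
    ci = np.column_stack([beta - 1.959964 * se, beta + 1.959964 * se])    # 95% Wald CI
    p_clip = np.clip(p, 1e-12, 1.0 - 1e-12)
    mean_nll = -np.mean(y * np.log(p_clip) + (1.0 - y) * np.log(1.0 - p_clip))
    return {
        "coef": beta, "se": se, "p_value": p_values, "ci95": ci,
        "odds_ratio": np.exp(beta), "mean_nll": float(mean_nll), "iterations": iterations,
    }


# Synthetic example (not the study data): intercept plus two standardized predictors.
rng = np.random.default_rng(0)
n = 2000
features = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), features])
true_beta = np.array([-0.5, 1.0, -0.8])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

result = fit_logistic(X, y)
print(result["coef"], result["odds_ratio"], result["mean_nll"], result["iterations"])
```

Newton-Raphson typically converges in a handful of iterations for well-conditioned data, and each odds ratio is simply the exponential of its coefficient.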
Ratio analysis for clinical validation is shown in Table 14, the ROC curve for clinical validation in Figure 19, the characteristics of patients in the derivation set and validation set-1 in Table 15, the characteristics of patients in the derivation set and validation set-2 in Table 16, and the multivariate logistic regression analysis of independent risk factors for clinical validation in Table 17. The statistical analysis provides detailed insight into the likelihood of cardiovascular events. These outcomes are instrumental for clinical validation, ensuring the model's applicability and reliability in a healthcare setting. The significance of these predictors, validated through statistical analysis, supports their clinical relevance: the model can identify key risk factors for cardiovascular events, which are essential for early detection and preventive strategies in a clinical setting. The successful optimization and validation of the model with unseen data underscore its potential for real-world application, indicating that the model generalizes beyond the training dataset, a crucial property for clinical use where diverse patient demographics and conditions are encountered. Coefficients and odds ratios, together with the AUC score, quantify the factors influencing CVD risk. For identifying high-risk patients, variables with significant odds ratios (such as chest pain type, maximum heart rate, and the presence of exercise-induced angina) can help clinicians recognize patients at higher risk of cardiovascular events. We fitted a statistical logistic regression model, whose details are given below.
1. Model performance: The optimization of the logistic regression model terminated successfully after 7 iterations with a final objective-function value of 0.341792. This value is the average negative log-likelihood (cross-entropy loss) per observation; for reference, a model that predicts a probability of 0.5 for every patient scores ln 2 ≈ 0.693. The relatively low value indicates a good fit between the model's predicted probabilities and the observed outcomes, and successful convergence indicates that the model reached a stable solution. In the multivariate logistic regression analysis, this fit supports the clinical validation of independent risk factors for predicting cardiovascular disease (CVD), which underpins early detection and targeted treatment strategies.
2. Coefficient significance:
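Sex (−1.772114 coefficient): Indicates a strong negative association with cardiovascular events: in this model, being male is associated with substantially lower odds of a cardiovascular event than being female (consistent with the odds ratio reported below). This counterintuitive result is a vital consideration for sex-specific risk assessments.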
Chest pain (cp, 0.950022 coefficient): Shows a positive and strong relationship with the likelihood of cardiovascular events, underscoring the importance of chest pain characteristics in predicting cardiovascular risks.
Max heart rate (thalach, 0.019805 coefficient) and slope of the peak exercise ST segment (slope, 0.582247 coefficient): Both have positive associations with cardiovascular events, emphasizing their roles in exercise-related cardiac assessments.
Major Vessels (ca, −0.727265 coefficient) and Thalassemia (thal, −0.892531 coefficient): These features present strong negative associations in the model, suggesting that higher values or presence of these conditions are associated with a decreased predicted risk of heart events. This counterintuitive result might be due to the complex interplay of multiple features within the dataset, where other factors could be compensating for these conditions, thus reducing the overall predicted risk. These results are specific to the model and dataset used and may require further investigation to fully interpret their implications [24,25,26].
3. p-Values and confidence intervals: Variables such as sex, cp, thalach, exang, oldpeak, ca, and thal show statistically significant p-values (p < 0.05), confirming their importance in the model. The confidence intervals estimate the precision of the coefficients, further reinforcing the reliability of these variables in predicting cardiovascular outcomes.
4. Odds ratios and coefficients: Each variable's coefficient and odds ratio describe how changes in that variable are associated with the odds of experiencing a cardiovascular event.
Age (odds ratio: 0.987708): The odds ratio below 1 (negative coefficient) indicates that with each additional year, the odds of a heart event decrease slightly, although the effect is minimal.
Sex (odds ratio: 0.197894): Indicates males are at a lower risk of experiencing a cardiovascular event compared to females in this model context, which is counterintuitive and warrants further investigation given that clinical research often shows higher CVD risk in males.
Chest pain (cp, odds ratio: 2.516810): A significant predictor with a positive coefficient, indicating that as the severity of chest pain increases, so does the likelihood of a cardiovascular event.
Max heart rate (thalach, odds ratio: 1.020009): Higher maximum heart rates are slightly associated with increased odds of cardiovascular events.
Exercise-induced angina (exang, odds ratio: 0.481620) and Oldpeak (odds ratio: 0.616033): Both have negative coefficients, indicating that the presence of exercise-induced angina and greater ST depression are associated with lower odds of heart events in this model. Because both typically indicate higher risk in clinical contexts, this result should be interpreted within the model's overall predictive context.
Major Vessels (ca, Odds Ratio: 0.491296) and Thalassemia (thal, Odds Ratio: 0.419880): These features show strong negative associations with the likelihood of heart events. An odds ratio below 1 means that as the feature value increases, the odds of the outcome (here, heart events) decrease; according to this model, a higher number of major vessels detected by fluoroscopy or certain types of thalassemia therefore lower the odds of heart events. For instance, an OR of 0.491296 for major vessels means that each additional major vessel is associated with approximately a 51% reduction in the odds of heart events (1 − 0.491296 ≈ 0.51). This contradicts clinical knowledge, since both thalassemia and major-vessel abnormalities are known to be associated with increased cardiovascular risk. The counterintuitive finding may arise because other influential features in the model mitigate the risk associated with these variables, and because the dataset does not specify which major-vessel abnormalities were visualised. Further investigation of the data and model, including potential confounding factors and the specific characteristics of the dataset, is therefore needed.
An odds ratio (OR) greater than 1 indicates a positive association with the outcome (higher odds of CVD), whereas an OR less than 1 indicates a negative association (lower odds of CVD). Factors such as chest pain type (cp), resting electrocardiogram results (restecg), maximum heart rate achieved (thalach), and the slope of the peak exercise ST segment (slope) have ORs greater than 1, suggesting a positive association with CVD. Conversely, factors like age, sex, resting blood pressure (trestbps), serum cholesterol (chol), fasting blood sugar (fbs), exercise-induced angina (exang), ST depression induced by exercise relative to rest (oldpeak), the number of major vessels colored by fluoroscopy (ca), thalassemia (thal), and body mass index (BMI) have ORs less than 1, indicating a negative association with CVD. The negative associations may arise due to various reasons such as non-linear relationships between the clinical factors and CVD, the presence of confounding factors, population-specific effects in the dataset, and multicollinearity among predictors. Non-linear relationships suggest that extremely high or low values of these factors might have different impacts compared to moderate values. Confounding factors and population-specific effects might skew the results based on the characteristics of the dataset used. Multicollinearity, where predictors are correlated with each other, can also affect the estimated coefficients and ORs. Despite the negative associations observed for some factors, their clinical relevance remains significant. For instance, the well-established importance of cholesterol and BMI in CVD risk assessment should be considered in a broader clinical context. Therefore, while odds ratios provide insights into associations between various factors and CVD, it is crucial to consider potential confounders, the characteristics of the dataset, and the overall clinical context.
5. AUC score (0.9373): The area under the ROC curve (AUC) summarizes a classification model's performance across all threshold settings. An AUC of 0.9373 means that, for a randomly chosen patient who experienced a cardiovascular event and a randomly chosen patient who did not, the model assigns the higher predicted risk to the former about 93.7% of the time. An AUC this close to 1 indicates excellent discrimination and strong predictive capability.
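The interpretations in items 4 and 5 can be made concrete with a small sketch. It assumes the standard logistic-regression relationship OR = exp(coefficient), the percentage change in odds (OR^k − 1) × 100 for a k-unit change, and the pairwise (Mann-Whitney) definition of AUC; the helper names are hypothetical, and the AUC labels and scores are toy values, not study data. `sklearn.metrics.roc_auc_score` returns the same AUC value for such inputs.

```python
import math


def odds_ratio(coef: float) -> float:
    """Odds ratio for a one-unit increase in a predictor with logistic coefficient `coef`."""
    return math.exp(coef)


def percent_change_in_odds(or_value: float, units: float = 1.0) -> float:
    """Percentage change in the odds for a change of `units` in the predictor."""
    return (or_value ** units - 1.0) * 100.0


def auc_pairwise(y_true, scores) -> float:
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Values taken from the text above.
print(round(odds_ratio(0.019805), 4))                  # 1.02  (thalach coefficient)
print(round(percent_change_in_odds(0.491296), 1))      # -50.9 (ca: about 51% lower odds per vessel)
print(round(percent_change_in_odds(1.020009, 10), 1))  # 21.9  (thalach: +10 beats per minute)
# Toy labels and predicted risks (illustrative only, not study data).
print(auc_pairwise([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of patients, so it is meant only to show what the AUC measures; rank-based implementations are used for large datasets.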
9.4. Model Generalization Phase for Clinical Validation
To determine whether the predictive model is clinically relevant and accurate in a real-world setting, we further tested the model on unseen data. These clinical data were not used in training the model, so this test constitutes external validation; combined with cross-validation during model fitting, it helps minimize overfitting and gives a realistic estimate of the model's predictive ability on new data. The model generalization phase for clinical validation is shown in Figure 20.
Figure 20 shows a Lasso regression cross-validation plot relating the logarithm of the penalty parameter lambda (λ) to the mean squared error (MSE) during model training. The low and stable cross-validated error supports the model's potential for reliable deployment in real-world clinical settings for CVD prediction.
Mean squared error (MSE): The vertical axis represents the MSE, a measure of the average squared difference between the observed actual outcomes and the outcomes predicted by the model. A lower MSE indicates a more accurate model.
log(λ) values: The horizontal axis shows the log-transformed lambda values. Lambda (λ) is the penalty parameter in Lasso regression, which controls the strength of the regularization applied to the model. Regularization can shrink the coefficients of less important features to zero, thus performing feature selection.
Optimal point: The dashed vertical line marks the λ value that minimizes the cross-validated MSE. At this optimal λ, the model achieves the best balance between bias and variance, so it is neither overfitting nor underfitting.
Error bars: The red-shaded region with vertical lines represents the variability of the MSE across the different folds of cross-validation. The width of the error bars indicates the stability of the model’s performance; smaller bars mean that the performance is more stable across different subsets of the data.
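The cross-validation procedure behind Figure 20 can be sketched in NumPy. This is a hypothetical illustration, not the authors' pipeline: it assumes the squared-error Lasso objective (1/(2n))‖y − Xβ‖² + λ‖β‖₁ (the convention used, for example, by scikit-learn's `Lasso`, where λ is called `alpha`), solves it by coordinate descent, and computes the mean and standard error of the MSE for each λ across K folds. Plotting `mean_mse ± se_mse` against `np.log(lambdas)` gives a curve of the kind shown in Figure 20, and `best_lambda` corresponds to the dashed line. scikit-learn's `LassoCV` provides the same procedure (selected penalty in `alpha_`, per-fold errors in `mse_path_`).

```python
import numpy as np


def soft_threshold(a, lam):
    """Soft-thresholding operator S(a, lam) = sign(a) * max(|a| - lam, 0)."""
    return np.sign(a) * max(abs(a) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.

    X and y are assumed to be centred, so no intercept is fitted here.
    """
    n, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()                 # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]                # add back feature j's contribution
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]                # remove the updated contribution
    return b


def lasso_cv(X, y, lambdas, k=5, seed=0):
    """K-fold cross-validated MSE for each lambda; returns (best_lambda, mean_mse, se_mse)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    mse = np.empty((len(lambdas), k))
    for f, test_idx in enumerate(folds):
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        x_mean, y_mean = X[train_idx].mean(axis=0), y[train_idx].mean()
        X_tr, y_tr = X[train_idx] - x_mean, y[train_idx] - y_mean
        for i, lam in enumerate(lambdas):
            b = lasso_cd(X_tr, y_tr, lam)
            pred = (X[test_idx] - x_mean) @ b + y_mean
            mse[i, f] = np.mean((y[test_idx] - pred) ** 2)
    mean_mse = mse.mean(axis=1)
    se_mse = mse.std(axis=1, ddof=1) / np.sqrt(k)   # error-bar half-widths
    best_lambda = lambdas[int(np.argmin(mean_mse))]
    return best_lambda, mean_mse, se_mse


# Synthetic data (not the study data): 2 informative and 4 irrelevant features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = X @ np.array([1.5, -2.0, 0.0, 0.0, 0.0, 0.0]) + rng.normal(size=200)
lambdas = np.logspace(-3, 0, 20)
best_lambda, mean_mse, se_mse = lasso_cv(X, y, lambdas)
print("log(best lambda):", np.log(best_lambda))
```

Above a data-dependent threshold, every coefficient is shrunk exactly to zero, which is how the Lasso performs the feature selection described in the text.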
The graph supports the clinical validation phase by illustrating that the model, when applied to unseen data, maintains a relatively low and stable MSE across different values of regularization strength. This suggests that the model has generalized well beyond the specific conditions of the training dataset.
The Lasso regression technique inherently performs feature selection, which could mean that only the most predictive features are retained, reducing the risk of overfitting and improving the model's performance on unseen data.
By choosing the optimal lambda value through cross-validation, the model demonstrates its ability to maintain performance when applied to new data, thus supporting its use in a clinical setting where the ability to generalize to new patient data is critical.
The use of cross-validation in determining the optimal penalty reinforces the model’s credibility. It indicates that the model’s performance is tested in a way that mimics its future application on different patient data, ensuring that the performance metrics are not overly optimistic.
Clinical validation through Lasso regression cross-validation is important for CVD prediction because it shows that the method is not only theoretically sound but also practically reliable. By identifying the most influential predictors and reducing overfitting, the model becomes a dependable tool for clinicians and patients alike and generalizes to real-world clinical settings. The precise calibration of the model through clinical validation enhances its predictive capabilities, which are central to early detection and can lead to better patient outcomes. The validated model can be integrated with web application technologies, providing a foundation for user-friendly tools that support healthcare decision making. The rigorous validation process confirms the model's utility in clinical practice, encouraging its adoption and fostering trust in its predictive insights among healthcare providers and patients. This validated model has the potential to substantially improve the early detection of CVD, supporting the project's mission to leverage technology for improved health outcomes.