3.1. Pilot Study of Patients with Chronic Heart Failure and Chronic Obstructive Pulmonary Disease
First, it is important to mention that neither the smart pillbox (HF group) nor the smart inhalers (COPD group) provided usable information: the patients either did not use them correctly or did not use them often enough, so the devices could not generate meaningful, statistically usable data. Most of the HF patients did not use the smart pillbox at all. To avoid influencing the adherence of the participants, it was not possible to inform them at the beginning of the pilot that the frequency of use of the pillbox was being measured. Instead, they were told that they were being included to help improve the design of the pillbox. Monitoring adherence was a challenging task, especially from an ethical perspective [47], which is even more pertinent in the age of digital medicine [48]. Participants were briefed about the true nature of the pillbox after the study. Most of the participants were not interested in improving the design of the pillbox, which was also an interesting finding. The underuse of smart inhalers was also an issue. This was especially problematic if the device was detached from the inhaler or if the inhalers had different mechanical designs, leading to incompatibility issues. High failure rates have also been reported, ranging from 13 to 16%, together with recommendations for implementing stricter quality controls [49].
The administration of the questionnaires throughout the pilot study yielded the following results (Figure 9). For the COPD group, the mean attractiveness score was 0.991, with a median of 1.000 and a standard deviation of 0.916. The distribution of the attractiveness scores showed a slight positive skew (skew = 0.207) and a platykurtic shape (kurtosis = −0.663). The Shapiro–Wilk test was conducted to assess normality; it yielded a test statistic of 0.972, with a corresponding p-value of 0.954. These results indicated that the attractiveness score distribution for the COPD group can be considered approximately normal.
The HF group exhibited a moderately positive perception of the visual appeal of the application, with a mean attractiveness score of 1.341 (SD = 1.170). The distribution of attractiveness scores was slightly negatively skewed (−0.658), indicating that a majority of participants rated the attractiveness of the application higher than the mean. Furthermore, the kurtosis value of −0.455 suggested a relatively flat distribution compared to a normal distribution. The Shapiro–Wilk test (p < 0.001) indicated a deviation from normality, implying that the data distribution significantly deviates from a normal pattern.
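The skewness and kurtosis values reported throughout this section can be illustrated with a minimal sketch. It assumes the simple moment-based definitions (statistical packages often apply a small-sample correction, so their values may differ slightly), and the scores below are hypothetical, not study data.

```python
def moment_skew_kurtosis(values):
    """Return (skewness, excess kurtosis) computed from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance (population form)
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurtosis


# Hypothetical scale scores on the UEQ range of -3 to +3 (assumption).
scores = [-1.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
skew, kurt = moment_skew_kurtosis(scores)
print(f"skew = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```

A negative skew means the longer tail lies towards lower scores; a negative excess kurtosis means a flatter, lighter-tailed shape than the normal distribution, and a positive value means a more peaked, heavier-tailed shape.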
Moving on to UEQ Perspicuity (Figure 9), the mean score for the COPD group was 1.464, with a median of 1.250 and a standard deviation of 0.982. The distribution of the perspicuity scores appeared to be relatively symmetric, as indicated by a skewness value of 0.200. The kurtosis value was −1.220, suggesting a platykurtic distribution. The Shapiro–Wilk test yielded a test statistic of 0.972, with a corresponding p-value of 0.929, suggesting that the distribution of perspicuity scores was approximately normal. Participants in the HF group reported a relatively high level of perceived clarity and comprehensibility in the design of the application, as indicated by a mean perspicuity score of 2.063 (SD = 1.150). The distribution was negatively skewed (−1.243), suggesting that most of the participants perceived the application’s perspicuity positively. Kurtosis of 0.564 indicated a slightly peaked distribution compared to a normal distribution. The Shapiro–Wilk test (p = 0.001) suggested a departure from normality.
For the COPD group, the mean UEQ efficiency score was 0.881, with a median of 1.000 and a standard deviation of 1.147 (Figure 9). The efficiency score distribution showed a slight negative skew (skew = −0.312) and a kurtosis value of −0.364, suggesting a distribution that was relatively close to a normal shape. The Shapiro–Wilk test yielded a test statistic of 0.972, with a corresponding p-value of 0.978, which indicated that the distribution of efficiency scores was approximately normal.
The mean efficiency score of the HF group was 1.375 (SD = 1.151), which reflects a favourable perception of the efficiency of the application at the completion of the task. The distribution had a slightly negatively skewed shape (−0.383), implying that most participants perceived the application as efficient. A kurtosis of −0.863 indicated a flatter distribution than a normal distribution. The Shapiro–Wilk test (p = 0.381) did not indicate a significant departure from normality at the 0.05 level.
The COPD group had a mean UEQ dependability score of 0.988, with a median of 0.500 and a standard deviation of 1.198 (Figure 9). The distribution of the dependability scores appeared slightly positively skewed (skew = 0.298) and exhibited a kurtosis value of −1.081, suggesting a platykurtic distribution. The Shapiro–Wilk test yielded a test statistic of 0.972, with a corresponding p-value of 0.936, indicating that the distribution of dependability scores can be considered approximately normal.
Participants in the HF group reported positive perceptions of the reliability of the application, with a mean dependability score of 1.700 (SD = 1.015). The distribution showed a slightly negatively skewed shape (−0.301), indicating that most of the participants perceived the application as reliable. The kurtosis of −1.022 suggested a relatively flat distribution with thin tails. The Shapiro–Wilk test (p = 0.160) did not indicate a significant departure from normality at the 0.05 level.
For the COPD group, the mean UEQ stimulation score was 0.821, with a median of 0.750 and a standard deviation of 0.949 (Figure 9). The distribution of stimulation scores showed a slight negative skew (skew = −0.305) and a kurtosis value of 0.311, indicating a distribution that was relatively close to normal. The Shapiro–Wilk test yielded a test statistic of 0.972, with a corresponding p-value of 0.981, suggesting that the distribution of the stimulation scores was approximately normal.
The mean stimulation score in the HF group was 1.125 (SD = 1.111), suggesting that a moderate level of perceived stimulation was provided by the characteristics of the application. The distribution was relatively symmetrical, as indicated by a skew of −0.086. A kurtosis of −1.387 implied a flatter distribution than a normal distribution with thin tails. The Shapiro–Wilk test (p = 0.104) did not indicate a significant departure from normality at the 0.05 level.
The COPD group had a mean UEQ novelty score of 0.929, with a median of 1.000 and a standard deviation of 0.867 (Figure 9). The distribution of novelty scores exhibited a negative skew (skew = −1.159) and a kurtosis value of 1.916, suggesting a distribution with a long left tail and a leptokurtic shape. The Shapiro–Wilk test yielded a test statistic of 0.972, with a corresponding p-value of 0.921, indicating that the distribution of novelty scores could be considered approximately normal.
The HF group reported a mean novelty score of 0.775 (SD = 1.029), indicating a moderate perception of the novelty of the application. The distribution was relatively symmetric, with a skew of −0.145. A kurtosis of 0.280 suggested a slightly peaked distribution compared to a normal distribution. The Shapiro–Wilk test (p = 0.607) did not indicate a significant departure from normality at the 0.05 level.
For the COPD group, the mean pragmatic quality score was 0.476, with a median of 0.500 and a standard deviation of 0.774 (Figure 9). The distribution of pragmatic quality scores showed a negative skew (skew = −0.840) and a kurtosis value of 0.746, indicating that the distribution was slightly skewed to the left and slightly more peaked than a normal distribution. The Shapiro–Wilk test yielded a test statistic of 0.972, with a corresponding p-value of 0.937, suggesting that the distribution of pragmatic quality scores was approximately normal.
The mean pragmatic quality score of the participants in the HF group was 0.662 (SD = 0.660), reflecting a favourable perception of the practical usefulness of the application. The distribution was relatively symmetrical, with a skew of −0.149. A kurtosis of −0.981 indicated a flatter distribution than a normal distribution. The Shapiro–Wilk test (p = 0.079) did not indicate a significant departure from normality at the 0.05 level.
The COPD group had a mean UEQ hedonic quality score of 0.726, with a median of 0.750 and a standard deviation of 1.003 (Figure 9). The distribution of the hedonic quality scores exhibited a negative skew (skew = −0.560) and a kurtosis value of −0.082, indicating a distribution that was slightly skewed to the left and close to normal in peakedness. The Shapiro–Wilk test yielded a test statistic of 0.972, with a corresponding p-value of 0.951, suggesting that the distribution of hedonic quality scores could be considered approximately normal.
The mean hedonic quality score among the HF group was 1.063 (SD = 0.862), suggesting a positive perception of the application’s ability to provide enjoyable interactions. The distribution was positively skewed (0.707), with a longer tail towards higher ratings. A kurtosis of −0.392 implied a somewhat flatter distribution than a normal distribution. The Shapiro–Wilk test (p = 0.069) did not indicate a significant departure from normality at the 0.05 level.
For the COPD group, the mean overall score was 1.096, with a median of 1.125 and a standard deviation of 0.887 (Figure 10). The distribution of the overall scores showed a positive skew (skew = 0.759) and a kurtosis value of −0.653, indicating a slight skew to the right and a platykurtic shape. The Shapiro–Wilk test yielded a test statistic of 0.972, with a corresponding p-value of 0.925, suggesting that the distribution of overall scores was approximately normal.
The average overall score in the HF group was 0.864, indicating a relatively high level of overall user experience satisfaction. The median score was 0.940, and the standard deviation was 0.652, suggesting some variability in the participant ratings. A skew value of 0.150 suggested a slightly positive skew, while a kurtosis value of −0.445 indicated a slightly platykurtic distribution. The Shapiro–Wilk test was not significant (p = 0.787), indicating that the data were normally distributed.
The COPD group had a mean change in necessity score of 0.333 (SD = 2.526), indicating a minor change in the perceived necessity of the application features (Figure 10). The distribution was moderately positively skewed (0.885), suggesting that most of the participants experienced a slight increase in perceived necessity. A kurtosis of 2.979 indicated a distribution with relatively heavy tails compared to a normal distribution. The Shapiro–Wilk test (p = 0.090) did not indicate a significant departure from normality at the 0.05 level.
In the HF group, the mean change in necessity score was 0.105 (SD = 2.622), also indicating a minor shift in perceived necessity. The distribution was slightly positively skewed (0.513). A kurtosis of 0.078 suggested a peakedness close to that of a normal distribution. The Shapiro–Wilk test (p = 0.543) indicated no significant difference from normality.
The COPD group reported a mean concern change score of −1.133 (SD = 2.774), suggesting a decrease in perceived concerns about the use of the application (Figure 10). The distribution was approximately symmetric (skew = 0.043), and the negative mean indicates that concerns decreased on average. A kurtosis of 1.253 indicated a distribution with heavier tails. The Shapiro–Wilk test (p = 0.080) did not indicate a significant departure from normality at the 0.05 level.
The mean concern change score for the HF group was −0.763 (SD = 2.312), indicating a similar reduction in perceived concerns. The distribution was slightly positively skewed (0.388), and the negative mean indicates that concerns decreased on average. A kurtosis of −0.384 suggested a somewhat flatter distribution than a normal distribution. The Shapiro–Wilk test (p = 0.415) indicated no significant difference from normality.
The COPD group reported a mean NCD (Necessity–Concerns Differential) change score of 0.257 (SD = 0.776), suggesting a minor shift in the NCD of the BMQ (Figure 10). The distribution was positively skewed (1.387), implying that most of the participants perceived a slight increase in NCD. A kurtosis of 3.427 indicated a distribution with relatively heavy tails. The Shapiro–Wilk test (p = 0.097) did not indicate a significant departure from normality at the 0.05 level.
In the HF group, the mean NCD change score was −0.091 (SD = 1.154), indicating a minor decrease in the perceived NCD change. The distribution was strongly negatively skewed (−2.410), suggesting that most of the participants experienced a reduction in perceived change in NCD. A kurtosis of 8.744 indicated a distribution with much heavier tails. The Shapiro–Wilk test (p < 0.001) indicated a significant departure from normality.
The COPD group reported a mean SUS score of 68.333 (SD = 13.401), indicating a generally favourable perception of the usability of the Medimonitor application (Figure 11). The distribution was slightly positively skewed (0.207). A kurtosis of −0.889 indicated a distribution with relatively lighter tails than a normal distribution, implying a relatively flat peak. The Shapiro–Wilk test (p = 0.562) indicated no significant difference from normality.
In the HF group, the mean SUS score was 76.875 (SD = 18.671), indicating a higher mean usability perception than in the COPD group. The distribution was moderately negatively skewed (−1.383), suggesting that most participants in this group also favourably rated the application’s usability. A kurtosis of 2.352 indicated a distribution with relatively heavier tails and a sharper peak than a normal distribution. The Shapiro–Wilk test (p = 0.013) suggested a departure from normality.
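For readers unfamiliar with how a SUS score on the 0 to 100 scale arises, the following is a minimal sketch of the standard SUS scoring rule. The source does not describe its scoring procedure, so the use of the standard rule is an assumption, and the responses shown are hypothetical.

```python
def sus_score(responses):
    """Score one 10-item SUS questionnaire answered on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    total = 0
    for index, response in enumerate(responses, start=1):
        if index % 2 == 1:
            total += response - 1  # odd, positively worded items
        else:
            total += 5 - response  # even, negatively worded items
    return total * 2.5  # rescale the 0-40 sum to 0-100


# Hypothetical answers from a single participant (not study data).
example = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]
print(sus_score(example))  # prints 75.0
```

Group means such as those reported above are then simple averages of the per-participant scores.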
The results of the UEQ benchmark for HF patients indicated a pragmatic quality score of 0.663, categorising it as “bad” and placing it within the 25% worst results. In terms of hedonic quality, HF patients had a score of 1.063, which was above average (Figure 12): 25% of the benchmark results were better, while 50% were worse. The overall user experience for HF patients received a score of 0.86, categorising it as below average, with 50% of the results being better and 25% being worse.
In contrast, the COPD UEQ benchmark revealed a pragmatic quality score of 0.476, placing it in the “bad” category and within the range of the 25% worst results. The hedonic quality score for COPD patients was 0.726, indicating that it was below average. This implies that 50% of the results were better, while 25% were worse. The overall user experience for COPD yielded a score of 0.60, which was also categorised as below average, with 50% of the results being better and 25% being worse.
In general, the skewness and kurtosis values for the aforementioned usability variables in both groups showed varying degrees of departure from the normal distribution. The HF group had more extreme skewness and kurtosis values for the SUS score, indicating a potential non-normal distribution in usability perceptions among HF patients. On the other hand, the UEQ showed skewness and kurtosis values closer to normal in both groups.
When comparing the UEQ benchmark results between HF and COPD patients, it is evident that both conditions have suboptimal pragmatic quality scores, indicating a less favourable user experience in terms of meeting functional and practical needs. However, it should be noted that HF patients scored higher in terms of pragmatic quality (0.663) than the COPD patients did (0.476), suggesting that HF patients perceive a relatively better fulfilment of their pragmatic requirements.
In terms of hedonic quality, HF patients had a greater score (1.063) than the COPD patients did (0.726), indicating a more positive subjective experience for patients with HF. This could be attributed to various factors such as the nature of the condition, available treatment options, or differences in the management of symptoms. However, further investigation is necessary to understand the specific factors that contribute to these contrasting results.
The overall user experience scores for both conditions indicated that there is room for improvement. Although HF patients had a slightly higher overall score (0.86) than the COPD patients did (0.60), both of these scores fell within the below-average category. This suggested that efforts should be made to enhance the overall user experience for individuals affected by these conditions.
For the COPD group, the mean MARS change score was 0.591, with a median of 0.000 and a standard deviation of 1.532 (Figure 13). The distribution of MARS change scores showed a positive skew (skew = 0.073) and a kurtosis value of 1.337, indicating a distribution that was slightly right-skewed and had a leptokurtic shape. The Shapiro–Wilk test yielded a test statistic of 0.953, with a corresponding p-value of 0.888, suggesting that the distribution of MARS change scores can be considered approximately normal.
The pilot study aimed to explore the correlations between user experience, measured through the SUS and UEQ, and patient adherence, assessed by changes in MARS scores. Furthermore, the study included an examination of the perceived necessity of and concerns about specific prescribed medications, as measured by the BMQ at the beginning and end of the pilot study. The results, based on a sample size of 41 participants, were as follows:
When examining the relationship between SUS scores and adherence, the correlation was found to be negative but weak (Spearman’s rho = −0.144) and did not reach statistical significance (p > 0.05). This finding suggests that there is no strong association between SUS scores and changes in adherence.
Similarly, the correlation between the appraisal of pragmatic quality (measured by the UEQ) and adherence was weak and not statistically significant (Spearman’s rho = −0.118).
The appraisal of hedonic quality, as measured by the UEQ, also showed a weak and non-significant correlation with adherence (Spearman’s rho = −0.143).
There was no significant correlation between the changes in patients’ perception of necessity and the changes in adherence (Spearman’s rho = −0.113). This finding indicates that the changes in patients’ perception of the necessity of the prescribed medication were not significantly associated with adherence.
Similarly, the correlation between changes in concerns about prescribed medications and adherence was negative but weak (Spearman’s rho = −0.087) and not statistically significant.
In summary, the results of this pilot study did not indicate any strong associations between user experience (measured by the SUS and UEQ) and patient adherence (measured by changes in MARS scores) (Table 3). Furthermore, changes in patients’ perception of the necessity of and concerns about prescribed medication, as measured by the BMQ, did not significantly influence adherence. These findings suggest that factors other than user experience, perceived necessity, and concerns may play a more influential role in determining patient adherence to treatment regimens. Further research with larger sample sizes and additional variables is necessary to gain deeper insights into the complex relationships among user experience, beliefs about medication, and adherence in the healthcare context.
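The rank correlations summarised above can be sketched as follows. This is an illustrative implementation of Spearman’s rho (Pearson correlation of average ranks), not the code used in the study; the function names and the paired values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical SUS scores and MARS change scores (not study data).
sus = [55.0, 70.0, 82.5, 90.0, 62.5]
mars_change = [1, 0, -1, 0, 2]
print(f"rho = {spearman_rho(sus, mars_change):.3f}")
```

Because the method works on ranks, it does not require the normality that several of the variables above lack.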
A linear regression analysis was performed to investigate the relationship between various factors and patient adherence, as measured by changes in MARS scores.
First, the model fit measures indicated that the model explained a moderate amount of variance in the data (Table 4). The coefficient of determination (R-squared) was 0.451, suggesting that approximately 45.1% of the variance in the MARS change could be explained by the independent variables. The adjusted R-squared, which accounts for the number of predictors in the model and penalises overfitting, was 0.324.
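The relationship between R-squared and adjusted R-squared can be made concrete with a short sketch of the usual adjustment formula. The number of observations and predictors in the fitted model is not reported in this section, so the values of `n_observations` and `n_predictors` below are illustrative assumptions only; the actual values may differ.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjust R-squared for the number of predictors in the model."""
    residual_df = n_observations - n_predictors - 1
    return 1.0 - (1.0 - r_squared) * (n_observations - 1) / residual_df


# R-squared is taken from the text; n and p are assumed for illustration.
value = adjusted_r_squared(0.451, n_observations=33, n_predictors=6)
print(f"adjusted R-squared = {value:.3f}")  # prints 0.324
```

The gap between the two measures widens as predictors are added relative to the sample size, which is why the adjusted value is the more cautious summary for a small pilot sample.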
The p-values of the individual predictors revealed that, apart from the MARS baseline score, none of the predictors had a statistically significant effect on the MARS change (Table 5). The SUS score, which represents user experience, had a p-value of 0.244, indicating that it was not a significant predictor. Similarly, changes in perceived necessity of and concerns about the medication were not significant predictors (p = 0.564 and p = 0.178, respectively). However, the MARS baseline score was found to be a significant predictor, with a p-value of less than 0.001.
Normality: Normality tests (Shapiro–Wilk) indicated that the residuals did not strictly follow a normal distribution. This suggests a deviation from the assumption of normality, which may affect the validity of the regression estimates.
Heteroskedasticity: The heteroskedasticity tests (Breusch–Pagan) did not provide evidence of significant heteroskedasticity in the residuals. This suggests that the assumption of homoscedasticity is reasonable.
Autocorrelation: The Durbin–Watson test reported a value of −0.258. Because the Durbin–Watson statistic itself ranges from 0 to 4, this value presumably refers to the lag-1 autocorrelation of the residuals, which points to slight negative autocorrelation. However, the p-value of 0.160 suggests that the autocorrelation is not statistically significant.
Collinearity: Collinearity statistics (Variance Inflation Factor and Tolerance) indicated no severe multicollinearity among predictor variables. All the variables had VIF values of less than two and tolerance values greater than 0.5, suggesting that collinearity is not a major concern.
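Two of the diagnostics listed above follow simple formulas, sketched here under stated assumptions: the residuals are taken to have mean zero, the inputs are hypothetical, and the function names are illustrative rather than those of any statistics package.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic; lies between 0 and 4, near 2 if uncorrelated."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation of residuals assumed to have mean zero."""
    numerator = sum(residuals[t] * residuals[t - 1]
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def vif_and_tolerance(r_squared_j):
    """VIF and tolerance for one predictor regressed on the other predictors."""
    tolerance = 1.0 - r_squared_j
    return 1.0 / tolerance, tolerance


# Hypothetical residuals that alternate in sign (negative autocorrelation).
residuals = [1.0, -1.0, 1.0, -1.0]
print(durbin_watson(residuals), lag1_autocorrelation(residuals))
print(vif_and_tolerance(0.5))  # VIF of 2 corresponds to tolerance of 0.5
```

For long residual series the statistic is approximately 2(1 − r), where r is the lag-1 autocorrelation, so a negative r corresponds to a Durbin–Watson value above 2. Tolerance is the reciprocal of the VIF, which is why a VIF below two and a tolerance above 0.5 are equivalent statements.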
In summary, the linear regression analysis provided limited support for the relationship between the factors examined and patient adherence. While the model explained a moderate amount of variance, the MARS baseline score was the only significant individual predictor of changes in adherence. Furthermore, the residuals deviated from normality, which may affect the reliability of the regression estimates, whereas the tests for heteroskedasticity, autocorrelation, and collinearity raised no major concerns.
Another objective of this study was to compare user experience (UX) and adherence between two groups of patients, specifically those with chronic heart failure (HF) and those with chronic obstructive pulmonary disease (COPD). The results of the independent-samples t-test reveal several interesting findings (see Table 8).
In terms of UX, the following three dimensions were assessed using the User Experience Questionnaire (UEQ): Hedonic Quality, Pragmatic Quality, and Overall UX. The t-test results reveal no significant differences between the HF and COPD groups in Hedonic Quality (t = −1.149, p = 0.258), Pragmatic Quality (t = −0.827, p = 0.413) or Overall UX (t = −1.229, p = 0.226). These findings suggest that both groups perceive a similar level of UX when interacting with the system.
To evaluate adherence, the MARS was used to measure baseline adherence, adherence at the end of the pilot, and change in adherence. The t-test results show no significant differences between the HF and COPD groups in terms of baseline adherence (t = −0.532, p = 0.598), adherence at the end of the pilot (t = 0.663, p = 0.511) or change in adherence (t = 1.136, p = 0.262). These findings suggest that both groups exhibit similar levels of self-reported adherence to their medication regimen.
Additionally, the t-test results for Necessity Change (t = 0.256, p = 0.800) and Concern Change (t = −0.425, p = 0.674) indicate no significant differences between the HF and COPD groups. These findings suggest that both groups experienced similar changes in their perceptions of medication necessity and concern.
Furthermore, the t-test results for NCD Change (t = 1.001, p = 0.324) reveal no significant differences between the HF and COPD groups in terms of the change in the necessity–concerns differential of the BMQ. This suggests that both groups report similar changes in their beliefs about medication.
Finally, the t-test results for the SUS score (t = −1.689, p = 0.099) indicate no significant differences in the perceived usability of the system between the HF and COPD groups. Both groups report similar levels of usability.
In summary, the results of the t-test analysis comparing UX and adherence between the HF and COPD groups suggest that there are no significant differences between the two groups. Both groups reported similar perceptions of UX, adherence, medication necessity and concerns, change in the necessity–concerns differential, and system usability. These findings indicate that the system was received similarly in both patient populations, highlighting its potential to support adherence to medications in contexts of different chronic diseases.
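The group comparison can be illustrated with a sketch of the pooled-variance (Student’s) independent-samples t statistic computed from summary statistics. The means and standard deviations are the SUS values reported above; the group sizes are not stated in this section, so 21 (COPD) and 20 (HF) are assumptions chosen to sum to the 41 participants, and the use of the pooled-variance form rather than Welch’s test is likewise an assumption.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# SUS summary statistics from the text; group sizes are assumed (see above).
t_value, df = pooled_t(68.333, 13.401, 21, 76.875, 18.671, 20)
print(f"t({df}) = {t_value:.3f}")
```

Under these assumptions the statistic comes out close to the reported t = −1.689, with the sign reflecting that the COPD mean is subtracted first.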
We used jamovi for statistical analysis [50,51,52,53].