1. Introduction
Stroke is one of the major global health issues, with over 13 million new cases annually, and is the second leading cause of mortality and disability worldwide. According to the Global Burden of Disease Study (GBD) 2016, the frequency of stroke has increased among younger groups (under 50 years old), as reflected in the distribution of incidence across ages. Moreover, the incidence of stroke, along with stroke-related mortality and disability, increased nearly two-fold from 1990 to 2016 [1]. Among the disabling consequences of stroke, muscle dysfunction is the predominant form of impairment. Muscle dysfunction significantly increases the risk of arm paralysis after a stroke and is frequently associated with increased impairment, reduced work capacity, and diminished quality of life. The accumulation of inactive muscle fibers due to dysfunction leads to abnormal muscle activation patterns, such as spastic muscle contractions, which may contribute to further muscle-related disorders [2,3]. Consequently, individuals who have experienced a stroke affecting their arm may struggle to perform daily activities [4].
To restore the function of muscle fibers after a stroke, rehabilitation is necessary as a routine muscle recovery process [3,5]. Rehabilitation refers to the combined and coordinated use of medical, social, educational, and occupational measures to retrain a person to the highest possible level of functional skill [5]. The rehabilitation process includes assessing the patient’s impairment condition, followed by medication, training, and reassessment. The assessment is intended to observe and evaluate the patient’s actual condition so that a suitable training and medication program can be arranged for each patient. The assessment process is therefore essential, as it determines the patient’s actual condition both at the beginning of the rehabilitation program and after several rounds of medication and training.
Several clinical methods to assess the impairment level of post-stroke patients are currently employed worldwide. For the assessment of motor function, the Fugl-Meyer assessment (FMA) is a notable tool, as it provides a detailed assessment protocol with a scoring system covering most human extremities. The Fugl-Meyer scale is a groundbreaking quantitative evaluative tool for measuring sensorimotor stroke recovery. Its motor domain encompasses large and small parts of the upper and lower extremities, making it one of the most comprehensive measures of motor impairment following stroke. One of its components is the assessment of complex motor functions, such as finger movement, which rely on a combination of multiple muscles [3]. In the case of finger movement, the hand section of the Fugl-Meyer assessment of the upper extremity (FMA-UE) is typically selected. As shown in Table 1, this assessment consists of seven finger movement tasks, namely Mass Extension (ME), Mass Flexion (MF), Hook Grasp (HG), Thumb Adduction (TA), Pincer Grasp (PG), Cylinder Grasp (CG), and Spherical Grasp (SG). Meanwhile, the impairment level consists of three conditions (i.e., full, partial, and none).
The doctor or physiotherapist plays an integral role in the assessment. Several finger movements in the FMA-UE are object-dependent tasks that require the patient to hold an object while the doctor tugs it. In this conventional method, the doctor judges the impairment level manually through visual and tug inspection. However, this reliance on the doctor’s judgment makes the assessment result inherently subjective. Consequently, the accuracy and repeatability of the assessment score are affected, which is particularly problematic for the repeated assessments required in the rehabilitation process.
As a potential modality to address the subjectivity issue in the FMA, electromyography (EMG) is a gold standard for objectively assessing muscle and nervous system function. It is widely used in rehabilitation and gesture recognition studies [6,7]. However, few studies have examined EMG-based recognition of subjectively assessed impairment levels.
Our earlier study examined the performance of an EMG-based impairment-level recognition method for finger movement in post-stroke patients [8]. We reported good recognition results on several FMA-based finger movement tasks, highlighting the promising performance of EMG. We also pointed out several crucial areas for improvement in the experimental settings: there were only four patients, and the collected impairment levels were incomplete, consisting of the full and partial levels only. Additionally, there were only two EMG channels, and the employed sampling frequency was too low. The occurrence of imbalanced dataset conditions could be addressed with a standard resampling method with a SMOTE filter algorithm [9]. The employed machine learning methods were Support Vector Machine (SVM) and Random Forest (RF), with average accuracies of 67.1 and 64.6, respectively. However, these scores could not be generalized due to critical issues in the experimental settings and accuracy bias due to the imbalanced dataset. Consequently, the machine learning models were at risk of recognizing the individual participant instead of the impairment level. Another point is that the collected impairment-level labels tend to be imbalanced, as one patient can exhibit only one or two levels, which cannot be known with certainty prior to assessment. Therefore, an improvement in the experimental settings is required.
This study focused on improving the recognition of finger movement impairment levels in the FMA, aiming to reduce the issue of subjectivity. This study also examined the subjective assessment issue of the FMA and the doctor’s perspective in light of the performance of the machine learning model. The final aim of this study is to assist doctors in deciding patients’ actual impairment levels in finger movement. Therefore, this study also implemented the constructed method in a desktop application that can recognize and display the impairment level of each finger movement and output an assessment video. Thus, a doctor or physiotherapist can double-check and evaluate the impairment level separately.
3. Methods
In this section, we propose an improved recognition method to detect the impairment level of finger movement based on the Fugl-Meyer assessment in a non-ideal condition of imbalanced datasets. The improved recognition method is intended to benefit the system’s reliability and reproducibility in addressing the subjectivity issue of the FMA. By constructing the system under the non-ideal condition of an imbalanced dataset, the system adapts to the actual assessment condition in rehabilitation facilities, thereby enabling it to assist doctors and efficiently minimize the subjective nature of the assessment.
3.1. Participants
Prior to recruiting the participants, several inclusion and exclusion criteria were strictly applied. All participants had to be partially impaired in one arm and be able to maintain a sitting position either on a chair or in a wheelchair by themselves. According to the assessment of a rehabilitation doctor, the participants had undergone at least three months of a rehabilitation program. The participants were all 18 years old or older on the day of the experiment. The participants were not congenital stroke patients and had no permanent finger injuries. The participants did not have any cognitive impairments and were capable of understanding the doctor’s instructions. Among the patients who fulfilled the inclusion and exclusion criteria, the participants were selected through a random sampling process. The participants, who had never undergone a Fugl-Meyer assessment, were requested to participate in the experiment on the day of their rehabilitation program at Airlangga University Hospital.
In this study, the participants comprised 28 stroke patients (17 males and 11 females). All experimental procedures were explained to the participants and their relatives, and the participants voluntarily agreed to participate. All participants sat on a chair or in a wheelchair, either alone or with the help of relatives or clinicians. Prior to the experiment, the experimental protocols were explained to each participant, who then learned the FMA-based finger movements, as shown in Table 1. The ethical committee of Airlangga University Hospital, Surabaya, Indonesia (No. 125/KEP/2023) approved all participant selection criteria and experimental procedures.
3.2. Instrumentation
This study employed four EMG channels provided by two TSND151 sensors combined with AMP151 amplifiers (ATR-Promotion Inc., Kyoto, Japan). The TSND151 is a compact, wireless, multi-function sensor with embedded inertial measurement units (IMUs) and an external terminal with a 16-bit analog–digital input. For EMG measurement, the external terminal was connected to the AMP151, an extension amplifier designed for biological signals. The sampling frequency was set to the maximum setting of 1000 Hz with 1000× signal amplification. The AMP151 has a common mode rejection ratio exceeding 90 dB, an input impedance of 200 GΩ, and a power supply rejection ratio above 105 dB. Rectangular surface electrodes of 19 × 38 mm in size were utilized. The electrodes featured a polymer gel and Ag/AgCl electrode material. Other items were used for the FMA finger movement tasks, namely a pen, a piece of paper, a cylinder, and a tennis ball.
3.3. Electrode Attachment
In this study, we improved the data collection process by capturing EMG signals from three extrinsic muscles and one intrinsic muscle. In contrast, the previous experiment only captured the activity of two extrinsic forearm muscles [8]. The electrodes were attached to the extensor digitorum muscle (Channel 1), flexor digitorum muscle (Channel 2), extensor pollicis brevis muscle (Channel 3), and flexor pollicis brevis muscle (Channel 4). Channels 1–3 corresponded to extrinsic muscles, and Channel 4 corresponded to an intrinsic muscle [17]. An intrinsic muscle was included because of its potential to provide a prominent EMG signal for a specific finger. The muscle of Channel 4 was selected due to its high relative importance in providing an EMG signal for pinch-related movement [18]. The locations of electrode attachment are shown in Figure 1.
Prior to attaching the electrodes, the skin was cleansed with an alcohol swab. The electrodes were arranged so that the inter-electrode distance of each EMG channel was approximately 20 mm. Additionally, the electrodes were attached in a parallel orientation with respect to the muscle. These settings were used to obtain a good-quality EMG signal and minimize the occurrence of cross-talk [19]. The employed muscles were not adjacent to each other, which avoided cross-talk between them, as shown in Figure 1. In this study, the doctor guided the electrode attachment to find the best position over each target muscle so that a good-quality EMG signal could be obtained. Subtle cross-talk from muscles adjacent to the employed muscles might still have occurred due to muscle coordination. However, given the anatomical positioning and functional roles of the employed muscles, this effect may be deemed negligible, as cross-talk among them is unlikely [17,18].
3.4. Data Processing Flow
The data were processed to prepare them as machine learning input. The flow encompassed several steps: signal filtering, signal scaling, movement event exporting, and feature extraction. Following the feature extraction process, two mechanisms were employed to evaluate the performance of the machine learning models. The first mechanism was inter-subject cross-validation, hereinafter referred to as ISCV, which included a data resampling process to address imbalanced dataset conditions. In this mechanism, the resampled features were passed directly to the machine learning models without further scaling. The second mechanism was data-scaled inter-subject cross-validation, hereinafter referred to as DS-ISCV, which added data scaling before the classification process. The steps from signal filtering to feature extraction were performed individually for each participant’s data. Subsequently, the classification processes of both mechanisms were performed using inter-subject cross-validation. Finally, we observed and analyzed the classification outcomes of both mechanisms for each FMA-based movement task. The data processing flow is shown in Figure 2.
3.4.1. Signal Filtering
Motion artifacts, baseline drift, and power-line interference are the primary noise sources in EMG signals [20,21]. The frequency of power-line interference is typically 50 or 60 Hz, depending on the location. Baseline and motion disturbances range between 0 and 20 Hz. Power-line interference also tends to induce harmonic noise that yields high spikes at multiples of the fundamental frequency (50 Hz in this study), which are observable in the frequency domain. Such noise is typically addressed with a signal-filtering technique. In the previous experiment, a Butterworth high-pass filter was able to generate a clean signal [8]. Nevertheless, conventional digital filters still leave undesirable noise within non-muscle-contracting segments of the signal, such as relaxed events.
One commonly used digital filtering method is wavelet denoising (WD). In contrast to Butterworth filters, which produce flat or smooth frequency responses, WD applies wavelet transform to decompose the signal into frequency components with matched resolutions [
22]. Consequently, unlike Butterworth filters, WD offers superior time-frequency localization, ensuring the retention of precise signal features while minimizing noise. The process of wavelet denoising involves an initial decomposition of the signal through a wavelet transform (WT), followed by the application of appropriate thresholds to the detail coefficients. This step entails setting all coefficients below the associated thresholds to zero. Subsequently, the denoised signal is reconstructed based on the modified detail coefficients [
23].
Parameter selection for wavelet denoising is essential. Preserving the muscle-contracting amplitude of the EMG signal is mandatory when eliminating undesirable noises. Additionally, addressing the non-stationary behavior of EMG signals remains a challenge. Thus, selecting incorrect parameters may result in poor noise removal or the elimination of essential amplitudes. A previous study investigated the optimum wavelet function to identify and denoise EMG signals [
24]. According to the study findings, the Daubechies1 (db1 or haar) wavelet with a hard transformation at 0 dB signal-to-noise ratio (SNR) achieved the best denoising performance. In this study, we combined the Butterworth filter and wavelet denoising techniques. We employed high-pass and bandstop Butterworth filters to eliminate the specific frequencies of the noises. Wavelet filters with a db1 function, hard transformation, and level 1 decomposition were employed to eliminate the remaining noises within the relaxed event signal.
Figure 3 illustrates the result after the signal filtering process.
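The combined filtering described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the filter orders, the 20 Hz high-pass cutoff (taken from the stated 0–20 Hz disturbance range), the 48–52 Hz stop band, and the universal threshold rule are all assumptions, and the function names are hypothetical. The level-1 db1 (Haar) transform is written out with NumPy so that the hard-thresholding step on the detail coefficients is visible.

```python
import numpy as np
from scipy import signal

FS = 1000  # sampling frequency in Hz, as stated in the instrumentation section


def butterworth_clean(emg, fs=FS, hp_cutoff=20.0, stop_band=(48.0, 52.0)):
    """Zero-phase high-pass and band-stop Butterworth filtering."""
    # Assumed orders (4 and 2) and band edges; the source does not list them.
    sos_hp = signal.butter(4, hp_cutoff, btype="highpass", fs=fs, output="sos")
    sos_bs = signal.butter(2, stop_band, btype="bandstop", fs=fs, output="sos")
    out = signal.sosfiltfilt(sos_hp, emg)
    return signal.sosfiltfilt(sos_bs, out)


def haar_hard_denoise(x, threshold=None):
    """Level-1 Haar (db1) wavelet denoising with hard thresholding."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # The Haar transform pairs samples, so pad odd-length input by one sample.
    padded = np.append(x, x[-1]) if n % 2 else x
    even, odd = padded[0::2], padded[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    if threshold is None:
        # Assumed rule: universal threshold with a robust noise estimate.
        sigma = np.median(np.abs(detail)) / 0.6745
        threshold = sigma * np.sqrt(2.0 * np.log(n))
    # Hard thresholding: coefficients below the threshold are set to zero.
    detail = np.where(np.abs(detail) < threshold, 0.0, detail)
    rec = np.empty_like(padded)
    rec[0::2] = (approx + detail) / np.sqrt(2.0)
    rec[1::2] = (approx - detail) / np.sqrt(2.0)
    return rec[:n]


def filter_emg(emg, fs=FS):
    """Butterworth filtering followed by wavelet denoising."""
    return haar_hard_denoise(butterworth_clean(emg, fs))
```

Note that this sketch suppresses only the 50 Hz fundamental; additional band-stop stages would be needed for the harmonics mentioned above.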
3.4.2. Signal Scaling
An EMG signal represents the combined motor unit action potentials during contraction, recorded at a specific electrode location. The surface EMG voltage potential is greatly influenced by various factors, which differ among individuals and may also change over time within the same individual. Consequently, the amplitude of an EMG signal is ineffective for group comparisons or monitoring over extended durations [
25]. Subsequently, several studies have shown that a scaling method such as signal normalization or standardization can effectively reduce the differences between records within and across participants [
25,
26,
27]. Several studies have used the z-score normalization method to improve the consistency of EMG signals [
8,
25,
28,
29,
30,
31]. This study employed the z-score method for EMG signal scaling. The z-score method standardizes the signal by subtracting its mean and dividing by its standard deviation, yielding a signal with zero mean and unit variance.
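As a brief illustration of z-score signal scaling, the sketch below standardizes each channel of a recording independently. Applying the scaling per channel and per participant recording is an assumption, and the function name `zscore_signal` is hypothetical.

```python
import numpy as np


def zscore_signal(emg, axis=-1, eps=1e-12):
    """Scale each channel to zero mean and unit variance along `axis`."""
    emg = np.asarray(emg, dtype=float)
    mean = emg.mean(axis=axis, keepdims=True)
    std = emg.std(axis=axis, keepdims=True)
    # eps guards against division by zero for a constant channel.
    return (emg - mean) / np.maximum(std, eps)
```

Two recordings that differ only in amplitude gain map to the same scaled signal, which is the property that makes records more comparable across participants.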
3.4.3. Event Exporting and Feature Extraction
Following the signal-filtering process, the EMG signal still contained all events of the FMA movement tasks, including the relaxation periods. Separating the task-related segments from the relaxed state is essential to prevent undesirable data, such as relaxation-related signals, from entering the feature extraction process and the machine learning algorithms. In the following step, feature extraction was conducted on each exported event separately. As a result, only EMG signals corresponding to task-related events were extracted. An illustration of the event-exporting and feature extraction process is depicted in Figure 4.
Many features have been introduced for recognition purposes, with time- and frequency-domain features mainly employed due to their simplicity, low computational costs, and promising recognition results. In the previous experiment, seven time-domain features and one frequency-domain feature were employed [
8]. The extracted features were mean absolute value (MAV), variance (VAR), root mean square (RMS), waveform length (WL), slope sign change (SSC), zero crossing (ZC), Willison amplitude (WAMP), and mean power frequency (MPF). Many studies have utilized these features to identify the amplitude, complexity, and frequency characteristics of EMG signals [
32,
33,
34,
35]. However, in the case of impairment-level recognition, amplitude-related features may differ only marginally both between and within participants. Therefore, emphasizing complexity-related features can provide more distinct information between impairment levels.
Several studies have introduced features that depict the complexity of EMG signals. Thongpanja et al. and Oo et al. utilized the skewness feature to provide complexity information from EMG signals [
36,
37]. Skewness measures the asymmetry of a variable within a distribution. Zero skewness signifies a symmetric distribution, while positive skewness indicates a right-skewed distribution and vice-versa [
36]. Another promising feature that provides complexity information for EMG signal is Shannon entropy. This feature describes a signal’s irregularity, complexity, and unpredictability characteristics [
38].
Furthermore, the inherent non-stationary nature of EMG signals potentially provides vital feedback with respect to the entropy feature [
39]. Typically, an increased value of Shannon entropy of an EMG signal represents good muscle condition, with many muscle motor units contributing to muscle contraction and a high-amplitude and randomized EMG signal. On the contrary, an impaired muscle makes it difficult to capture an EMG signal, resulting in a low value of Shannon entropy. In this regard, this study employed several features to focus more on the complex characteristics of EMG signals, including 5 time-domain features and 1 frequency-domain feature: waveform length (WL), zero crossing (ZC), mean absolute value (MAV), skewness, Shannon entropy, and mean power frequency (MPF).
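The six features listed above can be sketched for a single channel as follows. Several details are assumptions rather than the authors' definitions: zero crossings are counted without an amplitude threshold, Shannon entropy is estimated from a 16-bin amplitude histogram, and MPF is computed from a plain FFT periodogram. The default 500 ms window and 100 ms step follow the values reported in the Results section; the function names are hypothetical.

```python
import numpy as np


def extract_features(window, fs, n_bins=16):
    """Return WL, ZC, MAV, skewness, Shannon entropy, and MPF of one window."""
    w = np.asarray(window, dtype=float)
    centered = w - w.mean()
    std = w.std()
    # Histogram-based probability estimate for Shannon entropy (assumed).
    counts, _ = np.histogram(w, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    # Power spectrum of the mean-removed window for mean power frequency.
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(w.size, d=1.0 / fs)
    total = power.sum()
    return {
        "WL": float(np.sum(np.abs(np.diff(w)))),
        "ZC": int(np.sum(w[:-1] * w[1:] < 0)),
        "MAV": float(np.mean(np.abs(w))),
        "SKEW": float(np.mean(centered ** 3) / std ** 3) if std > 0 else 0.0,
        "ENT": float(-np.sum(p * np.log2(p))),
        "MPF": float(np.sum(freqs * power) / total) if total > 0 else 0.0,
    }


def sliding_features(event, fs, win_s=0.5, step_s=0.1):
    """Extract one feature row per sliding window of an exported event."""
    win, step = int(round(win_s * fs)), int(round(step_s * fs))
    starts = range(0, len(event) - win + 1, step)
    rows = [list(extract_features(event[s:s + win], fs).values()) for s in starts]
    return np.array(rows)
```

For a multi-channel recording, one option is to call `sliding_features` per channel and concatenate the columns, although the source does not state how the channels were combined.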
3.4.4. Data Resampling and Scaling
Data resampling was performed to address the imbalance issue in the training datasets, as shown in Figure 5. Common resampling approaches include oversampling, undersampling, and combinations of both. Incorrect selection of the resampling method may lead to invalid classification results. In this study, the actual number and values of the minority-class samples were essential, as they correspond to the actual impairment level of the patient. Several studies have shown the promising performance of undersampling approaches, which maintain the number of minority-class samples and produce good classification results [40,41,42]. Accordingly, this study utilized a random undersampling approach, employing the Imblearn library in Python with a sampling strategy that resamples all classes except the minority class.
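One reading of "resample all classes except the minority class" is that the larger classes are randomly reduced to the size of the minority class, so that every minority sample is kept untouched. The NumPy sketch below illustrates that reading; it is not the authors' code, and the function name is hypothetical. The imbalanced-learn library provides a `RandomUnderSampler` class with a `sampling_strategy="not minority"` option that behaves similarly, although the source does not state the exact call that was used.

```python
import numpy as np


def undersample_to_minority(X, y, seed=0):
    """Randomly reduce every larger class to the minority-class size."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        if count > n_min:
            # Draw without replacement so that no sample is duplicated.
            idx = rng.choice(idx, size=n_min, replace=False)
        keep.append(idx)
    keep = np.sort(np.concatenate(keep))
    return X[keep], y[keep]
```

In a cross-validation setting, this step would be applied to the training portion of each fold only, so that the test participant's data remain unchanged.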
Data scaling was performed for the second mechanism in the following step, as shown in
Figure 5. Data scaling is a standard measure for treating data before inputting them to machine learning algorithms. This technique is instrumental in managing diverse data input scales and improving the performance of machine learning models. One such data-scaling technique is z-score scaling, also termed z-score normalization or standardization. The z score scales the data instances by subtracting the mean and dividing by the standard deviation, so that each feature has zero mean and unit variance. Several studies have demonstrated the performance of the z-score method in enhancing classification accuracy and efficiency. Al-Faiz et al. showed that z-score scaling decreased the number of epochs required for the learning network [
43]. Suma et al. achieved improved accuracy after z-score scaling [
44]. Long et al. reported that classification performance was significantly improved after z-score scaling [
45].
3.5. Classification
In the previous experiment, we built machine learning models using the support vector machine (SVM) and random forest (RF) algorithms [
8]. Despite being known as traditional machine learning methods, these algorithms are capable of recognizing complex patterns in the field of EMG-related classification [
46,
47,
48,
49]. The SVM and RF algorithms are based on support vector and ensemble-type architectures, respectively. In this study, we employed an additional algorithm, namely multi-layer perceptron (MLP), a neural network architecture. A machine learning model was built for each FMA movement task to ensure that the final output of each model was focused on the impairment level of finger movement. As shown in
Figure 5, two mechanisms of data processing flow were applied, resulting in two classification results of the first (ISCV) and second (DS-ISCV) mechanisms for each machine learning model. In the present study, the machine learning models of the second mechanism are denoted as SVM_scaled, RF_scaled, and MLP_scaled.
The machine learning models were implemented with the scikit-learn Python library. The parameters of each machine learning algorithm were pre-determined. The SVM utilized a polynomial kernel of degree 3, a regularization parameter of 3, and a gamma parameter of 1 divided by the number of features. For the random forest algorithm, the number of trees was 700 with the Gini criterion, and the maximum depth of the trees was 15. Lastly, the MLP parameters comprised a hidden-layer size of 100, a maximum number of iterations of 200, an alpha parameter of 1 × 10⁻⁴, an initial learning rate of 1 × 10⁻⁴ with an adaptive method, and a stopping tolerance of 1 × 10⁻⁷, with the rectified linear unit (ReLU) as the activation function and a stochastic gradient-based optimizer.
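The stated parameters map onto scikit-learn estimators roughly as in the sketch below. Two points are assumptions: the hidden-layer size of 100 is read as a single hidden layer, and the "stochastic gradient-based optimizer" is set to `solver="sgd"` so that the adaptive learning-rate schedule takes effect (scikit-learn applies `learning_rate="adaptive"` only with that solver); the authors may have used a different solver. The helper name `build_models` and the `random_state` argument are hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def build_models(random_state=0):
    """Create the three classifiers with the parameters reported in the text."""
    return {
        # gamma="auto" corresponds to 1 / n_features.
        "SVM": SVC(kernel="poly", degree=3, C=3, gamma="auto"),
        "RF": RandomForestClassifier(
            n_estimators=700,
            criterion="gini",
            max_depth=15,
            random_state=random_state,
        ),
        "MLP": MLPClassifier(
            hidden_layer_sizes=(100,),  # assumed: one hidden layer of 100 units
            max_iter=200,
            alpha=1e-4,
            learning_rate_init=1e-4,
            learning_rate="adaptive",
            tol=1e-7,
            activation="relu",
            solver="sgd",  # assumed reading of the optimizer description
            random_state=random_state,
        ),
    }
```

Because one model is built per FMA movement task, a fresh set of estimators would be created for each task rather than reusing fitted instances.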
3.6. Evaluation
This study evaluated the performance of the employed machine learning models through the inter-subject cross-validation method. The inter-subject cross-validation process was chosen to deal with cases in which a patient can exhibit only one or two impairment levels for a movement task. In this study, the inter-subject cross-validation procedure encompassed the classification process and the prior data preparation processes, as shown in
Figure 2. The objective of this approach was to perform inter-subject cross-validation correctly, especially with respect to train–test dataset selection, and to evaluate the outcome of the machine learning models. An illustration of the inter-subject cross-validation approach is shown in
Figure 5. The ISCV mechanism encompasses data resampling and classification, while DS-ISCV includes the data scaling process after resampling. In the first cross-validation fold, the data of participant number 1 were selected as the test data, with the remaining data used as the training dataset. The procedure continued in the same manner until the last fold, which utilized the data of participant number 28 as the test data, with the remaining data used as the training dataset.
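The two mechanisms can be sketched as a leave-one-participant-out loop in which resampling (and, for DS-ISCV, scaling) is fitted on the training participants only. This is a minimal sketch under stated assumptions: resampling is read as random reduction of larger classes to the minority size, scaling uses scikit-learn's `StandardScaler`, and the function names, the `make_model` factory argument, and the `seed` argument are hypothetical.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def undersample_to_minority(X, y, rng):
    """Randomly reduce every larger class to the minority-class size."""
    classes, counts = np.unique(y, return_counts=True)
    keep = []
    for cls in classes:
        idx = np.flatnonzero(y == cls)
        if idx.size > counts.min():
            idx = rng.choice(idx, size=counts.min(), replace=False)
        keep.append(idx)
    keep = np.sort(np.concatenate(keep))
    return X[keep], y[keep]


def inter_subject_cv(X, y, groups, make_model, scale=False, seed=0):
    """Run ISCV (scale=False) or DS-ISCV (scale=True).

    Each fold holds out all data of one participant. Returns the
    concatenated true and predicted labels across folds.
    """
    rng = np.random.default_rng(seed)
    y_true, y_pred = [], []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        # Resample the training participants only; test data stay untouched.
        X_tr, y_tr = undersample_to_minority(X[train_idx], y[train_idx], rng)
        X_te = X[test_idx]
        if scale:
            # Fit the scaler on training data to avoid leaking test statistics.
            scaler = StandardScaler().fit(X_tr)
            X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
        model = make_model().fit(X_tr, y_tr)
        y_true.append(y[test_idx])
        y_pred.append(model.predict(X_te))
    return np.concatenate(y_true), np.concatenate(y_pred)
```

For example, `inter_subject_cv(X, y, groups, lambda: SVC(kernel="poly", degree=3, C=3, gamma="auto"), scale=True)` would correspond to an SVM under DS-ISCV, where `groups` holds the participant number of each feature row; `recall_score(y_true, y_pred, average="macro")` then gives the average recall over the concatenated folds.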
3.7. Data Collection Experiment
During the experiment, four channels of EMG sensors were attached to the forearm and palm on the participant’s impaired side after the skin at each electrode position had been cleaned with an alcohol swab. A doctor sat in front of the participant to give finger movement instructions and perform the assessment, as shown in
Figure 6.
Figure 7 shows the assessed finger movements, namely MF, ME, HG, TA, PG, CG, and SG. The participant was instructed to perform each FMA-based finger movement for 5 s with five repetitions. This repetition approach was intended to capture the stability of the patient’s finger movement and thereby support a reasonable assessment. Simultaneously, the doctor observed the exhibited finger movement and performed the assessment accordingly.
Additionally, the patient was instructed to move only the finger part while the doctor performed a tug of the experimental object. This measure was intended to produce a proper assessment by the doctor. This study utilized the doctor’s assessment as the ground truth or true label for the employed machine learning models. The experimental environment is shown in
Figure 6.
One session of finger movement assessment encompassed five repetitions of one finger movement. A 12 s relaxation or resting period was allowed between movement repetitions. However, a preferable resting period between repetitions may be set arbitrarily based on the doctor’s instruction. Subsequently, the participant was allowed a preferable resting period before moving to another finger movement task. This setting was implemented to avoid muscle fatigue, which can influence the stability of the measured EMG [
32]. Additionally, an experimenter held a timer to guide the movement repetitions to avoid any mistakes. The participant’s EMG signal and an experimental video were also recorded throughout the data collection process. Illustrations of movement repetitions with the corresponding resting periods are shown in
Figure 8.
4. Results
This section demonstrates the presence of an inherently imbalanced dataset in the actual patient experiment. Subsequently, the recall scores of ISCV and DS-ISCV are compared. Furthermore, the recall score of non-majority classes is evaluated. Finally, detailed cross-validation results of the best machine learning model for each finger movement task are shown. Each result includes the recall score of the actual target class and the misclassification rate of the other target classes.
4.1. Movement Event Data
As mentioned in
Section 3.7, there were 28 participants, and each participant was instructed to perform each finger movement task with five repetitions. Thus, the nominal number of movement events for one task was 5 × 28 = 140, and the nominal total across the 7 FMA finger movement tasks was 7 × 140 = 980. However, some movement events were removed because undesirable hand movements by the participant caused unstable EMG signals with a poor waveform shape. The final movement events after data removal are shown in
Table 2.
As shown in
Table 2, the movement events of participant P8 in ME, P2 in PG, and P5 in SG were removed entirely. Consequently, the classification of these movements comprised only 27-fold cross-validation. On the other hand, several participants had fewer than five events in some movement tasks after data removal. The total number of movement events in
Table 2 shows an imbalanced dataset, where the majority class is the full level in all movement tasks. The partial level was the minority class in ME, MF, HG, CG, and SG, whereas the none level was the minority in TA and PG. Furthermore, there is a significant difference in the number of majority and minority classes, with an average of 92 ± 10 events.
Following the data collection experiment, the EMG data underwent a data processing step, as shown in
Figure 2. Following the feature extraction step with a 500 ms window size and a 100 ms window step, the processed EMG data were extracted to be prepared as machine learning input. The numbers of data points for each movement task and impairment level are shown in
Table 3.
4.2. Recognition Performance
A well-known metric such as accuracy may provide misleading information for an imbalanced dataset because the score is biased toward the majority class. Thus, this study evaluated recognition performance using the average of the recall metrics to minimize the bias due to the imbalanced dataset. The average recall was obtained by calculating the recall for each impairment level after concatenating the actual and predicted levels across the inter-subject cross-validation folds, followed by averaging across impairment levels. Additionally, the average recall score of the non-majority classes is presented.
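The difference between accuracy and average recall on an imbalanced label set can be illustrated with a short sketch. The function names and the `exclude_majority` option are hypothetical; the latter mimics an average over the non-majority levels only.

```python
import numpy as np


def per_class_recall(y_true, y_pred):
    """Recall of each class computed from concatenated fold outputs."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {
        c.item(): float(np.mean(y_pred[y_true == c] == c))
        for c in np.unique(y_true)
    }


def average_recall(y_true, y_pred, exclude_majority=False):
    """Unweighted mean of per-class recall, optionally without the majority."""
    recalls = per_class_recall(y_true, y_pred)
    if exclude_majority:
        classes, counts = np.unique(np.asarray(y_true), return_counts=True)
        recalls.pop(classes[np.argmax(counts)].item())
    return float(np.mean(list(recalls.values())))


# A classifier that always predicts the majority level (0) looks accurate
# but recognizes none of the minority samples.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
avg_recall = average_recall(y_true, y_pred)
```

In this toy case the accuracy is 0.8 while the average recall is 0.5, which is why the averaged recall is the more informative score when one level dominates.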
In
Table 4, gray cells represent the highest recall score for each movement. Four out of seven movements surpassed a recall of 0.50, namely ME, HG, PG, and SG. Additionally, the best-performing model and mechanism depended on the movement, with the highest recall achieved by SVM for the second mechanism (SVM_DS-ISCV) in MF, the first mechanism of MLP (MLP_ISCV) in HG, SVM_DS-ISCV in TA, SVM_ISCV in PG, MLP_ISCV in CG, and MLP_ISCV in SG. However, in ME, SVM_DS-ISCV and MLP_ISCV shared the same recall score of 0.70. In this case, the average recall score of the non-majority classes, i.e., the partial and none levels, must be observed.
The average recall scores of non-majority classes presented in
Table 5 were calculated from the partial and none levels. In the ME movement, SVM_DS-ISCV achieved the highest score, making it the best model for recognizing non-majority classes. However, different outcomes occurred in HG and TA, with MLP_DS-ISCV achieving the highest recall score for non-majority classes in both movements. There was a marginal difference in the recall scores for the overall (
Table 4) and non-majority (
Table 5) classes in the HG movement due to the trade-off between the recognition of majority and non-majority classes. In the TA movement, despite the highest score being achieved by SVM_DS-ISCV (
Table 4), for the non-majority class, MLP_DS-ISCV achieved the highest score. This was due to the high recall score for the majority class in TA. Considering the importance of the non-majority class, this study indicates the best model for each movement task with a dagger (†) symbol in
Table 5.
Figure 9 shows detailed information in concatenated confusion matrices for each movement after inter-subject cross-validation. In general, SVM_DS-ISCV for ME and MLP_ISCV for SG achieved good recognition performance, with more than 50% of the predictions being correct for every true label, as shown in
Figure 9a,g respectively. Subsequently, SVM_DS-ISCV of MF, MLP_ISCV of HG, and SVM_ISCV of PG achieved more than 50% for two labels, as shown in
Figure 9b,c,e respectively. In contrast, the remainder achieved correct classification of the majority class only. However, the true labels were the doctor’s manual inspection assessment results. Therefore, the aforementioned good recognition performance represented the ability of the machine learning model to produce the same assessment result as the doctor.
4.3. Portion of Classification Outcome Across Impairment Levels in Inter-Subject Cross-Validation
Figure 9 shows the general information of correct-incorrect classification after concatenating the classification outcome across the participants. This subsection presents the detailed classification outcome of each cross-validation fold, including the misclassification rate across impairment levels. In this study, the misclassification rate of each cross-validation fold was calculated from the false positive rate, and the classification outcome of the corresponding impairment level was the recall score. Therefore, the classification portion can be observed using these metrics as shown in
Table 6. The recall score for the actual level is shown in black, while the false positive rate for other levels is shown in red for each fold. Each row corresponds to a specific fold in the inter-subject cross-validation, as illustrated in
Figure 5.
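The two metrics can be illustrated with a short sketch. It assumes that the test data of a fold carry a single true impairment level; under that assumption the false positive rate of each other level equals the share of samples predicted as that level, so the values of one row sum to one. The function names and the toy predictions are hypothetical.

```python
# Label order is assumed for illustration only.
LEVELS = ["None", "Partial", "Full"]


def recall(y_true, y_pred, level):
    """TP / (TP + FN) for one level."""
    predictions = [p for t, p in zip(y_true, y_pred) if t == level]
    return sum(p == level for p in predictions) / len(predictions)


def false_positive_rate(y_true, y_pred, level):
    """FP / (FP + TN) for one level, computed one-vs-rest."""
    predictions = [p for t, p in zip(y_true, y_pred) if t != level]
    return sum(p == level for p in predictions) / len(predictions)


def classification_portion(y_true, y_pred, levels=LEVELS):
    """Recall for the actual level and false positive rate for the other levels."""
    if len(set(y_true)) != 1:
        raise ValueError("this sketch assumes one true level per fold")
    actual = y_true[0]
    portion = {}
    for level in levels:
        if level == actual:
            portion[level] = recall(y_true, y_pred, level)
        else:
            portion[level] = false_positive_rate(y_true, y_pred, level)
    return portion


# Toy fold (hypothetical): ten samples from a participant whose actual level is Partial.
y_true = ["Partial"] * 10
y_pred = ["Partial"] * 4 + ["Full"] * 5 + ["None"]
portion = classification_portion(y_true, y_pred)
```

Here the recall for Partial is 0.4, and most of the misclassified samples fall into the Full level, which is the kind of trend the classification portion is meant to expose.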
In contrast to
Figure 9, which showed the concatenated confusion matrices after cross-validation, the classification portion offers detailed insight into the misclassification trends of each movement. Consequently, it allows a comprehensive evaluation of the performance of the best machine learning model on each movement. For most movements (ME, HG, TA, PG, and CG), the misclassification that resulted in a recall score below 0.5 occurred between the Full and Partial levels. In contrast, it occurred between the Partial and None levels in MF and between the None and Full levels in SG.
4.4. Comparison of Recognition Performance with Previous Experiment
This study presents an improved recognition method for finger movement impairment levels based on the FMA. In the previous experiment, the F1 score of each impairment level was used as the evaluation metric. The previous experiment applied only a holdout method for data splitting, dividing the dataset equally (50% each) into training and testing datasets. In contrast, this study used leave-one-out cross-validation, in which each participant’s data served as one observation for data splitting. Although this difference makes the two experiments slightly less comparable, we used the F1 score of the best fold of the inter-subject cross-validation for the comparison.
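The difference between the two splitting schemes can be sketched as follows. The participant identifiers, the random shuffling in the holdout split, and the function names are assumptions made for illustration; the sketch does not reproduce either experiment's code.

```python
import random


def holdout_split(n_samples, train_fraction=0.5, seed=0):
    """Single split: shuffle the sample indices once and cut them in two."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def leave_one_subject_out(subject_ids):
    """Yield one fold per participant: that participant is tested, the rest train."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical participant label for each sample in the dataset.
subject_ids = ["P1", "P1", "P2", "P2", "P2", "P3"]

train_idx, test_idx = holdout_split(len(subject_ids))
folds = list(leave_one_subject_out(subject_ids))
```

With the holdout split, samples from the same participant can land in both the training and testing sets, whereas each leave-one-subject-out fold tests on a participant the model has never seen.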
In the previous study, data for the none impairment level were not acquired, as shown in
Table 7 [
8]. Therefore, only full and partial levels were utilized in this study. MF achieved the lowest F1 score, whereas TA achieved the highest. In the current study, the lowest F1 score was that of CG, whereas PG achieved the highest, as shown in
Figure 8. The F1 scores presented in
Table 8 have higher values than the recall score in
Table 6.
Figure 10 shows that the F1 scores achieved in this study are superior to those achieved in the previous experiment.
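For reference, the per-level F1 score and the selection of the best fold can be sketched as below. The fold data are toy values, and the helper names are hypothetical.

```python
def f1_score(y_true, y_pred, level):
    """F1 = 2TP / (2TP + FP + FN) for one impairment level."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == level and p == level for t, p in pairs)
    fp = sum(t != level and p == level for t, p in pairs)
    fn = sum(t == level and p != level for t, p in pairs)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_fold_f1(folds, level):
    """Highest F1 score for a level across cross-validation folds."""
    return max(f1_score(y_true, y_pred, level) for y_true, y_pred in folds)


# Toy folds (hypothetical): (true labels, predicted labels) per left-out participant.
folds_full = [
    (["Full"] * 4, ["Full", "Full", "Partial", "Partial"]),  # recall 0.50
    (["Full"] * 4, ["Full", "Full", "Full", "Partial"]),     # recall 0.75
]
f1_first = f1_score(*folds_full[0], "Full")
f1_best = best_fold_f1(folds_full, "Full")
```

If a fold's test data contain a single true level, that level has no false positives, so its precision is 1 whenever it is predicted at all and F1 reduces to 2r/(1 + r) for recall r, which is never smaller than r. Under that assumption, this is one possible reason for F1 scores being higher than the corresponding recall scores.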
6. Desktop Application
This study also presents a use case of the proposed method in actual assessment using a desktop application. This study aimed to assist doctors in making more accurate judgments by employing an EMG biosignal modality to automatically recognize the finger movements of the FMA. The implementation of the proposed method requires the doctor to perform a manual assessment, especially in the object-grasping task. Simultaneously, the system automatically assesses the impairment level based on EMG muscle information. Finally, the doctor can double-check the system’s output before deciding on the impairment level of the patient.
Figure 11 shows the desktop application’s assessment window with several features. The movement list feature consists of a list of FMA finger movements, where each item corresponds to the selected machine learning model. The connect and disconnect buttons control the connections between sensors and the application. The start and stop record buttons control impairment level recognition and camera recording processes. The impairment level is displayed in the impairment level display. A camera display shows and records finger movements. Lastly, the signal obtained from the sensors is displayed in the EMG signal display. The assessment window mainly implements the proposed method to output an impairment level for the FMA finger movements. As shown in
Figure 11, the impairment level display shows the none level on a red background. This impairment level was produced through several processes of the proposed method that were adapted for real-time assessment inside the application.
The first process of the application workflow was to store the EMG signal from each of the four channels individually in an object with a size of 1000 instances, which fixed the number of EMG data per channel at 1000 instances. The stored EMG data were then filtered using both Butterworth and wavelet filters. Subsequently, five time-domain features and one frequency-domain feature were extracted from the filtered data, as explained in
Section 3.4.3. The last process was to input the features into the established machine learning model to output the impairment level. The established machine learning models were linked to the finger movement items in the movement list feature. Therefore, the corresponding model is selected automatically when the user chooses a finger movement item from the list.
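The workflow above can be sketched as a fixed-size buffer followed by a processing chain. This is a minimal illustration under several assumptions: the buffer is treated as a sliding window, and the class name, the function names, the placeholder filter, the single placeholder feature, and the placeholder model are all hypothetical stand-ins for the filters, features, and models described in the paper.

```python
from collections import deque

N_CHANNELS = 4   # number of EMG channels, from the text
WINDOW = 1000    # instances stored per channel, from the text


class EmgBuffer:
    """Keeps the most recent WINDOW values of each channel (sliding window assumed)."""

    def __init__(self, n_channels=N_CHANNELS, window=WINDOW):
        self.window = window
        self.channels = [deque(maxlen=window) for _ in range(n_channels)]

    def push(self, sample):
        """Append one multi-channel sample (one value per channel)."""
        if len(sample) != len(self.channels):
            raise ValueError("expected one value per channel")
        for channel, value in zip(self.channels, sample):
            channel.append(value)

    def is_full(self):
        return all(len(channel) == self.window for channel in self.channels)

    def snapshot(self):
        return [list(channel) for channel in self.channels]


def assess(buffer, movement, models, denoise, extract_features):
    """Filter each channel, extract features, and query the model of the movement."""
    if not buffer.is_full():
        return None  # not enough data yet
    features = []
    for channel in buffer.snapshot():
        features.extend(extract_features(denoise(channel)))
    return models[movement](features)


# Placeholders only: the real filters, features, and models are not reproduced here.
identity_filter = lambda signal: signal
mean_absolute_value = lambda signal: [sum(abs(v) for v in signal) / len(signal)]
models = {"HG": lambda features: "Full" if sum(features) > 1.0 else "None"}

buffer = EmgBuffer()
for _ in range(1200):                      # more than WINDOW samples arrive
    buffer.push([1.0, -1.0, 0.5, 0.0])     # one value per channel
level = assess(buffer, "HG", models, identity_filter, mean_absolute_value)
```

Keying the models by movement name mirrors the link between the movement list and the models: choosing a list item amounts to a dictionary lookup.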
The assessment window consists of several stages—the early stage, the pre-recognition stage, and the recognition stage—intended to help the user run the assessment comfortably, as shown in
Figure 12. In the early stage, the camera display, the green push button for connecting to the sensors, and the combo box for finger movement selection are enabled, while the other features are automatically disabled. This stage allows the user to adjust the camera position for the assessment, connect the application to the EMG sensors, and select the finger movement to be assessed. In addition, each item in the list of finger movements is linked to the selected machine learning model to be used in the recognition stage.
The next stage is the pre-recognition stage. It starts after the user clicks the connect-sensor push button and the connection between the sensors and the application is successfully established. At this stage, several features are enabled. The first is the plot widget, which displays the EMG signal of each channel. With this feature, the doctor can check for errors in the EMG signal while the patient performs a movement and, if necessary, adjust the electrode positions or take other action to obtain a proper signal. The finger movement can still be changed before the recognition stage. The green start-record push button, which starts the recognition and video recording processes, is also enabled. Finally, the red disconnect-sensor push button is enabled at this stage in case the user wishes to finish the assessment.
The recognition stage starts when the user presses the start-record push button. At this stage, the recognized impairment level is also transferred to the displayed video, so that the video simultaneously shows the assessment date and the impairment level. Furthermore, the video recorded on the PC allows the doctor to review the impairment level of the corresponding finger movement outside the desktop application.
The user has two options for proceeding from this stage. The first is to stop the assessment of the current movement and start another one: the user presses the stop-record button to return to the pre-recognition stage, selects a different movement from the movement list, and presses the green start-record button to start the assessment. The second is to finish all assessments, in which case the user presses the red stop-record button and then the disconnect-sensor button.
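The three stages and the buttons that move between them can be summarized as a small state machine. This is a sketch, not the application's code: the class, event, and widget names are hypothetical, and the widget sets for the pre-recognition and recognition stages are partly assumed where the text does not list them explicitly.

```python
from enum import Enum


class Stage(Enum):
    EARLY = "early"
    PRE_RECOGNITION = "pre-recognition"
    RECOGNITION = "recognition"


# Widgets enabled in each stage (names are hypothetical; the last two sets are partly assumed).
ENABLED = {
    Stage.EARLY: {"camera_display", "connect_button", "movement_list"},
    Stage.PRE_RECOGNITION: {"camera_display", "movement_list", "emg_plot",
                            "start_record_button", "disconnect_button"},
    Stage.RECOGNITION: {"camera_display", "emg_plot", "impairment_display",
                        "stop_record_button"},
}

# Allowed transitions: (current stage, button event) -> next stage.
TRANSITIONS = {
    (Stage.EARLY, "connect"): Stage.PRE_RECOGNITION,
    (Stage.PRE_RECOGNITION, "start_record"): Stage.RECOGNITION,
    (Stage.PRE_RECOGNITION, "disconnect"): Stage.EARLY,
    (Stage.RECOGNITION, "stop_record"): Stage.PRE_RECOGNITION,
}


class AssessmentWindow:
    """Tracks the current stage and rejects button events that are not allowed."""

    def __init__(self):
        self.stage = Stage.EARLY

    def handle(self, event):
        key = (self.stage, event)
        if key not in TRANSITIONS:
            raise ValueError(f"'{event}' is not available in the {self.stage.value} stage")
        self.stage = TRANSITIONS[key]
        return ENABLED[self.stage]


# Assess one movement, then finish: stop recording first, then disconnect.
window = AssessmentWindow()
for event in ["connect", "start_record", "stop_record", "disconnect"]:
    window.handle(event)
```

Encoding the transitions in a table makes the second option explicit: finishing all assessments from the recognition stage takes two steps, stop-record followed by disconnect.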