1. Introduction
Electromyography is one of the most widely used techniques for extracting information about the control of movement carried out by the central nervous system [
1]. In particular, surface electromyography (sEMG) is extensively used in research and clinical fields such as rehabilitation and diagnosis processes [
2,
3]. Further, the sEMG signals are perhaps the most used bio-signals when dealing with lower and upper limb prosthesis control, and in this field a significant interest is recognizable in hand gesture recognition from forearm myoelectric activity [
3,
4,
5]. Indeed, since the seminal work by Lee and Saridis [6], research on hand gesture recognition from sEMG signals has grown dramatically [7], and it seems to be focused on two main subtopics, i.e., classification architectures and feature selection. Regarding the former, machine learning algorithms remain the most used classifiers for sEMG-based gesture recognition, e.g., support vector machines, artificial neural networks and k-nearest neighbors [2, 8]. However, relatively less complex algorithms, such as linear discriminant analysis and decision trees, have also been employed for gesture recognition driven by myoelectric signals, providing fully satisfying outcomes [
5,
9]. Further, naive Bayes, quadratic discriminant analysis, recurrent neural networks [
5,
10] and architectures based on extended associative memories [
11] have been also tested. More recently, novel machine learning and deep learning methods have been investigated for hand motion classification [
12,
13,
14]. However, irrespective of the particular method adopted, a number of challenging topics remain open in the field of gesture recognition, such as the type and location of electrodes [
4,
15] and the most appropriate signal processing for sEMG feature extraction and selection, which still represents a fundamental issue [
2,
3,
7,
9].
Regarding the former aspect, the most common procedure is choosing a pair of forearm muscles, e.g., the flexor and extensor carpi radialis [
16,
17], or adding to the former also upper-arm-placed probes [
18], with a single-differential electrode configuration or by using multi-channel sEMG arrays [4], which allow one to detect more spatially localized muscle characteristics, such as fiber length, conduction velocity or temporal characteristics of single motor units [1, 3], by means of sEMG signal decomposition [4]. However, in more recent years, low-cost, off-the-shelf devices have gained increasing traction for myoelectric-based hand gesture recognition and prosthesis control. In this field, the Myo armband (Thalmic, Ontario, Canada), despite some obvious technical limitations, e.g., a low sampling rate, represents one of the most popular devices for upper limb sEMG signal acquisition, being affordable, easy to set up and use, and unobtrusive [
2,
7,
10,
11,
14,
19]. Further, it was also used for publicly available datasets [
20] and for myoelectric control of prosthetic hands [
11].
On the other hand, in the field of gesture recognition, and in prosthetic control more generally, reliable feature selection is a key factor [
2,
3,
9]. Throughout the years, many different features and sets of features have been proposed, and their suitability for accurate gesture classification has been extensively investigated. Myoelectric features can be roughly sorted into three main categories: time domain features, frequency domain features and time-frequency domain features [
7]. Hudgins et al. [
21] proposed a set of time domain features, encompassing mean absolute value, waveform length, slope sign change and zero crossing, which was later modified by Du et al. [
22] by adding the integrated EMG measure, the variance and the Willison amplitude. In some cases, time domain measures were also considered in conjunction with autoregressive model coefficients [
19,
23]. As for frequency-domain features, a number of different measures are computed from the power spectral density (PSD) of the EMG signal. The most used are the mean and median frequency and the mean and total power of the spectrum [
7,
8,
17]. Additional commonly used features are the peak frequency and the frequency ratio [
9]. Nonetheless, the reliability of frequency domain features is still debated [4] and their effectiveness for hand gesture recognition has not always been acknowledged, since they provide at most the same discrimination as time domain features, without further advantages [9]. On the other hand, time-frequency domain features, such as discrete wavelet coefficients, discrete wavelet packet coefficients and continuous wavelet transform coefficients, e.g., [24, 25], demonstrated little improvement in myoelectric pattern recognition with respect to time domain ones [4], which have the advantage of being relatively simple to calculate, with a low computational load [4]. In addition, Phinyomark et al. [9] investigated a set of thirty-seven time and frequency domain measures, showing that considerable levels of accuracy in gesture recognition can be attained even with a very small number of time domain features. Therefore, time domain measures still appear to be the most used features when dealing with myoelectric signal classification and sEMG-based hand gesture recognition [
2,
5,
7,
9]. However, due to the well-recognized non-linearity of this kind of bio-signal [
1,
16], non-linear time-series analysis was also introduced for myoelectric signal processing and feature extraction [
18,
26]. In [
18] the authors compared the performances of Higuchi’s fractal dimension and of the detrended fluctuation analysis with respect to a limited set of classical sEMG features, showing promising results if such kinds of non-linear measures are applied on weak sEMG signals acquired from the upper limbs. Later, in [
17] an extended set of fifty features belonging to both time and frequency domains was examined, further adding a series of non-linear measures, including box-counting dimension, critical exponent analysis, Katz fractal dimension, approximate entropy and sample entropy. The latter proved to be the best single feature, and its use together with four other features gave the best performance as a multiple-feature set. In particular, fuzzy entropy (FEn) was developed by Chen and colleagues [
16] for sEMG signal characterization, showing better performances with respect to approximate and sample entropy in distinguishing among arm gestures [
16]. However, its validity was investigated for high quality sEMG signals, acquired with a gold-standard system in a single-differential probe setup, while also considering a relatively limited set of hand gestures [
16,
27], and its validity in the hand gesture characterization field has not been investigated in further works, except for a comparison with different kinds of complexity measures [
27].
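The time-domain set of Hudgins et al. mentioned above (mean absolute value, waveform length, zero crossing and slope sign change) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the cited works: the function name `hudgins_features`, the `threshold` parameter (a noise dead-band, defaulting to zero here) and the exact counting conventions are assumptions.

```python
def hudgins_features(x, threshold=0.0):
    """Return MAV, WL, ZC and SSC for one sEMG window (a list of floats).

    `threshold` is a hypothetical noise dead-band: crossings and slope
    changes smaller than it are ignored.
    """
    n = len(x)
    # Mean absolute value: average rectified amplitude.
    mav = sum(abs(v) for v in x) / n
    # Waveform length: cumulative absolute difference between neighbors.
    wl = sum(abs(x[i + 1] - x[i]) for i in range(n - 1))
    # Zero crossings: sign change between consecutive samples.
    zc = sum(
        1
        for i in range(n - 1)
        if x[i] * x[i + 1] < 0 and abs(x[i] - x[i + 1]) >= threshold
    )
    # Slope sign changes: the middle sample is a local extremum.
    ssc = sum(
        1
        for i in range(1, n - 1)
        if (x[i] - x[i - 1]) * (x[i] - x[i + 1]) > 0
        and (abs(x[i] - x[i - 1]) >= threshold
             or abs(x[i] - x[i + 1]) >= threshold)
    )
    return {"mav": mav, "wl": wl, "zc": zc, "ssc": ssc}
```

All four measures need only a single pass over the window, which reflects the low computational load of time-domain features noted above.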
The FEn relies on a fuzzy function to assess similarity, while more classical entropy measures are based on binary logic decision rules, but they still each adopt a distance metric to compare subsequent samples [
16]. The latter aspect was overcome by more recently introduced entropy measures, wherein samples’ distribution patterns are considered [
28,
29]; in particular, the permutation entropy (PEn) and its modifications rely on an ordinal mapping of the neighboring values [
30,
31], evaluating their relative occurrence and avoiding the use of distance metrics. Although the use of the PEn for time-series analysis is still under investigation [
32], it has been successfully applied on EEG data for detecting epileptic seizures [
31,
33,
34]. Further, PEn was also used in ECG data analysis in order to investigate behavioral states [
35] and for the analysis of heart rate variability [
36,
37]. However, the validity of a complexity measure such as the PEn cannot be blindly extended to biological signals other than those listed above, such as sEMG, due to their different characteristics [1]; moreover, the PEn relies on an ordinal mapping of data samples and thereby discards amplitude information, which appears worth considering for some applications [31]. Thus, as for the FEn, an investigation of the PEn as a possible feature for sEMG-driven gesture recognition is still lacking.
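The ordinal mapping behind the PEn can be made concrete with a short sketch. The code below follows the standard permutation entropy definition (Shannon entropy of the relative frequencies of ordinal patterns), not necessarily the exact variant used in this study. The names `permutation_entropy`, `d` (embedding dimension) and `tau` (time delay), the tie-breaking rule and the normalization by log(d!) are assumptions made for illustration.

```python
import math
from collections import Counter


def permutation_entropy(x, d=3, tau=1, normalize=True):
    """Permutation entropy of the sequence x.

    d   : embedding dimension (length of each ordinal pattern)
    tau : time delay between the samples of one pattern
    """
    n_vectors = len(x) - (d - 1) * tau
    if n_vectors <= 0:
        raise ValueError("time series too short for the chosen d and tau")

    patterns = Counter()
    for i in range(n_vectors):
        window = [x[i + j * tau] for j in range(d)]
        # Ordinal pattern: the index order that sorts the window.
        # sorted() is stable, so equal values are ranked by position.
        pattern = tuple(sorted(range(d), key=lambda k: window[k]))
        patterns[pattern] += 1

    # Shannon entropy of the pattern relative frequencies.
    h = -sum((c / n_vectors) * math.log(c / n_vectors)
             for c in patterns.values())
    # Optional normalization to [0, 1] by the maximum entropy log(d!).
    return h / math.log(math.factorial(d)) if normalize else h
```

Only the rank order inside each window enters the computation, so rescaling the signal amplitude leaves the result unchanged, which is the loss of amplitude information discussed above.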
As happens for all the entropy metrics, FEn and PEn rely on the a priori determination of unknown parameters [
16,
32], and therefore a fundamental aspect which deserves to be carefully considered when dealing with complexity measures in time-series analysis is the influence of computational parameters. The FEn depends upon the embedding dimension
m, which represents the length of the subseries to be compared, and on two other parameters, which govern whether two subseries are deemed similar (see
Section 2). Generally speaking, the parameters related to the latter aspect play a crucial role in obtaining a reliable entropy computation and their role has been investigated for a number of different entropy algorithms [
38], and for biological signal analysis [
39,
40]. On the other hand, the PEn computation is based on an ordinal mapping of the embedding vectors (see
Section 2) and thus the choice of the embedding dimension
m becomes crucial for obtaining a reliable estimation of complexity [
32].
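To show how the embedding dimension and the two similarity parameters enter the FEn computation, a minimal sketch is given below. It assumes the general scheme of fuzzy entropy (baseline-removed embedding vectors, Chebyshev distance, exponential membership function) and one common form of that function, exp(-(dist/r)^n). The definition actually used in this study is the one in Section 2 and may differ, for instance in how r enters or in whether r is scaled by the signal standard deviation. The function name and default values are hypothetical.

```python
import math


def fuzzy_entropy(x, m=2, r=0.2, n=2):
    """Fuzzy entropy sketch.

    m : embedding dimension (length of the compared subseries)
    r : width of the exponential membership function (signal units)
    n : gradient (exponent) of the membership function
    """
    N = len(x)
    if N <= m + 1:
        raise ValueError("time series too short for the chosen m")

    def phi(dim):
        # Use N - m vectors for both dim = m and dim = m + 1.
        count = N - m
        vectors = []
        for i in range(count):
            seg = x[i:i + dim]
            mean = sum(seg) / dim
            # Remove the local baseline from each vector.
            vectors.append([v - mean for v in seg])
        total = 0.0
        for i in range(count):
            for j in range(count):
                if i == j:
                    continue  # exclude self-matches
                # Chebyshev distance between the two vectors.
                dist = max(abs(a - b)
                           for a, b in zip(vectors[i], vectors[j]))
                # Graded (fuzzy) similarity instead of a hard threshold.
                total += math.exp(-((dist / r) ** n))
        return total / (count * (count - 1))

    return math.log(phi(m)) - math.log(phi(m + 1))
```

The double loop over vector pairs makes this sketch quadratic in the number of samples, whereas an ordinal-pattern count needs a single pass; actual run times depend on implementation and hardware.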
The goal of this study was to investigate the possible applicability of both FEn and PEn as significant features for distinguishing different hand gestures in healthy subjects, through the characterization of the sEMG signal acquired by a low-cost device such as the Myo armband. The major contributions provided by this work can be summarized as follows: Firstly, two complexity measures, based on highly different computational core logics, have been investigated as possible features for hand gesture characterization. The FEn was originally presented and evaluated for characterizing myoelectric signals for hand gesture recognition, but it was not compared with other different time or frequency domain features. On the other hand, despite it having been used for analyzing biological signals, the PEn has not been exploited for analyzing EMG time series, and thus its suitability as a possible feature able to adequately describe different gestures from sEMG recordings has not been investigated. Secondly, the choice of computational parameters represents a crucial aspect when dealing with complexity measures for time-series analysis. Therefore, for both FEn and PEn the influences of computational parameters were examined with respect to their gesture clustering capabilities, while providing a possible set of optimal parameters for each considered set of hand movements. Thirdly, the suitability of both FEn and PEn for hand gesture characterization was evaluated in three different ways. As reported above, the computational parameters were selected while relying on the best clustering of hand gestures. Then, the proposed complexity measures were compared with a set of well-assessed features in time and frequency domains, in terms of their predictive importance, according to the minimum redundancy maximum relevance criterion. 
Further, since a direct comparison between FEn and PEn was hardly feasible, due to their different computational core logics, their roles in different feature sets were also assessed in terms of classification accuracy.
4. Discussion
In this study the sEMG signal provided by a low-cost device, the Myo armband, was analyzed using two different complexity measures in order to assess whether they could be useful as distinctive features for hand gesture recognition in healthy subjects. The FEn was proposed by Chen et al. [
16] and its use as the only feature extracted from sEMG was adequate to obtain proper recognition of four different arm gestures, outperforming more classical entropy measures such as the approximate and sample entropy [
16,
27]. However, Chen and colleagues considered only four gestures, i.e., hand grasping, hand opening, forearm supination and forearm pronation [
16] and sEMG signals were acquired by means of a gold-standard device, with a proper sampling rate (1 kHz).
Results of the present study suggest the suitability of FEn to describe a wide range of arm gestures, since the FEn was tested on three different sets of hand gestures: movements miming finger numbering (set 1), wrist movements (set 2) and more complex gestures, involving both fingers and wrist (set 3), for a total of 14 gestures. However, the parameter selection for FEn computation seems to be crucial: the choice of a gradient of the exponential function (
4) as
did not allow a correct recognition of the actual number of gestures for any of the considered sets, independently from the
r value (width of the exponential function). This agrees with [
16], where a value of
was suggested in order to avoid an entropy computation over-affected by noise. However, despite the fact that
and
allowed for correct cluster identification, the best performance in terms of GS index resulted from
(see
Section 3). Therefore, a wider gradient of
led to a better characterization of sEMG signal in terms of the recognition of different gestures. The latter aspect could appear to not exactly match the suggestions reported in [
16], where
was used in sEMG-based hand motion recognition. However, it is noteworthy that the latter indications were based on the low variations of the FEn standard deviation for
and not on the numerical quantification of the quality of gesture clustering. Therefore, the gradient
n being a weight of vector similarity, the results of this study suggest that for a more reliable gesture classification closer vectors have to be weighted more than the dissimilar ones, as happens for
. However, the higher
n is, the greater the loss of detailed information, considering that as
, the
function becomes the Heaviside step function, where the contributions of samples are equalized depending on their values with respect to the hard boundaries [
16]. Thus, despite gradient values
 not being considered in the present study, their use in sEMG signal analysis does not seem advisable.
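The limiting behavior described above can be written out explicitly. The form below is an assumption: it is one commonly used exponential membership function, with d denoting the distance between two embedding vectors, r the width and n the gradient. The definition adopted in this study is the one given in Section 2 and may differ in how r enters.

```latex
\mu(d; n, r) = \exp\!\left(-\left(\frac{d}{r}\right)^{n}\right),
\qquad
\lim_{n \to \infty} \mu(d; n, r) =
\begin{cases}
1,      & d < r \\
e^{-1}, & d = r \\
0,      & d > r
\end{cases}
```

Under this assumption, increasing n sharpens the transition around d = r until vectors are labeled simply as similar or dissimilar, and the graded weighting that distinguishes the FEn from threshold-based entropies is lost.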
Regarding the other parameter for FEn computation, all the considered
r values allowed us to distinguish the expected number of gestures (see
Section 3) and the goodness of clustering appeared poorly affected by its numerical values: the average distance of each point from the respective cluster showed almost the same value, for each
r and each gesture set (
Figure 5). This aspect highlights the validity of using the FEn as a possible feature for EMG-based gesture recognition, even when dealing with poor-quality signals such as those provided by the Myo armband. Further, it also suggests that the FEn is suited to describing hand gestures whose functional characteristics differ from those considered in [
16], involving indeed pure finger movements (set 1) and complex gestures which mimic routine daily activities (set 3). The same holds also considering the distances of all points from every cluster (
Figure 6), where in addition the clusters for the first set of gestures were more spaced with respect to the other sets (
Figure 3). This indicates that by using the FEn together with an off-the-shelf device for forearm sEMG signal acquisition, it might be possible to distinguish subtle hand gestures involving single-finger movements in addition to the more common hand and wrist movements [
14,
16], considered in set 2.
The PEn was introduced [
30] as a tool for describing time-series complexity, based on a mapping of neighboring values to ordinal patterns, and it appeared to be a promising measure for characterizing sEMG signals, being able to correctly distinguish hand gestures within all the considered sets (
Figure 4). However, as for the FEn, the computational parameters play a significant role. An embedding dimension of seven did not allow us to recognize the expected number of clusters in any of the three gesture sets (see
Section 3). This represents an easily expected result, considering that with a sampling rate of 200 Hz and an epoch of muscular activation equal to about 3 s, the length of the considered time-series was at most 600 samples, far smaller than the 5040 samples required to satisfy the constraint
(see
Section 2) for having a reliable PEn estimation [
32]. For the remaining embedding dimensions, the clustering worked better when d = 5 (see Section 3), particularly for set 2. The better performance of d = 5 with respect to d = 3 seemed to be confirmed by the points-to-centroid distances within each cluster, which were lower for every gesture set (Figure 5), and also by the higher cluster sparsity (Figure 6). Thus, taking into account a higher number of possible permutations (5! versus 3!) could allow for a more detailed characterization of the sEMG signal. In some cases
d was selected, depending on the number of samples, in order to comply with the above-mentioned constraint regarding the data length. Fadlallah et al. [
31] used
on EEG time series in order to investigate the suitability of a modified version of the PEn to assess epileptic seizures by detecting abrupt changes in signal amplitude. This choice was mainly related to the width of the windows used in PEn computation (below 100 samples) and was adopted also in [
35], where for heart rate data including 256 samples, an embedding dimension of three was employed. On the contrary, despite PEn being computed in some cases for large amounts of data with high
d values, for instance, up to eight for
[
36], other studies reported that even if a relatively large number of data samples were available, the selected embedding dimension did not exceed five for
or even four for
[
34,
45]. Thus, the choice of the embedding dimension for PEn computation still represents an issue which deserves to be carefully investigated when dealing with biological time series. However, a possible even if partial explanation for the better performance of PEn with a relatively higher embedding dimension (
) observed in this study could lie in the frequency characteristics of sEMG signal. The greater bandwidth of the latter, with respect to EEG and heart-rate signals, leads to more complex patterns in terms of variations of signal amplitude in a given temporal epoch. Hence, a higher number of possible permutations (
) could be more appropriate for describing the sEMG’s fast and complex dynamics, even if the number of samples for each muscular activation during gesture performance (
) is comparable to or lower than the numbers previously reported for EEG and heart-rate descriptions. On the contrary, small embedding dimensions encompass few ordinal schemes and may not be suitable for capturing time-series patterns with fast dynamics [
34].
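The data-length argument used above for the embedding dimension can be checked with simple arithmetic. The sketch below assumes the constraint takes the minimal form "number of samples at least d!", which is consistent with the 5040 samples quoted for an embedding dimension of seven; the exact constraint is the one stated in Section 2. The function name `pattern_budget` is hypothetical.

```python
import math


def pattern_budget(fs_hz, duration_s, dims):
    """Map each embedding dimension d to (d!, whether N >= d!).

    N is the number of samples in one epoch: fs_hz * duration_s.
    """
    n_samples = round(fs_hz * duration_s)
    return {
        d: (math.factorial(d), n_samples >= math.factorial(d))
        for d in dims
    }


# Myo armband figures from the text: 200 Hz sampling, about 3 s epochs.
budget = pattern_budget(200, 3, (3, 5, 7))
for d, (n_patterns, enough) in budget.items():
    print(f"d = {d}: {n_patterns} patterns, 600 samples sufficient: {enough}")
```

With 600 samples, dimensions three and five (6 and 120 possible patterns) satisfy the assumed constraint, while seven (5040 patterns) does not.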
Both complexity measures were thereby capable of properly characterizing hand gestures from forearm sEMG data, but a direct comparison between FEn and PEn is hardly feasible, due to their different ranges of values and also considering the different computational logic used for assessing complexity, the first being related to the geometrical distance between embedding vectors [
16] and the second being based on the permutation patterns of ordinal symbolic schemes [
30,
32]. A possible advantage of the PEn lies in its dependence upon a single parameter, whereas the FEn requires three parameters to be set in advance. This is convenient from the experimenter's point of view, since it eases the initial setup of the measure, but it could potentially limit the application of PEn to biological time series, where the heterogeneous nature of each signal [
16,
31,
46] may require more refined parameter choices, such as those provided by the FEn. It is worth noticing, however, that PEn could present a significant advantage in terms of computational load, which is far lower than that of the FEn: for a white noise time series of length
and an embedding dimension of three for both FEn and PEn, the former takes 4.61 s to be computed while the latter only 0.08 s. Despite this aspect depending also on the particular implementation and hardware characteristics, it could be valuable in view of using such a complexity measure for real time control of robotic arms or upper limb prostheses [
11].
The results of this study showed that, with proper parameter settings, both FEn and PEn are able to provide a clear separation in all three considered sets of gestures, which would likely improve the performance of a classification architecture fed by this kind of feature. Thus, the entropy-based features investigated in the present study could be used in conjunction with more classical and well-acknowledged features [
3,
9], giving a more consistent description of the sEMG patterns characterizing specific motor tasks. The latter aspect seems to be supported by the comparison of FEn and PEn with a set of 18 classical time and frequency features, in terms of predictive importance score (
Figure 7). The analysis performed through the mRMR algorithm showed that for two out of three sets of gestures (set 1 and set 3) the proposed complexity measures were those with the highest scores (
Figure 7a,c); there was a non-negligible drop in score for the third one in both cases. Thus, both the proposed complexity measures appeared to fall within the subset of features able to better characterize subtle movements involving finger numbering and relatively complex grasping. Considering set 2, FEn and PEn were within the five most predictive features (
Figure 7b) and limited differences in their scores with respect to the first one were recognizable (0.016 and 0.013 versus 0.022). This could indicate that the FEn and PEn share with classical time domain features a significant and almost comparable predictive importance, which can enhance the characterization also of gestures related to forearm and wrist movements. In addition, it deserves to be underlined that irrespective of the gesture set, the SEn was one of the five most important features (
Figure 7), confirming its validity as an sEMG feature for upper limb movement recognition [
17] and at the same time supporting the use of complexity measures such as those considered in this study in the field of hand gesture characterization.
The results of the analysis based on the predictive importance scores seem to be confirmed by the investigation of FEn and PEn in terms of classification accuracy (
Table 2). The addition of the FEn alone and of the PEn alone to each of the four considered feature sets led in every case to a substantial increase in the classification accuracy. It is worth underlining that the accuracy of the HDF set, which comprises both time and frequency domain features, was also noticeably enhanced by the addition of even a single entropy measure, reaching values beyond 80% for the first gesture set and beyond 90% for the other two sets (Table 2). The classification accuracy also allowed an indirect comparison between FEn and PEn, in order to gain information about whether one of the two complexity measures could be considered more appropriate for inclusion in a feature set for gesture recognition. For all the feature sets, the inclusion of the FEn appeared to provide a limited increase in the classification accuracy with respect to the PEn, with a difference in any case not greater than 5% (Table 2). However, it should be noted that when FEn and PEn were added at the same time within a feature set, the accuracy showed a dramatic improvement, as was the case for the H feature set, which included the lowest number of features. In the latter case the accuracy rose to 80% for set 1 and reached almost 90% for sets 2 and 3 (Table 2); for the HDF set, which encompassed both time and frequency features, the accuracy increased beyond 90% for sets 2 and 3 and up to 88% for set 1. Further, the best accuracy was attained when both entropy measures were employed as myoelectric features, since in that case the accuracy was noticeably greater than when FEn and PEn were considered individually. Therefore, the outcomes of this study suggest considering both complexity measures within the same feature set, since each of them appeared to provide a different kind of information regarding the sEMG signal complexity, thereby avoiding feature redundancy. Incidentally, the latter aspect is likely due to the different core logic behind the computation of FEn and PEn, the former being based on a distance metric to compare subsequent samples while the latter relies on the relative occurrence of an ordinal mapping of neighboring values (see
Section 2.1).
A few words are in order on the choice of analyzing data recorded from the Myo armband. Although this kind of device presents some technical limitations, e.g., a limited sampling rate which does not allow one to exploit the full sEMG signal bandwidth, it has been used as a myoelectric signal recording system in a number of different studies focused on the topic of hand gesture recognition [
2,
5,
11,
19]. The above-mentioned works comprise also the investigation of cutting edge algorithms in the artificial intelligence field, such as deep learning algorithms and recurrent neural networks [
10,
13,
14]. Further, the Myo armband was also used for collecting databases of upper limb gestures [
20] which became in turn popular benchmarks for testing gesture recognition architectures [
13]. Hence, the use of the Myo armband as a recording device appears to be well assessed in the context of gesture recognition driven by sEMG signals [
7]. On the other hand, the wide use of the Myo armband in the field of upper limb motion recognition, joined with the above-mentioned limitations, led to examining the device characteristics in terms of recorded signal and its processing. For instance, Kanoga et al. [
15] tested the long-period reliability of the Myo armband, while in [
47] different filtering techniques were compared through the processing of Myo armband sEMG signals. It is noteworthy that despite the limited bandwidth, features in the frequency domain are commonly computed from myoelectric signals recorded by the Myo armband. Beyond the more classical MNF and MDNF [
2,
5], measures related to the power spectrum were considered too [
5], and in some cases the entire signal spectrogram was taken into account [
48] along with features in the time-frequency domain, e.g., wavelet related measures [
14]. The low sampling rate thus does not seem to undermine the use of frequency-related information from the sEMG signal recorded by the Myo armband, and for gesture recognition purposes the use of devices with limited recording rates is not ruled out. Hence, all these aspects support assessing the reliability of signal processing and computational techniques directly on data recorded from low-cost devices such as the Myo armband, since its technical limitations do not seem to significantly affect its validity in the hand gesture recognition field.
As reported above, the sampling rate of the Myo armband restricts the available bandwidth of the acquired signal, and thus the validity of the results presented in this study when using a proper sampling rate for myoelectric recording (1 kHz or above) requires brief discussion. Both FEn and PEn measure the regularity of a finite-length time series, in terms of pattern repeatability, and thus they do not belong to time or frequency domains [
49,
50,
51]. Broadly speaking, entropy measures assess the existence of complex patterns in a time series, which describes the temporal development of a physical phenomenon, assigning a degree of repeatability lying in between two opposite extrema: a total randomness (e.g., white noise) and a total predictability (e.g., sinusoidal waves). Therefore, entropy measures deal with the existence of inner structures belonging to data distribution, mirroring in turn inner characteristics of a physical phenomenon which are not directly linked with the number of available data samples, which depend on each particular experimental setup [
49]. This appears to be confirmed also for the two entropy measures investigated in this study, considering that both FEn and PEn showed a relatively consistent behavior in assessing complexity of the same kind of signal with different numbers of data samples [
32,
39]. However, the latter does not imply that the issues related to data length and sampling rate are completely negligible because it is obvious that too few data samples could not allow a proper exploitation of the underlying structural characteristics of the signal, leading to poor reliability of entropy measurements [
49]. On the other hand, it is reasonable to assume that data acquired with very different sampling rates, e.g., 1 Hz and 10 kHz, could lead to different entropy measurements as well, since with a limited recording frequency the inherent dynamics of the system under consideration would likely be captured with less accuracy and some structural characteristics in terms of pattern repeatability could be lost [49]. Although it is not possible to determine a priori a suitable number of samples, a recommended rule of thumb is to have at least 200 data points in order to ensure the validity of entropy computation [50, 51]. In this study, feature extraction was performed on sEMG bursts related to gesture execution, which lasted at least 3 s, thereby producing myoelectric time series of 600 samples, well above the minimum number of samples reported above. Further, downsampling raw data is a common procedure in entropy-based analyses, aimed at reducing computational loads without degrading the reliability of the outcomes [
51]. Therefore, for all these reasons, results of this study seem to not be affected by the relatively low sampling rate used for data acquisition and could likely maintain their validity if different recording devices were to be adopted. However, this aspect deserves to be investigated in further studies, focused on a direct comparison between different acquisition systems, considering, for instance, that high-density sEMG systems are also employed for hand gesture recognition purposes [
4].
It is worth noticing that the armband channel selection was based on the lowest signal-to-noise ratio, thereby discarding the information provided by the remaining sensors. However, the integration of sEMG signals from all the Myo sensors could further improve the gesture characterization, providing global information on the whole forearm’s muscular activity. Due to the above-mentioned characteristics, the influence of armband positioning and thus probe placement on the myoelectric signal recorded by the device represents an additional topic of interest which warrants careful investigation in dedicated studies. This appears to be of particular importance considering that the Myo armband could present poor placement repeatability between experimental sessions [
15] and that, further, sensor misplacements can involve not only shifts around the forearm but also proximal and distal probe misalignments, which would likely degrade the sEMG signals recorded by the Myo armband. Additionally, the use of different entropy measures should be evaluated in future studies, in order to assess whether the sEMG could be better characterized by complexity measures based on time-series dispersion or sample distribution [
28,
29].