Feature Extraction Based on Sparse Coding Approach for Hand Grasp Type Classification

Samkunta, Jirayu; Ketthong, Patinya; Mai, Nghia Thi; Kamal, Md Abdus Samad; Murakami, Iwanori; Yamada, Kou

doi:10.3390/a17060240


by Jirayu Samkunta ¹,*, Patinya Ketthong ¹,², Nghia Thi Mai ³, Md Abdus Samad Kamal ⁴, Iwanori Murakami ⁴ and Kou Yamada ⁴,*

¹ Graduate School of Science and Technology, Gunma University, Maebashi 376-8515, Japan
² Faculty of Engineering, Thai-Nichi Institute of Technology, 1771/1 Pattanakarn Rd. Suanluang, Bangkok 10250, Thailand
³ Department of Electrical and Electronic 1, Posts and Telecommunications Institute of Technology, Hanoi 100000, Vietnam
⁴ Division of Mechanical Science and Technology, Gunma University, Maebashi 376-8515, Japan
* Authors to whom correspondence should be addressed.

Algorithms 2024, 17(6), 240; https://doi.org/10.3390/a17060240

Submission received: 11 March 2024 / Revised: 26 May 2024 / Accepted: 28 May 2024 / Published: 3 June 2024

(This article belongs to the Special Issue Algorithms for Feature Selection (2nd Edition))


Abstract:

The kinematics of the human hand exhibit complex and diverse characteristics unique to each individual. Various techniques such as vision-based, ultrasonic-based, and data-glove-based approaches have been employed to analyze human hand movements. However, a critical challenge remains in efficiently analyzing and classifying hand grasp types based on time-series kinematic data. In this paper, we propose a novel sparse coding feature extraction technique based on dictionary learning to address this challenge. Our method enhances model accuracy, reduces training time, and minimizes overfitting risk. We benchmarked our approach against principal component analysis (PCA) and sparse coding based on a Gaussian random dictionary. Our results demonstrate a significant improvement in classification accuracy: our method achieves 81.78%, compared to 31.43% for PCA and 77.27% for the Gaussian random dictionary. Furthermore, our technique outperforms both baselines in terms of macro-average F1-score and average area under the curve (AUC) while also significantly reducing the number of features required.

Keywords:

feature extraction; sparse coding; human grasp types; classification; dictionary learning

1. Introduction

Understanding human hand characteristics involves investigating hand kinematics during various grasping actions and general patterns of hand usage. This research is critical across domains such as medicine, rehabilitation, psychology, and product design. Hand kinematics are studied to understand the characteristics of healthy hand movements [1,2] and to inform object design based on grasping patterns [3,4]. Moreover, the study of human hand manipulation has practical applications in the field of robotics, where the versatility of hand kinematics plays a crucial role. To facilitate effective manipulation, human hands exhibit coordination and a multitude of degrees of freedom (DoFs). Therefore, comprehensive hand kinematics data encompassing a wide range of DoFs are essential for studying interactions with various objects under different environmental conditions.

Existing methods for analyzing hand kinematics often fail to capture the features needed for precise classification. Dimensionality reduction techniques therefore become imperative and are applied with all hand kinematic recording techniques. They are used as a pre-processing step and can eliminate irrelevant data, noise, and redundant features. Dimensionality reduction is performed by two main methods: feature selection and feature extraction. These techniques are used to analyze and interpret the intricate behavior of the human hand [5]. Feature selection involves choosing the most relevant features that adequately describe the essential indicators. Feature extraction, on the other hand, entails identifying and transforming the most informative features from a given dataset. Both are fundamental steps in machine learning models [6], as the quality and relevance of features significantly impact the model’s performance and accuracy. Feature extraction is therefore needed to identify and extract the essential information so that hand kinematics can be investigated more easily.

Because feature extraction directly affects model performance and accuracy, feature representation is important for analyzing hand kinematics. It involves transforming raw data into a format that highlights essential characteristics, making it easier to identify patterns and perform classification tasks. Accurate feature representation can significantly enhance the performance of machine learning models used for hand kinematic analysis. Hence, this paper addresses this gap by proposing a novel sparse coding feature extraction technique based on dictionary learning. Our contributions are threefold: we introduce a new feature extraction method tailored for hand kinematics time-series data, demonstrate the effectiveness of our method through extensive experimental evaluation, and significantly improve classification accuracy while reducing the number of features compared to existing methods.

Existing hand-based feature extraction methods can be divided into four main categories. The first category includes global and local statistical-based methods [7,8,9,10] that use features like local binary histograms and graph structures to capture information from the hand image. Examples of these methods include generalized symmetric local graph structure (GSLGS) [7], local binary pattern (LBP) [8], and histogram of oriented lines (HoL) [9] and its alternatives [10]. These methods are popular for hand-based biometric recognition. Another category is coding-based methods [11,12,13,14,15], which focus on encoding directional information in the hand image. This information is important because hand patterns often have distinct orientations that are resistant to changes in lighting. Examples of coding-based methods include sparse coding (SC) [15], ordinal code [11], competitive code (Compcode) [12], collaborative representation_Compcode (CR_Compcode) [13], and joint discriminative sparse coding (JDSC) [14]. Subspace-learning-based approaches [16,17,18,19,20,21,22] are another category. These methods, like principal component analysis (PCA) [16,17,18,19,21,22] and linear discriminant analysis (LDA) [20], aim to reduce the dimensionality of the data by projecting it into a lower-dimensional space. This can help improve recognition accuracy. Finally, deep-learning based methods [23,24] have recently emerged as a powerful tool for hand-based biometric recognition. These methods have shown promising performance for capturing discriminative features from hand images.

Despite recent advancements in hand kinematic analysis, feature extraction approaches still have several limitations. For example, most feature extraction techniques are specific to hand kinematic datasets derived from hand image datasets. Additionally, some techniques are only suitable for specific hand analyses using EMG signals [24,25] and are difficult to extend to other types of hand kinematic data. Moreover, most of the traditional feature extraction methods are focused on extracting features based on subspace-learning-based approaches while not adequately exploiting the consistency and complementary information among the other types of hand kinematic data, especially hand kinematic time-series formats.

To overcome the challenges in hand kinematic analysis, we propose a methodology consisting of three primary steps. First, the hand kinematic dataset undergoes preprocessing through resampling. Next, various feature extraction techniques are applied, including raw data, PCA, sparse coding based on a Gaussian random dictionary, and our proposed sparse coding based on dictionary learning. Finally, the extracted features are used for neural network classification. Our method leverages the sparsity-inducing properties of dictionary learning to capture relevant features from hand kinematics time-series data, as detailed in Figure 1.

Our proposed method aims to enhance classification accuracy for identifying grasp types using hand kinematics in a time-series format. The proposed feature extraction technique leverages sparse coding based on dictionary learning, as illustrated in Figure 2. Building on existing sparse coding techniques, our approach provides the sparsest solution to an underdetermined linear problem by generating sparse coefficients. The process involves resampling the kinematic dataset to a specific size, vertically concatenating the resampled data, and partitioning the dataset into training and test sets using random partitioning. The training set is used to derive a dictionary through an online dictionary learning algorithm, which is then employed to extract sparse coefficients from both the training and test sets. These sparse coefficients are subsequently used as inputs for neural network classification, as depicted in Figure 3.

The main contributions of this paper are summarized as follows:

We propose a sparse coding technique based on dictionary learning to extract hand kinematic features, which is a context that has not been extensively explored before.
Unlike most existing methods that extract features from hand datasets using images, our proposed method demonstrates the potential of using three-dimensional motion tracking using a time-series format for feature extraction.
Our approach differs from our previous work that utilized traditional sparse coding with a dictionary based on a Gaussian random distribution. Instead, we apply a dictionary learning technique to the hand kinematics dataset in a time-series format. Extensive experimental evaluation of the publicly available UNIPI dataset demonstrates the effectiveness of our proposed method compared to existing techniques and our prior work.
A key distinction from previous studies lies in the estimation technique used. While previous work employed the Frobenius norm for solving the optimization problem to obtain sparse coefficients, this work utilizes the L1-norm as the optimizer. This change results in a sparse representation, minimizing the number of features required.

The organization of the work is as follows. In Section 2, we review related work on feature extraction based on coding and our previous work on sparse coding for hand analysis. Section 3 details sparse coding feature extraction based on dictionary learning. In Section 4, we present the neural network classification used to evaluate the proposed technique. Section 5 describes the details of the dataset and the obtained experimental results. The discussion and limitations are presented in Section 6, while the conclusions are summarized in Section 7.

2. Related Work

Feature extraction based on sparse representations has been widely introduced for several tasks [26,27,28,29]. For example, Liu et al. [26] designed sparse coding based on dictionary learning with orthogonal matching pursuit (OMP) to extract sparse features for fault classification and recognition. Ma et al. [27] presented a joint sparse coding learning method for early fault feature extraction in rotating machinery with the aim of preserving weak fault features, promoting sparsity, and removing noise, thereby enhancing predictive maintenance. G. S. V. S. Sivaram et al. [28] introduced a novel speech recognition feature extraction technique using sparse coding, in which speech spectro–temporal patterns are represented as sparse linear combinations of an overcomplete set of dictionaries. The proposed technique outperforms conventional features in both clean and noisy conditions. H. Amintoosi et al. [29] presented a novel two-factor authentication mechanism for IoT devices, addressing the challenge of heterogeneity and security concerns. The proposed method utilizes sparse coding for feature extraction in remote user biometric authentication. By employing hash operations and an overcomplete dictionary, it efficiently stores and retrieves biometric data. Furthermore, many recent works have shown that sparse coding is well suited to biometric recognition and classification tasks [15,30,31]. For example, B. M. Whitaker et al. [30] presented a method for classifying heart sounds using sparse coding, in which preprocessed audio data are decomposed into a dictionary matrix representing key features and a sparse coefficient matrix mapping these features to each segment. M. C. Yo et al. [31] investigated the impact of sparse coding in face recognition by using PCA and sparse representation classification (SRC) with a public image dataset. The proposed method enhances face recognition with significant accuracy.
In our previous work [15], we proposed a feature extraction method for hand gesture classification using sparse coding based on a Gaussian random dictionary to reduce the number of features. The optimizer in the previous work was the Frobenius norm, which was used to solve the optimization problem and obtain sparse coefficients. Additionally, the evaluation was performed on a dataset consisting of only five objects (two-Euro coin, credit card, salt shaker, screw, and marker). The results showed that the sparse-coding-based approach improved feature reduction performance. However, the classification accuracy was not satisfactory. Several related works have shown that sparse-representation-based approaches have achieved satisfactory performance and motivated various works in the feature extraction tasks [15,26,27,28,29,30,31].

Inspired by the success of the sparse-coding-based approach, this work proposes a sparse coding method based on a dictionary learning algorithm. This method utilizes the L1-norm as the optimizer to find the sparsest coefficients (the lowest number of nonzeros) for hand-grasp-type classification.

3. Sparse Coding Feature Extraction Based on Dictionary Learning Approach

Sparse coding represents an unsupervised learning approach with the objective of attaining a sparse representation of input data through a linear combination of fundamental elements known as atoms, which are organized in a dictionary. In this paper, we propose the utilization of sparse coding based on dictionary learning as a feature extraction method. This method is designed to capture relevant features from data. The main idea of the proposed method is related to the ability to reconstruct high-dimensional signals using only a few linear measurements under the condition that the signal exhibits sparsity or near-sparsity. The algorithm’s outline is shown in Algorithm 1. Initially, an initial dictionary is generated by randomizing raw data. Subsequently, this initial dictionary is employed in the construction of an updated dictionary through the application of an online dictionary learning algorithm [32]:

$$\min_{D \in C} \lim_{n \to +\infty} \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{2} \|x_{i} - D \alpha_{i}\|_{2}^{2} + \lambda \|\alpha_{i}\|_{1} \right) \tag{1}$$

Here $x = [x_{1}, \dots, x_{n}] \in \mathbb{R}^{m \times n}$ is a training set of signals, $D$ is a dictionary, $\lambda$ is the sparsity-inducing regularization parameter, $\alpha = [\alpha_{1}, \dots, \alpha_{n}] \in \mathbb{R}^{k \times n}$ are the coefficients of the sparse decomposition, and $C$ is the convex set of matrices satisfying the following constraint:

$$C \triangleq \{ D \in \mathbb{R}^{m \times k} \ \text{s.t.} \ \forall j = 1, \dots, k, \ d_{j}^{T} d_{j} \leq 1 \} \tag{2}$$

Note that $k$ is set to 70 to construct the dictionary. Assume a matrix of signals $x = [x_{1}, \dots, x_{n}] \in \mathbb{R}^{m \times n}$ and a learned dictionary from Equation (1). To obtain the matrix of sparse coefficients $\alpha = [\alpha_{1}, \dots, \alpha_{n}] \in \mathbb{R}^{k \times n}$, the learned dictionary was used with a sparse decomposition technique [33] that implements the least angle regression (LARS) algorithm [34], based on the L1-norm, to solve the lasso problem of Equation (3). The LARS algorithm solves the lasso problem efficiently. Moreover, it is particularly useful for feature extraction, as it reduces the number of features by driving most coefficients to exactly zero and speeds up the process when it is posed as a matrix factorization problem:

$$\min_{\alpha \in \mathbb{R}^{k}} \|x - D \alpha\|_{2}^{2} \ \text{s.t.} \ \|\alpha\|_{1} \leq \lambda \tag{3}$$

The matrix of sparse coefficients $\alpha$ from Equation (3) represents the significant features of hand kinematics and is used as the training set for classification to evaluate the performance of the feature extraction method based on the sparse coding approach.
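
As a rough illustration of how an L1 penalty produces exact zeros in the coefficients, the following sketch solves the penalized lasso form (the form that appears as Equation (4) in Algorithm 1) for a single signal. It is only a sketch under stated assumptions: it uses the iterative shrinkage-thresholding algorithm (ISTA) as a simple stand-in for LARS, which is what the paper actually uses, and the function names, the toy dictionary, and the value of `lam` are all hypothetical.

```python
import math

def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t * |v|)."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def sparse_code_ista(x, D, lam, n_iter=500):
    """Approximate argmin_a 0.5*||x - D a||_2^2 + lam*||a||_1 with ISTA.

    x: list of m floats; D: m x k dictionary given as a list of rows.
    ISTA is an assumption made for brevity; the paper itself uses LARS.
    """
    m, k = len(D), len(D[0])
    # Step size 1/L, where L is bounded by the squared Frobenius norm of D.
    L = sum(D[i][j] ** 2 for i in range(m) for j in range(k))
    alpha = [0.0] * k
    for _ in range(n_iter):
        # Residual r = D alpha - x
        r = [sum(D[i][j] * alpha[j] for j in range(k)) - x[i] for i in range(m)]
        # Gradient of the quadratic term: D^T r
        g = [sum(D[i][j] * r[i] for i in range(m)) for j in range(k)]
        # Gradient step followed by soft-thresholding
        alpha = [soft_threshold(alpha[j] - g[j] / L, lam / L) for j in range(k)]
    return alpha

# Toy example (hypothetical data): identity dictionary, one dominant component.
D = [[1.0, 0.0],
     [0.0, 1.0]]
x = [2.0, 0.05]
alpha = sparse_code_ista(x, D, lam=0.1)
print(alpha)
```

In this toy case the small second component of `x` falls below the threshold and its coefficient is set exactly to zero, which is the mechanism by which the L1-norm reduces the number of nonzero features.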

Algorithm 1 Online Dictionary Learning

Require: $x \in \mathbb{R}^{m} \sim p(x)$ (independent and identically distributed samples of $p$), regularization parameter $\lambda \in \mathbb{R}$, initial dictionary $D_{0} \in \mathbb{R}^{m \times k}$, number of iterations $T$

1: $A_{0} \in \mathbb{R}^{k \times k} \leftarrow 0$, $B_{0} \in \mathbb{R}^{m \times k} \leftarrow 0$ (reset the past information)
2: for $t = 1$ to $T$ do
3:   Draw $x_{t}$ from $p(x)$
4:   Sparse coding: compute using LARS

$$\alpha_{t} \triangleq \underset{\alpha \in \mathbb{R}^{k}}{\arg\min} \ \frac{1}{2} \|x_{t} - D_{t-1} \alpha\|_{2}^{2} + \lambda \|\alpha\|_{1} \tag{4}$$

5:   $A_{t} \leftarrow A_{t-1} + \alpha_{t} \alpha_{t}^{T}$
6:   $B_{t} \leftarrow B_{t-1} + x_{t} \alpha_{t}^{T}$
7:   Compute $D_{t}$ using Algorithm 2, with $D_{t-1}$ as warm restart:

$$\begin{aligned} D_{t} &\triangleq \underset{D \in C}{\arg\min} \ \frac{1}{t} \sum_{i=1}^{t} \left( \frac{1}{2} \|x_{i} - D \alpha_{i}\|_{2}^{2} + \lambda \|\alpha_{i}\|_{1} \right) \\ &= \underset{D \in C}{\arg\min} \ \frac{1}{t} \left( \frac{1}{2} \mathrm{Tr}(D^{T} D A_{t}) - \mathrm{Tr}(D^{T} B_{t}) \right) \end{aligned} \tag{5}$$

8: end for
9: Return $D_{T}$ (learned dictionary)
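
The role of the accumulators $A_{t}$ and $B_{t}$ from steps 5 and 6 can be seen by expanding the quadratic term of the objective in Equation (5). The following short derivation is a sketch that follows the online dictionary learning formulation cited as [32]; it uses only the definitions $A_{t} = \sum_{i=1}^{t} \alpha_{i} \alpha_{i}^{T}$ and $B_{t} = \sum_{i=1}^{t} x_{i} \alpha_{i}^{T}$ implied by those two steps.

```latex
\begin{aligned}
\sum_{i=1}^{t} \frac{1}{2} \|x_{i} - D\alpha_{i}\|_{2}^{2}
  &= \frac{1}{2} \sum_{i=1}^{t} x_{i}^{T} x_{i}
     - \sum_{i=1}^{t} x_{i}^{T} D \alpha_{i}
     + \frac{1}{2} \sum_{i=1}^{t} \alpha_{i}^{T} D^{T} D \alpha_{i} \\
  &= \frac{1}{2} \sum_{i=1}^{t} x_{i}^{T} x_{i}
     - \mathrm{Tr}\!\left(D^{T} B_{t}\right)
     + \frac{1}{2} \mathrm{Tr}\!\left(D^{T} D A_{t}\right)
\end{aligned}
```

The first term and the penalties $\lambda \|\alpha_{i}\|_{1}$ do not depend on $D$, so minimizing over $D$ only requires $A_{t}$ and $B_{t}$. This is why the algorithm does not need to store the past samples $x_{i}$ and coefficients $\alpha_{i}$ individually.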

Algorithm 2 Dictionary Update

Require: Input dictionary $D = [d_{1}, \dots, d_{k}] \in \mathbb{R}^{m \times k}$,

1: $A = [a_{1}, \dots, a_{k}] \in \mathbb{R}^{k \times k}$, $B = [b_{1}, \dots, b_{k}] \in \mathbb{R}^{m \times k}$
2: repeat
3:   for $j = 1$ to $k$ do
4:     Update the $j$-th column to optimize for Equation (5)

$$\begin{aligned} u_{j} &\leftarrow \frac{1}{A[j, j]} (b_{j} - D a_{j}) + d_{j}, \\ d_{j} &\leftarrow \frac{1}{\max(\|u_{j}\|_{2}, 1)} u_{j}. \end{aligned} \tag{6}$$

5:   end for
6: until convergence
7: Return $D$ (updated dictionary)
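
The column-wise update of Algorithm 2 can be sketched in a few lines. This is a minimal illustration rather than the toolbox implementation used in the paper: the function name `update_dictionary` is hypothetical, matrices are plain lists of rows, a fixed number of sweeps replaces the convergence test, and atoms with a zero diagonal entry in $A$ are simply skipped (an assumption made to avoid division by zero).

```python
import math

def update_dictionary(D, A, B, n_sweeps=10):
    """Block-coordinate descent over dictionary columns (Equation (6)).

    D: m x k, A: k x k, B: m x k, all as lists of rows. Returns a new D.
    A fixed number of sweeps stands in for the 'until convergence' test.
    """
    m, k = len(D), len(D[0])
    D = [row[:] for row in D]  # work on a copy
    for _ in range(n_sweeps):
        for j in range(k):
            if A[j][j] == 0.0:
                continue  # atom j has never been used; leave it unchanged
            # u_j = (b_j - D a_j) / A[j][j] + d_j
            u = []
            for i in range(m):
                Da_ij = sum(D[i][l] * A[l][j] for l in range(k))
                u.append((B[i][j] - Da_ij) / A[j][j] + D[i][j])
            # Project onto the unit ball: d_j = u_j / max(||u_j||_2, 1)
            scale = max(math.sqrt(sum(v * v for v in u)), 1.0)
            for i in range(m):
                D[i][j] = u[i] / scale
    return D

# Toy example (hypothetical data): one atom, one sample x with coefficient 1.
x, a = [3.0, 4.0], [1.0]
A = [[a[0] * a[0]]]               # A = alpha alpha^T
B = [[x[0] * a[0]], [x[1] * a[0]]]  # B = x alpha^T
D_new = update_dictionary([[1.0], [0.0]], A, B)
print(D_new)
```

In the toy example the single atom rotates toward the direction of `x` and is rescaled to unit norm, which satisfies the constraint set $C$ of Equation (2).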

4. Neural Network Classification

Neural network classification was used to evaluate the effectiveness of the proposed feature extraction. The UNIPI dataset, which contains hand kinematics data from 2 trials by 6 participants involving 19 objects, was used to train a model with the “fitcnet” function in MATLAB 2022b to classify the data into five grasp types. This training process employed a feedforward neural network for classification. The first fully connected layer of the model was connected to the predictor data, and each subsequent layer was connected to the preceding one. Within each layer, the input was multiplied by a weight matrix, and a bias vector was then added. The rectified linear unit (ReLU) function served as the activation function. The limited-memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS) algorithm was employed as the parameter estimation solver. In the final layer, the softmax function was applied as the activation function to generate the output, which consisted of classification scores and predicted labels. A summary of all hyperparameters used in the proposed method can be found in Table 1.
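
The forward pass described above (weight matrix, bias, ReLU, then softmax in the final layer) can be sketched as follows. This is only an illustration of the layer arithmetic, not the MATLAB model: the layer sizes and weights are hypothetical and untrained, and the LBFGS training step is not shown.

```python
import math

def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]

def softmax(v):
    """Numerically stable softmax; the outputs sum to 1."""
    peak = max(v)
    exps = [math.exp(a - peak) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def dense(v, W, b):
    """Fully connected layer: W v + b, with W given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + bias
            for row, bias in zip(W, b)]

def forward(features, layers):
    """Apply ReLU after every hidden layer and softmax after the last one."""
    h = features
    for W, b in layers[:-1]:
        h = relu(dense(h, W, b))
    W, b = layers[-1]
    return softmax(dense(h, W, b))

# Hypothetical, untrained weights: 2 inputs -> 2 hidden units -> 3 classes.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]], [0.0, 0.0, 0.0]),
]
scores = forward([1.0, -2.0], layers)
predicted_label = scores.index(max(scores))
print(scores, predicted_label)
```

The classification scores are the softmax outputs, and the predicted label is the index of the largest score. In the paper's setting, the input vector would be the extracted features and the output would have five entries, one per grasp type.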

5. Experiments

A kinematic hand dataset in a time-series format was utilized to demonstrate the effectiveness of the proposed method. The details of the dataset are described in this section. The dataset contains recordings of hand movements involving 21 objects and can be categorized into 5 grasp types, as shown in Table 2. The classification results of these five grasp types were used to evaluate the performance of the proposed method, which is also detailed in this section. All experiments were conducted on a MacBook Pro (14-inch, 2021) running macOS, equipped with an Apple M1 Pro 8-core CPU and 16 GB of unified memory, and using MATLAB 2022b.

5.1. Dataset

The UNIPI dataset [35] provides hand kinematics data, recorded with visual sensors, for investigating postural synergies of human grasping. Six subjects were asked to grasp objects, and the recordings describe hand posture behavior from a kinematic perspective. The six volunteers, three females and three males, were 23–27 years old, with a mean age of 25.17 years. All volunteers were right-hand dominant according to self-report, and none had neuromuscular disorders that would affect the purpose of the experiment. Before the experiment, all volunteers signed a consent form to participate in the investigation. The acquisition setup consisted of two main components. The first was the PhaseSpace Motion Capture system, which records kinematics data based on three-dimensional motion tracking with active LED markers. The system includes stereo cameras for tracking the 3D positions of active LED markers, which are attached to the volunteer’s hand and phalanges. The second was the set of 21 selected objects. In this work, two objects were excluded because we could not determine their hand grasp types. We therefore considered only 19 objects and classified them into 5 grasp types, as shown in Table 2. Figure 4 depicts example sequences illustrating the various grasp types. In the first row, the hand employs a key to execute a flip-type grasp. In the second row, a credit card is used for a sliding grasp, in which it is guided across the surface to the edge of the table. In the third row, tape is used for a robust grasp, demonstrating a closing grasp. The fourth row features a screw used for a pinch grasp. Finally, in the last row, a marker is used for reorientation in a rotation grasp.

The kinematics model of the human hand has complexity that needs to be described and investigated. In this work, the kinematics model of the human hand is considered based on 20 degrees of freedom, as illustrated in Figure 5. Four long fingers, consisting of the index, middle, ring, and little, are described by four angles. Each long finger is described based on motions of the joint that have two DoFs at the metacarpophalangeal joints to describe flexion–extension and abduction–adduction mobilities along with one DoF at the proximal and distal interphalangeal joints to describe flexion–extension mobilities. The thumb is represented based on motions of four joints that have two DoFs at the trapeziometacarpal for expressing flexion–extension and abduction–adduction mobilities, along with one DoF at the metacarpophalangeal joints, and one DoF at the interphalangeal joint to describe flexion–extension mobility. The kinematic structure of the human hand model is described by Denavit–Hartenberg (DH) parameters to express the kinematic chain of each finger [35]. The description of hand kinematics based on 20 DoFs is described in Table 3.

5.2. Results for the UNIPI Dataset

The UNIPI dataset provided the hand kinematics data. The dataset is divided into 19 different objects, with each object subjected to 2 separate trials. In the first step, we applied data processing techniques to ensure that the kinematic data were in a suitable time-series format by adjusting for differences in data length. This involved resampling with linear interpolation to adjust each recording to a fixed length of 500 time samples. The resampled data were then used to construct training and test sets using random partitions for holdout validation. The training set comprised approximately 80% of the kinematic data, while the test set accounted for the remaining 20%. The second step focused on feature extraction, where we employed the proposed technique based on dictionary learning constructed from the training set. To construct the learned dictionary, the number of atoms k was set to 70 and the number of iterations T was set to 100. The learned dictionary was used to obtain sparse coefficients from the training and test sets by sparse decomposition. Additionally, principal component analysis (PCA) was employed to extract features from the pre-processed data, serving as a benchmark for comparing the efficiency of the proposed technique with a traditional technique for classifying grasp types. PCA was selected as a benchmark because of its linearity and well-known effectiveness for hand-based feature extraction tasks. Specifically, we chose two principal components for the benchmark experiments. This choice aimed to capture essential information while minimizing the number of components and ensuring a meaningful representation of the entire dataset’s variance. These criteria provide a foundation for a meaningful and informative comparison with the proposed sparse coding approach.
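
The resampling and holdout partitioning steps can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the random seed, and the toy recordings are hypothetical, and the sketch splits whole recordings, which may differ from the unit that was actually partitioned in the experiments.

```python
import random

def resample_linear(series, new_len):
    """Linearly interpolate a 1-D series onto new_len evenly spaced points."""
    n = len(series)
    if n == 1 or new_len == 1:
        return [series[0]] * new_len
    out = []
    for i in range(new_len):
        pos = i * (n - 1) / (new_len - 1)  # fractional index into the original
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out

def holdout_split(items, test_fraction=0.2, seed=0):
    """Randomly partition items into (train, test) lists."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(items) * test_fraction)
    test = [items[i] for i in idx[:n_test]]
    train = [items[i] for i in idx[n_test:]]
    return train, test

# Toy recordings of different lengths (hypothetical joint-angle traces).
recordings = [[float(t) for t in range(length)] for length in range(100, 110)]
resampled = [resample_linear(r, 500) for r in recordings]
train_set, test_set = holdout_split(resampled, test_fraction=0.2)
print(len(train_set), len(test_set), len(resampled[0]))
```

After resampling, every recording has the same length, so the recordings can be stacked into a single matrix before feature extraction.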
The final step was neural network classification, where the goal was to identify grasp types from hand kinematic data. The sparse coefficients from the training set were used to train the NN classification model, and the sparse coefficients from the test set were used to evaluate its performance. The classification results are visualized using a confusion matrix. In summary, the proposed feature extraction process used sparse coding based on dictionary learning, with PCA as the benchmark feature extraction technique. The dictionary was constructed using a dictionary learning function [32]. Sparse coefficients were computed using a sparse decomposition toolbox [33], which implements the LARS algorithm to solve the lasso problem efficiently and accelerate the process. To evaluate the performance of the proposed technique and compare it with PCA, we used a neural network classification toolbox.

The experimental results obtained from the neural network classification are presented in Table 4, which compares the proposed method with other approaches. These approaches include using raw data (preprocessed without feature extraction), principal component analysis (PCA), sparse coding based on a Gaussian random dictionary, and the proposed dictionary learning technique. Accuracy was evaluated using confusion matrices for four cases: without feature extraction, PCA, sparse coding based on a Gaussian random dictionary, and the proposed method, as shown in Figure 6a,b,c,d, respectively. Table 4 shows significant differences in classification accuracy. The PCA method achieved an accuracy of 31.43%, which is markedly lower than that of the proposed method, which achieved an accuracy of 81.78%. Notably, the accuracy of the PCA method is also significantly lower than that of the raw data, which yielded an accuracy of 68.38%. Sparse coding based on the Gaussian random dictionary achieved an accuracy of 77.27%, demonstrating better performance than PCA but still lower performance than the proposed dictionary learning technique. Figure 7a–d shows the ROC curves of each feature extraction technique; the proposed method clearly provides a higher recall rate than the other techniques. Moreover, Table 4 includes the number of features used with raw data, PCA, the Gaussian random dictionary, and the proposed dictionary learning technique. The proposed technique achieved the highest accuracy with the fewest features for grasp type classification using hand kinematic data.

To further enhance the performance evaluation, the macro-average F1-scores and average area under the curve (AUC) scores, which are depicted in Figure 8, are included in Table 4. These additional metrics offer insights into the model’s precision, recall, and overall discriminative ability. The proposed sparse coding technique outperforms both PCA and the Gaussian random dictionary in terms of accuracy, macro-average F1-scores, and average AUC. Specifically, the macro-average F1-scores are 79.87 for the proposed method, 66.65 for raw data, 76.06 for the Gaussian random dictionary, and only 20.12 for PCA. The average AUC values further confirm the superior performance of the proposed method, with a score of 0.9535, compared to 0.8958 for raw data, 0.93926 for the Gaussian random dictionary, and 0.5701 for PCA. These numerical results emphasize the effectiveness of the proposed sparse coding approach for achieving accurate and robust hand grasp type classification with reduced feature dimensions.
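
For reference, accuracy and the macro-average F1-score can both be computed directly from a confusion matrix. The sketch below shows the arithmetic on a small made-up two-class matrix; the counts are hypothetical and are not taken from Table 4 or Figure 6.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def macro_f1(confusion):
    """Unweighted mean of per-class F1 scores.

    confusion[i][j] is the number of samples of true class i
    that were predicted as class j.
    """
    n = len(confusion)
    f1_scores = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                         # missed samples of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp    # wrongly assigned to c
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n

# Hypothetical two-class confusion matrix, for illustration only.
cm = [[8, 2],
      [1, 9]]
print(accuracy(cm), macro_f1(cm))
```

Because the macro average weights every class equally, a method that ignores a minority grasp type is penalized more heavily in macro-average F1 than in overall accuracy.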

6. Discussion and Limitations

In the discussion section, we analyze and interpret the results obtained from our experiments, focusing on various performance factors and how our proposed system addresses the challenges highlighted in the introductory part of the article. Firstly, we observed significant improvements in classification accuracy with the proposed sparse-coding-based feature extraction method compared to traditional techniques such as principal component analysis (PCA). As shown in Table 4, the sparse coding approach achieved an accuracy of 81.78%, outperforming PCA by a substantial margin. This improvement can be attributed to the ability of sparse coding to capture essential features while effectively reducing data dimensionality. By representing hand kinematic data as a sparse linear combination of basis functions, the sparse coding method can extract discriminative features crucial for accurate classification.

Additionally, our proposed method offers versatility and applicability across a wide range of classification tasks. Furthermore, the number of features was significantly reduced using the proposed method (678,678 features) compared to raw data (1,620,299 features), PCA (855,000 features), and sparse coding based on a Gaussian random dictionary (1,823,449 features). This reduction in feature dimensionality not only reduces the computational complexity but also enhances the interpretability of the model. Moreover, the macro-average F1-score and average AUC value for the proposed method were substantially higher, at 79.87 and 0.9535, respectively, than those of the other methods, further highlighting the superior performance of our proposed approach. While the sparse coding approach using a Gaussian random dictionary (shown in Table 4) achieved a respectable accuracy of 77.27%, it fell short of the performance achieved by our proposed dictionary learning method (81.78% accuracy). This highlights the importance of learning a dictionary that has been optimized for the specific data in order to achieve the best results.

In terms of addressing the issues highlighted, the proposed technique effectively tackles the challenge of feature extraction for hand kinematic analysis. By employing sparse coding based on dictionary learning techniques, we can extract meaningful features from high-dimensional kinematic data, reducing the complexity of the dataset while preserving essential information. This approach addresses the need for efficient data representation and dimensionality reduction in hand kinematic analysis, facilitating more accurate classification of grasping types. The experimental results validate the effectiveness of the sparse-coding-based feature extraction method based on dictionary learning for hand kinematic analysis and classification. By achieving higher accuracy compared to traditional techniques, these results indicate that our method not only enhances classification accuracy but also effectively reduces feature dimensionality, making it a valuable tool for practical applications in robotics, medicine, and rehabilitation. Future work could explore the application of our method to other types of hand kinematic data and investigate its performance in real-time systems.

In this research, the proposed feature extraction technique was implemented and evaluated only on the UNIPI dataset for classification purposes. Consequently, the results may not be directly applicable to other publicly available datasets related to hand kinematics. Furthermore, while the proposed technique proves effective for hand kinematic analysis, it may not be suitable for analyzing hand synergies due to its design limitations. Future research could explore further optimizations and extensions of the proposed method as well as its application in other domains such as natural language processing.

7. Conclusions

In this paper, we proposed a novel sparse coding feature extraction technique based on dictionary learning for classifying human hand grasp types. We compared the classification accuracy of the proposed technique with that of PCA and of sparse coding based on a Gaussian random dictionary. The classification experiments showed that the proposed method is effective at classifying hand grasp types: it achieves higher classification accuracy than the PCA-based and Gaussian-random-dictionary-based approaches while requiring fewer features. These findings have important implications for various applications, including robotics and rehabilitation. Future research will focus on extending our technique to other types of hand kinematic data and exploring its real-time application potential. Finally, the sparse coding feature extraction based on dictionary learning presented in this paper provides an alternative feature extraction method for classification. Its potential for broader applications makes it a valuable contribution to the field of machine learning (ML). In natural language processing (NLP) in particular, feature extraction is a fundamental step that converts raw text data into a format that machine learning algorithms can process easily.

Author Contributions

Conceptualization, J.S. and P.K.; methodology, J.S., P.K. and N.T.M.; software, J.S. and P.K.; validation, M.A.S.K. and I.M.; formal analysis, J.S. and K.Y.; investigation, K.Y.; resources, K.Y.; data curation, J.S.; writing—original draft preparation, J.S.; writing—review and editing, P.K. and K.Y.; supervision, K.Y.; project administration, K.Y.; funding acquisition, K.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.


Figure 1. Overview of proposed methodology.

Figure 2. The sparse-coding-based feature extraction technique based on dictionary learning.

Figure 3. Details of proposed method.

Figure 4. Sequences to illustrate five grasp types.

Figure 5. Kinematic model of the human hand.

Figure 6. Confusion matrices for NN classification using several feature extraction techniques: (a) raw data, (b) PCA, (c) sparse coding based on Gaussian random dictionary, and (d) sparse coding based on dictionary learning.

Figure 7. ROC curve of feature extraction techniques: (a) raw data, (b) PCA, (c) sparse coding based on Gaussian random dictionary, and (d) sparse coding based on dictionary learning.

Figure 8. Comparison of AUC values for each class between raw data, PCA, sparse coding based on Gaussian random dictionary, and dictionary learning approach.

Table 1. Hyperparameters of NN classification.

Hyperparameter	Value
Layer size	40
Cost function	Cross-entropy
Activation function	Rectified linear unit (ReLU)
Output classifier	Softmax function
Parameter estimation solver	Limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm (LBFGS)
Regularization parameter (lambda)	0
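As an illustration of how the settings in Table 1 fit together, the following sketch runs a single forward pass through a network with one hidden layer of 40 ReLU units, a softmax output, and a cross-entropy cost. It is only a sketch under assumptions: the input size of 20, the random weights, the placeholder input, and the label are hypothetical, and the LBFGS training step is not shown. The five output classes correspond to the five grasp types.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, bias):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def cross_entropy(probs, label):
    return -math.log(probs[label])


def init_layer(n_out, n_in, rng):
    # Small random weights and zero biases (an arbitrary illustrative choice)
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
n_features, n_hidden, n_classes = 20, 40, 5  # n_features is an assumption
W1, b1 = init_layer(n_hidden, n_features, rng)
W2, b2 = init_layer(n_classes, n_hidden, rng)

x = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]  # placeholder feature vector
hidden = relu(dense(x, W1, b1))
probs = softmax(dense(hidden, W2, b2))
loss = cross_entropy(probs, label=2)  # label 2 is a hypothetical true class
print("class probabilities:", [round(p, 3) for p in probs])
print("cross-entropy loss:", round(loss, 3))
```

With the regularization parameter set to 0, the training objective would be this cross-entropy averaged over the training samples, with no penalty on the weights.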

Table 2. The details of 19 objects and hand grasp types.

No.	Object	Hand Grasp Type
1	2-Euro Coin	Flip
2	Button Badge	Flip
3	Key	Flip
4	Credit Card	Edge
5	CD	Edge
6	Hair-Coloring Comb	Edge
7	Saltshaker	Closing
8	Tape	Closing
9	Chess (Queen)	Closing
10	Knob	Closing
11	Matchbox	Closing
12	Screw	Pinch
13	Match	Pinch
14	Cigarette	Pinch
15	Rubber Band	Pinch
16	Maker	Rotation
17	Screwdriver	Rotation
18	Shashlik	Rotation
19	Glasses	Rotation

Table 3. Description of degrees of freedom.

No.	DoFs	Description
1	TA	Thumb Abduction
2	TR	Thumb Rotation
3	TM	Thumb Metacarpal
4	TI	Thumb Interphalangeal
5	IA	Index Abduction
6	IM	Index Metacarpal
7	IP	Index Proximal
8	ID	Index Distal
9	MA	Middle Abduction
10	MM	Middle Metacarpal
11	MP	Middle Proximal
12	MD	Middle Distal
13	RA	Ring Abduction
14	RM	Ring Metacarpal
15	RP	Ring Proximal
16	RD	Ring Distal
17	LA	Little Abduction
18	LM	Little Metacarpal
19	LP	Little Proximal
20	LD	Little Distal

Table 4. Comparison of classification performance between raw data, principal component analysis (PCA), sparse coding based on Gaussian random dictionary, and dictionary learning.

Method	Accuracy (%)	Number of Features	Macro-Average F1-Score (%)	Average AUC
Raw Data	68.38	1,620,299	66.65	0.8958
PCA	31.43	855,000	20.12	0.5701
Gaussian Random	77.27	1,823,449	76.06	0.93926
Dictionary Learning	81.78	678,678	79.87	0.9535
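The relative differences implied by Table 4 can be worked out with a few lines of arithmetic. The sketch below copies the accuracy and feature counts from the table and reports, for each baseline, how many fewer features the dictionary learning method uses and how many accuracy points it gains. The function name `compare` is an illustrative assumption.

```python
# Accuracy (%) and number of features copied from Table 4
table4 = {
    "Raw Data": {"accuracy": 68.38, "features": 1_620_299},
    "PCA": {"accuracy": 31.43, "features": 855_000},
    "Gaussian Random": {"accuracy": 77.27, "features": 1_823_449},
    "Dictionary Learning": {"accuracy": 81.78, "features": 678_678},
}


def compare(table, proposed_name):
    """Feature reduction (%) and accuracy gain (points) of one method over the rest."""
    proposed = table[proposed_name]
    summary = {}
    for name, row in table.items():
        if name == proposed_name:
            continue
        summary[name] = {
            "feature_reduction_pct": 100.0 * (1.0 - proposed["features"] / row["features"]),
            "accuracy_gain_pts": proposed["accuracy"] - row["accuracy"],
        }
    return summary


summary = compare(table4, "Dictionary Learning")
for name, s in summary.items():
    print(f"vs {name}: {s['feature_reduction_pct']:.1f}% fewer features, "
          f"{s['accuracy_gain_pts']:+.2f} accuracy points")
```

From these values, the dictionary learning method uses roughly 58% fewer features than the raw data, 21% fewer than PCA, and 63% fewer than the Gaussian random dictionary.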

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
