Optimizing Rare Disease Gait Classification through Data Balancing and Generative AI: Insights from Hereditary Cerebellar Ataxia
Abstract
1. Introduction
2. Materials and Methods
2.1. Gait Data Acquisition
2.1.1. Subjects
2.1.2. Procedures
2.2. Pre-Processing
Feature Selection
2.3. Data Balancing Strategies
2.3.1. Undersampling and Oversampling
2.3.2. Synthetic Minority Oversampling Technique (SMOTE)
2.3.3. Generative Adversarial Network (GAN)
2.3.4. Conditional Tabular Generative Adversarial Network (ctGAN)
2.4. ML Classification Algorithm
- The total number of trees in the forest: a range of 50 to 500 was searched as a compromise between improving model performance and limiting computational cost.
- The maximum tree depth: a range of 2 to 20 was searched, covering trees from very simple (2 levels) to highly complex (20 levels). Trees deeper than 20 levels could capture more complicated associations, but they would also increase the risk of overfitting the training data.
- The minimum number of samples required to split an internal node: values from 2 to 10 were searched. Requiring more samples before a node can be split again limits overfitting.
- The minimum number of samples required to form a leaf node: values from 1 to 10 were searched in order to optimize the bias/variance trade-off.
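The four ranges above can be written down as a search space. The following is a minimal sketch, not the authors' procedure: the key names borrow scikit-learn's `RandomForestClassifier` argument names as an assumption, and plain random sampling stands in for whatever optimizer was actually used, which this section does not state.

```python
import random

# Inclusive ranges quoted in the list above. The key names mirror
# scikit-learn's RandomForestClassifier arguments (an assumption).
SEARCH_SPACE = {
    "n_estimators": (50, 500),       # number of trees in the forest
    "max_depth": (2, 20),            # maximum depth of each tree
    "min_samples_split": (2, 10),    # samples needed to split an internal node
    "min_samples_leaf": (1, 10),     # samples needed to form a leaf node
}


def sample_config(rng):
    """Draw one candidate configuration uniformly from the inclusive ranges."""
    return {name: rng.randint(low, high)
            for name, (low, high) in SEARCH_SPACE.items()}


def is_valid(config):
    """Check that every value lies inside its declared range."""
    return all(low <= config[name] <= high
               for name, (low, high) in SEARCH_SPACE.items())


rng = random.Random(0)  # fixed seed so the draw is repeatable
candidates = [sample_config(rng) for _ in range(5)]
for config in candidates:
    print(config)
```

Each sampled dictionary could then be passed to a random forest implementation and scored by cross-validation; the scoring step is left out here because it depends on the data and library in use.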
2.4.1. Performance Metrics
- Accuracy was defined as the proportion of correctly predicted positive and negative cases out of the total number of cases. It was computed as (TP + TN) / (TP + TN + FP + FN), where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives.
- Precision is the proportion of correctly predicted positive cases out of all predicted positives. For each class, it was calculated as TP / (TP + FP).
- F1 Score is the harmonic mean of precision and recall, yielding a single score that balances both criteria. It was calculated as 2 × (precision × recall) / (precision + recall), where recall is TP / (TP + FN).
- Log loss penalizes the model according to the probability it assigns to the actual correct class: the lower that probability, the larger the penalty.
- Receiver Operating Characteristic (ROC) curves were plotted and their Area Under the Curve (AUC) was calculated. AUC is an overall performance metric of the classifier, ranging from 0 to 1, with 1 representing a flawless model that separates all positive cases from negative ones [70].
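The metrics listed above can be made concrete with a small worked example. This is a sketch using only the standard definitions, for a binary problem in which label 1 stands for the positive class; the function name `binary_metrics`, the 0.5 decision threshold, and the toy labels and probabilities are all invented for illustration and are not taken from the study.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, F1, log loss and ROC AUC for 0/1 labels.

    y_prob holds the predicted probability of the positive class (label 1).
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Log loss: mean negative log-probability assigned to the true class.
    # Probabilities are clipped so that log(0) never occurs.
    eps = 1e-15
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                    for t, p in zip(y_true, clipped)) / len(y_true)

    # AUC as the fraction of (positive, negative) pairs in which the
    # positive case receives the higher score; ties count as one half.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "log_loss": log_loss, "roc_auc": auc}


# Toy imbalanced example: 3 positive and 5 negative cases (made-up values).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
metrics = binary_metrics(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case one positive is missed and one negative is flagged, so precision and recall for the positive class are both 2/3 even though accuracy is 0.75, which illustrates why per-class metrics matter on imbalanced data.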
2.4.2. Consistency and Explainability Analysis
3. Results
3.1. Feature Selection Results
Supervised ML Classification Metrics
3.2. Consistency and Explainability Results
4. Discussion
- Testing various dataset balancing strategies showed that the analyzed generative artificial intelligence methods outperformed traditional techniques in terms of classifier performance.
- ctGAN was the best method for balancing sample classes when classifying a rare condition such as cerebellar ataxia based on inertial sensor gait tabular data (Table 4).
- The synthetic data generated by the ctGAN model appeared reliable, given their strong similarity to the original data.
- The synthetic data generated by the ctGAN model yielded sound and explainable results regarding the impact of gait variables on the classification model.
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
| CA Subtype | Subjects (n) |
|---|---|
| ACD | 5 |
| SAOA | 4 |
| MSA-C | 2 |
| SYNE1 | 1 |
| SCA-NDD | 2 |
| SCA1 | 6 |
| SCA2 | 4 |
| SCA3 | 2 |
| SCA6 | 1 |
| SCA8 | 2 |
| SCA27b | 1 |

| Variable | pwCA (n = 30) | HS (n = 100) |
|---|---|---|
| Age (years) | 51.60 (12.73) | 57.08 (10.40) |
| SARA (n) | 12.66 (4.68) | |
| SARAgait (n) | 3.03 (1.19) | |
| Falls (n) | 3.43 (4.48) | |
| Gait speed (m/s) | 0.97 (0.25) | 1.02 (0.24) |
| Stance phase (% Gait cycle) | 64.66 (3.31) | 61.62 (4.94) |
| Swing phase (% Gait cycle) | 35.34 (3.31) | 38.03 (3.36) |
| Double support phase (% Gait cycle) | 14.70 (3.53) | 12.29 (4.84) |
| Single support phase (% Gait cycle) | 35.26 (3.83) | 37.58 (5.42) |
| Cadence (steps/min) | 97.92 (17.93) | 99.39 (13.23) |
| Stride length (m) | 1.17 (0.19) | 1.24 (0.18) |
| Pelvic tilt (°) | 3.05 (0.98) | 2.99 (1.12) |
| Pelvic obliquity (°) | 3.97 (3.99) | 3.76 (4.47) |
| Pelvic rotation (°) | 3.97 (2.62) | 5.19 (2.46) |
| HRap | 1.84 (0.57) | 2.45 (0.68) |
| HRml | 1.79 (0.45) | 2.23 (0.54) |
| HRv | 1.81 (0.45) | 2.33 (0.62) |
| RQA RECap (%) | 7.09 (10.32) | 5.15 (8.57) |
| RQA RECml (%) | 5.02 (5.58) | 4.82 (6.68) |
| RQA RECv (%) | 6.05 (10.17) | 4.70 (6.34) |
| RQA DETap (%) | 33.70 (26.65) | 29.54 (26.35) |
| RQA DETml (%) | 37.17 (24.21) | 32.79 (25.73) |
| RQA DETv (%) | 27.00 (26.14) | 24.73 (21.50) |
| (%) | 43.23 (16.14) | 23.63 (12.52) |
| sLLEap (1/s) | 0.58 (0.22) | 0.40 (0.21) |
| sLLEml (1/s) | 0.36 (0.20) | 0.25 (0.17) |
| sLLEv (1/s) | 0.39 (0.20) | 0.37 (0.25) |
References
- David, P.F.; David, R.C.; Juan, M.C.; Diego, T. Human Locomotion Databases: A Systematic Review. IEEE J. Biomed. Health Inform. 2024, 28, 1716–1729.
- Rinaldi, M.; Ranavolo, A.; Conforto, S.; Martino, G.; Draicchio, F.; Conte, C.; Varrecchia, T.; Bini, F.; Casali, C.; Pierelli, F.; et al. Increased Lower Limb Muscle Coactivation Reduces Gait Performance and Increases Metabolic Cost in Patients with Hereditary Spastic Paraparesis. Clin. Biomech. 2017, 48, 63–72.
- Buckley, E.; Mazzà, C.; McNeill, A. A Systematic Review of the Gait Characteristics Associated with Cerebellar Ataxia. Gait Posture 2018, 60, 154–163.
- Giordano, I.; Harmuth, F.; Jacobi, H.; Paap, B.; Vielhaber, S.; Machts, J.; Schöls, L.; Synofzik, M.; Sturm, M.; Tallaksen, C.; et al. Clinical and Genetic Characteristics of Sporadic Adult-Onset Degenerative Ataxia. Neurology 2017, 89, 1043–1049.
- Coarelli, G.; Wirth, T.; Tranchant, C.; Koenig, M.; Durr, A.; Anheim, M. The Inherited Cerebellar Ataxias: An Update. J. Neurol. 2023, 270, 208–222.
- Manto, M.; Gandini, J.; Feil, K.; Strupp, M. Cerebellar Ataxias: An Update. Curr. Opin. Neurol. 2020, 33, 150–160.
- Manto, M.; Serrao, M.; Filippo Castiglia, S.; Timmann, D.; Tzvi-Minker, E.; Pan, M.K.; Kuo, S.H.; Ugawa, Y. Neurophysiology of Cerebellar Ataxias and Gait Disorders. Clin. Neurophysiol. Pract. 2023, 8, 143.
- Cabaraux, P.; Agrawal, S.K.; Cai, H.; Calabro, R.S.; Casali, C.; Damm, L.; Doss, S.; Habas, C.; Horn, A.K.E.; Ilg, W.; et al. Consensus Paper: Ataxic Gait. Cerebellum 2023, 22, 394–430.
- Martino, G.; Ivanenko, Y.P.; Serrao, M.; Ranavolo, A.; d’Avella, A.; Draicchio, F.; Conte, C.; Casali, C.; Lacquaniti, F. Locomotor Patterns in Cerebellar Ataxia. J. Neurophysiol. 2014, 112, 2810–2821.
- Serrao, M.; Chini, G.; Bergantino, M.; Sarnari, D.; Casali, C.; Conte, C.; Ranavolo, A.; Marcotulli, C.; Rinaldi, M.; Coppola, G.; et al. Identification of Specific Gait Patterns in Patients with Cerebellar Ataxia, Spastic Paraplegia, and Parkinson’s Disease: A Non-Hierarchical Cluster Analysis. Hum. Mov. Sci. 2018, 57, 267–279.
- Conte, C.; Serrao, M.; Casali, C.; Ranavolo, A.; Mari, S.; Draicchio, F.; Di Fabio, R.; Monami, S.; Padua, L.; Iavicoli, S.; et al. Planned Gait Termination in Cerebellar Ataxias. Cerebellum 2012, 11, 896–904.
- Caliandro, P.; Iacovelli, C.; Conte, C.; Simbolotti, C.; Rossini, P.M.; Padua, L.; Casali, C.; Pierelli, F.; Reale, G.; Serrao, M. Trunk-Lower Limb Coordination Pattern during Gait in Patients with Ataxia. Gait Posture 2017, 57, 252–257.
- Zampogna, A.; Mileti, I.; Palermo, E.; Celletti, C.; Paoloni, M.; Manoni, A.; Mazzetta, I.; Costa, G.D.; Pérez-López, C.; Camerota, F.; et al. Fifteen Years of Wireless Sensors for Balance Assessment in Neurological Disorders. Sensors 2020, 20, 3247.
- Bernhard, F.P.; Sartor, J.; Bettecken, K.; Hobert, M.A.; Arnold, C.; Weber, Y.G.; Poli, S.; Margraf, N.G.; Schlenstedt, C.; Hansen, C.; et al. Wearables for Gait and Balance Assessment in the Neurological Ward—Study Design and First Results of a Prospective Cross-Sectional Feasibility Study with 384 Inpatients. BMC Neurol. 2018, 18, 114.
- Bergamini, E.; Iosa, M.; Belluscio, V.; Morone, G.; Tramontano, M.; Vannozzi, G. Multi-Sensor Assessment of Dynamic Balance during Gait in Patients with Subacute Stroke. J. Biomech. 2017, 61, 208–215.
- Tao, W.; Liu, T.; Zheng, R.; Feng, H. Gait Analysis Using Wearable Sensors. Sensors 2012, 12, 2255–2283.
- Buckley, C.; Alcock, L.; McArdle, R.; Ur Rehman, R.Z.; Del Din, S.; Mazzà, C.; Yarnall, A.J.; Rochester, L. The Role of Movement Analysis in Diagnosing and Monitoring Neurodegenerative Conditions: Insights from Gait and Postural Control. Brain Sci. 2019, 9, 34.
- Felius, R.A.W.; Geerars, M.; Bruijn, S.M.; van Dieën, J.H.; Wouda, N.C.; Punt, M. Reliability of IMU-Based Gait Assessment in Clinical Stroke Rehabilitation. Sensors 2022, 22, 908.
- Hansen, C.; Ortlieb, C.; Romijnders, R.; Warmerdam, E.; Welzel, J.; Geritz, J.; Maetzler, W. Reliability of IMU-Derived Temporal Gait Parameters in Neurological Diseases. Sensors 2022, 22, 2304.
- Pacini Panebianco, G.; Bisi, M.C.; Stagni, R.; Fantozzi, S. Analysis of the Performance of 17 Algorithms from a Systematic Review: Influence of Sensor Position, Analysed Variable and Computational Approach in Gait Timing Estimation from IMU Measurements. Gait Posture 2018, 66, 76–82.
- Castiglia, S.F.; Tatarelli, A.; Trabassi, D.; De Icco, R.; Grillo, V.; Ranavolo, A.; Varrecchia, T.; Magnifica, F.; Di Lenola, D.; Coppola, G.; et al. Ability of a Set of Trunk Inertial Indexes of Gait to Identify Gait Instability and Recurrent Fallers in Parkinson’s Disease. Sensors 2021, 21, 3449.
- Castiglia, S.F.; Trabassi, D.; Tatarelli, A.; Ranavolo, A.; Varrecchia, T.; Fiori, L.; Di Lenola, D.; Cioffi, E.; Raju, M.; Coppola, G.; et al. Identification of Gait Unbalance and Fallers Among Subjects with Cerebellar Ataxia by a Set of Trunk Acceleration-Derived Indices of Gait. Cerebellum 2022, 22, 46–58.
- Castiglia, S.F.; Trabassi, D.; Conte, C.; Gioiosa, V.; Sebastianelli, G.; Abagnale, C.; Ranavolo, A.; Di Lorenzo, C.; Coppola, G.; Casali, C.; et al. Local Dynamic Stability of Trunk During Gait Is Responsive to Rehabilitation in Subjects with Primary Degenerative Cerebellar Ataxia. Cerebellum 2024.
- Castiglia, S.F.; Trabassi, D.; Conte, C.; Ranavolo, A.; Coppola, G.; Sebastianelli, G.; Abagnale, C.; Barone, F.; Bighiani, F.; De Icco, R.; et al. Multiscale Entropy Algorithms to Analyze Complexity and Variability of Trunk Accelerations Time Series in Subjects with Parkinson’s Disease. Sensors 2023, 23, 4983.
- Trabassi, D.; Serrao, M.; Varrecchia, T.; Ranavolo, A.; Coppola, G.; De Icco, R.; Tassorelli, C.; Castiglia, S.F. Machine Learning Approach to Support the Detection of Parkinson’s Disease in IMU-Based Gait Analysis. Sensors 2022, 22, 3700.
- Mirelman, A.; Ben Or Frank, M.; Melamed, M.; Granovsky, L.; Nieuwboer, A.; Rochester, L.; Del Din, S.; Avanzino, L.; Pelosin, E.; Bloem, B.R.; et al. Detecting Sensitive Mobility Features for Parkinson’s Disease Stages Via Machine Learning. Mov. Disord. 2021, 36, 2144–2155.
- Phinyomark, A.; Petri, G.; Ibáñez-Marcelo, E.; Osis, S.T.; Ferber, R. Analysis of Big Data in Gait Biomechanics: Current Trends and Future Directions. J. Med. Biol. Eng. 2018, 38, 244–260.
- Khera, P.; Kumar, N. Role of Machine Learning in Gait Analysis: A Review. J. Med. Eng. Technol. 2020, 44, 441–467.
- Greve, C.; Tam, H.; Grabherr, M.; Ramesh, A.; Scheerder, B.; Hijmans, J.M. Flexible Machine Learning Algorithms for Clinical Gait Assessment Tools. Sensors 2022, 22, 4957.
- Hummel, J.; Schwenk, M.; Seebacher, D.; Barzyk, P.; Liepert, J.; Stein, M. Clustering Approaches for Gait Analysis within Neurological Disorders: A Narrative Review. Digit. Biomarkers 2024, 8, 93.
- Abdollahi, M.; Rashedi, E.; Jahangiri, S.; Kuber, P.M.; Azadeh-Fard, N.; Dombovy, M. Fall Risk Assessment in Stroke Survivors: A Machine Learning Model Using Detailed Motion Data from Common Clinical Tests and Motor-Cognitive Dual-Tasking. Sensors 2024, 24, 812.
- Phan, D.; Nguyen, N.; Pathirana, P.N.; Horne, M.; Power, L.; Szmulewicz, D. A Random Forest Approach for Quantifying Gait Ataxia with Truncal and Peripheral Measurements Using Multiple Wearable Sensors. IEEE Sens. J. 2020, 20, 723–734.
- Varrecchia, T.; Castiglia, S.F.; Ranavolo, A.; Conte, C.; Tatarelli, A.; Coppola, G.; Di Lorenzo, C.; Draicchio, F.; Pierelli, F.; Serrao, M. An Artificial Neural Network Approach to Detect Presence and Severity of Parkinson’s Disease via Gait Parameters. PLoS ONE 2021, 16, e0244396.
- Liuzzi, P.; Carpinella, I.; Anastasi, D.; Gervasoni, E.; Lencioni, T.; Bertoni, R.; Carrozza, M.C.; Cattaneo, D.; Ferrarin, M.; Mannini, A. Machine Learning Based Estimation of Dynamic Balance and Gait Adaptability in Persons with Neurological Diseases Using Inertial Sensors. Sci. Rep. 2023, 13, 8640.
- Mannini, A.; Trojaniello, D.; Cereatti, A.; Sabatini, A.M. A Machine Learning Framework for Gait Classification Using Inertial Sensors: Application to Elderly, Post-Stroke and Huntington’s Disease Patients. Sensors 2016, 16, 134.
- Shah, V.; Flood, M.W.; Grimm, B.; Dixon, P.C. Generalizability of Deep Learning Models for Predicting Outdoor Irregular Walking Surfaces. J. Biomech. 2022, 139, 111159.
- Moore, J.; Stuart, S.; McMeekin, P.; Walker, R.; Celik, Y.; Pointon, M.; Godfrey, A. Enhancing Free-Living Fall Risk Assessment: Contextualizing Mobility Based IMU Data. Sensors 2023, 23, 891.
- Mazurowski, M.A.; Habas, P.A.; Zurada, J.M.; Lo, J.Y.; Baker, J.A.; Tourassi, G.D. Training Neural Network Classifiers for Medical Decision Making: The Effects of Imbalanced Datasets on Classification Performance. Neural Netw. 2008, 21, 427–436.
- Kotsiantis, S.; Kanellopoulos, D.; Pintelas, P. Handling Imbalanced Datasets: A Review. GESTS Int. Trans. Comput. Sci. Eng. 2006, 30, 25–36.
- Mumuni, A.; Mumuni, F. Data Augmentation: A Comprehensive Survey of Modern Approaches. Array 2022, 16, 100258.
- Taherdoost, H. Sampling Methods in Research Methodology; How to Choose a Sampling Technique for Research. Int. J. Acad. Res. Manag. 2016, 5, 18–27.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Fernández, A.; García, S.; Herrera, F.; Chawla, N.V. SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-Year Anniversary. J. Artif. Intell. Res. 2018, 61, 863–905.
- Salehi, A.R.; Khedmati, M. A Cluster-Based SMOTE Both-Sampling (CSBBoost) Ensemble Algorithm for Classifying Imbalanced Data. Sci. Rep. 2024, 14, 5152.
- Johnson, J.M.; Khoshgoftaar, T.M. Survey on Deep Learning with Class Imbalance. J. Big Data 2019, 6, 27.
- Nazir, S.; Kaleem, M. Federated Learning for Medical Image Analysis with Deep Neural Networks. Diagnostics 2023, 13, 1532.
- Uchitomi, H.; Ming, X.; Zhao, C.; Ogata, T.; Miyake, Y. Classification of Mild Parkinson’s Disease: Data Augmentation of Time-Series Gait Data Obtained via Inertial Measurement Units. Sci. Rep. 2023, 13, 12638.
- Lopez-Nava, I.H.; Valentín-Coronado, L.M.; Garcia-Constantino, M.; Favela, J. Gait Activity Classification on Unbalanced Data from Inertial Sensors Using Shallow and Deep Learning. Sensors 2020, 20, 4756.
- Zhang, G.P. Neural Networks for Classification: A Survey. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2000, 30, 451–462.
- Qu, W.; Balki, I.; Mendez, M.; Valen, J.; Levman, J.; Tyrrell, P.N. Assessing and Mitigating the Effects of Class Imbalance in Machine Learning with Application to X-Ray Imaging. Int. J. Comput. Assist. Radiol. Surg. 2020, 15, 2041–2048.
- Ambesange, S.; Vijayalaxmi, A.; Uppin, R.; Patil, S.; Patil, V. Optimizing Liver Disease Prediction with Random Forest by Various Data Balancing Techniques. In Proceedings of the 2020 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), Bengaluru, India, 6–7 November 2020; pp. 98–102.
- Khalilia, M.; Chakraborty, S.; Popescu, M. Predicting Disease Risks from Highly Imbalanced Data Using Random Forest. BMC Med. Inform. Decis. Mak. 2011, 11, 51.
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Sci. Robot. 2014, 3, 2672–2680.
- Shaafi Kabiri, N.; Syed, S.; Bali, T.; Karlin, D.R.; Binneman, B.; Tan, Y.; Steinman, A.; Cote, A.C.; Thomas, K.C. Evaluation of the Use of the Scale for the Assessment and Rating of Ataxia (SARA) in Healthy Volunteers and Patients with Schizophrenia. J. Neurol. Sci. 2018, 391, 40–44.
- Schmitz-Hübsch, T.; Du Montcel, S.T.; Baliko, L.; Berciano, J.; Boesch, S.; Depondt, C.; Giunti, P.; Globas, C.; Infante, J.; Kang, J.S.; et al. Scale for the Assessment and Rating of Ataxia: Development of a New Clinical Scale. Neurology 2006, 66, 1717–1720.
- Serrao, M.; Chini, G.; Casali, C.; Conte, C.; Rinaldi, M.; Ranavolo, A.; Marcotulli, C.; Leonardi, L.; Fragiotta, G.; Bini, F.; et al. Progression of Gait Ataxia in Patients with Degenerative Cerebellar Disorders: A 4-Year Follow-Up Study. Cerebellum 2017, 16, 629–637.
- Fiori, L.; Ranavolo, A.; Varrecchia, T.; Tatarelli, A.; Conte, C.; Draicchio, F.; Castiglia, S.F.; Coppola, G.; Casali, C.; Pierelli, F.; et al. Impairment of Global Lower Limb Muscle Coactivation during Walking in Cerebellar Ataxias. Cerebellum 2020, 19, 583–596.
- Serrao, M.; Casali, C.; Ranavolo, A.; Mari, S.; Conte, C.; Chini, G.; Leonardi, L.; Coppola, G.; Di Lorenzo, C.; Harfoush, M.; et al. Use of Dynamic Movement Orthoses to Improve Gait Stability and Trunk Control in Ataxic Patients. Eur. J. Phys. Rehabil. Med. 2017, 53, 735–743.
- Riva, F.; Grimpampi, E.; Mazzà, C.; Stagni, R. Are Gait Variability and Stability Measures Influenced by Directional Changes? Biomed. Eng. Online 2014, 13, 56.
- Riva, F.; Bisi, M.C.; Stagni, R. Gait Variability and Stability Measures: Minimum Number of Strides and within-Session Reliability. Comput. Biol. Med. 2014, 50, 9–13.
- Kroneberg, D.; Elshehabi, M.; Meyer, A.C.; Otte, K.; Doss, S.; Paul, F.; Nussbaum, S.; Berg, D.; Kühn, A.A.; Maetzler, W.; et al. Less Is More—Estimation of the Number of Strides Required to Assess Gait Variability in Spatially Confined Settings. Front. Aging Neurosci. 2019, 11, 389096.
- Pasciuto, I.; Bergamini, E.; Iosa, M.; Vannozzi, G.; Cappozzo, A. Overcoming the Limitations of the Harmonic Ratio for the Reliable Assessment of Gait Symmetry. J. Biomech. 2017, 53, 84–89.
- Raffalt, P.C.; Kent, J.A.; Wurdeman, S.R.; Stergiou, N. Selection Procedures for the Largest Lyapunov Exponent in Gait Biomechanics. Ann. Biomed. Eng. 2019, 47, 913–923.
- Werner de Vargas, V.; Schneider Aranda, J.A.; dos Santos Costa, R.; da Silva Pereira, P.R.; Victória Barbosa, J.L. Imbalanced Data Preprocessing Techniques for Machine Learning: A Systematic Mapping Study. Knowl. Inf. Syst. 2023, 65, 31–57.
- Ahsan, M.M.; Mahmud, M.A.P.; Saha, P.K.; Gupta, K.D.; Siddique, Z. Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance. Technologies 2021, 9, 52.
- Qi, Y. Random Forest for Bioinformatics. In Ensemble Machine Learning; Zhang, C., Ma, Y., Eds.; Springer: New York, NY, USA, 2012; pp. 307–323.
- Chandrashekar, G.; Sahin, F. A Survey on Feature Selection Methods. Comput. Electr. Eng. 2014, 40, 16–28.
- Xu, L.; Skoularidou, M.; Cuesta-Infante, A.; Veeramachaneni, K. Modeling Tabular Data Using Conditional GAN. Neural Inf. Process. Syst. 2019, 32.
- Wu, J.; Chen, X.Y.; Zhang, H.; Xiong, L.D.; Lei, H.; Deng, S.H. Hyperparameter Optimization for Machine Learning Models Based on Bayesian Optimization. J. Electron. Sci. Technol. 2019, 17, 26–40.
- Avati, A.; Jung, K.; Harman, S.; Downing, L.; Ng, A.; Shah, N.H. Improving Palliative Care with Deep Learning. BMC Med. Inform. Decis. Mak. 2018, 18, 55–64.
- Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. Adv. Neural Inf. Process. Syst. 2017, 2017, 4766–4775.
- Caliandro, P.; Conte, C.; Iacovelli, C.; Tatarelli, A.; Castiglia, S.F.; Reale, G.; Serrao, M. Exploring Risk of Falls and Dynamic Unbalance in Cerebellar Ataxia by Inertial Sensor Assessment. Sensors 2019, 19, 5571.
- Akhiat, Y.; Manzali, Y.; Chahhou, M.; Zinedine, A. A New Noisy Random Forest Based Method for Feature Selection. Cybern. Inf. Technol. 2021, 21, 10–28.
- Bebortta, S.; Panda, M.; Panda, S. Classification of Pathological Disorders in Children Using Random Forest Algorithm. In Proceedings of the 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), Vellore, India, 24–25 February 2020.
- Alkhatib, R.; Diab, M.O.; Corbier, C.; El Badaoui, M. Machine Learning Algorithm for Gait Analysis and Classification on Early Detection of Parkinson. IEEE Sens. Lett. 2020, 4, 6000604.
- Ricciardi, C.; Amboni, M.; De Santis, C.; Ricciardelli, G.; Improta, G.; Iuppariello, L.; D’Addio, G.; Barone, P.; Cesarelli, M. Classifying Different Stages of Parkinson’s Disease through Random Forests. IFMBE Proc. 2020, 76, 1155–1162.
- Balaji, E.; Brindha, D.; Balakrishnan, R. Supervised Machine Learning Based Gait Classification System for Early Detection and Stage Classification of Parkinson’s Disease. Appl. Soft Comput. 2020, 94, 106494.
- Jeon, Y.; Kang, J.; Kim, B.C.; Lee, K.H.; Song, J.I.; Gwak, J. Early Alzheimer’s Disease Diagnosis Using Wearable Sensors and Multilevel Gait Assessment: A Machine Learning Ensemble Approach. IEEE Sens. J. 2023, 23, 10041–10053.
- Ricciardi, C.; Amboni, M.; De Santis, C.; Improta, G.; Volpe, G.; Iuppariello, L.; Ricciardelli, G.; D’Addio, G.; Vitale, C.; Barone, P.; et al. Using Gait Analysis’ Parameters to Classify Parkinsonism: A Data Mining Approach. Comput. Methods Programs Biomed. 2019, 180, 105033.
- Yang, J.; Lu, H.; Li, C.; Hu, X.; Hu, B. Data Augmentation for Depression Detection Using Skeleton-Based Gait Information. Med. Biol. Eng. Comput. 2022, 60, 2665–2679.
- Chen, Y.; Yang, X.H.; Wei, Z.; Heidari, A.A.; Zheng, N.; Li, Z.; Chen, H.; Hu, H.; Zhou, Q.; Guan, Q. Generative Adversarial Networks in Medical Image Augmentation: A Review. Comput. Biol. Med. 2022, 144, 105382.
- Ramesh, V.; Bilal, E. Detecting Motor Symptom Fluctuations in Parkinson’s Disease with Generative Adversarial Networks. npj Digit. Med. 2022, 5, 138.
- Bicer, M.; Phillips, A.T.M.; Melis, A.; McGregor, A.H.; Modenese, L. Generative Deep Learning Applied to Biomechanics: A New Augmentation Technique for Motion Capture Datasets. J. Biomech. 2022, 144, 111301.
- Oliveira, G.C.; Ngo, Q.C.; Passos, L.A.; Papa, J.P.; Jodas, D.S.; Kumar, D. Tabular Data Augmentation for Video-Based Detection of Hypomimia in Parkinson’s Disease. Comput. Methods Programs Biomed. 2023, 240, 107713.
- Kim, M.; Hargrove, L.J. Generating Synthetic Gait Patterns Based on Benchmark Datasets for Controlling Prosthetic Legs. J. Neuroeng. Rehabil. 2023, 20, 115.
- Shi, X.; Weightman, A.; Cooper, G.; Dawes, H.; Bradbury, K.; Rahulamathavan, Y.; Peppes, N.; Tsakanikas, P.; Daskalakis, E.; Alexakis, T.; et al. FoGGAN: Generating Realistic Parkinson’s Disease Freezing of Gait Data Using GANs. Sensors 2023, 23, 8158.
- Sauber-Cole, R.; Khoshgoftaar, T.M. The Use of Generative Adversarial Networks to Alleviate Class Imbalance in Tabular Data: A Survey. J. Big Data 2022, 9, 98.
- Lee, J.H.; Park, K.H. GAN-Based Imbalanced Data Intrusion Detection System. Pers. Ubiquitous Comput. 2021, 25, 121–128.
- Lang, O.; Yaya-Stupp, D.; Traynis, I.; Cole-Lewis, H.; Bennett, C.R.; Lyles, C.R.; Lau, C.; Irani, M.; Semturs, C.; Webster, D.R.; et al. Using Generative AI to Investigate Medical Imagery Models and Datasets. eBioMedicine 2024, 102, 105075.
- Ktena, I.; Wiles, O.; Albuquerque, I.; Rebuffi, S.A.; Tanno, R.; Roy, A.G.; Azizi, S.; Belgrave, D.; Kohli, P.; Cemgil, T.; et al. Generative Models Improve Fairness of Medical Classifiers under Distribution Shifts. Nat. Med. 2024, 30, 1166–1173.
- Wu, X.; Zhang, D.; Li, G.; Gao, X.; Metcalfe, B.; Chen, L. Data Augmentation for Invasive Brain-Computer Interfaces Based on Stereo-Electroencephalography (SEEG). J. Neural Eng. 2024, 21, 016026.
- Zuo, Q.; Shen, Y.; Zhong, N.; Chen, C.L.P.; Lei, B.; Wang, S. Alzheimer’s Disease Prediction via Brain Structural-Functional Deep Fusing Network. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 4601–4612.
- Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Scardapane, S.; Spinelli, I.; Mahmud, M.; Hussain, A. Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence. Cognit. Comput. 2024, 16, 45–74.
- Mosca, E.; Szigeti, F.; Tragianni, S.; Gallagher, D.; Groh, G. SHAP-Based Explanation Methods: A Review for NLP Interpretability; International Committee on Computational Linguistics: Praha, Czech Republic, 2022; pp. 4593–4603.
- Ilg, W.; Seemann, J.; Giese, M.; Traschütz, A.; Schöls, L.; Timmann, D.; Synofzik, M. Real-Life Gait Assessment in Degenerative Cerebellar Ataxia: Toward Ecologically Valid Biomarkers. Neurology 2020, 95, E1199–E1210.
- Serrao, M.; Chini, G.; Iosa, M.; Casali, C.; Morone, G.; Conte, C.; Bini, F.; Marinozzi, F.; Coppola, G.; Pierelli, F.; et al. Harmony as a Convergence Attractor That Minimizes the Energy Expenditure and Variability in Physiological Gait and the Loss of Harmony in Cerebellar Ataxia. Clin. Biomech. 2017, 48, 15–23.
- Castiglia, S.F.; Trabassi, D.; De Icco, R.; Tatarelli, A.; Avenali, M.; Corrado, M.; Grillo, V.; Coppola, G.; Denaro, A.; Tassorelli, C.; et al. Harmonic Ratio Is the Most Responsive Trunk-Acceleration Derived Gait Index to Rehabilitation in People with Parkinson’s Disease at Moderate Disease Stages. Gait Posture 2022, 97, 152–158.
- Conte, C.; Pierelli, F.; Casali, C.; Ranavolo, A.; Draicchio, F.; Martino, G.; Harfoush, M.; Padua, L.; Coppola, G.; Sandrini, G.; et al. Upper Body Kinematics in Patients with Cerebellar Ataxia. Cerebellum 2014, 13, 689–697.
- Yang, X.; Ye, Q.; Cai, G.; Wang, Y.; Cai, G. PD-ResNet for Classification of Parkinson’s Disease From Gait. IEEE J. Transl. Eng. Health Med. 2022, 10, 2200111.

| Gait parameter | pwCA, Mean (SD) | HS, Mean (SD) | p | Cohen’s d |
|---|---|---|---|---|
| Stance phase | 64.66 (3.31) | 61.62 (4.94) | <0.001 | 0.640 |
| Cadence | 97.92 (17.93) | 99.39 (13.23) | 0.003 | 0.412 |
| Stride length | 1.17 (0.19) | 1.24 (0.18) | 0.003 | 0.496 |
| Pelvic rotation | 5.19 (2.46) | 3.97 (2.62) | <0.001 | 0.578 |
| HRap | 1.84 (0.57) | 2.45 (0.68) | <0.001 | 1.033 |
| | 43.23 (16.14) | 23.63 (12.52) | <0.001 | 1.311 |
| sLLEap | 0.58 (0.22) | 0.40 (0.21) | <0.001 | 1.253 |

Values are reported as Mean (SD).

| Balancing strategy | Accuracy | Recall | F1 Score | Log loss | ROC AUC |
|---|---|---|---|---|---|
| Initial Unbalanced | 0.79 (0.2) | 0.79 (0.1) | 0.75 (0.3) | 0.42 (0.3) | 0.87 (0.2) |
| Undersampling | 0.77 (0.4) | 0.77 (0.3) | 0.78 (0.2) | 0.49 (0.3) | 0.89 (0.1) |
| Oversampling | 0.83 (0.3) | 0.82 (0.4) | 0.83 (0.4) | 0.38 (0.2) | 0.89 (0.2) |
| SMOTE (N = 200) | 0.80 (0.1) | 0.80 (0.2) | 0.79 (0.1) | 0.40 (0.1) | 0.87 (0.2) |
| SMOTE (N = 1000) | 0.75 (0.2) | 0.74 (0.1) | 0.75 (0.2) | 0.41 (0.3) | 0.86 (0.1) |
| GAN (N = 200) | 0.83 (0.1) | 0.83 (0.2) | 0.79 (0.1) | 0.42 (0.2) | 0.83 (0.4) |
| GAN (N = 1000) | 0.82 (0.2) | 0.83 (0.1) | 0.81 (0.3) | 0.44 (0.1) | 0.86 (0.2) |
| ctGAN (N = 200) | 0.90 (0.1) | 0.88 (0.2) | 0.88 (0.1) | 0.35 (0.1) | 0.90 (0.1) |
| ctGAN (N = 1000) | 0.81 (0.3) | 0.80 (0.1) | 0.79 (0.1) | 0.40 (0.2) | 0.85 (0.2) |

Values are reported as Mean (SD).

| Balancing strategy | pwCA Precision | pwCA Recall | HS Precision | HS Recall |
|---|---|---|---|---|
| Initial Unbalanced | 0.78 (0.2) | 0.37 (0.1) | 0.91 (0.3) | 0.91 (0.2) |
| Undersampling | 0.55 (0.4) | 0.88 (0.2) | 0.95 (0.1) | 0.72 (0.1) |
| Oversampling | 0.83 (0.4) | 0.60 (0.3) | 0.84 (0.3) | 0.92 (0.2) |
| SMOTE (N = 200) | 0.72 (0.3) | 0.55 (0.2) | 0.81 (0.2) | 0.93 (0.1) |
| SMOTE (N = 1000) | 0.58 (0.2) | 0.60 (0.3) | 0.82 (0.1) | 0.83 (0.3) |
| GAN (N = 200) | 0.90 (0.2) | 0.40 (0.1) | 0.98 (0.2) | 0.80 (0.2) |
| GAN (N = 1000) | 0.84 (0.1) | 0.5 (0.2) | 0.82 (0.3) | 0.96 (0.1) |
| ctGAN (N = 200) | 0.85 (0.1) | 0.75 (0.1) | 0.92 (0.2) | 0.92 (0.2) |
| ctGAN (N = 1000) | 0.83 (0.2) | 0.6 (0.2) | 0.82 (0.2) | 0.88 (0.1) |

| Gait Parameter | KS | p-Value |
|---|---|---|
| Cadence | 0.22 | 0.09 |
| Stride length | 0.18 | 0.23 |
| | 0.17 | 0.35 |
| | 0.15 | 0.48 |
| Pelvic rotation | 0.09 | 0.92 |
| Stance phase | 0.08 | 0.98 |
| | 0.06 | 0.99 |
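
The KS column is read here as the two-sample Kolmogorov-Smirnov statistic between the original and the synthetic values of each gait parameter; that reading is an assumption, since the table caption did not survive. Under that assumption, the statistic is the largest vertical gap between the two empirical cumulative distribution functions, which can be sketched as follows. The function name `ks_statistic` and the cadence-like numbers are invented for illustration, and the p-value computation is not reproduced.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    gap = 0.0
    # The empirical CDFs are step functions that only change at observed
    # values, so checking every observed value is enough to find the maximum.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of sample_a <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample_b <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Made-up "original" and "synthetic" values for a single gait parameter.
original = [95.0, 97.5, 99.0, 101.0, 104.0]
synthetic = [96.0, 98.0, 99.5, 100.0, 103.0]
ks = ks_statistic(original, synthetic)
print(f"KS statistic: {ks:.2f}")
```

A value near 0 means the two samples have almost the same distribution, and a value of 1 means they do not overlap at all, so small KS values paired with large p-values are what one would expect when synthetic data resemble the originals.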
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Trabassi, D.; Castiglia, S.F.; Bini, F.; Marinozzi, F.; Ajoudani, A.; Lorenzini, M.; Chini, G.; Varrecchia, T.; Ranavolo, A.; De Icco, R.; et al. Optimizing Rare Disease Gait Classification through Data Balancing and Generative AI: Insights from Hereditary Cerebellar Ataxia. Sensors 2024, 24, 3613. https://doi.org/10.3390/s24113613