1. Introduction
Over 200,000 anterior cruciate ligament (ACL) injuries occur annually in the USA alone, and more than half of them require surgical reconstruction and subsequent rehabilitation [1]. This number is expected to increase in the coming years [2]. The highest growth in ACL injuries is among young and active people (under 25 years of age) [2], a trend which, in the long term, may increase the risk of developing osteoarthritis and disability (e.g., reduced performance in activities of daily living, leisure activities, or sports). In particular, the rise in ACL injuries among young people has been attributed to earlier sport specialisation, longer sporting seasons, more intense training, higher levels of competition, and a lack of free play [2].
After surgical reconstruction, patients aiming to return to sport (RTS) follow a pre-defined rehabilitation programme, which is deemed successful if the patient returns to the same level of sporting activity as before the injury [3]. However, on average, 80% of patients return to sport, and only 55% return to competitive levels after ACL reconstruction [4]. Among professional male soccer players, the RTS rate one year after ACL reconstruction was higher (90%), but only 65% were still playing at the highest level three years after surgery [5].
King et al. [6] highlighted that biomechanical differences between ACL-reconstructed and non-reconstructed knees were evident nine months after surgery, despite no difference in performance time during a change-of-direction (COD) task. As shown by Slater et al. [7], alterations in frontal- and sagittal-plane walking kinematics and kinetics observed early (<12 months) after surgery persisted in the following period (12–36 months). Despite clearance to return to physical activity, these gait patterns do not appear to normalize over time, which may indicate that the current approach to rehabilitation and assessment before return to activity is not adequate for identifying individuals with dysfunctional movement patterns. This was also confirmed in [8,9,10,11], where differences in joint kinematics and gait patterns (e.g., in anteroposterior translation and laxity, hamstring muscle activation, and eccentric knee flexor strength) were observed up to 5–10 years after reconstruction in subjects whose rehabilitation was deemed successful and who had returned to sport.
These results indicate that RTS still represents an important challenge after ACL reconstruction and that current clinical and sport practices could be improved. No consensus exists on the most appropriate criteria for RTS after ACL reconstruction; the criteria typically considered by clinicians include time after surgery, self-reported patient-outcome measures (such as the International Knee Documentation Committee (IKDC) questionnaire), clinical examination (e.g., muscle strength, hop tests, limb symmetry, movement quality, fatigue), and psychological factors [5]. Dingenen and Gokeler [5] state that current RTS measures are predominantly subjective and recommend adopting more evidence-based, objective RTS criteria. At present, there is a dearth of evidence supporting the relationship between RTS and standard subjective and objective assessments, which raises the question of whether existing RTS assessments and criteria are sensitive or demanding enough to reveal clinically relevant indicators.
While marker-based motion analysis systems (e.g., Vicon) [12] can provide objective assessments and represent the gold-standard technology for quantitative gait and movement analysis, their adoption is constrained by cost, the need for access to specialist motion labs, and the impracticality of assessing larger patient/subject groups, which limits their use with players on the field. The market for wearable sensors has grown significantly in recent years, and such technologies represent a viable alternative to gold-standard systems, offering remote, real-time, objective assessment at low cost and with a small form factor.
Wearables have been widely adopted in sport by individual athletes, sports teams, and physicians, as they enable real-time monitoring of functional movements, workloads, and biometric markers during training and competition [13]. Moreover, thanks to the vast amount of data now available, the integration of machine learning and artificial intelligence with wearable technology has yielded promising results in athletic performance, coaching support, technique correction, and injury prediction, among other applications. Relevant examples of the use of machine learning with inertial sensors in sport include [14,15,16,17].
A number of studies have considered the application of inertial sensors during ACL rehabilitation in laboratory settings [18,19,20]; however, few works have used wearable technology to evaluate the performance of athletes who have returned to sport.
For example, Patterson et al. [21] investigated 14 athletes post-ACL reconstruction (on average 3.5 years after surgery) and 17 control athletes during walking, using two inertial sensors on the shanks, and showed that gyroscope features were able to discriminate healthy from ACL-reconstructed individuals, which was not possible using gait temporal variables. In contrast, Setuain et al. [22,23] tested 26 elite handball players (six of whom were ACL-reconstructed, with an average time since surgery of 6.3 years) using an inertial sensor worn on the lower back during horizontal and vertical jumping tasks, but found no difference between the two groups. Finally, Mandalapu et al. [24] applied motion sensors and machine learning models to 131 subjects (109 of whom had an ACL injury) to discriminate between the two classes, with good results; however, the tests were carried out in a lab setting and the time since surgery was not reported.
It is evident that further testing of wearable sensor technologies on athletes on the field after RTS is needed. The aim of this study is two-fold:
- (i) to investigate whether ACL injury has a long-term after-effect in rugby players, by detecting significant differences between ACL-reconstructed and healthy players during a change-of-direction activity;
- (ii) to provide an automated and objective method for distinguishing between healthy and post-ACL rugby players that is independent of subject-related information, step detection and segmentation processes, and standard gait spatiotemporal metrics, by combining a set of inertial sensors worn on the lower limbs with data-driven machine learning models.
To the best of the authors’ knowledge, the combination of a data-driven approach and inertial sensors to classify healthy and ACL-reconstructed subjects on the field (with post-ACL athletes who have returned to sport and whose time since surgery is between 5 and 10 years) has not yet been explored.
3. Results
3.1. Gait Analysis Results
A two-way ANOVA (group × limb) was performed to test for significant differences in temporal gait parameters between the post-ACL group and the control group. A summary of the descriptive statistics for each subgroup is shown in Table 2. Statistical assumptions were checked before the test. Normality for each subgroup was assessed visually via histograms, box plots, and Q-Q plots, which showed only occasional and slight divergence from normality. Levene’s test was performed to evaluate the homogeneity of variances across subgroups, with the following results: F = 1.92 and p = 0.125 for gait cycle time, F = 5.152 and p = 0.002 for stance phase, F = 1.164 and p = 0.322 for swing phase, F = 5.036 and p = 0.002 for relative stance phase, F = 5.044 and p = 0.002 for relative swing phase, and F = 5.284 and p = 0.001 for cadence. Even though Levene’s test is significant for four of the six parameters, the ANOVA model is only an approximation of the data, and its assumptions need not be satisfied completely: with normal data but heterogeneous variances, ANOVA is robust for balanced or nearly balanced designs [36,37]. Moreover, Levene’s test depends to a large extent on sample size, so with many observations even modest differences in variance become significant. Indeed, Keppel [38] suggested as a rule of thumb that, if sample sizes are equal, robustness should hold until the largest variance is more than nine times the smallest variance, a condition that is met in this study, as variances are comparable among all subgroups for each parameter.
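As a rough illustration of the two checks above, the following Python sketch computes Levene’s W statistic and the largest-to-smallest variance ratio used in Keppel’s rule of thumb. The text does not state which centring was used, so the choice of mean-centring (Levene’s original test; `center="median"` gives the Brown-Forsythe variant) is an assumption, and the function names are hypothetical. P-values, which require the F distribution with (k − 1, N − k) degrees of freedom, are left to a statistics package.

```python
from statistics import mean, median, variance


def levene_w(groups, center="mean"):
    """Levene's W statistic for k groups; compare with F(k - 1, N - k).

    center="mean" is Levene's original test, center="median" the
    Brown-Forsythe variant.
    """
    loc = mean if center == "mean" else median
    # Absolute deviation of every observation from its own group centre
    z = [[abs(x - loc(g)) for x in g] for g in groups]
    k = len(z)
    n_total = sum(len(zi) for zi in z)
    zbar_i = [mean(zi) for zi in z]                # per-group mean deviation
    zbar = sum(sum(zi) for zi in z) / n_total      # grand mean deviation
    between = sum(len(zi) * (m - zbar) ** 2 for zi, m in zip(z, zbar_i))
    within = sum((v - m) ** 2 for zi, m in zip(z, zbar_i) for v in zi)
    return (n_total - k) / (k - 1) * between / within


def variance_ratio(groups):
    """Largest sample variance divided by the smallest one."""
    v = [variance(g) for g in groups]
    return max(v) / min(v)


def keppel_robust(groups, limit=9.0):
    """Keppel's rule of thumb (equal n): ANOVA robust while the ratio stays <= limit."""
    return variance_ratio(groups) <= limit
```

For example, `keppel_robust(subgroups)` on the four group × limb subgroups of one gait parameter would return `True` whenever no subgroup variance exceeds nine times the smallest one.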
Some gait parameters (swing phase, relative stance phase, and relative swing phase) showed neither a statistically significant interaction between condition (post-ACL vs. healthy) and limb nor significant main effects.
For swing phase, the condition × limb interaction gave F(1, 838) = 2.49, p = 0.115, and the main effects of condition and limb gave F(1, 838) = 2.325, p = 0.128, and F(1, 838) = 0.36, p = 0.549, respectively. For relative stance phase, the interaction gave F(1, 838) = 2.588, p = 0.108, and the main effects of condition and limb gave F(1, 838) = 0.299, p = 0.585, and F(1, 838) = 2.329, p = 0.127, respectively. For relative swing phase, the interaction gave F(1, 838) = 2.578, p = 0.109, and the main effects of condition and limb gave F(1, 838) = 0.303, p = 0.582, and F(1, 838) = 2.333, p = 0.127, respectively.
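To illustrate how the main-effect and interaction F statistics above are formed, the sketch below computes a fixed-effects two-way ANOVA (e.g., condition × limb) from sums of squares. It assumes a balanced design with the same number of observations in every cell, a simplification made here for clarity; with unequal cell sizes, as may apply to these data, a regression-based ANOVA (e.g., Type II or Type III sums of squares) is needed instead. The function name and the nested-list data layout are hypothetical.

```python
from statistics import mean


def two_way_anova_balanced(cells):
    """Fixed-effects two-way ANOVA for a balanced design.

    cells[i][j] lists the observations for level i of factor A (e.g. condition)
    and level j of factor B (e.g. limb); every cell must hold n values.
    Returns {effect: (F, (df_effect, df_error))}.
    """
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])
    if any(len(c) != n for row in cells for c in row):
        raise ValueError("unbalanced design: use a regression-based ANOVA")
    cell_m = [[mean(c) for c in row] for row in cells]
    grand = mean(m for row in cell_m for m in row)  # equals the overall mean when balanced
    a_m = [mean(row) for row in cell_m]             # marginal means of factor A
    b_m = [mean(cell_m[i][j] for i in range(a)) for j in range(b)]  # factor B
    ss_a = b * n * sum((m - grand) ** 2 for m in a_m)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_m)
    ss_ab = n * sum((cell_m[i][j] - a_m[i] - b_m[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_e = sum((x - cell_m[i][j]) ** 2
               for i in range(a) for j in range(b) for x in cells[i][j])
    df_a, df_b, df_e = a - 1, b - 1, a * b * (n - 1)
    ms_e = ss_e / df_e
    return {
        "A": (ss_a / df_a / ms_e, (df_a, df_e)),
        "B": (ss_b / df_b / ms_e, (df_b, df_e)),
        "AxB": (ss_ab / (df_a * df_b) / ms_e, (df_a * df_b, df_e)),
    }
```

A significant `"AxB"` term is what motivates the follow-up simple main effects analysis: the effect of condition is then examined separately for each limb.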
The other parameters (gait cycle time, stance phase, and cadence) showed a significant interaction between condition and limb: F(1, 838) = 5.45, p = 0.02; F(1, 838) = 4.005, p = 0.046; and F(1, 838) = 7.687, p = 0.006, respectively. For these parameters, simple main effects were then examined with one-way ANOVAs, whose results are shown in Table 3. When comparing the post-ACL group with the healthy control group, gait parameters obtained from the right (unaffected) leg were not significantly different between the two populations (p = 0.46, p = 0.355, and p = 0.661 for gait cycle time, stance phase, and cadence, respectively). However, for the left (affected) leg, statistically significant differences were observed for gait cycle time and cadence (p = 0.011 and p = 0.001, respectively). The 95% confidence intervals (CI) are consistent with this: for left-leg gait cycle time they were 0.501 to 0.527 s (post-ACL group) and 0.476 to 0.503 s (healthy group), overlapping only marginally, and for left-leg cadence they were 1.943 to 2.055 steps/s (post-ACL group) and 2.079 to 2.2 steps/s (healthy group), with no overlap. However, the effect size, expressed as Cohen’s d (Table 4) and calculated for between-limb and between-group comparisons of each gait-related variable, was very small in all cases except for the left leg compared between healthy and post-ACL subjects, where it was small. Values of Cohen’s d were interpreted as follows: <0.2 very small, 0.2 to 0.5 small, 0.5 to 0.8 medium, 0.8 to 1.3 large, and >1.3 very large [39,40]. A post-hoc analysis showed that, with the observed effect sizes (0.15–0.34) and the available sample sizes, statistical power is too low: the sample size would need to be increased to 137 subjects per group to achieve a power of at least 0.8.
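Cohen’s d and the interpretation bands listed above can be sketched as follows. The pooled-standard-deviation form of d is an assumption, since the text does not state which variant was used (for between-limb comparisons within the same players, a paired formulation may be more appropriate); the band boundaries mirror those listed above, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(x, y):
    """Standardised mean difference using the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def interpret_d(d):
    """Map |d| onto the bands used in the text: <0.2, 0.2-0.5, 0.5-0.8, 0.8-1.3, >1.3."""
    d = abs(d)
    if d < 0.2:
        return "very small"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    if d < 1.3:
        return "large"
    return "very large"
```

Under these bands, the observed range of 0.15 to 0.34 spans the "very small" and "small" categories.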
3.2. Machine Learning Model Results
Summary results for all the models are reported in Table 5, while the confusion matrices and related metrics (sensitivity, specificity, precision, F1-score, and Cohen’s Kappa) are shown in Tables 6–12. Overall accuracy was similar across all models, ranging from 71.18% to 73.07%.
The kNN model achieved an accuracy of 72.34% (SE: 7.66%), sensitivity of 75.93%, specificity of 68.9%, precision of 70.63%, F1-score of 73.19%, and Cohen’s Kappa of 0.448. The NB model achieved an accuracy of 72.31% (SE: 7.95%), sensitivity of 71.95%, specificity of 72.8%, precision of 72.26%, F1-score of 72.1%, and Cohen’s Kappa of 0.447. The SVM model achieved an accuracy of 71.18% (SE: 9.13%), sensitivity of 67.9%, specificity of 74.5%, precision of 72.4%, F1-score of 70.08%, and Cohen’s Kappa of 0.424. The XGB model achieved an accuracy of 72.32% (SE: 10.47%), sensitivity of 81.8%, specificity of 63.07%, precision of 68.56%, F1-score of 74.6%, and Cohen’s Kappa of 0.448. The MLP model achieved an accuracy of 73.07% (SE: 8.99%), sensitivity of 78.01%, specificity of 68.3%, precision of 70.79%, F1-score of 74.22%, and Cohen’s Kappa of 0.462. The stacking model (with kNN, SVM, NB, XGB, and MLP as base models and logistic regression as meta-learner) achieved an accuracy of 72.84% (SE: 8.95%), sensitivity of 77.59%, specificity of 68.27%, precision of 70.65%, F1-score of 73.96%, and Cohen’s Kappa of 0.458.
Cohen’s Kappa indicates moderate agreement beyond chance (between 0.41 and 0.6, according to Landis and Koch [41]) for all the considered models. Sensitivity (the proportion of actual positives correctly identified as such, i.e., the percentage of post-ACL subjects correctly classified as post-ACL) is higher for XGB, MLP, and the stacking model (up to 81.8%). In contrast, specificity (the proportion of actual negatives correctly identified as such, i.e., the percentage of healthy subjects correctly classified as healthy) ranges from 63.07% to 74.5%, with SVM showing the best result. These results may guide the choice of one model over another according to the priorities of athletes and clinicians. For example, if the number of false negatives (i.e., post-ACL subjects incorrectly classified as healthy) must be limited to reduce the risk of re-injury or contralateral ACL injury, the XGB model, which has the highest sensitivity, should be preferred.
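All of the metrics above follow from the four cells of a binary confusion matrix with post-ACL as the positive class. The sketch below uses the standard definitions, including Cohen’s Kappa computed against the chance agreement implied by the actual and predicted class totals, together with the Landis and Koch bands; the function names are hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Metrics from a 2x2 confusion matrix (positive class = post-ACL)."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)   # post-ACL subjects correctly flagged
    specificity = tn / (tn + fp)   # healthy subjects correctly recognised
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / n
    # Chance agreement from the products of actual and predicted class totals
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision,
            "f1": f1, "kappa": kappa}


def landis_koch(kappa):
    """Landis and Koch agreement bands for Cohen's Kappa."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

For example, a hypothetical matrix with 40 true positives, 10 false negatives, 15 false positives, and 35 true negatives gives a sensitivity of 0.8, a specificity of 0.7, and a Kappa of 0.5, which falls in the "moderate" band.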
The features most frequently selected by the different models across the 10 permutations are listed in the Excel file in the Supplementary Material. The feature analysis shows that XGB selected 15 features overall, kNN 16, NB 14, SVM 16, and MLP 15. Most of these features were shared among several classifiers: only 18 of the 250 candidate features were selected at least once by at least one classifier. In particular, four of the 18 features were extracted from the y-axis (anteroposterior axis), six from the z-axis (vertical axis), and seven from a combination of the three axes. Moreover, most of the selected features were obtained from derived signals rather than from the raw acceleration and angular rate measurements, in particular the jerk (12 out of 22 features) and magnitude (five out of 22 features) signals of the 3D acceleration.
Focusing on the XGB model, which achieved the highest sensitivity, three of its 15 features were extracted from the y-axis (anteroposterior axis), five from the z-axis (vertical axis), and six from a combination of the three axes. Again, most of the selected features were obtained from the jerk (10 out of 15) and magnitude (5 out of 15) signals of the 3D acceleration.
Finally, all the selected features were obtained from standard time-domain and statistical calculations, such as standard deviation, mean, interquartile range (IQR), root mean square (RMS), signal magnitude area (SMA), energy, peak-to-peak amplitude, and minimum.
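The derived signals and time-domain statistics named above can be sketched in plain Python as follows. The exact definitions are assumptions, because conventions vary: here jerk is the first-order finite difference of acceleration multiplied by the sampling rate `fs`, energy is the mean of squared samples, SMA is the mean over samples of the summed absolute values of the three axes, and the IQR uses the default ("exclusive") quartile method of Python’s `statistics.quantiles`. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev, quantiles


def magnitude(ax, ay, az):
    """Euclidean norm of the 3D acceleration, sample by sample."""
    return [sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]


def jerk(signal, fs):
    """First-order finite difference of acceleration scaled by the sampling rate fs (Hz)."""
    return [(b - a) * fs for a, b in zip(signal, signal[1:])]


def sma(ax, ay, az):
    """Signal magnitude area: mean over samples of |ax| + |ay| + |az|."""
    return sum(abs(x) + abs(y) + abs(z) for x, y, z in zip(ax, ay, az)) / len(ax)


def time_features(s):
    """Time-domain statistics of one signal (needs at least two samples)."""
    q1, _, q3 = quantiles(s, n=4)      # default 'exclusive' method
    mean_square = sum(v * v for v in s) / len(s)
    return {
        "mean": mean(s),
        "std": pstdev(s),              # population standard deviation
        "min": min(s),
        "iqr": q3 - q1,
        "rms": sqrt(mean_square),
        "energy": mean_square,         # one common convention; others use the plain sum
        "peak_to_peak": max(s) - min(s),
    }
```

These helpers can be applied per axis or to the magnitude signal, e.g. `time_features(jerk(az, fs))`; the order in which the authors combined these operations is not specified in this section.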
4. Discussion
Return-to-sport following an ACL reconstruction still represents a challenge for clinicians and sport scientists due to the lack of sensitive and objective assessments that could provide clinically relevant information on where the athlete is in their journey to recovery. The work presented herein is one of the few studies to investigate the use of wearable sensors to identify ACL-reconstructed subjects among individuals involved in on-the-field rugby activities, so as to classify healthy and post-injury subjects.
Regarding the first goal of the paper, standard gait parameters were extracted from the data collected from healthy and post-ACL athletes, and the subsequent statistical analysis showed that some variables (e.g., gait cycle time and cadence) may be useful for evaluating a long-term after-effect of ACL injury, as they differed significantly between ACL-reconstructed and healthy players. This is consistent with the results discussed in [42,43], where male athletes (some of them approximately five years after ACL reconstruction) ran on a treadmill at different speeds while kinematic and kinetic variables were collected. Milandri et al. [42] showed that gait velocity may differ significantly between the two cohorts; however, as also indicated in [43], most of the residual long-term differences are evident in ground reaction force-related metrics and joint moments, which could not be obtained in the studied scenario. Unlike [42,43], which adopted standard optoelectronic or plantar pressure systems, this study used only low-power body-worn motion sensors, to ensure that the assessment could be performed outside the lab. Even though recent studies [44,45] have investigated the promising use of inertial measurement units (IMUs) for estimating vertical ground reaction force waveforms via machine learning, their applicability to on-the-field conditions still needs to be confirmed; thus, those metrics were not considered in this study.
A power analysis showed that, with standard significance and power levels (α = 0.05 and power = 0.8, i.e., β = 0.2), a large effect size (d = 0.8, following Cohen’s conventions [39,40]) would require a minimum sample size of 26 subjects per group. The effect size observed in the experiment was very small in all cases considered, except for the injured leg compared between healthy and post-ACL subjects, where it was small. A post-hoc analysis showed that, with the observed effect sizes (0.15–0.34) and the available sample sizes, power is too low, and the sample size would need to be increased to 137 subjects per group to achieve a power of at least 0.8. However, no study in the literature on this topic fulfills this criterion, and even studies meeting the criterion of 26 subjects per group are scarce.
Therefore, even though some statistical significance was detected in the analysis, the low power and small effect sizes observed do not provide enough confidence that the differences seen between groups for those variables reflect a real effect, and further, larger studies should be performed.
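The per-group sample sizes quoted above can be approximately reproduced with the normal-approximation formula for a two-sided, two-sample t-test, n = 2(z_{1−α/2} + z_{1−β})²/d² + z_{1−α/2}²/4, where the last term is a common small-sample correction. This is a sketch under that approximation rather than the authors’ procedure: they may have used dedicated power-analysis software, and exact t-based calculations can differ slightly. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.8):
    """Approximate subjects per group for a two-sided, two-sample t-test."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = z(power)            # about 0.84 for power = 0.8
    n = 2 * (z_alpha + z_beta) ** 2 / d ** 2 + z_alpha ** 2 / 4
    return ceil(n)
```

With d = 0.8 this gives 26 subjects per group, and with d = 0.34, the upper end of the observed range, it gives 137, matching the figures above; the smaller observed effects would require considerably more subjects.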
Regarding the second goal of the paper, the importance of gait pattern classification in biomechanics, and in particular of identifying which parameters can distinguish post-ACL subjects from healthy controls, is well established in the literature [46,47,48]. While those works relied on gait metrics, recent works by Wu et al. [49] and Richter et al. [50] also applied machine learning models to discriminate between ACL-deficient and healthy subjects. Nevertheless, all those studies were carried out in gait laboratories using gold-standard marker-based optoelectronic systems, which limits the applicability of their insights to real-world use cases.
Moreover, standard gait parameters may not be relevant or applicable during typical sport movements, such as cutting manoeuvres or jumping, for which step detection can be unreliable. Therefore, a data-driven approach based on machine learning models and motion data was also developed to objectively discriminate between the two cohorts. Because it is independent of step detection, this method may be more robust and accurate than standard gait analysis in on-the-field scenarios, a concept already proposed for fall-risk classification [51].
The MLP classifier achieved an overall accuracy on the test dataset of 73.07%, only slightly better than the other models investigated (worst accuracy: 71.18% for SVM). The standard error was large for all the models (>7%), which likely reflects the limited number of subjects in the test dataset.
Even though accuracy is similar across approaches, the models perform differently when misclassifications and related metrics (e.g., sensitivity, specificity) are considered. Limiting false negatives rather than false positives may be more appealing to athletes and coaches in order to minimize the risk of re-injury, and thus models with high sensitivity (81.8% for XGB) may be more helpful to end-users outside the lab.
Among the features repeatedly selected by the XGB model and the other classifiers, >50% are related to the sagittal plane. This confirms the results reported in [7,52], which indicated that most of the alterations of interest take place in the sagittal plane. Moreover, >50% of those features were derived from the jerk of the 3D acceleration, again consistent with the results observed in [53]. The jerk, indeed, has been found to characterize dynamic movement at the knee joint and has been successfully used as an indicator of a lack of stable neuromuscular control or of structural instability, often observed in ACL subjects, as it correlated with patient reports of instability [53]. Finally, all the selected features were simple time-domain statistics.
The features considered differ from those used in the state of the art for similar problems, owing to the underlying concept of building fully data-driven models that do not rely on traditional biomechanical parameters and step detection, hence trading interpretability for model performance. For example, Patterson et al. [21] considered gyroscope-extracted features (such as shank rotation rate and its variance, and shank rate of change at different moments in the gait cycle) in addition to conventional gait temporal variables. Setuain et al. [22] instead adopted jumping-related biomechanical features (e.g., vertical velocity, mechanical efficiency ratio). However, in both cases [21,22], no machine learning model was developed to discriminate between the two populations of interest. Wu et al. [49] built a neural network based on features extracted from the 3D phase space reconstruction of knee mechanics (internal-external rotation, flexion-extension, and anteroposterior and proximal-distal translations) during treadmill walking. On the other hand, Richter et al. [50] considered a wide range of biomechanical features, including ground reaction forces and impulses; center of mass velocity and power in the pelvis, hip, knee, and ankle; joint angles and joint angular velocities of the ankle, knee, hip, pelvis, thorax, and thorax on pelvis; joint powers, moments, work, and impulse of the ankle, knee, hip, and pelvis (all in the sagittal, frontal, and transverse planes); time; and the foot rotation angle relative to the pelvis. Different exercises were also tested, such as double leg countermovement jump, single leg countermovement jump, double leg drop jump, single leg drop jump, hurdle hop, single leg hop, and planned and unplanned change of direction. During an unplanned change of direction task (similar to the one implemented in the present investigation), the highest accuracy achieved was 67%, with the best model based on discriminant analysis relying on vertical centre of mass velocity and hip flexion moment, in line with the performance reported in this paper. The double leg countermovement jump and double leg drop jump were the exercises showing the highest accuracy in discriminating between post-ACL and healthy athletes, at 82% and 87%, respectively. However, Richter et al. [50] recruited the post-ACL population only nine months after ACL reconstruction, and both [49,50] relied on optoelectronic systems and force platforms. Finally, Mandalapu et al. [24] placed inertial sensors on the ankles, wrists, and sacrum of athletes walking and jogging on a treadmill. The features considered were fully data-driven (e.g., phase slope index and pairwise causality matrix), and good discrimination performance was achieved using an auto multi-layer perceptron and a neural network, with the highest area under the curve of 0.76 and a Cohen’s Kappa of 0.53 (again in line with the results presented in this paper). Interestingly, Mandalapu et al. [24] also reported slightly better model performance in female than in male athletes.
The present study is one of the few in the literature to adopt motion sensors to discriminate between post-ACL and healthy athletes and, to the best of the authors’ knowledge, the first to combine wearable sensors and machine learning models in field settings. The results of this study suggest that motion sensors can distinguish, with moderate accuracy, between players with an ACL-reconstructed knee and healthy players even 5–10 years after the injury, despite the previously injured athletes being deemed fully recovered. This is a promising result that could support the use of these sensors in real training environments, aiding the decision-making of physiotherapists, medical staff, and sport scientists in their practice.
This study was conducted in a free-living environment using only motion sensors; therefore, no gold-standard optical motion tracking system was adopted to monitor the athletes’ biomechanics. Such a gold-standard assessment would have helped clarify the significant differences observed in the stance phase and cadence parameters estimated from both limbs in the healthy group. However, gait asymmetries are not unusual in healthy subjects when running, as reported in [54,55], the main reasons being running speed, the runner’s experience, and fatigue.
Given that the rugby players analyzed were not part of a team and the previously injured subjects had been monitored by different independent physiotherapists, the judgement that they had fully recovered was based exclusively on reports from the athletes’ clinicians.
The analysis carried out in this study did not include subject-related features (e.g., height, weight) in order to develop subject-independent models based exclusively on data collected from the wearable sensors and the movement mechanics. Further analysis should be performed to highlight the impact that subject-related features may have on the model performance.
Moreover, gender and the small sample size are further limitations of the study, which may limit the generalizability of the results. Given the novelty of the study, the present investigation was designed as a pilot proof-of-concept; as shown by the power analysis, larger cohorts will need to be recruited in the future to confirm these results. Furthermore, a larger dataset could enable the adoption of more powerful deep learning techniques. The impact of different feature selection approaches on model performance could also be investigated [56]. In addition, it is unknown whether the post-ACL subjects exhibited similar asymmetries before the injury or at any time other than the test session in which they were recorded. Finally, despite the numerous tests available in return-to-sport protocols, this study focused specifically on cutting maneuvers because of their high-risk mechanics and relation to ACL injuries [6,57], although additional tasks should also be considered in future studies [50].
Therefore, larger-scale longitudinal studies are needed to confirm these insights in less controlled environments, adopting additional sensing technology (e.g., pressure insoles, surface electromyography, full-body motion sensors) for model validation.