1. Introduction
The analysis of movement kinematics is essential to quantify and objectify human motion in various fields, including clinical research, sports science, orthopedics, and rehabilitation. It yields measurable data with which to assess and compare individuals or outcomes at different time points (e.g., pre- and post-treatment, or before and after training interventions) [1].
Among the most robust and scientifically validated tools for kinematic analysis are three-dimensional (3D) motion analysis systems, which provide highly precise data. However, these systems are often costly and require complex setups and specialized expertise [2]. These limitations have led to the development of newer, more affordable two-dimensional (2D) technologies, some of which have demonstrated precision comparable to that of high-end 3D systems at a substantially lower cost (approximately GBP 700 or EUR 950, as reported by Ugbolue et al. [3]). Before such technologies can be adopted for standardized assessments, such as the evaluation of clinical tests like the Straight Leg Raise test, their validity and reliability must be established.
One affordable option is Kinovea, an open-source 2D motion analysis software released under the GPLv2 license in 2009 through the collaboration of researchers, coaches, athletes, and programmers worldwide. Kinovea enables the analysis of distances, angles, coordinates, and spatial–temporal parameters [4] by evaluating video recordings frame by frame. It also allows calibration in planes that are not perpendicular to the camera’s line of sight, which broadens its range of applications.
Kinovea has been used in sports science [5] and clinical research [6], and as a comparative tool to study other technologies [7]. It stands out for being free, user-friendly, and portable, making it suitable for real-world applications in which reliable measurements must be obtained without prior technical expertise [8]. Although the software has been validated for some temporal measurements, studies examining its reliability in specific clinical tests, such as the Straight Leg Raise (SLR) test, remain scarce.
The SLR test is commonly used to identify radicular pain by mechanically provoking irritated lumbosacral nerve roots, usually in association with low back pain and lumbar disc herniation [9,10], as supported by biomechanical evidence [11]. However, its capacity to distinguish between structures is limited, so structural differentiation maneuvers, including ankle dorsiflexion or hip adduction/rotation, can help differentiate between symptoms caused by neural and non-neural tissue irritation [12,13,14,15]. The goal of these maneuvers is to selectively increase or decrease the load on the nervous system without altering the mechanical stress on non-neural structures that might contribute to SLR-related symptoms. A change in SLR-provoked symptoms following one or more structural differentiation maneuvers suggests that the symptoms are at least partially related to neural tissue irritation [15]. Although clinical guidelines recommend the SLR test, evidence indicates that its diagnostic accuracy for lumbar radicular pain is limited [13].
One potential factor contributing to its suboptimal diagnostic performance is limited reliability [16]. A specific question that warrants investigation is whether the reliability of angular hip measurements during the SLR contributes to this limitation. Accurate and consistent measurement of the hip angle is crucial for standardizing the test and ensuring reproducible results, especially given the usefulness of the test for evaluating subjects with flexibility deficits [17]. In addition, recording the hip flexion range of motion achieved during the SLR test may provide relevant information on complementary factors. A relationship has been demonstrated between adaptive neural movement, whose limitation may reduce the range of motion achieved during the test, and the presence of lumbar and radicular symptoms [18]. Furthermore, range of motion is a widely used variable that is collected together with the SLR test in other musculoskeletal conditions, such as hamstring injuries, in which the restriction of movement persists for up to 30 days after injury [19] and has been postulated as a valid predictor of the time to return to play [20]. Motion analysis software offers a potential solution for assessing these biomechanical aspects, but its reliability for such purposes must be thoroughly evaluated. This highlights the need for research into the intra- and inter-examiner reliability of Kinovea for measuring hip angles during the SLR test. Establishing the validity and reliability of such software could improve the biomechanical assessment of the SLR, ultimately enhancing its diagnostic utility and addressing the current limitations in its application.
Since conflicting results have been reported regarding the reliability of Kinovea for the SLR, with outcomes varying widely depending on the setup and perspective used [21], this study aimed to analyze the intra- and inter-examiner reliability of Kinovea in measuring hip flexion mobility at the moment of positive response during a passive Straight Leg Raise test recorded from an orthogonal perspective.
2. Materials and Methods
2.1. Study Design
From March to October 2024, an observational study of reliability and agreement was carried out at Hospital 12 de Octubre, a public hospital in Madrid, Spain. The study followed the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) [22] and adhered to the recommendations of the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) Network [23] to ensure high-quality reporting. The study protocol and ethical considerations were reviewed, approved, and monitored by the Ethics Committee of Hospital 12 de Octubre before data collection began.
2.2. Participants
Informative announcements were displayed throughout the Faculty of Nursing, Physiotherapy, and Podiatry and Hospital 12 de Octubre to recruit two distinct samples: one of asymptomatic volunteers and another of individuals with chronic low back pain (LBP). The general inclusion criteria for both groups were an age between 18 and 70 years and the ability to read, understand, and sign the informed consent form. Painful alterations of the hip joint, back surgery in the last year, recent trauma, neurological or systemic diseases, or the inability to communicate with the researchers were exclusion criteria for both the asymptomatic and pain populations.
In addition, asymptomatic participants had to be free of back pain, with a pain intensity of 0 out of 10 on the numerical pain rating scale, whereas participants in the case group were required to have had recurrent low back pain for at least 3 months and to report a pain intensity of at least 3 out of 10 (considered the minimum cut-off point for identifying populations with musculoskeletal pain [24]). Symptoms compatible with cauda equina syndrome were an additional exclusion criterion specific to the low back pain group.
2.3. Examiners
This study involved two examiners—one with over 10 years of clinical experience specializing in musculoskeletal conditions and another with 1 year of experience. This choice was made to evaluate how examiner experience would impact measurement agreement, reflecting real-world clinical scenarios where professionals with varying expertise frequently collaborate. To improve the methodological rigor of the study, participant selection and the order of the side examined were randomized. Additionally, a rotating shift schedule was implemented to minimize communication between the examiners and prevent consensus-driven decisions, with shifts alternating daily. Participants were required to attend two sessions on the same day—one in the morning and one in the afternoon—with each session conducted by a different examiner.
All acquired images were coded according to the instructions provided by the principal investigator. Both examiners (experienced and novice) then measured all the images, with the order of the images (participant and side) randomized and the identity of the examiner who had acquired each image concealed, even when examiners measured images they had acquired themselves. This approach aimed to generate results applicable to clinical practice, where examiners typically measure the images that they acquired. The interval between the first and second measurements of each examiner (for intra-examiner reliability) was 1 week. Both examiners performed the procedures as in standard clinical practice and carefully followed the measurement instructions to reduce differences between them and avoid influencing the results. The blinding process concealed examiner experience (experienced or novice), participant type (case or control), and side (symptomatic or non-symptomatic) for each image, to ensure an unbiased analysis.
2.4. Straight Leg Raise Procedure
The SLR test was carried out with the participants lying supine on a flat examination table. To maintain stability and limit movement to the hip, the participant’s pelvis was secured with a strap during the procedure. The examiner passively lifted the participant’s leg while keeping the knee fully extended (Figure 1), continuing the motion until the participant reported the onset of symptoms, such as pain or discomfort, or until resistance was encountered.
To accurately determine the angle at which symptoms or resistance occurred, the entire test was recorded on video, enabling precise evaluation and documentation.
2.5. Kinovea Measurements
The recorded videos were analyzed using the Kinovea software for Windows (version 0.9.5). Once the frame corresponding to symptom onset or increased resistance was identified, the goniometer function was applied using the previously marked reference points on the greater trochanter and lateral malleolus, as shown in Figure 2.
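For readers who want to reproduce such an angle outside Kinovea, the following minimal Python sketch computes the angle at a vertex from three 2D image points. It assumes, for illustration only and not as a description of Kinovea's internal implementation, that the vertex is the greater trochanter marker, one arm passes through the lateral malleolus marker, and the other arm is a horizontal reference parallel to the table. The function name `vertex_angle_deg` and all pixel coordinates are hypothetical.

```python
import math


def vertex_angle_deg(vertex, p1, p2):
    """Return the unsigned angle (degrees) at `vertex` between rays vertex->p1 and vertex->p2.

    Points are (x, y) pixel coordinates. Because the angle comes from the dot
    product of the two rays, it does not depend on whether the image y-axis
    points up or down.
    """
    v1 = (p1[0] - vertex[0], p1[1] - vertex[1])
    v2 = (p2[0] - vertex[0], p2[1] - vertex[1])
    norm1 = math.hypot(*v1)
    norm2 = math.hypot(*v2)
    if norm1 == 0 or norm2 == 0:
        raise ValueError("arm points must differ from the vertex")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (norm1 * norm2)
    # Clamp to [-1, 1] to guard against floating-point round-off before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


# Hypothetical marker positions in pixels (image y grows downward).
trochanter = (320, 410)      # vertex: greater trochanter marker (assumed)
horizontal_ref = (520, 410)  # assumed point on a line parallel to the table
malleolus = (500, 230)       # lateral malleolus marker with the leg raised

hip_flexion = vertex_angle_deg(trochanter, horizontal_ref, malleolus)
print(f"Hip flexion: {hip_flexion:.1f} deg")  # prints: Hip flexion: 45.0 deg
```

Rounding the result to whole degrees would mimic a tool that reports angles in 1° increments.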
2.6. Statistical Analyses
Data processing and analysis were carried out using IBM SPSS Statistics version 29 for macOS (IBM Corp., Armonk, NY, USA), with a two-tailed significance level set at p < 0.05. The distribution of continuous variables was assessed with histograms and the Shapiro–Wilk test. Variables with p < 0.05 were classified as non-normally distributed, and those with p ≥ 0.05 as normally distributed.
Descriptive statistics were applied to summarize sociodemographic data and hip flexion angle during the SLR test. Differences between sides (symptomatic and non-symptomatic) and groups (LBP cases and asymptomatic controls) were analyzed using Student’s t-test with a 95% confidence interval.
The intra- and inter-examiner reliability of hip flexion angle measurement was evaluated using the following five metrics: (1) the mean and standard deviation across trials (intra-examiner) and examiners (inter-examiner); (2) the absolute error between trials (intra-examiner) and examiners (inter-examiner); (3) intraclass correlation coefficients (ICC(3,1) for intra-examiner and ICC(3,2) for inter-examiner reliability) [25,26]; (4) the standard error of measurement (SEM = SD × √(1 − ICC)); and (5) the minimal detectable change (MDC = SEM × √2 × 1.96).
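The reliability indices listed above can be sketched in plain Python. This is a minimal illustration, assuming a complete n × k table (n participants, k trials or examiners) and the Shrout and Fleiss two-way, consistency definitions: ICC(3,1) = (MSR − MSE)/(MSR + (k − 1)·MSE) and ICC(3,k) = (MSR − MSE)/MSR, where MSR is the between-participant mean square and MSE the residual mean square. It is not the SPSS routine used in the study. The helper names are hypothetical, the SD passed to `sem` is assumed to be the between-participant SD of the angles, and the input data are made up.

```python
import math


def icc3(data):
    """Return (ICC(3,1), ICC(3,k)) for an n x k list of lists (rows = participants)."""
    n = len(data)
    k = len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between trials/examiners
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc_single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    icc_average = (ms_rows - ms_error) / ms_rows
    return icc_single, icc_average


def sem(sd, icc):
    """Standard error of measurement: SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem_value):
    """Minimal detectable change at 95% confidence: MDC = SEM * sqrt(2) * 1.96."""
    return sem_value * math.sqrt(2.0) * 1.96


# Made-up hip flexion angles in degrees: 3 participants x 2 trials.
angles = [[10.0, 11.0], [20.0, 19.0], [30.0, 30.0]]
icc_1, icc_k = icc3(angles)
print(f"ICC(3,1) = {icc_1:.4f}, ICC(3,2) = {icc_k:.4f}")  # prints: ICC(3,1) = 0.9948, ICC(3,2) = 0.9974

example_sem = sem(sd=10.0, icc=0.99)
print(f"SEM = {example_sem:.2f} deg, MDC95 = {mdc95(example_sem):.2f} deg")  # prints: SEM = 1.00 deg, MDC95 = 2.77 deg
```

Because these are consistency forms, a constant offset between trials or examiners does not lower the ICC; a systematic bias would instead appear in the paired comparison of means.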
3. Results
A total of 60 individuals were initially assessed for eligibility. After the inclusion and exclusion criteria were applied, seven individuals were excluded because of a history of lower back (n = 4) or hip (n = 3) surgery. This resulted in a final sample of 53 participants: 15 asymptomatic individuals and 38 cases with LBP. Since each participant was assessed on both sides, with two measurements per side taken by each of the two examiners, this process yielded 120 images for the 15 asymptomatic participants and 304 images for the 38 participants with LBP, for a total of 424 images collected and analyzed. For the participants with LBP, measurements were classified by side as symptomatic or non-symptomatic. For the asymptomatic control group, the dominant side was treated as equivalent to the symptomatic side and the non-dominant side as equivalent to the non-symptomatic side.
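The image count follows directly from the design factors described in the paragraph above; a short Python check (variable names are ours) makes the arithmetic explicit:

```python
# Design factors as described in the text.
SIDES = 2
MEASUREMENTS_PER_SIDE = 2
EXAMINERS = 2
images_per_participant = SIDES * MEASUREMENTS_PER_SIDE * EXAMINERS  # 8

asymptomatic, lbp = 15, 38
images_asymptomatic = asymptomatic * images_per_participant  # 120
images_lbp = lbp * images_per_participant                    # 304
total_images = images_asymptomatic + images_lbp              # 424
print(images_asymptomatic, images_lbp, total_images)         # prints: 120 304 424
```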
Table 1 contains the demographic and clinical information of the sample with LBP and the sample of asymptomatic individuals. The groups did not differ significantly in the proportion of females (p = 0.609), age (p = 0.983), or height (p = 0.870). However, weight and body mass index (BMI) differed significantly, with higher values in the LBP group (weight: p = 0.030; BMI: p = 0.017). Regarding the hip flexion angle during the SLR test, participants with LBP exhibited significantly less mobility than asymptomatic controls. The mean hip flexion angle was 61.9 ± 13.7° in the LBP group and 82.2 ± 10.2° in the control group, a mean difference of 20.3° (p < 0.001). Similarly, hip flexion angles were significantly lower in the LBP group on both the symptomatic side (60.6 ± 13.7° vs. 81.7 ± 10.0°; mean difference: 21.0°; p < 0.001) and the non-symptomatic side (63.2 ± 13.7° vs. 82.7 ± 10.8°; mean difference: 19.5°; p < 0.001). The Shapiro–Wilk test yielded p > 0.05 for all continuous variables, so a normal distribution of the data could be assumed. Histograms of the hip flexion angle measurements, including the normal curve, are shown in Figure 3.
The intra-examiner reliability estimates of Kinovea for hip flexion angle measurement during the SLR test are presented in Table 2. Analyses were conducted separately for patients with LBP and asymptomatic controls and were stratified by examiner experience. There were no significant differences between trials 1 and 2 for the novice examiner assessing patients with LBP (p = 0.716) or asymptomatic controls (p = 0.856). Similarly, the scores obtained by the experienced examiner did not differ significantly between trials (p = 0.924 for patients with LBP and p = 0.930 for asymptomatic controls). The absolute error was low across all conditions, ranging from 0.8° to 1.0°, indicating minimal variation between trials for the same examiner. ICC(3,1) values were very high (0.995–0.998), reflecting excellent reliability of the repeated measurements performed by both the novice and the experienced examiner. Consistent with these ICCs, SEM values were small, ranging from 0.5° to 0.7°, further supporting the precision of the measurements. Finally, the MDC at 95% confidence ranged from 1.3° to 1.9°, meaning that changes in hip flexion angle exceeding these values can be distinguished from measurement error. Overall, these results demonstrate excellent intra-examiner reliability for hip flexion angle measurements during the SLR test, irrespective of examiner experience.
The inter-examiner reliability estimates of Kinovea for hip flexion angle measurement during the SLR test are summarized in Table 3. Analyses were conducted separately for patients with LBP and asymptomatic controls, for both single measurements and the mean of two measurements. There were no significant differences between examiners using a single measurement in patients with LBP (p = 0.484) or asymptomatic controls (p = 0.581). Likewise, the differences between examiners were not significant when the mean of two measurements was used (p = 0.638 for patients with LBP and p = 0.618 for asymptomatic controls). Absolute errors for inter-examiner comparisons were low in both groups and conditions, suggesting minimal discrepancies between examiners. ICC(3,2) values demonstrated excellent reliability across all conditions, ranging from 0.985 to 0.992. For single measurements, ICC values were 0.989 for patients with LBP and 0.985 for asymptomatic controls. When the mean of two measurements was used, ICC values improved slightly, reaching 0.992 for patients with LBP and 0.986 for asymptomatic controls. SEM values were small, ranging from 1.2° to 1.5°, supporting the precision of the inter-examiner measurements. The MDC values at 95% confidence were somewhat higher but still low, ranging from 3.4° to 4.0°, so changes smaller than these values cannot be distinguished from measurement error when different examiners take the measurements. Therefore, these results confirm high inter-examiner reliability for hip flexion angle measurements during the SLR test, whether single measurements or the mean of two measurements are used.
4. Discussion
To our knowledge, this is the first study to assess the intra- and inter-examiner reproducibility of Kinovea for measuring the hip flexion angle during the SLR test. In summary, the results demonstrate that using the Kinovea software to measure the hip flexion angle during the SLR test from plain video recordings is highly reliable for both intra- and inter-examiner assessments. Compared with other 2D analysis methods, Kinovea stands out as a free, open-source, and user-friendly tool, offering clear advantages for clinical applications. Additionally, the software allows frames to be frozen at precise moments of the movement for accurate measurement of joint range of motion (ROM) [27,28] and provides more control over confounding factors during the maneuver (e.g., compensatory knee flexion) than other tools (e.g., a goniometer or inclinometer).
Previous studies have analyzed the reproducibility of Kinovea for measuring other regions and activities, but none have examined its use for measuring SLR-associated ROM in LBP populations. Evidence on the reliability of goniometric measurement has shown that results may vary between joints and actions, and even between active and passive maneuvers, owing to the complexity of the movements and to functional and structural differences [29]. This study provides new information on a tool that can be used in combination with the SLR test in populations with and without LBP, given the established applicability of the test to flexibility assessment [17].
Regarding patients with LBP, a reliable tool such as Kinovea for measuring the hip flexion ROM achieved during the SLR test allows professionals to gather important information on adaptive neural movement, which is closely related to the presence of both lumbar and radicular symptoms [18].
Furthermore, beyond lumbar pathology and flexibility problems, the Kinovea software could be used in combination with the SLR test to evaluate other lower limb conditions, such as hamstring injuries. Its use could facilitate the follow-up of patients by both physiotherapists and sports professionals (in the case of athletes), as relationships between ROM, time since injury, and time to return to play have been established [19,20].
Our results are consistent with those of other studies on the use of Kinovea for other structures and tasks, such as the shoulder in both healthy adults [25] and female volleyball players [30], the lower limb during gait analysis [4] or running [5], and the cervical spine [6], among others. These studies reported good ICC values, supporting Kinovea as a feasible alternative for evaluating ROM in different joints and movements. However, as already noted in the evidence on goniometric evaluation [29], it is essential to control the measurement procedures precisely, follow standardized protocols, and define specific marker locations [4].
Kinovea does not require extensive prior experience in video recording or analysis to achieve reliable and accurate measurements [28,31]. In this study, however, examiners attended a training session before conducting measurements to familiarize themselves with the software and improve accuracy, following the recommendations of Baude et al. [27], who suggested that formal training sessions for examiners (and patients) enhance the reliability of Kinovea in motion analysis.
To further improve reliability in the current study, a cross mark made with tape was placed on the participants’ skin to identify the joint fulcrum (center of motion). This approach aligns with the methods of Richardson et al. [32], Damsted et al. [5], and Moral-Muñoz et al. [33], who used, respectively, athletic tape markers on participants’ shoes to digitize foot strike angles, hip markers to measure knee and hip angles during running, and skin marks to assess hip and knee joint angles for hamstring flexibility evaluation.
Kinovea’s built-in angle selection tool eliminates the need to print still images and uses a digital virtual goniometer to measure joint angles in 1° increments [25,32]. This makes it a photography-based goniometric tool that differs from traditional clinical goniometers, which require physical placement on the patient and can be cumbersome during joint ROM measurements.
The intra- and inter-examiner reliability estimates in this study surpassed those reported by Hsieh et al. [34], who examined the reliability of three instruments (a standard plastic goniometer, a flexometer, and a tape measure) for measuring the hip flexion angle at the positive SLR response. Hsieh et al. [34] demonstrated excellent intrasession reliability for all three instruments, with Cronbach’s alpha coefficients exceeding 0.94. However, intersession reliability varied: the goniometer and flexometer each achieved an alpha coefficient of 0.88, whereas the tape measure scored lower, at 0.74. The intersession reliability of the tape measure improved to 0.93 when trigonometric calculations were used to determine the angles. The flexometer proved as reliable as the goniometer for intrasession and intersession measurements and has the additional advantage of allowing a single therapist to conduct the test without assistance. However, its bulkiness and higher cost limit its practicality in clinical settings. The goniometer, while reliable, lacks the single-operator convenience of the flexometer. The tape measure showed lower intersession reliability when distances were recorded directly but improved when used with trigonometric methods; nonetheless, these calculations may be impractical in busy clinical environments.
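To illustrate what a trigonometric conversion of a tape measurement involves, the sketch below turns a linear distance into an angle. It assumes a simplified geometry adopted only for illustration, which is not necessarily the procedure of Hsieh et al.: the straight leg is treated as a rigid segment of length L from the hip pivot to the measured point, and the tape records the vertical height h of that point above the table, so the elevation angle is asin(h / L). The function name and the example distances are hypothetical.

```python
import math


def elevation_angle_deg(height_cm, segment_length_cm):
    """Angle (degrees) of a rigid segment raised from the horizontal.

    Hypothetical simplified geometry: the raised point lies `height_cm` above the
    table and `segment_length_cm` from the pivot, so sin(angle) = h / L.
    """
    if segment_length_cm <= 0:
        raise ValueError("segment length must be positive")
    ratio = height_cm / segment_length_cm
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("height must lie between 0 and the segment length")
    return math.degrees(math.asin(ratio))


# Hypothetical example: point 80 cm from the hip pivot, raised 40 cm above the table.
print(f"{elevation_angle_deg(40.0, 80.0):.1f} deg")  # prints: 30.0 deg
```

Because the sine relation is nonlinear, the same error in measured height corresponds to a larger angular error as the leg approaches the vertical.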
Later, Boyd et al. [35] assessed the psychometric properties of a hand-held inclinometer commonly used during SLR testing. Unlike Hsieh et al., who employed three different instruments, Boyd et al. focused solely on the hand-held inclinometer and its validity compared with a digital inclinometer and a digital goniometer. Methodologically, Boyd et al. measured the limb elevation angle rather than hip flexion alone, thereby integrating the contribution of pelvic and lumbar spine motion. The hand-held inclinometer showed excellent intra-rater reliability (ICCs: 0.95 to 0.98), with an SEM ranging from 0.54° to 1.22° and an MDC95 between 1.50° and 3.41°. These findings indicate that the hand-held inclinometer is a reliable and valid tool for assessing limb elevation during SLR tests, with higher precision and practicality than the tape measure or goniometer methods described by Hsieh et al. Moreover, Boyd et al. emphasized the inclinometer’s ability to capture comprehensive motion, including contributions from the pelvis and lumbar spine, making it better suited to assessing mechanosensitivity during neurodynamic testing. This broader construct aligns with the clinical need to evaluate neural involvement during SLR tests.
Comparing these estimates with the values obtained in this study, we found slightly better reliability and precision of measurements with Kinovea. One potential explanation lies in the methodological advantage of video-based analysis with Kinovea. Unlike Boyd et al., our approach enabled precise control and monitoring of compensatory mechanisms in the lower limb, such as unintended knee flexion or rotational deviations. These compensations, if not addressed, can introduce variability and reduce the precision of range of motion measurements. By identifying these compensatory movements through careful visual analysis, we aimed to ensure that the recorded angles more closely reflected true hip flexion, enhancing the reliability of our measurements. Although both studies used the patient’s positive response to the maneuver to determine the endpoint, the additional control of compensatory lower limb movements in our methodology provided a distinct advantage. This added layer of control likely contributed to the improved reliability and precision observed in our results, underscoring the benefits of integrating video analysis into clinical and research settings.
Statistical methods also differed between the studies [34,35]. While Hsieh et al. [34] used Cronbach’s alpha coefficients and ANOVA for reliability assessment, Boyd et al. [35] relied on ICCs, SEM, and Bland–Altman analysis to evaluate the agreement between instruments. Furthermore, Boyd et al. examined construct validity, revealing strong correlations between the hand-held inclinometer and the digital inclinometer (r = 0.98–0.99) but also identifying discrepancies with the digital goniometer (∼10° differences due to the goniometer measuring hip flexion alone). Reporting ICCs, SEM, and MDC offers significant advantages over Cronbach’s alpha in reliability studies. While Cronbach’s alpha assesses internal consistency, it provides no information on measurement error or on the sensitivity of the tool to detect meaningful changes. In contrast, the ICC evaluates the agreement between examiners or measurements, offering a more comprehensive understanding of reliability in clinical contexts. The SEM quantifies the error of individual measurements in the units of measurement, and the MDC indicates the smallest change that can be confidently interpreted as real rather than due to error. Together, these metrics provide a more nuanced and clinically relevant assessment of an instrument’s reliability and precision [26].
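The point that a unitless coefficient cannot express error in degrees can be made concrete with a short Python sketch on made-up data (two repeated measurements per participant). Cronbach’s alpha is computed with the standard formula α = k/(k − 1) × (1 − Σ s²item / s²total). For a complete two-way layout, alpha is numerically identical to the average-measures consistency ICC, ICC(3,k), so both are unitless; converting reliability into degrees requires the SEM and MDC defined in Section 2.6. The helper names are hypothetical.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(data):
    """Cronbach's alpha for an n x k table (rows = participants, columns = measurements)."""
    k = len(data[0])
    item_vars = [variance([row[j] for row in data]) for j in range(k)]
    total_var = variance([sum(row) for row in data])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def icc3_average(data):
    """Average-measures consistency ICC, ICC(3,k) = (MSR - MSE) / MSR, from a two-way ANOVA."""
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_rows = k * sum((sum(row) / k - grand) ** 2 for row in data)
    ss_cols = n * sum((sum(row[j] for row in data) / n - grand) ** 2 for j in range(k))
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows


# Made-up angles in degrees: 3 participants x 2 measurements.
angles = [[10.0, 11.0], [20.0, 19.0], [30.0, 30.0]]
alpha = cronbach_alpha(angles)
icc_avg = icc3_average(angles)
print(f"alpha = {alpha:.4f}, ICC(3,2) = {icc_avg:.4f}")  # prints: alpha = 0.9974, ICC(3,2) = 0.9974
```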
Limitations
Despite the promising findings, this study has several limitations. The sample size, although adequate for reliability analyses, was relatively small, especially in the asymptomatic control group (n = 15) compared with the LBP group (n = 38). The LBP group was deliberately recruited to be more than twice as large, with the aim of studying LBP and adjusting for possible differences among patients due to their heterogeneous characteristics and conditions (e.g., determining the influence of variables such as BMI, age, or disability), whereas the asymptomatic participants were considered a more homogeneous sample. Nevertheless, this analysis was not performed in this study, highlighting the importance of addressing it in future research. In addition, the gender distribution of the total sample was unbalanced, with only 35% of participants being male, and gender-related characteristics could influence the reliability of the measurements. There was also a significant difference in weight and BMI between the LBP group and the asymptomatic controls, with higher values in the LBP group, although higher BMI is a common finding in this population. These differences could have influenced hip flexion angles during the SLR test through biomechanical or physiological factors associated with higher BMI. The study did not control for these variables, which may have introduced confounding effects on the measurements and their reliability. This limited sample may also restrict the generalizability of the results to broader populations. Future studies with larger samples that control for gender and BMI are needed to confirm these findings and enhance external validity.
Furthermore, only two examiners participated in the measurements—one experienced and one novice. While this design allowed for the assessment of the impact of examiner experience on measurement reliability, it does not capture the variability that might occur with a larger and more varied group of clinicians. Including multiple examiners with varying levels of expertise in future research could provide a more comprehensive understanding of inter-examiner reliability and enhance the generalizability of the results.
Moreover, the study focused solely on the reliability of the Kinovea software for measuring hip flexion angles during the SLR test and did not assess the validity of these measurements against a gold standard, such as three-dimensional motion capture, or against other established methods, such as digital inclinometers or goniometers. Without validation against a criterion standard, the accuracy of Kinovea measurements remains uncertain. Future studies should aim to establish the concurrent validity of Kinovea by comparing its measurements with those obtained from advanced motion analysis technologies.
Also, the study did not evaluate the diagnostic utility of the SLR test when analyzed with Kinovea software, such as its sensitivity, specificity or predictive values for identifying neural tissue involvement in LBP patients. Assessing these diagnostic parameters would provide valuable insights into the clinical applicability of Kinovea-enhanced SLR assessments and their potential impact on patient management.