5.3. Quantitative Evaluation of the HPSUR Dataset
The average precision (AP) curves of the four backbone models over their training iterations are displayed in Figure 12. All networks demonstrate an increasing trend in AP, with the Hrnet-w32 and Hrnet-w64 models reaching a plateau earlier than the LiteHrnet variants. After about 50,000 iterations, the LiteHrnet-30 model exhibits the highest AP, indicating a more refined learning capability, while Hrnet-w64 performs marginally better than Hrnet-w32.
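As context for these curves, keypoint AP is commonly computed from the object keypoint similarity (OKS) between a predicted and a ground-truth pose, as in the COCO evaluation protocol; the following states that standard convention for reference, rather than a detail taken from the figure:

$$\mathrm{OKS}=\frac{\sum_{i}\exp\!\left(-d_{i}^{2}/2s^{2}k_{i}^{2}\right)\,\delta(v_{i}>0)}{\sum_{i}\delta(v_{i}>0)},$$

where $d_i$ is the Euclidean distance between the $i$-th predicted and ground-truth keypoints, $s$ is the person scale, $k_i$ is a per-keypoint tolerance constant, and $v_i$ is the visibility flag. AP is then the precision averaged over a set of OKS thresholds (0.50 to 0.95 in steps of 0.05 in the COCO protocol).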
The loss curves in Figure 13 trace the optimization process of the different networks during training. All networks show an initial rapid drop in loss, followed by gradual convergence toward a minimum value. The LiteHrnet-30 model reached the lowest loss, indicating the closest fit to the training data from the three volunteers. Although the LiteHrnet-18 model did not surpass the LiteHrnet-30, its loss was lower than that of both Hrnet models, indicating the effectiveness of the LiteHrnet architecture in capturing pose features with fewer parameters. Despite its lightweight design, the AP and loss curves show that the LiteHrnet-30 model can effectively learn complex human poses from UWB radar data. These results suggest that LiteHrnet models have potential in scenarios where model efficiency is crucial and performance cannot be significantly compromised.
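For context on what these loss curves measure, HRNet-family keypoint networks are typically trained by regressing one Gaussian heatmap per joint under a mean-squared-error loss. The sketch below assumes that common setup; the array shapes and names are illustrative, not taken from the paper.

```python
import numpy as np

def gaussian_heatmap(h, w, cx, cy, sigma=2.0):
    """Ground-truth target: a 2-D Gaussian centred on a joint at (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def heatmap_mse(pred, target):
    """Mean-squared error over heatmaps of shape (N, K, H, W):
    N samples, K joints, H x W heatmap resolution."""
    return float(np.mean((pred - target) ** 2))
```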
Table 3 and Figure 14 present a comparison of human pose estimation errors across various network models. Established models such as RF-Pose and RF-Pose 3D exhibit mean per-joint position errors (MPJPE) of 62.4 mm and 43.6 mm, respectively, with RF-Pose 3D assessed over a substantial dataset comprising more than 1.6 million samples. In contrast, UWB-Pose and RadarFormer demonstrate pose estimation errors of 37.87 mm and 33.5 mm, respectively, indicating their higher accuracy in estimating human poses. RadarFormer’s evaluation involved a dataset of 162,280 samples, highlighting its effectiveness across a considerable number of data points.
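For reference, MPJPE follows its standard definition: the Euclidean distance between estimated and ground-truth joint positions, averaged over all joints and samples:

$$\mathrm{MPJPE}=\frac{1}{NJ}\sum_{n=1}^{N}\sum_{j=1}^{J}\left\lVert\hat{\mathbf{p}}_{n,j}-\mathbf{p}_{n,j}\right\rVert_{2},$$

where $\hat{\mathbf{p}}_{n,j}$ and $\mathbf{p}_{n,j}$ are the estimated and ground-truth positions of joint $j$ in sample $n$, $N$ is the number of samples, and $J$ is the number of joints.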
Our paper proposes the Hrnet-w32, Hrnet-w64, LiteHrnet-18, and LiteHrnet-30 models, which achieve MPJPEs of 34.43 mm, 33.48 mm, 32.09 mm, and 34.28 mm, respectively, each evaluated over a dataset of 311,963 samples. The LiteHrnet-18 model attains the lowest error among all compared models, marking a substantial advance in radar-based human pose estimation.
Table 4 and Figure 15 provide the performance details of various backbone models for human pose estimation, as measured on the HPSUR dataset, with a focus on the methods proposed in this paper. Among the backbones, the Hrformer-base and Hrformer-small models exhibit higher mean pose estimation errors of 59.16 mm and 49.94 mm, respectively. Other models, such as the CPM, Mobilenetv2, Resnet 50, Resnet 101, and Vipnas models, show a range of errors, with CPM achieving the lowest mean error among these baselines at 39.75 mm.
The final four backbones, part of this study’s contribution, show significant advances in estimation accuracy. The Hrnet-w32 backbone achieved a mean error of 34.43 mm, with a relatively high variance of 3.42 mm; its maximum and minimum errors were recorded at 39.65 mm and 7.40 mm, respectively. The Hrnet-w64 model demonstrated a slightly lower mean error of 33.48 mm and a reduced variance of 2.85 mm, indicating a more consistent performance; its maximum error was slightly lower at 38.89 mm, with a minimum error of 7.48 mm, close to that of the Hrnet-w32 model. The LiteHrnet-18 model exhibited a mean error of 32.09 mm, the lowest among the four models, and the smallest variance at 2.40 mm, suggesting a more stable prediction capability. It also recorded the lowest maximum error of 36.01 mm, although its minimum error of 8.26 mm was the highest, implying that while its best-case errors were slightly larger, its overall performance tended to be more reliable. Lastly, the LiteHrnet-30 model reported a mean error of 34.28 mm with a variance of 3.07 mm; its maximum error was recorded at 38.22 mm, and its minimum error of 7.24 mm was the best among all models.
To summarize the comparison between the human pose estimation models: LiteHrnet-18 had the lowest mean error, indicating better accuracy on average, while LiteHrnet-30 had the best minimum error, indicating its potential to yield highly accurate predictions in the best case. Although the Hrnet variants were less consistent than the LiteHrnet models, they maintained a competitive range of error metrics. These results highlight the trade-offs between mean and variance and between maximum and minimum errors, trade-offs that are crucial for practical applications of human pose estimation technology.
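The summary statistics discussed throughout this section (mean, variance, maximum, and minimum error) can all be derived from per-sample MPJPE values; the following is a minimal sketch of such a computation, with array shapes and units assumed for illustration:

```python
import numpy as np

def error_statistics(pred, gt):
    """Summarize per-sample MPJPE for one model.

    pred, gt: arrays of shape (num_samples, num_joints, 3) holding
    estimated and ground-truth joint positions in millimetres
    (shapes and units assumed for illustration).
    """
    joint_err = np.linalg.norm(pred - gt, axis=-1)  # (N, J) per-joint errors
    sample_mpjpe = joint_err.mean(axis=1)           # (N,) per-sample MPJPE
    return {
        "mean": sample_mpjpe.mean(),
        "variance": sample_mpjpe.var(ddof=1),
        "max": sample_mpjpe.max(),
        "min": sample_mpjpe.min(),
    }
```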
Table 5 and Table 6 provide data on the performance metrics of the four neural network models evaluated on subjects S4 and S5, respectively. These evaluations were conducted as part of our study on how well these models perform when estimating human poses across different subjects.
Table 5 shows that, for subject S4, the mean errors ranged from 36.16 mm for the LiteHrnet-18 model to 40.28 mm for the LiteHrnet-30 model, indicating that the models vary in their ability to generalize across different subjects. The Hrnet-w64 and LiteHrnet-18 models showed lower variances of 3.10 mm and 2.73 mm, respectively, suggesting that they performed consistently across the subject’s various poses. The Hrnet-w64 model had the lowest maximum error of 41.73 mm, while the LiteHrnet-18 model exhibited the highest minimum error of 10.85 mm, meaning that even its best-case estimates were slightly less accurate than those of the other models.
Table 6 shows that, for subject S5, all models achieved a lower average error than on subject S4. The LiteHrnet-18 model had the lowest mean error of 30.09 mm, followed closely by the LiteHrnet-30 model at 31.33 mm. The error variances were small across all models, indicating consistent performance, with the Hrnet-w64 model registering the lowest variance at 2.72 mm. The LiteHrnet-18 model also had the advantage in maximum error, recording a markedly lower figure of 34.34 mm and thereby demonstrating its superior capability in estimating the poses of subject S5. Additionally, the LiteHrnet-30 model recorded the smallest minimum error at 5.93 mm, highlighting its accuracy under optimal conditions.
This cross-subject analysis highlights the distinct capabilities of each architectural design. The Hrnet variants deliver consistent performance, while the LiteHrnet models excel on specific metrics such as mean and minimum error. This evaluation emphasizes the importance of considering individual differences when creating and evaluating human pose estimation models; such an understanding is crucial for developing pose estimation technologies that are robust and reliable across different individuals.
Table 7 and Table 8 show the models’ performance for subject S4 in two different postures, labeled 1001 and 1003, giving insight into performance across diverse subjects and postures. For posture 1001, as shown in Table 7, the mean errors range from 36.33 mm for the LiteHrnet-30 model to 39.91 mm for the Hrnet-w32 model, indicating a modest range of mean prediction accuracy across the network architectures. The LiteHrnet-18 and LiteHrnet-30 models show the lowest variances of 3.09 mm and 2.69 mm, respectively, indicating consistent performance across multiple instances of posture 1001. The maximum error is smallest for the LiteHrnet-30 model at 39.72 mm, whereas the minimum error does not vary significantly across the models, with the Hrnet-w32 model showing slightly better performance at 8.64 mm.
In the case of posture 1003, as illustrated in Table 8, the mean errors are relatively lower, with the Hrnet-w64 and LiteHrnet-18 models showing similar performance at around 35.94 mm and 35.70 mm, respectively. However, the LiteHrnet-30 model shows a notable increase in mean error to 43.24 mm. The variance metrics are consistent with the previous posture, with the LiteHrnet-18 model showing a marginally better variance of 2.46 mm. The maximum error for the LiteHrnet-18 model is significantly lower at 37.52 mm, highlighting its effectiveness in handling posture 1003. Conversely, the LiteHrnet-30 model has the least favorable minimum error at 9.96 mm.
Choosing an appropriate model therefore depends on the specific posture to be estimated. The LiteHrnet models offer more consistent performance with less variance, while the Hrnet models fluctuate more in accuracy. The difference in performance between postures 1001 and 1003 for the same subject also highlights the models’ varying degrees of adaptability to different postural dynamics. Such a comprehensive evaluation is necessary for developing nuanced pose estimation models that can adapt to the subtleties of individual subject postures.
Table 9, Table 10, Table 11 and Table 12 present a comprehensive analysis of the performance of the four neural network backbones when applied to subject S5 across four different postures (1001, 1002, 1003, and 1004). This analysis aims to gain a granular understanding of how well these models capture the nuanced differences in human movements. For posture 1001, as shown in Table 9, the LiteHrnet-30 model outperforms the other models with the lowest mean error of 29.52 mm and a reasonable variance of 2.24 mm, indicating its robust performance; it also has the lowest maximum error, demonstrating its effectiveness in dealing with posture 1001. However, the LiteHrnet-18 model shows the highest minimum error among all the models at 7.66 mm.
For posture 1002, Table 10 reveals a closer range of mean errors among the models, with the LiteHrnet-18 model achieving the lowest mean error of 26.11 mm. It also exhibits the lowest variance of 1.82 mm, illustrating consistent performance across different instances of posture 1002. Once again, the LiteHrnet-30 model demonstrates a robust minimum error of 5.46 mm, the best among all models.
Table 11 shows that, for posture 1003, the LiteHrnet-18 model records the lowest mean error of 33.43 mm, with a variance of 2.92 mm. This model also has the lowest maximum error of 36.92 mm, signifying a favorable performance. Although the LiteHrnet-30 model does not have the lowest mean error, it does maintain a competitive minimum error of 5.75 mm. Lastly, Table 12 shows the most significant difference in performance, for posture 1004. Here, the LiteHrnet-18 model achieves a substantially lower mean error of 29.23 mm and the lowest variance of 2.04 mm, indicating an exceptional ability to predict this posture accurately. The LiteHrnet-30 model, however, has a slightly higher mean error of 32.77 mm but maintains a relatively low minimum error of 5.29 mm.
These performance metrics across diverse postures for subject S5 illustrate the distinct capabilities and limitations of the different models. LiteHrnet models, particularly the LiteHrnet-18 model, consistently show lower mean and variance errors, implying better overall performance in diverse posture estimation. These findings are crucial for developing advanced human pose estimation models that can adapt to various human postures and movements, ensuring high accuracy and reliability in real-world applications.
Table 13 compares the performance of the four neural network backbone models when tested on the two subjects, S4 and S5, showing how each model performs on different subjects in the same task domain. The Hrnet-w32 model significantly reduces its mean error, from 38.81 mm for subject S4 to 32.27 mm for subject S5, coupled with a decrease in variance, suggesting a better fit for subject S5. Similarly, the Hrnet-w64 model performs better on subject S5, with mean errors of 36.96 mm for S4 and 31.76 mm for S5 and a reduced variance. The LiteHrnet-18 model also shows a decrease in mean error, from 36.16 mm for S4 to 30.09 mm for S5, with a notable reduction in variance, indicating more stable performance on subject S5. Although the LiteHrnet-30 model shows an increased mean error of 40.28 mm for subject S4, it achieves a significantly lower mean error of 31.33 mm for subject S5, again with reduced variance. Maximum errors are consistently lower for subject S5 across all models, with the largest improvement seen in the Hrnet-w32 model, from 42.86 mm down to 38.06 mm. Minimum errors follow a similar trend, with all models achieving lower errors on subject S5, indicating that the models capture the least complex poses of subject S5 better than those of S4.
This comparison highlights the significance of evaluating models for specific subjects in human pose estimation research. Although all models show improved performance metrics for subject S5, the reduction in variance and error indicates that either the models are inherently more adaptable or the poses of subject S5 are less challenging for the models to estimate. These findings are crucial in developing customized pose estimation solutions accommodating inter-subject variability.
Table 14 compares human pose estimation results for the four postures, labeled 1001, 1002, 1003, and 1004, across the four neural network backbone models. The models’ performance is measured in terms of mean error and variance, which gives a comprehensive view of each model’s capabilities in handling different human movements. In postures 1001 and 1002, the LiteHrnet models outperform the Hrnet variants, with the LiteHrnet-30 model achieving the lowest mean error (32.05 mm) and the LiteHrnet-18 model the lowest variance (2.41 mm) for posture 1001; the LiteHrnet models thus capture postures 1001 and 1002 more consistently.
Conversely, in posture 1003, the LiteHrnet-30 model exhibits a considerable increase in mean error to 39.82 mm, the highest among all models for this posture, which may indicate a reduced ability to estimate this particular pose accurately. The Hrnet-w64 and LiteHrnet-18 models, however, maintain lower mean errors and variances, with the Hrnet-w64 model achieving the lowest variance, suggesting it is less sensitive to the variations within posture 1003. Lastly, posture 1004 showcases the LiteHrnet-18 model’s dominance, with the lowest mean error and variance across all models, suggesting strong adaptability and reliability for this specific posture.

Across all postures, the LiteHrnet-18 model consistently maintains low mean errors and variances, indicating its robustness and efficiency in human pose estimation tasks. The Hrnet models, while generally exhibiting higher mean errors and variances, still maintain competitive performance, especially the Hrnet-w64 model, which has the lowest variance for posture 1003. This posture-specific comparison underlines the importance of selecting a model based on the specific requirements of the pose estimation task at hand and aids in discerning the strengths and limitations of each model, facilitating more informed decisions in the development and application of human pose estimation technologies.
Table 15 compares the performance of the four backbone models in estimating poses 1001 and 1003 for subjects S4 and S5, demonstrating how effectively the models identify different poses for different subjects. For pose 1001, all models showed a reduced mean error when evaluated on subject S5 compared with S4, with the LiteHrnet-30 model showing the largest reduction, indicating its heightened sensitivity to the subtleties of subject S5’s posture. The variances for S5 were consistently lower, implying more stable estimation across different instances of pose 1001. For pose 1003, the mean errors were again generally lower for subject S5 across all models except the LiteHrnet-30 model, which showed an increase; the variances for S5 were higher for the Hrnet-w32 and LiteHrnet-30 models, suggesting that pose 1003 involves more complexity or diversity in this subject’s movements than in S4’s.
The Hrnet-w64 and LiteHrnet-18 models showed notable consistency across subjects and poses, with competitive and stable mean errors and variances. The LiteHrnet-18 model showed the lowest variance in both poses for S5, reinforcing its robustness in pose estimation across different subjects. This comparison demonstrates the importance of considering subject variability and pose difficulty when developing human pose estimation models. The results indicate that although the LiteHrnet architectures generally offer superior accuracy and stability, the choice of model may depend on the specific subject and pose combination, requiring a tailored approach for optimal performance in practical applications.
A boxplot visualization in Figure 16 compares the mean per-joint position error (MPJPE) of the four backbone models, Hrnet-w32, Hrnet-w64, LiteHrnet-18, and LiteHrnet-30; the blue and red lines of the figure represent the interquartile range (IQR) and mean value, respectively. The comparison is made across all test data and the subsets for subjects S4 and S5, giving an overview of the models’ estimation errors. The horizontal line within each box shows the median MPJPE, allowing a quick comparison of the models’ central tendencies; the LiteHrnet models consistently demonstrate a lower median error across all datasets, suggesting better pose estimation performance.
Each model’s interquartile range (IQR) is represented by the height of its box, which spans the middle 50% of the data, shown with blue lines in Figure 16. A smaller IQR indicates less variability in the model’s performance; the LiteHrnet-18 model consistently displays a compact IQR, especially on the S5 test data, indicating robust performance with fewer outliers and less dispersion. The whiskers extending from the boxes illustrate the range of the data, while outliers, depicted as individual points beyond the whiskers, represent pose estimates that deviate significantly from the typical error range. All models have outliers, indicating challenges in estimating certain poses. Comparing the S4 and S5 test data shows that the models’ performance is subject-specific: the LiteHrnet-30 model exhibits a notably higher median MPJPE for S4 than for S5, which could mean that this model is more attuned to the characteristics of subject S5’s data or that subject S4 presents more challenging poses for it. Overall, the boxplot in Figure 16 succinctly encapsulates the performance distributions of the tested models, providing insights into their reliability and precision; the LiteHrnet models, particularly the LiteHrnet-18 model, exhibit consistently high performance across different subjects, making them promising candidates for real-world applications where pose estimation accuracy is critical.
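A plot in the style of Figure 16 can be produced from such per-sample MPJPE arrays; the following is a minimal matplotlib sketch with hypothetical model names and synthetic data, not the study’s actual results:

```python
import numpy as np
import matplotlib.pyplot as plt

def mpjpe_boxplot(errors_by_model, title="MPJPE by backbone"):
    """Boxplot of per-sample MPJPE for several models.

    errors_by_model: dict mapping model name -> 1-D array of
    per-sample MPJPE values in mm (data hypothetical).
    """
    fig, ax = plt.subplots()
    ax.boxplot(list(errors_by_model.values()),
               labels=list(errors_by_model.keys()),
               showmeans=True, meanline=True)  # draw the mean as a line per box
    ax.set_ylabel("MPJPE (mm)")
    ax.set_title(title)
    return fig

# Hypothetical usage with synthetic errors:
rng = np.random.default_rng(0)
demo = {name: rng.normal(34.0, 3.0, 1000).clip(min=5.0)
        for name in ["Hrnet-w32", "Hrnet-w64", "LiteHrnet-18", "LiteHrnet-30"]}
mpjpe_boxplot(demo)
plt.show()
```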
A boxplot analysis of the MPJPE for the four human pose estimation models, Hrnet-w32, Hrnet-w64, LiteHrnet-18, and LiteHrnet-30, is shown in Figure 17, where the blue and red lines again represent the interquartile range (IQR) and mean value, respectively. The analysis covers the collective test dataset as well as specific actions (1001 and 1003) for subject S4 and actions 1001, 1002, 1003, and 1004 for subject S5. The dashed line within each box represents the median MPJPE and suggests that the LiteHrnet models, especially the LiteHrnet-18 model, offer a lower median error across most actions and subjects than the Hrnet models, indicating higher accuracy in pose estimation. The IQR for each action of both subjects is compact for the LiteHrnet-18 model, signifying consistent estimation across different poses, whereas the other models exhibit slightly wider IQRs, indicating more variability in their pose estimations. The whiskers extending from the boxes demonstrate the range of the data, excluding potential outliers, and reflect the variability in estimation accuracy for more challenging poses. Outliers, indicated by points above and below the whiskers, are observed across all models and actions, highlighting instances where pose estimation deviates from typical error ranges.
A comparison of the S4 and S5 data for actions 1001 and 1003 reveals that performance depends not only on the model but also on the subject and action: some models handle specific actions better than others. For instance, the LiteHrnet-30 model tends to have a higher median error for S4’s action 1003, suggesting a potential model–subject–action interaction effect.
Figure 17 effectively illustrates the performance distribution of the human pose estimation models across different subjects and actions, providing valuable insights into model precision and reliability. The LiteHrnet models, particularly the LiteHrnet-18 model, demonstrate a lower median MPJPE, signifying their potential as robust solutions for accurate human pose estimation in diverse scenarios. Such detailed analysis is essential for advancing pose estimation technology and its application in real-world settings where accuracy and consistency are paramount.