Author Contributions
Conceptualization, H.L. and Y.C. (Youngwoon Cha); Methodology, Y.C. (Youngwoon Cha); Software, S.H., Y.C. (Youngdae Cho), J.P., S.C., Z.T. and Y.C. (Youngwoon Cha); Validation, Y.C. (Youngwoon Cha); Formal Analysis, Y.C. (Youngwoon Cha); Investigation, S.H., Y.C. (Youngdae Cho), J.P., S.C. and Y.C. (Youngwoon Cha); Resources, H.L. and Y.C. (Youngwoon Cha); Data Curation, Y.C. (Youngwoon Cha); Writing—Original Draft, S.H., Y.C. (Youngdae Cho), J.P., S.C., Z.T. and Y.C. (Youngwoon Cha); Writing—Review and Editing, Y.C. (Youngwoon Cha); Visualization, S.H., Y.C. (Youngdae Cho), J.P., S.C. and Z.T.; Supervision, H.L.; Project Administration, Y.C. (Youngwoon Cha); Funding Acquisition, Y.C. (Youngwoon Cha). All authors have read and agreed to the published version of the manuscript.
Figure 1.
We present a learning-based full-body pose estimation method for various humanoid robots. Our keypoint detector, trained on an extended pose dataset, consistently estimates humanoid robot poses over time from videos, capturing front, side, back, and partial poses, while excluding human bodies (see supplemental video [11]). The following robots are shown: (a) Optimus Gen 2 (Tesla) [12]; (b) Apollo (Apptronik) [13]; (c) Atlas (Boston Dynamics) [14]; (d) DARwIn-OP (Robotis) [15]; (e) EVE (1X Technologies) [16]; (f) FIGURE 01 (Figure) [17]; (g) H1 (Unitree) [18]; (h) Kepler (Kepler Exploration Robot) [19]; (i) Phoenix (Sanctuary AI) [20]; (j) TALOS (PAL Robotics) [21]; (k) Toro (DLR) [22].
Figure 2.
Joint configurations of humanoid robots used in the Diverse Humanoid Robot Pose Dataset (DHRP). (a) Apollo. (b) Atlas. (c) DARwIn-OP. (d) EVE. (e) FIGURE 01. (f) H1. (g) Kepler. (h) Optimus Gen 2. (i) Phoenix. (j) TALOS. (k) Toro. Note: Phoenix lacks lower-body data in the dataset.
Figure 3.
Example training images from the DHRP Dataset. (a) Apollo. (b) Atlas. (c) DARwIn-OP. (d) EVE. (e) FIGURE 01. (f) H1. (g) Kepler. (h) Optimus Gen 2. (i) Phoenix. (j) TALOS. (k) Toro. Note: Phoenix lacks lower-body data in the dataset.
Figure 4.
Example images from the arbitrary humanoid robot dataset. These 2k additional images enhance the diversity of body shapes, appearances, and motions for various humanoid robots.
Figure 5.
Example images from the synthetic dataset. The first row displays examples generated through AI-assisted image synthesis using Viggle [54]. The second row showcases examples created via 3D character simulations using Unreal Engine [55]. These 6.7k additional annotations enhance the diversity of motions and scenarios for various humanoid robots.
Figure 6.
Example images from the random background dataset. The first row displays examples from the target humanoid robot dataset, while the second row shows the corresponding foreground-removed images in the background dataset, generated using Adobe Photoshop Generative Fill [58]. This dataset includes 133 AI-assisted foreground-removed images and 1886 random indoor and outdoor background images. The 2k background images, which do not feature humanoid robots, improve the distinction between robots and their backgrounds, particularly in environments with metallic objects that may resemble the robots’ body surfaces.
Figure 7.
Network architecture for the 2D joint detector: starting from a single input image, the n-stage network generates keypoint coordinates K and their corresponding confidence heat maps H. At each stage, the output of the hourglass module [59] is passed forward to both the next stage and the Differentiable Spatial-to-Numerical Transform (DSNT) regression module [62]. The DSNT module then produces both H and K. In the parsing stage, each keypoint is identified with its associated confidence value.
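The DSNT regression described above can be sketched as a soft-argmax: the heat map is normalized into a probability map, and the keypoint coordinate is its expectation over a normalized coordinate grid. The following NumPy sketch illustrates the general technique only; the function name and normalization details are assumptions, not the authors' implementation.

```python
import numpy as np

def dsnt(heatmap):
    """Soft-argmax in the spirit of DSNT: map a single-channel
    heat map to normalized (x, y) coordinates in [-1, 1]."""
    h, w = heatmap.shape
    # Softmax-normalize the heat map into a probability map.
    z = np.exp(heatmap - heatmap.max())
    p = z / z.sum()
    # Pixel-center coordinate grids normalized to [-1, 1].
    xs = (2 * np.arange(w) + 1) / w - 1
    ys = (2 * np.arange(h) + 1) / h - 1
    # Expected coordinates under the probability map.
    x = (p.sum(axis=0) * xs).sum()
    y = (p.sum(axis=1) * ys).sum()
    return x, y

hm = np.zeros((64, 64))
hm[32, 32] = 10.0
x, y = dsnt(hm)  # a central peak maps to coordinates near (0, 0)
```

Because the expectation is differentiable in the heat-map values, this layer lets coordinate-level losses backpropagate into the hourglass stages.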
Figure 8.
Qualitative evaluation on selected frames. The proposed learning-based full-body pose estimation method for various humanoid robots, trained on our DHRP dataset, consistently estimates the poses of humanoid robots over time from video frames, capturing front, side, back, and partial poses. (a) Apollo. (b) Atlas. (c) DARwIn-OP. (d) EVE. (e) FIGURE 01. (f) H1. (g) Kepler. (h) Phoenix. (i) TALOS. (j) Toro.
Figure 9.
Qualitative evaluation on selected frames. Full-body pose estimation results for miniature humanoid robot models not included in the DHRP dataset. These results demonstrate that our method can be extended to other types of humanoid robots.
Figure 10.
Common failure cases: (a–c) False part detections caused by interference from nearby metallic objects. (d) False negatives due to interference from objects with similar appearances. (e) False negatives in egocentric views caused by the rarity of torso body part observations. (f) False positives on non-humanoid robots.
Table 1.
Comparison of robot pose estimation methods.
Table 2.
Number of frames in real dataset used for training and evaluating target humanoid robots. On average, 347 images are used per robot for training.
| Humanoid Robot | Train Size | Test Size |
|---|---|---|
| Apollo (Apptronik) [13] | 530 | 118 |
| Atlas (Boston Dynamics) [14] | 337 | 141 |
| DARwIn-OP (Robotis) [15] | 348 | 134 |
| EVE (1X Technologies) [16] | 512 | 100 |
| FIGURE 01 (Figure) [17] | 261 | 146 |
| H1 (Unitree) [18] | 200 | 110 |
| Kepler (Kepler Exploration Robot) [19] | 332 | 136 |
| Optimus Gen 2 (Tesla) [12] | 474 | 135 |
| Phoenix (Sanctuary AI) [20] | 233 | 168 |
| TALOS (PAL Robotics) [21] | 358 | 108 |
| Toro (DLR) [22] | 233 | 158 |
| Total | 3818 | 1454 |
Table 3.
Total DHRP Dataset, including target robots, augmented with other random humanoid robots, synthetic data, and random backgrounds used for training and evaluation, presented in number of frames.
| Train Set | Size | Test Set | Size |
|---|---|---|---|
| Real (11 target humanoid robots) | 3818 | Real (11 target humanoid robots) | 1454 |
| Real (arbitrary humanoid robots) | 2027 | - | - |
| Synthetic dataset | 6733 | - | - |
| Random backgrounds | 2019 | - | - |
| Total | 14,597 | Total | 1454 |
Table 4.
Random data augmentation during training.
| Augment. Type | Augment. Method | Prob. | Range |
|---|---|---|---|
| Motion jitter | Image scale | 0.8 | 0.5–1.6 |
| Motion jitter | Image rotation | 0.8 | ≤90° |
| Motion jitter | Image translation | 0.8 | ≤0.3 × image height |
| Motion jitter | Horizontal flip | 0.5 | - |
| Color jitter | Pixel contrast | 0.8 | ≤×0.2 |
| Color jitter | Pixel brightness | 0.8 | ≤±30 |
| Noise jitter | Gaussian blur | 0.4 | |
| Noise jitter | Salt-and-pepper noise | 0.3 | ≤±25 |
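The augmentation schedule in Table 4 amounts to independently firing each jitter with its listed probability and sampling a parameter from its listed range. The sketch below shows only this sampling logic; the function name and dictionary keys are invented for illustration, and applying the parameters to an image (e.g., via an affine warp) is left to an image library.

```python
import random

def sample_augmentations(img_h, rng=random):
    """Sample per-image augmentation parameters following Table 4.
    Each jitter fires independently with its listed probability."""
    params = {}
    if rng.random() < 0.8:                         # motion jitter
        params["scale"] = rng.uniform(0.5, 1.6)
        params["rotation_deg"] = rng.uniform(-90.0, 90.0)
        params["translate_px"] = rng.uniform(-0.3, 0.3) * img_h
    if rng.random() < 0.5:                         # horizontal flip
        params["hflip"] = True
    if rng.random() < 0.8:                         # color jitter
        params["contrast"] = 1.0 + rng.uniform(-0.2, 0.2)
        params["brightness"] = rng.uniform(-30.0, 30.0)
    if rng.random() < 0.4:                         # noise jitter
        params["gaussian_blur"] = True
    if rng.random() < 0.3:
        params["salt_pepper_amp"] = rng.uniform(0.0, 25.0)
    return params
```

Note that in this reading the three motion-jitter rows are grouped under one draw; the paper's Table 4 lists a probability per row, so per-row draws are an equally plausible reading.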
Table 5.
An ablation study of the dataset configurations. The evaluations are performed on the DHRP test set per object, utilizing 4-stage hourglass networks. The best results are shown in bold, and the worst are underlined. Dataset Configurations: Dataset A = target humanoid robots; Dataset B = real (target + arbitrary humanoid robots); Dataset C = real + synthetic data; Dataset D = real + synthetic + random background data.
| Configuration | Model | | | | | | |
|---|---|---|---|---|---|---|---|
| Dataset A | 4-stage HG | 73.9 | 81.9 | 61.6 | 51.7 | 50.8 | 31.8 |
| Dataset B | 4-stage HG | 80.9 | 89.9 | 74.0 | 68.4 | 73.2 | 51.4 |
| Dataset C | 4-stage HG | 83.5 | 91.7 | 79.3 | 73.7 | 79.6 | 60.7 |
| Dataset D | 4-stage HG | 84.9 | 93.2 | 82.1 | 75.1 | 80.6 | 64.5 |
Table 6.
Ablation study of the dataset configurations. Evaluations are conducted on the DHRP test set for each joint, employing 4-stage hourglass networks across all assessments. The best results are highlighted in bold, while the worst are underlined. The dataset configurations are consistent with those presented in Table 5.
| Configuration | Model | Nose | Neck | Sho | Elb | Wri | Hip | Knee | Ank | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| Dataset A | 4-stage HG | 78.6 | 79.1 | 73.9 | 61.4 | 45.0 | 73.3 | 77.3 | 69.3 | 69.7 |
| Dataset B | 4-stage HG | 87.0 | 88.8 | 85.4 | 70.4 | 63.7 | 84.1 | 80.6 | 77.9 | 79.7 |
| Dataset C | 4-stage HG | 89.6 | 88.9 | 87.7 | 76.3 | 67.4 | 88.4 | 85.9 | 80.8 | 83.1 |
| Dataset D | 4-stage HG | 89.9 | 89.9 | 88.2 | 76.8 | 70.3 | 89.0 | 85.0 | 80.5 | 83.7 |
Table 7.
An ablation study of random data augmentations during training. The evaluations are performed on the DHRP test set for each object using 4-stage hourglass networks. The best results are shown in bold, and the worst are underlined. Training configurations: Train A = Dataset D with no random data augmentation during training; Train B = Dataset D with random image transformations and random color jitter; Train C = Dataset D with random image transformations, random color jitter, random blur, and random noise.
| Configuration | Model | | | | | | |
|---|---|---|---|---|---|---|---|
| Train A | 4-stage HG | 64.7 | 72.3 | 46.0 | 27.3 | 23.7 | 13.3 |
| Train B | 4-stage HG | 82.6 | 90.9 | 77.7 | 65.4 | 69.9 | 47.6 |
| Train C | 4-stage HG | 84.9 | 93.2 | 82.1 | 75.1 | 80.6 | 64.5 |
Table 8.
An ablation study on individual target humanoid robot datasets using the DHRP test set and 4-stage hourglass networks. The best results are indicated in bold, while the worst are underlined. The dataset configurations are as follows: Individual = trained separately using only the corresponding target humanoid robot data; Dataset S = arbitrary humanoid robots + synthetic + random background data with no target robot data; Dataset A (target robots only) and Dataset D (full) align with the configurations presented in Table 5.
| Configuration | Total | Apo. | Atl. | Dar. | Eve | Fig. | H1 | Kep. | Opt. | Pho. | Tal. | Tor. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Individual | 51.0 | 48.8 | 57.8 | 61.3 | 38.4 | 27.8 | 49.9 | 90.8 | 50.9 | 34.3 | 42.6 | 58.4 |
| Dataset S | 58.5 | 79.2 | 56.9 | 54.5 | 65.9 | 32.2 | 72.1 | 79.0 | 73.6 | 51.3 | 29.9 | 55.0 |
| Dataset A | 73.9 | 71.4 | 79.9 | 66.9 | 75.3 | 49.4 | 85.1 | 90.6 | 64.4 | 75.0 | 81.6 | 77.5 |
| Dataset D | 84.9 | 85.9 | 88.1 | 79.7 | 82.4 | 69.7 | 95.4 | 94.3 | 80.0 | 83.5 | 91.8 | 87.2 |
Table 9.
An ablation study of each humanoid robot dataset on the DHRP test set using the leave-one-out method. A 4-stage hourglass network is employed for evaluation of each class. Each row represents the method trained using Dataset D (real + synthetic + random background data) while excluding the corresponding target robot data. The best results are shown in bold, while the worst are underlined for each column.
| Leave-One-Out | Total | Apo. | Atl. | Dar. | Eve | Fig. | H1 | Kep. | Opt. | Pho. | Tal. | Tor. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Using All | 75.1 | 74.5 | 80.6 | 73.9 | 72.3 | 63.0 | 89.9 | 90.8 | 67.7 | 62.0 | 81.2 | 77.5 |
| w/o Apo. | 65.6 | 52.5 | 74.8 | 65.9 | 62.4 | 55.3 | 83.7 | 83.2 | 61.3 | 48.3 | 66.3 | 71.9 |
| w/o Atl. | 62.1 | 62.0 | 38.0 | 67.1 | 69.1 | 50.7 | 84.0 | 84.3 | 56.3 | 43.5 | 67.3 | 72.5 |
| w/o Dar. | 65.0 | 63.6 | 75.6 | 47.5 | 61.4 | 54.8 | 82.5 | 82.6 | 59.5 | 46.0 | 67.9 | 78.8 |
| w/o Eve | 64.8 | 59.2 | 76.7 | 65.8 | 29.5 | 56.4 | 85.3 | 84.9 | 59.0 | 46.8 | 74.3 | 73.4 |
| w/o Fig. | 65.9 | 60.3 | 73.8 | 68.4 | 70.1 | 48.5 | 86.2 | 82.4 | 63.6 | 47.2 | 64.0 | 69.8 |
| w/o H1 | 66.5 | 61.0 | 75.8 | 69.9 | 63.1 | 60.9 | 51.3 | 84.8 | 63.2 | 50.8 | 75.7 | 74.7 |
| w/o Kep. | 62.5 | 62.0 | 74.1 | 61.3 | 65.2 | 54.7 | 80.8 | 63.7 | 55.9 | 44.5 | 56.7 | 74.1 |
| w/o Opt. | 66.7 | 62.3 | 72.9 | 67.0 | 64.7 | 57.3 | 83.7 | 83.8 | 56.2 | 49.5 | 66.4 | 74.7 |
| w/o Pho. | 64.3 | 60.6 | 73.0 | 63.7 | 66.7 | 62.1 | 83.3 | 85.6 | 60.0 | 22.0 | 73.1 | 71.0 |
| w/o Tal. | 62.1 | 66.5 | 74.3 | 66.4 | 54.8 | 55.0 | 82.2 | 85.8 | 61.9 | 43.6 | 18.1 | 71.1 |
| w/o Tor. | 63.9 | 59.4 | 76.6 | 65.9 | 64.0 | 59.4 | 84.6 | 87.4 | 58.1 | 48.3 | 73.1 | 39.1 |
Table 10.
Comparison of models by stage number on the DHRP test set per object. The best results are shown in bold, and the worst are underlined.
| Model | | | | | | |
|---|---|---|---|---|---|---|
| 4-stage hourglass | 84.9 | 93.2 | 82.1 | 75.1 | 80.6 | 64.5 |
| 5-stage hourglass | 86.7 | 94.3 | 84.5 | 79.2 | 86.3 | 70.4 |
| 6-stage hourglass | 86.4 | 93.9 | 83.9 | 79.8 | 86.9 | 71.7 |
| 7-stage hourglass | 87.4 | 95.3 | 86.2 | 81.5 | 88.6 | 74.4 |
| 8-stage hourglass | 83.6 | 92.5 | 78.1 | 75.2 | 81.9 | 62.5 |
Table 11.
Comparison of models by stage number on the DHRP test set per joint. The best results are shown in bold, and the worst are underlined.
| Method | Nose | Neck | Sho | Elb | Wri | Hip | Knee | Ank | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| 4-stage hourglass | 89.9 | 89.9 | 88.2 | 76.8 | 70.3 | 89.0 | 85.0 | 80.5 | 83.7 |
| 5-stage hourglass | 90.6 | 90.9 | 90.3 | 79.0 | 73.4 | 91.3 | 87.5 | 83.9 | 85.9 |
| 6-stage hourglass | 89.4 | 90.2 | 89.5 | 80.7 | 75.5 | 90.2 | 88.0 | 84.4 | 86.0 |
| 7-stage hourglass | 92.1 | 91.1 | 91.2 | 80.6 | 75.5 | 92.4 | 88.0 | 86.2 | 87.1 |
| 8-stage hourglass | 88.7 | 88.4 | 88.7 | 78.0 | 69.1 | 87.7 | 86.4 | 83.0 | 83.8 |
Table 12.
Evaluation of network models by stage number. All models process normalized input images of 320 × 320 pixels, obtained by transforming arbitrary image sizes.
| Model | Parameters | GFLOPs | Training (h) | Inference (FPS) |
|---|---|---|---|---|
| 4-stage hourglass | 13.0 M | 48.66 | 10.5 | 63.64 |
| 5-stage hourglass | 16.2 M | 58.76 | 12.8 | 54.56 |
| 6-stage hourglass | 19.3 M | 68.88 | 14.6 | 48.34 |
| 7-stage hourglass | 22.4 M | 78.98 | 17.7 | 42.81 |
| 8-stage hourglass | 25.6 M | 89.10 | 19.8 | 38.34 |
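One common way to obtain the normalized 320 × 320 network input described in the Table 12 caption is an aspect-preserving resize followed by zero padding (letterboxing). The sketch below is an assumption about the preprocessing, not the paper's exact transform; the returned scale and offset allow predicted keypoints to be mapped back to the original image.

```python
import numpy as np

def letterbox(img, size=320):
    """Resize an arbitrary image to a square size x size input:
    scale the longer side to `size` and zero-pad the remainder."""
    h, w = img.shape[:2]
    s = size / max(h, w)
    nh, nw = int(round(h * s)), int(round(w * s))
    # Nearest-neighbour resize via pure NumPy index lookup.
    rows = (np.arange(nh) / s).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / s).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]
    out = np.zeros((size, size) + img.shape[2:], dtype=img.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    out[top:top + nh, left:left + nw] = resized
    # scale and offset map network-input keypoints back to the source
    return out, s, (top, left)

img = np.ones((240, 640, 3), dtype=np.uint8)  # a wide 240 x 640 frame
inp, scale, (top, left) = letterbox(img)
```

A keypoint (x, y) predicted in the 320 × 320 frame maps back to the source as ((x - left) / scale, (y - top) / scale).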
Table 13.
Details of the network architectures for the evaluated methods.
| Method | Input Size | Backbone | Parameters | GFLOPs | Inference (FPS) |
|---|---|---|---|---|---|
| RoboCup (NimbRo-Net2) [9] | 384 | ResNet18 | 12.8 M | 28.0 | 48 |
| Ours (4-stage) | 320 | Hourglass | 13.0 M | 48.66 | 63.64 |
| Ours (7-stage) | 320 | Hourglass | 22.4 M | 78.98 | 42.81 |
Table 14.
Comparative evaluations on the DHRP test set per joint. The best results are shown in bold, and the worst are underlined.
| Method | Nose | Neck | Sho | Elb | Wri | Hip | Knee | Ank | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| RoboCup (NimbRo-Net2) [9] | 49.9 | 64.1 | 49.1 | 25.5 | 10.5 | 39.1 | 23.8 | 30.2 | 36.5 |
| Ours (4-stage) | 89.9 | 89.9 | 88.2 | 76.8 | 70.3 | 89.0 | 85.0 | 80.5 | 83.7 |
| Ours (7-stage) | 92.1 | 91.1 | 91.2 | 80.6 | 75.5 | 92.4 | 88.0 | 86.2 | 87.1 |
Table 15.
Performance evaluation on the HRP validation set [10]. Our method is evaluated using the Percentage of Correct Keypoints (PCK) as the evaluation metric. The best results are highlighted in bold.
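The PCK metric named above counts a predicted keypoint as correct when it lies within a distance threshold of its ground-truth annotation. A minimal sketch, assuming a fixed pixel threshold (the paper's normalization of the threshold, e.g., by bounding-box size, may differ):

```python
import numpy as np

def pck(pred, gt, visible, threshold):
    """Percentage of Correct Keypoints.

    pred, gt: (N, K, 2) arrays of 2D keypoint coordinates.
    visible:  (N, K) boolean mask of annotated keypoints.
    threshold: distance below which a prediction counts as correct.
    Returns the percentage over all visible keypoints."""
    dist = np.linalg.norm(pred - gt, axis=-1)        # (N, K)
    correct = (dist <= threshold) & visible
    return 100.0 * correct.sum() / visible.sum()

# Toy example: one robot with three visible joints, one badly off.
gt = np.zeros((1, 3, 2))
pred = gt.copy()
pred[0, 0] += 10.0                                   # joint 0 misses
vis = np.ones((1, 3), dtype=bool)
score = pck(pred, gt, vis, threshold=5.0)            # 2 of 3 correct
```

Masking with `visible` matters for robots like Phoenix, whose lower-body joints are absent from the dataset and must not count against the score.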
Table 16.
Evaluation of each target humanoid robot on the DHRP test set by object, using 7-stage hourglass networks. The best results are highlighted in bold, while the worst are underlined.
| Class | Model | | | | | | |
|---|---|---|---|---|---|---|---|
| Total | 7-stage HG | 87.4 | 95.3 | 86.2 | 81.5 | 88.6 | 74.4 |
| Apollo | 7-stage HG | 90.0 | 97.5 | 92.4 | 83.6 | 89.8 | 79.7 |
| Atlas | 7-stage HG | 87.7 | 95.0 | 88.7 | 83.1 | 91.5 | 76.6 |
| DARwIn-OP | 7-stage HG | 84.1 | 87.3 | 81.3 | 82.6 | 86.6 | 80.6 |
| EVE | 7-stage HG | 83.6 | 100 | 90.0 | 82.1 | 99.0 | 85.0 |
| FIGURE 01 | 7-stage HG | 72.3 | 85.6 | 54.1 | 69.5 | 79.5 | 50.7 |
| H1 | 7-stage HG | 95.9 | 99.1 | 98.2 | 91.6 | 97.3 | 94.5 |
| Kepler | 7-stage HG | 95.4 | 97.1 | 97.1 | 90.1 | 95.6 | 89.0 |
| Optimus Gen 2 | 7-stage HG | 86.1 | 92.6 | 82.2 | 76.6 | 82.2 | 67.4 |
| Phoenix | 7-stage HG | 86.9 | 99.4 | 84.5 | 72.1 | 76.8 | 51.2 |
| TALOS | 7-stage HG | 91.1 | 100 | 97.2 | 87.8 | 98.1 | 89.8 |
| Toro | 7-stage HG | 91.5 | 98.1 | 91.1 | 84.3 | 88.6 | 72.2 |
Table 17.
Evaluation of each target humanoid robot on the DHRP test set by joint, using 7-stage hourglass networks. The best results are highlighted in bold, while the worst are underlined. Note: The Phoenix robot lacks lower-body data in the dataset.
| Class | Model | Nose | Neck | Sho | Elb | Wri | Hip | Knee | Ank | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| Total | 7-stage HG | 92.1 | 91.1 | 91.2 | 80.6 | 75.5 | 92.4 | 88.0 | 86.2 | 87.1 |
| Apollo | 7-stage HG | 92.6 | 93.2 | 91.9 | 83.1 | 71.4 | 95.8 | 84.2 | 98.5 | 88.8 |
| Atlas | 7-stage HG | 93.8 | 91.8 | 93.0 | 75.6 | 70.9 | 92.6 | 91.4 | 91.4 | 87.6 |
| DARwIn-OP | 7-stage HG | 92.7 | 83.9 | 91.0 | 78.0 | 79.8 | 98.4 | 94.8 | 81.6 | 87.5 |
| EVE | 7-stage HG | 95.4 | 93.4 | 93.4 | 83.8 | 68.9 | 79.5 | 78.3 | 74.1 | 83.3 |
| FIGURE 01 | 7-stage HG | 92.7 | 89.8 | 88.6 | 61.5 | 53.2 | 82.1 | 71.5 | 69.6 | 76.1 |
| H1 | 7-stage HG | 88.8 | 90.9 | 96.1 | 95.0 | 87.4 | 95.4 | 94.9 | 95.5 | 93.0 |
| Kepler | 7-stage HG | 94.2 | 94.1 | 91.3 | 93.0 | 89.8 | 97.2 | 84.1 | 99.3 | 92.9 |
| Optimus Gen 2 | 7-stage HG | 86.9 | 93.1 | 85.4 | 80.9 | 77.7 | 92.6 | 80.0 | 95.3 | 86.5 |
| Phoenix | 7-stage HG | 98.5 | 98.5 | 91.9 | 66.3 | 61.5 | 90.5 | N/A | N/A | 84.5 |
| TALOS | 7-stage HG | 79.4 | 85.2 | 93.2 | 89.4 | 89.2 | 94.9 | 90.6 | 89.8 | 89.0 |
| Toro | 7-stage HG | 91.7 | 87.0 | 89.4 | 90.7 | 87.4 | 89.8 | 94.7 | 82.4 | 89.1 |