TY - GEN
T1 - Learning 2D Human Poses for Better 3D Lifting via Multi-model 3D-Guidance
AU - Lee, Sanghyeon
AU - Hwang, Yoonho
AU - Lee, Jong Taek
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
AB - Recent advancements in 2D pose detectors have significantly improved 3D human pose estimation via the 2D-to-3D lifting approach. Despite these advancements, a substantial accuracy gap remains between using ground-truth 2D poses and detected 2D poses for 3D lifting. However, most methods focus solely on enhancing the 3D lifting network, using 2D pose detectors optimized for 2D accuracy without any refinement to better serve the 3D lifting process. To address this limitation, we propose a novel 3D-guided training method that leverages 3D loss to improve 2D pose estimation. Additionally, we introduce a multi-model training method to ensure robust generalization across various 3D lifting networks. Extensive experiments with three 2D pose detectors and four 3D lifting networks demonstrate our method’s effectiveness. Our method achieves an average improvement of 4.6% in MPJPE on Human3.6M and 16.8% on Panoptic, enhancing 2D poses for accurate 3D lifting. The code is available at https://github.com/knu-vis/L2D-Pose.
KW - Human pose estimation
KW - Training strategy
UR - http://www.scopus.com/inward/record.url?scp=85213006957&partnerID=8YFLogxK
U2 - 10.1007/978-981-96-0885-0_11
DO - 10.1007/978-981-96-0885-0_11
M3 - Conference contribution
AN - SCOPUS:85213006957
SN - 9789819608843
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 185
EP - 202
BT - Computer Vision – ACCV 2024 - 17th Asian Conference on Computer Vision, Proceedings
A2 - Cho, Minsu
A2 - Laptev, Ivan
A2 - Tran, Du
A2 - Yao, Angela
A2 - Zha, Hongbin
PB - Springer Science and Business Media Deutschland GmbH
T2 - 17th Asian Conference on Computer Vision, ACCV 2024
Y2 - 8 December 2024 through 12 December 2024
ER -