TY - GEN
T1 - Spatio-temporal Weight of Active Region for Human Activity Recognition
AU - Lee, Dong Gyu
AU - Won, Dong Ok
N1 - Publisher Copyright:
© 2022, Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Although activity recognition in video has been widely studied, with recent significant advances in deep learning approaches, it remains a challenging task on real-world datasets. Skeleton-based action recognition has gained popularity because of its ability to exploit detailed information about human behavior, but the most cost-effective depth sensors are still limited to capturing indoor scenes. In this paper, we propose a framework for human activity recognition based on the spatio-temporal weight of active regions, utilizing a human pose estimation algorithm on RGB video. In the proposed framework, pose-based joint motion features for body parts are extracted with a publicly available pose estimation algorithm. Semantically important body parts that interact with other objects receive higher weights based on spatio-temporal activation. Weighted local patches around actively interacting joints and full-body image features are also combined in a single framework. Finally, the temporal dynamics are modeled over time with an LSTM. We validate the proposed method on two public datasets, BIT-Interaction and UT-Interaction, which are widely used to evaluate human interaction recognition. Our method demonstrates its effectiveness by outperforming competing methods in quantitative comparisons.
AB - Although activity recognition in video has been widely studied, with recent significant advances in deep learning approaches, it remains a challenging task on real-world datasets. Skeleton-based action recognition has gained popularity because of its ability to exploit detailed information about human behavior, but the most cost-effective depth sensors are still limited to capturing indoor scenes. In this paper, we propose a framework for human activity recognition based on the spatio-temporal weight of active regions, utilizing a human pose estimation algorithm on RGB video. In the proposed framework, pose-based joint motion features for body parts are extracted with a publicly available pose estimation algorithm. Semantically important body parts that interact with other objects receive higher weights based on spatio-temporal activation. Weighted local patches around actively interacting joints and full-body image features are also combined in a single framework. Finally, the temporal dynamics are modeled over time with an LSTM. We validate the proposed method on two public datasets, BIT-Interaction and UT-Interaction, which are widely used to evaluate human interaction recognition. Our method demonstrates its effectiveness by outperforming competing methods in quantitative comparisons.
KW - Human activity recognition
KW - Human-human interaction
KW - Spatio-temporal weight
UR - http://www.scopus.com/inward/record.url?scp=85130392369&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-02375-0_7
DO - 10.1007/978-3-031-02375-0_7
M3 - Conference contribution
AN - SCOPUS:85130392369
SN - 9783031023743
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 92
EP - 103
BT - Pattern Recognition - 6th Asian Conference, ACPR 2021, Revised Selected Papers
A2 - Wallraven, Christian
A2 - Liu, Qingshan
A2 - Nagahara, Hajime
PB - Springer Science and Business Media Deutschland GmbH
T2 - 6th Asian Conference on Pattern Recognition, ACPR 2021
Y2 - 9 November 2021 through 12 November 2021
ER -