TY - GEN
T1 - Human activity prediction based on Sub-volume Relationship Descriptor
AU - Lee, Dong Gyu
AU - Lee, Seong Whan
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/1/1
Y1 - 2016/1/1
AB - In this paper, we address the problem of recognizing unfinished human activity from partially observed videos. Specifically, we propose a novel human activity descriptor, which can represent pairwise relationships among human activities in a compact manner using pre-trained Convolutional Neural Networks (CNNs) by capturing the discriminative sub-volume. The potentially important relationship among all pairwise sub-volumes, called key-volumes, is automatically captured using global and local motion activation and the ratio of the participant. The key-volumes, captured without prior knowledge, hold discriminative information related to the unfinished activity. The key-volume information is considered in the descriptor construction procedure. Despite the representational power of CNNs, training a CNN model for a particular purpose requires substantial resources, such as a large amount of labeled data and computing power. Thus, we develop a method that utilizes a pre-trained CNN without any additional model training procedure. The low-level features can be extracted through existing CNN toolkits. In real applications, the proposed method may be more cost-effective when implementing a smart surveillance system to understand human activity. In our experiments, we compare the performance of the proposed method with other state-of-the-art human activity prediction methods on two public datasets; the results show that the proposed method outperforms these competing methods.
UR - http://www.scopus.com/inward/record.url?scp=85019141877&partnerID=8YFLogxK
U2 - 10.1109/ICPR.2016.7899939
DO - 10.1109/ICPR.2016.7899939
M3 - Conference contribution
AN - SCOPUS:85019141877
T3 - Proceedings - International Conference on Pattern Recognition
SP - 2060
EP - 2065
BT - 2016 23rd International Conference on Pattern Recognition, ICPR 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd International Conference on Pattern Recognition, ICPR 2016
Y2 - 4 December 2016 through 8 December 2016
ER -