Abstract
Preprocessing video frames to extract human regions of interest (ROIs) has been widely adopted in fall recognition tasks and yields better results than approaches that use raw frames. Nonetheless, these methods are limited because the preprocessing is not optimized jointly with the fall event classifier. The classifier therefore depends heavily on the quality of the preprocessed results, which limits its generalization in complex environments. In this study, we introduce Disjointed Representation Networks (DisJRNet), a unified method that learns a general strategy for separating human and background components. Our method needs only raw frames, with no additional preprocessing step to obtain human ROIs. DisJRNet first explicitly disjoints convolutional feature maps into two independent components, “human” and “background”, and then reassembles them. This enables the model to learn the human-background separation process and obtain a balanced representation that is useful for recognizing fall events. Moreover, because the proposed model optimizes feature-level human ROI localization together with the classifier, it learns more general representations of fall-related movements than existing approaches that use preprocessed data. In experiments, we applied our method to R(2+1)D, a variant of 3D convolutional neural networks, and achieved state-of-the-art performance on fall video benchmark datasets. Furthermore, by comparing Grad-CAMs, we observe that our model effectively separates the two components, paying more attention to the actual movements related to fall events and reducing background influence, as intended.
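The disjoint-then-reassemble idea can be illustrated with a minimal sketch. The abstract does not say how the separation is computed, so everything here is an assumption made for illustration: the soft mask `human_mask`, the function names `disjoint` and `reassemble`, and the two weights are hypothetical, and a real model would learn the separation on multi-channel video features rather than use a fixed mask on a tiny 2D map.

```python
from typing import List, Tuple

# One H x W feature channel, kept tiny for illustration (assumption).
FeatureMap = List[List[float]]


def disjoint(features: FeatureMap, human_mask: FeatureMap) -> Tuple[FeatureMap, FeatureMap]:
    """Split a feature map into 'human' and 'background' parts.

    human_mask holds values in [0, 1]; it is a hypothetical stand-in
    for whatever separation the network learns.
    """
    human = [
        [m * f for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(features, human_mask)
    ]
    background = [
        [(1.0 - m) * f for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(features, human_mask)
    ]
    return human, background


def reassemble(
    human: FeatureMap,
    background: FeatureMap,
    human_weight: float = 1.0,
    background_weight: float = 1.0,
) -> FeatureMap:
    """Recombine the two parts as a weighted element-wise sum (assumed form)."""
    return [
        [human_weight * h + background_weight * b for h, b in zip(h_row, b_row)]
        for h_row, b_row in zip(human, background)
    ]


features = [[4.0, 2.0], [8.0, 6.0]]
human_mask = [[1.0, 0.0], [0.5, 0.25]]

human, background = disjoint(features, human_mask)
restored = reassemble(human, background)  # equal weights give back the input
damped = reassemble(human, background, background_weight=0.5)  # background halved
```

With equal weights the two parts sum back to the original map, so the split itself loses no information. Lowering `background_weight` is one simple way to picture reduced background influence, though it is not claimed to be the mechanism DisJRNet uses.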
| Original language | English |
|---|---|
| Article number | 131451 |
| Journal | Neurocomputing |
| Volume | 657 |
| DOIs | |
| State | Published - 7 Dec 2025 |
Keywords
- Computer vision
- Disjointed representation learning
- Human fall recognition
- Video recognition