Abstract
Action recognition has emerged as a central research problem in computer vision, aiming to identify and understand human actions from video data. Over the past decade, the field has advanced from early convolutional approaches to sophisticated architectures capable of capturing complex spatio-temporal dependencies. This survey provides a comprehensive overview of action recognition across six major tasks: Action Classification, Temporal Action Localization, Spatio-temporal Action Localization, Temporal Action Segmentation, Online Action Detection, and Action Anticipation. For each task, we trace the methodological evolution from foundational models to recent state-of-the-art approaches, highlighting how key challenges such as long-range temporal modeling, viewpoint variation, boundary precision, over-segmentation, real-time inference, and future uncertainty have been addressed. We also reorganize benchmark results and evaluation metrics, presenting a unified perspective that facilitates fair comparisons and reproducible research. In addition, we analyze representative datasets, ranging from early benchmarks like UCF101 and HMDB51 to large-scale collections such as Kinetics, ActivityNet, and Epic-Kitchens, which have enabled rapid progress in both supervised and self-supervised learning. We discuss open issues and unresolved challenges, including the use of State Space Models for efficient sequence modeling, multimodal fusion techniques that dynamically assess modality reliability, synthetic data and weak supervision for reducing annotation costs, and fairness-aware frameworks that ensure ethical applicability. By consolidating a decade of progress, this survey offers a structured understanding of the action recognition landscape and aims to inspire further research toward robust, scalable, and responsible video understanding systems.
| Original language | English |
|---|---|
| Pages (from-to) | 32-49 |
| Number of pages | 18 |
| Journal | ICT Express |
| Volume | 12 |
| Issue number | 1 |
| State | Published - Feb 2026 |
Keywords
- Action anticipation
- Action classification
- Action recognition
- Online action detection
- Spatio-temporal action localization
- Temporal action localization
- Temporal action segmentation
Fingerprint
Dive into the research topics of 'Action recognition: A comprehensive survey of tasks, methods, and challenges'. Together they form a unique fingerprint.