Action recognition: A comprehensive survey of tasks, methods, and challenges

  • Kyungpook National University

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Action recognition has emerged as a central research problem in computer vision, aiming to identify and understand human actions from video data. Over the past decade, the field has advanced from early convolutional approaches to sophisticated architectures capable of capturing complex spatio-temporal dependencies. This survey provides a comprehensive overview of action recognition across six major tasks: Action Classification, Temporal Action Localization, Spatio-temporal Action Localization, Temporal Action Segmentation, Online Action Detection, and Action Anticipation. For each task, we trace the methodological evolution from foundational models to recent state-of-the-art approaches, highlighting how key challenges such as long-range temporal modeling, viewpoint variation, boundary precision, over-segmentation, real-time inference, and future uncertainty have been addressed. We also reorganize benchmark results and evaluation metrics, presenting a unified perspective that facilitates fair comparisons and reproducible research. In addition, we analyze representative datasets, ranging from early benchmarks like UCF101 and HMDB51 to large-scale collections such as Kinetics, ActivityNet, and Epic-Kitchens, which have enabled rapid progress in both supervised and self-supervised learning. We discuss open issues and unresolved challenges, including the use of State Space Models for efficient sequence modeling, multimodal fusion techniques that dynamically assess modality reliability, synthetic data and weak supervision for reducing annotation costs, and fairness-aware frameworks that ensure ethical applicability. By consolidating a decade of progress, this survey offers a structured understanding of the action recognition landscape and aims to inspire further research toward robust, scalable, and responsible video understanding systems.

Original language: English
Pages (from-to): 32-49
Number of pages: 18
Journal: ICT Express
Volume: 12
Issue number: 1
State: Published - Feb 2026

Keywords

  • Action anticipation
  • Action classification
  • Action recognition
  • Online action detection
  • Spatio-temporal action localization
  • Temporal action localization
  • Temporal action segmentation