Efficient Visual Tracking with Stacked Channel-Spatial Attention Learning

Md Maklachur Rahman, Mustansar Fiaz, Soon Ki Jung

Research output: Contribution to journal › Article › peer-review

23 Scopus citations

Abstract

Template-based learning, particularly with Siamese networks, has recently become popular because it balances accuracy and speed. However, preserving tracker robustness in challenging scenarios while maintaining real-time speed remains a primary concern in visual object tracking. Siamese trackers have difficulty handling continual changes in target appearance because they learn only a limited ability to discriminate between target and background information. This paper presents stacked channel-spatial attention within Siamese networks to improve tracker robustness without sacrificing tracking speed. The proposed channel attention strengthens target-specific channels by increasing their weights and reduces the importance of irrelevant channels by lowering theirs. The spatial attention focuses on the most informative region of the target feature map. We integrate the proposed channel and spatial attention modules to enhance tracking performance with end-to-end learning. The proposed tracking framework learns what to highlight and where, emphasizing important target information for efficient tracking. Experimental results on the widely used OTB100, OTB50, VOT2016, VOT2017/18, TC-128, and UAV123 benchmarks verify that the proposed tracker achieves outstanding performance compared with state-of-the-art trackers.
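The idea of weighting channels first ("what") and spatial locations second ("where") can be illustrated with a minimal sketch. This is not the paper's architecture: the single linear layer in the channel branch, the scalars `alpha` and `beta` in the spatial branch, and all function names are hypothetical stand-ins for parameters that a real tracker would learn end to end. Feature maps are plain nested lists of shape C x H x W so that the example needs only the standard library.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feat, w, b):
    """Reweight channels of a C x H x W feature map.

    w (C x C) and b (length C) are hypothetical learned parameters.
    """
    C = len(feat)
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    # Excite: a linear layer plus sigmoid gives a weight in (0, 1) per channel.
    weights = [
        sigmoid(sum(w[i][j] * pooled[j] for j in range(C)) + b[i])
        for i in range(C)
    ]
    scaled = [[[v * weights[c] for v in row] for row in feat[c]] for c in range(C)]
    return scaled, weights

def spatial_attention(feat, alpha=1.0, beta=1.0):
    """Reweight locations using the channel-wise mean and max at each pixel.

    alpha and beta are hypothetical scalars standing in for a learned layer.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    mask = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            vals = [feat[c][y][x] for c in range(C)]
            mask[y][x] = sigmoid(alpha * (sum(vals) / C) + beta * max(vals))
    out = [[[feat[c][y][x] * mask[y][x] for x in range(W)] for y in range(H)]
           for c in range(C)]
    return out, mask

def stacked_attention(feat, w, b, alpha=1.0, beta=1.0):
    """Apply channel attention, then spatial attention, in sequence."""
    ch_out, ch_weights = channel_attention(feat, w, b)
    sp_out, mask = spatial_attention(ch_out, alpha, beta)
    return sp_out, ch_weights, mask

# Toy input: channel 0 carries signal, channel 1 is empty.
feat = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 0.0]]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, ch_weights, mask = stacked_attention(feat, identity, [0.0, 0.0])
```

In this toy input the active channel receives a larger weight than the empty one, and the spatial mask is largest where the activations are strongest. A real tracker would apply such modules to convolutional features of the template branch and train the parameters jointly with the rest of the network.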

Original language: English
Article number: 9102303
Pages (from-to): 100857-100869
Number of pages: 13
Journal: IEEE Access
Volume: 8
State: Published - 2020

Keywords

  • Deep learning
  • Siamese architecture
  • Stacked channel-spatial attention
  • Visual object tracking
