TY - GEN
T1 - Convolutional neural network with structural input for visual object tracking
AU - Fiaz, Mustansar
AU - Mahmood, Arif
AU - Jung, Soon Ki
N1 - Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019
Y1 - 2019
N2 - Numerous deep learning approaches have been applied to visual object tracking owing to their ability to leverage large training datasets for performance improvement. Most of these approaches are limited in learning target-specific, information-rich features and therefore suffer reduced accuracy in the presence of challenges such as occlusion, scale variation, rotation and clutter. We propose a deep neural network that takes two stacked patches as input and regresses both the similarity and the dissimilarity scores in a single evaluation. Image patches are concatenated depth-wise and fed to the six-channel input of the network. The proposed network is generic and exploits the structural differences between the two input patches to obtain more accurate similarity and dissimilarity scores. Online learning is enforced via short-term and long-term updates to improve tracking performance. Extensive experimental evaluations have been performed on the OTB2015 and TempleColor128 benchmark datasets. Comparisons with state-of-the-art methods indicate that the proposed framework achieves better tracking performance, with improved accuracy under challenges including occlusion, background clutter, in-plane rotation and scale variation.
AB - Numerous deep learning approaches have been applied to visual object tracking owing to their ability to leverage large training datasets for performance improvement. Most of these approaches are limited in learning target-specific, information-rich features and therefore suffer reduced accuracy in the presence of challenges such as occlusion, scale variation, rotation and clutter. We propose a deep neural network that takes two stacked patches as input and regresses both the similarity and the dissimilarity scores in a single evaluation. Image patches are concatenated depth-wise and fed to the six-channel input of the network. The proposed network is generic and exploits the structural differences between the two input patches to obtain more accurate similarity and dissimilarity scores. Online learning is enforced via short-term and long-term updates to improve tracking performance. Extensive experimental evaluations have been performed on the OTB2015 and TempleColor128 benchmark datasets. Comparisons with state-of-the-art methods indicate that the proposed framework achieves better tracking performance, with improved accuracy under challenges including occlusion, background clutter, in-plane rotation and scale variation.
KW - Convolutional neural network
KW - Deep learning
KW - Machine learning
KW - Visual tracking
UR - http://www.scopus.com/inward/record.url?scp=85065660584&partnerID=8YFLogxK
U2 - 10.1145/3297280.3297416
DO - 10.1145/3297280.3297416
M3 - Conference contribution
AN - SCOPUS:85065660584
SN - 9781450359337
T3 - Proceedings of the ACM Symposium on Applied Computing
SP - 1345
EP - 1352
BT - Proceedings of the ACM Symposium on Applied Computing
PB - Association for Computing Machinery
T2 - 34th Annual ACM Symposium on Applied Computing, SAC 2019
Y2 - 8 April 2019 through 12 April 2019
ER -